* [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations
@ 2017-04-27 11:59 Richard Henderson
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 01/19] target/nios2: Fix 64-bit ilp32 compilation Richard Henderson
                   ` (21 more replies)
  0 siblings, 22 replies; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 11:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

Changes since Emilio's v4:
  * Fold tcg/i386 exit_tb 0 to the epilogue we created for goto_ptr.
  * Drop gen_jr in favor of DISAS_EXIT for target/arm.
  * Backend support for ppc, aarch64, sparc, s390.
  * Fix 3 build failures that appear on sparc v8plus (64-bit ilp32).

I attempted to throw together an x32 environment to validate
x86_64 ilp32, which ought to have had the same problems as sparc,
but my patience was exhausted by gentoo misconfigury.  I may try
that again later, but not now.


r~


Emilio G. Cota (11):
  exec-all: export tb_htable_lookup
  tcg-runtime: add lookup_tb_ptr helper
  tcg: introduce goto_ptr opcode
  tcg: export tcg_gen_lookup_and_goto_ptr
  target/arm: optimize cross-page direct jumps in softmmu
  target/arm: optimize indirect branches
  target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr
  target/i386: optimize cross-page direct jumps in softmmu
  target/i386: optimize indirect branches
  tb-hash: improve tb_jmp_cache hash function in user mode
  tcg/i386: implement goto_ptr

Richard Henderson (8):
  target/nios2: Fix 64-bit ilp32 compilation
  tcg/sparc: Use the proper compilation flags for 32-bit
  qemu/atomic: Loosen restrictions for 64-bit ILP32 hosts
  target/alpha: Use tcg_gen_goto_ptr
  tcg/ppc: Implement goto_ptr
  tcg/aarch64: Implement goto_ptr
  tcg/sparc: Implement goto_ptr
  tcg/s390: Implement goto_ptr

 configure                    |  6 +++---
 cpu-exec.c                   |  6 ++----
 include/exec/exec-all.h      |  2 ++
 include/exec/tb-hash.h       | 12 +++++++++++
 include/qemu/atomic.h        | 34 ++++++++++++++++++++++--------
 target/alpha/translate.c     | 49 ++++++++++++++++++++++++++++++++------------
 target/arm/translate.c       | 21 ++++++++++++++-----
 target/arm/translate.h       |  4 ++++
 target/i386/translate.c      | 43 ++++++++++++++++++++++++++++++--------
 target/nios2/translate.c     |  2 +-
 tcg-runtime.c                | 24 ++++++++++++++++++++++
 tcg/README                   |  8 ++++++++
 tcg/aarch64/tcg-target.h     |  1 +
 tcg/aarch64/tcg-target.inc.c | 22 ++++++++++++++++++--
 tcg/arm/tcg-target.h         |  1 +
 tcg/i386/tcg-target.h        |  1 +
 tcg/i386/tcg-target.inc.c    | 24 ++++++++++++++++++++--
 tcg/ia64/tcg-target.h        |  1 +
 tcg/mips/tcg-target.h        |  1 +
 tcg/ppc/tcg-target.h         |  1 +
 tcg/ppc/tcg-target.inc.c     |  7 +++++++
 tcg/s390/tcg-target.h        |  1 +
 tcg/s390/tcg-target.inc.c    | 24 +++++++++++++++++++---
 tcg/sparc/tcg-target.h       |  1 +
 tcg/sparc/tcg-target.inc.c   | 11 +++++++++-
 tcg/tcg-op.c                 | 13 ++++++++++++
 tcg/tcg-op.h                 | 11 ++++++++++
 tcg/tcg-opc.h                |  1 +
 tcg/tcg-runtime.h            |  2 ++
 tcg/tcg.h                    |  1 +
 tcg/tci/tcg-target.h         |  1 +
 31 files changed, 285 insertions(+), 51 deletions(-)

-- 
2.9.3


* [Qemu-devel] [PATCH v5 01/19] target/nios2: Fix 64-bit ilp32 compilation
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
@ 2017-04-27 11:59 ` Richard Henderson
  2017-04-27 16:03   ` Alex Bennée
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 02/19] tcg/sparc: Use the proper compilation flags for 32-bit Richard Henderson
                   ` (20 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 11:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

Avoid a "cast from pointer to integer of different size" warning
by using the proper host type.
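
For illustration only, a minimal sketch of the pattern involved (not QEMU
code; struct sketch_tb and the sketch_* functions are hypothetical names).
A small index is stored in the low bits of a pointer, and uintptr_t is the
integer type intended to hold a pointer's value on every ABI, so the cast
does not trip the warning on hosts where registers are wider than pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a translation block.  The explicit alignment
 * guarantees that the two low bits of its address are zero. */
struct sketch_tb {
    _Alignas(4) int32_t dummy;
};

/* Pack a small slot index (0..3) into the low bits of the pointer value. */
uintptr_t sketch_tag_ptr(const struct sketch_tb *tb, unsigned n)
{
    assert(n < 4);
    return (uintptr_t)tb + n;
}

/* Recover the original pointer by clearing the low two bits. */
struct sketch_tb *sketch_untag_ptr(uintptr_t v)
{
    return (struct sketch_tb *)(v & ~(uintptr_t)3);
}

/* Recover the slot index from the low two bits. */
unsigned sketch_tag_of(uintptr_t v)
{
    return (unsigned)(v & 3);
}
```

With a pointer-sized integer type the round trip is lossless regardless of
how wide the host registers are.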

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/nios2/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/nios2/translate.c b/target/nios2/translate.c
index cfec479..2f3c2e5 100644
--- a/target/nios2/translate.c
+++ b/target/nios2/translate.c
@@ -164,7 +164,7 @@ static void gen_goto_tb(DisasContext *dc, int n, uint32_t dest)
     if (use_goto_tb(dc, dest)) {
         tcg_gen_goto_tb(n);
         tcg_gen_movi_tl(dc->cpu_R[R_PC], dest);
-        tcg_gen_exit_tb((tcg_target_long)tb + n);
+        tcg_gen_exit_tb((uintptr_t)tb + n);
     } else {
         tcg_gen_movi_tl(dc->cpu_R[R_PC], dest);
         tcg_gen_exit_tb(0);
-- 
2.9.3


* [Qemu-devel] [PATCH v5 02/19] tcg/sparc: Use the proper compilation flags for 32-bit
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 01/19] target/nios2: Fix 64-bit ilp32 compilation Richard Henderson
@ 2017-04-27 11:59 ` Richard Henderson
  2017-04-27 16:04   ` Alex Bennée
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 03/19] qemu/atomic: Loosen restrictions for 64-bit ILP32 hosts Richard Henderson
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 11:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

We have required a v9 cpu since 9b9c37c36439ee0452632253dac7a31897f27f70.
However, the flags we were using did not reliably enable v8plus, which
meant that the compiler didn't know it could inline 64-bit atomics.
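
A rough way to observe the effect, sketched here under the assumption of a
GCC or Clang host compiler (the sketch_* names are hypothetical and not part
of this patch): ask the compiler whether 8-byte atomics are always lock-free
for the selected target flags.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if the compiler considers 8-byte atomics always lock-free for
 * the current target and flags, i.e. it may expand them inline. */
int sketch_atomic64_is_inline(void)
{
    return __atomic_always_lock_free(sizeof(uint64_t), 0);
}

/* A 64-bit atomic increment; returns the value held before the add. */
uint64_t sketch_counter_bump(uint64_t *counter)
{
    assert(counter);
    return __atomic_fetch_add(counter, 1, __ATOMIC_SEQ_CST);
}
```

Building this for 32-bit sparc with the old and the new CPU_CFLAGS and
comparing the result of sketch_atomic64_is_inline() would be one way to
check the claim above; that comparison has not been run here.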

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 configure | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index c35acf1..55dd9c3 100755
--- a/configure
+++ b/configure
@@ -1206,12 +1206,12 @@ case "$cpu" in
            LDFLAGS="-m64 $LDFLAGS"
            ;;
     sparc)
-           LDFLAGS="-m32 $LDFLAGS"
-           CPU_CFLAGS="-m32 -mcpu=ultrasparc"
+           CPU_CFLAGS="-m32 -mv8plus -mcpu=ultrasparc"
+           LDFLAGS="-m32 -mv8plus $LDFLAGS"
            ;;
     sparc64)
-           LDFLAGS="-m64 $LDFLAGS"
            CPU_CFLAGS="-m64 -mcpu=ultrasparc"
+           LDFLAGS="-m64 $LDFLAGS"
            ;;
     s390)
            CPU_CFLAGS="-m31"
-- 
2.9.3


* [Qemu-devel] [PATCH v5 03/19] qemu/atomic: Loosen restrictions for 64-bit ILP32 hosts
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 01/19] target/nios2: Fix 64-bit ilp32 compilation Richard Henderson
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 02/19] tcg/sparc: Use the proper compilation flags for 32-bit Richard Henderson
@ 2017-04-27 11:59 ` Richard Henderson
  2017-04-27 16:10   ` Alex Bennée
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 04/19] exec-all: export tb_htable_lookup Richard Henderson
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 11:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

We need to coordinate with the TCG_OVERSIZED_GUEST test in cputlb.c,
and allow 64-bit atomics even though sizeof(void *) == 4.
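
As a standalone sketch of the idea (the SKETCH_* macros and sketch_read_u64
are hypothetical; the patch itself uses QEMU_BUILD_BUG_ON and
ATOMIC_REG_SIZE), the size limit for an atomic access is taken from the
register width on hosts known to have 64-bit registers, and from the pointer
size elsewhere:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed register size: 8 on hosts known to have 64-bit registers even
 * in ilp32 mode, otherwise fall back to the pointer size. */
#if defined(__x86_64__) || defined(__sparc__)
# define SKETCH_REG_SIZE  8
#else
# define SKETCH_REG_SIZE  sizeof(void *)
#endif

/* Reject, at compile time, atomic accesses wider than a host register. */
#define SKETCH_CHECK_SIZE(ptr) \
    _Static_assert(sizeof(*(ptr)) <= SKETCH_REG_SIZE, \
                   "atomic access wider than a host register")

/* Relaxed 64-bit atomic load, guarded by the size check above. */
uint64_t sketch_read_u64(uint64_t *p)
{
    SKETCH_CHECK_SIZE(p);
    return __atomic_load_n(p, __ATOMIC_RELAXED);
}
```

On x32, where sizeof(void *) == 4 but __x86_64__ is defined, the check
passes; a pointer-size test would have rejected the same access.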

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/qemu/atomic.h | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index 878fa07..8a564e9 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -88,6 +88,24 @@
 #define smp_read_barrier_depends()   barrier()
 #endif
 
+/* Sanity check that the size of an atomic operation isn't "overly large".
+ * Despite the fact that e.g. i686 has 64-bit atomic operations, we do not
+ * want to use them because we ought not need them, and this lets us do a
+ * bit of sanity checking that other 32-bit hosts might build.
+ *
+ * That said, 64-bit hosts running in ilp32 mode cannot use pointer size
+ * as the test; we need the full register size.
+ * ??? Testing TCG_TARGET_REG_BITS == 64 would be exact, but we probably do
+ * not want to pull in everything else TCG related.
+ *
+ * Note that x32 is fully detected with __x86_64__ + _ILP32, and that for
+ * Sparc we always force the use of sparcv9 in configure.
+ */
+#if defined(__x86_64__) || defined(__sparc__)
+# define ATOMIC_REG_SIZE  8
+#else
+# define ATOMIC_REG_SIZE  sizeof(void *)
+#endif
 
 /* Weak atomic operations prevent the compiler moving other
  * loads/stores past the atomic operation load/store. However there is
@@ -104,7 +122,7 @@
 
 #define atomic_read(ptr)                              \
     ({                                                \
-    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *)); \
+    QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE); \
     atomic_read__nocheck(ptr);                        \
     })
 
@@ -112,7 +130,7 @@
     __atomic_store_n(ptr, i, __ATOMIC_RELAXED)
 
 #define atomic_set(ptr, i)  do {                      \
-    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *)); \
+    QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE); \
     atomic_set__nocheck(ptr, i);                      \
 } while(0)
 
@@ -130,27 +148,27 @@
 
 #define atomic_rcu_read(ptr)                          \
     ({                                                \
-    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *)); \
+    QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE); \
     typeof_strip_qual(*ptr) _val;                     \
     atomic_rcu_read__nocheck(ptr, &_val);             \
     _val;                                             \
     })
 
 #define atomic_rcu_set(ptr, i) do {                   \
-    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *)); \
+    QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE); \
     __atomic_store_n(ptr, i, __ATOMIC_RELEASE);       \
 } while(0)
 
 #define atomic_load_acquire(ptr)                        \
     ({                                                  \
-    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));   \
+    QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE);  \
     typeof_strip_qual(*ptr) _val;                       \
     __atomic_load(ptr, &_val, __ATOMIC_ACQUIRE);        \
     _val;                                               \
     })
 
 #define atomic_store_release(ptr, i)  do {              \
-    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));   \
+    QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE);  \
     __atomic_store_n(ptr, i, __ATOMIC_RELEASE);         \
 } while(0)
 
@@ -162,7 +180,7 @@
 })
 
 #define atomic_xchg(ptr, i)    ({                           \
-    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));       \
+    QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE);      \
     atomic_xchg__nocheck(ptr, i);                           \
 })
 
@@ -175,7 +193,7 @@
 })
 
 #define atomic_cmpxchg(ptr, old, new)    ({                             \
-    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));                   \
+    QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE);                  \
     atomic_cmpxchg__nocheck(ptr, old, new);                             \
 })
 
-- 
2.9.3


* [Qemu-devel] [PATCH v5 04/19] exec-all: export tb_htable_lookup
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (2 preceding siblings ...)
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 03/19] qemu/atomic: Loosen restrictions for 64-bit ILP32 hosts Richard Henderson
@ 2017-04-27 11:59 ` Richard Henderson
  2017-04-27 16:10   ` Alex Bennée
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 05/19] tcg-runtime: add lookup_tb_ptr helper Richard Henderson
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 11:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

From: "Emilio G. Cota" <cota@braap.org>

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1493263764-18657-2-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 cpu-exec.c              | 6 ++----
 include/exec/exec-all.h | 2 ++
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 63a56d0..5b181c1 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -309,10 +309,8 @@ static bool tb_cmp(const void *p, const void *d)
     return false;
 }
 
-static TranslationBlock *tb_htable_lookup(CPUState *cpu,
-                                          target_ulong pc,
-                                          target_ulong cs_base,
-                                          uint32_t flags)
+TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
+                                   target_ulong cs_base, uint32_t flags)
 {
     tb_page_addr_t phys_pc;
     struct tb_desc desc;
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index bcde1e6..87ae10b 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -368,6 +368,8 @@ struct TranslationBlock {
 void tb_free(TranslationBlock *tb);
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
+TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
+                                   target_ulong cs_base, uint32_t flags);
 
 #if defined(USE_DIRECT_JUMP)
 
-- 
2.9.3


* [Qemu-devel] [PATCH v5 05/19] tcg-runtime: add lookup_tb_ptr helper
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (3 preceding siblings ...)
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 04/19] exec-all: export tb_htable_lookup Richard Henderson
@ 2017-04-27 11:59 ` Richard Henderson
  2017-04-28 10:29   ` Alex Bennée
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 06/19] tcg: introduce goto_ptr opcode Richard Henderson
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 11:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

From: "Emilio G. Cota" <cota@braap.org>

This paves the way for upcoming work.
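
The general shape of such a lookup can be sketched with a toy model. This is
a simplification and not the helper added below: all toy_* names are
hypothetical, and the model keys only on pc whereas the real helper also
compares cs_base and flags. A small direct-mapped cache is probed first, a
slower table is consulted on a miss, and a sentinel standing in for the
epilogue is returned when nothing matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_BITS 4
#define TOY_CACHE_SIZE (1u << TOY_CACHE_BITS)

struct toy_tb {
    uint32_t pc;   /* guest address this block was translated from */
    void *code;    /* host code pointer for the block */
};

struct toy_tb *toy_cache[TOY_CACHE_SIZE];  /* fast, direct-mapped cache */
struct toy_tb toy_table[8];                /* slow path: every known block */
size_t toy_table_len;
char toy_epilogue;                         /* sentinel for "back to the loop" */

unsigned toy_hash(uint32_t pc)
{
    return (pc >> 2) & (TOY_CACHE_SIZE - 1);
}

/* Slow path: linear search standing in for a real hash table. */
struct toy_tb *toy_slow_lookup(uint32_t pc)
{
    for (size_t i = 0; i < toy_table_len; i++) {
        if (toy_table[i].pc == pc) {
            return &toy_table[i];
        }
    }
    return NULL;
}

/* Return the host code pointer for pc, or the epilogue if none exists. */
void *toy_lookup_ptr(uint32_t pc)
{
    unsigned h = toy_hash(pc);
    struct toy_tb *tb = toy_cache[h];

    if (tb && tb->pc == pc) {
        return tb->code;          /* cache hit */
    }
    tb = toy_slow_lookup(pc);
    if (tb) {
        toy_cache[h] = tb;        /* refill the cache for next time */
        return tb->code;
    }
    return &toy_epilogue;         /* unknown target: leave generated code */
}
```

The caller always gets a valid address to jump to, so the generated code
needs no separate "not found" branch.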

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1493263764-18657-3-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg-runtime.c     | 24 ++++++++++++++++++++++++
 tcg/tcg-runtime.h |  2 ++
 tcg/tcg.h         |  1 +
 3 files changed, 27 insertions(+)

diff --git a/tcg-runtime.c b/tcg-runtime.c
index 4c60c96..8a24bdd 100644
--- a/tcg-runtime.c
+++ b/tcg-runtime.c
@@ -27,6 +27,7 @@
 #include "exec/helper-proto.h"
 #include "exec/cpu_ldst.h"
 #include "exec/exec-all.h"
+#include "exec/tb-hash.h"
 
 /* 32-bit helpers */
 
@@ -141,6 +142,29 @@ uint64_t HELPER(ctpop_i64)(uint64_t arg)
     return ctpop64(arg);
 }
 
+void *HELPER(lookup_tb_ptr)(CPUArchState *env, target_ulong addr)
+{
+    CPUState *cpu = ENV_GET_CPU(env);
+    TranslationBlock *tb;
+    target_ulong cs_base, pc;
+    uint32_t flags;
+
+    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
+    if (likely(tb)) {
+        cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
+        if (likely(tb->pc == addr && tb->cs_base == cs_base &&
+                   tb->flags == flags)) {
+            return tb->tc_ptr;
+        }
+        tb = tb_htable_lookup(cpu, pc, cs_base, flags);
+        if (likely(tb)) {
+            atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)], tb);
+            return tb->tc_ptr;
+        }
+    }
+    return tcg_ctx.code_gen_epilogue;
+}
+
 void HELPER(exit_atomic)(CPUArchState *env)
 {
     cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
index 114ea6f..c41d38a 100644
--- a/tcg/tcg-runtime.h
+++ b/tcg/tcg-runtime.h
@@ -24,6 +24,8 @@ DEF_HELPER_FLAGS_1(clrsb_i64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(ctpop_i32, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(ctpop_i64, TCG_CALL_NO_RWG_SE, i64, i64)
 
+DEF_HELPER_FLAGS_2(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env, tl)
+
 DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
 
 #ifdef CONFIG_SOFTMMU
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 6c216bb..5ec48d1 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -699,6 +699,7 @@ struct TCGContext {
        extension that allows arithmetic on void*.  */
     int code_gen_max_blocks;
     void *code_gen_prologue;
+    void *code_gen_epilogue;
     void *code_gen_buffer;
     size_t code_gen_buffer_size;
     void *code_gen_ptr;
-- 
2.9.3


* [Qemu-devel] [PATCH v5 06/19] tcg: introduce goto_ptr opcode
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (4 preceding siblings ...)
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 05/19] tcg-runtime: add lookup_tb_ptr helper Richard Henderson
@ 2017-04-27 11:59 ` Richard Henderson
  2017-04-28 10:32   ` Alex Bennée
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 07/19] tcg: export tcg_gen_lookup_and_goto_ptr Richard Henderson
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 11:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

From: "Emilio G. Cota" <cota@braap.org>

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1493263764-18657-4-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.h | 1 +
 tcg/arm/tcg-target.h     | 1 +
 tcg/i386/tcg-target.h    | 1 +
 tcg/ia64/tcg-target.h    | 1 +
 tcg/mips/tcg-target.h    | 1 +
 tcg/ppc/tcg-target.h     | 1 +
 tcg/s390/tcg-target.h    | 1 +
 tcg/sparc/tcg-target.h   | 1 +
 tcg/tcg-opc.h            | 1 +
 tcg/tci/tcg-target.h     | 1 +
 10 files changed, 10 insertions(+)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 1a5ea23..b82eac4 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -77,6 +77,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mulsh_i32        0
 #define TCG_TARGET_HAS_extrl_i64_i32    0
 #define TCG_TARGET_HAS_extrh_i64_i32    0
+#define TCG_TARGET_HAS_goto_ptr         0
 
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          1
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 75ea247..c114df7 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -123,6 +123,7 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_mulsh_i32        0
 #define TCG_TARGET_HAS_div_i32          use_idiv_instructions
 #define TCG_TARGET_HAS_rem_i32          0
+#define TCG_TARGET_HAS_goto_ptr         0
 
 enum {
     TCG_AREG0 = TCG_REG_R6,
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 4275787..59d9835 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -107,6 +107,7 @@ extern bool have_popcnt;
 #define TCG_TARGET_HAS_muls2_i32        1
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
+#define TCG_TARGET_HAS_goto_ptr         0
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_extrl_i64_i32    0
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index 42aea03..901bb75 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -173,6 +173,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mulsh_i64        0
 #define TCG_TARGET_HAS_extrl_i64_i32    0
 #define TCG_TARGET_HAS_extrh_i64_i32    0
+#define TCG_TARGET_HAS_goto_ptr         0
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) ((len) <= 16)
 #define TCG_TARGET_deposit_i64_valid(ofs, len) ((len) <= 16)
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index f46d64a..e3240cf 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -130,6 +130,7 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_muluh_i32        1
 #define TCG_TARGET_HAS_mulsh_i32        1
 #define TCG_TARGET_HAS_bswap32_i32      1
+#define TCG_TARGET_HAS_goto_ptr         0
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_add2_i32         0
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index abd8b3d..a9aa974 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -82,6 +82,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        1
 #define TCG_TARGET_HAS_mulsh_i32        1
+#define TCG_TARGET_HAS_goto_ptr         0
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_add2_i32         0
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index cbdd2a6..6b7bcfb 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -92,6 +92,7 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_mulsh_i32      0
 #define TCG_TARGET_HAS_extrl_i64_i32  0
 #define TCG_TARGET_HAS_extrh_i64_i32  0
+#define TCG_TARGET_HAS_goto_ptr       0
 
 #define TCG_TARGET_HAS_div2_i64       1
 #define TCG_TARGET_HAS_rot_i64        1
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index b8b74f96f..9348ddd 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -123,6 +123,7 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_muls2_i32        1
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
+#define TCG_TARGET_HAS_goto_ptr         0
 
 #define TCG_TARGET_HAS_extrl_i64_i32    1
 #define TCG_TARGET_HAS_extrh_i64_i32    1
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index f06f894..956fb1e 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -193,6 +193,7 @@ DEF(insn_start, 0, 0, TLADDR_ARGS * TARGET_INSN_START_WORDS,
     TCG_OPF_NOT_PRESENT)
 DEF(exit_tb, 0, 0, 1, TCG_OPF_BB_END)
 DEF(goto_tb, 0, 0, 1, TCG_OPF_BB_END)
+DEF(goto_ptr, 0, 1, 0, TCG_OPF_BB_END | IMPL(TCG_TARGET_HAS_goto_ptr))
 
 DEF(qemu_ld_i32, 1, TLADDR_ARGS, 1,
     TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 838bf3a..0696328 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -85,6 +85,7 @@
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
+#define TCG_TARGET_HAS_goto_ptr         0
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_extrl_i64_i32    0
-- 
2.9.3


* [Qemu-devel] [PATCH v5 07/19] tcg: export tcg_gen_lookup_and_goto_ptr
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (5 preceding siblings ...)
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 06/19] tcg: introduce goto_ptr opcode Richard Henderson
@ 2017-04-27 11:59 ` Richard Henderson
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 08/19] target/arm: optimize cross-page direct jumps in softmmu Richard Henderson
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 11:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

From: "Emilio G. Cota" <cota@braap.org>

Instead of exporting goto_ptr directly to TCG frontends, export
tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer
returned by the lookup_tb_ptr() helper. This is the only use case
we have for goto_ptr and lookup_tb_ptr, so having this function is
very convenient. Furthermore, it trivially allows us to avoid calling
the lookup helper if goto_ptr is not implemented by the backend.
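
The fallback can be modelled in a few lines. This is a toy sketch, not the
TCG implementation: the toy_* names and the op stream are hypothetical, and
the backend capability is passed as a parameter instead of being a
compile-time constant.

```c
#include <assert.h>

enum toy_op {
    TOY_OP_CALL_LOOKUP,   /* call the lookup helper, result is a host pointer */
    TOY_OP_GOTO_PTR,      /* jump to that host pointer */
    TOY_OP_EXIT_TB,       /* return to the exec loop */
};

struct toy_stream {
    enum toy_op ops[8];
    int n;
};

void toy_emit(struct toy_stream *s, enum toy_op op)
{
    assert(s->n < 8);
    s->ops[s->n++] = op;
}

/* Emit lookup + indirect jump when the backend can do it; otherwise skip
 * the helper call entirely and emit a plain exit. */
void toy_gen_lookup_and_goto_ptr(struct toy_stream *s, int backend_has_goto_ptr)
{
    if (backend_has_goto_ptr) {
        toy_emit(s, TOY_OP_CALL_LOOKUP);
        toy_emit(s, TOY_OP_GOTO_PTR);
    } else {
        toy_emit(s, TOY_OP_EXIT_TB);
    }
}
```

Frontends call one function in both cases, and backends without goto_ptr
pay nothing for the lookup.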

Suggested-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1493263764-18657-5-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/README   |  8 ++++++++
 tcg/tcg-op.c | 13 +++++++++++++
 tcg/tcg-op.h | 11 +++++++++++
 3 files changed, 32 insertions(+)

diff --git a/tcg/README b/tcg/README
index a9858c2..bf49e82 100644
--- a/tcg/README
+++ b/tcg/README
@@ -477,6 +477,14 @@ current TB was linked to this TB. Otherwise execute the next
 instructions. Only indices 0 and 1 are valid and tcg_gen_goto_tb may be issued
 at most once with each slot index per TB.
 
+* lookup_and_goto_ptr tb_addr
+
+Look up a TB address ('tb_addr') and jump to it if valid. If not valid,
+jump to the TCG epilogue to go back to the exec loop.
+
+This operation is optional. If the TCG backend does not implement the
+goto_ptr opcode, emitting this op is equivalent to emitting exit_tb(0).
+
 * qemu_ld_i32/i64 t0, t1, flags, memidx
 * qemu_st_i32/i64 t0, t1, flags, memidx
 
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 6b1f415..660dac9 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -2587,6 +2587,19 @@ void tcg_gen_goto_tb(unsigned idx)
     tcg_gen_op1i(INDEX_op_goto_tb, idx);
 }
 
+void tcg_gen_lookup_and_goto_ptr(TCGv addr)
+{
+    if (TCG_TARGET_HAS_goto_ptr) {
+        TCGv_ptr ptr = tcg_temp_new_ptr();
+
+        gen_helper_lookup_tb_ptr(ptr, tcg_ctx.tcg_env, addr);
+        tcg_gen_op1i(INDEX_op_goto_ptr, GET_TCGV_PTR(ptr));
+        tcg_temp_free_ptr(ptr);
+    } else {
+        tcg_gen_exit_tb(0);
+    }
+}
+
 static inline TCGMemOp tcg_canonicalize_memop(TCGMemOp op, bool is64, bool st)
 {
     /* Trigger the asserts within as early as possible.  */
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index c68e300..5d3278f 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -796,6 +796,17 @@ static inline void tcg_gen_exit_tb(uintptr_t val)
  */
 void tcg_gen_goto_tb(unsigned idx);
 
+/**
+ * tcg_gen_lookup_and_goto_ptr() - look up a TB and jump to it if valid
+ * @addr: Guest address of the target TB
+ *
+ * If the TB is not valid, jump to the epilogue.
+ *
+ * This operation is optional. If the TCG backend does not implement goto_ptr,
+ * this op is equivalent to calling tcg_gen_exit_tb() with 0 as the argument.
+ */
+void tcg_gen_lookup_and_goto_ptr(TCGv addr);
+
 #if TARGET_LONG_BITS == 32
 #define tcg_temp_new() tcg_temp_new_i32()
 #define tcg_global_reg_new tcg_global_reg_new_i32
-- 
2.9.3


* [Qemu-devel] [PATCH v5 08/19] target/arm: optimize cross-page direct jumps in softmmu
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (6 preceding siblings ...)
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 07/19] tcg: export tcg_gen_lookup_and_goto_ptr Richard Henderson
@ 2017-04-27 11:59 ` Richard Henderson
  2017-04-28 11:30   ` Alex Bennée
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 09/19] target/arm: optimize indirect branches Richard Henderson
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 11:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

From: "Emilio G. Cota" <cota@braap.org>

Instead of unconditionally exiting to the exec loop, use the
lookup_and_goto_ptr helper to jump to the target if it is valid.

Perf impact: see next commit's log.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1493263764-18657-7-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/arm/translate.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 0b5a0bc..facb52f 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4153,8 +4153,12 @@ static inline void gen_goto_tb(DisasContext *s, int n, target_ulong dest)
         gen_set_pc_im(s, dest);
         tcg_gen_exit_tb((uintptr_t)s->tb + n);
     } else {
+        TCGv addr = tcg_temp_new();
+
         gen_set_pc_im(s, dest);
-        tcg_gen_exit_tb(0);
+        tcg_gen_extu_i32_tl(addr, cpu_R[15]);
+        tcg_gen_lookup_and_goto_ptr(addr);
+        tcg_temp_free(addr);
     }
 }
 
-- 
2.9.3


* [Qemu-devel] [PATCH v5 09/19] target/arm: optimize indirect branches
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (7 preceding siblings ...)
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 08/19] target/arm: optimize cross-page direct jumps in softmmu Richard Henderson
@ 2017-04-27 11:59 ` Richard Henderson
  2017-04-27 22:58   ` Emilio G. Cota
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 10/19] target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr Richard Henderson
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 11:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

From: "Emilio G. Cota" <cota@braap.org>

Speed up indirect branches by jumping to the target if it is valid.

Softmmu measurements (see later commit for user-mode results):

Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.

- Impact on Boot time

| setup  | ARM debian jessie boot+shutdown time | stddev |
|--------+--------------------------------------+--------|
| v2.9.0 |                                 8.84 |   0.07 |
| +cross |                                 8.85 |   0.03 |
| +jr    |                                 8.83 |   0.06 |

-                            NBench, arm-softmmu (debian jessie guest). Host: Intel i7-4790K @ 4.00GHz

  1.3x +-+-------------------------------------------------------------------------------------------------------------+-+
       |                                                                                                                 |
       |   cross                                                          ####                                           |
 1.25x +cross+jr..........................................................#++#.........................................+-+
       |                                                        ####      #  #                                           |
       |                                                     +++#  #      #  #                                           |
       |                                      +++            ****  #      #  #                                           |
  1.2x +-+...................................####............*..*..#......#..#.........................................+-+
       |                                  ****  #            *  *  #      #  #     ####                                  |
       |                                  *  *  #            *  *  #      #  #     #  #                                  |
 1.15x +-+................................*..*..#............*..*..#......#..#.....#..#................................+-+
       |                                  *  *  #            *  *  #      #  #     #  #                                  |
       |                                  *  *  #      ####  *  *  #      #  #     #  #                                  |
       |                                  *  *  #      #  #  *  *  #      #  #     #  #                         ####     |
  1.1x +-+................................*..*..#......#..#..*..*..#......#..#.....#..#.........................#..#...+-+
       |                                  *  *  #      #  #  *  *  #      #  #     #  #                         #  #     |
       |                                  *  *  #      #  #  *  *  #      #  #     #  #                         #  #     |
 1.05x +-+..........................####..*..*..#......#..#..*..*..#......#..#.....#..#......+++............*****..#...+-+
       |                        *****  #  *  *  #      #  #  *  *  #  *****  #     #  #   +++ |    ****###  *   *  #     |
       |                        *+++*  #  *  *  #      #  #  *  *  #  *+++*  #  ****  #  *****###  *  *  #  *   *  #     |
       |     *****###  +++####  *   *  #  *  *  #  *****  #  *  *  #  *   *  #  *  *  #  * | *++#  *  *  #  *   *  #     |
    1x +-++-+*+++*-+#++****++#++*+-+*++#+-*++*++#-+*+++*-+#++*++*++#++*+-+*++#+-*++*++#-+*+++*-+#++*++*++#++*+-+*++#+-++-+
       |     *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
       |     *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
 0.95x +-+---*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###---+-+
       ASSIGNMENT  BITFIELD  FOURIER  FP EMULATION  HUFFMAN  IDEA  LU DECOMPOSITION  NEURAL NET  NUMERIC SORT  STRING SORT  hmean
  png: http://imgur.com/eOLmZNR

NB. 'cross' represents the previous commit.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1493263764-18657-8-git-send-email-cota@braap.org>
[rth: Replace gen_jr global variable with DISAS_EXIT state.]
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/arm/translate.c | 25 ++++++++++++++++---------
 target/arm/translate.h |  4 ++++
 2 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index facb52f..f879da6 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1178,7 +1178,7 @@ static void gen_exception_internal_insn(DisasContext *s, int offset, int excp)
     gen_set_condexec(s);
     gen_set_pc_im(s, s->pc - offset);
     gen_exception_internal(excp);
-    s->is_jmp = DISAS_JUMP;
+    s->is_jmp = DISAS_EXC;
 }
 
 static void gen_exception_insn(DisasContext *s, int offset, int excp,
@@ -1187,14 +1187,14 @@ static void gen_exception_insn(DisasContext *s, int offset, int excp,
     gen_set_condexec(s);
     gen_set_pc_im(s, s->pc - offset);
     gen_exception(excp, syn, target_el);
-    s->is_jmp = DISAS_JUMP;
+    s->is_jmp = DISAS_EXC;
 }
 
 /* Force a TB lookup after an instruction that changes the CPU state.  */
 static inline void gen_lookup_tb(DisasContext *s)
 {
     tcg_gen_movi_i32(cpu_R[15], s->pc & ~1);
-    s->is_jmp = DISAS_JUMP;
+    s->is_jmp = DISAS_EXIT;
 }
 
 static inline void gen_hlt(DisasContext *s, int imm)
@@ -4146,19 +4146,23 @@ static inline bool use_goto_tb(DisasContext *s, target_ulong dest)
 #endif
 }
 
-static inline void gen_goto_tb(DisasContext *s, int n, target_ulong dest)
+static void gen_goto_ptr(void)
+{
+    TCGv addr = tcg_temp_new();
+    tcg_gen_extu_i32_tl(addr, cpu_R[15]);
+    tcg_gen_lookup_and_goto_ptr(addr);
+    tcg_temp_free(addr);
+}
+
+static void gen_goto_tb(DisasContext *s, int n, target_ulong dest)
 {
     if (use_goto_tb(s, dest)) {
         tcg_gen_goto_tb(n);
         gen_set_pc_im(s, dest);
         tcg_gen_exit_tb((uintptr_t)s->tb + n);
     } else {
-        TCGv addr = tcg_temp_new();
-
         gen_set_pc_im(s, dest);
-        tcg_gen_extu_i32_tl(addr, cpu_R[15]);
-        tcg_gen_lookup_and_goto_ptr(addr);
-        tcg_temp_free(addr);
+        gen_goto_ptr();
     }
 }
 
@@ -12091,11 +12095,14 @@ void gen_intermediate_code(CPUARMState *env, TranslationBlock *tb)
             gen_set_pc_im(dc, dc->pc);
             /* fall through */
         case DISAS_JUMP:
+            gen_goto_ptr();
+            break;
         default:
             /* indicate that the hash table must be used to find the next TB */
             tcg_gen_exit_tb(0);
             break;
         case DISAS_TB_JUMP:
+        case DISAS_EXC:
             /* nothing more to generate */
             break;
         case DISAS_WFI:
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 629dab9..93de13f 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -139,6 +139,10 @@ static void disas_set_insn_syndrome(DisasContext *s, uint32_t syn)
  * custom end-of-TB code)
  */
 #define DISAS_BX_EXCRET 11
+/* For instructions which want an immediate exit to the main loop,
+ * as opposed to attempting to use lookup_and_goto_ptr.
+ */
+#define DISAS_EXIT 12
 
 #ifdef TARGET_AARCH64
 void a64_translate_init(void);
-- 
2.9.3

* [Qemu-devel] [PATCH v5 10/19] target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (8 preceding siblings ...)
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 09/19] target/arm: optimize indirect branches Richard Henderson
@ 2017-04-27 11:59 ` Richard Henderson
  2017-04-28 16:50   ` Alex Bennée
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 11/19] target/i386: optimize cross-page direct jumps in softmmu Richard Henderson
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 11:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

From: "Emilio G. Cota" <cota@braap.org>

This helper will be used by subsequent changes.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1493263764-18657-9-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/i386/translate.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 1d1372f..f0e48dc 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -141,6 +141,7 @@ typedef struct DisasContext {
 } DisasContext;
 
 static void gen_eob(DisasContext *s);
+static void gen_jr(DisasContext *s, TCGv dest);
 static void gen_jmp(DisasContext *s, target_ulong eip);
 static void gen_jmp_tb(DisasContext *s, target_ulong eip, int tb_num);
 static void gen_op(DisasContext *s1, int op, TCGMemOp ot, int d);
@@ -2509,7 +2510,8 @@ static void gen_bnd_jmp(DisasContext *s)
    If INHIBIT, set HF_INHIBIT_IRQ_MASK if it isn't already set.
    If RECHECK_TF, emit a rechecking helper for #DB, ignoring the state of
    S->TF.  This is used by the syscall/sysret insns.  */
-static void gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf)
+static void
+do_gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf, TCGv jr)
 {
     gen_update_cc_op(s);
 
@@ -2530,12 +2532,27 @@ static void gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf)
         tcg_gen_exit_tb(0);
     } else if (s->tf) {
         gen_helper_single_step(cpu_env);
+    } else if (!TCGV_IS_UNUSED(jr)) {
+        TCGv vaddr = tcg_temp_new();
+
+        tcg_gen_add_tl(vaddr, jr, cpu_seg_base[R_CS]);
+        tcg_gen_lookup_and_goto_ptr(vaddr);
+        tcg_temp_free(vaddr);
     } else {
         tcg_gen_exit_tb(0);
     }
     s->is_jmp = DISAS_TB_JUMP;
 }
 
+static inline void
+gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf)
+{
+    TCGv unused;
+
+    TCGV_UNUSED(unused);
+    do_gen_eob_worker(s, inhibit, recheck_tf, unused);
+}
+
 /* End of block.
    If INHIBIT, set HF_INHIBIT_IRQ_MASK if it isn't already set.  */
 static void gen_eob_inhibit_irq(DisasContext *s, bool inhibit)
@@ -2549,6 +2566,12 @@ static void gen_eob(DisasContext *s)
     gen_eob_worker(s, false, false);
 }
 
+/* Jump to register */
+static void gen_jr(DisasContext *s, TCGv dest)
+{
+    do_gen_eob_worker(s, false, false, dest);
+}
+
 /* generate a jump to eip. No segment change must happen before as a
    direct call to the next block may occur */
 static void gen_jmp_tb(DisasContext *s, target_ulong eip, int tb_num)
-- 
2.9.3

* [Qemu-devel] [PATCH v5 11/19] target/i386: optimize cross-page direct jumps in softmmu
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (9 preceding siblings ...)
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 10/19] target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr Richard Henderson
@ 2017-04-27 11:59 ` Richard Henderson
  2017-04-28 16:56   ` Alex Bennée
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 12/19] target/i386: optimize indirect branches Richard Henderson
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 11:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

From: "Emilio G. Cota" <cota@braap.org>

Instead of unconditionally exiting to the exec loop, use the
gen_jr helper to jump to the target if it is valid.

Perf impact: see next commit's log.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1493263764-18657-10-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/i386/translate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index f0e48dc..ea113fe 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -2154,9 +2154,9 @@ static inline void gen_goto_tb(DisasContext *s, int tb_num, target_ulong eip)
         gen_jmp_im(eip);
         tcg_gen_exit_tb((uintptr_t)s->tb + tb_num);
     } else {
-        /* jump to another page: currently not optimized */
+        /* jump to another page */
         gen_jmp_im(eip);
-        gen_eob(s);
+        gen_jr(s, cpu_tmp0);
     }
 }
 
-- 
2.9.3

* [Qemu-devel] [PATCH v5 12/19] target/i386: optimize indirect branches
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (10 preceding siblings ...)
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 11/19] target/i386: optimize cross-page direct jumps in softmmu Richard Henderson
@ 2017-04-27 11:59 ` Richard Henderson
  2017-04-28 16:58   ` Alex Bennée
  2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 13/19] tb-hash: improve tb_jmp_cache hash function in user mode Richard Henderson
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 11:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

From: "Emilio G. Cota" <cota@braap.org>

Speed up indirect branches by jumping to the target if it is valid.

Softmmu measurements (see later commit for user-mode numbers):

Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.

-                  SPECint06 (test set), x86_64-softmmu (Ubuntu 16.04 guest). Host: Intel i7-4790K @ 4.00GHz

 2.4x +-+--------------------------------------------------------------------------------------------------------------+-+
      |                                                                                                                  |
      |   cross                                                                                                          |
 2.2x +cross+jr..........................................................................+++...........................+-+
      |                                                                                   |                              |
      |                                                                               +++ |                              |
   2x +-+..............................................................................|..|............................+-+
      |                                                                                |  |                              |
      |                                                                                |  |                              |
 1.8x +-+..............................................................................|####...........................+-+
      |                                                                                |# |#                             |
      |                                                                              **** |#                             |
 1.6x +-+............................................................................*.|*.|#...........................+-+
      |                                                                              * |* |#                             |
      |                                                                              * |* |#                             |
 1.4x +-+.......................................................................+++..*.|*.|#...........................+-+
      |                                                      ++++++             #### * |*++#             +++             |
      |                        +++                            |  |              #++# *++*  #          +++ |              |
 1.2x +-+......................###.....####....+++............|..|...........****..#.*..*..#....####...|.###.....####..+-+
      |        +++          **** #  ****  #    ####          ***###          *++*  # *  *  #    #++#  ****|#  +++#++#    |
      |    ****###     +++  *++* #  *++*  #  ++#  #    ####  *|* |#     +++  *  *  # *  *  #  ***  #  *| *|#  ****  #    |
   1x +-++-*++*++#++***###++*++*+#++*+-*++#+****++#++***++#+-*+*++#-+****##++*++*-+#+*++*-+#++*+*++#++*-+*+#++*++*++#-++-+
      |    *  *  #  * *  #  *  * #  *  *  # *  *  #  * *  #  *|* |#  *++* #  *  *  # *  *  #  * *  #  *  * #  *  *  #    |
      |    *  *  #  * *  #  *  * #  *  *  # *  *  #  * *  #  *+*++#  *  * #  *  *  # *  *  #  * *  #  *  * #  *  *  #    |
 0.8x +-+--****###--***###--****##--****###-****###--***###--***###--****##--****###-****###--***###--****##--****###--+-+
         astar  bzip2  gcc  gobmk  h264ref  hmmer  libquantum  mcf  omnetpp  perlbench  sjeng  xalancbmk  hmean
  png: http://imgur.com/DU36YFU

NB. 'cross' represents the previous commit.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1493263764-18657-11-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/i386/translate.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index ea113fe..674ec96 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4996,7 +4996,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             gen_push_v(s, cpu_T1);
             gen_op_jmp_v(cpu_T0);
             gen_bnd_jmp(s);
-            gen_eob(s);
+            gen_jr(s, cpu_T0);
             break;
         case 3: /* lcall Ev */
             gen_op_ld_v(s, ot, cpu_T1, cpu_A0);
@@ -5014,7 +5014,8 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                                       tcg_const_i32(dflag - 1),
                                       tcg_const_i32(s->pc - s->cs_base));
             }
-            gen_eob(s);
+            tcg_gen_ld_tl(cpu_tmp4, cpu_env, offsetof(CPUX86State, eip));
+            gen_jr(s, cpu_tmp4);
             break;
         case 4: /* jmp Ev */
             if (dflag == MO_16) {
@@ -5022,7 +5023,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             }
             gen_op_jmp_v(cpu_T0);
             gen_bnd_jmp(s);
-            gen_eob(s);
+            gen_jr(s, cpu_T0);
             break;
         case 5: /* ljmp Ev */
             gen_op_ld_v(s, ot, cpu_T1, cpu_A0);
@@ -5037,7 +5038,8 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 gen_op_movl_seg_T0_vm(R_CS);
                 gen_op_jmp_v(cpu_T1);
             }
-            gen_eob(s);
+            tcg_gen_ld_tl(cpu_tmp4, cpu_env, offsetof(CPUX86State, eip));
+            gen_jr(s, cpu_tmp4);
             break;
         case 6: /* push Ev */
             gen_push_v(s, cpu_T0);
@@ -6417,7 +6419,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         /* Note that gen_pop_T0 uses a zero-extending load.  */
         gen_op_jmp_v(cpu_T0);
         gen_bnd_jmp(s);
-        gen_eob(s);
+        gen_jr(s, cpu_T0);
         break;
     case 0xc3: /* ret */
         ot = gen_pop_T0(s);
@@ -6425,7 +6427,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         /* Note that gen_pop_T0 uses a zero-extending load.  */
         gen_op_jmp_v(cpu_T0);
         gen_bnd_jmp(s);
-        gen_eob(s);
+        gen_jr(s, cpu_T0);
         break;
     case 0xca: /* lret im */
         val = cpu_ldsw_code(env, s->pc);
-- 
2.9.3

* [Qemu-devel] [PATCH v5 13/19] tb-hash: improve tb_jmp_cache hash function in user mode
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (11 preceding siblings ...)
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 12/19] target/i386: optimize indirect branches Richard Henderson
@ 2017-04-27 12:00 ` Richard Henderson
  2017-04-28 17:00   ` Alex Bennée
  2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 14/19] target/alpha: Use tcg_gen_goto_ptr Richard Henderson
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 12:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

From: "Emilio G. Cota" <cota@braap.org>

Optimizations to cross-page chaining and indirect branches make
performance more sensitive to the hit rate of tb_jmp_cache.
The constraint of reserving some bits for the page number
lowers the achievable quality of the hashing function.

However, user-mode has no TLB and therefore no such constraint.
This change gives user-mode its own hashing function, which is
both faster and of better quality than the previous one.

Measurements:

Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.

-                           SPECint06 (test set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz

 2.2x +-+--------------------------------------------------------------------------------------------------------------+-+
      |                                                                                                                  |
      |         jr                                                                                                       |
   2x +jr+multhash        +....................................................+++++...................................+-+
      |    jr+hash                                                              |$$$                                     |
      |                                                                         |$+$                                     |
      |                                                                        ### $                                     |
 1.8x +-+......................................................................#|#.$...................................+-+
      |                                                                      ++#+# $                                     |
      |                                                                       |# # $                                     |
 1.6x +-+....................................................................***.#.$....................++$$$..........+-+
      |                                         $$$                          *+* # $                     |$+$            |
      |                       ++$$$           ### $                          * * # $                  +++|$ $            |
      |                     ++###+$           # # $                          * * # $           ###   ****## $            |
 1.4x +-+...................***+#.$.........***.#.$..........................*.*.#.$...........#+#$$.*++*|#.$..........+-+
      |                     *+* # $         * * # $                          * * # $           # # $ *  *+# $            |
      |                     * * # $   +++++ * * # $                          * * # $         *** # $ *  * # $   ###$$    |
 1.2x +-+...................*.*.#.$.***##$$.*.*.#.$..........................*.*.#.$.........*.*.#.$.*..*.#.$.***+#+$..+-+
      |                     * * # $ *+* # $ * * # $   +++                    * * # $ ++###$$ * * # $ *  * # $ * * # $    |
      |    ***##$$          * * # $ * * # $ * * # $ ***##$$          ++###   * * # $ *** #+$ * * # $ *  * # $ * * # $    |
      |    *+*+#+$ ***##$$$ * * # $ * * # $ * * # $ *+* # $ ++####$$ ***+#   * * # $ * * # $ * * # $ *  * # $ * * # $    |
   1x +-++-*+*+#+$+*+*+#-+$+*+*-#+$+*+*+#+$+*+*+#+$+*-*+#+$+***++#+$+*+*+#$$+*+*+#+$+*+*+#+$+*+*-#+$+*+-*+#+$+*+*+#+$-++-+
      |    * * # $ * * #  $ * * # $ * * # $ * * # $ * * # $ * *  # $ * * # $ * * # $ * * # $ * * # $ *  * # $ * * # $    |
      |    * * # $ * * #  $ * * # $ * * # $ * * # $ * * # $ * *  # $ * * # $ * * # $ * * # $ * * # $ *  * # $ * * # $    |
 0.8x +-+--***##$$-***##$$$-***##$$-***##$$-***##$$-***##$$-***###$$-***##$$-***##$$-***##$$-***##$$-****##$$-***##$$--+-+
         astar  bzip2  gcc  gobmk  h264ref  hmmer  libquantum  mcf  omnetpp  perlbench  sjeng  xalancbmk  hmean
  png: http://imgur.com/4UXTrEc

Here I also tried the hash function suggested by Paolo ("multhash"):

  return ((uint64_t) (pc * 2654435761) >> 32) & (TB_JMP_CACHE_SIZE - 1);

As you can see it is just as good as the other new function ("hash"),
which is what I ended up going with.

-                          SPECint06 (train set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz

 2.6x +-+--------------------------------------------------------------------------------------------------------------+-+
      |                                                                                                                  |
      |     jr                                                                                           ###             |
 2.4x +jr+hash...........................................................................................#.#...........+-+
      |                                                                                                  # #             |
      |                                                                                                  # #             |
 2.2x +-+................................................................................................#.#...........+-+
      |                                                                                                  # #             |
      |                                                                                                  # #             |
   2x +-+................................................................................................#.#...........+-+
      |                                                                                               **** #             |
      |                                                                                               *  * #             |
 1.8x +-+.............................................................................................*..*.#...........+-+
      |                                                                         +++                   *  * #             |
      |                                                                         ####    ####          *  * #             |
 1.6x +-+......................................####.............................#..#.****..#..........*..*.#...........+-+
      |                        +++             #++#                          ****  # *  *  #    ####  *  * #             |
      |                        ###             #  #                          *  *  # *  *  #    #  #  *  * #             |
 1.4x +-+...................****+#..........****..#..........................*..*..#.*..*..#....#..#..*..*.#...........+-+
      |                     *++* #          *  *  #                          *  *  # *  *  #  ***  #  *  * #     ####    |
      |                     *  * #     #### *  *  #                          *  *  # *  *  #  * *  #  *  * #  ****  #    |
 1.2x +-+...................*..*.#..****++#.*..*..#..........................*..*..#.*..*..#..*.*..#..*..*.#..*..*..#..+-+
      |    ****###          *  * #  *  *  # *  *  #                          *  *  # *  *  #  * *  #  *  * #  *  *  #    |
      |    *  *  #  ***###  *  * #  *  *  # *  *  #                  ****##  *  *  # *  *  #  * *  #  *  * #  *  *  #    |
   1x +-+--****###--***###--****##--****###-****###--***###--***###--****##--****###-****###--***###--****##--****###--+-+
         astar  bzip2  gcc  gobmk  h264ref  hmmer  libquantum  mcf  omnetpp  perlbench  sjeng  xalancbmk  hmean
  png: http://imgur.com/ArCbHqo

-                                    NBench, x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz

 1.12x +-+-------------------------------------------------------------------------------------------------------------+-+
       |                                                                                                                 |
       |     jr                                                           +++                                            |
  1.1x +jr+hash...........................................................####.........................................+-+
       |                                                               +++#| #                                           |
       |                                                                | #++#                                           |
 1.08x +-+................................+++................+++.+++..*****..#.........................................+-+
       |                                   |  +++             |   |   * | *  #                                           |
       |                                   |   |              |   |   *+++*  #                                           |
 1.06x +-+................................****###.............|...|...*...*..#.........................+++.............+-+
       |                                  *| * |#            ****###  *   *  #                          |                |
       |                                  *| *++#            *| * |#  *   *  #                        ####               |
 1.04x +-+................................*++*..#............*|.*.|#..*...*..#........................#.|#.............+-+
       |                                  *  *  #            *++*++#  *   *  #                     +++#++#               |
       |                                  *  *  #            *  *  #  *   *  #                      | #  #   +++####     |
 1.02x +-+................................*..*..#......+++...*..*..#..*...*..#.....................****..#..*****++#...+-+
       |         +++                      *  *  #   +++ |    *  *  #  *   *  #  +++                *| *  #  *+++*  #     |
       |      +++ |    +++ +++   ++++++   *  *  #  *****###  *  *  #  *   *  #   |  +++   ++++++   *++*  #  *   *  #     |
    1x +-++-+++++####++****###++++-+####+-*++*++#-+*+++*-+#++*++*++#++*+-+*++#+-+++####-+*****###++*++*++#++*+-+*++#+-++-+
       |     *****| #  *++* |#  *****| #  *  *  #  *   *++#  *  *  #  *   *  #  **** |#  *   *  #  *  *  #  *   *  #     |
       |     * | *| #  *  *++#  * | *++#  *  *  #  *   *  #  *  *  #  *   *  #  *| *++#  *   *  #  *  *  #  *   *  #     |
 0.98x +-+...*.|.*++#..*..*..#..*+++*..#..*..*..#..*...*..#..*..*..#..*...*..#..*++*..#..*...*..#..*..*..#..*...*..#...+-+
       |     *+++*  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
       |     *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
 0.96x +-+---*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###---+-+
       ASSIGNMENT  BITFIELD  FOURIER  FP EMULATION  HUFFMAN  IDEA  LU DECOMPOSITION  NEURAL NET  NUMERIC SORT  STRING SORT  hmean
  png: http://imgur.com/ZXFX0hJ

-                                   NBench, arm-linux-user. Host: Intel i7-4790K @ 4.00GHz

  1.3x +-+-------------------------------------------------------------------------------------------------------------+-+
       |                            ####                                                                                 |
       |     jr                     #  #                                            +++                                  |
 1.25x +jr+hash.....................#..#...........................................####................................+-+
       |                            #  #                                           #  #                                  |
       |                            #  #                                           #  #                                  |
  1.2x +-+..........................#..#...........................................#..#................................+-+
       |                            #  #                                           #  #                                  |
       |                            #  #                                           #  #                                  |
 1.15x +-+..........................#..#...........................................#..#................................+-+
       |                            #  #                                  ####     #  #                                  |
       |                            #  #                                  #  #     #  #                                  |
  1.1x +-+..........................#..#..................................#..#.....#..#................................+-+
       |                            #  #                                  #  #     #  #                         +++      |
       |                            #  #               ####               #  #     #  #                         ####     |
 1.05x +-+..........................#..#...............#..#.....####......#..#.....#..#.........................#..#...+-+
       |                            #  #               #  #     #  #      #  #     #  #                +++      #  #     |
       |                   +++  *****  #     ####  *****  #     #  #   +++#  #  ****  #            ****###      #  #     |
    1x +-++-+*****###++****+++++*+-+*++#+-****++#-+*+++*-+#+++++#++#++*****++#+-*++*++#-+*****-++++*++*++#++*****++#+-++-+
       |     *   *  #  *  * |   *   *  #  *  *  #  *   *  #  ****  #  *   *  #  *  *  #  *   *###  *  *++#  *   *  #     |
       |     *   *  #  *  *###  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
 0.95x +-+...*...*..#..*..*.|#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#...+-+
       |     *   *  #  *  * |#  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
       |     *   *  #  *  * |#  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
  0.9x +-+---*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###---+-+
       ASSIGNMENT  BITFIELD  FOURIER  FP EMULATION  HUFFMAN  IDEA  LU DECOMPOSITION  NEURAL NET  NUMERIC SORT  STRING SORT  hmean
  png: http://imgur.com/FfD27ey

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1493263764-18657-12-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/exec/tb-hash.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 2c27490..b1fe2d0 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -22,6 +22,8 @@
 
 #include "exec/tb-hash-xx.h"
 
+#ifdef CONFIG_SOFTMMU
+
 /* Only the bottom TB_JMP_PAGE_BITS of the jump cache hash bits vary for
    addresses on the same page.  The top bits are the same.  This allows
    TLB invalidation to quickly clear a subset of the hash table.  */
@@ -45,6 +47,16 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
            | (tmp & TB_JMP_ADDR_MASK));
 }
 
+#else
+
+/* In user-mode we can get better hashing because we do not have a TLB */
+static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
+{
+    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
+}
+
+#endif /* CONFIG_SOFTMMU */
+
 static inline
 uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags)
 {
-- 
2.9.3
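
For readers who want to try the user-mode hash outside the tree, here is a minimal standalone sketch. It assumes TB_JMP_CACHE_BITS is 12 and substitutes uint64_t for target_ulong; both are assumptions made for illustration, and the function name is hypothetical. It only shows how the page-number bits get folded into the index, so that equal offsets on different pages no longer collide.

```c
#include <stdint.h>

/* Assumed values for illustration; the real definitions live in QEMU's headers. */
#define TB_JMP_CACHE_BITS 12
#define TB_JMP_CACHE_SIZE (1u << TB_JMP_CACHE_BITS)

/* Same expression as the user-mode tb_jmp_cache_hash_func() in the patch,
   with target_ulong replaced by uint64_t (an assumption of this sketch). */
static inline unsigned int jmp_cache_hash_user(uint64_t pc)
{
    /* XOR the bits above the index into the index, then mask to table size. */
    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
}
```

With these assumed constants, 0x1000 and 0x2000 share the same low 12 bits but land in slots 1 and 2 respectively.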

* [Qemu-devel] [PATCH v5 14/19] target/alpha: Use tcg_gen_goto_ptr
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (12 preceding siblings ...)
  2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 13/19] tb-hash: improve tb_jmp_cache hash function in user mode Richard Henderson
@ 2017-04-27 12:00 ` Richard Henderson
  2017-04-28 17:10   ` Alex Bennée
  2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 15/19] tcg/i386: implement goto_ptr Richard Henderson
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 12:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/alpha/translate.c | 49 +++++++++++++++++++++++++++++++++++-------------
 1 file changed, 36 insertions(+), 13 deletions(-)

diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index df5d695..c1a5fbf 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -89,6 +89,10 @@ typedef enum {
        updated the PC for the next instruction to be executed.  */
     EXIT_PC_STALE,
 
+    /* Similarly, but force TB exit without chaining.  */
+    EXIT_PC_UPDATED_FORCE,
+    EXIT_PC_STALE_FORCE,
+
     /* We are ending the TB with a noreturn function call, e.g. longjmp.
        No following code will be executed.  */
     EXIT_NORETURN,
@@ -455,11 +459,16 @@ static bool in_superpage(DisasContext *ctx, int64_t addr)
 #endif
 }
 
+static bool use_exit_tb(DisasContext *ctx)
+{
+    /* Suppress any optimization in the case of single-stepping and IO.  */
+    return ((ctx->tb->cflags & CF_LAST_IO)
+            || ctx->singlestep_enabled || singlestep);
+}
+
 static bool use_goto_tb(DisasContext *ctx, uint64_t dest)
 {
-    /* Suppress goto_tb in the case of single-steping and IO.  */
-    if ((ctx->tb->cflags & CF_LAST_IO)
-        || ctx->singlestep_enabled || singlestep) {
+    if (use_exit_tb(ctx)) {
         return false;
     }
 #ifndef CONFIG_USER_ONLY
@@ -1257,14 +1266,14 @@ static ExitStatus gen_call_pal(DisasContext *ctx, int palcode)
            need the page permissions check.  We'll see the existence of
            the page when we create the TB, and we'll flush all TBs if
            we change the PAL base register.  */
-        if (!ctx->singlestep_enabled && !(ctx->tb->cflags & CF_LAST_IO)) {
+        if (use_exit_tb(ctx)) {
+            tcg_gen_movi_i64(cpu_pc, entry);
+            return EXIT_PC_UPDATED;
+        } else {
             tcg_gen_goto_tb(0);
             tcg_gen_movi_i64(cpu_pc, entry);
             tcg_gen_exit_tb((uintptr_t)ctx->tb);
             return EXIT_GOTO_TB;
-        } else {
-            tcg_gen_movi_i64(cpu_pc, entry);
-            return EXIT_PC_UPDATED;
         }
     }
 #endif
@@ -1323,7 +1332,7 @@ static ExitStatus gen_mfpr(DisasContext *ctx, TCGv va, int regno)
             gen_io_start();
             helper(va);
             gen_io_end();
-            return EXIT_PC_STALE;
+            return EXIT_PC_STALE_FORCE;
         } else {
             helper(va);
         }
@@ -1374,7 +1383,7 @@ static ExitStatus gen_mtpr(DisasContext *ctx, TCGv vb, int regno)
     case 252:
         /* HALT */
         gen_helper_halt(vb);
-        return EXIT_PC_STALE;
+        return EXIT_PC_STALE_FORCE;
 
     case 251:
         /* ALARM */
@@ -1388,7 +1397,7 @@ static ExitStatus gen_mtpr(DisasContext *ctx, TCGv vb, int regno)
            that ended with a CALL_PAL.  Since the base register usually only
            changes during boot, flushing everything works well.  */
         gen_helper_tb_flush(cpu_env);
-        return EXIT_PC_STALE;
+        return EXIT_PC_STALE_FORCE;
 
     case 32 ... 39:
         /* Accessing the "non-shadow" general registers.  */
@@ -2373,7 +2382,7 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
                 gen_io_start();
                 gen_helper_load_pcc(va, cpu_env);
                 gen_io_end();
-                ret = EXIT_PC_STALE;
+                ret = EXIT_PC_STALE_FORCE;
             } else {
                 gen_helper_load_pcc(va, cpu_env);
             }
@@ -2990,18 +2999,32 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
     case EXIT_GOTO_TB:
     case EXIT_NORETURN:
         break;
+
     case EXIT_PC_STALE:
         tcg_gen_movi_i64(cpu_pc, ctx.pc);
-        /* FALLTHRU */
+        goto do_exit_pc_updated;
+    case EXIT_PC_STALE_FORCE:
+        tcg_gen_movi_i64(cpu_pc, ctx.pc);
+        goto do_exit_pc_updated_force;
+
     case EXIT_PC_UPDATED:
+    do_exit_pc_updated:
+        if (!use_exit_tb(&ctx)) {
+            tcg_gen_lookup_and_goto_ptr(cpu_pc);
+            break;
+        }
+        /* FALLTHRU */
+    case EXIT_PC_UPDATED_FORCE:
+    do_exit_pc_updated_force:
         if (ctx.singlestep_enabled) {
             gen_excp_1(EXCP_DEBUG, 0);
         } else {
             tcg_gen_exit_tb(0);
         }
         break;
+
     default:
-        abort();
+        g_assert_not_reached();
     }
 
     gen_tb_end(tb, num_insns);
-- 
2.9.3
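
The exit handling at the end of gen_intermediate_code() can be read as a small decision table. The sketch below models it outside of TCG; the type and function names are hypothetical, and the two booleans stand in for use_exit_tb(&ctx) and ctx.singlestep_enabled. Note that in the patch single-stepping already makes use_exit_tb() true; the model simply takes both as inputs.

```c
#include <stdbool.h>

/* Hypothetical mirror of the ExitStatus values handled at the end of the TB. */
typedef enum {
    EXIT_PC_STALE,
    EXIT_PC_STALE_FORCE,
    EXIT_PC_UPDATED,
    EXIT_PC_UPDATED_FORCE,
} ExitKind;

/* What the translator would emit to leave the TB. */
typedef enum {
    ACT_LOOKUP_AND_GOTO_PTR,  /* tcg_gen_lookup_and_goto_ptr(cpu_pc) */
    ACT_EXIT_TB,              /* tcg_gen_exit_tb(0) */
    ACT_DEBUG_EXCP,           /* gen_excp_1(EXCP_DEBUG, 0) */
} ExitAction;

typedef struct {
    bool store_pc;            /* emit the movi to cpu_pc first */
    ExitAction action;
} ExitPlan;

static ExitPlan plan_exit(ExitKind kind, bool use_exit_tb,
                          bool singlestep_enabled)
{
    ExitPlan p;
    bool force = (kind == EXIT_PC_STALE_FORCE ||
                  kind == EXIT_PC_UPDATED_FORCE);

    /* The STALE variants still have to write the next PC. */
    p.store_pc = (kind == EXIT_PC_STALE || kind == EXIT_PC_STALE_FORCE);

    if (!force && !use_exit_tb) {
        p.action = ACT_LOOKUP_AND_GOTO_PTR;   /* chain via the TB lookup */
    } else if (singlestep_enabled) {
        p.action = ACT_DEBUG_EXCP;
    } else {
        p.action = ACT_EXIT_TB;               /* return to the main loop */
    }
    return p;
}
```

The _FORCE variants never reach the lookup path, which matches their use after helpers such as HALT and tb_flush.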

* [Qemu-devel] [PATCH v5 15/19] tcg/i386: implement goto_ptr
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (13 preceding siblings ...)
  2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 14/19] target/alpha: Use tcg_gen_goto_ptr Richard Henderson
@ 2017-04-27 12:00 ` Richard Henderson
  2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 16/19] tcg/ppc: Implement goto_ptr Richard Henderson
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 12:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

From: "Emilio G. Cota" <cota@braap.org>

Suggested-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1493263764-18657-6-git-send-email-cota@braap.org>
[rth: Reuse goto_ptr epilogue for exit_tb 0.]
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.h     |  2 +-
 tcg/i386/tcg-target.inc.c | 24 ++++++++++++++++++++++--
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 59d9835..73a15f7 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -107,7 +107,7 @@ extern bool have_popcnt;
 #define TCG_TARGET_HAS_muls2_i32        1
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
-#define TCG_TARGET_HAS_goto_ptr         0
+#define TCG_TARGET_HAS_goto_ptr         1
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_extrl_i64_i32    0
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 5918008..01e3b4e 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1882,8 +1882,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     switch (opc) {
     case INDEX_op_exit_tb:
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_EAX, a0);
-        tcg_out_jmp(s, tb_ret_addr);
+        /* Reuse the zeroing that exists for goto_ptr.  */
+        if (a0 == 0) {
+            tcg_out_jmp(s, s->code_gen_epilogue);
+        } else {
+            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_EAX, a0);
+            tcg_out_jmp(s, tb_ret_addr);
+        }
         break;
     case INDEX_op_goto_tb:
         if (s->tb_jmp_insn_offset) {
@@ -1906,6 +1911,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         s->tb_jmp_reset_offset[a0] = tcg_current_code_size(s);
         break;
+    case INDEX_op_goto_ptr:
+        /* jmp to the given host address (could be epilogue) */
+        tcg_out_modrm(s, OPC_GRP5, EXT5_JMPN_Ev, a0);
+        break;
     case INDEX_op_br:
         tcg_out_jxx(s, JCC_JMP, arg_label(a0), 0);
         break;
@@ -2277,6 +2286,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
 static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 {
+    static const TCGTargetOpDef r = { .args_ct_str = { "r" } };
     static const TCGTargetOpDef ri_r = { .args_ct_str = { "ri", "r" } };
     static const TCGTargetOpDef re_r = { .args_ct_str = { "re", "r" } };
     static const TCGTargetOpDef qi_r = { .args_ct_str = { "qi", "r" } };
@@ -2299,6 +2309,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         = { .args_ct_str = { "L", "L", "L", "L" } };
 
     switch (op) {
+    case INDEX_op_goto_ptr:
+        return &r;
+
     case INDEX_op_ld8u_i32:
     case INDEX_op_ld8u_i64:
     case INDEX_op_ld8s_i32:
@@ -2567,6 +2580,13 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     tcg_out_modrm(s, OPC_GRP5, EXT5_JMPN_Ev, tcg_target_call_iarg_regs[1]);
 #endif
 
+    /*
+     * Return path for goto_ptr. Set return value to 0, a-la exit_tb,
+     * and fall through to the rest of the epilogue.
+     */
+    s->code_gen_epilogue = s->code_ptr;
+    tcg_out_movi(s, TCG_TYPE_REG, TCG_REG_EAX, 0);
+
     /* TB epilogue */
     tb_ret_addr = s->code_ptr;
 
-- 
2.9.3
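
The "[rth: Reuse goto_ptr epilogue for exit_tb 0.]" change relies on the prologue now having two entry points into the epilogue: code_gen_epilogue, which first zeroes the return register, and tb_ret_addr just after it. The toy model below is only a sketch of that choice. The struct and function names are hypothetical, and the offsets are plain numbers rather than host code addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy layout: offsets into a hypothetical code buffer, not real host code. */
typedef struct {
    size_t code_gen_epilogue;  /* entry that first zeroes the return register */
    size_t tb_ret_addr;        /* common epilogue, return register already set */
} EpilogueLayout;

typedef struct {
    int emits_movi;            /* 1 if the TB must load the return value itself */
    size_t jump_to;            /* which epilogue entry the TB jumps to */
} ExitTbPlan;

/* Mirrors the a0 == 0 test added to INDEX_op_exit_tb in the patch. */
static ExitTbPlan plan_exit_tb(const EpilogueLayout *l, uintptr_t a0)
{
    ExitTbPlan p;

    if (a0 == 0) {
        /* The shared entry does the zeroing; the TB only needs a jump. */
        p.emits_movi = 0;
        p.jump_to = l->code_gen_epilogue;
    } else {
        /* A non-zero return value must be loaded inside the TB. */
        p.emits_movi = 1;
        p.jump_to = l->tb_ret_addr;
    }
    return p;
}
```

The saving is one move instruction per exit_tb 0 site, at the cost of that move living once in the prologue.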

* [Qemu-devel] [PATCH v5 16/19] tcg/ppc: Implement goto_ptr
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (14 preceding siblings ...)
  2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 15/19] tcg/i386: implement goto_ptr Richard Henderson
@ 2017-04-27 12:00 ` Richard Henderson
  2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 17/19] tcg/aarch64: " Richard Henderson
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 12:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc/tcg-target.h     | 2 +-
 tcg/ppc/tcg-target.inc.c | 7 +++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index a9aa974..5f4a40a 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -82,7 +82,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        1
 #define TCG_TARGET_HAS_mulsh_i32        1
-#define TCG_TARGET_HAS_goto_ptr         0
+#define TCG_TARGET_HAS_goto_ptr         1
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_add2_i32         0
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 64f67d2..8d50f18 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -1932,6 +1932,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
     /* Epilogue */
     tcg_debug_assert(tb_ret_addr == s->code_ptr);
+    s->code_gen_epilogue = tb_ret_addr;
 
     tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R0, TCG_REG_R1, FRAME_SIZE+LR_OFFSET);
     for (i = 0; i < ARRAY_SIZE(tcg_target_callee_save_regs); ++i) {
@@ -1986,6 +1987,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 #endif
         s->tb_jmp_reset_offset[args[0]] = tcg_current_code_size(s);
         break;
+    case INDEX_op_goto_ptr:
+        tcg_out32(s, MTSPR | RS(args[0]) | CTR);
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R3, 0);
+        tcg_out32(s, BCCTR | BO_ALWAYS);
+        break;
     case INDEX_op_br:
         {
             TCGLabel *l = arg_label(args[0]);
@@ -2555,6 +2561,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_exit_tb, { } },
     { INDEX_op_goto_tb, { } },
     { INDEX_op_br, { } },
+    { INDEX_op_goto_ptr, { "r" } },
 
     { INDEX_op_ld8u_i32, { "r", "r" } },
     { INDEX_op_ld8s_i32, { "r", "r" } },
-- 
2.9.3

* [Qemu-devel] [PATCH v5 17/19] tcg/aarch64: Implement goto_ptr
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (15 preceding siblings ...)
  2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 16/19] tcg/ppc: Implement goto_ptr Richard Henderson
@ 2017-04-27 12:00 ` Richard Henderson
  2017-04-27 22:18   ` Emilio G. Cota
  2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 18/19] tcg/sparc: " Richard Henderson
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 12:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.h     |  2 +-
 tcg/aarch64/tcg-target.inc.c | 22 ++++++++++++++++++++--
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index b82eac4..55a46ac 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -77,7 +77,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mulsh_i32        0
 #define TCG_TARGET_HAS_extrl_i64_i32    0
 #define TCG_TARGET_HAS_extrh_i64_i32    0
-#define TCG_TARGET_HAS_goto_ptr         0
+#define TCG_TARGET_HAS_goto_ptr         1
 
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          1
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 290de6d..5f18545 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -1357,8 +1357,13 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     switch (opc) {
     case INDEX_op_exit_tb:
-        tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_X0, a0);
-        tcg_out_goto(s, tb_ret_addr);
+        /* Reuse the zeroing that exists for goto_ptr.  */
+        if (a0 == 0) {
+            tcg_out_goto(s, s->code_gen_epilogue);
+        } else {
+            tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_X0, a0);
+            tcg_out_goto(s, tb_ret_addr);
+        }
         break;
 
     case INDEX_op_goto_tb:
@@ -1374,6 +1379,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         s->tb_jmp_reset_offset[a0] = tcg_current_code_size(s);
         break;
 
+    case INDEX_op_goto_ptr:
+        tcg_out_insn(s, 3207, BR, a0);
+        break;
+
     case INDEX_op_br:
         tcg_out_goto_label(s, arg_label(a0));
         break;
@@ -1735,6 +1744,7 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_exit_tb, { } },
     { INDEX_op_goto_tb, { } },
     { INDEX_op_br, { } },
+    { INDEX_op_goto_ptr, { "r" } },
 
     { INDEX_op_ld8u_i32, { "r", "r" } },
     { INDEX_op_ld8s_i32, { "r", "r" } },
@@ -1942,6 +1952,14 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
     tcg_out_insn(s, 3207, BR, tcg_target_call_iarg_regs[1]);
 
+    /*
+     * Return path for goto_ptr. Set return value to 0, a-la exit_tb,
+     * and fall through to the rest of the epilogue.
+     */
+    s->code_gen_epilogue = s->code_ptr;
+    tcg_out_movi(s, TCG_TYPE_REG, TCG_REG_X0, 0);
+
+    /* TB epilogue */
     tb_ret_addr = s->code_ptr;
 
     /* Remove TCG locals stack space.  */
-- 
2.9.3

* [Qemu-devel] [PATCH v5 18/19] tcg/sparc: Implement goto_ptr
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (16 preceding siblings ...)
  2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 17/19] tcg/aarch64: " Richard Henderson
@ 2017-04-27 12:00 ` Richard Henderson
  2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 19/19] tcg/s390: " Richard Henderson
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 12:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.h     |  2 +-
 tcg/sparc/tcg-target.inc.c | 11 ++++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 9348ddd..854a0af 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -123,7 +123,7 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_muls2_i32        1
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
-#define TCG_TARGET_HAS_goto_ptr         0
+#define TCG_TARGET_HAS_goto_ptr         1
 
 #define TCG_TARGET_HAS_extrl_i64_i32    1
 #define TCG_TARGET_HAS_extrh_i64_i32    1
diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index 3785d77..18afce2 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -1003,7 +1003,11 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     /* delay slot */
     tcg_out_nop(s);
 
-    /* No epilogue required.  We issue ret + restore directly in the TB.  */
+    /* Epilogue for goto_ptr.  */
+    s->code_gen_epilogue = s->code_ptr;
+    tcg_out_arithi(s, TCG_REG_G0, TCG_REG_I7, 8, RETURN);
+    /* delay slot */
+    tcg_out_movi_imm13(s, TCG_REG_O0, 0);
 
 #ifdef CONFIG_SOFTMMU
     build_trampolines(s);
@@ -1288,6 +1292,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_nop(s);
         s->tb_jmp_reset_offset[a0] = tcg_current_code_size(s);
         break;
+    case INDEX_op_goto_ptr:
+        tcg_out_arithi(s, TCG_REG_G0, a0, 0, JMPL);
+        tcg_out_nop(s);
+        break;
     case INDEX_op_br:
         tcg_out_bpcc(s, COND_A, BPCC_PT, arg_label(a0));
         tcg_out_nop(s);
@@ -1513,6 +1521,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_exit_tb, { } },
     { INDEX_op_goto_tb, { } },
     { INDEX_op_br, { } },
+    { INDEX_op_goto_ptr, { "r" } },
 
     { INDEX_op_ld8u_i32, { "r", "r" } },
     { INDEX_op_ld8s_i32, { "r", "r" } },
-- 
2.9.3

* [Qemu-devel] [PATCH v5 19/19] tcg/s390: Implement goto_ptr
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (17 preceding siblings ...)
  2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 18/19] tcg/sparc: " Richard Henderson
@ 2017-04-27 12:00 ` Richard Henderson
  2017-04-27 12:58 ` [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations no-reply
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: Richard Henderson @ 2017-04-27 12:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: cota

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/s390/tcg-target.h     |  2 +-
 tcg/s390/tcg-target.inc.c | 24 +++++++++++++++++++++---
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 6b7bcfb..957f0c0 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -92,7 +92,7 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_mulsh_i32      0
 #define TCG_TARGET_HAS_extrl_i64_i32  0
 #define TCG_TARGET_HAS_extrh_i64_i32  0
-#define TCG_TARGET_HAS_goto_ptr       0
+#define TCG_TARGET_HAS_goto_ptr       1
 
 #define TCG_TARGET_HAS_div2_i64       1
 #define TCG_TARGET_HAS_rot_i64        1
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index a679280..5d7083e 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -1741,9 +1741,14 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     switch (opc) {
     case INDEX_op_exit_tb:
-        /* return value */
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R2, args[0]);
-        tgen_gotoi(s, S390_CC_ALWAYS, tb_ret_addr);
+        /* Reuse the zeroing that exists for goto_ptr.  */
+        a0 = args[0];
+        if (a0 == 0) {
+            tgen_gotoi(s, S390_CC_ALWAYS, s->code_gen_epilogue);
+        } else {
+            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R2, a0);
+            tgen_gotoi(s, S390_CC_ALWAYS, tb_ret_addr);
+        }
         break;
 
     case INDEX_op_goto_tb:
@@ -1767,6 +1772,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         s->tb_jmp_reset_offset[args[0]] = tcg_current_code_size(s);
         break;
 
+    case INDEX_op_goto_ptr:
+        tcg_out_insn(s, RR, BCR, S390_CC_ALWAYS, args[0]);
+        break;
+
     OP_32_64(ld8u):
         /* ??? LLC (RXY format) is only present with the extended-immediate
            facility, whereas LLGC is always present.  */
@@ -2241,6 +2250,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { INDEX_op_exit_tb, { } },
     { INDEX_op_goto_tb, { } },
     { INDEX_op_br, { } },
+    { INDEX_op_goto_ptr, { "r" } },
 
     { INDEX_op_ld8u_i32, { "r", "r" } },
     { INDEX_op_ld8s_i32, { "r", "r" } },
@@ -2439,6 +2449,14 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     /* br %r3 (go to TB) */
     tcg_out_insn(s, RR, BCR, S390_CC_ALWAYS, tcg_target_call_iarg_regs[1]);
 
+    /*
+     * Return path for goto_ptr. Set return value to 0, a-la exit_tb,
+     * and fall through to the rest of the epilogue.
+     */
+    s->code_gen_epilogue = s->code_ptr;
+    tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R2, 0);
+
+    /* TB epilogue */
     tb_ret_addr = s->code_ptr;
 
     /* lmg %r6,%r15,fs+48(%r15) (restore registers) */
-- 
2.9.3

* Re: [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (18 preceding siblings ...)
  2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 19/19] tcg/s390: " Richard Henderson
@ 2017-04-27 12:58 ` no-reply
  2017-04-28 19:17 ` [Qemu-devel] [PATCH v5+] " Emilio G. Cota
  2017-04-30 14:52 ` [Qemu-devel] [PATCH v5++] TCG cross-tb optimizations Aurelien Jarno
  21 siblings, 0 replies; 58+ messages in thread
From: no-reply @ 2017-04-27 12:58 UTC (permalink / raw)
  To: rth; +Cc: famz, qemu-devel, cota

Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations
Message-id: 20170427120006.20564-1-rth@twiddle.net
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
dd00ff3 tcg/s390: Implement goto_ptr
7a3bb3b tcg/sparc: Implement goto_ptr
d0faf3d tcg/aarch64: Implement goto_ptr
3e4a3c8 tcg/ppc: Implement goto_ptr
4203286 tcg/i386: implement goto_ptr
1ab1bc0 target/alpha: Use tcg_gen_goto_ptr
aa74c2b tb-hash: improve tb_jmp_cache hash function in user mode
623c464 target/i386: optimize indirect branches
8265f15 target/i386: optimize cross-page direct jumps in softmmu
e57b419 target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr
f6f46f5 target/arm: optimize indirect branches
e3a959e target/arm: optimize cross-page direct jumps in softmmu
580b0ef tcg: export tcg_gen_lookup_and_goto_ptr
12ab749 tcg: introduce goto_ptr opcode
0bbf965 tcg-runtime: add lookup_tb_ptr helper
6e2d49c exec-all: export tb_htable_lookup
b0be14c qemu/atomic: Loosen restrictions for 64-bit ILP32 hosts
99bd77b tcg/sparc: Use the proper compilation flags for 32-bit
f0e5c62 target/nios2: Fix 64-bit ilp32 compilation

=== OUTPUT BEGIN ===
Checking PATCH 1/19: target/nios2: Fix 64-bit ilp32 compilation...
Checking PATCH 2/19: tcg/sparc: Use the proper compilation flags for 32-bit...
Checking PATCH 3/19: qemu/atomic: Loosen restrictions for 64-bit ILP32 hosts...
WARNING: architecture specific defines should be avoided
#33: FILE: include/qemu/atomic.h:104:
+#if defined(__x86_64__) || defined(__sparc__)

total: 0 errors, 1 warnings, 87 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 4/19: exec-all: export tb_htable_lookup...
Checking PATCH 5/19: tcg-runtime: add lookup_tb_ptr helper...
Checking PATCH 6/19: tcg: introduce goto_ptr opcode...
Checking PATCH 7/19: tcg: export tcg_gen_lookup_and_goto_ptr...
Checking PATCH 8/19: target/arm: optimize cross-page direct jumps in softmmu...
Checking PATCH 9/19: target/arm: optimize indirect branches...
Checking PATCH 10/19: target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr...
Checking PATCH 11/19: target/i386: optimize cross-page direct jumps in softmmu...
Checking PATCH 12/19: target/i386: optimize indirect branches...
Checking PATCH 13/19: tb-hash: improve tb_jmp_cache hash function in user mode...
Checking PATCH 14/19: target/alpha: Use tcg_gen_goto_ptr...
ERROR: return is not a function, parentheses are not required
#31: FILE: target/alpha/translate.c:465:
+    return ((ctx->tb->cflags & CF_LAST_IO)

total: 1 errors, 0 warnings, 113 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 15/19: tcg/i386: implement goto_ptr...
Checking PATCH 16/19: tcg/ppc: Implement goto_ptr...
Checking PATCH 17/19: tcg/aarch64: Implement goto_ptr...
Checking PATCH 18/19: tcg/sparc: Implement goto_ptr...
Checking PATCH 19/19: tcg/s390: Implement goto_ptr...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

* Re: [Qemu-devel] [PATCH v5 01/19] target/nios2: Fix 64-bit ilp32 compilation
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 01/19] target/nios2: Fix 64-bit ilp32 compilation Richard Henderson
@ 2017-04-27 16:03   ` Alex Bennée
  0 siblings, 0 replies; 58+ messages in thread
From: Alex Bennée @ 2017-04-27 16:03 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota


Richard Henderson <rth@twiddle.net> writes:

> Avoid a "cast from pointer to integer of different size" warning
> by using the proper host type.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target/nios2/translate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/nios2/translate.c b/target/nios2/translate.c
> index cfec479..2f3c2e5 100644
> --- a/target/nios2/translate.c
> +++ b/target/nios2/translate.c
> @@ -164,7 +164,7 @@ static void gen_goto_tb(DisasContext *dc, int n, uint32_t dest)
>      if (use_goto_tb(dc, dest)) {
>          tcg_gen_goto_tb(n);
>          tcg_gen_movi_tl(dc->cpu_R[R_PC], dest);
> -        tcg_gen_exit_tb((tcg_target_long)tb + n);
> +        tcg_gen_exit_tb((uintptr_t)tb + n);
>      } else {
>          tcg_gen_movi_tl(dc->cpu_R[R_PC], dest);
>          tcg_gen_exit_tb(0);


--
Alex Bennée
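
As background for the cast above: the value passed to tcg_gen_exit_tb() is the TB pointer with the jump slot index added into its low bits, so the integer type has to match the host pointer width. On a 64-bit ilp32 host a register-sized integer type is wider than a pointer, which is presumably what triggered the warning. A minimal sketch of the encoding, assuming a two-bit mask and using hypothetical names (ToyTB, tb_exit_encode, tb_exit_decode):

```c
#include <assert.h>
#include <stdint.h>

#define TB_EXIT_MASK 3  /* assumed: the low two bits carry the exit index */

/* Stand-in for TranslationBlock; alignment keeps the low bits free. */
typedef struct ToyTB {
    _Alignas(8) uint64_t pc;
} ToyTB;

/* Pack the exit index into the low bits of the pointer value. */
static uintptr_t tb_exit_encode(const ToyTB *tb, unsigned n)
{
    assert(n <= TB_EXIT_MASK);
    assert(((uintptr_t)tb & TB_EXIT_MASK) == 0);
    /* uintptr_t always matches the pointer size, so no size-mismatch cast. */
    return (uintptr_t)tb + n;
}

/* Recover the pointer and the exit index from the packed value. */
static const ToyTB *tb_exit_decode(uintptr_t ret, unsigned *n)
{
    *n = (unsigned)(ret & TB_EXIT_MASK);
    return (const ToyTB *)(ret & ~(uintptr_t)TB_EXIT_MASK);
}
```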

* Re: [Qemu-devel] [PATCH v5 02/19] tcg/sparc: Use the proper compilation flags for 32-bit
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 02/19] tcg/sparc: Use the proper compilation flags for 32-bit Richard Henderson
@ 2017-04-27 16:04   ` Alex Bennée
  0 siblings, 0 replies; 58+ messages in thread
From: Alex Bennée @ 2017-04-27 16:04 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota


Richard Henderson <rth@twiddle.net> writes:

> We have required a v9 cpu since 9b9c37c36439ee0452632253dac7a31897f27f70.
> However, the flags we were using did not reliably enable v8plus, which
> meant that the compiler didn't know it could inline 64-bit atomics.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  configure | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/configure b/configure
> index c35acf1..55dd9c3 100755
> --- a/configure
> +++ b/configure
> @@ -1206,12 +1206,12 @@ case "$cpu" in
>             LDFLAGS="-m64 $LDFLAGS"
>             ;;
>      sparc)
> -           LDFLAGS="-m32 $LDFLAGS"
> -           CPU_CFLAGS="-m32 -mcpu=ultrasparc"
> +           CPU_CFLAGS="-m32 -mv8plus -mcpu=ultrasparc"
> +           LDFLAGS="-m32 -mv8plus $LDFLAGS"
>             ;;
>      sparc64)
> -           LDFLAGS="-m64 $LDFLAGS"
>             CPU_CFLAGS="-m64 -mcpu=ultrasparc"
> +           LDFLAGS="-m64 $LDFLAGS"
>             ;;
>      s390)
>             CPU_CFLAGS="-m31"


--
Alex Bennée

* Re: [Qemu-devel] [PATCH v5 03/19] qemu/atomic: Loosen restrictions for 64-bit ILP32 hosts
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 03/19] qemu/atomic: Loosen restrictions for 64-bit ILP32 hosts Richard Henderson
@ 2017-04-27 16:10   ` Alex Bennée
  2017-04-28  7:07     ` Richard Henderson
  0 siblings, 1 reply; 58+ messages in thread
From: Alex Bennée @ 2017-04-27 16:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota


Richard Henderson <rth@twiddle.net> writes:

> We need to coordinate with the TCG_OVERSIZED_GUEST test in cputlb.c,
> and allow 64-bit atomics even though sizeof(void *) == 4.

Hmm, you say this here, but we never actually do it. The other changes
seem fine, though.

>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  include/qemu/atomic.h | 34 ++++++++++++++++++++++++++--------
>  1 file changed, 26 insertions(+), 8 deletions(-)
>
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> index 878fa07..8a564e9 100644
> --- a/include/qemu/atomic.h
> +++ b/include/qemu/atomic.h
> @@ -88,6 +88,24 @@
>  #define smp_read_barrier_depends()   barrier()
>  #endif
>
> +/* Sanity check that the size of an atomic operation isn't "overly large".
> + * Despite the fact that e.g. i686 has 64-bit atomic operations, we do not
> + * want to use them because we ought not need them, and this lets us do a
> + * bit of sanity checking that other 32-bit hosts might build.
> + *
> + * That said, 64-bit hosts running in ilp32 mode cannot use pointer size
> + * as the test; we need the full register size.
> + * ??? Testing TCG_TARGET_REG_BITS == 64 would be exact, but we probably do
> + * not want to pull in everything else TCG related.
> + *
> + * Note that x32 is fully detected with __x86_64__ + _ILP32, and that for
> + * Sparc we always force the use of sparcv9 in configure.
> + */
> +#if defined(__x86_64__) || defined(__sparc__)
> +# define ATOMIC_REG_SIZE  8
> +#else
> +# define ATOMIC_REG_SIZE  sizeof(void *)
> +#endif
>
>  /* Weak atomic operations prevent the compiler moving other
>   * loads/stores past the atomic operation load/store. However there is
> @@ -104,7 +122,7 @@
>
>  #define atomic_read(ptr)                              \
>      ({                                                \
> -    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *)); \
> +    QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE); \
>      atomic_read__nocheck(ptr);                        \
>      })
>
> @@ -112,7 +130,7 @@
>      __atomic_store_n(ptr, i, __ATOMIC_RELAXED)
>
>  #define atomic_set(ptr, i)  do {                      \
> -    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *)); \
> +    QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE); \
>      atomic_set__nocheck(ptr, i);                      \
>  } while(0)
>
> @@ -130,27 +148,27 @@
>
>  #define atomic_rcu_read(ptr)                          \
>      ({                                                \
> -    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *)); \
> +    QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE); \
>      typeof_strip_qual(*ptr) _val;                     \
>      atomic_rcu_read__nocheck(ptr, &_val);             \
>      _val;                                             \
>      })
>
>  #define atomic_rcu_set(ptr, i) do {                   \
> -    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *)); \
> +    QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE); \
>      __atomic_store_n(ptr, i, __ATOMIC_RELEASE);       \
>  } while(0)
>
>  #define atomic_load_acquire(ptr)                        \
>      ({                                                  \
> -    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));   \
> +    QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE);  \
>      typeof_strip_qual(*ptr) _val;                       \
>      __atomic_load(ptr, &_val, __ATOMIC_ACQUIRE);        \
>      _val;                                               \
>      })
>
>  #define atomic_store_release(ptr, i)  do {              \
> -    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));   \
> +    QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE);  \
>      __atomic_store_n(ptr, i, __ATOMIC_RELEASE);         \
>  } while(0)
>
> @@ -162,7 +180,7 @@
>  })
>
>  #define atomic_xchg(ptr, i)    ({                           \
> -    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));       \
> +    QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE);      \
>      atomic_xchg__nocheck(ptr, i);                           \
>  })
>
> @@ -175,7 +193,7 @@
>  })
>
>  #define atomic_cmpxchg(ptr, old, new)    ({                             \
> -    QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *));                   \
> +    QEMU_BUILD_BUG_ON(sizeof(*ptr) > ATOMIC_REG_SIZE);                  \
>      atomic_cmpxchg__nocheck(ptr, old, new);                             \
>  })


--
Alex Bennée

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 04/19] exec-all: export tb_htable_lookup
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 04/19] exec-all: export tb_htable_lookup Richard Henderson
@ 2017-04-27 16:10   ` Alex Bennée
  0 siblings, 0 replies; 58+ messages in thread
From: Alex Bennée @ 2017-04-27 16:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1493263764-18657-2-git-send-email-cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  cpu-exec.c              | 6 ++----
>  include/exec/exec-all.h | 2 ++
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 63a56d0..5b181c1 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -309,10 +309,8 @@ static bool tb_cmp(const void *p, const void *d)
>      return false;
>  }
>
> -static TranslationBlock *tb_htable_lookup(CPUState *cpu,
> -                                          target_ulong pc,
> -                                          target_ulong cs_base,
> -                                          uint32_t flags)
> +TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
> +                                   target_ulong cs_base, uint32_t flags)
>  {
>      tb_page_addr_t phys_pc;
>      struct tb_desc desc;
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index bcde1e6..87ae10b 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -368,6 +368,8 @@ struct TranslationBlock {
>  void tb_free(TranslationBlock *tb);
>  void tb_flush(CPUState *cpu);
>  void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
> +TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
> +                                   target_ulong cs_base, uint32_t flags);
>
>  #if defined(USE_DIRECT_JUMP)


--
Alex Bennée

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 17/19] tcg/aarch64: Implement goto_ptr
  2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 17/19] tcg/aarch64: " Richard Henderson
@ 2017-04-27 22:18   ` Emilio G. Cota
  0 siblings, 0 replies; 58+ messages in thread
From: Emilio G. Cota @ 2017-04-27 22:18 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, Apr 27, 2017 at 14:00:04 +0200, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.h     |  2 +-
>  tcg/aarch64/tcg-target.inc.c | 22 ++++++++++++++++++++--
>  2 files changed, 21 insertions(+), 3 deletions(-)

Tested-by: Emilio G. Cota <cota@braap.org>

Would be nice to add to the commit log these before/after numbers
I just got on an aarch64 host:

                      SPECint06 (test set), x86_64-linux-user. Host: APM 64-bit ARMv8 (Atlas/A57) @ 2.4 GHz                  
                                                                                                                             
 1.45x +-+-------------------------------------------------------------------------------------------------------------+-+   
       |                                      *****                                                                      |   
       |      +++                             *   *                                                    +goto-ptr         |   
  1.4x +-+...*****............................*...*....................................................................+-+   
       |     *+++*                            *   *                            +++                                       |   
 1.35x +-+...*...*............................*...*...........................*****....................................+-+   
       |     *   *                            *   *                           *+++*                                      |   
       |     *   *                            *   *                           *   *                                      |   
  1.3x +-+...*...*............................*...*...........................*...*....................................+-+   
       |     *   *                            *   *                           *   *                                      |   
       |     *   *                            *   *                           *   *                    *****             |   
 1.25x +-+...*...*...........*****............*...*...........................*...*............*****...*...*...........+-+   
       |     *   *           *   *            *   *                           *   *            *+++*   *   *             |   
  1.2x +-+...*...*...........*...*............*...*...........................*...*............*...*...*...*...........+-+   
       |     *   *           *   *            *   *                           *   *            *   *   *   *             |   
       |     *   *           *   *            *   *                           *   *            *   *   *   *   *****     |   
 1.15x +-+...*...*...........*...*............*...*...........................*...*............*...*...*...*...*...*...+-+   
       |     *   *           *   *            *   *                           *   *    +++     *   *   *   *   *   *     |   
       |     *   *           *   *            *   *                           *   *   *****    *   *   *   *   *   *     |   
  1.1x +-+...*...*...........*...*....*****...*...*...*****...................*...*...*...*....*...*...*...*...*...*...+-+   
       |     *   *           *   *    *   *   *   *   *   *                   *   *   *   *    *   *   *   *   *   *     |   
 1.05x +-+...*...*...........*...*....*...*...*...*...*...*...................*...*...*...*....*...*...*...*...*...*...+-+   
       |     *   *   *****   *   *    *   *   *   *   *   *                   *   *   *   *    *   *   *   *   *   *     |   
       |     *   *   *   *   *   *    *   *   *   *   *   *   *****   *****   *   *   *   *    *   *   *   *   *   *     |   
    1x +-+---*****---*****---*****----*****---*****---*****---*****---*****---*****---*****----*****---*****---*****---+-+   
          astar   bzip2     gcc    gobmk h264ref   hmmlibquantum     mcf omnetpperlbench    sjenxalancbmk   hmean            
  png: http://imgur.com/en9HE8L

Thanks,

		E.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 09/19] target/arm: optimize indirect branches
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 09/19] target/arm: optimize indirect branches Richard Henderson
@ 2017-04-27 22:58   ` Emilio G. Cota
  0 siblings, 0 replies; 58+ messages in thread
From: Emilio G. Cota @ 2017-04-27 22:58 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, Apr 27, 2017 at 13:59:56 +0200, Richard Henderson wrote:
(snip)
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1493263764-18657-8-git-send-email-cota@braap.org>
> [rth: Replace gen_jr global variable with DISAS_EXIT state.]
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Just want to confirm that this patch passes my testing.
Much better than using the boolean!

Thanks,

		E.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 03/19] qemu/atomic: Loosen restrictions for 64-bit ILP32 hosts
  2017-04-27 16:10   ` Alex Bennée
@ 2017-04-28  7:07     ` Richard Henderson
  2017-04-28  7:47       ` Alex Bennée
  0 siblings, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-04-28  7:07 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, cota

On 04/27/2017 06:10 PM, Alex Bennée wrote:
> 
> Richard Henderson <rth@twiddle.net> writes:
> 
>> We need to coordinate with the TCG_OVERSIZED_GUEST test in cputlb.c,
>> and allow 64-bit atomics even though sizeof(void *) == 4.
> 
> Hmm you say this here but we never actually do it. But the other changes
> seem fine.

I don't understand this comment.

>> +#if defined(__x86_64__) || defined(__sparc__)
>> +# define ATOMIC_REG_SIZE  8
>> +#else
>> +# define ATOMIC_REG_SIZE  sizeof(void *)
>> +#endif

How does this "never actually do it"?


r~

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 03/19] qemu/atomic: Loosen restrictions for 64-bit ILP32 hosts
  2017-04-28  7:07     ` Richard Henderson
@ 2017-04-28  7:47       ` Alex Bennée
  2017-04-28  8:05         ` Richard Henderson
  0 siblings, 1 reply; 58+ messages in thread
From: Alex Bennée @ 2017-04-28  7:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota


Richard Henderson <rth@twiddle.net> writes:

> On 04/27/2017 06:10 PM, Alex Bennée wrote:
>>
>> Richard Henderson <rth@twiddle.net> writes:
>>
>>> We need to coordinate with the TCG_OVERSIZED_GUEST test in cputlb.c,
>>> and allow 64-bit atomics even though sizeof(void *) == 4.
>>
>> Hmm you say this here but we never actually do it. But the other changes
>> seem fine.
>
> I don't understand this comment.
>
>>> +#if defined(__x86_64__) || defined(__sparc__)
>>> +# define ATOMIC_REG_SIZE  8
>>> +#else
>>> +# define ATOMIC_REG_SIZE  sizeof(void *)
>>> +#endif
>
> How does this "never actually do it"?

I meant this is independent of the definition of TCG_OVERSIZED_GUEST:

#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
#define TCG_OVERSIZED_GUEST 1
#else
#define TCG_OVERSIZED_GUEST 0
#endif

So maybe the comment should be clearer for ATOMIC_REG_SIZE that it
should match TCG_TARGET_REG_BITS (and therefore sync with
TCG_OVERSIZED_GUEST) in the atomic.h comment.
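
To make the relationship concrete, here is a minimal sketch of the size
check under discussion. It is not the QEMU header: every SKETCH_/sketch_
name is hypothetical, the static assertion only stands in for
QEMU_BUILD_BUG_ON, and it assumes a GCC or Clang host for the __atomic
builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ATOMIC_REG_SIZE: the full register width on
 * hosts that are 64-bit even when built ILP32, else the pointer size. */
#if defined(__x86_64__) || defined(__sparc__)
# define SKETCH_ATOMIC_REG_SIZE  8
#else
# define SKETCH_ATOMIC_REG_SIZE  sizeof(void *)
#endif

/* Compile-time size check, in the spirit of QEMU_BUILD_BUG_ON. */
#define SKETCH_CHECK_ATOMIC_SIZE(type) \
    _Static_assert(sizeof(type) <= SKETCH_ATOMIC_REG_SIZE, \
                   "type too large for a host atomic access")

SKETCH_CHECK_ATOMIC_SIZE(uint32_t);
SKETCH_CHECK_ATOMIC_SIZE(void *);

/* Run-time form of the same test, handy for experimenting. */
static inline int sketch_fits_atomic(size_t size)
{
    return size <= SKETCH_ATOMIC_REG_SIZE;
}

/* Relaxed load, guarded by the checks above (GCC/Clang builtin). */
static inline uint32_t sketch_atomic_read32(const uint32_t *ptr)
{
    return __atomic_load_n(ptr, __ATOMIC_RELAXED);
}
```

On an x32 build the pointer is 4 bytes but the limit above is still 8,
which is the case the patch is after.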

--
Alex Bennée

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 03/19] qemu/atomic: Loosen restrictions for 64-bit ILP32 hosts
  2017-04-28  7:47       ` Alex Bennée
@ 2017-04-28  8:05         ` Richard Henderson
  2017-04-28 10:25           ` Alex Bennée
  0 siblings, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-04-28  8:05 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, cota

On 04/28/2017 09:47 AM, Alex Bennée wrote:
> So maybe the comment should be clearer for ATOMIC_REG_SIZE that it
> should match TCG_TARGET_REG_BITS (and therefore sync with
> TCG_OVERSIZED_GUEST) in the atomic.h comment.

How about

  * That said, we have a problem on 64-bit ILP32 hosts in that in order to
  * sync with TCG_OVERSIZED_GUEST, this must match TCG_TARGET_REG_BITS.
  * We'd prefer not to pull in everything else TCG related, so handle
  * those few cases by hand.

?

r~

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 03/19] qemu/atomic: Loosen restrictions for 64-bit ILP32 hosts
  2017-04-28  8:05         ` Richard Henderson
@ 2017-04-28 10:25           ` Alex Bennée
  0 siblings, 0 replies; 58+ messages in thread
From: Alex Bennée @ 2017-04-28 10:25 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota


Richard Henderson <rth@twiddle.net> writes:

> On 04/28/2017 09:47 AM, Alex Bennée wrote:
>> So maybe the comment should be clearer for ATOMIC_REG_SIZE that it
>> should match TCG_TARGET_REG_BITS (and therefore sync with
>> TCG_OVERSIZED_GUEST) in the atomic.h comment.
>
> How about
>
>  * That said, we have a problem on 64-bit ILP32 hosts in that in order to
>  * sync with TCG_OVERSIZED_GUEST, this must match TCG_TARGET_REG_BITS.
>  * We'd prefer not to pull in everything else TCG related, so handle
>  * those few cases by hand.
>
> ?

Sounds good to me.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/19] tcg-runtime: add lookup_tb_ptr helper
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 05/19] tcg-runtime: add lookup_tb_ptr helper Richard Henderson
@ 2017-04-28 10:29   ` Alex Bennée
  2017-04-28 10:32     ` Richard Henderson
  0 siblings, 1 reply; 58+ messages in thread
From: Alex Bennée @ 2017-04-28 10:29 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> This paves the way for upcoming work.
>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1493263764-18657-3-git-send-email-cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg-runtime.c     | 24 ++++++++++++++++++++++++
>  tcg/tcg-runtime.h |  2 ++
>  tcg/tcg.h         |  1 +
>  3 files changed, 27 insertions(+)
>
> diff --git a/tcg-runtime.c b/tcg-runtime.c
> index 4c60c96..8a24bdd 100644
> --- a/tcg-runtime.c
> +++ b/tcg-runtime.c
> @@ -27,6 +27,7 @@
>  #include "exec/helper-proto.h"
>  #include "exec/cpu_ldst.h"
>  #include "exec/exec-all.h"
> +#include "exec/tb-hash.h"
>
>  /* 32-bit helpers */
>
> @@ -141,6 +142,29 @@ uint64_t HELPER(ctpop_i64)(uint64_t arg)
>      return ctpop64(arg);
>  }
>
> +void *HELPER(lookup_tb_ptr)(CPUArchState *env, target_ulong addr)
> +{
> +    CPUState *cpu = ENV_GET_CPU(env);
> +    TranslationBlock *tb;
> +    target_ulong cs_base, pc;
> +    uint32_t flags;
> +
> +    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
> +    if (likely(tb)) {
> +        cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> +        if (likely(tb->pc == addr && tb->cs_base == cs_base &&
> +                   tb->flags == flags)) {
> +            return tb->tc_ptr;
> +        }
> +        tb = tb_htable_lookup(cpu, pc, cs_base, flags);
> +        if (likely(tb)) {
> +            atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)], tb);
> +            return tb->tc_ptr;
> +        }
> +    }
> +    return tcg_ctx.code_gen_epilogue;

Minor comment: given that we rely on the backends to set this up in later
patches, is it worth adding an assert (or tcg_debug_assert?) to catch this
early if a new backend doesn't set it up?

> +}
> +
>  void HELPER(exit_atomic)(CPUArchState *env)
>  {
>      cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
> diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
> index 114ea6f..c41d38a 100644
> --- a/tcg/tcg-runtime.h
> +++ b/tcg/tcg-runtime.h
> @@ -24,6 +24,8 @@ DEF_HELPER_FLAGS_1(clrsb_i64, TCG_CALL_NO_RWG_SE, i64, i64)
>  DEF_HELPER_FLAGS_1(ctpop_i32, TCG_CALL_NO_RWG_SE, i32, i32)
>  DEF_HELPER_FLAGS_1(ctpop_i64, TCG_CALL_NO_RWG_SE, i64, i64)
>
> +DEF_HELPER_FLAGS_2(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env, tl)
> +
>  DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
>
>  #ifdef CONFIG_SOFTMMU
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 6c216bb..5ec48d1 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -699,6 +699,7 @@ struct TCGContext {
>         extension that allows arithmetic on void*.  */
>      int code_gen_max_blocks;
>      void *code_gen_prologue;
> +    void *code_gen_epilogue;
>      void *code_gen_buffer;
>      size_t code_gen_buffer_size;
>      void *code_gen_ptr;
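
For readers following along, the shape of the helper quoted above is a
direct-mapped cache in front of a slower table, with the epilogue as the
fallback return value. Below is a simplified, self-contained sketch of
that pattern. All sketch_ names, the hash and the table are hypothetical,
it has no RCU or atomics, and unlike the quoted code it tries the slow
table on any cache miss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_BITS 4
#define SKETCH_CACHE_SIZE (1u << SKETCH_CACHE_BITS)
#define SKETCH_TABLE_SIZE 8

struct sketch_tb {
    uint64_t pc;
    uint32_t flags;
    void *code;              /* stands in for tb->tc_ptr */
};

struct sketch_cpu {
    struct sketch_tb *jmp_cache[SKETCH_CACHE_SIZE];
    struct sketch_tb table[SKETCH_TABLE_SIZE];  /* stands in for the htable */
    size_t table_len;
    void *epilogue;          /* returned when nothing is found */
};

/* Hypothetical hash; only needs to map a pc to a cache slot. */
static inline unsigned sketch_hash(uint64_t pc)
{
    return (unsigned)(pc ^ (pc >> SKETCH_CACHE_BITS)) & (SKETCH_CACHE_SIZE - 1);
}

/* Slow path: linear scan standing in for the hash table lookup. */
static inline struct sketch_tb *
sketch_slow_lookup(struct sketch_cpu *cpu, uint64_t pc, uint32_t flags)
{
    for (size_t i = 0; i < cpu->table_len; i++) {
        if (cpu->table[i].pc == pc && cpu->table[i].flags == flags) {
            return &cpu->table[i];
        }
    }
    return NULL;
}

static inline void *
sketch_lookup_ptr(struct sketch_cpu *cpu, uint64_t pc, uint32_t flags)
{
    unsigned idx = sketch_hash(pc);
    struct sketch_tb *tb = cpu->jmp_cache[idx];

    /* Fast path: cached entry must match the full CPU state. */
    if (tb && tb->pc == pc && tb->flags == flags) {
        return tb->code;
    }
    tb = sketch_slow_lookup(cpu, pc, flags);
    if (tb) {
        cpu->jmp_cache[idx] = tb;   /* refill the cache slot */
        return tb->code;
    }
    return cpu->epilogue;           /* caller falls back to the main loop */
}
```

Returning the epilogue rather than NULL means the generated code can jump
to the result unconditionally.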


--
Alex Bennée

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/19] tcg-runtime: add lookup_tb_ptr helper
  2017-04-28 10:29   ` Alex Bennée
@ 2017-04-28 10:32     ` Richard Henderson
  0 siblings, 0 replies; 58+ messages in thread
From: Richard Henderson @ 2017-04-28 10:32 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, cota

On 04/28/2017 12:29 PM, Alex Bennée wrote:
>> +    return tcg_ctx.code_gen_epilogue;
> 
> Minor comment: given that we rely on the backends to set this up in later
> patches, is it worth adding an assert (or tcg_debug_assert?) to catch this
> early if a new backend doesn't set it up?

Sure, I can check for this in tcg_prologue_init and a faulty backend will die 
very early.
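
Roughly along these lines. This is only a sketch of the idea, not the
eventual change: the struct, the backend hook and the SKETCH_HAS_GOTO_PTR
switch are hypothetical stand-ins, and plain assert() is used in place of
whatever assertion macro ends up being appropriate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down context with just the two fields of interest. */
struct sketch_tcg_context {
    void *code_gen_prologue;
    void *code_gen_epilogue;
};

/* Stand-in for the per-backend "goto_ptr is implemented" switch. */
#define SKETCH_HAS_GOTO_PTR 1

static unsigned char sketch_code_buffer[64];

/* Hypothetical backend hook: a correct backend records both addresses. */
static inline void sketch_backend_prologue(struct sketch_tcg_context *s)
{
    s->code_gen_prologue = sketch_code_buffer;
    s->code_gen_epilogue = sketch_code_buffer + 32;
}

static inline void sketch_prologue_init(struct sketch_tcg_context *s)
{
    s->code_gen_epilogue = NULL;
    sketch_backend_prologue(s);

    /* A backend that claims goto_ptr but forgot the epilogue dies here,
     * long before any translated code tries to jump through it. */
    if (SKETCH_HAS_GOTO_PTR) {
        assert(s->code_gen_epilogue != NULL);
    }
}
```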


r~

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 06/19] tcg: introduce goto_ptr opcode
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 06/19] tcg: introduce goto_ptr opcode Richard Henderson
@ 2017-04-28 10:32   ` Alex Bennée
  0 siblings, 0 replies; 58+ messages in thread
From: Alex Bennée @ 2017-04-28 10:32 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1493263764-18657-4-git-send-email-cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  tcg/aarch64/tcg-target.h | 1 +
>  tcg/arm/tcg-target.h     | 1 +
>  tcg/i386/tcg-target.h    | 1 +
>  tcg/ia64/tcg-target.h    | 1 +
>  tcg/mips/tcg-target.h    | 1 +
>  tcg/ppc/tcg-target.h     | 1 +
>  tcg/s390/tcg-target.h    | 1 +
>  tcg/sparc/tcg-target.h   | 1 +
>  tcg/tcg-opc.h            | 1 +
>  tcg/tci/tcg-target.h     | 1 +
>  10 files changed, 10 insertions(+)
>
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index 1a5ea23..b82eac4 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -77,6 +77,7 @@ typedef enum {
>  #define TCG_TARGET_HAS_mulsh_i32        0
>  #define TCG_TARGET_HAS_extrl_i64_i32    0
>  #define TCG_TARGET_HAS_extrh_i64_i32    0
> +#define TCG_TARGET_HAS_goto_ptr         0
>
>  #define TCG_TARGET_HAS_div_i64          1
>  #define TCG_TARGET_HAS_rem_i64          1
> diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
> index 75ea247..c114df7 100644
> --- a/tcg/arm/tcg-target.h
> +++ b/tcg/arm/tcg-target.h
> @@ -123,6 +123,7 @@ extern bool use_idiv_instructions;
>  #define TCG_TARGET_HAS_mulsh_i32        0
>  #define TCG_TARGET_HAS_div_i32          use_idiv_instructions
>  #define TCG_TARGET_HAS_rem_i32          0
> +#define TCG_TARGET_HAS_goto_ptr         0
>
>  enum {
>      TCG_AREG0 = TCG_REG_R6,
> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> index 4275787..59d9835 100644
> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -107,6 +107,7 @@ extern bool have_popcnt;
>  #define TCG_TARGET_HAS_muls2_i32        1
>  #define TCG_TARGET_HAS_muluh_i32        0
>  #define TCG_TARGET_HAS_mulsh_i32        0
> +#define TCG_TARGET_HAS_goto_ptr         0
>
>  #if TCG_TARGET_REG_BITS == 64
>  #define TCG_TARGET_HAS_extrl_i64_i32    0
> diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
> index 42aea03..901bb75 100644
> --- a/tcg/ia64/tcg-target.h
> +++ b/tcg/ia64/tcg-target.h
> @@ -173,6 +173,7 @@ typedef enum {
>  #define TCG_TARGET_HAS_mulsh_i64        0
>  #define TCG_TARGET_HAS_extrl_i64_i32    0
>  #define TCG_TARGET_HAS_extrh_i64_i32    0
> +#define TCG_TARGET_HAS_goto_ptr         0
>
>  #define TCG_TARGET_deposit_i32_valid(ofs, len) ((len) <= 16)
>  #define TCG_TARGET_deposit_i64_valid(ofs, len) ((len) <= 16)
> diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
> index f46d64a..e3240cf 100644
> --- a/tcg/mips/tcg-target.h
> +++ b/tcg/mips/tcg-target.h
> @@ -130,6 +130,7 @@ extern bool use_mips32r2_instructions;
>  #define TCG_TARGET_HAS_muluh_i32        1
>  #define TCG_TARGET_HAS_mulsh_i32        1
>  #define TCG_TARGET_HAS_bswap32_i32      1
> +#define TCG_TARGET_HAS_goto_ptr         0
>
>  #if TCG_TARGET_REG_BITS == 64
>  #define TCG_TARGET_HAS_add2_i32         0
> diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
> index abd8b3d..a9aa974 100644
> --- a/tcg/ppc/tcg-target.h
> +++ b/tcg/ppc/tcg-target.h
> @@ -82,6 +82,7 @@ extern bool have_isa_3_00;
>  #define TCG_TARGET_HAS_muls2_i32        0
>  #define TCG_TARGET_HAS_muluh_i32        1
>  #define TCG_TARGET_HAS_mulsh_i32        1
> +#define TCG_TARGET_HAS_goto_ptr         0
>
>  #if TCG_TARGET_REG_BITS == 64
>  #define TCG_TARGET_HAS_add2_i32         0
> diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
> index cbdd2a6..6b7bcfb 100644
> --- a/tcg/s390/tcg-target.h
> +++ b/tcg/s390/tcg-target.h
> @@ -92,6 +92,7 @@ extern uint64_t s390_facilities;
>  #define TCG_TARGET_HAS_mulsh_i32      0
>  #define TCG_TARGET_HAS_extrl_i64_i32  0
>  #define TCG_TARGET_HAS_extrh_i64_i32  0
> +#define TCG_TARGET_HAS_goto_ptr       0
>
>  #define TCG_TARGET_HAS_div2_i64       1
>  #define TCG_TARGET_HAS_rot_i64        1
> diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
> index b8b74f96f..9348ddd 100644
> --- a/tcg/sparc/tcg-target.h
> +++ b/tcg/sparc/tcg-target.h
> @@ -123,6 +123,7 @@ extern bool use_vis3_instructions;
>  #define TCG_TARGET_HAS_muls2_i32        1
>  #define TCG_TARGET_HAS_muluh_i32        0
>  #define TCG_TARGET_HAS_mulsh_i32        0
> +#define TCG_TARGET_HAS_goto_ptr         0
>
>  #define TCG_TARGET_HAS_extrl_i64_i32    1
>  #define TCG_TARGET_HAS_extrh_i64_i32    1
> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
> index f06f894..956fb1e 100644
> --- a/tcg/tcg-opc.h
> +++ b/tcg/tcg-opc.h
> @@ -193,6 +193,7 @@ DEF(insn_start, 0, 0, TLADDR_ARGS * TARGET_INSN_START_WORDS,
>      TCG_OPF_NOT_PRESENT)
>  DEF(exit_tb, 0, 0, 1, TCG_OPF_BB_END)
>  DEF(goto_tb, 0, 0, 1, TCG_OPF_BB_END)
> +DEF(goto_ptr, 0, 1, 0, TCG_OPF_BB_END | IMPL(TCG_TARGET_HAS_goto_ptr))
>
>  DEF(qemu_ld_i32, 1, TLADDR_ARGS, 1,
>      TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
> diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
> index 838bf3a..0696328 100644
> --- a/tcg/tci/tcg-target.h
> +++ b/tcg/tci/tcg-target.h
> @@ -85,6 +85,7 @@
>  #define TCG_TARGET_HAS_muls2_i32        0
>  #define TCG_TARGET_HAS_muluh_i32        0
>  #define TCG_TARGET_HAS_mulsh_i32        0
> +#define TCG_TARGET_HAS_goto_ptr         0
>
>  #if TCG_TARGET_REG_BITS == 64
>  #define TCG_TARGET_HAS_extrl_i64_i32    0
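
Since every backend starts out with the flag at 0, generic code needs some
fallback when the opcode is absent. A minimal sketch of that kind of
dispatch follows. The names are hypothetical, and the choice of a plain
exit to the main loop as the fallback is an assumption of this sketch,
not something this patch states.

```c
#include <assert.h>

/* Stand-in for a backend's TCG_TARGET_HAS_goto_ptr; defaults to absent. */
#ifndef SKETCH_TARGET_HAS_goto_ptr
#define SKETCH_TARGET_HAS_goto_ptr 0
#endif

enum sketch_op {
    SKETCH_OP_EXIT_TB,    /* leave translated code, return to the main loop */
    SKETCH_OP_GOTO_PTR    /* jump straight to a looked-up code pointer */
};

/* Pick the op to emit for an indirect exit, given backend support. */
static inline enum sketch_op sketch_emit_indirect_exit(int has_goto_ptr)
{
    return has_goto_ptr ? SKETCH_OP_GOTO_PTR : SKETCH_OP_EXIT_TB;
}
```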


--
Alex Bennée

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/19] target/arm: optimize cross-page direct jumps in softmmu
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 08/19] target/arm: optimize cross-page direct jumps in softmmu Richard Henderson
@ 2017-04-28 11:30   ` Alex Bennée
  0 siblings, 0 replies; 58+ messages in thread
From: Alex Bennée @ 2017-04-28 11:30 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> Instead of unconditionally exiting to the exec loop, use the
> lookup_and_goto_ptr helper to jump to the target if it is valid.
>
> Perf impact: see next commit's log.
>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1493263764-18657-7-git-send-email-cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target/arm/translate.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 0b5a0bc..facb52f 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -4153,8 +4153,12 @@ static inline void gen_goto_tb(DisasContext *s, int n, target_ulong dest)
>          gen_set_pc_im(s, dest);
>          tcg_gen_exit_tb((uintptr_t)s->tb + n);
>      } else {
> +        TCGv addr = tcg_temp_new();
> +
>          gen_set_pc_im(s, dest);
> -        tcg_gen_exit_tb(0);
> +        tcg_gen_extu_i32_tl(addr, cpu_R[15]);
> +        tcg_gen_lookup_and_goto_ptr(addr);
> +        tcg_temp_free(addr);
>      }
>  }
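
The decision in the hunk above can be sketched in isolation as follows.
This is a simplified model, not the translator code: the names are
hypothetical, a 4 KiB guest page is assumed, and the same-page test is
assumed to be what selects the first branch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed guest page size for the sketch: 4 KiB. */
#define SKETCH_PAGE_BITS 12
#define SKETCH_PAGE_MASK (~(((uint64_t)1 << SKETCH_PAGE_BITS) - 1))

enum sketch_exit {
    SKETCH_EXIT_GOTO_TB,              /* direct, patchable jump */
    SKETCH_EXIT_LOOKUP_AND_GOTO_PTR   /* run-time lookup, then jump */
};

/* Same page as the current TB: chain directly.  Otherwise the mapping may
 * change underneath us, so look the target up each time instead of
 * returning to the exec loop unconditionally. */
static inline enum sketch_exit
sketch_choose_exit(uint64_t tb_pc, uint64_t dest)
{
    if ((tb_pc & SKETCH_PAGE_MASK) == (dest & SKETCH_PAGE_MASK)) {
        return SKETCH_EXIT_GOTO_TB;
    }
    return SKETCH_EXIT_LOOKUP_AND_GOTO_PTR;
}
```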


--
Alex Bennée

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/19] target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 10/19] target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr Richard Henderson
@ 2017-04-28 16:50   ` Alex Bennée
  0 siblings, 0 replies; 58+ messages in thread
From: Alex Bennée @ 2017-04-28 16:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> This helper will be used by subsequent changes.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1493263764-18657-9-git-send-email-cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target/i386/translate.c | 25 ++++++++++++++++++++++++-
>  1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/target/i386/translate.c b/target/i386/translate.c
> index 1d1372f..f0e48dc 100644
> --- a/target/i386/translate.c
> +++ b/target/i386/translate.c
> @@ -141,6 +141,7 @@ typedef struct DisasContext {
>  } DisasContext;
>
>  static void gen_eob(DisasContext *s);
> +static void gen_jr(DisasContext *s, TCGv dest);
>  static void gen_jmp(DisasContext *s, target_ulong eip);
>  static void gen_jmp_tb(DisasContext *s, target_ulong eip, int tb_num);
>  static void gen_op(DisasContext *s1, int op, TCGMemOp ot, int d);
> @@ -2509,7 +2510,8 @@ static void gen_bnd_jmp(DisasContext *s)
>     If INHIBIT, set HF_INHIBIT_IRQ_MASK if it isn't already set.
>     If RECHECK_TF, emit a rechecking helper for #DB, ignoring the state of
>     S->TF.  This is used by the syscall/sysret insns.  */
> -static void gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf)
> +static void
> +do_gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf, TCGv jr)
>  {
>      gen_update_cc_op(s);
>
> @@ -2530,12 +2532,27 @@ static void gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf)
>          tcg_gen_exit_tb(0);
>      } else if (s->tf) {
>          gen_helper_single_step(cpu_env);
> +    } else if (!TCGV_IS_UNUSED(jr)) {
> +        TCGv vaddr = tcg_temp_new();
> +
> +        tcg_gen_add_tl(vaddr, jr, cpu_seg_base[R_CS]);
> +        tcg_gen_lookup_and_goto_ptr(vaddr);
> +        tcg_temp_free(vaddr);
>      } else {
>          tcg_gen_exit_tb(0);
>      }
>      s->is_jmp = DISAS_TB_JUMP;
>  }
>
> +static inline void
> +gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf)
> +{
> +    TCGv unused;
> +
> +    TCGV_UNUSED(unused);
> +    do_gen_eob_worker(s, inhibit, recheck_tf, unused);
> +}
> +
>  /* End of block.
>     If INHIBIT, set HF_INHIBIT_IRQ_MASK if it isn't already set.  */
>  static void gen_eob_inhibit_irq(DisasContext *s, bool inhibit)
> @@ -2549,6 +2566,12 @@ static void gen_eob(DisasContext *s)
>      gen_eob_worker(s, false, false);
>  }
>
> +/* Jump to register */
> +static void gen_jr(DisasContext *s, TCGv dest)
> +{
> +    do_gen_eob_worker(s, false, false, dest);
> +}
> +
>  /* generate a jump to eip. No segment change must happen before as a
>     direct call to the next block may occur */
>  static void gen_jmp_tb(DisasContext *s, target_ulong eip, int tb_num)


--
Alex Bennée

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 11/19] target/i386: optimize cross-page direct jumps in softmmu
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 11/19] target/i386: optimize cross-page direct jumps in softmmu Richard Henderson
@ 2017-04-28 16:56   ` Alex Bennée
  2017-04-29  9:14     ` Richard Henderson
  0 siblings, 1 reply; 58+ messages in thread
From: Alex Bennée @ 2017-04-28 16:56 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> Instead of unconditionally exiting to the exec loop, use the
> gen_jr helper to jump to the target if it is valid.
>
> Perf impact: see next commit's log.
>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1493263764-18657-10-git-send-email-cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target/i386/translate.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target/i386/translate.c b/target/i386/translate.c
> index f0e48dc..ea113fe 100644
> --- a/target/i386/translate.c
> +++ b/target/i386/translate.c
> @@ -2154,9 +2154,9 @@ static inline void gen_goto_tb(DisasContext *s, int tb_num, target_ulong eip)
>          gen_jmp_im(eip);
>          tcg_gen_exit_tb((uintptr_t)s->tb + tb_num);
>      } else {
> -        /* jump to another page: currently not optimized */
> +        /* jump to another page */
>          gen_jmp_im(eip);
> -        gen_eob(s);
> +        gen_jr(s, cpu_tmp0);


I had to look up what was going on with cpu_tmp0 there. Is there a
particular reason i386 has these global temps with implied setting
rules? It does seem somewhat hacky.

Given cpu_tmp0 seems to be heavily used across i386, I guess it keeps
to the style of the translator :-/


>      }
>  }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 12/19] target/i386: optimize indirect branches
  2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 12/19] target/i386: optimize indirect branches Richard Henderson
@ 2017-04-28 16:58   ` Alex Bennée
  0 siblings, 0 replies; 58+ messages in thread
From: Alex Bennée @ 2017-04-28 16:58 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> Speed up indirect branches by jumping to the target if it is valid.
>
> Softmmu measurements (see later commit for user-mode numbers):
>
> Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.
>
> -                  SPECint06 (test set), x86_64-softmmu (Ubuntu 16.04 guest). Host: Intel i7-4790K @ 4.00GHz
>
>  2.4x +-+--------------------------------------------------------------------------------------------------------------+-+
>       |                                                                                                                  |
>       |   cross                                                                                                          |
>  2.2x +cross+jr..........................................................................+++...........................+-+
>       |                                                                                   |                              |
>       |                                                                               +++ |                              |
>    2x +-+..............................................................................|..|............................+-+
>       |                                                                                |  |                              |
>       |                                                                                |  |                              |
>  1.8x +-+..............................................................................|####...........................+-+
>       |                                                                                |# |#                             |
>       |                                                                              **** |#                             |
>  1.6x +-+............................................................................*.|*.|#...........................+-+
>       |                                                                              * |* |#                             |
>       |                                                                              * |* |#                             |
>  1.4x +-+.......................................................................+++..*.|*.|#...........................+-+
>       |                                                      ++++++             #### * |*++#             +++             |
>       |                        +++                            |  |              #++# *++*  #          +++ |              |
>  1.2x +-+......................###.....####....+++............|..|...........****..#.*..*..#....####...|.###.....####..+-+
>       |        +++          **** #  ****  #    ####          ***###          *++*  # *  *  #    #++#  ****|#  +++#++#    |
>       |    ****###     +++  *++* #  *++*  #  ++#  #    ####  *|* |#     +++  *  *  # *  *  #  ***  #  *| *|#  ****  #    |
>    1x +-++-*++*++#++***###++*++*+#++*+-*++#+****++#++***++#+-*+*++#-+****##++*++*-+#+*++*-+#++*+*++#++*-+*+#++*++*++#-++-+
>       |    *  *  #  * *  #  *  * #  *  *  # *  *  #  * *  #  *|* |#  *++* #  *  *  # *  *  #  * *  #  *  * #  *  *  #    |
>       |    *  *  #  * *  #  *  * #  *  *  # *  *  #  * *  #  *+*++#  *  * #  *  *  # *  *  #  * *  #  *  * #  *  *  #    |
>  0.8x +-+--****###--***###--****##--****###-****###--***###--***###--****##--****###-****###--***###--****##--****###--+-+
>          astar   bzip2      gcc   gobmk h264ref   hmmlibquantum      mcf omnetpperlbench   sjengxalancbmk   hmean
>   png: http://imgur.com/DU36YFU
>
> NB. 'cross' represents the previous commit.
>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1493263764-18657-11-git-send-email-cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target/i386/translate.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/target/i386/translate.c b/target/i386/translate.c
> index ea113fe..674ec96 100644
> --- a/target/i386/translate.c
> +++ b/target/i386/translate.c
> @@ -4996,7 +4996,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
>              gen_push_v(s, cpu_T1);
>              gen_op_jmp_v(cpu_T0);
>              gen_bnd_jmp(s);
> -            gen_eob(s);
> +            gen_jr(s, cpu_T0);
>              break;
>          case 3: /* lcall Ev */
>              gen_op_ld_v(s, ot, cpu_T1, cpu_A0);
> @@ -5014,7 +5014,8 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
>                                        tcg_const_i32(dflag - 1),
>                                        tcg_const_i32(s->pc - s->cs_base));
>              }
> -            gen_eob(s);
> +            tcg_gen_ld_tl(cpu_tmp4, cpu_env, offsetof(CPUX86State, eip));
> +            gen_jr(s, cpu_tmp4);
>              break;
>          case 4: /* jmp Ev */
>              if (dflag == MO_16) {
> @@ -5022,7 +5023,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
>              }
>              gen_op_jmp_v(cpu_T0);
>              gen_bnd_jmp(s);
> -            gen_eob(s);
> +            gen_jr(s, cpu_T0);
>              break;
>          case 5: /* ljmp Ev */
>              gen_op_ld_v(s, ot, cpu_T1, cpu_A0);
> @@ -5037,7 +5038,8 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
>                  gen_op_movl_seg_T0_vm(R_CS);
>                  gen_op_jmp_v(cpu_T1);
>              }
> -            gen_eob(s);
> +            tcg_gen_ld_tl(cpu_tmp4, cpu_env, offsetof(CPUX86State, eip));
> +            gen_jr(s, cpu_tmp4);
>              break;
>          case 6: /* push Ev */
>              gen_push_v(s, cpu_T0);
> @@ -6417,7 +6419,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
>          /* Note that gen_pop_T0 uses a zero-extending load.  */
>          gen_op_jmp_v(cpu_T0);
>          gen_bnd_jmp(s);
> -        gen_eob(s);
> +        gen_jr(s, cpu_T0);
>          break;
>      case 0xc3: /* ret */
>          ot = gen_pop_T0(s);
> @@ -6425,7 +6427,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
>          /* Note that gen_pop_T0 uses a zero-extending load.  */
>          gen_op_jmp_v(cpu_T0);
>          gen_bnd_jmp(s);
> -        gen_eob(s);
> +        gen_jr(s, cpu_T0);
>          break;
>      case 0xca: /* lret im */
>          val = cpu_ldsw_code(env, s->pc);
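
For readers wondering what replacing gen_eob with gen_jr buys, here is a
minimal standalone model of "jump to the target if it is valid". It is a
sketch, not QEMU code: ModelTB, model_cache_insert, model_lookup_tb_ptr,
model_epilogue_stub and MODEL_CACHE_BITS are all hypothetical names, the
cache is a tiny direct-mapped table, and only the guest pc is compared.
A real implementation also has to match any other state a translation
depends on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not QEMU code: every name here is illustrative. */
#define MODEL_CACHE_BITS 4
#define MODEL_CACHE_SIZE (1u << MODEL_CACHE_BITS)

/* Opaque address of translated host code. */
typedef const void *HostCode;

typedef struct ModelTB {
    uint64_t pc;     /* guest address this block was translated from */
    HostCode code;   /* entry point of the translated host code */
} ModelTB;

static ModelTB *model_jmp_cache[MODEL_CACHE_SIZE];

/* Stand-in for the epilogue: jumping here means "back to the main loop". */
static const char model_epilogue_stub = 0;

static unsigned int model_hash(uint64_t pc)
{
    return (unsigned int)(pc ^ (pc >> MODEL_CACHE_BITS))
           & (MODEL_CACHE_SIZE - 1);
}

/* Record a translated block in the slot selected by its pc. */
static void model_cache_insert(ModelTB *tb)
{
    assert(tb != NULL);
    model_jmp_cache[model_hash(tb->pc)] = tb;
}

/* Return the host code for pc on a cache hit, otherwise the epilogue. */
static HostCode model_lookup_tb_ptr(uint64_t pc)
{
    ModelTB *tb = model_jmp_cache[model_hash(pc)];

    if (tb != NULL && tb->pc == pc) {
        return tb->code;            /* hit: chain straight to the target */
    }
    return &model_epilogue_stub;    /* miss: take the slow path */
}
```

In this model the old behaviour (gen_eob) corresponds to always taking
the miss path; the new one only takes it when the slot is empty or holds
a block for a different pc.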


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v5 13/19] tb-hash: improve tb_jmp_cache hash function in user mode
  2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 13/19] tb-hash: improve tb_jmp_cache hash function in user mode Richard Henderson
@ 2017-04-28 17:00   ` Alex Bennée
  2017-04-28 17:44     ` Emilio G. Cota
  0 siblings, 1 reply; 58+ messages in thread
From: Alex Bennée @ 2017-04-28 17:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota


Richard Henderson <rth@twiddle.net> writes:

> From: "Emilio G. Cota" <cota@braap.org>
>
> Optimizations to cross-page chaining and indirect branches make
> performance more sensitive to the hit rate of tb_jmp_cache.
> The constraint of reserving some bits for the page number
> lowers the achievable quality of the hashing function.
>
> However, user-mode does not have this requirement. Thus,
> with this change we use for user-mode a hashing function that
> is both faster and of better quality than the previous one.
>
> Measurements:
>
> Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.
>
> -                           SPECint06 (test set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz
>
>  2.2x +-+--------------------------------------------------------------------------------------------------------------+-+
>       |                                                                                                                  |
>       |         jr                                                                                                       |
>    2x +jr+multhash        +....................................................+++++...................................+-+
>       |    jr+hash                                                              |$$$                                     |
>       |                                                                         |$+$                                     |
>       |                                                                        ### $                                     |
>  1.8x +-+......................................................................#|#.$...................................+-+
>       |                                                                      ++#+# $                                     |
>       |                                                                       |# # $                                     |
>  1.6x +-+....................................................................***.#.$....................++$$$..........+-+
>       |                                         $$$                          *+* # $                     |$+$            |
>       |                       ++$$$           ### $                          * * # $                  +++|$ $            |
>       |                     ++###+$           # # $                          * * # $           ###   ****## $            |
>  1.4x +-+...................***+#.$.........***.#.$..........................*.*.#.$...........#+#$$.*++*|#.$..........+-+
>       |                     *+* # $         * * # $                          * * # $           # # $ *  *+# $            |
>       |                     * * # $   +++++ * * # $                          * * # $         *** # $ *  * # $   ###$$    |
>  1.2x +-+...................*.*.#.$.***##$$.*.*.#.$..........................*.*.#.$.........*.*.#.$.*..*.#.$.***+#+$..+-+
>       |                     * * # $ *+* # $ * * # $   +++                    * * # $ ++###$$ * * # $ *  * # $ * * # $    |
>       |    ***##$$          * * # $ * * # $ * * # $ ***##$$          ++###   * * # $ *** #+$ * * # $ *  * # $ * * # $    |
>       |    *+*+#+$ ***##$$$ * * # $ * * # $ * * # $ *+* # $ ++####$$ ***+#   * * # $ * * # $ * * # $ *  * # $ * * # $    |
>    1x +-++-*+*+#+$+*+*+#-+$+*+*-#+$+*+*+#+$+*+*+#+$+*-*+#+$+***++#+$+*+*+#$$+*+*+#+$+*+*+#+$+*+*-#+$+*+-*+#+$+*+*+#+$-++-+
>       |    * * # $ * * #  $ * * # $ * * # $ * * # $ * * # $ * *  # $ * * # $ * * # $ * * # $ * * # $ *  * # $ * * # $    |
>       |    * * # $ * * #  $ * * # $ * * # $ * * # $ * * # $ * *  # $ * * # $ * * # $ * * # $ * * # $ *  * # $ * * # $    |
>  0.8x +-+--***##$$-***##$$$-***##$$-***##$$-***##$$-***##$$-***###$$-***##$$-***##$$-***##$$-***##$$-****##$$-***##$$--+-+
>          astar   bzip2      gcc   gobmk h264ref   hmmlibquantum      mcf omnetpperlbench   sjengxalancbmk   hmean
>   png: http://imgur.com/4UXTrEc
>
> Here I also tried the hash function suggested by Paolo ("multhash"):
>
>   return ((uint64_t) (pc * 2654435761) >> 32) & (TB_JMP_CACHE_SIZE - 1);
>
> As you can see it is just as good as the other new function ("hash"),
> which is what I ended up going with.
>
> -                          SPECint06 (train set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz
>
>  2.6x +-+--------------------------------------------------------------------------------------------------------------+-+
>       |                                                                                                                  |
>       |     jr                                                                                           ###             |
>  2.4x +jr+hash...........................................................................................#.#...........+-+
>       |                                                                                                  # #             |
>       |                                                                                                  # #             |
>  2.2x +-+................................................................................................#.#...........+-+
>       |                                                                                                  # #             |
>       |                                                                                                  # #             |
>    2x +-+................................................................................................#.#...........+-+
>       |                                                                                               **** #             |
>       |                                                                                               *  * #             |
>  1.8x +-+.............................................................................................*..*.#...........+-+
>       |                                                                         +++                   *  * #             |
>       |                                                                         ####    ####          *  * #             |
>  1.6x +-+......................................####.............................#..#.****..#..........*..*.#...........+-+
>       |                        +++             #++#                          ****  # *  *  #    ####  *  * #             |
>       |                        ###             #  #                          *  *  # *  *  #    #  #  *  * #             |
>  1.4x +-+...................****+#..........****..#..........................*..*..#.*..*..#....#..#..*..*.#...........+-+
>       |                     *++* #          *  *  #                          *  *  # *  *  #  ***  #  *  * #     ####    |
>       |                     *  * #     #### *  *  #                          *  *  # *  *  #  * *  #  *  * #  ****  #    |
>  1.2x +-+...................*..*.#..****++#.*..*..#..........................*..*..#.*..*..#..*.*..#..*..*.#..*..*..#..+-+
>       |    ****###          *  * #  *  *  # *  *  #                          *  *  # *  *  #  * *  #  *  * #  *  *  #    |
>       |    *  *  #  ***###  *  * #  *  *  # *  *  #                  ****##  *  *  # *  *  #  * *  #  *  * #  *  *  #    |
>    1x +-+--****###--***###--****##--****###-****###--***###--***###--****##--****###-****###--***###--****##--****###--+-+
>          astar   bzip2      gcc   gobmk h264ref   hmmlibquantum      mcf omnetpperlbench   sjengxalancbmk   hmean
>   png: http://imgur.com/ArCbHqo
>
> -                                    NBench, x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz
>
>  1.12x +-+-------------------------------------------------------------------------------------------------------------+-+
>        |                                                                                                                 |
>        |     jr                                                           +++                                            |
>   1.1x +jr+hash...........................................................####.........................................+-+
>        |                                                               +++#| #                                           |
>        |                                                                | #++#                                           |
>  1.08x +-+................................+++................+++.+++..*****..#.........................................+-+
>        |                                   |  +++             |   |   * | *  #                                           |
>        |                                   |   |              |   |   *+++*  #                                           |
>  1.06x +-+................................****###.............|...|...*...*..#.........................+++.............+-+
>        |                                  *| * |#            ****###  *   *  #                          |                |
>        |                                  *| *++#            *| * |#  *   *  #                        ####               |
>  1.04x +-+................................*++*..#............*|.*.|#..*...*..#........................#.|#.............+-+
>        |                                  *  *  #            *++*++#  *   *  #                     +++#++#               |
>        |                                  *  *  #            *  *  #  *   *  #                      | #  #   +++####     |
>  1.02x +-+................................*..*..#......+++...*..*..#..*...*..#.....................****..#..*****++#...+-+
>        |         +++                      *  *  #   +++ |    *  *  #  *   *  #  +++                *| *  #  *+++*  #     |
>        |      +++ |    +++ +++   ++++++   *  *  #  *****###  *  *  #  *   *  #   |  +++   ++++++   *++*  #  *   *  #     |
>     1x +-++-+++++####++****###++++-+####+-*++*++#-+*+++*-+#++*++*++#++*+-+*++#+-+++####-+*****###++*++*++#++*+-+*++#+-++-+
>        |     *****| #  *++* |#  *****| #  *  *  #  *   *++#  *  *  #  *   *  #  **** |#  *   *  #  *  *  #  *   *  #     |
>        |     * | *| #  *  *++#  * | *++#  *  *  #  *   *  #  *  *  #  *   *  #  *| *++#  *   *  #  *  *  #  *   *  #     |
>  0.98x +-+...*.|.*++#..*..*..#..*+++*..#..*..*..#..*...*..#..*..*..#..*...*..#..*++*..#..*...*..#..*..*..#..*...*..#...+-+
>        |     *+++*  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
>        |     *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
>  0.96x +-+---*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###---+-+
>        ASSIGNMENT BITFIELD   FOURFP EMULATION   HUFFMAN   LU DECOMPOSITIONEURAL NNUMERIC SOSTRING SORT     hmean
>   png: http://imgur.com/ZXFX0hJ
>
> -                                   NBench, arm-linux-user. Host: Intel i7-4790K @ 4.00GHz
>
>   1.3x +-+-------------------------------------------------------------------------------------------------------------+-+
>        |                            ####                                                                                 |
>        |     jr                     #  #                                            +++                                  |
>  1.25x +jr+hash.....................#..#...........................................####................................+-+
>        |                            #  #                                           #  #                                  |
>        |                            #  #                                           #  #                                  |
>   1.2x +-+..........................#..#...........................................#..#................................+-+
>        |                            #  #                                           #  #                                  |
>        |                            #  #                                           #  #                                  |
>  1.15x +-+..........................#..#...........................................#..#................................+-+
>        |                            #  #                                  ####     #  #                                  |
>        |                            #  #                                  #  #     #  #                                  |
>   1.1x +-+..........................#..#..................................#..#.....#..#................................+-+
>        |                            #  #                                  #  #     #  #                         +++      |
>        |                            #  #               ####               #  #     #  #                         ####     |
>  1.05x +-+..........................#..#...............#..#.....####......#..#.....#..#.........................#..#...+-+
>        |                            #  #               #  #     #  #      #  #     #  #                +++      #  #     |
>        |                   +++  *****  #     ####  *****  #     #  #   +++#  #  ****  #            ****###      #  #     |
>     1x +-++-+*****###++****+++++*+-+*++#+-****++#-+*+++*-+#+++++#++#++*****++#+-*++*++#-+*****-++++*++*++#++*****++#+-++-+
>        |     *   *  #  *  * |   *   *  #  *  *  #  *   *  #  ****  #  *   *  #  *  *  #  *   *###  *  *++#  *   *  #     |
>        |     *   *  #  *  *###  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
>  0.95x +-+...*...*..#..*..*.|#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#...+-+
>        |     *   *  #  *  * |#  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
>        |     *   *  #  *  * |#  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
>   0.9x +-+---*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###---+-+
>        ASSIGNMENT BITFIELD   FOURFP EMULATION   HUFFMAN   LU DECOMPOSITIONEURAL NNUMERIC SOSTRING SORT     hmean
>   png: http://imgur.com/FfD27ey
>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1493263764-18657-12-git-send-email-cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  include/exec/tb-hash.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
> index 2c27490..b1fe2d0 100644
> --- a/include/exec/tb-hash.h
> +++ b/include/exec/tb-hash.h
> @@ -22,6 +22,8 @@
>
>  #include "exec/tb-hash-xx.h"
>
> +#ifdef CONFIG_SOFTMMU
> +
>  /* Only the bottom TB_JMP_PAGE_BITS of the jump cache hash bits vary for
>     addresses on the same page.  The top bits are the same.  This allows
>     TLB invalidation to quickly clear a subset of the hash table.  */
> @@ -45,6 +47,16 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
>             | (tmp & TB_JMP_ADDR_MASK));
>  }
>
> +#else
> +
> +/* In user-mode we can get better hashing because we do not have a TLB */
> +static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
> +{
> +    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
> +}
> +
> +#endif /* CONFIG_SOFTMMU */
> +
>  static inline
>  uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags)
>  {

I'll note that when I've plotted hit rates against the cache, we don't
seem to be making good, even use of the cache over time. But I suspect
there is more that could be done here. That said, the numbers are
compelling, so:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
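
For reference, the two user-mode candidates from the commit message can
be compared in isolation with a sketch like the one below. Assumptions
that are not in the patch: the function names hash_xor_fold and
hash_mult, a 64-bit guest pc (uint64_t standing in for target_ulong),
and TB_JMP_CACHE_BITS defined locally as 12.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: a 4096-entry cache and a 64-bit guest pc. */
#define TB_JMP_CACHE_BITS 12
#define TB_JMP_CACHE_SIZE (1u << TB_JMP_CACHE_BITS)

static_assert(TB_JMP_CACHE_BITS > 0 && TB_JMP_CACHE_BITS < 32,
              "index must fit in an unsigned int");

/* "hash": fold the bits just above the index onto the index bits. */
static inline unsigned int hash_xor_fold(uint64_t pc)
{
    return (unsigned int)(pc ^ (pc >> TB_JMP_CACHE_BITS))
           & (TB_JMP_CACHE_SIZE - 1);
}

/* "multhash": the multiplicative hash quoted in the commit message. */
static inline unsigned int hash_mult(uint64_t pc)
{
    return (unsigned int)((pc * UINT64_C(2654435761)) >> 32)
           & (TB_JMP_CACHE_SIZE - 1);
}
```

With the xor-fold variant, two addresses with the same low 12 bits but
different bits above them (for example 0x401000 and 0x402000) land in
different slots, which is what a plain mask of the low bits cannot do.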

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v5 14/19] target/alpha: Use tcg_gen_goto_ptr
  2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 14/19] target/alpha: Use tcg_gen_goto_ptr Richard Henderson
@ 2017-04-28 17:10   ` Alex Bennée
  0 siblings, 0 replies; 58+ messages in thread
From: Alex Bennée @ 2017-04-28 17:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, cota


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target/alpha/translate.c | 49 +++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 36 insertions(+), 13 deletions(-)
>
> diff --git a/target/alpha/translate.c b/target/alpha/translate.c
> index df5d695..c1a5fbf 100644
> --- a/target/alpha/translate.c
> +++ b/target/alpha/translate.c
> @@ -89,6 +89,10 @@ typedef enum {
>         updated the PC for the next instruction to be executed.  */
>      EXIT_PC_STALE,
>
> +    /* Similarly, but force TB exit without chaining.  */
> +    EXIT_PC_UPDATED_FORCE,
> +    EXIT_PC_STALE_FORCE,
> +
>      /* We are ending the TB with a noreturn function call, e.g. longjmp.
>         No following code will be executed.  */
>      EXIT_NORETURN,
> @@ -455,11 +459,16 @@ static bool in_superpage(DisasContext *ctx, int64_t addr)
>  #endif
>  }
>
> +static bool use_exit_tb(DisasContext *ctx)
> +{
> +    /* Suppress any optimization in the case of single-stepping and IO.  */
> +    return ((ctx->tb->cflags & CF_LAST_IO)
> +            || ctx->singlestep_enabled || singlestep);
> +}
> +
>  static bool use_goto_tb(DisasContext *ctx, uint64_t dest)
>  {
> -    /* Suppress goto_tb in the case of single-steping and IO.  */
> -    if ((ctx->tb->cflags & CF_LAST_IO)
> -        || ctx->singlestep_enabled || singlestep) {
> +    if (use_exit_tb(ctx)) {
>          return false;
>      }
>  #ifndef CONFIG_USER_ONLY
> @@ -1257,14 +1266,14 @@ static ExitStatus gen_call_pal(DisasContext *ctx, int palcode)
>             need the page permissions check.  We'll see the existence of
>             the page when we create the TB, and we'll flush all TBs if
>             we change the PAL base register.  */
> -        if (!ctx->singlestep_enabled && !(ctx->tb->cflags & CF_LAST_IO)) {
> +        if (use_exit_tb(ctx)) {
> +            tcg_gen_movi_i64(cpu_pc, entry);
> +            return EXIT_PC_UPDATED;
> +        } else {
>              tcg_gen_goto_tb(0);
>              tcg_gen_movi_i64(cpu_pc, entry);
>              tcg_gen_exit_tb((uintptr_t)ctx->tb);
>              return EXIT_GOTO_TB;
> -        } else {
> -            tcg_gen_movi_i64(cpu_pc, entry);
> -            return EXIT_PC_UPDATED;
>          }
>      }
>  #endif
> @@ -1323,7 +1332,7 @@ static ExitStatus gen_mfpr(DisasContext *ctx, TCGv va, int regno)
>              gen_io_start();
>              helper(va);
>              gen_io_end();
> -            return EXIT_PC_STALE;
> +            return EXIT_PC_STALE_FORCE;
>          } else {
>              helper(va);
>          }
> @@ -1374,7 +1383,7 @@ static ExitStatus gen_mtpr(DisasContext *ctx, TCGv vb, int regno)
>      case 252:
>          /* HALT */
>          gen_helper_halt(vb);
> -        return EXIT_PC_STALE;
> +        return EXIT_PC_STALE_FORCE;
>
>      case 251:
>          /* ALARM */
> @@ -1388,7 +1397,7 @@ static ExitStatus gen_mtpr(DisasContext *ctx, TCGv vb, int regno)
>             that ended with a CALL_PAL.  Since the base register usually only
>             changes during boot, flushing everything works well.  */
>          gen_helper_tb_flush(cpu_env);
> -        return EXIT_PC_STALE;
> +        return EXIT_PC_STALE_FORCE;
>
>      case 32 ... 39:
>          /* Accessing the "non-shadow" general registers.  */
> @@ -2373,7 +2382,7 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
>                  gen_io_start();
>                  gen_helper_load_pcc(va, cpu_env);
>                  gen_io_end();
> -                ret = EXIT_PC_STALE;
> +                ret = EXIT_PC_STALE_FORCE;
>              } else {
>                  gen_helper_load_pcc(va, cpu_env);
>              }
> @@ -2990,18 +2999,32 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
>      case EXIT_GOTO_TB:
>      case EXIT_NORETURN:
>          break;
> +
>      case EXIT_PC_STALE:
>          tcg_gen_movi_i64(cpu_pc, ctx.pc);
> -        /* FALLTHRU */
> +        goto do_exit_pc_updated;
> +    case EXIT_PC_STALE_FORCE:
> +        tcg_gen_movi_i64(cpu_pc, ctx.pc);
> +        goto do_exit_pc_updated_force;
> +
>      case EXIT_PC_UPDATED:
> +    do_exit_pc_updated:
> +        if (!use_exit_tb(&ctx)) {
> +            tcg_gen_lookup_and_goto_ptr(cpu_pc);
> +            break;
> +        }
> +        /* FALLTHRU */
> +    case EXIT_PC_UPDATED_FORCE:
> +    do_exit_pc_updated_force:
>          if (ctx.singlestep_enabled) {
>              gen_excp_1(EXCP_DEBUG, 0);
>          } else {
>              tcg_gen_exit_tb(0);
>          }
>          break;

Hmm, this is ugly. I think you can make it a bit cleaner:

    case EXIT_PC_STALE:
        tcg_gen_movi_i64(cpu_pc, ctx.pc);
        /* FALLTHRU */
    case EXIT_PC_UPDATED:
        if (!use_exit_tb(&ctx)) {
            tcg_gen_lookup_and_goto_ptr(cpu_pc);
            break;
        }
        goto do_exit_pc_updated_force;

    case EXIT_PC_STALE_FORCE:
        tcg_gen_movi_i64(cpu_pc, ctx.pc);
        /* FALLTHRU */

    case EXIT_PC_UPDATED_FORCE:
    do_exit_pc_updated_force:
        if (ctx.singlestep_enabled) {
            gen_excp_1(EXCP_DEBUG, 0);
        } else {
            tcg_gen_exit_tb(0);
        }
        break;

But personally I'd be tempted to move the forced exit into an inline
function and have:

static inline void gen_exit_or_excp(DisasContext *ctx)
{
    if (ctx->singlestep_enabled) {
        gen_excp_1(EXCP_DEBUG, 0);
    } else {
        tcg_gen_exit_tb(0);
    }
}

and:

    case EXIT_PC_STALE:
        tcg_gen_movi_i64(cpu_pc, ctx.pc);
        /* FALLTHRU */
    case EXIT_PC_UPDATED:
        if (!use_exit_tb(&ctx)) {
            tcg_gen_lookup_and_goto_ptr(cpu_pc);
            break;
        }
        gen_exit_or_excp(&ctx);
        break;

    case EXIT_PC_STALE_FORCE:
        tcg_gen_movi_i64(cpu_pc, ctx.pc);
        /* FALLTHRU */

    case EXIT_PC_UPDATED_FORCE:
        gen_exit_or_excp(&ctx);
        break;

    default:
        g_assert_not_reached();
    }
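
One way to convince yourself that a restructuring like this keeps the
behaviour of the original switch is to model it outside of TCG. The
sketch below is an assumption-laden stand-in, not the alpha translator:
ModelCtx, ModelAction and the model_* functions are hypothetical, the
code emitters are replaced by an enum naming what would be emitted, the
pc update for the STALE cases is left out, and the global singlestep
flag is ignored.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    EXIT_PC_STALE,
    EXIT_PC_UPDATED,
    EXIT_PC_STALE_FORCE,
    EXIT_PC_UPDATED_FORCE
} ModelExit;

/* What the end of the TB would emit (hypothetical names). */
typedef enum { ACT_GOTO_PTR, ACT_EXCP_DEBUG, ACT_EXIT_TB } ModelAction;

typedef struct {
    bool singlestep_enabled;
    bool last_io;               /* models tb->cflags & CF_LAST_IO */
} ModelCtx;

/* Mirrors use_exit_tb() minus the global singlestep flag. */
static bool model_use_exit_tb(const ModelCtx *ctx)
{
    return ctx->last_io || ctx->singlestep_enabled;
}

/* Mirrors the suggested helper: debug exception or plain exit. */
static ModelAction model_exit_or_excp(const ModelCtx *ctx)
{
    return ctx->singlestep_enabled ? ACT_EXCP_DEBUG : ACT_EXIT_TB;
}

static ModelAction model_finish_tb(const ModelCtx *ctx, ModelExit status)
{
    switch (status) {
    case EXIT_PC_STALE:
    case EXIT_PC_UPDATED:
        if (!model_use_exit_tb(ctx)) {
            return ACT_GOTO_PTR;    /* chaining allowed */
        }
        return model_exit_or_excp(ctx);
    case EXIT_PC_STALE_FORCE:
    case EXIT_PC_UPDATED_FORCE:
        return model_exit_or_excp(ctx);
    }
    assert(!"unreachable exit status");
    return ACT_EXIT_TB;
}
```

The only difference between the plain and the _FORCE statuses in this
model is whether the goto_ptr path is considered at all.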



> +
>      default:
> -        abort();
> +        g_assert_not_reached();
>      }
>
>      gen_tb_end(tb, num_insns);


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v5 13/19] tb-hash: improve tb_jmp_cache hash function in user mode
  2017-04-28 17:00   ` Alex Bennée
@ 2017-04-28 17:44     ` Emilio G. Cota
  0 siblings, 0 replies; 58+ messages in thread
From: Emilio G. Cota @ 2017-04-28 17:44 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Richard Henderson, qemu-devel

On Fri, Apr 28, 2017 at 18:00:57 +0100, Alex Bennée wrote:
> I'll note that when I've plotted hit rates against the cache, we don't
> seem to be making good, even use of the cache over time. But I suspect
> there is more that could be done here.

If you're talking about full-system mode here, then yes, the hit rate
isn't great, which I think is due to frequent flushes of tb_jmp_cache.

For user-mode, on many of these benchmarks we get hit rates in the high
90s. I also tried xxhash and hit rates didn't go up (sometimes they even
went one or two percentage points lower), which suggests that the hashing
function we have now is good enough, i.e. unless we increase the size of
tb_jmp_cache, there's not a lot to improve here.

I didn't check all benchmarks though, just a few of the SPECint06 ones.
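
For anyone wanting to repeat this kind of measurement offline, a hit
count for a direct-mapped cache can be obtained by replaying a trace of
lookup addresses. The following is only a sketch under stated
assumptions: the trace format (a flat array of pcs), the names
sim_count_hits, sim_hash_xor_fold and SIM_CACHE_BITS, and the 4096-entry
size are all made up for illustration, and cache flushes are not
modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4096 slots, direct-mapped, never flushed. */
#define SIM_CACHE_BITS 12
#define SIM_CACHE_SIZE (1u << SIM_CACHE_BITS)

typedef unsigned int (*SimHashFn)(uint64_t pc);

/* Same shape as the user-mode hash in the patch. */
static unsigned int sim_hash_xor_fold(uint64_t pc)
{
    return (unsigned int)(pc ^ (pc >> SIM_CACHE_BITS))
           & (SIM_CACHE_SIZE - 1);
}

/* Replay a trace of looked-up pcs and return how many lookups hit. */
static size_t sim_count_hits(const uint64_t *trace, size_t n, SimHashFn hash)
{
    uint64_t slot_pc[SIM_CACHE_SIZE];
    unsigned char slot_valid[SIM_CACHE_SIZE] = { 0 };
    size_t hits = 0;

    assert(trace != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned int idx = hash(trace[i]);

        assert(idx < SIM_CACHE_SIZE);
        if (slot_valid[idx] && slot_pc[idx] == trace[i]) {
            hits++;
        } else {
            /* Miss: the slow path would find the block and refill the slot. */
            slot_pc[idx] = trace[i];
            slot_valid[idx] = 1;
        }
    }
    return hits;
}
```

Dividing the returned count by the trace length gives the hit rate;
passing a different SimHashFn allows two hash functions to be compared
on the same trace.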

If you have those numbers/plots I'd be interested in seeing them.

Thanks,

		E.


* [Qemu-devel] [PATCH v5+] TCG cross-tb optimizations
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (19 preceding siblings ...)
  2017-04-27 12:58 ` [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations no-reply
@ 2017-04-28 19:17 ` Emilio G. Cota
  2017-04-28 19:17   ` [Qemu-devel] [PATCH v5 + 1/2] target/aarch64: optimize cross-page direct jumps in softmmu Emilio G. Cota
  2017-04-28 19:17   ` [Qemu-devel] [PATCH v5 + 2/2] target/aarch64: optimize indirect branches Emilio G. Cota
  2017-04-30 14:52 ` [Qemu-devel] [PATCH v5++] TCG cross-tb optimizations Aurelien Jarno
  21 siblings, 2 replies; 58+ messages in thread
From: Emilio G. Cota @ 2017-04-28 19:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson, alex.bennee

While at it, we might want to include these two for aarch64.

They apply on top of Richard's tcg-next branch, which includes v5.

Thanks,

		Emilio


* [Qemu-devel] [PATCH v5 + 1/2] target/aarch64: optimize cross-page direct jumps in softmmu
  2017-04-28 19:17 ` [Qemu-devel] [PATCH v5+] " Emilio G. Cota
@ 2017-04-28 19:17   ` Emilio G. Cota
  2017-04-28 19:22     ` Emilio G. Cota
  2017-04-28 19:17   ` [Qemu-devel] [PATCH v5 + 2/2] target/aarch64: optimize indirect branches Emilio G. Cota
  1 sibling, 1 reply; 58+ messages in thread
From: Emilio G. Cota @ 2017-04-28 19:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson, alex.bennee

Perf numbers in next commit's log.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 target/arm/translate-a64.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 24de30d..5b691fc 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -373,8 +373,7 @@ static inline void gen_goto_tb(DisasContext *s, int n, uint64_t dest)
         } else if (s->singlestep_enabled) {
             gen_exception_internal(EXCP_DEBUG);
         } else {
-            tcg_gen_exit_tb(0);
-            s->is_jmp = DISAS_TB_JUMP;
+            tcg_gen_lookup_and_goto_ptr(cpu_pc);
         }
     }
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH v5 + 2/2] target/aarch64: optimize indirect branches
  2017-04-28 19:17 ` [Qemu-devel] [PATCH v5+] " Emilio G. Cota
  2017-04-28 19:17   ` [Qemu-devel] [PATCH v5 + 1/2] target/aarch64: optimize cross-page direct jumps in softmmu Emilio G. Cota
@ 2017-04-28 19:17   ` Emilio G. Cota
  2017-04-28 21:19     ` Emilio G. Cota
  2017-04-30  9:47     ` Richard Henderson
  1 sibling, 2 replies; 58+ messages in thread
From: Emilio G. Cota @ 2017-04-28 19:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson, alex.bennee

Measurements:

[Baseline performance is that before applying this and the previous commit]

-                                    NBench, aarch64-softmmu. Host: Intel i7-4790K @ 4.00GHz

 1.7x +-+--------------------------------------------------------------------------------------------------------------+-+
      |                                                                                                                  |
      |   cross                                                                                                          |
 1.6x +cross+jr.................................................####...................................................+-+
      |                                                         #++#                                                     |
      |                                                         #  #                                                     |
 1.5x +-+...................................................*****..#...................................................+-+
      |                                                     *+++*  #                                                     |
      |                                                     *   *  #                                                     |
 1.4x +-+...................................................*...*..#...................................................+-+
      |                                                     *   *  #                                                     |
      |                                     #####           *   *  #                                                     |
 1.3x +-+................................****+++#...........*...*..#...................................................+-+
      |                                  *++*   #           *   *  #                                                     |
      |                                  *  *   #           *   *  #                                                     |
 1.2x +-+................................*..*...#...........*...*..#...................................................+-+
      |                                  *  *   #           *   *  #                                                     |
      |                            ####  *  *   #           *   *  #                                                     |
 1.1x +-+.......................+++#..#..*..*...#...........*...*..#...................................................+-+
      |                         ****  #  *  *   #           *   *  #                                        ****####     |
      |                         *  *  #  *  *   #           *   *  #  ****###   +++####            ****###  *  *   #     |
   1x +-++-++++++-++++****###++-*++*++#++*++*+-+#++****+++++*+++*++#++*++*-+#++*****++#++****###-++*++*-+#++*+-*+++#+-++-+
      |     *****###  *  *  #   *  *  #  *  *   #  *++*###  *   *  #  *  *  #  *   *  #  *  *++#   *  *  #  *  *   #     |
      |     *   *++#  *  *  #   *  *  #  *  *   #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #   *  *  #  *  *   #     |
 0.9x +-+---*****###--****###---****###--****####--****###--*****###--****###--*****###--****###---****###--****####---+-+
      ASSIGNMENT  BITFIELD  FOURIER  FP EMULATION  HUFFMAN  IDEA  LU DECOMPOSITION  NEURAL NET  NUMERIC SORT  STRING SORT  hmean
  png: http://imgur.com/qO9ubtk
NB. cross here represents the previous commit.

-                            SPECint06 (test set), x86_64-linux-user. Host: Intel i7-4790K @ 4.00GHz

 1.5x +-+--------------------------------------------------------------------------------------------------------------+-+
      |                                                                       *****                                      |
      |                                                                       *+++*                           jr         |
      |                                                                       *   *                                      |
 1.4x +-+.....................................................................*...*.....................+++............+-+
      |                                                                       *   *                      |               |
      |                                      *****                            *   *                      |               |
      |                                      *   *                            *   *                    *****             |
 1.3x +-+....................................*...*............................*...*....................*.|.*...........+-+
      |                       +++            *   *                            *   *                    * | *             |
      |                      *****           *   *                            *   *                    *+++*             |
      |                      *   *           *   *                            *   *                    *   *             |
 1.2x +-+....................*...*...........*...*............................*...*...........*****....*...*...........+-+
      |     *****            *   *           *   *                            *   *           *   *    *   *    +++      |
      |     *   *            *   *           *   *                            *   *           *   *    *   *   *****     |
      |     *   *            *   *   *****   *   *                            *   *           *   *    *   *   *   *     |
 1.1x +-+...*...*............*...*...*...*...*...*............................*...*....+++....*...*....*...*...*...*...+-+
      |     *   *            *   *   *   *   *   *                            *   *   *****   *   *    *   *   *   *     |
      |     *   *            *   *   *   *   *   *   *****                    *   *   *   *   *   *    *   *   *   *     |
      |     *   *   *****    *   *   *   *   *   *   *   *   ******           *   *   *   *   *   *    *   *   *   *     |
   1x +-++-+*+++*-++*+++*++++*+-+*+++*-++*+++*-++*+++*+++*++-*++++*-++*****+++*++-*+++*++-*+++*+-+*++++*+++*++-*+++*+-++-+
      |     *   *   *   *    *   *   *   *   *   *   *   *   *    *   *+++*   *   *   *   *   *   *    *   *   *   *     |
      |     *   *   *   *    *   *   *   *   *   *   *   *   *    *   *   *   *   *   *   *   *   *    *   *   *   *     |
      |     *   *   *   *    *   *   *   *   *   *   *   *   *    *   *   *   *   *   *   *   *   *    *   *   *   *     |
 0.9x +-+---*****---*****----*****---*****---*****---*****---******---*****---*****---*****---*****----*****---*****---+-+
         astar   bzip2      gcc   gobmk h264ref   hmmer libquantum    mcf omnetpp perlbench  sjeng xalancbmk  hmean
  png: http://imgur.com/R0FXKxP

-                           SPECint06 (train set), x86_64-linux-user. Host: Intel i7-4790K @ 4.00GHz

 1.7x +-+--------------------------------------------------------------------------------------------------------------+-+
      |                                                                                                                  |
      |                                                                                                       jr         |
 1.6x +-+...............................................................................................+++............+-+
      |                                                                                                *****             |
      |                                                                                                *+++*             |
      |                                                                                                *   *             |
 1.5x +-+..............................................................................................*...*...........+-+
      |                                                                        +++                     *   *             |
      |                                                                       *****                    *   *             |
 1.4x +-+.....................................................................*+++*....................*...*...........+-+
      |                                                                       *   *                    *   *             |
      |                                      *****                            *   *                    *   *             |
      |                                      *   *                            *   *   *****            *   *             |
 1.3x +-+....................................*...*............................*...*...*...*............*...*...........+-+
      |                       +++            *   *                            *   *   *   *            *   *             |
      |                      *****           *   *                            *   *   *   *   *****    *   *             |
 1.2x +-+....................*...*...........*...*............................*...*...*...*...*+++*....*...*...*****...+-+
      |                      *   *           *   *                            *   *   *   *   *   *    *   *   *+++*     |
      |     *****            *   *   *****   *   *                            *   *   *   *   *   *    *   *   *   *     |
      |     *   *            *   *   *+++*   *   *                            *   *   *   *   *   *    *   *   *   *     |
 1.1x +-+...*...*............*...*...*...*...*...*............................*...*...*...*...*...*....*...*...*...*...+-+
      |     *   *   *****    *   *   *   *   *   *                    *****   *   *   *   *   *   *    *   *   *   *     |
      |     *   *   *   *    *   *   *   *   *   *    +++    ******   *+++*   *   *   *   *   *   *    *   *   *   *     |
   1x +-+---*****---*****----*****---*****---*****---*****---******---*****---*****---*****---*****----*****---*****---+-+
         astar   bzip2      gcc   gobmk h264ref   hmmer libquantum    mcf omnetpp perlbench  sjeng xalancbmk  hmean
  png: http://imgur.com/DXzwyLP

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 target/arm/translate-a64.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 5b691fc..46cb6c5 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -11360,8 +11360,7 @@ void gen_intermediate_code_a64(ARMCPU *cpu, TranslationBlock *tb)
             gen_a64_set_pc_im(dc->pc);
             /* fall through */
         case DISAS_JUMP:
-            /* indicate that the hash table must be used to find the next TB */
-            tcg_gen_exit_tb(0);
+            tcg_gen_lookup_and_goto_ptr(cpu_pc);
             break;
         case DISAS_TB_JUMP:
         case DISAS_EXC:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 + 1/2] target/aarch64: optimize cross-page direct jumps in softmmu
  2017-04-28 19:17   ` [Qemu-devel] [PATCH v5 + 1/2] target/aarch64: optimize cross-page direct jumps in softmmu Emilio G. Cota
@ 2017-04-28 19:22     ` Emilio G. Cota
  2017-04-29 10:30       ` Richard Henderson
  0 siblings, 1 reply; 58+ messages in thread
From: Emilio G. Cota @ 2017-04-28 19:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson, alex.bennee

On Fri, Apr 28, 2017 at 15:17:24 -0400, Emilio G. Cota wrote:
> Perf numbers in next commit's log.
> 
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  target/arm/translate-a64.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index 24de30d..5b691fc 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -373,8 +373,7 @@ static inline void gen_goto_tb(DisasContext *s, int n, uint64_t dest)
>          } else if (s->singlestep_enabled) {
>              gen_exception_internal(EXCP_DEBUG);
>          } else {
> -            tcg_gen_exit_tb(0);
> -            s->is_jmp = DISAS_TB_JUMP;

I'm not sure about removing this line though. Would it be better to leave it?
I can't see how TB_JUMP ends up doing anything in the rest of the file.

Thanks,

		E.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 + 2/2] target/aarch64: optimize indirect branches
  2017-04-28 19:17   ` [Qemu-devel] [PATCH v5 + 2/2] target/aarch64: optimize indirect branches Emilio G. Cota
@ 2017-04-28 21:19     ` Emilio G. Cota
  2017-04-30  9:47     ` Richard Henderson
  1 sibling, 0 replies; 58+ messages in thread
From: Emilio G. Cota @ 2017-04-28 21:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson, alex.bennee

On Fri, Apr 28, 2017 at 15:17:25 -0400, Emilio G. Cota wrote:
> Measurements:
(snip)
> -                            SPECint06 (test set), x86_64-linux-user. Host: Intel i7-4790K @ 4.00GHz
(snip)
> -                           SPECint06 (train set), x86_64-linux-user. Host: Intel i7-4790K @ 4.00GHz
s/x86_64/aarch64/ , obviously.

		E.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 11/19] target/i386: optimize cross-page direct jumps in softmmu
  2017-04-28 16:56   ` Alex Bennée
@ 2017-04-29  9:14     ` Richard Henderson
  0 siblings, 0 replies; 58+ messages in thread
From: Richard Henderson @ 2017-04-29  9:14 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, cota

On 04/28/2017 06:56 PM, Alex Bennée wrote:
> 
> Richard Henderson <rth@twiddle.net> writes:
> 
>> From: "Emilio G. Cota" <cota@braap.org>
>>
>> Instead of unconditionally exiting to the exec loop, use the
>> gen_jr helper to jump to the target if it is valid.
>>
>> Perf impact: see next commit's log.
>>
>> Reviewed-by: Richard Henderson <rth@twiddle.net>
>> Signed-off-by: Emilio G. Cota <cota@braap.org>
>> Message-Id: <1493263764-18657-10-git-send-email-cota@braap.org>
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> ---
>>   target/i386/translate.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/target/i386/translate.c b/target/i386/translate.c
>> index f0e48dc..ea113fe 100644
>> --- a/target/i386/translate.c
>> +++ b/target/i386/translate.c
>> @@ -2154,9 +2154,9 @@ static inline void gen_goto_tb(DisasContext *s, int tb_num, target_ulong eip)
>>           gen_jmp_im(eip);
>>           tcg_gen_exit_tb((uintptr_t)s->tb + tb_num);
>>       } else {
>> -        /* jump to another page: currently not optimized */
>> +        /* jump to another page */
>>           gen_jmp_im(eip);
>> -        gen_eob(s);
>> +        gen_jr(s, cpu_tmp0);
> 
> 
> I had to look up what was going on with cpu_tmp0 there. Is there a
> particular reason i386 has these global temps with implied setting
> rules? It does seem somewhat hacky.

It's mostly hysterical raisins, and that no one has rewritten it yet.

> Given cpu_tmp0 seems to be heavily used across i386 I guess it keeps
> to the style of the translator :-/

Yeah.  :-/


r~

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 + 1/2] target/aarch64: optimize cross-page direct jumps in softmmu
  2017-04-28 19:22     ` Emilio G. Cota
@ 2017-04-29 10:30       ` Richard Henderson
  2017-05-01  2:10         ` Emilio G. Cota
  0 siblings, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-04-29 10:30 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel; +Cc: alex.bennee

On 04/28/2017 09:22 PM, Emilio G. Cota wrote:
> On Fri, Apr 28, 2017 at 15:17:24 -0400, Emilio G. Cota wrote:
>> Perf numbers in next commit's log.
>>
>> Signed-off-by: Emilio G. Cota <cota@braap.org>
>> ---
>>   target/arm/translate-a64.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
>> index 24de30d..5b691fc 100644
>> --- a/target/arm/translate-a64.c
>> +++ b/target/arm/translate-a64.c
>> @@ -373,8 +373,7 @@ static inline void gen_goto_tb(DisasContext *s, int n, uint64_t dest)
>>           } else if (s->singlestep_enabled) {
>>               gen_exception_internal(EXCP_DEBUG);
>>           } else {
>> -            tcg_gen_exit_tb(0);
>> -            s->is_jmp = DISAS_TB_JUMP;
> 
> I'm not sure about removing this line though. Would it be better to leave it?
> I can't see how TB_JUMP ends up doing anything in the rest of the file.

Why not just replace this with

   s->is_jmp = DISAS_JUMP

and not emit the lookup_and_goto_ptr here at all?


r~

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 + 2/2] target/aarch64: optimize indirect branches
  2017-04-28 19:17   ` [Qemu-devel] [PATCH v5 + 2/2] target/aarch64: optimize indirect branches Emilio G. Cota
  2017-04-28 21:19     ` Emilio G. Cota
@ 2017-04-30  9:47     ` Richard Henderson
  2017-04-30 10:17       ` Richard Henderson
  1 sibling, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-04-30  9:47 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel; +Cc: alex.bennee

These aarch64 patches fail for me like so:

$ ../bld/aarch64-softmmu/qemu-system-aarch64 -M virt -cpu cortex-a57 \
     -m 1024 -nographic -kernel ./aarch64-linux-3.15rc2-buildroot.img \
     -append console=ttyAMA0
qemu-system-aarch64: /home/rth/work/qemu/qemu/cpu-exec.c:599: cpu_loop_exec_tb: 
Assertion `use_icount' failed.



r~

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 + 2/2] target/aarch64: optimize indirect branches
  2017-04-30  9:47     ` Richard Henderson
@ 2017-04-30 10:17       ` Richard Henderson
  0 siblings, 0 replies; 58+ messages in thread
From: Richard Henderson @ 2017-04-30 10:17 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel; +Cc: alex.bennee

On 04/30/2017 11:47 AM, Richard Henderson wrote:
> These aarch64 patches fail for me like so:
> 
> $ ../bld/aarch64-softmmu/qemu-system-aarch64 -M virt -cpu cortex-a57 \
>      -m 1024 -nographic -kernel ./aarch64-linux-3.15rc2-buildroot.img \
>      -append console=ttyAMA0
> qemu-system-aarch64: /home/rth/work/qemu/qemu/cpu-exec.c:599: cpu_loop_exec_tb: 
> Assertion `use_icount' failed.

Bah.  Never mind, this is my fault.


r~

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH v5++] TCG cross-tb optimizations
  2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
                   ` (20 preceding siblings ...)
  2017-04-28 19:17 ` [Qemu-devel] [PATCH v5+] " Emilio G. Cota
@ 2017-04-30 14:52 ` Aurelien Jarno
  2017-04-30 14:52   ` [Qemu-devel] [PATCH v5++ 1/3] tcg/mips: implement goto_ptr Aurelien Jarno
                     ` (2 more replies)
  21 siblings, 3 replies; 58+ messages in thread
From: Aurelien Jarno @ 2017-04-30 14:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson, Emilio G . Cota, Aurelien Jarno

Please find patches to support cross-tb optimizations on MIPS hosts
and to implement cross-tb optimizations for MIPS target.

Aurelien Jarno (3):
  tcg/mips: implement goto_ptr
  target/mips: optimize cross-page direct jumps in softmmu
  target/mips: optimize indirect branches

 target/mips/translate.c   |  4 ++--
 tcg/mips/tcg-target.h     |  2 +-
 tcg/mips/tcg-target.inc.c | 13 +++++++++++++
 3 files changed, 16 insertions(+), 3 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH v5++ 1/3] tcg/mips: implement goto_ptr
  2017-04-30 14:52 ` [Qemu-devel] [PATCH v5++] TCG cross-tb optimizations Aurelien Jarno
@ 2017-04-30 14:52   ` Aurelien Jarno
  2017-05-01 22:00     ` Philippe Mathieu-Daudé
  2017-05-02 16:21     ` Richard Henderson
  2017-04-30 14:52   ` [Qemu-devel] [PATCH v5++ 2/3] target/mips: optimize cross-page direct jumps in softmmu Aurelien Jarno
  2017-04-30 14:52   ` [Qemu-devel] [PATCH v5++ 3/3] target/mips: optimize indirect branches Aurelien Jarno
  2 siblings, 2 replies; 58+ messages in thread
From: Aurelien Jarno @ 2017-04-30 14:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson, Emilio G . Cota, Aurelien Jarno

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/mips/tcg-target.h     |  2 +-
 tcg/mips/tcg-target.inc.c | 13 +++++++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index e3240cfba7..d75cb63ed3 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -130,7 +130,7 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_muluh_i32        1
 #define TCG_TARGET_HAS_mulsh_i32        1
 #define TCG_TARGET_HAS_bswap32_i32      1
-#define TCG_TARGET_HAS_goto_ptr         0
+#define TCG_TARGET_HAS_goto_ptr         1
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_add2_i32         0
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 01ac7b2c81..9e5b9f42da 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -1747,6 +1747,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_nop(s);
         s->tb_jmp_reset_offset[a0] = tcg_current_code_size(s);
         break;
+    case INDEX_op_goto_ptr:
+        /* jmp to the given host address (could be epilogue) */
+        tcg_out_opc_reg(s, OPC_JR, 0, a0, 0);
+        tcg_out_nop(s);
+        break;
     case INDEX_op_br:
         tcg_out_brcond(s, TCG_COND_EQ, TCG_REG_ZERO, TCG_REG_ZERO,
                        arg_label(a0));
@@ -2160,6 +2165,7 @@ static const TCGTargetOpDef mips_op_defs[] = {
     { INDEX_op_exit_tb, { } },
     { INDEX_op_goto_tb, { } },
     { INDEX_op_br, { } },
+    { INDEX_op_goto_ptr, { "r" } },
 
     { INDEX_op_ld8u_i32, { "r", "r" } },
     { INDEX_op_ld8s_i32, { "r", "r" } },
@@ -2451,6 +2457,13 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     /* delay slot */
     tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
 
+    /*
+     * Return path for goto_ptr. Set return value to 0, a-la exit_tb,
+     * and fall through to the rest of the epilogue.
+     */
+    s->code_gen_epilogue = s->code_ptr;
+    tcg_out_mov(s, TCG_TYPE_REG, TCG_REG_V0, TCG_REG_ZERO);
+
     /* TB epilogue */
     tb_ret_addr = s->code_ptr;
     for (i = 0; i < ARRAY_SIZE(tcg_target_callee_save_regs); i++) {
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH v5++ 2/3] target/mips: optimize cross-page direct jumps in softmmu
  2017-04-30 14:52 ` [Qemu-devel] [PATCH v5++] TCG cross-tb optimizations Aurelien Jarno
  2017-04-30 14:52   ` [Qemu-devel] [PATCH v5++ 1/3] tcg/mips: implement goto_ptr Aurelien Jarno
@ 2017-04-30 14:52   ` Aurelien Jarno
  2017-04-30 14:52   ` [Qemu-devel] [PATCH v5++ 3/3] target/mips: optimize indirect branches Aurelien Jarno
  2 siblings, 0 replies; 58+ messages in thread
From: Aurelien Jarno @ 2017-04-30 14:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Emilio G . Cota, Aurelien Jarno, Yongbok Kim

Cc: Yongbok Kim <yongbok.kim@imgtec.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target/mips/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index 3022f349cb..1a7ac07c67 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -4233,7 +4233,7 @@ static inline void gen_goto_tb(DisasContext *ctx, int n, target_ulong dest)
             save_cpu_state(ctx, 0);
             gen_helper_raise_exception_debug(cpu_env);
         }
-        tcg_gen_exit_tb(0);
+        tcg_gen_lookup_and_goto_ptr(cpu_PC);
     }
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [Qemu-devel] [PATCH v5++ 3/3] target/mips: optimize indirect branches
  2017-04-30 14:52 ` [Qemu-devel] [PATCH v5++] TCG cross-tb optimizations Aurelien Jarno
  2017-04-30 14:52   ` [Qemu-devel] [PATCH v5++ 1/3] tcg/mips: implement goto_ptr Aurelien Jarno
  2017-04-30 14:52   ` [Qemu-devel] [PATCH v5++ 2/3] target/mips: optimize cross-page direct jumps in softmmu Aurelien Jarno
@ 2017-04-30 14:52   ` Aurelien Jarno
  2 siblings, 0 replies; 58+ messages in thread
From: Aurelien Jarno @ 2017-04-30 14:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Emilio G . Cota, Aurelien Jarno, Yongbok Kim

Cc: Yongbok Kim <yongbok.kim@imgtec.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 target/mips/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index 1a7ac07c67..559f8fed89 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -10725,7 +10725,7 @@ static void gen_branch(DisasContext *ctx, int insn_bytes)
                 save_cpu_state(ctx, 0);
                 gen_helper_raise_exception_debug(cpu_env);
             }
-            tcg_gen_exit_tb(0);
+            tcg_gen_lookup_and_goto_ptr(cpu_PC);
             break;
         default:
             fprintf(stderr, "unknown branch 0x%x\n", proc_hflags);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5 + 1/2] target/aarch64: optimize cross-page direct jumps in softmmu
  2017-04-29 10:30       ` Richard Henderson
@ 2017-05-01  2:10         ` Emilio G. Cota
  0 siblings, 0 replies; 58+ messages in thread
From: Emilio G. Cota @ 2017-05-01  2:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, alex.bennee

On Sat, Apr 29, 2017 at 12:30:08 +0200, Richard Henderson wrote:
> On 04/28/2017 09:22 PM, Emilio G. Cota wrote:
> >On Fri, Apr 28, 2017 at 15:17:24 -0400, Emilio G. Cota wrote:
> >>+++ b/target/arm/translate-a64.c
> >>@@ -373,8 +373,7 @@ static inline void gen_goto_tb(DisasContext *s, int n, uint64_t dest)
> >>          } else if (s->singlestep_enabled) {
> >>              gen_exception_internal(EXCP_DEBUG);
> >>          } else {
> >>-            tcg_gen_exit_tb(0);
> >>-            s->is_jmp = DISAS_TB_JUMP;
> >
> >I'm not sure about removing this line though. Would it be better to leave it?
> >I can't see how TB_JUMP ends up doing anything in the rest of the file.
> 
> Why not just replace this with
> 
>   s->is_jmp = DISAS_JUMP
> 
> and not emit the lookup_and_goto_ptr here at all?

If we don't emit anything here, we get the error you reported
in the other message (the `use_icount' assertion at cpu-exec.c:599).

I think this is due to callers assuming gen_goto_tb does indeed
generate code, instead of deferring it via is_jmp. For example:

    if (cond < 0x0e) {
        /* genuinely conditional branches */
        TCGLabel *label_match = gen_new_label();
        arm_gen_test_cc(cond, label_match);
        gen_goto_tb(s, 0, s->pc);
        gen_set_label(label_match);
        gen_goto_tb(s, 1, addr);
    } else { [...]

So the simplest solution here seems to be to just emit the goto_ptr
helper in gen_goto_tb().

Regarding the setting of is_jmp to DISAS_TB_JUMP, after having
looked at the code more closely, I think it shouldn't
be removed, since this is the way we break out of the loop in
gen_intermediate_code(), thereby marking this instruction as the
last of the current TB.
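
The two points above can be sketched with a toy model. This is not the
translate-a64.c code: every toy_* name and TOY_* constant is made up
for illustration, and "emitting code" is reduced to bumping a counter.
The model only shows why each gen_goto_tb() call has to emit its own
exit (a conditional branch calls it twice) and why is_jmp still has to
be set so that the translation loop stops.

```c
/* Hypothetical stand-ins for the DISAS_* states. */
enum { TOY_DISAS_NEXT, TOY_DISAS_JUMP, TOY_DISAS_TB_JUMP };

typedef struct {
    int is_jmp;         /* why the translation loop should stop */
    int exits_emitted;  /* number of TB exits generated so far */
    int insns;          /* instructions translated so far */
} ToyCtx;

/* Emits its exit immediately and marks the TB as finished. */
static void toy_gen_goto_tb(ToyCtx *s)
{
    s->exits_emitted++;
    s->is_jmp = TOY_DISAS_TB_JUMP;
}

/* A conditional branch needs two exits: not taken and taken. */
static void toy_gen_cond_branch(ToyCtx *s)
{
    toy_gen_goto_tb(s);   /* fall-through path */
    toy_gen_goto_tb(s);   /* branch target */
}

/* Translate until an instruction ends the TB or max_insns is reached. */
static void toy_translate(ToyCtx *s, int branch_at, int max_insns)
{
    s->is_jmp = TOY_DISAS_NEXT;
    s->exits_emitted = 0;
    s->insns = 0;

    do {
        if (s->insns == branch_at) {
            toy_gen_cond_branch(s);
        }
        s->insns++;
    } while (s->is_jmp == TOY_DISAS_NEXT && s->insns < max_insns);

    /* A deferred exit can only ever produce one exit per TB. */
    if (s->is_jmp == TOY_DISAS_JUMP) {
        s->exits_emitted++;
    }
}
```

If toy_gen_goto_tb() only set is_jmp to TOY_DISAS_JUMP and emitted
nothing, the conditional branch would end up with a single deferred
exit for its two paths.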

I have updated patch 1/2 accordingly. You can cherry-pick it from:
  https://github.com/cota/qemu/tree/tcg-next-v5+

Thanks,

		Emilio

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5++ 1/3] tcg/mips: implement goto_ptr
  2017-04-30 14:52   ` [Qemu-devel] [PATCH v5++ 1/3] tcg/mips: implement goto_ptr Aurelien Jarno
@ 2017-05-01 22:00     ` Philippe Mathieu-Daudé
  2017-05-02 16:21     ` Richard Henderson
  1 sibling, 0 replies; 58+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-05-01 22:00 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Emilio G . Cota, Richard Henderson

On 04/30/2017 11:52 AM, Aurelien Jarno wrote:
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

> ---
>  tcg/mips/tcg-target.h     |  2 +-
>  tcg/mips/tcg-target.inc.c | 13 +++++++++++++
>  2 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
> index e3240cfba7..d75cb63ed3 100644
> --- a/tcg/mips/tcg-target.h
> +++ b/tcg/mips/tcg-target.h
> @@ -130,7 +130,7 @@ extern bool use_mips32r2_instructions;
>  #define TCG_TARGET_HAS_muluh_i32        1
>  #define TCG_TARGET_HAS_mulsh_i32        1
>  #define TCG_TARGET_HAS_bswap32_i32      1
> -#define TCG_TARGET_HAS_goto_ptr         0
> +#define TCG_TARGET_HAS_goto_ptr         1
>
>  #if TCG_TARGET_REG_BITS == 64
>  #define TCG_TARGET_HAS_add2_i32         0
> diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
> index 01ac7b2c81..9e5b9f42da 100644
> --- a/tcg/mips/tcg-target.inc.c
> +++ b/tcg/mips/tcg-target.inc.c
> @@ -1747,6 +1747,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          tcg_out_nop(s);
>          s->tb_jmp_reset_offset[a0] = tcg_current_code_size(s);
>          break;
> +    case INDEX_op_goto_ptr:
> +        /* jmp to the given host address (could be epilogue) */
> +        tcg_out_opc_reg(s, OPC_JR, 0, a0, 0);
> +        tcg_out_nop(s);
> +        break;
>      case INDEX_op_br:
>          tcg_out_brcond(s, TCG_COND_EQ, TCG_REG_ZERO, TCG_REG_ZERO,
>                         arg_label(a0));
> @@ -2160,6 +2165,7 @@ static const TCGTargetOpDef mips_op_defs[] = {
>      { INDEX_op_exit_tb, { } },
>      { INDEX_op_goto_tb, { } },
>      { INDEX_op_br, { } },
> +    { INDEX_op_goto_ptr, { "r" } },
>
>      { INDEX_op_ld8u_i32, { "r", "r" } },
>      { INDEX_op_ld8s_i32, { "r", "r" } },
> @@ -2451,6 +2457,13 @@ static void tcg_target_qemu_prologue(TCGContext *s)
>      /* delay slot */
>      tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
>
> +    /*
> +     * Return path for goto_ptr. Set return value to 0, a-la exit_tb,
> +     * and fall through to the rest of the epilogue.
> +     */
> +    s->code_gen_epilogue = s->code_ptr;
> +    tcg_out_mov(s, TCG_TYPE_REG, TCG_REG_V0, TCG_REG_ZERO);
> +
>      /* TB epilogue */
>      tb_ret_addr = s->code_ptr;
>      for (i = 0; i < ARRAY_SIZE(tcg_target_callee_save_regs); i++) {
>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [Qemu-devel] [PATCH v5++ 1/3] tcg/mips: implement goto_ptr
  2017-04-30 14:52   ` [Qemu-devel] [PATCH v5++ 1/3] tcg/mips: implement goto_ptr Aurelien Jarno
  2017-05-01 22:00     ` Philippe Mathieu-Daudé
@ 2017-05-02 16:21     ` Richard Henderson
  2017-05-02 19:38       ` Aurelien Jarno
  1 sibling, 1 reply; 58+ messages in thread
From: Richard Henderson @ 2017-05-02 16:21 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Emilio G . Cota

On 04/30/2017 04:52 PM, Aurelien Jarno wrote:
> +        /* jmp to the given host address (could be epilogue) */
> +        tcg_out_opc_reg(s, OPC_JR, 0, a0, 0);
> +        tcg_out_nop(s);

Any particular reason not to do the zeroing in the delay slot...

> +    s->code_gen_epilogue = s->code_ptr;
> +    tcg_out_mov(s, TCG_TYPE_REG, TCG_REG_V0, TCG_REG_ZERO);

... instead of here?


r~


* Re: [Qemu-devel] [PATCH v5++ 1/3] tcg/mips: implement goto_ptr
  2017-05-02 16:21     ` Richard Henderson
@ 2017-05-02 19:38       ` Aurelien Jarno
  0 siblings, 0 replies; 58+ messages in thread
From: Aurelien Jarno @ 2017-05-02 19:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Emilio G . Cota

On 2017-05-02 18:21, Richard Henderson wrote:
> On 04/30/2017 04:52 PM, Aurelien Jarno wrote:
> > +        /* jmp to the given host address (could be epilogue) */
> > +        tcg_out_opc_reg(s, OPC_JR, 0, a0, 0);
> > +        tcg_out_nop(s);
> 
> Any particular reason not to do the zeroing in the delay slot...
> 
> > +    s->code_gen_epilogue = s->code_ptr;
> > +    tcg_out_mov(s, TCG_TYPE_REG, TCG_REG_V0, TCG_REG_ZERO);
> 
> ... instead of here?

There is no particular reason in the current usage of goto_ptr. It's
just that in the future we might want to use code_gen_epilogue for
other reasons or use the tcg_out_opc_reg to do other things. It's
probably better to have a consistent behaviour across all TCG
targets.

That said, if you prefer, I am fine with sending a v2 with the zeroing
moved to the delay slot.
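
For reference, the two variants under discussion differ only in the word
placed in the delay slot of the jr. The sketch below is standalone and
illustrative, not the QEMU backend: encode_rtype(), emit_goto_ptr_nop()
and emit_goto_ptr_zero_v0() are hypothetical names, and only the MIPS32
R-type encodings of jr and or are taken as given. It assumes "move v0,
zero" is encoded as "or v0, zero, zero".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MIPS32 R-type function codes under the SPECIAL opcode (0). */
enum { FUNCT_JR = 0x08, FUNCT_OR = 0x25 };

/* Register numbers used in this sketch. */
enum { REG_ZERO = 0, REG_V0 = 2 };

/*
 * Encode an R-type instruction:
 *   opcode(0) | rs << 21 | rt << 16 | rd << 11 | shamt(0) | funct
 * Hypothetical helper, loosely modelled on the (opc, rd, rs, rt)
 * argument order seen in the quoted patch.
 */
static uint32_t encode_rtype(unsigned funct, unsigned rd,
                             unsigned rs, unsigned rt)
{
    assert(rd < 32 && rs < 32 && rt < 32 && funct < 64);
    return ((uint32_t)rs << 21) | ((uint32_t)rt << 16) |
           ((uint32_t)rd << 11) | funct;
}

/* Variant as posted: jr followed by a nop in the delay slot. */
static size_t emit_goto_ptr_nop(uint32_t *buf, unsigned target_reg)
{
    buf[0] = encode_rtype(FUNCT_JR, 0, target_reg, 0);
    buf[1] = 0;                       /* nop */
    return 2;                         /* words emitted */
}

/* Variant suggested in review: zero v0 in the delay slot instead. */
static size_t emit_goto_ptr_zero_v0(uint32_t *buf, unsigned target_reg)
{
    buf[0] = encode_rtype(FUNCT_JR, 0, target_reg, 0);
    buf[1] = encode_rtype(FUNCT_OR, REG_V0, REG_ZERO, REG_ZERO);
    return 2;                         /* words emitted */
}
```

Both variants emit two words per goto_ptr. With the second one, the
separate move at the code_gen_epilogue entry point would presumably no
longer be needed, at the cost of the epilogue entry no longer zeroing v0
by itself.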

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


end of thread, other threads:[~2017-05-02 19:38 UTC | newest]

Thread overview: 58+ messages
2017-04-27 11:59 [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations Richard Henderson
2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 01/19] target/nios2: Fix 64-bit ilp32 compilation Richard Henderson
2017-04-27 16:03   ` Alex Bennée
2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 02/19] tcg/sparc: Use the proper compilation flags for 32-bit Richard Henderson
2017-04-27 16:04   ` Alex Bennée
2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 03/19] qemu/atomic: Loosen restrictions for 64-bit ILP32 hosts Richard Henderson
2017-04-27 16:10   ` Alex Bennée
2017-04-28  7:07     ` Richard Henderson
2017-04-28  7:47       ` Alex Bennée
2017-04-28  8:05         ` Richard Henderson
2017-04-28 10:25           ` Alex Bennée
2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 04/19] exec-all: export tb_htable_lookup Richard Henderson
2017-04-27 16:10   ` Alex Bennée
2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 05/19] tcg-runtime: add lookup_tb_ptr helper Richard Henderson
2017-04-28 10:29   ` Alex Bennée
2017-04-28 10:32     ` Richard Henderson
2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 06/19] tcg: introduce goto_ptr opcode Richard Henderson
2017-04-28 10:32   ` Alex Bennée
2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 07/19] tcg: export tcg_gen_lookup_and_goto_ptr Richard Henderson
2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 08/19] target/arm: optimize cross-page direct jumps in softmmu Richard Henderson
2017-04-28 11:30   ` Alex Bennée
2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 09/19] target/arm: optimize indirect branches Richard Henderson
2017-04-27 22:58   ` Emilio G. Cota
2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 10/19] target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr Richard Henderson
2017-04-28 16:50   ` Alex Bennée
2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 11/19] target/i386: optimize cross-page direct jumps in softmmu Richard Henderson
2017-04-28 16:56   ` Alex Bennée
2017-04-29  9:14     ` Richard Henderson
2017-04-27 11:59 ` [Qemu-devel] [PATCH v5 12/19] target/i386: optimize indirect branches Richard Henderson
2017-04-28 16:58   ` Alex Bennée
2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 13/19] tb-hash: improve tb_jmp_cache hash function in user mode Richard Henderson
2017-04-28 17:00   ` Alex Bennée
2017-04-28 17:44     ` Emilio G. Cota
2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 14/19] target/alpha: Use tcg_gen_goto_ptr Richard Henderson
2017-04-28 17:10   ` Alex Bennée
2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 15/19] tcg/i386: implement goto_ptr Richard Henderson
2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 16/19] tcg/ppc: Implement goto_ptr Richard Henderson
2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 17/19] tcg/aarch64: " Richard Henderson
2017-04-27 22:18   ` Emilio G. Cota
2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 18/19] tcg/sparc: " Richard Henderson
2017-04-27 12:00 ` [Qemu-devel] [PATCH v5 19/19] tcg/s390: " Richard Henderson
2017-04-27 12:58 ` [Qemu-devel] [PATCH v5 00/19] TCG cross-tb optimizations no-reply
2017-04-28 19:17 ` [Qemu-devel] [PATCH v5+] " Emilio G. Cota
2017-04-28 19:17   ` [Qemu-devel] [PATCH v5 + 1/2] target/aarch64: optimize cross-page direct jumps in softmmu Emilio G. Cota
2017-04-28 19:22     ` Emilio G. Cota
2017-04-29 10:30       ` Richard Henderson
2017-05-01  2:10         ` Emilio G. Cota
2017-04-28 19:17   ` [Qemu-devel] [PATCH v5 + 2/2] target/aarch64: optimize indirect branches Emilio G. Cota
2017-04-28 21:19     ` Emilio G. Cota
2017-04-30  9:47     ` Richard Henderson
2017-04-30 10:17       ` Richard Henderson
2017-04-30 14:52 ` [Qemu-devel] [PATCH v5++] TCG cross-tb optimizations Aurelien Jarno
2017-04-30 14:52   ` [Qemu-devel] [PATCH v5++ 1/3] tcg/mips: implement goto_ptr Aurelien Jarno
2017-05-01 22:00     ` Philippe Mathieu-Daudé
2017-05-02 16:21     ` Richard Henderson
2017-05-02 19:38       ` Aurelien Jarno
2017-04-30 14:52   ` [Qemu-devel] [PATCH v5++ 2/3] target/mips: optimize cross-page direct jumps in softmmu Aurelien Jarno
2017-04-30 14:52   ` [Qemu-devel] [PATCH v5++ 3/3] target/mips: optimize indirect branches Aurelien Jarno