All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts
@ 2017-07-20  3:08 Emilio G. Cota
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 01/43] cputlb: bring back tlb_flush_count under !TLB_DEBUG Emilio G. Cota
                   ` (43 more replies)
  0 siblings, 44 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

v2:
  https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg04749.html

v3 applies on top of the current master (d4e59218a).

To ease review/testing, you can pull this series from:
  https://github.com/cota/qemu/tree/multi-tcg-v3

Note: I cannot even compile-test _WIN32 bits, help appreciated! See
patches 39/40.

Changes from v2:
- Rebase on top of current master (therefore dropping the first 2 patches,
  which are already on master)
  - Add sh4 bits, touching:
    - Removal of argument to tb_lookup_ptr (merged into otherwise same v2 patch)
    - tb_cflags() inline (new patch in v3 for sh4 and all other arches)
    - CF_PARALLEL instead of parallel_cpus (sh4-only patch in v3)
- Add R-b tags
- Drop the patch removing the tb->invalid check.
- Introduce the patch implementing tb_lookup__cpu_state before the patches
  that fiddle with tb->cflags, so that we have a single place where to
  do that fiddling
  - Update commit log of the tb_lookup__cpu_state patch explaining
    why tb->invalid must be checked when obtaining the *tb from tb_jmp_cache
- Improve comment next to CF_INVALID
- CF_PARALLEL:
  - Introduce tb_cflags inline to hide the atomic_read
    - Add an extra patch to convert tb->cflags readers to tb_cflags
  - Rename curr_cf_mask() to curr_cflags()
    - Remove many superfluous if (parallel_cpus) checks; just call curr_cflags()
  - Drop tb_cf_mask(); use CF_HASH_MASK instead
  - m68k: use gen_helper_exit_atomic instead of implementing cas2w_parallel
  - s390x: Richard: I dropped your R-b tag because v3 also includes csst.
  - sh4: add sh4 patch, as mentioned above
  - tcg_ctx.cf_parallel: use a bool instead of a u8
  - Do if (foo && (tb_cflags(tb) & BAR)) instead of (foo && tb_cflags() & BAR)
- Use a size_t for struct tb_tc.size, plugging the 4-byte hole
- Dynamically allocate TCG optimizer globals
  - Use directly a bitmap, instead of TCGTempSet for temps_used, which
    saves some space
  - Add perf numbers for the change: ~2% slowdown
- **tcg_ctxs: get rid of tcg_ctxs_init
- TCGProfile: s/PROF_ADD_MAX/PROF_MAX/
- real_host_page_size: move to its own file with an init constructor,
  as suggested by Richard (Richard: I kept your R-b tag).
- qemu_mprotect helpers: g_assert on page-aligned address and size
  - Adapt callers in translate-all.c to pass page-aligned address and size
- TCG regions:
  - Hide the computation of n_regions from tcg_region_init's callers. The
    function now takes no arguments. Add a comment about
    qemu_tcg_mttcg_enabled().
  - if (!inited) { inited = true; do_init(); } in cpus.c
  - Use assert instead of if (err) tcg_abort();
  - Use QEMU_ALIGN_DOWN instead of &= mask
  - Inline set_guard_pages() into tcg_region_init
  - Merge patch that removes code_gen_buffer's guard page into the TCG
    regions' patch
- TCG __thread:
  - Inline tcg_ctxs_init into tcg_context_init
  - Move the code that determines the number of regions from the previous
    patch to this patch.

To be done after this series:
- Get rid of tb_lock, or at least push it down so that we take advantage of
  multiple TCG contexts in MTTCG. (I'm doing this in my testing, but doing
  it well will require another patch series.)

Improvements that were suggested during this series' development:
- Order tb->[*] comparisons by likelihood of mismatch.
- Get rid of parallel_cpus from from cpu_exec_step_atomic -- I'm not sure
  whether just removing it is safe, since we call curr_cflags from several
  places.
- Perhaps parse -accel=tcg command-line arguments before TCG is initialized,
  so that those arguments can be used during TCG initialization.

Thanks,

		Emilio

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 01/43] cputlb: bring back tlb_flush_count under !TLB_DEBUG
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
@ 2017-07-20  3:08 ` Emilio G. Cota
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 02/43] tcg: fix corruption of code_time profiling counter upon tb_flush Emilio G. Cota
                   ` (42 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Commit f0aff0f124 ("cputlb: add assert_cpu_is_self checks") buried
the increment of tlb_flush_count under TLB_DEBUG. This results in
"info jit" always (mis)reporting 0 TLB flushes when !TLB_DEBUG.

Besides, under MTTCG tlb_flush_count is updated by several threads,
so in order not to lose counts we'd either have to use atomic ops
or distribute the counter, which is more scalable.

This patch does the latter by embedding tlb_flush_count in CPUArchState.
The global count is then easily obtained by iterating over the CPU list.

Note that this change also requires updating the accessors to
tlb_flush_count to use atomic_read/set whenever there may be conflicting
accesses (as defined in C11) to it.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/cpu-defs.h   |  1 +
 include/exec/cputlb.h     |  3 +--
 accel/tcg/cputlb.c        | 17 ++++++++++++++---
 accel/tcg/translate-all.c |  2 +-
 4 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index bc8e7f8..e43ff83 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -137,6 +137,7 @@ typedef struct CPUIOTLBEntry {
     CPUTLBEntry tlb_v_table[NB_MMU_MODES][CPU_VTLB_SIZE];               \
     CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE];                    \
     CPUIOTLBEntry iotlb_v[NB_MMU_MODES][CPU_VTLB_SIZE];                 \
+    size_t tlb_flush_count;                                             \
     target_ulong tlb_flush_addr;                                        \
     target_ulong tlb_flush_mask;                                        \
     target_ulong vtlb_index;                                            \
diff --git a/include/exec/cputlb.h b/include/exec/cputlb.h
index 3f94178..c91db21 100644
--- a/include/exec/cputlb.h
+++ b/include/exec/cputlb.h
@@ -23,7 +23,6 @@
 /* cputlb.c */
 void tlb_protect_code(ram_addr_t ram_addr);
 void tlb_unprotect_code(ram_addr_t ram_addr);
-extern int tlb_flush_count;
-
+size_t tlb_flush_count(void);
 #endif
 #endif
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 85635ae..9377110 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -92,8 +92,18 @@ static void flush_all_helper(CPUState *src, run_on_cpu_func fn,
     }
 }
 
-/* statistics */
-int tlb_flush_count;
+size_t tlb_flush_count(void)
+{
+    CPUState *cpu;
+    size_t count = 0;
+
+    CPU_FOREACH(cpu) {
+        CPUArchState *env = cpu->env_ptr;
+
+        count += atomic_read(&env->tlb_flush_count);
+    }
+    return count;
+}
 
 /* This is OK because CPU architectures generally permit an
  * implementation to drop entries from the TLB at any time, so
@@ -112,7 +122,8 @@ static void tlb_flush_nocheck(CPUState *cpu)
     }
 
     assert_cpu_is_self(cpu);
-    tlb_debug("(count: %d)\n", tlb_flush_count++);
+    atomic_set(&env->tlb_flush_count, env->tlb_flush_count + 1);
+    tlb_debug("(count: %zu)\n", tlb_flush_count());
 
     tb_lock();
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 090ebad..3ee69e5 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1916,7 +1916,7 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
             atomic_read(&tcg_ctx.tb_ctx.tb_flush_count));
     cpu_fprintf(f, "TB invalidate count %d\n",
             tcg_ctx.tb_ctx.tb_phys_invalidate_count);
-    cpu_fprintf(f, "TLB flush count     %d\n", tlb_flush_count);
+    cpu_fprintf(f, "TLB flush count     %zu\n", tlb_flush_count());
     tcg_dump_info(f, cpu_fprintf);
 
     tb_unlock();
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 02/43] tcg: fix corruption of code_time profiling counter upon tb_flush
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 01/43] cputlb: bring back tlb_flush_count under !TLB_DEBUG Emilio G. Cota
@ 2017-07-20  3:08 ` Emilio G. Cota
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 03/43] exec-all: fix typos in TranslationBlock's documentation Emilio G. Cota
                   ` (41 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Whenever there is an overflow in code_gen_buffer (e.g. we run out
of space in it and have to flush it), the code_time profiling counter
ends up with an invalid value (that is, code_time -= profile_getclock(),
without later on getting += profile_getclock() due to the goto).

Fix it by using the ti variable, so that we only update code_time
when there is no overflow. Note that in case there is an overflow
we fail to account for the elapsed coding time, but this is quite rare
so we can probably live with it.

"info jit" before/after, roughly at the same time during debian-arm bootup:

- before:
Statistics:
TB flush count      1
TB invalidate count 4665
TLB flush count     998
JIT cycles          -615191529184601 (-256329.804 s at 2.4 GHz)
translated TBs      302310 (aborted=0 0.0%)
avg ops/TB          48.4 max=438
deleted ops/TB      8.54
avg temps/TB        32.31 max=38
avg host code/TB    361.5
avg search data/TB  24.5
cycles/op           -42014693.0
cycles/in byte      -121444900.2
cycles/out byte     -5629031.1
cycles/search byte     -83114481.0
  gen_interm time   -0.0%
  gen_code time     100.0%
optim./code time    -0.0%
liveness/code time  -0.0%
cpu_restore count   6236
  avg cycles        110.4

- after:
Statistics:
TB flush count      1
TB invalidate count 4665
TLB flush count     1010
JIT cycles          1996899624 (0.832 s at 2.4 GHz)
translated TBs      297961 (aborted=0 0.0%)
avg ops/TB          48.5 max=438
deleted ops/TB      8.56
avg temps/TB        32.31 max=38
avg host code/TB    361.8
avg search data/TB  24.5
cycles/op           138.2
cycles/in byte      398.4
cycles/out byte     18.5
cycles/search byte     273.1
  gen_interm time   14.0%
  gen_code time     86.0%
optim./code time    19.4%
liveness/code time  10.3%
cpu_restore count   6372
  avg cycles        111.0

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 accel/tcg/translate-all.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 3ee69e5..63f8538 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1300,7 +1300,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 #ifdef CONFIG_PROFILER
     tcg_ctx.tb_count++;
     tcg_ctx.interm_time += profile_getclock() - ti;
-    tcg_ctx.code_time -= profile_getclock();
+    ti = profile_getclock();
 #endif
 
     /* ??? Overflow could be handled better here.  In particular, we
@@ -1318,7 +1318,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     }
 
 #ifdef CONFIG_PROFILER
-    tcg_ctx.code_time += profile_getclock();
+    tcg_ctx.code_time += profile_getclock() - ti;
     tcg_ctx.code_in_len += tb->size;
     tcg_ctx.code_out_len += gen_code_size;
     tcg_ctx.search_out_len += search_size;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 03/43] exec-all: fix typos in TranslationBlock's documentation
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 01/43] cputlb: bring back tlb_flush_count under !TLB_DEBUG Emilio G. Cota
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 02/43] tcg: fix corruption of code_time profiling counter upon tb_flush Emilio G. Cota
@ 2017-07-20  3:08 ` Emilio G. Cota
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 04/43] translate-all: make have_tb_lock static Emilio G. Cota
                   ` (40 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/exec-all.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 87b1b74..69c1b36 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -370,7 +370,7 @@ struct TranslationBlock {
     /* The following data are used to directly call another TB from
      * the code of this one. This can be done either by emitting direct or
      * indirect native jump instructions. These jumps are reset so that the TB
-     * just continue its execution. The TB can be linked to another one by
+     * just continues its execution. The TB can be linked to another one by
      * setting one of the jump targets (or patching the jump instruction). Only
      * two of such jumps are supported.
      */
@@ -381,7 +381,7 @@ struct TranslationBlock {
 #else
     uintptr_t jmp_target_addr[2]; /* target address for indirect jump */
 #endif
-    /* Each TB has an assosiated circular list of TBs jumping to this one.
+    /* Each TB has an associated circular list of TBs jumping to this one.
      * jmp_list_first points to the first TB jumping to this one.
      * jmp_list_next is used to point to the next TB in a list.
      * Since each TB can have two jumps, it can participate in two lists.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 04/43] translate-all: make have_tb_lock static
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (2 preceding siblings ...)
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 03/43] exec-all: fix typos in TranslationBlock's documentation Emilio G. Cota
@ 2017-07-20  3:08 ` Emilio G. Cota
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 05/43] cpu-exec: rename have_tb_lock to acquired_tb_lock in tb_find Emilio G. Cota
                   ` (39 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

It is only used by this object, and it's not exported to any other.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 accel/tcg/translate-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 63f8538..a124181 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -139,7 +139,7 @@ TCGContext tcg_ctx;
 bool parallel_cpus;
 
 /* translation block context */
-__thread int have_tb_lock;
+static __thread int have_tb_lock;
 
 static void page_table_config_init(void)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 05/43] cpu-exec: rename have_tb_lock to acquired_tb_lock in tb_find
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (3 preceding siblings ...)
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 04/43] translate-all: make have_tb_lock static Emilio G. Cota
@ 2017-07-20  3:08 ` Emilio G. Cota
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 06/43] tcg/i386: constify tcg_target_callee_save_regs Emilio G. Cota
                   ` (38 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Reusing the have_tb_lock name, which is also defined in translate-all.c,
makes code reviewing unnecessarily harder.

Avoid potential confusion by renaming the local have_tb_lock variable
to something else.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 accel/tcg/cpu-exec.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index d84b01d..c4c289b 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -337,7 +337,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
     TranslationBlock *tb;
     target_ulong cs_base, pc;
     uint32_t flags;
-    bool have_tb_lock = false;
+    bool acquired_tb_lock = false;
 
     /* we record a subset of the CPU state. It will
        always be the same before a given translated block
@@ -356,7 +356,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
              */
             mmap_lock();
             tb_lock();
-            have_tb_lock = true;
+            acquired_tb_lock = true;
 
             /* There's a chance that our desired tb has been translated while
              * taking the locks so we check again inside the lock.
@@ -384,15 +384,15 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
 #endif
     /* See if we can patch the calling TB. */
     if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
-        if (!have_tb_lock) {
+        if (!acquired_tb_lock) {
             tb_lock();
-            have_tb_lock = true;
+            acquired_tb_lock = true;
         }
         if (!tb->invalid) {
             tb_add_jump(last_tb, tb_exit, tb);
         }
     }
-    if (have_tb_lock) {
+    if (acquired_tb_lock) {
         tb_unlock();
     }
     return tb;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 06/43] tcg/i386: constify tcg_target_callee_save_regs
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (4 preceding siblings ...)
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 05/43] cpu-exec: rename have_tb_lock to acquired_tb_lock in tb_find Emilio G. Cota
@ 2017-07-20  3:08 ` Emilio G. Cota
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 07/43] tcg/mips: " Emilio G. Cota
                   ` (37 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/i386/tcg-target.inc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 01e3b4e..06df01a 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2514,7 +2514,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     return NULL;
 }
 
-static int tcg_target_callee_save_regs[] = {
+static const int tcg_target_callee_save_regs[] = {
 #if TCG_TARGET_REG_BITS == 64
     TCG_REG_RBP,
     TCG_REG_RBX,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 07/43] tcg/mips: constify tcg_target_callee_save_regs
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (5 preceding siblings ...)
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 06/43] tcg/i386: constify tcg_target_callee_save_regs Emilio G. Cota
@ 2017-07-20  3:08 ` Emilio G. Cota
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 08/43] tcg: remove addr argument from lookup_tb_ptr Emilio G. Cota
                   ` (36 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/mips/tcg-target.inc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 85756b8..56db228 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -2323,7 +2323,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     return NULL;
 }
 
-static int tcg_target_callee_save_regs[] = {
+static const int tcg_target_callee_save_regs[] = {
     TCG_REG_S0,       /* used for the global env (TCG_AREG0) */
     TCG_REG_S1,
     TCG_REG_S2,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 08/43] tcg: remove addr argument from lookup_tb_ptr
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (6 preceding siblings ...)
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 07/43] tcg/mips: " Emilio G. Cota
@ 2017-07-20  3:08 ` Emilio G. Cota
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 09/43] tcg: consolidate TB lookups in tb_lookup__cpu_state Emilio G. Cota
                   ` (35 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

It is unlikely that we will ever want to call this helper passing
an argument other than the current PC. So just remove the argument,
and use the pc we already get from cpu_get_tb_cpu_state.

This change paves the way to having a common "tb_lookup" function.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/tcg-op.h               |  4 ++--
 tcg/tcg-runtime.h          |  2 +-
 target/alpha/translate.c   |  2 +-
 target/arm/translate-a64.c |  4 ++--
 target/arm/translate.c     |  5 +----
 target/hppa/translate.c    |  6 +++---
 target/i386/translate.c    | 17 +++++------------
 target/mips/translate.c    |  4 ++--
 target/s390x/translate.c   |  2 +-
 target/sh4/translate.c     |  4 ++--
 tcg/tcg-op.c               |  4 ++--
 tcg/tcg-runtime.c          | 20 ++++++++++----------
 12 files changed, 32 insertions(+), 42 deletions(-)

diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 5d3278f..18d01b2 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -797,7 +797,7 @@ static inline void tcg_gen_exit_tb(uintptr_t val)
 void tcg_gen_goto_tb(unsigned idx);
 
 /**
- * tcg_gen_lookup_and_goto_ptr() - look up a TB and jump to it if valid
+ * tcg_gen_lookup_and_goto_ptr() - look up the current TB, jump to it if valid
  * @addr: Guest address of the target TB
  *
  * If the TB is not valid, jump to the epilogue.
@@ -805,7 +805,7 @@ void tcg_gen_goto_tb(unsigned idx);
  * This operation is optional. If the TCG backend does not implement goto_ptr,
  * this op is equivalent to calling tcg_gen_exit_tb() with 0 as the argument.
  */
-void tcg_gen_lookup_and_goto_ptr(TCGv addr);
+void tcg_gen_lookup_and_goto_ptr(void);
 
 #if TARGET_LONG_BITS == 32
 #define tcg_temp_new() tcg_temp_new_i32()
diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
index c41d38a..1df17d0 100644
--- a/tcg/tcg-runtime.h
+++ b/tcg/tcg-runtime.h
@@ -24,7 +24,7 @@ DEF_HELPER_FLAGS_1(clrsb_i64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(ctpop_i32, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(ctpop_i64, TCG_CALL_NO_RWG_SE, i64, i64)
 
-DEF_HELPER_FLAGS_2(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env, tl)
+DEF_HELPER_FLAGS_1(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env)
 
 DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
 
diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 90e6d52..9e98312 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -3073,7 +3073,7 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
         /* FALLTHRU */
     case EXIT_PC_UPDATED:
         if (!use_exit_tb(&ctx)) {
-            tcg_gen_lookup_and_goto_ptr(cpu_pc);
+            tcg_gen_lookup_and_goto_ptr();
             break;
         }
         /* FALLTHRU */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 3fa3902..818d7eb 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -379,7 +379,7 @@ static inline void gen_goto_tb(DisasContext *s, int n, uint64_t dest)
         } else if (s->singlestep_enabled) {
             gen_exception_internal(EXCP_DEBUG);
         } else {
-            tcg_gen_lookup_and_goto_ptr(cpu_pc);
+            tcg_gen_lookup_and_goto_ptr();
             s->is_jmp = DISAS_TB_JUMP;
         }
     }
@@ -11366,7 +11366,7 @@ void gen_intermediate_code_a64(ARMCPU *cpu, TranslationBlock *tb)
             gen_goto_tb(dc, 1, dc->pc);
             break;
         case DISAS_JUMP:
-            tcg_gen_lookup_and_goto_ptr(cpu_pc);
+            tcg_gen_lookup_and_goto_ptr();
             break;
         case DISAS_TB_JUMP:
         case DISAS_EXC:
diff --git a/target/arm/translate.c b/target/arm/translate.c
index e27736c..964b627 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4152,10 +4152,7 @@ static inline bool use_goto_tb(DisasContext *s, target_ulong dest)
 
 static void gen_goto_ptr(void)
 {
-    TCGv addr = tcg_temp_new();
-    tcg_gen_extu_i32_tl(addr, cpu_R[15]);
-    tcg_gen_lookup_and_goto_ptr(addr);
-    tcg_temp_free(addr);
+    tcg_gen_lookup_and_goto_ptr();
 }
 
 /* This will end the TB but doesn't guarantee we'll return to
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index e10abc5..91053e2 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -517,7 +517,7 @@ static void gen_goto_tb(DisasContext *ctx, int which,
         if (ctx->singlestep_enabled) {
             gen_excp_1(EXCP_DEBUG);
         } else {
-            tcg_gen_lookup_and_goto_ptr(cpu_iaoq_f);
+            tcg_gen_lookup_and_goto_ptr();
         }
     }
 }
@@ -1527,7 +1527,7 @@ static ExitStatus do_ibranch(DisasContext *ctx, TCGv dest,
         if (link != 0) {
             tcg_gen_movi_tl(cpu_gr[link], ctx->iaoq_n);
         }
-        tcg_gen_lookup_and_goto_ptr(cpu_iaoq_f);
+        tcg_gen_lookup_and_goto_ptr();
         return nullify_end(ctx, NO_EXIT);
     } else {
         cond_prep(&ctx->null_cond);
@@ -3885,7 +3885,7 @@ void gen_intermediate_code(CPUHPPAState *env, struct TranslationBlock *tb)
         if (ctx.singlestep_enabled) {
             gen_excp_1(EXCP_DEBUG);
         } else {
-            tcg_gen_lookup_and_goto_ptr(cpu_iaoq_f);
+            tcg_gen_lookup_and_goto_ptr();
         }
         break;
     default:
diff --git a/target/i386/translate.c b/target/i386/translate.c
index ed3b896..291c577 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -2511,7 +2511,7 @@ static void gen_bnd_jmp(DisasContext *s)
    If RECHECK_TF, emit a rechecking helper for #DB, ignoring the state of
    S->TF.  This is used by the syscall/sysret insns.  */
 static void
-do_gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf, TCGv jr)
+do_gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf, bool jr)
 {
     gen_update_cc_op(s);
 
@@ -2532,12 +2532,8 @@ do_gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf, TCGv jr)
         tcg_gen_exit_tb(0);
     } else if (s->tf) {
         gen_helper_single_step(cpu_env);
-    } else if (!TCGV_IS_UNUSED(jr)) {
-        TCGv vaddr = tcg_temp_new();
-
-        tcg_gen_add_tl(vaddr, jr, cpu_seg_base[R_CS]);
-        tcg_gen_lookup_and_goto_ptr(vaddr);
-        tcg_temp_free(vaddr);
+    } else if (jr) {
+        tcg_gen_lookup_and_goto_ptr();
     } else {
         tcg_gen_exit_tb(0);
     }
@@ -2547,10 +2543,7 @@ do_gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf, TCGv jr)
 static inline void
 gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf)
 {
-    TCGv unused;
-
-    TCGV_UNUSED(unused);
-    do_gen_eob_worker(s, inhibit, recheck_tf, unused);
+    do_gen_eob_worker(s, inhibit, recheck_tf, false);
 }
 
 /* End of block.
@@ -2569,7 +2562,7 @@ static void gen_eob(DisasContext *s)
 /* Jump to register */
 static void gen_jr(DisasContext *s, TCGv dest)
 {
-    do_gen_eob_worker(s, false, false, dest);
+    do_gen_eob_worker(s, false, false, true);
 }
 
 /* generate a jump to eip. No segment change must happen before as a
diff --git a/target/mips/translate.c b/target/mips/translate.c
index fe44f2f..28c9fbd 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -4233,7 +4233,7 @@ static inline void gen_goto_tb(DisasContext *ctx, int n, target_ulong dest)
             save_cpu_state(ctx, 0);
             gen_helper_raise_exception_debug(cpu_env);
         }
-        tcg_gen_lookup_and_goto_ptr(cpu_PC);
+        tcg_gen_lookup_and_goto_ptr();
     }
 }
 
@@ -10731,7 +10731,7 @@ static void gen_branch(DisasContext *ctx, int insn_bytes)
                 save_cpu_state(ctx, 0);
                 gen_helper_raise_exception_debug(cpu_env);
             }
-            tcg_gen_lookup_and_goto_ptr(cpu_PC);
+            tcg_gen_lookup_and_goto_ptr();
             break;
         default:
             fprintf(stderr, "unknown branch 0x%x\n", proc_hflags);
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 1dffcee..be1a04d 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -5948,7 +5948,7 @@ void gen_intermediate_code(CPUS390XState *env, struct TranslationBlock *tb)
         } else if (use_exit_tb(&dc) || status == EXIT_PC_STALE_NOCHAIN) {
             tcg_gen_exit_tb(0);
         } else {
-            tcg_gen_lookup_and_goto_ptr(psw_addr);
+            tcg_gen_lookup_and_goto_ptr();
         }
         break;
     default:
diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 498bb99..2a206af 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -261,7 +261,7 @@ static void gen_goto_tb(DisasContext *ctx, int n, target_ulong dest)
         } else if (use_exit_tb(ctx)) {
             tcg_gen_exit_tb(0);
         } else {
-            tcg_gen_lookup_and_goto_ptr(cpu_pc);
+            tcg_gen_lookup_and_goto_ptr();
         }
     }
 }
@@ -278,7 +278,7 @@ static void gen_jump(DisasContext * ctx)
         } else if (use_exit_tb(ctx)) {
             tcg_gen_exit_tb(0);
         } else {
-            tcg_gen_lookup_and_goto_ptr(cpu_pc);
+            tcg_gen_lookup_and_goto_ptr();
         }
     } else {
 	gen_goto_tb(ctx, 0, ctx->delayed_pc);
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 87f673e..205d07f 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -2587,11 +2587,11 @@ void tcg_gen_goto_tb(unsigned idx)
     tcg_gen_op1i(INDEX_op_goto_tb, idx);
 }
 
-void tcg_gen_lookup_and_goto_ptr(TCGv addr)
+void tcg_gen_lookup_and_goto_ptr(void)
 {
     if (TCG_TARGET_HAS_goto_ptr && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
         TCGv_ptr ptr = tcg_temp_new_ptr();
-        gen_helper_lookup_tb_ptr(ptr, tcg_ctx.tcg_env, addr);
+        gen_helper_lookup_tb_ptr(ptr, tcg_ctx.tcg_env);
         tcg_gen_op1i(INDEX_op_goto_ptr, GET_TCGV_PTR(ptr));
         tcg_temp_free_ptr(ptr);
     } else {
diff --git a/tcg/tcg-runtime.c b/tcg/tcg-runtime.c
index 3e23649..e85a042 100644
--- a/tcg/tcg-runtime.c
+++ b/tcg/tcg-runtime.c
@@ -144,33 +144,33 @@ uint64_t HELPER(ctpop_i64)(uint64_t arg)
     return ctpop64(arg);
 }
 
-void *HELPER(lookup_tb_ptr)(CPUArchState *env, target_ulong addr)
+void *HELPER(lookup_tb_ptr)(CPUArchState *env)
 {
     CPUState *cpu = ENV_GET_CPU(env);
     TranslationBlock *tb;
     target_ulong cs_base, pc;
-    uint32_t flags, addr_hash;
+    uint32_t flags, hash;
 
-    addr_hash = tb_jmp_cache_hash_func(addr);
-    tb = atomic_rcu_read(&cpu->tb_jmp_cache[addr_hash]);
     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
+    hash = tb_jmp_cache_hash_func(pc);
+    tb = atomic_rcu_read(&cpu->tb_jmp_cache[hash]);
 
     if (unlikely(!(tb
-                   && tb->pc == addr
+                   && tb->pc == pc
                    && tb->cs_base == cs_base
                    && tb->flags == flags
                    && tb->trace_vcpu_dstate == *cpu->trace_dstate))) {
-        tb = tb_htable_lookup(cpu, addr, cs_base, flags);
+        tb = tb_htable_lookup(cpu, pc, cs_base, flags);
         if (!tb) {
             return tcg_ctx.code_gen_epilogue;
         }
-        atomic_set(&cpu->tb_jmp_cache[addr_hash], tb);
+        atomic_set(&cpu->tb_jmp_cache[hash], tb);
     }
 
-    qemu_log_mask_and_addr(CPU_LOG_EXEC, addr,
+    qemu_log_mask_and_addr(CPU_LOG_EXEC, pc,
                            "Chain %p [%d: " TARGET_FMT_lx "] %s\n",
-                           tb->tc_ptr, cpu->cpu_index, addr,
-                           lookup_symbol(addr));
+                           tb->tc_ptr, cpu->cpu_index, pc,
+                           lookup_symbol(pc));
     return tb->tc_ptr;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 09/43] tcg: consolidate TB lookups in tb_lookup__cpu_state
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (7 preceding siblings ...)
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 08/43] tcg: remove addr argument from lookup_tb_ptr Emilio G. Cota
@ 2017-07-20  3:08 ` Emilio G. Cota
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 10/43] exec-all: bring tb->invalid into tb->cflags Emilio G. Cota
                   ` (34 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

This avoids duplicating code. cpu_exec_step will also use the
new common function once we integrate parallel_cpus into tb->cflags.

Note that in this commit we also fix a race, described by Richard Henderson
during review. Think of this scenario with threads A and B:

   (A) Lookup succeeds for TB in hash without tb_lock
        (B) Sets the TB's tb->invalid flag
        (B) Removes the TB from tb_htable
        (B) Clears all CPU's tb_jmp_cache
   (A) Store TB into local tb_jmp_cache

Given that order of events, (A) will keep executing that invalid TB until
another flush of its tb_jmp_cache happens, which in theory might never happen.
We can fix this by checking the tb->invalid flag every time we look up a TB
from tb_jmp_cache, so that in the above scenario, next time we try to find
that TB in tb_jmp_cache, we won't, and will therefore be forced to look it
up in tb_htable.

Performance-wise, I measured a small improvement when booting debian-arm.
Note that inlining pays off:

 Performance counter stats for 'taskset -c 0 qemu-system-arm \
	-machine type=virt -nographic -smp 1 -m 4096 \
	-netdev user,id=unet,hostfwd=tcp::2222-:22 \
	-device virtio-net-device,netdev=unet \
	-drive file=jessie.qcow2,id=myblock,index=0,if=none \
	-device virtio-blk-device,drive=myblock \
	-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
	-name arm,debug-threads=on -smp 1' (10 runs):

Before:
      18714.917392 task-clock                #    0.952 CPUs utilized            ( +-  0.95% )
            23,142 context-switches          #    0.001 M/sec                    ( +-  0.50% )
                 1 CPU-migrations            #    0.000 M/sec
            10,558 page-faults               #    0.001 M/sec                    ( +-  0.95% )
    53,957,727,252 cycles                    #    2.883 GHz                      ( +-  0.91% ) [83.33%]
    24,440,599,852 stalled-cycles-frontend   #   45.30% frontend cycles idle     ( +-  1.20% ) [83.33%]
    16,495,714,424 stalled-cycles-backend    #   30.57% backend  cycles idle     ( +-  0.95% ) [66.66%]
    76,267,572,582 instructions              #    1.41  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  0.87% ) [83.34%]
    12,692,186,323 branches                  #  678.186 M/sec                    ( +-  0.92% ) [83.35%]
       263,486,879 branch-misses             #    2.08% of all branches          ( +-  0.73% ) [83.34%]

      19.648474449 seconds time elapsed                                          ( +-  0.82% )

After, w/ inline (this patch):
      18471.376627 task-clock                #    0.955 CPUs utilized            ( +-  0.96% )
            23,048 context-switches          #    0.001 M/sec                    ( +-  0.48% )
                 1 CPU-migrations            #    0.000 M/sec
            10,708 page-faults               #    0.001 M/sec                    ( +-  0.81% )
    53,208,990,796 cycles                    #    2.881 GHz                      ( +-  0.98% ) [83.34%]
    23,941,071,673 stalled-cycles-frontend   #   44.99% frontend cycles idle     ( +-  0.95% ) [83.34%]
    16,161,773,848 stalled-cycles-backend    #   30.37% backend  cycles idle     ( +-  0.76% ) [66.67%]
    75,786,269,766 instructions              #    1.42  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  1.24% ) [83.34%]
    12,573,617,143 branches                  #  680.708 M/sec                    ( +-  1.34% ) [83.33%]
       260,235,550 branch-misses             #    2.07% of all branches          ( +-  0.66% ) [83.33%]

      19.340502161 seconds time elapsed                                          ( +-  0.56% )

After, w/o inline:
      18791.253967 task-clock                #    0.954 CPUs utilized            ( +-  0.78% )
            23,230 context-switches          #    0.001 M/sec                    ( +-  0.42% )
                 1 CPU-migrations            #    0.000 M/sec
            10,563 page-faults               #    0.001 M/sec                    ( +-  1.27% )
    54,168,674,622 cycles                    #    2.883 GHz                      ( +-  0.80% ) [83.34%]
    24,244,712,629 stalled-cycles-frontend   #   44.76% frontend cycles idle     ( +-  1.37% ) [83.33%]
    16,288,648,572 stalled-cycles-backend    #   30.07% backend  cycles idle     ( +-  0.95% ) [66.66%]
    77,659,755,503 instructions              #    1.43  insns per cycle
                                             #    0.31  stalled cycles per insn  ( +-  0.97% ) [83.34%]
    12,922,780,045 branches                  #  687.702 M/sec                    ( +-  1.06% ) [83.34%]
       261,962,386 branch-misses             #    2.03% of all branches          ( +-  0.71% ) [83.35%]

      19.700174670 seconds time elapsed                                          ( +-  0.56% )

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/tb-lookup.h | 49 ++++++++++++++++++++++++++++++++++++++++++++++++
 accel/tcg/cpu-exec.c     | 47 ++++++++++++++++++----------------------------
 tcg/tcg-runtime.c        | 24 ++++++------------------
 3 files changed, 73 insertions(+), 47 deletions(-)
 create mode 100644 include/exec/tb-lookup.h

diff --git a/include/exec/tb-lookup.h b/include/exec/tb-lookup.h
new file mode 100644
index 0000000..9d32cb0
--- /dev/null
+++ b/include/exec/tb-lookup.h
@@ -0,0 +1,49 @@
+/*
+ * Copyright (C) 2017, Emilio G. Cota <cota@braap.org>
+ *
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ */
+#ifndef EXEC_TB_LOOKUP_H
+#define EXEC_TB_LOOKUP_H
+
+#include "qemu/osdep.h"
+
+#ifdef NEED_CPU_H
+#include "cpu.h"
+#else
+#include "exec/poison.h"
+#endif
+
+#include "exec/exec-all.h"
+#include "exec/tb-hash.h"
+
+/* Might cause an exception, so have a longjmp destination ready */
+static inline TranslationBlock *
+tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, target_ulong *cs_base,
+                     uint32_t *flags)
+{
+    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
+    TranslationBlock *tb;
+    uint32_t hash;
+
+    cpu_get_tb_cpu_state(env, pc, cs_base, flags);
+    hash = tb_jmp_cache_hash_func(*pc);
+    tb = atomic_rcu_read(&cpu->tb_jmp_cache[hash]);
+    if (likely(tb &&
+               tb->pc == *pc &&
+               tb->cs_base == *cs_base &&
+               tb->flags == *flags &&
+               tb->trace_vcpu_dstate == *cpu->trace_dstate &&
+               !atomic_read(&tb->invalid))) {
+        return tb;
+    }
+    tb = tb_htable_lookup(cpu, *pc, *cs_base, *flags);
+    if (tb == NULL) {
+        return NULL;
+    }
+    atomic_set(&cpu->tb_jmp_cache[hash], tb);
+    return tb;
+}
+
+#endif /* EXEC_TB_LOOKUP_H */
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index c4c289b..5d2ee5b 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -28,6 +28,7 @@
 #include "exec/address-spaces.h"
 #include "qemu/rcu.h"
 #include "exec/tb-hash.h"
+#include "exec/tb-lookup.h"
 #include "exec/log.h"
 #include "qemu/main-loop.h"
 #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
@@ -333,43 +334,31 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
                                         TranslationBlock *last_tb,
                                         int tb_exit)
 {
-    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
     TranslationBlock *tb;
     target_ulong cs_base, pc;
     uint32_t flags;
     bool acquired_tb_lock = false;
 
-    /* we record a subset of the CPU state. It will
-       always be the same before a given translated block
-       is executed. */
-    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
-    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
-    if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
-                 tb->flags != flags ||
-                 tb->trace_vcpu_dstate != *cpu->trace_dstate)) {
-        tb = tb_htable_lookup(cpu, pc, cs_base, flags);
-        if (!tb) {
-
-            /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
-             * taken outside tb_lock. As system emulation is currently
-             * single threaded the locks are NOPs.
-             */
-            mmap_lock();
-            tb_lock();
-            acquired_tb_lock = true;
-
-            /* There's a chance that our desired tb has been translated while
-             * taking the locks so we check again inside the lock.
-             */
-            tb = tb_htable_lookup(cpu, pc, cs_base, flags);
-            if (!tb) {
-                /* if no translated code available, then translate it now */
-                tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
-            }
+    tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags);
+    if (tb == NULL) {
+        /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
+         * taken outside tb_lock. As system emulation is currently
+         * single threaded the locks are NOPs.
+         */
+        mmap_lock();
+        tb_lock();
+        acquired_tb_lock = true;
 
-            mmap_unlock();
+        /* There's a chance that our desired tb has been translated while
+         * taking the locks so we check again inside the lock.
+         */
+        tb = tb_htable_lookup(cpu, pc, cs_base, flags);
+        if (likely(tb == NULL)) {
+            /* if no translated code available, then translate it now */
+            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
         }
 
+        mmap_unlock();
         /* We add the TB in the virtual pc hash table for the fast lookup */
         atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
     }
diff --git a/tcg/tcg-runtime.c b/tcg/tcg-runtime.c
index e85a042..7100339 100644
--- a/tcg/tcg-runtime.c
+++ b/tcg/tcg-runtime.c
@@ -27,7 +27,7 @@
 #include "exec/helper-proto.h"
 #include "exec/cpu_ldst.h"
 #include "exec/exec-all.h"
-#include "exec/tb-hash.h"
+#include "exec/tb-lookup.h"
 #include "disas/disas.h"
 #include "exec/log.h"
 
@@ -149,24 +149,12 @@ void *HELPER(lookup_tb_ptr)(CPUArchState *env)
     CPUState *cpu = ENV_GET_CPU(env);
     TranslationBlock *tb;
     target_ulong cs_base, pc;
-    uint32_t flags, hash;
-
-    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
-    hash = tb_jmp_cache_hash_func(pc);
-    tb = atomic_rcu_read(&cpu->tb_jmp_cache[hash]);
-
-    if (unlikely(!(tb
-                   && tb->pc == pc
-                   && tb->cs_base == cs_base
-                   && tb->flags == flags
-                   && tb->trace_vcpu_dstate == *cpu->trace_dstate))) {
-        tb = tb_htable_lookup(cpu, pc, cs_base, flags);
-        if (!tb) {
-            return tcg_ctx.code_gen_epilogue;
-        }
-        atomic_set(&cpu->tb_jmp_cache[hash], tb);
-    }
+    uint32_t flags;
 
+    tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags);
+    if (tb == NULL) {
+        return tcg_ctx.code_gen_epilogue;
+    }
     qemu_log_mask_and_addr(CPU_LOG_EXEC, pc,
                            "Chain %p [%d: " TARGET_FMT_lx "] %s\n",
                            tb->tc_ptr, cpu->cpu_index, pc,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 10/43] exec-all: bring tb->invalid into tb->cflags
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (8 preceding siblings ...)
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 09/43] tcg: consolidate TB lookups in tb_lookup__cpu_state Emilio G. Cota
@ 2017-07-20  3:08 ` Emilio G. Cota
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 11/43] tcg: define CF_PARALLEL and use it for TB hashing Emilio G. Cota
                   ` (33 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

This gets rid of a hole in struct TranslationBlock.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/exec-all.h   | 3 +--
 include/exec/tb-lookup.h  | 2 +-
 accel/tcg/cpu-exec.c      | 4 ++--
 accel/tcg/translate-all.c | 3 +--
 4 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 69c1b36..256b9a6 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -352,12 +352,11 @@ struct TranslationBlock {
 #define CF_NOCACHE     0x10000 /* To be freed after execution */
 #define CF_USE_ICOUNT  0x20000
 #define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */
+#define CF_INVALID     0x80000 /* TB is stale. Setters must acquire tb_lock */
 
     /* Per-vCPU dynamic tracing state used to generate this TB */
     uint32_t trace_vcpu_dstate;
 
-    uint16_t invalid;
-
     void *tc_ptr;    /* pointer to the translated code */
     uint8_t *tc_search;  /* pointer to search data */
     /* original tb when cflags has CF_NOCACHE */
diff --git a/include/exec/tb-lookup.h b/include/exec/tb-lookup.h
index 9d32cb0..436b6d5 100644
--- a/include/exec/tb-lookup.h
+++ b/include/exec/tb-lookup.h
@@ -35,7 +35,7 @@ tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, target_ulong *cs_base,
                tb->cs_base == *cs_base &&
                tb->flags == *flags &&
                tb->trace_vcpu_dstate == *cpu->trace_dstate &&
-               !atomic_read(&tb->invalid))) {
+               !(atomic_read(&tb->cflags) & CF_INVALID))) {
         return tb;
     }
     tb = tb_htable_lookup(cpu, *pc, *cs_base, *flags);
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 5d2ee5b..fae8c40 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -294,7 +294,7 @@ static bool tb_cmp(const void *p, const void *d)
         tb->cs_base == desc->cs_base &&
         tb->flags == desc->flags &&
         tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
-        !atomic_read(&tb->invalid)) {
+        !(atomic_read(&tb->cflags) & CF_INVALID)) {
         /* check next page if needed */
         if (tb->page_addr[1] == -1) {
             return true;
@@ -377,7 +377,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
             tb_lock();
             acquired_tb_lock = true;
         }
-        if (!tb->invalid) {
+        if (!(tb->cflags & CF_INVALID)) {
             tb_add_jump(last_tb, tb_exit, tb);
         }
     }
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index a124181..7ef4f19 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1073,7 +1073,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 
     assert_tb_locked();
 
-    atomic_set(&tb->invalid, true);
+    atomic_set(&tb->cflags, tb->cflags | CF_INVALID);
 
     /* remove the TB from the hash list */
     phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
@@ -1269,7 +1269,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tb->flags = flags;
     tb->cflags = cflags;
     tb->trace_vcpu_dstate = *cpu->trace_dstate;
-    tb->invalid = false;
 
 #ifdef CONFIG_PROFILER
     tcg_ctx.tb_count1++; /* includes aborted translations because of
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 11/43] tcg: define CF_PARALLEL and use it for TB hashing
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (9 preceding siblings ...)
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 10/43] exec-all: bring tb->invalid into tb->cflags Emilio G. Cota
@ 2017-07-20  3:08 ` Emilio G. Cota
  2017-07-20  8:45   ` Richard Henderson
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 12/43] tcg: convert tb->cflags reads to tb_cflags(tb) Emilio G. Cota
                   ` (32 subsequent siblings)
  43 siblings, 1 reply; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

This will enable us to decouple code translation from the value
of parallel_cpus at any given time. It will also help us minimize
TB flushes when generating code via EXCP_ATOMIC.

Note that the declaration of parallel_cpus is brought to exec-all.h
to be able to define there the "curr_cflags" inline.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/exec-all.h   | 20 +++++++++++++++++++-
 include/exec/tb-hash-xx.h |  9 ++++++---
 include/exec/tb-hash.h    |  4 ++--
 include/exec/tb-lookup.h  |  6 +++---
 tcg/tcg.h                 |  1 -
 accel/tcg/cpu-exec.c      | 45 +++++++++++++++++++++++----------------------
 accel/tcg/translate-all.c | 13 +++++++++----
 exec.c                    |  2 +-
 tcg/tcg-runtime.c         |  2 +-
 tests/qht-bench.c         |  2 +-
 10 files changed, 65 insertions(+), 39 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 256b9a6..0af0485 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -353,6 +353,9 @@ struct TranslationBlock {
 #define CF_USE_ICOUNT  0x20000
 #define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */
 #define CF_INVALID     0x80000 /* TB is stale. Setters must acquire tb_lock */
+#define CF_PARALLEL    0x100000 /* Generate code for a parallel context */
+/* cflags' mask for hashing/comparison */
+#define CF_HASH_MASK (CF_PARALLEL)
 
     /* Per-vCPU dynamic tracing state used to generate this TB */
     uint32_t trace_vcpu_dstate;
@@ -396,11 +399,26 @@ struct TranslationBlock {
     uintptr_t jmp_list_first;
 };
 
+extern bool parallel_cpus;
+
+/* Hide the atomic_read to make code a little easier on the eyes */
+static inline uint32_t tb_cflags(const TranslationBlock *tb)
+{
+    return atomic_read(&tb->cflags);
+}
+
+/* current cflags for hashing/comparison */
+static inline uint32_t curr_cflags(void)
+{
+    return parallel_cpus ? CF_PARALLEL : 0;
+}
+
 void tb_free(TranslationBlock *tb);
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
-                                   target_ulong cs_base, uint32_t flags);
+                                   target_ulong cs_base, uint32_t flags,
+                                   uint32_t cf_mask);
 
 #if defined(USE_DIRECT_JUMP)
 
diff --git a/include/exec/tb-hash-xx.h b/include/exec/tb-hash-xx.h
index 6cd3022..747a9a6 100644
--- a/include/exec/tb-hash-xx.h
+++ b/include/exec/tb-hash-xx.h
@@ -48,8 +48,8 @@
  * xxhash32, customized for input variables that are not guaranteed to be
  * contiguous in memory.
  */
-static inline
-uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f)
+static inline uint32_t
+tb_hash_func7(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f, uint32_t g)
 {
     uint32_t v1 = TB_HASH_XX_SEED + PRIME32_1 + PRIME32_2;
     uint32_t v2 = TB_HASH_XX_SEED + PRIME32_2;
@@ -78,7 +78,7 @@ uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f)
     v4 *= PRIME32_1;
 
     h32 = rol32(v1, 1) + rol32(v2, 7) + rol32(v3, 12) + rol32(v4, 18);
-    h32 += 24;
+    h32 += 28;
 
     h32 += e * PRIME32_3;
     h32  = rol32(h32, 17) * PRIME32_4;
@@ -86,6 +86,9 @@ uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f)
     h32 += f * PRIME32_3;
     h32  = rol32(h32, 17) * PRIME32_4;
 
+    h32 += g * PRIME32_3;
+    h32  = rol32(h32, 17) * PRIME32_4;
+
     h32 ^= h32 >> 15;
     h32 *= PRIME32_2;
     h32 ^= h32 >> 13;
diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 17b5ee0..0526c4f 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -59,9 +59,9 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
 
 static inline
 uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags,
-                      uint32_t trace_vcpu_dstate)
+                      uint32_t cf_mask, uint32_t trace_vcpu_dstate)
 {
-    return tb_hash_func6(phys_pc, pc, flags, trace_vcpu_dstate);
+    return tb_hash_func7(phys_pc, pc, flags, cf_mask, trace_vcpu_dstate);
 }
 
 #endif
diff --git a/include/exec/tb-lookup.h b/include/exec/tb-lookup.h
index 436b6d5..2961385 100644
--- a/include/exec/tb-lookup.h
+++ b/include/exec/tb-lookup.h
@@ -21,7 +21,7 @@
 /* Might cause an exception, so have a longjmp destination ready */
 static inline TranslationBlock *
 tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, target_ulong *cs_base,
-                     uint32_t *flags)
+                     uint32_t *flags, uint32_t cf_mask)
 {
     CPUArchState *env = (CPUArchState *)cpu->env_ptr;
     TranslationBlock *tb;
@@ -35,10 +35,10 @@ tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, target_ulong *cs_base,
                tb->cs_base == *cs_base &&
                tb->flags == *flags &&
                tb->trace_vcpu_dstate == *cpu->trace_dstate &&
-               !(atomic_read(&tb->cflags) & CF_INVALID))) {
+               (tb_cflags(tb) & (CF_HASH_MASK | CF_INVALID)) == cf_mask)) {
         return tb;
     }
-    tb = tb_htable_lookup(cpu, *pc, *cs_base, *flags);
+    tb = tb_htable_lookup(cpu, *pc, *cs_base, *flags, cf_mask);
     if (tb == NULL) {
         return NULL;
     }
diff --git a/tcg/tcg.h b/tcg/tcg.h
index da78721..96872f8 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -730,7 +730,6 @@ struct TCGContext {
 };
 
 extern TCGContext tcg_ctx;
-extern bool parallel_cpus;
 
 static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)
 {
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index fae8c40..b71e015 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -207,7 +207,8 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
     tb_lock();
     tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
                      max_cycles | CF_NOCACHE
-                         | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
+                         | (ignore_icount ? CF_IGNORE_ICOUNT : 0)
+                         | curr_cflags());
     tb->orig_tb = orig_tb;
     tb_unlock();
 
@@ -225,31 +226,27 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
 static void cpu_exec_step(CPUState *cpu)
 {
     CPUClass *cc = CPU_GET_CLASS(cpu);
-    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
     TranslationBlock *tb;
     target_ulong cs_base, pc;
     uint32_t flags;
+    uint32_t cflags = 1 | CF_IGNORE_ICOUNT;
 
-    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
     if (sigsetjmp(cpu->jmp_env, 0) == 0) {
-        mmap_lock();
-        tb_lock();
-        tb = tb_gen_code(cpu, pc, cs_base, flags,
-                         1 | CF_NOCACHE | CF_IGNORE_ICOUNT);
-        tb->orig_tb = NULL;
-        tb_unlock();
-        mmap_unlock();
+        tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags,
+                                  cflags & CF_HASH_MASK);
+        if (tb == NULL) {
+            mmap_lock();
+            tb_lock();
+            tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
+            tb_unlock();
+            mmap_unlock();
+        }
 
         cc->cpu_exec_enter(cpu);
         /* execute the generated code */
-        trace_exec_tb_nocache(tb, pc);
+        trace_exec_tb(tb, pc);
         cpu_tb_exec(cpu, tb);
         cc->cpu_exec_exit(cpu);
-
-        tb_lock();
-        tb_phys_invalidate(tb, -1);
-        tb_free(tb);
-        tb_unlock();
     } else {
         /* We may have exited due to another problem here, so we need
          * to reset any tb_locks we may have taken but didn't release.
@@ -281,6 +278,7 @@ struct tb_desc {
     CPUArchState *env;
     tb_page_addr_t phys_page1;
     uint32_t flags;
+    uint32_t cf_mask;
     uint32_t trace_vcpu_dstate;
 };
 
@@ -294,7 +292,7 @@ static bool tb_cmp(const void *p, const void *d)
         tb->cs_base == desc->cs_base &&
         tb->flags == desc->flags &&
         tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
-        !(atomic_read(&tb->cflags) & CF_INVALID)) {
+        (tb_cflags(tb) & (CF_HASH_MASK | CF_INVALID)) == desc->cf_mask) {
         /* check next page if needed */
         if (tb->page_addr[1] == -1) {
             return true;
@@ -313,7 +311,8 @@ static bool tb_cmp(const void *p, const void *d)
 }
 
 TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
-                                   target_ulong cs_base, uint32_t flags)
+                                   target_ulong cs_base, uint32_t flags,
+                                   uint32_t cf_mask)
 {
     tb_page_addr_t phys_pc;
     struct tb_desc desc;
@@ -322,11 +321,12 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
     desc.env = (CPUArchState *)cpu->env_ptr;
     desc.cs_base = cs_base;
     desc.flags = flags;
+    desc.cf_mask = cf_mask;
     desc.trace_vcpu_dstate = *cpu->trace_dstate;
     desc.pc = pc;
     phys_pc = get_page_addr_code(desc.env, pc);
     desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
-    h = tb_hash_func(phys_pc, pc, flags, *cpu->trace_dstate);
+    h = tb_hash_func(phys_pc, pc, flags, cf_mask, *cpu->trace_dstate);
     return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
 }
 
@@ -338,8 +338,9 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
     target_ulong cs_base, pc;
     uint32_t flags;
     bool acquired_tb_lock = false;
+    uint32_t cf_mask = curr_cflags();
 
-    tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags);
+    tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
     if (tb == NULL) {
         /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
          * taken outside tb_lock. As system emulation is currently
@@ -352,10 +353,10 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
         /* There's a chance that our desired tb has been translated while
          * taking the locks so we check again inside the lock.
          */
-        tb = tb_htable_lookup(cpu, pc, cs_base, flags);
+        tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask);
         if (likely(tb == NULL)) {
             /* if no translated code available, then translate it now */
-            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+            tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask);
         }
 
         mmap_unlock();
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 7ef4f19..600c0a1 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1077,7 +1077,8 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 
     /* remove the TB from the hash list */
     phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
-    h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate);
+    h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
+                     tb->trace_vcpu_dstate);
     qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);
 
     /* remove the TB from the page list */
@@ -1222,7 +1223,8 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
     }
 
     /* add in the hash table */
-    h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->trace_vcpu_dstate);
+    h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
+                     tb->trace_vcpu_dstate);
     qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
 
 #ifdef DEBUG_TB_CHECK
@@ -1503,7 +1505,8 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
         /* we generate a block containing just the instruction
            modifying the memory. It will ensure that it cannot modify
            itself */
-        tb_gen_code(cpu, current_pc, current_cs_base, current_flags, 1);
+        tb_gen_code(cpu, current_pc, current_cs_base, current_flags,
+                    1 | curr_cflags());
         cpu_loop_exit_noexc(cpu);
     }
 #endif
@@ -1621,7 +1624,8 @@ static bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
         /* we generate a block containing just the instruction
            modifying the memory. It will ensure that it cannot modify
            itself */
-        tb_gen_code(cpu, current_pc, current_cs_base, current_flags, 1);
+        tb_gen_code(cpu, current_pc, current_cs_base, current_flags,
+                    1 | curr_cflags());
         /* tb_lock will be reset after cpu_loop_exit_noexc longjmps
          * back into the cpu_exec loop. */
         return true;
@@ -1765,6 +1769,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
     }
 
     cflags = n | CF_LAST_IO;
+    cflags |= curr_cflags();
     pc = tb->pc;
     cs_base = tb->cs_base;
     flags = tb->flags;
diff --git a/exec.c b/exec.c
index 01ac21e..94b0f3e 100644
--- a/exec.c
+++ b/exec.c
@@ -2433,7 +2433,7 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
                     cpu_loop_exit(cpu);
                 } else {
                     cpu_get_tb_cpu_state(env, &pc, &cs_base, &cpu_flags);
-                    tb_gen_code(cpu, pc, cs_base, cpu_flags, 1);
+                    tb_gen_code(cpu, pc, cs_base, cpu_flags, 1 | curr_cflags());
                     cpu_loop_exit_noexc(cpu);
                 }
             }
diff --git a/tcg/tcg-runtime.c b/tcg/tcg-runtime.c
index 7100339..4f873a9 100644
--- a/tcg/tcg-runtime.c
+++ b/tcg/tcg-runtime.c
@@ -151,7 +151,7 @@ void *HELPER(lookup_tb_ptr)(CPUArchState *env)
     target_ulong cs_base, pc;
     uint32_t flags;
 
-    tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags);
+    tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, curr_cflags());
     if (tb == NULL) {
         return tcg_ctx.code_gen_epilogue;
     }
diff --git a/tests/qht-bench.c b/tests/qht-bench.c
index 11c1cec..4cabdfd 100644
--- a/tests/qht-bench.c
+++ b/tests/qht-bench.c
@@ -103,7 +103,7 @@ static bool is_equal(const void *obj, const void *userp)
 
 static inline uint32_t h(unsigned long v)
 {
-    return tb_hash_func6(v, 0, 0, 0);
+    return tb_hash_func7(v, 0, 0, 0, 0);
 }
 
 /*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 12/43] tcg: convert tb->cflags reads to tb_cflags(tb)
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (10 preceding siblings ...)
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 11/43] tcg: define CF_PARALLEL and use it for TB hashing Emilio G. Cota
@ 2017-07-20  3:08 ` Emilio G. Cota
  2017-07-20  7:22   ` Richard Henderson
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 13/43] target/arm: check CF_PARALLEL instead of parallel_cpus Emilio G. Cota
                   ` (31 subsequent siblings)
  43 siblings, 1 reply; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Convert all existing readers of tb->cflags to tb_cflags, so that we
use atomic_read and therefore avoid undefined behaviour in C11.

Note that the remaining setters/getters of the field are protected
by tb_lock, and therefore do not need conversion.

Luckily all readers access the field via 'tb->cflags' (so no foo.cflags,
bar->cflags in the code base), which makes the conversion easily
scriptable:
FILES=$(git grep 'tb->cflags' target include/exec/gen-icount.h | \
	cut -f 1 -d':' | sort | uniq)
perl -pi -e 's/([^>])tb->cflags/$1tb_cflags(tb)/g' $FILES
perl -pi -e 's/([a-z]*)->tb->cflags/tb_cflags($1->tb)/g' $FILES

Then manually fixed the few errors that checkpatch reported.

Compile-tested for all targets.

Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/gen-icount.h     |  8 +++----
 target/alpha/translate.c      | 12 +++++-----
 target/arm/translate-a64.c    | 13 +++++-----
 target/arm/translate.c        | 10 ++++----
 target/cris/translate.c       |  6 ++---
 target/hppa/translate.c       |  8 +++----
 target/i386/translate.c       | 55 ++++++++++++++++++++++---------------------
 target/lm32/translate.c       | 14 +++++------
 target/m68k/translate.c       |  6 ++---
 target/microblaze/translate.c |  6 ++---
 target/mips/translate.c       | 26 ++++++++++----------
 target/moxie/translate.c      |  2 +-
 target/nios2/translate.c      |  6 ++---
 target/openrisc/translate.c   |  6 ++---
 target/ppc/translate.c        |  6 ++---
 target/ppc/translate_init.c   | 32 ++++++++++++-------------
 target/s390x/translate.c      |  8 +++----
 target/sh4/translate.c        |  6 ++---
 target/sparc/translate.c      |  6 ++---
 target/tilegx/translate.c     |  2 +-
 target/tricore/translate.c    |  2 +-
 target/unicore32/translate.c  |  6 ++---
 target/xtensa/translate.c     | 28 +++++++++++-----------
 23 files changed, 138 insertions(+), 136 deletions(-)

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 9b3cb14..48b566c 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -13,7 +13,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
     TCGv_i32 count, imm;
 
     exitreq_label = gen_new_label();
-    if (tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(tb) & CF_USE_ICOUNT) {
         count = tcg_temp_local_new_i32();
     } else {
         count = tcg_temp_new_i32();
@@ -22,7 +22,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
     tcg_gen_ld_i32(count, tcg_ctx.tcg_env,
                    -ENV_OFFSET + offsetof(CPUState, icount_decr.u32));
 
-    if (tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(tb) & CF_USE_ICOUNT) {
         imm = tcg_temp_new_i32();
         /* We emit a movi with a dummy immediate argument. Keep the insn index
          * of the movi so that we later (when we know the actual insn count)
@@ -36,7 +36,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
 
     tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, exitreq_label);
 
-    if (tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(tb) & CF_USE_ICOUNT) {
         tcg_gen_st16_i32(count, tcg_ctx.tcg_env,
                          -ENV_OFFSET + offsetof(CPUState, icount_decr.u16.low));
     }
@@ -46,7 +46,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
 
 static inline void gen_tb_end(TranslationBlock *tb, int num_insns)
 {
-    if (tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(tb) & CF_USE_ICOUNT) {
         /* Update the num_insn immediate parameter now that we know
          * the actual insn count.  */
         tcg_set_insn_param(icount_start_insn_idx, 1, num_insns);
diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 9e98312..f97a8e5 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -484,9 +484,9 @@ static bool in_superpage(DisasContext *ctx, int64_t addr)
 
 static bool use_exit_tb(DisasContext *ctx)
 {
-    return ((ctx->tb->cflags & CF_LAST_IO)
+    return (tb_cflags(ctx->tb) & CF_LAST_IO)
             || ctx->singlestep_enabled
-            || singlestep);
+            || singlestep;
 }
 
 static bool use_goto_tb(DisasContext *ctx, uint64_t dest)
@@ -2430,7 +2430,7 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
         case 0xC000:
             /* RPCC */
             va = dest_gpr(ctx, ra);
-            if (ctx->tb->cflags & CF_USE_ICOUNT) {
+            if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
                 gen_io_start();
                 gen_helper_load_pcc(va, cpu_env);
                 gen_io_end();
@@ -2998,7 +2998,7 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
     TCGV_UNUSED_I64(ctx.lit);
 
     num_insns = 0;
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
@@ -3028,7 +3028,7 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
             ctx.pc += 4;
             break;
         }
-        if (num_insns == max_insns && (tb->cflags & CF_LAST_IO)) {
+        if (num_insns == max_insns && (tb_cflags(tb) & CF_LAST_IO)) {
             gen_io_start();
         }
         insn = cpu_ldl_code(env, ctx.pc);
@@ -3053,7 +3053,7 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
         }
     } while (ret == NO_EXIT);
 
-    if (tb->cflags & CF_LAST_IO) {
+    if (tb_cflags(tb) & CF_LAST_IO) {
         gen_io_end();
     }
 
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 818d7eb..685f1b0 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -348,7 +348,8 @@ static inline bool use_goto_tb(DisasContext *s, int n, uint64_t dest)
     /* No direct tb linking with singlestep (either QEMU's or the ARM
      * debug architecture kind) or deterministic io
      */
-    if (s->singlestep_enabled || s->ss_active || (s->tb->cflags & CF_LAST_IO)) {
+    if (s->singlestep_enabled || s->ss_active ||
+        (tb_cflags(s->tb) & CF_LAST_IO)) {
         return false;
     }
 
@@ -1559,7 +1560,7 @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
         break;
     }
 
-    if ((s->tb->cflags & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
+    if ((tb_cflags(s->tb) & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
         gen_io_start();
     }
 
@@ -1590,7 +1591,7 @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
         }
     }
 
-    if ((s->tb->cflags & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
+    if ((tb_cflags(s->tb) & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
         /* I/O operations must end the TB here (whether read or write) */
         gen_io_end();
         s->is_jmp = DISAS_UPDATE;
@@ -11258,7 +11259,7 @@ void gen_intermediate_code_a64(ARMCPU *cpu, TranslationBlock *tb)
 
     next_page_start = (pc_start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
     num_insns = 0;
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
@@ -11299,7 +11300,7 @@ void gen_intermediate_code_a64(ARMCPU *cpu, TranslationBlock *tb)
             }
         }
 
-        if (num_insns == max_insns && (tb->cflags & CF_LAST_IO)) {
+        if (num_insns == max_insns && (tb_cflags(tb) & CF_LAST_IO)) {
             gen_io_start();
         }
 
@@ -11340,7 +11341,7 @@ void gen_intermediate_code_a64(ARMCPU *cpu, TranslationBlock *tb)
              dc->pc < next_page_start &&
              num_insns < max_insns);
 
-    if (tb->cflags & CF_LAST_IO) {
+    if (tb_cflags(tb) & CF_LAST_IO) {
         gen_io_end();
     }
 
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 964b627..ccfb428 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7655,7 +7655,7 @@ static int disas_coproc_insn(DisasContext *s, uint32_t insn)
             break;
         }
 
-        if ((s->tb->cflags & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
+        if ((tb_cflags(s->tb) & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
             gen_io_start();
         }
 
@@ -7746,7 +7746,7 @@ static int disas_coproc_insn(DisasContext *s, uint32_t insn)
             }
         }
 
-        if ((s->tb->cflags & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
+        if ((tb_cflags(s->tb) & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
             /* I/O operations must end the TB here (whether read or write) */
             gen_io_end();
             gen_lookup_tb(s);
@@ -11876,7 +11876,7 @@ void gen_intermediate_code(CPUARMState *env, TranslationBlock *tb)
     cpu_M0 = tcg_temp_new_i64();
     next_page_start = (pc_start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
     num_insns = 0;
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
@@ -11971,7 +11971,7 @@ void gen_intermediate_code(CPUARMState *env, TranslationBlock *tb)
             }
         }
 
-        if (num_insns == max_insns && (tb->cflags & CF_LAST_IO)) {
+        if (num_insns == max_insns && (tb_cflags(tb) & CF_LAST_IO)) {
             gen_io_start();
         }
 
@@ -12041,7 +12041,7 @@ void gen_intermediate_code(CPUARMState *env, TranslationBlock *tb)
              !end_of_page &&
              num_insns < max_insns);
 
-    if (tb->cflags & CF_LAST_IO) {
+    if (tb_cflags(tb) & CF_LAST_IO) {
         if (dc->condjmp) {
             /* FIXME:  This can theoretically happen with self-modifying
                code.  */
diff --git a/target/cris/translate.c b/target/cris/translate.c
index 0ee05ca..1703d91 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -3137,7 +3137,7 @@ void gen_intermediate_code(CPUCRISState *env, struct TranslationBlock *tb)
 
     next_page_start = (pc_start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
     num_insns = 0;
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
@@ -3167,7 +3167,7 @@ void gen_intermediate_code(CPUCRISState *env, struct TranslationBlock *tb)
         /* Pretty disas.  */
         LOG_DIS("%8.8x:\t", dc->pc);
 
-        if (num_insns == max_insns && (tb->cflags & CF_LAST_IO)) {
+        if (num_insns == max_insns && (tb_cflags(tb) & CF_LAST_IO)) {
             gen_io_start();
         }
         dc->clear_x = 1;
@@ -3240,7 +3240,7 @@ void gen_intermediate_code(CPUCRISState *env, struct TranslationBlock *tb)
 
     npc = dc->pc;
 
-        if (tb->cflags & CF_LAST_IO)
+        if (tb_cflags(tb) & CF_LAST_IO)
             gen_io_end();
     /* Force an update if the per-tb cpu state has changed.  */
     if (dc->is_jmp == DISAS_NEXT
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 91053e2..1effe82 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -487,7 +487,7 @@ static ExitStatus gen_illegal(DisasContext *ctx)
 static bool use_goto_tb(DisasContext *ctx, target_ulong dest)
 {
     /* Suppress goto_tb in the case of single-steping and IO.  */
-    if ((ctx->tb->cflags & CF_LAST_IO) || ctx->singlestep_enabled) {
+    if ((tb_cflags(ctx->tb) & CF_LAST_IO) || ctx->singlestep_enabled) {
         return false;
     }
     return true;
@@ -3762,7 +3762,7 @@ void gen_intermediate_code(CPUHPPAState *env, struct TranslationBlock *tb)
     /* Compute the maximum number of insns to execute, as bounded by
        (1) icount, (2) single-stepping, (3) branch delay slots, or
        (4) the number of insns remaining on the current page.  */
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
@@ -3792,7 +3792,7 @@ void gen_intermediate_code(CPUHPPAState *env, struct TranslationBlock *tb)
             ret = gen_excp(&ctx, EXCP_DEBUG);
             break;
         }
-        if (num_insns == max_insns && (tb->cflags & CF_LAST_IO)) {
+        if (num_insns == max_insns && (tb_cflags(tb) & CF_LAST_IO)) {
             gen_io_start();
         }
 
@@ -3868,7 +3868,7 @@ void gen_intermediate_code(CPUHPPAState *env, struct TranslationBlock *tb)
         }
     } while (ret == NO_EXIT);
 
-    if (tb->cflags & CF_LAST_IO) {
+    if (tb_cflags(tb) & CF_LAST_IO) {
         gen_io_end();
     }
 
diff --git a/target/i386/translate.c b/target/i386/translate.c
index 291c577..f046ffa 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -1119,7 +1119,7 @@ static void gen_bpt_io(DisasContext *s, TCGv_i32 t_port, int ot)
 
 static inline void gen_ins(DisasContext *s, TCGMemOp ot)
 {
-    if (s->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
         gen_io_start();
     }
     gen_string_movl_A0_EDI(s);
@@ -1134,14 +1134,14 @@ static inline void gen_ins(DisasContext *s, TCGMemOp ot)
     gen_op_movl_T0_Dshift(ot);
     gen_op_add_reg_T0(s->aflag, R_EDI);
     gen_bpt_io(s, cpu_tmp2_i32, ot);
-    if (s->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
         gen_io_end();
     }
 }
 
 static inline void gen_outs(DisasContext *s, TCGMemOp ot)
 {
-    if (s->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
         gen_io_start();
     }
     gen_string_movl_A0_ESI(s);
@@ -1154,7 +1154,7 @@ static inline void gen_outs(DisasContext *s, TCGMemOp ot)
     gen_op_movl_T0_Dshift(ot);
     gen_op_add_reg_T0(s->aflag, R_ESI);
     gen_bpt_io(s, cpu_tmp2_i32, ot);
-    if (s->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
         gen_io_end();
     }
 }
@@ -6299,7 +6299,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             gen_repz_ins(s, ot, pc_start - s->cs_base, s->pc - s->cs_base);
         } else {
             gen_ins(s, ot);
-            if (s->tb->cflags & CF_USE_ICOUNT) {
+            if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
                 gen_jmp(s, s->pc - s->cs_base);
             }
         }
@@ -6314,7 +6314,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             gen_repz_outs(s, ot, pc_start - s->cs_base, s->pc - s->cs_base);
         } else {
             gen_outs(s, ot);
-            if (s->tb->cflags & CF_USE_ICOUNT) {
+            if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
                 gen_jmp(s, s->pc - s->cs_base);
             }
         }
@@ -6330,14 +6330,14 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         tcg_gen_movi_tl(cpu_T0, val);
         gen_check_io(s, ot, pc_start - s->cs_base,
                      SVM_IOIO_TYPE_MASK | svm_is_rep(prefixes));
-        if (s->tb->cflags & CF_USE_ICOUNT) {
+        if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
             gen_io_start();
 	}
         tcg_gen_movi_i32(cpu_tmp2_i32, val);
         gen_helper_in_func(ot, cpu_T1, cpu_tmp2_i32);
         gen_op_mov_reg_v(ot, R_EAX, cpu_T1);
         gen_bpt_io(s, cpu_tmp2_i32, ot);
-        if (s->tb->cflags & CF_USE_ICOUNT) {
+        if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
             gen_io_end();
             gen_jmp(s, s->pc - s->cs_base);
         }
@@ -6351,14 +6351,14 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                      svm_is_rep(prefixes));
         gen_op_mov_v_reg(ot, cpu_T1, R_EAX);
 
-        if (s->tb->cflags & CF_USE_ICOUNT) {
+        if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
             gen_io_start();
 	}
         tcg_gen_movi_i32(cpu_tmp2_i32, val);
         tcg_gen_trunc_tl_i32(cpu_tmp3_i32, cpu_T1);
         gen_helper_out_func(ot, cpu_tmp2_i32, cpu_tmp3_i32);
         gen_bpt_io(s, cpu_tmp2_i32, ot);
-        if (s->tb->cflags & CF_USE_ICOUNT) {
+        if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
             gen_io_end();
             gen_jmp(s, s->pc - s->cs_base);
         }
@@ -6369,14 +6369,14 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         tcg_gen_ext16u_tl(cpu_T0, cpu_regs[R_EDX]);
         gen_check_io(s, ot, pc_start - s->cs_base,
                      SVM_IOIO_TYPE_MASK | svm_is_rep(prefixes));
-        if (s->tb->cflags & CF_USE_ICOUNT) {
+        if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
             gen_io_start();
 	}
         tcg_gen_trunc_tl_i32(cpu_tmp2_i32, cpu_T0);
         gen_helper_in_func(ot, cpu_T1, cpu_tmp2_i32);
         gen_op_mov_reg_v(ot, R_EAX, cpu_T1);
         gen_bpt_io(s, cpu_tmp2_i32, ot);
-        if (s->tb->cflags & CF_USE_ICOUNT) {
+        if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
             gen_io_end();
             gen_jmp(s, s->pc - s->cs_base);
         }
@@ -6389,14 +6389,14 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                      svm_is_rep(prefixes));
         gen_op_mov_v_reg(ot, cpu_T1, R_EAX);
 
-        if (s->tb->cflags & CF_USE_ICOUNT) {
+        if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
             gen_io_start();
 	}
         tcg_gen_trunc_tl_i32(cpu_tmp2_i32, cpu_T0);
         tcg_gen_trunc_tl_i32(cpu_tmp3_i32, cpu_T1);
         gen_helper_out_func(ot, cpu_tmp2_i32, cpu_tmp3_i32);
         gen_bpt_io(s, cpu_tmp2_i32, ot);
-        if (s->tb->cflags & CF_USE_ICOUNT) {
+        if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
             gen_io_end();
             gen_jmp(s, s->pc - s->cs_base);
         }
@@ -7104,11 +7104,11 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
     case 0x131: /* rdtsc */
         gen_update_cc_op(s);
         gen_jmp_im(pc_start - s->cs_base);
-        if (s->tb->cflags & CF_USE_ICOUNT) {
+        if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
             gen_io_start();
 	}
         gen_helper_rdtsc(cpu_env);
-        if (s->tb->cflags & CF_USE_ICOUNT) {
+        if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
             gen_io_end();
             gen_jmp(s, s->pc - s->cs_base);
         }
@@ -7563,11 +7563,11 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             }
             gen_update_cc_op(s);
             gen_jmp_im(pc_start - s->cs_base);
-            if (s->tb->cflags & CF_USE_ICOUNT) {
+            if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
                 gen_io_start();
             }
             gen_helper_rdtscp(cpu_env);
-            if (s->tb->cflags & CF_USE_ICOUNT) {
+            if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
                 gen_io_end();
                 gen_jmp(s, s->pc - s->cs_base);
             }
@@ -7932,24 +7932,24 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 gen_update_cc_op(s);
                 gen_jmp_im(pc_start - s->cs_base);
                 if (b & 2) {
-                    if (s->tb->cflags & CF_USE_ICOUNT) {
+                    if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
                         gen_io_start();
                     }
                     gen_op_mov_v_reg(ot, cpu_T0, rm);
                     gen_helper_write_crN(cpu_env, tcg_const_i32(reg),
                                          cpu_T0);
-                    if (s->tb->cflags & CF_USE_ICOUNT) {
+                    if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
                         gen_io_end();
                     }
                     gen_jmp_im(s->pc - s->cs_base);
                     gen_eob(s);
                 } else {
-                    if (s->tb->cflags & CF_USE_ICOUNT) {
+                    if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
                         gen_io_start();
                     }
                     gen_helper_read_crN(cpu_T0, cpu_env, tcg_const_i32(reg));
                     gen_op_mov_reg_v(ot, rm, cpu_T0);
-                    if (s->tb->cflags & CF_USE_ICOUNT) {
+                    if (tb_cflags(s->tb) & CF_USE_ICOUNT) {
                         gen_io_end();
                     }
                 }
@@ -8431,7 +8431,7 @@ void gen_intermediate_code(CPUX86State *env, TranslationBlock *tb)
        record/replay modes and there will always be an
        additional step for ecx=0 when icount is enabled.
      */
-    dc->repz_opt = !dc->jmp_opt && !(tb->cflags & CF_USE_ICOUNT);
+    dc->repz_opt = !dc->jmp_opt && !(tb_cflags(tb) & CF_USE_ICOUNT);
 #if 0
     /* check addseg logic */
     if (!dc->addseg && (dc->vm86 || !dc->pe || !dc->code32))
@@ -8454,7 +8454,7 @@ void gen_intermediate_code(CPUX86State *env, TranslationBlock *tb)
     dc->is_jmp = DISAS_NEXT;
     pc_ptr = pc_start;
     num_insns = 0;
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
@@ -8479,7 +8479,7 @@ void gen_intermediate_code(CPUX86State *env, TranslationBlock *tb)
             pc_ptr += 1;
             goto done_generating;
         }
-        if (num_insns == max_insns && (tb->cflags & CF_LAST_IO)) {
+        if (num_insns == max_insns && (tb_cflags(tb) & CF_LAST_IO)) {
             gen_io_start();
         }
 
@@ -8504,7 +8504,7 @@ void gen_intermediate_code(CPUX86State *env, TranslationBlock *tb)
            If current instruction already crossed the bound - it's ok,
            because an exception hasn't stopped this code.
          */
-        if ((tb->cflags & CF_USE_ICOUNT)
+        if ((tb_cflags(tb) & CF_USE_ICOUNT)
             && ((pc_ptr & TARGET_PAGE_MASK)
                 != ((pc_ptr + TARGET_MAX_INSN_SIZE - 1) & TARGET_PAGE_MASK)
                 || (pc_ptr & ~TARGET_PAGE_MASK) == 0)) {
@@ -8526,8 +8526,9 @@ void gen_intermediate_code(CPUX86State *env, TranslationBlock *tb)
             break;
         }
     }
-    if (tb->cflags & CF_LAST_IO)
+    if (tb_cflags(tb) & CF_LAST_IO) {
         gen_io_end();
+    }
 done_generating:
     gen_tb_end(tb, num_insns);
 
diff --git a/target/lm32/translate.c b/target/lm32/translate.c
index 692882f..3597c61 100644
--- a/target/lm32/translate.c
+++ b/target/lm32/translate.c
@@ -874,24 +874,24 @@ static void dec_wcsr(DisasContext *dc)
         break;
     case CSR_IM:
         /* mark as an io operation because it could cause an interrupt */
-        if (dc->tb->cflags & CF_USE_ICOUNT) {
+        if (tb_cflags(dc->tb) & CF_USE_ICOUNT) {
             gen_io_start();
         }
         gen_helper_wcsr_im(cpu_env, cpu_R[dc->r1]);
         tcg_gen_movi_tl(cpu_pc, dc->pc + 4);
-        if (dc->tb->cflags & CF_USE_ICOUNT) {
+        if (tb_cflags(dc->tb) & CF_USE_ICOUNT) {
             gen_io_end();
         }
         dc->is_jmp = DISAS_UPDATE;
         break;
     case CSR_IP:
         /* mark as an io operation because it could cause an interrupt */
-        if (dc->tb->cflags & CF_USE_ICOUNT) {
+        if (tb_cflags(dc->tb) & CF_USE_ICOUNT) {
             gen_io_start();
         }
         gen_helper_wcsr_ip(cpu_env, cpu_R[dc->r1]);
         tcg_gen_movi_tl(cpu_pc, dc->pc + 4);
-        if (dc->tb->cflags & CF_USE_ICOUNT) {
+        if (tb_cflags(dc->tb) & CF_USE_ICOUNT) {
             gen_io_end();
         }
         dc->is_jmp = DISAS_UPDATE;
@@ -1072,7 +1072,7 @@ void gen_intermediate_code(CPULM32State *env, struct TranslationBlock *tb)
 
     next_page_start = (pc_start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
     num_insns = 0;
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
@@ -1100,7 +1100,7 @@ void gen_intermediate_code(CPULM32State *env, struct TranslationBlock *tb)
         /* Pretty disas.  */
         LOG_DIS("%8.8x:\t", dc->pc);
 
-        if (num_insns == max_insns && (tb->cflags & CF_LAST_IO)) {
+        if (num_insns == max_insns && (tb_cflags(tb) & CF_LAST_IO)) {
             gen_io_start();
         }
 
@@ -1113,7 +1113,7 @@ void gen_intermediate_code(CPULM32State *env, struct TranslationBlock *tb)
          && (dc->pc < next_page_start)
          && num_insns < max_insns);
 
-    if (tb->cflags & CF_LAST_IO) {
+    if (tb_cflags(tb) & CF_LAST_IO) {
         gen_io_end();
     }
 
diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index 3a519b7..188520b 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -5544,7 +5544,7 @@ void gen_intermediate_code(CPUM68KState *env, TranslationBlock *tb)
     dc->done_mac = 0;
     dc->writeback_mask = 0;
     num_insns = 0;
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
@@ -5570,7 +5570,7 @@ void gen_intermediate_code(CPUM68KState *env, TranslationBlock *tb)
             break;
         }
 
-        if (num_insns == max_insns && (tb->cflags & CF_LAST_IO)) {
+        if (num_insns == max_insns && (tb_cflags(tb) & CF_LAST_IO)) {
             gen_io_start();
         }
 
@@ -5582,7 +5582,7 @@ void gen_intermediate_code(CPUM68KState *env, TranslationBlock *tb)
              (pc_offset) < (TARGET_PAGE_SIZE - 32) &&
              num_insns < max_insns);
 
-    if (tb->cflags & CF_LAST_IO)
+    if (tb_cflags(tb) & CF_LAST_IO)
         gen_io_end();
     if (unlikely(cs->singlestep_enabled)) {
         /* Make sure the pc is updated, and raise a debug exception.  */
diff --git a/target/microblaze/translate.c b/target/microblaze/translate.c
index cb65d1e..4cd184e 100644
--- a/target/microblaze/translate.c
+++ b/target/microblaze/translate.c
@@ -1660,7 +1660,7 @@ void gen_intermediate_code(CPUMBState *env, struct TranslationBlock *tb)
 
     next_page_start = (pc_start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
     num_insns = 0;
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
@@ -1695,7 +1695,7 @@ void gen_intermediate_code(CPUMBState *env, struct TranslationBlock *tb)
         /* Pretty disas.  */
         LOG_DIS("%8.8x:\t", dc->pc);
 
-        if (num_insns == max_insns && (tb->cflags & CF_LAST_IO)) {
+        if (num_insns == max_insns && (tb_cflags(tb) & CF_LAST_IO)) {
             gen_io_start();
         }
 
@@ -1757,7 +1757,7 @@ void gen_intermediate_code(CPUMBState *env, struct TranslationBlock *tb)
             npc = dc->jmp_pc;
     }
 
-    if (tb->cflags & CF_LAST_IO)
+    if (tb_cflags(tb) & CF_LAST_IO)
         gen_io_end();
     /* Force an update if the per-tb cpu state has changed.  */
     if (dc->is_jmp == DISAS_NEXT
diff --git a/target/mips/translate.c b/target/mips/translate.c
index 28c9fbd..f839a2b 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -5238,11 +5238,11 @@ static void gen_mfc0(DisasContext *ctx, TCGv arg, int reg, int sel)
         switch (sel) {
         case 0:
             /* Mark as an IO operation because we read the time.  */
-            if (ctx->tb->cflags & CF_USE_ICOUNT) {
+            if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
                 gen_io_start();
 	    }
             gen_helper_mfc0_count(arg, cpu_env);
-            if (ctx->tb->cflags & CF_USE_ICOUNT) {
+            if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
                 gen_io_end();
             }
             /* Break the TB to be able to take timer interrupts immediately
@@ -5642,7 +5642,7 @@ static void gen_mtc0(DisasContext *ctx, TCGv arg, int reg, int sel)
     if (sel != 0)
         check_insn(ctx, ISA_MIPS32);
 
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_start();
     }
 
@@ -6291,7 +6291,7 @@ static void gen_mtc0(DisasContext *ctx, TCGv arg, int reg, int sel)
     trace_mips_translate_c0("mtc0", rn, reg, sel);
 
     /* For simplicity assume that all writes can cause interrupts.  */
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_end();
         ctx->bstate = BS_STOP;
     }
@@ -6551,11 +6551,11 @@ static void gen_dmfc0(DisasContext *ctx, TCGv arg, int reg, int sel)
         switch (sel) {
         case 0:
             /* Mark as an IO operation because we read the time.  */
-            if (ctx->tb->cflags & CF_USE_ICOUNT) {
+            if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
                 gen_io_start();
             }
             gen_helper_mfc0_count(arg, cpu_env);
-            if (ctx->tb->cflags & CF_USE_ICOUNT) {
+            if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
                 gen_io_end();
             }
             /* Break the TB to be able to take timer interrupts immediately
@@ -6942,7 +6942,7 @@ static void gen_dmtc0(DisasContext *ctx, TCGv arg, int reg, int sel)
     if (sel != 0)
         check_insn(ctx, ISA_MIPS64);
 
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_start();
     }
 
@@ -7259,11 +7259,11 @@ static void gen_dmtc0(DisasContext *ctx, TCGv arg, int reg, int sel)
             save_cpu_state(ctx, 1);
             /* Mark as an IO operation because we may trigger a software
                interrupt.  */
-            if (ctx->tb->cflags & CF_USE_ICOUNT) {
+            if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
                 gen_io_start();
             }
             gen_helper_mtc0_cause(cpu_env, arg);
-            if (ctx->tb->cflags & CF_USE_ICOUNT) {
+            if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
                 gen_io_end();
             }
             /* Stop translation as we may have triggered an intetrupt */
@@ -7589,7 +7589,7 @@ static void gen_dmtc0(DisasContext *ctx, TCGv arg, int reg, int sel)
     trace_mips_translate_c0("dmtc0", rn, reg, sel);
 
     /* For simplicity assume that all writes can cause interrupts.  */
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_end();
         ctx->bstate = BS_STOP;
     }
@@ -19937,7 +19937,7 @@ void gen_intermediate_code(CPUMIPSState *env, struct TranslationBlock *tb)
     ctx.default_tcg_memop_mask = (ctx.insn_flags & ISA_MIPS32R6) ?
                                  MO_UNALN : MO_ALIGN;
     num_insns = 0;
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
@@ -19963,7 +19963,7 @@ void gen_intermediate_code(CPUMIPSState *env, struct TranslationBlock *tb)
             goto done_generating;
         }
 
-        if (num_insns == max_insns && (tb->cflags & CF_LAST_IO)) {
+        if (num_insns == max_insns && (tb_cflags(tb) & CF_LAST_IO)) {
             gen_io_start();
         }
 
@@ -20024,7 +20024,7 @@ void gen_intermediate_code(CPUMIPSState *env, struct TranslationBlock *tb)
         if (singlestep)
             break;
     }
-    if (tb->cflags & CF_LAST_IO) {
+    if (tb_cflags(tb) & CF_LAST_IO) {
         gen_io_end();
     }
     if (cs->singlestep_enabled && ctx.bstate != BS_BRANCH) {
diff --git a/target/moxie/translate.c b/target/moxie/translate.c
index 0660b44..f61aa2d 100644
--- a/target/moxie/translate.c
+++ b/target/moxie/translate.c
@@ -838,7 +838,7 @@ void gen_intermediate_code(CPUMoxieState *env, struct TranslationBlock *tb)
     ctx.singlestep_enabled = 0;
     ctx.bstate = BS_NONE;
     num_insns = 0;
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
diff --git a/target/nios2/translate.c b/target/nios2/translate.c
index 2f3c2e5..e74e070 100644
--- a/target/nios2/translate.c
+++ b/target/nios2/translate.c
@@ -822,7 +822,7 @@ void gen_intermediate_code(CPUNios2State *env, TranslationBlock *tb)
         max_insns = 1;
     } else {
         int page_insns = (TARGET_PAGE_SIZE - (tb->pc & TARGET_PAGE_MASK)) / 4;
-        max_insns = tb->cflags & CF_COUNT_MASK;
+        max_insns = tb_cflags(tb) & CF_COUNT_MASK;
         if (max_insns == 0) {
             max_insns = CF_COUNT_MASK;
         }
@@ -849,7 +849,7 @@ void gen_intermediate_code(CPUNios2State *env, TranslationBlock *tb)
             break;
         }
 
-        if (num_insns == max_insns && (tb->cflags & CF_LAST_IO)) {
+        if (num_insns == max_insns && (tb_cflags(tb) & CF_LAST_IO)) {
             gen_io_start();
         }
 
@@ -866,7 +866,7 @@ void gen_intermediate_code(CPUNios2State *env, TranslationBlock *tb)
              !tcg_op_buf_full() &&
              num_insns < max_insns);
 
-    if (tb->cflags & CF_LAST_IO) {
+    if (tb_cflags(tb) & CF_LAST_IO) {
         gen_io_end();
     }
 
diff --git a/target/openrisc/translate.c b/target/openrisc/translate.c
index e49518e..347790c 100644
--- a/target/openrisc/translate.c
+++ b/target/openrisc/translate.c
@@ -1540,7 +1540,7 @@ void gen_intermediate_code(CPUOpenRISCState *env, struct TranslationBlock *tb)
 
     next_page_start = (pc_start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
     num_insns = 0;
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
 
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
@@ -1583,7 +1583,7 @@ void gen_intermediate_code(CPUOpenRISCState *env, struct TranslationBlock *tb)
             break;
         }
 
-        if (num_insns == max_insns && (tb->cflags & CF_LAST_IO)) {
+        if (num_insns == max_insns && (tb_cflags(tb) & CF_LAST_IO)) {
             gen_io_start();
         }
         disas_openrisc_insn(dc, cpu);
@@ -1606,7 +1606,7 @@ void gen_intermediate_code(CPUOpenRISCState *env, struct TranslationBlock *tb)
              && (dc->pc < next_page_start)
              && num_insns < max_insns);
 
-    if (tb->cflags & CF_LAST_IO) {
+    if (tb_cflags(tb) & CF_LAST_IO) {
         gen_io_end();
     }
 
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index c0cd64d..e146aa3 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -7275,7 +7275,7 @@ void gen_intermediate_code(CPUPPCState *env, struct TranslationBlock *tb)
     msr_se = 1;
 #endif
     num_insns = 0;
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
@@ -7303,7 +7303,7 @@ void gen_intermediate_code(CPUPPCState *env, struct TranslationBlock *tb)
         LOG_DISAS("----------------\n");
         LOG_DISAS("nip=" TARGET_FMT_lx " super=%d ir=%d\n",
                   ctx.nip, ctx.mem_idx, (int)msr_ir);
-        if (num_insns == max_insns && (tb->cflags & CF_LAST_IO))
+        if (num_insns == max_insns && (tb_cflags(tb) & CF_LAST_IO))
             gen_io_start();
         if (unlikely(need_byteswap(&ctx))) {
             ctx.opcode = bswap32(cpu_ldl_code(env, ctx.nip));
@@ -7384,7 +7384,7 @@ void gen_intermediate_code(CPUPPCState *env, struct TranslationBlock *tb)
             exit(1);
         }
     }
-    if (tb->cflags & CF_LAST_IO)
+    if (tb_cflags(tb) & CF_LAST_IO)
         gen_io_end();
     if (ctx.exception == POWERPC_EXCP_NONE) {
         gen_goto_tb(&ctx, 0, ctx.nip);
diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
index b325c2c..2e902fc 100644
--- a/target/ppc/translate_init.c
+++ b/target/ppc/translate_init.c
@@ -175,11 +175,11 @@ static void spr_write_ureg(DisasContext *ctx, int sprn, int gprn)
 #if !defined(CONFIG_USER_ONLY)
 static void spr_read_decr(DisasContext *ctx, int gprn, int sprn)
 {
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_start();
     }
     gen_helper_load_decr(cpu_gpr[gprn], cpu_env);
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_end();
         gen_stop_exception(ctx);
     }
@@ -187,11 +187,11 @@ static void spr_read_decr(DisasContext *ctx, int gprn, int sprn)
 
 static void spr_write_decr(DisasContext *ctx, int sprn, int gprn)
 {
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_start();
     }
     gen_helper_store_decr(cpu_env, cpu_gpr[gprn]);
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_end();
         gen_stop_exception(ctx);
     }
@@ -202,11 +202,11 @@ static void spr_write_decr(DisasContext *ctx, int sprn, int gprn)
 /* Time base */
 static void spr_read_tbl(DisasContext *ctx, int gprn, int sprn)
 {
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_start();
     }
     gen_helper_load_tbl(cpu_gpr[gprn], cpu_env);
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_end();
         gen_stop_exception(ctx);
     }
@@ -214,11 +214,11 @@ static void spr_read_tbl(DisasContext *ctx, int gprn, int sprn)
 
 static void spr_read_tbu(DisasContext *ctx, int gprn, int sprn)
 {
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_start();
     }
     gen_helper_load_tbu(cpu_gpr[gprn], cpu_env);
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_end();
         gen_stop_exception(ctx);
     }
@@ -239,11 +239,11 @@ static void spr_read_atbu(DisasContext *ctx, int gprn, int sprn)
 #if !defined(CONFIG_USER_ONLY)
 static void spr_write_tbl(DisasContext *ctx, int sprn, int gprn)
 {
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_start();
     }
     gen_helper_store_tbl(cpu_env, cpu_gpr[gprn]);
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_end();
         gen_stop_exception(ctx);
     }
@@ -251,11 +251,11 @@ static void spr_write_tbl(DisasContext *ctx, int sprn, int gprn)
 
 static void spr_write_tbu(DisasContext *ctx, int sprn, int gprn)
 {
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_start();
     }
     gen_helper_store_tbu(cpu_env, cpu_gpr[gprn]);
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_end();
         gen_stop_exception(ctx);
     }
@@ -283,11 +283,11 @@ static void spr_read_purr(DisasContext *ctx, int gprn, int sprn)
 /* HDECR */
 static void spr_read_hdecr(DisasContext *ctx, int gprn, int sprn)
 {
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_start();
     }
     gen_helper_load_hdecr(cpu_gpr[gprn], cpu_env);
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_end();
         gen_stop_exception(ctx);
     }
@@ -295,11 +295,11 @@ static void spr_read_hdecr(DisasContext *ctx, int gprn, int sprn)
 
 static void spr_write_hdecr(DisasContext *ctx, int sprn, int gprn)
 {
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_start();
     }
     gen_helper_store_hdecr(cpu_env, cpu_gpr[gprn]);
-    if (ctx->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
         gen_io_end();
         gen_stop_exception(ctx);
     }
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index be1a04d..0d5d623 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -612,7 +612,7 @@ static void gen_op_calc_cc(DisasContext *s)
 static bool use_exit_tb(DisasContext *s)
 {
     return (s->singlestep_enabled ||
-            (s->tb->cflags & CF_LAST_IO) ||
+            (tb_cflags(s->tb) & CF_LAST_IO) ||
             (s->tb->flags & FLAG_MASK_PER));
 }
 
@@ -5880,7 +5880,7 @@ void gen_intermediate_code(CPUS390XState *env, struct TranslationBlock *tb)
     next_page_start = (pc_start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
 
     num_insns = 0;
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
@@ -5905,7 +5905,7 @@ void gen_intermediate_code(CPUS390XState *env, struct TranslationBlock *tb)
             break;
         }
 
-        if (num_insns == max_insns && (tb->cflags & CF_LAST_IO)) {
+        if (num_insns == max_insns && (tb_cflags(tb) & CF_LAST_IO)) {
             gen_io_start();
         }
 
@@ -5924,7 +5924,7 @@ void gen_intermediate_code(CPUS390XState *env, struct TranslationBlock *tb)
         }
     } while (status == NO_EXIT);
 
-    if (tb->cflags & CF_LAST_IO) {
+    if (tb_cflags(tb) & CF_LAST_IO) {
         gen_io_end();
     }
 
diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 2a206af..9fcaefd 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -2256,7 +2256,7 @@ void gen_intermediate_code(CPUSH4State * env, struct TranslationBlock *tb)
                  (ctx.tbflags & (1 << SR_RB))) * 0x10;
     ctx.fbank = ctx.tbflags & FPSCR_FR ? 0x10 : 0;
 
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
@@ -2300,7 +2300,7 @@ void gen_intermediate_code(CPUSH4State * env, struct TranslationBlock *tb)
             break;
         }
 
-        if (num_insns == max_insns && (tb->cflags & CF_LAST_IO)) {
+        if (num_insns == max_insns && (tb_cflags(tb) & CF_LAST_IO)) {
             gen_io_start();
         }
 
@@ -2308,7 +2308,7 @@ void gen_intermediate_code(CPUSH4State * env, struct TranslationBlock *tb)
 	decode_opc(&ctx);
 	ctx.pc += 2;
     }
-    if (tb->cflags & CF_LAST_IO) {
+    if (tb_cflags(tb) & CF_LAST_IO) {
         gen_io_end();
     }
 
diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index aa6734d..39d8494 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -5781,7 +5781,7 @@ void gen_intermediate_code(CPUSPARCState * env, TranslationBlock * tb)
 #endif
 
     num_insns = 0;
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
@@ -5810,7 +5810,7 @@ void gen_intermediate_code(CPUSPARCState * env, TranslationBlock * tb)
             goto exit_gen_loop;
         }
 
-        if (num_insns == max_insns && (tb->cflags & CF_LAST_IO)) {
+        if (num_insns == max_insns && (tb_cflags(tb) & CF_LAST_IO)) {
             gen_io_start();
         }
 
@@ -5837,7 +5837,7 @@ void gen_intermediate_code(CPUSPARCState * env, TranslationBlock * tb)
              num_insns < max_insns);
 
  exit_gen_loop:
-    if (tb->cflags & CF_LAST_IO) {
+    if (tb_cflags(tb) & CF_LAST_IO) {
         gen_io_end();
     }
     if (!dc->is_br) {
diff --git a/target/tilegx/translate.c b/target/tilegx/translate.c
index ff2ef7b..33be670 100644
--- a/target/tilegx/translate.c
+++ b/target/tilegx/translate.c
@@ -2379,7 +2379,7 @@ void gen_intermediate_code(CPUTLGState *env, struct TranslationBlock *tb)
     uint64_t pc_start = tb->pc;
     uint64_t next_page_start = (pc_start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
     int num_insns = 0;
-    int max_insns = tb->cflags & CF_COUNT_MASK;
+    int max_insns = tb_cflags(tb) & CF_COUNT_MASK;
 
     dc->pc = pc_start;
     dc->mmuidx = 0;
diff --git a/target/tricore/translate.c b/target/tricore/translate.c
index ddd2dd0..3d8448c 100644
--- a/target/tricore/translate.c
+++ b/target/tricore/translate.c
@@ -8791,7 +8791,7 @@ void gen_intermediate_code(CPUTriCoreState *env, struct TranslationBlock *tb)
     int num_insns, max_insns;
 
     num_insns = 0;
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
diff --git a/target/unicore32/translate.c b/target/unicore32/translate.c
index 666a201..4cede72 100644
--- a/target/unicore32/translate.c
+++ b/target/unicore32/translate.c
@@ -1896,7 +1896,7 @@ void gen_intermediate_code(CPUUniCore32State *env, TranslationBlock *tb)
     cpu_F1d = tcg_temp_new_i64();
     next_page_start = (pc_start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
     num_insns = 0;
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
@@ -1929,7 +1929,7 @@ void gen_intermediate_code(CPUUniCore32State *env, TranslationBlock *tb)
             goto done_generating;
         }
 
-        if (num_insns == max_insns && (tb->cflags & CF_LAST_IO)) {
+        if (num_insns == max_insns && (tb_cflags(tb) & CF_LAST_IO)) {
             gen_io_start();
         }
 
@@ -1954,7 +1954,7 @@ void gen_intermediate_code(CPUUniCore32State *env, TranslationBlock *tb)
              dc->pc < next_page_start &&
              num_insns < max_insns);
 
-    if (tb->cflags & CF_LAST_IO) {
+    if (tb_cflags(tb) & CF_LAST_IO) {
         if (dc->condjmp) {
             /* FIXME:  This can theoretically happen with self-modifying
                code.  */
diff --git a/target/xtensa/translate.c b/target/xtensa/translate.c
index 2630024..3ded61b 100644
--- a/target/xtensa/translate.c
+++ b/target/xtensa/translate.c
@@ -513,12 +513,12 @@ static bool gen_check_sr(DisasContext *dc, uint32_t sr, unsigned access)
 
 static bool gen_rsr_ccount(DisasContext *dc, TCGv_i32 d, uint32_t sr)
 {
-    if (dc->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(dc->tb) & CF_USE_ICOUNT) {
         gen_io_start();
     }
     gen_helper_update_ccount(cpu_env);
     tcg_gen_mov_i32(d, cpu_SR[sr]);
-    if (dc->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(dc->tb) & CF_USE_ICOUNT) {
         gen_io_end();
         return true;
     }
@@ -698,11 +698,11 @@ static bool gen_wsr_cpenable(DisasContext *dc, uint32_t sr, TCGv_i32 v)
 
 static void gen_check_interrupts(DisasContext *dc)
 {
-    if (dc->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(dc->tb) & CF_USE_ICOUNT) {
         gen_io_start();
     }
     gen_helper_check_interrupts(cpu_env);
-    if (dc->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(dc->tb) & CF_USE_ICOUNT) {
         gen_io_end();
     }
 }
@@ -756,11 +756,11 @@ static bool gen_wsr_ps(DisasContext *dc, uint32_t sr, TCGv_i32 v)
 
 static bool gen_wsr_ccount(DisasContext *dc, uint32_t sr, TCGv_i32 v)
 {
-    if (dc->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(dc->tb) & CF_USE_ICOUNT) {
         gen_io_start();
     }
     gen_helper_wsr_ccount(cpu_env, v);
-    if (dc->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(dc->tb) & CF_USE_ICOUNT) {
         gen_io_end();
         gen_jumpi_check_loop_end(dc, 0);
         return true;
@@ -797,11 +797,11 @@ static bool gen_wsr_ccompare(DisasContext *dc, uint32_t sr, TCGv_i32 v)
 
         tcg_gen_mov_i32(cpu_SR[sr], v);
         tcg_gen_andi_i32(cpu_SR[INTSET], cpu_SR[INTSET], ~int_bit);
-        if (dc->tb->cflags & CF_USE_ICOUNT) {
+        if (tb_cflags(dc->tb) & CF_USE_ICOUNT) {
             gen_io_start();
         }
         gen_helper_update_ccompare(cpu_env, tmp);
-        if (dc->tb->cflags & CF_USE_ICOUNT) {
+        if (tb_cflags(dc->tb) & CF_USE_ICOUNT) {
             gen_io_end();
             gen_jumpi_check_loop_end(dc, 0);
             ret = true;
@@ -896,11 +896,11 @@ static void gen_waiti(DisasContext *dc, uint32_t imm4)
     TCGv_i32 pc = tcg_const_i32(dc->next_pc);
     TCGv_i32 intlevel = tcg_const_i32(imm4);
 
-    if (dc->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(dc->tb) & CF_USE_ICOUNT) {
         gen_io_start();
     }
     gen_helper_waiti(cpu_env, pc, intlevel);
-    if (dc->tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(dc->tb) & CF_USE_ICOUNT) {
         gen_io_end();
     }
     tcg_temp_free(pc);
@@ -3123,7 +3123,7 @@ void gen_intermediate_code(CPUXtensaState *env, TranslationBlock *tb)
     CPUState *cs = CPU(cpu);
     DisasContext dc;
     int insn_count = 0;
-    int max_insns = tb->cflags & CF_COUNT_MASK;
+    int max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     uint32_t pc_start = tb->pc;
     uint32_t next_page_start =
         (pc_start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
@@ -3159,7 +3159,7 @@ void gen_intermediate_code(CPUXtensaState *env, TranslationBlock *tb)
 
     gen_tb_start(tb);
 
-    if ((tb->cflags & CF_USE_ICOUNT) &&
+    if ((tb_cflags(tb) & CF_USE_ICOUNT) &&
         (tb->flags & XTENSA_TBFLAG_YIELD)) {
         tcg_gen_insn_start(dc.pc);
         ++insn_count;
@@ -3191,7 +3191,7 @@ void gen_intermediate_code(CPUXtensaState *env, TranslationBlock *tb)
             break;
         }
 
-        if (insn_count == max_insns && (tb->cflags & CF_LAST_IO)) {
+        if (insn_count == max_insns && (tb_cflags(tb) & CF_LAST_IO)) {
             gen_io_start();
         }
 
@@ -3232,7 +3232,7 @@ done:
         tcg_temp_free(dc.next_icount);
     }
 
-    if (tb->cflags & CF_LAST_IO) {
+    if (tb_cflags(tb) & CF_LAST_IO) {
         gen_io_end();
     }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 13/43] target/arm: check CF_PARALLEL instead of parallel_cpus
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (11 preceding siblings ...)
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 12/43] tcg: convert tb->cflags reads to tb_cflags(tb) Emilio G. Cota
@ 2017-07-20  3:08 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 14/43] target/hppa: " Emilio G. Cota
                   ` (30 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 target/arm/helper-a64.h    |  4 ++++
 target/arm/helper-a64.c    | 38 ++++++++++++++++++++++++++++++++------
 target/arm/op_helper.c     |  7 -------
 target/arm/translate-a64.c | 31 +++++++++++++++++++++++++------
 target/arm/translate.c     |  9 +++++++--
 5 files changed, 68 insertions(+), 21 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index 6f9eaba..85d8674 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -43,4 +43,8 @@ DEF_HELPER_FLAGS_2(fcvtx_f64_to_f32, TCG_CALL_NO_RWG, f32, f64, env)
 DEF_HELPER_FLAGS_3(crc32_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_3(crc32c_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_4(paired_cmpxchg64_le, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
+DEF_HELPER_FLAGS_4(paired_cmpxchg64_le_parallel, TCG_CALL_NO_WG,
+                   i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(paired_cmpxchg64_be, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
+DEF_HELPER_FLAGS_4(paired_cmpxchg64_be_parallel, TCG_CALL_NO_WG,
+                   i64, env, i64, i64, i64)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index d9df82c..d0e435c 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -430,8 +430,9 @@ uint64_t HELPER(crc32c_64)(uint64_t acc, uint64_t val, uint32_t bytes)
 }
 
 /* Returns 0 on success; 1 otherwise.  */
-uint64_t HELPER(paired_cmpxchg64_le)(CPUARMState *env, uint64_t addr,
-                                     uint64_t new_lo, uint64_t new_hi)
+static uint64_t do_paired_cmpxchg64_le(CPUARMState *env, uint64_t addr,
+                                       uint64_t new_lo, uint64_t new_hi,
+                                       bool parallel)
 {
     uintptr_t ra = GETPC();
     Int128 oldv, cmpv, newv;
@@ -440,7 +441,7 @@ uint64_t HELPER(paired_cmpxchg64_le)(CPUARMState *env, uint64_t addr,
     cmpv = int128_make128(env->exclusive_val, env->exclusive_high);
     newv = int128_make128(new_lo, new_hi);
 
-    if (parallel_cpus) {
+    if (parallel) {
 #ifndef CONFIG_ATOMIC128
         cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
 #else
@@ -484,8 +485,21 @@ uint64_t HELPER(paired_cmpxchg64_le)(CPUARMState *env, uint64_t addr,
     return !success;
 }
 
-uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
-                                     uint64_t new_lo, uint64_t new_hi)
+uint64_t HELPER(paired_cmpxchg64_le)(CPUARMState *env, uint64_t addr,
+                                              uint64_t new_lo, uint64_t new_hi)
+{
+    return do_paired_cmpxchg64_le(env, addr, new_lo, new_hi, false);
+}
+
+uint64_t HELPER(paired_cmpxchg64_le_parallel)(CPUARMState *env, uint64_t addr,
+                                              uint64_t new_lo, uint64_t new_hi)
+{
+    return do_paired_cmpxchg64_le(env, addr, new_lo, new_hi, true);
+}
+
+static uint64_t do_paired_cmpxchg64_be(CPUARMState *env, uint64_t addr,
+                                       uint64_t new_lo, uint64_t new_hi,
+                                       bool parallel)
 {
     uintptr_t ra = GETPC();
     Int128 oldv, cmpv, newv;
@@ -494,7 +508,7 @@ uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
     cmpv = int128_make128(env->exclusive_val, env->exclusive_high);
     newv = int128_make128(new_lo, new_hi);
 
-    if (parallel_cpus) {
+    if (parallel) {
 #ifndef CONFIG_ATOMIC128
         cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
 #else
@@ -537,3 +551,15 @@ uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
 
     return !success;
 }
+
+uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
+                                     uint64_t new_lo, uint64_t new_hi)
+{
+    return do_paired_cmpxchg64_be(env, addr, new_lo, new_hi, false);
+}
+
+uint64_t HELPER(paired_cmpxchg64_be_parallel)(CPUARMState *env, uint64_t addr,
+                                     uint64_t new_lo, uint64_t new_hi)
+{
+    return do_paired_cmpxchg64_be(env, addr, new_lo, new_hi, true);
+}
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index 2a85666..a28f254 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -450,13 +450,6 @@ void HELPER(yield)(CPUARMState *env)
     ARMCPU *cpu = arm_env_get_cpu(env);
     CPUState *cs = CPU(cpu);
 
-    /* When running in MTTCG we don't generate jumps to the yield and
-     * WFE helpers as it won't affect the scheduling of other vCPUs.
-     * If we wanted to more completely model WFE/SEV so we don't busy
-     * spin unnecessarily we would need to do something more involved.
-     */
-    g_assert(!parallel_cpus);
-
     /* This is a non-trappable hint instruction that generally indicates
      * that the guest is currently busy-looping. Yield control back to the
      * top level loop so that a more deserving VCPU has a chance to run.
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 685f1b0..5e775bd 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1334,13 +1334,18 @@ static void handle_hint(DisasContext *s, uint32_t insn,
     case 3: /* WFI */
         s->is_jmp = DISAS_WFI;
         return;
+        /* When running in MTTCG we don't generate jumps to the yield and
+         * WFE helpers as it won't affect the scheduling of other vCPUs.
+         * If we wanted to more completely model WFE/SEV so we don't busy
+         * spin unnecessarily we would need to do something more involved.
+         */
     case 1: /* YIELD */
-        if (!parallel_cpus) {
+        if (!(tb_cflags(s->tb) & CF_PARALLEL)) {
             s->is_jmp = DISAS_YIELD;
         }
         return;
     case 2: /* WFE */
-        if (!parallel_cpus) {
+        if (!(tb_cflags(s->tb) & CF_PARALLEL)) {
             s->is_jmp = DISAS_WFE;
         }
         return;
@@ -1918,11 +1923,25 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
             tcg_gen_setcond_i64(TCG_COND_NE, tmp, tmp, val);
             tcg_temp_free_i64(val);
         } else if (s->be_data == MO_LE) {
-            gen_helper_paired_cmpxchg64_le(tmp, cpu_env, addr, cpu_reg(s, rt),
-                                           cpu_reg(s, rt2));
+            if (tb_cflags(s->tb) & CF_PARALLEL) {
+                gen_helper_paired_cmpxchg64_le_parallel(tmp, cpu_env, addr,
+                                                        cpu_reg(s, rt),
+                                                        cpu_reg(s, rt2));
+            } else {
+                gen_helper_paired_cmpxchg64_le(tmp, cpu_env, addr,
+                                               cpu_reg(s, rt),
+                                               cpu_reg(s, rt2));
+            }
         } else {
-            gen_helper_paired_cmpxchg64_be(tmp, cpu_env, addr, cpu_reg(s, rt),
-                                           cpu_reg(s, rt2));
+            if (tb_cflags(s->tb) & CF_PARALLEL) {
+                gen_helper_paired_cmpxchg64_be_parallel(tmp, cpu_env, addr,
+                                                        cpu_reg(s, rt),
+                                                        cpu_reg(s, rt2));
+            } else {
+                gen_helper_paired_cmpxchg64_be(tmp, cpu_env, addr,
+                                               cpu_reg(s, rt),
+                                               cpu_reg(s, rt2));
+            }
         }
     } else {
         TCGv_i64 val = cpu_reg(s, rt);
diff --git a/target/arm/translate.c b/target/arm/translate.c
index ccfb428..bd0ef58 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4497,8 +4497,13 @@ static void gen_exception_return(DisasContext *s, TCGv_i32 pc)
 static void gen_nop_hint(DisasContext *s, int val)
 {
     switch (val) {
+        /* When running in MTTCG we don't generate jumps to the yield and
+         * WFE helpers as it won't affect the scheduling of other vCPUs.
+         * If we wanted to more completely model WFE/SEV so we don't busy
+         * spin unnecessarily we would need to do something more involved.
+         */
     case 1: /* yield */
-        if (!parallel_cpus) {
+        if (!(tb_cflags(s->tb) & CF_PARALLEL)) {
             gen_set_pc_im(s, s->pc);
             s->is_jmp = DISAS_YIELD;
         }
@@ -4508,7 +4513,7 @@ static void gen_nop_hint(DisasContext *s, int val)
         s->is_jmp = DISAS_WFI;
         break;
     case 2: /* wfe */
-        if (!parallel_cpus) {
+        if (!(tb_cflags(s->tb) & CF_PARALLEL)) {
             gen_set_pc_im(s, s->pc);
             s->is_jmp = DISAS_WFE;
         }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 14/43] target/hppa: check CF_PARALLEL instead of parallel_cpus
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (12 preceding siblings ...)
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 13/43] target/arm: check CF_PARALLEL instead of parallel_cpus Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 15/43] target/i386: " Emilio G. Cota
                   ` (29 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 target/hppa/helper.h    |  2 ++
 target/hppa/op_helper.c | 32 ++++++++++++++++++++++++++++----
 target/hppa/translate.c | 12 ++++++++++--
 3 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/target/hppa/helper.h b/target/hppa/helper.h
index 789f07f..0a6b900 100644
--- a/target/hppa/helper.h
+++ b/target/hppa/helper.h
@@ -3,7 +3,9 @@ DEF_HELPER_FLAGS_2(tsv, TCG_CALL_NO_WG, void, env, tl)
 DEF_HELPER_FLAGS_2(tcond, TCG_CALL_NO_WG, void, env, tl)
 
 DEF_HELPER_FLAGS_3(stby_b, TCG_CALL_NO_WG, void, env, tl, tl)
+DEF_HELPER_FLAGS_3(stby_b_parallel, TCG_CALL_NO_WG, void, env, tl, tl)
 DEF_HELPER_FLAGS_3(stby_e, TCG_CALL_NO_WG, void, env, tl, tl)
+DEF_HELPER_FLAGS_3(stby_e_parallel, TCG_CALL_NO_WG, void, env, tl, tl)
 
 DEF_HELPER_FLAGS_1(probe_r, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_1(probe_w, TCG_CALL_NO_RWG_SE, tl, tl)
diff --git a/target/hppa/op_helper.c b/target/hppa/op_helper.c
index c05c0d5..3104404 100644
--- a/target/hppa/op_helper.c
+++ b/target/hppa/op_helper.c
@@ -76,7 +76,8 @@ static void atomic_store_3(CPUHPPAState *env, target_ulong addr, uint32_t val,
 #endif
 }
 
-void HELPER(stby_b)(CPUHPPAState *env, target_ulong addr, target_ulong val)
+static void do_stby_b(CPUHPPAState *env, target_ulong addr, target_ulong val,
+                      bool parallel)
 {
     uintptr_t ra = GETPC();
 
@@ -89,7 +90,7 @@ void HELPER(stby_b)(CPUHPPAState *env, target_ulong addr, target_ulong val)
         break;
     case 1:
         /* The 3 byte store must appear atomic.  */
-        if (parallel_cpus) {
+        if (parallel) {
             atomic_store_3(env, addr, val, 0x00ffffffu, ra);
         } else {
             cpu_stb_data_ra(env, addr, val >> 16, ra);
@@ -102,14 +103,26 @@ void HELPER(stby_b)(CPUHPPAState *env, target_ulong addr, target_ulong val)
     }
 }
 
-void HELPER(stby_e)(CPUHPPAState *env, target_ulong addr, target_ulong val)
+void HELPER(stby_b)(CPUHPPAState *env, target_ulong addr, target_ulong val)
+{
+    do_stby_b(env, addr, val, false);
+}
+
+void HELPER(stby_b_parallel)(CPUHPPAState *env, target_ulong addr,
+                             target_ulong val)
+{
+    do_stby_b(env, addr, val, true);
+}
+
+static void do_stby_e(CPUHPPAState *env, target_ulong addr, target_ulong val,
+                      bool parallel)
 {
     uintptr_t ra = GETPC();
 
     switch (addr & 3) {
     case 3:
         /* The 3 byte store must appear atomic.  */
-        if (parallel_cpus) {
+        if (parallel) {
             atomic_store_3(env, addr - 3, val, 0xffffff00u, ra);
         } else {
             cpu_stw_data_ra(env, addr - 3, val >> 16, ra);
@@ -132,6 +145,17 @@ void HELPER(stby_e)(CPUHPPAState *env, target_ulong addr, target_ulong val)
     }
 }
 
+void HELPER(stby_e)(CPUHPPAState *env, target_ulong addr, target_ulong val)
+{
+    do_stby_e(env, addr, val, false);
+}
+
+void HELPER(stby_e_parallel)(CPUHPPAState *env, target_ulong addr,
+                             target_ulong val)
+{
+    do_stby_e(env, addr, val, true);
+}
+
 target_ulong HELPER(probe_r)(target_ulong addr)
 {
     return page_check_range(addr, 1, PAGE_READ);
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 1effe82..66aa11d 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -2309,9 +2309,17 @@ static ExitStatus trans_stby(DisasContext *ctx, uint32_t insn,
     val = load_gpr(ctx, rt);
 
     if (a) {
-        gen_helper_stby_e(cpu_env, addr, val);
+        if (tb_cflags(ctx->tb) & CF_PARALLEL) {
+            gen_helper_stby_e_parallel(cpu_env, addr, val);
+        } else {
+            gen_helper_stby_e(cpu_env, addr, val);
+        }
     } else {
-        gen_helper_stby_b(cpu_env, addr, val);
+        if (tb_cflags(ctx->tb) & CF_PARALLEL) {
+            gen_helper_stby_b_parallel(cpu_env, addr, val);
+        } else {
+            gen_helper_stby_b(cpu_env, addr, val);
+        }
     }
 
     if (m) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 15/43] target/i386: check CF_PARALLEL instead of parallel_cpus
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (13 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 14/43] target/hppa: " Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 16/43] target/m68k: " Emilio G. Cota
                   ` (28 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 target/i386/translate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index f046ffa..0f38a48 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -5263,7 +5263,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             if (!(s->cpuid_ext_features & CPUID_EXT_CX16))
                 goto illegal_op;
             gen_lea_modrm(env, s, modrm);
-            if ((s->prefix & PREFIX_LOCK) && parallel_cpus) {
+            if ((s->prefix & PREFIX_LOCK) && (tb_cflags(s->tb) & CF_PARALLEL)) {
                 gen_helper_cmpxchg16b(cpu_env, cpu_A0);
             } else {
                 gen_helper_cmpxchg16b_unlocked(cpu_env, cpu_A0);
@@ -5274,7 +5274,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             if (!(s->cpuid_features & CPUID_CX8))
                 goto illegal_op;
             gen_lea_modrm(env, s, modrm);
-            if ((s->prefix & PREFIX_LOCK) && parallel_cpus) {
+            if ((s->prefix & PREFIX_LOCK) && (tb_cflags(s->tb) & CF_PARALLEL)) {
                 gen_helper_cmpxchg8b(cpu_env, cpu_A0);
             } else {
                 gen_helper_cmpxchg8b_unlocked(cpu_env, cpu_A0);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 16/43] target/m68k: check CF_PARALLEL instead of parallel_cpus
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (14 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 15/43] target/i386: " Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  7:23   ` Richard Henderson
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 17/43] target/s390x: " Emilio G. Cota
                   ` (27 subsequent siblings)
  43 siblings, 1 reply; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Thereby decoupling the resulting translated code from the current state
of the system.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 target/m68k/helper.h    |  1 +
 target/m68k/op_helper.c | 33 ++++++++++++++++++++-------------
 target/m68k/translate.c | 12 ++++++++++--
 3 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/target/m68k/helper.h b/target/m68k/helper.h
index 475a1f2..eebe52d 100644
--- a/target/m68k/helper.h
+++ b/target/m68k/helper.h
@@ -11,6 +11,7 @@ DEF_HELPER_2(set_sr, void, env, i32)
 DEF_HELPER_3(movec, void, env, i32, i32)
 DEF_HELPER_4(cas2w, void, env, i32, i32, i32)
 DEF_HELPER_4(cas2l, void, env, i32, i32, i32)
+DEF_HELPER_4(cas2l_parallel, void, env, i32, i32, i32)
 
 #define dh_alias_fp ptr
 #define dh_ctype_fp FPReg *
diff --git a/target/m68k/op_helper.c b/target/m68k/op_helper.c
index 7b5126c..6308951 100644
--- a/target/m68k/op_helper.c
+++ b/target/m68k/op_helper.c
@@ -361,6 +361,7 @@ void HELPER(divsll)(CPUM68KState *env, int numr, int regr, int32_t den)
     env->dregs[numr] = quot;
 }
 
+/* We're executing in a serial context -- no need to be atomic.  */
 void HELPER(cas2w)(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2)
 {
     uint32_t Dc1 = extract32(regs, 9, 3);
@@ -374,17 +375,11 @@ void HELPER(cas2w)(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2)
     int16_t l1, l2;
     uintptr_t ra = GETPC();
 
-    if (parallel_cpus) {
-        /* Tell the main loop we need to serialize this insn.  */
-        cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
-    } else {
-        /* We're executing in a serial context -- no need to be atomic.  */
-        l1 = cpu_lduw_data_ra(env, a1, ra);
-        l2 = cpu_lduw_data_ra(env, a2, ra);
-        if (l1 == c1 && l2 == c2) {
-            cpu_stw_data_ra(env, a1, u1, ra);
-            cpu_stw_data_ra(env, a2, u2, ra);
-        }
+    l1 = cpu_lduw_data_ra(env, a1, ra);
+    l2 = cpu_lduw_data_ra(env, a2, ra);
+    if (l1 == c1 && l2 == c2) {
+        cpu_stw_data_ra(env, a1, u1, ra);
+        cpu_stw_data_ra(env, a2, u2, ra);
     }
 
     if (c1 != l1) {
@@ -399,7 +394,8 @@ void HELPER(cas2w)(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2)
     env->dregs[Dc2] = deposit32(env->dregs[Dc2], 0, 16, l2);
 }
 
-void HELPER(cas2l)(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2)
+static void do_cas2l(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2,
+                     bool parallel)
 {
     uint32_t Dc1 = extract32(regs, 9, 3);
     uint32_t Dc2 = extract32(regs, 6, 3);
@@ -416,7 +412,7 @@ void HELPER(cas2l)(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2)
     TCGMemOpIdx oi;
 #endif
 
-    if (parallel_cpus) {
+    if (parallel) {
         /* We're executing in a parallel context -- must be atomic.  */
 #ifdef CONFIG_ATOMIC64
         uint64_t c, u, l;
@@ -470,6 +466,17 @@ void HELPER(cas2l)(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2)
     env->dregs[Dc2] = l2;
 }
 
+void HELPER(cas2l)(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2)
+{
+    do_cas2l(env, regs, a1, a2, false);
+}
+
+void HELPER(cas2l_parallel)(CPUM68KState *env, uint32_t regs, uint32_t a1,
+                            uint32_t a2)
+{
+    do_cas2l(env, regs, a1, a2, true);
+}
+
 struct bf_data {
     uint32_t addr;
     uint32_t bofs;
diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index 188520b..65044be 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -2308,7 +2308,11 @@ DISAS_INSN(cas2w)
                          (REG(ext1, 6) << 3) |
                          (REG(ext2, 0) << 6) |
                          (REG(ext1, 0) << 9));
-    gen_helper_cas2w(cpu_env, regs, addr1, addr2);
+    if (tb_cflags(s->tb) & CF_PARALLEL) {
+        gen_helper_exit_atomic(cpu_env);
+    } else {
+        gen_helper_cas2w(cpu_env, regs, addr1, addr2);
+    }
     tcg_temp_free(regs);
 
     /* Note that cas2w also assigned to env->cc_op.  */
@@ -2354,7 +2358,11 @@ DISAS_INSN(cas2l)
                          (REG(ext1, 6) << 3) |
                          (REG(ext2, 0) << 6) |
                          (REG(ext1, 0) << 9));
-    gen_helper_cas2l(cpu_env, regs, addr1, addr2);
+    if (tb_cflags(s->tb) & CF_PARALLEL) {
+        gen_helper_cas2l_parallel(cpu_env, regs, addr1, addr2);
+    } else {
+        gen_helper_cas2l(cpu_env, regs, addr1, addr2);
+    }
     tcg_temp_free(regs);
 
     /* Note that cas2l also assigned to env->cc_op.  */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 17/43] target/s390x: check CF_PARALLEL instead of parallel_cpus
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (15 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 16/43] target/m68k: " Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  7:25   ` Richard Henderson
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 18/43] target/sh4: " Emilio G. Cota
                   ` (26 subsequent siblings)
  43 siblings, 1 reply; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Thereby decoupling the resulting translated code from the current state
of the system.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 target/s390x/helper.h     |  4 +++
 target/s390x/mem_helper.c | 80 +++++++++++++++++++++++++++++++++++++----------
 target/s390x/translate.c  | 26 ++++++++++++---
 3 files changed, 88 insertions(+), 22 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 4b02907..84a4597 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -34,7 +34,9 @@ DEF_HELPER_3(celgb, i64, env, i64, i32)
 DEF_HELPER_3(cdlgb, i64, env, i64, i32)
 DEF_HELPER_3(cxlgb, i64, env, i64, i32)
 DEF_HELPER_4(cdsg, void, env, i64, i32, i32)
+DEF_HELPER_4(cdsg_parallel, void, env, i64, i32, i32)
 DEF_HELPER_4(csst, i32, env, i32, i64, i64)
+DEF_HELPER_4(csst_parallel, i32, env, i32, i64, i64)
 DEF_HELPER_FLAGS_3(aeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(adb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_5(axb, TCG_CALL_NO_WG, i64, env, i64, i64, i64, i64)
@@ -107,7 +109,9 @@ DEF_HELPER_FLAGS_1(popcnt, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(stfl, TCG_CALL_NO_RWG, void, env)
 DEF_HELPER_2(stfle, i32, env, i64)
 DEF_HELPER_FLAGS_2(lpq, TCG_CALL_NO_WG, i64, env, i64)
+DEF_HELPER_FLAGS_2(lpq_parallel, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_4(stpq, TCG_CALL_NO_WG, void, env, i64, i64, i64)
+DEF_HELPER_FLAGS_4(stpq_parallel, TCG_CALL_NO_WG, void, env, i64, i64, i64)
 DEF_HELPER_4(mvcos, i32, env, i64, i64, i64)
 DEF_HELPER_4(cu12, i32, env, i32, i32, i32)
 DEF_HELPER_4(cu14, i32, env, i32, i32, i32)
diff --git a/target/s390x/mem_helper.c b/target/s390x/mem_helper.c
index cdc78aa..74a2157 100644
--- a/target/s390x/mem_helper.c
+++ b/target/s390x/mem_helper.c
@@ -1363,8 +1363,8 @@ uint32_t HELPER(trXX)(CPUS390XState *env, uint32_t r1, uint32_t r2,
     return cc;
 }
 
-void HELPER(cdsg)(CPUS390XState *env, uint64_t addr,
-                  uint32_t r1, uint32_t r3)
+static void do_cdsg(CPUS390XState *env, uint64_t addr,
+                    uint32_t r1, uint32_t r3, bool parallel)
 {
     uintptr_t ra = GETPC();
     Int128 cmpv = int128_make128(env->regs[r1 + 1], env->regs[r1]);
@@ -1372,7 +1372,7 @@ void HELPER(cdsg)(CPUS390XState *env, uint64_t addr,
     Int128 oldv;
     bool fail;
 
-    if (parallel_cpus) {
+    if (parallel) {
 #ifndef CONFIG_ATOMIC128
         cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
 #else
@@ -1404,7 +1404,20 @@ void HELPER(cdsg)(CPUS390XState *env, uint64_t addr,
     env->regs[r1 + 1] = int128_getlo(oldv);
 }
 
-uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
+void HELPER(cdsg)(CPUS390XState *env, uint64_t addr,
+                  uint32_t r1, uint32_t r3)
+{
+    do_cdsg(env, addr, r1, r3, false);
+}
+
+void HELPER(cdsg_parallel)(CPUS390XState *env, uint64_t addr,
+                           uint32_t r1, uint32_t r3)
+{
+    do_cdsg(env, addr, r1, r3, true);
+}
+
+static uint32_t do_csst(CPUS390XState *env, uint32_t r3, uint64_t a1,
+                        uint64_t a2, bool parallel)
 {
 #if !defined(CONFIG_USER_ONLY) || defined(CONFIG_ATOMIC128)
     uint32_t mem_idx = cpu_mmu_index(env, false);
@@ -1440,7 +1453,7 @@ uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
        the complete operation is not.  Therefore we do not need to assert serial
        context in order to implement this.  That said, restart early if we can't
        support either operation that is supposed to be atomic.  */
-    if (parallel_cpus) {
+    if (parallel) {
         int mask = 0;
 #if !defined(CONFIG_ATOMIC64)
         mask = -8;
@@ -1464,7 +1477,7 @@ uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
             uint32_t cv = env->regs[r3];
             uint32_t ov;
 
-            if (parallel_cpus) {
+            if (parallel) {
 #ifdef CONFIG_USER_ONLY
                 uint32_t *haddr = g2h(a1);
                 ov = atomic_cmpxchg__nocheck(haddr, cv, nv);
@@ -1487,7 +1500,7 @@ uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
             uint64_t cv = env->regs[r3];
             uint64_t ov;
 
-            if (parallel_cpus) {
+            if (parallel) {
 #ifdef CONFIG_ATOMIC64
 # ifdef CONFIG_USER_ONLY
                 uint64_t *haddr = g2h(a1);
@@ -1497,7 +1510,7 @@ uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
                 ov = helper_atomic_cmpxchgq_be_mmu(env, a1, cv, nv, oi, ra);
 # endif
 #else
-                /* Note that we asserted !parallel_cpus above.  */
+                /* Note that we asserted !parallel above.  */
                 g_assert_not_reached();
 #endif
             } else {
@@ -1517,13 +1530,13 @@ uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
             Int128 cv = int128_make128(env->regs[r3 + 1], env->regs[r3]);
             Int128 ov;
 
-            if (parallel_cpus) {
+            if (parallel) {
 #ifdef CONFIG_ATOMIC128
                 TCGMemOpIdx oi = make_memop_idx(MO_TEQ | MO_ALIGN_16, mem_idx);
                 ov = helper_atomic_cmpxchgo_be_mmu(env, a1, cv, nv, oi, ra);
                 cc = !int128_eq(ov, cv);
 #else
-                /* Note that we asserted !parallel_cpus above.  */
+                /* Note that we asserted !parallel above.  */
                 g_assert_not_reached();
 #endif
             } else {
@@ -1567,13 +1580,13 @@ uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
             cpu_stq_data_ra(env, a2, svh, ra);
             break;
         case 4:
-            if (parallel_cpus) {
+            if (parallel) {
 #ifdef CONFIG_ATOMIC128
                 TCGMemOpIdx oi = make_memop_idx(MO_TEQ | MO_ALIGN_16, mem_idx);
                 Int128 sv = int128_make128(svl, svh);
                 helper_atomic_sto_be_mmu(env, a2, sv, oi, ra);
 #else
-                /* Note that we asserted !parallel_cpus above.  */
+                /* Note that we asserted !parallel above.  */
                 g_assert_not_reached();
 #endif
             } else {
@@ -1593,6 +1606,17 @@ uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
     g_assert_not_reached();
 }
 
+uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
+{
+    return do_csst(env, r3, a1, a2, false);
+}
+
+uint32_t HELPER(csst_parallel)(CPUS390XState *env, uint32_t r3, uint64_t a1,
+                               uint64_t a2)
+{
+    return do_csst(env, r3, a1, a2, true);
+}
+
 #if !defined(CONFIG_USER_ONLY)
 void HELPER(lctlg)(CPUS390XState *env, uint32_t r1, uint64_t a2, uint32_t r3)
 {
@@ -2035,12 +2059,12 @@ uint64_t HELPER(lra)(CPUS390XState *env, uint64_t addr)
 #endif
 
 /* load pair from quadword */
-uint64_t HELPER(lpq)(CPUS390XState *env, uint64_t addr)
+static uint64_t do_lpq(CPUS390XState *env, uint64_t addr, bool parallel)
 {
     uintptr_t ra = GETPC();
     uint64_t hi, lo;
 
-    if (parallel_cpus) {
+    if (parallel) {
 #ifndef CONFIG_ATOMIC128
         cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
 #else
@@ -2061,13 +2085,23 @@ uint64_t HELPER(lpq)(CPUS390XState *env, uint64_t addr)
     return hi;
 }
 
+uint64_t HELPER(lpq)(CPUS390XState *env, uint64_t addr)
+{
+    return do_lpq(env, addr, false);
+}
+
+uint64_t HELPER(lpq_parallel)(CPUS390XState *env, uint64_t addr)
+{
+    return do_lpq(env, addr, true);
+}
+
 /* store pair to quadword */
-void HELPER(stpq)(CPUS390XState *env, uint64_t addr,
-                  uint64_t low, uint64_t high)
+static void do_stpq(CPUS390XState *env, uint64_t addr,
+                    uint64_t low, uint64_t high, bool parallel)
 {
     uintptr_t ra = GETPC();
 
-    if (parallel_cpus) {
+    if (parallel) {
 #ifndef CONFIG_ATOMIC128
         cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
 #else
@@ -2085,6 +2119,18 @@ void HELPER(stpq)(CPUS390XState *env, uint64_t addr,
     }
 }
 
+void HELPER(stpq)(CPUS390XState *env, uint64_t addr,
+                  uint64_t low, uint64_t high)
+{
+    do_stpq(env, addr, low, high, false);
+}
+
+void HELPER(stpq_parallel)(CPUS390XState *env, uint64_t addr,
+                           uint64_t low, uint64_t high)
+{
+    do_stpq(env, addr, low, high, true);
+}
+
 /* Execute instruction.  This instruction executes an insn modified with
    the contents of r1.  It does not change the executed instruction in memory;
    it does not change the program counter.
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 0d5d623..ea8a90a 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -2024,7 +2024,11 @@ static ExitStatus op_cdsg(DisasContext *s, DisasOps *o)
     addr = get_address(s, 0, b2, d2);
     t_r1 = tcg_const_i32(r1);
     t_r3 = tcg_const_i32(r3);
-    gen_helper_cdsg(cpu_env, addr, t_r1, t_r3);
+    if (tb_cflags(s->tb) & CF_PARALLEL) {
+        gen_helper_cdsg_parallel(cpu_env, addr, t_r1, t_r3);
+    } else {
+        gen_helper_cdsg(cpu_env, addr, t_r1, t_r3);
+    }
     tcg_temp_free_i64(addr);
     tcg_temp_free_i32(t_r1);
     tcg_temp_free_i32(t_r3);
@@ -2038,7 +2042,11 @@ static ExitStatus op_csst(DisasContext *s, DisasOps *o)
     int r3 = get_field(s->fields, r3);
     TCGv_i32 t_r3 = tcg_const_i32(r3);
 
-    gen_helper_csst(cc_op, cpu_env, t_r3, o->in1, o->in2);
+    if (tb_cflags(s->tb) & CF_PARALLEL) {
+        gen_helper_csst_parallel(cc_op, cpu_env, t_r3, o->in1, o->in2);
+    } else {
+        gen_helper_csst(cc_op, cpu_env, t_r3, o->in1, o->in2);
+    }
     tcg_temp_free_i32(t_r3);
 
     set_cc_static(s);
@@ -2943,7 +2951,7 @@ static ExitStatus op_lpd(DisasContext *s, DisasOps *o)
     TCGMemOp mop = s->insn->data;
 
     /* In a parallel context, stop the world and single step.  */
-    if (parallel_cpus) {
+    if (tb_cflags(s->tb) & CF_PARALLEL) {
         potential_page_fault(s);
         gen_exception(EXCP_ATOMIC);
         return EXIT_NORETURN;
@@ -2964,7 +2972,11 @@ static ExitStatus op_lpd(DisasContext *s, DisasOps *o)
 
 static ExitStatus op_lpq(DisasContext *s, DisasOps *o)
 {
-    gen_helper_lpq(o->out, cpu_env, o->in2);
+    if (tb_cflags(s->tb) & CF_PARALLEL) {
+        gen_helper_lpq_parallel(o->out, cpu_env, o->in2);
+    } else {
+        gen_helper_lpq(o->out, cpu_env, o->in2);
+    }
     return_low128(o->out2);
     return NO_EXIT;
 }
@@ -4281,7 +4293,11 @@ static ExitStatus op_stmh(DisasContext *s, DisasOps *o)
 
 static ExitStatus op_stpq(DisasContext *s, DisasOps *o)
 {
-    gen_helper_stpq(cpu_env, o->in2, o->out2, o->out);
+    if (tb_cflags(s->tb) & CF_PARALLEL) {
+        gen_helper_stpq_parallel(cpu_env, o->in2, o->out2, o->out);
+    } else {
+        gen_helper_stpq(cpu_env, o->in2, o->out2, o->out);
+    }
     return NO_EXIT;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 18/43] target/sh4: check CF_PARALLEL instead of parallel_cpus
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (16 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 17/43] target/s390x: " Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  7:26   ` Richard Henderson
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 19/43] target/sparc: " Emilio G. Cota
                   ` (25 subsequent siblings)
  43 siblings, 1 reply; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Thereby decoupling the resulting translated code from the current state
of the system.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 target/sh4/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 9fcaefd..52fabb3 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -528,7 +528,7 @@ static void _decode_opc(DisasContext * ctx)
         /* Detect the start of a gUSA region.  If so, update envflags
            and end the TB.  This will allow us to see the end of the
            region (stored in R0) in the next TB.  */
-        if (B11_8 == 15 && B7_0s < 0 && parallel_cpus) {
+        if (B11_8 == 15 && B7_0s < 0 && (tb_cflags(ctx->tb) & CF_PARALLEL)) {
             ctx->envflags = deposit32(ctx->envflags, GUSA_SHIFT, 8, B7_0s);
             ctx->bstate = BS_STOP;
         }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 19/43] target/sparc: check CF_PARALLEL instead of parallel_cpus
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (17 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 18/43] target/sh4: " Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 20/43] tcg: " Emilio G. Cota
                   ` (24 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 target/sparc/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index 39d8494..768ce68 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -2450,7 +2450,7 @@ static void gen_ldstub_asi(DisasContext *dc, TCGv dst, TCGv addr, int insn)
     default:
         /* ??? In theory, this should be raise DAE_invalid_asi.
            But the SS-20 roms do ldstuba [%l0] #ASI_M_CTL, %o1.  */
-        if (parallel_cpus) {
+        if (tb_cflags(dc->tb) & CF_PARALLEL) {
             gen_helper_exit_atomic(cpu_env);
         } else {
             TCGv_i32 r_asi = tcg_const_i32(da.asi);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 20/43] tcg: check CF_PARALLEL instead of parallel_cpus
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (18 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 19/43] target/sparc: " Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 21/43] cpu-exec: lookup/generate TB outside exclusive region during step_atomic Emilio G. Cota
                   ` (23 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Thereby decoupling the resulting translated code from the current state
of the system.

The tb->cflags field is not passed to tcg generation functions. So
we add a bit to TCGContext, storing there whether CF_PARALLEL is set
before translating every TB.

Most architectures have <= 32 registers, which results in a 4-byte hole
in TCGContext. Use this hole for the bit we need, which we store in a bool.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/tcg.h                 |  1 +
 accel/tcg/translate-all.c |  1 +
 tcg/tcg-op.c              | 10 +++++-----
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index 96872f8..9b6dade 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -656,6 +656,7 @@ struct TCGContext {
     uintptr_t *tb_jmp_target_addr; /* tb->jmp_target_addr if !USE_DIRECT_JUMP */
 
     TCGRegSet reserved_regs;
+    bool cf_parallel; /* whether CF_PARALLEL is set in tb->cflags */
     intptr_t current_frame_offset;
     intptr_t frame_start;
     intptr_t frame_end;
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 600c0a1..645bc70 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1271,6 +1271,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tb->flags = flags;
     tb->cflags = cflags;
     tb->trace_vcpu_dstate = *cpu->trace_dstate;
+    tcg_ctx.cf_parallel = !!(cflags & CF_PARALLEL);
 
 #ifdef CONFIG_PROFILER
     tcg_ctx.tb_count1++; /* includes aborted translations because of
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 205d07f..ef420d4 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -150,7 +150,7 @@ void tcg_gen_op6(TCGContext *ctx, TCGOpcode opc, TCGArg a1, TCGArg a2,
 
 void tcg_gen_mb(TCGBar mb_type)
 {
-    if (parallel_cpus) {
+    if (tcg_ctx.cf_parallel) {
         tcg_gen_op1(&tcg_ctx, INDEX_op_mb, mb_type);
     }
 }
@@ -2794,7 +2794,7 @@ void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
 {
     memop = tcg_canonicalize_memop(memop, 0, 0);
 
-    if (!parallel_cpus) {
+    if (!tcg_ctx.cf_parallel) {
         TCGv_i32 t1 = tcg_temp_new_i32();
         TCGv_i32 t2 = tcg_temp_new_i32();
 
@@ -2838,7 +2838,7 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
 {
     memop = tcg_canonicalize_memop(memop, 1, 0);
 
-    if (!parallel_cpus) {
+    if (!tcg_ctx.cf_parallel) {
         TCGv_i64 t1 = tcg_temp_new_i64();
         TCGv_i64 t2 = tcg_temp_new_i64();
 
@@ -3015,7 +3015,7 @@ static void * const table_##NAME[16] = {                                \
 void tcg_gen_atomic_##NAME##_i32                                        \
     (TCGv_i32 ret, TCGv addr, TCGv_i32 val, TCGArg idx, TCGMemOp memop) \
 {                                                                       \
-    if (parallel_cpus) {                                                \
+    if (tcg_ctx.cf_parallel) {                                          \
         do_atomic_op_i32(ret, addr, val, idx, memop, table_##NAME);     \
     } else {                                                            \
         do_nonatomic_op_i32(ret, addr, val, idx, memop, NEW,            \
@@ -3025,7 +3025,7 @@ void tcg_gen_atomic_##NAME##_i32                                        \
 void tcg_gen_atomic_##NAME##_i64                                        \
     (TCGv_i64 ret, TCGv addr, TCGv_i64 val, TCGArg idx, TCGMemOp memop) \
 {                                                                       \
-    if (parallel_cpus) {                                                \
+    if (tcg_ctx.cf_parallel) {                                          \
         do_atomic_op_i64(ret, addr, val, idx, memop, table_##NAME);     \
     } else {                                                            \
         do_nonatomic_op_i64(ret, addr, val, idx, memop, NEW,            \
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 21/43] cpu-exec: lookup/generate TB outside exclusive region during step_atomic
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (19 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 20/43] tcg: " Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 22/43] translate-all: define and use DEBUG_TB_FLUSH_GATE Emilio G. Cota
                   ` (22 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Now that all code generation has been converted to check CF_PARALLEL, we can
generate !CF_PARALLEL code without having yet set !parallel_cpus --
and therefore without having to be in the exclusive region during
cpu_exec_step_atomic.

While at it, merge cpu_exec_step into cpu_exec_step_atomic.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 accel/tcg/cpu-exec.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index b71e015..526cab3 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -223,30 +223,40 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
 }
 #endif
 
-static void cpu_exec_step(CPUState *cpu)
+void cpu_exec_step_atomic(CPUState *cpu)
 {
     CPUClass *cc = CPU_GET_CLASS(cpu);
     TranslationBlock *tb;
     target_ulong cs_base, pc;
     uint32_t flags;
     uint32_t cflags = 1 | CF_IGNORE_ICOUNT;
+    uint32_t cf_mask = cflags & CF_HASH_MASK;
 
     if (sigsetjmp(cpu->jmp_env, 0) == 0) {
-        tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags,
-                                  cflags & CF_HASH_MASK);
+        tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
         if (tb == NULL) {
             mmap_lock();
             tb_lock();
-            tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
+            tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask);
+            if (likely(tb == NULL)) {
+                tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
+            }
             tb_unlock();
             mmap_unlock();
         }
 
+        start_exclusive();
+
+        /* Since we got here, we know that parallel_cpus must be true.  */
+        parallel_cpus = false;
         cc->cpu_exec_enter(cpu);
         /* execute the generated code */
         trace_exec_tb(tb, pc);
         cpu_tb_exec(cpu, tb);
         cc->cpu_exec_exit(cpu);
+        parallel_cpus = true;
+
+        end_exclusive();
     } else {
         /* We may have exited due to another problem here, so we need
          * to reset any tb_locks we may have taken but didn't release.
@@ -260,18 +270,6 @@ static void cpu_exec_step(CPUState *cpu)
     }
 }
 
-void cpu_exec_step_atomic(CPUState *cpu)
-{
-    start_exclusive();
-
-    /* Since we got here, we know that parallel_cpus must be true.  */
-    parallel_cpus = false;
-    cpu_exec_step(cpu);
-    parallel_cpus = true;
-
-    end_exclusive();
-}
-
 struct tb_desc {
     target_ulong pc;
     target_ulong cs_base;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 22/43] translate-all: define and use DEBUG_TB_FLUSH_GATE
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (20 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 21/43] cpu-exec: lookup/generate TB outside exclusive region during step_atomic Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 23/43] exec-all: introduce TB_PAGE_ADDR_FMT Emilio G. Cota
                   ` (21 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

This gets rid of some ifdef checks while ensuring that the debug code
is compiled, which prevents bit rot.

Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 accel/tcg/translate-all.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 645bc70..c1cd258 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -65,6 +65,12 @@
 /* make various TB consistency checks */
 /* #define DEBUG_TB_CHECK */
 
+#ifdef DEBUG_TB_FLUSH
+#define DEBUG_TB_FLUSH_GATE 1
+#else
+#define DEBUG_TB_FLUSH_GATE 0
+#endif
+
 #if !defined(CONFIG_USER_ONLY)
 /* TB consistency checks only implemented for usermode emulation.  */
 #undef DEBUG_TB_CHECK
@@ -899,13 +905,13 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
         goto done;
     }
 
-#if defined(DEBUG_TB_FLUSH)
-    printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
-           (unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer),
-           tcg_ctx.tb_ctx.nb_tbs, tcg_ctx.tb_ctx.nb_tbs > 0 ?
-           ((unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer)) /
-           tcg_ctx.tb_ctx.nb_tbs : 0);
-#endif
+    if (DEBUG_TB_FLUSH_GATE) {
+        printf("qemu: flush code_size=%td nb_tbs=%d avg_tb_size=%td\n",
+               tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer,
+               tcg_ctx.tb_ctx.nb_tbs, tcg_ctx.tb_ctx.nb_tbs > 0 ?
+               (tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) /
+               tcg_ctx.tb_ctx.nb_tbs : 0);
+    }
     if ((unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer)
         > tcg_ctx.code_gen_buffer_size) {
         cpu_abort(cpu, "Internal error: code buffer overflow\n");
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 23/43] exec-all: introduce TB_PAGE_ADDR_FMT
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (21 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 22/43] translate-all: define and use DEBUG_TB_FLUSH_GATE Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 24/43] translate-all: define and use DEBUG_TB_INVALIDATE_GATE Emilio G. Cota
                   ` (20 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

And fix the following warning when DEBUG_TB_INVALIDATE is enabled
in translate-all.c:

  CC      mipsn32-linux-user/accel/tcg/translate-all.o
/data/src/qemu/accel/tcg/translate-all.c: In function ‘tb_alloc_page’:
/data/src/qemu/accel/tcg/translate-all.c:1201:16: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘tb_page_addr_t {aka unsigned int}’ [-Werror=format=]
         printf("protecting code page: 0x" TARGET_FMT_lx "\n",
                ^
cc1: all warnings being treated as errors
/data/src/qemu/rules.mak:66: recipe for target 'accel/tcg/translate-all.o' failed
make[1]: *** [accel/tcg/translate-all.o] Error 1
Makefile:328: recipe for target 'subdir-mipsn32-linux-user' failed
make: *** [subdir-mipsn32-linux-user] Error 2
cota@flamenco:/data/src/qemu/build ((18f3fe1...) *$)$

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/exec-all.h   | 2 ++
 accel/tcg/translate-all.c | 3 +--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 0af0485..00f7da8 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -31,8 +31,10 @@
    type.  */
 #if defined(CONFIG_USER_ONLY)
 typedef abi_ulong tb_page_addr_t;
+#define TB_PAGE_ADDR_FMT TARGET_ABI_FMT_lx
 #else
 typedef ram_addr_t tb_page_addr_t;
+#define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
 #endif
 
 /* DisasContext is_jmp field values
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index c1cd258..c4c23f9 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1194,8 +1194,7 @@ static inline void tb_alloc_page(TranslationBlock *tb,
         mprotect(g2h(page_addr), qemu_host_page_size,
                  (prot & PAGE_BITS) & ~PAGE_WRITE);
 #ifdef DEBUG_TB_INVALIDATE
-        printf("protecting code page: 0x" TARGET_FMT_lx "\n",
-               page_addr);
+        printf("protecting code page: 0x" TB_PAGE_ADDR_FMT "\n", page_addr);
 #endif
     }
 #else
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 24/43] translate-all: define and use DEBUG_TB_INVALIDATE_GATE
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (22 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 23/43] exec-all: introduce TB_PAGE_ADDR_FMT Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 25/43] translate-all: define and use DEBUG_TB_CHECK_GATE Emilio G. Cota
                   ` (19 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

This gets rid of an ifdef check while ensuring that the debug code
is compiled, which prevents bit rot.

Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 accel/tcg/translate-all.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index c4c23f9..962e9b3 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -65,6 +65,12 @@
 /* make various TB consistency checks */
 /* #define DEBUG_TB_CHECK */
 
+#ifdef DEBUG_TB_INVALIDATE
+#define DEBUG_TB_INVALIDATE_GATE 1
+#else
+#define DEBUG_TB_INVALIDATE_GATE 0
+#endif
+
 #ifdef DEBUG_TB_FLUSH
 #define DEBUG_TB_FLUSH_GATE 1
 #else
@@ -1193,9 +1199,9 @@ static inline void tb_alloc_page(TranslationBlock *tb,
           }
         mprotect(g2h(page_addr), qemu_host_page_size,
                  (prot & PAGE_BITS) & ~PAGE_WRITE);
-#ifdef DEBUG_TB_INVALIDATE
-        printf("protecting code page: 0x" TB_PAGE_ADDR_FMT "\n", page_addr);
-#endif
+        if (DEBUG_TB_INVALIDATE_GATE) {
+            printf("protecting code page: 0x" TB_PAGE_ADDR_FMT "\n", page_addr);
+        }
     }
 #else
     /* if some code is already present, then the pages are already
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 25/43] translate-all: define and use DEBUG_TB_CHECK_GATE
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (23 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 24/43] translate-all: define and use DEBUG_TB_INVALIDATE_GATE Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 26/43] exec-all: extract tb->tc_* into a separate struct tc_tb Emilio G. Cota
                   ` (18 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

This prevents bit rot by ensuring the debug code is compiled when
building a user-mode target.

Unfortunately the helpers are user-mode-only so we cannot fully
get rid of the ifdef checks. Add a comment to explain this.

Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 accel/tcg/translate-all.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 962e9b3..845585b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -82,6 +82,12 @@
 #undef DEBUG_TB_CHECK
 #endif
 
+#ifdef DEBUG_TB_CHECK
+#define DEBUG_TB_CHECK_GATE 1
+#else
+#define DEBUG_TB_CHECK_GATE 0
+#endif
+
 /* Access to the various translations structures need to be serialised via locks
  * for consistency. This is automatic for SoftMMU based system
  * emulation due to its single threaded nature. In user-mode emulation
@@ -950,7 +956,13 @@ void tb_flush(CPUState *cpu)
     }
 }
 
-#ifdef DEBUG_TB_CHECK
+/*
+ * Formerly ifdef DEBUG_TB_CHECK. These debug functions are user-mode-only,
+ * so in order to prevent bit rot we compile them unconditionally in user-mode,
+ * and let the optimizer get rid of them by wrapping their user-only callers
+ * with if (DEBUG_TB_CHECK_GATE).
+ */
+#ifdef CONFIG_USER_ONLY
 
 static void
 do_tb_invalidate_check(struct qht *ht, void *p, uint32_t hash, void *userp)
@@ -994,7 +1006,7 @@ static void tb_page_check(void)
     qht_iter(&tcg_ctx.tb_ctx.htable, do_tb_page_check, NULL);
 }
 
-#endif
+#endif /* CONFIG_USER_ONLY */
 
 static inline void tb_page_remove(TranslationBlock **ptb, TranslationBlock *tb)
 {
@@ -1238,8 +1250,10 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
                      tb->trace_vcpu_dstate);
     qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
 
-#ifdef DEBUG_TB_CHECK
-    tb_page_check();
+#ifdef CONFIG_USER_ONLY
+    if (DEBUG_TB_CHECK_GATE) {
+        tb_page_check();
+    }
 #endif
 }
 
@@ -2209,8 +2223,10 @@ int page_unprotect(target_ulong address, uintptr_t pc)
             /* and since the content will be modified, we must invalidate
                the corresponding translated code. */
             current_tb_invalidated |= tb_invalidate_phys_page(addr, pc);
-#ifdef DEBUG_TB_CHECK
-            tb_invalidate_check(addr);
+#ifdef CONFIG_USER_ONLY
+            if (DEBUG_TB_CHECK_GATE) {
+                tb_invalidate_check(addr);
+            }
 #endif
         }
         mprotect((void *)g2h(host_start), qemu_host_page_size,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 26/43] exec-all: extract tb->tc_* into a separate struct tc_tb
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (24 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 25/43] translate-all: define and use DEBUG_TB_CHECK_GATE Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 27/43] translate-all: use a binary search tree to track TBs in TBContext Emilio G. Cota
                   ` (17 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

In preparation for adding tc.size to be able to keep track of
TB's using the binary search tree implementation from glib.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/exec-all.h   | 20 ++++++++++++++------
 accel/tcg/cpu-exec.c      |  6 +++---
 accel/tcg/translate-all.c | 20 ++++++++++----------
 tcg/tcg-runtime.c         |  4 ++--
 tcg/tcg.c                 |  4 ++--
 5 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 00f7da8..bc4f41c 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -341,6 +341,14 @@ static inline void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
 #define USE_DIRECT_JUMP
 #endif
 
+/*
+ * Translation Cache-related fields of a TB.
+ */
+struct tb_tc {
+    void *ptr;    /* pointer to the translated code */
+    uint8_t *search;  /* pointer to search data */
+};
+
 struct TranslationBlock {
     target_ulong pc;   /* simulated PC corresponding to this block (EIP + CS base) */
     target_ulong cs_base; /* CS base for this block */
@@ -362,8 +370,8 @@ struct TranslationBlock {
     /* Per-vCPU dynamic tracing state used to generate this TB */
     uint32_t trace_vcpu_dstate;
 
-    void *tc_ptr;    /* pointer to the translated code */
-    uint8_t *tc_search;  /* pointer to search data */
+    struct tb_tc tc;
+
     /* original tb when cflags has CF_NOCACHE */
     struct TranslationBlock *orig_tb;
     /* first and second physical page containing code. The lower bit
@@ -462,7 +470,7 @@ static inline void tb_set_jmp_target(TranslationBlock *tb,
                                      int n, uintptr_t addr)
 {
     uint16_t offset = tb->jmp_insn_offset[n];
-    tb_set_jmp_target1((uintptr_t)(tb->tc_ptr + offset), addr);
+    tb_set_jmp_target1((uintptr_t)(tb->tc.ptr + offset), addr);
 }
 
 #else
@@ -489,11 +497,11 @@ static inline void tb_add_jump(TranslationBlock *tb, int n,
     qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
                            "Linking TBs %p [" TARGET_FMT_lx
                            "] index %d -> %p [" TARGET_FMT_lx "]\n",
-                           tb->tc_ptr, tb->pc, n,
-                           tb_next->tc_ptr, tb_next->pc);
+                           tb->tc.ptr, tb->pc, n,
+                           tb_next->tc.ptr, tb_next->pc);
 
     /* patch the native jump address */
-    tb_set_jmp_target(tb, n, (uintptr_t)tb_next->tc_ptr);
+    tb_set_jmp_target(tb, n, (uintptr_t)tb_next->tc.ptr);
 
     /* add in TB jmp circular list */
     tb->jmp_list_next[n] = tb_next->jmp_list_first;
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 526cab3..cb1e6d3 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -143,11 +143,11 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
     uintptr_t ret;
     TranslationBlock *last_tb;
     int tb_exit;
-    uint8_t *tb_ptr = itb->tc_ptr;
+    uint8_t *tb_ptr = itb->tc.ptr;
 
     qemu_log_mask_and_addr(CPU_LOG_EXEC, itb->pc,
                            "Trace %p [%d: " TARGET_FMT_lx "] %s\n",
-                           itb->tc_ptr, cpu->cpu_index, itb->pc,
+                           itb->tc.ptr, cpu->cpu_index, itb->pc,
                            lookup_symbol(itb->pc));
 
 #if defined(DEBUG_DISAS)
@@ -179,7 +179,7 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
         qemu_log_mask_and_addr(CPU_LOG_EXEC, last_tb->pc,
                                "Stopped execution of TB chain before %p ["
                                TARGET_FMT_lx "] %s\n",
-                               last_tb->tc_ptr, last_tb->pc,
+                               last_tb->tc.ptr, last_tb->pc,
                                lookup_symbol(last_tb->pc));
         if (cc->synchronize_from_tb) {
             cc->synchronize_from_tb(cpu, last_tb);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 845585b..0a2eb86 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -260,7 +260,7 @@ static target_long decode_sleb128(uint8_t **pp)
    which comes from the host pc of the end of the code implementing the insn.
 
    Each line of the table is encoded as sleb128 deltas from the previous
-   line.  The seed for the first line is { tb->pc, 0..., tb->tc_ptr }.
+   line.  The seed for the first line is { tb->pc, 0..., tb->tc.ptr }.
    That is, the first column is seeded with the guest pc, the last column
    with the host pc, and the middle columns with zeros.  */
 
@@ -270,7 +270,7 @@ static int encode_search(TranslationBlock *tb, uint8_t *block)
     uint8_t *p = block;
     int i, j, n;
 
-    tb->tc_search = block;
+    tb->tc.search = block;
 
     for (i = 0, n = tb->icount; i < n; ++i) {
         target_ulong prev;
@@ -305,9 +305,9 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
                                      uintptr_t searched_pc)
 {
     target_ulong data[TARGET_INSN_START_WORDS] = { tb->pc };
-    uintptr_t host_pc = (uintptr_t)tb->tc_ptr;
+    uintptr_t host_pc = (uintptr_t)tb->tc.ptr;
     CPUArchState *env = cpu->env_ptr;
-    uint8_t *p = tb->tc_search;
+    uint8_t *p = tb->tc.search;
     int i, j, num_insns = tb->icount;
 #ifdef CONFIG_PROFILER
     int64_t ti = profile_getclock();
@@ -858,7 +858,7 @@ void tb_free(TranslationBlock *tb)
             tb == tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs - 1]) {
         size_t struct_size = ROUND_UP(sizeof(*tb), qemu_icache_linesize);
 
-        tcg_ctx.code_gen_ptr = tb->tc_ptr - struct_size;
+        tcg_ctx.code_gen_ptr = tb->tc.ptr - struct_size;
         tcg_ctx.tb_ctx.nb_tbs--;
     }
 }
@@ -1059,7 +1059,7 @@ static inline void tb_remove_from_jmp_list(TranslationBlock *tb, int n)
    another TB */
 static inline void tb_reset_jump(TranslationBlock *tb, int n)
 {
-    uintptr_t addr = (uintptr_t)(tb->tc_ptr + tb->jmp_reset_offset[n]);
+    uintptr_t addr = (uintptr_t)(tb->tc.ptr + tb->jmp_reset_offset[n]);
     tb_set_jmp_target(tb, n, addr);
 }
 
@@ -1290,7 +1290,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     }
 
     gen_code_buf = tcg_ctx.code_gen_ptr;
-    tb->tc_ptr = gen_code_buf;
+    tb->tc.ptr = gen_code_buf;
     tb->pc = pc;
     tb->cs_base = cs_base;
     tb->flags = flags;
@@ -1310,7 +1310,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     gen_intermediate_code(env, tb);
     tcg_ctx.cpu = NULL;
 
-    trace_translate_block(tb, tb->pc, tb->tc_ptr);
+    trace_translate_block(tb, tb->pc, tb->tc.ptr);
 
     /* generate machine code */
     tb->jmp_reset_offset[0] = TB_JMP_RESET_OFFSET_INVALID;
@@ -1356,7 +1356,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         qemu_log_in_addr_range(tb->pc)) {
         qemu_log_lock();
         qemu_log("OUT: [size=%d]\n", gen_code_size);
-        log_disas(tb->tc_ptr, gen_code_size);
+        log_disas(tb->tc.ptr, gen_code_size);
         qemu_log("\n");
         qemu_log_flush();
         qemu_log_unlock();
@@ -1684,7 +1684,7 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
     while (m_min <= m_max) {
         m = (m_min + m_max) >> 1;
         tb = tcg_ctx.tb_ctx.tbs[m];
-        v = (uintptr_t)tb->tc_ptr;
+        v = (uintptr_t)tb->tc.ptr;
         if (v == tc_ptr) {
             return tb;
         } else if (tc_ptr < v) {
diff --git a/tcg/tcg-runtime.c b/tcg/tcg-runtime.c
index 4f873a9..9a87616 100644
--- a/tcg/tcg-runtime.c
+++ b/tcg/tcg-runtime.c
@@ -157,9 +157,9 @@ void *HELPER(lookup_tb_ptr)(CPUArchState *env)
     }
     qemu_log_mask_and_addr(CPU_LOG_EXEC, pc,
                            "Chain %p [%d: " TARGET_FMT_lx "] %s\n",
-                           tb->tc_ptr, cpu->cpu_index, pc,
+                           tb->tc.ptr, cpu->cpu_index, pc,
                            lookup_symbol(pc));
-    return tb->tc_ptr;
+    return tb->tc.ptr;
 }
 
 void HELPER(exit_atomic)(CPUArchState *env)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 3559829..28c1b94 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -2616,8 +2616,8 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 
     tcg_reg_alloc_start(s);
 
-    s->code_buf = tb->tc_ptr;
-    s->code_ptr = tb->tc_ptr;
+    s->code_buf = tb->tc.ptr;
+    s->code_ptr = tb->tc.ptr;
 
     tcg_out_tb_init(s);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 27/43] translate-all: use a binary search tree to track TBs in TBContext
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (25 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 26/43] exec-all: extract tb->tc_* into a separate struct tc_tb Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 28/43] exec-all: rename tb_free to tb_remove Emilio G. Cota
                   ` (16 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

This is a prerequisite for supporting multiple TCG contexts, since
we will have threads generating code in separate regions of
code_gen_buffer.

For this we need a new field (.size) in struct tb_tc to keep
track of the size of the translated code. This field uses a size_t
to avoid adding a hole to the struct, although really an unsigned
int would have been enough.

The comparison function we use is optimized for the common case:
insertions. Profiling shows that upon booting debian-arm, 98%
of comparisons are between existing tb's (i.e. a->size and b->size
are both !0), which happens during insertions (and removals, but
those are rare). The remaining cases are lookups. From reading the glib
sources we see that the first key is always the lookup key. However,
the code does not assume this to always be the case because this
behaviour is not guaranteed in the glib docs. However, we embed
this knowledge in the code as a branch hint for the compiler.

Note that tb_free does not free space in the code_gen_buffer anymore,
since we cannot easily know whether the tb is the last one inserted
in code_gen_buffer. The next patch in this series renames tb_free
to tb_remove to reflect this.

Performance-wise, lookups in tb_find_pc are the same as before:
O(log n). However, insertions are O(log n) instead of O(1), which
results in a small slowdown when booting debian-arm:

Performance counter stats for 'build/arm-softmmu/qemu-system-arm \
	-machine type=virt -nographic -smp 1 -m 4096 \
	-netdev user,id=unet,hostfwd=tcp::2222-:22 \
	-device virtio-net-device,netdev=unet \
	-drive file=img/arm/jessie-arm32.qcow2,id=myblock,index=0,if=none \
	-device virtio-blk-device,drive=myblock \
	-kernel img/arm/aarch32-current-linux-kernel-only.img \
	-append console=ttyAMA0 root=/dev/vda1 \
	-name arm,debug-threads=on -smp 1' (10 runs):

- Before:

       8048.598422      task-clock (msec)         #    0.931 CPUs utilized            ( +-  0.28% )
            16,974      context-switches          #    0.002 M/sec                    ( +-  0.12% )
                 0      cpu-migrations            #    0.000 K/sec
            10,125      page-faults               #    0.001 M/sec                    ( +-  1.23% )
    35,144,901,879      cycles                    #    4.367 GHz                      ( +-  0.14% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    65,758,252,643      instructions              #    1.87  insns per cycle          ( +-  0.33% )
    10,871,298,668      branches                  # 1350.707 M/sec                    ( +-  0.41% )
       192,322,212      branch-misses             #    1.77% of all branches          ( +-  0.32% )

       8.640869419 seconds time elapsed                                          ( +-  0.57% )

- After:
       8146.242027      task-clock (msec)         #    0.923 CPUs utilized            ( +-  1.23% )
            17,016      context-switches          #    0.002 M/sec                    ( +-  0.40% )
                 0      cpu-migrations            #    0.000 K/sec
            18,769      page-faults               #    0.002 M/sec                    ( +-  0.45% )
    35,660,956,120      cycles                    #    4.378 GHz                      ( +-  1.22% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    65,095,366,607      instructions              #    1.83  insns per cycle          ( +-  1.73% )
    10,803,480,261      branches                  # 1326.192 M/sec                    ( +-  1.95% )
       195,601,289      branch-misses             #    1.81% of all branches          ( +-  0.39% )

       8.828660235 seconds time elapsed                                          ( +-  0.38% )

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/exec-all.h   |   5 ++
 include/exec/tb-context.h |   4 +-
 accel/tcg/translate-all.c | 217 ++++++++++++++++++++++++----------------------
 3 files changed, 118 insertions(+), 108 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index bc4f41c..eb3eb7b 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -343,10 +343,15 @@ static inline void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
 
 /*
  * Translation Cache-related fields of a TB.
+ * This struct exists just for convenience; we keep track of TB's in a binary
+ * search tree, and the only fields needed to compare TB's in the tree are
+ * @ptr and @size. @search is brought here for consistency, since it is also
+ * a TC-related field.
  */
 struct tb_tc {
     void *ptr;    /* pointer to the translated code */
     uint8_t *search;  /* pointer to search data */
+    size_t size;
 };
 
 struct TranslationBlock {
diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index 25c2afe..1fa8dcc 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -31,10 +31,8 @@ typedef struct TBContext TBContext;
 
 struct TBContext {
 
-    TranslationBlock **tbs;
+    GTree *tb_tree;
     struct qht htable;
-    size_t tbs_size;
-    int nb_tbs;
     /* any access to the tbs or the page table must use this lock */
     QemuMutex tb_lock;
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 0a2eb86..cb71aef 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -776,6 +776,48 @@ static inline void *alloc_code_gen_buffer(void)
 }
 #endif /* USE_STATIC_CODE_GEN_BUFFER, WIN32, POSIX */
 
+/* compare a pointer @ptr and a tb_tc @s */
+static int ptr_cmp_tb_tc(const void *ptr, const struct tb_tc *s)
+{
+    if (ptr >= s->ptr + s->size) {
+        return 1;
+    } else if (ptr < s->ptr) {
+        return -1;
+    }
+    return 0;
+}
+
+static gint tb_tc_cmp(gconstpointer ap, gconstpointer bp)
+{
+    const struct tb_tc *a = ap;
+    const struct tb_tc *b = bp;
+
+    /*
+     * When both sizes are set, we know this isn't a lookup and therefore
+     * the two buffers are non-overlapping.
+     * This is the most likely case: every TB must be inserted; lookups
+     * are a lot less frequent.
+     */
+    if (likely(a->size && b->size)) {
+        /* a->ptr == b->ptr would mean the buffers overlap */
+        g_assert(a->ptr != b->ptr);
+
+        if (a->ptr > b->ptr) {
+            return 1;
+        }
+        return -1;
+    }
+    /*
+     * All lookups have either .size field set to 0.
+     * From the glib sources we see that @ap is always the lookup key. However
+     * the docs provide no guarantee, so we just mark this case as likely.
+     */
+    if (likely(a->size == 0)) {
+        return ptr_cmp_tb_tc(a->ptr, b);
+    }
+    return ptr_cmp_tb_tc(b->ptr, a);
+}
+
 static inline void code_gen_alloc(size_t tb_size)
 {
     tcg_ctx.code_gen_buffer_size = size_code_gen_buffer(tb_size);
@@ -784,15 +826,7 @@ static inline void code_gen_alloc(size_t tb_size)
         fprintf(stderr, "Could not allocate dynamic translator buffer\n");
         exit(1);
     }
-
-    /* size this conservatively -- realloc later if needed */
-    tcg_ctx.tb_ctx.tbs_size =
-        tcg_ctx.code_gen_buffer_size / CODE_GEN_AVG_BLOCK_SIZE / 8;
-    if (unlikely(!tcg_ctx.tb_ctx.tbs_size)) {
-        tcg_ctx.tb_ctx.tbs_size = 64 * 1024;
-    }
-    tcg_ctx.tb_ctx.tbs = g_new(TranslationBlock *, tcg_ctx.tb_ctx.tbs_size);
-
+    tcg_ctx.tb_ctx.tb_tree = g_tree_new(tb_tc_cmp);
     qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
 }
 
@@ -829,7 +863,6 @@ void tcg_exec_init(unsigned long tb_size)
 static TranslationBlock *tb_alloc(target_ulong pc)
 {
     TranslationBlock *tb;
-    TBContext *ctx;
 
     assert_tb_locked();
 
@@ -837,12 +870,6 @@ static TranslationBlock *tb_alloc(target_ulong pc)
     if (unlikely(tb == NULL)) {
         return NULL;
     }
-    ctx = &tcg_ctx.tb_ctx;
-    if (unlikely(ctx->nb_tbs == ctx->tbs_size)) {
-        ctx->tbs_size *= 2;
-        ctx->tbs = g_renew(TranslationBlock *, ctx->tbs, ctx->tbs_size);
-    }
-    ctx->tbs[ctx->nb_tbs++] = tb;
     return tb;
 }
 
@@ -851,16 +878,7 @@ void tb_free(TranslationBlock *tb)
 {
     assert_tb_locked();
 
-    /* In practice this is mostly used for single use temporary TB
-       Ignore the hard cases and just back up if this TB happens to
-       be the last one generated.  */
-    if (tcg_ctx.tb_ctx.nb_tbs > 0 &&
-            tb == tcg_ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs - 1]) {
-        size_t struct_size = ROUND_UP(sizeof(*tb), qemu_icache_linesize);
-
-        tcg_ctx.code_gen_ptr = tb->tc.ptr - struct_size;
-        tcg_ctx.tb_ctx.nb_tbs--;
-    }
+    g_tree_remove(tcg_ctx.tb_ctx.tb_tree, &tb->tc);
 }
 
 static inline void invalidate_page_bitmap(PageDesc *p)
@@ -918,11 +936,12 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
     }
 
     if (DEBUG_TB_FLUSH_GATE) {
-        printf("qemu: flush code_size=%td nb_tbs=%d avg_tb_size=%td\n",
-               tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer,
-               tcg_ctx.tb_ctx.nb_tbs, tcg_ctx.tb_ctx.nb_tbs > 0 ?
-               (tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) /
-               tcg_ctx.tb_ctx.nb_tbs : 0);
+        size_t nb_tbs = g_tree_nnodes(tcg_ctx.tb_ctx.tb_tree);
+
+        printf("qemu: flush code_size=%td nb_tbs=%zu avg_tb_size=%td\n",
+               tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer, nb_tbs,
+               nb_tbs > 0 ?
+               (tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) / nb_tbs : 0);
     }
     if ((unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer)
         > tcg_ctx.code_gen_buffer_size) {
@@ -933,7 +952,10 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
         cpu_tb_jmp_cache_clear(cpu);
     }
 
-    tcg_ctx.tb_ctx.nb_tbs = 0;
+    /* Increment the refcount first so that destroy acts as a reset */
+    g_tree_ref(tcg_ctx.tb_ctx.tb_tree);
+    g_tree_destroy(tcg_ctx.tb_ctx.tb_tree);
+
     qht_reset_size(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
     page_flush_tb();
 
@@ -1343,6 +1365,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     if (unlikely(search_size < 0)) {
         goto buffer_overflow;
     }
+    tb->tc.size = gen_code_size;
 
 #ifdef CONFIG_PROFILER
     tcg_ctx.code_time += profile_getclock() - ti;
@@ -1393,6 +1416,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
      * through the physical hash table and physical page list.
      */
     tb_link_page(tb, phys_pc, phys_page2);
+    g_tree_insert(tcg_ctx.tb_ctx.tb_tree, &tb->tc, tb);
     return tb;
 }
 
@@ -1663,37 +1687,16 @@ static bool tb_invalidate_phys_page(tb_page_addr_t addr, uintptr_t pc)
 }
 #endif
 
-/* find the TB 'tb' such that tb[0].tc_ptr <= tc_ptr <
-   tb[1].tc_ptr. Return NULL if not found */
+/*
+ * Find the TB 'tb' such that
+ * tb->tc.ptr <= tc_ptr < tb->tc.ptr + tb->tc.size
+ * Return NULL if not found.
+ */
 static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
 {
-    int m_min, m_max, m;
-    uintptr_t v;
-    TranslationBlock *tb;
+    struct tb_tc s = { .ptr = (void *)tc_ptr };
 
-    if (tcg_ctx.tb_ctx.nb_tbs <= 0) {
-        return NULL;
-    }
-    if (tc_ptr < (uintptr_t)tcg_ctx.code_gen_buffer ||
-        tc_ptr >= (uintptr_t)tcg_ctx.code_gen_ptr) {
-        return NULL;
-    }
-    /* binary search (cf Knuth) */
-    m_min = 0;
-    m_max = tcg_ctx.tb_ctx.nb_tbs - 1;
-    while (m_min <= m_max) {
-        m = (m_min + m_max) >> 1;
-        tb = tcg_ctx.tb_ctx.tbs[m];
-        v = (uintptr_t)tb->tc.ptr;
-        if (v == tc_ptr) {
-            return tb;
-        } else if (tc_ptr < v) {
-            m_max = m - 1;
-        } else {
-            m_min = m + 1;
-        }
-    }
-    return tcg_ctx.tb_ctx.tbs[m_max];
+    return g_tree_lookup(tcg_ctx.tb_ctx.tb_tree, &s);
 }
 
 #if !defined(CONFIG_USER_ONLY)
@@ -1879,63 +1882,67 @@ static void print_qht_statistics(FILE *f, fprintf_function cpu_fprintf,
     g_free(hgram);
 }
 
+struct tb_tree_stats {
+    size_t target_size;
+    size_t max_target_size;
+    size_t direct_jmp_count;
+    size_t direct_jmp2_count;
+    size_t cross_page;
+};
+
+static gboolean tb_tree_stats_iter(gpointer key, gpointer value, gpointer data)
+{
+    const TranslationBlock *tb = value;
+    struct tb_tree_stats *tst = data;
+
+    tst->target_size += tb->size;
+    if (tb->size > tst->max_target_size) {
+        tst->max_target_size = tb->size;
+    }
+    if (tb->page_addr[1] != -1) {
+        tst->cross_page++;
+    }
+    if (tb->jmp_reset_offset[0] != TB_JMP_RESET_OFFSET_INVALID) {
+        tst->direct_jmp_count++;
+        if (tb->jmp_reset_offset[1] != TB_JMP_RESET_OFFSET_INVALID) {
+            tst->direct_jmp2_count++;
+        }
+    }
+    return false;
+}
+
 void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
 {
-    int i, target_code_size, max_target_code_size;
-    int direct_jmp_count, direct_jmp2_count, cross_page;
-    TranslationBlock *tb;
+    struct tb_tree_stats tst = {};
     struct qht_stats hst;
+    size_t nb_tbs;
 
     tb_lock();
 
-    target_code_size = 0;
-    max_target_code_size = 0;
-    cross_page = 0;
-    direct_jmp_count = 0;
-    direct_jmp2_count = 0;
-    for (i = 0; i < tcg_ctx.tb_ctx.nb_tbs; i++) {
-        tb = tcg_ctx.tb_ctx.tbs[i];
-        target_code_size += tb->size;
-        if (tb->size > max_target_code_size) {
-            max_target_code_size = tb->size;
-        }
-        if (tb->page_addr[1] != -1) {
-            cross_page++;
-        }
-        if (tb->jmp_reset_offset[0] != TB_JMP_RESET_OFFSET_INVALID) {
-            direct_jmp_count++;
-            if (tb->jmp_reset_offset[1] != TB_JMP_RESET_OFFSET_INVALID) {
-                direct_jmp2_count++;
-            }
-        }
-    }
+    nb_tbs = g_tree_nnodes(tcg_ctx.tb_ctx.tb_tree);
+    g_tree_foreach(tcg_ctx.tb_ctx.tb_tree, tb_tree_stats_iter, &tst);
     /* XXX: avoid using doubles ? */
     cpu_fprintf(f, "Translation buffer state:\n");
     cpu_fprintf(f, "gen code size       %td/%zd\n",
                 tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer,
                 tcg_ctx.code_gen_highwater - tcg_ctx.code_gen_buffer);
-    cpu_fprintf(f, "TB count            %d\n", tcg_ctx.tb_ctx.nb_tbs);
-    cpu_fprintf(f, "TB avg target size  %d max=%d bytes\n",
-            tcg_ctx.tb_ctx.nb_tbs ? target_code_size /
-                    tcg_ctx.tb_ctx.nb_tbs : 0,
-            max_target_code_size);
+    cpu_fprintf(f, "TB count            %zu\n", nb_tbs);
+    cpu_fprintf(f, "TB avg target size  %zu max=%zu bytes\n",
+                nb_tbs ? tst.target_size / nb_tbs : 0,
+                tst.max_target_size);
     cpu_fprintf(f, "TB avg host size    %td bytes (expansion ratio: %0.1f)\n",
-            tcg_ctx.tb_ctx.nb_tbs ? (tcg_ctx.code_gen_ptr -
-                                     tcg_ctx.code_gen_buffer) /
-                                     tcg_ctx.tb_ctx.nb_tbs : 0,
-                target_code_size ? (double) (tcg_ctx.code_gen_ptr -
-                                             tcg_ctx.code_gen_buffer) /
-                                             target_code_size : 0);
-    cpu_fprintf(f, "cross page TB count %d (%d%%)\n", cross_page,
-            tcg_ctx.tb_ctx.nb_tbs ? (cross_page * 100) /
-                                    tcg_ctx.tb_ctx.nb_tbs : 0);
-    cpu_fprintf(f, "direct jump count   %d (%d%%) (2 jumps=%d %d%%)\n",
-                direct_jmp_count,
-                tcg_ctx.tb_ctx.nb_tbs ? (direct_jmp_count * 100) /
-                        tcg_ctx.tb_ctx.nb_tbs : 0,
-                direct_jmp2_count,
-                tcg_ctx.tb_ctx.nb_tbs ? (direct_jmp2_count * 100) /
-                        tcg_ctx.tb_ctx.nb_tbs : 0);
+                nb_tbs ? (tcg_ctx.code_gen_ptr -
+                          tcg_ctx.code_gen_buffer) / nb_tbs : 0,
+                tst.target_size ? (double) (tcg_ctx.code_gen_ptr -
+                                            tcg_ctx.code_gen_buffer) /
+                                            tst.target_size : 0);
+    cpu_fprintf(f, "cross page TB count %zu (%zu%%)\n", tst.cross_page,
+            nb_tbs ? (tst.cross_page * 100) / nb_tbs : 0);
+    cpu_fprintf(f, "direct jump count   %zu (%zu%%) (2 jumps=%zu %zu%%)\n",
+                tst.direct_jmp_count,
+                nb_tbs ? (tst.direct_jmp_count * 100) / nb_tbs : 0,
+                tst.direct_jmp2_count,
+                nb_tbs ? (tst.direct_jmp2_count * 100) / nb_tbs : 0);
 
     qht_statistics_init(&tcg_ctx.tb_ctx.htable, &hst);
     print_qht_statistics(f, cpu_fprintf, hst);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 28/43] exec-all: rename tb_free to tb_remove
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (26 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 27/43] translate-all: use a binary search tree to track TBs in TBContext Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 29/43] translate-all: report correct avg host TB size Emilio G. Cota
                   ` (15 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

We don't really free anything in this function anymore; we just remove
the TB from the binary search tree.

Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/exec-all.h   | 2 +-
 accel/tcg/cpu-exec.c      | 2 +-
 accel/tcg/translate-all.c | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index eb3eb7b..7bc2050 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -428,7 +428,7 @@ static inline uint32_t curr_cflags(void)
     return parallel_cpus ? CF_PARALLEL : 0;
 }
 
-void tb_free(TranslationBlock *tb);
+void tb_remove(TranslationBlock *tb);
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index cb1e6d3..1963bda 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -218,7 +218,7 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
 
     tb_lock();
     tb_phys_invalidate(tb, -1);
-    tb_free(tb);
+    tb_remove(tb);
     tb_unlock();
 }
 #endif
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index cb71aef..448f13b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -375,7 +375,7 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t retaddr)
         if (tb->cflags & CF_NOCACHE) {
             /* one-shot translation, invalidate it immediately */
             tb_phys_invalidate(tb, -1);
-            tb_free(tb);
+            tb_remove(tb);
         }
         r = true;
     }
@@ -874,7 +874,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 }
 
 /* Called with tb_lock held.  */
-void tb_free(TranslationBlock *tb)
+void tb_remove(TranslationBlock *tb)
 {
     assert_tb_locked();
 
@@ -1809,7 +1809,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
              * cpu_exec_nocache() */
             tb_phys_invalidate(tb->orig_tb, -1);
         }
-        tb_free(tb);
+        tb_remove(tb);
     }
     /* FIXME: In theory this could raise an exception.  In practice
        we have already translated the block once so it's probably ok.  */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 29/43] translate-all: report correct avg host TB size
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (27 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 28/43] exec-all: rename tb_free to tb_remove Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 30/43] tci: move tci_regs to tcg_qemu_tb_exec's stack Emilio G. Cota
                   ` (14 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Since commit 6e3b2bfd6 ("tcg: allocate TB structs before the
corresponding translated code") we are not fully utilizing
code_gen_buffer for translated code, and therefore are
incorrectly reporting the amount of translated code as well as
the average host TB size. Address this by:

- Making the conscious choice of misreporting the total translated code;
  doing otherwise would mislead users into thinking "-tb-size" is not
  honoured.

- Expanding tb_tree_stats to accurately count the bytes of translated code on
  the host, and using this for reporting the average tb host size,
  as well as the expansion ratio.

In the future we might want to consider reporting the accurate numbers for
the total translated code, together with a "bookkeeping/overhead" field to
account for the TB structs.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 accel/tcg/translate-all.c | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 448f13b..d50e2b9 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -923,6 +923,15 @@ static void page_flush_tb(void)
     }
 }
 
+static gboolean tb_host_size_iter(gpointer key, gpointer value, gpointer data)
+{
+    const TranslationBlock *tb = value;
+    size_t *size = data;
+
+    *size += tb->tc.size;
+    return false;
+}
+
 /* flush all the translation blocks */
 static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
 {
@@ -937,11 +946,12 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
 
     if (DEBUG_TB_FLUSH_GATE) {
         size_t nb_tbs = g_tree_nnodes(tcg_ctx.tb_ctx.tb_tree);
+        size_t host_size = 0;
 
-        printf("qemu: flush code_size=%td nb_tbs=%zu avg_tb_size=%td\n",
+        g_tree_foreach(tcg_ctx.tb_ctx.tb_tree, tb_host_size_iter, &host_size);
+        printf("qemu: flush code_size=%td nb_tbs=%zu avg_tb_size=%zu\n",
                tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer, nb_tbs,
-               nb_tbs > 0 ?
-               (tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) / nb_tbs : 0);
+               nb_tbs > 0 ? host_size / nb_tbs : 0);
     }
     if ((unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer)
         > tcg_ctx.code_gen_buffer_size) {
@@ -1883,6 +1893,7 @@ static void print_qht_statistics(FILE *f, fprintf_function cpu_fprintf,
 }
 
 struct tb_tree_stats {
+    size_t host_size;
     size_t target_size;
     size_t max_target_size;
     size_t direct_jmp_count;
@@ -1895,6 +1906,7 @@ static gboolean tb_tree_stats_iter(gpointer key, gpointer value, gpointer data)
     const TranslationBlock *tb = value;
     struct tb_tree_stats *tst = data;
 
+    tst->host_size += tb->tc.size;
     tst->target_size += tb->size;
     if (tb->size > tst->max_target_size) {
         tst->max_target_size = tb->size;
@@ -1923,6 +1935,11 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
     g_tree_foreach(tcg_ctx.tb_ctx.tb_tree, tb_tree_stats_iter, &tst);
     /* XXX: avoid using doubles ? */
     cpu_fprintf(f, "Translation buffer state:\n");
+    /*
+     * Report total code size including the padding and TB structs;
+     * otherwise users might think "-tb-size" is not honoured.
+     * For avg host size we use the precise numbers from tb_tree_stats though.
+     */
     cpu_fprintf(f, "gen code size       %td/%zd\n",
                 tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer,
                 tcg_ctx.code_gen_highwater - tcg_ctx.code_gen_buffer);
@@ -1930,12 +1947,9 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
     cpu_fprintf(f, "TB avg target size  %zu max=%zu bytes\n",
                 nb_tbs ? tst.target_size / nb_tbs : 0,
                 tst.max_target_size);
-    cpu_fprintf(f, "TB avg host size    %td bytes (expansion ratio: %0.1f)\n",
-                nb_tbs ? (tcg_ctx.code_gen_ptr -
-                          tcg_ctx.code_gen_buffer) / nb_tbs : 0,
-                tst.target_size ? (double) (tcg_ctx.code_gen_ptr -
-                                            tcg_ctx.code_gen_buffer) /
-                                            tst.target_size : 0);
+    cpu_fprintf(f, "TB avg host size    %zu bytes (expansion ratio: %0.1f)\n",
+                nb_tbs ? tst.host_size / nb_tbs : 0,
+                tst.target_size ? (double)tst.host_size / tst.target_size : 0);
     cpu_fprintf(f, "cross page TB count %zu (%zu%%)\n", tst.cross_page,
             nb_tbs ? (tst.cross_page * 100) / nb_tbs : 0);
     cpu_fprintf(f, "direct jump count   %zu (%zu%%) (2 jumps=%zu %zu%%)\n",
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 30/43] tci: move tci_regs to tcg_qemu_tb_exec's stack
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (28 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 29/43] translate-all: report correct avg host TB size Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 31/43] tcg: take tb_ctx out of TCGContext Emilio G. Cota
                   ` (13 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Groundwork for supporting multiple TCG contexts.

Compile-tested for all targets on an x86_64 host.

Suggested-by: Richard Henderson <rth@twiddle.net>
Acked-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/tci.c | 552 +++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 279 insertions(+), 273 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 4bdc645..f3216c1 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -55,93 +55,95 @@ typedef uint64_t (*helper_function)(tcg_target_ulong, tcg_target_ulong,
                                     tcg_target_ulong);
 #endif
 
-static tcg_target_ulong tci_reg[TCG_TARGET_NB_REGS];
-
-static tcg_target_ulong tci_read_reg(TCGReg index)
+static tcg_target_ulong tci_read_reg(const tcg_target_ulong *regs, TCGReg index)
 {
-    tci_assert(index < ARRAY_SIZE(tci_reg));
-    return tci_reg[index];
+    tci_assert(index < TCG_TARGET_NB_REGS);
+    return regs[index];
 }
 
 #if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
-static int8_t tci_read_reg8s(TCGReg index)
+static int8_t tci_read_reg8s(const tcg_target_ulong *regs, TCGReg index)
 {
-    return (int8_t)tci_read_reg(index);
+    return (int8_t)tci_read_reg(regs, index);
 }
 #endif
 
 #if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
-static int16_t tci_read_reg16s(TCGReg index)
+static int16_t tci_read_reg16s(const tcg_target_ulong *regs, TCGReg index)
 {
-    return (int16_t)tci_read_reg(index);
+    return (int16_t)tci_read_reg(regs, index);
 }
 #endif
 
 #if TCG_TARGET_REG_BITS == 64
-static int32_t tci_read_reg32s(TCGReg index)
+static int32_t tci_read_reg32s(const tcg_target_ulong *regs, TCGReg index)
 {
-    return (int32_t)tci_read_reg(index);
+    return (int32_t)tci_read_reg(regs, index);
 }
 #endif
 
-static uint8_t tci_read_reg8(TCGReg index)
+static uint8_t tci_read_reg8(const tcg_target_ulong *regs, TCGReg index)
 {
-    return (uint8_t)tci_read_reg(index);
+    return (uint8_t)tci_read_reg(regs, index);
 }
 
-static uint16_t tci_read_reg16(TCGReg index)
+static uint16_t tci_read_reg16(const tcg_target_ulong *regs, TCGReg index)
 {
-    return (uint16_t)tci_read_reg(index);
+    return (uint16_t)tci_read_reg(regs, index);
 }
 
-static uint32_t tci_read_reg32(TCGReg index)
+static uint32_t tci_read_reg32(const tcg_target_ulong *regs, TCGReg index)
 {
-    return (uint32_t)tci_read_reg(index);
+    return (uint32_t)tci_read_reg(regs, index);
 }
 
 #if TCG_TARGET_REG_BITS == 64
-static uint64_t tci_read_reg64(TCGReg index)
+static uint64_t tci_read_reg64(const tcg_target_ulong *regs, TCGReg index)
 {
-    return tci_read_reg(index);
+    return tci_read_reg(regs, index);
 }
 #endif
 
-static void tci_write_reg(TCGReg index, tcg_target_ulong value)
+static void
+tci_write_reg(tcg_target_ulong *regs, TCGReg index, tcg_target_ulong value)
 {
-    tci_assert(index < ARRAY_SIZE(tci_reg));
+    tci_assert(index < TCG_TARGET_NB_REGS);
     tci_assert(index != TCG_AREG0);
     tci_assert(index != TCG_REG_CALL_STACK);
-    tci_reg[index] = value;
+    regs[index] = value;
 }
 
 #if TCG_TARGET_REG_BITS == 64
-static void tci_write_reg32s(TCGReg index, int32_t value)
+static void
+tci_write_reg32s(tcg_target_ulong *regs, TCGReg index, int32_t value)
 {
-    tci_write_reg(index, value);
+    tci_write_reg(regs, index, value);
 }
 #endif
 
-static void tci_write_reg8(TCGReg index, uint8_t value)
+static void tci_write_reg8(tcg_target_ulong *regs, TCGReg index, uint8_t value)
 {
-    tci_write_reg(index, value);
+    tci_write_reg(regs, index, value);
 }
 
-static void tci_write_reg32(TCGReg index, uint32_t value)
+static void
+tci_write_reg32(tcg_target_ulong *regs, TCGReg index, uint32_t value)
 {
-    tci_write_reg(index, value);
+    tci_write_reg(regs, index, value);
 }
 
 #if TCG_TARGET_REG_BITS == 32
-static void tci_write_reg64(uint32_t high_index, uint32_t low_index,
-                            uint64_t value)
+static void tci_write_reg64(tcg_target_ulong *regs, uint32_t high_index,
+                            uint32_t low_index, uint64_t value)
 {
-    tci_write_reg(low_index, value);
-    tci_write_reg(high_index, value >> 32);
+    tci_write_reg(regs, low_index, value);
+    tci_write_reg(regs, high_index, value >> 32);
 }
 #elif TCG_TARGET_REG_BITS == 64
-static void tci_write_reg64(TCGReg index, uint64_t value)
+static void
+tci_write_reg64(tcg_target_ulong *regs, TCGReg index, uint64_t value)
 {
-    tci_write_reg(index, value);
+    tci_write_reg(regs, index, value);
 }
 #endif
 
@@ -188,94 +190,97 @@ static uint64_t tci_read_i64(uint8_t **tb_ptr)
 #endif
 
 /* Read indexed register (native size) from bytecode. */
-static tcg_target_ulong tci_read_r(uint8_t **tb_ptr)
+static tcg_target_ulong
+tci_read_r(const tcg_target_ulong *regs, uint8_t **tb_ptr)
 {
-    tcg_target_ulong value = tci_read_reg(**tb_ptr);
+    tcg_target_ulong value = tci_read_reg(regs, **tb_ptr);
     *tb_ptr += 1;
     return value;
 }
 
 /* Read indexed register (8 bit) from bytecode. */
-static uint8_t tci_read_r8(uint8_t **tb_ptr)
+static uint8_t tci_read_r8(const tcg_target_ulong *regs, uint8_t **tb_ptr)
 {
-    uint8_t value = tci_read_reg8(**tb_ptr);
+    uint8_t value = tci_read_reg8(regs, **tb_ptr);
     *tb_ptr += 1;
     return value;
 }
 
 #if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
 /* Read indexed register (8 bit signed) from bytecode. */
-static int8_t tci_read_r8s(uint8_t **tb_ptr)
+static int8_t tci_read_r8s(const tcg_target_ulong *regs, uint8_t **tb_ptr)
 {
-    int8_t value = tci_read_reg8s(**tb_ptr);
+    int8_t value = tci_read_reg8s(regs, **tb_ptr);
     *tb_ptr += 1;
     return value;
 }
 #endif
 
 /* Read indexed register (16 bit) from bytecode. */
-static uint16_t tci_read_r16(uint8_t **tb_ptr)
+static uint16_t tci_read_r16(const tcg_target_ulong *regs, uint8_t **tb_ptr)
 {
-    uint16_t value = tci_read_reg16(**tb_ptr);
+    uint16_t value = tci_read_reg16(regs, **tb_ptr);
     *tb_ptr += 1;
     return value;
 }
 
 #if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
 /* Read indexed register (16 bit signed) from bytecode. */
-static int16_t tci_read_r16s(uint8_t **tb_ptr)
+static int16_t tci_read_r16s(const tcg_target_ulong *regs, uint8_t **tb_ptr)
 {
-    int16_t value = tci_read_reg16s(**tb_ptr);
+    int16_t value = tci_read_reg16s(regs, **tb_ptr);
     *tb_ptr += 1;
     return value;
 }
 #endif
 
 /* Read indexed register (32 bit) from bytecode. */
-static uint32_t tci_read_r32(uint8_t **tb_ptr)
+static uint32_t tci_read_r32(const tcg_target_ulong *regs, uint8_t **tb_ptr)
 {
-    uint32_t value = tci_read_reg32(**tb_ptr);
+    uint32_t value = tci_read_reg32(regs, **tb_ptr);
     *tb_ptr += 1;
     return value;
 }
 
 #if TCG_TARGET_REG_BITS == 32
 /* Read two indexed registers (2 * 32 bit) from bytecode. */
-static uint64_t tci_read_r64(uint8_t **tb_ptr)
+static uint64_t tci_read_r64(const tcg_target_ulong *regs, uint8_t **tb_ptr)
 {
-    uint32_t low = tci_read_r32(tb_ptr);
-    return tci_uint64(tci_read_r32(tb_ptr), low);
+    uint32_t low = tci_read_r32(regs, tb_ptr);
+    return tci_uint64(tci_read_r32(regs, tb_ptr), low);
 }
 #elif TCG_TARGET_REG_BITS == 64
 /* Read indexed register (32 bit signed) from bytecode. */
-static int32_t tci_read_r32s(uint8_t **tb_ptr)
+static int32_t tci_read_r32s(const tcg_target_ulong *regs, uint8_t **tb_ptr)
 {
-    int32_t value = tci_read_reg32s(**tb_ptr);
+    int32_t value = tci_read_reg32s(regs, **tb_ptr);
     *tb_ptr += 1;
     return value;
 }
 
 /* Read indexed register (64 bit) from bytecode. */
-static uint64_t tci_read_r64(uint8_t **tb_ptr)
+static uint64_t tci_read_r64(const tcg_target_ulong *regs, uint8_t **tb_ptr)
 {
-    uint64_t value = tci_read_reg64(**tb_ptr);
+    uint64_t value = tci_read_reg64(regs, **tb_ptr);
     *tb_ptr += 1;
     return value;
 }
 #endif
 
 /* Read indexed register(s) with target address from bytecode. */
-static target_ulong tci_read_ulong(uint8_t **tb_ptr)
+static target_ulong
+tci_read_ulong(const tcg_target_ulong *regs, uint8_t **tb_ptr)
 {
-    target_ulong taddr = tci_read_r(tb_ptr);
+    target_ulong taddr = tci_read_r(regs, tb_ptr);
 #if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
-    taddr += (uint64_t)tci_read_r(tb_ptr) << 32;
+    taddr += (uint64_t)tci_read_r(regs, tb_ptr) << 32;
 #endif
     return taddr;
 }
 
 /* Read indexed register or constant (native size) from bytecode. */
-static tcg_target_ulong tci_read_ri(uint8_t **tb_ptr)
+static tcg_target_ulong
+tci_read_ri(const tcg_target_ulong *regs, uint8_t **tb_ptr)
 {
     tcg_target_ulong value;
     TCGReg r = **tb_ptr;
@@ -283,13 +288,13 @@ static tcg_target_ulong tci_read_ri(uint8_t **tb_ptr)
     if (r == TCG_CONST) {
         value = tci_read_i(tb_ptr);
     } else {
-        value = tci_read_reg(r);
+        value = tci_read_reg(regs, r);
     }
     return value;
 }
 
 /* Read indexed register or constant (32 bit) from bytecode. */
-static uint32_t tci_read_ri32(uint8_t **tb_ptr)
+static uint32_t tci_read_ri32(const tcg_target_ulong *regs, uint8_t **tb_ptr)
 {
     uint32_t value;
     TCGReg r = **tb_ptr;
@@ -297,21 +302,21 @@ static uint32_t tci_read_ri32(uint8_t **tb_ptr)
     if (r == TCG_CONST) {
         value = tci_read_i32(tb_ptr);
     } else {
-        value = tci_read_reg32(r);
+        value = tci_read_reg32(regs, r);
     }
     return value;
 }
 
 #if TCG_TARGET_REG_BITS == 32
 /* Read two indexed registers or constants (2 * 32 bit) from bytecode. */
-static uint64_t tci_read_ri64(uint8_t **tb_ptr)
+static uint64_t tci_read_ri64(const tcg_target_ulong *regs, uint8_t **tb_ptr)
 {
-    uint32_t low = tci_read_ri32(tb_ptr);
-    return tci_uint64(tci_read_ri32(tb_ptr), low);
+    uint32_t low = tci_read_ri32(regs, tb_ptr);
+    return tci_uint64(tci_read_ri32(regs, tb_ptr), low);
 }
 #elif TCG_TARGET_REG_BITS == 64
 /* Read indexed register or constant (64 bit) from bytecode. */
-static uint64_t tci_read_ri64(uint8_t **tb_ptr)
+static uint64_t tci_read_ri64(const tcg_target_ulong *regs, uint8_t **tb_ptr)
 {
     uint64_t value;
     TCGReg r = **tb_ptr;
@@ -319,7 +324,7 @@ static uint64_t tci_read_ri64(uint8_t **tb_ptr)
     if (r == TCG_CONST) {
         value = tci_read_i64(tb_ptr);
     } else {
-        value = tci_read_reg64(r);
+        value = tci_read_reg64(regs, r);
     }
     return value;
 }
@@ -465,12 +470,13 @@ static bool tci_compare64(uint64_t u0, uint64_t u1, TCGCond condition)
 /* Interpret pseudo code in tb. */
 uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
 {
+    tcg_target_ulong regs[TCG_TARGET_NB_REGS];
     long tcg_temps[CPU_TEMP_BUF_NLONGS];
     uintptr_t sp_value = (uintptr_t)(tcg_temps + CPU_TEMP_BUF_NLONGS);
     uintptr_t ret = 0;
 
-    tci_reg[TCG_AREG0] = (tcg_target_ulong)env;
-    tci_reg[TCG_REG_CALL_STACK] = sp_value;
+    regs[TCG_AREG0] = (tcg_target_ulong)env;
+    regs[TCG_REG_CALL_STACK] = sp_value;
     tci_assert(tb_ptr);
 
     for (;;) {
@@ -503,27 +509,27 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
 
         switch (opc) {
         case INDEX_op_call:
-            t0 = tci_read_ri(&tb_ptr);
+            t0 = tci_read_ri(regs, &tb_ptr);
 #if TCG_TARGET_REG_BITS == 32
-            tmp64 = ((helper_function)t0)(tci_read_reg(TCG_REG_R0),
-                                          tci_read_reg(TCG_REG_R1),
-                                          tci_read_reg(TCG_REG_R2),
-                                          tci_read_reg(TCG_REG_R3),
-                                          tci_read_reg(TCG_REG_R5),
-                                          tci_read_reg(TCG_REG_R6),
-                                          tci_read_reg(TCG_REG_R7),
-                                          tci_read_reg(TCG_REG_R8),
-                                          tci_read_reg(TCG_REG_R9),
-                                          tci_read_reg(TCG_REG_R10));
-            tci_write_reg(TCG_REG_R0, tmp64);
-            tci_write_reg(TCG_REG_R1, tmp64 >> 32);
+            tmp64 = ((helper_function)t0)(tci_read_reg(regs, TCG_REG_R0),
+                                          tci_read_reg(regs, TCG_REG_R1),
+                                          tci_read_reg(regs, TCG_REG_R2),
+                                          tci_read_reg(regs, TCG_REG_R3),
+                                          tci_read_reg(regs, TCG_REG_R5),
+                                          tci_read_reg(regs, TCG_REG_R6),
+                                          tci_read_reg(regs, TCG_REG_R7),
+                                          tci_read_reg(regs, TCG_REG_R8),
+                                          tci_read_reg(regs, TCG_REG_R9),
+                                          tci_read_reg(regs, TCG_REG_R10));
+            tci_write_reg(regs, TCG_REG_R0, tmp64);
+            tci_write_reg(regs, TCG_REG_R1, tmp64 >> 32);
 #else
-            tmp64 = ((helper_function)t0)(tci_read_reg(TCG_REG_R0),
-                                          tci_read_reg(TCG_REG_R1),
-                                          tci_read_reg(TCG_REG_R2),
-                                          tci_read_reg(TCG_REG_R3),
-                                          tci_read_reg(TCG_REG_R5));
-            tci_write_reg(TCG_REG_R0, tmp64);
+            tmp64 = ((helper_function)t0)(tci_read_reg(regs, TCG_REG_R0),
+                                          tci_read_reg(regs, TCG_REG_R1),
+                                          tci_read_reg(regs, TCG_REG_R2),
+                                          tci_read_reg(regs, TCG_REG_R3),
+                                          tci_read_reg(regs, TCG_REG_R5));
+            tci_write_reg(regs, TCG_REG_R0, tmp64);
 #endif
             break;
         case INDEX_op_br:
@@ -533,46 +539,46 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
             continue;
         case INDEX_op_setcond_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(&tb_ptr);
-            t2 = tci_read_ri32(&tb_ptr);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            t2 = tci_read_ri32(regs, &tb_ptr);
             condition = *tb_ptr++;
-            tci_write_reg32(t0, tci_compare32(t1, t2, condition));
+            tci_write_reg32(regs, t0, tci_compare32(t1, t2, condition));
             break;
 #if TCG_TARGET_REG_BITS == 32
         case INDEX_op_setcond2_i32:
             t0 = *tb_ptr++;
-            tmp64 = tci_read_r64(&tb_ptr);
-            v64 = tci_read_ri64(&tb_ptr);
+            tmp64 = tci_read_r64(regs, &tb_ptr);
+            v64 = tci_read_ri64(regs, &tb_ptr);
             condition = *tb_ptr++;
-            tci_write_reg32(t0, tci_compare64(tmp64, v64, condition));
+            tci_write_reg32(regs, t0, tci_compare64(tmp64, v64, condition));
             break;
 #elif TCG_TARGET_REG_BITS == 64
         case INDEX_op_setcond_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(&tb_ptr);
-            t2 = tci_read_ri64(&tb_ptr);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            t2 = tci_read_ri64(regs, &tb_ptr);
             condition = *tb_ptr++;
-            tci_write_reg64(t0, tci_compare64(t1, t2, condition));
+            tci_write_reg64(regs, t0, tci_compare64(t1, t2, condition));
             break;
 #endif
         case INDEX_op_mov_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(&tb_ptr);
-            tci_write_reg32(t0, t1);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, t1);
             break;
         case INDEX_op_movi_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_i32(&tb_ptr);
-            tci_write_reg32(t0, t1);
+            tci_write_reg32(regs, t0, t1);
             break;
 
             /* Load/store operations (32 bit). */
 
         case INDEX_op_ld8u_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(&tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg8(t0, *(uint8_t *)(t1 + t2));
+            tci_write_reg8(regs, t0, *(uint8_t *)(t1 + t2));
             break;
         case INDEX_op_ld8s_i32:
         case INDEX_op_ld16u_i32:
@@ -583,25 +589,25 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
             break;
         case INDEX_op_ld_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(&tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg32(t0, *(uint32_t *)(t1 + t2));
+            tci_write_reg32(regs, t0, *(uint32_t *)(t1 + t2));
             break;
         case INDEX_op_st8_i32:
-            t0 = tci_read_r8(&tb_ptr);
-            t1 = tci_read_r(&tb_ptr);
+            t0 = tci_read_r8(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             *(uint8_t *)(t1 + t2) = t0;
             break;
         case INDEX_op_st16_i32:
-            t0 = tci_read_r16(&tb_ptr);
-            t1 = tci_read_r(&tb_ptr);
+            t0 = tci_read_r16(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             *(uint16_t *)(t1 + t2) = t0;
             break;
         case INDEX_op_st_i32:
-            t0 = tci_read_r32(&tb_ptr);
-            t1 = tci_read_r(&tb_ptr);
+            t0 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             tci_assert(t1 != sp_value || (int32_t)t2 < 0);
             *(uint32_t *)(t1 + t2) = t0;
@@ -611,46 +617,46 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
 
         case INDEX_op_add_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(&tb_ptr);
-            t2 = tci_read_ri32(&tb_ptr);
-            tci_write_reg32(t0, t1 + t2);
+            t1 = tci_read_ri32(regs, &tb_ptr);
+            t2 = tci_read_ri32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, t1 + t2);
             break;
         case INDEX_op_sub_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(&tb_ptr);
-            t2 = tci_read_ri32(&tb_ptr);
-            tci_write_reg32(t0, t1 - t2);
+            t1 = tci_read_ri32(regs, &tb_ptr);
+            t2 = tci_read_ri32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, t1 - t2);
             break;
         case INDEX_op_mul_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(&tb_ptr);
-            t2 = tci_read_ri32(&tb_ptr);
-            tci_write_reg32(t0, t1 * t2);
+            t1 = tci_read_ri32(regs, &tb_ptr);
+            t2 = tci_read_ri32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, t1 * t2);
             break;
 #if TCG_TARGET_HAS_div_i32
         case INDEX_op_div_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(&tb_ptr);
-            t2 = tci_read_ri32(&tb_ptr);
-            tci_write_reg32(t0, (int32_t)t1 / (int32_t)t2);
+            t1 = tci_read_ri32(regs, &tb_ptr);
+            t2 = tci_read_ri32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, (int32_t)t1 / (int32_t)t2);
             break;
         case INDEX_op_divu_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(&tb_ptr);
-            t2 = tci_read_ri32(&tb_ptr);
-            tci_write_reg32(t0, t1 / t2);
+            t1 = tci_read_ri32(regs, &tb_ptr);
+            t2 = tci_read_ri32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, t1 / t2);
             break;
         case INDEX_op_rem_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(&tb_ptr);
-            t2 = tci_read_ri32(&tb_ptr);
-            tci_write_reg32(t0, (int32_t)t1 % (int32_t)t2);
+            t1 = tci_read_ri32(regs, &tb_ptr);
+            t2 = tci_read_ri32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, (int32_t)t1 % (int32_t)t2);
             break;
         case INDEX_op_remu_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(&tb_ptr);
-            t2 = tci_read_ri32(&tb_ptr);
-            tci_write_reg32(t0, t1 % t2);
+            t1 = tci_read_ri32(regs, &tb_ptr);
+            t2 = tci_read_ri32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, t1 % t2);
             break;
 #elif TCG_TARGET_HAS_div2_i32
         case INDEX_op_div2_i32:
@@ -660,71 +666,71 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
 #endif
         case INDEX_op_and_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(&tb_ptr);
-            t2 = tci_read_ri32(&tb_ptr);
-            tci_write_reg32(t0, t1 & t2);
+            t1 = tci_read_ri32(regs, &tb_ptr);
+            t2 = tci_read_ri32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, t1 & t2);
             break;
         case INDEX_op_or_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(&tb_ptr);
-            t2 = tci_read_ri32(&tb_ptr);
-            tci_write_reg32(t0, t1 | t2);
+            t1 = tci_read_ri32(regs, &tb_ptr);
+            t2 = tci_read_ri32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, t1 | t2);
             break;
         case INDEX_op_xor_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(&tb_ptr);
-            t2 = tci_read_ri32(&tb_ptr);
-            tci_write_reg32(t0, t1 ^ t2);
+            t1 = tci_read_ri32(regs, &tb_ptr);
+            t2 = tci_read_ri32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, t1 ^ t2);
             break;
 
             /* Shift/rotate operations (32 bit). */
 
         case INDEX_op_shl_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(&tb_ptr);
-            t2 = tci_read_ri32(&tb_ptr);
-            tci_write_reg32(t0, t1 << (t2 & 31));
+            t1 = tci_read_ri32(regs, &tb_ptr);
+            t2 = tci_read_ri32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, t1 << (t2 & 31));
             break;
         case INDEX_op_shr_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(&tb_ptr);
-            t2 = tci_read_ri32(&tb_ptr);
-            tci_write_reg32(t0, t1 >> (t2 & 31));
+            t1 = tci_read_ri32(regs, &tb_ptr);
+            t2 = tci_read_ri32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, t1 >> (t2 & 31));
             break;
         case INDEX_op_sar_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(&tb_ptr);
-            t2 = tci_read_ri32(&tb_ptr);
-            tci_write_reg32(t0, ((int32_t)t1 >> (t2 & 31)));
+            t1 = tci_read_ri32(regs, &tb_ptr);
+            t2 = tci_read_ri32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, ((int32_t)t1 >> (t2 & 31)));
             break;
 #if TCG_TARGET_HAS_rot_i32
         case INDEX_op_rotl_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(&tb_ptr);
-            t2 = tci_read_ri32(&tb_ptr);
-            tci_write_reg32(t0, rol32(t1, t2 & 31));
+            t1 = tci_read_ri32(regs, &tb_ptr);
+            t2 = tci_read_ri32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, rol32(t1, t2 & 31));
             break;
         case INDEX_op_rotr_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(&tb_ptr);
-            t2 = tci_read_ri32(&tb_ptr);
-            tci_write_reg32(t0, ror32(t1, t2 & 31));
+            t1 = tci_read_ri32(regs, &tb_ptr);
+            t2 = tci_read_ri32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, ror32(t1, t2 & 31));
             break;
 #endif
 #if TCG_TARGET_HAS_deposit_i32
         case INDEX_op_deposit_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(&tb_ptr);
-            t2 = tci_read_r32(&tb_ptr);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            t2 = tci_read_r32(regs, &tb_ptr);
             tmp16 = *tb_ptr++;
             tmp8 = *tb_ptr++;
             tmp32 = (((1 << tmp8) - 1) << tmp16);
-            tci_write_reg32(t0, (t1 & ~tmp32) | ((t2 << tmp16) & tmp32));
+            tci_write_reg32(regs, t0, (t1 & ~tmp32) | ((t2 << tmp16) & tmp32));
             break;
 #endif
         case INDEX_op_brcond_i32:
-            t0 = tci_read_r32(&tb_ptr);
-            t1 = tci_read_ri32(&tb_ptr);
+            t0 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_ri32(regs, &tb_ptr);
             condition = *tb_ptr++;
             label = tci_read_label(&tb_ptr);
             if (tci_compare32(t0, t1, condition)) {
@@ -737,20 +743,20 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
         case INDEX_op_add2_i32:
             t0 = *tb_ptr++;
             t1 = *tb_ptr++;
-            tmp64 = tci_read_r64(&tb_ptr);
-            tmp64 += tci_read_r64(&tb_ptr);
-            tci_write_reg64(t1, t0, tmp64);
+            tmp64 = tci_read_r64(regs, &tb_ptr);
+            tmp64 += tci_read_r64(regs, &tb_ptr);
+            tci_write_reg64(regs, t1, t0, tmp64);
             break;
         case INDEX_op_sub2_i32:
             t0 = *tb_ptr++;
             t1 = *tb_ptr++;
-            tmp64 = tci_read_r64(&tb_ptr);
-            tmp64 -= tci_read_r64(&tb_ptr);
-            tci_write_reg64(t1, t0, tmp64);
+            tmp64 = tci_read_r64(regs, &tb_ptr);
+            tmp64 -= tci_read_r64(regs, &tb_ptr);
+            tci_write_reg64(regs, t1, t0, tmp64);
             break;
         case INDEX_op_brcond2_i32:
-            tmp64 = tci_read_r64(&tb_ptr);
-            v64 = tci_read_ri64(&tb_ptr);
+            tmp64 = tci_read_r64(regs, &tb_ptr);
+            v64 = tci_read_ri64(regs, &tb_ptr);
             condition = *tb_ptr++;
             label = tci_read_label(&tb_ptr);
             if (tci_compare64(tmp64, v64, condition)) {
@@ -762,86 +768,86 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
         case INDEX_op_mulu2_i32:
             t0 = *tb_ptr++;
             t1 = *tb_ptr++;
-            t2 = tci_read_r32(&tb_ptr);
-            tmp64 = tci_read_r32(&tb_ptr);
-            tci_write_reg64(t1, t0, t2 * tmp64);
+            t2 = tci_read_r32(regs, &tb_ptr);
+            tmp64 = tci_read_r32(regs, &tb_ptr);
+            tci_write_reg64(regs, t1, t0, t2 * tmp64);
             break;
 #endif /* TCG_TARGET_REG_BITS == 32 */
 #if TCG_TARGET_HAS_ext8s_i32
         case INDEX_op_ext8s_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r8s(&tb_ptr);
-            tci_write_reg32(t0, t1);
+            t1 = tci_read_r8s(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext16s_i32
         case INDEX_op_ext16s_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r16s(&tb_ptr);
-            tci_write_reg32(t0, t1);
+            t1 = tci_read_r16s(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext8u_i32
         case INDEX_op_ext8u_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r8(&tb_ptr);
-            tci_write_reg32(t0, t1);
+            t1 = tci_read_r8(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext16u_i32
         case INDEX_op_ext16u_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r16(&tb_ptr);
-            tci_write_reg32(t0, t1);
+            t1 = tci_read_r16(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, t1);
             break;
 #endif
 #if TCG_TARGET_HAS_bswap16_i32
         case INDEX_op_bswap16_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r16(&tb_ptr);
-            tci_write_reg32(t0, bswap16(t1));
+            t1 = tci_read_r16(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, bswap16(t1));
             break;
 #endif
 #if TCG_TARGET_HAS_bswap32_i32
         case INDEX_op_bswap32_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(&tb_ptr);
-            tci_write_reg32(t0, bswap32(t1));
+            t1 = tci_read_r32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, bswap32(t1));
             break;
 #endif
 #if TCG_TARGET_HAS_not_i32
         case INDEX_op_not_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(&tb_ptr);
-            tci_write_reg32(t0, ~t1);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, ~t1);
             break;
 #endif
 #if TCG_TARGET_HAS_neg_i32
         case INDEX_op_neg_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(&tb_ptr);
-            tci_write_reg32(t0, -t1);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            tci_write_reg32(regs, t0, -t1);
             break;
 #endif
 #if TCG_TARGET_REG_BITS == 64
         case INDEX_op_mov_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(&tb_ptr);
-            tci_write_reg64(t0, t1);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, t1);
             break;
         case INDEX_op_movi_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_i64(&tb_ptr);
-            tci_write_reg64(t0, t1);
+            tci_write_reg64(regs, t0, t1);
             break;
 
             /* Load/store operations (64 bit). */
 
         case INDEX_op_ld8u_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(&tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg8(t0, *(uint8_t *)(t1 + t2));
+            tci_write_reg8(regs, t0, *(uint8_t *)(t1 + t2));
             break;
         case INDEX_op_ld8s_i64:
         case INDEX_op_ld16u_i64:
@@ -850,43 +856,43 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
             break;
         case INDEX_op_ld32u_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(&tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg32(t0, *(uint32_t *)(t1 + t2));
+            tci_write_reg32(regs, t0, *(uint32_t *)(t1 + t2));
             break;
         case INDEX_op_ld32s_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(&tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg32s(t0, *(int32_t *)(t1 + t2));
+            tci_write_reg32s(regs, t0, *(int32_t *)(t1 + t2));
             break;
         case INDEX_op_ld_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(&tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg64(t0, *(uint64_t *)(t1 + t2));
+            tci_write_reg64(regs, t0, *(uint64_t *)(t1 + t2));
             break;
         case INDEX_op_st8_i64:
-            t0 = tci_read_r8(&tb_ptr);
-            t1 = tci_read_r(&tb_ptr);
+            t0 = tci_read_r8(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             *(uint8_t *)(t1 + t2) = t0;
             break;
         case INDEX_op_st16_i64:
-            t0 = tci_read_r16(&tb_ptr);
-            t1 = tci_read_r(&tb_ptr);
+            t0 = tci_read_r16(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             *(uint16_t *)(t1 + t2) = t0;
             break;
         case INDEX_op_st32_i64:
-            t0 = tci_read_r32(&tb_ptr);
-            t1 = tci_read_r(&tb_ptr);
+            t0 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             *(uint32_t *)(t1 + t2) = t0;
             break;
         case INDEX_op_st_i64:
-            t0 = tci_read_r64(&tb_ptr);
-            t1 = tci_read_r(&tb_ptr);
+            t0 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             tci_assert(t1 != sp_value || (int32_t)t2 < 0);
             *(uint64_t *)(t1 + t2) = t0;
@@ -896,21 +902,21 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
 
         case INDEX_op_add_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(&tb_ptr);
-            t2 = tci_read_ri64(&tb_ptr);
-            tci_write_reg64(t0, t1 + t2);
+            t1 = tci_read_ri64(regs, &tb_ptr);
+            t2 = tci_read_ri64(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, t1 + t2);
             break;
         case INDEX_op_sub_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(&tb_ptr);
-            t2 = tci_read_ri64(&tb_ptr);
-            tci_write_reg64(t0, t1 - t2);
+            t1 = tci_read_ri64(regs, &tb_ptr);
+            t2 = tci_read_ri64(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, t1 - t2);
             break;
         case INDEX_op_mul_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(&tb_ptr);
-            t2 = tci_read_ri64(&tb_ptr);
-            tci_write_reg64(t0, t1 * t2);
+            t1 = tci_read_ri64(regs, &tb_ptr);
+            t2 = tci_read_ri64(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, t1 * t2);
             break;
 #if TCG_TARGET_HAS_div_i64
         case INDEX_op_div_i64:
@@ -927,71 +933,71 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
 #endif
         case INDEX_op_and_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(&tb_ptr);
-            t2 = tci_read_ri64(&tb_ptr);
-            tci_write_reg64(t0, t1 & t2);
+            t1 = tci_read_ri64(regs, &tb_ptr);
+            t2 = tci_read_ri64(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, t1 & t2);
             break;
         case INDEX_op_or_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(&tb_ptr);
-            t2 = tci_read_ri64(&tb_ptr);
-            tci_write_reg64(t0, t1 | t2);
+            t1 = tci_read_ri64(regs, &tb_ptr);
+            t2 = tci_read_ri64(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, t1 | t2);
             break;
         case INDEX_op_xor_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(&tb_ptr);
-            t2 = tci_read_ri64(&tb_ptr);
-            tci_write_reg64(t0, t1 ^ t2);
+            t1 = tci_read_ri64(regs, &tb_ptr);
+            t2 = tci_read_ri64(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, t1 ^ t2);
             break;
 
             /* Shift/rotate operations (64 bit). */
 
         case INDEX_op_shl_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(&tb_ptr);
-            t2 = tci_read_ri64(&tb_ptr);
-            tci_write_reg64(t0, t1 << (t2 & 63));
+            t1 = tci_read_ri64(regs, &tb_ptr);
+            t2 = tci_read_ri64(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, t1 << (t2 & 63));
             break;
         case INDEX_op_shr_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(&tb_ptr);
-            t2 = tci_read_ri64(&tb_ptr);
-            tci_write_reg64(t0, t1 >> (t2 & 63));
+            t1 = tci_read_ri64(regs, &tb_ptr);
+            t2 = tci_read_ri64(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, t1 >> (t2 & 63));
             break;
         case INDEX_op_sar_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(&tb_ptr);
-            t2 = tci_read_ri64(&tb_ptr);
-            tci_write_reg64(t0, ((int64_t)t1 >> (t2 & 63)));
+            t1 = tci_read_ri64(regs, &tb_ptr);
+            t2 = tci_read_ri64(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, ((int64_t)t1 >> (t2 & 63)));
             break;
 #if TCG_TARGET_HAS_rot_i64
         case INDEX_op_rotl_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(&tb_ptr);
-            t2 = tci_read_ri64(&tb_ptr);
-            tci_write_reg64(t0, rol64(t1, t2 & 63));
+            t1 = tci_read_ri64(regs, &tb_ptr);
+            t2 = tci_read_ri64(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, rol64(t1, t2 & 63));
             break;
         case INDEX_op_rotr_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(&tb_ptr);
-            t2 = tci_read_ri64(&tb_ptr);
-            tci_write_reg64(t0, ror64(t1, t2 & 63));
+            t1 = tci_read_ri64(regs, &tb_ptr);
+            t2 = tci_read_ri64(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, ror64(t1, t2 & 63));
             break;
 #endif
 #if TCG_TARGET_HAS_deposit_i64
         case INDEX_op_deposit_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(&tb_ptr);
-            t2 = tci_read_r64(&tb_ptr);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            t2 = tci_read_r64(regs, &tb_ptr);
             tmp16 = *tb_ptr++;
             tmp8 = *tb_ptr++;
             tmp64 = (((1ULL << tmp8) - 1) << tmp16);
-            tci_write_reg64(t0, (t1 & ~tmp64) | ((t2 << tmp16) & tmp64));
+            tci_write_reg64(regs, t0, (t1 & ~tmp64) | ((t2 << tmp16) & tmp64));
             break;
 #endif
         case INDEX_op_brcond_i64:
-            t0 = tci_read_r64(&tb_ptr);
-            t1 = tci_read_ri64(&tb_ptr);
+            t0 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_ri64(regs, &tb_ptr);
             condition = *tb_ptr++;
             label = tci_read_label(&tb_ptr);
             if (tci_compare64(t0, t1, condition)) {
@@ -1003,29 +1009,29 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
 #if TCG_TARGET_HAS_ext8u_i64
         case INDEX_op_ext8u_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r8(&tb_ptr);
-            tci_write_reg64(t0, t1);
+            t1 = tci_read_r8(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext8s_i64
         case INDEX_op_ext8s_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r8s(&tb_ptr);
-            tci_write_reg64(t0, t1);
+            t1 = tci_read_r8s(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext16s_i64
         case INDEX_op_ext16s_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r16s(&tb_ptr);
-            tci_write_reg64(t0, t1);
+            t1 = tci_read_r16s(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext16u_i64
         case INDEX_op_ext16u_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r16(&tb_ptr);
-            tci_write_reg64(t0, t1);
+            t1 = tci_read_r16(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext32s_i64
@@ -1033,51 +1039,51 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
 #endif
         case INDEX_op_ext_i32_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32s(&tb_ptr);
-            tci_write_reg64(t0, t1);
+            t1 = tci_read_r32s(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, t1);
             break;
 #if TCG_TARGET_HAS_ext32u_i64
         case INDEX_op_ext32u_i64:
 #endif
         case INDEX_op_extu_i32_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(&tb_ptr);
-            tci_write_reg64(t0, t1);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, t1);
             break;
 #if TCG_TARGET_HAS_bswap16_i64
         case INDEX_op_bswap16_i64:
             TODO();
             t0 = *tb_ptr++;
-            t1 = tci_read_r16(&tb_ptr);
-            tci_write_reg64(t0, bswap16(t1));
+            t1 = tci_read_r16(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, bswap16(t1));
             break;
 #endif
 #if TCG_TARGET_HAS_bswap32_i64
         case INDEX_op_bswap32_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(&tb_ptr);
-            tci_write_reg64(t0, bswap32(t1));
+            t1 = tci_read_r32(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, bswap32(t1));
             break;
 #endif
 #if TCG_TARGET_HAS_bswap64_i64
         case INDEX_op_bswap64_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(&tb_ptr);
-            tci_write_reg64(t0, bswap64(t1));
+            t1 = tci_read_r64(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, bswap64(t1));
             break;
 #endif
 #if TCG_TARGET_HAS_not_i64
         case INDEX_op_not_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(&tb_ptr);
-            tci_write_reg64(t0, ~t1);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, ~t1);
             break;
 #endif
 #if TCG_TARGET_HAS_neg_i64
         case INDEX_op_neg_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(&tb_ptr);
-            tci_write_reg64(t0, -t1);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            tci_write_reg64(regs, t0, -t1);
             break;
 #endif
 #endif /* TCG_TARGET_REG_BITS == 64 */
@@ -1098,7 +1104,7 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
             continue;
         case INDEX_op_qemu_ld_i32:
             t0 = *tb_ptr++;
-            taddr = tci_read_ulong(&tb_ptr);
+            taddr = tci_read_ulong(regs, &tb_ptr);
             oi = tci_read_i(&tb_ptr);
             switch (get_memop(oi) & (MO_BSWAP | MO_SSIZE)) {
             case MO_UB:
@@ -1128,14 +1134,14 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
             default:
                 tcg_abort();
             }
-            tci_write_reg(t0, tmp32);
+            tci_write_reg(regs, t0, tmp32);
             break;
         case INDEX_op_qemu_ld_i64:
             t0 = *tb_ptr++;
             if (TCG_TARGET_REG_BITS == 32) {
                 t1 = *tb_ptr++;
             }
-            taddr = tci_read_ulong(&tb_ptr);
+            taddr = tci_read_ulong(regs, &tb_ptr);
             oi = tci_read_i(&tb_ptr);
             switch (get_memop(oi) & (MO_BSWAP | MO_SSIZE)) {
             case MO_UB:
@@ -1177,14 +1183,14 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
             default:
                 tcg_abort();
             }
-            tci_write_reg(t0, tmp64);
+            tci_write_reg(regs, t0, tmp64);
             if (TCG_TARGET_REG_BITS == 32) {
-                tci_write_reg(t1, tmp64 >> 32);
+                tci_write_reg(regs, t1, tmp64 >> 32);
             }
             break;
         case INDEX_op_qemu_st_i32:
-            t0 = tci_read_r(&tb_ptr);
-            taddr = tci_read_ulong(&tb_ptr);
+            t0 = tci_read_r(regs, &tb_ptr);
+            taddr = tci_read_ulong(regs, &tb_ptr);
             oi = tci_read_i(&tb_ptr);
             switch (get_memop(oi) & (MO_BSWAP | MO_SIZE)) {
             case MO_UB:
@@ -1207,8 +1213,8 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
             }
             break;
         case INDEX_op_qemu_st_i64:
-            tmp64 = tci_read_r64(&tb_ptr);
-            taddr = tci_read_ulong(&tb_ptr);
+            tmp64 = tci_read_r64(regs, &tb_ptr);
+            taddr = tci_read_ulong(regs, &tb_ptr);
             oi = tci_read_i(&tb_ptr);
             switch (get_memop(oi) & (MO_BSWAP | MO_SIZE)) {
             case MO_UB:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 31/43] tcg: take tb_ctx out of TCGContext
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (29 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 30/43] tci: move tci_regs to tcg_qemu_tb_exec's stack Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 32/43] tcg: take .helpers " Emilio G. Cota
                   ` (12 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Groundwork for supporting multiple TCG contexts.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/tb-context.h |  2 ++
 tcg/tcg.h                 |  2 --
 accel/tcg/cpu-exec.c      |  2 +-
 accel/tcg/translate-all.c | 57 +++++++++++++++++++++++------------------------
 linux-user/main.c         |  6 ++---
 5 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index 1fa8dcc..1d41202 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -41,4 +41,6 @@ struct TBContext {
     int tb_phys_invalidate_count;
 };
 
+extern TBContext tb_ctx;
+
 #endif
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 9b6dade..22f7ecd 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -707,8 +707,6 @@ struct TCGContext {
     /* Threshold to flush the translated code buffer.  */
     void *code_gen_highwater;
 
-    TBContext tb_ctx;
-
     /* Track which vCPU triggers events */
     CPUState *cpu;                      /* *_trans */
     TCGv_env tcg_env;                   /* *_exec  */
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 1963bda..f42096a 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -325,7 +325,7 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
     phys_pc = get_page_addr_code(desc.env, pc);
     desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
     h = tb_hash_func(phys_pc, pc, flags, cf_mask, *cpu->trace_dstate);
-    return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
+    return qht_lookup(&tb_ctx.htable, tb_cmp, &desc, h);
 }
 
 static inline TranslationBlock *tb_find(CPUState *cpu,
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index d50e2b9..5509407 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -154,6 +154,7 @@ static void *l1_map[V_L1_MAX_SIZE];
 
 /* code generation context */
 TCGContext tcg_ctx;
+TBContext tb_ctx;
 bool parallel_cpus;
 
 /* translation block context */
@@ -185,7 +186,7 @@ static void page_table_config_init(void)
 void tb_lock(void)
 {
     assert_tb_unlocked();
-    qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
+    qemu_mutex_lock(&tb_ctx.tb_lock);
     have_tb_lock++;
 }
 
@@ -193,13 +194,13 @@ void tb_unlock(void)
 {
     assert_tb_locked();
     have_tb_lock--;
-    qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
+    qemu_mutex_unlock(&tb_ctx.tb_lock);
 }
 
 void tb_lock_reset(void)
 {
     if (have_tb_lock) {
-        qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
+        qemu_mutex_unlock(&tb_ctx.tb_lock);
         have_tb_lock = 0;
     }
 }
@@ -826,15 +827,15 @@ static inline void code_gen_alloc(size_t tb_size)
         fprintf(stderr, "Could not allocate dynamic translator buffer\n");
         exit(1);
     }
-    tcg_ctx.tb_ctx.tb_tree = g_tree_new(tb_tc_cmp);
-    qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
+    tb_ctx.tb_tree = g_tree_new(tb_tc_cmp);
+    qemu_mutex_init(&tb_ctx.tb_lock);
 }
 
 static void tb_htable_init(void)
 {
     unsigned int mode = QHT_MODE_AUTO_RESIZE;
 
-    qht_init(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE, mode);
+    qht_init(&tb_ctx.htable, CODE_GEN_HTABLE_SIZE, mode);
 }
 
 /* Must be called before using the QEMU cpus. 'tb_size' is the size
@@ -878,7 +879,7 @@ void tb_remove(TranslationBlock *tb)
 {
     assert_tb_locked();
 
-    g_tree_remove(tcg_ctx.tb_ctx.tb_tree, &tb->tc);
+    g_tree_remove(tb_ctx.tb_tree, &tb->tc);
 }
 
 static inline void invalidate_page_bitmap(PageDesc *p)
@@ -940,15 +941,15 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
     /* If it is already been done on request of another CPU,
      * just retry.
      */
-    if (tcg_ctx.tb_ctx.tb_flush_count != tb_flush_count.host_int) {
+    if (tb_ctx.tb_flush_count != tb_flush_count.host_int) {
         goto done;
     }
 
     if (DEBUG_TB_FLUSH_GATE) {
-        size_t nb_tbs = g_tree_nnodes(tcg_ctx.tb_ctx.tb_tree);
+        size_t nb_tbs = g_tree_nnodes(tb_ctx.tb_tree);
         size_t host_size = 0;
 
-        g_tree_foreach(tcg_ctx.tb_ctx.tb_tree, tb_host_size_iter, &host_size);
+        g_tree_foreach(tb_ctx.tb_tree, tb_host_size_iter, &host_size);
         printf("qemu: flush code_size=%td nb_tbs=%zu avg_tb_size=%zu\n",
                tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer, nb_tbs,
                nb_tbs > 0 ? host_size / nb_tbs : 0);
@@ -963,17 +964,16 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
     }
 
     /* Increment the refcount first so that destroy acts as a reset */
-    g_tree_ref(tcg_ctx.tb_ctx.tb_tree);
-    g_tree_destroy(tcg_ctx.tb_ctx.tb_tree);
+    g_tree_ref(tb_ctx.tb_tree);
+    g_tree_destroy(tb_ctx.tb_tree);
 
-    qht_reset_size(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
+    qht_reset_size(&tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
     page_flush_tb();
 
     tcg_ctx.code_gen_ptr = tcg_ctx.code_gen_buffer;
     /* XXX: flush processor icache at this point if cache flush is
        expensive */
-    atomic_mb_set(&tcg_ctx.tb_ctx.tb_flush_count,
-                  tcg_ctx.tb_ctx.tb_flush_count + 1);
+    atomic_mb_set(&tb_ctx.tb_flush_count, tb_ctx.tb_flush_count + 1);
 
 done:
     tb_unlock();
@@ -982,7 +982,7 @@ done:
 void tb_flush(CPUState *cpu)
 {
     if (tcg_enabled()) {
-        unsigned tb_flush_count = atomic_mb_read(&tcg_ctx.tb_ctx.tb_flush_count);
+        unsigned tb_flush_count = atomic_mb_read(&tb_ctx.tb_flush_count);
         async_safe_run_on_cpu(cpu, do_tb_flush,
                               RUN_ON_CPU_HOST_INT(tb_flush_count));
     }
@@ -1015,7 +1015,7 @@ do_tb_invalidate_check(struct qht *ht, void *p, uint32_t hash, void *userp)
 static void tb_invalidate_check(target_ulong address)
 {
     address &= TARGET_PAGE_MASK;
-    qht_iter(&tcg_ctx.tb_ctx.htable, do_tb_invalidate_check, &address);
+    qht_iter(&tb_ctx.htable, do_tb_invalidate_check, &address);
 }
 
 static void
@@ -1035,7 +1035,7 @@ do_tb_page_check(struct qht *ht, void *p, uint32_t hash, void *userp)
 /* verify that all the pages have correct rights for code */
 static void tb_page_check(void)
 {
-    qht_iter(&tcg_ctx.tb_ctx.htable, do_tb_page_check, NULL);
+    qht_iter(&tb_ctx.htable, do_tb_page_check, NULL);
 }
 
 #endif /* CONFIG_USER_ONLY */
@@ -1135,7 +1135,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
     h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
                      tb->trace_vcpu_dstate);
-    qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);
+    qht_remove(&tb_ctx.htable, tb, h);
 
     /* remove the TB from the page list */
     if (tb->page_addr[0] != page_addr) {
@@ -1164,7 +1164,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     /* suppress any remaining jumps to this TB */
     tb_jmp_unlink(tb);
 
-    tcg_ctx.tb_ctx.tb_phys_invalidate_count++;
+    tb_ctx.tb_phys_invalidate_count++;
 }
 
 #ifdef CONFIG_SOFTMMU
@@ -1280,7 +1280,7 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
     /* add in the hash table */
     h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
                      tb->trace_vcpu_dstate);
-    qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
+    qht_insert(&tb_ctx.htable, tb, h);
 
 #ifdef CONFIG_USER_ONLY
     if (DEBUG_TB_CHECK_GATE) {
@@ -1426,7 +1426,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
      * through the physical hash table and physical page list.
      */
     tb_link_page(tb, phys_pc, phys_page2);
-    g_tree_insert(tcg_ctx.tb_ctx.tb_tree, &tb->tc, tb);
+    g_tree_insert(tb_ctx.tb_tree, &tb->tc, tb);
     return tb;
 }
 
@@ -1706,7 +1706,7 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr)
 {
     struct tb_tc s = { .ptr = (void *)tc_ptr };
 
-    return g_tree_lookup(tcg_ctx.tb_ctx.tb_tree, &s);
+    return g_tree_lookup(tb_ctx.tb_tree, &s);
 }
 
 #if !defined(CONFIG_USER_ONLY)
@@ -1931,8 +1931,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
 
     tb_lock();
 
-    nb_tbs = g_tree_nnodes(tcg_ctx.tb_ctx.tb_tree);
-    g_tree_foreach(tcg_ctx.tb_ctx.tb_tree, tb_tree_stats_iter, &tst);
+    nb_tbs = g_tree_nnodes(tb_ctx.tb_tree);
+    g_tree_foreach(tb_ctx.tb_tree, tb_tree_stats_iter, &tst);
     /* XXX: avoid using doubles ? */
     cpu_fprintf(f, "Translation buffer state:\n");
     /*
@@ -1958,15 +1958,14 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
                 tst.direct_jmp2_count,
                 nb_tbs ? (tst.direct_jmp2_count * 100) / nb_tbs : 0);
 
-    qht_statistics_init(&tcg_ctx.tb_ctx.htable, &hst);
+    qht_statistics_init(&tb_ctx.htable, &hst);
     print_qht_statistics(f, cpu_fprintf, hst);
     qht_statistics_destroy(&hst);
 
     cpu_fprintf(f, "\nStatistics:\n");
     cpu_fprintf(f, "TB flush count      %u\n",
-            atomic_read(&tcg_ctx.tb_ctx.tb_flush_count));
-    cpu_fprintf(f, "TB invalidate count %d\n",
-            tcg_ctx.tb_ctx.tb_phys_invalidate_count);
+                atomic_read(&tb_ctx.tb_flush_count));
+    cpu_fprintf(f, "TB invalidate count %d\n", tb_ctx.tb_phys_invalidate_count);
     cpu_fprintf(f, "TLB flush count     %zu\n", tlb_flush_count());
     tcg_dump_info(f, cpu_fprintf);
 
diff --git a/linux-user/main.c b/linux-user/main.c
index 2b38d39..dbbe3d7 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -114,7 +114,7 @@ int cpu_get_pic_interrupt(CPUX86State *env)
 void fork_start(void)
 {
     cpu_list_lock();
-    qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
+    qemu_mutex_lock(&tb_ctx.tb_lock);
     mmap_fork_start();
 }
 
@@ -130,11 +130,11 @@ void fork_end(int child)
                 QTAILQ_REMOVE(&cpus, cpu, node);
             }
         }
-        qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
+        qemu_mutex_init(&tb_ctx.tb_lock);
         qemu_init_cpu_list();
         gdbserver_fork(thread_cpu);
     } else {
-        qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
+        qemu_mutex_unlock(&tb_ctx.tb_lock);
         cpu_list_unlock();
     }
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 32/43] tcg: take .helpers out of TCGContext
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (30 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 31/43] tcg: take tb_ctx out of TCGContext Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 33/43] tcg: define tcg_init_ctx and make tcg_ctx a pointer Emilio G. Cota
                   ` (11 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Groundwork for supporting multiple TCG contexts.

The hash table becomes read-only after it is filled in,
so we can save space by keeping just a global pointer to it.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/tcg.h |  2 --
 tcg/tcg.c | 10 +++++-----
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index 22f7ecd..53c679f 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -664,8 +664,6 @@ struct TCGContext {
 
     tcg_insn_unit *code_ptr;
 
-    GHashTable *helpers;
-
 #ifdef CONFIG_PROFILER
     /* profiling info */
     int64_t tb_count1;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 28c1b94..c0c2d6c 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -319,6 +319,7 @@ typedef struct TCGHelperInfo {
 static const TCGHelperInfo all_helpers[] = {
 #include "exec/helper-tcg.h"
 };
+static GHashTable *helper_table;
 
 static int indirect_reg_alloc_order[ARRAY_SIZE(tcg_target_reg_alloc_order)];
 static void process_op_defs(TCGContext *s);
@@ -329,7 +330,6 @@ void tcg_context_init(TCGContext *s)
     TCGOpDef *def;
     TCGArgConstraint *args_ct;
     int *sorted_args;
-    GHashTable *helper_table;
 
     memset(s, 0, sizeof(*s));
     s->nb_globals = 0;
@@ -357,7 +357,7 @@ void tcg_context_init(TCGContext *s)
 
     /* Register helpers.  */
     /* Use g_direct_hash/equal for direct pointer comparisons on func.  */
-    s->helpers = helper_table = g_hash_table_new(NULL, NULL);
+    helper_table = g_hash_table_new(NULL, NULL);
 
     for (i = 0; i < ARRAY_SIZE(all_helpers); ++i) {
         g_hash_table_insert(helper_table, (gpointer)all_helpers[i].func,
@@ -761,7 +761,7 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
     unsigned sizemask, flags;
     TCGHelperInfo *info;
 
-    info = g_hash_table_lookup(s->helpers, (gpointer)func);
+    info = g_hash_table_lookup(helper_table, (gpointer)func);
     flags = info->flags;
     sizemask = info->sizemask;
 
@@ -990,8 +990,8 @@ static char *tcg_get_arg_str_idx(TCGContext *s, char *buf,
 static inline const char *tcg_find_helper(TCGContext *s, uintptr_t val)
 {
     const char *ret = NULL;
-    if (s->helpers) {
-        TCGHelperInfo *info = g_hash_table_lookup(s->helpers, (gpointer)val);
+    if (helper_table) {
+        TCGHelperInfo *info = g_hash_table_lookup(helper_table, (gpointer)val);
         if (info) {
             ret = info->name;
         }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 33/43] tcg: define tcg_init_ctx and make tcg_ctx a pointer
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (31 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 32/43] tcg: take .helpers " Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 34/43] gen-icount: fold exitreq_label into TCGContext Emilio G. Cota
                   ` (10 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Groundwork for supporting multiple TCG contexts.

The core of this patch is this change to tcg/tcg.h:

> -extern TCGContext tcg_ctx;
> +extern TCGContext tcg_init_ctx;
> +extern TCGContext *tcg_ctx;

Note that for now we set *tcg_ctx to whatever TCGContext is passed
to tcg_context_init -- in this case &tcg_init_ctx.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/gen-icount.h     | 10 ++---
 include/exec/helper-gen.h     | 12 +++---
 tcg/tcg-op.h                  | 80 +++++++++++++++++------------------
 tcg/tcg.h                     | 15 +++----
 accel/tcg/translate-all.c     | 97 ++++++++++++++++++++++---------------------
 bsd-user/main.c               |  2 +-
 linux-user/main.c             |  2 +-
 target/alpha/translate.c      |  2 +-
 target/arm/translate.c        |  2 +-
 target/cris/translate.c       |  2 +-
 target/cris/translate_v10.c   |  2 +-
 target/hppa/translate.c       |  2 +-
 target/i386/translate.c       |  2 +-
 target/lm32/translate.c       |  2 +-
 target/m68k/translate.c       |  2 +-
 target/microblaze/translate.c |  2 +-
 target/mips/translate.c       |  2 +-
 target/moxie/translate.c      |  2 +-
 target/openrisc/translate.c   |  2 +-
 target/ppc/translate.c        |  2 +-
 target/s390x/translate.c      |  2 +-
 target/sh4/translate.c        |  2 +-
 target/sparc/translate.c      |  2 +-
 target/tilegx/translate.c     |  2 +-
 target/tricore/translate.c    |  2 +-
 target/unicore32/translate.c  |  2 +-
 target/xtensa/translate.c     |  2 +-
 tcg/tcg-op.c                  | 58 +++++++++++++-------------
 tcg/tcg-runtime.c             |  2 +-
 tcg/tcg.c                     | 21 +++++-----
 30 files changed, 171 insertions(+), 168 deletions(-)

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 48b566c..c58b0b2 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -19,7 +19,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
         count = tcg_temp_new_i32();
     }
 
-    tcg_gen_ld_i32(count, tcg_ctx.tcg_env,
+    tcg_gen_ld_i32(count, tcg_ctx->tcg_env,
                    -ENV_OFFSET + offsetof(CPUState, icount_decr.u32));
 
     if (tb_cflags(tb) & CF_USE_ICOUNT) {
@@ -37,7 +37,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
     tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, exitreq_label);
 
     if (tb_cflags(tb) & CF_USE_ICOUNT) {
-        tcg_gen_st16_i32(count, tcg_ctx.tcg_env,
+        tcg_gen_st16_i32(count, tcg_ctx->tcg_env,
                          -ENV_OFFSET + offsetof(CPUState, icount_decr.u16.low));
     }
 
@@ -56,13 +56,13 @@ static inline void gen_tb_end(TranslationBlock *tb, int num_insns)
     tcg_gen_exit_tb((uintptr_t)tb + TB_EXIT_REQUESTED);
 
     /* Terminate the linked list.  */
-    tcg_ctx.gen_op_buf[tcg_ctx.gen_op_buf[0].prev].next = 0;
+    tcg_ctx->gen_op_buf[tcg_ctx->gen_op_buf[0].prev].next = 0;
 }
 
 static inline void gen_io_start(void)
 {
     TCGv_i32 tmp = tcg_const_i32(1);
-    tcg_gen_st_i32(tmp, tcg_ctx.tcg_env,
+    tcg_gen_st_i32(tmp, tcg_ctx->tcg_env,
                    -ENV_OFFSET + offsetof(CPUState, can_do_io));
     tcg_temp_free_i32(tmp);
 }
@@ -70,7 +70,7 @@ static inline void gen_io_start(void)
 static inline void gen_io_end(void)
 {
     TCGv_i32 tmp = tcg_const_i32(0);
-    tcg_gen_st_i32(tmp, tcg_ctx.tcg_env,
+    tcg_gen_st_i32(tmp, tcg_ctx->tcg_env,
                    -ENV_OFFSET + offsetof(CPUState, can_do_io));
     tcg_temp_free_i32(tmp);
 }
diff --git a/include/exec/helper-gen.h b/include/exec/helper-gen.h
index 8239ffc..3bcb901 100644
--- a/include/exec/helper-gen.h
+++ b/include/exec/helper-gen.h
@@ -9,7 +9,7 @@
 #define DEF_HELPER_FLAGS_0(name, flags, ret)                            \
 static inline void glue(gen_helper_, name)(dh_retvar_decl0(ret))        \
 {                                                                       \
-  tcg_gen_callN(&tcg_ctx, HELPER(name), dh_retvar(ret), 0, NULL);       \
+  tcg_gen_callN(tcg_ctx, HELPER(name), dh_retvar(ret), 0, NULL);        \
 }
 
 #define DEF_HELPER_FLAGS_1(name, flags, ret, t1)                        \
@@ -17,7 +17,7 @@ static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
     dh_arg_decl(t1, 1))                                                 \
 {                                                                       \
   TCGArg args[1] = { dh_arg(t1, 1) };                                   \
-  tcg_gen_callN(&tcg_ctx, HELPER(name), dh_retvar(ret), 1, args);       \
+  tcg_gen_callN(tcg_ctx, HELPER(name), dh_retvar(ret), 1, args);        \
 }
 
 #define DEF_HELPER_FLAGS_2(name, flags, ret, t1, t2)                    \
@@ -25,7 +25,7 @@ static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
     dh_arg_decl(t1, 1), dh_arg_decl(t2, 2))                             \
 {                                                                       \
   TCGArg args[2] = { dh_arg(t1, 1), dh_arg(t2, 2) };                    \
-  tcg_gen_callN(&tcg_ctx, HELPER(name), dh_retvar(ret), 2, args);       \
+  tcg_gen_callN(tcg_ctx, HELPER(name), dh_retvar(ret), 2, args);        \
 }
 
 #define DEF_HELPER_FLAGS_3(name, flags, ret, t1, t2, t3)                \
@@ -33,7 +33,7 @@ static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
     dh_arg_decl(t1, 1), dh_arg_decl(t2, 2), dh_arg_decl(t3, 3))         \
 {                                                                       \
   TCGArg args[3] = { dh_arg(t1, 1), dh_arg(t2, 2), dh_arg(t3, 3) };     \
-  tcg_gen_callN(&tcg_ctx, HELPER(name), dh_retvar(ret), 3, args);       \
+  tcg_gen_callN(tcg_ctx, HELPER(name), dh_retvar(ret), 3, args);        \
 }
 
 #define DEF_HELPER_FLAGS_4(name, flags, ret, t1, t2, t3, t4)            \
@@ -43,7 +43,7 @@ static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
 {                                                                       \
   TCGArg args[4] = { dh_arg(t1, 1), dh_arg(t2, 2),                      \
                      dh_arg(t3, 3), dh_arg(t4, 4) };                    \
-  tcg_gen_callN(&tcg_ctx, HELPER(name), dh_retvar(ret), 4, args);       \
+  tcg_gen_callN(tcg_ctx, HELPER(name), dh_retvar(ret), 4, args);        \
 }
 
 #define DEF_HELPER_FLAGS_5(name, flags, ret, t1, t2, t3, t4, t5)        \
@@ -53,7 +53,7 @@ static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
 {                                                                       \
   TCGArg args[5] = { dh_arg(t1, 1), dh_arg(t2, 2), dh_arg(t3, 3),       \
                      dh_arg(t4, 4), dh_arg(t5, 5) };                    \
-  tcg_gen_callN(&tcg_ctx, HELPER(name), dh_retvar(ret), 5, args);       \
+  tcg_gen_callN(tcg_ctx, HELPER(name), dh_retvar(ret), 5, args);        \
 }
 
 #include "helper.h"
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 18d01b2..75c15cc 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -40,161 +40,161 @@ void tcg_gen_op6(TCGContext *, TCGOpcode, TCGArg, TCGArg, TCGArg,
 
 static inline void tcg_gen_op1_i32(TCGOpcode opc, TCGv_i32 a1)
 {
-    tcg_gen_op1(&tcg_ctx, opc, GET_TCGV_I32(a1));
+    tcg_gen_op1(tcg_ctx, opc, GET_TCGV_I32(a1));
 }
 
 static inline void tcg_gen_op1_i64(TCGOpcode opc, TCGv_i64 a1)
 {
-    tcg_gen_op1(&tcg_ctx, opc, GET_TCGV_I64(a1));
+    tcg_gen_op1(tcg_ctx, opc, GET_TCGV_I64(a1));
 }
 
 static inline void tcg_gen_op1i(TCGOpcode opc, TCGArg a1)
 {
-    tcg_gen_op1(&tcg_ctx, opc, a1);
+    tcg_gen_op1(tcg_ctx, opc, a1);
 }
 
 static inline void tcg_gen_op2_i32(TCGOpcode opc, TCGv_i32 a1, TCGv_i32 a2)
 {
-    tcg_gen_op2(&tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2));
+    tcg_gen_op2(tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2));
 }
 
 static inline void tcg_gen_op2_i64(TCGOpcode opc, TCGv_i64 a1, TCGv_i64 a2)
 {
-    tcg_gen_op2(&tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2));
+    tcg_gen_op2(tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2));
 }
 
 static inline void tcg_gen_op2i_i32(TCGOpcode opc, TCGv_i32 a1, TCGArg a2)
 {
-    tcg_gen_op2(&tcg_ctx, opc, GET_TCGV_I32(a1), a2);
+    tcg_gen_op2(tcg_ctx, opc, GET_TCGV_I32(a1), a2);
 }
 
 static inline void tcg_gen_op2i_i64(TCGOpcode opc, TCGv_i64 a1, TCGArg a2)
 {
-    tcg_gen_op2(&tcg_ctx, opc, GET_TCGV_I64(a1), a2);
+    tcg_gen_op2(tcg_ctx, opc, GET_TCGV_I64(a1), a2);
 }
 
 static inline void tcg_gen_op2ii(TCGOpcode opc, TCGArg a1, TCGArg a2)
 {
-    tcg_gen_op2(&tcg_ctx, opc, a1, a2);
+    tcg_gen_op2(tcg_ctx, opc, a1, a2);
 }
 
 static inline void tcg_gen_op3_i32(TCGOpcode opc, TCGv_i32 a1,
                                    TCGv_i32 a2, TCGv_i32 a3)
 {
-    tcg_gen_op3(&tcg_ctx, opc, GET_TCGV_I32(a1),
+    tcg_gen_op3(tcg_ctx, opc, GET_TCGV_I32(a1),
                 GET_TCGV_I32(a2), GET_TCGV_I32(a3));
 }
 
 static inline void tcg_gen_op3_i64(TCGOpcode opc, TCGv_i64 a1,
                                    TCGv_i64 a2, TCGv_i64 a3)
 {
-    tcg_gen_op3(&tcg_ctx, opc, GET_TCGV_I64(a1),
+    tcg_gen_op3(tcg_ctx, opc, GET_TCGV_I64(a1),
                 GET_TCGV_I64(a2), GET_TCGV_I64(a3));
 }
 
 static inline void tcg_gen_op3i_i32(TCGOpcode opc, TCGv_i32 a1,
                                     TCGv_i32 a2, TCGArg a3)
 {
-    tcg_gen_op3(&tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2), a3);
+    tcg_gen_op3(tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2), a3);
 }
 
 static inline void tcg_gen_op3i_i64(TCGOpcode opc, TCGv_i64 a1,
                                     TCGv_i64 a2, TCGArg a3)
 {
-    tcg_gen_op3(&tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2), a3);
+    tcg_gen_op3(tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2), a3);
 }
 
 static inline void tcg_gen_ldst_op_i32(TCGOpcode opc, TCGv_i32 val,
                                        TCGv_ptr base, TCGArg offset)
 {
-    tcg_gen_op3(&tcg_ctx, opc, GET_TCGV_I32(val), GET_TCGV_PTR(base), offset);
+    tcg_gen_op3(tcg_ctx, opc, GET_TCGV_I32(val), GET_TCGV_PTR(base), offset);
 }
 
 static inline void tcg_gen_ldst_op_i64(TCGOpcode opc, TCGv_i64 val,
                                        TCGv_ptr base, TCGArg offset)
 {
-    tcg_gen_op3(&tcg_ctx, opc, GET_TCGV_I64(val), GET_TCGV_PTR(base), offset);
+    tcg_gen_op3(tcg_ctx, opc, GET_TCGV_I64(val), GET_TCGV_PTR(base), offset);
 }
 
 static inline void tcg_gen_op4_i32(TCGOpcode opc, TCGv_i32 a1, TCGv_i32 a2,
                                    TCGv_i32 a3, TCGv_i32 a4)
 {
-    tcg_gen_op4(&tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2),
+    tcg_gen_op4(tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2),
                 GET_TCGV_I32(a3), GET_TCGV_I32(a4));
 }
 
 static inline void tcg_gen_op4_i64(TCGOpcode opc, TCGv_i64 a1, TCGv_i64 a2,
                                    TCGv_i64 a3, TCGv_i64 a4)
 {
-    tcg_gen_op4(&tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2),
+    tcg_gen_op4(tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2),
                 GET_TCGV_I64(a3), GET_TCGV_I64(a4));
 }
 
 static inline void tcg_gen_op4i_i32(TCGOpcode opc, TCGv_i32 a1, TCGv_i32 a2,
                                     TCGv_i32 a3, TCGArg a4)
 {
-    tcg_gen_op4(&tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2),
+    tcg_gen_op4(tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2),
                 GET_TCGV_I32(a3), a4);
 }
 
 static inline void tcg_gen_op4i_i64(TCGOpcode opc, TCGv_i64 a1, TCGv_i64 a2,
                                     TCGv_i64 a3, TCGArg a4)
 {
-    tcg_gen_op4(&tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2),
+    tcg_gen_op4(tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2),
                 GET_TCGV_I64(a3), a4);
 }
 
 static inline void tcg_gen_op4ii_i32(TCGOpcode opc, TCGv_i32 a1, TCGv_i32 a2,
                                      TCGArg a3, TCGArg a4)
 {
-    tcg_gen_op4(&tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2), a3, a4);
+    tcg_gen_op4(tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2), a3, a4);
 }
 
 static inline void tcg_gen_op4ii_i64(TCGOpcode opc, TCGv_i64 a1, TCGv_i64 a2,
                                      TCGArg a3, TCGArg a4)
 {
-    tcg_gen_op4(&tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2), a3, a4);
+    tcg_gen_op4(tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2), a3, a4);
 }
 
 static inline void tcg_gen_op5_i32(TCGOpcode opc, TCGv_i32 a1, TCGv_i32 a2,
                                    TCGv_i32 a3, TCGv_i32 a4, TCGv_i32 a5)
 {
-    tcg_gen_op5(&tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2),
+    tcg_gen_op5(tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2),
                 GET_TCGV_I32(a3), GET_TCGV_I32(a4), GET_TCGV_I32(a5));
 }
 
 static inline void tcg_gen_op5_i64(TCGOpcode opc, TCGv_i64 a1, TCGv_i64 a2,
                                    TCGv_i64 a3, TCGv_i64 a4, TCGv_i64 a5)
 {
-    tcg_gen_op5(&tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2),
+    tcg_gen_op5(tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2),
                 GET_TCGV_I64(a3), GET_TCGV_I64(a4), GET_TCGV_I64(a5));
 }
 
 static inline void tcg_gen_op5i_i32(TCGOpcode opc, TCGv_i32 a1, TCGv_i32 a2,
                                     TCGv_i32 a3, TCGv_i32 a4, TCGArg a5)
 {
-    tcg_gen_op5(&tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2),
+    tcg_gen_op5(tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2),
                 GET_TCGV_I32(a3), GET_TCGV_I32(a4), a5);
 }
 
 static inline void tcg_gen_op5i_i64(TCGOpcode opc, TCGv_i64 a1, TCGv_i64 a2,
                                     TCGv_i64 a3, TCGv_i64 a4, TCGArg a5)
 {
-    tcg_gen_op5(&tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2),
+    tcg_gen_op5(tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2),
                 GET_TCGV_I64(a3), GET_TCGV_I64(a4), a5);
 }
 
 static inline void tcg_gen_op5ii_i32(TCGOpcode opc, TCGv_i32 a1, TCGv_i32 a2,
                                      TCGv_i32 a3, TCGArg a4, TCGArg a5)
 {
-    tcg_gen_op5(&tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2),
+    tcg_gen_op5(tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2),
                 GET_TCGV_I32(a3), a4, a5);
 }
 
 static inline void tcg_gen_op5ii_i64(TCGOpcode opc, TCGv_i64 a1, TCGv_i64 a2,
                                      TCGv_i64 a3, TCGArg a4, TCGArg a5)
 {
-    tcg_gen_op5(&tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2),
+    tcg_gen_op5(tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2),
                 GET_TCGV_I64(a3), a4, a5);
 }
 
@@ -202,7 +202,7 @@ static inline void tcg_gen_op6_i32(TCGOpcode opc, TCGv_i32 a1, TCGv_i32 a2,
                                    TCGv_i32 a3, TCGv_i32 a4,
                                    TCGv_i32 a5, TCGv_i32 a6)
 {
-    tcg_gen_op6(&tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2),
+    tcg_gen_op6(tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2),
                 GET_TCGV_I32(a3), GET_TCGV_I32(a4), GET_TCGV_I32(a5),
                 GET_TCGV_I32(a6));
 }
@@ -211,7 +211,7 @@ static inline void tcg_gen_op6_i64(TCGOpcode opc, TCGv_i64 a1, TCGv_i64 a2,
                                    TCGv_i64 a3, TCGv_i64 a4,
                                    TCGv_i64 a5, TCGv_i64 a6)
 {
-    tcg_gen_op6(&tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2),
+    tcg_gen_op6(tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2),
                 GET_TCGV_I64(a3), GET_TCGV_I64(a4), GET_TCGV_I64(a5),
                 GET_TCGV_I64(a6));
 }
@@ -220,7 +220,7 @@ static inline void tcg_gen_op6i_i32(TCGOpcode opc, TCGv_i32 a1, TCGv_i32 a2,
                                     TCGv_i32 a3, TCGv_i32 a4,
                                     TCGv_i32 a5, TCGArg a6)
 {
-    tcg_gen_op6(&tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2),
+    tcg_gen_op6(tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2),
                 GET_TCGV_I32(a3), GET_TCGV_I32(a4), GET_TCGV_I32(a5), a6);
 }
 
@@ -228,7 +228,7 @@ static inline void tcg_gen_op6i_i64(TCGOpcode opc, TCGv_i64 a1, TCGv_i64 a2,
                                     TCGv_i64 a3, TCGv_i64 a4,
                                     TCGv_i64 a5, TCGArg a6)
 {
-    tcg_gen_op6(&tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2),
+    tcg_gen_op6(tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2),
                 GET_TCGV_I64(a3), GET_TCGV_I64(a4), GET_TCGV_I64(a5), a6);
 }
 
@@ -236,7 +236,7 @@ static inline void tcg_gen_op6ii_i32(TCGOpcode opc, TCGv_i32 a1, TCGv_i32 a2,
                                      TCGv_i32 a3, TCGv_i32 a4,
                                      TCGArg a5, TCGArg a6)
 {
-    tcg_gen_op6(&tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2),
+    tcg_gen_op6(tcg_ctx, opc, GET_TCGV_I32(a1), GET_TCGV_I32(a2),
                 GET_TCGV_I32(a3), GET_TCGV_I32(a4), a5, a6);
 }
 
@@ -244,7 +244,7 @@ static inline void tcg_gen_op6ii_i64(TCGOpcode opc, TCGv_i64 a1, TCGv_i64 a2,
                                      TCGv_i64 a3, TCGv_i64 a4,
                                      TCGArg a5, TCGArg a6)
 {
-    tcg_gen_op6(&tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2),
+    tcg_gen_op6(tcg_ctx, opc, GET_TCGV_I64(a1), GET_TCGV_I64(a2),
                 GET_TCGV_I64(a3), GET_TCGV_I64(a4), a5, a6);
 }
 
@@ -253,12 +253,12 @@ static inline void tcg_gen_op6ii_i64(TCGOpcode opc, TCGv_i64 a1, TCGv_i64 a2,
 
 static inline void gen_set_label(TCGLabel *l)
 {
-    tcg_gen_op1(&tcg_ctx, INDEX_op_set_label, label_arg(l));
+    tcg_gen_op1(tcg_ctx, INDEX_op_set_label, label_arg(l));
 }
 
 static inline void tcg_gen_br(TCGLabel *l)
 {
-    tcg_gen_op1(&tcg_ctx, INDEX_op_br, label_arg(l));
+    tcg_gen_op1(tcg_ctx, INDEX_op_br, label_arg(l));
 }
 
 void tcg_gen_mb(TCGBar);
@@ -732,12 +732,12 @@ static inline void tcg_gen_concat32_i64(TCGv_i64 ret, TCGv_i64 lo, TCGv_i64 hi)
 # if TARGET_LONG_BITS <= TCG_TARGET_REG_BITS
 static inline void tcg_gen_insn_start(target_ulong pc)
 {
-    tcg_gen_op1(&tcg_ctx, INDEX_op_insn_start, pc);
+    tcg_gen_op1(tcg_ctx, INDEX_op_insn_start, pc);
 }
 # else
 static inline void tcg_gen_insn_start(target_ulong pc)
 {
-    tcg_gen_op2(&tcg_ctx, INDEX_op_insn_start,
+    tcg_gen_op2(tcg_ctx, INDEX_op_insn_start,
                 (uint32_t)pc, (uint32_t)(pc >> 32));
 }
 # endif
@@ -745,12 +745,12 @@ static inline void tcg_gen_insn_start(target_ulong pc)
 # if TARGET_LONG_BITS <= TCG_TARGET_REG_BITS
 static inline void tcg_gen_insn_start(target_ulong pc, target_ulong a1)
 {
-    tcg_gen_op2(&tcg_ctx, INDEX_op_insn_start, pc, a1);
+    tcg_gen_op2(tcg_ctx, INDEX_op_insn_start, pc, a1);
 }
 # else
 static inline void tcg_gen_insn_start(target_ulong pc, target_ulong a1)
 {
-    tcg_gen_op4(&tcg_ctx, INDEX_op_insn_start,
+    tcg_gen_op4(tcg_ctx, INDEX_op_insn_start,
                 (uint32_t)pc, (uint32_t)(pc >> 32),
                 (uint32_t)a1, (uint32_t)(a1 >> 32));
 }
@@ -760,13 +760,13 @@ static inline void tcg_gen_insn_start(target_ulong pc, target_ulong a1)
 static inline void tcg_gen_insn_start(target_ulong pc, target_ulong a1,
                                       target_ulong a2)
 {
-    tcg_gen_op3(&tcg_ctx, INDEX_op_insn_start, pc, a1, a2);
+    tcg_gen_op3(tcg_ctx, INDEX_op_insn_start, pc, a1, a2);
 }
 # else
 static inline void tcg_gen_insn_start(target_ulong pc, target_ulong a1,
                                       target_ulong a2)
 {
-    tcg_gen_op6(&tcg_ctx, INDEX_op_insn_start,
+    tcg_gen_op6(tcg_ctx, INDEX_op_insn_start,
                 (uint32_t)pc, (uint32_t)(pc >> 32),
                 (uint32_t)a1, (uint32_t)(a1 >> 32),
                 (uint32_t)a2, (uint32_t)(a2 >> 32));
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 53c679f..c88746d 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -726,18 +726,19 @@ struct TCGContext {
     target_ulong gen_insn_data[TCG_MAX_INSNS][TARGET_INSN_START_WORDS];
 };
 
-extern TCGContext tcg_ctx;
+extern TCGContext tcg_init_ctx;
+extern TCGContext *tcg_ctx;
 
 static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)
 {
-    int op_argi = tcg_ctx.gen_op_buf[op_idx].args;
-    tcg_ctx.gen_opparam_buf[op_argi + arg] = v;
+    int op_argi = tcg_ctx->gen_op_buf[op_idx].args;
+    tcg_ctx->gen_opparam_buf[op_argi + arg] = v;
 }
 
 /* The number of opcodes emitted so far.  */
 static inline int tcg_op_buf_count(void)
 {
-    return tcg_ctx.gen_next_op_idx;
+    return tcg_ctx->gen_next_op_idx;
 }
 
 /* Test for whether to terminate the TB for using too many opcodes.  */
@@ -756,13 +757,13 @@ TranslationBlock *tcg_tb_alloc(TCGContext *s);
 /* Called with tb_lock held.  */
 static inline void *tcg_malloc(int size)
 {
-    TCGContext *s = &tcg_ctx;
+    TCGContext *s = tcg_ctx;
     uint8_t *ptr, *ptr_end;
     size = (size + sizeof(long) - 1) & ~(sizeof(long) - 1);
     ptr = s->pool_cur;
     ptr_end = ptr + size;
     if (unlikely(ptr_end > s->pool_end)) {
-        return tcg_malloc_internal(&tcg_ctx, size);
+        return tcg_malloc_internal(tcg_ctx, size);
     } else {
         s->pool_cur = ptr_end;
         return ptr;
@@ -1100,7 +1101,7 @@ static inline unsigned get_mmuidx(TCGMemOpIdx oi)
 uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr);
 #else
 # define tcg_qemu_tb_exec(env, tb_ptr) \
-    ((uintptr_t (*)(void *, void *))tcg_ctx.code_gen_prologue)(env, tb_ptr)
+    ((uintptr_t (*)(void *, void *))tcg_ctx->code_gen_prologue)(env, tb_ptr)
 #endif
 
 void tcg_register_jit(void *buf, size_t buf_size);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 5509407..e6ee4e3 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -153,7 +153,8 @@ static int v_l2_levels;
 static void *l1_map[V_L1_MAX_SIZE];
 
 /* code generation context */
-TCGContext tcg_ctx;
+TCGContext tcg_init_ctx;
+TCGContext *tcg_ctx;
 TBContext tb_ctx;
 bool parallel_cpus;
 
@@ -209,7 +210,7 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
 
 void cpu_gen_init(void)
 {
-    tcg_context_init(&tcg_ctx); 
+    tcg_context_init(&tcg_init_ctx);
 }
 
 /* Encode VAL as a signed leb128 sequence at P.
@@ -267,7 +268,7 @@ static target_long decode_sleb128(uint8_t **pp)
 
 static int encode_search(TranslationBlock *tb, uint8_t *block)
 {
-    uint8_t *highwater = tcg_ctx.code_gen_highwater;
+    uint8_t *highwater = tcg_ctx->code_gen_highwater;
     uint8_t *p = block;
     int i, j, n;
 
@@ -280,12 +281,12 @@ static int encode_search(TranslationBlock *tb, uint8_t *block)
             if (i == 0) {
                 prev = (j == 0 ? tb->pc : 0);
             } else {
-                prev = tcg_ctx.gen_insn_data[i - 1][j];
+                prev = tcg_ctx->gen_insn_data[i - 1][j];
             }
-            p = encode_sleb128(p, tcg_ctx.gen_insn_data[i][j] - prev);
+            p = encode_sleb128(p, tcg_ctx->gen_insn_data[i][j] - prev);
         }
-        prev = (i == 0 ? 0 : tcg_ctx.gen_insn_end_off[i - 1]);
-        p = encode_sleb128(p, tcg_ctx.gen_insn_end_off[i] - prev);
+        prev = (i == 0 ? 0 : tcg_ctx->gen_insn_end_off[i - 1]);
+        p = encode_sleb128(p, tcg_ctx->gen_insn_end_off[i] - prev);
 
         /* Test for (pending) buffer overflow.  The assumption is that any
            one row beginning below the high water mark cannot overrun
@@ -345,8 +346,8 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
     restore_state_to_opc(env, tb, data);
 
 #ifdef CONFIG_PROFILER
-    tcg_ctx.restore_time += profile_getclock() - ti;
-    tcg_ctx.restore_count++;
+    tcg_ctx->restore_time += profile_getclock() - ti;
+    tcg_ctx->restore_count++;
 #endif
     return 0;
 }
@@ -592,7 +593,7 @@ static inline void *split_cross_256mb(void *buf1, size_t size1)
         buf1 = buf2;
     }
 
-    tcg_ctx.code_gen_buffer_size = size1;
+    tcg_ctx->code_gen_buffer_size = size1;
     return buf1;
 }
 #endif
@@ -655,16 +656,16 @@ static inline void *alloc_code_gen_buffer(void)
     size = full_size - qemu_real_host_page_size;
 
     /* Honor a command-line option limiting the size of the buffer.  */
-    if (size > tcg_ctx.code_gen_buffer_size) {
-        size = (((uintptr_t)buf + tcg_ctx.code_gen_buffer_size)
+    if (size > tcg_ctx->code_gen_buffer_size) {
+        size = (((uintptr_t)buf + tcg_ctx->code_gen_buffer_size)
                 & qemu_real_host_page_mask) - (uintptr_t)buf;
     }
-    tcg_ctx.code_gen_buffer_size = size;
+    tcg_ctx->code_gen_buffer_size = size;
 
 #ifdef __mips__
     if (cross_256mb(buf, size)) {
         buf = split_cross_256mb(buf, size);
-        size = tcg_ctx.code_gen_buffer_size;
+        size = tcg_ctx->code_gen_buffer_size;
     }
 #endif
 
@@ -677,7 +678,7 @@ static inline void *alloc_code_gen_buffer(void)
 #elif defined(_WIN32)
 static inline void *alloc_code_gen_buffer(void)
 {
-    size_t size = tcg_ctx.code_gen_buffer_size;
+    size_t size = tcg_ctx->code_gen_buffer_size;
     void *buf1, *buf2;
 
     /* Perform the allocation in two steps, so that the guard page
@@ -696,7 +697,7 @@ static inline void *alloc_code_gen_buffer(void)
 {
     int flags = MAP_PRIVATE | MAP_ANONYMOUS;
     uintptr_t start = 0;
-    size_t size = tcg_ctx.code_gen_buffer_size;
+    size_t size = tcg_ctx->code_gen_buffer_size;
     void *buf;
 
     /* Constrain the position of the buffer based on the host cpu.
@@ -713,7 +714,7 @@ static inline void *alloc_code_gen_buffer(void)
     flags |= MAP_32BIT;
     /* Cannot expect to map more than 800MB in low memory.  */
     if (size > 800u * 1024 * 1024) {
-        tcg_ctx.code_gen_buffer_size = size = 800u * 1024 * 1024;
+        tcg_ctx->code_gen_buffer_size = size = 800u * 1024 * 1024;
     }
 # elif defined(__sparc__)
     start = 0x40000000ul;
@@ -753,7 +754,7 @@ static inline void *alloc_code_gen_buffer(void)
         default:
             /* Split the original buffer.  Free the smaller half.  */
             buf2 = split_cross_256mb(buf, size);
-            size2 = tcg_ctx.code_gen_buffer_size;
+            size2 = tcg_ctx->code_gen_buffer_size;
             if (buf == buf2) {
                 munmap(buf + size2 + qemu_real_host_page_size, size - size2);
             } else {
@@ -821,9 +822,9 @@ static gint tb_tc_cmp(gconstpointer ap, gconstpointer bp)
 
 static inline void code_gen_alloc(size_t tb_size)
 {
-    tcg_ctx.code_gen_buffer_size = size_code_gen_buffer(tb_size);
-    tcg_ctx.code_gen_buffer = alloc_code_gen_buffer();
-    if (tcg_ctx.code_gen_buffer == NULL) {
+    tcg_ctx->code_gen_buffer_size = size_code_gen_buffer(tb_size);
+    tcg_ctx->code_gen_buffer = alloc_code_gen_buffer();
+    if (tcg_ctx->code_gen_buffer == NULL) {
         fprintf(stderr, "Could not allocate dynamic translator buffer\n");
         exit(1);
     }
@@ -851,7 +852,7 @@ void tcg_exec_init(unsigned long tb_size)
 #if defined(CONFIG_SOFTMMU)
     /* There's no guest base to take into account, so go ahead and
        initialize the prologue now.  */
-    tcg_prologue_init(&tcg_ctx);
+    tcg_prologue_init(tcg_ctx);
 #endif
 }
 
@@ -867,7 +868,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 
     assert_tb_locked();
 
-    tb = tcg_tb_alloc(&tcg_ctx);
+    tb = tcg_tb_alloc(tcg_ctx);
     if (unlikely(tb == NULL)) {
         return NULL;
     }
@@ -951,11 +952,11 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
 
         g_tree_foreach(tb_ctx.tb_tree, tb_host_size_iter, &host_size);
         printf("qemu: flush code_size=%td nb_tbs=%zu avg_tb_size=%zu\n",
-               tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer, nb_tbs,
+               tcg_ctx->code_gen_ptr - tcg_ctx->code_gen_buffer, nb_tbs,
                nb_tbs > 0 ? host_size / nb_tbs : 0);
     }
-    if ((unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer)
-        > tcg_ctx.code_gen_buffer_size) {
+    if ((unsigned long)(tcg_ctx->code_gen_ptr - tcg_ctx->code_gen_buffer)
+        > tcg_ctx->code_gen_buffer_size) {
         cpu_abort(cpu, "Internal error: code buffer overflow\n");
     }
 
@@ -970,7 +971,7 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
     qht_reset_size(&tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
     page_flush_tb();
 
-    tcg_ctx.code_gen_ptr = tcg_ctx.code_gen_buffer;
+    tcg_ctx->code_gen_ptr = tcg_ctx->code_gen_buffer;
     /* XXX: flush processor icache at this point if cache flush is
        expensive */
     atomic_mb_set(&tb_ctx.tb_flush_count, tb_ctx.tb_flush_count + 1);
@@ -1321,44 +1322,44 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         cpu_loop_exit(cpu);
     }
 
-    gen_code_buf = tcg_ctx.code_gen_ptr;
+    gen_code_buf = tcg_ctx->code_gen_ptr;
     tb->tc.ptr = gen_code_buf;
     tb->pc = pc;
     tb->cs_base = cs_base;
     tb->flags = flags;
     tb->cflags = cflags;
     tb->trace_vcpu_dstate = *cpu->trace_dstate;
-    tcg_ctx.cf_parallel = !!(cflags & CF_PARALLEL);
+    tcg_ctx->cf_parallel = !!(cflags & CF_PARALLEL);
 
 #ifdef CONFIG_PROFILER
-    tcg_ctx.tb_count1++; /* includes aborted translations because of
+    tcg_ctx->tb_count1++; /* includes aborted translations because of
                        exceptions */
     ti = profile_getclock();
 #endif
 
-    tcg_func_start(&tcg_ctx);
+    tcg_func_start(tcg_ctx);
 
-    tcg_ctx.cpu = ENV_GET_CPU(env);
+    tcg_ctx->cpu = ENV_GET_CPU(env);
     gen_intermediate_code(env, tb);
-    tcg_ctx.cpu = NULL;
+    tcg_ctx->cpu = NULL;
 
     trace_translate_block(tb, tb->pc, tb->tc.ptr);
 
     /* generate machine code */
     tb->jmp_reset_offset[0] = TB_JMP_RESET_OFFSET_INVALID;
     tb->jmp_reset_offset[1] = TB_JMP_RESET_OFFSET_INVALID;
-    tcg_ctx.tb_jmp_reset_offset = tb->jmp_reset_offset;
+    tcg_ctx->tb_jmp_reset_offset = tb->jmp_reset_offset;
 #ifdef USE_DIRECT_JUMP
-    tcg_ctx.tb_jmp_insn_offset = tb->jmp_insn_offset;
-    tcg_ctx.tb_jmp_target_addr = NULL;
+    tcg_ctx->tb_jmp_insn_offset = tb->jmp_insn_offset;
+    tcg_ctx->tb_jmp_target_addr = NULL;
 #else
-    tcg_ctx.tb_jmp_insn_offset = NULL;
-    tcg_ctx.tb_jmp_target_addr = tb->jmp_target_addr;
+    tcg_ctx->tb_jmp_insn_offset = NULL;
+    tcg_ctx->tb_jmp_target_addr = tb->jmp_target_addr;
 #endif
 
 #ifdef CONFIG_PROFILER
-    tcg_ctx.tb_count++;
-    tcg_ctx.interm_time += profile_getclock() - ti;
+    tcg_ctx->tb_count++;
+    tcg_ctx->interm_time += profile_getclock() - ti;
     ti = profile_getclock();
 #endif
 
@@ -1367,7 +1368,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
        the tcg optimization currently hidden inside tcg_gen_code.  All
        that should be required is to flush the TBs, allocate a new TB,
        re-initialize it per above, and re-do the actual code generation.  */
-    gen_code_size = tcg_gen_code(&tcg_ctx, tb);
+    gen_code_size = tcg_gen_code(tcg_ctx, tb);
     if (unlikely(gen_code_size < 0)) {
         goto buffer_overflow;
     }
@@ -1378,10 +1379,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tb->tc.size = gen_code_size;
 
 #ifdef CONFIG_PROFILER
-    tcg_ctx.code_time += profile_getclock() - ti;
-    tcg_ctx.code_in_len += tb->size;
-    tcg_ctx.code_out_len += gen_code_size;
-    tcg_ctx.search_out_len += search_size;
+    tcg_ctx->code_time += profile_getclock() - ti;
+    tcg_ctx->code_in_len += tb->size;
+    tcg_ctx->code_out_len += gen_code_size;
+    tcg_ctx->search_out_len += search_size;
 #endif
 
 #ifdef DEBUG_DISAS
@@ -1396,7 +1397,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     }
 #endif
 
-    tcg_ctx.code_gen_ptr = (void *)
+    tcg_ctx->code_gen_ptr = (void *)
         ROUND_UP((uintptr_t)gen_code_buf + gen_code_size + search_size,
                  CODE_GEN_ALIGN);
 
@@ -1941,8 +1942,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
      * For avg host size we use the precise numbers from tb_tree_stats though.
      */
     cpu_fprintf(f, "gen code size       %td/%zd\n",
-                tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer,
-                tcg_ctx.code_gen_highwater - tcg_ctx.code_gen_buffer);
+                tcg_ctx->code_gen_ptr - tcg_ctx->code_gen_buffer,
+                tcg_ctx->code_gen_highwater - tcg_ctx->code_gen_buffer);
     cpu_fprintf(f, "TB count            %zu\n", nb_tbs);
     cpu_fprintf(f, "TB avg target size  %zu max=%zu bytes\n",
                 nb_tbs ? tst.target_size / nb_tbs : 0,
diff --git a/bsd-user/main.c b/bsd-user/main.c
index fa9c012..7a8b29e 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -978,7 +978,7 @@ int main(int argc, char **argv)
     /* Now that we've loaded the binary, GUEST_BASE is fixed.  Delay
        generating the prologue until now so that the prologue can take
        the real value of GUEST_BASE into account.  */
-    tcg_prologue_init(&tcg_ctx);
+    tcg_prologue_init(tcg_ctx);
 
     /* build Task State */
     memset(ts, 0, sizeof(TaskState));
diff --git a/linux-user/main.c b/linux-user/main.c
index dbbe3d7..de7d948 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -4459,7 +4459,7 @@ int main(int argc, char **argv, char **envp)
     /* Now that we've loaded the binary, GUEST_BASE is fixed.  Delay
        generating the prologue until now so that the prologue can take
        the real value of GUEST_BASE into account.  */
-    tcg_prologue_init(&tcg_ctx);
+    tcg_prologue_init(tcg_ctx);
 
 #if defined(TARGET_I386)
     env->cr[0] = CR0_PG_MASK | CR0_WP_MASK | CR0_PE_MASK;
diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index f97a8e5..b506198 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -156,7 +156,7 @@ void alpha_translate_init(void)
     done_init = 1;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
 
     for (i = 0; i < 31; i++) {
         cpu_std_ir[i] = tcg_global_mem_new_i64(cpu_env,
diff --git a/target/arm/translate.c b/target/arm/translate.c
index bd0ef58..657d1fe 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -82,7 +82,7 @@ void arm_translate_init(void)
     int i;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
 
     for (i = 0; i < 16; i++) {
         cpu_R[i] = tcg_global_mem_new_i32(cpu_env,
diff --git a/target/cris/translate.c b/target/cris/translate.c
index 1703d91..afaeadf 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -3365,7 +3365,7 @@ void cris_initialize_tcg(void)
     int i;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
     cc_x = tcg_global_mem_new(cpu_env,
                               offsetof(CPUCRISState, cc_x), "cc_x");
     cc_src = tcg_global_mem_new(cpu_env,
diff --git a/target/cris/translate_v10.c b/target/cris/translate_v10.c
index 4a0b485..5d48920 100644
--- a/target/cris/translate_v10.c
+++ b/target/cris/translate_v10.c
@@ -1273,7 +1273,7 @@ void cris_initialize_crisv10_tcg(void)
     int i;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
     cc_x = tcg_global_mem_new(cpu_env,
                               offsetof(CPUCRISState, cc_x), "cc_x");
     cc_src = tcg_global_mem_new(cpu_env,
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 66aa11d..c1ba87c 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -145,7 +145,7 @@ void hppa_translate_init(void)
     done_init = 1;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
 
     TCGV_UNUSED(cpu_gr[0]);
     for (i = 1; i < 32; i++) {
diff --git a/target/i386/translate.c b/target/i386/translate.c
index 0f38a48..0d574a7 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -8335,7 +8335,7 @@ void tcg_x86_init(void)
     initialized = true;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
     cpu_cc_op = tcg_global_mem_new_i32(cpu_env,
                                        offsetof(CPUX86State, cc_op), "cc_op");
     cpu_cc_dst = tcg_global_mem_new(cpu_env, offsetof(CPUX86State, cc_dst),
diff --git a/target/lm32/translate.c b/target/lm32/translate.c
index 3597c61..0e8ed34 100644
--- a/target/lm32/translate.c
+++ b/target/lm32/translate.c
@@ -1203,7 +1203,7 @@ void lm32_translate_init(void)
     int i;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
 
     for (i = 0; i < ARRAY_SIZE(cpu_R); i++) {
         cpu_R[i] = tcg_global_mem_new(cpu_env,
diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index 65044be..e1fd030 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -69,7 +69,7 @@ void m68k_tcg_init(void)
     int i;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
 
 #define DEFO32(name, offset) \
     QREG_##name = tcg_global_mem_new_i32(cpu_env, \
diff --git a/target/microblaze/translate.c b/target/microblaze/translate.c
index 4cd184e..9e8e38c 100644
--- a/target/microblaze/translate.c
+++ b/target/microblaze/translate.c
@@ -1861,7 +1861,7 @@ void mb_tcg_init(void)
     int i;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
 
     env_debug = tcg_global_mem_new(cpu_env,
                     offsetof(CPUMBState, debug),
diff --git a/target/mips/translate.c b/target/mips/translate.c
index f839a2b..84eb5c2 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -20149,7 +20149,7 @@ void mips_tcg_init(void)
         return;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
 
     TCGV_UNUSED(cpu_gpr[0]);
     for (i = 1; i < 32; i++)
diff --git a/target/moxie/translate.c b/target/moxie/translate.c
index f61aa2d..93983a6 100644
--- a/target/moxie/translate.c
+++ b/target/moxie/translate.c
@@ -106,7 +106,7 @@ void moxie_translate_init(void)
         return;
     }
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
     cpu_pc = tcg_global_mem_new_i32(cpu_env,
                                     offsetof(CPUMoxieState, pc), "$pc");
     for (i = 0; i < 16; i++)
diff --git a/target/openrisc/translate.c b/target/openrisc/translate.c
index 347790c..cbe67cb 100644
--- a/target/openrisc/translate.c
+++ b/target/openrisc/translate.c
@@ -75,7 +75,7 @@ void openrisc_translate_init(void)
     int i;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
     cpu_sr = tcg_global_mem_new(cpu_env,
                                 offsetof(CPUOpenRISCState, sr), "sr");
     cpu_dflag = tcg_global_mem_new_i32(cpu_env,
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index e146aa3..b0ab44a 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -90,7 +90,7 @@ void ppc_translate_init(void)
         return;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
 
     p = cpu_reg_names;
     cpu_reg_names_size = sizeof(cpu_reg_names);
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index ea8a90a..ca4d2b0 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -171,7 +171,7 @@ void s390x_translate_init(void)
     int i;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
     psw_addr = tcg_global_mem_new_i64(cpu_env,
                                       offsetof(CPUS390XState, psw.addr),
                                       "psw_addr");
diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 52fabb3..f1cb018 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -105,7 +105,7 @@ void sh4_translate_init(void)
     }
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
 
     for (i = 0; i < 24; i++) {
         cpu_gregs[i] = tcg_global_mem_new_i32(cpu_env,
diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index 768ce68..b22f765 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -5933,7 +5933,7 @@ void gen_intermediate_code_init(CPUSPARCState *env)
     inited = 1;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
 
     cpu_regwptr = tcg_global_mem_new_ptr(cpu_env,
                                          offsetof(CPUSPARCState, regwptr),
diff --git a/target/tilegx/translate.c b/target/tilegx/translate.c
index 33be670..f32f088 100644
--- a/target/tilegx/translate.c
+++ b/target/tilegx/translate.c
@@ -2447,7 +2447,7 @@ void tilegx_tcg_init(void)
     int i;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
     cpu_pc = tcg_global_mem_new_i64(cpu_env, offsetof(CPUTLGState, pc), "pc");
     for (i = 0; i < TILEGX_R_COUNT; i++) {
         cpu_regs[i] = tcg_global_mem_new_i64(cpu_env,
diff --git a/target/tricore/translate.c b/target/tricore/translate.c
index 3d8448c..4cce3cc 100644
--- a/target/tricore/translate.c
+++ b/target/tricore/translate.c
@@ -8886,7 +8886,7 @@ void tricore_tcg_init(void)
         return;
     }
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
     /* reg init */
     for (i = 0 ; i < 16 ; i++) {
         cpu_gpr_a[i] = tcg_global_mem_new(cpu_env,
diff --git a/target/unicore32/translate.c b/target/unicore32/translate.c
index 4cede72..d7c7d49 100644
--- a/target/unicore32/translate.c
+++ b/target/unicore32/translate.c
@@ -70,7 +70,7 @@ void uc32_translate_init(void)
     int i;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
 
     for (i = 0; i < 32; i++) {
         cpu_R[i] = tcg_global_mem_new_i32(cpu_env,
diff --git a/target/xtensa/translate.c b/target/xtensa/translate.c
index 3ded61b..7f859cc 100644
--- a/target/xtensa/translate.c
+++ b/target/xtensa/translate.c
@@ -218,7 +218,7 @@ void xtensa_translate_init(void)
     int i;
 
     cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-    tcg_ctx.tcg_env = cpu_env;
+    tcg_ctx->tcg_env = cpu_env;
     cpu_pc = tcg_global_mem_new_i32(cpu_env,
             offsetof(CPUXtensaState, pc), "pc");
 
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index ef420d4..4a7057e 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -150,8 +150,8 @@ void tcg_gen_op6(TCGContext *ctx, TCGOpcode opc, TCGArg a1, TCGArg a2,
 
 void tcg_gen_mb(TCGBar mb_type)
 {
-    if (tcg_ctx.cf_parallel) {
-        tcg_gen_op1(&tcg_ctx, INDEX_op_mb, mb_type);
+    if (tcg_ctx->cf_parallel) {
+        tcg_gen_op1(tcg_ctx, INDEX_op_mb, mb_type);
     }
 }
 
@@ -2486,7 +2486,7 @@ void tcg_gen_extrl_i64_i32(TCGv_i32 ret, TCGv_i64 arg)
     if (TCG_TARGET_REG_BITS == 32) {
         tcg_gen_mov_i32(ret, TCGV_LOW(arg));
     } else if (TCG_TARGET_HAS_extrl_i64_i32) {
-        tcg_gen_op2(&tcg_ctx, INDEX_op_extrl_i64_i32,
+        tcg_gen_op2(tcg_ctx, INDEX_op_extrl_i64_i32,
                     GET_TCGV_I32(ret), GET_TCGV_I64(arg));
     } else {
         tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(arg)));
@@ -2498,7 +2498,7 @@ void tcg_gen_extrh_i64_i32(TCGv_i32 ret, TCGv_i64 arg)
     if (TCG_TARGET_REG_BITS == 32) {
         tcg_gen_mov_i32(ret, TCGV_HIGH(arg));
     } else if (TCG_TARGET_HAS_extrh_i64_i32) {
-        tcg_gen_op2(&tcg_ctx, INDEX_op_extrh_i64_i32,
+        tcg_gen_op2(tcg_ctx, INDEX_op_extrh_i64_i32,
                     GET_TCGV_I32(ret), GET_TCGV_I64(arg));
     } else {
         TCGv_i64 t = tcg_temp_new_i64();
@@ -2514,7 +2514,7 @@ void tcg_gen_extu_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
         tcg_gen_mov_i32(TCGV_LOW(ret), arg);
         tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
     } else {
-        tcg_gen_op2(&tcg_ctx, INDEX_op_extu_i32_i64,
+        tcg_gen_op2(tcg_ctx, INDEX_op_extu_i32_i64,
                     GET_TCGV_I64(ret), GET_TCGV_I32(arg));
     }
 }
@@ -2525,7 +2525,7 @@ void tcg_gen_ext_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
         tcg_gen_mov_i32(TCGV_LOW(ret), arg);
         tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
     } else {
-        tcg_gen_op2(&tcg_ctx, INDEX_op_ext_i32_i64,
+        tcg_gen_op2(tcg_ctx, INDEX_op_ext_i32_i64,
                     GET_TCGV_I64(ret), GET_TCGV_I32(arg));
     }
 }
@@ -2581,8 +2581,8 @@ void tcg_gen_goto_tb(unsigned idx)
     tcg_debug_assert(idx <= 1);
 #ifdef CONFIG_DEBUG_TCG
     /* Verify that we havn't seen this numbered exit before.  */
-    tcg_debug_assert((tcg_ctx.goto_tb_issue_mask & (1 << idx)) == 0);
-    tcg_ctx.goto_tb_issue_mask |= 1 << idx;
+    tcg_debug_assert((tcg_ctx->goto_tb_issue_mask & (1 << idx)) == 0);
+    tcg_ctx->goto_tb_issue_mask |= 1 << idx;
 #endif
     tcg_gen_op1i(INDEX_op_goto_tb, idx);
 }
@@ -2591,7 +2591,7 @@ void tcg_gen_lookup_and_goto_ptr(void)
 {
     if (TCG_TARGET_HAS_goto_ptr && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
         TCGv_ptr ptr = tcg_temp_new_ptr();
-        gen_helper_lookup_tb_ptr(ptr, tcg_ctx.tcg_env);
+        gen_helper_lookup_tb_ptr(ptr, tcg_ctx->tcg_env);
         tcg_gen_op1i(INDEX_op_goto_ptr, GET_TCGV_PTR(ptr));
         tcg_temp_free_ptr(ptr);
     } else {
@@ -2637,7 +2637,7 @@ static void gen_ldst_i32(TCGOpcode opc, TCGv_i32 val, TCGv addr,
     if (TCG_TARGET_REG_BITS == 32) {
         tcg_gen_op4i_i32(opc, val, TCGV_LOW(addr), TCGV_HIGH(addr), oi);
     } else {
-        tcg_gen_op3(&tcg_ctx, opc, GET_TCGV_I32(val), GET_TCGV_I64(addr), oi);
+        tcg_gen_op3(tcg_ctx, opc, GET_TCGV_I32(val), GET_TCGV_I64(addr), oi);
     }
 #endif
 }
@@ -2650,7 +2650,7 @@ static void gen_ldst_i64(TCGOpcode opc, TCGv_i64 val, TCGv addr,
     if (TCG_TARGET_REG_BITS == 32) {
         tcg_gen_op4i_i32(opc, TCGV_LOW(val), TCGV_HIGH(val), addr, oi);
     } else {
-        tcg_gen_op3(&tcg_ctx, opc, GET_TCGV_I64(val), GET_TCGV_I32(addr), oi);
+        tcg_gen_op3(tcg_ctx, opc, GET_TCGV_I64(val), GET_TCGV_I32(addr), oi);
     }
 #else
     if (TCG_TARGET_REG_BITS == 32) {
@@ -2665,7 +2665,7 @@ static void gen_ldst_i64(TCGOpcode opc, TCGv_i64 val, TCGv addr,
 void tcg_gen_qemu_ld_i32(TCGv_i32 val, TCGv addr, TCGArg idx, TCGMemOp memop)
 {
     memop = tcg_canonicalize_memop(memop, 0, 0);
-    trace_guest_mem_before_tcg(tcg_ctx.cpu, tcg_ctx.tcg_env,
+    trace_guest_mem_before_tcg(tcg_ctx->cpu, tcg_ctx->tcg_env,
                                addr, trace_mem_get_info(memop, 0));
     gen_ldst_i32(INDEX_op_qemu_ld_i32, val, addr, memop, idx);
 }
@@ -2673,7 +2673,7 @@ void tcg_gen_qemu_ld_i32(TCGv_i32 val, TCGv addr, TCGArg idx, TCGMemOp memop)
 void tcg_gen_qemu_st_i32(TCGv_i32 val, TCGv addr, TCGArg idx, TCGMemOp memop)
 {
     memop = tcg_canonicalize_memop(memop, 0, 1);
-    trace_guest_mem_before_tcg(tcg_ctx.cpu, tcg_ctx.tcg_env,
+    trace_guest_mem_before_tcg(tcg_ctx->cpu, tcg_ctx->tcg_env,
                                addr, trace_mem_get_info(memop, 1));
     gen_ldst_i32(INDEX_op_qemu_st_i32, val, addr, memop, idx);
 }
@@ -2691,7 +2691,7 @@ void tcg_gen_qemu_ld_i64(TCGv_i64 val, TCGv addr, TCGArg idx, TCGMemOp memop)
     }
 
     memop = tcg_canonicalize_memop(memop, 1, 0);
-    trace_guest_mem_before_tcg(tcg_ctx.cpu, tcg_ctx.tcg_env,
+    trace_guest_mem_before_tcg(tcg_ctx->cpu, tcg_ctx->tcg_env,
                                addr, trace_mem_get_info(memop, 0));
     gen_ldst_i64(INDEX_op_qemu_ld_i64, val, addr, memop, idx);
 }
@@ -2704,7 +2704,7 @@ void tcg_gen_qemu_st_i64(TCGv_i64 val, TCGv addr, TCGArg idx, TCGMemOp memop)
     }
 
     memop = tcg_canonicalize_memop(memop, 1, 1);
-    trace_guest_mem_before_tcg(tcg_ctx.cpu, tcg_ctx.tcg_env,
+    trace_guest_mem_before_tcg(tcg_ctx->cpu, tcg_ctx->tcg_env,
                                addr, trace_mem_get_info(memop, 1));
     gen_ldst_i64(INDEX_op_qemu_st_i64, val, addr, memop, idx);
 }
@@ -2794,7 +2794,7 @@ void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
 {
     memop = tcg_canonicalize_memop(memop, 0, 0);
 
-    if (!tcg_ctx.cf_parallel) {
+    if (!tcg_ctx->cf_parallel) {
         TCGv_i32 t1 = tcg_temp_new_i32();
         TCGv_i32 t2 = tcg_temp_new_i32();
 
@@ -2820,11 +2820,11 @@ void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
 #ifdef CONFIG_SOFTMMU
         {
             TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
-            gen(retv, tcg_ctx.tcg_env, addr, cmpv, newv, oi);
+            gen(retv, tcg_ctx->tcg_env, addr, cmpv, newv, oi);
             tcg_temp_free_i32(oi);
         }
 #else
-        gen(retv, tcg_ctx.tcg_env, addr, cmpv, newv);
+        gen(retv, tcg_ctx->tcg_env, addr, cmpv, newv);
 #endif
 
         if (memop & MO_SIGN) {
@@ -2838,7 +2838,7 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
 {
     memop = tcg_canonicalize_memop(memop, 1, 0);
 
-    if (!tcg_ctx.cf_parallel) {
+    if (!tcg_ctx->cf_parallel) {
         TCGv_i64 t1 = tcg_temp_new_i64();
         TCGv_i64 t2 = tcg_temp_new_i64();
 
@@ -2865,14 +2865,14 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
 #ifdef CONFIG_SOFTMMU
         {
             TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop, idx));
-            gen(retv, tcg_ctx.tcg_env, addr, cmpv, newv, oi);
+            gen(retv, tcg_ctx->tcg_env, addr, cmpv, newv, oi);
             tcg_temp_free_i32(oi);
         }
 #else
-        gen(retv, tcg_ctx.tcg_env, addr, cmpv, newv);
+        gen(retv, tcg_ctx->tcg_env, addr, cmpv, newv);
 #endif
 #else
-        gen_helper_exit_atomic(tcg_ctx.tcg_env);
+        gen_helper_exit_atomic(tcg_ctx->tcg_env);
         /* Produce a result, so that we have a well-formed opcode stream
            with respect to uses of the result in the (dead) code following.  */
         tcg_gen_movi_i64(retv, 0);
@@ -2928,11 +2928,11 @@ static void do_atomic_op_i32(TCGv_i32 ret, TCGv addr, TCGv_i32 val,
 #ifdef CONFIG_SOFTMMU
     {
         TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
-        gen(ret, tcg_ctx.tcg_env, addr, val, oi);
+        gen(ret, tcg_ctx->tcg_env, addr, val, oi);
         tcg_temp_free_i32(oi);
     }
 #else
-    gen(ret, tcg_ctx.tcg_env, addr, val);
+    gen(ret, tcg_ctx->tcg_env, addr, val);
 #endif
 
     if (memop & MO_SIGN) {
@@ -2973,14 +2973,14 @@ static void do_atomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
 #ifdef CONFIG_SOFTMMU
         {
             TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
-            gen(ret, tcg_ctx.tcg_env, addr, val, oi);
+            gen(ret, tcg_ctx->tcg_env, addr, val, oi);
             tcg_temp_free_i32(oi);
         }
 #else
-        gen(ret, tcg_ctx.tcg_env, addr, val);
+        gen(ret, tcg_ctx->tcg_env, addr, val);
 #endif
 #else
-        gen_helper_exit_atomic(tcg_ctx.tcg_env);
+        gen_helper_exit_atomic(tcg_ctx->tcg_env);
         /* Produce a result, so that we have a well-formed opcode stream
            with respect to uses of the result in the (dead) code following.  */
         tcg_gen_movi_i64(ret, 0);
@@ -3015,7 +3015,7 @@ static void * const table_##NAME[16] = {                                \
 void tcg_gen_atomic_##NAME##_i32                                        \
     (TCGv_i32 ret, TCGv addr, TCGv_i32 val, TCGArg idx, TCGMemOp memop) \
 {                                                                       \
-    if (tcg_ctx.cf_parallel) {                                          \
+    if (tcg_ctx->cf_parallel) {                                         \
         do_atomic_op_i32(ret, addr, val, idx, memop, table_##NAME);     \
     } else {                                                            \
         do_nonatomic_op_i32(ret, addr, val, idx, memop, NEW,            \
@@ -3025,7 +3025,7 @@ void tcg_gen_atomic_##NAME##_i32                                        \
 void tcg_gen_atomic_##NAME##_i64                                        \
     (TCGv_i64 ret, TCGv addr, TCGv_i64 val, TCGArg idx, TCGMemOp memop) \
 {                                                                       \
-    if (tcg_ctx.cf_parallel) {                                          \
+    if (tcg_ctx->cf_parallel) {                                         \
         do_atomic_op_i64(ret, addr, val, idx, memop, table_##NAME);     \
     } else {                                                            \
         do_nonatomic_op_i64(ret, addr, val, idx, memop, NEW,            \
diff --git a/tcg/tcg-runtime.c b/tcg/tcg-runtime.c
index 9a87616..02d3acb 100644
--- a/tcg/tcg-runtime.c
+++ b/tcg/tcg-runtime.c
@@ -153,7 +153,7 @@ void *HELPER(lookup_tb_ptr)(CPUArchState *env)
 
     tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, curr_cflags());
     if (tb == NULL) {
-        return tcg_ctx.code_gen_epilogue;
+        return tcg_ctx->code_gen_epilogue;
     }
     qemu_log_mask_and_addr(CPU_LOG_EXEC, pc,
                            "Chain %p [%d: " TARGET_FMT_lx "] %s\n",
diff --git a/tcg/tcg.c b/tcg/tcg.c
index c0c2d6c..f907c47 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -116,7 +116,6 @@ static void tcg_out_tb_init(TCGContext *s);
 static bool tcg_out_tb_finalize(TCGContext *s);
 
 
-
 static TCGRegSet tcg_target_available_regs[2];
 static TCGRegSet tcg_target_call_clobber_regs;
 
@@ -242,7 +241,7 @@ static void tcg_out_label(TCGContext *s, TCGLabel *l, tcg_insn_unit *ptr)
 
 TCGLabel *gen_new_label(void)
 {
-    TCGContext *s = &tcg_ctx;
+    TCGContext *s = tcg_ctx;
     TCGLabel *l = tcg_malloc(sizeof(TCGLabel));
 
     *l = (TCGLabel){
@@ -381,6 +380,8 @@ void tcg_context_init(TCGContext *s)
     for (; i < ARRAY_SIZE(tcg_target_reg_alloc_order); ++i) {
         indirect_reg_alloc_order[i] = tcg_target_reg_alloc_order[i];
     }
+
+    tcg_ctx = s;
 }
 
 /*
@@ -526,7 +527,7 @@ void tcg_set_frame(TCGContext *s, TCGReg reg, intptr_t start, intptr_t size)
 
 TCGv_i32 tcg_global_reg_new_i32(TCGReg reg, const char *name)
 {
-    TCGContext *s = &tcg_ctx;
+    TCGContext *s = tcg_ctx;
     int idx;
 
     if (tcg_regset_test_reg(s->reserved_regs, reg)) {
@@ -538,7 +539,7 @@ TCGv_i32 tcg_global_reg_new_i32(TCGReg reg, const char *name)
 
 TCGv_i64 tcg_global_reg_new_i64(TCGReg reg, const char *name)
 {
-    TCGContext *s = &tcg_ctx;
+    TCGContext *s = tcg_ctx;
     int idx;
 
     if (tcg_regset_test_reg(s->reserved_regs, reg)) {
@@ -551,7 +552,7 @@ TCGv_i64 tcg_global_reg_new_i64(TCGReg reg, const char *name)
 int tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
                                 intptr_t offset, const char *name)
 {
-    TCGContext *s = &tcg_ctx;
+    TCGContext *s = tcg_ctx;
     TCGTemp *base_ts = &s->temps[GET_TCGV_PTR(base)];
     TCGTemp *ts = tcg_global_alloc(s);
     int indirect_reg = 0, bigendian = 0;
@@ -606,7 +607,7 @@ int tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
 
 static int tcg_temp_new_internal(TCGType type, int temp_local)
 {
-    TCGContext *s = &tcg_ctx;
+    TCGContext *s = tcg_ctx;
     TCGTemp *ts;
     int idx, k;
 
@@ -668,7 +669,7 @@ TCGv_i64 tcg_temp_new_internal_i64(int temp_local)
 
 static void tcg_temp_free_internal(int idx)
 {
-    TCGContext *s = &tcg_ctx;
+    TCGContext *s = tcg_ctx;
     TCGTemp *ts;
     int k;
 
@@ -733,13 +734,13 @@ TCGv_i64 tcg_const_local_i64(int64_t val)
 #if defined(CONFIG_DEBUG_TCG)
 void tcg_clear_temp_count(void)
 {
-    TCGContext *s = &tcg_ctx;
+    TCGContext *s = tcg_ctx;
     s->temps_in_use = 0;
 }
 
 int tcg_check_temp_count(void)
 {
-    TCGContext *s = &tcg_ctx;
+    TCGContext *s = tcg_ctx;
     if (s->temps_in_use) {
         /* Clear the count so that we don't give another
          * warning immediately next time around.
@@ -2707,7 +2708,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 #ifdef CONFIG_PROFILER
 void tcg_dump_info(FILE *f, fprintf_function cpu_fprintf)
 {
-    TCGContext *s = &tcg_ctx;
+    TCGContext *s = tcg_ctx;
     int64_t tb_count = s->tb_count;
     int64_t tb_div_count = tb_count ? tb_count : 1;
     int64_t tot = s->interm_time + s->code_time;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 34/43] gen-icount: fold exitreq_label into TCGContext
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (32 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 33/43] tcg: define tcg_init_ctx and make tcg_ctx a pointer Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps Emilio G. Cota
                   ` (9 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Groundwork for supporting multiple TCG contexts.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/gen-icount.h | 7 +++----
 tcg/tcg.h                 | 2 ++
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index c58b0b2..fe80176 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -6,13 +6,12 @@
 /* Helpers for instruction counting code generation.  */
 
 static int icount_start_insn_idx;
-static TCGLabel *exitreq_label;
 
 static inline void gen_tb_start(TranslationBlock *tb)
 {
     TCGv_i32 count, imm;
 
-    exitreq_label = gen_new_label();
+    tcg_ctx->exitreq_label = gen_new_label();
     if (tb_cflags(tb) & CF_USE_ICOUNT) {
         count = tcg_temp_local_new_i32();
     } else {
@@ -34,7 +33,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
         tcg_temp_free_i32(imm);
     }
 
-    tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, exitreq_label);
+    tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, tcg_ctx->exitreq_label);
 
     if (tb_cflags(tb) & CF_USE_ICOUNT) {
         tcg_gen_st16_i32(count, tcg_ctx->tcg_env,
@@ -52,7 +51,7 @@ static inline void gen_tb_end(TranslationBlock *tb, int num_insns)
         tcg_set_insn_param(icount_start_insn_idx, 1, num_insns);
     }
 
-    gen_set_label(exitreq_label);
+    gen_set_label(tcg_ctx->exitreq_label);
     tcg_gen_exit_tb((uintptr_t)tb + TB_EXIT_REQUESTED);
 
     /* Terminate the linked list.  */
diff --git a/tcg/tcg.h b/tcg/tcg.h
index c88746d..f83f9b0 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -712,6 +712,8 @@ struct TCGContext {
     /* The TCGBackendData structure is private to tcg-target.inc.c.  */
     struct TCGBackendData *be;
 
+    TCGLabel *exitreq_label;
+
     TCGTempSet free_temps[TCG_TYPE_COUNT * 2];
     TCGTemp temps[TCG_MAX_TEMPS]; /* globals first, temps after */
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (33 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 34/43] gen-icount: fold exitreq_label into TCGContext Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  7:39   ` Richard Henderson
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 36/43] tcg: introduce **tcg_ctxs to keep track of all TCGContext's Emilio G. Cota
                   ` (8 subsequent siblings)
  43 siblings, 1 reply; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Groundwork for supporting multiple TCG contexts.

While at it, also allocate temps_used directly as a bitmap of the
required size, instead of having a bitmap of TCG_MAX_TEMPS via
TCGTempSet.

Performance-wise we lose about 2% in a translation-heavy workload
such as booting+shutting down debian-arm:

Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
	-machine type=virt -nographic -smp 1 -m 4096 \
	-netdev user,id=unet,hostfwd=tcp::2222-:22 \
	-device virtio-net-device,netdev=unet \
	-drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
	-device virtio-blk-device,drive=myblock \
	-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
	-name arm,debug-threads=on -smp 1' (10 runs):

Before:
      19489.126318 task-clock                #    0.960 CPUs utilized            ( +-  0.96% )
            23,697 context-switches          #    0.001 M/sec                    ( +-  0.51% )
                 1 CPU-migrations            #    0.000 M/sec
            19,953 page-faults               #    0.001 M/sec                    ( +-  0.40% )
    56,214,402,410 cycles                    #    2.884 GHz                      ( +-  0.95% ) [83.34%]
    25,516,669,513 stalled-cycles-frontend   #   45.39% frontend cycles idle     ( +-  0.69% ) [83.33%]
    17,266,165,747 stalled-cycles-backend    #   30.71% backend  cycles idle     ( +-  0.59% ) [66.66%]
    79,007,843,327 instructions              #    1.41  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  1.19% ) [83.34%]
    13,136,600,416 branches                  #  674.048 M/sec                    ( +-  1.29% ) [83.34%]
       274,715,270 branch-misses             #    2.09% of all branches          ( +-  0.79% ) [83.33%]

      20.300335944 seconds time elapsed                                          ( +-  0.55% )

After:
      19917.737030 task-clock                #    0.955 CPUs utilized            ( +-  0.74% )
            23,973 context-switches          #    0.001 M/sec                    ( +-  0.37% )
                 1 CPU-migrations            #    0.000 M/sec
            19,824 page-faults               #    0.001 M/sec                    ( +-  0.38% )
    57,380,269,537 cycles                    #    2.881 GHz                      ( +-  0.70% ) [83.34%]
    26,462,452,508 stalled-cycles-frontend   #   46.12% frontend cycles idle     ( +-  0.65% ) [83.34%]
    17,970,546,047 stalled-cycles-backend    #   31.32% backend  cycles idle     ( +-  0.64% ) [66.67%]
    79,527,238,334 instructions              #    1.39  insns per cycle
                                             #    0.33  stalled cycles per insn  ( +-  0.79% ) [83.33%]
    13,272,362,192 branches                  #  666.359 M/sec                    ( +-  0.83% ) [83.34%]
       278,357,773 branch-misses             #    2.10% of all branches          ( +-  0.65% ) [83.33%]

      20.850558455 seconds time elapsed                                          ( +-  0.55% )

That is, 2.70% slowdown.

The perf difference shrinks a bit when using a high-performance allocator
such as tcmalloc:

Before:
      19372.008814 task-clock                #    0.957 CPUs utilized            ( +-  1.00% )
            23,621 context-switches          #    0.001 M/sec                    ( +-  0.50% )
                 1 CPU-migrations            #    0.000 M/sec
            13,289 page-faults               #    0.001 M/sec                    ( +-  1.46% )
    55,824,272,818 cycles                    #    2.882 GHz                      ( +-  1.00% ) [83.33%]
    25,284,946,453 stalled-cycles-frontend   #   45.29% frontend cycles idle     ( +-  1.12% ) [83.32%]
    17,100,517,753 stalled-cycles-backend    #   30.63% backend  cycles idle     ( +-  0.86% ) [66.69%]
    78,193,046,990 instructions              #    1.40  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  1.14% ) [83.35%]
    12,986,014,194 branches                  #  670.349 M/sec                    ( +-  1.22% ) [83.34%]
       272,581,789 branch-misses             #    2.10% of all branches          ( +-  0.62% ) [83.33%]

      20.249726404 seconds time elapsed                                          ( +-  0.61% )

After:
      19809.295886 task-clock                #    0.962 CPUs utilized            ( +-  0.99% )
            23,894 context-switches          #    0.001 M/sec                    ( +-  0.50% )
                 1 CPU-migrations            #    0.000 M/sec
            12,927 page-faults               #    0.001 M/sec                    ( +-  0.78% )
    57,131,686,004 cycles                    #    2.884 GHz                      ( +-  0.97% ) [83.34%]
    25,965,120,001 stalled-cycles-frontend   #   45.45% frontend cycles idle     ( +-  0.71% ) [83.35%]
    17,534,942,176 stalled-cycles-backend    #   30.69% backend  cycles idle     ( +-  0.54% ) [66.68%]
    80,000,003,715 instructions              #    1.40  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  1.24% ) [83.34%]
    13,327,272,806 branches                  #  672.779 M/sec                    ( +-  1.31% ) [83.34%]
       273,622,661 branch-misses             #    2.05% of all branches          ( +-  0.95% ) [83.31%]

      20.601366430 seconds time elapsed                                          ( +-  0.60% )

That is, 1.77% slowdown.

Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/optimize.c | 307 ++++++++++++++++++++++++++++++---------------------------
 1 file changed, 162 insertions(+), 145 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index adfc56c..b727a4a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -40,21 +40,18 @@ struct tcg_temp_info {
     tcg_target_ulong mask;
 };
 
-static struct tcg_temp_info temps[TCG_MAX_TEMPS];
-static TCGTempSet temps_used;
-
-static inline bool temp_is_const(TCGArg arg)
+static inline bool temp_is_const(const struct tcg_temp_info *temps, TCGArg arg)
 {
     return temps[arg].is_const;
 }
 
-static inline bool temp_is_copy(TCGArg arg)
+static inline bool temp_is_copy(const struct tcg_temp_info *temps, TCGArg arg)
 {
     return temps[arg].next_copy != arg;
 }
 
 /* Reset TEMP's state, possibly removing the temp for the list of copies.  */
-static void reset_temp(TCGArg temp)
+static void reset_temp(struct tcg_temp_info *temps, TCGArg temp)
 {
     temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
     temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
@@ -64,21 +61,16 @@ static void reset_temp(TCGArg temp)
     temps[temp].mask = -1;
 }
 
-/* Reset all temporaries, given that there are NB_TEMPS of them.  */
-static void reset_all_temps(int nb_temps)
-{
-    bitmap_zero(temps_used.l, nb_temps);
-}
-
 /* Initialize and activate a temporary.  */
-static void init_temp_info(TCGArg temp)
+static void init_temp_info(struct tcg_temp_info *temps,
+                           unsigned long *temps_used, TCGArg temp)
 {
-    if (!test_bit(temp, temps_used.l)) {
+    if (!test_bit(temp, temps_used)) {
         temps[temp].next_copy = temp;
         temps[temp].prev_copy = temp;
         temps[temp].is_const = false;
         temps[temp].mask = -1;
-        set_bit(temp, temps_used.l);
+        set_bit(temp, temps_used);
     }
 }
 
@@ -116,7 +108,8 @@ static TCGOpcode op_to_movi(TCGOpcode op)
     }
 }
 
-static TCGArg find_better_copy(TCGContext *s, TCGArg temp)
+static TCGArg find_better_copy(TCGContext *s, const struct tcg_temp_info *temps,
+                               TCGArg temp)
 {
     TCGArg i;
 
@@ -145,7 +138,8 @@ static TCGArg find_better_copy(TCGContext *s, TCGArg temp)
     return temp;
 }
 
-static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
+static bool temps_are_copies(const struct tcg_temp_info *temps, TCGArg arg1,
+                             TCGArg arg2)
 {
     TCGArg i;
 
@@ -153,7 +147,7 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
         return true;
     }
 
-    if (!temp_is_copy(arg1) || !temp_is_copy(arg2)) {
+    if (!temp_is_copy(temps, arg1) || !temp_is_copy(temps, arg2)) {
         return false;
     }
 
@@ -166,15 +160,15 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
     return false;
 }
 
-static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
-                             TCGArg dst, TCGArg val)
+static void tcg_opt_gen_movi(TCGContext *s, struct tcg_temp_info *temps,
+                             TCGOp *op, TCGArg *args, TCGArg dst, TCGArg val)
 {
     TCGOpcode new_op = op_to_movi(op->opc);
     tcg_target_ulong mask;
 
     op->opc = new_op;
 
-    reset_temp(dst);
+    reset_temp(temps, dst);
     temps[dst].is_const = true;
     temps[dst].val = val;
     mask = val;
@@ -188,10 +182,10 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
     args[1] = val;
 }
 
-static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
-                            TCGArg dst, TCGArg src)
+static void tcg_opt_gen_mov(TCGContext *s, struct tcg_temp_info *temps,
+                            TCGOp *op, TCGArg *args, TCGArg dst, TCGArg src)
 {
-    if (temps_are_copies(dst, src)) {
+    if (temps_are_copies(temps, dst, src)) {
         tcg_op_remove(s, op);
         return;
     }
@@ -201,7 +195,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
 
     op->opc = new_op;
 
-    reset_temp(dst);
+    reset_temp(temps, dst);
     mask = temps[src].mask;
     if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
         /* High bits of the destination are now garbage.  */
@@ -463,10 +457,11 @@ static bool do_constant_folding_cond_eq(TCGCond c)
 
 /* Return 2 if the condition can't be simplified, and the result
    of the condition (0 or 1) if it can */
-static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
-                                       TCGArg y, TCGCond c)
+static TCGArg
+do_constant_folding_cond(const struct tcg_temp_info *temps, TCGOpcode op,
+                         TCGArg x, TCGArg y, TCGCond c)
 {
-    if (temp_is_const(x) && temp_is_const(y)) {
+    if (temp_is_const(temps, x) && temp_is_const(temps, y)) {
         switch (op_bits(op)) {
         case 32:
             return do_constant_folding_cond_32(temps[x].val, temps[y].val, c);
@@ -475,9 +470,9 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
         default:
             tcg_abort();
         }
-    } else if (temps_are_copies(x, y)) {
+    } else if (temps_are_copies(temps, x, y)) {
         return do_constant_folding_cond_eq(c);
-    } else if (temp_is_const(y) && temps[y].val == 0) {
+    } else if (temp_is_const(temps, y) && temps[y].val == 0) {
         switch (c) {
         case TCG_COND_LTU:
             return 0;
@@ -492,15 +487,17 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
 
 /* Return 2 if the condition can't be simplified, and the result
    of the condition (0 or 1) if it can */
-static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
+static TCGArg
+do_constant_folding_cond2(const struct tcg_temp_info *temps, TCGArg *p1,
+                          TCGArg *p2, TCGCond c)
 {
     TCGArg al = p1[0], ah = p1[1];
     TCGArg bl = p2[0], bh = p2[1];
 
-    if (temp_is_const(bl) && temp_is_const(bh)) {
+    if (temp_is_const(temps, bl) && temp_is_const(temps, bh)) {
         uint64_t b = ((uint64_t)temps[bh].val << 32) | (uint32_t)temps[bl].val;
 
-        if (temp_is_const(al) && temp_is_const(ah)) {
+        if (temp_is_const(temps, al) && temp_is_const(temps, ah)) {
             uint64_t a;
             a = ((uint64_t)temps[ah].val << 32) | (uint32_t)temps[al].val;
             return do_constant_folding_cond_64(a, b, c);
@@ -516,18 +513,19 @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
             }
         }
     }
-    if (temps_are_copies(al, bl) && temps_are_copies(ah, bh)) {
+    if (temps_are_copies(temps, al, bl) && temps_are_copies(temps, ah, bh)) {
         return do_constant_folding_cond_eq(c);
     }
     return 2;
 }
 
-static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
+static bool swap_commutative(const struct tcg_temp_info *temps, TCGArg dest,
+                             TCGArg *p1, TCGArg *p2)
 {
     TCGArg a1 = *p1, a2 = *p2;
     int sum = 0;
-    sum += temp_is_const(a1);
-    sum -= temp_is_const(a2);
+    sum += temp_is_const(temps, a1);
+    sum -= temp_is_const(temps, a2);
 
     /* Prefer the constant in second argument, and then the form
        op a, a, b, which is better handled on non-RISC hosts. */
@@ -539,13 +537,14 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
     return false;
 }
 
-static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
+static bool swap_commutative2(const struct tcg_temp_info *temps, TCGArg *p1,
+                              TCGArg *p2)
 {
     int sum = 0;
-    sum += temp_is_const(p1[0]);
-    sum += temp_is_const(p1[1]);
-    sum -= temp_is_const(p2[0]);
-    sum -= temp_is_const(p2[1]);
+    sum += temp_is_const(temps, p1[0]);
+    sum += temp_is_const(temps, p1[1]);
+    sum -= temp_is_const(temps, p2[0]);
+    sum -= temp_is_const(temps, p2[1]);
     if (sum > 0) {
         TCGArg t;
         t = p1[0], p1[0] = p2[0], p2[0] = t;
@@ -558,6 +557,8 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 /* Propagate constants and copies, fold constant expressions. */
 void tcg_optimize(TCGContext *s)
 {
+    struct tcg_temp_info *temps;
+    unsigned long *temps_used;
     int oi, oi_next, nb_temps, nb_globals;
     TCGArg *prev_mb_args = NULL;
 
@@ -568,7 +569,8 @@ void tcg_optimize(TCGContext *s)
 
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
-    reset_all_temps(nb_temps);
+    temps = g_new(struct tcg_temp_info, nb_temps);
+    temps_used = bitmap_new(nb_temps);
 
     for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
         tcg_target_ulong mask, partmask, affected;
@@ -590,21 +592,21 @@ void tcg_optimize(TCGContext *s)
             for (i = 0; i < nb_oargs + nb_iargs; i++) {
                 tmp = args[i];
                 if (tmp != TCG_CALL_DUMMY_ARG) {
-                    init_temp_info(tmp);
+                    init_temp_info(temps, temps_used, tmp);
                 }
             }
         } else {
             nb_oargs = def->nb_oargs;
             nb_iargs = def->nb_iargs;
             for (i = 0; i < nb_oargs + nb_iargs; i++) {
-                init_temp_info(args[i]);
+                init_temp_info(temps, temps_used, args[i]);
             }
         }
 
         /* Do copy propagation */
         for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-            if (temp_is_copy(args[i])) {
-                args[i] = find_better_copy(s, args[i]);
+            if (temp_is_copy(temps, args[i])) {
+                args[i] = find_better_copy(s, temps, args[i]);
             }
         }
 
@@ -620,44 +622,44 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(nor):
         CASE_OP_32_64(muluh):
         CASE_OP_32_64(mulsh):
-            swap_commutative(args[0], &args[1], &args[2]);
+            swap_commutative(temps, args[0], &args[1], &args[2]);
             break;
         CASE_OP_32_64(brcond):
-            if (swap_commutative(-1, &args[0], &args[1])) {
+            if (swap_commutative(temps, -1, &args[0], &args[1])) {
                 args[2] = tcg_swap_cond(args[2]);
             }
             break;
         CASE_OP_32_64(setcond):
-            if (swap_commutative(args[0], &args[1], &args[2])) {
+            if (swap_commutative(temps, args[0], &args[1], &args[2])) {
                 args[3] = tcg_swap_cond(args[3]);
             }
             break;
         CASE_OP_32_64(movcond):
-            if (swap_commutative(-1, &args[1], &args[2])) {
+            if (swap_commutative(temps, -1, &args[1], &args[2])) {
                 args[5] = tcg_swap_cond(args[5]);
             }
             /* For movcond, we canonicalize the "false" input reg to match
                the destination reg so that the tcg backend can implement
                a "move if true" operation.  */
-            if (swap_commutative(args[0], &args[4], &args[3])) {
+            if (swap_commutative(temps, args[0], &args[4], &args[3])) {
                 args[5] = tcg_invert_cond(args[5]);
             }
             break;
         CASE_OP_32_64(add2):
-            swap_commutative(args[0], &args[2], &args[4]);
-            swap_commutative(args[1], &args[3], &args[5]);
+            swap_commutative(temps, args[0], &args[2], &args[4]);
+            swap_commutative(temps, args[1], &args[3], &args[5]);
             break;
         CASE_OP_32_64(mulu2):
         CASE_OP_32_64(muls2):
-            swap_commutative(args[0], &args[2], &args[3]);
+            swap_commutative(temps, args[0], &args[2], &args[3]);
             break;
         case INDEX_op_brcond2_i32:
-            if (swap_commutative2(&args[0], &args[2])) {
+            if (swap_commutative2(temps, &args[0], &args[2])) {
                 args[4] = tcg_swap_cond(args[4]);
             }
             break;
         case INDEX_op_setcond2_i32:
-            if (swap_commutative2(&args[1], &args[3])) {
+            if (swap_commutative2(temps, &args[1], &args[3])) {
                 args[5] = tcg_swap_cond(args[5]);
             }
             break;
@@ -673,8 +675,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(sar):
         CASE_OP_32_64(rotl):
         CASE_OP_32_64(rotr):
-            if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
-                tcg_opt_gen_movi(s, op, args, args[0], 0);
+            if (temp_is_const(temps, args[1]) && temps[args[1]].val == 0) {
+                tcg_opt_gen_movi(s, temps, op, args, args[0], 0);
                 continue;
             }
             break;
@@ -683,7 +685,7 @@ void tcg_optimize(TCGContext *s)
                 TCGOpcode neg_op;
                 bool have_neg;
 
-                if (temp_is_const(args[2])) {
+                if (temp_is_const(temps, args[2])) {
                     /* Proceed with possible constant folding. */
                     break;
                 }
@@ -697,9 +699,9 @@ void tcg_optimize(TCGContext *s)
                 if (!have_neg) {
                     break;
                 }
-                if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
+                if (temp_is_const(temps, args[1]) && temps[args[1]].val == 0) {
                     op->opc = neg_op;
-                    reset_temp(args[0]);
+                    reset_temp(temps, args[0]);
                     args[1] = args[2];
                     continue;
                 }
@@ -707,30 +709,30 @@ void tcg_optimize(TCGContext *s)
             break;
         CASE_OP_32_64(xor):
         CASE_OP_32_64(nand):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
+            if (!temp_is_const(temps, args[1])
+                && temp_is_const(temps, args[2]) && temps[args[2]].val == -1) {
                 i = 1;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(nor):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
+            if (!temp_is_const(temps, args[1])
+                && temp_is_const(temps, args[2]) && temps[args[2]].val == 0) {
                 i = 1;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(andc):
-            if (!temp_is_const(args[2])
-                && temp_is_const(args[1]) && temps[args[1]].val == -1) {
+            if (!temp_is_const(temps, args[2])
+                && temp_is_const(temps, args[1]) && temps[args[1]].val == -1) {
                 i = 2;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(orc):
         CASE_OP_32_64(eqv):
-            if (!temp_is_const(args[2])
-                && temp_is_const(args[1]) && temps[args[1]].val == 0) {
+            if (!temp_is_const(temps, args[2])
+                && temp_is_const(temps, args[1]) && temps[args[1]].val == 0) {
                 i = 2;
                 goto try_not;
             }
@@ -751,7 +753,7 @@ void tcg_optimize(TCGContext *s)
                     break;
                 }
                 op->opc = not_op;
-                reset_temp(args[0]);
+                reset_temp(temps, args[0]);
                 args[1] = args[i];
                 continue;
             }
@@ -771,18 +773,18 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(or):
         CASE_OP_32_64(xor):
         CASE_OP_32_64(andc):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            if (!temp_is_const(temps, args[1])
+                && temp_is_const(temps, args[2]) && temps[args[2]].val == 0) {
+                tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
                 continue;
             }
             break;
         CASE_OP_32_64(and):
         CASE_OP_32_64(orc):
         CASE_OP_32_64(eqv):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            if (!temp_is_const(temps, args[1])
+                && temp_is_const(temps, args[2]) && temps[args[2]].val == -1) {
+                tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
                 continue;
             }
             break;
@@ -819,7 +821,7 @@ void tcg_optimize(TCGContext *s)
 
         CASE_OP_32_64(and):
             mask = temps[args[2]].mask;
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
         and_const:
                 affected = temps[args[1]].mask & ~mask;
             }
@@ -838,7 +840,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(andc):
             /* Known-zeros does not imply known-ones.  Therefore unless
                args[2] is constant, we can't infer anything from it.  */
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 mask = ~temps[args[2]].mask;
                 goto and_const;
             }
@@ -847,26 +849,26 @@ void tcg_optimize(TCGContext *s)
             break;
 
         case INDEX_op_sar_i32:
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & 31;
                 mask = (int32_t)temps[args[1]].mask >> tmp;
             }
             break;
         case INDEX_op_sar_i64:
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & 63;
                 mask = (int64_t)temps[args[1]].mask >> tmp;
             }
             break;
 
         case INDEX_op_shr_i32:
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & 31;
                 mask = (uint32_t)temps[args[1]].mask >> tmp;
             }
             break;
         case INDEX_op_shr_i64:
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & 63;
                 mask = (uint64_t)temps[args[1]].mask >> tmp;
             }
@@ -880,7 +882,7 @@ void tcg_optimize(TCGContext *s)
             break;
 
         CASE_OP_32_64(shl):
-            if (temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[2])) {
                 tmp = temps[args[2]].val & (TCG_TARGET_REG_BITS - 1);
                 mask = temps[args[1]].mask << tmp;
             }
@@ -976,12 +978,12 @@ void tcg_optimize(TCGContext *s)
 
         if (partmask == 0) {
             tcg_debug_assert(nb_oargs == 1);
-            tcg_opt_gen_movi(s, op, args, args[0], 0);
+            tcg_opt_gen_movi(s, temps, op, args, args[0], 0);
             continue;
         }
         if (affected == 0) {
             tcg_debug_assert(nb_oargs == 1);
-            tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
             continue;
         }
 
@@ -991,8 +993,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(mul):
         CASE_OP_32_64(muluh):
         CASE_OP_32_64(mulsh):
-            if ((temp_is_const(args[2]) && temps[args[2]].val == 0)) {
-                tcg_opt_gen_movi(s, op, args, args[0], 0);
+            if ((temp_is_const(temps, args[2]) && temps[args[2]].val == 0)) {
+                tcg_opt_gen_movi(s, temps, op, args, args[0], 0);
                 continue;
             }
             break;
@@ -1004,8 +1006,8 @@ void tcg_optimize(TCGContext *s)
         switch (opc) {
         CASE_OP_32_64(or):
         CASE_OP_32_64(and):
-            if (temps_are_copies(args[1], args[2])) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            if (temps_are_copies(temps, args[1], args[2])) {
+                tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
                 continue;
             }
             break;
@@ -1018,8 +1020,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(andc):
         CASE_OP_32_64(sub):
         CASE_OP_32_64(xor):
-            if (temps_are_copies(args[1], args[2])) {
-                tcg_opt_gen_movi(s, op, args, args[0], 0);
+            if (temps_are_copies(temps, args[1], args[2])) {
+                tcg_opt_gen_movi(s, temps, op, args, args[0], 0);
                 continue;
             }
             break;
@@ -1032,10 +1034,10 @@ void tcg_optimize(TCGContext *s)
            allocator where needed and possible.  Also detect copies. */
         switch (opc) {
         CASE_OP_32_64(mov):
-            tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            tcg_opt_gen_mov(s, temps, op, args, args[0], args[1]);
             break;
         CASE_OP_32_64(movi):
-            tcg_opt_gen_movi(s, op, args, args[0], args[1]);
+            tcg_opt_gen_movi(s, temps, op, args, args[0], args[1]);
             break;
 
         CASE_OP_32_64(not):
@@ -1051,9 +1053,9 @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_extu_i32_i64:
         case INDEX_op_extrl_i64_i32:
         case INDEX_op_extrh_i64_i32:
-            if (temp_is_const(args[1])) {
+            if (temp_is_const(temps, args[1])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val, 0);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1080,66 +1082,70 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(divu):
         CASE_OP_32_64(rem):
         CASE_OP_32_64(remu):
-            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[1]) &&
+                temp_is_const(temps, args[2])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val,
                                           temps[args[2]].val);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(clz):
         CASE_OP_32_64(ctz):
-            if (temp_is_const(args[1])) {
+            if (temp_is_const(temps, args[1])) {
                 TCGArg v = temps[args[1]].val;
                 if (v != 0) {
                     tmp = do_constant_folding(opc, v, 0);
-                    tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                    tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 } else {
-                    tcg_opt_gen_mov(s, op, args, args[0], args[2]);
+                    tcg_opt_gen_mov(s, temps, op, args, args[0], args[2]);
                 }
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(deposit):
-            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
+            if (temp_is_const(temps, args[1]) &&
+                temp_is_const(temps, args[2])) {
                 tmp = deposit64(temps[args[1]].val, args[3], args[4],
                                 temps[args[2]].val);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(extract):
-            if (temp_is_const(args[1])) {
+            if (temp_is_const(temps, args[1])) {
                 tmp = extract64(temps[args[1]].val, args[2], args[3]);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(sextract):
-            if (temp_is_const(args[1])) {
+            if (temp_is_const(temps, args[1])) {
                 tmp = sextract64(temps[args[1]].val, args[2], args[3]);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(setcond):
-            tmp = do_constant_folding_cond(opc, args[1], args[2], args[3]);
+            tmp = do_constant_folding_cond(temps, opc, args[1], args[2],
+                                           args[3]);
             if (tmp != 2) {
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(brcond):
-            tmp = do_constant_folding_cond(opc, args[0], args[1], args[2]);
+            tmp = do_constant_folding_cond(temps, opc, args[0], args[1],
+                                           args[2]);
             if (tmp != 2) {
                 if (tmp) {
-                    reset_all_temps(nb_temps);
+                    bitmap_zero(temps_used, nb_temps);
                     op->opc = INDEX_op_br;
                     args[0] = args[3];
                 } else {
@@ -1150,12 +1156,14 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         CASE_OP_32_64(movcond):
-            tmp = do_constant_folding_cond(opc, args[1], args[2], args[5]);
+            tmp = do_constant_folding_cond(temps, opc, args[1], args[2],
+                                           args[5]);
             if (tmp != 2) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[4-tmp]);
+                tcg_opt_gen_mov(s, temps, op, args, args[0], args[4 - tmp]);
                 break;
             }
-            if (temp_is_const(args[3]) && temp_is_const(args[4])) {
+            if (temp_is_const(temps, args[3]) &&
+                temp_is_const(temps, args[4])) {
                 tcg_target_ulong tv = temps[args[3]].val;
                 tcg_target_ulong fv = temps[args[4]].val;
                 TCGCond cond = args[5];
@@ -1174,8 +1182,10 @@ void tcg_optimize(TCGContext *s)
 
         case INDEX_op_add2_i32:
         case INDEX_op_sub2_i32:
-            if (temp_is_const(args[2]) && temp_is_const(args[3])
-                && temp_is_const(args[4]) && temp_is_const(args[5])) {
+            if (temp_is_const(temps, args[2]) &&
+                temp_is_const(temps, args[3]) &&
+                temp_is_const(temps, args[4]) &&
+                temp_is_const(temps, args[5])) {
                 uint32_t al = temps[args[2]].val;
                 uint32_t ah = temps[args[3]].val;
                 uint32_t bl = temps[args[4]].val;
@@ -1194,8 +1204,8 @@ void tcg_optimize(TCGContext *s)
 
                 rl = args[0];
                 rh = args[1];
-                tcg_opt_gen_movi(s, op, args, rl, (int32_t)a);
-                tcg_opt_gen_movi(s, op2, args2, rh, (int32_t)(a >> 32));
+                tcg_opt_gen_movi(s, temps, op, args, rl, (int32_t)a);
+                tcg_opt_gen_movi(s, temps, op2, args2, rh, (int32_t)(a >> 32));
 
                 /* We've done all we need to do with the movi.  Skip it.  */
                 oi_next = op2->next;
@@ -1204,7 +1214,8 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         case INDEX_op_mulu2_i32:
-            if (temp_is_const(args[2]) && temp_is_const(args[3])) {
+            if (temp_is_const(temps, args[2]) &&
+                temp_is_const(temps, args[3])) {
                 uint32_t a = temps[args[2]].val;
                 uint32_t b = temps[args[3]].val;
                 uint64_t r = (uint64_t)a * b;
@@ -1214,8 +1225,8 @@ void tcg_optimize(TCGContext *s)
 
                 rl = args[0];
                 rh = args[1];
-                tcg_opt_gen_movi(s, op, args, rl, (int32_t)r);
-                tcg_opt_gen_movi(s, op2, args2, rh, (int32_t)(r >> 32));
+                tcg_opt_gen_movi(s, temps, op, args, rl, (int32_t)r);
+                tcg_opt_gen_movi(s, temps, op2, args2, rh, (int32_t)(r >> 32));
 
                 /* We've done all we need to do with the movi.  Skip it.  */
                 oi_next = op2->next;
@@ -1224,11 +1235,11 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         case INDEX_op_brcond2_i32:
-            tmp = do_constant_folding_cond2(&args[0], &args[2], args[4]);
+            tmp = do_constant_folding_cond2(temps, &args[0], &args[2], args[4]);
             if (tmp != 2) {
                 if (tmp) {
             do_brcond_true:
-                    reset_all_temps(nb_temps);
+                    bitmap_zero(temps_used, nb_temps);
                     op->opc = INDEX_op_br;
                     args[0] = args[5];
                 } else {
@@ -1236,12 +1247,14 @@ void tcg_optimize(TCGContext *s)
                     tcg_op_remove(s, op);
                 }
             } else if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
-                       && temp_is_const(args[2]) && temps[args[2]].val == 0
-                       && temp_is_const(args[3]) && temps[args[3]].val == 0) {
+                       && temp_is_const(temps, args[2])
+                       && temps[args[2]].val == 0
+                       && temp_is_const(temps, args[3])
+                       && temps[args[3]].val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_brcond_high:
-                reset_all_temps(nb_temps);
+                bitmap_zero(temps_used, nb_temps);
                 op->opc = INDEX_op_brcond_i32;
                 args[0] = args[1];
                 args[1] = args[3];
@@ -1250,14 +1263,14 @@ void tcg_optimize(TCGContext *s)
             } else if (args[4] == TCG_COND_EQ) {
                 /* Simplify EQ comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32,
                                                args[0], args[2], TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_brcond_false;
                 } else if (tmp == 1) {
                     goto do_brcond_high;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32,
                                                args[1], args[3], TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_brcond_false;
@@ -1265,7 +1278,7 @@ void tcg_optimize(TCGContext *s)
                     goto do_default;
                 }
             do_brcond_low:
-                reset_all_temps(nb_temps);
+                bitmap_zero(temps_used, nb_temps);
                 op->opc = INDEX_op_brcond_i32;
                 args[1] = args[2];
                 args[2] = args[4];
@@ -1273,14 +1286,14 @@ void tcg_optimize(TCGContext *s)
             } else if (args[4] == TCG_COND_NE) {
                 /* Simplify NE comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32,
                                                args[0], args[2], TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_brcond_high;
                 } else if (tmp == 1) {
                     goto do_brcond_true;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_brcond_i32,
                                                args[1], args[3], TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_brcond_low;
@@ -1294,17 +1307,19 @@ void tcg_optimize(TCGContext *s)
             break;
 
         case INDEX_op_setcond2_i32:
-            tmp = do_constant_folding_cond2(&args[1], &args[3], args[5]);
+            tmp = do_constant_folding_cond2(temps, &args[1], &args[3], args[5]);
             if (tmp != 2) {
             do_setcond_const:
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, temps, op, args, args[0], tmp);
             } else if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE)
-                       && temp_is_const(args[3]) && temps[args[3]].val == 0
-                       && temp_is_const(args[4]) && temps[args[4]].val == 0) {
+                       && temp_is_const(temps, args[3])
+                       && temps[args[3]].val == 0
+                       && temp_is_const(temps, args[4])
+                       && temps[args[4]].val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_setcond_high:
-                reset_temp(args[0]);
+                reset_temp(temps, args[0]);
                 temps[args[0]].mask = 1;
                 op->opc = INDEX_op_setcond_i32;
                 args[1] = args[2];
@@ -1313,14 +1328,14 @@ void tcg_optimize(TCGContext *s)
             } else if (args[5] == TCG_COND_EQ) {
                 /* Simplify EQ comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32,
                                                args[1], args[3], TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_setcond_const;
                 } else if (tmp == 1) {
                     goto do_setcond_high;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32,
                                                args[2], args[4], TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_setcond_high;
@@ -1328,7 +1343,7 @@ void tcg_optimize(TCGContext *s)
                     goto do_default;
                 }
             do_setcond_low:
-                reset_temp(args[0]);
+                reset_temp(temps, args[0]);
                 temps[args[0]].mask = 1;
                 op->opc = INDEX_op_setcond_i32;
                 args[2] = args[3];
@@ -1336,14 +1351,14 @@ void tcg_optimize(TCGContext *s)
             } else if (args[5] == TCG_COND_NE) {
                 /* Simplify NE comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32,
                                                args[1], args[3], TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_setcond_high;
                 } else if (tmp == 1) {
                     goto do_setcond_const;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
+                tmp = do_constant_folding_cond(temps, INDEX_op_setcond_i32,
                                                args[2], args[4], TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_setcond_low;
@@ -1360,8 +1375,8 @@ void tcg_optimize(TCGContext *s)
             if (!(args[nb_oargs + nb_iargs + 1]
                   & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
                 for (i = 0; i < nb_globals; i++) {
-                    if (test_bit(i, temps_used.l)) {
-                        reset_temp(i);
+                    if (test_bit(i, temps_used)) {
+                        reset_temp(temps, i);
                     }
                 }
             }
@@ -1375,11 +1390,11 @@ void tcg_optimize(TCGContext *s)
                block, otherwise we only trash the output args.  "mask" is
                the non-zero bits mask for the first output arg.  */
             if (def->flags & TCG_OPF_BB_END) {
-                reset_all_temps(nb_temps);
+                bitmap_zero(temps_used, nb_temps);
             } else {
         do_reset_output:
                 for (i = 0; i < nb_oargs; i++) {
-                    reset_temp(args[i]);
+                    reset_temp(temps, args[i]);
                     /* Save the corresponding known-zero bits mask for the
                        first output argument (only one supported so far). */
                     if (i == 0) {
@@ -1428,4 +1443,6 @@ void tcg_optimize(TCGContext *s)
             prev_mb_args = args;
         }
     }
+    g_free(temps);
+    g_free(temps_used);
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 36/43] tcg: introduce **tcg_ctxs to keep track of all TCGContext's
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (34 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  7:47   ` Richard Henderson
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 37/43] tcg: distribute profiling counters across TCGContext's Emilio G. Cota
                   ` (7 subsequent siblings)
  43 siblings, 1 reply; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Groundwork for supporting multiple TCG contexts.

Note that having n_tcg_ctxs is unnecessary. However, it is
convenient to have it, since it will simplify iterating over the
array: we'll have just a for loop instead of having to iterate
over a NULL-terminated array (which would require n+1 elems)
or having to check with ifdef's for usermode/softmmu.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/tcg.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index f907c47..2217314 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -115,6 +115,8 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 static void tcg_out_tb_init(TCGContext *s);
 static bool tcg_out_tb_finalize(TCGContext *s);
 
+static TCGContext **tcg_ctxs;
+static unsigned int n_tcg_ctxs;
 
 static TCGRegSet tcg_target_available_regs[2];
 static TCGRegSet tcg_target_call_clobber_regs;
@@ -382,6 +384,8 @@ void tcg_context_init(TCGContext *s)
     }
 
     tcg_ctx = s;
+    tcg_ctxs = &tcg_ctx;
+    n_tcg_ctxs = 1;
 }
 
 /*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 37/43] tcg: distribute profiling counters across TCGContext's
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (35 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 36/43] tcg: introduce **tcg_ctxs to keep track of all TCGContext's Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 38/43] util: move qemu_real_host_page_size/mask to osdep.h Emilio G. Cota
                   ` (6 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

This is groundwork for supporting multiple TCG contexts.

To avoid scalability issues when profiling info is enabled, this patch
makes the profiling info counters distributed via the following changes:

1) Consolidate profile info into its own struct, TCGProfile, which
   TCGContext also includes. Note that tcg_table_op_count is brought
   into TCGProfile after dropping the tcg_ prefix.
2) Iterate over the TCG contexts in the system to obtain the total counts.

This change also requires updating the accessors to TCGProfile fields to
use atomic_read/set whenever there may be conflicting accesses (as defined
in C11) to them.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/tcg.h                 |  38 +++++++++-------
 accel/tcg/translate-all.c |  23 +++++-----
 tcg/tcg.c                 | 110 ++++++++++++++++++++++++++++++++++++++--------
 3 files changed, 126 insertions(+), 45 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index f83f9b0..3611141 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -641,6 +641,26 @@ QEMU_BUILD_BUG_ON(OPPARAM_BUF_SIZE > (1 << 14));
 /* Make sure that we don't overflow 64 bits without noticing.  */
 QEMU_BUILD_BUG_ON(sizeof(TCGOp) > 8);
 
+typedef struct TCGProfile {
+    int64_t tb_count1;
+    int64_t tb_count;
+    int64_t op_count; /* total insn count */
+    int op_count_max; /* max insn per TB */
+    int64_t temp_count;
+    int temp_count_max;
+    int64_t del_op_count;
+    int64_t code_in_len;
+    int64_t code_out_len;
+    int64_t search_out_len;
+    int64_t interm_time;
+    int64_t code_time;
+    int64_t la_time;
+    int64_t opt_time;
+    int64_t restore_count;
+    int64_t restore_time;
+    int64_t table_op_count[NB_OPS];
+} TCGProfile;
+
 struct TCGContext {
     uint8_t *pool_cur, *pool_end;
     TCGPool *pool_first, *pool_current, *pool_first_large;
@@ -665,23 +685,7 @@ struct TCGContext {
     tcg_insn_unit *code_ptr;
 
 #ifdef CONFIG_PROFILER
-    /* profiling info */
-    int64_t tb_count1;
-    int64_t tb_count;
-    int64_t op_count; /* total insn count */
-    int op_count_max; /* max insn per TB */
-    int64_t temp_count;
-    int temp_count_max;
-    int64_t del_op_count;
-    int64_t code_in_len;
-    int64_t code_out_len;
-    int64_t search_out_len;
-    int64_t interm_time;
-    int64_t code_time;
-    int64_t la_time;
-    int64_t opt_time;
-    int64_t restore_count;
-    int64_t restore_time;
+    TCGProfile prof;
 #endif
 
 #ifdef CONFIG_DEBUG_TCG
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index e6ee4e3..36b17ac 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -312,6 +312,7 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
     uint8_t *p = tb->tc.search;
     int i, j, num_insns = tb->icount;
 #ifdef CONFIG_PROFILER
+    TCGProfile *prof = &tcg_ctx->prof;
     int64_t ti = profile_getclock();
 #endif
 
@@ -346,8 +347,9 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
     restore_state_to_opc(env, tb, data);
 
 #ifdef CONFIG_PROFILER
-    tcg_ctx->restore_time += profile_getclock() - ti;
-    tcg_ctx->restore_count++;
+    atomic_set(&prof->restore_time,
+                prof->restore_time + profile_getclock() - ti);
+    atomic_set(&prof->restore_count, prof->restore_count + 1);
 #endif
     return 0;
 }
@@ -1302,6 +1304,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tcg_insn_unit *gen_code_buf;
     int gen_code_size, search_size;
 #ifdef CONFIG_PROFILER
+    TCGProfile *prof = &tcg_ctx->prof;
     int64_t ti;
 #endif
     assert_memory_lock();
@@ -1332,8 +1335,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tcg_ctx->cf_parallel = !!(cflags & CF_PARALLEL);
 
 #ifdef CONFIG_PROFILER
-    tcg_ctx->tb_count1++; /* includes aborted translations because of
-                       exceptions */
+    /* includes aborted translations because of exceptions */
+    atomic_set(&prof->tb_count1, prof->tb_count1 + 1);
     ti = profile_getclock();
 #endif
 
@@ -1358,8 +1361,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 #endif
 
 #ifdef CONFIG_PROFILER
-    tcg_ctx->tb_count++;
-    tcg_ctx->interm_time += profile_getclock() - ti;
+    atomic_set(&prof->tb_count, prof->tb_count + 1);
+    atomic_set(&prof->interm_time, prof->interm_time + profile_getclock() - ti);
     ti = profile_getclock();
 #endif
 
@@ -1379,10 +1382,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tb->tc.size = gen_code_size;
 
 #ifdef CONFIG_PROFILER
-    tcg_ctx->code_time += profile_getclock() - ti;
-    tcg_ctx->code_in_len += tb->size;
-    tcg_ctx->code_out_len += gen_code_size;
-    tcg_ctx->search_out_len += search_size;
+    atomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti);
+    atomic_set(&prof->code_in_len, prof->code_in_len + tb->size);
+    atomic_set(&prof->code_out_len, prof->code_out_len + gen_code_size);
+    atomic_set(&prof->search_out_len, prof->search_out_len + search_size);
 #endif
 
 #ifdef DEBUG_DISAS
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 2217314..0ddd0dc 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1344,7 +1344,7 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
     memset(op, 0, sizeof(*op));
 
 #ifdef CONFIG_PROFILER
-    s->del_op_count++;
+    atomic_set(&s->prof.del_op_count, s->prof.del_op_count + 1);
 #endif
 }
 
@@ -2515,15 +2515,79 @@ static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs,
 
 #ifdef CONFIG_PROFILER
 
-static int64_t tcg_table_op_count[NB_OPS];
+/* avoid copy/paste errors */
+#define PROF_ADD(to, from, field)                       \
+    do {                                                \
+        (to)->field += atomic_read(&((from)->field));   \
+    } while (0)
+
+#define PROF_MAX(to, from, field)                                       \
+    do {                                                                \
+        typeof((from)->field) val__ = atomic_read(&((from)->field));    \
+        if (val__ > (to)->field) {                                      \
+            (to)->field = val__;                                        \
+        }                                                               \
+    } while (0)
+
+/* Pass in a zero'ed @prof */
+static inline
+void tcg_profile_snapshot(TCGProfile *prof, bool counters, bool table)
+{
+    unsigned int i;
+
+    for (i = 0; i < n_tcg_ctxs; i++) {
+        const TCGProfile *orig = &tcg_ctxs[i]->prof;
+
+        if (counters) {
+            PROF_ADD(prof, orig, tb_count1);
+            PROF_ADD(prof, orig, tb_count);
+            PROF_ADD(prof, orig, op_count);
+            PROF_MAX(prof, orig, op_count_max);
+            PROF_ADD(prof, orig, temp_count);
+            PROF_MAX(prof, orig, temp_count_max);
+            PROF_ADD(prof, orig, del_op_count);
+            PROF_ADD(prof, orig, code_in_len);
+            PROF_ADD(prof, orig, code_out_len);
+            PROF_ADD(prof, orig, search_out_len);
+            PROF_ADD(prof, orig, interm_time);
+            PROF_ADD(prof, orig, code_time);
+            PROF_ADD(prof, orig, la_time);
+            PROF_ADD(prof, orig, opt_time);
+            PROF_ADD(prof, orig, restore_count);
+            PROF_ADD(prof, orig, restore_time);
+        }
+        if (table) {
+            int i;
+
+            for (i = 0; i < NB_OPS; i++) {
+                PROF_ADD(prof, orig, table_op_count[i]);
+            }
+        }
+    }
+}
+
+#undef PROF_ADD
+#undef PROF_MAX
+
+static void tcg_profile_snapshot_counters(TCGProfile *prof)
+{
+    tcg_profile_snapshot(prof, true, false);
+}
+
+static void tcg_profile_snapshot_table(TCGProfile *prof)
+{
+    tcg_profile_snapshot(prof, false, true);
+}
 
 void tcg_dump_op_count(FILE *f, fprintf_function cpu_fprintf)
 {
+    TCGProfile prof = {};
     int i;
 
+    tcg_profile_snapshot_table(&prof);
     for (i = 0; i < NB_OPS; i++) {
         cpu_fprintf(f, "%s %" PRId64 "\n", tcg_op_defs[i].name,
-                    tcg_table_op_count[i]);
+                    prof.table_op_count[i]);
     }
 }
 #else
@@ -2536,6 +2600,9 @@ void tcg_dump_op_count(FILE *f, fprintf_function cpu_fprintf)
 
 int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 {
+#ifdef CONFIG_PROFILER
+    TCGProfile *prof = &s->prof;
+#endif
     int i, oi, oi_next, num_insns;
 
 #ifdef CONFIG_PROFILER
@@ -2543,15 +2610,15 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
         int n;
 
         n = s->gen_op_buf[0].prev + 1;
-        s->op_count += n;
-        if (n > s->op_count_max) {
-            s->op_count_max = n;
+        atomic_set(&prof->op_count, prof->op_count + n);
+        if (n > prof->op_count_max) {
+            atomic_set(&prof->op_count_max, n);
         }
 
         n = s->nb_temps;
-        s->temp_count += n;
-        if (n > s->temp_count_max) {
-            s->temp_count_max = n;
+        atomic_set(&prof->temp_count, prof->temp_count + n);
+        if (n > prof->temp_count_max) {
+            atomic_set(&prof->temp_count_max, n);
         }
     }
 #endif
@@ -2568,7 +2635,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 #endif
 
 #ifdef CONFIG_PROFILER
-    s->opt_time -= profile_getclock();
+    atomic_set(&prof->opt_time, prof->opt_time - profile_getclock());
 #endif
 
 #ifdef USE_TCG_OPTIMIZATIONS
@@ -2576,8 +2643,8 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 #endif
 
 #ifdef CONFIG_PROFILER
-    s->opt_time += profile_getclock();
-    s->la_time -= profile_getclock();
+    atomic_set(&prof->opt_time, prof->opt_time + profile_getclock());
+    atomic_set(&prof->la_time, prof->la_time - profile_getclock());
 #endif
 
     {
@@ -2605,7 +2672,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     }
 
 #ifdef CONFIG_PROFILER
-    s->la_time += profile_getclock();
+    atomic_set(&prof->la_time, prof->la_time + profile_getclock());
 #endif
 
 #ifdef DEBUG_DISAS
@@ -2636,7 +2703,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 
         oi_next = op->next;
 #ifdef CONFIG_PROFILER
-        tcg_table_op_count[opc]++;
+        atomic_set(&prof->table_op_count[opc], prof->table_op_count[opc] + 1);
 #endif
 
         switch (opc) {
@@ -2712,10 +2779,17 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 #ifdef CONFIG_PROFILER
 void tcg_dump_info(FILE *f, fprintf_function cpu_fprintf)
 {
-    TCGContext *s = tcg_ctx;
-    int64_t tb_count = s->tb_count;
-    int64_t tb_div_count = tb_count ? tb_count : 1;
-    int64_t tot = s->interm_time + s->code_time;
+    TCGProfile prof = {};
+    const TCGProfile *s;
+    int64_t tb_count;
+    int64_t tb_div_count;
+    int64_t tot;
+
+    tcg_profile_snapshot_counters(&prof);
+    s = &prof;
+    tb_count = s->tb_count;
+    tb_div_count = tb_count ? tb_count : 1;
+    tot = s->interm_time + s->code_time;
 
     cpu_fprintf(f, "JIT cycles          %" PRId64 " (%0.3f s at 2.4 GHz)\n",
                 tot, tot / 2.4e9);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 38/43] util: move qemu_real_host_page_size/mask to osdep.h
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (36 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 37/43] tcg: distribute profiling counters across TCGContext's Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 39/43] osdep: introduce qemu_mprotect_rwx/none Emilio G. Cota
                   ` (5 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

These only depend on the host and therefore belong in the common
osdep, not in a target-dependent object.

While at it, query the host during an init constructor, which guarantees
the page size will be well-defined throughout the execution of the program.

Suggested-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/cpu-all.h |  2 --
 include/qemu/osdep.h   |  6 ++++++
 exec.c                 |  4 ----
 util/pagesize.c        | 18 ++++++++++++++++++
 util/Makefile.objs     |  1 +
 5 files changed, 25 insertions(+), 6 deletions(-)
 create mode 100644 util/pagesize.c

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index ffe43d5..778031c 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -229,8 +229,6 @@ extern int target_page_bits;
 /* Using intptr_t ensures that qemu_*_page_mask is sign-extended even
  * when intptr_t is 32-bit and we are aligning a long long.
  */
-extern uintptr_t qemu_real_host_page_size;
-extern intptr_t qemu_real_host_page_mask;
 extern uintptr_t qemu_host_page_size;
 extern intptr_t qemu_host_page_mask;
 
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 3b74f6f..0cba871 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -481,6 +481,12 @@ char *qemu_get_pid_name(pid_t pid);
  */
 pid_t qemu_fork(Error **errp);
 
+/* Using intptr_t ensures that qemu_*_page_mask is sign-extended even
+ * when intptr_t is 32-bit and we are aligning a long long.
+ */
+extern uintptr_t qemu_real_host_page_size;
+extern intptr_t qemu_real_host_page_mask;
+
 extern int qemu_icache_linesize;
 extern int qemu_dcache_linesize;
 
diff --git a/exec.c b/exec.c
index 94b0f3e..6e85535 100644
--- a/exec.c
+++ b/exec.c
@@ -121,8 +121,6 @@ int use_icount;
 
 uintptr_t qemu_host_page_size;
 intptr_t qemu_host_page_mask;
-uintptr_t qemu_real_host_page_size;
-intptr_t qemu_real_host_page_mask;
 
 bool set_preferred_target_page_bits(int bits)
 {
@@ -3621,8 +3619,6 @@ void page_size_init(void)
 {
     /* NOTE: we can always suppose that qemu_host_page_size >=
        TARGET_PAGE_SIZE */
-    qemu_real_host_page_size = getpagesize();
-    qemu_real_host_page_mask = -(intptr_t)qemu_real_host_page_size;
     if (qemu_host_page_size == 0) {
         qemu_host_page_size = qemu_real_host_page_size;
     }
diff --git a/util/pagesize.c b/util/pagesize.c
new file mode 100644
index 0000000..998632c
--- /dev/null
+++ b/util/pagesize.c
@@ -0,0 +1,18 @@
+/*
+ * pagesize.c - query the host about its page size
+ *
+ * Copyright (C) 2017, Emilio G. Cota <cota@braap.org>
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+uintptr_t qemu_real_host_page_size;
+intptr_t qemu_real_host_page_mask;
+
+static void __attribute__((constructor)) init_real_host_page_size(void)
+{
+    qemu_real_host_page_size = getpagesize();
+    qemu_real_host_page_mask = -(intptr_t)qemu_real_host_page_size;
+}
diff --git a/util/Makefile.objs b/util/Makefile.objs
index 50a55ec..2973b0a 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -40,6 +40,7 @@ util-obj-y += buffer.o
 util-obj-y += timed-average.o
 util-obj-y += base64.o
 util-obj-y += log.o
+util-obj-y += pagesize.o
 util-obj-y += qdist.o
 util-obj-y += qht.o
 util-obj-y += range.o
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 39/43] osdep: introduce qemu_mprotect_rwx/none
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (37 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 38/43] util: move qemu_real_host_page_size/mask to osdep.h Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  7:49   ` Richard Henderson
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 40/43] translate-all: use qemu_protect_rwx/none helpers Emilio G. Cota
                   ` (4 subsequent siblings)
  43 siblings, 1 reply; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/qemu/osdep.h |  2 ++
 util/osdep.c         | 41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 0cba871..2c7d7db 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -348,6 +348,8 @@ void sigaction_invoke(struct sigaction *action,
 #endif
 
 int qemu_madvise(void *addr, size_t len, int advice);
+int qemu_mprotect_rwx(void *addr, size_t size);
+int qemu_mprotect_none(void *addr, size_t size);
 
 int qemu_open(const char *name, int flags, ...);
 int qemu_close(int fd);
diff --git a/util/osdep.c b/util/osdep.c
index a2863c8..f72d679 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -81,6 +81,47 @@ int qemu_madvise(void *addr, size_t len, int advice)
 #endif
 }
 
+static int qemu_mprotect__osdep(void *addr, size_t size, int prot)
+{
+    g_assert(!((uintptr_t)addr & ~qemu_real_host_page_mask));
+    g_assert(!(size & ~qemu_real_host_page_mask));
+
+#ifdef _WIN32
+    DWORD old_protect;
+
+    if (!VirtualProtect(addr, size, prot, &old_protect)) {
+        error_report("%s: VirtualProtect failed with error code %d",
+                     __func__, GetLastError());
+        return -1;
+    }
+    return 0;
+#else
+    if (mprotect(addr, size, prot)) {
+        error_report("%s: mprotect failed: %s", __func__, strerror(errno));
+        return -1;
+    }
+    return 0;
+#endif
+}
+
+int qemu_mprotect_rwx(void *addr, size_t size)
+{
+#ifdef _WIN32
+    return qemu_mprotect__osdep(addr, size, PAGE_EXECUTE_READWRITE);
+#else
+    return qemu_mprotect__osdep(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
+#endif
+}
+
+int qemu_mprotect_none(void *addr, size_t size)
+{
+#ifdef _WIN32
+    return qemu_mprotect__osdep(addr, size, PAGE_NOACCESS);
+#else
+    return qemu_mprotect__osdep(addr, size, PROT_NONE);
+#endif
+}
+
 #ifndef _WIN32
 /*
  * Dups an fd and sets the flags
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 40/43] translate-all: use qemu_protect_rwx/none helpers
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (38 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 39/43] osdep: introduce qemu_mprotect_rwx/none Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  7:51   ` Richard Henderson
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 41/43] tcg: define TCG_HIGHWATER Emilio G. Cota
                   ` (3 subsequent siblings)
  43 siblings, 1 reply; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

The helpers require the address and size to be page-aligned, so
do that before calling them.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 accel/tcg/translate-all.c | 61 ++++++++++-------------------------------------
 1 file changed, 13 insertions(+), 48 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 36b17ac..e930bac 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -604,63 +604,24 @@ static inline void *split_cross_256mb(void *buf1, size_t size1)
 static uint8_t static_code_gen_buffer[DEFAULT_CODE_GEN_BUFFER_SIZE]
     __attribute__((aligned(CODE_GEN_ALIGN)));
 
-# ifdef _WIN32
-static inline void do_protect(void *addr, long size, int prot)
-{
-    DWORD old_protect;
-    VirtualProtect(addr, size, prot, &old_protect);
-}
-
-static inline void map_exec(void *addr, long size)
-{
-    do_protect(addr, size, PAGE_EXECUTE_READWRITE);
-}
-
-static inline void map_none(void *addr, long size)
-{
-    do_protect(addr, size, PAGE_NOACCESS);
-}
-# else
-static inline void do_protect(void *addr, long size, int prot)
-{
-    uintptr_t start, end;
-
-    start = (uintptr_t)addr;
-    start &= qemu_real_host_page_mask;
-
-    end = (uintptr_t)addr + size;
-    end = ROUND_UP(end, qemu_real_host_page_size);
-
-    mprotect((void *)start, end - start, prot);
-}
-
-static inline void map_exec(void *addr, long size)
-{
-    do_protect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
-}
-
-static inline void map_none(void *addr, long size)
-{
-    do_protect(addr, size, PROT_NONE);
-}
-# endif /* WIN32 */
-
 static inline void *alloc_code_gen_buffer(void)
 {
     void *buf = static_code_gen_buffer;
+    void *end = static_code_gen_buffer + sizeof(static_code_gen_buffer);
     size_t full_size, size;
 
-    /* The size of the buffer, rounded down to end on a page boundary.  */
-    full_size = (((uintptr_t)buf + sizeof(static_code_gen_buffer))
-                 & qemu_real_host_page_mask) - (uintptr_t)buf;
+    /* page-align the beginning and end of the buffer */
+    buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
+    end = QEMU_ALIGN_PTR_DOWN(end, qemu_real_host_page_size);
 
     /* Reserve a guard page.  */
+    full_size = end - buf;
     size = full_size - qemu_real_host_page_size;
 
     /* Honor a command-line option limiting the size of the buffer.  */
     if (size > tcg_ctx->code_gen_buffer_size) {
-        size = (((uintptr_t)buf + tcg_ctx->code_gen_buffer_size)
-                & qemu_real_host_page_mask) - (uintptr_t)buf;
+        size = QEMU_ALIGN_DOWN(tcg_ctx->code_gen_buffer_size,
+                               qemu_real_host_page_size);
     }
     tcg_ctx->code_gen_buffer_size = size;
 
@@ -671,8 +632,12 @@ static inline void *alloc_code_gen_buffer(void)
     }
 #endif
 
-    map_exec(buf, size);
-    map_none(buf + size, qemu_real_host_page_size);
+    if (qemu_mprotect_rwx(buf, size)) {
+        abort();
+    }
+    if (qemu_mprotect_none(buf + size, qemu_real_host_page_size)) {
+        abort();
+    }
     qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
 
     return buf;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 41/43] tcg: define TCG_HIGHWATER
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (39 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 40/43] translate-all: use qemu_protect_rwx/none helpers Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 42/43] tcg: introduce regions to split code_gen_buffer Emilio G. Cota
                   ` (2 subsequent siblings)
  43 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

Will come in handy very soon.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/tcg.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 0ddd0dc..cb4ecbd 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -115,6 +115,8 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 static void tcg_out_tb_init(TCGContext *s);
 static bool tcg_out_tb_finalize(TCGContext *s);
 
+#define TCG_HIGHWATER 1024
+
 static TCGContext **tcg_ctxs;
 static unsigned int n_tcg_ctxs;
 
@@ -435,7 +437,7 @@ void tcg_prologue_init(TCGContext *s)
     /* Compute a high-water mark, at which we voluntarily flush the buffer
        and start over.  The size here is arbitrary, significantly larger
        than we expect the code generation for any one opcode to require.  */
-    s->code_gen_highwater = s->code_gen_buffer + (total_size - 1024);
+    s->code_gen_highwater = s->code_gen_buffer + (total_size - TCG_HIGHWATER);
 
     tcg_register_jit(s->code_gen_buffer, total_size);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 42/43] tcg: introduce regions to split code_gen_buffer
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (40 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 41/43] tcg: define TCG_HIGHWATER Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  8:04   ` Richard Henderson
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 43/43] tcg: enable multiple TCG contexts in softmmu Emilio G. Cota
  2017-07-20  4:05 ` [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts no-reply
  43 siblings, 1 reply; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

This is groundwork for supporting multiple TCG contexts.

The naive solution here is to split code_gen_buffer statically
among the TCG threads; this however results in poor utilization
if translation needs are different across TCG threads.

What we do here is to add an extra layer of indirection, assigning
regions that act just like pages do in virtual memory allocation.
(BTW if you are wondering about the chosen naming, I did not want
to use blocks or pages because those are already heavily used in QEMU).

We use a global lock to serialize allocations as well as statistics
reporting (we now export the size of the used code_gen_buffer with
tcg_code_size()). Note that for the allocator we could just use
a counter and atomic_inc; however, that would complicate the gathering
of tcg_code_size()-like stats. So given that the region operations are
not a fast path, a lock seems the most reasonable choice.

The effectiveness of this approach is clear after seeing some numbers.
I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark.
Note that I'm evaluating this after enabling per-thread TCG (which
is done by a subsequent commit).

* -smp 1, 1 region (entire buffer):
    qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357
    qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363
    qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364
    qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373
    qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373
    qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360
    qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370
    qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367

That is, 8 flushes.

* -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]:

    qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356
    qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361
    qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361
    qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375
    qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375
    qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360
    qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365
    qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368

Again, 8 flushes. Note how buffer utilization is not 100%, but it
is close. Smaller region sizes would yield higher utilization,
but we want region allocation to be rare (it acquires a lock), so
we do not want to go too small.

* -smp 8, static partitioning of 8 regions (10 MB per region):
    qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354
    qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370
    qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365
    qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377
    qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358
    qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367
    qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364
    qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358
    qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362
    qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372
    qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374
    qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376
    qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374
    qemu: flush code_size=39848888 nb_tbs=71433 avg_tb_size=372
    qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359
    qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362
    qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368
    qemu: flush code_size=33555560 nb_tbs=59514 avg_tb_size=378
    qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367
    qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364

That is, 20 flushes. Note how a static partitioning approach uses
the code buffer poorly, leading to many unnecessary flushes.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/tcg.h                 |   6 ++
 accel/tcg/translate-all.c |  63 +++++-----------
 bsd-user/main.c           |   1 +
 cpus.c                    |  12 +++
 linux-user/main.c         |   1 +
 tcg/tcg.c                 | 183 +++++++++++++++++++++++++++++++++++++++++++++-
 6 files changed, 221 insertions(+), 45 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index 3611141..3365da8 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -760,6 +760,12 @@ void *tcg_malloc_internal(TCGContext *s, int size);
 void tcg_pool_reset(TCGContext *s);
 TranslationBlock *tcg_tb_alloc(TCGContext *s);
 
+void tcg_region_init(void);
+void tcg_region_reset_all(void);
+
+size_t tcg_code_size(void);
+size_t tcg_code_capacity(void);
+
 /* Called with tb_lock held.  */
 static inline void *tcg_malloc(int size)
 {
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index e930bac..623b9e7 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -608,15 +608,13 @@ static inline void *alloc_code_gen_buffer(void)
 {
     void *buf = static_code_gen_buffer;
     void *end = static_code_gen_buffer + sizeof(static_code_gen_buffer);
-    size_t full_size, size;
+    size_t size;
 
     /* page-align the beginning and end of the buffer */
     buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
     end = QEMU_ALIGN_PTR_DOWN(end, qemu_real_host_page_size);
 
-    /* Reserve a guard page.  */
-    full_size = end - buf;
-    size = full_size - qemu_real_host_page_size;
+    size = end - buf;
 
     /* Honor a command-line option limiting the size of the buffer.  */
     if (size > tcg_ctx->code_gen_buffer_size) {
@@ -635,9 +633,6 @@ static inline void *alloc_code_gen_buffer(void)
     if (qemu_mprotect_rwx(buf, size)) {
         abort();
     }
-    if (qemu_mprotect_none(buf + size, qemu_real_host_page_size)) {
-        abort();
-    }
     qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
 
     return buf;
@@ -646,22 +641,16 @@ static inline void *alloc_code_gen_buffer(void)
 static inline void *alloc_code_gen_buffer(void)
 {
     size_t size = tcg_ctx->code_gen_buffer_size;
-    void *buf1, *buf2;
-
-    /* Perform the allocation in two steps, so that the guard page
-       is reserved but uncommitted.  */
-    buf1 = VirtualAlloc(NULL, size + qemu_real_host_page_size,
-                        MEM_RESERVE, PAGE_NOACCESS);
-    if (buf1 != NULL) {
-        buf2 = VirtualAlloc(buf1, size, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
-        assert(buf1 == buf2);
-    }
+    void *buf;
 
-    return buf1;
+    buf = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT,
+                        PAGE_EXECUTE_READWRITE);
+    return buf;
 }
 #else
 static inline void *alloc_code_gen_buffer(void)
 {
+    int prot = PROT_WRITE | PROT_READ | PROT_EXEC;
     int flags = MAP_PRIVATE | MAP_ANONYMOUS;
     uintptr_t start = 0;
     size_t size = tcg_ctx->code_gen_buffer_size;
@@ -695,8 +684,7 @@ static inline void *alloc_code_gen_buffer(void)
 #  endif
 # endif
 
-    buf = mmap((void *)start, size + qemu_real_host_page_size,
-               PROT_NONE, flags, -1, 0);
+    buf = mmap((void *)start, size, prot, flags, -1, 0);
     if (buf == MAP_FAILED) {
         return NULL;
     }
@@ -706,24 +694,23 @@ static inline void *alloc_code_gen_buffer(void)
         /* Try again, with the original still mapped, to avoid re-acquiring
            that 256mb crossing.  This time don't specify an address.  */
         size_t size2;
-        void *buf2 = mmap(NULL, size + qemu_real_host_page_size,
-                          PROT_NONE, flags, -1, 0);
+        void *buf2 = mmap(NULL, size, prot, flags, -1, 0);
         switch ((int)(buf2 != MAP_FAILED)) {
         case 1:
             if (!cross_256mb(buf2, size)) {
                 /* Success!  Use the new buffer.  */
-                munmap(buf, size + qemu_real_host_page_size);
+                munmap(buf, size);
                 break;
             }
             /* Failure.  Work with what we had.  */
-            munmap(buf2, size + qemu_real_host_page_size);
+            munmap(buf2, size);
             /* fallthru */
         default:
             /* Split the original buffer.  Free the smaller half.  */
             buf2 = split_cross_256mb(buf, size);
             size2 = tcg_ctx->code_gen_buffer_size;
             if (buf == buf2) {
-                munmap(buf + size2 + qemu_real_host_page_size, size - size2);
+                munmap(buf + size2, size - size2);
             } else {
                 munmap(buf, size - size2);
             }
@@ -734,10 +721,6 @@ static inline void *alloc_code_gen_buffer(void)
     }
 #endif
 
-    /* Make the final buffer accessible.  The guard page at the end
-       will remain inaccessible with PROT_NONE.  */
-    mprotect(buf, size, PROT_WRITE | PROT_READ | PROT_EXEC);
-
     /* Request large pages for the buffer.  */
     qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
 
@@ -918,13 +901,8 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
         size_t host_size = 0;
 
         g_tree_foreach(tb_ctx.tb_tree, tb_host_size_iter, &host_size);
-        printf("qemu: flush code_size=%td nb_tbs=%zu avg_tb_size=%zu\n",
-               tcg_ctx->code_gen_ptr - tcg_ctx->code_gen_buffer, nb_tbs,
-               nb_tbs > 0 ? host_size / nb_tbs : 0);
-    }
-    if ((unsigned long)(tcg_ctx->code_gen_ptr - tcg_ctx->code_gen_buffer)
-        > tcg_ctx->code_gen_buffer_size) {
-        cpu_abort(cpu, "Internal error: code buffer overflow\n");
+        printf("qemu: flush code_size=%zu nb_tbs=%zu avg_tb_size=%zu\n",
+               tcg_code_size(), nb_tbs, nb_tbs > 0 ? host_size / nb_tbs : 0);
     }
 
     CPU_FOREACH(cpu) {
@@ -938,7 +916,7 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
     qht_reset_size(&tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
     page_flush_tb();
 
-    tcg_ctx->code_gen_ptr = tcg_ctx->code_gen_buffer;
+    tcg_region_reset_all();
     /* XXX: flush processor icache at this point if cache flush is
        expensive */
     atomic_mb_set(&tb_ctx.tb_flush_count, tb_ctx.tb_flush_count + 1);
@@ -1279,9 +1257,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         cflags |= CF_USE_ICOUNT;
     }
 
+ buffer_overflow:
     tb = tb_alloc(pc);
     if (unlikely(!tb)) {
- buffer_overflow:
         /* flush must be done */
         tb_flush(cpu);
         mmap_unlock();
@@ -1365,9 +1343,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     }
 #endif
 
-    tcg_ctx->code_gen_ptr = (void *)
+    atomic_set(&tcg_ctx->code_gen_ptr, (void *)
         ROUND_UP((uintptr_t)gen_code_buf + gen_code_size + search_size,
-                 CODE_GEN_ALIGN);
+                 CODE_GEN_ALIGN));
 
     /* init jump list */
     assert(((uintptr_t)tb & 3) == 0);
@@ -1909,9 +1887,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
      * otherwise users might think "-tb-size" is not honoured.
      * For avg host size we use the precise numbers from tb_tree_stats though.
      */
-    cpu_fprintf(f, "gen code size       %td/%zd\n",
-                tcg_ctx->code_gen_ptr - tcg_ctx->code_gen_buffer,
-                tcg_ctx->code_gen_highwater - tcg_ctx->code_gen_buffer);
+    cpu_fprintf(f, "gen code size       %zu/%zu\n",
+                tcg_code_size(), tcg_code_capacity());
     cpu_fprintf(f, "TB count            %zu\n", nb_tbs);
     cpu_fprintf(f, "TB avg target size  %zu max=%zu bytes\n",
                 nb_tbs ? tst.target_size / nb_tbs : 0,
diff --git a/bsd-user/main.c b/bsd-user/main.c
index 7a8b29e..aa52c97 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -979,6 +979,7 @@ int main(int argc, char **argv)
        generating the prologue until now so that the prologue can take
        the real value of GUEST_BASE into account.  */
     tcg_prologue_init(tcg_ctx);
+    tcg_region_init();
 
     /* build Task State */
     memset(ts, 0, sizeof(TaskState));
diff --git a/cpus.c b/cpus.c
index 9bed61e..6022d40 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1664,6 +1664,18 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
     char thread_name[VCPU_THREAD_NAME_SIZE];
     static QemuCond *single_tcg_halt_cond;
     static QemuThread *single_tcg_cpu_thread;
+    static int tcg_region_inited;
+
+    /*
+     * Initialize TCG regions--once. Now is a good time, because:
+     * (1) TCG's init context, prologue and target globals have been set up.
+     * (2) qemu_tcg_mttcg_enabled() works now (TCG init code runs before the
+     *     -accel flag is processed, so the check doesn't work then).
+     */
+    if (!tcg_region_inited) {
+        tcg_region_inited = 1;
+        tcg_region_init();
+    }
 
     if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) {
         cpu->thread = g_malloc0(sizeof(QemuThread));
diff --git a/linux-user/main.c b/linux-user/main.c
index de7d948..aab4433 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -4460,6 +4460,7 @@ int main(int argc, char **argv, char **envp)
        generating the prologue until now so that the prologue can take
        the real value of GUEST_BASE into account.  */
     tcg_prologue_init(tcg_ctx);
+    tcg_region_init();
 
 #if defined(TARGET_I386)
     env->cr[0] = CR0_PG_MASK | CR0_WP_MASK | CR0_PE_MASK;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index cb4ecbd..22a949f 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -120,6 +120,23 @@ static bool tcg_out_tb_finalize(TCGContext *s);
 static TCGContext **tcg_ctxs;
 static unsigned int n_tcg_ctxs;
 
+/*
+ * We divide code_gen_buffer into equally-sized "regions" that TCG threads
+ * dynamically allocate from as demand dictates. Given appropriate region
+ * sizing, this minimizes flushes even when some TCG threads generate a lot
+ * more code than others.
+ */
+struct tcg_region_state {
+    QemuMutex lock;
+    void *buf;      /* set at init time */
+    size_t n;       /* set at init time */
+    size_t size;    /* size of one region; set at init time */
+    size_t current; /* protected by the lock */
+    size_t n_full;  /* protected by the lock */
+};
+
+static struct tcg_region_state region;
+
 static TCGRegSet tcg_target_available_regs[2];
 static TCGRegSet tcg_target_call_clobber_regs;
 
@@ -257,6 +274,164 @@ TCGLabel *gen_new_label(void)
 
 #include "tcg-target.inc.c"
 
+static void tcg_region_assign(TCGContext *s, size_t curr_region)
+{
+    void *buf;
+
+    buf = region.buf + curr_region * (region.size + qemu_real_host_page_size);
+    s->code_gen_buffer = buf;
+    s->code_gen_ptr = buf;
+    s->code_gen_buffer_size = region.size;
+    s->code_gen_highwater = buf + region.size - TCG_HIGHWATER;
+}
+
+static bool tcg_region_alloc__locked(TCGContext *s)
+{
+    if (region.current == region.n) {
+        return true;
+    }
+    tcg_region_assign(s, region.current);
+    region.current++;
+    return false;
+}
+
+/*
+ * Request a new region once the one in use has filled up.
+ * Returns true on error.
+ */
+static bool tcg_region_alloc(TCGContext *s)
+{
+    bool err;
+
+    qemu_mutex_lock(&region.lock);
+    err = tcg_region_alloc__locked(s);
+    if (!err) {
+        region.n_full++;
+    }
+    qemu_mutex_unlock(&region.lock);
+    return err;
+}
+
+/*
+ * Perform a context's first region allocation.
+ * This function does _not_ increment region.n_full.
+ */
+static inline bool tcg_region_initial_alloc__locked(TCGContext *s)
+{
+    return tcg_region_alloc__locked(s);
+}
+
+/* Call from a safe-work context */
+void tcg_region_reset_all(void)
+{
+    unsigned int i;
+
+    qemu_mutex_lock(&region.lock);
+    region.current = 0;
+    region.n_full = 0;
+
+    for (i = 0; i < n_tcg_ctxs; i++) {
+        bool err = tcg_region_initial_alloc__locked(tcg_ctxs[i]);
+
+        g_assert(!err);
+    }
+    qemu_mutex_unlock(&region.lock);
+}
+
+/*
+ * Initializes region partitioning.
+ *
+ * Called at init time from the parent thread (i.e. the one calling
+ * tcg_context_init), after the target's TCG globals have been set.
+ */
+void tcg_region_init(void)
+{
+    void *buf = tcg_init_ctx.code_gen_buffer;
+    size_t size = tcg_init_ctx.code_gen_buffer_size;
+    size_t region_size;
+    size_t n_regions;
+    size_t i;
+
+    /* We do not yet support multiple TCG contexts, so use one region for now */
+    n_regions = 1;
+
+    /* start on a page-aligned address */
+    buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
+    g_assert(buf < tcg_init_ctx.code_gen_buffer + size);
+
+    /* discard that initial portion */
+    size -= buf - tcg_init_ctx.code_gen_buffer;
+
+    /* make region_size a multiple of page_size */
+    region_size = size / n_regions;
+    region_size = QEMU_ALIGN_DOWN(region_size, qemu_real_host_page_size);
+
+    /* A region must have at least 2 pages; one code, one guard */
+    g_assert(region_size >= 2 * qemu_real_host_page_size);
+
+    /* init the region struct */
+    qemu_mutex_init(&region.lock);
+    region.n = n_regions;
+    region.buf = buf;
+    /* do not count the guard page in region.size */
+    region.size = region_size - qemu_real_host_page_size;
+
+    /* set guard pages */
+    for (i = 0; i < region.n; i++) {
+        void *guard;
+        int rc;
+
+        guard = region.buf + region.size;
+        guard += i * (region.size + qemu_real_host_page_size);
+        rc = qemu_mprotect_none(guard, qemu_real_host_page_size);
+        g_assert(!rc);
+    }
+
+    /* We do not yet support multiple TCG contexts so allocate the region now */
+    {
+        bool err = tcg_region_initial_alloc__locked(tcg_ctx);
+
+        g_assert(!err);
+    }
+}
+
+/*
+ * Returns the size (in bytes) of all translated code (i.e. from all regions)
+ * currently in the cache.
+ * See also: tcg_code_capacity()
+ * Do not confuse with tcg_current_code_size(); that one applies to a single
+ * TCG context.
+ */
+size_t tcg_code_size(void)
+{
+    unsigned int i;
+    size_t total;
+
+    qemu_mutex_lock(&region.lock);
+    total = region.n_full * (region.size - TCG_HIGHWATER);
+    for (i = 0; i < n_tcg_ctxs; i++) {
+        const TCGContext *s = tcg_ctxs[i];
+        size_t size;
+
+        size = atomic_read(&s->code_gen_ptr) - s->code_gen_buffer;
+        g_assert(size <= s->code_gen_buffer_size);
+        total += size;
+    }
+    qemu_mutex_unlock(&region.lock);
+    return total;
+}
+
+/*
+ * Returns the code capacity (in bytes) of the entire cache, i.e. including all
+ * regions.
+ * See also: tcg_code_size()
+ */
+size_t tcg_code_capacity(void)
+{
+    /* no need for synchronization; these variables are set at init time */
+    return region.n * (region.size - TCG_HIGHWATER);
+}
+
 /* pool based memory allocation */
 void *tcg_malloc_internal(TCGContext *s, int size)
 {
@@ -400,13 +575,17 @@ TranslationBlock *tcg_tb_alloc(TCGContext *s)
     TranslationBlock *tb;
     void *next;
 
+ retry:
     tb = (void *)ROUND_UP((uintptr_t)s->code_gen_ptr, align);
     next = (void *)ROUND_UP((uintptr_t)(tb + 1), align);
 
     if (unlikely(next > s->code_gen_highwater)) {
-        return NULL;
+        if (tcg_region_alloc(s)) {
+            return NULL;
+        }
+        goto retry;
     }
-    s->code_gen_ptr = next;
+    atomic_set(&s->code_gen_ptr, next);
     return tb;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCH v3 43/43] tcg: enable multiple TCG contexts in softmmu
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (41 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 42/43] tcg: introduce regions to split code_gen_buffer Emilio G. Cota
@ 2017-07-20  3:09 ` Emilio G. Cota
  2017-07-20  8:17   ` Richard Henderson
  2017-07-20  4:05 ` [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts no-reply
  43 siblings, 1 reply; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20  3:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

This enables parallel TCG code generation. However, we do not take
advantage of it yet since tb_lock is still held during tb_gen_code.

In user-mode we use a single TCG context; see the documentation
added to tcg_region_init for the rationale.

Note that targets do not need any conversion: targets initialize a
TCGContext (e.g. defining TCG globals), and after this initialization
has finished, the context is cloned by the vCPU threads, each of
them keeping a separate copy.

TCG threads claim one entry in tcg_ctxs[] by atomically increasing
n_tcg_ctxs. Do not be too annoyed by the subsequent atomic_read's
of that variable; they are there just to play nice with analysis
tools such as thread sanitizer.

Note that we do not allocate an array of contexts (we allocate
an array of pointers instead) because when tcg_context_init
is called, we do not know yet how many contexts we'll use since
the bool behind qemu_tcg_mttcg_enabled() isn't set yet.

Previous patches folded some TCG globals into TCGContext. The non-const
globals remaining are only set at init time, i.e. before the TCG
threads are spawned. Here is a list of these set-at-init-time globals
under tcg/:

Only written by tcg_context_init:
- indirect_reg_alloc_order
- tcg_op_defs
Only written by tcg_target_init (called from tcg_context_init):
- tcg_target_available_regs
- tcg_target_call_clobber_regs
- arm: arm_arch, use_idiv_instructions
- i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt,
        have_movbe, have_popcnt
- mips: use_movnz_instructions, use_mips32_instructions,
        use_mips32r2_instructions, got_sigill (tcg_target_detect_isa)
- ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr
- s390: tb_ret_addr, s390_facilities
- sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines),
         use_vis3_instructions

Only written by tcg_prologue_init:
- 'struct jit_code_entry one_entry'
- aarch64: tb_ret_addr
- arm: tb_ret_addr
- i386: tb_ret_addr, guest_base_flags
- ia64: tb_ret_addr
- mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/tcg.h                 |   7 ++-
 accel/tcg/translate-all.c |   2 +-
 cpus.c                    |   2 +
 linux-user/syscall.c      |   1 +
 tcg/tcg.c                 | 141 ++++++++++++++++++++++++++++++++++++++++++++--
 5 files changed, 143 insertions(+), 10 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index 3365da8..68cd14e 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -733,7 +733,7 @@ struct TCGContext {
 };
 
 extern TCGContext tcg_init_ctx;
-extern TCGContext *tcg_ctx;
+extern __thread TCGContext *tcg_ctx;
 
 static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)
 {
@@ -755,7 +755,7 @@ static inline bool tcg_op_buf_full(void)
 
 /* pool based memory allocation */
 
-/* tb_lock must be held for tcg_malloc_internal. */
+/* user-mode: tb_lock must be held for tcg_malloc_internal. */
 void *tcg_malloc_internal(TCGContext *s, int size);
 void tcg_pool_reset(TCGContext *s);
 TranslationBlock *tcg_tb_alloc(TCGContext *s);
@@ -766,7 +766,7 @@ void tcg_region_reset_all(void);
 size_t tcg_code_size(void);
 size_t tcg_code_capacity(void);
 
-/* Called with tb_lock held.  */
+/* user-mode: Called with tb_lock held.  */
 static inline void *tcg_malloc(int size)
 {
     TCGContext *s = tcg_ctx;
@@ -783,6 +783,7 @@ static inline void *tcg_malloc(int size)
 }
 
 void tcg_context_init(TCGContext *s);
+void tcg_register_thread(void);
 void tcg_prologue_init(TCGContext *s);
 void tcg_func_start(TCGContext *s);
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 623b9e7..2e810b9 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -154,7 +154,7 @@ static void *l1_map[V_L1_MAX_SIZE];
 
 /* code generation context */
 TCGContext tcg_init_ctx;
-TCGContext *tcg_ctx;
+__thread TCGContext *tcg_ctx;
 TBContext tb_ctx;
 bool parallel_cpus;
 
diff --git a/cpus.c b/cpus.c
index 6022d40..74ddd49 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1307,6 +1307,7 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
     CPUState *cpu = arg;
 
     rcu_register_thread();
+    tcg_register_thread();
 
     qemu_mutex_lock_iothread();
     qemu_thread_get_self(cpu->thread);
@@ -1454,6 +1455,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
     g_assert(!use_icount);
 
     rcu_register_thread();
+    tcg_register_thread();
 
     qemu_mutex_lock_iothread();
     qemu_thread_get_self(cpu->thread);
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 003943b..bbf7913 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6214,6 +6214,7 @@ static void *clone_func(void *arg)
     TaskState *ts;
 
     rcu_register_thread();
+    tcg_register_thread();
     env = info->env;
     cpu = ENV_GET_CPU(env);
     thread_cpu = cpu;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 22a949f..a5c01be 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -58,6 +58,7 @@
 
 #include "elf.h"
 #include "exec/log.h"
+#include "sysemu/sysemu.h"
 
 /* Forward declarations for functions declared in tcg-target.inc.c and
    used here. */
@@ -324,13 +325,14 @@ static inline bool tcg_region_initial_alloc__locked(TCGContext *s)
 /* Call from a safe-work context */
 void tcg_region_reset_all(void)
 {
+    unsigned int n_ctxs = atomic_read(&n_tcg_ctxs);
     unsigned int i;
 
     qemu_mutex_lock(&region.lock);
     region.current = 0;
     region.n_full = 0;
 
-    for (i = 0; i < n_tcg_ctxs; i++) {
+    for (i = 0; i < n_ctxs; i++) {
         bool err = tcg_region_initial_alloc__locked(tcg_ctxs[i]);
 
         g_assert(!err);
@@ -338,11 +340,71 @@ void tcg_region_reset_all(void)
     qemu_mutex_unlock(&region.lock);
 }
 
+#ifdef CONFIG_SOFTMMU
+/*
+ * It is likely that some vCPUs will translate more code than others, so we
+ * first try to set more regions than smp_cpus, with those regions being of
+ * reasonable size. If that's not possible we make do by evenly dividing
+ * the code_gen_buffer among the vCPUs.
+ */
+static size_t tcg_n_regions(void)
+{
+    size_t i;
+
+    /* Use a single region if all we have is one vCPU thread */
+    if (smp_cpus == 1 || !qemu_tcg_mttcg_enabled()) {
+        return 1;
+    }
+
+    /* Try to have more regions than smp_cpus, with each region being >= 2 MB */
+    for (i = 8; i > 0; i--) {
+        size_t regions_per_thread = i;
+        size_t region_size;
+
+        region_size = tcg_init_ctx.code_gen_buffer_size;
+        region_size /= smp_cpus * regions_per_thread;
+
+        if (region_size >= 2 * 1024u * 1024) {
+            return smp_cpus * regions_per_thread;
+        }
+    }
+    /* If we can't, then just allocate one region per vCPU thread */
+    return smp_cpus;
+}
+#else /* user-mode */
+static size_t tcg_n_regions(void)
+{
+    return 1;
+}
+#endif
+
 /*
  * Initializes region partitioning.
  *
  * Called at init time from the parent thread (i.e. the one calling
  * tcg_context_init), after the target's TCG globals have been set.
+ *
+ * Region partitioning works by splitting code_gen_buffer into separate regions,
+ * and then assigning regions to TCG threads so that the threads can translate
+ * code in parallel without synchronization.
+ *
+ * In softmmu the number of TCG threads is bounded by smp_cpus, so we use at
+ * least smp_cpus regions in MTTCG. In !MTTCG we use a single region.
+ * Note that the TCG options from the command-line (i.e. -accel accel=tcg,[...])
+ * must have been parsed before calling this function, since it calls
+ * qemu_tcg_mttcg_enabled().
+ *
+ * In user-mode we use a single region.  Having multiple regions in user-mode
+ * is not supported, because the number of vCPU threads (recall that each thread
+ * spawned by the guest corresponds to a vCPU thread) is only bounded by the
+ * OS, and usually this number is huge (tens of thousands is not uncommon).
+ * Thus, given this large bound on the number of vCPU threads and the fact
+ * that code_gen_buffer is allocated at compile-time, we cannot guarantee
+ * that the availability of at least one region per vCPU thread.
+ *
+ * However, this user-mode limitation is unlikely to be a significant problem
+ * in practice. Multi-threaded guests share most if not all of their translated
+ * code, which makes parallel code generation less appealing than in softmmu.
  */
 void tcg_region_init(void)
 {
@@ -352,8 +414,7 @@ void tcg_region_init(void)
     size_t n_regions;
     size_t i;
 
-    /* We do not yet support multiple TCG contexts, so use one region for now */
-    n_regions = 1;
+    n_regions = tcg_n_regions();
 
     /* start on a page-aligned address */
     buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
@@ -387,13 +448,69 @@ void tcg_region_init(void)
         g_assert(!rc);
     }
 
-    /* We do not yet support multiple TCG contexts so allocate the region now */
+    /* In user-mode we support only one ctx, so do the initial allocation now */
+#ifdef CONFIG_USER_ONLY
     {
         bool err = tcg_region_initial_alloc__locked(tcg_ctx);
 
         g_assert(!err);
     }
+#endif
+}
+
+/*
+ * All TCG threads except the parent (i.e. the one that called tcg_context_init
+ * and registered the target's TCG globals) must register with this function
+ * before initiating translation.
+ *
+ * In user-mode we just point tcg_ctx to tcg_init_ctx. See the documentation
+ * of tcg_region_init() for the reasoning behind this.
+ *
+ * In softmmu each caller registers its context in tcg_ctxs[]. Note that in
+ * softmmu tcg_ctxs[] does not track tcg_ctx_init, since the initial context
+ * is not used anymore for translation once this function is called.
+ *
+ * Not tracking tcg_init_ctx in tcg_ctxs[] in softmmu keeps code that iterates
+ * over the array (e.g. tcg_code_size() the same for both softmmu and user-mode.
+ */
+#ifdef CONFIG_USER_ONLY
+void tcg_register_thread(void)
+{
+    tcg_ctx = &tcg_init_ctx;
+}
+#else
+void tcg_register_thread(void)
+{
+    TCGContext *s = g_malloc(sizeof(*s));
+    unsigned int n;
+    bool err;
+
+    memcpy(s, &tcg_init_ctx, sizeof(*s));
+
+    /* claim an entry in tcg_ctxs */
+    n = atomic_fetch_inc(&n_tcg_ctxs);
+    g_assert(n < smp_cpus);
+    tcg_ctxs[n] = s;
+
+    /*
+     * Zero out s->prof in all contexts but the first.
+     * This ensures that we correctly account for the profiling info
+     * generated during initialization, since tcg_init_ctx is not
+     * tracked by the array.
+     */
+    if (n != 0) {
+#ifdef CONFIG_PROFILER
+        memset(&s->prof, 0, sizeof(s->prof));
+#endif
+    }
+
+    tcg_ctx = s;
+    qemu_mutex_lock(&region.lock);
+    err = tcg_region_initial_alloc__locked(tcg_ctx);
+    g_assert(!err);
+    qemu_mutex_unlock(&region.lock);
 }
+#endif /* !CONFIG_USER_ONLY */
 
 /*
  * Returns the size (in bytes) of all translated code (i.e. from all regions)
@@ -404,12 +521,13 @@ void tcg_region_init(void)
  */
 size_t tcg_code_size(void)
 {
+    unsigned int n_ctxs = atomic_read(&n_tcg_ctxs);
     unsigned int i;
     size_t total;
 
     qemu_mutex_lock(&region.lock);
     total = region.n_full * (region.size - TCG_HIGHWATER);
-    for (i = 0; i < n_tcg_ctxs; i++) {
+    for (i = 0; i < n_ctxs; i++) {
         const TCGContext *s = tcg_ctxs[i];
         size_t size;
 
@@ -561,8 +679,18 @@ void tcg_context_init(TCGContext *s)
     }
 
     tcg_ctx = s;
+    /*
+     * In user-mode we simply share the init context among threads, since we
+     * use a single region. See the documentation tcg_region_init() for the
+     * reasoning behind this.
+     * In softmmu we will have at most smp_cpus TCG threads.
+     */
+#ifdef CONFIG_USER_ONLY
     tcg_ctxs = &tcg_ctx;
     n_tcg_ctxs = 1;
+#else
+    tcg_ctxs = g_new(TCGContext *, smp_cpus);
+#endif
 }
 
 /*
@@ -2714,9 +2842,10 @@ static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs,
 static inline
 void tcg_profile_snapshot(TCGProfile *prof, bool counters, bool table)
 {
+    unsigned int n_ctxs = atomic_read(&n_tcg_ctxs);
     unsigned int i;
 
-    for (i = 0; i < n_tcg_ctxs; i++) {
+    for (i = 0; i < n_ctxs; i++) {
         const TCGProfile *orig = &tcg_ctxs[i]->prof;
 
         if (counters) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts
  2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
                   ` (42 preceding siblings ...)
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 43/43] tcg: enable multiple TCG contexts in softmmu Emilio G. Cota
@ 2017-07-20  4:05 ` no-reply
  43 siblings, 0 replies; 63+ messages in thread
From: no-reply @ 2017-07-20  4:05 UTC (permalink / raw)
  To: cota; +Cc: famz, qemu-devel, rth

Hi,

This series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Message-id: 1500520169-23367-1-git-send-email-cota@braap.org
Subject: [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-quick@centos6
time make docker-test-build@min-glib
time make docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/1500520169-23367-1-git-send-email-cota@braap.org -> patchew/1500520169-23367-1-git-send-email-cota@braap.org
 - [tag update]      patchew/20170719163108.26943-1-apahim@redhat.com -> patchew/20170719163108.26943-1-apahim@redhat.com
Switched to a new branch 'test'
da87a3c tcg: enable multiple TCG contexts in softmmu
043a8fe tcg: introduce regions to split code_gen_buffer
56b9b69 tcg: define TCG_HIGHWATER
e7f8206 translate-all: use qemu_protect_rwx/none helpers
22af337 osdep: introduce qemu_mprotect_rwx/none
e169f67 util: move qemu_real_host_page_size/mask to osdep.h
2cb855d tcg: distribute profiling counters across TCGContext's
bd2ea58 tcg: introduce **tcg_ctxs to keep track of all TCGContext's
317e1af tcg: dynamically allocate optimizer temps
379bf86 gen-icount: fold exitreq_label into TCGContext
7dd8ecd tcg: define tcg_init_ctx and make tcg_ctx a pointer
29b8220 tcg: take .helpers out of TCGContext
f90dd29 tcg: take tb_ctx out of TCGContext
265e0ba tci: move tci_regs to tcg_qemu_tb_exec's stack
556cc0f translate-all: report correct avg host TB size
60c2e41 exec-all: rename tb_free to tb_remove
7e9cbd3 translate-all: use a binary search tree to track TBs in TBContext
05a7053 exec-all: extract tb->tc_* into a separate struct tc_tb
9d12b48 translate-all: define and use DEBUG_TB_CHECK_GATE
e6a6556 translate-all: define and use DEBUG_TB_INVALIDATE_GATE
0db2c48 exec-all: introduce TB_PAGE_ADDR_FMT
ace4460 translate-all: define and use DEBUG_TB_FLUSH_GATE
27836fa cpu-exec: lookup/generate TB outside exclusive region during step_atomic
1e060f9 tcg: check CF_PARALLEL instead of parallel_cpus
bc98732 target/sparc: check CF_PARALLEL instead of parallel_cpus
6ffdc73 target/sh4: check CF_PARALLEL instead of parallel_cpus
ccb61ac target/s390x: check CF_PARALLEL instead of parallel_cpus
12eeba3 target/m68k: check CF_PARALLEL instead of parallel_cpus
c6c4772 target/i386: check CF_PARALLEL instead of parallel_cpus
b6c38be target/hppa: check CF_PARALLEL instead of parallel_cpus
ee35e3a target/arm: check CF_PARALLEL instead of parallel_cpus
ac51961 tcg: convert tb->cflags reads to tb_cflags(tb)
30db086 tcg: define CF_PARALLEL and use it for TB hashing
35d964c exec-all: bring tb->invalid into tb->cflags
cc370d5 tcg: consolidate TB lookups in tb_lookup__cpu_state
7948c58 tcg: remove addr argument from lookup_tb_ptr
263f3e2 tcg/mips: constify tcg_target_callee_save_regs
fc1b9cb tcg/i386: constify tcg_target_callee_save_regs
72848e1 cpu-exec: rename have_tb_lock to acquired_tb_lock in tb_find
b2ae03c translate-all: make have_tb_lock static
d2271e9 exec-all: fix typos in TranslationBlock's documentation
bba6cb2 tcg: fix corruption of code_time profiling counter upon tb_flush
50913ad cputlb: bring back tlb_flush_count under !TLB_DEBUG

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-0962jnus/src/dtc'...
Submodule path 'dtc': checked out '558cd81bdd432769b59bff01240c44f82cfb1a9d'
  BUILD   centos6
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-0962jnus/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPY    RUNNER
    RUN test-quick in qemu:centos6 
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
bison-2.4.1-5.el6.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
flex-2.5.35-9.el6.x86_64
gcc-4.4.7-18.el6.x86_64
git-1.7.1-8.el6.x86_64
glib2-devel-2.28.8-9.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache     tar git make gcc g++ flex bison     zlib-devel glib2-devel SDL-devel pixman-devel     epel-release
HOSTNAME=6a51217a1bb9
TERM=xterm
MAKEFLAGS= -j8
HISTSIZE=1000
J=8
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/var/tmp/qemu-build/install
No C++ compiler available; disabling C++ specific optional code
Install prefix    /var/tmp/qemu-build/install
BIOS directory    /var/tmp/qemu-build/install/share/qemu
binary directory  /var/tmp/qemu-build/install/bin
library directory /var/tmp/qemu-build/install/lib
module directory  /var/tmp/qemu-build/install/lib/qemu
libexec directory /var/tmp/qemu-build/install/libexec
include directory /var/tmp/qemu-build/install/include
config directory  /var/tmp/qemu-build/install/etc
local state directory   /var/tmp/qemu-build/install/var
Manual directory  /var/tmp/qemu-build/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /tmp/qemu-test/src
C compiler        cc
Host C compiler   cc
C++ compiler      
Objective-C compiler cc
ARFLAGS           rv
CFLAGS            -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g 
QEMU_CFLAGS       -I/usr/include/pixman-1   -I$(SRC_PATH)/dtc/libfdt -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv  -Wendif-labels -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all
LDFLAGS           -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
make              make
install           install
python            python -B
smbd              /usr/sbin/smbd
module support    no
host CPU          x86_64
host big endian   no
target list       x86_64-softmmu aarch64-softmmu
gprof enabled     no
sparse enabled    no
strip binaries    yes
profiler          no
static build      no
pixman            system
SDL support       yes (1.2.14)
GTK support       no 
GTK GL support    no
VTE support       no 
TLS priority      NORMAL
GNUTLS support    no
GNUTLS rnd        no
libgcrypt         no
libgcrypt kdf     no
nettle            no 
nettle kdf        no
libtasn1          no
curses support    no
virgl support     no
curl support      no
mingw32 support   no
Audio drivers     oss
Block whitelist (rw) 
Block whitelist (ro) 
VirtFS support    no
VNC support       yes
VNC SASL support  no
VNC JPEG support  no
VNC PNG support   no
xen support       no
brlapi support    no
bluez  support    no
Documentation     no
PIE               yes
vde support       no
netmap support    no
Linux AIO support no
ATTR/XATTR support yes
Install blobs     yes
KVM support       yes
HAX support       no
TCG support       yes
TCG debug enabled no
TCG interpreter   no
RDMA support      no
fdt support       yes
preadv support    yes
fdatasync         yes
madvise           yes
posix_madvise     yes
libcap-ng support no
vhost-net support yes
vhost-scsi support yes
vhost-vsock support yes
Trace backends    log
spice support     no 
rbd support       no
xfsctl support    no
smartcard support no
libusb            no
usb net redir     no
OpenGL support    no
OpenGL dmabufs    no
libiscsi support  no
libnfs support    no
build guest agent yes
QGA VSS support   no
QGA w32 disk info no
QGA MSI support   no
seccomp support   no
coroutine backend ucontext
coroutine pool    yes
debug stack usage no
crypto afalg      no
GlusterFS support no
gcov              gcov
gcov enabled      no
TPM support       yes
libssh2 support   no
TPM passthrough   yes
QOM debugging     yes
Live block migration yes
lzo support       no
snappy support    no
bzip2 support     no
NUMA host support no
tcmalloc support  no
jemalloc support  no
avx2 optimization no
replication support yes
VxHS block device no
mkdir -p dtc/libfdt
  GEN     aarch64-softmmu/config-devices.mak.tmp
  GEN     x86_64-softmmu/config-devices.mak.tmp
  GEN     config-host.h
mkdir -p dtc/tests
  GEN     qemu-options.def
  GEN     qmp-commands.h
  GEN     qapi-visit.h
  GEN     qapi-types.h
  GEN     qapi-event.h
  GEN     x86_64-softmmu/config-devices.mak
  GEN     qmp-marshal.c
  GEN     aarch64-softmmu/config-devices.mak
  GEN     qapi-types.c
  GEN     qapi-visit.c
  GEN     qapi-event.c
  GEN     qmp-introspect.h
  GEN     qmp-introspect.c
  GEN     trace/generated-tcg-tracers.h
  GEN     trace/generated-helpers-wrappers.h
  GEN     trace/generated-helpers.h
  GEN     trace/generated-helpers.c
  GEN     module_block.h
  GEN     tests/test-qapi-visit.h
  GEN     tests/test-qapi-types.h
  GEN     tests/test-qmp-commands.h
  GEN     tests/test-qapi-event.h
  GEN     tests/test-qmp-introspect.h
  GEN     trace-root.h
  GEN     util/trace.h
  GEN     crypto/trace.h
  GEN     io/trace.h
  GEN     migration/trace.h
  GEN     block/trace.h
  GEN     chardev/trace.h
  GEN     hw/block/trace.h
  GEN     hw/block/dataplane/trace.h
  GEN     hw/char/trace.h
  GEN     hw/intc/trace.h
  GEN     hw/net/trace.h
  GEN     hw/virtio/trace.h
  GEN     hw/audio/trace.h
  GEN     hw/misc/trace.h
  GEN     hw/usb/trace.h
  GEN     hw/scsi/trace.h
  GEN     hw/nvram/trace.h
  GEN     hw/display/trace.h
  GEN     hw/input/trace.h
  GEN     hw/timer/trace.h
  GEN     hw/dma/trace.h
  GEN     hw/sparc/trace.h
  GEN     hw/sd/trace.h
  GEN     hw/isa/trace.h
  GEN     hw/mem/trace.h
  GEN     hw/i386/trace.h
  GEN     hw/i386/xen/trace.h
  GEN     hw/9pfs/trace.h
  GEN     hw/ppc/trace.h
  GEN     hw/pci/trace.h
  GEN     hw/s390x/trace.h
  GEN     hw/vfio/trace.h
  GEN     hw/acpi/trace.h
  GEN     hw/arm/trace.h
  GEN     hw/alpha/trace.h
  GEN     hw/xen/trace.h
  GEN     ui/trace.h
  GEN     audio/trace.h
  GEN     net/trace.h
  GEN     target/arm/trace.h
  GEN     target/i386/trace.h
  GEN     target/mips/trace.h
  GEN     target/sparc/trace.h
  GEN     target/s390x/trace.h
  GEN     target/ppc/trace.h
  GEN     qom/trace.h
  GEN     linux-user/trace.h
  GEN     qapi/trace.h
  GEN     accel/tcg/trace.h
  GEN     accel/kvm/trace.h
  GEN     nbd/trace.h
  GEN     trace-root.c
  GEN     util/trace.c
  GEN     crypto/trace.c
  GEN     io/trace.c
  GEN     migration/trace.c
  GEN     block/trace.c
  GEN     chardev/trace.c
  GEN     hw/block/trace.c
  GEN     hw/block/dataplane/trace.c
  GEN     hw/char/trace.c
  GEN     hw/intc/trace.c
  GEN     hw/net/trace.c
  GEN     hw/virtio/trace.c
  GEN     hw/audio/trace.c
  GEN     hw/misc/trace.c
  GEN     hw/usb/trace.c
  GEN     hw/scsi/trace.c
  GEN     hw/nvram/trace.c
  GEN     hw/display/trace.c
  GEN     hw/input/trace.c
  GEN     hw/timer/trace.c
  GEN     hw/dma/trace.c
  GEN     hw/sparc/trace.c
  GEN     hw/sd/trace.c
  GEN     hw/isa/trace.c
  GEN     hw/mem/trace.c
  GEN     hw/i386/trace.c
  GEN     hw/i386/xen/trace.c
  GEN     hw/9pfs/trace.c
  GEN     hw/ppc/trace.c
  GEN     hw/pci/trace.c
  GEN     hw/s390x/trace.c
  GEN     hw/vfio/trace.c
  GEN     hw/acpi/trace.c
  GEN     hw/arm/trace.c
  GEN     hw/alpha/trace.c
  GEN     hw/xen/trace.c
  GEN     ui/trace.c
  GEN     audio/trace.c
  GEN     net/trace.c
  GEN     target/arm/trace.c
  GEN     target/i386/trace.c
  GEN     target/mips/trace.c
  GEN     target/sparc/trace.c
  GEN     target/s390x/trace.c
  GEN     target/ppc/trace.c
  GEN     qom/trace.c
  GEN     linux-user/trace.c
  GEN     qapi/trace.c
  GEN     accel/tcg/trace.c
  GEN     accel/kvm/trace.c
  GEN     nbd/trace.c
  GEN     config-all-devices.mak
	 DEP /tmp/qemu-test/src/dtc/tests/dumptrees.c
	 DEP /tmp/qemu-test/src/dtc/tests/trees.S
	 DEP /tmp/qemu-test/src/dtc/tests/testutils.c
	 DEP /tmp/qemu-test/src/dtc/tests/value-labels.c
	 DEP /tmp/qemu-test/src/dtc/tests/asm_tree_dump.c
	 DEP /tmp/qemu-test/src/dtc/tests/truncated_property.c
	 DEP /tmp/qemu-test/src/dtc/tests/check_path.c
	 DEP /tmp/qemu-test/src/dtc/tests/overlay_bad_fixup.c
	 DEP /tmp/qemu-test/src/dtc/tests/overlay.c
	 DEP /tmp/qemu-test/src/dtc/tests/subnode_iterate.c
	 DEP /tmp/qemu-test/src/dtc/tests/property_iterate.c
	 DEP /tmp/qemu-test/src/dtc/tests/integer-expressions.c
	 DEP /tmp/qemu-test/src/dtc/tests/utilfdt_test.c
	 DEP /tmp/qemu-test/src/dtc/tests/path_offset_aliases.c
	 DEP /tmp/qemu-test/src/dtc/tests/add_subnode_with_nops.c
	 DEP /tmp/qemu-test/src/dtc/tests/dtbs_equal_unordered.c
	 DEP /tmp/qemu-test/src/dtc/tests/dtb_reverse.c
	 DEP /tmp/qemu-test/src/dtc/tests/dtbs_equal_ordered.c
	 DEP /tmp/qemu-test/src/dtc/tests/extra-terminating-null.c
	 DEP /tmp/qemu-test/src/dtc/tests/incbin.c
	 DEP /tmp/qemu-test/src/dtc/tests/boot-cpuid.c
	 DEP /tmp/qemu-test/src/dtc/tests/phandle_format.c
	 DEP /tmp/qemu-test/src/dtc/tests/path-references.c
	 DEP /tmp/qemu-test/src/dtc/tests/references.c
	 DEP /tmp/qemu-test/src/dtc/tests/string_escapes.c
	 DEP /tmp/qemu-test/src/dtc/tests/propname_escapes.c
	 DEP /tmp/qemu-test/src/dtc/tests/appendprop2.c
	 DEP /tmp/qemu-test/src/dtc/tests/appendprop1.c
	 DEP /tmp/qemu-test/src/dtc/tests/del_node.c
	 DEP /tmp/qemu-test/src/dtc/tests/del_property.c
	 DEP /tmp/qemu-test/src/dtc/tests/setprop.c
	 DEP /tmp/qemu-test/src/dtc/tests/set_name.c
	 DEP /tmp/qemu-test/src/dtc/tests/rw_tree1.c
	 DEP /tmp/qemu-test/src/dtc/tests/open_pack.c
	 DEP /tmp/qemu-test/src/dtc/tests/nopulate.c
	 DEP /tmp/qemu-test/src/dtc/tests/mangle-layout.c
	 DEP /tmp/qemu-test/src/dtc/tests/move_and_save.c
	 DEP /tmp/qemu-test/src/dtc/tests/sw_tree1.c
	 DEP /tmp/qemu-test/src/dtc/tests/nop_node.c
	 DEP /tmp/qemu-test/src/dtc/tests/nop_property.c
	 DEP /tmp/qemu-test/src/dtc/tests/setprop_inplace.c
	 DEP /tmp/qemu-test/src/dtc/tests/stringlist.c
	 DEP /tmp/qemu-test/src/dtc/tests/addr_size_cells.c
	 DEP /tmp/qemu-test/src/dtc/tests/sized_cells.c
	 DEP /tmp/qemu-test/src/dtc/tests/notfound.c
	 DEP /tmp/qemu-test/src/dtc/tests/char_literal.c
	 DEP /tmp/qemu-test/src/dtc/tests/get_alias.c
	 DEP /tmp/qemu-test/src/dtc/tests/node_offset_by_compatible.c
	 DEP /tmp/qemu-test/src/dtc/tests/node_check_compatible.c
	 DEP /tmp/qemu-test/src/dtc/tests/node_offset_by_phandle.c
	 DEP /tmp/qemu-test/src/dtc/tests/node_offset_by_prop_value.c
	 DEP /tmp/qemu-test/src/dtc/tests/parent_offset.c
	 DEP /tmp/qemu-test/src/dtc/tests/supernode_atdepth_offset.c
	 DEP /tmp/qemu-test/src/dtc/tests/get_path.c
	 DEP /tmp/qemu-test/src/dtc/tests/get_phandle.c
	 DEP /tmp/qemu-test/src/dtc/tests/getprop.c
	 DEP /tmp/qemu-test/src/dtc/tests/get_name.c
	 DEP /tmp/qemu-test/src/dtc/tests/path_offset.c
	 DEP /tmp/qemu-test/src/dtc/tests/subnode_offset.c
	 DEP /tmp/qemu-test/src/dtc/tests/find_property.c
	 DEP /tmp/qemu-test/src/dtc/tests/root_node.c
	 DEP /tmp/qemu-test/src/dtc/tests/get_mem_rsv.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_overlay.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_addresses.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_empty_tree.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_strerror.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_rw.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_sw.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_wip.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_ro.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt.c
	 DEP /tmp/qemu-test/src/dtc/util.c
	 DEP /tmp/qemu-test/src/dtc/fdtput.c
	 DEP /tmp/qemu-test/src/dtc/fdtget.c
	 DEP /tmp/qemu-test/src/dtc/fdtdump.c
	 LEX convert-dtsv0-lexer.lex.c
	 DEP /tmp/qemu-test/src/dtc/srcpos.c
	 BISON dtc-parser.tab.c
	 LEX dtc-lexer.lex.c
	 DEP /tmp/qemu-test/src/dtc/treesource.c
	 DEP /tmp/qemu-test/src/dtc/fstree.c
	 DEP /tmp/qemu-test/src/dtc/dtc.c
	 DEP /tmp/qemu-test/src/dtc/flattree.c
	 DEP /tmp/qemu-test/src/dtc/livetree.c
	 DEP /tmp/qemu-test/src/dtc/data.c
	 DEP /tmp/qemu-test/src/dtc/checks.c
	 DEP convert-dtsv0-lexer.lex.c
	 DEP dtc-lexer.lex.c
	 DEP dtc-parser.tab.c
	CHK version_gen.h
	UPD version_gen.h
	 DEP /tmp/qemu-test/src/dtc/util.c
	 CC libfdt/fdt.o
	 CC libfdt/fdt_ro.o
	 CC libfdt/fdt_wip.o
	 CC libfdt/fdt_sw.o
	 CC libfdt/fdt_rw.o
	 CC libfdt/fdt_strerror.o
	 CC libfdt/fdt_empty_tree.o
	 CC libfdt/fdt_addresses.o
	 CC libfdt/fdt_overlay.o
	 AR libfdt/libfdt.a
ar: creating libfdt/libfdt.a
a - libfdt/fdt.o
a - libfdt/fdt_ro.o
a - libfdt/fdt_wip.o
a - libfdt/fdt_sw.o
a - libfdt/fdt_rw.o
a - libfdt/fdt_strerror.o
a - libfdt/fdt_empty_tree.o
a - libfdt/fdt_addresses.o
a - libfdt/fdt_overlay.o
  CC      tests/qemu-iotests/socket_scm_helper.o
  GEN     qga/qapi-generated/qga-qapi-types.h
  GEN     qga/qapi-generated/qga-qapi-visit.h
  GEN     qga/qapi-generated/qga-qapi-types.c
  CC      qmp-introspect.o
  GEN     qga/qapi-generated/qga-qmp-commands.h
  GEN     qga/qapi-generated/qga-qapi-visit.c
  GEN     qga/qapi-generated/qga-qmp-marshal.c
  CC      qapi-types.o
  CC      qapi-visit.o
  CC      qapi-event.o
  CC      qapi/qapi-visit-core.o
  CC      qapi/qapi-dealloc-visitor.o
  CC      qapi/qobject-input-visitor.o
  CC      qapi/qobject-output-visitor.o
  CC      qapi/qmp-registry.o
  CC      qapi/qmp-dispatch.o
  CC      qapi/string-input-visitor.o
  CC      qapi/string-output-visitor.o
  CC      qapi/opts-visitor.o
  CC      qapi/qapi-clone-visitor.o
  CC      qapi/qmp-event.o
  CC      qapi/qapi-util.o
  CC      qobject/qnull.o
  CC      qobject/qnum.o
  CC      qobject/qstring.o
  CC      qobject/qdict.o
  CC      qobject/qlist.o
  CC      qobject/qbool.o
  CC      qobject/qjson.o
  CC      qobject/qobject.o
  CC      qobject/json-lexer.o
  CC      qobject/json-streamer.o
  CC      qobject/json-parser.o
  CC      trace/control.o
  CC      trace/qmp.o
  CC      util/osdep.o
  CC      util/cutils.o
  CC      util/unicode.o
  CC      util/qemu-timer-common.o
  CC      util/bufferiszero.o
  CC      util/lockcnt.o
  CC      util/aiocb.o
  CC      util/async.o
  CC      util/thread-pool.o
  CC      util/qemu-timer.o
  CC      util/main-loop.o
  CC      util/iohandler.o
  CC      util/aio-posix.o
  CC      util/compatfd.o
  CC      util/event_notifier-posix.o
  CC      util/mmap-alloc.o
  CC      util/oslib-posix.o
  CC      util/qemu-openpty.o
  CC      util/qemu-thread-posix.o
  CC      util/memfd.o
  CC      util/envlist.o
  CC      util/path.o
  CC      util/module.o
  CC      util/host-utils.o
  CC      util/bitmap.o
  CC      util/bitops.o
  CC      util/hbitmap.o
  CC      util/fifo8.o
  CC      util/acl.o
  CC      util/cacheinfo.o
  CC      util/error.o
  CC      util/qemu-error.o
  CC      util/id.o
  CC      util/iov.o
  CC      util/qemu-config.o
  CC      util/qemu-sockets.o
  CC      util/uri.o
  CC      util/notify.o
  CC      util/qemu-option.o
  CC      util/qemu-progress.o
  CC      util/keyval.o
  CC      util/hexdump.o
  CC      util/crc32c.o
  CC      util/uuid.o
  CC      util/throttle.o
  CC      util/getauxval.o
  CC      util/readline.o
  CC      util/rcu.o
  CC      util/qemu-coroutine.o
  CC      util/qemu-coroutine-lock.o
  CC      util/qemu-coroutine-io.o
  CC      util/qemu-coroutine-sleep.o
  CC      util/coroutine-ucontext.o
  CC      util/buffer.o
  CC      util/base64.o
  CC      util/timed-average.o
  CC      util/log.o
  CC      util/pagesize.o
  CC      util/qdist.o
  CC      util/qht.o
  CC      util/range.o
  CC      util/stats64.o
  CC      util/systemd.o
  CC      trace-root.o
  CC      util/trace.o
  CC      crypto/trace.o
  CC      io/trace.o
  CC      migration/trace.o
  CC      block/trace.o
  CC      chardev/trace.o
  CC      hw/block/trace.o
  CC      hw/block/dataplane/trace.o
  CC      hw/char/trace.o
  CC      hw/intc/trace.o
  CC      hw/net/trace.o
  CC      hw/virtio/trace.o
  CC      hw/audio/trace.o
  CC      hw/misc/trace.o
  CC      hw/usb/trace.o
  CC      hw/scsi/trace.o
  CC      hw/nvram/trace.o
  CC      hw/display/trace.o
  CC      hw/input/trace.o
  CC      hw/timer/trace.o
  CC      hw/dma/trace.o
  CC      hw/sparc/trace.o
  CC      hw/sd/trace.o
  CC      hw/isa/trace.o
  CC      hw/mem/trace.o
  CC      hw/i386/trace.o
  CC      hw/i386/xen/trace.o
  CC      hw/9pfs/trace.o
  CC      hw/ppc/trace.o
  CC      hw/pci/trace.o
  CC      hw/s390x/trace.o
  CC      hw/vfio/trace.o
  CC      hw/acpi/trace.o
  CC      hw/arm/trace.o
  CC      hw/alpha/trace.o
  CC      hw/xen/trace.o
  CC      ui/trace.o
  CC      audio/trace.o
  CC      net/trace.o
  CC      target/arm/trace.o
  CC      target/i386/trace.o
  CC      target/mips/trace.o
  CC      target/sparc/trace.o
  CC      target/s390x/trace.o
  CC      target/ppc/trace.o
  CC      qom/trace.o
  CC      linux-user/trace.o
  CC      qapi/trace.o
  CC      accel/tcg/trace.o
  CC      accel/kvm/trace.o
  CC      nbd/trace.o
  CC      crypto/pbkdf-stub.o
  CC      stubs/arch-query-cpu-def.o
  CC      stubs/arch-query-cpu-model-expansion.o
  CC      stubs/arch-query-cpu-model-comparison.o
  CC      stubs/arch-query-cpu-model-baseline.o
  CC      stubs/bdrv-next-monitor-owned.o
  CC      stubs/blk-commit-all.o
  CC      stubs/blockdev-close-all-bdrv-states.o
  CC      stubs/cpu-get-clock.o
  CC      stubs/cpu-get-icount.o
  CC      stubs/clock-warp.o
  CC      stubs/dump.o
  CC      stubs/error-printf.o
  CC      stubs/fdset.o
  CC      stubs/gdbstub.o
  CC      stubs/get-vm-name.o
  CC      stubs/iothread.o
  CC      stubs/iothread-lock.o
  CC      stubs/is-daemonized.o
  CC      stubs/machine-init-done.o
  CC      stubs/migr-blocker.o
  CC      stubs/monitor.o
  CC      stubs/notify-event.o
  CC      stubs/qtest.o
  CC      stubs/runstate-check.o
  CC      stubs/replay.o
  CC      stubs/set-fd-handler.o
  CC      stubs/slirp.o
  CC      stubs/sysbus.o
  CC      stubs/trace-control.o
  CC      stubs/uuid.o
  CC      stubs/vm-stop.o
  CC      stubs/vmstate.o
  CC      stubs/qmp_pc_dimm_device_list.o
  CC      stubs/target-monitor-defs.o
  CC      stubs/target-get-monitor-def.o
  CC      stubs/pc_madt_cpu_entry.o
  CC      stubs/vmgenid.o
  CC      stubs/xen-common.o
  CC      stubs/xen-hvm.o
  CC      contrib/ivshmem-client/ivshmem-client.o
  CC      contrib/ivshmem-client/main.o
  CC      contrib/ivshmem-server/ivshmem-server.o
  CC      contrib/ivshmem-server/main.o
  CC      qemu-nbd.o
  CC      block.o
  CC      blockjob.o
  CC      qemu-io-cmds.o
  CC      replication.o
  CC      block/raw-format.o
  CC      block/qcow.o
  CC      block/vdi.o
  CC      block/vmdk.o
  CC      block/cloop.o
  CC      block/bochs.o
  CC      block/vpc.o
  CC      block/vvfat.o
  CC      block/dmg.o
  CC      block/qcow2.o
  CC      block/qcow2-refcount.o
  CC      block/qcow2-cluster.o
  CC      block/qcow2-snapshot.o
  CC      block/qcow2-cache.o
  CC      block/qcow2-bitmap.o
  CC      block/qed.o
  CC      block/qed-l2-cache.o
  CC      block/qed-table.o
  CC      block/qed-cluster.o
  CC      block/qed-check.o
  CC      block/vhdx.o
  CC      block/vhdx-endian.o
  CC      block/vhdx-log.o
  CC      block/quorum.o
  CC      block/parallels.o
  CC      block/blkdebug.o
  CC      block/blkverify.o
  CC      block/blkreplay.o
  CC      block/block-backend.o
  CC      block/snapshot.o
  CC      block/qapi.o
  CC      block/file-posix.o
  CC      block/null.o
  CC      block/mirror.o
  CC      block/commit.o
  CC      block/io.o
  CC      block/throttle-groups.o
  CC      block/nbd.o
  CC      block/nbd-client.o
  CC      block/sheepdog.o
  CC      block/accounting.o
  CC      block/dirty-bitmap.o
  CC      block/write-threshold.o
  CC      block/backup.o
  CC      block/replication.o
  CC      block/crypto.o
  CC      nbd/server.o
  CC      nbd/client.o
  CC      nbd/common.o
  CC      crypto/init.o
  CC      crypto/hash.o
  CC      crypto/hash-glib.o
  CC      crypto/hmac.o
  CC      crypto/hmac-glib.o
  CC      crypto/aes.o
  CC      crypto/desrfb.o
  CC      crypto/cipher.o
  CC      crypto/tlscreds.o
  CC      crypto/tlscredsanon.o
  CC      crypto/tlscredsx509.o
  CC      crypto/tlssession.o
  CC      crypto/secret.o
  CC      crypto/random-platform.o
  CC      crypto/pbkdf.o
  CC      crypto/ivgen.o
  CC      crypto/ivgen-essiv.o
  CC      crypto/ivgen-plain.o
  CC      crypto/ivgen-plain64.o
  CC      crypto/afsplit.o
  CC      crypto/xts.o
  CC      crypto/block.o
  CC      crypto/block-qcow.o
  CC      crypto/block-luks.o
  CC      io/channel.o
  CC      io/channel-buffer.o
  CC      io/channel-command.o
  CC      io/channel-file.o
  CC      io/channel-socket.o
  CC      io/channel-tls.o
  CC      io/channel-watch.o
  CC      io/channel-websock.o
  CC      io/channel-util.o
  CC      io/dns-resolver.o
  CC      io/task.o
  CC      qom/object.o
  CC      qom/container.o
  CC      qom/qom-qobject.o
  CC      qom/object_interfaces.o
  GEN     qemu-img-cmds.h
  CC      qemu-io.o
  CC      qemu-bridge-helper.o
  CC      blockdev.o
  CC      blockdev-nbd.o
  CC      bootdevice.o
  CC      iothread.o
  CC      qdev-monitor.o
  CC      device-hotplug.o
  CC      os-posix.o
  CC      bt-host.o
  CC      bt-vhci.o
  CC      dma-helpers.o
  CC      vl.o
  CC      tpm.o
  CC      device_tree.o
  CC      qmp-marshal.o
  CC      qmp.o
  CC      hmp.o
  CC      cpus-common.o
  CC      audio/audio.o
  CC      audio/noaudio.o
  CC      audio/wavaudio.o
  CC      audio/mixeng.o
  CC      audio/sdlaudio.o
  CC      audio/ossaudio.o
  CC      audio/wavcapture.o
  CC      backends/rng.o
  CC      backends/rng-egd.o
  CC      backends/rng-random.o
  CC      backends/tpm.o
  CC      backends/hostmem.o
  CC      backends/hostmem-ram.o
  CC      backends/hostmem-file.o
  CC      backends/cryptodev.o
  CC      backends/cryptodev-builtin.o
  CC      block/stream.o
  CC      chardev/msmouse.o
  CC      chardev/wctablet.o
  CC      chardev/testdev.o
  CC      disas/arm.o
  CC      disas/i386.o
  CC      fsdev/qemu-fsdev-dummy.o
  CC      fsdev/qemu-fsdev-opts.o
  CC      fsdev/qemu-fsdev-throttle.o
  CC      hw/acpi/core.o
  CC      hw/acpi/piix4.o
  CC      hw/acpi/pcihp.o
  CC      hw/acpi/ich9.o
  CC      hw/acpi/tco.o
  CC      hw/acpi/cpu_hotplug.o
  CC      hw/acpi/memory_hotplug.o
  CC      hw/acpi/cpu.o
  CC      hw/acpi/nvdimm.o
  CC      hw/acpi/vmgenid.o
  CC      hw/acpi/acpi_interface.o
  CC      hw/acpi/bios-linker-loader.o
  CC      hw/acpi/aml-build.o
  CC      hw/acpi/ipmi.o
  CC      hw/acpi/acpi-stub.o
  CC      hw/acpi/ipmi-stub.o
  CC      hw/audio/sb16.o
  CC      hw/audio/es1370.o
  CC      hw/audio/ac97.o
  CC      hw/audio/fmopl.o
  CC      hw/audio/adlib.o
  CC      hw/audio/gus.o
  CC      hw/audio/gusemu_hal.o
  CC      hw/audio/gusemu_mixer.o
  CC      hw/audio/cs4231a.o
  CC      hw/audio/intel-hda.o
  CC      hw/audio/hda-codec.o
  CC      hw/audio/pcspk.o
  CC      hw/audio/wm8750.o
  CC      hw/audio/pl041.o
  CC      hw/audio/lm4549.o
  CC      hw/audio/marvell_88w8618.o
  CC      hw/audio/soundhw.o
  CC      hw/block/block.o
  CC      hw/block/cdrom.o
  CC      hw/block/hd-geometry.o
  CC      hw/block/fdc.o
  CC      hw/block/m25p80.o
  CC      hw/block/nand.o
  CC      hw/block/pflash_cfi01.o
  CC      hw/block/pflash_cfi02.o
  CC      hw/block/ecc.o
  CC      hw/block/onenand.o
  CC      hw/block/nvme.o
  CC      hw/bt/core.o
  CC      hw/bt/l2cap.o
  CC      hw/bt/sdp.o
  CC      hw/bt/hci.o
  CC      hw/bt/hid.o
  CC      hw/bt/hci-csr.o
  CC      hw/char/ipoctal232.o
  CC      hw/char/parallel.o
  CC      hw/char/pl011.o
  CC      hw/char/serial.o
  CC      hw/char/serial-isa.o
  CC      hw/char/serial-pci.o
  CC      hw/char/virtio-console.o
  CC      hw/char/cadence_uart.o
  CC      hw/char/cmsdk-apb-uart.o
  CC      hw/char/debugcon.o
  CC      hw/char/imx_serial.o
  CC      hw/core/qdev.o
  CC      hw/core/qdev-properties.o
  CC      hw/core/bus.o
  CC      hw/core/fw-path-provider.o
  CC      hw/core/reset.o
  CC      hw/core/irq.o
  CC      hw/core/hotplug.o
  CC      hw/core/nmi.o
  CC      hw/core/ptimer.o
  CC      hw/core/sysbus.o
  CC      hw/core/machine.o
  CC      hw/core/loader.o
  CC      hw/core/qdev-properties-system.o
  CC      hw/core/register.o
  CC      hw/core/or-irq.o
  CC      hw/core/platform-bus.o
  CC      hw/cpu/core.o
  CC      hw/display/ads7846.o
  CC      hw/display/cirrus_vga.o
  CC      hw/display/pl110.o
  CC      hw/display/ssd0303.o
  CC      hw/display/ssd0323.o
  CC      hw/display/vga-pci.o
  CC      hw/display/vga-isa.o
  CC      hw/display/vmware_vga.o
  CC      hw/display/blizzard.o
  CC      hw/display/exynos4210_fimd.o
  CC      hw/display/framebuffer.o
  CC      hw/display/tc6393xb.o
  CC      hw/dma/pl080.o
  CC      hw/dma/pl330.o
  CC      hw/dma/i8257.o
  CC      hw/dma/xlnx-zynq-devcfg.o
  CC      hw/gpio/max7310.o
  CC      hw/gpio/pl061.o
  CC      hw/gpio/zaurus.o
  CC      hw/gpio/gpio_key.o
  CC      hw/i2c/core.o
  CC      hw/i2c/smbus.o
  CC      hw/i2c/smbus_eeprom.o
  CC      hw/i2c/i2c-ddc.o
  CC      hw/i2c/versatile_i2c.o
  CC      hw/i2c/smbus_ich9.o
  CC      hw/i2c/pm_smbus.o
  CC      hw/i2c/bitbang_i2c.o
  CC      hw/i2c/exynos4210_i2c.o
  CC      hw/i2c/imx_i2c.o
  CC      hw/i2c/aspeed_i2c.o
  CC      hw/ide/core.o
  CC      hw/ide/atapi.o
  CC      hw/ide/qdev.o
  CC      hw/ide/pci.o
  CC      hw/ide/isa.o
  CC      hw/ide/piix.o
  CC      hw/ide/microdrive.o
  CC      hw/ide/ahci.o
  CC      hw/ide/ich.o
  CC      hw/input/hid.o
  CC      hw/input/lm832x.o
  CC      hw/input/pckbd.o
  CC      hw/input/pl050.o
  CC      hw/input/ps2.o
  CC      hw/input/stellaris_input.o
  CC      hw/input/tsc2005.o
  CC      hw/input/vmmouse.o
  CC      hw/input/virtio-input.o
  CC      hw/input/virtio-input-hid.o
  CC      hw/input/virtio-input-host.o
  CC      hw/intc/i8259_common.o
  CC      hw/intc/i8259.o
  CC      hw/intc/pl190.o
  CC      hw/intc/imx_avic.o
  CC      hw/intc/realview_gic.o
  CC      hw/intc/ioapic_common.o
  CC      hw/intc/arm_gic_common.o
  CC      hw/intc/arm_gic.o
  CC      hw/intc/arm_gicv2m.o
  CC      hw/intc/arm_gicv3_common.o
  CC      hw/intc/arm_gicv3.o
  CC      hw/intc/arm_gicv3_dist.o
  CC      hw/intc/arm_gicv3_redist.o
  CC      hw/intc/arm_gicv3_its_common.o
  CC      hw/intc/intc.o
  CC      hw/ipack/ipack.o
  CC      hw/ipack/tpci200.o
  CC      hw/ipmi/ipmi.o
  CC      hw/ipmi/ipmi_bmc_sim.o
  CC      hw/ipmi/ipmi_bmc_extern.o
  CC      hw/ipmi/isa_ipmi_kcs.o
  CC      hw/ipmi/isa_ipmi_bt.o
  CC      hw/isa/isa-bus.o
  CC      hw/isa/apm.o
  CC      hw/mem/pc-dimm.o
  CC      hw/mem/nvdimm.o
  CC      hw/misc/applesmc.o
  CC      hw/misc/max111x.o
  CC      hw/misc/tmp105.o
  CC      hw/misc/tmp421.o
  CC      hw/misc/debugexit.o
  CC      hw/misc/sga.o
  CC      hw/misc/pc-testdev.o
  CC      hw/misc/pci-testdev.o
  CC      hw/misc/edu.o
  CC      hw/misc/unimp.o
  CC      hw/misc/arm_l2x0.o
  CC      hw/misc/arm_integrator_debug.o
  CC      hw/misc/a9scu.o
  CC      hw/misc/arm11scu.o
  CC      hw/net/ne2000.o
  CC      hw/net/eepro100.o
  CC      hw/net/pcnet-pci.o
  CC      hw/net/pcnet.o
  CC      hw/net/e1000.o
  CC      hw/net/e1000x_common.o
  CC      hw/net/net_tx_pkt.o
  CC      hw/net/net_rx_pkt.o
  CC      hw/net/e1000e.o
  CC      hw/net/e1000e_core.o
  CC      hw/net/rtl8139.o
  CC      hw/net/smc91c111.o
  CC      hw/net/vmxnet3.o
  CC      hw/net/ne2000-isa.o
  CC      hw/net/lan9118.o
  CC      hw/net/xgmac.o
  CC      hw/net/allwinner_emac.o
  CC      hw/net/imx_fec.o
  CC      hw/net/cadence_gem.o
  CC      hw/net/stellaris_enet.o
  CC      hw/net/ftgmac100.o
  CC      hw/net/rocker/rocker.o
  CC      hw/net/rocker/rocker_fp.o
  CC      hw/net/rocker/rocker_desc.o
  CC      hw/net/rocker/rocker_world.o
  CC      hw/net/rocker/rocker_of_dpa.o
  CC      hw/nvram/eeprom93xx.o
  CC      hw/nvram/fw_cfg.o
  CC      hw/nvram/chrp_nvram.o
  CC      hw/pci-bridge/pci_bridge_dev.o
  CC      hw/pci-bridge/pcie_root_port.o
  CC      hw/pci-bridge/gen_pcie_root_port.o
  CC      hw/pci-bridge/pci_expander_bridge.o
  CC      hw/pci-bridge/xio3130_upstream.o
  CC      hw/pci-bridge/xio3130_downstream.o
  CC      hw/pci-bridge/ioh3420.o
  CC      hw/pci-bridge/i82801b11.o
  CC      hw/pci-host/pam.o
  CC      hw/pci-host/versatile.o
  CC      hw/pci-host/piix.o
  CC      hw/pci-host/gpex.o
  CC      hw/pci-host/q35.o
  CC      hw/pci/pci.o
  CC      hw/pci/pci_bridge.o
  CC      hw/pci/msix.o
  CC      hw/pci/msi.o
  CC      hw/pci/shpc.o
  CC      hw/pci/slotid_cap.o
  CC      hw/pci/pci_host.o
  CC      hw/pci/pcie_host.o
  CC      hw/pci/pcie.o
  CC      hw/pci/pcie_aer.o
  CC      hw/pci/pcie_port.o
  CC      hw/pci/pci-stub.o
  CC      hw/pcmcia/pcmcia.o
  CC      hw/scsi/scsi-disk.o
  CC      hw/scsi/scsi-generic.o
  CC      hw/scsi/scsi-bus.o
  CC      hw/scsi/lsi53c895a.o
  CC      hw/scsi/mptsas.o
  CC      hw/scsi/mptconfig.o
  CC      hw/scsi/mptendian.o
  CC      hw/scsi/megasas.o
  CC      hw/scsi/vmw_pvscsi.o
  CC      hw/scsi/esp.o
  CC      hw/scsi/esp-pci.o
  CC      hw/sd/pl181.o
  CC      hw/sd/ssi-sd.o
  CC      hw/sd/sd.o
  CC      hw/sd/core.o
  CC      hw/sd/sdhci.o
  CC      hw/smbios/smbios.o
  CC      hw/smbios/smbios_type_38.o
  CC      hw/smbios/smbios-stub.o
  CC      hw/smbios/smbios_type_38-stub.o
  CC      hw/ssi/pl022.o
  CC      hw/ssi/ssi.o
  CC      hw/ssi/xilinx_spips.o
  CC      hw/ssi/aspeed_smc.o
  CC      hw/ssi/stm32f2xx_spi.o
  CC      hw/timer/arm_timer.o
  CC      hw/timer/arm_mptimer.o
  CC      hw/timer/armv7m_systick.o
  CC      hw/timer/a9gtimer.o
  CC      hw/timer/cadence_ttc.o
  CC      hw/timer/ds1338.o
  CC      hw/timer/hpet.o
  CC      hw/timer/i8254.o
  CC      hw/timer/i8254_common.o
  CC      hw/timer/pl031.o
  CC      hw/timer/twl92230.o
  CC      hw/timer/imx_epit.o
  CC      hw/timer/imx_gpt.o
  CC      hw/timer/stm32f2xx_timer.o
  CC      hw/timer/aspeed_timer.o
  CC      hw/timer/cmsdk-apb-timer.o
  CC      hw/tpm/tpm_tis.o
  CC      hw/tpm/tpm_passthrough.o
  CC      hw/tpm/tpm_util.o
  CC      hw/usb/core.o
  CC      hw/usb/combined-packet.o
  CC      hw/usb/bus.o
  CC      hw/usb/libhw.o
  CC      hw/usb/desc.o
  CC      hw/usb/desc-msos.o
  CC      hw/usb/hcd-uhci.o
  CC      hw/usb/hcd-ohci.o
  CC      hw/usb/hcd-ehci.o
  CC      hw/usb/hcd-ehci-pci.o
  CC      hw/usb/hcd-ehci-sysbus.o
  CC      hw/usb/hcd-xhci.o
  CC      hw/usb/hcd-xhci-nec.o
  CC      hw/usb/hcd-musb.o
  CC      hw/usb/dev-hub.o
  CC      hw/usb/dev-hid.o
  CC      hw/usb/dev-wacom.o
  CC      hw/usb/dev-storage.o
  CC      hw/usb/dev-uas.o
  CC      hw/usb/dev-audio.o
  CC      hw/usb/dev-serial.o
  CC      hw/usb/dev-network.o
  CC      hw/usb/dev-bluetooth.o
  CC      hw/usb/dev-smartcard-reader.o
  CC      hw/usb/dev-mtp.o
  CC      hw/usb/host-stub.o
  CC      hw/virtio/virtio-rng.o
  CC      hw/virtio/virtio-pci.o
  CC      hw/virtio/virtio-bus.o
  CC      hw/virtio/virtio-mmio.o
  CC      hw/virtio/vhost-stub.o
  CC      hw/watchdog/watchdog.o
  CC      hw/watchdog/wdt_i6300esb.o
  CC      hw/watchdog/wdt_ib700.o
  CC      hw/watchdog/wdt_aspeed.o
  CC      migration/migration.o
  CC      migration/socket.o
  CC      migration/fd.o
  CC      migration/exec.o
  CC      migration/tls.o
  CC      migration/channel.o
  CC      migration/savevm.o
  CC      migration/colo-comm.o
  CC      migration/colo.o
  CC      migration/colo-failover.o
  CC      migration/vmstate.o
  CC      migration/vmstate-types.o
  CC      migration/page_cache.o
  CC      migration/qemu-file.o
  CC      migration/global_state.o
  CC      migration/qemu-file-channel.o
  CC      migration/xbzrle.o
  CC      migration/postcopy-ram.o
  CC      migration/qjson.o
  CC      net/net.o
  CC      migration/block.o
  CC      net/queue.o
  CC      net/checksum.o
  CC      net/util.o
  CC      net/hub.o
  CC      net/socket.o
  CC      net/dump.o
  CC      net/eth.o
  CC      net/l2tpv3.o
  CC      net/vhost-user.o
  CC      net/slirp.o
  CC      net/filter.o
  CC      net/filter-buffer.o
  CC      net/filter-mirror.o
  CC      net/colo-compare.o
  CC      net/colo.o
  CC      net/filter-rewriter.o
  CC      net/filter-replay.o
  CC      net/tap.o
  CC      net/tap-linux.o
  CC      qom/cpu.o
  CC      replay/replay.o
  CC      replay/replay-internal.o
  CC      replay/replay-events.o
  CC      replay/replay-time.o
  CC      replay/replay-input.o
  CC      replay/replay-char.o
/tmp/qemu-test/src/replay/replay-internal.c: In function ‘replay_put_array’:
/tmp/qemu-test/src/replay/replay-internal.c:65: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result
  CC      replay/replay-snapshot.o
  CC      replay/replay-net.o
  CC      replay/replay-audio.o
  CC      slirp/cksum.o
  CC      slirp/if.o
  CC      slirp/ip_icmp.o
  CC      slirp/ip6_icmp.o
  CC      slirp/ip6_input.o
  CC      slirp/ip6_output.o
  CC      slirp/ip_input.o
  CC      slirp/ip_output.o
  CC      slirp/dnssearch.o
  CC      slirp/dhcpv6.o
  CC      slirp/slirp.o
  CC      slirp/mbuf.o
  CC      slirp/misc.o
  CC      slirp/sbuf.o
  CC      slirp/socket.o
  CC      slirp/tcp_input.o
  CC      slirp/tcp_output.o
  CC      slirp/tcp_subr.o
  CC      slirp/tcp_timer.o
  CC      slirp/udp.o
  CC      slirp/udp6.o
  CC      slirp/bootp.o
  CC      slirp/tftp.o
  CC      slirp/arp_table.o
/tmp/qemu-test/src/slirp/tcp_input.c: In function ‘tcp_input’:
/tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_p’ may be used uninitialized in this function
/tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_len’ may be used uninitialized in this function
/tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_tos’ may be used uninitialized in this function
/tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_id’ may be used uninitialized in this function
/tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_off’ may be used uninitialized in this function
/tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_ttl’ may be used uninitialized in this function
/tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_sum’ may be used uninitialized in this function
/tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_src.s_addr’ may be used uninitialized in this function
/tmp/qemu-test/src/slirp/tcp_input.c:219: warning: ‘save_ip.ip_dst.s_addr’ may be used uninitialized in this function
/tmp/qemu-test/src/slirp/tcp_input.c:220: warning: ‘save_ip6.ip_nh’ may be used uninitialized in this function
  CC      slirp/ndp_table.o
  CC      slirp/ncsi.o
  CC      ui/keymaps.o
  CC      ui/console.o
  CC      ui/cursor.o
  CC      ui/qemu-pixman.o
  CC      ui/input.o
  CC      ui/input-keymap.o
  CC      ui/input-legacy.o
  CC      ui/input-linux.o
  CC      ui/sdl.o
  CC      ui/sdl_zoom.o
  CC      ui/x_keymap.o
  CC      ui/vnc.o
  CC      ui/vnc-enc-zlib.o
  CC      ui/vnc-enc-hextile.o
  CC      ui/vnc-enc-tight.o
  CC      ui/vnc-palette.o
  CC      ui/vnc-enc-zrle.o
  CC      ui/vnc-auth-vencrypt.o
  CC      ui/vnc-ws.o
  CC      ui/vnc-jobs.o
  CC      chardev/char.o
  CC      chardev/char-fd.o
  CC      chardev/char-fe.o
  CC      chardev/char-file.o
  CC      chardev/char-io.o
  CC      chardev/char-mux.o
  CC      chardev/char-null.o
  CC      chardev/char-parallel.o
  CC      chardev/char-pipe.o
  CC      chardev/char-pty.o
  CC      chardev/char-ringbuf.o
  CC      chardev/char-serial.o
  CC      chardev/char-socket.o
  CC      chardev/char-stdio.o
  CC      chardev/char-udp.o
  LINK    tests/qemu-iotests/socket_scm_helper
  CC      qga/commands.o
  AS      optionrom/multiboot.o
  AS      optionrom/linuxboot.o
  CC      optionrom/linuxboot_dma.o
cc: unrecognized option '-no-integrated-as'
cc: unrecognized option '-no-integrated-as'
  AS      optionrom/kvmvapic.o
  BUILD   optionrom/multiboot.img
  BUILD   optionrom/linuxboot.img
  BUILD   optionrom/linuxboot_dma.img
  BUILD   optionrom/kvmvapic.img
  BUILD   optionrom/multiboot.raw
  CC      qga/guest-agent-command-state.o
  CC      qga/main.o
  BUILD   optionrom/linuxboot.raw
  CC      qga/commands-posix.o
  BUILD   optionrom/linuxboot_dma.raw
  CC      qga/channel-posix.o
  CC      qga/qapi-generated/qga-qapi-types.o
  BUILD   optionrom/kvmvapic.raw
  CC      qga/qapi-generated/qga-qapi-visit.o
  CC      qga/qapi-generated/qga-qmp-marshal.o
  SIGN    optionrom/multiboot.bin
  AR      libqemuutil.a
  SIGN    optionrom/kvmvapic.bin
  AR      libqemustub.a
  CC      qemu-img.o
  SIGN    optionrom/linuxboot.bin
  SIGN    optionrom/linuxboot_dma.bin
  LINK    ivshmem-client
  LINK    ivshmem-server
  LINK    qemu-nbd
  LINK    qemu-img
  LINK    qemu-io
  LINK    qemu-bridge-helper
  GEN     x86_64-softmmu/hmp-commands.h
  GEN     x86_64-softmmu/config-target.h
  GEN     x86_64-softmmu/hmp-commands-info.h
  CC      x86_64-softmmu/exec.o
  CC      x86_64-softmmu/tcg/tcg-common.o
  CC      x86_64-softmmu/tcg/tcg-op.o
  CC      x86_64-softmmu/tcg/optimize.o
  CC      x86_64-softmmu/tcg/tcg.o
  CC      x86_64-softmmu/tcg/tcg-runtime.o
  GEN     aarch64-softmmu/hmp-commands.h
  GEN     aarch64-softmmu/hmp-commands-info.h
  GEN     aarch64-softmmu/config-target.h
  CC      aarch64-softmmu/exec.o
  CC      x86_64-softmmu/fpu/softfloat.o
  CC      x86_64-softmmu/disas.o
  GEN     x86_64-softmmu/gdbstub-xml.c
  CC      aarch64-softmmu/tcg/tcg.o
  CC      aarch64-softmmu/tcg/tcg-op.o
  CC      aarch64-softmmu/tcg/optimize.o
  CC      x86_64-softmmu/hax-stub.o
  CC      x86_64-softmmu/arch_init.o
  CC      aarch64-softmmu/tcg/tcg-common.o
  CC      aarch64-softmmu/tcg/tcg-runtime.o
  CC      aarch64-softmmu/fpu/softfloat.o
  CC      aarch64-softmmu/disas.o
  CC      x86_64-softmmu/cpus.o
  CC      x86_64-softmmu/monitor.o
  CC      x86_64-softmmu/gdbstub.o
  CC      x86_64-softmmu/balloon.o
  GEN     aarch64-softmmu/gdbstub-xml.c
  CC      aarch64-softmmu/hax-stub.o
  LINK    qemu-ga
  CC      aarch64-softmmu/arch_init.o
  CC      x86_64-softmmu/ioport.o
  CC      aarch64-softmmu/cpus.o
  CC      aarch64-softmmu/monitor.o
  CC      x86_64-softmmu/numa.o
  CC      aarch64-softmmu/gdbstub.o
  CC      x86_64-softmmu/qtest.o
  CC      aarch64-softmmu/balloon.o
  CC      x86_64-softmmu/memory.o
  CC      x86_64-softmmu/memory_mapping.o
  CC      aarch64-softmmu/ioport.o
  CC      x86_64-softmmu/dump.o
  CC      aarch64-softmmu/numa.o
  CC      aarch64-softmmu/qtest.o
  CC      aarch64-softmmu/memory.o
  CC      aarch64-softmmu/memory_mapping.o
  CC      aarch64-softmmu/dump.o
  CC      aarch64-softmmu/migration/ram.o
  CC      x86_64-softmmu/migration/ram.o
  CC      x86_64-softmmu/accel/accel.o
  CC      x86_64-softmmu/accel/kvm/kvm-all.o
  CC      aarch64-softmmu/accel/accel.o
  CC      x86_64-softmmu/accel/tcg/tcg-all.o
  CC      aarch64-softmmu/accel/stubs/kvm-stub.o
  CC      aarch64-softmmu/accel/tcg/tcg-all.o
  CC      x86_64-softmmu/accel/tcg/cputlb.o
  CC      aarch64-softmmu/accel/tcg/cputlb.o
  CC      aarch64-softmmu/accel/tcg/cpu-exec.o
  CC      aarch64-softmmu/accel/tcg/cpu-exec-common.o
  CC      aarch64-softmmu/accel/tcg/translate-all.o
  CC      x86_64-softmmu/accel/tcg/cpu-exec.o
  CC      aarch64-softmmu/hw/adc/stm32f2xx_adc.o
  CC      aarch64-softmmu/hw/block/virtio-blk.o
  CC      x86_64-softmmu/accel/tcg/cpu-exec-common.o
  CC      aarch64-softmmu/hw/block/dataplane/virtio-blk.o
  CC      x86_64-softmmu/accel/tcg/translate-all.o
  CC      aarch64-softmmu/hw/char/omap_uart.o
  CC      aarch64-softmmu/hw/char/digic-uart.o
  CC      aarch64-softmmu/hw/char/exynos4210_uart.o
  CC      x86_64-softmmu/hw/block/virtio-blk.o
  CC      x86_64-softmmu/hw/block/dataplane/virtio-blk.o
  CC      aarch64-softmmu/hw/char/stm32f2xx_usart.o
  CC      x86_64-softmmu/hw/char/virtio-serial-bus.o
  CC      aarch64-softmmu/hw/char/bcm2835_aux.o
  CC      x86_64-softmmu/hw/core/generic-loader.o
  CC      x86_64-softmmu/hw/core/null-machine.o
  CC      aarch64-softmmu/hw/char/virtio-serial-bus.o
  CC      aarch64-softmmu/hw/core/generic-loader.o
  CC      x86_64-softmmu/hw/display/vga.o
  CC      x86_64-softmmu/hw/display/virtio-gpu.o
  CC      aarch64-softmmu/hw/core/null-machine.o
  CC      x86_64-softmmu/hw/display/virtio-gpu-3d.o
  CC      aarch64-softmmu/hw/cpu/arm11mpcore.o
  CC      aarch64-softmmu/hw/cpu/realview_mpcore.o
  CC      aarch64-softmmu/hw/cpu/a9mpcore.o
  CC      x86_64-softmmu/hw/display/virtio-gpu-pci.o
  CC      aarch64-softmmu/hw/cpu/a15mpcore.o
  CC      x86_64-softmmu/hw/display/virtio-vga.o
  CC      x86_64-softmmu/hw/intc/apic.o
  CC      aarch64-softmmu/hw/display/omap_dss.o
  CC      x86_64-softmmu/hw/intc/apic_common.o
  CC      x86_64-softmmu/hw/intc/ioapic.o
  CC      aarch64-softmmu/hw/display/omap_lcdc.o
  CC      x86_64-softmmu/hw/isa/lpc_ich9.o
  CC      aarch64-softmmu/hw/display/pxa2xx_lcd.o
  CC      aarch64-softmmu/hw/display/bcm2835_fb.o
  CC      x86_64-softmmu/hw/misc/vmport.o
  CC      aarch64-softmmu/hw/display/vga.o
  CC      aarch64-softmmu/hw/display/virtio-gpu.o
  CC      aarch64-softmmu/hw/display/virtio-gpu-3d.o
  CC      aarch64-softmmu/hw/display/virtio-gpu-pci.o
  CC      x86_64-softmmu/hw/misc/ivshmem.o
  CC      x86_64-softmmu/hw/misc/pvpanic.o
  CC      aarch64-softmmu/hw/display/dpcd.o
  CC      x86_64-softmmu/hw/misc/hyperv_testdev.o
  CC      aarch64-softmmu/hw/display/xlnx_dp.o
  CC      aarch64-softmmu/hw/dma/xlnx_dpdma.o
  CC      aarch64-softmmu/hw/dma/omap_dma.o
  CC      aarch64-softmmu/hw/dma/soc_dma.o
  CC      x86_64-softmmu/hw/misc/mmio_interface.o
  CC      aarch64-softmmu/hw/dma/pxa2xx_dma.o
  CC      aarch64-softmmu/hw/dma/bcm2835_dma.o
  CC      x86_64-softmmu/hw/net/virtio-net.o
  CC      aarch64-softmmu/hw/gpio/omap_gpio.o
  CC      x86_64-softmmu/hw/net/vhost_net.o
  CC      x86_64-softmmu/hw/scsi/virtio-scsi.o
  CC      aarch64-softmmu/hw/gpio/imx_gpio.o
  CC      aarch64-softmmu/hw/gpio/bcm2835_gpio.o
  CC      x86_64-softmmu/hw/scsi/virtio-scsi-dataplane.o
  CC      x86_64-softmmu/hw/scsi/vhost-scsi-common.o
  CC      x86_64-softmmu/hw/scsi/vhost-scsi.o
  CC      x86_64-softmmu/hw/scsi/vhost-user-scsi.o
  CC      x86_64-softmmu/hw/timer/mc146818rtc.o
  CC      x86_64-softmmu/hw/vfio/common.o
  CC      x86_64-softmmu/hw/vfio/pci.o
  CC      x86_64-softmmu/hw/vfio/pci-quirks.o
  CC      x86_64-softmmu/hw/vfio/platform.o
  CC      x86_64-softmmu/hw/vfio/spapr.o
  CC      x86_64-softmmu/hw/virtio/virtio.o
  CC      x86_64-softmmu/hw/virtio/virtio-balloon.o
  CC      x86_64-softmmu/hw/virtio/vhost.o
  CC      x86_64-softmmu/hw/virtio/vhost-backend.o
  CC      aarch64-softmmu/hw/i2c/omap_i2c.o
  CC      aarch64-softmmu/hw/input/pxa2xx_keypad.o
  CC      x86_64-softmmu/hw/virtio/vhost-user.o
  CC      aarch64-softmmu/hw/input/tsc210x.o
  CC      x86_64-softmmu/hw/virtio/vhost-vsock.o
  CC      aarch64-softmmu/hw/intc/armv7m_nvic.o
  CC      x86_64-softmmu/hw/virtio/virtio-crypto.o
  CC      x86_64-softmmu/hw/virtio/virtio-crypto-pci.o
  CC      x86_64-softmmu/hw/i386/multiboot.o
  CC      aarch64-softmmu/hw/intc/exynos4210_gic.o
  CC      x86_64-softmmu/hw/i386/pc.o
  CC      x86_64-softmmu/hw/i386/pc_piix.o
  CC      aarch64-softmmu/hw/intc/exynos4210_combiner.o
  CC      x86_64-softmmu/hw/i386/pc_q35.o
  CC      aarch64-softmmu/hw/intc/omap_intc.o
  CC      x86_64-softmmu/hw/i386/pc_sysfw.o
  CC      aarch64-softmmu/hw/intc/bcm2835_ic.o
  CC      x86_64-softmmu/hw/i386/x86-iommu.o
  CC      aarch64-softmmu/hw/intc/bcm2836_control.o
  CC      x86_64-softmmu/hw/i386/intel_iommu.o
  CC      aarch64-softmmu/hw/intc/allwinner-a10-pic.o
/tmp/qemu-test/src/hw/i386/pc_piix.c: In function ‘igd_passthrough_isa_bridge_create’:
/tmp/qemu-test/src/hw/i386/pc_piix.c:1065: warning: ‘pch_rev_id’ may be used uninitialized in this function
  CC      x86_64-softmmu/hw/i386/amd_iommu.o
  CC      x86_64-softmmu/hw/i386/kvmvapic.o
  CC      aarch64-softmmu/hw/intc/aspeed_vic.o
  CC      x86_64-softmmu/hw/i386/acpi-build.o
  CC      aarch64-softmmu/hw/intc/arm_gicv3_cpuif.o
  CC      aarch64-softmmu/hw/misc/ivshmem.o
  CC      x86_64-softmmu/hw/i386/pci-assign-load-rom.o
  CC      x86_64-softmmu/hw/i386/kvm/clock.o
  CC      x86_64-softmmu/hw/i386/kvm/apic.o
  CC      x86_64-softmmu/hw/i386/kvm/i8259.o
  CC      aarch64-softmmu/hw/misc/arm_sysctl.o
  CC      x86_64-softmmu/hw/i386/kvm/ioapic.o
  CC      x86_64-softmmu/hw/i386/kvm/i8254.o
  CC      aarch64-softmmu/hw/misc/cbus.o
  CC      aarch64-softmmu/hw/misc/exynos4210_pmu.o
  CC      x86_64-softmmu/hw/i386/kvm/pci-assign.o
  CC      x86_64-softmmu/target/i386/helper.o
  CC      aarch64-softmmu/hw/misc/exynos4210_clk.o
  CC      aarch64-softmmu/hw/misc/exynos4210_rng.o
  CC      aarch64-softmmu/hw/misc/imx_ccm.o
  CC      aarch64-softmmu/hw/misc/imx31_ccm.o
/tmp/qemu-test/src/hw/i386/acpi-build.c: In function ‘build_append_pci_bus_devices’:
/tmp/qemu-test/src/hw/i386/acpi-build.c:525: warning: ‘notify_method’ may be used uninitialized in this function
  CC      x86_64-softmmu/target/i386/cpu.o
  CC      x86_64-softmmu/target/i386/gdbstub.o
  CC      x86_64-softmmu/target/i386/xsave_helper.o
  CC      x86_64-softmmu/target/i386/translate.o
  CC      aarch64-softmmu/hw/misc/imx25_ccm.o
  CC      x86_64-softmmu/target/i386/bpt_helper.o
  CC      x86_64-softmmu/target/i386/cc_helper.o
  CC      x86_64-softmmu/target/i386/excp_helper.o
  CC      x86_64-softmmu/target/i386/fpu_helper.o
  CC      aarch64-softmmu/hw/misc/imx6_ccm.o
  CC      x86_64-softmmu/target/i386/int_helper.o
  CC      x86_64-softmmu/target/i386/mem_helper.o
  CC      x86_64-softmmu/target/i386/misc_helper.o
  CC      aarch64-softmmu/hw/misc/imx6_src.o
  CC      x86_64-softmmu/target/i386/mpx_helper.o
  CC      aarch64-softmmu/hw/misc/mst_fpga.o
  CC      x86_64-softmmu/target/i386/seg_helper.o
  CC      aarch64-softmmu/hw/misc/omap_clk.o
  CC      aarch64-softmmu/hw/misc/omap_gpmc.o
  CC      aarch64-softmmu/hw/misc/omap_l4.o
  CC      x86_64-softmmu/target/i386/smm_helper.o
  CC      aarch64-softmmu/hw/misc/omap_sdrc.o
  CC      aarch64-softmmu/hw/misc/omap_tap.o
  CC      aarch64-softmmu/hw/misc/bcm2835_mbox.o
  CC      x86_64-softmmu/target/i386/svm_helper.o
  CC      aarch64-softmmu/hw/misc/bcm2835_property.o
  CC      aarch64-softmmu/hw/misc/bcm2835_rng.o
  CC      x86_64-softmmu/target/i386/machine.o
  CC      aarch64-softmmu/hw/misc/zynq_slcr.o
  CC      x86_64-softmmu/target/i386/arch_memory_mapping.o
  CC      x86_64-softmmu/target/i386/arch_dump.o
  CC      x86_64-softmmu/target/i386/monitor.o
  CC      x86_64-softmmu/target/i386/kvm.o
  CC      aarch64-softmmu/hw/misc/zynq-xadc.o
  CC      x86_64-softmmu/target/i386/hyperv.o
  CC      aarch64-softmmu/hw/misc/stm32f2xx_syscfg.o
  GEN     trace/generated-helpers.c
  CC      x86_64-softmmu/trace/control-target.o
  CC      aarch64-softmmu/hw/misc/mps2-scc.o
  CC      x86_64-softmmu/gdbstub-xml.o
  CC      x86_64-softmmu/trace/generated-helpers.o
  CC      aarch64-softmmu/hw/misc/auxbus.o
  CC      aarch64-softmmu/hw/misc/aspeed_scu.o
  CC      aarch64-softmmu/hw/misc/aspeed_sdmc.o
  CC      aarch64-softmmu/hw/misc/mmio_interface.o
  CC      aarch64-softmmu/hw/net/virtio-net.o
  CC      aarch64-softmmu/hw/net/vhost_net.o
  CC      aarch64-softmmu/hw/pcmcia/pxa2xx.o
  CC      aarch64-softmmu/hw/scsi/virtio-scsi.o
  CC      aarch64-softmmu/hw/scsi/virtio-scsi-dataplane.o
  CC      aarch64-softmmu/hw/scsi/vhost-scsi-common.o
  CC      aarch64-softmmu/hw/scsi/vhost-scsi.o
  CC      aarch64-softmmu/hw/scsi/vhost-user-scsi.o
  CC      aarch64-softmmu/hw/sd/omap_mmc.o
  CC      aarch64-softmmu/hw/sd/pxa2xx_mmci.o
  CC      aarch64-softmmu/hw/sd/bcm2835_sdhost.o
  CC      aarch64-softmmu/hw/ssi/omap_spi.o
  CC      aarch64-softmmu/hw/ssi/imx_spi.o
  CC      aarch64-softmmu/hw/timer/exynos4210_mct.o
  CC      aarch64-softmmu/hw/timer/exynos4210_pwm.o
  CC      aarch64-softmmu/hw/timer/exynos4210_rtc.o
  CC      aarch64-softmmu/hw/timer/omap_gptimer.o
  CC      aarch64-softmmu/hw/timer/omap_synctimer.o
  CC      aarch64-softmmu/hw/timer/pxa2xx_timer.o
  CC      aarch64-softmmu/hw/timer/digic-timer.o
  CC      aarch64-softmmu/hw/timer/allwinner-a10-pit.o
  CC      aarch64-softmmu/hw/usb/tusb6010.o
  CC      aarch64-softmmu/hw/vfio/common.o
  CC      aarch64-softmmu/hw/vfio/pci.o
  CC      aarch64-softmmu/hw/vfio/pci-quirks.o
  CC      aarch64-softmmu/hw/vfio/platform.o
  CC      aarch64-softmmu/hw/vfio/calxeda-xgmac.o
  CC      aarch64-softmmu/hw/vfio/amd-xgbe.o
  CC      aarch64-softmmu/hw/vfio/spapr.o
  CC      aarch64-softmmu/hw/virtio/virtio.o
  LINK    x86_64-softmmu/qemu-system-x86_64
  CC      aarch64-softmmu/hw/virtio/virtio-balloon.o
  CC      aarch64-softmmu/hw/virtio/vhost.o
  CC      aarch64-softmmu/hw/virtio/vhost-backend.o
  CC      aarch64-softmmu/hw/virtio/vhost-user.o
  CC      aarch64-softmmu/hw/virtio/vhost-vsock.o
  CC      aarch64-softmmu/hw/virtio/virtio-crypto.o
  CC      aarch64-softmmu/hw/virtio/virtio-crypto-pci.o
  CC      aarch64-softmmu/hw/arm/boot.o
  CC      aarch64-softmmu/hw/arm/collie.o
  CC      aarch64-softmmu/hw/arm/exynos4_boards.o
  CC      aarch64-softmmu/hw/arm/gumstix.o
  CC      aarch64-softmmu/hw/arm/highbank.o
  CC      aarch64-softmmu/hw/arm/digic_boards.o
  CC      aarch64-softmmu/hw/arm/mainstone.o
  CC      aarch64-softmmu/hw/arm/integratorcp.o
  CC      aarch64-softmmu/hw/arm/musicpal.o
  CC      aarch64-softmmu/hw/arm/nseries.o
  CC      aarch64-softmmu/hw/arm/omap_sx1.o
  CC      aarch64-softmmu/hw/arm/palm.o
  CC      aarch64-softmmu/hw/arm/realview.o
  CC      aarch64-softmmu/hw/arm/spitz.o
  CC      aarch64-softmmu/hw/arm/stellaris.o
  CC      aarch64-softmmu/hw/arm/tosa.o
  CC      aarch64-softmmu/hw/arm/versatilepb.o
  CC      aarch64-softmmu/hw/arm/vexpress.o
  CC      aarch64-softmmu/hw/arm/virt.o
  CC      aarch64-softmmu/hw/arm/xilinx_zynq.o
  CC      aarch64-softmmu/hw/arm/z2.o
  CC      aarch64-softmmu/hw/arm/virt-acpi-build.o
  CC      aarch64-softmmu/hw/arm/netduino2.o
  CC      aarch64-softmmu/hw/arm/sysbus-fdt.o
  CC      aarch64-softmmu/hw/arm/armv7m.o
  CC      aarch64-softmmu/hw/arm/exynos4210.o
  CC      aarch64-softmmu/hw/arm/pxa2xx.o
  CC      aarch64-softmmu/hw/arm/pxa2xx_gpio.o
  CC      aarch64-softmmu/hw/arm/pxa2xx_pic.o
  CC      aarch64-softmmu/hw/arm/digic.o
  CC      aarch64-softmmu/hw/arm/omap1.o
  CC      aarch64-softmmu/hw/arm/omap2.o
  CC      aarch64-softmmu/hw/arm/strongarm.o
  CC      aarch64-softmmu/hw/arm/allwinner-a10.o
  CC      aarch64-softmmu/hw/arm/cubieboard.o
  CC      aarch64-softmmu/hw/arm/bcm2835_peripherals.o
  CC      aarch64-softmmu/hw/arm/bcm2836.o
  CC      aarch64-softmmu/hw/arm/raspi.o
  CC      aarch64-softmmu/hw/arm/stm32f205_soc.o
  CC      aarch64-softmmu/hw/arm/xlnx-zynqmp.o
  CC      aarch64-softmmu/hw/arm/xlnx-ep108.o
  CC      aarch64-softmmu/hw/arm/fsl-imx25.o
  CC      aarch64-softmmu/hw/arm/imx25_pdk.o
  CC      aarch64-softmmu/hw/arm/fsl-imx31.o
  CC      aarch64-softmmu/hw/arm/kzm.o
  CC      aarch64-softmmu/hw/arm/fsl-imx6.o
  CC      aarch64-softmmu/hw/arm/sabrelite.o
  CC      aarch64-softmmu/hw/arm/aspeed_soc.o
  CC      aarch64-softmmu/hw/arm/aspeed.o
  CC      aarch64-softmmu/hw/arm/mps2.o
  CC      aarch64-softmmu/target/arm/arm-semi.o
  CC      aarch64-softmmu/target/arm/machine.o
  CC      aarch64-softmmu/target/arm/psci.o
  CC      aarch64-softmmu/target/arm/arch_dump.o
  CC      aarch64-softmmu/target/arm/monitor.o
  CC      aarch64-softmmu/target/arm/kvm-stub.o
  CC      aarch64-softmmu/target/arm/translate.o
  CC      aarch64-softmmu/target/arm/op_helper.o
  CC      aarch64-softmmu/target/arm/helper.o
  CC      aarch64-softmmu/target/arm/cpu.o
  CC      aarch64-softmmu/target/arm/neon_helper.o
  CC      aarch64-softmmu/target/arm/iwmmxt_helper.o
  CC      aarch64-softmmu/target/arm/gdbstub.o
  CC      aarch64-softmmu/target/arm/cpu64.o
  CC      aarch64-softmmu/target/arm/translate-a64.o
  CC      aarch64-softmmu/target/arm/helper-a64.o
  CC      aarch64-softmmu/target/arm/gdbstub64.o
  CC      aarch64-softmmu/target/arm/crypto_helper.o
/tmp/qemu-test/src/target/arm/translate-a64.c: In function ‘handle_shri_with_rndacc’:
/tmp/qemu-test/src/target/arm/translate-a64.c:6388: warning: ‘tcg_src_hi’ may be used uninitialized in this function
/tmp/qemu-test/src/target/arm/translate-a64.c: In function ‘disas_simd_scalar_two_reg_misc’:
/tmp/qemu-test/src/target/arm/translate-a64.c:8115: warning: ‘rmode’ may be used uninitialized in this function
  GEN     trace/generated-helpers.c
  CC      aarch64-softmmu/gdbstub-xml.o
  CC      aarch64-softmmu/target/arm/arm-powerctl.o
  CC      aarch64-softmmu/trace/control-target.o
  CC      aarch64-softmmu/trace/generated-helpers.o
  LINK    aarch64-softmmu/qemu-system-aarch64
  TEST    tests/qapi-schema/alternate-any.out
  TEST    tests/qapi-schema/alternate-array.out
  TEST    tests/qapi-schema/alternate-conflict-enum-bool.out
  TEST    tests/qapi-schema/alternate-clash.out
  TEST    tests/qapi-schema/alternate-base.out
  TEST    tests/qapi-schema/alternate-conflict-dict.out
  TEST    tests/qapi-schema/alternate-conflict-enum-int.out
  TEST    tests/qapi-schema/alternate-conflict-string.out
  TEST    tests/qapi-schema/alternate-empty.out
  TEST    tests/qapi-schema/alternate-nested.out
  TEST    tests/qapi-schema/alternate-unknown.out
  TEST    tests/qapi-schema/args-alternate.out
  TEST    tests/qapi-schema/args-any.out
  TEST    tests/qapi-schema/args-array-empty.out
  TEST    tests/qapi-schema/args-bad-boxed.out
  TEST    tests/qapi-schema/args-array-unknown.out
  TEST    tests/qapi-schema/args-boxed-anon.out
  TEST    tests/qapi-schema/args-boxed-empty.out
  TEST    tests/qapi-schema/args-boxed-string.out
  TEST    tests/qapi-schema/args-int.out
  TEST    tests/qapi-schema/args-member-array-bad.out
  TEST    tests/qapi-schema/args-invalid.out
  TEST    tests/qapi-schema/args-member-case.out
  TEST    tests/qapi-schema/args-member-unknown.out
  TEST    tests/qapi-schema/args-union.out
  TEST    tests/qapi-schema/args-name-clash.out
  TEST    tests/qapi-schema/args-unknown.out
  TEST    tests/qapi-schema/bad-base.out
  TEST    tests/qapi-schema/bad-data.out
  TEST    tests/qapi-schema/bad-ident.out
  TEST    tests/qapi-schema/bad-type-bool.out
  TEST    tests/qapi-schema/bad-type-dict.out
  TEST    tests/qapi-schema/bad-type-int.out
  TEST    tests/qapi-schema/base-cycle-indirect.out
  TEST    tests/qapi-schema/base-cycle-direct.out
  TEST    tests/qapi-schema/command-int.out
  TEST    tests/qapi-schema/comments.out
  TEST    tests/qapi-schema/doc-bad-alternate-member.out
  TEST    tests/qapi-schema/doc-bad-command-arg.out
  TEST    tests/qapi-schema/doc-bad-symbol.out
  TEST    tests/qapi-schema/doc-bad-union-member.out
  TEST    tests/qapi-schema/doc-before-pragma.out
  TEST    tests/qapi-schema/doc-before-include.out
  TEST    tests/qapi-schema/doc-duplicated-arg.out
  TEST    tests/qapi-schema/doc-duplicated-return.out
  TEST    tests/qapi-schema/doc-duplicated-since.out
  TEST    tests/qapi-schema/doc-empty-arg.out
  TEST    tests/qapi-schema/doc-empty-section.out
  TEST    tests/qapi-schema/doc-empty-symbol.out
  TEST    tests/qapi-schema/doc-good.out
  TEST    tests/qapi-schema/doc-interleaved-section.out
  TEST    tests/qapi-schema/doc-invalid-end2.out
  TEST    tests/qapi-schema/doc-invalid-end.out
  TEST    tests/qapi-schema/doc-invalid-return.out
  TEST    tests/qapi-schema/doc-invalid-section.out
  TEST    tests/qapi-schema/doc-invalid-start.out
  TEST    tests/qapi-schema/doc-missing.out
  TEST    tests/qapi-schema/doc-missing-colon.out
  TEST    tests/qapi-schema/doc-missing-expr.out
  TEST    tests/qapi-schema/doc-missing-space.out
  TEST    tests/qapi-schema/doc-no-symbol.out
  TEST    tests/qapi-schema/double-data.out
  TEST    tests/qapi-schema/double-type.out
  TEST    tests/qapi-schema/duplicate-key.out
  TEST    tests/qapi-schema/empty.out
  TEST    tests/qapi-schema/enum-bad-name.out
  TEST    tests/qapi-schema/enum-bad-prefix.out
  TEST    tests/qapi-schema/enum-clash-member.out
  TEST    tests/qapi-schema/enum-dict-member.out
  TEST    tests/qapi-schema/enum-int-member.out
  TEST    tests/qapi-schema/enum-member-case.out
  TEST    tests/qapi-schema/enum-missing-data.out
  TEST    tests/qapi-schema/enum-wrong-data.out
  TEST    tests/qapi-schema/escape-outside-string.out
  TEST    tests/qapi-schema/escape-too-big.out
  TEST    tests/qapi-schema/escape-too-short.out
  TEST    tests/qapi-schema/event-boxed-empty.out
  TEST    tests/qapi-schema/event-case.out
  TEST    tests/qapi-schema/event-nest-struct.out
  TEST    tests/qapi-schema/flat-union-array-branch.out
  TEST    tests/qapi-schema/flat-union-bad-base.out
  TEST    tests/qapi-schema/flat-union-bad-discriminator.out
  TEST    tests/qapi-schema/flat-union-base-any.out
  TEST    tests/qapi-schema/flat-union-base-union.out
  TEST    tests/qapi-schema/flat-union-clash-member.out
  TEST    tests/qapi-schema/flat-union-empty.out
  TEST    tests/qapi-schema/flat-union-incomplete-branch.out
  TEST    tests/qapi-schema/flat-union-inline.out
  TEST    tests/qapi-schema/flat-union-int-branch.out
  TEST    tests/qapi-schema/flat-union-invalid-branch-key.out
  TEST    tests/qapi-schema/flat-union-invalid-discriminator.out
  TEST    tests/qapi-schema/flat-union-no-base.out
  TEST    tests/qapi-schema/flat-union-optional-discriminator.out
  TEST    tests/qapi-schema/flat-union-string-discriminator.out
  TEST    tests/qapi-schema/funny-char.out
  TEST    tests/qapi-schema/ident-with-escape.out
  TEST    tests/qapi-schema/include-before-err.out
  TEST    tests/qapi-schema/include-cycle.out
  TEST    tests/qapi-schema/include-extra-junk.out
  TEST    tests/qapi-schema/include-format-err.out
  TEST    tests/qapi-schema/include-nested-err.out
  TEST    tests/qapi-schema/include-no-file.out
  TEST    tests/qapi-schema/include-non-file.out
  TEST    tests/qapi-schema/include-relpath.out
  TEST    tests/qapi-schema/include-repetition.out
  TEST    tests/qapi-schema/include-self-cycle.out
  TEST    tests/qapi-schema/include-simple.out
  TEST    tests/qapi-schema/indented-expr.out
  TEST    tests/qapi-schema/leading-comma-list.out
  TEST    tests/qapi-schema/leading-comma-object.out
  TEST    tests/qapi-schema/missing-colon.out
  TEST    tests/qapi-schema/missing-comma-list.out
  TEST    tests/qapi-schema/missing-comma-object.out
  TEST    tests/qapi-schema/missing-type.out
  TEST    tests/qapi-schema/nested-struct-data.out
  TEST    tests/qapi-schema/non-objects.out
  TEST    tests/qapi-schema/pragma-doc-required-crap.out
  TEST    tests/qapi-schema/pragma-extra-junk.out
  TEST    tests/qapi-schema/pragma-name-case-whitelist-crap.out
  TEST    tests/qapi-schema/pragma-non-dict.out
  TEST    tests/qapi-schema/pragma-returns-whitelist-crap.out
  TEST    tests/qapi-schema/qapi-schema-test.out
  TEST    tests/qapi-schema/quoted-structural-chars.out
  TEST    tests/qapi-schema/redefined-builtin.out
  TEST    tests/qapi-schema/redefined-command.out
  TEST    tests/qapi-schema/redefined-event.out
  TEST    tests/qapi-schema/redefined-type.out
  TEST    tests/qapi-schema/reserved-command-q.out
  TEST    tests/qapi-schema/reserved-enum-q.out
  TEST    tests/qapi-schema/reserved-member-has.out
  TEST    tests/qapi-schema/reserved-member-q.out
  TEST    tests/qapi-schema/reserved-member-u.out
  TEST    tests/qapi-schema/reserved-member-underscore.out
  TEST    tests/qapi-schema/reserved-type-kind.out
  TEST    tests/qapi-schema/returns-alternate.out
  TEST    tests/qapi-schema/reserved-type-list.out
  TEST    tests/qapi-schema/returns-array-bad.out
  TEST    tests/qapi-schema/returns-dict.out
  TEST    tests/qapi-schema/returns-unknown.out
  TEST    tests/qapi-schema/returns-whitelist.out
  TEST    tests/qapi-schema/struct-base-clash-deep.out
  TEST    tests/qapi-schema/struct-base-clash.out
  TEST    tests/qapi-schema/struct-data-invalid.out
  TEST    tests/qapi-schema/struct-member-invalid.out
  TEST    tests/qapi-schema/trailing-comma-list.out
  TEST    tests/qapi-schema/trailing-comma-object.out
  TEST    tests/qapi-schema/type-bypass-bad-gen.out
  TEST    tests/qapi-schema/unclosed-list.out
  TEST    tests/qapi-schema/unclosed-object.out
  TEST    tests/qapi-schema/unclosed-string.out
  TEST    tests/qapi-schema/unicode-str.out
  TEST    tests/qapi-schema/union-base-empty.out
  TEST    tests/qapi-schema/union-base-no-discriminator.out
  TEST    tests/qapi-schema/union-branch-case.out
  TEST    tests/qapi-schema/union-clash-branches.out
  TEST    tests/qapi-schema/union-empty.out
  TEST    tests/qapi-schema/union-invalid-base.out
  TEST    tests/qapi-schema/union-optional-branch.out
  TEST    tests/qapi-schema/union-unknown.out
  TEST    tests/qapi-schema/unknown-escape.out
  TEST    tests/qapi-schema/unknown-expr-key.out
  GEN     tests/qapi-schema/doc-good.test.texi
  CC      tests/check-qdict.o
  CC      tests/test-char.o
  CC      tests/check-qnum.o
  CC      tests/check-qstring.o
  CC      tests/check-qlist.o
  CC      tests/check-qnull.o
  CC      tests/check-qjson.o
  CC      tests/test-qobject-output-visitor.o
  GEN     tests/test-qapi-visit.c
  GEN     tests/test-qapi-types.c
  GEN     tests/test-qapi-event.c
  GEN     tests/test-qmp-introspect.c
  CC      tests/test-clone-visitor.o
  CC      tests/test-qobject-input-visitor.o
  CC      tests/test-qmp-commands.o
  GEN     tests/test-qmp-marshal.c
  CC      tests/test-string-input-visitor.o
  CC      tests/test-string-output-visitor.o
  CC      tests/test-qmp-event.o
  CC      tests/test-opts-visitor.o
  CC      tests/test-coroutine.o
  CC      tests/iothread.o
  CC      tests/test-visitor-serialization.o
  CC      tests/test-iov.o
  CC      tests/test-aio.o
  CC      tests/test-aio-multithread.o
  CC      tests/test-throttle.o
  CC      tests/test-thread-pool.o
  CC      tests/test-hbitmap.o
  CC      tests/test-blockjob.o
  CC      tests/test-blockjob-txn.o
  CC      tests/test-x86-cpuid.o
  CC      tests/test-xbzrle.o
  CC      tests/test-vmstate.o
  CC      tests/test-cutils.o
  CC      tests/test-shift128.o
  CC      tests/test-mul64.o
  CC      tests/test-int128.o
  CC      tests/rcutorture.o
  CC      tests/test-rcu-list.o
  CC      tests/test-qdist.o
  CC      tests/test-qht.o
  CC      tests/test-qht-par.o
  CC      tests/qht-bench.o
  CC      tests/test-bitops.o
  CC      tests/test-bitcnt.o
  CC      tests/check-qom-interface.o
  CC      tests/check-qom-proplist.o
  CC      tests/test-qemu-opts.o
  CC      tests/test-keyval.o
  CC      tests/test-write-threshold.o
/tmp/qemu-test/src/tests/test-int128.c:180: warning: ‘__noclone__’ attribute directive ignored
  CC      tests/test-crypto-hash.o
  CC      tests/test-crypto-hmac.o
  CC      tests/test-crypto-cipher.o
  CC      tests/test-crypto-secret.o
  CC      tests/test-qga.o
  CC      tests/libqtest.o
  CC      tests/test-timed-average.o
  CC      tests/test-io-task.o
  CC      tests/test-io-channel-socket.o
  CC      tests/io-channel-helpers.o
  CC      tests/test-io-channel-file.o
  CC      tests/test-io-channel-command.o
  CC      tests/test-io-channel-buffer.o
  CC      tests/test-base64.o
  CC      tests/test-crypto-ivgen.o
  CC      tests/test-crypto-afsplit.o
  CC      tests/test-crypto-xts.o
  CC      tests/test-crypto-block.o
  CC      tests/test-logging.o
  CC      tests/test-replication.o
  CC      tests/test-bufferiszero.o
  CC      tests/test-uuid.o
  CC      tests/ptimer-test.o
  CC      tests/ptimer-test-stubs.o
  CC      tests/test-qapi-util.o
  CC      tests/vhost-user-test.o
  CC      tests/libqos/pci.o
  CC      tests/libqos/fw_cfg.o
  CC      tests/libqos/malloc.o
  CC      tests/libqos/i2c.o
  CC      tests/libqos/libqos.o
  CC      tests/libqos/malloc-spapr.o
  CC      tests/libqos/libqos-spapr.o
  CC      tests/libqos/rtas.o
  CC      tests/libqos/pci-spapr.o
  CC      tests/libqos/pci-pc.o
  CC      tests/libqos/malloc-pc.o
  CC      tests/libqos/libqos-pc.o
  CC      tests/libqos/ahci.o
  CC      tests/libqos/virtio.o
  CC      tests/libqos/virtio-pci.o
  CC      tests/libqos/virtio-mmio.o
  CC      tests/libqos/malloc-generic.o
  CC      tests/endianness-test.o
  CC      tests/fdc-test.o
  CC      tests/ide-test.o
  CC      tests/ahci-test.o
  CC      tests/hd-geo-test.o
  CC      tests/boot-order-test.o
  CC      tests/bios-tables-test.o
  CC      tests/boot-sector.o
  CC      tests/acpi-utils.o
  CC      tests/boot-serial-test.o
  CC      tests/pxe-test.o
  CC      tests/rtc-test.o
  CC      tests/ipmi-kcs-test.o
  CC      tests/ipmi-bt-test.o
  CC      tests/i440fx-test.o
  CC      tests/fw_cfg-test.o
  CC      tests/drive_del-test.o
  CC      tests/wdt_ib700-test.o
  CC      tests/tco-test.o
  CC      tests/e1000-test.o
  CC      tests/e1000e-test.o
  CC      tests/rtl8139-test.o
  CC      tests/pcnet-test.o
  CC      tests/eepro100-test.o
  CC      tests/ne2000-test.o
  CC      tests/nvme-test.o
  CC      tests/ac97-test.o
  CC      tests/es1370-test.o
  CC      tests/virtio-net-test.o
  CC      tests/virtio-balloon-test.o
  CC      tests/virtio-blk-test.o
  CC      tests/virtio-rng-test.o
  CC      tests/virtio-scsi-test.o
  CC      tests/virtio-serial-test.o
  CC      tests/virtio-console-test.o
  CC      tests/tpci200-test.o
  CC      tests/ipoctal232-test.o
  CC      tests/display-vga-test.o
  CC      tests/intel-hda-test.o
  CC      tests/ivshmem-test.o
  CC      tests/megasas-test.o
  CC      tests/vmxnet3-test.o
  CC      tests/pvpanic-test.o
  CC      tests/i82801b11-test.o
  CC      tests/ioh3420-test.o
  CC      tests/usb-hcd-ohci-test.o
  CC      tests/libqos/usb.o
  CC      tests/usb-hcd-uhci-test.o
  CC      tests/usb-hcd-ehci-test.o
  CC      tests/usb-hcd-xhci-test.o
  CC      tests/pc-cpu-test.o
  CC      tests/q35-test.o
  CC      tests/vmgenid-test.o
  CC      tests/test-netfilter.o
  CC      tests/test-filter-mirror.o
  CC      tests/test-filter-redirector.o
  CC      tests/postcopy-test.o
  CC      tests/test-x86-cpuid-compat.o
  CC      tests/numa-test.o
  CC      tests/qmp-test.o
  CC      tests/device-introspect-test.o
  CC      tests/qom-test.o
  CC      tests/test-hmp.o
  LINK    tests/check-qdict
  LINK    tests/test-char
  LINK    tests/check-qnum
  LINK    tests/check-qstring
  LINK    tests/check-qlist
  LINK    tests/check-qnull
  LINK    tests/check-qjson
  CC      tests/test-qapi-types.o
  CC      tests/test-qapi-visit.o
  CC      tests/test-qapi-event.o
  CC      tests/test-qmp-introspect.o
  CC      tests/test-qmp-marshal.o
  LINK    tests/test-coroutine
  LINK    tests/test-iov
  LINK    tests/test-aio
  LINK    tests/test-aio-multithread
  LINK    tests/test-throttle
  LINK    tests/test-thread-pool
  LINK    tests/test-hbitmap
  LINK    tests/test-blockjob
  LINK    tests/test-blockjob-txn
  LINK    tests/test-x86-cpuid
  LINK    tests/test-xbzrle
  LINK    tests/test-vmstate
  LINK    tests/test-cutils
  LINK    tests/test-shift128
  LINK    tests/test-mul64
  LINK    tests/test-int128
  LINK    tests/rcutorture
  LINK    tests/test-rcu-list
  LINK    tests/test-qdist
  LINK    tests/test-qht
  LINK    tests/qht-bench
  LINK    tests/test-bitops
  LINK    tests/test-bitcnt
  LINK    tests/check-qom-interface
  LINK    tests/check-qom-proplist
  LINK    tests/test-qemu-opts
  LINK    tests/test-keyval
  LINK    tests/test-write-threshold
  LINK    tests/test-crypto-hash
  LINK    tests/test-crypto-hmac
  LINK    tests/test-crypto-cipher
  LINK    tests/test-crypto-secret
  LINK    tests/test-qga
  LINK    tests/test-timed-average
  LINK    tests/test-io-task
  LINK    tests/test-io-channel-socket
  LINK    tests/test-io-channel-file
  LINK    tests/test-io-channel-command
  LINK    tests/test-io-channel-buffer
  LINK    tests/test-base64
  LINK    tests/test-crypto-ivgen
  LINK    tests/test-crypto-afsplit
  LINK    tests/test-crypto-xts
  LINK    tests/test-crypto-block
  LINK    tests/test-logging
  LINK    tests/test-replication
  LINK    tests/test-bufferiszero
  LINK    tests/test-uuid
  LINK    tests/ptimer-test
  LINK    tests/test-qapi-util
  LINK    tests/vhost-user-test
  LINK    tests/endianness-test
  LINK    tests/fdc-test
  LINK    tests/ide-test
  LINK    tests/ahci-test
  LINK    tests/hd-geo-test
  LINK    tests/boot-order-test
  LINK    tests/bios-tables-test
  LINK    tests/boot-serial-test
  LINK    tests/pxe-test
  LINK    tests/rtc-test
  LINK    tests/ipmi-kcs-test
  LINK    tests/i440fx-test
  LINK    tests/fw_cfg-test
  LINK    tests/drive_del-test
  LINK    tests/wdt_ib700-test
  LINK    tests/tco-test
  LINK    tests/e1000-test
  LINK    tests/e1000e-test
  LINK    tests/rtl8139-test
  LINK    tests/pcnet-test
  LINK    tests/eepro100-test
  LINK    tests/ne2000-test
  LINK    tests/nvme-test
  LINK    tests/ac97-test
  LINK    tests/es1370-test
  LINK    tests/virtio-net-test
  LINK    tests/virtio-balloon-test
  LINK    tests/virtio-blk-test
  LINK    tests/virtio-rng-test
  LINK    tests/virtio-scsi-test
  LINK    tests/virtio-serial-test
  LINK    tests/virtio-console-test
  LINK    tests/tpci200-test
  LINK    tests/ipoctal232-test
  LINK    tests/display-vga-test
  LINK    tests/intel-hda-test
  LINK    tests/ivshmem-test
  LINK    tests/megasas-test
  LINK    tests/vmxnet3-test
  LINK    tests/pvpanic-test
  LINK    tests/i82801b11-test
  LINK    tests/ioh3420-test
  LINK    tests/usb-hcd-ohci-test
  LINK    tests/usb-hcd-uhci-test
  LINK    tests/usb-hcd-ehci-test
  LINK    tests/usb-hcd-xhci-test
  LINK    tests/pc-cpu-test
  LINK    tests/q35-test
  LINK    tests/vmgenid-test
  LINK    tests/test-netfilter
  LINK    tests/test-filter-mirror
  LINK    tests/test-filter-redirector
  LINK    tests/postcopy-test
  LINK    tests/test-x86-cpuid-compat
  LINK    tests/numa-test
  LINK    tests/qmp-test
  LINK    tests/device-introspect-test
  LINK    tests/qom-test
  LINK    tests/test-hmp
  GTESTER tests/check-qdict
  GTESTER tests/test-char
  GTESTER tests/check-qnum
  GTESTER tests/check-qstring
  GTESTER tests/check-qlist
  GTESTER tests/check-qnull
  GTESTER tests/check-qjson
  LINK    tests/test-qobject-output-visitor
  LINK    tests/test-clone-visitor
  LINK    tests/test-qobject-input-visitor
  LINK    tests/test-qmp-commands
  LINK    tests/test-string-input-visitor
  LINK    tests/test-string-output-visitor
  LINK    tests/test-qmp-event
  LINK    tests/test-opts-visitor
  GTESTER tests/test-coroutine
  LINK    tests/test-visitor-serialization
  GTESTER tests/test-iov
  GTESTER tests/test-aio
  GTESTER tests/test-thread-pool
  GTESTER tests/test-throttle
  GTESTER tests/test-aio-multithread
  GTESTER tests/test-hbitmap
  GTESTER tests/test-blockjob
  GTESTER tests/test-blockjob-txn
  GTESTER tests/test-x86-cpuid
  GTESTER tests/test-xbzrle
  GTESTER tests/test-vmstate
Failed to load simple/primitive:b_1
Failed to load simple/primitive:i64_2
Failed to load simple/primitive:i32_1
Failed to load simple/primitive:i32_1
Failed to load test/with_tmp:a
Failed to load test/tmp_child_parent:f
Failed to load test/tmp_child:parent
Failed to load test/with_tmp:tmp
Failed to load test/tmp_child:diff
Failed to load test/with_tmp:tmp
Failed to load test/tmp_child:diff
Failed to load test/with_tmp:tmp
  GTESTER tests/test-cutils
  GTESTER tests/test-shift128
  GTESTER tests/test-mul64
  GTESTER tests/test-int128
  GTESTER tests/rcutorture
  GTESTER tests/test-rcu-list
  GTESTER tests/test-qdist
  GTESTER tests/test-qht
  LINK    tests/test-qht-par
  GTESTER tests/test-bitops
  GTESTER tests/test-bitcnt
  GTESTER tests/check-qom-interface
  GTESTER tests/check-qom-proplist
  GTESTER tests/test-qemu-opts
  GTESTER tests/test-keyval
  GTESTER tests/test-write-threshold
  GTESTER tests/test-crypto-hash
  GTESTER tests/test-crypto-hmac
  GTESTER tests/test-crypto-cipher
  GTESTER tests/test-crypto-secret
  GTESTER tests/test-timed-average
  GTESTER tests/test-qga
  GTESTER tests/test-io-task
  GTESTER tests/test-io-channel-socket
  GTESTER tests/test-io-channel-file
  GTESTER tests/test-io-channel-command
  GTESTER tests/test-io-channel-buffer
  GTESTER tests/test-base64
  GTESTER tests/test-crypto-ivgen
  GTESTER tests/test-crypto-afsplit
  GTESTER tests/test-crypto-xts
  GTESTER tests/test-crypto-block
  GTESTER tests/test-logging
  GTESTER tests/test-replication
  GTESTER tests/test-bufferiszero
  GTESTER tests/test-uuid
  GTESTER tests/ptimer-test
  GTESTER tests/test-qapi-util
  LINK    tests/ipmi-bt-test
  GTESTER tests/test-qobject-output-visitor
  GTESTER tests/test-qobject-input-visitor
  GTESTER tests/test-clone-visitor
  GTESTER tests/test-qmp-commands
  GTESTER tests/test-string-input-visitor
  GTESTER tests/test-string-output-visitor
  GTESTER tests/test-qmp-event
  GTESTER tests/test-opts-visitor
  GTESTER tests/test-qht-par
  GTESTER tests/test-visitor-serialization
  GTESTER check-qtest-x86_64
  GTESTER check-qtest-aarch64
**
ERROR:/tmp/qemu-test/src/tests/test-qga.c:78:fixture_setup: assertion failed (fixture->fd != -1): (-1 != -1)
GTester: last random seed: R02S1021e1f576e14505a92c46f5dfe047db
make: *** [check-tests/test-qga] Error 1
make: *** Waiting for unfinished jobs....
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 382, in <module>
    sys.exit(main())
  File "./tests/docker/docker.py", line 379, in main
    return args.cmdobj.run(args, argv)
  File "./tests/docker/docker.py", line 237, in run
    return Docker().run(argv, args.keep, quiet=args.quiet)
  File "./tests/docker/docker.py", line 205, in run
    quiet=quiet)
  File "./tests/docker/docker.py", line 123, in _do_check
    return subprocess.check_call(self._command + cmd, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['docker', 'run', '--label', 'com.qemu.instance.uuid=33b487c86cfe11e7b706525400c803e1', '-u', '0', '-t', '--rm', '--net=none', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=8', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/var/tmp/patchew-tester-tmp-0962jnus/src/docker-src.2017-07-19-23.47.47.24854:/var/tmp/qemu:z,ro', '-v', '/root/.cache/qemu-docker-ccache:/var/tmp/ccache:z', 'qemu:centos6', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2
tests/docker/Makefile.include:124: recipe for target 'docker-run' failed
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory '/var/tmp/patchew-tester-tmp-0962jnus/src'
tests/docker/Makefile.include:156: recipe for target 'docker-run-test-quick@centos6' failed
make: *** [docker-run-test-quick@centos6] Error 2
=== OUTPUT END ===

Test command exited with code: 2


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 12/43] tcg: convert tb->cflags reads to tb_cflags(tb)
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 12/43] tcg: convert tb->cflags reads to tb_cflags(tb) Emilio G. Cota
@ 2017-07-20  7:22   ` Richard Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2017-07-20  7:22 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel

On 07/19/2017 05:08 PM, Emilio G. Cota wrote:
> Convert all existing readers of tb->cflags to tb_cflags, so that we
> use atomic_read and therefore avoid undefined behaviour in C11.
> 
> Note that the remaining setters/getters of the field are protected
> by tb_lock, and therefore do not need conversion.
> 
> Luckily all readers access the field via 'tb->cflags' (so no foo.cflags,
> bar->cflags in the code base), which makes the conversion easily
> scriptable:
> FILES=$(git grep 'tb->cflags' target include/exec/gen-icount.h | \
> 	cut -f 1 -d':' | sort | uniq)
> perl -pi -e 's/([^>])tb->cflags/$1tb_cflags(tb)/g' $FILES
> perl -pi -e 's/([a-z]*)->tb->cflags/tb_cflags($1->tb)/g' $FILES
> 
> Then manually fixed the few errors that checkpatch reported.
> 
> Compile-tested for all targets.
> 
> Suggested-by: Richard Henderson<rth@twiddle.net>
> Signed-off-by: Emilio G. Cota<cota@braap.org>
> ---

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 16/43] target/m68k: check CF_PARALLEL instead of parallel_cpus
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 16/43] target/m68k: " Emilio G. Cota
@ 2017-07-20  7:23   ` Richard Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2017-07-20  7:23 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel

On 07/19/2017 05:09 PM, Emilio G. Cota wrote:
> Thereby decoupling the resulting translated code from the current state
> of the system.
> 
> Signed-off-by: Emilio G. Cota<cota@braap.org>
> ---
>   target/m68k/helper.h    |  1 +
>   target/m68k/op_helper.c | 33 ++++++++++++++++++++-------------
>   target/m68k/translate.c | 12 ++++++++++--
>   3 files changed, 31 insertions(+), 15 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 17/43] target/s390x: check CF_PARALLEL instead of parallel_cpus
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 17/43] target/s390x: " Emilio G. Cota
@ 2017-07-20  7:25   ` Richard Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2017-07-20  7:25 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel

On 07/19/2017 05:09 PM, Emilio G. Cota wrote:
> Thereby decoupling the resulting translated code from the current state
> of the system.
> 
> Signed-off-by: Emilio G. Cota<cota@braap.org>
> ---
>   target/s390x/helper.h     |  4 +++
>   target/s390x/mem_helper.c | 80 +++++++++++++++++++++++++++++++++++++----------
>   target/s390x/translate.c  | 26 ++++++++++++---
>   3 files changed, 88 insertions(+), 22 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 18/43] target/sh4: check CF_PARALLEL instead of parallel_cpus
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 18/43] target/sh4: " Emilio G. Cota
@ 2017-07-20  7:26   ` Richard Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2017-07-20  7:26 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel

On 07/19/2017 05:09 PM, Emilio G. Cota wrote:
> Thereby decoupling the resulting translated code from the current state
> of the system.
> 
> Signed-off-by: Emilio G. Cota<cota@braap.org>
> ---
>   target/sh4/translate.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps Emilio G. Cota
@ 2017-07-20  7:39   ` Richard Henderson
  2017-07-20 23:53     ` Emilio G. Cota
  0 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2017-07-20  7:39 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel

On 07/19/2017 05:09 PM, Emilio G. Cota wrote:
> Groundwork for supporting multiple TCG contexts.
> 
> While at it, also allocate temps_used directly as a bitmap of the
> required size, instead of having a bitmap of TCG_MAX_TEMPS via
> TCGTempSet.
> 
> Performance-wise we lose about 2% in a translation-heavy workload
> such as booting+shutting down debian-arm:
> 
> Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
> 	-machine type=virt -nographic -smp 1 -m 4096 \
> 	-netdev user,id=unet,hostfwd=tcp::2222-:22 \
> 	-device virtio-net-device,netdev=unet \
> 	-drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
> 	-device virtio-blk-device,drive=myblock \
> 	-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
> 	-name arm,debug-threads=on -smp 1' (10 runs):
> 
> Before:
>        19489.126318 task-clock                #    0.960 CPUs utilized            ( +-  0.96% )
>              23,697 context-switches          #    0.001 M/sec                    ( +-  0.51% )
>                   1 CPU-migrations            #    0.000 M/sec
>              19,953 page-faults               #    0.001 M/sec                    ( +-  0.40% )
>      56,214,402,410 cycles                    #    2.884 GHz                      ( +-  0.95% ) [83.34%]
>      25,516,669,513 stalled-cycles-frontend   #   45.39% frontend cycles idle     ( +-  0.69% ) [83.33%]
>      17,266,165,747 stalled-cycles-backend    #   30.71% backend  cycles idle     ( +-  0.59% ) [66.66%]
>      79,007,843,327 instructions              #    1.41  insns per cycle
>                                               #    0.32  stalled cycles per insn  ( +-  1.19% ) [83.34%]
>      13,136,600,416 branches                  #  674.048 M/sec                    ( +-  1.29% ) [83.34%]
>         274,715,270 branch-misses             #    2.09% of all branches          ( +-  0.79% ) [83.33%]
> 
>        20.300335944 seconds time elapsed                                          ( +-  0.55% )
> 
> After:
>        19917.737030 task-clock                #    0.955 CPUs utilized            ( +-  0.74% )
>              23,973 context-switches          #    0.001 M/sec                    ( +-  0.37% )
>                   1 CPU-migrations            #    0.000 M/sec
>              19,824 page-faults               #    0.001 M/sec                    ( +-  0.38% )
>      57,380,269,537 cycles                    #    2.881 GHz                      ( +-  0.70% ) [83.34%]
>      26,462,452,508 stalled-cycles-frontend   #   46.12% frontend cycles idle     ( +-  0.65% ) [83.34%]
>      17,970,546,047 stalled-cycles-backend    #   31.32% backend  cycles idle     ( +-  0.64% ) [66.67%]
>      79,527,238,334 instructions              #    1.39  insns per cycle
>                                               #    0.33  stalled cycles per insn  ( +-  0.79% ) [83.33%]
>      13,272,362,192 branches                  #  666.359 M/sec                    ( +-  0.83% ) [83.34%]
>         278,357,773 branch-misses             #    2.10% of all branches          ( +-  0.65% ) [83.33%]
> 
>        20.850558455 seconds time elapsed                                          ( +-  0.55% )
> 
> That is, 2.70% slowdown.

That's disappointing.  How about using tcg_malloc?

Maximum allocation is sizeof(tcg_temp_info) * TCG_MAX_TEMPS = 12288, which is 
less than TCG_POOL_CHUNK_SIZE, so we'll retain the allocation in the pool 
across translations.

Otherwise,

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 36/43] tcg: introduce **tcg_ctxs to keep track of all TCGContext's
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 36/43] tcg: introduce **tcg_ctxs to keep track of all TCGContext's Emilio G. Cota
@ 2017-07-20  7:47   ` Richard Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2017-07-20  7:47 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel

On 07/19/2017 05:09 PM, Emilio G. Cota wrote:
> Groundwork for supporting multiple TCG contexts.
> 
> Note that having n_tcg_ctxs is unnecessary. However, it is
> convenient to have it, since it will simplify iterating over the
> array: we'll have just a for loop instead of having to iterate
> over a NULL-terminated array (which would require n+1 elems)
> or having to check with ifdef's for usermode/softmmu.
> 
> Signed-off-by: Emilio G. Cota<cota@braap.org>
> ---
>   tcg/tcg.c | 4 ++++
>   1 file changed, 4 insertions(+)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 39/43] osdep: introduce qemu_mprotect_rwx/none
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 39/43] osdep: introduce qemu_mprotect_rwx/none Emilio G. Cota
@ 2017-07-20  7:49   ` Richard Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2017-07-20  7:49 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel

On 07/19/2017 05:09 PM, Emilio G. Cota wrote:
> Signed-off-by: Emilio G. Cota<cota@braap.org>
> ---
>   include/qemu/osdep.h |  2 ++
>   util/osdep.c         | 41 +++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 43 insertions(+)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 40/43] translate-all: use qemu_protect_rwx/none helpers
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 40/43] translate-all: use qemu_protect_rwx/none helpers Emilio G. Cota
@ 2017-07-20  7:51   ` Richard Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2017-07-20  7:51 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel

On 07/19/2017 05:09 PM, Emilio G. Cota wrote:
> The helpers require the address and size to be page-aligned, so
> do that before calling them.
> 
> Signed-off-by: Emilio G. Cota<cota@braap.org>
> ---
>   accel/tcg/translate-all.c | 61 ++++++++++-------------------------------------
>   1 file changed, 13 insertions(+), 48 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 42/43] tcg: introduce regions to split code_gen_buffer
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 42/43] tcg: introduce regions to split code_gen_buffer Emilio G. Cota
@ 2017-07-20  8:04   ` Richard Henderson
  2017-07-20 20:50     ` Emilio G. Cota
  0 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2017-07-20  8:04 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel

On 07/19/2017 05:09 PM, Emilio G. Cota wrote:
> +    /* We do not yet support multiple TCG contexts, so use one region for now */
> +    n_regions = 1;
> +
> +    /* start on a page-aligned address */
> +    buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
> +    g_assert(buf < tcg_init_ctx.code_gen_buffer + size);
> +
> +    /* discard that initial portion */
> +    size -= buf - tcg_init_ctx.code_gen_buffer;

It seems pointless wasting most of a page after the prologue when n_regions == 
1.  We don't really need to start on a page boundary in that case.

> +    /* make region_size a multiple of page_size */
> +    region_size = size / n_regions;
> +    region_size = QEMU_ALIGN_DOWN(region_size, qemu_real_host_page_size);

This division can result in a number of pages at the end of the region being 
unused.  Is it worthwhile freeing them?  Or marking them mprotect_none along 
with the last guard page?


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 43/43] tcg: enable multiple TCG contexts in softmmu
  2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 43/43] tcg: enable multiple TCG contexts in softmmu Emilio G. Cota
@ 2017-07-20  8:17   ` Richard Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2017-07-20  8:17 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel

On 07/19/2017 05:09 PM, Emilio G. Cota wrote:
> +#ifdef CONFIG_SOFTMMU
> +/*
> + * It is likely that some vCPUs will translate more code than others, so we
> + * first try to set more regions than smp_cpus, with those regions being of
> + * reasonable size. If that's not possible we make do by evenly dividing
> + * the code_gen_buffer among the vCPUs.
> + */
> +static size_t tcg_n_regions(void)

Use #ifdef CONFIG_USER_ONLY here.

In the back of your mind, please remove the SOFTMMU == !USER_ONLY equality that 
tends to pervade the code base at present.

At some point in the (distant) future I want to be able to enable SOFTMMU for 
USER_ONLY.  There is *lots* that doesn't work when the host page size != guest 
page size.  Which is becoming more and more common (on non-x86) as enterprise 
operating systems enable 64k pages.

Otherwise,

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/43] tcg: define CF_PARALLEL and use it for TB hashing
  2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 11/43] tcg: define CF_PARALLEL and use it for TB hashing Emilio G. Cota
@ 2017-07-20  8:45   ` Richard Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2017-07-20  8:45 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel

On 07/19/2017 05:08 PM, Emilio G. Cota wrote:
> @@ -225,31 +226,27 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
>   static void cpu_exec_step(CPUState *cpu)
>   {
>       CPUClass *cc = CPU_GET_CLASS(cpu);
> -    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
>       TranslationBlock *tb;
>       target_ulong cs_base, pc;
>       uint32_t flags;
> +    uint32_t cflags = 1 | CF_IGNORE_ICOUNT;
>   
> -    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>       if (sigsetjmp(cpu->jmp_env, 0) == 0) {
> -        mmap_lock();
> -        tb_lock();
> -        tb = tb_gen_code(cpu, pc, cs_base, flags,
> -                         1 | CF_NOCACHE | CF_IGNORE_ICOUNT);
> -        tb->orig_tb = NULL;
> -        tb_unlock();
> -        mmap_unlock();
> +        tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags,
> +                                  cflags & CF_HASH_MASK);
> +        if (tb == NULL) {
> +            mmap_lock();
> +            tb_lock();
> +            tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
> +            tb_unlock();
> +            mmap_unlock();
> +        }
>   

The only thing I'm worried about here is that the correctness of this *relies* 
on the tb_flush that happens when enabling parallel_cpus.  Consider:

Prior to multiple threads, we create a TB with pc=X, cflags=0, which happens to 
begin with an atomic operation.  Later, after cloning the first thread, we 
raise EXCP_ATOMIC at pc=X.  We'll ask for an existing tb with 
cflags=1|CF_IGNORE_ICOUNT, but since we don't compare any of those bits, we 
return the original TB.

Worse, that original TB may have populated links to other TBs and control may 
"escape" from the one TB, vastly extending the amount of code we execute under 
the exclusive lock.

The only reason this escape does not happen now is that (1) we generate new TBs 
and (2) cpu_exec_step does not fill in links, so that (3) any goto_tb 
degenerate to jump-to-next-host-insn, and (4) front-ends are unlikely to 
generate goto_ptr when ending a TB containing a single instruction.

Some of the above seems extremely fragile.

This suggests to me the usefulness of a CF_NOCHAIN flag, which would cause 
tcg-op.c to suppress emitting any code that chains between blocks.  No need for 
you to add that in this patch set, however, your tcg_ctx.cf_parallel should 
simply be tcg_ctx.tb_cflags, and s/cf_parallel/tb_cflags & CF_PARALLEL/.  Then 
it is trivial for me to come along and test tb_cflags & CF_NOCHAIN.

Anyway, back to the current patch.  Leaving CF_IGNORE_ICOUNT alone for the 
moment, that implies adding CF_COUNT_MASK to CF_HASH_MASK.

If we were to clean up CF_IGNORE_ICOUNT, then I think we could compare all bits 
of cflags.  You need not do that in this patch set.



r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 42/43] tcg: introduce regions to split code_gen_buffer
  2017-07-20  8:04   ` Richard Henderson
@ 2017-07-20 20:50     ` Emilio G. Cota
  2017-07-20 21:22       ` Richard Henderson
  0 siblings, 1 reply; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20 20:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Wed, Jul 19, 2017 at 22:04:50 -1000, Richard Henderson wrote:
> On 07/19/2017 05:09 PM, Emilio G. Cota wrote:
> >+    /* We do not yet support multiple TCG contexts, so use one region for now */
> >+    n_regions = 1;
> >+
> >+    /* start on a page-aligned address */
> >+    buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
> >+    g_assert(buf < tcg_init_ctx.code_gen_buffer + size);
> >+
> >+    /* discard that initial portion */
> >+    size -= buf - tcg_init_ctx.code_gen_buffer;
> 
> It seems pointless wasting most of a page after the prologue when n_regions
> == 1.  We don't really need to start on a page boundary in that case.
> 
> >+    /* make region_size a multiple of page_size */
> >+    region_size = size / n_regions;
> >+    region_size = QEMU_ALIGN_DOWN(region_size, qemu_real_host_page_size);
> 
> This division can result in a number of pages at the end of the region being
> unused.  Is it worthwhile freeing them?  Or marking them mprotect_none along
> with the last guard page?

Perhaps we should then enlarge both the first and last regions so that we
fully use the buffer.

What do you think of the below? It's a delta over the v3 patch.

		Emilio

---8<---

diff --git a/tcg/tcg.c b/tcg/tcg.c
index a5c01be..8bf8bca 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -129,7 +129,8 @@ static unsigned int n_tcg_ctxs;
  */
 struct tcg_region_state {
     QemuMutex lock;
-    void *buf;      /* set at init time */
+    void *start;    /* set at init time */
+    void *end;      /* set at init time */
     size_t n;       /* set at init time */
     size_t size;    /* size of one region; set at init time */
     size_t current; /* protected by the lock */
@@ -277,13 +278,27 @@ TCGLabel *gen_new_label(void)
 
 static void tcg_region_assign(TCGContext *s, size_t curr_region)
 {
+    void *aligned = QEMU_ALIGN_PTR_UP(region.start, qemu_real_host_page_size);
     void *buf;
+    size_t size = region.size;
 
-    buf = region.buf + curr_region * (region.size + qemu_real_host_page_size);
+    /* The beginning of the first region might not be page-aligned */
+    if (curr_region == 0) {
+        buf = region.start;
+        size += aligned - buf;
+    } else {
+        buf = aligned + curr_region * (region.size +  qemu_real_host_page_size);
+    }
+    /* the last region might be larger than region.size */
+    if (curr_region == region.n - 1) {
+        void *aligned_end = buf + size;
+
+        size += region.end - qemu_real_host_page_size - aligned_end;
+    }
     s->code_gen_buffer = buf;
     s->code_gen_ptr = buf;
-    s->code_gen_buffer_size = region.size;
-    s->code_gen_highwater = buf + region.size - TCG_HIGHWATER;
+    s->code_gen_buffer_size = size;
+    s->code_gen_highwater = buf + size - TCG_HIGHWATER;
 }
 
 static bool tcg_region_alloc__locked(TCGContext *s)
@@ -409,44 +424,50 @@ static size_t tcg_n_regions(void)
 void tcg_region_init(void)
 {
     void *buf = tcg_init_ctx.code_gen_buffer;
+    void *aligned;
     size_t size = tcg_init_ctx.code_gen_buffer_size;
+    size_t page_size = qemu_real_host_page_size;
     size_t region_size;
     size_t n_regions;
     size_t i;
+    int rc;
 
     n_regions = tcg_n_regions();
 
-    /* start on a page-aligned address */
-    buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
-    g_assert(buf < tcg_init_ctx.code_gen_buffer + size);
-
-    /* discard that initial portion */
-    size -= buf - tcg_init_ctx.code_gen_buffer;
-
-    /* make region_size a multiple of page_size */
-    region_size = size / n_regions;
-    region_size = QEMU_ALIGN_DOWN(region_size, qemu_real_host_page_size);
+    /* The first region will be 'aligned - buf' bytes larger than the others */
+    aligned = QEMU_ALIGN_PTR_UP(buf, page_size);
+    g_assert(aligned < tcg_init_ctx.code_gen_buffer + size);
+    /*
+     * Make region_size a multiple of page_size, using aligned as the start.
+     * As a result of this we might end up with a few extra pages at the end of
+     * the buffer; we will assign those to the last region.
+     */
+    region_size = (size - (aligned - buf)) / n_regions;
+    region_size = QEMU_ALIGN_DOWN(region_size, page_size);
 
     /* A region must have at least 2 pages; one code, one guard */
-    g_assert(region_size >= 2 * qemu_real_host_page_size);
+    g_assert(region_size >= 2 * page_size);
 
     /* init the region struct */
     qemu_mutex_init(&region.lock);
     region.n = n_regions;
-    region.buf = buf;
+    region.start = buf;
+    /* page-align the end, since its last page will be a guard page */
+    region.end = QEMU_ALIGN_PTR_DOWN(buf + size, page_size);
     /* do not count the guard page in region.size */
-    region.size = region_size - qemu_real_host_page_size;
+    region.size = region_size - page_size;
 
-    /* set guard pages */
-    for (i = 0; i < region.n; i++) {
+    /* set guard pages for the first n-1 regions */
+    for (i = 0; i < region.n - 1; i++) {
         void *guard;
-        int rc;
 
-        guard = region.buf + region.size;
-        guard += i * (region.size + qemu_real_host_page_size);
-        rc = qemu_mprotect_none(guard, qemu_real_host_page_size);
+        guard = aligned + region.size + i * (region.size + page_size);
+        rc = qemu_mprotect_none(guard, page_size);
         g_assert(!rc);
     }
+    /* the last region gets the rest of the buffer */
+    rc = qemu_mprotect_none(region.end - page_size, page_size);
+    g_assert(!rc);
 
     /* In user-mode we support only one ctx, so do the initial allocation now */
 #ifdef CONFIG_USER_ONLY

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 42/43] tcg: introduce regions to split code_gen_buffer
  2017-07-20 20:50     ` Emilio G. Cota
@ 2017-07-20 21:22       ` Richard Henderson
  2017-07-20 23:23         ` Emilio G. Cota
  0 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2017-07-20 21:22 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel

On 07/20/2017 10:50 AM, Emilio G. Cota wrote:
> On Wed, Jul 19, 2017 at 22:04:50 -1000, Richard Henderson wrote:
>> On 07/19/2017 05:09 PM, Emilio G. Cota wrote:
>>> +    /* We do not yet support multiple TCG contexts, so use one region for now */
>>> +    n_regions = 1;
>>> +
>>> +    /* start on a page-aligned address */
>>> +    buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
>>> +    g_assert(buf < tcg_init_ctx.code_gen_buffer + size);
>>> +
>>> +    /* discard that initial portion */
>>> +    size -= buf - tcg_init_ctx.code_gen_buffer;
>>
>> It seems pointless wasting most of a page after the prologue when n_regions
>> == 1.  We don't really need to start on a page boundary in that case.
>>
>>> +    /* make region_size a multiple of page_size */
>>> +    region_size = size / n_regions;
>>> +    region_size = QEMU_ALIGN_DOWN(region_size, qemu_real_host_page_size);
>>
>> This division can result in a number of pages at the end of the region being
>> unused.  Is it worthwhile freeing them?  Or marking them mprotect_none along
>> with the last guard page?
> 
> Perhaps we should then enlarge both the first and last regions so that we
> fully use the buffer.

I really like the idea.  That's a lot of space recovered for 64k page hosts.

I do think we can make the computation clearer.  How about


>   static void tcg_region_assign(TCGContext *s, size_t curr_region)
>   {
> +    void *aligned = QEMU_ALIGN_PTR_UP(region.start, qemu_real_host_page_size);
>       void *buf;
> +    size_t size = region.size;
>   
> -    buf = region.buf + curr_region * (region.size + qemu_real_host_page_size);
> +    /* The beginning of the first region might not be page-aligned */
> +    if (curr_region == 0) {
> +        buf = region.start;
> +        size += aligned - buf;
> +    } else {
> +        buf = aligned + curr_region * (region.size +  qemu_real_host_page_size);
> +    }
> +    /* the last region might be larger than region.size */
> +    if (curr_region == region.n - 1) {
> +        void *aligned_end = buf + size;
> +
> +        size += region.end - qemu_real_host_page_size - aligned_end;
> +    }
>       s->code_gen_buffer = buf;
>       s->code_gen_ptr = buf;
> -    s->code_gen_buffer_size = region.size;
> -    s->code_gen_highwater = buf + region.size - TCG_HIGHWATER;
> +    s->code_gen_buffer_size = size;
> +    s->code_gen_highwater = buf + size - TCG_HIGHWATER;
>   }

static inline void tcg_region_bounds(TCGContext *s, size_t curr_region,
                                      void **pstart, void **pend)
{
   void *start, *end;

   /* ??? Maybe store "aligned" precomputed.  */
   start = QEMU_ALIGN_PTR_UP(region.start, qemu_real_host_page_size);
   /* ??? Maybe store "stride" precomputed.  */
   start += curr_region * (region.size + qemu_real_host_page_size);
   end = start + region.size;

   if (curr_region == 0) {
     start = region.start;
   }
   if (curr_region == region.n - 1) {
     end = region.end;
   }

   *pstart = start;
   *pend = end;
}

static void tcg_region_assign(TCGContext *s, size_t curr_region)
{
   void *start, *end;

   tcg_region_bounds(s, curr_region, &start, &end);

   s->code_gen_buffer = start;
   s->code_gen_ptr = start;
   s->code_gen_buffer_size = end - start;
   s->code_gen_highwater = end - TCG_HIGHWATER;
}

> @@ -409,44 +424,50 @@ static size_t tcg_n_regions(void)
>   void tcg_region_init(void)
>   {
>       void *buf = tcg_init_ctx.code_gen_buffer;
> +    void *aligned;
>       size_t size = tcg_init_ctx.code_gen_buffer_size;
> +    size_t page_size = qemu_real_host_page_size;
>       size_t region_size;
>       size_t n_regions;
>       size_t i;
> +    int rc;
>   
>       n_regions = tcg_n_regions();
>   
> -    /* start on a page-aligned address */
> -    buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
> -    g_assert(buf < tcg_init_ctx.code_gen_buffer + size);
> -
> -    /* discard that initial portion */
> -    size -= buf - tcg_init_ctx.code_gen_buffer;
> -
> -    /* make region_size a multiple of page_size */
> -    region_size = size / n_regions;
> -    region_size = QEMU_ALIGN_DOWN(region_size, qemu_real_host_page_size);
> +    /* The first region will be 'aligned - buf' bytes larger than the others */
> +    aligned = QEMU_ALIGN_PTR_UP(buf, page_size);
> +    g_assert(aligned < tcg_init_ctx.code_gen_buffer + size);
> +    /*
> +     * Make region_size a multiple of page_size, using aligned as the start.
> +     * As a result of this we might end up with a few extra pages at the end of
> +     * the buffer; we will assign those to the last region.
> +     */
> +    region_size = (size - (aligned - buf)) / n_regions;
> +    region_size = QEMU_ALIGN_DOWN(region_size, page_size);
>   
>       /* A region must have at least 2 pages; one code, one guard */
> -    g_assert(region_size >= 2 * qemu_real_host_page_size);
> +    g_assert(region_size >= 2 * page_size);
>   
>       /* init the region struct */
>       qemu_mutex_init(&region.lock);
>       region.n = n_regions;
> -    region.buf = buf;
> +    region.start = buf;
> +    /* page-align the end, since its last page will be a guard page */
> +    region.end = QEMU_ALIGN_PTR_DOWN(buf + size, page_size);
>       /* do not count the guard page in region.size */
> -    region.size = region_size - qemu_real_host_page_size;
> +    region.size = region_size - page_size;
>   
> -    /* set guard pages */
> -    for (i = 0; i < region.n; i++) {
> +    /* set guard pages for the first n-1 regions */
> +    for (i = 0; i < region.n - 1; i++) {
>           void *guard;
> -        int rc;
>   
> -        guard = region.buf + region.size;
> -        guard += i * (region.size + qemu_real_host_page_size);
> -        rc = qemu_mprotect_none(guard, qemu_real_host_page_size);
> +        guard = aligned + region.size + i * (region.size + page_size);
> +        rc = qemu_mprotect_none(guard, page_size);
>           g_assert(!rc);
>       }
> +    /* the last region gets the rest of the buffer */
> +    rc = qemu_mprotect_none(region.end - page_size, page_size);
> +    g_assert(!rc);

   for (i = region.n; i-- > 0; ) {
     void *start, *end;
     tcg_region_bounds(s, i, &start, &end);
     rc = qemu_mprotect_none(end, page_size);
     g_assert(rc == 0);
   }



r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 42/43] tcg: introduce regions to split code_gen_buffer
  2017-07-20 21:22       ` Richard Henderson
@ 2017-07-20 23:23         ` Emilio G. Cota
  2017-07-21  0:07           ` Richard Henderson
  0 siblings, 1 reply; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20 23:23 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, Jul 20, 2017 at 11:22:10 -1000, Richard Henderson wrote:
> >Perhaps we should then enlarge both the first and last regions so that we
> >fully use the buffer.
> 
> I really like the idea.  That's a lot of space recovered for 64k page hosts.
> 
> I do think we can make the computation clearer.  How about

(snip)
> 
> static inline void tcg_region_bounds(TCGContext *s, size_t curr_region,
>                                      void **pstart, void **pend)
> {
>   void *start, *end;
> 
>   /* ??? Maybe store "aligned" precomputed.  */
>   start = QEMU_ALIGN_PTR_UP(region.start, qemu_real_host_page_size);
>   /* ??? Maybe store "stride" precomputed.  */
>   start += curr_region * (region.size + qemu_real_host_page_size);
>   end = start + region.size;
> 
>   if (curr_region == 0) {
>     start = region.start;
>   }
>   if (curr_region == region.n - 1) {
>     end = region.end;
>   }
> 
>   *pstart = start;
>   *pend = end;
> }

That's a nice helper -- will do it this way.

For v4, should I send all patches again, or just the handful of patches
that are changing from v3?

		E.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps
  2017-07-20  7:39   ` Richard Henderson
@ 2017-07-20 23:53     ` Emilio G. Cota
  2017-07-21  0:02       ` Richard Henderson
  0 siblings, 1 reply; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-20 23:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Wed, Jul 19, 2017 at 21:39:35 -1000, Richard Henderson wrote:
> On 07/19/2017 05:09 PM, Emilio G. Cota wrote:
> >Groundwork for supporting multiple TCG contexts.
> >That is, 2.70% slowdown.
> 
> That's disappointing.  How about using tcg_malloc?
> 
> Maximum allocation is sizeof(tcg_temp_info) * TCG_MAX_TEMPS = 12288, which
> is less than TCG_POOL_CHUNK_SIZE, so we'll retain the allocation in the pool
> across translations.

             exec time (s)  Relative slowdown wrt original (%)
---------------------------------------------------------------
 original     20.213321616                                  0.
 tcg_malloc   20.441130078                           1.1270214
 TCGContext   20.477846517                           1.3086662
 g_malloc     20.780527895                           2.8061013

So will go with tcg_malloc.

BTW, is there any chance that the pool will be initialized before we copy
tcg_init_ctx? That'd mean the main thread has performed translation, which
seems unlikely to me. But should then we bother clearing the TCGProfile
counters after we copy tcg_init_ctx? I don't see how without translation
counters would be !0.

		E.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps
  2017-07-20 23:53     ` Emilio G. Cota
@ 2017-07-21  0:02       ` Richard Henderson
  2017-07-21  5:04         ` Emilio G. Cota
  0 siblings, 1 reply; 63+ messages in thread
From: Richard Henderson @ 2017-07-21  0:02 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel

On 07/20/2017 01:53 PM, Emilio G. Cota wrote:
> BTW, is there any chance that the pool will be initialized before we copy
> tcg_init_ctx? That'd mean the main thread has performed translation, which
> seems unlikely to me. But should then we bother clearing the TCGProfile
> counters after we copy tcg_init_ctx? I don't see how without translation
> counters would be !0.

I wouldn't think so.  This cpu setup should be happening very early.

We could perhaps look at arranging fields such that all the fields that are 
"shared" between the contexts are up front, and use the qemu standard

   memcpy(new, old, offsetof(TCGContext, end_common_fields));

trick, and zero the rest.


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 42/43] tcg: introduce regions to split code_gen_buffer
  2017-07-20 23:23         ` Emilio G. Cota
@ 2017-07-21  0:07           ` Richard Henderson
  0 siblings, 0 replies; 63+ messages in thread
From: Richard Henderson @ 2017-07-21  0:07 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-devel

On 07/20/2017 01:23 PM, Emilio G. Cota wrote:
> That's a nice helper -- will do it this way.
> 
> For v4, should I send all patches again, or just the handful of patches
> that are changing from v3?

I don't mind if you just send the changed patches.
Just be sure to reference the tree in the cover letter.


r~

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps
  2017-07-21  0:02       ` Richard Henderson
@ 2017-07-21  5:04         ` Emilio G. Cota
  0 siblings, 0 replies; 63+ messages in thread
From: Emilio G. Cota @ 2017-07-21  5:04 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, Jul 20, 2017 at 14:02:53 -1000, Richard Henderson wrote:
> On 07/20/2017 01:53 PM, Emilio G. Cota wrote:
> >BTW, is there any chance that the pool will be initialized before we copy
> >tcg_init_ctx? That'd mean the main thread has performed translation, which
> >seems unlikely to me. But should then we bother clearing the TCGProfile
> >counters after we copy tcg_init_ctx? I don't see how without translation
> >counters would be !0.
> 
> I wouldn't think so.  This cpu setup should be happening very early.

OK. I've removed the clearing of prof in v4.

> We could perhaps look at arranging fields such that all the fields that are
> "shared" between the contexts are up front, and use the qemu standard
> 
>   memcpy(new, old, offsetof(TCGContext, end_common_fields));
> 
> trick, and zero the rest.

It'll be much faster if you do this because you're familiar with all
the fields in there (I'm not); I've added this to the "to do later"
list in v4's cover letter so that we do not forget.

v4 coming up.

		E.

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2017-07-21  5:04 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-20  3:08 [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts Emilio G. Cota
2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 01/43] cputlb: bring back tlb_flush_count under !TLB_DEBUG Emilio G. Cota
2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 02/43] tcg: fix corruption of code_time profiling counter upon tb_flush Emilio G. Cota
2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 03/43] exec-all: fix typos in TranslationBlock's documentation Emilio G. Cota
2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 04/43] translate-all: make have_tb_lock static Emilio G. Cota
2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 05/43] cpu-exec: rename have_tb_lock to acquired_tb_lock in tb_find Emilio G. Cota
2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 06/43] tcg/i386: constify tcg_target_callee_save_regs Emilio G. Cota
2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 07/43] tcg/mips: " Emilio G. Cota
2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 08/43] tcg: remove addr argument from lookup_tb_ptr Emilio G. Cota
2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 09/43] tcg: consolidate TB lookups in tb_lookup__cpu_state Emilio G. Cota
2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 10/43] exec-all: bring tb->invalid into tb->cflags Emilio G. Cota
2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 11/43] tcg: define CF_PARALLEL and use it for TB hashing Emilio G. Cota
2017-07-20  8:45   ` Richard Henderson
2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 12/43] tcg: convert tb->cflags reads to tb_cflags(tb) Emilio G. Cota
2017-07-20  7:22   ` Richard Henderson
2017-07-20  3:08 ` [Qemu-devel] [PATCH v3 13/43] target/arm: check CF_PARALLEL instead of parallel_cpus Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 14/43] target/hppa: " Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 15/43] target/i386: " Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 16/43] target/m68k: " Emilio G. Cota
2017-07-20  7:23   ` Richard Henderson
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 17/43] target/s390x: " Emilio G. Cota
2017-07-20  7:25   ` Richard Henderson
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 18/43] target/sh4: " Emilio G. Cota
2017-07-20  7:26   ` Richard Henderson
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 19/43] target/sparc: " Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 20/43] tcg: " Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 21/43] cpu-exec: lookup/generate TB outside exclusive region during step_atomic Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 22/43] translate-all: define and use DEBUG_TB_FLUSH_GATE Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 23/43] exec-all: introduce TB_PAGE_ADDR_FMT Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 24/43] translate-all: define and use DEBUG_TB_INVALIDATE_GATE Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 25/43] translate-all: define and use DEBUG_TB_CHECK_GATE Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 26/43] exec-all: extract tb->tc_* into a separate struct tc_tb Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 27/43] translate-all: use a binary search tree to track TBs in TBContext Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 28/43] exec-all: rename tb_free to tb_remove Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 29/43] translate-all: report correct avg host TB size Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 30/43] tci: move tci_regs to tcg_qemu_tb_exec's stack Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 31/43] tcg: take tb_ctx out of TCGContext Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 32/43] tcg: take .helpers " Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 33/43] tcg: define tcg_init_ctx and make tcg_ctx a pointer Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 34/43] gen-icount: fold exitreq_label into TCGContext Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps Emilio G. Cota
2017-07-20  7:39   ` Richard Henderson
2017-07-20 23:53     ` Emilio G. Cota
2017-07-21  0:02       ` Richard Henderson
2017-07-21  5:04         ` Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 36/43] tcg: introduce **tcg_ctxs to keep track of all TCGContext's Emilio G. Cota
2017-07-20  7:47   ` Richard Henderson
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 37/43] tcg: distribute profiling counters across TCGContext's Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 38/43] util: move qemu_real_host_page_size/mask to osdep.h Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 39/43] osdep: introduce qemu_mprotect_rwx/none Emilio G. Cota
2017-07-20  7:49   ` Richard Henderson
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 40/43] translate-all: use qemu_protect_rwx/none helpers Emilio G. Cota
2017-07-20  7:51   ` Richard Henderson
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 41/43] tcg: define TCG_HIGHWATER Emilio G. Cota
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 42/43] tcg: introduce regions to split code_gen_buffer Emilio G. Cota
2017-07-20  8:04   ` Richard Henderson
2017-07-20 20:50     ` Emilio G. Cota
2017-07-20 21:22       ` Richard Henderson
2017-07-20 23:23         ` Emilio G. Cota
2017-07-21  0:07           ` Richard Henderson
2017-07-20  3:09 ` [Qemu-devel] [PATCH v3 43/43] tcg: enable multiple TCG contexts in softmmu Emilio G. Cota
2017-07-20  8:17   ` Richard Henderson
2017-07-20  4:05 ` [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts no-reply

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.