All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/93] TCI fixes and cleanups
@ 2021-02-04  1:43 Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 01/93] gdbstub: Fix handle_query_xfer_auxv Richard Henderson
                   ` (95 more replies)
  0 siblings, 96 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Almost 7 years ago I detailed 5 major problems in tci[1], of
which three still remain:

  * Unaligned accesses to the bytecode stream, which means
    that we immediately SIGBUS on any host requiring alignment.
  * Non-portable calls to helper functions.
  * Full of useless ifdefs and TODOs.

To my mind, this means the code is unmaintained, despite what it
says in MAINTAINERS.  Thus tci *should* be simply removed.
However, every time removal is suggested, someone comes out of the
woodwork and says we should keep it, because it's useful for $FOO.

Anyway, if we're not going to remove tci, then we have to fix it.

Previously, I rewrote tci all in one lump.  Which was obviously a
mistake, since it meant that the patch was never going to get
reviewed.  This time I've done the rewrite in tiny pieces.

Previously, I invented a moderately complex encoding scheme that
allowed any operand to encode a register or an int32_t immediate.
This time the encoding is quite simple: with only 4 exceptions
all operands encoded into 4-bit slots (registers & conditions).
I rely on the new TEMP_CONST optimization to do a decent job of
loading an immediate value into a register once (via movi), and
reusing that across the TB.

There is a disassembler built into tcg/tci.c, replacing the stub
in disas/tci.c, which reuses the same decoding helpers that are
used by the interpreter.  So finally -d out_asm is useful.

This is good enough to pass make check check-tcg with all of the
docker cross-compilers enabled.  I can boot linux with aarch64,
alpha, and s390x guests.


r~


Based-on: 20210203021550.375058-1-richard.henderson@linaro.org
("[PULL 00/24] tcg patch queue")

[1] https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg02594.html

Richard Henderson (91):
  gdbstub: Fix handle_query_xfer_auxv
  tcg: Split out tcg_raise_tb_overflow
  configure: Fix --enable-tcg-interpreter
  tcg: Manage splitwx in tc_ptr_to_region_tree by hand
  tcg/tci: Make tci_tb_ptr thread-local
  tcg/tci: Inline tci_write_reg32s into the only caller
  tcg/tci: Inline tci_write_reg8 into its callers
  tcg/tci: Inline tci_write_reg16 into the only caller
  tcg/tci: Inline tci_write_reg32 into all callers
  tcg/tci: Inline tci_write_reg64 into 64-bit callers
  tcg/tci: Merge INDEX_op_ld8u_{i32,i64}
  tcg/tci: Merge INDEX_op_ld8s_{i32,i64}
  tcg/tci: Merge INDEX_op_ld16u_{i32,i64}
  tcg/tci: Merge INDEX_op_ld16s_{i32,i64}
  tcg/tci: Merge INDEX_op_{ld_i32,ld32u_i64}
  tcg/tci: Merge INDEX_op_st8_{i32,i64}
  tcg/tci: Merge INDEX_op_st16_{i32,i64}
  tcg/tci: Move stack bounds check to compile-time
  tcg/tci: Merge INDEX_op_{st_i32,st32_i64}
  tcg/tci: Use g_assert_not_reached
  tcg/tci: Remove dead code for TCG_TARGET_HAS_div2_*
  tcg/tci: Implement 64-bit division
  tcg/tci: Remove TODO as unused
  tcg/tci: Restrict TCG_TARGET_NB_REGS to 16
  tcg/tci: Fix TCG_REG_R4 misusage
  tcg/tci: Use bool in tcg_out_ri*
  tcg/tci: Remove TCG_CONST
  tcg/tci: Merge identical cases in generation
  tcg/tci: Remove tci_read_r8
  tcg/tci: Remove tci_read_r8s
  tcg/tci: Remove tci_read_r16
  tcg/tci: Remove tci_read_r16s
  tcg/tci: Remove tci_read_r32s
  tcg/tci: Reduce use of tci_read_r64
  tcg/tci: Merge basic arithmetic operations
  tcg/tci: Merge extension operations
  tcg/tci: Remove ifdefs for TCG_TARGET_HAS_ext32[us]_i64
  tcg/tci: Merge bswap operations
  tcg/tci: Merge mov, not and neg operations
  tcg/tci: Rename tci_read_r to tci_read_rval
  tcg/tci: Split out tci_args_rrs
  tcg/tci: Split out tci_args_rr
  tcg/tci: Split out tci_args_rrr
  tcg/tci: Split out tci_args_rrrc
  tcg/tci: Split out tci_args_l
  tcg/tci: Split out tci_args_rrrrrc
  tcg/tci: Split out tci_args_rrcl and tci_args_rrrrcl
  tcg/tci: Split out tci_args_ri and tci_args_rI
  tcg/tci: Reuse tci_args_l for calls.
  tcg/tci: Reuse tci_args_l for exit_tb
  tcg/tci: Reuse tci_args_l for goto_tb
  tcg/tci: Split out tci_args_rrrrrr
  tcg/tci: Split out tci_args_rrrr
  tcg/tci: Clean up deposit operations
  tcg/tci: Reduce qemu_ld/st TCGMemOpIdx operand to 32-bits
  tcg/tci: Split out tci_args_{rrm,rrrm,rrrrm}
  tcg/tci: Hoist op_size checking into tci_args_*
  tcg/tci: Remove tci_disas
  tcg/tci: Implement the disassembler properly
  tcg: Build ffi data structures for helpers
  tcg/tci: Use ffi for calls
  tcg/tci: Improve tcg_target_call_clobber_regs
  tcg/tci: Move call-return regs to end of tcg_target_reg_alloc_order
  tcg/tci: Push opcode emit into each case
  tcg/tci: Split out tcg_out_op_rrs
  tcg/tci: Split out tcg_out_op_l
  tcg/tci: Split out tcg_out_op_p
  tcg/tci: Split out tcg_out_op_rr
  tcg/tci: Split out tcg_out_op_rrr
  tcg/tci: Split out tcg_out_op_rrrc
  tcg/tci: Split out tcg_out_op_rrrrrc
  tcg/tci: Split out tcg_out_op_rrrbb
  tcg/tci: Split out tcg_out_op_rrcl
  tcg/tci: Split out tcg_out_op_rrrrrr
  tcg/tci: Split out tcg_out_op_rrrr
  tcg/tci: Split out tcg_out_op_rrrrcl
  tcg/tci: Split out tcg_out_op_{rrm,rrrm,rrrrm}
  tcg/tci: Split out tcg_out_op_v
  tcg/tci: Split out tcg_out_op_np
  tcg/tci: Split out tcg_out_op_r[iI]
  tcg/tci: Reserve r13 for a temporary
  tcg/tci: Emit setcond before brcond
  tcg/tci: Remove tci_write_reg
  tcg/tci: Change encoding to uint32_t units
  tcg/tci: Implement goto_ptr
  tcg/tci: Implement movcond
  tcg/tci: Implement andc, orc, eqv, nand, nor
  tcg/tci: Implement extract, sextract
  tcg/tci: Implement clz, ctz, ctpop
  tcg/tci: Implement mulu2, muls2
  tcg/tci: Implement add2, sub2

Stefan Weil (2):
  tcg/tci: Implement INDEX_op_ld16s_i32
  tcg/tci: Implement INDEX_op_ld8s_i64

 configure                              |    5 +-
 meson.build                            |    9 +-
 include/exec/exec-all.h                |    2 +-
 include/exec/helper-ffi.h              |  115 ++
 include/exec/helper-tcg.h              |   24 +-
 include/tcg/tcg-opc.h                  |    6 +-
 include/tcg/tcg.h                      |    1 +
 target/hppa/helper.h                   |    2 +
 target/i386/ops_sse_header.h           |    6 +
 target/m68k/helper.h                   |    1 +
 target/ppc/helper.h                    |    3 +
 tcg/tci/tcg-target-con-set.h           |    8 +-
 tcg/tci/tcg-target.h                   |  118 +-
 disas/tci.c                            |   61 -
 gdbstub.c                              |   17 +-
 tcg/tcg-common.c                       |    4 -
 tcg/tcg.c                              |  117 +-
 tcg/tci.c                              | 1695 +++++++++++++-----------
 tcg/tci/tcg-target.c.inc               |  989 +++++++-------
 tcg/tci/README                         |   20 +-
 tests/docker/dockerfiles/fedora.docker |    1 +
 21 files changed, 1734 insertions(+), 1470 deletions(-)
 create mode 100644 include/exec/helper-ffi.h
 delete mode 100644 disas/tci.c

-- 
2.25.1



^ permalink raw reply	[flat|nested] 129+ messages in thread

* [PATCH v2 01/93] gdbstub: Fix handle_query_xfer_auxv
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 02/93] tcg: Split out tcg_raise_tb_overflow Richard Henderson
                   ` (94 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

The main problem was that we were treating a guest address
as a host address with a mere cast.

Use the correct interface for accessing guest memory.  Do not
allow offset == auxv_len, which would result in an empty packet.

Fixes: 51c623b0de1 ("gdbstub: add support to Xfer:auxv:read: packet")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 gdbstub.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index c7ca7e9f88..759bb00bcf 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -2245,7 +2245,6 @@ static void handle_query_xfer_auxv(GdbCmdContext *gdb_ctx, void *user_ctx)
 {
     TaskState *ts;
     unsigned long offset, len, saved_auxv, auxv_len;
-    const char *mem;
 
     if (gdb_ctx->num_params < 2) {
         put_packet("E22");
@@ -2257,8 +2256,8 @@ static void handle_query_xfer_auxv(GdbCmdContext *gdb_ctx, void *user_ctx)
     ts = gdbserver_state.c_cpu->opaque;
     saved_auxv = ts->info->saved_auxv;
     auxv_len = ts->info->auxv_len;
-    mem = (const char *)(saved_auxv + offset);
-    if (offset > auxv_len) {
+
+    if (offset >= auxv_len) {
         put_packet("E00");
         return;
     }
@@ -2269,12 +2268,20 @@ static void handle_query_xfer_auxv(GdbCmdContext *gdb_ctx, void *user_ctx)
 
     if (len < auxv_len - offset) {
         g_string_assign(gdbserver_state.str_buf, "m");
-        memtox(gdbserver_state.str_buf, mem, len);
     } else {
         g_string_assign(gdbserver_state.str_buf, "l");
-        memtox(gdbserver_state.str_buf, mem, auxv_len - offset);
+        len = auxv_len - offset;
     }
 
+    g_byte_array_set_size(gdbserver_state.mem_buf, len);
+    if (target_memory_rw_debug(gdbserver_state.g_cpu, saved_auxv + offset,
+                               gdbserver_state.mem_buf->data, len, false)) {
+        put_packet("E14");
+        return;
+    }
+
+    memtox(gdbserver_state.str_buf,
+           (const char *)gdbserver_state.mem_buf->data, len);
     put_packet_binary(gdbserver_state.str_buf->str,
                       gdbserver_state.str_buf->len, true);
 }
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 02/93] tcg: Split out tcg_raise_tb_overflow
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 01/93] gdbstub: Fix handle_query_xfer_auxv Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 03/93] configure: Fix --enable-tcg-interpreter Richard Henderson
                   ` (93 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Allow other places in tcg to restart with a smaller tb.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 63a12b197b..bbe3dcee03 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -346,6 +346,12 @@ static void set_jmp_reset_offset(TCGContext *s, int which)
     s->tb_jmp_reset_offset[which] = tcg_current_code_size(s);
 }
 
+/* Signal overflow, starting over with fewer guest insns. */
+static void QEMU_NORETURN tcg_raise_tb_overflow(TCGContext *s)
+{
+    siglongjmp(s->jmp_trans, -2);
+}
+
 #define C_PFX1(P, A)                    P##A
 #define C_PFX2(P, A, B)                 P##A##_##B
 #define C_PFX3(P, A, B, C)              P##A##_##B##_##C
@@ -1310,8 +1316,7 @@ static TCGTemp *tcg_temp_alloc(TCGContext *s)
     int n = s->nb_temps++;
 
     if (n >= TCG_MAX_TEMPS) {
-        /* Signal overflow, starting over with fewer guest insns. */
-        siglongjmp(s->jmp_trans, -2);
+        tcg_raise_tb_overflow(s);
     }
     return memset(&s->temps[n], 0, sizeof(TCGTemp));
 }
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 03/93] configure: Fix --enable-tcg-interpreter
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 01/93] gdbstub: Fix handle_query_xfer_auxv Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 02/93] tcg: Split out tcg_raise_tb_overflow Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 04/93] tcg: Manage splitwx in tc_ptr_to_region_tree by hand Richard Henderson
                   ` (92 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: sw, Daniel P . Berrangé,
	Alex Bennée, Philippe Mathieu-Daudé,
	Philippe Mathieu-Daudé

The configure option was backward, and we failed to
pass the value on to meson.

Fixes: 23a77b2d18b ("build-system: clean up TCG/TCI configury")
Tested-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 configure | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index e85d6baf8f..a34f91171d 100755
--- a/configure
+++ b/configure
@@ -1110,9 +1110,9 @@ for opt do
   ;;
   --enable-whpx) whpx="enabled"
   ;;
-  --disable-tcg-interpreter) tcg_interpreter="true"
+  --disable-tcg-interpreter) tcg_interpreter="false"
   ;;
-  --enable-tcg-interpreter) tcg_interpreter="false"
+  --enable-tcg-interpreter) tcg_interpreter="true"
   ;;
   --disable-cap-ng)  cap_ng="disabled"
   ;;
@@ -6417,6 +6417,7 @@ NINJA=$ninja $meson setup \
         -Dvhost_user_blk_server=$vhost_user_blk_server \
         -Dfuse=$fuse -Dfuse_lseek=$fuse_lseek -Dguest_agent_msi=$guest_agent_msi \
         $(if test "$default_features" = no; then echo "-Dauto_features=disabled"; fi) \
+	-Dtcg_interpreter=$tcg_interpreter \
         $cross_arg \
         "$PWD" "$source_path"
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 04/93] tcg: Manage splitwx in tc_ptr_to_region_tree by hand
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (2 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 03/93] configure: Fix --enable-tcg-interpreter Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04 15:01   ` Alex Bennée
  2021-02-04  1:43 ` [PATCH v2 05/93] tcg/tci: Make tci_tb_ptr thread-local Richard Henderson
                   ` (91 subsequent siblings)
  95 siblings, 1 reply; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

The use in tcg_tb_lookup is given a random pc that comes from the pc
of a signal handler.  Do not assert that the pointer is already within
the code gen buffer at all, much less the writable mirror of it.

Fixes: db0c51a3803
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---

For TCI, this indicates a bug in handle_cpu_signal, in that we
are taking PC from the host signal frame.  Which is, nearly,
unrelated to TCI at all.

The TCI "pc" is tci_tb_ptr (fixed in the next patch to at least
be thread-local).  We update this only on calls, since we don't
expect SEGV during the interpretation loop.  Which works ok for
softmmu, in which we pass down pc by hand to the helpers, but
is not ok for user-only, where we simply perform the raw memory
operation.

I don't know how to fix this, exactly.  Probably by storing to
tci_tb_ptr before each qemu_ld/qemu_st operation, with barriers.
Then Doing the Right Thing in handle_cpu_signal.  And perhaps
by clearing tci_tb_ptr whenever we're not expecting a SEGV on
behalf of the guest (and thus anything left is a qemu host bug).

---
v2: Retain full struct initialization
---
 tcg/tcg.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index bbe3dcee03..2991112829 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -513,11 +513,21 @@ static void tcg_region_trees_init(void)
     }
 }
 
-static struct tcg_region_tree *tc_ptr_to_region_tree(const void *cp)
+static struct tcg_region_tree *tc_ptr_to_region_tree(const void *p)
 {
-    void *p = tcg_splitwx_to_rw(cp);
     size_t region_idx;
 
+    /*
+     * Like tcg_splitwx_to_rw, with no assert.  The pc may come from
+     * a signal handler over which the caller has no control.
+     */
+    if (!in_code_gen_buffer(p)) {
+        p -= tcg_splitwx_diff;
+        if (!in_code_gen_buffer(p)) {
+            return NULL;
+        }
+    }
+
     if (p < region.start_aligned) {
         region_idx = 0;
     } else {
@@ -536,6 +546,7 @@ void tcg_tb_insert(TranslationBlock *tb)
 {
     struct tcg_region_tree *rt = tc_ptr_to_region_tree(tb->tc.ptr);
 
+    g_assert(rt != NULL);
     qemu_mutex_lock(&rt->lock);
     g_tree_insert(rt->tree, &tb->tc, tb);
     qemu_mutex_unlock(&rt->lock);
@@ -545,6 +556,7 @@ void tcg_tb_remove(TranslationBlock *tb)
 {
     struct tcg_region_tree *rt = tc_ptr_to_region_tree(tb->tc.ptr);
 
+    g_assert(rt != NULL);
     qemu_mutex_lock(&rt->lock);
     g_tree_remove(rt->tree, &tb->tc);
     qemu_mutex_unlock(&rt->lock);
@@ -561,6 +573,10 @@ TranslationBlock *tcg_tb_lookup(uintptr_t tc_ptr)
     TranslationBlock *tb;
     struct tb_tc s = { .ptr = (void *)tc_ptr };
 
+    if (rt == NULL) {
+        return NULL;
+    }
+
     qemu_mutex_lock(&rt->lock);
     tb = g_tree_lookup(rt->tree, &s);
     qemu_mutex_unlock(&rt->lock);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 05/93] tcg/tci: Make tci_tb_ptr thread-local
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (3 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 04/93] tcg: Manage splitwx in tc_ptr_to_region_tree by hand Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04 15:05   ` Alex Bennée
  2021-02-04  1:43 ` [PATCH v2 06/93] tcg/tci: Implement INDEX_op_ld16s_i32 Richard Henderson
                   ` (90 subsequent siblings)
  95 siblings, 1 reply; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Each thread must have its own pc, even under TCI.

Remove the GETPC ifdef, because GETPC is always available for
helpers, and thus is always required.  Move the assignment
under INDEX_op_call, because the value is only visible when
we make a call to a helper function.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/exec-all.h | 2 +-
 tcg/tcg-common.c        | 4 ----
 tcg/tci.c               | 7 +++----
 3 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 125000bcf7..f933c74c44 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -544,7 +544,7 @@ void tb_set_jmp_target(TranslationBlock *tb, int n, uintptr_t addr);
 
 /* GETPC is the true target of the return instruction that we'll execute.  */
 #if defined(CONFIG_TCG_INTERPRETER)
-extern uintptr_t tci_tb_ptr;
+extern __thread uintptr_t tci_tb_ptr;
 # define GETPC() tci_tb_ptr
 #else
 # define GETPC() \
diff --git a/tcg/tcg-common.c b/tcg/tcg-common.c
index 7e1992e79e..aa0c4f60c9 100644
--- a/tcg/tcg-common.c
+++ b/tcg/tcg-common.c
@@ -25,10 +25,6 @@
 #include "qemu/osdep.h"
 #include "tcg/tcg.h"
 
-#if defined(CONFIG_TCG_INTERPRETER)
-uintptr_t tci_tb_ptr;
-#endif
-
 TCGOpDef tcg_op_defs[] = {
 #define DEF(s, oargs, iargs, cargs, flags) \
          { #s, oargs, iargs, cargs, iargs + oargs + cargs, flags },
diff --git a/tcg/tci.c b/tcg/tci.c
index 3fc82d3c79..b3f9531a73 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -57,6 +57,8 @@ typedef uint64_t (*helper_function)(tcg_target_ulong, tcg_target_ulong,
                                     tcg_target_ulong, tcg_target_ulong);
 #endif
 
+__thread uintptr_t tci_tb_ptr;
+
 static tcg_target_ulong tci_read_reg(const tcg_target_ulong *regs, TCGReg index)
 {
     tci_assert(index < TCG_TARGET_NB_REGS);
@@ -526,16 +528,13 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif
         TCGMemOpIdx oi;
 
-#if defined(GETPC)
-        tci_tb_ptr = (uintptr_t)tb_ptr;
-#endif
-
         /* Skip opcode and size entry. */
         tb_ptr += 2;
 
         switch (opc) {
         case INDEX_op_call:
             t0 = tci_read_ri(regs, &tb_ptr);
+            tci_tb_ptr = (uintptr_t)tb_ptr;
 #if TCG_TARGET_REG_BITS == 32
             tmp64 = ((helper_function)t0)(tci_read_reg(regs, TCG_REG_R0),
                                           tci_read_reg(regs, TCG_REG_R1),
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 06/93] tcg/tci: Implement INDEX_op_ld16s_i32
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (4 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 05/93] tcg/tci: Make tci_tb_ptr thread-local Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 07/93] tcg/tci: Implement INDEX_op_ld8s_i64 Richard Henderson
                   ` (89 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

From: Stefan Weil <sw@weilnetz.de>

That TCG opcode is used by debian-buster (arm64) running ffmpeg:

    qemu-aarch64 /usr/bin/ffmpeg -i theora.mkv theora.webm

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reported-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-Id: <20210128024814.2056958-1-sw@weilnetz.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index b3f9531a73..2ba97da189 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -615,7 +615,10 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             TODO();
             break;
         case INDEX_op_ld16s_i32:
-            TODO();
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_s32(&tb_ptr);
+            tci_write_reg(regs, t0, *(int16_t *)(t1 + t2));
             break;
         case INDEX_op_ld_i32:
             t0 = *tb_ptr++;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 07/93] tcg/tci: Implement INDEX_op_ld8s_i64
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (5 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 06/93] tcg/tci: Implement INDEX_op_ld16s_i32 Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 08/93] tcg/tci: Inline tci_write_reg32s into the only caller Richard Henderson
                   ` (88 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

From: Stefan Weil <sw@weilnetz.de>

That TCG opcode is used by debian-buster (arm64) running ffmpeg:

    qemu-aarch64 /usr/bin/ffmpeg -i theora.mkv theora.webm

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reported-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-Id: <20210128020425.2055454-1-sw@weilnetz.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 2ba97da189..c3a8511dfe 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -883,7 +883,10 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tci_write_reg8(regs, t0, *(uint8_t *)(t1 + t2));
             break;
         case INDEX_op_ld8s_i64:
-            TODO();
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_s32(&tb_ptr);
+            tci_write_reg(regs, t0, *(int8_t *)(t1 + t2));
             break;
         case INDEX_op_ld16u_i64:
             t0 = *tb_ptr++;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 08/93] tcg/tci: Inline tci_write_reg32s into the only caller
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (6 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 07/93] tcg/tci: Implement INDEX_op_ld8s_i64 Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 09/93] tcg/tci: Inline tci_write_reg8 into its callers Richard Henderson
                   ` (87 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index c3a8511dfe..e8023b5384 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -117,14 +117,6 @@ tci_write_reg(tcg_target_ulong *regs, TCGReg index, tcg_target_ulong value)
     regs[index] = value;
 }
 
-#if TCG_TARGET_REG_BITS == 64
-static void
-tci_write_reg32s(tcg_target_ulong *regs, TCGReg index, int32_t value)
-{
-    tci_write_reg(regs, index, value);
-}
-#endif
-
 static void tci_write_reg8(tcg_target_ulong *regs, TCGReg index, uint8_t value)
 {
     tci_write_reg(regs, index, value);
@@ -907,7 +899,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg32s(regs, t0, *(int32_t *)(t1 + t2));
+            tci_write_reg(regs, t0, *(int32_t *)(t1 + t2));
             break;
         case INDEX_op_ld_i64:
             t0 = *tb_ptr++;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 09/93] tcg/tci: Inline tci_write_reg8 into its callers
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (7 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 08/93] tcg/tci: Inline tci_write_reg32s into the only caller Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 10/93] tcg/tci: Inline tci_write_reg16 into the only caller Richard Henderson
                   ` (86 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index e8023b5384..740244cc54 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -117,11 +117,6 @@ tci_write_reg(tcg_target_ulong *regs, TCGReg index, tcg_target_ulong value)
     regs[index] = value;
 }
 
-static void tci_write_reg8(tcg_target_ulong *regs, TCGReg index, uint8_t value)
-{
-    tci_write_reg(regs, index, value);
-}
-
 #if TCG_TARGET_REG_BITS == 64
 static void
 tci_write_reg16(tcg_target_ulong *regs, TCGReg index, uint16_t value)
@@ -598,7 +593,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg8(regs, t0, *(uint8_t *)(t1 + t2));
+            tci_write_reg(regs, t0, *(uint8_t *)(t1 + t2));
             break;
         case INDEX_op_ld8s_i32:
             TODO();
@@ -872,7 +867,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg8(regs, t0, *(uint8_t *)(t1 + t2));
+            tci_write_reg(regs, t0, *(uint8_t *)(t1 + t2));
             break;
         case INDEX_op_ld8s_i64:
             t0 = *tb_ptr++;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 10/93] tcg/tci: Inline tci_write_reg16 into the only caller
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (8 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 09/93] tcg/tci: Inline tci_write_reg8 into its callers Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 11/93] tcg/tci: Inline tci_write_reg32 into all callers Richard Henderson
                   ` (85 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 740244cc54..005d2946c4 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -117,14 +117,6 @@ tci_write_reg(tcg_target_ulong *regs, TCGReg index, tcg_target_ulong value)
     regs[index] = value;
 }
 
-#if TCG_TARGET_REG_BITS == 64
-static void
-tci_write_reg16(tcg_target_ulong *regs, TCGReg index, uint16_t value)
-{
-    tci_write_reg(regs, index, value);
-}
-#endif
-
 static void
 tci_write_reg32(tcg_target_ulong *regs, TCGReg index, uint32_t value)
 {
@@ -879,7 +871,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg16(regs, t0, *(uint16_t *)(t1 + t2));
+            tci_write_reg(regs, t0, *(uint16_t *)(t1 + t2));
             break;
         case INDEX_op_ld16s_i64:
             TODO();
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 11/93] tcg/tci: Inline tci_write_reg32 into all callers
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (9 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 10/93] tcg/tci: Inline tci_write_reg16 into the only caller Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 12/93] tcg/tci: Inline tci_write_reg64 into 64-bit callers Richard Henderson
                   ` (84 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

For a 64-bit TCI, the upper bits of a 32-bit operation are
undefined (much like a native ppc64 32-bit operation).  It
simplifies everything if we don't force-extend the result.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 66 +++++++++++++++++++++++++------------------------------
 1 file changed, 30 insertions(+), 36 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 005d2946c4..39ad00663f 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -117,12 +117,6 @@ tci_write_reg(tcg_target_ulong *regs, TCGReg index, tcg_target_ulong value)
     regs[index] = value;
 }
 
-static void
-tci_write_reg32(tcg_target_ulong *regs, TCGReg index, uint32_t value)
-{
-    tci_write_reg(regs, index, value);
-}
-
 #if TCG_TARGET_REG_BITS == 32
 static void tci_write_reg64(tcg_target_ulong *regs, uint32_t high_index,
                             uint32_t low_index, uint64_t value)
@@ -549,7 +543,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t1 = tci_read_r32(regs, &tb_ptr);
             t2 = tci_read_ri32(regs, &tb_ptr);
             condition = *tb_ptr++;
-            tci_write_reg32(regs, t0, tci_compare32(t1, t2, condition));
+            tci_write_reg(regs, t0, tci_compare32(t1, t2, condition));
             break;
 #if TCG_TARGET_REG_BITS == 32
         case INDEX_op_setcond2_i32:
@@ -557,7 +551,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tmp64 = tci_read_r64(regs, &tb_ptr);
             v64 = tci_read_ri64(regs, &tb_ptr);
             condition = *tb_ptr++;
-            tci_write_reg32(regs, t0, tci_compare64(tmp64, v64, condition));
+            tci_write_reg(regs, t0, tci_compare64(tmp64, v64, condition));
             break;
 #elif TCG_TARGET_REG_BITS == 64
         case INDEX_op_setcond_i64:
@@ -571,12 +565,12 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         case INDEX_op_mov_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_r32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, t1);
+            tci_write_reg(regs, t0, t1);
             break;
         case INDEX_op_tci_movi_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_i32(&tb_ptr);
-            tci_write_reg32(regs, t0, t1);
+            tci_write_reg(regs, t0, t1);
             break;
 
             /* Load/store operations (32 bit). */
@@ -603,7 +597,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg32(regs, t0, *(uint32_t *)(t1 + t2));
+            tci_write_reg(regs, t0, *(uint32_t *)(t1 + t2));
             break;
         case INDEX_op_st8_i32:
             t0 = tci_read_r8(regs, &tb_ptr);
@@ -631,44 +625,44 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t0 = *tb_ptr++;
             t1 = tci_read_ri32(regs, &tb_ptr);
             t2 = tci_read_ri32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, t1 + t2);
+            tci_write_reg(regs, t0, t1 + t2);
             break;
         case INDEX_op_sub_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_ri32(regs, &tb_ptr);
             t2 = tci_read_ri32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, t1 - t2);
+            tci_write_reg(regs, t0, t1 - t2);
             break;
         case INDEX_op_mul_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_ri32(regs, &tb_ptr);
             t2 = tci_read_ri32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, t1 * t2);
+            tci_write_reg(regs, t0, t1 * t2);
             break;
 #if TCG_TARGET_HAS_div_i32
         case INDEX_op_div_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_ri32(regs, &tb_ptr);
             t2 = tci_read_ri32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, (int32_t)t1 / (int32_t)t2);
+            tci_write_reg(regs, t0, (int32_t)t1 / (int32_t)t2);
             break;
         case INDEX_op_divu_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_ri32(regs, &tb_ptr);
             t2 = tci_read_ri32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, t1 / t2);
+            tci_write_reg(regs, t0, t1 / t2);
             break;
         case INDEX_op_rem_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_ri32(regs, &tb_ptr);
             t2 = tci_read_ri32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, (int32_t)t1 % (int32_t)t2);
+            tci_write_reg(regs, t0, (int32_t)t1 % (int32_t)t2);
             break;
         case INDEX_op_remu_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_ri32(regs, &tb_ptr);
             t2 = tci_read_ri32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, t1 % t2);
+            tci_write_reg(regs, t0, t1 % t2);
             break;
 #elif TCG_TARGET_HAS_div2_i32
         case INDEX_op_div2_i32:
@@ -680,19 +674,19 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t0 = *tb_ptr++;
             t1 = tci_read_ri32(regs, &tb_ptr);
             t2 = tci_read_ri32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, t1 & t2);
+            tci_write_reg(regs, t0, t1 & t2);
             break;
         case INDEX_op_or_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_ri32(regs, &tb_ptr);
             t2 = tci_read_ri32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, t1 | t2);
+            tci_write_reg(regs, t0, t1 | t2);
             break;
         case INDEX_op_xor_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_ri32(regs, &tb_ptr);
             t2 = tci_read_ri32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, t1 ^ t2);
+            tci_write_reg(regs, t0, t1 ^ t2);
             break;
 
             /* Shift/rotate operations (32 bit). */
@@ -701,32 +695,32 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t0 = *tb_ptr++;
             t1 = tci_read_ri32(regs, &tb_ptr);
             t2 = tci_read_ri32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, t1 << (t2 & 31));
+            tci_write_reg(regs, t0, t1 << (t2 & 31));
             break;
         case INDEX_op_shr_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_ri32(regs, &tb_ptr);
             t2 = tci_read_ri32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, t1 >> (t2 & 31));
+            tci_write_reg(regs, t0, t1 >> (t2 & 31));
             break;
         case INDEX_op_sar_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_ri32(regs, &tb_ptr);
             t2 = tci_read_ri32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, ((int32_t)t1 >> (t2 & 31)));
+            tci_write_reg(regs, t0, ((int32_t)t1 >> (t2 & 31)));
             break;
 #if TCG_TARGET_HAS_rot_i32
         case INDEX_op_rotl_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_ri32(regs, &tb_ptr);
             t2 = tci_read_ri32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, rol32(t1, t2 & 31));
+            tci_write_reg(regs, t0, rol32(t1, t2 & 31));
             break;
         case INDEX_op_rotr_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_ri32(regs, &tb_ptr);
             t2 = tci_read_ri32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, ror32(t1, t2 & 31));
+            tci_write_reg(regs, t0, ror32(t1, t2 & 31));
             break;
 #endif
 #if TCG_TARGET_HAS_deposit_i32
@@ -737,7 +731,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tmp16 = *tb_ptr++;
             tmp8 = *tb_ptr++;
             tmp32 = (((1 << tmp8) - 1) << tmp16);
-            tci_write_reg32(regs, t0, (t1 & ~tmp32) | ((t2 << tmp16) & tmp32));
+            tci_write_reg(regs, t0, (t1 & ~tmp32) | ((t2 << tmp16) & tmp32));
             break;
 #endif
         case INDEX_op_brcond_i32:
@@ -789,56 +783,56 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         case INDEX_op_ext8s_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_r8s(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, t1);
+            tci_write_reg(regs, t0, t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext16s_i32
         case INDEX_op_ext16s_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_r16s(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, t1);
+            tci_write_reg(regs, t0, t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext8u_i32
         case INDEX_op_ext8u_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_r8(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, t1);
+            tci_write_reg(regs, t0, t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext16u_i32
         case INDEX_op_ext16u_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_r16(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, t1);
+            tci_write_reg(regs, t0, t1);
             break;
 #endif
 #if TCG_TARGET_HAS_bswap16_i32
         case INDEX_op_bswap16_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_r16(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, bswap16(t1));
+            tci_write_reg(regs, t0, bswap16(t1));
             break;
 #endif
 #if TCG_TARGET_HAS_bswap32_i32
         case INDEX_op_bswap32_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_r32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, bswap32(t1));
+            tci_write_reg(regs, t0, bswap32(t1));
             break;
 #endif
 #if TCG_TARGET_HAS_not_i32
         case INDEX_op_not_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_r32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, ~t1);
+            tci_write_reg(regs, t0, ~t1);
             break;
 #endif
 #if TCG_TARGET_HAS_neg_i32
         case INDEX_op_neg_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_r32(regs, &tb_ptr);
-            tci_write_reg32(regs, t0, -t1);
+            tci_write_reg(regs, t0, -t1);
             break;
 #endif
 #if TCG_TARGET_REG_BITS == 64
@@ -880,7 +874,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg32(regs, t0, *(uint32_t *)(t1 + t2));
+            tci_write_reg(regs, t0, *(uint32_t *)(t1 + t2));
             break;
         case INDEX_op_ld32s_i64:
             t0 = *tb_ptr++;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 12/93] tcg/tci: Inline tci_write_reg64 into 64-bit callers
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (10 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 11/93] tcg/tci: Inline tci_write_reg32 into all callers Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 13/93] tcg/tci: Merge INDEX_op_ld8u_{i32,i64} Richard Henderson
                   ` (83 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

Note that we had two functions of the same name: a 32-bit version
which took two register numbers and a 64-bit version which was a
no-op wrapper for tcg_write_reg.  After this, we are left with
only the 32-bit version.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 60 +++++++++++++++++++++++++------------------------------
 1 file changed, 27 insertions(+), 33 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 39ad00663f..0f56702b93 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -124,12 +124,6 @@ static void tci_write_reg64(tcg_target_ulong *regs, uint32_t high_index,
     tci_write_reg(regs, low_index, value);
     tci_write_reg(regs, high_index, value >> 32);
 }
-#elif TCG_TARGET_REG_BITS == 64
-static void
-tci_write_reg64(tcg_target_ulong *regs, TCGReg index, uint64_t value)
-{
-    tci_write_reg(regs, index, value);
-}
 #endif
 
 #if TCG_TARGET_REG_BITS == 32
@@ -559,7 +553,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t1 = tci_read_r64(regs, &tb_ptr);
             t2 = tci_read_ri64(regs, &tb_ptr);
             condition = *tb_ptr++;
-            tci_write_reg64(regs, t0, tci_compare64(t1, t2, condition));
+            tci_write_reg(regs, t0, tci_compare64(t1, t2, condition));
             break;
 #endif
         case INDEX_op_mov_i32:
@@ -839,12 +833,12 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         case INDEX_op_mov_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r64(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, t1);
+            tci_write_reg(regs, t0, t1);
             break;
         case INDEX_op_tci_movi_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_i64(&tb_ptr);
-            tci_write_reg64(regs, t0, t1);
+            tci_write_reg(regs, t0, t1);
             break;
 
             /* Load/store operations (64 bit). */
@@ -886,7 +880,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg64(regs, t0, *(uint64_t *)(t1 + t2));
+            tci_write_reg(regs, t0, *(uint64_t *)(t1 + t2));
             break;
         case INDEX_op_st8_i64:
             t0 = tci_read_r8(regs, &tb_ptr);
@@ -920,19 +914,19 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t0 = *tb_ptr++;
             t1 = tci_read_ri64(regs, &tb_ptr);
             t2 = tci_read_ri64(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, t1 + t2);
+            tci_write_reg(regs, t0, t1 + t2);
             break;
         case INDEX_op_sub_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_ri64(regs, &tb_ptr);
             t2 = tci_read_ri64(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, t1 - t2);
+            tci_write_reg(regs, t0, t1 - t2);
             break;
         case INDEX_op_mul_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_ri64(regs, &tb_ptr);
             t2 = tci_read_ri64(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, t1 * t2);
+            tci_write_reg(regs, t0, t1 * t2);
             break;
 #if TCG_TARGET_HAS_div_i64
         case INDEX_op_div_i64:
@@ -951,19 +945,19 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t0 = *tb_ptr++;
             t1 = tci_read_ri64(regs, &tb_ptr);
             t2 = tci_read_ri64(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, t1 & t2);
+            tci_write_reg(regs, t0, t1 & t2);
             break;
         case INDEX_op_or_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_ri64(regs, &tb_ptr);
             t2 = tci_read_ri64(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, t1 | t2);
+            tci_write_reg(regs, t0, t1 | t2);
             break;
         case INDEX_op_xor_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_ri64(regs, &tb_ptr);
             t2 = tci_read_ri64(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, t1 ^ t2);
+            tci_write_reg(regs, t0, t1 ^ t2);
             break;
 
             /* Shift/rotate operations (64 bit). */
@@ -972,32 +966,32 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t0 = *tb_ptr++;
             t1 = tci_read_ri64(regs, &tb_ptr);
             t2 = tci_read_ri64(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, t1 << (t2 & 63));
+            tci_write_reg(regs, t0, t1 << (t2 & 63));
             break;
         case INDEX_op_shr_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_ri64(regs, &tb_ptr);
             t2 = tci_read_ri64(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, t1 >> (t2 & 63));
+            tci_write_reg(regs, t0, t1 >> (t2 & 63));
             break;
         case INDEX_op_sar_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_ri64(regs, &tb_ptr);
             t2 = tci_read_ri64(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, ((int64_t)t1 >> (t2 & 63)));
+            tci_write_reg(regs, t0, ((int64_t)t1 >> (t2 & 63)));
             break;
 #if TCG_TARGET_HAS_rot_i64
         case INDEX_op_rotl_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_ri64(regs, &tb_ptr);
             t2 = tci_read_ri64(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, rol64(t1, t2 & 63));
+            tci_write_reg(regs, t0, rol64(t1, t2 & 63));
             break;
         case INDEX_op_rotr_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_ri64(regs, &tb_ptr);
             t2 = tci_read_ri64(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, ror64(t1, t2 & 63));
+            tci_write_reg(regs, t0, ror64(t1, t2 & 63));
             break;
 #endif
 #if TCG_TARGET_HAS_deposit_i64
@@ -1008,7 +1002,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tmp16 = *tb_ptr++;
             tmp8 = *tb_ptr++;
             tmp64 = (((1ULL << tmp8) - 1) << tmp16);
-            tci_write_reg64(regs, t0, (t1 & ~tmp64) | ((t2 << tmp16) & tmp64));
+            tci_write_reg(regs, t0, (t1 & ~tmp64) | ((t2 << tmp16) & tmp64));
             break;
 #endif
         case INDEX_op_brcond_i64:
@@ -1026,28 +1020,28 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         case INDEX_op_ext8u_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r8(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, t1);
+            tci_write_reg(regs, t0, t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext8s_i64
         case INDEX_op_ext8s_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r8s(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, t1);
+            tci_write_reg(regs, t0, t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext16s_i64
         case INDEX_op_ext16s_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r16s(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, t1);
+            tci_write_reg(regs, t0, t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext16u_i64
         case INDEX_op_ext16u_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r16(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, t1);
+            tci_write_reg(regs, t0, t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext32s_i64
@@ -1056,7 +1050,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         case INDEX_op_ext_i32_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r32s(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, t1);
+            tci_write_reg(regs, t0, t1);
             break;
 #if TCG_TARGET_HAS_ext32u_i64
         case INDEX_op_ext32u_i64:
@@ -1064,41 +1058,41 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         case INDEX_op_extu_i32_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r32(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, t1);
+            tci_write_reg(regs, t0, t1);
             break;
 #if TCG_TARGET_HAS_bswap16_i64
         case INDEX_op_bswap16_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r16(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, bswap16(t1));
+            tci_write_reg(regs, t0, bswap16(t1));
             break;
 #endif
 #if TCG_TARGET_HAS_bswap32_i64
         case INDEX_op_bswap32_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r32(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, bswap32(t1));
+            tci_write_reg(regs, t0, bswap32(t1));
             break;
 #endif
 #if TCG_TARGET_HAS_bswap64_i64
         case INDEX_op_bswap64_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r64(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, bswap64(t1));
+            tci_write_reg(regs, t0, bswap64(t1));
             break;
 #endif
 #if TCG_TARGET_HAS_not_i64
         case INDEX_op_not_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r64(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, ~t1);
+            tci_write_reg(regs, t0, ~t1);
             break;
 #endif
 #if TCG_TARGET_HAS_neg_i64
         case INDEX_op_neg_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r64(regs, &tb_ptr);
-            tci_write_reg64(regs, t0, -t1);
+            tci_write_reg(regs, t0, -t1);
             break;
 #endif
 #endif /* TCG_TARGET_REG_BITS == 64 */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 13/93] tcg/tci: Merge INDEX_op_ld8u_{i32,i64}
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (11 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 12/93] tcg/tci: Inline tci_write_reg64 into 64-bit callers Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 14/93] tcg/tci: Merge INDEX_op_ld8s_{i32,i64} Richard Henderson
                   ` (82 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 0f56702b93..7e108bcbb3 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -455,6 +455,18 @@ static bool tci_compare64(uint64_t u0, uint64_t u1, TCGCond condition)
 # define qemu_st_beq(X)  stq_be_p(g2h(taddr), X)
 #endif
 
+#if TCG_TARGET_REG_BITS == 64
+# define CASE_32_64(x) \
+        case glue(glue(INDEX_op_, x), _i64): \
+        case glue(glue(INDEX_op_, x), _i32):
+# define CASE_64(x) \
+        case glue(glue(INDEX_op_, x), _i64):
+#else
+# define CASE_32_64(x) \
+        case glue(glue(INDEX_op_, x), _i32):
+# define CASE_64(x)
+#endif
+
 /* Interpret pseudo code in tb. */
 /*
  * Disable CFI checks.
@@ -569,7 +581,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
             /* Load/store operations (32 bit). */
 
-        case INDEX_op_ld8u_i32:
+        CASE_32_64(ld8u)
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
@@ -843,12 +855,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
             /* Load/store operations (64 bit). */
 
-        case INDEX_op_ld8u_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg(regs, t0, *(uint8_t *)(t1 + t2));
-            break;
         case INDEX_op_ld8s_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 14/93] tcg/tci: Merge INDEX_op_ld8s_{i32,i64}
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (12 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 13/93] tcg/tci: Merge INDEX_op_ld8u_{i32,i64} Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 15/93] tcg/tci: Merge INDEX_op_ld16u_{i32,i64} Richard Henderson
                   ` (81 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

Eliminating a TODO for ld8s_i32.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 7e108bcbb3..c31be1a1f4 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -587,8 +587,11 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t2 = tci_read_s32(&tb_ptr);
             tci_write_reg(regs, t0, *(uint8_t *)(t1 + t2));
             break;
-        case INDEX_op_ld8s_i32:
-            TODO();
+        CASE_32_64(ld8s)
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_s32(&tb_ptr);
+            tci_write_reg(regs, t0, *(int8_t *)(t1 + t2));
             break;
         case INDEX_op_ld16u_i32:
             TODO();
@@ -855,12 +858,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
             /* Load/store operations (64 bit). */
 
-        case INDEX_op_ld8s_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg(regs, t0, *(int8_t *)(t1 + t2));
-            break;
         case INDEX_op_ld16u_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 15/93] tcg/tci: Merge INDEX_op_ld16u_{i32,i64}
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (13 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 14/93] tcg/tci: Merge INDEX_op_ld8s_{i32,i64} Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 16/93] tcg/tci: Merge INDEX_op_ld16s_{i32,i64} Richard Henderson
                   ` (80 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

Eliminating a TODO for ld16u_i32.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index c31be1a1f4..b64d611ec9 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -593,8 +593,11 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t2 = tci_read_s32(&tb_ptr);
             tci_write_reg(regs, t0, *(int8_t *)(t1 + t2));
             break;
-        case INDEX_op_ld16u_i32:
-            TODO();
+        CASE_32_64(ld16u)
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_s32(&tb_ptr);
+            tci_write_reg(regs, t0, *(uint16_t *)(t1 + t2));
             break;
         case INDEX_op_ld16s_i32:
             t0 = *tb_ptr++;
@@ -858,12 +861,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
             /* Load/store operations (64 bit). */
 
-        case INDEX_op_ld16u_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg(regs, t0, *(uint16_t *)(t1 + t2));
-            break;
         case INDEX_op_ld16s_i64:
             TODO();
             break;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 16/93] tcg/tci: Merge INDEX_op_ld16s_{i32,i64}
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (14 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 15/93] tcg/tci: Merge INDEX_op_ld16u_{i32,i64} Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 17/93] tcg/tci: Merge INDEX_op_{ld_i32,ld32u_i64} Richard Henderson
                   ` (79 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

Eliminating a TODO for ld16s_i64.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index b64d611ec9..259a8538bf 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -599,7 +599,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t2 = tci_read_s32(&tb_ptr);
             tci_write_reg(regs, t0, *(uint16_t *)(t1 + t2));
             break;
-        case INDEX_op_ld16s_i32:
+        CASE_32_64(ld16s)
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
@@ -861,9 +861,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
             /* Load/store operations (64 bit). */
 
-        case INDEX_op_ld16s_i64:
-            TODO();
-            break;
         case INDEX_op_ld32u_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 17/93] tcg/tci: Merge INDEX_op_{ld_i32,ld32u_i64}
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (15 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 16/93] tcg/tci: Merge INDEX_op_ld16s_{i32,i64} Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 18/93] tcg/tci: Merge INDEX_op_st8_{i32,i64} Richard Henderson
                   ` (78 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 259a8538bf..55863f76a7 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -606,6 +606,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tci_write_reg(regs, t0, *(int16_t *)(t1 + t2));
             break;
         case INDEX_op_ld_i32:
+        CASE_64(ld32u)
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
@@ -861,12 +862,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
             /* Load/store operations (64 bit). */
 
-        case INDEX_op_ld32u_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg(regs, t0, *(uint32_t *)(t1 + t2));
-            break;
         case INDEX_op_ld32s_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 18/93] tcg/tci: Merge INDEX_op_st8_{i32,i64}
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (16 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 17/93] tcg/tci: Merge INDEX_op_{ld_i32,ld32u_i64} Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 19/93] tcg/tci: Merge INDEX_op_st16_{i32,i64} Richard Henderson
                   ` (77 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 55863f76a7..6819c97792 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -612,7 +612,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t2 = tci_read_s32(&tb_ptr);
             tci_write_reg(regs, t0, *(uint32_t *)(t1 + t2));
             break;
-        case INDEX_op_st8_i32:
+        CASE_32_64(st8)
             t0 = tci_read_r8(regs, &tb_ptr);
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
@@ -874,12 +874,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t2 = tci_read_s32(&tb_ptr);
             tci_write_reg(regs, t0, *(uint64_t *)(t1 + t2));
             break;
-        case INDEX_op_st8_i64:
-            t0 = tci_read_r8(regs, &tb_ptr);
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_s32(&tb_ptr);
-            *(uint8_t *)(t1 + t2) = t0;
-            break;
         case INDEX_op_st16_i64:
             t0 = tci_read_r16(regs, &tb_ptr);
             t1 = tci_read_r(regs, &tb_ptr);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 19/93] tcg/tci: Merge INDEX_op_st16_{i32,i64}
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (17 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 18/93] tcg/tci: Merge INDEX_op_st8_{i32,i64} Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 20/93] tcg/tci: Move stack bounds check to compile-time Richard Henderson
                   ` (76 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 6819c97792..fe935e71a3 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -618,7 +618,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t2 = tci_read_s32(&tb_ptr);
             *(uint8_t *)(t1 + t2) = t0;
             break;
-        case INDEX_op_st16_i32:
+        CASE_32_64(st16)
             t0 = tci_read_r16(regs, &tb_ptr);
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
@@ -874,12 +874,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t2 = tci_read_s32(&tb_ptr);
             tci_write_reg(regs, t0, *(uint64_t *)(t1 + t2));
             break;
-        case INDEX_op_st16_i64:
-            t0 = tci_read_r16(regs, &tb_ptr);
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_s32(&tb_ptr);
-            *(uint16_t *)(t1 + t2) = t0;
-            break;
         case INDEX_op_st32_i64:
             t0 = tci_read_r32(regs, &tb_ptr);
             t1 = tci_read_r(regs, &tb_ptr);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 20/93] tcg/tci: Move stack bounds check to compile-time
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (18 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 19/93] tcg/tci: Merge INDEX_op_st16_{i32,i64} Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 21/93] tcg/tci: Merge INDEX_op_{st_i32,st32_i64} Richard Henderson
                   ` (75 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

The existing check was incomplete:
(1) Only applied to two of the 7 stores, and not to the loads at all.
(2) Only checked the upper, but not the lower bound of the stack.

Doing this at compile time means that we don't need to do it
at runtime as well.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c                |  2 --
 tcg/tci/tcg-target.c.inc | 13 +++++++++++++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index fe935e71a3..ee2cd7dfa2 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -628,7 +628,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t0 = tci_read_r32(regs, &tb_ptr);
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
-            tci_assert(t1 != sp_value || (int32_t)t2 < 0);
             *(uint32_t *)(t1 + t2) = t0;
             break;
 
@@ -884,7 +883,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t0 = tci_read_r64(regs, &tb_ptr);
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
-            tci_assert(t1 != sp_value || (int32_t)t2 < 0);
             *(uint64_t *)(t1 + t2) = t0;
             break;
 
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index f0f6b13112..82efb9af60 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -375,10 +375,20 @@ static void tci_out_label(TCGContext *s, TCGLabel *label)
     }
 }
 
+static void stack_bounds_check(TCGReg base, target_long offset)
+{
+    if (base == TCG_REG_CALL_STACK) {
+        tcg_debug_assert(offset < 0);
+        tcg_debug_assert(offset >= -(CPU_TEMP_BUF_NLONGS * sizeof(long)));
+    }
+}
+
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
                        intptr_t arg2)
 {
     uint8_t *old_code_ptr = s->code_ptr;
+
+    stack_bounds_check(arg1, arg2);
     if (type == TCG_TYPE_I32) {
         tcg_out_op_t(s, INDEX_op_ld_i32);
         tcg_out_r(s, ret);
@@ -514,6 +524,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_st16_i64:
     case INDEX_op_st32_i64:
     case INDEX_op_st_i64:
+        stack_bounds_check(args[1], args[2]);
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
         tcg_debug_assert(args[2] == (int32_t)args[2]);
@@ -716,6 +727,8 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, TCGReg arg1,
                        intptr_t arg2)
 {
     uint8_t *old_code_ptr = s->code_ptr;
+
+    stack_bounds_check(arg1, arg2);
     if (type == TCG_TYPE_I32) {
         tcg_out_op_t(s, INDEX_op_st_i32);
         tcg_out_r(s, arg);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 21/93] tcg/tci: Merge INDEX_op_{st_i32,st32_i64}
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (19 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 20/93] tcg/tci: Move stack bounds check to compile-time Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 22/93] tcg/tci: Use g_assert_not_reached Richard Henderson
                   ` (74 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index ee2cd7dfa2..eb70672efb 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -625,6 +625,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             *(uint16_t *)(t1 + t2) = t0;
             break;
         case INDEX_op_st_i32:
+        CASE_64(st32)
             t0 = tci_read_r32(regs, &tb_ptr);
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
@@ -873,12 +874,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t2 = tci_read_s32(&tb_ptr);
             tci_write_reg(regs, t0, *(uint64_t *)(t1 + t2));
             break;
-        case INDEX_op_st32_i64:
-            t0 = tci_read_r32(regs, &tb_ptr);
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_s32(&tb_ptr);
-            *(uint32_t *)(t1 + t2) = t0;
-            break;
         case INDEX_op_st_i64:
             t0 = tci_read_r64(regs, &tb_ptr);
             t1 = tci_read_r(regs, &tb_ptr);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 22/93] tcg/tci: Use g_assert_not_reached
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (20 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 21/93] tcg/tci: Merge INDEX_op_{st_i32,st32_i64} Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:43 ` [PATCH v2 23/93] tcg/tci: Remove dead code for TCG_TARGET_HAS_div2_* Richard Henderson
                   ` (73 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

Three TODO instances are never happen cases.
Other uses of tcg_abort are also indicating unreachable cases.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index eb70672efb..36d594672f 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -362,7 +362,7 @@ static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
         result = (u0 > u1);
         break;
     default:
-        TODO();
+        g_assert_not_reached();
     }
     return result;
 }
@@ -404,7 +404,7 @@ static bool tci_compare64(uint64_t u0, uint64_t u1, TCGCond condition)
         result = (u0 > u1);
         break;
     default:
-        TODO();
+        g_assert_not_reached();
     }
     return result;
 }
@@ -1114,7 +1114,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
                 tmp32 = qemu_ld_beul;
                 break;
             default:
-                tcg_abort();
+                g_assert_not_reached();
             }
             tci_write_reg(regs, t0, tmp32);
             break;
@@ -1163,7 +1163,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
                 tmp64 = qemu_ld_beq;
                 break;
             default:
-                tcg_abort();
+                g_assert_not_reached();
             }
             tci_write_reg(regs, t0, tmp64);
             if (TCG_TARGET_REG_BITS == 32) {
@@ -1191,7 +1191,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
                 qemu_st_bel(t0);
                 break;
             default:
-                tcg_abort();
+                g_assert_not_reached();
             }
             break;
         case INDEX_op_qemu_st_i64:
@@ -1221,7 +1221,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
                 qemu_st_beq(tmp64);
                 break;
             default:
-                tcg_abort();
+                g_assert_not_reached();
             }
             break;
         case INDEX_op_mb:
@@ -1229,8 +1229,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             smp_mb();
             break;
         default:
-            TODO();
-            break;
+            g_assert_not_reached();
         }
         tci_assert(tb_ptr == old_code_ptr + op_size);
     }
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 23/93] tcg/tci: Remove dead code for TCG_TARGET_HAS_div2_*
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (21 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 22/93] tcg/tci: Use g_assert_not_reached Richard Henderson
@ 2021-02-04  1:43 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 24/93] tcg/tci: Implement 64-bit division Richard Henderson
                   ` (72 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

We do not simultaneously support div and div2 -- it's one
or the other.  TCI is already using div, so remove div2.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c                | 12 ------------
 tcg/tci/tcg-target.c.inc |  8 --------
 2 files changed, 20 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 36d594672f..25329345cf 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -652,7 +652,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t2 = tci_read_ri32(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 * t2);
             break;
-#if TCG_TARGET_HAS_div_i32
         case INDEX_op_div_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_ri32(regs, &tb_ptr);
@@ -677,12 +676,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t2 = tci_read_ri32(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 % t2);
             break;
-#elif TCG_TARGET_HAS_div2_i32
-        case INDEX_op_div2_i32:
-        case INDEX_op_divu2_i32:
-            TODO();
-            break;
-#endif
         case INDEX_op_and_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_ri32(regs, &tb_ptr);
@@ -908,11 +901,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         case INDEX_op_remu_i64:
             TODO();
             break;
-#elif TCG_TARGET_HAS_div2_i64
-        case INDEX_op_div2_i64:
-        case INDEX_op_divu2_i64:
-            TODO();
-            break;
 #endif
         case INDEX_op_and_i64:
             t0 = *tb_ptr++;
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 82efb9af60..6dc5bac2f3 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -596,10 +596,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_remu_i64:     /* Optional (TCG_TARGET_HAS_div_i64). */
         TODO();
         break;
-    case INDEX_op_div2_i64:     /* Optional (TCG_TARGET_HAS_div2_i64). */
-    case INDEX_op_divu2_i64:    /* Optional (TCG_TARGET_HAS_div2_i64). */
-        TODO();
-        break;
     case INDEX_op_brcond_i64:
         tcg_out_r(s, args[0]);
         tcg_out_ri64(s, const_args[1], args[1]);
@@ -639,10 +635,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_ri32(s, const_args[1], args[1]);
         tcg_out_ri32(s, const_args[2], args[2]);
         break;
-    case INDEX_op_div2_i32:     /* Optional (TCG_TARGET_HAS_div2_i32). */
-    case INDEX_op_divu2_i32:    /* Optional (TCG_TARGET_HAS_div2_i32). */
-        TODO();
-        break;
 #if TCG_TARGET_REG_BITS == 32
     case INDEX_op_add2_i32:
     case INDEX_op_sub2_i32:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 24/93] tcg/tci: Implement 64-bit division
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (22 preceding siblings ...)
  2021-02-04  1:43 ` [PATCH v2 23/93] tcg/tci: Remove dead code for TCG_TARGET_HAS_div2_* Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 25/93] tcg/tci: Remove TODO as unused Richard Henderson
                   ` (71 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

Trivially implemented like other arithmetic.
Tested via check-tcg and the ppc64 target.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.h     |  4 ++--
 tcg/tci.c                | 28 ++++++++++++++++++++++------
 tcg/tci/tcg-target.c.inc | 10 ++++------
 3 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index bb784e018e..7fc349a3de 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -100,8 +100,8 @@
 #define TCG_TARGET_HAS_extract_i64      0
 #define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_extract2_i64     0
-#define TCG_TARGET_HAS_div_i64          0
-#define TCG_TARGET_HAS_rem_i64          0
+#define TCG_TARGET_HAS_div_i64          1
+#define TCG_TARGET_HAS_rem_i64          1
 #define TCG_TARGET_HAS_ext8s_i64        1
 #define TCG_TARGET_HAS_ext16s_i64       1
 #define TCG_TARGET_HAS_ext32s_i64       1
diff --git a/tcg/tci.c b/tcg/tci.c
index 25329345cf..5c84a1c979 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -894,14 +894,30 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t2 = tci_read_ri64(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 * t2);
             break;
-#if TCG_TARGET_HAS_div_i64
         case INDEX_op_div_i64:
-        case INDEX_op_divu_i64:
-        case INDEX_op_rem_i64:
-        case INDEX_op_remu_i64:
-            TODO();
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(regs, &tb_ptr);
+            t2 = tci_read_ri64(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (int64_t)t1 / (int64_t)t2);
+            break;
+        case INDEX_op_divu_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(regs, &tb_ptr);
+            t2 = tci_read_ri64(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (uint64_t)t1 / (uint64_t)t2);
+            break;
+        case INDEX_op_rem_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(regs, &tb_ptr);
+            t2 = tci_read_ri64(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (int64_t)t1 % (int64_t)t2);
+            break;
+        case INDEX_op_remu_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(regs, &tb_ptr);
+            t2 = tci_read_ri64(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (uint64_t)t1 % (uint64_t)t2);
             break;
-#endif
         case INDEX_op_and_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_ri64(regs, &tb_ptr);
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 6dc5bac2f3..3327ce3072 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -577,6 +577,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_sar_i64:
     case INDEX_op_rotl_i64:     /* Optional (TCG_TARGET_HAS_rot_i64). */
     case INDEX_op_rotr_i64:     /* Optional (TCG_TARGET_HAS_rot_i64). */
+    case INDEX_op_div_i64:      /* Optional (TCG_TARGET_HAS_div_i64). */
+    case INDEX_op_divu_i64:     /* Optional (TCG_TARGET_HAS_div_i64). */
+    case INDEX_op_rem_i64:      /* Optional (TCG_TARGET_HAS_div_i64). */
+    case INDEX_op_remu_i64:     /* Optional (TCG_TARGET_HAS_div_i64). */
         tcg_out_r(s, args[0]);
         tcg_out_ri64(s, const_args[1], args[1]);
         tcg_out_ri64(s, const_args[2], args[2]);
@@ -590,12 +594,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_debug_assert(args[4] <= UINT8_MAX);
         tcg_out8(s, args[4]);
         break;
-    case INDEX_op_div_i64:      /* Optional (TCG_TARGET_HAS_div_i64). */
-    case INDEX_op_divu_i64:     /* Optional (TCG_TARGET_HAS_div_i64). */
-    case INDEX_op_rem_i64:      /* Optional (TCG_TARGET_HAS_div_i64). */
-    case INDEX_op_remu_i64:     /* Optional (TCG_TARGET_HAS_div_i64). */
-        TODO();
-        break;
     case INDEX_op_brcond_i64:
         tcg_out_r(s, args[0]);
         tcg_out_ri64(s, const_args[1], args[1]);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 25/93] tcg/tci: Remove TODO as unused
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (23 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 24/93] tcg/tci: Implement 64-bit division Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 26/93] tcg/tci: Restrict TCG_TARGET_NB_REGS to 16 Richard Henderson
                   ` (70 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw, Alex Bennée

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 5c84a1c979..e0d815e4b2 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -33,14 +33,6 @@
 #include "tcg/tcg-op.h"
 #include "qemu/compiler.h"
 
-/* Marker for missing code. */
-#define TODO() \
-    do { \
-        fprintf(stderr, "TODO %s:%u: %s()\n", \
-                __FILE__, __LINE__, __func__); \
-        tcg_abort(); \
-    } while (0)
-
 #if MAX_OPC_PARAM_IARGS != 6
 # error Fix needed, number of supported input arguments changed!
 #endif
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 26/93] tcg/tci: Restrict TCG_TARGET_NB_REGS to 16
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (24 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 25/93] tcg/tci: Remove TODO as unused Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04 15:07   ` Alex Bennée
  2021-02-04  1:44 ` [PATCH v2 27/93] tcg/tci: Fix TCG_REG_R4 misusage Richard Henderson
                   ` (69 subsequent siblings)
  95 siblings, 1 reply; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

As noted in several comments, 8 regs is not enough for 32-bit
to perform calls, as currently implemented.  Shortly, we will
rearrange the encoding which will make 32 regs impossible.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.h     | 32 +++++---------------------------
 tcg/tci/tcg-target.c.inc | 26 --------------------------
 2 files changed, 5 insertions(+), 53 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 7fc349a3de..8f7ed676fc 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -133,11 +133,8 @@
 #define TCG_TARGET_HAS_mulu2_i32        1
 #endif /* TCG_TARGET_REG_BITS == 64 */
 
-/* Number of registers available.
-   For 32 bit hosts, we need more than 8 registers (call arguments). */
-/* #define TCG_TARGET_NB_REGS 8 */
+/* Number of registers available. */
 #define TCG_TARGET_NB_REGS 16
-/* #define TCG_TARGET_NB_REGS 32 */
 
 /* List of registers which are used by TCG. */
 typedef enum {
@@ -149,7 +146,6 @@ typedef enum {
     TCG_REG_R5,
     TCG_REG_R6,
     TCG_REG_R7,
-#if TCG_TARGET_NB_REGS >= 16
     TCG_REG_R8,
     TCG_REG_R9,
     TCG_REG_R10,
@@ -158,33 +154,15 @@ typedef enum {
     TCG_REG_R13,
     TCG_REG_R14,
     TCG_REG_R15,
-#if TCG_TARGET_NB_REGS >= 32
-    TCG_REG_R16,
-    TCG_REG_R17,
-    TCG_REG_R18,
-    TCG_REG_R19,
-    TCG_REG_R20,
-    TCG_REG_R21,
-    TCG_REG_R22,
-    TCG_REG_R23,
-    TCG_REG_R24,
-    TCG_REG_R25,
-    TCG_REG_R26,
-    TCG_REG_R27,
-    TCG_REG_R28,
-    TCG_REG_R29,
-    TCG_REG_R30,
-    TCG_REG_R31,
-#endif
-#endif
+
+    TCG_AREG0 = TCG_REG_R14,
+    TCG_REG_CALL_STACK = TCG_REG_R15,
+
     /* Special value UINT8_MAX is used by TCI to encode constant values. */
     TCG_CONST = UINT8_MAX
 } TCGReg;
 
-#define TCG_AREG0                       (TCG_TARGET_NB_REGS - 2)
-
 /* Used for function call generation. */
-#define TCG_REG_CALL_STACK              (TCG_TARGET_NB_REGS - 1)
 #define TCG_TARGET_CALL_STACK_OFFSET    0
 #define TCG_TARGET_STACK_ALIGN          16
 
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 3327ce3072..7e3bed811e 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -187,7 +187,6 @@ static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_R5,
     TCG_REG_R6,
     TCG_REG_R7,
-#if TCG_TARGET_NB_REGS >= 16
     TCG_REG_R8,
     TCG_REG_R9,
     TCG_REG_R10,
@@ -196,7 +195,6 @@ static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_R13,
     TCG_REG_R14,
     TCG_REG_R15,
-#endif
 };
 
 #if MAX_OPC_PARAM_IARGS != 6
@@ -216,15 +214,11 @@ static const int tcg_target_call_iarg_regs[] = {
 #if TCG_TARGET_REG_BITS == 32
     /* 32 bit hosts need 2 * MAX_OPC_PARAM_IARGS registers. */
     TCG_REG_R7,
-#if TCG_TARGET_NB_REGS >= 16
     TCG_REG_R8,
     TCG_REG_R9,
     TCG_REG_R10,
     TCG_REG_R11,
     TCG_REG_R12,
-#else
-# error Too few input registers available
-#endif
 #endif
 };
 
@@ -245,7 +239,6 @@ static const char *const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
     "r05",
     "r06",
     "r07",
-#if TCG_TARGET_NB_REGS >= 16
     "r08",
     "r09",
     "r10",
@@ -254,25 +247,6 @@ static const char *const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
     "r13",
     "r14",
     "r15",
-#if TCG_TARGET_NB_REGS >= 32
-    "r16",
-    "r17",
-    "r18",
-    "r19",
-    "r20",
-    "r21",
-    "r22",
-    "r23",
-    "r24",
-    "r25",
-    "r26",
-    "r27",
-    "r28",
-    "r29",
-    "r30",
-    "r31"
-#endif
-#endif
 };
 #endif
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 27/93] tcg/tci: Fix TCG_REG_R4 misusage
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (25 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 26/93] tcg/tci: Restrict TCG_TARGET_NB_REGS to 16 Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04 15:09   ` Alex Bennée
  2021-02-04  1:44 ` [PATCH v2 28/93] tcg/tci: Use bool in tcg_out_ri* Richard Henderson
                   ` (68 subsequent siblings)
  95 siblings, 1 reply; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

This was removed from tcg_target_reg_alloc_order and
tcg_target_call_iarg_regs on the assumption that it
was the stack.  This was incorrectly copied from i386.
For tci, the stack is R15.

By adding R4 back to tcg_target_call_iarg_regs, adjust the other
entries so that 6 (or 12) entries are still present in the array,
and adjust the numbers in the interpreter.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c                | 8 ++++----
 tcg/tci/tcg-target.c.inc | 7 +------
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index e0d815e4b2..935eb87330 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -511,14 +511,14 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
                                           tci_read_reg(regs, TCG_REG_R1),
                                           tci_read_reg(regs, TCG_REG_R2),
                                           tci_read_reg(regs, TCG_REG_R3),
+                                          tci_read_reg(regs, TCG_REG_R4),
                                           tci_read_reg(regs, TCG_REG_R5),
                                           tci_read_reg(regs, TCG_REG_R6),
                                           tci_read_reg(regs, TCG_REG_R7),
                                           tci_read_reg(regs, TCG_REG_R8),
                                           tci_read_reg(regs, TCG_REG_R9),
                                           tci_read_reg(regs, TCG_REG_R10),
-                                          tci_read_reg(regs, TCG_REG_R11),
-                                          tci_read_reg(regs, TCG_REG_R12));
+                                          tci_read_reg(regs, TCG_REG_R11));
             tci_write_reg(regs, TCG_REG_R0, tmp64);
             tci_write_reg(regs, TCG_REG_R1, tmp64 >> 32);
 #else
@@ -526,8 +526,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
                                           tci_read_reg(regs, TCG_REG_R1),
                                           tci_read_reg(regs, TCG_REG_R2),
                                           tci_read_reg(regs, TCG_REG_R3),
-                                          tci_read_reg(regs, TCG_REG_R5),
-                                          tci_read_reg(regs, TCG_REG_R6));
+                                          tci_read_reg(regs, TCG_REG_R4),
+                                          tci_read_reg(regs, TCG_REG_R5));
             tci_write_reg(regs, TCG_REG_R0, tmp64);
 #endif
             break;
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 7e3bed811e..aba7f75ad1 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -181,9 +181,7 @@ static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_R1,
     TCG_REG_R2,
     TCG_REG_R3,
-#if 0 /* used for TCG_REG_CALL_STACK */
     TCG_REG_R4,
-#endif
     TCG_REG_R5,
     TCG_REG_R6,
     TCG_REG_R7,
@@ -206,19 +204,16 @@ static const int tcg_target_call_iarg_regs[] = {
     TCG_REG_R1,
     TCG_REG_R2,
     TCG_REG_R3,
-#if 0 /* used for TCG_REG_CALL_STACK */
     TCG_REG_R4,
-#endif
     TCG_REG_R5,
-    TCG_REG_R6,
 #if TCG_TARGET_REG_BITS == 32
     /* 32 bit hosts need 2 * MAX_OPC_PARAM_IARGS registers. */
+    TCG_REG_R6,
     TCG_REG_R7,
     TCG_REG_R8,
     TCG_REG_R9,
     TCG_REG_R10,
     TCG_REG_R11,
-    TCG_REG_R12,
 #endif
 };
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 28/93] tcg/tci: Use bool in tcg_out_ri*
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (26 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 27/93] tcg/tci: Fix TCG_REG_R4 misusage Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04 15:11   ` Alex Bennée
  2021-02-04 15:15   ` Alex Bennée
  2021-02-04  1:44 ` [PATCH v2 29/93] tcg/tci: Remove TCG_CONST Richard Henderson
                   ` (67 subsequent siblings)
  95 siblings, 2 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

This is the intended behavior. Remove the assert on
the specific value passed, which can now never be
anything besides false/true (0/1).

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index aba7f75ad1..1b66368c94 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -295,10 +295,9 @@ static void tcg_out_r(TCGContext *s, TCGArg t0)
 }
 
 /* Write register or constant (native size). */
-static void tcg_out_ri(TCGContext *s, int const_arg, TCGArg arg)
+static void tcg_out_ri(TCGContext *s, bool const_arg, TCGArg arg)
 {
     if (const_arg) {
-        tcg_debug_assert(const_arg == 1);
         tcg_out8(s, TCG_CONST);
         tcg_out_i(s, arg);
     } else {
@@ -307,10 +306,9 @@ static void tcg_out_ri(TCGContext *s, int const_arg, TCGArg arg)
 }
 
 /* Write register or constant (32 bit). */
-static void tcg_out_ri32(TCGContext *s, int const_arg, TCGArg arg)
+static void tcg_out_ri32(TCGContext *s, bool const_arg, TCGArg arg)
 {
     if (const_arg) {
-        tcg_debug_assert(const_arg == 1);
         tcg_out8(s, TCG_CONST);
         tcg_out32(s, arg);
     } else {
@@ -320,10 +318,9 @@ static void tcg_out_ri32(TCGContext *s, int const_arg, TCGArg arg)
 
 #if TCG_TARGET_REG_BITS == 64
 /* Write register or constant (64 bit). */
-static void tcg_out_ri64(TCGContext *s, int const_arg, TCGArg arg)
+static void tcg_out_ri64(TCGContext *s, bool const_arg, TCGArg arg)
 {
     if (const_arg) {
-        tcg_debug_assert(const_arg == 1);
         tcg_out8(s, TCG_CONST);
         tcg_out64(s, arg);
     } else {
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 29/93] tcg/tci: Remove TCG_CONST
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (27 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 28/93] tcg/tci: Use bool in tcg_out_ri* Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04 15:39   ` Alex Bennée
  2021-02-04  1:44 ` [PATCH v2 30/93] tcg/tci: Merge identical cases in generation Richard Henderson
                   ` (66 subsequent siblings)
  95 siblings, 1 reply; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Only allow registers or constants, but not both, in any
given position.  Removing this difference in input will
allow more code to be shared between 32-bit and 64-bit.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target-con-set.h |   6 +-
 tcg/tci/tcg-target.h         |   3 -
 tcg/tci.c                    | 189 +++++++++++++----------------------
 tcg/tci/tcg-target.c.inc     |  82 ++++-----------
 4 files changed, 89 insertions(+), 191 deletions(-)

diff --git a/tcg/tci/tcg-target-con-set.h b/tcg/tci/tcg-target-con-set.h
index 38e82f7535..f51b7bcb13 100644
--- a/tcg/tci/tcg-target-con-set.h
+++ b/tcg/tci/tcg-target-con-set.h
@@ -10,16 +10,12 @@
  * tcg-target-con-str.h; the constraint combination is inclusive or.
  */
 C_O0_I2(r, r)
-C_O0_I2(r, ri)
 C_O0_I3(r, r, r)
-C_O0_I4(r, r, ri, ri)
 C_O0_I4(r, r, r, r)
 C_O1_I1(r, r)
 C_O1_I2(r, 0, r)
-C_O1_I2(r, ri, ri)
 C_O1_I2(r, r, r)
-C_O1_I2(r, r, ri)
-C_O1_I4(r, r, r, ri, ri)
+C_O1_I4(r, r, r, r, r)
 C_O2_I1(r, r, r)
 C_O2_I2(r, r, r, r)
 C_O2_I4(r, r, r, r, r, r)
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 8f7ed676fc..9c0021a26f 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -157,9 +157,6 @@ typedef enum {
 
     TCG_AREG0 = TCG_REG_R14,
     TCG_REG_CALL_STACK = TCG_REG_R15,
-
-    /* Special value UINT8_MAX is used by TCI to encode constant values. */
-    TCG_CONST = UINT8_MAX
 } TCGReg;
 
 /* Used for function call generation. */
diff --git a/tcg/tci.c b/tcg/tci.c
index 935eb87330..fb3c97aaf1 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -255,61 +255,6 @@ tci_read_ulong(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
     return taddr;
 }
 
-/* Read indexed register or constant (native size) from bytecode. */
-static tcg_target_ulong
-tci_read_ri(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
-{
-    tcg_target_ulong value;
-    TCGReg r = **tb_ptr;
-    *tb_ptr += 1;
-    if (r == TCG_CONST) {
-        value = tci_read_i(tb_ptr);
-    } else {
-        value = tci_read_reg(regs, r);
-    }
-    return value;
-}
-
-/* Read indexed register or constant (32 bit) from bytecode. */
-static uint32_t tci_read_ri32(const tcg_target_ulong *regs,
-                              const uint8_t **tb_ptr)
-{
-    uint32_t value;
-    TCGReg r = **tb_ptr;
-    *tb_ptr += 1;
-    if (r == TCG_CONST) {
-        value = tci_read_i32(tb_ptr);
-    } else {
-        value = tci_read_reg32(regs, r);
-    }
-    return value;
-}
-
-#if TCG_TARGET_REG_BITS == 32
-/* Read two indexed registers or constants (2 * 32 bit) from bytecode. */
-static uint64_t tci_read_ri64(const tcg_target_ulong *regs,
-                              const uint8_t **tb_ptr)
-{
-    uint32_t low = tci_read_ri32(regs, tb_ptr);
-    return tci_uint64(tci_read_ri32(regs, tb_ptr), low);
-}
-#elif TCG_TARGET_REG_BITS == 64
-/* Read indexed register or constant (64 bit) from bytecode. */
-static uint64_t tci_read_ri64(const tcg_target_ulong *regs,
-                              const uint8_t **tb_ptr)
-{
-    uint64_t value;
-    TCGReg r = **tb_ptr;
-    *tb_ptr += 1;
-    if (r == TCG_CONST) {
-        value = tci_read_i64(tb_ptr);
-    } else {
-        value = tci_read_reg64(regs, r);
-    }
-    return value;
-}
-#endif
-
 static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
 {
     tcg_target_ulong label = tci_read_i(tb_ptr);
@@ -504,7 +449,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         switch (opc) {
         case INDEX_op_call:
-            t0 = tci_read_ri(regs, &tb_ptr);
+            t0 = tci_read_i(&tb_ptr);
             tci_tb_ptr = (uintptr_t)tb_ptr;
 #if TCG_TARGET_REG_BITS == 32
             tmp64 = ((helper_function)t0)(tci_read_reg(regs, TCG_REG_R0),
@@ -539,7 +484,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         case INDEX_op_setcond_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_r32(regs, &tb_ptr);
-            t2 = tci_read_ri32(regs, &tb_ptr);
+            t2 = tci_read_r32(regs, &tb_ptr);
             condition = *tb_ptr++;
             tci_write_reg(regs, t0, tci_compare32(t1, t2, condition));
             break;
@@ -547,7 +492,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         case INDEX_op_setcond2_i32:
             t0 = *tb_ptr++;
             tmp64 = tci_read_r64(regs, &tb_ptr);
-            v64 = tci_read_ri64(regs, &tb_ptr);
+            v64 = tci_read_r64(regs, &tb_ptr);
             condition = *tb_ptr++;
             tci_write_reg(regs, t0, tci_compare64(tmp64, v64, condition));
             break;
@@ -555,7 +500,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         case INDEX_op_setcond_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r64(regs, &tb_ptr);
-            t2 = tci_read_ri64(regs, &tb_ptr);
+            t2 = tci_read_r64(regs, &tb_ptr);
             condition = *tb_ptr++;
             tci_write_reg(regs, t0, tci_compare64(t1, t2, condition));
             break;
@@ -628,62 +573,62 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         case INDEX_op_add_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(regs, &tb_ptr);
-            t2 = tci_read_ri32(regs, &tb_ptr);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            t2 = tci_read_r32(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 + t2);
             break;
         case INDEX_op_sub_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(regs, &tb_ptr);
-            t2 = tci_read_ri32(regs, &tb_ptr);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            t2 = tci_read_r32(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 - t2);
             break;
         case INDEX_op_mul_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(regs, &tb_ptr);
-            t2 = tci_read_ri32(regs, &tb_ptr);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            t2 = tci_read_r32(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 * t2);
             break;
         case INDEX_op_div_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(regs, &tb_ptr);
-            t2 = tci_read_ri32(regs, &tb_ptr);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            t2 = tci_read_r32(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int32_t)t1 / (int32_t)t2);
             break;
         case INDEX_op_divu_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(regs, &tb_ptr);
-            t2 = tci_read_ri32(regs, &tb_ptr);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            t2 = tci_read_r32(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 / t2);
             break;
         case INDEX_op_rem_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(regs, &tb_ptr);
-            t2 = tci_read_ri32(regs, &tb_ptr);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            t2 = tci_read_r32(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int32_t)t1 % (int32_t)t2);
             break;
         case INDEX_op_remu_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(regs, &tb_ptr);
-            t2 = tci_read_ri32(regs, &tb_ptr);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            t2 = tci_read_r32(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 % t2);
             break;
         case INDEX_op_and_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(regs, &tb_ptr);
-            t2 = tci_read_ri32(regs, &tb_ptr);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            t2 = tci_read_r32(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 & t2);
             break;
         case INDEX_op_or_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(regs, &tb_ptr);
-            t2 = tci_read_ri32(regs, &tb_ptr);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            t2 = tci_read_r32(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 | t2);
             break;
         case INDEX_op_xor_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(regs, &tb_ptr);
-            t2 = tci_read_ri32(regs, &tb_ptr);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            t2 = tci_read_r32(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 ^ t2);
             break;
 
@@ -691,33 +636,33 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         case INDEX_op_shl_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(regs, &tb_ptr);
-            t2 = tci_read_ri32(regs, &tb_ptr);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            t2 = tci_read_r32(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 << (t2 & 31));
             break;
         case INDEX_op_shr_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(regs, &tb_ptr);
-            t2 = tci_read_ri32(regs, &tb_ptr);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            t2 = tci_read_r32(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 >> (t2 & 31));
             break;
         case INDEX_op_sar_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(regs, &tb_ptr);
-            t2 = tci_read_ri32(regs, &tb_ptr);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            t2 = tci_read_r32(regs, &tb_ptr);
             tci_write_reg(regs, t0, ((int32_t)t1 >> (t2 & 31)));
             break;
 #if TCG_TARGET_HAS_rot_i32
         case INDEX_op_rotl_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(regs, &tb_ptr);
-            t2 = tci_read_ri32(regs, &tb_ptr);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            t2 = tci_read_r32(regs, &tb_ptr);
             tci_write_reg(regs, t0, rol32(t1, t2 & 31));
             break;
         case INDEX_op_rotr_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri32(regs, &tb_ptr);
-            t2 = tci_read_ri32(regs, &tb_ptr);
+            t1 = tci_read_r32(regs, &tb_ptr);
+            t2 = tci_read_r32(regs, &tb_ptr);
             tci_write_reg(regs, t0, ror32(t1, t2 & 31));
             break;
 #endif
@@ -734,7 +679,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif
         case INDEX_op_brcond_i32:
             t0 = tci_read_r32(regs, &tb_ptr);
-            t1 = tci_read_ri32(regs, &tb_ptr);
+            t1 = tci_read_r32(regs, &tb_ptr);
             condition = *tb_ptr++;
             label = tci_read_label(&tb_ptr);
             if (tci_compare32(t0, t1, condition)) {
@@ -760,7 +705,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             break;
         case INDEX_op_brcond2_i32:
             tmp64 = tci_read_r64(regs, &tb_ptr);
-            v64 = tci_read_ri64(regs, &tb_ptr);
+            v64 = tci_read_r64(regs, &tb_ptr);
             condition = *tb_ptr++;
             label = tci_read_label(&tb_ptr);
             if (tci_compare64(tmp64, v64, condition)) {
@@ -870,62 +815,62 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         case INDEX_op_add_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(regs, &tb_ptr);
-            t2 = tci_read_ri64(regs, &tb_ptr);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            t2 = tci_read_r64(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 + t2);
             break;
         case INDEX_op_sub_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(regs, &tb_ptr);
-            t2 = tci_read_ri64(regs, &tb_ptr);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            t2 = tci_read_r64(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 - t2);
             break;
         case INDEX_op_mul_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(regs, &tb_ptr);
-            t2 = tci_read_ri64(regs, &tb_ptr);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            t2 = tci_read_r64(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 * t2);
             break;
         case INDEX_op_div_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(regs, &tb_ptr);
-            t2 = tci_read_ri64(regs, &tb_ptr);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            t2 = tci_read_r64(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int64_t)t1 / (int64_t)t2);
             break;
         case INDEX_op_divu_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(regs, &tb_ptr);
-            t2 = tci_read_ri64(regs, &tb_ptr);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            t2 = tci_read_r64(regs, &tb_ptr);
             tci_write_reg(regs, t0, (uint64_t)t1 / (uint64_t)t2);
             break;
         case INDEX_op_rem_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(regs, &tb_ptr);
-            t2 = tci_read_ri64(regs, &tb_ptr);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            t2 = tci_read_r64(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int64_t)t1 % (int64_t)t2);
             break;
         case INDEX_op_remu_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(regs, &tb_ptr);
-            t2 = tci_read_ri64(regs, &tb_ptr);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            t2 = tci_read_r64(regs, &tb_ptr);
             tci_write_reg(regs, t0, (uint64_t)t1 % (uint64_t)t2);
             break;
         case INDEX_op_and_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(regs, &tb_ptr);
-            t2 = tci_read_ri64(regs, &tb_ptr);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            t2 = tci_read_r64(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 & t2);
             break;
         case INDEX_op_or_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(regs, &tb_ptr);
-            t2 = tci_read_ri64(regs, &tb_ptr);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            t2 = tci_read_r64(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 | t2);
             break;
         case INDEX_op_xor_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(regs, &tb_ptr);
-            t2 = tci_read_ri64(regs, &tb_ptr);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            t2 = tci_read_r64(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 ^ t2);
             break;
 
@@ -933,33 +878,33 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         case INDEX_op_shl_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(regs, &tb_ptr);
-            t2 = tci_read_ri64(regs, &tb_ptr);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            t2 = tci_read_r64(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 << (t2 & 63));
             break;
         case INDEX_op_shr_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(regs, &tb_ptr);
-            t2 = tci_read_ri64(regs, &tb_ptr);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            t2 = tci_read_r64(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 >> (t2 & 63));
             break;
         case INDEX_op_sar_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(regs, &tb_ptr);
-            t2 = tci_read_ri64(regs, &tb_ptr);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            t2 = tci_read_r64(regs, &tb_ptr);
             tci_write_reg(regs, t0, ((int64_t)t1 >> (t2 & 63)));
             break;
 #if TCG_TARGET_HAS_rot_i64
         case INDEX_op_rotl_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(regs, &tb_ptr);
-            t2 = tci_read_ri64(regs, &tb_ptr);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            t2 = tci_read_r64(regs, &tb_ptr);
             tci_write_reg(regs, t0, rol64(t1, t2 & 63));
             break;
         case INDEX_op_rotr_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_ri64(regs, &tb_ptr);
-            t2 = tci_read_ri64(regs, &tb_ptr);
+            t1 = tci_read_r64(regs, &tb_ptr);
+            t2 = tci_read_r64(regs, &tb_ptr);
             tci_write_reg(regs, t0, ror64(t1, t2 & 63));
             break;
 #endif
@@ -976,7 +921,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif
         case INDEX_op_brcond_i64:
             t0 = tci_read_r64(regs, &tb_ptr);
-            t1 = tci_read_ri64(regs, &tb_ptr);
+            t1 = tci_read_r64(regs, &tb_ptr);
             condition = *tb_ptr++;
             label = tci_read_label(&tb_ptr);
             if (tci_compare64(t0, t1, condition)) {
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 1b66368c94..feac4659cc 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -92,8 +92,6 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_rem_i64:
     case INDEX_op_remu_i32:
     case INDEX_op_remu_i64:
-        return C_O1_I2(r, r, r);
-
     case INDEX_op_add_i32:
     case INDEX_op_add_i64:
     case INDEX_op_sub_i32:
@@ -126,8 +124,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_rotl_i64:
     case INDEX_op_rotr_i32:
     case INDEX_op_rotr_i64:
-        /* TODO: Does R, RI, RI result in faster code than R, R, RI? */
-        return C_O1_I2(r, ri, ri);
+    case INDEX_op_setcond_i32:
+    case INDEX_op_setcond_i64:
+        return C_O1_I2(r, r, r);
 
     case INDEX_op_deposit_i32:
     case INDEX_op_deposit_i64:
@@ -135,11 +134,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
     case INDEX_op_brcond_i32:
     case INDEX_op_brcond_i64:
-        return C_O0_I2(r, ri);
-
-    case INDEX_op_setcond_i32:
-    case INDEX_op_setcond_i64:
-        return C_O1_I2(r, r, ri);
+        return C_O0_I2(r, r);
 
 #if TCG_TARGET_REG_BITS == 32
     /* TODO: Support R, R, R, R, RI, RI? Will it be faster? */
@@ -147,11 +142,11 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_sub2_i32:
         return C_O2_I4(r, r, r, r, r, r);
     case INDEX_op_brcond2_i32:
-        return C_O0_I4(r, r, ri, ri);
+        return C_O0_I4(r, r, r, r);
     case INDEX_op_mulu2_i32:
         return C_O2_I2(r, r, r, r);
     case INDEX_op_setcond2_i32:
-        return C_O1_I4(r, r, r, ri, ri);
+        return C_O1_I4(r, r, r, r, r);
 #endif
 
     case INDEX_op_qemu_ld_i32:
@@ -294,41 +289,6 @@ static void tcg_out_r(TCGContext *s, TCGArg t0)
     tcg_out8(s, t0);
 }
 
-/* Write register or constant (native size). */
-static void tcg_out_ri(TCGContext *s, bool const_arg, TCGArg arg)
-{
-    if (const_arg) {
-        tcg_out8(s, TCG_CONST);
-        tcg_out_i(s, arg);
-    } else {
-        tcg_out_r(s, arg);
-    }
-}
-
-/* Write register or constant (32 bit). */
-static void tcg_out_ri32(TCGContext *s, bool const_arg, TCGArg arg)
-{
-    if (const_arg) {
-        tcg_out8(s, TCG_CONST);
-        tcg_out32(s, arg);
-    } else {
-        tcg_out_r(s, arg);
-    }
-}
-
-#if TCG_TARGET_REG_BITS == 64
-/* Write register or constant (64 bit). */
-static void tcg_out_ri64(TCGContext *s, bool const_arg, TCGArg arg)
-{
-    if (const_arg) {
-        tcg_out8(s, TCG_CONST);
-        tcg_out64(s, arg);
-    } else {
-        tcg_out_r(s, arg);
-    }
-}
-#endif
-
 /* Write label. */
 static void tci_out_label(TCGContext *s, TCGLabel *label)
 {
@@ -416,7 +376,7 @@ static inline void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
 {
     uint8_t *old_code_ptr = s->code_ptr;
     tcg_out_op_t(s, INDEX_op_call);
-    tcg_out_ri(s, 1, (uintptr_t)arg);
+    tcg_out_i(s, (uintptr_t)arg);
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
@@ -450,7 +410,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_setcond_i32:
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
-        tcg_out_ri32(s, const_args[2], args[2]);
+        tcg_out_r(s, args[2]);
         tcg_out8(s, args[3]);   /* condition */
         break;
 #if TCG_TARGET_REG_BITS == 32
@@ -459,15 +419,15 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
         tcg_out_r(s, args[2]);
-        tcg_out_ri32(s, const_args[3], args[3]);
-        tcg_out_ri32(s, const_args[4], args[4]);
+        tcg_out_r(s, args[3]);
+        tcg_out_r(s, args[4]);
         tcg_out8(s, args[5]);   /* condition */
         break;
 #elif TCG_TARGET_REG_BITS == 64
     case INDEX_op_setcond_i64:
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
-        tcg_out_ri64(s, const_args[2], args[2]);
+        tcg_out_r(s, args[2]);
         tcg_out8(s, args[3]);   /* condition */
         break;
 #endif
@@ -513,8 +473,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_rotl_i32:     /* Optional (TCG_TARGET_HAS_rot_i32). */
     case INDEX_op_rotr_i32:     /* Optional (TCG_TARGET_HAS_rot_i32). */
         tcg_out_r(s, args[0]);
-        tcg_out_ri32(s, const_args[1], args[1]);
-        tcg_out_ri32(s, const_args[2], args[2]);
+        tcg_out_r(s, args[1]);
+        tcg_out_r(s, args[2]);
         break;
     case INDEX_op_deposit_i32:  /* Optional (TCG_TARGET_HAS_deposit_i32). */
         tcg_out_r(s, args[0]);
@@ -548,8 +508,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_rem_i64:      /* Optional (TCG_TARGET_HAS_div_i64). */
     case INDEX_op_remu_i64:     /* Optional (TCG_TARGET_HAS_div_i64). */
         tcg_out_r(s, args[0]);
-        tcg_out_ri64(s, const_args[1], args[1]);
-        tcg_out_ri64(s, const_args[2], args[2]);
+        tcg_out_r(s, args[1]);
+        tcg_out_r(s, args[2]);
         break;
     case INDEX_op_deposit_i64:  /* Optional (TCG_TARGET_HAS_deposit_i64). */
         tcg_out_r(s, args[0]);
@@ -562,7 +522,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
     case INDEX_op_brcond_i64:
         tcg_out_r(s, args[0]);
-        tcg_out_ri64(s, const_args[1], args[1]);
+        tcg_out_r(s, args[1]);
         tcg_out8(s, args[2]);           /* condition */
         tci_out_label(s, arg_label(args[3]));
         break;
@@ -596,8 +556,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_rem_i32:      /* Optional (TCG_TARGET_HAS_div_i32). */
     case INDEX_op_remu_i32:     /* Optional (TCG_TARGET_HAS_div_i32). */
         tcg_out_r(s, args[0]);
-        tcg_out_ri32(s, const_args[1], args[1]);
-        tcg_out_ri32(s, const_args[2], args[2]);
+        tcg_out_r(s, args[1]);
+        tcg_out_r(s, args[2]);
         break;
 #if TCG_TARGET_REG_BITS == 32
     case INDEX_op_add2_i32:
@@ -612,8 +572,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_brcond2_i32:
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
-        tcg_out_ri32(s, const_args[2], args[2]);
-        tcg_out_ri32(s, const_args[3], args[3]);
+        tcg_out_r(s, args[2]);
+        tcg_out_r(s, args[3]);
         tcg_out8(s, args[4]);           /* condition */
         tci_out_label(s, arg_label(args[5]));
         break;
@@ -626,7 +586,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 #endif
     case INDEX_op_brcond_i32:
         tcg_out_r(s, args[0]);
-        tcg_out_ri32(s, const_args[1], args[1]);
+        tcg_out_r(s, args[1]);
         tcg_out8(s, args[2]);           /* condition */
         tci_out_label(s, arg_label(args[3]));
         break;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 30/93] tcg/tci: Merge identical cases in generation
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (28 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 29/93] tcg/tci: Remove TCG_CONST Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 31/93] tcg/tci: Remove tci_read_r8 Richard Henderson
                   ` (65 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Use CASE_32_64 and CASE_64 to reduce ifdefs and merge
cases that are identical between 32-bit and 64-bit hosts.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 204 ++++++++++++++-------------------------
 1 file changed, 73 insertions(+), 131 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index feac4659cc..c79f9c32d8 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -380,6 +380,18 @@ static inline void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+#if TCG_TARGET_REG_BITS == 64
+# define CASE_32_64(x) \
+        case glue(glue(INDEX_op_, x), _i64): \
+        case glue(glue(INDEX_op_, x), _i32):
+# define CASE_64(x) \
+        case glue(glue(INDEX_op_, x), _i64):
+#else
+# define CASE_32_64(x) \
+        case glue(glue(INDEX_op_, x), _i32):
+# define CASE_64(x)
+#endif
+
 static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
                        const int *const_args)
 {
@@ -391,6 +403,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_exit_tb:
         tcg_out64(s, args[0]);
         break;
+
     case INDEX_op_goto_tb:
         if (s->tb_jmp_insn_offset) {
             /* Direct jump method. */
@@ -404,15 +417,18 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         }
         set_jmp_reset_offset(s, args[0]);
         break;
+
     case INDEX_op_br:
         tci_out_label(s, arg_label(args[0]));
         break;
-    case INDEX_op_setcond_i32:
+
+    CASE_32_64(setcond)
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
         tcg_out_r(s, args[2]);
         tcg_out8(s, args[3]);   /* condition */
         break;
+
 #if TCG_TARGET_REG_BITS == 32
     case INDEX_op_setcond2_i32:
         /* setcond2_i32 cond, t0, t1_low, t1_high, t2_low, t2_high */
@@ -423,60 +439,54 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_r(s, args[4]);
         tcg_out8(s, args[5]);   /* condition */
         break;
-#elif TCG_TARGET_REG_BITS == 64
-    case INDEX_op_setcond_i64:
-        tcg_out_r(s, args[0]);
-        tcg_out_r(s, args[1]);
-        tcg_out_r(s, args[2]);
-        tcg_out8(s, args[3]);   /* condition */
-        break;
 #endif
-    case INDEX_op_ld8u_i32:
-    case INDEX_op_ld8s_i32:
-    case INDEX_op_ld16u_i32:
-    case INDEX_op_ld16s_i32:
+
+    CASE_32_64(ld8u)
+    CASE_32_64(ld8s)
+    CASE_32_64(ld16u)
+    CASE_32_64(ld16s)
     case INDEX_op_ld_i32:
-    case INDEX_op_st8_i32:
-    case INDEX_op_st16_i32:
+    CASE_64(ld32u)
+    CASE_64(ld32s)
+    CASE_64(ld)
+    CASE_32_64(st8)
+    CASE_32_64(st16)
     case INDEX_op_st_i32:
-    case INDEX_op_ld8u_i64:
-    case INDEX_op_ld8s_i64:
-    case INDEX_op_ld16u_i64:
-    case INDEX_op_ld16s_i64:
-    case INDEX_op_ld32u_i64:
-    case INDEX_op_ld32s_i64:
-    case INDEX_op_ld_i64:
-    case INDEX_op_st8_i64:
-    case INDEX_op_st16_i64:
-    case INDEX_op_st32_i64:
-    case INDEX_op_st_i64:
+    CASE_64(st32)
+    CASE_64(st)
         stack_bounds_check(args[1], args[2]);
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
         tcg_debug_assert(args[2] == (int32_t)args[2]);
         tcg_out32(s, args[2]);
         break;
-    case INDEX_op_add_i32:
-    case INDEX_op_sub_i32:
-    case INDEX_op_mul_i32:
-    case INDEX_op_and_i32:
-    case INDEX_op_andc_i32:     /* Optional (TCG_TARGET_HAS_andc_i32). */
-    case INDEX_op_eqv_i32:      /* Optional (TCG_TARGET_HAS_eqv_i32). */
-    case INDEX_op_nand_i32:     /* Optional (TCG_TARGET_HAS_nand_i32). */
-    case INDEX_op_nor_i32:      /* Optional (TCG_TARGET_HAS_nor_i32). */
-    case INDEX_op_or_i32:
-    case INDEX_op_orc_i32:      /* Optional (TCG_TARGET_HAS_orc_i32). */
-    case INDEX_op_xor_i32:
-    case INDEX_op_shl_i32:
-    case INDEX_op_shr_i32:
-    case INDEX_op_sar_i32:
-    case INDEX_op_rotl_i32:     /* Optional (TCG_TARGET_HAS_rot_i32). */
-    case INDEX_op_rotr_i32:     /* Optional (TCG_TARGET_HAS_rot_i32). */
+
+    CASE_32_64(add)
+    CASE_32_64(sub)
+    CASE_32_64(mul)
+    CASE_32_64(and)
+    CASE_32_64(or)
+    CASE_32_64(xor)
+    CASE_32_64(andc)     /* Optional (TCG_TARGET_HAS_andc_*). */
+    CASE_32_64(orc)      /* Optional (TCG_TARGET_HAS_orc_*). */
+    CASE_32_64(eqv)      /* Optional (TCG_TARGET_HAS_eqv_*). */
+    CASE_32_64(nand)     /* Optional (TCG_TARGET_HAS_nand_*). */
+    CASE_32_64(nor)      /* Optional (TCG_TARGET_HAS_nor_*). */
+    CASE_32_64(shl)
+    CASE_32_64(shr)
+    CASE_32_64(sar)
+    CASE_32_64(rotl)     /* Optional (TCG_TARGET_HAS_rot_*). */
+    CASE_32_64(rotr)     /* Optional (TCG_TARGET_HAS_rot_*). */
+    CASE_32_64(div)      /* Optional (TCG_TARGET_HAS_div_*). */
+    CASE_32_64(divu)     /* Optional (TCG_TARGET_HAS_div_*). */
+    CASE_32_64(rem)      /* Optional (TCG_TARGET_HAS_div_*). */
+    CASE_32_64(remu)     /* Optional (TCG_TARGET_HAS_div_*). */
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
         tcg_out_r(s, args[2]);
         break;
-    case INDEX_op_deposit_i32:  /* Optional (TCG_TARGET_HAS_deposit_i32). */
+
+    CASE_32_64(deposit)  /* Optional (TCG_TARGET_HAS_deposit_*). */
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
         tcg_out_r(s, args[2]);
@@ -486,79 +496,30 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out8(s, args[4]);
         break;
 
-#if TCG_TARGET_REG_BITS == 64
-    case INDEX_op_add_i64:
-    case INDEX_op_sub_i64:
-    case INDEX_op_mul_i64:
-    case INDEX_op_and_i64:
-    case INDEX_op_andc_i64:     /* Optional (TCG_TARGET_HAS_andc_i64). */
-    case INDEX_op_eqv_i64:      /* Optional (TCG_TARGET_HAS_eqv_i64). */
-    case INDEX_op_nand_i64:     /* Optional (TCG_TARGET_HAS_nand_i64). */
-    case INDEX_op_nor_i64:      /* Optional (TCG_TARGET_HAS_nor_i64). */
-    case INDEX_op_or_i64:
-    case INDEX_op_orc_i64:      /* Optional (TCG_TARGET_HAS_orc_i64). */
-    case INDEX_op_xor_i64:
-    case INDEX_op_shl_i64:
-    case INDEX_op_shr_i64:
-    case INDEX_op_sar_i64:
-    case INDEX_op_rotl_i64:     /* Optional (TCG_TARGET_HAS_rot_i64). */
-    case INDEX_op_rotr_i64:     /* Optional (TCG_TARGET_HAS_rot_i64). */
-    case INDEX_op_div_i64:      /* Optional (TCG_TARGET_HAS_div_i64). */
-    case INDEX_op_divu_i64:     /* Optional (TCG_TARGET_HAS_div_i64). */
-    case INDEX_op_rem_i64:      /* Optional (TCG_TARGET_HAS_div_i64). */
-    case INDEX_op_remu_i64:     /* Optional (TCG_TARGET_HAS_div_i64). */
-        tcg_out_r(s, args[0]);
-        tcg_out_r(s, args[1]);
-        tcg_out_r(s, args[2]);
-        break;
-    case INDEX_op_deposit_i64:  /* Optional (TCG_TARGET_HAS_deposit_i64). */
-        tcg_out_r(s, args[0]);
-        tcg_out_r(s, args[1]);
-        tcg_out_r(s, args[2]);
-        tcg_debug_assert(args[3] <= UINT8_MAX);
-        tcg_out8(s, args[3]);
-        tcg_debug_assert(args[4] <= UINT8_MAX);
-        tcg_out8(s, args[4]);
-        break;
-    case INDEX_op_brcond_i64:
+    CASE_32_64(brcond)
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
         tcg_out8(s, args[2]);           /* condition */
         tci_out_label(s, arg_label(args[3]));
         break;
-    case INDEX_op_bswap16_i64:  /* Optional (TCG_TARGET_HAS_bswap16_i64). */
-    case INDEX_op_bswap32_i64:  /* Optional (TCG_TARGET_HAS_bswap32_i64). */
-    case INDEX_op_bswap64_i64:  /* Optional (TCG_TARGET_HAS_bswap64_i64). */
-    case INDEX_op_not_i64:      /* Optional (TCG_TARGET_HAS_not_i64). */
-    case INDEX_op_neg_i64:      /* Optional (TCG_TARGET_HAS_neg_i64). */
-    case INDEX_op_ext8s_i64:    /* Optional (TCG_TARGET_HAS_ext8s_i64). */
-    case INDEX_op_ext8u_i64:    /* Optional (TCG_TARGET_HAS_ext8u_i64). */
-    case INDEX_op_ext16s_i64:   /* Optional (TCG_TARGET_HAS_ext16s_i64). */
-    case INDEX_op_ext16u_i64:   /* Optional (TCG_TARGET_HAS_ext16u_i64). */
-    case INDEX_op_ext32s_i64:   /* Optional (TCG_TARGET_HAS_ext32s_i64). */
-    case INDEX_op_ext32u_i64:   /* Optional (TCG_TARGET_HAS_ext32u_i64). */
-    case INDEX_op_ext_i32_i64:
-    case INDEX_op_extu_i32_i64:
-#endif /* TCG_TARGET_REG_BITS == 64 */
-    case INDEX_op_neg_i32:      /* Optional (TCG_TARGET_HAS_neg_i32). */
-    case INDEX_op_not_i32:      /* Optional (TCG_TARGET_HAS_not_i32). */
-    case INDEX_op_ext8s_i32:    /* Optional (TCG_TARGET_HAS_ext8s_i32). */
-    case INDEX_op_ext16s_i32:   /* Optional (TCG_TARGET_HAS_ext16s_i32). */
-    case INDEX_op_ext8u_i32:    /* Optional (TCG_TARGET_HAS_ext8u_i32). */
-    case INDEX_op_ext16u_i32:   /* Optional (TCG_TARGET_HAS_ext16u_i32). */
-    case INDEX_op_bswap16_i32:  /* Optional (TCG_TARGET_HAS_bswap16_i32). */
-    case INDEX_op_bswap32_i32:  /* Optional (TCG_TARGET_HAS_bswap32_i32). */
+
+    CASE_32_64(neg)      /* Optional (TCG_TARGET_HAS_neg_*). */
+    CASE_32_64(not)      /* Optional (TCG_TARGET_HAS_not_*). */
+    CASE_32_64(ext8s)    /* Optional (TCG_TARGET_HAS_ext8s_*). */
+    CASE_32_64(ext8u)    /* Optional (TCG_TARGET_HAS_ext8u_*). */
+    CASE_32_64(ext16s)   /* Optional (TCG_TARGET_HAS_ext16s_*). */
+    CASE_32_64(ext16u)   /* Optional (TCG_TARGET_HAS_ext16u_*). */
+    CASE_64(ext32s)      /* Optional (TCG_TARGET_HAS_ext32s_i64). */
+    CASE_64(ext32u)      /* Optional (TCG_TARGET_HAS_ext32u_i64). */
+    CASE_64(ext_i32)
+    CASE_64(extu_i32)
+    CASE_32_64(bswap16)  /* Optional (TCG_TARGET_HAS_bswap16_*). */
+    CASE_32_64(bswap32)  /* Optional (TCG_TARGET_HAS_bswap32_*). */
+    CASE_64(bswap64)     /* Optional (TCG_TARGET_HAS_bswap64_i64). */
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
         break;
-    case INDEX_op_div_i32:      /* Optional (TCG_TARGET_HAS_div_i32). */
-    case INDEX_op_divu_i32:     /* Optional (TCG_TARGET_HAS_div_i32). */
-    case INDEX_op_rem_i32:      /* Optional (TCG_TARGET_HAS_div_i32). */
-    case INDEX_op_remu_i32:     /* Optional (TCG_TARGET_HAS_div_i32). */
-        tcg_out_r(s, args[0]);
-        tcg_out_r(s, args[1]);
-        tcg_out_r(s, args[2]);
-        break;
+
 #if TCG_TARGET_REG_BITS == 32
     case INDEX_op_add2_i32:
     case INDEX_op_sub2_i32:
@@ -584,31 +545,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_r(s, args[3]);
         break;
 #endif
-    case INDEX_op_brcond_i32:
-        tcg_out_r(s, args[0]);
-        tcg_out_r(s, args[1]);
-        tcg_out8(s, args[2]);           /* condition */
-        tci_out_label(s, arg_label(args[3]));
-        break;
+
     case INDEX_op_qemu_ld_i32:
-        tcg_out_r(s, *args++);
-        tcg_out_r(s, *args++);
-        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-            tcg_out_r(s, *args++);
-        }
-        tcg_out_i(s, *args++);
-        break;
-    case INDEX_op_qemu_ld_i64:
-        tcg_out_r(s, *args++);
-        if (TCG_TARGET_REG_BITS == 32) {
-            tcg_out_r(s, *args++);
-        }
-        tcg_out_r(s, *args++);
-        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-            tcg_out_r(s, *args++);
-        }
-        tcg_out_i(s, *args++);
-        break;
     case INDEX_op_qemu_st_i32:
         tcg_out_r(s, *args++);
         tcg_out_r(s, *args++);
@@ -617,6 +555,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         }
         tcg_out_i(s, *args++);
         break;
+
+    case INDEX_op_qemu_ld_i64:
     case INDEX_op_qemu_st_i64:
         tcg_out_r(s, *args++);
         if (TCG_TARGET_REG_BITS == 32) {
@@ -628,8 +568,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         }
         tcg_out_i(s, *args++);
         break;
+
     case INDEX_op_mb:
         break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 31/93] tcg/tci: Remove tci_read_r8
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (29 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 30/93] tcg/tci: Merge identical cases in generation Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 32/93] tcg/tci: Remove tci_read_r8s Richard Henderson
                   ` (64 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Use explicit casts for ext8u opcodes, and allow truncation
to happen with the store for st8 opcodes.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 23 +++++------------------
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index fb3c97aaf1..c44a4aec7b 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -78,11 +78,6 @@ static int32_t tci_read_reg32s(const tcg_target_ulong *regs, TCGReg index)
 }
 #endif
 
-static uint8_t tci_read_reg8(const tcg_target_ulong *regs, TCGReg index)
-{
-    return (uint8_t)tci_read_reg(regs, index);
-}
-
 static uint16_t tci_read_reg16(const tcg_target_ulong *regs, TCGReg index)
 {
     return (uint16_t)tci_read_reg(regs, index);
@@ -169,14 +164,6 @@ tci_read_r(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
     return value;
 }
 
-/* Read indexed register (8 bit) from bytecode. */
-static uint8_t tci_read_r8(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
-{
-    uint8_t value = tci_read_reg8(regs, **tb_ptr);
-    *tb_ptr += 1;
-    return value;
-}
-
 #if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
 /* Read indexed register (8 bit signed) from bytecode. */
 static int8_t tci_read_r8s(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
@@ -550,7 +537,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tci_write_reg(regs, t0, *(uint32_t *)(t1 + t2));
             break;
         CASE_32_64(st8)
-            t0 = tci_read_r8(regs, &tb_ptr);
+            t0 = tci_read_r(regs, &tb_ptr);
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             *(uint8_t *)(t1 + t2) = t0;
@@ -739,8 +726,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_ext8u_i32
         case INDEX_op_ext8u_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r8(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1);
+            t1 = tci_read_r(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (uint8_t)t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext16u_i32
@@ -933,8 +920,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_ext8u_i64
         case INDEX_op_ext8u_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r8(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1);
+            t1 = tci_read_r(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (uint8_t)t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext8s_i64
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 32/93] tcg/tci: Remove tci_read_r8s
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (30 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 31/93] tcg/tci: Remove tci_read_r8 Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 33/93] tcg/tci: Remove tci_read_r16 Richard Henderson
                   ` (63 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Use explicit casts for ext8s opcodes.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 25 ++++---------------------
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index c44a4aec7b..25db479e62 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -57,13 +57,6 @@ static tcg_target_ulong tci_read_reg(const tcg_target_ulong *regs, TCGReg index)
     return regs[index];
 }
 
-#if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
-static int8_t tci_read_reg8s(const tcg_target_ulong *regs, TCGReg index)
-{
-    return (int8_t)tci_read_reg(regs, index);
-}
-#endif
-
 #if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
 static int16_t tci_read_reg16s(const tcg_target_ulong *regs, TCGReg index)
 {
@@ -164,16 +157,6 @@ tci_read_r(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
     return value;
 }
 
-#if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
-/* Read indexed register (8 bit signed) from bytecode. */
-static int8_t tci_read_r8s(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
-{
-    int8_t value = tci_read_reg8s(regs, **tb_ptr);
-    *tb_ptr += 1;
-    return value;
-}
-#endif
-
 /* Read indexed register (16 bit) from bytecode. */
 static uint16_t tci_read_r16(const tcg_target_ulong *regs,
                              const uint8_t **tb_ptr)
@@ -712,8 +695,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_ext8s_i32
         case INDEX_op_ext8s_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r8s(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1);
+            t1 = tci_read_r(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (int8_t)t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext16s_i32
@@ -927,8 +910,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_ext8s_i64
         case INDEX_op_ext8s_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r8s(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1);
+            t1 = tci_read_r(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (int8_t)t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext16s_i64
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 33/93] tcg/tci: Remove tci_read_r16
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (31 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 32/93] tcg/tci: Remove tci_read_r8s Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 34/93] tcg/tci: Remove tci_read_r16s Richard Henderson
                   ` (62 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Use explicit casts for ext16u opcodes, and allow truncation
to happen with the store for st8 opcodes, and with the call
for bswap16 opcodes.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 28 +++++++---------------------
 1 file changed, 7 insertions(+), 21 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 25db479e62..547be0c2f0 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -71,11 +71,6 @@ static int32_t tci_read_reg32s(const tcg_target_ulong *regs, TCGReg index)
 }
 #endif
 
-static uint16_t tci_read_reg16(const tcg_target_ulong *regs, TCGReg index)
-{
-    return (uint16_t)tci_read_reg(regs, index);
-}
-
 static uint32_t tci_read_reg32(const tcg_target_ulong *regs, TCGReg index)
 {
     return (uint32_t)tci_read_reg(regs, index);
@@ -157,15 +152,6 @@ tci_read_r(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
     return value;
 }
 
-/* Read indexed register (16 bit) from bytecode. */
-static uint16_t tci_read_r16(const tcg_target_ulong *regs,
-                             const uint8_t **tb_ptr)
-{
-    uint16_t value = tci_read_reg16(regs, **tb_ptr);
-    *tb_ptr += 1;
-    return value;
-}
-
 #if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
 /* Read indexed register (16 bit signed) from bytecode. */
 static int16_t tci_read_r16s(const tcg_target_ulong *regs,
@@ -526,7 +512,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             *(uint8_t *)(t1 + t2) = t0;
             break;
         CASE_32_64(st16)
-            t0 = tci_read_r16(regs, &tb_ptr);
+            t0 = tci_read_r(regs, &tb_ptr);
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             *(uint16_t *)(t1 + t2) = t0;
@@ -716,14 +702,14 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_ext16u_i32
         case INDEX_op_ext16u_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r16(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1);
+            t1 = tci_read_r(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (uint16_t)t1);
             break;
 #endif
 #if TCG_TARGET_HAS_bswap16_i32
         case INDEX_op_bswap16_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r16(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, bswap16(t1));
             break;
 #endif
@@ -924,8 +910,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_ext16u_i64
         case INDEX_op_ext16u_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r16(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1);
+            t1 = tci_read_r(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (uint16_t)t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext32s_i64
@@ -947,7 +933,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_bswap16_i64
         case INDEX_op_bswap16_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r16(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, bswap16(t1));
             break;
 #endif
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 34/93] tcg/tci: Remove tci_read_r16s
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (32 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 33/93] tcg/tci: Remove tci_read_r16 Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 35/93] tcg/tci: Remove tci_read_r32s Richard Henderson
                   ` (61 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Use explicit casts for ext16s opcodes.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 148 +++++++++++++++++++++---------------------------------
 1 file changed, 58 insertions(+), 90 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 547be0c2f0..72ec63e18e 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -57,13 +57,6 @@ static tcg_target_ulong tci_read_reg(const tcg_target_ulong *regs, TCGReg index)
     return regs[index];
 }
 
-#if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
-static int16_t tci_read_reg16s(const tcg_target_ulong *regs, TCGReg index)
-{
-    return (int16_t)tci_read_reg(regs, index);
-}
-#endif
-
 #if TCG_TARGET_REG_BITS == 64
 static int32_t tci_read_reg32s(const tcg_target_ulong *regs, TCGReg index)
 {
@@ -71,11 +64,6 @@ static int32_t tci_read_reg32s(const tcg_target_ulong *regs, TCGReg index)
 }
 #endif
 
-static uint32_t tci_read_reg32(const tcg_target_ulong *regs, TCGReg index)
-{
-    return (uint32_t)tci_read_reg(regs, index);
-}
-
 #if TCG_TARGET_REG_BITS == 64
 static uint64_t tci_read_reg64(const tcg_target_ulong *regs, TCGReg index)
 {
@@ -152,33 +140,13 @@ tci_read_r(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
     return value;
 }
 
-#if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
-/* Read indexed register (16 bit signed) from bytecode. */
-static int16_t tci_read_r16s(const tcg_target_ulong *regs,
-                             const uint8_t **tb_ptr)
-{
-    int16_t value = tci_read_reg16s(regs, **tb_ptr);
-    *tb_ptr += 1;
-    return value;
-}
-#endif
-
-/* Read indexed register (32 bit) from bytecode. */
-static uint32_t tci_read_r32(const tcg_target_ulong *regs,
-                             const uint8_t **tb_ptr)
-{
-    uint32_t value = tci_read_reg32(regs, **tb_ptr);
-    *tb_ptr += 1;
-    return value;
-}
-
 #if TCG_TARGET_REG_BITS == 32
 /* Read two indexed registers (2 * 32 bit) from bytecode. */
 static uint64_t tci_read_r64(const tcg_target_ulong *regs,
                              const uint8_t **tb_ptr)
 {
-    uint32_t low = tci_read_r32(regs, tb_ptr);
-    return tci_uint64(tci_read_r32(regs, tb_ptr), low);
+    uint32_t low = tci_read_r(regs, tb_ptr);
+    return tci_uint64(tci_read_r(regs, tb_ptr), low);
 }
 #elif TCG_TARGET_REG_BITS == 64
 /* Read indexed register (32 bit signed) from bytecode. */
@@ -439,8 +407,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             continue;
         case INDEX_op_setcond_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
-            t2 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             condition = *tb_ptr++;
             tci_write_reg(regs, t0, tci_compare32(t1, t2, condition));
             break;
@@ -463,7 +431,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif
         case INDEX_op_mov_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1);
             break;
         case INDEX_op_tci_movi_i32:
@@ -519,7 +487,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             break;
         case INDEX_op_st_i32:
         CASE_64(st32)
-            t0 = tci_read_r32(regs, &tb_ptr);
+            t0 = tci_read_r(regs, &tb_ptr);
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             *(uint32_t *)(t1 + t2) = t0;
@@ -529,62 +497,62 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         case INDEX_op_add_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
-            t2 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 + t2);
             break;
         case INDEX_op_sub_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
-            t2 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 - t2);
             break;
         case INDEX_op_mul_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
-            t2 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 * t2);
             break;
         case INDEX_op_div_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
-            t2 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int32_t)t1 / (int32_t)t2);
             break;
         case INDEX_op_divu_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
-            t2 = tci_read_r32(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 / t2);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (uint32_t)t1 / (uint32_t)t2);
             break;
         case INDEX_op_rem_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
-            t2 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int32_t)t1 % (int32_t)t2);
             break;
         case INDEX_op_remu_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
-            t2 = tci_read_r32(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 % t2);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (uint32_t)t1 % (uint32_t)t2);
             break;
         case INDEX_op_and_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
-            t2 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 & t2);
             break;
         case INDEX_op_or_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
-            t2 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 | t2);
             break;
         case INDEX_op_xor_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
-            t2 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 ^ t2);
             break;
 
@@ -592,41 +560,41 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         case INDEX_op_shl_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
-            t2 = tci_read_r32(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 << (t2 & 31));
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (uint32_t)t1 << (t2 & 31));
             break;
         case INDEX_op_shr_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
-            t2 = tci_read_r32(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 >> (t2 & 31));
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (uint32_t)t1 >> (t2 & 31));
             break;
         case INDEX_op_sar_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
-            t2 = tci_read_r32(regs, &tb_ptr);
-            tci_write_reg(regs, t0, ((int32_t)t1 >> (t2 & 31)));
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (int32_t)t1 >> (t2 & 31));
             break;
 #if TCG_TARGET_HAS_rot_i32
         case INDEX_op_rotl_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
-            t2 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, rol32(t1, t2 & 31));
             break;
         case INDEX_op_rotr_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
-            t2 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, ror32(t1, t2 & 31));
             break;
 #endif
 #if TCG_TARGET_HAS_deposit_i32
         case INDEX_op_deposit_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
-            t2 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tmp16 = *tb_ptr++;
             tmp8 = *tb_ptr++;
             tmp32 = (((1 << tmp8) - 1) << tmp16);
@@ -634,8 +602,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             break;
 #endif
         case INDEX_op_brcond_i32:
-            t0 = tci_read_r32(regs, &tb_ptr);
-            t1 = tci_read_r32(regs, &tb_ptr);
+            t0 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             condition = *tb_ptr++;
             label = tci_read_label(&tb_ptr);
             if (tci_compare32(t0, t1, condition)) {
@@ -673,9 +641,9 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         case INDEX_op_mulu2_i32:
             t0 = *tb_ptr++;
             t1 = *tb_ptr++;
-            t2 = tci_read_r32(regs, &tb_ptr);
-            tmp64 = tci_read_r32(regs, &tb_ptr);
-            tci_write_reg64(regs, t1, t0, t2 * tmp64);
+            t2 = tci_read_r(regs, &tb_ptr);
+            tmp64 = (uint32_t)tci_read_r(regs, &tb_ptr);
+            tci_write_reg64(regs, t1, t0, (uint32_t)t2 * tmp64);
             break;
 #endif /* TCG_TARGET_REG_BITS == 32 */
 #if TCG_TARGET_HAS_ext8s_i32
@@ -688,8 +656,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_ext16s_i32
         case INDEX_op_ext16s_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r16s(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1);
+            t1 = tci_read_r(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (int16_t)t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext8u_i32
@@ -716,21 +684,21 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_bswap32_i32
         case INDEX_op_bswap32_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, bswap32(t1));
             break;
 #endif
 #if TCG_TARGET_HAS_not_i32
         case INDEX_op_not_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, ~t1);
             break;
 #endif
 #if TCG_TARGET_HAS_neg_i32
         case INDEX_op_neg_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, -t1);
             break;
 #endif
@@ -903,8 +871,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_ext16s_i64
         case INDEX_op_ext16s_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r16s(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1);
+            t1 = tci_read_r(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (int16_t)t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext16u_i64
@@ -927,8 +895,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif
         case INDEX_op_extu_i32_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1);
+            t1 = tci_read_r(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (uint32_t)t1);
             break;
 #if TCG_TARGET_HAS_bswap16_i64
         case INDEX_op_bswap16_i64:
@@ -940,7 +908,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_bswap32_i64
         case INDEX_op_bswap32_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, bswap32(t1));
             break;
 #endif
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 35/93] tcg/tci: Remove tci_read_r32s
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (33 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 34/93] tcg/tci: Remove tci_read_r16s Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 36/93] tcg/tci: Reduce use of tci_read_r64 Richard Henderson
                   ` (60 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Use explicit casts for ext32s opcodes.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 20 ++------------------
 1 file changed, 2 insertions(+), 18 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 72ec63e18e..9c8395397a 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -57,13 +57,6 @@ static tcg_target_ulong tci_read_reg(const tcg_target_ulong *regs, TCGReg index)
     return regs[index];
 }
 
-#if TCG_TARGET_REG_BITS == 64
-static int32_t tci_read_reg32s(const tcg_target_ulong *regs, TCGReg index)
-{
-    return (int32_t)tci_read_reg(regs, index);
-}
-#endif
-
 #if TCG_TARGET_REG_BITS == 64
 static uint64_t tci_read_reg64(const tcg_target_ulong *regs, TCGReg index)
 {
@@ -149,15 +142,6 @@ static uint64_t tci_read_r64(const tcg_target_ulong *regs,
     return tci_uint64(tci_read_r(regs, tb_ptr), low);
 }
 #elif TCG_TARGET_REG_BITS == 64
-/* Read indexed register (32 bit signed) from bytecode. */
-static int32_t tci_read_r32s(const tcg_target_ulong *regs,
-                             const uint8_t **tb_ptr)
-{
-    int32_t value = tci_read_reg32s(regs, **tb_ptr);
-    *tb_ptr += 1;
-    return value;
-}
-
 /* Read indexed register (64 bit) from bytecode. */
 static uint64_t tci_read_r64(const tcg_target_ulong *regs,
                              const uint8_t **tb_ptr)
@@ -887,8 +871,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif
         case INDEX_op_ext_i32_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r32s(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1);
+            t1 = tci_read_r(regs, &tb_ptr);
+            tci_write_reg(regs, t0, (int32_t)t1);
             break;
 #if TCG_TARGET_HAS_ext32u_i64
         case INDEX_op_ext32u_i64:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 36/93] tcg/tci: Reduce use of tci_read_r64
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (34 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 35/93] tcg/tci: Remove tci_read_r32s Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 37/93] tcg/tci: Merge basic arithmetic operations Richard Henderson
                   ` (59 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

In all cases restricted to 64-bit hosts, tcg_read_r is
identical.  We retain the 64-bit symbol for the single
case of INDEX_op_qemu_st_i64.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 93 +++++++++++++++++++++++++------------------------------
 1 file changed, 42 insertions(+), 51 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 9c8395397a..0246e663a3 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -57,13 +57,6 @@ static tcg_target_ulong tci_read_reg(const tcg_target_ulong *regs, TCGReg index)
     return regs[index];
 }
 
-#if TCG_TARGET_REG_BITS == 64
-static uint64_t tci_read_reg64(const tcg_target_ulong *regs, TCGReg index)
-{
-    return tci_read_reg(regs, index);
-}
-#endif
-
 static void
 tci_write_reg(tcg_target_ulong *regs, TCGReg index, tcg_target_ulong value)
 {
@@ -146,9 +139,7 @@ static uint64_t tci_read_r64(const tcg_target_ulong *regs,
 static uint64_t tci_read_r64(const tcg_target_ulong *regs,
                              const uint8_t **tb_ptr)
 {
-    uint64_t value = tci_read_reg64(regs, **tb_ptr);
-    *tb_ptr += 1;
-    return value;
+    return tci_read_r(regs, tb_ptr);
 }
 #endif
 
@@ -407,8 +398,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #elif TCG_TARGET_REG_BITS == 64
         case INDEX_op_setcond_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
-            t2 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             condition = *tb_ptr++;
             tci_write_reg(regs, t0, tci_compare64(t1, t2, condition));
             break;
@@ -689,7 +680,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_REG_BITS == 64
         case INDEX_op_mov_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1);
             break;
         case INDEX_op_tci_movi_i64:
@@ -713,7 +704,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tci_write_reg(regs, t0, *(uint64_t *)(t1 + t2));
             break;
         case INDEX_op_st_i64:
-            t0 = tci_read_r64(regs, &tb_ptr);
+            t0 = tci_read_r(regs, &tb_ptr);
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             *(uint64_t *)(t1 + t2) = t0;
@@ -723,62 +714,62 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         case INDEX_op_add_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
-            t2 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 + t2);
             break;
         case INDEX_op_sub_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
-            t2 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 - t2);
             break;
         case INDEX_op_mul_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
-            t2 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 * t2);
             break;
         case INDEX_op_div_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
-            t2 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int64_t)t1 / (int64_t)t2);
             break;
         case INDEX_op_divu_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
-            t2 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, (uint64_t)t1 / (uint64_t)t2);
             break;
         case INDEX_op_rem_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
-            t2 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int64_t)t1 % (int64_t)t2);
             break;
         case INDEX_op_remu_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
-            t2 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, (uint64_t)t1 % (uint64_t)t2);
             break;
         case INDEX_op_and_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
-            t2 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 & t2);
             break;
         case INDEX_op_or_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
-            t2 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 | t2);
             break;
         case INDEX_op_xor_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
-            t2 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 ^ t2);
             break;
 
@@ -786,41 +777,41 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         case INDEX_op_shl_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
-            t2 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 << (t2 & 63));
             break;
         case INDEX_op_shr_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
-            t2 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 >> (t2 & 63));
             break;
         case INDEX_op_sar_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
-            t2 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, ((int64_t)t1 >> (t2 & 63)));
             break;
 #if TCG_TARGET_HAS_rot_i64
         case INDEX_op_rotl_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
-            t2 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, rol64(t1, t2 & 63));
             break;
         case INDEX_op_rotr_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
-            t2 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, ror64(t1, t2 & 63));
             break;
 #endif
 #if TCG_TARGET_HAS_deposit_i64
         case INDEX_op_deposit_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
-            t2 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
             tmp16 = *tb_ptr++;
             tmp8 = *tb_ptr++;
             tmp64 = (((1ULL << tmp8) - 1) << tmp16);
@@ -828,8 +819,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             break;
 #endif
         case INDEX_op_brcond_i64:
-            t0 = tci_read_r64(regs, &tb_ptr);
-            t1 = tci_read_r64(regs, &tb_ptr);
+            t0 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             condition = *tb_ptr++;
             label = tci_read_label(&tb_ptr);
             if (tci_compare64(t0, t1, condition)) {
@@ -899,21 +890,21 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_bswap64_i64
         case INDEX_op_bswap64_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, bswap64(t1));
             break;
 #endif
 #if TCG_TARGET_HAS_not_i64
         case INDEX_op_not_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, ~t1);
             break;
 #endif
 #if TCG_TARGET_HAS_neg_i64
         case INDEX_op_neg_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r64(regs, &tb_ptr);
+            t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, -t1);
             break;
 #endif
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 37/93] tcg/tci: Merge basic arithmetic operations
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (35 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 36/93] tcg/tci: Reduce use of tci_read_r64 Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 38/93] tcg/tci: Merge extension operations Richard Henderson
                   ` (58 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

This includes add, sub, mul, and, or, xor.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 83 +++++++++++++++++--------------------------------------
 1 file changed, 25 insertions(+), 58 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 0246e663a3..894e87e1b0 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -468,26 +468,47 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             *(uint32_t *)(t1 + t2) = t0;
             break;
 
-            /* Arithmetic operations (32 bit). */
+            /* Arithmetic operations (mixed 32/64 bit). */
 
-        case INDEX_op_add_i32:
+        CASE_32_64(add)
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 + t2);
             break;
-        case INDEX_op_sub_i32:
+        CASE_32_64(sub)
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 - t2);
             break;
-        case INDEX_op_mul_i32:
+        CASE_32_64(mul)
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 * t2);
             break;
+        CASE_32_64(and)
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
+            tci_write_reg(regs, t0, t1 & t2);
+            break;
+        CASE_32_64(or)
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
+            tci_write_reg(regs, t0, t1 | t2);
+            break;
+        CASE_32_64(xor)
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_r(regs, &tb_ptr);
+            tci_write_reg(regs, t0, t1 ^ t2);
+            break;
+
+            /* Arithmetic operations (32 bit). */
+
         case INDEX_op_div_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
@@ -512,24 +533,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, (uint32_t)t1 % (uint32_t)t2);
             break;
-        case INDEX_op_and_i32:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 & t2);
-            break;
-        case INDEX_op_or_i32:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 | t2);
-            break;
-        case INDEX_op_xor_i32:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 ^ t2);
-            break;
 
             /* Shift/rotate operations (32 bit). */
 
@@ -712,24 +715,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
             /* Arithmetic operations (64 bit). */
 
-        case INDEX_op_add_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 + t2);
-            break;
-        case INDEX_op_sub_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 - t2);
-            break;
-        case INDEX_op_mul_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 * t2);
-            break;
         case INDEX_op_div_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
@@ -754,24 +739,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t2 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, (uint64_t)t1 % (uint64_t)t2);
             break;
-        case INDEX_op_and_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 & t2);
-            break;
-        case INDEX_op_or_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 | t2);
-            break;
-        case INDEX_op_xor_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 ^ t2);
-            break;
 
             /* Shift/rotate operations (64 bit). */
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 38/93] tcg/tci: Merge extension operations
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (36 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 37/93] tcg/tci: Merge basic arithmetic operations Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 39/93] tcg/tci: Remove ifdefs for TCG_TARGET_HAS_ext32[us]_i64 Richard Henderson
                   ` (57 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

This includes ext8s, ext8u, ext16s, ext16u.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 44 ++++++++------------------------------------
 1 file changed, 8 insertions(+), 36 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 894e87e1b0..cdfd9b7af8 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -624,29 +624,29 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tci_write_reg64(regs, t1, t0, (uint32_t)t2 * tmp64);
             break;
 #endif /* TCG_TARGET_REG_BITS == 32 */
-#if TCG_TARGET_HAS_ext8s_i32
-        case INDEX_op_ext8s_i32:
+#if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
+        CASE_32_64(ext8s)
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int8_t)t1);
             break;
 #endif
-#if TCG_TARGET_HAS_ext16s_i32
-        case INDEX_op_ext16s_i32:
+#if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
+        CASE_32_64(ext16s)
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int16_t)t1);
             break;
 #endif
-#if TCG_TARGET_HAS_ext8u_i32
-        case INDEX_op_ext8u_i32:
+#if TCG_TARGET_HAS_ext8u_i32 || TCG_TARGET_HAS_ext8u_i64
+        CASE_32_64(ext8u)
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, (uint8_t)t1);
             break;
 #endif
-#if TCG_TARGET_HAS_ext16u_i32
-        case INDEX_op_ext16u_i32:
+#if TCG_TARGET_HAS_ext16u_i32 || TCG_TARGET_HAS_ext16u_i64
+        CASE_32_64(ext16u)
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, (uint16_t)t1);
@@ -796,34 +796,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
                 continue;
             }
             break;
-#if TCG_TARGET_HAS_ext8u_i64
-        case INDEX_op_ext8u_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (uint8_t)t1);
-            break;
-#endif
-#if TCG_TARGET_HAS_ext8s_i64
-        case INDEX_op_ext8s_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (int8_t)t1);
-            break;
-#endif
-#if TCG_TARGET_HAS_ext16s_i64
-        case INDEX_op_ext16s_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (int16_t)t1);
-            break;
-#endif
-#if TCG_TARGET_HAS_ext16u_i64
-        case INDEX_op_ext16u_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (uint16_t)t1);
-            break;
-#endif
 #if TCG_TARGET_HAS_ext32s_i64
         case INDEX_op_ext32s_i64:
 #endif
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 39/93] tcg/tci: Remove ifdefs for TCG_TARGET_HAS_ext32[us]_i64
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (37 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 38/93] tcg/tci: Merge extension operations Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 40/93] tcg/tci: Merge bswap operations Richard Henderson
                   ` (56 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

These operations are always available under different names:
INDEX_op_ext_i32_i64 and INDEX_op_extu_i32_i64, so we remove
no code with the ifdef.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index cdfd9b7af8..1819652c5a 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -796,17 +796,13 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
                 continue;
             }
             break;
-#if TCG_TARGET_HAS_ext32s_i64
         case INDEX_op_ext32s_i64:
-#endif
         case INDEX_op_ext_i32_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int32_t)t1);
             break;
-#if TCG_TARGET_HAS_ext32u_i64
         case INDEX_op_ext32u_i64:
-#endif
         case INDEX_op_extu_i32_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 40/93] tcg/tci: Merge bswap operations
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (38 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 39/93] tcg/tci: Remove ifdefs for TCG_TARGET_HAS_ext32[us]_i64 Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 41/93] tcg/tci: Merge mov, not and neg operations Richard Henderson
                   ` (55 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

This includes bswap16 and bswap32.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 22 ++++------------------
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 1819652c5a..c979215332 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -652,15 +652,15 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tci_write_reg(regs, t0, (uint16_t)t1);
             break;
 #endif
-#if TCG_TARGET_HAS_bswap16_i32
-        case INDEX_op_bswap16_i32:
+#if TCG_TARGET_HAS_bswap16_i32 || TCG_TARGET_HAS_bswap16_i64
+        CASE_32_64(bswap16)
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, bswap16(t1));
             break;
 #endif
-#if TCG_TARGET_HAS_bswap32_i32
-        case INDEX_op_bswap32_i32:
+#if TCG_TARGET_HAS_bswap32_i32 || TCG_TARGET_HAS_bswap32_i64
+        CASE_32_64(bswap32)
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, bswap32(t1));
@@ -808,20 +808,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, (uint32_t)t1);
             break;
-#if TCG_TARGET_HAS_bswap16_i64
-        case INDEX_op_bswap16_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            tci_write_reg(regs, t0, bswap16(t1));
-            break;
-#endif
-#if TCG_TARGET_HAS_bswap32_i64
-        case INDEX_op_bswap32_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            tci_write_reg(regs, t0, bswap32(t1));
-            break;
-#endif
 #if TCG_TARGET_HAS_bswap64_i64
         case INDEX_op_bswap64_i64:
             t0 = *tb_ptr++;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 41/93] tcg/tci: Merge mov, not and neg operations
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (39 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 40/93] tcg/tci: Merge bswap operations Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 42/93] tcg/tci: Rename tci_read_r to tci_read_rval Richard Henderson
                   ` (54 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 29 +++++------------------------
 1 file changed, 5 insertions(+), 24 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index c979215332..225cb698e8 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -404,7 +404,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tci_write_reg(regs, t0, tci_compare64(t1, t2, condition));
             break;
 #endif
-        case INDEX_op_mov_i32:
+        CASE_32_64(mov)
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1);
@@ -666,26 +666,21 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tci_write_reg(regs, t0, bswap32(t1));
             break;
 #endif
-#if TCG_TARGET_HAS_not_i32
-        case INDEX_op_not_i32:
+#if TCG_TARGET_HAS_not_i32 || TCG_TARGET_HAS_not_i64
+        CASE_32_64(not)
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, ~t1);
             break;
 #endif
-#if TCG_TARGET_HAS_neg_i32
-        case INDEX_op_neg_i32:
+#if TCG_TARGET_HAS_neg_i32 || TCG_TARGET_HAS_neg_i64
+        CASE_32_64(neg)
             t0 = *tb_ptr++;
             t1 = tci_read_r(regs, &tb_ptr);
             tci_write_reg(regs, t0, -t1);
             break;
 #endif
 #if TCG_TARGET_REG_BITS == 64
-        case INDEX_op_mov_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1);
-            break;
         case INDEX_op_tci_movi_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_i64(&tb_ptr);
@@ -815,20 +810,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tci_write_reg(regs, t0, bswap64(t1));
             break;
 #endif
-#if TCG_TARGET_HAS_not_i64
-        case INDEX_op_not_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            tci_write_reg(regs, t0, ~t1);
-            break;
-#endif
-#if TCG_TARGET_HAS_neg_i64
-        case INDEX_op_neg_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            tci_write_reg(regs, t0, -t1);
-            break;
-#endif
 #endif /* TCG_TARGET_REG_BITS == 64 */
 
             /* QEMU specific operations. */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 42/93] tcg/tci: Rename tci_read_r to tci_read_rval
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (40 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 41/93] tcg/tci: Merge mov, not and neg operations Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 43/93] tcg/tci: Split out tci_args_rrs Richard Henderson
                   ` (53 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

In the next patches, we want to use tci_read_r to return
the raw register number.  So rename the existing function,
which returns the register value, to tci_read_rval.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 192 +++++++++++++++++++++++++++---------------------------
 1 file changed, 96 insertions(+), 96 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 225cb698e8..20aaaca959 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -119,7 +119,7 @@ static uint64_t tci_read_i64(const uint8_t **tb_ptr)
 
 /* Read indexed register (native size) from bytecode. */
 static tcg_target_ulong
-tci_read_r(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
+tci_read_rval(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
 {
     tcg_target_ulong value = tci_read_reg(regs, **tb_ptr);
     *tb_ptr += 1;
@@ -131,15 +131,15 @@ tci_read_r(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
 static uint64_t tci_read_r64(const tcg_target_ulong *regs,
                              const uint8_t **tb_ptr)
 {
-    uint32_t low = tci_read_r(regs, tb_ptr);
-    return tci_uint64(tci_read_r(regs, tb_ptr), low);
+    uint32_t low = tci_read_rval(regs, tb_ptr);
+    return tci_uint64(tci_read_rval(regs, tb_ptr), low);
 }
 #elif TCG_TARGET_REG_BITS == 64
 /* Read indexed register (64 bit) from bytecode. */
 static uint64_t tci_read_r64(const tcg_target_ulong *regs,
                              const uint8_t **tb_ptr)
 {
-    return tci_read_r(regs, tb_ptr);
+    return tci_read_rval(regs, tb_ptr);
 }
 #endif
 
@@ -147,9 +147,9 @@ static uint64_t tci_read_r64(const tcg_target_ulong *regs,
 static target_ulong
 tci_read_ulong(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
 {
-    target_ulong taddr = tci_read_r(regs, tb_ptr);
+    target_ulong taddr = tci_read_rval(regs, tb_ptr);
 #if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
-    taddr += (uint64_t)tci_read_r(regs, tb_ptr) << 32;
+    taddr += (uint64_t)tci_read_rval(regs, tb_ptr) << 32;
 #endif
     return taddr;
 }
@@ -382,8 +382,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             continue;
         case INDEX_op_setcond_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             condition = *tb_ptr++;
             tci_write_reg(regs, t0, tci_compare32(t1, t2, condition));
             break;
@@ -398,15 +398,15 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #elif TCG_TARGET_REG_BITS == 64
         case INDEX_op_setcond_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             condition = *tb_ptr++;
             tci_write_reg(regs, t0, tci_compare64(t1, t2, condition));
             break;
 #endif
         CASE_32_64(mov)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1);
             break;
         case INDEX_op_tci_movi_i32:
@@ -419,51 +419,51 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         CASE_32_64(ld8u)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             tci_write_reg(regs, t0, *(uint8_t *)(t1 + t2));
             break;
         CASE_32_64(ld8s)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             tci_write_reg(regs, t0, *(int8_t *)(t1 + t2));
             break;
         CASE_32_64(ld16u)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             tci_write_reg(regs, t0, *(uint16_t *)(t1 + t2));
             break;
         CASE_32_64(ld16s)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             tci_write_reg(regs, t0, *(int16_t *)(t1 + t2));
             break;
         case INDEX_op_ld_i32:
         CASE_64(ld32u)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             tci_write_reg(regs, t0, *(uint32_t *)(t1 + t2));
             break;
         CASE_32_64(st8)
-            t0 = tci_read_r(regs, &tb_ptr);
-            t1 = tci_read_r(regs, &tb_ptr);
+            t0 = tci_read_rval(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             *(uint8_t *)(t1 + t2) = t0;
             break;
         CASE_32_64(st16)
-            t0 = tci_read_r(regs, &tb_ptr);
-            t1 = tci_read_r(regs, &tb_ptr);
+            t0 = tci_read_rval(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             *(uint16_t *)(t1 + t2) = t0;
             break;
         case INDEX_op_st_i32:
         CASE_64(st32)
-            t0 = tci_read_r(regs, &tb_ptr);
-            t1 = tci_read_r(regs, &tb_ptr);
+            t0 = tci_read_rval(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             *(uint32_t *)(t1 + t2) = t0;
             break;
@@ -472,38 +472,38 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         CASE_32_64(add)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 + t2);
             break;
         CASE_32_64(sub)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 - t2);
             break;
         CASE_32_64(mul)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 * t2);
             break;
         CASE_32_64(and)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 & t2);
             break;
         CASE_32_64(or)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 | t2);
             break;
         CASE_32_64(xor)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 ^ t2);
             break;
 
@@ -511,26 +511,26 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         case INDEX_op_div_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int32_t)t1 / (int32_t)t2);
             break;
         case INDEX_op_divu_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, (uint32_t)t1 / (uint32_t)t2);
             break;
         case INDEX_op_rem_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int32_t)t1 % (int32_t)t2);
             break;
         case INDEX_op_remu_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, (uint32_t)t1 % (uint32_t)t2);
             break;
 
@@ -538,41 +538,41 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         case INDEX_op_shl_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, (uint32_t)t1 << (t2 & 31));
             break;
         case INDEX_op_shr_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, (uint32_t)t1 >> (t2 & 31));
             break;
         case INDEX_op_sar_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int32_t)t1 >> (t2 & 31));
             break;
 #if TCG_TARGET_HAS_rot_i32
         case INDEX_op_rotl_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, rol32(t1, t2 & 31));
             break;
         case INDEX_op_rotr_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, ror32(t1, t2 & 31));
             break;
 #endif
 #if TCG_TARGET_HAS_deposit_i32
         case INDEX_op_deposit_i32:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tmp16 = *tb_ptr++;
             tmp8 = *tb_ptr++;
             tmp32 = (((1 << tmp8) - 1) << tmp16);
@@ -580,8 +580,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             break;
 #endif
         case INDEX_op_brcond_i32:
-            t0 = tci_read_r(regs, &tb_ptr);
-            t1 = tci_read_r(regs, &tb_ptr);
+            t0 = tci_read_rval(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             condition = *tb_ptr++;
             label = tci_read_label(&tb_ptr);
             if (tci_compare32(t0, t1, condition)) {
@@ -619,64 +619,64 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         case INDEX_op_mulu2_i32:
             t0 = *tb_ptr++;
             t1 = *tb_ptr++;
-            t2 = tci_read_r(regs, &tb_ptr);
-            tmp64 = (uint32_t)tci_read_r(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
+            tmp64 = (uint32_t)tci_read_rval(regs, &tb_ptr);
             tci_write_reg64(regs, t1, t0, (uint32_t)t2 * tmp64);
             break;
 #endif /* TCG_TARGET_REG_BITS == 32 */
 #if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
         CASE_32_64(ext8s)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int8_t)t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
         CASE_32_64(ext16s)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int16_t)t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext8u_i32 || TCG_TARGET_HAS_ext8u_i64
         CASE_32_64(ext8u)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, (uint8_t)t1);
             break;
 #endif
 #if TCG_TARGET_HAS_ext16u_i32 || TCG_TARGET_HAS_ext16u_i64
         CASE_32_64(ext16u)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, (uint16_t)t1);
             break;
 #endif
 #if TCG_TARGET_HAS_bswap16_i32 || TCG_TARGET_HAS_bswap16_i64
         CASE_32_64(bswap16)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, bswap16(t1));
             break;
 #endif
 #if TCG_TARGET_HAS_bswap32_i32 || TCG_TARGET_HAS_bswap32_i64
         CASE_32_64(bswap32)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, bswap32(t1));
             break;
 #endif
 #if TCG_TARGET_HAS_not_i32 || TCG_TARGET_HAS_not_i64
         CASE_32_64(not)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, ~t1);
             break;
 #endif
 #if TCG_TARGET_HAS_neg_i32 || TCG_TARGET_HAS_neg_i64
         CASE_32_64(neg)
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, -t1);
             break;
 #endif
@@ -691,19 +691,19 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         case INDEX_op_ld32s_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             tci_write_reg(regs, t0, *(int32_t *)(t1 + t2));
             break;
         case INDEX_op_ld_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             tci_write_reg(regs, t0, *(uint64_t *)(t1 + t2));
             break;
         case INDEX_op_st_i64:
-            t0 = tci_read_r(regs, &tb_ptr);
-            t1 = tci_read_r(regs, &tb_ptr);
+            t0 = tci_read_rval(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             t2 = tci_read_s32(&tb_ptr);
             *(uint64_t *)(t1 + t2) = t0;
             break;
@@ -712,26 +712,26 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         case INDEX_op_div_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int64_t)t1 / (int64_t)t2);
             break;
         case INDEX_op_divu_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, (uint64_t)t1 / (uint64_t)t2);
             break;
         case INDEX_op_rem_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int64_t)t1 % (int64_t)t2);
             break;
         case INDEX_op_remu_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, (uint64_t)t1 % (uint64_t)t2);
             break;
 
@@ -739,41 +739,41 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         case INDEX_op_shl_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 << (t2 & 63));
             break;
         case INDEX_op_shr_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, t1 >> (t2 & 63));
             break;
         case INDEX_op_sar_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, ((int64_t)t1 >> (t2 & 63)));
             break;
 #if TCG_TARGET_HAS_rot_i64
         case INDEX_op_rotl_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, rol64(t1, t2 & 63));
             break;
         case INDEX_op_rotr_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, ror64(t1, t2 & 63));
             break;
 #endif
 #if TCG_TARGET_HAS_deposit_i64
         case INDEX_op_deposit_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
-            t2 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
+            t2 = tci_read_rval(regs, &tb_ptr);
             tmp16 = *tb_ptr++;
             tmp8 = *tb_ptr++;
             tmp64 = (((1ULL << tmp8) - 1) << tmp16);
@@ -781,8 +781,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             break;
 #endif
         case INDEX_op_brcond_i64:
-            t0 = tci_read_r(regs, &tb_ptr);
-            t1 = tci_read_r(regs, &tb_ptr);
+            t0 = tci_read_rval(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             condition = *tb_ptr++;
             label = tci_read_label(&tb_ptr);
             if (tci_compare64(t0, t1, condition)) {
@@ -794,19 +794,19 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         case INDEX_op_ext32s_i64:
         case INDEX_op_ext_i32_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, (int32_t)t1);
             break;
         case INDEX_op_ext32u_i64:
         case INDEX_op_extu_i32_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, (uint32_t)t1);
             break;
 #if TCG_TARGET_HAS_bswap64_i64
         case INDEX_op_bswap64_i64:
             t0 = *tb_ptr++;
-            t1 = tci_read_r(regs, &tb_ptr);
+            t1 = tci_read_rval(regs, &tb_ptr);
             tci_write_reg(regs, t0, bswap64(t1));
             break;
 #endif
@@ -913,7 +913,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             }
             break;
         case INDEX_op_qemu_st_i32:
-            t0 = tci_read_r(regs, &tb_ptr);
+            t0 = tci_read_rval(regs, &tb_ptr);
             taddr = tci_read_ulong(regs, &tb_ptr);
             oi = tci_read_i(&tb_ptr);
             switch (get_memop(oi) & (MO_BSWAP | MO_SIZE)) {
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 43/93] tcg/tci: Split out tci_args_rrs
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (41 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 42/93] tcg/tci: Rename tci_read_r to tci_read_rval Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 44/93] tcg/tci: Split out tci_args_rr Richard Henderson
                   ` (52 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Begin splitting out functions that do pure argument decode,
without actually loading values from the register set.

This means that decoding need not concern itself between
input and output registers.  We can assert that the register
number is in range during decode, so that it is safe to
simply dereference from regs[] later.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 111 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 67 insertions(+), 44 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 20aaaca959..be298ae39d 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -83,6 +83,20 @@ static uint64_t tci_uint64(uint32_t high, uint32_t low)
 }
 #endif
 
+/* Read constant byte from bytecode. */
+static uint8_t tci_read_b(const uint8_t **tb_ptr)
+{
+    return *(tb_ptr[0]++);
+}
+
+/* Read register number from bytecode. */
+static TCGReg tci_read_r(const uint8_t **tb_ptr)
+{
+    uint8_t regno = tci_read_b(tb_ptr);
+    tci_assert(regno < TCG_TARGET_NB_REGS);
+    return regno;
+}
+
 /* Read constant (native size) from bytecode. */
 static tcg_target_ulong tci_read_i(const uint8_t **tb_ptr)
 {
@@ -161,6 +175,23 @@ static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
     return label;
 }
 
+/*
+ * Load sets of arguments all at once.  The naming convention is:
+ *   tci_args_<arguments>
+ * where arguments is a sequence of
+ *
+ *   r = register
+ *   s = signed ldst offset
+ */
+
+static void tci_args_rrs(const uint8_t **tb_ptr,
+                         TCGReg *r0, TCGReg *r1, int32_t *i2)
+{
+    *r0 = tci_read_r(tb_ptr);
+    *r1 = tci_read_r(tb_ptr);
+    *i2 = tci_read_s32(tb_ptr);
+}
+
 static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
 {
     bool result = false;
@@ -328,6 +359,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         uint8_t op_size = tb_ptr[1];
         const uint8_t *old_code_ptr = tb_ptr;
 #endif
+        TCGReg r0, r1;
         tcg_target_ulong t0;
         tcg_target_ulong t1;
         tcg_target_ulong t2;
@@ -342,6 +374,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         uint64_t v64;
 #endif
         TCGMemOpIdx oi;
+        int32_t ofs;
+        void *ptr;
 
         /* Skip opcode and size entry. */
         tb_ptr += 2;
@@ -418,54 +452,46 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             /* Load/store operations (32 bit). */
 
         CASE_32_64(ld8u)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg(regs, t0, *(uint8_t *)(t1 + t2));
+            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            regs[r0] = *(uint8_t *)ptr;
             break;
         CASE_32_64(ld8s)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg(regs, t0, *(int8_t *)(t1 + t2));
+            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            regs[r0] = *(int8_t *)ptr;
             break;
         CASE_32_64(ld16u)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg(regs, t0, *(uint16_t *)(t1 + t2));
+            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            regs[r0] = *(uint16_t *)ptr;
             break;
         CASE_32_64(ld16s)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg(regs, t0, *(int16_t *)(t1 + t2));
+            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            regs[r0] = *(int16_t *)ptr;
             break;
         case INDEX_op_ld_i32:
         CASE_64(ld32u)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg(regs, t0, *(uint32_t *)(t1 + t2));
+            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            regs[r0] = *(uint32_t *)ptr;
             break;
         CASE_32_64(st8)
-            t0 = tci_read_rval(regs, &tb_ptr);
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_s32(&tb_ptr);
-            *(uint8_t *)(t1 + t2) = t0;
+            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            *(uint8_t *)ptr = regs[r0];
             break;
         CASE_32_64(st16)
-            t0 = tci_read_rval(regs, &tb_ptr);
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_s32(&tb_ptr);
-            *(uint16_t *)(t1 + t2) = t0;
+            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            *(uint16_t *)ptr = regs[r0];
             break;
         case INDEX_op_st_i32:
         CASE_64(st32)
-            t0 = tci_read_rval(regs, &tb_ptr);
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_s32(&tb_ptr);
-            *(uint32_t *)(t1 + t2) = t0;
+            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            *(uint32_t *)ptr = regs[r0];
             break;
 
             /* Arithmetic operations (mixed 32/64 bit). */
@@ -690,22 +716,19 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             /* Load/store operations (64 bit). */
 
         case INDEX_op_ld32s_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg(regs, t0, *(int32_t *)(t1 + t2));
+            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            regs[r0] = *(int32_t *)ptr;
             break;
         case INDEX_op_ld_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_s32(&tb_ptr);
-            tci_write_reg(regs, t0, *(uint64_t *)(t1 + t2));
+            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            regs[r0] = *(uint64_t *)ptr;
             break;
         case INDEX_op_st_i64:
-            t0 = tci_read_rval(regs, &tb_ptr);
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_s32(&tb_ptr);
-            *(uint64_t *)(t1 + t2) = t0;
+            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            ptr = (void *)(regs[r1] + ofs);
+            *(uint64_t *)ptr = regs[r0];
             break;
 
             /* Arithmetic operations (64 bit). */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 44/93] tcg/tci: Split out tci_args_rr
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (42 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 43/93] tcg/tci: Split out tci_args_rrs Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 45/93] tcg/tci: Split out tci_args_rrr Richard Henderson
                   ` (51 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 67 +++++++++++++++++++++++++------------------------------
 1 file changed, 31 insertions(+), 36 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index be298ae39d..0bc5294e8b 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -184,6 +184,13 @@ static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
  *   s = signed ldst offset
  */
 
+static void tci_args_rr(const uint8_t **tb_ptr,
+                        TCGReg *r0, TCGReg *r1)
+{
+    *r0 = tci_read_r(tb_ptr);
+    *r1 = tci_read_r(tb_ptr);
+}
+
 static void tci_args_rrs(const uint8_t **tb_ptr,
                          TCGReg *r0, TCGReg *r1, int32_t *i2)
 {
@@ -439,9 +446,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             break;
 #endif
         CASE_32_64(mov)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1);
+            tci_args_rr(&tb_ptr, &r0, &r1);
+            regs[r0] = regs[r1];
             break;
         case INDEX_op_tci_movi_i32:
             t0 = *tb_ptr++;
@@ -652,58 +658,50 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif /* TCG_TARGET_REG_BITS == 32 */
 #if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
         CASE_32_64(ext8s)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (int8_t)t1);
+            tci_args_rr(&tb_ptr, &r0, &r1);
+            regs[r0] = (int8_t)regs[r1];
             break;
 #endif
 #if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
         CASE_32_64(ext16s)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (int16_t)t1);
+            tci_args_rr(&tb_ptr, &r0, &r1);
+            regs[r0] = (int16_t)regs[r1];
             break;
 #endif
 #if TCG_TARGET_HAS_ext8u_i32 || TCG_TARGET_HAS_ext8u_i64
         CASE_32_64(ext8u)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (uint8_t)t1);
+            tci_args_rr(&tb_ptr, &r0, &r1);
+            regs[r0] = (uint8_t)regs[r1];
             break;
 #endif
 #if TCG_TARGET_HAS_ext16u_i32 || TCG_TARGET_HAS_ext16u_i64
         CASE_32_64(ext16u)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (uint16_t)t1);
+            tci_args_rr(&tb_ptr, &r0, &r1);
+            regs[r0] = (uint16_t)regs[r1];
             break;
 #endif
 #if TCG_TARGET_HAS_bswap16_i32 || TCG_TARGET_HAS_bswap16_i64
         CASE_32_64(bswap16)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, bswap16(t1));
+            tci_args_rr(&tb_ptr, &r0, &r1);
+            regs[r0] = bswap16(regs[r1]);
             break;
 #endif
 #if TCG_TARGET_HAS_bswap32_i32 || TCG_TARGET_HAS_bswap32_i64
         CASE_32_64(bswap32)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, bswap32(t1));
+            tci_args_rr(&tb_ptr, &r0, &r1);
+            regs[r0] = bswap32(regs[r1]);
             break;
 #endif
 #if TCG_TARGET_HAS_not_i32 || TCG_TARGET_HAS_not_i64
         CASE_32_64(not)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, ~t1);
+            tci_args_rr(&tb_ptr, &r0, &r1);
+            regs[r0] = ~regs[r1];
             break;
 #endif
 #if TCG_TARGET_HAS_neg_i32 || TCG_TARGET_HAS_neg_i64
         CASE_32_64(neg)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, -t1);
+            tci_args_rr(&tb_ptr, &r0, &r1);
+            regs[r0] = -regs[r1];
             break;
 #endif
 #if TCG_TARGET_REG_BITS == 64
@@ -816,21 +814,18 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             break;
         case INDEX_op_ext32s_i64:
         case INDEX_op_ext_i32_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (int32_t)t1);
+            tci_args_rr(&tb_ptr, &r0, &r1);
+            regs[r0] = (int32_t)regs[r1];
             break;
         case INDEX_op_ext32u_i64:
         case INDEX_op_extu_i32_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (uint32_t)t1);
+            tci_args_rr(&tb_ptr, &r0, &r1);
+            regs[r0] = (uint32_t)regs[r1];
             break;
 #if TCG_TARGET_HAS_bswap64_i64
         case INDEX_op_bswap64_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, bswap64(t1));
+            tci_args_rr(&tb_ptr, &r0, &r1);
+            regs[r0] = bswap64(regs[r1]);
             break;
 #endif
 #endif /* TCG_TARGET_REG_BITS == 64 */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 45/93] tcg/tci: Split out tci_args_rrr
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (43 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 44/93] tcg/tci: Split out tci_args_rr Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 46/93] tcg/tci: Split out tci_args_rrrc Richard Henderson
                   ` (50 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 154 ++++++++++++++++++++----------------------------------
 1 file changed, 57 insertions(+), 97 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 0bc5294e8b..1736234bfd 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -191,6 +191,14 @@ static void tci_args_rr(const uint8_t **tb_ptr,
     *r1 = tci_read_r(tb_ptr);
 }
 
+static void tci_args_rrr(const uint8_t **tb_ptr,
+                         TCGReg *r0, TCGReg *r1, TCGReg *r2)
+{
+    *r0 = tci_read_r(tb_ptr);
+    *r1 = tci_read_r(tb_ptr);
+    *r2 = tci_read_r(tb_ptr);
+}
+
 static void tci_args_rrs(const uint8_t **tb_ptr,
                          TCGReg *r0, TCGReg *r1, int32_t *i2)
 {
@@ -366,7 +374,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         uint8_t op_size = tb_ptr[1];
         const uint8_t *old_code_ptr = tb_ptr;
 #endif
-        TCGReg r0, r1;
+        TCGReg r0, r1, r2;
         tcg_target_ulong t0;
         tcg_target_ulong t1;
         tcg_target_ulong t2;
@@ -503,101 +511,71 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             /* Arithmetic operations (mixed 32/64 bit). */
 
         CASE_32_64(add)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 + t2);
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = regs[r1] + regs[r2];
             break;
         CASE_32_64(sub)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 - t2);
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = regs[r1] - regs[r2];
             break;
         CASE_32_64(mul)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 * t2);
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = regs[r1] * regs[r2];
             break;
         CASE_32_64(and)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 & t2);
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = regs[r1] & regs[r2];
             break;
         CASE_32_64(or)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 | t2);
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = regs[r1] | regs[r2];
             break;
         CASE_32_64(xor)
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 ^ t2);
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = regs[r1] ^ regs[r2];
             break;
 
             /* Arithmetic operations (32 bit). */
 
         case INDEX_op_div_i32:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (int32_t)t1 / (int32_t)t2);
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = (int32_t)regs[r1] / (int32_t)regs[r2];
             break;
         case INDEX_op_divu_i32:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (uint32_t)t1 / (uint32_t)t2);
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = (uint32_t)regs[r1] / (uint32_t)regs[r2];
             break;
         case INDEX_op_rem_i32:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (int32_t)t1 % (int32_t)t2);
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = (int32_t)regs[r1] % (int32_t)regs[r2];
             break;
         case INDEX_op_remu_i32:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (uint32_t)t1 % (uint32_t)t2);
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = (uint32_t)regs[r1] % (uint32_t)regs[r2];
             break;
 
             /* Shift/rotate operations (32 bit). */
 
         case INDEX_op_shl_i32:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (uint32_t)t1 << (t2 & 31));
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = (uint32_t)regs[r1] << (regs[r2] & 31);
             break;
         case INDEX_op_shr_i32:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (uint32_t)t1 >> (t2 & 31));
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = (uint32_t)regs[r1] >> (regs[r2] & 31);
             break;
         case INDEX_op_sar_i32:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (int32_t)t1 >> (t2 & 31));
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = (int32_t)regs[r1] >> (regs[r2] & 31);
             break;
 #if TCG_TARGET_HAS_rot_i32
         case INDEX_op_rotl_i32:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, rol32(t1, t2 & 31));
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = rol32(regs[r1], regs[r2] & 31);
             break;
         case INDEX_op_rotr_i32:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, ror32(t1, t2 & 31));
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = ror32(regs[r1], regs[r2] & 31);
             break;
 #endif
 #if TCG_TARGET_HAS_deposit_i32
@@ -732,62 +710,44 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             /* Arithmetic operations (64 bit). */
 
         case INDEX_op_div_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (int64_t)t1 / (int64_t)t2);
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = (int64_t)regs[r1] / (int64_t)regs[r2];
             break;
         case INDEX_op_divu_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (uint64_t)t1 / (uint64_t)t2);
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = (uint64_t)regs[r1] / (uint64_t)regs[r2];
             break;
         case INDEX_op_rem_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (int64_t)t1 % (int64_t)t2);
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = (int64_t)regs[r1] % (int64_t)regs[r2];
             break;
         case INDEX_op_remu_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, (uint64_t)t1 % (uint64_t)t2);
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = (uint64_t)regs[r1] % (uint64_t)regs[r2];
             break;
 
             /* Shift/rotate operations (64 bit). */
 
         case INDEX_op_shl_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 << (t2 & 63));
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = regs[r1] << (regs[r2] & 63);
             break;
         case INDEX_op_shr_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, t1 >> (t2 & 63));
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = regs[r1] >> (regs[r2] & 63);
             break;
         case INDEX_op_sar_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, ((int64_t)t1 >> (t2 & 63)));
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = (int64_t)regs[r1] >> (regs[r2] & 63);
             break;
 #if TCG_TARGET_HAS_rot_i64
         case INDEX_op_rotl_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, rol64(t1, t2 & 63));
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = rol64(regs[r1], regs[r2] & 63);
             break;
         case INDEX_op_rotr_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tci_write_reg(regs, t0, ror64(t1, t2 & 63));
+            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            regs[r0] = ror64(regs[r1], regs[r2] & 63);
             break;
 #endif
 #if TCG_TARGET_HAS_deposit_i64
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 46/93] tcg/tci: Split out tci_args_rrrc
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (44 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 45/93] tcg/tci: Split out tci_args_rrr Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 47/93] tcg/tci: Split out tci_args_l Richard Henderson
                   ` (49 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 1736234bfd..86625061f1 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -207,6 +207,15 @@ static void tci_args_rrs(const uint8_t **tb_ptr,
     *i2 = tci_read_s32(tb_ptr);
 }
 
+static void tci_args_rrrc(const uint8_t **tb_ptr,
+                          TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGCond *c3)
+{
+    *r0 = tci_read_r(tb_ptr);
+    *r1 = tci_read_r(tb_ptr);
+    *r2 = tci_read_r(tb_ptr);
+    *c3 = tci_read_b(tb_ptr);
+}
+
 static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
 {
     bool result = false;
@@ -430,11 +439,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tb_ptr = (uint8_t *)label;
             continue;
         case INDEX_op_setcond_i32:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            condition = *tb_ptr++;
-            tci_write_reg(regs, t0, tci_compare32(t1, t2, condition));
+            tci_args_rrrc(&tb_ptr, &r0, &r1, &r2, &condition);
+            regs[r0] = tci_compare32(regs[r1], regs[r2], condition);
             break;
 #if TCG_TARGET_REG_BITS == 32
         case INDEX_op_setcond2_i32:
@@ -446,11 +452,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             break;
 #elif TCG_TARGET_REG_BITS == 64
         case INDEX_op_setcond_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            condition = *tb_ptr++;
-            tci_write_reg(regs, t0, tci_compare64(t1, t2, condition));
+            tci_args_rrrc(&tb_ptr, &r0, &r1, &r2, &condition);
+            regs[r0] = tci_compare64(regs[r1], regs[r2], condition);
             break;
 #endif
         CASE_32_64(mov)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 47/93] tcg/tci: Split out tci_args_l
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (45 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 46/93] tcg/tci: Split out tci_args_rrrc Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 48/93] tcg/tci: Split out tci_args_rrrrrc Richard Henderson
                   ` (48 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 86625061f1..8bc9dd27b0 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -184,6 +184,11 @@ static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
  *   s = signed ldst offset
  */
 
+static void tci_args_l(const uint8_t **tb_ptr, void **l0)
+{
+    *l0 = (void *)tci_read_label(tb_ptr);
+}
+
 static void tci_args_rr(const uint8_t **tb_ptr,
                         TCGReg *r0, TCGReg *r1)
 {
@@ -434,9 +439,9 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif
             break;
         case INDEX_op_br:
-            label = tci_read_label(&tb_ptr);
+            tci_args_l(&tb_ptr, &ptr);
             tci_assert(tb_ptr == old_code_ptr + op_size);
-            tb_ptr = (uint8_t *)label;
+            tb_ptr = ptr;
             continue;
         case INDEX_op_setcond_i32:
             tci_args_rrrc(&tb_ptr, &r0, &r1, &r2, &condition);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 48/93] tcg/tci: Split out tci_args_rrrrrc
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (46 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 47/93] tcg/tci: Split out tci_args_l Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 49/93] tcg/tci: Split out tci_args_rrcl and tci_args_rrrrcl Richard Henderson
                   ` (47 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 8bc9dd27b0..692b95b5c2 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -221,6 +221,19 @@ static void tci_args_rrrc(const uint8_t **tb_ptr,
     *c3 = tci_read_b(tb_ptr);
 }
 
+#if TCG_TARGET_REG_BITS == 32
+static void tci_args_rrrrrc(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
+                            TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGCond *c5)
+{
+    *r0 = tci_read_r(tb_ptr);
+    *r1 = tci_read_r(tb_ptr);
+    *r2 = tci_read_r(tb_ptr);
+    *r3 = tci_read_r(tb_ptr);
+    *r4 = tci_read_r(tb_ptr);
+    *c5 = tci_read_b(tb_ptr);
+}
+#endif
+
 static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
 {
     bool result = false;
@@ -400,7 +413,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         uint32_t tmp32;
         uint64_t tmp64;
 #if TCG_TARGET_REG_BITS == 32
-        uint64_t v64;
+        TCGReg r3, r4;
+        uint64_t v64, T1, T2;
 #endif
         TCGMemOpIdx oi;
         int32_t ofs;
@@ -449,11 +463,10 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             break;
 #if TCG_TARGET_REG_BITS == 32
         case INDEX_op_setcond2_i32:
-            t0 = *tb_ptr++;
-            tmp64 = tci_read_r64(regs, &tb_ptr);
-            v64 = tci_read_r64(regs, &tb_ptr);
-            condition = *tb_ptr++;
-            tci_write_reg(regs, t0, tci_compare64(tmp64, v64, condition));
+            tci_args_rrrrrc(&tb_ptr, &r0, &r1, &r2, &r3, &r4, &condition);
+            T1 = tci_uint64(regs[r2], regs[r1]);
+            T2 = tci_uint64(regs[r4], regs[r3]);
+            regs[r0] = tci_compare64(T1, T2, condition);
             break;
 #elif TCG_TARGET_REG_BITS == 64
         case INDEX_op_setcond_i64:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 49/93] tcg/tci: Split out tci_args_rrcl and tci_args_rrrrcl
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (47 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 48/93] tcg/tci: Split out tci_args_rrrrrc Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 50/93] tcg/tci: Split out tci_args_ri and tci_args_rI Richard Henderson
                   ` (46 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 52 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 692b95b5c2..1e2f78a9f9 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -212,6 +212,15 @@ static void tci_args_rrs(const uint8_t **tb_ptr,
     *i2 = tci_read_s32(tb_ptr);
 }
 
+static void tci_args_rrcl(const uint8_t **tb_ptr,
+                          TCGReg *r0, TCGReg *r1, TCGCond *c2, void **l3)
+{
+    *r0 = tci_read_r(tb_ptr);
+    *r1 = tci_read_r(tb_ptr);
+    *c2 = tci_read_b(tb_ptr);
+    *l3 = (void *)tci_read_label(tb_ptr);
+}
+
 static void tci_args_rrrc(const uint8_t **tb_ptr,
                           TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGCond *c3)
 {
@@ -222,6 +231,17 @@ static void tci_args_rrrc(const uint8_t **tb_ptr,
 }
 
 #if TCG_TARGET_REG_BITS == 32
+static void tci_args_rrrrcl(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
+                            TCGReg *r2, TCGReg *r3, TCGCond *c4, void **l5)
+{
+    *r0 = tci_read_r(tb_ptr);
+    *r1 = tci_read_r(tb_ptr);
+    *r2 = tci_read_r(tb_ptr);
+    *r3 = tci_read_r(tb_ptr);
+    *c4 = tci_read_b(tb_ptr);
+    *l5 = (void *)tci_read_label(tb_ptr);
+}
+
 static void tci_args_rrrrrc(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
                             TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGCond *c5)
 {
@@ -405,7 +425,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         tcg_target_ulong t0;
         tcg_target_ulong t1;
         tcg_target_ulong t2;
-        tcg_target_ulong label;
         TCGCond condition;
         target_ulong taddr;
         uint8_t tmp8;
@@ -414,7 +433,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         uint64_t tmp64;
 #if TCG_TARGET_REG_BITS == 32
         TCGReg r3, r4;
-        uint64_t v64, T1, T2;
+        uint64_t T1, T2;
 #endif
         TCGMemOpIdx oi;
         int32_t ofs;
@@ -611,13 +630,10 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             break;
 #endif
         case INDEX_op_brcond_i32:
-            t0 = tci_read_rval(regs, &tb_ptr);
-            t1 = tci_read_rval(regs, &tb_ptr);
-            condition = *tb_ptr++;
-            label = tci_read_label(&tb_ptr);
-            if (tci_compare32(t0, t1, condition)) {
+            tci_args_rrcl(&tb_ptr, &r0, &r1, &condition, &ptr);
+            if (tci_compare32(regs[r0], regs[r1], condition)) {
                 tci_assert(tb_ptr == old_code_ptr + op_size);
-                tb_ptr = (uint8_t *)label;
+                tb_ptr = ptr;
                 continue;
             }
             break;
@@ -637,13 +653,12 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tci_write_reg64(regs, t1, t0, tmp64);
             break;
         case INDEX_op_brcond2_i32:
-            tmp64 = tci_read_r64(regs, &tb_ptr);
-            v64 = tci_read_r64(regs, &tb_ptr);
-            condition = *tb_ptr++;
-            label = tci_read_label(&tb_ptr);
-            if (tci_compare64(tmp64, v64, condition)) {
+            tci_args_rrrrcl(&tb_ptr, &r0, &r1, &r2, &r3, &condition, &ptr);
+            T1 = tci_uint64(regs[r1], regs[r0]);
+            T2 = tci_uint64(regs[r3], regs[r2]);
+            if (tci_compare64(T1, T2, condition)) {
                 tci_assert(tb_ptr == old_code_ptr + op_size);
-                tb_ptr = (uint8_t *)label;
+                tb_ptr = ptr;
                 continue;
             }
             break;
@@ -783,13 +798,10 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             break;
 #endif
         case INDEX_op_brcond_i64:
-            t0 = tci_read_rval(regs, &tb_ptr);
-            t1 = tci_read_rval(regs, &tb_ptr);
-            condition = *tb_ptr++;
-            label = tci_read_label(&tb_ptr);
-            if (tci_compare64(t0, t1, condition)) {
+            tci_args_rrcl(&tb_ptr, &r0, &r1, &condition, &ptr);
+            if (tci_compare64(regs[r0], regs[r1], condition)) {
                 tci_assert(tb_ptr == old_code_ptr + op_size);
-                tb_ptr = (uint8_t *)label;
+                tb_ptr = ptr;
                 continue;
             }
             break;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 50/93] tcg/tci: Split out tci_args_ri and tci_args_rI
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (48 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 49/93] tcg/tci: Split out tci_args_rrcl and tci_args_rrrrcl Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 51/93] tcg/tci: Reuse tci_args_l for calls Richard Henderson
                   ` (45 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 1e2f78a9f9..5cc05fa554 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -121,16 +121,6 @@ static int32_t tci_read_s32(const uint8_t **tb_ptr)
     return value;
 }
 
-#if TCG_TARGET_REG_BITS == 64
-/* Read constant (64 bit) from bytecode. */
-static uint64_t tci_read_i64(const uint8_t **tb_ptr)
-{
-    uint64_t value = *(const uint64_t *)(*tb_ptr);
-    *tb_ptr += sizeof(value);
-    return value;
-}
-#endif
-
 /* Read indexed register (native size) from bytecode. */
 static tcg_target_ulong
 tci_read_rval(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
@@ -180,6 +170,8 @@ static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
  *   tci_args_<arguments>
  * where arguments is a sequence of
  *
+ *   i = immediate (uint32_t)
+ *   I = immediate (tcg_target_ulong)
  *   r = register
  *   s = signed ldst offset
  */
@@ -196,6 +188,22 @@ static void tci_args_rr(const uint8_t **tb_ptr,
     *r1 = tci_read_r(tb_ptr);
 }
 
+static void tci_args_ri(const uint8_t **tb_ptr,
+                        TCGReg *r0, tcg_target_ulong *i1)
+{
+    *r0 = tci_read_r(tb_ptr);
+    *i1 = tci_read_i32(tb_ptr);
+}
+
+#if TCG_TARGET_REG_BITS == 64
+static void tci_args_rI(const uint8_t **tb_ptr,
+                        TCGReg *r0, tcg_target_ulong *i1)
+{
+    *r0 = tci_read_r(tb_ptr);
+    *i1 = tci_read_i(tb_ptr);
+}
+#endif
+
 static void tci_args_rrr(const uint8_t **tb_ptr,
                          TCGReg *r0, TCGReg *r1, TCGReg *r2)
 {
@@ -498,9 +506,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             regs[r0] = regs[r1];
             break;
         case INDEX_op_tci_movi_i32:
-            t0 = *tb_ptr++;
-            t1 = tci_read_i32(&tb_ptr);
-            tci_write_reg(regs, t0, t1);
+            tci_args_ri(&tb_ptr, &r0, &t1);
+            regs[r0] = t1;
             break;
 
             /* Load/store operations (32 bit). */
@@ -720,9 +727,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif
 #if TCG_TARGET_REG_BITS == 64
         case INDEX_op_tci_movi_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_i64(&tb_ptr);
-            tci_write_reg(regs, t0, t1);
+            tci_args_rI(&tb_ptr, &r0, &t1);
+            regs[r0] = t1;
             break;
 
             /* Load/store operations (64 bit). */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 51/93] tcg/tci: Reuse tci_args_l for calls.
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (49 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 50/93] tcg/tci: Split out tci_args_ri and tci_args_rI Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 52/93] tcg/tci: Reuse tci_args_l for exit_tb Richard Henderson
                   ` (44 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 5cc05fa554..92b13829c3 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -452,30 +452,30 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         switch (opc) {
         case INDEX_op_call:
-            t0 = tci_read_i(&tb_ptr);
+            tci_args_l(&tb_ptr, &ptr);
             tci_tb_ptr = (uintptr_t)tb_ptr;
 #if TCG_TARGET_REG_BITS == 32
-            tmp64 = ((helper_function)t0)(tci_read_reg(regs, TCG_REG_R0),
-                                          tci_read_reg(regs, TCG_REG_R1),
-                                          tci_read_reg(regs, TCG_REG_R2),
-                                          tci_read_reg(regs, TCG_REG_R3),
-                                          tci_read_reg(regs, TCG_REG_R4),
-                                          tci_read_reg(regs, TCG_REG_R5),
-                                          tci_read_reg(regs, TCG_REG_R6),
-                                          tci_read_reg(regs, TCG_REG_R7),
-                                          tci_read_reg(regs, TCG_REG_R8),
-                                          tci_read_reg(regs, TCG_REG_R9),
-                                          tci_read_reg(regs, TCG_REG_R10),
-                                          tci_read_reg(regs, TCG_REG_R11));
+            tmp64 = ((helper_function)ptr)(tci_read_reg(regs, TCG_REG_R0),
+                                           tci_read_reg(regs, TCG_REG_R1),
+                                           tci_read_reg(regs, TCG_REG_R2),
+                                           tci_read_reg(regs, TCG_REG_R3),
+                                           tci_read_reg(regs, TCG_REG_R4),
+                                           tci_read_reg(regs, TCG_REG_R5),
+                                           tci_read_reg(regs, TCG_REG_R6),
+                                           tci_read_reg(regs, TCG_REG_R7),
+                                           tci_read_reg(regs, TCG_REG_R8),
+                                           tci_read_reg(regs, TCG_REG_R9),
+                                           tci_read_reg(regs, TCG_REG_R10),
+                                           tci_read_reg(regs, TCG_REG_R11));
             tci_write_reg(regs, TCG_REG_R0, tmp64);
             tci_write_reg(regs, TCG_REG_R1, tmp64 >> 32);
 #else
-            tmp64 = ((helper_function)t0)(tci_read_reg(regs, TCG_REG_R0),
-                                          tci_read_reg(regs, TCG_REG_R1),
-                                          tci_read_reg(regs, TCG_REG_R2),
-                                          tci_read_reg(regs, TCG_REG_R3),
-                                          tci_read_reg(regs, TCG_REG_R4),
-                                          tci_read_reg(regs, TCG_REG_R5));
+            tmp64 = ((helper_function)ptr)(tci_read_reg(regs, TCG_REG_R0),
+                                           tci_read_reg(regs, TCG_REG_R1),
+                                           tci_read_reg(regs, TCG_REG_R2),
+                                           tci_read_reg(regs, TCG_REG_R3),
+                                           tci_read_reg(regs, TCG_REG_R4),
+                                           tci_read_reg(regs, TCG_REG_R5));
             tci_write_reg(regs, TCG_REG_R0, tmp64);
 #endif
             break;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 52/93] tcg/tci: Reuse tci_args_l for exit_tb
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (50 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 51/93] tcg/tci: Reuse tci_args_l for calls Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 53/93] tcg/tci: Reuse tci_args_l for goto_tb Richard Henderson
                   ` (43 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Do not emit a uint64_t, but a tcg_target_ulong, aka uintptr_t.
This reduces the size of the constant on 32-bit hosts.
The assert for label != NULL has to be removed because that
is a valid value for exit_tb.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c                | 13 ++++---------
 tcg/tci/tcg-target.c.inc |  2 +-
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 92b13829c3..57b6defe09 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -160,9 +160,7 @@ tci_read_ulong(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
 
 static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
 {
-    tcg_target_ulong label = tci_read_i(tb_ptr);
-    tci_assert(label != 0);
-    return label;
+    return tci_read_i(tb_ptr);
 }
 
 /*
@@ -417,7 +415,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
     tcg_target_ulong regs[TCG_TARGET_NB_REGS];
     long tcg_temps[CPU_TEMP_BUF_NLONGS];
     uintptr_t sp_value = (uintptr_t)(tcg_temps + CPU_TEMP_BUF_NLONGS);
-    uintptr_t ret = 0;
 
     regs[TCG_AREG0] = (tcg_target_ulong)env;
     regs[TCG_REG_CALL_STACK] = sp_value;
@@ -832,9 +829,9 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             /* QEMU specific operations. */
 
         case INDEX_op_exit_tb:
-            ret = *(uint64_t *)tb_ptr;
-            goto exit;
-            break;
+            tci_args_l(&tb_ptr, &ptr);
+            return (uintptr_t)ptr;
+
         case INDEX_op_goto_tb:
             /* Jump address is aligned */
             tb_ptr = QEMU_ALIGN_PTR_UP(tb_ptr, 4);
@@ -992,6 +989,4 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         }
         tci_assert(tb_ptr == old_code_ptr + op_size);
     }
-exit:
-    return ret;
 }
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index c79f9c32d8..ff8040510f 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -401,7 +401,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 
     switch (opc) {
     case INDEX_op_exit_tb:
-        tcg_out64(s, args[0]);
+        tcg_out_i(s, args[0]);
         break;
 
     case INDEX_op_goto_tb:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 53/93] tcg/tci: Reuse tci_args_l for goto_tb
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (51 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 52/93] tcg/tci: Reuse tci_args_l for exit_tb Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 54/93] tcg/tci: Split out tci_args_rrrrrr Richard Henderson
                   ` (42 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Convert to indirect jumps, as it's less complicated.
Then we just have a pointer to the tb address at which
the chain is stored, from which we read.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.h     | 11 +++--------
 tcg/tci.c                |  8 +++-----
 tcg/tci/tcg-target.c.inc | 13 +++----------
 3 files changed, 9 insertions(+), 23 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 9c0021a26f..9285c930a2 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -87,7 +87,7 @@
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 #define TCG_TARGET_HAS_goto_ptr         0
-#define TCG_TARGET_HAS_direct_jump      1
+#define TCG_TARGET_HAS_direct_jump      0
 #define TCG_TARGET_HAS_qemu_st8_i32     0
 
 #if TCG_TARGET_REG_BITS == 64
@@ -174,12 +174,7 @@ void tci_disas(uint8_t opc);
 
 #define TCG_TARGET_HAS_MEMORY_BSWAP     1
 
-static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
-                                            uintptr_t jmp_rw, uintptr_t addr)
-{
-    /* patch the branch destination */
-    qatomic_set((int32_t *)jmp_rw, addr - (jmp_rx + 4));
-    /* no need to flush icache explicitly */
-}
+/* not defined -- call should be eliminated at compile time */
+void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
 
 #endif /* TCG_TARGET_H */
diff --git a/tcg/tci.c b/tcg/tci.c
index 57b6defe09..0301ee63a7 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -833,13 +833,11 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             return (uintptr_t)ptr;
 
         case INDEX_op_goto_tb:
-            /* Jump address is aligned */
-            tb_ptr = QEMU_ALIGN_PTR_UP(tb_ptr, 4);
-            t0 = qatomic_read((int32_t *)tb_ptr);
-            tb_ptr += sizeof(int32_t);
+            tci_args_l(&tb_ptr, &ptr);
             tci_assert(tb_ptr == old_code_ptr + op_size);
-            tb_ptr += (int32_t)t0;
+            tb_ptr = *(void **)ptr;
             continue;
+
         case INDEX_op_qemu_ld_i32:
             t0 = *tb_ptr++;
             taddr = tci_read_ulong(regs, &tb_ptr);
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index ff8040510f..2c64b4f617 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -405,16 +405,9 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
 
     case INDEX_op_goto_tb:
-        if (s->tb_jmp_insn_offset) {
-            /* Direct jump method. */
-            /* Align for atomic patching and thread safety */
-            s->code_ptr = QEMU_ALIGN_PTR_UP(s->code_ptr, 4);
-            s->tb_jmp_insn_offset[args[0]] = tcg_current_code_size(s);
-            tcg_out32(s, 0);
-        } else {
-            /* Indirect jump method. */
-            TODO();
-        }
+        tcg_debug_assert(s->tb_jmp_insn_offset == 0);
+        /* indirect jump method. */
+        tcg_out_i(s, (uintptr_t)(s->tb_jmp_target_addr + args[0]));
         set_jmp_reset_offset(s, args[0]);
         break;
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 54/93] tcg/tci: Split out tci_args_rrrrrr
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (52 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 53/93] tcg/tci: Reuse tci_args_l for goto_tb Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 55/93] tcg/tci: Split out tci_args_rrrr Richard Henderson
                   ` (41 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 0301ee63a7..84d77855ee 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -258,6 +258,17 @@ static void tci_args_rrrrrc(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
     *r4 = tci_read_r(tb_ptr);
     *c5 = tci_read_b(tb_ptr);
 }
+
+static void tci_args_rrrrrr(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
+                            TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGReg *r5)
+{
+    *r0 = tci_read_r(tb_ptr);
+    *r1 = tci_read_r(tb_ptr);
+    *r2 = tci_read_r(tb_ptr);
+    *r3 = tci_read_r(tb_ptr);
+    *r4 = tci_read_r(tb_ptr);
+    *r5 = tci_read_r(tb_ptr);
+}
 #endif
 
 static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
@@ -437,7 +448,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         uint32_t tmp32;
         uint64_t tmp64;
 #if TCG_TARGET_REG_BITS == 32
-        TCGReg r3, r4;
+        TCGReg r3, r4, r5;
         uint64_t T1, T2;
 #endif
         TCGMemOpIdx oi;
@@ -643,18 +654,16 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             break;
 #if TCG_TARGET_REG_BITS == 32
         case INDEX_op_add2_i32:
-            t0 = *tb_ptr++;
-            t1 = *tb_ptr++;
-            tmp64 = tci_read_r64(regs, &tb_ptr);
-            tmp64 += tci_read_r64(regs, &tb_ptr);
-            tci_write_reg64(regs, t1, t0, tmp64);
+            tci_args_rrrrrr(&tb_ptr, &r0, &r1, &r2, &r3, &r4, &r5);
+            T1 = tci_uint64(regs[r3], regs[r2]);
+            T2 = tci_uint64(regs[r5], regs[r4]);
+            tci_write_reg64(regs, r1, r0, T1 + T2);
             break;
         case INDEX_op_sub2_i32:
-            t0 = *tb_ptr++;
-            t1 = *tb_ptr++;
-            tmp64 = tci_read_r64(regs, &tb_ptr);
-            tmp64 -= tci_read_r64(regs, &tb_ptr);
-            tci_write_reg64(regs, t1, t0, tmp64);
+            tci_args_rrrrrr(&tb_ptr, &r0, &r1, &r2, &r3, &r4, &r5);
+            T1 = tci_uint64(regs[r3], regs[r2]);
+            T2 = tci_uint64(regs[r5], regs[r4]);
+            tci_write_reg64(regs, r1, r0, T1 - T2);
             break;
         case INDEX_op_brcond2_i32:
             tci_args_rrrrcl(&tb_ptr, &r0, &r1, &r2, &r3, &condition, &ptr);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 55/93] tcg/tci: Split out tci_args_rrrr
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (53 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 54/93] tcg/tci: Split out tci_args_rrrrrr Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 56/93] tcg/tci: Clean up deposit operations Richard Henderson
                   ` (40 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 84d77855ee..cb24295cd9 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -237,6 +237,15 @@ static void tci_args_rrrc(const uint8_t **tb_ptr,
 }
 
 #if TCG_TARGET_REG_BITS == 32
+static void tci_args_rrrr(const uint8_t **tb_ptr,
+                          TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGReg *r3)
+{
+    *r0 = tci_read_r(tb_ptr);
+    *r1 = tci_read_r(tb_ptr);
+    *r2 = tci_read_r(tb_ptr);
+    *r3 = tci_read_r(tb_ptr);
+}
+
 static void tci_args_rrrrcl(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
                             TCGReg *r2, TCGReg *r3, TCGCond *c4, void **l5)
 {
@@ -676,11 +685,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             }
             break;
         case INDEX_op_mulu2_i32:
-            t0 = *tb_ptr++;
-            t1 = *tb_ptr++;
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tmp64 = (uint32_t)tci_read_rval(regs, &tb_ptr);
-            tci_write_reg64(regs, t1, t0, (uint32_t)t2 * tmp64);
+            tci_args_rrrr(&tb_ptr, &r0, &r1, &r2, &r3);
+            tci_write_reg64(regs, r1, r0, (uint64_t)regs[r2] * regs[r3]);
             break;
 #endif /* TCG_TARGET_REG_BITS == 32 */
 #if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 56/93] tcg/tci: Clean up deposit operations
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (54 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 55/93] tcg/tci: Split out tci_args_rrrr Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 57/93] tcg/tci: Reduce qemu_ld/st TCGMemOpIdx operand to 32-bits Richard Henderson
                   ` (39 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Use the correct set of asserts during code generation.
We do not require the first input to overlap the output;
the existing interpreter already supported that.

Split out tci_args_rrrbb in the translator.
Use the deposit32/64 functions rather than inline expansion.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target-con-set.h |  1 -
 tcg/tci.c                    | 33 ++++++++++++++++-----------------
 tcg/tci/tcg-target.c.inc     | 24 ++++++++++++++----------
 3 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/tcg/tci/tcg-target-con-set.h b/tcg/tci/tcg-target-con-set.h
index f51b7bcb13..316730f32c 100644
--- a/tcg/tci/tcg-target-con-set.h
+++ b/tcg/tci/tcg-target-con-set.h
@@ -13,7 +13,6 @@ C_O0_I2(r, r)
 C_O0_I3(r, r, r)
 C_O0_I4(r, r, r, r)
 C_O1_I1(r, r)
-C_O1_I2(r, 0, r)
 C_O1_I2(r, r, r)
 C_O1_I4(r, r, r, r, r)
 C_O2_I1(r, r, r)
diff --git a/tcg/tci.c b/tcg/tci.c
index cb24295cd9..e10ccfc344 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -168,6 +168,7 @@ static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
  *   tci_args_<arguments>
  * where arguments is a sequence of
  *
+ *   b = immediate (bit position)
  *   i = immediate (uint32_t)
  *   I = immediate (tcg_target_ulong)
  *   r = register
@@ -236,6 +237,16 @@ static void tci_args_rrrc(const uint8_t **tb_ptr,
     *c3 = tci_read_b(tb_ptr);
 }
 
+static void tci_args_rrrbb(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
+                           TCGReg *r2, uint8_t *i3, uint8_t *i4)
+{
+    *r0 = tci_read_r(tb_ptr);
+    *r1 = tci_read_r(tb_ptr);
+    *r2 = tci_read_r(tb_ptr);
+    *i3 = tci_read_b(tb_ptr);
+    *i4 = tci_read_b(tb_ptr);
+}
+
 #if TCG_TARGET_REG_BITS == 32
 static void tci_args_rrrr(const uint8_t **tb_ptr,
                           TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGReg *r3)
@@ -449,11 +460,9 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         TCGReg r0, r1, r2;
         tcg_target_ulong t0;
         tcg_target_ulong t1;
-        tcg_target_ulong t2;
         TCGCond condition;
         target_ulong taddr;
-        uint8_t tmp8;
-        uint16_t tmp16;
+        uint8_t pos, len;
         uint32_t tmp32;
         uint64_t tmp64;
 #if TCG_TARGET_REG_BITS == 32
@@ -644,13 +653,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif
 #if TCG_TARGET_HAS_deposit_i32
         case INDEX_op_deposit_i32:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tmp16 = *tb_ptr++;
-            tmp8 = *tb_ptr++;
-            tmp32 = (((1 << tmp8) - 1) << tmp16);
-            tci_write_reg(regs, t0, (t1 & ~tmp32) | ((t2 << tmp16) & tmp32));
+            tci_args_rrrbb(&tb_ptr, &r0, &r1, &r2, &pos, &len);
+            regs[r0] = deposit32(regs[r1], pos, len, regs[r2]);
             break;
 #endif
         case INDEX_op_brcond_i32:
@@ -806,13 +810,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif
 #if TCG_TARGET_HAS_deposit_i64
         case INDEX_op_deposit_i64:
-            t0 = *tb_ptr++;
-            t1 = tci_read_rval(regs, &tb_ptr);
-            t2 = tci_read_rval(regs, &tb_ptr);
-            tmp16 = *tb_ptr++;
-            tmp8 = *tb_ptr++;
-            tmp64 = (((1ULL << tmp8) - 1) << tmp16);
-            tci_write_reg(regs, t0, (t1 & ~tmp64) | ((t2 << tmp16) & tmp64));
+            tci_args_rrrbb(&tb_ptr, &r0, &r1, &r2, &pos, &len);
+            regs[r0] = deposit64(regs[r1], pos, len, regs[r2]);
             break;
 #endif
         case INDEX_op_brcond_i64:
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 2c64b4f617..640407b4a8 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -126,11 +126,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_rotr_i64:
     case INDEX_op_setcond_i32:
     case INDEX_op_setcond_i64:
-        return C_O1_I2(r, r, r);
-
     case INDEX_op_deposit_i32:
     case INDEX_op_deposit_i64:
-        return C_O1_I2(r, 0, r);
+        return C_O1_I2(r, r, r);
 
     case INDEX_op_brcond_i32:
     case INDEX_op_brcond_i64:
@@ -480,13 +478,19 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
 
     CASE_32_64(deposit)  /* Optional (TCG_TARGET_HAS_deposit_*). */
-        tcg_out_r(s, args[0]);
-        tcg_out_r(s, args[1]);
-        tcg_out_r(s, args[2]);
-        tcg_debug_assert(args[3] <= UINT8_MAX);
-        tcg_out8(s, args[3]);
-        tcg_debug_assert(args[4] <= UINT8_MAX);
-        tcg_out8(s, args[4]);
+        {
+            TCGArg pos = args[3], len = args[4];
+            TCGArg max = opc == INDEX_op_deposit_i32 ? 32 : 64;
+
+            tcg_debug_assert(pos < max);
+            tcg_debug_assert(pos + len <= max);
+
+            tcg_out_r(s, args[0]);
+            tcg_out_r(s, args[1]);
+            tcg_out_r(s, args[2]);
+            tcg_out8(s, pos);
+            tcg_out8(s, len);
+        }
         break;
 
     CASE_32_64(brcond)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 57/93] tcg/tci: Reduce qemu_ld/st TCGMemOpIdx operand to 32-bits
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (55 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 56/93] tcg/tci: Clean up deposit operations Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 58/93] tcg/tci: Split out tci_args_{rrm,rrrm,rrrrm} Richard Henderson
                   ` (38 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

We are currently using the "natural" size routine, which
uses 64-bits on a 64-bit host.  The TCGMemOpIdx operand
has 11 bits, so we can safely reduce to 32-bits.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c                | 8 ++++----
 tcg/tci/tcg-target.c.inc | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index e10ccfc344..ddc138359b 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -855,7 +855,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         case INDEX_op_qemu_ld_i32:
             t0 = *tb_ptr++;
             taddr = tci_read_ulong(regs, &tb_ptr);
-            oi = tci_read_i(&tb_ptr);
+            oi = tci_read_i32(&tb_ptr);
             switch (get_memop(oi) & (MO_BSWAP | MO_SSIZE)) {
             case MO_UB:
                 tmp32 = qemu_ld_ub;
@@ -892,7 +892,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
                 t1 = *tb_ptr++;
             }
             taddr = tci_read_ulong(regs, &tb_ptr);
-            oi = tci_read_i(&tb_ptr);
+            oi = tci_read_i32(&tb_ptr);
             switch (get_memop(oi) & (MO_BSWAP | MO_SSIZE)) {
             case MO_UB:
                 tmp64 = qemu_ld_ub;
@@ -941,7 +941,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         case INDEX_op_qemu_st_i32:
             t0 = tci_read_rval(regs, &tb_ptr);
             taddr = tci_read_ulong(regs, &tb_ptr);
-            oi = tci_read_i(&tb_ptr);
+            oi = tci_read_i32(&tb_ptr);
             switch (get_memop(oi) & (MO_BSWAP | MO_SIZE)) {
             case MO_UB:
                 qemu_st_b(t0);
@@ -965,7 +965,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         case INDEX_op_qemu_st_i64:
             tmp64 = tci_read_r64(regs, &tb_ptr);
             taddr = tci_read_ulong(regs, &tb_ptr);
-            oi = tci_read_i(&tb_ptr);
+            oi = tci_read_i32(&tb_ptr);
             switch (get_memop(oi) & (MO_BSWAP | MO_SIZE)) {
             case MO_UB:
                 qemu_st_b(tmp64);
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 640407b4a8..6c187a25cc 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -550,7 +550,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
             tcg_out_r(s, *args++);
         }
-        tcg_out_i(s, *args++);
+        tcg_out32(s, *args++);
         break;
 
     case INDEX_op_qemu_ld_i64:
@@ -563,7 +563,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
             tcg_out_r(s, *args++);
         }
-        tcg_out_i(s, *args++);
+        tcg_out32(s, *args++);
         break;
 
     case INDEX_op_mb:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 58/93] tcg/tci: Split out tci_args_{rrm,rrrm,rrrrm}
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (56 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 57/93] tcg/tci: Reduce qemu_ld/st TCGMemOpIdx operand to 32-bits Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 59/93] tcg/tci: Hoist op_size checking into tci_args_* Richard Henderson
                   ` (37 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 147 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 81 insertions(+), 66 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index ddc138359b..a1846825ea 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -66,22 +66,18 @@ tci_write_reg(tcg_target_ulong *regs, TCGReg index, tcg_target_ulong value)
     regs[index] = value;
 }
 
-#if TCG_TARGET_REG_BITS == 32
 static void tci_write_reg64(tcg_target_ulong *regs, uint32_t high_index,
                             uint32_t low_index, uint64_t value)
 {
     tci_write_reg(regs, low_index, value);
     tci_write_reg(regs, high_index, value >> 32);
 }
-#endif
 
-#if TCG_TARGET_REG_BITS == 32
 /* Create a 64 bit value from two 32 bit values. */
 static uint64_t tci_uint64(uint32_t high, uint32_t low)
 {
     return ((uint64_t)high << 32) + low;
 }
-#endif
 
 /* Read constant byte from bytecode. */
 static uint8_t tci_read_b(const uint8_t **tb_ptr)
@@ -121,43 +117,6 @@ static int32_t tci_read_s32(const uint8_t **tb_ptr)
     return value;
 }
 
-/* Read indexed register (native size) from bytecode. */
-static tcg_target_ulong
-tci_read_rval(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
-{
-    tcg_target_ulong value = tci_read_reg(regs, **tb_ptr);
-    *tb_ptr += 1;
-    return value;
-}
-
-#if TCG_TARGET_REG_BITS == 32
-/* Read two indexed registers (2 * 32 bit) from bytecode. */
-static uint64_t tci_read_r64(const tcg_target_ulong *regs,
-                             const uint8_t **tb_ptr)
-{
-    uint32_t low = tci_read_rval(regs, tb_ptr);
-    return tci_uint64(tci_read_rval(regs, tb_ptr), low);
-}
-#elif TCG_TARGET_REG_BITS == 64
-/* Read indexed register (64 bit) from bytecode. */
-static uint64_t tci_read_r64(const tcg_target_ulong *regs,
-                             const uint8_t **tb_ptr)
-{
-    return tci_read_rval(regs, tb_ptr);
-}
-#endif
-
-/* Read indexed register(s) with target address from bytecode. */
-static target_ulong
-tci_read_ulong(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
-{
-    target_ulong taddr = tci_read_rval(regs, tb_ptr);
-#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
-    taddr += (uint64_t)tci_read_rval(regs, tb_ptr) << 32;
-#endif
-    return taddr;
-}
-
 static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
 {
     return tci_read_i(tb_ptr);
@@ -171,6 +130,7 @@ static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
  *   b = immediate (bit position)
  *   i = immediate (uint32_t)
  *   I = immediate (tcg_target_ulong)
+ *   m = immediate (TCGMemOpIdx)
  *   r = register
  *   s = signed ldst offset
  */
@@ -203,6 +163,14 @@ static void tci_args_rI(const uint8_t **tb_ptr,
 }
 #endif
 
+static void tci_args_rrm(const uint8_t **tb_ptr,
+                         TCGReg *r0, TCGReg *r1, TCGMemOpIdx *m2)
+{
+    *r0 = tci_read_r(tb_ptr);
+    *r1 = tci_read_r(tb_ptr);
+    *m2 = tci_read_i32(tb_ptr);
+}
+
 static void tci_args_rrr(const uint8_t **tb_ptr,
                          TCGReg *r0, TCGReg *r1, TCGReg *r2)
 {
@@ -237,6 +205,15 @@ static void tci_args_rrrc(const uint8_t **tb_ptr,
     *c3 = tci_read_b(tb_ptr);
 }
 
+static void tci_args_rrrm(const uint8_t **tb_ptr,
+                          TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGMemOpIdx *m3)
+{
+    *r0 = tci_read_r(tb_ptr);
+    *r1 = tci_read_r(tb_ptr);
+    *r2 = tci_read_r(tb_ptr);
+    *m3 = tci_read_i32(tb_ptr);
+}
+
 static void tci_args_rrrbb(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
                            TCGReg *r2, uint8_t *i3, uint8_t *i4)
 {
@@ -247,6 +224,16 @@ static void tci_args_rrrbb(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
     *i4 = tci_read_b(tb_ptr);
 }
 
+static void tci_args_rrrrm(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
+                           TCGReg *r2, TCGReg *r3, TCGMemOpIdx *m4)
+{
+    *r0 = tci_read_r(tb_ptr);
+    *r1 = tci_read_r(tb_ptr);
+    *r2 = tci_read_r(tb_ptr);
+    *r3 = tci_read_r(tb_ptr);
+    *m4 = tci_read_i32(tb_ptr);
+}
+
 #if TCG_TARGET_REG_BITS == 32
 static void tci_args_rrrr(const uint8_t **tb_ptr,
                           TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGReg *r3)
@@ -457,8 +444,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         uint8_t op_size = tb_ptr[1];
         const uint8_t *old_code_ptr = tb_ptr;
 #endif
-        TCGReg r0, r1, r2;
-        tcg_target_ulong t0;
+        TCGReg r0, r1, r2, r3;
         tcg_target_ulong t1;
         TCGCond condition;
         target_ulong taddr;
@@ -466,7 +452,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         uint32_t tmp32;
         uint64_t tmp64;
 #if TCG_TARGET_REG_BITS == 32
-        TCGReg r3, r4, r5;
+        TCGReg r4, r5;
         uint64_t T1, T2;
 #endif
         TCGMemOpIdx oi;
@@ -853,9 +839,13 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             continue;
 
         case INDEX_op_qemu_ld_i32:
-            t0 = *tb_ptr++;
-            taddr = tci_read_ulong(regs, &tb_ptr);
-            oi = tci_read_i32(&tb_ptr);
+            if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
+                tci_args_rrm(&tb_ptr, &r0, &r1, &oi);
+                taddr = regs[r1];
+            } else {
+                tci_args_rrrm(&tb_ptr, &r0, &r1, &r2, &oi);
+                taddr = tci_uint64(regs[r2], regs[r1]);
+            }
             switch (get_memop(oi) & (MO_BSWAP | MO_SSIZE)) {
             case MO_UB:
                 tmp32 = qemu_ld_ub;
@@ -884,15 +874,20 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             default:
                 g_assert_not_reached();
             }
-            tci_write_reg(regs, t0, tmp32);
+            regs[r0] = tmp32;
             break;
+
         case INDEX_op_qemu_ld_i64:
-            t0 = *tb_ptr++;
-            if (TCG_TARGET_REG_BITS == 32) {
-                t1 = *tb_ptr++;
+            if (TCG_TARGET_REG_BITS == 64) {
+                tci_args_rrm(&tb_ptr, &r0, &r1, &oi);
+                taddr = regs[r1];
+            } else if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
+                tci_args_rrrm(&tb_ptr, &r0, &r1, &r2, &oi);
+                taddr = regs[r2];
+            } else {
+                tci_args_rrrrm(&tb_ptr, &r0, &r1, &r2, &r3, &oi);
+                taddr = tci_uint64(regs[r3], regs[r2]);
             }
-            taddr = tci_read_ulong(regs, &tb_ptr);
-            oi = tci_read_i32(&tb_ptr);
             switch (get_memop(oi) & (MO_BSWAP | MO_SSIZE)) {
             case MO_UB:
                 tmp64 = qemu_ld_ub;
@@ -933,39 +928,58 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             default:
                 g_assert_not_reached();
             }
-            tci_write_reg(regs, t0, tmp64);
             if (TCG_TARGET_REG_BITS == 32) {
-                tci_write_reg(regs, t1, tmp64 >> 32);
+                tci_write_reg64(regs, r1, r0, tmp64);
+            } else {
+                regs[r0] = tmp64;
             }
             break;
+
         case INDEX_op_qemu_st_i32:
-            t0 = tci_read_rval(regs, &tb_ptr);
-            taddr = tci_read_ulong(regs, &tb_ptr);
-            oi = tci_read_i32(&tb_ptr);
+            if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
+                tci_args_rrm(&tb_ptr, &r0, &r1, &oi);
+                taddr = regs[r1];
+            } else {
+                tci_args_rrrm(&tb_ptr, &r0, &r1, &r2, &oi);
+                taddr = tci_uint64(regs[r2], regs[r1]);
+            }
+            tmp32 = regs[r0];
             switch (get_memop(oi) & (MO_BSWAP | MO_SIZE)) {
             case MO_UB:
-                qemu_st_b(t0);
+                qemu_st_b(tmp32);
                 break;
             case MO_LEUW:
-                qemu_st_lew(t0);
+                qemu_st_lew(tmp32);
                 break;
             case MO_LEUL:
-                qemu_st_lel(t0);
+                qemu_st_lel(tmp32);
                 break;
             case MO_BEUW:
-                qemu_st_bew(t0);
+                qemu_st_bew(tmp32);
                 break;
             case MO_BEUL:
-                qemu_st_bel(t0);
+                qemu_st_bel(tmp32);
                 break;
             default:
                 g_assert_not_reached();
             }
             break;
+
         case INDEX_op_qemu_st_i64:
-            tmp64 = tci_read_r64(regs, &tb_ptr);
-            taddr = tci_read_ulong(regs, &tb_ptr);
-            oi = tci_read_i32(&tb_ptr);
+            if (TCG_TARGET_REG_BITS == 64) {
+                tci_args_rrm(&tb_ptr, &r0, &r1, &oi);
+                taddr = regs[r1];
+                tmp64 = regs[r0];
+            } else {
+                if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
+                    tci_args_rrrm(&tb_ptr, &r0, &r1, &r2, &oi);
+                    taddr = regs[r2];
+                } else {
+                    tci_args_rrrrm(&tb_ptr, &r0, &r1, &r2, &r3, &oi);
+                    taddr = tci_uint64(regs[r3], regs[r2]);
+                }
+                tmp64 = tci_uint64(regs[r1], regs[r0]);
+            }
             switch (get_memop(oi) & (MO_BSWAP | MO_SIZE)) {
             case MO_UB:
                 qemu_st_b(tmp64);
@@ -992,6 +1006,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
                 g_assert_not_reached();
             }
             break;
+
         case INDEX_op_mb:
             /* Ensure ordering for all kinds */
             smp_mb();
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 59/93] tcg/tci: Hoist op_size checking into tci_args_*
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (57 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 58/93] tcg/tci: Split out tci_args_{rrm,rrrm,rrrrm} Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 60/93] tcg/tci: Remove tci_disas Richard Henderson
                   ` (36 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

This performs the size check while reading the arguments,
which means that we don't have to arrange for it to be
done after the operation.  Which tidies all of the branches.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 73 insertions(+), 14 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index a1846825ea..3dc89ed829 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -24,7 +24,7 @@
 #if defined(CONFIG_DEBUG_TCG)
 # define tci_assert(cond) assert(cond)
 #else
-# define tci_assert(cond) ((void)0)
+# define tci_assert(cond) ((void)(cond))
 #endif
 
 #include "qemu-common.h"
@@ -135,146 +135,217 @@ static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
  *   s = signed ldst offset
  */
 
+static void check_size(const uint8_t *start, const uint8_t **tb_ptr)
+{
+    const uint8_t *old_code_ptr = start - 2;
+    uint8_t op_size = old_code_ptr[1];
+    tci_assert(*tb_ptr == old_code_ptr + op_size);
+}
+
 static void tci_args_l(const uint8_t **tb_ptr, void **l0)
 {
+    const uint8_t *start = *tb_ptr;
+
     *l0 = (void *)tci_read_label(tb_ptr);
+
+    check_size(start, tb_ptr);
 }
 
 static void tci_args_rr(const uint8_t **tb_ptr,
                         TCGReg *r0, TCGReg *r1)
 {
+    const uint8_t *start = *tb_ptr;
+
     *r0 = tci_read_r(tb_ptr);
     *r1 = tci_read_r(tb_ptr);
+
+    check_size(start, tb_ptr);
 }
 
 static void tci_args_ri(const uint8_t **tb_ptr,
                         TCGReg *r0, tcg_target_ulong *i1)
 {
+    const uint8_t *start = *tb_ptr;
+
     *r0 = tci_read_r(tb_ptr);
     *i1 = tci_read_i32(tb_ptr);
+
+    check_size(start, tb_ptr);
 }
 
 #if TCG_TARGET_REG_BITS == 64
 static void tci_args_rI(const uint8_t **tb_ptr,
                         TCGReg *r0, tcg_target_ulong *i1)
 {
+    const uint8_t *start = *tb_ptr;
+
     *r0 = tci_read_r(tb_ptr);
     *i1 = tci_read_i(tb_ptr);
+
+    check_size(start, tb_ptr);
 }
 #endif
 
 static void tci_args_rrm(const uint8_t **tb_ptr,
                          TCGReg *r0, TCGReg *r1, TCGMemOpIdx *m2)
 {
+    const uint8_t *start = *tb_ptr;
+
     *r0 = tci_read_r(tb_ptr);
     *r1 = tci_read_r(tb_ptr);
     *m2 = tci_read_i32(tb_ptr);
+
+    check_size(start, tb_ptr);
 }
 
 static void tci_args_rrr(const uint8_t **tb_ptr,
                          TCGReg *r0, TCGReg *r1, TCGReg *r2)
 {
+    const uint8_t *start = *tb_ptr;
+
     *r0 = tci_read_r(tb_ptr);
     *r1 = tci_read_r(tb_ptr);
     *r2 = tci_read_r(tb_ptr);
+
+    check_size(start, tb_ptr);
 }
 
 static void tci_args_rrs(const uint8_t **tb_ptr,
                          TCGReg *r0, TCGReg *r1, int32_t *i2)
 {
+    const uint8_t *start = *tb_ptr;
+
     *r0 = tci_read_r(tb_ptr);
     *r1 = tci_read_r(tb_ptr);
     *i2 = tci_read_s32(tb_ptr);
+
+    check_size(start, tb_ptr);
 }
 
 static void tci_args_rrcl(const uint8_t **tb_ptr,
                           TCGReg *r0, TCGReg *r1, TCGCond *c2, void **l3)
 {
+    const uint8_t *start = *tb_ptr;
+
     *r0 = tci_read_r(tb_ptr);
     *r1 = tci_read_r(tb_ptr);
     *c2 = tci_read_b(tb_ptr);
     *l3 = (void *)tci_read_label(tb_ptr);
+
+    check_size(start, tb_ptr);
 }
 
 static void tci_args_rrrc(const uint8_t **tb_ptr,
                           TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGCond *c3)
 {
+    const uint8_t *start = *tb_ptr;
+
     *r0 = tci_read_r(tb_ptr);
     *r1 = tci_read_r(tb_ptr);
     *r2 = tci_read_r(tb_ptr);
     *c3 = tci_read_b(tb_ptr);
+
+    check_size(start, tb_ptr);
 }
 
 static void tci_args_rrrm(const uint8_t **tb_ptr,
                           TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGMemOpIdx *m3)
 {
+    const uint8_t *start = *tb_ptr;
+
     *r0 = tci_read_r(tb_ptr);
     *r1 = tci_read_r(tb_ptr);
     *r2 = tci_read_r(tb_ptr);
     *m3 = tci_read_i32(tb_ptr);
+
+    check_size(start, tb_ptr);
 }
 
 static void tci_args_rrrbb(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
                            TCGReg *r2, uint8_t *i3, uint8_t *i4)
 {
+    const uint8_t *start = *tb_ptr;
+
     *r0 = tci_read_r(tb_ptr);
     *r1 = tci_read_r(tb_ptr);
     *r2 = tci_read_r(tb_ptr);
     *i3 = tci_read_b(tb_ptr);
     *i4 = tci_read_b(tb_ptr);
+
+    check_size(start, tb_ptr);
 }
 
 static void tci_args_rrrrm(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
                            TCGReg *r2, TCGReg *r3, TCGMemOpIdx *m4)
 {
+    const uint8_t *start = *tb_ptr;
+
     *r0 = tci_read_r(tb_ptr);
     *r1 = tci_read_r(tb_ptr);
     *r2 = tci_read_r(tb_ptr);
     *r3 = tci_read_r(tb_ptr);
     *m4 = tci_read_i32(tb_ptr);
+
+    check_size(start, tb_ptr);
 }
 
 #if TCG_TARGET_REG_BITS == 32
 static void tci_args_rrrr(const uint8_t **tb_ptr,
                           TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGReg *r3)
 {
+    const uint8_t *start = *tb_ptr;
+
     *r0 = tci_read_r(tb_ptr);
     *r1 = tci_read_r(tb_ptr);
     *r2 = tci_read_r(tb_ptr);
     *r3 = tci_read_r(tb_ptr);
+
+    check_size(start, tb_ptr);
 }
 
 static void tci_args_rrrrcl(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
                             TCGReg *r2, TCGReg *r3, TCGCond *c4, void **l5)
 {
+    const uint8_t *start = *tb_ptr;
+
     *r0 = tci_read_r(tb_ptr);
     *r1 = tci_read_r(tb_ptr);
     *r2 = tci_read_r(tb_ptr);
     *r3 = tci_read_r(tb_ptr);
     *c4 = tci_read_b(tb_ptr);
     *l5 = (void *)tci_read_label(tb_ptr);
+
+    check_size(start, tb_ptr);
 }
 
 static void tci_args_rrrrrc(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
                             TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGCond *c5)
 {
+    const uint8_t *start = *tb_ptr;
+
     *r0 = tci_read_r(tb_ptr);
     *r1 = tci_read_r(tb_ptr);
     *r2 = tci_read_r(tb_ptr);
     *r3 = tci_read_r(tb_ptr);
     *r4 = tci_read_r(tb_ptr);
     *c5 = tci_read_b(tb_ptr);
+
+    check_size(start, tb_ptr);
 }
 
 static void tci_args_rrrrrr(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
                             TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGReg *r5)
 {
+    const uint8_t *start = *tb_ptr;
+
     *r0 = tci_read_r(tb_ptr);
     *r1 = tci_read_r(tb_ptr);
     *r2 = tci_read_r(tb_ptr);
     *r3 = tci_read_r(tb_ptr);
     *r4 = tci_read_r(tb_ptr);
     *r5 = tci_read_r(tb_ptr);
+
+    check_size(start, tb_ptr);
 }
 #endif
 
@@ -440,10 +511,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
     for (;;) {
         TCGOpcode opc = tb_ptr[0];
-#if defined(CONFIG_DEBUG_TCG) && !defined(NDEBUG)
-        uint8_t op_size = tb_ptr[1];
-        const uint8_t *old_code_ptr = tb_ptr;
-#endif
         TCGReg r0, r1, r2, r3;
         tcg_target_ulong t1;
         TCGCond condition;
@@ -493,7 +560,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             break;
         case INDEX_op_br:
             tci_args_l(&tb_ptr, &ptr);
-            tci_assert(tb_ptr == old_code_ptr + op_size);
             tb_ptr = ptr;
             continue;
         case INDEX_op_setcond_i32:
@@ -646,9 +712,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         case INDEX_op_brcond_i32:
             tci_args_rrcl(&tb_ptr, &r0, &r1, &condition, &ptr);
             if (tci_compare32(regs[r0], regs[r1], condition)) {
-                tci_assert(tb_ptr == old_code_ptr + op_size);
                 tb_ptr = ptr;
-                continue;
             }
             break;
 #if TCG_TARGET_REG_BITS == 32
@@ -669,7 +733,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             T1 = tci_uint64(regs[r1], regs[r0]);
             T2 = tci_uint64(regs[r3], regs[r2]);
             if (tci_compare64(T1, T2, condition)) {
-                tci_assert(tb_ptr == old_code_ptr + op_size);
                 tb_ptr = ptr;
                 continue;
             }
@@ -803,9 +866,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         case INDEX_op_brcond_i64:
             tci_args_rrcl(&tb_ptr, &r0, &r1, &condition, &ptr);
             if (tci_compare64(regs[r0], regs[r1], condition)) {
-                tci_assert(tb_ptr == old_code_ptr + op_size);
                 tb_ptr = ptr;
-                continue;
             }
             break;
         case INDEX_op_ext32s_i64:
@@ -834,9 +895,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         case INDEX_op_goto_tb:
             tci_args_l(&tb_ptr, &ptr);
-            tci_assert(tb_ptr == old_code_ptr + op_size);
             tb_ptr = *(void **)ptr;
-            continue;
+            break;
 
         case INDEX_op_qemu_ld_i32:
             if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
@@ -1014,6 +1074,5 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         default:
             g_assert_not_reached();
         }
-        tci_assert(tb_ptr == old_code_ptr + op_size);
     }
 }
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 60/93] tcg/tci: Remove tci_disas
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (58 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 59/93] tcg/tci: Hoist op_size checking into tci_args_* Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 61/93] tcg/tci: Implement the disassembler properly Richard Henderson
                   ` (35 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

This function is unused.  It's not even the disassembler,
which is print_insn_tci, located in disas/tci.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.h     |  2 --
 tcg/tci/tcg-target.c.inc | 10 ----------
 2 files changed, 12 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 9285c930a2..52af6d8bc5 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -163,8 +163,6 @@ typedef enum {
 #define TCG_TARGET_CALL_STACK_OFFSET    0
 #define TCG_TARGET_STACK_ALIGN          16
 
-void tci_disas(uint8_t opc);
-
 #define HAVE_TCG_QEMU_TB_EXEC
 
 /* We could notice __i386__ or __s390x__ and reduce the barriers depending
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 6c187a25cc..7fb3b04eaf 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -253,16 +253,6 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
     return true;
 }
 
-#if defined(CONFIG_DEBUG_TCG_INTERPRETER)
-/* Show current bytecode. Used by tcg interpreter. */
-void tci_disas(uint8_t opc)
-{
-    const TCGOpDef *def = &tcg_op_defs[opc];
-    fprintf(stderr, "TCG %s %u, %u, %u\n",
-            def->name, def->nb_oargs, def->nb_iargs, def->nb_cargs);
-}
-#endif
-
 /* Write value (native size). */
 static void tcg_out_i(TCGContext *s, tcg_target_ulong v)
 {
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 61/93] tcg/tci: Implement the disassembler properly
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (59 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 60/93] tcg/tci: Remove tci_disas Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 62/93] tcg: Build ffi data structures for helpers Richard Henderson
                   ` (34 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Actually print arguments as opposed to simply the opcodes
and, uselessly, the argument counts.  Reuse all of the helpers
developed as part of the interpreter.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 meson.build           |   2 +-
 include/tcg/tcg-opc.h |   2 -
 disas/tci.c           |  61 ---------
 tcg/tci.c             | 283 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 284 insertions(+), 64 deletions(-)
 delete mode 100644 disas/tci.c

diff --git a/meson.build b/meson.build
index 2d8b433ff0..475d8a94ea 100644
--- a/meson.build
+++ b/meson.build
@@ -1901,7 +1901,7 @@ specific_ss.add(when: 'CONFIG_TCG', if_true: files(
   'tcg/tcg-op.c',
   'tcg/tcg.c',
 ))
-specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: files('disas/tci.c', 'tcg/tci.c'))
+specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: files('tcg/tci.c'))
 
 subdir('backends')
 subdir('disas')
diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index 900984c005..bbb0884af8 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -278,10 +278,8 @@ DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT)
 #ifdef TCG_TARGET_INTERPRETER
 /* These opcodes are only for use between the tci generator and interpreter. */
 DEF(tci_movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
-#if TCG_TARGET_REG_BITS == 64
 DEF(tci_movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
 #endif
-#endif
 
 #undef TLADDR_ARGS
 #undef DATA64_ARGS
diff --git a/disas/tci.c b/disas/tci.c
deleted file mode 100644
index f1d6c6b469..0000000000
--- a/disas/tci.c
+++ /dev/null
@@ -1,61 +0,0 @@
-/*
- * Tiny Code Interpreter for QEMU - disassembler
- *
- * Copyright (c) 2011 Stefan Weil
- *
- * This program is free software: you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation, either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see <http://www.gnu.org/licenses/>.
- */
-
-#include "qemu/osdep.h"
-#include "qemu-common.h"
-#include "disas/dis-asm.h"
-#include "tcg/tcg.h"
-
-/* Disassemble TCI bytecode. */
-int print_insn_tci(bfd_vma addr, disassemble_info *info)
-{
-    int length;
-    uint8_t byte;
-    int status;
-    TCGOpcode op;
-
-    status = info->read_memory_func(addr, &byte, 1, info);
-    if (status != 0) {
-        info->memory_error_func(status, addr, info);
-        return -1;
-    }
-    op = byte;
-
-    addr++;
-    status = info->read_memory_func(addr, &byte, 1, info);
-    if (status != 0) {
-        info->memory_error_func(status, addr, info);
-        return -1;
-    }
-    length = byte;
-
-    if (op >= tcg_op_defs_max) {
-        info->fprintf_func(info->stream, "illegal opcode %d", op);
-    } else {
-        const TCGOpDef *def = &tcg_op_defs[op];
-        int nb_oargs = def->nb_oargs;
-        int nb_iargs = def->nb_iargs;
-        int nb_cargs = def->nb_cargs;
-        /* TODO: Improve disassembler output. */
-        info->fprintf_func(info->stream, "%s\to=%d i=%d c=%d",
-                           def->name, nb_oargs, nb_iargs, nb_cargs);
-    }
-
-    return length;
-}
diff --git a/tcg/tci.c b/tcg/tci.c
index 3dc89ed829..6843e837ae 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -1076,3 +1076,286 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         }
     }
 }
+
+/*
+ * Disassembler that matches the interpreter
+ */
+
+static const char *str_r(TCGReg r)
+{
+    static const char regs[TCG_TARGET_NB_REGS][4] = {
+        "r0", "r1", "r2",  "r3",  "r4",  "r5",  "r6",  "r7",
+        "r8", "r9", "r10", "r11", "r12", "r13", "env", "sp"
+    };
+
+    QEMU_BUILD_BUG_ON(TCG_AREG0 != TCG_REG_R14);
+    QEMU_BUILD_BUG_ON(TCG_REG_CALL_STACK != TCG_REG_R15);
+
+    assert((unsigned)r < TCG_TARGET_NB_REGS);
+    return regs[r];
+}
+
+static const char *str_c(TCGCond c)
+{
+    static const char cond[16][8] = {
+        [TCG_COND_NEVER] = "never",
+        [TCG_COND_ALWAYS] = "always",
+        [TCG_COND_EQ] = "eq",
+        [TCG_COND_NE] = "ne",
+        [TCG_COND_LT] = "lt",
+        [TCG_COND_GE] = "ge",
+        [TCG_COND_LE] = "le",
+        [TCG_COND_GT] = "gt",
+        [TCG_COND_LTU] = "ltu",
+        [TCG_COND_GEU] = "geu",
+        [TCG_COND_LEU] = "leu",
+        [TCG_COND_GTU] = "gtu",
+    };
+
+    assert((unsigned)c < ARRAY_SIZE(cond));
+    assert(cond[c][0] != 0);
+    return cond[c];
+}
+
+/* Disassemble TCI bytecode. */
+int print_insn_tci(bfd_vma addr, disassemble_info *info)
+{
+    uint8_t buf[256];
+    int length, status;
+    const TCGOpDef *def;
+    const char *op_name;
+    TCGOpcode op;
+    TCGReg r0, r1, r2, r3;
+#if TCG_TARGET_REG_BITS == 32
+    TCGReg r4, r5;
+#endif
+    tcg_target_ulong i1;
+    int32_t s2;
+    TCGCond c;
+    TCGMemOpIdx oi;
+    uint8_t pos, len;
+    void *ptr;
+    const uint8_t *tb_ptr;
+
+    status = info->read_memory_func(addr, buf, 2, info);
+    if (status != 0) {
+        info->memory_error_func(status, addr, info);
+        return -1;
+    }
+    op = buf[0];
+    length = buf[1];
+
+    if (length < 2) {
+        info->fprintf_func(info->stream, "invalid length %d", length);
+        return 1;
+    }
+
+    status = info->read_memory_func(addr + 2, buf + 2, length - 2, info);
+    if (status != 0) {
+        info->memory_error_func(status, addr + 2, info);
+        return -1;
+    }
+
+    def = &tcg_op_defs[op];
+    op_name = def->name;
+    tb_ptr = buf + 2;
+
+    switch (op) {
+    case INDEX_op_br:
+    case INDEX_op_call:
+    case INDEX_op_exit_tb:
+    case INDEX_op_goto_tb:
+        tci_args_l(&tb_ptr, &ptr);
+        info->fprintf_func(info->stream, "%-12s  %p", op_name, ptr);
+        break;
+
+    case INDEX_op_brcond_i32:
+    case INDEX_op_brcond_i64:
+        tci_args_rrcl(&tb_ptr, &r0, &r1, &c, &ptr);
+        info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%p",
+                           op_name, str_r(r0), str_r(r1), str_c(c), ptr);
+        break;
+
+    case INDEX_op_setcond_i32:
+    case INDEX_op_setcond_i64:
+        tci_args_rrrc(&tb_ptr, &r0, &r1, &r2, &c);
+        info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s",
+                           op_name, str_r(r0), str_r(r1), str_r(r2), str_c(c));
+        break;
+
+    case INDEX_op_tci_movi_i32:
+        tci_args_ri(&tb_ptr, &r0, &i1);
+        info->fprintf_func(info->stream, "%-12s  %s,0x%" TCG_PRIlx "",
+                           op_name, str_r(r0), i1);
+        break;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_tci_movi_i64:
+        tci_args_rI(&tb_ptr, &r0, &i1);
+        info->fprintf_func(info->stream, "%-12s  %s,0x%" TCG_PRIlx "",
+                           op_name, str_r(r0), i1);
+        break;
+#endif
+
+    case INDEX_op_ld8u_i32:
+    case INDEX_op_ld8u_i64:
+    case INDEX_op_ld8s_i32:
+    case INDEX_op_ld8s_i64:
+    case INDEX_op_ld16u_i32:
+    case INDEX_op_ld16u_i64:
+    case INDEX_op_ld16s_i32:
+    case INDEX_op_ld16s_i64:
+    case INDEX_op_ld32u_i64:
+    case INDEX_op_ld32s_i64:
+    case INDEX_op_ld_i32:
+    case INDEX_op_ld_i64:
+    case INDEX_op_st8_i32:
+    case INDEX_op_st8_i64:
+    case INDEX_op_st16_i32:
+    case INDEX_op_st16_i64:
+    case INDEX_op_st32_i64:
+    case INDEX_op_st_i32:
+    case INDEX_op_st_i64:
+        tci_args_rrs(&tb_ptr, &r0, &r1, &s2);
+        info->fprintf_func(info->stream, "%-12s  %s,%s,%d",
+                           op_name, str_r(r0), str_r(r1), s2);
+        break;
+
+    case INDEX_op_mov_i32:
+    case INDEX_op_mov_i64:
+    case INDEX_op_ext8s_i32:
+    case INDEX_op_ext8s_i64:
+    case INDEX_op_ext8u_i32:
+    case INDEX_op_ext8u_i64:
+    case INDEX_op_ext16s_i32:
+    case INDEX_op_ext16s_i64:
+    case INDEX_op_ext16u_i32:
+    case INDEX_op_ext32s_i64:
+    case INDEX_op_ext32u_i64:
+    case INDEX_op_ext_i32_i64:
+    case INDEX_op_extu_i32_i64:
+    case INDEX_op_bswap16_i32:
+    case INDEX_op_bswap16_i64:
+    case INDEX_op_bswap32_i32:
+    case INDEX_op_bswap32_i64:
+    case INDEX_op_bswap64_i64:
+    case INDEX_op_not_i32:
+    case INDEX_op_not_i64:
+    case INDEX_op_neg_i32:
+    case INDEX_op_neg_i64:
+        tci_args_rr(&tb_ptr, &r0, &r1);
+        info->fprintf_func(info->stream, "%-12s  %s,%s",
+                           op_name, str_r(r0), str_r(r1));
+        break;
+
+    case INDEX_op_add_i32:
+    case INDEX_op_add_i64:
+    case INDEX_op_sub_i32:
+    case INDEX_op_sub_i64:
+    case INDEX_op_mul_i32:
+    case INDEX_op_mul_i64:
+    case INDEX_op_and_i32:
+    case INDEX_op_and_i64:
+    case INDEX_op_or_i32:
+    case INDEX_op_or_i64:
+    case INDEX_op_xor_i32:
+    case INDEX_op_xor_i64:
+    case INDEX_op_div_i32:
+    case INDEX_op_div_i64:
+    case INDEX_op_rem_i32:
+    case INDEX_op_rem_i64:
+    case INDEX_op_divu_i32:
+    case INDEX_op_divu_i64:
+    case INDEX_op_remu_i32:
+    case INDEX_op_remu_i64:
+    case INDEX_op_shl_i32:
+    case INDEX_op_shl_i64:
+    case INDEX_op_shr_i32:
+    case INDEX_op_shr_i64:
+    case INDEX_op_sar_i32:
+    case INDEX_op_sar_i64:
+    case INDEX_op_rotl_i32:
+    case INDEX_op_rotl_i64:
+    case INDEX_op_rotr_i32:
+    case INDEX_op_rotr_i64:
+        tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+        info->fprintf_func(info->stream, "%-12s  %s,%s,%s",
+                           op_name, str_r(r0), str_r(r1), str_r(r2));
+        break;
+
+    case INDEX_op_deposit_i32:
+    case INDEX_op_deposit_i64:
+        tci_args_rrrbb(&tb_ptr, &r0, &r1, &r2, &pos, &len);
+        info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%d,%d",
+                           op_name, str_r(r0), str_r(r1), str_r(r2), pos, len);
+        break;
+
+#if TCG_TARGET_REG_BITS == 32
+    case INDEX_op_setcond2_i32:
+        tci_args_rrrrrc(&tb_ptr, &r0, &r1, &r2, &r3, &r4, &c);
+        info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s,%s,%s",
+                           op_name, str_r(r0), str_r(r1), str_r(r2),
+                           str_r(r3), str_r(r4), str_c(c));
+        break;
+
+    case INDEX_op_brcond2_i32:
+        tci_args_rrrrcl(&tb_ptr, &r0, &r1, &r2, &r3, &c, &ptr);
+        info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s,%s,%p",
+                           op_name, str_r(r0), str_r(r1),
+                           str_r(r2), str_r(r3), str_c(c), ptr);
+        break;
+
+    case INDEX_op_mulu2_i32:
+        tci_args_rrrr(&tb_ptr, &r0, &r1, &r2, &r3);
+        info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s",
+                           op_name, str_r(r0), str_r(r1),
+                           str_r(r2), str_r(r3));
+        break;
+
+    case INDEX_op_add2_i32:
+    case INDEX_op_sub2_i32:
+        tci_args_rrrrrr(&tb_ptr, &r0, &r1, &r2, &r3, &r4, &r5);
+        info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s,%s,%s",
+                           op_name, str_r(r0), str_r(r1), str_r(r2),
+                           str_r(r3), str_r(r4), str_r(r5));
+        break;
+#endif
+
+    case INDEX_op_qemu_ld_i64:
+    case INDEX_op_qemu_st_i64:
+        len = DIV_ROUND_UP(64, TCG_TARGET_REG_BITS);
+        goto do_qemu_ldst;
+    case INDEX_op_qemu_ld_i32:
+    case INDEX_op_qemu_st_i32:
+        len = 1;
+    do_qemu_ldst:
+        len += DIV_ROUND_UP(TARGET_LONG_BITS, TCG_TARGET_REG_BITS);
+        switch (len) {
+        case 2:
+            tci_args_rrm(&tb_ptr, &r0, &r1, &oi);
+            info->fprintf_func(info->stream, "%-12s  %s,%s,%x",
+                               op_name, str_r(r0), str_r(r1), oi);
+            break;
+        case 3:
+            tci_args_rrrm(&tb_ptr, &r0, &r1, &r2, &oi);
+            info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%x",
+                               op_name, str_r(r0), str_r(r1), str_r(r2), oi);
+            break;
+        case 4:
+            tci_args_rrrrm(&tb_ptr, &r0, &r1, &r2, &r3, &oi);
+            info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s,%x",
+                               op_name, str_r(r0), str_r(r1),
+                               str_r(r2), str_r(r3), oi);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        break;
+
+    default:
+        info->fprintf_func(info->stream, "illegal opcode %d", op);
+        break;
+    }
+
+    return length;
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 62/93] tcg: Build ffi data structures for helpers
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (60 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 61/93] tcg/tci: Implement the disassembler properly Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 63/93] tcg/tci: Use ffi for calls Richard Henderson
                   ` (33 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

We will shortly use libffi for tci, as that is the only
portable way of calling arbitrary functions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 meson.build                            |   9 +-
 include/exec/helper-ffi.h              | 115 +++++++++++++++++++++++++
 include/exec/helper-tcg.h              |  24 ++++--
 target/hppa/helper.h                   |   2 +
 target/i386/ops_sse_header.h           |   6 ++
 target/m68k/helper.h                   |   1 +
 target/ppc/helper.h                    |   3 +
 tcg/tcg.c                              |  20 +++++
 tests/docker/dockerfiles/fedora.docker |   1 +
 9 files changed, 172 insertions(+), 9 deletions(-)
 create mode 100644 include/exec/helper-ffi.h

diff --git a/meson.build b/meson.build
index 475d8a94ea..fc08f15a00 100644
--- a/meson.build
+++ b/meson.build
@@ -1901,7 +1901,14 @@ specific_ss.add(when: 'CONFIG_TCG', if_true: files(
   'tcg/tcg-op.c',
   'tcg/tcg.c',
 ))
-specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: files('tcg/tci.c'))
+
+if get_option('tcg_interpreter')
+  libffi = dependency('libffi', version: '>=3.0',
+                      static: enable_static, method: 'pkg-config',
+                      required: true)
+  specific_ss.add(libffi)
+  specific_ss.add(files('tcg/tci.c'))
+endif
 
 subdir('backends')
 subdir('disas')
diff --git a/include/exec/helper-ffi.h b/include/exec/helper-ffi.h
new file mode 100644
index 0000000000..3af1065af3
--- /dev/null
+++ b/include/exec/helper-ffi.h
@@ -0,0 +1,115 @@
+/*
+ * Helper file for declaring TCG helper functions.
+ * This one defines data structures private to tcg.c.
+ */
+
+#ifndef HELPER_FFI_H
+#define HELPER_FFI_H 1
+
+#include "exec/helper-head.h"
+
+#define dh_ffitype_i32  &ffi_type_uint32
+#define dh_ffitype_s32  &ffi_type_sint32
+#define dh_ffitype_int  &ffi_type_sint
+#define dh_ffitype_i64  &ffi_type_uint64
+#define dh_ffitype_s64  &ffi_type_sint64
+#define dh_ffitype_f16  &ffi_type_uint32
+#define dh_ffitype_f32  &ffi_type_uint32
+#define dh_ffitype_f64  &ffi_type_uint64
+#ifdef TARGET_LONG_BITS
+# if TARGET_LONG_BITS == 32
+#  define dh_ffitype_tl &ffi_type_uint32
+# else
+#  define dh_ffitype_tl &ffi_type_uint64
+# endif
+#endif
+#define dh_ffitype_ptr  &ffi_type_pointer
+#define dh_ffitype_cptr &ffi_type_pointer
+#define dh_ffitype_void &ffi_type_void
+#define dh_ffitype_noreturn &ffi_type_void
+#define dh_ffitype_env  &ffi_type_pointer
+#define dh_ffitype(t) glue(dh_ffitype_, t)
+
+#define DEF_HELPER_FLAGS_0(NAME, FLAGS, ret)    \
+    static ffi_cif glue(cif_,NAME) = {          \
+        .rtype = dh_ffitype(ret), .nargs = 0,   \
+    };
+
+#define DEF_HELPER_FLAGS_1(NAME, FLAGS, ret, t1)                        \
+    static ffi_type *glue(cif_args_,NAME)[1] = { dh_ffitype(t1) };      \
+    static ffi_cif glue(cif_,NAME) = {                                  \
+        .rtype = dh_ffitype(ret), .nargs = 1,                           \
+        .arg_types = glue(cif_args_,NAME),                              \
+    };
+
+#define DEF_HELPER_FLAGS_2(NAME, FLAGS, ret, t1, t2)    \
+    static ffi_type *glue(cif_args_,NAME)[2] = {        \
+        dh_ffitype(t1), dh_ffitype(t2)                  \
+    };                                                  \
+    static ffi_cif glue(cif_,NAME) = {                  \
+        .rtype = dh_ffitype(ret), .nargs = 2,           \
+        .arg_types = glue(cif_args_,NAME),              \
+    };
+
+#define DEF_HELPER_FLAGS_3(NAME, FLAGS, ret, t1, t2, t3)        \
+    static ffi_type *glue(cif_args_,NAME)[3] = {                \
+        dh_ffitype(t1), dh_ffitype(t2), dh_ffitype(t3)          \
+    };                                                          \
+    static ffi_cif glue(cif_,NAME) = {                          \
+        .rtype = dh_ffitype(ret), .nargs = 3,                   \
+        .arg_types = glue(cif_args_,NAME),                      \
+    };
+
+#define DEF_HELPER_FLAGS_4(NAME, FLAGS, ret, t1, t2, t3, t4)            \
+    static ffi_type *glue(cif_args_,NAME)[4] = {                        \
+        dh_ffitype(t1), dh_ffitype(t2), dh_ffitype(t3), dh_ffitype(t4)  \
+    };                                                                  \
+    static ffi_cif glue(cif_,NAME) = {                                  \
+        .rtype = dh_ffitype(ret), .nargs = 4,                           \
+        .arg_types = glue(cif_args_,NAME),                              \
+    };
+
+#define DEF_HELPER_FLAGS_5(NAME, FLAGS, ret, t1, t2, t3, t4, t5)        \
+    static ffi_type *glue(cif_args_,NAME)[5] = {                        \
+        dh_ffitype(t1), dh_ffitype(t2), dh_ffitype(t3),                 \
+        dh_ffitype(t4), dh_ffitype(t5)                                  \
+    };                                                                  \
+    static ffi_cif glue(cif_,NAME) = {                                  \
+        .rtype = dh_ffitype(ret), .nargs = 5,                           \
+        .arg_types = glue(cif_args_,NAME),                              \
+    };
+
+#define DEF_HELPER_FLAGS_6(NAME, FLAGS, ret, t1, t2, t3, t4, t5, t6)    \
+    static ffi_type *glue(cif_args_,NAME)[6] = {                        \
+        dh_ffitype(t1), dh_ffitype(t2), dh_ffitype(t3),                 \
+        dh_ffitype(t4), dh_ffitype(t5), dh_ffitype(t6)                  \
+    };                                                                  \
+    static ffi_cif glue(cif_,NAME) = {                                  \
+        .rtype = dh_ffitype(ret), .nargs = 6,                           \
+        .arg_types = glue(cif_args_,NAME),                              \
+    };
+
+#define DEF_HELPER_FLAGS_7(NAME, FLAGS, ret, t1, t2, t3, t4, t5, t6, t7) \
+    static ffi_type *glue(cif_args_,NAME)[7] = {                        \
+        dh_ffitype(t1), dh_ffitype(t2), dh_ffitype(t3),                 \
+        dh_ffitype(t4), dh_ffitype(t5), dh_ffitype(t6), dh_ffitype(t7)  \
+    };                                                                  \
+    static ffi_cif glue(cif_,NAME) = {                                  \
+        .rtype = dh_ffitype(ret), .nargs = 7,                           \
+        .arg_types = glue(cif_args_,NAME),                              \
+    };
+
+#include "helper.h"
+#include "trace/generated-helpers.h"
+#include "tcg-runtime.h"
+
+#undef DEF_HELPER_FLAGS_0
+#undef DEF_HELPER_FLAGS_1
+#undef DEF_HELPER_FLAGS_2
+#undef DEF_HELPER_FLAGS_3
+#undef DEF_HELPER_FLAGS_4
+#undef DEF_HELPER_FLAGS_5
+#undef DEF_HELPER_FLAGS_6
+#undef DEF_HELPER_FLAGS_7
+
+#endif /* HELPER_FFI_H */
diff --git a/include/exec/helper-tcg.h b/include/exec/helper-tcg.h
index 27870509a2..a71b848576 100644
--- a/include/exec/helper-tcg.h
+++ b/include/exec/helper-tcg.h
@@ -10,50 +10,57 @@
    to get all the macros expanded first.  */
 #define str(s) #s
 
+#ifdef CONFIG_TCG_INTERPRETER
+# define DO_CIF(NAME)  .cif = &cif_##NAME,
+#else
+# define DO_CIF(NAME)
+#endif
+
 #define DEF_HELPER_FLAGS_0(NAME, FLAGS, ret) \
-  { .func = HELPER(NAME), .name = str(NAME), \
+  { .func = HELPER(NAME), DO_CIF(NAME) .name = str(NAME), \
     .flags = FLAGS | dh_callflag(ret), \
     .sizemask = dh_sizemask(ret, 0) },
 
 #define DEF_HELPER_FLAGS_1(NAME, FLAGS, ret, t1) \
-  { .func = HELPER(NAME), .name = str(NAME), \
+  { .func = HELPER(NAME), DO_CIF(NAME) .name = str(NAME), \
     .flags = FLAGS | dh_callflag(ret), \
     .sizemask = dh_sizemask(ret, 0) | dh_sizemask(t1, 1) },
 
 #define DEF_HELPER_FLAGS_2(NAME, FLAGS, ret, t1, t2) \
-  { .func = HELPER(NAME), .name = str(NAME), \
+  { .func = HELPER(NAME), DO_CIF(NAME) .name = str(NAME), \
     .flags = FLAGS | dh_callflag(ret), \
     .sizemask = dh_sizemask(ret, 0) | dh_sizemask(t1, 1) \
     | dh_sizemask(t2, 2) },
 
 #define DEF_HELPER_FLAGS_3(NAME, FLAGS, ret, t1, t2, t3) \
-  { .func = HELPER(NAME), .name = str(NAME), \
+  { .func = HELPER(NAME), DO_CIF(NAME) .name = str(NAME), \
     .flags = FLAGS | dh_callflag(ret), \
     .sizemask = dh_sizemask(ret, 0) | dh_sizemask(t1, 1) \
     | dh_sizemask(t2, 2) | dh_sizemask(t3, 3) },
 
 #define DEF_HELPER_FLAGS_4(NAME, FLAGS, ret, t1, t2, t3, t4) \
-  { .func = HELPER(NAME), .name = str(NAME), \
+  { .func = HELPER(NAME), DO_CIF(NAME) .name = str(NAME), \
     .flags = FLAGS | dh_callflag(ret), \
     .sizemask = dh_sizemask(ret, 0) | dh_sizemask(t1, 1) \
     | dh_sizemask(t2, 2) | dh_sizemask(t3, 3) | dh_sizemask(t4, 4) },
 
 #define DEF_HELPER_FLAGS_5(NAME, FLAGS, ret, t1, t2, t3, t4, t5) \
-  { .func = HELPER(NAME), .name = str(NAME), \
+  { .func = HELPER(NAME), DO_CIF(NAME) .name = str(NAME), \
     .flags = FLAGS | dh_callflag(ret), \
     .sizemask = dh_sizemask(ret, 0) | dh_sizemask(t1, 1) \
     | dh_sizemask(t2, 2) | dh_sizemask(t3, 3) | dh_sizemask(t4, 4) \
     | dh_sizemask(t5, 5) },
 
 #define DEF_HELPER_FLAGS_6(NAME, FLAGS, ret, t1, t2, t3, t4, t5, t6) \
-  { .func = HELPER(NAME), .name = str(NAME), \
+  { .func = HELPER(NAME), DO_CIF(NAME) .name = str(NAME), \
     .flags = FLAGS | dh_callflag(ret), \
     .sizemask = dh_sizemask(ret, 0) | dh_sizemask(t1, 1) \
     | dh_sizemask(t2, 2) | dh_sizemask(t3, 3) | dh_sizemask(t4, 4) \
     | dh_sizemask(t5, 5) | dh_sizemask(t6, 6) },
 
 #define DEF_HELPER_FLAGS_7(NAME, FLAGS, ret, t1, t2, t3, t4, t5, t6, t7) \
-  { .func = HELPER(NAME), .name = str(NAME), .flags = FLAGS, \
+  { .func = HELPER(NAME), DO_CIF(NAME) .name = str(NAME), \
+    .flags = FLAGS | dh_callflag(ret), \
     .sizemask = dh_sizemask(ret, 0) | dh_sizemask(t1, 1) \
     | dh_sizemask(t2, 2) | dh_sizemask(t3, 3) | dh_sizemask(t4, 4) \
     | dh_sizemask(t5, 5) | dh_sizemask(t6, 6) | dh_sizemask(t7, 7) },
@@ -64,6 +71,7 @@
 #include "plugin-helpers.h"
 
 #undef str
+#undef DO_CIF
 #undef DEF_HELPER_FLAGS_0
 #undef DEF_HELPER_FLAGS_1
 #undef DEF_HELPER_FLAGS_2
diff --git a/target/hppa/helper.h b/target/hppa/helper.h
index 2d483aab58..35c612f09d 100644
--- a/target/hppa/helper.h
+++ b/target/hppa/helper.h
@@ -1,9 +1,11 @@
 #if TARGET_REGISTER_BITS == 64
 # define dh_alias_tr     i64
 # define dh_is_64bit_tr  1
+# define dh_ffitype_tr   dh_ffitype_i64
 #else
 # define dh_alias_tr     i32
 # define dh_is_64bit_tr  0
+# define dh_ffitype_tr   dh_ffitype_i32
 #endif
 #define dh_ctype_tr      target_ureg
 #define dh_is_signed_tr  0
diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h
index 6c0c849347..cae50f77eb 100644
--- a/target/i386/ops_sse_header.h
+++ b/target/i386/ops_sse_header.h
@@ -27,13 +27,19 @@
 #define dh_alias_Reg ptr
 #define dh_alias_ZMMReg ptr
 #define dh_alias_MMXReg ptr
+
 #define dh_ctype_Reg Reg *
 #define dh_ctype_ZMMReg ZMMReg *
 #define dh_ctype_MMXReg MMXReg *
+
 #define dh_is_signed_Reg dh_is_signed_ptr
 #define dh_is_signed_ZMMReg dh_is_signed_ptr
 #define dh_is_signed_MMXReg dh_is_signed_ptr
 
+#define dh_ffitype_Reg  dh_ffitype_ptr
+#define dh_ffitype_ZMMReg  dh_ffitype_ptr
+#define dh_ffitype_MMXReg  dh_ffitype_ptr
+
 DEF_HELPER_3(glue(psrlw, SUFFIX), void, env, Reg, Reg)
 DEF_HELPER_3(glue(psraw, SUFFIX), void, env, Reg, Reg)
 DEF_HELPER_3(glue(psllw, SUFFIX), void, env, Reg, Reg)
diff --git a/target/m68k/helper.h b/target/m68k/helper.h
index 77808497a9..672c99d5de 100644
--- a/target/m68k/helper.h
+++ b/target/m68k/helper.h
@@ -18,6 +18,7 @@ DEF_HELPER_4(cas2l_parallel, void, env, i32, i32, i32)
 #define dh_alias_fp ptr
 #define dh_ctype_fp FPReg *
 #define dh_is_signed_fp dh_is_signed_ptr
+#define dh_ffitype_fp dh_ffitype_ptr
 
 DEF_HELPER_3(exts32, void, env, fp, s32)
 DEF_HELPER_3(extf32, void, env, fp, f32)
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 6a4dccf70c..bbd4700064 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -108,10 +108,12 @@ DEF_HELPER_FLAGS_1(ftsqrt, TCG_CALL_NO_RWG_SE, i32, i64)
 #define dh_alias_avr ptr
 #define dh_ctype_avr ppc_avr_t *
 #define dh_is_signed_avr dh_is_signed_ptr
+#define dh_ffitype_avr dh_ffitype_ptr
 
 #define dh_alias_vsr ptr
 #define dh_ctype_vsr ppc_vsr_t *
 #define dh_is_signed_vsr dh_is_signed_ptr
+#define dh_ffitype_vsr dh_ffitype_ptr
 
 DEF_HELPER_3(vavgub, void, avr, avr, avr)
 DEF_HELPER_3(vavguh, void, avr, avr, avr)
@@ -696,6 +698,7 @@ DEF_HELPER_3(store_601_batu, void, env, i32, tl)
 #define dh_alias_fprp ptr
 #define dh_ctype_fprp ppc_fprp_t *
 #define dh_is_signed_fprp dh_is_signed_ptr
+#define dh_ffitype_fprp dh_ffitype_ptr
 
 DEF_HELPER_4(dadd, void, env, fprp, fprp, fprp)
 DEF_HELPER_4(daddq, void, env, fprp, fprp, fprp)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 2991112829..6382112215 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -66,6 +66,10 @@
 #include "exec/log.h"
 #include "sysemu/sysemu.h"
 
+#ifdef CONFIG_TCG_INTERPRETER
+#include <ffi.h>
+#endif
+
 /* Forward declarations for functions declared in tcg-target.c.inc and
    used here. */
 static void tcg_target_init(TCGContext *s);
@@ -1082,6 +1086,9 @@ void tcg_pool_reset(TCGContext *s)
 
 typedef struct TCGHelperInfo {
     void *func;
+#ifdef CONFIG_TCG_INTERPRETER
+    ffi_cif *cif;
+#endif
     const char *name;
     unsigned flags;
     unsigned sizemask;
@@ -1089,6 +1096,10 @@ typedef struct TCGHelperInfo {
 
 #include "exec/helper-proto.h"
 
+#ifdef CONFIG_TCG_INTERPRETER
+#include "exec/helper-ffi.h"
+#endif
+
 static const TCGHelperInfo all_helpers[] = {
 #include "exec/helper-tcg.h"
 };
@@ -1136,6 +1147,15 @@ void tcg_context_init(TCGContext *s)
                             (gpointer)&all_helpers[i]);
     }
 
+#ifdef CONFIG_TCG_INTERPRETER
+    for (i = 0; i < ARRAY_SIZE(all_helpers); ++i) {
+        ffi_cif *cif = all_helpers[i].cif;
+        ffi_status ok = ffi_prep_cif(cif, FFI_DEFAULT_ABI, cif->nargs,
+                                     cif->rtype, cif->arg_types);
+        tcg_debug_assert(ok == FFI_OK);
+    }
+#endif
+
     tcg_target_init(s);
     process_op_defs(s);
 
diff --git a/tests/docker/dockerfiles/fedora.docker b/tests/docker/dockerfiles/fedora.docker
index 0d7602abbe..45fc1a77bd 100644
--- a/tests/docker/dockerfiles/fedora.docker
+++ b/tests/docker/dockerfiles/fedora.docker
@@ -32,6 +32,7 @@ ENV PACKAGES \
     libcurl-devel \
     libepoxy-devel \
     libfdt-devel \
+    libffi-devel \
     libiscsi-devel \
     libjpeg-devel \
     libpmem-devel \
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 63/93] tcg/tci: Use ffi for calls
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (61 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 62/93] tcg: Build ffi data structures for helpers Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-07 16:25   ` Stefan Weil
  2021-02-04  1:44 ` [PATCH v2 64/93] tcg/tci: Improve tcg_target_call_clobber_regs Richard Henderson
                   ` (32 subsequent siblings)
  95 siblings, 1 reply; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

This requires adjusting where arguments are stored.
Place them on the stack at left-aligned positions.
Adjust the stack frame to be at entirely positive offsets.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h        |   1 +
 tcg/tci/tcg-target.h     |   2 +-
 tcg/tcg.c                |  72 ++++++++++++---------
 tcg/tci.c                | 131 ++++++++++++++++++++++-----------------
 tcg/tci/tcg-target.c.inc |  50 +++++++--------
 5 files changed, 143 insertions(+), 113 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 0f0695e90d..e5573a9877 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -53,6 +53,7 @@
 #define MAX_OPC_PARAM (4 + (MAX_OPC_PARAM_PER_ARG * MAX_OPC_PARAM_ARGS))
 
 #define CPU_TEMP_BUF_NLONGS 128
+#define TCG_STATIC_FRAME_SIZE  (CPU_TEMP_BUF_NLONGS * sizeof(long))
 
 /* Default target word size to pointer size.  */
 #ifndef TCG_TARGET_REG_BITS
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 52af6d8bc5..4df10e2e83 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -161,7 +161,7 @@ typedef enum {
 
 /* Used for function call generation. */
 #define TCG_TARGET_CALL_STACK_OFFSET    0
-#define TCG_TARGET_STACK_ALIGN          16
+#define TCG_TARGET_STACK_ALIGN          8
 
 #define HAVE_TCG_QEMU_TB_EXEC
 
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 6382112215..92aec0d238 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -208,6 +208,18 @@ static size_t tree_size;
 static TCGRegSet tcg_target_available_regs[TCG_TYPE_COUNT];
 static TCGRegSet tcg_target_call_clobber_regs;
 
+typedef struct TCGHelperInfo {
+    void *func;
+#ifdef CONFIG_TCG_INTERPRETER
+    ffi_cif *cif;
+#endif
+    const char *name;
+    unsigned flags;
+    unsigned sizemask;
+} TCGHelperInfo;
+
+static GHashTable *helper_table;
+
 #if TCG_TARGET_INSN_UNIT_SIZE == 1
 static __attribute__((unused)) inline void tcg_out8(TCGContext *s, uint8_t v)
 {
@@ -1084,16 +1096,6 @@ void tcg_pool_reset(TCGContext *s)
     s->pool_current = NULL;
 }
 
-typedef struct TCGHelperInfo {
-    void *func;
-#ifdef CONFIG_TCG_INTERPRETER
-    ffi_cif *cif;
-#endif
-    const char *name;
-    unsigned flags;
-    unsigned sizemask;
-} TCGHelperInfo;
-
 #include "exec/helper-proto.h"
 
 #ifdef CONFIG_TCG_INTERPRETER
@@ -1103,7 +1105,6 @@ typedef struct TCGHelperInfo {
 static const TCGHelperInfo all_helpers[] = {
 #include "exec/helper-tcg.h"
 };
-static GHashTable *helper_table;
 
 static int indirect_reg_alloc_order[ARRAY_SIZE(tcg_target_reg_alloc_order)];
 static void process_op_defs(TCGContext *s);
@@ -2081,25 +2082,38 @@ void tcg_gen_callN(void *func, TCGTemp *ret, int nargs, TCGTemp **args)
 
     real_args = 0;
     for (i = 0; i < nargs; i++) {
-        int is_64bit = sizemask & (1 << (i+1)*2);
-        if (TCG_TARGET_REG_BITS < 64 && is_64bit) {
-#ifdef TCG_TARGET_CALL_ALIGN_ARGS
-            /* some targets want aligned 64 bit args */
-            if (real_args & 1) {
-                op->args[pi++] = TCG_CALL_DUMMY_ARG;
-                real_args++;
-            }
+        bool is_64bit = sizemask & (1 << (i+1)*2);
+        bool want_align = false;
+
+#if defined(CONFIG_TCG_INTERPRETER)
+        /*
+         * Align all arguments, so that they land in predictable places
+         * for passing off to ffi_call.
+         */
+        want_align = true;
+#elif defined(TCG_TARGET_CALL_ALIGN_ARGS)
+        /* Some targets want aligned 64 bit args */
+        want_align = is_64bit;
 #endif
-           /* If stack grows up, then we will be placing successive
-              arguments at lower addresses, which means we need to
-              reverse the order compared to how we would normally
-              treat either big or little-endian.  For those arguments
-              that will wind up in registers, this still works for
-              HPPA (the only current STACK_GROWSUP target) since the
-              argument registers are *also* allocated in decreasing
-              order.  If another such target is added, this logic may
-              have to get more complicated to differentiate between
-              stack arguments and register arguments.  */
+
+        if (TCG_TARGET_REG_BITS < 64 && want_align && (real_args & 1)) {
+            op->args[pi++] = TCG_CALL_DUMMY_ARG;
+            real_args++;
+        }
+
+        if (TCG_TARGET_REG_BITS < 64 && is_64bit) {
+           /*
+            * If stack grows up, then we will be placing successive
+            * arguments at lower addresses, which means we need to
+            * reverse the order compared to how we would normally
+            * treat either big or little-endian.  For those arguments
+            * that will wind up in registers, this still works for
+            * HPPA (the only current STACK_GROWSUP target) since the
+            * argument registers are *also* allocated in decreasing
+            * order.  If another such target is added, this logic may
+            * have to get more complicated to differentiate between
+            * stack arguments and register arguments.
+            */
 #if defined(HOST_WORDS_BIGENDIAN) != defined(TCG_TARGET_STACK_GROWSUP)
             op->args[pi++] = temp_arg(args[i] + 1);
             op->args[pi++] = temp_arg(args[i]);
diff --git a/tcg/tci.c b/tcg/tci.c
index 6843e837ae..d27db9f720 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -18,6 +18,13 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "tcg/tcg.h"           /* MAX_OPC_PARAM_IARGS */
+#include "exec/cpu_ldst.h"
+#include "tcg/tcg-op.h"
+#include "qemu/compiler.h"
+#include <ffi.h>
+
 
 /* Enable TCI assertions only when debugging TCG (and without NDEBUG defined).
  * Without assertions, the interpreter runs much faster. */
@@ -27,36 +34,8 @@
 # define tci_assert(cond) ((void)(cond))
 #endif
 
-#include "qemu-common.h"
-#include "tcg/tcg.h"           /* MAX_OPC_PARAM_IARGS */
-#include "exec/cpu_ldst.h"
-#include "tcg/tcg-op.h"
-#include "qemu/compiler.h"
-
-#if MAX_OPC_PARAM_IARGS != 6
-# error Fix needed, number of supported input arguments changed!
-#endif
-#if TCG_TARGET_REG_BITS == 32
-typedef uint64_t (*helper_function)(tcg_target_ulong, tcg_target_ulong,
-                                    tcg_target_ulong, tcg_target_ulong,
-                                    tcg_target_ulong, tcg_target_ulong,
-                                    tcg_target_ulong, tcg_target_ulong,
-                                    tcg_target_ulong, tcg_target_ulong,
-                                    tcg_target_ulong, tcg_target_ulong);
-#else
-typedef uint64_t (*helper_function)(tcg_target_ulong, tcg_target_ulong,
-                                    tcg_target_ulong, tcg_target_ulong,
-                                    tcg_target_ulong, tcg_target_ulong);
-#endif
-
 __thread uintptr_t tci_tb_ptr;
 
-static tcg_target_ulong tci_read_reg(const tcg_target_ulong *regs, TCGReg index)
-{
-    tci_assert(index < TCG_TARGET_NB_REGS);
-    return regs[index];
-}
-
 static void
 tci_write_reg(tcg_target_ulong *regs, TCGReg index, tcg_target_ulong value)
 {
@@ -131,6 +110,7 @@ static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
  *   i = immediate (uint32_t)
  *   I = immediate (tcg_target_ulong)
  *   m = immediate (TCGMemOpIdx)
+ *   n = immediate (call return length)
  *   r = register
  *   s = signed ldst offset
  */
@@ -151,6 +131,16 @@ static void tci_args_l(const uint8_t **tb_ptr, void **l0)
     check_size(start, tb_ptr);
 }
 
+static void tci_args_nl(const uint8_t **tb_ptr, uint8_t *n0, void **l1)
+{
+    const uint8_t *start = *tb_ptr;
+
+    *n0 = tci_read_b(tb_ptr);
+    *l1 = (void *)tci_read_label(tb_ptr);
+
+    check_size(start, tb_ptr);
+}
+
 static void tci_args_rr(const uint8_t **tb_ptr,
                         TCGReg *r0, TCGReg *r1)
 {
@@ -491,6 +481,7 @@ static bool tci_compare64(uint64_t u0, uint64_t u1, TCGCond condition)
 # define CASE_64(x)
 #endif
 
+
 /* Interpret pseudo code in tb. */
 /*
  * Disable CFI checks.
@@ -502,11 +493,13 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 {
     const uint8_t *tb_ptr = v_tb_ptr;
     tcg_target_ulong regs[TCG_TARGET_NB_REGS];
-    long tcg_temps[CPU_TEMP_BUF_NLONGS];
-    uintptr_t sp_value = (uintptr_t)(tcg_temps + CPU_TEMP_BUF_NLONGS);
+    uint64_t stack[(TCG_STATIC_CALL_ARGS_SIZE + TCG_STATIC_FRAME_SIZE)
+                   / sizeof(uint64_t)];
+    void *call_slots[TCG_STATIC_CALL_ARGS_SIZE / sizeof(uint64_t)];
 
     regs[TCG_AREG0] = (tcg_target_ulong)env;
-    regs[TCG_REG_CALL_STACK] = sp_value;
+    regs[TCG_REG_CALL_STACK] = (uintptr_t)stack;
+    call_slots[0] = NULL;
     tci_assert(tb_ptr);
 
     for (;;) {
@@ -531,33 +524,53 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         switch (opc) {
         case INDEX_op_call:
-            tci_args_l(&tb_ptr, &ptr);
+            /*
+             * We are passed a pointer to the TCGHelperInfo, which contains
+             * the function pointer followed by the ffi_cif pointer.
+             */
+            tci_args_nl(&tb_ptr, &len, &ptr);
+
+            /* Helper functions may need to access the "return address" */
             tci_tb_ptr = (uintptr_t)tb_ptr;
-#if TCG_TARGET_REG_BITS == 32
-            tmp64 = ((helper_function)ptr)(tci_read_reg(regs, TCG_REG_R0),
-                                           tci_read_reg(regs, TCG_REG_R1),
-                                           tci_read_reg(regs, TCG_REG_R2),
-                                           tci_read_reg(regs, TCG_REG_R3),
-                                           tci_read_reg(regs, TCG_REG_R4),
-                                           tci_read_reg(regs, TCG_REG_R5),
-                                           tci_read_reg(regs, TCG_REG_R6),
-                                           tci_read_reg(regs, TCG_REG_R7),
-                                           tci_read_reg(regs, TCG_REG_R8),
-                                           tci_read_reg(regs, TCG_REG_R9),
-                                           tci_read_reg(regs, TCG_REG_R10),
-                                           tci_read_reg(regs, TCG_REG_R11));
-            tci_write_reg(regs, TCG_REG_R0, tmp64);
-            tci_write_reg(regs, TCG_REG_R1, tmp64 >> 32);
-#else
-            tmp64 = ((helper_function)ptr)(tci_read_reg(regs, TCG_REG_R0),
-                                           tci_read_reg(regs, TCG_REG_R1),
-                                           tci_read_reg(regs, TCG_REG_R2),
-                                           tci_read_reg(regs, TCG_REG_R3),
-                                           tci_read_reg(regs, TCG_REG_R4),
-                                           tci_read_reg(regs, TCG_REG_R5));
-            tci_write_reg(regs, TCG_REG_R0, tmp64);
-#endif
+
+            /*
+             * Set up the ffi_avalue array once, delayed until now
+             * because many TB's do not make any calls. In tcg_gen_callN,
+             * we arranged for every real argument to be "left-aligned"
+             * in each 64-bit slot.
+             */
+            if (call_slots[0] == NULL) {
+                for (int i = 0; i < ARRAY_SIZE(call_slots); ++i) {
+                    call_slots[i] = &stack[i];
+                }
+            }
+
+            /*
+             * Call the helper function.  Any result winds up
+             * "left-aligned" in the stack[0] slot.
+             */
+            {
+                void **pptr = ptr;
+                ffi_call(pptr[1], pptr[0], stack, call_slots);
+            }
+            switch (len) {
+            case 0: /* void */
+                break;
+            case 1: /* uint32_t */
+                regs[TCG_REG_R0] = *(uint32_t *)stack;
+                break;
+            case 2: /* uint64_t */
+                if (TCG_TARGET_REG_BITS == 32) {
+                    tci_write_reg64(regs, TCG_REG_R1, TCG_REG_R0, stack[0]);
+                } else {
+                    regs[TCG_REG_R0] = stack[0];
+                }
+                break;
+            default:
+                g_assert_not_reached();
+            }
             break;
+
         case INDEX_op_br:
             tci_args_l(&tb_ptr, &ptr);
             tb_ptr = ptr;
@@ -1162,13 +1175,17 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
 
     switch (op) {
     case INDEX_op_br:
-    case INDEX_op_call:
     case INDEX_op_exit_tb:
     case INDEX_op_goto_tb:
         tci_args_l(&tb_ptr, &ptr);
         info->fprintf_func(info->stream, "%-12s  %p", op_name, ptr);
         break;
 
+    case INDEX_op_call:
+        tci_args_nl(&tb_ptr, &len, &ptr);
+        info->fprintf_func(info->stream, "%-12s  %d,%p", op_name, len, ptr);
+        break;
+
     case INDEX_op_brcond_i32:
     case INDEX_op_brcond_i64:
         tci_args_rrcl(&tb_ptr, &r0, &r1, &c, &ptr);
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 7fb3b04eaf..8d75482546 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -192,23 +192,8 @@ static const int tcg_target_reg_alloc_order[] = {
 # error Fix needed, number of supported input arguments changed!
 #endif
 
-static const int tcg_target_call_iarg_regs[] = {
-    TCG_REG_R0,
-    TCG_REG_R1,
-    TCG_REG_R2,
-    TCG_REG_R3,
-    TCG_REG_R4,
-    TCG_REG_R5,
-#if TCG_TARGET_REG_BITS == 32
-    /* 32 bit hosts need 2 * MAX_OPC_PARAM_IARGS registers. */
-    TCG_REG_R6,
-    TCG_REG_R7,
-    TCG_REG_R8,
-    TCG_REG_R9,
-    TCG_REG_R10,
-    TCG_REG_R11,
-#endif
-};
+/* No call arguments via registers.  All will be stored on the "stack". */
+static const int tcg_target_call_iarg_regs[] = { };
 
 static const int tcg_target_call_oarg_regs[] = {
     TCG_REG_R0,
@@ -292,8 +277,9 @@ static void tci_out_label(TCGContext *s, TCGLabel *label)
 static void stack_bounds_check(TCGReg base, target_long offset)
 {
     if (base == TCG_REG_CALL_STACK) {
-        tcg_debug_assert(offset < 0);
-        tcg_debug_assert(offset >= -(CPU_TEMP_BUF_NLONGS * sizeof(long)));
+        tcg_debug_assert(offset >= 0);
+        tcg_debug_assert(offset < (TCG_STATIC_CALL_ARGS_SIZE +
+                                   TCG_STATIC_FRAME_SIZE));
     }
 }
 
@@ -360,11 +346,25 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
-static inline void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
+static void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
 {
     uint8_t *old_code_ptr = s->code_ptr;
+    const TCGHelperInfo *info;
+    uint8_t which;
+
+    info = g_hash_table_lookup(helper_table, (gpointer)arg);
+    if (info->cif->rtype == &ffi_type_void) {
+        which = 0;
+    } else if (info->cif->rtype->size == 4) {
+        which = 1;
+    } else {
+        tcg_debug_assert(info->cif->rtype->size == 8);
+        which = 2;
+    }
     tcg_out_op_t(s, INDEX_op_call);
-    tcg_out_i(s, (uintptr_t)arg);
+    tcg_out8(s, which);
+    tcg_out_i(s, (uintptr_t)info);
+
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
@@ -629,11 +629,9 @@ static void tcg_target_init(TCGContext *s)
     s->reserved_regs = 0;
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
 
-    /* We use negative offsets from "sp" so that we can distinguish
-       stores that might pretend to be call arguments.  */
-    tcg_set_frame(s, TCG_REG_CALL_STACK,
-                  -CPU_TEMP_BUF_NLONGS * sizeof(long),
-                  CPU_TEMP_BUF_NLONGS * sizeof(long));
+    /* The call arguments come first, followed by the temp storage. */
+    tcg_set_frame(s, TCG_REG_CALL_STACK, TCG_STATIC_CALL_ARGS_SIZE,
+                  TCG_STATIC_FRAME_SIZE);
 }
 
 /* Generate global QEMU prologue and epilogue code. */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 64/93] tcg/tci: Improve tcg_target_call_clobber_regs
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (62 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 63/93] tcg/tci: Use ffi for calls Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 65/93] tcg/tci: Move call-return regs to end of tcg_target_reg_alloc_order Richard Henderson
                   ` (31 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

The current setting is much too pessimistic.  Indicating only
the one or two registers that are actually assigned after a
call should avoid unnecessary movement between the register
array and the stack array.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 8d75482546..4dae09deda 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -623,8 +623,14 @@ static void tcg_target_init(TCGContext *s)
     tcg_target_available_regs[TCG_TYPE_I32] = BIT(TCG_TARGET_NB_REGS) - 1;
     /* Registers available for 64 bit operations. */
     tcg_target_available_regs[TCG_TYPE_I64] = BIT(TCG_TARGET_NB_REGS) - 1;
-    /* TODO: Which registers should be set here? */
-    tcg_target_call_clobber_regs = BIT(TCG_TARGET_NB_REGS) - 1;
+    /*
+     * The interpreter "registers" are in the local stack frame and
+     * cannot be clobbered by the called helper functions.  However,
+     * the interpreter assumes a 64-bit return value and assigns to
+     * the return value registers.
+     */
+    tcg_target_call_clobber_regs =
+        MAKE_64BIT_MASK(TCG_REG_R0, 64 / TCG_TARGET_REG_BITS);
 
     s->reserved_regs = 0;
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 65/93] tcg/tci: Move call-return regs to end of tcg_target_reg_alloc_order
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (63 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 64/93] tcg/tci: Improve tcg_target_call_clobber_regs Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 66/93] tcg/tci: Push opcode emit into each case Richard Henderson
                   ` (30 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

As the only call-clobbered regs for TCI, these should
receive the least priority.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 4dae09deda..53edc50a3b 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -170,8 +170,6 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 }
 
 static const int tcg_target_reg_alloc_order[] = {
-    TCG_REG_R0,
-    TCG_REG_R1,
     TCG_REG_R2,
     TCG_REG_R3,
     TCG_REG_R4,
@@ -186,6 +184,8 @@ static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_R13,
     TCG_REG_R14,
     TCG_REG_R15,
+    TCG_REG_R1,
+    TCG_REG_R0,
 };
 
 #if MAX_OPC_PARAM_IARGS != 6
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 66/93] tcg/tci: Push opcode emit into each case
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (64 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 65/93] tcg/tci: Move call-return regs to end of tcg_target_reg_alloc_order Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 67/93] tcg/tci: Split out tcg_out_op_rrs Richard Henderson
                   ` (29 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

We're about to split out bytecode output into helpers, but
we can't do that one at a time if tcg_out_op_t is being done
outside of the switch.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 53edc50a3b..050d514853 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -385,40 +385,48 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 {
     uint8_t *old_code_ptr = s->code_ptr;
 
-    tcg_out_op_t(s, opc);
-
     switch (opc) {
     case INDEX_op_exit_tb:
+        tcg_out_op_t(s, opc);
         tcg_out_i(s, args[0]);
+        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
 
     case INDEX_op_goto_tb:
         tcg_debug_assert(s->tb_jmp_insn_offset == 0);
         /* indirect jump method. */
+        tcg_out_op_t(s, opc);
         tcg_out_i(s, (uintptr_t)(s->tb_jmp_target_addr + args[0]));
+        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         set_jmp_reset_offset(s, args[0]);
         break;
 
     case INDEX_op_br:
+        tcg_out_op_t(s, opc);
         tci_out_label(s, arg_label(args[0]));
+        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
 
     CASE_32_64(setcond)
+        tcg_out_op_t(s, opc);
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
         tcg_out_r(s, args[2]);
         tcg_out8(s, args[3]);   /* condition */
+        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
 
 #if TCG_TARGET_REG_BITS == 32
     case INDEX_op_setcond2_i32:
         /* setcond2_i32 cond, t0, t1_low, t1_high, t2_low, t2_high */
+        tcg_out_op_t(s, opc);
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
         tcg_out_r(s, args[2]);
         tcg_out_r(s, args[3]);
         tcg_out_r(s, args[4]);
         tcg_out8(s, args[5]);   /* condition */
+        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
 #endif
 
@@ -436,10 +444,12 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     CASE_64(st32)
     CASE_64(st)
         stack_bounds_check(args[1], args[2]);
+        tcg_out_op_t(s, opc);
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
         tcg_debug_assert(args[2] == (int32_t)args[2]);
         tcg_out32(s, args[2]);
+        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
 
     CASE_32_64(add)
@@ -462,12 +472,15 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     CASE_32_64(divu)     /* Optional (TCG_TARGET_HAS_div_*). */
     CASE_32_64(rem)      /* Optional (TCG_TARGET_HAS_div_*). */
     CASE_32_64(remu)     /* Optional (TCG_TARGET_HAS_div_*). */
+        tcg_out_op_t(s, opc);
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
         tcg_out_r(s, args[2]);
+        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
 
     CASE_32_64(deposit)  /* Optional (TCG_TARGET_HAS_deposit_*). */
+        tcg_out_op_t(s, opc);
         {
             TCGArg pos = args[3], len = args[4];
             TCGArg max = opc == INDEX_op_deposit_i32 ? 32 : 64;
@@ -481,13 +494,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
             tcg_out8(s, pos);
             tcg_out8(s, len);
         }
+        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
 
     CASE_32_64(brcond)
+        tcg_out_op_t(s, opc);
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
         tcg_out8(s, args[2]);           /* condition */
         tci_out_label(s, arg_label(args[3]));
+        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
 
     CASE_32_64(neg)      /* Optional (TCG_TARGET_HAS_neg_*). */
@@ -503,48 +519,59 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     CASE_32_64(bswap16)  /* Optional (TCG_TARGET_HAS_bswap16_*). */
     CASE_32_64(bswap32)  /* Optional (TCG_TARGET_HAS_bswap32_*). */
     CASE_64(bswap64)     /* Optional (TCG_TARGET_HAS_bswap64_i64). */
+        tcg_out_op_t(s, opc);
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
+        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
 
 #if TCG_TARGET_REG_BITS == 32
     case INDEX_op_add2_i32:
     case INDEX_op_sub2_i32:
+        tcg_out_op_t(s, opc);
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
         tcg_out_r(s, args[2]);
         tcg_out_r(s, args[3]);
         tcg_out_r(s, args[4]);
         tcg_out_r(s, args[5]);
+        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
     case INDEX_op_brcond2_i32:
+        tcg_out_op_t(s, opc);
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
         tcg_out_r(s, args[2]);
         tcg_out_r(s, args[3]);
         tcg_out8(s, args[4]);           /* condition */
         tci_out_label(s, arg_label(args[5]));
+        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
     case INDEX_op_mulu2_i32:
+        tcg_out_op_t(s, opc);
         tcg_out_r(s, args[0]);
         tcg_out_r(s, args[1]);
         tcg_out_r(s, args[2]);
         tcg_out_r(s, args[3]);
+        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
 #endif
 
     case INDEX_op_qemu_ld_i32:
     case INDEX_op_qemu_st_i32:
+        tcg_out_op_t(s, opc);
         tcg_out_r(s, *args++);
         tcg_out_r(s, *args++);
         if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
             tcg_out_r(s, *args++);
         }
         tcg_out32(s, *args++);
+        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
 
     case INDEX_op_qemu_ld_i64:
     case INDEX_op_qemu_st_i64:
+        tcg_out_op_t(s, opc);
         tcg_out_r(s, *args++);
         if (TCG_TARGET_REG_BITS == 32) {
             tcg_out_r(s, *args++);
@@ -554,9 +581,12 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
             tcg_out_r(s, *args++);
         }
         tcg_out32(s, *args++);
+        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
 
     case INDEX_op_mb:
+        tcg_out_op_t(s, opc);
+        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
 
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
@@ -565,7 +595,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     default:
         tcg_abort();
     }
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
 static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, TCGReg arg1,
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 67/93] tcg/tci: Split out tcg_out_op_rrs
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (65 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 66/93] tcg/tci: Push opcode emit into each case Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 68/93] tcg/tci: Split out tcg_out_op_l Richard Henderson
                   ` (28 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 84 +++++++++++++++++++---------------------
 1 file changed, 39 insertions(+), 45 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 050d514853..707f801099 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -283,32 +283,38 @@ static void stack_bounds_check(TCGReg base, target_long offset)
     }
 }
 
-static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
-                       intptr_t arg2)
+static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
+                           TCGReg r0, TCGReg r1, intptr_t i2)
 {
     uint8_t *old_code_ptr = s->code_ptr;
 
-    stack_bounds_check(arg1, arg2);
-    if (type == TCG_TYPE_I32) {
-        tcg_out_op_t(s, INDEX_op_ld_i32);
-        tcg_out_r(s, ret);
-        tcg_out_r(s, arg1);
-        tcg_out32(s, arg2);
-    } else {
-        tcg_debug_assert(type == TCG_TYPE_I64);
-#if TCG_TARGET_REG_BITS == 64
-        tcg_out_op_t(s, INDEX_op_ld_i64);
-        tcg_out_r(s, ret);
-        tcg_out_r(s, arg1);
-        tcg_debug_assert(arg2 == (int32_t)arg2);
-        tcg_out32(s, arg2);
-#else
-        TODO();
-#endif
-    }
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tcg_out_r(s, r1);
+    tcg_debug_assert(i2 == (int32_t)i2);
+    tcg_out32(s, i2);
+
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
+                       intptr_t offset)
+{
+    stack_bounds_check(base, offset);
+    switch (type) {
+    case TCG_TYPE_I32:
+        tcg_out_op_rrs(s, INDEX_op_ld_i32, val, base, offset);
+        break;
+#if TCG_TARGET_REG_BITS == 64
+    case TCG_TYPE_I64:
+        tcg_out_op_rrs(s, INDEX_op_ld_i64, val, base, offset);
+        break;
+#endif
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
     uint8_t *old_code_ptr = s->code_ptr;
@@ -444,12 +450,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     CASE_64(st32)
     CASE_64(st)
         stack_bounds_check(args[1], args[2]);
-        tcg_out_op_t(s, opc);
-        tcg_out_r(s, args[0]);
-        tcg_out_r(s, args[1]);
-        tcg_debug_assert(args[2] == (int32_t)args[2]);
-        tcg_out32(s, args[2]);
-        old_code_ptr[1] = s->code_ptr - old_code_ptr;
+        tcg_out_op_rrs(s, opc, args[0], args[1], args[2]);
         break;
 
     CASE_32_64(add)
@@ -597,29 +598,22 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     }
 }
 
-static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, TCGReg arg1,
-                       intptr_t arg2)
+static void tcg_out_st(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
+                       intptr_t offset)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
-
-    stack_bounds_check(arg1, arg2);
-    if (type == TCG_TYPE_I32) {
-        tcg_out_op_t(s, INDEX_op_st_i32);
-        tcg_out_r(s, arg);
-        tcg_out_r(s, arg1);
-        tcg_out32(s, arg2);
-    } else {
-        tcg_debug_assert(type == TCG_TYPE_I64);
+    stack_bounds_check(base, offset);
+    switch (type) {
+    case TCG_TYPE_I32:
+        tcg_out_op_rrs(s, INDEX_op_st_i32, val, base, offset);
+        break;
 #if TCG_TARGET_REG_BITS == 64
-        tcg_out_op_t(s, INDEX_op_st_i64);
-        tcg_out_r(s, arg);
-        tcg_out_r(s, arg1);
-        tcg_out32(s, arg2);
-#else
-        TODO();
+    case TCG_TYPE_I64:
+        tcg_out_op_rrs(s, INDEX_op_st_i64, val, base, offset);
+        break;
 #endif
+    default:
+        g_assert_not_reached();
     }
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
 static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 68/93] tcg/tci: Split out tcg_out_op_l
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (66 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 67/93] tcg/tci: Split out tcg_out_op_rrs Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 69/93] tcg/tci: Split out tcg_out_op_p Richard Henderson
                   ` (27 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 707f801099..1e3f2c4049 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -283,6 +283,16 @@ static void stack_bounds_check(TCGReg base, target_long offset)
     }
 }
 
+static void tcg_out_op_l(TCGContext *s, TCGOpcode op, TCGLabel *l0)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tci_out_label(s, l0);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
                            TCGReg r0, TCGReg r1, intptr_t i2)
 {
@@ -408,9 +418,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
 
     case INDEX_op_br:
-        tcg_out_op_t(s, opc);
-        tci_out_label(s, arg_label(args[0]));
-        old_code_ptr[1] = s->code_ptr - old_code_ptr;
+        tcg_out_op_l(s, opc, arg_label(args[0]));
         break;
 
     CASE_32_64(setcond)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 69/93] tcg/tci: Split out tcg_out_op_p
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (67 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 68/93] tcg/tci: Split out tcg_out_op_l Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 70/93] tcg/tci: Split out tcg_out_op_rr Richard Henderson
                   ` (26 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 1e3f2c4049..cb0cbbb8da 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -293,6 +293,16 @@ static void tcg_out_op_l(TCGContext *s, TCGOpcode op, TCGLabel *l0)
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_p(TCGContext *s, TCGOpcode op, void *p0)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_i(s, (uintptr_t)p0);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
                            TCGReg r0, TCGReg r1, intptr_t i2)
 {
@@ -403,17 +413,13 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 
     switch (opc) {
     case INDEX_op_exit_tb:
-        tcg_out_op_t(s, opc);
-        tcg_out_i(s, args[0]);
-        old_code_ptr[1] = s->code_ptr - old_code_ptr;
+        tcg_out_op_p(s, opc, (void *)args[0]);
         break;
 
     case INDEX_op_goto_tb:
         tcg_debug_assert(s->tb_jmp_insn_offset == 0);
         /* indirect jump method. */
-        tcg_out_op_t(s, opc);
-        tcg_out_i(s, (uintptr_t)(s->tb_jmp_target_addr + args[0]));
-        old_code_ptr[1] = s->code_ptr - old_code_ptr;
+        tcg_out_op_p(s, opc, s->tb_jmp_target_addr + args[0]);
         set_jmp_reset_offset(s, args[0]);
         break;
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 70/93] tcg/tci: Split out tcg_out_op_rr
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (68 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 69/93] tcg/tci: Split out tcg_out_op_p Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 71/93] tcg/tci: Split out tcg_out_op_rrr Richard Henderson
                   ` (25 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

At the same time, validate the type argument in tcg_out_mov.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index cb0cbbb8da..272e3ca70b 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -303,6 +303,17 @@ static void tcg_out_op_p(TCGContext *s, TCGOpcode op, void *p0)
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rr(TCGContext *s, TCGOpcode op, TCGReg r0, TCGReg r1)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tcg_out_r(s, r1);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
                            TCGReg r0, TCGReg r1, intptr_t i2)
 {
@@ -337,16 +348,18 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
 
 static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
-    tcg_debug_assert(ret != arg);
-#if TCG_TARGET_REG_BITS == 32
-    tcg_out_op_t(s, INDEX_op_mov_i32);
-#else
-    tcg_out_op_t(s, INDEX_op_mov_i64);
+    switch (type) {
+    case TCG_TYPE_I32:
+        tcg_out_op_rr(s, INDEX_op_mov_i32, ret, arg);
+        break;
+#if TCG_TARGET_REG_BITS == 64
+    case TCG_TYPE_I64:
+        tcg_out_op_rr(s, INDEX_op_mov_i64, ret, arg);
+        break;
 #endif
-    tcg_out_r(s, ret);
-    tcg_out_r(s, arg);
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    default:
+        g_assert_not_reached();
+    }
     return true;
 }
 
@@ -534,10 +547,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     CASE_32_64(bswap16)  /* Optional (TCG_TARGET_HAS_bswap16_*). */
     CASE_32_64(bswap32)  /* Optional (TCG_TARGET_HAS_bswap32_*). */
     CASE_64(bswap64)     /* Optional (TCG_TARGET_HAS_bswap64_i64). */
-        tcg_out_op_t(s, opc);
-        tcg_out_r(s, args[0]);
-        tcg_out_r(s, args[1]);
-        old_code_ptr[1] = s->code_ptr - old_code_ptr;
+        tcg_out_op_rr(s, opc, args[0], args[1]);
         break;
 
 #if TCG_TARGET_REG_BITS == 32
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 71/93] tcg/tci: Split out tcg_out_op_rrr
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (69 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 70/93] tcg/tci: Split out tcg_out_op_rr Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 72/93] tcg/tci: Split out tcg_out_op_rrrc Richard Henderson
                   ` (24 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 272e3ca70b..546424c2bd 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -314,6 +314,19 @@ static void tcg_out_op_rr(TCGContext *s, TCGOpcode op, TCGReg r0, TCGReg r1)
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrr(TCGContext *s, TCGOpcode op,
+                           TCGReg r0, TCGReg r1, TCGReg r2)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tcg_out_r(s, r1);
+    tcg_out_r(s, r2);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
                            TCGReg r0, TCGReg r1, intptr_t i2)
 {
@@ -500,11 +513,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     CASE_32_64(divu)     /* Optional (TCG_TARGET_HAS_div_*). */
     CASE_32_64(rem)      /* Optional (TCG_TARGET_HAS_div_*). */
     CASE_32_64(remu)     /* Optional (TCG_TARGET_HAS_div_*). */
-        tcg_out_op_t(s, opc);
-        tcg_out_r(s, args[0]);
-        tcg_out_r(s, args[1]);
-        tcg_out_r(s, args[2]);
-        old_code_ptr[1] = s->code_ptr - old_code_ptr;
+        tcg_out_op_rrr(s, opc, args[0], args[1], args[2]);
         break;
 
     CASE_32_64(deposit)  /* Optional (TCG_TARGET_HAS_deposit_*). */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 72/93] tcg/tci: Split out tcg_out_op_rrrc
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (70 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 71/93] tcg/tci: Split out tcg_out_op_rrr Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 73/93] tcg/tci: Split out tcg_out_op_rrrrrc Richard Henderson
                   ` (23 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 546424c2bd..5848779208 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -341,6 +341,20 @@ static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrrc(TCGContext *s, TCGOpcode op,
+                            TCGReg r0, TCGReg r1, TCGReg r2, TCGCond c3)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tcg_out_r(s, r1);
+    tcg_out_r(s, r2);
+    tcg_out8(s, c3);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
                        intptr_t offset)
 {
@@ -454,12 +468,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
 
     CASE_32_64(setcond)
-        tcg_out_op_t(s, opc);
-        tcg_out_r(s, args[0]);
-        tcg_out_r(s, args[1]);
-        tcg_out_r(s, args[2]);
-        tcg_out8(s, args[3]);   /* condition */
-        old_code_ptr[1] = s->code_ptr - old_code_ptr;
+        tcg_out_op_rrrc(s, opc, args[0], args[1], args[2], args[3]);
         break;
 
 #if TCG_TARGET_REG_BITS == 32
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 73/93] tcg/tci: Split out tcg_out_op_rrrrrc
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (71 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 72/93] tcg/tci: Split out tcg_out_op_rrrc Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 74/93] tcg/tci: Split out tcg_out_op_rrrbb Richard Henderson
                   ` (22 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 5848779208..8eda159dde 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -355,6 +355,25 @@ static void tcg_out_op_rrrc(TCGContext *s, TCGOpcode op,
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+#if TCG_TARGET_REG_BITS == 32
+static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
+                              TCGReg r0, TCGReg r1, TCGReg r2,
+                              TCGReg r3, TCGReg r4, TCGCond c5)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tcg_out_r(s, r1);
+    tcg_out_r(s, r2);
+    tcg_out_r(s, r3);
+    tcg_out_r(s, r4);
+    tcg_out8(s, c5);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+#endif
+
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
                        intptr_t offset)
 {
@@ -473,15 +492,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 
 #if TCG_TARGET_REG_BITS == 32
     case INDEX_op_setcond2_i32:
-        /* setcond2_i32 cond, t0, t1_low, t1_high, t2_low, t2_high */
-        tcg_out_op_t(s, opc);
-        tcg_out_r(s, args[0]);
-        tcg_out_r(s, args[1]);
-        tcg_out_r(s, args[2]);
-        tcg_out_r(s, args[3]);
-        tcg_out_r(s, args[4]);
-        tcg_out8(s, args[5]);   /* condition */
-        old_code_ptr[1] = s->code_ptr - old_code_ptr;
+        tcg_out_op_rrrrrc(s, opc, args[0], args[1], args[2],
+                          args[3], args[4], args[5]);
         break;
 #endif
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 74/93] tcg/tci: Split out tcg_out_op_rrrbb
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (72 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 73/93] tcg/tci: Split out tcg_out_op_rrrrrc Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 75/93] tcg/tci: Split out tcg_out_op_rrcl Richard Henderson
                   ` (21 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 8eda159dde..6c743a8fbd 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -355,6 +355,21 @@ static void tcg_out_op_rrrc(TCGContext *s, TCGOpcode op,
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrrbb(TCGContext *s, TCGOpcode op, TCGReg r0,
+                             TCGReg r1, TCGReg r2, uint8_t b3, uint8_t b4)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tcg_out_r(s, r1);
+    tcg_out_r(s, r2);
+    tcg_out8(s, b3);
+    tcg_out8(s, b4);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 #if TCG_TARGET_REG_BITS == 32
 static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
                               TCGReg r0, TCGReg r1, TCGReg r2,
@@ -538,7 +553,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
 
     CASE_32_64(deposit)  /* Optional (TCG_TARGET_HAS_deposit_*). */
-        tcg_out_op_t(s, opc);
         {
             TCGArg pos = args[3], len = args[4];
             TCGArg max = opc == INDEX_op_deposit_i32 ? 32 : 64;
@@ -546,13 +560,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
             tcg_debug_assert(pos < max);
             tcg_debug_assert(pos + len <= max);
 
-            tcg_out_r(s, args[0]);
-            tcg_out_r(s, args[1]);
-            tcg_out_r(s, args[2]);
-            tcg_out8(s, pos);
-            tcg_out8(s, len);
+            tcg_out_op_rrrbb(s, opc, args[0], args[1], args[2], pos, len);
         }
-        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
 
     CASE_32_64(brcond)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 75/93] tcg/tci: Split out tcg_out_op_rrcl
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (73 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 74/93] tcg/tci: Split out tcg_out_op_rrrbb Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 76/93] tcg/tci: Split out tcg_out_op_rrrrrr Richard Henderson
                   ` (20 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 6c743a8fbd..8cc63124d4 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -341,6 +341,20 @@ static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrcl(TCGContext *s, TCGOpcode op,
+                            TCGReg r0, TCGReg r1, TCGCond c2, TCGLabel *l3)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tcg_out_r(s, r1);
+    tcg_out8(s, c2);
+    tci_out_label(s, l3);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrrc(TCGContext *s, TCGOpcode op,
                             TCGReg r0, TCGReg r1, TCGReg r2, TCGCond c3)
 {
@@ -565,12 +579,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
 
     CASE_32_64(brcond)
-        tcg_out_op_t(s, opc);
-        tcg_out_r(s, args[0]);
-        tcg_out_r(s, args[1]);
-        tcg_out8(s, args[2]);           /* condition */
-        tci_out_label(s, arg_label(args[3]));
-        old_code_ptr[1] = s->code_ptr - old_code_ptr;
+        tcg_out_op_rrcl(s, opc, args[0], args[1], args[2], arg_label(args[3]));
         break;
 
     CASE_32_64(neg)      /* Optional (TCG_TARGET_HAS_neg_*). */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 76/93] tcg/tci: Split out tcg_out_op_rrrrrr
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (74 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 75/93] tcg/tci: Split out tcg_out_op_rrcl Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 77/93] tcg/tci: Split out tcg_out_op_rrrr Richard Henderson
                   ` (19 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 8cc63124d4..f7595fbd65 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -401,6 +401,23 @@ static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
 
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
+
+static void tcg_out_op_rrrrrr(TCGContext *s, TCGOpcode op,
+                              TCGReg r0, TCGReg r1, TCGReg r2,
+                              TCGReg r3, TCGReg r4, TCGReg r5)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tcg_out_r(s, r1);
+    tcg_out_r(s, r2);
+    tcg_out_r(s, r3);
+    tcg_out_r(s, r4);
+    tcg_out_r(s, r5);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
 #endif
 
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
@@ -601,14 +618,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 #if TCG_TARGET_REG_BITS == 32
     case INDEX_op_add2_i32:
     case INDEX_op_sub2_i32:
-        tcg_out_op_t(s, opc);
-        tcg_out_r(s, args[0]);
-        tcg_out_r(s, args[1]);
-        tcg_out_r(s, args[2]);
-        tcg_out_r(s, args[3]);
-        tcg_out_r(s, args[4]);
-        tcg_out_r(s, args[5]);
-        old_code_ptr[1] = s->code_ptr - old_code_ptr;
+        tcg_out_op_rrrrrr(s, opc, args[0], args[1], args[2],
+                          args[3], args[4], args[5]);
         break;
     case INDEX_op_brcond2_i32:
         tcg_out_op_t(s, opc);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 77/93] tcg/tci: Split out tcg_out_op_rrrr
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (75 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 76/93] tcg/tci: Split out tcg_out_op_rrrrrr Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 78/93] tcg/tci: Split out tcg_out_op_rrrrcl Richard Henderson
                   ` (18 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index f7595fbd65..c2bbd85130 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -385,6 +385,20 @@ static void tcg_out_op_rrrbb(TCGContext *s, TCGOpcode op, TCGReg r0,
 }
 
 #if TCG_TARGET_REG_BITS == 32
+static void tcg_out_op_rrrr(TCGContext *s, TCGOpcode op,
+                            TCGReg r0, TCGReg r1, TCGReg r2, TCGReg r3)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tcg_out_r(s, r1);
+    tcg_out_r(s, r2);
+    tcg_out_r(s, r3);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
                               TCGReg r0, TCGReg r1, TCGReg r2,
                               TCGReg r3, TCGReg r4, TCGCond c5)
@@ -632,12 +646,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
     case INDEX_op_mulu2_i32:
-        tcg_out_op_t(s, opc);
-        tcg_out_r(s, args[0]);
-        tcg_out_r(s, args[1]);
-        tcg_out_r(s, args[2]);
-        tcg_out_r(s, args[3]);
-        old_code_ptr[1] = s->code_ptr - old_code_ptr;
+        tcg_out_op_rrrr(s, opc, args[0], args[1], args[2], args[3]);
         break;
 #endif
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 78/93] tcg/tci: Split out tcg_out_op_rrrrcl
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (76 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 77/93] tcg/tci: Split out tcg_out_op_rrrr Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:44 ` [PATCH v2 79/93] tcg/tci: Split out tcg_out_op_{rrm,rrrm,rrrrm} Richard Henderson
                   ` (17 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index c2bbd85130..fb4aacaca3 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -399,6 +399,23 @@ static void tcg_out_op_rrrr(TCGContext *s, TCGOpcode op,
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrrrcl(TCGContext *s, TCGOpcode op,
+                              TCGReg r0, TCGReg r1, TCGReg r2, TCGReg r3,
+                              TCGCond c4, TCGLabel *l5)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tcg_out_r(s, r1);
+    tcg_out_r(s, r2);
+    tcg_out_r(s, r3);
+    tcg_out8(s, c4);
+    tci_out_label(s, l5);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
                               TCGReg r0, TCGReg r1, TCGReg r2,
                               TCGReg r3, TCGReg r4, TCGCond c5)
@@ -636,14 +653,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
                           args[3], args[4], args[5]);
         break;
     case INDEX_op_brcond2_i32:
-        tcg_out_op_t(s, opc);
-        tcg_out_r(s, args[0]);
-        tcg_out_r(s, args[1]);
-        tcg_out_r(s, args[2]);
-        tcg_out_r(s, args[3]);
-        tcg_out8(s, args[4]);           /* condition */
-        tci_out_label(s, arg_label(args[5]));
-        old_code_ptr[1] = s->code_ptr - old_code_ptr;
+        tcg_out_op_rrrrcl(s, opc, args[0], args[1], args[2],
+                          args[3], args[4], arg_label(args[5]));
         break;
     case INDEX_op_mulu2_i32:
         tcg_out_op_rrrr(s, opc, args[0], args[1], args[2], args[3]);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 79/93] tcg/tci: Split out tcg_out_op_{rrm,rrrm,rrrrm}
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (77 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 78/93] tcg/tci: Split out tcg_out_op_rrrrcl Richard Henderson
@ 2021-02-04  1:44 ` Richard Henderson
  2021-02-04  1:54 ` Richard Henderson
                   ` (16 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 70 ++++++++++++++++++++++++++++++----------
 1 file changed, 53 insertions(+), 17 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index fb4aacaca3..f93772f01f 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -314,6 +314,19 @@ static void tcg_out_op_rr(TCGContext *s, TCGOpcode op, TCGReg r0, TCGReg r1)
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrm(TCGContext *s, TCGOpcode op,
+                           TCGReg r0, TCGReg r1, TCGArg m2)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tcg_out_r(s, r1);
+    tcg_out32(s, m2);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrr(TCGContext *s, TCGOpcode op,
                            TCGReg r0, TCGReg r1, TCGReg r2)
 {
@@ -369,6 +382,20 @@ static void tcg_out_op_rrrc(TCGContext *s, TCGOpcode op,
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrrm(TCGContext *s, TCGOpcode op,
+                            TCGReg r0, TCGReg r1, TCGReg r2, TCGArg m3)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tcg_out_r(s, r1);
+    tcg_out_r(s, r2);
+    tcg_out32(s, m3);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrrbb(TCGContext *s, TCGOpcode op, TCGReg r0,
                              TCGReg r1, TCGReg r2, uint8_t b3, uint8_t b4)
 {
@@ -384,6 +411,21 @@ static void tcg_out_op_rrrbb(TCGContext *s, TCGOpcode op, TCGReg r0,
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrrrm(TCGContext *s, TCGOpcode op, TCGReg r0,
+                             TCGReg r1, TCGReg r2, TCGReg r3, TCGArg m4)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tcg_out_r(s, r1);
+    tcg_out_r(s, r2);
+    tcg_out_r(s, r3);
+    tcg_out32(s, m4);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 #if TCG_TARGET_REG_BITS == 32
 static void tcg_out_op_rrrr(TCGContext *s, TCGOpcode op,
                             TCGReg r0, TCGReg r1, TCGReg r2, TCGReg r3)
@@ -663,29 +705,23 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 
     case INDEX_op_qemu_ld_i32:
     case INDEX_op_qemu_st_i32:
-        tcg_out_op_t(s, opc);
-        tcg_out_r(s, *args++);
-        tcg_out_r(s, *args++);
-        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-            tcg_out_r(s, *args++);
+        if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
+            tcg_out_op_rrm(s, opc, args[0], args[1], args[2]);
+        } else {
+            tcg_out_op_rrrm(s, opc, args[0], args[1], args[2], args[3]);
         }
-        tcg_out32(s, *args++);
-        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
 
     case INDEX_op_qemu_ld_i64:
     case INDEX_op_qemu_st_i64:
-        tcg_out_op_t(s, opc);
-        tcg_out_r(s, *args++);
-        if (TCG_TARGET_REG_BITS == 32) {
-            tcg_out_r(s, *args++);
+        if (TCG_TARGET_REG_BITS == 64) {
+            tcg_out_op_rrm(s, opc, args[0], args[1], args[2]);
+        } else if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
+            tcg_out_op_rrrm(s, opc, args[0], args[1], args[2], args[3]);
+        } else {
+            tcg_out_op_rrrrm(s, opc, args[0], args[1],
+                             args[2], args[3], args[4]);
         }
-        tcg_out_r(s, *args++);
-        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-            tcg_out_r(s, *args++);
-        }
-        tcg_out32(s, *args++);
-        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
 
     case INDEX_op_mb:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 79/93] tcg/tci: Split out tcg_out_op_{rrm,rrrm,rrrrm}
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (78 preceding siblings ...)
  2021-02-04  1:44 ` [PATCH v2 79/93] tcg/tci: Split out tcg_out_op_{rrm,rrrm,rrrrm} Richard Henderson
@ 2021-02-04  1:54 ` Richard Henderson
  2021-02-04 14:54   ` Alex Bennée
  2021-02-04  1:54 ` [PATCH v2 80/93] tcg/tci: Split out tcg_out_op_v Richard Henderson
                   ` (15 subsequent siblings)
  95 siblings, 1 reply; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 70 ++++++++++++++++++++++++++++++----------
 1 file changed, 53 insertions(+), 17 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index fb4aacaca3..f93772f01f 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -314,6 +314,19 @@ static void tcg_out_op_rr(TCGContext *s, TCGOpcode op, TCGReg r0, TCGReg r1)
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrm(TCGContext *s, TCGOpcode op,
+                           TCGReg r0, TCGReg r1, TCGArg m2)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tcg_out_r(s, r1);
+    tcg_out32(s, m2);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrr(TCGContext *s, TCGOpcode op,
                            TCGReg r0, TCGReg r1, TCGReg r2)
 {
@@ -369,6 +382,20 @@ static void tcg_out_op_rrrc(TCGContext *s, TCGOpcode op,
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrrm(TCGContext *s, TCGOpcode op,
+                            TCGReg r0, TCGReg r1, TCGReg r2, TCGArg m3)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tcg_out_r(s, r1);
+    tcg_out_r(s, r2);
+    tcg_out32(s, m3);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrrbb(TCGContext *s, TCGOpcode op, TCGReg r0,
                              TCGReg r1, TCGReg r2, uint8_t b3, uint8_t b4)
 {
@@ -384,6 +411,21 @@ static void tcg_out_op_rrrbb(TCGContext *s, TCGOpcode op, TCGReg r0,
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrrrm(TCGContext *s, TCGOpcode op, TCGReg r0,
+                             TCGReg r1, TCGReg r2, TCGReg r3, TCGArg m4)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tcg_out_r(s, r1);
+    tcg_out_r(s, r2);
+    tcg_out_r(s, r3);
+    tcg_out32(s, m4);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 #if TCG_TARGET_REG_BITS == 32
 static void tcg_out_op_rrrr(TCGContext *s, TCGOpcode op,
                             TCGReg r0, TCGReg r1, TCGReg r2, TCGReg r3)
@@ -663,29 +705,23 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 
     case INDEX_op_qemu_ld_i32:
     case INDEX_op_qemu_st_i32:
-        tcg_out_op_t(s, opc);
-        tcg_out_r(s, *args++);
-        tcg_out_r(s, *args++);
-        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-            tcg_out_r(s, *args++);
+        if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
+            tcg_out_op_rrm(s, opc, args[0], args[1], args[2]);
+        } else {
+            tcg_out_op_rrrm(s, opc, args[0], args[1], args[2], args[3]);
         }
-        tcg_out32(s, *args++);
-        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
 
     case INDEX_op_qemu_ld_i64:
     case INDEX_op_qemu_st_i64:
-        tcg_out_op_t(s, opc);
-        tcg_out_r(s, *args++);
-        if (TCG_TARGET_REG_BITS == 32) {
-            tcg_out_r(s, *args++);
+        if (TCG_TARGET_REG_BITS == 64) {
+            tcg_out_op_rrm(s, opc, args[0], args[1], args[2]);
+        } else if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
+            tcg_out_op_rrrm(s, opc, args[0], args[1], args[2], args[3]);
+        } else {
+            tcg_out_op_rrrrm(s, opc, args[0], args[1],
+                             args[2], args[3], args[4]);
         }
-        tcg_out_r(s, *args++);
-        if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-            tcg_out_r(s, *args++);
-        }
-        tcg_out32(s, *args++);
-        old_code_ptr[1] = s->code_ptr - old_code_ptr;
         break;
 
     case INDEX_op_mb:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 80/93] tcg/tci: Split out tcg_out_op_v
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (79 preceding siblings ...)
  2021-02-04  1:54 ` Richard Henderson
@ 2021-02-04  1:54 ` Richard Henderson
  2021-02-04  1:55 ` [PATCH v2 81/93] tcg/tci: Split out tcg_out_op_np Richard Henderson
                   ` (14 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index f93772f01f..eeafec6d44 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -303,6 +303,15 @@ static void tcg_out_op_p(TCGContext *s, TCGOpcode op, void *p0)
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_v(TCGContext *s, TCGOpcode op)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rr(TCGContext *s, TCGOpcode op, TCGReg r0, TCGReg r1)
 {
     uint8_t *old_code_ptr = s->code_ptr;
@@ -587,8 +596,6 @@ static void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
 static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
                        const int *const_args)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
-
     switch (opc) {
     case INDEX_op_exit_tb:
         tcg_out_op_p(s, opc, (void *)args[0]);
@@ -725,8 +732,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
 
     case INDEX_op_mb:
-        tcg_out_op_t(s, opc);
-        old_code_ptr[1] = s->code_ptr - old_code_ptr;
+        tcg_out_op_v(s, opc);
         break;
 
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 81/93] tcg/tci: Split out tcg_out_op_np
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (80 preceding siblings ...)
  2021-02-04  1:54 ` [PATCH v2 80/93] tcg/tci: Split out tcg_out_op_v Richard Henderson
@ 2021-02-04  1:55 ` Richard Henderson
  2021-02-04  1:55 ` [PATCH v2 82/93] tcg/tci: Split out tcg_out_op_r[iI] Richard Henderson
                   ` (13 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index eeafec6d44..e4a5872b2a 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -312,6 +312,18 @@ static void tcg_out_op_v(TCGContext *s, TCGOpcode op)
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_np(TCGContext *s, TCGOpcode op,
+                          uint8_t n0, const void *p1)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out8(s, n0);
+    tcg_out_i(s, (uintptr_t)p1);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rr(TCGContext *s, TCGOpcode op, TCGReg r0, TCGReg r1)
 {
     uint8_t *old_code_ptr = s->code_ptr;
@@ -561,7 +573,6 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 
 static void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
     const TCGHelperInfo *info;
     uint8_t which;
 
@@ -574,11 +585,8 @@ static void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
         tcg_debug_assert(info->cif->rtype->size == 8);
         which = 2;
     }
-    tcg_out_op_t(s, INDEX_op_call);
-    tcg_out8(s, which);
-    tcg_out_i(s, (uintptr_t)info);
 
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    tcg_out_op_np(s, INDEX_op_call, which, info);
 }
 
 #if TCG_TARGET_REG_BITS == 64
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 82/93] tcg/tci: Split out tcg_out_op_r[iI]
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (81 preceding siblings ...)
  2021-02-04  1:55 ` [PATCH v2 81/93] tcg/tci: Split out tcg_out_op_np Richard Henderson
@ 2021-02-04  1:55 ` Richard Henderson
  2021-02-04  1:55 ` [PATCH v2 83/93] tcg/tci: Reserve r13 for a temporary Richard Henderson
                   ` (12 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.c.inc | 50 ++++++++++++++++++++++++++++------------
 1 file changed, 35 insertions(+), 15 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index e4a5872b2a..c2d2bd24d7 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -324,6 +324,31 @@ static void tcg_out_op_np(TCGContext *s, TCGOpcode op,
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_ri(TCGContext *s, TCGOpcode op, TCGReg r0, int32_t i1)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tcg_out32(s, i1);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
+#if TCG_TARGET_REG_BITS == 64
+static void tcg_out_op_rI(TCGContext *s, TCGOpcode op,
+                          TCGReg r0, uint64_t i1)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tcg_out64(s, i1);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+#endif
+
 static void tcg_out_op_rr(TCGContext *s, TCGOpcode op, TCGReg r0, TCGReg r1)
 {
     uint8_t *old_code_ptr = s->code_ptr;
@@ -550,25 +575,20 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 }
 
 static void tcg_out_movi(TCGContext *s, TCGType type,
-                         TCGReg t0, tcg_target_long arg)
+                         TCGReg ret, tcg_target_long arg)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
-    uint32_t arg32 = arg;
-    if (type == TCG_TYPE_I32 || arg == arg32) {
-        tcg_out_op_t(s, INDEX_op_tci_movi_i32);
-        tcg_out_r(s, t0);
-        tcg_out32(s, arg32);
-    } else {
-        tcg_debug_assert(type == TCG_TYPE_I64);
+    switch (type) {
+    case TCG_TYPE_I32:
+        tcg_out_op_ri(s, INDEX_op_tci_movi_i32, ret, arg);
+        break;
 #if TCG_TARGET_REG_BITS == 64
-        tcg_out_op_t(s, INDEX_op_tci_movi_i64);
-        tcg_out_r(s, t0);
-        tcg_out64(s, arg);
-#else
-        TODO();
+    case TCG_TYPE_I64:
+        tcg_out_op_rI(s, INDEX_op_tci_movi_i64, ret, arg);
+        break;
 #endif
+    default:
+        g_assert_not_reached();
     }
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
 static void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 83/93] tcg/tci: Reserve r13 for a temporary
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (82 preceding siblings ...)
  2021-02-04  1:55 ` [PATCH v2 82/93] tcg/tci: Split out tcg_out_op_r[iI] Richard Henderson
@ 2021-02-04  1:55 ` Richard Henderson
  2021-02-04  1:56 ` [PATCH v2 84/93] tcg/tci: Emit setcond before brcond Richard Henderson
                   ` (11 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

We're about to adjust the offset range on host memory ops,
and the format of branches.  Both will require a temporary.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.h     | 1 +
 tcg/tci/tcg-target.c.inc | 1 +
 2 files changed, 2 insertions(+)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 4df10e2e83..1558a6e44e 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -155,6 +155,7 @@ typedef enum {
     TCG_REG_R14,
     TCG_REG_R15,
 
+    TCG_REG_TMP = TCG_REG_R13,
     TCG_AREG0 = TCG_REG_R14,
     TCG_REG_CALL_STACK = TCG_REG_R15,
 } TCGReg;
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index c2d2bd24d7..b29e75425d 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -829,6 +829,7 @@ static void tcg_target_init(TCGContext *s)
         MAKE_64BIT_MASK(TCG_REG_R0, 64 / TCG_TARGET_REG_BITS);
 
     s->reserved_regs = 0;
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
 
     /* The call arguments come first, followed by the temp storage. */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 84/93] tcg/tci: Emit setcond before brcond
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (83 preceding siblings ...)
  2021-02-04  1:55 ` [PATCH v2 83/93] tcg/tci: Reserve r13 for a temporary Richard Henderson
@ 2021-02-04  1:56 ` Richard Henderson
  2021-02-04  1:56 ` [PATCH v2 85/93] tcg/tci: Remove tci_write_reg Richard Henderson
                   ` (10 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

The encoding planned for tci does not have enough room for
brcond2, with 4 registers and a condition as input as well
as the label.  Resolve the condition into TCG_REG_TMP, and
relax brcond to one register plus a label, considering the
condition to always be reg != 0.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c                | 68 ++++++++++------------------------------
 tcg/tci/tcg-target.c.inc | 52 +++++++++++-------------------
 2 files changed, 35 insertions(+), 85 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index d27db9f720..e7268b13e1 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -141,6 +141,16 @@ static void tci_args_nl(const uint8_t **tb_ptr, uint8_t *n0, void **l1)
     check_size(start, tb_ptr);
 }
 
+static void tci_args_rl(const uint8_t **tb_ptr, TCGReg *r0, void **l1)
+{
+    const uint8_t *start = *tb_ptr;
+
+    *r0 = tci_read_r(tb_ptr);
+    *l1 = (void *)tci_read_label(tb_ptr);
+
+    check_size(start, tb_ptr);
+}
+
 static void tci_args_rr(const uint8_t **tb_ptr,
                         TCGReg *r0, TCGReg *r1)
 {
@@ -212,19 +222,6 @@ static void tci_args_rrs(const uint8_t **tb_ptr,
     check_size(start, tb_ptr);
 }
 
-static void tci_args_rrcl(const uint8_t **tb_ptr,
-                          TCGReg *r0, TCGReg *r1, TCGCond *c2, void **l3)
-{
-    const uint8_t *start = *tb_ptr;
-
-    *r0 = tci_read_r(tb_ptr);
-    *r1 = tci_read_r(tb_ptr);
-    *c2 = tci_read_b(tb_ptr);
-    *l3 = (void *)tci_read_label(tb_ptr);
-
-    check_size(start, tb_ptr);
-}
-
 static void tci_args_rrrc(const uint8_t **tb_ptr,
                           TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGCond *c3)
 {
@@ -293,21 +290,6 @@ static void tci_args_rrrr(const uint8_t **tb_ptr,
     check_size(start, tb_ptr);
 }
 
-static void tci_args_rrrrcl(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
-                            TCGReg *r2, TCGReg *r3, TCGCond *c4, void **l5)
-{
-    const uint8_t *start = *tb_ptr;
-
-    *r0 = tci_read_r(tb_ptr);
-    *r1 = tci_read_r(tb_ptr);
-    *r2 = tci_read_r(tb_ptr);
-    *r3 = tci_read_r(tb_ptr);
-    *c4 = tci_read_b(tb_ptr);
-    *l5 = (void *)tci_read_label(tb_ptr);
-
-    check_size(start, tb_ptr);
-}
-
 static void tci_args_rrrrrc(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
                             TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGCond *c5)
 {
@@ -723,8 +705,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             break;
 #endif
         case INDEX_op_brcond_i32:
-            tci_args_rrcl(&tb_ptr, &r0, &r1, &condition, &ptr);
-            if (tci_compare32(regs[r0], regs[r1], condition)) {
+            tci_args_rl(&tb_ptr, &r0, &ptr);
+            if ((uint32_t)regs[r0]) {
                 tb_ptr = ptr;
             }
             break;
@@ -741,15 +723,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             T2 = tci_uint64(regs[r5], regs[r4]);
             tci_write_reg64(regs, r1, r0, T1 - T2);
             break;
-        case INDEX_op_brcond2_i32:
-            tci_args_rrrrcl(&tb_ptr, &r0, &r1, &r2, &r3, &condition, &ptr);
-            T1 = tci_uint64(regs[r1], regs[r0]);
-            T2 = tci_uint64(regs[r3], regs[r2]);
-            if (tci_compare64(T1, T2, condition)) {
-                tb_ptr = ptr;
-                continue;
-            }
-            break;
         case INDEX_op_mulu2_i32:
             tci_args_rrrr(&tb_ptr, &r0, &r1, &r2, &r3);
             tci_write_reg64(regs, r1, r0, (uint64_t)regs[r2] * regs[r3]);
@@ -877,8 +850,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             break;
 #endif
         case INDEX_op_brcond_i64:
-            tci_args_rrcl(&tb_ptr, &r0, &r1, &condition, &ptr);
-            if (tci_compare64(regs[r0], regs[r1], condition)) {
+            tci_args_rl(&tb_ptr, &r0, &ptr);
+            if (regs[r0]) {
                 tb_ptr = ptr;
             }
             break;
@@ -1188,9 +1161,9 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
 
     case INDEX_op_brcond_i32:
     case INDEX_op_brcond_i64:
-        tci_args_rrcl(&tb_ptr, &r0, &r1, &c, &ptr);
-        info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%p",
-                           op_name, str_r(r0), str_r(r1), str_c(c), ptr);
+        tci_args_rl(&tb_ptr, &r0, &ptr);
+        info->fprintf_func(info->stream, "%-12s  %s,0,ne,%p",
+                           op_name, str_r(r0), ptr);
         break;
 
     case INDEX_op_setcond_i32:
@@ -1315,13 +1288,6 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
                            str_r(r3), str_r(r4), str_c(c));
         break;
 
-    case INDEX_op_brcond2_i32:
-        tci_args_rrrrcl(&tb_ptr, &r0, &r1, &r2, &r3, &c, &ptr);
-        info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s,%s,%p",
-                           op_name, str_r(r0), str_r(r1),
-                           str_r(r2), str_r(r3), str_c(c), ptr);
-        break;
-
     case INDEX_op_mulu2_i32:
         tci_args_rrrr(&tb_ptr, &r0, &r1, &r2, &r3);
         info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s",
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index b29e75425d..e06d4e9380 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -349,6 +349,17 @@ static void tcg_out_op_rI(TCGContext *s, TCGOpcode op,
 }
 #endif
 
+static void tcg_out_op_rl(TCGContext *s, TCGOpcode op, TCGReg r0, TCGLabel *l1)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, op);
+    tcg_out_r(s, r0);
+    tci_out_label(s, l1);
+
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rr(TCGContext *s, TCGOpcode op, TCGReg r0, TCGReg r1)
 {
     uint8_t *old_code_ptr = s->code_ptr;
@@ -400,20 +411,6 @@ static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
-static void tcg_out_op_rrcl(TCGContext *s, TCGOpcode op,
-                            TCGReg r0, TCGReg r1, TCGCond c2, TCGLabel *l3)
-{
-    uint8_t *old_code_ptr = s->code_ptr;
-
-    tcg_out_op_t(s, op);
-    tcg_out_r(s, r0);
-    tcg_out_r(s, r1);
-    tcg_out8(s, c2);
-    tci_out_label(s, l3);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
-}
-
 static void tcg_out_op_rrrc(TCGContext *s, TCGOpcode op,
                             TCGReg r0, TCGReg r1, TCGReg r2, TCGCond c3)
 {
@@ -487,23 +484,6 @@ static void tcg_out_op_rrrr(TCGContext *s, TCGOpcode op,
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
-static void tcg_out_op_rrrrcl(TCGContext *s, TCGOpcode op,
-                              TCGReg r0, TCGReg r1, TCGReg r2, TCGReg r3,
-                              TCGCond c4, TCGLabel *l5)
-{
-    uint8_t *old_code_ptr = s->code_ptr;
-
-    tcg_out_op_t(s, op);
-    tcg_out_r(s, r0);
-    tcg_out_r(s, r1);
-    tcg_out_r(s, r2);
-    tcg_out_r(s, r3);
-    tcg_out8(s, c4);
-    tci_out_label(s, l5);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
-}
-
 static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
                               TCGReg r0, TCGReg r1, TCGReg r2,
                               TCGReg r3, TCGReg r4, TCGCond c5)
@@ -704,7 +684,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
 
     CASE_32_64(brcond)
-        tcg_out_op_rrcl(s, opc, args[0], args[1], args[2], arg_label(args[3]));
+        tcg_out_op_rrrc(s, (opc == INDEX_op_brcond_i32
+                            ? INDEX_op_setcond_i32 : INDEX_op_setcond_i64),
+                        TCG_REG_TMP, args[0], args[1], args[2]);
+        tcg_out_op_rl(s, opc, TCG_REG_TMP, arg_label(args[3]));
         break;
 
     CASE_32_64(neg)      /* Optional (TCG_TARGET_HAS_neg_*). */
@@ -730,8 +713,9 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
                           args[3], args[4], args[5]);
         break;
     case INDEX_op_brcond2_i32:
-        tcg_out_op_rrrrcl(s, opc, args[0], args[1], args[2],
-                          args[3], args[4], arg_label(args[5]));
+        tcg_out_op_rrrrrc(s, INDEX_op_setcond2_i32, TCG_REG_TMP,
+                          args[0], args[1], args[2], args[3], args[4]);
+        tcg_out_op_rl(s, INDEX_op_brcond_i32, TCG_REG_TMP, arg_label(args[5]));
         break;
     case INDEX_op_mulu2_i32:
         tcg_out_op_rrrr(s, opc, args[0], args[1], args[2], args[3]);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 85/93] tcg/tci: Remove tci_write_reg
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (84 preceding siblings ...)
  2021-02-04  1:56 ` [PATCH v2 84/93] tcg/tci: Emit setcond before brcond Richard Henderson
@ 2021-02-04  1:56 ` Richard Henderson
  2021-02-04  1:56 ` [PATCH v2 86/93] tcg/tci: Change encoding to uint32_t units Richard Henderson
                   ` (9 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Inline it into its one caller, tci_write_reg64.
Drop the asserts that are redundant with tcg_read_r.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index e7268b13e1..4f81cbb904 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -36,20 +36,11 @@
 
 __thread uintptr_t tci_tb_ptr;
 
-static void
-tci_write_reg(tcg_target_ulong *regs, TCGReg index, tcg_target_ulong value)
-{
-    tci_assert(index < TCG_TARGET_NB_REGS);
-    tci_assert(index != TCG_AREG0);
-    tci_assert(index != TCG_REG_CALL_STACK);
-    regs[index] = value;
-}
-
 static void tci_write_reg64(tcg_target_ulong *regs, uint32_t high_index,
                             uint32_t low_index, uint64_t value)
 {
-    tci_write_reg(regs, low_index, value);
-    tci_write_reg(regs, high_index, value >> 32);
+    regs[low_index] = value;
+    regs[high_index] = value >> 32;
 }
 
 /* Create a 64 bit value from two 32 bit values. */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 86/93] tcg/tci: Change encoding to uint32_t units
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (85 preceding siblings ...)
  2021-02-04  1:56 ` [PATCH v2 85/93] tcg/tci: Remove tci_write_reg Richard Henderson
@ 2021-02-04  1:56 ` Richard Henderson
  2021-02-04  1:57 ` [PATCH v2 87/93] tcg/tci: Implement goto_ptr Richard Henderson
                   ` (8 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

This removes all of the problems with unaligned accesses
to the bytecode stream.

With an 8-bit opcode at the bottom, we have 24 bits remaining,
which are generally split into 6 4-bit slots.  This fits well
with the maximum length opcodes, e.g. INDEX_op_add2_i386, which
have 6 register operands.

We have, in previous patches, rearranged things such that there
are no operations with a label, which have more than one other
operand.  Which leaves us with a 20-bit field in which to encode
a label, giving us a maximum TB size of 512k -- easily large.

Change the INDEX_op_tci_movi_{i32,i64} opcodes to tci_mov[il].
The former puts the immediate in the upper 20 bits of the insn,
like we do for the label displacement.  The later uses a label
to reference an entry in the constant pool.  Thus, in the worst
case we still have a single memory reference for any constant,
but now the constants are out-of-line of the bytecode and can
be shared between different moves saving space.

Change INDEX_op_call to use a label to reference a pair of
pointers in the constant pool.  This removes the only slightly
dodgy link with the layout of struct TCGHelperInfo.

The re-encode cannot be done in pieces.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-opc.h    |   4 +-
 tcg/tci/tcg-target.h     |   3 +-
 tcg/tci.c                | 534 +++++++++++++++------------------------
 tcg/tci/tcg-target.c.inc | 386 +++++++++++++---------------
 tcg/tci/README           |  20 +-
 5 files changed, 380 insertions(+), 567 deletions(-)

diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index bbb0884af8..5bbec858aa 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -277,8 +277,8 @@ DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT)
 
 #ifdef TCG_TARGET_INTERPRETER
 /* These opcodes are only for use between the tci generator and interpreter. */
-DEF(tci_movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
-DEF(tci_movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
+DEF(tci_movi, 1, 0, 1, TCG_OPF_NOT_PRESENT)
+DEF(tci_movl, 1, 0, 1, TCG_OPF_NOT_PRESENT)
 #endif
 
 #undef TLADDR_ARGS
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 1558a6e44e..d953f2ead3 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -41,7 +41,7 @@
 #define TCG_TARGET_H
 
 #define TCG_TARGET_INTERPRETER 1
-#define TCG_TARGET_INSN_UNIT_SIZE 1
+#define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 32
 
 #if UINTPTR_MAX == UINT32_MAX
@@ -165,6 +165,7 @@ typedef enum {
 #define TCG_TARGET_STACK_ALIGN          8
 
 #define HAVE_TCG_QEMU_TB_EXEC
+#define TCG_TARGET_NEED_POOL_LABELS
 
 /* We could notice __i386__ or __s390x__ and reduce the barriers depending
    on the host.  But if you want performance, you use the normal backend.
diff --git a/tcg/tci.c b/tcg/tci.c
index 4f81cbb904..c4f0a7e82d 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -49,49 +49,6 @@ static uint64_t tci_uint64(uint32_t high, uint32_t low)
     return ((uint64_t)high << 32) + low;
 }
 
-/* Read constant byte from bytecode. */
-static uint8_t tci_read_b(const uint8_t **tb_ptr)
-{
-    return *(tb_ptr[0]++);
-}
-
-/* Read register number from bytecode. */
-static TCGReg tci_read_r(const uint8_t **tb_ptr)
-{
-    uint8_t regno = tci_read_b(tb_ptr);
-    tci_assert(regno < TCG_TARGET_NB_REGS);
-    return regno;
-}
-
-/* Read constant (native size) from bytecode. */
-static tcg_target_ulong tci_read_i(const uint8_t **tb_ptr)
-{
-    tcg_target_ulong value = *(const tcg_target_ulong *)(*tb_ptr);
-    *tb_ptr += sizeof(value);
-    return value;
-}
-
-/* Read unsigned constant (32 bit) from bytecode. */
-static uint32_t tci_read_i32(const uint8_t **tb_ptr)
-{
-    uint32_t value = *(const uint32_t *)(*tb_ptr);
-    *tb_ptr += sizeof(value);
-    return value;
-}
-
-/* Read signed constant (32 bit) from bytecode. */
-static int32_t tci_read_s32(const uint8_t **tb_ptr)
-{
-    int32_t value = *(const int32_t *)(*tb_ptr);
-    *tb_ptr += sizeof(value);
-    return value;
-}
-
-static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
-{
-    return tci_read_i(tb_ptr);
-}
-
 /*
  * Load sets of arguments all at once.  The naming convention is:
  *   tci_args_<arguments>
@@ -106,209 +63,128 @@ static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
  *   s = signed ldst offset
  */
 
-static void check_size(const uint8_t *start, const uint8_t **tb_ptr)
+static void tci_args_l(uint32_t insn, const void *tb_ptr, void **l0)
 {
-    const uint8_t *old_code_ptr = start - 2;
-    uint8_t op_size = old_code_ptr[1];
-    tci_assert(*tb_ptr == old_code_ptr + op_size);
+    int diff = sextract32(insn, 12, 20);
+    *l0 = diff ? (void *)tb_ptr + diff : NULL;
 }
 
-static void tci_args_l(const uint8_t **tb_ptr, void **l0)
+static void tci_args_nl(uint32_t insn, const void *tb_ptr,
+                        uint8_t *n0, void **l1)
 {
-    const uint8_t *start = *tb_ptr;
-
-    *l0 = (void *)tci_read_label(tb_ptr);
-
-    check_size(start, tb_ptr);
+    *n0 = extract32(insn, 8, 4);
+    *l1 = sextract32(insn, 12, 20) + (void *)tb_ptr;
 }
 
-static void tci_args_nl(const uint8_t **tb_ptr, uint8_t *n0, void **l1)
+static void tci_args_rl(uint32_t insn, const void *tb_ptr,
+                        TCGReg *r0, void **l1)
 {
-    const uint8_t *start = *tb_ptr;
-
-    *n0 = tci_read_b(tb_ptr);
-    *l1 = (void *)tci_read_label(tb_ptr);
-
-    check_size(start, tb_ptr);
+    *r0 = extract32(insn, 8, 4);
+    *l1 = sextract32(insn, 12, 20) + (void *)tb_ptr;
 }
 
-static void tci_args_rl(const uint8_t **tb_ptr, TCGReg *r0, void **l1)
+static void tci_args_rr(uint32_t insn, TCGReg *r0, TCGReg *r1)
 {
-    const uint8_t *start = *tb_ptr;
-
-    *r0 = tci_read_r(tb_ptr);
-    *l1 = (void *)tci_read_label(tb_ptr);
-
-    check_size(start, tb_ptr);
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
 }
 
-static void tci_args_rr(const uint8_t **tb_ptr,
-                        TCGReg *r0, TCGReg *r1)
+static void tci_args_ri(uint32_t insn, TCGReg *r0, tcg_target_ulong *i1)
 {
-    const uint8_t *start = *tb_ptr;
-
-    *r0 = tci_read_r(tb_ptr);
-    *r1 = tci_read_r(tb_ptr);
-
-    check_size(start, tb_ptr);
+    *r0 = extract32(insn, 8, 4);
+    *i1 = sextract32(insn, 12, 20);
 }
 
-static void tci_args_ri(const uint8_t **tb_ptr,
-                        TCGReg *r0, tcg_target_ulong *i1)
+static void tci_args_rrm(uint32_t insn, TCGReg *r0,
+                         TCGReg *r1, TCGMemOpIdx *m2)
 {
-    const uint8_t *start = *tb_ptr;
-
-    *r0 = tci_read_r(tb_ptr);
-    *i1 = tci_read_i32(tb_ptr);
-
-    check_size(start, tb_ptr);
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *m2 = extract32(insn, 20, 12);
 }
 
-#if TCG_TARGET_REG_BITS == 64
-static void tci_args_rI(const uint8_t **tb_ptr,
-                        TCGReg *r0, tcg_target_ulong *i1)
+static void tci_args_rrr(uint32_t insn, TCGReg *r0, TCGReg *r1, TCGReg *r2)
 {
-    const uint8_t *start = *tb_ptr;
-
-    *r0 = tci_read_r(tb_ptr);
-    *i1 = tci_read_i(tb_ptr);
-
-    check_size(start, tb_ptr);
-}
-#endif
-
-static void tci_args_rrm(const uint8_t **tb_ptr,
-                         TCGReg *r0, TCGReg *r1, TCGMemOpIdx *m2)
-{
-    const uint8_t *start = *tb_ptr;
-
-    *r0 = tci_read_r(tb_ptr);
-    *r1 = tci_read_r(tb_ptr);
-    *m2 = tci_read_i32(tb_ptr);
-
-    check_size(start, tb_ptr);
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *r2 = extract32(insn, 16, 4);
 }
 
-static void tci_args_rrr(const uint8_t **tb_ptr,
-                         TCGReg *r0, TCGReg *r1, TCGReg *r2)
+static void tci_args_rrs(uint32_t insn, TCGReg *r0, TCGReg *r1, int32_t *i2)
 {
-    const uint8_t *start = *tb_ptr;
-
-    *r0 = tci_read_r(tb_ptr);
-    *r1 = tci_read_r(tb_ptr);
-    *r2 = tci_read_r(tb_ptr);
-
-    check_size(start, tb_ptr);
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *i2 = sextract32(insn, 16, 16);
 }
 
-static void tci_args_rrs(const uint8_t **tb_ptr,
-                         TCGReg *r0, TCGReg *r1, int32_t *i2)
-{
-    const uint8_t *start = *tb_ptr;
-
-    *r0 = tci_read_r(tb_ptr);
-    *r1 = tci_read_r(tb_ptr);
-    *i2 = tci_read_s32(tb_ptr);
-
-    check_size(start, tb_ptr);
-}
-
-static void tci_args_rrrc(const uint8_t **tb_ptr,
+static void tci_args_rrrc(uint32_t insn,
                           TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGCond *c3)
 {
-    const uint8_t *start = *tb_ptr;
-
-    *r0 = tci_read_r(tb_ptr);
-    *r1 = tci_read_r(tb_ptr);
-    *r2 = tci_read_r(tb_ptr);
-    *c3 = tci_read_b(tb_ptr);
-
-    check_size(start, tb_ptr);
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *r2 = extract32(insn, 16, 4);
+    *c3 = extract32(insn, 20, 4);
 }
 
-static void tci_args_rrrm(const uint8_t **tb_ptr,
+static void tci_args_rrrm(uint32_t insn,
                           TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGMemOpIdx *m3)
 {
-    const uint8_t *start = *tb_ptr;
-
-    *r0 = tci_read_r(tb_ptr);
-    *r1 = tci_read_r(tb_ptr);
-    *r2 = tci_read_r(tb_ptr);
-    *m3 = tci_read_i32(tb_ptr);
-
-    check_size(start, tb_ptr);
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *r2 = extract32(insn, 16, 4);
+    *m3 = extract32(insn, 20, 12);
 }
 
-static void tci_args_rrrbb(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
+static void tci_args_rrrbb(uint32_t insn, TCGReg *r0, TCGReg *r1,
                            TCGReg *r2, uint8_t *i3, uint8_t *i4)
 {
-    const uint8_t *start = *tb_ptr;
-
-    *r0 = tci_read_r(tb_ptr);
-    *r1 = tci_read_r(tb_ptr);
-    *r2 = tci_read_r(tb_ptr);
-    *i3 = tci_read_b(tb_ptr);
-    *i4 = tci_read_b(tb_ptr);
-
-    check_size(start, tb_ptr);
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *r2 = extract32(insn, 16, 4);
+    *i3 = extract32(insn, 20, 6);
+    *i4 = extract32(insn, 26, 6);
 }
 
-static void tci_args_rrrrm(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
-                           TCGReg *r2, TCGReg *r3, TCGMemOpIdx *m4)
+static void tci_args_rrrrr(uint32_t insn, TCGReg *r0, TCGReg *r1,
+                           TCGReg *r2, TCGReg *r3, TCGReg *r4)
 {
-    const uint8_t *start = *tb_ptr;
-
-    *r0 = tci_read_r(tb_ptr);
-    *r1 = tci_read_r(tb_ptr);
-    *r2 = tci_read_r(tb_ptr);
-    *r3 = tci_read_r(tb_ptr);
-    *m4 = tci_read_i32(tb_ptr);
-
-    check_size(start, tb_ptr);
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *r2 = extract32(insn, 16, 4);
+    *r3 = extract32(insn, 20, 4);
+    *r4 = extract32(insn, 24, 4);
 }
 
 #if TCG_TARGET_REG_BITS == 32
-static void tci_args_rrrr(const uint8_t **tb_ptr,
+static void tci_args_rrrr(uint32_t insn,
                           TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGReg *r3)
 {
-    const uint8_t *start = *tb_ptr;
-
-    *r0 = tci_read_r(tb_ptr);
-    *r1 = tci_read_r(tb_ptr);
-    *r2 = tci_read_r(tb_ptr);
-    *r3 = tci_read_r(tb_ptr);
-
-    check_size(start, tb_ptr);
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *r2 = extract32(insn, 16, 4);
+    *r3 = extract32(insn, 20, 4);
 }
 
-static void tci_args_rrrrrc(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
+static void tci_args_rrrrrc(uint32_t insn, TCGReg *r0, TCGReg *r1,
                             TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGCond *c5)
 {
-    const uint8_t *start = *tb_ptr;
-
-    *r0 = tci_read_r(tb_ptr);
-    *r1 = tci_read_r(tb_ptr);
-    *r2 = tci_read_r(tb_ptr);
-    *r3 = tci_read_r(tb_ptr);
-    *r4 = tci_read_r(tb_ptr);
-    *c5 = tci_read_b(tb_ptr);
-
-    check_size(start, tb_ptr);
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *r2 = extract32(insn, 16, 4);
+    *r3 = extract32(insn, 20, 4);
+    *r4 = extract32(insn, 24, 4);
+    *c5 = extract32(insn, 28, 4);
 }
 
-static void tci_args_rrrrrr(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
+static void tci_args_rrrrrr(uint32_t insn, TCGReg *r0, TCGReg *r1,
                             TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGReg *r5)
 {
-    const uint8_t *start = *tb_ptr;
-
-    *r0 = tci_read_r(tb_ptr);
-    *r1 = tci_read_r(tb_ptr);
-    *r2 = tci_read_r(tb_ptr);
-    *r3 = tci_read_r(tb_ptr);
-    *r4 = tci_read_r(tb_ptr);
-    *r5 = tci_read_r(tb_ptr);
-
-    check_size(start, tb_ptr);
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *r2 = extract32(insn, 16, 4);
+    *r3 = extract32(insn, 20, 4);
+    *r4 = extract32(insn, 24, 4);
+    *r5 = extract32(insn, 28, 4);
 }
 #endif
 
@@ -464,7 +340,7 @@ static bool tci_compare64(uint64_t u0, uint64_t u1, TCGCond condition)
 uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
                                             const void *v_tb_ptr)
 {
-    const uint8_t *tb_ptr = v_tb_ptr;
+    const uint32_t *tb_ptr = v_tb_ptr;
     tcg_target_ulong regs[TCG_TARGET_NB_REGS];
     uint64_t stack[(TCG_STATIC_CALL_ARGS_SIZE + TCG_STATIC_FRAME_SIZE)
                    / sizeof(uint64_t)];
@@ -476,8 +352,9 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
     tci_assert(tb_ptr);
 
     for (;;) {
-        TCGOpcode opc = tb_ptr[0];
-        TCGReg r0, r1, r2, r3;
+        uint32_t insn;
+        TCGOpcode opc;
+        TCGReg r0, r1, r2, r3, r4;
         tcg_target_ulong t1;
         TCGCond condition;
         target_ulong taddr;
@@ -485,23 +362,19 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
         uint32_t tmp32;
         uint64_t tmp64;
 #if TCG_TARGET_REG_BITS == 32
-        TCGReg r4, r5;
+        TCGReg r5;
         uint64_t T1, T2;
 #endif
         TCGMemOpIdx oi;
         int32_t ofs;
         void *ptr;
 
-        /* Skip opcode and size entry. */
-        tb_ptr += 2;
+        insn = *tb_ptr++;
+        opc = extract32(insn, 0, 8);
 
         switch (opc) {
         case INDEX_op_call:
-            /*
-             * We are passed a pointer to the TCGHelperInfo, which contains
-             * the function pointer followed by the ffi_cif pointer.
-             */
-            tci_args_nl(&tb_ptr, &len, &ptr);
+            tci_args_nl(insn, tb_ptr, &len, &ptr);
 
             /* Helper functions may need to access the "return address" */
             tci_tb_ptr = (uintptr_t)tb_ptr;
@@ -519,8 +392,9 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             }
 
             /*
-             * Call the helper function.  Any result winds up
-             * "left-aligned" in the stack[0] slot.
+             * We are passed a pointer into the constant pool, which
+             * contains a pair of the function pointer and the cif pointer.
+             * Any result winds up "left-aligned" in the stack[0] slot.
              */
             {
                 void **pptr = ptr;
@@ -545,76 +419,80 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             break;
 
         case INDEX_op_br:
-            tci_args_l(&tb_ptr, &ptr);
+            tci_args_l(insn, tb_ptr, &ptr);
             tb_ptr = ptr;
             continue;
         case INDEX_op_setcond_i32:
-            tci_args_rrrc(&tb_ptr, &r0, &r1, &r2, &condition);
+            tci_args_rrrc(insn, &r0, &r1, &r2, &condition);
             regs[r0] = tci_compare32(regs[r1], regs[r2], condition);
             break;
 #if TCG_TARGET_REG_BITS == 32
         case INDEX_op_setcond2_i32:
-            tci_args_rrrrrc(&tb_ptr, &r0, &r1, &r2, &r3, &r4, &condition);
+            tci_args_rrrrrc(insn, &r0, &r1, &r2, &r3, &r4, &condition);
             T1 = tci_uint64(regs[r2], regs[r1]);
             T2 = tci_uint64(regs[r4], regs[r3]);
             regs[r0] = tci_compare64(T1, T2, condition);
             break;
 #elif TCG_TARGET_REG_BITS == 64
         case INDEX_op_setcond_i64:
-            tci_args_rrrc(&tb_ptr, &r0, &r1, &r2, &condition);
+            tci_args_rrrc(insn, &r0, &r1, &r2, &condition);
             regs[r0] = tci_compare64(regs[r1], regs[r2], condition);
             break;
 #endif
         CASE_32_64(mov)
-            tci_args_rr(&tb_ptr, &r0, &r1);
+            tci_args_rr(insn, &r0, &r1);
             regs[r0] = regs[r1];
             break;
-        case INDEX_op_tci_movi_i32:
-            tci_args_ri(&tb_ptr, &r0, &t1);
+        case INDEX_op_tci_movi:
+            tci_args_ri(insn, &r0, &t1);
             regs[r0] = t1;
             break;
+        case INDEX_op_tci_movl:
+            tci_args_rl(insn, tb_ptr, &r0, &ptr);
+            regs[r0] = *(tcg_target_ulong *)ptr;
+            break;
 
             /* Load/store operations (32 bit). */
 
         CASE_32_64(ld8u)
-            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            tci_args_rrs(insn, &r0, &r1, &ofs);
             ptr = (void *)(regs[r1] + ofs);
             regs[r0] = *(uint8_t *)ptr;
             break;
         CASE_32_64(ld8s)
-            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            tci_args_rrs(insn, &r0, &r1, &ofs);
             ptr = (void *)(regs[r1] + ofs);
             regs[r0] = *(int8_t *)ptr;
             break;
         CASE_32_64(ld16u)
-            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            tci_args_rrs(insn, &r0, &r1, &ofs);
             ptr = (void *)(regs[r1] + ofs);
             regs[r0] = *(uint16_t *)ptr;
             break;
         CASE_32_64(ld16s)
-            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            tci_args_rrs(insn, &r0, &r1, &ofs);
             ptr = (void *)(regs[r1] + ofs);
             regs[r0] = *(int16_t *)ptr;
             break;
         case INDEX_op_ld_i32:
         CASE_64(ld32u)
-            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            tci_args_rrs(insn, &r0, &r1, &ofs);
             ptr = (void *)(regs[r1] + ofs);
             regs[r0] = *(uint32_t *)ptr;
             break;
         CASE_32_64(st8)
-            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            tci_args_rrs(insn, &r0, &r1, &ofs);
             ptr = (void *)(regs[r1] + ofs);
             *(uint8_t *)ptr = regs[r0];
             break;
         CASE_32_64(st16)
-            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            tci_args_rrs(insn, &r0, &r1, &ofs);
             ptr = (void *)(regs[r1] + ofs);
             *(uint16_t *)ptr = regs[r0];
             break;
         case INDEX_op_st_i32:
         CASE_64(st32)
-            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            tci_args_rrs(insn, &r0, &r1, &ofs);
             ptr = (void *)(regs[r1] + ofs);
             *(uint32_t *)ptr = regs[r0];
             break;
@@ -622,171 +500,166 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             /* Arithmetic operations (mixed 32/64 bit). */
 
         CASE_32_64(add)
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = regs[r1] + regs[r2];
             break;
         CASE_32_64(sub)
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = regs[r1] - regs[r2];
             break;
         CASE_32_64(mul)
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = regs[r1] * regs[r2];
             break;
         CASE_32_64(and)
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = regs[r1] & regs[r2];
             break;
         CASE_32_64(or)
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = regs[r1] | regs[r2];
             break;
         CASE_32_64(xor)
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = regs[r1] ^ regs[r2];
             break;
 
             /* Arithmetic operations (32 bit). */
 
         case INDEX_op_div_i32:
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = (int32_t)regs[r1] / (int32_t)regs[r2];
             break;
         case INDEX_op_divu_i32:
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = (uint32_t)regs[r1] / (uint32_t)regs[r2];
             break;
         case INDEX_op_rem_i32:
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = (int32_t)regs[r1] % (int32_t)regs[r2];
             break;
         case INDEX_op_remu_i32:
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = (uint32_t)regs[r1] % (uint32_t)regs[r2];
             break;
 
             /* Shift/rotate operations (32 bit). */
 
         case INDEX_op_shl_i32:
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = (uint32_t)regs[r1] << (regs[r2] & 31);
             break;
         case INDEX_op_shr_i32:
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = (uint32_t)regs[r1] >> (regs[r2] & 31);
             break;
         case INDEX_op_sar_i32:
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = (int32_t)regs[r1] >> (regs[r2] & 31);
             break;
 #if TCG_TARGET_HAS_rot_i32
         case INDEX_op_rotl_i32:
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = rol32(regs[r1], regs[r2] & 31);
             break;
         case INDEX_op_rotr_i32:
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = ror32(regs[r1], regs[r2] & 31);
             break;
 #endif
 #if TCG_TARGET_HAS_deposit_i32
         case INDEX_op_deposit_i32:
-            tci_args_rrrbb(&tb_ptr, &r0, &r1, &r2, &pos, &len);
+            tci_args_rrrbb(insn, &r0, &r1, &r2, &pos, &len);
             regs[r0] = deposit32(regs[r1], pos, len, regs[r2]);
             break;
 #endif
         case INDEX_op_brcond_i32:
-            tci_args_rl(&tb_ptr, &r0, &ptr);
+            tci_args_rl(insn, tb_ptr, &r0, &ptr);
             if ((uint32_t)regs[r0]) {
                 tb_ptr = ptr;
             }
             break;
 #if TCG_TARGET_REG_BITS == 32
         case INDEX_op_add2_i32:
-            tci_args_rrrrrr(&tb_ptr, &r0, &r1, &r2, &r3, &r4, &r5);
+            tci_args_rrrrrr(insn, &r0, &r1, &r2, &r3, &r4, &r5);
             T1 = tci_uint64(regs[r3], regs[r2]);
             T2 = tci_uint64(regs[r5], regs[r4]);
             tci_write_reg64(regs, r1, r0, T1 + T2);
             break;
         case INDEX_op_sub2_i32:
-            tci_args_rrrrrr(&tb_ptr, &r0, &r1, &r2, &r3, &r4, &r5);
+            tci_args_rrrrrr(insn, &r0, &r1, &r2, &r3, &r4, &r5);
             T1 = tci_uint64(regs[r3], regs[r2]);
             T2 = tci_uint64(regs[r5], regs[r4]);
             tci_write_reg64(regs, r1, r0, T1 - T2);
             break;
         case INDEX_op_mulu2_i32:
-            tci_args_rrrr(&tb_ptr, &r0, &r1, &r2, &r3);
+            tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
             tci_write_reg64(regs, r1, r0, (uint64_t)regs[r2] * regs[r3]);
             break;
 #endif /* TCG_TARGET_REG_BITS == 32 */
 #if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
         CASE_32_64(ext8s)
-            tci_args_rr(&tb_ptr, &r0, &r1);
+            tci_args_rr(insn, &r0, &r1);
             regs[r0] = (int8_t)regs[r1];
             break;
 #endif
 #if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
         CASE_32_64(ext16s)
-            tci_args_rr(&tb_ptr, &r0, &r1);
+            tci_args_rr(insn, &r0, &r1);
             regs[r0] = (int16_t)regs[r1];
             break;
 #endif
 #if TCG_TARGET_HAS_ext8u_i32 || TCG_TARGET_HAS_ext8u_i64
         CASE_32_64(ext8u)
-            tci_args_rr(&tb_ptr, &r0, &r1);
+            tci_args_rr(insn, &r0, &r1);
             regs[r0] = (uint8_t)regs[r1];
             break;
 #endif
 #if TCG_TARGET_HAS_ext16u_i32 || TCG_TARGET_HAS_ext16u_i64
         CASE_32_64(ext16u)
-            tci_args_rr(&tb_ptr, &r0, &r1);
+            tci_args_rr(insn, &r0, &r1);
             regs[r0] = (uint16_t)regs[r1];
             break;
 #endif
 #if TCG_TARGET_HAS_bswap16_i32 || TCG_TARGET_HAS_bswap16_i64
         CASE_32_64(bswap16)
-            tci_args_rr(&tb_ptr, &r0, &r1);
+            tci_args_rr(insn, &r0, &r1);
             regs[r0] = bswap16(regs[r1]);
             break;
 #endif
 #if TCG_TARGET_HAS_bswap32_i32 || TCG_TARGET_HAS_bswap32_i64
         CASE_32_64(bswap32)
-            tci_args_rr(&tb_ptr, &r0, &r1);
+            tci_args_rr(insn, &r0, &r1);
             regs[r0] = bswap32(regs[r1]);
             break;
 #endif
 #if TCG_TARGET_HAS_not_i32 || TCG_TARGET_HAS_not_i64
         CASE_32_64(not)
-            tci_args_rr(&tb_ptr, &r0, &r1);
+            tci_args_rr(insn, &r0, &r1);
             regs[r0] = ~regs[r1];
             break;
 #endif
 #if TCG_TARGET_HAS_neg_i32 || TCG_TARGET_HAS_neg_i64
         CASE_32_64(neg)
-            tci_args_rr(&tb_ptr, &r0, &r1);
+            tci_args_rr(insn, &r0, &r1);
             regs[r0] = -regs[r1];
             break;
 #endif
 #if TCG_TARGET_REG_BITS == 64
-        case INDEX_op_tci_movi_i64:
-            tci_args_rI(&tb_ptr, &r0, &t1);
-            regs[r0] = t1;
-            break;
-
             /* Load/store operations (64 bit). */
 
         case INDEX_op_ld32s_i64:
-            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            tci_args_rrs(insn, &r0, &r1, &ofs);
             ptr = (void *)(regs[r1] + ofs);
             regs[r0] = *(int32_t *)ptr;
             break;
         case INDEX_op_ld_i64:
-            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            tci_args_rrs(insn, &r0, &r1, &ofs);
             ptr = (void *)(regs[r1] + ofs);
             regs[r0] = *(uint64_t *)ptr;
             break;
         case INDEX_op_st_i64:
-            tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+            tci_args_rrs(insn, &r0, &r1, &ofs);
             ptr = (void *)(regs[r1] + ofs);
             *(uint64_t *)ptr = regs[r0];
             break;
@@ -794,71 +667,71 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             /* Arithmetic operations (64 bit). */
 
         case INDEX_op_div_i64:
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = (int64_t)regs[r1] / (int64_t)regs[r2];
             break;
         case INDEX_op_divu_i64:
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = (uint64_t)regs[r1] / (uint64_t)regs[r2];
             break;
         case INDEX_op_rem_i64:
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = (int64_t)regs[r1] % (int64_t)regs[r2];
             break;
         case INDEX_op_remu_i64:
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = (uint64_t)regs[r1] % (uint64_t)regs[r2];
             break;
 
             /* Shift/rotate operations (64 bit). */
 
         case INDEX_op_shl_i64:
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = regs[r1] << (regs[r2] & 63);
             break;
         case INDEX_op_shr_i64:
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = regs[r1] >> (regs[r2] & 63);
             break;
         case INDEX_op_sar_i64:
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = (int64_t)regs[r1] >> (regs[r2] & 63);
             break;
 #if TCG_TARGET_HAS_rot_i64
         case INDEX_op_rotl_i64:
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = rol64(regs[r1], regs[r2] & 63);
             break;
         case INDEX_op_rotr_i64:
-            tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+            tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = ror64(regs[r1], regs[r2] & 63);
             break;
 #endif
 #if TCG_TARGET_HAS_deposit_i64
         case INDEX_op_deposit_i64:
-            tci_args_rrrbb(&tb_ptr, &r0, &r1, &r2, &pos, &len);
+            tci_args_rrrbb(insn, &r0, &r1, &r2, &pos, &len);
             regs[r0] = deposit64(regs[r1], pos, len, regs[r2]);
             break;
 #endif
         case INDEX_op_brcond_i64:
-            tci_args_rl(&tb_ptr, &r0, &ptr);
+            tci_args_rl(insn, tb_ptr, &r0, &ptr);
             if (regs[r0]) {
                 tb_ptr = ptr;
             }
             break;
         case INDEX_op_ext32s_i64:
         case INDEX_op_ext_i32_i64:
-            tci_args_rr(&tb_ptr, &r0, &r1);
+            tci_args_rr(insn, &r0, &r1);
             regs[r0] = (int32_t)regs[r1];
             break;
         case INDEX_op_ext32u_i64:
         case INDEX_op_extu_i32_i64:
-            tci_args_rr(&tb_ptr, &r0, &r1);
+            tci_args_rr(insn, &r0, &r1);
             regs[r0] = (uint32_t)regs[r1];
             break;
 #if TCG_TARGET_HAS_bswap64_i64
         case INDEX_op_bswap64_i64:
-            tci_args_rr(&tb_ptr, &r0, &r1);
+            tci_args_rr(insn, &r0, &r1);
             regs[r0] = bswap64(regs[r1]);
             break;
 #endif
@@ -867,20 +740,20 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             /* QEMU specific operations. */
 
         case INDEX_op_exit_tb:
-            tci_args_l(&tb_ptr, &ptr);
+            tci_args_l(insn, tb_ptr, &ptr);
             return (uintptr_t)ptr;
 
         case INDEX_op_goto_tb:
-            tci_args_l(&tb_ptr, &ptr);
+            tci_args_l(insn, tb_ptr, &ptr);
             tb_ptr = *(void **)ptr;
             break;
 
         case INDEX_op_qemu_ld_i32:
             if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
-                tci_args_rrm(&tb_ptr, &r0, &r1, &oi);
+                tci_args_rrm(insn, &r0, &r1, &oi);
                 taddr = regs[r1];
             } else {
-                tci_args_rrrm(&tb_ptr, &r0, &r1, &r2, &oi);
+                tci_args_rrrm(insn, &r0, &r1, &r2, &oi);
                 taddr = tci_uint64(regs[r2], regs[r1]);
             }
             switch (get_memop(oi) & (MO_BSWAP | MO_SSIZE)) {
@@ -916,14 +789,15 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         case INDEX_op_qemu_ld_i64:
             if (TCG_TARGET_REG_BITS == 64) {
-                tci_args_rrm(&tb_ptr, &r0, &r1, &oi);
+                tci_args_rrm(insn, &r0, &r1, &oi);
                 taddr = regs[r1];
             } else if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
-                tci_args_rrrm(&tb_ptr, &r0, &r1, &r2, &oi);
+                tci_args_rrrm(insn, &r0, &r1, &r2, &oi);
                 taddr = regs[r2];
             } else {
-                tci_args_rrrrm(&tb_ptr, &r0, &r1, &r2, &r3, &oi);
+                tci_args_rrrrr(insn, &r0, &r1, &r2, &r3, &r4);
                 taddr = tci_uint64(regs[r3], regs[r2]);
+                oi = regs[r4];
             }
             switch (get_memop(oi) & (MO_BSWAP | MO_SSIZE)) {
             case MO_UB:
@@ -974,10 +848,10 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         case INDEX_op_qemu_st_i32:
             if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
-                tci_args_rrm(&tb_ptr, &r0, &r1, &oi);
+                tci_args_rrm(insn, &r0, &r1, &oi);
                 taddr = regs[r1];
             } else {
-                tci_args_rrrm(&tb_ptr, &r0, &r1, &r2, &oi);
+                tci_args_rrrm(insn, &r0, &r1, &r2, &oi);
                 taddr = tci_uint64(regs[r2], regs[r1]);
             }
             tmp32 = regs[r0];
@@ -1004,16 +878,17 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
         case INDEX_op_qemu_st_i64:
             if (TCG_TARGET_REG_BITS == 64) {
-                tci_args_rrm(&tb_ptr, &r0, &r1, &oi);
+                tci_args_rrm(insn, &r0, &r1, &oi);
                 taddr = regs[r1];
                 tmp64 = regs[r0];
             } else {
                 if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
-                    tci_args_rrrm(&tb_ptr, &r0, &r1, &r2, &oi);
+                    tci_args_rrrm(insn, &r0, &r1, &r2, &oi);
                     taddr = regs[r2];
                 } else {
-                    tci_args_rrrrm(&tb_ptr, &r0, &r1, &r2, &r3, &oi);
+                    tci_args_rrrrr(insn, &r0, &r1, &r2, &r3, &r4);
                     taddr = tci_uint64(regs[r3], regs[r2]);
+                    oi = regs[r4];
                 }
                 tmp64 = tci_uint64(regs[r1], regs[r0]);
             }
@@ -1097,14 +972,14 @@ static const char *str_c(TCGCond c)
 /* Disassemble TCI bytecode. */
 int print_insn_tci(bfd_vma addr, disassemble_info *info)
 {
-    uint8_t buf[256];
-    int length, status;
+    const uint32_t *tb_ptr = (const void *)(uintptr_t)addr;
     const TCGOpDef *def;
     const char *op_name;
+    uint32_t insn;
     TCGOpcode op;
-    TCGReg r0, r1, r2, r3;
+    TCGReg r0, r1, r2, r3, r4;
 #if TCG_TARGET_REG_BITS == 32
-    TCGReg r4, r5;
+    TCGReg r5;
 #endif
     tcg_target_ulong i1;
     int32_t s2;
@@ -1112,71 +987,54 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
     TCGMemOpIdx oi;
     uint8_t pos, len;
     void *ptr;
-    const uint8_t *tb_ptr;
 
-    status = info->read_memory_func(addr, buf, 2, info);
-    if (status != 0) {
-        info->memory_error_func(status, addr, info);
-        return -1;
-    }
-    op = buf[0];
-    length = buf[1];
+    /* TCI is always the host, so we don't need to load indirect. */
+    insn = *tb_ptr++;
 
-    if (length < 2) {
-        info->fprintf_func(info->stream, "invalid length %d", length);
-        return 1;
-    }
-
-    status = info->read_memory_func(addr + 2, buf + 2, length - 2, info);
-    if (status != 0) {
-        info->memory_error_func(status, addr + 2, info);
-        return -1;
-    }
+    info->fprintf_func(info->stream, "%08x  ", insn);
 
+    op = extract32(insn, 0, 8);
     def = &tcg_op_defs[op];
     op_name = def->name;
-    tb_ptr = buf + 2;
 
     switch (op) {
     case INDEX_op_br:
     case INDEX_op_exit_tb:
     case INDEX_op_goto_tb:
-        tci_args_l(&tb_ptr, &ptr);
+        tci_args_l(insn, tb_ptr, &ptr);
         info->fprintf_func(info->stream, "%-12s  %p", op_name, ptr);
         break;
 
     case INDEX_op_call:
-        tci_args_nl(&tb_ptr, &len, &ptr);
+        tci_args_nl(insn, tb_ptr, &len, &ptr);
         info->fprintf_func(info->stream, "%-12s  %d,%p", op_name, len, ptr);
         break;
 
     case INDEX_op_brcond_i32:
     case INDEX_op_brcond_i64:
-        tci_args_rl(&tb_ptr, &r0, &ptr);
+        tci_args_rl(insn, tb_ptr, &r0, &ptr);
         info->fprintf_func(info->stream, "%-12s  %s,0,ne,%p",
                            op_name, str_r(r0), ptr);
         break;
 
     case INDEX_op_setcond_i32:
     case INDEX_op_setcond_i64:
-        tci_args_rrrc(&tb_ptr, &r0, &r1, &r2, &c);
+        tci_args_rrrc(insn, &r0, &r1, &r2, &c);
         info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s",
                            op_name, str_r(r0), str_r(r1), str_r(r2), str_c(c));
         break;
 
-    case INDEX_op_tci_movi_i32:
-        tci_args_ri(&tb_ptr, &r0, &i1);
+    case INDEX_op_tci_movi:
+        tci_args_ri(insn, &r0, &i1);
         info->fprintf_func(info->stream, "%-12s  %s,0x%" TCG_PRIlx "",
                            op_name, str_r(r0), i1);
         break;
 
-#if TCG_TARGET_REG_BITS == 64
-    case INDEX_op_tci_movi_i64:
-        tci_args_rI(&tb_ptr, &r0, &i1);
-        info->fprintf_func(info->stream, "%-12s  %s,0x%" TCG_PRIlx "",
-                           op_name, str_r(r0), i1);
+    case INDEX_op_tci_movl:
+        tci_args_rl(insn, tb_ptr, &r0, &ptr);
+        info->fprintf_func(info->stream, "%-12s  %s,%p",
+                           op_name, str_r(r0), ptr);
         break;
-#endif
 
     case INDEX_op_ld8u_i32:
     case INDEX_op_ld8u_i64:
@@ -1197,7 +1055,7 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
     case INDEX_op_st32_i64:
     case INDEX_op_st_i32:
     case INDEX_op_st_i64:
-        tci_args_rrs(&tb_ptr, &r0, &r1, &s2);
+        tci_args_rrs(insn, &r0, &r1, &s2);
         info->fprintf_func(info->stream, "%-12s  %s,%s,%d",
                            op_name, str_r(r0), str_r(r1), s2);
         break;
@@ -1224,7 +1082,7 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
     case INDEX_op_not_i64:
     case INDEX_op_neg_i32:
     case INDEX_op_neg_i64:
-        tci_args_rr(&tb_ptr, &r0, &r1);
+        tci_args_rr(insn, &r0, &r1);
         info->fprintf_func(info->stream, "%-12s  %s,%s",
                            op_name, str_r(r0), str_r(r1));
         break;
@@ -1259,28 +1117,28 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
     case INDEX_op_rotl_i64:
     case INDEX_op_rotr_i32:
     case INDEX_op_rotr_i64:
-        tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+        tci_args_rrr(insn, &r0, &r1, &r2);
         info->fprintf_func(info->stream, "%-12s  %s,%s,%s",
                            op_name, str_r(r0), str_r(r1), str_r(r2));
         break;
 
     case INDEX_op_deposit_i32:
     case INDEX_op_deposit_i64:
-        tci_args_rrrbb(&tb_ptr, &r0, &r1, &r2, &pos, &len);
+        tci_args_rrrbb(insn, &r0, &r1, &r2, &pos, &len);
         info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%d,%d",
                            op_name, str_r(r0), str_r(r1), str_r(r2), pos, len);
         break;
 
 #if TCG_TARGET_REG_BITS == 32
     case INDEX_op_setcond2_i32:
-        tci_args_rrrrrc(&tb_ptr, &r0, &r1, &r2, &r3, &r4, &c);
+        tci_args_rrrrrc(insn, &r0, &r1, &r2, &r3, &r4, &c);
         info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s,%s,%s",
                            op_name, str_r(r0), str_r(r1), str_r(r2),
                            str_r(r3), str_r(r4), str_c(c));
         break;
 
     case INDEX_op_mulu2_i32:
-        tci_args_rrrr(&tb_ptr, &r0, &r1, &r2, &r3);
+        tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
         info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s",
                            op_name, str_r(r0), str_r(r1),
                            str_r(r2), str_r(r3));
@@ -1288,7 +1146,7 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
 
     case INDEX_op_add2_i32:
     case INDEX_op_sub2_i32:
-        tci_args_rrrrrr(&tb_ptr, &r0, &r1, &r2, &r3, &r4, &r5);
+        tci_args_rrrrrr(insn, &r0, &r1, &r2, &r3, &r4, &r5);
         info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s,%s,%s",
                            op_name, str_r(r0), str_r(r1), str_r(r2),
                            str_r(r3), str_r(r4), str_r(r5));
@@ -1306,30 +1164,38 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
         len += DIV_ROUND_UP(TARGET_LONG_BITS, TCG_TARGET_REG_BITS);
         switch (len) {
         case 2:
-            tci_args_rrm(&tb_ptr, &r0, &r1, &oi);
+            tci_args_rrm(insn, &r0, &r1, &oi);
             info->fprintf_func(info->stream, "%-12s  %s,%s,%x",
                                op_name, str_r(r0), str_r(r1), oi);
             break;
         case 3:
-            tci_args_rrrm(&tb_ptr, &r0, &r1, &r2, &oi);
+            tci_args_rrrm(insn, &r0, &r1, &r2, &oi);
             info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%x",
                                op_name, str_r(r0), str_r(r1), str_r(r2), oi);
             break;
         case 4:
-            tci_args_rrrrm(&tb_ptr, &r0, &r1, &r2, &r3, &oi);
-            info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s,%x",
+            tci_args_rrrrr(insn, &r0, &r1, &r2, &r3, &r4);
+            info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s,%s",
                                op_name, str_r(r0), str_r(r1),
-                               str_r(r2), str_r(r3), oi);
+                               str_r(r2), str_r(r3), str_r(r4));
             break;
         default:
             g_assert_not_reached();
         }
         break;
 
+    case 0:
+        /* tcg_out_nop_fill uses zeros */
+        if (insn == 0) {
+            info->fprintf_func(info->stream, "align");
+            break;
+        }
+        /* fall through */
+
     default:
         info->fprintf_func(info->stream, "illegal opcode %d", op);
         break;
     }
 
-    return length;
+    return sizeof(insn);
 }
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index e06d4e9380..0df8384be7 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -22,20 +22,7 @@
  * THE SOFTWARE.
  */
 
-/* TODO list:
- * - See TODO comments in code.
- */
-
-/* Marker for missing code. */
-#define TODO() \
-    do { \
-        fprintf(stderr, "TODO %s:%u: %s()\n", \
-                __FILE__, __LINE__, __func__); \
-        tcg_abort(); \
-    } while (0)
-
-/* Bitfield n...m (in 32 bit value). */
-#define BITS(n, m) (((0xffffffffU << (31 - n)) >> (31 - n + m)) << m)
+#include "../tcg-pool.c.inc"
 
 static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 {
@@ -226,52 +213,16 @@ static const char *const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
-    /* tcg_out_reloc always uses the same type, addend. */
-    tcg_debug_assert(type == sizeof(tcg_target_long));
+    intptr_t diff = value - (intptr_t)(code_ptr + 1);
+
     tcg_debug_assert(addend == 0);
-    tcg_debug_assert(value != 0);
-    if (TCG_TARGET_REG_BITS == 32) {
-        tcg_patch32(code_ptr, value);
-    } else {
-        tcg_patch64(code_ptr, value);
-    }
-    return true;
-}
-
-/* Write value (native size). */
-static void tcg_out_i(TCGContext *s, tcg_target_ulong v)
-{
-    if (TCG_TARGET_REG_BITS == 32) {
-        tcg_out32(s, v);
-    } else {
-        tcg_out64(s, v);
-    }
-}
-
-/* Write opcode. */
-static void tcg_out_op_t(TCGContext *s, TCGOpcode op)
-{
-    tcg_out8(s, op);
-    tcg_out8(s, 0);
-}
-
-/* Write register. */
-static void tcg_out_r(TCGContext *s, TCGArg t0)
-{
-    tcg_debug_assert(t0 < TCG_TARGET_NB_REGS);
-    tcg_out8(s, t0);
-}
-
-/* Write label. */
-static void tci_out_label(TCGContext *s, TCGLabel *label)
-{
-    if (label->has_value) {
-        tcg_out_i(s, label->u.value);
-        tcg_debug_assert(label->u.value);
-    } else {
-        tcg_out_reloc(s, s->code_ptr, sizeof(tcg_target_ulong), label, 0);
-        s->code_ptr += sizeof(tcg_target_ulong);
+    tcg_debug_assert(type == 20);
+
+    if (diff == sextract32(diff, 0, type)) {
+        tcg_patch32(code_ptr, deposit32(*code_ptr, 32 - type, type, diff));
+        return true;
     }
+    return false;
 }
 
 static void stack_bounds_check(TCGReg base, target_long offset)
@@ -285,251 +236,236 @@ static void stack_bounds_check(TCGReg base, target_long offset)
 
 static void tcg_out_op_l(TCGContext *s, TCGOpcode op, TCGLabel *l0)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
+    tcg_insn_unit insn = 0;
 
-    tcg_out_op_t(s, op);
-    tci_out_label(s, l0);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    tcg_out_reloc(s, s->code_ptr, 20, l0, 0);
+    insn = deposit32(insn, 0, 8, op);
+    tcg_out32(s, insn);
 }
 
 static void tcg_out_op_p(TCGContext *s, TCGOpcode op, void *p0)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
+    tcg_insn_unit insn = 0;
+    intptr_t diff;
 
-    tcg_out_op_t(s, op);
-    tcg_out_i(s, (uintptr_t)p0);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    /* Special case for exit_tb: map null -> 0. */
+    if (p0 == NULL) {
+        diff = 0;
+    } else {
+        diff = p0 - (void *)(s->code_ptr + 1);
+        tcg_debug_assert(diff != 0);
+        if (diff != sextract32(diff, 0, 20)) {
+            tcg_raise_tb_overflow(s);
+        }
+    }
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 12, 20, diff);
+    tcg_out32(s, insn);
 }
 
 static void tcg_out_op_v(TCGContext *s, TCGOpcode op)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
-
-    tcg_out_op_t(s, op);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
-}
-
-static void tcg_out_op_np(TCGContext *s, TCGOpcode op,
-                          uint8_t n0, const void *p1)
-{
-    uint8_t *old_code_ptr = s->code_ptr;
-
-    tcg_out_op_t(s, op);
-    tcg_out8(s, n0);
-    tcg_out_i(s, (uintptr_t)p1);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    tcg_out32(s, (uint8_t)op);
 }
 
 static void tcg_out_op_ri(TCGContext *s, TCGOpcode op, TCGReg r0, int32_t i1)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
+    tcg_insn_unit insn = 0;
 
-    tcg_out_op_t(s, op);
-    tcg_out_r(s, r0);
-    tcg_out32(s, i1);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    tcg_debug_assert(i1 == sextract32(i1, 0, 20));
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 20, i1);
+    tcg_out32(s, insn);
 }
 
-#if TCG_TARGET_REG_BITS == 64
-static void tcg_out_op_rI(TCGContext *s, TCGOpcode op,
-                          TCGReg r0, uint64_t i1)
-{
-    uint8_t *old_code_ptr = s->code_ptr;
-
-    tcg_out_op_t(s, op);
-    tcg_out_r(s, r0);
-    tcg_out64(s, i1);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
-}
-#endif
-
 static void tcg_out_op_rl(TCGContext *s, TCGOpcode op, TCGReg r0, TCGLabel *l1)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
+    tcg_insn_unit insn = 0;
 
-    tcg_out_op_t(s, op);
-    tcg_out_r(s, r0);
-    tci_out_label(s, l1);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    tcg_out_reloc(s, s->code_ptr, 20, l1, 0);
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    tcg_out32(s, insn);
 }
 
 static void tcg_out_op_rr(TCGContext *s, TCGOpcode op, TCGReg r0, TCGReg r1)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
+    tcg_insn_unit insn = 0;
 
-    tcg_out_op_t(s, op);
-    tcg_out_r(s, r0);
-    tcg_out_r(s, r1);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    tcg_out32(s, insn);
 }
 
 static void tcg_out_op_rrm(TCGContext *s, TCGOpcode op,
                            TCGReg r0, TCGReg r1, TCGArg m2)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
+    tcg_insn_unit insn = 0;
 
-    tcg_out_op_t(s, op);
-    tcg_out_r(s, r0);
-    tcg_out_r(s, r1);
-    tcg_out32(s, m2);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    tcg_debug_assert(m2 == extract32(m2, 0, 12));
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 20, 12, m2);
+    tcg_out32(s, insn);
 }
 
 static void tcg_out_op_rrr(TCGContext *s, TCGOpcode op,
                            TCGReg r0, TCGReg r1, TCGReg r2)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
+    tcg_insn_unit insn = 0;
 
-    tcg_out_op_t(s, op);
-    tcg_out_r(s, r0);
-    tcg_out_r(s, r1);
-    tcg_out_r(s, r2);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 16, 4, r2);
+    tcg_out32(s, insn);
 }
 
 static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
                            TCGReg r0, TCGReg r1, intptr_t i2)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
+    tcg_insn_unit insn = 0;
 
-    tcg_out_op_t(s, op);
-    tcg_out_r(s, r0);
-    tcg_out_r(s, r1);
-    tcg_debug_assert(i2 == (int32_t)i2);
-    tcg_out32(s, i2);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    tcg_debug_assert(i2 == sextract32(i2, 0, 16));
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 16, 16, i2);
+    tcg_out32(s, insn);
 }
 
 static void tcg_out_op_rrrc(TCGContext *s, TCGOpcode op,
                             TCGReg r0, TCGReg r1, TCGReg r2, TCGCond c3)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
+    tcg_insn_unit insn = 0;
 
-    tcg_out_op_t(s, op);
-    tcg_out_r(s, r0);
-    tcg_out_r(s, r1);
-    tcg_out_r(s, r2);
-    tcg_out8(s, c3);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 16, 4, r2);
+    insn = deposit32(insn, 20, 4, c3);
+    tcg_out32(s, insn);
 }
 
 static void tcg_out_op_rrrm(TCGContext *s, TCGOpcode op,
                             TCGReg r0, TCGReg r1, TCGReg r2, TCGArg m3)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
+    tcg_insn_unit insn = 0;
 
-    tcg_out_op_t(s, op);
-    tcg_out_r(s, r0);
-    tcg_out_r(s, r1);
-    tcg_out_r(s, r2);
-    tcg_out32(s, m3);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    tcg_debug_assert(m3 == extract32(m3, 0, 12));
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 16, 4, r2);
+    insn = deposit32(insn, 20, 12, m3);
+    tcg_out32(s, insn);
 }
 
 static void tcg_out_op_rrrbb(TCGContext *s, TCGOpcode op, TCGReg r0,
                              TCGReg r1, TCGReg r2, uint8_t b3, uint8_t b4)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
+    tcg_insn_unit insn = 0;
 
-    tcg_out_op_t(s, op);
-    tcg_out_r(s, r0);
-    tcg_out_r(s, r1);
-    tcg_out_r(s, r2);
-    tcg_out8(s, b3);
-    tcg_out8(s, b4);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    tcg_debug_assert(b3 == extract32(b3, 0, 6));
+    tcg_debug_assert(b4 == extract32(b4, 0, 6));
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 16, 4, r2);
+    insn = deposit32(insn, 20, 6, b3);
+    insn = deposit32(insn, 26, 6, b4);
+    tcg_out32(s, insn);
 }
 
-static void tcg_out_op_rrrrm(TCGContext *s, TCGOpcode op, TCGReg r0,
-                             TCGReg r1, TCGReg r2, TCGReg r3, TCGArg m4)
+static void tcg_out_op_rrrrr(TCGContext *s, TCGOpcode op, TCGReg r0,
+                             TCGReg r1, TCGReg r2, TCGReg r3, TCGReg r4)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
+    tcg_insn_unit insn = 0;
 
-    tcg_out_op_t(s, op);
-    tcg_out_r(s, r0);
-    tcg_out_r(s, r1);
-    tcg_out_r(s, r2);
-    tcg_out_r(s, r3);
-    tcg_out32(s, m4);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 16, 4, r2);
+    insn = deposit32(insn, 20, 4, r3);
+    insn = deposit32(insn, 24, 4, r4);
+    tcg_out32(s, insn);
 }
 
 #if TCG_TARGET_REG_BITS == 32
 static void tcg_out_op_rrrr(TCGContext *s, TCGOpcode op,
                             TCGReg r0, TCGReg r1, TCGReg r2, TCGReg r3)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
+    tcg_insn_unit insn = 0;
 
-    tcg_out_op_t(s, op);
-    tcg_out_r(s, r0);
-    tcg_out_r(s, r1);
-    tcg_out_r(s, r2);
-    tcg_out_r(s, r3);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 16, 4, r2);
+    insn = deposit32(insn, 20, 4, r3);
+    tcg_out32(s, insn);
 }
 
 static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
                               TCGReg r0, TCGReg r1, TCGReg r2,
                               TCGReg r3, TCGReg r4, TCGCond c5)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
+    tcg_insn_unit insn = 0;
 
-    tcg_out_op_t(s, op);
-    tcg_out_r(s, r0);
-    tcg_out_r(s, r1);
-    tcg_out_r(s, r2);
-    tcg_out_r(s, r3);
-    tcg_out_r(s, r4);
-    tcg_out8(s, c5);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 16, 4, r2);
+    insn = deposit32(insn, 20, 4, r3);
+    insn = deposit32(insn, 24, 4, r4);
+    insn = deposit32(insn, 28, 4, c5);
+    tcg_out32(s, insn);
 }
 
 static void tcg_out_op_rrrrrr(TCGContext *s, TCGOpcode op,
                               TCGReg r0, TCGReg r1, TCGReg r2,
                               TCGReg r3, TCGReg r4, TCGReg r5)
 {
-    uint8_t *old_code_ptr = s->code_ptr;
+    tcg_insn_unit insn = 0;
 
-    tcg_out_op_t(s, op);
-    tcg_out_r(s, r0);
-    tcg_out_r(s, r1);
-    tcg_out_r(s, r2);
-    tcg_out_r(s, r3);
-    tcg_out_r(s, r4);
-    tcg_out_r(s, r5);
-
-    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 16, 4, r2);
+    insn = deposit32(insn, 20, 4, r3);
+    insn = deposit32(insn, 24, 4, r4);
+    insn = deposit32(insn, 28, 4, r5);
+    tcg_out32(s, insn);
 }
 #endif
 
+static void tcg_out_ldst(TCGContext *s, TCGOpcode op, TCGReg val,
+                         TCGReg base, intptr_t offset)
+{
+    stack_bounds_check(base, offset);
+    if (offset != sextract32(offset, 0, 16)) {
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP, offset);
+        tcg_out_op_rrr(s, (TCG_TARGET_REG_BITS == 32
+                           ? INDEX_op_add_i32 : INDEX_op_add_i64),
+                       TCG_REG_TMP, TCG_REG_TMP, base);
+        base = TCG_REG_TMP;
+        offset = 0;
+    }
+    tcg_out_op_rrs(s, op, val, base, offset);
+}
+
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
                        intptr_t offset)
 {
-    stack_bounds_check(base, offset);
     switch (type) {
     case TCG_TYPE_I32:
-        tcg_out_op_rrs(s, INDEX_op_ld_i32, val, base, offset);
+        tcg_out_ldst(s, INDEX_op_ld_i32, val, base, offset);
         break;
 #if TCG_TARGET_REG_BITS == 64
     case TCG_TYPE_I64:
-        tcg_out_op_rrs(s, INDEX_op_ld_i64, val, base, offset);
+        tcg_out_ldst(s, INDEX_op_ld_i64, val, base, offset);
         break;
 #endif
     default:
@@ -559,22 +495,33 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 {
     switch (type) {
     case TCG_TYPE_I32:
-        tcg_out_op_ri(s, INDEX_op_tci_movi_i32, ret, arg);
-        break;
 #if TCG_TARGET_REG_BITS == 64
+        arg = (int32_t)arg;
+        /* fall through */
     case TCG_TYPE_I64:
-        tcg_out_op_rI(s, INDEX_op_tci_movi_i64, ret, arg);
-        break;
 #endif
+        break;
     default:
         g_assert_not_reached();
     }
+
+    if (arg == sextract32(arg, 0, 20)) {
+        tcg_out_op_ri(s, INDEX_op_tci_movi, ret, arg);
+    } else {
+        tcg_insn_unit insn = 0;
+
+        new_pool_label(s, arg, 20, s->code_ptr, 0);
+        insn = deposit32(insn, 0, 8, INDEX_op_tci_movl);
+        insn = deposit32(insn, 8, 4, ret);
+        tcg_out32(s, insn);
+    }
 }
 
 static void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
 {
     const TCGHelperInfo *info;
     uint8_t which;
+    tcg_insn_unit insn = 0;
 
     info = g_hash_table_lookup(helper_table, (gpointer)arg);
     if (info->cif->rtype == &ffi_type_void) {
@@ -586,7 +533,11 @@ static void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
         which = 2;
     }
 
-    tcg_out_op_np(s, INDEX_op_call, which, info);
+    new_pool_l2(s, 20, s->code_ptr, 0,
+                (uintptr_t)info->func, (uintptr_t)info->cif);
+    insn = deposit32(insn, 0, 8, INDEX_op_call);
+    insn = deposit32(insn, 8, 4, which);
+    tcg_out32(s, insn);
 }
 
 #if TCG_TARGET_REG_BITS == 64
@@ -644,8 +595,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_st_i32:
     CASE_64(st32)
     CASE_64(st)
-        stack_bounds_check(args[1], args[2]);
-        tcg_out_op_rrs(s, opc, args[0], args[1], args[2]);
+        tcg_out_ldst(s, opc, args[0], args[1], args[2]);
         break;
 
     CASE_32_64(add)
@@ -738,8 +688,9 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         } else if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
             tcg_out_op_rrrm(s, opc, args[0], args[1], args[2], args[3]);
         } else {
-            tcg_out_op_rrrrm(s, opc, args[0], args[1],
-                             args[2], args[3], args[4]);
+            tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_TMP, args[4]);
+            tcg_out_op_rrrrr(s, opc, args[0], args[1],
+                             args[2], args[3], TCG_REG_TMP);
         }
         break;
 
@@ -787,6 +738,11 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
     return arg_ct->ct & TCG_CT_CONST;
 }
 
+static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
+{
+    memset(p, 0, sizeof(*p) * count);
+}
+
 static void tcg_target_init(TCGContext *s)
 {
 #if defined(CONFIG_DEBUG_TCG_INTERPRETER)
diff --git a/tcg/tci/README b/tcg/tci/README
index 9bb7d7a5d3..f72a40a395 100644
--- a/tcg/tci/README
+++ b/tcg/tci/README
@@ -23,10 +23,12 @@ This is what TCI (Tiny Code Interpreter) does.
 Like each TCG host frontend, TCI implements the code generator in
 tcg-target.c.inc, tcg-target.h. Both files are in directory tcg/tci.
 
-The additional file tcg/tci.c adds the interpreter.
+The additional file tcg/tci.c adds the interpreter and disassembler.
 
-The bytecode consists of opcodes (same numeric values as those used by
-TCG), command length and arguments of variable size and number.
+The bytecode consists of opcodes (with only a few exceptions, with
+the same same numeric values and semantics as used by TCG), and up
+to six arguments packed into a 32-bit integer.  See comments in tci.c
+for details on the encoding.
 
 3) Usage
 
@@ -39,11 +41,6 @@ suggest using this option. Setting it automatically would need
 additional code in configure which must be fixed when new native TCG
 implementations are added.
 
-System emulation should work on any 32 or 64 bit host.
-User mode emulation might work. Maybe a new linker script (*.ld)
-is needed. Byte order might be wrong (on big endian hosts)
-and need fixes in configure.
-
 For hosts with native TCG, the interpreter TCI can be enabled by
 
         configure --enable-tcg-interpreter
@@ -118,13 +115,6 @@ u1 = linux-user-test works
   in the interpreter. These opcodes raise a runtime exception, so it is
   possible to see where code must be added.
 
-* The pseudo code is not optimized and still ugly. For hosts with special
-  alignment requirements, it needs some fixes (maybe aligned bytecode
-  would also improve speed for hosts which support byte alignment).
-
-* A better disassembler for the pseudo code would be nice (a very primitive
-  disassembler is included in tcg-target.c.inc).
-
 * It might be useful to have a runtime option which selects the native TCG
   or TCI, so QEMU would have to include two TCGs. Today, selecting TCI
   is a configure option, so you need two compilations of QEMU.
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 87/93] tcg/tci: Implement goto_ptr
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (86 preceding siblings ...)
  2021-02-04  1:56 ` [PATCH v2 86/93] tcg/tci: Change encoding to uint32_t units Richard Henderson
@ 2021-02-04  1:57 ` Richard Henderson
  2021-02-04  1:57 ` [PATCH v2 88/93] tcg/tci: Implement movcond Richard Henderson
                   ` (7 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

This operation is critical to staying within the interpretation
loop longer, which avoids the overhead of setup and teardown for
many TBs.

The check in tcg_prologue_init is disabled because TCI does
want to use NULL to indicate exit, as opposed to branching to
a real epilogue.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target-con-set.h |  1 +
 tcg/tci/tcg-target.h         |  2 +-
 tcg/tcg.c                    |  2 ++
 tcg/tci.c                    | 19 +++++++++++++++++++
 tcg/tci/tcg-target.c.inc     | 16 ++++++++++++++++
 5 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/tcg/tci/tcg-target-con-set.h b/tcg/tci/tcg-target-con-set.h
index 316730f32c..ae2dc3b844 100644
--- a/tcg/tci/tcg-target-con-set.h
+++ b/tcg/tci/tcg-target-con-set.h
@@ -9,6 +9,7 @@
  * Each operand should be a sequence of constraint letters as defined by
  * tcg-target-con-str.h; the constraint combination is inclusive or.
  */
+C_O0_I1(r)
 C_O0_I2(r, r)
 C_O0_I3(r, r, r)
 C_O0_I4(r, r, r, r)
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index d953f2ead3..17911d3297 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -86,7 +86,7 @@
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
-#define TCG_TARGET_HAS_goto_ptr         0
+#define TCG_TARGET_HAS_goto_ptr         1
 #define TCG_TARGET_HAS_direct_jump      0
 #define TCG_TARGET_HAS_qemu_st8_i32     0
 
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 92aec0d238..ce80adcfbe 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1314,10 +1314,12 @@ void tcg_prologue_init(TCGContext *s)
     }
 #endif
 
+#ifndef CONFIG_TCG_INTERPRETER
     /* Assert that goto_ptr is implemented completely.  */
     if (TCG_TARGET_HAS_goto_ptr) {
         tcg_debug_assert(tcg_code_gen_epilogue != NULL);
     }
+#endif
 }
 
 void tcg_func_start(TCGContext *s)
diff --git a/tcg/tci.c b/tcg/tci.c
index c4f0a7e82d..a6e30d31a9 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -69,6 +69,11 @@ static void tci_args_l(uint32_t insn, const void *tb_ptr, void **l0)
     *l0 = diff ? (void *)tb_ptr + diff : NULL;
 }
 
+static void tci_args_r(uint32_t insn, TCGReg *r0)
+{
+    *r0 = extract32(insn, 8, 4);
+}
+
 static void tci_args_nl(uint32_t insn, const void *tb_ptr,
                         uint8_t *n0, void **l1)
 {
@@ -748,6 +753,15 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tb_ptr = *(void **)ptr;
             break;
 
+        case INDEX_op_goto_ptr:
+            tci_args_r(insn, &r0);
+            ptr = (void *)regs[r0];
+            if (!ptr) {
+                return 0;
+            }
+            tb_ptr = ptr;
+            break;
+
         case INDEX_op_qemu_ld_i32:
             if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
                 tci_args_rrm(insn, &r0, &r1, &oi);
@@ -1005,6 +1019,11 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
         info->fprintf_func(info->stream, "%-12s  %p", op_name, ptr);
         break;
 
+    case INDEX_op_goto_ptr:
+        tci_args_r(insn, &r0);
+        info->fprintf_func(info->stream, "%-12s  %s", op_name, str_r(r0));
+        break;
+
     case INDEX_op_call:
         tci_args_nl(insn, tb_ptr, &len, &ptr);
         info->fprintf_func(info->stream, "%-12s  %d,%p", op_name, len, ptr);
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 0df8384be7..db29bc6e54 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -27,6 +27,9 @@
 static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 {
     switch (op) {
+    case INDEX_op_goto_ptr:
+        return C_O0_I1(r);
+
     case INDEX_op_ld8u_i32:
     case INDEX_op_ld8s_i32:
     case INDEX_op_ld16u_i32:
@@ -263,6 +266,15 @@ static void tcg_out_op_p(TCGContext *s, TCGOpcode op, void *p0)
     tcg_out32(s, insn);
 }
 
+static void tcg_out_op_r(TCGContext *s, TCGOpcode op, TCGReg r0)
+{
+    tcg_insn_unit insn = 0;
+
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    tcg_out32(s, insn);
+}
+
 static void tcg_out_op_v(TCGContext *s, TCGOpcode op)
 {
     tcg_out32(s, (uint8_t)op);
@@ -567,6 +579,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         set_jmp_reset_offset(s, args[0]);
         break;
 
+    case INDEX_op_goto_ptr:
+        tcg_out_op_r(s, opc, args[0]);
+        break;
+
     case INDEX_op_br:
         tcg_out_op_l(s, opc, arg_label(args[0]));
         break;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 88/93] tcg/tci: Implement movcond
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (87 preceding siblings ...)
  2021-02-04  1:57 ` [PATCH v2 87/93] tcg/tci: Implement goto_ptr Richard Henderson
@ 2021-02-04  1:57 ` Richard Henderson
  2021-02-04  1:57 ` [PATCH v2 89/93] tcg/tci: Implement andc, orc, eqv, nand, nor Richard Henderson
                   ` (6 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

When this opcode is not available in the backend, tcg middle-end
will expand this as a series of 5 opcodes.  So implementing this
saves bytecode space.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.h     |  4 ++--
 tcg/tci.c                | 16 +++++++++++++++-
 tcg/tci/tcg-target.c.inc | 10 +++++++---
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 17911d3297..f53773a555 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -82,7 +82,7 @@
 #define TCG_TARGET_HAS_not_i32          1
 #define TCG_TARGET_HAS_orc_i32          0
 #define TCG_TARGET_HAS_rot_i32          1
-#define TCG_TARGET_HAS_movcond_i32      0
+#define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
@@ -119,7 +119,7 @@
 #define TCG_TARGET_HAS_not_i64          1
 #define TCG_TARGET_HAS_orc_i64          0
 #define TCG_TARGET_HAS_rot_i64          1
-#define TCG_TARGET_HAS_movcond_i64      0
+#define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_muls2_i64        0
 #define TCG_TARGET_HAS_add2_i32         0
 #define TCG_TARGET_HAS_sub2_i32         0
diff --git a/tcg/tci.c b/tcg/tci.c
index a6e30d31a9..2a39f8f5a0 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -169,6 +169,7 @@ static void tci_args_rrrr(uint32_t insn,
     *r2 = extract32(insn, 16, 4);
     *r3 = extract32(insn, 20, 4);
 }
+#endif
 
 static void tci_args_rrrrrc(uint32_t insn, TCGReg *r0, TCGReg *r1,
                             TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGCond *c5)
@@ -181,6 +182,7 @@ static void tci_args_rrrrrc(uint32_t insn, TCGReg *r0, TCGReg *r1,
     *c5 = extract32(insn, 28, 4);
 }
 
+#if TCG_TARGET_REG_BITS == 32
 static void tci_args_rrrrrr(uint32_t insn, TCGReg *r0, TCGReg *r1,
                             TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGReg *r5)
 {
@@ -431,6 +433,11 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tci_args_rrrc(insn, &r0, &r1, &r2, &condition);
             regs[r0] = tci_compare32(regs[r1], regs[r2], condition);
             break;
+        case INDEX_op_movcond_i32:
+            tci_args_rrrrrc(insn, &r0, &r1, &r2, &r3, &r4, &condition);
+            tmp32 = tci_compare32(regs[r1], regs[r2], condition);
+            regs[r0] = regs[tmp32 ? r3 : r4];
+            break;
 #if TCG_TARGET_REG_BITS == 32
         case INDEX_op_setcond2_i32:
             tci_args_rrrrrc(insn, &r0, &r1, &r2, &r3, &r4, &condition);
@@ -443,6 +450,11 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tci_args_rrrc(insn, &r0, &r1, &r2, &condition);
             regs[r0] = tci_compare64(regs[r1], regs[r2], condition);
             break;
+        case INDEX_op_movcond_i64:
+            tci_args_rrrrrc(insn, &r0, &r1, &r2, &r3, &r4, &condition);
+            tmp32 = tci_compare64(regs[r1], regs[r2], condition);
+            regs[r0] = regs[tmp32 ? r3 : r4];
+            break;
 #endif
         CASE_32_64(mov)
             tci_args_rr(insn, &r0, &r1);
@@ -1148,7 +1160,8 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
                            op_name, str_r(r0), str_r(r1), str_r(r2), pos, len);
         break;
 
-#if TCG_TARGET_REG_BITS == 32
+    case INDEX_op_movcond_i32:
+    case INDEX_op_movcond_i64:
     case INDEX_op_setcond2_i32:
         tci_args_rrrrrc(insn, &r0, &r1, &r2, &r3, &r4, &c);
         info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s,%s,%s",
@@ -1156,6 +1169,7 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
                            str_r(r3), str_r(r4), str_c(c));
         break;
 
+#if TCG_TARGET_REG_BITS == 32
     case INDEX_op_mulu2_i32:
         tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
         info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s",
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index db29bc6e54..a0c458a60a 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -133,9 +133,12 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
         return C_O0_I4(r, r, r, r);
     case INDEX_op_mulu2_i32:
         return C_O2_I2(r, r, r, r);
+#endif
+
+    case INDEX_op_movcond_i32:
+    case INDEX_op_movcond_i64:
     case INDEX_op_setcond2_i32:
         return C_O1_I4(r, r, r, r, r);
-#endif
 
     case INDEX_op_qemu_ld_i32:
         return (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS
@@ -419,6 +422,7 @@ static void tcg_out_op_rrrr(TCGContext *s, TCGOpcode op,
     insn = deposit32(insn, 20, 4, r3);
     tcg_out32(s, insn);
 }
+#endif
 
 static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
                               TCGReg r0, TCGReg r1, TCGReg r2,
@@ -436,6 +440,7 @@ static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
     tcg_out32(s, insn);
 }
 
+#if TCG_TARGET_REG_BITS == 32
 static void tcg_out_op_rrrrrr(TCGContext *s, TCGOpcode op,
                               TCGReg r0, TCGReg r1, TCGReg r2,
                               TCGReg r3, TCGReg r4, TCGReg r5)
@@ -591,12 +596,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_op_rrrc(s, opc, args[0], args[1], args[2], args[3]);
         break;
 
-#if TCG_TARGET_REG_BITS == 32
+    CASE_32_64(movcond)
     case INDEX_op_setcond2_i32:
         tcg_out_op_rrrrrc(s, opc, args[0], args[1], args[2],
                           args[3], args[4], args[5]);
         break;
-#endif
 
     CASE_32_64(ld8u)
     CASE_32_64(ld8s)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 89/93] tcg/tci: Implement andc, orc, eqv, nand, nor
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (88 preceding siblings ...)
  2021-02-04  1:57 ` [PATCH v2 88/93] tcg/tci: Implement movcond Richard Henderson
@ 2021-02-04  1:57 ` Richard Henderson
  2021-02-04  1:57 ` [PATCH v2 90/93] tcg/tci: Implement extract, sextract Richard Henderson
                   ` (5 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

These were already present in tcg-target.c.inc,
but not in the interpreter.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.h | 20 ++++++++++----------
 tcg/tci.c            | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index f53773a555..5945272a43 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -67,20 +67,20 @@
 #define TCG_TARGET_HAS_ext16s_i32       1
 #define TCG_TARGET_HAS_ext8u_i32        1
 #define TCG_TARGET_HAS_ext16u_i32       1
-#define TCG_TARGET_HAS_andc_i32         0
+#define TCG_TARGET_HAS_andc_i32         1
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_extract_i32      0
 #define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_extract2_i32     0
-#define TCG_TARGET_HAS_eqv_i32          0
-#define TCG_TARGET_HAS_nand_i32         0
-#define TCG_TARGET_HAS_nor_i32          0
+#define TCG_TARGET_HAS_eqv_i32          1
+#define TCG_TARGET_HAS_nand_i32         1
+#define TCG_TARGET_HAS_nor_i32          1
 #define TCG_TARGET_HAS_clz_i32          0
 #define TCG_TARGET_HAS_ctz_i32          0
 #define TCG_TARGET_HAS_ctpop_i32        0
 #define TCG_TARGET_HAS_neg_i32          1
 #define TCG_TARGET_HAS_not_i32          1
-#define TCG_TARGET_HAS_orc_i32          0
+#define TCG_TARGET_HAS_orc_i32          1
 #define TCG_TARGET_HAS_rot_i32          1
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_muls2_i32        0
@@ -108,16 +108,16 @@
 #define TCG_TARGET_HAS_ext8u_i64        1
 #define TCG_TARGET_HAS_ext16u_i64       1
 #define TCG_TARGET_HAS_ext32u_i64       1
-#define TCG_TARGET_HAS_andc_i64         0
-#define TCG_TARGET_HAS_eqv_i64          0
-#define TCG_TARGET_HAS_nand_i64         0
-#define TCG_TARGET_HAS_nor_i64          0
+#define TCG_TARGET_HAS_andc_i64         1
+#define TCG_TARGET_HAS_eqv_i64          1
+#define TCG_TARGET_HAS_nand_i64         1
+#define TCG_TARGET_HAS_nor_i64          1
 #define TCG_TARGET_HAS_clz_i64          0
 #define TCG_TARGET_HAS_ctz_i64          0
 #define TCG_TARGET_HAS_ctpop_i64        0
 #define TCG_TARGET_HAS_neg_i64          1
 #define TCG_TARGET_HAS_not_i64          1
-#define TCG_TARGET_HAS_orc_i64          0
+#define TCG_TARGET_HAS_orc_i64          1
 #define TCG_TARGET_HAS_rot_i64          1
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_muls2_i64        0
diff --git a/tcg/tci.c b/tcg/tci.c
index 2a39f8f5a0..9c17947e6b 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -540,6 +540,36 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = regs[r1] ^ regs[r2];
             break;
+#if TCG_TARGET_HAS_andc_i32 || TCG_TARGET_HAS_andc_i64
+        CASE_32_64(andc)
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = regs[r1] & ~regs[r2];
+            break;
+#endif
+#if TCG_TARGET_HAS_orc_i32 || TCG_TARGET_HAS_orc_i64
+        CASE_32_64(orc)
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = regs[r1] | ~regs[r2];
+            break;
+#endif
+#if TCG_TARGET_HAS_eqv_i32 || TCG_TARGET_HAS_eqv_i64
+        CASE_32_64(eqv)
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = ~(regs[r1] ^ regs[r2]);
+            break;
+#endif
+#if TCG_TARGET_HAS_nand_i32 || TCG_TARGET_HAS_nand_i64
+        CASE_32_64(nand)
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = ~(regs[r1] & regs[r2]);
+            break;
+#endif
+#if TCG_TARGET_HAS_nor_i32 || TCG_TARGET_HAS_nor_i64
+        CASE_32_64(nor)
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = ~(regs[r1] | regs[r2]);
+            break;
+#endif
 
             /* Arithmetic operations (32 bit). */
 
@@ -1130,6 +1160,16 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
     case INDEX_op_or_i64:
     case INDEX_op_xor_i32:
     case INDEX_op_xor_i64:
+    case INDEX_op_andc_i32:
+    case INDEX_op_andc_i64:
+    case INDEX_op_orc_i32:
+    case INDEX_op_orc_i64:
+    case INDEX_op_eqv_i32:
+    case INDEX_op_eqv_i64:
+    case INDEX_op_nand_i32:
+    case INDEX_op_nand_i64:
+    case INDEX_op_nor_i32:
+    case INDEX_op_nor_i64:
     case INDEX_op_div_i32:
     case INDEX_op_div_i64:
     case INDEX_op_rem_i32:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 90/93] tcg/tci: Implement extract, sextract
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (89 preceding siblings ...)
  2021-02-04  1:57 ` [PATCH v2 89/93] tcg/tci: Implement andc, orc, eqv, nand, nor Richard Henderson
@ 2021-02-04  1:57 ` Richard Henderson
  2021-02-04  1:58 ` [PATCH v2 91/93] tcg/tci: Implement clz, ctz, ctpop Richard Henderson
                   ` (4 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.h     |  8 ++++----
 tcg/tci.c                | 42 ++++++++++++++++++++++++++++++++++++++++
 tcg/tci/tcg-target.c.inc | 32 ++++++++++++++++++++++++++++++
 3 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 5945272a43..60b67b196b 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -69,8 +69,8 @@
 #define TCG_TARGET_HAS_ext16u_i32       1
 #define TCG_TARGET_HAS_andc_i32         1
 #define TCG_TARGET_HAS_deposit_i32      1
-#define TCG_TARGET_HAS_extract_i32      0
-#define TCG_TARGET_HAS_sextract_i32     0
+#define TCG_TARGET_HAS_extract_i32      1
+#define TCG_TARGET_HAS_sextract_i32     1
 #define TCG_TARGET_HAS_extract2_i32     0
 #define TCG_TARGET_HAS_eqv_i32          1
 #define TCG_TARGET_HAS_nand_i32         1
@@ -97,8 +97,8 @@
 #define TCG_TARGET_HAS_bswap32_i64      1
 #define TCG_TARGET_HAS_bswap64_i64      1
 #define TCG_TARGET_HAS_deposit_i64      1
-#define TCG_TARGET_HAS_extract_i64      0
-#define TCG_TARGET_HAS_sextract_i64     0
+#define TCG_TARGET_HAS_extract_i64      1
+#define TCG_TARGET_HAS_sextract_i64     1
 #define TCG_TARGET_HAS_extract2_i64     0
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          1
diff --git a/tcg/tci.c b/tcg/tci.c
index 9c17947e6b..831a3bb97e 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -122,6 +122,15 @@ static void tci_args_rrs(uint32_t insn, TCGReg *r0, TCGReg *r1, int32_t *i2)
     *i2 = sextract32(insn, 16, 16);
 }
 
+static void tci_args_rrbb(uint32_t insn, TCGReg *r0, TCGReg *r1,
+                          uint8_t *i2, uint8_t *i3)
+{
+    *r0 = extract32(insn, 8, 4);
+    *r1 = extract32(insn, 12, 4);
+    *i2 = extract32(insn, 16, 6);
+    *i3 = extract32(insn, 22, 6);
+}
+
 static void tci_args_rrrc(uint32_t insn,
                           TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGCond *c3)
 {
@@ -619,6 +628,18 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tci_args_rrrbb(insn, &r0, &r1, &r2, &pos, &len);
             regs[r0] = deposit32(regs[r1], pos, len, regs[r2]);
             break;
+#endif
+#if TCG_TARGET_HAS_extract_i32
+        case INDEX_op_extract_i32:
+            tci_args_rrbb(insn, &r0, &r1, &pos, &len);
+            regs[r0] = extract32(regs[r1], pos, len);
+            break;
+#endif
+#if TCG_TARGET_HAS_sextract_i32
+        case INDEX_op_sextract_i32:
+            tci_args_rrbb(insn, &r0, &r1, &pos, &len);
+            regs[r0] = sextract32(regs[r1], pos, len);
+            break;
 #endif
         case INDEX_op_brcond_i32:
             tci_args_rl(insn, tb_ptr, &r0, &ptr);
@@ -759,6 +780,18 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tci_args_rrrbb(insn, &r0, &r1, &r2, &pos, &len);
             regs[r0] = deposit64(regs[r1], pos, len, regs[r2]);
             break;
+#endif
+#if TCG_TARGET_HAS_extract_i64
+        case INDEX_op_extract_i64:
+            tci_args_rrbb(insn, &r0, &r1, &pos, &len);
+            regs[r0] = extract64(regs[r1], pos, len);
+            break;
+#endif
+#if TCG_TARGET_HAS_sextract_i64
+        case INDEX_op_sextract_i64:
+            tci_args_rrbb(insn, &r0, &r1, &pos, &len);
+            regs[r0] = sextract64(regs[r1], pos, len);
+            break;
 #endif
         case INDEX_op_brcond_i64:
             tci_args_rl(insn, tb_ptr, &r0, &ptr);
@@ -1200,6 +1233,15 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
                            op_name, str_r(r0), str_r(r1), str_r(r2), pos, len);
         break;
 
+    case INDEX_op_extract_i32:
+    case INDEX_op_extract_i64:
+    case INDEX_op_sextract_i32:
+    case INDEX_op_sextract_i64:
+        tci_args_rrbb(insn, &r0, &r1, &pos, &len);
+        info->fprintf_func(info->stream, "%-12s  %s,%s,%d,%d",
+                           op_name, str_r(r0), str_r(r1), pos, len);
+        break;
+
     case INDEX_op_movcond_i32:
     case INDEX_op_movcond_i64:
     case INDEX_op_setcond2_i32:
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index a0c458a60a..cedd0328df 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -63,6 +63,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_bswap32_i32:
     case INDEX_op_bswap32_i64:
     case INDEX_op_bswap64_i64:
+    case INDEX_op_extract_i32:
+    case INDEX_op_extract_i64:
+    case INDEX_op_sextract_i32:
+    case INDEX_op_sextract_i64:
         return C_O1_I1(r, r);
 
     case INDEX_op_st8_i32:
@@ -352,6 +356,21 @@ static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
     tcg_out32(s, insn);
 }
 
+static void tcg_out_op_rrbb(TCGContext *s, TCGOpcode op, TCGReg r0,
+                            TCGReg r1, uint8_t b2, uint8_t b3)
+{
+    tcg_insn_unit insn = 0;
+
+    tcg_debug_assert(b2 == extract32(b2, 0, 6));
+    tcg_debug_assert(b3 == extract32(b3, 0, 6));
+    insn = deposit32(insn, 0, 8, op);
+    insn = deposit32(insn, 8, 4, r0);
+    insn = deposit32(insn, 12, 4, r1);
+    insn = deposit32(insn, 16, 6, b2);
+    insn = deposit32(insn, 22, 6, b3);
+    tcg_out32(s, insn);
+}
+
 static void tcg_out_op_rrrc(TCGContext *s, TCGOpcode op,
                             TCGReg r0, TCGReg r1, TCGReg r2, TCGCond c3)
 {
@@ -653,6 +672,19 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         }
         break;
 
+    CASE_32_64(extract)  /* Optional (TCG_TARGET_HAS_extract_*). */
+    CASE_32_64(sextract) /* Optional (TCG_TARGET_HAS_sextract_*). */
+        {
+            TCGArg pos = args[2], len = args[3];
+            TCGArg max = tcg_op_defs[opc].flags & TCG_OPF_64BIT ? 64 : 32;
+
+            tcg_debug_assert(pos < max);
+            tcg_debug_assert(pos + len <= max);
+
+            tcg_out_op_rrbb(s, opc, args[0], args[1], pos, len);
+        }
+        break;
+
     CASE_32_64(brcond)
         tcg_out_op_rrrc(s, (opc == INDEX_op_brcond_i32
                             ? INDEX_op_setcond_i32 : INDEX_op_setcond_i64),
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 91/93] tcg/tci: Implement clz, ctz, ctpop
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (90 preceding siblings ...)
  2021-02-04  1:57 ` [PATCH v2 90/93] tcg/tci: Implement extract, sextract Richard Henderson
@ 2021-02-04  1:58 ` Richard Henderson
  2021-02-04  1:58 ` [PATCH v2 92/93] tcg/tci: Implement mulu2, muls2 Richard Henderson
                   ` (3 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.h     | 12 +++++------
 tcg/tci.c                | 44 ++++++++++++++++++++++++++++++++++++++++
 tcg/tci/tcg-target.c.inc |  9 ++++++++
 3 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 60b67b196b..59859bd8a6 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -75,9 +75,9 @@
 #define TCG_TARGET_HAS_eqv_i32          1
 #define TCG_TARGET_HAS_nand_i32         1
 #define TCG_TARGET_HAS_nor_i32          1
-#define TCG_TARGET_HAS_clz_i32          0
-#define TCG_TARGET_HAS_ctz_i32          0
-#define TCG_TARGET_HAS_ctpop_i32        0
+#define TCG_TARGET_HAS_clz_i32          1
+#define TCG_TARGET_HAS_ctz_i32          1
+#define TCG_TARGET_HAS_ctpop_i32        1
 #define TCG_TARGET_HAS_neg_i32          1
 #define TCG_TARGET_HAS_not_i32          1
 #define TCG_TARGET_HAS_orc_i32          1
@@ -112,9 +112,9 @@
 #define TCG_TARGET_HAS_eqv_i64          1
 #define TCG_TARGET_HAS_nand_i64         1
 #define TCG_TARGET_HAS_nor_i64          1
-#define TCG_TARGET_HAS_clz_i64          0
-#define TCG_TARGET_HAS_ctz_i64          0
-#define TCG_TARGET_HAS_ctpop_i64        0
+#define TCG_TARGET_HAS_clz_i64          1
+#define TCG_TARGET_HAS_ctz_i64          1
+#define TCG_TARGET_HAS_ctpop_i64        1
 #define TCG_TARGET_HAS_neg_i64          1
 #define TCG_TARGET_HAS_not_i64          1
 #define TCG_TARGET_HAS_orc_i64          1
diff --git a/tcg/tci.c b/tcg/tci.c
index 831a3bb97e..35f2c4bfbb 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -598,6 +598,26 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = (uint32_t)regs[r1] % (uint32_t)regs[r2];
             break;
+#if TCG_TARGET_HAS_clz_i32
+        case INDEX_op_clz_i32:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            tmp32 = regs[r1];
+            regs[r0] = tmp32 ? clz32(tmp32) : regs[r2];
+            break;
+#endif
+#if TCG_TARGET_HAS_ctz_i32
+        case INDEX_op_ctz_i32:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            tmp32 = regs[r1];
+            regs[r0] = tmp32 ? ctz32(tmp32) : regs[r2];
+            break;
+#endif
+#if TCG_TARGET_HAS_ctpop_i32
+        case INDEX_op_ctpop_i32:
+            tci_args_rr(insn, &r0, &r1);
+            regs[r0] = ctpop32(regs[r1]);
+            break;
+#endif
 
             /* Shift/rotate operations (32 bit). */
 
@@ -750,6 +770,24 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             tci_args_rrr(insn, &r0, &r1, &r2);
             regs[r0] = (uint64_t)regs[r1] % (uint64_t)regs[r2];
             break;
+#if TCG_TARGET_HAS_clz_i64
+        case INDEX_op_clz_i64:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = regs[r1] ? clz64(regs[r1]) : regs[r2];
+            break;
+#endif
+#if TCG_TARGET_HAS_ctz_i64
+        case INDEX_op_ctz_i64:
+            tci_args_rrr(insn, &r0, &r1, &r2);
+            regs[r0] = regs[r1] ? ctz64(regs[r1]) : regs[r2];
+            break;
+#endif
+#if TCG_TARGET_HAS_ctpop_i64
+        case INDEX_op_ctpop_i64:
+            tci_args_rr(insn, &r0, &r1);
+            regs[r0] = ctpop64(regs[r1]);
+            break;
+#endif
 
             /* Shift/rotate operations (64 bit). */
 
@@ -1176,6 +1214,8 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
     case INDEX_op_not_i64:
     case INDEX_op_neg_i32:
     case INDEX_op_neg_i64:
+    case INDEX_op_ctpop_i32:
+    case INDEX_op_ctpop_i64:
         tci_args_rr(insn, &r0, &r1);
         info->fprintf_func(info->stream, "%-12s  %s,%s",
                            op_name, str_r(r0), str_r(r1));
@@ -1221,6 +1261,10 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
     case INDEX_op_rotl_i64:
     case INDEX_op_rotr_i32:
     case INDEX_op_rotr_i64:
+    case INDEX_op_clz_i32:
+    case INDEX_op_clz_i64:
+    case INDEX_op_ctz_i32:
+    case INDEX_op_ctz_i64:
         tci_args_rrr(insn, &r0, &r1, &r2);
         info->fprintf_func(info->stream, "%-12s  %s,%s,%s",
                            op_name, str_r(r0), str_r(r1), str_r(r2));
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index cedd0328df..664d715440 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -67,6 +67,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_extract_i64:
     case INDEX_op_sextract_i32:
     case INDEX_op_sextract_i64:
+    case INDEX_op_ctpop_i32:
+    case INDEX_op_ctpop_i64:
         return C_O1_I1(r, r);
 
     case INDEX_op_st8_i32:
@@ -122,6 +124,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_setcond_i64:
     case INDEX_op_deposit_i32:
     case INDEX_op_deposit_i64:
+    case INDEX_op_clz_i32:
+    case INDEX_op_clz_i64:
+    case INDEX_op_ctz_i32:
+    case INDEX_op_ctz_i64:
         return C_O1_I2(r, r, r);
 
     case INDEX_op_brcond_i32:
@@ -657,6 +663,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     CASE_32_64(divu)     /* Optional (TCG_TARGET_HAS_div_*). */
     CASE_32_64(rem)      /* Optional (TCG_TARGET_HAS_div_*). */
     CASE_32_64(remu)     /* Optional (TCG_TARGET_HAS_div_*). */
+    CASE_32_64(clz)      /* Optional (TCG_TARGET_HAS_clz_*). */
+    CASE_32_64(ctz)      /* Optional (TCG_TARGET_HAS_ctz_*). */
         tcg_out_op_rrr(s, opc, args[0], args[1], args[2]);
         break;
 
@@ -705,6 +713,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     CASE_32_64(bswap16)  /* Optional (TCG_TARGET_HAS_bswap16_*). */
     CASE_32_64(bswap32)  /* Optional (TCG_TARGET_HAS_bswap32_*). */
     CASE_64(bswap64)     /* Optional (TCG_TARGET_HAS_bswap64_i64). */
+    CASE_32_64(ctpop)    /* Optional (TCG_TARGET_HAS_ctpop_*). */
         tcg_out_op_rr(s, opc, args[0], args[1]);
         break;
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 92/93] tcg/tci: Implement mulu2, muls2
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (91 preceding siblings ...)
  2021-02-04  1:58 ` [PATCH v2 91/93] tcg/tci: Implement clz, ctz, ctpop Richard Henderson
@ 2021-02-04  1:58 ` Richard Henderson
  2021-02-04  1:58 ` [PATCH v2 93/93] tcg/tci: Implement add2, sub2 Richard Henderson
                   ` (2 subsequent siblings)
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

We already had mulu2_i32 for a 32-bit host; expand this to 64-bit
hosts as well.  The muls2_i32 and the 64-bit opcodes are new.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.h     |  8 ++++----
 tcg/tci.c                | 35 +++++++++++++++++++++++++++++------
 tcg/tci/tcg-target.c.inc | 16 ++++++++++------
 3 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 59859bd8a6..71a44bbfb0 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -83,7 +83,7 @@
 #define TCG_TARGET_HAS_orc_i32          1
 #define TCG_TARGET_HAS_rot_i32          1
 #define TCG_TARGET_HAS_movcond_i32      1
-#define TCG_TARGET_HAS_muls2_i32        0
+#define TCG_TARGET_HAS_muls2_i32        1
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 #define TCG_TARGET_HAS_goto_ptr         1
@@ -120,13 +120,13 @@
 #define TCG_TARGET_HAS_orc_i64          1
 #define TCG_TARGET_HAS_rot_i64          1
 #define TCG_TARGET_HAS_movcond_i64      1
-#define TCG_TARGET_HAS_muls2_i64        0
+#define TCG_TARGET_HAS_muls2_i64        1
 #define TCG_TARGET_HAS_add2_i32         0
 #define TCG_TARGET_HAS_sub2_i32         0
-#define TCG_TARGET_HAS_mulu2_i32        0
+#define TCG_TARGET_HAS_mulu2_i32        1
 #define TCG_TARGET_HAS_add2_i64         0
 #define TCG_TARGET_HAS_sub2_i64         0
-#define TCG_TARGET_HAS_mulu2_i64        0
+#define TCG_TARGET_HAS_mulu2_i64        1
 #define TCG_TARGET_HAS_muluh_i64        0
 #define TCG_TARGET_HAS_mulsh_i64        0
 #else
diff --git a/tcg/tci.c b/tcg/tci.c
index 35f2c4bfbb..5d83b2d957 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -39,7 +39,7 @@ __thread uintptr_t tci_tb_ptr;
 static void tci_write_reg64(tcg_target_ulong *regs, uint32_t high_index,
                             uint32_t low_index, uint64_t value)
 {
-    regs[low_index] = value;
+    regs[low_index] = (uint32_t)value;
     regs[high_index] = value >> 32;
 }
 
@@ -169,7 +169,6 @@ static void tci_args_rrrrr(uint32_t insn, TCGReg *r0, TCGReg *r1,
     *r4 = extract32(insn, 24, 4);
 }
 
-#if TCG_TARGET_REG_BITS == 32
 static void tci_args_rrrr(uint32_t insn,
                           TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGReg *r3)
 {
@@ -178,7 +177,6 @@ static void tci_args_rrrr(uint32_t insn,
     *r2 = extract32(insn, 16, 4);
     *r3 = extract32(insn, 20, 4);
 }
-#endif
 
 static void tci_args_rrrrrc(uint32_t insn, TCGReg *r0, TCGReg *r1,
                             TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGCond *c5)
@@ -680,11 +678,21 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             T2 = tci_uint64(regs[r5], regs[r4]);
             tci_write_reg64(regs, r1, r0, T1 - T2);
             break;
+#endif /* TCG_TARGET_REG_BITS == 32 */
+#if TCG_TARGET_HAS_mulu2_i32
         case INDEX_op_mulu2_i32:
             tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
-            tci_write_reg64(regs, r1, r0, (uint64_t)regs[r2] * regs[r3]);
+            tmp64 = (uint64_t)(uint32_t)regs[r2] * (uint32_t)regs[r3];
+            tci_write_reg64(regs, r1, r0, tmp64);
             break;
-#endif /* TCG_TARGET_REG_BITS == 32 */
+#endif
+#if TCG_TARGET_HAS_muls2_i32
+        case INDEX_op_muls2_i32:
+            tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
+            tmp64 = (int64_t)(int32_t)regs[r2] * (int32_t)regs[r3];
+            tci_write_reg64(regs, r1, r0, tmp64);
+            break;
+#endif
 #if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
         CASE_32_64(ext8s)
             tci_args_rr(insn, &r0, &r1);
@@ -788,6 +796,18 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             regs[r0] = ctpop64(regs[r1]);
             break;
 #endif
+#if TCG_TARGET_HAS_mulu2_i64
+        case INDEX_op_mulu2_i64:
+            tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
+            mulu64(&regs[r0], &regs[r1], regs[r2], regs[r3]);
+            break;
+#endif
+#if TCG_TARGET_HAS_muls2_i64
+        case INDEX_op_muls2_i64:
+            tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
+            muls64(&regs[r0], &regs[r1], regs[r2], regs[r3]);
+            break;
+#endif
 
             /* Shift/rotate operations (64 bit). */
 
@@ -1295,14 +1315,17 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
                            str_r(r3), str_r(r4), str_c(c));
         break;
 
-#if TCG_TARGET_REG_BITS == 32
     case INDEX_op_mulu2_i32:
+    case INDEX_op_mulu2_i64:
+    case INDEX_op_muls2_i32:
+    case INDEX_op_muls2_i64:
         tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
         info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s",
                            op_name, str_r(r0), str_r(r1),
                            str_r(r2), str_r(r3));
         break;
 
+#if TCG_TARGET_REG_BITS == 32
     case INDEX_op_add2_i32:
     case INDEX_op_sub2_i32:
         tci_args_rrrrrr(insn, &r0, &r1, &r2, &r3, &r4, &r5);
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 664d715440..eb48633fba 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -141,10 +141,14 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
         return C_O2_I4(r, r, r, r, r, r);
     case INDEX_op_brcond2_i32:
         return C_O0_I4(r, r, r, r);
-    case INDEX_op_mulu2_i32:
-        return C_O2_I2(r, r, r, r);
 #endif
 
+    case INDEX_op_mulu2_i32:
+    case INDEX_op_mulu2_i64:
+    case INDEX_op_muls2_i32:
+    case INDEX_op_muls2_i64:
+        return C_O2_I2(r, r, r, r);
+
     case INDEX_op_movcond_i32:
     case INDEX_op_movcond_i64:
     case INDEX_op_setcond2_i32:
@@ -434,7 +438,6 @@ static void tcg_out_op_rrrrr(TCGContext *s, TCGOpcode op, TCGReg r0,
     tcg_out32(s, insn);
 }
 
-#if TCG_TARGET_REG_BITS == 32
 static void tcg_out_op_rrrr(TCGContext *s, TCGOpcode op,
                             TCGReg r0, TCGReg r1, TCGReg r2, TCGReg r3)
 {
@@ -447,7 +450,6 @@ static void tcg_out_op_rrrr(TCGContext *s, TCGOpcode op,
     insn = deposit32(insn, 20, 4, r3);
     tcg_out32(s, insn);
 }
-#endif
 
 static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
                               TCGReg r0, TCGReg r1, TCGReg r2,
@@ -728,10 +730,12 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
                           args[0], args[1], args[2], args[3], args[4]);
         tcg_out_op_rl(s, INDEX_op_brcond_i32, TCG_REG_TMP, arg_label(args[5]));
         break;
-    case INDEX_op_mulu2_i32:
+#endif
+
+    CASE_32_64(mulu2)
+    CASE_32_64(muls2)
         tcg_out_op_rrrr(s, opc, args[0], args[1], args[2], args[3]);
         break;
-#endif
 
     case INDEX_op_qemu_ld_i32:
     case INDEX_op_qemu_st_i32:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* [PATCH v2 93/93] tcg/tci: Implement add2, sub2
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (92 preceding siblings ...)
  2021-02-04  1:58 ` [PATCH v2 92/93] tcg/tci: Implement mulu2, muls2 Richard Henderson
@ 2021-02-04  1:58 ` Richard Henderson
  2021-02-04  3:31 ` [PATCH v2 00/93] TCI fixes and cleanups no-reply
  2021-02-04  9:58 ` Peter Maydell
  95 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04  1:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: sw

We already had the 32-bit versions for a 32-bit host; expand this
to 64-bit hosts as well.  The 64-bit opcodes are new.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci/tcg-target.h     |  8 ++++----
 tcg/tci.c                | 40 ++++++++++++++++++++++++++--------------
 tcg/tci/tcg-target.c.inc | 15 ++++++++-------
 3 files changed, 38 insertions(+), 25 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 71a44bbfb0..515b3c7a56 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -121,11 +121,11 @@
 #define TCG_TARGET_HAS_rot_i64          1
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_muls2_i64        1
-#define TCG_TARGET_HAS_add2_i32         0
-#define TCG_TARGET_HAS_sub2_i32         0
+#define TCG_TARGET_HAS_add2_i32         1
+#define TCG_TARGET_HAS_sub2_i32         1
 #define TCG_TARGET_HAS_mulu2_i32        1
-#define TCG_TARGET_HAS_add2_i64         0
-#define TCG_TARGET_HAS_sub2_i64         0
+#define TCG_TARGET_HAS_add2_i64         1
+#define TCG_TARGET_HAS_sub2_i64         1
 #define TCG_TARGET_HAS_mulu2_i64        1
 #define TCG_TARGET_HAS_muluh_i64        0
 #define TCG_TARGET_HAS_mulsh_i64        0
diff --git a/tcg/tci.c b/tcg/tci.c
index 5d83b2d957..ee16142f48 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -189,7 +189,6 @@ static void tci_args_rrrrrc(uint32_t insn, TCGReg *r0, TCGReg *r1,
     *c5 = extract32(insn, 28, 4);
 }
 
-#if TCG_TARGET_REG_BITS == 32
 static void tci_args_rrrrrr(uint32_t insn, TCGReg *r0, TCGReg *r1,
                             TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGReg *r5)
 {
@@ -200,7 +199,6 @@ static void tci_args_rrrrrr(uint32_t insn, TCGReg *r0, TCGReg *r1,
     *r4 = extract32(insn, 24, 4);
     *r5 = extract32(insn, 28, 4);
 }
-#endif
 
 static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
 {
@@ -368,17 +366,14 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
     for (;;) {
         uint32_t insn;
         TCGOpcode opc;
-        TCGReg r0, r1, r2, r3, r4;
+        TCGReg r0, r1, r2, r3, r4, r5;
         tcg_target_ulong t1;
         TCGCond condition;
         target_ulong taddr;
         uint8_t pos, len;
         uint32_t tmp32;
         uint64_t tmp64;
-#if TCG_TARGET_REG_BITS == 32
-        TCGReg r5;
         uint64_t T1, T2;
-#endif
         TCGMemOpIdx oi;
         int32_t ofs;
         void *ptr;
@@ -665,20 +660,22 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
                 tb_ptr = ptr;
             }
             break;
-#if TCG_TARGET_REG_BITS == 32
+#if TCG_TARGET_REG_BITS == 32 || TCG_TARGET_HAS_add2_i32
         case INDEX_op_add2_i32:
             tci_args_rrrrrr(insn, &r0, &r1, &r2, &r3, &r4, &r5);
             T1 = tci_uint64(regs[r3], regs[r2]);
             T2 = tci_uint64(regs[r5], regs[r4]);
             tci_write_reg64(regs, r1, r0, T1 + T2);
             break;
+#endif
+#if TCG_TARGET_REG_BITS == 32 || TCG_TARGET_HAS_sub2_i32
         case INDEX_op_sub2_i32:
             tci_args_rrrrrr(insn, &r0, &r1, &r2, &r3, &r4, &r5);
             T1 = tci_uint64(regs[r3], regs[r2]);
             T2 = tci_uint64(regs[r5], regs[r4]);
             tci_write_reg64(regs, r1, r0, T1 - T2);
             break;
-#endif /* TCG_TARGET_REG_BITS == 32 */
+#endif
 #if TCG_TARGET_HAS_mulu2_i32
         case INDEX_op_mulu2_i32:
             tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
@@ -808,6 +805,24 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             muls64(&regs[r0], &regs[r1], regs[r2], regs[r3]);
             break;
 #endif
+#if TCG_TARGET_HAS_add2_i64
+        case INDEX_op_add2_i64:
+            tci_args_rrrrrr(insn, &r0, &r1, &r2, &r3, &r4, &r5);
+            T1 = regs[r2] + regs[r4];
+            T2 = regs[r3] + regs[r5] + (T1 < regs[r2]);
+            regs[r0] = T1;
+            regs[r1] = T2;
+            break;
+#endif
+#if TCG_TARGET_HAS_add2_i64
+        case INDEX_op_sub2_i64:
+            tci_args_rrrrrr(insn, &r0, &r1, &r2, &r3, &r4, &r5);
+            T1 = regs[r2] - regs[r4];
+            T2 = regs[r3] - regs[r5] - (regs[r2] < regs[r4]);
+            regs[r0] = T1;
+            regs[r1] = T2;
+            break;
+#endif
 
             /* Shift/rotate operations (64 bit). */
 
@@ -1124,10 +1139,7 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
     const char *op_name;
     uint32_t insn;
     TCGOpcode op;
-    TCGReg r0, r1, r2, r3, r4;
-#if TCG_TARGET_REG_BITS == 32
-    TCGReg r5;
-#endif
+    TCGReg r0, r1, r2, r3, r4, r5;
     tcg_target_ulong i1;
     int32_t s2;
     TCGCond c;
@@ -1325,15 +1337,15 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
                            str_r(r2), str_r(r3));
         break;
 
-#if TCG_TARGET_REG_BITS == 32
     case INDEX_op_add2_i32:
+    case INDEX_op_add2_i64:
     case INDEX_op_sub2_i32:
+    case INDEX_op_sub2_i64:
         tci_args_rrrrrr(insn, &r0, &r1, &r2, &r3, &r4, &r5);
         info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s,%s,%s",
                            op_name, str_r(r0), str_r(r1), str_r(r2),
                            str_r(r3), str_r(r4), str_r(r5));
         break;
-#endif
 
     case INDEX_op_qemu_ld_i64:
     case INDEX_op_qemu_st_i64:
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index eb48633fba..9b2e2c32a1 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -134,11 +134,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_brcond_i64:
         return C_O0_I2(r, r);
 
-#if TCG_TARGET_REG_BITS == 32
-    /* TODO: Support R, R, R, R, RI, RI? Will it be faster? */
     case INDEX_op_add2_i32:
+    case INDEX_op_add2_i64:
     case INDEX_op_sub2_i32:
+    case INDEX_op_sub2_i64:
         return C_O2_I4(r, r, r, r, r, r);
+
+#if TCG_TARGET_REG_BITS == 32
     case INDEX_op_brcond2_i32:
         return C_O0_I4(r, r, r, r);
 #endif
@@ -467,7 +469,6 @@ static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
     tcg_out32(s, insn);
 }
 
-#if TCG_TARGET_REG_BITS == 32
 static void tcg_out_op_rrrrrr(TCGContext *s, TCGOpcode op,
                               TCGReg r0, TCGReg r1, TCGReg r2,
                               TCGReg r3, TCGReg r4, TCGReg r5)
@@ -483,7 +484,6 @@ static void tcg_out_op_rrrrrr(TCGContext *s, TCGOpcode op,
     insn = deposit32(insn, 28, 4, r5);
     tcg_out32(s, insn);
 }
-#endif
 
 static void tcg_out_ldst(TCGContext *s, TCGOpcode op, TCGReg val,
                          TCGReg base, intptr_t offset)
@@ -719,12 +719,13 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_op_rr(s, opc, args[0], args[1]);
         break;
 
-#if TCG_TARGET_REG_BITS == 32
-    case INDEX_op_add2_i32:
-    case INDEX_op_sub2_i32:
+    CASE_32_64(add2)
+    CASE_32_64(sub2)
         tcg_out_op_rrrrrr(s, opc, args[0], args[1], args[2],
                           args[3], args[4], args[5]);
         break;
+
+#if TCG_TARGET_REG_BITS == 32
     case INDEX_op_brcond2_i32:
         tcg_out_op_rrrrrc(s, INDEX_op_setcond2_i32, TCG_REG_TMP,
                           args[0], args[1], args[2], args[3], args[4]);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 00/93] TCI fixes and cleanups
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (93 preceding siblings ...)
  2021-02-04  1:58 ` [PATCH v2 93/93] tcg/tci: Implement add2, sub2 Richard Henderson
@ 2021-02-04  3:31 ` no-reply
  2021-02-04  9:58 ` Peter Maydell
  95 siblings, 0 replies; 129+ messages in thread
From: no-reply @ 2021-02-04  3:31 UTC (permalink / raw)
  To: richard.henderson; +Cc: sw, qemu-devel

Patchew URL: https://patchew.org/QEMU/20210204014509.882821-1-richard.henderson@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210204014509.882821-1-richard.henderson@linaro.org
Subject: [PATCH v2 00/93] TCI fixes and cleanups

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]      patchew/20210127232151.3523581-1-f4bug@amsat.org -> patchew/20210127232151.3523581-1-f4bug@amsat.org
 - [tag update]      patchew/20210128144125.3696119-1-f4bug@amsat.org -> patchew/20210128144125.3696119-1-f4bug@amsat.org
 * [new tag]         patchew/20210204014509.882821-1-richard.henderson@linaro.org -> patchew/20210204014509.882821-1-richard.henderson@linaro.org
Switched to a new branch 'test'
8b0bc01 tcg/tci: Implement add2, sub2
da10429 tcg/tci: Implement mulu2, muls2
ced4f5a tcg/tci: Implement clz, ctz, ctpop
1e4852c tcg/tci: Implement extract, sextract
f3f91cc tcg/tci: Implement andc, orc, eqv, nand, nor
87a5d9e tcg/tci: Implement movcond
9976361 tcg/tci: Implement goto_ptr
11e005d tcg/tci: Change encoding to uint32_t units
d42a563 tcg/tci: Remove tci_write_reg
231b705 tcg/tci: Emit setcond before brcond
055b38e tcg/tci: Reserve r13 for a temporary
081aea4 tcg/tci: Split out tcg_out_op_r[iI]
fff070a tcg/tci: Split out tcg_out_op_np
015559a tcg/tci: Split out tcg_out_op_v
89bee6c tcg/tci: Split out tcg_out_op_{rrm,rrrm,rrrrm}
2311b1a tcg/tci: Split out tcg_out_op_rrrrcl
b226664 tcg/tci: Split out tcg_out_op_rrrr
08b2642 tcg/tci: Split out tcg_out_op_rrrrrr
36b029b tcg/tci: Split out tcg_out_op_rrcl
a78060f tcg/tci: Split out tcg_out_op_rrrbb
80891b3 tcg/tci: Split out tcg_out_op_rrrrrc
030b2c5 tcg/tci: Split out tcg_out_op_rrrc
152d803 tcg/tci: Split out tcg_out_op_rrr
219243e tcg/tci: Split out tcg_out_op_rr
bc62fe5 tcg/tci: Split out tcg_out_op_p
ae4c05c tcg/tci: Split out tcg_out_op_l
420d1be tcg/tci: Split out tcg_out_op_rrs
d67ed68 tcg/tci: Push opcode emit into each case
5e85088 tcg/tci: Move call-return regs to end of tcg_target_reg_alloc_order
8ef0a82 tcg/tci: Improve tcg_target_call_clobber_regs
6b7a3c7 tcg/tci: Use ffi for calls
3603af3 tcg: Build ffi data structures for helpers
a3ee3dd tcg/tci: Implement the disassembler properly
938c48e tcg/tci: Remove tci_disas
102a4b7 tcg/tci: Hoist op_size checking into tci_args_*
5f4a91e tcg/tci: Split out tci_args_{rrm,rrrm,rrrrm}
4ab25b1 tcg/tci: Reduce qemu_ld/st TCGMemOpIdx operand to 32-bits
5112952 tcg/tci: Clean up deposit operations
25e378a tcg/tci: Split out tci_args_rrrr
98e9b3f tcg/tci: Split out tci_args_rrrrrr
0156bf9 tcg/tci: Reuse tci_args_l for goto_tb
0f4b492 tcg/tci: Reuse tci_args_l for exit_tb
0697d2d tcg/tci: Reuse tci_args_l for calls.
d351c13 tcg/tci: Split out tci_args_ri and tci_args_rI
699ba0f tcg/tci: Split out tci_args_rrcl and tci_args_rrrrcl
8886438 tcg/tci: Split out tci_args_rrrrrc
40f64f6 tcg/tci: Split out tci_args_l
b14402b tcg/tci: Split out tci_args_rrrc
02a8039 tcg/tci: Split out tci_args_rrr
a91c032 tcg/tci: Split out tci_args_rr
cd078fe tcg/tci: Split out tci_args_rrs
943fd28 tcg/tci: Rename tci_read_r to tci_read_rval
0d2df6a tcg/tci: Merge mov, not and neg operations
c8f29d9 tcg/tci: Merge bswap operations
93e1dd4 tcg/tci: Remove ifdefs for TCG_TARGET_HAS_ext32[us]_i64
4ba996d tcg/tci: Merge extension operations
8e5b80c tcg/tci: Merge basic arithmetic operations
21a34d1 tcg/tci: Reduce use of tci_read_r64
e98b67d tcg/tci: Remove tci_read_r32s
0b2209a tcg/tci: Remove tci_read_r16s
23f5ca8 tcg/tci: Remove tci_read_r16
b1e734e tcg/tci: Remove tci_read_r8s
4d485d6 tcg/tci: Remove tci_read_r8
8ddeee2 tcg/tci: Merge identical cases in generation
01b68d4 tcg/tci: Remove TCG_CONST
3f384f5 tcg/tci: Use bool in tcg_out_ri*
c2365d0 tcg/tci: Fix TCG_REG_R4 misusage
5b68471 tcg/tci: Restrict TCG_TARGET_NB_REGS to 16
ad391cb tcg/tci: Remove TODO as unused
1a32255 tcg/tci: Implement 64-bit division
9b566c3 tcg/tci: Remove dead code for TCG_TARGET_HAS_div2_*
1e76758 tcg/tci: Use g_assert_not_reached
927c77f tcg/tci: Merge INDEX_op_{st_i32,st32_i64}
c794a09 tcg/tci: Move stack bounds check to compile-time
393a624 tcg/tci: Merge INDEX_op_st16_{i32,i64}
7ef8c09 tcg/tci: Merge INDEX_op_st8_{i32,i64}
10e9b92 tcg/tci: Merge INDEX_op_{ld_i32,ld32u_i64}
116f33d tcg/tci: Merge INDEX_op_ld16s_{i32,i64}
5f7ede9 tcg/tci: Merge INDEX_op_ld16u_{i32,i64}
8dcd478 tcg/tci: Merge INDEX_op_ld8s_{i32,i64}
01ef162 tcg/tci: Merge INDEX_op_ld8u_{i32,i64}
fccf4b6 tcg/tci: Inline tci_write_reg64 into 64-bit callers
c245070 tcg/tci: Inline tci_write_reg32 into all callers
84dcf2a tcg/tci: Inline tci_write_reg16 into the only caller
df64108 tcg/tci: Inline tci_write_reg8 into its callers
bde4a52 tcg/tci: Inline tci_write_reg32s into the only caller
149907b tcg/tci: Implement INDEX_op_ld8s_i64
dc1b33e tcg/tci: Implement INDEX_op_ld16s_i32
fc60c35 tcg/tci: Make tci_tb_ptr thread-local
0460433 tcg: Manage splitwx in tc_ptr_to_region_tree by hand
dcc1427 configure: Fix --enable-tcg-interpreter
88693f5 tcg: Split out tcg_raise_tb_overflow
58f1c9d gdbstub: Fix handle_query_xfer_auxv

=== OUTPUT BEGIN ===
1/93 Checking commit 58f1c9dd12ed (gdbstub: Fix handle_query_xfer_auxv)
2/93 Checking commit 88693f5b4098 (tcg: Split out tcg_raise_tb_overflow)
3/93 Checking commit dcc1427420fa (configure: Fix --enable-tcg-interpreter)
4/93 Checking commit 0460433f0b4d (tcg: Manage splitwx in tc_ptr_to_region_tree by hand)
5/93 Checking commit fc60c3528109 (tcg/tci: Make tci_tb_ptr thread-local)
6/93 Checking commit dc1b33ee7571 (tcg/tci: Implement INDEX_op_ld16s_i32)
7/93 Checking commit 149907ba9185 (tcg/tci: Implement INDEX_op_ld8s_i64)
8/93 Checking commit bde4a52322e3 (tcg/tci: Inline tci_write_reg32s into the only caller)
9/93 Checking commit df641084c79a (tcg/tci: Inline tci_write_reg8 into its callers)
10/93 Checking commit 84dcf2a64782 (tcg/tci: Inline tci_write_reg16 into the only caller)
11/93 Checking commit c24507003f2f (tcg/tci: Inline tci_write_reg32 into all callers)
12/93 Checking commit fccf4b68bfaa (tcg/tci: Inline tci_write_reg64 into 64-bit callers)
13/93 Checking commit 01ef162b41d5 (tcg/tci: Merge INDEX_op_ld8u_{i32,i64})
14/93 Checking commit 8dcd478e50c8 (tcg/tci: Merge INDEX_op_ld8s_{i32,i64})
15/93 Checking commit 5f7ede9dea7b (tcg/tci: Merge INDEX_op_ld16u_{i32,i64})
16/93 Checking commit 116f33d50d6b (tcg/tci: Merge INDEX_op_ld16s_{i32,i64})
17/93 Checking commit 10e9b929d2a4 (tcg/tci: Merge INDEX_op_{ld_i32,ld32u_i64})
18/93 Checking commit 7ef8c0931d3c (tcg/tci: Merge INDEX_op_st8_{i32,i64})
19/93 Checking commit 393a62462008 (tcg/tci: Merge INDEX_op_st16_{i32,i64})
20/93 Checking commit c794a09db8fc (tcg/tci: Move stack bounds check to compile-time)
21/93 Checking commit 927c77f44727 (tcg/tci: Merge INDEX_op_{st_i32,st32_i64})
22/93 Checking commit 1e7675830e78 (tcg/tci: Use g_assert_not_reached)
23/93 Checking commit 9b566c3c1ea4 (tcg/tci: Remove dead code for TCG_TARGET_HAS_div2_*)
24/93 Checking commit 1a3225514775 (tcg/tci: Implement 64-bit division)
25/93 Checking commit ad391cbfd2d9 (tcg/tci: Remove TODO as unused)
26/93 Checking commit 5b684717e738 (tcg/tci: Restrict TCG_TARGET_NB_REGS to 16)
27/93 Checking commit c2365d08ae35 (tcg/tci: Fix TCG_REG_R4 misusage)
28/93 Checking commit 3f384f57facc (tcg/tci: Use bool in tcg_out_ri*)
29/93 Checking commit 01b68d4c06b1 (tcg/tci: Remove TCG_CONST)
30/93 Checking commit 8ddeee25f950 (tcg/tci: Merge identical cases in generation)
31/93 Checking commit 4d485d6d060a (tcg/tci: Remove tci_read_r8)
32/93 Checking commit b1e734ec57ec (tcg/tci: Remove tci_read_r8s)
33/93 Checking commit 23f5ca83d14c (tcg/tci: Remove tci_read_r16)
34/93 Checking commit 0b2209a73835 (tcg/tci: Remove tci_read_r16s)
35/93 Checking commit e98b67d04791 (tcg/tci: Remove tci_read_r32s)
36/93 Checking commit 21a34d1a9dee (tcg/tci: Reduce use of tci_read_r64)
37/93 Checking commit 8e5b80cd5c63 (tcg/tci: Merge basic arithmetic operations)
38/93 Checking commit 4ba996ddc4a1 (tcg/tci: Merge extension operations)
39/93 Checking commit 93e1dd4b2d1b (tcg/tci: Remove ifdefs for TCG_TARGET_HAS_ext32[us]_i64)
40/93 Checking commit c8f29d9affb2 (tcg/tci: Merge bswap operations)
41/93 Checking commit 0d2df6ae1c07 (tcg/tci: Merge mov, not and neg operations)
42/93 Checking commit 943fd281285e (tcg/tci: Rename tci_read_r to tci_read_rval)
43/93 Checking commit cd078fe19af1 (tcg/tci: Split out tci_args_rrs)
44/93 Checking commit a91c032f91a9 (tcg/tci: Split out tci_args_rr)
45/93 Checking commit 02a8039045b4 (tcg/tci: Split out tci_args_rrr)
46/93 Checking commit b14402b7c4df (tcg/tci: Split out tci_args_rrrc)
47/93 Checking commit 40f64f684fc4 (tcg/tci: Split out tci_args_l)
48/93 Checking commit 8886438314e6 (tcg/tci: Split out tci_args_rrrrrc)
49/93 Checking commit 699ba0fe82e8 (tcg/tci: Split out tci_args_rrcl and tci_args_rrrrcl)
50/93 Checking commit d351c13d441e (tcg/tci: Split out tci_args_ri and tci_args_rI)
51/93 Checking commit 0697d2ddada4 (tcg/tci: Reuse tci_args_l for calls.)
52/93 Checking commit 0f4b492e3c71 (tcg/tci: Reuse tci_args_l for exit_tb)
53/93 Checking commit 0156bf9d5970 (tcg/tci: Reuse tci_args_l for goto_tb)
54/93 Checking commit 98e9b3fa9d40 (tcg/tci: Split out tci_args_rrrrrr)
55/93 Checking commit 25e378a6a247 (tcg/tci: Split out tci_args_rrrr)
56/93 Checking commit 5112952475e2 (tcg/tci: Clean up deposit operations)
57/93 Checking commit 4ab25b103753 (tcg/tci: Reduce qemu_ld/st TCGMemOpIdx operand to 32-bits)
58/93 Checking commit 5f4a91e3ea51 (tcg/tci: Split out tci_args_{rrm,rrrm,rrrrm})
59/93 Checking commit 102a4b755857 (tcg/tci: Hoist op_size checking into tci_args_*)
60/93 Checking commit 938c48e4249b (tcg/tci: Remove tci_disas)
61/93 Checking commit a3ee3dd3a1b9 (tcg/tci: Implement the disassembler properly)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#22: 
deleted file mode 100644

total: 0 errors, 1 warnings, 304 lines checked

Patch 61/93 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
62/93 Checking commit 3603af302b19 (tcg: Build ffi data structures for helpers)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#26: 
new file mode 100644

ERROR: Macros with complex values should be enclosed in parenthesis
#41: FILE: include/exec/helper-ffi.h:11:
+#define dh_ffitype_i32  &ffi_type_uint32

ERROR: Macros with complex values should be enclosed in parenthesis
#42: FILE: include/exec/helper-ffi.h:12:
+#define dh_ffitype_s32  &ffi_type_sint32

ERROR: Macros with complex values should be enclosed in parenthesis
#44: FILE: include/exec/helper-ffi.h:14:
+#define dh_ffitype_i64  &ffi_type_uint64

ERROR: Macros with complex values should be enclosed in parenthesis
#45: FILE: include/exec/helper-ffi.h:15:
+#define dh_ffitype_s64  &ffi_type_sint64

ERROR: Macros with complex values should be enclosed in parenthesis
#46: FILE: include/exec/helper-ffi.h:16:
+#define dh_ffitype_f16  &ffi_type_uint32

ERROR: Macros with complex values should be enclosed in parenthesis
#47: FILE: include/exec/helper-ffi.h:17:
+#define dh_ffitype_f32  &ffi_type_uint32

ERROR: Macros with complex values should be enclosed in parenthesis
#48: FILE: include/exec/helper-ffi.h:18:
+#define dh_ffitype_f64  &ffi_type_uint64

ERROR: Macros with complex values should be enclosed in parenthesis
#51: FILE: include/exec/helper-ffi.h:21:
+#  define dh_ffitype_tl &ffi_type_uint32

ERROR: Macros with complex values should be enclosed in parenthesis
#53: FILE: include/exec/helper-ffi.h:23:
+#  define dh_ffitype_tl &ffi_type_uint64

ERROR: Macros with complex values should be enclosed in parenthesis
#56: FILE: include/exec/helper-ffi.h:26:
+#define dh_ffitype_ptr  &ffi_type_pointer

ERROR: Macros with complex values should be enclosed in parenthesis
#57: FILE: include/exec/helper-ffi.h:27:
+#define dh_ffitype_cptr &ffi_type_pointer

ERROR: Macros with complex values should be enclosed in parenthesis
#60: FILE: include/exec/helper-ffi.h:30:
+#define dh_ffitype_env  &ffi_type_pointer

ERROR: space required after that ',' (ctx:VxV)
#64: FILE: include/exec/helper-ffi.h:34:
+    static ffi_cif glue(cif_,NAME) = {          \
                             ^

ERROR: Macros with multiple statements should be enclosed in a do - while loop
#68: FILE: include/exec/helper-ffi.h:38:
+#define DEF_HELPER_FLAGS_1(NAME, FLAGS, ret, t1)                        \
+    static ffi_type *glue(cif_args_,NAME)[1] = { dh_ffitype(t1) };      \
+    static ffi_cif glue(cif_,NAME) = {                                  \
+        .rtype = dh_ffitype(ret), .nargs = 1,                           \
+        .arg_types = glue(cif_args_,NAME),                              \
+    };

ERROR: spaces required around that '*' (ctx:WxV)
#69: FILE: include/exec/helper-ffi.h:39:
+    static ffi_type *glue(cif_args_,NAME)[1] = { dh_ffitype(t1) };      \
                     ^

ERROR: space required after that ',' (ctx:VxV)
#69: FILE: include/exec/helper-ffi.h:39:
+    static ffi_type *glue(cif_args_,NAME)[1] = { dh_ffitype(t1) };      \
                                    ^

ERROR: open brace '{' following function declarations go on the next line
#70: FILE: include/exec/helper-ffi.h:40:
+    static ffi_cif glue(cif_,NAME) = {                                  \

ERROR: space required after that ',' (ctx:VxV)
#70: FILE: include/exec/helper-ffi.h:40:
+    static ffi_cif glue(cif_,NAME) = {                                  \
                             ^

ERROR: space required after that ',' (ctx:VxV)
#72: FILE: include/exec/helper-ffi.h:42:
+        .arg_types = glue(cif_args_,NAME),                              \
                                    ^

ERROR: Macros with multiple statements should be enclosed in a do - while loop
#75: FILE: include/exec/helper-ffi.h:45:
+#define DEF_HELPER_FLAGS_2(NAME, FLAGS, ret, t1, t2)    \
+    static ffi_type *glue(cif_args_,NAME)[2] = {        \
+        dh_ffitype(t1), dh_ffitype(t2)                  \
+    };                                                  \
+    static ffi_cif glue(cif_,NAME) = {                  \
+        .rtype = dh_ffitype(ret), .nargs = 2,           \
+        .arg_types = glue(cif_args_,NAME),              \
+    };

ERROR: spaces required around that '*' (ctx:WxV)
#76: FILE: include/exec/helper-ffi.h:46:
+    static ffi_type *glue(cif_args_,NAME)[2] = {        \
                     ^

ERROR: space required after that ',' (ctx:VxV)
#76: FILE: include/exec/helper-ffi.h:46:
+    static ffi_type *glue(cif_args_,NAME)[2] = {        \
                                    ^

ERROR: open brace '{' following function declarations go on the next line
#79: FILE: include/exec/helper-ffi.h:49:
+    static ffi_cif glue(cif_,NAME) = {                  \

ERROR: space required after that ',' (ctx:VxV)
#79: FILE: include/exec/helper-ffi.h:49:
+    static ffi_cif glue(cif_,NAME) = {                  \
                             ^

ERROR: space required after that ',' (ctx:VxV)
#81: FILE: include/exec/helper-ffi.h:51:
+        .arg_types = glue(cif_args_,NAME),              \
                                    ^

ERROR: Macros with multiple statements should be enclosed in a do - while loop
#84: FILE: include/exec/helper-ffi.h:54:
+#define DEF_HELPER_FLAGS_3(NAME, FLAGS, ret, t1, t2, t3)        \
+    static ffi_type *glue(cif_args_,NAME)[3] = {                \
+        dh_ffitype(t1), dh_ffitype(t2), dh_ffitype(t3)          \
+    };                                                          \
+    static ffi_cif glue(cif_,NAME) = {                          \
+        .rtype = dh_ffitype(ret), .nargs = 3,                   \
+        .arg_types = glue(cif_args_,NAME),                      \
+    };

ERROR: spaces required around that '*' (ctx:WxV)
#85: FILE: include/exec/helper-ffi.h:55:
+    static ffi_type *glue(cif_args_,NAME)[3] = {                \
                     ^

ERROR: space required after that ',' (ctx:VxV)
#85: FILE: include/exec/helper-ffi.h:55:
+    static ffi_type *glue(cif_args_,NAME)[3] = {                \
                                    ^

ERROR: open brace '{' following function declarations go on the next line
#88: FILE: include/exec/helper-ffi.h:58:
+    static ffi_cif glue(cif_,NAME) = {                          \

ERROR: space required after that ',' (ctx:VxV)
#88: FILE: include/exec/helper-ffi.h:58:
+    static ffi_cif glue(cif_,NAME) = {                          \
                             ^

ERROR: space required after that ',' (ctx:VxV)
#90: FILE: include/exec/helper-ffi.h:60:
+        .arg_types = glue(cif_args_,NAME),                      \
                                    ^

ERROR: Macros with multiple statements should be enclosed in a do - while loop
#93: FILE: include/exec/helper-ffi.h:63:
+#define DEF_HELPER_FLAGS_4(NAME, FLAGS, ret, t1, t2, t3, t4)            \
+    static ffi_type *glue(cif_args_,NAME)[4] = {                        \
+        dh_ffitype(t1), dh_ffitype(t2), dh_ffitype(t3), dh_ffitype(t4)  \
+    };                                                                  \
+    static ffi_cif glue(cif_,NAME) = {                                  \
+        .rtype = dh_ffitype(ret), .nargs = 4,                           \
+        .arg_types = glue(cif_args_,NAME),                              \
+    };

ERROR: spaces required around that '*' (ctx:WxV)
#94: FILE: include/exec/helper-ffi.h:64:
+    static ffi_type *glue(cif_args_,NAME)[4] = {                        \
                     ^

ERROR: space required after that ',' (ctx:VxV)
#94: FILE: include/exec/helper-ffi.h:64:
+    static ffi_type *glue(cif_args_,NAME)[4] = {                        \
                                    ^

ERROR: open brace '{' following function declarations go on the next line
#97: FILE: include/exec/helper-ffi.h:67:
+    static ffi_cif glue(cif_,NAME) = {                                  \

ERROR: space required after that ',' (ctx:VxV)
#97: FILE: include/exec/helper-ffi.h:67:
+    static ffi_cif glue(cif_,NAME) = {                                  \
                             ^

ERROR: space required after that ',' (ctx:VxV)
#99: FILE: include/exec/helper-ffi.h:69:
+        .arg_types = glue(cif_args_,NAME),                              \
                                    ^

ERROR: Macros with multiple statements should be enclosed in a do - while loop
#102: FILE: include/exec/helper-ffi.h:72:
+#define DEF_HELPER_FLAGS_5(NAME, FLAGS, ret, t1, t2, t3, t4, t5)        \
+    static ffi_type *glue(cif_args_,NAME)[5] = {                        \
+        dh_ffitype(t1), dh_ffitype(t2), dh_ffitype(t3),                 \
+        dh_ffitype(t4), dh_ffitype(t5)                                  \
+    };                                                                  \
+    static ffi_cif glue(cif_,NAME) = {                                  \
+        .rtype = dh_ffitype(ret), .nargs = 5,                           \
+        .arg_types = glue(cif_args_,NAME),                              \
+    };

ERROR: spaces required around that '*' (ctx:WxV)
#103: FILE: include/exec/helper-ffi.h:73:
+    static ffi_type *glue(cif_args_,NAME)[5] = {                        \
                     ^

ERROR: space required after that ',' (ctx:VxV)
#103: FILE: include/exec/helper-ffi.h:73:
+    static ffi_type *glue(cif_args_,NAME)[5] = {                        \
                                    ^

ERROR: open brace '{' following function declarations go on the next line
#107: FILE: include/exec/helper-ffi.h:77:
+    static ffi_cif glue(cif_,NAME) = {                                  \

ERROR: space required after that ',' (ctx:VxV)
#107: FILE: include/exec/helper-ffi.h:77:
+    static ffi_cif glue(cif_,NAME) = {                                  \
                             ^

ERROR: space required after that ',' (ctx:VxV)
#109: FILE: include/exec/helper-ffi.h:79:
+        .arg_types = glue(cif_args_,NAME),                              \
                                    ^

ERROR: Macros with multiple statements should be enclosed in a do - while loop
#112: FILE: include/exec/helper-ffi.h:82:
+#define DEF_HELPER_FLAGS_6(NAME, FLAGS, ret, t1, t2, t3, t4, t5, t6)    \
+    static ffi_type *glue(cif_args_,NAME)[6] = {                        \
+        dh_ffitype(t1), dh_ffitype(t2), dh_ffitype(t3),                 \
+        dh_ffitype(t4), dh_ffitype(t5), dh_ffitype(t6)                  \
+    };                                                                  \
+    static ffi_cif glue(cif_,NAME) = {                                  \
+        .rtype = dh_ffitype(ret), .nargs = 6,                           \
+        .arg_types = glue(cif_args_,NAME),                              \
+    };

ERROR: spaces required around that '*' (ctx:WxV)
#113: FILE: include/exec/helper-ffi.h:83:
+    static ffi_type *glue(cif_args_,NAME)[6] = {                        \
                     ^

ERROR: space required after that ',' (ctx:VxV)
#113: FILE: include/exec/helper-ffi.h:83:
+    static ffi_type *glue(cif_args_,NAME)[6] = {                        \
                                    ^

ERROR: open brace '{' following function declarations go on the next line
#117: FILE: include/exec/helper-ffi.h:87:
+    static ffi_cif glue(cif_,NAME) = {                                  \

ERROR: space required after that ',' (ctx:VxV)
#117: FILE: include/exec/helper-ffi.h:87:
+    static ffi_cif glue(cif_,NAME) = {                                  \
                             ^

ERROR: space required after that ',' (ctx:VxV)
#119: FILE: include/exec/helper-ffi.h:89:
+        .arg_types = glue(cif_args_,NAME),                              \
                                    ^

ERROR: Macros with multiple statements should be enclosed in a do - while loop
#122: FILE: include/exec/helper-ffi.h:92:
+#define DEF_HELPER_FLAGS_7(NAME, FLAGS, ret, t1, t2, t3, t4, t5, t6, t7) \
+    static ffi_type *glue(cif_args_,NAME)[7] = {                        \
+        dh_ffitype(t1), dh_ffitype(t2), dh_ffitype(t3),                 \
+        dh_ffitype(t4), dh_ffitype(t5), dh_ffitype(t6), dh_ffitype(t7)  \
+    };                                                                  \
+    static ffi_cif glue(cif_,NAME) = {                                  \
+        .rtype = dh_ffitype(ret), .nargs = 7,                           \
+        .arg_types = glue(cif_args_,NAME),                              \
+    };

ERROR: spaces required around that '*' (ctx:WxV)
#123: FILE: include/exec/helper-ffi.h:93:
+    static ffi_type *glue(cif_args_,NAME)[7] = {                        \
                     ^

ERROR: space required after that ',' (ctx:VxV)
#123: FILE: include/exec/helper-ffi.h:93:
+    static ffi_type *glue(cif_args_,NAME)[7] = {                        \
                                    ^

ERROR: open brace '{' following function declarations go on the next line
#127: FILE: include/exec/helper-ffi.h:97:
+    static ffi_cif glue(cif_,NAME) = {                                  \

ERROR: space required after that ',' (ctx:VxV)
#127: FILE: include/exec/helper-ffi.h:97:
+    static ffi_cif glue(cif_,NAME) = {                                  \
                             ^

ERROR: space required after that ',' (ctx:VxV)
#129: FILE: include/exec/helper-ffi.h:99:
+        .arg_types = glue(cif_args_,NAME),                              \
                                    ^

total: 55 errors, 1 warnings, 309 lines checked

Patch 62/93 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

63/93 Checking commit 6b7a3c7affe7 (tcg/tci: Use ffi for calls)
ERROR: spaces required around that '+' (ctx:VxV)
#94: FILE: tcg/tcg.c:2085:
+        bool is_64bit = sizemask & (1 << (i+1)*2);
                                            ^

ERROR: spaces required around that '*' (ctx:VxV)
#94: FILE: tcg/tcg.c:2085:
+        bool is_64bit = sizemask & (1 << (i+1)*2);
                                               ^

ERROR: suspect code indent for conditional statements (8, 11)
#123: FILE: tcg/tcg.c:2104:
+        if (TCG_TARGET_REG_BITS < 64 && is_64bit) {
+           /*

total: 3 errors, 0 warnings, 380 lines checked

Patch 63/93 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

64/93 Checking commit 8ef0a82466ec (tcg/tci: Improve tcg_target_call_clobber_regs)
65/93 Checking commit 5e850884e122 (tcg/tci: Move call-return regs to end of tcg_target_reg_alloc_order)
66/93 Checking commit d67ed6866754 (tcg/tci: Push opcode emit into each case)
67/93 Checking commit 420d1be5fd4e (tcg/tci: Split out tcg_out_op_rrs)
68/93 Checking commit ae4c05c03bdb (tcg/tci: Split out tcg_out_op_l)
69/93 Checking commit bc62fe55babd (tcg/tci: Split out tcg_out_op_p)
70/93 Checking commit 219243ed3ba9 (tcg/tci: Split out tcg_out_op_rr)
71/93 Checking commit 152d803979b5 (tcg/tci: Split out tcg_out_op_rrr)
72/93 Checking commit 030b2c50f949 (tcg/tci: Split out tcg_out_op_rrrc)
73/93 Checking commit 80891b3fba98 (tcg/tci: Split out tcg_out_op_rrrrrc)
74/93 Checking commit a78060f3a057 (tcg/tci: Split out tcg_out_op_rrrbb)
75/93 Checking commit 36b029b4973f (tcg/tci: Split out tcg_out_op_rrcl)
76/93 Checking commit 08b264232c3a (tcg/tci: Split out tcg_out_op_rrrrrr)
77/93 Checking commit b226664af838 (tcg/tci: Split out tcg_out_op_rrrr)
78/93 Checking commit 2311b1a52c71 (tcg/tci: Split out tcg_out_op_rrrrcl)
79/93 Checking commit 89bee6cf27eb (tcg/tci: Split out tcg_out_op_{rrm,rrrm,rrrrm})
80/93 Checking commit 015559a6d7b3 (tcg/tci: Split out tcg_out_op_v)
81/93 Checking commit fff070acc967 (tcg/tci: Split out tcg_out_op_np)
82/93 Checking commit 081aea4ecef3 (tcg/tci: Split out tcg_out_op_r[iI])
83/93 Checking commit 055b38e38da2 (tcg/tci: Reserve r13 for a temporary)
84/93 Checking commit 231b705d5b32 (tcg/tci: Emit setcond before brcond)
85/93 Checking commit d42a563c8817 (tcg/tci: Remove tci_write_reg)
86/93 Checking commit 11e005d652ef (tcg/tci: Change encoding to uint32_t units)
87/93 Checking commit 9976361ad295 (tcg/tci: Implement goto_ptr)
88/93 Checking commit 87a5d9e861e9 (tcg/tci: Implement movcond)
89/93 Checking commit f3f91cc7ff2a (tcg/tci: Implement andc, orc, eqv, nand, nor)
90/93 Checking commit 1e4852c279f0 (tcg/tci: Implement extract, sextract)
91/93 Checking commit ced4f5ae7859 (tcg/tci: Implement clz, ctz, ctpop)
92/93 Checking commit da10429079fe (tcg/tci: Implement mulu2, muls2)
93/93 Checking commit 8b0bc016ff68 (tcg/tci: Implement add2, sub2)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20210204014509.882821-1-richard.henderson@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 00/93] TCI fixes and cleanups
  2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
                   ` (94 preceding siblings ...)
  2021-02-04  3:31 ` [PATCH v2 00/93] TCI fixes and cleanups no-reply
@ 2021-02-04  9:58 ` Peter Maydell
  2021-02-04 20:02   ` Stefan Weil
  95 siblings, 1 reply; 129+ messages in thread
From: Peter Maydell @ 2021-02-04  9:58 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Stefan Weil, QEMU Developers

On Thu, 4 Feb 2021 at 01:49, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Almost 7 years ago I detailed 5 major problems in tci[1], of
> which three still remain:
>
>   * Unaligned accesses to the bytecode stream, which means
>     that we immediately SIGBUS on any host requiring alignment.
>   * Non-portable calls to helper functions.
>   * Full of useless ifdefs and TODOs.
>
> To my mind, this means the code is unmaintained, despite what it
> says in MAINTAINERS.  Thus tci *should* be simply removed.
> However, every time removal is suggested, someone comes out of the
> woodwork and says we should keep it, because it's useful for $FOO.

Not listed, but also a problem:
 * it's a configure-time choice, not a runtime choice

(Personally I'm on the "we should just remove it" side.)

thanks
-- PMM


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 79/93] tcg/tci: Split out tcg_out_op_{rrm,rrrm,rrrrm}
  2021-02-04  1:54 ` Richard Henderson
@ 2021-02-04 14:54   ` Alex Bennée
  2021-02-04 17:54     ` Richard Henderson
  0 siblings, 1 reply; 129+ messages in thread
From: Alex Bennée @ 2021-02-04 14:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: sw, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/tci/tcg-target.c.inc | 70
> ++++++++++++++++++++++++++++++----------

Hmm duplicate patch tripped up my mail tools, did send-email have a
brain fart at 79/93?

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 04/93] tcg: Manage splitwx in tc_ptr_to_region_tree by hand
  2021-02-04  1:43 ` [PATCH v2 04/93] tcg: Manage splitwx in tc_ptr_to_region_tree by hand Richard Henderson
@ 2021-02-04 15:01   ` Alex Bennée
  2021-02-04 17:46     ` Richard Henderson
  0 siblings, 1 reply; 129+ messages in thread
From: Alex Bennée @ 2021-02-04 15:01 UTC (permalink / raw)
  To: Richard Henderson; +Cc: sw, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> The use in tcg_tb_lookup is given a random pc that comes from the pc
> of a signal handler.  Do not assert that the pointer is already within
> the code gen buffer at all, much less the writable mirror of it.

Surely we are asserting that - or at least you can find a rt entry for
the pointer passed (which we always expect to work)?

>
> Fixes: db0c51a3803
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>
> For TCI, this indicates a bug in handle_cpu_signal, in that we
> are taking PC from the host signal frame.  Which is, nearly,
> unrelated to TCI at all.
>
> The TCI "pc" is tci_tb_ptr (fixed in the next patch to at least
> be thread-local).  We update this only on calls, since we don't
> expect SEGV during the interpretation loop.  Which works ok for
> softmmu, in which we pass down pc by hand to the helpers, but
> is not ok for user-only, where we simply perform the raw memory
> operation.
>
> I don't know how to fix this, exactly.  Probably by storing to
> tci_tb_ptr before each qemu_ld/qemu_st operation, with barriers.
> Then Doing the Right Thing in handle_cpu_signal.  And perhaps
> by clearing tci_tb_ptr whenever we're not expecting a SEGV on
> behalf of the guest (and thus anything left is a qemu host bug).
>
> ---
> v2: Retain full struct initialization
> ---
>  tcg/tcg.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index bbe3dcee03..2991112829 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -513,11 +513,21 @@ static void tcg_region_trees_init(void)
>      }
>  }
>  
> -static struct tcg_region_tree *tc_ptr_to_region_tree(const void *cp)
> +static struct tcg_region_tree *tc_ptr_to_region_tree(const void *p)
>  {
> -    void *p = tcg_splitwx_to_rw(cp);
>      size_t region_idx;
>  
> +    /*
> +     * Like tcg_splitwx_to_rw, with no assert.  The pc may come from
> +     * a signal handler over which the caller has no control.
> +     */
> +    if (!in_code_gen_buffer(p)) {
> +        p -= tcg_splitwx_diff;
> +        if (!in_code_gen_buffer(p)) {
> +            return NULL;
> +        }
> +    }
> +
>      if (p < region.start_aligned) {
>          region_idx = 0;
>      } else {
> @@ -536,6 +546,7 @@ void tcg_tb_insert(TranslationBlock *tb)
>  {
>      struct tcg_region_tree *rt = tc_ptr_to_region_tree(tb->tc.ptr);
>  
> +    g_assert(rt != NULL);
>      qemu_mutex_lock(&rt->lock);
>      g_tree_insert(rt->tree, &tb->tc, tb);
>      qemu_mutex_unlock(&rt->lock);
> @@ -545,6 +556,7 @@ void tcg_tb_remove(TranslationBlock *tb)
>  {
>      struct tcg_region_tree *rt = tc_ptr_to_region_tree(tb->tc.ptr);
>  
> +    g_assert(rt != NULL);
>      qemu_mutex_lock(&rt->lock);
>      g_tree_remove(rt->tree, &tb->tc);
>      qemu_mutex_unlock(&rt->lock);
> @@ -561,6 +573,10 @@ TranslationBlock *tcg_tb_lookup(uintptr_t tc_ptr)
>      TranslationBlock *tb;
>      struct tb_tc s = { .ptr = (void *)tc_ptr };
>  
> +    if (rt == NULL) {
> +        return NULL;
> +    }
> +
>      qemu_mutex_lock(&rt->lock);
>      tb = g_tree_lookup(rt->tree, &s);
>      qemu_mutex_unlock(&rt->lock);


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 05/93] tcg/tci: Make tci_tb_ptr thread-local
  2021-02-04  1:43 ` [PATCH v2 05/93] tcg/tci: Make tci_tb_ptr thread-local Richard Henderson
@ 2021-02-04 15:05   ` Alex Bennée
  0 siblings, 0 replies; 129+ messages in thread
From: Alex Bennée @ 2021-02-04 15:05 UTC (permalink / raw)
  To: Richard Henderson; +Cc: sw, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Each thread must have its own pc, even under TCI.
>
> Remove the GETPC ifdef, because GETPC is always available for
> helpers, and thus is always required.  Move the assignment
> under INDEX_op_call, because the value is only visible when
> we make a call to a helper function.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 26/93] tcg/tci: Restrict TCG_TARGET_NB_REGS to 16
  2021-02-04  1:44 ` [PATCH v2 26/93] tcg/tci: Restrict TCG_TARGET_NB_REGS to 16 Richard Henderson
@ 2021-02-04 15:07   ` Alex Bennée
  0 siblings, 0 replies; 129+ messages in thread
From: Alex Bennée @ 2021-02-04 15:07 UTC (permalink / raw)
  To: Richard Henderson; +Cc: sw, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> As noted in several comments, 8 regs is not enough for 32-bit
> to perform calls, as currently implemented.  Shortly, we will
> rearrange the encoding which will make 32 regs impossible.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 27/93] tcg/tci: Fix TCG_REG_R4 misusage
  2021-02-04  1:44 ` [PATCH v2 27/93] tcg/tci: Fix TCG_REG_R4 misusage Richard Henderson
@ 2021-02-04 15:09   ` Alex Bennée
  0 siblings, 0 replies; 129+ messages in thread
From: Alex Bennée @ 2021-02-04 15:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: sw, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> This was removed from tcg_target_reg_alloc_order and
> tcg_target_call_iarg_regs on the assumption that it
> was the stack.  This was incorrectly copied from i386.
> For tci, the stack is R15.
>
> By adding R4 back to tcg_target_call_iarg_regs, adjust the other
> entries so that 6 (or 12) entries are still present in the array,
> and adjust the numbers in the interpreter.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 28/93] tcg/tci: Use bool in tcg_out_ri*
  2021-02-04  1:44 ` [PATCH v2 28/93] tcg/tci: Use bool in tcg_out_ri* Richard Henderson
@ 2021-02-04 15:11   ` Alex Bennée
  2021-02-04 15:15   ` Alex Bennée
  1 sibling, 0 replies; 129+ messages in thread
From: Alex Bennée @ 2021-02-04 15:11 UTC (permalink / raw)
  To: Richard Henderson; +Cc: sw, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> This is the intended behavior. Remove the assert on
> the specific value passed, which can now never be
> anything besides false/true (0/1).
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 28/93] tcg/tci: Use bool in tcg_out_ri*
  2021-02-04  1:44 ` [PATCH v2 28/93] tcg/tci: Use bool in tcg_out_ri* Richard Henderson
  2021-02-04 15:11   ` Alex Bennée
@ 2021-02-04 15:15   ` Alex Bennée
  1 sibling, 0 replies; 129+ messages in thread
From: Alex Bennée @ 2021-02-04 15:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: sw, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> This is the intended behavior. Remove the assert on
> the specific value passed, which can now never be
> anything besides false/true (0/1).
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 29/93] tcg/tci: Remove TCG_CONST
  2021-02-04  1:44 ` [PATCH v2 29/93] tcg/tci: Remove TCG_CONST Richard Henderson
@ 2021-02-04 15:39   ` Alex Bennée
  2021-02-04 17:52     ` Richard Henderson
  0 siblings, 1 reply; 129+ messages in thread
From: Alex Bennée @ 2021-02-04 15:39 UTC (permalink / raw)
  To: Richard Henderson; +Cc: sw, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Only allow registers or constants, but not both, in any
> given position.

Aren't we switching to all registers (there are no more _i functions
after this)? I guess you mean the registers can have constants in them?

> Removing this difference in input will
> allow more code to be shared between 32-bit and 64-bit.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/tci/tcg-target-con-set.h |   6 +-
>  tcg/tci/tcg-target.h         |   3 -
>  tcg/tci.c                    | 189 +++++++++++++----------------------
>  tcg/tci/tcg-target.c.inc     |  82 ++++-----------
>  4 files changed, 89 insertions(+), 191 deletions(-)
>
> diff --git a/tcg/tci/tcg-target-con-set.h b/tcg/tci/tcg-target-con-set.h
> index 38e82f7535..f51b7bcb13 100644
> --- a/tcg/tci/tcg-target-con-set.h
> +++ b/tcg/tci/tcg-target-con-set.h
> @@ -10,16 +10,12 @@
>   * tcg-target-con-str.h; the constraint combination is inclusive or.
>   */
>  C_O0_I2(r, r)
> -C_O0_I2(r, ri)
>  C_O0_I3(r, r, r)
> -C_O0_I4(r, r, ri, ri)
>  C_O0_I4(r, r, r, r)
>  C_O1_I1(r, r)
>  C_O1_I2(r, 0, r)
> -C_O1_I2(r, ri, ri)
>  C_O1_I2(r, r, r)
> -C_O1_I2(r, r, ri)
> -C_O1_I4(r, r, r, ri, ri)
> +C_O1_I4(r, r, r, r, r)
>  C_O2_I1(r, r, r)
>  C_O2_I2(r, r, r, r)
>  C_O2_I4(r, r, r, r, r, r)
> diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
> index 8f7ed676fc..9c0021a26f 100644
> --- a/tcg/tci/tcg-target.h
> +++ b/tcg/tci/tcg-target.h
> @@ -157,9 +157,6 @@ typedef enum {
>  
>      TCG_AREG0 = TCG_REG_R14,
>      TCG_REG_CALL_STACK = TCG_REG_R15,
> -
> -    /* Special value UINT8_MAX is used by TCI to encode constant values. */
> -    TCG_CONST = UINT8_MAX
>  } TCGReg;
>  
>  /* Used for function call generation. */
> diff --git a/tcg/tci.c b/tcg/tci.c
> index 935eb87330..fb3c97aaf1 100644
> --- a/tcg/tci.c
> +++ b/tcg/tci.c
> @@ -255,61 +255,6 @@ tci_read_ulong(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
>      return taddr;
>  }
>  
> -/* Read indexed register or constant (native size) from bytecode. */
> -static tcg_target_ulong
> -tci_read_ri(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
> -{
> -    tcg_target_ulong value;
> -    TCGReg r = **tb_ptr;
> -    *tb_ptr += 1;
> -    if (r == TCG_CONST) {
> -        value = tci_read_i(tb_ptr);
> -    } else {
> -        value = tci_read_reg(regs, r);
> -    }
> -    return value;
> -}
> -
> -/* Read indexed register or constant (32 bit) from bytecode. */
> -static uint32_t tci_read_ri32(const tcg_target_ulong *regs,
> -                              const uint8_t **tb_ptr)
> -{
> -    uint32_t value;
> -    TCGReg r = **tb_ptr;
> -    *tb_ptr += 1;
> -    if (r == TCG_CONST) {
> -        value = tci_read_i32(tb_ptr);
> -    } else {
> -        value = tci_read_reg32(regs, r);
> -    }
> -    return value;
> -}
> -
> -#if TCG_TARGET_REG_BITS == 32
> -/* Read two indexed registers or constants (2 * 32 bit) from bytecode. */
> -static uint64_t tci_read_ri64(const tcg_target_ulong *regs,
> -                              const uint8_t **tb_ptr)
> -{
> -    uint32_t low = tci_read_ri32(regs, tb_ptr);
> -    return tci_uint64(tci_read_ri32(regs, tb_ptr), low);
> -}
> -#elif TCG_TARGET_REG_BITS == 64
> -/* Read indexed register or constant (64 bit) from bytecode. */
> -static uint64_t tci_read_ri64(const tcg_target_ulong *regs,
> -                              const uint8_t **tb_ptr)
> -{
> -    uint64_t value;
> -    TCGReg r = **tb_ptr;
> -    *tb_ptr += 1;
> -    if (r == TCG_CONST) {
> -        value = tci_read_i64(tb_ptr);
> -    } else {
> -        value = tci_read_reg64(regs, r);
> -    }
> -    return value;
> -}
> -#endif
> -
>  static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
>  {
>      tcg_target_ulong label = tci_read_i(tb_ptr);
> @@ -504,7 +449,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
>  
>          switch (opc) {
>          case INDEX_op_call:
> -            t0 = tci_read_ri(regs, &tb_ptr);
> +            t0 = tci_read_i(&tb_ptr);
>              tci_tb_ptr = (uintptr_t)tb_ptr;
>  #if TCG_TARGET_REG_BITS == 32
>              tmp64 = ((helper_function)t0)(tci_read_reg(regs, TCG_REG_R0),
> @@ -539,7 +484,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
>          case INDEX_op_setcond_i32:
>              t0 = *tb_ptr++;
>              t1 = tci_read_r32(regs, &tb_ptr);
> -            t2 = tci_read_ri32(regs, &tb_ptr);
> +            t2 = tci_read_r32(regs, &tb_ptr);
>              condition = *tb_ptr++;
>              tci_write_reg(regs, t0, tci_compare32(t1, t2, condition));
>              break;
> @@ -547,7 +492,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
>          case INDEX_op_setcond2_i32:
>              t0 = *tb_ptr++;
>              tmp64 = tci_read_r64(regs, &tb_ptr);
> -            v64 = tci_read_ri64(regs, &tb_ptr);
> +            v64 = tci_read_r64(regs, &tb_ptr);
>              condition = *tb_ptr++;
>              tci_write_reg(regs, t0, tci_compare64(tmp64, v64, condition));
>              break;
> @@ -555,7 +500,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
>          case INDEX_op_setcond_i64:
>              t0 = *tb_ptr++;
>              t1 = tci_read_r64(regs, &tb_ptr);
> -            t2 = tci_read_ri64(regs, &tb_ptr);
> +            t2 = tci_read_r64(regs, &tb_ptr);
>              condition = *tb_ptr++;
>              tci_write_reg(regs, t0, tci_compare64(t1, t2, condition));
>              break;
> @@ -628,62 +573,62 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
>  
>          case INDEX_op_add_i32:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri32(regs, &tb_ptr);
> -            t2 = tci_read_ri32(regs, &tb_ptr);
> +            t1 = tci_read_r32(regs, &tb_ptr);
> +            t2 = tci_read_r32(regs, &tb_ptr);
>              tci_write_reg(regs, t0, t1 + t2);
>              break;
>          case INDEX_op_sub_i32:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri32(regs, &tb_ptr);
> -            t2 = tci_read_ri32(regs, &tb_ptr);
> +            t1 = tci_read_r32(regs, &tb_ptr);
> +            t2 = tci_read_r32(regs, &tb_ptr);
>              tci_write_reg(regs, t0, t1 - t2);
>              break;
>          case INDEX_op_mul_i32:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri32(regs, &tb_ptr);
> -            t2 = tci_read_ri32(regs, &tb_ptr);
> +            t1 = tci_read_r32(regs, &tb_ptr);
> +            t2 = tci_read_r32(regs, &tb_ptr);
>              tci_write_reg(regs, t0, t1 * t2);
>              break;
>          case INDEX_op_div_i32:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri32(regs, &tb_ptr);
> -            t2 = tci_read_ri32(regs, &tb_ptr);
> +            t1 = tci_read_r32(regs, &tb_ptr);
> +            t2 = tci_read_r32(regs, &tb_ptr);
>              tci_write_reg(regs, t0, (int32_t)t1 / (int32_t)t2);
>              break;
>          case INDEX_op_divu_i32:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri32(regs, &tb_ptr);
> -            t2 = tci_read_ri32(regs, &tb_ptr);
> +            t1 = tci_read_r32(regs, &tb_ptr);
> +            t2 = tci_read_r32(regs, &tb_ptr);
>              tci_write_reg(regs, t0, t1 / t2);
>              break;
>          case INDEX_op_rem_i32:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri32(regs, &tb_ptr);
> -            t2 = tci_read_ri32(regs, &tb_ptr);
> +            t1 = tci_read_r32(regs, &tb_ptr);
> +            t2 = tci_read_r32(regs, &tb_ptr);
>              tci_write_reg(regs, t0, (int32_t)t1 % (int32_t)t2);
>              break;
>          case INDEX_op_remu_i32:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri32(regs, &tb_ptr);
> -            t2 = tci_read_ri32(regs, &tb_ptr);
> +            t1 = tci_read_r32(regs, &tb_ptr);
> +            t2 = tci_read_r32(regs, &tb_ptr);
>              tci_write_reg(regs, t0, t1 % t2);
>              break;
>          case INDEX_op_and_i32:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri32(regs, &tb_ptr);
> -            t2 = tci_read_ri32(regs, &tb_ptr);
> +            t1 = tci_read_r32(regs, &tb_ptr);
> +            t2 = tci_read_r32(regs, &tb_ptr);
>              tci_write_reg(regs, t0, t1 & t2);
>              break;
>          case INDEX_op_or_i32:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri32(regs, &tb_ptr);
> -            t2 = tci_read_ri32(regs, &tb_ptr);
> +            t1 = tci_read_r32(regs, &tb_ptr);
> +            t2 = tci_read_r32(regs, &tb_ptr);
>              tci_write_reg(regs, t0, t1 | t2);
>              break;
>          case INDEX_op_xor_i32:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri32(regs, &tb_ptr);
> -            t2 = tci_read_ri32(regs, &tb_ptr);
> +            t1 = tci_read_r32(regs, &tb_ptr);
> +            t2 = tci_read_r32(regs, &tb_ptr);
>              tci_write_reg(regs, t0, t1 ^ t2);
>              break;
>  
> @@ -691,33 +636,33 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
>  
>          case INDEX_op_shl_i32:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri32(regs, &tb_ptr);
> -            t2 = tci_read_ri32(regs, &tb_ptr);
> +            t1 = tci_read_r32(regs, &tb_ptr);
> +            t2 = tci_read_r32(regs, &tb_ptr);
>              tci_write_reg(regs, t0, t1 << (t2 & 31));
>              break;
>          case INDEX_op_shr_i32:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri32(regs, &tb_ptr);
> -            t2 = tci_read_ri32(regs, &tb_ptr);
> +            t1 = tci_read_r32(regs, &tb_ptr);
> +            t2 = tci_read_r32(regs, &tb_ptr);
>              tci_write_reg(regs, t0, t1 >> (t2 & 31));
>              break;
>          case INDEX_op_sar_i32:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri32(regs, &tb_ptr);
> -            t2 = tci_read_ri32(regs, &tb_ptr);
> +            t1 = tci_read_r32(regs, &tb_ptr);
> +            t2 = tci_read_r32(regs, &tb_ptr);
>              tci_write_reg(regs, t0, ((int32_t)t1 >> (t2 & 31)));
>              break;
>  #if TCG_TARGET_HAS_rot_i32
>          case INDEX_op_rotl_i32:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri32(regs, &tb_ptr);
> -            t2 = tci_read_ri32(regs, &tb_ptr);
> +            t1 = tci_read_r32(regs, &tb_ptr);
> +            t2 = tci_read_r32(regs, &tb_ptr);
>              tci_write_reg(regs, t0, rol32(t1, t2 & 31));
>              break;
>          case INDEX_op_rotr_i32:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri32(regs, &tb_ptr);
> -            t2 = tci_read_ri32(regs, &tb_ptr);
> +            t1 = tci_read_r32(regs, &tb_ptr);
> +            t2 = tci_read_r32(regs, &tb_ptr);
>              tci_write_reg(regs, t0, ror32(t1, t2 & 31));
>              break;
>  #endif
> @@ -734,7 +679,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
>  #endif
>          case INDEX_op_brcond_i32:
>              t0 = tci_read_r32(regs, &tb_ptr);
> -            t1 = tci_read_ri32(regs, &tb_ptr);
> +            t1 = tci_read_r32(regs, &tb_ptr);
>              condition = *tb_ptr++;
>              label = tci_read_label(&tb_ptr);
>              if (tci_compare32(t0, t1, condition)) {
> @@ -760,7 +705,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
>              break;
>          case INDEX_op_brcond2_i32:
>              tmp64 = tci_read_r64(regs, &tb_ptr);
> -            v64 = tci_read_ri64(regs, &tb_ptr);
> +            v64 = tci_read_r64(regs, &tb_ptr);
>              condition = *tb_ptr++;
>              label = tci_read_label(&tb_ptr);
>              if (tci_compare64(tmp64, v64, condition)) {
> @@ -870,62 +815,62 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
>  
>          case INDEX_op_add_i64:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri64(regs, &tb_ptr);
> -            t2 = tci_read_ri64(regs, &tb_ptr);
> +            t1 = tci_read_r64(regs, &tb_ptr);
> +            t2 = tci_read_r64(regs, &tb_ptr);
>              tci_write_reg(regs, t0, t1 + t2);
>              break;
>          case INDEX_op_sub_i64:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri64(regs, &tb_ptr);
> -            t2 = tci_read_ri64(regs, &tb_ptr);
> +            t1 = tci_read_r64(regs, &tb_ptr);
> +            t2 = tci_read_r64(regs, &tb_ptr);
>              tci_write_reg(regs, t0, t1 - t2);
>              break;
>          case INDEX_op_mul_i64:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri64(regs, &tb_ptr);
> -            t2 = tci_read_ri64(regs, &tb_ptr);
> +            t1 = tci_read_r64(regs, &tb_ptr);
> +            t2 = tci_read_r64(regs, &tb_ptr);
>              tci_write_reg(regs, t0, t1 * t2);
>              break;
>          case INDEX_op_div_i64:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri64(regs, &tb_ptr);
> -            t2 = tci_read_ri64(regs, &tb_ptr);
> +            t1 = tci_read_r64(regs, &tb_ptr);
> +            t2 = tci_read_r64(regs, &tb_ptr);
>              tci_write_reg(regs, t0, (int64_t)t1 / (int64_t)t2);
>              break;
>          case INDEX_op_divu_i64:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri64(regs, &tb_ptr);
> -            t2 = tci_read_ri64(regs, &tb_ptr);
> +            t1 = tci_read_r64(regs, &tb_ptr);
> +            t2 = tci_read_r64(regs, &tb_ptr);
>              tci_write_reg(regs, t0, (uint64_t)t1 / (uint64_t)t2);
>              break;
>          case INDEX_op_rem_i64:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri64(regs, &tb_ptr);
> -            t2 = tci_read_ri64(regs, &tb_ptr);
> +            t1 = tci_read_r64(regs, &tb_ptr);
> +            t2 = tci_read_r64(regs, &tb_ptr);
>              tci_write_reg(regs, t0, (int64_t)t1 % (int64_t)t2);
>              break;
>          case INDEX_op_remu_i64:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri64(regs, &tb_ptr);
> -            t2 = tci_read_ri64(regs, &tb_ptr);
> +            t1 = tci_read_r64(regs, &tb_ptr);
> +            t2 = tci_read_r64(regs, &tb_ptr);
>              tci_write_reg(regs, t0, (uint64_t)t1 % (uint64_t)t2);
>              break;
>          case INDEX_op_and_i64:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri64(regs, &tb_ptr);
> -            t2 = tci_read_ri64(regs, &tb_ptr);
> +            t1 = tci_read_r64(regs, &tb_ptr);
> +            t2 = tci_read_r64(regs, &tb_ptr);
>              tci_write_reg(regs, t0, t1 & t2);
>              break;
>          case INDEX_op_or_i64:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri64(regs, &tb_ptr);
> -            t2 = tci_read_ri64(regs, &tb_ptr);
> +            t1 = tci_read_r64(regs, &tb_ptr);
> +            t2 = tci_read_r64(regs, &tb_ptr);
>              tci_write_reg(regs, t0, t1 | t2);
>              break;
>          case INDEX_op_xor_i64:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri64(regs, &tb_ptr);
> -            t2 = tci_read_ri64(regs, &tb_ptr);
> +            t1 = tci_read_r64(regs, &tb_ptr);
> +            t2 = tci_read_r64(regs, &tb_ptr);
>              tci_write_reg(regs, t0, t1 ^ t2);
>              break;
>  
> @@ -933,33 +878,33 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
>  
>          case INDEX_op_shl_i64:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri64(regs, &tb_ptr);
> -            t2 = tci_read_ri64(regs, &tb_ptr);
> +            t1 = tci_read_r64(regs, &tb_ptr);
> +            t2 = tci_read_r64(regs, &tb_ptr);
>              tci_write_reg(regs, t0, t1 << (t2 & 63));
>              break;
>          case INDEX_op_shr_i64:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri64(regs, &tb_ptr);
> -            t2 = tci_read_ri64(regs, &tb_ptr);
> +            t1 = tci_read_r64(regs, &tb_ptr);
> +            t2 = tci_read_r64(regs, &tb_ptr);
>              tci_write_reg(regs, t0, t1 >> (t2 & 63));
>              break;
>          case INDEX_op_sar_i64:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri64(regs, &tb_ptr);
> -            t2 = tci_read_ri64(regs, &tb_ptr);
> +            t1 = tci_read_r64(regs, &tb_ptr);
> +            t2 = tci_read_r64(regs, &tb_ptr);
>              tci_write_reg(regs, t0, ((int64_t)t1 >> (t2 & 63)));
>              break;
>  #if TCG_TARGET_HAS_rot_i64
>          case INDEX_op_rotl_i64:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri64(regs, &tb_ptr);
> -            t2 = tci_read_ri64(regs, &tb_ptr);
> +            t1 = tci_read_r64(regs, &tb_ptr);
> +            t2 = tci_read_r64(regs, &tb_ptr);
>              tci_write_reg(regs, t0, rol64(t1, t2 & 63));
>              break;
>          case INDEX_op_rotr_i64:
>              t0 = *tb_ptr++;
> -            t1 = tci_read_ri64(regs, &tb_ptr);
> -            t2 = tci_read_ri64(regs, &tb_ptr);
> +            t1 = tci_read_r64(regs, &tb_ptr);
> +            t2 = tci_read_r64(regs, &tb_ptr);
>              tci_write_reg(regs, t0, ror64(t1, t2 & 63));
>              break;
>  #endif
> @@ -976,7 +921,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
>  #endif
>          case INDEX_op_brcond_i64:
>              t0 = tci_read_r64(regs, &tb_ptr);
> -            t1 = tci_read_ri64(regs, &tb_ptr);
> +            t1 = tci_read_r64(regs, &tb_ptr);
>              condition = *tb_ptr++;
>              label = tci_read_label(&tb_ptr);
>              if (tci_compare64(t0, t1, condition)) {
> diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
> index 1b66368c94..feac4659cc 100644
> --- a/tcg/tci/tcg-target.c.inc
> +++ b/tcg/tci/tcg-target.c.inc
> @@ -92,8 +92,6 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
>      case INDEX_op_rem_i64:
>      case INDEX_op_remu_i32:
>      case INDEX_op_remu_i64:
> -        return C_O1_I2(r, r, r);
> -
>      case INDEX_op_add_i32:
>      case INDEX_op_add_i64:
>      case INDEX_op_sub_i32:
> @@ -126,8 +124,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
>      case INDEX_op_rotl_i64:
>      case INDEX_op_rotr_i32:
>      case INDEX_op_rotr_i64:
> -        /* TODO: Does R, RI, RI result in faster code than R, R, RI? */
> -        return C_O1_I2(r, ri, ri);
> +    case INDEX_op_setcond_i32:
> +    case INDEX_op_setcond_i64:
> +        return C_O1_I2(r, r, r);
>  
>      case INDEX_op_deposit_i32:
>      case INDEX_op_deposit_i64:
> @@ -135,11 +134,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
>  
>      case INDEX_op_brcond_i32:
>      case INDEX_op_brcond_i64:
> -        return C_O0_I2(r, ri);
> -
> -    case INDEX_op_setcond_i32:
> -    case INDEX_op_setcond_i64:
> -        return C_O1_I2(r, r, ri);
> +        return C_O0_I2(r, r);
>  
>  #if TCG_TARGET_REG_BITS == 32
>      /* TODO: Support R, R, R, R, RI, RI? Will it be faster? */
> @@ -147,11 +142,11 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
>      case INDEX_op_sub2_i32:
>          return C_O2_I4(r, r, r, r, r, r);
>      case INDEX_op_brcond2_i32:
> -        return C_O0_I4(r, r, ri, ri);
> +        return C_O0_I4(r, r, r, r);
>      case INDEX_op_mulu2_i32:
>          return C_O2_I2(r, r, r, r);
>      case INDEX_op_setcond2_i32:
> -        return C_O1_I4(r, r, r, ri, ri);
> +        return C_O1_I4(r, r, r, r, r);
>  #endif
>  
>      case INDEX_op_qemu_ld_i32:
> @@ -294,41 +289,6 @@ static void tcg_out_r(TCGContext *s, TCGArg t0)
>      tcg_out8(s, t0);
>  }
>  
> -/* Write register or constant (native size). */
> -static void tcg_out_ri(TCGContext *s, bool const_arg, TCGArg arg)
> -{
> -    if (const_arg) {
> -        tcg_out8(s, TCG_CONST);
> -        tcg_out_i(s, arg);
> -    } else {
> -        tcg_out_r(s, arg);
> -    }
> -}
> -
> -/* Write register or constant (32 bit). */
> -static void tcg_out_ri32(TCGContext *s, bool const_arg, TCGArg arg)
> -{
> -    if (const_arg) {
> -        tcg_out8(s, TCG_CONST);
> -        tcg_out32(s, arg);
> -    } else {
> -        tcg_out_r(s, arg);
> -    }
> -}
> -
> -#if TCG_TARGET_REG_BITS == 64
> -/* Write register or constant (64 bit). */
> -static void tcg_out_ri64(TCGContext *s, bool const_arg, TCGArg arg)
> -{
> -    if (const_arg) {
> -        tcg_out8(s, TCG_CONST);
> -        tcg_out64(s, arg);
> -    } else {
> -        tcg_out_r(s, arg);
> -    }
> -}
> -#endif
> -
>  /* Write label. */
>  static void tci_out_label(TCGContext *s, TCGLabel *label)
>  {
> @@ -416,7 +376,7 @@ static inline void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
>  {
>      uint8_t *old_code_ptr = s->code_ptr;
>      tcg_out_op_t(s, INDEX_op_call);
> -    tcg_out_ri(s, 1, (uintptr_t)arg);
> +    tcg_out_i(s, (uintptr_t)arg);
>      old_code_ptr[1] = s->code_ptr - old_code_ptr;
>  }
>  
> @@ -450,7 +410,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>      case INDEX_op_setcond_i32:
>          tcg_out_r(s, args[0]);
>          tcg_out_r(s, args[1]);
> -        tcg_out_ri32(s, const_args[2], args[2]);
> +        tcg_out_r(s, args[2]);
>          tcg_out8(s, args[3]);   /* condition */
>          break;
>  #if TCG_TARGET_REG_BITS == 32
> @@ -459,15 +419,15 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>          tcg_out_r(s, args[0]);
>          tcg_out_r(s, args[1]);
>          tcg_out_r(s, args[2]);
> -        tcg_out_ri32(s, const_args[3], args[3]);
> -        tcg_out_ri32(s, const_args[4], args[4]);
> +        tcg_out_r(s, args[3]);
> +        tcg_out_r(s, args[4]);
>          tcg_out8(s, args[5]);   /* condition */
>          break;
>  #elif TCG_TARGET_REG_BITS == 64
>      case INDEX_op_setcond_i64:
>          tcg_out_r(s, args[0]);
>          tcg_out_r(s, args[1]);
> -        tcg_out_ri64(s, const_args[2], args[2]);
> +        tcg_out_r(s, args[2]);
>          tcg_out8(s, args[3]);   /* condition */
>          break;
>  #endif
> @@ -513,8 +473,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>      case INDEX_op_rotl_i32:     /* Optional (TCG_TARGET_HAS_rot_i32). */
>      case INDEX_op_rotr_i32:     /* Optional (TCG_TARGET_HAS_rot_i32). */
>          tcg_out_r(s, args[0]);
> -        tcg_out_ri32(s, const_args[1], args[1]);
> -        tcg_out_ri32(s, const_args[2], args[2]);
> +        tcg_out_r(s, args[1]);
> +        tcg_out_r(s, args[2]);
>          break;
>      case INDEX_op_deposit_i32:  /* Optional (TCG_TARGET_HAS_deposit_i32). */
>          tcg_out_r(s, args[0]);
> @@ -548,8 +508,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>      case INDEX_op_rem_i64:      /* Optional (TCG_TARGET_HAS_div_i64). */
>      case INDEX_op_remu_i64:     /* Optional (TCG_TARGET_HAS_div_i64). */
>          tcg_out_r(s, args[0]);
> -        tcg_out_ri64(s, const_args[1], args[1]);
> -        tcg_out_ri64(s, const_args[2], args[2]);
> +        tcg_out_r(s, args[1]);
> +        tcg_out_r(s, args[2]);
>          break;
>      case INDEX_op_deposit_i64:  /* Optional (TCG_TARGET_HAS_deposit_i64). */
>          tcg_out_r(s, args[0]);
> @@ -562,7 +522,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>          break;
>      case INDEX_op_brcond_i64:
>          tcg_out_r(s, args[0]);
> -        tcg_out_ri64(s, const_args[1], args[1]);
> +        tcg_out_r(s, args[1]);
>          tcg_out8(s, args[2]);           /* condition */
>          tci_out_label(s, arg_label(args[3]));
>          break;
> @@ -596,8 +556,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>      case INDEX_op_rem_i32:      /* Optional (TCG_TARGET_HAS_div_i32). */
>      case INDEX_op_remu_i32:     /* Optional (TCG_TARGET_HAS_div_i32). */
>          tcg_out_r(s, args[0]);
> -        tcg_out_ri32(s, const_args[1], args[1]);
> -        tcg_out_ri32(s, const_args[2], args[2]);
> +        tcg_out_r(s, args[1]);
> +        tcg_out_r(s, args[2]);
>          break;
>  #if TCG_TARGET_REG_BITS == 32
>      case INDEX_op_add2_i32:
> @@ -612,8 +572,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>      case INDEX_op_brcond2_i32:
>          tcg_out_r(s, args[0]);
>          tcg_out_r(s, args[1]);
> -        tcg_out_ri32(s, const_args[2], args[2]);
> -        tcg_out_ri32(s, const_args[3], args[3]);
> +        tcg_out_r(s, args[2]);
> +        tcg_out_r(s, args[3]);
>          tcg_out8(s, args[4]);           /* condition */
>          tci_out_label(s, arg_label(args[5]));
>          break;
> @@ -626,7 +586,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>  #endif
>      case INDEX_op_brcond_i32:
>          tcg_out_r(s, args[0]);
> -        tcg_out_ri32(s, const_args[1], args[1]);
> +        tcg_out_r(s, args[1]);
>          tcg_out8(s, args[2]);           /* condition */
>          tci_out_label(s, arg_label(args[3]));
>          break;


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 04/93] tcg: Manage splitwx in tc_ptr_to_region_tree by hand
  2021-02-04 15:01   ` Alex Bennée
@ 2021-02-04 17:46     ` Richard Henderson
  2021-02-04 18:45       ` Alex Bennée
  0 siblings, 1 reply; 129+ messages in thread
From: Richard Henderson @ 2021-02-04 17:46 UTC (permalink / raw)
  To: Alex Bennée; +Cc: sw, qemu-devel

On 2/4/21 5:01 AM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> The use in tcg_tb_lookup is given a random pc that comes from the pc
>> of a signal handler.  Do not assert that the pointer is already within
>> the code gen buffer at all, much less the writable mirror of it.
> 
> Surely we are asserting that - or at least you can find a rt entry for
> the pointer passed (which we always expect to work)?

What?  No.  The pointer could be anything at all, depending on any other bug
within qemu.


r~


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 29/93] tcg/tci: Remove TCG_CONST
  2021-02-04 15:39   ` Alex Bennée
@ 2021-02-04 17:52     ` Richard Henderson
  2021-02-04 18:48       ` Alex Bennée
  0 siblings, 1 reply; 129+ messages in thread
From: Richard Henderson @ 2021-02-04 17:52 UTC (permalink / raw)
  To: Alex Bennée; +Cc: sw, qemu-devel

On 2/4/21 5:39 AM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> Only allow registers or constants, but not both, in any
>> given position.
> 
> Aren't we switching to all registers (there are no more _i functions
> after this)? I guess you mean the registers can have constants in them?

Ah, bad wording.  I think I was thinking of movi, which would be r,i if it
actually existed in this list (which it doesn't).

Better as

  Restrict all operands to registers.  All constants will be
  forced into registers by the middle-end.

?


r~


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 79/93] tcg/tci: Split out tcg_out_op_{rrm,rrrm,rrrrm}
  2021-02-04 14:54   ` Alex Bennée
@ 2021-02-04 17:54     ` Richard Henderson
  0 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04 17:54 UTC (permalink / raw)
  To: Alex Bennée; +Cc: sw, qemu-devel

On 2/4/21 4:54 AM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  tcg/tci/tcg-target.c.inc | 70
>> ++++++++++++++++++++++++++++++----------
> 
> Hmm duplicate patch tripped up my mail tools, did send-email have a
> brain fart at 79/93?
> 

Send-email borked out at patch 79 and made me think it hadn't been sent.  I
then restarted with 79-93, so 79 got sent twice.


r~


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 04/93] tcg: Manage splitwx in tc_ptr_to_region_tree by hand
  2021-02-04 17:46     ` Richard Henderson
@ 2021-02-04 18:45       ` Alex Bennée
  2021-02-04 19:17         ` Richard Henderson
  0 siblings, 1 reply; 129+ messages in thread
From: Alex Bennée @ 2021-02-04 18:45 UTC (permalink / raw)
  To: Richard Henderson; +Cc: sw, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> On 2/4/21 5:01 AM, Alex Bennée wrote:
>> 
>> Richard Henderson <richard.henderson@linaro.org> writes:
>> 
>>> The use in tcg_tb_lookup is given a random pc that comes from the pc
>>> of a signal handler.  Do not assert that the pointer is already within
>>> the code gen buffer at all, much less the writable mirror of it.
>> 
>> Surely we are asserting that - or at least you can find a rt entry for
>> the pointer passed (which we always expect to work)?
>
> What?  No.  The pointer could be anything at all, depending on any other bug
> within qemu.

But you do assert it:

     struct tcg_region_tree *rt = tc_ptr_to_region_tree(tb->tc.ptr);
 
     g_assert(rt != NULL);

and rt is only NULL when it's !in_code_gen_buffer(p).

In tcg_tb_lookup you haven't removed an assert - you just ensure you
don't fail if it's not.

>
>
> r~


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 29/93] tcg/tci: Remove TCG_CONST
  2021-02-04 17:52     ` Richard Henderson
@ 2021-02-04 18:48       ` Alex Bennée
  0 siblings, 0 replies; 129+ messages in thread
From: Alex Bennée @ 2021-02-04 18:48 UTC (permalink / raw)
  To: Richard Henderson; +Cc: sw, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> On 2/4/21 5:39 AM, Alex Bennée wrote:
>> 
>> Richard Henderson <richard.henderson@linaro.org> writes:
>> 
>>> Only allow registers or constants, but not both, in any
>>> given position.
>> 
>> Aren't we switching to all registers (there are no more _i functions
>> after this)? I guess you mean the registers can have constants in them?
>
> Ah, bad wording.  I think I was thinking of movi, which would be r,i if it
> actually existed in this list (which it doesn't).
>
> Better as
>
>   Restrict all operands to registers.  All constants will be
>   forced into registers by the middle-end.

better thanks..

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

>
> ?
>
>
> r~


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 04/93] tcg: Manage splitwx in tc_ptr_to_region_tree by hand
  2021-02-04 18:45       ` Alex Bennée
@ 2021-02-04 19:17         ` Richard Henderson
  0 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04 19:17 UTC (permalink / raw)
  To: Alex Bennée; +Cc: sw, qemu-devel

On 2/4/21 8:45 AM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> On 2/4/21 5:01 AM, Alex Bennée wrote:
>>>
>>> Richard Henderson <richard.henderson@linaro.org> writes:
>>>
>>>> The use in tcg_tb_lookup is given a random pc that comes from the pc
>>>> of a signal handler.  Do not assert that the pointer is already within
>>>> the code gen buffer at all, much less the writable mirror of it.
>>>
>>> Surely we are asserting that - or at least you can find a rt entry for
>>> the pointer passed (which we always expect to work)?
>>
>> What?  No.  The pointer could be anything at all, depending on any other bug
>> within qemu.
> 
> But you do assert it:
> 
>      struct tcg_region_tree *rt = tc_ptr_to_region_tree(tb->tc.ptr);
>  
>      g_assert(rt != NULL);
> 
> and rt is only NULL when it's !in_code_gen_buffer(p).
> 
> In tcg_tb_lookup you haven't removed an assert - you just ensure you
> don't fail if it's not.

I see your confusion.  The arbitrary pointer is the one that is presented to
tcg_tb_lookup, and passed on to tc_ptr_to_region_tree.

The (dynamic) assert has been removed by not calling tcg_splitwx_to_rw in
tc_ptr_to_region_tree, as called from tcg_tb_lookup.

But tcg_tb_insert and tcg_tb_remove do not receive arbitrary pointers -- they
always get a TranslationBlock.  So I retain an assert along those paths.  (As
opposed to SEGV on the next lines, I suppose.)

Suggestions on how to reword the commit?  Just include most of the above?

r~


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 00/93] TCI fixes and cleanups
  2021-02-04  9:58 ` Peter Maydell
@ 2021-02-04 20:02   ` Stefan Weil
  2021-02-04 20:42     ` Richard Henderson
                       ` (2 more replies)
  0 siblings, 3 replies; 129+ messages in thread
From: Stefan Weil @ 2021-02-04 20:02 UTC (permalink / raw)
  To: Peter Maydell, Richard Henderson; +Cc: QEMU Developers

Am 04.02.21 um 10:58 schrieb Peter Maydell:
> On Thu, 4 Feb 2021 at 01:49, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Almost 7 years ago I detailed 5 major problems in tci[1], of
>> which three still remain:
>>
>>    * Unaligned accesses to the bytecode stream, which means
>>      that we immediately SIGBUS on any host requiring alignment.
>>    * Non-portable calls to helper functions.
>>    * Full of useless ifdefs and TODOs.
>>
>> To my mind, this means the code is unmaintained, despite what it
>> says in MAINTAINERS.  Thus tci *should* be simply removed.
>> However, every time removal is suggested, someone comes out of the
>> woodwork and says we should keep it, because it's useful for $FOO.
> Not listed, but also a problem:
>   * it's a configure-time choice, not a runtime choice


That's the feature which I also desire most. Technically it was not 
possible to have native and interpreted TCG in the same code some years 
ago when I tried to implement this, but that might have changed as TCG 
has evolved a lot. Having TCI as a runtime choice would not only require 
less CI builds but also allow new use cases for TCG testing.

I also agree with most of Richard's list of problems and appreciate the 
efforts to fix them.

I disagree on the #ifdefs which can help to understand TCG better in my 
opinion, so for me they have a useful side effect of being also 
documentation.

Most of the problems which you named above were already mentioned in the 
README for TCI from 2011.

Nevertheless I had the impression that TCI was "good enough" for those 
who used it, and it was sufficient to fix a newly used TCG opcode which 
triggered a TODO assertion from time to time. Obviously I underestimated 
Richard's desire to have a 100 % working TCI. I am sorry for that.

Technically the patch series looks reasonable. I like especially the 
disassembler. Is there a Git repository which makes pulling all changes 
easier? It would also help if the patches which were already reviewed 
were already merged in qemu master.

Regarding misaligned bytecode access, there exist two solutions. We 
could either use code which handles that correctly (I had sent a patch 
using memcpy two years ago and recently sent a V2 which uses the QEMU 
standard functions for that). Or we can align the data like it is done 
in Richard's patches. For me it is not obvious which one is better. 
While a single access is faster for aligned data, this might be 
different for sequential access on misaligned data which might profit 
from better caching of smaller bytecode.

Kind regards,

Stefan





^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 00/93] TCI fixes and cleanups
  2021-02-04 20:02   ` Stefan Weil
@ 2021-02-04 20:42     ` Richard Henderson
  2021-02-04 23:52     ` Richard Henderson
  2021-02-05  2:39     ` Richard Henderson
  2 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04 20:42 UTC (permalink / raw)
  To: Stefan Weil, Peter Maydell; +Cc: QEMU Developers

On 2/4/21 10:02 AM, Stefan Weil wrote:
> Is there a Git repository which makes pulling all changes easier?

https://gitlab.com/rth7680/qemu/-/tree/tci-next

> Regarding misaligned bytecode access, there exist two solutions. We could
> either use code which handles that correctly (I had sent a patch using memcpy
> two years ago and recently sent a V2 which uses the QEMU standard functions for
> that). Or we can align the data like it is done in Richard's patches. For me it
> is not obvious which one is better.

I think it is obvious.  If a host requires aligned access, a single aligned
load requires only one instruction, and an unaligned access requires lots.
E.g., for sparc,

int foo(void *p)
{
    int x;
    __builtin_memcpy(&x, p, 4);
    return x;
}

	ldub	[%i0], %g3
	ldub	[%i0+1], %g2
	stb	%g3, [%fp+2043]
	stb	%g2, [%fp+2044]
	ldub	[%i0+2], %g3
	ldub	[%i0+3], %g2
	stb	%g3, [%fp+2045]
	stb	%g2, [%fp+2046]
	ldsw	[%fp+2043], %i0

Such unaligned accesses are *really* slow.

> While a single access is faster for aligned
> data, this might be different for sequential access on misaligned data which
> might profit from better caching of smaller bytecode.

I believe you'll find that the rewrite encoding is smaller already.


r~


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 00/93] TCI fixes and cleanups
  2021-02-04 20:02   ` Stefan Weil
  2021-02-04 20:42     ` Richard Henderson
@ 2021-02-04 23:52     ` Richard Henderson
  2021-02-05  2:39     ` Richard Henderson
  2 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-04 23:52 UTC (permalink / raw)
  To: Stefan Weil, Peter Maydell; +Cc: QEMU Developers

On 2/4/21 10:02 AM, Stefan Weil wrote:
> Am 04.02.21 um 10:58 schrieb Peter Maydell:
>> Not listed, but also a problem:
>>   * it's a configure-time choice, not a runtime choice
> 
> That's the feature which I also desire most.

Well... that depends on how you see tci being most used.

If, like John Glaubitz, you want to use tci for odd systems that don't have a
native backend, then a configure-time choice is appropriate.

If you're trying to use tci as a cross-check to the native tcg backend, then a
command-line choice could be appropriate.  With the bits done below, it could
be just two extra files compiled, which I suppose is light-weight enough.

If you're trying to use tci as a true interpreter for a single-use TB... then
nothing about tci in its current form is appropriate.  Of course, I also
believe that our single-use TBs should not be single-use -- because of mttcg,
we can no longer reuse the memory from them, and therefore we might as well
just keep them around, properly hashed.


> Technically it was not possible to
> have native and interpreted TCG in the same code some years ago when I tried to
> implement this, but that might have changed as TCG has evolved a lot.

It is still not possible.

However.  There are a number of things that I would like to change about tcg
that would make that more practical.

In particular, and most importantly, tcg should be sufficiently partitioned
from any target that it should be built once.

  * There are a number of constants that tcg gets from
    target/foo/cpu.h as defines.  We could just as easily
    receive those constants via some structure.  The code
    that is generated in the end should be the same.

    This is a prerequisite to having multiple guest cpus
    at the same time.  The canonical example of course are
    the Xilinx boards that have both Arm and MicroBlaze cpus.

  * There are a number of constants that target/ gets from
    tcg, but shouldn't.  These are mistakenly used by arm
    and tricore (probably by my hand), and represent premature
    optimization in the front end in both cases, IMO.
    (Even though I'm probably guilty of all instances.)

    We'll know we've got all of these fixed when
      touch tcg/i386/tcg-target.h
    rebuilds no files outside of tcg/.

    Having these constants in some structure is a
    prerequisite to having a native tcg backend live
    along side any kind of tci, including the pure
    interpreter form.

> I disagree on the #ifdefs which can help to understand TCG better in my
> opinion, so for me they have a useful side effect of being also documentation.

Surely not.  Documentation is documentation; ifdefs are clutter, to be used
only when there is no alternative.

I think what you actually want is the structure described above, where all of
the TCG_TARGET_HAS_* knobs are data, and you can change them ad hoc -- even
from within gdb.


r~


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 00/93] TCI fixes and cleanups
  2021-02-04 20:02   ` Stefan Weil
  2021-02-04 20:42     ` Richard Henderson
  2021-02-04 23:52     ` Richard Henderson
@ 2021-02-05  2:39     ` Richard Henderson
  2 siblings, 0 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-05  2:39 UTC (permalink / raw)
  To: Stefan Weil, Peter Maydell; +Cc: QEMU Developers

On 2/4/21 10:02 AM, Stefan Weil wrote:
> It would also help if the patches which were already reviewed were already
> merged in qemu master.

I'll queue the ones that have been reviewed to tcg-next.
That'll get this lot down into the 60's.  :-)


r~


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls
  2021-02-04  1:44 ` [PATCH v2 63/93] tcg/tci: Use ffi for calls Richard Henderson
@ 2021-02-07 16:25   ` Stefan Weil
  2021-02-07 17:39     ` Richard Henderson
  0 siblings, 1 reply; 129+ messages in thread
From: Stefan Weil @ 2021-02-07 16:25 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Am 04.02.21 um 02:44 schrieb Richard Henderson:

> This requires adjusting where arguments are stored.
> Place them on the stack at left-aligned positions.
> Adjust the stack frame to be at entirely positive offsets.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
[...]
> diff --git a/tcg/tci.c b/tcg/tci.c
> index 6843e837ae..d27db9f720 100644
> --- a/tcg/tci.c
> +++ b/tcg/tci.c
> @@ -18,6 +18,13 @@
>    */
>   
>   #include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include "tcg/tcg.h"           /* MAX_OPC_PARAM_IARGS */
> +#include "exec/cpu_ldst.h"
> +#include "tcg/tcg-op.h"
> +#include "qemu/compiler.h"
> +#include <ffi.h>
> +


ffi.h is not found on macOS with Homebrew.

This can be fixed by using pkg-config to find the right compiler (and 
maybe also linker) flags:

% pkg-config --cflags libffi
-I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/ffi
% pkg-config --libs libffi
-lffi

Regards,

Stefan




^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls
  2021-02-07 16:25   ` Stefan Weil
@ 2021-02-07 17:39     ` Richard Henderson
  2021-02-07 19:52       ` Peter Maydell
  0 siblings, 1 reply; 129+ messages in thread
From: Richard Henderson @ 2021-02-07 17:39 UTC (permalink / raw)
  To: Stefan Weil, qemu-devel

On 2/7/21 8:25 AM, Stefan Weil wrote:
>> +#include "qemu-common.h"
>> +#include "tcg/tcg.h"           /* MAX_OPC_PARAM_IARGS */
>> +#include "exec/cpu_ldst.h"
>> +#include "tcg/tcg-op.h"
>> +#include "qemu/compiler.h"
>> +#include <ffi.h>
>> +
> 
> 
> ffi.h is not found on macOS with Homebrew.
> 
> This can be fixed by using pkg-config to find the right compiler (and maybe
> also linker) flags:
> 
> % pkg-config --cflags libffi
> -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/ffi
> % pkg-config --libs libffi
> -lffi


Which is exactly what I do in the previous patch:


> +++ b/meson.build
> @@ -1901,7 +1901,14 @@ specific_ss.add(when: 'CONFIG_TCG', if_true: files(
>    'tcg/tcg-op.c',
>    'tcg/tcg.c',
>  ))
> -specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: files('tcg/tci.c'))
> +
> +if get_option('tcg_interpreter')
> +  libffi = dependency('libffi', version: '>=3.0',
> +                      static: enable_static, method: 'pkg-config',
> +                      required: true)
> +  specific_ss.add(libffi)
> +  specific_ss.add(files('tcg/tci.c'))
> +endif

Did you need a PKG_CONFIG_LIBDIR set for homebrew?


r~


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls
  2021-02-07 17:39     ` Richard Henderson
@ 2021-02-07 19:52       ` Peter Maydell
  2021-02-07 20:12         ` Richard Henderson
  0 siblings, 1 reply; 129+ messages in thread
From: Peter Maydell @ 2021-02-07 19:52 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Stefan Weil, QEMU Developers

On Sun, 7 Feb 2021 at 17:41, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 2/7/21 8:25 AM, Stefan Weil wrote:
> >> +#include "qemu-common.h"
> >> +#include "tcg/tcg.h"           /* MAX_OPC_PARAM_IARGS */
> >> +#include "exec/cpu_ldst.h"
> >> +#include "tcg/tcg-op.h"
> >> +#include "qemu/compiler.h"
> >> +#include <ffi.h>
> >> +
> >
> >
> > ffi.h is not found on macOS with Homebrew.
> >
> > This can be fixed by using pkg-config to find the right compiler (and maybe
> > also linker) flags:
> >
> > % pkg-config --cflags libffi
> > -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/ffi
> > % pkg-config --libs libffi
> > -lffi
>
>
> Which is exactly what I do in the previous patch:
>
>
> > +++ b/meson.build
> > @@ -1901,7 +1901,14 @@ specific_ss.add(when: 'CONFIG_TCG', if_true: files(
> >    'tcg/tcg-op.c',
> >    'tcg/tcg.c',
> >  ))
> > -specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: files('tcg/tci.c'))
> > +
> > +if get_option('tcg_interpreter')
> > +  libffi = dependency('libffi', version: '>=3.0',
> > +                      static: enable_static, method: 'pkg-config',
> > +                      required: true)
> > +  specific_ss.add(libffi)
> > +  specific_ss.add(files('tcg/tci.c'))
> > +endif
>
> Did you need a PKG_CONFIG_LIBDIR set for homebrew?

Is this the "meson doesn't actually add the cflags everywhere it should"
bug again ?

thanks
-- PMM


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls
  2021-02-07 19:52       ` Peter Maydell
@ 2021-02-07 20:12         ` Richard Henderson
  2021-02-07 21:33           ` Stefan Weil
  2021-02-08  9:20           ` Peter Maydell
  0 siblings, 2 replies; 129+ messages in thread
From: Richard Henderson @ 2021-02-07 20:12 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Stefan Weil, QEMU Developers

On 2/7/21 11:52 AM, Peter Maydell wrote:
> On Sun, 7 Feb 2021 at 17:41, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> On 2/7/21 8:25 AM, Stefan Weil wrote:
>>>> +#include "qemu-common.h"
>>>> +#include "tcg/tcg.h"           /* MAX_OPC_PARAM_IARGS */
>>>> +#include "exec/cpu_ldst.h"
>>>> +#include "tcg/tcg-op.h"
>>>> +#include "qemu/compiler.h"
>>>> +#include <ffi.h>
>>>> +
>>>
>>>
>>> ffi.h is not found on macOS with Homebrew.
>>>
>>> This can be fixed by using pkg-config to find the right compiler (and maybe
>>> also linker) flags:
>>>
>>> % pkg-config --cflags libffi
>>> -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/ffi
>>> % pkg-config --libs libffi
>>> -lffi
>>
>>
>> Which is exactly what I do in the previous patch:
>>
>>
>>> +++ b/meson.build
>>> @@ -1901,7 +1901,14 @@ specific_ss.add(when: 'CONFIG_TCG', if_true: files(
>>>    'tcg/tcg-op.c',
>>>    'tcg/tcg.c',
>>>  ))
>>> -specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: files('tcg/tci.c'))
>>> +
>>> +if get_option('tcg_interpreter')
>>> +  libffi = dependency('libffi', version: '>=3.0',
>>> +                      static: enable_static, method: 'pkg-config',
>>> +                      required: true)
>>> +  specific_ss.add(libffi)
>>> +  specific_ss.add(files('tcg/tci.c'))
>>> +endif
>>
>> Did you need a PKG_CONFIG_LIBDIR set for homebrew?
> 
> Is this the "meson doesn't actually add the cflags everywhere it should"
> bug again ?

I guess so.  I realized after sending this reply that PKG_CONFIG_LIBDIR can't
be the answer, since the original configure should have failed if pkg-config
didn't find ffi.

Was there a resolution to said meson bug?


r~


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls
  2021-02-07 20:12         ` Richard Henderson
@ 2021-02-07 21:33           ` Stefan Weil
  2021-02-08  9:20           ` Peter Maydell
  1 sibling, 0 replies; 129+ messages in thread
From: Stefan Weil @ 2021-02-07 21:33 UTC (permalink / raw)
  To: Richard Henderson, Peter Maydell; +Cc: QEMU Developers

On 07.02.21 21:12, Richard Henderson wrote:
> On 2/7/21 11:52 AM, Peter Maydell wrote:
>> On Sun, 7 Feb 2021 at 17:41, Richard Henderson
>> <richard.henderson@linaro.org> wrote:
>>>
>>> On 2/7/21 8:25 AM, Stefan Weil wrote:
>>>>> +#include "qemu-common.h"
>>>>> +#include "tcg/tcg.h"           /* MAX_OPC_PARAM_IARGS */
>>>>> +#include "exec/cpu_ldst.h"
>>>>> +#include "tcg/tcg-op.h"
>>>>> +#include "qemu/compiler.h"
>>>>> +#include <ffi.h>
>>>>> +
>>>>
>>>>
>>>> ffi.h is not found on macOS with Homebrew.
>>>>
>>>> This can be fixed by using pkg-config to find the right compiler (and maybe
>>>> also linker) flags:
>>>>
>>>> % pkg-config --cflags libffi
>>>> -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/ffi
>>>> % pkg-config --libs libffi
>>>> -lffi
>>>
>>>
>>> Which is exactly what I do in the previous patch:
>>>
>>>
>>>> +++ b/meson.build
>>>> @@ -1901,7 +1901,14 @@ specific_ss.add(when: 'CONFIG_TCG', if_true: files(
>>>>    'tcg/tcg-op.c',
>>>>    'tcg/tcg.c',
>>>>  ))
>>>> -specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: files('tcg/tci.c'))
>>>> +
>>>> +if get_option('tcg_interpreter')
>>>> +  libffi = dependency('libffi', version: '>=3.0',
>>>> +                      static: enable_static, method: 'pkg-config',
>>>> +                      required: true)
>>>> +  specific_ss.add(libffi)
>>>> +  specific_ss.add(files('tcg/tci.c'))
>>>> +endif
>>>
>>> Did you need a PKG_CONFIG_LIBDIR set for homebrew?
>>
>> Is this the "meson doesn't actually add the cflags everywhere it should"
>> bug again ?
> 
> I guess so.  I realized after sending this reply that PKG_CONFIG_LIBDIR can't
> be the answer, since the original configure should have failed if pkg-config
> didn't find ffi.
> 
> Was there a resolution to said meson bug?

Meanwhile I noticed an additional detail:

There exist two different pkg-config configurations for libffi on Homebrew:

% pkg-config --cflags libffi
-I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/ffi
% export PKG_CONFIG_PATH="/opt/homebrew/opt/libffi/lib/pkgconfig"
% pkg-config --cflags libffi
-I/opt/homebrew/Cellar/libffi/3.3_2/include

By default it points to a system directory which does not exist at all
on my Mac, so that will never work.

With the right PKG_CONFIG_PATH a correct include directory is set, and
the latest rebased tci-next branch now works for me with a compiler warning:

/opt/homebrew/Cellar/libffi/3.3_2/include/ffi.h:441:5: warning:
'FFI_GO_CLOSURES' is not defined, evaluates to 0 [-Wundef]

Stefan


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls
  2021-02-07 20:12         ` Richard Henderson
  2021-02-07 21:33           ` Stefan Weil
@ 2021-02-08  9:20           ` Peter Maydell
  2021-02-08  9:35             ` Paolo Bonzini
  1 sibling, 1 reply; 129+ messages in thread
From: Peter Maydell @ 2021-02-08  9:20 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Stefan Weil, QEMU Developers, Paolo Bonzini

On Sun, 7 Feb 2021 at 20:12, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 2/7/21 11:52 AM, Peter Maydell wrote:
> > On Sun, 7 Feb 2021 at 17:41, Richard Henderson
> > <richard.henderson@linaro.org> wrote:
> >>
> >> On 2/7/21 8:25 AM, Stefan Weil wrote:
> >>>> +#include "qemu-common.h"
> >>>> +#include "tcg/tcg.h"           /* MAX_OPC_PARAM_IARGS */
> >>>> +#include "exec/cpu_ldst.h"
> >>>> +#include "tcg/tcg-op.h"
> >>>> +#include "qemu/compiler.h"
> >>>> +#include <ffi.h>
> >>>> +
> >>>
> >>>
> >>> ffi.h is not found on macOS with Homebrew.
> >>>
> >>> This can be fixed by using pkg-config to find the right compiler (and maybe
> >>> also linker) flags:
> >>>
> >>> % pkg-config --cflags libffi
> >>> -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/ffi
> >>> % pkg-config --libs libffi
> >>> -lffi
> >>
> >>
> >> Which is exactly what I do in the previous patch:
> >>
> >>
> >>> +++ b/meson.build
> >>> @@ -1901,7 +1901,14 @@ specific_ss.add(when: 'CONFIG_TCG', if_true: files(
> >>>    'tcg/tcg-op.c',
> >>>    'tcg/tcg.c',
> >>>  ))
> >>> -specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: files('tcg/tci.c'))
> >>> +
> >>> +if get_option('tcg_interpreter')
> >>> +  libffi = dependency('libffi', version: '>=3.0',
> >>> +                      static: enable_static, method: 'pkg-config',
> >>> +                      required: true)
> >>> +  specific_ss.add(libffi)
> >>> +  specific_ss.add(files('tcg/tci.c'))
> >>> +endif
> >>
> >> Did you need a PKG_CONFIG_LIBDIR set for homebrew?
> >
> > Is this the "meson doesn't actually add the cflags everywhere it should"
> > bug again ?
>
> I guess so.  I realized after sending this reply that PKG_CONFIG_LIBDIR can't
> be the answer, since the original configure should have failed if pkg-config
> didn't find ffi.
>
> Was there a resolution to said meson bug?

There's a workaround involving adding the library to the dependencies
list in the right places -- commit 3eacf70bb5a83e4 did this for gnutls.
Paolo may be able to help further.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls
  2021-02-08  9:20           ` Peter Maydell
@ 2021-02-08  9:35             ` Paolo Bonzini
  2021-02-08 13:07               ` Stefan Weil
  0 siblings, 1 reply; 129+ messages in thread
From: Paolo Bonzini @ 2021-02-08  9:35 UTC (permalink / raw)
  To: Peter Maydell, Richard Henderson; +Cc: Stefan Weil, QEMU Developers

On 08/02/21 10:20, Peter Maydell wrote:
>>>> +
>>>> +if get_option('tcg_interpreter')
>>>> +  libffi = dependency('libffi', version: '>=3.0',
>>>> +                      static: enable_static, method: 'pkg-config',
>>>> +                      required: true)
>>>> +  specific_ss.add(libffi)
>>>> +  specific_ss.add(files('tcg/tci.c'))
>>>> +endif
>>> Did you need a PKG_CONFIG_LIBDIR set for homebrew?
>> Is this the "meson doesn't actually add the cflags everywhere it should"
>> bug again ?

No, it shouldn't be the same bug.  In this case the dependency is not 
indirect dependency, specific_ss is included directly:

   target_specific = specific_ss.apply(config_target, strict: false)
   arch_srcs += target_specific.sources()
   arch_deps += target_specific.dependencies()

   lib = static_library('qemu-' + target,
                  sources: arch_srcs + genh,
                  dependencies: arch_deps,
                  objects: objects,
                  include_directories: target_inc,
                  c_args: c_args,
                  build_by_default: false,
                  name_suffix: 'fa')

It's more likely to be what Stefan pointed out later:

> Meanwhile I noticed an additional detail:
> 
> There exist two different pkg-config configurations for libffi on Homebrew:
> 
> % pkg-config --cflags libffi
> -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/ffi
> % export PKG_CONFIG_PATH="/opt/homebrew/opt/libffi/lib/pkgconfig"
> % pkg-config --cflags libffi
> -I/opt/homebrew/Cellar/libffi/3.3_2/include
>
> By default it points to a system directory which does not exist at all
> on my Mac, so that will never work.

Paolo



^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls
  2021-02-08  9:35             ` Paolo Bonzini
@ 2021-02-08 13:07               ` Stefan Weil
  2021-02-08 17:39                 ` Richard Henderson
  0 siblings, 1 reply; 129+ messages in thread
From: Stefan Weil @ 2021-02-08 13:07 UTC (permalink / raw)
  To: Paolo Bonzini, Peter Maydell, Richard Henderson; +Cc: QEMU Developers

Am 08.02.21 um 10:35 schrieb Paolo Bonzini:

> On 08/02/21 10:20, Peter Maydell wrote:
>>>>> +
>>>>> +if get_option('tcg_interpreter')
>>>>> +  libffi = dependency('libffi', version: '>=3.0',
>>>>> +                      static: enable_static, method: 'pkg-config',
>>>>> +                      required: true)
>>>>> +  specific_ss.add(libffi)
>>>>> +  specific_ss.add(files('tcg/tci.c'))
>>>>> +endif
>>>> Did you need a PKG_CONFIG_LIBDIR set for homebrew?
>>> Is this the "meson doesn't actually add the cflags everywhere it 
>>> should"
>>> bug again ?
>
> No, it shouldn't be the same bug.  In this case the dependency is not 
> indirect dependency, specific_ss is included directly:
>
>   target_specific = specific_ss.apply(config_target, strict: false)
>   arch_srcs += target_specific.sources()
>   arch_deps += target_specific.dependencies()
>
>   lib = static_library('qemu-' + target,
>                  sources: arch_srcs + genh,
>                  dependencies: arch_deps,
>                  objects: objects,
>                  include_directories: target_inc,
>                  c_args: c_args,
>                  build_by_default: false,
>                  name_suffix: 'fa')
>
> It's more likely to be what Stefan pointed out later:
>
>> Meanwhile I noticed an additional detail:
>>
>> There exist two different pkg-config configurations for libffi on 
>> Homebrew:
>>
>> % pkg-config --cflags libffi
>> -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/ffi
>> % export PKG_CONFIG_PATH="/opt/homebrew/opt/libffi/lib/pkgconfig"
>> % pkg-config --cflags libffi
>> -I/opt/homebrew/Cellar/libffi/3.3_2/include
>>
>> By default it points to a system directory which does not exist at all
>> on my Mac, so that will never work.
>
> Paolo


Yes, it looks like setting the right PKG_CONFIG_PATH is sufficient for 
builds with Homebrew on macOS.

Richard, this commit is also the one which breaks qemu-system-i386 on 
sparc64 for me:

sw@gcc102:~/src/gitlab/qemu-project/qemu$ git bisect good
115a01c323e6a01902894ec23ba704bf3dc8215a is the first bad commit
commit 115a01c323e6a01902894ec23ba704bf3dc8215a
Author: Richard Henderson <richard.henderson@linaro.org>
Date:   Sat Jan 30 14:24:25 2021 -0800

     tcg/tci: Use ffi for calls

     This requires adjusting where arguments are stored.
     Place them on the stack at left-aligned positions.
     Adjust the stack frame to be at entirely positive offsets.

     Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

  include/tcg/tcg.h        |   1 +
  tcg/tcg.c                |  72 +++++++++++++++-----------
  tcg/tci.c                | 131 
++++++++++++++++++++++++++---------------------
  tcg/tci/tcg-target.c.inc |  50 +++++++++---------
  tcg/tci/tcg-target.h     |   2 +-
  5 files changed, 143 insertions(+), 113 deletions(-)

Regards,

Stefan




^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls
  2021-02-08 13:07               ` Stefan Weil
@ 2021-02-08 17:39                 ` Richard Henderson
  2021-02-08 19:04                   ` Stefan Weil
  0 siblings, 1 reply; 129+ messages in thread
From: Richard Henderson @ 2021-02-08 17:39 UTC (permalink / raw)
  To: Stefan Weil, Paolo Bonzini, Peter Maydell; +Cc: QEMU Developers

On 2/8/21 5:07 AM, Stefan Weil wrote:
> Richard, this commit is also the one which breaks qemu-system-i386 on sparc64
> for me:

You'll have to give me more details than that, because qemu-system-i386 works
for me on a niagara5 w/ debian sid.


r~


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls
  2021-02-08 17:39                 ` Richard Henderson
@ 2021-02-08 19:04                   ` Stefan Weil
  2021-02-08 22:55                     ` Richard Henderson
  0 siblings, 1 reply; 129+ messages in thread
From: Stefan Weil @ 2021-02-08 19:04 UTC (permalink / raw)
  To: Richard Henderson, Paolo Bonzini, Peter Maydell; +Cc: QEMU Developers


Am 08.02.21 um 18:39 schrieb Richard Henderson:
> On 2/8/21 5:07 AM, Stefan Weil wrote:
>> Richard, this commit is also the one which breaks qemu-system-i386 on sparc64
>> for me:
> You'll have to give me more details than that, because qemu-system-i386 works
> for me on a niagara5 w/ debian sid.


I am testing on a similar Debian system (debian-ports unstable), but 
with a Niagara3 cpu:

Linux gcc102.fsffrance.org 5.10.0-3-sparc64-smp #1 SMP Debian 5.10.12-1 
(2021-01-30) sparc64 GNU/Linux

gcc (Debian 10.2.1-6) 10.2.1 20210110

$ cat /proc/cpuinfo
cpu        : UltraSparc T3 (Niagara3)
fpu        : UltraSparc T3 integrated FPU
pmu        : niagara3
prom        : OBP 4.34.6.c 2017/03/22 13:55
type        : sun4v
ncpus probed    : 256
ncpus active    : 256
D$ parity tl1    : 0
I$ parity tl1    : 0
cpucaps        : 
flush,stbar,swap,muldiv,v9,blkinit,n2,mul32,div32,v8plus,popc,vis,vis2,ASIBlkInit,fmaf,vis3,hpc
[...]

During "git bisect" I had to apply fixes for unaligned memory access as 
your series starts with TCI code which does not include my patch for 
that. The first bisect step was tested with your tci-next branch, the 
last step was tested with my bisect-tci branch 
(https://gitlab.com/stweil/qemu/-/commits/bisect-tci).

Stefan





^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls
  2021-02-08 19:04                   ` Stefan Weil
@ 2021-02-08 22:55                     ` Richard Henderson
  2021-02-09 20:46                       ` Richard Henderson
  0 siblings, 1 reply; 129+ messages in thread
From: Richard Henderson @ 2021-02-08 22:55 UTC (permalink / raw)
  To: Stefan Weil, Paolo Bonzini, Peter Maydell; +Cc: QEMU Developers

On 2/8/21 11:04 AM, Stefan Weil wrote:
> 
> Am 08.02.21 um 18:39 schrieb Richard Henderson:
>> On 2/8/21 5:07 AM, Stefan Weil wrote:
>>> Richard, this commit is also the one which breaks qemu-system-i386 on sparc64
>>> for me:
>> You'll have to give me more details than that, because qemu-system-i386 works
>> for me on a niagara5 w/ debian sid.
> 
> 
> I am testing on a similar Debian system (debian-ports unstable), but with a
> Niagara3 cpu:
> 
> Linux gcc102.fsffrance.org 5.10.0-3-sparc64-smp #1 SMP Debian 5.10.12-1
> (2021-01-30) sparc64 GNU/Linux
> 
> gcc (Debian 10.2.1-6) 10.2.1 20210110
> 
> $ cat /proc/cpuinfo
> cpu        : UltraSparc T3 (Niagara3)
> fpu        : UltraSparc T3 integrated FPU
> pmu        : niagara3
> prom        : OBP 4.34.6.c 2017/03/22 13:55
> type        : sun4v
> ncpus probed    : 256
> ncpus active    : 256
> D$ parity tl1    : 0
> I$ parity tl1    : 0
> cpucaps        :
> flush,stbar,swap,muldiv,v9,blkinit,n2,mul32,div32,v8plus,popc,vis,vis2,ASIBlkInit,fmaf,vis3,hpc
> 

Ok, I've reproduced something on a T3 (gcc102.fsffrance.org).
Running the same code side-by-side vs the T5, I get different results.

I'll see if I can track down the difference, since they're both running the
same base os.


r~


^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls
  2021-02-08 22:55                     ` Richard Henderson
@ 2021-02-09 20:46                       ` Richard Henderson
  2021-02-09 21:15                         ` Stefan Weil
  0 siblings, 1 reply; 129+ messages in thread
From: Richard Henderson @ 2021-02-09 20:46 UTC (permalink / raw)
  To: Stefan Weil, Paolo Bonzini, Peter Maydell; +Cc: QEMU Developers

[-- Attachment #1: Type: text/plain, Size: 353 bytes --]

On 2/8/21 2:55 PM, Richard Henderson wrote:
> Ok, I've reproduced something on a T3 (gcc102.fsffrance.org).
> Running the same code side-by-side vs the T5, I get different results.

Brown paper bag time: the T5 build dir lost the --enable-tcg-interpreter flag,
so was testing tcg native.

Big-endian bug wrt an odd api wart in libffi.  Fixed thus.


r~

[-- Attachment #2: ffi.patch --]
[-- Type: text/x-patch, Size: 893 bytes --]

diff --git a/tcg/tci.c b/tcg/tci.c
index d27db9f720..dd0cca296a 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -557,8 +557,15 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             case 0: /* void */
                 break;
             case 1: /* uint32_t */
-                regs[TCG_REG_R0] = *(uint32_t *)stack;
-                break;
+                /*
+                 * Note that libffi has an odd special case in that it will
+                 * always widen an integral result to ffi_arg.
+                 */
+                if (sizeof(ffi_arg) == 4) {
+                    regs[TCG_REG_R0] = *(uint32_t *)stack;
+                    break;
+                }
+                /* fall through */
             case 2: /* uint64_t */
                 if (TCG_TARGET_REG_BITS == 32) {
                     tci_write_reg64(regs, TCG_REG_R1, TCG_REG_R0, stack[0]);

^ permalink raw reply related	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls
  2021-02-09 20:46                       ` Richard Henderson
@ 2021-02-09 21:15                         ` Stefan Weil
  2021-02-09 21:54                           ` Stefan Weil
  0 siblings, 1 reply; 129+ messages in thread
From: Stefan Weil @ 2021-02-09 21:15 UTC (permalink / raw)
  To: Richard Henderson, Paolo Bonzini, Peter Maydell; +Cc: QEMU Developers

Am 09.02.21 um 21:46 schrieb Richard Henderson:

> On 2/8/21 2:55 PM, Richard Henderson wrote:
>> Ok, I've reproduced something on a T3 (gcc102.fsffrance.org).
>> Running the same code side-by-side vs the T5, I get different results.
> Brown paper bag time: the T5 build dir lost the --enable-tcg-interpreter flag,
> so was testing tcg native.
>
> Big-endian bug wrt an odd api wart in libffi.  Fixed thus.


Thanks for solving this. The patch works for me.

BIOS boot time with qemu-system-i386 is about 41 s (with my code which 
lacks thread support and ffi it was 40 s).

With qemu-system-x86_64 it is twice as fast, so it looks like in my last 
report where I said that the new code had doubled the speed I compared 
different system emulations.

Apropos "slow" TCI: on Apple's M1 it is faster than native TCG on most 
of my Intel / AMD machines. And it works with current git master while 
native TCG still waits for pending patches which fix the memory access.

Stefan





^ permalink raw reply	[flat|nested] 129+ messages in thread

* Re: [PATCH v2 63/93] tcg/tci: Use ffi for calls
  2021-02-09 21:15                         ` Stefan Weil
@ 2021-02-09 21:54                           ` Stefan Weil
  0 siblings, 0 replies; 129+ messages in thread
From: Stefan Weil @ 2021-02-09 21:54 UTC (permalink / raw)
  To: Richard Henderson, Paolo Bonzini, Peter Maydell; +Cc: QEMU Developers

Am 09.02.21 um 22:15 schrieb Stefan Weil:

>
> Thanks for solving this. The patch works for me.
>
> BIOS boot time with qemu-system-i386 is about 41 s (with my code which 
> lacks thread support and ffi it was 40 s).
>
> With qemu-system-x86_64 it is twice as fast, so it looks like in my 
> last report where I said that the new code had doubled the speed I 
> compared different system emulations.


Update: with Richard's latest tci-next branch which includes the fixed 
code both qemu-system-x86_64 and qemu-system-i386 require about 20 s 
user time for the BIOS boot.

Stefan




^ permalink raw reply	[flat|nested] 129+ messages in thread

end of thread, other threads:[~2021-02-09 21:57 UTC | newest]

Thread overview: 129+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-04  1:43 [PATCH v2 00/93] TCI fixes and cleanups Richard Henderson
2021-02-04  1:43 ` [PATCH v2 01/93] gdbstub: Fix handle_query_xfer_auxv Richard Henderson
2021-02-04  1:43 ` [PATCH v2 02/93] tcg: Split out tcg_raise_tb_overflow Richard Henderson
2021-02-04  1:43 ` [PATCH v2 03/93] configure: Fix --enable-tcg-interpreter Richard Henderson
2021-02-04  1:43 ` [PATCH v2 04/93] tcg: Manage splitwx in tc_ptr_to_region_tree by hand Richard Henderson
2021-02-04 15:01   ` Alex Bennée
2021-02-04 17:46     ` Richard Henderson
2021-02-04 18:45       ` Alex Bennée
2021-02-04 19:17         ` Richard Henderson
2021-02-04  1:43 ` [PATCH v2 05/93] tcg/tci: Make tci_tb_ptr thread-local Richard Henderson
2021-02-04 15:05   ` Alex Bennée
2021-02-04  1:43 ` [PATCH v2 06/93] tcg/tci: Implement INDEX_op_ld16s_i32 Richard Henderson
2021-02-04  1:43 ` [PATCH v2 07/93] tcg/tci: Implement INDEX_op_ld8s_i64 Richard Henderson
2021-02-04  1:43 ` [PATCH v2 08/93] tcg/tci: Inline tci_write_reg32s into the only caller Richard Henderson
2021-02-04  1:43 ` [PATCH v2 09/93] tcg/tci: Inline tci_write_reg8 into its callers Richard Henderson
2021-02-04  1:43 ` [PATCH v2 10/93] tcg/tci: Inline tci_write_reg16 into the only caller Richard Henderson
2021-02-04  1:43 ` [PATCH v2 11/93] tcg/tci: Inline tci_write_reg32 into all callers Richard Henderson
2021-02-04  1:43 ` [PATCH v2 12/93] tcg/tci: Inline tci_write_reg64 into 64-bit callers Richard Henderson
2021-02-04  1:43 ` [PATCH v2 13/93] tcg/tci: Merge INDEX_op_ld8u_{i32,i64} Richard Henderson
2021-02-04  1:43 ` [PATCH v2 14/93] tcg/tci: Merge INDEX_op_ld8s_{i32,i64} Richard Henderson
2021-02-04  1:43 ` [PATCH v2 15/93] tcg/tci: Merge INDEX_op_ld16u_{i32,i64} Richard Henderson
2021-02-04  1:43 ` [PATCH v2 16/93] tcg/tci: Merge INDEX_op_ld16s_{i32,i64} Richard Henderson
2021-02-04  1:43 ` [PATCH v2 17/93] tcg/tci: Merge INDEX_op_{ld_i32,ld32u_i64} Richard Henderson
2021-02-04  1:43 ` [PATCH v2 18/93] tcg/tci: Merge INDEX_op_st8_{i32,i64} Richard Henderson
2021-02-04  1:43 ` [PATCH v2 19/93] tcg/tci: Merge INDEX_op_st16_{i32,i64} Richard Henderson
2021-02-04  1:43 ` [PATCH v2 20/93] tcg/tci: Move stack bounds check to compile-time Richard Henderson
2021-02-04  1:43 ` [PATCH v2 21/93] tcg/tci: Merge INDEX_op_{st_i32,st32_i64} Richard Henderson
2021-02-04  1:43 ` [PATCH v2 22/93] tcg/tci: Use g_assert_not_reached Richard Henderson
2021-02-04  1:43 ` [PATCH v2 23/93] tcg/tci: Remove dead code for TCG_TARGET_HAS_div2_* Richard Henderson
2021-02-04  1:44 ` [PATCH v2 24/93] tcg/tci: Implement 64-bit division Richard Henderson
2021-02-04  1:44 ` [PATCH v2 25/93] tcg/tci: Remove TODO as unused Richard Henderson
2021-02-04  1:44 ` [PATCH v2 26/93] tcg/tci: Restrict TCG_TARGET_NB_REGS to 16 Richard Henderson
2021-02-04 15:07   ` Alex Bennée
2021-02-04  1:44 ` [PATCH v2 27/93] tcg/tci: Fix TCG_REG_R4 misusage Richard Henderson
2021-02-04 15:09   ` Alex Bennée
2021-02-04  1:44 ` [PATCH v2 28/93] tcg/tci: Use bool in tcg_out_ri* Richard Henderson
2021-02-04 15:11   ` Alex Bennée
2021-02-04 15:15   ` Alex Bennée
2021-02-04  1:44 ` [PATCH v2 29/93] tcg/tci: Remove TCG_CONST Richard Henderson
2021-02-04 15:39   ` Alex Bennée
2021-02-04 17:52     ` Richard Henderson
2021-02-04 18:48       ` Alex Bennée
2021-02-04  1:44 ` [PATCH v2 30/93] tcg/tci: Merge identical cases in generation Richard Henderson
2021-02-04  1:44 ` [PATCH v2 31/93] tcg/tci: Remove tci_read_r8 Richard Henderson
2021-02-04  1:44 ` [PATCH v2 32/93] tcg/tci: Remove tci_read_r8s Richard Henderson
2021-02-04  1:44 ` [PATCH v2 33/93] tcg/tci: Remove tci_read_r16 Richard Henderson
2021-02-04  1:44 ` [PATCH v2 34/93] tcg/tci: Remove tci_read_r16s Richard Henderson
2021-02-04  1:44 ` [PATCH v2 35/93] tcg/tci: Remove tci_read_r32s Richard Henderson
2021-02-04  1:44 ` [PATCH v2 36/93] tcg/tci: Reduce use of tci_read_r64 Richard Henderson
2021-02-04  1:44 ` [PATCH v2 37/93] tcg/tci: Merge basic arithmetic operations Richard Henderson
2021-02-04  1:44 ` [PATCH v2 38/93] tcg/tci: Merge extension operations Richard Henderson
2021-02-04  1:44 ` [PATCH v2 39/93] tcg/tci: Remove ifdefs for TCG_TARGET_HAS_ext32[us]_i64 Richard Henderson
2021-02-04  1:44 ` [PATCH v2 40/93] tcg/tci: Merge bswap operations Richard Henderson
2021-02-04  1:44 ` [PATCH v2 41/93] tcg/tci: Merge mov, not and neg operations Richard Henderson
2021-02-04  1:44 ` [PATCH v2 42/93] tcg/tci: Rename tci_read_r to tci_read_rval Richard Henderson
2021-02-04  1:44 ` [PATCH v2 43/93] tcg/tci: Split out tci_args_rrs Richard Henderson
2021-02-04  1:44 ` [PATCH v2 44/93] tcg/tci: Split out tci_args_rr Richard Henderson
2021-02-04  1:44 ` [PATCH v2 45/93] tcg/tci: Split out tci_args_rrr Richard Henderson
2021-02-04  1:44 ` [PATCH v2 46/93] tcg/tci: Split out tci_args_rrrc Richard Henderson
2021-02-04  1:44 ` [PATCH v2 47/93] tcg/tci: Split out tci_args_l Richard Henderson
2021-02-04  1:44 ` [PATCH v2 48/93] tcg/tci: Split out tci_args_rrrrrc Richard Henderson
2021-02-04  1:44 ` [PATCH v2 49/93] tcg/tci: Split out tci_args_rrcl and tci_args_rrrrcl Richard Henderson
2021-02-04  1:44 ` [PATCH v2 50/93] tcg/tci: Split out tci_args_ri and tci_args_rI Richard Henderson
2021-02-04  1:44 ` [PATCH v2 51/93] tcg/tci: Reuse tci_args_l for calls Richard Henderson
2021-02-04  1:44 ` [PATCH v2 52/93] tcg/tci: Reuse tci_args_l for exit_tb Richard Henderson
2021-02-04  1:44 ` [PATCH v2 53/93] tcg/tci: Reuse tci_args_l for goto_tb Richard Henderson
2021-02-04  1:44 ` [PATCH v2 54/93] tcg/tci: Split out tci_args_rrrrrr Richard Henderson
2021-02-04  1:44 ` [PATCH v2 55/93] tcg/tci: Split out tci_args_rrrr Richard Henderson
2021-02-04  1:44 ` [PATCH v2 56/93] tcg/tci: Clean up deposit operations Richard Henderson
2021-02-04  1:44 ` [PATCH v2 57/93] tcg/tci: Reduce qemu_ld/st TCGMemOpIdx operand to 32-bits Richard Henderson
2021-02-04  1:44 ` [PATCH v2 58/93] tcg/tci: Split out tci_args_{rrm,rrrm,rrrrm} Richard Henderson
2021-02-04  1:44 ` [PATCH v2 59/93] tcg/tci: Hoist op_size checking into tci_args_* Richard Henderson
2021-02-04  1:44 ` [PATCH v2 60/93] tcg/tci: Remove tci_disas Richard Henderson
2021-02-04  1:44 ` [PATCH v2 61/93] tcg/tci: Implement the disassembler properly Richard Henderson
2021-02-04  1:44 ` [PATCH v2 62/93] tcg: Build ffi data structures for helpers Richard Henderson
2021-02-04  1:44 ` [PATCH v2 63/93] tcg/tci: Use ffi for calls Richard Henderson
2021-02-07 16:25   ` Stefan Weil
2021-02-07 17:39     ` Richard Henderson
2021-02-07 19:52       ` Peter Maydell
2021-02-07 20:12         ` Richard Henderson
2021-02-07 21:33           ` Stefan Weil
2021-02-08  9:20           ` Peter Maydell
2021-02-08  9:35             ` Paolo Bonzini
2021-02-08 13:07               ` Stefan Weil
2021-02-08 17:39                 ` Richard Henderson
2021-02-08 19:04                   ` Stefan Weil
2021-02-08 22:55                     ` Richard Henderson
2021-02-09 20:46                       ` Richard Henderson
2021-02-09 21:15                         ` Stefan Weil
2021-02-09 21:54                           ` Stefan Weil
2021-02-04  1:44 ` [PATCH v2 64/93] tcg/tci: Improve tcg_target_call_clobber_regs Richard Henderson
2021-02-04  1:44 ` [PATCH v2 65/93] tcg/tci: Move call-return regs to end of tcg_target_reg_alloc_order Richard Henderson
2021-02-04  1:44 ` [PATCH v2 66/93] tcg/tci: Push opcode emit into each case Richard Henderson
2021-02-04  1:44 ` [PATCH v2 67/93] tcg/tci: Split out tcg_out_op_rrs Richard Henderson
2021-02-04  1:44 ` [PATCH v2 68/93] tcg/tci: Split out tcg_out_op_l Richard Henderson
2021-02-04  1:44 ` [PATCH v2 69/93] tcg/tci: Split out tcg_out_op_p Richard Henderson
2021-02-04  1:44 ` [PATCH v2 70/93] tcg/tci: Split out tcg_out_op_rr Richard Henderson
2021-02-04  1:44 ` [PATCH v2 71/93] tcg/tci: Split out tcg_out_op_rrr Richard Henderson
2021-02-04  1:44 ` [PATCH v2 72/93] tcg/tci: Split out tcg_out_op_rrrc Richard Henderson
2021-02-04  1:44 ` [PATCH v2 73/93] tcg/tci: Split out tcg_out_op_rrrrrc Richard Henderson
2021-02-04  1:44 ` [PATCH v2 74/93] tcg/tci: Split out tcg_out_op_rrrbb Richard Henderson
2021-02-04  1:44 ` [PATCH v2 75/93] tcg/tci: Split out tcg_out_op_rrcl Richard Henderson
2021-02-04  1:44 ` [PATCH v2 76/93] tcg/tci: Split out tcg_out_op_rrrrrr Richard Henderson
2021-02-04  1:44 ` [PATCH v2 77/93] tcg/tci: Split out tcg_out_op_rrrr Richard Henderson
2021-02-04  1:44 ` [PATCH v2 78/93] tcg/tci: Split out tcg_out_op_rrrrcl Richard Henderson
2021-02-04  1:44 ` [PATCH v2 79/93] tcg/tci: Split out tcg_out_op_{rrm,rrrm,rrrrm} Richard Henderson
2021-02-04  1:54 ` Richard Henderson
2021-02-04 14:54   ` Alex Bennée
2021-02-04 17:54     ` Richard Henderson
2021-02-04  1:54 ` [PATCH v2 80/93] tcg/tci: Split out tcg_out_op_v Richard Henderson
2021-02-04  1:55 ` [PATCH v2 81/93] tcg/tci: Split out tcg_out_op_np Richard Henderson
2021-02-04  1:55 ` [PATCH v2 82/93] tcg/tci: Split out tcg_out_op_r[iI] Richard Henderson
2021-02-04  1:55 ` [PATCH v2 83/93] tcg/tci: Reserve r13 for a temporary Richard Henderson
2021-02-04  1:56 ` [PATCH v2 84/93] tcg/tci: Emit setcond before brcond Richard Henderson
2021-02-04  1:56 ` [PATCH v2 85/93] tcg/tci: Remove tci_write_reg Richard Henderson
2021-02-04  1:56 ` [PATCH v2 86/93] tcg/tci: Change encoding to uint32_t units Richard Henderson
2021-02-04  1:57 ` [PATCH v2 87/93] tcg/tci: Implement goto_ptr Richard Henderson
2021-02-04  1:57 ` [PATCH v2 88/93] tcg/tci: Implement movcond Richard Henderson
2021-02-04  1:57 ` [PATCH v2 89/93] tcg/tci: Implement andc, orc, eqv, nand, nor Richard Henderson
2021-02-04  1:57 ` [PATCH v2 90/93] tcg/tci: Implement extract, sextract Richard Henderson
2021-02-04  1:58 ` [PATCH v2 91/93] tcg/tci: Implement clz, ctz, ctpop Richard Henderson
2021-02-04  1:58 ` [PATCH v2 92/93] tcg/tci: Implement mulu2, muls2 Richard Henderson
2021-02-04  1:58 ` [PATCH v2 93/93] tcg/tci: Implement add2, sub2 Richard Henderson
2021-02-04  3:31 ` [PATCH v2 00/93] TCI fixes and cleanups no-reply
2021-02-04  9:58 ` Peter Maydell
2021-02-04 20:02   ` Stefan Weil
2021-02-04 20:42     ` Richard Henderson
2021-02-04 23:52     ` Richard Henderson
2021-02-05  2:39     ` Richard Henderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.