* [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
@ 2011-09-17 19:59 Stefan Weil
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 1/8] tcg: Declare TCG_TARGET_REG_BITS in tcg.h Stefan Weil
                   ` (10 more replies)
  0 siblings, 11 replies; 48+ messages in thread
From: Stefan Weil @ 2011-09-17 19:59 UTC (permalink / raw)
  To: QEMU Developers, Blue Swirl, malc, TeLeMan, Stuart Brady

Hello,

These patches add a new code generator (TCG target) to QEMU.

Unlike the other TCG target code generators, this one does not generate
machine code for a specific host CPU. Instead, it generates
machine-independent bytecode which is interpreted at run time. That's
why I called it TCI (Tiny Code Interpreter).

I wrote most of the code two years ago and incorporated feedback and
contributions from several QEMU developers, notably TeLeMan,
Stuart Brady, Blue Swirl and malc. See the history here:
http://lists.nongnu.org/archive/html/qemu-devel/2009-09/msg01710.html

Since then, I have used TCI regularly, added small fixes and improvements,
and rebased it onto the latest QEMU. Some versions were tested on
ARM (emulated and real), PowerPC (emulated) and MIPS (emulated) hosts,
but I normally run it on i386 and x86_64 hosts.

I would like to see TCI included in QEMU 1.0.

Regards,
Stefan Weil

Patches 2 and 4 are optional; patch 8 is only needed for running
TCI on a PowerPC host.

[PATCH 1/8] tcg: Declare TCG_TARGET_REG_BITS in tcg.h
[PATCH 2/8] tcg: Don't declare TCG_TARGET_REG_BITS in tcg-target.h
[PATCH 3/8] tcg: Add forward declarations for local functions
[PATCH 4/8] tcg: Add some assertions
[PATCH 5/8] tcg: Add interpreter for bytecode
[PATCH 6/8] tcg: Add bytecode generator for tcg interpreter
[PATCH 7/8] tcg: Add tcg interpreter to configure / make
[PATCH 8/8] ppc: Support tcg interpreter on ppc hosts


* [Qemu-devel] [PATCH 1/8] tcg: Declare TCG_TARGET_REG_BITS in tcg.h
  2011-09-17 19:59 [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine) Stefan Weil
@ 2011-09-17 20:00 ` Stefan Weil
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 2/8] tcg: Don't declare TCG_TARGET_REG_BITS in tcg-target.h Stefan Weil
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 48+ messages in thread
From: Stefan Weil @ 2011-09-17 20:00 UTC (permalink / raw)
  To: QEMU Developers

TCG_TARGET_REG_BITS can be determined by the compiler from the host
pointer size, so there is no need to declare it for each individual
TCG target.

This is especially important for new hosts which will only be
supported by the TCG interpreter.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
---
 tcg/tcg.h |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index dc5e9c9..1859fae 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -22,6 +22,16 @@
  * THE SOFTWARE.
  */
 #include "qemu-common.h"
+
+/* Target word size (must be identical to pointer size). */
+#if UINTPTR_MAX == UINT32_MAX
+# define TCG_TARGET_REG_BITS 32
+#elif UINTPTR_MAX == UINT64_MAX
+# define TCG_TARGET_REG_BITS 64
+#else
+# error Unknown pointer size for tcg target
+#endif
+
 #include "tcg-target.h"
 #include "tcg-runtime.h"
 
-- 
1.7.2.5


* [Qemu-devel] [PATCH 2/8] tcg: Don't declare TCG_TARGET_REG_BITS in tcg-target.h
  2011-09-17 19:59 [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine) Stefan Weil
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 1/8] tcg: Declare TCG_TARGET_REG_BITS in tcg.h Stefan Weil
@ 2011-09-17 20:00 ` Stefan Weil
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 3/8] tcg: Add forward declarations for local functions Stefan Weil
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 48+ messages in thread
From: Stefan Weil @ 2011-09-17 20:00 UTC (permalink / raw)
  To: QEMU Developers

It is now declared for all tcg targets in tcg.h,
so the tcg target specific declarations are redundant.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
---
 tcg/arm/tcg-target.h   |    1 -
 tcg/hppa/tcg-target.h  |    4 +---
 tcg/ia64/tcg-target.h  |    2 --
 tcg/mips/tcg-target.h  |    1 -
 tcg/ppc/tcg-target.h   |    1 -
 tcg/ppc64/tcg-target.h |    1 -
 tcg/s390/tcg-target.h  |    6 ------
 tcg/sparc/tcg-target.h |    6 ------
 8 files changed, 1 insertions(+), 21 deletions(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 0e0f69a..33afd97 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -24,7 +24,6 @@
  */
 #define TCG_TARGET_ARM 1
 
-#define TCG_TARGET_REG_BITS 32
 #undef TCG_TARGET_WORDS_BIGENDIAN
 #undef TCG_TARGET_STACK_GROWSUP
 
diff --git a/tcg/hppa/tcg-target.h b/tcg/hppa/tcg-target.h
index ed90efc..ec9a7bf 100644
--- a/tcg/hppa/tcg-target.h
+++ b/tcg/hppa/tcg-target.h
@@ -24,9 +24,7 @@
 
 #define TCG_TARGET_HPPA 1
 
-#if defined(_PA_RISC1_1)
-#define TCG_TARGET_REG_BITS 32
-#else
+#if TCG_TARGET_REG_BITS != 32
 #error unsupported
 #endif
 
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index ddc93c1..578cf29 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -24,8 +24,6 @@
  */
 #define TCG_TARGET_IA64 1
 
-#define TCG_TARGET_REG_BITS 64
-
 /* We only map the first 64 registers */
 #define TCG_TARGET_NB_REGS 64
 enum {
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 43c5501..e2a2571 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -25,7 +25,6 @@
  */
 #define TCG_TARGET_MIPS 1
 
-#define TCG_TARGET_REG_BITS 32
 #ifdef __MIPSEB__
 # define TCG_TARGET_WORDS_BIGENDIAN
 #endif
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index f9a88c4..5c2d612 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -23,7 +23,6 @@
  */
 #define TCG_TARGET_PPC 1
 
-#define TCG_TARGET_REG_BITS 32
 #define TCG_TARGET_WORDS_BIGENDIAN
 #define TCG_TARGET_NB_REGS 32
 
diff --git a/tcg/ppc64/tcg-target.h b/tcg/ppc64/tcg-target.h
index 5395131..8d1fb73 100644
--- a/tcg/ppc64/tcg-target.h
+++ b/tcg/ppc64/tcg-target.h
@@ -23,7 +23,6 @@
  */
 #define TCG_TARGET_PPC64 1
 
-#define TCG_TARGET_REG_BITS 64
 #define TCG_TARGET_WORDS_BIGENDIAN
 #define TCG_TARGET_NB_REGS 32
 
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 35ebac3..e4cd641 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -23,12 +23,6 @@
  */
 #define TCG_TARGET_S390 1
 
-#ifdef __s390x__
-#define TCG_TARGET_REG_BITS 64
-#else
-#define TCG_TARGET_REG_BITS 32
-#endif
-
 #define TCG_TARGET_WORDS_BIGENDIAN
 
 typedef enum TCGReg {
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 7b4e7f9..1464ef4 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -23,12 +23,6 @@
  */
 #define TCG_TARGET_SPARC 1
 
-#if defined(__sparc_v9__) && !defined(__sparc_v8plus__)
-#define TCG_TARGET_REG_BITS 64
-#else
-#define TCG_TARGET_REG_BITS 32
-#endif
-
 #define TCG_TARGET_WORDS_BIGENDIAN
 
 #define TCG_TARGET_NB_REGS 32
-- 
1.7.2.5


* [Qemu-devel] [PATCH 3/8] tcg: Add forward declarations for local functions
  2011-09-17 19:59 [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine) Stefan Weil
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 1/8] tcg: Declare TCG_TARGET_REG_BITS in tcg.h Stefan Weil
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 2/8] tcg: Don't declare TCG_TARGET_REG_BITS in tcg-target.h Stefan Weil
@ 2011-09-17 20:00 ` Stefan Weil
  2011-09-17 21:40   ` Peter Maydell
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 4/8] tcg: Add some assertions Stefan Weil
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 48+ messages in thread
From: Stefan Weil @ 2011-09-17 20:00 UTC (permalink / raw)
  To: QEMU Developers

These functions are defined in the TCG target specific file
tcg-target.c.

The forward declarations ensure that every TCG target uses
the same function prototype: a definition with a different
signature is rejected by the compiler.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
---
 tcg/tcg.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 411f971..bdd7a67 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -63,11 +63,27 @@
 #error GUEST_BASE not supported on this host.
 #endif
 
+/* Forward declarations for functions declared in tcg-target.c and used here. */
 static void tcg_target_init(TCGContext *s);
 static void tcg_target_qemu_prologue(TCGContext *s);
 static void patch_reloc(uint8_t *code_ptr, int type, 
                         tcg_target_long value, tcg_target_long addend);
 
+/* Forward declarations for functions declared and used in tcg-target.c. */
+static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str);
+static void tcg_out_ld(TCGContext *s, TCGType type, int ret, int arg1,
+                       tcg_target_long arg2);
+static void tcg_out_mov(TCGContext *s, TCGType type, int ret, int arg);
+static void tcg_out_movi(TCGContext *s, TCGType type,
+                         int ret, tcg_target_long arg);
+static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
+                       const int *const_args);
+static void tcg_out_st(TCGContext *s, TCGType type, int arg, int arg1,
+                       tcg_target_long arg2);
+static int tcg_target_const_match(tcg_target_long val,
+                                  const TCGArgConstraint *arg_ct);
+static int tcg_target_get_call_iarg_regs_count(int flags);
+
 TCGOpDef tcg_op_defs[] = {
 #define DEF(s, oargs, iargs, cargs, flags) { #s, oargs, iargs, cargs, iargs + oargs + cargs, flags },
 #include "tcg-opc.h"
-- 
1.7.2.5


* [Qemu-devel] [PATCH 4/8] tcg: Add some assertions
  2011-09-17 19:59 [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine) Stefan Weil
                   ` (2 preceding siblings ...)
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 3/8] tcg: Add forward declarations for local functions Stefan Weil
@ 2011-09-17 20:00 ` Stefan Weil
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode Stefan Weil
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 48+ messages in thread
From: Stefan Weil @ 2011-09-17 20:00 UTC (permalink / raw)
  To: QEMU Developers

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
---
 tcg/tcg.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index bdd7a67..30f3aef 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -794,7 +794,9 @@ static char *tcg_get_arg_str_idx(TCGContext *s, char *buf, int buf_size,
 {
     TCGTemp *ts;
 
+    assert(idx >= 0 && idx < s->nb_temps);
     ts = &s->temps[idx];
+    assert(ts);
     if (idx < s->nb_globals) {
         pstrcpy(buf, buf_size, ts->name);
     } else {
-- 
1.7.2.5


* [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode
  2011-09-17 19:59 [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine) Stefan Weil
                   ` (3 preceding siblings ...)
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 4/8] tcg: Add some assertions Stefan Weil
@ 2011-09-17 20:00 ` Stefan Weil
  2011-09-18  4:03   ` Andi Kleen
                     ` (3 more replies)
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 6/8] tcg: Add bytecode generator for tcg interpreter Stefan Weil
                   ` (5 subsequent siblings)
  10 siblings, 4 replies; 48+ messages in thread
From: Stefan Weil @ 2011-09-17 20:00 UTC (permalink / raw)
  To: QEMU Developers

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
---
 tcg/tcg.h |    4 +-
 tcg/tci.c | 1200 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 1203 insertions(+), 1 deletions(-)
 create mode 100644 tcg/tci.c

diff --git a/tcg/tcg.h b/tcg/tcg.h
index 1859fae..c99c7ea 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -577,7 +577,9 @@ TCGv_i32 tcg_const_local_i32(int32_t val);
 TCGv_i64 tcg_const_local_i64(int64_t val);
 
 extern uint8_t code_gen_prologue[];
-#if defined(_ARCH_PPC) && !defined(_ARCH_PPC64)
+#if defined(CONFIG_TCG_INTERPRETER)
+unsigned long tcg_qemu_tb_exec(CPUState *env, uint8_t *tb_ptr);
+#elif defined(_ARCH_PPC) && !defined(_ARCH_PPC64)
 #define tcg_qemu_tb_exec(env, tb_ptr)                                    \
     ((long REGPARM __attribute__ ((longcall)) (*)(void *, void *))code_gen_prologue)(env, tb_ptr)
 #else
diff --git a/tcg/tci.c b/tcg/tci.c
new file mode 100644
index 0000000..eea9992
--- /dev/null
+++ b/tcg/tci.c
@@ -0,0 +1,1200 @@
+/*
+ * Tiny Code Interpreter for QEMU
+ *
+ * Copyright (c) 2009, 2011 Stefan Weil
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "config.h"
+#include "qemu-common.h"
+#include "exec-all.h"           /* MAX_OPC_PARAM_IARGS */
+#include "tcg-op.h"
+
+/* Marker for missing code. */
+#define TODO() \
+    do { \
+        fprintf(stderr, "TODO %s:%u: %s()\n", \
+                __FILE__, __LINE__, __func__); \
+        tcg_abort(); \
+    } while (0)
+
+/* Trace message to see program flow. */
+#if defined(CONFIG_DEBUG_TCG_INTERPRETER)
+#define TRACE() \
+    loglevel \
+    ? fprintf(stderr, "TCG %s:%u: %s()\n", __FILE__, __LINE__, __func__) \
+    : (void)0
+#else
+#define TRACE() ((void)0)
+#endif
+
+#if MAX_OPC_PARAM_IARGS != 4
+# error Fix needed, number of supported input arguments changed!
+#endif
+#if TCG_TARGET_REG_BITS == 32
+typedef uint64_t (*helper_function)(tcg_target_ulong, tcg_target_ulong,
+                                    tcg_target_ulong, tcg_target_ulong,
+                                    tcg_target_ulong, tcg_target_ulong,
+                                    tcg_target_ulong, tcg_target_ulong);
+#else
+typedef uint64_t (*helper_function)(tcg_target_ulong, tcg_target_ulong,
+                                    tcg_target_ulong, tcg_target_ulong);
+#endif
+
+CPUState *env;
+
+/* Alpha and SH4 user mode emulations call GETPC(), so they need tci_tb_ptr. */
+#if defined(CONFIG_SOFTMMU) || defined(TARGET_ALPHA) || defined(TARGET_SH4)
+# define NEEDS_TB_PTR
+#endif
+
+#ifdef NEEDS_TB_PTR
+uint8_t *tci_tb_ptr;
+#endif
+
+static tcg_target_ulong tci_reg[TCG_TARGET_NB_REGS];
+
+static tcg_target_ulong tci_read_reg(TCGRegister index)
+{
+    assert(index < ARRAY_SIZE(tci_reg));
+    return tci_reg[index];
+}
+
+#if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
+static int8_t tci_read_reg8s(TCGRegister index)
+{
+    return (int8_t)tci_read_reg(index);
+}
+#endif
+
+#if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
+static int16_t tci_read_reg16s(TCGRegister index)
+{
+    return (int16_t)tci_read_reg(index);
+}
+#endif
+
+#if TCG_TARGET_REG_BITS == 64
+static int32_t tci_read_reg32s(TCGRegister index)
+{
+    return (int32_t)tci_read_reg(index);
+}
+#endif
+
+static uint8_t tci_read_reg8(TCGRegister index)
+{
+    return (uint8_t)tci_read_reg(index);
+}
+
+static uint16_t tci_read_reg16(TCGRegister index)
+{
+    return (uint16_t)tci_read_reg(index);
+}
+
+static uint32_t tci_read_reg32(TCGRegister index)
+{
+    return (uint32_t)tci_read_reg(index);
+}
+
+#if TCG_TARGET_REG_BITS == 64
+static uint64_t tci_read_reg64(TCGRegister index)
+{
+    return tci_read_reg(index);
+}
+#endif
+
+static void tci_write_reg(TCGRegister index, tcg_target_ulong value)
+{
+    assert(index < ARRAY_SIZE(tci_reg));
+    assert(index != TCG_AREG0);
+    tci_reg[index] = value;
+}
+
+static void tci_write_reg8s(TCGRegister index, int8_t value)
+{
+    tci_write_reg(index, value);
+}
+
+static void tci_write_reg16s(TCGRegister index, int16_t value)
+{
+    tci_write_reg(index, value);
+}
+
+#if TCG_TARGET_REG_BITS == 64
+static void tci_write_reg32s(TCGRegister index, int32_t value)
+{
+    tci_write_reg(index, value);
+}
+#endif
+
+static void tci_write_reg8(TCGRegister index, uint8_t value)
+{
+    tci_write_reg(index, value);
+}
+
+static void tci_write_reg16(TCGRegister index, uint16_t value)
+{
+    tci_write_reg(index, value);
+}
+
+static void tci_write_reg32(TCGRegister index, uint32_t value)
+{
+    tci_write_reg(index, value);
+}
+
+#if TCG_TARGET_REG_BITS == 32
+static void tci_write_reg64(uint32_t high_index, uint32_t low_index,
+                            uint64_t value)
+{
+    tci_write_reg(low_index, value);
+    tci_write_reg(high_index, value >> 32);
+}
+#elif TCG_TARGET_REG_BITS == 64
+static void tci_write_reg64(TCGRegister index, uint64_t value)
+{
+    tci_write_reg(index, value);
+}
+#endif
+
+#if TCG_TARGET_REG_BITS == 32
+/* Create a 64 bit value from two 32 bit values. */
+static uint64_t tci_uint64(uint32_t high, uint32_t low)
+{
+    return ((uint64_t)high << 32) + low;
+}
+#endif
+
+/* Read constant (native size) from bytecode. */
+static tcg_target_ulong tci_read_i(uint8_t **tb_ptr)
+{
+    tcg_target_ulong value = *(tcg_target_ulong *)(*tb_ptr);
+    *tb_ptr += sizeof(value);
+    return value;
+}
+
+/* Read constant (32 bit) from bytecode. */
+static uint32_t tci_read_i32(uint8_t **tb_ptr)
+{
+    uint32_t value = *(uint32_t *)(*tb_ptr);
+    *tb_ptr += sizeof(value);
+    return value;
+}
+
+#if TCG_TARGET_REG_BITS == 64
+/* Read constant (64 bit) from bytecode. */
+static uint64_t tci_read_i64(uint8_t **tb_ptr)
+{
+    uint64_t value = *(uint64_t *)(*tb_ptr);
+    *tb_ptr += sizeof(value);
+    return value;
+}
+#endif
+
+/* Read indexed register (native size) from bytecode. */
+static tcg_target_ulong tci_read_r(uint8_t **tb_ptr)
+{
+    tcg_target_ulong value = tci_read_reg(**tb_ptr);
+    *tb_ptr += 1;
+    return value;
+}
+
+/* Read indexed register (8 bit) from bytecode. */
+static uint8_t tci_read_r8(uint8_t **tb_ptr)
+{
+    uint8_t value = tci_read_reg8(**tb_ptr);
+    *tb_ptr += 1;
+    return value;
+}
+
+#if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
+/* Read indexed register (8 bit signed) from bytecode. */
+static int8_t tci_read_r8s(uint8_t **tb_ptr)
+{
+    int8_t value = tci_read_reg8s(**tb_ptr);
+    *tb_ptr += 1;
+    return value;
+}
+#endif
+
+/* Read indexed register (16 bit) from bytecode. */
+static uint16_t tci_read_r16(uint8_t **tb_ptr)
+{
+    uint16_t value = tci_read_reg16(**tb_ptr);
+    *tb_ptr += 1;
+    return value;
+}
+
+#if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
+/* Read indexed register (16 bit signed) from bytecode. */
+static int16_t tci_read_r16s(uint8_t **tb_ptr)
+{
+    int16_t value = tci_read_reg16s(**tb_ptr);
+    *tb_ptr += 1;
+    return value;
+}
+#endif
+
+/* Read indexed register (32 bit) from bytecode. */
+static uint32_t tci_read_r32(uint8_t **tb_ptr)
+{
+    uint32_t value = tci_read_reg32(**tb_ptr);
+    *tb_ptr += 1;
+    return value;
+}
+
+#if TCG_TARGET_REG_BITS == 32
+/* Read two indexed registers (2 * 32 bit) from bytecode. */
+static uint64_t tci_read_r64(uint8_t **tb_ptr)
+{
+    uint32_t low = tci_read_r32(tb_ptr);
+    return tci_uint64(tci_read_r32(tb_ptr), low);
+}
+#elif TCG_TARGET_REG_BITS == 64
+/* Read indexed register (32 bit signed) from bytecode. */
+static int32_t tci_read_r32s(uint8_t **tb_ptr)
+{
+    int32_t value = tci_read_reg32s(**tb_ptr);
+    *tb_ptr += 1;
+    return value;
+}
+
+/* Read indexed register (64 bit) from bytecode. */
+static uint64_t tci_read_r64(uint8_t **tb_ptr)
+{
+    uint64_t value = tci_read_reg64(**tb_ptr);
+    *tb_ptr += 1;
+    return value;
+}
+#endif
+
+/* Read indexed register(s) with target address from bytecode. */
+static target_ulong tci_read_ulong(uint8_t **tb_ptr)
+{
+    target_ulong taddr = tci_read_r(tb_ptr);
+#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
+    taddr += (uint64_t)tci_read_r(tb_ptr) << 32;
+#endif
+    return taddr;
+}
+
+/* Read indexed register or constant (native size) from bytecode. */
+static tcg_target_ulong tci_read_ri(uint8_t **tb_ptr)
+{
+    tcg_target_ulong value;
+    TCGRegister r = **tb_ptr;
+    *tb_ptr += 1;
+    if (r == TCG_CONST) {
+        value = tci_read_i(tb_ptr);
+    } else {
+        value = tci_read_reg(r);
+    }
+    return value;
+}
+
+/* Read indexed register or constant (32 bit) from bytecode. */
+static uint32_t tci_read_ri32(uint8_t **tb_ptr)
+{
+    uint32_t value;
+    TCGRegister r = **tb_ptr;
+    *tb_ptr += 1;
+    if (r == TCG_CONST) {
+        value = tci_read_i32(tb_ptr);
+    } else {
+        value = tci_read_reg32(r);
+    }
+    return value;
+}
+
+#if TCG_TARGET_REG_BITS == 32
+/* Read two indexed registers or constants (2 * 32 bit) from bytecode. */
+static uint64_t tci_read_ri64(uint8_t **tb_ptr)
+{
+    uint32_t low = tci_read_ri32(tb_ptr);
+    return tci_uint64(tci_read_ri32(tb_ptr), low);
+}
+#elif TCG_TARGET_REG_BITS == 64
+/* Read indexed register or constant (64 bit) from bytecode. */
+static uint64_t tci_read_ri64(uint8_t **tb_ptr)
+{
+    uint64_t value;
+    TCGRegister r = **tb_ptr;
+    *tb_ptr += 1;
+    if (r == TCG_CONST) {
+        value = tci_read_i64(tb_ptr);
+    } else {
+        value = tci_read_reg64(r);
+    }
+    return value;
+}
+#endif
+
+static target_ulong tci_read_label(uint8_t **tb_ptr)
+{
+    target_ulong label = tci_read_i(tb_ptr);
+    assert(label != 0);
+    return label;
+}
+
+static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
+{
+    bool result = false;
+    int32_t i0 = u0;
+    int32_t i1 = u1;
+    switch (condition) {
+    case TCG_COND_EQ:
+        result = (u0 == u1);
+        break;
+    case TCG_COND_NE:
+        result = (u0 != u1);
+        break;
+    case TCG_COND_LT:
+        result = (i0 < i1);
+        break;
+    case TCG_COND_GE:
+        result = (i0 >= i1);
+        break;
+    case TCG_COND_LE:
+        result = (i0 <= i1);
+        break;
+    case TCG_COND_GT:
+        result = (i0 > i1);
+        break;
+    case TCG_COND_LTU:
+        result = (u0 < u1);
+        break;
+    case TCG_COND_GEU:
+        result = (u0 >= u1);
+        break;
+    case TCG_COND_LEU:
+        result = (u0 <= u1);
+        break;
+    case TCG_COND_GTU:
+        result = (u0 > u1);
+        break;
+    default:
+        TODO();
+    }
+    return result;
+}
+
+static bool tci_compare64(uint64_t u0, uint64_t u1, TCGCond condition)
+{
+    bool result = false;
+    int64_t i0 = u0;
+    int64_t i1 = u1;
+    switch (condition) {
+    case TCG_COND_EQ:
+        result = (u0 == u1);
+        break;
+    case TCG_COND_NE:
+        result = (u0 != u1);
+        break;
+    case TCG_COND_LT:
+        result = (i0 < i1);
+        break;
+    case TCG_COND_GE:
+        result = (i0 >= i1);
+        break;
+    case TCG_COND_LE:
+        result = (i0 <= i1);
+        break;
+    case TCG_COND_GT:
+        result = (i0 > i1);
+        break;
+    case TCG_COND_LTU:
+        result = (u0 < u1);
+        break;
+    case TCG_COND_GEU:
+        result = (u0 >= u1);
+        break;
+    case TCG_COND_LEU:
+        result = (u0 <= u1);
+        break;
+    case TCG_COND_GTU:
+        result = (u0 > u1);
+        break;
+    default:
+        TODO();
+    }
+    return result;
+}
+
+/* Interpret pseudo code in tb. */
+unsigned long tcg_qemu_tb_exec(CPUState *cpustate, uint8_t *tb_ptr)
+{
+    unsigned long next_tb = 0;
+
+    TRACE();
+
+    env = cpustate;
+    tci_reg[TCG_AREG0] = (tcg_target_ulong)cpustate;
+    assert(tb_ptr);
+
+    for (;;) {
+#ifdef NEEDS_TB_PTR
+        tci_tb_ptr = tb_ptr;
+#endif
+        uint8_t *old_code_ptr = tb_ptr;
+        TCGOpcode opc = *tb_ptr++;
+        uint8_t op_size = *tb_ptr++;
+        tcg_target_ulong t0;
+        tcg_target_ulong t1;
+        tcg_target_ulong t2;
+        tcg_target_ulong label;
+        TCGCond condition;
+        target_ulong taddr;
+#ifndef CONFIG_SOFTMMU
+        tcg_target_ulong host_addr;
+#endif
+        uint8_t u8;
+        uint16_t u16;
+        uint32_t u32;
+        uint64_t u64;
+#if TCG_TARGET_REG_BITS == 32
+        uint64_t v64;
+#endif
+
+#if defined(CONFIG_DEBUG_TCG_INTERPRETER)
+        if (loglevel) {
+            tci_disas(opc);
+        }
+#endif
+
+        switch (opc) {
+        case INDEX_op_end:
+        case INDEX_op_nop:
+            break;
+        case INDEX_op_nop1:
+        case INDEX_op_nop2:
+        case INDEX_op_nop3:
+        case INDEX_op_nopn:
+        case INDEX_op_discard:
+            TODO();
+            break;
+        case INDEX_op_set_label:
+            TODO();
+            break;
+        case INDEX_op_call:
+            t0 = tci_read_ri(&tb_ptr);
+#if TCG_TARGET_REG_BITS == 32
+            u64 = ((helper_function)t0)(tci_read_reg(TCG_REG_R0),
+                                        tci_read_reg(TCG_REG_R1),
+                                        tci_read_reg(TCG_REG_R2),
+                                        tci_read_reg(TCG_REG_R3),
+                                        tci_read_reg(TCG_REG_R5),
+                                        tci_read_reg(TCG_REG_R6),
+                                        tci_read_reg(TCG_REG_R7),
+                                        tci_read_reg(TCG_REG_R8));
+            tci_write_reg(TCG_REG_R0, u64);
+            tci_write_reg(TCG_REG_R1, u64 >> 32);
+#else
+            u64 = ((helper_function)t0)(tci_read_reg(TCG_REG_R0),
+                                        tci_read_reg(TCG_REG_R1),
+                                        tci_read_reg(TCG_REG_R2),
+                                        tci_read_reg(TCG_REG_R3));
+            tci_write_reg(TCG_REG_R0, u64);
+#endif
+            break;
+        case INDEX_op_jmp:
+        case INDEX_op_br:
+            label = tci_read_label(&tb_ptr);
+            assert(tb_ptr == old_code_ptr + op_size);
+            tb_ptr = (uint8_t *)label;
+            continue;
+        case INDEX_op_setcond_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            condition = *tb_ptr++;
+            tci_write_reg32(t0, tci_compare32(t1, t2, condition));
+            break;
+#if TCG_TARGET_REG_BITS == 32
+        case INDEX_op_setcond2_i32:
+            t0 = *tb_ptr++;
+            u64 = tci_read_r64(&tb_ptr);
+            v64 = tci_read_ri64(&tb_ptr);
+            condition = *tb_ptr++;
+            tci_write_reg32(t0, tci_compare64(u64, v64, condition));
+            break;
+#elif TCG_TARGET_REG_BITS == 64
+        case INDEX_op_setcond_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            condition = *tb_ptr++;
+            tci_write_reg64(t0, tci_compare64(t1, t2, condition));
+            break;
+#endif
+        case INDEX_op_mov_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r32(&tb_ptr);
+            tci_write_reg32(t0, t1);
+            break;
+        case INDEX_op_movi_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_i32(&tb_ptr);
+            tci_write_reg32(t0, t1);
+            break;
+    /* Load/store operations. */
+        case INDEX_op_ld8u_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            tci_write_reg8(t0, *(uint8_t *)(t1 + t2));
+            break;
+        case INDEX_op_ld8s_i32:
+        case INDEX_op_ld16u_i32:
+            TODO();
+            break;
+        case INDEX_op_ld16s_i32:
+            TODO();
+            break;
+        case INDEX_op_ld_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            tci_write_reg32(t0, *(uint32_t *)(t1 + t2));
+            break;
+        case INDEX_op_st8_i32:
+            t0 = tci_read_r8(&tb_ptr);
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            *(uint8_t *)(t1 + t2) = t0;
+            break;
+        case INDEX_op_st16_i32:
+            t0 = tci_read_r16(&tb_ptr);
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            *(uint16_t *)(t1 + t2) = t0;
+            break;
+        case INDEX_op_st_i32:
+            t0 = tci_read_r32(&tb_ptr);
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            *(uint32_t *)(t1 + t2) = t0;
+            break;
+    /* Arithmetic operations. */
+        case INDEX_op_add_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, t1 + t2);
+            break;
+        case INDEX_op_sub_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, t1 - t2);
+            break;
+        case INDEX_op_mul_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, t1 * t2);
+            break;
+#if TCG_TARGET_HAS_div_i32
+        case INDEX_op_div_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, (int32_t)t1 / (int32_t)t2);
+            break;
+        case INDEX_op_divu_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, t1 / t2);
+            break;
+        case INDEX_op_rem_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, (int32_t)t1 % (int32_t)t2);
+            break;
+        case INDEX_op_remu_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, t1 % t2);
+            break;
+#elif TCG_TARGET_HAS_div2_i32
+        case INDEX_op_div2_i32:
+        case INDEX_op_divu2_i32:
+            TODO();
+            break;
+#endif
+        case INDEX_op_and_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, t1 & t2);
+            break;
+        case INDEX_op_or_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, t1 | t2);
+            break;
+        case INDEX_op_xor_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, t1 ^ t2);
+            break;
+    /* Shift/rotate operations. */
+        case INDEX_op_shl_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, t1 << t2);
+            break;
+        case INDEX_op_shr_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, t1 >> t2);
+            break;
+        case INDEX_op_sar_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, ((int32_t)t1 >> t2));
+            break;
+#if TCG_TARGET_HAS_rot_i32
+        case INDEX_op_rotl_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, (t1 << t2) | (t1 >> ((32 - t2) & 31)));
+            break;
+        case INDEX_op_rotr_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri32(&tb_ptr);
+            t2 = tci_read_ri32(&tb_ptr);
+            tci_write_reg32(t0, (t1 >> t2) | (t1 << ((32 - t2) & 31)));
+            break;
+#endif
+        case INDEX_op_brcond_i32:
+            t0 = tci_read_r32(&tb_ptr);
+            t1 = tci_read_ri32(&tb_ptr);
+            condition = *tb_ptr++;
+            label = tci_read_label(&tb_ptr);
+            if (tci_compare32(t0, t1, condition)) {
+                assert(tb_ptr == old_code_ptr + op_size);
+                tb_ptr = (uint8_t *)label;
+                continue;
+            }
+            break;
+#if TCG_TARGET_REG_BITS == 32
+        case INDEX_op_add2_i32:
+            t0 = *tb_ptr++;
+            t1 = *tb_ptr++;
+            u64 = tci_read_r64(&tb_ptr);
+            u64 += tci_read_r64(&tb_ptr);
+            tci_write_reg64(t1, t0, u64);
+            break;
+        case INDEX_op_sub2_i32:
+            t0 = *tb_ptr++;
+            t1 = *tb_ptr++;
+            u64 = tci_read_r64(&tb_ptr);
+            u64 -= tci_read_r64(&tb_ptr);
+            tci_write_reg64(t1, t0, u64);
+            break;
+        case INDEX_op_brcond2_i32:
+            u64 = tci_read_r64(&tb_ptr);
+            v64 = tci_read_ri64(&tb_ptr);
+            condition = *tb_ptr++;
+            label = tci_read_label(&tb_ptr);
+            if (tci_compare64(u64, v64, condition)) {
+                assert(tb_ptr == old_code_ptr + op_size);
+                tb_ptr = (uint8_t *)label;
+                continue;
+            }
+            break;
+        case INDEX_op_mulu2_i32:
+            t0 = *tb_ptr++;
+            t1 = *tb_ptr++;
+            t2 = tci_read_r32(&tb_ptr);
+            u64 = tci_read_r32(&tb_ptr);
+            tci_write_reg64(t1, t0, t2 * u64);
+            break;
+#endif /* TCG_TARGET_REG_BITS == 32 */
+#if TCG_TARGET_HAS_ext8s_i32
+        case INDEX_op_ext8s_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r8s(&tb_ptr);
+            tci_write_reg32(t0, t1);
+            break;
+#endif
+#if TCG_TARGET_HAS_ext16s_i32
+        case INDEX_op_ext16s_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r16s(&tb_ptr);
+            tci_write_reg32(t0, t1);
+            break;
+#endif
+#if TCG_TARGET_HAS_ext8u_i32
+        case INDEX_op_ext8u_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r8(&tb_ptr);
+            tci_write_reg32(t0, t1);
+            break;
+#endif
+#if TCG_TARGET_HAS_ext16u_i32
+        case INDEX_op_ext16u_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r16(&tb_ptr);
+            tci_write_reg32(t0, t1);
+            break;
+#endif
+#if TCG_TARGET_HAS_bswap16_i32
+        case INDEX_op_bswap16_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r16(&tb_ptr);
+            tci_write_reg32(t0, bswap16(t1));
+            break;
+#endif
+#if TCG_TARGET_HAS_bswap32_i32
+        case INDEX_op_bswap32_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r32(&tb_ptr);
+            tci_write_reg32(t0, bswap32(t1));
+            break;
+#endif
+#if TCG_TARGET_HAS_not_i32
+        case INDEX_op_not_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r32(&tb_ptr);
+            tci_write_reg32(t0, ~t1);
+            break;
+#endif
+#if TCG_TARGET_HAS_neg_i32
+        case INDEX_op_neg_i32:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r32(&tb_ptr);
+            tci_write_reg32(t0, -t1);
+            break;
+#endif
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_mov_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r64(&tb_ptr);
+            tci_write_reg64(t0, t1);
+            break;
+        case INDEX_op_movi_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_i64(&tb_ptr);
+            tci_write_reg64(t0, t1);
+            break;
+    /* Load/store operations. */
+        case INDEX_op_ld8u_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            tci_write_reg8(t0, *(uint8_t *)(t1 + t2));
+            break;
+        case INDEX_op_ld8s_i64:
+        case INDEX_op_ld16u_i64:
+        case INDEX_op_ld16s_i64:
+            TODO();
+            break;
+        case INDEX_op_ld32u_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            tci_write_reg32(t0, *(uint32_t *)(t1 + t2));
+            break;
+        case INDEX_op_ld32s_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            tci_write_reg32s(t0, *(int32_t *)(t1 + t2));
+            break;
+        case INDEX_op_ld_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            tci_write_reg64(t0, *(uint64_t *)(t1 + t2));
+            break;
+        case INDEX_op_st8_i64:
+            t0 = tci_read_r8(&tb_ptr);
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            *(uint8_t *)(t1 + t2) = t0;
+            break;
+        case INDEX_op_st16_i64:
+            t0 = tci_read_r16(&tb_ptr);
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            *(uint16_t *)(t1 + t2) = t0;
+            break;
+        case INDEX_op_st32_i64:
+            t0 = tci_read_r32(&tb_ptr);
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            *(uint32_t *)(t1 + t2) = t0;
+            break;
+        case INDEX_op_st_i64:
+            t0 = tci_read_r64(&tb_ptr);
+            t1 = tci_read_r(&tb_ptr);
+            t2 = tci_read_i32(&tb_ptr);
+            *(uint64_t *)(t1 + t2) = t0;
+            break;
+    /* Arithmetic operations. */
+        case INDEX_op_add_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            tci_write_reg64(t0, t1 + t2);
+            break;
+        case INDEX_op_sub_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            tci_write_reg64(t0, t1 - t2);
+            break;
+        case INDEX_op_mul_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            tci_write_reg64(t0, t1 * t2);
+            break;
+#if TCG_TARGET_HAS_div_i64
+        case INDEX_op_div_i64:
+        case INDEX_op_divu_i64:
+        case INDEX_op_rem_i64:
+        case INDEX_op_remu_i64:
+            TODO();
+            break;
+#elif TCG_TARGET_HAS_div2_i64
+        case INDEX_op_div2_i64:
+        case INDEX_op_divu2_i64:
+            TODO();
+            break;
+#endif
+        case INDEX_op_and_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            tci_write_reg64(t0, t1 & t2);
+            break;
+        case INDEX_op_or_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            tci_write_reg64(t0, t1 | t2);
+            break;
+        case INDEX_op_xor_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            tci_write_reg64(t0, t1 ^ t2);
+            break;
+    /* Shift/rotate operations. */
+        case INDEX_op_shl_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            tci_write_reg64(t0, t1 << t2);
+            break;
+        case INDEX_op_shr_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            tci_write_reg64(t0, t1 >> t2);
+            break;
+        case INDEX_op_sar_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_ri64(&tb_ptr);
+            t2 = tci_read_ri64(&tb_ptr);
+            tci_write_reg64(t0, ((int64_t)t1 >> t2));
+            break;
+#if TCG_TARGET_HAS_rot_i64
+        case INDEX_op_rotl_i64:
+        case INDEX_op_rotr_i64:
+            TODO();
+            break;
+#endif
+        case INDEX_op_brcond_i64:
+            t0 = tci_read_r64(&tb_ptr);
+            t1 = tci_read_ri64(&tb_ptr);
+            condition = *tb_ptr++;
+            label = tci_read_label(&tb_ptr);
+            if (tci_compare64(t0, t1, condition)) {
+                assert(tb_ptr == old_code_ptr + op_size);
+                tb_ptr = (uint8_t *)label;
+                continue;
+            }
+            break;
+#if TCG_TARGET_HAS_ext8u_i64
+        case INDEX_op_ext8u_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r8(&tb_ptr);
+            tci_write_reg64(t0, t1);
+            break;
+#endif
+#if TCG_TARGET_HAS_ext8s_i64
+        case INDEX_op_ext8s_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r8s(&tb_ptr);
+            tci_write_reg64(t0, t1);
+            break;
+#endif
+#if TCG_TARGET_HAS_ext16s_i64
+        case INDEX_op_ext16s_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r16s(&tb_ptr);
+            tci_write_reg64(t0, t1);
+            break;
+#endif
+#if TCG_TARGET_HAS_ext16u_i64
+        case INDEX_op_ext16u_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r16(&tb_ptr);
+            tci_write_reg64(t0, t1);
+            break;
+#endif
+#if TCG_TARGET_HAS_ext32s_i64
+        case INDEX_op_ext32s_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r32s(&tb_ptr);
+            tci_write_reg64(t0, t1);
+            break;
+#endif
+#if TCG_TARGET_HAS_ext32u_i64
+        case INDEX_op_ext32u_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r32(&tb_ptr);
+            tci_write_reg64(t0, t1);
+            break;
+#endif
+#if TCG_TARGET_HAS_bswap16_i64
+        case INDEX_op_bswap16_i64:
+            TODO();
+            t0 = *tb_ptr++;
+            t1 = tci_read_r16(&tb_ptr);
+            tci_write_reg64(t0, bswap16(t1));
+            break;
+#endif
+#if TCG_TARGET_HAS_bswap32_i64
+        case INDEX_op_bswap32_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r32(&tb_ptr);
+            tci_write_reg64(t0, bswap32(t1));
+            break;
+#endif
+#if TCG_TARGET_HAS_bswap64_i64
+        case INDEX_op_bswap64_i64:
+            TODO();
+            t0 = *tb_ptr++;
+            t1 = tci_read_r64(&tb_ptr);
+            tci_write_reg64(t0, bswap64(t1));
+            break;
+#endif
+#if TCG_TARGET_HAS_not_i64
+        case INDEX_op_not_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r64(&tb_ptr);
+            tci_write_reg64(t0, ~t1);
+            break;
+#endif
+#if TCG_TARGET_HAS_neg_i64
+        case INDEX_op_neg_i64:
+            t0 = *tb_ptr++;
+            t1 = tci_read_r64(&tb_ptr);
+            tci_write_reg64(t0, -t1);
+            break;
+#endif
+#endif /* TCG_TARGET_REG_BITS == 64 */
+    /* QEMU specific */
+#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
+        case INDEX_op_debug_insn_start:
+            TODO();
+            break;
+#else
+        case INDEX_op_debug_insn_start:
+            TODO();
+            break;
+#endif
+        case INDEX_op_exit_tb:
+            next_tb = *(uint64_t *)tb_ptr;
+            goto exit;
+            break;
+        case INDEX_op_goto_tb:
+            t0 = tci_read_i32(&tb_ptr);
+            assert(tb_ptr == old_code_ptr + op_size);
+            tb_ptr += (int32_t)t0;
+            continue;
+        case INDEX_op_qemu_ld8u:
+            t0 = *tb_ptr++;
+            taddr = tci_read_ulong(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            u8 = __ldb_mmu(taddr, tci_read_i(&tb_ptr));
+#else
+            host_addr = (tcg_target_ulong)taddr;
+            assert(taddr == host_addr);
+            u8 = *(uint8_t *)(host_addr + GUEST_BASE);
+#endif
+            tci_write_reg8(t0, u8);
+            break;
+        case INDEX_op_qemu_ld8s:
+            t0 = *tb_ptr++;
+            taddr = tci_read_ulong(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            u8 = __ldb_mmu(taddr, tci_read_i(&tb_ptr));
+#else
+            host_addr = (tcg_target_ulong)taddr;
+            assert(taddr == host_addr);
+            u8 = *(uint8_t *)(host_addr + GUEST_BASE);
+#endif
+            tci_write_reg8s(t0, u8);
+            break;
+        case INDEX_op_qemu_ld16u:
+            t0 = *tb_ptr++;
+            taddr = tci_read_ulong(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            u16 = __ldw_mmu(taddr, tci_read_i(&tb_ptr));
+#else
+            host_addr = (tcg_target_ulong)taddr;
+            assert(taddr == host_addr);
+            u16 = tswap16(*(uint16_t *)(host_addr + GUEST_BASE));
+#endif
+            tci_write_reg16(t0, u16);
+            break;
+        case INDEX_op_qemu_ld16s:
+            t0 = *tb_ptr++;
+            taddr = tci_read_ulong(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            u16 = __ldw_mmu(taddr, tci_read_i(&tb_ptr));
+#else
+            host_addr = (tcg_target_ulong)taddr;
+            assert(taddr == host_addr);
+            u16 = tswap16(*(uint16_t *)(host_addr + GUEST_BASE));
+#endif
+            tci_write_reg16s(t0, u16);
+            break;
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_qemu_ld32u:
+            t0 = *tb_ptr++;
+            taddr = tci_read_ulong(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            u32 = __ldl_mmu(taddr, tci_read_i(&tb_ptr));
+#else
+            host_addr = (tcg_target_ulong)taddr;
+            assert(taddr == host_addr);
+            u32 = tswap32(*(uint32_t *)(host_addr + GUEST_BASE));
+#endif
+            tci_write_reg32(t0, u32);
+            break;
+        case INDEX_op_qemu_ld32s:
+            t0 = *tb_ptr++;
+            taddr = tci_read_ulong(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            u32 = __ldl_mmu(taddr, tci_read_i(&tb_ptr));
+#else
+            host_addr = (tcg_target_ulong)taddr;
+            assert(taddr == host_addr);
+            u32 = tswap32(*(uint32_t *)(host_addr + GUEST_BASE));
+#endif
+            tci_write_reg32s(t0, u32);
+            break;
+#endif /* TCG_TARGET_REG_BITS == 64 */
+        case INDEX_op_qemu_ld32:
+            t0 = *tb_ptr++;
+            taddr = tci_read_ulong(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            u32 = __ldl_mmu(taddr, tci_read_i(&tb_ptr));
+#else
+            host_addr = (tcg_target_ulong)taddr;
+            assert(taddr == host_addr);
+            u32 = tswap32(*(uint32_t *)(host_addr + GUEST_BASE));
+#endif
+            tci_write_reg32(t0, u32);
+            break;
+        case INDEX_op_qemu_ld64:
+            t0 = *tb_ptr++;
+#if TCG_TARGET_REG_BITS == 32
+            t1 = *tb_ptr++;
+#endif
+            taddr = tci_read_ulong(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            u64 = __ldq_mmu(taddr, tci_read_i(&tb_ptr));
+#else
+            host_addr = (tcg_target_ulong)taddr;
+            assert(taddr == host_addr);
+            u64 = tswap64(*(uint64_t *)(host_addr + GUEST_BASE));
+#endif
+            tci_write_reg(t0, u64);
+#if TCG_TARGET_REG_BITS == 32
+            tci_write_reg(t1, u64 >> 32);
+#endif
+            break;
+        case INDEX_op_qemu_st8:
+            t0 = tci_read_r8(&tb_ptr);
+            taddr = tci_read_ulong(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            t2 = tci_read_i(&tb_ptr);
+            __stb_mmu(taddr, t0, t2);
+#else
+            host_addr = (tcg_target_ulong)taddr;
+            assert(taddr == host_addr);
+            *(uint8_t *)(host_addr + GUEST_BASE) = t0;
+#endif
+            break;
+        case INDEX_op_qemu_st16:
+            t0 = tci_read_r16(&tb_ptr);
+            taddr = tci_read_ulong(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            t2 = tci_read_i(&tb_ptr);
+            __stw_mmu(taddr, t0, t2);
+#else
+            host_addr = (tcg_target_ulong)taddr;
+            assert(taddr == host_addr);
+            *(uint16_t *)(host_addr + GUEST_BASE) = tswap16(t0);
+#endif
+            break;
+        case INDEX_op_qemu_st32:
+            t0 = tci_read_r32(&tb_ptr);
+            taddr = tci_read_ulong(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            t2 = tci_read_i(&tb_ptr);
+            __stl_mmu(taddr, t0, t2);
+#else
+            host_addr = (tcg_target_ulong)taddr;
+            assert(taddr == host_addr);
+            *(uint32_t *)(host_addr + GUEST_BASE) = tswap32(t0);
+#endif
+            break;
+        case INDEX_op_qemu_st64:
+            u64 = tci_read_r64(&tb_ptr);
+            taddr = tci_read_ulong(&tb_ptr);
+#ifdef CONFIG_SOFTMMU
+            t2 = tci_read_i(&tb_ptr);
+            __stq_mmu(taddr, u64, t2);
+#else
+            host_addr = (tcg_target_ulong)taddr;
+            assert(taddr == host_addr);
+            *(uint64_t *)(host_addr + GUEST_BASE) = tswap64(u64);
+#endif
+            break;
+        default:
+            TODO();
+            break;
+        }
+        assert(tb_ptr == old_code_ptr + op_size);
+    }
+exit:
+    return next_tb;
+}
-- 
1.7.2.5


* [Qemu-devel] [PATCH 6/8] tcg: Add bytecode generator for tcg interpreter
  2011-09-17 19:59 [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine) Stefan Weil
                   ` (4 preceding siblings ...)
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode Stefan Weil
@ 2011-09-17 20:00 ` Stefan Weil
  2011-09-18 10:03   ` Blue Swirl
  2011-10-01 16:54   ` Andreas Färber
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 7/8] tcg: Add tcg interpreter to configure / make Stefan Weil
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 48+ messages in thread
From: Stefan Weil @ 2011-09-17 20:00 UTC (permalink / raw)
  To: QEMU Developers

Unlike other tcg target code generators, this one does not generate
machine code for a particular cpu. It generates machine independent bytecode
which is interpreted later.

This allows running QEMU on (almost) any host.

Interpreted bytecode is slower than direct execution of generated
machine code.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
---
 dis-asm.h                 |    1 +
 disas.c                   |    4 +-
 dyngen-exec.h             |   13 +-
 exec-all.h                |   13 +-
 tcg/bytecode/README       |  129 ++++++
 tcg/bytecode/tcg-target.c |  955 +++++++++++++++++++++++++++++++++++++++++++++
 tcg/bytecode/tcg-target.h |  152 +++++++
 7 files changed, 1263 insertions(+), 4 deletions(-)
 create mode 100644 tcg/bytecode/README
 create mode 100644 tcg/bytecode/tcg-target.c
 create mode 100644 tcg/bytecode/tcg-target.h

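The commit message says the generated bytecode is interpreted later. As a rough illustration of that idea (not the TCI code itself), here is a toy dispatch loop modelled on the layout described in the README added below and on the checks in the interpreter from patch 5. Each instruction starts with an opcode byte and a length byte, and register operands are one-byte indices. After each instruction, the loop asserts that exactly `length` bytes were consumed. Storing 32-bit immediates in host byte order is an assumption of this sketch. The `TOY_OP_*` opcodes, their numbering, and the `toy_*` functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical opcodes for this toy; TCI reuses the TCG opcode numbers. */
enum { TOY_OP_MOVI = 1, TOY_OP_ADD = 2, TOY_OP_EXIT = 3 };

/* Read a 32-bit immediate in host byte order (assumption), unaligned-safe. */
static uint32_t toy_read_i32(const uint8_t **p)
{
    uint32_t v;
    memcpy(&v, *p, sizeof(v));
    *p += sizeof(v);
    return v;
}

/* Run bytecode until TOY_OP_EXIT and return the value of register 0. */
static uint32_t toy_interpret(const uint8_t *code, uint32_t *regs)
{
    const uint8_t *p = code;

    for (;;) {
        const uint8_t *insn = p;   /* start of the current instruction */
        uint8_t opc = p[0];
        uint8_t size = p[1];       /* total instruction length in bytes */
        p += 2;

        switch (opc) {
        case TOY_OP_MOVI: {        /* movi rd, imm32 */
            uint8_t rd = *p++;
            regs[rd] = toy_read_i32(&p);
            break;
        }
        case TOY_OP_ADD: {         /* add rd, rs1, rs2 */
            uint8_t rd = *p++;
            uint8_t rs1 = *p++;
            uint8_t rs2 = *p++;
            regs[rd] = regs[rs1] + regs[rs2];
            break;
        }
        case TOY_OP_EXIT:
            return regs[0];
        default:
            fprintf(stderr, "unknown opcode %u\n", (unsigned)opc);
            return 0;
        }
        /* Same kind of sanity check as TCI: operands filled the insn. */
        assert(p == insn + size);
    }
}
```

With this layout, the program `movi r1, 40; movi r2, 2; add r0, r1, r2; exit` is 21 bytes long (7 + 7 + 5 + 2) and returns 42.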
diff --git a/dis-asm.h b/dis-asm.h
index 5b07d7f..876975f 100644
--- a/dis-asm.h
+++ b/dis-asm.h
@@ -365,6 +365,7 @@ typedef struct disassemble_info {
    target address.  Return number of bytes processed.  */
 typedef int (*disassembler_ftype) (bfd_vma, disassemble_info *);
 
+int print_insn_bytecode(bfd_vma, disassemble_info*);
 int print_insn_big_mips         (bfd_vma, disassemble_info*);
 int print_insn_little_mips      (bfd_vma, disassemble_info*);
 int print_insn_i386             (bfd_vma, disassemble_info*);
diff --git a/disas.c b/disas.c
index 611b30b..e2061d8 100644
--- a/disas.c
+++ b/disas.c
@@ -273,7 +273,9 @@ void disas(FILE *out, void *code, unsigned long size)
 #else
     disasm_info.endian = BFD_ENDIAN_LITTLE;
 #endif
-#if defined(__i386__)
+#if defined(CONFIG_TCG_INTERPRETER)
+    print_insn = print_insn_bytecode;
+#elif defined(__i386__)
     disasm_info.mach = bfd_mach_i386_i386;
     print_insn = print_insn_i386;
 #elif defined(__x86_64__)
diff --git a/dyngen-exec.h b/dyngen-exec.h
index 8beb7f3..64f76c4 100644
--- a/dyngen-exec.h
+++ b/dyngen-exec.h
@@ -19,7 +19,9 @@
 #if !defined(__DYNGEN_EXEC_H__)
 #define __DYNGEN_EXEC_H__
 
-#if defined(__i386__)
+#if defined(CONFIG_TCG_INTERPRETER)
+/* The TCG interpreter does not use special registers. */
+#elif defined(__i386__)
 #define AREG0 "ebp"
 #elif defined(__x86_64__)
 #define AREG0 "r14"
@@ -55,11 +57,18 @@
 #error unsupported CPU
 #endif
 
+#if defined(AREG0)
 register CPUState *env asm(AREG0);
+#else
+extern CPUState *env;
+#endif
 
 /* The return address may point to the start of the next instruction.
    Subtracting one gets us the call instruction itself.  */
-#if defined(__s390__) && !defined(__s390x__)
+#if defined(CONFIG_TCG_INTERPRETER)
+extern uint8_t *tci_tb_ptr;
+# define GETPC() ((void *)tci_tb_ptr)
+#elif defined(__s390__) && !defined(__s390x__)
 # define GETPC() ((void*)(((unsigned long)__builtin_return_address(0) & 0x7fffffffUL) - 1))
 #elif defined(__arm__)
 /* Thumb return addresses have the low bit set, so we need to subtract two.
diff --git a/exec-all.h b/exec-all.h
index 9b8d62c..0116acd 100644
--- a/exec-all.h
+++ b/exec-all.h
@@ -122,6 +122,8 @@ void tlb_set_page(CPUState *env, target_ulong vaddr,
 
 #if defined(_ARCH_PPC) || defined(__x86_64__) || defined(__arm__) || defined(__i386__)
 #define USE_DIRECT_JUMP
+#elif defined(CONFIG_TCG_INTERPRETER)
+#define USE_DIRECT_JUMP
 #endif
 
 struct TranslationBlock {
@@ -189,7 +191,14 @@ extern TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE];
 
 #if defined(USE_DIRECT_JUMP)
 
-#if defined(_ARCH_PPC)
+#if defined(CONFIG_TCG_INTERPRETER)
+static inline void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
+{
+    /* patch the branch destination */
+    *(uint32_t *)jmp_addr = addr - (jmp_addr + 4);
+    /* no need to flush icache explicitly */
+}
+#elif defined(_ARCH_PPC)
 void ppc_tb_set_jmp_target(unsigned long jmp_addr, unsigned long addr);
 #define tb_set_jmp_target1 ppc_tb_set_jmp_target
 #elif defined(__i386__) || defined(__x86_64__)
@@ -223,6 +232,8 @@ static inline void tb_set_jmp_target1(unsigned long jmp_addr, unsigned long addr
     __asm __volatile__ ("swi 0x9f0002" : : "r" (_beg), "r" (_end), "r" (_flg));
 #endif
 }
+#else
+#error tb_set_jmp_target1 is missing
 #endif
 
 static inline void tb_set_jmp_target(TranslationBlock *tb,
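For CONFIG_TCG_INTERPRETER, the `tb_set_jmp_target1` hunk above stores the 32-bit displacement `addr - (jmp_addr + 4)` at `jmp_addr`. The `INDEX_op_goto_tb` case of the interpreter in patch 5 reads that value with `tci_read_i32`, which leaves `tb_ptr` at `jmp_addr + 4`, and then adds it. The sketch below checks this round trip on an ordinary byte buffer. It uses `memcpy` instead of the direct `uint32_t` store and load in the patch, so it makes no alignment assumption. The `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store at jmp the 32-bit displacement that makes a goto_tb land on
 * target. Same arithmetic as the TCI tb_set_jmp_target1(): the
 * displacement is relative to the end of the 4-byte operand. */
static void demo_set_jmp_target(uint8_t *jmp, const uint8_t *target)
{
    ptrdiff_t d = target - (jmp + 4);
    int32_t disp;

    assert(d >= INT32_MIN && d <= INT32_MAX);
    disp = (int32_t)d;
    memcpy(jmp, &disp, sizeof(disp));  /* no alignment assumption */
}

/* Execute a goto_tb operand at *pc: reading the displacement advances
 * *pc past it (like tci_read_i32), then the displacement is added. */
static void demo_goto_tb(const uint8_t **pc)
{
    int32_t disp;

    memcpy(&disp, *pc, sizeof(disp));
    *pc += sizeof(disp);
    *pc += disp;
}
```

For example, patching offset 30 of a buffer to jump to offset 2 stores -32, because the interpreter has already advanced to offset 34 when it adds the displacement.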
diff --git a/tcg/bytecode/README b/tcg/bytecode/README
new file mode 100644
index 0000000..6fe9755
--- /dev/null
+++ b/tcg/bytecode/README
@@ -0,0 +1,129 @@
+TCG Interpreter (TCI) - Copyright (c) 2011 Stefan Weil.
+
+This file is released under GPL 2 or later.
+
+1) Introduction
+
+TCG (Tiny Code Generator) is a code generator which translates
+code fragments ("basic blocks") from target code (any of the
+targets supported by QEMU) to a code representation which
+can be run on a host.
+
+QEMU can create native code for some hosts (arm, hppa, i386, ia64, ppc, ppc64,
+s390, sparc, x86_64). For others, unofficial host support was written.
+
+By adding a code generator for a virtual machine and using an
+interpreter for the generated bytecode, it is possible to
+support (almost) any host.
+
+This is what TCI (Tiny Code Interpreter) does.
+
+2) Implementation
+
+Like every TCG host backend, TCI implements the code generator in
+tcg-target.c and tcg-target.h. Both files are in the directory tcg/bytecode.
+
+The additional file tcg/tci.c adds the interpreter.
+
+Each bytecode instruction consists of an opcode (with the same numeric value
+as in TCG), its length in bytes, and arguments of variable size and number.
+
+3) Usage
+
+For hosts without native TCG, the interpreter TCI must be enabled with
+
+        configure --enable-tcg-interpreter
+
+If configure is called without --enable-tcg-interpreter on such a host,
+it suggests using this option. Enabling TCI automatically would require
+additional code in configure, which would have to be updated whenever new
+native TCG implementations are added.
+
+System emulation should work on any 32 or 64 bit host.
+User mode emulation might work, but a new linker script (*.ld)
+may be needed. Byte order might be wrong on big endian hosts
+and may need fixes in configure.
+
+For hosts with native TCG, the interpreter TCI can be enabled with
+
+        configure --enable-tcg-interpreter
+
+The only difference between running qemu with TCI and running it without
+TCI should be speed. During the development of TCI, it was especially
+useful to compare runs with and without TCI. Create /tmp/qemu.log with
+
+        qemu -d in_asm,op_opt,cpu -singlestep
+
+once with and once without the interpreter, and compare the resulting
+qemu.log files. This also shows the effects of additional registers or
+additional opcodes (the virtual machine is easy to modify).
+It can also be used to verify native TCG backends.
+
+Hosts with native TCG can also enable TCI by claiming to be unsupported:
+
+        configure --cpu=unknown --enable-tcg-interpreter
+
+configure then no longer uses the native linker script (*.ld) for user mode emulation.
+
+
+4) Status
+
+TCI needs special handling for 32 and 64 bit hosts, 32 and 64 bit targets,
+and for hosts and targets with the same or different endianness.
+
+            | host (le)                     host (be)
+            | 32             64             32             64
+------------+------------------------------------------------------------
+target (le) | s0, u0         s1, u1         s?, u?         s?, u?
+32 bit      |
+            |
+target (le) | sc, uc         s1, u1         s?, u?         s?, u?
+64 bit      |
+            |
+target (be) | sc, u0         sc, uc         s?, u?         s?, u?
+32 bit      |
+            |
+target (be) | sc, uc         sc, uc         s?, u?         s?, u?
+64 bit      |
+            |
+
+System emulation
+s? = untested
+sc = compiles
+s0 = bios works
+s1 = grub works
+s2 = linux boots
+
+Linux user mode emulation
+u? = untested
+uc = compiles
+u0 = static hello works
+u1 = linux-user-test works
+
+5) Todo list
+
+* TCI is not widely tested. It was written and tested on an x86_64 host
+  running i386 and x86_64 system emulation and linux user mode.
+  A cross compiled qemu for an i386 host also works with the same basic tests.
+  A cross compiled qemu for a mipsel host works, too. It is terribly slow
+  because I run it in a mips malta emulation, so it is an interpreted
+  emulation in an emulation.
+  A cross compiled qemu for an arm host works (tested with pc bios).
+  A cross compiled qemu for a ppc host works at least partially:
+  i386-linux-user/qemu-i386 can run a simple hello-world program
+  (tested in a ppc emulation).
+
+* Some TCG opcodes are missing from the code generator and/or the
+  interpreter. Reaching such an opcode prints a TODO message with file,
+  line and function name and then aborts, which shows where code must be added.
+
+* The pseudo code is not optimized and still ugly. For hosts with strict
+  alignment requirements, it needs some fixes (aligned bytecode might also
+  improve speed on hosts which allow unaligned memory accesses).
+
+* A better disassembler for the pseudo code would be nice (a very primitive
+  disassembler is included in tcg-target.c).
+
+* It might be useful to have a runtime option which selects the native TCG
+  or TCI, so qemu would have to include two TCGs. Today, selecting TCI
+  is a configure option, so you need two separate builds of qemu.
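Section 2 of the README above says that each instruction carries its opcode, its length, and a variable number of arguments. A code generator can produce this layout in three steps: write the opcode and a placeholder length byte, append the arguments, and then patch the length once it is known. The sketch below assumes that approach; it is not taken from tcg-target.c. The `EmitBuf` type, the `emit_*` functions, and the host-byte-order immediates are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical output buffer for generated bytecode. */
typedef struct {
    uint8_t buf[256];
    size_t len;
} EmitBuf;

static void emit_u8(EmitBuf *b, uint8_t v)
{
    assert(b->len < sizeof(b->buf));
    b->buf[b->len++] = v;
}

/* Append a 32-bit immediate in host byte order (assumption). */
static void emit_i32(EmitBuf *b, uint32_t v)
{
    assert(b->len + sizeof(v) <= sizeof(b->buf));
    memcpy(&b->buf[b->len], &v, sizeof(v));
    b->len += sizeof(v);
}

/* Start an instruction: opcode byte plus a length byte patched later.
 * Returns the offset of the instruction start. */
static size_t emit_begin(EmitBuf *b, uint8_t opc)
{
    size_t start = b->len;

    emit_u8(b, opc);
    emit_u8(b, 0);              /* placeholder for the length */
    return start;
}

/* Finish an instruction by storing its total length in bytes. */
static void emit_end(EmitBuf *b, size_t start)
{
    size_t size = b->len - start;

    assert(size <= UINT8_MAX);
    b->buf[start + 1] = (uint8_t)size;
}
```

In this sketch, an instruction with one register byte and one 32-bit immediate occupies 7 bytes. An interpreter can then check that decoding consumed exactly that many bytes, as the TCI loop does with `assert(tb_ptr == old_code_ptr + op_size)`.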
diff --git a/tcg/bytecode/tcg-target.c b/tcg/bytecode/tcg-target.c
new file mode 100644
index 0000000..f505ff0
--- /dev/null
+++ b/tcg/bytecode/tcg-target.c
@@ -0,0 +1,955 @@
+/*
+ * Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2009, 2011 Stefan Weil
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+/* TODO list:
+ * - See TODO comments in code.
+ */
+
+/* Marker for missing code. */
+#define TODO() \
+    do { \
+        fprintf(stderr, "TODO %s:%u: %s()\n", \
+                __FILE__, __LINE__, __func__); \
+        tcg_abort(); \
+    } while (0)
+
+/* Trace message to see program flow. */
+#if defined(CONFIG_DEBUG_TCG_INTERPRETER)
+#define TRACE() \
+    loglevel \
+    ? fprintf(stderr, "TCG %s:%u: %s()\n", __FILE__, __LINE__, __func__) \
+    : (void)0
+#else
+#define TRACE() ((void)0)
+#endif
+
+/* Single bit n. */
+#define BIT(n) (1 << (n))
+
+/* Bitfield n...m (in 32 bit value). */
+#define BITS(n, m) (((0xffffffffU << (31 - (n))) >> (31 - (n) + (m))) << (m))
+
+/* Used for function call generation. */
+#define TCG_REG_CALL_STACK              TCG_REG_R4
+#define TCG_TARGET_STACK_ALIGN          16
+#define TCG_TARGET_CALL_STACK_OFFSET    0
+
+/* TODO: documentation. */
+static uint8_t *tb_ret_addr;
+
+/* Macros used in tcg_target_op_defs. */
+#define R       "r"
+#define RI      "ri"
+#if TCG_TARGET_REG_BITS == 32
+# define R64    "r", "r"
+#else
+# define R64    "r"
+#endif
+#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
+# define L      "L", "L"
+# define S      "S", "S"
+#else
+# define L      "L"
+# define S      "S"
+#endif
+
+/* Operand constraints for each TCG opcode supported by this backend. */
+static const TCGTargetOpDef tcg_target_op_defs[] = {
+    { INDEX_op_exit_tb, { } },
+    { INDEX_op_goto_tb, { } },
+    { INDEX_op_call, { RI } },
+    { INDEX_op_jmp, { RI } },
+    { INDEX_op_br, { } },
+
+    { INDEX_op_mov_i32, { R, R } },
+    { INDEX_op_movi_i32, { R } },
+
+    { INDEX_op_ld8u_i32, { R, R } },
+    { INDEX_op_ld8s_i32, { R, R } },
+    { INDEX_op_ld16u_i32, { R, R } },
+    { INDEX_op_ld16s_i32, { R, R } },
+    { INDEX_op_ld_i32, { R, R } },
+    { INDEX_op_st8_i32, { R, R } },
+    { INDEX_op_st16_i32, { R, R } },
+    { INDEX_op_st_i32, { R, R } },
+
+    { INDEX_op_add_i32, { R, RI, RI } },
+    { INDEX_op_sub_i32, { R, RI, RI } },
+    { INDEX_op_mul_i32, { R, RI, RI } },
+#if TCG_TARGET_HAS_div_i32
+    { INDEX_op_div_i32, { R, R, R } },
+    { INDEX_op_divu_i32, { R, R, R } },
+    { INDEX_op_rem_i32, { R, R, R } },
+    { INDEX_op_remu_i32, { R, R, R } },
+#elif TCG_TARGET_HAS_div2_i32
+    { INDEX_op_div2_i32, { R, R, "0", "1", R } },
+    { INDEX_op_divu2_i32, { R, R, "0", "1", R } },
+#endif
+    /* TODO: Does R, RI, RI result in faster code than R, R, RI?
+       If both operands are constants, we can optimize. */
+    { INDEX_op_and_i32, { R, RI, RI } },
+#if TCG_TARGET_HAS_andc_i32
+    { INDEX_op_andc_i32, { R, RI, RI } },
+#endif
+#if TCG_TARGET_HAS_eqv_i32
+    { INDEX_op_eqv_i32, { R, RI, RI } },
+#endif
+#if TCG_TARGET_HAS_nand_i32
+    { INDEX_op_nand_i32, { R, RI, RI } },
+#endif
+#if TCG_TARGET_HAS_nor_i32
+    { INDEX_op_nor_i32, { R, RI, RI } },
+#endif
+    { INDEX_op_or_i32, { R, RI, RI } },
+#if TCG_TARGET_HAS_orc_i32
+    { INDEX_op_orc_i32, { R, RI, RI } },
+#endif
+    { INDEX_op_xor_i32, { R, RI, RI } },
+    { INDEX_op_shl_i32, { R, RI, RI } },
+    { INDEX_op_shr_i32, { R, RI, RI } },
+    { INDEX_op_sar_i32, { R, RI, RI } },
+#if TCG_TARGET_HAS_rot_i32
+    { INDEX_op_rotl_i32, { R, RI, RI } },
+    { INDEX_op_rotr_i32, { R, RI, RI } },
+#endif
+
+    { INDEX_op_brcond_i32, { R, RI } },
+
+    { INDEX_op_setcond_i32, { R, R, RI } },
+#if TCG_TARGET_REG_BITS == 64
+    { INDEX_op_setcond_i64, { R, R, RI } },
+#endif /* TCG_TARGET_REG_BITS == 64 */
+
+#if TCG_TARGET_REG_BITS == 32
+    /* TODO: Support R, R, R, R, RI, RI? Will it be faster? */
+    { INDEX_op_add2_i32, { R, R, R, R, R, R } },
+    { INDEX_op_sub2_i32, { R, R, R, R, R, R } },
+    { INDEX_op_brcond2_i32, { R, R, RI, RI } },
+    { INDEX_op_mulu2_i32, { R, R, R, R } },
+    { INDEX_op_setcond2_i32, { R, R, R, RI, RI } },
+#endif
+
+#if TCG_TARGET_HAS_not_i32
+    { INDEX_op_not_i32, { R, R } },
+#endif
+#if TCG_TARGET_HAS_neg_i32
+    { INDEX_op_neg_i32, { R, R } },
+#endif
+
+#if TCG_TARGET_REG_BITS == 64
+    { INDEX_op_mov_i64, { R, R } },
+    { INDEX_op_movi_i64, { R } },
+
+    { INDEX_op_ld8u_i64, { R, R } },
+    { INDEX_op_ld8s_i64, { R, R } },
+    { INDEX_op_ld16u_i64, { R, R } },
+    { INDEX_op_ld16s_i64, { R, R } },
+    { INDEX_op_ld32u_i64, { R, R } },
+    { INDEX_op_ld32s_i64, { R, R } },
+    { INDEX_op_ld_i64, { R, R } },
+
+    { INDEX_op_st8_i64, { R, R } },
+    { INDEX_op_st16_i64, { R, R } },
+    { INDEX_op_st32_i64, { R, R } },
+    { INDEX_op_st_i64, { R, R } },
+
+    { INDEX_op_add_i64, { R, RI, RI } },
+    { INDEX_op_sub_i64, { R, RI, RI } },
+    { INDEX_op_mul_i64, { R, RI, RI } },
+#if TCG_TARGET_HAS_div_i64
+    { INDEX_op_div_i64, { R, R, R } },
+    { INDEX_op_divu_i64, { R, R, R } },
+    { INDEX_op_rem_i64, { R, R, R } },
+    { INDEX_op_remu_i64, { R, R, R } },
+#elif TCG_TARGET_HAS_div2_i64
+    { INDEX_op_div2_i64, { R, R, "0", "1", R } },
+    { INDEX_op_divu2_i64, { R, R, "0", "1", R } },
+#endif
+    { INDEX_op_and_i64, { R, RI, RI } },
+#if TCG_TARGET_HAS_andc_i64
+    { INDEX_op_andc_i64, { R, RI, RI } },
+#endif
+#if TCG_TARGET_HAS_eqv_i64
+    { INDEX_op_eqv_i64, { R, RI, RI } },
+#endif
+#if TCG_TARGET_HAS_nand_i64
+    { INDEX_op_nand_i64, { R, RI, RI } },
+#endif
+#if TCG_TARGET_HAS_nor_i64
+    { INDEX_op_nor_i64, { R, RI, RI } },
+#endif
+    { INDEX_op_or_i64, { R, RI, RI } },
+#if TCG_TARGET_HAS_orc_i64
+    { INDEX_op_orc_i64, { R, RI, RI } },
+#endif
+    { INDEX_op_xor_i64, { R, RI, RI } },
+    { INDEX_op_shl_i64, { R, RI, RI } },
+    { INDEX_op_shr_i64, { R, RI, RI } },
+    { INDEX_op_sar_i64, { R, RI, RI } },
+#if TCG_TARGET_HAS_rot_i64
+    { INDEX_op_rotl_i64, { R, RI, RI } },
+    { INDEX_op_rotr_i64, { R, RI, RI } },
+#endif
+    { INDEX_op_brcond_i64, { R, RI } },
+
+#if TCG_TARGET_HAS_ext8s_i64
+    { INDEX_op_ext8s_i64, { R, R } },
+#endif
+#if TCG_TARGET_HAS_ext16s_i64
+    { INDEX_op_ext16s_i64, { R, R } },
+#endif
+#if TCG_TARGET_HAS_ext32s_i64
+    { INDEX_op_ext32s_i64, { R, R } },
+#endif
+#if TCG_TARGET_HAS_ext8u_i64
+    { INDEX_op_ext8u_i64, { R, R } },
+#endif
+#if TCG_TARGET_HAS_ext16u_i64
+    { INDEX_op_ext16u_i64, { R, R } },
+#endif
+#if TCG_TARGET_HAS_ext32u_i64
+    { INDEX_op_ext32u_i64, { R, R } },
+#endif
+#if TCG_TARGET_HAS_bswap16_i64
+    { INDEX_op_bswap16_i64, { R, R } },
+#endif
+#if TCG_TARGET_HAS_bswap32_i64
+    { INDEX_op_bswap32_i64, { R, R } },
+#endif
+#if TCG_TARGET_HAS_bswap64_i64
+    { INDEX_op_bswap64_i64, { R, R } },
+#endif
+#if TCG_TARGET_HAS_not_i64
+    { INDEX_op_not_i64, { R, R } },
+#endif
+#if TCG_TARGET_HAS_neg_i64
+    { INDEX_op_neg_i64, { R, R } },
+#endif
+#endif /* TCG_TARGET_REG_BITS == 64 */
+
+    { INDEX_op_qemu_ld8u, { R, L } },
+    { INDEX_op_qemu_ld8s, { R, L } },
+    { INDEX_op_qemu_ld16u, { R, L } },
+    { INDEX_op_qemu_ld16s, { R, L } },
+    { INDEX_op_qemu_ld32, { R, L } },
+#if TCG_TARGET_REG_BITS == 64
+    { INDEX_op_qemu_ld32u, { R, L } },
+    { INDEX_op_qemu_ld32s, { R, L } },
+#endif
+    { INDEX_op_qemu_ld64, { R64, L } },
+
+    { INDEX_op_qemu_st8, { R, S } },
+    { INDEX_op_qemu_st16, { R, S } },
+    { INDEX_op_qemu_st32, { R, S } },
+    { INDEX_op_qemu_st64, { R64, S } },
+
+#if TCG_TARGET_HAS_ext8s_i32
+    { INDEX_op_ext8s_i32, { R, R } },
+#endif
+#if TCG_TARGET_HAS_ext16s_i32
+    { INDEX_op_ext16s_i32, { R, R } },
+#endif
+#if TCG_TARGET_HAS_ext8u_i32
+    { INDEX_op_ext8u_i32, { R, R } },
+#endif
+#if TCG_TARGET_HAS_ext16u_i32
+    { INDEX_op_ext16u_i32, { R, R } },
+#endif
+
+#if TCG_TARGET_HAS_bswap16_i32
+    { INDEX_op_bswap16_i32, { R, R } },
+#endif
+#if TCG_TARGET_HAS_bswap32_i32
+    { INDEX_op_bswap32_i32, { R, R } },
+#endif
+
+    { -1 },
+};
+
+static const int tcg_target_reg_alloc_order[] = {
+    TCG_REG_R0,
+    TCG_REG_R1,
+    TCG_REG_R2,
+    TCG_REG_R3,
+#if 0 /* used for TCG_REG_CALL_STACK */
+    TCG_REG_R4,
+#endif
+    TCG_REG_R5,
+    TCG_REG_R6,
+    TCG_REG_R7,
+#if TCG_TARGET_NB_REGS >= 16
+    TCG_REG_R8,
+    TCG_REG_R9,
+    TCG_REG_R10,
+    TCG_REG_R11,
+    TCG_REG_R12,
+    TCG_REG_R13,
+    TCG_REG_R14,
+    TCG_REG_R15,
+#endif
+};
+
+#if MAX_OPC_PARAM_IARGS != 4
+# error Fix needed, number of supported input arguments changed!
+#endif
+
+static const int tcg_target_call_iarg_regs[] = {
+    TCG_REG_R0,
+    TCG_REG_R1,
+    TCG_REG_R2,
+    TCG_REG_R3,
+#if TCG_TARGET_REG_BITS == 32
+    /* 32 bit hosts need 2 * MAX_OPC_PARAM_IARGS registers. */
+#if 0 /* used for TCG_REG_CALL_STACK */
+    TCG_REG_R4,
+#endif
+    TCG_REG_R5,
+    TCG_REG_R6,
+    TCG_REG_R7,
+#if TCG_TARGET_NB_REGS >= 16
+    TCG_REG_R8,
+#else
+# error Too few input registers available
+#endif
+#endif
+};
+
+static const int tcg_target_call_oarg_regs[] = {
+    TCG_REG_R0,
+#if TCG_TARGET_REG_BITS == 32
+    TCG_REG_R1
+#endif
+};
+
+#ifndef NDEBUG
+static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
+    "r00",
+    "r01",
+    "r02",
+    "r03",
+    "r04",
+    "r05",
+    "r06",
+    "r07",
+#if TCG_TARGET_NB_REGS >= 16
+    "r08",
+    "r09",
+    "r10",
+    "r11",
+    "r12",
+    "r13",
+    "r14",
+    "r15",
+#if TCG_TARGET_NB_REGS >= 32
+    "r16",
+    "r17",
+    "r18",
+    "r19",
+    "r20",
+    "r21",
+    "r22",
+    "r23",
+    "r24",
+    "r25",
+    "r26",
+    "r27",
+    "r28",
+    "r29",
+    "r30",
+    "r31"
+#endif
+#endif
+};
+#endif
+
+static void flush_icache_range(unsigned long start, unsigned long stop)
+{
+    TRACE();
+}
+
+static void patch_reloc(uint8_t *code_ptr, int type,
+                        tcg_target_long value, tcg_target_long addend)
+{
+    /* tcg_out_reloc always uses the same type and addend. */
+    assert(type == sizeof(tcg_target_long));
+    assert(addend == 0);
+    assert(value != 0);
+    *(tcg_target_long *)code_ptr = value;
+}
+
+/* Parse target specific constraints. */
+static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
+{
+    const char *ct_str = *pct_str;
+    switch (ct_str[0]) {
+    case 'r':
+    case 'L':                   /* qemu_ld constraint */
+    case 'S':                   /* qemu_st constraint */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set32(ct->u.regs, 0, BIT(TCG_TARGET_NB_REGS) - 1);
+        break;
+    default:
+        return -1;
+    }
+    ct_str++;
+    *pct_str = ct_str;
+    return 0;
+}
+
+#include "dis-asm.h"
+
+/* Disassemble bytecode. */
+int print_insn_bytecode(bfd_vma addr, disassemble_info *info)
+{
+    int length;
+    uint8_t byte;
+    int status;
+    TCGOpcode op;
+
+    status = info->read_memory_func(addr, &byte, 1, info);
+    if (status != 0) {
+        info->memory_error_func(status, addr, info);
+        return -1;
+    }
+    op = byte;
+
+    addr++;
+    status = info->read_memory_func(addr, &byte, 1, info);
+    if (status != 0) {
+        info->memory_error_func(status, addr, info);
+        return -1;
+    }
+    length = byte;
+
+    if (op >= ARRAY_SIZE(tcg_op_defs)) {
+        return length;
+    }
+
+    const TCGOpDef *def = &tcg_op_defs[op];
+    int nb_oargs = def->nb_oargs;
+    int nb_iargs = def->nb_iargs;
+    int nb_cargs = def->nb_cargs;
+    FILE *f = info->stream;
+    /* TODO: Improve disassembler output. */
+    info->fprintf_func(f, "%s\to=%d i=%d c=%d",
+                       def->name, nb_oargs, nb_iargs, nb_cargs);
+
+    return length;
+}
+
+#if defined(CONFIG_DEBUG_TCG_INTERPRETER)
+/* Show current bytecode. Used by tcg interpreter. */
+void tci_disas(uint8_t opc)
+{
+    const TCGOpDef *def = &tcg_op_defs[opc];
+    fprintf(stderr, "TCG %s %u, %u, %u\n",
+            def->name, def->nb_oargs, def->nb_iargs, def->nb_cargs);
+}
+#endif
+
+/* Write value (native size). */
+static void tcg_out_i(TCGContext *s, tcg_target_ulong v)
+{
+    *(tcg_target_ulong *)s->code_ptr = v;
+    s->code_ptr += sizeof(tcg_target_ulong);
+}
+
+/* Write 64 bit value. */
+static void tcg_out64(TCGContext *s, uint64_t v)
+{
+    *(uint64_t *)s->code_ptr = v;
+    s->code_ptr += sizeof(v);
+}
+
+/* Write opcode. */
+static void tcg_out_op_t(TCGContext *s, TCGOpcode op)
+{
+    tcg_out8(s, op);
+    tcg_out8(s, 0);
+}
+
+/* Write register. */
+static void tcg_out_r(TCGContext *s, TCGArg t0)
+{
+    assert(t0 < TCG_TARGET_NB_REGS);
+    tcg_out8(s, t0);
+}
+
+/* Write register or constant (native size). */
+static void tcg_out_ri(TCGContext *s, int const_arg, TCGArg arg)
+{
+    if (const_arg) {
+        assert(const_arg == 1);
+        tcg_out8(s, TCG_CONST);
+        tcg_out_i(s, arg);
+    } else {
+        tcg_out_r(s, arg);
+    }
+}
+
+/* Write register or constant (32 bit). */
+static void tcg_out_ri32(TCGContext *s, int const_arg, TCGArg arg)
+{
+    if (const_arg) {
+        assert(const_arg == 1);
+        tcg_out8(s, TCG_CONST);
+        tcg_out32(s, arg);
+    } else {
+        tcg_out_r(s, arg);
+    }
+}
+
+#if TCG_TARGET_REG_BITS == 64
+/* Write register or constant (64 bit). */
+static void tcg_out_ri64(TCGContext *s, int const_arg, TCGArg arg)
+{
+    if (const_arg) {
+        assert(const_arg == 1);
+        tcg_out8(s, TCG_CONST);
+        tcg_out64(s, arg);
+    } else {
+        tcg_out_r(s, arg);
+    }
+}
+#endif
+
+/* Write label. */
+static void tci_out_label(TCGContext *s, TCGArg arg)
+{
+    TCGLabel *label = &s->labels[arg];
+    if (label->has_value) {
+        tcg_out_i(s, label->u.value);
+        assert(label->u.value);
+    } else {
+        tcg_out_reloc(s, s->code_ptr, sizeof(tcg_target_ulong), arg, 0);
+        tcg_out_i(s, 0);
+    }
+}
+
+static void tcg_out_ld(TCGContext *s, TCGType type, int ret, int arg1,
+                       tcg_target_long arg2)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+    if (type == TCG_TYPE_I32) {
+        tcg_out_op_t(s, INDEX_op_ld_i32);
+        tcg_out_r(s, ret);
+        tcg_out_r(s, arg1);
+        tcg_out32(s, arg2);
+    } else {
+        assert(type == TCG_TYPE_I64);
+#if TCG_TARGET_REG_BITS == 64
+        tcg_out_op_t(s, INDEX_op_ld_i64);
+        tcg_out_r(s, ret);
+        tcg_out_r(s, arg1);
+        assert(arg2 == (uint32_t)arg2);
+        tcg_out32(s, arg2);
+#else
+        TODO();
+#endif
+    }
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
+static void tcg_out_mov(TCGContext *s, TCGType type, int ret, int arg)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+    assert(ret != arg);
+#if TCG_TARGET_REG_BITS == 32
+    tcg_out_op_t(s, INDEX_op_mov_i32);
+#else
+    tcg_out_op_t(s, INDEX_op_mov_i64);
+#endif
+    tcg_out_r(s, ret);
+    tcg_out_r(s, arg);
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
+static void tcg_out_movi(TCGContext *s, TCGType type,
+                         int t0, tcg_target_long arg)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+    uint32_t arg32 = arg;
+    if (type == TCG_TYPE_I32 || arg == arg32) {
+        tcg_out_op_t(s, INDEX_op_movi_i32);
+        tcg_out_r(s, t0);
+        tcg_out32(s, arg32);
+    } else {
+        assert(type == TCG_TYPE_I64);
+#if TCG_TARGET_REG_BITS == 64
+        tcg_out_op_t(s, INDEX_op_movi_i64);
+        tcg_out_r(s, t0);
+        tcg_out64(s, arg);
+#else
+        TODO();
+#endif
+    }
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
+static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
+                       const int *const_args)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+
+    tcg_out_op_t(s, opc);
+
+    switch (opc) {
+    case INDEX_op_exit_tb:
+        tcg_out64(s, args[0]);
+        break;
+    case INDEX_op_goto_tb:
+        if (s->tb_jmp_offset) {
+            /* Direct jump method. */
+            assert(args[0] < ARRAY_SIZE(s->tb_jmp_offset));
+            s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
+            tcg_out32(s, 0);
+        } else {
+            /* Indirect jump method. */
+            TODO();
+        }
+        assert(args[0] < ARRAY_SIZE(s->tb_next_offset));
+        s->tb_next_offset[args[0]] = s->code_ptr - s->code_buf;
+        break;
+    case INDEX_op_br:
+        tci_out_label(s, args[0]);
+        break;
+    case INDEX_op_call:
+        tcg_out_ri(s, const_args[0], args[0]);
+        break;
+    case INDEX_op_jmp:
+        TODO();
+        break;
+    case INDEX_op_setcond_i32:
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+        tcg_out_ri32(s, const_args[2], args[2]);
+        tcg_out8(s, args[3]);   /* condition */
+        break;
+#if TCG_TARGET_REG_BITS == 32
+    case INDEX_op_setcond2_i32:
+        /* setcond2_i32 cond, t0, t1_low, t1_high, t2_low, t2_high */
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+        tcg_out_r(s, args[2]);
+        tcg_out_ri32(s, const_args[3], args[3]);
+        tcg_out_ri32(s, const_args[4], args[4]);
+        tcg_out8(s, args[5]);   /* condition */
+        break;
+#elif TCG_TARGET_REG_BITS == 64
+    case INDEX_op_setcond_i64:
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+        tcg_out_ri64(s, const_args[2], args[2]);
+        tcg_out8(s, args[3]);   /* condition */
+        break;
+#endif
+    case INDEX_op_movi_i32:
+        TODO(); /* Handled by tcg_out_movi? */
+        break;
+    case INDEX_op_ld8u_i32:
+    case INDEX_op_ld8s_i32:
+    case INDEX_op_ld16u_i32:
+    case INDEX_op_ld16s_i32:
+    case INDEX_op_ld_i32:
+    case INDEX_op_st8_i32:
+    case INDEX_op_st16_i32:
+    case INDEX_op_st_i32:
+    case INDEX_op_ld8u_i64:
+    case INDEX_op_ld8s_i64:
+    case INDEX_op_ld16u_i64:
+    case INDEX_op_ld16s_i64:
+    case INDEX_op_ld32u_i64:
+    case INDEX_op_ld32s_i64:
+    case INDEX_op_ld_i64:
+    case INDEX_op_st8_i64:
+    case INDEX_op_st16_i64:
+    case INDEX_op_st32_i64:
+    case INDEX_op_st_i64:
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+        assert(args[2] == (uint32_t)args[2]);
+        tcg_out32(s, args[2]);
+        break;
+    case INDEX_op_add_i32:
+    case INDEX_op_sub_i32:
+    case INDEX_op_mul_i32:
+    case INDEX_op_and_i32:
+    case INDEX_op_andc_i32:     /* Optional (TCG_TARGET_HAS_andc_i32). */
+    case INDEX_op_eqv_i32:      /* Optional (TCG_TARGET_HAS_eqv_i32). */
+    case INDEX_op_nand_i32:     /* Optional (TCG_TARGET_HAS_nand_i32). */
+    case INDEX_op_nor_i32:      /* Optional (TCG_TARGET_HAS_nor_i32). */
+    case INDEX_op_or_i32:
+    case INDEX_op_orc_i32:      /* Optional (TCG_TARGET_HAS_orc_i32). */
+    case INDEX_op_xor_i32:
+    case INDEX_op_shl_i32:
+    case INDEX_op_shr_i32:
+    case INDEX_op_sar_i32:
+    case INDEX_op_rotl_i32:     /* Optional (TCG_TARGET_HAS_rot_i32). */
+    case INDEX_op_rotr_i32:     /* Optional (TCG_TARGET_HAS_rot_i32). */
+        tcg_out_r(s, args[0]);
+        tcg_out_ri32(s, const_args[1], args[1]);
+        tcg_out_ri32(s, const_args[2], args[2]);
+        break;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_mov_i64:
+    case INDEX_op_movi_i64:
+        TODO();
+        break;
+    case INDEX_op_add_i64:
+    case INDEX_op_sub_i64:
+    case INDEX_op_mul_i64:
+    case INDEX_op_and_i64:
+    case INDEX_op_andc_i64:     /* Optional (TCG_TARGET_HAS_andc_i64). */
+    case INDEX_op_eqv_i64:      /* Optional (TCG_TARGET_HAS_eqv_i64). */
+    case INDEX_op_nand_i64:     /* Optional (TCG_TARGET_HAS_nand_i64). */
+    case INDEX_op_nor_i64:      /* Optional (TCG_TARGET_HAS_nor_i64). */
+    case INDEX_op_or_i64:
+    case INDEX_op_orc_i64:      /* Optional (TCG_TARGET_HAS_orc_i64). */
+    case INDEX_op_xor_i64:
+    case INDEX_op_shl_i64:
+    case INDEX_op_shr_i64:
+    case INDEX_op_sar_i64:
+    /* TODO: Implementation of rotl_i64, rotr_i64 missing in tci.c. */
+    case INDEX_op_rotl_i64:     /* Optional (TCG_TARGET_HAS_rot_i64). */
+    case INDEX_op_rotr_i64:     /* Optional (TCG_TARGET_HAS_rot_i64). */
+        tcg_out_r(s, args[0]);
+        tcg_out_ri64(s, const_args[1], args[1]);
+        tcg_out_ri64(s, const_args[2], args[2]);
+        break;
+    case INDEX_op_div_i64:      /* Optional (TCG_TARGET_HAS_div_i64). */
+    case INDEX_op_divu_i64:     /* Optional (TCG_TARGET_HAS_div_i64). */
+    case INDEX_op_rem_i64:      /* Optional (TCG_TARGET_HAS_div_i64). */
+    case INDEX_op_remu_i64:     /* Optional (TCG_TARGET_HAS_div_i64). */
+        TODO();
+        break;
+    case INDEX_op_div2_i64:     /* Optional (TCG_TARGET_HAS_div2_i64). */
+    case INDEX_op_divu2_i64:    /* Optional (TCG_TARGET_HAS_div2_i64). */
+        TODO();
+        break;
+    case INDEX_op_brcond_i64:
+        tcg_out_r(s, args[0]);
+        tcg_out_ri64(s, const_args[1], args[1]);
+        tcg_out8(s, args[2]);           /* condition */
+        tci_out_label(s, args[3]);
+        break;
+    case INDEX_op_bswap16_i64:  /* Optional (TCG_TARGET_HAS_bswap16_i64). */
+    case INDEX_op_bswap32_i64:  /* Optional (TCG_TARGET_HAS_bswap32_i64). */
+    case INDEX_op_bswap64_i64:  /* Optional (TCG_TARGET_HAS_bswap64_i64). */
+    case INDEX_op_not_i64:      /* Optional (TCG_TARGET_HAS_not_i64). */
+    case INDEX_op_neg_i64:      /* Optional (TCG_TARGET_HAS_neg_i64). */
+    case INDEX_op_ext8s_i64:    /* Optional (TCG_TARGET_HAS_ext8s_i64). */
+    case INDEX_op_ext8u_i64:    /* Optional (TCG_TARGET_HAS_ext8u_i64). */
+    case INDEX_op_ext16s_i64:   /* Optional (TCG_TARGET_HAS_ext16s_i64). */
+    case INDEX_op_ext16u_i64:   /* Optional (TCG_TARGET_HAS_ext16u_i64). */
+    case INDEX_op_ext32s_i64:   /* Optional (TCG_TARGET_HAS_ext32s_i64). */
+    case INDEX_op_ext32u_i64:   /* Optional (TCG_TARGET_HAS_ext32u_i64). */
+#endif /* TCG_TARGET_REG_BITS == 64 */
+    case INDEX_op_neg_i32:      /* Optional (TCG_TARGET_HAS_neg_i32). */
+    case INDEX_op_not_i32:      /* Optional (TCG_TARGET_HAS_not_i32). */
+    case INDEX_op_ext8s_i32:    /* Optional (TCG_TARGET_HAS_ext8s_i32). */
+    case INDEX_op_ext16s_i32:   /* Optional (TCG_TARGET_HAS_ext16s_i32). */
+    case INDEX_op_ext8u_i32:    /* Optional (TCG_TARGET_HAS_ext8u_i32). */
+    case INDEX_op_ext16u_i32:   /* Optional (TCG_TARGET_HAS_ext16u_i32). */
+    case INDEX_op_bswap16_i32:  /* Optional (TCG_TARGET_HAS_bswap16_i32). */
+    case INDEX_op_bswap32_i32:  /* Optional (TCG_TARGET_HAS_bswap32_i32). */
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+        break;
+    case INDEX_op_div_i32:      /* Optional (TCG_TARGET_HAS_div_i32). */
+    case INDEX_op_divu_i32:     /* Optional (TCG_TARGET_HAS_div_i32). */
+    case INDEX_op_rem_i32:      /* Optional (TCG_TARGET_HAS_div_i32). */
+    case INDEX_op_remu_i32:     /* Optional (TCG_TARGET_HAS_div_i32). */
+        tcg_out_r(s, args[0]);
+        tcg_out_ri32(s, const_args[1], args[1]);
+        tcg_out_ri32(s, const_args[2], args[2]);
+        break;
+    case INDEX_op_div2_i32:     /* Optional (TCG_TARGET_HAS_div2_i32). */
+    case INDEX_op_divu2_i32:    /* Optional (TCG_TARGET_HAS_div2_i32). */
+        TODO();
+        break;
+#if TCG_TARGET_REG_BITS == 32
+    case INDEX_op_add2_i32:
+    case INDEX_op_sub2_i32:
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+        tcg_out_r(s, args[2]);
+        tcg_out_r(s, args[3]);
+        tcg_out_r(s, args[4]);
+        tcg_out_r(s, args[5]);
+        break;
+    case INDEX_op_brcond2_i32:
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+        tcg_out_ri32(s, const_args[2], args[2]);
+        tcg_out_ri32(s, const_args[3], args[3]);
+        tcg_out8(s, args[4]);           /* condition */
+        tci_out_label(s, args[5]);
+        break;
+    case INDEX_op_mulu2_i32:
+        tcg_out_r(s, args[0]);
+        tcg_out_r(s, args[1]);
+        tcg_out_r(s, args[2]);
+        tcg_out_r(s, args[3]);
+        break;
+#endif
+    case INDEX_op_brcond_i32:
+        tcg_out_r(s, args[0]);
+        tcg_out_ri32(s, const_args[1], args[1]);
+        tcg_out8(s, args[2]);           /* condition */
+        tci_out_label(s, args[3]);
+        break;
+    case INDEX_op_qemu_ld8u:
+    case INDEX_op_qemu_ld8s:
+    case INDEX_op_qemu_ld16u:
+    case INDEX_op_qemu_ld16s:
+    case INDEX_op_qemu_ld32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_qemu_ld32s:
+    case INDEX_op_qemu_ld32u:
+#endif
+        tcg_out_r(s, *args++);
+        tcg_out_r(s, *args++);
+#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
+        tcg_out_r(s, *args++);
+#endif
+#ifdef CONFIG_SOFTMMU
+        tcg_out_i(s, *args);
+#endif
+        break;
+    case INDEX_op_qemu_ld64:
+        tcg_out_r(s, *args++);
+#if TCG_TARGET_REG_BITS == 32
+        tcg_out_r(s, *args++);
+#endif
+        tcg_out_r(s, *args++);
+#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
+        tcg_out_r(s, *args++);
+#endif
+#ifdef CONFIG_SOFTMMU
+        tcg_out_i(s, *args);
+#endif
+        break;
+    case INDEX_op_qemu_st8:
+    case INDEX_op_qemu_st16:
+    case INDEX_op_qemu_st32:
+        tcg_out_r(s, *args++);
+        tcg_out_r(s, *args++);
+#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
+        tcg_out_r(s, *args++);
+#endif
+#ifdef CONFIG_SOFTMMU
+        tcg_out_i(s, *args);
+#endif
+        break;
+    case INDEX_op_qemu_st64:
+        tcg_out_r(s, *args++);
+#if TCG_TARGET_REG_BITS == 32
+        tcg_out_r(s, *args++);
+#endif
+        tcg_out_r(s, *args++);
+#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
+        tcg_out_r(s, *args++);
+#endif
+#ifdef CONFIG_SOFTMMU
+        tcg_out_i(s, *args);
+#endif
+        break;
+    case INDEX_op_end:
+        TODO();
+        break;
+    default:
+        fprintf(stderr, "Missing: %s\n", tcg_op_defs[opc].name);
+        tcg_abort();
+    }
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
+static void tcg_out_st(TCGContext *s, TCGType type, int arg, int arg1,
+                       tcg_target_long arg2)
+{
+    uint8_t *old_code_ptr = s->code_ptr;
+    if (type == TCG_TYPE_I32) {
+        tcg_out_op_t(s, INDEX_op_st_i32);
+        tcg_out_r(s, arg);
+        tcg_out_r(s, arg1);
+        tcg_out32(s, arg2);
+    } else {
+        assert(type == TCG_TYPE_I64);
+#if TCG_TARGET_REG_BITS == 64
+        tcg_out_op_t(s, INDEX_op_st_i64);
+        tcg_out_r(s, arg);
+        tcg_out_r(s, arg1);
+        tcg_out32(s, arg2);
+#else
+        TODO();
+#endif
+    }
+    old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
+/* Test if a constant matches the constraint. */
+static int tcg_target_const_match(tcg_target_long val,
+                                  const TCGArgConstraint *arg_ct)
+{
+    /* No need to return exactly 0 or 1; 0 or nonzero is good enough. */
+    return arg_ct->ct & TCG_CT_CONST;
+}
+
+/* Maximum number of registers used for input function arguments. */
+static int tcg_target_get_call_iarg_regs_count(int flags)
+{
+    return ARRAY_SIZE(tcg_target_call_iarg_regs);
+}
+
+static void tcg_target_init(TCGContext *s)
+{
+#if defined(CONFIG_DEBUG_TCG_INTERPRETER)
+    const char *envval = getenv("DEBUG_TCG");
+    if (envval) {
+        loglevel = strtol(envval, NULL, 0);
+    }
+#endif
+    TRACE();
+
+    /* The current code uses uint8_t for tcg operations. */
+    assert(ARRAY_SIZE(tcg_op_defs) <= UINT8_MAX);
+
+    /* Registers available for 32 bit operations. */
+    tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0,
+                     BIT(TCG_TARGET_NB_REGS) - 1);
+    /* Registers available for 64 bit operations. */
+    tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0,
+                     BIT(TCG_TARGET_NB_REGS) - 1);
+    /* TODO: Which registers should be set here? */
+    tcg_regset_set32(tcg_target_call_clobber_regs, 0,
+                     BIT(TCG_TARGET_NB_REGS) - 1);
+    tcg_regset_clear(s->reserved_regs);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
+    tcg_add_target_add_op_defs(tcg_target_op_defs);
+    tcg_set_frame(s, TCG_AREG0, offsetof(CPUState, temp_buf),
+                  CPU_TEMP_BUF_NLONGS * sizeof(long));
+}
+
+/* Generate global QEMU prologue and epilogue code. */
+static void tcg_target_qemu_prologue(TCGContext *s)
+{
+    TRACE();
+    tb_ret_addr = s->code_ptr;
+}
diff --git a/tcg/bytecode/tcg-target.h b/tcg/bytecode/tcg-target.h
new file mode 100644
index 0000000..05aaaf2
--- /dev/null
+++ b/tcg/bytecode/tcg-target.h
@@ -0,0 +1,152 @@
+/*
+ * Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2009, 2011 Stefan Weil
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * This code implements a TCG which does not generate machine code for some
+ * real target machine but which generates virtual machine code for an
+ * interpreter. Interpreted pseudo code is slow, but it works on any host.
+ *
+ * Some remarks might help in understanding the code:
+ *
+ * "target" or "TCG target" is the machine which runs the generated code.
+ * This is different to the usual meaning in QEMU where "target" is the
+ * emulated machine. So normally QEMU host is identical to TCG target.
+ * Here the TCG target is a virtual machine, but this virtual machine must
+ * use the same word size like the real machine.
+ * Therefore, we need both 32 and 64 bit virtual machines (interpreter).
+ */
+
+#if !defined(TCG_TARGET_H)
+#define TCG_TARGET_H
+
+#include "config-host.h"
+
+#define TCG_TARGET_INTERPRETER 1
+
+#ifdef CONFIG_DEBUG_TCG
+/* Enable debug output. */
+#define CONFIG_DEBUG_TCG_INTERPRETER
+#endif
+
+#if 0 /* TCI tries to emulate a little endian host. */
+#if defined(HOST_WORDS_BIGENDIAN)
+# define TCG_TARGET_WORDS_BIGENDIAN
+#endif
+#endif
+
+/* Optional instructions. */
+
+#define TCG_TARGET_HAS_bswap16_i32      1
+#define TCG_TARGET_HAS_bswap32_i32      1
+/* At most one of the next two defines may be 1. */
+#define TCG_TARGET_HAS_div_i32          1
+#define TCG_TARGET_HAS_div2_i32         0
+#define TCG_TARGET_HAS_ext8s_i32        1
+#define TCG_TARGET_HAS_ext16s_i32       1
+#define TCG_TARGET_HAS_ext8u_i32        1
+#define TCG_TARGET_HAS_ext16u_i32       1
+#define TCG_TARGET_HAS_andc_i32         0
+#define TCG_TARGET_HAS_deposit_i32      0
+#define TCG_TARGET_HAS_eqv_i32          0
+#define TCG_TARGET_HAS_nand_i32         0
+#define TCG_TARGET_HAS_nor_i32          0
+#define TCG_TARGET_HAS_neg_i32          1
+#define TCG_TARGET_HAS_not_i32          1
+#define TCG_TARGET_HAS_orc_i32          0
+#define TCG_TARGET_HAS_rot_i32          1
+
+#if TCG_TARGET_REG_BITS == 64
+#define TCG_TARGET_HAS_bswap16_i64      1
+#define TCG_TARGET_HAS_bswap32_i64      1
+#define TCG_TARGET_HAS_bswap64_i64      1
+#define TCG_TARGET_HAS_deposit_i64      0
+/* At most one of the next two defines may be 1. */
+#define TCG_TARGET_HAS_div_i64          0
+#define TCG_TARGET_HAS_div2_i64         0
+#define TCG_TARGET_HAS_ext8s_i64        1
+#define TCG_TARGET_HAS_ext16s_i64       1
+#define TCG_TARGET_HAS_ext32s_i64       1
+#define TCG_TARGET_HAS_ext8u_i64        1
+#define TCG_TARGET_HAS_ext16u_i64       1
+#define TCG_TARGET_HAS_ext32u_i64       1
+#define TCG_TARGET_HAS_andc_i64         0
+#define TCG_TARGET_HAS_eqv_i64          0
+#define TCG_TARGET_HAS_nand_i64         0
+#define TCG_TARGET_HAS_nor_i64          0
+#define TCG_TARGET_HAS_neg_i64          1
+#define TCG_TARGET_HAS_not_i64          1
+#define TCG_TARGET_HAS_orc_i64          0
+#define TCG_TARGET_HAS_rot_i64          1
+#endif /* TCG_TARGET_REG_BITS == 64 */
+
+/* Offset to user memory in user mode. */
+#define TCG_TARGET_HAS_GUEST_BASE
+
+/* Number of registers available.
+   For 32 bit hosts, we need more than 8 registers (call arguments). */
+/* #define TCG_TARGET_NB_REGS 8 */
+#define TCG_TARGET_NB_REGS 16
+/* #define TCG_TARGET_NB_REGS 32 */
+
+/* List of registers which are used by TCG. */
+typedef enum {
+    TCG_REG_R0 = 0,
+    TCG_REG_R1,
+    TCG_REG_R2,
+    TCG_REG_R3,
+    TCG_REG_R4,
+    TCG_REG_R5,
+    TCG_REG_R6,
+    TCG_REG_R7,
+    TCG_AREG0 = TCG_REG_R7,
+#if TCG_TARGET_NB_REGS >= 16
+    TCG_REG_R8,
+    TCG_REG_R9,
+    TCG_REG_R10,
+    TCG_REG_R11,
+    TCG_REG_R12,
+    TCG_REG_R13,
+    TCG_REG_R14,
+    TCG_REG_R15,
+#if TCG_TARGET_NB_REGS >= 32
+    TCG_REG_R16,
+    TCG_REG_R17,
+    TCG_REG_R18,
+    TCG_REG_R19,
+    TCG_REG_R20,
+    TCG_REG_R21,
+    TCG_REG_R22,
+    TCG_REG_R23,
+    TCG_REG_R24,
+    TCG_REG_R25,
+    TCG_REG_R26,
+    TCG_REG_R27,
+    TCG_REG_R28,
+    TCG_REG_R29,
+    TCG_REG_R30,
+    TCG_REG_R31,
+#endif
+#endif
+    /* Special value UINT8_MAX is used by TCI to encode constant values. */
+    TCG_CONST = UINT8_MAX
+} TCGRegister;
+
+void tci_disas(uint8_t opc);
+
+#endif /* TCG_TARGET_H */
-- 
1.7.2.5


* [Qemu-devel] [PATCH 7/8] tcg: Add tcg interpreter to configure / make
  2011-09-17 19:59 [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine) Stefan Weil
                   ` (5 preceding siblings ...)
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 6/8] tcg: Add bytecode generator for tcg interpreter Stefan Weil
@ 2011-09-17 20:00 ` Stefan Weil
  2011-09-18  9:37   ` Blue Swirl
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 8/8] ppc: Support tcg interpreter on ppc hosts Stefan Weil
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 48+ messages in thread
From: Stefan Weil @ 2011-09-17 20:00 UTC
  To: QEMU Developers

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
---
 Makefile.target |    1 +
 configure       |   30 ++++++++++++++++++++++++++++--
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index 88d2f1f..a2c3a4a 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -69,6 +69,7 @@ all: $(PROGS) stap
 # cpu emulator library
 libobj-y = exec.o translate-all.o cpu-exec.o translate.o
 libobj-y += tcg/tcg.o tcg/optimize.o
+libobj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o
 libobj-y += fpu/softfloat.o
 libobj-y += op_helper.o helper.o
 ifeq ($(TARGET_BASE_ARCH), i386)
diff --git a/configure b/configure
index ad924c4..1d800e1 100755
--- a/configure
+++ b/configure
@@ -138,6 +138,7 @@ debug_tcg="no"
 debug_mon="no"
 debug="no"
 strip_opt="yes"
+tcg_interpreter="no"
 bigendian="no"
 mingw32="no"
 EXESUF=""
@@ -647,6 +648,10 @@ for opt do
   ;;
   --enable-kvm) kvm="yes"
   ;;
+  --disable-tcg-interpreter) tcg_interpreter="no"
+  ;;
+  --enable-tcg-interpreter) tcg_interpreter="yes"
+  ;;
   --disable-spice) spice="no"
   ;;
   --enable-spice) spice="yes"
@@ -997,6 +1002,7 @@ echo "  --enable-bluez           enable bluez stack connectivity"
 echo "  --disable-slirp          disable SLIRP userspace network connectivity"
 echo "  --disable-kvm            disable KVM acceleration support"
 echo "  --enable-kvm             enable KVM acceleration support"
+echo "  --enable-tcg-interpreter enable TCG with bytecode interpreter (TCI)"
 echo "  --disable-nptl           disable usermode NPTL support"
 echo "  --enable-nptl            enable usermode NPTL support"
 echo "  --enable-system          enable all system emulation targets"
@@ -2714,6 +2720,7 @@ echo "Linux AIO support $linux_aio"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs     $blobs"
 echo "KVM support       $kvm"
+echo "TCG interpreter   $tcg_interpreter"
 echo "fdt support       $fdt"
 echo "preadv support    $preadv"
 echo "fdatasync         $fdatasync"
@@ -2761,6 +2768,15 @@ case "$cpu" in
   armv4b|armv4l)
     ARCH=arm
   ;;
+  *)
+    if test "$tcg_interpreter" = "yes" ; then
+        echo "Unsupported CPU = $cpu, will use TCG with TCI (experimental)"
+        ARCH=unknown
+    else
+        echo "Unsupported CPU = $cpu, try --enable-tcg-interpreter"
+        exit 1
+    fi
+  ;;
 esac
 echo "ARCH=$ARCH" >> $config_host_mak
 if test "$debug_tcg" = "yes" ; then
@@ -2994,6 +3010,9 @@ fi
 if test "$signalfd" = "yes" ; then
   echo "CONFIG_SIGNALFD=y" >> $config_host_mak
 fi
+if test "$tcg_interpreter" = "yes" ; then
+  echo "CONFIG_TCG_INTERPRETER=y" >> $config_host_mak
+fi
 if test "$need_offsetof" = "yes" ; then
   echo "CONFIG_NEED_OFFSETOF=y" >> $config_host_mak
 fi
@@ -3454,7 +3473,9 @@ cflags=""
 includes=""
 ldflags=""
 
-if test "$ARCH" = "sparc64" ; then
+if test "$tcg_interpreter" = "yes"; then
+  includes="-I\$(SRC_PATH)/tcg/bytecode $includes"
+elif test "$ARCH" = "sparc64" ; then
   includes="-I\$(SRC_PATH)/tcg/sparc $includes"
 elif test "$ARCH" = "s390x" ; then
   includes="-I\$(SRC_PATH)/tcg/s390 $includes"
@@ -3577,7 +3598,12 @@ if test "$gprof" = "yes" ; then
   fi
 fi
 
-linker_script="-Wl,-T../config-host.ld -Wl,-T,\$(SRC_PATH)/\$(ARCH).ld"
+if test "$ARCH" = "unknown"; then
+  linker_script=""
+else
+  linker_script="-Wl,-T../config-host.ld -Wl,-T,\$(SRC_PATH)/\$(ARCH).ld"
+fi
+
 if test "$target_linux_user" = "yes" -o "$target_bsd_user" = "yes" ; then
   case "$ARCH" in
   sparc)
-- 
1.7.2.5

* [Qemu-devel] [PATCH 8/8] ppc: Support tcg interpreter on ppc hosts
  2011-09-17 19:59 [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine) Stefan Weil
                   ` (6 preceding siblings ...)
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 7/8] tcg: Add tcg interpreter to configure / make Stefan Weil
@ 2011-09-17 20:00 ` Stefan Weil
  2011-09-17 21:31   ` Peter Maydell
  2011-09-18 10:26 ` [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine) Blue Swirl
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 48+ messages in thread
From: Stefan Weil @ 2011-09-17 20:00 UTC (permalink / raw)
  To: QEMU Developers

Tests of the tcg interpreter on an (emulated) ppc host
needed this small change.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
---
 cache-utils.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/cache-utils.h b/cache-utils.h
index 0b65907..7c3b282 100644
--- a/cache-utils.h
+++ b/cache-utils.h
@@ -1,7 +1,7 @@
 #ifndef QEMU_CACHE_UTILS_H
 #define QEMU_CACHE_UTILS_H
 
-#if defined(_ARCH_PPC)
+#if defined(_ARCH_PPC) && !defined(CONFIG_TCG_INTERPRETER)
 struct qemu_cache_conf {
     unsigned long dcache_bsize;
     unsigned long icache_bsize;
-- 
1.7.2.5

* Re: [Qemu-devel] [PATCH 8/8] ppc: Support tcg interpreter on ppc hosts
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 8/8] ppc: Support tcg interpreter on ppc hosts Stefan Weil
@ 2011-09-17 21:31   ` Peter Maydell
  2011-09-17 21:33     ` Stefan Weil
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Maydell @ 2011-09-17 21:31 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

On 17 September 2011 21:00, Stefan Weil <weil@mail.berlios.de> wrote:
> Tests of the tcg interpreter on an (emulated) ppc host
> needed this small change.
>
> Signed-off-by: Stefan Weil <weil@mail.berlios.de>
> ---
>  cache-utils.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/cache-utils.h b/cache-utils.h
> index 0b65907..7c3b282 100644
> --- a/cache-utils.h
> +++ b/cache-utils.h
> @@ -1,7 +1,7 @@
>  #ifndef QEMU_CACHE_UTILS_H
>  #define QEMU_CACHE_UTILS_H
>
> -#if defined(_ARCH_PPC)
> +#if defined(_ARCH_PPC) && !defined(CONFIG_TCG_INTERPRETER)
>  struct qemu_cache_conf {
>     unsigned long dcache_bsize;
>     unsigned long icache_bsize;

This looks a bit odd, but I think that's partly an effect of
only the PPC flush_icache_range being in this header file when
for other architectures it is in tcg/*/tcg-target.h. If we
could have the cache flushing be in tcg/* for every target then
you wouldn't need to do an ifdef here.

-- PMM
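Here is a minimal sketch of what Peter describes. It assumes a hypothetical per-backend `flush_icache_range()` that each tcg/*/tcg-target.h would provide. Under TCI the generated code is bytecode that the interpreter only reads as data, so the hook can be empty. This matches the "no need to flush icache explicitly" comment in the TCI version of tb_set_jmp_target1 in patch 6/8. The native branch uses the GCC/Clang builtin `__builtin___clear_cache`. The function name and parameter types are illustrative, not QEMU's actual interface.

```c
#include <stdint.h>

/* Hypothetical per-backend hook, in the spirit of moving the cache flush
   into each tcg/<host>/tcg-target.h instead of cache-utils.h. */
static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
{
#if defined(CONFIG_TCG_INTERPRETER)
    /* TCI never executes the generated bytecode as host instructions,
       so the instruction cache never needs to be synchronized. */
    (void)start;
    (void)stop;
#else
    /* Native backends: make freshly written host code visible to the
       instruction cache (does nothing on hosts with coherent caches). */
    __builtin___clear_cache((char *)start, (char *)stop);
#endif
}
```

If every backend provided a hook like this, cache-utils.h might no longer need to test CONFIG_TCG_INTERPRETER.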

* Re: [Qemu-devel] [PATCH 8/8] ppc: Support tcg interpreter on ppc hosts
  2011-09-17 21:31   ` Peter Maydell
@ 2011-09-17 21:33     ` Stefan Weil
  0 siblings, 0 replies; 48+ messages in thread
From: Stefan Weil @ 2011-09-17 21:33 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

Am 17.09.2011 23:31, schrieb Peter Maydell:
> On 17 September 2011 21:00, Stefan Weil <weil@mail.berlios.de> wrote:
>> Tests of the tcg interpreter on an (emulated) ppc host
>> needed this small change.
>>
>> Signed-off-by: Stefan Weil <weil@mail.berlios.de>
>> ---
>>   cache-utils.h |    2 +-
>>   1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/cache-utils.h b/cache-utils.h
>> index 0b65907..7c3b282 100644
>> --- a/cache-utils.h
>> +++ b/cache-utils.h
>> @@ -1,7 +1,7 @@
>>   #ifndef QEMU_CACHE_UTILS_H
>>   #define QEMU_CACHE_UTILS_H
>>
>> -#if defined(_ARCH_PPC)
>> +#if defined(_ARCH_PPC) && !defined(CONFIG_TCG_INTERPRETER)
>>   struct qemu_cache_conf {
>>      unsigned long dcache_bsize;
>>      unsigned long icache_bsize;
> This looks a bit odd, but I think that's partly an effect of
> only the PPC flush_icache_range being in this header file when
> for other architectures it is in tcg/*/tcg-target.h. If we
> could have the cache flushing be in tcg/* for every target then
> you wouldn't need to do an ifdef here.
>
> -- PMM

That's correct.

* Re: [Qemu-devel] [PATCH 3/8] tcg: Add forward declarations for local functions
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 3/8] tcg: Add forward declarations for local functions Stefan Weil
@ 2011-09-17 21:40   ` Peter Maydell
  0 siblings, 0 replies; 48+ messages in thread
From: Peter Maydell @ 2011-09-17 21:40 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

On 17 September 2011 21:00, Stefan Weil <weil@mail.berlios.de> wrote:
> +/* Forward declarations for functions declared and used in tcg-target.c. */
> +static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str);
> +static void tcg_out_ld(TCGContext *s, TCGType type, int ret, int arg1,
> +                       tcg_target_long arg2);
> +static void tcg_out_mov(TCGContext *s, TCGType type, int ret, int arg);
> +static void tcg_out_movi(TCGContext *s, TCGType type,
> +                         int ret, tcg_target_long arg);
> +static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
> +                       const int *const_args);
> +static void tcg_out_st(TCGContext *s, TCGType type, int arg, int arg1,
> +                       tcg_target_long arg2);
> +static int tcg_target_const_match(tcg_target_long val,
> +                                  const TCGArgConstraint *arg_ct);
> +static int tcg_target_get_call_iarg_regs_count(int flags);

I'm tempted to submit a bulk rename patch that renames the functions
in this list which don't start 'tcg_target_' so that they do...

-- PMM

* Re: [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode Stefan Weil
@ 2011-09-18  4:03   ` Andi Kleen
  2011-09-18  5:49     ` Stefan Weil
  2011-09-18 10:18   ` Blue Swirl
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 48+ messages in thread
From: Andi Kleen @ 2011-09-18  4:03 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

Stefan Weil <weil@mail.berlios.de> writes:
> +
> +        switch (opc) {
> +        case INDEX_op_end:
> +        case INDEX_op_nop:
> +            break;

You could probably get some more speed out of this by using a threaded
interpreter with gcc's computed goto extension. That's typically
significantly faster than a plain switch in a loop.

static void *ops[] = {
       &&op1,
       &&op2,
       ...
};

#define NEXT() goto *ops[*tb_ptr++];

        op1:
                ...
                NEXT();

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only
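Andi's outline can be turned into a small, compilable comparison. The sketch below is hypothetical: the four opcodes (OP_PUSH, OP_ADD, OP_MUL, OP_HALT) form a toy stack machine, not TCI's bytecode. `run_switch()` uses the classic switch in a loop. `run_threaded()` uses GCC's labels-as-values extension (also accepted by Clang), where each handler jumps straight to the next handler through the `ops[]` table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes of a toy stack machine (not TCI's real bytecode). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Classic dispatch: a switch inside a loop. */
static int32_t run_switch(const uint8_t *pc)
{
    int32_t stack[16];
    size_t sp = 0;

    for (;;) {
        switch (*pc++) {
        case OP_PUSH:
            stack[sp++] = (int8_t)*pc++;    /* signed one-byte immediate */
            break;
        case OP_ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        default:                            /* OP_HALT */
            return stack[sp - 1];
        }
    }
}

/* Threaded dispatch using GCC's labels-as-values extension: every handler
   ends with its own indirect jump to the next handler. Unknown opcodes are
   not checked here. */
static int32_t run_threaded(const uint8_t *pc)
{
    static void *const ops[] = {
        [OP_PUSH] = &&do_push,
        [OP_ADD]  = &&do_add,
        [OP_MUL]  = &&do_mul,
        [OP_HALT] = &&do_halt,
    };
    int32_t stack[16];
    size_t sp = 0;

#define NEXT() goto *ops[*pc++]
    NEXT();
do_push:
    stack[sp++] = (int8_t)*pc++;
    NEXT();
do_add:
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT();
do_mul:
    sp--;
    stack[sp - 1] *= stack[sp];
    NEXT();
do_halt:
#undef NEXT
    return stack[sp - 1];
}
```

Both functions return 40 for the program {PUSH 6, PUSH 7, MUL, PUSH -2, ADD, HALT}. Whether the threaded version is actually faster depends on the compiler and host CPU, which is what Stefan questions in his reply, so it has to be measured rather than assumed.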

* Re: [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode
  2011-09-18  4:03   ` Andi Kleen
@ 2011-09-18  5:49     ` Stefan Weil
  2011-09-18  7:22       ` Paolo Bonzini
  0 siblings, 1 reply; 48+ messages in thread
From: Stefan Weil @ 2011-09-18  5:49 UTC (permalink / raw)
  To: Andi Kleen; +Cc: QEMU Developers

Am 18.09.2011 06:03, schrieb Andi Kleen:
> Stefan Weil <weil@mail.berlios.de> writes:
>> +
>> + switch (opc) {
>> + case INDEX_op_end:
>> + case INDEX_op_nop:
>> + break;
>
> You could probably get some more speed out of this by using a threaded
> interpreter with gcc's computed goto extension. That's typically
> significantly faster than a plain switch in a loop.
>
> static void *ops[] = {
> &&op1,
> &&op2,
> ...
> };
>
> #define NEXT() goto *ops[*tb_ptr++];
>
> op1:
> ...
> NEXT();
>
> -Andi

Is there really any difference in the generated code?
gcc already uses a jump table internally to handle the
switch cases.

- Stefan

* Re: [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode
  2011-09-18  5:49     ` Stefan Weil
@ 2011-09-18  7:22       ` Paolo Bonzini
  2011-09-18 17:54         ` Avi Kivity
  0 siblings, 1 reply; 48+ messages in thread
From: Paolo Bonzini @ 2011-09-18  7:22 UTC (permalink / raw)
  To: Stefan Weil; +Cc: Andi Kleen, QEMU Developers

On 09/18/2011 07:49 AM, Stefan Weil wrote:
> Is there really any difference in the generated code?
> gcc already uses a jump table internally to handle the
> switch cases.

You typically save something on range checks, and it enables a lot more 
tricks for use later (e.g. using multiple jump tables to perform simple 
peephole optimizations, or to divert code execution on interrupts).

Paolo
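The two tricks Paolo mentions can be sketched on a toy interpreter. All opcodes and names below are hypothetical.

- **Range checks.** `normal_ops[]` has an entry for every possible opcode byte, filled with GCC's range designator `[0 ... 255]`, so dispatch needs no bounds check. A switch compiled to a jump table typically compares the opcode against the table size first.
- **Diverting execution.** The second table, `irq_ops[]`, sends every opcode to an interrupt handler. Swapping the table pointer diverts execution at the next dispatch, without adding a pending-interrupt test to each handler.

`OP_RAISE` stands in for an asynchronous event. In a real emulator that event would come from a signal handler or another thread, which would also need proper synchronization.

```c
#include <stdint.h>

/* Hypothetical toy opcodes; OP_RAISE simulates an asynchronous event. */
enum { OP_INC = 0, OP_RAISE = 1, OP_HALT = 2 };

/* Returns the accumulator (or -1 for an invalid opcode) and stores the
   number of serviced interrupts in *irqs. */
static int run(const uint8_t *pc, int *irqs)
{
    /* One entry per possible byte value: no range check before dispatch. */
    static void *const normal_ops[256] = {
        [0 ... 255] = &&do_bad,
        [OP_INC]    = &&do_inc,
        [OP_RAISE]  = &&do_raise,
        [OP_HALT]   = &&do_halt,
    };
    /* Alternate table: every opcode diverts to the interrupt handler. */
    static void *const irq_ops[256] = {
        [0 ... 255] = &&do_irq,
    };
    void *const *table = normal_ops;
    int acc = 0;

    *irqs = 0;
#define NEXT() goto *table[*pc++]
    NEXT();
do_inc:
    acc++;
    NEXT();
do_raise:
    table = irq_ops;            /* divert the next dispatch */
    NEXT();
do_irq:
    /* Service the interrupt, restore normal dispatch and re-execute the
       opcode that was fetched when the diversion happened. */
    (*irqs)++;
    table = normal_ops;
    pc--;
    NEXT();
do_bad:
    return -1;
do_halt:
#undef NEXT
    return acc;
}
```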

* Re: [Qemu-devel] [PATCH 7/8] tcg: Add tcg interpreter to configure / make
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 7/8] tcg: Add tcg interpreter to configure / make Stefan Weil
@ 2011-09-18  9:37   ` Blue Swirl
  2011-09-18 10:14     ` Stefan Weil
  0 siblings, 1 reply; 48+ messages in thread
From: Blue Swirl @ 2011-09-18  9:37 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

On Sat, Sep 17, 2011 at 8:00 PM, Stefan Weil <weil@mail.berlios.de> wrote:
> Signed-off-by: Stefan Weil <weil@mail.berlios.de>
> ---
>  Makefile.target |    1 +
>  configure       |   30 ++++++++++++++++++++++++++++--
>  2 files changed, 29 insertions(+), 2 deletions(-)
>
> diff --git a/Makefile.target b/Makefile.target
> index 88d2f1f..a2c3a4a 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -69,6 +69,7 @@ all: $(PROGS) stap
>  # cpu emulator library
>  libobj-y = exec.o translate-all.o cpu-exec.o translate.o
>  libobj-y += tcg/tcg.o tcg/optimize.o
> +libobj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o
>  libobj-y += fpu/softfloat.o
>  libobj-y += op_helper.o helper.o
>  ifeq ($(TARGET_BASE_ARCH), i386)
> diff --git a/configure b/configure
> index ad924c4..1d800e1 100755
> --- a/configure
> +++ b/configure
> @@ -138,6 +138,7 @@ debug_tcg="no"
>  debug_mon="no"
>  debug="no"
>  strip_opt="yes"
> +tcg_interpreter="no"
>  bigendian="no"
>  mingw32="no"
>  EXESUF=""
> @@ -647,6 +648,10 @@ for opt do
>   ;;
>   --enable-kvm) kvm="yes"
>   ;;
> +  --disable-tcg-interpreter) tcg_interpreter="no"
> +  ;;
> +  --enable-tcg-interpreter) tcg_interpreter="yes"
> +  ;;
>   --disable-spice) spice="no"
>   ;;
>   --enable-spice) spice="yes"
> @@ -997,6 +1002,7 @@ echo "  --enable-bluez           enable bluez stack connectivity"
>  echo "  --disable-slirp          disable SLIRP userspace network connectivity"
>  echo "  --disable-kvm            disable KVM acceleration support"
>  echo "  --enable-kvm             enable KVM acceleration support"
> +echo "  --enable-tcg-interpreter enable TCG with bytecode interpreter (TCI)"
>  echo "  --disable-nptl           disable usermode NPTL support"
>  echo "  --enable-nptl            enable usermode NPTL support"
>  echo "  --enable-system          enable all system emulation targets"
> @@ -2714,6 +2720,7 @@ echo "Linux AIO support $linux_aio"
>  echo "ATTR/XATTR support $attr"
>  echo "Install blobs     $blobs"
>  echo "KVM support       $kvm"
> +echo "TCG interpreter   $tcg_interpreter"
>  echo "fdt support       $fdt"
>  echo "preadv support    $preadv"
>  echo "fdatasync         $fdatasync"
> @@ -2761,6 +2768,15 @@ case "$cpu" in
>   armv4b|armv4l)
>     ARCH=arm
>   ;;
> +  *)
> +    if test "$tcg_interpreter" = "yes" ; then
> +        echo "Unsupported CPU = $cpu, will use TCG with TCI (experimental)"
> +        ARCH=unknown

ARCH=TCI or 'all' would be more accurate.

> +    else
> +        echo "Unsupported CPU = $cpu, try --enable-tcg-interpreter"
> +        exit 1
> +    fi
> +  ;;
>  esac
>  echo "ARCH=$ARCH" >> $config_host_mak
>  if test "$debug_tcg" = "yes" ; then
> @@ -2994,6 +3010,9 @@ fi
>  if test "$signalfd" = "yes" ; then
>   echo "CONFIG_SIGNALFD=y" >> $config_host_mak
>  fi
> +if test "$tcg_interpreter" = "yes" ; then
> +  echo "CONFIG_TCG_INTERPRETER=y" >> $config_host_mak
> +fi
>  if test "$need_offsetof" = "yes" ; then
>   echo "CONFIG_NEED_OFFSETOF=y" >> $config_host_mak
>  fi
> @@ -3454,7 +3473,9 @@ cflags=""
>  includes=""
>  ldflags=""
>
> -if test "$ARCH" = "sparc64" ; then
> +if test "$tcg_interpreter" = "yes"; then

Here the test should be against ARCH for consistency.

> +  includes="-I\$(SRC_PATH)/tcg/bytecode $includes"
> +elif test "$ARCH" = "sparc64" ; then
>   includes="-I\$(SRC_PATH)/tcg/sparc $includes"
>  elif test "$ARCH" = "s390x" ; then
>   includes="-I\$(SRC_PATH)/tcg/s390 $includes"
> @@ -3577,7 +3598,12 @@ if test "$gprof" = "yes" ; then
>   fi
>  fi
>
> -linker_script="-Wl,-T../config-host.ld -Wl,-T,\$(SRC_PATH)/\$(ARCH).ld"
> +if test "$ARCH" = "unknown"; then
> +  linker_script=""
> +else
> +  linker_script="-Wl,-T../config-host.ld -Wl,-T,\$(SRC_PATH)/\$(ARCH).ld"
> +fi
> +
>  if test "$target_linux_user" = "yes" -o "$target_bsd_user" = "yes" ; then
>   case "$ARCH" in
>   sparc)
> --
> 1.7.2.5
>
>
>

* Re: [Qemu-devel] [PATCH 6/8] tcg: Add bytecode generator for tcg interpreter
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 6/8] tcg: Add bytecode generator for tcg interpreter Stefan Weil
@ 2011-09-18 10:03   ` Blue Swirl
  2011-09-19 22:28     ` Stuart Brady
  2011-10-01 16:54   ` Andreas Färber
  1 sibling, 1 reply; 48+ messages in thread
From: Blue Swirl @ 2011-09-18 10:03 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

On Sat, Sep 17, 2011 at 8:00 PM, Stefan Weil <weil@mail.berlios.de> wrote:
> Unlike other tcg target code generators, this one does not generate
> machine code for some cpu. It generates machine independent bytecode
> which is interpreted later.
>
> This allows running QEMU on any host.
>
> Interpreted bytecode is slower than direct execution of generated
> machine code.
>
> Signed-off-by: Stefan Weil <weil@mail.berlios.de>
> ---
>  dis-asm.h                 |    1 +
>  disas.c                   |    4 +-
>  dyngen-exec.h             |   13 +-
>  exec-all.h                |   13 +-
>  tcg/bytecode/README       |  129 ++++++
>  tcg/bytecode/tcg-target.c |  955 +++++++++++++++++++++++++++++++++++++++++++++
>  tcg/bytecode/tcg-target.h |  152 +++++++
>  7 files changed, 1263 insertions(+), 4 deletions(-)
>  create mode 100644 tcg/bytecode/README
>  create mode 100644 tcg/bytecode/tcg-target.c
>  create mode 100644 tcg/bytecode/tcg-target.h

It would be nice to use either 'bytecode' or TCI (or TCG interpreter)
consistently, also for the file names.

>
> diff --git a/dis-asm.h b/dis-asm.h
> index 5b07d7f..876975f 100644
> --- a/dis-asm.h
> +++ b/dis-asm.h
> @@ -365,6 +365,7 @@ typedef struct disassemble_info {
>    target address.  Return number of bytes processed.  */
>  typedef int (*disassembler_ftype) (bfd_vma, disassemble_info *);
>
> +int print_insn_bytecode(bfd_vma, disassemble_info*);
>  int print_insn_big_mips         (bfd_vma, disassemble_info*);
>  int print_insn_little_mips      (bfd_vma, disassemble_info*);
>  int print_insn_i386             (bfd_vma, disassemble_info*);
> diff --git a/disas.c b/disas.c
> index 611b30b..e2061d8 100644
> --- a/disas.c
> +++ b/disas.c
> @@ -273,7 +273,9 @@ void disas(FILE *out, void *code, unsigned long size)
>  #else
>     disasm_info.endian = BFD_ENDIAN_LITTLE;
>  #endif
> -#if defined(__i386__)
> +#if defined(CONFIG_TCG_INTERPRETER)
> +    print_insn = print_insn_bytecode;
> +#elif defined(__i386__)
>     disasm_info.mach = bfd_mach_i386_i386;
>     print_insn = print_insn_i386;
>  #elif defined(__x86_64__)
> diff --git a/dyngen-exec.h b/dyngen-exec.h
> index 8beb7f3..64f76c4 100644
> --- a/dyngen-exec.h
> +++ b/dyngen-exec.h
> @@ -19,7 +19,9 @@
>  #if !defined(__DYNGEN_EXEC_H__)
>  #define __DYNGEN_EXEC_H__
>
> -#if defined(__i386__)
> +#if defined(CONFIG_TCG_INTERPRETER)
> +/* The TCG interpreter does not use special registers. */
> +#elif defined(__i386__)
>  #define AREG0 "ebp"
>  #elif defined(__x86_64__)
>  #define AREG0 "r14"
> @@ -55,11 +57,18 @@
>  #error unsupported CPU
>  #endif
>
> +#if defined(AREG0)
>  register CPUState *env asm(AREG0);
> +#else
> +extern CPUState *env;

Maybe cpu_single_env could be used instead.

> +#endif
>
>  /* The return address may point to the start of the next instruction.
>    Subtracting one gets us the call instruction itself.  */
> -#if defined(__s390__) && !defined(__s390x__)
> +#if defined(CONFIG_TCG_INTERPRETER)
> +extern uint8_t *tci_tb_ptr;

Why is this here, could it be somewhere in tcg/*.h?
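This sketch shows why GETPC() needs special handling under TCI. All names apart from tci_tb_ptr are hypothetical.

With native TCG, a helper's return address points into the generated host code, so `__builtin_return_address(0)` identifies the position in the translated block. Under TCI every helper is called from the same C interpreter loop, so the return address would be identical for all call sites. Publishing the current bytecode position in a global before the call, as the patch does with tci_tb_ptr, gives the helper a unique position instead.

Where exactly the patch sets tci_tb_ptr is not shown in this excerpt. This toy sets it to the start of the calling instruction and uses const pointers for simplicity.

```c
#include <stdint.h>

/* Bytecode position published by the interpreter before a helper call. */
static const uint8_t *tci_tb_ptr;

#define GETPC() ((const void *)tci_tb_ptr)

/* Hypothetical helper that needs to know which instruction called it. */
static const void *helper_where_am_i(void)
{
    /* __builtin_return_address(0) would point into interpret() and be the
       same for every call; GETPC() names the calling bytecode instead. */
    return GETPC();
}

enum { OP_EXIT = 0, OP_CALL = 1 };

/* Toy loop: each OP_CALL calls the helper and records what it returned.
   Returns the number of calls made. */
static int interpret(const uint8_t *tb_ptr, const void **results)
{
    int n = 0;

    for (;;) {
        const uint8_t *insn = tb_ptr;
        switch (*tb_ptr++) {
        case OP_CALL:
            tci_tb_ptr = insn;          /* publish position for GETPC() */
            results[n++] = helper_where_am_i();
            break;
        default:                        /* OP_EXIT */
            return n;
        }
    }
}
```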

> +# define GETPC() ((void *)tci_tb_ptr)
> +#elif defined(__s390__) && !defined(__s390x__)
>  # define GETPC() ((void*)(((unsigned long)__builtin_return_address(0) & 0x7fffffffUL) - 1))
>  #elif defined(__arm__)
>  /* Thumb return addresses have the low bit set, so we need to subtract two.
> diff --git a/exec-all.h b/exec-all.h
> index 9b8d62c..0116acd 100644
> --- a/exec-all.h
> +++ b/exec-all.h
> @@ -122,6 +122,8 @@ void tlb_set_page(CPUState *env, target_ulong vaddr,
>
>  #if defined(_ARCH_PPC) || defined(__x86_64__) || defined(__arm__) || defined(__i386__)
>  #define USE_DIRECT_JUMP
> +#elif defined(CONFIG_TCG_INTERPRETER)
> +#define USE_DIRECT_JUMP
>  #endif
>
>  struct TranslationBlock {
> @@ -189,7 +191,14 @@ extern TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE];
>
>  #if defined(USE_DIRECT_JUMP)
>
> -#if defined(_ARCH_PPC)
> +#if defined(CONFIG_TCG_INTERPRETER)
> +static inline void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
> +{
> +    /* patch the branch destination */
> +    *(uint32_t *)jmp_addr = addr - (jmp_addr + 4);
> +    /* no need to flush icache explicitly */
> +}
> +#elif defined(_ARCH_PPC)
>  void ppc_tb_set_jmp_target(unsigned long jmp_addr, unsigned long addr);
>  #define tb_set_jmp_target1 ppc_tb_set_jmp_target
>  #elif defined(__i386__) || defined(__x86_64__)
> @@ -223,6 +232,8 @@ static inline void tb_set_jmp_target1(unsigned long jmp_addr, unsigned long addr
>     __asm __volatile__ ("swi 0x9f0002" : : "r" (_beg), "r" (_end), "r" (_flg));
>  #endif
>  }
> +#else
> +#error tb_set_jmp_target1 is missing
>  #endif
>
>  static inline void tb_set_jmp_target(TranslationBlock *tb,
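The TCI version of tb_set_jmp_target1 stores a 32-bit displacement measured from the end of the 4-byte jump field. The interpreter therefore has to add it to the field's end address when it executes the jump.

Here is a minimal round-trip sketch using hypothetical `patch_jump()` and `follow_jump()` helpers. It uses memcpy instead of the patch's `*(uint32_t *)` store, so it also works on hosts that trap on unaligned access. The README's todo list raises that alignment concern for the bytecode.

```c
#include <stdint.h>
#include <string.h>

/* Patch side: same arithmetic as the TCI tb_set_jmp_target1(), i.e. the
   displacement from the end of the 4-byte field to the destination. */
static void patch_jump(uint8_t *field, const uint8_t *dest)
{
    int32_t disp = (int32_t)(dest - (field + 4));
    memcpy(field, &disp, sizeof(disp));
}

/* Interpreter side (hypothetical): decode the field and return the
   address where execution continues. */
static const uint8_t *follow_jump(const uint8_t *field)
{
    int32_t disp;
    memcpy(&disp, field, sizeof(disp));
    return field + 4 + disp;
}
```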
> diff --git a/tcg/bytecode/README b/tcg/bytecode/README
> new file mode 100644
> index 0000000..6fe9755
> --- /dev/null
> +++ b/tcg/bytecode/README
> @@ -0,0 +1,129 @@
> +TCG Interpreter (TCI) - Copyright (c) 2011 Stefan Weil.
> +
> +This file is released under GPL 2 or later.
> +
> +1) Introduction
> +
> +TCG (Tiny Code Generator) is a code generator which translates
> +code fragments ("basic blocks") from target code (any of the
> +targets supported by QEMU) to a code representation which
> +can be run on a host.
> +
> +QEMU can create native code for some hosts (arm, hppa, i386, ia64, ppc, ppc64,
> +s390, sparc, x86_64). For others, unofficial host support was written.
> +
> +By adding a code generator for a virtual machine and using an
> +interpreter for the generated bytecode, it is possible to
> +support (almost) any host.

This sounds like there are some limitations. I'm curious what they are.

> +
> +This is what TCI (Tiny Code Interpreter) does.
> +
> +2) Implementation
> +
> +Like each TCG host frontend, TCI implements the code generator in
> +tcg-target.c, tcg-target.h. Both files are in directory tcg/bytecode.
> +
> +The additional file tcg/tci.c adds the interpreter.
> +
> +The bytecode consists of opcodes (same numeric values as those used by
> +TCG), command length and arguments of variable size and number.
> +
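Here is a sketch of a layout matching this description, under stated assumptions: one opcode byte, one byte holding the total instruction length, then the arguments. Real TCI argument sizes vary by opcode and host. Here every argument is a 32-bit value, and `emit()` and `count_insns()` are hypothetical names. Storing the length lets a tool such as a disassembler step over instructions without decoding each opcode.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append one instruction: opcode, total length, then nargs 32-bit
   arguments. Returns the number of bytes written (nargs must be <= 63
   so that the length fits in one byte). */
static size_t emit(uint8_t *out, uint8_t opc, const uint32_t *args,
                   size_t nargs)
{
    size_t len = 2 + 4 * nargs;

    out[0] = opc;
    out[1] = (uint8_t)len;
    for (size_t i = 0; i < nargs; i++) {
        memcpy(out + 2 + 4 * i, &args[i], 4);
    }
    return len;
}

/* Count instructions by hopping over each one with its length byte,
   without interpreting the opcodes. Assumes well-formed bytecode
   (every length byte is at least 2). */
static size_t count_insns(const uint8_t *code, size_t size)
{
    size_t n = 0;

    for (size_t pos = 0; pos + 1 < size; pos += code[pos + 1]) {
        n++;
    }
    return n;
}
```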
> +3) Usage
> +
> +For hosts without native TCG, the interpreter TCI must be enabled by
> +
> +        configure --enable-tcg-interpreter
> +
> +If configure is called without --enable-tcg-interpreter, it will
> +suggest using this option. Setting it automatically would need
> +additional code in configure which must be fixed when new native TCG
> +implementations are added.
> +
> +System emulation should work on any 32 or 64 bit host.
> +User mode emulation might work. Maybe a new loader (*.ld)
> +is needed. Byte order might be wrong (on big endian hosts)
> +and need fixes in configure.
> +
> +For hosts with native TCG, the interpreter TCI can be enabled by
> +
> +        configure --enable-tcg-interpreter
> +
> +The only difference from running qemu with TCI to running without TCI

QEMU

> +should be speed. Especially during development of TCI, it was very
> +useful to compare runs with and without TCI. Create /tmp/qemu.log by
> +
> +        qemu -d in_asm,op_opt,cpu -singlestep
> +
> +once with interpreter and once without interpreter and compare the resulting
> +qemu.log files. This is also useful to see the effects of additional
> +registers or additional opcodes (it is easy to modify the virtual machine).
> +It can also be used to verify native TCGs.
> +
> +Hosts with native TCG can also enable TCI by claiming to be unsupported:
> +
> +        configure --cpu=unknown --enable-tcg-interpreter
> +
> +configure then no longer uses the native loader (*.ld) for user mode emulation.

s/loader/linker script/

> +
> +
> +4) Status
> +
> +TCI needs special implementation for 32 and 64 bit host, 32 and 64 bit target,
> +host and target with same or different endianness.
> +
> +            | host (le)                     host (be)
> +            | 32             64             32             64
> +------------+------------------------------------------------------------
> +target (le) | s0, u0         s1, u1         s?, u?         s?, u?
> +32 bit      |
> +            |
> +target (le) | sc, uc         s1, u1         s?, u?         s?, u?
> +64 bit      |
> +            |
> +target (be) | sc, u0         sc, uc         s?, u?         s?, u?
> +32 bit      |
> +            |
> +target (be) | sc, uc         sc, uc         s?, u?         s?, u?
> +64 bit      |
> +            |
> +
> +System emulation
> +s? = untested
> +sc = compiles
> +s0 = bios works
> +s1 = grub works
> +s2 = linux boots

Linux

> +
> +Linux user mode emulation
> +u? = untested
> +uc = compiles
> +u0 = static hello works
> +u1 = linux-user-test works
> +
> +5) Todo list
> +
> +* TCI is not widely tested. It was written and tested on a x86_64 host
> +  running i386 and x86_64 system emulation and linux user mode.
> +  A cross compiled qemu for i386 host also works with the same basic tests.
> +  A cross compiled qemu for mipsel host works, too. It is terribly slow
> +  because I run it in a mips malta emulation, so it is an interpreted
> +  emulation in an emulation.
> +  A cross compiled qemu for arm host works (tested with pc bios).
> +  A cross compiled qemu for ppc host works at least partially:
> +  i386-linux-user/qemu-i386 can run a simple hello-world program
> +  (tested in a ppc emulation).
> +
> +* Some TCG opcodes are either missing in the code generator and/or
> +  in the interpreter. These opcodes raise a runtime exception, so it is
> +  possible to see where code must be added.
> +
> +* The pseudo code is not optimized and still ugly. For hosts with special
> +  alignment requirements, it needs some fixes (maybe aligned bytecode
> +  would also improve speed for hosts which support byte alignment).
> +
> +* A better disassembler for the pseudo code would be nice (a very primitive
> +  disassembler is included in tcg-target.c).
> +
> +* It might be useful to have a runtime option which selects the native TCG
> +  or TCI, so qemu would have to include two TCGs. Today, selecting TCI
> +  is a configure option, so you need two compilations of qemu.
> diff --git a/tcg/bytecode/tcg-target.c b/tcg/bytecode/tcg-target.c
> new file mode 100644
> index 0000000..f505ff0
> --- /dev/null
> +++ b/tcg/bytecode/tcg-target.c
> @@ -0,0 +1,955 @@
> +/*
> + * Tiny Code Generator for QEMU
> + *
> + * Copyright (c) 2009, 2011 Stefan Weil
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +/* TODO list:
> + * - See TODO comments in code.
> + */
> +
> +/* Marker for missing code. */
> +#define TODO() \
> +    do { \
> +        fprintf(stderr, "TODO %s:%u: %s()\n", \
> +                __FILE__, __LINE__, __func__); \
> +        tcg_abort(); \
> +    } while (0)
> +
> +/* Trace message to see program flow. */
> +#if defined(CONFIG_DEBUG_TCG_INTERPRETER)
> +#define TRACE() \
> +    loglevel \
> +    ? fprintf(stderr, "TCG %s:%u: %s()\n", __FILE__, __LINE__, __func__) \
> +    : (void)0
> +#else
> +#define TRACE() ((void)0)
> +#endif

Perhaps tracepoints could be used instead.

> +
> +/* Single bit n. */
> +#define BIT(n) (1 << (n))
> +
> +/* Bitfield n...m (in 32 bit value). */
> +#define BITS(n, m) (((0xffffffffU << (31 - n)) >> (31 - n + m)) << m)
> +
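The BITS macro above uses its parameters without parentheses, and BIT shifts a signed int. Below is a sketch of hygienic variants with hypothetical names.

BITS_ORIG reproduces the patch's definition for comparison. With an expression argument such as `i + 1`, it expands to `31 - i + 1` instead of `31 - (i + 1)`. BIT_SAFE uses an unsigned constant because `1 << 31` overflows a 32-bit int, which is undefined behavior in C.

```c
#include <stdint.h>

/* Single bit n, computed in unsigned arithmetic so n == 31 is defined. */
#define BIT_SAFE(n) (1U << (n))

/* Bitfield n...m (31 >= n >= m >= 0) in a 32-bit value: same formula as
   the patch, with every macro parameter parenthesized. */
#define BITS_SAFE(n, m) \
    (((0xffffffffU << (31 - (n))) >> (31 - (n) + (m))) << (m))

/* The patch's definition, kept only to show the expansion problem. */
#define BITS_ORIG(n, m) (((0xffffffffU << (31 - n)) >> (31 - n + m)) << m)
```

For example, BITS_SAFE(7, 4) is 0xf0. With `i = 6`, BITS_SAFE(i + 1, 0) gives 0xff, while BITS_ORIG(i + 1, 0) gives 0x3f.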
> +/* Used for function call generation. */
> +#define TCG_REG_CALL_STACK              TCG_REG_R4
> +#define TCG_TARGET_STACK_ALIGN          16
> +#define TCG_TARGET_CALL_STACK_OFFSET    0
> +
> +/* TODO: documentation. */
> +static uint8_t *tb_ret_addr;
> +
> +/* Macros used in tcg_target_op_defs. */
> +#define R       "r"
> +#define RI      "ri"
> +#if TCG_TARGET_REG_BITS == 32
> +# define R64    "r", "r"
> +#else
> +# define R64    "r"
> +#endif
> +#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
> +# define L      "L", "L"
> +# define S      "S", "S"
> +#else
> +# define L      "L"
> +# define S      "S"
> +#endif
> +
> +/* TODO: documentation. */
> +static const TCGTargetOpDef tcg_target_op_defs[] = {
> +    { INDEX_op_exit_tb, { } },
> +    { INDEX_op_goto_tb, { } },
> +    { INDEX_op_call, { RI } },
> +    { INDEX_op_jmp, { RI } },
> +    { INDEX_op_br, { } },
> +
> +    { INDEX_op_mov_i32, { R, R } },
> +    { INDEX_op_movi_i32, { R } },
> +
> +    { INDEX_op_ld8u_i32, { R, R } },
> +    { INDEX_op_ld8s_i32, { R, R } },
> +    { INDEX_op_ld16u_i32, { R, R } },
> +    { INDEX_op_ld16s_i32, { R, R } },
> +    { INDEX_op_ld_i32, { R, R } },
> +    { INDEX_op_st8_i32, { R, R } },
> +    { INDEX_op_st16_i32, { R, R } },
> +    { INDEX_op_st_i32, { R, R } },
> +
> +    { INDEX_op_add_i32, { R, RI, RI } },
> +    { INDEX_op_sub_i32, { R, RI, RI } },
> +    { INDEX_op_mul_i32, { R, RI, RI } },
> +#if TCG_TARGET_HAS_div_i32

I was wondering if this #ifdeffery is needed, since TCI would probably
perform better than the alternative, the emulation sequences generated
by TCG. But it could be useful for testing those. Maybe there should be
two options to enable and disable all non-mandatory TCI versions.

> +    { INDEX_op_div_i32, { R, R, R } },
> +    { INDEX_op_divu_i32, { R, R, R } },
> +    { INDEX_op_rem_i32, { R, R, R } },
> +    { INDEX_op_remu_i32, { R, R, R } },
> +#elif TCG_TARGET_HAS_div2_i32
> +    { INDEX_op_div2_i32, { R, R, "0", "1", R } },
> +    { INDEX_op_divu2_i32, { R, R, "0", "1", R } },
> +#endif
> +    /* TODO: Does R, RI, RI result in faster code than R, R, RI?
> +       If both operands are constants, we can optimize. */
> +    { INDEX_op_and_i32, { R, RI, RI } },
> +#if TCG_TARGET_HAS_andc_i32
> +    { INDEX_op_andc_i32, { R, RI, RI } },
> +#endif
> +#if TCG_TARGET_HAS_eqv_i32
> +    { INDEX_op_eqv_i32, { R, RI, RI } },
> +#endif
> +#if TCG_TARGET_HAS_nand_i32
> +    { INDEX_op_nand_i32, { R, RI, RI } },
> +#endif
> +#if TCG_TARGET_HAS_nor_i32
> +    { INDEX_op_nor_i32, { R, RI, RI } },
> +#endif
> +    { INDEX_op_or_i32, { R, RI, RI } },
> +#if TCG_TARGET_HAS_orc_i32
> +    { INDEX_op_orc_i32, { R, RI, RI } },
> +#endif
> +    { INDEX_op_xor_i32, { R, RI, RI } },
> +    { INDEX_op_shl_i32, { R, RI, RI } },
> +    { INDEX_op_shr_i32, { R, RI, RI } },
> +    { INDEX_op_sar_i32, { R, RI, RI } },
> +#if TCG_TARGET_HAS_rot_i32
> +    { INDEX_op_rotl_i32, { R, RI, RI } },
> +    { INDEX_op_rotr_i32, { R, RI, RI } },
> +#endif
> +
> +    { INDEX_op_brcond_i32, { R, RI } },
> +
> +    { INDEX_op_setcond_i32, { R, R, RI } },
> +#if TCG_TARGET_REG_BITS == 64
> +    { INDEX_op_setcond_i64, { R, R, RI } },
> +#endif /* TCG_TARGET_REG_BITS == 64 */
> +
> +#if TCG_TARGET_REG_BITS == 32
> +    /* TODO: Support R, R, R, R, RI, RI? Will it be faster? */
> +    { INDEX_op_add2_i32, { R, R, R, R, R, R } },
> +    { INDEX_op_sub2_i32, { R, R, R, R, R, R } },
> +    { INDEX_op_brcond2_i32, { R, R, RI, RI } },
> +    { INDEX_op_mulu2_i32, { R, R, R, R } },
> +    { INDEX_op_setcond2_i32, { R, R, R, RI, RI } },
> +#endif
> +
> +#if TCG_TARGET_HAS_not_i32
> +    { INDEX_op_not_i32, { R, R } },
> +#endif
> +#if TCG_TARGET_HAS_neg_i32
> +    { INDEX_op_neg_i32, { R, R } },
> +#endif
> +
> +#if TCG_TARGET_REG_BITS == 64
> +    { INDEX_op_mov_i64, { R, R } },
> +    { INDEX_op_movi_i64, { R } },
> +
> +    { INDEX_op_ld8u_i64, { R, R } },
> +    { INDEX_op_ld8s_i64, { R, R } },
> +    { INDEX_op_ld16u_i64, { R, R } },
> +    { INDEX_op_ld16s_i64, { R, R } },
> +    { INDEX_op_ld32u_i64, { R, R } },
> +    { INDEX_op_ld32s_i64, { R, R } },
> +    { INDEX_op_ld_i64, { R, R } },
> +
> +    { INDEX_op_st8_i64, { R, R } },
> +    { INDEX_op_st16_i64, { R, R } },
> +    { INDEX_op_st32_i64, { R, R } },
> +    { INDEX_op_st_i64, { R, R } },
> +
> +    { INDEX_op_add_i64, { R, RI, RI } },
> +    { INDEX_op_sub_i64, { R, RI, RI } },
> +    { INDEX_op_mul_i64, { R, RI, RI } },
> +#if TCG_TARGET_HAS_div_i64
> +    { INDEX_op_div_i64, { R, R, R } },
> +    { INDEX_op_divu_i64, { R, R, R } },
> +    { INDEX_op_rem_i64, { R, R, R } },
> +    { INDEX_op_remu_i64, { R, R, R } },
> +#elif defined(TCG_TARGET_HAS_div2_i64)
> +    { INDEX_op_div2_i64, { R, R, "0", "1", R } },
> +    { INDEX_op_divu2_i64, { R, R, "0", "1", R } },
> +#endif
> +    { INDEX_op_and_i64, { R, RI, RI } },
> +#if TCG_TARGET_HAS_andc_i64
> +    { INDEX_op_andc_i64, { R, RI, RI } },
> +#endif
> +#if TCG_TARGET_HAS_eqv_i64
> +    { INDEX_op_eqv_i64, { R, RI, RI } },
> +#endif
> +#if TCG_TARGET_HAS_nand_i64
> +    { INDEX_op_nand_i64, { R, RI, RI } },
> +#endif
> +#if TCG_TARGET_HAS_nor_i64
> +    { INDEX_op_nor_i64, { R, RI, RI } },
> +#endif
> +    { INDEX_op_or_i64, { R, RI, RI } },
> +#if TCG_TARGET_HAS_orc_i64
> +    { INDEX_op_orc_i64, { R, RI, RI } },
> +#endif
> +    { INDEX_op_xor_i64, { R, RI, RI } },
> +    { INDEX_op_shl_i64, { R, RI, RI } },
> +    { INDEX_op_shr_i64, { R, RI, RI } },
> +    { INDEX_op_sar_i64, { R, RI, RI } },
> +#if TCG_TARGET_HAS_rot_i64
> +    { INDEX_op_rotl_i64, { R, RI, RI } },
> +    { INDEX_op_rotr_i64, { R, RI, RI } },
> +#endif
> +    { INDEX_op_brcond_i64, { R, RI } },
> +
> +#if TCG_TARGET_HAS_ext8s_i64
> +    { INDEX_op_ext8s_i64, { R, R } },
> +#endif
> +#if TCG_TARGET_HAS_ext16s_i64
> +    { INDEX_op_ext16s_i64, { R, R } },
> +#endif
> +#if TCG_TARGET_HAS_ext32s_i64
> +    { INDEX_op_ext32s_i64, { R, R } },
> +#endif
> +#if TCG_TARGET_HAS_ext8u_i64
> +    { INDEX_op_ext8u_i64, { R, R } },
> +#endif
> +#if TCG_TARGET_HAS_ext16u_i64
> +    { INDEX_op_ext16u_i64, { R, R } },
> +#endif
> +#if TCG_TARGET_HAS_ext32u_i64
> +    { INDEX_op_ext32u_i64, { R, R } },
> +#endif
> +#if TCG_TARGET_HAS_bswap16_i64
> +    { INDEX_op_bswap16_i64, { R, R } },
> +#endif
> +#if TCG_TARGET_HAS_bswap32_i64
> +    { INDEX_op_bswap32_i64, { R, R } },
> +#endif
> +#if TCG_TARGET_HAS_bswap64_i64
> +    { INDEX_op_bswap64_i64, { R, R } },
> +#endif
> +#if TCG_TARGET_HAS_not_i64
> +    { INDEX_op_not_i64, { R, R } },
> +#endif
> +#if TCG_TARGET_HAS_neg_i64
> +    { INDEX_op_neg_i64, { R, R } },
> +#endif
> +#endif /* TCG_TARGET_REG_BITS == 64 */
> +
> +    { INDEX_op_qemu_ld8u, { R, L } },
> +    { INDEX_op_qemu_ld8s, { R, L } },
> +    { INDEX_op_qemu_ld16u, { R, L } },
> +    { INDEX_op_qemu_ld16s, { R, L } },
> +    { INDEX_op_qemu_ld32, { R, L } },
> +#if TCG_TARGET_REG_BITS == 64
> +    { INDEX_op_qemu_ld32u, { R, L } },
> +    { INDEX_op_qemu_ld32s, { R, L } },
> +#endif
> +    { INDEX_op_qemu_ld64, { R64, L } },
> +
> +    { INDEX_op_qemu_st8, { R, S } },
> +    { INDEX_op_qemu_st16, { R, S } },
> +    { INDEX_op_qemu_st32, { R, S } },
> +    { INDEX_op_qemu_st64, { R64, S } },
> +
> +#if TCG_TARGET_HAS_ext8s_i32
> +    { INDEX_op_ext8s_i32, { R, R } },
> +#endif
> +#if TCG_TARGET_HAS_ext16s_i32
> +    { INDEX_op_ext16s_i32, { R, R } },
> +#endif
> +#if TCG_TARGET_HAS_ext8u_i32
> +    { INDEX_op_ext8u_i32, { R, R } },
> +#endif
> +#if TCG_TARGET_HAS_ext16u_i32
> +    { INDEX_op_ext16u_i32, { R, R } },
> +#endif
> +
> +#if TCG_TARGET_HAS_bswap16_i32
> +    { INDEX_op_bswap16_i32, { R, R } },
> +#endif
> +#if TCG_TARGET_HAS_bswap32_i32
> +    { INDEX_op_bswap32_i32, { R, R } },
> +#endif
> +
> +    { -1 },
> +};
> +
> +static const int tcg_target_reg_alloc_order[] = {
> +    TCG_REG_R0,
> +    TCG_REG_R1,
> +    TCG_REG_R2,
> +    TCG_REG_R3,
> +#if 0 /* used for TCG_REG_CALL_STACK */
> +    TCG_REG_R4,
> +#endif
> +    TCG_REG_R5,
> +    TCG_REG_R6,
> +    TCG_REG_R7,
> +#if TCG_TARGET_NB_REGS >= 16
> +    TCG_REG_R8,
> +    TCG_REG_R9,
> +    TCG_REG_R10,
> +    TCG_REG_R11,
> +    TCG_REG_R12,
> +    TCG_REG_R13,
> +    TCG_REG_R14,
> +    TCG_REG_R15,
> +#endif
> +};
> +
> +#if MAX_OPC_PARAM_IARGS != 4
> +# error Fix needed, number of supported input arguments changed!
> +#endif
> +
> +static const int tcg_target_call_iarg_regs[] = {
> +    TCG_REG_R0,
> +    TCG_REG_R1,
> +    TCG_REG_R2,
> +    TCG_REG_R3,
> +#if TCG_TARGET_REG_BITS == 32
> +    /* 32 bit hosts need 2 * MAX_OPC_PARAM_IARGS registers. */
> +#if 0 /* used for TCG_REG_CALL_STACK */
> +    TCG_REG_R4,
> +#endif
> +    TCG_REG_R5,
> +    TCG_REG_R6,
> +    TCG_REG_R7,
> +#if TCG_TARGET_NB_REGS >= 16
> +    TCG_REG_R8,
> +#else
> +# error Too few input registers available
> +#endif
> +#endif
> +};
> +
> +static const int tcg_target_call_oarg_regs[] = {
> +    TCG_REG_R0,
> +#if TCG_TARGET_REG_BITS == 32
> +    TCG_REG_R1
> +#endif
> +};
> +
> +#ifndef NDEBUG
> +static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
> +    "r00",
> +    "r01",
> +    "r02",
> +    "r03",
> +    "r04",
> +    "r05",
> +    "r06",
> +    "r07",
> +#if TCG_TARGET_NB_REGS >= 16
> +    "r08",
> +    "r09",
> +    "r10",
> +    "r11",
> +    "r12",
> +    "r13",
> +    "r14",
> +    "r15",
> +#if TCG_TARGET_NB_REGS >= 32
> +    "r16",
> +    "r17",
> +    "r18",
> +    "r19",
> +    "r20",
> +    "r21",
> +    "r22",
> +    "r23",
> +    "r24",
> +    "r25",
> +    "r26",
> +    "r27",
> +    "r28",
> +    "r29",
> +    "r30",
> +    "r31"
> +#endif
> +#endif
> +};
> +#endif
> +
> +static void flush_icache_range(unsigned long start, unsigned long stop)
> +{
> +    TRACE();
> +}
> +
> +static void patch_reloc(uint8_t *code_ptr, int type,
> +                        tcg_target_long value, tcg_target_long addend)
> +{
> +    /* tcg_out_reloc always uses the same type, addend. */
> +    assert(type == sizeof(tcg_target_long));
> +    assert(addend == 0);
> +    assert(value != 0);
> +    *(tcg_target_long *)code_ptr = value;
> +}
> +
> +/* Parse target specific constraints. */
> +static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
> +{
> +    const char *ct_str = *pct_str;
> +    switch (ct_str[0]) {
> +    case 'r':
> +    case 'L':                   /* qemu_ld constraint */
> +    case 'S':                   /* qemu_st constraint */
> +        ct->ct |= TCG_CT_REG;
> +        tcg_regset_set32(ct->u.regs, 0, BIT(TCG_TARGET_NB_REGS) - 1);
> +        break;
> +    default:
> +        return -1;
> +    }
> +    ct_str++;
> +    *pct_str = ct_str;
> +    return 0;
> +}
> +
> +#include "dis-asm.h"
> +
> +/* Disassemble bytecode. */
> +int print_insn_bytecode(bfd_vma addr, disassemble_info *info)
> +{
> +    int length;
> +    uint8_t byte;
> +    int status;
> +    TCGOpcode op;
> +
> +    status = info->read_memory_func(addr, &byte, 1, info);
> +    if (status != 0) {
> +        info->memory_error_func(status, addr, info);
> +        return -1;
> +    }
> +    op = byte;
> +
> +    addr++;
> +    status = info->read_memory_func(addr, &byte, 1, info);
> +    if (status != 0) {
> +        info->memory_error_func(status, addr, info);
> +        return -1;
> +    }
> +    length = byte;
> +
> +    if (op >= ARRAY_SIZE(tcg_op_defs)) {
> +        return length;
> +    }
> +
> +    const TCGOpDef *def = &tcg_op_defs[op];
> +    int nb_oargs = def->nb_oargs;
> +    int nb_iargs = def->nb_iargs;
> +    int nb_cargs = def->nb_cargs;
> +    FILE *f = info->stream;
> +    /* TODO: Improve disassembler output. */
> +    info->fprintf_func(f, "%s\to=%d i=%d c=%d",
> +                       def->name, nb_oargs, nb_iargs, nb_cargs);
> +
> +    return length;
> +}
> +
> +#if defined(CONFIG_DEBUG_TCG_INTERPRETER)
> +/* Show current bytecode. Used by tcg interpreter. */
> +void tci_disas(uint8_t opc)
> +{
> +    const TCGOpDef *def = &tcg_op_defs[opc];
> +    fprintf(stderr, "TCG %s %u, %u, %u\n",
> +            def->name, def->nb_oargs, def->nb_iargs, def->nb_cargs);
> +}
> +#endif
> +
> +/* Write value (native size). */
> +static void tcg_out_i(TCGContext *s, tcg_target_ulong v)
> +{
> +    *(tcg_target_ulong *)s->code_ptr = v;
> +    s->code_ptr += sizeof(tcg_target_ulong);
> +}
> +
> +/* Write 64 bit value. */
> +static void tcg_out64(TCGContext *s, uint64_t v)
> +{
> +    *(uint64_t *)s->code_ptr = v;
> +    s->code_ptr += sizeof(v);
> +}
> +
> +/* Write opcode. */
> +static void tcg_out_op_t(TCGContext *s, TCGOpcode op)
> +{
> +    tcg_out8(s, op);
> +    tcg_out8(s, 0);
> +}
> +
> +/* Write register. */
> +static void tcg_out_r(TCGContext *s, TCGArg t0)
> +{
> +    assert(t0 < TCG_TARGET_NB_REGS);
> +    tcg_out8(s, t0);
> +}
> +
> +/* Write register or constant (native size). */
> +static void tcg_out_ri(TCGContext *s, int const_arg, TCGArg arg)
> +{
> +    if (const_arg) {
> +        assert(const_arg == 1);
> +        tcg_out8(s, TCG_CONST);
> +        tcg_out_i(s, arg);
> +    } else {
> +        tcg_out_r(s, arg);
> +    }
> +}
> +
> +/* Write register or constant (32 bit). */
> +static void tcg_out_ri32(TCGContext *s, int const_arg, TCGArg arg)
> +{
> +    if (const_arg) {
> +        assert(const_arg == 1);
> +        tcg_out8(s, TCG_CONST);
> +        tcg_out32(s, arg);
> +    } else {
> +        tcg_out_r(s, arg);
> +    }
> +}
> +
> +#if TCG_TARGET_REG_BITS == 64
> +/* Write register or constant (64 bit). */
> +static void tcg_out_ri64(TCGContext *s, int const_arg, TCGArg arg)
> +{
> +    if (const_arg) {
> +        assert(const_arg == 1);
> +        tcg_out8(s, TCG_CONST);
> +        tcg_out64(s, arg);
> +    } else {
> +        tcg_out_r(s, arg);
> +    }
> +}
> +#endif
> +
> +/* Write label. */
> +static void tci_out_label(TCGContext *s, TCGArg arg)
> +{
> +    TCGLabel *label = &s->labels[arg];
> +    if (label->has_value) {
> +        tcg_out_i(s, label->u.value);
> +        assert(label->u.value);
> +    } else {
> +        tcg_out_reloc(s, s->code_ptr, sizeof(tcg_target_ulong), arg, 0);
> +        tcg_out_i(s, 0);
> +    }
> +}
> +
> +static void tcg_out_ld(TCGContext *s, TCGType type, int ret, int arg1,
> +                       tcg_target_long arg2)
> +{
> +    uint8_t *old_code_ptr = s->code_ptr;
> +    if (type == TCG_TYPE_I32) {
> +        tcg_out_op_t(s, INDEX_op_ld_i32);
> +        tcg_out_r(s, ret);
> +        tcg_out_r(s, arg1);
> +        tcg_out32(s, arg2);
> +    } else {
> +        assert(type == TCG_TYPE_I64);
> +#if TCG_TARGET_REG_BITS == 64
> +        tcg_out_op_t(s, INDEX_op_ld_i64);
> +        tcg_out_r(s, ret);
> +        tcg_out_r(s, arg1);
> +        assert(arg2 == (uint32_t)arg2);
> +        tcg_out32(s, arg2);
> +#else
> +        TODO();
> +#endif
> +    }
> +    old_code_ptr[1] = s->code_ptr - old_code_ptr;
> +}
> +
> +static void tcg_out_mov(TCGContext *s, TCGType type, int ret, int arg)
> +{
> +    uint8_t *old_code_ptr = s->code_ptr;
> +    assert(ret != arg);
> +#if TCG_TARGET_REG_BITS == 32
> +    tcg_out_op_t(s, INDEX_op_mov_i32);
> +#else
> +    tcg_out_op_t(s, INDEX_op_mov_i64);
> +#endif
> +    tcg_out_r(s, ret);
> +    tcg_out_r(s, arg);
> +    old_code_ptr[1] = s->code_ptr - old_code_ptr;
> +}
> +
> +static void tcg_out_movi(TCGContext *s, TCGType type,
> +                         int t0, tcg_target_long arg)
> +{
> +    uint8_t *old_code_ptr = s->code_ptr;
> +    uint32_t arg32 = arg;
> +    if (type == TCG_TYPE_I32 || arg == arg32) {
> +        tcg_out_op_t(s, INDEX_op_movi_i32);
> +        tcg_out_r(s, t0);
> +        tcg_out32(s, arg32);
> +    } else {
> +        assert(type == TCG_TYPE_I64);
> +#if TCG_TARGET_REG_BITS == 64
> +        tcg_out_op_t(s, INDEX_op_movi_i64);
> +        tcg_out_r(s, t0);
> +        tcg_out64(s, arg);
> +#else
> +        TODO();
> +#endif
> +    }
> +    old_code_ptr[1] = s->code_ptr - old_code_ptr;
> +}
> +
> +static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
> +                       const int *const_args)
> +{
> +    uint8_t *old_code_ptr = s->code_ptr;
> +
> +    tcg_out_op_t(s, opc);
> +
> +    switch (opc) {
> +    case INDEX_op_exit_tb:
> +        tcg_out64(s, args[0]);
> +        break;
> +    case INDEX_op_goto_tb:
> +        if (s->tb_jmp_offset) {
> +            /* Direct jump method. */
> +            assert(args[0] < ARRAY_SIZE(s->tb_jmp_offset));
> +            s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
> +            tcg_out32(s, 0);
> +        } else {
> +            /* Indirect jump method. */
> +            TODO();
> +        }
> +        assert(args[0] < ARRAY_SIZE(s->tb_next_offset));
> +        s->tb_next_offset[args[0]] = s->code_ptr - s->code_buf;
> +        break;
> +    case INDEX_op_br:
> +        tci_out_label(s, args[0]);
> +        break;
> +    case INDEX_op_call:
> +        tcg_out_ri(s, const_args[0], args[0]);
> +        break;
> +    case INDEX_op_jmp:
> +        TODO();
> +        break;
> +    case INDEX_op_setcond_i32:
> +        tcg_out_r(s, args[0]);
> +        tcg_out_r(s, args[1]);
> +        tcg_out_ri32(s, const_args[2], args[2]);
> +        tcg_out8(s, args[3]);   /* condition */
> +        break;
> +#if TCG_TARGET_REG_BITS == 32
> +    case INDEX_op_setcond2_i32:
> +        /* setcond2_i32 t0, t1_low, t1_high, t2_low, t2_high, cond */
> +        tcg_out_r(s, args[0]);
> +        tcg_out_r(s, args[1]);
> +        tcg_out_r(s, args[2]);
> +        tcg_out_ri32(s, const_args[3], args[3]);
> +        tcg_out_ri32(s, const_args[4], args[4]);
> +        tcg_out8(s, args[5]);   /* condition */
> +        break;
> +#elif TCG_TARGET_REG_BITS == 64
> +    case INDEX_op_setcond_i64:
> +        tcg_out_r(s, args[0]);
> +        tcg_out_r(s, args[1]);
> +        tcg_out_ri64(s, const_args[2], args[2]);
> +        tcg_out8(s, args[3]);   /* condition */
> +        break;
> +#endif
> +    case INDEX_op_movi_i32:
> +        TODO(); /* Handled by tcg_out_movi? */
> +        break;
> +    case INDEX_op_ld8u_i32:
> +    case INDEX_op_ld8s_i32:
> +    case INDEX_op_ld16u_i32:
> +    case INDEX_op_ld16s_i32:
> +    case INDEX_op_ld_i32:
> +    case INDEX_op_st8_i32:
> +    case INDEX_op_st16_i32:
> +    case INDEX_op_st_i32:
> +    case INDEX_op_ld8u_i64:
> +    case INDEX_op_ld8s_i64:
> +    case INDEX_op_ld16u_i64:
> +    case INDEX_op_ld16s_i64:
> +    case INDEX_op_ld32u_i64:
> +    case INDEX_op_ld32s_i64:
> +    case INDEX_op_ld_i64:
> +    case INDEX_op_st8_i64:
> +    case INDEX_op_st16_i64:
> +    case INDEX_op_st32_i64:
> +    case INDEX_op_st_i64:
> +        tcg_out_r(s, args[0]);
> +        tcg_out_r(s, args[1]);
> +        assert(args[2] == (uint32_t)args[2]);
> +        tcg_out32(s, args[2]);
> +        break;
> +    case INDEX_op_add_i32:
> +    case INDEX_op_sub_i32:
> +    case INDEX_op_mul_i32:
> +    case INDEX_op_and_i32:
> +    case INDEX_op_andc_i32:     /* Optional (TCG_TARGET_HAS_andc_i32). */
> +    case INDEX_op_eqv_i32:      /* Optional (TCG_TARGET_HAS_eqv_i32). */
> +    case INDEX_op_nand_i32:     /* Optional (TCG_TARGET_HAS_nand_i32). */
> +    case INDEX_op_nor_i32:      /* Optional (TCG_TARGET_HAS_nor_i32). */
> +    case INDEX_op_or_i32:
> +    case INDEX_op_orc_i32:      /* Optional (TCG_TARGET_HAS_orc_i32). */
> +    case INDEX_op_xor_i32:
> +    case INDEX_op_shl_i32:
> +    case INDEX_op_shr_i32:
> +    case INDEX_op_sar_i32:
> +    case INDEX_op_rotl_i32:     /* Optional (TCG_TARGET_HAS_rot_i32). */
> +    case INDEX_op_rotr_i32:     /* Optional (TCG_TARGET_HAS_rot_i32). */
> +        tcg_out_r(s, args[0]);
> +        tcg_out_ri32(s, const_args[1], args[1]);
> +        tcg_out_ri32(s, const_args[2], args[2]);
> +        break;
> +
> +#if TCG_TARGET_REG_BITS == 64
> +    case INDEX_op_mov_i64:
> +    case INDEX_op_movi_i64:
> +        TODO();
> +        break;
> +    case INDEX_op_add_i64:
> +    case INDEX_op_sub_i64:
> +    case INDEX_op_mul_i64:
> +    case INDEX_op_and_i64:
> +    case INDEX_op_andc_i64:     /* Optional (TCG_TARGET_HAS_andc_i64). */
> +    case INDEX_op_eqv_i64:      /* Optional (TCG_TARGET_HAS_eqv_i64). */
> +    case INDEX_op_nand_i64:     /* Optional (TCG_TARGET_HAS_nand_i64). */
> +    case INDEX_op_nor_i64:      /* Optional (TCG_TARGET_HAS_nor_i64). */
> +    case INDEX_op_or_i64:
> +    case INDEX_op_orc_i64:      /* Optional (TCG_TARGET_HAS_orc_i64). */
> +    case INDEX_op_xor_i64:
> +    case INDEX_op_shl_i64:
> +    case INDEX_op_shr_i64:
> +    case INDEX_op_sar_i64:
> +    /* TODO: Implementation of rotl_i64, rotr_i64 missing in tci.c. */
> +    case INDEX_op_rotl_i64:     /* Optional (TCG_TARGET_HAS_rot_i64). */
> +    case INDEX_op_rotr_i64:     /* Optional (TCG_TARGET_HAS_rot_i64). */
> +        tcg_out_r(s, args[0]);
> +        tcg_out_ri64(s, const_args[1], args[1]);
> +        tcg_out_ri64(s, const_args[2], args[2]);
> +        break;
> +    case INDEX_op_div_i64:      /* Optional (TCG_TARGET_HAS_div_i64). */
> +    case INDEX_op_divu_i64:     /* Optional (TCG_TARGET_HAS_div_i64). */
> +    case INDEX_op_rem_i64:      /* Optional (TCG_TARGET_HAS_div_i64). */
> +    case INDEX_op_remu_i64:     /* Optional (TCG_TARGET_HAS_div_i64). */
> +        TODO();
> +        break;
> +    case INDEX_op_div2_i64:     /* Optional (TCG_TARGET_HAS_div2_i64). */
> +    case INDEX_op_divu2_i64:    /* Optional (TCG_TARGET_HAS_div2_i64). */
> +        TODO();
> +        break;
> +    case INDEX_op_brcond_i64:
> +        tcg_out_r(s, args[0]);
> +        tcg_out_ri64(s, const_args[1], args[1]);
> +        tcg_out8(s, args[2]);           /* condition */
> +        tci_out_label(s, args[3]);
> +        break;
> +    case INDEX_op_bswap16_i64:  /* Optional (TCG_TARGET_HAS_bswap16_i64). */
> +    case INDEX_op_bswap32_i64:  /* Optional (TCG_TARGET_HAS_bswap32_i64). */
> +    case INDEX_op_bswap64_i64:  /* Optional (TCG_TARGET_HAS_bswap64_i64). */
> +    case INDEX_op_not_i64:      /* Optional (TCG_TARGET_HAS_not_i64). */
> +    case INDEX_op_neg_i64:      /* Optional (TCG_TARGET_HAS_neg_i64). */
> +    case INDEX_op_ext8s_i64:    /* Optional (TCG_TARGET_HAS_ext8s_i64). */
> +    case INDEX_op_ext8u_i64:    /* Optional (TCG_TARGET_HAS_ext8u_i64). */
> +    case INDEX_op_ext16s_i64:   /* Optional (TCG_TARGET_HAS_ext16s_i64). */
> +    case INDEX_op_ext16u_i64:   /* Optional (TCG_TARGET_HAS_ext16u_i64). */
> +    case INDEX_op_ext32s_i64:   /* Optional (TCG_TARGET_HAS_ext32s_i64). */
> +    case INDEX_op_ext32u_i64:   /* Optional (TCG_TARGET_HAS_ext32u_i64). */
> +#endif /* TCG_TARGET_REG_BITS == 64 */
> +    case INDEX_op_neg_i32:      /* Optional (TCG_TARGET_HAS_neg_i32). */
> +    case INDEX_op_not_i32:      /* Optional (TCG_TARGET_HAS_not_i32). */
> +    case INDEX_op_ext8s_i32:    /* Optional (TCG_TARGET_HAS_ext8s_i32). */
> +    case INDEX_op_ext16s_i32:   /* Optional (TCG_TARGET_HAS_ext16s_i32). */
> +    case INDEX_op_ext8u_i32:    /* Optional (TCG_TARGET_HAS_ext8u_i32). */
> +    case INDEX_op_ext16u_i32:   /* Optional (TCG_TARGET_HAS_ext16u_i32). */
> +    case INDEX_op_bswap16_i32:  /* Optional (TCG_TARGET_HAS_bswap16_i32). */
> +    case INDEX_op_bswap32_i32:  /* Optional (TCG_TARGET_HAS_bswap32_i32). */
> +        tcg_out_r(s, args[0]);
> +        tcg_out_r(s, args[1]);
> +        break;
> +    case INDEX_op_div_i32:      /* Optional (TCG_TARGET_HAS_div_i32). */
> +    case INDEX_op_divu_i32:     /* Optional (TCG_TARGET_HAS_div_i32). */
> +    case INDEX_op_rem_i32:      /* Optional (TCG_TARGET_HAS_div_i32). */
> +    case INDEX_op_remu_i32:     /* Optional (TCG_TARGET_HAS_div_i32). */
> +        tcg_out_r(s, args[0]);
> +        tcg_out_ri32(s, const_args[1], args[1]);
> +        tcg_out_ri32(s, const_args[2], args[2]);
> +        break;
> +    case INDEX_op_div2_i32:     /* Optional (TCG_TARGET_HAS_div2_i32). */
> +    case INDEX_op_divu2_i32:    /* Optional (TCG_TARGET_HAS_div2_i32). */
> +        TODO();
> +        break;
> +#if TCG_TARGET_REG_BITS == 32
> +    case INDEX_op_add2_i32:
> +    case INDEX_op_sub2_i32:
> +        tcg_out_r(s, args[0]);
> +        tcg_out_r(s, args[1]);
> +        tcg_out_r(s, args[2]);
> +        tcg_out_r(s, args[3]);
> +        tcg_out_r(s, args[4]);
> +        tcg_out_r(s, args[5]);
> +        break;
> +    case INDEX_op_brcond2_i32:
> +        tcg_out_r(s, args[0]);
> +        tcg_out_r(s, args[1]);
> +        tcg_out_ri32(s, const_args[2], args[2]);
> +        tcg_out_ri32(s, const_args[3], args[3]);
> +        tcg_out8(s, args[4]);           /* condition */
> +        tci_out_label(s, args[5]);
> +        break;
> +    case INDEX_op_mulu2_i32:
> +        tcg_out_r(s, args[0]);
> +        tcg_out_r(s, args[1]);
> +        tcg_out_r(s, args[2]);
> +        tcg_out_r(s, args[3]);
> +        break;
> +#endif
> +    case INDEX_op_brcond_i32:
> +        tcg_out_r(s, args[0]);
> +        tcg_out_ri32(s, const_args[1], args[1]);
> +        tcg_out8(s, args[2]);           /* condition */
> +        tci_out_label(s, args[3]);
> +        break;
> +    case INDEX_op_qemu_ld8u:
> +    case INDEX_op_qemu_ld8s:
> +    case INDEX_op_qemu_ld16u:
> +    case INDEX_op_qemu_ld16s:
> +    case INDEX_op_qemu_ld32:
> +#if TCG_TARGET_REG_BITS == 64
> +    case INDEX_op_qemu_ld32s:
> +    case INDEX_op_qemu_ld32u:
> +#endif
> +        tcg_out_r(s, *args++);
> +        tcg_out_r(s, *args++);
> +#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
> +        tcg_out_r(s, *args++);
> +#endif
> +#ifdef CONFIG_SOFTMMU
> +        tcg_out_i(s, *args);
> +#endif
> +        break;
> +    case INDEX_op_qemu_ld64:
> +        tcg_out_r(s, *args++);
> +#if TCG_TARGET_REG_BITS == 32
> +        tcg_out_r(s, *args++);
> +#endif
> +        tcg_out_r(s, *args++);
> +#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
> +        tcg_out_r(s, *args++);
> +#endif
> +#ifdef CONFIG_SOFTMMU
> +        tcg_out_i(s, *args);
> +#endif
> +        break;
> +    case INDEX_op_qemu_st8:
> +    case INDEX_op_qemu_st16:
> +    case INDEX_op_qemu_st32:
> +        tcg_out_r(s, *args++);
> +        tcg_out_r(s, *args++);
> +#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
> +        tcg_out_r(s, *args++);
> +#endif
> +#ifdef CONFIG_SOFTMMU
> +        tcg_out_i(s, *args);
> +#endif
> +        break;
> +    case INDEX_op_qemu_st64:
> +        tcg_out_r(s, *args++);
> +#if TCG_TARGET_REG_BITS == 32
> +        tcg_out_r(s, *args++);
> +#endif
> +        tcg_out_r(s, *args++);
> +#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
> +        tcg_out_r(s, *args++);
> +#endif
> +#ifdef CONFIG_SOFTMMU
> +        tcg_out_i(s, *args);
> +#endif
> +        break;
> +    case INDEX_op_end:
> +        TODO();
> +        break;
> +    default:
> +        fprintf(stderr, "Missing: %s\n", tcg_op_defs[opc].name);
> +        tcg_abort();
> +    }
> +    old_code_ptr[1] = s->code_ptr - old_code_ptr;
> +}
> +
> +static void tcg_out_st(TCGContext *s, TCGType type, int arg, int arg1,
> +                       tcg_target_long arg2)
> +{
> +    uint8_t *old_code_ptr = s->code_ptr;
> +    if (type == TCG_TYPE_I32) {
> +        tcg_out_op_t(s, INDEX_op_st_i32);
> +        tcg_out_r(s, arg);
> +        tcg_out_r(s, arg1);
> +        tcg_out32(s, arg2);
> +    } else {
> +        assert(type == TCG_TYPE_I64);
> +#if TCG_TARGET_REG_BITS == 64
> +        tcg_out_op_t(s, INDEX_op_st_i64);
> +        tcg_out_r(s, arg);
> +        tcg_out_r(s, arg1);
> +        tcg_out32(s, arg2);
> +#else
> +        TODO();
> +#endif
> +    }
> +    old_code_ptr[1] = s->code_ptr - old_code_ptr;
> +}
> +
> +/* Test if a constant matches the constraint. */
> +static int tcg_target_const_match(tcg_target_long val,
> +                                  const TCGArgConstraint *arg_ct)
> +{
> +    /* No need to return 0 or 1, 0 or != 0 is good enough. */
> +    return arg_ct->ct & TCG_CT_CONST;
> +}
> +
> +/* Maximum number of registers used for input function arguments. */
> +static int tcg_target_get_call_iarg_regs_count(int flags)
> +{
> +    return ARRAY_SIZE(tcg_target_call_iarg_regs);
> +}
> +
> +static void tcg_target_init(TCGContext *s)
> +{
> +#if defined(CONFIG_DEBUG_TCG_INTERPRETER)
> +    const char *envval = getenv("DEBUG_TCG");
> +    if (envval) {
> +        loglevel = strtol(envval, NULL, 0);
> +    }
> +#endif
> +    TRACE();
> +
> +    /* The current code uses uint8_t for tcg operations. */
> +    assert(ARRAY_SIZE(tcg_op_defs) <= UINT8_MAX);
> +
> +    /* Registers available for 32 bit operations. */
> +    tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0,
> +                     BIT(TCG_TARGET_NB_REGS) - 1);
> +    /* Registers available for 64 bit operations. */
> +    tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0,
> +                     BIT(TCG_TARGET_NB_REGS) - 1);
> +    /* TODO: Which registers should be set here? */
> +    tcg_regset_set32(tcg_target_call_clobber_regs, 0,
> +                     BIT(TCG_TARGET_NB_REGS) - 1);
> +    tcg_regset_clear(s->reserved_regs);
> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
> +    tcg_add_target_add_op_defs(tcg_target_op_defs);
> +    tcg_set_frame(s, TCG_AREG0, offsetof(CPUState, temp_buf),
> +                  CPU_TEMP_BUF_NLONGS * sizeof(long));
> +}
> +
> +/* Generate global QEMU prologue and epilogue code. */
> +static void tcg_target_qemu_prologue(TCGContext *s)
> +{
> +    TRACE();
> +    tb_ret_addr = s->code_ptr;
> +}
> diff --git a/tcg/bytecode/tcg-target.h b/tcg/bytecode/tcg-target.h
> new file mode 100644
> index 0000000..05aaaf2
> --- /dev/null
> +++ b/tcg/bytecode/tcg-target.h
> @@ -0,0 +1,152 @@
> +/*
> + * Tiny Code Generator for QEMU
> + *
> + * Copyright (c) 2009, 2011 Stefan Weil
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +/*
> + * This code implements a TCG which does not generate machine code for some
> + * real target machine but which generates virtual machine code for an
> + * interpreter. Interpreted pseudo code is slow, but it works on any host.
> + *
> + * Some remarks might help in understanding the code:
> + *
> + * "target" or "TCG target" is the machine which runs the generated code.
> + * This is different from the usual meaning in QEMU, where "target" is the
> + * emulated machine. So normally the QEMU host is identical to the TCG target.
> + * Here the TCG target is a virtual machine, but this virtual machine must
> + * use the same word size as the real machine.

Why, for performance? Allowing a different word size could be useful for
testing TCG; perhaps we could even use non-native endianness?

> + * Therefore, we need both 32 and 64 bit virtual machines (interpreters).
> + */
> +
> +#if !defined(TCG_TARGET_H)
> +#define TCG_TARGET_H
> +
> +#include "config-host.h"
> +
> +#define TCG_TARGET_INTERPRETER 1
> +
> +#ifdef CONFIG_DEBUG_TCG
> +/* Enable debug output. */
> +#define CONFIG_DEBUG_TCG_INTERPRETER
> +#endif
> +
> +#if 0 /* TCI tries to emulate a little endian host. */
> +#if defined(HOST_WORDS_BIGENDIAN)
> +# define TCG_TARGET_WORDS_BIGENDIAN
> +#endif
> +#endif
> +
> +/* Optional instructions. */
> +
> +#define TCG_TARGET_HAS_bswap16_i32      1
> +#define TCG_TARGET_HAS_bswap32_i32      1
> +/* At most one of the next two defines may be 1. */
> +#define TCG_TARGET_HAS_div_i32          1
> +#define TCG_TARGET_HAS_div2_i32         0
> +#define TCG_TARGET_HAS_ext8s_i32        1
> +#define TCG_TARGET_HAS_ext16s_i32       1
> +#define TCG_TARGET_HAS_ext8u_i32        1
> +#define TCG_TARGET_HAS_ext16u_i32       1
> +#define TCG_TARGET_HAS_andc_i32         0
> +#define TCG_TARGET_HAS_deposit_i32      0
> +#define TCG_TARGET_HAS_eqv_i32          0
> +#define TCG_TARGET_HAS_nand_i32         0
> +#define TCG_TARGET_HAS_nor_i32          0
> +#define TCG_TARGET_HAS_neg_i32          1
> +#define TCG_TARGET_HAS_not_i32          1
> +#define TCG_TARGET_HAS_orc_i32          0
> +#define TCG_TARGET_HAS_rot_i32          1
> +
> +#if TCG_TARGET_REG_BITS == 64
> +#define TCG_TARGET_HAS_bswap16_i64      1
> +#define TCG_TARGET_HAS_bswap32_i64      1
> +#define TCG_TARGET_HAS_bswap64_i64      1
> +#define TCG_TARGET_HAS_deposit_i64      0
> +/* At most one of the next two defines may be 1. */
> +#define TCG_TARGET_HAS_div_i64          0
> +#define TCG_TARGET_HAS_div2_i64         0
> +#define TCG_TARGET_HAS_ext8s_i64        1
> +#define TCG_TARGET_HAS_ext16s_i64       1
> +#define TCG_TARGET_HAS_ext32s_i64       1
> +#define TCG_TARGET_HAS_ext8u_i64        1
> +#define TCG_TARGET_HAS_ext16u_i64       1
> +#define TCG_TARGET_HAS_ext32u_i64       1
> +#define TCG_TARGET_HAS_andc_i64         0
> +#define TCG_TARGET_HAS_eqv_i64          0
> +#define TCG_TARGET_HAS_nand_i64         0
> +#define TCG_TARGET_HAS_nor_i64          0
> +#define TCG_TARGET_HAS_neg_i64          1
> +#define TCG_TARGET_HAS_not_i64          1
> +#define TCG_TARGET_HAS_orc_i64          0
> +#define TCG_TARGET_HAS_rot_i64          1
> +#endif /* TCG_TARGET_REG_BITS == 64 */
> +
> +/* Offset to user memory in user mode. */
> +#define TCG_TARGET_HAS_GUEST_BASE
> +
> +/* Number of registers available.
> +   For 32 bit hosts, we need more than 8 registers (call arguments). */

On i386 there certainly aren't 8 registers, so where does 8 come from?

> +/* #define TCG_TARGET_NB_REGS 8 */
> +#define TCG_TARGET_NB_REGS 16

Again, one way to test TCG would be to minimize and maximize the
number of registers.

> +/* #define TCG_TARGET_NB_REGS 32 */
> +
> +/* List of registers which are used by TCG. */
> +typedef enum {
> +    TCG_REG_R0 = 0,
> +    TCG_REG_R1,
> +    TCG_REG_R2,
> +    TCG_REG_R3,
> +    TCG_REG_R4,
> +    TCG_REG_R5,
> +    TCG_REG_R6,
> +    TCG_REG_R7,
> +    TCG_AREG0 = TCG_REG_R7,
> +#if TCG_TARGET_NB_REGS >= 16
> +    TCG_REG_R8,
> +    TCG_REG_R9,
> +    TCG_REG_R10,
> +    TCG_REG_R11,
> +    TCG_REG_R12,
> +    TCG_REG_R13,
> +    TCG_REG_R14,
> +    TCG_REG_R15,
> +#if TCG_TARGET_NB_REGS >= 32
> +    TCG_REG_R16,
> +    TCG_REG_R17,
> +    TCG_REG_R18,
> +    TCG_REG_R19,
> +    TCG_REG_R20,
> +    TCG_REG_R21,
> +    TCG_REG_R22,
> +    TCG_REG_R23,
> +    TCG_REG_R24,
> +    TCG_REG_R25,
> +    TCG_REG_R26,
> +    TCG_REG_R27,
> +    TCG_REG_R28,
> +    TCG_REG_R29,
> +    TCG_REG_R30,
> +    TCG_REG_R31,
> +#endif
> +#endif
> +    /* Special value UINT8_MAX is used by TCI to encode constant values. */
> +    TCG_CONST = UINT8_MAX
> +} TCGRegister;
> +
> +void tci_disas(uint8_t opc);
> +
> +#endif /* TCG_TARGET_H */
> --
> 1.7.2.5
>
>
>
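The generator above writes every instruction as one opcode byte, one length byte (a placeholder written by tcg_out_op_t and back-patched through old_code_ptr[1]), and then the operands. A register operand is a single byte holding the register index; a constant is the TCG_CONST marker (UINT8_MAX) followed by the immediate. The following is a minimal standalone sketch of that layout, assuming host byte order for immediates as tcg_out32 uses. All names here (Emitter, out8, emit_binop, read_ri32, SKETCH_CONST, SKETCH_NB_REGS) are hypothetical, and the decoder half is only an assumption about how an interpreter could consume the format, not a copy of tci.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the TCI definitions quoted above. */
#define SKETCH_CONST   0xff  /* mirrors TCG_CONST = UINT8_MAX */
#define SKETCH_NB_REGS 16    /* mirrors TCG_TARGET_NB_REGS */

typedef struct {
    uint8_t buf[64];
    uint8_t *ptr;            /* plays the role of s->code_ptr */
} Emitter;

static void out8(Emitter *e, uint8_t v)
{
    *e->ptr++ = v;
}

static void out32(Emitter *e, uint32_t v)
{
    memcpy(e->ptr, &v, sizeof(v));   /* host byte order */
    e->ptr += sizeof(v);
}

/* Register operand: one byte with the register index. */
static void out_r(Emitter *e, unsigned reg)
{
    assert(reg < SKETCH_NB_REGS);
    out8(e, (uint8_t)reg);
}

/* Register-or-constant operand: a constant is prefixed by the marker. */
static void out_ri32(Emitter *e, int is_const, uint32_t arg)
{
    if (is_const) {
        out8(e, SKETCH_CONST);
        out32(e, arg);
    } else {
        out_r(e, arg);
    }
}

/* Emit "opc dst, a1, a2" and back-patch the total length into byte 1. */
static uint8_t *emit_binop(Emitter *e, uint8_t opc, unsigned dst,
                           int c1, uint32_t a1, int c2, uint32_t a2)
{
    uint8_t *start = e->ptr;
    out8(e, opc);
    out8(e, 0);                               /* length placeholder */
    out_r(e, dst);
    out_ri32(e, c1, a1);
    out_ri32(e, c2, a2);
    start[1] = (uint8_t)(e->ptr - start);     /* includes the 2-byte header */
    return start;
}

/* Decoder side: read one register-or-constant operand and advance *p. */
static uint32_t read_ri32(const uint8_t **p, const uint32_t *regs)
{
    uint8_t b = *(*p)++;
    if (b == SKETCH_CONST) {
        uint32_t v;
        memcpy(&v, *p, sizeof(v));
        *p += sizeof(v);
        return v;
    }
    return regs[b];
}
```

Two constraints follow from this layout: since the length lives in a single byte, no instruction may exceed 255 bytes, and register indices must stay below UINT8_MAX so they can never be confused with the constant marker (true for 16 or 32 registers).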

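On a 64-bit host, tcg_out_movi above uses the shorter movi_i32 encoding for TCG_TYPE_I32 and, for TCG_TYPE_I64, whenever `arg == arg32`. Because the uint32_t operand is converted to the wider signed type before the comparison, this holds only for constants in [0, UINT32_MAX]; negative constants such as -1 take the movi_i64 path with an 8-byte immediate. The hypothetical helper movi_fits_i32 below isolates that predicate, assuming tcg_target_long is int64_t on such hosts.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the "arg == arg32" test in tcg_out_movi()
   for TCG_TYPE_I64 values on a 64-bit host. */
static int movi_fits_i32(int64_t arg)
{
    uint32_t arg32 = (uint32_t)arg;   /* keeps the low 32 bits */
    /* arg32 is converted to int64_t for the comparison, so only values
       in [0, UINT32_MAX] compare equal; negative values never do. */
    return arg == arg32;
}
```

A consequence worth checking in review: for TCG_TYPE_I64 values the shortcut is only correct if the interpreter's movi_i32 zero-extends the 32-bit immediate into the full 64-bit register.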

* Re: [Qemu-devel] [PATCH 7/8] tcg: Add tcg interpreter to configure / make
  2011-09-18  9:37   ` Blue Swirl
@ 2011-09-18 10:14     ` Stefan Weil
  0 siblings, 0 replies; 48+ messages in thread
From: Stefan Weil @ 2011-09-18 10:14 UTC (permalink / raw)
  To: Blue Swirl; +Cc: QEMU Developers

On 18.09.2011 11:37, Blue Swirl wrote:
> On Sat, Sep 17, 2011 at 8:00 PM, Stefan Weil <weil@mail.berlios.de> wrote:
>> Signed-off-by: Stefan Weil <weil@mail.berlios.de>
>> ---
>>  Makefile.target |    1 +
>>  configure       |   30 ++++++++++++++++++++++++++++--
>>  2 files changed, 29 insertions(+), 2 deletions(-)
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index 88d2f1f..a2c3a4a 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -69,6 +69,7 @@ all: $(PROGS) stap
>>  # cpu emulator library
>>  libobj-y = exec.o translate-all.o cpu-exec.o translate.o
>>  libobj-y += tcg/tcg.o tcg/optimize.o
>> +libobj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o
>>  libobj-y += fpu/softfloat.o
>>  libobj-y += op_helper.o helper.o
>>  ifeq ($(TARGET_BASE_ARCH), i386)
>> diff --git a/configure b/configure
[snip]
>> @@ -2761,6 +2768,15 @@ case "$cpu" in
>>   armv4b|armv4l)
>>     ARCH=arm
>>   ;;
>> +  *)
>> +    if test "$tcg_interpreter" = "yes" ; then
>> +        echo "Unsupported CPU = $cpu, will use TCG with TCI (experimental)"
>> +        ARCH=unknown
>
> ARCH=TCI or 'all' would be more accurate.

Ok, I'll change it to ARCH=all (or 'any' or 'tci', if that is preferred).

>> +if test "$tcg_interpreter" = "yes"; then
>
> Here the test should be against ARCH for consistency.

That would not work:

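There are 3 supported setups: no TCG interpreter, the TCG interpreter
with a known ARCH, and the TCG interpreter with an unknown ARCH.

For the include path, I must test $tcg_interpreter: in the second setup,
ARCH still names the real host CPU, so a test against ARCH would select
the native TCG backend instead of tcg/bytecode.
For the linker script, I test $ARCH (see below).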

>> +  includes="-I\$(SRC_PATH)/tcg/bytecode $includes"
>> +elif test "$ARCH" = "sparc64" ; then
>>    includes="-I\$(SRC_PATH)/tcg/sparc $includes"
>>   elif test "$ARCH" = "s390x" ; then
>>    includes="-I\$(SRC_PATH)/tcg/s390 $includes"
>> @@ -3577,7 +3598,12 @@ if test "$gprof" = "yes" ; then
>>    fi
>>   fi
>>
>> -linker_script="-Wl,-T../config-host.ld -Wl,-T,\$(SRC_PATH)/\$(ARCH).ld"
>> +if test "$ARCH" = "unknown"; then
>> +  linker_script=""
>> +else
>> +  linker_script="-Wl,-T../config-host.ld -Wl,-T,\$(SRC_PATH)/\$(ARCH).ld"
>> +fi
>> +


* Re: [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode Stefan Weil
  2011-09-18  4:03   ` Andi Kleen
@ 2011-09-18 10:18   ` Blue Swirl
  2011-09-19 16:43   ` Richard Henderson
  2011-09-19 20:24   ` Stuart Brady
  3 siblings, 0 replies; 48+ messages in thread
From: Blue Swirl @ 2011-09-18 10:18 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

On Sat, Sep 17, 2011 at 8:00 PM, Stefan Weil <weil@mail.berlios.de> wrote:
> Signed-off-by: Stefan Weil <weil@mail.berlios.de>
> ---
>  tcg/tcg.h |    4 +-
>  tcg/tci.c | 1200 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 1203 insertions(+), 1 deletions(-)
>  create mode 100644 tcg/tci.c
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 1859fae..c99c7ea 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -577,7 +577,9 @@ TCGv_i32 tcg_const_local_i32(int32_t val);
>  TCGv_i64 tcg_const_local_i64(int64_t val);
>
>  extern uint8_t code_gen_prologue[];
> -#if defined(_ARCH_PPC) && !defined(_ARCH_PPC64)
> +#if defined(CONFIG_TCG_INTERPRETER)
> +unsigned long tcg_qemu_tb_exec(CPUState *env, uint8_t *tb_ptr);
> +#elif defined(_ARCH_PPC) && !defined(_ARCH_PPC64)
>  #define tcg_qemu_tb_exec(env, tb_ptr)                                    \
>     ((long REGPARM __attribute__ ((longcall)) (*)(void *, void *))code_gen_prologue)(env, tb_ptr)
>  #else
> diff --git a/tcg/tci.c b/tcg/tci.c
> new file mode 100644
> index 0000000..eea9992
> --- /dev/null
> +++ b/tcg/tci.c
> @@ -0,0 +1,1200 @@
> +/*
> + * Tiny Code Interpreter for QEMU
> + *
> + * Copyright (c) 2009, 2011 Stefan Weil
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "config.h"
> +#include "qemu-common.h"
> +#include "exec-all.h"           /* MAX_OPC_PARAM_IARGS */
> +#include "tcg-op.h"
> +
> +/* Marker for missing code. */
> +#define TODO() \
> +    do { \
> +        fprintf(stderr, "TODO %s:%u: %s()\n", \
> +                __FILE__, __LINE__, __func__); \
> +        tcg_abort(); \
> +    } while (0)
> +
> +/* Trace message to see program flow. */
> +#if defined(CONFIG_DEBUG_TCG_INTERPRETER)
> +#define TRACE() \
> +    loglevel \
> +    ? fprintf(stderr, "TCG %s:%u: %s()\n", __FILE__, __LINE__, __func__) \
> +    : (void)0
> +#else
> +#define TRACE() ((void)0)
> +#endif

The macros must use do { } while (0), although inline functions would
be preferred over them, as would tracepoints.
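A minimal sketch of what the review asks for, assuming a global int loglevel as a stand-in for QEMU's logging variable: the formatting moves into an inline function, and the remaining macro, which is only needed to capture __FILE__, __LINE__ and __func__ at the call site, is wrapped in do { } while (0) so that it expands to exactly one statement. The names tci_trace and traced_branch are illustrative.

```c
#include <stdio.h>

/* Stand-in for QEMU's log level variable (assumption for this sketch). */
static int loglevel = 1;

/* Inline function: arguments are type-checked, no macro pitfalls. */
static inline void tci_trace(const char *file, unsigned line,
                             const char *func)
{
    if (loglevel) {
        fprintf(stderr, "TCG %s:%u: %s()\n", file, line, func);
    }
}

/* Thin macro that only captures the call-site location; the do/while
 * wrapper makes "TRACE();" a single statement, like TODO() above. */
#define TRACE() \
    do { \
        tci_trace(__FILE__, __LINE__, __func__); \
    } while (0)

/* Hypothetical caller: TRACE() used in an unbraced if/else. */
static int traced_branch(int x)
{
    if (x)
        TRACE();
    else
        return 0;
    return 1;
}
```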

> +
> +#if MAX_OPC_PARAM_IARGS != 4
> +# error Fix needed, number of supported input arguments changed!
> +#endif
> +#if TCG_TARGET_REG_BITS == 32
> +typedef uint64_t (*helper_function)(tcg_target_ulong, tcg_target_ulong,
> +                                    tcg_target_ulong, tcg_target_ulong,
> +                                    tcg_target_ulong, tcg_target_ulong,
> +                                    tcg_target_ulong, tcg_target_ulong);
> +#else
> +typedef uint64_t (*helper_function)(tcg_target_ulong, tcg_target_ulong,
> +                                    tcg_target_ulong, tcg_target_ulong);
> +#endif
> +
> +CPUState *env;
> +
> +/* Alpha and SH4 user mode emulations call GETPC(), so they need tci_tb_ptr. */
> +#if defined(CONFIG_SOFTMMU) || defined(TARGET_ALPHA) || defined(TARGET_SH4)
> +# define NEEDS_TB_PTR

This should be #defined in target-alpha/cpu.h etc. or dyngen-exec.h.

> +#endif
> +
> +#ifdef NEEDS_TB_PTR
> +uint8_t *tci_tb_ptr;
> +#endif
> +
> +static tcg_target_ulong tci_reg[TCG_TARGET_NB_REGS];
> +
> +static tcg_target_ulong tci_read_reg(TCGRegister index)
> +{
> +    assert(index < ARRAY_SIZE(tci_reg));
> +    return tci_reg[index];
> +}
> +
> +#if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
> +static int8_t tci_read_reg8s(TCGRegister index)
> +{
> +    return (int8_t)tci_read_reg(index);
> +}
> +#endif
> +
> +#if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
> +static int16_t tci_read_reg16s(TCGRegister index)
> +{
> +    return (int16_t)tci_read_reg(index);
> +}
> +#endif
> +
> +#if TCG_TARGET_REG_BITS == 64
> +static int32_t tci_read_reg32s(TCGRegister index)
> +{
> +    return (int32_t)tci_read_reg(index);
> +}
> +#endif
> +
> +static uint8_t tci_read_reg8(TCGRegister index)
> +{
> +    return (uint8_t)tci_read_reg(index);
> +}
> +
> +static uint16_t tci_read_reg16(TCGRegister index)
> +{
> +    return (uint16_t)tci_read_reg(index);
> +}
> +
> +static uint32_t tci_read_reg32(TCGRegister index)
> +{
> +    return (uint32_t)tci_read_reg(index);
> +}
> +
> +#if TCG_TARGET_REG_BITS == 64
> +static uint64_t tci_read_reg64(TCGRegister index)
> +{
> +    return tci_read_reg(index);
> +}
> +#endif
> +
> +static void tci_write_reg(TCGRegister index, tcg_target_ulong value)
> +{
> +    assert(index < ARRAY_SIZE(tci_reg));
> +    assert(index != TCG_AREG0);
> +    tci_reg[index] = value;
> +}
> +
> +static void tci_write_reg8s(TCGRegister index, int8_t value)
> +{
> +    tci_write_reg(index, value);
> +}
> +
> +static void tci_write_reg16s(TCGRegister index, int16_t value)
> +{
> +    tci_write_reg(index, value);
> +}
> +
> +#if TCG_TARGET_REG_BITS == 64
> +static void tci_write_reg32s(TCGRegister index, int32_t value)
> +{
> +    tci_write_reg(index, value);
> +}
> +#endif
> +
> +static void tci_write_reg8(TCGRegister index, uint8_t value)
> +{
> +    tci_write_reg(index, value);
> +}
> +
> +static void tci_write_reg16(TCGRegister index, uint16_t value)
> +{
> +    tci_write_reg(index, value);
> +}
> +
> +static void tci_write_reg32(TCGRegister index, uint32_t value)
> +{
> +    tci_write_reg(index, value);
> +}
> +
> +#if TCG_TARGET_REG_BITS == 32
> +static void tci_write_reg64(uint32_t high_index, uint32_t low_index,
> +                            uint64_t value)
> +{
> +    tci_write_reg(low_index, value);
> +    tci_write_reg(high_index, value >> 32);
> +}
> +#elif TCG_TARGET_REG_BITS == 64
> +static void tci_write_reg64(TCGRegister index, uint64_t value)
> +{
> +    tci_write_reg(index, value);
> +}
> +#endif
> +
> +#if TCG_TARGET_REG_BITS == 32
> +/* Create a 64 bit value from two 32 bit values. */
> +static uint64_t tci_uint64(uint32_t high, uint32_t low)
> +{
> +    return ((uint64_t)high << 32) + low;
> +}
> +#endif
> +
> +/* Read constant (native size) from bytecode. */
> +static tcg_target_ulong tci_read_i(uint8_t **tb_ptr)
> +{
> +    tcg_target_ulong value = *(tcg_target_ulong *)(*tb_ptr);
> +    *tb_ptr += sizeof(value);
> +    return value;
> +}
> +
> +/* Read constant (32 bit) from bytecode. */
> +static uint32_t tci_read_i32(uint8_t **tb_ptr)
> +{
> +    uint32_t value = *(uint32_t *)(*tb_ptr);
> +    *tb_ptr += sizeof(value);
> +    return value;
> +}
> +
> +#if TCG_TARGET_REG_BITS == 64
> +/* Read constant (64 bit) from bytecode. */
> +static uint64_t tci_read_i64(uint8_t **tb_ptr)
> +{
> +    uint64_t value = *(uint64_t *)(*tb_ptr);
> +    *tb_ptr += sizeof(value);
> +    return value;
> +}
> +#endif
> +
> +/* Read indexed register (native size) from bytecode. */
> +static tcg_target_ulong tci_read_r(uint8_t **tb_ptr)
> +{
> +    tcg_target_ulong value = tci_read_reg(**tb_ptr);
> +    *tb_ptr += 1;
> +    return value;
> +}
> +
> +/* Read indexed register (8 bit) from bytecode. */
> +static uint8_t tci_read_r8(uint8_t **tb_ptr)
> +{
> +    uint8_t value = tci_read_reg8(**tb_ptr);
> +    *tb_ptr += 1;
> +    return value;
> +}
> +
> +#if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
> +/* Read indexed register (8 bit signed) from bytecode. */
> +static int8_t tci_read_r8s(uint8_t **tb_ptr)
> +{
> +    int8_t value = tci_read_reg8s(**tb_ptr);
> +    *tb_ptr += 1;
> +    return value;
> +}
> +#endif
> +
> +/* Read indexed register (16 bit) from bytecode. */
> +static uint16_t tci_read_r16(uint8_t **tb_ptr)
> +{
> +    uint16_t value = tci_read_reg16(**tb_ptr);
> +    *tb_ptr += 1;
> +    return value;
> +}
> +
> +#if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
> +/* Read indexed register (16 bit signed) from bytecode. */
> +static int16_t tci_read_r16s(uint8_t **tb_ptr)
> +{
> +    int16_t value = tci_read_reg16s(**tb_ptr);
> +    *tb_ptr += 1;
> +    return value;
> +}
> +#endif
> +
> +/* Read indexed register (32 bit) from bytecode. */
> +static uint32_t tci_read_r32(uint8_t **tb_ptr)
> +{
> +    uint32_t value = tci_read_reg32(**tb_ptr);
> +    *tb_ptr += 1;
> +    return value;
> +}
> +
> +#if TCG_TARGET_REG_BITS == 32
> +/* Read two indexed registers (2 * 32 bit) from bytecode. */
> +static uint64_t tci_read_r64(uint8_t **tb_ptr)
> +{
> +    uint32_t low = tci_read_r32(tb_ptr);
> +    return tci_uint64(tci_read_r32(tb_ptr), low);
> +}
> +#elif TCG_TARGET_REG_BITS == 64
> +/* Read indexed register (32 bit signed) from bytecode. */
> +static int32_t tci_read_r32s(uint8_t **tb_ptr)
> +{
> +    int32_t value = tci_read_reg32s(**tb_ptr);
> +    *tb_ptr += 1;
> +    return value;
> +}
> +
> +/* Read indexed register (64 bit) from bytecode. */
> +static uint64_t tci_read_r64(uint8_t **tb_ptr)
> +{
> +    uint64_t value = tci_read_reg64(**tb_ptr);
> +    *tb_ptr += 1;
> +    return value;
> +}
> +#endif
> +
> +/* Read indexed register(s) with target address from bytecode. */
> +static target_ulong tci_read_ulong(uint8_t **tb_ptr)
> +{
> +    target_ulong taddr = tci_read_r(tb_ptr);
> +#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
> +    taddr += (uint64_t)tci_read_r(tb_ptr) << 32;
> +#endif
> +    return taddr;
> +}
> +
> +/* Read indexed register or constant (native size) from bytecode. */
> +static tcg_target_ulong tci_read_ri(uint8_t **tb_ptr)
> +{
> +    tcg_target_ulong value;
> +    TCGRegister r = **tb_ptr;
> +    *tb_ptr += 1;
> +    if (r == TCG_CONST) {
> +        value = tci_read_i(tb_ptr);
> +    } else {
> +        value = tci_read_reg(r);
> +    }
> +    return value;
> +}
> +
> +/* Read indexed register or constant (32 bit) from bytecode. */
> +static uint32_t tci_read_ri32(uint8_t **tb_ptr)
> +{
> +    uint32_t value;
> +    TCGRegister r = **tb_ptr;
> +    *tb_ptr += 1;
> +    if (r == TCG_CONST) {
> +        value = tci_read_i32(tb_ptr);
> +    } else {
> +        value = tci_read_reg32(r);
> +    }
> +    return value;
> +}
> +
> +#if TCG_TARGET_REG_BITS == 32
> +/* Read two indexed registers or constants (2 * 32 bit) from bytecode. */
> +static uint64_t tci_read_ri64(uint8_t **tb_ptr)
> +{
> +    uint32_t low = tci_read_ri32(tb_ptr);
> +    return tci_uint64(tci_read_ri32(tb_ptr), low);
> +}
> +#elif TCG_TARGET_REG_BITS == 64
> +/* Read indexed register or constant (64 bit) from bytecode. */
> +static uint64_t tci_read_ri64(uint8_t **tb_ptr)
> +{
> +    uint64_t value;
> +    TCGRegister r = **tb_ptr;
> +    *tb_ptr += 1;
> +    if (r == TCG_CONST) {
> +        value = tci_read_i64(tb_ptr);
> +    } else {
> +        value = tci_read_reg64(r);
> +    }
> +    return value;
> +}
> +#endif
> +
> +static target_ulong tci_read_label(uint8_t **tb_ptr)
> +{
> +    target_ulong label = tci_read_i(tb_ptr);
> +    assert(label != 0);
> +    return label;
> +}
> +
> +static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
> +{
> +    bool result = false;
> +    int32_t i0 = u0;
> +    int32_t i1 = u1;
> +    switch (condition) {
> +    case TCG_COND_EQ:
> +        result = (u0 == u1);
> +        break;
> +    case TCG_COND_NE:
> +        result = (u0 != u1);
> +        break;
> +    case TCG_COND_LT:
> +        result = (i0 < i1);
> +        break;
> +    case TCG_COND_GE:
> +        result = (i0 >= i1);
> +        break;
> +    case TCG_COND_LE:
> +        result = (i0 <= i1);
> +        break;
> +    case TCG_COND_GT:
> +        result = (i0 > i1);
> +        break;
> +    case TCG_COND_LTU:
> +        result = (u0 < u1);
> +        break;
> +    case TCG_COND_GEU:
> +        result = (u0 >= u1);
> +        break;
> +    case TCG_COND_LEU:
> +        result = (u0 <= u1);
> +        break;
> +    case TCG_COND_GTU:
> +        result = (u0 > u1);
> +        break;
> +    default:
> +        TODO();
> +    }
> +    return result;
> +}
> +
> +static bool tci_compare64(uint64_t u0, uint64_t u1, TCGCond condition)
> +{
> +    bool result = false;
> +    int64_t i0 = u0;
> +    int64_t i1 = u1;
> +    switch (condition) {
> +    case TCG_COND_EQ:
> +        result = (u0 == u1);
> +        break;
> +    case TCG_COND_NE:
> +        result = (u0 != u1);
> +        break;
> +    case TCG_COND_LT:
> +        result = (i0 < i1);
> +        break;
> +    case TCG_COND_GE:
> +        result = (i0 >= i1);
> +        break;
> +    case TCG_COND_LE:
> +        result = (i0 <= i1);
> +        break;
> +    case TCG_COND_GT:
> +        result = (i0 > i1);
> +        break;
> +    case TCG_COND_LTU:
> +        result = (u0 < u1);
> +        break;
> +    case TCG_COND_GEU:
> +        result = (u0 >= u1);
> +        break;
> +    case TCG_COND_LEU:
> +        result = (u0 <= u1);
> +        break;
> +    case TCG_COND_GTU:
> +        result = (u0 > u1);
> +        break;
> +    default:
> +        TODO();
> +    }
> +    return result;
> +}
> +
> +/* Interpret pseudo code in tb. */
> +unsigned long tcg_qemu_tb_exec(CPUState *cpustate, uint8_t *tb_ptr)
> +{
> +    unsigned long next_tb = 0;
> +
> +    TRACE();
> +
> +    env = cpustate;
> +    tci_reg[TCG_AREG0] = (tcg_target_ulong)cpustate;
> +    assert(tb_ptr);
> +
> +    for (;;) {
> +#ifdef NEEDS_TB_PTR
> +        tci_tb_ptr = tb_ptr;
> +#endif
> +        uint8_t *old_code_ptr = tb_ptr;
> +        TCGOpcode opc = *tb_ptr++;
> +        uint8_t op_size = *tb_ptr++;
> +        tcg_target_ulong t0;
> +        tcg_target_ulong t1;
> +        tcg_target_ulong t2;
> +        tcg_target_ulong label;
> +        TCGCond condition;
> +        target_ulong taddr;
> +#ifndef CONFIG_SOFTMMU
> +        tcg_target_ulong host_addr;
> +#endif
> +        uint8_t u8;
> +        uint16_t u16;
> +        uint32_t u32;
> +        uint64_t u64;

The variable names above could conflict with the u8/u16/u32/u64 types
if Linux kernel headers leaked those types into this file. s/u/val/?
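To illustrate the concern, assume a header has leaked a kernel-style typedef such as u8. A local variable named u8 would hide that typedef for the rest of its block, so any later use of u8 as a type there would fail to compile; renamed locals (val8 and so on) keep both usable. A minimal sketch, with rename_demo as a hypothetical function:

```c
#include <stdint.h>

/* Assumption: a kernel-style typedef has leaked into this file. */
typedef uint8_t u8;

static unsigned rename_demo(uint8_t input)
{
    /* Was "uint8_t u8;", which would hide the typedef in this block
     * and make the "u8 copy" declaration below a compile error. */
    uint8_t val8 = input;
    u8 copy = val8;     /* the typedef is still usable as a type */
    return copy;
}
```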

> +#if TCG_TARGET_REG_BITS == 32
> +        uint64_t v64;
> +#endif
> +
> +#if defined(CONFIG_DEBUG_TCG_INTERPRETER)
> +        if (loglevel) {
> +            tci_disas(opc);
> +        }
> +#endif
> +
> +        switch (opc) {
> +        case INDEX_op_end:
> +        case INDEX_op_nop:
> +            break;
> +        case INDEX_op_nop1:
> +        case INDEX_op_nop2:
> +        case INDEX_op_nop3:
> +        case INDEX_op_nopn:
> +        case INDEX_op_discard:
> +            TODO();
> +            break;
> +        case INDEX_op_set_label:
> +            TODO();
> +            break;
> +        case INDEX_op_call:
> +            t0 = tci_read_ri(&tb_ptr);
> +#if TCG_TARGET_REG_BITS == 32
> +            u64 = ((helper_function)t0)(tci_read_reg(TCG_REG_R0),
> +                                        tci_read_reg(TCG_REG_R1),
> +                                        tci_read_reg(TCG_REG_R2),
> +                                        tci_read_reg(TCG_REG_R3),
> +                                        tci_read_reg(TCG_REG_R5),
> +                                        tci_read_reg(TCG_REG_R6),
> +                                        tci_read_reg(TCG_REG_R7),
> +                                        tci_read_reg(TCG_REG_R8));
> +            tci_write_reg(TCG_REG_R0, u64);
> +            tci_write_reg(TCG_REG_R1, u64 >> 32);
> +#else
> +            u64 = ((helper_function)t0)(tci_read_reg(TCG_REG_R0),
> +                                        tci_read_reg(TCG_REG_R1),
> +                                        tci_read_reg(TCG_REG_R2),
> +                                        tci_read_reg(TCG_REG_R3));
> +            tci_write_reg(TCG_REG_R0, u64);
> +#endif
> +            break;
> +        case INDEX_op_jmp:
> +        case INDEX_op_br:
> +            label = tci_read_label(&tb_ptr);
> +            assert(tb_ptr == old_code_ptr + op_size);
> +            tb_ptr = (uint8_t *)label;
> +            continue;
> +        case INDEX_op_setcond_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r32(&tb_ptr);
> +            t2 = tci_read_ri32(&tb_ptr);
> +            condition = *tb_ptr++;
> +            tci_write_reg32(t0, tci_compare32(t1, t2, condition));
> +            break;
> +#if TCG_TARGET_REG_BITS == 32
> +        case INDEX_op_setcond2_i32:
> +            t0 = *tb_ptr++;
> +            u64 = tci_read_r64(&tb_ptr);
> +            v64 = tci_read_ri64(&tb_ptr);
> +            condition = *tb_ptr++;
> +            tci_write_reg32(t0, tci_compare64(u64, v64, condition));
> +            break;
> +#elif TCG_TARGET_REG_BITS == 64
> +        case INDEX_op_setcond_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r64(&tb_ptr);
> +            t2 = tci_read_ri64(&tb_ptr);
> +            condition = *tb_ptr++;
> +            tci_write_reg64(t0, tci_compare64(t1, t2, condition));
> +            break;
> +#endif
> +        case INDEX_op_mov_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r32(&tb_ptr);
> +            tci_write_reg32(t0, t1);
> +            break;
> +        case INDEX_op_movi_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_i32(&tb_ptr);
> +            tci_write_reg32(t0, t1);
> +            break;
> +    /* Load/store operations. */
> +        case INDEX_op_ld8u_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r(&tb_ptr);
> +            t2 = tci_read_i32(&tb_ptr);
> +            tci_write_reg8(t0, *(uint8_t *)(t1 + t2));
> +            break;
> +        case INDEX_op_ld8s_i32:
> +        case INDEX_op_ld16u_i32:
> +            TODO();
> +            break;
> +        case INDEX_op_ld16s_i32:
> +            TODO();
> +            break;
> +        case INDEX_op_ld_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r(&tb_ptr);
> +            t2 = tci_read_i32(&tb_ptr);
> +            tci_write_reg32(t0, *(uint32_t *)(t1 + t2));
> +            break;
> +        case INDEX_op_st8_i32:
> +            t0 = tci_read_r8(&tb_ptr);
> +            t1 = tci_read_r(&tb_ptr);
> +            t2 = tci_read_i32(&tb_ptr);
> +            *(uint8_t *)(t1 + t2) = t0;
> +            break;
> +        case INDEX_op_st16_i32:
> +            t0 = tci_read_r16(&tb_ptr);
> +            t1 = tci_read_r(&tb_ptr);
> +            t2 = tci_read_i32(&tb_ptr);
> +            *(uint16_t *)(t1 + t2) = t0;
> +            break;
> +        case INDEX_op_st_i32:
> +            t0 = tci_read_r32(&tb_ptr);
> +            t1 = tci_read_r(&tb_ptr);
> +            t2 = tci_read_i32(&tb_ptr);
> +            *(uint32_t *)(t1 + t2) = t0;
> +            break;
> +    /* Arithmetic operations. */
> +        case INDEX_op_add_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri32(&tb_ptr);
> +            t2 = tci_read_ri32(&tb_ptr);
> +            tci_write_reg32(t0, t1 + t2);
> +            break;
> +        case INDEX_op_sub_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri32(&tb_ptr);
> +            t2 = tci_read_ri32(&tb_ptr);
> +            tci_write_reg32(t0, t1 - t2);
> +            break;
> +        case INDEX_op_mul_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri32(&tb_ptr);
> +            t2 = tci_read_ri32(&tb_ptr);
> +            tci_write_reg32(t0, t1 * t2);
> +            break;
> +#if TCG_TARGET_HAS_div_i32
> +        case INDEX_op_div_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri32(&tb_ptr);
> +            t2 = tci_read_ri32(&tb_ptr);
> +            tci_write_reg32(t0, (int32_t)t1 / (int32_t)t2);
> +            break;
> +        case INDEX_op_divu_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri32(&tb_ptr);
> +            t2 = tci_read_ri32(&tb_ptr);
> +            tci_write_reg32(t0, t1 / t2);
> +            break;
> +        case INDEX_op_rem_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri32(&tb_ptr);
> +            t2 = tci_read_ri32(&tb_ptr);
> +            tci_write_reg32(t0, (int32_t)t1 % (int32_t)t2);
> +            break;
> +        case INDEX_op_remu_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri32(&tb_ptr);
> +            t2 = tci_read_ri32(&tb_ptr);
> +            tci_write_reg32(t0, t1 % t2);
> +            break;
> +#elif TCG_TARGET_HAS_div2_i32
> +        case INDEX_op_div2_i32:
> +        case INDEX_op_divu2_i32:
> +            TODO();
> +            break;
> +#endif
> +        case INDEX_op_and_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri32(&tb_ptr);
> +            t2 = tci_read_ri32(&tb_ptr);
> +            tci_write_reg32(t0, t1 & t2);
> +            break;
> +        case INDEX_op_or_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri32(&tb_ptr);
> +            t2 = tci_read_ri32(&tb_ptr);
> +            tci_write_reg32(t0, t1 | t2);
> +            break;
> +        case INDEX_op_xor_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri32(&tb_ptr);
> +            t2 = tci_read_ri32(&tb_ptr);
> +            tci_write_reg32(t0, t1 ^ t2);
> +            break;
> +    /* Shift/rotate operations. */
> +        case INDEX_op_shl_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri32(&tb_ptr);
> +            t2 = tci_read_ri32(&tb_ptr);
> +            tci_write_reg32(t0, t1 << t2);
> +            break;
> +        case INDEX_op_shr_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri32(&tb_ptr);
> +            t2 = tci_read_ri32(&tb_ptr);
> +            tci_write_reg32(t0, t1 >> t2);
> +            break;
> +        case INDEX_op_sar_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri32(&tb_ptr);
> +            t2 = tci_read_ri32(&tb_ptr);
> +            tci_write_reg32(t0, ((int32_t)t1 >> t2));
> +            break;
> +#if TCG_TARGET_HAS_rot_i32
> +        case INDEX_op_rotl_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri32(&tb_ptr);
> +            t2 = tci_read_ri32(&tb_ptr);
> +            tci_write_reg32(t0, (t1 << t2) | (t1 >> (32 - t2)));
> +            break;
> +        case INDEX_op_rotr_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri32(&tb_ptr);
> +            t2 = tci_read_ri32(&tb_ptr);
> +            tci_write_reg32(t0, (t1 >> t2) | (t1 << (32 - t2)));
> +            break;
> +#endif
> +        case INDEX_op_brcond_i32:
> +            t0 = tci_read_r32(&tb_ptr);
> +            t1 = tci_read_ri32(&tb_ptr);
> +            condition = *tb_ptr++;
> +            label = tci_read_label(&tb_ptr);
> +            if (tci_compare32(t0, t1, condition)) {
> +                assert(tb_ptr == old_code_ptr + op_size);
> +                tb_ptr = (uint8_t *)label;
> +                continue;
> +            }
> +            break;
> +#if TCG_TARGET_REG_BITS == 32
> +        case INDEX_op_add2_i32:
> +            t0 = *tb_ptr++;
> +            t1 = *tb_ptr++;
> +            u64 = tci_read_r64(&tb_ptr);
> +            u64 += tci_read_r64(&tb_ptr);
> +            tci_write_reg64(t1, t0, u64);
> +            break;
> +        case INDEX_op_sub2_i32:
> +            t0 = *tb_ptr++;
> +            t1 = *tb_ptr++;
> +            u64 = tci_read_r64(&tb_ptr);
> +            u64 -= tci_read_r64(&tb_ptr);
> +            tci_write_reg64(t1, t0, u64);
> +            break;
> +        case INDEX_op_brcond2_i32:
> +            u64 = tci_read_r64(&tb_ptr);
> +            v64 = tci_read_ri64(&tb_ptr);
> +            condition = *tb_ptr++;
> +            label = tci_read_label(&tb_ptr);
> +            if (tci_compare64(u64, v64, condition)) {
> +                assert(tb_ptr == old_code_ptr + op_size);
> +                tb_ptr = (uint8_t *)label;
> +                continue;
> +            }
> +            break;
> +        case INDEX_op_mulu2_i32:
> +            t0 = *tb_ptr++;
> +            t1 = *tb_ptr++;
> +            t2 = tci_read_r32(&tb_ptr);
> +            u64 = tci_read_r32(&tb_ptr);
> +            tci_write_reg64(t1, t0, t2 * u64);
> +            break;
> +#endif /* TCG_TARGET_REG_BITS == 32 */
> +#if TCG_TARGET_HAS_ext8s_i32
> +        case INDEX_op_ext8s_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r8s(&tb_ptr);
> +            tci_write_reg32(t0, t1);
> +            break;
> +#endif
> +#if TCG_TARGET_HAS_ext16s_i32
> +        case INDEX_op_ext16s_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r16s(&tb_ptr);
> +            tci_write_reg32(t0, t1);
> +            break;
> +#endif
> +#if TCG_TARGET_HAS_ext8u_i32
> +        case INDEX_op_ext8u_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r8(&tb_ptr);
> +            tci_write_reg32(t0, t1);
> +            break;
> +#endif
> +#if TCG_TARGET_HAS_ext16u_i32
> +        case INDEX_op_ext16u_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r16(&tb_ptr);
> +            tci_write_reg32(t0, t1);
> +            break;
> +#endif
> +#if TCG_TARGET_HAS_bswap16_i32
> +        case INDEX_op_bswap16_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r16(&tb_ptr);
> +            tci_write_reg32(t0, bswap16(t1));
> +            break;
> +#endif
> +#if TCG_TARGET_HAS_bswap32_i32
> +        case INDEX_op_bswap32_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r32(&tb_ptr);
> +            tci_write_reg32(t0, bswap32(t1));
> +            break;
> +#endif
> +#if TCG_TARGET_HAS_not_i32
> +        case INDEX_op_not_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r32(&tb_ptr);
> +            tci_write_reg32(t0, ~t1);
> +            break;
> +#endif
> +#if TCG_TARGET_HAS_neg_i32
> +        case INDEX_op_neg_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r32(&tb_ptr);
> +            tci_write_reg32(t0, -t1);
> +            break;
> +#endif
> +#if TCG_TARGET_REG_BITS == 64
> +        case INDEX_op_mov_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r64(&tb_ptr);
> +            tci_write_reg64(t0, t1);
> +            break;
> +        case INDEX_op_movi_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_i64(&tb_ptr);
> +            tci_write_reg64(t0, t1);
> +            break;
> +    /* Load/store operations. */
> +        case INDEX_op_ld8u_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r(&tb_ptr);
> +            t2 = tci_read_i32(&tb_ptr);
> +            tci_write_reg8(t0, *(uint8_t *)(t1 + t2));
> +            break;
> +        case INDEX_op_ld8s_i64:
> +        case INDEX_op_ld16u_i64:
> +        case INDEX_op_ld16s_i64:
> +            TODO();
> +            break;
> +        case INDEX_op_ld32u_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r(&tb_ptr);
> +            t2 = tci_read_i32(&tb_ptr);
> +            tci_write_reg32(t0, *(uint32_t *)(t1 + t2));
> +            break;
> +        case INDEX_op_ld32s_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r(&tb_ptr);
> +            t2 = tci_read_i32(&tb_ptr);
> +            tci_write_reg32s(t0, *(int32_t *)(t1 + t2));
> +            break;
> +        case INDEX_op_ld_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r(&tb_ptr);
> +            t2 = tci_read_i32(&tb_ptr);
> +            tci_write_reg64(t0, *(uint64_t *)(t1 + t2));
> +            break;
> +        case INDEX_op_st8_i64:
> +            t0 = tci_read_r8(&tb_ptr);
> +            t1 = tci_read_r(&tb_ptr);
> +            t2 = tci_read_i32(&tb_ptr);
> +            *(uint8_t *)(t1 + t2) = t0;
> +            break;
> +        case INDEX_op_st16_i64:
> +            t0 = tci_read_r16(&tb_ptr);
> +            t1 = tci_read_r(&tb_ptr);
> +            t2 = tci_read_i32(&tb_ptr);
> +            *(uint16_t *)(t1 + t2) = t0;
> +            break;
> +        case INDEX_op_st32_i64:
> +            t0 = tci_read_r32(&tb_ptr);
> +            t1 = tci_read_r(&tb_ptr);
> +            t2 = tci_read_i32(&tb_ptr);
> +            *(uint32_t *)(t1 + t2) = t0;
> +            break;
> +        case INDEX_op_st_i64:
> +            t0 = tci_read_r64(&tb_ptr);
> +            t1 = tci_read_r(&tb_ptr);
> +            t2 = tci_read_i32(&tb_ptr);
> +            *(uint64_t *)(t1 + t2) = t0;
> +            break;
> +    /* Arithmetic operations. */
> +        case INDEX_op_add_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri64(&tb_ptr);
> +            t2 = tci_read_ri64(&tb_ptr);
> +            tci_write_reg64(t0, t1 + t2);
> +            break;
> +        case INDEX_op_sub_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri64(&tb_ptr);
> +            t2 = tci_read_ri64(&tb_ptr);
> +            tci_write_reg64(t0, t1 - t2);
> +            break;
> +        case INDEX_op_mul_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri64(&tb_ptr);
> +            t2 = tci_read_ri64(&tb_ptr);
> +            tci_write_reg64(t0, t1 * t2);
> +            break;
> +#if TCG_TARGET_HAS_div_i64
> +        case INDEX_op_div_i64:
> +        case INDEX_op_divu_i64:
> +        case INDEX_op_rem_i64:
> +        case INDEX_op_remu_i64:
> +            TODO();
> +            break;
> +#elif TCG_TARGET_HAS_div2_i64
> +        case INDEX_op_div2_i64:
> +        case INDEX_op_divu2_i64:
> +            TODO();
> +            break;
> +#endif
> +        case INDEX_op_and_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri64(&tb_ptr);
> +            t2 = tci_read_ri64(&tb_ptr);
> +            tci_write_reg64(t0, t1 & t2);
> +            break;
> +        case INDEX_op_or_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri64(&tb_ptr);
> +            t2 = tci_read_ri64(&tb_ptr);
> +            tci_write_reg64(t0, t1 | t2);
> +            break;
> +        case INDEX_op_xor_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri64(&tb_ptr);
> +            t2 = tci_read_ri64(&tb_ptr);
> +            tci_write_reg64(t0, t1 ^ t2);
> +            break;
> +    /* Shift/rotate operations. */
> +        case INDEX_op_shl_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri64(&tb_ptr);
> +            t2 = tci_read_ri64(&tb_ptr);
> +            tci_write_reg64(t0, t1 << t2);
> +            break;
> +        case INDEX_op_shr_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri64(&tb_ptr);
> +            t2 = tci_read_ri64(&tb_ptr);
> +            tci_write_reg64(t0, t1 >> t2);
> +            break;
> +        case INDEX_op_sar_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_ri64(&tb_ptr);
> +            t2 = tci_read_ri64(&tb_ptr);
> +            tci_write_reg64(t0, ((int64_t)t1 >> t2));
> +            break;
> +#if TCG_TARGET_HAS_rot_i64
> +        case INDEX_op_rotl_i64:
> +        case INDEX_op_rotr_i64:
> +            TODO();
> +            break;
> +#endif
> +        case INDEX_op_brcond_i64:
> +            t0 = tci_read_r64(&tb_ptr);
> +            t1 = tci_read_ri64(&tb_ptr);
> +            condition = *tb_ptr++;
> +            label = tci_read_label(&tb_ptr);
> +            if (tci_compare64(t0, t1, condition)) {
> +                assert(tb_ptr == old_code_ptr + op_size);
> +                tb_ptr = (uint8_t *)label;
> +                continue;
> +            }
> +            break;
> +#if TCG_TARGET_HAS_ext8u_i64
> +        case INDEX_op_ext8u_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r8(&tb_ptr);
> +            tci_write_reg64(t0, t1);
> +            break;
> +#endif
> +#if TCG_TARGET_HAS_ext8s_i64
> +        case INDEX_op_ext8s_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r8s(&tb_ptr);
> +            tci_write_reg64(t0, t1);
> +            break;
> +#endif
> +#if TCG_TARGET_HAS_ext16s_i64
> +        case INDEX_op_ext16s_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r16s(&tb_ptr);
> +            tci_write_reg64(t0, t1);
> +            break;
> +#endif
> +#if TCG_TARGET_HAS_ext16u_i64
> +        case INDEX_op_ext16u_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r16(&tb_ptr);
> +            tci_write_reg64(t0, t1);
> +            break;
> +#endif
> +#if TCG_TARGET_HAS_ext32s_i64
> +        case INDEX_op_ext32s_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r32s(&tb_ptr);
> +            tci_write_reg64(t0, t1);
> +            break;
> +#endif
> +#if TCG_TARGET_HAS_ext32u_i64
> +        case INDEX_op_ext32u_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r32(&tb_ptr);
> +            tci_write_reg64(t0, t1);
> +            break;
> +#endif
> +#if TCG_TARGET_HAS_bswap16_i64
> +        case INDEX_op_bswap16_i64:
> +            TODO();
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r16(&tb_ptr);
> +            tci_write_reg64(t0, bswap16(t1));
> +            break;
> +#endif
> +#if TCG_TARGET_HAS_bswap32_i64
> +        case INDEX_op_bswap32_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r32(&tb_ptr);
> +            tci_write_reg64(t0, bswap32(t1));
> +            break;
> +#endif
> +#if TCG_TARGET_HAS_bswap64_i64
> +        case INDEX_op_bswap64_i64:
> +            TODO();
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r64(&tb_ptr);
> +            tci_write_reg64(t0, bswap64(t1));
> +            break;
> +#endif
> +#if TCG_TARGET_HAS_not_i64
> +        case INDEX_op_not_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r64(&tb_ptr);
> +            tci_write_reg64(t0, ~t1);
> +            break;
> +#endif
> +#if TCG_TARGET_HAS_neg_i64
> +        case INDEX_op_neg_i64:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r64(&tb_ptr);
> +            tci_write_reg64(t0, -t1);
> +            break;
> +#endif
> +#endif /* TCG_TARGET_REG_BITS == 64 */
> +    /* QEMU specific */
> +#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
> +        case INDEX_op_debug_insn_start:
> +            TODO();
> +            break;
> +#else
> +        case INDEX_op_debug_insn_start:
> +            TODO();
> +            break;
> +#endif
> +        case INDEX_op_exit_tb:
> +            next_tb = *(uint64_t *)tb_ptr;
> +            goto exit;
> +            break;
> +        case INDEX_op_goto_tb:
> +            t0 = tci_read_i32(&tb_ptr);
> +            assert(tb_ptr == old_code_ptr + op_size);
> +            tb_ptr += (int32_t)t0;
> +            continue;
> +        case INDEX_op_qemu_ld8u:
> +            t0 = *tb_ptr++;
> +            taddr = tci_read_ulong(&tb_ptr);
> +#ifdef CONFIG_SOFTMMU
> +            u8 = __ldb_mmu(taddr, tci_read_i(&tb_ptr));
> +#else
> +            host_addr = (tcg_target_ulong)taddr;
> +            assert(taddr == host_addr);
> +            u8 = *(uint8_t *)(host_addr + GUEST_BASE);
> +#endif
> +            tci_write_reg8(t0, u8);
> +            break;
> +        case INDEX_op_qemu_ld8s:
> +            t0 = *tb_ptr++;
> +            taddr = tci_read_ulong(&tb_ptr);
> +#ifdef CONFIG_SOFTMMU
> +            u8 = __ldb_mmu(taddr, tci_read_i(&tb_ptr));
> +#else
> +            host_addr = (tcg_target_ulong)taddr;
> +            assert(taddr == host_addr);
> +            u8 = *(uint8_t *)(host_addr + GUEST_BASE);
> +#endif
> +            tci_write_reg8s(t0, u8);
> +            break;
> +        case INDEX_op_qemu_ld16u:
> +            t0 = *tb_ptr++;
> +            taddr = tci_read_ulong(&tb_ptr);
> +#ifdef CONFIG_SOFTMMU
> +            u16 = __ldw_mmu(taddr, tci_read_i(&tb_ptr));
> +#else
> +            host_addr = (tcg_target_ulong)taddr;
> +            assert(taddr == host_addr);
> +            u16 = tswap16(*(uint16_t *)(host_addr + GUEST_BASE));
> +#endif
> +            tci_write_reg16(t0, u16);
> +            break;
> +        case INDEX_op_qemu_ld16s:
> +            t0 = *tb_ptr++;
> +            taddr = tci_read_ulong(&tb_ptr);
> +#ifdef CONFIG_SOFTMMU
> +            u16 = __ldw_mmu(taddr, tci_read_i(&tb_ptr));
> +#else
> +            host_addr = (tcg_target_ulong)taddr;
> +            assert(taddr == host_addr);
> +            u16 = tswap16(*(uint16_t *)(host_addr + GUEST_BASE));
> +#endif
> +            tci_write_reg16s(t0, u16);
> +            break;
> +#if TCG_TARGET_REG_BITS == 64
> +        case INDEX_op_qemu_ld32u:
> +            t0 = *tb_ptr++;
> +            taddr = tci_read_ulong(&tb_ptr);
> +#ifdef CONFIG_SOFTMMU
> +            u32 = __ldl_mmu(taddr, tci_read_i(&tb_ptr));
> +#else
> +            host_addr = (tcg_target_ulong)taddr;
> +            assert(taddr == host_addr);
> +            u32 = tswap32(*(uint32_t *)(host_addr + GUEST_BASE));
> +#endif
> +            tci_write_reg32(t0, u32);
> +            break;
> +        case INDEX_op_qemu_ld32s:
> +            t0 = *tb_ptr++;
> +            taddr = tci_read_ulong(&tb_ptr);
> +#ifdef CONFIG_SOFTMMU
> +            u32 = __ldl_mmu(taddr, tci_read_i(&tb_ptr));
> +#else
> +            host_addr = (tcg_target_ulong)taddr;
> +            assert(taddr == host_addr);
> +            u32 = tswap32(*(uint32_t *)(host_addr + GUEST_BASE));
> +#endif
> +            tci_write_reg32s(t0, u32);
> +            break;
> +#endif /* TCG_TARGET_REG_BITS == 64 */
> +        case INDEX_op_qemu_ld32:
> +            t0 = *tb_ptr++;
> +            taddr = tci_read_ulong(&tb_ptr);
> +#ifdef CONFIG_SOFTMMU
> +            u32 = __ldl_mmu(taddr, tci_read_i(&tb_ptr));
> +#else
> +            host_addr = (tcg_target_ulong)taddr;
> +            assert(taddr == host_addr);
> +            u32 = tswap32(*(uint32_t *)(host_addr + GUEST_BASE));
> +#endif
> +            tci_write_reg32(t0, u32);
> +            break;
> +        case INDEX_op_qemu_ld64:
> +            t0 = *tb_ptr++;
> +#if TCG_TARGET_REG_BITS == 32
> +            t1 = *tb_ptr++;
> +#endif
> +            taddr = tci_read_ulong(&tb_ptr);
> +#ifdef CONFIG_SOFTMMU
> +            u64 = __ldq_mmu(taddr, tci_read_i(&tb_ptr));
> +#else
> +            host_addr = (tcg_target_ulong)taddr;
> +            assert(taddr == host_addr);
> +            u64 = tswap64(*(uint64_t *)(host_addr + GUEST_BASE));
> +#endif
> +            tci_write_reg(t0, u64);
> +#if TCG_TARGET_REG_BITS == 32
> +            tci_write_reg(t1, u64 >> 32);
> +#endif
> +            break;
> +        case INDEX_op_qemu_st8:
> +            t0 = tci_read_r8(&tb_ptr);
> +            taddr = tci_read_ulong(&tb_ptr);
> +#ifdef CONFIG_SOFTMMU
> +            t2 = tci_read_i(&tb_ptr);
> +            __stb_mmu(taddr, t0, t2);
> +#else
> +            host_addr = (tcg_target_ulong)taddr;
> +            assert(taddr == host_addr);
> +            *(uint8_t *)(host_addr + GUEST_BASE) = t0;
> +#endif
> +            break;
> +        case INDEX_op_qemu_st16:
> +            t0 = tci_read_r16(&tb_ptr);
> +            taddr = tci_read_ulong(&tb_ptr);
> +#ifdef CONFIG_SOFTMMU
> +            t2 = tci_read_i(&tb_ptr);
> +            __stw_mmu(taddr, t0, t2);
> +#else
> +            host_addr = (tcg_target_ulong)taddr;
> +            assert(taddr == host_addr);
> +            *(uint16_t *)(host_addr + GUEST_BASE) = tswap16(t0);
> +#endif
> +            break;
> +        case INDEX_op_qemu_st32:
> +            t0 = tci_read_r32(&tb_ptr);
> +            taddr = tci_read_ulong(&tb_ptr);
> +#ifdef CONFIG_SOFTMMU
> +            t2 = tci_read_i(&tb_ptr);
> +            __stl_mmu(taddr, t0, t2);
> +#else
> +            host_addr = (tcg_target_ulong)taddr;
> +            assert(taddr == host_addr);
> +            *(uint32_t *)(host_addr + GUEST_BASE) = tswap32(t0);
> +#endif
> +            break;
> +        case INDEX_op_qemu_st64:
> +            u64 = tci_read_r64(&tb_ptr);
> +            taddr = tci_read_ulong(&tb_ptr);
> +#ifdef CONFIG_SOFTMMU
> +            t2 = tci_read_i(&tb_ptr);
> +            __stq_mmu(taddr, u64, t2);
> +#else
> +            host_addr = (tcg_target_ulong)taddr;
> +            assert(taddr == host_addr);
> +            *(uint64_t *)(host_addr + GUEST_BASE) = tswap64(u64);
> +#endif
> +            break;
> +        default:
> +            TODO();
> +            break;
> +        }
> +        assert(tb_ptr == old_code_ptr + op_size);
> +    }
> +exit:
> +    return next_tb;
> +}
> --
> 1.7.2.5
>
>
>
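
The two 64-bit rotate cases in the quoted patch (INDEX_op_rotl_i64, INDEX_op_rotr_i64) are still TODO(). Below is a minimal sketch of how they could be computed in portable C. It assumes the rotate count is reduced modulo 64, which the patch does not state. The helper names tci_rotl64 and tci_rotr64 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for the TODO() rotate cases.
 * Assumption: the rotate count is taken modulo 64. Masking first and
 * special-casing 0 keeps every shift in the range 1..63, because
 * shifting a 64-bit value by 64 or more bits is undefined in C. */
static inline uint64_t tci_rotl64(uint64_t value, unsigned int count)
{
    count &= 63;
    if (count == 0) {
        return value;
    }
    return (value << count) | (value >> (64 - count));
}

static inline uint64_t tci_rotr64(uint64_t value, unsigned int count)
{
    count &= 63;
    if (count == 0) {
        return value;
    }
    return (value >> count) | (value << (64 - count));
}
```

A case in the interpreter loop could then follow the same operand pattern as the shift cases, for example tci_write_reg64(t0, tci_rotl64(t1, t2)).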

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
  2011-09-17 19:59 [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine) Stefan Weil
                   ` (7 preceding siblings ...)
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 8/8] ppc: Support tcg interpreter on ppc hosts Stefan Weil
@ 2011-09-18 10:26 ` Blue Swirl
  2011-09-18 10:49   ` malc
  2011-09-18 15:02 ` Mulyadi Santosa
  2011-09-18 18:02 ` Avi Kivity
  10 siblings, 1 reply; 48+ messages in thread
From: Blue Swirl @ 2011-09-18 10:26 UTC (permalink / raw)
  To: Stefan Weil; +Cc: TeLeMan, Stuart Brady, QEMU Developers

On Sat, Sep 17, 2011 at 7:59 PM, Stefan Weil <weil@mail.berlios.de> wrote:
> Hello,
>
> these patches add a new code generator (TCG target) to qemu.
>
> Unlike other tcg target code generators, this one does not generate
> machine code for some cpu. It generates machine independent bytecode
> which is interpreted later. That's why I called it TCI (tiny code
> interpreter).
>
> I wrote most of the code two years ago and included feedback and
> contributions from several QEMU developers, notably TeleMan,
> Stuart Brady, Blue Swirl and Malc. See the history here:
> http://lists.nongnu.org/archive/html/qemu-devel/2009-09/msg01710.html
>
> Since that time, I used TCI regularly, added small fixes and improvements
> and rebased it to latest QEMU. Some versions were tested using
> ARM (emulated and real), PowerPC (emulated) and MIPS (emulated) hosts,
> but normally I run it on i386 and x86_64 hosts.
>
> I'd appreciate to see TCI in QEMU 1.0.
>
> Regards,
> Stefan Weil
>
> The patches 2 and 4 are optional, patch 8 is only needed for running
> TCI on a PowerPC host.

I think patches 1 to 4 and 8 could be applied soon as they are now;
they should benefit plain TCG too. I had some comments on the other
patches, but otherwise everything looks great.

Comparisons to other bytecode interpreters (for example Python) would
be interesting; maybe there are also tricks that can be reused.

> [PATCH 1/8] tcg: Declare TCG_TARGET_REG_BITS in tcg.h
> [PATCH 2/8] tcg: Don't declare TCG_TARGET_REG_BITS in tcg-target.h
> [PATCH 3/8] tcg: Add forward declarations for local functions
> [PATCH 4/8] tcg: Add some assertions
> [PATCH 5/8] tcg: Add interpreter for bytecode
> [PATCH 6/8] tcg: Add bytecode generator for tcg interpreter
> [PATCH 7/8] tcg: Add tcg interpreter to configure / make
> [PATCH 8/8] ppc: Support tcg interpreter on ppc hosts
>
>

* Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
  2011-09-18 10:26 ` [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine) Blue Swirl
@ 2011-09-18 10:49   ` malc
  2011-09-18 12:12     ` Blue Swirl
  0 siblings, 1 reply; 48+ messages in thread
From: malc @ 2011-09-18 10:49 UTC (permalink / raw)
  To: Blue Swirl; +Cc: Stuart Brady, QEMU Developers, TeLeMan

On Sun, 18 Sep 2011, Blue Swirl wrote:

> On Sat, Sep 17, 2011 at 7:59 PM, Stefan Weil <weil@mail.berlios.de> wrote:
> > [..snip..]
> 
> I think patches 1 to 4 and 8 could be applied soon as they are now,
> they should benefit plain TCG too. I had some comments to other
> patches, but otherwise everything looks great.

Hold your horses until Stefan settles the licensing issues.

-- 
mailto:av1474@comtv.ru

* Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
  2011-09-18 10:49   ` malc
@ 2011-09-18 12:12     ` Blue Swirl
  2011-09-18 12:46       ` malc
  0 siblings, 1 reply; 48+ messages in thread
From: Blue Swirl @ 2011-09-18 12:12 UTC (permalink / raw)
  To: malc; +Cc: Stuart Brady, QEMU Developers, TeLeMan

On Sun, Sep 18, 2011 at 10:49 AM, malc <av1474@comtv.ru> wrote:
> On Sun, 18 Sep 2011, Blue Swirl wrote:
>
>> On Sat, Sep 17, 2011 at 7:59 PM, Stefan Weil <weil@mail.berlios.de> wrote:
>> > [..snip..]
>>
>> I think patches 1 to 4 and 8 could be applied soon as they are now,
>> they should benefit plain TCG too. I had some comments to other
>> patches, but otherwise everything looks great.
>
> Hold the horses untill Stefan settles the licensing issues.

Which issues? For which patches?

* Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
  2011-09-18 12:12     ` Blue Swirl
@ 2011-09-18 12:46       ` malc
  2011-09-18 13:00         ` Blue Swirl
  0 siblings, 1 reply; 48+ messages in thread
From: malc @ 2011-09-18 12:46 UTC (permalink / raw)
  To: Blue Swirl; +Cc: Stuart Brady, QEMU Developers, TeLeMan

On Sun, 18 Sep 2011, Blue Swirl wrote:

> On Sun, Sep 18, 2011 at 10:49 AM, malc <av1474@comtv.ru> wrote:
> > On Sun, 18 Sep 2011, Blue Swirl wrote:
> >
> >> On Sat, Sep 17, 2011 at 7:59 PM, Stefan Weil <weil@mail.berlios.de> wrote:
> >> > [..snip..]
> >>
> >> I think patches 1 to 4 and 8 could be applied soon as they are now,
> >> they should benefit plain TCG too. I had some comments to other
> >> patches, but otherwise everything looks great.
> >
> > Hold the horses untill Stefan settles the licensing issues.
> 
> Which issues? For which patches?
> 

Read tcg/LICENSE.

-- 
mailto:av1474@comtv.ru

* Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
  2011-09-18 12:46       ` malc
@ 2011-09-18 13:00         ` Blue Swirl
  2011-09-18 13:13           ` malc
  2011-09-25 20:37           ` Stefan Weil
  0 siblings, 2 replies; 48+ messages in thread
From: Blue Swirl @ 2011-09-18 13:00 UTC (permalink / raw)
  To: malc; +Cc: Stuart Brady, QEMU Developers, TeLeMan

On Sun, Sep 18, 2011 at 12:46 PM, malc <av1474@comtv.ru> wrote:
> On Sun, 18 Sep 2011, Blue Swirl wrote:
>
>> On Sun, Sep 18, 2011 at 10:49 AM, malc <av1474@comtv.ru> wrote:
>> > On Sun, 18 Sep 2011, Blue Swirl wrote:
>> >
>> >> On Sat, Sep 17, 2011 at 7:59 PM, Stefan Weil <weil@mail.berlios.de> wrote:
>> >> > [..snip..]
>> >>
>> >> I think patches 1 to 4 and 8 could be applied soon as they are now,
>> >> they should benefit plain TCG too. I had some comments to other
>> >> patches, but otherwise everything looks great.
>> >
>> > Hold the horses untill Stefan settles the licensing issues.
>>
>> Which issues? For which patches?
>>
>
> Read tcg/LICENSE.

"All the files in this directory and subdirectories are released under
a BSD like license (see header in each file). No other license is
accepted."

The wording of the file should be changed to list the files to which
the BSD-like license applies (and for which no other license is
accepted); the file can't stop us from adding new files under
different licenses.

* Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
  2011-09-18 13:00         ` Blue Swirl
@ 2011-09-18 13:13           ` malc
  2011-09-18 13:26             ` Blue Swirl
  2011-09-25 20:37           ` Stefan Weil
  1 sibling, 1 reply; 48+ messages in thread
From: malc @ 2011-09-18 13:13 UTC (permalink / raw)
  To: Blue Swirl; +Cc: Stuart Brady, QEMU Developers, TeLeMan

On Sun, 18 Sep 2011, Blue Swirl wrote:

> On Sun, Sep 18, 2011 at 12:46 PM, malc <av1474@comtv.ru> wrote:
> > On Sun, 18 Sep 2011, Blue Swirl wrote:
> >

[..snip..]

> 
> "All the files in this directory and subdirectories are released under
> a BSD like license (see header in each file). No other license is
> accepted."
> 
> The wording of the file should be changed to list the files for which
> the BSD like license applies (and for which no other license is
> accepted), the file can't stop us adding new files with different
> licenses.
> 

As I said to Stefan, this should be discussed with Fabrice, not with
me or anyone else.

-- 
mailto:av1474@comtv.ru

* Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
  2011-09-18 13:13           ` malc
@ 2011-09-18 13:26             ` Blue Swirl
  0 siblings, 0 replies; 48+ messages in thread
From: Blue Swirl @ 2011-09-18 13:26 UTC (permalink / raw)
  To: malc; +Cc: Stuart Brady, QEMU Developers, TeLeMan

On Sun, Sep 18, 2011 at 1:13 PM, malc <av1474@comtv.ru> wrote:
> On Sun, 18 Sep 2011, Blue Swirl wrote:
>
>> On Sun, Sep 18, 2011 at 12:46 PM, malc <av1474@comtv.ru> wrote:
>> > On Sun, 18 Sep 2011, Blue Swirl wrote:
>> >
>
> [..snip..]
>
>>
>> "All the files in this directory and subdirectories are released under
>> a BSD like license (see header in each file). No other license is
>> accepted."
>>
>> The wording of the file should be changed to list the files for which
>> the BSD like license applies (and for which no other license is
>> accepted), the file can't stop us adding new files with different
>> licenses.
>>
>
> As i said to Stefan

[citation needed]

> this should be talked over with Fabrice, not me or
> anyone else.

IANAL, but I don't see any problem with adding new files under
different licenses. As the LICENSE file clearly conflicts with this,
it should be changed, but that does not change the license of the
existing files. Each file contains a header stating its license, so
LICENSE is redundant.

* Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
  2011-09-17 19:59 [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine) Stefan Weil
                   ` (8 preceding siblings ...)
  2011-09-18 10:26 ` [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine) Blue Swirl
@ 2011-09-18 15:02 ` Mulyadi Santosa
  2011-09-18 15:13   ` Stefan Weil
  2011-09-18 18:02 ` Avi Kivity
  10 siblings, 1 reply; 48+ messages in thread
From: Mulyadi Santosa @ 2011-09-18 15:02 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

Hi :)

On Sun, Sep 18, 2011 at 02:59, Stefan Weil <weil@mail.berlios.de> wrote:
> Hello,
>
> these patches add a new code generator (TCG target) to qemu.

Congratulations on your hard work. Here's a question from someone
not so familiar with QEMU internals: what is the biggest advantage of
using TCI instead of using TCG directly?


-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com

* Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
  2011-09-18 15:02 ` Mulyadi Santosa
@ 2011-09-18 15:13   ` Stefan Weil
  2011-09-18 16:39     ` Mulyadi Santosa
  2011-09-19  8:40     ` David Gilbert
  0 siblings, 2 replies; 48+ messages in thread
From: Stefan Weil @ 2011-09-18 15:13 UTC (permalink / raw)
  To: Mulyadi Santosa; +Cc: QEMU Developers

Am 18.09.2011 17:02, schrieb Mulyadi Santosa:
> Hi :)
>
> On Sun, Sep 18, 2011 at 02:59, Stefan Weil <weil@mail.berlios.de> wrote:
>> Hello,
>>
>> these patches add a new code generator (TCG target) to qemu.
>
> I personally congrats you for your hard work. So, here's a question
> from who are not so keen with Qemu internals: what is the biggest
> advantage of using TCI instead of directly using TCG?

TCG with native code support is much faster (6x to 10x),
so for emulation on a supported host, TCI has no advantage
for normal users.

Its primary purpose was to support new hosts without a native
TCG backend.

In addition, TCG operations are easier to trace in TCI
than in generated native code, so TCI is really good for
examining code, testing new TCG opcodes, gathering statistics
(I did some with Valgrind, which no longer works because of
coroutines) and testing the influence of TCG parameters
such as the number of available registers. I'm sure there are
even more interesting applications for TCI.

Cheers,
Stefan
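
The statistics Stefan mentions can be illustrated with a minimal sketch: a switch-based dispatch loop that counts how often each opcode executes. The three-opcode bytecode (OP_MOVI, OP_ADD, OP_EXIT), its layout and the op_stats counters are hypothetical and much simpler than TCI's real bytecode. The point is that one hook in the dispatch loop sees every executed operation, which has no direct equivalent in generated native code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy opcodes; TCI's real bytecode is much richer. */
enum { OP_MOVI, OP_ADD, OP_EXIT, OP_COUNT };

/* Per-opcode execution counters (zero-initialized because static). */
static uint64_t op_stats[OP_COUNT];

/* Hypothetical bytecode layout:
 *   OP_MOVI reg imm8   regs[reg] = imm8
 *   OP_ADD  dst a b    regs[dst] = regs[a] + regs[b]
 *   OP_EXIT reg        return regs[reg]
 * Register operands are masked with & 3 so they always index regs[]. */
static uint64_t toy_run_counted(const uint8_t *pc)
{
    uint64_t regs[4] = {0};

    for (;;) {
        uint8_t op = *pc++;
        assert(op < OP_COUNT);       /* reject unknown opcodes */
        op_stats[op]++;              /* the single tracing/statistics hook */
        switch (op) {
        case OP_MOVI:
            regs[pc[0] & 3] = pc[1];
            pc += 2;
            break;
        case OP_ADD:
            regs[pc[0] & 3] = regs[pc[1] & 3] + regs[pc[2] & 3];
            pc += 3;
            break;
        case OP_EXIT:
            return regs[pc[0] & 3];
        }
    }
}
```

Running the program {OP_MOVI, 0, 40, OP_MOVI, 1, 2, OP_ADD, 2, 0, 1, OP_EXIT, 2} returns 42. It leaves op_stats at 2 for OP_MOVI, 1 for OP_ADD and 1 for OP_EXIT.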

* Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
  2011-09-18 15:13   ` Stefan Weil
@ 2011-09-18 16:39     ` Mulyadi Santosa
  2011-09-18 20:15       ` Stefan Weil
  2011-09-19  8:40     ` David Gilbert
  1 sibling, 1 reply; 48+ messages in thread
From: Mulyadi Santosa @ 2011-09-18 16:39 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

Hi Stefan...

On Sun, Sep 18, 2011 at 22:13, Stefan Weil <weil@mail.berlios.de> wrote:
> Its primary purpose was support of new hosts without a native
> TCG.

Thanks for the explanation, I have a better picture now. Still,
an interpreter must be ready to take the bytecode and execute it,
right?

So should that interpreter be built inside QEMU too, or can we
use/write an external one? Say we write one in Python and TCI
passes the generated bytecode via a UNIX socket to the listening
Python script: is that doable, or one of the goals of your design?


-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com

* Re: [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode
  2011-09-18  7:22       ` Paolo Bonzini
@ 2011-09-18 17:54         ` Avi Kivity
  2011-09-19  6:52           ` Andi Kleen
  0 siblings, 1 reply; 48+ messages in thread
From: Avi Kivity @ 2011-09-18 17:54 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Andi Kleen, QEMU Developers

On 09/18/2011 10:22 AM, Paolo Bonzini wrote:
> On 09/18/2011 07:49 AM, Stefan Weil wrote:
>> Is there really any difference in the generated code?
>> gcc already uses a jump table internally to handle the
>> switch cases.
>
> You typically save something on range checks, and it enables a lot 
> more tricks for use later (e.g. using multiple jump tables to perform 
> simple peephole optimizations, or to divert code execution on 
> interrupts).

I think it also improves branch target prediction: if you have a tight
loop of a few opcodes, the predictor can guess where you're headed (since
there is a separate lookup key for each opcode), whereas with the
original code there's a single key, which cannot be used to predict the
branch target.

-- 
error compiling committee.c: too many arguments to function
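
The difference discussed here can be sketched with threaded dispatch using GCC's labels-as-values extension (&&label and goto *ptr, which Clang also accepts). The toy opcodes and bytecode layout below are hypothetical and unrelated to the TCI patch. The sketch only shows the structural change from a switch: every handler ends with its own indirect jump instead of all opcodes sharing one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy opcodes, not TCI's real bytecode. */
enum { OP_MOVI, OP_ADD, OP_EXIT };

/* Threaded dispatch via GCC "labels as values". Each handler ends with
 * its own DISPATCH(), so each has a separate indirect jump (and thus a
 * separate branch-predictor entry) instead of the single shared jump a
 * switch statement compiles to. */
static uint64_t run_threaded(const uint8_t *pc)
{
    static void *const dispatch[] = {
        [OP_MOVI] = &&do_movi,
        [OP_ADD]  = &&do_add,
        [OP_EXIT] = &&do_exit,
    };
    uint64_t regs[4] = {0};      /* register operands are masked with & 3 */

#define DISPATCH() do { assert(*pc <= OP_EXIT); goto *dispatch[*pc++]; } while (0)

    DISPATCH();

do_movi:                         /* OP_MOVI reg imm8 */
    regs[pc[0] & 3] = pc[1];
    pc += 2;
    DISPATCH();
do_add:                          /* OP_ADD dst a b */
    regs[pc[0] & 3] = regs[pc[1] & 3] + regs[pc[2] & 3];
    pc += 3;
    DISPATCH();
do_exit:                         /* OP_EXIT reg */
    return regs[pc[0] & 3];
#undef DISPATCH
}
```

Whether the compiler keeps one indirect jump per handler depends on the compiler version and optimization flags. If it merges them back into a single jump, the prediction benefit is lost.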

* Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
  2011-09-17 19:59 [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine) Stefan Weil
                   ` (9 preceding siblings ...)
  2011-09-18 15:02 ` Mulyadi Santosa
@ 2011-09-18 18:02 ` Avi Kivity
  10 siblings, 0 replies; 48+ messages in thread
From: Avi Kivity @ 2011-09-18 18:02 UTC (permalink / raw)
  To: Stefan Weil; +Cc: Blue Swirl, TeLeMan, Stuart Brady, QEMU Developers

On 09/17/2011 10:59 PM, Stefan Weil wrote:
> Hello,
>
> these patches add a new code generator (TCG target) to qemu.
>
> Unlike other tcg target code generators, this one does not generate
> machine code for some cpu. It generates machine independent bytecode
> which is interpreted later. That's why I called it TCI (tiny code
> interpreter).
>
> I wrote most of the code two years ago and included feedback and
> contributions from several QEMU developers, notably TeleMan,
> Stuart Brady, Blue Swirl and Malc. See the history here:
> http://lists.nongnu.org/archive/html/qemu-devel/2009-09/msg01710.html
>
> Since that time, I used TCI regularly, added small fixes and improvements
> and rebased it to latest QEMU. Some versions were tested using
> ARM (emulated and real), PowerPC (emulated) and MIPS (emulated) hosts,
> but normally I run it on i386 and x86_64 hosts.
>
> I'd appreciate to see TCI in QEMU 1.0.

Next: a gcc target for (and a port of Linux to) TCI, so we can run guests
with TCG disabled.

-- 
error compiling committee.c: too many arguments to function

* Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
  2011-09-18 16:39     ` Mulyadi Santosa
@ 2011-09-18 20:15       ` Stefan Weil
  2011-09-19 15:14         ` Mulyadi Santosa
  0 siblings, 1 reply; 48+ messages in thread
From: Stefan Weil @ 2011-09-18 20:15 UTC (permalink / raw)
  To: Mulyadi Santosa; +Cc: QEMU Developers

Am 18.09.2011 18:39, schrieb Mulyadi Santosa:
> Hi Stefan...
>
> On Sun, Sep 18, 2011 at 22:13, Stefan Weil <weil@mail.berlios.de> wrote:
>> Its primary purpose was support of new hosts without a native
>> TCG.
>
> Thanks for the explanation, I got better picture now. However, still,
> an interpreter must be ready to grab the bytecode and execute it,
> right?
>
> So, that interpreter, should it be build inside Qemu too? Or can we
> use/write external one? let's say creating one in python and TCI
> passes the generated bytecode via UNIX socket to the listening Python
> script, is that doable or one of the goal your design?

Do you think of something like http://bellard.org/jslinux/?

The current interpreter is built inside QEMU, and I'm afraid
that separating code generator and interpreter in different
processes might be a lot of work. Maybe running both in
separate threads would be possible, so the code generator
could prepare new bytecode while the interpreter is still
running the previous one.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode
  2011-09-18 17:54         ` Avi Kivity
@ 2011-09-19  6:52           ` Andi Kleen
  2011-09-19 11:56             ` Avi Kivity
  0 siblings, 1 reply; 48+ messages in thread
From: Andi Kleen @ 2011-09-19  6:52 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Paolo Bonzini, Andi Kleen, QEMU Developers


> I think it also improves branch target prediction - if you have a tight
> loop of a few opcodes the predictor can guess where you're headed (since
> there is a separate lookup key for each opcode), whereas with the
> original code, there's a single key which cannot be used to predict the
> branch target.

At least usually. I caught at least one version of gcc CSE-ing the jump
instruction into a common location in one of my interpreters
:-( Not all versions do it, though, and I hope gcc gets fixed.

It's still faster for other reasons usually.
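The dispatch style discussed here can be sketched with GNU C "labels as values" (computed `goto`), which gcc and clang both accept. This is a hypothetical toy interpreter, not TCI's code: the opcodes and `run_threaded()` are made up for illustration. Each handler ends in its own indirect jump, so the predictor can keep a separate history per opcode, unless the compiler merges those jumps as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy opcodes, not TCI's real opcode set. */
enum { OP_INC, OP_DEC, OP_JNZ, OP_HALT };

/* Threaded dispatch (GNU C "labels as values"): every handler ends with
   its own indirect jump instead of returning to one shared switch. */
static int64_t run_threaded(const int32_t *code, int64_t acc)
{
    static void *const dispatch[] = {
        [OP_INC] = &&do_inc, [OP_DEC] = &&do_dec,
        [OP_JNZ] = &&do_jnz, [OP_HALT] = &&do_halt,
    };
    const int32_t *pc = code;

#define NEXT() goto *dispatch[*pc++]
    NEXT();

do_inc:
    acc++;
    NEXT();
do_dec:
    acc--;
    NEXT();
do_jnz:                       /* operand: absolute index of jump target */
    if (acc != 0) {
        pc = code + *pc;
    } else {
        pc++;                 /* skip the operand */
    }
    NEXT();
do_halt:
#undef NEXT
    return acc;
}
```

With a plain `switch (*pc++)` inside a loop, compilers usually emit a single shared indirect jump instead, which is the "single key" case quoted above.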

-Andi

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
  2011-09-18 15:13   ` Stefan Weil
  2011-09-18 16:39     ` Mulyadi Santosa
@ 2011-09-19  8:40     ` David Gilbert
  2011-09-19 10:20       ` Stefan Hajnoczi
  1 sibling, 1 reply; 48+ messages in thread
From: David Gilbert @ 2011-09-19  8:40 UTC (permalink / raw)
  To: Stefan Weil; +Cc: Mulyadi Santosa, QEMU Developers

On 18 September 2011 16:13, Stefan Weil <weil@mail.berlios.de> wrote:
> Am 18.09.2011 17:02, schrieb Mulyadi Santosa:
>>
>> Hi :)
>>
>> On Sun, Sep 18, 2011 at 02:59, Stefan Weil <weil@mail.berlios.de> wrote:
>>>
>>> Hello,
>>>
>>> these patches add a new code generator (TCG target) to qemu.
>>
>> I personally congrats you for your hard work. So, here's a question
>> from who are not so keen with Qemu internals: what is the biggest
>> advantage of using TCI instead of directly using TCG?
>
> TCG with native code support is much faster (6x to 10x),
> so for emulation on a supported host, TCI has no advantage
> for normal users.

Is it possible to dynamically switch between the two?

The two cases I'm thinking of are:
  1) Using the interpreter to execute one or two instructions in an exception
handling case
  2) Avoiding TCG code generation on the first few runs of a piece of
code that might only be init code, and only bothering with TCG for hotter
code.
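A minimal sketch of the second case, assuming a per-block execution counter: blocks are interpreted while cold and translated once they become hot. `HOT_THRESHOLD`, `struct block` and the stub back ends are hypothetical; nothing like this exists in the patches under discussion.

```c
#include <assert.h>
#include <stdbool.h>

#define HOT_THRESHOLD 2   /* hypothetical: interpreted runs before translating */

struct block {
    unsigned exec_count;  /* how often this guest block has run */
    bool translated;      /* is native code available? */
};

/* Counters for the stub back ends, so the policy can be observed. */
static unsigned interp_runs, native_runs, translations;

static void interpret_block(struct block *b) { (void)b; interp_runs++; }
static void translate_block(struct block *b) { b->translated = true; translations++; }
static void run_native(struct block *b)      { (void)b; native_runs++; }

/* Interpret cold blocks; translate once a block crosses the threshold. */
static void execute_block(struct block *b)
{
    if (!b->translated && ++b->exec_count > HOT_THRESHOLD) {
        translate_block(b);
    }
    if (b->translated) {
        run_native(b);
    } else {
        interpret_block(b);
    }
}
```

Init code that runs once never pays the translation cost; a block run five times is interpreted twice, translated once, and then runs natively.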

Dave

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
  2011-09-19  8:40     ` David Gilbert
@ 2011-09-19 10:20       ` Stefan Hajnoczi
  2011-09-19 10:27         ` David Gilbert
  0 siblings, 1 reply; 48+ messages in thread
From: Stefan Hajnoczi @ 2011-09-19 10:20 UTC (permalink / raw)
  To: David Gilbert; +Cc: Mulyadi Santosa, QEMU Developers

On Mon, Sep 19, 2011 at 9:40 AM, David Gilbert <david.gilbert@linaro.org> wrote:
> On 18 September 2011 16:13, Stefan Weil <weil@mail.berlios.de> wrote:
>> Am 18.09.2011 17:02, schrieb Mulyadi Santosa:
>>>
>>> Hi :)
>>>
>>> On Sun, Sep 18, 2011 at 02:59, Stefan Weil <weil@mail.berlios.de> wrote:
>>>>
>>>> Hello,
>>>>
>>>> these patches add a new code generator (TCG target) to qemu.
>>>
>>> I personally congrats you for your hard work. So, here's a question
>>> from who are not so keen with Qemu internals: what is the biggest
>>> advantage of using TCI instead of directly using TCG?
>>
>> TCG with native code support is much faster (6x to 10x),
>> so for emulation on a supported host, TCI has no advantage
>> for normal users.
>
> Is it possible to dynamically switch between the two?
>
> The two cases I'm thinking of are:
>  1) Using the interpreter to execute one or two instructions in an exception
> handling case
>  2) Avoiding TCG code generation on the first few runs of a piece of
> code that might only be init code, and only bothering with TCG for hotter
> code.

The tricky thing with using the interpreter for less frequently run code is that
it has a bunch of machinery in front of it which probably makes it
relatively similar to actually emitting native code.  The interesting
benchmark would be to translate blocks but never cache them for future
executions - compare this with TCI to see how much difference there is
between executing with interpretation vs translation.  If the
interpreter is almost as expensive as the translator then it's not
worth it.
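The break-even point of that benchmark can be put into a simple cost model (an assumption, not a measurement): with a one-time translation cost T and per-execution costs n (native) and i (interpreted, i > n), interpreting k executions is cheaper while k·i < T + k·n, i.e. k < T / (i − n). A sketch in integer cost units:

```c
#include <assert.h>

/* Simple cost model (assumed, not measured QEMU data):
   translate_cost : one-time cost of translating a block
   native_cost    : cost per execution of the translated block
   interp_cost    : cost per execution when interpreting
   Returns the largest k with k * interp < T + k * native,
   i.e. the largest integer strictly below T / (interp - native). */
static unsigned long break_even_executions(unsigned long translate_cost,
                                           unsigned long native_cost,
                                           unsigned long interp_cost)
{
    assert(interp_cost > native_cost);
    if (translate_cost == 0) {
        return 0;   /* translation is free: never worth interpreting */
    }
    return (translate_cost - 1) / (interp_cost - native_cost);
}
```

With made-up costs T = 1000, n = 1 and i = 8 (inside the 6x to 10x range mentioned earlier in the thread), `break_even_executions(1000, 1, 8)` returns 142: a block executed at most 142 times would be cheaper to interpret.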

Stefan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
  2011-09-19 10:20       ` Stefan Hajnoczi
@ 2011-09-19 10:27         ` David Gilbert
  0 siblings, 0 replies; 48+ messages in thread
From: David Gilbert @ 2011-09-19 10:27 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Mulyadi Santosa, QEMU Developers

On 19 September 2011 11:20, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Mon, Sep 19, 2011 at 9:40 AM, David Gilbert <david.gilbert@linaro.org> wrote:

<snip>

>> Is it possible to dynamically switch between the two?
>>
>> The two cases I'm thinking of are:
>>  1) Using the interpreter to execute one or two instructions in an exception
>> handling case
>>  2) Avoiding TCG code generation on the first few runs of a piece of
>> code that might only be init code, and only bothering with TCG for hotter
>> code.
>
> The tricky thing with using the interpreter for less frequently run code is that
> it has a bunch of machinery in front of it which probably makes it
> relatively similar to actually emitting native code.  The interesting
> benchmark would be to translate blocks but never cache them for future
> executions - compare this with TCI to see how much difference there is
> between executing with interpretation vs translation.  If the
> interpreter is almost as expensive as the translator then it's not
> worth it.

Right; the trick is that if you have a passably fast interpreter, you can
afford to do some more expensive optimisations in the code generator,
which would be interesting.  It's not unusual to find an awful lot of
code that is executed only once or twice.

Dave

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode
  2011-09-19  6:52           ` Andi Kleen
@ 2011-09-19 11:56             ` Avi Kivity
  2011-09-19 14:48               ` Andi Kleen
  0 siblings, 1 reply; 48+ messages in thread
From: Avi Kivity @ 2011-09-19 11:56 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Paolo Bonzini, QEMU Developers

On 09/19/2011 09:52 AM, Andi Kleen wrote:
> >  I think it also improves branch target prediction - if you have a tight
> >  loop of a few opcodes the predictor can guess where you're headed (since
> >  there is a separate lookup key for each opcode), whereas with the
> >  original code, there's a single key which cannot be used to predict the
> >  branch target.
>
> At least usually. I caught at least one version of gcc to CSE the jump
> instruction into a common location in one of my interpreters
> :-( But not all do it at least, and I hope gcc gets fixed.

You generally want CSE, yes?  So you can't blame gcc for getting it 
wrong sometimes.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode
  2011-09-19 11:56             ` Avi Kivity
@ 2011-09-19 14:48               ` Andi Kleen
  0 siblings, 0 replies; 48+ messages in thread
From: Andi Kleen @ 2011-09-19 14:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Paolo Bonzini, Andi Kleen, QEMU Developers

> You generally want CSE, yes?  So you can't blame gcc for getting it 
> wrong sometimes.

There are cases where CSE pessimizes the code, e.g. when it increases
memory pressure too much or caches something that is easier to recompute.
This is just another one.

BTW I checked again and the problem seems to be fixed in gcc 4.6, but is
still there in 4.5, at least in my example.

-Andi

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
  2011-09-18 20:15       ` Stefan Weil
@ 2011-09-19 15:14         ` Mulyadi Santosa
  0 siblings, 0 replies; 48+ messages in thread
From: Mulyadi Santosa @ 2011-09-19 15:14 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

Hi Stefan....

On Mon, Sep 19, 2011 at 03:15, Stefan Weil <weil@mail.berlios.de> wrote:
> Am 18.09.2011 18:39, schrieb Mulyadi Santosa:
>> On Sun, Sep 18, 2011 at 22:13, Stefan Weil <weil@mail.berlios.de> wrote:
>> So, that interpreter, should it be build inside Qemu too? Or can we
>> use/write external one? let's say creating one in python and TCI
>> passes the generated bytecode via UNIX socket to the listening Python
>> script, is that doable or one of the goal your design?
>
> Do you think of something like http://bellard.org/jslinux/?

Nothing specific, but yes, that describes my idea
:) (Anyway, that jslinux is awesome, so to speak.)

> The current interpreter is built inside QEMU, and I'm afraid
> that separating code generator and interpreter in different
> processes might be a lot of work. Maybe running both in
> separate threads would be possible, so the code generator
> could prepare new bytecode while the interpreter is still
> running the previous one.

Hm, got it... thanks for your kind explanation. I really appreciate it.


-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode Stefan Weil
  2011-09-18  4:03   ` Andi Kleen
  2011-09-18 10:18   ` Blue Swirl
@ 2011-09-19 16:43   ` Richard Henderson
  2011-09-19 20:24   ` Stuart Brady
  3 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2011-09-19 16:43 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

On 09/17/2011 01:00 PM, Stefan Weil wrote:
> +#if TCG_TARGET_HAS_ext8s_i32
> +        case INDEX_op_ext8s_i32:
> +            t0 = *tb_ptr++;
> +            t1 = tci_read_r8s(&tb_ptr);
> +            tci_write_reg32(t0, t1);
> +            break;
> +#endif

You really ought not need all these ifdefs.

> +#if TCG_TARGET_HAS_rot_i64
> +        case INDEX_op_rotl_i64:
> +        case INDEX_op_rotr_i64:
> +            TODO();
> +            break;
> +#endif

Eh?  Just the same as your rot_i32 implementations,
with different types.
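Following that suggestion, a 64-bit rotate is the 32-bit version with wider types. A minimal sketch as standalone helpers; the names `rol64`/`ror64` are hypothetical, and the counts are masked to 0..63 so that no shift by 64 (undefined in C) can occur:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n bits, n taken modulo 64. Masking the complementary
   count keeps n == 0 from producing a shift by 64. */
static uint64_t rol64(uint64_t x, unsigned n)
{
    n &= 63;
    return (x << n) | (x >> ((64 - n) & 63));
}

/* Rotate right by n bits, n taken modulo 64. */
static uint64_t ror64(uint64_t x, unsigned n)
{
    n &= 63;
    return (x >> n) | (x << ((64 - n) & 63));
}
```

The `INDEX_op_rotl_i64`/`INDEX_op_rotr_i64` cases could then read their operands, call these helpers and write the result back, mirroring the i32 cases.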


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode Stefan Weil
                     ` (2 preceding siblings ...)
  2011-09-19 16:43   ` Richard Henderson
@ 2011-09-19 20:24   ` Stuart Brady
  2011-10-16 21:54     ` Stuart Brady
  3 siblings, 1 reply; 48+ messages in thread
From: Stuart Brady @ 2011-09-19 20:24 UTC (permalink / raw)
  To: qemu-devel

On Sat, Sep 17, 2011 at 10:00:31PM +0200, Stefan Weil wrote:

> +#if MAX_OPC_PARAM_IARGS != 4
> +# error Fix needed, number of supported input arguments changed!
> +#endif
> +#if TCG_TARGET_REG_BITS == 32
> +typedef uint64_t (*helper_function)(tcg_target_ulong, tcg_target_ulong,
> +                                    tcg_target_ulong, tcg_target_ulong,
> +                                    tcg_target_ulong, tcg_target_ulong,
> +                                    tcg_target_ulong, tcg_target_ulong);
> +#else
> +typedef uint64_t (*helper_function)(tcg_target_ulong, tcg_target_ulong,
> +                                    tcg_target_ulong, tcg_target_ulong);
> +#endif

[...]

> +        case INDEX_op_call:
> +            t0 = tci_read_ri(&tb_ptr);
> +#if TCG_TARGET_REG_BITS == 32
> +            u64 = ((helper_function)t0)(tci_read_reg(TCG_REG_R0),
> +                                        tci_read_reg(TCG_REG_R1),
> +                                        tci_read_reg(TCG_REG_R2),
> +                                        tci_read_reg(TCG_REG_R3),
> +                                        tci_read_reg(TCG_REG_R5),
> +                                        tci_read_reg(TCG_REG_R6),
> +                                        tci_read_reg(TCG_REG_R7),
> +                                        tci_read_reg(TCG_REG_R8));
> +            tci_write_reg(TCG_REG_R0, u64);
> +            tci_write_reg(TCG_REG_R1, u64 >> 32);
> +#else
> +            u64 = ((helper_function)t0)(tci_read_reg(TCG_REG_R0),
> +                                        tci_read_reg(TCG_REG_R1),
> +                                        tci_read_reg(TCG_REG_R2),
> +                                        tci_read_reg(TCG_REG_R3));
> +            tci_write_reg(TCG_REG_R0, u64);
> +#endif
> +            break;

Unfortunately, this won't work on all architectures.

C99 6.5.2.2 states:

   9. If the function is defined with a type that is not compatible with
      the type (of the expression) pointed to by the expression that
      denotes the called function, the behavior is undefined.

We could perhaps get away with this on certain architectures (and on
those architectures, doing it this way might be the most efficient
option), although I'm not sure which those architectures are.

The real problem relates to alignment of parameters in registers
when passing 64-bit ints as arguments on 32-bit architectures.

Some ABIs have the situation where:

    void foo(uint32_t a, uint64_t b);

results in:

    register  contents
          p0  a
          p1  [padding]
          p2  b & 0xffffffff
          p3  b >> 32

An ABI may require this regardless of whether 64-bit integers are
typically aligned to 64-bit addresses in memory (or it may even do
this without such alignment for memory addresses).  The ordering of
the upper and lower 32-bits of a 64-bit parameter may have nothing at
all to do with the architecture's endianness.  The alignment rules
when passing arguments via registers might not even be consistent
with those when passing via the stack.
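To make that rule concrete, here is a sketch that assigns 32-bit register slots under one such hypothetical ABI: a 64-bit argument must start in an even-numbered slot (low word first), so a padding slot is skipped when needed. It models the `foo(uint32_t a, uint64_t b)` layout above; real ABIs differ, which is exactly the concern.

```c
#include <assert.h>
#include <stddef.h>

/* Assign 32-bit slots p0, p1, ... under a hypothetical ABI in which
   64-bit arguments start at an even slot (low word first).
   sizes[i] is 4 or 8 (bytes); first_slot[i] receives arg i's first slot.
   Returns the number of slots used, including padding. */
static size_t assign_slots(const unsigned *sizes, size_t nargs,
                           size_t *first_slot)
{
    size_t slot = 0;
    for (size_t i = 0; i < nargs; i++) {
        assert(sizes[i] == 4 || sizes[i] == 8);
        if (sizes[i] == 8 && (slot & 1)) {
            slot++;                 /* skip a padding slot */
        }
        first_slot[i] = slot;
        slot += sizes[i] / 4;
    }
    return slot;
}
```

For sizes `{4, 8}` this yields a in p0, padding in p1 and b in p2/p3, matching the table above.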

In QEMU, tcg_gen_callN() handles alignment of registers for the
architectures that currently have TCG backends.  If any new backend
were to require features not already supported by tcg_gen_callN(),
then those features would simply have to be added.

When using TCI, we could define TCG_TARGET_CALL_ALIGN_ARGS (and be
careful to handle the REGPARM case under x86), and simply rely on
tcg_gen_callN(), but this isn't guaranteed to work for all ABIs.

Since TCI is intended to be portable, I feel that we should provide
a means of calling helper functions that doesn't rely upon any
ABI-specific definitions, at least as a fallback.  It would probably
make sense to get the generic code working first, and then think about
optimising for specific ABIs later, IMO.

So, this leaves the question of how to do this in a generic manner.
Do we:

 1) Include a pointer to a wrapper function in the bytecode, which
    would call the helper with the correct type.  Each wrapper could
    just read from and write to the TCI registers itself without
    accepting/returning values, or the values of the TCI registers
    could be passed in as arguments to each of the wrapper functions.

 2) Encode the type of the function into the bytecode, such that a
    huge switch() statement can be used to cast the function pointer
    to the appropriate type, allowing the helper to be invoked in a
    defined manner.  My guess is that this would be slower when
    executing bytecode than 1), although it would be quicker for
    compilation of the bytecode.

 3) Modify the helpers themselves to accept uint32_t arguments when
    using TCI.  This would require quite a lot of work but would
    likely yield the best performance.  However, it would prevent
    us from ever being able to choose between architecture-specific
    backends and TCI using a command line option.

 4) Go with some other option that I've not considered?

To me, option 1) seems like the simplest, although the macros needed
to do this are likely to be a little hairy...

I'm also concerned that we should not clobber R1 when storing a
32-bit return value in R0 on 32-bit architectures.
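A minimal sketch of option 1) that also avoids clobbering R1, assuming a 32-bit host whose TCI registers are modelled by a plain array. `tci_reg`, the helpers, the wrapper macros and `tci_call()` are all hypothetical; the point is that every wrapper has the same type, so the indirect call is well defined, and each wrapper writes only as many result registers as its helper returns.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the TCI register file on a 32-bit host. */
enum { R0, R1, R2, R3, NB_REGS = 8 };
static uint32_t tci_reg[NB_REGS];

/* Example helpers with their real, differing prototypes (hypothetical). */
static uint32_t helper_add32(uint32_t a, uint32_t b) { return a + b; }
static uint64_t helper_add64(uint64_t a, uint64_t b) { return a + b; }

/* All wrappers share this type, so calling through it is defined. */
typedef void (*tci_wrapper)(void);

/* 64-bit value from a register pair: low word in lo, high in lo + 1. */
static uint64_t read_pair(int lo)
{
    return tci_reg[lo] | ((uint64_t)tci_reg[lo + 1] << 32);
}

/* i32 result: only R0 is written, R1 is left alone. */
#define DEF_WRAPPER_i32_i32_i32(name)                        \
    static void wrap_##name(void)                            \
    {                                                        \
        tci_reg[R0] = name(tci_reg[R0], tci_reg[R1]);        \
    }

/* i64 arguments in R0/R1 and R2/R3; i64 result in R0/R1. */
#define DEF_WRAPPER_i64_i64_i64(name)                        \
    static void wrap_##name(void)                            \
    {                                                        \
        uint64_t r = name(read_pair(R0), read_pair(R2));     \
        tci_reg[R0] = (uint32_t)r;                           \
        tci_reg[R1] = (uint32_t)(r >> 32);                   \
    }

DEF_WRAPPER_i32_i32_i32(helper_add32)
DEF_WRAPPER_i64_i64_i64(helper_add64)

/* What INDEX_op_call would do with a wrapper pointer from the bytecode. */
static void tci_call(tci_wrapper w)
{
    w();
}
```

A wrapper per helper signature, rather than per helper, would keep the number of macro expansions down, at the cost of passing the helper pointer in as well.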

Cheers,
-- 
Stuart Brady

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] tcg: Add bytecode generator for tcg interpreter
  2011-09-18 10:03   ` Blue Swirl
@ 2011-09-19 22:28     ` Stuart Brady
  0 siblings, 0 replies; 48+ messages in thread
From: Stuart Brady @ 2011-09-19 22:28 UTC (permalink / raw)
  To: qemu-devel

On Sun, Sep 18, 2011 at 10:03:07AM +0000, Blue Swirl wrote:

> I was wondering if this #ifdeffery is needed since TCI would probably
> give more performance compared to the alternative, TCG generated
> emulation sequences. But it could be useful for testing those. Maybe
> there should be two options to enable and disable all non-mandatory TCI
> versions.

We could perhaps even allow enabling/disabling of optional ops from the
command line, although I think this would complicate tcg-op.h pretty badly.

> > +/*
> > + * This code implements a TCG which does not generate machine code for some
> > + * real target machine but which generates virtual machine code for an
> > + * interpreter. Interpreted pseudo code is slow, but it works on any host.
> > + *
> > + * Some remarks might help in understanding the code:
> > + *
> > + * "target" or "TCG target" is the machine which runs the generated code.
> > + * This is different to the usual meaning in QEMU where "target" is the
> > + * emulated machine. So normally QEMU host is identical to TCG target.
> > + * Here the TCG target is a virtual machine, but this virtual machine must
> > + * use the same word size like the real machine.
> 
> Why, for performance? Allowing that could be useful for testing TCG,
> perhaps we could even use non-native endianness?

I suppose any mismatch between TCGv_ptr and the host pointer size must
be avoided.  Perhaps it would be worth adding a TCG_TARGET_PTR_BITS and
converting users of TCG_TARGET_REG_BITS appropriately.  I'm surprised
at just how few places I've found that test TCG_TARGET_REG_BITS to
determine the width of a TCGv_ptr.
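One way such a macro could be derived, as a sketch: `TCG_TARGET_PTR_BITS` is the name suggested above, not an existing QEMU macro, and `tci_ptr_value` is hypothetical. Deriving it from `UINTPTR_MAX` keeps it usable in `#if` and independent of `TCG_TARGET_REG_BITS`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: host pointer width, usable in preprocessor tests. */
#if UINTPTR_MAX == UINT64_MAX
# define TCG_TARGET_PTR_BITS 64
#elif UINTPTR_MAX == UINT32_MAX
# define TCG_TARGET_PTR_BITS 32
#else
# error "Unsupported host pointer size"
#endif

/* A TCGv_ptr-sized value would be held in an integer of this width. */
#if TCG_TARGET_PTR_BITS == 64
typedef uint64_t tci_ptr_value;
#else
typedef uint32_t tci_ptr_value;
#endif
```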

> > + * Therefore, we need both 32 and 64 bit virtual machines (interpreter).
> > + */
> > +
> > +#if !defined(TCG_TARGET_H)
> > +#define TCG_TARGET_H
> > +
> > +#include "config-host.h"
> > +
> > +#define TCG_TARGET_INTERPRETER 1
> > +
> > +#ifdef CONFIG_DEBUG_TCG
> > +/* Enable debug output. */
> > +#define CONFIG_DEBUG_TCG_INTERPRETER
> > +#endif
> > +
> > +#if 0 /* TCI tries to emulate a little endian host. */
> > +#if defined(HOST_WORDS_BIGENDIAN)
> > +# define TCG_TARGET_WORDS_BIGENDIAN
> > +#endif
> > +#endif
> > +
> > +/* Optional instructions. */
> > +
> > +#define TCG_TARGET_HAS_bswap16_i32      1
> > +#define TCG_TARGET_HAS_bswap32_i32      1
> > +/* Not more than one of the next two defines must be 1. */
> > +#define TCG_TARGET_HAS_div_i32          1
> > +#define TCG_TARGET_HAS_div2_i32         0
> > +#define TCG_TARGET_HAS_ext8s_i32        1
> > +#define TCG_TARGET_HAS_ext16s_i32       1
> > +#define TCG_TARGET_HAS_ext8u_i32        1
> > +#define TCG_TARGET_HAS_ext16u_i32       1
> > +#define TCG_TARGET_HAS_andc_i32         0
> > +#define TCG_TARGET_HAS_deposit_i32      0
> > +#define TCG_TARGET_HAS_eqv_i32          0
> > +#define TCG_TARGET_HAS_nand_i32         0
> > +#define TCG_TARGET_HAS_nor_i32          0
> > +#define TCG_TARGET_HAS_neg_i32          1
> > +#define TCG_TARGET_HAS_not_i32          1
> > +#define TCG_TARGET_HAS_orc_i32          0
> > +#define TCG_TARGET_HAS_rot_i32          1
> > +
> > +#if TCG_TARGET_REG_BITS == 64
> > +#define TCG_TARGET_HAS_bswap16_i64      1
> > +#define TCG_TARGET_HAS_bswap32_i64      1
> > +#define TCG_TARGET_HAS_bswap64_i64      1
> > +#define TCG_TARGET_HAS_deposit_i64      0
> > +/* Not more than one of the next two defines must be 1. */
> > +#define TCG_TARGET_HAS_div_i64          0
> > +#define TCG_TARGET_HAS_div2_i64         0
> > +#define TCG_TARGET_HAS_ext8s_i64        1
> > +#define TCG_TARGET_HAS_ext16s_i64       1
> > +#define TCG_TARGET_HAS_ext32s_i64       1
> > +#define TCG_TARGET_HAS_ext8u_i64        1
> > +#define TCG_TARGET_HAS_ext16u_i64       1
> > +#define TCG_TARGET_HAS_ext32u_i64       1
> > +#define TCG_TARGET_HAS_andc_i64         0
> > +#define TCG_TARGET_HAS_eqv_i64          0
> > +#define TCG_TARGET_HAS_nand_i64         0
> > +#define TCG_TARGET_HAS_nor_i64          0
> > +#define TCG_TARGET_HAS_neg_i64          1
> > +#define TCG_TARGET_HAS_not_i64          1
> > +#define TCG_TARGET_HAS_orc_i64          0
> > +#define TCG_TARGET_HAS_rot_i64          1
> > +#endif /* TCG_TARGET_REG_BITS == 64 */
> > +
> > +/* Offset to user memory in user mode. */
> > +#define TCG_TARGET_HAS_GUEST_BASE
> > +
> > +/* Number of registers available.
> > +   For 32 bit hosts, we need more than 8 registers (call arguments). */
> 
> On i386 there certainly aren't 8 registers, where does 8 come from?

We need eight registers to allow passing of four 32-bit arguments using
the registers.

Alternatively, we could use a stack to pass arguments.  For this, we'd
need to point our stack register (tci_reg[TCG_REG_CALL_STACK]) at some
memory that we use as a stack.  It wouldn't need to be much, just
enough to accommodate the arguments, AFAICT.

Unless we use the stack pointer register, we should not define
TCG_REG_CALL_STACK at all, and we should #ifdef out the parts of
tcg_reg_alloc_call() that use it.  If there's no stack available, the
code should abort in the case where there aren't enough TCI registers
for all of the parameters being passed, although in this case, there
should be a compile time check to ensure that:

    ARRAY_SIZE(tcg_target_call_iarg_regs) ==
        (MAX_OPC_PARAM_PER_ARG * MAX_OPC_PARAM_IARGS)

We might want to relax that check to allow tcg_target_call_iarg_regs to
list more TCI registers than required, but those registers would
just be wasted.

BTW, note that TCG limits us to 64 registers (due to TCGRegSet).
I expect this could be changed (for TCI only) if needed, but if
we allow a user-supplied TCG_TARGET_NB_REGS, then we should check that
it is no more than 64, at least for the time being.
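Both checks could be done at compile time. A sketch using C11 `_Static_assert`; the values are stand-ins for a 32-bit host (in QEMU they would come from `tcg.h` and `tcg-target.h`), and `tci_call_iarg_regs` is a hypothetical stand-in for `tcg_target_call_iarg_regs`, listing the registers used by the quoted `INDEX_op_call` code.

```c
#include <assert.h>

/* Stand-in values for a 32-bit host (assumptions for this sketch). */
#define MAX_OPC_PARAM_IARGS   4
#define MAX_OPC_PARAM_PER_ARG 2   /* a 64-bit argument needs two registers */
#define TCG_TARGET_NB_REGS    16

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical stand-in for tcg_target_call_iarg_regs (R0-R3, R5-R8). */
static const int tci_call_iarg_regs[] = { 0, 1, 2, 3, 5, 6, 7, 8 };

/* Without a call stack, every argument word needs its own register. */
_Static_assert(ARRAY_SIZE(tci_call_iarg_regs) ==
               MAX_OPC_PARAM_PER_ARG * MAX_OPC_PARAM_IARGS,
               "TCI needs one register per call argument word");

/* The 64-register limit imposed by TCGRegSet. */
#if TCG_TARGET_NB_REGS > 64
# error "TCG_TARGET_NB_REGS must not exceed 64 (TCGRegSet limit)"
#endif
```

The relaxed variant mentioned above would simply use `>=` instead of `==` in the first check.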

> > +/* #define TCG_TARGET_NB_REGS 32 */
> > +
> > +/* List of registers which are used by TCG. */
> > +typedef enum {
> > +    TCG_REG_R0 = 0,
> > +    TCG_REG_R1,
> > +    TCG_REG_R2,
> > +    TCG_REG_R3,
> > +    TCG_REG_R4,
> > +    TCG_REG_R5,
> > +    TCG_REG_R6,
> > +    TCG_REG_R7,
> > +    TCG_AREG0 = TCG_REG_R7,
> > +#if TCG_TARGET_NB_REGS >= 16
> > +    TCG_REG_R8,
> > +    TCG_REG_R9,
> > +    TCG_REG_R10,
> > +    TCG_REG_R11,
> > +    TCG_REG_R12,
> > +    TCG_REG_R13,
> > +    TCG_REG_R14,
> > +    TCG_REG_R15,
> > +#if TCG_TARGET_NB_REGS >= 32
> > +    TCG_REG_R16,
> > +    TCG_REG_R17,
> > +    TCG_REG_R18,
> > +    TCG_REG_R19,
> > +    TCG_REG_R20,
> > +    TCG_REG_R21,
> > +    TCG_REG_R22,
> > +    TCG_REG_R23,
> > +    TCG_REG_R24,
> > +    TCG_REG_R25,
> > +    TCG_REG_R26,
> > +    TCG_REG_R27,
> > +    TCG_REG_R28,
> > +    TCG_REG_R29,
> > +    TCG_REG_R30,
> > +    TCG_REG_R31,
> > +#endif

This seems unfortunate to me...

I wonder whether some sort of chain of defines would be better:

   /* already defined in osdep.h */

   #define xglue(x, y) x ## y
   #define glue(x, y) xglue(x, y)
   #define stringify(s)    tostring(s)
   #define tostring(s)     #s

   /* common definitions */

   #define NUM_DEF_1(n)                n(0)
   #define NUM_DEF_2(n)  NUM_DEF_1(n)  n(1)
   #define NUM_DEF_3(n)  NUM_DEF_2(n)  n(2)
   #define NUM_DEF_4(n)  NUM_DEF_3(n)  n(3)
   #define NUM_DEF_5(n)  NUM_DEF_4(n)  n(4)
   #define NUM_DEF_6(n)  NUM_DEF_5(n)  n(5)
   #define NUM_DEF_7(n)  NUM_DEF_6(n)  n(6)
   #define NUM_DEF_8(n)  NUM_DEF_7(n)  n(7)
   #define NUM_DEF_9(n)  NUM_DEF_8(n)  n(8)
   #define NUM_DEF_10(n) NUM_DEF_9(n)  n(9)
   #define NUM_DEF_11(n) NUM_DEF_10(n) n(10)
   #define NUM_DEF_12(n) NUM_DEF_11(n) n(11)
   #define NUM_DEF_13(n) NUM_DEF_12(n) n(12)
   #define NUM_DEF_14(n) NUM_DEF_13(n) n(13)
   #define NUM_DEF_15(n) NUM_DEF_14(n) n(14)
   #define NUM_DEF_16(n) NUM_DEF_15(n) n(15)

   #define DEF_TCG_REGS glue(NUM_DEF_,TCG_TARGET_NB_REGS)

   /* tcg-target.h */

   #define DEF_TCG_REG_NUM(x) TCG_REG_R##x,

   enum {
      DEF_TCG_REGS(DEF_TCG_REG_NUM)
   };

   /* tcg-target.c */

   #define DEF_TCG_REG_NAME(x) tostring(r##x),

   static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
      DEF_TCG_REGS(DEF_TCG_REG_NAME)
   };

Okay, so I accept that this is rather horrible, but it does allow us
to define the right number of entries based on TCG_TARGET_NB_REGS
without masses of #ifdefs or relying on the compiler for the host.

It might be better to allow the number of registers to be defined at
run-time -- although TCG would have to be modified not to rely upon
TCG_TARGET_NB_REGS when compiled for TCI.
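A rough sketch of the run-time alternative, assuming the register file and register names are allocated when the interpreter starts. `struct tci_regfile` and its functions are hypothetical, and TCG itself would still need changes so that it no longer depends on `TCG_TARGET_NB_REGS`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical run-time sized TCI register file. */
struct tci_regfile {
    unsigned nb_regs;
    uint64_t *regs;    /* register values, zero-initialised */
    char (*names)[8];  /* "r0", "r1", ... */
};

static int tci_regfile_init(struct tci_regfile *rf, unsigned nb_regs)
{
    if (nb_regs == 0 || nb_regs > 64) {  /* TCGRegSet limit noted above */
        return -1;
    }
    rf->regs = calloc(nb_regs, sizeof(*rf->regs));
    rf->names = calloc(nb_regs, sizeof(*rf->names));
    if (!rf->regs || !rf->names) {
        free(rf->regs);
        free(rf->names);
        return -1;
    }
    for (unsigned i = 0; i < nb_regs; i++) {
        snprintf(rf->names[i], sizeof(rf->names[i]), "r%u", i);
    }
    rf->nb_regs = nb_regs;
    return 0;
}

static void tci_regfile_free(struct tci_regfile *rf)
{
    free(rf->regs);
    free(rf->names);
    rf->regs = NULL;
    rf->names = NULL;
    rf->nb_regs = 0;
}
```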

Cheers,
-- 
Stuart

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
  2011-09-18 13:00         ` Blue Swirl
  2011-09-18 13:13           ` malc
@ 2011-09-25 20:37           ` Stefan Weil
  2011-10-01 12:02             ` Blue Swirl
  1 sibling, 1 reply; 48+ messages in thread
From: Stefan Weil @ 2011-09-25 20:37 UTC (permalink / raw)
  To: Blue Swirl; +Cc: Stuart Brady, TeLeMan, QEMU Developers, Paul Brook

Am 18.09.2011 15:00, schrieb Blue Swirl:
> On Sun, Sep 18, 2011 at 12:46 PM, malc <av1474@comtv.ru> wrote:
>> On Sun, 18 Sep 2011, Blue Swirl wrote:
>>
>>> On Sun, Sep 18, 2011 at 10:49 AM, malc <av1474@comtv.ru> wrote:
>>>> On Sun, 18 Sep 2011, Blue Swirl wrote:
>>>>
>>>>> On Sat, Sep 17, 2011 at 7:59 PM, Stefan Weil <weil@mail.berlios.de> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> these patches add a new code generator (TCG target) to qemu.
>>>>>>
>>>>>> Unlike other tcg target code generators, this one does not generate
>>>>>> machine code for some cpu. It generates machine independent bytecode
>>>>>> which is interpreted later. That's why I called it TCI (tiny code
>>>>>> interpreter).
>>>>>>
>>>>>> I wrote most of the code two years ago and included feedback and
>>>>>> contributions from several QEMU developers, notably TeleMan,
>>>>>> Stuart Brady, Blue Swirl and Malc. See the history here:
>>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2009-09/msg01710.html
>>>>>>
>>>>>> Since that time, I used TCI regularly, added small fixes and improvements
>>>>>> and rebased it to latest QEMU. Some versions were tested using
>>>>>> ARM (emulated and real), PowerPC (emulated) and MIPS (emulated) hosts,
>>>>>> but normally I run it on i386 and x86_64 hosts.
>>>>>>
>>>>>> I'd appreciate to see TCI in QEMU 1.0.
>>>>>>
>>>>>> Regards,
>>>>>> Stefan Weil
>>>>>>
>>>>>> The patches 2 and 4 are optional, patch 8 is only needed for running
>>>>>> TCI on a PowerPC host.
>>>>>
>>>>> I think patches 1 to 4 and 8 could be applied soon as they are now,
>>>>> they should benefit plain TCG too. I had some comments to other
>>>>> patches, but otherwise everything looks great.
>>>>
>>>> Hold the horses until Stefan settles the licensing issues.
>>>
>>> Which issues? For which patches?
>>>
>>
>> Read tcg/LICENSE.
>
> "All the files in this directory and subdirectories are released under
> a BSD like license (see header in each file). No other license is
> accepted."
>
> The wording of the file should be changed to list the files for which
> the BSD like license applies (and for which no other license is
> accepted), the file can't stop us adding new files with different
> licenses.

Thanks for all feedback given.

These license issues delayed my further work on tci.
In the meantime, I asked Fabrice Bellard. Here is his answer:

    "Sorry but I no longer care about the license of this code.
    But if I was still in charge of the project,
    I would clearly refuse any non BSD code in TCG. "

Although I agree with Blue's opinion given above, I also
want to respect Fabrice.

Therefore I suggest these changes:

* tcg/bytecode is moved to tcg/tci, and I change the license to BSD.

* The interpreter is moved from tcg/tci.c to tci.c and remains GPL.
   It is no longer in TCG, so there is no conflict with tcg/LICENSE.

This might seem to be a trick, but in some way it is similar to the
other tcg target implementations: the code generator is always BSD,
and the generated code is run on a system with a different license.

Comments welcome. As soon as there is a consensus on how tci
can be integrated, I'll continue with patch series v2.

Patches 1 to 4 could be applied immediately (that would reduce the
size of the new series).

Kind regards,

Stefan Weil

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)
  2011-09-25 20:37           ` Stefan Weil
@ 2011-10-01 12:02             ` Blue Swirl
  0 siblings, 0 replies; 48+ messages in thread
From: Blue Swirl @ 2011-10-01 12:02 UTC (permalink / raw)
  To: Stefan Weil; +Cc: Stuart Brady, TeLeMan, QEMU Developers, Paul Brook

On Sun, Sep 25, 2011 at 8:37 PM, Stefan Weil <weil@mail.berlios.de> wrote:
> [...]
>
> Patches 1 to 4 could be applied immediately (that would reduce the
> size of the new series).

I applied 1 to 4, thanks.


* Re: [Qemu-devel] [PATCH 6/8] tcg: Add bytecode generator for tcg interpreter
  2011-09-17 20:00 ` [Qemu-devel] [PATCH 6/8] tcg: Add bytecode generator for tcg interpreter Stefan Weil
  2011-09-18 10:03   ` Blue Swirl
@ 2011-10-01 16:54   ` Andreas Färber
  2011-10-01 21:25     ` Stefan Weil
  1 sibling, 1 reply; 48+ messages in thread
From: Andreas Färber @ 2011-10-01 16:54 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

Am 17.09.2011 22:00, schrieb Stefan Weil:
> Unlike other tcg target code generators, this one does not generate
> machine code for some cpu. It generates machine independent bytecode
> which is interpreted later.
>
> This allows running QEMU on any host.
>
> Interpreted bytecode is slower than direct execution of generated
> machine code.
>
> Signed-off-by: Stefan Weil <weil@mail.berlios.de>
[...]
> diff --git a/tcg/bytecode/README b/tcg/bytecode/README
> new file mode 100644
> index 0000000..6fe9755
> --- /dev/null
> +++ b/tcg/bytecode/README
> @@ -0,0 +1,129 @@
> +TCG Interpreter (TCI) - Copyright (c) 2011 Stefan Weil.
> +
> +This file is released under GPL 2 or later.
> +
> +1) Introduction
> +
> +TCG (Tiny Code Generator) is a code generator which translates
> +code fragments ("basic blocks") from target code (any of the
> +targets supported by QEMU) to a code representation which
> +can be run on a host.
> +
> +QEMU can create native code for some hosts (arm, hppa, i386, ia64, ppc, ppc64,
> +s390, sparc, x86_64). For others, unofficial host support was written.
> +
> +By adding a code generator for a virtual machine and using an
> +interpreter for the generated bytecode, it is possible to
> +support (almost) any host.
> +
> +This is what TCI (Tiny Code Interpreter) does.
> +
> +2) Implementation
> +
> +Like each TCG host frontend, TCI implements the code generator in
> +tcg-target.c, tcg-target.h. Both files are in directory tcg/bytecode.
> +
> +The additional file tcg/tci.c adds the interpreter.
> +
> +The bytecode consists of opcodes (same numeric values as those used by
> +TCG), command length and arguments of variable size and number.

While reusing TCG opcode values certainly makes things easy to
implement, have you evaluated using LLVM bitcode as alternative to a
fully custom intermediate code format?

Andreas


* Re: [Qemu-devel] [PATCH 6/8] tcg: Add bytecode generator for tcg interpreter
  2011-10-01 16:54   ` Andreas Färber
@ 2011-10-01 21:25     ` Stefan Weil
  2011-10-09 16:19       ` Andreas Färber
  0 siblings, 1 reply; 48+ messages in thread
From: Stefan Weil @ 2011-10-01 21:25 UTC (permalink / raw)
  To: Andreas Färber; +Cc: QEMU Developers

Am 01.10.2011 18:54, schrieb Andreas Färber:
> Am 17.09.2011 22:00, schrieb Stefan Weil:
>> [...]
>> +The bytecode consists of opcodes (same numeric values as those used by
>> +TCG), command length and arguments of variable size and number.
>
> While reusing TCG opcode values certainly makes things easy to
> implement, have you evaluated using LLVM bitcode as alternative to a
> fully custom intermediate code format?
>
> Andreas

I had a look at several bytecode representations - initially I thought
of using Java. LLVM was on my list, too, but I cannot say that I really
evaluated any of these alternatives. My primary goal was to learn more
about TCG and to get something working, and as you said, reusing the
TCG opcodes made things easier.

LLVM might also be used as a replacement for TCG.
It would be really interesting to see how both compare.

Stefan


* Re: [Qemu-devel] [PATCH 6/8] tcg: Add bytecode generator for tcg interpreter
  2011-10-01 21:25     ` Stefan Weil
@ 2011-10-09 16:19       ` Andreas Färber
  0 siblings, 0 replies; 48+ messages in thread
From: Andreas Färber @ 2011-10-09 16:19 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

Am 01.10.2011 23:25, schrieb Stefan Weil:
> Am 01.10.2011 18:54, schrieb Andreas Färber:
>> Am 17.09.2011 22:00, schrieb Stefan Weil:
>>> +The bytecode consists of opcodes (same numeric values as those used by
>>> +TCG), command length and arguments of variable size and number.
>>
>> While reusing TCG opcode values certainly makes things easy to
>> implement, have you evaluated using LLVM bitcode as alternative to a
>> fully custom intermediate code format?
> 
> I had a look at several bytecode representations - initially I thought
> of using Java. LLVM was on my list, too, but I cannot say that I really
> evaluated any of these alternatives. My primary goal was to learn more
> about TCG and to get something working, and as you said, reusing the
> TCG opcodes made things easier.

Okay, just thought I'd ask the blunt question. :)

We should be careful not to expose it to outside processes as discussed
elsewhere in the thread or we will have to start caring about ABI
versioning.

> LLVM might also be used as a replacement for TCG.
> It would be really interesting to see how both compare.

Maybe suited for a GSoC project?

Andreas


* Re: [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode
  2011-09-19 20:24   ` Stuart Brady
@ 2011-10-16 21:54     ` Stuart Brady
  0 siblings, 0 replies; 48+ messages in thread
From: Stuart Brady @ 2011-10-16 21:54 UTC (permalink / raw)
  To: Stefan Weil; +Cc: qemu-devel

On Mon, Sep 19, 2011 at 09:24:47PM +0100, Stuart Brady wrote:
> On Sat, Sep 17, 2011 at 10:00:31PM +0200, Stefan Weil wrote:
> 
[...]
> > +            u64 = ((helper_function)t0)(tci_read_reg(TCG_REG_R0),
> > +                                        tci_read_reg(TCG_REG_R1),
> > +                                        tci_read_reg(TCG_REG_R2),
> > +                                        tci_read_reg(TCG_REG_R3));
> > +            tci_write_reg(TCG_REG_R0, u64);
> > +#endif
> > +            break;
[...]

> Unfortunately, this won't work on all architectures.

[...]

Stefan, have you had a chance to consider this yet?

I think it would be nice to do this in a way that:

 * allows enabling TCI at runtime.

 * doesn't rely on a huge switch table or wrapper for every conceivable
   type of function.

   For example, if helpers have three argument types, i32, i64 and ptr
   and up to four arguments, together with return types of void, i32,
   i64 and ptr, we might have something like:

        4 * (3^4 + 3^3 + 3^2 + 3^1 + 3^0)
     == 484 wrapper functions (or cases).

   Most of these wouldn't be used for any given target.  This gets worse
   if we add a few extra arguments and types.  With up to six arguments,
   and an extra type, there'd be 27305 cases / wrappers. :-(

So, I wonder if it would be best to place this in the target and generate
a wrapper for each helper function.  When using TCI, we'd generate a call
to this wrapper function instead of the helper function.  No extra space
in the ops buffer would be required, since the wrapper function would
call the helper function directly, rather than taking it as a parameter.

This could be compiled out on those architectures for which the current
approach is okay.

If we wanted to allow enabling TCI at runtime, then we'd need to choose
between the wrapper functions and helper functions at translation time,
either in gen_helper_* or when generating the TCI bytecode by way of a
somewhat vile switch() statement...

Should I try to put something together?

Cheers,
-- 
Stuart


end of thread

Thread overview: 48+ messages
2011-09-17 19:59 [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine) Stefan Weil
2011-09-17 20:00 ` [Qemu-devel] [PATCH 1/8] tcg: Declare TCG_TARGET_REG_BITS in tcg.h Stefan Weil
2011-09-17 20:00 ` [Qemu-devel] [PATCH 2/8] tcg: Don't declare TCG_TARGET_REG_BITS in tcg-target.h Stefan Weil
2011-09-17 20:00 ` [Qemu-devel] [PATCH 3/8] tcg: Add forward declarations for local functions Stefan Weil
2011-09-17 21:40   ` Peter Maydell
2011-09-17 20:00 ` [Qemu-devel] [PATCH 4/8] tcg: Add some assertions Stefan Weil
2011-09-17 20:00 ` [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode Stefan Weil
2011-09-18  4:03   ` Andi Kleen
2011-09-18  5:49     ` Stefan Weil
2011-09-18  7:22       ` Paolo Bonzini
2011-09-18 17:54         ` Avi Kivity
2011-09-19  6:52           ` Andi Kleen
2011-09-19 11:56             ` Avi Kivity
2011-09-19 14:48               ` Andi Kleen
2011-09-18 10:18   ` Blue Swirl
2011-09-19 16:43   ` Richard Henderson
2011-09-19 20:24   ` Stuart Brady
2011-10-16 21:54     ` Stuart Brady
2011-09-17 20:00 ` [Qemu-devel] [PATCH 6/8] tcg: Add bytecode generator for tcg interpreter Stefan Weil
2011-09-18 10:03   ` Blue Swirl
2011-09-19 22:28     ` Stuart Brady
2011-10-01 16:54   ` Andreas Färber
2011-10-01 21:25     ` Stefan Weil
2011-10-09 16:19       ` Andreas Färber
2011-09-17 20:00 ` [Qemu-devel] [PATCH 7/8] tcg: Add tcg interpreter to configure / make Stefan Weil
2011-09-18  9:37   ` Blue Swirl
2011-09-18 10:14     ` Stefan Weil
2011-09-17 20:00 ` [Qemu-devel] [PATCH 8/8] ppc: Support tcg interpreter on ppc hosts Stefan Weil
2011-09-17 21:31   ` Peter Maydell
2011-09-17 21:33     ` Stefan Weil
2011-09-18 10:26 ` [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine) Blue Swirl
2011-09-18 10:49   ` malc
2011-09-18 12:12     ` Blue Swirl
2011-09-18 12:46       ` malc
2011-09-18 13:00         ` Blue Swirl
2011-09-18 13:13           ` malc
2011-09-18 13:26             ` Blue Swirl
2011-09-25 20:37           ` Stefan Weil
2011-10-01 12:02             ` Blue Swirl
2011-09-18 15:02 ` Mulyadi Santosa
2011-09-18 15:13   ` Stefan Weil
2011-09-18 16:39     ` Mulyadi Santosa
2011-09-18 20:15       ` Stefan Weil
2011-09-19 15:14         ` Mulyadi Santosa
2011-09-19  8:40     ` David Gilbert
2011-09-19 10:20       ` Stefan Hajnoczi
2011-09-19 10:27         ` David Gilbert
2011-09-18 18:02 ` Avi Kivity
