All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 00/14] tcg/sparc v8plus code generation
@ 2014-03-17 18:37 Richard Henderson
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 01/14] tcg: Fix missed pointer size != TCG_TARGET_REG_BITS changes Richard Henderson
                   ` (13 more replies)
  0 siblings, 14 replies; 19+ messages in thread
From: Richard Henderson @ 2014-03-17 18:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, aurelien

Our 32-bit build for sparc has been requiring a 64-bit capable chip
for about 2 years now, by way of requiring move-conditional and LE
memory instructions.  But we've mostly been generating 32-bit code
otherwise.

This patch set changes things so that we make full use of the cpu.

The sparcv8plus code model requires that 64-bit data be kept only
in the %g and %o registers.  These are saved by the kernel in full
64-bit slots somewhere.  Whereas the %i and %l registers are saved
via the register window mechanism, and as part of the 32-bit ABI
we've only allocated 32-bits of stack for storing these.  Since the
register window can roll at any time, due to signals and interrupts,
we must consider the high bits of %i and %l to be garbage.

This implies that we must treat 32-bit and 64-bit quantities differently.
For the most part, TCG is good with that.  The one case where that falls
down, however, is when we frob data between widths.  Thus the addition
of the trunc_i32 opcode.

This new opcode, or something like it, would have been required if
we ever got around to supporting MIPS64 code generation, where 32-bit
quantities must remain sign-extended in the 64-bit register at all times.

In the case of sparcv8plus, we can get what we need out of the opcode
merely by setting its register constraints properly.


r~


Richard Henderson (14):
  tcg: Fix missed pointer size != TCG_TARGET_REG_BITS changes
  tcg: Add INDEX_op_trunc_i32
  tcg-sparc: Remove most uses of TCG_TARGET_REG_BITS
  tcg-sparc: Support trunc_i32
  tcg-sparc: Use 64-bit registers with sparcv8plus
  tcg-sparc: Use the RETURN instruction
  tcg-sparc: Implement muls2_i32
  tcg-sparc: Tidy check_fit_* tests
  tcg-sparc: Don't handle mov/movi in tcg_out_op
  tcg-sparc: Hoist common argument loads in tcg_out_op
  tcg-sparc: Fixup function argument types
  tcg-sparc: Fix small 32-bit movi
  tcg-sparc: Fix 32-bit constant arguments tests
  tcg-sparc: Accept stores of zero

 include/exec/def-helper.h |   2 +-
 tcg/README                |   5 +
 tcg/aarch64/tcg-target.h  |   1 +
 tcg/i386/tcg-target.h     |   1 +
 tcg/ia64/tcg-target.h     |   1 +
 tcg/optimize.c            |  16 +
 tcg/ppc64/tcg-target.h    |   1 +
 tcg/s390/tcg-target.h     |   1 +
 tcg/sparc/tcg-target.c    | 903 ++++++++++++++++++++--------------------------
 tcg/sparc/tcg-target.h    |  21 +-
 tcg/tcg-op.h              |  54 ++-
 tcg/tcg-opc.h             |   4 +
 tcg/tcg.c                 |  45 ++-
 tcg/tcg.h                 |   1 +
 tcg/tci/tcg-target.h      |   1 +
 15 files changed, 503 insertions(+), 554 deletions(-)

-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 01/14] tcg: Fix missed pointer size != TCG_TARGET_REG_BITS changes
  2014-03-17 18:37 [Qemu-devel] [PATCH 00/14] tcg/sparc v8plus code generation Richard Henderson
@ 2014-03-17 18:37 ` Richard Henderson
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 02/14] tcg: Add INDEX_op_trunc_i32 Richard Henderson
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2014-03-17 18:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/exec/def-helper.h | 2 +-
 tcg/tcg-op.h              | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/exec/def-helper.h b/include/exec/def-helper.h
index 73d51f9..255b58b 100644
--- a/include/exec/def-helper.h
+++ b/include/exec/def-helper.h
@@ -84,7 +84,7 @@
 #define dh_is_64bit_noreturn 0
 #define dh_is_64bit_i32 0
 #define dh_is_64bit_i64 1
-#define dh_is_64bit_ptr (TCG_TARGET_REG_BITS == 64)
+#define dh_is_64bit_ptr (sizeof(void *) == 8)
 #define dh_is_64bit(t) glue(dh_is_64bit_, dh_alias(t))
 
 #define dh_is_signed_void 0
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 7eabf22..4089d89 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -2865,7 +2865,7 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
 #define tcg_gen_muls2_tl tcg_gen_muls2_i32
 #endif
 
-#if TCG_TARGET_REG_BITS == 32
+#if UINTPTR_MAX == UINT32_MAX
 # define tcg_gen_ld_ptr(R, A, O) \
     tcg_gen_ld_i32(TCGV_PTR_TO_NAT(R), (A), (O))
 # define tcg_gen_discard_ptr(A) \
@@ -2887,4 +2887,4 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
     tcg_gen_addi_i64(TCGV_PTR_TO_NAT(R), TCGV_PTR_TO_NAT(A), (B))
 # define tcg_gen_ext_i32_ptr(R, A) \
     tcg_gen_ext_i32_i64(TCGV_PTR_TO_NAT(R), (A))
-#endif /* TCG_TARGET_REG_BITS == 32 */
+#endif /* UINTPTR_MAX == UINT32_MAX */
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 02/14] tcg: Add INDEX_op_trunc_i32
  2014-03-17 18:37 [Qemu-devel] [PATCH 00/14] tcg/sparc v8plus code generation Richard Henderson
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 01/14] tcg: Fix missed pointer size != TCG_TARGET_REG_BITS changes Richard Henderson
@ 2014-03-17 18:37 ` Richard Henderson
  2014-03-21 23:35   ` Stuart Brady
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 03/14] tcg-sparc: Remove most uses of TCG_TARGET_REG_BITS Richard Henderson
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 19+ messages in thread
From: Richard Henderson @ 2014-03-17 18:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, aurelien

Let the backend do something special for truncation.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/README               |  5 +++++
 tcg/aarch64/tcg-target.h |  1 +
 tcg/i386/tcg-target.h    |  1 +
 tcg/ia64/tcg-target.h    |  1 +
 tcg/optimize.c           | 16 ++++++++++++++++
 tcg/ppc64/tcg-target.h   |  1 +
 tcg/s390/tcg-target.h    |  1 +
 tcg/sparc/tcg-target.h   |  1 +
 tcg/tcg-op.h             | 50 ++++++++++++++++++++++++++++++++----------------
 tcg/tcg-opc.h            |  4 ++++
 tcg/tcg.h                |  1 +
 tcg/tci/tcg-target.h     |  1 +
 12 files changed, 67 insertions(+), 16 deletions(-)

diff --git a/tcg/README b/tcg/README
index f178212..160cbe8 100644
--- a/tcg/README
+++ b/tcg/README
@@ -306,6 +306,11 @@ This operation would be equivalent to
 
   dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
 
+* trunc_i32 t0, t1, pos
+
+For 64-bit hosts only, right shift the 64-bit input T1 by POS and
+truncate to 32-bit output T0.  Depending on the host, this may be
+a simple mov/shift, or may require additional canonicalization.
 
 ********* Conditional moves
 
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 988983e..6363151 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -63,6 +63,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
+#define TCG_TARGET_HAS_trunc_i32        0
 
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index bdf2222..de34269 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -101,6 +101,7 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_mulsh_i32        0
 
 #if TCG_TARGET_REG_BITS == 64
+#define TCG_TARGET_HAS_trunc_i32        0
 #define TCG_TARGET_HAS_div2_i64         1
 #define TCG_TARGET_HAS_rot_i64          1
 #define TCG_TARGET_HAS_ext8s_i64        1
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index 52a939c..4962dca 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -152,6 +152,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muluh_i64        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 #define TCG_TARGET_HAS_mulsh_i64        0
+#define TCG_TARGET_HAS_trunc_i32        0
 
 #define TCG_TARGET_HAS_new_ldst         0
 
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 7777743..fee3292 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -228,6 +228,7 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
     case INDEX_op_shr_i32:
         return (uint32_t)x >> (uint32_t)y;
 
+    case INDEX_op_trunc_i32:
     case INDEX_op_shr_i64:
         return (uint64_t)x >> (uint64_t)y;
 
@@ -826,6 +827,10 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             }
             break;
 
+        case INDEX_op_trunc_i32:
+            mask = (uint64_t)temps[args[1]].mask >> args[2];
+            break;
+
         CASE_OP_32_64(shl):
             if (temps[args[2]].state == TCG_TEMP_CONST) {
                 mask = temps[args[1]].mask << temps[args[2]].val;
@@ -1017,6 +1022,17 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             }
             goto do_default;
 
+        case INDEX_op_trunc_i32:
+            if (temps[args[1]].state == TCG_TEMP_CONST) {
+                s->gen_opc_buf[op_index] = op_to_movi(op);
+                tmp = do_constant_folding(op, temps[args[1]].val, args[2]);
+                tcg_opt_gen_movi(gen_args, args[0], tmp);
+                gen_args += 2;
+                args += 3;
+                break;
+            }
+            goto do_default;
+
         CASE_OP_32_64(add):
         CASE_OP_32_64(sub):
         CASE_OP_32_64(mul):
diff --git a/tcg/ppc64/tcg-target.h b/tcg/ppc64/tcg-target.h
index 7ee50b6..52a8127 100644
--- a/tcg/ppc64/tcg-target.h
+++ b/tcg/ppc64/tcg-target.h
@@ -97,6 +97,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
+#define TCG_TARGET_HAS_trunc_i32        0
 
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          0
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 10adb77..192cdbb 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -71,6 +71,7 @@ typedef enum TCGReg {
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
+#define TCG_TARGET_HAS_trunc_i32        0
 
 #define TCG_TARGET_HAS_div2_i64         1
 #define TCG_TARGET_HAS_rot_i64          1
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 3abf1b4..8c3a8a4 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -119,6 +119,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mulsh_i32        0
 
 #if TCG_TARGET_REG_BITS == 64
+#define TCG_TARGET_HAS_trunc_i32        0
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_rot_i64          0
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 4089d89..71820b1 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -1624,9 +1624,20 @@ static inline void tcg_gen_ext32u_i64(TCGv_i64 ret, TCGv_i64 arg)
     tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
 }
 
-static inline void tcg_gen_trunc_i64_i32(TCGv_i32 ret, TCGv_i64 arg)
+static inline void tcg_gen_trunc_rshift_i64_i32(TCGv_i32 ret, TCGv_i64 arg,
+                                                unsigned int count)
 {
-    tcg_gen_mov_i32(ret, TCGV_LOW(arg));
+    tcg_debug_assert(count < 64);
+    if (count >= 32) {
+        tcg_gen_shri_i32(ret, TCGV_HIGH(arg), count - 32);
+    } else if (count == 0) {
+        tcg_gen_mov_i32(ret, TCGV_LOW(arg));
+    } else {
+        TCGv_i64 t = tcg_temp_new_i64();
+        tcg_gen_shri_i64(t, arg, count);
+        tcg_gen_mov_i32(ret, TCGV_LOW(t));
+        tcg_temp_free_i64(t);
+    }
 }
 
 static inline void tcg_gen_extu_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
@@ -1727,11 +1738,21 @@ static inline void tcg_gen_ext32u_i64(TCGv_i64 ret, TCGv_i64 arg)
     }
 }
 
-/* Note: we assume the target supports move between 32 and 64 bit
-   registers.  This will probably break MIPS64 targets.  */
-static inline void tcg_gen_trunc_i64_i32(TCGv_i32 ret, TCGv_i64 arg)
+static inline void tcg_gen_trunc_rshift_i64_i32(TCGv_i32 ret, TCGv_i64 arg,
+                                                unsigned int count)
 {
-    tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(arg)));
+    tcg_debug_assert(count < 64);
+    if (TCG_TARGET_HAS_trunc_i32) {
+        tcg_gen_op3i_i32(INDEX_op_trunc_i32, ret,
+                         MAKE_TCGV_I32(GET_TCGV_I64(arg)), count);
+    } else if (count == 0) {
+        tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(arg)));
+    } else {
+        TCGv_i64 t = tcg_temp_new_i64();
+        tcg_gen_shri_i64(t, arg, count);
+        tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(t)));
+        tcg_temp_free_i64(t);
+    }
 }
 
 /* Note: we assume the target supports move between 32 and 64 bit
@@ -2275,18 +2296,15 @@ static inline void tcg_gen_concat32_i64(TCGv_i64 dest, TCGv_i64 low,
     tcg_gen_deposit_i64(dest, low, high, 32, 32);
 }
 
+static inline void tcg_gen_trunc_i64_i32(TCGv_i32 ret, TCGv_i64 arg)
+{
+    tcg_gen_trunc_rshift_i64_i32(ret, arg, 0);
+}
+
 static inline void tcg_gen_extr_i64_i32(TCGv_i32 lo, TCGv_i32 hi, TCGv_i64 arg)
 {
-#if TCG_TARGET_REG_BITS == 32
-    tcg_gen_mov_i32(lo, TCGV_LOW(arg));
-    tcg_gen_mov_i32(hi, TCGV_HIGH(arg));
-#else
-    TCGv_i64 t0 = tcg_temp_new_i64();
-    tcg_gen_trunc_i64_i32(lo, arg);
-    tcg_gen_shri_i64(t0, arg, 32);
-    tcg_gen_trunc_i64_i32(hi, t0);
-    tcg_temp_free_i64(t0);
-#endif
+    tcg_gen_trunc_rshift_i64_i32(lo, arg, 0);
+    tcg_gen_trunc_rshift_i64_i32(hi, arg, 32);
 }
 
 static inline void tcg_gen_extr32_i64(TCGv_i64 lo, TCGv_i64 hi, TCGv_i64 arg)
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index d71707d..ae06e27 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -147,6 +147,10 @@ DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
 
+DEF(trunc_i32, 1, 1, 1,
+    IMPL(TCG_TARGET_HAS_trunc_i32)
+    | (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
+
 DEF(brcond_i64, 0, 2, 2, TCG_OPF_BB_END | IMPL64)
 DEF(ext8s_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ext8s_i64))
 DEF(ext16s_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ext16s_i64))
diff --git a/tcg/tcg.h b/tcg/tcg.h
index f7efcb4..c336227 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -66,6 +66,7 @@ typedef uint64_t TCGRegSet;
 
 #if TCG_TARGET_REG_BITS == 32
 /* Turn some undef macros into false macros.  */
+#define TCG_TARGET_HAS_trunc_i32        0
 #define TCG_TARGET_HAS_div_i64          0
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_div2_i64         0
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 6e1da8c..fdd5ada 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -88,6 +88,7 @@
 #define TCG_TARGET_HAS_mulsh_i32        0
 
 #if TCG_TARGET_REG_BITS == 64
+#define TCG_TARGET_HAS_trunc_i32        0
 #define TCG_TARGET_HAS_bswap16_i64      1
 #define TCG_TARGET_HAS_bswap32_i64      1
 #define TCG_TARGET_HAS_bswap64_i64      1
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 03/14] tcg-sparc: Remove most uses of TCG_TARGET_REG_BITS
  2014-03-17 18:37 [Qemu-devel] [PATCH 00/14] tcg/sparc v8plus code generation Richard Henderson
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 01/14] tcg: Fix missed pointer size != TCG_TARGET_REG_BITS changes Richard Henderson
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 02/14] tcg: Add INDEX_op_trunc_i32 Richard Henderson
@ 2014-03-17 18:37 ` Richard Henderson
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 04/14] tcg-sparc: Support trunc_i32 Richard Henderson
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2014-03-17 18:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, aurelien

Replace with SPARC64 define.  Soon even sparcv8plus will use
64-bit register as far as TCG is concerned.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c | 70 ++++++++++++++++++++++++++------------------------
 1 file changed, 37 insertions(+), 33 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 152335c..107c1ab 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -61,6 +61,12 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 };
 #endif
 
+#ifdef __arch64__
+# define SPARC64 1
+#else
+# define SPARC64 0
+#endif
+
 /* Define some temporary registers.  T2 is used for constant generation.  */
 #define TCG_REG_T1  TCG_REG_G1
 #define TCG_REG_T2  TCG_REG_O7
@@ -396,9 +402,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     }
 
     /* A 32-bit constant, or 32-bit zero-extended to 64-bits.  */
-    if (TCG_TARGET_REG_BITS == 32
-        || type == TCG_TYPE_I32
-        || (arg & ~0xffffffffu) == 0) {
+    if (type == TCG_TYPE_I32 || (arg & ~0xffffffffu) == 0) {
         tcg_out_sethi(s, ret, arg);
         if (arg & 0x3ff) {
             tcg_out_arithi(s, ret, ret, arg & 0x3ff, ARITH_OR);
@@ -582,7 +586,7 @@ static void tcg_out_movcond_i32(TCGContext *s, TCGCond cond, TCGArg ret,
     tcg_out_movcc(s, cond, MOVCC_ICC, ret, v1, v1const);
 }
 
-#if TCG_TARGET_REG_BITS == 64
+#if SPARC64
 static void tcg_out_brcond_i64(TCGContext *s, TCGCond cond, TCGArg arg1,
                                TCGArg arg2, int const_arg2, int label)
 {
@@ -720,7 +724,7 @@ static void tcg_out_setcond_i32(TCGContext *s, TCGCond cond, TCGArg ret,
     }
 }
 
-#if TCG_TARGET_REG_BITS == 64
+#if SPARC64
 static void tcg_out_setcond_i64(TCGContext *s, TCGCond cond, TCGArg ret,
                                 TCGArg c1, TCGArg c2, int c2const)
 {
@@ -852,7 +856,7 @@ static void build_trampolines(TCGContext *s)
         qemu_ld_trampoline[i] = tramp;
 
         /* Find the retaddr argument register.  */
-        ra = TCG_REG_O3 + (TARGET_LONG_BITS > TCG_TARGET_REG_BITS);
+        ra = TCG_REG_O3 + (!SPARC64 && TARGET_LONG_BITS == 64);
 
         /* Set the retaddr operand.  */
         tcg_out_mov(s, TCG_TYPE_PTR, ra, TCG_REG_O7);
@@ -879,8 +883,8 @@ static void build_trampolines(TCGContext *s)
         /* Find the retaddr argument.  For 32-bit, this may be past the
            last argument register, and need passing on the stack.  */
         ra = (TCG_REG_O4
-              + (TARGET_LONG_BITS > TCG_TARGET_REG_BITS)
-              + (TCG_TARGET_REG_BITS == 32 && (i & MO_SIZE) == MO_64));
+              + (!SPARC64 && TARGET_LONG_BITS == 64)
+              + (!SPARC64 && (i & MO_SIZE) == MO_64));
 
         /* Set the retaddr operand.  */
         if (ra >= TCG_REG_O6) {
@@ -959,7 +963,7 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     TCGReg addr = addrlo;
     int tlb_ofs;
 
-    if (TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 64) {
+    if (!SPARC64 && TARGET_LONG_BITS == 64) {
         /* Assemble the 64-bit address in R0.  */
         tcg_out_arithi(s, r0, addrlo, 0, SHIFT_SRL);
         tcg_out_arithi(s, r1, addrhi, 32, SHIFT_SLLX);
@@ -1001,7 +1005,7 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     tcg_out_cmp(s, r0, r2, 0);
 
     /* If the guest address must be zero-extended, do so now.  */
-    if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
+    if (SPARC64 && TARGET_LONG_BITS == 32) {
         tcg_out_arithi(s, r0, addrlo, 0, SHIFT_SRL);
         return r0;
     }
@@ -1050,9 +1054,9 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 #endif
 
     datalo = *args++;
-    datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
+    datahi = (!SPARC64 && is64 ? *args++ : 0);
     addrlo = *args++;
-    addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
+    addrhi = (!SPARC64 && TARGET_LONG_BITS == 64 ? *args++ : 0);
     memop = *args++;
     s_bits = memop & MO_SIZE;
 
@@ -1061,7 +1065,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     addrz = tcg_out_tlb_load(s, addrlo, addrhi, memi, s_bits,
                              offsetof(CPUTLBEntry, addr_read));
 
-    if (TCG_TARGET_REG_BITS == 32 && s_bits == MO_64) {
+    if (!SPARC64 && s_bits == MO_64) {
         int reg64;
 
         /* bne,pn %[xi]cc, label0 */
@@ -1143,11 +1147,11 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     *label_ptr[1] |= INSN_OFF19((unsigned long)s->code_ptr -
                                 (unsigned long)label_ptr[1]);
 #else
-    if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
+    if (SPARC64 && TARGET_LONG_BITS == 32) {
         tcg_out_arithi(s, TCG_REG_T1, addrlo, 0, SHIFT_SRL);
         addrlo = TCG_REG_T1;
     }
-    if (TCG_TARGET_REG_BITS == 32 && s_bits == MO_64) {
+    if (!SPARC64 && s_bits == MO_64) {
         int reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
 
         tcg_out_ldst_rr(s, reg64, addrlo,
@@ -1178,9 +1182,9 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 #endif
 
     datalo = *args++;
-    datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
+    datahi = (!SPARC64 && is64 ? *args++ : 0);
     addrlo = *args++;
-    addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
+    addrhi = (!SPARC64 && TARGET_LONG_BITS == 64 ? *args++ : 0);
     memop = *args++;
     s_bits = memop & MO_SIZE;
 
@@ -1190,7 +1194,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
                              offsetof(CPUTLBEntry, addr_write));
 
     datafull = datalo;
-    if (TCG_TARGET_REG_BITS == 32 && s_bits == MO_64) {
+    if (!SPARC64 && s_bits == MO_64) {
         /* Reconstruct the full 64-bit value.  */
         tcg_out_arithi(s, TCG_REG_T1, datalo, 0, SHIFT_SRL);
         tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
@@ -1228,11 +1232,11 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     *label_ptr |= INSN_OFF19((unsigned long)s->code_ptr -
                              (unsigned long)label_ptr);
 #else
-    if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
+    if (SPARC64 && TARGET_LONG_BITS == 32) {
         tcg_out_arithi(s, TCG_REG_T1, addrlo, 0, SHIFT_SRL);
         addrlo = TCG_REG_T1;
     }
-    if (TCG_TARGET_REG_BITS == 32 && s_bits == MO_64) {
+    if (!SPARC64 && s_bits == MO_64) {
         tcg_out_arithi(s, TCG_REG_T1, datalo, 0, SHIFT_SRL);
         tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
         tcg_out_arith(s, TCG_REG_O2, TCG_REG_T1, TCG_REG_O2, ARITH_OR);
@@ -1288,7 +1292,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_movi(s, TCG_TYPE_I32, args[0], (uint32_t)args[1]);
         break;
 
-#if TCG_TARGET_REG_BITS == 64
+#if SPARC64
 #define OP_32_64(x)                             \
         glue(glue(case INDEX_op_, x), _i32):    \
         glue(glue(case INDEX_op_, x), _i64)
@@ -1309,7 +1313,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_ldst(s, args[0], args[1], args[2], LDSH);
         break;
     case INDEX_op_ld_i32:
-#if TCG_TARGET_REG_BITS == 64
+#if SPARC64
     case INDEX_op_ld32u_i64:
 #endif
         tcg_out_ldst(s, args[0], args[1], args[2], LDUW);
@@ -1321,7 +1325,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_ldst(s, args[0], args[1], args[2], STH);
         break;
     case INDEX_op_st_i32:
-#if TCG_TARGET_REG_BITS == 64
+#if SPARC64
     case INDEX_op_st32_i64:
 #endif
         tcg_out_ldst(s, args[0], args[1], args[2], STW);
@@ -1390,7 +1394,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
                             args[2], const_args[2], args[3], const_args[3]);
         break;
 
-#if TCG_TARGET_REG_BITS == 32
+#if !SPARC64
     case INDEX_op_brcond2_i32:
         tcg_out_brcond2_i32(s, args[4], args[0], args[1],
                             args[2], const_args[2],
@@ -1432,7 +1436,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_qemu_st(s, args, 1);
         break;
 
-#if TCG_TARGET_REG_BITS == 64
+#if SPARC64
     case INDEX_op_movi_i64:
         tcg_out_movi(s, TCG_TYPE_I64, args[0], args[1]);
         break;
@@ -1539,7 +1543,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_setcond_i32, { "r", "rZ", "rJ" } },
     { INDEX_op_movcond_i32, { "r", "rZ", "rJ", "rI", "0" } },
 
-#if TCG_TARGET_REG_BITS == 32
+#if !SPARC64
     { INDEX_op_brcond2_i32, { "rZ", "rZ", "rJ", "rJ" } },
     { INDEX_op_setcond2_i32, { "r", "rZ", "rZ", "rJ", "rJ" } },
 #endif
@@ -1548,7 +1552,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_sub2_i32, { "r", "r", "rZ", "rZ", "rJ", "rJ" } },
     { INDEX_op_mulu2_i32, { "r", "r", "rZ", "rJ" } },
 
-#if TCG_TARGET_REG_BITS == 64
+#if SPARC64
     { INDEX_op_mov_i64, { "r", "r" } },
     { INDEX_op_movi_i64, { "r" } },
     { INDEX_op_ld8u_i64, { "r", "r" } },
@@ -1589,12 +1593,12 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_movcond_i64, { "r", "rZ", "rJ", "rI", "0" } },
 #endif
 
-#if TCG_TARGET_REG_BITS == 64
+#if SPARC64
     { INDEX_op_qemu_ld_i32, { "r", "L" } },
     { INDEX_op_qemu_ld_i64, { "r", "L" } },
     { INDEX_op_qemu_st_i32, { "L", "L" } },
     { INDEX_op_qemu_st_i64, { "L", "L" } },
-#elif TARGET_LONG_BITS <= TCG_TARGET_REG_BITS
+#elif TARGET_LONG_BITS == 32
     { INDEX_op_qemu_ld_i32, { "r", "L" } },
     { INDEX_op_qemu_ld_i64, { "r", "r", "L" } },
     { INDEX_op_qemu_st_i32, { "L", "L" } },
@@ -1612,7 +1616,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
 static void tcg_target_init(TCGContext *s)
 {
     tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xffffffff);
-#if TCG_TARGET_REG_BITS == 64
+#if SPARC64
     tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0, 0xffffffff);
 #endif
     tcg_regset_set32(tcg_target_call_clobber_regs, 0,
@@ -1644,7 +1648,7 @@ static void tcg_target_init(TCGContext *s)
     tcg_add_target_add_op_defs(sparc_op_defs);
 }
 
-#if TCG_TARGET_REG_BITS == 64
+#if SPARC64
 # define ELF_HOST_MACHINE  EM_SPARCV9
 #else
 # define ELF_HOST_MACHINE  EM_SPARC32PLUS
@@ -1654,7 +1658,7 @@ static void tcg_target_init(TCGContext *s)
 typedef struct {
     DebugFrameCIE cie;
     DebugFrameFDEHeader fde;
-    uint8_t fde_def_cfa[TCG_TARGET_REG_BITS == 64 ? 4 : 2];
+    uint8_t fde_def_cfa[SPARC64 ? 4 : 2];
     uint8_t fde_win_save;
     uint8_t fde_ret_save[3];
 } DebugFrame;
@@ -1671,7 +1675,7 @@ static DebugFrame debug_frame = {
     .fde.len = sizeof(DebugFrame) - offsetof(DebugFrame, fde.cie_offset),
 
     .fde_def_cfa = {
-#if TCG_TARGET_REG_BITS == 64
+#if SPARC64
         12, 30,                         /* DW_CFA_def_cfa i6, 2047 */
         (2047 & 0x7f) | 0x80, (2047 >> 7)
 #else
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 04/14] tcg-sparc: Support trunc_i32
  2014-03-17 18:37 [Qemu-devel] [PATCH 00/14] tcg/sparc v8plus code generation Richard Henderson
                   ` (2 preceding siblings ...)
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 03/14] tcg-sparc: Remove most uses of TCG_TARGET_REG_BITS Richard Henderson
@ 2014-03-17 18:37 ` Richard Henderson
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 05/14] tcg-sparc: Use 64-bit registers with sparcv8plus Richard Henderson
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2014-03-17 18:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, aurelien

Unlike a 64-bit shift op, allows the output to be in %l or %i registers
for sparcv8plus.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c | 8 ++++++++
 tcg/sparc/tcg-target.h | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 107c1ab..38e8577 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1476,6 +1476,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_ext32u_i64:
         tcg_out_arithi(s, args[0], args[1], 0, SHIFT_SRL);
         break;
+    case INDEX_op_trunc_i32:
+        if (args[2] == 0) {
+            tcg_out_mov(s, TCG_TYPE_I32, args[0], args[1]);
+        } else {
+            tcg_out_arithi(s, args[0], args[1], args[2], SHIFT_SRLX);
+        }
+        break;
 
     case INDEX_op_brcond_i64:
         tcg_out_brcond_i64(s, args[2], args[0], args[1], const_args[1],
@@ -1587,6 +1594,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
 
     { INDEX_op_ext32s_i64, { "r", "r" } },
     { INDEX_op_ext32u_i64, { "r", "r" } },
+    { INDEX_op_trunc_i32,  { "r", "r" } },
 
     { INDEX_op_brcond_i64, { "rZ", "rJ" } },
     { INDEX_op_setcond_i64, { "r", "rZ", "rJ" } },
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 8c3a8a4..7b8b33e 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -119,7 +119,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mulsh_i32        0
 
 #if TCG_TARGET_REG_BITS == 64
-#define TCG_TARGET_HAS_trunc_i32        0
+#define TCG_TARGET_HAS_trunc_i32        1
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_rot_i64          0
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 05/14] tcg-sparc: Use 64-bit registers with sparcv8plus
  2014-03-17 18:37 [Qemu-devel] [PATCH 00/14] tcg/sparc v8plus code generation Richard Henderson
                   ` (3 preceding siblings ...)
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 04/14] tcg-sparc: Support trunc_i32 Richard Henderson
@ 2014-03-17 18:37 ` Richard Henderson
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 06/14] tcg-sparc: Use the RETURN instruction Richard Henderson
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2014-03-17 18:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, aurelien

Quite a lot of effort was spent composing and decomposing 64-bit
quantities in registers, when we should just create them and leave
them as one 64-bit register.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c | 525 +++++++++++++++++--------------------------------
 tcg/sparc/tcg-target.h |  14 +-
 tcg/tcg.c              |  45 ++++-
 3 files changed, 228 insertions(+), 356 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 38e8577..fb5ba27 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -67,6 +67,18 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 # define SPARC64 0
 #endif
 
+/* Note that sparcv8plus can only hold 64 bit quantities in %g and %o
+   registers.  These are saved manually by the kernel in full 64-bit
+   slots.  The %i and %l registers are saved by the register window
+   mechanism, which only allocates space for 32 bits.  Given that this
+   window spill/fill can happen on any signal, we must consider the
+   high bits of the %i and %l registers garbage at all times.  */
+#if SPARC64
+# define ALL_64  0xffffffffu
+#else
+# define ALL_64  0xffffu
+#endif
+
 /* Define some temporary registers.  T2 is used for constant generation.  */
 #define TCG_REG_T1  TCG_REG_G1
 #define TCG_REG_T2  TCG_REG_O7
@@ -307,14 +319,27 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         ct->ct |= TCG_CT_REG;
         tcg_regset_set32(ct->u.regs, 0, 0xffffffff);
         break;
-    case 'L': /* qemu_ld/st constraint */
+    case 'R':
         ct->ct |= TCG_CT_REG;
-        tcg_regset_set32(ct->u.regs, 0, 0xffffffff);
-        // Helper args
+        tcg_regset_set32(ct->u.regs, 0, ALL_64);
+        break;
+    case 'A': /* qemu_ld/st address constraint */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set32(ct->u.regs, 0,
+                         TARGET_LONG_BITS == 64 ? ALL_64 : 0xffffffff);
+    reserve_helpers:
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_O0);
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_O1);
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_O2);
         break;
+    case 's': /* qemu_st data 32-bit constraint */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set32(ct->u.regs, 0, 0xffffffff);
+        goto reserve_helpers;
+    case 'S': /* qemu_st data 64-bit constraint */
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set32(ct->u.regs, 0, ALL_64);
+        goto reserve_helpers;
     case 'I':
         ct->ct |= TCG_CT_CONST_S11;
         break;
@@ -402,7 +427,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     }
 
     /* A 32-bit constant, or 32-bit zero-extended to 64-bits.  */
-    if (type == TCG_TYPE_I32 || (arg & ~0xffffffffu) == 0) {
+    if (type == TCG_TYPE_I32 || arg == (uint32_t)arg) {
         tcg_out_sethi(s, ret, arg);
         if (arg & 0x3ff) {
             tcg_out_arithi(s, ret, ret, arg & 0x3ff, ARITH_OR);
@@ -420,12 +445,12 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     /* A 64-bit constant decomposed into 2 32-bit pieces.  */
     lo = (int32_t)arg;
     if (check_fit_tl(lo, 13)) {
-        hi = (arg - lo) >> 31 >> 1;
+        hi = (arg - lo) >> 32;
         tcg_out_movi(s, TCG_TYPE_I32, ret, hi);
         tcg_out_arithi(s, ret, ret, 32, SHIFT_SLLX);
         tcg_out_arithi(s, ret, ret, lo, ARITH_ADD);
     } else {
-        hi = arg >> 31 >> 1;
+        hi = arg >> 32;
         tcg_out_movi(s, TCG_TYPE_I32, ret, hi);
         tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_T2, lo);
         tcg_out_arithi(s, ret, ret, 32, SHIFT_SLLX);
@@ -586,7 +611,6 @@ static void tcg_out_movcond_i32(TCGContext *s, TCGCond cond, TCGArg ret,
     tcg_out_movcc(s, cond, MOVCC_ICC, ret, v1, v1const);
 }
 
-#if SPARC64
 static void tcg_out_brcond_i64(TCGContext *s, TCGCond cond, TCGArg arg1,
                                TCGArg arg2, int const_arg2, int label)
 {
@@ -634,45 +658,6 @@ static void tcg_out_movcond_i64(TCGContext *s, TCGCond cond, TCGArg ret,
         tcg_out_movcc(s, cond, MOVCC_XCC, ret, v1, v1const);
     }
 }
-#else
-static void tcg_out_brcond2_i32(TCGContext *s, TCGCond cond,
-                                TCGArg al, TCGArg ah,
-                                TCGArg bl, int blconst,
-                                TCGArg bh, int bhconst, int label_dest)
-{
-    int scond, label_next = gen_new_label();
-
-    tcg_out_cmp(s, ah, bh, bhconst);
-
-    /* Note that we fill one of the delay slots with the second compare.  */
-    switch (cond) {
-    case TCG_COND_EQ:
-        tcg_out_bpcc(s, COND_NE, BPCC_ICC | BPCC_PT, label_next);
-        tcg_out_cmp(s, al, bl, blconst);
-        tcg_out_bpcc(s, COND_E, BPCC_ICC | BPCC_PT, label_dest);
-        break;
-
-    case TCG_COND_NE:
-        tcg_out_bpcc(s, COND_NE, BPCC_ICC | BPCC_PT, label_dest);
-        tcg_out_cmp(s, al, bl, blconst);
-        tcg_out_bpcc(s, COND_NE, BPCC_ICC | BPCC_PT, label_dest);
-        break;
-
-    default:
-        scond = tcg_cond_to_bcond[tcg_high_cond(cond)];
-        tcg_out_bpcc(s, scond, BPCC_ICC | BPCC_PT, label_dest);
-        tcg_out_nop(s);
-        tcg_out_bpcc(s, COND_NE, BPCC_ICC | BPCC_PT, label_next);
-        tcg_out_cmp(s, al, bl, blconst);
-        scond = tcg_cond_to_bcond[tcg_unsigned_cond(cond)];
-        tcg_out_bpcc(s, scond, BPCC_ICC | BPCC_PT, label_dest);
-        break;
-    }
-    tcg_out_nop(s);
-
-    tcg_out_label(s, label_next, s->code_ptr);
-}
-#endif
 
 static void tcg_out_setcond_i32(TCGContext *s, TCGCond cond, TCGArg ret,
                                 TCGArg c1, TCGArg c2, int c2const)
@@ -724,7 +709,6 @@ static void tcg_out_setcond_i32(TCGContext *s, TCGCond cond, TCGArg ret,
     }
 }
 
-#if SPARC64
 static void tcg_out_setcond_i64(TCGContext *s, TCGCond cond, TCGArg ret,
                                 TCGArg c1, TCGArg c2, int c2const)
 {
@@ -739,48 +723,6 @@ static void tcg_out_setcond_i64(TCGContext *s, TCGCond cond, TCGArg ret,
         tcg_out_movcc(s, cond, MOVCC_XCC, ret, 1, 1);
     }
 }
-#else
-static void tcg_out_setcond2_i32(TCGContext *s, TCGCond cond, TCGArg ret,
-                                 TCGArg al, TCGArg ah,
-                                 TCGArg bl, int blconst,
-                                 TCGArg bh, int bhconst)
-{
-    int tmp = TCG_REG_T1;
-
-    /* Note that the low parts are fully consumed before tmp is set.  */
-    if (ret != ah && (bhconst || ret != bh)) {
-        tmp = ret;
-    }
-
-    switch (cond) {
-    case TCG_COND_EQ:
-    case TCG_COND_NE:
-        if (bl == 0 && bh == 0) {
-            if (cond == TCG_COND_EQ) {
-                tcg_out_arith(s, TCG_REG_G0, al, ah, ARITH_ORCC);
-                tcg_out_movi(s, TCG_TYPE_I32, ret, 1);
-            } else {
-                tcg_out_arith(s, ret, al, ah, ARITH_ORCC);
-            }
-        } else {
-            tcg_out_setcond_i32(s, cond, tmp, al, bl, blconst);
-            tcg_out_cmp(s, ah, bh, bhconst);
-            tcg_out_mov(s, TCG_TYPE_I32, ret, tmp);
-        }
-        tcg_out_movcc(s, TCG_COND_NE, MOVCC_ICC, ret, cond == TCG_COND_NE, 1);
-        break;
-
-    default:
-        /* <= : ah < bh | (ah == bh && al <= bl) */
-        tcg_out_setcond_i32(s, tcg_unsigned_cond(cond), tmp, al, bl, blconst);
-        tcg_out_cmp(s, ah, bh, bhconst);
-        tcg_out_mov(s, TCG_TYPE_I32, ret, tmp);
-        tcg_out_movcc(s, TCG_COND_NE, MOVCC_ICC, ret, 0, 1);
-        tcg_out_movcc(s, tcg_high_cond(cond), MOVCC_ICC, ret, 1, 1);
-        break;
-    }
-}
-#endif
 
 static void tcg_out_addsub2(TCGContext *s, TCGArg rl, TCGArg rh,
                             TCGArg al, TCGArg ah, TCGArg bl, int blconst,
@@ -855,8 +797,13 @@ static void build_trampolines(TCGContext *s)
         }
         qemu_ld_trampoline[i] = tramp;
 
-        /* Find the retaddr argument register.  */
-        ra = TCG_REG_O3 + (!SPARC64 && TARGET_LONG_BITS == 64);
+        if (SPARC64 || TARGET_LONG_BITS == 32) {
+            ra = TCG_REG_O3;
+        } else {
+            /* Install the high part of the address.  */
+            tcg_out_arithi(s, TCG_REG_O1, TCG_REG_O2, 32, SHIFT_SRLX);
+            ra = TCG_REG_O4;
+        }
 
         /* Set the retaddr operand.  */
         tcg_out_mov(s, TCG_TYPE_PTR, ra, TCG_REG_O7);
@@ -880,12 +827,28 @@ static void build_trampolines(TCGContext *s)
         }
         qemu_st_trampoline[i] = tramp;
 
-        /* Find the retaddr argument.  For 32-bit, this may be past the
-           last argument register, and need passing on the stack.  */
-        ra = (TCG_REG_O4
-              + (!SPARC64 && TARGET_LONG_BITS == 64)
-              + (!SPARC64 && (i & MO_SIZE) == MO_64));
-
+        if (SPARC64) {
+            ra = TCG_REG_O4;
+        } else {
+            ra = TCG_REG_O1;
+            if (TARGET_LONG_BITS == 64) {
+                /* Install the high part of the address.  */
+                tcg_out_arithi(s, ra, ra + 1, 32, SHIFT_SRLX);
+                ra += 2;
+            } else {
+                ra += 1;
+            }
+            if ((i & MO_SIZE) == MO_64) {
+                /* Install the high part of the data.  */
+                tcg_out_arithi(s, ra, ra + 1, 32, SHIFT_SRLX);
+                ra += 2;
+            } else {
+                ra += 1;
+            }
+            /* Skip the mem_index argument.  */
+            ra += 1;
+        }
+                
         /* Set the retaddr operand.  */
         if (ra >= TCG_REG_O6) {
             tcg_out_st(s, TCG_TYPE_PTR, TCG_REG_O7, TCG_REG_CALL_STACK,
@@ -954,25 +917,16 @@ static void tcg_target_qemu_prologue(TCGContext *s)
    The result of the TLB comparison is in %[ix]cc.  The sanitized address
    is in the returned register, maybe %o0.  The TLB addend is in %o1.  */
 
-static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
-                               int mem_index, TCGMemOp s_bits, int which)
+static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addr, int mem_index,
+                               TCGMemOp s_bits, int which)
 {
     const TCGReg r0 = TCG_REG_O0;
     const TCGReg r1 = TCG_REG_O1;
     const TCGReg r2 = TCG_REG_O2;
-    TCGReg addr = addrlo;
     int tlb_ofs;
 
-    if (!SPARC64 && TARGET_LONG_BITS == 64) {
-        /* Assemble the 64-bit address in R0.  */
-        tcg_out_arithi(s, r0, addrlo, 0, SHIFT_SRL);
-        tcg_out_arithi(s, r1, addrhi, 32, SHIFT_SLLX);
-        tcg_out_arith(s, r0, r0, r1, ARITH_OR);
-        addr = r0;
-    }
-
     /* Shift the page number down.  */
-    tcg_out_arithi(s, r1, addrlo, TARGET_PAGE_BITS, SHIFT_SRL);
+    tcg_out_arithi(s, r1, addr, TARGET_PAGE_BITS, SHIFT_SRL);
 
     /* Mask out the page offset, except for the required alignment.  */
     tcg_out_movi(s, TCG_TYPE_TL, TCG_REG_T1,
@@ -1006,10 +960,10 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
 
     /* If the guest address must be zero-extended, do so now.  */
     if (SPARC64 && TARGET_LONG_BITS == 32) {
-        tcg_out_arithi(s, r0, addrlo, 0, SHIFT_SRL);
+        tcg_out_arithi(s, r0, addr, 0, SHIFT_SRL);
         return r0;
     }
-    return addrlo;
+    return addr;
 }
 #endif /* CONFIG_SOFTMMU */
 
@@ -1042,78 +996,37 @@ static const int qemu_st_opc[16] = {
     [MO_LEQ]  = STX_LE,
 };
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
+static void tcg_out_qemu_ld(TCGContext *s, TCGReg data, TCGReg addr,
+                            TCGMemOp memop, int memi, bool is_64)
 {
-    TCGReg addrlo, datalo, datahi, addrhi __attribute__((unused));
-    TCGMemOp memop, s_bits;
-#if defined(CONFIG_SOFTMMU)
+#ifdef CONFIG_SOFTMMU
+    TCGMemOp s_bits = memop & MO_SIZE;
     TCGReg addrz, param;
     uintptr_t func;
-    int memi;
-    uint32_t *label_ptr[2];
-#endif
-
-    datalo = *args++;
-    datahi = (!SPARC64 && is64 ? *args++ : 0);
-    addrlo = *args++;
-    addrhi = (!SPARC64 && TARGET_LONG_BITS == 64 ? *args++ : 0);
-    memop = *args++;
-    s_bits = memop & MO_SIZE;
+    uint32_t *label_ptr;
 
-#if defined(CONFIG_SOFTMMU)
-    memi = *args++;
-    addrz = tcg_out_tlb_load(s, addrlo, addrhi, memi, s_bits,
+    addrz = tcg_out_tlb_load(s, addr, memi, s_bits,
                              offsetof(CPUTLBEntry, addr_read));
 
-    if (!SPARC64 && s_bits == MO_64) {
-        int reg64;
-
-        /* bne,pn %[xi]cc, label0 */
-        label_ptr[0] = (uint32_t *)s->code_ptr;
-        tcg_out_bpcc0(s, COND_NE, BPCC_PN
-                      | (TARGET_LONG_BITS == 64 ? BPCC_XCC : BPCC_ICC), 0);
-        tcg_out_nop(s);
-
-        /* TLB Hit.  */
-        /* Load all 64-bits into an O/G register.  */
-        reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
-        tcg_out_ldst_rr(s, reg64, addrz, TCG_REG_O1, qemu_ld_opc[memop]);
+    /* The fast path is exactly one insn.  Thus we can perform the
+       entire TLB Hit in the (annulled) delay slot of the branch
+       over the TLB Miss case.  */
 
-        /* Move the two 32-bit pieces into the destination registers.  */
-        tcg_out_arithi(s, datahi, reg64, 32, SHIFT_SRLX);
-        if (reg64 != datalo) {
-            tcg_out_mov(s, TCG_TYPE_I32, datalo, reg64);
-        }
-
-        /* b,a,pt label1 */
-        label_ptr[1] = (uint32_t *)s->code_ptr;
-        tcg_out_bpcc0(s, COND_A, BPCC_A | BPCC_PT, 0);
-    } else {
-        /* The fast path is exactly one insn.  Thus we can perform the
-           entire TLB Hit in the (annulled) delay slot of the branch
-           over the TLB Miss case.  */
-
-        /* beq,a,pt %[xi]cc, label0 */
-        label_ptr[0] = NULL;
-        label_ptr[1] = (uint32_t *)s->code_ptr;
-        tcg_out_bpcc0(s, COND_E, BPCC_A | BPCC_PT
-                      | (TARGET_LONG_BITS == 64 ? BPCC_XCC : BPCC_ICC), 0);
-        /* delay slot */
-        tcg_out_ldst_rr(s, datalo, addrz, TCG_REG_O1, qemu_ld_opc[memop]);
-    }
+    /* beq,a,pt %[xi]cc, label0 */
+    label_ptr = (uint32_t *)s->code_ptr;
+    tcg_out_bpcc0(s, COND_E, BPCC_A | BPCC_PT
+                  | (TARGET_LONG_BITS == 64 ? BPCC_XCC : BPCC_ICC), 0);
+    /* delay slot */
+    tcg_out_ldst_rr(s, data, addrz, TCG_REG_O1, qemu_ld_opc[memop]);
 
     /* TLB Miss.  */
 
-    if (label_ptr[0]) {
-        *label_ptr[0] |= INSN_OFF19((unsigned long)s->code_ptr -
-                                    (unsigned long)label_ptr[0]);
-    }
-
     param = TCG_REG_O1;
-    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-        tcg_out_mov(s, TCG_TYPE_REG, param++, addrhi);
+    if (!SPARC64 && TARGET_LONG_BITS == 64) {
+        /* Skip the high-part; we'll perform the extract in the trampoline.  */
+        param++;
     }
-    tcg_out_mov(s, TCG_TYPE_REG, param++, addrlo);
+    tcg_out_mov(s, TCG_TYPE_REG, param++, addr);
 
     /* We use the helpers to extend SB and SW data, leaving the case
        of SL needing explicit extending below.  */
@@ -1127,81 +1040,54 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     /* delay slot */
     tcg_out_movi(s, TCG_TYPE_I32, param, memi);
 
-    switch (memop & ~MO_BSWAP) {
-    case MO_SL:
-        tcg_out_arithi(s, datalo, TCG_REG_O0, 0, SHIFT_SRA);
-        break;
-    case MO_Q:
-        if (TCG_TARGET_REG_BITS == 32) {
-            tcg_out_mov(s, TCG_TYPE_REG, datahi, TCG_REG_O0);
-            tcg_out_mov(s, TCG_TYPE_REG, datalo, TCG_REG_O1);
-            break;
+    /* Recall that all of the helpers return 64-bit results.
+       Which complicates things for sparcv8plus.  */
+    if (SPARC64) {
+        /* We let the helper sign-extend SB and SW, but leave SL for here.  */
+        if (is_64 && (memop & ~MO_BSWAP) == MO_SL) {
+            tcg_out_arithi(s, data, TCG_REG_O0, 0, SHIFT_SRA);
+        } else {
+            tcg_out_mov(s, TCG_TYPE_REG, data, TCG_REG_O0);
+        }
+    } else {
+        if (s_bits == MO_64) {
+            tcg_out_arithi(s, TCG_REG_O0, TCG_REG_O0, 32, SHIFT_SLLX);
+            tcg_out_arithi(s, TCG_REG_O1, TCG_REG_O1, 0, SHIFT_SRL);
+            tcg_out_arith(s, data, TCG_REG_O0, TCG_REG_O1, ARITH_OR);
+        } else if (is_64) {
+            /* Re-extend from 32-bit rather than reassembling when we
+               know the high register must be an extension.  */
+            tcg_out_arithi(s, data, TCG_REG_O1, 0,
+                           memop & MO_SIGN ? SHIFT_SRA : SHIFT_SRL);
+        } else {
+            tcg_out_mov(s, TCG_TYPE_I32, data, TCG_REG_O1);
         }
-        /* FALLTHRU */
-    default:
-        /* mov */
-        tcg_out_mov(s, TCG_TYPE_REG, datalo, TCG_REG_O0);
-        break;
     }
 
-    *label_ptr[1] |= INSN_OFF19((unsigned long)s->code_ptr -
-                                (unsigned long)label_ptr[1]);
+    *label_ptr |= INSN_OFF19((uintptr_t)s->code_ptr - (uintptr_t)label_ptr);
 #else
     if (SPARC64 && TARGET_LONG_BITS == 32) {
-        tcg_out_arithi(s, TCG_REG_T1, addrlo, 0, SHIFT_SRL);
-        addrlo = TCG_REG_T1;
-    }
-    if (!SPARC64 && s_bits == MO_64) {
-        int reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
-
-        tcg_out_ldst_rr(s, reg64, addrlo,
-                        (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
-                        qemu_ld_opc[memop]);
-
-        tcg_out_arithi(s, datahi, reg64, 32, SHIFT_SRLX);
-        if (reg64 != datalo) {
-            tcg_out_mov(s, TCG_TYPE_I32, datalo, reg64);
-        }
-    } else {
-        tcg_out_ldst_rr(s, datalo, addrlo,
-                        (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
-                        qemu_ld_opc[memop]);
+        tcg_out_arithi(s, TCG_REG_T1, addr, 0, SHIFT_SRL);
+        addr = TCG_REG_T1;
     }
+    tcg_out_ldst_rr(s, data, addr,
+                    (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
+                    qemu_ld_opc[memop]);
 #endif /* CONFIG_SOFTMMU */
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
+static void tcg_out_qemu_st(TCGContext *s, TCGReg data, TCGReg addr,
+                            TCGMemOp memop, int memi)
 {
-    TCGReg addrlo, datalo, datahi, addrhi __attribute__((unused));
-    TCGMemOp memop, s_bits;
-#if defined(CONFIG_SOFTMMU)
-    TCGReg addrz, datafull, param;
+#ifdef CONFIG_SOFTMMU
+    TCGMemOp s_bits = memop & MO_SIZE;
+    TCGReg addrz, param;
     uintptr_t func;
-    int memi;
     uint32_t *label_ptr;
-#endif
-
-    datalo = *args++;
-    datahi = (!SPARC64 && is64 ? *args++ : 0);
-    addrlo = *args++;
-    addrhi = (!SPARC64 && TARGET_LONG_BITS == 64 ? *args++ : 0);
-    memop = *args++;
-    s_bits = memop & MO_SIZE;
 
-#if defined(CONFIG_SOFTMMU)
-    memi = *args++;
-    addrz = tcg_out_tlb_load(s, addrlo, addrhi, memi, s_bits,
+    addrz = tcg_out_tlb_load(s, addr, memi, s_bits,
                              offsetof(CPUTLBEntry, addr_write));
 
-    datafull = datalo;
-    if (!SPARC64 && s_bits == MO_64) {
-        /* Reconstruct the full 64-bit value.  */
-        tcg_out_arithi(s, TCG_REG_T1, datalo, 0, SHIFT_SRL);
-        tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
-        tcg_out_arith(s, TCG_REG_O2, TCG_REG_T1, TCG_REG_O2, ARITH_OR);
-        datafull = TCG_REG_O2;
-    }
-
     /* The fast path is exactly one insn.  Thus we can perform the entire
        TLB Hit in the (annulled) delay slot of the branch over TLB Miss.  */
     /* beq,a,pt %[xi]cc, label0 */
@@ -1209,19 +1095,21 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     tcg_out_bpcc0(s, COND_E, BPCC_A | BPCC_PT
                   | (TARGET_LONG_BITS == 64 ? BPCC_XCC : BPCC_ICC), 0);
     /* delay slot */
-    tcg_out_ldst_rr(s, datafull, addrz, TCG_REG_O1, qemu_st_opc[memop]);
+    tcg_out_ldst_rr(s, data, addrz, TCG_REG_O1, qemu_st_opc[memop]);
 
     /* TLB Miss.  */
 
     param = TCG_REG_O1;
-    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-        tcg_out_mov(s, TCG_TYPE_REG, param++, addrhi);
+    if (!SPARC64 && TARGET_LONG_BITS == 64) {
+        /* Skip the high-part; we'll perform the extract in the trampoline.  */
+        param++;
     }
-    tcg_out_mov(s, TCG_TYPE_REG, param++, addrlo);
-    if (TCG_TARGET_REG_BITS == 32 && s_bits == MO_64) {
-        tcg_out_mov(s, TCG_TYPE_REG, param++, datahi);
+    tcg_out_mov(s, TCG_TYPE_REG, param++, addr);
+    if (!SPARC64 && s_bits == MO_64) {
+        /* Skip the high-part; we'll perform the extract in the trampoline.  */
+        param++;
     }
-    tcg_out_mov(s, TCG_TYPE_REG, param++, datalo);
+    tcg_out_mov(s, TCG_TYPE_REG, param++, data);
 
     func = qemu_st_trampoline[memop];
     assert(func != 0);
@@ -1229,20 +1117,13 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     /* delay slot */
     tcg_out_movi(s, TCG_TYPE_REG, param, memi);
 
-    *label_ptr |= INSN_OFF19((unsigned long)s->code_ptr -
-                             (unsigned long)label_ptr);
+    *label_ptr |= INSN_OFF19((uintptr_t)s->code_ptr - (uintptr_t)label_ptr);
 #else
     if (SPARC64 && TARGET_LONG_BITS == 32) {
-        tcg_out_arithi(s, TCG_REG_T1, addrlo, 0, SHIFT_SRL);
-        addrlo = TCG_REG_T1;
-    }
-    if (!SPARC64 && s_bits == MO_64) {
-        tcg_out_arithi(s, TCG_REG_T1, datalo, 0, SHIFT_SRL);
-        tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
-        tcg_out_arith(s, TCG_REG_O2, TCG_REG_T1, TCG_REG_O2, ARITH_OR);
-        datalo = TCG_REG_O2;
+        tcg_out_arithi(s, TCG_REG_T1, addr, 0, SHIFT_SRL);
+        addr = TCG_REG_T1;
     }
-    tcg_out_ldst_rr(s, datalo, addrlo,
+    tcg_out_ldst_rr(s, data, addr,
                     (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
                     qemu_st_opc[memop]);
 #endif /* CONFIG_SOFTMMU */
@@ -1292,14 +1173,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_movi(s, TCG_TYPE_I32, args[0], (uint32_t)args[1]);
         break;
 
-#if SPARC64
 #define OP_32_64(x)                             \
         glue(glue(case INDEX_op_, x), _i32):    \
         glue(glue(case INDEX_op_, x), _i64)
-#else
-#define OP_32_64(x)                             \
-        glue(glue(case INDEX_op_, x), _i32)
-#endif
+
     OP_32_64(ld8u):
         tcg_out_ldst(s, args[0], args[1], args[2], LDUB);
         break;
@@ -1313,9 +1190,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_ldst(s, args[0], args[1], args[2], LDSH);
         break;
     case INDEX_op_ld_i32:
-#if SPARC64
     case INDEX_op_ld32u_i64:
-#endif
         tcg_out_ldst(s, args[0], args[1], args[2], LDUW);
         break;
     OP_32_64(st8):
@@ -1325,9 +1200,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_ldst(s, args[0], args[1], args[2], STH);
         break;
     case INDEX_op_st_i32:
-#if SPARC64
     case INDEX_op_st32_i64:
-#endif
         tcg_out_ldst(s, args[0], args[1], args[2], STW);
         break;
     OP_32_64(add):
@@ -1394,19 +1267,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
                             args[2], const_args[2], args[3], const_args[3]);
         break;
 
-#if !SPARC64
-    case INDEX_op_brcond2_i32:
-        tcg_out_brcond2_i32(s, args[4], args[0], args[1],
-                            args[2], const_args[2],
-                            args[3], const_args[3], args[5]);
-        break;
-    case INDEX_op_setcond2_i32:
-        tcg_out_setcond2_i32(s, args[5], args[0], args[1], args[2],
-                             args[3], const_args[3],
-                             args[4], const_args[4]);
-        break;
-#endif
-
     case INDEX_op_add2_i32:
         tcg_out_addsub2(s, args[0], args[1], args[2], args[3],
                         args[4], const_args[4], args[5], const_args[5],
@@ -1424,19 +1284,18 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
 
     case INDEX_op_qemu_ld_i32:
-        tcg_out_qemu_ld(s, args, 0);
+        tcg_out_qemu_ld(s, args[0], args[1], args[2], args[3], false);
         break;
     case INDEX_op_qemu_ld_i64:
-        tcg_out_qemu_ld(s, args, 1);
+        tcg_out_qemu_ld(s, args[0], args[1], args[2], args[3], true);
         break;
     case INDEX_op_qemu_st_i32:
-        tcg_out_qemu_st(s, args, 0);
+        tcg_out_qemu_st(s, args[0], args[1], args[2], args[3]);
         break;
     case INDEX_op_qemu_st_i64:
-        tcg_out_qemu_st(s, args, 1);
+        tcg_out_qemu_st(s, args[0], args[1], args[2], args[3]);
         break;
 
-#if SPARC64
     case INDEX_op_movi_i64:
         tcg_out_movi(s, TCG_TYPE_I64, args[0], args[1]);
         break;
@@ -1496,7 +1355,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_movcond_i64(s, args[5], args[0], args[1],
                             args[2], const_args[2], args[3], const_args[3]);
         break;
-#endif
+
     gen_arith:
         tcg_out_arithc(s, args[0], args[1], args[2], const_args[2], c);
         break;
@@ -1550,73 +1409,54 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_setcond_i32, { "r", "rZ", "rJ" } },
     { INDEX_op_movcond_i32, { "r", "rZ", "rJ", "rI", "0" } },
 
-#if !SPARC64
-    { INDEX_op_brcond2_i32, { "rZ", "rZ", "rJ", "rJ" } },
-    { INDEX_op_setcond2_i32, { "r", "rZ", "rZ", "rJ", "rJ" } },
-#endif
-
     { INDEX_op_add2_i32, { "r", "r", "rZ", "rZ", "rJ", "rJ" } },
     { INDEX_op_sub2_i32, { "r", "r", "rZ", "rZ", "rJ", "rJ" } },
     { INDEX_op_mulu2_i32, { "r", "r", "rZ", "rJ" } },
 
-#if SPARC64
-    { INDEX_op_mov_i64, { "r", "r" } },
-    { INDEX_op_movi_i64, { "r" } },
-    { INDEX_op_ld8u_i64, { "r", "r" } },
-    { INDEX_op_ld8s_i64, { "r", "r" } },
-    { INDEX_op_ld16u_i64, { "r", "r" } },
-    { INDEX_op_ld16s_i64, { "r", "r" } },
-    { INDEX_op_ld32u_i64, { "r", "r" } },
-    { INDEX_op_ld32s_i64, { "r", "r" } },
-    { INDEX_op_ld_i64, { "r", "r" } },
-    { INDEX_op_st8_i64, { "rZ", "r" } },
-    { INDEX_op_st16_i64, { "rZ", "r" } },
-    { INDEX_op_st32_i64, { "rZ", "r" } },
-    { INDEX_op_st_i64, { "rZ", "r" } },
-
-    { INDEX_op_add_i64, { "r", "rZ", "rJ" } },
-    { INDEX_op_mul_i64, { "r", "rZ", "rJ" } },
-    { INDEX_op_div_i64, { "r", "rZ", "rJ" } },
-    { INDEX_op_divu_i64, { "r", "rZ", "rJ" } },
-    { INDEX_op_sub_i64, { "r", "rZ", "rJ" } },
-    { INDEX_op_and_i64, { "r", "rZ", "rJ" } },
-    { INDEX_op_andc_i64, { "r", "rZ", "rJ" } },
-    { INDEX_op_or_i64, { "r", "rZ", "rJ" } },
-    { INDEX_op_orc_i64, { "r", "rZ", "rJ" } },
-    { INDEX_op_xor_i64, { "r", "rZ", "rJ" } },
-
-    { INDEX_op_shl_i64, { "r", "rZ", "rJ" } },
-    { INDEX_op_shr_i64, { "r", "rZ", "rJ" } },
-    { INDEX_op_sar_i64, { "r", "rZ", "rJ" } },
-
-    { INDEX_op_neg_i64, { "r", "rJ" } },
-    { INDEX_op_not_i64, { "r", "rJ" } },
-
-    { INDEX_op_ext32s_i64, { "r", "r" } },
-    { INDEX_op_ext32u_i64, { "r", "r" } },
-    { INDEX_op_trunc_i32,  { "r", "r" } },
-
-    { INDEX_op_brcond_i64, { "rZ", "rJ" } },
-    { INDEX_op_setcond_i64, { "r", "rZ", "rJ" } },
-    { INDEX_op_movcond_i64, { "r", "rZ", "rJ", "rI", "0" } },
-#endif
-
-#if SPARC64
-    { INDEX_op_qemu_ld_i32, { "r", "L" } },
-    { INDEX_op_qemu_ld_i64, { "r", "L" } },
-    { INDEX_op_qemu_st_i32, { "L", "L" } },
-    { INDEX_op_qemu_st_i64, { "L", "L" } },
-#elif TARGET_LONG_BITS == 32
-    { INDEX_op_qemu_ld_i32, { "r", "L" } },
-    { INDEX_op_qemu_ld_i64, { "r", "r", "L" } },
-    { INDEX_op_qemu_st_i32, { "L", "L" } },
-    { INDEX_op_qemu_st_i64, { "L", "L", "L" } },
-#else
-    { INDEX_op_qemu_ld_i32, { "r", "L", "L" } },
-    { INDEX_op_qemu_ld_i64, { "L", "L", "L", "L" } },
-    { INDEX_op_qemu_st_i32, { "L", "L", "L" } },
-    { INDEX_op_qemu_st_i64, { "L", "L", "L", "L" } },
-#endif
+    { INDEX_op_mov_i64, { "R", "R" } },
+    { INDEX_op_movi_i64, { "R" } },
+    { INDEX_op_ld8u_i64, { "R", "r" } },
+    { INDEX_op_ld8s_i64, { "R", "r" } },
+    { INDEX_op_ld16u_i64, { "R", "r" } },
+    { INDEX_op_ld16s_i64, { "R", "r" } },
+    { INDEX_op_ld32u_i64, { "R", "r" } },
+    { INDEX_op_ld32s_i64, { "R", "r" } },
+    { INDEX_op_ld_i64, { "R", "r" } },
+    { INDEX_op_st8_i64, { "RZ", "r" } },
+    { INDEX_op_st16_i64, { "RZ", "r" } },
+    { INDEX_op_st32_i64, { "RZ", "r" } },
+    { INDEX_op_st_i64, { "RZ", "r" } },
+
+    { INDEX_op_add_i64, { "R", "RZ", "RJ" } },
+    { INDEX_op_mul_i64, { "R", "RZ", "RJ" } },
+    { INDEX_op_div_i64, { "R", "RZ", "RJ" } },
+    { INDEX_op_divu_i64, { "R", "RZ", "RJ" } },
+    { INDEX_op_sub_i64, { "R", "RZ", "RJ" } },
+    { INDEX_op_and_i64, { "R", "RZ", "RJ" } },
+    { INDEX_op_andc_i64, { "R", "RZ", "RJ" } },
+    { INDEX_op_or_i64, { "R", "RZ", "RJ" } },
+    { INDEX_op_orc_i64, { "R", "RZ", "RJ" } },
+    { INDEX_op_xor_i64, { "R", "RZ", "RJ" } },
+
+    { INDEX_op_shl_i64, { "R", "RZ", "RJ" } },
+    { INDEX_op_shr_i64, { "R", "RZ", "RJ" } },
+    { INDEX_op_sar_i64, { "R", "RZ", "RJ" } },
+
+    { INDEX_op_neg_i64, { "R", "RJ" } },
+    { INDEX_op_not_i64, { "R", "RJ" } },
+
+    { INDEX_op_ext32s_i64, { "R", "r" } },
+    { INDEX_op_ext32u_i64, { "R", "r" } },
+    { INDEX_op_trunc_i32,  { "r", "R" } },
+
+    { INDEX_op_brcond_i64, { "RZ", "RJ" } },
+    { INDEX_op_setcond_i64, { "R", "RZ", "RJ" } },
+    { INDEX_op_movcond_i64, { "R", "RZ", "RJ", "RI", "0" } },
+
+    { INDEX_op_qemu_ld_i32, { "r", "A" } },
+    { INDEX_op_qemu_ld_i64, { "R", "A" } },
+    { INDEX_op_qemu_st_i32, { "s", "A" } },
+    { INDEX_op_qemu_st_i64, { "S", "A" } },
 
     { -1 },
 };
@@ -1624,9 +1464,8 @@ static const TCGTargetOpDef sparc_op_defs[] = {
 static void tcg_target_init(TCGContext *s)
 {
     tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xffffffff);
-#if SPARC64
-    tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0, 0xffffffff);
-#endif
+    tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0, ALL_64);
+
     tcg_regset_set32(tcg_target_call_clobber_regs, 0,
                      (1 << TCG_REG_G1) |
                      (1 << TCG_REG_G2) |
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 7b8b33e..5442d45 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -24,13 +24,7 @@
 #ifndef TCG_TARGET_SPARC 
 #define TCG_TARGET_SPARC 1
 
-#if UINTPTR_MAX == UINT32_MAX
-# define TCG_TARGET_REG_BITS 32
-#elif UINTPTR_MAX == UINT64_MAX
-# define TCG_TARGET_REG_BITS 64
-#else
-# error Unknown pointer size for tcg target
-#endif
+#define TCG_TARGET_REG_BITS 64
 
 #define TCG_TARGET_WORDS_BIGENDIAN
 
@@ -78,7 +72,7 @@ typedef enum {
 /* used for function call generation */
 #define TCG_REG_CALL_STACK TCG_REG_O6
 
-#if TCG_TARGET_REG_BITS == 64
+#ifdef __arch64__
 #define TCG_TARGET_STACK_BIAS           2047
 #define TCG_TARGET_STACK_ALIGN          16
 #define TCG_TARGET_CALL_STACK_OFFSET    (128 + 6*8 + TCG_TARGET_STACK_BIAS)
@@ -88,7 +82,7 @@ typedef enum {
 #define TCG_TARGET_CALL_STACK_OFFSET    (64 + 4 + 6*4)
 #endif
 
-#if TCG_TARGET_REG_BITS == 64
+#ifdef __arch64__
 #define TCG_TARGET_EXTEND_ARGS 1
 #endif
 
@@ -118,7 +112,6 @@ typedef enum {
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 
-#if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_trunc_i32        1
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          0
@@ -147,7 +140,6 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i64        0
 #define TCG_TARGET_HAS_muluh_i64        0
 #define TCG_TARGET_HAS_mulsh_i64        0
-#endif
 
 #define TCG_TARGET_HAS_new_ldst         1
 
diff --git a/tcg/tcg.c b/tcg/tcg.c
index f1e0763..eb4afe4 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -664,7 +664,28 @@ void tcg_gen_callN(TCGContext *s, TCGv_ptr func, unsigned int flags,
     int nb_rets;
     TCGArg *nparam;
 
-#if defined(TCG_TARGET_EXTEND_ARGS) && TCG_TARGET_REG_BITS == 64
+#if defined(__sparc__) && !defined(__arch64__)
+    /* We have 64-bit values in one register, but need to pass as two
+       separate parameters.  Split them.  */
+    int orig_sizemask = sizemask;
+    int orig_nargs = nargs;
+    if (sizemask != 0) {
+        TCGArg *split_args = __builtin_alloca(sizeof(TCGArg) * nargs * 2);
+        for (i = real_args = 0; i < nargs; ++i) {
+            int is_64bit = sizemask & (1 << (i+1)*2);
+            if (is_64bit) {
+                TCGv_i32 h = tcg_temp_new_i32();
+                TCGv_i64 orig = MAKE_TCGV_I64(args[i]);
+                tcg_gen_trunc_rshift_i64_i32(h, orig, 32);
+                split_args[real_args++] = GET_TCGV_I32(h);
+            }
+            split_args[real_args++] = args[i];
+        }
+        nargs = real_args;
+        args = split_args;
+        sizemask = 0;
+    }
+#elif defined(TCG_TARGET_EXTEND_ARGS) && TCG_TARGET_REG_BITS == 64
     for (i = 0; i < nargs; ++i) {
         int is_64bit = sizemask & (1 << (i+1)*2);
         int is_signed = sizemask & (2 << (i+1)*2);
@@ -749,7 +770,16 @@ void tcg_gen_callN(TCGContext *s, TCGv_ptr func, unsigned int flags,
     /* total parameters, needed to go backward in the instruction stream */
     *s->gen_opparam_ptr++ = 1 + nb_rets + real_args + 3;
 
-#if defined(TCG_TARGET_EXTEND_ARGS) && TCG_TARGET_REG_BITS == 64
+#if defined(__sparc__) && !defined(__arch64__)
+    for (i = real_args = 0; i < orig_nargs; ++i) {
+        int is_64bit = orig_sizemask & (1 << (i+1)*2);
+        if (is_64bit) {
+            TCGv_i32 h = MAKE_TCGV_I32(args[real_args++]);
+            tcg_temp_free_i32(h);
+        }
+        real_args++;
+    }
+#elif defined(TCG_TARGET_EXTEND_ARGS) && TCG_TARGET_REG_BITS == 64
     for (i = 0; i < nargs; ++i) {
         int is_64bit = sizemask & (1 << (i+1)*2);
         if (!is_64bit) {
@@ -2411,6 +2441,17 @@ static int tcg_reg_alloc_call(TCGContext *s, const TCGOpDef *def,
         ts = &s->temps[arg];
         reg = tcg_target_call_oarg_regs[i];
         assert(s->reg_to_temp[reg] == -1);
+
+#if defined(__sparc__) && !defined(__arch64__)
+        /* Reconstruct %o0,%o1 32-bit pair into a single 64-bit value. */
+        if (ts->type == TCG_TYPE_I64) {
+            assert(i == 0);
+            tcg_out_arithi(s, TCG_REG_O0, TCG_REG_O0, 32, SHIFT_SLLX);
+            tcg_out_arithi(s, TCG_REG_O1, TCG_REG_O1, 0, SHIFT_SRL);
+            tcg_out_arith(s, TCG_REG_O0, TCG_REG_O0, TCG_REG_O1, ARITH_OR);
+        }
+#endif
+
         if (ts->fixed_reg) {
             if (ts->reg != reg) {
                 tcg_out_mov(s, ts->type, ts->reg, reg);
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 06/14] tcg-sparc: Use the RETURN instruction
  2014-03-17 18:37 [Qemu-devel] [PATCH 00/14] tcg/sparc v8plus code generation Richard Henderson
                   ` (4 preceding siblings ...)
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 05/14] tcg-sparc: Use 64-bit registers with sparcv8plus Richard Henderson
@ 2014-03-17 18:37 ` Richard Henderson
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 07/14] tcg-sparc: Implement muls2_i32 Richard Henderson
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2014-03-17 18:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, aurelien

Saves one insn per TB exit over JMPL+RESTORE.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index fb5ba27..d086c10 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -219,6 +219,7 @@ static const int tcg_target_call_oarg_regs[] = {
 #define RDY        (INSN_OP(2) | INSN_OP3(0x28) | INSN_RS1(0))
 #define WRY        (INSN_OP(2) | INSN_OP3(0x30) | INSN_RD(0))
 #define JMPL       (INSN_OP(2) | INSN_OP3(0x38))
+#define RETURN     (INSN_OP(2) | INSN_OP3(0x39))
 #define SAVE       (INSN_OP(2) | INSN_OP3(0x3c))
 #define RESTORE    (INSN_OP(2) | INSN_OP3(0x3d))
 #define SETHI      (INSN_OP(0) | INSN_OP2(0x4))
@@ -1136,10 +1137,15 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 
     switch (opc) {
     case INDEX_op_exit_tb:
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I0, args[0]);
-        tcg_out_arithi(s, TCG_REG_G0, TCG_REG_I7, 8, JMPL);
-        tcg_out32(s, RESTORE | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_G0) |
-                      INSN_RS2(TCG_REG_G0));
+        if (check_fit_tl(args[0], 13)) {
+            tcg_out_arithi(s, TCG_REG_G0, TCG_REG_I7, 8, RETURN);
+            tcg_out_movi_imm13(s, TCG_REG_O0, args[0]);
+        } else {
+            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I0, args[0] & ~0x3ff);
+            tcg_out_arithi(s, TCG_REG_G0, TCG_REG_I7, 8, RETURN);
+            tcg_out_arithi(s, TCG_REG_O0, TCG_REG_O0,
+                           args[0] & 0x3ff, ARITH_OR);
+        }
         break;
     case INDEX_op_goto_tb:
         if (s->tb_jmp_offset) {
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 07/14] tcg-sparc: Implement muls2_i32
  2014-03-17 18:37 [Qemu-devel] [PATCH 00/14] tcg/sparc v8plus code generation Richard Henderson
                   ` (5 preceding siblings ...)
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 06/14] tcg-sparc: Use the RETURN instruction Richard Henderson
@ 2014-03-17 18:37 ` Richard Henderson
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 08/14] tcg-sparc: Tidy check_fit_* tests Richard Henderson
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2014-03-17 18:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, aurelien

Using the 32-bit SMUL is a tad more efficient than
resorting to extending and using the 64-bit MULX.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c | 18 +++++++++++++++---
 tcg/sparc/tcg-target.h |  2 +-
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index d086c10..43ede5b 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -200,6 +200,7 @@ static const int tcg_target_call_oarg_regs[] = {
 #define ARITH_ADDX (INSN_OP(2) | INSN_OP3(0x08))
 #define ARITH_SUBX (INSN_OP(2) | INSN_OP3(0x0c))
 #define ARITH_UMUL (INSN_OP(2) | INSN_OP3(0x0a))
+#define ARITH_SMUL (INSN_OP(2) | INSN_OP3(0x0b))
 #define ARITH_UDIV (INSN_OP(2) | INSN_OP3(0x0e))
 #define ARITH_SDIV (INSN_OP(2) | INSN_OP3(0x0f))
 #define ARITH_MULX (INSN_OP(2) | INSN_OP3(0x09))
@@ -1284,9 +1285,19 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
                         ARITH_SUBCC, ARITH_SUBX);
         break;
     case INDEX_op_mulu2_i32:
-        tcg_out_arithc(s, args[0], args[2], args[3], const_args[3],
-                       ARITH_UMUL);
-        tcg_out_rdy(s, args[1]);
+        c = ARITH_UMUL;
+        goto do_mul2;
+    case INDEX_op_muls2_i32:
+        c = ARITH_SMUL;
+    do_mul2:
+        /* The 32-bit multiply insns produce a full 64-bit result.  If the
+           destination register can hold it, we can avoid the slower RDY.  */
+        tcg_out_arithc(s, args[0], args[2], args[3], const_args[3], c);
+        if (SPARC64 || args[0] <= TCG_REG_O7) {
+            tcg_out_arithi(s, args[1], args[0], 32, SHIFT_SRLX);
+        } else {
+            tcg_out_rdy(s, args[1]);
+        }
         break;
 
     case INDEX_op_qemu_ld_i32:
@@ -1418,6 +1429,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_add2_i32, { "r", "r", "rZ", "rZ", "rJ", "rJ" } },
     { INDEX_op_sub2_i32, { "r", "r", "rZ", "rZ", "rJ", "rJ" } },
     { INDEX_op_mulu2_i32, { "r", "r", "rZ", "rJ" } },
+    { INDEX_op_muls2_i32, { "r", "r", "rZ", "rJ" } },
 
     { INDEX_op_mov_i64, { "R", "R" } },
     { INDEX_op_movi_i64, { "R" } },
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 5442d45..091224c 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -108,7 +108,7 @@ typedef enum {
 #define TCG_TARGET_HAS_add2_i32         1
 #define TCG_TARGET_HAS_sub2_i32         1
 #define TCG_TARGET_HAS_mulu2_i32        1
-#define TCG_TARGET_HAS_muls2_i32        0
+#define TCG_TARGET_HAS_muls2_i32        1
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 08/14] tcg-sparc: Tidy check_fit_* tests
  2014-03-17 18:37 [Qemu-devel] [PATCH 00/14] tcg/sparc v8plus code generation Richard Henderson
                   ` (6 preceding siblings ...)
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 07/14] tcg-sparc: Implement muls2_i32 Richard Henderson
@ 2014-03-17 18:37 ` Richard Henderson
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 09/14] tcg-sparc: Don't handle mov/movi in tcg_out_op Richard Henderson
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2014-03-17 18:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, aurelien

Use sextract instead of raw bit shifting for the tests.  Introduce
a new check_fit_ptr macro to make it clear we're looking at pointers.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 43ede5b..3165fe6 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -262,17 +262,23 @@ static const int tcg_target_call_oarg_regs[] = {
 #define STW_LE     (STWA  | INSN_ASI(ASI_PRIMARY_LITTLE))
 #define STX_LE     (STXA  | INSN_ASI(ASI_PRIMARY_LITTLE))
 
-static inline int check_fit_tl(tcg_target_long val, unsigned int bits)
+static inline int check_fit_i64(int64_t val, unsigned int bits)
 {
-    return (val << ((sizeof(tcg_target_long) * 8 - bits))
-            >> (sizeof(tcg_target_long) * 8 - bits)) == val;
+    return val == sextract64(val, 0, bits);
 }
 
-static inline int check_fit_i32(uint32_t val, unsigned int bits)
+static inline int check_fit_i32(int32_t val, unsigned int bits)
 {
-    return ((val << (32 - bits)) >> (32 - bits)) == val;
+    return val == sextract32(val, 0, bits);
 }
 
+#define check_fit_tl    check_fit_i64
+#if SPARC64
+# define check_fit_ptr  check_fit_i64
+#else
+# define check_fit_ptr  check_fit_i32
+#endif
+
 static void patch_reloc(uint8_t *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
@@ -287,7 +293,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
         break;
     case R_SPARC_WDISP16:
         value -= (intptr_t)code_ptr;
-        if (!check_fit_tl(value >> 2, 16)) {
+        if (!check_fit_ptr(value >> 2, 16)) {
             tcg_abort();
         }
         insn = *(uint32_t *)code_ptr;
@@ -297,7 +303,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
         break;
     case R_SPARC_WDISP19:
         value -= (intptr_t)code_ptr;
-        if (!check_fit_tl(value >> 2, 19)) {
+        if (!check_fit_ptr(value >> 2, 19)) {
             tcg_abort();
         }
         insn = *(uint32_t *)code_ptr;
@@ -420,7 +426,7 @@ static inline void tcg_out_movi_imm13(TCGContext *s, int ret, uint32_t arg)
 static void tcg_out_movi(TCGContext *s, TCGType type,
                          TCGReg ret, tcg_target_long arg)
 {
-    tcg_target_long hi, lo;
+    tcg_target_long hi, lo = (int32_t)arg;
 
     /* A 13-bit constant sign-extended to 64-bits.  */
     if (check_fit_tl(arg, 13)) {
@@ -438,15 +444,14 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     }
 
     /* A 32-bit constant sign-extended to 64-bits.  */
-    if (check_fit_tl(arg, 32)) {
+    if (arg == lo) {
         tcg_out_sethi(s, ret, ~arg);
         tcg_out_arithi(s, ret, ret, (arg & 0x3ff) | -0x400, ARITH_XOR);
         return;
     }
 
     /* A 64-bit constant decomposed into 2 32-bit pieces.  */
-    lo = (int32_t)arg;
-    if (check_fit_tl(lo, 13)) {
+    if (check_fit_i32(lo, 13)) {
         hi = (arg - lo) >> 32;
         tcg_out_movi(s, TCG_TYPE_I32, ret, hi);
         tcg_out_arithi(s, ret, ret, 32, SHIFT_SLLX);
@@ -469,7 +474,7 @@ static inline void tcg_out_ldst_rr(TCGContext *s, int data, int a1,
 static inline void tcg_out_ldst(TCGContext *s, int ret, int addr,
                                 int offset, int op)
 {
-    if (check_fit_tl(offset, 13)) {
+    if (check_fit_ptr(offset, 13)) {
         tcg_out32(s, op | INSN_RD(ret) | INSN_RS1(addr) |
                   INSN_IMM13(offset));
     } else {
@@ -493,7 +498,7 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
 static inline void tcg_out_ld_ptr(TCGContext *s, TCGReg ret, uintptr_t arg)
 {
     TCGReg base = TCG_REG_G0;
-    if (!check_fit_tl(arg, 10)) {
+    if (!check_fit_ptr(arg, 10)) {
         tcg_out_movi(s, TCG_TYPE_PTR, ret, arg & ~0x3ff);
         base = ret;
     }
@@ -948,7 +953,7 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addr, int mem_index,
 
     /* Find a base address that can load both tlb comparator and addend.  */
     tlb_ofs = offsetof(CPUArchState, tlb_table[mem_index][0]);
-    if (!check_fit_tl(tlb_ofs + sizeof(CPUTLBEntry), 13)) {
+    if (!check_fit_ptr(tlb_ofs + sizeof(CPUTLBEntry), 13)) {
         tcg_out_addi(s, r1, tlb_ofs & ~0x3ff);
         tlb_ofs &= 0x3ff;
     }
@@ -1138,7 +1143,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 
     switch (opc) {
     case INDEX_op_exit_tb:
-        if (check_fit_tl(args[0], 13)) {
+        if (check_fit_ptr(args[0], 13)) {
             tcg_out_arithi(s, TCG_REG_G0, TCG_REG_I7, 8, RETURN);
             tcg_out_movi_imm13(s, TCG_REG_O0, args[0]);
         } else {
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 09/14] tcg-sparc: Don't handle mov/movi in tcg_out_op
  2014-03-17 18:37 [Qemu-devel] [PATCH 00/14] tcg/sparc v8plus code generation Richard Henderson
                   ` (7 preceding siblings ...)
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 08/14] tcg-sparc: Tidy check_fit_* tests Richard Henderson
@ 2014-03-17 18:37 ` Richard Henderson
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 10/14] tcg-sparc: Hoist common argument loads " Richard Henderson
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2014-03-17 18:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 3165fe6..b70d9b5 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1181,9 +1181,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_bpcc(s, COND_A, BPCC_PT, args[0]);
         tcg_out_nop(s);
         break;
-    case INDEX_op_movi_i32:
-        tcg_out_movi(s, TCG_TYPE_I32, args[0], (uint32_t)args[1]);
-        break;
 
 #define OP_32_64(x)                             \
         glue(glue(case INDEX_op_, x), _i32):    \
@@ -1318,9 +1315,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_qemu_st(s, args[0], args[1], args[2], args[3]);
         break;
 
-    case INDEX_op_movi_i64:
-        tcg_out_movi(s, TCG_TYPE_I64, args[0], args[1]);
-        break;
     case INDEX_op_ld32s_i64:
         tcg_out_ldst(s, args[0], args[1], args[2], LDSW);
         break;
@@ -1386,8 +1380,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 	tcg_out_arithc(s, args[0], TCG_REG_G0, args[1], const_args[1], c);
 	break;
 
+    case INDEX_op_mov_i64:
+    case INDEX_op_mov_i32:
+    case INDEX_op_movi_i64:
+    case INDEX_op_movi_i32:
+        /* Always implemented with tcg_out_mov/i, never with tcg_out_op.  */
     default:
-        fprintf(stderr, "unknown opcode 0x%x\n", opc);
+        /* Opcode not implemented.  */
         tcg_abort();
     }
 }
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 10/14] tcg-sparc: Hoist common argument loads in tcg_out_op
  2014-03-17 18:37 [Qemu-devel] [PATCH 00/14] tcg/sparc v8plus code generation Richard Henderson
                   ` (8 preceding siblings ...)
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 09/14] tcg-sparc: Don't handle mov/movi in tcg_out_op Richard Henderson
@ 2014-03-17 18:37 ` Richard Henderson
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 11/14] tcg-sparc: Fixup function argument types Richard Henderson
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2014-03-17 18:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c | 123 +++++++++++++++++++++++--------------------------
 1 file changed, 58 insertions(+), 65 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index b70d9b5..88676da 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1136,49 +1136,56 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data, TCGReg addr,
 #endif /* CONFIG_SOFTMMU */
 }
 
-static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
-                              const int *const_args)
+static void tcg_out_op(TCGContext *s, TCGOpcode opc,
+                       const TCGArg args[TCG_MAX_OP_ARGS],
+                       const int const_args[TCG_MAX_OP_ARGS])
 {
-    int c;
+    TCGArg a0, a1, a2;
+    int c, c2;
+
+    /* Hoist the loads of the most common arguments.  */
+    a0 = args[0];
+    a1 = args[1];
+    a2 = args[2];
+    c2 = const_args[2];
 
     switch (opc) {
     case INDEX_op_exit_tb:
-        if (check_fit_ptr(args[0], 13)) {
+        if (check_fit_ptr(a0, 13)) {
             tcg_out_arithi(s, TCG_REG_G0, TCG_REG_I7, 8, RETURN);
-            tcg_out_movi_imm13(s, TCG_REG_O0, args[0]);
+            tcg_out_movi_imm13(s, TCG_REG_O0, a0);
         } else {
-            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I0, args[0] & ~0x3ff);
+            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I0, a0 & ~0x3ff);
             tcg_out_arithi(s, TCG_REG_G0, TCG_REG_I7, 8, RETURN);
-            tcg_out_arithi(s, TCG_REG_O0, TCG_REG_O0,
-                           args[0] & 0x3ff, ARITH_OR);
+            tcg_out_arithi(s, TCG_REG_O0, TCG_REG_O0, a0 & 0x3ff, ARITH_OR);
         }
         break;
     case INDEX_op_goto_tb:
         if (s->tb_jmp_offset) {
             /* direct jump method */
             uint32_t old_insn = *(uint32_t *)s->code_ptr;
-            s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
+            s->tb_jmp_offset[a0] = s->code_ptr - s->code_buf;
             /* Make sure to preserve links during retranslation.  */
             tcg_out32(s, CALL | (old_insn & ~INSN_OP(-1)));
         } else {
             /* indirect jump method */
-            tcg_out_ld_ptr(s, TCG_REG_T1, (uintptr_t)(s->tb_next + args[0]));
+            tcg_out_ld_ptr(s, TCG_REG_T1, (uintptr_t)(s->tb_next + a0));
             tcg_out_arithi(s, TCG_REG_G0, TCG_REG_T1, 0, JMPL);
         }
         tcg_out_nop(s);
-        s->tb_next_offset[args[0]] = s->code_ptr - s->code_buf;
+        s->tb_next_offset[a0] = s->code_ptr - s->code_buf;
         break;
     case INDEX_op_call:
         if (const_args[0]) {
-            tcg_out_calli(s, args[0]);
+            tcg_out_calli(s, a0);
         } else {
-            tcg_out_arithi(s, TCG_REG_O7, args[0], 0, JMPL);
+            tcg_out_arithi(s, TCG_REG_O7, a0, 0, JMPL);
         }
         /* delay slot */
         tcg_out_nop(s);
         break;
     case INDEX_op_br:
-        tcg_out_bpcc(s, COND_A, BPCC_PT, args[0]);
+        tcg_out_bpcc(s, COND_A, BPCC_PT, a0);
         tcg_out_nop(s);
         break;
 
@@ -1187,30 +1194,30 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         glue(glue(case INDEX_op_, x), _i64)
 
     OP_32_64(ld8u):
-        tcg_out_ldst(s, args[0], args[1], args[2], LDUB);
+        tcg_out_ldst(s, a0, a1, a2, LDUB);
         break;
     OP_32_64(ld8s):
-        tcg_out_ldst(s, args[0], args[1], args[2], LDSB);
+        tcg_out_ldst(s, a0, a1, a2, LDSB);
         break;
     OP_32_64(ld16u):
-        tcg_out_ldst(s, args[0], args[1], args[2], LDUH);
+        tcg_out_ldst(s, a0, a1, a2, LDUH);
         break;
     OP_32_64(ld16s):
-        tcg_out_ldst(s, args[0], args[1], args[2], LDSH);
+        tcg_out_ldst(s, a0, a1, a2, LDSH);
         break;
     case INDEX_op_ld_i32:
     case INDEX_op_ld32u_i64:
-        tcg_out_ldst(s, args[0], args[1], args[2], LDUW);
+        tcg_out_ldst(s, a0, a1, a2, LDUW);
         break;
     OP_32_64(st8):
-        tcg_out_ldst(s, args[0], args[1], args[2], STB);
+        tcg_out_ldst(s, a0, a1, a2, STB);
         break;
     OP_32_64(st16):
-        tcg_out_ldst(s, args[0], args[1], args[2], STH);
+        tcg_out_ldst(s, a0, a1, a2, STH);
         break;
     case INDEX_op_st_i32:
     case INDEX_op_st32_i64:
-        tcg_out_ldst(s, args[0], args[1], args[2], STW);
+        tcg_out_ldst(s, a0, a1, a2, STW);
         break;
     OP_32_64(add):
         c = ARITH_ADD;
@@ -1237,7 +1244,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         c = SHIFT_SLL;
     do_shift32:
         /* Limit immediate shift count lest we create an illegal insn.  */
-        tcg_out_arithc(s, args[0], args[1], args[2] & 31, const_args[2], c);
+        tcg_out_arithc(s, a0, a1, a2 & 31, c2, c);
         break;
     case INDEX_op_shr_i32:
         c = SHIFT_SRL;
@@ -1257,34 +1264,29 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 	goto gen_arith1;
 
     case INDEX_op_div_i32:
-        tcg_out_div32(s, args[0], args[1], args[2], const_args[2], 0);
+        tcg_out_div32(s, a0, a1, a2, c2, 0);
         break;
     case INDEX_op_divu_i32:
-        tcg_out_div32(s, args[0], args[1], args[2], const_args[2], 1);
+        tcg_out_div32(s, a0, a1, a2, c2, 1);
         break;
 
     case INDEX_op_brcond_i32:
-        tcg_out_brcond_i32(s, args[2], args[0], args[1], const_args[1],
-                           args[3]);
+        tcg_out_brcond_i32(s, a2, a0, a1, const_args[1], args[3]);
         break;
     case INDEX_op_setcond_i32:
-        tcg_out_setcond_i32(s, args[3], args[0], args[1],
-                            args[2], const_args[2]);
+        tcg_out_setcond_i32(s, args[3], a0, a1, a2, c2);
         break;
     case INDEX_op_movcond_i32:
-        tcg_out_movcond_i32(s, args[5], args[0], args[1],
-                            args[2], const_args[2], args[3], const_args[3]);
+        tcg_out_movcond_i32(s, args[5], a0, a1, a2, c2, args[3], const_args[3]);
         break;
 
     case INDEX_op_add2_i32:
-        tcg_out_addsub2(s, args[0], args[1], args[2], args[3],
-                        args[4], const_args[4], args[5], const_args[5],
-                        ARITH_ADDCC, ARITH_ADDX);
+        tcg_out_addsub2(s, a0, a1, a2, args[3], args[4], const_args[4],
+                        args[5], const_args[5], ARITH_ADDCC, ARITH_ADDX);
         break;
     case INDEX_op_sub2_i32:
-        tcg_out_addsub2(s, args[0], args[1], args[2], args[3],
-                        args[4], const_args[4], args[5], const_args[5],
-                        ARITH_SUBCC, ARITH_SUBX);
+        tcg_out_addsub2(s, a0, a1, a2, args[3], args[4], const_args[4],
+                        args[5], const_args[5], ARITH_SUBCC, ARITH_SUBX);
         break;
     case INDEX_op_mulu2_i32:
         c = ARITH_UMUL;
@@ -1294,41 +1296,39 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     do_mul2:
         /* The 32-bit multiply insns produce a full 64-bit result.  If the
            destination register can hold it, we can avoid the slower RDY.  */
-        tcg_out_arithc(s, args[0], args[2], args[3], const_args[3], c);
-        if (SPARC64 || args[0] <= TCG_REG_O7) {
-            tcg_out_arithi(s, args[1], args[0], 32, SHIFT_SRLX);
+        tcg_out_arithc(s, a0, a2, args[3], const_args[3], c);
+        if (SPARC64 || a0 <= TCG_REG_O7) {
+            tcg_out_arithi(s, a1, a0, 32, SHIFT_SRLX);
         } else {
-            tcg_out_rdy(s, args[1]);
+            tcg_out_rdy(s, a1);
         }
         break;
 
     case INDEX_op_qemu_ld_i32:
-        tcg_out_qemu_ld(s, args[0], args[1], args[2], args[3], false);
+        tcg_out_qemu_ld(s, a0, a1, a2, args[3], false);
         break;
     case INDEX_op_qemu_ld_i64:
-        tcg_out_qemu_ld(s, args[0], args[1], args[2], args[3], true);
+        tcg_out_qemu_ld(s, a0, a1, a2, args[3], true);
         break;
     case INDEX_op_qemu_st_i32:
-        tcg_out_qemu_st(s, args[0], args[1], args[2], args[3]);
-        break;
     case INDEX_op_qemu_st_i64:
-        tcg_out_qemu_st(s, args[0], args[1], args[2], args[3]);
+        tcg_out_qemu_st(s, a0, a1, a2, args[3]);
         break;
 
     case INDEX_op_ld32s_i64:
-        tcg_out_ldst(s, args[0], args[1], args[2], LDSW);
+        tcg_out_ldst(s, a0, a1, a2, LDSW);
         break;
     case INDEX_op_ld_i64:
-        tcg_out_ldst(s, args[0], args[1], args[2], LDX);
+        tcg_out_ldst(s, a0, a1, a2, LDX);
         break;
     case INDEX_op_st_i64:
-        tcg_out_ldst(s, args[0], args[1], args[2], STX);
+        tcg_out_ldst(s, a0, a1, a2, STX);
         break;
     case INDEX_op_shl_i64:
         c = SHIFT_SLLX;
     do_shift64:
         /* Limit immediate shift count lest we create an illegal insn.  */
-        tcg_out_arithc(s, args[0], args[1], args[2] & 63, const_args[2], c);
+        tcg_out_arithc(s, a0, a1, a2 & 63, c2, c);
         break;
     case INDEX_op_shr_i64:
         c = SHIFT_SRLX;
@@ -1346,38 +1346,31 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         c = ARITH_UDIVX;
         goto gen_arith;
     case INDEX_op_ext32s_i64:
-        tcg_out_arithi(s, args[0], args[1], 0, SHIFT_SRA);
+        tcg_out_arithi(s, a0, a1, 0, SHIFT_SRA);
         break;
     case INDEX_op_ext32u_i64:
-        tcg_out_arithi(s, args[0], args[1], 0, SHIFT_SRL);
+        tcg_out_arithi(s, a0, a1, 0, SHIFT_SRL);
         break;
     case INDEX_op_trunc_i32:
-        if (args[2] == 0) {
-            tcg_out_mov(s, TCG_TYPE_I32, args[0], args[1]);
-        } else {
-            tcg_out_arithi(s, args[0], args[1], args[2], SHIFT_SRLX);
-        }
+        tcg_out_arithi(s, a0, a1, a2, SHIFT_SRLX);
         break;
 
     case INDEX_op_brcond_i64:
-        tcg_out_brcond_i64(s, args[2], args[0], args[1], const_args[1],
-                           args[3]);
+        tcg_out_brcond_i64(s, a2, a0, a1, const_args[1], args[3]);
         break;
     case INDEX_op_setcond_i64:
-        tcg_out_setcond_i64(s, args[3], args[0], args[1],
-                            args[2], const_args[2]);
+        tcg_out_setcond_i64(s, args[3], a0, a1, a2, c2);
         break;
     case INDEX_op_movcond_i64:
-        tcg_out_movcond_i64(s, args[5], args[0], args[1],
-                            args[2], const_args[2], args[3], const_args[3]);
+        tcg_out_movcond_i64(s, args[5], a0, a1, a2, c2, args[3], const_args[3]);
         break;
 
     gen_arith:
-        tcg_out_arithc(s, args[0], args[1], args[2], const_args[2], c);
+        tcg_out_arithc(s, a0, a1, a2, c2, c);
         break;
 
     gen_arith1:
-	tcg_out_arithc(s, args[0], TCG_REG_G0, args[1], const_args[1], c);
+	tcg_out_arithc(s, a0, TCG_REG_G0, a1, const_args[1], c);
 	break;
 
     case INDEX_op_mov_i64:
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 11/14] tcg-sparc: Fixup function argument types
  2014-03-17 18:37 [Qemu-devel] [PATCH 00/14] tcg/sparc v8plus code generation Richard Henderson
                   ` (9 preceding siblings ...)
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 10/14] tcg-sparc: Hoist common argument loads " Richard Henderson
@ 2014-03-17 18:37 ` Richard Henderson
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 12/14] tcg-sparc: Fix small 32-bit movi Richard Henderson
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2014-03-17 18:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, aurelien

Use TCGReg everywhere appropriate.  Use int32_t for all arguments
that may be registers or immediate constants.  Merge tcg_out_addi
into its only caller.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c | 115 +++++++++++++++++++++----------------------------
 1 file changed, 50 insertions(+), 65 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 88676da..d4ef0d4 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -384,22 +384,20 @@ static inline int tcg_target_const_match(tcg_target_long val,
     }
 }
 
-static inline void tcg_out_arith(TCGContext *s, int rd, int rs1, int rs2,
-                                 int op)
+static inline void tcg_out_arith(TCGContext *s, TCGReg rd, TCGReg rs1,
+                                 TCGReg rs2, int op)
 {
-    tcg_out32(s, op | INSN_RD(rd) | INSN_RS1(rs1) |
-              INSN_RS2(rs2));
+    tcg_out32(s, op | INSN_RD(rd) | INSN_RS1(rs1) | INSN_RS2(rs2));
 }
 
-static inline void tcg_out_arithi(TCGContext *s, int rd, int rs1,
-                                  uint32_t offset, int op)
+static inline void tcg_out_arithi(TCGContext *s, TCGReg rd, TCGReg rs1,
+                                  int32_t offset, int op)
 {
-    tcg_out32(s, op | INSN_RD(rd) | INSN_RS1(rs1) |
-              INSN_IMM13(offset));
+    tcg_out32(s, op | INSN_RD(rd) | INSN_RS1(rs1) | INSN_IMM13(offset));
 }
 
-static void tcg_out_arithc(TCGContext *s, int rd, int rs1,
-			   int val2, int val2const, int op)
+static void tcg_out_arithc(TCGContext *s, TCGReg rd, TCGReg rs1,
+			   int32_t val2, int val2const, int op)
 {
     tcg_out32(s, op | INSN_RD(rd) | INSN_RS1(rs1)
               | (val2const ? INSN_IMM13(val2) : INSN_RS2(val2)));
@@ -413,12 +411,12 @@ static inline void tcg_out_mov(TCGContext *s, TCGType type,
     }
 }
 
-static inline void tcg_out_sethi(TCGContext *s, int ret, uint32_t arg)
+static inline void tcg_out_sethi(TCGContext *s, TCGReg ret, uint32_t arg)
 {
     tcg_out32(s, SETHI | INSN_RD(ret) | ((arg & 0xfffffc00) >> 10));
 }
 
-static inline void tcg_out_movi_imm13(TCGContext *s, int ret, uint32_t arg)
+static inline void tcg_out_movi_imm13(TCGContext *s, TCGReg ret, int32_t arg)
 {
     tcg_out_arithi(s, ret, TCG_REG_G0, arg, ARITH_OR);
 }
@@ -465,14 +463,14 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     }
 }
 
-static inline void tcg_out_ldst_rr(TCGContext *s, int data, int a1,
-                                   int a2, int op)
+static inline void tcg_out_ldst_rr(TCGContext *s, TCGReg data, TCGReg a1,
+                                   TCGReg a2, int op)
 {
     tcg_out32(s, op | INSN_RD(data) | INSN_RS1(a1) | INSN_RS2(a2));
 }
 
-static inline void tcg_out_ldst(TCGContext *s, int ret, int addr,
-                                int offset, int op)
+static void tcg_out_ldst(TCGContext *s, TCGReg ret, TCGReg addr,
+                         intptr_t offset, int op)
 {
     if (check_fit_ptr(offset, 13)) {
         tcg_out32(s, op | INSN_RD(ret) | INSN_RS1(addr) |
@@ -495,40 +493,24 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
     tcg_out_ldst(s, arg, arg1, arg2, (type == TCG_TYPE_I32 ? STW : STX));
 }
 
-static inline void tcg_out_ld_ptr(TCGContext *s, TCGReg ret, uintptr_t arg)
+static void tcg_out_ld_ptr(TCGContext *s, TCGReg ret, uintptr_t arg)
 {
-    TCGReg base = TCG_REG_G0;
-    if (!check_fit_ptr(arg, 10)) {
-        tcg_out_movi(s, TCG_TYPE_PTR, ret, arg & ~0x3ff);
-        base = ret;
-    }
-    tcg_out_ld(s, TCG_TYPE_PTR, ret, base, arg & 0x3ff);
+    tcg_out_movi(s, TCG_TYPE_PTR, ret, arg & ~0x3ff);
+    tcg_out_ld(s, TCG_TYPE_PTR, ret, ret, arg & 0x3ff);
 }
 
-static inline void tcg_out_sety(TCGContext *s, int rs)
+static inline void tcg_out_sety(TCGContext *s, TCGReg rs)
 {
     tcg_out32(s, WRY | INSN_RS1(TCG_REG_G0) | INSN_RS2(rs));
 }
 
-static inline void tcg_out_rdy(TCGContext *s, int rd)
+static inline void tcg_out_rdy(TCGContext *s, TCGReg rd)
 {
     tcg_out32(s, RDY | INSN_RD(rd));
 }
 
-static inline void tcg_out_addi(TCGContext *s, int reg, tcg_target_long val)
-{
-    if (val != 0) {
-        if (check_fit_tl(val, 13))
-            tcg_out_arithi(s, reg, reg, val, ARITH_ADD);
-        else {
-            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_T1, val);
-            tcg_out_arith(s, reg, reg, TCG_REG_T1, ARITH_ADD);
-        }
-    }
-}
-
-static void tcg_out_div32(TCGContext *s, int rd, int rs1,
-                          int val2, int val2const, int uns)
+static void tcg_out_div32(TCGContext *s, TCGReg rd, TCGReg rs1,
+                          int32_t val2, int val2const, int uns)
 {
     /* Load Y with the sign/zero extension of RS1 to 64-bits.  */
     if (uns) {
@@ -589,37 +571,37 @@ static void tcg_out_bpcc(TCGContext *s, int scond, int flags, int label)
     tcg_out_bpcc0(s, scond, flags, off19);
 }
 
-static void tcg_out_cmp(TCGContext *s, TCGArg c1, TCGArg c2, int c2const)
+static void tcg_out_cmp(TCGContext *s, TCGReg c1, int32_t c2, int c2const)
 {
     tcg_out_arithc(s, TCG_REG_G0, c1, c2, c2const, ARITH_SUBCC);
 }
 
-static void tcg_out_brcond_i32(TCGContext *s, TCGCond cond, TCGArg arg1,
-                               TCGArg arg2, int const_arg2, int label)
+static void tcg_out_brcond_i32(TCGContext *s, TCGCond cond, TCGReg arg1,
+                               int32_t arg2, int const_arg2, int label)
 {
     tcg_out_cmp(s, arg1, arg2, const_arg2);
     tcg_out_bpcc(s, tcg_cond_to_bcond[cond], BPCC_ICC | BPCC_PT, label);
     tcg_out_nop(s);
 }
 
-static void tcg_out_movcc(TCGContext *s, TCGCond cond, int cc, TCGArg ret,
-                          TCGArg v1, int v1const)
+static void tcg_out_movcc(TCGContext *s, TCGCond cond, int cc, TCGReg ret,
+                          int32_t v1, int v1const)
 {
     tcg_out32(s, ARITH_MOVCC | cc | INSN_RD(ret)
               | INSN_RS1(tcg_cond_to_bcond[cond])
               | (v1const ? INSN_IMM11(v1) : INSN_RS2(v1)));
 }
 
-static void tcg_out_movcond_i32(TCGContext *s, TCGCond cond, TCGArg ret,
-                                TCGArg c1, TCGArg c2, int c2const,
-                                TCGArg v1, int v1const)
+static void tcg_out_movcond_i32(TCGContext *s, TCGCond cond, TCGReg ret,
+                                TCGReg c1, int32_t c2, int c2const,
+                                int32_t v1, int v1const)
 {
     tcg_out_cmp(s, c1, c2, c2const);
     tcg_out_movcc(s, cond, MOVCC_ICC, ret, v1, v1const);
 }
 
-static void tcg_out_brcond_i64(TCGContext *s, TCGCond cond, TCGArg arg1,
-                               TCGArg arg2, int const_arg2, int label)
+static void tcg_out_brcond_i64(TCGContext *s, TCGCond cond, TCGReg arg1,
+                               int32_t arg2, int const_arg2, int label)
 {
     /* For 64-bit signed comparisons vs zero, we can avoid the compare.  */
     if (arg2 == 0 && !is_unsigned_cond(cond)) {
@@ -642,23 +624,23 @@ static void tcg_out_brcond_i64(TCGContext *s, TCGCond cond, TCGArg arg1,
     tcg_out_nop(s);
 }
 
-static void tcg_out_movr(TCGContext *s, TCGCond cond, TCGArg ret, TCGArg c1,
-                         TCGArg v1, int v1const)
+static void tcg_out_movr(TCGContext *s, TCGCond cond, TCGReg ret, TCGReg c1,
+                         int32_t v1, int v1const)
 {
     tcg_out32(s, ARITH_MOVR | INSN_RD(ret) | INSN_RS1(c1)
               | (tcg_cond_to_rcond[cond] << 10)
               | (v1const ? INSN_IMM10(v1) : INSN_RS2(v1)));
 }
 
-static void tcg_out_movcond_i64(TCGContext *s, TCGCond cond, TCGArg ret,
-                                TCGArg c1, TCGArg c2, int c2const,
-                                TCGArg v1, int v1const)
+static void tcg_out_movcond_i64(TCGContext *s, TCGCond cond, TCGReg ret,
+                                TCGReg c1, int32_t c2, int c2const,
+                                int32_t v1, int v1const)
 {
     /* For 64-bit signed comparisons vs zero, we can avoid the compare.
        Note that the immediate range is one bit smaller, so we must check
        for that as well.  */
     if (c2 == 0 && !is_unsigned_cond(cond)
-        && (!v1const || check_fit_tl(v1, 10))) {
+        && (!v1const || check_fit_i32(v1, 10))) {
         tcg_out_movr(s, cond, ret, c1, v1, v1const);
     } else {
         tcg_out_cmp(s, c1, c2, c2const);
@@ -666,8 +648,8 @@ static void tcg_out_movcond_i64(TCGContext *s, TCGCond cond, TCGArg ret,
     }
 }
 
-static void tcg_out_setcond_i32(TCGContext *s, TCGCond cond, TCGArg ret,
-                                TCGArg c1, TCGArg c2, int c2const)
+static void tcg_out_setcond_i32(TCGContext *s, TCGCond cond, TCGReg ret,
+                                TCGReg c1, int32_t c2, int c2const)
 {
     /* For 32-bit comparisons, we can play games with ADDX/SUBX.  */
     switch (cond) {
@@ -716,8 +698,8 @@ static void tcg_out_setcond_i32(TCGContext *s, TCGCond cond, TCGArg ret,
     }
 }
 
-static void tcg_out_setcond_i64(TCGContext *s, TCGCond cond, TCGArg ret,
-                                TCGArg c1, TCGArg c2, int c2const)
+static void tcg_out_setcond_i64(TCGContext *s, TCGCond cond, TCGReg ret,
+                                TCGReg c1, int32_t c2, int c2const)
 {
     /* For 64-bit signed comparisons vs zero, we can avoid the compare
        if the input does not overlap the output.  */
@@ -731,11 +713,11 @@ static void tcg_out_setcond_i64(TCGContext *s, TCGCond cond, TCGArg ret,
     }
 }
 
-static void tcg_out_addsub2(TCGContext *s, TCGArg rl, TCGArg rh,
-                            TCGArg al, TCGArg ah, TCGArg bl, int blconst,
-                            TCGArg bh, int bhconst, int opl, int oph)
+static void tcg_out_addsub2(TCGContext *s, TCGReg rl, TCGReg rh,
+                            TCGReg al, TCGReg ah, int32_t bl, int blconst,
+                            int32_t bh, int bhconst, int opl, int oph)
 {
-    TCGArg tmp = TCG_REG_T1;
+    TCGReg tmp = TCG_REG_T1;
 
     /* Note that the low parts are fully consumed before tmp is set.  */
     if (rl != ah && (bhconst || rl != bh)) {
@@ -747,7 +729,7 @@ static void tcg_out_addsub2(TCGContext *s, TCGArg rl, TCGArg rh,
     tcg_out_mov(s, TCG_TYPE_I32, rl, tmp);
 }
 
-static inline void tcg_out_calli(TCGContext *s, uintptr_t dest)
+static void tcg_out_calli(TCGContext *s, uintptr_t dest)
 {
     intptr_t disp = dest - (uintptr_t)s->code_ptr;
 
@@ -954,7 +936,10 @@ static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addr, int mem_index,
     /* Find a base address that can load both tlb comparator and addend.  */
     tlb_ofs = offsetof(CPUArchState, tlb_table[mem_index][0]);
     if (!check_fit_ptr(tlb_ofs + sizeof(CPUTLBEntry), 13)) {
-        tcg_out_addi(s, r1, tlb_ofs & ~0x3ff);
+        if (tlb_ofs & ~0x3ff) {
+            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_T1, tlb_ofs & ~0x3ff);
+            tcg_out_arith(s, r1, r1, TCG_REG_T1, ARITH_ADD);
+        }
         tlb_ofs &= 0x3ff;
     }
 
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 12/14] tcg-sparc: Fix small 32-bit movi
  2014-03-17 18:37 [Qemu-devel] [PATCH 00/14] tcg/sparc v8plus code generation Richard Henderson
                   ` (10 preceding siblings ...)
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 11/14] tcg-sparc: Fixup function argument types Richard Henderson
@ 2014-03-17 18:37 ` Richard Henderson
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 13/14] tcg-sparc: Fix 32-bit constant arguments tests Richard Henderson
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 14/14] tcg-sparc: Accept stores of zero Richard Henderson
  13 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2014-03-17 18:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, aurelien

We tested imm13 before discarding garbage high bits.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index d4ef0d4..6498999 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -426,6 +426,11 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 {
     tcg_target_long hi, lo = (int32_t)arg;
 
+    /* Make sure we test 32-bit constants for imm13 properly.  */
+    if (type == TCG_TYPE_I32) {
+        arg = lo;
+    }
+
     /* A 13-bit constant sign-extended to 64-bits.  */
     if (check_fit_tl(arg, 13)) {
         tcg_out_movi_imm13(s, ret, arg);
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 13/14] tcg-sparc: Fix 32-bit constant arguments tests
  2014-03-17 18:37 [Qemu-devel] [PATCH 00/14] tcg/sparc v8plus code generation Richard Henderson
                   ` (11 preceding siblings ...)
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 12/14] tcg-sparc: Fix small 32-bit movi Richard Henderson
@ 2014-03-17 18:37 ` Richard Henderson
  2014-03-22  9:48   ` Stuart Brady
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 14/14] tcg-sparc: Accept stores of zero Richard Henderson
  13 siblings, 1 reply; 19+ messages in thread
From: Richard Henderson @ 2014-03-17 18:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, aurelien

We need to discard garbage high bits before testing
for 32-bit I and J constraints.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c | 73 +++++++++++++++++++++++++++++---------------------
 tcg/sparc/tcg-target.h |  4 ---
 2 files changed, 43 insertions(+), 34 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 6498999..29d7a48 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -89,6 +89,11 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 # define TCG_GUEST_BASE_REG TCG_REG_G0
 #endif
 
+#define TCG_CT_CONST_S11  0x100
+#define TCG_CT_CONST_S13  0x200
+#define TCG_CT_CONST_ZERO 0x400
+#define TCG_CT_CONST_IS32 0x800
+
 static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_L0,
     TCG_REG_L1,
@@ -357,6 +362,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
     case 'Z':
         ct->ct |= TCG_CT_CONST_ZERO;
         break;
+    case 'w':
+        ct->ct |= TCG_CT_CONST_IS32;
+        break;
     default:
         return -1;
     }
@@ -373,7 +381,12 @@ static inline int tcg_target_const_match(tcg_target_long val,
 
     if (ct & TCG_CT_CONST) {
         return 1;
-    } else if ((ct & TCG_CT_CONST_ZERO) && val == 0) {
+    }
+    if (ct & TCG_CT_CONST_IS32) {
+        val = (int32_t)val;
+    }
+
+    if ((ct & TCG_CT_CONST_ZERO) && val == 0) {
         return 1;
     } else if ((ct & TCG_CT_CONST_S11) && check_fit_tl(val, 11)) {
         return 1;
@@ -679,7 +692,7 @@ static void tcg_out_setcond_i32(TCGContext *s, TCGCond cond, TCGReg ret,
            swap the operands on GTU/LEU.  There's no benefit to loading
            the constant into a temporary register.  */
         if (!c2const || c2 == 0) {
-            TCGArg t = c1;
+            TCGReg t = c1;
             c1 = c2;
             c2 = t;
             c2const = 0;
@@ -1391,32 +1404,32 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_st16_i32, { "rZ", "r" } },
     { INDEX_op_st_i32, { "rZ", "r" } },
 
-    { INDEX_op_add_i32, { "r", "rZ", "rJ" } },
-    { INDEX_op_mul_i32, { "r", "rZ", "rJ" } },
-    { INDEX_op_div_i32, { "r", "rZ", "rJ" } },
-    { INDEX_op_divu_i32, { "r", "rZ", "rJ" } },
-    { INDEX_op_sub_i32, { "r", "rZ", "rJ" } },
-    { INDEX_op_and_i32, { "r", "rZ", "rJ" } },
-    { INDEX_op_andc_i32, { "r", "rZ", "rJ" } },
-    { INDEX_op_or_i32, { "r", "rZ", "rJ" } },
-    { INDEX_op_orc_i32, { "r", "rZ", "rJ" } },
-    { INDEX_op_xor_i32, { "r", "rZ", "rJ" } },
-
-    { INDEX_op_shl_i32, { "r", "rZ", "rJ" } },
-    { INDEX_op_shr_i32, { "r", "rZ", "rJ" } },
-    { INDEX_op_sar_i32, { "r", "rZ", "rJ" } },
-
-    { INDEX_op_neg_i32, { "r", "rJ" } },
-    { INDEX_op_not_i32, { "r", "rJ" } },
-
-    { INDEX_op_brcond_i32, { "rZ", "rJ" } },
-    { INDEX_op_setcond_i32, { "r", "rZ", "rJ" } },
-    { INDEX_op_movcond_i32, { "r", "rZ", "rJ", "rI", "0" } },
-
-    { INDEX_op_add2_i32, { "r", "r", "rZ", "rZ", "rJ", "rJ" } },
-    { INDEX_op_sub2_i32, { "r", "r", "rZ", "rZ", "rJ", "rJ" } },
-    { INDEX_op_mulu2_i32, { "r", "r", "rZ", "rJ" } },
-    { INDEX_op_muls2_i32, { "r", "r", "rZ", "rJ" } },
+    { INDEX_op_add_i32, { "r", "rZ", "rwJ" } },
+    { INDEX_op_mul_i32, { "r", "rZ", "rwJ" } },
+    { INDEX_op_div_i32, { "r", "rZ", "rwJ" } },
+    { INDEX_op_divu_i32, { "r", "rZ", "rwJ" } },
+    { INDEX_op_sub_i32, { "r", "rZ", "rwJ" } },
+    { INDEX_op_and_i32, { "r", "rZ", "rwJ" } },
+    { INDEX_op_andc_i32, { "r", "rZ", "rwJ" } },
+    { INDEX_op_or_i32, { "r", "rZ", "rwJ" } },
+    { INDEX_op_orc_i32, { "r", "rZ", "rwJ" } },
+    { INDEX_op_xor_i32, { "r", "rZ", "rwJ" } },
+
+    { INDEX_op_shl_i32, { "r", "rZ", "ri" } },
+    { INDEX_op_shr_i32, { "r", "rZ", "ri" } },
+    { INDEX_op_sar_i32, { "r", "rZ", "ri" } },
+
+    { INDEX_op_neg_i32, { "r", "r" } },
+    { INDEX_op_not_i32, { "r", "r" } },
+
+    { INDEX_op_brcond_i32, { "rZ", "rwJ" } },
+    { INDEX_op_setcond_i32, { "r", "rZ", "rwJ" } },
+    { INDEX_op_movcond_i32, { "r", "rZ", "rwJ", "rwI", "0" } },
+
+    { INDEX_op_add2_i32, { "r", "r", "rZ", "rZ", "rwJ", "rwJ" } },
+    { INDEX_op_sub2_i32, { "r", "r", "rZ", "rZ", "rwJ", "rwJ" } },
+    { INDEX_op_mulu2_i32, { "r", "r", "rZ", "rwJ" } },
+    { INDEX_op_muls2_i32, { "r", "r", "rZ", "rwJ" } },
 
     { INDEX_op_mov_i64, { "R", "R" } },
     { INDEX_op_movi_i64, { "R" } },
@@ -1447,8 +1460,8 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_shr_i64, { "R", "RZ", "RJ" } },
     { INDEX_op_sar_i64, { "R", "RZ", "RJ" } },
 
-    { INDEX_op_neg_i64, { "R", "RJ" } },
-    { INDEX_op_not_i64, { "R", "RJ" } },
+    { INDEX_op_neg_i64, { "R", "R" } },
+    { INDEX_op_not_i64, { "R", "R" } },
 
     { INDEX_op_ext32s_i64, { "R", "r" } },
     { INDEX_op_ext32u_i64, { "R", "r" } },
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 091224c..47fce69 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -65,10 +65,6 @@ typedef enum {
     TCG_REG_I7,
 } TCGReg;
 
-#define TCG_CT_CONST_S11  0x100
-#define TCG_CT_CONST_S13  0x200
-#define TCG_CT_CONST_ZERO 0x400
-
 /* used for function call generation */
 #define TCG_REG_CALL_STACK TCG_REG_O6
 
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH 14/14] tcg-sparc: Accept stores of zero
  2014-03-17 18:37 [Qemu-devel] [PATCH 00/14] tcg/sparc v8plus code generation Richard Henderson
                   ` (12 preceding siblings ...)
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 13/14] tcg-sparc: Fix 32-bit constant arguments tests Richard Henderson
@ 2014-03-17 18:37 ` Richard Henderson
  13 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2014-03-17 18:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel, aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 29d7a48..9e2e80b 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1473,8 +1473,8 @@ static const TCGTargetOpDef sparc_op_defs[] = {
 
     { INDEX_op_qemu_ld_i32, { "r", "A" } },
     { INDEX_op_qemu_ld_i64, { "R", "A" } },
-    { INDEX_op_qemu_st_i32, { "s", "A" } },
-    { INDEX_op_qemu_st_i64, { "S", "A" } },
+    { INDEX_op_qemu_st_i32, { "sZ", "A" } },
+    { INDEX_op_qemu_st_i64, { "SZ", "A" } },
 
     { -1 },
 };
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 02/14] tcg: Add INDEX_op_trunc_i32
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 02/14] tcg: Add INDEX_op_trunc_i32 Richard Henderson
@ 2014-03-21 23:35   ` Stuart Brady
  2014-03-21 23:39     ` Stuart Brady
  0 siblings, 1 reply; 19+ messages in thread
From: Stuart Brady @ 2014-03-21 23:35 UTC (permalink / raw)
  To: qemu-devel

On Mon, Mar 17, 2014 at 11:37:44AM -0700, Richard Henderson wrote:
> diff --git a/tcg/README b/tcg/README
> index f178212..160cbe8 100644
> --- a/tcg/README
> +++ b/tcg/README
> @@ -306,6 +306,11 @@ This operation would be equivalent to
>  
>    dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
>  
> +* trunc_i32 t0, t1, pos
> +
> +For 64-bit hosts only, right shift the 64-bit input T1 by POS and
> +truncate to 32-bit output T0.  Depending on the host, this may be
> +a simple mov/shift, or may require additional canonicalization.

Surely trunc_rshift_i64_i32, which is used elsewhere in this patch?

... or trunc_shr_i64_i32 as we use 'shr' for the existing shift-right op.

Cheers,
Stuart

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 02/14] tcg: Add INDEX_op_trunc_i32
  2014-03-21 23:35   ` Stuart Brady
@ 2014-03-21 23:39     ` Stuart Brady
  0 siblings, 0 replies; 19+ messages in thread
From: Stuart Brady @ 2014-03-21 23:39 UTC (permalink / raw)
  To: qemu-devel

On Fri, Mar 21, 2014 at 11:35:12PM +0000, Stuart Brady wrote:
> On Mon, Mar 17, 2014 at 11:37:44AM -0700, Richard Henderson wrote:
> > diff --git a/tcg/README b/tcg/README
> > index f178212..160cbe8 100644
> > --- a/tcg/README
> > +++ b/tcg/README
> > @@ -306,6 +306,11 @@ This operation would be equivalent to
> >  
> >    dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
> >  
> > +* trunc_i32 t0, t1, pos
> > +
> > +For 64-bit hosts only, right shift the 64-bit input T1 by POS and
> > +truncate to 32-bit output T0.  Depending on the host, this may be
> > +a simple mov/shift, or may require additional canonicalization.
> 
> Surely trunc_rshift_i64_i32, which is used elsewhere in this patch?

... but not used consistently, so this affects more than just the README.
Forgot to mention that! :-)

> ... or trunc_shr_i64_i32 as we use 'shr' for the existing shift-right op.

Cheers,
Stuart

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 13/14] tcg-sparc: Fix 32-bit constant arguments tests
  2014-03-17 18:37 ` [Qemu-devel] [PATCH 13/14] tcg-sparc: Fix 32-bit constant arguments tests Richard Henderson
@ 2014-03-22  9:48   ` Stuart Brady
  2014-03-22 17:08     ` Richard Henderson
  0 siblings, 1 reply; 19+ messages in thread
From: Stuart Brady @ 2014-03-22  9:48 UTC (permalink / raw)
  To: Richard Henderson; +Cc: blauwirbel, qemu-devel, aurelien

On Mon, Mar 17, 2014 at 11:37:55AM -0700, Richard Henderson wrote:
> We need to discard garbage high bits before testing
> for 32-bit I and J constraints.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/sparc/tcg-target.c | 73 +++++++++++++++++++++++++++++---------------------
>  tcg/sparc/tcg-target.h |  4 ---
>  2 files changed, 43 insertions(+), 34 deletions(-)
> 
> diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
> index 6498999..29d7a48 100644
> --- a/tcg/sparc/tcg-target.c
> +++ b/tcg/sparc/tcg-target.c
> @@ -89,6 +89,11 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
>  # define TCG_GUEST_BASE_REG TCG_REG_G0
>  #endif
>  
> +#define TCG_CT_CONST_S11  0x100
> +#define TCG_CT_CONST_S13  0x200
> +#define TCG_CT_CONST_ZERO 0x400
> +#define TCG_CT_CONST_IS32 0x800
> +
>  static const int tcg_target_reg_alloc_order[] = {
>      TCG_REG_L0,
>      TCG_REG_L1,
> @@ -357,6 +362,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
>      case 'Z':
>          ct->ct |= TCG_CT_CONST_ZERO;
>          break;
> +    case 'w':
> +        ct->ct |= TCG_CT_CONST_IS32;
> +        break;

It might be neater just to expose ts->type to tcg_target_const_match(),
which the TCG target can just ignore if it wants, but that might not
belong in this series, as this seems to affect PPC64 and IA64 too.

>      default:
>          return -1;
>      }
> @@ -373,7 +381,12 @@ static inline int tcg_target_const_match(tcg_target_long val,
>  
>      if (ct & TCG_CT_CONST) {
>          return 1;
> -    } else if ((ct & TCG_CT_CONST_ZERO) && val == 0) {
> +    }
> +    if (ct & TCG_CT_CONST_IS32) {
> +        val = (int32_t)val;
> +    }
> +
> +    if ((ct & TCG_CT_CONST_ZERO) && val == 0) {
>          return 1;
>      } else if ((ct & TCG_CT_CONST_S11) && check_fit_tl(val, 11)) {
>          return 1;
> @@ -679,7 +692,7 @@ static void tcg_out_setcond_i32(TCGContext *s, TCGCond cond, TCGReg ret,
>             swap the operands on GTU/LEU.  There's no benefit to loading
>             the constant into a temporary register.  */
>          if (!c2const || c2 == 0) {
> -            TCGArg t = c1;
> +            TCGReg t = c1;

I guess this was meant for 11/14?

>              c1 = c2;
>              c2 = t;
>              c2const = 0;
> @@ -1391,32 +1404,32 @@ static const TCGTargetOpDef sparc_op_defs[] = {
>      { INDEX_op_st16_i32, { "rZ", "r" } },
>      { INDEX_op_st_i32, { "rZ", "r" } },
>  
> -    { INDEX_op_add_i32, { "r", "rZ", "rJ" } },
> -    { INDEX_op_mul_i32, { "r", "rZ", "rJ" } },
> -    { INDEX_op_div_i32, { "r", "rZ", "rJ" } },
> -    { INDEX_op_divu_i32, { "r", "rZ", "rJ" } },
> -    { INDEX_op_sub_i32, { "r", "rZ", "rJ" } },
> -    { INDEX_op_and_i32, { "r", "rZ", "rJ" } },
> -    { INDEX_op_andc_i32, { "r", "rZ", "rJ" } },
> -    { INDEX_op_or_i32, { "r", "rZ", "rJ" } },
> -    { INDEX_op_orc_i32, { "r", "rZ", "rJ" } },
> -    { INDEX_op_xor_i32, { "r", "rZ", "rJ" } },
> -
> -    { INDEX_op_shl_i32, { "r", "rZ", "rJ" } },
> -    { INDEX_op_shr_i32, { "r", "rZ", "rJ" } },
> -    { INDEX_op_sar_i32, { "r", "rZ", "rJ" } },
> -
> -    { INDEX_op_neg_i32, { "r", "rJ" } },
> -    { INDEX_op_not_i32, { "r", "rJ" } },
> -
> -    { INDEX_op_brcond_i32, { "rZ", "rJ" } },
> -    { INDEX_op_setcond_i32, { "r", "rZ", "rJ" } },
> -    { INDEX_op_movcond_i32, { "r", "rZ", "rJ", "rI", "0" } },
> -
> -    { INDEX_op_add2_i32, { "r", "r", "rZ", "rZ", "rJ", "rJ" } },
> -    { INDEX_op_sub2_i32, { "r", "r", "rZ", "rZ", "rJ", "rJ" } },
> -    { INDEX_op_mulu2_i32, { "r", "r", "rZ", "rJ" } },
> -    { INDEX_op_muls2_i32, { "r", "r", "rZ", "rJ" } },
> +    { INDEX_op_add_i32, { "r", "rZ", "rwJ" } },
> +    { INDEX_op_mul_i32, { "r", "rZ", "rwJ" } },
> +    { INDEX_op_div_i32, { "r", "rZ", "rwJ" } },
> +    { INDEX_op_divu_i32, { "r", "rZ", "rwJ" } },
> +    { INDEX_op_sub_i32, { "r", "rZ", "rwJ" } },
> +    { INDEX_op_and_i32, { "r", "rZ", "rwJ" } },
> +    { INDEX_op_andc_i32, { "r", "rZ", "rwJ" } },
> +    { INDEX_op_or_i32, { "r", "rZ", "rwJ" } },
> +    { INDEX_op_orc_i32, { "r", "rZ", "rwJ" } },
> +    { INDEX_op_xor_i32, { "r", "rZ", "rwJ" } },
> +
> +    { INDEX_op_shl_i32, { "r", "rZ", "ri" } },
> +    { INDEX_op_shr_i32, { "r", "rZ", "ri" } },
> +    { INDEX_op_sar_i32, { "r", "rZ", "ri" } },

Relaxation of the shift argument for shl/shr/sar seems fine, but it's not
mentioned in the description.

> +
> +    { INDEX_op_neg_i32, { "r", "r" } },
> +    { INDEX_op_not_i32, { "r", "r" } },

Right, constant folding deals with these... but likewise.

> +
> +    { INDEX_op_brcond_i32, { "rZ", "rwJ" } },
> +    { INDEX_op_setcond_i32, { "r", "rZ", "rwJ" } },
> +    { INDEX_op_movcond_i32, { "r", "rZ", "rwJ", "rwI", "0" } },
> +
> +    { INDEX_op_add2_i32, { "r", "r", "rZ", "rZ", "rwJ", "rwJ" } },
> +    { INDEX_op_sub2_i32, { "r", "r", "rZ", "rZ", "rwJ", "rwJ" } },
> +    { INDEX_op_mulu2_i32, { "r", "r", "rZ", "rwJ" } },
> +    { INDEX_op_muls2_i32, { "r", "r", "rZ", "rwJ" } },
>  
>      { INDEX_op_mov_i64, { "R", "R" } },
>      { INDEX_op_movi_i64, { "R" } },
> @@ -1447,8 +1460,8 @@ static const TCGTargetOpDef sparc_op_defs[] = {
>      { INDEX_op_shr_i64, { "R", "RZ", "RJ" } },
>      { INDEX_op_sar_i64, { "R", "RZ", "RJ" } },

As above.

>  
> -    { INDEX_op_neg_i64, { "R", "RJ" } },
> -    { INDEX_op_not_i64, { "R", "RJ" } },
> +    { INDEX_op_neg_i64, { "R", "R" } },
> +    { INDEX_op_not_i64, { "R", "R" } },

As above.

>  
>      { INDEX_op_ext32s_i64, { "R", "r" } },
>      { INDEX_op_ext32u_i64, { "R", "r" } },
> diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
> index 091224c..47fce69 100644
> --- a/tcg/sparc/tcg-target.h
> +++ b/tcg/sparc/tcg-target.h
> @@ -65,10 +65,6 @@ typedef enum {
>      TCG_REG_I7,
>  } TCGReg;
>  
> -#define TCG_CT_CONST_S11  0x100
> -#define TCG_CT_CONST_S13  0x200
> -#define TCG_CT_CONST_ZERO 0x400
> -
>  /* used for function call generation */
>  #define TCG_REG_CALL_STACK TCG_REG_O6
>  
> -- 
> 1.8.5.3
> 
> 
>  

Cheers,
Stuart

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH 13/14] tcg-sparc: Fix 32-bit constant arguments tests
  2014-03-22  9:48   ` Stuart Brady
@ 2014-03-22 17:08     ` Richard Henderson
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2014-03-22 17:08 UTC (permalink / raw)
  To: Stuart Brady; +Cc: blauwirbel, qemu-devel, aurelien

On 03/22/2014 02:48 AM, Stuart Brady wrote:
>> +    case 'w':
>> > +        ct->ct |= TCG_CT_CONST_IS32;
>> > +        break;
> It might be neater just to expose ts->type to tcg_target_const_match(),
> which the TCG target can just ignore if it wants, but that might not
> belong in this series, as this seems to affect PPC64 and IA64 too.
> 

That does sound quite sensible.


r~

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2014-03-22 17:09 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-17 18:37 [Qemu-devel] [PATCH 00/14] tcg/sparc v8plus code generation Richard Henderson
2014-03-17 18:37 ` [Qemu-devel] [PATCH 01/14] tcg: Fix missed pointer size != TCG_TARGET_REG_BITS changes Richard Henderson
2014-03-17 18:37 ` [Qemu-devel] [PATCH 02/14] tcg: Add INDEX_op_trunc_i32 Richard Henderson
2014-03-21 23:35   ` Stuart Brady
2014-03-21 23:39     ` Stuart Brady
2014-03-17 18:37 ` [Qemu-devel] [PATCH 03/14] tcg-sparc: Remove most uses of TCG_TARGET_REG_BITS Richard Henderson
2014-03-17 18:37 ` [Qemu-devel] [PATCH 04/14] tcg-sparc: Support trunc_i32 Richard Henderson
2014-03-17 18:37 ` [Qemu-devel] [PATCH 05/14] tcg-sparc: Use 64-bit registers with sparcv8plus Richard Henderson
2014-03-17 18:37 ` [Qemu-devel] [PATCH 06/14] tcg-sparc: Use the RETURN instruction Richard Henderson
2014-03-17 18:37 ` [Qemu-devel] [PATCH 07/14] tcg-sparc: Implement muls2_i32 Richard Henderson
2014-03-17 18:37 ` [Qemu-devel] [PATCH 08/14] tcg-sparc: Tidy check_fit_* tests Richard Henderson
2014-03-17 18:37 ` [Qemu-devel] [PATCH 09/14] tcg-sparc: Don't handle mov/movi in tcg_out_op Richard Henderson
2014-03-17 18:37 ` [Qemu-devel] [PATCH 10/14] tcg-sparc: Hoist common argument loads " Richard Henderson
2014-03-17 18:37 ` [Qemu-devel] [PATCH 11/14] tcg-sparc: Fixup function argument types Richard Henderson
2014-03-17 18:37 ` [Qemu-devel] [PATCH 12/14] tcg-sparc: Fix small 32-bit movi Richard Henderson
2014-03-17 18:37 ` [Qemu-devel] [PATCH 13/14] tcg-sparc: Fix 32-bit constant arguments tests Richard Henderson
2014-03-22  9:48   ` Stuart Brady
2014-03-22 17:08     ` Richard Henderson
2014-03-17 18:37 ` [Qemu-devel] [PATCH 14/14] tcg-sparc: Accept stores of zero Richard Henderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.