* [Qemu-devel] [PATCH 00/14] tcg-sparc improvments, v2
@ 2012-03-28  0:32 Richard Henderson
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 01/14] tcg-sparc: Hack in qemu_ld/st64 for 32-bit Richard Henderson
                   ` (13 more replies)
  0 siblings, 14 replies; 21+ messages in thread
From: Richard Henderson @ 2012-03-28  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel

Changes v1->v2:
  * Patch 3 found more __sparc_v8plus__ and __sparc_v9__ conditionals
    to convert.
  * Patch 6, user-exec.c no longer uses dyngen-exec.h at all.
  * Patch 7, env fallback to cpu_single_env now via macro.
  * Merged some of the qemu_ld/st patches.
  * Other random cleanups, as now I was able to test sparc64.


r~



Richard Henderson (14):
  tcg-sparc: Hack in qemu_ld/st64 for 32-bit.
  tcg-sparc: Fix ADDX opcode.
  tcg-sparc: Assume v9 cpu always, i.e. force v8plus in 32-bit mode.
  tcg-sparc: Fix qemu_ld/st to handle 32-bit host.
  tcg-sparc: Simplify qemu_ld/st direct memory paths.
  tcg-sparc: Support GUEST_BASE.
  Avoid declaring the env variable at all if CONFIG_TCG_PASS_AREG0.
  tcg-sparc: Do not use a global register for AREG0.
  tcg-sparc: Change AREG0 in generated code to %i0.
  tcg-sparc: Clean up cruft stemming from attempts to use global
    registers.
  tcg-sparc: Mask shift immediates to avoid illegal insns.
  tcg-sparc: Use defines for temporaries.
  tcg-sparc: Add %g/%o registers to alloc_order
  tcg-sparc: Fix and enable direct TB chaining.

 Makefile.target              |    5 -
 configure                    |   53 +---
 disas.c                      |    6 -
 dyngen-exec.h                |   24 +-
 exec-all.h                   |    9 +-
 exec.c                       |   12 +-
 qemu-timer.h                 |    8 +-
 target-m68k/op_helper.c      |    2 +-
 target-unicore32/op_helper.c |    2 +-
 target-xtensa/op_helper.c    |    2 +-
 tcg/sparc/tcg-target.c       |  958 ++++++++++++++++++++----------------------
 tcg/sparc/tcg-target.h       |   34 +-
 tcg/tcg.c                    |    3 +-
 user-exec.c                  |   17 +-
 xtensa-semi.c                |    2 +-
 15 files changed, 512 insertions(+), 625 deletions(-)

-- 
1.7.7.6


* [Qemu-devel] [PATCH 01/14] tcg-sparc: Hack in qemu_ld/st64 for 32-bit.
  2012-03-28  0:32 [Qemu-devel] [PATCH 00/14] tcg-sparc improvments, v2 Richard Henderson
@ 2012-03-28  0:32 ` Richard Henderson
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 02/14] tcg-sparc: Fix ADDX opcode Richard Henderson
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2012-03-28  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel

The 64-bit qemu_ld/st ops are not actually implemented for 32-bit
hosts, but defining them at least avoids the tcg assert at startup.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 247a278..0e71618 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1586,6 +1586,9 @@ static const TCGTargetOpDef sparc_op_defs[] = {
 
     { INDEX_op_brcond_i64, { "r", "rJ" } },
     { INDEX_op_setcond_i64, { "r", "r", "rJ" } },
+#else
+    { INDEX_op_qemu_ld64, { "L", "L", "L" } },
+    { INDEX_op_qemu_st64, { "L", "L", "L" } },
 #endif
     { -1 },
 };
-- 
1.7.7.6


* [Qemu-devel] [PATCH 02/14] tcg-sparc: Fix ADDX opcode.
  2012-03-28  0:32 [Qemu-devel] [PATCH 00/14] tcg-sparc improvments, v2 Richard Henderson
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 01/14] tcg-sparc: Hack in qemu_ld/st64 for 32-bit Richard Henderson
@ 2012-03-28  0:32 ` Richard Henderson
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 03/14] tcg-sparc: Assume v9 cpu always, i.e. force v8plus in 32-bit mode Richard Henderson
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2012-03-28  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel


Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 0e71618..358a70c 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -242,7 +242,7 @@ static inline int tcg_target_const_match(tcg_target_long val,
 #define ARITH_XOR  (INSN_OP(2) | INSN_OP3(0x03))
 #define ARITH_SUB  (INSN_OP(2) | INSN_OP3(0x04))
 #define ARITH_SUBCC (INSN_OP(2) | INSN_OP3(0x14))
-#define ARITH_ADDX (INSN_OP(2) | INSN_OP3(0x10))
+#define ARITH_ADDX (INSN_OP(2) | INSN_OP3(0x08))
 #define ARITH_SUBX (INSN_OP(2) | INSN_OP3(0x0c))
 #define ARITH_UMUL (INSN_OP(2) | INSN_OP3(0x0a))
 #define ARITH_UDIV (INSN_OP(2) | INSN_OP3(0x0e))
-- 
1.7.7.6
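
The one-line change above is easier to read against the SPARC format-3
instruction layout, where op occupies bits 31:30 and op3 bits 24:19.
In that layout op3 = 0x10 is ADDcc, while add-with-carry (ADDX) is
op3 = 0x08, so the old define encoded a different instruction.  A
minimal sketch, assuming the INSN_OP/INSN_OP3 helpers are the plain
shifts shown here; the DEMO_ and demo_ names are illustrative and do
not appear in the patch:

```c
#include <stdint.h>

/* Assumed to mirror the field helpers in tcg/sparc/tcg-target.c:
   op lives in bits 31:30 and op3 in bits 24:19 of a format-3 insn. */
#define DEMO_INSN_OP(x)   ((uint32_t)(x) << 30)
#define DEMO_INSN_OP3(x)  ((uint32_t)(x) << 19)

/* op3 = 0x10 is ADDcc (add, set condition codes): the old value. */
#define DEMO_ARITH_ADDCC  (DEMO_INSN_OP(2) | DEMO_INSN_OP3(0x10))
/* op3 = 0x08 is ADDX (add with carry): the corrected value. */
#define DEMO_ARITH_ADDX   (DEMO_INSN_OP(2) | DEMO_INSN_OP3(0x08))

/* Extract the op3 field back out of an encoded instruction word. */
static inline uint32_t demo_op3_of(uint32_t insn)
{
    return (insn >> 19) & 0x3f;
}
```

The 0x14 used for ARITH_SUBCC in the context lines follows the same
pattern: it is SUB (0x04) with the condition-code bit (0x10) set.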


* [Qemu-devel] [PATCH 03/14] tcg-sparc: Assume v9 cpu always, i.e. force v8plus in 32-bit mode.
  2012-03-28  0:32 [Qemu-devel] [PATCH 00/14] tcg-sparc improvments, v2 Richard Henderson
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 01/14] tcg-sparc: Hack in qemu_ld/st64 for 32-bit Richard Henderson
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 02/14] tcg-sparc: Fix ADDX opcode Richard Henderson
@ 2012-03-28  0:32 ` Richard Henderson
  2012-03-29 18:45   ` Blue Swirl
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 04/14] tcg-sparc: Fix qemu_ld/st to handle 32-bit host Richard Henderson
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2012-03-28  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel

The current code doesn't actually work in 32-bit mode at all.  Since
no one really noticed, drop the complication of v7 and v8 CPUs.
Eliminate the --sparc_cpu configure option and standardize macro
testing on TCG_TARGET_REG_BITS / HOST_LONG_BITS.
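
As an illustration of the macro standardization, the AREG0 selection
in dyngen-exec.h after this patch depends only on CONFIG_SOLARIS and
the host word size.  A minimal sketch, assuming the word size can be
derived from ULONG_MAX; DEMO_HOST_LONG_BITS and demo_areg0_global()
are hypothetical names, not QEMU's:

```c
#include <limits.h>

/* Hypothetical stand-in for HOST_LONG_BITS: derived from the width of
   unsigned long rather than from a -D__sparc_v9__ style flag. */
#if ULONG_MAX > 0xffffffffUL
# define DEMO_HOST_LONG_BITS 64
#else
# define DEMO_HOST_LONG_BITS 32
#endif

/* Return the number N of the %gN register chosen for AREG0, following
   the post-patch conditional: g2 on Solaris, otherwise g5 for a
   64-bit host and g6 for a 32-bit host. */
static inline int demo_areg0_global(int is_solaris, int host_long_bits)
{
    if (is_solaris) {
        return 2;
    }
    return host_long_bits == 64 ? 5 : 6;
}
```

Calling demo_areg0_global(0, DEMO_HOST_LONG_BITS) picks the register
for the build host without consulting any __sparc_v*__ define.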

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 configure              |   41 ++++-------------------------------------
 disas.c                |    6 ------
 dyngen-exec.h          |    4 +---
 exec.c                 |   12 +++++-------
 qemu-timer.h           |    8 +++++---
 tcg/sparc/tcg-target.c |   20 +++++---------------
 tcg/sparc/tcg-target.h |    7 ++++---
 tcg/tcg.c              |    3 ++-
 8 files changed, 26 insertions(+), 75 deletions(-)

diff --git a/configure b/configure
index 80ca430..7741ba9 100755
--- a/configure
+++ b/configure
@@ -86,7 +86,6 @@ source_path=`dirname "$0"`
 cpu=""
 interp_prefix="/usr/gnemul/qemu-%M"
 static="no"
-sparc_cpu=""
 cross_prefix=""
 audio_drv_list=""
 audio_card_list="ac97 es1370 sb16 hda"
@@ -216,21 +215,6 @@ for opt do
   ;;
   --disable-debug-info) debug_info="no"
   ;;
-  --sparc_cpu=*)
-    sparc_cpu="$optarg"
-    case $sparc_cpu in
-    v7|v8|v8plus|v8plusa)
-      cpu="sparc"
-    ;;
-    v9)
-      cpu="sparc64"
-    ;;
-    *)
-      echo "undefined SPARC architecture. Exiting";
-      exit 1
-    ;;
-    esac
-  ;;
   esac
 done
 # OS specific
@@ -284,8 +268,6 @@ elif check_define __i386__ ; then
 elif check_define __x86_64__ ; then
   cpu="x86_64"
 elif check_define __sparc__ ; then
-  # We can't check for 64 bit (when gcc is biarch) or V8PLUSA
-  # They must be specified using --sparc_cpu
   if check_define __arch64__ ; then
     cpu="sparc64"
   else
@@ -749,8 +731,6 @@ for opt do
   ;;
   --enable-uname-release=*) uname_release="$optarg"
   ;;
-  --sparc_cpu=*)
-  ;;
   --enable-werror) werror="yes"
   ;;
   --disable-werror) werror="no"
@@ -830,32 +810,19 @@ for opt do
   esac
 done
 
-#
-# If cpu ~= sparc and  sparc_cpu hasn't been defined, plug in the right
-# QEMU_CFLAGS/LDFLAGS (assume sparc_v8plus for 32-bit and sparc_v9 for 64-bit)
-#
 host_guest_base="no"
 case "$cpu" in
-    sparc) case $sparc_cpu in
-           v7|v8)
-             QEMU_CFLAGS="-mcpu=${sparc_cpu} -D__sparc_${sparc_cpu}__ $QEMU_CFLAGS"
-           ;;
-           v8plus|v8plusa)
-             QEMU_CFLAGS="-mcpu=ultrasparc -D__sparc_${sparc_cpu}__ $QEMU_CFLAGS"
-           ;;
-           *) # sparc_cpu not defined in the command line
-             QEMU_CFLAGS="-mcpu=ultrasparc -D__sparc_v8plus__ $QEMU_CFLAGS"
-           esac
+    sparc)
            LDFLAGS="-m32 $LDFLAGS"
-           QEMU_CFLAGS="-m32 -ffixed-g2 -ffixed-g3 $QEMU_CFLAGS"
+           QEMU_CFLAGS="-m32 -mcpu=ultrasparc $QEMU_CFLAGS"
+           QEMU_CFLAGS="-ffixed-g2 -ffixed-g3 $QEMU_CFLAGS"
            if test "$solaris" = "no" ; then
              QEMU_CFLAGS="-ffixed-g1 -ffixed-g6 $QEMU_CFLAGS"
-             helper_cflags="-ffixed-i0"
            fi
            ;;
     sparc64)
-           QEMU_CFLAGS="-m64 -mcpu=ultrasparc -D__sparc_v9__ $QEMU_CFLAGS"
            LDFLAGS="-m64 $LDFLAGS"
+           QEMU_CFLAGS="-m64 -mcpu=ultrasparc $QEMU_CFLAGS"
            QEMU_CFLAGS="-ffixed-g5 -ffixed-g6 -ffixed-g7 $QEMU_CFLAGS"
            if test "$solaris" != "no" ; then
              QEMU_CFLAGS="-ffixed-g1 $QEMU_CFLAGS"
diff --git a/disas.c b/disas.c
index 4945c44..b3434fa 100644
--- a/disas.c
+++ b/disas.c
@@ -175,9 +175,7 @@ void target_disas(FILE *out, target_ulong code, target_ulong size, int flags)
 	print_insn = print_insn_arm;
 #elif defined(TARGET_SPARC)
     print_insn = print_insn_sparc;
-#ifdef TARGET_SPARC64
     disasm_info.mach = bfd_mach_sparc_v9b;
-#endif
 #elif defined(TARGET_PPC)
     if (flags >> 16)
         disasm_info.endian = BFD_ENDIAN_LITTLE;
@@ -287,9 +285,7 @@ void disas(FILE *out, void *code, unsigned long size)
     print_insn = print_insn_alpha;
 #elif defined(__sparc__)
     print_insn = print_insn_sparc;
-#if defined(__sparc_v8plus__) || defined(__sparc_v8plusa__) || defined(__sparc_v9__)
     disasm_info.mach = bfd_mach_sparc_v9b;
-#endif
 #elif defined(__arm__)
     print_insn = print_insn_arm;
 #elif defined(__MIPSEB__)
@@ -397,9 +393,7 @@ void monitor_disas(Monitor *mon, CPUArchState *env,
     print_insn = print_insn_alpha;
 #elif defined(TARGET_SPARC)
     print_insn = print_insn_sparc;
-#ifdef TARGET_SPARC64
     disasm_info.mach = bfd_mach_sparc_v9b;
-#endif
 #elif defined(TARGET_PPC)
 #ifdef TARGET_PPC64
     disasm_info.mach = bfd_mach_ppc64;
diff --git a/dyngen-exec.h b/dyngen-exec.h
index 083e20b..cfeef99 100644
--- a/dyngen-exec.h
+++ b/dyngen-exec.h
@@ -39,13 +39,11 @@
 #elif defined(__sparc__)
 #ifdef CONFIG_SOLARIS
 #define AREG0 "g2"
-#else
-#ifdef __sparc_v9__
+#elif HOST_LONG_BITS == 64
 #define AREG0 "g5"
 #else
 #define AREG0 "g6"
 #endif
-#endif
 #elif defined(__s390__)
 #define AREG0 "r10"
 #elif defined(__alpha__)
diff --git a/exec.c b/exec.c
index 6731ab8..ad13ce1 100644
--- a/exec.c
+++ b/exec.c
@@ -86,7 +86,7 @@ static int nb_tbs;
 /* any access to the tbs or the page table must use this lock */
 spinlock_t tb_lock = SPIN_LOCK_UNLOCKED;
 
-#if defined(__arm__) || defined(__sparc_v9__)
+#if defined(__arm__) || defined(__sparc__)
 /* The prologue must be reachable with a direct jump. ARM and Sparc64
  have limited branch ranges (possibly also PPC) so place it in a
  section close to code segment. */
@@ -559,10 +559,9 @@ static void code_gen_alloc(unsigned long tb_size)
         /* Cannot map more than that */
         if (code_gen_buffer_size > (800 * 1024 * 1024))
             code_gen_buffer_size = (800 * 1024 * 1024);
-#elif defined(__sparc_v9__)
+#elif defined(__sparc__) && HOST_LONG_BITS == 64
         // Map the buffer below 2G, so we can use direct calls and branches
-        flags |= MAP_FIXED;
-        start = (void *) 0x60000000UL;
+        start = (void *) 0x40000000UL;
         if (code_gen_buffer_size > (512 * 1024 * 1024))
             code_gen_buffer_size = (512 * 1024 * 1024);
 #elif defined(__arm__)
@@ -600,10 +599,9 @@ static void code_gen_alloc(unsigned long tb_size)
         /* Cannot map more than that */
         if (code_gen_buffer_size > (800 * 1024 * 1024))
             code_gen_buffer_size = (800 * 1024 * 1024);
-#elif defined(__sparc_v9__)
+#elif defined(__sparc__) && HOST_LONG_BITS == 64
         // Map the buffer below 2G, so we can use direct calls and branches
-        flags |= MAP_FIXED;
-        addr = (void *) 0x60000000UL;
+        addr = (void *) 0x40000000UL;
         if (code_gen_buffer_size > (512 * 1024 * 1024)) {
             code_gen_buffer_size = (512 * 1024 * 1024);
         }
diff --git a/qemu-timer.h b/qemu-timer.h
index de17f3b..b730427 100644
--- a/qemu-timer.h
+++ b/qemu-timer.h
@@ -221,7 +221,7 @@ static inline int64_t cpu_get_real_ticks(void)
     return val;
 }
 
-#elif defined(__sparc_v8plus__) || defined(__sparc_v8plusa__) || defined(__sparc_v9__)
+#elif defined(__sparc__)
 
 static inline int64_t cpu_get_real_ticks (void)
 {
@@ -230,6 +230,8 @@ static inline int64_t cpu_get_real_ticks (void)
     asm volatile("rd %%tick,%0" : "=r"(rval));
     return rval;
 #else
+    /* We need an %o or %g register for this.  For recent enough gcc
+       there is an "h" constraint for that.  Don't bother with that.  */
     union {
         uint64_t i64;
         struct {
@@ -237,8 +239,8 @@ static inline int64_t cpu_get_real_ticks (void)
             uint32_t low;
         }       i32;
     } rval;
-    asm volatile("rd %%tick,%1; srlx %1,32,%0"
-                 : "=r"(rval.i32.high), "=r"(rval.i32.low));
+    asm volatile("rd %%tick,%%g1; srlx %%g1,32,%0; mov %%g1,%1"
+                 : "=r"(rval.i32.high), "=r"(rval.i32.low) : : "g1");
     return rval.i64;
 #endif
 }
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 358a70c..38be0c8 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -627,18 +627,10 @@ static void tcg_out_setcond_i32(TCGContext *s, TCGCond cond, TCGArg ret,
 
     default:
         tcg_out_cmp(s, c1, c2, c2const);
-#if defined(__sparc_v9__) || defined(__sparc_v8plus__)
         tcg_out_movi_imm13(s, ret, 0);
-        tcg_out32 (s, ARITH_MOVCC | INSN_RD(ret)
-                   | INSN_RS1(tcg_cond_to_bcond[cond])
-                   | MOVCC_ICC | INSN_IMM11(1));
-#else
-        t = gen_new_label();
-        tcg_out_branch_i32(s, INSN_COND(tcg_cond_to_bcond[cond], 1), t);
-        tcg_out_movi_imm13(s, ret, 1);
-        tcg_out_movi_imm13(s, ret, 0);
-        tcg_out_label(s, t, s->code_ptr);
-#endif
+        tcg_out32(s, ARITH_MOVCC | INSN_RD(ret)
+                  | INSN_RS1(tcg_cond_to_bcond[cond])
+                  | MOVCC_ICC | INSN_IMM11(1));
         return;
     }
 
@@ -768,7 +760,7 @@ static const void * const qemu_st_helpers[4] = {
 #endif
 #endif
 
-#ifdef __arch64__
+#if TCG_TARGET_REG_BITS == 64
 #define HOST_LD_OP LDX
 #define HOST_ST_OP STX
 #define HOST_SLL_OP SHIFT_SLLX
@@ -1630,11 +1622,9 @@ static void tcg_target_init(TCGContext *s)
 
 #if TCG_TARGET_REG_BITS == 64
 # define ELF_HOST_MACHINE  EM_SPARCV9
-#elif defined(__sparc_v8plus__)
+#else
 # define ELF_HOST_MACHINE  EM_SPARC32PLUS
 # define ELF_HOST_FLAGS    EF_SPARC_32PLUS
-#else
-# define ELF_HOST_MACHINE  EM_SPARC
 #endif
 
 typedef struct {
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index ee2274d..56742bf 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -67,7 +67,8 @@ typedef enum {
 
 /* used for function call generation */
 #define TCG_REG_CALL_STACK TCG_REG_I6
-#ifdef __arch64__
+
+#if TCG_TARGET_REG_BITS == 64
 // Reserve space for AREG0
 #define TCG_TARGET_STACK_MINFRAME (176 + 4 * (int)sizeof(long) + \
                                    TCG_STATIC_CALL_ARGS_SIZE)
@@ -81,7 +82,7 @@ typedef enum {
 #define TCG_TARGET_STACK_ALIGN 8
 #endif
 
-#ifdef __arch64__
+#if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_EXTEND_ARGS 1
 #endif
 
@@ -128,7 +129,7 @@ typedef enum {
 /* Note: must be synced with dyngen-exec.h */
 #ifdef CONFIG_SOLARIS
 #define TCG_AREG0 TCG_REG_G2
-#elif defined(__sparc_v9__)
+#elif HOST_LONG_BITS == 64
 #define TCG_AREG0 TCG_REG_G5
 #else
 #define TCG_AREG0 TCG_REG_G6
diff --git a/tcg/tcg.c b/tcg/tcg.c
index ab589c7..9f234f4 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1457,7 +1457,8 @@ static void temp_allocate_frame(TCGContext *s, int temp)
 {
     TCGTemp *ts;
     ts = &s->temps[temp];
-#ifndef __sparc_v9__ /* Sparc64 stack is accessed with offset of 2047 */
+#if !(defined(__sparc__) && TCG_TARGET_REG_BITS == 64)
+    /* Sparc64 stack is accessed with offset of 2047 */
     s->current_frame_offset = (s->current_frame_offset +
                                (tcg_target_long)sizeof(tcg_target_long) - 1) &
         ~(sizeof(tcg_target_long) - 1);
-- 
1.7.7.6


* [Qemu-devel] [PATCH 04/14] tcg-sparc: Fix qemu_ld/st to handle 32-bit host.
  2012-03-28  0:32 [Qemu-devel] [PATCH 00/14] tcg-sparc improvments, v2 Richard Henderson
                   ` (2 preceding siblings ...)
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 03/14] tcg-sparc: Assume v9 cpu always, i.e. force v8plus in 32-bit mode Richard Henderson
@ 2012-03-28  0:32 ` Richard Henderson
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 05/14] tcg-sparc: Simplify qemu_ld/st direct memory paths Richard Henderson
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2012-03-28  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel

At the same time, split out the TLB load logic to a new function.
Fixes the cases of two data registers and two address registers.
Fixes the signatures of the existing qemu_ld/st opcodes and adds the
missing ones.
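
The address arithmetic that the new TLB-load helper emits can be
sketched in plain C.  This is a rough model only: the constants below
(4 KiB pages, 32-byte TLB entries, 256 entries) are assumptions chosen
for illustration, and the DEMO_/demo_ names do not exist in QEMU.

```c
#include <stdint.h>

#define DEMO_PAGE_BITS       12   /* assumed TARGET_PAGE_BITS */
#define DEMO_TLB_ENTRY_BITS  5    /* assumed log2 of the entry size */
#define DEMO_TLB_SIZE        256  /* assumed CPU_TLB_SIZE */
#define DEMO_PAGE_MASK       (~(((uint32_t)1 << DEMO_PAGE_BITS) - 1))

/* Byte offset of the TLB entry for addr: shift the page number down so
   it is already scaled by the entry size, then wrap modulo the table
   size. */
static inline uint32_t demo_tlb_entry_offset(uint32_t addr)
{
    uint32_t scaled = addr >> (DEMO_PAGE_BITS - DEMO_TLB_ENTRY_BITS);
    return scaled & ((DEMO_TLB_SIZE - 1) << DEMO_TLB_ENTRY_BITS);
}

/* Value compared against the TLB tag: the page number plus the low
   alignment bits, so a misaligned access cannot match a page-aligned
   tag and falls through to the miss path. */
static inline uint32_t demo_tlb_comparator(uint32_t addr, int s_bits)
{
    return addr & (DEMO_PAGE_MASK | (((uint32_t)1 << s_bits) - 1));
}
```

With these assumed constants, address 0x12345678 selects entry 0x45
(offset 0x8a0), and a 4-byte access at that address compares
0x12345000 against the tag.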

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c |  751 ++++++++++++++++++++++++------------------------
 1 files changed, 378 insertions(+), 373 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 38be0c8..c74fc2c 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -448,14 +448,15 @@ static inline void tcg_out_addi(TCGContext *s, int reg, tcg_target_long val)
     }
 }
 
-static inline void tcg_out_andi(TCGContext *s, int reg, tcg_target_long val)
+static inline void tcg_out_andi(TCGContext *s, int rd, int rs,
+                                tcg_target_long val)
 {
     if (val != 0) {
         if (check_fit_tl(val, 13))
-            tcg_out_arithi(s, reg, reg, val, ARITH_AND);
+            tcg_out_arithi(s, rd, rs, val, ARITH_AND);
         else {
             tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_I5, val);
-            tcg_out_arith(s, reg, reg, TCG_REG_I5, ARITH_AND);
+            tcg_out_arith(s, rd, rs, TCG_REG_I5, ARITH_AND);
         }
     }
 }
@@ -744,422 +745,405 @@ static const void * const qemu_st_helpers[4] = {
     __stq_mmu,
 };
 #endif
-#endif
 
-#if TARGET_LONG_BITS == 32
-#define TARGET_LD_OP LDUW
-#else
-#define TARGET_LD_OP LDX
-#endif
+/* Perform the TLB load and compare.
 
-#if defined(CONFIG_SOFTMMU)
-#if HOST_LONG_BITS == 32
-#define TARGET_ADDEND_LD_OP LDUW
-#else
-#define TARGET_ADDEND_LD_OP LDX
-#endif
-#endif
+   Inputs:
+   ADDRLO_IDX contains the index into ARGS of the low part of the
+   address; the high part of the address is at ADDRLO_IDX+1.
 
-#if TCG_TARGET_REG_BITS == 64
-#define HOST_LD_OP LDX
-#define HOST_ST_OP STX
-#define HOST_SLL_OP SHIFT_SLLX
-#define HOST_SRA_OP SHIFT_SRAX
+   MEM_INDEX and S_BITS are the memory context and log2 size of the load.
+
+   WHICH is the offset into the CPUTLBEntry structure of the slot to read.
+   This should be offsetof addr_read or addr_write.
+
+   Outputs:
+   LABEL_PTR is filled with the position of the forward jumps to the
+   TLB miss case.  This will always be a ,PN insn, so a 19-bit offset.
+
+   Returns a register loaded with the low part of the address, adjusted
+   as indicated by the TLB and so is a host address.  Undefined in the
+   TLB miss case.  */
+
+static int tcg_out_tlb_load(TCGContext *s, int addrlo_idx, int mem_index,
+                            int s_bits, const TCGArg *args,
+                            uint32_t **label_ptr, int which)
+{
+    const int addrlo = args[addrlo_idx];
+    const int r0 = tcg_target_call_iarg_regs[0];
+    const int r1 = tcg_target_call_iarg_regs[1];
+    const int r2 = tcg_target_call_iarg_regs[2];
+    int addr = addrlo;
+    int tlb_ofs;
+
+    if (TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 64) {
+        /* Assemble the 64-bit address in R0.  */
+        tcg_out_arithi(s, r0, addrlo, 0, SHIFT_SRL);
+        tcg_out_arithi(s, r1, args[addrlo_idx + 1], 32, SHIFT_SLLX);
+        tcg_out_arith(s, r0, r0, r1, ARITH_OR);
+    }
+
+    /* Shift the page number down to tlb-entry.  */
+    tcg_out_arithi(s, r1, addrlo,
+                   TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS, SHIFT_SRL);
+
+    /* Mask out the page offset, except for the required alignment.  */
+    tcg_out_andi(s, r0, addr, TARGET_PAGE_MASK | ((1 << s_bits) - 1));
+
+    /* Compute tlb index, modulo tlb size.  */
+    tcg_out_andi(s, r1, r1, (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
+
+    /* Relative to the current ENV.  */
+    tcg_out_arith(s, r1, TCG_AREG0, r1, ARITH_ADD);
+
+    /* Find a base address that can load both tlb comparator and addend.  */
+    tlb_ofs = offsetof(CPUArchState, tlb_table[mem_index][0]);
+    if (!check_fit_tl(tlb_ofs + sizeof(CPUTLBEntry), 13)) {
+        tcg_out_addi(s, r1, tlb_ofs);
+        tlb_ofs = 0;
+    }
+
+    /* ld [arg1 + which], arg2 */
+    tcg_out_ld(s, TCG_TYPE_TL, r2, r1, tlb_ofs + which);
+
+    /* subcc arg0, arg2, %g0 */
+    tcg_out_cmp(s, r0, r2, 0);
+
+    /* bne,pn %[ix]cc, label0 */
+    *label_ptr = (uint32_t *)s->code_ptr;
+    tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_NE, 0) | INSN_OP2(0x1) |
+                  ((TARGET_LONG_BITS == 64) << 21)));
+
+    /* TLB Hit.  Compute the host address into r1.  The ld is in the
+       branch delay slot; harmless for the TLB miss case.  */
+    tcg_out_ld(s, TCG_TYPE_PTR, r1, r1, tlb_ofs+offsetof(CPUTLBEntry, addend));
+
+    if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
+        tcg_out_arithi(s, r0, addrlo, 0, SHIFT_SRL);
+        tcg_out_arith(s, r1, r0, r1, ARITH_ADD);
+    } else {
+        tcg_out_arith(s, r1, addrlo, r1, ARITH_ADD);
+    }
+
+    return r1;
+}
+#endif /* CONFIG_SOFTMMU */
+
+static void tcg_out_qemu_ld_direct(TCGContext *s, int addr, int datalo,
+                                   int datahi, int sizeop)
+{
+#ifdef TARGET_WORDS_BIGENDIAN
+    const int bigendian = 1;
 #else
-#define HOST_LD_OP LDUW
-#define HOST_ST_OP STW
-#define HOST_SLL_OP SHIFT_SLL
-#define HOST_SRA_OP SHIFT_SRA
+    const int bigendian = 0;
 #endif
+    switch (sizeop) {
+    case 0:
+        /* ldub [addr], datalo */
+        tcg_out_ldst(s, datalo, addr, 0, LDUB);
+        break;
+    case 0 | 4:
+        /* ldsb [addr], datalo */
+        tcg_out_ldst(s, datalo, addr, 0, LDSB);
+        break;
+    case 1:
+        if (bigendian) {
+            /* lduh [addr], datalo */
+            tcg_out_ldst(s, datalo, addr, 0, LDUH);
+        } else {
+            /* lduha [addr] ASI_PRIMARY_LITTLE, datalo */
+            tcg_out_ldst_asi(s, datalo, addr, 0, LDUHA, ASI_PRIMARY_LITTLE);
+        }
+        break;
+    case 1 | 4:
+        if (bigendian) {
+            /* ldsh [addr], datalo */
+            tcg_out_ldst(s, datalo, addr, 0, LDSH);
+        } else {
+            /* ldsha [addr] ASI_PRIMARY_LITTLE, datalo */
+            tcg_out_ldst_asi(s, datalo, addr, 0, LDSHA, ASI_PRIMARY_LITTLE);
+        }
+        break;
+    case 2:
+        if (bigendian) {
+            /* lduw [addr], datalo */
+            tcg_out_ldst(s, datalo, addr, 0, LDUW);
+        } else {
+            /* lduwa [addr] ASI_PRIMARY_LITTLE, datalo */
+            tcg_out_ldst_asi(s, datalo, addr, 0, LDUWA, ASI_PRIMARY_LITTLE);
+        }
+        break;
+    case 2 | 4:
+        if (bigendian) {
+            /* ldsw [addr], datalo */
+            tcg_out_ldst(s, datalo, addr, 0, LDSW);
+        } else {
+            /* ldswa [addr] ASI_PRIMARY_LITTLE, datalo */
+            tcg_out_ldst_asi(s, datalo, addr, 0, LDSWA, ASI_PRIMARY_LITTLE);
+        }
+        break;
+    case 3:
+        if (TCG_TARGET_REG_BITS == 64) {
+            if (bigendian) {
+                /* ldx [addr], datalo */
+                tcg_out_ldst(s, datalo, addr, 0, LDX);
+            } else {
+                /* ldxa [addr] ASI_PRIMARY_LITTLE, datalo */
+                tcg_out_ldst_asi(s, datalo, addr, 0, LDXA, ASI_PRIMARY_LITTLE);
+            }
+        } else {
+            if (bigendian) {
+                tcg_out_ldst(s, datahi, addr, 0, LDUW);
+                tcg_out_ldst(s, datalo, addr, 4, LDUW);
+            } else {
+                tcg_out_ldst_asi(s, datalo, addr, 0, LDUWA, ASI_PRIMARY_LITTLE);
+                tcg_out_ldst_asi(s, datahi, addr, 4, LDUWA, ASI_PRIMARY_LITTLE);
+            }
+        }
+        break;
+    default:
+        tcg_abort();
+    }
+}
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
-                            int opc)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 {
-    int addr_reg, data_reg, arg0, arg1, arg2, mem_index, s_bits;
+    int addrlo_idx = 1, datalo, datahi, addr_reg;
 #if defined(CONFIG_SOFTMMU)
-    uint32_t *label1_ptr, *label2_ptr;
+    int memi_idx, memi, s_bits, n;
+    uint32_t *label_ptr[2];
 #endif
 
-    data_reg = *args++;
-    addr_reg = *args++;
-    mem_index = *args;
-    s_bits = opc & 3;
-
-    arg0 = TCG_REG_O0;
-    arg1 = TCG_REG_O1;
-    arg2 = TCG_REG_O2;
+    datahi = datalo = args[0];
+    if (TCG_TARGET_REG_BITS == 32 && opc == 3) {
+        datahi = args[1];
+        addrlo_idx = 2;
+    }
 
 #if defined(CONFIG_SOFTMMU)
-    /* srl addr_reg, x, arg1 */
-    tcg_out_arithi(s, arg1, addr_reg, TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS,
-                   SHIFT_SRL);
-    /* and addr_reg, x, arg0 */
-    tcg_out_arithi(s, arg0, addr_reg, TARGET_PAGE_MASK | ((1 << s_bits) - 1),
-                   ARITH_AND);
-
-    /* and arg1, x, arg1 */
-    tcg_out_andi(s, arg1, (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
-
-    /* add arg1, x, arg1 */
-    tcg_out_addi(s, arg1, offsetof(CPUArchState,
-                                   tlb_table[mem_index][0].addr_read));
-
-    /* add env, arg1, arg1 */
-    tcg_out_arith(s, arg1, TCG_AREG0, arg1, ARITH_ADD);
+    memi_idx = addrlo_idx + 1 + (TARGET_LONG_BITS > TCG_TARGET_REG_BITS);
+    memi = args[memi_idx];
+    s_bits = opc & 3;
 
-    /* ld [arg1], arg2 */
-    tcg_out32(s, TARGET_LD_OP | INSN_RD(arg2) | INSN_RS1(arg1) |
-              INSN_RS2(TCG_REG_G0));
+    addr_reg = tcg_out_tlb_load(s, addrlo_idx, memi, s_bits, args,
+                                label_ptr, offsetof(CPUTLBEntry, addr_read));
 
-    /* subcc arg0, arg2, %g0 */
-    tcg_out_arith(s, TCG_REG_G0, arg0, arg2, ARITH_SUBCC);
+    /* TLB Hit.  */
+    tcg_out_qemu_ld_direct(s, addr_reg, datalo, datahi, opc);
 
-    /* will become:
-       be label1
-        or
-       be,pt %xcc label1 */
-    label1_ptr = (uint32_t *)s->code_ptr;
-    tcg_out32(s, 0);
+    /* b,pt,n label1 */
+    label_ptr[1] = (uint32_t *)s->code_ptr;
+    tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x1)
+                  | (1 << 29) | (1 << 19)));
 
-    /* mov (delay slot) */
-    tcg_out_mov(s, TCG_TYPE_PTR, arg0, addr_reg);
+    /* TLB Miss.  */
 
-    /* mov */
-    tcg_out_movi(s, TCG_TYPE_I32, arg1, mem_index);
+    *label_ptr[0] |= INSN_OFF19((unsigned long)s->code_ptr -
+                                (unsigned long)label_ptr[0]);
+    n = 0;
 #ifdef CONFIG_TCG_PASS_AREG0
-    /* XXX/FIXME: suboptimal */
-    tcg_out_mov(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3],
-                tcg_target_call_iarg_regs[2]);
-    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[2],
-                tcg_target_call_iarg_regs[1]);
-    tcg_out_mov(s, TCG_TYPE_TL, tcg_target_call_iarg_regs[1],
-                tcg_target_call_iarg_regs[0]);
-    tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0],
-                TCG_AREG0);
+    tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[n++], TCG_AREG0);
 #endif
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
+                    args[addrlo_idx + 1]);
+    }
+    tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
+                args[addrlo_idx]);
+
+    /* Store AREG0 in stack to avoid ugly glibc bugs that mangle
+       global registers */
+    tcg_out_st(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
+               TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
+               sizeof(long));
 
-    /* XXX: move that code at the end of the TB */
     /* qemu_ld_helper[s_bits](arg0, arg1) */
     tcg_out32(s, CALL | ((((tcg_target_ulong)qemu_ld_helpers[s_bits]
                            - (tcg_target_ulong)s->code_ptr) >> 2)
                          & 0x3fffffff));
-    /* Store AREG0 in stack to avoid ugly glibc bugs that mangle
-       global registers */
-    // delay slot
-    tcg_out_ldst(s, TCG_AREG0, TCG_REG_CALL_STACK,
-                 TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-                 sizeof(long), HOST_ST_OP);
-    tcg_out_ldst(s, TCG_AREG0, TCG_REG_CALL_STACK,
-                 TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-                 sizeof(long), HOST_LD_OP);
-
-    /* data_reg = sign_extend(arg0) */
+    /* delay slot */
+    tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[n], memi);
+
+    /* Reload AREG0.  */
+    tcg_out_ld(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
+               TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
+               sizeof(long));
+
+    n = tcg_target_call_oarg_regs[0];
+    /* datalo = sign_extend(arg0) */
     switch(opc) {
     case 0 | 4:
-        /* sll arg0, 24/56, data_reg */
-        tcg_out_arithi(s, data_reg, arg0, (int)sizeof(tcg_target_long) * 8 - 8,
-                       HOST_SLL_OP);
-        /* sra data_reg, 24/56, data_reg */
-        tcg_out_arithi(s, data_reg, data_reg,
-                       (int)sizeof(tcg_target_long) * 8 - 8, HOST_SRA_OP);
+        /* Recall that SRA sign extends from bit 31 through bit 63.  */
+        tcg_out_arithi(s, datalo, n, 24, SHIFT_SLL);
+        tcg_out_arithi(s, datalo, datalo, 24, SHIFT_SRA);
         break;
     case 1 | 4:
-        /* sll arg0, 16/48, data_reg */
-        tcg_out_arithi(s, data_reg, arg0,
-                       (int)sizeof(tcg_target_long) * 8 - 16, HOST_SLL_OP);
-        /* sra data_reg, 16/48, data_reg */
-        tcg_out_arithi(s, data_reg, data_reg,
-                       (int)sizeof(tcg_target_long) * 8 - 16, HOST_SRA_OP);
+        tcg_out_arithi(s, datalo, n, 16, SHIFT_SLL);
+        tcg_out_arithi(s, datalo, datalo, 16, SHIFT_SRA);
         break;
     case 2 | 4:
-        /* sll arg0, 32, data_reg */
-        tcg_out_arithi(s, data_reg, arg0, 32, HOST_SLL_OP);
-        /* sra data_reg, 32, data_reg */
-        tcg_out_arithi(s, data_reg, data_reg, 32, HOST_SRA_OP);
+        tcg_out_arithi(s, datalo, n, 0, SHIFT_SRA);
         break;
+    case 3:
+        if (TCG_TARGET_REG_BITS == 32) {
+            tcg_out_mov(s, TCG_TYPE_REG, datahi, n);
+            tcg_out_mov(s, TCG_TYPE_REG, datalo, n + 1);
+            break;
+        }
+        /* FALLTHRU */
     case 0:
     case 1:
     case 2:
-    case 3:
     default:
         /* mov */
-        tcg_out_mov(s, TCG_TYPE_REG, data_reg, arg0);
+        tcg_out_mov(s, TCG_TYPE_REG, datalo, n);
         break;
     }
 
-    /* will become:
-       ba label2 */
-    label2_ptr = (uint32_t *)s->code_ptr;
-    tcg_out32(s, 0);
-
-    /* nop (delay slot */
-    tcg_out_nop(s);
-
-    /* label1: */
-#if TARGET_LONG_BITS == 32
-    /* be label1 */
-    *label1_ptr = (INSN_OP(0) | INSN_COND(COND_E, 0) | INSN_OP2(0x2) |
-                   INSN_OFF22((unsigned long)s->code_ptr -
-                              (unsigned long)label1_ptr));
+    *label_ptr[1] |= INSN_OFF19((unsigned long)s->code_ptr -
+                                (unsigned long)label_ptr[1]);
 #else
-    /* be,pt %xcc label1 */
-    *label1_ptr = (INSN_OP(0) | INSN_COND(COND_E, 0) | INSN_OP2(0x1) |
-                   (0x5 << 19) | INSN_OFF19((unsigned long)s->code_ptr -
-                              (unsigned long)label1_ptr));
-#endif
-
-    /* ld [arg1 + x], arg1 */
-    tcg_out_ldst(s, arg1, arg1, offsetof(CPUTLBEntry, addend) -
-                 offsetof(CPUTLBEntry, addr_read), TARGET_ADDEND_LD_OP);
-
-#if TARGET_LONG_BITS == 32
-    /* and addr_reg, x, arg0 */
-    tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_I5, 0xffffffff);
-    tcg_out_arith(s, arg0, addr_reg, TCG_REG_I5, ARITH_AND);
-    /* add arg0, arg1, arg0 */
-    tcg_out_arith(s, arg0, arg0, arg1, ARITH_ADD);
-#else
-    /* add addr_reg, arg1, arg0 */
-    tcg_out_arith(s, arg0, addr_reg, arg1, ARITH_ADD);
-#endif
+    addr_reg = args[addrlo_idx];
+    if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
+        tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
+        addr_reg = TCG_REG_I5;
+    }
+    tcg_out_qemu_ld_direct(s, addr_reg, datalo, datahi, opc);
+#endif /* CONFIG_SOFTMMU */
+}
 
+static void tcg_out_qemu_st_direct(TCGContext *s, int addr, int datalo,
+                                   int datahi, int sizeop)
+{
+#ifdef TARGET_WORDS_BIGENDIAN
+    const int bigendian = 1;
 #else
-    arg0 = addr_reg;
+    const int bigendian = 0;
 #endif
-
-    switch(opc) {
+    switch (sizeop) {
     case 0:
-        /* ldub [arg0], data_reg */
-        tcg_out_ldst(s, data_reg, arg0, 0, LDUB);
-        break;
-    case 0 | 4:
-        /* ldsb [arg0], data_reg */
-        tcg_out_ldst(s, data_reg, arg0, 0, LDSB);
+        /* stb datalo, [addr] */
+        tcg_out_ldst(s, datalo, addr, 0, STB);
         break;
     case 1:
-#ifdef TARGET_WORDS_BIGENDIAN
-        /* lduh [arg0], data_reg */
-        tcg_out_ldst(s, data_reg, arg0, 0, LDUH);
-#else
-        /* lduha [arg0] ASI_PRIMARY_LITTLE, data_reg */
-        tcg_out_ldst_asi(s, data_reg, arg0, 0, LDUHA, ASI_PRIMARY_LITTLE);
-#endif
-        break;
-    case 1 | 4:
-#ifdef TARGET_WORDS_BIGENDIAN
-        /* ldsh [arg0], data_reg */
-        tcg_out_ldst(s, data_reg, arg0, 0, LDSH);
-#else
-        /* ldsha [arg0] ASI_PRIMARY_LITTLE, data_reg */
-        tcg_out_ldst_asi(s, data_reg, arg0, 0, LDSHA, ASI_PRIMARY_LITTLE);
-#endif
+        if (bigendian) {
+            /* sth datalo, [addr] */
+            tcg_out_ldst(s, datalo, addr, 0, STH);
+        } else {
+            /* stha datalo, [addr] ASI_PRIMARY_LITTLE */
+            tcg_out_ldst_asi(s, datalo, addr, 0, STHA, ASI_PRIMARY_LITTLE);
+        }
         break;
     case 2:
-#ifdef TARGET_WORDS_BIGENDIAN
-        /* lduw [arg0], data_reg */
-        tcg_out_ldst(s, data_reg, arg0, 0, LDUW);
-#else
-        /* lduwa [arg0] ASI_PRIMARY_LITTLE, data_reg */
-        tcg_out_ldst_asi(s, data_reg, arg0, 0, LDUWA, ASI_PRIMARY_LITTLE);
-#endif
-        break;
-    case 2 | 4:
-#ifdef TARGET_WORDS_BIGENDIAN
-        /* ldsw [arg0], data_reg */
-        tcg_out_ldst(s, data_reg, arg0, 0, LDSW);
-#else
-        /* ldswa [arg0] ASI_PRIMARY_LITTLE, data_reg */
-        tcg_out_ldst_asi(s, data_reg, arg0, 0, LDSWA, ASI_PRIMARY_LITTLE);
-#endif
+        if (bigendian) {
+            /* stw datalo, [addr] */
+            tcg_out_ldst(s, datalo, addr, 0, STW);
+        } else {
+            /* stwa datalo, [addr] ASI_PRIMARY_LITTLE */
+            tcg_out_ldst_asi(s, datalo, addr, 0, STWA, ASI_PRIMARY_LITTLE);
+        }
         break;
     case 3:
-#ifdef TARGET_WORDS_BIGENDIAN
-        /* ldx [arg0], data_reg */
-        tcg_out_ldst(s, data_reg, arg0, 0, LDX);
-#else
-        /* ldxa [arg0] ASI_PRIMARY_LITTLE, data_reg */
-        tcg_out_ldst_asi(s, data_reg, arg0, 0, LDXA, ASI_PRIMARY_LITTLE);
-#endif
+        if (TCG_TARGET_REG_BITS == 64) {
+            if (bigendian) {
+                /* stx datalo, [addr] */
+                tcg_out_ldst(s, datalo, addr, 0, STX);
+            } else {
+                /* stxa datalo, [addr] ASI_PRIMARY_LITTLE */
+                tcg_out_ldst_asi(s, datalo, addr, 0, STXA, ASI_PRIMARY_LITTLE);
+            }
+        } else {
+            if (bigendian) {
+                tcg_out_ldst(s, datahi, addr, 0, STW);
+                tcg_out_ldst(s, datalo, addr, 4, STW);
+            } else {
+                tcg_out_ldst_asi(s, datalo, addr, 0, STWA, ASI_PRIMARY_LITTLE);
+                tcg_out_ldst_asi(s, datahi, addr, 4, STWA, ASI_PRIMARY_LITTLE);
+            }
+        }
         break;
     default:
         tcg_abort();
     }
-
-#if defined(CONFIG_SOFTMMU)
-    /* label2: */
-    *label2_ptr = (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x2) |
-                   INSN_OFF22((unsigned long)s->code_ptr -
-                              (unsigned long)label2_ptr));
-#endif
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
-                            int opc)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
 {
-    int addr_reg, data_reg, arg0, arg1, arg2, mem_index, s_bits;
+    int addrlo_idx = 1, datalo, datahi, addr_reg;
 #if defined(CONFIG_SOFTMMU)
-    uint32_t *label1_ptr, *label2_ptr;
+    int memi_idx, memi, n;
+    uint32_t *label_ptr[2];
 #endif
 
-    data_reg = *args++;
-    addr_reg = *args++;
-    mem_index = *args;
-
-    s_bits = opc;
-
-    arg0 = TCG_REG_O0;
-    arg1 = TCG_REG_O1;
-    arg2 = TCG_REG_O2;
+    datahi = datalo = args[0];
+    if (TCG_TARGET_REG_BITS == 32 && opc == 3) {
+        datahi = args[1];
+        addrlo_idx = 2;
+    }
 
 #if defined(CONFIG_SOFTMMU)
-    /* srl addr_reg, x, arg1 */
-    tcg_out_arithi(s, arg1, addr_reg, TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS,
-                   SHIFT_SRL);
+    memi_idx = addrlo_idx + 1 + (TARGET_LONG_BITS > TCG_TARGET_REG_BITS);
+    memi = args[memi_idx];
 
-    /* and addr_reg, x, arg0 */
-    tcg_out_arithi(s, arg0, addr_reg, TARGET_PAGE_MASK | ((1 << s_bits) - 1),
-                   ARITH_AND);
+    addr_reg = tcg_out_tlb_load(s, addrlo_idx, memi, opc, args,
+                                label_ptr, offsetof(CPUTLBEntry, addr_write));
 
-    /* and arg1, x, arg1 */
-    tcg_out_andi(s, arg1, (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
+    /* TLB Hit.  */
+    tcg_out_qemu_st_direct(s, addr_reg, datalo, datahi, opc);
 
-    /* add arg1, x, arg1 */
-    tcg_out_addi(s, arg1, offsetof(CPUArchState,
-                                   tlb_table[mem_index][0].addr_write));
+    /* b,pt,n label1 */
+    label_ptr[1] = (uint32_t *)s->code_ptr;
+    tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x1)
+                  | (1 << 29) | (1 << 19)));
 
-    /* add env, arg1, arg1 */
-    tcg_out_arith(s, arg1, TCG_AREG0, arg1, ARITH_ADD);
+    /* TLB Miss.  */
 
-    /* ld [arg1], arg2 */
-    tcg_out32(s, TARGET_LD_OP | INSN_RD(arg2) | INSN_RS1(arg1) |
-              INSN_RS2(TCG_REG_G0));
-
-    /* subcc arg0, arg2, %g0 */
-    tcg_out_arith(s, TCG_REG_G0, arg0, arg2, ARITH_SUBCC);
-
-    /* will become:
-       be label1
-        or
-       be,pt %xcc label1 */
-    label1_ptr = (uint32_t *)s->code_ptr;
-    tcg_out32(s, 0);
-
-    /* mov (delay slot) */
-    tcg_out_mov(s, TCG_TYPE_PTR, arg0, addr_reg);
-
-    /* mov */
-    tcg_out_mov(s, TCG_TYPE_REG, arg1, data_reg);
-
-    /* mov */
-    tcg_out_movi(s, TCG_TYPE_I32, arg2, mem_index);
+    *label_ptr[0] |= INSN_OFF19((unsigned long)s->code_ptr -
+                                (unsigned long)label_ptr[0]);
 
+    n = 0;
 #ifdef CONFIG_TCG_PASS_AREG0
-    /* XXX/FIXME: suboptimal */
-    tcg_out_mov(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3],
-                tcg_target_call_iarg_regs[2]);
-    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[2],
-                tcg_target_call_iarg_regs[1]);
-    tcg_out_mov(s, TCG_TYPE_TL, tcg_target_call_iarg_regs[1],
-                tcg_target_call_iarg_regs[0]);
-    tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0],
-                TCG_AREG0);
+    tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[n++], TCG_AREG0);
 #endif
-    /* XXX: move that code at the end of the TB */
-    /* qemu_st_helper[s_bits](arg0, arg1, arg2) */
-    tcg_out32(s, CALL | ((((tcg_target_ulong)qemu_st_helpers[s_bits]
-                           - (tcg_target_ulong)s->code_ptr) >> 2)
-                         & 0x3fffffff));
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
+                    args[addrlo_idx + 1]);
+    }
+    tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
+                args[addrlo_idx]);
+    if (TCG_TARGET_REG_BITS == 32 && opc == 3) {
+        tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++], datahi);
+    }
+    tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++], datalo);
+
     /* Store AREG0 in stack to avoid ugly glibc bugs that mangle
        global registers */
-    // delay slot
-    tcg_out_ldst(s, TCG_AREG0, TCG_REG_CALL_STACK,
-                 TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-                 sizeof(long), HOST_ST_OP);
-    tcg_out_ldst(s, TCG_AREG0, TCG_REG_CALL_STACK,
-                 TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-                 sizeof(long), HOST_LD_OP);
-
-    /* will become:
-       ba label2 */
-    label2_ptr = (uint32_t *)s->code_ptr;
-    tcg_out32(s, 0);
-
-    /* nop (delay slot) */
-    tcg_out_nop(s);
-
-#if TARGET_LONG_BITS == 32
-    /* be label1 */
-    *label1_ptr = (INSN_OP(0) | INSN_COND(COND_E, 0) | INSN_OP2(0x2) |
-                   INSN_OFF22((unsigned long)s->code_ptr -
-                              (unsigned long)label1_ptr));
-#else
-    /* be,pt %xcc label1 */
-    *label1_ptr = (INSN_OP(0) | INSN_COND(COND_E, 0) | INSN_OP2(0x1) |
-                   (0x5 << 19) | INSN_OFF19((unsigned long)s->code_ptr -
-                              (unsigned long)label1_ptr));
-#endif
+    tcg_out_st(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
+               TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
+               sizeof(long));
 
-    /* ld [arg1 + x], arg1 */
-    tcg_out_ldst(s, arg1, arg1, offsetof(CPUTLBEntry, addend) -
-                 offsetof(CPUTLBEntry, addr_write), TARGET_ADDEND_LD_OP);
+    /* qemu_st_helper[s_bits](arg0, arg1, arg2) */
+    tcg_out32(s, CALL | ((((tcg_target_ulong)qemu_st_helpers[opc]
+                           - (tcg_target_ulong)s->code_ptr) >> 2)
+                         & 0x3fffffff));
+    /* delay slot */
+    tcg_out_movi(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n], memi);
 
-#if TARGET_LONG_BITS == 32
-    /* and addr_reg, x, arg0 */
-    tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_I5, 0xffffffff);
-    tcg_out_arith(s, arg0, addr_reg, TCG_REG_I5, ARITH_AND);
-    /* add arg0, arg1, arg0 */
-    tcg_out_arith(s, arg0, arg0, arg1, ARITH_ADD);
-#else
-    /* add addr_reg, arg1, arg0 */
-    tcg_out_arith(s, arg0, addr_reg, arg1, ARITH_ADD);
-#endif
+    /* Reload AREG0.  */
+    tcg_out_ld(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
+               TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
+               sizeof(long));
 
+    *label_ptr[1] |= INSN_OFF19((unsigned long)s->code_ptr -
+                                (unsigned long)label_ptr[1]);
 #else
-    arg0 = addr_reg;
-#endif
-
-    switch(opc) {
-    case 0:
-        /* stb data_reg, [arg0] */
-        tcg_out_ldst(s, data_reg, arg0, 0, STB);
-        break;
-    case 1:
-#ifdef TARGET_WORDS_BIGENDIAN
-        /* sth data_reg, [arg0] */
-        tcg_out_ldst(s, data_reg, arg0, 0, STH);
-#else
-        /* stha data_reg, [arg0] ASI_PRIMARY_LITTLE */
-        tcg_out_ldst_asi(s, data_reg, arg0, 0, STHA, ASI_PRIMARY_LITTLE);
-#endif
-        break;
-    case 2:
-#ifdef TARGET_WORDS_BIGENDIAN
-        /* stw data_reg, [arg0] */
-        tcg_out_ldst(s, data_reg, arg0, 0, STW);
-#else
-        /* stwa data_reg, [arg0] ASI_PRIMARY_LITTLE */
-        tcg_out_ldst_asi(s, data_reg, arg0, 0, STWA, ASI_PRIMARY_LITTLE);
-#endif
-        break;
-    case 3:
-#ifdef TARGET_WORDS_BIGENDIAN
-        /* stx data_reg, [arg0] */
-        tcg_out_ldst(s, data_reg, arg0, 0, STX);
-#else
-        /* stxa data_reg, [arg0] ASI_PRIMARY_LITTLE */
-        tcg_out_ldst_asi(s, data_reg, arg0, 0, STXA, ASI_PRIMARY_LITTLE);
-#endif
-        break;
-    default:
-        tcg_abort();
+    addr_reg = args[addrlo_idx];
+    if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
+        tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
+        addr_reg = TCG_REG_I5;
     }
-
-#if defined(CONFIG_SOFTMMU)
-    /* label2: */
-    *label2_ptr = (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x2) |
-                   INSN_OFF22((unsigned long)s->code_ptr -
-                              (unsigned long)label2_ptr));
-#endif
+    tcg_out_qemu_st_direct(s, addr_reg, datalo, datahi, opc);
+#endif /* CONFIG_SOFTMMU */
 }
 
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
@@ -1205,12 +1189,12 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         /* Store AREG0 in stack to avoid ugly glibc bugs that mangle
            global registers */
         // delay slot
-        tcg_out_ldst(s, TCG_AREG0, TCG_REG_CALL_STACK,
-                     TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-                     sizeof(long), HOST_ST_OP);
-        tcg_out_ldst(s, TCG_AREG0, TCG_REG_CALL_STACK,
-                     TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-                     sizeof(long), HOST_LD_OP);
+        tcg_out_st(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
+                   TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
+                   sizeof(long));
+        tcg_out_ld(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
+                   TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
+                   sizeof(long));
         break;
     case INDEX_op_jmp:
     case INDEX_op_br:
@@ -1378,6 +1362,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_qemu_ld(s, args, 2 | 4);
         break;
 #endif
+    case INDEX_op_qemu_ld64:
+        tcg_out_qemu_ld(s, args, 3);
+        break;
     case INDEX_op_qemu_st8:
         tcg_out_qemu_st(s, args, 0);
         break;
@@ -1387,6 +1374,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_qemu_st32:
         tcg_out_qemu_st(s, args, 2);
         break;
+    case INDEX_op_qemu_st64:
+        tcg_out_qemu_st(s, args, 3);
+        break;
 
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_movi_i64:
@@ -1451,13 +1441,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
                             args[2], const_args[2]);
         break;
 
-    case INDEX_op_qemu_ld64:
-        tcg_out_qemu_ld(s, args, 3);
-        break;
-    case INDEX_op_qemu_st64:
-        tcg_out_qemu_st(s, args, 3);
-        break;
-
 #endif
     gen_arith:
         tcg_out_arithc(s, args[0], args[1], args[2], const_args[2], c);
@@ -1522,20 +1505,6 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_mulu2_i32, { "r", "r", "r", "rJ" } },
 #endif
 
-    { INDEX_op_qemu_ld8u, { "r", "L" } },
-    { INDEX_op_qemu_ld8s, { "r", "L" } },
-    { INDEX_op_qemu_ld16u, { "r", "L" } },
-    { INDEX_op_qemu_ld16s, { "r", "L" } },
-    { INDEX_op_qemu_ld32, { "r", "L" } },
-#if TCG_TARGET_REG_BITS == 64
-    { INDEX_op_qemu_ld32u, { "r", "L" } },
-    { INDEX_op_qemu_ld32s, { "r", "L" } },
-#endif
-
-    { INDEX_op_qemu_st8, { "L", "L" } },
-    { INDEX_op_qemu_st16, { "L", "L" } },
-    { INDEX_op_qemu_st32, { "L", "L" } },
-
 #if TCG_TARGET_REG_BITS == 64
     { INDEX_op_mov_i64, { "r", "r" } },
     { INDEX_op_movi_i64, { "r" } },
@@ -1550,8 +1519,6 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_st16_i64, { "r", "r" } },
     { INDEX_op_st32_i64, { "r", "r" } },
     { INDEX_op_st_i64, { "r", "r" } },
-    { INDEX_op_qemu_ld64, { "L", "L" } },
-    { INDEX_op_qemu_st64, { "L", "L" } },
 
     { INDEX_op_add_i64, { "r", "r", "rJ" } },
     { INDEX_op_mul_i64, { "r", "r", "rJ" } },
@@ -1578,10 +1545,48 @@ static const TCGTargetOpDef sparc_op_defs[] = {
 
     { INDEX_op_brcond_i64, { "r", "rJ" } },
     { INDEX_op_setcond_i64, { "r", "r", "rJ" } },
-#else
-    { INDEX_op_qemu_ld64, { "L", "L", "L" } },
+#endif
+
+#if TCG_TARGET_REG_BITS == 64
+    { INDEX_op_qemu_ld8u, { "r", "L" } },
+    { INDEX_op_qemu_ld8s, { "r", "L" } },
+    { INDEX_op_qemu_ld16u, { "r", "L" } },
+    { INDEX_op_qemu_ld16s, { "r", "L" } },
+    { INDEX_op_qemu_ld32, { "r", "L" } },
+    { INDEX_op_qemu_ld32u, { "r", "L" } },
+    { INDEX_op_qemu_ld32s, { "r", "L" } },
+    { INDEX_op_qemu_ld64, { "r", "L" } },
+
+    { INDEX_op_qemu_st8, { "L", "L" } },
+    { INDEX_op_qemu_st16, { "L", "L" } },
+    { INDEX_op_qemu_st32, { "L", "L" } },
+    { INDEX_op_qemu_st64, { "L", "L" } },
+#elif TARGET_LONG_BITS <= TCG_TARGET_REG_BITS
+    { INDEX_op_qemu_ld8u, { "r", "L" } },
+    { INDEX_op_qemu_ld8s, { "r", "L" } },
+    { INDEX_op_qemu_ld16u, { "r", "L" } },
+    { INDEX_op_qemu_ld16s, { "r", "L" } },
+    { INDEX_op_qemu_ld32, { "r", "L" } },
+    { INDEX_op_qemu_ld64, { "r", "r", "L" } },
+
+    { INDEX_op_qemu_st8, { "L", "L" } },
+    { INDEX_op_qemu_st16, { "L", "L" } },
+    { INDEX_op_qemu_st32, { "L", "L" } },
     { INDEX_op_qemu_st64, { "L", "L", "L" } },
+#else
+    { INDEX_op_qemu_ld8u, { "r", "L", "L" } },
+    { INDEX_op_qemu_ld8s, { "r", "L", "L" } },
+    { INDEX_op_qemu_ld16u, { "r", "L", "L" } },
+    { INDEX_op_qemu_ld16s, { "r", "L", "L" } },
+    { INDEX_op_qemu_ld32, { "r", "L", "L" } },
+    { INDEX_op_qemu_ld64, { "L", "L", "L", "L" } },
+
+    { INDEX_op_qemu_st8, { "L", "L", "L" } },
+    { INDEX_op_qemu_st16, { "L", "L", "L" } },
+    { INDEX_op_qemu_st32, { "L", "L", "L" } },
+    { INDEX_op_qemu_st64, { "L", "L", "L", "L" } },
 #endif
+
     { -1 },
 };
 
-- 
1.7.7.6


* [Qemu-devel] [PATCH 05/14] tcg-sparc: Simplify qemu_ld/st direct memory paths.
  2012-03-28  0:32 [Qemu-devel] [PATCH 00/14] tcg-sparc improvments, v2 Richard Henderson
                   ` (3 preceding siblings ...)
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 04/14] tcg-sparc: Fix qemu_ld/st to handle 32-bit host Richard Henderson
@ 2012-03-28  0:32 ` Richard Henderson
  2012-03-29 18:47   ` Blue Swirl
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 06/14] tcg-sparc: Support GUEST_BASE Richard Henderson
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2012-03-28  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel

Given that we have an opcode for every size and both endiannesses,
turn the qemu_ld/st direct-memory functions into a simple table lookup.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c |  384 +++++++++++++++++++-----------------------------
 1 files changed, 150 insertions(+), 234 deletions(-)
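
For readers skimming the diff, the table-lookup idea from the commit
message can be sketched in isolation roughly as below. This is only an
illustration, not code from the patch: the demo_* names and the enum
values are hypothetical placeholders standing in for the real SPARC
opcode macros (LDUB, LDUH_LE, and so on), and the endianness is passed
as a runtime argument here, whereas the patch selects it at compile
time via TARGET_WORDS_BIGENDIAN. The index follows the sizeop
convention used in the diff: the low two bits are log2 of the access
size and bit 2 requests sign extension.

```c
#include <assert.h>

/* Hypothetical symbolic opcodes; the real code uses instruction encodings. */
enum demo_op {
    DEMO_LDUB, DEMO_LDSB,
    DEMO_LDUH, DEMO_LDSH, DEMO_LDUW, DEMO_LDSW, DEMO_LDX,
    DEMO_LDUH_LE, DEMO_LDSH_LE, DEMO_LDUW_LE, DEMO_LDSW_LE, DEMO_LDX_LE
};

/* Row 0: little-endian guest, row 1: big-endian guest.
   Column: sizeop = log2(size) | (sign-extend ? 4 : 0).
   Byte loads have no endianness, and a 64-bit load has nothing left to
   sign-extend, so those entries repeat. */
static const enum demo_op demo_ld_opc[2][8] = {
    { DEMO_LDUB, DEMO_LDUH_LE, DEMO_LDUW_LE, DEMO_LDX_LE,
      DEMO_LDSB, DEMO_LDSH_LE, DEMO_LDSW_LE, DEMO_LDX_LE },
    { DEMO_LDUB, DEMO_LDUH, DEMO_LDUW, DEMO_LDX,
      DEMO_LDSB, DEMO_LDSH, DEMO_LDSW, DEMO_LDX },
};

/* Replace a switch over sizeop plus an endianness test with one lookup. */
static enum demo_op demo_pick_ld(int bigendian, int sizeop)
{
    assert(sizeop >= 0 && sizeop < 8);
    return demo_ld_opc[bigendian != 0][sizeop];
}
```

With this shape, a call such as demo_pick_ld(1, 1 | 4) selects the
signed big-endian halfword load, and the caller emits a single
register+register memory instruction with the returned opcode.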

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index c74fc2c..5cea5a8 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -294,6 +294,16 @@ static inline int tcg_target_const_match(tcg_target_long val,
 #define ASI_PRIMARY_LITTLE 0x88
 #endif
 
+#define LDUH_LE    (LDUHA | INSN_ASI(ASI_PRIMARY_LITTLE))
+#define LDSH_LE    (LDSHA | INSN_ASI(ASI_PRIMARY_LITTLE))
+#define LDUW_LE    (LDUWA | INSN_ASI(ASI_PRIMARY_LITTLE))
+#define LDSW_LE    (LDSWA | INSN_ASI(ASI_PRIMARY_LITTLE))
+#define LDX_LE     (LDXA  | INSN_ASI(ASI_PRIMARY_LITTLE))
+
+#define STH_LE     (STHA  | INSN_ASI(ASI_PRIMARY_LITTLE))
+#define STW_LE     (STWA  | INSN_ASI(ASI_PRIMARY_LITTLE))
+#define STX_LE     (STXA  | INSN_ASI(ASI_PRIMARY_LITTLE))
+
 static inline void tcg_out_arith(TCGContext *s, int rd, int rs1, int rs2,
                                  int op)
 {
@@ -366,66 +376,46 @@ static inline void tcg_out_movi(TCGContext *s, TCGType type,
     }
 }
 
-static inline void tcg_out_ld_raw(TCGContext *s, int ret,
-                                  tcg_target_long arg)
+static inline void tcg_out_ldst_rr(TCGContext *s, int data, int a1,
+                                   int a2, int op)
 {
-    tcg_out_sethi(s, ret, arg);
-    tcg_out32(s, LDUW | INSN_RD(ret) | INSN_RS1(ret) |
-              INSN_IMM13(arg & 0x3ff));
+    tcg_out32(s, op | INSN_RD(data) | INSN_RS1(a1) | INSN_RS2(a2));
 }
 
-static inline void tcg_out_ld_ptr(TCGContext *s, int ret,
-                                  tcg_target_long arg)
+static inline void tcg_out_ldst(TCGContext *s, int ret, int addr,
+                                int offset, int op)
 {
-    if (!check_fit_tl(arg, 10))
-        tcg_out_movi(s, TCG_TYPE_PTR, ret, arg & ~0x3ffULL);
-    if (TCG_TARGET_REG_BITS == 64) {
-        tcg_out32(s, LDX | INSN_RD(ret) | INSN_RS1(ret) |
-                  INSN_IMM13(arg & 0x3ff));
-    } else {
-        tcg_out32(s, LDUW | INSN_RD(ret) | INSN_RS1(ret) |
-                  INSN_IMM13(arg & 0x3ff));
-    }
-}
-
-static inline void tcg_out_ldst(TCGContext *s, int ret, int addr, int offset, int op)
-{
-    if (check_fit_tl(offset, 13))
+    if (check_fit_tl(offset, 13)) {
         tcg_out32(s, op | INSN_RD(ret) | INSN_RS1(addr) |
                   INSN_IMM13(offset));
-    else {
+    } else {
         tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I5, offset);
-        tcg_out32(s, op | INSN_RD(ret) | INSN_RS1(TCG_REG_I5) |
-                  INSN_RS2(addr));
+        tcg_out_ldst_rr(s, ret, addr, TCG_REG_I5, op);
     }
 }
 
-static inline void tcg_out_ldst_asi(TCGContext *s, int ret, int addr,
-                                    int offset, int op, int asi)
-{
-    tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I5, offset);
-    tcg_out32(s, op | INSN_RD(ret) | INSN_RS1(TCG_REG_I5) |
-              INSN_ASI(asi) | INSN_RS2(addr));
-}
-
 static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
                               TCGReg arg1, tcg_target_long arg2)
 {
-    if (type == TCG_TYPE_I32)
-        tcg_out_ldst(s, ret, arg1, arg2, LDUW);
-    else
-        tcg_out_ldst(s, ret, arg1, arg2, LDX);
+    tcg_out_ldst(s, ret, arg1, arg2, (type == TCG_TYPE_I32 ? LDUW : LDX));
 }
 
 static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
                               TCGReg arg1, tcg_target_long arg2)
 {
-    if (type == TCG_TYPE_I32)
-        tcg_out_ldst(s, arg, arg1, arg2, STW);
-    else
-        tcg_out_ldst(s, arg, arg1, arg2, STX);
+    tcg_out_ldst(s, arg, arg1, arg2, (type == TCG_TYPE_I32 ? STW : STX));
+}
+
+static inline void tcg_out_ld_ptr(TCGContext *s, int ret,
+                                  tcg_target_long arg)
+{
+    if (!check_fit_tl(arg, 10)) {
+        tcg_out_movi(s, TCG_TYPE_PTR, ret, arg & ~0x3ff);
+    }
+    tcg_out_ld(s, TCG_TYPE_PTR, ret, ret, arg & 0x3ff);
 }
 
+
 static inline void tcg_out_sety(TCGContext *s, int rs)
 {
     tcg_out32(s, WRY | INSN_RS1(TCG_REG_G0) | INSN_RS2(rs));
@@ -757,22 +747,16 @@ static const void * const qemu_st_helpers[4] = {
    WHICH is the offset into the CPUTLBEntry structure of the slot to read.
    This should be offsetof addr_read or addr_write.
 
-   Outputs:
-   LABEL_PTRS is filled with the position of the forward jumps to the
-   TLB miss case.  This will always be a ,PN insn, so a 19-bit offset.
-
-   Returns a register loaded with the low part of the address, adjusted
-   as indicated by the TLB and so is a host address.  Undefined in the
-   TLB miss case.  */
+   The result of the TLB comparison is in %[ix]cc.  The sanitized address
+   is in the returned register, maybe %o0.  The TLB addend is in %o1.  */
 
 static int tcg_out_tlb_load(TCGContext *s, int addrlo_idx, int mem_index,
-                            int s_bits, const TCGArg *args,
-                            uint32_t **label_ptr, int which)
+                            int s_bits, const TCGArg *args, int which)
 {
     const int addrlo = args[addrlo_idx];
-    const int r0 = tcg_target_call_iarg_regs[0];
-    const int r1 = tcg_target_call_iarg_regs[1];
-    const int r2 = tcg_target_call_iarg_regs[2];
+    const int r0 = TCG_REG_O0;
+    const int r1 = TCG_REG_O1;
+    const int r2 = TCG_REG_O2;
     int addr = addrlo;
     int tlb_ofs;
 
@@ -803,110 +787,39 @@ static int tcg_out_tlb_load(TCGContext *s, int addrlo_idx, int mem_index,
         tlb_ofs = 0;
     }
 
-    /* ld [arg1 + which], arg2 */
+    /* Load the tlb comparator and the addend.  */
     tcg_out_ld(s, TCG_TYPE_TL, r2, r1, tlb_ofs + which);
+    tcg_out_ld(s, TCG_TYPE_PTR, r1, r1, tlb_ofs+offsetof(CPUTLBEntry, addend));
 
     /* subcc arg0, arg2, %g0 */
     tcg_out_cmp(s, r0, r2, 0);
 
-    /* bne,pn %[ix]cc, label0 */
-    *label_ptr = (uint32_t *)s->code_ptr;
-    tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_NE, 0) | INSN_OP2(0x1) |
-                  ((TARGET_LONG_BITS == 64) << 21)));
-
-    /* TLB Hit.  Compute the host address into r1.  The ld is in the
-       branch delay slot; harmless for the TLB miss case.  */
-    tcg_out_ld(s, TCG_TYPE_PTR, r1, r1, tlb_ofs+offsetof(CPUTLBEntry, addend));
-
+    /* If the guest address must be zero-extended, do so now.  */
     if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
         tcg_out_arithi(s, r0, addrlo, 0, SHIFT_SRL);
-        tcg_out_arith(s, r1, r0, r1, ARITH_ADD);
-    } else {
-        tcg_out_arith(s, r1, addrlo, r1, ARITH_ADD);
+        return r0;
     }
-
-    return r1;
+    return addrlo;
 }
 #endif /* CONFIG_SOFTMMU */
 
-static void tcg_out_qemu_ld_direct(TCGContext *s, int addr, int datalo,
-                                   int datahi, int sizeop)
-{
+static const int qemu_ld_opc[8] = {
 #ifdef TARGET_WORDS_BIGENDIAN
-    const int bigendian = 1;
+    LDUB, LDUH, LDUW, LDX, LDSB, LDSH, LDSW, LDX
 #else
-    const int bigendian = 0;
+    LDUB, LDUH_LE, LDUW_LE, LDX_LE, LDSB, LDSH_LE, LDSW_LE, LDX_LE
 #endif
-    switch (sizeop) {
-    case 0:
-        /* ldub [addr], datalo */
-        tcg_out_ldst(s, datalo, addr, 0, LDUB);
-        break;
-    case 0 | 4:
-        /* ldsb [addr], datalo */
-        tcg_out_ldst(s, datalo, addr, 0, LDSB);
-        break;
-    case 1:
-        if (bigendian) {
-            /* lduh [addr], datalo */
-            tcg_out_ldst(s, datalo, addr, 0, LDUH);
-        } else {
-            /* lduha [addr] ASI_PRIMARY_LITTLE, datalo */
-            tcg_out_ldst_asi(s, datalo, addr, 0, LDUHA, ASI_PRIMARY_LITTLE);
-        }
-        break;
-    case 1 | 4:
-        if (bigendian) {
-            /* ldsh [addr], datalo */
-            tcg_out_ldst(s, datalo, addr, 0, LDSH);
-        } else {
-            /* ldsha [addr] ASI_PRIMARY_LITTLE, datalo */
-            tcg_out_ldst_asi(s, datalo, addr, 0, LDSHA, ASI_PRIMARY_LITTLE);
-        }
-        break;
-    case 2:
-        if (bigendian) {
-            /* lduw [addr], datalo */
-            tcg_out_ldst(s, datalo, addr, 0, LDUW);
-        } else {
-            /* lduwa [addr] ASI_PRIMARY_LITTLE, datalo */
-            tcg_out_ldst_asi(s, datalo, addr, 0, LDUWA, ASI_PRIMARY_LITTLE);
-        }
-        break;
-    case 2 | 4:
-        if (bigendian) {
-            /* ldsw [addr], datalo */
-            tcg_out_ldst(s, datalo, addr, 0, LDSW);
-        } else {
-            /* ldswa [addr] ASI_PRIMARY_LITTLE, datalo */
-            tcg_out_ldst_asi(s, datalo, addr, 0, LDSWA, ASI_PRIMARY_LITTLE);
-        }
-        break;
-    case 3:
-        if (TCG_TARGET_REG_BITS == 64) {
-            if (bigendian) {
-                /* ldx [addr], datalo */
-                tcg_out_ldst(s, datalo, addr, 0, LDX);
-            } else {
-                /* ldxa [addr] ASI_PRIMARY_LITTLE, datalo */
-                tcg_out_ldst_asi(s, datalo, addr, 0, LDXA, ASI_PRIMARY_LITTLE);
-            }
-        } else {
-            if (bigendian) {
-                tcg_out_ldst(s, datahi, addr, 0, LDUW);
-                tcg_out_ldst(s, datalo, addr, 4, LDUW);
-            } else {
-                tcg_out_ldst_asi(s, datalo, addr, 0, LDUWA, ASI_PRIMARY_LITTLE);
-                tcg_out_ldst_asi(s, datahi, addr, 4, LDUWA, ASI_PRIMARY_LITTLE);
-            }
-        }
-        break;
-    default:
-        tcg_abort();
-    }
-}
+};
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
+static const int qemu_st_opc[4] = {
+#ifdef TARGET_WORDS_BIGENDIAN
+    STB, STH, STW, STX
+#else
+    STB, STH_LE, STW_LE, STX_LE
+#endif
+};
+
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
 {
     int addrlo_idx = 1, datalo, datahi, addr_reg;
 #if defined(CONFIG_SOFTMMU)
@@ -915,7 +828,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 #endif
 
     datahi = datalo = args[0];
-    if (TCG_TARGET_REG_BITS == 32 && opc == 3) {
+    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
         datahi = args[1];
         addrlo_idx = 2;
     }
@@ -923,27 +836,59 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 #if defined(CONFIG_SOFTMMU)
     memi_idx = addrlo_idx + 1 + (TARGET_LONG_BITS > TCG_TARGET_REG_BITS);
     memi = args[memi_idx];
-    s_bits = opc & 3;
+    s_bits = sizeop & 3;
 
     addr_reg = tcg_out_tlb_load(s, addrlo_idx, memi, s_bits, args,
-                                label_ptr, offsetof(CPUTLBEntry, addr_read));
+                                offsetof(CPUTLBEntry, addr_read));
 
-    /* TLB Hit.  */
-    tcg_out_qemu_ld_direct(s, addr_reg, datalo, datahi, opc);
+    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
+        int reg64;
 
-    /* b,pt,n label1 */
-    label_ptr[1] = (uint32_t *)s->code_ptr;
-    tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x1)
-                  | (1 << 29) | (1 << 19)));
+        /* bne,pn %[xi]cc, label0 */
+        label_ptr[0] = (uint32_t *)s->code_ptr;
+        tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_NE, 0) | INSN_OP2(0x1)
+                      | ((TARGET_LONG_BITS == 64) << 21)));
+
+        /* TLB Hit.  */
+        /* Load all 64-bits into an O/G register.  */
+        reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
+        tcg_out_ldst_rr(s, reg64, addr_reg, TCG_REG_O1, qemu_ld_opc[sizeop]);
+
+        /* Move the two 32-bit pieces into the destination registers.  */
+        tcg_out_arithi(s, datahi, reg64, 32, SHIFT_SRLX);
+        if (reg64 != datalo) {
+            tcg_out_mov(s, TCG_TYPE_I32, datalo, reg64);
+        }
+
+        /* b,pt,n label1 */
+        label_ptr[1] = (uint32_t *)s->code_ptr;
+        tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x1)
+                      | (1 << 29) | (1 << 19)));
+    } else {
+        /* The fast path is exactly one insn.  Thus we can perform the
+           entire TLB Hit in the (annulled) delay slot of the branch
+           over the TLB Miss case.  */
+
+        /* beq,a,pt %[xi]cc, label0 */
+        label_ptr[0] = NULL;
+        label_ptr[1] = (uint32_t *)s->code_ptr;
+        tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_E, 0) | INSN_OP2(0x1)
+                      | ((TARGET_LONG_BITS == 64) << 21)
+                      | (1 << 29) | (1 << 19)));
+        /* delay slot */
+        tcg_out_ldst_rr(s, datalo, addr_reg, TCG_REG_O1, qemu_ld_opc[sizeop]);
+    }
 
     /* TLB Miss.  */
 
-    *label_ptr[0] |= INSN_OFF19((unsigned long)s->code_ptr -
-                                (unsigned long)label_ptr[0]);
-    n = 0;
-#ifdef CONFIG_TCG_PASS_AREG0
-    tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[n++], TCG_AREG0);
-#endif
+    if (label_ptr[0]) {
+        *label_ptr[0] |= INSN_OFF19((unsigned long)s->code_ptr -
+                                    (unsigned long)label_ptr[0]);
+    }
+    n = ARG_OFFSET;
+    if (ARG_OFFSET) {
+       tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
+    }
     if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
         tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
                     args[addrlo_idx + 1]);
@@ -971,7 +916,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 
     n = tcg_target_call_oarg_regs[0];
     /* datalo = sign_extend(arg0) */
-    switch(opc) {
+    switch (sizeop) {
     case 0 | 4:
         /* Recall that SRA sign extends from bit 31 through bit 63.  */
         tcg_out_arithi(s, datalo, n, 24, SHIFT_SLL);
@@ -1008,75 +953,31 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
         tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
         addr_reg = TCG_REG_I5;
     }
-    tcg_out_qemu_ld_direct(s, addr_reg, datalo, datahi, opc);
-#endif /* CONFIG_SOFTMMU */
-}
+    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
+        int reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
 
-static void tcg_out_qemu_st_direct(TCGContext *s, int addr, int datalo,
-                                   int datahi, int sizeop)
-{
-#ifdef TARGET_WORDS_BIGENDIAN
-    const int bigendian = 1;
-#else
-    const int bigendian = 0;
-#endif
-    switch (sizeop) {
-    case 0:
-        /* stb datalo, [addr] */
-        tcg_out_ldst(s, datalo, addr, 0, STB);
-        break;
-    case 1:
-        if (bigendian) {
-            /* sth datalo, [addr] */
-            tcg_out_ldst(s, datalo, addr, 0, STH);
-        } else {
-            /* stha datalo, [addr] ASI_PRIMARY_LITTLE */
-            tcg_out_ldst_asi(s, datalo, addr, 0, STHA, ASI_PRIMARY_LITTLE);
-        }
-        break;
-    case 2:
-        if (bigendian) {
-            /* stw datalo, [addr] */
-            tcg_out_ldst(s, datalo, addr, 0, STW);
-        } else {
-            /* stwa datalo, [addr] ASI_PRIMARY_LITTLE */
-            tcg_out_ldst_asi(s, datalo, addr, 0, STWA, ASI_PRIMARY_LITTLE);
-        }
-        break;
-    case 3:
-        if (TCG_TARGET_REG_BITS == 64) {
-            if (bigendian) {
-                /* stx datalo, [addr] */
-                tcg_out_ldst(s, datalo, addr, 0, STX);
-            } else {
-                /* stxa datalo, [addr] ASI_PRIMARY_LITTLE */
-                tcg_out_ldst_asi(s, datalo, addr, 0, STXA, ASI_PRIMARY_LITTLE);
-            }
-        } else {
-            if (bigendian) {
-                tcg_out_ldst(s, datahi, addr, 0, STW);
-                tcg_out_ldst(s, datalo, addr, 4, STW);
-            } else {
-                tcg_out_ldst_asi(s, datalo, addr, 0, STWA, ASI_PRIMARY_LITTLE);
-                tcg_out_ldst_asi(s, datahi, addr, 4, STWA, ASI_PRIMARY_LITTLE);
-            }
+        tcg_out_ldst_rr(s, reg64, addr_reg, TCG_REG_G0, qemu_ld_opc[sizeop]);
+
+        tcg_out_arithi(s, datahi, reg64, 32, SHIFT_SRLX);
+        if (reg64 != datalo) {
+            tcg_out_mov(s, TCG_TYPE_I32, datalo, reg64);
         }
-        break;
-    default:
-        tcg_abort();
+    } else {
+        tcg_out_ldst_rr(s, datalo, addr_reg, TCG_REG_G0, qemu_ld_opc[sizeop]);
     }
+#endif /* CONFIG_SOFTMMU */
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
 {
     int addrlo_idx = 1, datalo, datahi, addr_reg;
 #if defined(CONFIG_SOFTMMU)
     int memi_idx, memi, n;
-    uint32_t *label_ptr[2];
+    uint32_t *label_ptr;
 #endif
 
     datahi = datalo = args[0];
-    if (TCG_TARGET_REG_BITS == 32 && opc == 3) {
+    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
         datahi = args[1];
         addrlo_idx = 2;
     }
@@ -1085,33 +986,40 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
     memi_idx = addrlo_idx + 1 + (TARGET_LONG_BITS > TCG_TARGET_REG_BITS);
     memi = args[memi_idx];
 
-    addr_reg = tcg_out_tlb_load(s, addrlo_idx, memi, opc, args,
-                                label_ptr, offsetof(CPUTLBEntry, addr_write));
+    addr_reg = tcg_out_tlb_load(s, addrlo_idx, memi, sizeop, args,
+                                offsetof(CPUTLBEntry, addr_write));
 
-    /* TLB Hit.  */
-    tcg_out_qemu_st_direct(s, addr_reg, datalo, datahi, opc);
+    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
+        /* Reconstruct the full 64-bit value in %g1, using %o2 as temp.  */
+        /* ??? Redefine the temps from %i4/%i5 so that we have a o/g temp. */
+        tcg_out_arithi(s, TCG_REG_G1, datalo, 0, SHIFT_SRL);
+        tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
+        tcg_out_arith(s, TCG_REG_G1, TCG_REG_G1, TCG_REG_O2, ARITH_OR);
+        datalo = TCG_REG_G1;
+    }
 
-    /* b,pt,n label1 */
-    label_ptr[1] = (uint32_t *)s->code_ptr;
-    tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x1)
+    /* The fast path is exactly one insn.  Thus we can perform the entire
+       TLB Hit in the (annulled) delay slot of the branch over TLB Miss.  */
+    /* beq,a,pt %[xi]cc, label0 */
+    label_ptr = (uint32_t *)s->code_ptr;
+    tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_E, 0) | INSN_OP2(0x1)
+                  | ((TARGET_LONG_BITS == 64) << 21)
                   | (1 << 29) | (1 << 19)));
+    /* delay slot */
+    tcg_out_ldst_rr(s, datalo, addr_reg, TCG_REG_O1, qemu_st_opc[sizeop]);
 
     /* TLB Miss.  */
-
-    *label_ptr[0] |= INSN_OFF19((unsigned long)s->code_ptr -
-                                (unsigned long)label_ptr[0]);
-
-    n = 0;
-#ifdef CONFIG_TCG_PASS_AREG0
-    tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[n++], TCG_AREG0);
-#endif
+    n = ARG_OFFSET;
+    if (ARG_OFFSET) {
+        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
+    }
     if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
         tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
                     args[addrlo_idx + 1]);
     }
     tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
                 args[addrlo_idx]);
-    if (TCG_TARGET_REG_BITS == 32 && opc == 3) {
+    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
         tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++], datahi);
     }
     tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++], datalo);
@@ -1123,7 +1031,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
                sizeof(long));
 
     /* qemu_st_helper[s_bits](arg0, arg1, arg2) */
-    tcg_out32(s, CALL | ((((tcg_target_ulong)qemu_st_helpers[opc]
+    tcg_out32(s, CALL | ((((tcg_target_ulong)qemu_st_helpers[sizeop]
                            - (tcg_target_ulong)s->code_ptr) >> 2)
                          & 0x3fffffff));
     /* delay slot */
@@ -1134,15 +1042,23 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
                TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
                sizeof(long));
 
-    *label_ptr[1] |= INSN_OFF19((unsigned long)s->code_ptr -
-                                (unsigned long)label_ptr[1]);
+    *label_ptr |= INSN_OFF19((unsigned long)s->code_ptr -
+                             (unsigned long)label_ptr);
 #else
     addr_reg = args[addrlo_idx];
     if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
         tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
         addr_reg = TCG_REG_I5;
     }
-    tcg_out_qemu_st_direct(s, addr_reg, datalo, datahi, opc);
+    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
+        /* Reconstruct the full 64-bit value in %g1, using %o2 as temp.  */
+        /* ??? Redefine the temps from %i4/%i5 so that we have a o/g temp. */
+        tcg_out_arithi(s, TCG_REG_G1, datalo, 0, SHIFT_SRL);
+        tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
+        tcg_out_arith(s, TCG_REG_G1, TCG_REG_G1, TCG_REG_O2, ARITH_OR);
+        datalo = TCG_REG_G1;
+    }
+    tcg_out_ldst_rr(s, datalo, addr_reg, TCG_REG_G0, qemu_st_opc[sizeop]);
 #endif /* CONFIG_SOFTMMU */
 }
 
-- 
1.7.7.6
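
For readers following the 32-bit host paths in the patch above: the 64-bit
load splits one register into two 32-bit halves (SRLX by 32, then a move),
and the 64-bit store rebuilds one register from two halves (SRL by 0 to
zero-extend, SLLX by 32, then OR).  A minimal sketch of the same arithmetic
in plain C follows; the helper names join_halves and split_halves are
hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/* Hypothetical helper: rebuild a 64-bit value from two 32-bit halves,
   mirroring the store path's SRL / SLLX / OR sequence. */
static uint64_t join_halves(uint32_t hi, uint32_t lo)
{
    uint64_t low = (uint64_t)lo;         /* srl  datalo, 0  (zero-extend) */
    uint64_t high = (uint64_t)hi << 32;  /* sllx datahi, 32               */
    return low | high;                   /* or                            */
}

/* Hypothetical helper: split a 64-bit value into two 32-bit halves,
   mirroring the load path's SRLX plus move. */
static void split_halves(uint64_t v, uint32_t *hi, uint32_t *lo)
{
    *hi = (uint32_t)(v >> 32);  /* srlx reg64, 32 */
    *lo = (uint32_t)v;          /* low 32 bits of reg64 */
}
```

The zero-extension of the low half matters: without it, stale upper bits in
the low register would leak into the high half after the OR.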


* [Qemu-devel] [PATCH 06/14] tcg-sparc: Support GUEST_BASE.
  2012-03-28  0:32 [Qemu-devel] [PATCH 00/14] tcg-sparc improvments, v2 Richard Henderson
                   ` (4 preceding siblings ...)
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 05/14] tcg-sparc: Simplify qemu_ld/st direct memory paths Richard Henderson
@ 2012-03-28  0:32 ` Richard Henderson
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 07/14] Avoid declaring the env variable at all if CONFIG_TCG_PASS_AREG0 Richard Henderson
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2012-03-28  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel


Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 configure              |    2 ++
 tcg/sparc/tcg-target.c |   26 +++++++++++++++++++++++---
 tcg/sparc/tcg-target.h |    2 ++
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index 7741ba9..a79a090 100755
--- a/configure
+++ b/configure
@@ -819,6 +819,7 @@ case "$cpu" in
            if test "$solaris" = "no" ; then
              QEMU_CFLAGS="-ffixed-g1 -ffixed-g6 $QEMU_CFLAGS"
            fi
+           host_guest_base="yes"
            ;;
     sparc64)
            LDFLAGS="-m64 $LDFLAGS"
@@ -827,6 +828,7 @@ case "$cpu" in
            if test "$solaris" != "no" ; then
              QEMU_CFLAGS="-ffixed-g1 $QEMU_CFLAGS"
            fi
+           host_guest_base="yes"
            ;;
     s390)
            QEMU_CFLAGS="-m31 -march=z990 $QEMU_CFLAGS"
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 5cea5a8..c014ce0 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -59,6 +59,12 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 };
 #endif
 
+#ifdef CONFIG_USE_GUEST_BASE
+# define TCG_GUEST_BASE_REG TCG_REG_I3
+#else
+# define TCG_GUEST_BASE_REG TCG_REG_G0
+#endif
+
 #ifdef CONFIG_TCG_PASS_AREG0
 #define ARG_OFFSET 1
 #else
@@ -689,6 +695,14 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     tcg_out32(s, SAVE | INSN_RD(TCG_REG_O6) | INSN_RS1(TCG_REG_O6) |
               INSN_IMM13(-(TCG_TARGET_STACK_MINFRAME +
                            CPU_TEMP_BUF_NLONGS * (int)sizeof(long))));
+
+#ifdef CONFIG_USE_GUEST_BASE
+    if (GUEST_BASE != 0) {
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, GUEST_BASE);
+        tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
+    }
+#endif
+
     tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_I1) |
               INSN_RS2(TCG_REG_G0));
     tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_I0);
@@ -956,14 +970,18 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
     if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
         int reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
 
-        tcg_out_ldst_rr(s, reg64, addr_reg, TCG_REG_G0, qemu_ld_opc[sizeop]);
+        tcg_out_ldst_rr(s, reg64, addr_reg,
+                        (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
+                        qemu_ld_opc[sizeop]);
 
         tcg_out_arithi(s, datahi, reg64, 32, SHIFT_SRLX);
         if (reg64 != datalo) {
             tcg_out_mov(s, TCG_TYPE_I32, datalo, reg64);
         }
     } else {
-        tcg_out_ldst_rr(s, datalo, addr_reg, TCG_REG_G0, qemu_ld_opc[sizeop]);
+        tcg_out_ldst_rr(s, datalo, addr_reg,
+                        (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
+                        qemu_ld_opc[sizeop]);
     }
 #endif /* CONFIG_SOFTMMU */
 }
@@ -1058,7 +1076,9 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
         tcg_out_arith(s, TCG_REG_G1, TCG_REG_G1, TCG_REG_O2, ARITH_OR);
         datalo = TCG_REG_G1;
     }
-    tcg_out_ldst_rr(s, datalo, addr_reg, TCG_REG_G0, qemu_st_opc[sizeop]);
+    tcg_out_ldst_rr(s, datalo, addr_reg,
+                    (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
+                    qemu_st_opc[sizeop]);
 #endif /* CONFIG_SOFTMMU */
 }
 
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 56742bf..e69dfc8 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -126,6 +126,8 @@ typedef enum {
 #define TCG_TARGET_HAS_deposit_i64      0
 #endif
 
+#define TCG_TARGET_HAS_GUEST_BASE
+
 /* Note: must be synced with dyngen-exec.h */
 #ifdef CONFIG_SOLARIS
 #define TCG_AREG0 TCG_REG_G2
-- 
1.7.7.6
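
As a reading aid for the patch above: in user-mode emulation the host address
is the guest address plus GUEST_BASE, which the generated code obtains from
the register+register addressing mode (addr_reg plus either the guest base
register or %g0, which reads as zero).  A minimal sketch of that address
computation, assuming a 32-bit guest on a 64-bit host; the function name
host_addr is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: compute the host address for a guest access.
   The 32-bit guest address is zero-extended first (the "srl addr, 0"
   step), then the base is added, as the reg+reg load/store does. */
static uint64_t host_addr(uint64_t guest_base, uint32_t guest_addr)
{
    uint64_t addr = (uint64_t)guest_addr;  /* zero-extend */
    return guest_base + addr;              /* base == 0 behaves like %g0 */
}
```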


* [Qemu-devel] [PATCH 07/14] Avoid declaring the env variable at all if CONFIG_TCG_PASS_AREG0.
  2012-03-28  0:32 [Qemu-devel] [PATCH 00/14] tcg-sparc improvments, v2 Richard Henderson
                   ` (5 preceding siblings ...)
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 06/14] tcg-sparc: Support GUEST_BASE Richard Henderson
@ 2012-03-28  0:32 ` Richard Henderson
  2012-03-29 18:57   ` Blue Swirl
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 08/14] tcg-sparc: Do not use a global register for AREG0 Richard Henderson
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2012-03-28  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel

At the same time, remove use of the global ENV from user-exec.c.
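
The user-exec.c change amounts to replacing reads of a file-scope "env" with
a pointer that is passed in or fetched locally.  A minimal sketch of the two
styles, using a made-up CPUStateSketch type rather than the real
CPUArchState; none of these names come from the QEMU tree.

```c
/* Hypothetical stand-in for the CPU state structure. */
typedef struct CPUStateSketch {
    int exception_index;
} CPUStateSketch;

/* Old style: the function depends on a global that someone else must
   have set beforehand. */
static CPUStateSketch *global_env_sketch;

static void reset_exception_global(void)
{
    global_env_sketch->exception_index = -1;
}

/* New style: the state is an explicit parameter, so no global (and no
   reserved host register) is needed. */
static void reset_exception_explicit(CPUStateSketch *env1)
{
    env1->exception_index = -1;
}
```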

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 Makefile.target |    5 -----
 dyngen-exec.h   |    5 +++++
 user-exec.c     |   17 ++++++-----------
 3 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index aa53e28..81fdf9e 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -110,11 +110,6 @@ $(libobj-y): $(GENERATED_HEADERS)
 ifndef CONFIG_TCG_PASS_AREG0
 op_helper.o: QEMU_CFLAGS += $(HELPER_CFLAGS)
 endif
-user-exec.o: QEMU_CFLAGS += $(HELPER_CFLAGS)
-
-# Note: this is a workaround. The real fix is to avoid compiling
-# cpu_signal_handler() in user-exec.c.
-signal.o: QEMU_CFLAGS += $(HELPER_CFLAGS)
 
 #########################################################
 # Linux user emulator target
diff --git a/dyngen-exec.h b/dyngen-exec.h
index cfeef99..65fcb43 100644
--- a/dyngen-exec.h
+++ b/dyngen-exec.h
@@ -19,6 +19,10 @@
 #if !defined(__DYNGEN_EXEC_H__)
 #define __DYNGEN_EXEC_H__
 
+/* If the target has indicated that it does not need an AREG0,
+   don't declare the env variable at all, much less as a register.  */
+#if !defined(CONFIG_TCG_PASS_AREG0)
+
 #if defined(CONFIG_TCG_INTERPRETER)
 /* The TCG interpreter does not need a special register AREG0,
  * but it is possible to use one by defining AREG0.
@@ -65,4 +69,5 @@ register CPUArchState *env asm(AREG0);
 extern CPUArchState *env;
 #endif
 
+#endif /* !CONFIG_TCG_PASS_AREG0 */
 #endif /* !defined(__DYNGEN_EXEC_H__) */
diff --git a/user-exec.c b/user-exec.c
index cd905ff..9691f09 100644
--- a/user-exec.c
+++ b/user-exec.c
@@ -18,7 +18,6 @@
  */
 #include "config.h"
 #include "cpu.h"
-#include "dyngen-exec.h"
 #include "disas.h"
 #include "tcg.h"
 
@@ -58,8 +57,6 @@ void cpu_resume_from_signal(CPUArchState *env1, void *puc)
     struct sigcontext *uc = puc;
 #endif
 
-    env = env1;
-
     /* XXX: restore cpu registers saved in host registers */
 
     if (puc) {
@@ -74,8 +71,8 @@ void cpu_resume_from_signal(CPUArchState *env1, void *puc)
         sigprocmask(SIG_SETMASK, &uc->sc_mask, NULL);
 #endif
     }
-    env->exception_index = -1;
-    longjmp(env->jmp_env, 1);
+    env1->exception_index = -1;
+    longjmp(env1->jmp_env, 1);
 }
 
 /* 'pc' is the host PC at which the exception was raised. 'address' is
@@ -86,12 +83,10 @@ static inline int handle_cpu_signal(unsigned long pc, unsigned long address,
                                     int is_write, sigset_t *old_set,
                                     void *puc)
 {
+    CPUArchState *env1 = cpu_single_env;
     TranslationBlock *tb;
     int ret;
 
-    if (cpu_single_env) {
-        env = cpu_single_env; /* XXX: find a correct solution for multithread */
-    }
 #if defined(DEBUG_SIGNAL)
     qemu_printf("qemu: SIGSEGV pc=0x%08lx address=%08lx w=%d oldset=0x%08lx\n",
                 pc, address, is_write, *(unsigned long *)old_set);
@@ -102,7 +97,7 @@ static inline int handle_cpu_signal(unsigned long pc, unsigned long address,
     }
 
     /* see if it is an MMU fault */
-    ret = cpu_handle_mmu_fault(env, address, is_write, MMU_USER_IDX);
+    ret = cpu_handle_mmu_fault(env1, address, is_write, MMU_USER_IDX);
     if (ret < 0) {
         return 0; /* not an MMU fault */
     }
@@ -114,13 +109,13 @@ static inline int handle_cpu_signal(unsigned long pc, unsigned long address,
     if (tb) {
         /* the PC is inside the translated code. It means that we have
            a virtual CPU fault */
-        cpu_restore_state(tb, env, pc);
+        cpu_restore_state(tb, env1, pc);
     }
 
     /* we restore the process signal mask as the sigreturn should
        do it (XXX: use sigsetjmp) */
     sigprocmask(SIG_SETMASK, old_set, NULL);
-    exception_action(env);
+    exception_action(env1);
 
     /* never comes here */
     return 1;
-- 
1.7.7.6


* [Qemu-devel] [PATCH 08/14] tcg-sparc: Do not use a global register for AREG0.
  2012-03-28  0:32 [Qemu-devel] [PATCH 00/14] tcg-sparc improvments, v2 Richard Henderson
                   ` (6 preceding siblings ...)
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 07/14] Avoid declaring the env variable at all if CONFIG_TCG_PASS_AREG0 Richard Henderson
@ 2012-03-28  0:32 ` Richard Henderson
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 09/14] tcg-sparc: Change AREG0 in generated code to %i0 Richard Henderson
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2012-03-28  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel

Making "env" a macro constrains include ordering a bit: the target's helper
header must now be included before "dyngen-exec.h".
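
To illustrate the ordering problem: if a helper-declaration macro receives
the token "env" as an ordinary argument and only later pastes it into a type
name, then an object-like macro named "env" that is already defined gets
expanded first and the paste produces the wrong identifier.  The sketch
below uses simplified, made-up macros (CTYPE_SKETCH, DEF_HELPER_SKETCH_2);
it assumes the real DEF_HELPER_N machinery behaves similarly and is not a
copy of it.

```c
/* Hypothetical stand-ins for the CPU state and the TLS pointer. */
typedef struct CPUStateSketch { int regs[4]; } CPUStateSketch;
static CPUStateSketch *cpu_single_env_sketch;

/* Type names are looked up by pasting the argument onto a prefix. */
#define CTYPE_SKETCH_i32 int
#define CTYPE_SKETCH_env CPUStateSketch *
#define CTYPE_SKETCH(t) CTYPE_SKETCH_##t
#define DEF_HELPER_SKETCH_2(name, ret, t1, t2) \
    CTYPE_SKETCH(ret) helper_##name(CTYPE_SKETCH(t1), CTYPE_SKETCH(t2));

/* Step 1: declare helpers while "env" is still a plain token.
   This expands to: int helper_read_reg(CPUStateSketch *, int); */
DEF_HELPER_SKETCH_2(read_reg, i32, env, i32)

/* Step 2: only now turn "env" into an alias for the TLS pointer.
   Had this come first, the "env" argument above would have been
   expanded before pasting, yielding CTYPE_SKETCH_cpu_single_env_sketch,
   which is not defined. */
#define env cpu_single_env_sketch

int helper_read_reg(CPUStateSketch *state, int idx)
{
    return state->regs[idx];
}

/* Code written against the old global still compiles via the macro. */
static int read_reg_via_env(int idx)
{
    return env->regs[idx];
}
```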

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 dyngen-exec.h                |   17 +++++++----------
 target-m68k/op_helper.c      |    2 +-
 target-unicore32/op_helper.c |    2 +-
 target-xtensa/op_helper.c    |    2 +-
 xtensa-semi.c                |    2 +-
 5 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/dyngen-exec.h b/dyngen-exec.h
index 65fcb43..97fd32c 100644
--- a/dyngen-exec.h
+++ b/dyngen-exec.h
@@ -41,13 +41,8 @@
 #elif defined(__mips__)
 #define AREG0 "s0"
 #elif defined(__sparc__)
-#ifdef CONFIG_SOLARIS
-#define AREG0 "g2"
-#elif HOST_LONG_BITS == 64
-#define AREG0 "g5"
-#else
-#define AREG0 "g6"
-#endif
+/* Don't use a global register.  Working around glibc clobbering these
+   global registers is more trouble than just using TLS.  */
 #elif defined(__s390__)
 #define AREG0 "r10"
 #elif defined(__alpha__)
@@ -62,11 +57,13 @@
 #error unsupported CPU
 #endif
 
-#if defined(AREG0)
+#ifdef AREG0
 register CPUArchState *env asm(AREG0);
 #else
-/* TODO: Try env = cpu_single_env. */
-extern CPUArchState *env;
+/* Without a hard register, we can use the TLS variable instead.  Note that
+   this macro interferes with the use of "env" in DEF_HELPER_N, thus targets
+   should always include "helper.h" before "dyngen-exec.h".  */
+#define env cpu_single_env
 #endif
 
 #endif /* !CONFIG_TCG_PASS_AREG0 */
diff --git a/target-m68k/op_helper.c b/target-m68k/op_helper.c
index bc8c1f0..ef12f21 100644
--- a/target-m68k/op_helper.c
+++ b/target-m68k/op_helper.c
@@ -17,8 +17,8 @@
  * License along with this library; if not, see <http://www.gnu.org/licenses/>.
  */
 #include "cpu.h"
-#include "dyngen-exec.h"
 #include "helpers.h"
+#include "dyngen-exec.h"
 
 #if defined(CONFIG_USER_ONLY)
 
diff --git a/target-unicore32/op_helper.c b/target-unicore32/op_helper.c
index 638a020..a6a68b3 100644
--- a/target-unicore32/op_helper.c
+++ b/target-unicore32/op_helper.c
@@ -8,8 +8,8 @@
  * published by the Free Software Foundation.
  */
 #include "cpu.h"
-#include "dyngen-exec.h"
 #include "helper.h"
+#include "dyngen-exec.h"
 
 #define SIGNBIT (uint32_t)0x80000000
 #define SIGNBIT64 ((uint64_t)1 << 63)
diff --git a/target-xtensa/op_helper.c b/target-xtensa/op_helper.c
index cdef0db..d709983 100644
--- a/target-xtensa/op_helper.c
+++ b/target-xtensa/op_helper.c
@@ -26,8 +26,8 @@
  */
 
 #include "cpu.h"
-#include "dyngen-exec.h"
 #include "helpers.h"
+#include "dyngen-exec.h"
 #include "host-utils.h"
 
 static void do_unaligned_access(target_ulong addr, int is_write, int is_user,
diff --git a/xtensa-semi.c b/xtensa-semi.c
index 5754b77..0c0e018 100644
--- a/xtensa-semi.c
+++ b/xtensa-semi.c
@@ -30,8 +30,8 @@
 #include <string.h>
 #include <stddef.h>
 #include "cpu.h"
-#include "dyngen-exec.h"
 #include "helpers.h"
+#include "dyngen-exec.h"
 #include "qemu-log.h"
 
 enum {
-- 
1.7.7.6


* [Qemu-devel] [PATCH 09/14] tcg-sparc: Change AREG0 in generated code to %i0.
  2012-03-28  0:32 [Qemu-devel] [PATCH 00/14] tcg-sparc improvments, v2 Richard Henderson
                   ` (7 preceding siblings ...)
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 08/14] tcg-sparc: Do not use a global register for AREG0 Richard Henderson
@ 2012-03-28  0:32 ` Richard Henderson
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 10/14] tcg-sparc: Clean up cruft stemming from attempts to use global registers Richard Henderson
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2012-03-28  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel


Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c |    3 ++-
 tcg/sparc/tcg-target.h |    9 +--------
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index c014ce0..ad040fb 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -705,7 +705,8 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
     tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_I1) |
               INSN_RS2(TCG_REG_G0));
-    tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_I0);
+    /* delay slot */
+    tcg_out_nop(s);
 }
 
 #if defined(CONFIG_SOFTMMU)
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index e69dfc8..31b98e2 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -128,14 +128,7 @@ typedef enum {
 
 #define TCG_TARGET_HAS_GUEST_BASE
 
-/* Note: must be synced with dyngen-exec.h */
-#ifdef CONFIG_SOLARIS
-#define TCG_AREG0 TCG_REG_G2
-#elif HOST_LONG_BITS == 64
-#define TCG_AREG0 TCG_REG_G5
-#else
-#define TCG_AREG0 TCG_REG_G6
-#endif
+#define TCG_AREG0 TCG_REG_I0
 
 static inline void flush_icache_range(tcg_target_ulong start,
                                       tcg_target_ulong stop)
-- 
1.7.7.6


* [Qemu-devel] [PATCH 10/14] tcg-sparc: Clean up cruft stemming from attempts to use global registers.
  2012-03-28  0:32 [Qemu-devel] [PATCH 00/14] tcg-sparc improvments, v2 Richard Henderson
                   ` (8 preceding siblings ...)
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 09/14] tcg-sparc: Change AREG0 in generated code to %i0 Richard Henderson
@ 2012-03-28  0:32 ` Richard Henderson
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 11/14] tcg-sparc: Mask shift immediates to avoid illegal insns Richard Henderson
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2012-03-28  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel

Don't use -ffixed-gN.  Don't link statically.  Don't save/restore
AREG0 around calls.  Don't allocate space on the stack for AREG0 save.
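
The new prologue computes the frame size as the unbiased call-stack offset
plus the static call-argument area plus the TCG temp buffer, rounded up to
the stack alignment.  A minimal sketch of that arithmetic follows.  The
function names are hypothetical, and the sizes used in the usage note (128
bytes of static call args, a 128-long temp buffer) are assumptions for
illustration, not values taken from this patch.

```c
/* Round size up to a power-of-two alignment, the same way the prologue
   does with "frame_size += ALIGN - 1; frame_size &= -ALIGN". */
static int align_up(int size, int align)
{
    return (size + align - 1) & -align;
}

/* Hypothetical helper mirroring the frame_size computation. */
static int frame_size_sketch(int call_stack_offset, int stack_bias,
                             int static_args_size, int tmp_buf_size,
                             int align)
{
    int size = call_stack_offset - stack_bias;  /* minimal callee frame */
    size += static_args_size + tmp_buf_size;
    return align_up(size, align);
}
```

With the assumed sizes, a 64-bit host gives 176 + 128 + 1024 = 1328 bytes
(already 16-byte aligned), and a 32-bit host gives 92 + 128 + 512 = 732
bytes, rounded up to 736.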

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 configure              |   12 ----------
 tcg/sparc/tcg-target.c |   57 ++++++++++++++++--------------------------------
 tcg/sparc/tcg-target.h |   18 ++++++---------
 3 files changed, 26 insertions(+), 61 deletions(-)

diff --git a/configure b/configure
index a79a090..4ae70c0 100755
--- a/configure
+++ b/configure
@@ -815,19 +815,11 @@ case "$cpu" in
     sparc)
            LDFLAGS="-m32 $LDFLAGS"
            QEMU_CFLAGS="-m32 -mcpu=ultrasparc $QEMU_CFLAGS"
-           QEMU_CFLAGS="-ffixed-g2 -ffixed-g3 $QEMU_CFLAGS"
-           if test "$solaris" = "no" ; then
-             QEMU_CFLAGS="-ffixed-g1 -ffixed-g6 $QEMU_CFLAGS"
-           fi
            host_guest_base="yes"
            ;;
     sparc64)
            LDFLAGS="-m64 $LDFLAGS"
            QEMU_CFLAGS="-m64 -mcpu=ultrasparc $QEMU_CFLAGS"
-           QEMU_CFLAGS="-ffixed-g5 -ffixed-g6 -ffixed-g7 $QEMU_CFLAGS"
-           if test "$solaris" != "no" ; then
-             QEMU_CFLAGS="-ffixed-g1 $QEMU_CFLAGS"
-           fi
            host_guest_base="yes"
            ;;
     s390)
@@ -3817,10 +3809,6 @@ fi
 
 if test "$target_linux_user" = "yes" -o "$target_bsd_user" = "yes" ; then
   case "$ARCH" in
-  sparc)
-    # -static is used to avoid g1/g3 usage by the dynamic linker
-    ldflags="$linker_script -static $ldflags"
-    ;;
   alpha | s390x)
     # The default placement of the application is fine.
     ;;
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index ad040fb..88c5140 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -167,9 +167,6 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_O0);
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_O1);
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_O2);
-#ifdef CONFIG_TCG_PASS_AREG0
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_O3);
-#endif
         break;
     case 'I':
         ct->ct |= TCG_CT_CONST_S11;
@@ -690,11 +687,22 @@ static void tcg_out_setcond2_i32(TCGContext *s, TCGCond cond, TCGArg ret,
 /* Generate global QEMU prologue and epilogue code */
 static void tcg_target_qemu_prologue(TCGContext *s)
 {
-    tcg_set_frame(s, TCG_REG_I6, TCG_TARGET_CALL_STACK_OFFSET,
-                  CPU_TEMP_BUF_NLONGS * (int)sizeof(long));
+    int tmp_buf_size, frame_size;
+
+    /* The TCG temp buffer is at the top of the frame, immediately
+       below the frame pointer.  */
+    tmp_buf_size = CPU_TEMP_BUF_NLONGS * (int)sizeof(long);
+    tcg_set_frame(s, TCG_REG_I6, TCG_TARGET_STACK_BIAS - tmp_buf_size,
+                  tmp_buf_size);
+
+    /* TCG_TARGET_CALL_STACK_OFFSET includes the stack bias, but is
+       otherwise the minimal frame usable by callees.  */
+    frame_size = TCG_TARGET_CALL_STACK_OFFSET - TCG_TARGET_STACK_BIAS;
+    frame_size += TCG_STATIC_CALL_ARGS_SIZE + tmp_buf_size;
+    frame_size += TCG_TARGET_STACK_ALIGN - 1;
+    frame_size &= -TCG_TARGET_STACK_ALIGN;
     tcg_out32(s, SAVE | INSN_RD(TCG_REG_O6) | INSN_RS1(TCG_REG_O6) |
-              INSN_IMM13(-(TCG_TARGET_STACK_MINFRAME +
-                           CPU_TEMP_BUF_NLONGS * (int)sizeof(long))));
+              INSN_IMM13(-frame_size));
 
 #ifdef CONFIG_USE_GUEST_BASE
     if (GUEST_BASE != 0) {
@@ -707,6 +715,8 @@ static void tcg_target_qemu_prologue(TCGContext *s)
               INSN_RS2(TCG_REG_G0));
     /* delay slot */
     tcg_out_nop(s);
+
+    /* No epilogue required.  We issue ret + restore directly in the TB.  */
 }
 
 #if defined(CONFIG_SOFTMMU)
@@ -911,12 +921,6 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
     tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
                 args[addrlo_idx]);
 
-    /* Store AREG0 in stack to avoid ugly glibc bugs that mangle
-       global registers */
-    tcg_out_st(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
-               TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-               sizeof(long));
-
     /* qemu_ld_helper[s_bits](arg0, arg1) */
     tcg_out32(s, CALL | ((((tcg_target_ulong)qemu_ld_helpers[s_bits]
                            - (tcg_target_ulong)s->code_ptr) >> 2)
@@ -924,11 +928,6 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
     /* delay slot */
     tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[n], memi);
 
-    /* Reload AREG0.  */
-    tcg_out_ld(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
-               TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-               sizeof(long));
-
     n = tcg_target_call_oarg_regs[0];
     /* datalo = sign_extend(arg0) */
     switch (sizeop) {
@@ -1043,12 +1042,6 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
     }
     tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++], datalo);
 
-    /* Store AREG0 in stack to avoid ugly glibc bugs that mangle
-       global registers */
-    tcg_out_st(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
-               TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-               sizeof(long));
-
     /* qemu_st_helper[s_bits](arg0, arg1, arg2) */
     tcg_out32(s, CALL | ((((tcg_target_ulong)qemu_st_helpers[sizeop]
                            - (tcg_target_ulong)s->code_ptr) >> 2)
@@ -1056,11 +1049,6 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
     /* delay slot */
     tcg_out_movi(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n], memi);
 
-    /* Reload AREG0.  */
-    tcg_out_ld(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
-               TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-               sizeof(long));
-
     *label_ptr |= INSN_OFF19((unsigned long)s->code_ptr -
                              (unsigned long)label_ptr);
 #else
@@ -1123,15 +1111,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
             tcg_out32(s, JMPL | INSN_RD(TCG_REG_O7) | INSN_RS1(TCG_REG_I5) |
                       INSN_RS2(TCG_REG_G0));
         }
-        /* Store AREG0 in stack to avoid ugly glibc bugs that mangle
-           global registers */
-        // delay slot
-        tcg_out_st(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
-                   TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-                   sizeof(long));
-        tcg_out_ld(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
-                   TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-                   sizeof(long));
+        /* delay slot */
+        tcg_out_nop(s);
         break;
     case INDEX_op_jmp:
     case INDEX_op_br:
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 31b98e2..b7afa7b 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -66,20 +66,16 @@ typedef enum {
 #define TCG_CT_CONST_S13 0x200
 
 /* used for function call generation */
-#define TCG_REG_CALL_STACK TCG_REG_I6
+#define TCG_REG_CALL_STACK TCG_REG_O6
 
 #if TCG_TARGET_REG_BITS == 64
-// Reserve space for AREG0
-#define TCG_TARGET_STACK_MINFRAME (176 + 4 * (int)sizeof(long) + \
-                                   TCG_STATIC_CALL_ARGS_SIZE)
-#define TCG_TARGET_CALL_STACK_OFFSET (2047 - 16)
-#define TCG_TARGET_STACK_ALIGN 16
+#define TCG_TARGET_STACK_BIAS           2047
+#define TCG_TARGET_STACK_ALIGN          16
+#define TCG_TARGET_CALL_STACK_OFFSET    (128 + 6*8 + TCG_TARGET_STACK_BIAS)
 #else
-// AREG0 + one word for alignment
-#define TCG_TARGET_STACK_MINFRAME (92 + (2 + 1) * (int)sizeof(long) + \
-                                   TCG_STATIC_CALL_ARGS_SIZE)
-#define TCG_TARGET_CALL_STACK_OFFSET TCG_TARGET_STACK_MINFRAME
-#define TCG_TARGET_STACK_ALIGN 8
+#define TCG_TARGET_STACK_BIAS           0
+#define TCG_TARGET_STACK_ALIGN          8
+#define TCG_TARGET_CALL_STACK_OFFSET    (64 + 4 + 6*4)
 #endif
 
 #if TCG_TARGET_REG_BITS == 64
-- 
1.7.7.6


* [Qemu-devel] [PATCH 11/14] tcg-sparc: Mask shift immediates to avoid illegal insns.
  2012-03-28  0:32 [Qemu-devel] [PATCH 00/14] tcg-sparc improvments, v2 Richard Henderson
                   ` (9 preceding siblings ...)
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 10/14] tcg-sparc: Clean up cruft stemming from attempts to use global registers Richard Henderson
@ 2012-03-28  0:32 ` Richard Henderson
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 12/14] tcg-sparc: Use defines for temporaries Richard Henderson
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2012-03-28  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel

The xtensa-test image generates an sra_i32 with count 0x40.
Whether this is an accident of tcg constant propagation or
originates directly from the instruction stream is immaterial.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c |   18 ++++++++++++------
 1 files changed, 12 insertions(+), 6 deletions(-)
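
For readers following along, a minimal sketch of what the masking does to an out-of-range immediate, written as plain C rather than as emitted SPARC code. The function names are hypothetical. It assumes only what the diff below shows: the 32-bit forms keep the low 5 bits of the count and the 64-bit forms keep the low 6.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit logical right shift whose immediate
   count has been masked the way do_shift32 masks args[2].  */
static uint32_t shr_i32_imm(uint32_t value, uint32_t count)
{
    uint32_t shcnt = count & 31;   /* only 5 bits fit the 32-bit form */
    return value >> shcnt;
}

/* Same idea for the 64-bit forms handled at do_shift64.  */
static uint64_t shr_i64_imm(uint64_t value, uint32_t count)
{
    uint32_t shcnt = count & 63;   /* 6 bits for the extended shifts */
    return value >> shcnt;
}

static void shift_mask_demo(void)
{
    /* 0x40 & 31 == 0, so the count seen in xtensa-test becomes a no-op.  */
    assert(shr_i32_imm(0x80000000u, 0x40) == 0x80000000u);
    /* 0x44 & 31 == 4.  */
    assert(shr_i32_imm(0x80u, 0x44) == 0x8u);
    /* 0x48 & 63 == 8.  */
    assert(shr_i64_imm(UINT64_C(1) << 40, 0x48) == UINT64_C(1) << 32);
}
```

Whether a shift by 0 is the result the guest expected for count 0x40 is a separate question; the point of the patch is only that the host encoding stays legal.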

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 88c5140..5b3cde4 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1184,13 +1184,16 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         goto gen_arith;
     case INDEX_op_shl_i32:
         c = SHIFT_SLL;
-        goto gen_arith;
+    do_shift32:
+        /* Limit immediate shift count lest we create an illegal insn.  */
+        tcg_out_arithc(s, args[0], args[1], args[2] & 31, const_args[2], c);
+        break;
     case INDEX_op_shr_i32:
         c = SHIFT_SRL;
-        goto gen_arith;
+        goto do_shift32;
     case INDEX_op_sar_i32:
         c = SHIFT_SRA;
-        goto gen_arith;
+        goto do_shift32;
     case INDEX_op_mul_i32:
         c = ARITH_UMUL;
         goto gen_arith;
@@ -1311,13 +1314,16 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
     case INDEX_op_shl_i64:
         c = SHIFT_SLLX;
-        goto gen_arith;
+    do_shift64:
+        /* Limit immediate shift count lest we create an illegal insn.  */
+        tcg_out_arithc(s, args[0], args[1], args[2] & 63, const_args[2], c);
+        break;
     case INDEX_op_shr_i64:
         c = SHIFT_SRLX;
-        goto gen_arith;
+        goto do_shift64;
     case INDEX_op_sar_i64:
         c = SHIFT_SRAX;
-        goto gen_arith;
+        goto do_shift64;
     case INDEX_op_mul_i64:
         c = ARITH_MULX;
         goto gen_arith;
-- 
1.7.7.6


* [Qemu-devel] [PATCH 12/14] tcg-sparc: Use defines for temporaries.
  2012-03-28  0:32 [Qemu-devel] [PATCH 00/14] tcg-sparc improvments, v2 Richard Henderson
                   ` (10 preceding siblings ...)
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 11/14] tcg-sparc: Mask shift immediates to avoid illegal insns Richard Henderson
@ 2012-03-28  0:32 ` Richard Henderson
  2012-03-29 18:56   ` Blue Swirl
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 13/14] tcg-sparc: Add %g/%o registers to alloc_order Richard Henderson
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 14/14] tcg-sparc: Fix and enable direct TB chaining Richard Henderson
  13 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2012-03-28  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel

Also move the temporaries from %i4/%i5 to %o7/%g1 (T2 and T1
respectively), which removes a v8plus FIXME.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c |  114 ++++++++++++++++++++++++-----------------------
 1 files changed, 58 insertions(+), 56 deletions(-)
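
Two of the sequences touched below build a 64-bit value from 32-bit halves with a shift and an OR. The sketch models them in portable C. The function names are invented for illustration, and it assumes that tcg_out_movi_imm32 leaves a zero-extended value in its destination, which is my reading of the code rather than something this patch states.

```c
#include <assert.h>
#include <stdint.h>

/* Models the final branch of tcg_out_movi:
     movi_imm32 ret, arg >> 32
     sllx       ret, 32, ret
     movi_imm32 T2, arg            (low half, assumed zero-extended)
     or         ret, T2, ret  */
static uint64_t build_const64(uint64_t arg)
{
    uint64_t ret = (uint32_t)(arg >> 32);
    uint64_t t2 = (uint32_t)arg;

    ret <<= 32;
    return ret | t2;
}

/* Models the 64-bit store on a 32-bit host in tcg_out_qemu_st:
     srl  datalo, 0, T1            (clears bits 63..32)
     sllx datahi, 32, O2
     or   T1, O2, O2  */
static uint64_t combine_halves(uint64_t datahi, uint64_t datalo)
{
    uint64_t t1 = (uint32_t)datalo;
    uint64_t o2 = datahi << 32;

    return t1 | o2;
}

static void temporaries_demo(void)
{
    assert(build_const64(UINT64_C(0x123456789abcdef0))
           == UINT64_C(0x123456789abcdef0));
    /* Garbage in the upper half of datalo must not leak into the result.  */
    assert(combine_halves(UINT64_C(0xdeadbeef), UINT64_C(0xffffffff12345678))
           == UINT64_C(0xdeadbeef12345678));
}
```

Both sequences need the scratch value in a %g or %o register on a v8plus host, since only those are guaranteed to keep their upper 32 bits, which is presumably why the temporaries move out of the %i registers.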

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 5b3cde4..9e822f3 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -59,8 +59,12 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 };
 #endif
 
+/* Define some temporary registers.  T2 is used for constant generation.  */
+#define TCG_REG_T1  TCG_REG_G1
+#define TCG_REG_T2  TCG_REG_O7
+
 #ifdef CONFIG_USE_GUEST_BASE
-# define TCG_GUEST_BASE_REG TCG_REG_I3
+# define TCG_GUEST_BASE_REG TCG_REG_I5
 #else
 # define TCG_GUEST_BASE_REG TCG_REG_G0
 #endif
@@ -85,6 +89,7 @@ static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_I2,
     TCG_REG_I3,
     TCG_REG_I4,
+    TCG_REG_I5,
 };
 
 static const int tcg_target_call_iarg_regs[6] = {
@@ -372,10 +377,10 @@ static inline void tcg_out_movi(TCGContext *s, TCGType type,
         tcg_out_sethi(s, ret, ~arg);
         tcg_out_arithi(s, ret, ret, (arg & 0x3ff) | -0x400, ARITH_XOR);
     } else {
-        tcg_out_movi_imm32(s, TCG_REG_I4, arg >> (TCG_TARGET_REG_BITS / 2));
-        tcg_out_arithi(s, TCG_REG_I4, TCG_REG_I4, 32, SHIFT_SLLX);
-        tcg_out_movi_imm32(s, ret, arg);
-        tcg_out_arith(s, ret, ret, TCG_REG_I4, ARITH_OR);
+        tcg_out_movi_imm32(s, ret, arg >> (TCG_TARGET_REG_BITS / 2));
+        tcg_out_arithi(s, ret, ret, 32, SHIFT_SLLX);
+        tcg_out_movi_imm32(s, TCG_REG_T2, arg);
+        tcg_out_arith(s, ret, ret, TCG_REG_T2, ARITH_OR);
     }
 }
 
@@ -392,8 +397,8 @@ static inline void tcg_out_ldst(TCGContext *s, int ret, int addr,
         tcg_out32(s, op | INSN_RD(ret) | INSN_RS1(addr) |
                   INSN_IMM13(offset));
     } else {
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I5, offset);
-        tcg_out_ldst_rr(s, ret, addr, TCG_REG_I5, op);
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_T1, offset);
+        tcg_out_ldst_rr(s, ret, addr, TCG_REG_T1, op);
     }
 }
 
@@ -435,8 +440,8 @@ static inline void tcg_out_addi(TCGContext *s, int reg, tcg_target_long val)
         if (check_fit_tl(val, 13))
             tcg_out_arithi(s, reg, reg, val, ARITH_ADD);
         else {
-            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I5, val);
-            tcg_out_arith(s, reg, reg, TCG_REG_I5, ARITH_ADD);
+            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_T1, val);
+            tcg_out_arith(s, reg, reg, TCG_REG_T1, ARITH_ADD);
         }
     }
 }
@@ -448,8 +453,8 @@ static inline void tcg_out_andi(TCGContext *s, int rd, int rs,
         if (check_fit_tl(val, 13))
             tcg_out_arithi(s, rd, rs, val, ARITH_AND);
         else {
-            tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_I5, val);
-            tcg_out_arith(s, rd, rs, TCG_REG_I5, ARITH_AND);
+            tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_T1, val);
+            tcg_out_arith(s, rd, rs, TCG_REG_T1, ARITH_AND);
         }
     }
 }
@@ -461,8 +466,8 @@ static void tcg_out_div32(TCGContext *s, int rd, int rs1,
     if (uns) {
         tcg_out_sety(s, TCG_REG_G0);
     } else {
-        tcg_out_arithi(s, TCG_REG_I5, rs1, 31, SHIFT_SRA);
-        tcg_out_sety(s, TCG_REG_I5);
+        tcg_out_arithi(s, TCG_REG_T1, rs1, 31, SHIFT_SRA);
+        tcg_out_sety(s, TCG_REG_T1);
     }
 
     tcg_out_arithc(s, rd, rs1, val2, val2const,
@@ -608,8 +613,8 @@ static void tcg_out_setcond_i32(TCGContext *s, TCGCond cond, TCGArg ret,
     case TCG_COND_GTU:
     case TCG_COND_GEU:
         if (c2const && c2 != 0) {
-            tcg_out_movi_imm13(s, TCG_REG_I5, c2);
-            c2 = TCG_REG_I5;
+            tcg_out_movi_imm13(s, TCG_REG_T1, c2);
+            c2 = TCG_REG_T1;
         }
         t = c1, c1 = c2, c2 = t, c2const = 0;
         cond = tcg_swap_cond(cond);
@@ -656,15 +661,15 @@ static void tcg_out_setcond2_i32(TCGContext *s, TCGCond cond, TCGArg ret,
 
     switch (cond) {
     case TCG_COND_EQ:
-        tcg_out_setcond_i32(s, TCG_COND_EQ, TCG_REG_I5, al, bl, blconst);
+        tcg_out_setcond_i32(s, TCG_COND_EQ, TCG_REG_T1, al, bl, blconst);
         tcg_out_setcond_i32(s, TCG_COND_EQ, ret, ah, bh, bhconst);
-        tcg_out_arith(s, ret, ret, TCG_REG_I5, ARITH_AND);
+        tcg_out_arith(s, ret, ret, TCG_REG_T1, ARITH_AND);
         break;
 
     case TCG_COND_NE:
-        tcg_out_setcond_i32(s, TCG_COND_NE, TCG_REG_I5, al, al, blconst);
+        tcg_out_setcond_i32(s, TCG_COND_NE, TCG_REG_T1, al, al, blconst);
         tcg_out_setcond_i32(s, TCG_COND_NE, ret, ah, bh, bhconst);
-        tcg_out_arith(s, ret, ret, TCG_REG_I5, ARITH_OR);
+        tcg_out_arith(s, ret, ret, TCG_REG_T1, ARITH_OR);
         break;
 
     default:
@@ -964,8 +969,8 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
 #else
     addr_reg = args[addrlo_idx];
     if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
-        tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
-        addr_reg = TCG_REG_I5;
+        tcg_out_arithi(s, TCG_REG_T1, addr_reg, 0, SHIFT_SRL);
+        addr_reg = TCG_REG_T1;
     }
     if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
         int reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
@@ -1008,12 +1013,11 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
                                 offsetof(CPUTLBEntry, addr_write));
 
     if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
-        /* Reconstruct the full 64-bit value in %g1, using %o2 as temp.  */
-        /* ??? Redefine the temps from %i4/%i5 so that we have a o/g temp. */
-        tcg_out_arithi(s, TCG_REG_G1, datalo, 0, SHIFT_SRL);
+        /* Reconstruct the full 64-bit value.  */
+        tcg_out_arithi(s, TCG_REG_T1, datalo, 0, SHIFT_SRL);
         tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
-        tcg_out_arith(s, TCG_REG_G1, TCG_REG_G1, TCG_REG_O2, ARITH_OR);
-        datalo = TCG_REG_G1;
+        tcg_out_arith(s, TCG_REG_O2, TCG_REG_T1, TCG_REG_O2, ARITH_OR);
+        datalo = TCG_REG_O2;
     }
 
     /* The fast path is exactly one insn.  Thus we can perform the entire
@@ -1054,16 +1058,14 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
 #else
     addr_reg = args[addrlo_idx];
     if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
-        tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
-        addr_reg = TCG_REG_I5;
+        tcg_out_arithi(s, TCG_REG_T1, addr_reg, 0, SHIFT_SRL);
+        addr_reg = TCG_REG_T1;
     }
     if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
-        /* Reconstruct the full 64-bit value in %g1, using %o2 as temp.  */
-        /* ??? Redefine the temps from %i4/%i5 so that we have a o/g temp. */
-        tcg_out_arithi(s, TCG_REG_G1, datalo, 0, SHIFT_SRL);
+        tcg_out_arithi(s, TCG_REG_T1, datalo, 0, SHIFT_SRL);
         tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
-        tcg_out_arith(s, TCG_REG_G1, TCG_REG_G1, TCG_REG_O2, ARITH_OR);
-        datalo = TCG_REG_G1;
+        tcg_out_arith(s, TCG_REG_O2, TCG_REG_T1, TCG_REG_O2, ARITH_OR);
+        datalo = TCG_REG_O2;
     }
     tcg_out_ldst_rr(s, datalo, addr_reg,
                     (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
@@ -1087,28 +1089,28 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_goto_tb:
         if (s->tb_jmp_offset) {
             /* direct jump method */
-            tcg_out_sethi(s, TCG_REG_I5, args[0] & 0xffffe000);
-            tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_I5) |
+            tcg_out_sethi(s, TCG_REG_T1, args[0] & 0xffffe000);
+            tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_T1) |
                       INSN_IMM13((args[0] & 0x1fff)));
             s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
         } else {
             /* indirect jump method */
-            tcg_out_ld_ptr(s, TCG_REG_I5, (tcg_target_long)(s->tb_next + args[0]));
-            tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_I5) |
+            tcg_out_ld_ptr(s, TCG_REG_T1, (tcg_target_long)(s->tb_next + args[0]));
+            tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_T1) |
                       INSN_RS2(TCG_REG_G0));
         }
         tcg_out_nop(s);
         s->tb_next_offset[args[0]] = s->code_ptr - s->code_buf;
         break;
     case INDEX_op_call:
-        if (const_args[0])
+        if (const_args[0]) {
             tcg_out32(s, CALL | ((((tcg_target_ulong)args[0]
                                    - (tcg_target_ulong)s->code_ptr) >> 2)
                                  & 0x3fffffff));
-        else {
-            tcg_out_ld_ptr(s, TCG_REG_I5,
+        } else {
+            tcg_out_ld_ptr(s, TCG_REG_T1,
                            (tcg_target_long)(s->tb_next + args[0]));
-            tcg_out32(s, JMPL | INSN_RD(TCG_REG_O7) | INSN_RS1(TCG_REG_I5) |
+            tcg_out32(s, JMPL | INSN_RD(TCG_REG_O7) | INSN_RS1(TCG_REG_T1) |
                       INSN_RS2(TCG_REG_G0));
         }
         /* delay slot */
@@ -1214,11 +1216,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 
     case INDEX_op_rem_i32:
     case INDEX_op_remu_i32:
-        tcg_out_div32(s, TCG_REG_I5, args[1], args[2], const_args[2],
+        tcg_out_div32(s, TCG_REG_T1, args[1], args[2], const_args[2],
                       opc == INDEX_op_remu_i32);
-        tcg_out_arithc(s, TCG_REG_I5, TCG_REG_I5, args[2], const_args[2],
+        tcg_out_arithc(s, TCG_REG_T1, TCG_REG_T1, args[2], const_args[2],
                        ARITH_UMUL);
-        tcg_out_arith(s, args[0], args[1], TCG_REG_I5, ARITH_SUB);
+        tcg_out_arith(s, args[0], args[1], TCG_REG_T1, ARITH_SUB);
         break;
 
     case INDEX_op_brcond_i32:
@@ -1335,11 +1337,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         goto gen_arith;
     case INDEX_op_rem_i64:
     case INDEX_op_remu_i64:
-        tcg_out_arithc(s, TCG_REG_I5, args[1], args[2], const_args[2],
+        tcg_out_arithc(s, TCG_REG_T1, args[1], args[2], const_args[2],
                        opc == INDEX_op_rem_i64 ? ARITH_SDIVX : ARITH_UDIVX);
-        tcg_out_arithc(s, TCG_REG_I5, TCG_REG_I5, args[2], const_args[2],
+        tcg_out_arithc(s, TCG_REG_T1, TCG_REG_T1, args[2], const_args[2],
                        ARITH_MULX);
-        tcg_out_arith(s, args[0], args[1], TCG_REG_I5, ARITH_SUB);
+        tcg_out_arith(s, args[0], args[1], TCG_REG_T1, ARITH_SUB);
         break;
     case INDEX_op_ext32s_i64:
         if (const_args[1]) {
@@ -1537,15 +1539,15 @@ static void tcg_target_init(TCGContext *s)
                      (1 << TCG_REG_O7));
 
     tcg_regset_clear(s->reserved_regs);
-    tcg_regset_set_reg(s->reserved_regs, TCG_REG_G0);
-#if TCG_TARGET_REG_BITS == 64
-    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I4); // for internal use
-#endif
-    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I5); // for internal use
-    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I6);
-    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I7);
-    tcg_regset_set_reg(s->reserved_regs, TCG_REG_O6);
-    tcg_regset_set_reg(s->reserved_regs, TCG_REG_O7);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_G0); /* zero */
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_G6); /* reserved for os */
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_G7); /* thread pointer */
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I6); /* frame pointer */
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I7); /* return address */
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_O6); /* stack pointer */
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_T1); /* for internal use */
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_T2); /* for internal use */
+
     tcg_add_target_add_op_defs(sparc_op_defs);
 }
 
-- 
1.7.7.6


* [Qemu-devel] [PATCH 13/14] tcg-sparc: Add %g/%o registers to alloc_order
  2012-03-28  0:32 [Qemu-devel] [PATCH 00/14] tcg-sparc improvments, v2 Richard Henderson
                   ` (11 preceding siblings ...)
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 12/14] tcg-sparc: Use defines for temporaries Richard Henderson
@ 2012-03-28  0:32 ` Richard Henderson
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 14/14] tcg-sparc: Fix and enable direct TB chaining Richard Henderson
  13 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2012-03-28  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel


Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 9e822f3..72d65cb 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -84,12 +84,25 @@ static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_L5,
     TCG_REG_L6,
     TCG_REG_L7,
+
     TCG_REG_I0,
     TCG_REG_I1,
     TCG_REG_I2,
     TCG_REG_I3,
     TCG_REG_I4,
     TCG_REG_I5,
+
+    TCG_REG_G2,
+    TCG_REG_G3,
+    TCG_REG_G4,
+    TCG_REG_G5,
+
+    TCG_REG_O0,
+    TCG_REG_O1,
+    TCG_REG_O2,
+    TCG_REG_O3,
+    TCG_REG_O4,
+    TCG_REG_O5,
 };
 
 static const int tcg_target_call_iarg_regs[6] = {
-- 
1.7.7.6


* [Qemu-devel] [PATCH 14/14] tcg-sparc: Fix and enable direct TB chaining.
  2012-03-28  0:32 [Qemu-devel] [PATCH 00/14] tcg-sparc improvments, v2 Richard Henderson
                   ` (12 preceding siblings ...)
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 13/14] tcg-sparc: Add %g/%o registers to alloc_order Richard Henderson
@ 2012-03-28  0:32 ` Richard Henderson
  13 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2012-03-28  0:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: blauwirbel


Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 exec-all.h             |    9 ++++++---
 tcg/sparc/tcg-target.c |   19 ++++++++++++++++---
 2 files changed, 22 insertions(+), 6 deletions(-)
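
A sketch of the instruction patching that tb_set_jmp_target1 performs below, reduced to the encoding step. It assumes the standard SPARC CALL format (op = 01 in bits 31:30 followed by a signed 30-bit word displacement). The names SPARC_CALL_OP and encode_call are illustrative and not taken from QEMU. Division is used instead of a right shift so that negative displacements do not rely on implementation-defined behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SPARC_CALL_OP 0x40000000u   /* op field = 01 in bits 31:30 */

/* Encode "call addr" for an instruction located at jmp_addr.  Returns
   false if the target is out of reach.  Both addresses are assumed to
   be 4-byte aligned.  */
static bool encode_call(uint64_t jmp_addr, uint64_t addr, uint32_t *insn)
{
    int64_t disp;

    if (addr >= jmp_addr) {
        disp = (int64_t)((addr - jmp_addr) / 4);
    } else {
        disp = -(int64_t)((jmp_addr - addr) / 4);
    }
    /* disp30 is a signed 30-bit word count: -2^29 .. 2^29 - 1.  */
    if (disp < -(INT64_C(1) << 29) || disp >= (INT64_C(1) << 29)) {
        return false;
    }
    *insn = SPARC_CALL_OP | ((uint32_t)disp & 0x3fffffffu);
    return true;
}

static void encode_call_demo(void)
{
    uint32_t insn = 0;
    bool ok;

    /* Eight bytes ahead: same word as the CALL | (8 >> 2) placeholder.  */
    ok = encode_call(0x1000, 0x1008, &insn);
    assert(ok && insn == 0x40000002u);

    /* Backward target: the displacement wraps into the 30-bit field.  */
    ok = encode_call(0x2000, 0x1000, &insn);
    assert(ok && insn == 0x7ffffc00u);

    /* 8GB away does not fit, matching the tcg_abort() path.  */
    ok = encode_call(0, UINT64_C(0x200000000), &insn);
    assert(!ok);
}
```

A 30-bit word displacement covers 2GB in either direction, which lines up with the comment in the patch about the size of code_gen_buffer on 64-bit hosts.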

diff --git a/exec-all.h b/exec-all.h
index 93a5b22..f7d4708 100644
--- a/exec-all.h
+++ b/exec-all.h
@@ -120,9 +120,10 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr,
 #define CODE_GEN_AVG_BLOCK_SIZE 64
 #endif
 
-#if defined(_ARCH_PPC) || defined(__x86_64__) || defined(__arm__) || defined(__i386__)
-#define USE_DIRECT_JUMP
-#elif defined(CONFIG_TCG_INTERPRETER)
+#if defined(__arm__) || defined(_ARCH_PPC) \
+    || defined(__x86_64__) || defined(__i386__) \
+    || defined(__sparc__) \
+    || defined(CONFIG_TCG_INTERPRETER)
 #define USE_DIRECT_JUMP
 #endif
 
@@ -232,6 +233,8 @@ static inline void tb_set_jmp_target1(unsigned long jmp_addr, unsigned long addr
     __asm __volatile__ ("swi 0x9f0002" : : "r" (_beg), "r" (_end), "r" (_flg));
 #endif
 }
+#elif defined(__sparc__)
+extern void tb_set_jmp_target1(unsigned long jmp_addr, unsigned long addr);
 #else
 #error tb_set_jmp_target1 is missing
 #endif
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 72d65cb..ac214e6 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1102,10 +1102,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_goto_tb:
         if (s->tb_jmp_offset) {
             /* direct jump method */
-            tcg_out_sethi(s, TCG_REG_T1, args[0] & 0xffffe000);
-            tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_T1) |
-                      INSN_IMM13((args[0] & 0x1fff)));
             s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
+            tcg_out32(s, CALL | (8 >> 2));
         } else {
             /* indirect jump method */
             tcg_out_ld_ptr(s, TCG_REG_T1, (tcg_target_long)(s->tb_next + args[0]));
@@ -1624,3 +1622,18 @@ void tcg_register_jit(void *buf, size_t buf_size)
 
     tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
 }
+
+void tb_set_jmp_target1(unsigned long jmp_addr, unsigned long addr)
+{
+    uint32_t *ptr = (uint32_t *)jmp_addr;
+    tcg_target_long disp = (tcg_target_long)(addr - jmp_addr) >> 2;
+
+    /* We can reach the entire address space for 32-bit.  For 64-bit
+       the code_gen_buffer can't be larger than 2GB.  */
+    if (TCG_TARGET_REG_BITS == 64 && !check_fit_tl(disp, 30)) {
+        tcg_abort();
+    }
+
+    *ptr = CALL | (disp & 0x3fffffff);
+    flush_icache_range(jmp_addr, jmp_addr + 4);
+}
-- 
1.7.7.6


* Re: [Qemu-devel] [PATCH 03/14] tcg-sparc: Assume v9 cpu always, i.e. force v8plus in 32-bit mode.
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 03/14] tcg-sparc: Assume v9 cpu always, i.e. force v8plus in 32-bit mode Richard Henderson
@ 2012-03-29 18:45   ` Blue Swirl
  2012-03-29 18:49     ` Richard Henderson
  0 siblings, 1 reply; 21+ messages in thread
From: Blue Swirl @ 2012-03-29 18:45 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Wed, Mar 28, 2012 at 00:32, Richard Henderson <rth@twiddle.net> wrote:
> Current code doesn't actually work in 32-bit mode at all.  Since
> no one really noticed, drop the complication of v7 and v8 cpus.
> Eliminate the --sparc_cpu configure option and standardize macro
> testing on TCG_TARGET_REG_BITS / HOST_LONG_BITS
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  configure              |   41 ++++-------------------------------------
>  disas.c                |    6 ------
>  dyngen-exec.h          |    4 +---
>  exec.c                 |   12 +++++-------
>  qemu-timer.h           |    8 +++++---
>  tcg/sparc/tcg-target.c |   20 +++++---------------
>  tcg/sparc/tcg-target.h |    7 ++++---
>  tcg/tcg.c              |    3 ++-
>  8 files changed, 26 insertions(+), 75 deletions(-)
>
> diff --git a/configure b/configure
> index 80ca430..7741ba9 100755
> --- a/configure
> +++ b/configure
> @@ -86,7 +86,6 @@ source_path=`dirname "$0"`
>  cpu=""
>  interp_prefix="/usr/gnemul/qemu-%M"
>  static="no"
> -sparc_cpu=""
>  cross_prefix=""
>  audio_drv_list=""
>  audio_card_list="ac97 es1370 sb16 hda"
> @@ -216,21 +215,6 @@ for opt do
>   ;;
>   --disable-debug-info) debug_info="no"
>   ;;
> -  --sparc_cpu=*)
> -    sparc_cpu="$optarg"
> -    case $sparc_cpu in
> -    v7|v8|v8plus|v8plusa)
> -      cpu="sparc"
> -    ;;
> -    v9)
> -      cpu="sparc64"
> -    ;;
> -    *)
> -      echo "undefined SPARC architecture. Exiting";
> -      exit 1
> -    ;;
> -    esac
> -  ;;
>   esac
>  done
>  # OS specific
> @@ -284,8 +268,6 @@ elif check_define __i386__ ; then
>  elif check_define __x86_64__ ; then
>   cpu="x86_64"
>  elif check_define __sparc__ ; then
> -  # We can't check for 64 bit (when gcc is biarch) or V8PLUSA
> -  # They must be specified using --sparc_cpu
>   if check_define __arch64__ ; then
>     cpu="sparc64"
>   else
> @@ -749,8 +731,6 @@ for opt do
>   ;;
>   --enable-uname-release=*) uname_release="$optarg"
>   ;;
> -  --sparc_cpu=*)
> -  ;;
>   --enable-werror) werror="yes"
>   ;;
>   --disable-werror) werror="no"
> @@ -830,32 +810,19 @@ for opt do
>   esac
>  done
>
> -#
> -# If cpu ~= sparc and  sparc_cpu hasn't been defined, plug in the right
> -# QEMU_CFLAGS/LDFLAGS (assume sparc_v8plus for 32-bit and sparc_v9 for 64-bit)
> -#
>  host_guest_base="no"
>  case "$cpu" in
> -    sparc) case $sparc_cpu in
> -           v7|v8)
> -             QEMU_CFLAGS="-mcpu=${sparc_cpu} -D__sparc_${sparc_cpu}__ $QEMU_CFLAGS"
> -           ;;
> -           v8plus|v8plusa)
> -             QEMU_CFLAGS="-mcpu=ultrasparc -D__sparc_${sparc_cpu}__ $QEMU_CFLAGS"
> -           ;;
> -           *) # sparc_cpu not defined in the command line
> -             QEMU_CFLAGS="-mcpu=ultrasparc -D__sparc_v8plus__ $QEMU_CFLAGS"
> -           esac
> +    sparc)
>            LDFLAGS="-m32 $LDFLAGS"
> -           QEMU_CFLAGS="-m32 -ffixed-g2 -ffixed-g3 $QEMU_CFLAGS"
> +           QEMU_CFLAGS="-m32 -mcpu=ultrasparc $QEMU_CFLAGS"
> +           QEMU_CFLAGS="-ffixed-g2 -ffixed-g3 $QEMU_CFLAGS"
>            if test "$solaris" = "no" ; then
>              QEMU_CFLAGS="-ffixed-g1 -ffixed-g6 $QEMU_CFLAGS"
> -             helper_cflags="-ffixed-i0"
>            fi
>            ;;
>     sparc64)
> -           QEMU_CFLAGS="-m64 -mcpu=ultrasparc -D__sparc_v9__ $QEMU_CFLAGS"
>            LDFLAGS="-m64 $LDFLAGS"
> +           QEMU_CFLAGS="-m64 -mcpu=ultrasparc $QEMU_CFLAGS"
>            QEMU_CFLAGS="-ffixed-g5 -ffixed-g6 -ffixed-g7 $QEMU_CFLAGS"
>            if test "$solaris" != "no" ; then
>              QEMU_CFLAGS="-ffixed-g1 $QEMU_CFLAGS"
> diff --git a/disas.c b/disas.c
> index 4945c44..b3434fa 100644
> --- a/disas.c
> +++ b/disas.c
> @@ -175,9 +175,7 @@ void target_disas(FILE *out, target_ulong code, target_ulong size, int flags)
>        print_insn = print_insn_arm;
>  #elif defined(TARGET_SPARC)
>     print_insn = print_insn_sparc;
> -#ifdef TARGET_SPARC64
>     disasm_info.mach = bfd_mach_sparc_v9b;
> -#endif

This is not OK: it would change the ASI printout for V8 guest code.

>  #elif defined(TARGET_PPC)
>     if (flags >> 16)
>         disasm_info.endian = BFD_ENDIAN_LITTLE;
> @@ -287,9 +285,7 @@ void disas(FILE *out, void *code, unsigned long size)
>     print_insn = print_insn_alpha;
>  #elif defined(__sparc__)
>     print_insn = print_insn_sparc;
> -#if defined(__sparc_v8plus__) || defined(__sparc_v8plusa__) || defined(__sparc_v9__)
>     disasm_info.mach = bfd_mach_sparc_v9b;
> -#endif

This change is OK; it's for the Sparc V9 host.

>  #elif defined(__arm__)
>     print_insn = print_insn_arm;
>  #elif defined(__MIPSEB__)
> @@ -397,9 +393,7 @@ void monitor_disas(Monitor *mon, CPUArchState *env,
>     print_insn = print_insn_alpha;
>  #elif defined(TARGET_SPARC)
>     print_insn = print_insn_sparc;
> -#ifdef TARGET_SPARC64
>     disasm_info.mach = bfd_mach_sparc_v9b;
> -#endif

This is again for guest code disassembly (from the monitor), which
could be V8, so it is not OK.

>  #elif defined(TARGET_PPC)
>  #ifdef TARGET_PPC64
>     disasm_info.mach = bfd_mach_ppc64;
> diff --git a/dyngen-exec.h b/dyngen-exec.h
> index 083e20b..cfeef99 100644
> --- a/dyngen-exec.h
> +++ b/dyngen-exec.h
> @@ -39,13 +39,11 @@
>  #elif defined(__sparc__)
>  #ifdef CONFIG_SOLARIS
>  #define AREG0 "g2"
> -#else
> -#ifdef __sparc_v9__
> +#elif HOST_LONG_BITS == 64
>  #define AREG0 "g5"
>  #else
>  #define AREG0 "g6"
>  #endif
> -#endif
>  #elif defined(__s390__)
>  #define AREG0 "r10"
>  #elif defined(__alpha__)
> diff --git a/exec.c b/exec.c
> index 6731ab8..ad13ce1 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -86,7 +86,7 @@ static int nb_tbs;
>  /* any access to the tbs or the page table must use this lock */
>  spinlock_t tb_lock = SPIN_LOCK_UNLOCKED;
>
> -#if defined(__arm__) || defined(__sparc_v9__)
> +#if defined(__arm__) || defined(__sparc__)
>  /* The prologue must be reachable with a direct jump. ARM and Sparc64
>  have limited branch ranges (possibly also PPC) so place it in a
>  section close to code segment. */
> @@ -559,10 +559,9 @@ static void code_gen_alloc(unsigned long tb_size)
>         /* Cannot map more than that */
>         if (code_gen_buffer_size > (800 * 1024 * 1024))
>             code_gen_buffer_size = (800 * 1024 * 1024);
> -#elif defined(__sparc_v9__)
> +#elif defined(__sparc__) && HOST_LONG_BITS == 64
>         // Map the buffer below 2G, so we can use direct calls and branches
> -        flags |= MAP_FIXED;
> -        start = (void *) 0x60000000UL;
> +        start = (void *) 0x40000000UL;
>         if (code_gen_buffer_size > (512 * 1024 * 1024))
>             code_gen_buffer_size = (512 * 1024 * 1024);
>  #elif defined(__arm__)
> @@ -600,10 +599,9 @@ static void code_gen_alloc(unsigned long tb_size)
>         /* Cannot map more than that */
>         if (code_gen_buffer_size > (800 * 1024 * 1024))
>             code_gen_buffer_size = (800 * 1024 * 1024);
> -#elif defined(__sparc_v9__)
> +#elif defined(__sparc__) && HOST_LONG_BITS == 64
>         // Map the buffer below 2G, so we can use direct calls and branches
> -        flags |= MAP_FIXED;
> -        addr = (void *) 0x60000000UL;
> +        addr = (void *) 0x40000000UL;
>         if (code_gen_buffer_size > (512 * 1024 * 1024)) {
>             code_gen_buffer_size = (512 * 1024 * 1024);
>         }
> diff --git a/qemu-timer.h b/qemu-timer.h
> index de17f3b..b730427 100644
> --- a/qemu-timer.h
> +++ b/qemu-timer.h
> @@ -221,7 +221,7 @@ static inline int64_t cpu_get_real_ticks(void)
>     return val;
>  }
>
> -#elif defined(__sparc_v8plus__) || defined(__sparc_v8plusa__) || defined(__sparc_v9__)
> +#elif defined(__sparc__)
>
>  static inline int64_t cpu_get_real_ticks (void)
>  {
> @@ -230,6 +230,8 @@ static inline int64_t cpu_get_real_ticks (void)
>     asm volatile("rd %%tick,%0" : "=r"(rval));
>     return rval;
>  #else
> +    /* We need an %o or %g register for this.  For recent enough gcc
> +       there is an "h" constraint for that.  Don't bother with that.  */
>     union {
>         uint64_t i64;
>         struct {
> @@ -237,8 +239,8 @@ static inline int64_t cpu_get_real_ticks (void)
>             uint32_t low;
>         }       i32;
>     } rval;
> -    asm volatile("rd %%tick,%1; srlx %1,32,%0"
> -                 : "=r"(rval.i32.high), "=r"(rval.i32.low));
> +    asm volatile("rd %%tick,%%g1; srlx %%g1,32,%0; mov %%g1,%1"
> +                 : "=r"(rval.i32.high), "=r"(rval.i32.low) : : "g1");
>     return rval.i64;
>  #endif
>  }
> diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
> index 358a70c..38be0c8 100644
> --- a/tcg/sparc/tcg-target.c
> +++ b/tcg/sparc/tcg-target.c
> @@ -627,18 +627,10 @@ static void tcg_out_setcond_i32(TCGContext *s, TCGCond cond, TCGArg ret,
>
>     default:
>         tcg_out_cmp(s, c1, c2, c2const);
> -#if defined(__sparc_v9__) || defined(__sparc_v8plus__)
>         tcg_out_movi_imm13(s, ret, 0);
> -        tcg_out32 (s, ARITH_MOVCC | INSN_RD(ret)
> -                   | INSN_RS1(tcg_cond_to_bcond[cond])
> -                   | MOVCC_ICC | INSN_IMM11(1));
> -#else
> -        t = gen_new_label();
> -        tcg_out_branch_i32(s, INSN_COND(tcg_cond_to_bcond[cond], 1), t);
> -        tcg_out_movi_imm13(s, ret, 1);
> -        tcg_out_movi_imm13(s, ret, 0);
> -        tcg_out_label(s, t, s->code_ptr);
> -#endif
> +        tcg_out32(s, ARITH_MOVCC | INSN_RD(ret)
> +                  | INSN_RS1(tcg_cond_to_bcond[cond])
> +                  | MOVCC_ICC | INSN_IMM11(1));
>         return;
>     }
>
> @@ -768,7 +760,7 @@ static const void * const qemu_st_helpers[4] = {
>  #endif
>  #endif
>
> -#ifdef __arch64__
> +#if TCG_TARGET_REG_BITS == 64
>  #define HOST_LD_OP LDX
>  #define HOST_ST_OP STX
>  #define HOST_SLL_OP SHIFT_SLLX
> @@ -1630,11 +1622,9 @@ static void tcg_target_init(TCGContext *s)
>
>  #if TCG_TARGET_REG_BITS == 64
>  # define ELF_HOST_MACHINE  EM_SPARCV9
> -#elif defined(__sparc_v8plus__)
> +#else
>  # define ELF_HOST_MACHINE  EM_SPARC32PLUS
>  # define ELF_HOST_FLAGS    EF_SPARC_32PLUS
> -#else
> -# define ELF_HOST_MACHINE  EM_SPARC
>  #endif
>
>  typedef struct {
> diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
> index ee2274d..56742bf 100644
> --- a/tcg/sparc/tcg-target.h
> +++ b/tcg/sparc/tcg-target.h
> @@ -67,7 +67,8 @@ typedef enum {
>
>  /* used for function call generation */
>  #define TCG_REG_CALL_STACK TCG_REG_I6
> -#ifdef __arch64__
> +
> +#if TCG_TARGET_REG_BITS == 64
>  // Reserve space for AREG0
>  #define TCG_TARGET_STACK_MINFRAME (176 + 4 * (int)sizeof(long) + \
>                                    TCG_STATIC_CALL_ARGS_SIZE)
> @@ -81,7 +82,7 @@ typedef enum {
>  #define TCG_TARGET_STACK_ALIGN 8
>  #endif
>
> -#ifdef __arch64__
> +#if TCG_TARGET_REG_BITS == 64
>  #define TCG_TARGET_EXTEND_ARGS 1
>  #endif
>
> @@ -128,7 +129,7 @@ typedef enum {
>  /* Note: must be synced with dyngen-exec.h */
>  #ifdef CONFIG_SOLARIS
>  #define TCG_AREG0 TCG_REG_G2
> -#elif defined(__sparc_v9__)
> +#elif HOST_LONG_BITS == 64
>  #define TCG_AREG0 TCG_REG_G5
>  #else
>  #define TCG_AREG0 TCG_REG_G6
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index ab589c7..9f234f4 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1457,7 +1457,8 @@ static void temp_allocate_frame(TCGContext *s, int temp)
>  {
>     TCGTemp *ts;
>     ts = &s->temps[temp];
> -#ifndef __sparc_v9__ /* Sparc64 stack is accessed with offset of 2047 */
> +#if !(defined(__sparc__) && TCG_TARGET_REG_BITS == 64)
> +    /* Sparc64 stack is accessed with offset of 2047 */
>     s->current_frame_offset = (s->current_frame_offset +
>                                (tcg_target_long)sizeof(tcg_target_long) - 1) &
>         ~(sizeof(tcg_target_long) - 1);
> --
> 1.7.7.6
>


* Re: [Qemu-devel] [PATCH 05/14] tcg-sparc: Simplify qemu_ld/st direct memory paths.
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 05/14] tcg-sparc: Simplify qemu_ld/st direct memory paths Richard Henderson
@ 2012-03-29 18:47   ` Blue Swirl
  0 siblings, 0 replies; 21+ messages in thread
From: Blue Swirl @ 2012-03-29 18:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Wed, Mar 28, 2012 at 00:32, Richard Henderson <rth@twiddle.net> wrote:
> Given that we have an opcode for all sizes, all endianness,
> turn the functions into a simple table lookup.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/sparc/tcg-target.c |  384 +++++++++++++++++++-----------------------------
>  1 files changed, 150 insertions(+), 234 deletions(-)
>
> diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
> index c74fc2c..5cea5a8 100644
> --- a/tcg/sparc/tcg-target.c
> +++ b/tcg/sparc/tcg-target.c
> @@ -294,6 +294,16 @@ static inline int tcg_target_const_match(tcg_target_long val,
>  #define ASI_PRIMARY_LITTLE 0x88
>  #endif
>
> +#define LDUH_LE    (LDUHA | INSN_ASI(ASI_PRIMARY_LITTLE))
> +#define LDSH_LE    (LDSHA | INSN_ASI(ASI_PRIMARY_LITTLE))
> +#define LDUW_LE    (LDUWA | INSN_ASI(ASI_PRIMARY_LITTLE))
> +#define LDSW_LE    (LDSWA | INSN_ASI(ASI_PRIMARY_LITTLE))
> +#define LDX_LE     (LDXA  | INSN_ASI(ASI_PRIMARY_LITTLE))
> +
> +#define STH_LE     (STHA  | INSN_ASI(ASI_PRIMARY_LITTLE))
> +#define STW_LE     (STWA  | INSN_ASI(ASI_PRIMARY_LITTLE))
> +#define STX_LE     (STXA  | INSN_ASI(ASI_PRIMARY_LITTLE))
> +
>  static inline void tcg_out_arith(TCGContext *s, int rd, int rs1, int rs2,
>                                  int op)
>  {
> @@ -366,66 +376,46 @@ static inline void tcg_out_movi(TCGContext *s, TCGType type,
>     }
>  }
>
> -static inline void tcg_out_ld_raw(TCGContext *s, int ret,
> -                                  tcg_target_long arg)
> +static inline void tcg_out_ldst_rr(TCGContext *s, int data, int a1,
> +                                   int a2, int op)
>  {
> -    tcg_out_sethi(s, ret, arg);
> -    tcg_out32(s, LDUW | INSN_RD(ret) | INSN_RS1(ret) |
> -              INSN_IMM13(arg & 0x3ff));
> +    tcg_out32(s, op | INSN_RD(data) | INSN_RS1(a1) | INSN_RS2(a2));
>  }
>
> -static inline void tcg_out_ld_ptr(TCGContext *s, int ret,
> -                                  tcg_target_long arg)
> +static inline void tcg_out_ldst(TCGContext *s, int ret, int addr,
> +                                int offset, int op)
>  {
> -    if (!check_fit_tl(arg, 10))
> -        tcg_out_movi(s, TCG_TYPE_PTR, ret, arg & ~0x3ffULL);
> -    if (TCG_TARGET_REG_BITS == 64) {
> -        tcg_out32(s, LDX | INSN_RD(ret) | INSN_RS1(ret) |
> -                  INSN_IMM13(arg & 0x3ff));
> -    } else {
> -        tcg_out32(s, LDUW | INSN_RD(ret) | INSN_RS1(ret) |
> -                  INSN_IMM13(arg & 0x3ff));
> -    }
> -}
> -
> -static inline void tcg_out_ldst(TCGContext *s, int ret, int addr, int offset, int op)
> -{
> -    if (check_fit_tl(offset, 13))
> +    if (check_fit_tl(offset, 13)) {
>         tcg_out32(s, op | INSN_RD(ret) | INSN_RS1(addr) |
>                   INSN_IMM13(offset));
> -    else {
> +    } else {
>         tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I5, offset);
> -        tcg_out32(s, op | INSN_RD(ret) | INSN_RS1(TCG_REG_I5) |
> -                  INSN_RS2(addr));
> +        tcg_out_ldst_rr(s, ret, addr, TCG_REG_I5, op);
>     }
>  }
>
> -static inline void tcg_out_ldst_asi(TCGContext *s, int ret, int addr,
> -                                    int offset, int op, int asi)
> -{
> -    tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I5, offset);
> -    tcg_out32(s, op | INSN_RD(ret) | INSN_RS1(TCG_REG_I5) |
> -              INSN_ASI(asi) | INSN_RS2(addr));
> -}
> -
>  static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
>                               TCGReg arg1, tcg_target_long arg2)
>  {
> -    if (type == TCG_TYPE_I32)
> -        tcg_out_ldst(s, ret, arg1, arg2, LDUW);
> -    else
> -        tcg_out_ldst(s, ret, arg1, arg2, LDX);
> +    tcg_out_ldst(s, ret, arg1, arg2, (type == TCG_TYPE_I32 ? LDUW : LDX));
>  }
>
>  static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
>                               TCGReg arg1, tcg_target_long arg2)
>  {
> -    if (type == TCG_TYPE_I32)
> -        tcg_out_ldst(s, arg, arg1, arg2, STW);
> -    else
> -        tcg_out_ldst(s, arg, arg1, arg2, STX);
> +    tcg_out_ldst(s, arg, arg1, arg2, (type == TCG_TYPE_I32 ? STW : STX));
> +}
> +
> +static inline void tcg_out_ld_ptr(TCGContext *s, int ret,
> +                                  tcg_target_long arg)
> +{
> +    if (!check_fit_tl(arg, 10)) {
> +        tcg_out_movi(s, TCG_TYPE_PTR, ret, arg & ~0x3ff);
> +    }
> +    tcg_out_ld(s, TCG_TYPE_PTR, ret, ret, arg & 0x3ff);
>  }
>
> +
>  static inline void tcg_out_sety(TCGContext *s, int rs)
>  {
>     tcg_out32(s, WRY | INSN_RS1(TCG_REG_G0) | INSN_RS2(rs));
> @@ -757,22 +747,16 @@ static const void * const qemu_st_helpers[4] = {
>    WHICH is the offset into the CPUTLBEntry structure of the slot to read.
>    This should be offsetof addr_read or addr_write.
>
> -   Outputs:
> -   LABEL_PTRS is filled with the position of the forward jumps to the
> -   TLB miss case.  This will always be a ,PN insn, so a 19-bit offset.
> -
> -   Returns a register loaded with the low part of the address, adjusted
> -   as indicated by the TLB and so is a host address.  Undefined in the
> -   TLB miss case.  */
> +   The result of the TLB comparison is in %[ix]cc.  The sanitized address
> +   is in the returned register, maybe %o0.  The TLB addend is in %o1.  */
>
>  static int tcg_out_tlb_load(TCGContext *s, int addrlo_idx, int mem_index,
> -                            int s_bits, const TCGArg *args,
> -                            uint32_t **label_ptr, int which)
> +                            int s_bits, const TCGArg *args, int which)
>  {
>     const int addrlo = args[addrlo_idx];
> -    const int r0 = tcg_target_call_iarg_regs[0];
> -    const int r1 = tcg_target_call_iarg_regs[1];
> -    const int r2 = tcg_target_call_iarg_regs[2];
> +    const int r0 = TCG_REG_O0;
> +    const int r1 = TCG_REG_O1;
> +    const int r2 = TCG_REG_O2;
>     int addr = addrlo;
>     int tlb_ofs;
>
> @@ -803,110 +787,39 @@ static int tcg_out_tlb_load(TCGContext *s, int addrlo_idx, int mem_index,
>         tlb_ofs = 0;
>     }
>
> -    /* ld [arg1 + which], arg2 */
> +    /* Load the tlb comparator and the addend.  */
>     tcg_out_ld(s, TCG_TYPE_TL, r2, r1, tlb_ofs + which);
> +    tcg_out_ld(s, TCG_TYPE_PTR, r1, r1, tlb_ofs+offsetof(CPUTLBEntry, addend));
>
>     /* subcc arg0, arg2, %g0 */
>     tcg_out_cmp(s, r0, r2, 0);
>
> -    /* bne,pn %[ix]cc, label0 */
> -    *label_ptr = (uint32_t *)s->code_ptr;
> -    tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_NE, 0) | INSN_OP2(0x1) |
> -                  ((TARGET_LONG_BITS == 64) << 21)));
> -
> -    /* TLB Hit.  Compute the host address into r1.  The ld is in the
> -       branch delay slot; harmless for the TLB miss case.  */
> -    tcg_out_ld(s, TCG_TYPE_PTR, r1, r1, tlb_ofs+offsetof(CPUTLBEntry, addend));
> -
> +    /* If the guest address must be zero-extended, do so now.  */
>     if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
>         tcg_out_arithi(s, r0, addrlo, 0, SHIFT_SRL);
> -        tcg_out_arith(s, r1, r0, r1, ARITH_ADD);
> -    } else {
> -        tcg_out_arith(s, r1, addrlo, r1, ARITH_ADD);
> +        return r0;
>     }
> -
> -    return r1;
> +    return addrlo;
>  }
>  #endif /* CONFIG_SOFTMMU */
>
> -static void tcg_out_qemu_ld_direct(TCGContext *s, int addr, int datalo,
> -                                   int datahi, int sizeop)
> -{
> +static const int qemu_ld_opc[8] = {
>  #ifdef TARGET_WORDS_BIGENDIAN
> -    const int bigendian = 1;
> +    LDUB, LDUH, LDUW, LDX, LDSB, LDSH, LDSW, LDX
>  #else
> -    const int bigendian = 0;
> +    LDUB, LDUH_LE, LDUW_LE, LDX_LE, LDSB, LDSH_LE, LDSW_LE, LDX_LE
>  #endif
> -    switch (sizeop) {
> -    case 0:
> -        /* ldub [addr], datalo */
> -        tcg_out_ldst(s, datalo, addr, 0, LDUB);
> -        break;
> -    case 0 | 4:
> -        /* ldsb [addr], datalo */
> -        tcg_out_ldst(s, datalo, addr, 0, LDSB);
> -        break;
> -    case 1:
> -        if (bigendian) {
> -            /* lduh [addr], datalo */
> -            tcg_out_ldst(s, datalo, addr, 0, LDUH);
> -        } else {
> -            /* lduha [addr] ASI_PRIMARY_LITTLE, datalo */
> -            tcg_out_ldst_asi(s, datalo, addr, 0, LDUHA, ASI_PRIMARY_LITTLE);
> -        }
> -        break;
> -    case 1 | 4:
> -        if (bigendian) {
> -            /* ldsh [addr], datalo */
> -            tcg_out_ldst(s, datalo, addr, 0, LDSH);
> -        } else {
> -            /* ldsha [addr] ASI_PRIMARY_LITTLE, datalo */
> -            tcg_out_ldst_asi(s, datalo, addr, 0, LDSHA, ASI_PRIMARY_LITTLE);
> -        }
> -        break;
> -    case 2:
> -        if (bigendian) {
> -            /* lduw [addr], datalo */
> -            tcg_out_ldst(s, datalo, addr, 0, LDUW);
> -        } else {
> -            /* lduwa [addr] ASI_PRIMARY_LITTLE, datalo */
> -            tcg_out_ldst_asi(s, datalo, addr, 0, LDUWA, ASI_PRIMARY_LITTLE);
> -        }
> -        break;
> -    case 2 | 4:
> -        if (bigendian) {
> -            /* ldsw [addr], datalo */
> -            tcg_out_ldst(s, datalo, addr, 0, LDSW);
> -        } else {
> -            /* ldswa [addr] ASI_PRIMARY_LITTLE, datalo */
> -            tcg_out_ldst_asi(s, datalo, addr, 0, LDSWA, ASI_PRIMARY_LITTLE);
> -        }
> -        break;
> -    case 3:
> -        if (TCG_TARGET_REG_BITS == 64) {
> -            if (bigendian) {
> -                /* ldx [addr], datalo */
> -                tcg_out_ldst(s, datalo, addr, 0, LDX);
> -            } else {
> -                /* ldxa [addr] ASI_PRIMARY_LITTLE, datalo */
> -                tcg_out_ldst_asi(s, datalo, addr, 0, LDXA, ASI_PRIMARY_LITTLE);
> -            }
> -        } else {
> -            if (bigendian) {
> -                tcg_out_ldst(s, datahi, addr, 0, LDUW);
> -                tcg_out_ldst(s, datalo, addr, 4, LDUW);
> -            } else {
> -                tcg_out_ldst_asi(s, datalo, addr, 0, LDUWA, ASI_PRIMARY_LITTLE);
> -                tcg_out_ldst_asi(s, datahi, addr, 4, LDUWA, ASI_PRIMARY_LITTLE);
> -            }
> -        }
> -        break;
> -    default:
> -        tcg_abort();
> -    }
> -}
> +};
>
> -static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
> +static const int qemu_st_opc[4] = {
> +#ifdef TARGET_WORDS_BIGENDIAN
> +    STB, STH, STW, STX
> +#else
> +    STB, STH_LE, STW_LE, STX_LE
> +#endif
> +};
> +
> +static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
>  {
>     int addrlo_idx = 1, datalo, datahi, addr_reg;
>  #if defined(CONFIG_SOFTMMU)
> @@ -915,7 +828,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
>  #endif
>
>     datahi = datalo = args[0];
> -    if (TCG_TARGET_REG_BITS == 32 && opc == 3) {
> +    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
>         datahi = args[1];
>         addrlo_idx = 2;
>     }
> @@ -923,27 +836,59 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
>  #if defined(CONFIG_SOFTMMU)
>     memi_idx = addrlo_idx + 1 + (TARGET_LONG_BITS > TCG_TARGET_REG_BITS);
>     memi = args[memi_idx];
> -    s_bits = opc & 3;
> +    s_bits = sizeop & 3;
>
>     addr_reg = tcg_out_tlb_load(s, addrlo_idx, memi, s_bits, args,
> -                                label_ptr, offsetof(CPUTLBEntry, addr_read));
> +                                offsetof(CPUTLBEntry, addr_read));
>
> -    /* TLB Hit.  */
> -    tcg_out_qemu_ld_direct(s, addr_reg, datalo, datahi, opc);
> +    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
> +        int reg64;
>
> -    /* b,pt,n label1 */
> -    label_ptr[1] = (uint32_t *)s->code_ptr;
> -    tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x1)
> -                  | (1 << 29) | (1 << 19)));
> +        /* bne,pn %[xi]cc, label0 */
> +        label_ptr[0] = (uint32_t *)s->code_ptr;
> +        tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_NE, 0) | INSN_OP2(0x1)
> +                      | ((TARGET_LONG_BITS == 64) << 21)));
> +
> +        /* TLB Hit.  */
> +        /* Load all 64-bits into an O/G register.  */
> +        reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
> +        tcg_out_ldst_rr(s, reg64, addr_reg, TCG_REG_O1, qemu_ld_opc[sizeop]);
> +
> +        /* Move the two 32-bit pieces into the destination registers.  */
> +        tcg_out_arithi(s, datahi, reg64, 32, SHIFT_SRLX);
> +        if (reg64 != datalo) {
> +            tcg_out_mov(s, TCG_TYPE_I32, datalo, reg64);
> +        }
> +
> +        /* b,pt,n label1 */
> +        label_ptr[1] = (uint32_t *)s->code_ptr;
> +        tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x1)
> +                      | (1 << 29) | (1 << 19)));
> +    } else {
> +        /* The fast path is exactly one insn.  Thus we can perform the
> +           entire TLB Hit in the (annulled) delay slot of the branch
> +           over the TLB Miss case.  */
> +
> +        /* beq,a,pt %[xi]cc, label0 */
> +        label_ptr[0] = NULL;
> +        label_ptr[1] = (uint32_t *)s->code_ptr;
> +        tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_E, 0) | INSN_OP2(0x1)
> +                      | ((TARGET_LONG_BITS == 64) << 21)
> +                      | (1 << 29) | (1 << 19)));
> +        /* delay slot */
> +        tcg_out_ldst_rr(s, datalo, addr_reg, TCG_REG_O1, qemu_ld_opc[sizeop]);
> +    }
>
>     /* TLB Miss.  */
>
> -    *label_ptr[0] |= INSN_OFF19((unsigned long)s->code_ptr -
> -                                (unsigned long)label_ptr[0]);
> -    n = 0;
> -#ifdef CONFIG_TCG_PASS_AREG0
> -    tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[n++], TCG_AREG0);
> -#endif
> +    if (label_ptr[0]) {
> +        *label_ptr[0] |= INSN_OFF19((unsigned long)s->code_ptr -
> +                                    (unsigned long)label_ptr[0]);
> +    }
> +    n = ARG_OFFSET;
> +    if (ARG_OFFSET) {
> +       tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);

WARNING: suspect code indent for conditional statements (4, 7)
#395: FILE: tcg/sparc/tcg-target.c:889:
+    if (ARG_OFFSET) {
+       tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);

WARNING: suspect code indent for conditional statements (4, 9)
#542: FILE: tcg/sparc/tcg-target.c:1013:
+    if (ARG_OFFSET) {
+         tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);

total: 0 errors, 2 warnings, 525 lines checked

> +    }
>     if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
>         tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
>                     args[addrlo_idx + 1]);
> @@ -971,7 +916,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
>
>     n = tcg_target_call_oarg_regs[0];
>     /* datalo = sign_extend(arg0) */
> -    switch(opc) {
> +    switch (sizeop) {
>     case 0 | 4:
>         /* Recall that SRA sign extends from bit 31 through bit 63.  */
>         tcg_out_arithi(s, datalo, n, 24, SHIFT_SLL);
> @@ -1008,75 +953,31 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
>         tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
>         addr_reg = TCG_REG_I5;
>     }
> -    tcg_out_qemu_ld_direct(s, addr_reg, datalo, datahi, opc);
> -#endif /* CONFIG_SOFTMMU */
> -}
> +    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
> +        int reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
>
> -static void tcg_out_qemu_st_direct(TCGContext *s, int addr, int datalo,
> -                                   int datahi, int sizeop)
> -{
> -#ifdef TARGET_WORDS_BIGENDIAN
> -    const int bigendian = 1;
> -#else
> -    const int bigendian = 0;
> -#endif
> -    switch (sizeop) {
> -    case 0:
> -        /* stb datalo, [addr] */
> -        tcg_out_ldst(s, datalo, addr, 0, STB);
> -        break;
> -    case 1:
> -        if (bigendian) {
> -            /* sth datalo, [addr] */
> -            tcg_out_ldst(s, datalo, addr, 0, STH);
> -        } else {
> -            /* stha datalo, [addr] ASI_PRIMARY_LITTLE */
> -            tcg_out_ldst_asi(s, datalo, addr, 0, STHA, ASI_PRIMARY_LITTLE);
> -        }
> -        break;
> -    case 2:
> -        if (bigendian) {
> -            /* stw datalo, [addr] */
> -            tcg_out_ldst(s, datalo, addr, 0, STW);
> -        } else {
> -            /* stwa datalo, [addr] ASI_PRIMARY_LITTLE */
> -            tcg_out_ldst_asi(s, datalo, addr, 0, STWA, ASI_PRIMARY_LITTLE);
> -        }
> -        break;
> -    case 3:
> -        if (TCG_TARGET_REG_BITS == 64) {
> -            if (bigendian) {
> -                /* stx datalo, [addr] */
> -                tcg_out_ldst(s, datalo, addr, 0, STX);
> -            } else {
> -                /* stxa datalo, [addr] ASI_PRIMARY_LITTLE */
> -                tcg_out_ldst_asi(s, datalo, addr, 0, STXA, ASI_PRIMARY_LITTLE);
> -            }
> -        } else {
> -            if (bigendian) {
> -                tcg_out_ldst(s, datahi, addr, 0, STW);
> -                tcg_out_ldst(s, datalo, addr, 4, STW);
> -            } else {
> -                tcg_out_ldst_asi(s, datalo, addr, 0, STWA, ASI_PRIMARY_LITTLE);
> -                tcg_out_ldst_asi(s, datahi, addr, 4, STWA, ASI_PRIMARY_LITTLE);
> -            }
> +        tcg_out_ldst_rr(s, reg64, addr_reg, TCG_REG_G0, qemu_ld_opc[sizeop]);
> +
> +        tcg_out_arithi(s, datahi, reg64, 32, SHIFT_SRLX);
> +        if (reg64 != datalo) {
> +            tcg_out_mov(s, TCG_TYPE_I32, datalo, reg64);
>         }
> -        break;
> -    default:
> -        tcg_abort();
> +    } else {
> +        tcg_out_ldst_rr(s, datalo, addr_reg, TCG_REG_G0, qemu_ld_opc[sizeop]);
>     }
> +#endif /* CONFIG_SOFTMMU */
>  }
>
> -static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
> +static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
>  {
>     int addrlo_idx = 1, datalo, datahi, addr_reg;
>  #if defined(CONFIG_SOFTMMU)
>     int memi_idx, memi, n;
> -    uint32_t *label_ptr[2];
> +    uint32_t *label_ptr;
>  #endif
>
>     datahi = datalo = args[0];
> -    if (TCG_TARGET_REG_BITS == 32 && opc == 3) {
> +    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
>         datahi = args[1];
>         addrlo_idx = 2;
>     }
> @@ -1085,33 +986,40 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
>     memi_idx = addrlo_idx + 1 + (TARGET_LONG_BITS > TCG_TARGET_REG_BITS);
>     memi = args[memi_idx];
>
> -    addr_reg = tcg_out_tlb_load(s, addrlo_idx, memi, opc, args,
> -                                label_ptr, offsetof(CPUTLBEntry, addr_write));
> +    addr_reg = tcg_out_tlb_load(s, addrlo_idx, memi, sizeop, args,
> +                                offsetof(CPUTLBEntry, addr_write));
>
> -    /* TLB Hit.  */
> -    tcg_out_qemu_st_direct(s, addr_reg, datalo, datahi, opc);
> +    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
> +        /* Reconstruct the full 64-bit value in %g1, using %o2 as temp.  */
> +        /* ??? Redefine the temps from %i4/%i5 so that we have a o/g temp. */
> +        tcg_out_arithi(s, TCG_REG_G1, datalo, 0, SHIFT_SRL);
> +        tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
> +        tcg_out_arith(s, TCG_REG_G1, TCG_REG_G1, TCG_REG_O2, ARITH_OR);
> +        datalo = TCG_REG_G1;
> +    }
>
> -    /* b,pt,n label1 */
> -    label_ptr[1] = (uint32_t *)s->code_ptr;
> -    tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x1)
> +    /* The fast path is exactly one insn.  Thus we can perform the entire
> +       TLB Hit in the (annulled) delay slot of the branch over TLB Miss.  */
> +    /* beq,a,pt %[xi]cc, label0 */
> +    label_ptr = (uint32_t *)s->code_ptr;
> +    tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_E, 0) | INSN_OP2(0x1)
> +                  | ((TARGET_LONG_BITS == 64) << 21)
>                   | (1 << 29) | (1 << 19)));
> +    /* delay slot */
> +    tcg_out_ldst_rr(s, datalo, addr_reg, TCG_REG_O1, qemu_st_opc[sizeop]);
>
>     /* TLB Miss.  */
> -
> -    *label_ptr[0] |= INSN_OFF19((unsigned long)s->code_ptr -
> -                                (unsigned long)label_ptr[0]);
> -
> -    n = 0;
> -#ifdef CONFIG_TCG_PASS_AREG0
> -    tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[n++], TCG_AREG0);
> -#endif
> +    n = ARG_OFFSET;
> +    if (ARG_OFFSET) {
> +         tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
> +    }
>     if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
>         tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
>                     args[addrlo_idx + 1]);
>     }
>     tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
>                 args[addrlo_idx]);
> -    if (TCG_TARGET_REG_BITS == 32 && opc == 3) {
> +    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
>         tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++], datahi);
>     }
>     tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++], datalo);
> @@ -1123,7 +1031,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
>                sizeof(long));
>
>     /* qemu_st_helper[s_bits](arg0, arg1, arg2) */
> -    tcg_out32(s, CALL | ((((tcg_target_ulong)qemu_st_helpers[opc]
> +    tcg_out32(s, CALL | ((((tcg_target_ulong)qemu_st_helpers[sizeop]
>                            - (tcg_target_ulong)s->code_ptr) >> 2)
>                          & 0x3fffffff));
>     /* delay slot */
> @@ -1134,15 +1042,23 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
>                TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
>                sizeof(long));
>
> -    *label_ptr[1] |= INSN_OFF19((unsigned long)s->code_ptr -
> -                                (unsigned long)label_ptr[1]);
> +    *label_ptr |= INSN_OFF19((unsigned long)s->code_ptr -
> +                             (unsigned long)label_ptr);
>  #else
>     addr_reg = args[addrlo_idx];
>     if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
>         tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
>         addr_reg = TCG_REG_I5;
>     }
> -    tcg_out_qemu_st_direct(s, addr_reg, datalo, datahi, opc);
> +    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
> +        /* Reconstruct the full 64-bit value in %g1, using %o2 as temp.  */
> +        /* ??? Redefine the temps from %i4/%i5 so that we have a o/g temp. */
> +        tcg_out_arithi(s, TCG_REG_G1, datalo, 0, SHIFT_SRL);
> +        tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
> +        tcg_out_arith(s, TCG_REG_G1, TCG_REG_G1, TCG_REG_O2, ARITH_OR);
> +        datalo = TCG_REG_G1;
> +    }
> +    tcg_out_ldst_rr(s, datalo, addr_reg, TCG_REG_G0, qemu_st_opc[sizeop]);
>  #endif /* CONFIG_SOFTMMU */
>  }
>
> --
> 1.7.7.6
>
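
To illustrate the table-lookup idea from the commit message: sizeop carries
log2 of the access size in bits 0-1 and the sign-extension flag in bit 2, so
it can index an 8-entry table directly. The sketch below uses mnemonic strings
and a run-time endianness flag purely for illustration; the patch itself
stores instruction encodings and selects the table at compile time with
TARGET_WORDS_BIGENDIAN. The names ld_name(), ld_name_be and ld_name_le are
hypothetical.

```c
#include <stddef.h>

/* Bit 2 of sizeop requests sign extension; bits 0-1 are log2(size). */
enum { SIZEOP_SIGN = 4 };

/* Big-endian guest: plain loads.  Entry 7 repeats ldx because a
   64-bit load has nothing left to sign-extend. */
static const char *const ld_name_be[8] = {
    "ldub", "lduh", "lduw", "ldx", "ldsb", "ldsh", "ldsw", "ldx"
};

/* Little-endian guest: byte loads are unchanged, wider loads use the
   alternate-space forms (with ASI_PRIMARY_LITTLE in the real backend). */
static const char *const ld_name_le[8] = {
    "ldub", "lduha", "lduwa", "ldxa", "ldsb", "ldsha", "ldswa", "ldxa"
};

/* Return the load mnemonic for sizeop, or NULL if sizeop is out of range. */
static const char *ld_name(int sizeop, int big_endian)
{
    if (sizeop < 0 || sizeop > 7) {
        return NULL;
    }
    return big_endian ? ld_name_be[sizeop] : ld_name_le[sizeop];
}
```

A signed 16-bit load for a little-endian guest is then ld_name(1 | SIZEOP_SIGN, 0),
which selects the "ldsha" entry; no switch statement is needed.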

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH 03/14] tcg-sparc: Assume v9 cpu always, i.e. force v8plus in 32-bit mode.
  2012-03-29 18:45   ` Blue Swirl
@ 2012-03-29 18:49     ` Richard Henderson
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2012-03-29 18:49 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

On 03/29/2012 02:45 PM, Blue Swirl wrote:
>> >  #elif defined(TARGET_SPARC)
>> >     print_insn = print_insn_sparc;
>> > -#ifdef TARGET_SPARC64
>> >     disasm_info.mach = bfd_mach_sparc_v9b;
>> > -#endif
> This is not OK: it would change the ASI printout for V8 guest code.
> 

Ah, right.  Ok, will fix for the next revision.


r~

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH 12/14] tcg-sparc: Use defines for temporaries.
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 12/14] tcg-sparc: Use defines for temporaries Richard Henderson
@ 2012-03-29 18:56   ` Blue Swirl
  2012-03-29 19:04     ` Richard Henderson
  0 siblings, 1 reply; 21+ messages in thread
From: Blue Swirl @ 2012-03-29 18:56 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Wed, Mar 28, 2012 at 00:32, Richard Henderson <rth@twiddle.net> wrote:
> And change from %i4/%i5 to %g1/%o7 to remove a v8plus fixme.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/sparc/tcg-target.c |  114 ++++++++++++++++++++++++-----------------------
>  1 files changed, 58 insertions(+), 56 deletions(-)
>
> diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
> index 5b3cde4..9e822f3 100644
> --- a/tcg/sparc/tcg-target.c
> +++ b/tcg/sparc/tcg-target.c
> @@ -59,8 +59,12 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
>  };
>  #endif
>
> +/* Define some temporary registers.  T2 is used for constant generation.  */
> +#define TCG_REG_T1  TCG_REG_G1
> +#define TCG_REG_T2  TCG_REG_O7
> +
>  #ifdef CONFIG_USE_GUEST_BASE
> -# define TCG_GUEST_BASE_REG TCG_REG_I3
> +# define TCG_GUEST_BASE_REG TCG_REG_I5
>  #else
>  # define TCG_GUEST_BASE_REG TCG_REG_G0
>  #endif
> @@ -85,6 +89,7 @@ static const int tcg_target_reg_alloc_order[] = {
>     TCG_REG_I2,
>     TCG_REG_I3,
>     TCG_REG_I4,
> +    TCG_REG_I5,
>  };
>
>  static const int tcg_target_call_iarg_regs[6] = {
> @@ -372,10 +377,10 @@ static inline void tcg_out_movi(TCGContext *s, TCGType type,
>         tcg_out_sethi(s, ret, ~arg);
>         tcg_out_arithi(s, ret, ret, (arg & 0x3ff) | -0x400, ARITH_XOR);
>     } else {
> -        tcg_out_movi_imm32(s, TCG_REG_I4, arg >> (TCG_TARGET_REG_BITS / 2));
> -        tcg_out_arithi(s, TCG_REG_I4, TCG_REG_I4, 32, SHIFT_SLLX);
> -        tcg_out_movi_imm32(s, ret, arg);
> -        tcg_out_arith(s, ret, ret, TCG_REG_I4, ARITH_OR);
> +        tcg_out_movi_imm32(s, ret, arg >> (TCG_TARGET_REG_BITS / 2));
> +        tcg_out_arithi(s, ret, ret, 32, SHIFT_SLLX);
> +        tcg_out_movi_imm32(s, TCG_REG_T2, arg);
> +        tcg_out_arith(s, ret, ret, TCG_REG_T2, ARITH_OR);
>     }
>  }
>
> @@ -392,8 +397,8 @@ static inline void tcg_out_ldst(TCGContext *s, int ret, int addr,
>         tcg_out32(s, op | INSN_RD(ret) | INSN_RS1(addr) |
>                   INSN_IMM13(offset));
>     } else {
> -        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I5, offset);
> -        tcg_out_ldst_rr(s, ret, addr, TCG_REG_I5, op);
> +        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_T1, offset);
> +        tcg_out_ldst_rr(s, ret, addr, TCG_REG_T1, op);
>     }
>  }
>
> @@ -435,8 +440,8 @@ static inline void tcg_out_addi(TCGContext *s, int reg, tcg_target_long val)
>         if (check_fit_tl(val, 13))
>             tcg_out_arithi(s, reg, reg, val, ARITH_ADD);
>         else {
> -            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I5, val);
> -            tcg_out_arith(s, reg, reg, TCG_REG_I5, ARITH_ADD);
> +            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_T1, val);
> +            tcg_out_arith(s, reg, reg, TCG_REG_T1, ARITH_ADD);
>         }
>     }
>  }
> @@ -448,8 +453,8 @@ static inline void tcg_out_andi(TCGContext *s, int rd, int rs,
>         if (check_fit_tl(val, 13))
>             tcg_out_arithi(s, rd, rs, val, ARITH_AND);
>         else {
> -            tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_I5, val);
> -            tcg_out_arith(s, rd, rs, TCG_REG_I5, ARITH_AND);
> +            tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_T1, val);
> +            tcg_out_arith(s, rd, rs, TCG_REG_T1, ARITH_AND);
>         }
>     }
>  }
> @@ -461,8 +466,8 @@ static void tcg_out_div32(TCGContext *s, int rd, int rs1,
>     if (uns) {
>         tcg_out_sety(s, TCG_REG_G0);
>     } else {
> -        tcg_out_arithi(s, TCG_REG_I5, rs1, 31, SHIFT_SRA);
> -        tcg_out_sety(s, TCG_REG_I5);
> +        tcg_out_arithi(s, TCG_REG_T1, rs1, 31, SHIFT_SRA);
> +        tcg_out_sety(s, TCG_REG_T1);

By the way, since we now assume V9+, this 32-bit division, which goes
through the Y register, could be changed (in some later patch) to use the
nicer 64-bit division instructions.
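
As a rough C model of why the two forms agree for signed operands (a sketch
only; div32_via_y() and div32_via_64() are hypothetical names, and the
overflow case INT32_MIN / -1 as well as division by zero are deliberately not
modeled):

```c
#include <stdint.h>

/* Current sequence: Y holds the sign extension of rs1 (sra rs1, 31),
   and the divide operates on the 64-bit dividend Y:rs1. */
static int32_t div32_via_y(int32_t rs1, int32_t rs2)
{
    uint32_t y = (rs1 < 0) ? 0xffffffffu : 0u;
    int64_t dividend = (int64_t)(((uint64_t)y << 32) | (uint32_t)rs1);

    return (int32_t)(dividend / rs2);
}

/* Possible replacement: sign-extend both operands to 64 bits and
   perform a single 64-bit divide, with no Y register involved. */
static int32_t div32_via_64(int32_t rs1, int32_t rs2)
{
    return (int32_t)((int64_t)rs1 / (int64_t)rs2);
}
```

Both helpers compute the same quotient for in-range inputs, since Y:rs1 is
simply rs1 sign-extended to 64 bits. The unsigned case would be analogous,
with Y set to zero and the operands zero-extended.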

>     }
>
>     tcg_out_arithc(s, rd, rs1, val2, val2const,
> @@ -608,8 +613,8 @@ static void tcg_out_setcond_i32(TCGContext *s, TCGCond cond, TCGArg ret,
>     case TCG_COND_GTU:
>     case TCG_COND_GEU:
>         if (c2const && c2 != 0) {
> -            tcg_out_movi_imm13(s, TCG_REG_I5, c2);
> -            c2 = TCG_REG_I5;
> +            tcg_out_movi_imm13(s, TCG_REG_T1, c2);
> +            c2 = TCG_REG_T1;
>         }
>         t = c1, c1 = c2, c2 = t, c2const = 0;
>         cond = tcg_swap_cond(cond);
> @@ -656,15 +661,15 @@ static void tcg_out_setcond2_i32(TCGContext *s, TCGCond cond, TCGArg ret,
>
>     switch (cond) {
>     case TCG_COND_EQ:
> -        tcg_out_setcond_i32(s, TCG_COND_EQ, TCG_REG_I5, al, bl, blconst);
> +        tcg_out_setcond_i32(s, TCG_COND_EQ, TCG_REG_T1, al, bl, blconst);
>         tcg_out_setcond_i32(s, TCG_COND_EQ, ret, ah, bh, bhconst);
> -        tcg_out_arith(s, ret, ret, TCG_REG_I5, ARITH_AND);
> +        tcg_out_arith(s, ret, ret, TCG_REG_T1, ARITH_AND);
>         break;
>
>     case TCG_COND_NE:
> -        tcg_out_setcond_i32(s, TCG_COND_NE, TCG_REG_I5, al, al, blconst);
> +        tcg_out_setcond_i32(s, TCG_COND_NE, TCG_REG_T1, al, al, blconst);
>         tcg_out_setcond_i32(s, TCG_COND_NE, ret, ah, bh, bhconst);
> -        tcg_out_arith(s, ret, ret, TCG_REG_I5, ARITH_OR);
> +        tcg_out_arith(s, ret, ret, TCG_REG_T1, ARITH_OR);
>         break;
>
>     default:
> @@ -964,8 +969,8 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
>  #else
>     addr_reg = args[addrlo_idx];
>     if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
> -        tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
> -        addr_reg = TCG_REG_I5;
> +        tcg_out_arithi(s, TCG_REG_T1, addr_reg, 0, SHIFT_SRL);
> +        addr_reg = TCG_REG_T1;
>     }
>     if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
>         int reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
> @@ -1008,12 +1013,11 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
>                                 offsetof(CPUTLBEntry, addr_write));
>
>     if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
> -        /* Reconstruct the full 64-bit value in %g1, using %o2 as temp.  */
> -        /* ??? Redefine the temps from %i4/%i5 so that we have a o/g temp. */
> -        tcg_out_arithi(s, TCG_REG_G1, datalo, 0, SHIFT_SRL);
> +        /* Reconstruct the full 64-bit value.  */
> +        tcg_out_arithi(s, TCG_REG_T1, datalo, 0, SHIFT_SRL);
>         tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
> -        tcg_out_arith(s, TCG_REG_G1, TCG_REG_G1, TCG_REG_O2, ARITH_OR);
> -        datalo = TCG_REG_G1;
> +        tcg_out_arith(s, TCG_REG_O2, TCG_REG_T1, TCG_REG_O2, ARITH_OR);
> +        datalo = TCG_REG_O2;
>     }
>
>     /* The fast path is exactly one insn.  Thus we can perform the entire
> @@ -1054,16 +1058,14 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
>  #else
>     addr_reg = args[addrlo_idx];
>     if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
> -        tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
> -        addr_reg = TCG_REG_I5;
> +        tcg_out_arithi(s, TCG_REG_T1, addr_reg, 0, SHIFT_SRL);
> +        addr_reg = TCG_REG_T1;
>     }
>     if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
> -        /* Reconstruct the full 64-bit value in %g1, using %o2 as temp.  */
> -        /* ??? Redefine the temps from %i4/%i5 so that we have a o/g temp. */
> -        tcg_out_arithi(s, TCG_REG_G1, datalo, 0, SHIFT_SRL);
> +        tcg_out_arithi(s, TCG_REG_T1, datalo, 0, SHIFT_SRL);
>         tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
> -        tcg_out_arith(s, TCG_REG_G1, TCG_REG_G1, TCG_REG_O2, ARITH_OR);
> -        datalo = TCG_REG_G1;
> +        tcg_out_arith(s, TCG_REG_O2, TCG_REG_T1, TCG_REG_O2, ARITH_OR);
> +        datalo = TCG_REG_O2;
>     }
>     tcg_out_ldst_rr(s, datalo, addr_reg,
>                     (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
> @@ -1087,28 +1089,28 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>     case INDEX_op_goto_tb:
>         if (s->tb_jmp_offset) {
>             /* direct jump method */
> -            tcg_out_sethi(s, TCG_REG_I5, args[0] & 0xffffe000);
> -            tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_I5) |
> +            tcg_out_sethi(s, TCG_REG_T1, args[0] & 0xffffe000);
> +            tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_T1) |
>                       INSN_IMM13((args[0] & 0x1fff)));
>             s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
>         } else {
>             /* indirect jump method */
> -            tcg_out_ld_ptr(s, TCG_REG_I5, (tcg_target_long)(s->tb_next + args[0]));
> -            tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_I5) |
> +            tcg_out_ld_ptr(s, TCG_REG_T1, (tcg_target_long)(s->tb_next + args[0]));

WARNING: line over 80 characters
#231: FILE: tcg/sparc/tcg-target.c:1098:
+            tcg_out_ld_ptr(s, TCG_REG_T1, (tcg_target_long)(s->tb_next + args[0]));


> +            tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_T1) |
>                       INSN_RS2(TCG_REG_G0));
>         }
>         tcg_out_nop(s);
>         s->tb_next_offset[args[0]] = s->code_ptr - s->code_buf;
>         break;
>     case INDEX_op_call:
> -        if (const_args[0])
> +        if (const_args[0]) {
>             tcg_out32(s, CALL | ((((tcg_target_ulong)args[0]
>                                    - (tcg_target_ulong)s->code_ptr) >> 2)
>                                  & 0x3fffffff));
> -        else {
> -            tcg_out_ld_ptr(s, TCG_REG_I5,
> +        } else {
> +            tcg_out_ld_ptr(s, TCG_REG_T1,
>                            (tcg_target_long)(s->tb_next + args[0]));
> -            tcg_out32(s, JMPL | INSN_RD(TCG_REG_O7) | INSN_RS1(TCG_REG_I5) |
> +            tcg_out32(s, JMPL | INSN_RD(TCG_REG_O7) | INSN_RS1(TCG_REG_T1) |
>                       INSN_RS2(TCG_REG_G0));
>         }
>         /* delay slot */
> @@ -1214,11 +1216,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>
>     case INDEX_op_rem_i32:
>     case INDEX_op_remu_i32:
> -        tcg_out_div32(s, TCG_REG_I5, args[1], args[2], const_args[2],
> +        tcg_out_div32(s, TCG_REG_T1, args[1], args[2], const_args[2],
>                       opc == INDEX_op_remu_i32);
> -        tcg_out_arithc(s, TCG_REG_I5, TCG_REG_I5, args[2], const_args[2],
> +        tcg_out_arithc(s, TCG_REG_T1, TCG_REG_T1, args[2], const_args[2],
>                        ARITH_UMUL);
> -        tcg_out_arith(s, args[0], args[1], TCG_REG_I5, ARITH_SUB);
> +        tcg_out_arith(s, args[0], args[1], TCG_REG_T1, ARITH_SUB);
>         break;
>
>     case INDEX_op_brcond_i32:
> @@ -1335,11 +1337,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>         goto gen_arith;
>     case INDEX_op_rem_i64:
>     case INDEX_op_remu_i64:
> -        tcg_out_arithc(s, TCG_REG_I5, args[1], args[2], const_args[2],
> +        tcg_out_arithc(s, TCG_REG_T1, args[1], args[2], const_args[2],
>                        opc == INDEX_op_rem_i64 ? ARITH_SDIVX : ARITH_UDIVX);
> -        tcg_out_arithc(s, TCG_REG_I5, TCG_REG_I5, args[2], const_args[2],
> +        tcg_out_arithc(s, TCG_REG_T1, TCG_REG_T1, args[2], const_args[2],
>                        ARITH_MULX);
> -        tcg_out_arith(s, args[0], args[1], TCG_REG_I5, ARITH_SUB);
> +        tcg_out_arith(s, args[0], args[1], TCG_REG_T1, ARITH_SUB);
>         break;
>     case INDEX_op_ext32s_i64:
>         if (const_args[1]) {
> @@ -1537,15 +1539,15 @@ static void tcg_target_init(TCGContext *s)
>                      (1 << TCG_REG_O7));
>
>     tcg_regset_clear(s->reserved_regs);
> -    tcg_regset_set_reg(s->reserved_regs, TCG_REG_G0);
> -#if TCG_TARGET_REG_BITS == 64
> -    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I4); // for internal use
> -#endif
> -    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I5); // for internal use
> -    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I6);
> -    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I7);
> -    tcg_regset_set_reg(s->reserved_regs, TCG_REG_O6);
> -    tcg_regset_set_reg(s->reserved_regs, TCG_REG_O7);
> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_G0); /* zero */
> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_G6); /* reserved for os */
> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_G7); /* thread pointer */
> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I6); /* frame pointer */
> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I7); /* return address */
> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_O6); /* stack pointer */
> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_T1); /* for internal use */
> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_T2); /* for internal use */
> +
>     tcg_add_target_add_op_defs(sparc_op_defs);
>  }
>
> --
> 1.7.7.6
>
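
For readers following the 32-bit-host store path in the patch above, here is a minimal C sketch of what the SRL/SLLX/OR sequence computes. The helper name combine_halves is hypothetical and is not part of QEMU; it only models the arithmetic, assuming the two halves arrive in 64-bit registers whose upper bits may hold garbage.

```c
#include <stdint.h>

/* Hypothetical helper, not QEMU code: combine the two 32-bit halves of
   a 64-bit value the way the emitted SRL/SLLX/OR sequence does.  */
static uint64_t combine_halves(uint64_t datahi, uint64_t datalo)
{
    uint64_t lo = (uint32_t)datalo;  /* srl datalo, 0: clear bits 63..32 */
    uint64_t hi = datahi << 32;      /* sllx datahi, 32: move to the top */
    return hi | lo;                  /* or: merge the two halves */
}
```

The zero-shift SRL matters because any stale upper bits in datalo would otherwise leak into the high half of the result.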


* Re: [Qemu-devel] [PATCH 07/14] Avoid declaring the env variable at all if CONFIG_TCG_PASS_AREG0.
  2012-03-28  0:32 ` [Qemu-devel] [PATCH 07/14] Avoid declaring the env variable at all if CONFIG_TCG_PASS_AREG0 Richard Henderson
@ 2012-03-29 18:57   ` Blue Swirl
  0 siblings, 0 replies; 21+ messages in thread
From: Blue Swirl @ 2012-03-29 18:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Wed, Mar 28, 2012 at 00:32, Richard Henderson <rth@twiddle.net> wrote:
> At the same time, remove use of the global ENV from user-exec.c.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  Makefile.target |    5 -----
>  dyngen-exec.h   |    5 +++++
>  user-exec.c     |   17 ++++++-----------
>  3 files changed, 11 insertions(+), 16 deletions(-)
>
> diff --git a/Makefile.target b/Makefile.target
> index aa53e28..81fdf9e 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -110,11 +110,6 @@ $(libobj-y): $(GENERATED_HEADERS)
>  ifndef CONFIG_TCG_PASS_AREG0
>  op_helper.o: QEMU_CFLAGS += $(HELPER_CFLAGS)
>  endif
> -user-exec.o: QEMU_CFLAGS += $(HELPER_CFLAGS)
> -
> -# Note: this is a workaround. The real fix is to avoid compiling
> -# cpu_signal_handler() in user-exec.c.
> -signal.o: QEMU_CFLAGS += $(HELPER_CFLAGS)

The patch does not apply; please rebase.
Applying: Avoid declaring the env variable at all if CONFIG_TCG_PASS_AREG0.
error: patch failed: Makefile.target:110
error: Makefile.target: patch does not apply

>
>  #########################################################
>  # Linux user emulator target
> diff --git a/dyngen-exec.h b/dyngen-exec.h
> index cfeef99..65fcb43 100644
> --- a/dyngen-exec.h
> +++ b/dyngen-exec.h
> @@ -19,6 +19,10 @@
>  #if !defined(__DYNGEN_EXEC_H__)
>  #define __DYNGEN_EXEC_H__
>
> +/* If the target has indicated that it does not need an AREG0,
> +   don't declare the env variable at all, much less as a register.  */
> +#if !defined(CONFIG_TCG_PASS_AREG0)
> +
>  #if defined(CONFIG_TCG_INTERPRETER)
>  /* The TCG interpreter does not need a special register AREG0,
>  * but it is possible to use one by defining AREG0.
> @@ -65,4 +69,5 @@ register CPUArchState *env asm(AREG0);
>  extern CPUArchState *env;
>  #endif
>
> +#endif /* !CONFIG_TCG_PASS_AREG0 */
>  #endif /* !defined(__DYNGEN_EXEC_H__) */
> diff --git a/user-exec.c b/user-exec.c
> index cd905ff..9691f09 100644
> --- a/user-exec.c
> +++ b/user-exec.c
> @@ -18,7 +18,6 @@
>  */
>  #include "config.h"
>  #include "cpu.h"
> -#include "dyngen-exec.h"
>  #include "disas.h"
>  #include "tcg.h"
>
> @@ -58,8 +57,6 @@ void cpu_resume_from_signal(CPUArchState *env1, void *puc)
>     struct sigcontext *uc = puc;
>  #endif
>
> -    env = env1;
> -
>     /* XXX: restore cpu registers saved in host registers */
>
>     if (puc) {
> @@ -74,8 +71,8 @@ void cpu_resume_from_signal(CPUArchState *env1, void *puc)
>         sigprocmask(SIG_SETMASK, &uc->sc_mask, NULL);
>  #endif
>     }
> -    env->exception_index = -1;
> -    longjmp(env->jmp_env, 1);
> +    env1->exception_index = -1;
> +    longjmp(env1->jmp_env, 1);
>  }
>
>  /* 'pc' is the host PC at which the exception was raised. 'address' is
> @@ -86,12 +83,10 @@ static inline int handle_cpu_signal(unsigned long pc, unsigned long address,
>                                     int is_write, sigset_t *old_set,
>                                     void *puc)
>  {
> +    CPUArchState *env1 = cpu_single_env;
>     TranslationBlock *tb;
>     int ret;
>
> -    if (cpu_single_env) {
> -        env = cpu_single_env; /* XXX: find a correct solution for multithread */
> -    }
>  #if defined(DEBUG_SIGNAL)
>     qemu_printf("qemu: SIGSEGV pc=0x%08lx address=%08lx w=%d oldset=0x%08lx\n",
>                 pc, address, is_write, *(unsigned long *)old_set);
> @@ -102,7 +97,7 @@ static inline int handle_cpu_signal(unsigned long pc, unsigned long address,
>     }
>
>     /* see if it is an MMU fault */
> -    ret = cpu_handle_mmu_fault(env, address, is_write, MMU_USER_IDX);
> +    ret = cpu_handle_mmu_fault(env1, address, is_write, MMU_USER_IDX);
>     if (ret < 0) {
>         return 0; /* not an MMU fault */
>     }
> @@ -114,13 +109,13 @@ static inline int handle_cpu_signal(unsigned long pc, unsigned long address,
>     if (tb) {
>         /* the PC is inside the translated code. It means that we have
>            a virtual CPU fault */
> -        cpu_restore_state(tb, env, pc);
> +        cpu_restore_state(tb, env1, pc);
>     }
>
>     /* we restore the process signal mask as the sigreturn should
>        do it (XXX: use sigsetjmp) */
>     sigprocmask(SIG_SETMASK, old_set, NULL);
> -    exception_action(env);
> +    exception_action(env1);
>
>     /* never comes here */
>     return 1;
> --
> 1.7.7.6
>
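
The user-exec.c change above replaces a global env with an explicitly passed pointer. A rough, self-contained C sketch of that pattern follows. CPUStateSketch, resume_sketch and run_once are made-up names for illustration only; the real CPUArchState has many more fields.

```c
#include <setjmp.h>

/* Hypothetical stand-in for CPUArchState, reduced to the two fields
   that cpu_resume_from_signal touches in the patch.  */
typedef struct CPUStateSketch {
    int exception_index;
    jmp_buf jmp_env;
} CPUStateSketch;

/* Works only on the pointer it is given; no global env is read or
   written, mirroring the env -> env1 change.  */
static void resume_sketch(CPUStateSketch *env1)
{
    env1->exception_index = -1;
    longjmp(env1->jmp_env, 1);
}

/* Set a jump target, pretend an exception is pending, then resume.  */
static int run_once(CPUStateSketch *env1)
{
    if (setjmp(env1->jmp_env) == 0) {
        env1->exception_index = 5;   /* arbitrary pending value */
        resume_sketch(env1);         /* does not return here */
    }
    return env1->exception_index;    /* reached via longjmp */
}
```

Because every function takes the state as a parameter, nothing here depends on a register-pinned global, which is the property the patch is after.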


* Re: [Qemu-devel] [PATCH 12/14] tcg-sparc: Use defines for temporaries.
  2012-03-29 18:56   ` Blue Swirl
@ 2012-03-29 19:04     ` Richard Henderson
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2012-03-29 19:04 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

On 03/29/2012 02:56 PM, Blue Swirl wrote:
>> > +        tcg_out_arithi(s, TCG_REG_T1, rs1, 31, SHIFT_SRA);
>> > +        tcg_out_sety(s, TCG_REG_T1);
> By the way, since we assume V9+, this 32-bit division, which uses the
> Y register, could be changed (in some later patch) to use the nicer
> 64-bit division.
> 

Good spotting.

Although my next trick will be to make tcg changes such that v8plus can be a TCG_TARGET_REG_BITS == 64 host, and do proper 64-bit arithmetic in the %o/%g registers.


r~
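
To make the division point concrete, here is a small C sketch comparing the two approaches. It is only a model under stated assumptions: the function names are hypothetical, the overflow behaviour of the hardware divide is not modelled, and the uint64_t to int64_t conversion assumes a two's complement host. The last function mirrors the divide, multiply, subtract sequence that the rem_i32 case emits.

```c
#include <stdint.h>

/* V8-style model: the dividend is the 64-bit value Y:rs1, so Y must
   first be loaded with the sign extension of rs1 (sra rs1, 31).  */
static int32_t div32_via_y(int32_t rs1, int32_t rs2)
{
    uint32_t y = rs1 < 0 ? 0xffffffffu : 0u;
    int64_t dividend = (int64_t)(((uint64_t)y << 32) | (uint32_t)rs1);
    return (int32_t)(dividend / rs2);
}

/* V9-style model: sign-extend both operands, do one 64-bit divide.  */
static int32_t div32_via_64(int32_t rs1, int32_t rs2)
{
    return (int32_t)((int64_t)rs1 / (int64_t)rs2);
}

/* Remainder as the rem_i32 case computes it: a - (a / b) * b.  */
static int32_t rem32_via_div(int32_t a, int32_t b)
{
    int32_t q = div32_via_64(a, b);
    return a - q * b;
}
```

Both divide models agree for inputs whose quotient fits in 32 bits; the 64-bit form simply avoids the extra write to Y.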


