* [Qemu-devel] [PATCH v3 00/20] tcg-arm improvements
@ 2013-03-28 15:32 Richard Henderson
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

Changes v2-v3:
  * Patch 16 is an important bug fix.  For helpers with more than 4 arguments,
    we were clobbering the register save area.  The patch as written is
    difficult to disentangle from the rest of the series, but I think that
    a similar patch should be written for stable.

    I have no idea how we aren't just seeing crashes everywhere without this.

  * The softmmu code is improved, using a tidier insn schedule and making
    use of the out-of-line ldst feature.

  * Always use blx for calls when we have movw+movt.  BLX is predicted as
    a call wrt the branch prediction stack, whereas a ldr into the pc
    isn't.  At least, as documented for the A9.


r~


Richard Henderson (20):
  tcg-arm: Use bic to implement and with constant
  tcg-arm: Handle negated constant arguments to and/sub
  tcg-arm: Allow constant first argument to sub
  tcg-arm: Use tcg_out_dat_rIN for compares
  tcg-arm: Handle constant arguments to add2/sub2
  tcg-arm: Improve constant generation
  tcg-arm: Fold epilogue into INDEX_op_exit_tb
  tcg-arm: Implement deposit for armv7
  tcg-arm: Implement division instructions
  tcg-arm: Use TCG_REG_TMP name for the tcg temporary
  tcg-arm: Use R12 for the tcg temporary
  tcg-arm: Cleanup multiply subroutines
  tcg-arm: Cleanup tcg_out_goto_label
  tcg-arm: Cleanup goto_tb handling
  tcg-arm: Cleanup most primitive load store subroutines
  tcg-arm: Fix local stack frame
  tcg-arm: Split out tcg_out_tlb_read
  tcg-arm: Improve scheduling of tcg_out_tlb_read
  tcg-arm: Use movi32 + blx for calls on v7
  tcg-arm: Convert to CONFIG_QEMU_LDST_OPTIMIZATION

 configure               |    2 +-
 disas/arm.c             |    4 +
 include/exec/exec-all.h |   40 +-
 tcg/arm/tcg-target.c    | 1506 ++++++++++++++++++++++++++---------------------
 tcg/arm/tcg-target.h    |   14 +-
 translate-all.c         |    2 -
 6 files changed, 869 insertions(+), 699 deletions(-)

-- 
1.8.1.4

* [Qemu-devel] [PATCH v3 01/20] tcg-arm: Use bic to implement and with constant
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

This greatly improves the code we can produce for deposit
without armv7 support.
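
The idea, as a sketch (encodable() below is a stand-in for the backend's
check_fit_imm(), not part of the patch): an ARM data-processing immediate
is an 8-bit value rotated right by an even amount, so a constant that does
not encode may still have an encodable complement, and and rd, rn, #c is
equivalent to bic rd, rn, #~c.

    /* Sketch: can a 32-bit value be encoded as an ARM immediate?  */
    static bool encodable(uint32_t v)
    {
        int rot;
        for (rot = 0; rot < 32; rot += 2) {
            /* rotating v left by rot undoes a rotate-right encoding */
            uint32_t r = (v << rot) | (v >> ((32 - rot) & 31));
            if (r <= 0xff) {
                return true;
            }
        }
        return false;
    }

    /* and r0, r1, #0xffffff00 has no encoding, but ~0xffffff00 = 0xff
       does, and bic r0, r1, #0xff computes the same result.  */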

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 52 ++++++++++++++++++++++++++++++++++++++++++----------
 tcg/arm/tcg-target.h |  2 --
 2 files changed, 42 insertions(+), 12 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 94c6ca4..9972792 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -145,6 +145,9 @@ static void patch_reloc(uint8_t *code_ptr, int type,
     }
 }
 
+#define TCG_CT_CONST_ARM 0x100
+#define TCG_CT_CONST_INV 0x200
+
 /* parse target specific constraints */
 static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 {
@@ -155,6 +158,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
     case 'I':
          ct->ct |= TCG_CT_CONST_ARM;
          break;
+    case 'K':
+         ct->ct |= TCG_CT_CONST_INV;
+         break;
 
     case 'r':
         ct->ct |= TCG_CT_REG;
@@ -275,16 +281,19 @@ static inline int check_fit_imm(uint32_t imm)
  * add, sub, eor...: ditto
  */
 static inline int tcg_target_const_match(tcg_target_long val,
-                const TCGArgConstraint *arg_ct)
+                                         const TCGArgConstraint *arg_ct)
 {
     int ct;
     ct = arg_ct->ct;
-    if (ct & TCG_CT_CONST)
+    if (ct & TCG_CT_CONST) {
         return 1;
-    else if ((ct & TCG_CT_CONST_ARM) && check_fit_imm(val))
+    } else if ((ct & TCG_CT_CONST_ARM) && check_fit_imm(val)) {
         return 1;
-    else
+    } else if ((ct & TCG_CT_CONST_INV) && check_fit_imm(~val)) {
+        return 1;
+    } else {
         return 0;
+    }
 }
 
 enum arm_data_opc_e {
@@ -482,6 +491,27 @@ static inline void tcg_out_dat_rI(TCGContext *s, int cond, int opc, TCGArg dst,
     }
 }
 
+static void tcg_out_dat_rIK(TCGContext *s, int cond, int opc, int opinv,
+                            TCGReg dst, TCGReg lhs, TCGArg rhs,
+                            bool rhs_is_const)
+{
+    /* Emit either the reg,imm or reg,reg form of a data-processing insn.
+     * rhs must satisfy the "rIK" constraint.
+     */
+    if (rhs_is_const) {
+        int rot = encode_imm(rhs);
+        if (rot < 0) {
+            rhs = ~rhs;
+            rot = encode_imm(rhs);
+            assert(rot >= 0);
+            opc = opinv;
+        }
+        tcg_out_dat_imm(s, cond, opc, dst, lhs, rotl(rhs, rot) | (rot << 7));
+    } else {
+        tcg_out_dat_reg(s, cond, opc, dst, lhs, rhs, SHIFT_IMM_LSL(0));
+    }
+}
+
 static inline void tcg_out_mul32(TCGContext *s,
                 int cond, int rd, int rs, int rm)
 {
@@ -1610,11 +1640,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         c = ARITH_SUB;
         goto gen_arith;
     case INDEX_op_and_i32:
-        c = ARITH_AND;
-        goto gen_arith;
+        tcg_out_dat_rIK(s, COND_AL, ARITH_AND, ARITH_BIC,
+                        args[0], args[1], args[2], const_args[2]);
+        break;
     case INDEX_op_andc_i32:
-        c = ARITH_BIC;
-        goto gen_arith;
+        tcg_out_dat_rIK(s, COND_AL, ARITH_BIC, ARITH_AND,
+                        args[0], args[1], args[2], const_args[2]);
+        break;
     case INDEX_op_or_i32:
         c = ARITH_ORR;
         goto gen_arith;
@@ -1802,8 +1834,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_mul_i32, { "r", "r", "r" } },
     { INDEX_op_mulu2_i32, { "r", "r", "r", "r" } },
     { INDEX_op_muls2_i32, { "r", "r", "r", "r" } },
-    { INDEX_op_and_i32, { "r", "r", "rI" } },
-    { INDEX_op_andc_i32, { "r", "r", "rI" } },
+    { INDEX_op_and_i32, { "r", "r", "rIK" } },
+    { INDEX_op_andc_i32, { "r", "r", "rIK" } },
     { INDEX_op_or_i32, { "r", "r", "rI" } },
     { INDEX_op_xor_i32, { "r", "r", "rI" } },
     { INDEX_op_neg_i32, { "r", "r" } },
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index b6eed1f..354dd8a 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -49,8 +49,6 @@ typedef enum {
 
 #define TCG_TARGET_NB_REGS 16
 
-#define TCG_CT_CONST_ARM 0x100
-
 /* used for function call generation */
 #define TCG_REG_CALL_STACK		TCG_REG_R13
 #define TCG_TARGET_STACK_ALIGN		8
-- 
1.8.1.4

* [Qemu-devel] [PATCH v3 02/20] tcg-arm: Handle negated constant arguments to and/sub
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

This greatly improves code generation for addition of small
negative constants.
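
The same trick as the previous patch, with negation in place of inversion --
a sketch, not the patch itself (encodable() stands in for check_fit_imm()):

    /* add r0, r1, #-4 has no immediate encoding, but since
       r1 + (-4) == r1 - 4 it can be emitted as sub r0, r1, #4;
       likewise an unencodable sub constant may flip to add.  */
    if (encodable(c)) {
        /* emit: add rd, rn, #c */
    } else if (encodable(-c)) {
        /* emit: sub rd, rn, #-c -- guaranteed by the "N" constraint */
    }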

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 41 +++++++++++++++++++++++++++++++++++------
 1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 9972792..f470caa 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -147,6 +147,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
 
 #define TCG_CT_CONST_ARM 0x100
 #define TCG_CT_CONST_INV 0x200
+#define TCG_CT_CONST_NEG 0x400
 
 /* parse target specific constraints */
 static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
@@ -161,6 +162,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
     case 'K':
          ct->ct |= TCG_CT_CONST_INV;
          break;
+    case 'N': /* The gcc constraint letter is L, already used here.  */
+         ct->ct |= TCG_CT_CONST_NEG;
+         break;
 
     case 'r':
         ct->ct |= TCG_CT_REG;
@@ -291,6 +295,8 @@ static inline int tcg_target_const_match(tcg_target_long val,
         return 1;
     } else if ((ct & TCG_CT_CONST_INV) && check_fit_imm(~val)) {
         return 1;
+    } else if ((ct & TCG_CT_CONST_NEG) && check_fit_imm(-val)) {
+        return 1;
     } else {
         return 0;
     }
@@ -512,6 +518,27 @@ static void tcg_out_dat_rIK(TCGContext *s, int cond, int opc, int opinv,
     }
 }
 
+static void tcg_out_dat_rIN(TCGContext *s, int cond, int opc, int opneg,
+                            TCGArg dst, TCGArg lhs, TCGArg rhs,
+                            bool rhs_is_const)
+{
+    /* Emit either the reg,imm or reg,reg form of a data-processing insn.
+     * rhs must satisfy the "rIN" constraint.
+     */
+    if (rhs_is_const) {
+        int rot = encode_imm(rhs);
+        if (rot < 0) {
+            rhs = -rhs;
+            rot = encode_imm(rhs);
+            assert(rot >= 0);
+            opc = opneg;
+        }
+        tcg_out_dat_imm(s, cond, opc, dst, lhs, rotl(rhs, rot) | (rot << 7));
+    } else {
+        tcg_out_dat_reg(s, cond, opc, dst, lhs, rhs, SHIFT_IMM_LSL(0));
+    }
+}
+
 static inline void tcg_out_mul32(TCGContext *s,
                 int cond, int rd, int rs, int rm)
 {
@@ -1634,11 +1661,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                        ARITH_MOV, args[0], 0, args[3], const_args[3]);
         break;
     case INDEX_op_add_i32:
-        c = ARITH_ADD;
-        goto gen_arith;
+        tcg_out_dat_rIN(s, COND_AL, ARITH_ADD, ARITH_SUB,
+                        args[0], args[1], args[2], const_args[2]);
+        break;
     case INDEX_op_sub_i32:
-        c = ARITH_SUB;
-        goto gen_arith;
+        tcg_out_dat_rIN(s, COND_AL, ARITH_SUB, ARITH_ADD,
+                        args[0], args[1], args[2], const_args[2]);
+        break;
     case INDEX_op_and_i32:
         tcg_out_dat_rIK(s, COND_AL, ARITH_AND, ARITH_BIC,
                         args[0], args[1], args[2], const_args[2]);
@@ -1829,8 +1858,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_st_i32, { "r", "r" } },
 
     /* TODO: "r", "r", "ri" */
-    { INDEX_op_add_i32, { "r", "r", "rI" } },
-    { INDEX_op_sub_i32, { "r", "r", "rI" } },
+    { INDEX_op_add_i32, { "r", "r", "rIN" } },
+    { INDEX_op_sub_i32, { "r", "r", "rIN" } },
     { INDEX_op_mul_i32, { "r", "r", "r" } },
     { INDEX_op_mulu2_i32, { "r", "r", "r", "r" } },
     { INDEX_op_muls2_i32, { "r", "r", "r", "r" } },
-- 
1.8.1.4

* [Qemu-devel] [PATCH v3 03/20] tcg-arm: Allow constant first argument to sub
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

This allows the generation of RSB instructions.
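
A sketch of the mapping (not the patch itself; when both operands are
constant the patch just emits a movi of the difference):

    /* TCG: sub_i32 rd, $c, rn   computes rd = c - rn
       ARM: rsb rd, rn, #c       computes rd = c - rn  */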

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index f470caa..7475142 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -1665,8 +1665,17 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                         args[0], args[1], args[2], const_args[2]);
         break;
     case INDEX_op_sub_i32:
-        tcg_out_dat_rIN(s, COND_AL, ARITH_SUB, ARITH_ADD,
-                        args[0], args[1], args[2], const_args[2]);
+        if (const_args[1]) {
+            if (const_args[2]) {
+                tcg_out_movi32(s, COND_AL, args[0], args[1] - args[2]);
+            } else {
+                tcg_out_dat_rI(s, COND_AL, ARITH_RSB,
+                               args[0], args[2], args[1], 1);
+            }
+        } else {
+            tcg_out_dat_rIN(s, COND_AL, ARITH_SUB, ARITH_ADD,
+                            args[0], args[1], args[2], const_args[2]);
+        }
         break;
     case INDEX_op_and_i32:
         tcg_out_dat_rIK(s, COND_AL, ARITH_AND, ARITH_BIC,
@@ -1859,7 +1868,7 @@ static const TCGTargetOpDef arm_op_defs[] = {
 
     /* TODO: "r", "r", "ri" */
     { INDEX_op_add_i32, { "r", "r", "rIN" } },
-    { INDEX_op_sub_i32, { "r", "r", "rIN" } },
+    { INDEX_op_sub_i32, { "r", "rI", "rIN" } },
     { INDEX_op_mul_i32, { "r", "r", "r" } },
     { INDEX_op_mulu2_i32, { "r", "r", "r", "r" } },
     { INDEX_op_muls2_i32, { "r", "r", "r", "r" } },
-- 
1.8.1.4

* [Qemu-devel] [PATCH v3 04/20] tcg-arm: Use tcg_out_dat_rIN for compares
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

This allows us to emit CMN instructions.
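
A sketch of why this helps (not the patch itself): CMN rn, #c sets flags
for rn + c, which is the same sum as CMP rn, #-c computes, so a comparison
constant with no CMP encoding can be negated instead.

    /* cmp r0, #-1 has no immediate encoding (0xffffffff does not fit),
       but cmn r0, #1 computes the identical r0 + 1 and thus sets
       identical NZCV flags.  */
    if (encodable(c)) {
        /* emit: cmp rn, #c */
    } else if (encodable(-c)) {
        /* emit: cmn rn, #-c */
    }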

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 7475142..3152798 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -1655,10 +1655,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         /* Constraints mean that v2 is always in the same register as dest,
          * so we only need to do "if condition passed, move v1 to dest".
          */
-        tcg_out_dat_rI(s, COND_AL, ARITH_CMP, 0,
-                       args[1], args[2], const_args[2]);
-        tcg_out_dat_rI(s, tcg_cond_to_arm_cond[args[5]],
-                       ARITH_MOV, args[0], 0, args[3], const_args[3]);
+        tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
+                        args[1], args[2], const_args[2]);
+        tcg_out_dat_rIK(s, tcg_cond_to_arm_cond[args[5]], ARITH_MOV,
+                        ARITH_MVN, args[0], 0, args[3], const_args[3]);
         break;
     case INDEX_op_add_i32:
         tcg_out_dat_rIN(s, COND_AL, ARITH_ADD, ARITH_SUB,
@@ -1755,7 +1755,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_brcond_i32:
-        tcg_out_dat_rI(s, COND_AL, ARITH_CMP, 0,
+        tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
                        args[0], args[1], const_args[1]);
         tcg_out_goto_label(s, tcg_cond_to_arm_cond[args[2]], args[3]);
         break;
@@ -1768,15 +1768,15 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
          * TCG_COND_LE(U) --> (a0 <= a2 && a1 == a3) || (a1 <= a3 && a1 != a3),
          * TCG_COND_GT(U) --> (a0 >  a2 && a1 == a3) ||  a1 >  a3,
          */
-        tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0,
-                        args[1], args[3], SHIFT_IMM_LSL(0));
-        tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0,
-                        args[0], args[2], SHIFT_IMM_LSL(0));
+        tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
+                        args[1], args[3], const_args[3]);
+        tcg_out_dat_rIN(s, COND_EQ, ARITH_CMP, ARITH_CMN, 0,
+                        args[0], args[2], const_args[2]);
         tcg_out_goto_label(s, tcg_cond_to_arm_cond[args[4]], args[5]);
         break;
     case INDEX_op_setcond_i32:
-        tcg_out_dat_rI(s, COND_AL, ARITH_CMP, 0,
-                       args[1], args[2], const_args[2]);
+        tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
+                        args[1], args[2], const_args[2]);
         tcg_out_dat_imm(s, tcg_cond_to_arm_cond[args[3]],
                         ARITH_MOV, args[0], 0, 1);
         tcg_out_dat_imm(s, tcg_cond_to_arm_cond[tcg_invert_cond(args[3])],
@@ -1784,10 +1784,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
     case INDEX_op_setcond2_i32:
         /* See brcond2_i32 comment */
-        tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0,
-                        args[2], args[4], SHIFT_IMM_LSL(0));
-        tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0,
-                        args[1], args[3], SHIFT_IMM_LSL(0));
+        tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
+                        args[2], args[4], const_args[4]);
+        tcg_out_dat_rIN(s, COND_EQ, ARITH_CMP, ARITH_CMN, 0,
+                        args[1], args[3], const_args[3]);
         tcg_out_dat_imm(s, tcg_cond_to_arm_cond[args[5]],
                         ARITH_MOV, args[0], 0, 1);
         tcg_out_dat_imm(s, tcg_cond_to_arm_cond[tcg_invert_cond(args[5])],
@@ -1885,15 +1885,15 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_rotl_i32, { "r", "r", "ri" } },
     { INDEX_op_rotr_i32, { "r", "r", "ri" } },
 
-    { INDEX_op_brcond_i32, { "r", "rI" } },
-    { INDEX_op_setcond_i32, { "r", "r", "rI" } },
-    { INDEX_op_movcond_i32, { "r", "r", "rI", "rI", "0" } },
+    { INDEX_op_brcond_i32, { "r", "rIN" } },
+    { INDEX_op_setcond_i32, { "r", "r", "rIN" } },
+    { INDEX_op_movcond_i32, { "r", "r", "rIN", "rIK", "0" } },
 
     /* TODO: "r", "r", "r", "r", "ri", "ri" */
     { INDEX_op_add2_i32, { "r", "r", "r", "r", "r", "r" } },
     { INDEX_op_sub2_i32, { "r", "r", "r", "r", "r", "r" } },
-    { INDEX_op_brcond2_i32, { "r", "r", "r", "r" } },
-    { INDEX_op_setcond2_i32, { "r", "r", "r", "r", "r" } },
+    { INDEX_op_brcond2_i32, { "r", "r", "rIN", "rIN" } },
+    { INDEX_op_setcond2_i32, { "r", "r", "r", "rIN", "rIN" } },
 
 #if TARGET_LONG_BITS == 32
     { INDEX_op_qemu_ld8u, { "r", "l" } },
-- 
1.8.1.4

* [Qemu-devel] [PATCH v3 05/20] tcg-arm: Handle constant arguments to add2/sub2
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

We get to re-use the _rIN and _rIK subroutines to handle the various
combinations of add vs sub.  Fold the << 21 into the opcode enum values
so that we can explicitly add TO_CPSR as desired.
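
For reference, a sketch of the data-processing bit layout that makes this
work: bits 24:21 hold the opcode and bit 20 is the S (set flags) bit, so
pre-shifted enum values compose with TO_CPSR by a simple OR.

    /* 31..28  27..26  25  24..21  20  19..16  15..12  11..0
       cond    0  0    I   opcode  S   Rn      Rd      shifter operand

       e.g. "adds rd, rn, ..." is emitted as ARITH_ADD | TO_CPSR
       OR'ed with the cond/Rn/Rd/operand fields.  */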

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 106 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 58 insertions(+), 48 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 3152798..48cc186 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -302,27 +302,26 @@ static inline int tcg_target_const_match(tcg_target_long val,
     }
 }
 
+#define TO_CPSR (1 << 20)
+
 enum arm_data_opc_e {
-    ARITH_AND = 0x0,
-    ARITH_EOR = 0x1,
-    ARITH_SUB = 0x2,
-    ARITH_RSB = 0x3,
-    ARITH_ADD = 0x4,
-    ARITH_ADC = 0x5,
-    ARITH_SBC = 0x6,
-    ARITH_RSC = 0x7,
-    ARITH_TST = 0x8,
-    ARITH_CMP = 0xa,
-    ARITH_CMN = 0xb,
-    ARITH_ORR = 0xc,
-    ARITH_MOV = 0xd,
-    ARITH_BIC = 0xe,
-    ARITH_MVN = 0xf,
+    ARITH_AND = 0x0 << 21,
+    ARITH_EOR = 0x1 << 21,
+    ARITH_SUB = 0x2 << 21,
+    ARITH_RSB = 0x3 << 21,
+    ARITH_ADD = 0x4 << 21,
+    ARITH_ADC = 0x5 << 21,
+    ARITH_SBC = 0x6 << 21,
+    ARITH_RSC = 0x7 << 21,
+    ARITH_TST = 0x8 << 21 | TO_CPSR,
+    ARITH_CMP = 0xa << 21 | TO_CPSR,
+    ARITH_CMN = 0xb << 21 | TO_CPSR,
+    ARITH_ORR = 0xc << 21,
+    ARITH_MOV = 0xd << 21,
+    ARITH_BIC = 0xe << 21,
+    ARITH_MVN = 0xf << 21,
 };
 
-#define TO_CPSR(opc) \
-  ((opc == ARITH_CMP || opc == ARITH_CMN || opc == ARITH_TST) << 20)
-
 #define SHIFT_IMM_LSL(im)	(((im) << 7) | 0x00)
 #define SHIFT_IMM_LSR(im)	(((im) << 7) | 0x20)
 #define SHIFT_IMM_ASR(im)	(((im) << 7) | 0x40)
@@ -409,7 +408,7 @@ static inline void tcg_out_blx_imm(TCGContext *s, int32_t offset)
 static inline void tcg_out_dat_reg(TCGContext *s,
                 int cond, int opc, int rd, int rn, int rm, int shift)
 {
-    tcg_out32(s, (cond << 28) | (0 << 25) | (opc << 21) | TO_CPSR(opc) |
+    tcg_out32(s, (cond << 28) | (0 << 25) | opc |
                     (rn << 16) | (rd << 12) | shift | rm);
 }
 
@@ -421,29 +420,10 @@ static inline void tcg_out_mov_reg(TCGContext *s, int cond, int rd, int rm)
     }
 }
 
-static inline void tcg_out_dat_reg2(TCGContext *s,
-                int cond, int opc0, int opc1, int rd0, int rd1,
-                int rn0, int rn1, int rm0, int rm1, int shift)
-{
-    if (rd0 == rn1 || rd0 == rm1) {
-        tcg_out32(s, (cond << 28) | (0 << 25) | (opc0 << 21) | (1 << 20) |
-                        (rn0 << 16) | (8 << 12) | shift | rm0);
-        tcg_out32(s, (cond << 28) | (0 << 25) | (opc1 << 21) |
-                        (rn1 << 16) | (rd1 << 12) | shift | rm1);
-        tcg_out_dat_reg(s, cond, ARITH_MOV,
-                        rd0, 0, TCG_REG_R8, SHIFT_IMM_LSL(0));
-    } else {
-        tcg_out32(s, (cond << 28) | (0 << 25) | (opc0 << 21) | (1 << 20) |
-                        (rn0 << 16) | (rd0 << 12) | shift | rm0);
-        tcg_out32(s, (cond << 28) | (0 << 25) | (opc1 << 21) |
-                        (rn1 << 16) | (rd1 << 12) | shift | rm1);
-    }
-}
-
 static inline void tcg_out_dat_imm(TCGContext *s,
                 int cond, int opc, int rd, int rn, int im)
 {
-    tcg_out32(s, (cond << 28) | (1 << 25) | (opc << 21) | TO_CPSR(opc) |
+    tcg_out32(s, (cond << 28) | (1 << 25) | opc |
                     (rn << 16) | (rd << 12) | im);
 }
 
@@ -1563,6 +1543,7 @@ static uint8_t *tb_ret_addr;
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                 const TCGArg *args, const int *const_args)
 {
+    TCGArg a0, a1, a2, a3, a4, a5;
     int c;
 
     switch (opc) {
@@ -1695,14 +1676,44 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_dat_rI(s, COND_AL, c, args[0], args[1], args[2], const_args[2]);
         break;
     case INDEX_op_add2_i32:
-        tcg_out_dat_reg2(s, COND_AL, ARITH_ADD, ARITH_ADC,
-                        args[0], args[1], args[2], args[3],
-                        args[4], args[5], SHIFT_IMM_LSL(0));
+        a0 = args[0], a1 = args[1], a2 = args[2];
+        a3 = args[3], a4 = args[4], a5 = args[5];
+        if (a0 == a3 || (a0 == a5 && !const_args[5])) {
+            a0 = TCG_REG_R8;
+        }
+        tcg_out_dat_rIN(s, COND_AL, ARITH_ADD | TO_CPSR, ARITH_SUB | TO_CPSR,
+                        a0, a2, a4, const_args[4]);
+        tcg_out_dat_rIK(s, COND_AL, ARITH_ADC, ARITH_SBC,
+                        a1, a3, a5, const_args[5]);
+        tcg_out_mov_reg(s, COND_AL, args[0], a0);
         break;
     case INDEX_op_sub2_i32:
-        tcg_out_dat_reg2(s, COND_AL, ARITH_SUB, ARITH_SBC,
-                        args[0], args[1], args[2], args[3],
-                        args[4], args[5], SHIFT_IMM_LSL(0));
+        a0 = args[0], a1 = args[1], a2 = args[2];
+        a3 = args[3], a4 = args[4], a5 = args[5];
+        if ((a0 == a3 && !const_args[3]) || (a0 == a5 && !const_args[5])) {
+            a0 = TCG_REG_R8;
+        }
+        if (const_args[2]) {
+            if (const_args[4]) {
+                tcg_out_movi32(s, COND_AL, a0, a4);
+                a4 = a0;
+            }
+            tcg_out_dat_rI(s, COND_AL, ARITH_RSB | TO_CPSR, a0, a4, a2, 1);
+        } else {
+            tcg_out_dat_rIN(s, COND_AL, ARITH_SUB | TO_CPSR,
+                            ARITH_ADD | TO_CPSR, a0, a2, a4, const_args[4]);
+        }
+        if (const_args[3]) {
+            if (const_args[5]) {
+                tcg_out_movi32(s, COND_AL, a1, a5);
+                a5 = a1;
+            }
+            tcg_out_dat_rI(s, COND_AL, ARITH_RSC, a1, a5, a3, 1);
+        } else {
+            tcg_out_dat_rIK(s, COND_AL, ARITH_SBC, ARITH_ADC,
+                            a1, a3, a5, const_args[5]);
+        }
+        tcg_out_mov_reg(s, COND_AL, args[0], a0);
         break;
     case INDEX_op_neg_i32:
         tcg_out_dat_imm(s, COND_AL, ARITH_RSB, args[0], args[1], 0);
@@ -1889,9 +1900,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_setcond_i32, { "r", "r", "rIN" } },
     { INDEX_op_movcond_i32, { "r", "r", "rIN", "rIK", "0" } },
 
-    /* TODO: "r", "r", "r", "r", "ri", "ri" */
-    { INDEX_op_add2_i32, { "r", "r", "r", "r", "r", "r" } },
-    { INDEX_op_sub2_i32, { "r", "r", "r", "r", "r", "r" } },
+    { INDEX_op_add2_i32, { "r", "r", "r", "r", "rIN", "rIK" } },
+    { INDEX_op_sub2_i32, { "r", "r", "rI", "rI", "rIN", "rIK" } },
     { INDEX_op_brcond2_i32, { "r", "r", "rIN", "rIN" } },
     { INDEX_op_setcond2_i32, { "r", "r", "r", "rIN", "rIN" } },
 
-- 
1.8.1.4

* [Qemu-devel] [PATCH v3 06/20] tcg-arm: Improve constant generation
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

Try fully rotated arguments to mov and mvn before trying movt
or full decomposition.  Begin decomposition with mvn when it
looks like it'll help.  Examples include

-:        mov   r9, #0x00000fa0
-:        orr   r9, r9, #0x000ee000
-:        orr   r9, r9, #0x0ff00000
-:        orr   r9, r9, #0xf0000000
+:        mvn   r9, #0x0000005f
+:        eor   r9, r9, #0x00011000
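
Working the example through (a sketch; the constant above is 0xfffeefa0):

    uint32_t arg = 0xfffeefa0;     /* value built by the old 4-insn chain */
    /* clz32(~arg) = clz32(0x0001105f) = 15 > clz32(arg) = 0, so the
       decomposition starts from MVN on ~arg:
         mvn r9, #0x5f            -> r9 = 0xffffffa0
         eor r9, r9, #0x00011000  -> r9 = 0xfffeefa0  */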

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 67 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 44 insertions(+), 23 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 48cc186..5333fad 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -427,15 +427,31 @@ static inline void tcg_out_dat_imm(TCGContext *s,
                     (rn << 16) | (rd << 12) | im);
 }
 
-static inline void tcg_out_movi32(TCGContext *s,
-                int cond, int rd, uint32_t arg)
-{
-    /* TODO: This is very suboptimal, we can easily have a constant
-     * pool somewhere after all the instructions.  */
-    if ((int)arg < 0 && (int)arg >= -0x100) {
-        tcg_out_dat_imm(s, cond, ARITH_MVN, rd, 0, (~arg) & 0xff);
-    } else if (use_armv7_instructions) {
-        /* use movw/movt */
+static void tcg_out_movi32(TCGContext *s, int cond, int rd, uint32_t arg)
+{
+    int rot, opc, rn;
+
+    /* For armv7, make sure not to use movw+movt when mov/mvn would do.
+       Speed things up by only checking when movt would be required.
+       Prior to armv7, have one go at fully rotated immediates before
+       doing the decomposition thing below.  */
+    if (!use_armv7_instructions || (arg & 0xffff0000)) {
+        rot = encode_imm(arg);
+        if (rot >= 0) {
+            tcg_out_dat_imm(s, cond, ARITH_MOV, rd, 0,
+                            rotl(arg, rot) | (rot << 7));
+            return;
+        }
+        rot = encode_imm(~arg);
+        if (rot >= 0) {
+            tcg_out_dat_imm(s, cond, ARITH_MVN, rd, 0,
+                            rotl(~arg, rot) | (rot << 7));
+            return;
+        }
+    }
+
+    /* Use movw + movt.  */
+    if (use_armv7_instructions) {
         /* movw */
         tcg_out32(s, (cond << 28) | 0x03000000 | (rd << 12)
                   | ((arg << 4) & 0x000f0000) | (arg & 0xfff));
@@ -444,22 +460,27 @@ static inline void tcg_out_movi32(TCGContext *s,
             tcg_out32(s, (cond << 28) | 0x03400000 | (rd << 12)
                       | ((arg >> 12) & 0x000f0000) | ((arg >> 16) & 0xfff));
         }
-    } else {
-        int opc = ARITH_MOV;
-        int rn = 0;
-
-        do {
-            int i, rot;
-
-            i = ctz32(arg) & ~1;
-            rot = ((32 - i) << 7) & 0xf00;
-            tcg_out_dat_imm(s, cond, opc, rd, rn, ((arg >> i) & 0xff) | rot);
-            arg &= ~(0xff << i);
+        return;
+    }
 
-            opc = ARITH_ORR;
-            rn = rd;
-        } while (arg);
+    /* TODO: This is very suboptimal, we can easily have a constant
+       pool somewhere after all the instructions.  */
+    opc = ARITH_MOV;
+    rn = 0;
+    /* If we have lots of leading 1's, we can shorten the sequence by
+       beginning with mvn and then clearing higher bits with eor.  */
+    if (clz32(~arg) > clz32(arg)) {
+        opc = ARITH_MVN, arg = ~arg;
     }
+    do {
+        int i = ctz32(arg) & ~1;
+        rot = ((32 - i) << 7) & 0xf00;
+        tcg_out_dat_imm(s, cond, opc, rd, rn, ((arg >> i) & 0xff) | rot);
+        arg &= ~(0xff << i);
+
+        opc = ARITH_EOR;
+        rn = rd;
+    } while (arg);
 }
 
 static inline void tcg_out_dat_rI(TCGContext *s, int cond, int opc, TCGArg dst,
-- 
1.8.1.4

* [Qemu-devel] [PATCH v3 07/20] tcg-arm: Fold epilogue into INDEX_op_exit_tb
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

The epilogue on ARM is a single pop instruction that pops the return
address into the PC.  Avoid the jump-to-jump in this case.  Use the
standard movi32 routine for loading the return value when that's easy.
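
A sketch of what this means in practice: the entire epilogue is the single
instruction below (the same word the prologue code now saves as tb_pop_ret),
so exit_tb can simply inline it.

    /* ldmia sp!, { r4 - r12, pc }: 0x08bd = ldm sp!, increment-after,
       writeback, load; 0x9ff0 = register list { pc, r12 - r4 }.  */
    tcg_out32(s, (COND_AL << 28) | 0x08bd9ff0);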

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 5333fad..88f5689 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -1559,7 +1559,7 @@ static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
 #endif
 }
 
-static uint8_t *tb_ret_addr;
+static uint32_t tb_pop_ret;
 
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                 const TCGArg *args, const int *const_args)
@@ -1569,17 +1569,15 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     switch (opc) {
     case INDEX_op_exit_tb:
-        {
-            uint8_t *ld_ptr = s->code_ptr;
-            if (args[0] >> 8)
-                tcg_out_ld32_12(s, COND_AL, TCG_REG_R0, TCG_REG_PC, 0);
-            else
-                tcg_out_dat_imm(s, COND_AL, ARITH_MOV, TCG_REG_R0, 0, args[0]);
-            tcg_out_goto(s, COND_AL, (tcg_target_ulong) tb_ret_addr);
-            if (args[0] >> 8) {
-                *ld_ptr = (uint8_t) (s->code_ptr - ld_ptr) - 8;
-                tcg_out32(s, args[0]);
-            }
+        a0 = args[0];
+        if (use_armv7_instructions || check_fit_imm(a0)) {
+            tcg_out_movi32(s, COND_AL, TCG_REG_R0, args[0]);
+            tcg_out32(s, tb_pop_ret);
+        } else {
+            /* pc is always current address + 8, so 0 reads the word.  */
+            tcg_out_ld32_12(s, COND_AL, TCG_REG_R0, TCG_REG_PC, 0);
+            tcg_out32(s, tb_pop_ret);
+            tcg_out32(s, args[0]);
         }
         break;
     case INDEX_op_goto_tb:
@@ -2025,8 +2023,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
 
     tcg_out_bx(s, COND_AL, tcg_target_call_iarg_regs[1]);
-    tb_ret_addr = s->code_ptr;
 
     /* ldmia sp!, { r4 - r12, pc } */
-    tcg_out32(s, (COND_AL << 28) | 0x08bd9ff0);
+    tb_pop_ret = (COND_AL << 28) | 0x08bd9ff0;
 }
-- 
1.8.1.4

* [Qemu-devel] [PATCH v3 08/20] tcg-arm: Implement deposit for armv7
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

We have BFI and BFC available for implementing it.
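
A sketch of the semantics being matched (deposit_i32 dest, t1, t2, ofs, len
replaces len bits of t1 at offset ofs with the low bits of t2):

    uint32_t deposit32(uint32_t t1, uint32_t t2, int ofs, int len)
    {
        /* (2u << (len - 1)) - 1 yields len one-bits, valid for len = 32 */
        uint32_t mask = ((2u << (len - 1)) - 1) << ofs;
        return (t1 & ~mask) | ((t2 << ofs) & mask);
    }

With armv7, bfi rd, rm, #ofs, #len does this in one instruction, and
inserting a constant 0 degenerates to bfc (encoded as bfi with rn == 15).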

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 36 ++++++++++++++++++++++++++++++++++++
 tcg/arm/tcg-target.h |  5 ++++-
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 88f5689..4950eaf 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -702,6 +702,35 @@ static inline void tcg_out_bswap32(TCGContext *s, int cond, int rd, int rn)
     }
 }
 
+bool tcg_target_deposit_valid(int ofs, int len)
+{
+    /* ??? Without bfi, we could improve over generic code by combining
+       the right-shift from a non-zero ofs with the orr.  We do run into
+       problems when rd == rs, and the mask generated from ofs+len don't
+       fit into an immediate.  We would have to be careful not to pessimize
+       wrt the optimizations performed on the expanded code.  */
+    return use_armv7_instructions;
+}
+
+static inline void tcg_out_deposit(TCGContext *s, int cond, TCGReg rd,
+                                   TCGArg a1, int ofs, int len, bool const_a1)
+{
+    if (const_a1) {
+        uint32_t mask = (2u << (len - 1)) - 1;
+        a1 &= mask;
+        if (a1 == 0) {
+            /* bfi becomes bfc with rn == 15.  */
+            a1 = 15;
+        } else {
+            tcg_out_movi32(s, cond, TCG_REG_R8, a1);
+            a1 = TCG_REG_R8;
+        }
+    }
+    /* bfi/bfc */
+    tcg_out32(s, 0x07c00010 | (cond << 28) | (rd << 12) | a1
+              | (ofs << 7) | ((ofs + len - 1) << 16));
+}
+
 static inline void tcg_out_ld32_12(TCGContext *s, int cond,
                 int rd, int rn, tcg_target_long im)
 {
@@ -1873,6 +1902,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_ext16u(s, COND_AL, args[0], args[1]);
         break;
 
+    case INDEX_op_deposit_i32:
+        tcg_out_deposit(s, COND_AL, args[0], args[2],
+                        args[3], args[4], const_args[2]);
+        break;
+
     default:
         tcg_abort();
     }
@@ -1957,6 +1991,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_ext16s_i32, { "r", "r" } },
     { INDEX_op_ext16u_i32, { "r", "r" } },
 
+    { INDEX_op_deposit_i32, { "r", "0", "ri" } },
+
     { -1 },
 };
 
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 354dd8a..209f585 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -71,10 +71,13 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
-#define TCG_TARGET_HAS_deposit_i32      0
+#define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_muls2_i32        1
 
+extern bool tcg_target_deposit_valid(int ofs, int len);
+#define TCG_TARGET_deposit_i32_valid  tcg_target_deposit_valid
+
 enum {
     TCG_AREG0 = TCG_REG_R6,
 };
-- 
1.8.1.4

* [Qemu-devel] [PATCH v3 09/20] tcg-arm: Implement division instructions
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

An ARMv7 extension, present e.g. on the Cortex-A15, implements integer division.
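
sdiv/udiv produce only the quotient; there is no hardware remainder, so
(as the patch does below) rem is composed from divide, multiply and
subtract -- a sketch of the identity used:

    int32_t rem(int32_t a, int32_t b)
    {
        /* sdiv tmp, a, b ; mul tmp, tmp, b ; sub res, a, tmp */
        return a - (a / b) * b;   /* truncated division, as in C */
    }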

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 disas/arm.c          |  4 ++++
 tcg/arm/tcg-target.c | 36 ++++++++++++++++++++++++++++++++++++
 tcg/arm/tcg-target.h |  7 ++++++-
 3 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/disas/arm.c b/disas/arm.c
index 4927d8a..76e97a8 100644
--- a/disas/arm.c
+++ b/disas/arm.c
@@ -819,6 +819,10 @@ static const struct opcode32 arm_opcodes[] =
   {ARM_EXT_V3M, 0x00800090, 0x0fa000f0, "%22?sumull%20's%c\t%12-15r, %16-19r, %0-3r, %8-11r"},
   {ARM_EXT_V3M, 0x00a00090, 0x0fa000f0, "%22?sumlal%20's%c\t%12-15r, %16-19r, %0-3r, %8-11r"},
 
+  /* IDIV instructions.  */
+  {ARM_EXT_DIV, 0x0710f010, 0x0ff0f0f0, "sdiv%c\t%16-19r, %0-3r, %8-11r"},
+  {ARM_EXT_DIV, 0x0730f010, 0x0ff0f0f0, "udiv%c\t%16-19r, %0-3r, %8-11r"},
+
   /* V7 instructions.  */
   {ARM_EXT_V7, 0xf450f000, 0xfd70f000, "pli\t%P"},
   {ARM_EXT_V7, 0x0320f0f0, 0x0ffffff0, "dbg%c\t#%0-3d"},
diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 4950eaf..e599794 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -591,6 +591,16 @@ static inline void tcg_out_smull32(TCGContext *s,
     }
 }
 
+static inline void tcg_out_sdiv(TCGContext *s, int cond, int rd, int rn, int rm)
+{
+    tcg_out32(s, 0x0710f010 | (cond << 28) | (rd << 16) | rn | (rm << 8));
+}
+
+static inline void tcg_out_udiv(TCGContext *s, int cond, int rd, int rn, int rm)
+{
+    tcg_out32(s, 0x0730f010 | (cond << 28) | (rd << 16) | rn | (rm << 8));
+}
+
 static inline void tcg_out_ext8s(TCGContext *s, int cond,
                                  int rd, int rn)
 {
@@ -1907,6 +1917,25 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                         args[3], args[4], const_args[2]);
         break;
 
+    case INDEX_op_div_i32:
+        tcg_out_sdiv(s, COND_AL, args[0], args[1], args[2]);
+        break;
+    case INDEX_op_divu_i32:
+        tcg_out_udiv(s, COND_AL, args[0], args[1], args[2]);
+        break;
+    case INDEX_op_rem_i32:
+        tcg_out_sdiv(s, COND_AL, TCG_REG_R8, args[1], args[2]);
+        tcg_out_mul32(s, COND_AL, TCG_REG_R8, TCG_REG_R8, args[2]);
+        tcg_out_dat_reg(s, COND_AL, ARITH_SUB, args[0], args[1], TCG_REG_R8,
+                        SHIFT_IMM_LSL(0));
+        break;
+    case INDEX_op_remu_i32:
+        tcg_out_udiv(s, COND_AL, TCG_REG_R8, args[1], args[2]);
+        tcg_out_mul32(s, COND_AL, TCG_REG_R8, TCG_REG_R8, args[2]);
+        tcg_out_dat_reg(s, COND_AL, ARITH_SUB, args[0], args[1], TCG_REG_R8,
+                        SHIFT_IMM_LSL(0));
+        break;
+
     default:
         tcg_abort();
     }
@@ -1993,6 +2022,13 @@ static const TCGTargetOpDef arm_op_defs[] = {
 
     { INDEX_op_deposit_i32, { "r", "0", "ri" } },
 
+#if TCG_TARGET_HAS_div_i32
+    { INDEX_op_div_i32, { "r", "r", "r" } },
+    { INDEX_op_rem_i32, { "r", "r", "r" } },
+    { INDEX_op_divu_i32, { "r", "r", "r" } },
+    { INDEX_op_remu_i32, { "r", "r", "r" } },
+#endif
+
     { -1 },
 };
 
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 209f585..3be41cc 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -56,7 +56,6 @@ typedef enum {
 #define TCG_TARGET_CALL_STACK_OFFSET	0
 
 /* optional instructions */
-#define TCG_TARGET_HAS_div_i32          0
 #define TCG_TARGET_HAS_ext8s_i32        1
 #define TCG_TARGET_HAS_ext16s_i32       1
 #define TCG_TARGET_HAS_ext8u_i32        0 /* and r0, r1, #0xff */
@@ -75,6 +74,12 @@ typedef enum {
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_muls2_i32        1
 
+#ifdef __ARM_ARCH_EXT_IDIV__
+#define TCG_TARGET_HAS_div_i32          1
+#else
+#define TCG_TARGET_HAS_div_i32          0
+#endif
+
 extern bool tcg_target_deposit_valid(int ofs, int len);
 #define TCG_TARGET_deposit_i32_valid  tcg_target_deposit_valid
 
-- 
1.8.1.4

* [Qemu-devel] [PATCH v3 10/20] tcg-arm: Use TCG_REG_TMP name for the tcg temporary
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

Don't hard-code R8.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 125 ++++++++++++++++++++++++++-------------------------
 1 file changed, 64 insertions(+), 61 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index e599794..90440fb 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -113,6 +113,8 @@ static const int tcg_target_call_oarg_regs[2] = {
     TCG_REG_R0, TCG_REG_R1
 };
 
+#define TCG_REG_TMP  TCG_REG_R8
+
 static inline void reloc_abs32(void *code_ptr, tcg_target_long target)
 {
     *(uint32_t *) code_ptr = target;
@@ -550,10 +552,10 @@ static inline void tcg_out_mul32(TCGContext *s,
         tcg_out32(s, (cond << 28) | (rd << 16) | (0 << 12) |
                         (rm << 8) | 0x90 | rs);
     else {
-        tcg_out32(s, (cond << 28) | ( 8 << 16) | (0 << 12) |
+        tcg_out32(s, (cond << 28) | (TCG_REG_TMP << 16) | (0 << 12) |
                         (rs << 8) | 0x90 | rm);
         tcg_out_dat_reg(s, cond, ARITH_MOV,
-                        rd, 0, TCG_REG_R8, SHIFT_IMM_LSL(0));
+                        rd, 0, TCG_REG_TMP, SHIFT_IMM_LSL(0));
     }
 }
 
@@ -568,8 +570,8 @@ static inline void tcg_out_umull32(TCGContext *s,
                         (rd1 << 16) | (rd0 << 12) | (rm << 8) | rs);
     else {
         tcg_out_dat_reg(s, cond, ARITH_MOV,
-                        TCG_REG_R8, 0, rm, SHIFT_IMM_LSL(0));
-        tcg_out32(s, (cond << 28) | 0x800098 |
+                        TCG_REG_TMP, 0, rm, SHIFT_IMM_LSL(0));
+        tcg_out32(s, (cond << 28) | 0x800090 | TCG_REG_TMP |
                         (rd1 << 16) | (rd0 << 12) | (rs << 8));
     }
 }
@@ -585,8 +587,8 @@ static inline void tcg_out_smull32(TCGContext *s,
                         (rd1 << 16) | (rd0 << 12) | (rm << 8) | rs);
     else {
         tcg_out_dat_reg(s, cond, ARITH_MOV,
-                        TCG_REG_R8, 0, rm, SHIFT_IMM_LSL(0));
-        tcg_out32(s, (cond << 28) | 0xc00098 |
+                        TCG_REG_TMP, 0, rm, SHIFT_IMM_LSL(0));
+        tcg_out32(s, (cond << 28) | 0xc00090 | TCG_REG_TMP |
                         (rd1 << 16) | (rd0 << 12) | (rs << 8));
     }
 }
@@ -656,11 +658,11 @@ static inline void tcg_out_bswap16s(TCGContext *s, int cond, int rd, int rn)
         tcg_out32(s, 0x06ff0fb0 | (cond << 28) | (rd << 12) | rn);
     } else {
         tcg_out_dat_reg(s, cond, ARITH_MOV,
-                        TCG_REG_R8, 0, rn, SHIFT_IMM_LSL(24));
+                        TCG_REG_TMP, 0, rn, SHIFT_IMM_LSL(24));
         tcg_out_dat_reg(s, cond, ARITH_MOV,
-                        TCG_REG_R8, 0, TCG_REG_R8, SHIFT_IMM_ASR(16));
+                        TCG_REG_TMP, 0, TCG_REG_TMP, SHIFT_IMM_ASR(16));
         tcg_out_dat_reg(s, cond, ARITH_ORR,
-                        rd, TCG_REG_R8, rn, SHIFT_IMM_LSR(8));
+                        rd, TCG_REG_TMP, rn, SHIFT_IMM_LSR(8));
     }
 }
 
@@ -671,11 +673,11 @@ static inline void tcg_out_bswap16(TCGContext *s, int cond, int rd, int rn)
         tcg_out32(s, 0x06bf0fb0 | (cond << 28) | (rd << 12) | rn);
     } else {
         tcg_out_dat_reg(s, cond, ARITH_MOV,
-                        TCG_REG_R8, 0, rn, SHIFT_IMM_LSL(24));
+                        TCG_REG_TMP, 0, rn, SHIFT_IMM_LSL(24));
         tcg_out_dat_reg(s, cond, ARITH_MOV,
-                        TCG_REG_R8, 0, TCG_REG_R8, SHIFT_IMM_LSR(16));
+                        TCG_REG_TMP, 0, TCG_REG_TMP, SHIFT_IMM_LSR(16));
         tcg_out_dat_reg(s, cond, ARITH_ORR,
-                        rd, TCG_REG_R8, rn, SHIFT_IMM_LSR(8));
+                        rd, TCG_REG_TMP, rn, SHIFT_IMM_LSR(8));
     }
 }
 
@@ -688,10 +690,10 @@ static inline void tcg_out_bswap16st(TCGContext *s, int cond, int rd, int rn)
         tcg_out32(s, 0x06bf0fb0 | (cond << 28) | (rd << 12) | rn);
     } else {
         tcg_out_dat_reg(s, cond, ARITH_MOV,
-                        TCG_REG_R8, 0, rn, SHIFT_IMM_LSR(8));
-        tcg_out_dat_imm(s, cond, ARITH_AND, TCG_REG_R8, TCG_REG_R8, 0xff);
+                        TCG_REG_TMP, 0, rn, SHIFT_IMM_LSR(8));
+        tcg_out_dat_imm(s, cond, ARITH_AND, TCG_REG_TMP, TCG_REG_TMP, 0xff);
         tcg_out_dat_reg(s, cond, ARITH_ORR,
-                        rd, TCG_REG_R8, rn, SHIFT_IMM_LSL(8));
+                        rd, TCG_REG_TMP, rn, SHIFT_IMM_LSL(8));
     }
 }
 
@@ -702,13 +704,13 @@ static inline void tcg_out_bswap32(TCGContext *s, int cond, int rd, int rn)
         tcg_out32(s, 0x06bf0f30 | (cond << 28) | (rd << 12) | rn);
     } else {
         tcg_out_dat_reg(s, cond, ARITH_EOR,
-                        TCG_REG_R8, rn, rn, SHIFT_IMM_ROR(16));
+                        TCG_REG_TMP, rn, rn, SHIFT_IMM_ROR(16));
         tcg_out_dat_imm(s, cond, ARITH_BIC,
-                        TCG_REG_R8, TCG_REG_R8, 0xff | 0x800);
+                        TCG_REG_TMP, TCG_REG_TMP, 0xff | 0x800);
         tcg_out_dat_reg(s, cond, ARITH_MOV,
                         rd, 0, rn, SHIFT_IMM_ROR(8));
         tcg_out_dat_reg(s, cond, ARITH_EOR,
-                        rd, rd, TCG_REG_R8, SHIFT_IMM_LSR(8));
+                        rd, rd, TCG_REG_TMP, SHIFT_IMM_LSR(8));
     }
 }
 
@@ -732,8 +734,8 @@ static inline void tcg_out_deposit(TCGContext *s, int cond, TCGReg rd,
             /* bfi becomes bfc with rn == 15.  */
             a1 = 15;
         } else {
-            tcg_out_movi32(s, cond, TCG_REG_R8, a1);
-            a1 = TCG_REG_R8;
+            tcg_out_movi32(s, cond, TCG_REG_TMP, a1);
+            a1 = TCG_REG_TMP;
         }
     }
     /* bfi/bfc */
@@ -928,8 +930,8 @@ static inline void tcg_out_ld32u(TCGContext *s, int cond,
                 int rd, int rn, int32_t offset)
 {
     if (offset > 0xfff || offset < -0xfff) {
-        tcg_out_movi32(s, cond, TCG_REG_R8, offset);
-        tcg_out_ld32_r(s, cond, rd, rn, TCG_REG_R8);
+        tcg_out_movi32(s, cond, TCG_REG_TMP, offset);
+        tcg_out_ld32_r(s, cond, rd, rn, TCG_REG_TMP);
     } else
         tcg_out_ld32_12(s, cond, rd, rn, offset);
 }
@@ -938,8 +940,8 @@ static inline void tcg_out_st32(TCGContext *s, int cond,
                 int rd, int rn, int32_t offset)
 {
     if (offset > 0xfff || offset < -0xfff) {
-        tcg_out_movi32(s, cond, TCG_REG_R8, offset);
-        tcg_out_st32_r(s, cond, rd, rn, TCG_REG_R8);
+        tcg_out_movi32(s, cond, TCG_REG_TMP, offset);
+        tcg_out_st32_r(s, cond, rd, rn, TCG_REG_TMP);
     } else
         tcg_out_st32_12(s, cond, rd, rn, offset);
 }
@@ -948,8 +950,8 @@ static inline void tcg_out_ld16u(TCGContext *s, int cond,
                 int rd, int rn, int32_t offset)
 {
     if (offset > 0xff || offset < -0xff) {
-        tcg_out_movi32(s, cond, TCG_REG_R8, offset);
-        tcg_out_ld16u_r(s, cond, rd, rn, TCG_REG_R8);
+        tcg_out_movi32(s, cond, TCG_REG_TMP, offset);
+        tcg_out_ld16u_r(s, cond, rd, rn, TCG_REG_TMP);
     } else
         tcg_out_ld16u_8(s, cond, rd, rn, offset);
 }
@@ -958,8 +960,8 @@ static inline void tcg_out_ld16s(TCGContext *s, int cond,
                 int rd, int rn, int32_t offset)
 {
     if (offset > 0xff || offset < -0xff) {
-        tcg_out_movi32(s, cond, TCG_REG_R8, offset);
-        tcg_out_ld16s_r(s, cond, rd, rn, TCG_REG_R8);
+        tcg_out_movi32(s, cond, TCG_REG_TMP, offset);
+        tcg_out_ld16s_r(s, cond, rd, rn, TCG_REG_TMP);
     } else
         tcg_out_ld16s_8(s, cond, rd, rn, offset);
 }
@@ -968,8 +970,8 @@ static inline void tcg_out_st16(TCGContext *s, int cond,
                 int rd, int rn, int32_t offset)
 {
     if (offset > 0xff || offset < -0xff) {
-        tcg_out_movi32(s, cond, TCG_REG_R8, offset);
-        tcg_out_st16_r(s, cond, rd, rn, TCG_REG_R8);
+        tcg_out_movi32(s, cond, TCG_REG_TMP, offset);
+        tcg_out_st16_r(s, cond, rd, rn, TCG_REG_TMP);
     } else
         tcg_out_st16_8(s, cond, rd, rn, offset);
 }
@@ -978,8 +980,8 @@ static inline void tcg_out_ld8u(TCGContext *s, int cond,
                 int rd, int rn, int32_t offset)
 {
     if (offset > 0xfff || offset < -0xfff) {
-        tcg_out_movi32(s, cond, TCG_REG_R8, offset);
-        tcg_out_ld8_r(s, cond, rd, rn, TCG_REG_R8);
+        tcg_out_movi32(s, cond, TCG_REG_TMP, offset);
+        tcg_out_ld8_r(s, cond, rd, rn, TCG_REG_TMP);
     } else
         tcg_out_ld8_12(s, cond, rd, rn, offset);
 }
@@ -988,8 +990,8 @@ static inline void tcg_out_ld8s(TCGContext *s, int cond,
                 int rd, int rn, int32_t offset)
 {
     if (offset > 0xff || offset < -0xff) {
-        tcg_out_movi32(s, cond, TCG_REG_R8, offset);
-        tcg_out_ld8s_r(s, cond, rd, rn, TCG_REG_R8);
+        tcg_out_movi32(s, cond, TCG_REG_TMP, offset);
+        tcg_out_ld8s_r(s, cond, rd, rn, TCG_REG_TMP);
     } else
         tcg_out_ld8s_8(s, cond, rd, rn, offset);
 }
@@ -998,8 +1000,8 @@ static inline void tcg_out_st8(TCGContext *s, int cond,
                 int rd, int rn, int32_t offset)
 {
     if (offset > 0xfff || offset < -0xfff) {
-        tcg_out_movi32(s, cond, TCG_REG_R8, offset);
-        tcg_out_st8_r(s, cond, rd, rn, TCG_REG_R8);
+        tcg_out_movi32(s, cond, TCG_REG_TMP, offset);
+        tcg_out_st8_r(s, cond, rd, rn, TCG_REG_TMP);
     } else
         tcg_out_st8_12(s, cond, rd, rn, offset);
 }
@@ -1027,10 +1029,10 @@ static inline void tcg_out_goto(TCGContext *s, int cond, uint32_t addr)
             tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4);
             tcg_out32(s, addr);
         } else {
-            tcg_out_movi32(s, cond, TCG_REG_R8, val - 8);
+            tcg_out_movi32(s, cond, TCG_REG_TMP, val - 8);
             tcg_out_dat_reg(s, cond, ARITH_ADD,
                             TCG_REG_PC, TCG_REG_PC,
-                            TCG_REG_R8, SHIFT_IMM_LSL(0));
+                            TCG_REG_TMP, SHIFT_IMM_LSL(0));
         }
     }
 }
@@ -1131,12 +1133,13 @@ static const void * const qemu_st_helpers[4] = {
         if (argreg < 4) {                                                  \
             TCG_OUT_ARG_GET_ARG(argreg);                                   \
         } else if (argreg == 4) {                                          \
-            TCG_OUT_ARG_GET_ARG(TCG_REG_R8);                               \
-            tcg_out32(s, (COND_AL << 28) | 0x052d8010);                    \
+            TCG_OUT_ARG_GET_ARG(TCG_REG_TMP);                              \
+            tcg_out32(s, (COND_AL << 28) | 0x052d0010 | (TCG_REG_TMP << 12)); \
         } else {                                                           \
             assert(argreg < 8);                                            \
-            TCG_OUT_ARG_GET_ARG(TCG_REG_R8);                               \
-            tcg_out32(s, (COND_AL << 28) | 0x058d8000 | (argreg - 4) * 4); \
+            TCG_OUT_ARG_GET_ARG(TCG_REG_TMP);                              \
+            tcg_out_st32_12(s, COND_AL, TCG_REG_TMP, TCG_REG_CALL_STACK,   \
+                            (argreg - 4) * 4);                             \
         }                                                                  \
         return argreg + 1;                                                 \
     }
@@ -1234,10 +1237,10 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 #  if CPU_TLB_BITS > 8
 #   error
 #  endif
-    tcg_out_dat_reg(s, COND_AL, ARITH_MOV, TCG_REG_R8,
+    tcg_out_dat_reg(s, COND_AL, ARITH_MOV, TCG_REG_TMP,
                     0, addr_reg, SHIFT_IMM_LSR(TARGET_PAGE_BITS));
     tcg_out_dat_imm(s, COND_AL, ARITH_AND,
-                    TCG_REG_R0, TCG_REG_R8, CPU_TLB_SIZE - 1);
+                    TCG_REG_R0, TCG_REG_TMP, CPU_TLB_SIZE - 1);
     tcg_out_dat_reg(s, COND_AL, ARITH_ADD, TCG_REG_R0, TCG_AREG0,
                     TCG_REG_R0, SHIFT_IMM_LSL(CPU_TLB_ENTRY_BITS));
     /* We assume that the offset is contained within 20 bits.  */
@@ -1250,7 +1253,7 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
     }
     tcg_out_ld32_12wb(s, COND_AL, TCG_REG_R1, TCG_REG_R0, tlb_offset);
     tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, TCG_REG_R1,
-                    TCG_REG_R8, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
+                    TCG_REG_TMP, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
     /* Check alignment.  */
     if (s_bits)
         tcg_out_dat_imm(s, COND_EQ, ARITH_TST,
@@ -1355,9 +1358,9 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
             i = ctz32(offset) & ~1;
             rot = ((32 - i) << 7) & 0xf00;
 
-            tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R8, addr_reg,
+            tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_TMP, addr_reg,
                             ((offset >> i) & 0xff) | rot);
-            addr_reg = TCG_REG_R8;
+            addr_reg = TCG_REG_TMP;
             offset &= ~(0xff << i);
         }
     }
@@ -1444,9 +1447,9 @@ static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
      *  add r0, env, r0 lsl #CPU_TLB_ENTRY_BITS
      */
     tcg_out_dat_reg(s, COND_AL, ARITH_MOV,
-                    TCG_REG_R8, 0, addr_reg, SHIFT_IMM_LSR(TARGET_PAGE_BITS));
+                    TCG_REG_TMP, 0, addr_reg, SHIFT_IMM_LSR(TARGET_PAGE_BITS));
     tcg_out_dat_imm(s, COND_AL, ARITH_AND,
-                    TCG_REG_R0, TCG_REG_R8, CPU_TLB_SIZE - 1);
+                    TCG_REG_R0, TCG_REG_TMP, CPU_TLB_SIZE - 1);
     tcg_out_dat_reg(s, COND_AL, ARITH_ADD, TCG_REG_R0,
                     TCG_AREG0, TCG_REG_R0, SHIFT_IMM_LSL(CPU_TLB_ENTRY_BITS));
     /* We assume that the offset is contained within 20 bits.  */
@@ -1459,7 +1462,7 @@ static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
     }
     tcg_out_ld32_12wb(s, COND_AL, TCG_REG_R1, TCG_REG_R0, tlb_offset);
     tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, TCG_REG_R1,
-                    TCG_REG_R8, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
+                    TCG_REG_TMP, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
     /* Check alignment.  */
     if (s_bits)
         tcg_out_dat_imm(s, COND_EQ, ARITH_TST,
@@ -1737,7 +1740,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         a0 = args[0], a1 = args[1], a2 = args[2];
         a3 = args[3], a4 = args[4], a5 = args[5];
         if (a0 == a3 || (a0 == a5 && !const_args[5])) {
-            a0 = TCG_REG_R8;
+            a0 = TCG_REG_TMP;
         }
         tcg_out_dat_rIN(s, COND_AL, ARITH_ADD | TO_CPSR, ARITH_SUB | TO_CPSR,
                         a0, a2, a4, const_args[4]);
@@ -1749,7 +1752,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         a0 = args[0], a1 = args[1], a2 = args[2];
         a3 = args[3], a4 = args[4], a5 = args[5];
         if ((a0 == a3 && !const_args[3]) || (a0 == a5 && !const_args[5])) {
-            a0 = TCG_REG_R8;
+            a0 = TCG_REG_TMP;
         }
         if (const_args[2]) {
             if (const_args[4]) {
@@ -1817,9 +1820,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                             SHIFT_IMM_ROR((0x20 - args[2]) & 0x1f) :
                             SHIFT_IMM_LSL(0));
         } else {
-            tcg_out_dat_imm(s, COND_AL, ARITH_RSB, TCG_REG_R8, args[1], 0x20);
+            tcg_out_dat_imm(s, COND_AL, ARITH_RSB, TCG_REG_TMP, args[1], 0x20);
             tcg_out_dat_reg(s, COND_AL, ARITH_MOV, args[0], 0, args[1],
-                            SHIFT_REG_ROR(TCG_REG_R8));
+                            SHIFT_REG_ROR(TCG_REG_TMP));
         }
         break;
 
@@ -1924,15 +1927,15 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_udiv(s, COND_AL, args[0], args[1], args[2]);
         break;
     case INDEX_op_rem_i32:
-        tcg_out_sdiv(s, COND_AL, TCG_REG_R8, args[1], args[2]);
-        tcg_out_mul32(s, COND_AL, TCG_REG_R8, TCG_REG_R8, args[2]);
-        tcg_out_dat_reg(s, COND_AL, ARITH_SUB, args[0], args[1], TCG_REG_R8,
+        tcg_out_sdiv(s, COND_AL, TCG_REG_TMP, args[1], args[2]);
+        tcg_out_mul32(s, COND_AL, TCG_REG_TMP, TCG_REG_TMP, args[2]);
+        tcg_out_dat_reg(s, COND_AL, ARITH_SUB, args[0], args[1], TCG_REG_TMP,
                         SHIFT_IMM_LSL(0));
         break;
     case INDEX_op_remu_i32:
-        tcg_out_udiv(s, COND_AL, TCG_REG_R8, args[1], args[2]);
-        tcg_out_mul32(s, COND_AL, TCG_REG_R8, TCG_REG_R8, args[2]);
-        tcg_out_dat_reg(s, COND_AL, ARITH_SUB, args[0], args[1], TCG_REG_R8,
+        tcg_out_udiv(s, COND_AL, TCG_REG_TMP, args[1], args[2]);
+        tcg_out_mul32(s, COND_AL, TCG_REG_TMP, TCG_REG_TMP, args[2]);
+        tcg_out_dat_reg(s, COND_AL, ARITH_SUB, args[0], args[1], TCG_REG_TMP,
                         SHIFT_IMM_LSL(0));
         break;
 
@@ -2051,7 +2054,7 @@ static void tcg_target_init(TCGContext *s)
 
     tcg_regset_clear(s->reserved_regs);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
-    tcg_regset_set_reg(s->reserved_regs, TCG_REG_R8);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_PC);
 
     tcg_add_target_add_op_defs(arm_op_defs);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v3 11/20] tcg-arm: Use R12 for the tcg temporary
  2013-03-28 15:32 [Qemu-devel] [PATCH v3 00/20] tcg-arm improvements Richard Henderson
                   ` (9 preceding siblings ...)
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 10/20] tcg-arm: Use TCG_REG_TMP name for the tcg temporary Richard Henderson
@ 2013-03-28 15:32 ` Richard Henderson
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 12/20] tcg-arm: Cleanup multiply subroutines Richard Henderson
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

R12 is call-clobbered, while R8 is call-saved.  This change
gives tcg one more call-saved register for real data.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 90440fb..d7b8b88 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -113,7 +113,7 @@ static const int tcg_target_call_oarg_regs[2] = {
     TCG_REG_R0, TCG_REG_R1
 };
 
-#define TCG_REG_TMP  TCG_REG_R8
+#define TCG_REG_TMP  TCG_REG_R12
 
 static inline void reloc_abs32(void *code_ptr, tcg_target_long target)
 {
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v3 12/20] tcg-arm: Cleanup multiply subroutines
  2013-03-28 15:32 [Qemu-devel] [PATCH v3 00/20] tcg-arm improvements Richard Henderson
                   ` (10 preceding siblings ...)
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 11/20] tcg-arm: Use R12 " Richard Henderson
@ 2013-03-28 15:32 ` Richard Henderson
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 13/20] tcg-arm: Cleanup tcg_out_goto_label Richard Henderson
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

Make the code more readable by keeping only one copy of the magic
numbers, swapping registers as needed beforehand.  Speed the
compiler by not applying the rd == rn avoidance for v6 or later,
where the restriction does not apply.
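
For illustration, a minimal standalone sketch of the new operand
selection (editor's sketch, not part of the patch; TCG_REG_TMP's
value is assumed): since multiplication is commutative, swapping
rn and rm usually avoids the pre-v6 rd == rn restriction, and only
rd == rn == rm still needs the copy through the temporary.

    #include <assert.h>

    typedef int TCGReg;
    #define TCG_REG_TMP 12          /* assumed value, for the sketch */

    /* Choose operands for "mul rd, rn, rm" so that rd != rn on
       pre-v6 cores.  Returns nonzero when a mov of an input into
       TCG_REG_TMP must be emitted first. */
    static int mul_pick_operands(int v6, TCGReg rd, TCGReg *rn, TCGReg *rm)
    {
        if (v6 || rd != *rn) {
            return 0;               /* encoding already valid */
        }
        if (rd == *rm) {            /* rd == rn == rm */
            *rn = *rm = TCG_REG_TMP;
            return 1;
        }
        /* rd == rn != rm: swap the commutative operands instead. */
        TCGReg t = *rn;
        *rn = *rm;
        *rm = t;
        return 0;
    }

    int main(void)
    {
        TCGReg rn = 3, rm = 3;
        assert(mul_pick_operands(0, 3, &rn, &rm) == 1);  /* all equal */
        assert(rn == TCG_REG_TMP && rm == TCG_REG_TMP);
        return 0;
    }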

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 85 +++++++++++++++++++++++++++-------------------------
 1 file changed, 45 insertions(+), 40 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index d7b8b88..e5ec999 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -542,55 +542,60 @@ static void tcg_out_dat_rIN(TCGContext *s, int cond, int opc, int opneg,
     }
 }
 
-static inline void tcg_out_mul32(TCGContext *s,
-                int cond, int rd, int rs, int rm)
-{
-    if (rd != rm)
-        tcg_out32(s, (cond << 28) | (rd << 16) | (0 << 12) |
-                        (rs << 8) | 0x90 | rm);
-    else if (rd != rs)
-        tcg_out32(s, (cond << 28) | (rd << 16) | (0 << 12) |
-                        (rm << 8) | 0x90 | rs);
-    else {
-        tcg_out32(s, (cond << 28) | (TCG_REG_TMP << 16) | (0 << 12) |
-                        (rs << 8) | 0x90 | rm);
-        tcg_out_dat_reg(s, cond, ARITH_MOV,
-                        rd, 0, TCG_REG_TMP, SHIFT_IMM_LSL(0));
+static inline void tcg_out_mul32(TCGContext *s, int cond, TCGReg rd,
+                                 TCGReg rn, TCGReg rm)
+{
+    /* if ArchVersion() < 6 && d == n then UNPREDICTABLE;  */
+    if (!use_armv6_instructions && rd == rn) {
+        if (rd == rm) {
+            /* rd == rn == rm; copy an input to tmp first.  */
+            tcg_out_mov_reg(s, cond, TCG_REG_TMP, rn);
+            rm = rn = TCG_REG_TMP;
+        } else {
+            rn = rm;
+            rm = rd;
+        }
     }
+    /* mul */
+    tcg_out32(s, (cond << 28) | 0x90 | (rd << 16) | (rm << 8) | rn);
 }
 
-static inline void tcg_out_umull32(TCGContext *s,
-                int cond, int rd0, int rd1, int rs, int rm)
+static inline void tcg_out_umull32(TCGContext *s, int cond, TCGReg rd0,
+                                   TCGReg rd1, TCGReg rn, TCGReg rm)
 {
-    if (rd0 != rm && rd1 != rm)
-        tcg_out32(s, (cond << 28) | 0x800090 |
-                        (rd1 << 16) | (rd0 << 12) | (rs << 8) | rm);
-    else if (rd0 != rs && rd1 != rs)
-        tcg_out32(s, (cond << 28) | 0x800090 |
-                        (rd1 << 16) | (rd0 << 12) | (rm << 8) | rs);
-    else {
-        tcg_out_dat_reg(s, cond, ARITH_MOV,
-                        TCG_REG_TMP, 0, rm, SHIFT_IMM_LSL(0));
-        tcg_out32(s, (cond << 28) | 0x800090 | TCG_REG_TMP |
-                        (rd1 << 16) | (rd0 << 12) | (rs << 8));
+    /* if ArchVersion() < 6 && (dHi == n || dLo == n) then UNPREDICTABLE;  */
+    if (!use_armv6_instructions && (rd0 == rn || rd1 == rn)) {
+        if (rd0 == rm || rd1 == rm) {
+            tcg_out_mov_reg(s, cond, TCG_REG_TMP, rn);
+            rn = TCG_REG_TMP;
+        } else {
+            TCGReg t = rn;
+            rn = rm;
+            rm = t;
+        }
     }
+    /* umull */
+    tcg_out32(s, (cond << 28) | 0x00800090 |
+              (rd1 << 16) | (rd0 << 12) | (rm << 8) | rn);
 }
 
-static inline void tcg_out_smull32(TCGContext *s,
-                int cond, int rd0, int rd1, int rs, int rm)
+static inline void tcg_out_smull32(TCGContext *s, int cond, TCGReg rd0,
+                                   TCGReg rd1, TCGReg rn, TCGReg rm)
 {
-    if (rd0 != rm && rd1 != rm)
-        tcg_out32(s, (cond << 28) | 0xc00090 |
-                        (rd1 << 16) | (rd0 << 12) | (rs << 8) | rm);
-    else if (rd0 != rs && rd1 != rs)
-        tcg_out32(s, (cond << 28) | 0xc00090 |
-                        (rd1 << 16) | (rd0 << 12) | (rm << 8) | rs);
-    else {
-        tcg_out_dat_reg(s, cond, ARITH_MOV,
-                        TCG_REG_TMP, 0, rm, SHIFT_IMM_LSL(0));
-        tcg_out32(s, (cond << 28) | 0xc00090 | TCG_REG_TMP |
-                        (rd1 << 16) | (rd0 << 12) | (rs << 8));
+    /* if ArchVersion() < 6 && (dHi == n || dLo == n) then UNPREDICTABLE;  */
+    if (!use_armv6_instructions && (rd0 == rn || rd1 == rn)) {
+        if (rd0 == rm || rd1 == rm) {
+            tcg_out_mov_reg(s, cond, TCG_REG_TMP, rn);
+            rn = TCG_REG_TMP;
+        } else {
+            TCGReg t = rn;
+            rn = rm;
+            rm = t;
+        }
     }
+    /* smull */
+    tcg_out32(s, (cond << 28) | 0x00c00090 |
+              (rd1 << 16) | (rd0 << 12) | (rm << 8) | rn);
 }
 
 static inline void tcg_out_sdiv(TCGContext *s, int cond, int rd, int rn, int rm)
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v3 13/20] tcg-arm: Cleanup tcg_out_goto_label
  2013-03-28 15:32 [Qemu-devel] [PATCH v3 00/20] tcg-arm improvements Richard Henderson
                   ` (11 preceding siblings ...)
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 12/20] tcg-arm: Cleanup multiply subroutines Richard Henderson
@ 2013-03-28 15:32 ` Richard Henderson
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 14/20] tcg-arm: Cleanup goto_tb handling Richard Henderson
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

Since we folded the epilogue into exit_tb directly, we no longer
have a long direct branch.  Fold tcg_out_goto into its only user
and eliminate the long branch case.  Drop the long forward branch
case from tcg_out_goto_label.
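
For reference, a sketch of the range being asserted (editor's
illustration, using the same constants as the new assert below;
the ARM pc reads as the branch address plus 8):

    #include <assert.h>
    #include <stdint.h>

    /* True when a B<cond> at 'from' can reach 'to': the insn encodes
       a signed 24-bit word offset relative to from + 8. */
    static int b_cond_in_range(intptr_t from, intptr_t to)
    {
        intptr_t val = to - from;
        return val - 8 < 0x01fffffd && val - 8 > -0x01fffffd;
    }

    int main(void)
    {
        assert(b_cond_in_range(0, 16 * 1024 * 1024));   /* 16MB forward */
        assert(b_cond_in_range(16 * 1024 * 1024, 0));   /* 16MB backward */
        return 0;
    }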

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 42 ++++--------------------------------------
 1 file changed, 4 insertions(+), 38 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index e5ec999..7bcba19 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -1011,37 +1011,6 @@ static inline void tcg_out_st8(TCGContext *s, int cond,
         tcg_out_st8_12(s, cond, rd, rn, offset);
 }
 
-/* The _goto case is normally between TBs within the same code buffer,
- * and with the code buffer limited to 16MB we shouldn't need the long
- * case.
- *
- * .... except to the prologue that is in its own buffer.
- */
-static inline void tcg_out_goto(TCGContext *s, int cond, uint32_t addr)
-{
-    int32_t val;
-
-    if (addr & 1) {
-        /* goto to a Thumb destination isn't supported */
-        tcg_abort();
-    }
-
-    val = addr - (tcg_target_long) s->code_ptr;
-    if (val - 8 < 0x01fffffd && val - 8 > -0x01fffffd)
-        tcg_out_b(s, cond, val);
-    else {
-        if (cond == COND_AL) {
-            tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4);
-            tcg_out32(s, addr);
-        } else {
-            tcg_out_movi32(s, cond, TCG_REG_TMP, val - 8);
-            tcg_out_dat_reg(s, cond, ARITH_ADD,
-                            TCG_REG_PC, TCG_REG_PC,
-                            TCG_REG_TMP, SHIFT_IMM_LSL(0));
-        }
-    }
-}
-
 /* The call case is mostly used for helpers - so it's not unreasonable
  * for them to be beyond branch range */
 static inline void tcg_out_call(TCGContext *s, uint32_t addr)
@@ -1081,14 +1050,11 @@ static inline void tcg_out_goto_label(TCGContext *s, int cond, int label_index)
 {
     TCGLabel *l = &s->labels[label_index];
 
-    if (l->has_value)
-        tcg_out_goto(s, cond, l->u.value);
-    else if (cond == COND_AL) {
-        tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4);
-        tcg_out_reloc(s, s->code_ptr, R_ARM_ABS32, label_index, 31337);
-        s->code_ptr += 4;
+    if (l->has_value) {
+        int32_t val = l->u.value - (tcg_target_long) s->code_ptr;
+        assert(val - 8 < 0x01fffffd && val - 8 > -0x01fffffd);
+        tcg_out_b(s, cond, val);
     } else {
-        /* Probably this should be preferred even for COND_AL... */
         tcg_out_reloc(s, s->code_ptr, R_ARM_PC24, label_index, 31337);
         tcg_out_b_noaddr(s, cond);
     }
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v3 14/20] tcg-arm: Cleanup goto_tb handling
  2013-03-28 15:32 [Qemu-devel] [PATCH v3 00/20] tcg-arm improvements Richard Henderson
                   ` (12 preceding siblings ...)
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 13/20] tcg-arm: Cleanup tcg_out_goto_label Richard Henderson
@ 2013-03-28 15:32 ` Richard Henderson
  2013-03-28 20:09   ` Aurelien Jarno
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 15/20] tcg-arm: Cleanup most primitive load store subroutines Richard Henderson
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

Eliminate two disabled code blocks.  Choose the load-to-pc method of
jumping so that we can eliminate the 16M code_gen_buffer limitation.
Remove a test in the indirect jump method that is always true.
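
As a sketch of why the load-to-pc form needs no icache flush
(editor's illustration, not part of the patch): pc reads as the
instruction address plus 8, so an offset of -4 addresses the literal
word stored immediately after the instruction, and retargeting a TB
is then a plain data store.

    #include <assert.h>
    #include <stdint.h>

    /* "ldr pc, [pc, #-4]" at insn_addr loads from insn_addr + 8 - 4,
       i.e. the word directly following the instruction. */
    static uintptr_t ldr_pc_m4_source(uintptr_t insn_addr)
    {
        return insn_addr + 8 - 4;
    }

    int main(void)
    {
        assert(ldr_pc_m4_source(0x1000) == 0x1004);
        return 0;
    }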

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/exec/exec-all.h | 23 +++--------------------
 tcg/arm/tcg-target.c    | 26 ++++++--------------------
 translate-all.c         |  2 --
 3 files changed, 9 insertions(+), 42 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e856191..190effa 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -233,26 +233,9 @@ static inline void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
 #elif defined(__arm__)
 static inline void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
 {
-#if !QEMU_GNUC_PREREQ(4, 1)
-    register unsigned long _beg __asm ("a1");
-    register unsigned long _end __asm ("a2");
-    register unsigned long _flg __asm ("a3");
-#endif
-
-    /* we could use a ldr pc, [pc, #-4] kind of branch and avoid the flush */
-    *(uint32_t *)jmp_addr =
-        (*(uint32_t *)jmp_addr & ~0xffffff)
-        | (((addr - (jmp_addr + 8)) >> 2) & 0xffffff);
-
-#if QEMU_GNUC_PREREQ(4, 1)
-    __builtin___clear_cache((char *) jmp_addr, (char *) jmp_addr + 4);
-#else
-    /* flush icache */
-    _beg = jmp_addr;
-    _end = jmp_addr + 4;
-    _flg = 0;
-    __asm __volatile__ ("swi 0x9f0002" : : "r" (_beg), "r" (_end), "r" (_flg));
-#endif
+    /* We're using "ldr pc, [pc,#-4]", so we can just store the raw
+       address, without caring for flushing the icache.  */
+    *(uint32_t *)jmp_addr = addr;
 }
 #elif defined(__sparc__)
 void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr);
diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 7bcba19..6de5e90 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -1595,30 +1595,16 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
     case INDEX_op_goto_tb:
         if (s->tb_jmp_offset) {
-            /* Direct jump method */
-#if defined(USE_DIRECT_JUMP)
-            s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
-            tcg_out_b_noaddr(s, COND_AL);
-#else
+            /* "Direct" jump method.  Rather than limit the code gen buffer
+               to 16M, load the destination from the next word.  */
             tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4);
             s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
-            tcg_out32(s, 0);
-#endif
+            s->code_ptr += 4;
         } else {
-            /* Indirect jump method */
-#if 1
-            c = (int) (s->tb_next + args[0]) - ((int) s->code_ptr + 8);
-            if (c > 0xfff || c < -0xfff) {
-                tcg_out_movi32(s, COND_AL, TCG_REG_R0,
-                                (tcg_target_long) (s->tb_next + args[0]));
-                tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_R0, 0);
-            } else
-                tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, c);
-#else
-            tcg_out_ld32_12(s, COND_AL, TCG_REG_R0, TCG_REG_PC, 0);
+            /* Indirect jump method.  */
+            tcg_out_movi32(s, COND_AL, TCG_REG_R0,
+                           (tcg_target_long) (s->tb_next + args[0]));
             tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_R0, 0);
-            tcg_out32(s, (tcg_target_long) (s->tb_next + args[0]));
-#endif
         }
         s->tb_next_offset[args[0]] = s->code_ptr - s->code_buf;
         break;
diff --git a/translate-all.c b/translate-all.c
index a98c646..3ca839f 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -460,8 +460,6 @@ static inline PageDesc *page_find(tb_page_addr_t index)
 # define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
 #elif defined(__sparc__)
 # define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
-#elif defined(__arm__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (16u * 1024 * 1024)
 #elif defined(__s390x__)
   /* We have a +- 4GB range on the branches; leave some slop.  */
 # define MAX_CODE_GEN_BUFFER_SIZE  (3ul * 1024 * 1024 * 1024)
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v3 15/20] tcg-arm: Cleanup most primitive load store subroutines
  2013-03-28 15:32 [Qemu-devel] [PATCH v3 00/20] tcg-arm improvements Richard Henderson
                   ` (13 preceding siblings ...)
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 14/20] tcg-arm: Cleanup goto_tb handling Richard Henderson
@ 2013-03-28 15:32 ` Richard Henderson
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 16/20] tcg-arm: Fix local stack frame Richard Henderson
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

Use even more primitive helper functions to avoid lots of duplicated code.
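
To show how the primitives compose (a standalone sketch, not part of
the patch), the imm12 form can be rebuilt and checked against the
canonical encoding of "ldr r0, [r1, #8]":

    #include <assert.h>
    #include <stdint.h>

    enum { INSN_LDR_IMM = 0x04100000 };     /* as defined below */

    /* Mirror of tcg_out_memop_12: fold the sign of the offset into
       the U bit, then OR in the P/W bits and register fields. */
    static uint32_t memop_12(uint32_t cond, uint32_t opc, uint32_t rt,
                             uint32_t rn, int imm12, uint32_t p, uint32_t w)
    {
        uint32_t u = 1;
        if (imm12 < 0) {
            imm12 = -imm12;
            u = 0;
        }
        return (cond << 28) | opc | (u << 23) | (p << 24) | (w << 21)
               | (rn << 16) | (rt << 12) | imm12;
    }

    int main(void)
    {
        /* COND_AL == 0xe; "ldr r0, [r1, #8]" encodes as 0xe5910008. */
        assert(memop_12(0xe, INSN_LDR_IMM, 0, 1, 8, 1, 0) == 0xe5910008);
        return 0;
    }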

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 235 ++++++++++++++++++++++++---------------------------
 1 file changed, 111 insertions(+), 124 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 6de5e90..fb8c96d 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -306,7 +306,7 @@ static inline int tcg_target_const_match(tcg_target_long val,
 
 #define TO_CPSR (1 << 20)
 
-enum arm_data_opc_e {
+typedef enum {
     ARITH_AND = 0x0 << 21,
     ARITH_EOR = 0x1 << 21,
     ARITH_SUB = 0x2 << 21,
@@ -322,7 +322,26 @@ enum arm_data_opc_e {
     ARITH_MOV = 0xd << 21,
     ARITH_BIC = 0xe << 21,
     ARITH_MVN = 0xf << 21,
-};
+
+    INSN_LDR_IMM   = 0x04100000,
+    INSN_LDR_REG   = 0x06100000,
+    INSN_STR_IMM   = 0x04000000,
+    INSN_STR_REG   = 0x06000000,
+
+    INSN_LDRH_IMM  = 0x005000b0,
+    INSN_LDRH_REG  = 0x001000b0,
+    INSN_LDRSH_IMM = 0x005000f0,
+    INSN_LDRSH_REG = 0x001000f0,
+    INSN_STRH_IMM  = 0x004000b0,
+    INSN_STRH_REG  = 0x000000b0,
+
+    INSN_LDRB_IMM  = 0x04500000,
+    INSN_LDRB_REG  = 0x06500000,
+    INSN_LDRSB_IMM = 0x005000d0,
+    INSN_LDRSB_REG = 0x001000d0,
+    INSN_STRB_IMM  = 0x04400000,
+    INSN_STRB_REG  = 0x06400000,
+} ARMInsn;
 
 #define SHIFT_IMM_LSL(im)	(((im) << 7) | 0x00)
 #define SHIFT_IMM_LSR(im)	(((im) << 7) | 0x20)
@@ -748,187 +767,155 @@ static inline void tcg_out_deposit(TCGContext *s, int cond, TCGReg rd,
               | (ofs << 7) | ((ofs + len - 1) << 16));
 }
 
-static inline void tcg_out_ld32_12(TCGContext *s, int cond,
-                int rd, int rn, tcg_target_long im)
+/* Note that this routine is used for both LDR and LDRH formats, so we do
+   not wish to include an immediate shift at this point.  */
+static void tcg_out_memop_r(TCGContext *s, int cond, ARMInsn opc, TCGReg rt,
+                            TCGReg rn, TCGReg rm, bool u, bool p, bool w)
 {
-    if (im >= 0)
-        tcg_out32(s, (cond << 28) | 0x05900000 |
-                        (rn << 16) | (rd << 12) | (im & 0xfff));
-    else
-        tcg_out32(s, (cond << 28) | 0x05100000 |
-                        (rn << 16) | (rd << 12) | ((-im) & 0xfff));
+    tcg_out32(s, (cond << 28) | opc | (u << 23) | (p << 24)
+              | (w << 21) | (rn << 16) | (rt << 12) | rm);
+}
+
+static void tcg_out_memop_8(TCGContext *s, int cond, ARMInsn opc, TCGReg rt,
+                            TCGReg rn, int imm8, bool p, bool w)
+{
+    bool u = 1;
+    if (imm8 < 0) {
+        imm8 = -imm8;
+        u = 0;
+    }
+    tcg_out32(s, (cond << 28) | opc | (u << 23) | (p << 24) | (w << 21) |
+              (rn << 16) | (rt << 12) | ((imm8 & 0xf0) << 4) | (imm8 & 0xf));
+}
+
+static void tcg_out_memop_12(TCGContext *s, int cond, ARMInsn opc, TCGReg rt,
+                             TCGReg rn, int imm12, bool p, bool w)
+{
+    bool u = 1;
+    if (imm12 < 0) {
+        imm12 = -imm12;
+        u = 0;
+    }
+    tcg_out32(s, (cond << 28) | opc | (u << 23) | (p << 24) | (w << 21) |
+              (rn << 16) | (rt << 12) | imm12);
+}
+
+static inline void tcg_out_ld32_12(TCGContext *s, int cond, TCGReg rt,
+                                   TCGReg rn, int imm12)
+{
+    tcg_out_memop_12(s, cond, INSN_LDR_IMM, rt, rn, imm12, 1, 0);
 }
 
 /* Offset pre-increment with base writeback.  */
-static inline void tcg_out_ld32_12wb(TCGContext *s, int cond,
-                                     int rd, int rn, tcg_target_long im)
+static inline void tcg_out_ld32_12wb(TCGContext *s, int cond, TCGReg rt,
+                                     TCGReg rn, int imm12)
 {
-    /* ldr with writeback and both register equals is UNPREDICTABLE */
-    assert(rd != rn);
+    /* ldr with writeback and both registers equal is UNPREDICTABLE */
+    assert(rt != rn);
-
-    if (im >= 0) {
-        tcg_out32(s, (cond << 28) | 0x05b00000 |
-                        (rn << 16) | (rd << 12) | (im & 0xfff));
-    } else {
-        tcg_out32(s, (cond << 28) | 0x05300000 |
-                        (rn << 16) | (rd << 12) | ((-im) & 0xfff));
-    }
+    tcg_out_memop_12(s, cond, INSN_LDR_IMM, rt, rn, imm12, 1, 1);
 }
 
-static inline void tcg_out_st32_12(TCGContext *s, int cond,
-                int rd, int rn, tcg_target_long im)
+static inline void tcg_out_st32_12(TCGContext *s, int cond, TCGReg rt,
+                                   TCGReg rn, int imm12)
 {
-    if (im >= 0)
-        tcg_out32(s, (cond << 28) | 0x05800000 |
-                        (rn << 16) | (rd << 12) | (im & 0xfff));
-    else
-        tcg_out32(s, (cond << 28) | 0x05000000 |
-                        (rn << 16) | (rd << 12) | ((-im) & 0xfff));
+    tcg_out_memop_12(s, cond, INSN_STR_IMM, rt, rn, imm12, 1, 0);
 }
 
-static inline void tcg_out_ld32_r(TCGContext *s, int cond,
-                int rd, int rn, int rm)
+static inline void tcg_out_ld32_r(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, TCGReg rm)
 {
-    tcg_out32(s, (cond << 28) | 0x07900000 |
-                    (rn << 16) | (rd << 12) | rm);
+    tcg_out_memop_r(s, cond, INSN_LDR_REG, rt, rn, rm, 1, 1, 0);
 }
 
-static inline void tcg_out_st32_r(TCGContext *s, int cond,
-                int rd, int rn, int rm)
+static inline void tcg_out_st32_r(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, TCGReg rm)
 {
-    tcg_out32(s, (cond << 28) | 0x07800000 |
-                    (rn << 16) | (rd << 12) | rm);
+    tcg_out_memop_r(s, cond, INSN_STR_REG, rt, rn, rm, 1, 1, 0);
 }
 
 /* Register pre-increment with base writeback.  */
-static inline void tcg_out_ld32_rwb(TCGContext *s, int cond,
-                int rd, int rn, int rm)
+static inline void tcg_out_ld32_rwb(TCGContext *s, int cond, TCGReg rt,
+                                    TCGReg rn, TCGReg rm)
 {
-    tcg_out32(s, (cond << 28) | 0x07b00000 |
-                    (rn << 16) | (rd << 12) | rm);
+    tcg_out_memop_r(s, cond, INSN_LDR_REG, rt, rn, rm, 1, 1, 1);
 }
 
-static inline void tcg_out_st32_rwb(TCGContext *s, int cond,
-                int rd, int rn, int rm)
+static inline void tcg_out_st32_rwb(TCGContext *s, int cond, TCGReg rt,
+                                    TCGReg rn, TCGReg rm)
 {
-    tcg_out32(s, (cond << 28) | 0x07a00000 |
-                    (rn << 16) | (rd << 12) | rm);
+    tcg_out_memop_r(s, cond, INSN_STR_REG, rt, rn, rm, 1, 1, 1);
 }
 
-static inline void tcg_out_ld16u_8(TCGContext *s, int cond,
-                int rd, int rn, tcg_target_long im)
+static inline void tcg_out_ld16u_8(TCGContext *s, int cond, TCGReg rt,
+                                   TCGReg rn, int imm8)
 {
-    if (im >= 0)
-        tcg_out32(s, (cond << 28) | 0x01d000b0 |
-                        (rn << 16) | (rd << 12) |
-                        ((im & 0xf0) << 4) | (im & 0xf));
-    else
-        tcg_out32(s, (cond << 28) | 0x015000b0 |
-                        (rn << 16) | (rd << 12) |
-                        (((-im) & 0xf0) << 4) | ((-im) & 0xf));
+    tcg_out_memop_8(s, cond, INSN_LDRH_IMM, rt, rn, imm8, 1, 0);
 }
 
-static inline void tcg_out_st16_8(TCGContext *s, int cond,
-                int rd, int rn, tcg_target_long im)
+static inline void tcg_out_st16_8(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, int imm8)
 {
-    if (im >= 0)
-        tcg_out32(s, (cond << 28) | 0x01c000b0 |
-                        (rn << 16) | (rd << 12) |
-                        ((im & 0xf0) << 4) | (im & 0xf));
-    else
-        tcg_out32(s, (cond << 28) | 0x014000b0 |
-                        (rn << 16) | (rd << 12) |
-                        (((-im) & 0xf0) << 4) | ((-im) & 0xf));
+    tcg_out_memop_8(s, cond, INSN_STRH_IMM, rt, rn, imm8, 1, 0);
 }
 
-static inline void tcg_out_ld16u_r(TCGContext *s, int cond,
-                int rd, int rn, int rm)
+static inline void tcg_out_ld16u_r(TCGContext *s, int cond, TCGReg rt,
+                                   TCGReg rn, TCGReg rm)
 {
-    tcg_out32(s, (cond << 28) | 0x019000b0 |
-                    (rn << 16) | (rd << 12) | rm);
+    tcg_out_memop_r(s, cond, INSN_LDRH_REG, rt, rn, rm, 1, 1, 0);
 }
 
-static inline void tcg_out_st16_r(TCGContext *s, int cond,
-                int rd, int rn, int rm)
+static inline void tcg_out_st16_r(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, TCGReg rm)
 {
-    tcg_out32(s, (cond << 28) | 0x018000b0 |
-                    (rn << 16) | (rd << 12) | rm);
+    tcg_out_memop_r(s, cond, INSN_STRH_REG, rt, rn, rm, 1, 1, 0);
 }
 
-static inline void tcg_out_ld16s_8(TCGContext *s, int cond,
-                int rd, int rn, tcg_target_long im)
+static inline void tcg_out_ld16s_8(TCGContext *s, int cond, TCGReg rt,
+                                   TCGReg rn, int imm8)
 {
-    if (im >= 0)
-        tcg_out32(s, (cond << 28) | 0x01d000f0 |
-                        (rn << 16) | (rd << 12) |
-                        ((im & 0xf0) << 4) | (im & 0xf));
-    else
-        tcg_out32(s, (cond << 28) | 0x015000f0 |
-                        (rn << 16) | (rd << 12) |
-                        (((-im) & 0xf0) << 4) | ((-im) & 0xf));
+    tcg_out_memop_8(s, cond, INSN_LDRSH_IMM, rt, rn, imm8, 1, 0);
 }
 
-static inline void tcg_out_ld16s_r(TCGContext *s, int cond,
-                int rd, int rn, int rm)
+static inline void tcg_out_ld16s_r(TCGContext *s, int cond, TCGReg rt,
+                                   TCGReg rn, TCGReg rm)
 {
-    tcg_out32(s, (cond << 28) | 0x019000f0 |
-                    (rn << 16) | (rd << 12) | rm);
+    tcg_out_memop_r(s, cond, INSN_LDRSH_REG, rt, rn, rm, 1, 1, 0);
 }
 
-static inline void tcg_out_ld8_12(TCGContext *s, int cond,
-                int rd, int rn, tcg_target_long im)
+static inline void tcg_out_ld8_12(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, int imm12)
 {
-    if (im >= 0)
-        tcg_out32(s, (cond << 28) | 0x05d00000 |
-                        (rn << 16) | (rd << 12) | (im & 0xfff));
-    else
-        tcg_out32(s, (cond << 28) | 0x05500000 |
-                        (rn << 16) | (rd << 12) | ((-im) & 0xfff));
+    tcg_out_memop_12(s, cond, INSN_LDRB_IMM, rt, rn, imm12, 1, 0);
 }
 
-static inline void tcg_out_st8_12(TCGContext *s, int cond,
-                int rd, int rn, tcg_target_long im)
+static inline void tcg_out_st8_12(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, int imm12)
 {
-    if (im >= 0)
-        tcg_out32(s, (cond << 28) | 0x05c00000 |
-                        (rn << 16) | (rd << 12) | (im & 0xfff));
-    else
-        tcg_out32(s, (cond << 28) | 0x05400000 |
-                        (rn << 16) | (rd << 12) | ((-im) & 0xfff));
+    tcg_out_memop_12(s, cond, INSN_STRB_IMM, rt, rn, imm12, 1, 0);
 }
 
-static inline void tcg_out_ld8_r(TCGContext *s, int cond,
-                int rd, int rn, int rm)
+static inline void tcg_out_ld8_r(TCGContext *s, int cond, TCGReg rt,
+                                 TCGReg rn, TCGReg rm)
 {
-    tcg_out32(s, (cond << 28) | 0x07d00000 |
-                    (rn << 16) | (rd << 12) | rm);
+    tcg_out_memop_r(s, cond, INSN_LDRB_REG, rt, rn, rm, 1, 1, 0);
 }
 
-static inline void tcg_out_st8_r(TCGContext *s, int cond,
-                int rd, int rn, int rm)
+static inline void tcg_out_st8_r(TCGContext *s, int cond, TCGReg rt,
+                                 TCGReg rn, TCGReg rm)
 {
-    tcg_out32(s, (cond << 28) | 0x07c00000 |
-                    (rn << 16) | (rd << 12) | rm);
+    tcg_out_memop_r(s, cond, INSN_STRB_REG, rt, rn, rm, 1, 1, 0);
 }
 
-static inline void tcg_out_ld8s_8(TCGContext *s, int cond,
-                int rd, int rn, tcg_target_long im)
+static inline void tcg_out_ld8s_8(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, int imm8)
 {
-    if (im >= 0)
-        tcg_out32(s, (cond << 28) | 0x01d000d0 |
-                        (rn << 16) | (rd << 12) |
-                        ((im & 0xf0) << 4) | (im & 0xf));
-    else
-        tcg_out32(s, (cond << 28) | 0x015000d0 |
-                        (rn << 16) | (rd << 12) |
-                        (((-im) & 0xf0) << 4) | ((-im) & 0xf));
+    tcg_out_memop_8(s, cond, INSN_LDRSB_IMM, rt, rn, imm8, 1, 0);
 }
 
-static inline void tcg_out_ld8s_r(TCGContext *s, int cond,
-                int rd, int rn, int rm)
+static inline void tcg_out_ld8s_r(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, TCGReg rm)
 {
-    tcg_out32(s, (cond << 28) | 0x019000d0 |
-                    (rn << 16) | (rd << 12) | rm);
+    tcg_out_memop_r(s, cond, INSN_LDRSB_REG, rt, rn, rm, 1, 1, 0);
 }
 
 static inline void tcg_out_ld32u(TCGContext *s, int cond,
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v3 16/20] tcg-arm: Fix local stack frame
  2013-03-28 15:32 [Qemu-devel] [PATCH v3 00/20] tcg-arm improvements Richard Henderson
                   ` (14 preceding siblings ...)
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 15/20] tcg-arm: Cleanup most primitive load store subroutines Richard Henderson
@ 2013-03-28 15:32 ` Richard Henderson
  2013-03-29 16:50   ` Aurelien Jarno
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 17/20] tcg-arm: Split out tcg_out_tlb_read Richard Henderson
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

We were not allocating TCG_STATIC_CALL_ARGS_SIZE, which meant that
any helper with more than 4 arguments would clobber the saved regs.
Realizing that this memory is supposed to be pre-allocated means we
can also clean up the tcg_out_arg functions, which were attempting
their own ad-hoc stack allocation.

Allocate stack memory for the TCG temporaries while we're at it.
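
A sketch of the resulting frame (editor's reading of the patch; the
constant values below are assumptions for illustration only):

    #include <stdio.h>

    #define TCG_STATIC_CALL_ARGS_SIZE  128   /* assumed value */
    #define CPU_TEMP_BUF_NLONGS        128   /* assumed value */

    int main(void)
    {
        /* sp+0 .. ARGS_SIZE-1         outgoing helper args 5 and up
           sp+ARGS_SIZE .. frame-1     TCG temporaries
           above the frame             r4-r12,lr save area (stmdb) */
        unsigned long frame = TCG_STATIC_CALL_ARGS_SIZE
                            + CPU_TEMP_BUF_NLONGS * sizeof(long);
        printf("local frame: %lu bytes\n", frame);
        return 0;
    }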

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 120 ++++++++++++++++++---------------------------------
 1 file changed, 43 insertions(+), 77 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index fb8c96d..70f63cb 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -1074,65 +1074,35 @@ static const void * const qemu_st_helpers[4] = {
  * argreg is where we want to put this argument, arg is the argument itself.
  * Return value is the updated argreg ready for the next call.
  * Note that argreg 0..3 is real registers, 4+ on stack.
- * When we reach the first stacked argument, we allocate space for it
- * and the following stacked arguments using "str r8, [sp, #-0x10]!".
- * Following arguments are filled in with "str r8, [sp, #0xNN]".
- * For more than 4 stacked arguments we'd need to know how much
- * space to allocate when we pushed the first stacked argument.
- * We don't need this, so don't implement it (and will assert if you try it.)
  *
  * We provide routines for arguments which are: immediate, 32 bit
  * value in register, 16 and 8 bit values in register (which must be zero
  * extended before use) and 64 bit value in a lo:hi register pair.
  */
-#define DEFINE_TCG_OUT_ARG(NAME, ARGPARAM)                                 \
-    static TCGReg NAME(TCGContext *s, TCGReg argreg, ARGPARAM)             \
-    {                                                                      \
-        if (argreg < 4) {                                                  \
-            TCG_OUT_ARG_GET_ARG(argreg);                                   \
-        } else if (argreg == 4) {                                          \
-            TCG_OUT_ARG_GET_ARG(TCG_REG_TMP);                              \
-            tcg_out32(s, (COND_AL << 28) | 0x052d0010 | (TCG_REG_TMP << 12)); \
-        } else {                                                           \
-            assert(argreg < 8);                                            \
-            TCG_OUT_ARG_GET_ARG(TCG_REG_TMP);                              \
-            tcg_out_st32_12(s, COND_AL, TCG_REG_TMP, TCG_REG_CALL_STACK,   \
-                            (argreg - 4) * 4);                             \
-        }                                                                  \
-        return argreg + 1;                                                 \
-    }
-
-#define TCG_OUT_ARG_GET_ARG(A) tcg_out_dat_imm(s, COND_AL, ARITH_MOV, A, 0, arg)
-DEFINE_TCG_OUT_ARG(tcg_out_arg_imm32, uint32_t arg)
-#undef TCG_OUT_ARG_GET_ARG
-#define TCG_OUT_ARG_GET_ARG(A) tcg_out_ext8u(s, COND_AL, A, arg)
-DEFINE_TCG_OUT_ARG(tcg_out_arg_reg8, TCGReg arg)
-#undef TCG_OUT_ARG_GET_ARG
-#define TCG_OUT_ARG_GET_ARG(A) tcg_out_ext16u(s, COND_AL, A, arg)
-DEFINE_TCG_OUT_ARG(tcg_out_arg_reg16, TCGReg arg)
-#undef TCG_OUT_ARG_GET_ARG
-
-/* We don't use the macro for this one to avoid an unnecessary reg-reg
- * move when storing to the stack.
- */
-static TCGReg tcg_out_arg_reg32(TCGContext *s, TCGReg argreg, TCGReg arg)
-{
-    if (argreg < 4) {
-        tcg_out_mov_reg(s, COND_AL, argreg, arg);
-    } else if (argreg == 4) {
-        /* str arg, [sp, #-0x10]! */
-        tcg_out32(s, (COND_AL << 28) | 0x052d0010 | (arg << 12));
-    } else {
-        assert(argreg < 8);
-        /* str arg, [sp, #0xNN] */
-        tcg_out32(s, (COND_AL << 28) | 0x058d0000 |
-                  (arg << 12) | (argreg - 4) * 4);
-    }
-    return argreg + 1;
-}
-
-static inline TCGReg tcg_out_arg_reg64(TCGContext *s, TCGReg argreg,
-                                       TCGReg arglo, TCGReg arghi)
+#define DEFINE_TCG_OUT_ARG(NAME, ARGTYPE, MOV_ARG, EXT_ARG)                \
+static TCGReg NAME(TCGContext *s, TCGReg argreg, ARGTYPE arg)              \
+{                                                                          \
+    if (argreg < 4) {                                                      \
+        MOV_ARG(s, COND_AL, argreg, arg);                                  \
+    } else {                                                               \
+        int ofs = (argreg - 4) * 4;                                        \
+        EXT_ARG;                                                           \
+        assert(ofs + 4 <= TCG_STATIC_CALL_ARGS_SIZE);                      \
+        tcg_out_st32_12(s, COND_AL, arg, TCG_REG_CALL_STACK, ofs);         \
+    }                                                                      \
+    return argreg + 1;                                                     \
+}
+
+DEFINE_TCG_OUT_ARG(tcg_out_arg_imm32, uint32_t, tcg_out_movi32,
+    (tcg_out_movi32(s, COND_AL, TCG_REG_TMP, arg), arg = TCG_REG_TMP))
+DEFINE_TCG_OUT_ARG(tcg_out_arg_reg8, TCGReg, tcg_out_ext8u,
+    (tcg_out_ext8u(s, COND_AL, TCG_REG_TMP, arg), arg = TCG_REG_TMP))
+DEFINE_TCG_OUT_ARG(tcg_out_arg_reg16, TCGReg, tcg_out_ext16u,
+    (tcg_out_ext16u(s, COND_AL, TCG_REG_TMP, arg), arg = TCG_REG_TMP))
+DEFINE_TCG_OUT_ARG(tcg_out_arg_reg32, TCGReg, tcg_out_mov_reg, )
+
+static TCGReg tcg_out_arg_reg64(TCGContext *s, TCGReg argreg,
+                                TCGReg arglo, TCGReg arghi)
 {
     /* 64 bit arguments must go in even/odd register pairs
      * and in 8-aligned stack slots.
@@ -1144,16 +1114,7 @@ static inline TCGReg tcg_out_arg_reg64(TCGContext *s, TCGReg argreg,
     argreg = tcg_out_arg_reg32(s, argreg, arghi);
     return argreg;
 }
-
-static inline void tcg_out_arg_stacktidy(TCGContext *s, TCGReg argreg)
-{
-    /* Output any necessary post-call cleanup of the stack */
-    if (argreg > 4) {
-        tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R13, TCG_REG_R13, 0x10);
-    }
-}
-
-#endif
+#endif /* SOFTMMU */
 
 #define TLB_SHIFT	(CPU_TLB_ENTRY_BITS + CPU_TLB_BITS)
 
@@ -1284,7 +1245,6 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 #endif
     argreg = tcg_out_arg_imm32(s, argreg, mem_index);
     tcg_out_call(s, (tcg_target_long) qemu_ld_helpers[s_bits]);
-    tcg_out_arg_stacktidy(s, argreg);
 
     switch (opc) {
     case 0 | 4:
@@ -1502,7 +1462,6 @@ static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
 
     argreg = tcg_out_arg_imm32(s, argreg, mem_index);
     tcg_out_call(s, (tcg_target_long) qemu_st_helpers[s_bits]);
-    tcg_out_arg_stacktidy(s, argreg);
 
     reloc_pc24(label_ptr, (tcg_target_long)s->code_ptr);
 #else /* !CONFIG_SOFTMMU */
@@ -1560,6 +1519,7 @@ static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
 }
 
 static uint32_t tb_pop_ret;
+static uint32_t tb_frame_size;
 
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                 const TCGArg *args, const int *const_args)
@@ -1570,14 +1530,16 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     switch (opc) {
     case INDEX_op_exit_tb:
         a0 = args[0];
+        tcg_out_dat_rI(s, COND_AL, ARITH_ADD, TCG_REG_CALL_STACK,
+                       TCG_REG_CALL_STACK, tb_frame_size, 1);
         if (use_armv7_instructions || check_fit_imm(a0)) {
-            tcg_out_movi32(s, COND_AL, TCG_REG_R0, args[0]);
+            tcg_out_movi32(s, COND_AL, TCG_REG_R0, a0);
             tcg_out32(s, tb_pop_ret);
         } else {
             /* pc is always current address + 8, so 0 reads the word.  */
             tcg_out_ld32_12(s, COND_AL, TCG_REG_R0, TCG_REG_PC, 0);
             tcg_out32(s, tb_pop_ret);
-            tcg_out32(s, args[0]);
+            tcg_out32(s, a0);
         }
         break;
     case INDEX_op_goto_tb:
@@ -2002,8 +1964,6 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_PC);
 
     tcg_add_target_add_op_defs(arm_op_defs);
-    tcg_set_frame(s, TCG_AREG0, offsetof(CPUArchState, temp_buf),
-                  CPU_TEMP_BUF_NLONGS * sizeof(long));
 }
 
 static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
@@ -2032,17 +1992,23 @@ static inline void tcg_out_movi(TCGContext *s, TCGType type,
 
 static void tcg_target_qemu_prologue(TCGContext *s)
 {
-    /* Calling convention requires us to save r4-r11 and lr;
-     * save also r12 to maintain stack 8-alignment.
-     */
-
+    /* Calling convention requires us to save r4-r11 and lr; save also r12
+       to maintain stack 8-alignment.  */
     /* stmdb sp!, { r4 - r12, lr } */
     tcg_out32(s, (COND_AL << 28) | 0x092d5ff0);
 
+    /* ldmia sp!, { r4 - r12, pc } */
+    tb_pop_ret = (COND_AL << 28) | 0x08bd9ff0;
+
+    /* Allocate the local stack frame.  */
+    tb_frame_size = TCG_STATIC_CALL_ARGS_SIZE;
+    tb_frame_size += CPU_TEMP_BUF_NLONGS * sizeof(long);
+    tcg_out_dat_rI(s, COND_AL, ARITH_SUB, TCG_REG_CALL_STACK,
+                   TCG_REG_CALL_STACK, tb_frame_size, 1);
+    tcg_set_frame(s, TCG_REG_CALL_STACK, TCG_STATIC_CALL_ARGS_SIZE,
+                  CPU_TEMP_BUF_NLONGS * sizeof(long));
+
     tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
 
     tcg_out_bx(s, COND_AL, tcg_target_call_iarg_regs[1]);
-
-    /* ldmia sp!, { r4 - r12, pc } */
-    tb_pop_ret = (COND_AL << 28) | 0x08bd9ff0;
 }
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v3 17/20] tcg-arm: Split out tcg_out_tlb_read
  2013-03-28 15:32 [Qemu-devel] [PATCH v3 00/20] tcg-arm improvements Richard Henderson
                   ` (15 preceding siblings ...)
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 16/20] tcg-arm: Fix local stack frame Richard Henderson
@ 2013-03-28 15:32 ` Richard Henderson
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 18/20] tcg-arm: Improve scheduling of tcg_out_tlb_read Richard Henderson
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

Share code between qemu_ld and qemu_st to process the tlb.
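
With the helper split out, both paths reduce to one call that differs
only in the comparator offset; a hypothetical standalone sketch with
stand-in types (not QEMU's real definitions):

    #include <stdio.h>
    #include <stddef.h>

    /* Stand-ins for illustration only. */
    typedef struct { unsigned addr_read, addr_write, addend; } FakeTLBEntry;
    typedef struct { FakeTLBEntry tlb_table[4][256]; } FakeArchState;

    int main(void)
    {
        /* qemu_ld compares against addr_read, qemu_st against
           addr_write; everything else in tcg_out_tlb_read is shared. */
        printf("ld offset %zu, st offset %zu\n",
               offsetof(FakeArchState, tlb_table[0][0].addr_read),
               offsetof(FakeArchState, tlb_table[0][0].addr_write));
        return 0;
    }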

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 169 +++++++++++++++++++++------------------------------
 1 file changed, 70 insertions(+), 99 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 70f63cb..1d8f1f2 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -1114,40 +1114,15 @@ static TCGReg tcg_out_arg_reg64(TCGContext *s, TCGReg argreg,
     argreg = tcg_out_arg_reg32(s, argreg, arghi);
     return argreg;
 }
-#endif /* SOFTMMU */
 
 #define TLB_SHIFT	(CPU_TLB_ENTRY_BITS + CPU_TLB_BITS)
 
-static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
-{
-    int addr_reg, data_reg, data_reg2, bswap;
-#ifdef CONFIG_SOFTMMU
-    int mem_index, s_bits, tlb_offset;
-    TCGReg argreg;
-# if TARGET_LONG_BITS == 64
-    int addr_reg2;
-# endif
-    uint32_t *label_ptr;
-#endif
-
-#ifdef TARGET_WORDS_BIGENDIAN
-    bswap = 1;
-#else
-    bswap = 0;
-#endif
-    data_reg = *args++;
-    if (opc == 3)
-        data_reg2 = *args++;
-    else
-        data_reg2 = 0; /* suppress warning */
-    addr_reg = *args++;
-#ifdef CONFIG_SOFTMMU
-# if TARGET_LONG_BITS == 64
-    addr_reg2 = *args++;
-# endif
-    mem_index = *args;
-    s_bits = opc & 3;
+/* Load and compare a TLB entry, leaving the flags set.  Leaves R0 pointing
+   to the tlb entry.  Clobbers R1 and TMP.  */
 
+static void tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
+                             int s_bits, int tlb_offset)
+{
     /* Should generate something like the following:
      *  shr r8, addr_reg, #TARGET_PAGE_BITS
      *  and r0, r8, #(CPU_TLB_SIZE - 1)   @ Assumption: CPU_TLB_BITS <= 8
@@ -1157,13 +1132,13 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 #   error
 #  endif
     tcg_out_dat_reg(s, COND_AL, ARITH_MOV, TCG_REG_TMP,
-                    0, addr_reg, SHIFT_IMM_LSR(TARGET_PAGE_BITS));
+                    0, addrlo, SHIFT_IMM_LSR(TARGET_PAGE_BITS));
     tcg_out_dat_imm(s, COND_AL, ARITH_AND,
                     TCG_REG_R0, TCG_REG_TMP, CPU_TLB_SIZE - 1);
     tcg_out_dat_reg(s, COND_AL, ARITH_ADD, TCG_REG_R0, TCG_AREG0,
                     TCG_REG_R0, SHIFT_IMM_LSL(CPU_TLB_ENTRY_BITS));
+
     /* We assume that the offset is contained within 20 bits.  */
-    tlb_offset = offsetof(CPUArchState, tlb_table[mem_index][0].addr_read);
     assert((tlb_offset & ~0xfffff) == 0);
     if (tlb_offset > 0xfff) {
         tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R0, TCG_REG_R0,
@@ -1173,16 +1148,48 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
     tcg_out_ld32_12wb(s, COND_AL, TCG_REG_R1, TCG_REG_R0, tlb_offset);
     tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, TCG_REG_R1,
                     TCG_REG_TMP, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
+
     /* Check alignment.  */
-    if (s_bits)
+    if (s_bits) {
         tcg_out_dat_imm(s, COND_EQ, ARITH_TST,
-                        0, addr_reg, (1 << s_bits) - 1);
-#  if TARGET_LONG_BITS == 64
-    /* XXX: possibly we could use a block data load in the first access.  */
-    tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0, 4);
-    tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0,
-                    TCG_REG_R1, addr_reg2, SHIFT_IMM_LSL(0));
-#  endif
+                        0, addrlo, (1 << s_bits) - 1);
+    }
+
+    if (TARGET_LONG_BITS == 64) {
+        /* XXX: possibly we could use a block data load in the first access. */
+        tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0, 4);
+        tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0,
+                        TCG_REG_R1, addrhi, SHIFT_IMM_LSL(0));
+    }
+}
+#endif /* SOFTMMU */
+
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
+{
+    TCGReg addr_reg, data_reg, data_reg2;
+    bool bswap;
+#ifdef CONFIG_SOFTMMU
+    int mem_index, s_bits;
+    TCGReg argreg, addr_reg2;
+    uint32_t *label_ptr;
+#endif
+#ifdef TARGET_WORDS_BIGENDIAN
+    bswap = 1;
+#else
+    bswap = 0;
+#endif
+
+    data_reg = *args++;
+    data_reg2 = (opc == 3 ? *args++ : 0);
+    addr_reg = *args++;
+#ifdef CONFIG_SOFTMMU
+    addr_reg2 = (TARGET_LONG_BITS == 64 ? *args++ : 0);
+    mem_index = *args;
+    s_bits = opc & 3;
+
+    tcg_out_tlb_read(s, addr_reg, addr_reg2, s_bits,
+                     offsetof(CPUArchState, tlb_table[mem_index][0].addr_read));
+
     tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0,
                     offsetof(CPUTLBEntry, addend)
                     - offsetof(CPUTLBEntry, addr_read));
@@ -1238,11 +1245,11 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
      */
     argreg = TCG_REG_R0;
     argreg = tcg_out_arg_reg32(s, argreg, TCG_AREG0);
-#if TARGET_LONG_BITS == 64
-    argreg = tcg_out_arg_reg64(s, argreg, addr_reg, addr_reg2);
-#else
-    argreg = tcg_out_arg_reg32(s, argreg, addr_reg);
-#endif
+    if (TARGET_LONG_BITS == 64) {
+        argreg = tcg_out_arg_reg64(s, argreg, addr_reg, addr_reg2);
+    } else {
+        argreg = tcg_out_arg_reg32(s, argreg, addr_reg);
+    }
     argreg = tcg_out_arg_imm32(s, argreg, mem_index);
     tcg_out_call(s, (tcg_target_long) qemu_ld_helpers[s_bits]);
 
@@ -1269,8 +1276,7 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 #else /* !CONFIG_SOFTMMU */
     if (GUEST_BASE) {
         uint32_t offset = GUEST_BASE;
-        int i;
-        int rot;
+        int i, rot;
 
         while (offset) {
             i = ctz32(offset) & ~1;
@@ -1329,68 +1335,33 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 #endif
 }
 
-static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
 {
-    int addr_reg, data_reg, data_reg2, bswap;
+    TCGReg addr_reg, data_reg, data_reg2;
+    bool bswap;
 #ifdef CONFIG_SOFTMMU
-    int mem_index, s_bits, tlb_offset;
-    TCGReg argreg;
-# if TARGET_LONG_BITS == 64
-    int addr_reg2;
-# endif
+    int mem_index, s_bits;
+    TCGReg argreg, addr_reg2;
     uint32_t *label_ptr;
 #endif
-
 #ifdef TARGET_WORDS_BIGENDIAN
     bswap = 1;
 #else
     bswap = 0;
 #endif
+
     data_reg = *args++;
-    if (opc == 3)
-        data_reg2 = *args++;
-    else
-        data_reg2 = 0; /* suppress warning */
+    data_reg2 = (opc == 3 ? *args++ : 0);
     addr_reg = *args++;
 #ifdef CONFIG_SOFTMMU
-# if TARGET_LONG_BITS == 64
-    addr_reg2 = *args++;
-# endif
+    addr_reg2 = (TARGET_LONG_BITS == 64 ? *args++ : 0);
     mem_index = *args;
     s_bits = opc & 3;
 
-    /* Should generate something like the following:
-     *  shr r8, addr_reg, #TARGET_PAGE_BITS
-     *  and r0, r8, #(CPU_TLB_SIZE - 1)   @ Assumption: CPU_TLB_BITS <= 8
-     *  add r0, env, r0 lsl #CPU_TLB_ENTRY_BITS
-     */
-    tcg_out_dat_reg(s, COND_AL, ARITH_MOV,
-                    TCG_REG_TMP, 0, addr_reg, SHIFT_IMM_LSR(TARGET_PAGE_BITS));
-    tcg_out_dat_imm(s, COND_AL, ARITH_AND,
-                    TCG_REG_R0, TCG_REG_TMP, CPU_TLB_SIZE - 1);
-    tcg_out_dat_reg(s, COND_AL, ARITH_ADD, TCG_REG_R0,
-                    TCG_AREG0, TCG_REG_R0, SHIFT_IMM_LSL(CPU_TLB_ENTRY_BITS));
-    /* We assume that the offset is contained within 20 bits.  */
-    tlb_offset = offsetof(CPUArchState, tlb_table[mem_index][0].addr_write);
-    assert((tlb_offset & ~0xfffff) == 0);
-    if (tlb_offset > 0xfff) {
-        tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R0, TCG_REG_R0,
-                        0xa00 | (tlb_offset >> 12));
-        tlb_offset &= 0xfff;
-    }
-    tcg_out_ld32_12wb(s, COND_AL, TCG_REG_R1, TCG_REG_R0, tlb_offset);
-    tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, TCG_REG_R1,
-                    TCG_REG_TMP, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
-    /* Check alignment.  */
-    if (s_bits)
-        tcg_out_dat_imm(s, COND_EQ, ARITH_TST,
-                        0, addr_reg, (1 << s_bits) - 1);
-#  if TARGET_LONG_BITS == 64
-    /* XXX: possibly we could use a block data load in the first access.  */
-    tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0, 4);
-    tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0,
-                    TCG_REG_R1, addr_reg2, SHIFT_IMM_LSL(0));
-#  endif
+    tcg_out_tlb_read(s, addr_reg, addr_reg2, s_bits,
+                     offsetof(CPUArchState,
+                              tlb_table[mem_index][0].addr_write));
+
     tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0,
                     offsetof(CPUTLBEntry, addend)
                     - offsetof(CPUTLBEntry, addr_write));
@@ -1439,11 +1410,11 @@ static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
      */
     argreg = TCG_REG_R0;
     argreg = tcg_out_arg_reg32(s, argreg, TCG_AREG0);
-#if TARGET_LONG_BITS == 64
-    argreg = tcg_out_arg_reg64(s, argreg, addr_reg, addr_reg2);
-#else
-    argreg = tcg_out_arg_reg32(s, argreg, addr_reg);
-#endif
+    if (TARGET_LONG_BITS == 64) {
+        argreg = tcg_out_arg_reg64(s, argreg, addr_reg, addr_reg2);
+    } else {
+        argreg = tcg_out_arg_reg32(s, argreg, addr_reg);
+    }
 
     switch (opc) {
     case 0:
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v3 18/20] tcg-arm: Improve scheduling of tcg_out_tlb_read
  2013-03-28 15:32 [Qemu-devel] [PATCH v3 00/20] tcg-arm improvements Richard Henderson
                   ` (16 preceding siblings ...)
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 17/20] tcg-arm: Split out tcg_out_tlb_read Richard Henderson
@ 2013-03-28 15:32 ` Richard Henderson
  2013-03-28 15:33 ` [Qemu-devel] [PATCH v3 19/20] tcg-arm: Use movi32 + blx for calls on v7 Richard Henderson
  2013-03-28 15:33 ` [Qemu-devel] [PATCH v3 20/20] tcg-arm: Convert to CONFIG_QEMU_LDST_OPTIMIZATION Richard Henderson
  19 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2013-03-28 15:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

The old schedule was fully serial, with no possibility of dual issue,
and a minimum issue time of 7 cycles.  The new schedule has a minimum
issue time of 5 cycles; the numbers in its block comment mark the
intended issue cycle of each instruction.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 110 ++++++++++++++++++++++++++-------------------------
 1 file changed, 57 insertions(+), 53 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 1d8f1f2..3949623 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -178,18 +178,12 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         ct->ct |= TCG_CT_REG;
         tcg_regset_set32(ct->u.regs, 0, (1 << TCG_TARGET_NB_REGS) - 1);
 #ifdef CONFIG_SOFTMMU
-        /* r0 and r1 will be overwritten when reading the tlb entry,
+        /* r0-r2 will be overwritten when reading the tlb entry,
            so don't use these. */
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_R0);
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_R1);
-#if TARGET_LONG_BITS == 64
-        /* If we're passing env to the helper as r0 and need a regpair
-         * for the address then r2 will be overwritten as we're setting
-         * up the args to the helper.
-         */
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_R2);
 #endif
-#endif
         break;
     case 'L':
         ct->ct |= TCG_CT_REG;
@@ -203,30 +197,16 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 
     /* qemu_st address & data_reg */
     case 's':
-        ct->ct |= TCG_CT_REG;
-        tcg_regset_set32(ct->u.regs, 0, (1 << TCG_TARGET_NB_REGS) - 1);
-        /* r0 and r1 will be overwritten when reading the tlb entry
-           (softmmu only) and doing the byte swapping, so don't
-           use these. */
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R0);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R1);
-#if defined(CONFIG_SOFTMMU) && (TARGET_LONG_BITS == 64)
-        /* Avoid clashes with registers being used for helper args */
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R2);
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R3);
-#endif
-        break;
     /* qemu_st64 data_reg2 */
     case 'S':
         ct->ct |= TCG_CT_REG;
         tcg_regset_set32(ct->u.regs, 0, (1 << TCG_TARGET_NB_REGS) - 1);
-        /* r0 and r1 will be overwritten when reading the tlb entry
-            (softmmu only) and doing the byte swapping, so don't
-            use these. */
+        /* r0-r2 will be overwritten when reading the tlb entry (softmmu only)
+           and r0-r1 when doing the byte swapping, so don't use these. */
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_R0);
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_R1);
-#ifdef CONFIG_SOFTMMU
-        /* r2 is still needed to load data_reg, so don't use it. */
+#if defined(CONFIG_SOFTMMU)
+        /* Avoid clashes with registers being used for helper args */
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_R2);
 #if TARGET_LONG_BITS == 64
         /* Avoid clashes with registers being used for helper args */
@@ -341,6 +321,8 @@ typedef enum {
     INSN_LDRSB_REG = 0x001000d0,
     INSN_STRB_IMM  = 0x04400000,
     INSN_STRB_REG  = 0x06400000,
+
+    INSN_LDRD_IMM  = 0x004000d0,
 } ARMInsn;
 
 #define SHIFT_IMM_LSL(im)	(((im) << 7) | 0x00)
@@ -806,15 +788,6 @@ static inline void tcg_out_ld32_12(TCGContext *s, int cond, TCGReg rt,
     tcg_out_memop_12(s, cond, INSN_LDR_IMM, rt, rn, imm12, 1, 0);
 }
 
-/* Offset pre-increment with base writeback.  */
-static inline void tcg_out_ld32_12wb(TCGContext *s, int cond, TCGReg rt,
-                                     TCGReg rn, int imm12)
-{
-    /* ldr with writeback and both registers equal is UNPREDICTABLE */
-    assert(rt != rn);
-    tcg_out_memop_12(s, cond, INSN_LDR_IMM, rt, rn, imm12, 1, 1);
-}
-
 static inline void tcg_out_st32_12(TCGContext *s, int cond, TCGReg rt,
                                    TCGReg rn, int imm12)
 {
@@ -1117,47 +1090,78 @@ static TCGReg tcg_out_arg_reg64(TCGContext *s, TCGReg argreg,
 
 #define TLB_SHIFT	(CPU_TLB_ENTRY_BITS + CPU_TLB_BITS)
 
-/* Load and compare a TLB entry, leaving the flags set.  Leaves R0 pointing
+/* Load and compare a TLB entry, leaving the flags set.  Leaves R2 pointing
    to the tlb entry.  Clobbers R1 and TMP.  */
 
 static void tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
                              int s_bits, int tlb_offset)
 {
+    TCGReg base = TCG_AREG0;
+
     /* Should generate something like the following:
-     *  shr r8, addr_reg, #TARGET_PAGE_BITS
-     *  and r0, r8, #(CPU_TLB_SIZE - 1)   @ Assumption: CPU_TLB_BITS <= 8
-     *  add r0, env, r0 lsl #CPU_TLB_ENTRY_BITS
+     * pre-v7:
+     *   shr    tmp, addr_reg, #TARGET_PAGE_BITS                  (1)
+     *   add    r2, env, #off & 0xff00
+     *   and    r0, tmp, #(CPU_TLB_SIZE - 1)                      (2)
+     *   add    r2, r2, r0, lsl #CPU_TLB_ENTRY_BITS               (3)
+     *   ldr    r0, [r2, #off & 0xff]!                            (4)
+     *   tst    addr_reg, #s_mask
+     *   cmpeq  r0, tmp, lsl #TARGET_PAGE_BITS                    (5)
+     *
+     * v7 (not implemented yet):
+     *   ubfx   r2, addr_reg, #TARGET_PAGE_BITS, #CPU_TLB_BITS    (1)
+     *   movw   tmp, #~TARGET_PAGE_MASK & ~s_mask
+     *   movw   r0, #off
+     *   add    r2, env, r2, lsl #CPU_TLB_ENTRY_BITS              (2)
+     *   bic    tmp, addr_reg, tmp
+     *   ldr    r0, [r2, r0]!                                     (3)
+     *   cmp    r0, tmp                                           (4)
      */
 #  if CPU_TLB_BITS > 8
 #   error
 #  endif
     tcg_out_dat_reg(s, COND_AL, ARITH_MOV, TCG_REG_TMP,
                     0, addrlo, SHIFT_IMM_LSR(TARGET_PAGE_BITS));
+
+    /* We assume that the offset is contained within 16 bits.  */
+    assert((tlb_offset & ~0xffff) == 0);
+    if (tlb_offset > 0xff) {
+        tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R2, base,
+                        (24 << 7) | (tlb_offset >> 8));
+        tlb_offset &= 0xff;
+        base = TCG_REG_R2;
+    }
+
     tcg_out_dat_imm(s, COND_AL, ARITH_AND,
                     TCG_REG_R0, TCG_REG_TMP, CPU_TLB_SIZE - 1);
-    tcg_out_dat_reg(s, COND_AL, ARITH_ADD, TCG_REG_R0, TCG_AREG0,
+    tcg_out_dat_reg(s, COND_AL, ARITH_ADD, TCG_REG_R2, base,
                     TCG_REG_R0, SHIFT_IMM_LSL(CPU_TLB_ENTRY_BITS));
 
-    /* We assume that the offset is contained within 20 bits.  */
-    assert((tlb_offset & ~0xfffff) == 0);
-    if (tlb_offset > 0xfff) {
-        tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R0, TCG_REG_R0,
-                        0xa00 | (tlb_offset >> 12));
-        tlb_offset &= 0xfff;
+    /* Load the tlb comparator.  Use ldrd if needed and available,
+       but due to how the pointer needs setting up, ldm isn't useful.
+       Base arm5 doesn't have ldrd, but armv5te does.  */
+    if (use_armv6_instructions && TARGET_LONG_BITS == 64) {
+        tcg_out_memop_8(s, COND_AL, INSN_LDRD_IMM, TCG_REG_R0,
+                        TCG_REG_R2, tlb_offset, 1, 1);
+    } else {
+        tcg_out_memop_12(s, COND_AL, INSN_LDR_IMM, TCG_REG_R0,
+                         TCG_REG_R2, tlb_offset, 1, 1);
+        if (TARGET_LONG_BITS == 64) {
+            tcg_out_memop_12(s, COND_AL, INSN_LDR_IMM, TCG_REG_R0,
+                             TCG_REG_R2, 4, 1, 0);
+        }
     }
-    tcg_out_ld32_12wb(s, COND_AL, TCG_REG_R1, TCG_REG_R0, tlb_offset);
-    tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0, TCG_REG_R1,
-                    TCG_REG_TMP, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
 
     /* Check alignment.  */
     if (s_bits) {
-        tcg_out_dat_imm(s, COND_EQ, ARITH_TST,
+        tcg_out_dat_imm(s, COND_AL, ARITH_TST,
                         0, addrlo, (1 << s_bits) - 1);
     }
 
+    tcg_out_dat_reg(s, (s_bits ? COND_EQ : COND_AL), ARITH_CMP, 0,
+                    TCG_REG_R0, TCG_REG_TMP, SHIFT_IMM_LSL(TARGET_PAGE_BITS));
+
     if (TARGET_LONG_BITS == 64) {
-        /* XXX: possibly we could use a block data load in the first access. */
-        tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0, 4);
         tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0,
                         TCG_REG_R1, addrhi, SHIFT_IMM_LSL(0));
     }
@@ -1190,7 +1194,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
     tcg_out_tlb_read(s, addr_reg, addr_reg2, s_bits,
                      offsetof(CPUArchState, tlb_table[mem_index][0].addr_read));
 
-    tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0,
+    tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R2,
                     offsetof(CPUTLBEntry, addend)
                     - offsetof(CPUTLBEntry, addr_read));
 
@@ -1362,7 +1366,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
                      offsetof(CPUArchState,
                               tlb_table[mem_index][0].addr_write));
 
-    tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R0,
+    tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R2,
                     offsetof(CPUTLBEntry, addend)
                     - offsetof(CPUTLBEntry, addr_write));
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v3 19/20] tcg-arm: Use movi32 + blx for calls on v7
  2013-03-28 15:32 [Qemu-devel] [PATCH v3 00/20] tcg-arm improvements Richard Henderson
                   ` (17 preceding siblings ...)
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 18/20] tcg-arm: Improve scheduling of tcg_out_tlb_read Richard Henderson
@ 2013-03-28 15:33 ` Richard Henderson
  2013-03-28 15:33 ` [Qemu-devel] [PATCH v3 20/20] tcg-arm: Convert to CONFIG_QEMU_LDST_OPTIMIZATION Richard Henderson
  19 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2013-03-28 15:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

This works better with branch prediction when we have movw+movt,
as the size of the code is the same.  Perhaps re-evaluate
when we have a proper constant pool.
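
Concretely, assuming tcg_out_movi32 expands to movw+movt on v7, the
call sequence becomes

	movw	ip, #:lower16:addr	@ ip == r12 == TCG_REG_TMP
	movt	ip, #:upper16:addr
	blx	ip			@ predicted as a call

where the pre-v7 fallback emits

	add	lr, pc, #4
	ldr	pc, [pc, #-4]		@ not call-predicted
	.word	addr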

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 3949623..148fd6f 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -988,6 +988,9 @@ static inline void tcg_out_call(TCGContext *s, uint32_t addr)
         } else {
             tcg_out_bl(s, COND_AL, val);
         }
+    } else if (use_armv7_instructions) {
+        tcg_out_movi32(s, COND_AL, TCG_REG_TMP, addr);
+        tcg_out_blx(s, COND_AL, TCG_REG_TMP);
     } else {
         tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R14, TCG_REG_PC, 4);
         tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH v3 20/20] tcg-arm: Convert to CONFIG_QEMU_LDST_OPTIMIZATION
  2013-03-28 15:32 [Qemu-devel] [PATCH v3 00/20] tcg-arm improvements Richard Henderson
                   ` (18 preceding siblings ...)
  2013-03-28 15:33 ` [Qemu-devel] [PATCH v3 19/20] tcg-arm: Use movi32 + blx for calls on v7 Richard Henderson
@ 2013-03-28 15:33 ` Richard Henderson
  2013-03-28 16:44   ` Peter Maydell
  19 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2013-03-28 15:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Aurelien Jarno

Move the slow path out of line, as the TODOs mention.
This allows the fast path to be unconditional, which can
speed it up further, depending on the core.
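
The overall shape of the generated code is then roughly as follows
(a sketch; the register and offset choices are those of the patch):

	(tlb compare, flags set)
	bne	slow_path		@ forward branch, patched later
	ldr	r1, [r2, #addend]	@ fast path, now unconditional
	(load/store via r1)
resume:
	...
slow_path:
	(marshal args, call helper)
	nop				@ exactly two insns here, so that
	nop				@ GETPC_LDST can find the branch
	b	resume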

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 configure               |   2 +-
 include/exec/exec-all.h |  17 +++
 tcg/arm/tcg-target.c    | 321 +++++++++++++++++++++++++++++++-----------------
 3 files changed, 229 insertions(+), 111 deletions(-)

diff --git a/configure b/configure
index f2af714..bae3132 100755
--- a/configure
+++ b/configure
@@ -4124,7 +4124,7 @@ upper() {
 }
 
 case "$cpu" in
-  i386|x86_64|ppc)
+  arm|i386|x86_64|ppc)
     # The TCG interpreter currently does not support ld/st optimization.
     if test "$tcg_interpreter" = "no" ; then
         echo "CONFIG_QEMU_LDST_OPTIMIZATION=y" >> $config_target_mak
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 190effa..01bf3cc 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -321,6 +321,23 @@ extern uintptr_t tci_tb_ptr;
 # elif defined (_ARCH_PPC) && !defined (_ARCH_PPC64)
 #  define GETRA() ((uintptr_t)__builtin_return_address(0))
 #  define GETPC_LDST() ((uintptr_t) ((*(int32_t *)(GETRA() - 4)) - 1))
+# elif defined (__arm__)
+/* We define two insns between the return address and the branch back to
+   straight-line.  Find and decode that branch insn.  */
+#  define GETRA()       ((uintptr_t)__builtin_return_address(0))
+#  define GETPC_LDST()  tcg_getpc_ldst(GETRA())
+static inline uintptr_t tcg_getpc_ldst(uintptr_t ra)
+{
+    int32_t b;
+    ra += 8;                    /* skip the two insns */
+    b = *(int32_t *)ra;         /* load the branch insn */
+    b = (b << 8) >> (8 - 2);    /* extract the displacement */
+    ra += 8;                    /* branches are relative to pc+8 */
+    ra += b;                    /* apply the displacement */
+    ra -= 4;                    /* return a pointer into the current opcode,
+                                   not the start of the next opcode  */
+    return ra;
+}
 # else
 #  error "CONFIG_QEMU_LDST_OPTIMIZATION needs GETPC_LDST() implementation!"
 # endif
diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 148fd6f..78452ee 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -415,6 +415,21 @@ static inline void tcg_out_dat_reg(TCGContext *s,
                     (rn << 16) | (rd << 12) | shift | rm);
 }
 
+static inline void tcg_out_nop(TCGContext *s)
+{
+    if (use_armv7_instructions) {
+        /* Architected nop introduced in v6k.  */
+        /* ??? This is an MSR (imm) 0,0,0 insn.  Anyone know if this
+           also Just So Happened to do nothing on pre-v6k so that we
+           don't need to conditionalize it?  */
+        tcg_out32(s, 0xe320f000);
+    } else {
+        /* Prior to that the assembler uses mov r0, r0.  Unlike the nop
+           above, this is guaranteed to consume execution resources.  */
+        tcg_out_dat_reg(s, COND_AL, ARITH_MOV, 0, 0, 0, SHIFT_IMM_LSL(0));
+    }
+}
+
 static inline void tcg_out_mov_reg(TCGContext *s, int cond, int rd, int rm)
 {
     /* Simple reg-reg move, optimising out the 'do nothing' case */
@@ -1009,14 +1024,19 @@ static inline void tcg_out_callr(TCGContext *s, int cond, int arg)
     }
 }
 
+static inline void tcg_out_goto(TCGContext *s, int cond, int addr)
+{
+    int32_t val = addr - (tcg_target_long) s->code_ptr;
+    assert(val - 8 < 0x01fffffd && val - 8 > -0x01fffffd);
+    tcg_out_b(s, cond, val);
+}
+
 static inline void tcg_out_goto_label(TCGContext *s, int cond, int label_index)
 {
     TCGLabel *l = &s->labels[label_index];
 
     if (l->has_value) {
-        int32_t val = l->u.value - (tcg_target_long) s->code_ptr;
-        assert(val - 8 < 0x01fffffd && val - 8 > -0x01fffffd);
-        tcg_out_b(s, cond, val);
+        tcg_out_goto(s, cond, l->u.value);
     } else {
         tcg_out_reloc(s, s->code_ptr, R_ARM_PC24, label_index, 31337);
         tcg_out_b_noaddr(s, cond);
@@ -1169,6 +1189,134 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
                         TCG_REG_R1, addrhi, SHIFT_IMM_LSL(0));
     }
 }
+
+/* Record the context of a call to the out of line helper code for the slow
+   path for a load or store, so that we can later generate the correct
+   helper code.  */
+static void add_qemu_ldst_label(TCGContext *s, int is_ld, int opc,
+                                int data_reg, int data_reg2, int addrlo_reg,
+                                int addrhi_reg, int mem_index,
+                                uint8_t *raddr, uint8_t *label_ptr)
+{
+    int idx;
+    TCGLabelQemuLdst *label;
+
+    if (s->nb_qemu_ldst_labels >= TCG_MAX_QEMU_LDST) {
+        tcg_abort();
+    }
+
+    idx = s->nb_qemu_ldst_labels++;
+    label = (TCGLabelQemuLdst *)&s->qemu_ldst_labels[idx];
+    label->is_ld = is_ld;
+    label->opc = opc;
+    label->datalo_reg = data_reg;
+    label->datahi_reg = data_reg2;
+    label->addrlo_reg = addrlo_reg;
+    label->addrhi_reg = addrhi_reg;
+    label->mem_index = mem_index;
+    label->raddr = raddr;
+    label->label_ptr[0] = label_ptr;
+}
+
+static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
+{
+    TCGReg argreg, data_reg, data_reg2;
+    uint8_t *start;
+
+    reloc_pc24(lb->label_ptr[0], (tcg_target_long)s->code_ptr);
+
+    argreg = tcg_out_arg_reg32(s, TCG_REG_R0, TCG_AREG0);
+    if (TARGET_LONG_BITS == 64) {
+        argreg = tcg_out_arg_reg64(s, argreg, lb->addrlo_reg, lb->addrhi_reg);
+    } else {
+        argreg = tcg_out_arg_reg32(s, argreg, lb->addrlo_reg);
+    }
+    argreg = tcg_out_arg_imm32(s, argreg, lb->mem_index);
+    tcg_out_call(s, (tcg_target_long) qemu_ld_helpers[lb->opc & 3]);
+
+    data_reg = lb->datalo_reg;
+    data_reg2 = lb->datahi_reg;
+
+    start = s->code_ptr;
+    switch (lb->opc) {
+    case 0 | 4:
+        tcg_out_ext8s(s, COND_AL, data_reg, TCG_REG_R0);
+        break;
+    case 1 | 4:
+        tcg_out_ext16s(s, COND_AL, data_reg, TCG_REG_R0);
+        break;
+    case 0:
+    case 1:
+    case 2:
+    default:
+        tcg_out_mov_reg(s, COND_AL, data_reg, TCG_REG_R0);
+        break;
+    case 3:
+        tcg_out_mov_reg(s, COND_AL, data_reg, TCG_REG_R0);
+        tcg_out_mov_reg(s, COND_AL, data_reg2, TCG_REG_R1);
+        break;
+    }
+
+    /* For GETPC_LDST in exec-all.h, we architect exactly 2 insns between
+       the call and the branch back to straight-line code.  Note that the
+       moves above could be elided by register allocation, and we don't
+       know which code alternative we chose for the extension.  */
+    switch (s->code_ptr - start) {
+    case 0:
+        tcg_out_nop(s);
+        /* FALLTHRU */
+    case 4:
+        tcg_out_nop(s);
+        /* FALLTHRU */
+    case 8:
+        break;
+    default:
+        abort();
+    }
+
+    tcg_out_goto(s, COND_AL, (tcg_target_long)lb->raddr);
+}
+
+static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
+{
+    TCGReg argreg, data_reg, data_reg2;
+
+    reloc_pc24(lb->label_ptr[0], (tcg_target_long)s->code_ptr);
+
+    argreg = TCG_REG_R0;
+    argreg = tcg_out_arg_reg32(s, argreg, TCG_AREG0);
+    if (TARGET_LONG_BITS == 64) {
+        argreg = tcg_out_arg_reg64(s, argreg, lb->addrlo_reg, lb->addrhi_reg);
+    } else {
+        argreg = tcg_out_arg_reg32(s, argreg, lb->addrlo_reg);
+    }
+
+    data_reg = lb->datalo_reg;
+    data_reg2 = lb->datahi_reg;
+    switch (lb->opc) {
+    case 0:
+        argreg = tcg_out_arg_reg8(s, argreg, data_reg);
+        break;
+    case 1:
+        argreg = tcg_out_arg_reg16(s, argreg, data_reg);
+        break;
+    case 2:
+        argreg = tcg_out_arg_reg32(s, argreg, data_reg);
+        break;
+    case 3:
+        argreg = tcg_out_arg_reg64(s, argreg, data_reg, data_reg2);
+        break;
+    }
+
+    argreg = tcg_out_arg_imm32(s, argreg, lb->mem_index);
+    tcg_out_call(s, (tcg_target_long) qemu_st_helpers[lb->opc & 3]);
+
+    /* For GETPC_LDST in exec-all.h, we architect exactly 2 insns between
+       the call and the branch back to straight-line code.  */
+    tcg_out_nop(s);
+    tcg_out_nop(s);
+    tcg_out_goto(s, COND_AL, (tcg_target_long)lb->raddr);
+}
 #endif /* SOFTMMU */
 
 static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
@@ -1177,8 +1325,8 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
     bool bswap;
 #ifdef CONFIG_SOFTMMU
     int mem_index, s_bits;
-    TCGReg argreg, addr_reg2;
-    uint32_t *label_ptr;
+    TCGReg addr_reg2;
+    uint8_t *label_ptr;
 #endif
 #ifdef TARGET_WORDS_BIGENDIAN
     bswap = 1;
@@ -1197,89 +1345,56 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
     tcg_out_tlb_read(s, addr_reg, addr_reg2, s_bits,
                      offsetof(CPUArchState, tlb_table[mem_index][0].addr_read));
 
-    tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R2,
+    label_ptr = s->code_ptr;
+    tcg_out_b_noaddr(s, COND_NE);
+
+    tcg_out_ld32_12(s, COND_AL, TCG_REG_R1, TCG_REG_R2,
                     offsetof(CPUTLBEntry, addend)
                     - offsetof(CPUTLBEntry, addr_read));
 
     switch (opc) {
     case 0:
-        tcg_out_ld8_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
+        tcg_out_ld8_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
         break;
     case 0 | 4:
-        tcg_out_ld8s_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
+        tcg_out_ld8s_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
         break;
     case 1:
-        tcg_out_ld16u_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
+        tcg_out_ld16u_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
         if (bswap) {
-            tcg_out_bswap16(s, COND_EQ, data_reg, data_reg);
+            tcg_out_bswap16(s, COND_AL, data_reg, data_reg);
         }
         break;
     case 1 | 4:
         if (bswap) {
-            tcg_out_ld16u_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
-            tcg_out_bswap16s(s, COND_EQ, data_reg, data_reg);
+            tcg_out_ld16u_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
+            tcg_out_bswap16s(s, COND_AL, data_reg, data_reg);
         } else {
-            tcg_out_ld16s_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
+            tcg_out_ld16s_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
         }
         break;
     case 2:
     default:
-        tcg_out_ld32_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
+        tcg_out_ld32_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
         if (bswap) {
-            tcg_out_bswap32(s, COND_EQ, data_reg, data_reg);
+            tcg_out_bswap32(s, COND_AL, data_reg, data_reg);
         }
         break;
     case 3:
         if (bswap) {
-            tcg_out_ld32_rwb(s, COND_EQ, data_reg2, TCG_REG_R1, addr_reg);
-            tcg_out_ld32_12(s, COND_EQ, data_reg, TCG_REG_R1, 4);
-            tcg_out_bswap32(s, COND_EQ, data_reg2, data_reg2);
-            tcg_out_bswap32(s, COND_EQ, data_reg, data_reg);
+            tcg_out_ld32_rwb(s, COND_AL, data_reg2, TCG_REG_R1, addr_reg);
+            tcg_out_ld32_12(s, COND_AL, data_reg, TCG_REG_R1, 4);
+            tcg_out_bswap32(s, COND_AL, data_reg2, data_reg2);
+            tcg_out_bswap32(s, COND_AL, data_reg, data_reg);
         } else {
-            tcg_out_ld32_rwb(s, COND_EQ, data_reg, TCG_REG_R1, addr_reg);
-            tcg_out_ld32_12(s, COND_EQ, data_reg2, TCG_REG_R1, 4);
+            tcg_out_ld32_rwb(s, COND_AL, data_reg, TCG_REG_R1, addr_reg);
+            tcg_out_ld32_12(s, COND_AL, data_reg2, TCG_REG_R1, 4);
         }
         break;
     }
 
-    label_ptr = (void *) s->code_ptr;
-    tcg_out_b_noaddr(s, COND_EQ);
-
-    /* TODO: move this code to where the constants pool will be */
-    /* Note that this code relies on the constraints we set in arm_op_defs[]
-     * to ensure that later arguments are not passed to us in registers we
-     * trash by moving the earlier arguments into them.
-     */
-    argreg = TCG_REG_R0;
-    argreg = tcg_out_arg_reg32(s, argreg, TCG_AREG0);
-    if (TARGET_LONG_BITS == 64) {
-        argreg = tcg_out_arg_reg64(s, argreg, addr_reg, addr_reg2);
-    } else {
-        argreg = tcg_out_arg_reg32(s, argreg, addr_reg);
-    }
-    argreg = tcg_out_arg_imm32(s, argreg, mem_index);
-    tcg_out_call(s, (tcg_target_long) qemu_ld_helpers[s_bits]);
-
-    switch (opc) {
-    case 0 | 4:
-        tcg_out_ext8s(s, COND_AL, data_reg, TCG_REG_R0);
-        break;
-    case 1 | 4:
-        tcg_out_ext16s(s, COND_AL, data_reg, TCG_REG_R0);
-        break;
-    case 0:
-    case 1:
-    case 2:
-    default:
-        tcg_out_mov_reg(s, COND_AL, data_reg, TCG_REG_R0);
-        break;
-    case 3:
-        tcg_out_mov_reg(s, COND_AL, data_reg, TCG_REG_R0);
-        tcg_out_mov_reg(s, COND_AL, data_reg2, TCG_REG_R1);
-        break;
-    }
-
-    reloc_pc24(label_ptr, (tcg_target_long)s->code_ptr);
+    add_qemu_ldst_label(s, 1, opc, data_reg, data_reg2, addr_reg, addr_reg2,
+                        mem_index, s->code_ptr, label_ptr);
 #else /* !CONFIG_SOFTMMU */
     if (GUEST_BASE) {
         uint32_t offset = GUEST_BASE;
@@ -1348,8 +1463,8 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
     bool bswap;
 #ifdef CONFIG_SOFTMMU
     int mem_index, s_bits;
-    TCGReg argreg, addr_reg2;
-    uint32_t *label_ptr;
+    TCGReg addr_reg2;
+    uint8_t *label_ptr;
 #endif
 #ifdef TARGET_WORDS_BIGENDIAN
     bswap = 1;
@@ -1369,79 +1484,49 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
                      offsetof(CPUArchState,
                               tlb_table[mem_index][0].addr_write));
 
-    tcg_out_ld32_12(s, COND_EQ, TCG_REG_R1, TCG_REG_R2,
+    label_ptr = s->code_ptr;
+    tcg_out_b_noaddr(s, COND_NE);
+
+    tcg_out_ld32_12(s, COND_AL, TCG_REG_R1, TCG_REG_R2,
                     offsetof(CPUTLBEntry, addend)
                     - offsetof(CPUTLBEntry, addr_write));
 
     switch (opc) {
     case 0:
-        tcg_out_st8_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
+        tcg_out_st8_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
         break;
     case 1:
         if (bswap) {
-            tcg_out_bswap16st(s, COND_EQ, TCG_REG_R0, data_reg);
-            tcg_out_st16_r(s, COND_EQ, TCG_REG_R0, addr_reg, TCG_REG_R1);
+            tcg_out_bswap16st(s, COND_AL, TCG_REG_R0, data_reg);
+            tcg_out_st16_r(s, COND_AL, TCG_REG_R0, addr_reg, TCG_REG_R1);
         } else {
-            tcg_out_st16_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
+            tcg_out_st16_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
         }
         break;
     case 2:
     default:
         if (bswap) {
-            tcg_out_bswap32(s, COND_EQ, TCG_REG_R0, data_reg);
-            tcg_out_st32_r(s, COND_EQ, TCG_REG_R0, addr_reg, TCG_REG_R1);
+            tcg_out_bswap32(s, COND_AL, TCG_REG_R0, data_reg);
+            tcg_out_st32_r(s, COND_AL, TCG_REG_R0, addr_reg, TCG_REG_R1);
         } else {
-            tcg_out_st32_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
+            tcg_out_st32_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
         }
         break;
     case 3:
         if (bswap) {
-            tcg_out_bswap32(s, COND_EQ, TCG_REG_R0, data_reg2);
-            tcg_out_st32_rwb(s, COND_EQ, TCG_REG_R0, TCG_REG_R1, addr_reg);
-            tcg_out_bswap32(s, COND_EQ, TCG_REG_R0, data_reg);
-            tcg_out_st32_12(s, COND_EQ, TCG_REG_R0, TCG_REG_R1, 4);
+            tcg_out_bswap32(s, COND_AL, TCG_REG_R0, data_reg2);
+            tcg_out_st32_rwb(s, COND_AL, TCG_REG_R0, TCG_REG_R1, addr_reg);
+            tcg_out_bswap32(s, COND_AL, TCG_REG_R0, data_reg);
+            tcg_out_st32_12(s, COND_AL, TCG_REG_R0, TCG_REG_R1, 4);
         } else {
-            tcg_out_st32_rwb(s, COND_EQ, data_reg, TCG_REG_R1, addr_reg);
-            tcg_out_st32_12(s, COND_EQ, data_reg2, TCG_REG_R1, 4);
+            tcg_out_st32_rwb(s, COND_AL, data_reg, TCG_REG_R1, addr_reg);
+            tcg_out_st32_12(s, COND_AL, data_reg2, TCG_REG_R1, 4);
         }
         break;
     }
 
-    label_ptr = (void *) s->code_ptr;
-    tcg_out_b_noaddr(s, COND_EQ);
-
-    /* TODO: move this code to where the constants pool will be */
-    /* Note that this code relies on the constraints we set in arm_op_defs[]
-     * to ensure that later arguments are not passed to us in registers we
-     * trash by moving the earlier arguments into them.
-     */
-    argreg = TCG_REG_R0;
-    argreg = tcg_out_arg_reg32(s, argreg, TCG_AREG0);
-    if (TARGET_LONG_BITS == 64) {
-        argreg = tcg_out_arg_reg64(s, argreg, addr_reg, addr_reg2);
-    } else {
-        argreg = tcg_out_arg_reg32(s, argreg, addr_reg);
-    }
-
-    switch (opc) {
-    case 0:
-        argreg = tcg_out_arg_reg8(s, argreg, data_reg);
-        break;
-    case 1:
-        argreg = tcg_out_arg_reg16(s, argreg, data_reg);
-        break;
-    case 2:
-        argreg = tcg_out_arg_reg32(s, argreg, data_reg);
-        break;
-    case 3:
-        argreg = tcg_out_arg_reg64(s, argreg, data_reg, data_reg2);
-        break;
-    }
-
-    argreg = tcg_out_arg_imm32(s, argreg, mem_index);
-    tcg_out_call(s, (tcg_target_long) qemu_st_helpers[s_bits]);
-
-    reloc_pc24(label_ptr, (tcg_target_long)s->code_ptr);
+    add_qemu_ldst_label(s, 0, opc, data_reg, data_reg2, addr_reg, addr_reg2,
+                        mem_index, s->code_ptr, label_ptr);
 #else /* !CONFIG_SOFTMMU */
     if (GUEST_BASE) {
         uint32_t offset = GUEST_BASE;
@@ -1828,6 +1913,22 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     }
 }
 
+#ifdef CONFIG_SOFTMMU
+/* Generate TB finalization at the end of block.  */
+void tcg_out_tb_finalize(TCGContext *s)
+{
+    int i;
+    for (i = 0; i < s->nb_qemu_ldst_labels; i++) {
+        TCGLabelQemuLdst *label = &s->qemu_ldst_labels[i];
+        if (label->is_ld) {
+            tcg_out_qemu_ld_slow_path(s, label);
+        } else {
+            tcg_out_qemu_st_slow_path(s, label);
+        }
+    }
+}
+#endif /* SOFTMMU */
+
 static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_exit_tb, { } },
     { INDEX_op_goto_tb, { } },
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/20] tcg-arm: Handle constant arguments to add2/sub2
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 05/20] tcg-arm: Handle constant arguments to add2/sub2 Richard Henderson
@ 2013-03-28 15:56   ` Peter Maydell
  2013-03-28 16:04     ` Richard Henderson
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Maydell @ 2013-03-28 15:56 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Aurelien Jarno

On 28 March 2013 15:32, Richard Henderson <rth@twiddle.net> wrote:
> We get to re-use the _rIN and _rIK subroutines to handle the various
> combinations of add vs sub.  Fold the << 21 into the opcode enum values
> so that we can explicitly add TO_CPSR as desired.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/arm/tcg-target.c | 106 ++++++++++++++++++++++++++++-----------------------
>  1 file changed, 58 insertions(+), 48 deletions(-)
>
> diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
> index 3152798..48cc186 100644
> --- a/tcg/arm/tcg-target.c
> +++ b/tcg/arm/tcg-target.c
> @@ -302,27 +302,26 @@ static inline int tcg_target_const_match(tcg_target_long val,
>      }
>  }
>
> +#define TO_CPSR (1 << 20)

This is the S bit; I think it would be helpful if our #define
had a name that made that clearer...

> +
>  enum arm_data_opc_e {
> -    ARITH_AND = 0x0,
> -    ARITH_EOR = 0x1,
> -    ARITH_SUB = 0x2,
> -    ARITH_RSB = 0x3,
> -    ARITH_ADD = 0x4,
> -    ARITH_ADC = 0x5,
> -    ARITH_SBC = 0x6,
> -    ARITH_RSC = 0x7,
> -    ARITH_TST = 0x8,
> -    ARITH_CMP = 0xa,
> -    ARITH_CMN = 0xb,
> -    ARITH_ORR = 0xc,
> -    ARITH_MOV = 0xd,
> -    ARITH_BIC = 0xe,
> -    ARITH_MVN = 0xf,
> +    ARITH_AND = 0x0 << 21,
> +    ARITH_EOR = 0x1 << 21,
> +    ARITH_SUB = 0x2 << 21,
> +    ARITH_RSB = 0x3 << 21,
> +    ARITH_ADD = 0x4 << 21,
> +    ARITH_ADC = 0x5 << 21,
> +    ARITH_SBC = 0x6 << 21,
> +    ARITH_RSC = 0x7 << 21,
> +    ARITH_TST = 0x8 << 21 | TO_CPSR,
> +    ARITH_CMP = 0xa << 21 | TO_CPSR,
> +    ARITH_CMN = 0xb << 21 | TO_CPSR,
> +    ARITH_ORR = 0xc << 21,
> +    ARITH_MOV = 0xd << 21,
> +    ARITH_BIC = 0xe << 21,
> +    ARITH_MVN = 0xf << 21,
>  };

It feels a little ugly to OR in the S bit in this enum, but I guess
it works.

Maybe we should add ARITH_TEQ at some point?

-- PMM

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/20] tcg-arm: Handle constant arguments to add2/sub2
  2013-03-28 15:56   ` Peter Maydell
@ 2013-03-28 16:04     ` Richard Henderson
  2013-03-28 16:09       ` Laurent Desnogues
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2013-03-28 16:04 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, Aurelien Jarno

On 03/28/2013 08:56 AM, Peter Maydell wrote:
>> +#define TO_CPSR (1 << 20)
> 
> This is the S bit; I think it would be helpful if our #define
> had a name that made that clearer...

Suggestions?  I thought "TO_CPSR" was clear...

>> +    ARITH_TST = 0x8 << 21 | TO_CPSR,
>> +    ARITH_CMP = 0xa << 21 | TO_CPSR,
>> +    ARITH_CMN = 0xb << 21 | TO_CPSR,
>> +    ARITH_ORR = 0xc << 21,
>> +    ARITH_MOV = 0xd << 21,
>> +    ARITH_BIC = 0xe << 21,
>> +    ARITH_MVN = 0xf << 21,
>>  };
> 
> It feels a little ugly to OR in the S bit in this enum, but I guess
> it works.

I think it actually makes everything a lot cleaner myself.  Especially given
the previous definition that required runtime checks for TST et al.

> Maybe we should add ARITH_TEQ at some point?

Given that it's the only one left out, sure.  I can't imagine in what
context it would be useful within tcg, though -- its main use seems to be
doing an equality comparison while preserving the C flag.



r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 07/20] tcg-arm: Fold epilogue into INDEX_op_exit_tb
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 07/20] tcg-arm: Fold epilogue into INDEX_op_exit_tb Richard Henderson
@ 2013-03-28 16:05   ` Peter Maydell
  2013-03-28 16:12     ` Richard Henderson
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Maydell @ 2013-03-28 16:05 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Aurelien Jarno

On 28 March 2013 15:32, Richard Henderson <rth@twiddle.net> wrote:
> The epilogue on ARM is one pop instruction, that pops the return
> address into PC.  Avoid the jump to jump for this case.  Use the
> standard movi32 routine for loading the return value if it's easy.

> @@ -2025,8 +2023,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
>      tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
>
>      tcg_out_bx(s, COND_AL, tcg_target_call_iarg_regs[1]);
> -    tb_ret_addr = s->code_ptr;
>
>      /* ldmia sp!, { r4 - r12, pc } */
> -    tcg_out32(s, (COND_AL << 28) | 0x08bd9ff0);
> +    tb_pop_ret = (COND_AL << 28) | 0x08bd9ff0;
>  }

Why are we using a variable when it's always constant?

Also, please add a comment to the bottom of the qemu_prologue()
saying something like

/* We never return here; we always return directly from generated
 * code to our caller.
 */

thanks
-- PMM

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/20] tcg-arm: Handle constant arguments to add2/sub2
  2013-03-28 16:04     ` Richard Henderson
@ 2013-03-28 16:09       ` Laurent Desnogues
  2013-03-28 16:16         ` Richard Henderson
  0 siblings, 1 reply; 41+ messages in thread
From: Laurent Desnogues @ 2013-03-28 16:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, qemu-devel, Aurelien Jarno

On Thu, Mar 28, 2013 at 5:04 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 03/28/2013 08:56 AM, Peter Maydell wrote:
>>> +#define TO_CPSR (1 << 20)
>>
>> This is the S bit; I think it would be helpful if our #define
>> had a name that made that clearer...
>
> Suggestions?  I thought "TO_CPSR" was clear...

SET_FLAGS?

>>> +    ARITH_TST = 0x8 << 21 | TO_CPSR,
>>> +    ARITH_CMP = 0xa << 21 | TO_CPSR,
>>> +    ARITH_CMN = 0xb << 21 | TO_CPSR,
>>> +    ARITH_ORR = 0xc << 21,
>>> +    ARITH_MOV = 0xd << 21,
>>> +    ARITH_BIC = 0xe << 21,
>>> +    ARITH_MVN = 0xf << 21,
>>>  };
>>
>> It feels a little ugly to OR in the S bit in this enum, but I guess
>> it works.
>
> I think it actually makes everything a lot cleaner myself.  Especially given
> the previous definition that required runtime checks for TST et al.

I also find this cleaner.  OTOH I understand that Peter finds it
odd given that other operations (such as ADD) can optionally
set flags.


Laurent

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 07/20] tcg-arm: Fold epilogue into INDEX_op_exit_tb
  2013-03-28 16:05   ` Peter Maydell
@ 2013-03-28 16:12     ` Richard Henderson
  0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2013-03-28 16:12 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, Aurelien Jarno

On 03/28/2013 09:05 AM, Peter Maydell wrote:
> On 28 March 2013 15:32, Richard Henderson <rth@twiddle.net> wrote:
>> The epilogue on ARM is one pop instruction, that pops the return
>> address into PC.  Avoid the jump to jump for this case.  Use the
>> standard movi32 routine for loading the return value if it's easy.
> 
>> @@ -2025,8 +2023,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
>>      tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
>>
>>      tcg_out_bx(s, COND_AL, tcg_target_call_iarg_regs[1]);
>> -    tb_ret_addr = s->code_ptr;
>>
>>      /* ldmia sp!, { r4 - r12, pc } */
>> -    tcg_out32(s, (COND_AL << 28) | 0x08bd9ff0);
>> +    tb_pop_ret = (COND_AL << 28) | 0x08bd9ff0;
>>  }
> 
> Why are we using a variable when it's always constant?

I just wanted the comments for the stmia and ldmia to be the same, and near one
another.  Would it be better if tcg_target_qemu_prologue was moved up in the
file, and tb_ret_addr made a static const (which the compiler should fold)?

E.g.

/* ldmia sp!, { r4 - r12, pc }  @ See the stmia definition below.  */
static const uint32_t tb_pop_ret = ...;

static void tcg_target_qemu_prologue(TCGContext *s)
{
   /* comment */
   /* stmia sp!, {r4-r12, lr} */
   tcg_out32(...);
   ...
}

> Also, please add a comment to the bottom of the qemu_prologue()
> saying something like
> 
> /* We never return here; we always return directly from generated
>  * code to our caller.
>  */

Sure.


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/20] tcg-arm: Implement deposit for armv7
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 08/20] tcg-arm: Implement deposit for armv7 Richard Henderson
@ 2013-03-28 16:15   ` Peter Maydell
  2013-03-28 16:22     ` Richard Henderson
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Maydell @ 2013-03-28 16:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Aurelien Jarno

On 28 March 2013 15:32, Richard Henderson <rth@twiddle.net> wrote:
> We have BFI and BFC available for implementing it.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/arm/tcg-target.c | 36 ++++++++++++++++++++++++++++++++++++
>  tcg/arm/tcg-target.h |  5 ++++-
>  2 files changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
> index 88f5689..4950eaf 100644
> --- a/tcg/arm/tcg-target.c
> +++ b/tcg/arm/tcg-target.c
> @@ -702,6 +702,35 @@ static inline void tcg_out_bswap32(TCGContext *s, int cond, int rd, int rn)
>      }
>  }
>
> +bool tcg_target_deposit_valid(int ofs, int len)
> +{
> +    /* ??? Without bfi, we could improve over generic code by combining
> +       the right-shift from a non-zero ofs with the orr.  We do run into
> +       problems when rd == rs, and the mask generated from ofs+len don't
> +       fit into an immediate.  We would have to be careful not to pessimize
> +       wrt the optimizations performed on the expanded code.  */
> +    return use_armv7_instructions;

Strictly speaking BFI is v6T2, but there doesn't seem much point
in making the distinction given it would only affect the rare
ARM1156. (Personally I don't think there's much point worrying about
optimising codegen for anything pre-v7 at all.)

> +}
> +
> +static inline void tcg_out_deposit(TCGContext *s, int cond, TCGReg rd,
> +                                   TCGArg a1, int ofs, int len, bool const_a1)
> +{
> +    if (const_a1) {
> +        uint32_t mask = (2u << (len - 1)) - 1;

What guarantees us that we won't see a length of 0?
The tcg/README description doesn't say that's invalid
and I don't think the optimize pass handles it (maybe I
missed it).

-- PMM

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/20] tcg-arm: Handle constant arguments to add2/sub2
  2013-03-28 16:09       ` Laurent Desnogues
@ 2013-03-28 16:16         ` Richard Henderson
  0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2013-03-28 16:16 UTC (permalink / raw)
  To: Laurent Desnogues; +Cc: Peter Maydell, qemu-devel, Aurelien Jarno

On 03/28/2013 09:09 AM, Laurent Desnogues wrote:
>> I think it actually makes everything a lot cleaner myself.  Especially given
>> the previous definition that required runtime checks for TST et al.
> 
> I also find this cleaner.  OTOH I understand that Peter finds it
> odd given that other operations (such as ADD) can optionally
> set flags.

Well of course they can.  Which is exactly why I'm doing all this, because I
can include that bit in the add2/sub2 expansion below.  It's just that the
comparison insns *require* the S bit set, which is what is going on in this enum.


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/20] tcg-arm: Implement deposit for armv7
  2013-03-28 16:15   ` Peter Maydell
@ 2013-03-28 16:22     ` Richard Henderson
  2013-03-28 16:59       ` Peter Maydell
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2013-03-28 16:22 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, Aurelien Jarno

On 03/28/2013 09:15 AM, Peter Maydell wrote:
>> +    /* ??? Without bfi, we could improve over generic code by combining
>> +       the right-shift from a non-zero ofs with the orr.  We do run into
>> +       problems when rd == rs, and the mask generated from ofs+len don't
>> +       fit into an immediate.  We would have to be careful not to pessimize
>> +       wrt the optimizations performed on the expanded code.  */
>> +    return use_armv7_instructions;
> 
> Strictly speaking BFI is v6T2, but there doesn't seem much point
> in making the distinction given it would only affect the rare
> ARM1156. (Personally I don't think there's much point worrying about
> optmising codegen for anything pre-v7 at all.)

Fair enough.  I could update the comment to include v6t2, since I've done
similar for e.g. v6k (while retaining the v7 test) elsewhere in the patch set.

> What guarantees us that we won't see a length of 0?
> The tcg/README description doesn't say that's invalid
> and I don't think the optimize pass handles it (maybe I
> missed it).

We can patch the readme, and the asserts in tcg-op.h if you like.

I've assumed elsewhere that we won't see a zero length.  E.g. none of the other
cpus -- ppc, hppa, ia64 -- can encode that either.
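
Something like the following in the expansion would pin the assumption
down (a sketch; only the mask line is what the arm patch computes):

    assert(ofs >= 0 && len > 0 && ofs + len <= 32);
    /* len == 32 is fine: (2u << 31) - 1 == 0xffffffff, whereas
       len == 0 would shift by -1, which is undefined in C.  */
    uint32_t mask = (2u << (len - 1)) - 1;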


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 20/20] tcg-arm: Convert to CONFIG_QEMU_LDST_OPTIMIZATION
  2013-03-28 15:33 ` [Qemu-devel] [PATCH v3 20/20] tcg-arm: Convert to CONFIG_QEMU_LDST_OPTIMIZATION Richard Henderson
@ 2013-03-28 16:44   ` Peter Maydell
  2013-03-28 17:46     ` Richard Henderson
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Maydell @ 2013-03-28 16:44 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Aurelien Jarno

On 28 March 2013 15:33, Richard Henderson <rth@twiddle.net> wrote:
> +static inline void tcg_out_nop(TCGContext *s)
> +{
> +    if (use_armv7_instructions) {
> +        /* Architected nop introduced in v6k.  */
> +        /* ??? This is an MSR (imm) 0,0,0 insn.  Anyone know if this
> +           also Just So Happened to do nothing on pre-v6k so that we
> +           don't need to conditionalize it?  */

Well, it corresponds to a "do nothing" kind of MSR (because
all the field_mask bits are zero) but I bet it's more expensive
than mov r0, r0 on at least some pre-v6k cores.
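
For reference, a field-by-field decode, assuming the MSR-immediate
encoding cond | 00110R10 | mask | 1111 | rotate | imm8:

    uint32_t insn = 0xe320f000;
    uint32_t cond = insn >> 28;          /* 0xe: always */
    uint32_t mask = (insn >> 16) & 0xf;  /* 0: no CPSR fields selected */
    uint32_t imm8 = insn & 0xff;         /* 0 */

so architecturally it writes nothing; the cost is purely a question of
how a given core dispatches it.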

> +        tcg_out32(s, 0xe320f000);
> +    } else {
> +        /* Prior to that the assembler uses mov r0, r0.  Unlike the nop
> +           above, this is guaranteed to consume execution resources.  */

Guaranteed by who? Catching this case in the decoder and treating it
exactly like NOP is a perfectly legal implementation.
(For that matter there's nothing restricting an implementation of
the architectural NOP from tying up every execution resource on
the core for 500 cycles.)

> +        tcg_out_dat_reg(s, COND_AL, ARITH_MOV, 0, 0, 0, SHIFT_IMM_LSL(0));
> +    }

-- PMM

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/20] tcg-arm: Implement deposit for armv7
  2013-03-28 16:22     ` Richard Henderson
@ 2013-03-28 16:59       ` Peter Maydell
  0 siblings, 0 replies; 41+ messages in thread
From: Peter Maydell @ 2013-03-28 16:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Aurelien Jarno

On 28 March 2013 16:22, Richard Henderson <rth@twiddle.net> wrote:
> On 03/28/2013 09:15 AM, Peter Maydell wrote:
>> What guarantees us that we won't see a length of 0?
>> The tcg/README description doesn't say that's invalid
>> and I don't think the optimize pass handles it (maybe I
>> missed it).
>
> We can patch the readme, and the asserts in tcg-op.h if you like.

That would be nice, but I think I was getting confused with
the other edge case (length == 32), which we do handle
correctly.

-- PMM

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 20/20] tcg-arm: Convert to CONFIG_QEMU_LDST_OPTIMIZATION
  2013-03-28 16:44   ` Peter Maydell
@ 2013-03-28 17:46     ` Richard Henderson
  0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2013-03-28 17:46 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel, Aurelien Jarno

On 03/28/2013 09:44 AM, Peter Maydell wrote:
>> +        /* Prior to that the assembler uses mov r0, r0.  Unlike the nop
>> +           above, this is guaranteed to consume execution resources.  */
> 
> Guaranteed by who? Catching this case in the decoder and treating it
> exactly like NOP is a perfectly legal implementation.
> (For that matter there's nothing restricting an implementation of
> the architectural NOP from tying up every execution resource on
> the core for 500 cycles.)

Hmph, I could have sworn I saw language exactly like that in the AARM,
but I can't find it anymore.  I do see a note about not using NOP in
timing loops in A8.8.119.

As for timing on real hardware, I can make a loop like

1:	subs	r0, r0, #1
	mov	r0, r0
	mov	r0, r0
	mov	r0, r0
	mov	r0, r0
	mov	r0, r0
	mov	r0, r0
	bne	1b

runs in 7 cycles on Cortex-A15, whereas the same loop with nops runs in 6.  Of
course, changing to "mov r1, r1" so that we don't conflict with the subs in the
first cycle also runs in 6 cycles.  So it's all about finding a nop that
doesn't have a RAW conflict with the previous insn.

I don't have any other ARM hw readily available.


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 14/20] tcg-arm: Cleanup goto_tb handling
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 14/20] tcg-arm: Cleanup goto_tb handling Richard Henderson
@ 2013-03-28 20:09   ` Aurelien Jarno
  2013-03-28 20:48     ` Richard Henderson
  0 siblings, 1 reply; 41+ messages in thread
From: Aurelien Jarno @ 2013-03-28 20:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, qemu-devel

On Thu, Mar 28, 2013 at 08:32:55AM -0700, Richard Henderson wrote:
> Eliminate 2 disabled code blocks.  Choose the load-to-pc method of
> jumping so that we can eliminate the 16M code_gen_buffer limitation.
> Remove a test in the indirect jump method that is always true.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  include/exec/exec-all.h | 23 +++--------------------
>  tcg/arm/tcg-target.c    | 26 ++++++--------------------
>  translate-all.c         |  2 --
>  3 files changed, 9 insertions(+), 42 deletions(-)

I already proposed such a patch, but it seems to improve things only in
some specific cases (kernel boot), while increasing the I/D cache and
TLB pressure.

See https://lists.gnu.org/archive/html/qemu-devel/2012-10/msg01684.html 
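
(The trade-off, roughly: the load-to-pc method keeps a literal word in
the middle of the instruction stream, fetched through the D-side every
time the jump executes,

	ldr	pc, [pc, #-4]	@ target loaded via the data side
	.word	addr		@ literal shares cache lines with code

whereas a patched direct branch touches the I-side only.)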

> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index e856191..190effa 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -233,26 +233,9 @@ static inline void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
>  #elif defined(__arm__)
>  static inline void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
>  {
> -#if !QEMU_GNUC_PREREQ(4, 1)
> -    register unsigned long _beg __asm ("a1");
> -    register unsigned long _end __asm ("a2");
> -    register unsigned long _flg __asm ("a3");
> -#endif
> -
> -    /* we could use a ldr pc, [pc, #-4] kind of branch and avoid the flush */
> -    *(uint32_t *)jmp_addr =
> -        (*(uint32_t *)jmp_addr & ~0xffffff)
> -        | (((addr - (jmp_addr + 8)) >> 2) & 0xffffff);
> -
> -#if QEMU_GNUC_PREREQ(4, 1)
> -    __builtin___clear_cache((char *) jmp_addr, (char *) jmp_addr + 4);
> -#else
> -    /* flush icache */
> -    _beg = jmp_addr;
> -    _end = jmp_addr + 4;
> -    _flg = 0;
> -    __asm __volatile__ ("swi 0x9f0002" : : "r" (_beg), "r" (_end), "r" (_flg));
> -#endif
> +    /* We're using "ldr pc, [pc,#-4]", so we can just store the raw
> +       address, without caring for flushing the icache.  */
> +    *(uint32_t *)jmp_addr = addr;
>  }
>  #elif defined(__sparc__)
>  void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr);
> diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
> index 7bcba19..6de5e90 100644
> --- a/tcg/arm/tcg-target.c
> +++ b/tcg/arm/tcg-target.c
> @@ -1595,30 +1595,16 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>      case INDEX_op_goto_tb:
>          if (s->tb_jmp_offset) {
> -            /* Direct jump method */
> -#if defined(USE_DIRECT_JUMP)
> -            s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
> -            tcg_out_b_noaddr(s, COND_AL);
> -#else
> +            /* "Direct" jump method.  Rather than limit the code gen buffer
> +               to 16M, load the destination from the next word.  */
>              tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4);
>              s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
> -            tcg_out32(s, 0);
> -#endif
> +            s->code_ptr += 4;
>          } else {
> -            /* Indirect jump method */
> -#if 1
> -            c = (int) (s->tb_next + args[0]) - ((int) s->code_ptr + 8);
> -            if (c > 0xfff || c < -0xfff) {
> -                tcg_out_movi32(s, COND_AL, TCG_REG_R0,
> -                                (tcg_target_long) (s->tb_next + args[0]));
> -                tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_R0, 0);
> -            } else
> -                tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, c);
> -#else
> -            tcg_out_ld32_12(s, COND_AL, TCG_REG_R0, TCG_REG_PC, 0);
> +            /* Indirect jump method.  */
> +            tcg_out_movi32(s, COND_AL, TCG_REG_R0,
> +                           (tcg_target_long) (s->tb_next + args[0]));
>              tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_R0, 0);
> -            tcg_out32(s, (tcg_target_long) (s->tb_next + args[0]));
> -#endif
>          }
>          s->tb_next_offset[args[0]] = s->code_ptr - s->code_buf;
>          break;
> diff --git a/translate-all.c b/translate-all.c
> index a98c646..3ca839f 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -460,8 +460,6 @@ static inline PageDesc *page_find(tb_page_addr_t index)
>  # define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
>  #elif defined(__sparc__)
>  # define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
> -#elif defined(__arm__)
> -# define MAX_CODE_GEN_BUFFER_SIZE  (16u * 1024 * 1024)
>  #elif defined(__s390x__)
>    /* We have a +- 4GB range on the branches; leave some slop.  */
>  # define MAX_CODE_GEN_BUFFER_SIZE  (3ul * 1024 * 1024 * 1024)
> -- 
> 1.8.1.4
> 
> 
> 

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 14/20] tcg-arm: Cleanup goto_tb handling
  2013-03-28 20:09   ` Aurelien Jarno
@ 2013-03-28 20:48     ` Richard Henderson
  2013-03-29  6:50       ` Aurelien Jarno
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2013-03-28 20:48 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: Peter Maydell, qemu-devel

On 2013-03-28 13:09, Aurelien Jarno wrote:
> I already proposed such a patch, but it seems to improve things only on
> some specific cases (kernel boot), while increasing the I/D cache and
> TLB pressure.
>
> See https://lists.gnu.org/archive/html/qemu-devel/2012-10/msg01684.html

Hmm.  I see that your previous attempt didn't also remove the 16MB
limit on code_gen_buffer.  Given the default is 32MB, I wonder if the
decrease in translation overhead makes enough difference.

I'll admit that this sort of change needs more benchmarking.  What I'd
like to do one way or the other is get rid of the myriad ifdefs.  That's
just too confusing.

For the record, I've also been experimenting with a real constant pool.
This starts to significantly reduce the insn size, even with movw.  This
primarily due to re-use of helper address constants.  With the pool, I
also place the goto_tb constants there as well.

I've not gotten that code completely stable yet, but once I do I can
also consider the number of Dcache lines that the pool will occupy.
(With v7, I only see 2-3 entries in the pool per TB, so in most cases
aligning to a cache boundary won't matter.)


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 14/20] tcg-arm: Cleanup goto_tb handling
  2013-03-28 20:48     ` Richard Henderson
@ 2013-03-29  6:50       ` Aurelien Jarno
  2013-03-29 15:06         ` Richard Henderson
  0 siblings, 1 reply; 41+ messages in thread
From: Aurelien Jarno @ 2013-03-29  6:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, qemu-devel

On Thu, Mar 28, 2013 at 01:48:42PM -0700, Richard Henderson wrote:
> On 2013-03-28 13:09, Aurelien Jarno wrote:
> >I already proposed such a patch, but it seems to improve things only on
> >some specific cases (kernel boot), while increasing the I/D cache and
> >TLB pressure.
> >
> >See https://lists.gnu.org/archive/html/qemu-devel/2012-10/msg01684.html
> 
> Hmm.  I see that your previous attempt didn't also go with a removal of
> the 16MB limit on code_gen_buffer.  Given the default is 32MB, I wonder
> if the decrease in translation makes enough difference.
> 

My patch also removed the 16MB limit. Note also the default is 1/4th
of the emulated RAM, so it can be a lot bigger than 32MB.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 14/20] tcg-arm: Cleanup goto_tb handling
  2013-03-29  6:50       ` Aurelien Jarno
@ 2013-03-29 15:06         ` Richard Henderson
  0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2013-03-29 15:06 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: Peter Maydell, qemu-devel

On 2013-03-28 23:50, Aurelien Jarno wrote:
> On Thu, Mar 28, 2013 at 01:48:42PM -0700, Richard Henderson wrote:
>> On 2013-03-28 13:09, Aurelien Jarno wrote:
>>> I already proposed such a patch, but it seems to improve things only on
>>> some specific cases (kernel boot), while increasing the I/D cache and
>>> TLB pressure.
>>>
>>> See https://lists.gnu.org/archive/html/qemu-devel/2012-10/msg01684.html
>>
>> Hmm.  I see that your previous attempt didn't also go with a removal of
>> the 16MB limit on code_gen_buffer.  Given the default is 32MB, I wonder
>> if the decrease in translation makes enough difference.
>>
>
> My patch also removed the 16MB limit. Note also the default is 1/4th
> of the emulated RAM, so it can be a lot bigger than 32MB.
>
Ok, I guess consider that patch dropped for now then.

I'll post v4 today with the changes suggested so far.


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 16/20] tcg-arm: Fix local stack frame
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 16/20] tcg-arm: Fix local stack frame Richard Henderson
@ 2013-03-29 16:50   ` Aurelien Jarno
  0 siblings, 0 replies; 41+ messages in thread
From: Aurelien Jarno @ 2013-03-29 16:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, qemu-devel

On Thu, Mar 28, 2013 at 08:32:57AM -0700, Richard Henderson wrote:
> We were not allocating TCG_STATIC_CALL_ARGS_SIZE, so this meant that
> any helper with more than 4 arguments would clobber the saved regs.
> Realizing that we're supposed to have this memory pre-allocated means
> we can clean up the tcg_out_arg functions, which were trying to do
> more stack allocation.
> 
> Allocate stack memory for the TCG temporaries while we're at it.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/arm/tcg-target.c | 120 ++++++++++++++++++---------------------------------
>  1 file changed, 43 insertions(+), 77 deletions(-)

This fixes a crash booting a mips64 guest, but unfortunately there are
still more issues to fix.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

> diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
> index fb8c96d..70f63cb 100644
> --- a/tcg/arm/tcg-target.c
> +++ b/tcg/arm/tcg-target.c
> @@ -1074,65 +1074,35 @@ static const void * const qemu_st_helpers[4] = {
>   * argreg is where we want to put this argument, arg is the argument itself.
>   * Return value is the updated argreg ready for the next call.
>   * Note that argreg 0..3 is real registers, 4+ on stack.
> - * When we reach the first stacked argument, we allocate space for it
> - * and the following stacked arguments using "str r8, [sp, #-0x10]!".
> - * Following arguments are filled in with "str r8, [sp, #0xNN]".
> - * For more than 4 stacked arguments we'd need to know how much
> - * space to allocate when we pushed the first stacked argument.
> - * We don't need this, so don't implement it (and will assert if you try it.)
>   *
>   * We provide routines for arguments which are: immediate, 32 bit
>   * value in register, 16 and 8 bit values in register (which must be zero
>   * extended before use) and 64 bit value in a lo:hi register pair.
>   */
> -#define DEFINE_TCG_OUT_ARG(NAME, ARGPARAM)                                 \
> -    static TCGReg NAME(TCGContext *s, TCGReg argreg, ARGPARAM)             \
> -    {                                                                      \
> -        if (argreg < 4) {                                                  \
> -            TCG_OUT_ARG_GET_ARG(argreg);                                   \
> -        } else if (argreg == 4) {                                          \
> -            TCG_OUT_ARG_GET_ARG(TCG_REG_TMP);                              \
> -            tcg_out32(s, (COND_AL << 28) | 0x052d0010 | (TCG_REG_TMP << 12)); \
> -        } else {                                                           \
> -            assert(argreg < 8);                                            \
> -            TCG_OUT_ARG_GET_ARG(TCG_REG_TMP);                              \
> -            tcg_out_st32_12(s, COND_AL, TCG_REG_TMP, TCG_REG_CALL_STACK,   \
> -                            (argreg - 4) * 4);                             \
> -        }                                                                  \
> -        return argreg + 1;                                                 \
> -    }
> -
> -#define TCG_OUT_ARG_GET_ARG(A) tcg_out_dat_imm(s, COND_AL, ARITH_MOV, A, 0, arg)
> -DEFINE_TCG_OUT_ARG(tcg_out_arg_imm32, uint32_t arg)
> -#undef TCG_OUT_ARG_GET_ARG
> -#define TCG_OUT_ARG_GET_ARG(A) tcg_out_ext8u(s, COND_AL, A, arg)
> -DEFINE_TCG_OUT_ARG(tcg_out_arg_reg8, TCGReg arg)
> -#undef TCG_OUT_ARG_GET_ARG
> -#define TCG_OUT_ARG_GET_ARG(A) tcg_out_ext16u(s, COND_AL, A, arg)
> -DEFINE_TCG_OUT_ARG(tcg_out_arg_reg16, TCGReg arg)
> -#undef TCG_OUT_ARG_GET_ARG
> -
> -/* We don't use the macro for this one to avoid an unnecessary reg-reg
> - * move when storing to the stack.
> - */
> -static TCGReg tcg_out_arg_reg32(TCGContext *s, TCGReg argreg, TCGReg arg)
> -{
> -    if (argreg < 4) {
> -        tcg_out_mov_reg(s, COND_AL, argreg, arg);
> -    } else if (argreg == 4) {
> -        /* str arg, [sp, #-0x10]! */
> -        tcg_out32(s, (COND_AL << 28) | 0x052d0010 | (arg << 12));
> -    } else {
> -        assert(argreg < 8);
> -        /* str arg, [sp, #0xNN] */
> -        tcg_out32(s, (COND_AL << 28) | 0x058d0000 |
> -                  (arg << 12) | (argreg - 4) * 4);
> -    }
> -    return argreg + 1;
> -}
> -
> -static inline TCGReg tcg_out_arg_reg64(TCGContext *s, TCGReg argreg,
> -                                       TCGReg arglo, TCGReg arghi)
> +#define DEFINE_TCG_OUT_ARG(NAME, ARGTYPE, MOV_ARG, EXT_ARG)                \
> +static TCGReg NAME(TCGContext *s, TCGReg argreg, ARGTYPE arg)              \
> +{                                                                          \
> +    if (argreg < 4) {                                                      \
> +        MOV_ARG(s, COND_AL, argreg, arg);                                  \
> +    } else {                                                               \
> +        int ofs = (argreg - 4) * 4;                                        \
> +        EXT_ARG;                                                           \
> +        assert(ofs + 4 <= TCG_STATIC_CALL_ARGS_SIZE);                      \
> +        tcg_out_st32_12(s, COND_AL, arg, TCG_REG_CALL_STACK, ofs);         \
> +    }                                                                      \
> +    return argreg + 1;                                                     \
> +}
> +
> +DEFINE_TCG_OUT_ARG(tcg_out_arg_imm32, uint32_t, tcg_out_movi32,
> +    (tcg_out_movi32(s, COND_AL, TCG_REG_TMP, arg), arg = TCG_REG_TMP))
> +DEFINE_TCG_OUT_ARG(tcg_out_arg_reg8, TCGReg, tcg_out_ext8u,
> +    (tcg_out_ext8u(s, COND_AL, TCG_REG_TMP, arg), arg = TCG_REG_TMP))
> +DEFINE_TCG_OUT_ARG(tcg_out_arg_reg16, TCGReg, tcg_out_ext16u,
> +    (tcg_out_ext16u(s, COND_AL, TCG_REG_TMP, arg), arg = TCG_REG_TMP))
> +DEFINE_TCG_OUT_ARG(tcg_out_arg_reg32, TCGReg, tcg_out_mov_reg, )
> +
> +static TCGReg tcg_out_arg_reg64(TCGContext *s, TCGReg argreg,
> +                                TCGReg arglo, TCGReg arghi)
>  {
>      /* 64 bit arguments must go in even/odd register pairs
>       * and in 8-aligned stack slots.
> @@ -1144,16 +1114,7 @@ static inline TCGReg tcg_out_arg_reg64(TCGContext *s, TCGReg argreg,
>      argreg = tcg_out_arg_reg32(s, argreg, arghi);
>      return argreg;
>  }
> -
> -static inline void tcg_out_arg_stacktidy(TCGContext *s, TCGReg argreg)
> -{
> -    /* Output any necessary post-call cleanup of the stack */
> -    if (argreg > 4) {
> -        tcg_out_dat_imm(s, COND_AL, ARITH_ADD, TCG_REG_R13, TCG_REG_R13, 0x10);
> -    }
> -}
> -
> -#endif
> +#endif /* SOFTMMU */
>  
>  #define TLB_SHIFT	(CPU_TLB_ENTRY_BITS + CPU_TLB_BITS)
>  
> @@ -1284,7 +1245,6 @@ static inline void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
>  #endif
>      argreg = tcg_out_arg_imm32(s, argreg, mem_index);
>      tcg_out_call(s, (tcg_target_long) qemu_ld_helpers[s_bits]);
> -    tcg_out_arg_stacktidy(s, argreg);
>  
>      switch (opc) {
>      case 0 | 4:
> @@ -1502,7 +1462,6 @@ static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
>  
>      argreg = tcg_out_arg_imm32(s, argreg, mem_index);
>      tcg_out_call(s, (tcg_target_long) qemu_st_helpers[s_bits]);
> -    tcg_out_arg_stacktidy(s, argreg);
>  
>      reloc_pc24(label_ptr, (tcg_target_long)s->code_ptr);
>  #else /* !CONFIG_SOFTMMU */
> @@ -1560,6 +1519,7 @@ static inline void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
>  }
>  
>  static uint32_t tb_pop_ret;
> +static uint32_t tb_frame_size;
>  
>  static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>                  const TCGArg *args, const int *const_args)
> @@ -1570,14 +1530,16 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>      switch (opc) {
>      case INDEX_op_exit_tb:
>          a0 = args[0];
> +        tcg_out_dat_rI(s, COND_AL, ARITH_ADD, TCG_REG_CALL_STACK,
> +                       TCG_REG_CALL_STACK, tb_frame_size, 1);
>          if (use_armv7_instructions || check_fit_imm(a0)) {
> -            tcg_out_movi32(s, COND_AL, TCG_REG_R0, args[0]);
> +            tcg_out_movi32(s, COND_AL, TCG_REG_R0, a0);
>              tcg_out32(s, tb_pop_ret);
>          } else {
>              /* pc is always current address + 8, so 0 reads the word.  */
>              tcg_out_ld32_12(s, COND_AL, TCG_REG_R0, TCG_REG_PC, 0);
>              tcg_out32(s, tb_pop_ret);
> -            tcg_out32(s, args[0]);
> +            tcg_out32(s, a0);
>          }
>          break;
>      case INDEX_op_goto_tb:
> @@ -2002,8 +1964,6 @@ static void tcg_target_init(TCGContext *s)
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_PC);
>  
>      tcg_add_target_add_op_defs(arm_op_defs);
> -    tcg_set_frame(s, TCG_AREG0, offsetof(CPUArchState, temp_buf),
> -                  CPU_TEMP_BUF_NLONGS * sizeof(long));
>  }
>  
>  static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
> @@ -2032,17 +1992,23 @@ static inline void tcg_out_movi(TCGContext *s, TCGType type,
>  
>  static void tcg_target_qemu_prologue(TCGContext *s)
>  {
> -    /* Calling convention requires us to save r4-r11 and lr;
> -     * save also r12 to maintain stack 8-alignment.
> -     */
> -
> +    /* Calling convention requires us to save r4-r11 and lr; save also r12
> +       to maintain stack 8-alignment.  */
>      /* stmdb sp!, { r4 - r12, lr } */
>      tcg_out32(s, (COND_AL << 28) | 0x092d5ff0);
>  
> +    /* ldmia sp!, { r4 - r12, pc } */
> +    tb_pop_ret = (COND_AL << 28) | 0x08bd9ff0;
> +
> +    /* Allocate the local stack frame.  */
> +    tb_frame_size = TCG_STATIC_CALL_ARGS_SIZE;
> +    tb_frame_size += CPU_TEMP_BUF_NLONGS * sizeof(long);
> +    tcg_out_dat_rI(s, COND_AL, ARITH_SUB, TCG_REG_CALL_STACK,
> +                   TCG_REG_CALL_STACK, tb_frame_size, 1);
> +    tcg_set_frame(s, TCG_REG_CALL_STACK, TCG_STATIC_CALL_ARGS_SIZE,
> +                  CPU_TEMP_BUF_NLONGS * sizeof(long));
> +
>      tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
>  
>      tcg_out_bx(s, COND_AL, tcg_target_call_iarg_regs[1]);
> -
> -    /* ldmia sp!, { r4 - r12, pc } */
> -    tb_pop_ret = (COND_AL << 28) | 0x08bd9ff0;
>  }
> -- 
> 1.8.1.4
> 
> 
> 
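To make the fix concrete: with the call-args area reserved once in the
prologue, every stacked argument has a fixed home below sp, so the old
per-call "str ... [sp, #-0x10]!" push and tcg_out_arg_stacktidy pop are
no longer needed.  A minimal standalone model of the layout (not QEMU
code; the constant is an assumption standing in for
TCG_STATIC_CALL_ARGS_SIZE):

#include <assert.h>
#include <stdio.h>

#define STATIC_CALL_ARGS_SIZE 16   /* assumed: 4 slots for argregs 4..7 */

/* Model: argregs 0..3 are r0..r3; 4+ are 4-byte slots at [sp, #ofs]. */
static int arg_slot_offset(int argreg)
{
    assert(argreg >= 4);
    int ofs = (argreg - 4) * 4;
    assert(ofs + 4 <= STATIC_CALL_ARGS_SIZE);  /* the bound the macro asserts */
    return ofs;
}

/* 64-bit values: bump an odd argreg so lo:hi land in an even/odd
   register pair, or in an 8-aligned pair of stack slots. */
static int align_argreg_for_i64(int argreg)
{
    return (argreg + 1) & ~1;
}

int main(void)
{
    for (int argreg = 4; argreg < 8; argreg++) {
        printf("argreg %d -> [sp, #%d]\n", argreg, arg_slot_offset(argreg));
    }
    printf("i64 at argreg 1 starts at argreg %d\n", align_argreg_for_i64(1));
    return 0;
}

The TCG temporaries then sit just above that area, which is what the
tcg_set_frame call in the prologue hunk describes.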

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/20] tcg-arm: Use bic to implement and with constant
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 01/20] tcg-arm: Use bic to implement and with constant Richard Henderson
@ 2013-03-29 16:53   ` Aurelien Jarno
  0 siblings, 0 replies; 41+ messages in thread
From: Aurelien Jarno @ 2013-03-29 16:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, qemu-devel

On Thu, Mar 28, 2013 at 08:32:42AM -0700, Richard Henderson wrote:
> This greatly improves the code we can produce for deposit
> without armv7 support.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/arm/tcg-target.c | 52 ++++++++++++++++++++++++++++++++++++++++++----------
>  tcg/arm/tcg-target.h |  2 --
>  2 files changed, 42 insertions(+), 12 deletions(-)
> 
> diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
> index 94c6ca4..9972792 100644
> --- a/tcg/arm/tcg-target.c
> +++ b/tcg/arm/tcg-target.c
> @@ -145,6 +145,9 @@ static void patch_reloc(uint8_t *code_ptr, int type,
>      }
>  }
>  
> +#define TCG_CT_CONST_ARM 0x100
> +#define TCG_CT_CONST_INV 0x200
> +
>  /* parse target specific constraints */
>  static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
>  {
> @@ -155,6 +158,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
>      case 'I':
>           ct->ct |= TCG_CT_CONST_ARM;
>           break;
> +    case 'K':
> +         ct->ct |= TCG_CT_CONST_INV;
> +         break;
>  
>      case 'r':
>          ct->ct |= TCG_CT_REG;
> @@ -275,16 +281,19 @@ static inline int check_fit_imm(uint32_t imm)
>   * add, sub, eor...: ditto
>   */
>  static inline int tcg_target_const_match(tcg_target_long val,
> -                const TCGArgConstraint *arg_ct)
> +                                         const TCGArgConstraint *arg_ct)
>  {
>      int ct;
>      ct = arg_ct->ct;
> -    if (ct & TCG_CT_CONST)
> +    if (ct & TCG_CT_CONST) {
>          return 1;
> -    else if ((ct & TCG_CT_CONST_ARM) && check_fit_imm(val))
> +    } else if ((ct & TCG_CT_CONST_ARM) && check_fit_imm(val)) {
>          return 1;
> -    else
> +    } else if ((ct & TCG_CT_CONST_INV) && check_fit_imm(~val)) {
> +        return 1;
> +    } else {
>          return 0;
> +    }
>  }
>  
>  enum arm_data_opc_e {
> @@ -482,6 +491,27 @@ static inline void tcg_out_dat_rI(TCGContext *s, int cond, int opc, TCGArg dst,
>      }
>  }
>  
> +static void tcg_out_dat_rIK(TCGContext *s, int cond, int opc, int opinv,
> +                            TCGReg dst, TCGReg lhs, TCGArg rhs,
> +                            bool rhs_is_const)
> +{
> +    /* Emit either the reg,imm or reg,reg form of a data-processing insn.
> +     * rhs must satisfy the "rIK" constraint.
> +     */
> +    if (rhs_is_const) {
> +        int rot = encode_imm(rhs);
> +        if (rot < 0) {
> +            rhs = ~rhs;
> +            rot = encode_imm(rhs);
> +            assert(rot >= 0);
> +            opc = opinv;
> +        }
> +        tcg_out_dat_imm(s, cond, opc, dst, lhs, rotl(rhs, rot) | (rot << 7));
> +    } else {
> +        tcg_out_dat_reg(s, cond, opc, dst, lhs, rhs, SHIFT_IMM_LSL(0));
> +    }
> +}
> +
>  static inline void tcg_out_mul32(TCGContext *s,
>                  int cond, int rd, int rs, int rm)
>  {
> @@ -1610,11 +1640,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          c = ARITH_SUB;
>          goto gen_arith;
>      case INDEX_op_and_i32:
> -        c = ARITH_AND;
> -        goto gen_arith;
> +        tcg_out_dat_rIK(s, COND_AL, ARITH_AND, ARITH_BIC,
> +                        args[0], args[1], args[2], const_args[2]);
> +        break;
>      case INDEX_op_andc_i32:
> -        c = ARITH_BIC;
> -        goto gen_arith;
> +        tcg_out_dat_rIK(s, COND_AL, ARITH_BIC, ARITH_AND,
> +                        args[0], args[1], args[2], const_args[2]);
> +        break;
>      case INDEX_op_or_i32:
>          c = ARITH_ORR;
>          goto gen_arith;
> @@ -1802,8 +1834,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
>      { INDEX_op_mul_i32, { "r", "r", "r" } },
>      { INDEX_op_mulu2_i32, { "r", "r", "r", "r" } },
>      { INDEX_op_muls2_i32, { "r", "r", "r", "r" } },
> -    { INDEX_op_and_i32, { "r", "r", "rI" } },
> -    { INDEX_op_andc_i32, { "r", "r", "rI" } },
> +    { INDEX_op_and_i32, { "r", "r", "rIK" } },
> +    { INDEX_op_andc_i32, { "r", "r", "rIK" } },
>      { INDEX_op_or_i32, { "r", "r", "rI" } },
>      { INDEX_op_xor_i32, { "r", "r", "rI" } },
>      { INDEX_op_neg_i32, { "r", "r" } },
> diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
> index b6eed1f..354dd8a 100644
> --- a/tcg/arm/tcg-target.h
> +++ b/tcg/arm/tcg-target.h
> @@ -49,8 +49,6 @@ typedef enum {
>  
>  #define TCG_TARGET_NB_REGS 16
>  
> -#define TCG_CT_CONST_ARM 0x100
> -
>  /* used for function call generation */
>  #define TCG_REG_CALL_STACK		TCG_REG_R13
>  #define TCG_TARGET_STACK_ALIGN		8
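
The rIK constraint works because an ARM data-processing immediate is an
8-bit value rotated right by an even amount, and AND/BIC compute
complementary masks.  A standalone sketch of the idea (a model, not the
actual encode_imm/tcg_out_dat_rIK code):

#include <stdbool.h>
#include <stdint.h>

/* Model of ARM's modified immediate: imm8 rotated right by 2n. */
static bool fits_arm_imm(uint32_t v)
{
    for (int rot = 0; rot < 32; rot += 2) {
        /* v is encodable iff rotating it left by rot fits in 8 bits. */
        uint32_t r = (v << rot) | (v >> ((32 - rot) & 31));
        if (r <= 0xff) {
            return true;
        }
    }
    return false;
}

/* If v does not encode, try ~v with the inverted opcode:
   "and rd, rn, #v" becomes "bic rd, rn, #~v". */
static bool and_or_bic(uint32_t v, bool *use_bic)
{
    if (fits_arm_imm(v)) {
        *use_bic = false;
        return true;
    }
    if (fits_arm_imm(~v)) {
        *use_bic = true;
        return true;
    }
    return false;   /* neither form fits: the value goes in a register */
}

This is exactly what deposit wants: a mask such as 0xff00ffff never
encodes, but its complement 0x00ff0000 does, so a single bic replaces a
multi-insn movi32 + and sequence.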

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>


-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 02/20] tcg-arm: Handle negated constant arguments to and/sub
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 02/20] tcg-arm: Handle negated constant arguments to and/sub Richard Henderson
@ 2013-03-29 16:53   ` Aurelien Jarno
  0 siblings, 0 replies; 41+ messages in thread
From: Aurelien Jarno @ 2013-03-29 16:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, qemu-devel

On Thu, Mar 28, 2013 at 08:32:43AM -0700, Richard Henderson wrote:
> This greatly improves code generation for addition of small
> negative constants.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/arm/tcg-target.c | 41 +++++++++++++++++++++++++++++++++++------
>  1 file changed, 35 insertions(+), 6 deletions(-)
> 
> diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
> index 9972792..f470caa 100644
> --- a/tcg/arm/tcg-target.c
> +++ b/tcg/arm/tcg-target.c
> @@ -147,6 +147,7 @@ static void patch_reloc(uint8_t *code_ptr, int type,
>  
>  #define TCG_CT_CONST_ARM 0x100
>  #define TCG_CT_CONST_INV 0x200
> +#define TCG_CT_CONST_NEG 0x400
>  
>  /* parse target specific constraints */
>  static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
> @@ -161,6 +162,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
>      case 'K':
>           ct->ct |= TCG_CT_CONST_INV;
>           break;
> +    case 'N': /* The gcc constraint letter is L, already used here.  */
> +         ct->ct |= TCG_CT_CONST_NEG;
> +         break;
>  
>      case 'r':
>          ct->ct |= TCG_CT_REG;
> @@ -291,6 +295,8 @@ static inline int tcg_target_const_match(tcg_target_long val,
>          return 1;
>      } else if ((ct & TCG_CT_CONST_INV) && check_fit_imm(~val)) {
>          return 1;
> +    } else if ((ct & TCG_CT_CONST_NEG) && check_fit_imm(-val)) {
> +        return 1;
>      } else {
>          return 0;
>      }
> @@ -512,6 +518,27 @@ static void tcg_out_dat_rIK(TCGContext *s, int cond, int opc, int opinv,
>      }
>  }
>  
> +static void tcg_out_dat_rIN(TCGContext *s, int cond, int opc, int opneg,
> +                            TCGArg dst, TCGArg lhs, TCGArg rhs,
> +                            bool rhs_is_const)
> +{
> +    /* Emit either the reg,imm or reg,reg form of a data-processing insn.
> +     * rhs must satisfy the "rIN" constraint.
> +     */
> +    if (rhs_is_const) {
> +        int rot = encode_imm(rhs);
> +        if (rot < 0) {
> +            rhs = -rhs;
> +            rot = encode_imm(rhs);
> +            assert(rot >= 0);
> +            opc = opneg;
> +        }
> +        tcg_out_dat_imm(s, cond, opc, dst, lhs, rotl(rhs, rot) | (rot << 7));
> +    } else {
> +        tcg_out_dat_reg(s, cond, opc, dst, lhs, rhs, SHIFT_IMM_LSL(0));
> +    }
> +}
> +
>  static inline void tcg_out_mul32(TCGContext *s,
>                  int cond, int rd, int rs, int rm)
>  {
> @@ -1634,11 +1661,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>                         ARITH_MOV, args[0], 0, args[3], const_args[3]);
>          break;
>      case INDEX_op_add_i32:
> -        c = ARITH_ADD;
> -        goto gen_arith;
> +        tcg_out_dat_rIN(s, COND_AL, ARITH_ADD, ARITH_SUB,
> +                        args[0], args[1], args[2], const_args[2]);
> +        break;
>      case INDEX_op_sub_i32:
> -        c = ARITH_SUB;
> -        goto gen_arith;
> +        tcg_out_dat_rIN(s, COND_AL, ARITH_SUB, ARITH_ADD,
> +                        args[0], args[1], args[2], const_args[2]);
> +        break;
>      case INDEX_op_and_i32:
>          tcg_out_dat_rIK(s, COND_AL, ARITH_AND, ARITH_BIC,
>                          args[0], args[1], args[2], const_args[2]);
> @@ -1829,8 +1858,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
>      { INDEX_op_st_i32, { "r", "r" } },
>  
>      /* TODO: "r", "r", "ri" */
> -    { INDEX_op_add_i32, { "r", "r", "rI" } },
> -    { INDEX_op_sub_i32, { "r", "r", "rI" } },
> +    { INDEX_op_add_i32, { "r", "r", "rIN" } },
> +    { INDEX_op_sub_i32, { "r", "r", "rIN" } },
>      { INDEX_op_mul_i32, { "r", "r", "r" } },
>      { INDEX_op_mulu2_i32, { "r", "r", "r", "r" } },
>      { INDEX_op_muls2_i32, { "r", "r", "r", "r" } },
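
The rIN path is the additive twin of the rIK trick from patch 1: when
the constant itself does not encode, negate it and swap ADD and SUB.  A
sketch reusing fits_arm_imm from the previous note (names illustrative):

/* E.g. "add rd, rn, #-4": 0xfffffffc never encodes, but 4 does,
   so "sub rd, rn, #4" is emitted instead. */
static bool add_or_sub(uint32_t v, bool *use_sub)
{
    if (fits_arm_imm(v)) {
        *use_sub = false;
        return true;
    }
    if (fits_arm_imm(-v)) {   /* unsigned negation wraps, as intended */
        *use_sub = true;
        return true;
    }
    return false;
}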

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 03/20] tcg-arm: Allow constant first argument to sub
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 03/20] tcg-arm: Allow constant first argument to sub Richard Henderson
@ 2013-03-29 16:58   ` Aurelien Jarno
  0 siblings, 0 replies; 41+ messages in thread
From: Aurelien Jarno @ 2013-03-29 16:58 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, qemu-devel

On Thu, Mar 28, 2013 at 08:32:44AM -0700, Richard Henderson wrote:
> This allows the generation of RSB instructions.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/arm/tcg-target.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
> index f470caa..7475142 100644
> --- a/tcg/arm/tcg-target.c
> +++ b/tcg/arm/tcg-target.c
> @@ -1665,8 +1665,17 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>                          args[0], args[1], args[2], const_args[2]);
>          break;
>      case INDEX_op_sub_i32:
> -        tcg_out_dat_rIN(s, COND_AL, ARITH_SUB, ARITH_ADD,
> -                        args[0], args[1], args[2], const_args[2]);
> +        if (const_args[1]) {
> +            if (const_args[2]) {
> +                tcg_out_movi32(s, COND_AL, args[0], args[1] - args[2]);
> +            } else {
> +                tcg_out_dat_rI(s, COND_AL, ARITH_RSB,
> +                               args[0], args[2], args[1], 1);
> +            }
> +        } else {
> +            tcg_out_dat_rIN(s, COND_AL, ARITH_SUB, ARITH_ADD,
> +                            args[0], args[1], args[2], const_args[2]);
> +        }
>          break;
>      case INDEX_op_and_i32:
>          tcg_out_dat_rIK(s, COND_AL, ARITH_AND, ARITH_BIC,
> @@ -1859,7 +1868,7 @@ static const TCGTargetOpDef arm_op_defs[] = {
>  
>      /* TODO: "r", "r", "ri" */
>      { INDEX_op_add_i32, { "r", "r", "rIN" } },
> -    { INDEX_op_sub_i32, { "r", "r", "rIN" } },
> +    { INDEX_op_sub_i32, { "r", "rI", "rIN" } },
>      { INDEX_op_mul_i32, { "r", "r", "r" } },
>      { INDEX_op_mulu2_i32, { "r", "r", "r", "r" } },
>      { INDEX_op_muls2_i32, { "r", "r", "r", "r" } },
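
RSB (reverse subtract) computes rd = imm - rn, which is exactly sub
with a constant first operand.  A small sketch of the three cases the
hunk distinguishes (illustrative only):

typedef enum { EMIT_MOVI, EMIT_RSB, EMIT_SUB } SubForm;

/* c1/c2: whether args[1]/args[2] are constant. */
static SubForm pick_sub_form(bool c1, bool c2)
{
    if (c1 && c2) {
        return EMIT_MOVI;   /* fold: movi32 rd, #(a1 - a2) */
    }
    if (c1) {
        return EMIT_RSB;    /* rsb rd, a2, #a1  ->  rd = a1 - a2 */
    }
    return EMIT_SUB;        /* sub/add via the rIN path */
}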

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH v3 04/20] tcg-arm: Use tcg_out_dat_rIN for compares
  2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 04/20] tcg-arm: Use tcg_out_dat_rIN for compares Richard Henderson
@ 2013-03-29 16:58   ` Aurelien Jarno
  0 siblings, 0 replies; 41+ messages in thread
From: Aurelien Jarno @ 2013-03-29 16:58 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, qemu-devel

On Thu, Mar 28, 2013 at 08:32:45AM -0700, Richard Henderson wrote:
> This allows us to emit CMN instructions.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/arm/tcg-target.c | 40 ++++++++++++++++++++--------------------
>  1 file changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
> index 7475142..3152798 100644
> --- a/tcg/arm/tcg-target.c
> +++ b/tcg/arm/tcg-target.c
> @@ -1655,10 +1655,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          /* Constraints mean that v2 is always in the same register as dest,
>           * so we only need to do "if condition passed, move v1 to dest".
>           */
> -        tcg_out_dat_rI(s, COND_AL, ARITH_CMP, 0,
> -                       args[1], args[2], const_args[2]);
> -        tcg_out_dat_rI(s, tcg_cond_to_arm_cond[args[5]],
> -                       ARITH_MOV, args[0], 0, args[3], const_args[3]);
> +        tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
> +                        args[1], args[2], const_args[2]);
> +        tcg_out_dat_rIK(s, tcg_cond_to_arm_cond[args[5]], ARITH_MOV,
> +                        ARITH_MVN, args[0], 0, args[3], const_args[3]);
>          break;
>      case INDEX_op_add_i32:
>          tcg_out_dat_rIN(s, COND_AL, ARITH_ADD, ARITH_SUB,
> @@ -1755,7 +1755,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>  
>      case INDEX_op_brcond_i32:
> -        tcg_out_dat_rI(s, COND_AL, ARITH_CMP, 0,
> +        tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
>                         args[0], args[1], const_args[1]);
>          tcg_out_goto_label(s, tcg_cond_to_arm_cond[args[2]], args[3]);
>          break;
> @@ -1768,15 +1768,15 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>           * TCG_COND_LE(U) --> (a0 <= a2 && a1 == a3) || (a1 <= a3 && a1 != a3),
>           * TCG_COND_GT(U) --> (a0 >  a2 && a1 == a3) ||  a1 >  a3,
>           */
> -        tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0,
> -                        args[1], args[3], SHIFT_IMM_LSL(0));
> -        tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0,
> -                        args[0], args[2], SHIFT_IMM_LSL(0));
> +        tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
> +                        args[1], args[3], const_args[3]);
> +        tcg_out_dat_rIN(s, COND_EQ, ARITH_CMP, ARITH_CMN, 0,
> +                        args[0], args[2], const_args[2]);
>          tcg_out_goto_label(s, tcg_cond_to_arm_cond[args[4]], args[5]);
>          break;
>      case INDEX_op_setcond_i32:
> -        tcg_out_dat_rI(s, COND_AL, ARITH_CMP, 0,
> -                       args[1], args[2], const_args[2]);
> +        tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
> +                        args[1], args[2], const_args[2]);
>          tcg_out_dat_imm(s, tcg_cond_to_arm_cond[args[3]],
>                          ARITH_MOV, args[0], 0, 1);
>          tcg_out_dat_imm(s, tcg_cond_to_arm_cond[tcg_invert_cond(args[3])],
> @@ -1784,10 +1784,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>      case INDEX_op_setcond2_i32:
>          /* See brcond2_i32 comment */
> -        tcg_out_dat_reg(s, COND_AL, ARITH_CMP, 0,
> -                        args[2], args[4], SHIFT_IMM_LSL(0));
> -        tcg_out_dat_reg(s, COND_EQ, ARITH_CMP, 0,
> -                        args[1], args[3], SHIFT_IMM_LSL(0));
> +        tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
> +                        args[2], args[4], const_args[4]);
> +        tcg_out_dat_rIN(s, COND_EQ, ARITH_CMP, ARITH_CMN, 0,
> +                        args[1], args[3], const_args[3]);
>          tcg_out_dat_imm(s, tcg_cond_to_arm_cond[args[5]],
>                          ARITH_MOV, args[0], 0, 1);
>          tcg_out_dat_imm(s, tcg_cond_to_arm_cond[tcg_invert_cond(args[5])],
> @@ -1885,15 +1885,15 @@ static const TCGTargetOpDef arm_op_defs[] = {
>      { INDEX_op_rotl_i32, { "r", "r", "ri" } },
>      { INDEX_op_rotr_i32, { "r", "r", "ri" } },
>  
> -    { INDEX_op_brcond_i32, { "r", "rI" } },
> -    { INDEX_op_setcond_i32, { "r", "r", "rI" } },
> -    { INDEX_op_movcond_i32, { "r", "r", "rI", "rI", "0" } },
> +    { INDEX_op_brcond_i32, { "r", "rIN" } },
> +    { INDEX_op_setcond_i32, { "r", "r", "rIN" } },
> +    { INDEX_op_movcond_i32, { "r", "r", "rIN", "rIK", "0" } },
>  
>      /* TODO: "r", "r", "r", "r", "ri", "ri" */
>      { INDEX_op_add2_i32, { "r", "r", "r", "r", "r", "r" } },
>      { INDEX_op_sub2_i32, { "r", "r", "r", "r", "r", "r" } },
> -    { INDEX_op_brcond2_i32, { "r", "r", "r", "r" } },
> -    { INDEX_op_setcond2_i32, { "r", "r", "r", "r", "r" } },
> +    { INDEX_op_brcond2_i32, { "r", "r", "rIN", "rIN" } },
> +    { INDEX_op_setcond2_i32, { "r", "r", "r", "rIN", "rIN" } },
>  
>  #if TARGET_LONG_BITS == 32
>      { INDEX_op_qemu_ld8u, { "r", "l" } },
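
CMN rn, #v sets the flags from rn + v, exactly as CMP rn, #-v would, so
a compare against a small negative constant can negate it into an
encodable immediate.  One more sketch on the fits_arm_imm model:

/* E.g. brcond_i32 rn, #-1: 0xffffffff never encodes, but 1 does,
   so "cmn rn, #1" replaces "cmp rn, #-1". */
static bool cmp_or_cmn(uint32_t v, bool *use_cmn)
{
    if (fits_arm_imm(v)) {
        *use_cmn = false;
        return true;
    }
    if (fits_arm_imm(-v)) {
        *use_cmn = true;
        return true;
    }
    return false;
}

/* Reference model of the two-insn double-word compare in the brcond2
   hunk: flags come from the high words, and the low words are compared
   (COND_EQ) only when the high words are equal. */
static bool ltu64(uint32_t lo0, uint32_t hi0, uint32_t lo1, uint32_t hi1)
{
    return hi0 != hi1 ? hi0 < hi1 : lo0 < lo1;
}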

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2013-03-29 16:58 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-28 15:32 [Qemu-devel] [PATCH v3 00/20] tcg-arm improvements Richard Henderson
2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 01/20] tcg-arm: Use bic to implement and with constant Richard Henderson
2013-03-29 16:53   ` Aurelien Jarno
2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 02/20] tcg-arm: Handle negated constant arguments to and/sub Richard Henderson
2013-03-29 16:53   ` Aurelien Jarno
2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 03/20] tcg-arm: Allow constant first argument to sub Richard Henderson
2013-03-29 16:58   ` Aurelien Jarno
2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 04/20] tcg-arm: Use tcg_out_dat_rIN for compares Richard Henderson
2013-03-29 16:58   ` Aurelien Jarno
2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 05/20] tcg-arm: Handle constant arguments to add2/sub2 Richard Henderson
2013-03-28 15:56   ` Peter Maydell
2013-03-28 16:04     ` Richard Henderson
2013-03-28 16:09       ` Laurent Desnogues
2013-03-28 16:16         ` Richard Henderson
2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 06/20] tcg-arm: Improve constant generation Richard Henderson
2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 07/20] tcg-arm: Fold epilogue into INDEX_op_exit_tb Richard Henderson
2013-03-28 16:05   ` Peter Maydell
2013-03-28 16:12     ` Richard Henderson
2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 08/20] tcg-arm: Implement deposit for armv7 Richard Henderson
2013-03-28 16:15   ` Peter Maydell
2013-03-28 16:22     ` Richard Henderson
2013-03-28 16:59       ` Peter Maydell
2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 09/20] tcg-arm: Implement division instructions Richard Henderson
2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 10/20] tcg-arm: Use TCG_REG_TMP name for the tcg temporary Richard Henderson
2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 11/20] tcg-arm: Use R12 " Richard Henderson
2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 12/20] tcg-arm: Cleanup multiply subroutines Richard Henderson
2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 13/20] tcg-arm: Cleanup tcg_out_goto_label Richard Henderson
2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 14/20] tcg-arm: Cleanup goto_tb handling Richard Henderson
2013-03-28 20:09   ` Aurelien Jarno
2013-03-28 20:48     ` Richard Henderson
2013-03-29  6:50       ` Aurelien Jarno
2013-03-29 15:06         ` Richard Henderson
2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 15/20] tcg-arm: Cleanup most primitive load store subroutines Richard Henderson
2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 16/20] tcg-arm: Fix local stack frame Richard Henderson
2013-03-29 16:50   ` Aurelien Jarno
2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 17/20] tcg-arm: Split out tcg_out_tlb_read Richard Henderson
2013-03-28 15:32 ` [Qemu-devel] [PATCH v3 18/20] tcg-arm: Improve scheduling of tcg_out_tlb_read Richard Henderson
2013-03-28 15:33 ` [Qemu-devel] [PATCH v3 19/20] tcg-arm: Use movi32 + blx for calls on v7 Richard Henderson
2013-03-28 15:33 ` [Qemu-devel] [PATCH v3 20/20] tcg-arm: Convert to CONFIG_QEMU_LDST_OPTIMIZATION Richard Henderson
2013-03-28 16:44   ` Peter Maydell
2013-03-28 17:46     ` Richard Henderson
