* [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer
@ 2015-07-24 16:30 Aurelien Jarno
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 01/10] tcg/optimize: optimize temps tracking Aurelien Jarno
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson

This patchset improves the optimizer in 3 different ways:
 - by optimizing temp tracking using a bit array
 - by allowing constants to have copies
 - by differentiating 32 <-> 64 bit conversions from moves in the
   frontend by using specific instructions

The latter change introduces 2 new mandatory ext/extu_i32_i64 ops.
In the future we might want to allow each backend to implement only 2 of
the 3 size-changing ops, as I started to do in the following patchset:
http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg03369.html
That said, we probably want to experiment with that first and see the
impact on performance. If wrongly implemented, things might seem to
work, with bugs only appearing in very rare cases, like the truncation
problem we recently got on aarch64 and x86-64 hosts.
Alternatively we might want to always implement the 3 ops on all
backends, to avoid multiplying code variants and, with them, bugs.
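
For reference, the intended semantics of the 3 size-changing ops can be
sketched in plain C (an illustrative sketch only, not QEMU code; the
function names just mirror the TCG op names):

  #include <stdint.h>

  /* ext_i32_i64: sign-extend a 32-bit value to 64 bits */
  static inline int64_t ext_i32_i64(int32_t x)
  {
      return (int64_t)x;
  }

  /* extu_i32_i64: zero-extend a 32-bit value to 64 bits */
  static inline uint64_t extu_i32_i64(uint32_t x)
  {
      return (uint64_t)x;
  }

  /* trunc_shr_i64_i32: shift right by pos, then truncate to 32 bits */
  static inline uint32_t trunc_shr_i64_i32(uint64_t x, unsigned pos)
  {
      return (uint32_t)(x >> pos);
  }

The truncation bug class mentioned above appears when one of these
conversions is done implicitly (e.g. via a plain mov) while the high
half of the value is stale.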

This patchset has been fully tested on an x86-64 host, but only lightly
tested on aarch64, ppc64 and s390x hosts.

Aurelien Jarno (10):
  tcg/optimize: optimize temps tracking
  tcg/optimize: add temp_is_const and temp_is_copy functions
  tcg/optimize: track const/copy status separately
  tcg/optimize: allow constant to have copies
  tcg: rename trunc_shr_i32 into trunc_shr_i64_i32
  tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32
  tcg: implement real ext_i32_i64 and extu_i32_i64 ops
  tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 ops
  tcg/optimize: do not remember garbage high bits for 32-bit ops
  tcg: update README about size changing ops

 tcg/README               |  20 +++-
 tcg/aarch64/tcg-target.c |   4 +
 tcg/aarch64/tcg-target.h |   2 +-
 tcg/i386/tcg-target.c    |   5 +
 tcg/i386/tcg-target.h    |   2 +-
 tcg/ia64/tcg-target.c    |   4 +
 tcg/ia64/tcg-target.h    |   2 +-
 tcg/optimize.c           | 246 ++++++++++++++++++++++-------------------------
 tcg/ppc/tcg-target.c     |   6 ++
 tcg/ppc/tcg-target.h     |   2 +-
 tcg/s390/tcg-target.c    |   5 +
 tcg/s390/tcg-target.h    |   2 +-
 tcg/sparc/tcg-target.c   |  12 ++-
 tcg/sparc/tcg-target.h   |   2 +-
 tcg/tcg-op.c             |  16 ++-
 tcg/tcg-opc.h            |   7 +-
 tcg/tcg.h                |   2 +-
 tcg/tci/tcg-target.c     |   4 +
 tcg/tci/tcg-target.h     |   2 +-
 tci.c                    |   6 +-
 20 files changed, 193 insertions(+), 158 deletions(-)

-- 
2.1.4


* [Qemu-devel] [PATCH for-2.5 01/10] tcg/optimize: optimize temps tracking
  2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
  2015-07-27  8:21   ` Paolo Bonzini
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 02/10] tcg/optimize: add temp_is_const and temp_is_copy functions Aurelien Jarno
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson

The tcg_temp_info structure uses 24 bytes per temp. Now that we emulate
vector registers on most guests, it's not uncommon to have more than 100
used temps. This means we have to initialize more than 2kB at least twice
per TB, often more when there are a few goto_tb ops.

Instead, use a TCGTempSet bit array to track which temps are in use in
the current basic block. This means there are only around 16 bytes to
initialize.

This improves the boot time of a MIPS guest on an x86-64 host by around
7% and moves tcg_optimize out of the top of the profiler list.
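
The underlying pattern, sketched in generic C (an illustrative sketch
with made-up names, not the QEMU code in the diff below): keep a bitmap
of live entries and initialize each entry lazily on first use, so a
reset only has to clear the bitmap.

  #include <limits.h>
  #include <string.h>

  #define MAX_TEMPS     512
  #define BITS_PER_LONG (sizeof(long) * CHAR_BIT)
  #define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

  struct info { long mask; int state; };  /* per-temp data (~24B in QEMU) */
  static struct info infos[MAX_TEMPS];
  static unsigned long used[BITS_TO_LONGS(MAX_TEMPS)];

  /* Reset all NB entries: clears only the bitmap, not the entries.  */
  static void reset_all(int nb)
  {
      memset(used, 0, sizeof(long) * BITS_TO_LONGS(nb));
  }

  /* Initialize entry I on its first use in the current block.  */
  static void init_on_first_use(int i)
  {
      if (!(used[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))) {
          infos[i].mask = -1;
          infos[i].state = 0;
          used[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
      }
  }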

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index cd0e793..20e24b3 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -50,6 +50,7 @@ struct tcg_temp_info {
 };
 
 static struct tcg_temp_info temps[TCG_MAX_TEMPS];
+static TCGTempSet temps_used;
 
 /* Reset TEMP's state to TCG_TEMP_UNDEF.  If TEMP only had one copy, remove
    the copy flag from the left temp.  */
@@ -67,6 +68,22 @@ static void reset_temp(TCGArg temp)
     temps[temp].mask = -1;
 }
 
+/* Reset all temporaries, given that there are NB_TEMPS of them.  */
+static void reset_all_temps(int nb_temps)
+{
+    memset(&temps_used.l, 0, sizeof(long) * BITS_TO_LONGS(nb_temps));
+}
+
+/* Initialize and activate a temporary.  */
+static void init_temp_info(TCGArg temp)
+{
+    if (!test_bit(temp, temps_used.l)) {
+        temps[temp].state = TCG_TEMP_UNDEF;
+        temps[temp].mask = -1;
+        set_bit(temp, temps_used.l);
+    }
+}
+
 static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
                                 TCGOpcode opc, int nargs)
 {
@@ -98,16 +115,6 @@ static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
     return new_op;
 }
 
-/* Reset all temporaries, given that there are NB_TEMPS of them.  */
-static void reset_all_temps(int nb_temps)
-{
-    int i;
-    for (i = 0; i < nb_temps; i++) {
-        temps[i].state = TCG_TEMP_UNDEF;
-        temps[i].mask = -1;
-    }
-}
-
 static int op_bits(TCGOpcode op)
 {
     const TCGOpDef *def = &tcg_op_defs[op];
@@ -606,6 +613,11 @@ void tcg_optimize(TCGContext *s)
             nb_iargs = def->nb_iargs;
         }
 
+        /* Initialize the temps that are going to be used */
+        for (i = 0; i < nb_oargs + nb_iargs; i++) {
+            init_temp_info(args[i]);
+        }
+
         /* Do copy propagation */
         for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
             if (temps[args[i]].state == TCG_TEMP_COPY) {
-- 
2.1.4


* [Qemu-devel] [PATCH for-2.5 02/10] tcg/optimize: add temp_is_const and temp_is_copy functions
  2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 01/10] tcg/optimize: optimize temps tracking Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
  2015-07-29 16:01   ` Alex Bennée
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately Aurelien Jarno
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson

Add two accessor functions, temp_is_const and temp_is_copy, to make the
code more readable and future changes easier.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 131 ++++++++++++++++++++++++++-------------------------------
 1 file changed, 60 insertions(+), 71 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 20e24b3..d2b63a4 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -52,11 +52,21 @@ struct tcg_temp_info {
 static struct tcg_temp_info temps[TCG_MAX_TEMPS];
 static TCGTempSet temps_used;
 
+static inline bool temp_is_const(TCGArg arg)
+{
+    return temps[arg].state == TCG_TEMP_CONST;
+}
+
+static inline bool temp_is_copy(TCGArg arg)
+{
+    return temps[arg].state == TCG_TEMP_COPY;
+}
+
 /* Reset TEMP's state to TCG_TEMP_UNDEF.  If TEMP only had one copy, remove
    the copy flag from the left temp.  */
 static void reset_temp(TCGArg temp)
 {
-    if (temps[temp].state == TCG_TEMP_COPY) {
+    if (temp_is_copy(temp)) {
         if (temps[temp].prev_copy == temps[temp].next_copy) {
             temps[temps[temp].next_copy].state = TCG_TEMP_UNDEF;
         } else {
@@ -186,8 +196,7 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
         return true;
     }
 
-    if (temps[arg1].state != TCG_TEMP_COPY
-        || temps[arg2].state != TCG_TEMP_COPY) {
+    if (!temp_is_copy(arg1) || !temp_is_copy(arg2)) {
         return false;
     }
 
@@ -230,7 +239,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
         return;
     }
 
-    if (temps[src].state == TCG_TEMP_CONST) {
+    if (temp_is_const(src)) {
         tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
         return;
     }
@@ -248,10 +257,10 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
     }
     temps[dst].mask = mask;
 
-    assert(temps[src].state != TCG_TEMP_CONST);
+    assert(!temp_is_const(src));
 
     if (s->temps[src].type == s->temps[dst].type) {
-        if (temps[src].state != TCG_TEMP_COPY) {
+        if (!temp_is_copy(src)) {
             temps[src].state = TCG_TEMP_COPY;
             temps[src].next_copy = src;
             temps[src].prev_copy = src;
@@ -488,7 +497,7 @@ static bool do_constant_folding_cond_eq(TCGCond c)
 static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
                                        TCGArg y, TCGCond c)
 {
-    if (temps[x].state == TCG_TEMP_CONST && temps[y].state == TCG_TEMP_CONST) {
+    if (temp_is_const(x) && temp_is_const(y)) {
         switch (op_bits(op)) {
         case 32:
             return do_constant_folding_cond_32(temps[x].val, temps[y].val, c);
@@ -499,7 +508,7 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
         }
     } else if (temps_are_copies(x, y)) {
         return do_constant_folding_cond_eq(c);
-    } else if (temps[y].state == TCG_TEMP_CONST && temps[y].val == 0) {
+    } else if (temp_is_const(y) && temps[y].val == 0) {
         switch (c) {
         case TCG_COND_LTU:
             return 0;
@@ -520,12 +529,10 @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
     TCGArg al = p1[0], ah = p1[1];
     TCGArg bl = p2[0], bh = p2[1];
 
-    if (temps[bl].state == TCG_TEMP_CONST
-        && temps[bh].state == TCG_TEMP_CONST) {
+    if (temp_is_const(bl) && temp_is_const(bh)) {
         uint64_t b = ((uint64_t)temps[bh].val << 32) | (uint32_t)temps[bl].val;
 
-        if (temps[al].state == TCG_TEMP_CONST
-            && temps[ah].state == TCG_TEMP_CONST) {
+        if (temp_is_const(al) && temp_is_const(ah)) {
             uint64_t a;
             a = ((uint64_t)temps[ah].val << 32) | (uint32_t)temps[al].val;
             return do_constant_folding_cond_64(a, b, c);
@@ -551,8 +558,8 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
 {
     TCGArg a1 = *p1, a2 = *p2;
     int sum = 0;
-    sum += temps[a1].state == TCG_TEMP_CONST;
-    sum -= temps[a2].state == TCG_TEMP_CONST;
+    sum += temp_is_const(a1);
+    sum -= temp_is_const(a2);
 
     /* Prefer the constant in second argument, and then the form
        op a, a, b, which is better handled on non-RISC hosts. */
@@ -567,10 +574,10 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
 static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 {
     int sum = 0;
-    sum += temps[p1[0]].state == TCG_TEMP_CONST;
-    sum += temps[p1[1]].state == TCG_TEMP_CONST;
-    sum -= temps[p2[0]].state == TCG_TEMP_CONST;
-    sum -= temps[p2[1]].state == TCG_TEMP_CONST;
+    sum += temp_is_const(p1[0]);
+    sum += temp_is_const(p1[1]);
+    sum -= temp_is_const(p2[0]);
+    sum -= temp_is_const(p2[1]);
     if (sum > 0) {
         TCGArg t;
         t = p1[0], p1[0] = p2[0], p2[0] = t;
@@ -620,7 +627,7 @@ void tcg_optimize(TCGContext *s)
 
         /* Do copy propagation */
         for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-            if (temps[args[i]].state == TCG_TEMP_COPY) {
+            if (temp_is_copy(args[i])) {
                 args[i] = find_better_copy(s, args[i]);
             }
         }
@@ -690,8 +697,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(sar):
         CASE_OP_32_64(rotl):
         CASE_OP_32_64(rotr):
-            if (temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[1]].val == 0) {
+            if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
                 tcg_opt_gen_movi(s, op, args, args[0], 0);
                 continue;
             }
@@ -701,7 +707,7 @@ void tcg_optimize(TCGContext *s)
                 TCGOpcode neg_op;
                 bool have_neg;
 
-                if (temps[args[2]].state == TCG_TEMP_CONST) {
+                if (temp_is_const(args[2])) {
                     /* Proceed with possible constant folding. */
                     break;
                 }
@@ -715,8 +721,7 @@ void tcg_optimize(TCGContext *s)
                 if (!have_neg) {
                     break;
                 }
-                if (temps[args[1]].state == TCG_TEMP_CONST
-                    && temps[args[1]].val == 0) {
+                if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
                     op->opc = neg_op;
                     reset_temp(args[0]);
                     args[1] = args[2];
@@ -726,34 +731,30 @@ void tcg_optimize(TCGContext *s)
             break;
         CASE_OP_32_64(xor):
         CASE_OP_32_64(nand):
-            if (temps[args[1]].state != TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[2]].val == -1) {
+            if (!temp_is_const(args[1])
+                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
                 i = 1;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(nor):
-            if (temps[args[1]].state != TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[2]].val == 0) {
+            if (!temp_is_const(args[1])
+                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
                 i = 1;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(andc):
-            if (temps[args[2]].state != TCG_TEMP_CONST
-                && temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[1]].val == -1) {
+            if (!temp_is_const(args[2])
+                && temp_is_const(args[1]) && temps[args[1]].val == -1) {
                 i = 2;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(orc):
         CASE_OP_32_64(eqv):
-            if (temps[args[2]].state != TCG_TEMP_CONST
-                && temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[1]].val == 0) {
+            if (!temp_is_const(args[2])
+                && temp_is_const(args[1]) && temps[args[1]].val == 0) {
                 i = 2;
                 goto try_not;
             }
@@ -794,9 +795,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(or):
         CASE_OP_32_64(xor):
         CASE_OP_32_64(andc):
-            if (temps[args[1]].state != TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[2]].val == 0) {
+            if (!temp_is_const(args[1])
+                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
                 tcg_opt_gen_mov(s, op, args, args[0], args[1]);
                 continue;
             }
@@ -804,9 +804,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(and):
         CASE_OP_32_64(orc):
         CASE_OP_32_64(eqv):
-            if (temps[args[1]].state != TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[2]].val == -1) {
+            if (!temp_is_const(args[1])
+                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
                 tcg_opt_gen_mov(s, op, args, args[0], args[1]);
                 continue;
             }
@@ -844,7 +843,7 @@ void tcg_optimize(TCGContext *s)
 
         CASE_OP_32_64(and):
             mask = temps[args[2]].mask;
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
         and_const:
                 affected = temps[args[1]].mask & ~mask;
             }
@@ -854,7 +853,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(andc):
             /* Known-zeros does not imply known-ones.  Therefore unless
                args[2] is constant, we can't infer anything from it.  */
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
                 mask = ~temps[args[2]].mask;
                 goto and_const;
             }
@@ -863,26 +862,26 @@ void tcg_optimize(TCGContext *s)
             break;
 
         case INDEX_op_sar_i32:
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
                 tmp = temps[args[2]].val & 31;
                 mask = (int32_t)temps[args[1]].mask >> tmp;
             }
             break;
         case INDEX_op_sar_i64:
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
                 tmp = temps[args[2]].val & 63;
                 mask = (int64_t)temps[args[1]].mask >> tmp;
             }
             break;
 
         case INDEX_op_shr_i32:
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
                 tmp = temps[args[2]].val & 31;
                 mask = (uint32_t)temps[args[1]].mask >> tmp;
             }
             break;
         case INDEX_op_shr_i64:
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
                 tmp = temps[args[2]].val & 63;
                 mask = (uint64_t)temps[args[1]].mask >> tmp;
             }
@@ -893,7 +892,7 @@ void tcg_optimize(TCGContext *s)
             break;
 
         CASE_OP_32_64(shl):
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
                 tmp = temps[args[2]].val & (TCG_TARGET_REG_BITS - 1);
                 mask = temps[args[1]].mask << tmp;
             }
@@ -974,8 +973,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(mul):
         CASE_OP_32_64(muluh):
         CASE_OP_32_64(mulsh):
-            if ((temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[2]].val == 0)) {
+            if ((temp_is_const(args[2]) && temps[args[2]].val == 0)) {
                 tcg_opt_gen_movi(s, op, args, args[0], 0);
                 continue;
             }
@@ -1030,7 +1028,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(ext16u):
         case INDEX_op_ext32s_i64:
         case INDEX_op_ext32u_i64:
-            if (temps[args[1]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[1])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val, 0);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
                 break;
@@ -1038,7 +1036,7 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         case INDEX_op_trunc_shr_i32:
-            if (temps[args[1]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[1])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val, args[2]);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
                 break;
@@ -1067,8 +1065,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(divu):
         CASE_OP_32_64(rem):
         CASE_OP_32_64(remu):
-            if (temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val,
                                           temps[args[2]].val);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
@@ -1077,8 +1074,7 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         CASE_OP_32_64(deposit):
-            if (temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
                 tmp = deposit64(temps[args[1]].val, args[3], args[4],
                                 temps[args[2]].val);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
@@ -1118,10 +1114,8 @@ void tcg_optimize(TCGContext *s)
 
         case INDEX_op_add2_i32:
         case INDEX_op_sub2_i32:
-            if (temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[3]].state == TCG_TEMP_CONST
-                && temps[args[4]].state == TCG_TEMP_CONST
-                && temps[args[5]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2]) && temp_is_const(args[3])
+                && temp_is_const(args[4]) && temp_is_const(args[5])) {
                 uint32_t al = temps[args[2]].val;
                 uint32_t ah = temps[args[3]].val;
                 uint32_t bl = temps[args[4]].val;
@@ -1150,8 +1144,7 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         case INDEX_op_mulu2_i32:
-            if (temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[3]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2]) && temp_is_const(args[3])) {
                 uint32_t a = temps[args[2]].val;
                 uint32_t b = temps[args[3]].val;
                 uint64_t r = (uint64_t)a * b;
@@ -1183,10 +1176,8 @@ void tcg_optimize(TCGContext *s)
                     tcg_op_remove(s, op);
                 }
             } else if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
-                       && temps[args[2]].state == TCG_TEMP_CONST
-                       && temps[args[3]].state == TCG_TEMP_CONST
-                       && temps[args[2]].val == 0
-                       && temps[args[3]].val == 0) {
+                       && temp_is_const(args[2]) && temps[args[2]].val == 0
+                       && temp_is_const(args[3]) && temps[args[3]].val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_brcond_high:
@@ -1248,10 +1239,8 @@ void tcg_optimize(TCGContext *s)
             do_setcond_const:
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
             } else if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE)
-                       && temps[args[3]].state == TCG_TEMP_CONST
-                       && temps[args[4]].state == TCG_TEMP_CONST
-                       && temps[args[3]].val == 0
-                       && temps[args[4]].val == 0) {
+                       && temp_is_const(args[3]) && temps[args[3]].val == 0
+                       && temp_is_const(args[4]) && temps[args[4]].val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_setcond_high:
-- 
2.1.4


* [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately
  2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 01/10] tcg/optimize: optimize temps tracking Aurelien Jarno
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 02/10] tcg/optimize: add temp_is_const and temp_is_copy functions Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
  2015-07-27  8:25   ` Paolo Bonzini
  2015-07-29 16:10   ` Alex Bennée
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies Aurelien Jarno
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson

Use two bools to track constants and copies instead of a single
tri-state enum. The two properties are logically independent: a temp can
be a known constant and a copy of another temp at the same time, a state
the enum cannot represent (the next patch starts exploiting this).
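
An illustrative sketch (hypothetical op sequence):

  movi_i32 t1,$0x2a       # t1.is_const = true
  mov_i32  t2,t1          # t2.is_copy = true; with two separate flags,
                          # t2 can also inherit is_const (next patch)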

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index d2b63a4..f16eb1e 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -35,14 +35,9 @@
         glue(glue(case INDEX_op_, x), _i32):    \
         glue(glue(case INDEX_op_, x), _i64)
 
-typedef enum {
-    TCG_TEMP_UNDEF = 0,
-    TCG_TEMP_CONST,
-    TCG_TEMP_COPY,
-} tcg_temp_state;
-
 struct tcg_temp_info {
-    tcg_temp_state state;
+    bool is_const;
+    bool is_copy;
     uint16_t prev_copy;
     uint16_t next_copy;
     tcg_target_ulong val;
@@ -54,12 +49,12 @@ static TCGTempSet temps_used;
 
 static inline bool temp_is_const(TCGArg arg)
 {
-    return temps[arg].state == TCG_TEMP_CONST;
+    return temps[arg].is_const;
 }
 
 static inline bool temp_is_copy(TCGArg arg)
 {
-    return temps[arg].state == TCG_TEMP_COPY;
+    return temps[arg].is_copy;
 }
 
 /* Reset TEMP's state to TCG_TEMP_UNDEF.  If TEMP only had one copy, remove
@@ -68,13 +63,14 @@ static void reset_temp(TCGArg temp)
 {
     if (temp_is_copy(temp)) {
         if (temps[temp].prev_copy == temps[temp].next_copy) {
-            temps[temps[temp].next_copy].state = TCG_TEMP_UNDEF;
+            temps[temps[temp].next_copy].is_copy = false;
         } else {
             temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
             temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
         }
     }
-    temps[temp].state = TCG_TEMP_UNDEF;
+    temps[temp].is_const = false;
+    temps[temp].is_copy = false;
     temps[temp].mask = -1;
 }
 
@@ -88,7 +84,8 @@ static void reset_all_temps(int nb_temps)
 static void init_temp_info(TCGArg temp)
 {
     if (!test_bit(temp, temps_used.l)) {
-        temps[temp].state = TCG_TEMP_UNDEF;
+        temps[temp].is_const = false;
+        temps[temp].is_copy = false;
         temps[temp].mask = -1;
         set_bit(temp, temps_used.l);
     }
@@ -218,7 +215,8 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
     op->opc = new_op;
 
     reset_temp(dst);
-    temps[dst].state = TCG_TEMP_CONST;
+    temps[dst].is_const = true;
+    temps[dst].is_copy = false;
     temps[dst].val = val;
     mask = val;
     if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_movi_i32) {
@@ -261,11 +259,13 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
 
     if (s->temps[src].type == s->temps[dst].type) {
         if (!temp_is_copy(src)) {
-            temps[src].state = TCG_TEMP_COPY;
+            temps[src].is_const = false;
+            temps[src].is_copy = true;
             temps[src].next_copy = src;
             temps[src].prev_copy = src;
         }
-        temps[dst].state = TCG_TEMP_COPY;
+        temps[dst].is_const = false;
+        temps[dst].is_copy = true;
         temps[dst].next_copy = temps[src].next_copy;
         temps[dst].prev_copy = src;
         temps[temps[dst].next_copy].prev_copy = dst;
-- 
2.1.4


* [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies
  2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
                   ` (2 preceding siblings ...)
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
  2015-07-24 20:15   ` Richard Henderson
  2015-07-29 16:12   ` Alex Bennée
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 05/10] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 Aurelien Jarno
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson

Now that copies and constants are tracked separately, we can allow
constants to have copies, deferring the choice between using a register
and using an immediate to the register allocation pass. This prevents
this kind of repeated constant reloading:

-OUT: [size=338]
+OUT: [size=298]
   mov    -0x4(%r14),%ebp
   test   %ebp,%ebp
   jne    0x7ffbe9cb0ed6
   mov    $0x40002219f8,%rbp
   mov    %rbp,(%r14)
-  mov    $0x40002219f8,%rbp
   mov    $0x4000221a20,%rbx
   mov    %rbp,(%rbx)
   mov    $0x4000000000,%rbp
   mov    %rbp,(%r14)
-  mov    $0x4000000000,%rbp
   mov    $0x4000221d38,%rbx
   mov    %rbp,(%rbx)
   mov    $0x40002221a8,%rbp
   mov    %rbp,(%r14)
-  mov    $0x40002221a8,%rbp
   mov    $0x4000221d40,%rbx
   mov    %rbp,(%rbx)
   mov    $0x4000019170,%rbp
   mov    %rbp,(%r14)
-  mov    $0x4000019170,%rbp
   mov    $0x4000221d48,%rbx
   mov    %rbp,(%rbx)
   mov    $0x40000049ee,%rbp
   mov    %rbp,0x80(%r14)
   mov    %r14,%rdi
   callq  0x7ffbe99924d0
   mov    $0x4000001680,%rbp
   mov    %rbp,0x30(%r14)
   mov    0x10(%r14),%rbp
   mov    $0x4000001680,%rbp
   mov    %rbp,0x30(%r14)
   mov    0x10(%r14),%rbp
   shl    $0x20,%rbp
   mov    (%r14),%rbx
   mov    %ebx,%ebx
   mov    %rbx,(%r14)
   or     %rbx,%rbp
   mov    %rbp,0x10(%r14)
   mov    %rbp,0x90(%r14)
   mov    0x60(%r14),%rbx
   mov    %rbx,0x38(%r14)
   mov    0x28(%r14),%rbx
   mov    $0x4000220e60,%r12
   mov    %rbx,(%r12)
   mov    $0x40002219c8,%rbx
   mov    %rbp,(%rbx)
   mov    0x20(%r14),%rbp
   sub    $0x8,%rbp
   mov    $0x4000004a16,%rbx
   mov    %rbx,0x0(%rbp)
   mov    %rbp,0x20(%r14)
   mov    $0x19,%ebp
   mov    %ebp,0xa8(%r14)
   mov    $0x4000015110,%rbp
   mov    %rbp,0x80(%r14)
   xor    %eax,%eax
   jmpq   0x7ffbebcae426
   lea    -0x5f6d72a(%rip),%rax        # 0x7ffbe3d437b3
   jmpq   0x7ffbebcae426

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index f16eb1e..48103b2 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -237,11 +237,6 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
         return;
     }
 
-    if (temp_is_const(src)) {
-        tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
-        return;
-    }
-
     TCGOpcode new_op = op_to_mov(op->opc);
     tcg_target_ulong mask;
 
@@ -255,17 +250,15 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
     }
     temps[dst].mask = mask;
 
-    assert(!temp_is_const(src));
-
     if (s->temps[src].type == s->temps[dst].type) {
         if (!temp_is_copy(src)) {
-            temps[src].is_const = false;
             temps[src].is_copy = true;
             temps[src].next_copy = src;
             temps[src].prev_copy = src;
         }
-        temps[dst].is_const = false;
+        temps[dst].is_const = temps[src].is_const;
         temps[dst].is_copy = true;
+        temps[dst].val = temps[src].val;
         temps[dst].next_copy = temps[src].next_copy;
         temps[dst].prev_copy = src;
         temps[temps[dst].next_copy].prev_copy = dst;
-- 
2.1.4


* [Qemu-devel] [PATCH for-2.5 05/10] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32
  2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
                   ` (3 preceding siblings ...)
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
  2015-07-31  6:31   ` Alex Bennée
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 06/10] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32 Aurelien Jarno
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson

The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32,
and the name in the README doesn't match the name offered to the
frontends.

Always use the long name to make it clear it is a size-changing op.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/README               | 2 +-
 tcg/aarch64/tcg-target.h | 2 +-
 tcg/i386/tcg-target.h    | 2 +-
 tcg/ia64/tcg-target.h    | 2 +-
 tcg/optimize.c           | 6 +++---
 tcg/ppc/tcg-target.h     | 2 +-
 tcg/s390/tcg-target.h    | 2 +-
 tcg/sparc/tcg-target.c   | 4 ++--
 tcg/sparc/tcg-target.h   | 2 +-
 tcg/tcg-op.c             | 4 ++--
 tcg/tcg-opc.h            | 4 ++--
 tcg/tcg.h                | 2 +-
 tcg/tci/tcg-target.h     | 2 +-
 13 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/tcg/README b/tcg/README
index a550ff1..61b3899 100644
--- a/tcg/README
+++ b/tcg/README
@@ -314,7 +314,7 @@ This operation would be equivalent to
 
   dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
 
-* trunc_shr_i32 t0, t1, pos
+* trunc_shr_i64_i32 t0, t1, pos
 
 For 64-bit hosts only, right shift the 64-bit input T1 by POS and
 truncate to 32-bit output T0.  Depending on the host, this may be
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 8aec04d..dfd8801 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -70,7 +70,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 25b5133..dae50ba 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -102,7 +102,7 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_mulsh_i32        0
 
 #if TCG_TARGET_REG_BITS == 64
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_div2_i64         1
 #define TCG_TARGET_HAS_rot_i64          1
 #define TCG_TARGET_HAS_ext8s_i64        1
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index a04ed81..29902f9 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -160,7 +160,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muluh_i64        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 #define TCG_TARGET_HAS_mulsh_i64        0
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) ((len) <= 16)
 #define TCG_TARGET_deposit_i64_valid(ofs, len) ((len) <= 16)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 48103b2..56e0a17 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -301,7 +301,7 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
     case INDEX_op_shr_i32:
         return (uint32_t)x >> (y & 31);
 
-    case INDEX_op_trunc_shr_i32:
+    case INDEX_op_trunc_shr_i64_i32:
     case INDEX_op_shr_i64:
         return (uint64_t)x >> (y & 63);
 
@@ -880,7 +880,7 @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
-        case INDEX_op_trunc_shr_i32:
+        case INDEX_op_trunc_shr_i64_i32:
             mask = (uint64_t)temps[args[1]].mask >> args[2];
             break;
 
@@ -1028,7 +1028,7 @@ void tcg_optimize(TCGContext *s)
             }
             goto do_default;
 
-        case INDEX_op_trunc_shr_i32:
+        case INDEX_op_trunc_shr_i64_i32:
             if (temp_is_const(args[1])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val, args[2]);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 7ce7048..b7e6861 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -77,7 +77,7 @@ typedef enum {
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_add2_i32         0
 #define TCG_TARGET_HAS_sub2_i32         0
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_rot_i64          1
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 91576d5..50016a8 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -72,7 +72,7 @@ typedef enum TCGReg {
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 
 #define TCG_TARGET_HAS_div2_i64         1
 #define TCG_TARGET_HAS_rot_i64          1
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 1a870a8..b23032b 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1413,7 +1413,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext32u_i64:
         tcg_out_arithi(s, a0, a1, 0, SHIFT_SRL);
         break;
-    case INDEX_op_trunc_shr_i32:
+    case INDEX_op_trunc_shr_i64_i32:
         if (a2 == 0) {
             tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
         } else {
@@ -1533,7 +1533,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
 
     { INDEX_op_ext32s_i64, { "R", "r" } },
     { INDEX_op_ext32u_i64, { "R", "r" } },
-    { INDEX_op_trunc_shr_i32,  { "r", "R" } },
+    { INDEX_op_trunc_shr_i64_i32,  { "r", "R" } },
 
     { INDEX_op_brcond_i64, { "RZ", "RJ" } },
     { INDEX_op_setcond_i64, { "R", "RZ", "RJ" } },
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index f584de4..336c47f 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -118,7 +118,7 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 
-#define TCG_TARGET_HAS_trunc_shr_i32    1
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 1
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_rot_i64          0
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 45098c3..61b64db 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1751,8 +1751,8 @@ void tcg_gen_trunc_shr_i64_i32(TCGv_i32 ret, TCGv_i64 arg, unsigned count)
             tcg_gen_mov_i32(ret, TCGV_LOW(t));
             tcg_temp_free_i64(t);
         }
-    } else if (TCG_TARGET_HAS_trunc_shr_i32) {
-        tcg_gen_op3i_i32(INDEX_op_trunc_shr_i32, ret,
+    } else if (TCG_TARGET_HAS_trunc_shr_i64_i32) {
+        tcg_gen_op3i_i32(INDEX_op_trunc_shr_i64_i32, ret,
                          MAKE_TCGV_I32(GET_TCGV_I64(arg)), count);
     } else if (count == 0) {
         tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(arg)));
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 13ccb60..4a34f43 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -138,8 +138,8 @@ DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
 
-DEF(trunc_shr_i32, 1, 1, 1,
-    IMPL(TCG_TARGET_HAS_trunc_shr_i32)
+DEF(trunc_shr_i64_i32, 1, 1, 1,
+    IMPL(TCG_TARGET_HAS_trunc_shr_i64_i32)
     | (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
 
 DEF(brcond_i64, 0, 2, 2, TCG_OPF_BB_END | IMPL64)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 231a781..e7e33b9 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -66,7 +66,7 @@ typedef uint64_t TCGRegSet;
 
 #if TCG_TARGET_REG_BITS == 32
 /* Turn some undef macros into false macros.  */
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_div_i64          0
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_div2_i64         0
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index cbf3f9b..8b1139b 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -84,7 +84,7 @@
 #define TCG_TARGET_HAS_mulsh_i32        0
 
 #if TCG_TARGET_REG_BITS == 64
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_bswap16_i64      1
 #define TCG_TARGET_HAS_bswap32_i64      1
 #define TCG_TARGET_HAS_bswap64_i64      1
-- 
2.1.4


* [Qemu-devel] [PATCH for-2.5 06/10] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32
  2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
                   ` (4 preceding siblings ...)
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 05/10] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
  2015-07-31  7:32   ` Alex Bennée
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 07/10] tcg: implement real ext_i32_i64 and extu_i32_i64 ops Aurelien Jarno
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson

The tcg_gen_trunc_shr_i64_i32 function takes a 64-bit argument and
returns a 32-bit value. Directly call tcg_gen_op3 with the correct
types instead of calling tcg_gen_op3i_i32 and abusing the TCG types.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg-op.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 61b64db..0e79fd1 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1752,8 +1752,8 @@ void tcg_gen_trunc_shr_i64_i32(TCGv_i32 ret, TCGv_i64 arg, unsigned count)
             tcg_temp_free_i64(t);
         }
     } else if (TCG_TARGET_HAS_trunc_shr_i64_i32) {
-        tcg_gen_op3i_i32(INDEX_op_trunc_shr_i64_i32, ret,
-                         MAKE_TCGV_I32(GET_TCGV_I64(arg)), count);
+        tcg_gen_op3(&tcg_ctx, INDEX_op_trunc_shr_i64_i32,
+                    GET_TCGV_I32(ret), GET_TCGV_I64(arg), count);
     } else if (count == 0) {
         tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(arg)));
     } else {
-- 
2.1.4


* [Qemu-devel] [PATCH for-2.5 07/10] tcg: implement real ext_i32_i64 and extu_i32_i64 ops
  2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
                   ` (5 preceding siblings ...)
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 06/10] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32 Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
  2015-07-31 16:01   ` Alex Bennée
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 08/10] tcg/optimize: add optimizations for " Aurelien Jarno
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Claudio Fontana, Stefan Weil, Claudio Fontana, Alexander Graf,
	Blue Swirl, Aurelien Jarno, Richard Henderson

Implement real ext_i32_i64 and extu_i32_i64 ops. They ensure that a
32-bit value is always explicitly converted to a 64-bit value, rather
than being silently propagated through the register allocator or the
optimizer.
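
As a usage sketch (a hypothetical frontend fragment; src32 stands for
some existing TCGv_i32, while the tcg_gen_* helpers are the ones changed
in tcg/tcg-op.c below):

  TCGv_i64 wide = tcg_temp_new_i64();
  tcg_gen_ext_i32_i64(wide, src32);   /* emits ext_i32_i64, sign-extend */
  tcg_gen_extu_i32_i64(wide, src32);  /* emits extu_i32_i64, zero-extend */
  tcg_temp_free_i64(wide);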

Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Claudio Fontana <claudio.fontana@huawei.com>
Cc: Claudio Fontana <claudio.fontana@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/aarch64/tcg-target.c |  4 ++++
 tcg/i386/tcg-target.c    |  5 +++++
 tcg/ia64/tcg-target.c    |  4 ++++
 tcg/ppc/tcg-target.c     |  6 ++++++
 tcg/s390/tcg-target.c    |  5 +++++
 tcg/sparc/tcg-target.c   |  8 ++++++--
 tcg/tcg-op.c             | 10 ++++------
 tcg/tcg-opc.h            |  3 +++
 tcg/tci/tcg-target.c     |  4 ++++
 tci.c                    |  6 ++++--
 10 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index b7ec4f5..7f7ab7e 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1556,6 +1556,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext16s_i32:
         tcg_out_sxt(s, ext, MO_16, a0, a1);
         break;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a1);
         break;
@@ -1567,6 +1568,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext16u_i32:
         tcg_out_uxt(s, MO_16, a0, a1);
         break;
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         tcg_out_movr(s, TCG_TYPE_I32, a0, a1);
         break;
@@ -1712,6 +1714,8 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_ext8u_i64, { "r", "r" } },
     { INDEX_op_ext16u_i64, { "r", "r" } },
     { INDEX_op_ext32u_i64, { "r", "r" } },
+    { INDEX_op_ext_i32_i64, { "r", "r" } },
+    { INDEX_op_extu_i32_i64, { "r", "r" } },
 
     { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
     { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index 4f40468..ff55499 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -2068,9 +2068,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_bswap64_i64:
         tcg_out_bswap64(s, args[0]);
         break;
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         tcg_out_ext32u(s, args[0], args[1]);
         break;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         tcg_out_ext32s(s, args[0], args[1]);
         break;
@@ -2205,6 +2207,9 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_ext16u_i64, { "r", "r" } },
     { INDEX_op_ext32u_i64, { "r", "r" } },
 
+    { INDEX_op_ext_i32_i64, { "r", "r" } },
+    { INDEX_op_extu_i32_i64, { "r", "r" } },
+
     { INDEX_op_deposit_i64, { "Q", "0", "Q" } },
     { INDEX_op_movcond_i64, { "r", "r", "re", "r", "0" } },
 
diff --git a/tcg/ia64/tcg-target.c b/tcg/ia64/tcg-target.c
index 81cb9f7..71e79cf 100644
--- a/tcg/ia64/tcg-target.c
+++ b/tcg/ia64/tcg-target.c
@@ -2148,9 +2148,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext16u_i64:
         tcg_out_ext(s, OPC_ZXT2_I29, args[0], args[1]);
         break;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         tcg_out_ext(s, OPC_SXT4_I29, args[0], args[1]);
         break;
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         tcg_out_ext(s, OPC_ZXT4_I29, args[0], args[1]);
         break;
@@ -2301,6 +2303,8 @@ static const TCGTargetOpDef ia64_op_defs[] = {
     { INDEX_op_ext16u_i64, { "r", "rZ"} },
     { INDEX_op_ext32s_i64, { "r", "rZ"} },
     { INDEX_op_ext32u_i64, { "r", "rZ"} },
+    { INDEX_op_ext_i32_i64, { "r", "rZ" } },
+    { INDEX_op_extu_i32_i64, { "r", "rZ" } },
 
     { INDEX_op_bswap16_i64, { "r", "rZ" } },
     { INDEX_op_bswap32_i64, { "r", "rZ" } },
diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
index ce8d546..1672220 100644
--- a/tcg/ppc/tcg-target.c
+++ b/tcg/ppc/tcg-target.c
@@ -2221,12 +2221,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_ext16s_i64:
         c = EXTSH;
         goto gen_ext;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         c = EXTSW;
         goto gen_ext;
     gen_ext:
         tcg_out32(s, c | RS(args[1]) | RA(args[0]));
         break;
+    case INDEX_op_extu_i32_i64:
+        tcg_out_ext32u(s, args[0], args[1]);
+        break;
 
     case INDEX_op_setcond_i32:
         tcg_out_setcond(s, TCG_TYPE_I32, args[3], args[0], args[1], args[2],
@@ -2503,6 +2507,8 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_ext8s_i64, { "r", "r" } },
     { INDEX_op_ext16s_i64, { "r", "r" } },
     { INDEX_op_ext32s_i64, { "r", "r" } },
+    { INDEX_op_ext_i32_i64, { "r", "r" } },
+    { INDEX_op_extu_i32_i64, { "r", "r" } },
     { INDEX_op_bswap16_i64, { "r", "r" } },
     { INDEX_op_bswap32_i64, { "r", "r" } },
     { INDEX_op_bswap64_i64, { "r", "r" } },
diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index b3433ce..d4db6d3 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -2106,6 +2106,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext16s_i64:
         tgen_ext16s(s, TCG_TYPE_I64, args[0], args[1]);
         break;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         tgen_ext32s(s, args[0], args[1]);
         break;
@@ -2115,6 +2116,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext16u_i64:
         tgen_ext16u(s, TCG_TYPE_I64, args[0], args[1]);
         break;
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         tgen_ext32u(s, args[0], args[1]);
         break;
@@ -2267,6 +2269,9 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { INDEX_op_ext32s_i64, { "r", "r" } },
     { INDEX_op_ext32u_i64, { "r", "r" } },
 
+    { INDEX_op_ext_i32_i64, { "r", "r" } },
+    { INDEX_op_extu_i32_i64, { "r", "r" } },
+
     { INDEX_op_bswap16_i64, { "r", "r" } },
     { INDEX_op_bswap32_i64, { "r", "r" } },
     { INDEX_op_bswap64_i64, { "r", "r" } },
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index b23032b..fe75af0 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1407,9 +1407,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_divu_i64:
         c = ARITH_UDIVX;
         goto gen_arith;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         tcg_out_arithi(s, a0, a1, 0, SHIFT_SRA);
         break;
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         tcg_out_arithi(s, a0, a1, 0, SHIFT_SRL);
         break;
@@ -1531,8 +1533,10 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_neg_i64, { "R", "RJ" } },
     { INDEX_op_not_i64, { "R", "RJ" } },
 
-    { INDEX_op_ext32s_i64, { "R", "r" } },
-    { INDEX_op_ext32u_i64, { "R", "r" } },
+    { INDEX_op_ext32s_i64, { "R", "R" } },
+    { INDEX_op_ext32u_i64, { "R", "R" } },
+    { INDEX_op_ext_i32_i64, { "R", "r" } },
+    { INDEX_op_extu_i32_i64, { "R", "r" } },
     { INDEX_op_trunc_shr_i64_i32,  { "r", "R" } },
 
     { INDEX_op_brcond_i64, { "RZ", "RJ" } },
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 0e79fd1..7114315 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1770,9 +1770,8 @@ void tcg_gen_extu_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
         tcg_gen_mov_i32(TCGV_LOW(ret), arg);
         tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
     } else {
-        /* Note: we assume the target supports move between
-           32 and 64 bit registers.  */
-        tcg_gen_ext32u_i64(ret, MAKE_TCGV_I64(GET_TCGV_I32(arg)));
+        tcg_gen_op2(&tcg_ctx, INDEX_op_extu_i32_i64,
+                    GET_TCGV_I64(ret), GET_TCGV_I32(arg));
     }
 }
 
@@ -1782,9 +1781,8 @@ void tcg_gen_ext_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
         tcg_gen_mov_i32(TCGV_LOW(ret), arg);
         tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
     } else {
-        /* Note: we assume the target supports move between
-           32 and 64 bit registers.  */
-        tcg_gen_ext32s_i64(ret, MAKE_TCGV_I64(GET_TCGV_I32(arg)));
+        tcg_gen_op2(&tcg_ctx, INDEX_op_ext_i32_i64,
+                    GET_TCGV_I64(ret), GET_TCGV_I32(arg));
     }
 }
 
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 4a34f43..f721a5a 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -138,6 +138,9 @@ DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
 
+/* size changing ops */
+DEF(ext_i32_i64, 1, 1, 0, IMPL64)
+DEF(extu_i32_i64, 1, 1, 0, IMPL64)
 DEF(trunc_shr_i64_i32, 1, 1, 1,
     IMPL(TCG_TARGET_HAS_trunc_shr_i64_i32)
     | (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
diff --git a/tcg/tci/tcg-target.c b/tcg/tci/tcg-target.c
index 83472db..bbb54d4 100644
--- a/tcg/tci/tcg-target.c
+++ b/tcg/tci/tcg-target.c
@@ -210,6 +210,8 @@ static const TCGTargetOpDef tcg_target_op_defs[] = {
 #if TCG_TARGET_HAS_ext32u_i64
     { INDEX_op_ext32u_i64, { R, R } },
 #endif
+    { INDEX_op_ext_i32_i64, { R, R } },
+    { INDEX_op_extu_i32_i64, { R, R } },
 #if TCG_TARGET_HAS_bswap16_i64
     { INDEX_op_bswap16_i64, { R, R } },
 #endif
@@ -701,6 +703,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_ext16u_i64:   /* Optional (TCG_TARGET_HAS_ext16u_i64). */
     case INDEX_op_ext32s_i64:   /* Optional (TCG_TARGET_HAS_ext32s_i64). */
     case INDEX_op_ext32u_i64:   /* Optional (TCG_TARGET_HAS_ext32u_i64). */
+    case INDEX_op_ext_i32_i64:
+    case INDEX_op_extu_i32_i64:
 #endif /* TCG_TARGET_REG_BITS == 64 */
     case INDEX_op_neg_i32:      /* Optional (TCG_TARGET_HAS_neg_i32). */
     case INDEX_op_not_i32:      /* Optional (TCG_TARGET_HAS_not_i32). */
diff --git a/tci.c b/tci.c
index 8444948..3d6d177 100644
--- a/tci.c
+++ b/tci.c
@@ -1033,18 +1033,20 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
 #endif
 #if TCG_TARGET_HAS_ext32s_i64
         case INDEX_op_ext32s_i64:
+#endif
+        case INDEX_op_ext_i32_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r32s(&tb_ptr);
             tci_write_reg64(t0, t1);
             break;
-#endif
 #if TCG_TARGET_HAS_ext32u_i64
         case INDEX_op_ext32u_i64:
+#endif
+        case INDEX_op_extu_i32_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r32(&tb_ptr);
             tci_write_reg64(t0, t1);
             break;
-#endif
 #if TCG_TARGET_HAS_bswap16_i64
         case INDEX_op_bswap16_i64:
             TODO();
-- 
2.1.4


* [Qemu-devel] [PATCH for-2.5 08/10] tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 ops
  2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
                   ` (6 preceding siblings ...)
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 07/10] tcg: implement real ext_i32_i64 and extu_i32_i64 ops Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 09/10] tcg/optimize: do not remember garbage high bits for 32-bit ops Aurelien Jarno
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 10/10] tcg: update README about size changing ops Aurelien Jarno
  9 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson

They behave the same as ext32s_i64 and ext32u_i64 from the constant
folding and zero propagation point of view, except that they can't be
replaced by a mov, so we don't compute the 'affected' mask.
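
A worked example of the folding, with values chosen for illustration:
for a constant input 0x80000000, ext_i32_i64 folds to
(int32_t)0x80000000 = 0xffffffff80000000, while extu_i32_i64 folds to
(uint32_t)0x80000000 = 0x0000000080000000. For the known-bits pass, the
result mask is the low 32 bits of the input mask; in the sign-extending
case, if bit 31 of the input may be set, the mask is left all-ones
(nothing is known to be zero).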

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 56e0a17..155d5c3 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -356,9 +356,11 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
     CASE_OP_32_64(ext16u):
         return (uint16_t)x;
 
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         return (int32_t)x;
 
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         return (uint32_t)x;
 
@@ -843,6 +845,15 @@ void tcg_optimize(TCGContext *s)
             mask = temps[args[1]].mask & mask;
             break;
 
+        case INDEX_op_ext_i32_i64:
+            if ((temps[args[1]].mask & 0x80000000) != 0) {
+                break;
+            }
+        case INDEX_op_extu_i32_i64:
+            /* We do not compute affected as it is a size changing op.  */
+            mask = (uint32_t)temps[args[1]].mask;
+            break;
+
         CASE_OP_32_64(andc):
             /* Known-zeros does not imply known-ones.  Therefore unless
                args[2] is constant, we can't infer anything from it.  */
@@ -1021,6 +1032,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(ext16u):
         case INDEX_op_ext32s_i64:
         case INDEX_op_ext32u_i64:
+        case INDEX_op_ext_i32_i64:
+        case INDEX_op_extu_i32_i64:
             if (temp_is_const(args[1])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val, 0);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
-- 
2.1.4


* [Qemu-devel] [PATCH for-2.5 09/10] tcg/optimize: do not remember garbage high bits for 32-bit ops
  2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
                   ` (7 preceding siblings ...)
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 08/10] tcg/optimize: add optimizations for " Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 10/10] tcg: update README about size changing ops Aurelien Jarno
  9 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson

Now that we have real size changing ops, we don't need to mark the high
bits of the destination as garbage. The goal of the optimizer is to
predict the value of the temps (and not of the registers) and to do
simplifications when possible. The problem was therefore never that
those bits were not counted as garbage, but that a size changing op
could be replaced by a move.

This patch is basically a revert of commit 24666baf, including the
changes that have been made on top of it since then.
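
To make the two policies concrete, here is a standalone sketch (not the
QEMU code itself) of how the known-bits mask of a 32-bit op on a 64-bit
host is computed before and after this patch:

    /* 'mask' has a bit set for every result bit that may be non-zero. */
    static uint64_t mask_with_garbage(uint64_t mask)   /* before */
    {
        return (mask & 0xffffffffull) | ~0xffffffffull;
    }
    static uint64_t mask_of_value(uint64_t mask)       /* after */
    {
        return mask & 0xffffffffull;
    }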

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 43 ++++++++++++-------------------------------
 1 file changed, 12 insertions(+), 31 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 155d5c3..15d76a0 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -209,21 +209,13 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
 static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
                              TCGArg dst, TCGArg val)
 {
-    TCGOpcode new_op = op_to_movi(op->opc);
-    tcg_target_ulong mask;
-
-    op->opc = new_op;
+    op->opc = op_to_movi(op->opc);
 
     reset_temp(dst);
-    temps[dst].is_const = true;
     temps[dst].is_copy = false;
+    temps[dst].is_const = true;
     temps[dst].val = val;
-    mask = val;
-    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_movi_i32) {
-        /* High bits of the destination are now garbage.  */
-        mask |= ~0xffffffffull;
-    }
-    temps[dst].mask = mask;
+    temps[dst].mask = val;
 
     args[0] = dst;
     args[1] = val;
@@ -237,18 +229,9 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
         return;
     }
 
-    TCGOpcode new_op = op_to_mov(op->opc);
-    tcg_target_ulong mask;
-
-    op->opc = new_op;
+    op->opc = op_to_mov(op->opc);
 
     reset_temp(dst);
-    mask = temps[src].mask;
-    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
-        /* High bits of the destination are now garbage.  */
-        mask |= ~0xffffffffull;
-    }
-    temps[dst].mask = mask;
 
     if (s->temps[src].type == s->temps[dst].type) {
         if (!temp_is_copy(src)) {
@@ -256,15 +239,17 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
             temps[src].next_copy = src;
             temps[src].prev_copy = src;
         }
-        temps[dst].is_const = temps[src].is_const;
         temps[dst].is_copy = true;
-        temps[dst].val = temps[src].val;
         temps[dst].next_copy = temps[src].next_copy;
         temps[dst].prev_copy = src;
         temps[temps[dst].next_copy].prev_copy = dst;
         temps[src].next_copy = dst;
     }
 
+    temps[dst].is_const = temps[src].is_const;
+    temps[dst].val = temps[src].val;
+    temps[dst].mask = temps[src].mask;
+
     args[0] = dst;
     args[1] = src;
 }
@@ -597,7 +582,7 @@ void tcg_optimize(TCGContext *s)
     reset_all_temps(nb_temps);
 
     for (oi = s->gen_first_op_idx; oi >= 0; oi = oi_next) {
-        tcg_target_ulong mask, partmask, affected;
+        tcg_target_ulong mask, affected;
         int nb_oargs, nb_iargs, i;
         TCGArg tmp;
 
@@ -950,17 +935,13 @@ void tcg_optimize(TCGContext *s)
             break;
         }
 
-        /* 32-bit ops generate 32-bit results.  For the result is zero test
-           below, we can ignore high bits, but for further optimizations we
-           need to record that the high bits contain garbage.  */
-        partmask = mask;
+        /* 32-bit ops generate 32-bit results.  */
         if (!(def->flags & TCG_OPF_64BIT)) {
-            mask |= ~(tcg_target_ulong)0xffffffffu;
-            partmask &= 0xffffffffu;
+            mask &= 0xffffffffu;
             affected &= 0xffffffffu;
         }
 
-        if (partmask == 0) {
+        if (mask == 0) {
             assert(nb_oargs == 1);
             tcg_opt_gen_movi(s, op, args, args[0], 0);
             continue;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH for-2.5 10/10] tcg: update README about size changing ops
  2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
                   ` (8 preceding siblings ...)
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 09/10] tcg/optimize: do not remember garbage high bits for 32-bit ops Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
  2015-07-31 16:02   ` Alex Bennée
  9 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/README | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/tcg/README b/tcg/README
index 61b3899..a22f251 100644
--- a/tcg/README
+++ b/tcg/README
@@ -466,13 +466,25 @@ On a 32 bit target, all 64 bit operations are converted to 32 bits. A
 few specific operations must be implemented to allow it (see add2_i32,
 sub2_i32, brcond2_i32).
 
+On a 64 bit target, the values are transferred between 32-bit and 64-bit
+registers using the following ops:
+- trunc_shr_i64_i32
+- ext_i32_i64
+- extu_i32_i64
+
+They ensure that the values are correctly truncated or extended when
+moved from a 32-bit to a 64-bit register or vice-versa. Note that
+trunc_shr_i64_i32 is an optional op. It is not necessary to implement
+it if all the following conditions are met:
+- 64-bit registers can hold 32-bit values
+- 32-bit values in a 64-bit register do not need to stay zero or
+  sign extended
+- all 32-bit TCG ops ignore the high part of 64-bit registers
+
 Floating point operations are not supported in this version. A
 previous incarnation of the code generator had full support of them,
 but it is better to concentrate on integer operations first.
 
-On a 64 bit target, no assumption is made in TCG about the storage of
-the 32 bit values in 64 bit registers.
-
 4.2) Constraints
 
 GCC like constraints are used to define the constraints of every
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies Aurelien Jarno
@ 2015-07-24 20:15   ` Richard Henderson
  2015-07-24 22:56     ` Aurelien Jarno
  2015-07-29 16:12   ` Alex Bennée
  1 sibling, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2015-07-24 20:15 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel

On 07/24/2015 09:30 AM, Aurelien Jarno wrote:
> Now that copies and constants are tracked separately, we can allow
> constant to have copies, deferring the choice to use a register or a
> constant to the register allocation pass. This prevent this kind of
> regular constant reloading:

This appears to cause

$ gdb --args ./x86_64-linux-user/qemu-x86_64 /bin/ls
...
(gdb) run
Starting program: /home/rth/work/qemu/bld/x86_64-linux-user/qemu-x86_64 /bin/ls
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff6604700 (LWP 7767)]
qemu-x86_64: /home/rth/work/qemu/qemu/tcg/tcg.c:1827: tcg_reg_alloc_bb_end:
Assertion `ts->val_type == TEMP_VAL_DEAD' failed.

Program received signal SIGABRT, Aborted.


r~

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies
  2015-07-24 20:15   ` Richard Henderson
@ 2015-07-24 22:56     ` Aurelien Jarno
  0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 22:56 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On 2015-07-24 13:15, Richard Henderson wrote:
> On 07/24/2015 09:30 AM, Aurelien Jarno wrote:
> > Now that copies and constants are tracked separately, we can allow
> > constant to have copies, deferring the choice to use a register or a
> > constant to the register allocation pass. This prevent this kind of
> > regular constant reloading:
> 
> This appears to cause
> 
> $ gdb --args ./x86_64-linux-user/qemu-x86_64 /bin/ls
> ...
> (gdb) run
> Starting program: /home/rth/work/qemu/bld/x86_64-linux-user/qemu-x86_64 /bin/ls
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0x7ffff6604700 (LWP 7767)]
> qemu-x86_64: /home/rth/work/qemu/qemu/tcg/tcg.c:1827: tcg_reg_alloc_bb_end:
> Assertion `ts->val_type == TEMP_VAL_DEAD' failed.
> 
> Program received signal SIGABRT, Aborted.

It looks like I had forgotten to configure with --enable-debug-tcg.
With it I am able to reproduce the problem. It seems to be a bug in the
liveness analysis or the register allocator; I will try to come up with
a patch soon.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.5 01/10] tcg/optimize: optimize temps tracking
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 01/10] tcg/optimize: optimize temps tracking Aurelien Jarno
@ 2015-07-27  8:21   ` Paolo Bonzini
  2015-07-27  9:09     ` Aurelien Jarno
  0 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2015-07-27  8:21 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Richard Henderson



On 24/07/2015 18:30, Aurelien Jarno wrote:
> The tcg_temp_info structure uses 24 bytes per temp. Now that we emulate
> vector registers on most guests, it's not uncommon to have more than 100
> used temps. This means we have to initialize more than 2kB at least twice
> per TB, often more when there is a few goto_tb.
> 
> Instead use a TCGTempSet bit array to track which temps are used in
> the current basic block. This means there are only around 16 bytes to
> initialize.
> 
> This improves the boot time of a MIPS guest on an x86-64 host by around
> 7% and moves tcg_optimize off the top of the profiler list.
> 
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  tcg/optimize.c | 32 ++++++++++++++++++++++----------
>  1 file changed, 22 insertions(+), 10 deletions(-)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index cd0e793..20e24b3 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -50,6 +50,7 @@ struct tcg_temp_info {
>  };
>  
>  static struct tcg_temp_info temps[TCG_MAX_TEMPS];
> +static TCGTempSet temps_used;
>  
>  /* Reset TEMP's state to TCG_TEMP_UNDEF.  If TEMP only had one copy, remove
>     the copy flag from the left temp.  */
> @@ -67,6 +68,22 @@ static void reset_temp(TCGArg temp)
>      temps[temp].mask = -1;
>  }
>  
> +/* Reset all temporaries, given that there are NB_TEMPS of them.  */
> +static void reset_all_temps(int nb_temps)
> +{
> +    memset(&temps_used.l, 0, sizeof(long) * BITS_TO_LONGS(nb_temps));

You can use bitmap_zero here.
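
i.e. something like:

    bitmap_zero(temps_used.l, nb_temps);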

Paolo

> +}
> +
> +/* Initialize and activate a temporary.  */
> +static void init_temp_info(TCGArg temp)
> +{
> +    if (!test_bit(temp, temps_used.l)) {
> +        temps[temp].state = TCG_TEMP_UNDEF;
> +        temps[temp].mask = -1;
> +        set_bit(temp, temps_used.l);
> +    }
> +}
> +
>  static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
>                                  TCGOpcode opc, int nargs)
>  {
> @@ -98,16 +115,6 @@ static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
>      return new_op;
>  }
>  
> -/* Reset all temporaries, given that there are NB_TEMPS of them.  */
> -static void reset_all_temps(int nb_temps)
> -{
> -    int i;
> -    for (i = 0; i < nb_temps; i++) {
> -        temps[i].state = TCG_TEMP_UNDEF;
> -        temps[i].mask = -1;
> -    }
> -}
> -
>  static int op_bits(TCGOpcode op)
>  {
>      const TCGOpDef *def = &tcg_op_defs[op];
> @@ -606,6 +613,11 @@ void tcg_optimize(TCGContext *s)
>              nb_iargs = def->nb_iargs;
>          }
>  
> +        /* Initialize the temps that are going to be used */
> +        for (i = 0; i < nb_oargs + nb_iargs; i++) {
> +            init_temp_info(args[i]);
> +        }
> +
>          /* Do copy propagation */
>          for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
>              if (temps[args[i]].state == TCG_TEMP_COPY) {
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately Aurelien Jarno
@ 2015-07-27  8:25   ` Paolo Bonzini
  2015-07-27  9:10     ` Aurelien Jarno
  2015-07-29 16:10   ` Alex Bennée
  1 sibling, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2015-07-27  8:25 UTC (permalink / raw)
  To: Aurelien Jarno, qemu-devel; +Cc: Richard Henderson



On 24/07/2015 18:30, Aurelien Jarno wrote:
> Use two bools to track constants and copies instead of an enum.
> 
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  tcg/optimize.c | 30 +++++++++++++++---------------
>  1 file changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index d2b63a4..f16eb1e 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -35,14 +35,9 @@
>          glue(glue(case INDEX_op_, x), _i32):    \
>          glue(glue(case INDEX_op_, x), _i64)
>  
> -typedef enum {
> -    TCG_TEMP_UNDEF = 0,
> -    TCG_TEMP_CONST,
> -    TCG_TEMP_COPY,
> -} tcg_temp_state;
> -
>  struct tcg_temp_info {
> -    tcg_temp_state state;
> +    bool is_const;
> +    bool is_copy;

Could temps[arg].is_copy be replaced by temps[arg].next_copy != arg?

For example, this:

    if (temps[temp].prev_copy == temps[temp].next_copy) {
        temps[temps[temp].next_copy].is_copy = false;
    } else {
        temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
        temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
    }

would be replaced simply by

    temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
    temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
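
and the accessor would then reduce to something like:

    static inline bool temp_is_copy(TCGArg arg)
    {
        /* assumes next_copy is initialized to arg itself */
        return temps[arg].next_copy != arg;
    }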

Paolo

>      uint16_t prev_copy;
>      uint16_t next_copy;
>      tcg_target_ulong val;
> @@ -54,12 +49,12 @@ static TCGTempSet temps_used;
>  
>  static inline bool temp_is_const(TCGArg arg)
>  {
> -    return temps[arg].state == TCG_TEMP_CONST;
> +    return temps[arg].is_const;
>  }
>  
>  static inline bool temp_is_copy(TCGArg arg)
>  {
> -    return temps[arg].state == TCG_TEMP_COPY;
> +    return temps[arg].is_copy;
>  }
>  
>  /* Reset TEMP's state to TCG_TEMP_UNDEF.  If TEMP only had one copy, remove
> @@ -68,13 +63,14 @@ static void reset_temp(TCGArg temp)
>  {
>      if (temp_is_copy(temp)) {
>          if (temps[temp].prev_copy == temps[temp].next_copy) {
> -            temps[temps[temp].next_copy].state = TCG_TEMP_UNDEF;
> +            temps[temps[temp].next_copy].is_copy = false;
>          } else {
>              temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
>              temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
>          }
>      }
> -    temps[temp].state = TCG_TEMP_UNDEF;
> +    temps[temp].is_const = false;
> +    temps[temp].is_copy = false;
>      temps[temp].mask = -1;
>  }
>  
> @@ -88,7 +84,8 @@ static void reset_all_temps(int nb_temps)
>  static void init_temp_info(TCGArg temp)
>  {
>      if (!test_bit(temp, temps_used.l)) {
> -        temps[temp].state = TCG_TEMP_UNDEF;
> +        temps[temp].is_const = false;
> +        temps[temp].is_copy = false;
>          temps[temp].mask = -1;
>          set_bit(temp, temps_used.l);
>      }
> @@ -218,7 +215,8 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
>      op->opc = new_op;
>  
>      reset_temp(dst);
> -    temps[dst].state = TCG_TEMP_CONST;
> +    temps[dst].is_const = true;
> +    temps[dst].is_copy = false;
>      temps[dst].val = val;
>      mask = val;
>      if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_movi_i32) {
> @@ -261,11 +259,13 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
>  
>      if (s->temps[src].type == s->temps[dst].type) {
>          if (!temp_is_copy(src)) {
> -            temps[src].state = TCG_TEMP_COPY;
> +            temps[src].is_const = false;
> +            temps[src].is_copy = true;
>              temps[src].next_copy = src;
>              temps[src].prev_copy = src;
>          }
> -        temps[dst].state = TCG_TEMP_COPY;
> +        temps[dst].is_const = false;
> +        temps[dst].is_copy = true;
>          temps[dst].next_copy = temps[src].next_copy;
>          temps[dst].prev_copy = src;
>          temps[temps[dst].next_copy].prev_copy = dst;
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.5 01/10] tcg/optimize: optimize temps tracking
  2015-07-27  8:21   ` Paolo Bonzini
@ 2015-07-27  9:09     ` Aurelien Jarno
  0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-27  9:09 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Richard Henderson

On 2015-07-27 10:21, Paolo Bonzini wrote:
> 
> 
> On 24/07/2015 18:30, Aurelien Jarno wrote:
> > The tcg_temp_info structure uses 24 bytes per temp. Now that we emulate
> > vector registers on most guests, it's not uncommon to have more than 100
> > used temps. This means we have to initialize more than 2kB at least twice
> > per TB, often more when there is a few goto_tb.
> > 
> > Instead use a TCGTempSet bit array to track which temps are used in
> > the current basic block. This means there are only around 16 bytes to
> > initialize.
> > 
> > This improves the boot time of a MIPS guest on an x86-64 host by around
> > 7% and moves tcg_optimize off the top of the profiler list.
> > 
> > Cc: Richard Henderson <rth@twiddle.net>
> > Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> > ---
> >  tcg/optimize.c | 32 ++++++++++++++++++++++----------
> >  1 file changed, 22 insertions(+), 10 deletions(-)
> > 
> > diff --git a/tcg/optimize.c b/tcg/optimize.c
> > index cd0e793..20e24b3 100644
> > --- a/tcg/optimize.c
> > +++ b/tcg/optimize.c
> > @@ -50,6 +50,7 @@ struct tcg_temp_info {
> >  };
> >  
> >  static struct tcg_temp_info temps[TCG_MAX_TEMPS];
> > +static TCGTempSet temps_used;
> >  
> >  /* Reset TEMP's state to TCG_TEMP_UNDEF.  If TEMP only had one copy, remove
> >     the copy flag from the left temp.  */
> > @@ -67,6 +68,22 @@ static void reset_temp(TCGArg temp)
> >      temps[temp].mask = -1;
> >  }
> >  
> > +/* Reset all temporaries, given that there are NB_TEMPS of them.  */
> > +static void reset_all_temps(int nb_temps)
> > +{
> > +    memset(&temps_used.l, 0, sizeof(long) * BITS_TO_LONGS(nb_temps));
> 
> You can use bitmap_zero here.
> 

Indeed, thanks for the hint. That will be in v2.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately
  2015-07-27  8:25   ` Paolo Bonzini
@ 2015-07-27  9:10     ` Aurelien Jarno
  0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-27  9:10 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Richard Henderson

On 2015-07-27 10:25, Paolo Bonzini wrote:
> 
> 
> On 24/07/2015 18:30, Aurelien Jarno wrote:
> > Use two bools to track constants and copies instead of an enum.
> > 
> > Cc: Richard Henderson <rth@twiddle.net>
> > Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> > ---
> >  tcg/optimize.c | 30 +++++++++++++++---------------
> >  1 file changed, 15 insertions(+), 15 deletions(-)
> > 
> > diff --git a/tcg/optimize.c b/tcg/optimize.c
> > index d2b63a4..f16eb1e 100644
> > --- a/tcg/optimize.c
> > +++ b/tcg/optimize.c
> > @@ -35,14 +35,9 @@
> >          glue(glue(case INDEX_op_, x), _i32):    \
> >          glue(glue(case INDEX_op_, x), _i64)
> >  
> > -typedef enum {
> > -    TCG_TEMP_UNDEF = 0,
> > -    TCG_TEMP_CONST,
> > -    TCG_TEMP_COPY,
> > -} tcg_temp_state;
> > -
> >  struct tcg_temp_info {
> > -    tcg_temp_state state;
> > +    bool is_const;
> > +    bool is_copy;
> 
> Could temps[arg].is_copy be replaced by temps[arg].next_copy != arg?
> 
> For example, this:
> 
>     if (temps[temp].prev_copy == temps[temp].next_copy) {
>         temps[temps[temp].next_copy].is_copy = false;
>     } else {
>         temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
>         temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
>     }
> 
> would be replaced simply by
> 
>     temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
>     temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
> 

That's an interesting idea, especially as I have a patch series in
preparation that gets rid of is_const. I will try to get something like
that into v2.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.5 02/10] tcg/optimize: add temp_is_const and temp_is_copy functions
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 02/10] tcg/optimize: add temp_is_const and temp_is_copy functions Aurelien Jarno
@ 2015-07-29 16:01   ` Alex Bennée
  2015-07-29 16:25     ` Aurelien Jarno
  0 siblings, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2015-07-29 16:01 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Richard Henderson


Aurelien Jarno <aurelien@aurel32.net> writes:

> Add two accessor functions temp_is_const and temp_is_copy, to make the
> code more readable and make code change easier.
>
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  tcg/optimize.c | 131 ++++++++++++++++++++++++++-------------------------------
>  1 file changed, 60 insertions(+), 71 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 20e24b3..d2b63a4 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -52,11 +52,21 @@ struct tcg_temp_info {
>  static struct tcg_temp_info temps[TCG_MAX_TEMPS];
>  static TCGTempSet temps_used;
>  
> +static inline bool temp_is_const(TCGArg arg)
> +{
> +    return temps[arg].state == TCG_TEMP_CONST;
> +}
> +
> +static inline bool temp_is_copy(TCGArg arg)
> +{
> +    return temps[arg].state == TCG_TEMP_COPY;
> +}
> +
>  /* Reset TEMP's state to TCG_TEMP_UNDEF.  If TEMP only had one copy, remove
>     the copy flag from the left temp.  */
>  static void reset_temp(TCGArg temp)
>  {
> -    if (temps[temp].state == TCG_TEMP_COPY) {
> +    if (temp_is_copy(temp)) {
>          if (temps[temp].prev_copy == temps[temp].next_copy) {
>              temps[temps[temp].next_copy].state = TCG_TEMP_UNDEF;
>          } else {
> @@ -186,8 +196,7 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
>          return true;
>      }
>  
> -    if (temps[arg1].state != TCG_TEMP_COPY
> -        || temps[arg2].state != TCG_TEMP_COPY) {
> +    if (!temp_is_copy(arg1) || !temp_is_copy(arg2)) {
>          return false;
>      }
>  
> @@ -230,7 +239,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
>          return;
>      }
>  
> -    if (temps[src].state == TCG_TEMP_CONST) {
> +    if (temp_is_const(src)) {
>          tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
>          return;
>      }
> @@ -248,10 +257,10 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
>      }
>      temps[dst].mask = mask;
>  
> -    assert(temps[src].state != TCG_TEMP_CONST);
> +    assert(!temp_is_const(src));
>  
>      if (s->temps[src].type == s->temps[dst].type) {
> -        if (temps[src].state != TCG_TEMP_COPY) {
> +        if (!temp_is_copy(src)) {
>              temps[src].state = TCG_TEMP_COPY;
>              temps[src].next_copy = src;
>              temps[src].prev_copy = src;
> @@ -488,7 +497,7 @@ static bool do_constant_folding_cond_eq(TCGCond c)
>  static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
>                                         TCGArg y, TCGCond c)
>  {
> -    if (temps[x].state == TCG_TEMP_CONST && temps[y].state == TCG_TEMP_CONST) {
> +    if (temp_is_const(x) && temp_is_const(y)) {
>          switch (op_bits(op)) {
>          case 32:
>              return do_constant_folding_cond_32(temps[x].val, temps[y].val, c);
> @@ -499,7 +508,7 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
>          }
>      } else if (temps_are_copies(x, y)) {
>          return do_constant_folding_cond_eq(c);
> -    } else if (temps[y].state == TCG_TEMP_CONST && temps[y].val == 0) {
> +    } else if (temp_is_const(y) && temps[y].val == 0) {
>          switch (c) {
>          case TCG_COND_LTU:
>              return 0;
> @@ -520,12 +529,10 @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
>      TCGArg al = p1[0], ah = p1[1];
>      TCGArg bl = p2[0], bh = p2[1];
>  
> -    if (temps[bl].state == TCG_TEMP_CONST
> -        && temps[bh].state == TCG_TEMP_CONST) {
> +    if (temp_is_const(bl) && temp_is_const(bh)) {
>          uint64_t b = ((uint64_t)temps[bh].val << 32) | (uint32_t)temps[bl].val;
>  
> -        if (temps[al].state == TCG_TEMP_CONST
> -            && temps[ah].state == TCG_TEMP_CONST) {
> +        if (temp_is_const(al) && temp_is_const(ah)) {
>              uint64_t a;
>              a = ((uint64_t)temps[ah].val << 32) | (uint32_t)temps[al].val;
>              return do_constant_folding_cond_64(a, b, c);
> @@ -551,8 +558,8 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
>  {
>      TCGArg a1 = *p1, a2 = *p2;
>      int sum = 0;
> -    sum += temps[a1].state == TCG_TEMP_CONST;
> -    sum -= temps[a2].state == TCG_TEMP_CONST;
> +    sum += temp_is_const(a1);
> +    sum -= temp_is_const(a2);
>  
>      /* Prefer the constant in second argument, and then the form
>         op a, a, b, which is better handled on non-RISC hosts. */
> @@ -567,10 +574,10 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
>  static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
>  {
>      int sum = 0;
> -    sum += temps[p1[0]].state == TCG_TEMP_CONST;
> -    sum += temps[p1[1]].state == TCG_TEMP_CONST;
> -    sum -= temps[p2[0]].state == TCG_TEMP_CONST;
> -    sum -= temps[p2[1]].state == TCG_TEMP_CONST;
> +    sum += temp_is_const(p1[0]);
> +    sum += temp_is_const(p1[1]);
> +    sum -= temp_is_const(p2[0]);
> +    sum -= temp_is_const(p2[1]);
>      if (sum > 0) {
>          TCGArg t;
>          t = p1[0], p1[0] = p2[0], p2[0] = t;
> @@ -620,7 +627,7 @@ void tcg_optimize(TCGContext *s)
>  
>          /* Do copy propagation */
>          for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
> -            if (temps[args[i]].state == TCG_TEMP_COPY) {
> +            if (temp_is_copy(args[i])) {
>                  args[i] = find_better_copy(s, args[i]);
>              }
>          }
> @@ -690,8 +697,7 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(sar):
>          CASE_OP_32_64(rotl):
>          CASE_OP_32_64(rotr):
> -            if (temps[args[1]].state == TCG_TEMP_CONST
> -                && temps[args[1]].val == 0) {
> +            if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
>                  tcg_opt_gen_movi(s, op, args, args[0], 0);
>                  continue;
>              }
> @@ -701,7 +707,7 @@ void tcg_optimize(TCGContext *s)
>                  TCGOpcode neg_op;
>                  bool have_neg;
>  
> -                if (temps[args[2]].state == TCG_TEMP_CONST) {
> +                if (temp_is_const(args[2])) {
>                      /* Proceed with possible constant folding. */
>                      break;
>                  }
> @@ -715,8 +721,7 @@ void tcg_optimize(TCGContext *s)
>                  if (!have_neg) {
>                      break;
>                  }
> -                if (temps[args[1]].state == TCG_TEMP_CONST
> -                    && temps[args[1]].val == 0) {
> +                if (temp_is_const(args[1]) && temps[args[1]].val == 0) {

This makes me wonder if we should have:

  temp_is_const_val(arg, val)

to wrap up these tests even more neatly
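
e.g. a possible sketch:

    static inline bool temp_is_const_val(TCGArg arg, tcg_target_ulong val)
    {
        return temp_is_const(arg) && temps[arg].val == val;
    }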

>                      op->opc = neg_op;
>                      reset_temp(args[0]);
>                      args[1] = args[2];
> @@ -726,34 +731,30 @@ void tcg_optimize(TCGContext *s)
>              break;
>          CASE_OP_32_64(xor):
>          CASE_OP_32_64(nand):
> -            if (temps[args[1]].state != TCG_TEMP_CONST
> -                && temps[args[2]].state == TCG_TEMP_CONST
> -                && temps[args[2]].val == -1) {
> +            if (!temp_is_const(args[1])
> +                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
>                  i = 1;
>                  goto try_not;
>              }
>              break;
>          CASE_OP_32_64(nor):
> -            if (temps[args[1]].state != TCG_TEMP_CONST
> -                && temps[args[2]].state == TCG_TEMP_CONST
> -                && temps[args[2]].val == 0) {
> +            if (!temp_is_const(args[1])
> +                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
>                  i = 1;
>                  goto try_not;
>              }
>              break;
>          CASE_OP_32_64(andc):
> -            if (temps[args[2]].state != TCG_TEMP_CONST
> -                && temps[args[1]].state == TCG_TEMP_CONST
> -                && temps[args[1]].val == -1) {
> +            if (!temp_is_const(args[2])
> +                && temp_is_const(args[1]) && temps[args[1]].val == -1) {
>                  i = 2;
>                  goto try_not;
>              }
>              break;
>          CASE_OP_32_64(orc):
>          CASE_OP_32_64(eqv):
> -            if (temps[args[2]].state != TCG_TEMP_CONST
> -                && temps[args[1]].state == TCG_TEMP_CONST
> -                && temps[args[1]].val == 0) {
> +            if (!temp_is_const(args[2])
> +                && temp_is_const(args[1]) && temps[args[1]].val == 0) {
>                  i = 2;
>                  goto try_not;
>              }
> @@ -794,9 +795,8 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(or):
>          CASE_OP_32_64(xor):
>          CASE_OP_32_64(andc):
> -            if (temps[args[1]].state != TCG_TEMP_CONST
> -                && temps[args[2]].state == TCG_TEMP_CONST
> -                && temps[args[2]].val == 0) {
> +            if (!temp_is_const(args[1])
> +                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
>                  tcg_opt_gen_mov(s, op, args, args[0], args[1]);
>                  continue;
>              }
> @@ -804,9 +804,8 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(and):
>          CASE_OP_32_64(orc):
>          CASE_OP_32_64(eqv):
> -            if (temps[args[1]].state != TCG_TEMP_CONST
> -                && temps[args[2]].state == TCG_TEMP_CONST
> -                && temps[args[2]].val == -1) {
> +            if (!temp_is_const(args[1])
> +                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
>                  tcg_opt_gen_mov(s, op, args, args[0], args[1]);
>                  continue;
>              }
> @@ -844,7 +843,7 @@ void tcg_optimize(TCGContext *s)
>  
>          CASE_OP_32_64(and):
>              mask = temps[args[2]].mask;
> -            if (temps[args[2]].state == TCG_TEMP_CONST) {
> +            if (temp_is_const(args[2])) {
>          and_const:
>                  affected = temps[args[1]].mask & ~mask;
>              }
> @@ -854,7 +853,7 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(andc):
>              /* Known-zeros does not imply known-ones.  Therefore unless
>                 args[2] is constant, we can't infer anything from it.  */
> -            if (temps[args[2]].state == TCG_TEMP_CONST) {
> +            if (temp_is_const(args[2])) {
>                  mask = ~temps[args[2]].mask;
>                  goto and_const;
>              }
> @@ -863,26 +862,26 @@ void tcg_optimize(TCGContext *s)
>              break;
>  
>          case INDEX_op_sar_i32:
> -            if (temps[args[2]].state == TCG_TEMP_CONST) {
> +            if (temp_is_const(args[2])) {
>                  tmp = temps[args[2]].val & 31;
>                  mask = (int32_t)temps[args[1]].mask >> tmp;
>              }
>              break;
>          case INDEX_op_sar_i64:
> -            if (temps[args[2]].state == TCG_TEMP_CONST) {
> +            if (temp_is_const(args[2])) {
>                  tmp = temps[args[2]].val & 63;
>                  mask = (int64_t)temps[args[1]].mask >> tmp;
>              }
>              break;
>  
>          case INDEX_op_shr_i32:
> -            if (temps[args[2]].state == TCG_TEMP_CONST) {
> +            if (temp_is_const(args[2])) {
>                  tmp = temps[args[2]].val & 31;
>                  mask = (uint32_t)temps[args[1]].mask >> tmp;
>              }
>              break;
>          case INDEX_op_shr_i64:
> -            if (temps[args[2]].state == TCG_TEMP_CONST) {
> +            if (temp_is_const(args[2])) {
>                  tmp = temps[args[2]].val & 63;
>                  mask = (uint64_t)temps[args[1]].mask >> tmp;
>              }
> @@ -893,7 +892,7 @@ void tcg_optimize(TCGContext *s)
>              break;
>  
>          CASE_OP_32_64(shl):
> -            if (temps[args[2]].state == TCG_TEMP_CONST) {
> +            if (temp_is_const(args[2])) {
>                  tmp = temps[args[2]].val & (TCG_TARGET_REG_BITS - 1);
>                  mask = temps[args[1]].mask << tmp;
>              }
> @@ -974,8 +973,7 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(mul):
>          CASE_OP_32_64(muluh):
>          CASE_OP_32_64(mulsh):
> -            if ((temps[args[2]].state == TCG_TEMP_CONST
> -                && temps[args[2]].val == 0)) {
> +            if ((temp_is_const(args[2]) && temps[args[2]].val == 0)) {
>                  tcg_opt_gen_movi(s, op, args, args[0], 0);
>                  continue;
>              }
> @@ -1030,7 +1028,7 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(ext16u):
>          case INDEX_op_ext32s_i64:
>          case INDEX_op_ext32u_i64:
> -            if (temps[args[1]].state == TCG_TEMP_CONST) {
> +            if (temp_is_const(args[1])) {
>                  tmp = do_constant_folding(opc, temps[args[1]].val, 0);
>                  tcg_opt_gen_movi(s, op, args, args[0], tmp);
>                  break;
> @@ -1038,7 +1036,7 @@ void tcg_optimize(TCGContext *s)
>              goto do_default;
>  
>          case INDEX_op_trunc_shr_i32:
> -            if (temps[args[1]].state == TCG_TEMP_CONST) {
> +            if (temp_is_const(args[1])) {
>                  tmp = do_constant_folding(opc, temps[args[1]].val, args[2]);
>                  tcg_opt_gen_movi(s, op, args, args[0], tmp);
>                  break;
> @@ -1067,8 +1065,7 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(divu):
>          CASE_OP_32_64(rem):
>          CASE_OP_32_64(remu):
> -            if (temps[args[1]].state == TCG_TEMP_CONST
> -                && temps[args[2]].state == TCG_TEMP_CONST) {
> +            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
>                  tmp = do_constant_folding(opc, temps[args[1]].val,
>                                            temps[args[2]].val);
>                  tcg_opt_gen_movi(s, op, args, args[0], tmp);
> @@ -1077,8 +1074,7 @@ void tcg_optimize(TCGContext *s)
>              goto do_default;
>  
>          CASE_OP_32_64(deposit):
> -            if (temps[args[1]].state == TCG_TEMP_CONST
> -                && temps[args[2]].state == TCG_TEMP_CONST) {
> +            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
>                  tmp = deposit64(temps[args[1]].val, args[3], args[4],
>                                  temps[args[2]].val);
>                  tcg_opt_gen_movi(s, op, args, args[0], tmp);
> @@ -1118,10 +1114,8 @@ void tcg_optimize(TCGContext *s)
>  
>          case INDEX_op_add2_i32:
>          case INDEX_op_sub2_i32:
> -            if (temps[args[2]].state == TCG_TEMP_CONST
> -                && temps[args[3]].state == TCG_TEMP_CONST
> -                && temps[args[4]].state == TCG_TEMP_CONST
> -                && temps[args[5]].state == TCG_TEMP_CONST) {
> +            if (temp_is_const(args[2]) && temp_is_const(args[3])
> +                && temp_is_const(args[4]) && temp_is_const(args[5])) {
>                  uint32_t al = temps[args[2]].val;
>                  uint32_t ah = temps[args[3]].val;
>                  uint32_t bl = temps[args[4]].val;
> @@ -1150,8 +1144,7 @@ void tcg_optimize(TCGContext *s)
>              goto do_default;
>  
>          case INDEX_op_mulu2_i32:
> -            if (temps[args[2]].state == TCG_TEMP_CONST
> -                && temps[args[3]].state == TCG_TEMP_CONST) {
> +            if (temp_is_const(args[2]) && temp_is_const(args[3])) {
>                  uint32_t a = temps[args[2]].val;
>                  uint32_t b = temps[args[3]].val;
>                  uint64_t r = (uint64_t)a * b;
> @@ -1183,10 +1176,8 @@ void tcg_optimize(TCGContext *s)
>                      tcg_op_remove(s, op);
>                  }
>              } else if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
> -                       && temps[args[2]].state == TCG_TEMP_CONST
> -                       && temps[args[3]].state == TCG_TEMP_CONST
> -                       && temps[args[2]].val == 0
> -                       && temps[args[3]].val == 0) {
> +                       && temp_is_const(args[2]) && temps[args[2]].val == 0
> +                       && temp_is_const(args[3]) && temps[args[3]].val == 0) {
>                  /* Simplify LT/GE comparisons vs zero to a single compare
>                     vs the high word of the input.  */
>              do_brcond_high:
> @@ -1248,10 +1239,8 @@ void tcg_optimize(TCGContext *s)
>              do_setcond_const:
>                  tcg_opt_gen_movi(s, op, args, args[0], tmp);
>              } else if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE)
> -                       && temps[args[3]].state == TCG_TEMP_CONST
> -                       && temps[args[4]].state == TCG_TEMP_CONST
> -                       && temps[args[3]].val == 0
> -                       && temps[args[4]].val == 0) {
> +                       && temp_is_const(args[3]) && temps[args[3]].val == 0
> +                       && temp_is_const(args[4]) && temps[args[4]].val == 0) {
>                  /* Simplify LT/GE comparisons vs zero to a single compare
>                     vs the high word of the input.  */
>              do_setcond_high:

Otherwise it look sane to me:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately Aurelien Jarno
  2015-07-27  8:25   ` Paolo Bonzini
@ 2015-07-29 16:10   ` Alex Bennée
  2015-07-29 16:25     ` Aurelien Jarno
  1 sibling, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2015-07-29 16:10 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Richard Henderson


Aurelien Jarno <aurelien@aurel32.net> writes:

> Use two bools to track constants and copies instead of an enum.

More of an explanation would be useful as to why, otherwise it seems to
me we are just increasing the size of the structure (assuming the
compiler uses the same base sizes for the bool and enum).


>
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  tcg/optimize.c | 30 +++++++++++++++---------------
>  1 file changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index d2b63a4..f16eb1e 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -35,14 +35,9 @@
>          glue(glue(case INDEX_op_, x), _i32):    \
>          glue(glue(case INDEX_op_, x), _i64)
>  
> -typedef enum {
> -    TCG_TEMP_UNDEF = 0,
> -    TCG_TEMP_CONST,
> -    TCG_TEMP_COPY,
> -} tcg_temp_state;
> -
>  struct tcg_temp_info {
> -    tcg_temp_state state;
> +    bool is_const;
> +    bool is_copy;
>      uint16_t prev_copy;
>      uint16_t next_copy;
>      tcg_target_ulong val;
> @@ -54,12 +49,12 @@ static TCGTempSet temps_used;
>  
>  static inline bool temp_is_const(TCGArg arg)
>  {
> -    return temps[arg].state == TCG_TEMP_CONST;
> +    return temps[arg].is_const;
>  }
>  
>  static inline bool temp_is_copy(TCGArg arg)
>  {
> -    return temps[arg].state == TCG_TEMP_COPY;
> +    return temps[arg].is_copy;
>  }
>  
>  /* Reset TEMP's state to TCG_TEMP_UNDEF.  If TEMP only had one copy, remove
> @@ -68,13 +63,14 @@ static void reset_temp(TCGArg temp)
>  {
>      if (temp_is_copy(temp)) {
>          if (temps[temp].prev_copy == temps[temp].next_copy) {
> -            temps[temps[temp].next_copy].state = TCG_TEMP_UNDEF;
> +            temps[temps[temp].next_copy].is_copy = false;
>          } else {
>              temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
>              temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
>          }
>      }
> -    temps[temp].state = TCG_TEMP_UNDEF;
> +    temps[temp].is_const = false;
> +    temps[temp].is_copy = false;
>      temps[temp].mask = -1;
>  }
>  
> @@ -88,7 +84,8 @@ static void reset_all_temps(int nb_temps)
>  static void init_temp_info(TCGArg temp)
>  {
>      if (!test_bit(temp, temps_used.l)) {
> -        temps[temp].state = TCG_TEMP_UNDEF;
> +        temps[temp].is_const = false;
> +        temps[temp].is_copy = false;
>          temps[temp].mask = -1;
>          set_bit(temp, temps_used.l);
>      }
> @@ -218,7 +215,8 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
>      op->opc = new_op;
>  
>      reset_temp(dst);
> -    temps[dst].state = TCG_TEMP_CONST;
> +    temps[dst].is_const = true;
> +    temps[dst].is_copy = false;
>      temps[dst].val = val;
>      mask = val;
>      if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_movi_i32) {
> @@ -261,11 +259,13 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
>  
>      if (s->temps[src].type == s->temps[dst].type) {
>          if (!temp_is_copy(src)) {
> -            temps[src].state = TCG_TEMP_COPY;
> +            temps[src].is_const = false;
> +            temps[src].is_copy = true;
>              temps[src].next_copy = src;
>              temps[src].prev_copy = src;
>          }
> -        temps[dst].state = TCG_TEMP_COPY;
> +        temps[dst].is_const = false;
> +        temps[dst].is_copy = true;
>          temps[dst].next_copy = temps[src].next_copy;
>          temps[dst].prev_copy = src;
>          temps[temps[dst].next_copy].prev_copy = dst;

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies Aurelien Jarno
  2015-07-24 20:15   ` Richard Henderson
@ 2015-07-29 16:12   ` Alex Bennée
  2015-07-29 16:27     ` Aurelien Jarno
  1 sibling, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2015-07-29 16:12 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Richard Henderson


Aurelien Jarno <aurelien@aurel32.net> writes:

> Now that copies and constants are tracked separately, we can allow
> constant to have copies, deferring the choice to use a register or a
> constant to the register allocation pass. This prevent this kind of
> regular constant reloading:
>
> -OUT: [size=338]
> +OUT: [size=298]
>    mov    -0x4(%r14),%ebp
>    test   %ebp,%ebp
>    jne    0x7ffbe9cb0ed6
>    mov    $0x40002219f8,%rbp
>    mov    %rbp,(%r14)
> -  mov    $0x40002219f8,%rbp
>    mov    $0x4000221a20,%rbx
>    mov    %rbp,(%rbx)
>    mov    $0x4000000000,%rbp
>    mov    %rbp,(%r14)
> -  mov    $0x4000000000,%rbp
>    mov    $0x4000221d38,%rbx
>    mov    %rbp,(%rbx)
>    mov    $0x40002221a8,%rbp
>    mov    %rbp,(%r14)
> -  mov    $0x40002221a8,%rbp
>    mov    $0x4000221d40,%rbx
>    mov    %rbp,(%rbx)
>    mov    $0x4000019170,%rbp
>    mov    %rbp,(%r14)
> -  mov    $0x4000019170,%rbp
>    mov    $0x4000221d48,%rbx
>    mov    %rbp,(%rbx)
>    mov    $0x40000049ee,%rbp
>    mov    %rbp,0x80(%r14)
>    mov    %r14,%rdi
>    callq  0x7ffbe99924d0
>    mov    $0x4000001680,%rbp
>    mov    %rbp,0x30(%r14)
>    mov    0x10(%r14),%rbp
>    mov    $0x4000001680,%rbp
>    mov    %rbp,0x30(%r14)
>    mov    0x10(%r14),%rbp
>    shl    $0x20,%rbp
>    mov    (%r14),%rbx
>    mov    %ebx,%ebx
>    mov    %rbx,(%r14)
>    or     %rbx,%rbp
>    mov    %rbp,0x10(%r14)
>    mov    %rbp,0x90(%r14)
>    mov    0x60(%r14),%rbx
>    mov    %rbx,0x38(%r14)
>    mov    0x28(%r14),%rbx
>    mov    $0x4000220e60,%r12
>    mov    %rbx,(%r12)
>    mov    $0x40002219c8,%rbx
>    mov    %rbp,(%rbx)
>    mov    0x20(%r14),%rbp
>    sub    $0x8,%rbp
>    mov    $0x4000004a16,%rbx
>    mov    %rbx,0x0(%rbp)
>    mov    %rbp,0x20(%r14)
>    mov    $0x19,%ebp
>    mov    %ebp,0xa8(%r14)
>    mov    $0x4000015110,%rbp
>    mov    %rbp,0x80(%r14)
>    xor    %eax,%eax
>    jmpq   0x7ffbebcae426
>    lea    -0x5f6d72a(%rip),%rax        # 0x7ffbe3d437b3
>    jmpq   0x7ffbebcae426
>
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  tcg/optimize.c | 11 ++---------
>  1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index f16eb1e..48103b2 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -237,11 +237,6 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
>          return;
>      }
>  
> -    if (temp_is_const(src)) {
> -        tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
> -        return;
> -    }
> -

That looks suspicious; surely we only want to drop the move if we
already have the const somewhere else?

>      TCGOpcode new_op = op_to_mov(op->opc);
>      tcg_target_ulong mask;
>  
> @@ -255,17 +250,15 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
>      }
>      temps[dst].mask = mask;
>  
> -    assert(!temp_is_const(src));
> -
>      if (s->temps[src].type == s->temps[dst].type) {
>          if (!temp_is_copy(src)) {
> -            temps[src].is_const = false;
>              temps[src].is_copy = true;
>              temps[src].next_copy = src;
>              temps[src].prev_copy = src;
>          }
> -        temps[dst].is_const = false;
> +        temps[dst].is_const = temps[src].is_const;
>          temps[dst].is_copy = true;
> +        temps[dst].val = temps[src].val;
>          temps[dst].next_copy = temps[src].next_copy;
>          temps[dst].prev_copy = src;
>          temps[temps[dst].next_copy].prev_copy = dst;

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately
  2015-07-29 16:10   ` Alex Bennée
@ 2015-07-29 16:25     ` Aurelien Jarno
  0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-29 16:25 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, Richard Henderson

On 2015-07-29 17:10, Alex Bennée wrote:
> 
> Aurelien Jarno <aurelien@aurel32.net> writes:
> 
> > Use two bools to track constants and copies instead of an enum.
> 
> More of an explanation would be useful as to why, otherwise it seems to
> me we are just increasing the size of the structure (assuming the
> compiler uses the same base sizes for the bool and enum).

The reason is in the following patch, which needs constants and copies
to be tracked separately. Note that, thanks to alignment, this doesn't
increase the structure size, and that v2 of this patchset actually uses
only one bool.
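
For instance, on a 64-bit host the layout is roughly (a sketch assuming
a 4-byte enum and 1-byte bool):

    struct tcg_temp_info {
        bool is_const;          /* 1 byte, replaces a 4-byte enum      */
        bool is_copy;           /* 1 byte                              */
        uint16_t prev_copy;     /* 2 bytes                             */
        uint16_t next_copy;     /* 2 bytes + 2 bytes of padding        */
        tcg_target_ulong val;   /* 8 bytes                             */
        tcg_target_ulong mask;  /* 8 bytes -> still 24 bytes in total  */
    };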

I'll add a note about why it is needed in v3.


-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.5 02/10] tcg/optimize: add temp_is_const and temp_is_copy functions
  2015-07-29 16:01   ` Alex Bennée
@ 2015-07-29 16:25     ` Aurelien Jarno
  0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-29 16:25 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, Richard Henderson

On 2015-07-29 17:01, Alex Bennée wrote:
> 
> Aurelien Jarno <aurelien@aurel32.net> writes:
> 
> > Add two accessor functions temp_is_const and temp_is_copy, to make the
> > code more readable and make code change easier.
> >
> > Cc: Richard Henderson <rth@twiddle.net>
> > Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> > ---
> >  tcg/optimize.c | 131 ++++++++++++++++++++++++++-------------------------------
> >  1 file changed, 60 insertions(+), 71 deletions(-)
> >
> > diff --git a/tcg/optimize.c b/tcg/optimize.c
> > index 20e24b3..d2b63a4 100644
> > --- a/tcg/optimize.c
> > +++ b/tcg/optimize.c
> > @@ -52,11 +52,21 @@ struct tcg_temp_info {
> >  static struct tcg_temp_info temps[TCG_MAX_TEMPS];
> >  static TCGTempSet temps_used;
> >  
> > +static inline bool temp_is_const(TCGArg arg)
> > +{
> > +    return temps[arg].state == TCG_TEMP_CONST;
> > +}
> > +
> > +static inline bool temp_is_copy(TCGArg arg)
> > +{
> > +    return temps[arg].state == TCG_TEMP_COPY;
> > +}
> > +
> >  /* Reset TEMP's state to TCG_TEMP_UNDEF.  If TEMP only had one copy, remove
> >     the copy flag from the left temp.  */
> >  static void reset_temp(TCGArg temp)
> >  {
> > -    if (temps[temp].state == TCG_TEMP_COPY) {
> > +    if (temp_is_copy(temp)) {
> >          if (temps[temp].prev_copy == temps[temp].next_copy) {
> >              temps[temps[temp].next_copy].state = TCG_TEMP_UNDEF;
> >          } else {
> > @@ -186,8 +196,7 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
> >          return true;
> >      }
> >  
> > -    if (temps[arg1].state != TCG_TEMP_COPY
> > -        || temps[arg2].state != TCG_TEMP_COPY) {
> > +    if (!temp_is_copy(arg1) || !temp_is_copy(arg2)) {
> >          return false;
> >      }
> >  
> > @@ -230,7 +239,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
> >          return;
> >      }
> >  
> > -    if (temps[src].state == TCG_TEMP_CONST) {
> > +    if (temp_is_const(src)) {
> >          tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
> >          return;
> >      }
> > @@ -248,10 +257,10 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
> >      }
> >      temps[dst].mask = mask;
> >  
> > -    assert(temps[src].state != TCG_TEMP_CONST);
> > +    assert(!temp_is_const(src));
> >  
> >      if (s->temps[src].type == s->temps[dst].type) {
> > -        if (temps[src].state != TCG_TEMP_COPY) {
> > +        if (!temp_is_copy(src)) {
> >              temps[src].state = TCG_TEMP_COPY;
> >              temps[src].next_copy = src;
> >              temps[src].prev_copy = src;
> > @@ -488,7 +497,7 @@ static bool do_constant_folding_cond_eq(TCGCond c)
> >  static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
> >                                         TCGArg y, TCGCond c)
> >  {
> > -    if (temps[x].state == TCG_TEMP_CONST && temps[y].state == TCG_TEMP_CONST) {
> > +    if (temp_is_const(x) && temp_is_const(y)) {
> >          switch (op_bits(op)) {
> >          case 32:
> >              return do_constant_folding_cond_32(temps[x].val, temps[y].val, c);
> > @@ -499,7 +508,7 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
> >          }
> >      } else if (temps_are_copies(x, y)) {
> >          return do_constant_folding_cond_eq(c);
> > -    } else if (temps[y].state == TCG_TEMP_CONST && temps[y].val == 0) {
> > +    } else if (temp_is_const(y) && temps[y].val == 0) {
> >          switch (c) {
> >          case TCG_COND_LTU:
> >              return 0;
> > @@ -520,12 +529,10 @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
> >      TCGArg al = p1[0], ah = p1[1];
> >      TCGArg bl = p2[0], bh = p2[1];
> >  
> > -    if (temps[bl].state == TCG_TEMP_CONST
> > -        && temps[bh].state == TCG_TEMP_CONST) {
> > +    if (temp_is_const(bl) && temp_is_const(bh)) {
> >          uint64_t b = ((uint64_t)temps[bh].val << 32) | (uint32_t)temps[bl].val;
> >  
> > -        if (temps[al].state == TCG_TEMP_CONST
> > -            && temps[ah].state == TCG_TEMP_CONST) {
> > +        if (temp_is_const(al) && temp_is_const(ah)) {
> >              uint64_t a;
> >              a = ((uint64_t)temps[ah].val << 32) | (uint32_t)temps[al].val;
> >              return do_constant_folding_cond_64(a, b, c);
> > @@ -551,8 +558,8 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
> >  {
> >      TCGArg a1 = *p1, a2 = *p2;
> >      int sum = 0;
> > -    sum += temps[a1].state == TCG_TEMP_CONST;
> > -    sum -= temps[a2].state == TCG_TEMP_CONST;
> > +    sum += temp_is_const(a1);
> > +    sum -= temp_is_const(a2);
> >  
> >      /* Prefer the constant in second argument, and then the form
> >         op a, a, b, which is better handled on non-RISC hosts. */
> > @@ -567,10 +574,10 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
> >  static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
> >  {
> >      int sum = 0;
> > -    sum += temps[p1[0]].state == TCG_TEMP_CONST;
> > -    sum += temps[p1[1]].state == TCG_TEMP_CONST;
> > -    sum -= temps[p2[0]].state == TCG_TEMP_CONST;
> > -    sum -= temps[p2[1]].state == TCG_TEMP_CONST;
> > +    sum += temp_is_const(p1[0]);
> > +    sum += temp_is_const(p1[1]);
> > +    sum -= temp_is_const(p2[0]);
> > +    sum -= temp_is_const(p2[1]);
> >      if (sum > 0) {
> >          TCGArg t;
> >          t = p1[0], p1[0] = p2[0], p2[0] = t;
> > @@ -620,7 +627,7 @@ void tcg_optimize(TCGContext *s)
> >  
> >          /* Do copy propagation */
> >          for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
> > -            if (temps[args[i]].state == TCG_TEMP_COPY) {
> > +            if (temp_is_copy(args[i])) {
> >                  args[i] = find_better_copy(s, args[i]);
> >              }
> >          }
> > @@ -690,8 +697,7 @@ void tcg_optimize(TCGContext *s)
> >          CASE_OP_32_64(sar):
> >          CASE_OP_32_64(rotl):
> >          CASE_OP_32_64(rotr):
> > -            if (temps[args[1]].state == TCG_TEMP_CONST
> > -                && temps[args[1]].val == 0) {
> > +            if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
> >                  tcg_opt_gen_movi(s, op, args, args[0], 0);
> >                  continue;
> >              }
> > @@ -701,7 +707,7 @@ void tcg_optimize(TCGContext *s)
> >                  TCGOpcode neg_op;
> >                  bool have_neg;
> >  
> > -                if (temps[args[2]].state == TCG_TEMP_CONST) {
> > +                if (temp_is_const(args[2])) {
> >                      /* Proceed with possible constant folding. */
> >                      break;
> >                  }
> > @@ -715,8 +721,7 @@ void tcg_optimize(TCGContext *s)
> >                  if (!have_neg) {
> >                      break;
> >                  }
> > -                if (temps[args[1]].state == TCG_TEMP_CONST
> > -                    && temps[args[1]].val == 0) {
> > +                if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
> 
> This makes me wonder if we should have:
> 
>   temp_is_const_val(arg, val)
> 
> to wrap up these tests even more neatly

That's something that can be added, but in that case we need both
temp_is_const and temp_is_const_val as we sometimes need to test if
a temp is a constant without necessarily checking for a particular
value.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies
  2015-07-29 16:12   ` Alex Bennée
@ 2015-07-29 16:27     ` Aurelien Jarno
  2015-07-29 20:42       ` Alex Bennée
  0 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-29 16:27 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, Richard Henderson

On 2015-07-29 17:12, Alex Bennée wrote:
> 
> Aurelien Jarno <aurelien@aurel32.net> writes:
> 
> > Now that copies and constants are tracked separately, we can allow
> > constants to have copies, deferring the choice to use a register or a
> > constant to the register allocation pass. This prevents this kind of
> > regular constant reloading:
> >
> > -OUT: [size=338]
> > +OUT: [size=298]
> >    mov    -0x4(%r14),%ebp
> >    test   %ebp,%ebp
> >    jne    0x7ffbe9cb0ed6
> >    mov    $0x40002219f8,%rbp
> >    mov    %rbp,(%r14)
> > -  mov    $0x40002219f8,%rbp
> >    mov    $0x4000221a20,%rbx
> >    mov    %rbp,(%rbx)
> >    mov    $0x4000000000,%rbp
> >    mov    %rbp,(%r14)
> > -  mov    $0x4000000000,%rbp
> >    mov    $0x4000221d38,%rbx
> >    mov    %rbp,(%rbx)
> >    mov    $0x40002221a8,%rbp
> >    mov    %rbp,(%r14)
> > -  mov    $0x40002221a8,%rbp
> >    mov    $0x4000221d40,%rbx
> >    mov    %rbp,(%rbx)
> >    mov    $0x4000019170,%rbp
> >    mov    %rbp,(%r14)
> > -  mov    $0x4000019170,%rbp
> >    mov    $0x4000221d48,%rbx
> >    mov    %rbp,(%rbx)
> >    mov    $0x40000049ee,%rbp
> >    mov    %rbp,0x80(%r14)
> >    mov    %r14,%rdi
> >    callq  0x7ffbe99924d0
> >    mov    $0x4000001680,%rbp
> >    mov    %rbp,0x30(%r14)
> >    mov    0x10(%r14),%rbp
> >    mov    $0x4000001680,%rbp
> >    mov    %rbp,0x30(%r14)
> >    mov    0x10(%r14),%rbp
> >    shl    $0x20,%rbp
> >    mov    (%r14),%rbx
> >    mov    %ebx,%ebx
> >    mov    %rbx,(%r14)
> >    or     %rbx,%rbp
> >    mov    %rbp,0x10(%r14)
> >    mov    %rbp,0x90(%r14)
> >    mov    0x60(%r14),%rbx
> >    mov    %rbx,0x38(%r14)
> >    mov    0x28(%r14),%rbx
> >    mov    $0x4000220e60,%r12
> >    mov    %rbx,(%r12)
> >    mov    $0x40002219c8,%rbx
> >    mov    %rbp,(%rbx)
> >    mov    0x20(%r14),%rbp
> >    sub    $0x8,%rbp
> >    mov    $0x4000004a16,%rbx
> >    mov    %rbx,0x0(%rbp)
> >    mov    %rbp,0x20(%r14)
> >    mov    $0x19,%ebp
> >    mov    %ebp,0xa8(%r14)
> >    mov    $0x4000015110,%rbp
> >    mov    %rbp,0x80(%r14)
> >    xor    %eax,%eax
> >    jmpq   0x7ffbebcae426
> >    lea    -0x5f6d72a(%rip),%rax        # 0x7ffbe3d437b3
> >    jmpq   0x7ffbebcae426
> >
> > Cc: Richard Henderson <rth@twiddle.net>
> > Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> > ---
> >  tcg/optimize.c | 11 ++---------
> >  1 file changed, 2 insertions(+), 9 deletions(-)
> >
> > diff --git a/tcg/optimize.c b/tcg/optimize.c
> > index f16eb1e..48103b2 100644
> > --- a/tcg/optimize.c
> > +++ b/tcg/optimize.c
> > @@ -237,11 +237,6 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
> >          return;
> >      }
> >  
> > -    if (temp_is_const(src)) {
> > -        tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
> > -        return;
> > -    }
> > -
> 
> That looks suspicious, surely we only want to drop the move if we
> already have the const somewhere else?

That's actually the whole point of this patchset: to avoid converting
mov into movi for constant values and thus losing the link between
temps.

At this point in the code, the only way to know that the source
temp is a constant is when it has been set using a movi before.
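
To illustrate (a hand-written TCG op listing, not output from the patch),
the constant is only known because of the movi; with this change the mov
records tmp1 as a copy of tmp0 instead of being rewritten into a second
movi, and the register allocator later decides whether to reuse the
register or reload the constant:

  movi_i64 tmp0,$0x40002219f8    # tmp0 becomes a known constant
  mov_i64  tmp1,tmp0             # tmp1 is now tracked as a copy of tmp0
  st_i64   tmp1,env,$0x0         # at regalloc time: either reuse the
                                 # register or rematerialize the constant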

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies
  2015-07-29 16:27     ` Aurelien Jarno
@ 2015-07-29 20:42       ` Alex Bennée
  2015-07-30  7:46         ` Aurelien Jarno
  0 siblings, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2015-07-29 20:42 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Richard Henderson


Aurelien Jarno <aurelien@aurel32.net> writes:

> On 2015-07-29 17:12, Alex Bennée wrote:
>> 
>> Aurelien Jarno <aurelien@aurel32.net> writes:
>> 
>> > Now that copies and constants are tracked separately, we can allow
>> > constants to have copies, deferring the choice to use a register or a
>> > constant to the register allocation pass. This prevents this kind of
>> > regular constant reloading:
>> >
>> > -OUT: [size=338]
>> > +OUT: [size=298]
>> >    mov    -0x4(%r14),%ebp
>> >    test   %ebp,%ebp
>> >    jne    0x7ffbe9cb0ed6
>> >    mov    $0x40002219f8,%rbp
>> >    mov    %rbp,(%r14)
>> > -  mov    $0x40002219f8,%rbp
>> >    mov    $0x4000221a20,%rbx
>> >    mov    %rbp,(%rbx)
>> >    mov    $0x4000000000,%rbp
>> >    mov    %rbp,(%r14)
<snip>
>> > --- a/tcg/optimize.c
>> > +++ b/tcg/optimize.c
>> > @@ -237,11 +237,6 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
>> >          return;
>> >      }
>> >  
>> > -    if (temp_is_const(src)) {
>> > -        tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
>> > -        return;
>> > -    }
>> > -
>> 
>> That looks suspicious, surely we only want to drop the move if we
>> already have the const somewhere else?
>
> That's actually the whole point of this patchset: to avoid converting
> mov into movi for constant values and thus losing the link between
> temps.

I get that, I guess I didn't follow the previous loading of the constant
value. Maybe showing the ops in the commit message will make it clearer.

>
> At this point in the code, the only way to know that the source
> temp is a constant is when it has been set using a movi before.

OK.


-- 
Alex Bennée


* Re: [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies
  2015-07-29 20:42       ` Alex Bennée
@ 2015-07-30  7:46         ` Aurelien Jarno
  0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-30  7:46 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, Richard Henderson

On 2015-07-29 21:42, Alex Bennée wrote:
> 
> Aurelien Jarno <aurelien@aurel32.net> writes:
> 
> > On 2015-07-29 17:12, Alex Bennée wrote:
> >> 
> >> Aurelien Jarno <aurelien@aurel32.net> writes:
> >> 
> >> > Now that copies and constants are tracked separately, we can allow
> >> > constants to have copies, deferring the choice to use a register or a
> >> > constant to the register allocation pass. This prevents this kind of
> >> > regular constant reloading:
> >> >
> >> > -OUT: [size=338]
> >> > +OUT: [size=298]
> >> >    mov    -0x4(%r14),%ebp
> >> >    test   %ebp,%ebp
> >> >    jne    0x7ffbe9cb0ed6
> >> >    mov    $0x40002219f8,%rbp
> >> >    mov    %rbp,(%r14)
> >> > -  mov    $0x40002219f8,%rbp
> >> >    mov    $0x4000221a20,%rbx
> >> >    mov    %rbp,(%rbx)
> >> >    mov    $0x4000000000,%rbp
> >> >    mov    %rbp,(%r14)
> <snip>
> >> > --- a/tcg/optimize.c
> >> > +++ b/tcg/optimize.c
> >> > @@ -237,11 +237,6 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
> >> >          return;
> >> >      }
> >> >  
> >> > -    if (temp_is_const(src)) {
> >> > -        tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
> >> > -        return;
> >> > -    }
> >> > -
> >> 
> >> That looks suspicious, surely we only want to drop the move if we
> >> already have the const somewhere else?
> >
> > That's actually the whole point of this patchset: to avoid converting
> > mov into movi for constant values and thus losing the link between
> > temps.
> 
> I get that, I guess I didn't follow the previous loading of the constant
> value. Maybe showing the ops in the commit message will make it clearer.

Ok, I'll do that in v3.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH for-2.5 05/10] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 05/10] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 Aurelien Jarno
@ 2015-07-31  6:31   ` Alex Bennée
  0 siblings, 0 replies; 30+ messages in thread
From: Alex Bennée @ 2015-07-31  6:31 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Richard Henderson


Aurelien Jarno <aurelien@aurel32.net> writes:

> The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32,
> and the name in the README doesn't match the name offered to the
> frontends.

I was tempted to suggest just naming it shr_i64_i32 as the truncation
is implicit and the slightly shorter name doesn't mess the formatting up
as much. But then I see we have another usage of trunc so:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

>
> Always use the long name to make it clear it is a size changing op.
>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  tcg/README               | 2 +-
>  tcg/aarch64/tcg-target.h | 2 +-
>  tcg/i386/tcg-target.h    | 2 +-
>  tcg/ia64/tcg-target.h    | 2 +-
>  tcg/optimize.c           | 6 +++---
>  tcg/ppc/tcg-target.h     | 2 +-
>  tcg/s390/tcg-target.h    | 2 +-
>  tcg/sparc/tcg-target.c   | 4 ++--
>  tcg/sparc/tcg-target.h   | 2 +-
>  tcg/tcg-op.c             | 4 ++--
>  tcg/tcg-opc.h            | 4 ++--
>  tcg/tcg.h                | 2 +-
>  tcg/tci/tcg-target.h     | 2 +-
>  13 files changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/tcg/README b/tcg/README
> index a550ff1..61b3899 100644
> --- a/tcg/README
> +++ b/tcg/README
> @@ -314,7 +314,7 @@ This operation would be equivalent to
>  
>    dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
>  
> -* trunc_shr_i32 t0, t1, pos
> +* trunc_shr_i64_i32 t0, t1, pos
>  
>  For 64-bit hosts only, right shift the 64-bit input T1 by POS and
>  truncate to 32-bit output T0.  Depending on the host, this may be
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index 8aec04d..dfd8801 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -70,7 +70,7 @@ typedef enum {
>  #define TCG_TARGET_HAS_muls2_i32        0
>  #define TCG_TARGET_HAS_muluh_i32        0
>  #define TCG_TARGET_HAS_mulsh_i32        0
> -#define TCG_TARGET_HAS_trunc_shr_i32    0
> +#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
>  
>  #define TCG_TARGET_HAS_div_i64          1
>  #define TCG_TARGET_HAS_rem_i64          1
> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> index 25b5133..dae50ba 100644
> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -102,7 +102,7 @@ extern bool have_bmi1;
>  #define TCG_TARGET_HAS_mulsh_i32        0
>  
>  #if TCG_TARGET_REG_BITS == 64
> -#define TCG_TARGET_HAS_trunc_shr_i32    0
> +#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
>  #define TCG_TARGET_HAS_div2_i64         1
>  #define TCG_TARGET_HAS_rot_i64          1
>  #define TCG_TARGET_HAS_ext8s_i64        1
> diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
> index a04ed81..29902f9 100644
> --- a/tcg/ia64/tcg-target.h
> +++ b/tcg/ia64/tcg-target.h
> @@ -160,7 +160,7 @@ typedef enum {
>  #define TCG_TARGET_HAS_muluh_i64        0
>  #define TCG_TARGET_HAS_mulsh_i32        0
>  #define TCG_TARGET_HAS_mulsh_i64        0
> -#define TCG_TARGET_HAS_trunc_shr_i32    0
> +#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
>  
>  #define TCG_TARGET_deposit_i32_valid(ofs, len) ((len) <= 16)
>  #define TCG_TARGET_deposit_i64_valid(ofs, len) ((len) <= 16)
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 48103b2..56e0a17 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -301,7 +301,7 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
>      case INDEX_op_shr_i32:
>          return (uint32_t)x >> (y & 31);
>  
> -    case INDEX_op_trunc_shr_i32:
> +    case INDEX_op_trunc_shr_i64_i32:
>      case INDEX_op_shr_i64:
>          return (uint64_t)x >> (y & 63);
>  
> @@ -880,7 +880,7 @@ void tcg_optimize(TCGContext *s)
>              }
>              break;
>  
> -        case INDEX_op_trunc_shr_i32:
> +        case INDEX_op_trunc_shr_i64_i32:
>              mask = (uint64_t)temps[args[1]].mask >> args[2];
>              break;
>  
> @@ -1028,7 +1028,7 @@ void tcg_optimize(TCGContext *s)
>              }
>              goto do_default;
>  
> -        case INDEX_op_trunc_shr_i32:
> +        case INDEX_op_trunc_shr_i64_i32:
>              if (temp_is_const(args[1])) {
>                  tmp = do_constant_folding(opc, temps[args[1]].val, args[2]);
>                  tcg_opt_gen_movi(s, op, args, args[0], tmp);
> diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
> index 7ce7048..b7e6861 100644
> --- a/tcg/ppc/tcg-target.h
> +++ b/tcg/ppc/tcg-target.h
> @@ -77,7 +77,7 @@ typedef enum {
>  #if TCG_TARGET_REG_BITS == 64
>  #define TCG_TARGET_HAS_add2_i32         0
>  #define TCG_TARGET_HAS_sub2_i32         0
> -#define TCG_TARGET_HAS_trunc_shr_i32    0
> +#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
>  #define TCG_TARGET_HAS_div_i64          1
>  #define TCG_TARGET_HAS_rem_i64          0
>  #define TCG_TARGET_HAS_rot_i64          1
> diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
> index 91576d5..50016a8 100644
> --- a/tcg/s390/tcg-target.h
> +++ b/tcg/s390/tcg-target.h
> @@ -72,7 +72,7 @@ typedef enum TCGReg {
>  #define TCG_TARGET_HAS_muls2_i32        0
>  #define TCG_TARGET_HAS_muluh_i32        0
>  #define TCG_TARGET_HAS_mulsh_i32        0
> -#define TCG_TARGET_HAS_trunc_shr_i32    0
> +#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
>  
>  #define TCG_TARGET_HAS_div2_i64         1
>  #define TCG_TARGET_HAS_rot_i64          1
> diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
> index 1a870a8..b23032b 100644
> --- a/tcg/sparc/tcg-target.c
> +++ b/tcg/sparc/tcg-target.c
> @@ -1413,7 +1413,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>      case INDEX_op_ext32u_i64:
>          tcg_out_arithi(s, a0, a1, 0, SHIFT_SRL);
>          break;
> -    case INDEX_op_trunc_shr_i32:
> +    case INDEX_op_trunc_shr_i64_i32:
>          if (a2 == 0) {
>              tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
>          } else {
> @@ -1533,7 +1533,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
>  
>      { INDEX_op_ext32s_i64, { "R", "r" } },
>      { INDEX_op_ext32u_i64, { "R", "r" } },
> -    { INDEX_op_trunc_shr_i32,  { "r", "R" } },
> +    { INDEX_op_trunc_shr_i64_i32,  { "r", "R" } },
>  
>      { INDEX_op_brcond_i64, { "RZ", "RJ" } },
>      { INDEX_op_setcond_i64, { "R", "RZ", "RJ" } },
> diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
> index f584de4..336c47f 100644
> --- a/tcg/sparc/tcg-target.h
> +++ b/tcg/sparc/tcg-target.h
> @@ -118,7 +118,7 @@ extern bool use_vis3_instructions;
>  #define TCG_TARGET_HAS_muluh_i32        0
>  #define TCG_TARGET_HAS_mulsh_i32        0
>  
> -#define TCG_TARGET_HAS_trunc_shr_i32    1
> +#define TCG_TARGET_HAS_trunc_shr_i64_i32 1
>  #define TCG_TARGET_HAS_div_i64          1
>  #define TCG_TARGET_HAS_rem_i64          0
>  #define TCG_TARGET_HAS_rot_i64          0
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 45098c3..61b64db 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -1751,8 +1751,8 @@ void tcg_gen_trunc_shr_i64_i32(TCGv_i32 ret, TCGv_i64 arg, unsigned count)
>              tcg_gen_mov_i32(ret, TCGV_LOW(t));
>              tcg_temp_free_i64(t);
>          }
> -    } else if (TCG_TARGET_HAS_trunc_shr_i32) {
> -        tcg_gen_op3i_i32(INDEX_op_trunc_shr_i32, ret,
> +    } else if (TCG_TARGET_HAS_trunc_shr_i64_i32) {
> +        tcg_gen_op3i_i32(INDEX_op_trunc_shr_i64_i32, ret,
>                           MAKE_TCGV_I32(GET_TCGV_I64(arg)), count);
>      } else if (count == 0) {
>          tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(arg)));
> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
> index 13ccb60..4a34f43 100644
> --- a/tcg/tcg-opc.h
> +++ b/tcg/tcg-opc.h
> @@ -138,8 +138,8 @@ DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
>  DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
>  DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
>  
> -DEF(trunc_shr_i32, 1, 1, 1,
> -    IMPL(TCG_TARGET_HAS_trunc_shr_i32)
> +DEF(trunc_shr_i64_i32, 1, 1, 1,
> +    IMPL(TCG_TARGET_HAS_trunc_shr_i64_i32)
>      | (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
>  
>  DEF(brcond_i64, 0, 2, 2, TCG_OPF_BB_END | IMPL64)
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 231a781..e7e33b9 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -66,7 +66,7 @@ typedef uint64_t TCGRegSet;
>  
>  #if TCG_TARGET_REG_BITS == 32
>  /* Turn some undef macros into false macros.  */
> -#define TCG_TARGET_HAS_trunc_shr_i32    0
> +#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
>  #define TCG_TARGET_HAS_div_i64          0
>  #define TCG_TARGET_HAS_rem_i64          0
>  #define TCG_TARGET_HAS_div2_i64         0
> diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
> index cbf3f9b..8b1139b 100644
> --- a/tcg/tci/tcg-target.h
> +++ b/tcg/tci/tcg-target.h
> @@ -84,7 +84,7 @@
>  #define TCG_TARGET_HAS_mulsh_i32        0
>  
>  #if TCG_TARGET_REG_BITS == 64
> -#define TCG_TARGET_HAS_trunc_shr_i32    0
> +#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
>  #define TCG_TARGET_HAS_bswap16_i64      1
>  #define TCG_TARGET_HAS_bswap32_i64      1
>  #define TCG_TARGET_HAS_bswap64_i64      1

-- 
Alex Bennée


* Re: [Qemu-devel] [PATCH for-2.5 06/10] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 06/10] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32 Aurelien Jarno
@ 2015-07-31  7:32   ` Alex Bennée
  0 siblings, 0 replies; 30+ messages in thread
From: Alex Bennée @ 2015-07-31  7:32 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Richard Henderson


Aurelien Jarno <aurelien@aurel32.net> writes:

> The tcg_gen_trunc_shr_i64_i32 function takes a 64-bit argument and
> returns a 32-bit value. Directly call tcg_gen_op3 with the correct
> types instead of calling tcg_gen_op3i_i32 and abusing the TCG types.
>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  tcg/tcg-op.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 61b64db..0e79fd1 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -1752,8 +1752,8 @@ void tcg_gen_trunc_shr_i64_i32(TCGv_i32 ret, TCGv_i64 arg, unsigned count)
>              tcg_temp_free_i64(t);
>          }
>      } else if (TCG_TARGET_HAS_trunc_shr_i64_i32) {
> -        tcg_gen_op3i_i32(INDEX_op_trunc_shr_i64_i32, ret,
> -                         MAKE_TCGV_I32(GET_TCGV_I64(arg)), count);
> +        tcg_gen_op3(&tcg_ctx, INDEX_op_trunc_shr_i64_i32,
> +                    GET_TCGV_I32(ret), GET_TCGV_I64(arg), count);
>      } else if (count == 0) {
>          tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(arg)));
>      } else {

-- 
Alex Bennée


* Re: [Qemu-devel] [PATCH for-2.5 07/10] tcg: implement real ext_i32_i64 and extu_i32_i64 ops
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 07/10] tcg: implement real ext_i32_i64 and extu_i32_i64 ops Aurelien Jarno
@ 2015-07-31 16:01   ` Alex Bennée
  2015-07-31 16:11     ` Richard Henderson
  0 siblings, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2015-07-31 16:01 UTC (permalink / raw)
  To: Aurelien Jarno
  Cc: Claudio Fontana, Stefan Weil, Claudio Fontana, qemu-devel,
	Alexander Graf, Blue Swirl, Richard Henderson


Aurelien Jarno <aurelien@aurel32.net> writes:

> Implement real ext_i32_i64 and extu_i32_i64 ops. They ensure that a
> 32-bit value is always converted to a 64-bit value and not propagated
> through the register allocator or the optimizer.
>
> Cc: Andrzej Zaborowski <balrogg@gmail.com>
> Cc: Alexander Graf <agraf@suse.de>
> Cc: Blue Swirl <blauwirbel@gmail.com>
> Cc: Claudio Fontana <claudio.fontana@huawei.com>
> Cc: Claudio Fontana <claudio.fontana@gmail.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Stefan Weil <sw@weilnetz.de>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  tcg/aarch64/tcg-target.c |  4 ++++
>  tcg/i386/tcg-target.c    |  5 +++++
>  tcg/ia64/tcg-target.c    |  4 ++++
>  tcg/ppc/tcg-target.c     |  6 ++++++
>  tcg/s390/tcg-target.c    |  5 +++++
>  tcg/sparc/tcg-target.c   |  8 ++++++--
>  tcg/tcg-op.c             | 10 ++++------
>  tcg/tcg-opc.h            |  3 +++
>  tcg/tci/tcg-target.c     |  4 ++++
>  tci.c                    |  6 ++++--
>  10 files changed, 45 insertions(+), 10 deletions(-)
>
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index b7ec4f5..7f7ab7e 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -1556,6 +1556,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>      case INDEX_op_ext16s_i32:
>          tcg_out_sxt(s, ext, MO_16, a0, a1);
>          break;
> +    case INDEX_op_ext_i32_i64:
>      case INDEX_op_ext32s_i64:
>          tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a1);
>          break;
> @@ -1567,6 +1568,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>      case INDEX_op_ext16u_i32:
>          tcg_out_uxt(s, MO_16, a0, a1);
>          break;
> +    case INDEX_op_extu_i32_i64:
>      case INDEX_op_ext32u_i64:
>          tcg_out_movr(s, TCG_TYPE_I32, a0, a1);

So what is the difference between extu_i32_i64 and ext32u_i64? The
README skips over this particular part of the naming convention and I
wonder if we should be clearer about that before we add more ops.

>          break;
> @@ -1712,6 +1714,8 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
>      { INDEX_op_ext8u_i64, { "r", "r" } },
>      { INDEX_op_ext16u_i64, { "r", "r" } },
>      { INDEX_op_ext32u_i64, { "r", "r" } },
> +    { INDEX_op_ext_i32_i64, { "r", "r" } },
> +    { INDEX_op_extu_i32_i64, { "r", "r" } },
>  
>      { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
>      { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
> diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
> index 4f40468..ff55499 100644
> --- a/tcg/i386/tcg-target.c
> +++ b/tcg/i386/tcg-target.c
> @@ -2068,9 +2068,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>      case INDEX_op_bswap64_i64:
>          tcg_out_bswap64(s, args[0]);
>          break;
> +    case INDEX_op_extu_i32_i64:
>      case INDEX_op_ext32u_i64:
>          tcg_out_ext32u(s, args[0], args[1]);
>          break;
> +    case INDEX_op_ext_i32_i64:
>      case INDEX_op_ext32s_i64:
>          tcg_out_ext32s(s, args[0], args[1]);
>          break;
> @@ -2205,6 +2207,9 @@ static const TCGTargetOpDef x86_op_defs[] = {
>      { INDEX_op_ext16u_i64, { "r", "r" } },
>      { INDEX_op_ext32u_i64, { "r", "r" } },
>  
> +    { INDEX_op_ext_i32_i64, { "r", "r" } },
> +    { INDEX_op_extu_i32_i64, { "r", "r" } },
> +
>      { INDEX_op_deposit_i64, { "Q", "0", "Q" } },
>      { INDEX_op_movcond_i64, { "r", "r", "re", "r", "0" } },
>  
> diff --git a/tcg/ia64/tcg-target.c b/tcg/ia64/tcg-target.c
> index 81cb9f7..71e79cf 100644
> --- a/tcg/ia64/tcg-target.c
> +++ b/tcg/ia64/tcg-target.c
> @@ -2148,9 +2148,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>      case INDEX_op_ext16u_i64:
>          tcg_out_ext(s, OPC_ZXT2_I29, args[0], args[1]);
>          break;
> +    case INDEX_op_ext_i32_i64:
>      case INDEX_op_ext32s_i64:
>          tcg_out_ext(s, OPC_SXT4_I29, args[0], args[1]);
>          break;
> +    case INDEX_op_extu_i32_i64:
>      case INDEX_op_ext32u_i64:
>          tcg_out_ext(s, OPC_ZXT4_I29, args[0], args[1]);
>          break;
> @@ -2301,6 +2303,8 @@ static const TCGTargetOpDef ia64_op_defs[] = {
>      { INDEX_op_ext16u_i64, { "r", "rZ"} },
>      { INDEX_op_ext32s_i64, { "r", "rZ"} },
>      { INDEX_op_ext32u_i64, { "r", "rZ"} },
> +    { INDEX_op_ext_i32_i64, { "r", "rZ" } },
> +    { INDEX_op_extu_i32_i64, { "r", "rZ" } },
>  
>      { INDEX_op_bswap16_i64, { "r", "rZ" } },
>      { INDEX_op_bswap32_i64, { "r", "rZ" } },
> diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
> index ce8d546..1672220 100644
> --- a/tcg/ppc/tcg-target.c
> +++ b/tcg/ppc/tcg-target.c
> @@ -2221,12 +2221,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>      case INDEX_op_ext16s_i64:
>          c = EXTSH;
>          goto gen_ext;
> +    case INDEX_op_ext_i32_i64:
>      case INDEX_op_ext32s_i64:
>          c = EXTSW;
>          goto gen_ext;
>      gen_ext:
>          tcg_out32(s, c | RS(args[1]) | RA(args[0]));
>          break;
> +    case INDEX_op_extu_i32_i64:
> +        tcg_out_ext32u(s, args[0], args[1]);
> +        break;
>  
>      case INDEX_op_setcond_i32:
>          tcg_out_setcond(s, TCG_TYPE_I32, args[3], args[0], args[1], args[2],
> @@ -2503,6 +2507,8 @@ static const TCGTargetOpDef ppc_op_defs[] = {
>      { INDEX_op_ext8s_i64, { "r", "r" } },
>      { INDEX_op_ext16s_i64, { "r", "r" } },
>      { INDEX_op_ext32s_i64, { "r", "r" } },
> +    { INDEX_op_ext_i32_i64, { "r", "r" } },
> +    { INDEX_op_extu_i32_i64, { "r", "r" } },

Again I'm getting confused by the naming here - I would read both of
these as widening from 32-bit to 64-bit registers without sign extension.


>      { INDEX_op_bswap16_i64, { "r", "r" } },
>      { INDEX_op_bswap32_i64, { "r", "r" } },
>      { INDEX_op_bswap64_i64, { "r", "r" } },
> diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
> index b3433ce..d4db6d3 100644
> --- a/tcg/s390/tcg-target.c
> +++ b/tcg/s390/tcg-target.c
> @@ -2106,6 +2106,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>      case INDEX_op_ext16s_i64:
>          tgen_ext16s(s, TCG_TYPE_I64, args[0], args[1]);
>          break;
> +    case INDEX_op_ext_i32_i64:
>      case INDEX_op_ext32s_i64:
>          tgen_ext32s(s, args[0], args[1]);
>          break;
> @@ -2115,6 +2116,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>      case INDEX_op_ext16u_i64:
>          tgen_ext16u(s, TCG_TYPE_I64, args[0], args[1]);
>          break;
> +    case INDEX_op_extu_i32_i64:
>      case INDEX_op_ext32u_i64:
>          tgen_ext32u(s, args[0], args[1]);
>          break;
> @@ -2267,6 +2269,9 @@ static const TCGTargetOpDef s390_op_defs[] = {
>      { INDEX_op_ext32s_i64, { "r", "r" } },
>      { INDEX_op_ext32u_i64, { "r", "r" } },
>  
> +    { INDEX_op_ext_i32_i64, { "r", "r" } },
> +    { INDEX_op_extu_i32_i64, { "r", "r" } },
> +
>      { INDEX_op_bswap16_i64, { "r", "r" } },
>      { INDEX_op_bswap32_i64, { "r", "r" } },
>      { INDEX_op_bswap64_i64, { "r", "r" } },
> diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
> index b23032b..fe75af0 100644
> --- a/tcg/sparc/tcg-target.c
> +++ b/tcg/sparc/tcg-target.c
> @@ -1407,9 +1407,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>      case INDEX_op_divu_i64:
>          c = ARITH_UDIVX;
>          goto gen_arith;
> +    case INDEX_op_ext_i32_i64:
>      case INDEX_op_ext32s_i64:
>          tcg_out_arithi(s, a0, a1, 0, SHIFT_SRA);
>          break;
> +    case INDEX_op_extu_i32_i64:
>      case INDEX_op_ext32u_i64:
>          tcg_out_arithi(s, a0, a1, 0, SHIFT_SRL);
>          break;
> @@ -1531,8 +1533,10 @@ static const TCGTargetOpDef sparc_op_defs[] = {
>      { INDEX_op_neg_i64, { "R", "RJ" } },
>      { INDEX_op_not_i64, { "R", "RJ" } },
>  
> -    { INDEX_op_ext32s_i64, { "R", "r" } },
> -    { INDEX_op_ext32u_i64, { "R", "r" } },
> +    { INDEX_op_ext32s_i64, { "R", "R" } },
> +    { INDEX_op_ext32u_i64, { "R", "R" } },
> +    { INDEX_op_ext_i32_i64, { "R", "r" } },
> +    { INDEX_op_extu_i32_i64, { "R", "r" } },
>      { INDEX_op_trunc_shr_i64_i32,  { "r", "R" } },
>  
>      { INDEX_op_brcond_i64, { "RZ", "RJ" } },
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 0e79fd1..7114315 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -1770,9 +1770,8 @@ void tcg_gen_extu_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
>          tcg_gen_mov_i32(TCGV_LOW(ret), arg);
>          tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
>      } else {
> -        /* Note: we assume the target supports move between
> -           32 and 64 bit registers.  */
> -        tcg_gen_ext32u_i64(ret, MAKE_TCGV_I64(GET_TCGV_I32(arg)));
> +        tcg_gen_op2(&tcg_ctx, INDEX_op_extu_i32_i64,
> +                    GET_TCGV_I64(ret), GET_TCGV_I32(arg));
>      }
>  }
>  
> @@ -1782,9 +1781,8 @@ void tcg_gen_ext_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
>          tcg_gen_mov_i32(TCGV_LOW(ret), arg);
>          tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
>      } else {
> -        /* Note: we assume the target supports move between
> -           32 and 64 bit registers.  */
> -        tcg_gen_ext32s_i64(ret, MAKE_TCGV_I64(GET_TCGV_I32(arg)));
> +        tcg_gen_op2(&tcg_ctx, INDEX_op_ext_i32_i64,
> +                    GET_TCGV_I64(ret), GET_TCGV_I32(arg));
>      }
>  }
>  
> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
> index 4a34f43..f721a5a 100644
> --- a/tcg/tcg-opc.h
> +++ b/tcg/tcg-opc.h
> @@ -138,6 +138,9 @@ DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
>  DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
>  DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
>  
> +/* size changing ops */
> +DEF(ext_i32_i64, 1, 1, 0, IMPL64)
> +DEF(extu_i32_i64, 1, 1, 0, IMPL64)
>  DEF(trunc_shr_i64_i32, 1, 1, 1,
>      IMPL(TCG_TARGET_HAS_trunc_shr_i64_i32)
>      | (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
> diff --git a/tcg/tci/tcg-target.c b/tcg/tci/tcg-target.c
> index 83472db..bbb54d4 100644
> --- a/tcg/tci/tcg-target.c
> +++ b/tcg/tci/tcg-target.c
> @@ -210,6 +210,8 @@ static const TCGTargetOpDef tcg_target_op_defs[] = {
>  #if TCG_TARGET_HAS_ext32u_i64
>      { INDEX_op_ext32u_i64, { R, R } },
>  #endif
> +    { INDEX_op_ext_i32_i64, { R, R } },
> +    { INDEX_op_extu_i32_i64, { R, R } },
>  #if TCG_TARGET_HAS_bswap16_i64
>      { INDEX_op_bswap16_i64, { R, R } },
>  #endif
> @@ -701,6 +703,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>      case INDEX_op_ext16u_i64:   /* Optional (TCG_TARGET_HAS_ext16u_i64). */
>      case INDEX_op_ext32s_i64:   /* Optional (TCG_TARGET_HAS_ext32s_i64). */
>      case INDEX_op_ext32u_i64:   /* Optional (TCG_TARGET_HAS_ext32u_i64). */
> +    case INDEX_op_ext_i32_i64:
> +    case INDEX_op_extu_i32_i64:
>  #endif /* TCG_TARGET_REG_BITS == 64 */
>      case INDEX_op_neg_i32:      /* Optional (TCG_TARGET_HAS_neg_i32). */
>      case INDEX_op_not_i32:      /* Optional (TCG_TARGET_HAS_not_i32). */
> diff --git a/tci.c b/tci.c
> index 8444948..3d6d177 100644
> --- a/tci.c
> +++ b/tci.c
> @@ -1033,18 +1033,20 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
>  #endif
>  #if TCG_TARGET_HAS_ext32s_i64
>          case INDEX_op_ext32s_i64:
> +#endif
> +        case INDEX_op_ext_i32_i64:
>              t0 = *tb_ptr++;
>              t1 = tci_read_r32s(&tb_ptr);
>              tci_write_reg64(t0, t1);
>              break;
> -#endif
>  #if TCG_TARGET_HAS_ext32u_i64
>          case INDEX_op_ext32u_i64:
> +#endif
> +        case INDEX_op_extu_i32_i64:
>              t0 = *tb_ptr++;
>              t1 = tci_read_r32(&tb_ptr);
>              tci_write_reg64(t0, t1);
>              break;
> -#endif
>  #if TCG_TARGET_HAS_bswap16_i64
>          case INDEX_op_bswap16_i64:
>              TODO();

-- 
Alex Bennée


* Re: [Qemu-devel] [PATCH for-2.5 10/10] tcg: update README about size changing ops
  2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 10/10] tcg: update README about size changing ops Aurelien Jarno
@ 2015-07-31 16:02   ` Alex Bennée
  0 siblings, 0 replies; 30+ messages in thread
From: Alex Bennée @ 2015-07-31 16:02 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Richard Henderson


Aurelien Jarno <aurelien@aurel32.net> writes:

> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  tcg/README | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/tcg/README b/tcg/README
> index 61b3899..a22f251 100644
> --- a/tcg/README
> +++ b/tcg/README
> @@ -466,13 +466,25 @@ On a 32 bit target, all 64 bit operations are converted to 32 bits. A
>  few specific operations must be implemented to allow it (see add2_i32,
>  sub2_i32, brcond2_i32).
>  
> +On a 64 bit target, the values are transferred between 32 and 64-bit
> +registers using the following ops:
> +- trunc_shr_i64_i32
> +- ext_i32_i64
> +- extu_i32_i64
> +
> +They ensure that the values are correctly truncated or extended when
> +moved from a 32-bit to a 64-bit register or vice-versa. Note that the
> +trunc_shr_i64_i32 is an optional op. It is not necessary to implement
> +it if all the following conditions are met:
> +- 64-bit registers can hold 32-bit values
> +- 32-bit values in a 64-bit register do not need to stay zero or
> +  sign extended
> +- all 32-bit TCG ops ignore the high part of 64-bit registers
> +
>  Floating point operations are not supported in this version. A
>  previous incarnation of the code generator had full support of them,
>  but it is better to concentrate on integer operations first.
>  
> -On a 64 bit target, no assumption is made in TCG about the storage of
> -the 32 bit values in 64 bit registers.
> -
>  4.2) Constraints
>  
>  GCC like constraints are used to define the constraints of every

I would include patches to the README with the respective patches that
introduced the changes. It is basically an extended comment ;-)

-- 
Alex Bennée


* Re: [Qemu-devel] [PATCH for-2.5 07/10] tcg: implement real ext_i32_i64 and extu_i32_i64 ops
  2015-07-31 16:01   ` Alex Bennée
@ 2015-07-31 16:11     ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2015-07-31 16:11 UTC (permalink / raw)
  To: Alex Bennée, Aurelien Jarno
  Cc: Claudio Fontana, Stefan Weil, Claudio Fontana, qemu-devel,
	Alexander Graf, Blue Swirl

On 07/31/2015 09:01 AM, Alex Bennée wrote:
>
> Aurelien Jarno <aurelien@aurel32.net> writes:
>
>> Implement real ext_i32_i64 and extu_i32_i64 ops. They ensure that a
>> 32-bit value is always converted to a 64-bit value and not propagated
>> through the register allocator or the optimizer.
>>
>> Cc: Andrzej Zaborowski <balrogg@gmail.com>
>> Cc: Alexander Graf <agraf@suse.de>
>> Cc: Blue Swirl <blauwirbel@gmail.com>
>> Cc: Claudio Fontana <claudio.fontana@huawei.com>
>> Cc: Claudio Fontana <claudio.fontana@gmail.com>
>> Cc: Richard Henderson <rth@twiddle.net>
>> Cc: Stefan Weil <sw@weilnetz.de>
>> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
>> ---
>>   tcg/aarch64/tcg-target.c |  4 ++++
>>   tcg/i386/tcg-target.c    |  5 +++++
>>   tcg/ia64/tcg-target.c    |  4 ++++
>>   tcg/ppc/tcg-target.c     |  6 ++++++
>>   tcg/s390/tcg-target.c    |  5 +++++
>>   tcg/sparc/tcg-target.c   |  8 ++++++--
>>   tcg/tcg-op.c             | 10 ++++------
>>   tcg/tcg-opc.h            |  3 +++
>>   tcg/tci/tcg-target.c     |  4 ++++
>>   tci.c                    |  6 ++++--
>>   10 files changed, 45 insertions(+), 10 deletions(-)
>>
>> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
>> index b7ec4f5..7f7ab7e 100644
>> --- a/tcg/aarch64/tcg-target.c
>> +++ b/tcg/aarch64/tcg-target.c
>> @@ -1556,6 +1556,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>>       case INDEX_op_ext16s_i32:
>>           tcg_out_sxt(s, ext, MO_16, a0, a1);
>>           break;
>> +    case INDEX_op_ext_i32_i64:
>>       case INDEX_op_ext32s_i64:
>>           tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a1);
>>           break;
>> @@ -1567,6 +1568,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>>       case INDEX_op_ext16u_i32:
>>           tcg_out_uxt(s, MO_16, a0, a1);
>>           break;
>> +    case INDEX_op_extu_i32_i64:
>>       case INDEX_op_ext32u_i64:
>>           tcg_out_movr(s, TCG_TYPE_I32, a0, a1);
>
> So what is the difference between extu_i32_i64 and ext32u_i64? The
> README skips over this particular part of the naming convention and I
> wonder if we should be clearer about that before we add more ops.

The size of the input, for one.  The possibility of eliding is another.

Our current plan for x86_64 and aarch64 is to canonicalize all 32-bit
quantities to be zero-extended in the register, primarily because for both
platforms this can be done for free.  Thus exts_i32_i64 and extrl_i64_i32
will require implementation, but extu_i32_i64 will be replaced by a move.

Similarly, mips64 would keep values sign-extended (as *required* by the 
standard for all 32-bit operations), and thus exts_i32_i64 would be replaced by 
a move.

Other targets will probably make extrl_i64_i32 be the move, since the 32-bit 
ops don't re-extend for free.
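
For illustration, the difference shows up in the declared input type at
the tcg-op level (a sketch using the existing generated helpers; the
function and variable names are placeholders):

  static void example(TCGv_i64 ret64, TCGv_i64 arg64, TCGv_i32 arg32)
  {
      /* ext32u_i64: a 64-bit op; input and output are both TCGv_i64,
         and the high 32 bits of the input are simply cleared. */
      tcg_gen_ext32u_i64(ret64, arg64);

      /* extu_i32_i64: an explicit size-changing op; the input is a
         TCGv_i32, so the optimizer knows a genuine 32->64 conversion
         happens here and can track (or elide) it. */
      tcg_gen_extu_i32_i64(ret64, arg32);
  }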


r~


end of thread (newest message: 2015-07-31 16:12 UTC)

Thread overview: 30+ messages
2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 01/10] tcg/optimize: optimize temps tracking Aurelien Jarno
2015-07-27  8:21   ` Paolo Bonzini
2015-07-27  9:09     ` Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 02/10] tcg/optimize: add temp_is_const and temp_is_copy functions Aurelien Jarno
2015-07-29 16:01   ` Alex Bennée
2015-07-29 16:25     ` Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately Aurelien Jarno
2015-07-27  8:25   ` Paolo Bonzini
2015-07-27  9:10     ` Aurelien Jarno
2015-07-29 16:10   ` Alex Bennée
2015-07-29 16:25     ` Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies Aurelien Jarno
2015-07-24 20:15   ` Richard Henderson
2015-07-24 22:56     ` Aurelien Jarno
2015-07-29 16:12   ` Alex Bennée
2015-07-29 16:27     ` Aurelien Jarno
2015-07-29 20:42       ` Alex Bennée
2015-07-30  7:46         ` Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 05/10] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 Aurelien Jarno
2015-07-31  6:31   ` Alex Bennée
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 06/10] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32 Aurelien Jarno
2015-07-31  7:32   ` Alex Bennée
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 07/10] tcg: implement real ext_i32_i64 and extu_i32_i64 ops Aurelien Jarno
2015-07-31 16:01   ` Alex Bennée
2015-07-31 16:11     ` Richard Henderson
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 08/10] tcg/optimize: add optimizations for " Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 09/10] tcg/optimize: do not remember garbage high bits for 32-bit ops Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 10/10] tcg: update README about size changing ops Aurelien Jarno
2015-07-31 16:02   ` Alex Bennée
