* [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements
@ 2012-10-02 18:32 Richard Henderson
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 01/10] tcg: Split out swap_commutative as a subroutine Richard Henderson
                   ` (10 more replies)
  0 siblings, 11 replies; 30+ messages in thread
From: Richard Henderson @ 2012-10-02 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Changes v1->v2:

* Patch 1 changes the exact swap condition.  This helps add2 in cases like

    add2 tmp4,tmp5,tmp4,tmp5,c1,c2

  where tmp5, c1, and c2 are all input constants.  Since tmp4 is variable,
  we cannot constant fold this.  But the existing swap condition would give

    add2 tmp4,tmp5,tmp4,c2,c1,tmp5

  While not incorrect, we do want to prefer "adc $c2,tmp5" on i686.

* Patch 2 drops the partial constant folding for add2/sub2.  It only
  does the operand ordering for add2.

* Patch 4 is new.  While writing the code for brcond2 et al., it seemed
  silly to do all the gen_args[N] = args[N] copying by hand.  I think the
  patch makes the code more readable.

* Patch 5 fixes the operand typo that Aurelien noticed.

* Patch 8 is new, adding the extra nop into the opcode stream that
  was suggested on the list.  With this we fully constant fold add2/sub2.

* Patch 9 is new.  While looking at dumps from an x86_64 bios boot, I
  noticed that sequences of push/pop insns leave the high part of %rsp
  dead, as does, in general, any 32-bit addition whose high part isn't
  "consumed" by cc_dst.

* Patch 10 is new, treating mulu2 similarly to add2.  It triggers frequently
  during the boot of seabios, and should not be expensive.


r~


Richard Henderson (10):
  tcg: Split out swap_commutative as a subroutine
  tcg: Canonicalize add2 operand ordering
  tcg: Swap commutative double-word comparisons
  tcg: Use common code when failing to optimize
  tcg: Optimize double-word comparisons against zero
  tcg: Split out subroutines from do_constant_folding_cond
  tcg: Do constant folding on double-word comparisons
  tcg: Constant fold add2 and sub2
  tcg: Optimize half-dead add2/sub2
  tcg: Optimize mulu2

 tcg/optimize.c | 465 ++++++++++++++++++++++++++++++++++++++-------------------
 tcg/tcg-op.h   |  11 ++
 tcg/tcg.c      |  53 ++++++-
 3 files changed, 377 insertions(+), 152 deletions(-)

-- 
1.7.11.4


* [Qemu-devel] [PATCH 01/10] tcg: Split out swap_commutative as a subroutine
  2012-10-02 18:32 [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements Richard Henderson
@ 2012-10-02 18:32 ` Richard Henderson
  2012-10-09 15:13   ` Aurelien Jarno
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 02/10] tcg: Canonicalize add2 operand ordering Richard Henderson
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2012-10-02 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Reduces code duplication and prefers

  movcond d, c1, c2, const, s
to
  movcond d, c1, c2, s, const

It also prefers

  add r, r, c
over
  add r, c, r

when both inputs are known constants.  This doesn't matter for plain add,
as we will fully constant fold that.  But it matters for a follow-on patch
that uses this routine for add2, which may not be fully foldable.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 56 ++++++++++++++++++++++++--------------------------------
 1 file changed, 24 insertions(+), 32 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 35532a1..5e0504a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -382,6 +382,23 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
     tcg_abort();
 }
 
+static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
+{
+    TCGArg a1 = *p1, a2 = *p2;
+    int sum = 0;
+    sum += temps[a1].state == TCG_TEMP_CONST;
+    sum -= temps[a2].state == TCG_TEMP_CONST;
+
+    /* Prefer the constant in second argument, and then the form
+       op a, a, b, which is better handled on non-RISC hosts. */
+    if (sum > 0 || (sum == 0 && dest == a2)) {
+        *p1 = a2;
+        *p2 = a1;
+        return true;
+    }
+    return false;
+}
+
 /* Propagate constants and copies, fold constant expressions. */
 static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                                     TCGArg *args, TCGOpDef *tcg_op_defs)
@@ -391,7 +408,6 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
     const TCGOpDef *def;
     TCGArg *gen_args;
     TCGArg tmp;
-    TCGCond cond;
 
     /* Array VALS has an element for each temp.
        If this temp holds a constant then its value is kept in VALS' element.
@@ -434,52 +450,28 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         CASE_OP_32_64(eqv):
         CASE_OP_32_64(nand):
         CASE_OP_32_64(nor):
-            /* Prefer the constant in second argument, and then the form
-               op a, a, b, which is better handled on non-RISC hosts. */
-            if (temps[args[1]].state == TCG_TEMP_CONST || (args[0] == args[2]
-                && temps[args[2]].state != TCG_TEMP_CONST)) {
-                tmp = args[1];
-                args[1] = args[2];
-                args[2] = tmp;
-            }
+            swap_commutative(args[0], &args[1], &args[2]);
             break;
         CASE_OP_32_64(brcond):
-            if (temps[args[0]].state == TCG_TEMP_CONST
-                && temps[args[1]].state != TCG_TEMP_CONST) {
-                tmp = args[0];
-                args[0] = args[1];
-                args[1] = tmp;
+            if (swap_commutative(-1, &args[0], &args[1])) {
                 args[2] = tcg_swap_cond(args[2]);
             }
             break;
         CASE_OP_32_64(setcond):
-            if (temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[2]].state != TCG_TEMP_CONST) {
-                tmp = args[1];
-                args[1] = args[2];
-                args[2] = tmp;
+            if (swap_commutative(args[0], &args[1], &args[2])) {
                 args[3] = tcg_swap_cond(args[3]);
             }
             break;
         CASE_OP_32_64(movcond):
-            cond = args[5];
-            if (temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[2]].state != TCG_TEMP_CONST) {
-                tmp = args[1];
-                args[1] = args[2];
-                args[2] = tmp;
-                cond = tcg_swap_cond(cond);
+            if (swap_commutative(-1, &args[1], &args[2])) {
+                args[5] = tcg_swap_cond(args[5]);
             }
             /* For movcond, we canonicalize the "false" input reg to match
                the destination reg so that the tcg backend can implement
                a "move if true" operation.  */
-            if (args[0] == args[3]) {
-                tmp = args[3];
-                args[3] = args[4];
-                args[4] = tmp;
-                cond = tcg_invert_cond(cond);
+            if (swap_commutative(args[0], &args[4], &args[3])) {
+                args[5] = tcg_invert_cond(args[5]);
             }
-            args[5] = cond;
         default:
             break;
         }
-- 
1.7.11.4


* [Qemu-devel] [PATCH 02/10] tcg: Canonicalize add2 operand ordering
  2012-10-02 18:32 [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements Richard Henderson
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 01/10] tcg: Split out swap_commutative as a subroutine Richard Henderson
@ 2012-10-02 18:32 ` Richard Henderson
  2012-10-09 15:14   ` Aurelien Jarno
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 03/10] tcg: Swap commutative double-word comparisons Richard Henderson
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2012-10-02 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 5e0504a..3539826 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -472,6 +472,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             if (swap_commutative(args[0], &args[4], &args[3])) {
                 args[5] = tcg_invert_cond(args[5]);
             }
+            break;
+        case INDEX_op_add2_i32:
+            swap_commutative(args[0], &args[2], &args[4]);
+            swap_commutative(args[1], &args[3], &args[5]);
+            break;
         default:
             break;
         }
-- 
1.7.11.4


* [Qemu-devel] [PATCH 03/10] tcg: Swap commutative double-word comparisons
  2012-10-02 18:32 [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements Richard Henderson
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 01/10] tcg: Split out swap_commutative as a subroutine Richard Henderson
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 02/10] tcg: Canonicalize add2 operand ordering Richard Henderson
@ 2012-10-02 18:32 ` Richard Henderson
  2012-10-09 15:16   ` Aurelien Jarno
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 04/10] tcg: Use common code when failing to optimize Richard Henderson
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2012-10-02 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 3539826..a713513 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -399,6 +399,22 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
     return false;
 }
 
+static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
+{
+    int sum = 0;
+    sum += temps[p1[0]].state == TCG_TEMP_CONST;
+    sum += temps[p1[1]].state == TCG_TEMP_CONST;
+    sum -= temps[p2[0]].state == TCG_TEMP_CONST;
+    sum -= temps[p2[1]].state == TCG_TEMP_CONST;
+    if (sum > 0) {
+        TCGArg t;
+        t = p1[0], p1[0] = p2[0], p2[0] = t;
+        t = p1[1], p1[1] = p2[1], p2[1] = t;
+        return true;
+    }
+    return false;
+}
+
 /* Propagate constants and copies, fold constant expressions. */
 static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                                     TCGArg *args, TCGOpDef *tcg_op_defs)
@@ -477,6 +493,16 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             swap_commutative(args[0], &args[2], &args[4]);
             swap_commutative(args[1], &args[3], &args[5]);
             break;
+        case INDEX_op_brcond2_i32:
+            if (swap_commutative2(&args[0], &args[2])) {
+                args[4] = tcg_swap_cond(args[4]);
+            }
+            break;
+        case INDEX_op_setcond2_i32:
+            if (swap_commutative2(&args[1], &args[3])) {
+                args[5] = tcg_swap_cond(args[5]);
+            }
+            break;
         default:
             break;
         }
-- 
1.7.11.4


* [Qemu-devel] [PATCH 04/10] tcg: Use common code when failing to optimize
  2012-10-02 18:32 [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements Richard Henderson
                   ` (2 preceding siblings ...)
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 03/10] tcg: Swap commutative double-word comparisons Richard Henderson
@ 2012-10-02 18:32 ` Richard Henderson
  2012-10-09 15:25   ` Aurelien Jarno
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 05/10] tcg: Optimize double-word comparisons against zero Richard Henderson
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2012-10-02 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

This saves a whole lot of repetitive code sequences.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 91 +++++++++++++++++++++-------------------------------------
 1 file changed, 32 insertions(+), 59 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index a713513..592d166 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -639,6 +639,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             gen_args += 2;
             args += 2;
             break;
+
         CASE_OP_32_64(not):
         CASE_OP_32_64(neg):
         CASE_OP_32_64(ext8s):
@@ -651,14 +652,12 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 gen_opc_buf[op_index] = op_to_movi(op);
                 tmp = do_constant_folding(op, temps[args[1]].val, 0);
                 tcg_opt_gen_movi(gen_args, args[0], tmp);
-            } else {
-                reset_temp(args[0]);
-                gen_args[0] = args[0];
-                gen_args[1] = args[1];
+                gen_args += 2;
+                args += 2;
+                break;
             }
-            gen_args += 2;
-            args += 2;
-            break;
+            goto do_default;
+
         CASE_OP_32_64(add):
         CASE_OP_32_64(sub):
         CASE_OP_32_64(mul):
@@ -682,15 +681,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                                           temps[args[2]].val);
                 tcg_opt_gen_movi(gen_args, args[0], tmp);
                 gen_args += 2;
-            } else {
-                reset_temp(args[0]);
-                gen_args[0] = args[0];
-                gen_args[1] = args[1];
-                gen_args[2] = args[2];
-                gen_args += 3;
+                args += 3;
+                break;
             }
-            args += 3;
-            break;
+            goto do_default;
+
         CASE_OP_32_64(deposit):
             if (temps[args[1]].state == TCG_TEMP_CONST
                 && temps[args[2]].state == TCG_TEMP_CONST) {
@@ -700,33 +695,22 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                       | ((temps[args[2]].val & tmp) << args[3]);
                 tcg_opt_gen_movi(gen_args, args[0], tmp);
                 gen_args += 2;
-            } else {
-                reset_temp(args[0]);
-                gen_args[0] = args[0];
-                gen_args[1] = args[1];
-                gen_args[2] = args[2];
-                gen_args[3] = args[3];
-                gen_args[4] = args[4];
-                gen_args += 5;
+                args += 5;
+                break;
             }
-            args += 5;
-            break;
+            goto do_default;
+
         CASE_OP_32_64(setcond):
             tmp = do_constant_folding_cond(op, args[1], args[2], args[3]);
             if (tmp != 2) {
                 gen_opc_buf[op_index] = op_to_movi(op);
                 tcg_opt_gen_movi(gen_args, args[0], tmp);
                 gen_args += 2;
-            } else {
-                reset_temp(args[0]);
-                gen_args[0] = args[0];
-                gen_args[1] = args[1];
-                gen_args[2] = args[2];
-                gen_args[3] = args[3];
-                gen_args += 4;
+                args += 4;
+                break;
             }
-            args += 4;
-            break;
+            goto do_default;
+
         CASE_OP_32_64(brcond):
             tmp = do_constant_folding_cond(op, args[0], args[1], args[2]);
             if (tmp != 2) {
@@ -738,17 +722,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 } else {
                     gen_opc_buf[op_index] = INDEX_op_nop;
                 }
-            } else {
-                memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
-                reset_temp(args[0]);
-                gen_args[0] = args[0];
-                gen_args[1] = args[1];
-                gen_args[2] = args[2];
-                gen_args[3] = args[3];
-                gen_args += 4;
+                args += 4;
+                break;
             }
-            args += 4;
-            break;
+            goto do_default;
+
         CASE_OP_32_64(movcond):
             tmp = do_constant_folding_cond(op, args[1], args[2], args[5]);
             if (tmp != 2) {
@@ -763,18 +741,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                     tcg_opt_gen_mov(s, gen_args, args[0], args[4-tmp]);
                     gen_args += 2;
                 }
-            } else {
-                reset_temp(args[0]);
-                gen_args[0] = args[0];
-                gen_args[1] = args[1];
-                gen_args[2] = args[2];
-                gen_args[3] = args[3];
-                gen_args[4] = args[4];
-                gen_args[5] = args[5];
-                gen_args += 6;
+                args += 6;
+                break;
             }
-            args += 6;
-            break;
+            goto do_default;
+
         case INDEX_op_call:
             nb_call_args = (args[0] >> 16) + (args[0] & 0xffff);
             if (!(args[nb_call_args + 1] & (TCG_CALL_CONST | TCG_CALL_PURE))) {
@@ -793,11 +764,13 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                 i--;
             }
             break;
+
         default:
-            /* Default case: we do know nothing about operation so no
-               propagation is done.  We trash everything if the operation
-               is the end of a basic block, otherwise we only trash the
-               output args.  */
+        do_default:
+            /* Default case: we know nothing about operation (or were unable
+               to compute the operation result) so no propagation is done.
+               We trash everything if the operation is the end of a basic
+               block, otherwise we only trash the output args.  */
             if (def->flags & TCG_OPF_BB_END) {
                 memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
             } else {
-- 
1.7.11.4


* [Qemu-devel] [PATCH 05/10] tcg: Optimize double-word comparisons against zero
  2012-10-02 18:32 [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements Richard Henderson
                   ` (3 preceding siblings ...)
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 04/10] tcg: Use common code when failing to optimize Richard Henderson
@ 2012-10-02 18:32 ` Richard Henderson
  2012-10-09 16:32   ` Aurelien Jarno
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 06/10] tcg: Split out subroutines from do_constant_folding_cond Richard Henderson
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2012-10-02 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 592d166..5804b66 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -746,6 +746,45 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             }
             goto do_default;
 
+        case INDEX_op_brcond2_i32:
+            /* Simplify LT/GE comparisons vs zero to a single compare
+               vs the high word of the input.  */
+            if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
+                && temps[args[2]].state == TCG_TEMP_CONST
+                && temps[args[3]].state == TCG_TEMP_CONST
+                && temps[args[2]].val == 0
+                && temps[args[3]].val == 0) {
+                gen_opc_buf[op_index] = INDEX_op_brcond_i32;
+                gen_args[0] = args[1];
+                gen_args[1] = args[3];
+                gen_args[2] = args[4];
+                gen_args[3] = args[5];
+                gen_args += 4;
+                args += 6;
+                memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
+                break;
+            }
+            goto do_default;
+
+        case INDEX_op_setcond2_i32:
+            /* Simplify LT/GE comparisons vs zero to a single compare
+               vs the high word of the input.  */
+            if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE)
+                && temps[args[3]].state == TCG_TEMP_CONST
+                && temps[args[4]].state == TCG_TEMP_CONST
+                && temps[args[3]].val == 0
+                && temps[args[4]].val == 0) {
+                gen_opc_buf[op_index] = INDEX_op_setcond_i32;
+                gen_args[0] = args[0];
+                gen_args[1] = args[2];
+                gen_args[2] = args[4];
+                gen_args[3] = args[5];
+                gen_args += 4;
+                args += 6;
+                break;
+            }
+            goto do_default;
+
         case INDEX_op_call:
             nb_call_args = (args[0] >> 16) + (args[0] & 0xffff);
             if (!(args[nb_call_args + 1] & (TCG_CALL_CONST | TCG_CALL_PURE))) {
-- 
1.7.11.4


* [Qemu-devel] [PATCH 06/10] tcg: Split out subroutines from do_constant_folding_cond
  2012-10-02 18:32 [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements Richard Henderson
                   ` (4 preceding siblings ...)
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 05/10] tcg: Optimize double-word comparisons against zero Richard Henderson
@ 2012-10-02 18:32 ` Richard Henderson
  2012-10-09 16:33   ` Aurelien Jarno
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 07/10] tcg: Do constant folding on double-word comparisons Richard Henderson
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2012-10-02 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

We can re-use these for implementing double-word folding.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 146 ++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 81 insertions(+), 65 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 5804b66..38027dc 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -292,6 +292,82 @@ static TCGArg do_constant_folding(TCGOpcode op, TCGArg x, TCGArg y)
     return res;
 }
 
+static bool do_constant_folding_cond_32(uint32_t x, uint32_t y, TCGCond c)
+{
+    switch (c) {
+    case TCG_COND_EQ:
+        return x == y;
+    case TCG_COND_NE:
+        return x != y;
+    case TCG_COND_LT:
+        return (int32_t)x < (int32_t)y;
+    case TCG_COND_GE:
+        return (int32_t)x >= (int32_t)y;
+    case TCG_COND_LE:
+        return (int32_t)x <= (int32_t)y;
+    case TCG_COND_GT:
+        return (int32_t)x > (int32_t)y;
+    case TCG_COND_LTU:
+        return x < y;
+    case TCG_COND_GEU:
+        return x >= y;
+    case TCG_COND_LEU:
+        return x <= y;
+    case TCG_COND_GTU:
+        return x > y;
+    default:
+        tcg_abort();
+    }
+}
+
+static bool do_constant_folding_cond_64(uint64_t x, uint64_t y, TCGCond c)
+{
+    switch (c) {
+    case TCG_COND_EQ:
+        return x == y;
+    case TCG_COND_NE:
+        return x != y;
+    case TCG_COND_LT:
+        return (int64_t)x < (int64_t)y;
+    case TCG_COND_GE:
+        return (int64_t)x >= (int64_t)y;
+    case TCG_COND_LE:
+        return (int64_t)x <= (int64_t)y;
+    case TCG_COND_GT:
+        return (int64_t)x > (int64_t)y;
+    case TCG_COND_LTU:
+        return x < y;
+    case TCG_COND_GEU:
+        return x >= y;
+    case TCG_COND_LEU:
+        return x <= y;
+    case TCG_COND_GTU:
+        return x > y;
+    default:
+        tcg_abort();
+    }
+}
+
+static bool do_constant_folding_cond_eq(TCGCond c)
+{
+    switch (c) {
+    case TCG_COND_GT:
+    case TCG_COND_LTU:
+    case TCG_COND_LT:
+    case TCG_COND_GTU:
+    case TCG_COND_NE:
+        return 0;
+    case TCG_COND_GE:
+    case TCG_COND_GEU:
+    case TCG_COND_LE:
+    case TCG_COND_LEU:
+    case TCG_COND_EQ:
+        return 1;
+    default:
+        tcg_abort();
+    }
+}
+
 /* Return 2 if the condition can't be simplified, and the result
    of the condition (0 or 1) if it can */
 static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
@@ -300,69 +376,14 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
     if (temps[x].state == TCG_TEMP_CONST && temps[y].state == TCG_TEMP_CONST) {
         switch (op_bits(op)) {
         case 32:
-            switch (c) {
-            case TCG_COND_EQ:
-                return (uint32_t)temps[x].val == (uint32_t)temps[y].val;
-            case TCG_COND_NE:
-                return (uint32_t)temps[x].val != (uint32_t)temps[y].val;
-            case TCG_COND_LT:
-                return (int32_t)temps[x].val < (int32_t)temps[y].val;
-            case TCG_COND_GE:
-                return (int32_t)temps[x].val >= (int32_t)temps[y].val;
-            case TCG_COND_LE:
-                return (int32_t)temps[x].val <= (int32_t)temps[y].val;
-            case TCG_COND_GT:
-                return (int32_t)temps[x].val > (int32_t)temps[y].val;
-            case TCG_COND_LTU:
-                return (uint32_t)temps[x].val < (uint32_t)temps[y].val;
-            case TCG_COND_GEU:
-                return (uint32_t)temps[x].val >= (uint32_t)temps[y].val;
-            case TCG_COND_LEU:
-                return (uint32_t)temps[x].val <= (uint32_t)temps[y].val;
-            case TCG_COND_GTU:
-                return (uint32_t)temps[x].val > (uint32_t)temps[y].val;
-            }
-            break;
+            return do_constant_folding_cond_32(temps[x].val, temps[y].val, c);
         case 64:
-            switch (c) {
-            case TCG_COND_EQ:
-                return (uint64_t)temps[x].val == (uint64_t)temps[y].val;
-            case TCG_COND_NE:
-                return (uint64_t)temps[x].val != (uint64_t)temps[y].val;
-            case TCG_COND_LT:
-                return (int64_t)temps[x].val < (int64_t)temps[y].val;
-            case TCG_COND_GE:
-                return (int64_t)temps[x].val >= (int64_t)temps[y].val;
-            case TCG_COND_LE:
-                return (int64_t)temps[x].val <= (int64_t)temps[y].val;
-            case TCG_COND_GT:
-                return (int64_t)temps[x].val > (int64_t)temps[y].val;
-            case TCG_COND_LTU:
-                return (uint64_t)temps[x].val < (uint64_t)temps[y].val;
-            case TCG_COND_GEU:
-                return (uint64_t)temps[x].val >= (uint64_t)temps[y].val;
-            case TCG_COND_LEU:
-                return (uint64_t)temps[x].val <= (uint64_t)temps[y].val;
-            case TCG_COND_GTU:
-                return (uint64_t)temps[x].val > (uint64_t)temps[y].val;
-            }
-            break;
+            return do_constant_folding_cond_64(temps[x].val, temps[y].val, c);
+        default:
+            tcg_abort();
         }
     } else if (temps_are_copies(x, y)) {
-        switch (c) {
-        case TCG_COND_GT:
-        case TCG_COND_LTU:
-        case TCG_COND_LT:
-        case TCG_COND_GTU:
-        case TCG_COND_NE:
-            return 0;
-        case TCG_COND_GE:
-        case TCG_COND_GEU:
-        case TCG_COND_LE:
-        case TCG_COND_LEU:
-        case TCG_COND_EQ:
-            return 1;
-        }
+        return do_constant_folding_cond_eq(c);
     } else if (temps[y].state == TCG_TEMP_CONST && temps[y].val == 0) {
         switch (c) {
         case TCG_COND_LTU:
@@ -375,11 +396,6 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
     } else {
         return 2;
     }
-
-    fprintf(stderr,
-            "Unrecognized bitness %d or condition %d in "
-            "do_constant_folding_cond.\n", op_bits(op), c);
-    tcg_abort();
 }
 
 static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
-- 
1.7.11.4


* [Qemu-devel] [PATCH 07/10] tcg: Do constant folding on double-word comparisons
  2012-10-02 18:32 [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements Richard Henderson
                   ` (5 preceding siblings ...)
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 06/10] tcg: Split out subroutines from do_constant_folding_cond Richard Henderson
@ 2012-10-02 18:32 ` Richard Henderson
  2012-10-10  9:45   ` Aurelien Jarno
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 08/10] tcg: Constant fold add2 and sub2 Richard Henderson
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2012-10-02 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 93 +++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 72 insertions(+), 21 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 38027dc..d9251e4 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -398,6 +398,40 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
     }
 }
 
+/* Return 2 if the condition can't be simplified, and the result
+   of the condition (0 or 1) if it can */
+static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
+{
+    TCGArg al = p1[0], ah = p1[1];
+    TCGArg bl = p2[0], bh = p2[1];
+
+    if (temps[bl].state == TCG_TEMP_CONST
+        && temps[bh].state == TCG_TEMP_CONST) {
+        uint64_t b = ((uint64_t)temps[bh].val << 32) | (uint32_t)temps[bl].val;
+
+        if (temps[al].state == TCG_TEMP_CONST
+            && temps[ah].state == TCG_TEMP_CONST) {
+            uint64_t a;
+            a = ((uint64_t)temps[ah].val << 32) | (uint32_t)temps[al].val;
+            return do_constant_folding_cond_64(a, b, c);
+        }
+        if (b == 0) {
+            switch (c) {
+            case TCG_COND_LTU:
+                return 0;
+            case TCG_COND_GEU:
+                return 1;
+            default:
+                break;
+            }
+        }
+    }
+    if (temps_are_copies(al, bl) && temps_are_copies(ah, bh)) {
+        return do_constant_folding_cond_eq(c);
+    }
+    return 2;
+}
+
 static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
 {
     TCGArg a1 = *p1, a2 = *p2;
@@ -763,43 +797,60 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             goto do_default;
 
         case INDEX_op_brcond2_i32:
-            /* Simplify LT/GE comparisons vs zero to a single compare
-               vs the high word of the input.  */
-            if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
-                && temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[3]].state == TCG_TEMP_CONST
-                && temps[args[2]].val == 0
-                && temps[args[3]].val == 0) {
+            tmp = do_constant_folding_cond2(&args[0], &args[2], args[4]);
+            if (tmp != 2) {
+                if (tmp) {
+                    memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
+                    gen_opc_buf[op_index] = INDEX_op_br;
+                    gen_args[0] = args[5];
+                    gen_args += 1;
+                } else {
+                    gen_opc_buf[op_index] = INDEX_op_nop;
+                }
+            } else if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
+                       && temps[args[2]].state == TCG_TEMP_CONST
+                       && temps[args[3]].state == TCG_TEMP_CONST
+                       && temps[args[2]].val == 0
+                       && temps[args[3]].val == 0) {
+                /* Simplify LT/GE comparisons vs zero to a single compare
+                   vs the high word of the input.  */
+                memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
                 gen_opc_buf[op_index] = INDEX_op_brcond_i32;
                 gen_args[0] = args[1];
                 gen_args[1] = args[3];
                 gen_args[2] = args[4];
                 gen_args[3] = args[5];
                 gen_args += 4;
-                args += 6;
-                memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
-                break;
+            } else {
+                goto do_default;
             }
-            goto do_default;
+            args += 6;
+            break;
 
         case INDEX_op_setcond2_i32:
-            /* Simplify LT/GE comparisons vs zero to a single compare
-               vs the high word of the input.  */
-            if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE)
-                && temps[args[3]].state == TCG_TEMP_CONST
-                && temps[args[4]].state == TCG_TEMP_CONST
-                && temps[args[3]].val == 0
-                && temps[args[4]].val == 0) {
+            tmp = do_constant_folding_cond2(&args[1], &args[3], args[5]);
+            if (tmp != 2) {
+                gen_opc_buf[op_index] = INDEX_op_movi_i32;
+                tcg_opt_gen_movi(gen_args, args[0], tmp);
+                gen_args += 2;
+            } else if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE)
+                       && temps[args[3]].state == TCG_TEMP_CONST
+                       && temps[args[4]].state == TCG_TEMP_CONST
+                       && temps[args[3]].val == 0
+                       && temps[args[4]].val == 0) {
+                /* Simplify LT/GE comparisons vs zero to a single compare
+                   vs the high word of the input.  */
                 gen_opc_buf[op_index] = INDEX_op_setcond_i32;
                 gen_args[0] = args[0];
                 gen_args[1] = args[2];
                 gen_args[2] = args[4];
                 gen_args[3] = args[5];
                 gen_args += 4;
-                args += 6;
-                break;
+            } else {
+                goto do_default;
             }
-            goto do_default;
+            args += 6;
+            break;
 
         case INDEX_op_call:
             nb_call_args = (args[0] >> 16) + (args[0] & 0xffff);
-- 
1.7.11.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH 08/10] tcg: Constant fold add2 and sub2
  2012-10-02 18:32 [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements Richard Henderson
                   ` (6 preceding siblings ...)
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 07/10] tcg: Do constant folding on double-word comparisons Richard Henderson
@ 2012-10-02 18:32 ` Richard Henderson
  2012-10-10  9:52   ` Aurelien Jarno
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 09/10] tcg: Optimize half-dead add2/sub2 Richard Henderson
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2012-10-02 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 35 +++++++++++++++++++++++++++++++++++
 tcg/tcg-op.h   |  9 +++++++++
 2 files changed, 44 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index d9251e4..05891ef 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -796,6 +796,41 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             }
             goto do_default;
 
+        case INDEX_op_add2_i32:
+        case INDEX_op_sub2_i32:
+            if (temps[args[2]].state == TCG_TEMP_CONST
+                && temps[args[3]].state == TCG_TEMP_CONST
+                && temps[args[4]].state == TCG_TEMP_CONST
+                && temps[args[5]].state == TCG_TEMP_CONST) {
+                uint32_t al = temps[args[2]].val;
+                uint32_t ah = temps[args[3]].val;
+                uint32_t bl = temps[args[4]].val;
+                uint32_t bh = temps[args[5]].val;
+                uint64_t a = ((uint64_t)ah << 32) | al;
+                uint64_t b = ((uint64_t)bh << 32) | bl;
+                TCGArg rl, rh;
+
+                if (op == INDEX_op_add2_i32) {
+                    a += b;
+                } else {
+                    a -= b;
+                }
+
+                /* We emit the extra nop when we emit the add2/sub2.  */
+                assert(gen_opc_buf[op_index + 1] == INDEX_op_nop);
+
+                rl = args[0];
+                rh = args[1];
+                gen_opc_buf[op_index] = INDEX_op_movi_i32;
+                gen_opc_buf[++op_index] = INDEX_op_movi_i32;
+                tcg_opt_gen_movi(&gen_args[0], rl, (uint32_t)a);
+                tcg_opt_gen_movi(&gen_args[2], rh, (uint32_t)(a >> 32));
+                gen_args += 4;
+                args += 6;
+                break;
+            }
+            goto do_default;
+
         case INDEX_op_brcond2_i32:
             tmp = do_constant_folding_cond2(&args[0], &args[2], args[4]);
             if (tmp != 2) {
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index bd93fe4..1f5a021 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -25,6 +25,11 @@
 
 int gen_new_label(void);
 
+static inline void tcg_gen_op0(TCGOpcode opc)
+{
+    *gen_opc_ptr++ = opc;
+}
+
 static inline void tcg_gen_op1_i32(TCGOpcode opc, TCGv_i32 arg1)
 {
     *gen_opc_ptr++ = opc;
@@ -866,6 +871,8 @@ static inline void tcg_gen_add_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
     tcg_gen_op6_i32(INDEX_op_add2_i32, TCGV_LOW(ret), TCGV_HIGH(ret),
                     TCGV_LOW(arg1), TCGV_HIGH(arg1), TCGV_LOW(arg2),
                     TCGV_HIGH(arg2));
+    /* Allow the optimizer room to replace add2 with two moves.  */
+    tcg_gen_op0(INDEX_op_nop);
 }
 
 static inline void tcg_gen_sub_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
@@ -873,6 +880,8 @@ static inline void tcg_gen_sub_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
     tcg_gen_op6_i32(INDEX_op_sub2_i32, TCGV_LOW(ret), TCGV_HIGH(ret),
                     TCGV_LOW(arg1), TCGV_HIGH(arg1), TCGV_LOW(arg2),
                     TCGV_HIGH(arg2));
+    /* Allow the optimizer room to replace sub2 with two moves.  */
+    tcg_gen_op0(INDEX_op_nop);
 }
 
 static inline void tcg_gen_and_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
-- 
1.7.11.4


* [Qemu-devel] [PATCH 09/10] tcg: Optimize half-dead add2/sub2
  2012-10-02 18:32 [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements Richard Henderson
                   ` (7 preceding siblings ...)
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 08/10] tcg: Constant fold add2 and sub2 Richard Henderson
@ 2012-10-02 18:32 ` Richard Henderson
  2012-10-16 23:25   ` Aurelien Jarno
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 10/10] tcg: Optimize mulu2 Richard Henderson
  2012-10-17 16:41 ` [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements Aurelien Jarno
  10 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2012-10-02 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

When an x86_64 guest is not in 64-bit mode, the high part of the 64-bit
add is dead.  When the host is 32-bit, we can simplify to 32-bit
arithmetic.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/tcg.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index c069e44..21c1074 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1306,8 +1306,39 @@ static void tcg_liveness_analysis(TCGContext *s)
             break;
         case INDEX_op_end:
             break;
-            /* XXX: optimize by hardcoding common cases (e.g. triadic ops) */
+
+        case INDEX_op_add2_i32:
+        case INDEX_op_sub2_i32:
+            args -= 6;
+            nb_iargs = 4;
+            nb_oargs = 2;
+            /* Test if the high part of the operation is dead, but not
+               the low part.  The result can be optimized to a simple
+               add or sub.  This happens often for x86_64 guest when the
+               cpu mode is set to 32 bit.  */
+            if (dead_temps[args[1]]) {
+                if (dead_temps[args[0]]) {
+                    goto do_remove;
+                }
+                /* Create the single operation plus nop.  */
+                if (op == INDEX_op_add2_i32) {
+                    op = INDEX_op_add_i32;
+                } else {
+                    op = INDEX_op_sub_i32;
+                }
+                gen_opc_buf[op_index] = op;
+                args[1] = args[2];
+                args[2] = args[4];
+                assert(gen_opc_buf[op_index + 1] == INDEX_op_nop);
+                tcg_set_nop(s, gen_opc_buf + op_index + 1, args + 3, 3);
+                /* Fall through and mark the single-word operation live.  */
+                nb_iargs = 2;
+                nb_oargs = 1;
+            }
+            goto do_not_remove;
+
         default:
+            /* XXX: optimize by hardcoding common cases (e.g. triadic ops) */
             args -= def->nb_args;
             nb_iargs = def->nb_iargs;
             nb_oargs = def->nb_oargs;
@@ -1321,6 +1352,7 @@ static void tcg_liveness_analysis(TCGContext *s)
                     if (!dead_temps[arg])
                         goto do_not_remove;
                 }
+            do_remove:
                 tcg_set_nop(s, gen_opc_buf + op_index, args, def->nb_args);
 #ifdef CONFIG_PROFILER
                 s->del_op_count++;
-- 
1.7.11.4


* [Qemu-devel] [PATCH 10/10] tcg: Optimize mulu2
  2012-10-02 18:32 [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements Richard Henderson
                   ` (8 preceding siblings ...)
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 09/10] tcg: Optimize half-dead add2/sub2 Richard Henderson
@ 2012-10-02 18:32 ` Richard Henderson
  2012-10-16 23:25   ` Aurelien Jarno
  2012-10-17 16:41 ` [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements Aurelien Jarno
  10 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2012-10-02 18:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Like add2, do operand ordering, constant folding, and dead-operand
elimination.  The latter fires for about 15% of all mulu2 ops during an
x86_64 BIOS boot.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 26 ++++++++++++++++++++++++++
 tcg/tcg-op.h   |  2 ++
 tcg/tcg.c      | 19 +++++++++++++++++++
 3 files changed, 47 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 05891ef..a06c8eb 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -543,6 +543,9 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             swap_commutative(args[0], &args[2], &args[4]);
             swap_commutative(args[1], &args[3], &args[5]);
             break;
+        case INDEX_op_mulu2_i32:
+            swap_commutative(args[0], &args[2], &args[3]);
+            break;
         case INDEX_op_brcond2_i32:
             if (swap_commutative2(&args[0], &args[2])) {
                 args[4] = tcg_swap_cond(args[4]);
@@ -831,6 +834,29 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             }
             goto do_default;
 
+        case INDEX_op_mulu2_i32:
+            if (temps[args[2]].state == TCG_TEMP_CONST
+                && temps[args[3]].state == TCG_TEMP_CONST) {
+                uint32_t a = temps[args[2]].val;
+                uint32_t b = temps[args[3]].val;
+                uint64_t r = (uint64_t)a * b;
+                TCGArg rl, rh;
+
+                /* We emit the extra nop when we emit the mulu2.  */
+                assert(gen_opc_buf[op_index + 1] == INDEX_op_nop);
+
+                rl = args[0];
+                rh = args[1];
+                gen_opc_buf[op_index] = INDEX_op_movi_i32;
+                gen_opc_buf[++op_index] = INDEX_op_movi_i32;
+                tcg_opt_gen_movi(&gen_args[0], rl, (uint32_t)r);
+                tcg_opt_gen_movi(&gen_args[2], rh, (uint32_t)(r >> 32));
+                gen_args += 4;
+                args += 4;
+                break;
+            }
+            goto do_default;
+
         case INDEX_op_brcond2_i32:
             tmp = do_constant_folding_cond2(&args[0], &args[2], args[4]);
             if (tmp != 2) {
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 1f5a021..044e648 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -997,6 +997,8 @@ static inline void tcg_gen_mul_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
 
     tcg_gen_op4_i32(INDEX_op_mulu2_i32, TCGV_LOW(t0), TCGV_HIGH(t0),
                     TCGV_LOW(arg1), TCGV_LOW(arg2));
+    /* Allow the optimizer room to replace mulu2 with two moves.  */
+    tcg_gen_op0(INDEX_op_nop);
 
     tcg_gen_mul_i32(t1, TCGV_LOW(arg1), TCGV_HIGH(arg2));
     tcg_gen_add_i32(TCGV_HIGH(t0), TCGV_HIGH(t0), t1);
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 21c1074..8280489 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1337,6 +1337,25 @@ static void tcg_liveness_analysis(TCGContext *s)
             }
             goto do_not_remove;
 
+        case INDEX_op_mulu2_i32:
+            args -= 4;
+            nb_iargs = 2;
+            nb_oargs = 2;
+            /* Likewise, test for the high part of the operation dead.  */
+            if (dead_temps[args[1]]) {
+                if (dead_temps[args[0]]) {
+                    goto do_remove;
+                }
+                gen_opc_buf[op_index] = op = INDEX_op_mul_i32;
+                args[1] = args[2];
+                args[2] = args[3];
+                assert(gen_opc_buf[op_index + 1] == INDEX_op_nop);
+                tcg_set_nop(s, gen_opc_buf + op_index + 1, args + 3, 1);
+                /* Fall through and mark the single-word operation live.  */
+                nb_oargs = 1;
+            }
+            goto do_not_remove;
+
         default:
             /* XXX: optimize by hardcoding common cases (e.g. triadic ops) */
             args -= def->nb_args;
-- 
1.7.11.4


* Re: [Qemu-devel] [PATCH 01/10] tcg: Split out swap_commutative as a subroutine
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 01/10] tcg: Split out swap_commutative as a subroutine Richard Henderson
@ 2012-10-09 15:13   ` Aurelien Jarno
  2012-10-09 15:23     ` Richard Henderson
  0 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2012-10-09 15:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Oct 02, 2012 at 11:32:21AM -0700, Richard Henderson wrote:
> Reduces code duplication and prefers
> 
>   movcond d, c1, c2, const, s
> to
>   movcond d, c1, c2, s, const
> 
> It also prefers
> 
>   add r, r, c
> over
>   add r, c, r
> 
> when both inputs are known constants.  This doesn't matter for true add, as
> we will fully constant fold that.  But it matters for a follow-on patch using
> this routine for add2 which may not be fully foldable.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/optimize.c | 56 ++++++++++++++++++++++++--------------------------------
>  1 file changed, 24 insertions(+), 32 deletions(-)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 35532a1..5e0504a 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -382,6 +382,23 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
>      tcg_abort();
>  }
>  
> +static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
> +{
> +    TCGArg a1 = *p1, a2 = *p2;
> +    int sum = 0;
> +    sum += temps[a1].state == TCG_TEMP_CONST;
> +    sum -= temps[a2].state == TCG_TEMP_CONST;
> +
> +    /* Prefer the constant in second argument, and then the form
> +       op a, a, b, which is better handled on non-RISC hosts. */
> +    if (sum > 0 || (sum == 0 && dest == a2)) {
> +        *p1 = a2;
> +        *p2 = a1;
> +        return true;
> +    }
> +    return false;
> +}
> +

Does this sum += and -= actually generate better code than the previous
form?  It's not obvious to read (fortunately the comment helps), so if
it doesn't bring any optimization, it's better to keep the previous
form.

>  /* Propagate constants and copies, fold constant expressions. */
>  static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>                                      TCGArg *args, TCGOpDef *tcg_op_defs)
> @@ -391,7 +408,6 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>      const TCGOpDef *def;
>      TCGArg *gen_args;
>      TCGArg tmp;
> -    TCGCond cond;
>  
>      /* Array VALS has an element for each temp.
>         If this temp holds a constant then its value is kept in VALS' element.
> @@ -434,52 +450,28 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>          CASE_OP_32_64(eqv):
>          CASE_OP_32_64(nand):
>          CASE_OP_32_64(nor):
> -            /* Prefer the constant in second argument, and then the form
> -               op a, a, b, which is better handled on non-RISC hosts. */
> -            if (temps[args[1]].state == TCG_TEMP_CONST || (args[0] == args[2]
> -                && temps[args[2]].state != TCG_TEMP_CONST)) {
> -                tmp = args[1];
> -                args[1] = args[2];
> -                args[2] = tmp;
> -            }
> +            swap_commutative(args[0], &args[1], &args[2]);
>              break;
>          CASE_OP_32_64(brcond):
> -            if (temps[args[0]].state == TCG_TEMP_CONST
> -                && temps[args[1]].state != TCG_TEMP_CONST) {
> -                tmp = args[0];
> -                args[0] = args[1];
> -                args[1] = tmp;
> +            if (swap_commutative(-1, &args[0], &args[1])) {
>                  args[2] = tcg_swap_cond(args[2]);
>              }
>              break;
>          CASE_OP_32_64(setcond):
> -            if (temps[args[1]].state == TCG_TEMP_CONST
> -                && temps[args[2]].state != TCG_TEMP_CONST) {
> -                tmp = args[1];
> -                args[1] = args[2];
> -                args[2] = tmp;
> +            if (swap_commutative(args[0], &args[1], &args[2])) {
>                  args[3] = tcg_swap_cond(args[3]);
>              }
>              break;
>          CASE_OP_32_64(movcond):
> -            cond = args[5];
> -            if (temps[args[1]].state == TCG_TEMP_CONST
> -                && temps[args[2]].state != TCG_TEMP_CONST) {
> -                tmp = args[1];
> -                args[1] = args[2];
> -                args[2] = tmp;
> -                cond = tcg_swap_cond(cond);
> +            if (swap_commutative(-1, &args[1], &args[2])) {
> +                args[5] = tcg_swap_cond(args[5]);
>              }
>              /* For movcond, we canonicalize the "false" input reg to match
>                 the destination reg so that the tcg backend can implement
>                 a "move if true" operation.  */
> -            if (args[0] == args[3]) {
> -                tmp = args[3];
> -                args[3] = args[4];
> -                args[4] = tmp;
> -                cond = tcg_invert_cond(cond);
> +            if (swap_commutative(args[0], &args[4], &args[3])) {
> +                args[5] = tcg_invert_cond(args[5]);
>              }
> -            args[5] = cond;
>          default:
>              break;
>          }

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>


-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH 02/10] tcg: Canonicalize add2 operand ordering
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 02/10] tcg: Canonicalize add2 operand ordering Richard Henderson
@ 2012-10-09 15:14   ` Aurelien Jarno
  0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2012-10-09 15:14 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Oct 02, 2012 at 11:32:22AM -0700, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/optimize.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 5e0504a..3539826 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -472,6 +472,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>              if (swap_commutative(args[0], &args[4], &args[3])) {
>                  args[5] = tcg_invert_cond(args[5]);
>              }
> +            break;
> +        case INDEX_op_add2_i32:
> +            swap_commutative(args[0], &args[2], &args[4]);
> +            swap_commutative(args[1], &args[3], &args[5]);
> +            break;
>          default:
>              break;
>          }

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH 03/10] tcg: Swap commutative double-word comparisons
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 03/10] tcg: Swap commutative double-word comparisons Richard Henderson
@ 2012-10-09 15:16   ` Aurelien Jarno
  2012-10-09 15:31     ` Richard Henderson
  0 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2012-10-09 15:16 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Oct 02, 2012 at 11:32:23AM -0700, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/optimize.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 3539826..a713513 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -399,6 +399,22 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
>      return false;
>  }
>  
> +static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
> +{
> +    int sum = 0;
> +    sum += temps[p1[0]].state == TCG_TEMP_CONST;
> +    sum += temps[p1[1]].state == TCG_TEMP_CONST;
> +    sum -= temps[p2[0]].state == TCG_TEMP_CONST;
> +    sum -= temps[p2[1]].state == TCG_TEMP_CONST;
> +    if (sum > 0) {
> +        TCGArg t;
> +        t = p1[0], p1[0] = p2[0], p2[0] = t;
> +        t = p1[1], p1[1] = p2[1], p2[1] = t;
> +        return true;
> +    }
> +    return false;
> +}
> +
>  /* Propagate constants and copies, fold constant expressions. */
>  static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>                                      TCGArg *args, TCGOpDef *tcg_op_defs)
> @@ -477,6 +493,16 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>              swap_commutative(args[0], &args[2], &args[4]);
>              swap_commutative(args[1], &args[3], &args[5]);
>              break;
> +        case INDEX_op_brcond2_i32:
> +            if (swap_commutative2(&args[0], &args[2])) {
> +                args[4] = tcg_swap_cond(args[4]);
> +            }
> +            break;
> +        case INDEX_op_setcond2_i32:
> +            if (swap_commutative2(&args[1], &args[3])) {
> +                args[5] = tcg_swap_cond(args[5]);
> +            }
> +            break;
>          default:
>              break;
>          }

Same comment as for the swap_commutative() patch; otherwise:

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH 01/10] tcg: Split out swap_commutative as a subroutine
  2012-10-09 15:13   ` Aurelien Jarno
@ 2012-10-09 15:23     ` Richard Henderson
  2012-10-09 15:31       ` Aurelien Jarno
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2012-10-09 15:23 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 10/09/2012 08:13 AM, Aurelien Jarno wrote:
>> > It also prefers
>> > 
>> >   add r, r, c
>> > over
>> >   add r, c, r
>> > 
>> > when both inputs are known constants.  This doesn't matter for true add, as
>> > we will fully constant fold that.  But it matters for a follow-on patch using
>> > this routine for add2 which may not be fully foldable.
...
> Does this sum += and -= actually generates better code than the previous
> one? It's not something obvious to read (fortunately there is the
> comment for helping), so if it doesn't bring any optimization, it's
> better to keep the previous form.

Yes.  See the comment within the log above.


r~


* Re: [Qemu-devel] [PATCH 04/10] tcg: Use common code when failing to optimize
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 04/10] tcg: Use common code when failing to optimize Richard Henderson
@ 2012-10-09 15:25   ` Aurelien Jarno
  2012-10-09 15:33     ` Richard Henderson
  0 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2012-10-09 15:25 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Oct 02, 2012 at 11:32:24AM -0700, Richard Henderson wrote:
> This saves a whole lot of repetitive code sequences.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/optimize.c | 91 +++++++++++++++++++++-------------------------------------
>  1 file changed, 32 insertions(+), 59 deletions(-)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index a713513..592d166 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -639,6 +639,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>              gen_args += 2;
>              args += 2;
>              break;
> +

Why this new line?

>          CASE_OP_32_64(not):
>          CASE_OP_32_64(neg):
>          CASE_OP_32_64(ext8s):
> @@ -651,14 +652,12 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>                  gen_opc_buf[op_index] = op_to_movi(op);
>                  tmp = do_constant_folding(op, temps[args[1]].val, 0);
>                  tcg_opt_gen_movi(gen_args, args[0], tmp);
> -            } else {
> -                reset_temp(args[0]);
> -                gen_args[0] = args[0];
> -                gen_args[1] = args[1];
> +                gen_args += 2;
> +                args += 2;
> +                break;
>              }
> -            gen_args += 2;
> -            args += 2;
> -            break;
> +            goto do_default;
> +
>          CASE_OP_32_64(add):
>          CASE_OP_32_64(sub):
>          CASE_OP_32_64(mul):
> @@ -682,15 +681,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>                                            temps[args[2]].val);
>                  tcg_opt_gen_movi(gen_args, args[0], tmp);
>                  gen_args += 2;
> -            } else {
> -                reset_temp(args[0]);
> -                gen_args[0] = args[0];
> -                gen_args[1] = args[1];
> -                gen_args[2] = args[2];
> -                gen_args += 3;
> +                args += 3;
> +                break;
>              }
> -            args += 3;
> -            break;
> +            goto do_default;
> +
>          CASE_OP_32_64(deposit):
>              if (temps[args[1]].state == TCG_TEMP_CONST
>                  && temps[args[2]].state == TCG_TEMP_CONST) {
> @@ -700,33 +695,22 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>                        | ((temps[args[2]].val & tmp) << args[3]);
>                  tcg_opt_gen_movi(gen_args, args[0], tmp);
>                  gen_args += 2;
> -            } else {
> -                reset_temp(args[0]);
> -                gen_args[0] = args[0];
> -                gen_args[1] = args[1];
> -                gen_args[2] = args[2];
> -                gen_args[3] = args[3];
> -                gen_args[4] = args[4];
> -                gen_args += 5;
> +                args += 5;
> +                break;
>              }
> -            args += 5;
> -            break;
> +            goto do_default;
> +
>          CASE_OP_32_64(setcond):
>              tmp = do_constant_folding_cond(op, args[1], args[2], args[3]);
>              if (tmp != 2) {
>                  gen_opc_buf[op_index] = op_to_movi(op);
>                  tcg_opt_gen_movi(gen_args, args[0], tmp);
>                  gen_args += 2;
> -            } else {
> -                reset_temp(args[0]);
> -                gen_args[0] = args[0];
> -                gen_args[1] = args[1];
> -                gen_args[2] = args[2];
> -                gen_args[3] = args[3];
> -                gen_args += 4;
> +                args += 4;
> +                break;
>              }
> -            args += 4;
> -            break;
> +            goto do_default;
> +
>          CASE_OP_32_64(brcond):
>              tmp = do_constant_folding_cond(op, args[0], args[1], args[2]);
>              if (tmp != 2) {
> @@ -738,17 +722,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>                  } else {
>                      gen_opc_buf[op_index] = INDEX_op_nop;
>                  }
> -            } else {
> -                memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
> -                reset_temp(args[0]);
> -                gen_args[0] = args[0];
> -                gen_args[1] = args[1];
> -                gen_args[2] = args[2];
> -                gen_args[3] = args[3];
> -                gen_args += 4;
> +                args += 4;
> +                break;
>              }
> -            args += 4;
> -            break;
> +            goto do_default;
> +
>          CASE_OP_32_64(movcond):
>              tmp = do_constant_folding_cond(op, args[1], args[2], args[5]);
>              if (tmp != 2) {
> @@ -763,18 +741,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>                      tcg_opt_gen_mov(s, gen_args, args[0], args[4-tmp]);
>                      gen_args += 2;
>                  }
> -            } else {
> -                reset_temp(args[0]);
> -                gen_args[0] = args[0];
> -                gen_args[1] = args[1];
> -                gen_args[2] = args[2];
> -                gen_args[3] = args[3];
> -                gen_args[4] = args[4];
> -                gen_args[5] = args[5];
> -                gen_args += 6;
> +                args += 6;
> +                break;
>              }
> -            args += 6;
> -            break;
> +            goto do_default;
> +
>          case INDEX_op_call:
>              nb_call_args = (args[0] >> 16) + (args[0] & 0xffff);
>              if (!(args[nb_call_args + 1] & (TCG_CALL_CONST | TCG_CALL_PURE))) {
> @@ -793,11 +764,13 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>                  i--;
>              }
>              break;
> +
>          default:
> -            /* Default case: we do know nothing about operation so no
> -               propagation is done.  We trash everything if the operation
> -               is the end of a basic block, otherwise we only trash the
> -               output args.  */
> +        do_default:
> +            /* Default case: we know nothing about operation (or were unable
> +               to compute the operation result) so no propagation is done.
> +               We trash everything if the operation is the end of a basic
> +               block, otherwise we only trash the output args.  */
>              if (def->flags & TCG_OPF_BB_END) {
>                  memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
>              } else {

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH 01/10] tcg: Split out swap_commutative as a subroutine
  2012-10-09 15:23     ` Richard Henderson
@ 2012-10-09 15:31       ` Aurelien Jarno
  2012-10-09 16:40         ` Richard Henderson
  0 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2012-10-09 15:31 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Oct 09, 2012 at 08:23:27AM -0700, Richard Henderson wrote:
> On 10/09/2012 08:13 AM, Aurelien Jarno wrote:
> >> > It also prefers
> >> > 
> >> >   add r, r, c
> >> > over
> >> >   add r, c, r
> >> > 
> >> > when both inputs are known constants.  This doesn't matter for true add, as
> >> > we will fully constant fold that.  But it matters for a follow-on patch using
> >> > this routine for add2 which may not be fully foldable.
> ...
> > Does this sum += and -= actually generate better code than the previous
> > one? It's not something obvious to read (fortunately there is the
> > comment for helping), so if it doesn't bring any optimization, it's
> > better to keep the previous form.
> 
> Yes.  See the comment within the log above.

I am not talking about the code generated by TCG, but rather about the
code generated by GCC. Does using sum += and sum -= bring any gain
compared to the equivalent if statement?

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 03/10] tcg: Swap commutative double-word comparisons
  2012-10-09 15:16   ` Aurelien Jarno
@ 2012-10-09 15:31     ` Richard Henderson
  2012-10-09 15:48       ` Aurelien Jarno
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2012-10-09 15:31 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 10/09/2012 08:16 AM, Aurelien Jarno wrote:
>> > +static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
>> > +{
>> > +    int sum = 0;
>> > +    sum += temps[p1[0]].state == TCG_TEMP_CONST;
>> > +    sum += temps[p1[1]].state == TCG_TEMP_CONST;
>> > +    sum -= temps[p2[0]].state == TCG_TEMP_CONST;
>> > +    sum -= temps[p2[1]].state == TCG_TEMP_CONST;
>> > +    if (sum > 0) {
...
> Same comments as for the swap_commutative() patch, otherwise:

While I don't have an explicit test case for swap_commutative2 like
I do for swap_commutative, think about how many conditionals you'd
have to use to write this without using SUM:

  if (((temps[p1[0]].state == TCG_TEMP_CONST            // if both p1 are const
        && temps[p1[1]].state == TCG_TEMP_CONST
        && !(temps[p2[0]].state == TCG_TEMP_CONST       // ... and not both p2 are const
             && temps[p2[1]].state == TCG_TEMP_CONST))
      || ((temps[p1[0]].state == TCG_TEMP_CONST         // if either p1 are const
           || temps[p1[1]].state == TCG_TEMP_CONST)
          && temps[p2[0]].state != TCG_TEMP_CONST      // ... and neither p2 are const
          && temps[p2[1]].state != TCG_TEMP_CONST))

I don't see how that can possibly be easier to understand.


r~

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 04/10] tcg: Use common code when failing to optimize
  2012-10-09 15:25   ` Aurelien Jarno
@ 2012-10-09 15:33     ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2012-10-09 15:33 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 10/09/2012 08:25 AM, Aurelien Jarno wrote:
>> > @@ -639,6 +639,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>> >              gen_args += 2;
>> >              args += 2;
>> >              break;
>> > +
> Why this new line?
> 

After the patch, all of the CASE blocks have a separator line.
It looked weird to have that be true of all but one case block
within that switch.


r~

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 03/10] tcg: Swap commutative double-word comparisons
  2012-10-09 15:31     ` Richard Henderson
@ 2012-10-09 15:48       ` Aurelien Jarno
  0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2012-10-09 15:48 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Oct 09, 2012 at 08:31:47AM -0700, Richard Henderson wrote:
> On 10/09/2012 08:16 AM, Aurelien Jarno wrote:
> >> > +static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
> >> > +{
> >> > +    int sum = 0;
> >> > +    sum += temps[p1[0]].state == TCG_TEMP_CONST;
> >> > +    sum += temps[p1[1]].state == TCG_TEMP_CONST;
> >> > +    sum -= temps[p2[0]].state == TCG_TEMP_CONST;
> >> > +    sum -= temps[p2[1]].state == TCG_TEMP_CONST;
> >> > +    if (sum > 0) {
> ...
> > Same comments as for the swap_commutative() patch, otherwise:
> 
> While I don't have an explicit test case for swap_commutative2 like
> I do for swap_commutative, think about how many conditionals you'd
> have to use to write this without using SUM:
> 
>   if (((temps[p1[0]].state == TCG_TEMP_CONST            // if both p1 are const
>         && temps[p1[1]].state == TCG_TEMP_CONST
>         && !(temps[p2[0]].state == TCG_TEMP_CONST       // ... and not both p2 are const
>              && temps[p2[1]].state == TCG_TEMP_CONST))
>       || ((temps[p1[0]].state == TCG_TEMP_CONST         // if either p1 are const
>            || temps[p1[1]].state == TCG_TEMP_CONST)
>           && temps[p2[0]].state != TCG_TEMP_CONST      // ... and neither p2 are const
>           && temps[p2[1]].state != TCG_TEMP_CONST))
> 
> I don't see how that can possibly be easier to understand.
> 
> 

For that one I agree.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 05/10] tcg: Optimize double-word comparisons against zero
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 05/10] tcg: Optimize double-word comparisons against zero Richard Henderson
@ 2012-10-09 16:32   ` Aurelien Jarno
  0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2012-10-09 16:32 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Oct 02, 2012 at 11:32:25AM -0700, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/optimize.c | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 592d166..5804b66 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -746,6 +746,45 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>              }
>              goto do_default;
>  
> +        case INDEX_op_brcond2_i32:
> +            /* Simplify LT/GE comparisons vs zero to a single compare
> +               vs the high word of the input.  */
> +            if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
> +                && temps[args[2]].state == TCG_TEMP_CONST
> +                && temps[args[3]].state == TCG_TEMP_CONST
> +                && temps[args[2]].val == 0
> +                && temps[args[3]].val == 0) {
> +                gen_opc_buf[op_index] = INDEX_op_brcond_i32;
> +                gen_args[0] = args[1];
> +                gen_args[1] = args[3];
> +                gen_args[2] = args[4];
> +                gen_args[3] = args[5];
> +                gen_args += 4;
> +                args += 6;
> +                memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
> +                break;
> +            }
> +            goto do_default;
> +
> +        case INDEX_op_setcond2_i32:
> +            /* Simplify LT/GE comparisons vs zero to a single compare
> +               vs the high word of the input.  */
> +            if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE)
> +                && temps[args[3]].state == TCG_TEMP_CONST
> +                && temps[args[4]].state == TCG_TEMP_CONST
> +                && temps[args[3]].val == 0
> +                && temps[args[4]].val == 0) {
> +                gen_opc_buf[op_index] = INDEX_op_setcond_i32;
> +                gen_args[0] = args[0];
> +                gen_args[1] = args[2];
> +                gen_args[2] = args[4];
> +                gen_args[3] = args[5];
> +                gen_args += 4;
> +                args += 6;
> +                break;
> +            }
> +            goto do_default;
> +
>          case INDEX_op_call:
>              nb_call_args = (args[0] >> 16) + (args[0] & 0xffff);
>              if (!(args[nb_call_args + 1] & (TCG_CALL_CONST | TCG_CALL_PURE))) {

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>


-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 06/10] tcg: Split out subroutines from do_constant_folding_cond
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 06/10] tcg: Split out subroutines from do_constant_folding_cond Richard Henderson
@ 2012-10-09 16:33   ` Aurelien Jarno
  0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2012-10-09 16:33 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Oct 02, 2012 at 11:32:26AM -0700, Richard Henderson wrote:
> We can re-use these for implementing double-word folding.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/optimize.c | 146 ++++++++++++++++++++++++++++++++-------------------------
>  1 file changed, 81 insertions(+), 65 deletions(-)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 5804b66..38027dc 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -292,6 +292,82 @@ static TCGArg do_constant_folding(TCGOpcode op, TCGArg x, TCGArg y)
>      return res;
>  }
>  
> +static bool do_constant_folding_cond_32(uint32_t x, uint32_t y, TCGCond c)
> +{
> +    switch (c) {
> +    case TCG_COND_EQ:
> +        return x == y;
> +    case TCG_COND_NE:
> +        return x != y;
> +    case TCG_COND_LT:
> +        return (int32_t)x < (int32_t)y;
> +    case TCG_COND_GE:
> +        return (int32_t)x >= (int32_t)y;
> +    case TCG_COND_LE:
> +        return (int32_t)x <= (int32_t)y;
> +    case TCG_COND_GT:
> +        return (int32_t)x > (int32_t)y;
> +    case TCG_COND_LTU:
> +        return x < y;
> +    case TCG_COND_GEU:
> +        return x >= y;
> +    case TCG_COND_LEU:
> +        return x <= y;
> +    case TCG_COND_GTU:
> +        return x > y;
> +    default:
> +        tcg_abort();
> +    }
> +}
> +
> +static bool do_constant_folding_cond_64(uint64_t x, uint64_t y, TCGCond c)
> +{
> +    switch (c) {
> +    case TCG_COND_EQ:
> +        return x == y;
> +    case TCG_COND_NE:
> +        return x != y;
> +    case TCG_COND_LT:
> +        return (int64_t)x < (int64_t)y;
> +    case TCG_COND_GE:
> +        return (int64_t)x >= (int64_t)y;
> +    case TCG_COND_LE:
> +        return (int64_t)x <= (int64_t)y;
> +    case TCG_COND_GT:
> +        return (int64_t)x > (int64_t)y;
> +    case TCG_COND_LTU:
> +        return x < y;
> +    case TCG_COND_GEU:
> +        return x >= y;
> +    case TCG_COND_LEU:
> +        return x <= y;
> +    case TCG_COND_GTU:
> +        return x > y;
> +    default:
> +        tcg_abort();
> +    }
> +}
> +
> +static bool do_constant_folding_cond_eq(TCGCond c)
> +{
> +    switch (c) {
> +    case TCG_COND_GT:
> +    case TCG_COND_LTU:
> +    case TCG_COND_LT:
> +    case TCG_COND_GTU:
> +    case TCG_COND_NE:
> +        return 0;
> +    case TCG_COND_GE:
> +    case TCG_COND_GEU:
> +    case TCG_COND_LE:
> +    case TCG_COND_LEU:
> +    case TCG_COND_EQ:
> +        return 1;
> +    default:
> +        tcg_abort();
> +    }
> +}
> +
>  /* Return 2 if the condition can't be simplified, and the result
>     of the condition (0 or 1) if it can */
>  static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
> @@ -300,69 +376,14 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
>      if (temps[x].state == TCG_TEMP_CONST && temps[y].state == TCG_TEMP_CONST) {
>          switch (op_bits(op)) {
>          case 32:
> -            switch (c) {
> -            case TCG_COND_EQ:
> -                return (uint32_t)temps[x].val == (uint32_t)temps[y].val;
> -            case TCG_COND_NE:
> -                return (uint32_t)temps[x].val != (uint32_t)temps[y].val;
> -            case TCG_COND_LT:
> -                return (int32_t)temps[x].val < (int32_t)temps[y].val;
> -            case TCG_COND_GE:
> -                return (int32_t)temps[x].val >= (int32_t)temps[y].val;
> -            case TCG_COND_LE:
> -                return (int32_t)temps[x].val <= (int32_t)temps[y].val;
> -            case TCG_COND_GT:
> -                return (int32_t)temps[x].val > (int32_t)temps[y].val;
> -            case TCG_COND_LTU:
> -                return (uint32_t)temps[x].val < (uint32_t)temps[y].val;
> -            case TCG_COND_GEU:
> -                return (uint32_t)temps[x].val >= (uint32_t)temps[y].val;
> -            case TCG_COND_LEU:
> -                return (uint32_t)temps[x].val <= (uint32_t)temps[y].val;
> -            case TCG_COND_GTU:
> -                return (uint32_t)temps[x].val > (uint32_t)temps[y].val;
> -            }
> -            break;
> +            return do_constant_folding_cond_32(temps[x].val, temps[y].val, c);
>          case 64:
> -            switch (c) {
> -            case TCG_COND_EQ:
> -                return (uint64_t)temps[x].val == (uint64_t)temps[y].val;
> -            case TCG_COND_NE:
> -                return (uint64_t)temps[x].val != (uint64_t)temps[y].val;
> -            case TCG_COND_LT:
> -                return (int64_t)temps[x].val < (int64_t)temps[y].val;
> -            case TCG_COND_GE:
> -                return (int64_t)temps[x].val >= (int64_t)temps[y].val;
> -            case TCG_COND_LE:
> -                return (int64_t)temps[x].val <= (int64_t)temps[y].val;
> -            case TCG_COND_GT:
> -                return (int64_t)temps[x].val > (int64_t)temps[y].val;
> -            case TCG_COND_LTU:
> -                return (uint64_t)temps[x].val < (uint64_t)temps[y].val;
> -            case TCG_COND_GEU:
> -                return (uint64_t)temps[x].val >= (uint64_t)temps[y].val;
> -            case TCG_COND_LEU:
> -                return (uint64_t)temps[x].val <= (uint64_t)temps[y].val;
> -            case TCG_COND_GTU:
> -                return (uint64_t)temps[x].val > (uint64_t)temps[y].val;
> -            }
> -            break;
> +            return do_constant_folding_cond_64(temps[x].val, temps[y].val, c);
> +        default:
> +            tcg_abort();
>          }
>      } else if (temps_are_copies(x, y)) {
> -        switch (c) {
> -        case TCG_COND_GT:
> -        case TCG_COND_LTU:
> -        case TCG_COND_LT:
> -        case TCG_COND_GTU:
> -        case TCG_COND_NE:
> -            return 0;
> -        case TCG_COND_GE:
> -        case TCG_COND_GEU:
> -        case TCG_COND_LE:
> -        case TCG_COND_LEU:
> -        case TCG_COND_EQ:
> -            return 1;
> -        }
> +        return do_constant_folding_cond_eq(c);
>      } else if (temps[y].state == TCG_TEMP_CONST && temps[y].val == 0) {
>          switch (c) {
>          case TCG_COND_LTU:
> @@ -375,11 +396,6 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
>      } else {
>          return 2;
>      }
> -
> -    fprintf(stderr,
> -            "Unrecognized bitness %d or condition %d in "
> -            "do_constant_folding_cond.\n", op_bits(op), c);
> -    tcg_abort();
>  }
>  
>  static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>


-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] tcg: Split out swap_commutative as a subroutine
  2012-10-09 15:31       ` Aurelien Jarno
@ 2012-10-09 16:40         ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2012-10-09 16:40 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 10/09/2012 08:31 AM, Aurelien Jarno wrote:
> I am not talking about the code generated by TCG, but rather about the
> code generated by GCC. Does using sum += and sum -= bring any gain
> compared to the equivalent if statement?

It's hard to tell.  My guess is that it's about a wash.  Adding an
artificial __attribute__((noinline)) to make it easier to see:

SUM version

0000000000000190 <swap_commutative>:
     190:       48 83 ec 18             sub    $0x18,%rsp
     194:       4c 8b 06                mov    (%rsi),%r8
     197:       48 8b 0a                mov    (%rdx),%rcx
     19a:       64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
     1a1:       00 00 
     1a3:       48 89 44 24 08          mov    %rax,0x8(%rsp)
     1a8:       31 c0                   xor    %eax,%eax
     1aa:       4c 89 c0                mov    %r8,%rax
     1ad:       49 89 c9                mov    %rcx,%r9
     1b0:       48 c1 e0 04             shl    $0x4,%rax
     1b4:       83 b8 00 00 00 00 01    cmpl   $0x1,0x0(%rax)
                        1b6: R_X86_64_32S       .bss
     1bb:       0f 94 c0                sete   %al
     1be:       49 c1 e1 04             shl    $0x4,%r9
     1c2:       41 83 b9 00 00 00 00    cmpl   $0x1,0x0(%r9)
     1c9:       01 
                        1c5: R_X86_64_32S       .bss
     1ca:       0f b6 c0                movzbl %al,%eax
     1cd:       41 0f 94 c1             sete   %r9b
     1d1:       45 0f b6 c9             movzbl %r9b,%r9d
     1d5:       44 29 c8                sub    %r9d,%eax
     1d8:       83 f8 01                cmp    $0x1,%eax
     1db:       75 23                   jne    200 <swap_commutative+0x70>
     1dd:       48 89 0e                mov    %rcx,(%rsi)
     1e0:       b8 01 00 00 00          mov    $0x1,%eax
     1e5:       4c 89 02                mov    %r8,(%rdx)
     1e8:       48 8b 54 24 08          mov    0x8(%rsp),%rdx
     1ed:       64 48 33 14 25 28 00    xor    %fs:0x28,%rdx
     1f4:       00 00 
     1f6:       75 15                   jne    20d <swap_commutative+0x7d>
     1f8:       48 83 c4 18             add    $0x18,%rsp
     1fc:       c3                      retq   
     1fd:       0f 1f 00                nopl   (%rax)
     200:       48 39 cf                cmp    %rcx,%rdi
     203:       75 04                   jne    209 <swap_commutative+0x79>
     205:       85 c0                   test   %eax,%eax
     207:       74 d4                   je     1dd <swap_commutative+0x4d>
     209:       31 c0                   xor    %eax,%eax
     20b:       eb db                   jmp    1e8 <swap_commutative+0x58>
     20d:       0f 1f 00                nopl   (%rax)
     210:       e8 00 00 00 00          callq  215 <swap_commutative+0x85>
                        211: R_X86_64_PC32      __stack_chk_fail-0x4

=======

    if ((temps[a1].state == TCG_TEMP_CONST
         && temps[a2].state != TCG_TEMP_CONST)
        || (dest == a2
            && ((temps[a1].state == TCG_TEMP_CONST
                 && temps[a2].state == TCG_TEMP_CONST)
                || (temps[a1].state != TCG_TEMP_CONST
                     && temps[a2].state != TCG_TEMP_CONST)))) {

0000000000000190 <swap_commutative>:
     190:       48 83 ec 18             sub    $0x18,%rsp
     194:       4c 8b 02                mov    (%rdx),%r8
     197:       64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
     19e:       00 00 
     1a0:       48 89 44 24 08          mov    %rax,0x8(%rsp)
     1a5:       31 c0                   xor    %eax,%eax
     1a7:       48 8b 06                mov    (%rsi),%rax
     1aa:       48 89 c1                mov    %rax,%rcx
     1ad:       48 c1 e1 04             shl    $0x4,%rcx
     1b1:       83 b9 00 00 00 00 01    cmpl   $0x1,0x0(%rcx)
                        1b3: R_X86_64_32S       .bss
     1b8:       74 0e                   je     1c8 <swap_commutative+0x38>
     1ba:       4c 39 c7                cmp    %r8,%rdi
     1bd:       74 39                   je     1f8 <swap_commutative+0x68>
     1bf:       31 c0                   xor    %eax,%eax
     1c1:       eb 20                   jmp    1e3 <swap_commutative+0x53>
     1c3:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
     1c8:       4c 89 c1                mov    %r8,%rcx
     1cb:       48 c1 e1 04             shl    $0x4,%rcx
     1cf:       83 b9 00 00 00 00 01    cmpl   $0x1,0x0(%rcx)
                        1d1: R_X86_64_32S       .bss
     1d6:       74 36                   je     20e <swap_commutative+0x7e>
     1d8:       4c 89 06                mov    %r8,(%rsi)
     1db:       48 89 02                mov    %rax,(%rdx)
     1de:       b8 01 00 00 00          mov    $0x1,%eax
     1e3:       48 8b 54 24 08          mov    0x8(%rsp),%rdx
     1e8:       64 48 33 14 25 28 00    xor    %fs:0x28,%rdx
     1ef:       00 00 
     1f1:       75 16                   jne    209 <swap_commutative+0x79>
     1f3:       48 83 c4 18             add    $0x18,%rsp
     1f7:       c3                      retq   
     1f8:       48 c1 e7 04             shl    $0x4,%rdi
     1fc:       83 bf 00 00 00 00 01    cmpl   $0x1,0x0(%rdi)
                        1fe: R_X86_64_32S       .bss
     203:       75 d3                   jne    1d8 <swap_commutative+0x48>
     205:       31 c0                   xor    %eax,%eax
     207:       eb da                   jmp    1e3 <swap_commutative+0x53>
     209:       e8 00 00 00 00          callq  20e <swap_commutative+0x7e>
                        20a: R_X86_64_PC32      __stack_chk_fail-0x4
     20e:       4c 39 c7                cmp    %r8,%rdi
     211:       75 ac                   jne    1bf <swap_commutative+0x2f>
     213:       eb c3                   jmp    1d8 <swap_commutative+0x48>


r~

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 07/10] tcg: Do constant folding on double-word comparisons
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 07/10] tcg: Do constant folding on double-word comparisons Richard Henderson
@ 2012-10-10  9:45   ` Aurelien Jarno
  0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2012-10-10  9:45 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Oct 02, 2012 at 11:32:27AM -0700, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/optimize.c | 93 +++++++++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 72 insertions(+), 21 deletions(-)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 38027dc..d9251e4 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -398,6 +398,40 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
>      }
>  }
>  
> +/* Return 2 if the condition can't be simplified, and the result
> +   of the condition (0 or 1) if it can */
> +static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
> +{
> +    TCGArg al = p1[0], ah = p1[1];
> +    TCGArg bl = p2[0], bh = p2[1];
> +
> +    if (temps[bl].state == TCG_TEMP_CONST
> +        && temps[bh].state == TCG_TEMP_CONST) {
> +        uint64_t b = ((uint64_t)temps[bh].val << 32) | (uint32_t)temps[bl].val;
> +
> +        if (temps[al].state == TCG_TEMP_CONST
> +            && temps[ah].state == TCG_TEMP_CONST) {
> +            uint64_t a;
> +            a = ((uint64_t)temps[ah].val << 32) | (uint32_t)temps[al].val;
> +            return do_constant_folding_cond_64(a, b, c);
> +        }
> +        if (b == 0) {
> +            switch (c) {
> +            case TCG_COND_LTU:
> +                return 0;
> +            case TCG_COND_GEU:
> +                return 1;
> +            default:
> +                break;
> +            }
> +        }
> +    }
> +    if (temps_are_copies(al, bl) && temps_are_copies(ah, bh)) {
> +        return do_constant_folding_cond_eq(c);
> +    }
> +    return 2;
> +}
> +
>  static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
>  {
>      TCGArg a1 = *p1, a2 = *p2;
> @@ -763,43 +797,60 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>              goto do_default;
>  
>          case INDEX_op_brcond2_i32:
> -            /* Simplify LT/GE comparisons vs zero to a single compare
> -               vs the high word of the input.  */
> -            if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
> -                && temps[args[2]].state == TCG_TEMP_CONST
> -                && temps[args[3]].state == TCG_TEMP_CONST
> -                && temps[args[2]].val == 0
> -                && temps[args[3]].val == 0) {
> +            tmp = do_constant_folding_cond2(&args[0], &args[2], args[4]);
> +            if (tmp != 2) {
> +                if (tmp) {
> +                    memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
> +                    gen_opc_buf[op_index] = INDEX_op_br;
> +                    gen_args[0] = args[5];
> +                    gen_args += 1;
> +                } else {
> +                    gen_opc_buf[op_index] = INDEX_op_nop;
> +                }
> +            } else if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
> +                       && temps[args[2]].state == TCG_TEMP_CONST
> +                       && temps[args[3]].state == TCG_TEMP_CONST
> +                       && temps[args[2]].val == 0
> +                       && temps[args[3]].val == 0) {
> +                /* Simplify LT/GE comparisons vs zero to a single compare
> +                   vs the high word of the input.  */
> +                memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
>                  gen_opc_buf[op_index] = INDEX_op_brcond_i32;
>                  gen_args[0] = args[1];
>                  gen_args[1] = args[3];
>                  gen_args[2] = args[4];
>                  gen_args[3] = args[5];
>                  gen_args += 4;
> -                args += 6;
> -                memset(temps, 0, nb_temps * sizeof(struct tcg_temp_info));
> -                break;
> +            } else {
> +                goto do_default;
>              }
> -            goto do_default;
> +            args += 6;
> +            break;
>  
>          case INDEX_op_setcond2_i32:
> -            /* Simplify LT/GE comparisons vs zero to a single compare
> -               vs the high word of the input.  */
> -            if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE)
> -                && temps[args[3]].state == TCG_TEMP_CONST
> -                && temps[args[4]].state == TCG_TEMP_CONST
> -                && temps[args[3]].val == 0
> -                && temps[args[4]].val == 0) {
> +            tmp = do_constant_folding_cond2(&args[1], &args[3], args[5]);
> +            if (tmp != 2) {
> +                gen_opc_buf[op_index] = INDEX_op_movi_i32;
> +                tcg_opt_gen_movi(gen_args, args[0], tmp);
> +                gen_args += 2;
> +            } else if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE)
> +                       && temps[args[3]].state == TCG_TEMP_CONST
> +                       && temps[args[4]].state == TCG_TEMP_CONST
> +                       && temps[args[3]].val == 0
> +                       && temps[args[4]].val == 0) {
> +                /* Simplify LT/GE comparisons vs zero to a single compare
> +                   vs the high word of the input.  */
>                  gen_opc_buf[op_index] = INDEX_op_setcond_i32;
>                  gen_args[0] = args[0];
>                  gen_args[1] = args[2];
>                  gen_args[2] = args[4];
>                  gen_args[3] = args[5];
>                  gen_args += 4;
> -                args += 6;
> -                break;
> +            } else {
> +                goto do_default;
>              }
> -            goto do_default;
> +            args += 6;
> +            break;
>  
>          case INDEX_op_call:
>              nb_call_args = (args[0] >> 16) + (args[0] & 0xffff);

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 08/10] tcg: Constant fold add2 and sub2
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 08/10] tcg: Constant fold add2 and sub2 Richard Henderson
@ 2012-10-10  9:52   ` Aurelien Jarno
  0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2012-10-10  9:52 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Oct 02, 2012 at 11:32:28AM -0700, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/optimize.c | 35 +++++++++++++++++++++++++++++++++++
>  tcg/tcg-op.h   |  9 +++++++++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index d9251e4..05891ef 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -796,6 +796,41 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>              }
>              goto do_default;
>  
> +        case INDEX_op_add2_i32:
> +        case INDEX_op_sub2_i32:
> +            if (temps[args[2]].state == TCG_TEMP_CONST
> +                && temps[args[3]].state == TCG_TEMP_CONST
> +                && temps[args[4]].state == TCG_TEMP_CONST
> +                && temps[args[5]].state == TCG_TEMP_CONST) {
> +                uint32_t al = temps[args[2]].val;
> +                uint32_t ah = temps[args[3]].val;
> +                uint32_t bl = temps[args[4]].val;
> +                uint32_t bh = temps[args[5]].val;
> +                uint64_t a = ((uint64_t)ah << 32) | al;
> +                uint64_t b = ((uint64_t)bh << 32) | bl;
> +                TCGArg rl, rh;
> +
> +                if (op == INDEX_op_add2_i32) {
> +                    a += b;
> +                } else {
> +                    a -= b;
> +                }
> +
> +                /* We emit the extra nop when we emit the add2/sub2.  */
> +                assert(gen_opc_buf[op_index + 1] == INDEX_op_nop);
> +
> +                rl = args[0];
> +                rh = args[1];
> +                gen_opc_buf[op_index] = INDEX_op_movi_i32;
> +                gen_opc_buf[++op_index] = INDEX_op_movi_i32;
> +                tcg_opt_gen_movi(&gen_args[0], rl, (uint32_t)a);
> +                tcg_opt_gen_movi(&gen_args[2], rh, (uint32_t)(a >> 32));
> +                gen_args += 4;
> +                args += 6;
> +                break;
> +            }
> +            goto do_default;
> +
>          case INDEX_op_brcond2_i32:
>              tmp = do_constant_folding_cond2(&args[0], &args[2], args[4]);
>              if (tmp != 2) {
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index bd93fe4..1f5a021 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -25,6 +25,11 @@
>  
>  int gen_new_label(void);
>  
> +static inline void tcg_gen_op0(TCGOpcode opc)
> +{
> +    *gen_opc_ptr++ = opc;
> +}
> +
>  static inline void tcg_gen_op1_i32(TCGOpcode opc, TCGv_i32 arg1)
>  {
>      *gen_opc_ptr++ = opc;
> @@ -866,6 +871,8 @@ static inline void tcg_gen_add_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>      tcg_gen_op6_i32(INDEX_op_add2_i32, TCGV_LOW(ret), TCGV_HIGH(ret),
>                      TCGV_LOW(arg1), TCGV_HIGH(arg1), TCGV_LOW(arg2),
>                      TCGV_HIGH(arg2));
> +    /* Allow the optimizer room to replace add2 with two moves.  */
> +    tcg_gen_op0(INDEX_op_nop);
>  }
>  
>  static inline void tcg_gen_sub_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
> @@ -873,6 +880,8 @@ static inline void tcg_gen_sub_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>      tcg_gen_op6_i32(INDEX_op_sub2_i32, TCGV_LOW(ret), TCGV_HIGH(ret),
>                      TCGV_LOW(arg1), TCGV_HIGH(arg1), TCGV_LOW(arg2),
>                      TCGV_HIGH(arg2));
> +    /* Allow the optimizer room to replace sub2 with two moves.  */
> +    tcg_gen_op0(INDEX_op_nop);
>  }
>  
>  static inline void tcg_gen_and_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] tcg: Optimize half-dead add2/sub2
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 09/10] tcg: Optimize half-dead add2/sub2 Richard Henderson
@ 2012-10-16 23:25   ` Aurelien Jarno
  0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2012-10-16 23:25 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Oct 02, 2012 at 11:32:29AM -0700, Richard Henderson wrote:
> When an x86_64 guest is not in 64-bit mode, the high part of the 64-bit
> add is dead.  When the host is 32-bit, we can simplify to 32-bit
> arithmetic.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/tcg.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index c069e44..21c1074 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1306,8 +1306,39 @@ static void tcg_liveness_analysis(TCGContext *s)
>              break;
>          case INDEX_op_end:
>              break;
> -            /* XXX: optimize by hardcoding common cases (e.g. triadic ops) */
> +
> +        case INDEX_op_add2_i32:
> +        case INDEX_op_sub2_i32:
> +            args -= 6;
> +            nb_iargs = 4;
> +            nb_oargs = 2;
> +            /* Test if the high part of the operation is dead, but not
> +               the low part.  The result can be optimized to a simple
> +               add or sub.  This happens often for x86_64 guest when the
> +               cpu mode is set to 32 bit.  */
> +            if (dead_temps[args[1]]) {
> +                if (dead_temps[args[0]]) {
> +                    goto do_remove;
> +                }
> +                /* Create the single operation plus nop.  */
> +                if (op == INDEX_op_add2_i32) {
> +                    op = INDEX_op_add_i32;
> +                } else {
> +                    op = INDEX_op_sub_i32;
> +                }
> +                gen_opc_buf[op_index] = op;
> +                args[1] = args[2];
> +                args[2] = args[4];
> +                assert(gen_opc_buf[op_index + 1] == INDEX_op_nop);
> +                tcg_set_nop(s, gen_opc_buf + op_index + 1, args + 3, 3);
> +                /* Fall through and mark the single-word operation live.  */
> +                nb_iargs = 2;
> +                nb_oargs = 1;
> +            }
> +            goto do_not_remove;
> +
>          default:
> +            /* XXX: optimize by hardcoding common cases (e.g. triadic ops) */
>              args -= def->nb_args;
>              nb_iargs = def->nb_iargs;
>              nb_oargs = def->nb_oargs;
> @@ -1321,6 +1352,7 @@ static void tcg_liveness_analysis(TCGContext *s)
>                      if (!dead_temps[arg])
>                          goto do_not_remove;
>                  }
> +            do_remove:
>                  tcg_set_nop(s, gen_opc_buf + op_index, args, def->nb_args);
>  #ifdef CONFIG_PROFILER
>                  s->del_op_count++;

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH 10/10] tcg: Optimize mulu2
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 10/10] tcg: Optimize mulu2 Richard Henderson
@ 2012-10-16 23:25   ` Aurelien Jarno
  2012-10-17  1:09     ` Richard Henderson
  0 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2012-10-16 23:25 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Oct 02, 2012 at 11:32:30AM -0700, Richard Henderson wrote:
> Like add2, do operand ordering, constant folding, and dead operand
> elimination.  The latter applies to about 15% of all mulu2 ops during
> an x86_64 BIOS boot.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/optimize.c | 26 ++++++++++++++++++++++++++
>  tcg/tcg-op.h   |  2 ++
>  tcg/tcg.c      | 19 +++++++++++++++++++
>  3 files changed, 47 insertions(+)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 05891ef..a06c8eb 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -543,6 +543,9 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>              swap_commutative(args[0], &args[2], &args[4]);
>              swap_commutative(args[1], &args[3], &args[5]);
>              break;
> +        case INDEX_op_mulu2_i32:
> +            swap_commutative(args[0], &args[2], &args[3]);
> +            break;
>          case INDEX_op_brcond2_i32:
>              if (swap_commutative2(&args[0], &args[2])) {
>                  args[4] = tcg_swap_cond(args[4]);
> @@ -831,6 +834,29 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>              }
>              goto do_default;
>  
> +        case INDEX_op_mulu2_i32:
> +            if (temps[args[2]].state == TCG_TEMP_CONST
> +                && temps[args[3]].state == TCG_TEMP_CONST) {
> +                uint32_t a = temps[args[2]].val;
> +                uint32_t b = temps[args[3]].val;
> +                uint64_t r = (uint64_t)a * b;
> +                TCGArg rl, rh;
> +
> +                /* We emit the extra nop when we emit the mulu2.  */
> +                assert(gen_opc_buf[op_index + 1] == INDEX_op_nop);
> +
> +                rl = args[0];
> +                rh = args[1];
> +                gen_opc_buf[op_index] = INDEX_op_movi_i32;
> +                gen_opc_buf[++op_index] = INDEX_op_movi_i32;
> +                tcg_opt_gen_movi(&gen_args[0], rl, (uint32_t)r);
> +                tcg_opt_gen_movi(&gen_args[2], rh, (uint32_t)(r >> 32));
> +                gen_args += 4;
> +                args += 4;
> +                break;
> +            }
> +            goto do_default;
> +
>          case INDEX_op_brcond2_i32:
>              tmp = do_constant_folding_cond2(&args[0], &args[2], args[4]);
>              if (tmp != 2) {
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index 1f5a021..044e648 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -997,6 +997,8 @@ static inline void tcg_gen_mul_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>  
>      tcg_gen_op4_i32(INDEX_op_mulu2_i32, TCGV_LOW(t0), TCGV_HIGH(t0),
>                      TCGV_LOW(arg1), TCGV_LOW(arg2));
> +    /* Allow the optimizer room to replace mulu2 with two moves.  */
> +    tcg_gen_op0(INDEX_op_nop);
>  
>      tcg_gen_mul_i32(t1, TCGV_LOW(arg1), TCGV_HIGH(arg2));
>      tcg_gen_add_i32(TCGV_HIGH(t0), TCGV_HIGH(t0), t1);
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 21c1074..8280489 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1337,6 +1337,25 @@ static void tcg_liveness_analysis(TCGContext *s)
>              }
>              goto do_not_remove;
>  
> +        case INDEX_op_mulu2_i32:
> +            args -= 4;
> +            nb_iargs = 2;
> +            nb_oargs = 2;
> +            /* Likewise, test for the high part of the operation dead.  */
> +            if (dead_temps[args[1]]) {
> +                if (dead_temps[args[0]]) {
> +                    goto do_remove;
> +                }
> +                gen_opc_buf[op_index] = op = INDEX_op_mul_i32;

Very minor nitpick: you probably don't need to set op there.

> +                args[1] = args[2];
> +                args[2] = args[3];
> +                assert(gen_opc_buf[op_index + 1] == INDEX_op_nop);
> +                tcg_set_nop(s, gen_opc_buf + op_index + 1, args + 3, 1);
> +                /* Fall through and mark the single-word operation live.  */
> +                nb_oargs = 1;
> +            }
> +            goto do_not_remove;
> +
>          default:
>              /* XXX: optimize by hardcoding common cases (e.g. triadic ops) */
>              args -= def->nb_args;

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH 10/10] tcg: Optimize mulu2
  2012-10-16 23:25   ` Aurelien Jarno
@ 2012-10-17  1:09     ` Richard Henderson
  2012-10-17 10:58       ` Avi Kivity
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2012-10-17  1:09 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 2012-10-17 09:25, Aurelien Jarno wrote:
>> > +                gen_opc_buf[op_index] = op = INDEX_op_mul_i32;
> Very minor nitpick: you probably don't need to set op there.
> 

Perhaps not, but I prefer to keep the variables in sync as we
drop into common code...


r~


* Re: [Qemu-devel] [PATCH 10/10] tcg: Optimize mulu2
  2012-10-17  1:09     ` Richard Henderson
@ 2012-10-17 10:58       ` Avi Kivity
  0 siblings, 0 replies; 30+ messages in thread
From: Avi Kivity @ 2012-10-17 10:58 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Aurelien Jarno

On 10/17/2012 03:09 AM, Richard Henderson wrote:
> On 2012-10-17 09:25, Aurelien Jarno wrote:
>>> > +                gen_opc_buf[op_index] = op = INDEX_op_mul_i32;
>> Very minor nitpick: you probably don't need to set op there.
>> 
> 
> Perhaps not, but I prefer to keep the variables in sync as we
> drop into common code...

The compiler should recognize the dead variable anyway.  How very meta.


-- 
error compiling committee.c: too many arguments to function


* Re: [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements
  2012-10-02 18:32 [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements Richard Henderson
                   ` (9 preceding siblings ...)
  2012-10-02 18:32 ` [Qemu-devel] [PATCH 10/10] tcg: Optimize mulu2 Richard Henderson
@ 2012-10-17 16:41 ` Aurelien Jarno
  10 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2012-10-17 16:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Oct 02, 2012 at 11:32:20AM -0700, Richard Henderson wrote:
> Changes v1->v2:
> 
> * Patch 1 changes the exact swap condition.  This helps add2 for e.g.
> 
>     add2 tmp4,tmp5,tmp4,tmp5,c1,c2
> 
>   where tmp5, c1, and c2 are all input constants.  Since tmp4 is variable,
>   we cannot constant fold this.  But the existing swap condition would give
> 
> >     add2 tmp4,tmp5,tmp4,c2,c1,tmp5
> 
>   While not incorrect, we do want to prefer "adc $c2,tmp5" on i686.
> 
> * Patch 2 drops the partial constant folding for add2/sub2.  It only
>   does the operand ordering for add2.
> 
> * Patch 4 is new.  When writing the code for brcond2 et al, it did seem
>   silly to do all the gen_args[N] = args[N] copying by hand.  I think the
>   patch makes the code more readable.
> 
> * Patch 5 has the operand typo fixed that Aurelien noticed.
> 
> * Patch 8 is new, adding the extra nop into the opcode stream that
>   was suggested on the list.  With this we fully constant fold add2/sub2.
> 
> * Patch 9 is new.  While looking at dumps from an x86_64 BIOS boot, I
>   noticed that sequences of push/pop insns leave the high part of %rsp
>   dead, as does in general any 32-bit addition in which the high part
>   isn't "consumed" by cc_dst.
> 
> * Patch 10 is new, treating mulu2 similarly to add2.  It triggers frequently
>   during the boot of seabios, and should not be expensive.
> 
> 
> r~
> 
> 
> Richard Henderson (10):
>   tcg: Split out swap_commutative as a subroutine
>   tcg: Canonicalize add2 operand ordering
>   tcg: Swap commutative double-word comparisons
>   tcg: Use common code when failing to optimize
>   tcg: Optimize double-word comparisons against zero
>   tcg: Split out subroutines from do_constant_folding_cond
>   tcg: Do constant folding on double-word comparisons
>   tcg: Constant fold add2 and sub2
>   tcg: Optimize half-dead add2/sub2
>   tcg: Optimize mulu2
> 
>  tcg/optimize.c | 465 ++++++++++++++++++++++++++++++++++++++-------------------
>  tcg/tcg-op.h   |  11 ++
>  tcg/tcg.c      |  53 ++++++-
>  3 files changed, 377 insertions(+), 152 deletions(-)
> 

All applied, after fixing the conflicts in patch 6.


-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net



Thread overview: 30+ messages
2012-10-02 18:32 [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements Richard Henderson
2012-10-02 18:32 ` [Qemu-devel] [PATCH 01/10] tcg: Split out swap_commutative as a subroutine Richard Henderson
2012-10-09 15:13   ` Aurelien Jarno
2012-10-09 15:23     ` Richard Henderson
2012-10-09 15:31       ` Aurelien Jarno
2012-10-09 16:40         ` Richard Henderson
2012-10-02 18:32 ` [Qemu-devel] [PATCH 02/10] tcg: Canonicalize add2 operand ordering Richard Henderson
2012-10-09 15:14   ` Aurelien Jarno
2012-10-02 18:32 ` [Qemu-devel] [PATCH 03/10] tcg: Swap commutative double-word comparisons Richard Henderson
2012-10-09 15:16   ` Aurelien Jarno
2012-10-09 15:31     ` Richard Henderson
2012-10-09 15:48       ` Aurelien Jarno
2012-10-02 18:32 ` [Qemu-devel] [PATCH 04/10] tcg: Use common code when failing to optimize Richard Henderson
2012-10-09 15:25   ` Aurelien Jarno
2012-10-09 15:33     ` Richard Henderson
2012-10-02 18:32 ` [Qemu-devel] [PATCH 05/10] tcg: Optimize double-word comparisons against zero Richard Henderson
2012-10-09 16:32   ` Aurelien Jarno
2012-10-02 18:32 ` [Qemu-devel] [PATCH 06/10] tcg: Split out subroutines from do_constant_folding_cond Richard Henderson
2012-10-09 16:33   ` Aurelien Jarno
2012-10-02 18:32 ` [Qemu-devel] [PATCH 07/10] tcg: Do constant folding on double-word comparisons Richard Henderson
2012-10-10  9:45   ` Aurelien Jarno
2012-10-02 18:32 ` [Qemu-devel] [PATCH 08/10] tcg: Constant fold add2 and sub2 Richard Henderson
2012-10-10  9:52   ` Aurelien Jarno
2012-10-02 18:32 ` [Qemu-devel] [PATCH 09/10] tcg: Optimize half-dead add2/sub2 Richard Henderson
2012-10-16 23:25   ` Aurelien Jarno
2012-10-02 18:32 ` [Qemu-devel] [PATCH 10/10] tcg: Optimize mulu2 Richard Henderson
2012-10-16 23:25   ` Aurelien Jarno
2012-10-17  1:09     ` Richard Henderson
2012-10-17 10:58       ` Avi Kivity
2012-10-17 16:41 ` [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements Aurelien Jarno
