* [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG
@ 2011-05-20 12:39 Kirill Batuzov
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 1/6] Add TCG optimizations stub Kirill Batuzov
                   ` (7 more replies)
  0 siblings, 8 replies; 34+ messages in thread
From: Kirill Batuzov @ 2011-05-20 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mj.mccormack, zhur

This series implements some basic machine-independent optimizations.  They
simplify the code and allow liveness analysis to do its work better.

Suppose we have the following ARM code:

 movw    r12, #0xb6db
 movt    r12, #0xdb6d

In TCG before optimizations we'll have:

 movi_i32 tmp8,$0xb6db
 mov_i32 r12,tmp8
 mov_i32 tmp8,r12
 ext16u_i32 tmp8,tmp8
 movi_i32 tmp9,$0xdb6d0000
 or_i32 tmp8,tmp8,tmp9
 mov_i32 r12,tmp8

And after optimizations we'll have this:

 movi_i32 r12,$0xdb6db6db

Here are performance evaluation results for the SPEC CPU2000 integer tests in
user-mode emulation on an x86_64 host.  There were 5 runs of each test on the
reference data set.  The tables below show the runtime in seconds for all
these runs.

ARM guest without optimizations:
Test name       #1       #2       #3       #4       #5    Median
164.gzip    1403.612 1403.499 1403.52  1208.55  1403.583 1403.52
175.vpr     1237.729 1238.008 1238.019 1176.852 1237.902 1237.902
176.gcc      929.511  928.867  929.048  928.927  928.792  928.927
181.mcf      196.371  196.335  196.172  197.057  196.196  196.335
186.crafty  1547.101 1547.293 1547.133 1547.037 1547.044 1547.101
197.parser  3804.336 3804.429 3804.412 3804.45  3804.301 3804.412
252.eon     2760.414 2760.45  2473.608 2760.606 2760.216 2760.414
253.perlbmk 2557.966 2558.971 2559.731 2479.299 2556.835 2557.966
256.bzip2   1296.412 1296.215 1296.63  1296.489 1296.092 1296.412
300.twolf   2919.496 2919.444 2919.529 2919.384 2919.404 2919.444
      
ARM guest with optimizations:
Test name       #1       #2       #3       #4       #5    Median    Gain
164.gzip    1345.416 1401.741 1377.022 1401.737 1401.773 1401.737   0.13%
175.vpr     1116.75  1243.213 1243.32  1243.316 1243.144 1243.213  -0.43%
176.gcc      897.045  909.568  850.1    909.65   909.57   909.568   2.08%
181.mcf      199.058  198.717  198.28   198.866  197.955  198.717  -1.21%
186.crafty  1525.667 1526.663 1525.981 1525.995 1526.164 1525.995   1.36%
197.parser  3749.453 3749.522 3749.413 3749.5   3749.484 3749.484   1.44%
252.eon     2730.593 2746.525 2746.495 2746.493 2746.62  2746.495   0.50%
253.perlbmk 2577.341 2521.057 2578.461 2578.721 2581.313 2578.461  -0.80%
256.bzip2   1184.498 1190.116 1294.352 1294.554 1294.637 1294.352   0.16%
300.twolf   2894.264 2894.133 2894.398 2894.103 2894.146 2894.146   0.87%


x86_64 guest without optimizations:
Test name       #1       #2       #3       #4       #5    Median
164.gzip     858.118  858.151  858.09   858.139  858.122  858.122
175.vpr      956.361  956.465  956.521  956.438  956.705  956.465
176.gcc      647.275  647.465  647.186  647.294  647.268  647.275
181.mcf      219.239  221.964  220.244  220.74   220.559  220.559
186.crafty  1128.027 1128.071 1128.028 1128.115 1128.123 1128.071
197.parser  1815.669 1815.651 1815.669 1815.711 1815.759 1815.669
253.perlbmk 1777.143 1777.749 1667.508 1777.051 1778.391 1777.143
254.gap     1062.808 1062.758 1062.801 1063.099 1062.859 1062.808
255.vortex  1930.693 1930.706 1930.579 1930.7   1930.566 1930.693
256.bzip2   1014.566 1014.702 1014.6   1014.274 1014.421 1014.566
300.twolf   1342.653 1342.759 1344.092 1342.641 1342.794 1342.759
     
x86_64 guest with optimizations:
Test name       #1       #2       #3       #4       #5    Median    Gain
164.gzip     857.485  857.457  857.475  857.509  857.507  857.485   0.07%
175.vpr      963.255  962.972  963.27   963.124  963.686  963.255  -0.71%
176.gcc      644.123  644.055  644.145  643.818  635.773  644.055   0.50%
181.mcf      216.215  217.549  218.744  216.437  217.83   217.549   1.36%
186.crafty  1128.873 1128.792 1128.871 1128.816 1128.823 1128.823  -0.07%
197.parser  1814.626 1814.503 1814.552 1814.602 1814.748 1814.602   0.06%
253.perlbmk 1758.056 1751.963 1753.267 1765.27  1759.828 1758.056   1.07%
254.gap     1064.702 1064.712 1064.629 1064.657 1064.645 1064.657  -0.17%
255.vortex  1760.638 1936.387 1937.871 1937.471 1760.496 1936.387  -0.29%
256.bzip2   1007.658 1007.682 1007.316 1007.982 1007.747 1007.682   0.68%
300.twolf   1334.139 1333.791 1333.795 1334.147 1333.732 1333.795   0.67%

The ARM guest versions of 254.gap and 255.vortex and the x86_64 guest version
of 252.eon do not work under QEMU for some unrelated reason.

Kirill Batuzov (6):
  Add TCG optimizations stub
  Add copy and constant propagation.
  Do constant folding for basic arithmetic operations.
  Do constant folding for boolean operations.
  Do constant folding for shift operations.
  Do constant folding for unary operations.

 Makefile.target |    2 +-
 tcg/optimize.c  |  539 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.c       |    6 +
 tcg/tcg.h       |    3 +
 4 files changed, 549 insertions(+), 1 deletions(-)
 create mode 100644 tcg/optimize.c

-- 
1.7.4.1


* [Qemu-devel] [PATCH 1/6] Add TCG optimizations stub
  2011-05-20 12:39 [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
@ 2011-05-20 12:39 ` Kirill Batuzov
  2011-05-20 18:12   ` Richard Henderson
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 2/6] Add copy and constant propagation Kirill Batuzov
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 34+ messages in thread
From: Kirill Batuzov @ 2011-05-20 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mj.mccormack, zhur

Add the file tcg/optimize.c to hold TCG optimizations.  The function
tcg_optimize is called from tcg_gen_code_common; it in turn calls other
functions performing specific optimizations.  A stub for constant folding
is added.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 Makefile.target |    2 +-
 tcg/optimize.c  |   87 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.c       |    6 ++++
 tcg/tcg.h       |    3 ++
 4 files changed, 97 insertions(+), 1 deletions(-)
 create mode 100644 tcg/optimize.c

diff --git a/Makefile.target b/Makefile.target
index 21f864a..5a61778 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -70,7 +70,7 @@ all: $(PROGS) stap
 #########################################################
 # cpu emulator library
 libobj-y = exec.o translate-all.o cpu-exec.o translate.o
-libobj-y += tcg/tcg.o
+libobj-y += tcg/tcg.o tcg/optimize.o
 libobj-$(CONFIG_SOFTFLOAT) += fpu/softfloat.o
 libobj-$(CONFIG_NOSOFTFLOAT) += fpu/softfloat-native.o
 libobj-y += op_helper.o helper.o
diff --git a/tcg/optimize.c b/tcg/optimize.c
new file mode 100644
index 0000000..cf31d18
--- /dev/null
+++ b/tcg/optimize.c
@@ -0,0 +1,87 @@
+/*
+ * Optimizations for Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2010 Samsung Electronics.
+ * Contributed by Kirill Batuzov <batuzovk@ispras.ru>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "config.h"
+
+#include <stdlib.h>
+#include <stdio.h>
+
+#include "qemu-common.h"
+#include "tcg-op.h"
+
+static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
+                                    TCGArg *args, TCGOpDef *tcg_op_defs)
+{
+    int i, nb_ops, op_index, op, nb_temps, nb_globals;
+    const TCGOpDef *def;
+    TCGArg *gen_args;
+
+    nb_temps = s->nb_temps;
+    nb_globals = s->nb_globals;
+
+    nb_ops = tcg_opc_ptr - gen_opc_buf;
+    gen_args = args;
+    for (op_index = 0; op_index < nb_ops; op_index++) {
+        op = gen_opc_buf[op_index];
+        def = &tcg_op_defs[op];
+        switch (op) {
+        case INDEX_op_call:
+        case INDEX_op_jmp:
+        case INDEX_op_br:
+        case INDEX_op_brcond_i32:
+        case INDEX_op_set_label:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_brcond_i64:
+#endif
+            i = (op == INDEX_op_call) ?
+                (args[0] >> 16) + (args[0] & 0xffff) + 3 :
+                def->nb_args;
+            while (i) {
+                *gen_args = *args;
+                args++;
+                gen_args++;
+                i--;
+            }
+            break;
+        default:
+            for (i = 0; i < def->nb_args; i++) {
+                gen_args[i] = args[i];
+            }
+            args += def->nb_args;
+            gen_args += def->nb_args;
+            break;
+        }
+    }
+
+    return gen_args;
+}
+
+TCGArg *tcg_optimize(TCGContext *s, uint16_t *tcg_opc_ptr,
+        TCGArg *args, TCGOpDef *tcg_op_defs)
+{
+    TCGArg *res;
+    res = tcg_constant_folding(s, tcg_opc_ptr, args, tcg_op_defs);
+    return res;
+}
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 8748c05..6fb4dd6 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -24,6 +24,7 @@
 
 /* define it to use liveness analysis (better code) */
 #define USE_LIVENESS_ANALYSIS
+#define USE_TCG_OPTIMIZATIONS
 
 #include "config.h"
 
@@ -2018,6 +2019,11 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
     }
 #endif
 
+#ifdef USE_TCG_OPTIMIZATIONS
+    gen_opparam_ptr =
+        tcg_optimize(s, gen_opc_ptr, gen_opparam_buf, tcg_op_defs);
+#endif
+
 #ifdef CONFIG_PROFILER
     s->la_time -= profile_getclock();
 #endif
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 3fab8d6..a85a8d7 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -486,6 +486,9 @@ void tcg_gen_callN(TCGContext *s, TCGv_ptr func, unsigned int flags,
 void tcg_gen_shifti_i64(TCGv_i64 ret, TCGv_i64 arg1,
                         int c, int right, int arith);
 
+TCGArg *tcg_optimize(TCGContext *s, uint16_t *tcg_opc_ptr, TCGArg *args,
+                     TCGOpDef *tcg_op_def);
+
 /* only used for debugging purposes */
 void tcg_register_helper(void *func, const char *name);
 const char *tcg_helper_get_name(TCGContext *s, void *func);
-- 
1.7.4.1


* [Qemu-devel] [PATCH 2/6] Add copy and constant propagation.
  2011-05-20 12:39 [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 1/6] Add TCG optimizations stub Kirill Batuzov
@ 2011-05-20 12:39 ` Kirill Batuzov
  2011-05-20 18:22   ` Richard Henderson
  2011-05-20 19:41   ` Aurelien Jarno
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 3/6] Do constant folding for basic arithmetic operations Kirill Batuzov
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 34+ messages in thread
From: Kirill Batuzov @ 2011-05-20 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mj.mccormack, zhur

Make tcg_constant_folding do copy and constant propagation.  This is
preparatory work before the actual constant folding.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/optimize.c |  123 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 123 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index cf31d18..a761c51 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -31,22 +31,139 @@
 #include "qemu-common.h"
 #include "tcg-op.h"
 
+typedef enum {
+    TCG_TEMP_UNDEF = 0,
+    TCG_TEMP_CONST,
+    TCG_TEMP_COPY,
+    TCG_TEMP_ANY
+} tcg_temp_state;
+
+static int mov_to_movi(int op)
+{
+    switch (op) {
+    case INDEX_op_mov_i32: return INDEX_op_movi_i32;
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_mov_i64: return INDEX_op_movi_i64;
+#endif
+    default:
+        fprintf(stderr, "Unrecognized operation %d in mov_to_movi.\n", op);
+        tcg_abort();
+    }
+}
+
+/* Reset TEMP's state to TCG_TEMP_ANY.  If TEMP was a representative of some
+   class of equivalent temps, a new representative should be chosen in this
+   class. */
+static void reset_temp(tcg_temp_state *state, tcg_target_ulong *vals,
+                       TCGArg temp, int nb_temps, int nb_globals)
+{
+    int i;
+    TCGArg new_base;
+    new_base = (TCGArg)-1;
+    for (i = nb_globals; i < nb_temps; i++) {
+        if (state[i] == TCG_TEMP_COPY && vals[i] == temp) {
+            if (new_base == ((TCGArg)-1)) {
+                new_base = (TCGArg)i;
+                state[i] = TCG_TEMP_ANY;
+            } else {
+                vals[i] = new_base;
+            }
+        }
+    }
+    for (i = 0; i < nb_globals; i++) {
+        if (state[i] == TCG_TEMP_COPY && vals[i] == temp) {
+            if (new_base == ((TCGArg)-1)) {
+                state[i] = TCG_TEMP_ANY;
+            } else {
+                vals[i] = new_base;
+            }
+        }
+    }
+    state[temp] = TCG_TEMP_ANY;
+}
+
+/* Propagate constants and copies, fold constant expressions. */
 static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                                     TCGArg *args, TCGOpDef *tcg_op_defs)
 {
     int i, nb_ops, op_index, op, nb_temps, nb_globals;
     const TCGOpDef *def;
     TCGArg *gen_args;
+    /* Array VALS has an element for each temp.
+       If this temp holds a constant then its value is kept in VALS' element.
+       If this temp is a copy of other ones then this equivalence class'
+       representative is kept in VALS' element.
+       If this temp is neither copy nor constant then corresponding VALS'
+       element is unused. */
+    static tcg_target_ulong vals[TCG_MAX_TEMPS];
+    static tcg_temp_state state[TCG_MAX_TEMPS];
 
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
+    memset(state, 0, nb_temps * sizeof(tcg_temp_state));
 
     nb_ops = tcg_opc_ptr - gen_opc_buf;
     gen_args = args;
     for (op_index = 0; op_index < nb_ops; op_index++) {
         op = gen_opc_buf[op_index];
         def = &tcg_op_defs[op];
+        /* Do copy propagation */
+        if (op != INDEX_op_call) {
+            for (i = def->nb_oargs; i < def->nb_oargs + def->nb_iargs; i++) {
+                if (state[args[i]] == TCG_TEMP_COPY
+                    && !(def->args_ct[i].ct & TCG_CT_IALIAS)
+                    && (def->args_ct[i].ct & TCG_CT_REG)) {
+                    args[i] = vals[args[i]];
+                }
+            }
+        }
+
+        /* Propagate constants through copy operations and do constant
+           folding.  Constants will be substituted into arguments by the
+           register allocator where needed and possible.  Also detect copies. */
         switch (op) {
+        case INDEX_op_mov_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_mov_i64:
+#endif
+            if ((state[args[1]] == TCG_TEMP_COPY
+                && vals[args[1]] == args[0])
+                || args[0] == args[1]) {
+                args += 2;
+                gen_opc_buf[op_index] = INDEX_op_nop;
+                break;
+            }
+            if (state[args[1]] != TCG_TEMP_CONST) {
+                reset_temp(state, vals, args[0], nb_temps, nb_globals);
+                if (args[1] >= s->nb_globals) {
+                    state[args[0]] = TCG_TEMP_COPY;
+                    vals[args[0]] = args[1];
+                }
+                gen_args[0] = args[0];
+                gen_args[1] = args[1];
+                gen_args += 2;
+                args += 2;
+                break;
+            } else {
+                /* Source argument is constant.  Rewrite the operation and
+                   let movi case handle it. */
+                op = mov_to_movi(op);
+                gen_opc_buf[op_index] = op;
+                args[1] = vals[args[1]];
+                /* fallthrough */
+            }
+        case INDEX_op_movi_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_movi_i64:
+#endif
+            reset_temp(state, vals, args[0], nb_temps, nb_globals);
+            state[args[0]] = TCG_TEMP_CONST;
+            vals[args[0]] = args[1];
+            gen_args[0] = args[0];
+            gen_args[1] = args[1];
+            gen_args += 2;
+            args += 2;
+            break;
         case INDEX_op_call:
         case INDEX_op_jmp:
         case INDEX_op_br:
@@ -55,6 +172,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
 #if TCG_TARGET_REG_BITS == 64
         case INDEX_op_brcond_i64:
 #endif
+            memset(state, 0, nb_temps * sizeof(tcg_temp_state));
             i = (op == INDEX_op_call) ?
                 (args[0] >> 16) + (args[0] & 0xffff) + 3 :
                 def->nb_args;
@@ -66,6 +184,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             }
             break;
         default:
+            /* Default case: we know nothing about the operation, so no
+               propagation is done.  We only trash the output args.  */
+            for (i = 0; i < def->nb_oargs; i++) {
+                reset_temp(state, vals, args[i], nb_temps, nb_globals);
+            }
             for (i = 0; i < def->nb_args; i++) {
                 gen_args[i] = args[i];
             }
-- 
1.7.4.1


* [Qemu-devel] [PATCH 3/6] Do constant folding for basic arithmetic operations.
  2011-05-20 12:39 [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 1/6] Add TCG optimizations stub Kirill Batuzov
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 2/6] Add copy and constant propagation Kirill Batuzov
@ 2011-05-20 12:39 ` Kirill Batuzov
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 4/6] Do constant folding for boolean operations Kirill Batuzov
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 34+ messages in thread
From: Kirill Batuzov @ 2011-05-20 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mj.mccormack, zhur

Perform actual constant folding for ADD, SUB and MUL operations.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/optimize.c |  102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 102 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index a761c51..4073f05 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -82,6 +82,79 @@ static void reset_temp(tcg_temp_state *state, tcg_target_ulong *vals,
     state[temp] = TCG_TEMP_ANY;
 }
 
+static int op_bits(int op)
+{
+    switch (op) {
+    case INDEX_op_mov_i32:
+    case INDEX_op_add_i32:
+    case INDEX_op_sub_i32:
+    case INDEX_op_mul_i32:
+        return 32;
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_mov_i64:
+    case INDEX_op_add_i64:
+    case INDEX_op_sub_i64:
+    case INDEX_op_mul_i64:
+        return 64;
+#endif
+    default:
+        fprintf(stderr, "Unrecognized operation %d in op_bits.\n", op);
+        tcg_abort();
+    }
+}
+
+static int op_to_movi(int op)
+{
+    if (op_bits(op) == 32) {
+        return INDEX_op_movi_i32;
+    }
+#if TCG_TARGET_REG_BITS == 64
+    if (op_bits(op) == 64) {
+        return INDEX_op_movi_i64;
+    }
+#endif
+    tcg_abort();
+}
+
+static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
+{
+    switch (op) {
+    case INDEX_op_add_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_add_i64:
+#endif
+        return x + y;
+
+    case INDEX_op_sub_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_sub_i64:
+#endif
+        return x - y;
+
+    case INDEX_op_mul_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_mul_i64:
+#endif
+        return x * y;
+
+    default:
+        fprintf(stderr,
+                "Unrecognized operation %d in do_constant_folding.\n", op);
+        tcg_abort();
+    }
+}
+
+static TCGArg do_constant_folding(int op, TCGArg x, TCGArg y)
+{
+    TCGArg res = do_constant_folding_2(op, x, y);
+#if TCG_TARGET_REG_BITS == 64
+    if (op_bits(op) == 32) {
+        res &= 0xffffffff;
+    }
+#endif
+    return res;
+}
+
 /* Propagate constants and copies, fold constant expressions. */
 static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
                                     TCGArg *args, TCGOpDef *tcg_op_defs)
@@ -164,6 +237,35 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             gen_args += 2;
             args += 2;
             break;
+        case INDEX_op_add_i32:
+        case INDEX_op_sub_i32:
+        case INDEX_op_mul_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_add_i64:
+        case INDEX_op_sub_i64:
+        case INDEX_op_mul_i64:
+#endif
+            if (state[args[1]] == TCG_TEMP_CONST
+                && state[args[2]] == TCG_TEMP_CONST) {
+                gen_opc_buf[op_index] = op_to_movi(op);
+                gen_args[0] = args[0];
+                gen_args[1] =
+                    do_constant_folding(op, vals[args[1]], vals[args[2]]);
+                reset_temp(state, vals, gen_args[0], nb_temps, nb_globals);
+                state[gen_args[0]] = TCG_TEMP_CONST;
+                vals[gen_args[0]] = gen_args[1];
+                gen_args += 2;
+                args += 3;
+                break;
+            } else {
+                reset_temp(state, vals, args[0], nb_temps, nb_globals);
+                gen_args[0] = args[0];
+                gen_args[1] = args[1];
+                gen_args[2] = args[2];
+                gen_args += 3;
+                args += 3;
+                break;
+            }
         case INDEX_op_call:
         case INDEX_op_jmp:
         case INDEX_op_br:
-- 
1.7.4.1


* [Qemu-devel] [PATCH 4/6] Do constant folding for boolean operations.
  2011-05-20 12:39 [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
                   ` (2 preceding siblings ...)
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 3/6] Do constant folding for basic arithmetic operations Kirill Batuzov
@ 2011-05-20 12:39 ` Kirill Batuzov
  2011-05-20 18:45   ` Richard Henderson
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 5/6] Do constant folding for shift operations Kirill Batuzov
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 34+ messages in thread
From: Kirill Batuzov @ 2011-05-20 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mj.mccormack, zhur

Perform constant folding for AND, OR, XOR operations.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/optimize.c |   58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 4073f05..a02d5c1 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -38,6 +38,13 @@ typedef enum {
     TCG_TEMP_ANY
 } tcg_temp_state;
 
+const int mov_opc[] = {
+    INDEX_op_mov_i32,
+#if TCG_TARGET_REG_BITS == 64
+    INDEX_op_mov_i64,
+#endif
+};
+
 static int mov_to_movi(int op)
 {
     switch (op) {
@@ -89,12 +96,18 @@ static int op_bits(int op)
     case INDEX_op_add_i32:
     case INDEX_op_sub_i32:
     case INDEX_op_mul_i32:
+    case INDEX_op_and_i32:
+    case INDEX_op_or_i32:
+    case INDEX_op_xor_i32:
         return 32;
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_mov_i64:
     case INDEX_op_add_i64:
     case INDEX_op_sub_i64:
     case INDEX_op_mul_i64:
+    case INDEX_op_and_i64:
+    case INDEX_op_or_i64:
+    case INDEX_op_xor_i64:
         return 64;
 #endif
     default:
@@ -137,6 +150,24 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
 #endif
         return x * y;
 
+    case INDEX_op_and_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_and_i64:
+#endif
+        return x & y;
+
+    case INDEX_op_or_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_or_i64:
+#endif
+        return x | y;
+
+    case INDEX_op_xor_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_xor_i64:
+#endif
+        return x ^ y;
+
     default:
         fprintf(stderr,
                 "Unrecognized operation %d in do_constant_folding.\n", op);
@@ -237,10 +268,37 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             gen_args += 2;
             args += 2;
             break;
+        case INDEX_op_or_i32:
+        case INDEX_op_and_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_and_i64:
+        case INDEX_op_or_i64:
+#endif
+            if (args[1] == args[2]) {
+                if (args[1] == args[0]) {
+                    args += 3;
+                    gen_opc_buf[op_index] = INDEX_op_nop;
+                } else {
+                    reset_temp(state, vals, args[0], nb_temps, nb_globals);
+                    if (args[1] >= s->nb_globals) {
+                        state[args[0]] = TCG_TEMP_COPY;
+                        vals[args[0]] = args[1];
+                    }
+                    gen_opc_buf[op_index] = mov_opc[op_bits(op) / 32 - 1];
+                    gen_args[0] = args[0];
+                    gen_args[1] = args[1];
+                    gen_args += 2;
+                    args += 3;
+                }
+                break;
+            }
+            /* Proceed with default binary operation handling */
+        case INDEX_op_xor_i32:
         case INDEX_op_add_i32:
         case INDEX_op_sub_i32:
         case INDEX_op_mul_i32:
 #if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_xor_i64:
         case INDEX_op_add_i64:
         case INDEX_op_sub_i64:
         case INDEX_op_mul_i64:
-- 
1.7.4.1


* [Qemu-devel] [PATCH 5/6] Do constant folding for shift operations.
  2011-05-20 12:39 [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
                   ` (3 preceding siblings ...)
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 4/6] Do constant folding for boolean operations Kirill Batuzov
@ 2011-05-20 12:39 ` Kirill Batuzov
  2011-05-20 18:37   ` Richard Henderson
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 6/6] Do constant folding for unary operations Kirill Batuzov
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 34+ messages in thread
From: Kirill Batuzov @ 2011-05-20 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mj.mccormack, zhur

Perform constant folding for SHR, SHL, SAR, ROTR and ROTL operations.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/optimize.c |   87 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 87 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index a02d5c1..b6b0dc4 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -99,6 +99,11 @@ static int op_bits(int op)
     case INDEX_op_and_i32:
     case INDEX_op_or_i32:
     case INDEX_op_xor_i32:
+    case INDEX_op_shl_i32:
+    case INDEX_op_shr_i32:
+    case INDEX_op_sar_i32:
+    case INDEX_op_rotl_i32:
+    case INDEX_op_rotr_i32:
         return 32;
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_mov_i64:
@@ -108,6 +113,11 @@ static int op_bits(int op)
     case INDEX_op_and_i64:
     case INDEX_op_or_i64:
     case INDEX_op_xor_i64:
+    case INDEX_op_shl_i64:
+    case INDEX_op_shr_i64:
+    case INDEX_op_sar_i64:
+    case INDEX_op_rotl_i64:
+    case INDEX_op_rotr_i64:
         return 64;
 #endif
     default:
@@ -131,6 +141,7 @@ static int op_to_movi(int op)
 
 static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
 {
+    TCGArg r;
     switch (op) {
     case INDEX_op_add_i32:
 #if TCG_TARGET_REG_BITS == 64
@@ -168,6 +179,72 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
 #endif
         return x ^ y;
 
+    case INDEX_op_shl_i32:
+#if TCG_TARGET_REG_BITS == 64
+        y &= 0xffffffff;
+    case INDEX_op_shl_i64:
+#endif
+        return x << y;
+
+    case INDEX_op_shr_i32:
+#if TCG_TARGET_REG_BITS == 64
+        x &= 0xffffffff;
+        y &= 0xffffffff;
+    case INDEX_op_shr_i64:
+#endif
+        /* Assuming TCGArg to be unsigned */
+        return x >> y;
+
+    case INDEX_op_sar_i32:
+#if TCG_TARGET_REG_BITS == 64
+        x &= 0xffffffff;
+        y &= 0xffffffff;
+#endif
+        r = x & 0x80000000;
+        x &= ~0x80000000;
+        x >>= y;
+        r |= r - (r >> y);
+        x |= r;
+        return x;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_sar_i64:
+        r = x & 0x8000000000000000ULL;
+        x &= ~0x8000000000000000ULL;
+        x >>= y;
+        r |= r - (r >> y);
+        x |= r;
+        return x;
+#endif
+
+    case INDEX_op_rotr_i32:
+#if TCG_TARGET_REG_BITS == 64
+        x &= 0xffffffff;
+        y &= 0xffffffff;
+#endif
+        x = (x << (32 - y)) | (x >> y);
+        return x;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_rotr_i64:
+        x = (x << (64 - y)) | (x >> y);
+        return x;
+#endif
+
+    case INDEX_op_rotl_i32:
+#if TCG_TARGET_REG_BITS == 64
+        x &= 0xffffffff;
+        y &= 0xffffffff;
+#endif
+        x = (x << y) | (x >> (32 - y));
+        return x;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_rotl_i64:
+        x = (x << y) | (x >> (64 - y));
+        return x;
+#endif
+
     default:
         fprintf(stderr,
                 "Unrecognized operation %d in do_constant_folding.\n", op);
@@ -297,11 +374,21 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
         case INDEX_op_add_i32:
         case INDEX_op_sub_i32:
         case INDEX_op_mul_i32:
+        case INDEX_op_shl_i32:
+        case INDEX_op_shr_i32:
+        case INDEX_op_sar_i32:
+        case INDEX_op_rotl_i32:
+        case INDEX_op_rotr_i32:
 #if TCG_TARGET_REG_BITS == 64
         case INDEX_op_xor_i64:
         case INDEX_op_add_i64:
         case INDEX_op_sub_i64:
         case INDEX_op_mul_i64:
+        case INDEX_op_shl_i64:
+        case INDEX_op_shr_i64:
+        case INDEX_op_sar_i64:
+        case INDEX_op_rotl_i64:
+        case INDEX_op_rotr_i64:
 #endif
             if (state[args[1]] == TCG_TEMP_CONST
                 && state[args[2]] == TCG_TEMP_CONST) {
-- 
1.7.4.1


* [Qemu-devel] [PATCH 6/6] Do constant folding for unary operations.
  2011-05-20 12:39 [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
                   ` (4 preceding siblings ...)
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 5/6] Do constant folding for shift operations Kirill Batuzov
@ 2011-05-20 12:39 ` Kirill Batuzov
  2011-05-20 18:39   ` Richard Henderson
  2011-05-20 17:50 ` [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG Richard Henderson
  2011-05-20 19:35 ` Aurelien Jarno
  7 siblings, 1 reply; 34+ messages in thread
From: Kirill Batuzov @ 2011-05-20 12:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mj.mccormack, zhur

Perform constant folding for NOT and EXT{8,16,32}{S,U} operations.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/optimize.c |   82 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 82 insertions(+), 0 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index b6b0dc4..bda469a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -104,6 +104,11 @@ static int op_bits(int op)
     case INDEX_op_sar_i32:
     case INDEX_op_rotl_i32:
     case INDEX_op_rotr_i32:
+    case INDEX_op_not_i32:
+    case INDEX_op_ext8s_i32:
+    case INDEX_op_ext16s_i32:
+    case INDEX_op_ext8u_i32:
+    case INDEX_op_ext16u_i32:
         return 32;
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_mov_i64:
@@ -118,6 +123,13 @@ static int op_bits(int op)
     case INDEX_op_sar_i64:
     case INDEX_op_rotl_i64:
     case INDEX_op_rotr_i64:
+    case INDEX_op_not_i64:
+    case INDEX_op_ext8s_i64:
+    case INDEX_op_ext16s_i64:
+    case INDEX_op_ext32s_i64:
+    case INDEX_op_ext8u_i64:
+    case INDEX_op_ext16u_i64:
+    case INDEX_op_ext32u_i64:
         return 64;
 #endif
     default:
@@ -245,6 +257,44 @@ static TCGArg do_constant_folding_2(int op, TCGArg x, TCGArg y)
         return x;
 #endif
 
+    case INDEX_op_not_i32:
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_not_i64:
+#endif
+        return ~x;
+
+    case INDEX_op_ext8s_i32:
+        return x & (1 << 7) ? x | ~0xff : x & 0xff;
+
+    case INDEX_op_ext16s_i32:
+        return x & (1 << 15) ? x | ~0xffff : x & 0xffff;
+
+    case INDEX_op_ext8u_i32:
+        return x & 0xff;
+
+    case INDEX_op_ext16u_i32:
+        return x & 0xffff;
+
+#if TCG_TARGET_REG_BITS == 64
+    case INDEX_op_ext8s_i64:
+        return x & (1 << 7) ? x | ~0xffULL : x & 0xff;
+
+    case INDEX_op_ext16s_i64:
+        return x & (1 << 15) ? x | ~0xffffULL : x & 0xffff;
+
+    case INDEX_op_ext32s_i64:
+        return x & (1U << 31) ? x | ~0xffffffffULL : x & 0xffffffff;
+
+    case INDEX_op_ext8u_i64:
+        return x & 0xff;
+
+    case INDEX_op_ext16u_i64:
+        return x & 0xffff;
+
+    case INDEX_op_ext32u_i64:
+        return x & 0xffffffff;
+#endif
+
     default:
         fprintf(stderr,
                 "Unrecognized operation %d in do_constant_folding.\n", op);
@@ -345,6 +395,38 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
             gen_args += 2;
             args += 2;
             break;
+        case INDEX_op_not_i32:
+        case INDEX_op_ext8s_i32:
+        case INDEX_op_ext16s_i32:
+        case INDEX_op_ext8u_i32:
+        case INDEX_op_ext16u_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_not_i64:
+        case INDEX_op_ext8s_i64:
+        case INDEX_op_ext16s_i64:
+        case INDEX_op_ext32s_i64:
+        case INDEX_op_ext8u_i64:
+        case INDEX_op_ext16u_i64:
+        case INDEX_op_ext32u_i64:
+#endif
+            if (state[args[1]] == TCG_TEMP_CONST) {
+                gen_opc_buf[op_index] = op_to_movi(op);
+                gen_args[0] = args[0];
+                gen_args[1] = do_constant_folding(op, vals[args[1]], 0);
+                reset_temp(state, vals, gen_args[0], nb_temps, nb_globals);
+                state[gen_args[0]] = TCG_TEMP_CONST;
+                vals[gen_args[0]] = gen_args[1];
+                gen_args += 2;
+                args += 2;
+                break;
+            } else {
+                reset_temp(state, vals, args[0], nb_temps, nb_globals);
+                gen_args[0] = args[0];
+                gen_args[1] = args[1];
+                gen_args += 2;
+                args += 2;
+                break;
+            }
         case INDEX_op_or_i32:
         case INDEX_op_and_i32:
 #if TCG_TARGET_REG_BITS == 64
-- 
1.7.4.1


* Re: [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG
  2011-05-20 12:39 [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
                   ` (5 preceding siblings ...)
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 6/6] Do constant folding for unary operations Kirill Batuzov
@ 2011-05-20 17:50 ` Richard Henderson
  2011-05-20 19:37   ` Aurelien Jarno
  2011-05-20 19:35 ` Aurelien Jarno
  7 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2011-05-20 17:50 UTC (permalink / raw)
  To: Kirill Batuzov; +Cc: mj.mccormack, qemu-devel, zhur

On 05/20/2011 05:39 AM, Kirill Batuzov wrote:
> This series implements some basic machine-independent optimizations.  They
> simplify the code and allow liveness analysis to do its work better.
> 
> Suppose we have the following ARM code:
> 
>  movw    r12, #0xb6db
>  movt    r12, #0xdb6d
> 
> In TCG before optimizations we'll have:
> 
>  movi_i32 tmp8,$0xb6db
>  mov_i32 r12,tmp8
>  mov_i32 tmp8,r12
>  ext16u_i32 tmp8,tmp8
>  movi_i32 tmp9,$0xdb6d0000
>  or_i32 tmp8,tmp8,tmp9
>  mov_i32 r12,tmp8
> 
> And after optimizations we'll have this:
> 
>  movi_i32 r12,$0xdb6db6db
> 
> Here are performance evaluation results for the SPEC CPU2000 integer tests in
> user-mode emulation on an x86_64 host.  There were 5 runs of each test on the
> reference data set.  The tables below show the runtime in seconds for all
> these runs.

I totally agree that this sort of optimization is needed in TCG.  Essentially
all RISC guests have the same problem.  When emulating one RISC upon another,
the problem may be exacerbated.  E.g. Sparc on PPC -- sparc will use a 21/11
bit split of the constant, ppc will use a 16/16 split of the constant, which
results in 3 insns in the generated code where 2 would do.
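
For illustration, a minimal sketch of the two splits (the field widths are
taken from the sentence above; the helper names are made up for this
example):

  #include <stdint.h>

  /* SPARC-style 21/11 split: high part set first, low part OR'ed in. */
  static void split_21_11(uint32_t c, uint32_t *hi, uint32_t *lo)
  {
      *lo = c & 0x7ff;      /* low 11 bits */
      *hi = c & ~0x7ffu;    /* high 21 bits, already in position */
  }

  /* PPC-style 16/16 split, e.g. for a lis/ori pair. */
  static void split_16_16(uint32_t c, uint32_t *hi, uint32_t *lo)
  {
      *lo = c & 0xffff;     /* low halfword */
      *hi = c >> 16;        /* high halfword */
  }

Because the two schemes cut the constant at different bit positions, the
pieces the guest front end produces do not map 1:1 onto what the host back
end can encode, hence the extra instruction.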

You should be aware of prior work in this area by Aurelien Jarno:

  git://git.aurel32.net/qemu.git tcg-optimizations

Given that's now 2 years old, and doesn't seem to be progressing, I hope your
patch series can get things going again...

Further optimizations that are enabled by this constant propagation include
propagating the address of an absolute memory read into the TLB lookup:

  git://repo.or.cz/git/qemu/rth.git tcg-const-addr-1

Also, enabling the tcg backend to store a constant to memory directly, rather
than loading the constant to a register first.  Sorry, I forget which branch
this is in, but it's one of Aurelien's.  This improves generated code density a
bit.  While it's only i386 that can store arbitrary constants directly to
memory, most of the hosts have a zero register that can be used.

Specific comments as followups to specific patches.


r~


* Re: [Qemu-devel] [PATCH 1/6] Add TCG optimizations stub
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 1/6] Add TCG optimizations stub Kirill Batuzov
@ 2011-05-20 18:12   ` Richard Henderson
  2011-05-20 18:33     ` Richard Henderson
  0 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2011-05-20 18:12 UTC (permalink / raw)
  To: Kirill Batuzov; +Cc: mj.mccormack, qemu-devel, zhur

On 05/20/2011 05:39 AM, Kirill Batuzov wrote:
> +            i = (op == INDEX_op_call) ?
> +                (args[0] >> 16) + (args[0] & 0xffff) + 3 :
> +                def->nb_args;
> +            while (i) {
> +                *gen_args = *args;
> +                args++;
> +                gen_args++;
> +                i--;
> +            }

If you use the correct NOP, i.e. nop vs nop[123n], then 
I don't believe you need to compact the arguments like this.


r~


* Re: [Qemu-devel] [PATCH 2/6] Add copy and constant propagation.
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 2/6] Add copy and constant propagation Kirill Batuzov
@ 2011-05-20 18:22   ` Richard Henderson
  2011-05-20 18:46     ` Paolo Bonzini
  2011-05-20 19:41   ` Aurelien Jarno
  1 sibling, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2011-05-20 18:22 UTC (permalink / raw)
  To: Kirill Batuzov; +Cc: mj.mccormack, qemu-devel, zhur

On 05/20/2011 05:39 AM, Kirill Batuzov wrote:
> +    static tcg_target_ulong vals[TCG_MAX_TEMPS];
> +    static tcg_temp_state state[TCG_MAX_TEMPS];

Any particular reason to make these static?  It's 4-6k on the host
stack depending on sizeof tcg_target_ulong.  Large, but not excessive.
I think it's just confusing to have them static, myself.

I think it might be clearer to have these two in a structure, rather
than two separate arrays.  That does waste a bit of memory in padding
for 64-bit target, but I think it might clean up the code a bit.

>      nb_temps = s->nb_temps;
>      nb_globals = s->nb_globals;
> +    memset(state, 0, nb_temps * sizeof(tcg_temp_state));

Of course, instead of allocating static structures, you could alloca
the memory in the appropriate size...
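
Something like this, say (just a sketch; alloca.h is where glibc declares
alloca, and the memset of the live portion stays as in the patch):

  #include <alloca.h>

  tcg_target_ulong *vals = alloca(nb_temps * sizeof(*vals));
  tcg_temp_state *state = alloca(nb_temps * sizeof(*state));
  memset(state, 0, nb_temps * sizeof(*state));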

> +        case INDEX_op_mov_i32:
> +#if TCG_TARGET_REG_BITS == 64
> +        case INDEX_op_mov_i64:
> +#endif
> +            if ((state[args[1]] == TCG_TEMP_COPY
> +                && vals[args[1]] == args[0])
> +                || args[0] == args[1]) {
> +                args += 2;
> +                gen_opc_buf[op_index] = INDEX_op_nop;
> +                break;

Here, for example, INDEX_op_nop2 would be more appropriate,
obviating the need for the arg copy loop from patch 1.

> +            if (state[args[1]] != TCG_TEMP_CONST) {
> +            } else {
> +                /* Source argument is constant.  Rewrite the operation and
> +                   let movi case handle it. */
> +            }

FWIW, I think writing positive tests is clearer than writing
negative tests.  I.e. reverse the condition and the if/else bodies.

>          case INDEX_op_brcond_i64:
>  #endif
> +            memset(state, 0, nb_temps * sizeof(tcg_temp_state));

Why are you resetting at the branch, rather than at the label?
Seems reasonable enough to handle the extended basic block when
possible...
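
Roughly, against the code in this patch (a sketch only; the argument
copying in each case stays as posted):

  case INDEX_op_set_label:
      /* Other paths may jump here, so forget everything we know. */
      memset(state, 0, nb_temps * sizeof(tcg_temp_state));
      /* ... copy the args through, as in the posted patch ... */
      break;

  case INDEX_op_brcond_i32:
      /* No flush: on the fall-through path the extended basic block
         continues and the tracked constants and copies stay valid. */
      /* ... copy the args through, as in the posted patch ... */
      break;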


r~


* Re: [Qemu-devel] [PATCH 1/6] Add TCG optimizations stub
  2011-05-20 18:12   ` Richard Henderson
@ 2011-05-20 18:33     ` Richard Henderson
  0 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2011-05-20 18:33 UTC (permalink / raw)
  To: Kirill Batuzov; +Cc: mj.mccormack, qemu-devel, zhur

On 05/20/2011 11:12 AM, Richard Henderson wrote:
> On 05/20/2011 05:39 AM, Kirill Batuzov wrote:
>> +            i = (op == INDEX_op_call) ?
>> +                (args[0] >> 16) + (args[0] & 0xffff) + 3 :
>> +                def->nb_args;
>> +            while (i) {
>> +                *gen_args = *args;
>> +                args++;
>> +                gen_args++;
>> +                i--;
>> +            }
> 
> If you use the correct NOP, i.e. nop vs nop[123n], then 
> I don't believe you need to compact the arguments like this.

Bah, nevermind.  I forgot that we'd have to do something else
odd to replace n-operand operation opcodes with 2-operand movi.

I went back and saw Aurelien did this with a memmove in his
patch; it's probably more efficient to move pieces at a time
to fill in holes, as you do here.


r~


* Re: [Qemu-devel] [PATCH 5/6] Do constant folding for shift operations.
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 5/6] Do constant folding for shift operations Kirill Batuzov
@ 2011-05-20 18:37   ` Richard Henderson
  2011-05-26 12:36     ` Kirill Batuzov
  0 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2011-05-20 18:37 UTC (permalink / raw)
  To: Kirill Batuzov; +Cc: mj.mccormack, qemu-devel, zhur

On 05/20/2011 05:39 AM, Kirill Batuzov wrote:
> +    case INDEX_op_sar_i32:
> +#if TCG_TARGET_REG_BITS == 64
> +        x &= 0xffffffff;
> +        y &= 0xffffffff;
> +#endif
> +        r = x & 0x80000000;
> +        x &= ~0x80000000;
> +        x >>= y;
> +        r |= r - (r >> y);
> +        x |= r;
> +        return x;
> +

Any reason you're emulating the 32-bit shift by
hand, rather than letting the compiler do it?  I.e.

  x = (int32_t)x >> (int32_t)y;
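
Spelled out for both widths, assuming TCGArg is unsigned as the patch's own
comment notes (a sketch):

  case INDEX_op_sar_i32:
      return (uint32_t)((int32_t)x >> (int32_t)y);

  #if TCG_TARGET_REG_BITS == 64
  case INDEX_op_sar_i64:
      return (int64_t)x >> (int64_t)y;
  #endif

The casts get the compiler to emit the arithmetic shift, and the manual
sign-bit juggling disappears.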


r~


* Re: [Qemu-devel] [PATCH 6/6] Do constant folding for unary operations.
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 6/6] Do constant folding for unary operations Kirill Batuzov
@ 2011-05-20 18:39   ` Richard Henderson
  0 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2011-05-20 18:39 UTC (permalink / raw)
  To: Kirill Batuzov; +Cc: mj.mccormack, qemu-devel, zhur

On 05/20/2011 05:39 AM, Kirill Batuzov wrote:
> +    case INDEX_op_ext8s_i64:
> +        return x & (1 << 7) ? x | ~0xffULL : x & 0xff;
> +
> +    case INDEX_op_ext16s_i64:
> +        return x & (1 << 15) ? x | ~0xffffULL : x & 0xffff;
> +
> +    case INDEX_op_ext32s_i64:
> +        return x & (1U << 31) ? x | ~0xffffffffULL : x & 0xffffffff;

Likewise for letting the compiler help with appropriate casts.
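
I.e. something along these lines (a sketch; the sign bits propagate back
out through the integer conversion to the unsigned TCGArg return type):

  case INDEX_op_ext8s_i64:
      return (int8_t)x;

  case INDEX_op_ext16s_i64:
      return (int16_t)x;

  case INDEX_op_ext32s_i64:
      return (int32_t)x;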


r~


* Re: [Qemu-devel] [PATCH 4/6] Do constant folding for boolean operations.
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 4/6] Do constant folding for boolean operations Kirill Batuzov
@ 2011-05-20 18:45   ` Richard Henderson
  0 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2011-05-20 18:45 UTC (permalink / raw)
  To: Kirill Batuzov; +Cc: mj.mccormack, qemu-devel, zhur

On 05/20/2011 05:39 AM, Kirill Batuzov wrote:
> +        case INDEX_op_or_i32:
> +        case INDEX_op_and_i32:
> +#if TCG_TARGET_REG_BITS == 64
> +        case INDEX_op_and_i64:
> +        case INDEX_op_or_i64:
> +#endif
> +            if (args[1] == args[2]) {
> +                if (args[1] == args[0]) {
> +                    args += 3;
> +                    gen_opc_buf[op_index] = INDEX_op_nop;
> +                } else {

I do wonder if it would be better to split this sort of optimization
out into a different function.  You're applying identity sorts of
functions here, where you're not doing it for other operations, 
such as x + 0.

Indeed, I'll argue that 0+x is more likely to happen than x|x, given
that the 0 value could have been relocation filled in by the linker.
Consider @hi16 and @lo16 relocation pairs when the symbol happens to
be linked into the low 64k of the address space.
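
For concreteness, such an identity rule could look roughly like this,
reusing the helpers from this series (a hypothetical sketch; only add is
shown, and the copy-state bookkeeping from the and/or case is elided):

  case INDEX_op_add_i32:
      /* add x, y, 0 degenerates into a copy of y into x. */
      if (state[args[2]] == TCG_TEMP_CONST && vals[args[2]] == 0) {
          gen_opc_buf[op_index] = mov_opc[op_bits(op) / 32 - 1];
          reset_temp(state, vals, args[0], nb_temps, nb_globals);
          gen_args[0] = args[0];
          gen_args[1] = args[1];
          gen_args += 2;
          args += 3;
          break;
      }
      /* the symmetric 0 + y check on args[1] would go here */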


r~


* Re: [Qemu-devel] [PATCH 2/6] Add copy and constant propagation.
  2011-05-20 18:22   ` Richard Henderson
@ 2011-05-20 18:46     ` Paolo Bonzini
  0 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2011-05-20 18:46 UTC (permalink / raw)
  To: Richard Henderson; +Cc: mj.mccormack, qemu-devel, zhur, Kirill Batuzov

On 05/20/2011 08:22 PM, Richard Henderson wrote:
> I think it might be clearer to have these two in a structure, rather
> than two separate arrays.  That does waste a bit of memory in padding
> for 64-bit target, but I think it might clean up the code a bit.

You can use the padding to implement a generation count.  O(1) clearing 
of the array might help performance somewhat.  At this point making the 
arrays static makes sense again, because it lets you bump the 
generation count at the beginning of tcg_constant_folding (if you put it 
on the stack you have to zero the state).

It could also help to add a num_copies count tracking how many registers 
are copies of this one.  If there are none, you can avoid processing the 
output registers in the default case.

That would be something like

struct {
     uint16_t state;
     uint16_t num_copies;
     uint32_t gen;
     tcg_target_ulong val;
};
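
A sketch of how the generation count then buys the O(1) clear (assuming the
struct above is named temp_info and kept in a static array):

  static struct temp_info info[TCG_MAX_TEMPS];
  static uint32_t cur_gen;

  /* "Clearing" the whole array becomes a single increment... */
  static void reset_all_temps(void)
  {
      cur_gen++;
  }

  /* ...and each access lazily treats stale entries as undefined. */
  static uint16_t temp_state(TCGArg t)
  {
      if (info[t].gen != cur_gen) {
          info[t].gen = cur_gen;
          info[t].state = TCG_TEMP_UNDEF;
          info[t].num_copies = 0;
      }
      return info[t].state;
  }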

Paolo


* Re: [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG
  2011-05-20 12:39 [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
                   ` (6 preceding siblings ...)
  2011-05-20 17:50 ` [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG Richard Henderson
@ 2011-05-20 19:35 ` Aurelien Jarno
  2011-05-21 12:47   ` Dmitry Zhurikhin
  2011-05-21 12:48   ` Aurelien Jarno
  7 siblings, 2 replies; 34+ messages in thread
From: Aurelien Jarno @ 2011-05-20 19:35 UTC (permalink / raw)
  To: Kirill Batuzov; +Cc: mj.mccormack, qemu-devel, zhur

On Fri, May 20, 2011 at 04:39:27PM +0400, Kirill Batuzov wrote:
> This series implements some basic machine-independent optimizations.  They
> simplify the code and allow liveness analysis to do its work better.
> 
> Suppose we have the following ARM code:
> 
>  movw    r12, #0xb6db
>  movt    r12, #0xdb6d
> 
> In TCG before optimizations we'll have:
> 
>  movi_i32 tmp8,$0xb6db
>  mov_i32 r12,tmp8
>  mov_i32 tmp8,r12
>  ext16u_i32 tmp8,tmp8
>  movi_i32 tmp9,$0xdb6d0000
>  or_i32 tmp8,tmp8,tmp9
>  mov_i32 r12,tmp8
> 
> And after optimizations we'll have this:
> 
>  movi_i32 r12,$0xdb6db6db
> 
> Here are performance evaluation results for the SPEC CPU2000 integer tests in
> user-mode emulation on an x86_64 host.  There were 5 runs of each test on the
> reference data set.  The tables below show the runtime in seconds for all
> these runs.

How are the tests done? Are they done with linux-user, or running the
executables in qemu-system-xxx?

> ARM guest without optimizations:
> Test name       #1       #2       #3       #4       #5    Median
> 164.gzip    1403.612 1403.499 1403.52  1208.55  1403.583 1403.52
> 175.vpr     1237.729 1238.008 1238.019 1176.852 1237.902 1237.902
> 176.gcc      929.511  928.867  929.048  928.927  928.792  928.927
> 181.mcf      196.371  196.335  196.172  197.057  196.196  196.335
> 186.crafty  1547.101 1547.293 1547.133 1547.037 1547.044 1547.101
> 197.parser  3804.336 3804.429 3804.412 3804.45  3804.301 3804.412
> 252.eon     2760.414 2760.45  2473.608 2760.606 2760.216 2760.414
> 253.perlbmk 2557.966 2558.971 2559.731 2479.299 2556.835 2557.966
> 256.bzip2   1296.412 1296.215 1296.63  1296.489 1296.092 1296.412
> 300.twolf   2919.496 2919.444 2919.529 2919.384 2919.404 2919.444
>       
> ARM guest with optimizations:
> Test name       #1       #2       #3       #4       #5    Median    Gain
> 164.gzip    1345.416 1401.741 1377.022 1401.737 1401.773 1401.737   0.13%
> 175.vpr     1116.75  1243.213 1243.32  1243.316 1243.144 1243.213  -0.43%
> 176.gcc      897.045  909.568  850.1    909.65   909.57   909.568   2.08%
> 181.mcf      199.058  198.717  198.28   198.866  197.955  198.717  -1.21%
> 186.crafty  1525.667 1526.663 1525.981 1525.995 1526.164 1525.995   1.36%
> 197.parser  3749.453 3749.522 3749.413 3749.5   3749.484 3749.484   1.44%
> 252.eon     2730.593 2746.525 2746.495 2746.493 2746.62  2746.495   0.50%
> 253.perlbmk 2577.341 2521.057 2578.461 2578.721 2581.313 2578.461  -0.80%
> 256.bzip2   1184.498 1190.116 1294.352 1294.554 1294.637 1294.352   0.16%
> 300.twolf   2894.264 2894.133 2894.398 2894.103 2894.146 2894.146   0.87%
> 
> 
> x86_64 guest without optimizations:
> Test name       #1       #2       #3       #4       #5    Median
> 164.gzip     858.118  858.151  858.09   858.139  858.122  858.122
> 175.vpr      956.361  956.465  956.521  956.438  956.705  956.465
> 176.gcc      647.275  647.465  647.186  647.294  647.268  647.275
> 181.mcf      219.239  221.964  220.244  220.74   220.559  220.559
> 186.crafty  1128.027 1128.071 1128.028 1128.115 1128.123 1128.071
> 197.parser  1815.669 1815.651 1815.669 1815.711 1815.759 1815.669
> 253.perlbmk 1777.143 1777.749 1667.508 1777.051 1778.391 1777.143
> 254.gap     1062.808 1062.758 1062.801 1063.099 1062.859 1062.808
> 255.vortex  1930.693 1930.706 1930.579 1930.7   1930.566 1930.693
> 256.bzip2   1014.566 1014.702 1014.6   1014.274 1014.421 1014.566
> 300.twolf   1342.653 1342.759 1344.092 1342.641 1342.794 1342.759
>      
> x86_64 guest with optimizations:
> Test name       #1       #2       #3       #4       #5    Median    Gain
> 164.gzip     857.485  857.457  857.475  857.509  857.507  857.485   0.07%
> 175.vpr      963.255  962.972  963.27   963.124  963.686  963.255  -0.71%
> 176.gcc      644.123  644.055  644.145  643.818  635.773  644.055   0.50%
> 181.mcf      216.215  217.549  218.744  216.437  217.83   217.549   1.36%
> 186.crafty  1128.873 1128.792 1128.871 1128.816 1128.823 1128.823  -0.07%
> 197.parser  1814.626 1814.503 1814.552 1814.602 1814.748 1814.602   0.06%
> 253.perlbmk 1758.056 1751.963 1753.267 1765.27  1759.828 1758.056   1.07%
> 254.gap     1064.702 1064.712 1064.629 1064.657 1064.645 1064.657  -0.17%
> 255.vortex  1760.638 1936.387 1937.871 1937.471 1760.496 1936.387  -0.29%
> 256.bzip2   1007.658 1007.682 1007.316 1007.982 1007.747 1007.682   0.68%
> 300.twolf   1334.139 1333.791 1333.795 1334.147 1333.732 1333.795   0.67%
> 
> The ARM guest versions of 254.gap and 255.vortex and the x86_64 guest version
> of 252.eon do not work under QEMU for some unrelated reason.
> 
> Kirill Batuzov (6):
>   Add TCG optimizations stub
>   Add copy and constant propagation.
>   Do constant folding for basic arithmetic operations.
>   Do constant folding for boolean operations.
>   Do constant folding for shift operations.
>   Do constant folding for unary operations.
> 
>  Makefile.target |    2 +-
>  tcg/optimize.c  |  539 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tcg/tcg.c       |    6 +
>  tcg/tcg.h       |    3 +
>  4 files changed, 549 insertions(+), 1 deletions(-)
>  create mode 100644 tcg/optimize.c
> 
> -- 
> 1.7.4.1
> 
> 
> 

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG
  2011-05-20 17:50 ` [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG Richard Henderson
@ 2011-05-20 19:37   ` Aurelien Jarno
  2011-05-20 23:31     ` Andreas Färber
  0 siblings, 1 reply; 34+ messages in thread
From: Aurelien Jarno @ 2011-05-20 19:37 UTC (permalink / raw)
  To: Richard Henderson; +Cc: mj.mccormack, qemu-devel, zhur, Kirill Batuzov

On Fri, May 20, 2011 at 10:50:49AM -0700, Richard Henderson wrote:
> On 05/20/2011 05:39 AM, Kirill Batuzov wrote:
> > This series implements some basic machine-independent optimizations.  They
> > simplify the code and allow liveness analysis to do its work better.
> > 
> > Suppose we have the following ARM code:
> > 
> >  movw    r12, #0xb6db
> >  movt    r12, #0xdb6d
> > 
> > In TCG before optimizations we'll have:
> > 
> >  movi_i32 tmp8,$0xb6db
> >  mov_i32 r12,tmp8
> >  mov_i32 tmp8,r12
> >  ext16u_i32 tmp8,tmp8
> >  movi_i32 tmp9,$0xdb6d0000
> >  or_i32 tmp8,tmp8,tmp9
> >  mov_i32 r12,tmp8
> > 
> > And after optimizations we'll have this:
> > 
> >  movi_i32 r12,$0xdb6db6db
> > 
> > Here are performance evaluation results for the SPEC CPU2000 integer tests in
> > user-mode emulation on an x86_64 host.  There were 5 runs of each test on the
> > reference data set.  The tables below show the runtime in seconds for all
> > these runs.
> 
> I totally agree that this sort of optimization is needed in TCG.  Essentially
> all RISC guests have the same problem.  When emulating one RISC upon another,
> the problem may be exacerbated.  E.g. Sparc on PPC -- sparc will use a 21/11
> bit split of the constant, ppc will use a 16/16 split of the constant, which
> results in 3 insns in the generated code where 2 would do.
> 
> You should be aware of prior work in this area by Aurelien Jarno:
> 
>   git://git.aurel32.net/qemu.git tcg-optimizations
> 
> Given that's now 2 years old, and doesn't seem to be progressing, I hope your
> patch series can get things going again...

I basically stopped working on constant propagation, as while the TCG 
code looked nicer, the resulting code was always slower.

Since the discussion about TCG_AREG0, I have started to work again on
the register allocation (see the first patch series I sent about that);
I hope to have something ready by the end of the weekend.

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH 2/6] Add copy and constant propagation.
  2011-05-20 12:39 ` [Qemu-devel] [PATCH 2/6] Add copy and constant propagation Kirill Batuzov
  2011-05-20 18:22   ` Richard Henderson
@ 2011-05-20 19:41   ` Aurelien Jarno
  1 sibling, 0 replies; 34+ messages in thread
From: Aurelien Jarno @ 2011-05-20 19:41 UTC (permalink / raw)
  To: Kirill Batuzov; +Cc: mj.mccormack, qemu-devel, zhur

On Fri, May 20, 2011 at 04:39:29PM +0400, Kirill Batuzov wrote:
> Make tcg_constant_folding do copy and constant propagation.  This is
> preparatory work before the actual constant folding.
> 
> Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
> ---
>  tcg/optimize.c |  123 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 123 insertions(+), 0 deletions(-)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index cf31d18..a761c51 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -31,22 +31,139 @@
>  #include "qemu-common.h"
>  #include "tcg-op.h"
>  
> +typedef enum {
> +    TCG_TEMP_UNDEF = 0,
> +    TCG_TEMP_CONST,
> +    TCG_TEMP_COPY,
> +    TCG_TEMP_ANY
> +} tcg_temp_state;
> +
> +static int mov_to_movi(int op)
> +{
> +    switch (op) {
> +    case INDEX_op_mov_i32: return INDEX_op_movi_i32;
> +#if TCG_TARGET_REG_BITS == 64
> +    case INDEX_op_mov_i64: return INDEX_op_movi_i64;
> +#endif
> +    default:
> +        fprintf(stderr, "Unrecognized operation %d in mov_to_movi.\n", op);
> +        tcg_abort();
> +    }
> +}
> +
> +/* Reset TEMP's state to TCG_TEMP_ANY.  If TEMP was the representative of some
> +   class of equivalent temps, a new representative should be chosen for that
> +   class. */
> +static void reset_temp(tcg_temp_state *state, tcg_target_ulong *vals,
> +                       TCGArg temp, int nb_temps, int nb_globals)
> +{
> +    int i;
> +    TCGArg new_base;
> +    new_base = (TCGArg)-1;
> +    for (i = nb_globals; i < nb_temps; i++) {
> +        if (state[i] == TCG_TEMP_COPY && vals[i] == temp) {
> +            if (new_base == ((TCGArg)-1)) {
> +                new_base = (TCGArg)i;
> +                state[i] = TCG_TEMP_ANY;
> +            } else {
> +                vals[i] = new_base;
> +            }
> +        }
> +    }
> +    for (i = 0; i < nb_globals; i++) {
> +        if (state[i] == TCG_TEMP_COPY && vals[i] == temp) {
> +            if (new_base == ((TCGArg)-1)) {
> +                state[i] = TCG_TEMP_ANY;
> +            } else {
> +                vals[i] = new_base;
> +            }
> +        }
> +    }
> +    state[temp] = TCG_TEMP_ANY;
> +}
> +
> +/* Propagate constants and copies, fold constant expressions. */
>  static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>                                      TCGArg *args, TCGOpDef *tcg_op_defs)
>  {
>      int i, nb_ops, op_index, op, nb_temps, nb_globals;
>      const TCGOpDef *def;
>      TCGArg *gen_args;
> +    /* Array VALS has an element for each temp.
> +       If the temp holds a constant, its value is kept in the VALS element.
> +       If the temp is a copy of others, the representative of its
> +       equivalence class is kept in the VALS element.
> +       If the temp is neither a copy nor a constant, the corresponding
> +       VALS element is unused. */
> +    static tcg_target_ulong vals[TCG_MAX_TEMPS];
> +    static tcg_temp_state state[TCG_MAX_TEMPS];
>  
>      nb_temps = s->nb_temps;
>      nb_globals = s->nb_globals;
> +    memset(state, 0, nb_temps * sizeof(tcg_temp_state));
>  
>      nb_ops = tcg_opc_ptr - gen_opc_buf;
>      gen_args = args;
>      for (op_index = 0; op_index < nb_ops; op_index++) {
>          op = gen_opc_buf[op_index];
>          def = &tcg_op_defs[op];
> +        /* Do copy propagation */
> +        if (op != INDEX_op_call) {
> +            for (i = def->nb_oargs; i < def->nb_oargs + def->nb_iargs; i++) {
> +                if (state[args[i]] == TCG_TEMP_COPY
> +                    && !(def->args_ct[i].ct & TCG_CT_IALIAS)
> +                    && (def->args_ct[i].ct & TCG_CT_REG)) {
> +                    args[i] = vals[args[i]];
> +                }
> +            }
> +        }
> +
> +        /* Propagate constants through copy operations and do constant
> +           folding.  Constants will be substituted for arguments by the register
> +           allocator where needed and possible.  Also detect copies. */
>          switch (op) {
> +        case INDEX_op_mov_i32:
> +#if TCG_TARGET_REG_BITS == 64
> +        case INDEX_op_mov_i64:
> +#endif
> +            if ((state[args[1]] == TCG_TEMP_COPY
> +                && vals[args[1]] == args[0])
> +                || args[0] == args[1]) {
> +                args += 2;
> +                gen_opc_buf[op_index] = INDEX_op_nop;
> +                break;
> +            }
> +            if (state[args[1]] != TCG_TEMP_CONST) {
> +                reset_temp(state, vals, args[0], nb_temps, nb_globals);
> +                if (args[1] >= s->nb_globals) {
> +                    state[args[0]] = TCG_TEMP_COPY;
> +                    vals[args[0]] = args[1];
> +                }
> +                gen_args[0] = args[0];
> +                gen_args[1] = args[1];
> +                gen_args += 2;
> +                args += 2;
> +                break;
> +            } else {
> +                /* Source argument is constant.  Rewrite the operation and
> +                   let movi case handle it. */
> +                op = mov_to_movi(op);
> +                gen_opc_buf[op_index] = op;
> +                args[1] = vals[args[1]];
> +                /* fallthrough */
> +            }
> +        case INDEX_op_movi_i32:
> +#if TCG_TARGET_REG_BITS == 64
> +        case INDEX_op_movi_i64:
> +#endif
> +            reset_temp(state, vals, args[0], nb_temps, nb_globals);
> +            state[args[0]] = TCG_TEMP_CONST;
> +            vals[args[0]] = args[1];
> +            gen_args[0] = args[0];
> +            gen_args[1] = args[1];
> +            gen_args += 2;
> +            args += 2;
> +            break;
>          case INDEX_op_call:
>          case INDEX_op_jmp:
>          case INDEX_op_br:
> @@ -55,6 +172,7 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>  #if TCG_TARGET_REG_BITS == 64
>          case INDEX_op_brcond_i64:
>  #endif
> +            memset(state, 0, nb_temps * sizeof(tcg_temp_state));
>              i = (op == INDEX_op_call) ?
>                  (args[0] >> 16) + (args[0] & 0xffff) + 3 :
>                  def->nb_args;
> @@ -66,6 +184,11 @@ static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr,
>              }
>              break;
>          default:
> +            /* Default case: we know nothing about the operation, so no
> +               propagation is done.  We only trash the output args.  */
> +            for (i = 0; i < def->nb_oargs; i++) {
> +                reset_temp(state, vals, args[i], nb_temps, nb_globals);
> +            }
>              for (i = 0; i < def->nb_args; i++) {
>                  gen_args[i] = args[i];
>              }

It's not possible to do any optimization across certain ops that have
side effects, or across helpers which access globals directly. For helpers
this can easily be detected by looking at the TCG_CALL_PURE and
TCG_CALL_CONST flags. For ops, you should look at the TCG_OPF_BB_END,
TCG_OPF_CALL_CLOBBER and TCG_OPF_SIDE_EFFECTS flags.
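
A rough sketch of the kind of guard this implies, assuming the flag
names above (which are real tcg.h flags) but a made-up helper; where
exactly the call-flags word lives in the args array is not shown here,
so it is left to the caller:

    /* Hypothetical predicate: must the optimizer forget everything it
       knows about temps before this op?  Illustration only. */
    static int op_resets_all_temps(int op, const TCGOpDef *def,
                                   TCGArg call_flags)
    {
        if (op == INDEX_op_call) {
            /* Pure/const helpers cannot read or clobber globals
               behind the optimizer's back. */
            return !(call_flags & (TCG_CALL_PURE | TCG_CALL_CONST));
        }
        /* Ops that end a basic block, clobber call registers, or have
           side effects invalidate recorded constants and copies. */
        return (def->flags & (TCG_OPF_BB_END | TCG_OPF_CALL_CLOBBER
                              | TCG_OPF_SIDE_EFFECTS)) != 0;
    }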

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG
  2011-05-20 19:37   ` Aurelien Jarno
@ 2011-05-20 23:31     ` Andreas Färber
  2011-05-21  9:37       ` Laurent Desnogues
  0 siblings, 1 reply; 34+ messages in thread
From: Andreas Färber @ 2011-05-20 23:31 UTC (permalink / raw)
  To: Aurelien Jarno
  Cc: mj.mccormack, Kirill Batuzov, qemu-devel, zhur, Richard Henderson

Am 20.05.2011 um 21:37 schrieb Aurelien Jarno:

> On Fri, May 20, 2011 at 10:50:49AM -0700, Richard Henderson wrote:
>> On 05/20/2011 05:39 AM, Kirill Batuzov wrote:
>>> This series implements some basic machine-independent  
>>> optimizations.  They
>>> simplify code and allow liveness analysis to do its work better.
>>>
>>> Suppose we have the following ARM code:
>>>
>>> movw    r12, #0xb6db
>>> movt    r12, #0xdb6d
>>>
>>> In TCG before optimizations we'll have:
>>>
>>> movi_i32 tmp8,$0xb6db
>>> mov_i32 r12,tmp8
>>> mov_i32 tmp8,r12
>>> ext16u_i32 tmp8,tmp8
>>> movi_i32 tmp9,$0xdb6d0000
>>> or_i32 tmp8,tmp8,tmp9
>>> mov_i32 r12,tmp8
>>>
>>> And after optimizations we'll have this:
>>>
>>> movi_i32 r12,$0xdb6db6db
>>>
>>> Here are performance evaluation results on SPEC CPU2000 integer tests in
>>> user-mode emulation on an x86_64 host.  There were 5 runs of each test on
>>> the reference data set.  The tables below show runtime in seconds for all
>>> these runs.
>>
>> I totally agree that this sort of optimization is needed in TCG.
>
> I basically stopped working on constant propagation, as while the TCG
> code looked nicer, the resulting code was always slower.

Has anyone evaluated reusing LLVM optimization passes for TCG? Or
maybe GIMPLE if there's an equivalent?

Andreas

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG
  2011-05-20 23:31     ` Andreas Färber
@ 2011-05-21  9:37       ` Laurent Desnogues
  2011-05-21 10:46         ` Aurelien Jarno
  0 siblings, 1 reply; 34+ messages in thread
From: Laurent Desnogues @ 2011-05-21  9:37 UTC (permalink / raw)
  To: Andreas Färber
  Cc: mj.mccormack, qemu-devel, zhur, Kirill Batuzov, Aurelien Jarno,
	Richard Henderson

On Sat, May 21, 2011 at 1:31 AM, Andreas Färber <andreas.faerber@web.de> wrote:
[...]
> Has anyone evaluated reusing LLVM optimization passes for TCG? Or maybe
> GIMPLE if there's an equivalent?

IMHO the qemu_ld/st semantics and the size of TBs will always limit
the usefulness of optimizations more involved than what already
exists in QEMU.

Basically qemu_ld/st ops prevent any code change across
them, which makes optimization opportunities rather scarce.
Considering they always take the fast path would surely
change the situation, but then the slow path would have to
be able to rebuild information;  certainly possible, but
quite involved too :-)

Once this qemu_ld/st change is done, you'll then need to
tackle the problem of too-short TBs; a way to do this is
to generate traces of frequently executed blocks and
consider them as a single block [1]; that would open the
door to many more code generation improvements.

This certainly doesn't mean Kirill's and Aurélien's changes
are not welcome, just that I think, in the current state,
such changes will always have a limited impact.


Laurent

[1] See for instance section 2.3 of Derek Bruening's
    PhD thesis about DynamoRIO
    http://www.burningcutlery.com/derek/phd.html

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG
  2011-05-21  9:37       ` Laurent Desnogues
@ 2011-05-21 10:46         ` Aurelien Jarno
  2011-05-21 17:53           ` Kirill Batuzov
  0 siblings, 1 reply; 34+ messages in thread
From: Aurelien Jarno @ 2011-05-21 10:46 UTC (permalink / raw)
  To: Laurent Desnogues
  Cc: mj.mccormack, qemu-devel, zhur, Kirill Batuzov,
	Andreas Färber, Richard Henderson

On Sat, May 21, 2011 at 11:37:49AM +0200, Laurent Desnogues wrote:
> On Sat, May 21, 2011 at 1:31 AM, Andreas Färber <andreas.faerber@web.de> wrote:
> [...]
> > Has anyone evaluated reusing LLVM optimization passes for TCG? Or maybe
> > GIMPL if there's an equivalent?
> 
> IMHO the qemu_ld/st semantics and the size of TBs will always limit
> the usefulness of optimizations more involved than what already
> exists in QEMU.
> 
> Basically qemu_ld/st ops prevent any code change across
> them, which makes optimization opportunities rather scarce.
> Considering they always take the fast path would surely
> change the situation, but then the slow path would have to
> be able to rebuild information;  certainly possible, but
> quite involved too :-)

Actually one way to improve this is to consider that the values in TCG
registers should be in sync with memory. This is the only thing needed
to get the CPU state correct in case the slow path is taken and
an exception is triggered.

This is not difficult to do, and it already allows a few more
optimizations across the qemu_ld/st ops, though not as many as we want.
However, with the current register allocator, qemu_ld/st ops have the
useful effect of saving globals back to their canonical location, and
thus freeing some registers. If you only sync them, all globals are
freed much later even when they are not used, which means plenty of
suboptimal register spills.
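
To make the sync/save distinction concrete, here is a rough sketch in
the style of tcg.c; the field and function names mirror the allocator's
conventions, but treat this as an illustration, not the actual code:

    /* Sync: store a global back to its canonical memory slot but keep
       the register binding, so later uses need no reload. */
    static void temp_sync(TCGContext *s, TCGTemp *ts)
    {
        if (!ts->mem_coherent) {
            tcg_out_st(s, ts->type, ts->reg, ts->mem_reg, ts->mem_offset);
            ts->mem_coherent = 1;
        }
    }

    /* Save: sync and additionally free the register, which is what
       qemu_ld/st effectively forces for every global today. */
    static void temp_save(TCGContext *s, TCGTemp *ts)
    {
        temp_sync(s, ts);
        s->reg_to_temp[ts->reg] = -1;
        ts->val_type = TEMP_VAL_MEM;
    }

The suggestion above amounts to calling only temp_sync() at a
qemu_ld/st and deferring temp_save() until the global's last use in
the TB.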

We definitely need to rewrite/improve the register allocator to save a
global back to memory when it is not used later in the TB. I am
currently working on that; I hope to have something ready soon.

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG
  2011-05-20 19:35 ` Aurelien Jarno
@ 2011-05-21 12:47   ` Dmitry Zhurikhin
  2011-05-21 12:48   ` Aurelien Jarno
  1 sibling, 0 replies; 34+ messages in thread
From: Dmitry Zhurikhin @ 2011-05-21 12:47 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: mj.mccormack, qemu-devel, Kirill Batuzov

On 05/20/2011 11:35 PM, Aurelien Jarno wrote:
> 
> On Fri, May 20, 2011 at 04:39:27PM +0400, Kirill Batuzov wrote:
>> This series implements some basic machine-independent optimizations.  They
>> simplify code and allow liveness analysis to do its work better.
>>
>> Suppose we have the following ARM code:
>>
>>  movw    r12, #0xb6db
>>  movt    r12, #0xdb6d
>>
>> In TCG before optimizations we'll have:
>>
>>  movi_i32 tmp8,$0xb6db
>>  mov_i32 r12,tmp8
>>  mov_i32 tmp8,r12
>>  ext16u_i32 tmp8,tmp8
>>  movi_i32 tmp9,$0xdb6d0000
>>  or_i32 tmp8,tmp8,tmp9
>>  mov_i32 r12,tmp8
>>
>> And after optimizations we'll have this:
>>
>>  movi_i32 r12,$0xdb6db6db
>>
>> Here are performance evaluation results on SPEC CPU2000 integer tests in
>> user-mode emulation on an x86_64 host.  There were 5 runs of each test on
>> the reference data set.  The tables below show runtime in seconds for all these
>> runs.
> 
> How are the tests done? Are they done with linux-user, or running the
> executables in qemu-system-xxx?
They were run in user mode on a dedicated machine not doing anything
else.  We found system emulation too volatile for measuring anything.
Anyway, even in user mode on a dedicated machine there are some weird
performance jumps we can't explain, but overall SPEC seems stable
enough to show the influence of such changes in code generation.

> 
>> ...

	Dmitry

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG
  2011-05-20 19:35 ` Aurelien Jarno
  2011-05-21 12:47   ` Dmitry Zhurikhin
@ 2011-05-21 12:48   ` Aurelien Jarno
  1 sibling, 0 replies; 34+ messages in thread
From: Aurelien Jarno @ 2011-05-21 12:48 UTC (permalink / raw)
  To: Kirill Batuzov; +Cc: mj.mccormack, qemu-devel, zhur

On Fri, May 20, 2011 at 09:35:08PM +0200, Aurelien Jarno wrote:
> On Fri, May 20, 2011 at 04:39:27PM +0400, Kirill Batuzov wrote:
> > This series implements some basic machine-independent optimizations.  They
> > simplify code and allow liveness analysis to do its work better.
> > 
> > Suppose we have the following ARM code:
> > 
> >  movw    r12, #0xb6db
> >  movt    r12, #0xdb6d
> > 
> > In TCG before optimizations we'll have:
> > 
> >  movi_i32 tmp8,$0xb6db
> >  mov_i32 r12,tmp8
> >  mov_i32 tmp8,r12
> >  ext16u_i32 tmp8,tmp8
> >  movi_i32 tmp9,$0xdb6d0000
> >  or_i32 tmp8,tmp8,tmp9
> >  mov_i32 r12,tmp8
> > 
> > And after optimizations we'll have this:
> > 
> >  movi_i32 r12,$0xdb6db6db
> > 
> > Here are performance evaluation results on SPEC CPU2000 integer tests in
> > user-mode emulation on an x86_64 host.  There were 5 runs of each test on
> > the reference data set.  The tables below show runtime in seconds for all these
> > runs.
> 
> How are the tests done? Are they done with linux-user, or running the
> executables in qemu-system-xxx?
> 

Another point I forgot: the current TCG code already does a very simple
copy and constant propagation, so it might be a good idea to end the
series by cleaning up this code. There is no point in doing such an
optimisation twice, and it most probably takes a non-negligible amount
of time. This way you can provide benchmarks on the whole set.

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG
  2011-05-21 10:46         ` Aurelien Jarno
@ 2011-05-21 17:53           ` Kirill Batuzov
  0 siblings, 0 replies; 34+ messages in thread
From: Kirill Batuzov @ 2011-05-21 17:53 UTC (permalink / raw)
  To: Aurelien Jarno
  Cc: mj.mccormack, qemu-devel, zhur, Andreas Färber,
	Laurent Desnogues, Richard Henderson

On 21.05.2011 14:46, Aurelien Jarno wrote:
> We definitely need to rewrite/improve the register allocator to save a
> global back to memory when it is not used later in the TB. I am
> currently working on that, I hope to have something ready soon.

I think I have this done already. The patches need cleaning up and
performance testing, but I'll post them (with an RFC tag) on Monday
nevertheless, to avoid duplicated work.

----
   Kirill.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 5/6] Do constant folding for shift operations.
  2011-05-20 18:37   ` Richard Henderson
@ 2011-05-26 12:36     ` Kirill Batuzov
  2011-05-26 13:56       ` Richard Henderson
  0 siblings, 1 reply; 34+ messages in thread
From: Kirill Batuzov @ 2011-05-26 12:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: mj.mccormack, qemu-devel, zhur



On Fri, 20 May 2011, Richard Henderson wrote:

> 
> On 05/20/2011 05:39 AM, Kirill Batuzov wrote:
> > +    case INDEX_op_sar_i32:
> > +#if TCG_TARGET_REG_BITS == 64
> > +        x &= 0xffffffff;
> > +        y &= 0xffffffff;
> > +#endif
> > +        r = x & 0x80000000;
> > +        x &= ~0x80000000;
> > +        x >>= y;
> > +        r |= r - (r >> y);
> > +        x |= r;
> > +        return x;
> > +
> 
> Any reason you're emulating the 32-bit shift by
> hand, rather than letting the compiler do it?  I.e.
> 
>   x = (int32_t)x >> (int32_t)y;
>
This expression has implementation-defined behavior according to
C99 6.5.7, so we decided to emulate signed shifts by hand.
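
For reference, the open-coded sequence in the patch amounts to the
helper below (a sketch, not the series' code; it assumes y < 32):

    #include <stdint.h>

    /* Arithmetic shift right on 32 bits without relying on the
       implementation-defined behaviour of >> on negative values. */
    static uint32_t sar32(uint32_t x, uint32_t y)
    {
        uint32_t sign = x & 0x80000000u;  /* isolate the sign bit */
        x = (x & ~0x80000000u) >> y;      /* logical shift of the rest */
        sign |= sign - (sign >> y);       /* mask of replicated sign bits */
        return x | sign;
    }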

----
  Kirill.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 5/6] Do constant folding for shift operations.
  2011-05-26 12:36     ` Kirill Batuzov
@ 2011-05-26 13:56       ` Richard Henderson
  2011-05-26 19:14         ` Blue Swirl
  0 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2011-05-26 13:56 UTC (permalink / raw)
  To: Kirill Batuzov; +Cc: mj.mccormack, qemu-devel, zhur

On 05/26/2011 05:36 AM, Kirill Batuzov wrote:
>>   x = (int32_t)x >> (int32_t)y;
>>
> This expression has implementation-defined behavior according to
> C99 6.5.7, so we decided to emulate signed shifts by hand.

Technically, yes.  In practice, no.  GCC, ICC, LLVM, MSVC all know
what the user wants here and will implement it "properly".


r~

  

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 5/6] Do constant folding for shift operations.
  2011-05-26 13:56       ` Richard Henderson
@ 2011-05-26 19:14         ` Blue Swirl
  2011-05-26 20:10           ` Richard Henderson
  2011-05-27  7:09           ` Paolo Bonzini
  0 siblings, 2 replies; 34+ messages in thread
From: Blue Swirl @ 2011-05-26 19:14 UTC (permalink / raw)
  To: Richard Henderson; +Cc: mj.mccormack, qemu-devel, zhur, Kirill Batuzov

On Thu, May 26, 2011 at 4:56 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 05/26/2011 05:36 AM, Kirill Batuzov wrote:
>>>   x = (int32_t)x >> (int32_t)y;
>>>
>> This expression has implementation-defined behavior according to
>> C99 6.5.7, so we decided to emulate signed shifts by hand.
>
> Technically, yes.  In practice, no.  GCC, ICC, LLVM, MSVC all know
> what the user wants here and will implement it "properly".

Can't this be probed by configure? Then a wrapper could be introduced
for signed shifts.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 5/6] Do constant folding for shift operations.
  2011-05-26 19:14         ` Blue Swirl
@ 2011-05-26 20:10           ` Richard Henderson
  2011-05-26 20:25             ` Blue Swirl
  2011-05-27  7:09           ` Paolo Bonzini
  1 sibling, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2011-05-26 20:10 UTC (permalink / raw)
  To: Blue Swirl; +Cc: mj.mccormack, qemu-devel, zhur, Kirill Batuzov

On 05/26/2011 12:14 PM, Blue Swirl wrote:
> On Thu, May 26, 2011 at 4:56 PM, Richard Henderson <rth@twiddle.net> wrote:
>> On 05/26/2011 05:36 AM, Kirill Batuzov wrote:
>>>>   x = (int32_t)x >> (int32_t)y;
>>>>
>>> This expression has implementation-defined behavior according to
>>> C99 6.5.7, so we decided to emulate signed shifts by hand.
>>
>> Technically, yes.  In practice, no.  GCC, ICC, LLVM, MSVC all know
>> what the user wants here and will implement it "properly".
> 
> Can't this be probed by configure? Then a wrapper could be introduced
> for signed shifts.

I don't see the point.  The C99 implementation-defined escape hatch
exists for weird CPUs, which we won't be supporting as a QEMU host.


r~

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 5/6] Do constant folding for shift operations.
  2011-05-26 20:10           ` Richard Henderson
@ 2011-05-26 20:25             ` Blue Swirl
  2011-05-26 21:14               ` Richard Henderson
  0 siblings, 1 reply; 34+ messages in thread
From: Blue Swirl @ 2011-05-26 20:25 UTC (permalink / raw)
  To: Richard Henderson; +Cc: mj.mccormack, qemu-devel, zhur, Kirill Batuzov

On Thu, May 26, 2011 at 11:10 PM, Richard Henderson <rth@twiddle.net> wrote:
> On 05/26/2011 12:14 PM, Blue Swirl wrote:
>> On Thu, May 26, 2011 at 4:56 PM, Richard Henderson <rth@twiddle.net> wrote:
>>> On 05/26/2011 05:36 AM, Kirill Batuzov wrote:
>>>>>   x = (int32_t)x >> (int32_t)y;
>>>>>
>>>> This expression has implementation-defined behavior according to
>>>> C99 6.5.7, so we decided to emulate signed shifts by hand.
>>>
>>> Technically, yes.  In practice, no.  GCC, ICC, LLVM, MSVC all know
>>> what the user wants here and will implement it "properly".
>>
>> Can't this be probed by configure? Then a wrapper could be introduced
>> for signed shifts.
>
> I don't see the point.  The C99 implementation-defined escape hatch
> exists for weird CPUs, which we won't be supporting as a QEMU host.

Maybe not, but a compiler with this property could arrive. For
example, GCC developers could decide that since this weirdness is
allowed by the standard, it may be implemented as well.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 5/6] Do constant folding for shift operations.
  2011-05-26 20:25             ` Blue Swirl
@ 2011-05-26 21:14               ` Richard Henderson
  2011-05-27 15:41                 ` Jamie Lokier
  2011-05-27 17:07                 ` Blue Swirl
  0 siblings, 2 replies; 34+ messages in thread
From: Richard Henderson @ 2011-05-26 21:14 UTC (permalink / raw)
  To: Blue Swirl; +Cc: mj.mccormack, qemu-devel, zhur, Kirill Batuzov

On 05/26/2011 01:25 PM, Blue Swirl wrote:
>> I don't see the point.  The C99 implementation-defined escape hatch
>> exists for weird CPUs, which we won't be supporting as a QEMU host.
> 
> Maybe not, but a compiler with this property could arrive. For
> example, GCC developers could decide that since this weirdness is
> allowed by the standard, it may be implemented as well.

If you like, you can write a configure test for it.  But, honestly,
essentially every place in qemu that uses shifts on signed types
would have to be audited.  Really.

The C99 hook exists to efficiently support targets that don't have
arithmetic shift operations.  Honestly.


r~

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 5/6] Do constant folding for shift operations.
  2011-05-26 19:14         ` Blue Swirl
  2011-05-26 20:10           ` Richard Henderson
@ 2011-05-27  7:09           ` Paolo Bonzini
  1 sibling, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2011-05-27  7:09 UTC (permalink / raw)
  To: Blue Swirl
  Cc: mj.mccormack, Kirill Batuzov, qemu-devel, zhur, Richard Henderson

On 05/26/2011 09:14 PM, Blue Swirl wrote:
>>>>   x = (int32_t)x >> (int32_t)y;
>>>>
>>> This expression has implementation-defined behavior according to
>>> C99 6.5.7, so we decided to emulate signed shifts by hand.
>>
>> Technically, yes.  In practice, no.  GCC, ICC, LLVM, MSVC all know
>> what the user wants here and will implement it "properly".
>
> Can't this be probed by configure? Then a wrapper could be introduced
> for signed shifts.

The reason for implementation-defined behavior is basically to allow for
non-two's-complement machines, which aren't really practical to support.

Paolo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 5/6] Do constant folding for shift operations.
  2011-05-26 21:14               ` Richard Henderson
@ 2011-05-27 15:41                 ` Jamie Lokier
  2011-05-27 17:07                 ` Blue Swirl
  1 sibling, 0 replies; 34+ messages in thread
From: Jamie Lokier @ 2011-05-27 15:41 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Blue Swirl, mj.mccormack, qemu-devel, zhur, Kirill Batuzov

Richard Henderson wrote:
> On 05/26/2011 01:25 PM, Blue Swirl wrote:
> >> I don't see the point.  The C99 implementation-defined escape hatch
> >> exists for weird CPUs, which we won't be supporting as a QEMU host.
> > 
> > Maybe not, but a compiler with this property could arrive. For
> > example, GCC developers could decide that since this weirdness is
> > allowed by the standard, it may be implemented as well.
> 
> If you like, you can write a configure test for it.  But, honestly,
> essentially every place in qemu that uses shifts on signed types
> would have to be audited.  Really.

I agree, the chance of QEMU ever working, or needing to work, on a
non-two's-complement machine is pretty remote!

> The C99 hook exists to efficiently support targets that don't have
> arithmetic shift operations.  Honestly.

If you care, this should be portable without a configure test, as
constant folding should have the same behaviour:

    (((int32_t)-3 >> 1 == (int32_t)-2)
     ? (int32_t)x >> (int32_t)y
     : long_winded_portable_shift_right(x, y))
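
Wrapped up, that could look like this (a sketch; SAR32 is a made-up
macro name, long_winded_portable_shift_right stays as a placeholder
for a hand-rolled arithmetic shift, and int32_t/uint32_t come from
<stdint.h>):

    /* The probe is an integer constant expression, so the compiler
       folds the selection at compile time; no configure test needed. */
    #define SAR32(x, y)                                 \
        (((int32_t)-3 >> 1 == (int32_t)-2)              \
         ? (uint32_t)((int32_t)(x) >> (y))              \
         : long_winded_portable_shift_right((x), (y)))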

-- Jamie

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 5/6] Do constant folding for shift operations.
  2011-05-26 21:14               ` Richard Henderson
  2011-05-27 15:41                 ` Jamie Lokier
@ 2011-05-27 17:07                 ` Blue Swirl
  2011-05-27 19:54                   ` Richard Henderson
  1 sibling, 1 reply; 34+ messages in thread
From: Blue Swirl @ 2011-05-27 17:07 UTC (permalink / raw)
  To: Richard Henderson; +Cc: mj.mccormack, qemu-devel, zhur, Kirill Batuzov

On Fri, May 27, 2011 at 12:14 AM, Richard Henderson <rth@twiddle.net> wrote:
> On 05/26/2011 01:25 PM, Blue Swirl wrote:
>>> I don't see the point.  The C99 implementation-defined escape hatch
>>> exists for weird CPUs, which we won't be supporting as a QEMU host.
>>
>> Maybe not, but a compiler with this property could arrive. For
>> example, GCC developers could decide that since this weirdness is
>> allowed by the standard, it may be implemented as well.
>
> If you like, you can write a configure test for it.  But, honestly,
> essentially every place in qemu that uses shifts on signed types
> would have to be audited.  Really.

OK.

> The C99 hook exists to efficiently support targets that don't have
> arithmetic shift operations.  Honestly.

So it would be impossible for a compiler developer to change the logic
for shifts on some supported two's-complement CPUs (like x86)
just because it's legal?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH 5/6] Do constant folding for shift operations.
  2011-05-27 17:07                 ` Blue Swirl
@ 2011-05-27 19:54                   ` Richard Henderson
  0 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2011-05-27 19:54 UTC (permalink / raw)
  To: Blue Swirl; +Cc: mj.mccormack, qemu-devel, zhur, Kirill Batuzov

On 05/27/2011 10:07 AM, Blue Swirl wrote:
>> The C99 hook exists to efficiently support targets that don't have
>> arithmetic shift operations.  Honestly.
> 
> So it would be impossible for a compiler developer to change the logic
> for shifts for some supported two's-complement logic CPUs (like x86)
> just because it's legal?

Not without being lynched, no.


r~

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2011-05-27 19:54 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-20 12:39 [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG Kirill Batuzov
2011-05-20 12:39 ` [Qemu-devel] [PATCH 1/6] Add TCG optimizations stub Kirill Batuzov
2011-05-20 18:12   ` Richard Henderson
2011-05-20 18:33     ` Richard Henderson
2011-05-20 12:39 ` [Qemu-devel] [PATCH 2/6] Add copy and constant propagation Kirill Batuzov
2011-05-20 18:22   ` Richard Henderson
2011-05-20 18:46     ` Paolo Bonzini
2011-05-20 19:41   ` Aurelien Jarno
2011-05-20 12:39 ` [Qemu-devel] [PATCH 3/6] Do constant folding for basic arithmetic operations Kirill Batuzov
2011-05-20 12:39 ` [Qemu-devel] [PATCH 4/6] Do constant folding for boolean operations Kirill Batuzov
2011-05-20 18:45   ` Richard Henderson
2011-05-20 12:39 ` [Qemu-devel] [PATCH 5/6] Do constant folding for shift operations Kirill Batuzov
2011-05-20 18:37   ` Richard Henderson
2011-05-26 12:36     ` Kirill Batuzov
2011-05-26 13:56       ` Richard Henderson
2011-05-26 19:14         ` Blue Swirl
2011-05-26 20:10           ` Richard Henderson
2011-05-26 20:25             ` Blue Swirl
2011-05-26 21:14               ` Richard Henderson
2011-05-27 15:41                 ` Jamie Lokier
2011-05-27 17:07                 ` Blue Swirl
2011-05-27 19:54                   ` Richard Henderson
2011-05-27  7:09           ` Paolo Bonzini
2011-05-20 12:39 ` [Qemu-devel] [PATCH 6/6] Do constant folding for unary operations Kirill Batuzov
2011-05-20 18:39   ` Richard Henderson
2011-05-20 17:50 ` [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG Richard Henderson
2011-05-20 19:37   ` Aurelien Jarno
2011-05-20 23:31     ` Andreas Färber
2011-05-21  9:37       ` Laurent Desnogues
2011-05-21 10:46         ` Aurelien Jarno
2011-05-21 17:53           ` Kirill Batuzov
2011-05-20 19:35 ` Aurelien Jarno
2011-05-21 12:47   ` Dmitry Zhurikhin
2011-05-21 12:48   ` Aurelien Jarno
