All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 00/13] tcg: rework liveness analysis and register allocator
@ 2012-09-27 17:15 Aurelien Jarno
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 01/13] tcg: add temp_dead() Aurelien Jarno
                   ` (12 more replies)
  0 siblings, 13 replies; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

This patch series rework the liveness analysis and register allocator
in order to generate more optimized code, by avoiding a lot of move
instructions. I have measured a 9% performance improvement in user mode
and 4% in system mode.

The idea behind this patch series is to free registers as soon as the
temps are not used anymore instead of waiting for a basic block end or
an op with side effects. In addition temps are copied to memory as soon
as they are not going to be written anymore, this way even globals can
be marked as "dead", avoiding moves to a new register when inputs and
outputs are aliased. Finally qemu_ld/st operations do not save back
globals to memory, but only copy them there. In case of an exception
the globals have the correct values, and otherwise they do not have
to be reloaded.

Overall this greatly reduces the number of moves emitted, and spread
them all over the TBs, increasing the performances on in-order CPUs.
This also reduces register spilling, especially on CPUs with few
registers.

In practice it means the liveness analysis is providing more
information to the register allocator, and especially when to the memory
version of a temp with the content of the associated register. This
means that the two are now quite linked, and that for some functions the
code exist in two versions, one used when the liveness analysis is
enabled which only does some checks with assert(), the other when it is
disabled. It might be possible to keep only one version, but it implies
de-optimizing the liveness analysis disabled case. In any case the
checks with assert() should be kept, as they are quite useful to make
sure nothing subtly breaks.

Aurelien Jarno (13):
  tcg: add temp_dead()
  tcg: add tcg_reg_sync()
  tcg: add temp_sync()
  tcg: sync output arguments on liveness request
  tcg: rework liveness analysis
  tcg: improve tcg_reg_alloc_movi()
  tcg: rewrite tcg_reg_alloc_mov()
  tcg: always mark dead input arguments as dead
  tcg: start with local temps in TEMP_VAL_MEM state
  tcg: don't explicitely save globals and temps
  tcg: sync globals for pure helpers instead of saving them
  tcg: fix some op flags
  tcg: rework TCG ops flags

 tcg/README    |   15 +-
 tcg/tcg-opc.h |   55 ++++---
 tcg/tcg.c     |  443 ++++++++++++++++++++++++++++++++++++---------------------
 tcg/tcg.h     |   19 ++-
 4 files changed, 329 insertions(+), 203 deletions(-)

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH 01/13] tcg: add temp_dead()
  2012-09-27 17:15 [Qemu-devel] [PATCH 00/13] tcg: rework liveness analysis and register allocator Aurelien Jarno
@ 2012-09-27 17:15 ` Aurelien Jarno
  2012-09-27 18:19   ` Richard Henderson
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 02/13] tcg: add tcg_reg_sync() Aurelien Jarno
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

A lot of code is duplicated to mark a temporary as dead. Replace it
by temp_dead(), which in addition marks the temp as saved in memory
for globals and local temps, instead of doing this a posteriori in
temp_save().

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg.c |   67 ++++++++++++++++++++++++++++---------------------------------
 1 file changed, 31 insertions(+), 36 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index c069e44..c739a3b 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1516,6 +1516,24 @@ static int tcg_reg_alloc(TCGContext *s, TCGRegSet reg1, TCGRegSet reg2)
     tcg_abort();
 }
 
+/* mark a temporary as dead. */
+static inline void temp_dead(TCGContext *s, int temp)
+{
+    TCGTemp *ts;
+
+    ts = &s->temps[temp];
+    if (!ts->fixed_reg) {
+        if (ts->val_type == TEMP_VAL_REG) {
+            s->reg_to_temp[ts->reg] = -1;
+        }
+        if (temp < s->nb_globals || (ts->temp_local && ts->mem_allocated)) {
+            ts->val_type = TEMP_VAL_MEM;
+        } else {
+            ts->val_type = TEMP_VAL_DEAD;
+        }
+    }
+}
+
 /* save a temporary to memory. 'allocated_regs' is used in case a
    temporary registers needs to be allocated to store a constant. */
 static void temp_save(TCGContext *s, int temp, TCGRegSet allocated_regs)
@@ -1573,10 +1591,7 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
         if (ts->temp_local) {
             temp_save(s, i, allocated_regs);
         } else {
-            if (ts->val_type == TEMP_VAL_REG) {
-                s->reg_to_temp[ts->reg] = -1;
-            }
-            ts->val_type = TEMP_VAL_DEAD;
+            temp_dead(s, i);
         }
     }
 
@@ -1625,8 +1640,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
             if (ots->val_type == TEMP_VAL_REG)
                 s->reg_to_temp[ots->reg] = -1;
             reg = ts->reg;
-            s->reg_to_temp[reg] = -1;
-            ts->val_type = TEMP_VAL_DEAD;
+            temp_dead(s, args[1]);
         } else {
             if (ots->val_type == TEMP_VAL_REG) {
                 reg = ots->reg;
@@ -1753,14 +1767,8 @@ static void tcg_reg_alloc_op(TCGContext *s,
     } else {
         /* mark dead temporaries and free the associated registers */
         for(i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-            arg = args[i];
             if (IS_DEAD_ARG(i)) {
-                ts = &s->temps[arg];
-                if (!ts->fixed_reg) {
-                    if (ts->val_type == TEMP_VAL_REG)
-                        s->reg_to_temp[ts->reg] = -1;
-                    ts->val_type = TEMP_VAL_DEAD;
-                }
+                temp_dead(s, args[i]);
             }
         }
         
@@ -1800,11 +1808,12 @@ static void tcg_reg_alloc_op(TCGContext *s,
             tcg_regset_set_reg(allocated_regs, reg);
             /* if a fixed register is used, then a move will be done afterwards */
             if (!ts->fixed_reg) {
-                if (ts->val_type == TEMP_VAL_REG)
-                    s->reg_to_temp[ts->reg] = -1;
                 if (IS_DEAD_ARG(i)) {
-                    ts->val_type = TEMP_VAL_DEAD;
+                    temp_dead(s, args[i]);
                 } else {
+                    if (ts->val_type == TEMP_VAL_REG) {
+                        s->reg_to_temp[ts->reg] = -1;
+                    }
                     ts->val_type = TEMP_VAL_REG;
                     ts->reg = reg;
                     /* temp value is modified, so the value kept in memory is
@@ -1963,14 +1972,8 @@ static int tcg_reg_alloc_call(TCGContext *s, const TCGOpDef *def,
     
     /* mark dead temporaries and free the associated registers */
     for(i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
-        arg = args[i];
         if (IS_DEAD_ARG(i)) {
-            ts = &s->temps[arg];
-            if (!ts->fixed_reg) {
-                if (ts->val_type == TEMP_VAL_REG)
-                    s->reg_to_temp[ts->reg] = -1;
-                ts->val_type = TEMP_VAL_DEAD;
-            }
+            temp_dead(s, args[i]);
         }
     }
     
@@ -2000,11 +2003,12 @@ static int tcg_reg_alloc_call(TCGContext *s, const TCGOpDef *def,
                 tcg_out_mov(s, ts->type, ts->reg, reg);
             }
         } else {
-            if (ts->val_type == TEMP_VAL_REG)
-                s->reg_to_temp[ts->reg] = -1;
             if (IS_DEAD_ARG(i)) {
-                ts->val_type = TEMP_VAL_DEAD;
+                temp_dead(s, args[i]);
             } else {
+                if (ts->val_type == TEMP_VAL_REG) {
+                    s->reg_to_temp[ts->reg] = -1;
+                }
                 ts->val_type = TEMP_VAL_REG;
                 ts->reg = reg;
                 ts->mem_coherent = 0;
@@ -2119,16 +2123,7 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
             args += args[0];
             goto next;
         case INDEX_op_discard:
-            {
-                TCGTemp *ts;
-                ts = &s->temps[args[0]];
-                /* mark the temporary as dead */
-                if (!ts->fixed_reg) {
-                    if (ts->val_type == TEMP_VAL_REG)
-                        s->reg_to_temp[ts->reg] = -1;
-                    ts->val_type = TEMP_VAL_DEAD;
-                }
-            }
+            temp_dead(s, args[0]);
             break;
         case INDEX_op_set_label:
             tcg_reg_alloc_bb_end(s, s->reserved_regs);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH 02/13] tcg: add tcg_reg_sync()
  2012-09-27 17:15 [Qemu-devel] [PATCH 00/13] tcg: rework liveness analysis and register allocator Aurelien Jarno
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 01/13] tcg: add temp_dead() Aurelien Jarno
@ 2012-09-27 17:15 ` Aurelien Jarno
  2012-09-27 18:24   ` Richard Henderson
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 03/13] tcg: add temp_sync() Aurelien Jarno
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Add a new function tcg_reg_sync() to synchronize the canonical location
of a temp with the value in the associated register, but without freeing
it. Rewrite tcg_reg_free() to first call tcg_reg_sync() and then to free
the register.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg.c |   29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index c739a3b..2ac7097 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1469,22 +1469,33 @@ static void temp_allocate_frame(TCGContext *s, int temp)
     s->current_frame_offset += (tcg_target_long)sizeof(tcg_target_long);
 }
 
+/* sync register 'reg' by saving it to the corresponding temporary */
+static inline void tcg_reg_sync(TCGContext *s, int reg)
+{
+    TCGTemp *ts;
+    int temp;
+
+    temp = s->reg_to_temp[reg];
+    ts = &s->temps[temp];
+    assert(ts->val_type == TEMP_VAL_REG);
+    if (!ts->mem_coherent && !ts->fixed_reg) {
+        if (!ts->mem_allocated) {
+            temp_allocate_frame(s, temp);
+        }
+        tcg_out_st(s, ts->type, reg, ts->mem_reg, ts->mem_offset);
+    }
+    ts->mem_coherent = 1;
+}
+
 /* free register 'reg' by spilling the corresponding temporary if necessary */
 static void tcg_reg_free(TCGContext *s, int reg)
 {
-    TCGTemp *ts;
     int temp;
 
     temp = s->reg_to_temp[reg];
     if (temp != -1) {
-        ts = &s->temps[temp];
-        assert(ts->val_type == TEMP_VAL_REG);
-        if (!ts->mem_coherent) {
-            if (!ts->mem_allocated) 
-                temp_allocate_frame(s, temp);
-            tcg_out_st(s, ts->type, reg, ts->mem_reg, ts->mem_offset);
-        }
-        ts->val_type = TEMP_VAL_MEM;
+        tcg_reg_sync(s, reg);
+        s->temps[temp].val_type = TEMP_VAL_MEM;
         s->reg_to_temp[reg] = -1;
     }
 }
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH 03/13] tcg: add temp_sync()
  2012-09-27 17:15 [Qemu-devel] [PATCH 00/13] tcg: rework liveness analysis and register allocator Aurelien Jarno
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 01/13] tcg: add temp_dead() Aurelien Jarno
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 02/13] tcg: add tcg_reg_sync() Aurelien Jarno
@ 2012-09-27 17:15 ` Aurelien Jarno
  2012-09-27 18:30   ` Richard Henderson
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 04/13] tcg: sync output arguments on liveness request Aurelien Jarno
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Add a new function temp_sync() to synchronize the canonical location
of a temp with the value in the corresponding register, but without
freeing the associated register. Rewrite temp_save() to call
temp_sync() followed by temp_dead().

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg.c |   33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 2ac7097..fb2223f 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1545,31 +1545,28 @@ static inline void temp_dead(TCGContext *s, int temp)
     }
 }
 
-/* save a temporary to memory. 'allocated_regs' is used in case a
+/* sync a temporary to memory. 'allocated_regs' is used in case a
    temporary registers needs to be allocated to store a constant. */
-static void temp_save(TCGContext *s, int temp, TCGRegSet allocated_regs)
+static inline void temp_sync(TCGContext *s, int temp, TCGRegSet allocated_regs)
 {
     TCGTemp *ts;
-    int reg;
 
     ts = &s->temps[temp];
     if (!ts->fixed_reg) {
         switch(ts->val_type) {
         case TEMP_VAL_REG:
-            tcg_reg_free(s, ts->reg);
-            break;
-        case TEMP_VAL_DEAD:
-            ts->val_type = TEMP_VAL_MEM;
+            tcg_reg_sync(s, ts->reg);
             break;
         case TEMP_VAL_CONST:
-            reg = tcg_reg_alloc(s, tcg_target_available_regs[ts->type], 
-                                allocated_regs);
-            if (!ts->mem_allocated) 
-                temp_allocate_frame(s, temp);
-            tcg_out_movi(s, ts->type, reg, ts->val);
-            tcg_out_st(s, ts->type, reg, ts->mem_reg, ts->mem_offset);
-            ts->val_type = TEMP_VAL_MEM;
+            ts->reg = tcg_reg_alloc(s, tcg_target_available_regs[ts->type],
+                                    allocated_regs);
+            ts->val_type = TEMP_VAL_REG;
+            s->reg_to_temp[ts->reg] = temp;
+            ts->mem_coherent = 0;
+            tcg_out_movi(s, ts->type, ts->reg, ts->val);
+            tcg_reg_sync(s, ts->reg);
             break;
+        case TEMP_VAL_DEAD:
         case TEMP_VAL_MEM:
             break;
         default:
@@ -1578,6 +1575,14 @@ static void temp_save(TCGContext *s, int temp, TCGRegSet allocated_regs)
     }
 }
 
+/* save a temporary to memory. 'allocated_regs' is used in case a
+   temporary registers needs to be allocated to store a constant. */
+static inline void temp_save(TCGContext *s, int temp, TCGRegSet allocated_regs)
+{
+    temp_sync(s, temp, allocated_regs);
+    temp_dead(s, temp);
+}
+
 /* save globals to their canonical location and assume they can be
    modified be the following code. 'allocated_regs' is used in case a
    temporary registers needs to be allocated to store a constant. */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH 04/13] tcg: sync output arguments on liveness request
  2012-09-27 17:15 [Qemu-devel] [PATCH 00/13] tcg: rework liveness analysis and register allocator Aurelien Jarno
                   ` (2 preceding siblings ...)
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 03/13] tcg: add temp_sync() Aurelien Jarno
@ 2012-09-27 17:15 ` Aurelien Jarno
  2012-09-27 18:39   ` Richard Henderson
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 05/13] tcg: rework liveness analysis Aurelien Jarno
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Synchronize an output argument when requested by the liveness analysis.
This is needed so that the temp can be declared dead later.

For that, add a new op_sync_args table in which each bit tells if the
corresponding output argument needs to be synchronized with the memory.
Pass it to the tcg_reg_alloc_* functions, and honor this bit. We need to
synchronize the argument before marking it as dead, and we have to make
sure all the infos about the temp are correctly filled.

At the same time change some types from unsigned int to uint16_t when
passing op_dead_args.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg.c |   97 ++++++++++++++++++++++++++++++++++++++-----------------------
 tcg/tcg.h |    3 ++
 2 files changed, 64 insertions(+), 36 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index fb2223f..b89ab04 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1221,13 +1221,15 @@ static void tcg_liveness_analysis(TCGContext *s)
     TCGArg *args;
     const TCGOpDef *def;
     uint8_t *dead_temps;
-    unsigned int dead_args;
+    uint16_t dead_args;
+    uint8_t sync_args;
     
     gen_opc_ptr++; /* skip end */
 
     nb_ops = gen_opc_ptr - gen_opc_buf;
 
     s->op_dead_args = tcg_malloc(nb_ops * sizeof(uint16_t));
+    s->op_sync_args = tcg_malloc(nb_ops * sizeof(uint8_t));
     
     dead_temps = tcg_malloc(s->nb_temps);
     memset(dead_temps, 1, s->nb_temps);
@@ -1264,6 +1266,7 @@ static void tcg_liveness_analysis(TCGContext *s)
 
                     /* output args are dead */
                     dead_args = 0;
+                    sync_args = 0;
                     for(i = 0; i < nb_oargs; i++) {
                         arg = args[i];
                         if (dead_temps[arg]) {
@@ -1288,6 +1291,7 @@ static void tcg_liveness_analysis(TCGContext *s)
                         }
                     }
                     s->op_dead_args[op_index] = dead_args;
+                    s->op_sync_args[op_index] = sync_args;
                 }
                 args--;
             }
@@ -1330,6 +1334,7 @@ static void tcg_liveness_analysis(TCGContext *s)
 
                 /* output args are dead */
                 dead_args = 0;
+                sync_args = 0;
                 for(i = 0; i < nb_oargs; i++) {
                     arg = args[i];
                     if (dead_temps[arg]) {
@@ -1355,6 +1360,7 @@ static void tcg_liveness_analysis(TCGContext *s)
                     dead_temps[arg] = 0;
                 }
                 s->op_dead_args[op_index] = dead_args;
+                s->op_sync_args[op_index] = sync_args;
             }
             break;
         }
@@ -1373,6 +1379,8 @@ static void tcg_liveness_analysis(TCGContext *s)
 
     s->op_dead_args = tcg_malloc(nb_ops * sizeof(uint16_t));
     memset(s->op_dead_args, 0, nb_ops * sizeof(uint16_t));
+    s->op_sync_args = tcg_malloc(nb_ops * sizeof(uint8_t));
+    memset(s->op_sync_args, 0, nb_ops * sizeof(uint8_t));
 }
 #endif
 
@@ -1615,8 +1623,10 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
 }
 
 #define IS_DEAD_ARG(n) ((dead_args >> (n)) & 1)
+#define NEED_SYNC_ARG(n) ((sync_args >> (n)) & 1)
 
-static void tcg_reg_alloc_movi(TCGContext *s, const TCGArg *args)
+static void tcg_reg_alloc_movi(TCGContext *s, const TCGArg *args,
+                               uint16_t dead_args, uint8_t sync_args)
 {
     TCGTemp *ots;
     tcg_target_ulong val;
@@ -1635,11 +1645,14 @@ static void tcg_reg_alloc_movi(TCGContext *s, const TCGArg *args)
         ots->val_type = TEMP_VAL_CONST;
         ots->val = val;
     }
+    if (NEED_SYNC_ARG(0)) {
+        temp_sync(s, args[0], s->reserved_regs);
+    }
 }
 
 static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
-                              const TCGArg *args,
-                              unsigned int dead_args)
+                              const TCGArg *args, uint16_t dead_args,
+                              uint8_t sync_args)
 {
     TCGTemp *ts, *ots;
     int reg;
@@ -1684,6 +1697,9 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
                 s->reg_to_temp[ots->reg] = -1;
             ots->val_type = TEMP_VAL_CONST;
             ots->val = ts->val;
+            if (NEED_SYNC_ARG(0)) {
+                temp_sync(s, args[0], s->reserved_regs);
+            }
             return;
         }
     } else {
@@ -1693,12 +1709,16 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
     ots->reg = reg;
     ots->val_type = TEMP_VAL_REG;
     ots->mem_coherent = 0;
+
+    if (NEED_SYNC_ARG(0)) {
+        tcg_reg_sync(s, reg);
+    }
 }
 
 static void tcg_reg_alloc_op(TCGContext *s, 
                              const TCGOpDef *def, TCGOpcode opc,
-                             const TCGArg *args,
-                             unsigned int dead_args)
+                             const TCGArg *args, uint16_t dead_args,
+                             uint8_t sync_args)
 {
     TCGRegSet allocated_regs;
     int i, k, nb_iargs, nb_oargs, reg;
@@ -1824,19 +1844,15 @@ static void tcg_reg_alloc_op(TCGContext *s,
             tcg_regset_set_reg(allocated_regs, reg);
             /* if a fixed register is used, then a move will be done afterwards */
             if (!ts->fixed_reg) {
-                if (IS_DEAD_ARG(i)) {
-                    temp_dead(s, args[i]);
-                } else {
-                    if (ts->val_type == TEMP_VAL_REG) {
-                        s->reg_to_temp[ts->reg] = -1;
-                    }
-                    ts->val_type = TEMP_VAL_REG;
-                    ts->reg = reg;
-                    /* temp value is modified, so the value kept in memory is
-                       potentially not the same */
-                    ts->mem_coherent = 0;
-                    s->reg_to_temp[reg] = arg;
-               }
+                if (ts->val_type == TEMP_VAL_REG) {
+                    s->reg_to_temp[ts->reg] = -1;
+                }
+                ts->val_type = TEMP_VAL_REG;
+                ts->reg = reg;
+                /* temp value is modified, so the value kept in memory is
+                   potentially not the same */
+                ts->mem_coherent = 0;
+                s->reg_to_temp[reg] = arg;
             }
         oarg_end:
             new_args[i] = reg;
@@ -1853,6 +1869,12 @@ static void tcg_reg_alloc_op(TCGContext *s,
         if (ts->fixed_reg && ts->reg != reg) {
             tcg_out_mov(s, ts->type, ts->reg, reg);
         }
+        if (NEED_SYNC_ARG(i)) {
+            tcg_reg_sync(s, reg);
+        }
+        if (IS_DEAD_ARG(i)) {
+            temp_dead(s, args[i]);
+        }
     }
 }
 
@@ -1864,7 +1886,7 @@ static void tcg_reg_alloc_op(TCGContext *s,
 
 static int tcg_reg_alloc_call(TCGContext *s, const TCGOpDef *def,
                               TCGOpcode opc, const TCGArg *args,
-                              unsigned int dead_args)
+                              uint16_t dead_args, uint8_t sync_args)
 {
     int nb_iargs, nb_oargs, flags, nb_regs, i, reg, nb_params;
     TCGArg arg, func_arg;
@@ -2019,16 +2041,18 @@ static int tcg_reg_alloc_call(TCGContext *s, const TCGOpDef *def,
                 tcg_out_mov(s, ts->type, ts->reg, reg);
             }
         } else {
+            if (ts->val_type == TEMP_VAL_REG) {
+                s->reg_to_temp[ts->reg] = -1;
+            }
+            ts->val_type = TEMP_VAL_REG;
+            ts->reg = reg;
+            ts->mem_coherent = 0;
+            s->reg_to_temp[reg] = arg;
+            if (NEED_SYNC_ARG(i)) {
+                tcg_reg_sync(s, reg);
+            }
             if (IS_DEAD_ARG(i)) {
                 temp_dead(s, args[i]);
-            } else {
-                if (ts->val_type == TEMP_VAL_REG) {
-                    s->reg_to_temp[ts->reg] = -1;
-                }
-                ts->val_type = TEMP_VAL_REG;
-                ts->reg = reg;
-                ts->mem_coherent = 0;
-                s->reg_to_temp[reg] = arg;
             }
         }
     }
@@ -2059,7 +2083,6 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
     TCGOpcode opc;
     int op_index;
     const TCGOpDef *def;
-    unsigned int dead_args;
     const TCGArg *args;
 
 #ifdef DEBUG_DISAS
@@ -2120,12 +2143,13 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
         switch(opc) {
         case INDEX_op_mov_i32:
         case INDEX_op_mov_i64:
-            dead_args = s->op_dead_args[op_index];
-            tcg_reg_alloc_mov(s, def, args, dead_args);
+            tcg_reg_alloc_mov(s, def, args, s->op_dead_args[op_index],
+                              s->op_sync_args[op_index]);
             break;
         case INDEX_op_movi_i32:
         case INDEX_op_movi_i64:
-            tcg_reg_alloc_movi(s, args);
+            tcg_reg_alloc_movi(s, args, s->op_dead_args[op_index],
+                               s->op_sync_args[op_index]);
             break;
         case INDEX_op_debug_insn_start:
             /* debug instruction */
@@ -2146,8 +2170,9 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
             tcg_out_label(s, args[0], s->code_ptr);
             break;
         case INDEX_op_call:
-            dead_args = s->op_dead_args[op_index];
-            args += tcg_reg_alloc_call(s, def, opc, args, dead_args);
+            args += tcg_reg_alloc_call(s, def, opc, args,
+                                       s->op_dead_args[op_index],
+                                       s->op_sync_args[op_index]);
             goto next;
         case INDEX_op_end:
             goto the_end;
@@ -2159,8 +2184,8 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
             /* Note: in order to speed up the code, it would be much
                faster to have specialized register allocator functions for
                some common argument patterns */
-            dead_args = s->op_dead_args[op_index];
-            tcg_reg_alloc_op(s, def, opc, args, dead_args);
+            tcg_reg_alloc_op(s, def, opc, args, s->op_dead_args[op_index],
+                             s->op_sync_args[op_index]);
             break;
         }
         args += def->nb_args;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index af7464a..6875bc1 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -351,6 +351,9 @@ struct TCGContext {
     /* liveness analysis */
     uint16_t *op_dead_args; /* for each operation, each bit tells if the
                                corresponding argument is dead */
+    uint8_t *op_sync_args;  /* for each operation, each bit tells if the
+                               corresponding output argument needs to be
+                               sync to memory. */
     
     /* tells in which temporary a given register is. It does not take
        into account fixed registers */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH 05/13] tcg: rework liveness analysis
  2012-09-27 17:15 [Qemu-devel] [PATCH 00/13] tcg: rework liveness analysis and register allocator Aurelien Jarno
                   ` (3 preceding siblings ...)
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 04/13] tcg: sync output arguments on liveness request Aurelien Jarno
@ 2012-09-27 17:15 ` Aurelien Jarno
  2012-09-27 18:54   ` Richard Henderson
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 06/13] tcg: improve tcg_reg_alloc_movi() Aurelien Jarno
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Rework the liveness analysis by tracking temps that need to go back to
memory in addition to dead temps tracking. This allows to mark output
arguments as "need sync", and to synchronize them back to memory as soon
as they are not written anymore. This way even arguments mapping to
globals can be marked as "dead", avoiding moves to a new register when
input and outputs are aliased.

In addition it means that registers are freed as soon as temps are not
used anymore, instead of waiting for a basic block end or an op with side
effects. This reduces register spilling especially on CPUs with few
registers, and spread the mov over all the TB, increasing the
performances on in-order CPUs.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg.c |   64 +++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 37 insertions(+), 27 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index b89ab04..49e2478 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1183,31 +1183,27 @@ static inline void tcg_set_nop(TCGContext *s, uint16_t *opc_ptr,
     }
 }
 
-/* liveness analysis: end of function: globals are live, temps are
-   dead. */
-/* XXX: at this stage, not used as there would be little gains because
-   most TBs end with a conditional jump. */
-static inline void tcg_la_func_end(TCGContext *s, uint8_t *dead_temps)
+/* liveness analysis: end of function: all temps are dead, and globals
+   should be in memory. */
+static inline void tcg_la_func_end(TCGContext *s, uint8_t *dead_temps,
+                                   uint8_t *mem_temps)
 {
-    memset(dead_temps, 0, s->nb_globals);
-    memset(dead_temps + s->nb_globals, 1, s->nb_temps - s->nb_globals);
+    memset(dead_temps, 1, s->nb_temps);
+    memset(mem_temps, 1, s->nb_globals);
+    memset(mem_temps + s->nb_globals, 0, s->nb_temps - s->nb_globals);
 }
 
-/* liveness analysis: end of basic block: globals are live, temps are
-   dead, local temps are live. */
-static inline void tcg_la_bb_end(TCGContext *s, uint8_t *dead_temps)
+/* liveness analysis: end of basic block: all temps are dead, globals
+   and local temps should be in memory. */
+static inline void tcg_la_bb_end(TCGContext *s, uint8_t *dead_temps,
+                                 uint8_t *mem_temps)
 {
     int i;
-    TCGTemp *ts;
 
-    memset(dead_temps, 0, s->nb_globals);
-    ts = &s->temps[s->nb_globals];
+    memset(dead_temps, 1, s->nb_temps);
+    memset(mem_temps, 1, s->nb_globals);
     for(i = s->nb_globals; i < s->nb_temps; i++) {
-        if (ts->temp_local)
-            dead_temps[i] = 0;
-        else
-            dead_temps[i] = 1;
-        ts++;
+        mem_temps[i] = s->temps[i].temp_local;
     }
 }
 
@@ -1220,7 +1216,7 @@ static void tcg_liveness_analysis(TCGContext *s)
     TCGOpcode op;
     TCGArg *args;
     const TCGOpDef *def;
-    uint8_t *dead_temps;
+    uint8_t *dead_temps, *mem_temps;
     uint16_t dead_args;
     uint8_t sync_args;
     
@@ -1232,7 +1228,8 @@ static void tcg_liveness_analysis(TCGContext *s)
     s->op_sync_args = tcg_malloc(nb_ops * sizeof(uint8_t));
     
     dead_temps = tcg_malloc(s->nb_temps);
-    memset(dead_temps, 1, s->nb_temps);
+    mem_temps = tcg_malloc(s->nb_temps);
+    tcg_la_func_end(s, dead_temps, mem_temps);
 
     args = gen_opparam_ptr;
     op_index = nb_ops - 1;
@@ -1256,8 +1253,9 @@ static void tcg_liveness_analysis(TCGContext *s)
                 if (call_flags & TCG_CALL_PURE) {
                     for(i = 0; i < nb_oargs; i++) {
                         arg = args[i];
-                        if (!dead_temps[arg])
+                        if (!dead_temps[arg] || mem_temps[arg]) {
                             goto do_not_remove_call;
+                        }
                     }
                     tcg_set_nop(s, gen_opc_buf + op_index, 
                                 args - 1, nb_args);
@@ -1272,12 +1270,17 @@ static void tcg_liveness_analysis(TCGContext *s)
                         if (dead_temps[arg]) {
                             dead_args |= (1 << i);
                         }
+                        if (mem_temps[arg]) {
+                            sync_args |= (1 << i);
+                        }
                         dead_temps[arg] = 1;
+                        mem_temps[arg] = 0;
                     }
                     
                     if (!(call_flags & TCG_CALL_CONST)) {
-                        /* globals are live (they may be used by the call) */
-                        memset(dead_temps, 0, s->nb_globals);
+                        /* globals should go back to memory */
+                        memset(dead_temps, 1, s->nb_globals);
+                        memset(mem_temps, 1, s->nb_globals);
                     }
 
                     /* input args are live */
@@ -1307,6 +1310,7 @@ static void tcg_liveness_analysis(TCGContext *s)
             args--;
             /* mark the temporary as dead */
             dead_temps[args[0]] = 1;
+            mem_temps[args[0]] = 0;
             break;
         case INDEX_op_end:
             break;
@@ -1322,8 +1326,9 @@ static void tcg_liveness_analysis(TCGContext *s)
             if (!(def->flags & TCG_OPF_SIDE_EFFECTS) && nb_oargs != 0) {
                 for(i = 0; i < nb_oargs; i++) {
                     arg = args[i];
-                    if (!dead_temps[arg])
+                    if (!dead_temps[arg] || mem_temps[arg]) {
                         goto do_not_remove;
+                    }
                 }
                 tcg_set_nop(s, gen_opc_buf + op_index, args, def->nb_args);
 #ifdef CONFIG_PROFILER
@@ -1340,15 +1345,20 @@ static void tcg_liveness_analysis(TCGContext *s)
                     if (dead_temps[arg]) {
                         dead_args |= (1 << i);
                     }
+                    if (mem_temps[arg]) {
+                        sync_args |= (1 << i);
+                    }
                     dead_temps[arg] = 1;
+                    mem_temps[arg] = 0;
                 }
 
                 /* if end of basic block, update */
                 if (def->flags & TCG_OPF_BB_END) {
-                    tcg_la_bb_end(s, dead_temps);
+                    tcg_la_bb_end(s, dead_temps, mem_temps);
                 } else if (def->flags & TCG_OPF_CALL_CLOBBER) {
-                    /* globals are live */
-                    memset(dead_temps, 0, s->nb_globals);
+                    /* globals should go back to memory */
+                    memset(dead_temps, 1, s->nb_globals);
+                    memset(mem_temps, 1, s->nb_globals);
                 }
 
                 /* input args are live */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH 06/13] tcg: improve tcg_reg_alloc_movi()
  2012-09-27 17:15 [Qemu-devel] [PATCH 00/13] tcg: rework liveness analysis and register allocator Aurelien Jarno
                   ` (4 preceding siblings ...)
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 05/13] tcg: rework liveness analysis Aurelien Jarno
@ 2012-09-27 17:15 ` Aurelien Jarno
  2012-09-27 18:55   ` Richard Henderson
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 07/13] tcg: rewrite tcg_reg_alloc_mov() Aurelien Jarno
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Now that the liveness analysis might mark some output temps as dead, call
temp_dead() if needed.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 49e2478..bfe6411 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1658,6 +1658,9 @@ static void tcg_reg_alloc_movi(TCGContext *s, const TCGArg *args,
     if (NEED_SYNC_ARG(0)) {
         temp_sync(s, args[0], s->reserved_regs);
     }
+    if (IS_DEAD_ARG(0)) {
+        temp_dead(s, args[0]);
+    }
 }
 
 static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH 07/13] tcg: rewrite tcg_reg_alloc_mov()
  2012-09-27 17:15 [Qemu-devel] [PATCH 00/13] tcg: rework liveness analysis and register allocator Aurelien Jarno
                   ` (5 preceding siblings ...)
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 06/13] tcg: improve tcg_reg_alloc_movi() Aurelien Jarno
@ 2012-09-27 17:15 ` Aurelien Jarno
  2012-09-27 19:09   ` Richard Henderson
  2012-09-27 22:18   ` Richard Henderson
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 08/13] tcg: always mark dead input arguments as dead Aurelien Jarno
                   ` (5 subsequent siblings)
  12 siblings, 2 replies; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Now that the liveness analysis provides more information, rewrite
tcg_reg_alloc_mov(). This changes the behaviour about propagating
constants and memory accesses. We now take the assumption that once
a value is loaded into a register (from memory or from a constant),
it's better to keep it there than to reload it later. This assumption
is now always almost correct given that we are now sure the
corresponding temp is going to be used later (otherwise it would have
been synchronized and marked as dead already). The assumption is wrong
if one of the op after clobbers some registers including the one
of the holding the temp (this can be avoided by allocating clobbered
registers last, which is what most TCG target do), or in case of lack
of available register.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg.c |  105 ++++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 63 insertions(+), 42 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index bfe6411..5fb4901 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1667,64 +1667,85 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
                               const TCGArg *args, uint16_t dead_args,
                               uint8_t sync_args)
 {
+    TCGRegSet allocated_regs;
     TCGTemp *ts, *ots;
-    int reg;
-    const TCGArgConstraint *arg_ct;
+    const TCGArgConstraint *arg_ct, *oarg_ct;
 
+    tcg_regset_set(allocated_regs, s->reserved_regs);
     ots = &s->temps[args[0]];
     ts = &s->temps[args[1]];
-    arg_ct = &def->args_ct[0];
+    oarg_ct = &def->args_ct[0];
+    arg_ct = &def->args_ct[1];
+
+    /* We have to load the value in a register for moving it to another
+       or for saving it. We assume it's better to keep it there than to
+       reload it later. */
+    if ((NEED_SYNC_ARG(0) && ts->val_type != TEMP_VAL_REG)
+        || ts->val_type == TEMP_VAL_MEM) {
+        ts->reg = tcg_reg_alloc(s, arg_ct->u.regs, allocated_regs);
+        if (ts->val_type == TEMP_VAL_MEM) {
+            tcg_out_ld(s, ts->type, ts->reg, ts->mem_reg, ts->mem_offset);
+        } else if (ts->val_type == TEMP_VAL_CONST) {
+            tcg_out_movi(s, ts->type, ts->reg, ts->val);
+        }
+        s->reg_to_temp[ts->reg] = args[1];
+        ts->val_type = TEMP_VAL_REG;
+        ts->mem_coherent = 1;
+    }
 
-    /* XXX: always mark arg dead if IS_DEAD_ARG(1) */
-    if (ts->val_type == TEMP_VAL_REG) {
-        if (IS_DEAD_ARG(1) && !ts->fixed_reg && !ots->fixed_reg) {
-            /* the mov can be suppressed */
-            if (ots->val_type == TEMP_VAL_REG)
-                s->reg_to_temp[ots->reg] = -1;
-            reg = ts->reg;
-            temp_dead(s, args[1]);
-        } else {
-            if (ots->val_type == TEMP_VAL_REG) {
-                reg = ots->reg;
-            } else {
-                reg = tcg_reg_alloc(s, arg_ct->u.regs, s->reserved_regs);
-            }
-            if (ts->reg != reg) {
-                tcg_out_mov(s, ots->type, reg, ts->reg);
-            }
+    if (IS_DEAD_ARG(0) && !ots->fixed_reg) {
+        /* mov to a non-saved dead register makes no sense (even with
+           liveness analysis disabled). */
+        assert(NEED_SYNC_ARG(0));
+        /* The code above should have moved the temp to a register. */
+        assert(ts->val_type == TEMP_VAL_REG);
+        if (!ots->mem_allocated) {
+            temp_allocate_frame(s, args[0]);
         }
-    } else if (ts->val_type == TEMP_VAL_MEM) {
-        if (ots->val_type == TEMP_VAL_REG) {
-            reg = ots->reg;
-        } else {
-            reg = tcg_reg_alloc(s, arg_ct->u.regs, s->reserved_regs);
+        if (ts->val_type == TEMP_VAL_REG) {
+            tcg_out_st(s, ots->type, ts->reg, ots->mem_reg, ots->mem_offset);
+            if (IS_DEAD_ARG(1)) {
+                temp_dead(s, args[1]);
+            }
         }
-        tcg_out_ld(s, ts->type, reg, ts->mem_reg, ts->mem_offset);
+        temp_dead(s, args[0]);
     } else if (ts->val_type == TEMP_VAL_CONST) {
         if (ots->fixed_reg) {
-            reg = ots->reg;
-            tcg_out_movi(s, ots->type, reg, ts->val);
+            tcg_out_movi(s, ots->type, ots->reg, ts->val);
+            s->reg_to_temp[ots->reg] = args[0];
         } else {
             /* propagate constant */
-            if (ots->val_type == TEMP_VAL_REG)
+            if (ots->val_type == TEMP_VAL_REG) {
                 s->reg_to_temp[ots->reg] = -1;
+            }
             ots->val_type = TEMP_VAL_CONST;
             ots->val = ts->val;
-            if (NEED_SYNC_ARG(0)) {
-                temp_sync(s, args[0], s->reserved_regs);
-            }
-            return;
         }
     } else {
-        tcg_abort();
-    }
-    s->reg_to_temp[reg] = args[0];
-    ots->reg = reg;
-    ots->val_type = TEMP_VAL_REG;
-    ots->mem_coherent = 0;
-
-    if (NEED_SYNC_ARG(0)) {
-        tcg_reg_sync(s, reg);
+        /* The code above should have moved the temp to a register. */
+        assert(ts->val_type == TEMP_VAL_REG);
+        if (IS_DEAD_ARG(1) && !ts->fixed_reg && !ots->fixed_reg) {
+                /* the mov can be suppressed */
+                if (ots->val_type == TEMP_VAL_REG) {
+                    s->reg_to_temp[ots->reg] = -1;
+                }
+                ots->reg = ts->reg;
+                temp_dead(s, args[1]);
+        } else {
+            if (ots->val_type != TEMP_VAL_REG) {
+                /* When allocating a new register, make sure to not spill the
+                   input one. */
+                tcg_regset_set_reg(allocated_regs, ts->reg);
+                ots->reg = tcg_reg_alloc(s, oarg_ct->u.regs, allocated_regs);
+            }
+            tcg_out_mov(s, ots->type, ots->reg, ts->reg);
+        }
+        ots->val_type = TEMP_VAL_REG;
+        ots->mem_coherent = 0;
+        s->reg_to_temp[ots->reg] = args[0];
+        if (NEED_SYNC_ARG(0)) {
+            tcg_reg_sync(s, ots->reg);
+        }
     }
 }
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH 08/13] tcg: always mark dead input arguments as dead
  2012-09-27 17:15 [Qemu-devel] [PATCH 00/13] tcg: rework liveness analysis and register allocator Aurelien Jarno
                   ` (6 preceding siblings ...)
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 07/13] tcg: rewrite tcg_reg_alloc_mov() Aurelien Jarno
@ 2012-09-27 17:15 ` Aurelien Jarno
  2012-09-27 19:10   ` Richard Henderson
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 09/13] tcg: start with local temps in TEMP_VAL_MEM state Aurelien Jarno
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Always mark dead input arguments as dead, even if the op is at the basic
block end. This will allow to check that all temps are correctly saved.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 5fb4901..71dd2ad 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1832,16 +1832,16 @@ static void tcg_reg_alloc_op(TCGContext *s,
     iarg_end: ;
     }
     
+    /* mark dead temporaries and free the associated registers */
+    for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
+        if (IS_DEAD_ARG(i)) {
+            temp_dead(s, args[i]);
+        }
+    }
+
     if (def->flags & TCG_OPF_BB_END) {
         tcg_reg_alloc_bb_end(s, allocated_regs);
     } else {
-        /* mark dead temporaries and free the associated registers */
-        for(i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-            if (IS_DEAD_ARG(i)) {
-                temp_dead(s, args[i]);
-            }
-        }
-        
         if (def->flags & TCG_OPF_CALL_CLOBBER) {
             /* XXX: permit generic clobber register list ? */ 
             for(reg = 0; reg < TCG_TARGET_NB_REGS; reg++) {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH 09/13] tcg: start with local temps in TEMP_VAL_MEM state
  2012-09-27 17:15 [Qemu-devel] [PATCH 00/13] tcg: rework liveness analysis and register allocator Aurelien Jarno
                   ` (7 preceding siblings ...)
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 08/13] tcg: always mark dead input arguments as dead Aurelien Jarno
@ 2012-09-27 17:15 ` Aurelien Jarno
  2012-09-27 19:10   ` Richard Henderson
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 10/13] tcg: don't explicitely save globals and temps Aurelien Jarno
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Start with local temps in TEMP_VAL_MEM state, to make possible a later
check that all the temps are correctly saved back to memory.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 71dd2ad..9e12be8 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -781,7 +781,11 @@ static void tcg_reg_alloc_start(TCGContext *s)
     }
     for(i = s->nb_globals; i < s->nb_temps; i++) {
         ts = &s->temps[i];
-        ts->val_type = TEMP_VAL_DEAD;
+        if (ts->temp_local) {
+            ts->val_type = TEMP_VAL_MEM;
+        } else {
+            ts->val_type = TEMP_VAL_DEAD;
+        }
         ts->mem_allocated = 0;
         ts->fixed_reg = 0;
     }
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH 10/13] tcg: don't explicitely save globals and temps
  2012-09-27 17:15 [Qemu-devel] [PATCH 00/13] tcg: rework liveness analysis and register allocator Aurelien Jarno
                   ` (8 preceding siblings ...)
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 09/13] tcg: start with local temps in TEMP_VAL_MEM state Aurelien Jarno
@ 2012-09-27 17:15 ` Aurelien Jarno
  2012-09-27 19:13   ` Richard Henderson
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 11/13] tcg: sync globals for pure helpers instead of saving them Aurelien Jarno
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

The liveness analysis ensures that globals and temps are at the correct
state at a basic block end or with an op with side effects. Avoid
looping on all temps, this can be time consuming on targets with a lot
of globals. Keep an assert in debug mode.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg.c |   20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 9e12be8..2f973e8 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1601,8 +1601,14 @@ static inline void temp_sync(TCGContext *s, int temp, TCGRegSet allocated_regs)
    temporary registers needs to be allocated to store a constant. */
 static inline void temp_save(TCGContext *s, int temp, TCGRegSet allocated_regs)
 {
+#ifdef USE_LIVENESS_ANALYSIS
+    /* The liveness analysis already ensures that globals are back
+       in memory. Keep an assert for safety. */
+    assert(s->temps[temp].val_type == TEMP_VAL_MEM || s->temps[temp].fixed_reg);
+#else
     temp_sync(s, temp, allocated_regs);
     temp_dead(s, temp);
+#endif
 }
 
 /* save globals to their canonical location and assume they can be
@@ -1629,7 +1635,13 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
         if (ts->temp_local) {
             temp_save(s, i, allocated_regs);
         } else {
+#ifdef USE_LIVENESS_ANALYSIS
+            /* The liveness analysis already ensures that temps are dead.
+               Keep an assert for safety. */
+            assert(ts->val_type == TEMP_VAL_DEAD);
+#else
             temp_dead(s, i);
+#endif
         }
     }
 
@@ -1706,11 +1718,9 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
         if (!ots->mem_allocated) {
             temp_allocate_frame(s, args[0]);
         }
-        if (ts->val_type == TEMP_VAL_REG) {
-            tcg_out_st(s, ots->type, ts->reg, ots->mem_reg, ots->mem_offset);
-            if (IS_DEAD_ARG(1)) {
-                temp_dead(s, args[1]);
-            }
+        tcg_out_st(s, ots->type, ts->reg, ots->mem_reg, ots->mem_offset);
+        if (IS_DEAD_ARG(1)) {
+            temp_dead(s, args[1]);
         }
         temp_dead(s, args[0]);
     } else if (ts->val_type == TEMP_VAL_CONST) {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH 11/13] tcg: sync globals for pure helpers instead of saving them
  2012-09-27 17:15 [Qemu-devel] [PATCH 00/13] tcg: rework liveness analysis and register allocator Aurelien Jarno
                   ` (9 preceding siblings ...)
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 10/13] tcg: don't explicitely save globals and temps Aurelien Jarno
@ 2012-09-27 17:15 ` Aurelien Jarno
  2012-09-27 19:39   ` Richard Henderson
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 12/13] tcg: fix some op flags Aurelien Jarno
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 13/13] tcg: rework TCG ops flags Aurelien Jarno
  12 siblings, 1 reply; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Pure helpers are allowed to read globals, but not to modify them. In
that case, instead of moving all the globals to memory, just synchronize
them.

Also update the documentation to make it more clear.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/README |   15 ++++++++++-----
 tcg/tcg.c  |   33 ++++++++++++++++++++++++++++-----
 2 files changed, 38 insertions(+), 10 deletions(-)

Note: this is a change of behavior, but the new one is still compliant
with the documentation. Currently only cris, i386 and s390x have PURE
only helpers. I have checked they are really PURE, and also did some 
tests on cris and i386.

diff --git a/tcg/README b/tcg/README
index 27846f1..d61daa1 100644
--- a/tcg/README
+++ b/tcg/README
@@ -77,11 +77,16 @@ destroyed, but local temporaries and globals are preserved.
 Using the tcg_gen_helper_x_y it is possible to call any function
 taking i32, i64 or pointer types. By default, before calling a helper,
 all globals are stored at their canonical location and it is assumed
-that the function can modify them. This can be overridden by the
-TCG_CALL_CONST function modifier. By default, the helper is allowed to
-modify the CPU state or raise an exception. This can be overridden by
-the TCG_CALL_PURE function modifier, in which case the call to the
-function is removed if the return value is not used.
+that the function can modify them. The helper is allowed to modify
+the CPU state or raise an exception.
+
+This can be overridden by the TCG_CALL_CONST and TCG_CALL_PURE function
+modifiers. A PURE helper can read globals but cannot modify them nor the
+CPU state and cannot raise an exception. It can be removed if the return
+value is not used. For that the globals are synchronized with their canonical
+location, but the associated registers are not freed nor reloaded afterwise.
+A CONST helper is a PURE helper which in addition cannot read globals, they
+are not synchronized nor stored. Note that CONST implies PURE.
 
 On some TCG targets (e.g. x86), several calling conventions are
 supported.
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 2f973e8..1c5ab81 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1280,11 +1280,14 @@ static void tcg_liveness_analysis(TCGContext *s)
                         dead_temps[arg] = 1;
                         mem_temps[arg] = 0;
                     }
-                    
+
                     if (!(call_flags & TCG_CALL_CONST)) {
+                        /* globals should be synced to memory */
+                        memset(mem_temps, 1, s->nb_globals);
+                    }
+                    if (!(call_flags & (TCG_CALL_CONST | TCG_CALL_PURE))) {
                         /* globals should go back to memory */
                         memset(dead_temps, 1, s->nb_globals);
-                        memset(mem_temps, 1, s->nb_globals);
                     }
 
                     /* input args are live */
@@ -1623,6 +1626,23 @@ static void save_globals(TCGContext *s, TCGRegSet allocated_regs)
     }
 }
 
+/* sync globals to their canonical location and assume they can be
+   read by the following code. 'allocated_regs' is used in case a
+   temporary registers needs to be allocated to store a constant. */
+static void sync_globals(TCGContext *s, TCGRegSet allocated_regs)
+{
+    int i;
+
+    for (i = 0; i < s->nb_globals; i++) {
+#ifdef USE_LIVENESS_ANALYSIS
+        assert(s->temps[i].val_type != TEMP_VAL_REG || s->temps[i].fixed_reg ||
+               s->temps[i].mem_coherent);
+#else
+        temp_sync(s, i, allocated_regs);
+#endif
+    }
+}
+
 /* at the end of a basic block, we assume all temporaries are dead and
    all globals are stored at their canonical location. */
 static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
@@ -2070,9 +2090,12 @@ static int tcg_reg_alloc_call(TCGContext *s, const TCGOpDef *def,
         }
     }
     
-    /* store globals and free associated registers (we assume the call
-       can modify any global. */
-    if (!(flags & TCG_CALL_CONST)) {
+    /* sync or save globals */
+    if (flags & TCG_CALL_CONST) {
+        /* nothing to do */
+    } else if (flags & TCG_CALL_PURE) {
+        sync_globals(s, allocated_regs);
+    } else {
         save_globals(s, allocated_regs);
     }
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH 12/13] tcg: fix some op flags
  2012-09-27 17:15 [Qemu-devel] [PATCH 00/13] tcg: rework liveness analysis and register allocator Aurelien Jarno
                   ` (10 preceding siblings ...)
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 11/13] tcg: sync globals for pure helpers instead of saving them Aurelien Jarno
@ 2012-09-27 17:15 ` Aurelien Jarno
  2012-09-27 19:44   ` Richard Henderson
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 13/13] tcg: rework TCG ops flags Aurelien Jarno
  12 siblings, 1 reply; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Some branch related ops are marked with TCG_OPF_SIDE_EFFECTS, some other
not. In practice they don't need to, as they are all marked with
TCG_OPF_BB_END, which is handled specifically in all the code.

The call op is marked as TCG_OPF_SIDE_EFFECTS, which might be not true
as there is are specific flags (TCG_CALL_CONST and TCG_CALL_PURE) for
specifying that. On the other hand it always clobber arguments, so mark
it as such even if the call op is handled in a different code path.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg-opc.h |   17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index dbb0e39..0ffd44f 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -37,9 +37,9 @@ DEF(nopn, 0, 0, 1, 0) /* variable number of parameters */
 DEF(discard, 1, 0, 0, 0)
 
 DEF(set_label, 0, 0, 1, TCG_OPF_BB_END)
-DEF(call, 0, 1, 2, TCG_OPF_SIDE_EFFECTS) /* variable number of parameters */
-DEF(jmp, 0, 1, 0, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
-DEF(br, 0, 0, 1, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
+DEF(call, 0, 1, 2, TCG_OPF_CALL_CLOBBER) /* variable number of parameters */
+DEF(jmp, 0, 1, 0, TCG_OPF_BB_END)
+DEF(br, 0, 0, 1, TCG_OPF_BB_END)
 
 #define IMPL(X) (X ? 0 : TCG_OPF_NOT_PRESENT)
 #if TCG_TARGET_REG_BITS == 32
@@ -82,12 +82,11 @@ DEF(rotl_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_rot_i32))
 DEF(rotr_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_rot_i32))
 DEF(deposit_i32, 1, 2, 2, IMPL(TCG_TARGET_HAS_deposit_i32))
 
-DEF(brcond_i32, 0, 2, 2, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
+DEF(brcond_i32, 0, 2, 2, TCG_OPF_BB_END)
 
 DEF(add2_i32, 2, 4, 0, IMPL(TCG_TARGET_REG_BITS == 32))
 DEF(sub2_i32, 2, 4, 0, IMPL(TCG_TARGET_REG_BITS == 32))
-DEF(brcond2_i32, 0, 4, 2,
-    TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS | IMPL(TCG_TARGET_REG_BITS == 32))
+DEF(brcond2_i32, 0, 4, 2, TCG_OPF_BB_END | IMPL(TCG_TARGET_REG_BITS == 32))
 DEF(mulu2_i32, 2, 2, 0, IMPL(TCG_TARGET_REG_BITS == 32))
 DEF(setcond2_i32, 1, 4, 1, IMPL(TCG_TARGET_REG_BITS == 32))
 
@@ -142,7 +141,7 @@ DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
 
-DEF(brcond_i64, 0, 2, 2, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS | IMPL64)
+DEF(brcond_i64, 0, 2, 2, TCG_OPF_BB_END | IMPL64)
 DEF(ext8s_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ext8s_i64))
 DEF(ext16s_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ext16s_i64))
 DEF(ext32s_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ext32s_i64))
@@ -166,8 +165,8 @@ DEF(debug_insn_start, 0, 0, 2, 0)
 #else
 DEF(debug_insn_start, 0, 0, 1, 0)
 #endif
-DEF(exit_tb, 0, 0, 1, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
-DEF(goto_tb, 0, 0, 1, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
+DEF(exit_tb, 0, 0, 1, TCG_OPF_BB_END)
+DEF(goto_tb, 0, 0, 1, TCG_OPF_BB_END)
 /* Note: even if TARGET_LONG_BITS is not defined, the INDEX_op
    constants must be defined */
 #if TCG_TARGET_REG_BITS == 32
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH 13/13] tcg: rework TCG ops flags
  2012-09-27 17:15 [Qemu-devel] [PATCH 00/13] tcg: rework liveness analysis and register allocator Aurelien Jarno
                   ` (11 preceding siblings ...)
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 12/13] tcg: fix some op flags Aurelien Jarno
@ 2012-09-27 17:15 ` Aurelien Jarno
  2012-09-27 19:56   ` Richard Henderson
  12 siblings, 1 reply; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 17:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

TCG_OPF_CALL_CLOBBER means that an op is clobbering the call registers,
and thus they should be emptied before the op is executed. It doesn't
says anything about globals.

On the other hand, TCG_OPF_SIDE_EFFECTS means that the op might touch
some globals or even affects some non map registers. As a consequence
such an op should not be removed even if its output arguments are not
used, and the globals should be saved back to memory.

Unfortunately the two seems to be wrongly used: saving globals is
decided depending on TCG_OPF_CALL_CLOBBER, while non-qemu store
functions are marked as TCG_OPF_SIDE_EFFECTS. This is why mapping a
memory address using a global and accessing it through ld/st doesn't
work.

While theoretically it could be possible to use PURE and CONST like
for helpers, it only refers to globals and not about other things like
exception. This patch reworks that in the following way:
- TCG_OPF_CALL_CLOBBER: The op clobber the call registers. They are
  freed before emitting the op.
- TCG_OPF_SIDE_EFFECTS: The op is not removed if the returned value
  if not used. It can trigger exception and thus globals are
  synchronized before emitting the op.
- TCG_OPF_READ_GLOBALS: The op can read globals but not write them,
  and thus globals are synchronized before emitting the op.
- TCG_OPF_WRITE_GLOBALS: The op can read and write globals, and thus
  globals are saved back to their canonical location before emitting
  the op.

This patch also update the flags for the affected instructions.
Compared to the previous behaviour, here are the changes:
- Globals are only synchronized and not saved back to memory for
  qemu_ld/st helpers. This significantly reduces the generated code
  size.
- Mapping a memory address using a global and accessing it through
  ld/st now does work, but at a cost of a small increase of the
  generated code size.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg-opc.h |   38 +++++++++++++++++++-------------------
 tcg/tcg.c     |   26 +++++++++++++++++---------
 tcg/tcg.h     |   16 ++++++++++------
 3 files changed, 46 insertions(+), 34 deletions(-)

Note: While this is restoring the possibility to map a memory address
using both a global and accessing it through ld/st, this reduces the
performances of targets generating a lot of ld/st op. Given it is 
currently technically not allowed, we might instead change the TCG 
documentation to make that clear.

diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 0ffd44f..aeccebb 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -53,14 +53,14 @@ DEF(movi_i32, 1, 0, 1, 0)
 DEF(setcond_i32, 1, 2, 1, 0)
 DEF(movcond_i32, 1, 4, 1, IMPL(TCG_TARGET_HAS_movcond_i32))
 /* load/store */
-DEF(ld8u_i32, 1, 1, 1, 0)
-DEF(ld8s_i32, 1, 1, 1, 0)
-DEF(ld16u_i32, 1, 1, 1, 0)
-DEF(ld16s_i32, 1, 1, 1, 0)
-DEF(ld_i32, 1, 1, 1, 0)
-DEF(st8_i32, 0, 2, 1, TCG_OPF_SIDE_EFFECTS)
-DEF(st16_i32, 0, 2, 1, TCG_OPF_SIDE_EFFECTS)
-DEF(st_i32, 0, 2, 1, TCG_OPF_SIDE_EFFECTS)
+DEF(ld8u_i32, 1, 1, 1, TCG_OPF_READ_GLOBALS)
+DEF(ld8s_i32, 1, 1, 1, TCG_OPF_READ_GLOBALS)
+DEF(ld16u_i32, 1, 1, 1, TCG_OPF_READ_GLOBALS)
+DEF(ld16s_i32, 1, 1, 1, TCG_OPF_READ_GLOBALS)
+DEF(ld_i32, 1, 1, 1, TCG_OPF_READ_GLOBALS)
+DEF(st8_i32, 0, 2, 1, TCG_OPF_WRITE_GLOBALS)
+DEF(st16_i32, 0, 2, 1, TCG_OPF_WRITE_GLOBALS)
+DEF(st_i32, 0, 2, 1, TCG_OPF_WRITE_GLOBALS)
 /* arith */
 DEF(add_i32, 1, 2, 0, 0)
 DEF(sub_i32, 1, 2, 0, 0)
@@ -109,17 +109,17 @@ DEF(movi_i64, 1, 0, 1, IMPL64)
 DEF(setcond_i64, 1, 2, 1, IMPL64)
 DEF(movcond_i64, 1, 4, 1, IMPL64 | IMPL(TCG_TARGET_HAS_movcond_i64))
 /* load/store */
-DEF(ld8u_i64, 1, 1, 1, IMPL64)
-DEF(ld8s_i64, 1, 1, 1, IMPL64)
-DEF(ld16u_i64, 1, 1, 1, IMPL64)
-DEF(ld16s_i64, 1, 1, 1, IMPL64)
-DEF(ld32u_i64, 1, 1, 1, IMPL64)
-DEF(ld32s_i64, 1, 1, 1, IMPL64)
-DEF(ld_i64, 1, 1, 1, IMPL64)
-DEF(st8_i64, 0, 2, 1, TCG_OPF_SIDE_EFFECTS | IMPL64)
-DEF(st16_i64, 0, 2, 1, TCG_OPF_SIDE_EFFECTS | IMPL64)
-DEF(st32_i64, 0, 2, 1, TCG_OPF_SIDE_EFFECTS | IMPL64)
-DEF(st_i64, 0, 2, 1, TCG_OPF_SIDE_EFFECTS | IMPL64)
+DEF(ld8u_i64, 1, 1, 1, TCG_OPF_READ_GLOBALS | IMPL64)
+DEF(ld8s_i64, 1, 1, 1, TCG_OPF_READ_GLOBALS | IMPL64)
+DEF(ld16u_i64, 1, 1, 1, TCG_OPF_READ_GLOBALS | IMPL64)
+DEF(ld16s_i64, 1, 1, 1, TCG_OPF_READ_GLOBALS | IMPL64)
+DEF(ld32u_i64, 1, 1, 1, TCG_OPF_READ_GLOBALS | IMPL64)
+DEF(ld32s_i64, 1, 1, 1, TCG_OPF_READ_GLOBALS | IMPL64)
+DEF(ld_i64, 1, 1, 1, TCG_OPF_READ_GLOBALS | IMPL64)
+DEF(st8_i64, 0, 2, 1, TCG_OPF_WRITE_GLOBALS | IMPL64)
+DEF(st16_i64, 0, 2, 1, TCG_OPF_WRITE_GLOBALS | IMPL64)
+DEF(st32_i64, 0, 2, 1, TCG_OPF_WRITE_GLOBALS | IMPL64)
+DEF(st_i64, 0, 2, 1, TCG_OPF_WRITE_GLOBALS | IMPL64)
 /* arith */
 DEF(add_i64, 1, 2, 0, IMPL64)
 DEF(sub_i64, 1, 2, 0, IMPL64)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 1c5ab81..1f36777 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1362,10 +1362,17 @@ static void tcg_liveness_analysis(TCGContext *s)
                 /* if end of basic block, update */
                 if (def->flags & TCG_OPF_BB_END) {
                     tcg_la_bb_end(s, dead_temps, mem_temps);
-                } else if (def->flags & TCG_OPF_CALL_CLOBBER) {
-                    /* globals should go back to memory */
-                    memset(dead_temps, 1, s->nb_globals);
-                    memset(mem_temps, 1, s->nb_globals);
+                } else {
+                    if (def->flags & (TCG_OPF_SIDE_EFFECTS |
+                                      TCG_OPF_READ_GLOBALS |
+                                      TCG_OPF_WRITE_GLOBALS)) {
+                        /* globals should be synced to memory */
+                        memset(mem_temps, 1, s->nb_globals);
+                    }
+                    if (def->flags & TCG_OPF_WRITE_GLOBALS) {
+                        /* globals should go back to memory */
+                        memset(dead_temps, 1, s->nb_globals);
+                    }
                 }
 
                 /* input args are live */
@@ -1883,12 +1890,13 @@ static void tcg_reg_alloc_op(TCGContext *s,
                     tcg_reg_free(s, reg);
                 }
             }
-            /* XXX: for load/store we could do that only for the slow path
-               (i.e. when a memory callback is called) */
-            
-            /* store globals and free associated registers (we assume the insn
-               can modify any global. */
+        }
+        /* Save globals if the op might write them, sync if the op might
+           read them or has side effects. */
+        if (def->flags & TCG_OPF_WRITE_GLOBALS) {
             save_globals(s, allocated_regs);
+        } else if (def->flags & (TCG_OPF_SIDE_EFFECTS | TCG_OPF_READ_GLOBALS)) {
+            sync_globals(s, allocated_regs);
         }
         
         /* satisfy the output constraints */
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 6875bc1..a182f59 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -497,16 +497,20 @@ typedef struct TCGArgConstraint {
 /* Bits for TCGOpDef->flags, 8 bits available.  */
 enum {
     /* Instruction defines the end of a basic block.  */
-    TCG_OPF_BB_END       = 0x01,
-    /* Instruction clobbers call registers and potentially update globals.  */
-    TCG_OPF_CALL_CLOBBER = 0x02,
+    TCG_OPF_BB_END        = 0x01,
+    /* Instruction clobbers call registers.  */
+    TCG_OPF_CALL_CLOBBER  = 0x02,
     /* Instruction has side effects: it cannot be removed
        if its outputs are not used.  */
-    TCG_OPF_SIDE_EFFECTS = 0x04,
+    TCG_OPF_SIDE_EFFECTS  = 0x04,
+    /* Instruction read globals, they should be synced before.  */
+    TCG_OPF_READ_GLOBALS  = 0x08,
+    /* Instruction write globals, they should be saved back before.  */
+    TCG_OPF_WRITE_GLOBALS = 0x10,
     /* Instruction operands are 64-bits (otherwise 32-bits).  */
-    TCG_OPF_64BIT        = 0x08,
+    TCG_OPF_64BIT         = 0x10,
     /* Instruction is optional and not implemented by the host.  */
-    TCG_OPF_NOT_PRESENT  = 0x10,
+    TCG_OPF_NOT_PRESENT   = 0x20,
 };
 
 typedef struct TCGOpDef {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 01/13] tcg: add temp_dead()
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 01/13] tcg: add temp_dead() Aurelien Jarno
@ 2012-09-27 18:19   ` Richard Henderson
  0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2012-09-27 18:19 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> A lot of code is duplicated to mark a temporary as dead. Replace it
> by temp_dead(), which in addition marks the temp as saved in memory
> for globals and local temps, instead of doing this a posteriori in
> temp_save().
> 
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 02/13] tcg: add tcg_reg_sync()
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 02/13] tcg: add tcg_reg_sync() Aurelien Jarno
@ 2012-09-27 18:24   ` Richard Henderson
  2012-09-27 20:00     ` Aurelien Jarno
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2012-09-27 18:24 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> Add a new function tcg_reg_sync() to synchronize the canonical location
> of a temp with the value in the associated register, but without freeing
> it. Rewrite tcg_reg_free() to first call tcg_reg_sync() and then to free
> the register.
> 
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Richard Henderson <rth@twiddle.net>

> +/* sync register 'reg' by saving it to the corresponding temporary */
> +static inline void tcg_reg_sync(TCGContext *s, int reg)

Isn't the compiler good enough at deciding inlining?


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 03/13] tcg: add temp_sync()
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 03/13] tcg: add temp_sync() Aurelien Jarno
@ 2012-09-27 18:30   ` Richard Henderson
  2012-09-27 20:02     ` Aurelien Jarno
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2012-09-27 18:30 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> Add a new function temp_sync() to synchronize the canonical location
> of a temp with the value in the corresponding register, but without
> freeing the associated register. Rewrite temp_save() to call
> temp_sync() followed by temp_dead().
> 
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Richard Henderson <rth@twiddle.net>

>          case TEMP_VAL_REG:
> +            tcg_reg_sync(s, ts->reg);
>              break;
>          case TEMP_VAL_CONST:
> +            ts->reg = tcg_reg_alloc(s, tcg_target_available_regs[ts->type],
> +                                    allocated_regs);
> +            ts->val_type = TEMP_VAL_REG;
> +            s->reg_to_temp[ts->reg] = temp;
> +            ts->mem_coherent = 0;
> +            tcg_out_movi(s, ts->type, ts->reg, ts->val);
> +            tcg_reg_sync(s, ts->reg);
>              break;

Fallthru from TEMP_VAL_CONST into TEMP_VAL_REG?


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 04/13] tcg: sync output arguments on liveness request
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 04/13] tcg: sync output arguments on liveness request Aurelien Jarno
@ 2012-09-27 18:39   ` Richard Henderson
  2012-09-27 20:05     ` Aurelien Jarno
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2012-09-27 18:39 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> Synchronize an output argument when requested by the liveness analysis.
> This is needed so that the temp can be declared dead later.
> 
> For that, add a new op_sync_args table in which each bit tells if the
> corresponding output argument needs to be synchronized with the memory.
> Pass it to the tcg_reg_alloc_* functions, and honor this bit. We need to
> synchronize the argument before marking it as dead, and we have to make
> sure all the infos about the temp are correctly filled.
> 
> At the same time change some types from unsigned int to uint16_t when
> passing op_dead_args.
> 
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

If I understand the patch correctly, this patch alone is no change,
since no sync_args bits are ever set.

Which means that

> +        if (NEED_SYNC_ARG(i)) {
> +            tcg_reg_sync(s, reg);
> +        }
> +        if (IS_DEAD_ARG(i)) {
> +            temp_dead(s, args[i]);
> +        }

Might ought to be better written as

  if (IS_DEAD_ARG(i)) {
     ...
  } else if (NEED_SYNC_ARG(i)) {
     ...
  }

as temp_dead implies sync from patches 2 and 3.  This pattern is also
replicated 3-4 times.  Subroutine?


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 05/13] tcg: rework liveness analysis
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 05/13] tcg: rework liveness analysis Aurelien Jarno
@ 2012-09-27 18:54   ` Richard Henderson
  0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2012-09-27 18:54 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> Rework the liveness analysis by tracking temps that need to go back to
> memory in addition to dead temps tracking. This allows to mark output
> arguments as "need sync", and to synchronize them back to memory as soon
> as they are not written anymore. This way even arguments mapping to
> globals can be marked as "dead", avoiding moves to a new register when
> input and outputs are aliased.
> 
> In addition it means that registers are freed as soon as temps are not
> used anymore, instead of waiting for a basic block end or an op with side
> effects. This reduces register spilling especially on CPUs with few
> registers, and spread the mov over all the TB, increasing the
> performances on in-order CPUs.
> 
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  tcg/tcg.c |   64 +++++++++++++++++++++++++++++++++++--------------------------
>  1 file changed, 37 insertions(+), 27 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 06/13] tcg: improve tcg_reg_alloc_movi()
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 06/13] tcg: improve tcg_reg_alloc_movi() Aurelien Jarno
@ 2012-09-27 18:55   ` Richard Henderson
  0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2012-09-27 18:55 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> +++ b/tcg/tcg.c
> @@ -1658,6 +1658,9 @@ static void tcg_reg_alloc_movi(TCGContext *s, const TCGArg *args,
>      if (NEED_SYNC_ARG(0)) {
>          temp_sync(s, args[0], s->reserved_regs);
>      }
> +    if (IS_DEAD_ARG(0)) {
> +        temp_dead(s, args[0]);
> +    }
>  }

Same comment re ordering the DEAD/SYNC tests.  Otherwise,

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 07/13] tcg: rewrite tcg_reg_alloc_mov()
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 07/13] tcg: rewrite tcg_reg_alloc_mov() Aurelien Jarno
@ 2012-09-27 19:09   ` Richard Henderson
  2012-09-27 20:17     ` Aurelien Jarno
  2012-09-27 22:18   ` Richard Henderson
  1 sibling, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2012-09-27 19:09 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> +    /* We have to load the value in a register for moving it to another
> +       or for saving it. We assume it's better to keep it there than to
> +       reload it later. */
> +    if ((NEED_SYNC_ARG(0) && ts->val_type != TEMP_VAL_REG)
> +        || ts->val_type == TEMP_VAL_MEM) {
> +        ts->reg = tcg_reg_alloc(s, arg_ct->u.regs, allocated_regs);
> +        if (ts->val_type == TEMP_VAL_MEM) {
> +            tcg_out_ld(s, ts->type, ts->reg, ts->mem_reg, ts->mem_offset);
> +        } else if (ts->val_type == TEMP_VAL_CONST) {
> +            tcg_out_movi(s, ts->type, ts->reg, ts->val);
> +        }
> +        s->reg_to_temp[ts->reg] = args[1];
> +        ts->val_type = TEMP_VAL_REG;
> +        ts->mem_coherent = 1;
> +    }

I don't understand this block.  In particular, the ts->mem_coherent = 1
in the TEMP_VAL_CONST case looks wrong.

Why are you handling NEED_SYNC_ARG before the move, rather than after?

> +    if (IS_DEAD_ARG(0) && !ots->fixed_reg) {
> +        /* mov to a non-saved dead register makes no sense (even with
> +           liveness analysis disabled). */
> +        assert(NEED_SYNC_ARG(0));
> +        /* The code above should have moved the temp to a register. */
> +        assert(ts->val_type == TEMP_VAL_REG);
> +        if (!ots->mem_allocated) {
> +            temp_allocate_frame(s, args[0]);
>          }
> +        if (ts->val_type == TEMP_VAL_REG) {
> +            tcg_out_st(s, ots->type, ts->reg, ots->mem_reg, ots->mem_offset);
> +            if (IS_DEAD_ARG(1)) {
> +                temp_dead(s, args[1]);
> +            }
>          }
> +        temp_dead(s, args[0]);

Isn't this store going to happen via temp_dead -> temp_sync -> tcg_reg_sync?

>      } else if (ts->val_type == TEMP_VAL_CONST) {
>          if (ots->fixed_reg) {
> +            tcg_out_movi(s, ots->type, ots->reg, ts->val);
> +            s->reg_to_temp[ots->reg] = args[0];
>          } else {
>              /* propagate constant */
> +            if (ots->val_type == TEMP_VAL_REG) {
>                  s->reg_to_temp[ots->reg] = -1;
> +            }
>              ots->val_type = TEMP_VAL_CONST;
>              ots->val = ts->val;
>          }

How much of the first block above is redundant with this?
Especially given that I think there's a missing sync.

>      } else {
> +        /* The code above should have moved the temp to a register. */
> +        assert(ts->val_type == TEMP_VAL_REG);
> +        if (IS_DEAD_ARG(1) && !ts->fixed_reg && !ots->fixed_reg) {
> +                /* the mov can be suppressed */
> +                if (ots->val_type == TEMP_VAL_REG) {
> +                    s->reg_to_temp[ots->reg] = -1;
> +                }
> +                ots->reg = ts->reg;
> +                temp_dead(s, args[1]);
> +        } else {
> +            if (ots->val_type != TEMP_VAL_REG) {
> +                /* When allocating a new register, make sure to not spill the
> +                   input one. */
> +                tcg_regset_set_reg(allocated_regs, ts->reg);
> +                ots->reg = tcg_reg_alloc(s, oarg_ct->u.regs, allocated_regs);
> +            }
> +            tcg_out_mov(s, ots->type, ots->reg, ts->reg);
> +        }
> +        ots->val_type = TEMP_VAL_REG;
> +        ots->mem_coherent = 0;
> +        s->reg_to_temp[ots->reg] = args[0];
> +        if (NEED_SYNC_ARG(0)) {
> +            tcg_reg_sync(s, ots->reg);
> +        }

... as we do here.


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 08/13] tcg: always mark dead input arguments as dead
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 08/13] tcg: always mark dead input arguments as dead Aurelien Jarno
@ 2012-09-27 19:10   ` Richard Henderson
  0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2012-09-27 19:10 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> Always mark dead input arguments as dead, even if the op is at the basic
> block end. This will allow to check that all temps are correctly saved.
> 
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 09/13] tcg: start with local temps in TEMP_VAL_MEM state
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 09/13] tcg: start with local temps in TEMP_VAL_MEM state Aurelien Jarno
@ 2012-09-27 19:10   ` Richard Henderson
  0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2012-09-27 19:10 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> Start with local temps in TEMP_VAL_MEM state, to make possible a later
> check that all the temps are correctly saved back to memory.
> 
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 10/13] tcg: don't explicitely save globals and temps
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 10/13] tcg: don't explicitely save globals and temps Aurelien Jarno
@ 2012-09-27 19:13   ` Richard Henderson
  2012-09-27 20:23     ` Aurelien Jarno
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2012-09-27 19:13 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> @@ -1706,11 +1718,9 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
>          if (!ots->mem_allocated) {
>              temp_allocate_frame(s, args[0]);
>          }
> -        if (ts->val_type == TEMP_VAL_REG) {
> -            tcg_out_st(s, ots->type, ts->reg, ots->mem_reg, ots->mem_offset);
> -            if (IS_DEAD_ARG(1)) {
> -                temp_dead(s, args[1]);
> -            }
> +        tcg_out_st(s, ots->type, ts->reg, ots->mem_reg, ots->mem_offset);
> +        if (IS_DEAD_ARG(1)) {
> +            temp_dead(s, args[1]);
>          }
>          temp_dead(s, args[0]);
>      } else if (ts->val_type == TEMP_VAL_CONST) {

Did this hunk belong to a different patch?  It seems like it belongs
with the tcg_reg_alloc_mov rewrite.

If it actually depends on patches 7-8, then perhaps a reorder is better.


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 11/13] tcg: sync globals for pure helpers instead of saving them
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 11/13] tcg: sync globals for pure helpers instead of saving them Aurelien Jarno
@ 2012-09-27 19:39   ` Richard Henderson
  2012-09-27 20:34     ` Aurelien Jarno
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2012-09-27 19:39 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> Note: this is a change of behavior, but the new one is still compliant
> with the documentation. Currently only cris, i386 and s390x have PURE
> only helpers. I have checked they are really PURE, and also did some 
> tests on cris and i386.
> 
> diff --git a/tcg/README b/tcg/README
> index 27846f1..d61daa1 100644
> --- a/tcg/README
> +++ b/tcg/README
> @@ -77,11 +77,16 @@ destroyed, but local temporaries and globals are preserved.
>  Using the tcg_gen_helper_x_y it is possible to call any function
>  taking i32, i64 or pointer types. By default, before calling a helper,
>  all globals are stored at their canonical location and it is assumed
> -that the function can modify them. This can be overridden by the
> -TCG_CALL_CONST function modifier. By default, the helper is allowed to
> -modify the CPU state or raise an exception. This can be overridden by
> -the TCG_CALL_PURE function modifier, in which case the call to the
> -function is removed if the return value is not used.
> +that the function can modify them. The helper is allowed to modify
> +the CPU state or raise an exception.
> +
> +This can be overridden by the TCG_CALL_CONST and TCG_CALL_PURE function
> +modifiers. A PURE helper can read globals but cannot modify them nor the
> +CPU state and cannot raise an exception. It can be removed if the return
> +value is not used. For that the globals are synchronized with their canonical
> +location, but the associated registers are not freed nor reloaded afterwise.
> +A CONST helper is a PURE helper which in addition cannot read globals, they
> +are not synchronized nor stored. Note that CONST implies PURE.

If we're going to re-org these flags, lets please do it right.

In particular the terms "const" and "pure" were clearly stolen from gcc,
but then given different meanings (!).  This, to me at least, is incredibly
confusing.  I have to go back and re-read the docs every time I touch
one of the helper declarations.

There are really three flags we care about:

(1) Helper may read globals.  Sometimes indirectly via exception.
    Implies that globals must be synced, but may remain in REG/CONST state.

(2) Helper may write globals.  Implies globals must be synced, and
    returned to MEM state.

(3) Helper has no side effects at all.  Implies that it can be deleted if
    its outputs are dead.

For the sake of discussion, lets call these READG, WRITEG, NOSIDE.

Our current default is READG | WRITEG.

Our current definition of PURE is READG | NOSIDE.  Note that the gcc
definition of pure actually talks about main memory.

Our current definition of CONST is none of these bits.  Note that the
gcc definition of const is a superset of pure.

What we'd like for all fp helpers is READG only.  Something that we
cannot get at the moment.

There are several cases in which the helper needs to return more than
64 bits in which we use a "temp" allocated in the env struct.  Sparc
does this for some of its 128-bit arithmetic; I'm planning to do just
the same in my s390x rewrite.  For this, the helper neither reads nor
writes globals, but it has non-global-register side effects.  There
will generally be a post-helper load from env to get the "extra" bits.
While the "true" output of the helper may be dead, the side load from
env may not be.  So we want none of READG | WRITEG | NOSIDE.

If we do reorg, we can certainly add compatibility defines so that we
need not update all translators at once.


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 12/13] tcg: fix some op flags
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 12/13] tcg: fix some op flags Aurelien Jarno
@ 2012-09-27 19:44   ` Richard Henderson
  0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2012-09-27 19:44 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> Some branch related ops are marked with TCG_OPF_SIDE_EFFECTS, some other
> not. In practice they don't need to, as they are all marked with
> TCG_OPF_BB_END, which is handled specifically in all the code.
> 
> The call op is marked as TCG_OPF_SIDE_EFFECTS, which might be not true
> as there is are specific flags (TCG_CALL_CONST and TCG_CALL_PURE) for
> specifying that. On the other hand it always clobber arguments, so mark
> it as such even if the call op is handled in a different code path.
> 
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 13/13] tcg: rework TCG ops flags
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 13/13] tcg: rework TCG ops flags Aurelien Jarno
@ 2012-09-27 19:56   ` Richard Henderson
  2012-09-27 20:37     ` Aurelien Jarno
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2012-09-27 19:56 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> - TCG_OPF_CALL_CLOBBER: The op clobber the call registers. They are
>   freed before emitting the op.
> - TCG_OPF_SIDE_EFFECTS: The op is not removed if the returned value
>   if not used. It can trigger exception and thus globals are
>   synchronized before emitting the op.
> - TCG_OPF_READ_GLOBALS: The op can read globals but not write them,
>   and thus globals are synchronized before emitting the op.
> - TCG_OPF_WRITE_GLOBALS: The op can read and write globals, and thus
>   globals are saved back to their canonical location before emitting
>   the op.

This is more or less exactly the flag breakup I was talking about for calls.

I don't agree with SIDE_EFFECTS implying exceptions.  How can "br" cause an
exception?  Or for that matter "st_i32", recalling that we're not storing
to guest memory.

> Note: While this is restoring the possibility to map a memory address
> using both a global and accessing it through ld/st, this reduces the
> performances of targets generating a lot of ld/st op. Given it is 
> currently technically not allowed, we might instead change the TCG 
> documentation to make that clear.

Yes, I think we should.  I can't see that this is something that we
should *ever* do.


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 02/13] tcg: add tcg_reg_sync()
  2012-09-27 18:24   ` Richard Henderson
@ 2012-09-27 20:00     ` Aurelien Jarno
  0 siblings, 0 replies; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 20:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, Sep 27, 2012 at 11:24:34AM -0700, Richard Henderson wrote:
> On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> > Add a new function tcg_reg_sync() to synchronize the canonical location
> > of a temp with the value in the associated register, but without freeing
> > it. Rewrite tcg_reg_free() to first call tcg_reg_sync() and then to free
> > the register.
> > 
> > Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> 
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> 
> > +/* sync register 'reg' by saving it to the corresponding temporary */
> > +static inline void tcg_reg_sync(TCGContext *s, int reg)
> 
> Isn't the compiler good enough at deciding inlining?
> 

Surprisingly the compiler doesn't inline that many things. For example
all the tcg_reg_alloc_* functions are used only once, but are not
inlined.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 03/13] tcg: add temp_sync()
  2012-09-27 18:30   ` Richard Henderson
@ 2012-09-27 20:02     ` Aurelien Jarno
  0 siblings, 0 replies; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 20:02 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, Sep 27, 2012 at 11:30:14AM -0700, Richard Henderson wrote:
> On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> > Add a new function temp_sync() to synchronize the canonical location
> > of a temp with the value in the corresponding register, but without
> > freeing the associated register. Rewrite temp_save() to call
> > temp_sync() followed by temp_dead().
> > 
> > Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> 
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> 
> >          case TEMP_VAL_REG:
> > +            tcg_reg_sync(s, ts->reg);
> >              break;
> >          case TEMP_VAL_CONST:
> > +            ts->reg = tcg_reg_alloc(s, tcg_target_available_regs[ts->type],
> > +                                    allocated_regs);
> > +            ts->val_type = TEMP_VAL_REG;
> > +            s->reg_to_temp[ts->reg] = temp;
> > +            ts->mem_coherent = 0;
> > +            tcg_out_movi(s, ts->type, ts->reg, ts->val);
> > +            tcg_reg_sync(s, ts->reg);
> >              break;
> 
> Fallthru from TEMP_VAL_CONST into TEMP_VAL_REG?
> 

Good catch. It was refactoring from temp_save, and I didn't see it.


-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 04/13] tcg: sync output arguments on liveness request
  2012-09-27 18:39   ` Richard Henderson
@ 2012-09-27 20:05     ` Aurelien Jarno
  2012-09-27 20:10       ` Richard Henderson
  0 siblings, 1 reply; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 20:05 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, Sep 27, 2012 at 11:39:06AM -0700, Richard Henderson wrote:
> On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> > Synchronize an output argument when requested by the liveness analysis.
> > This is needed so that the temp can be declared dead later.
> > 
> > For that, add a new op_sync_args table in which each bit tells if the
> > corresponding output argument needs to be synchronized with the memory.
> > Pass it to the tcg_reg_alloc_* functions, and honor this bit. We need to
> > synchronize the argument before marking it as dead, and we have to make
> > sure all the infos about the temp are correctly filled.
> > 
> > At the same time change some types from unsigned int to uint16_t when
> > passing op_dead_args.
> > 
> > Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> 
> If I understand the patch correctly, this patch alone is no change,
> since no sync_args bits are ever set.

They are set later in this series, this is a preparatory work.

> Which means that
> 
> > +        if (NEED_SYNC_ARG(i)) {
> > +            tcg_reg_sync(s, reg);
> > +        }
> > +        if (IS_DEAD_ARG(i)) {
> > +            temp_dead(s, args[i]);
> > +        }
> 
> Might ought to be better written as
> 
>   if (IS_DEAD_ARG(i)) {
>      ...
>   } else if (NEED_SYNC_ARG(i)) {
>      ...
>   }
> 
> as temp_dead implies sync from patches 2 and 3.  This pattern is also

No temp_dead doesn't imply sync. temp_dead is only marking the temp as
dead, it doesn' save it. 

> replicated 3-4 times.  Subroutine?
> 

Replicated in this patch, but only 2 are left at the end.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 04/13] tcg: sync output arguments on liveness request
  2012-09-27 20:05     ` Aurelien Jarno
@ 2012-09-27 20:10       ` Richard Henderson
  2012-09-27 20:25         ` Aurelien Jarno
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2012-09-27 20:10 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/27/2012 01:05 PM, Aurelien Jarno wrote:
> No temp_dead doesn't imply sync. temp_dead is only marking the temp as
> dead, it doesn' save it. 

Oh, duh.  How did I mis-read "dead" as "save"?


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 07/13] tcg: rewrite tcg_reg_alloc_mov()
  2012-09-27 19:09   ` Richard Henderson
@ 2012-09-27 20:17     ` Aurelien Jarno
  0 siblings, 0 replies; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 20:17 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, Sep 27, 2012 at 12:09:35PM -0700, Richard Henderson wrote:
> On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> > +    /* We have to load the value in a register for moving it to another
> > +       or for saving it. We assume it's better to keep it there than to
> > +       reload it later. */
> > +    if ((NEED_SYNC_ARG(0) && ts->val_type != TEMP_VAL_REG)
> > +        || ts->val_type == TEMP_VAL_MEM) {
> > +        ts->reg = tcg_reg_alloc(s, arg_ct->u.regs, allocated_regs);
> > +        if (ts->val_type == TEMP_VAL_MEM) {
> > +            tcg_out_ld(s, ts->type, ts->reg, ts->mem_reg, ts->mem_offset);
> > +        } else if (ts->val_type == TEMP_VAL_CONST) {
> > +            tcg_out_movi(s, ts->type, ts->reg, ts->val);
> > +        }
> > +        s->reg_to_temp[ts->reg] = args[1];
> > +        ts->val_type = TEMP_VAL_REG;
> > +        ts->mem_coherent = 1;
> > +    }
> 
> I don't understand this block.  In particular, the ts->mem_coherent = 1
> in the TEMP_VAL_CONST case looks wrong.

Indeed the ts->mem_coherent = 1 is wrong in the TEMP_VAL_CONST.

> Why are you handling NEED_SYNC_ARG before the move, rather than after?

Because the move is likely to be eliminated by the code below,
especially if the temp is dead in addition.

> > +    if (IS_DEAD_ARG(0) && !ots->fixed_reg) {
> > +        /* mov to a non-saved dead register makes no sense (even with
> > +           liveness analysis disabled). */
> > +        assert(NEED_SYNC_ARG(0));
> > +        /* The code above should have moved the temp to a register. */
> > +        assert(ts->val_type == TEMP_VAL_REG);
> > +        if (!ots->mem_allocated) {
> > +            temp_allocate_frame(s, args[0]);
> >          }
> > +        if (ts->val_type == TEMP_VAL_REG) {
> > +            tcg_out_st(s, ots->type, ts->reg, ots->mem_reg, ots->mem_offset);
> > +            if (IS_DEAD_ARG(1)) {
> > +                temp_dead(s, args[1]);
> > +            }
> >          }
> > +        temp_dead(s, args[0]);
> 
> Isn't this store going to happen via temp_dead -> temp_sync -> tcg_reg_sync?

temp_dead only mark the temp as dead, it doesn't save it.

> >      } else if (ts->val_type == TEMP_VAL_CONST) {
> >          if (ots->fixed_reg) {
> > +            tcg_out_movi(s, ots->type, ots->reg, ts->val);
> > +            s->reg_to_temp[ots->reg] = args[0];
> >          } else {
> >              /* propagate constant */
> > +            if (ots->val_type == TEMP_VAL_REG) {
> >                  s->reg_to_temp[ots->reg] = -1;
> > +            }
> >              ots->val_type = TEMP_VAL_CONST;
> >              ots->val = ts->val;
> >          }
> 
> How much of the first block above is redundant with this?
> Especially given that I think there's a missing sync.

The goal of to first block is to move the value to a register in case a
sync is needed, and the sync is done at this moment. The ots->fixed_reg
can indeed by merged into the first block, but the rest has to stay there.

> >      } else {
> > +        /* The code above should have moved the temp to a register. */
> > +        assert(ts->val_type == TEMP_VAL_REG);
> > +        if (IS_DEAD_ARG(1) && !ts->fixed_reg && !ots->fixed_reg) {
> > +                /* the mov can be suppressed */
> > +                if (ots->val_type == TEMP_VAL_REG) {
> > +                    s->reg_to_temp[ots->reg] = -1;
> > +                }
> > +                ots->reg = ts->reg;
> > +                temp_dead(s, args[1]);
> > +        } else {
> > +            if (ots->val_type != TEMP_VAL_REG) {
> > +                /* When allocating a new register, make sure to not spill the
> > +                   input one. */
> > +                tcg_regset_set_reg(allocated_regs, ts->reg);
> > +                ots->reg = tcg_reg_alloc(s, oarg_ct->u.regs, allocated_regs);
> > +            }
> > +            tcg_out_mov(s, ots->type, ots->reg, ts->reg);
> > +        }
> > +        ots->val_type = TEMP_VAL_REG;
> > +        ots->mem_coherent = 0;
> > +        s->reg_to_temp[ots->reg] = args[0];
> > +        if (NEED_SYNC_ARG(0)) {
> > +            tcg_reg_sync(s, ots->reg);
> > +        }
> 
> ... as we do here.

The sync is done here because it has not been done in the first block.
It's different than in the ts->val_type == TEMP_VAL_CONST


-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 10/13] tcg: don't explicitely save globals and temps
  2012-09-27 19:13   ` Richard Henderson
@ 2012-09-27 20:23     ` Aurelien Jarno
  0 siblings, 0 replies; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 20:23 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, Sep 27, 2012 at 12:13:25PM -0700, Richard Henderson wrote:
> On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> > @@ -1706,11 +1718,9 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
> >          if (!ots->mem_allocated) {
> >              temp_allocate_frame(s, args[0]);
> >          }
> > -        if (ts->val_type == TEMP_VAL_REG) {
> > -            tcg_out_st(s, ots->type, ts->reg, ots->mem_reg, ots->mem_offset);
> > -            if (IS_DEAD_ARG(1)) {
> > -                temp_dead(s, args[1]);
> > -            }
> > +        tcg_out_st(s, ots->type, ts->reg, ots->mem_reg, ots->mem_offset);
> > +        if (IS_DEAD_ARG(1)) {
> > +            temp_dead(s, args[1]);
> >          }
> >          temp_dead(s, args[0]);
> >      } else if (ts->val_type == TEMP_VAL_CONST) {
> 
> Did this hunk belong to a different patch?  It seems like it belongs
> with the tcg_reg_alloc_mov rewrite.

Yes, it was a simplification meant to go in the patch 7, but it ended up
at the wrong place (the assert above already ensure that ts->val_type ==
TEMP_VAL_REG.

I'll move it to the write patch.

> If it actually depends on patches 7-8, then perhaps a reorder is better.
> 
> 
> r~
> 
> 

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 04/13] tcg: sync output arguments on liveness request
  2012-09-27 20:10       ` Richard Henderson
@ 2012-09-27 20:25         ` Aurelien Jarno
  0 siblings, 0 replies; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 20:25 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, Sep 27, 2012 at 01:10:28PM -0700, Richard Henderson wrote:
> On 09/27/2012 01:05 PM, Aurelien Jarno wrote:
> > No temp_dead doesn't imply sync. temp_dead is only marking the temp as
> > dead, it doesn' save it. 
> 
> Oh, duh.  How did I mis-read "dead" as "save"?
> 

Looks like ;)

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 11/13] tcg: sync globals for pure helpers instead of saving them
  2012-09-27 19:39   ` Richard Henderson
@ 2012-09-27 20:34     ` Aurelien Jarno
  2012-09-27 22:02       ` Richard Henderson
  0 siblings, 1 reply; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 20:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, Sep 27, 2012 at 12:39:10PM -0700, Richard Henderson wrote:
> On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> > Note: this is a change of behavior, but the new one is still compliant
> > with the documentation. Currently only cris, i386 and s390x have PURE
> > only helpers. I have checked they are really PURE, and also did some 
> > tests on cris and i386.
> > 
> > diff --git a/tcg/README b/tcg/README
> > index 27846f1..d61daa1 100644
> > --- a/tcg/README
> > +++ b/tcg/README
> > @@ -77,11 +77,16 @@ destroyed, but local temporaries and globals are preserved.
> >  Using the tcg_gen_helper_x_y it is possible to call any function
> >  taking i32, i64 or pointer types. By default, before calling a helper,
> >  all globals are stored at their canonical location and it is assumed
> > -that the function can modify them. This can be overridden by the
> > -TCG_CALL_CONST function modifier. By default, the helper is allowed to
> > -modify the CPU state or raise an exception. This can be overridden by
> > -the TCG_CALL_PURE function modifier, in which case the call to the
> > -function is removed if the return value is not used.
> > +that the function can modify them. The helper is allowed to modify
> > +the CPU state or raise an exception.
> > +
> > +This can be overridden by the TCG_CALL_CONST and TCG_CALL_PURE function
> > +modifiers. A PURE helper can read globals but cannot modify them nor the
> > +CPU state and cannot raise an exception. It can be removed if the return
> > +value is not used. For that the globals are synchronized with their canonical
> > +location, but the associated registers are not freed nor reloaded afterwise.
> > +A CONST helper is a PURE helper which in addition cannot read globals, they
> > +are not synchronized nor stored. Note that CONST implies PURE.
> 
> If we're going to re-org these flags, lets please do it right.

Well it was not a re-org, but more an improvement of the documentation.

> In particular the terms "const" and "pure" were clearly stolen from gcc,
> but then given different meanings (!).  This, to me at least, is incredibly
> confusing.  I have to go back and re-read the docs every time I touch
> one of the helper declarations.
> 
> There are really three flags we care about:
> 
> (1) Helper may read globals.  Sometimes indirectly via exception.
>     Implies that globals must be synced, but may remain in REG/CONST state.

I actually do wonder if we shouldn't use a different flag for "may
trigger exception", even if we end-up handling it the same way as "may
read globals". I am not sure it would be clear without knowledge of the
QEMU internals that a function triggering exceptions should be flagged
as "read globals".

> (2) Helper may write globals.  Implies globals must be synced, and
>     returned to MEM state.
> 
> (3) Helper has no side effects at all.  Implies that it can be deleted if
>     its outputs are dead.

This is basically the flags I have used for the TCG op. That said the
problem with this way of doing for helpers is that most helpers will now
need flags, while it was more or less the contrary up to now.

It's a bit like in GCC, a default function implies nothing, but you can
add more flags latter.

> For the sake of discussion, lets call these READG, WRITEG, NOSIDE.
> 
> Our current default is READG | WRITEG.
> 
> Our current definition of PURE is READG | NOSIDE.  Note that the gcc
> definition of pure actually talks about main memory.
> 
> Our current definition of CONST is none of these bits.  Note that the
> gcc definition of const is a superset of pure.
> 
> What we'd like for all fp helpers is READG only.  Something that we
> cannot get at the moment.

Agreed, as long as the fp env is not mapped as a global.

> There are several cases in which the helper needs to return more than
> 64 bits in which we use a "temp" allocated in the env struct.  Sparc
> does this for some of its 128-bit arithmetic; I'm planning to do just
> the same in my s390x rewrite.  For this, the helper neither reads nor
> writes globals, but it has non-global-register side effects.  There
> will generally be a post-helper load from env to get the "extra" bits.
> While the "true" output of the helper may be dead, the side load from
> env may not be.  So we want none of READG | WRITEG | NOSIDE.
> 
> If we do reorg, we can certainly add compatibility defines so that we
> need not update all translators at once.
> 

Given the default is READG | WRITEG, it would be difficult to do that
unless we negate the flags. Actually NOWRITEG, NOREADG, NOSIDE might be
a good way to go, and is more consistent that by default nothing should
be implied.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 13/13] tcg: rework TCG ops flags
  2012-09-27 19:56   ` Richard Henderson
@ 2012-09-27 20:37     ` Aurelien Jarno
  2012-09-27 22:00       ` Richard Henderson
  0 siblings, 1 reply; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 20:37 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, Sep 27, 2012 at 12:56:11PM -0700, Richard Henderson wrote:
> On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> > - TCG_OPF_CALL_CLOBBER: The op clobber the call registers. They are
> >   freed before emitting the op.
> > - TCG_OPF_SIDE_EFFECTS: The op is not removed if the returned value
> >   if not used. It can trigger exception and thus globals are
> >   synchronized before emitting the op.
> > - TCG_OPF_READ_GLOBALS: The op can read globals but not write them,
> >   and thus globals are synchronized before emitting the op.
> > - TCG_OPF_WRITE_GLOBALS: The op can read and write globals, and thus
> >   globals are saved back to their canonical location before emitting
> >   the op.
> 
> This is more or less exactly the flag breakup I was talking about for calls.
> 
> I don't agree with SIDE_EFFECTS implying exceptions.  How can "br" cause an
> exception?  Or for that matter "st_i32", recalling that we're not storing
> to guest memory.

That's exactly why SIDE_EFFECTS has been removed from this op in the
previous patch.

I think it implies exception, because I don't see why an op shouldn't be
removed otherwise (remember ops without outputs are never removed).

> > Note: While this is restoring the possibility to map a memory address
> > using both a global and accessing it through ld/st, this reduces the
> > performances of targets generating a lot of ld/st op. Given it is 
> > currently technically not allowed, we might instead change the TCG 
> > documentation to make that clear.
> 
> Yes, I think we should.  I can't see that this is something that we
> should *ever* do.
> 

Ok, then the documentation should be fixed.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 13/13] tcg: rework TCG ops flags
  2012-09-27 20:37     ` Aurelien Jarno
@ 2012-09-27 22:00       ` Richard Henderson
  2012-09-27 23:08         ` Aurelien Jarno
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2012-09-27 22:00 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/27/2012 01:37 PM, Aurelien Jarno wrote:
>> > I don't agree with SIDE_EFFECTS implying exceptions.  How can "br" cause an
>> > exception?  Or for that matter "st_i32", recalling that we're not storing
>> > to guest memory.
> That's exactly why SIDE_EFFECTS has been removed from this op in the
> previous patch.

Well, you removed it from br, but not st.

> I think it implies exception, because I don't see why an op shouldn't be
> removed otherwise (remember ops without outputs are never removed).

In which case, because the non-qemu store insns cannot raise exceptions,
there ought to be exactly zero instances of TCG_OPF_SIDE_EFFECTS remaining.
At which point we simply ought to remove it.


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 11/13] tcg: sync globals for pure helpers instead of saving them
  2012-09-27 20:34     ` Aurelien Jarno
@ 2012-09-27 22:02       ` Richard Henderson
  0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2012-09-27 22:02 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/27/2012 01:34 PM, Aurelien Jarno wrote:
>> (1) Helper may read globals.  Sometimes indirectly via exception.
>>     Implies that globals must be synced, but may remain in REG/CONST state.
> 
> I actually do wonder if we shouldn't use a different flag for "may
> trigger exception", even if we end-up handling it the same way as "may
> read globals". I am not sure it would be clear without knowledge of the
> QEMU internals that a function triggering exceptions should be flagged
> as "read globals".

I agree it is better to have a "may trigger exception" flag
if for nothing else besides documentation.

> Given the default is READG | WRITEG, it would be difficult to do that
> unless we negate the flags. Actually NOWRITEG, NOREADG, NOSIDE might be
> a good way to go, and is more consistent that by default nothing should
> be implied.

Sure.


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 07/13] tcg: rewrite tcg_reg_alloc_mov()
  2012-09-27 17:15 ` [Qemu-devel] [PATCH 07/13] tcg: rewrite tcg_reg_alloc_mov() Aurelien Jarno
  2012-09-27 19:09   ` Richard Henderson
@ 2012-09-27 22:18   ` Richard Henderson
  1 sibling, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2012-09-27 22:18 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/27/2012 10:15 AM, Aurelien Jarno wrote:
> +    /* We have to load the value in a register for moving it to another
> +       or for saving it. We assume it's better to keep it there than to
> +       reload it later. */
> +    if ((NEED_SYNC_ARG(0) && ts->val_type != TEMP_VAL_REG)
> +        || ts->val_type == TEMP_VAL_MEM) {
> +        ts->reg = tcg_reg_alloc(s, arg_ct->u.regs, allocated_regs);
> +        if (ts->val_type == TEMP_VAL_MEM) {
> +            tcg_out_ld(s, ts->type, ts->reg, ts->mem_reg, ts->mem_offset);
> +        } else if (ts->val_type == TEMP_VAL_CONST) {
> +            tcg_out_movi(s, ts->type, ts->reg, ts->val);
> +        }
> +        s->reg_to_temp[ts->reg] = args[1];
> +        ts->val_type = TEMP_VAL_REG;
> +        ts->mem_coherent = 1;
> +    }

Ok, I believe I understand what's going on now.  Nothing like trying to
rewrite the function yourself to figure out why you've done things this way.

The only thing I'd change for clarity is that first condition:

    /* If the source value is not in a register, and we're going to be
       forced to have it in a register in order to perform the copy,
       then copy the SOURCE value into its own register first.  That way
       we don't have to reload SOURCE the next time it is used.

       Note that in the CONST + SYNC case, we must for the moment have
       the value in a register because we have no direct access to a
       store constant primitive.  */

    if ((ts->val_type == TEMP_VAL_CONST && NEED_SYNC_ARG(0))
        || ts->val_type == TEMP_VAL_MEM) {



r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 13/13] tcg: rework TCG ops flags
  2012-09-27 22:00       ` Richard Henderson
@ 2012-09-27 23:08         ` Aurelien Jarno
  2012-09-27 23:11           ` Richard Henderson
  0 siblings, 1 reply; 41+ messages in thread
From: Aurelien Jarno @ 2012-09-27 23:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, Sep 27, 2012 at 03:00:15PM -0700, Richard Henderson wrote:
> On 09/27/2012 01:37 PM, Aurelien Jarno wrote:
> >> > I don't agree with SIDE_EFFECTS implying exceptions.  How can "br" cause an
> >> > exception?  Or for that matter "st_i32", recalling that we're not storing
> >> > to guest memory.
> > That's exactly why SIDE_EFFECTS has been removed from this op in the
> > previous patch.
> 
> Well, you removed it from br, but not st.

Oh correct. st doesn't need one as it has zero output arguments, so it
won't be removed.

> > I think it implies exception, because I don't see why an op shouldn't be
> > removed otherwise (remember ops without outputs are never removed).
> 
> In which case, because the non-qemu store insns cannot raise exceptions,
> there ought to be exactly zero instances of TCG_OPF_SIDE_EFFECTS remaining.
> At which point we simply ought to remove it.
> 

I don't understand, the qemu/load store still need to keep this
TCG_OPF_SIDE_EFFECTS. Even a load to a dead output argument might 
trigger a TLB miss exception, and thus should be fixed.

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 13/13] tcg: rework TCG ops flags
  2012-09-27 23:08         ` Aurelien Jarno
@ 2012-09-27 23:11           ` Richard Henderson
  0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2012-09-27 23:11 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 09/27/2012 04:08 PM, Aurelien Jarno wrote:
>> > In which case, because the non-qemu store insns cannot raise exceptions,
>> > there ought to be exactly zero instances of TCG_OPF_SIDE_EFFECTS remaining.
>> > At which point we simply ought to remove it.
>> > 
> I don't understand, the qemu/load store still need to keep this
> TCG_OPF_SIDE_EFFECTS. Even a load to a dead output argument might 
> trigger a TLB miss exception, and thus should be fixed.

Ah, true.


r~

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2012-09-27 23:11 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-27 17:15 [Qemu-devel] [PATCH 00/13] tcg: rework liveness analysis and register allocator Aurelien Jarno
2012-09-27 17:15 ` [Qemu-devel] [PATCH 01/13] tcg: add temp_dead() Aurelien Jarno
2012-09-27 18:19   ` Richard Henderson
2012-09-27 17:15 ` [Qemu-devel] [PATCH 02/13] tcg: add tcg_reg_sync() Aurelien Jarno
2012-09-27 18:24   ` Richard Henderson
2012-09-27 20:00     ` Aurelien Jarno
2012-09-27 17:15 ` [Qemu-devel] [PATCH 03/13] tcg: add temp_sync() Aurelien Jarno
2012-09-27 18:30   ` Richard Henderson
2012-09-27 20:02     ` Aurelien Jarno
2012-09-27 17:15 ` [Qemu-devel] [PATCH 04/13] tcg: sync output arguments on liveness request Aurelien Jarno
2012-09-27 18:39   ` Richard Henderson
2012-09-27 20:05     ` Aurelien Jarno
2012-09-27 20:10       ` Richard Henderson
2012-09-27 20:25         ` Aurelien Jarno
2012-09-27 17:15 ` [Qemu-devel] [PATCH 05/13] tcg: rework liveness analysis Aurelien Jarno
2012-09-27 18:54   ` Richard Henderson
2012-09-27 17:15 ` [Qemu-devel] [PATCH 06/13] tcg: improve tcg_reg_alloc_movi() Aurelien Jarno
2012-09-27 18:55   ` Richard Henderson
2012-09-27 17:15 ` [Qemu-devel] [PATCH 07/13] tcg: rewrite tcg_reg_alloc_mov() Aurelien Jarno
2012-09-27 19:09   ` Richard Henderson
2012-09-27 20:17     ` Aurelien Jarno
2012-09-27 22:18   ` Richard Henderson
2012-09-27 17:15 ` [Qemu-devel] [PATCH 08/13] tcg: always mark dead input arguments as dead Aurelien Jarno
2012-09-27 19:10   ` Richard Henderson
2012-09-27 17:15 ` [Qemu-devel] [PATCH 09/13] tcg: start with local temps in TEMP_VAL_MEM state Aurelien Jarno
2012-09-27 19:10   ` Richard Henderson
2012-09-27 17:15 ` [Qemu-devel] [PATCH 10/13] tcg: don't explicitely save globals and temps Aurelien Jarno
2012-09-27 19:13   ` Richard Henderson
2012-09-27 20:23     ` Aurelien Jarno
2012-09-27 17:15 ` [Qemu-devel] [PATCH 11/13] tcg: sync globals for pure helpers instead of saving them Aurelien Jarno
2012-09-27 19:39   ` Richard Henderson
2012-09-27 20:34     ` Aurelien Jarno
2012-09-27 22:02       ` Richard Henderson
2012-09-27 17:15 ` [Qemu-devel] [PATCH 12/13] tcg: fix some op flags Aurelien Jarno
2012-09-27 19:44   ` Richard Henderson
2012-09-27 17:15 ` [Qemu-devel] [PATCH 13/13] tcg: rework TCG ops flags Aurelien Jarno
2012-09-27 19:56   ` Richard Henderson
2012-09-27 20:37     ` Aurelien Jarno
2012-09-27 22:00       ` Richard Henderson
2012-09-27 23:08         ` Aurelien Jarno
2012-09-27 23:11           ` Richard Henderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.