* [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end
@ 2017-06-21  2:48 Richard Henderson
From: Richard Henderson @ 2017-06-21  2:48 UTC
  To: qemu-devel; +Cc: aurelien

There are two conceptually unrelated cleanups in here, though
the second touches many of the same lines as the first, so
separating the two would be ugly.

The first is to split gen_opparam_buf and move the pieces into
TCGOp.  This has two effects: the operands for an op are in the
same cacheline as the op, and we get to drop the pointer into
gen_opparam_buf, freeing up a register and/or function argument.
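
To make the first change concrete, here is a minimal sketch of the
two layouts.  It is not the QEMU code: OldOp, NewOp, Arg, MAX_OPS,
MAX_ARGS and the emit/accessor helpers are all hypothetical stand-ins
for TCGOp, TCGArg, OPC_BUF_SIZE, MAX_OPC_PARAM and the tcg_gen_opN
functions.  It only shows that the old layout needs a second buffer
and an extra index per op, while the new one reaches the operands
through the op itself.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ARGS 10   /* hypothetical stand-in for MAX_OPC_PARAM */
#define MAX_OPS  8    /* tiny hypothetical stand-in for OPC_BUF_SIZE */

typedef uintptr_t Arg;   /* hypothetical stand-in for TCGArg */

/* Old layout: each op records an index into a shared parameter buffer. */
typedef struct OldOp {
    unsigned opc  : 8;
    unsigned args : 14;          /* index into old_param_buf */
} OldOp;

static OldOp old_op_buf[MAX_OPS];
static Arg old_param_buf[MAX_OPS * MAX_ARGS];
static int old_next_op, old_next_parm;

static void old_emit2(unsigned opc, Arg a1, Arg a2)
{
    int pi = old_next_parm;

    assert(old_next_op < MAX_OPS);
    assert(pi + 2 <= MAX_OPS * MAX_ARGS);
    old_next_parm = pi + 2;
    old_param_buf[pi + 0] = a1;
    old_param_buf[pi + 1] = a2;
    old_op_buf[old_next_op++] = (OldOp){ .opc = opc, .args = pi };
}

/* New layout: the operands live inside the op, next to its header. */
typedef struct NewOp {
    unsigned opc : 8;
    Arg args[MAX_ARGS];
} NewOp;

static NewOp new_op_buf[MAX_OPS];
static int new_next_op;

static void new_emit2(unsigned opc, Arg a1, Arg a2)
{
    NewOp *op;

    assert(new_next_op < MAX_OPS);
    op = &new_op_buf[new_next_op++];
    op->opc = opc;
    op->args[0] = a1;
    op->args[1] = a2;
}

/* Fetch operand n of op i: two buffers and an add in the old layout... */
static Arg old_arg(int i, int n)
{
    return old_param_buf[old_op_buf[i].args + n];
}

/* ...versus a single access through the op in the new layout. */
static Arg new_arg(int i, int n)
{
    return new_op_buf[i].args[n];
}
```

In the new layout a consumer that already holds a pointer to the op
needs nothing else to reach its operands, which is why the separate
pointer into the parameter buffer can go away.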

The second is to change what value is stored in TCGArg for each
TCG temporary.  Rather than store the index into tcg_ctx.temps,
store the pointer to the temp itself.  This allows us to drop
some arithmetic on many uses of a temp within the backend.
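
The difference between the two encodings can be sketched as below.
This is an illustration only: Ctx, Temp, Arg and the four conversion
helpers are hypothetical names, and the signatures of the arg_temp
and temp_arg helpers introduced by this series may well differ.

```c
#include <assert.h>
#include <stdint.h>

#define NB_TEMPS 16          /* hypothetical size of the temps array */

typedef uintptr_t Arg;       /* hypothetical stand-in for TCGArg */

typedef struct Temp {        /* hypothetical stand-in for TCGTemp */
    int reg;
    long val;
} Temp;

typedef struct Ctx {         /* hypothetical stand-in for TCGContext */
    int nb_temps;
    Temp temps[NB_TEMPS];
} Ctx;

/* Index encoding: the Arg is the temp's position in ctx->temps. */
static Arg idx_temp_arg(Ctx *s, Temp *ts)
{
    return (Arg)(ts - s->temps);
}

/* Every use must scale the index and add the array base. */
static Temp *idx_arg_temp(Ctx *s, Arg a)
{
    assert(a < (Arg)NB_TEMPS);
    return &s->temps[a];
}

/* Pointer encoding: the Arg is the address of the temp itself. */
static Arg ptr_temp_arg(Temp *ts)
{
    return (Arg)(uintptr_t)ts;
}

/* Every use is a plain cast; no context pointer or arithmetic needed. */
static Temp *ptr_arg_temp(Arg a)
{
    return (Temp *)(uintptr_t)a;
}
```

Funnelling every conversion through a small set of helpers like these
is also what makes the switch auditable: the encoding can change in
one place rather than at each use.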

Making that second change is tricky, as we don't want to miss any
of the places that ought to be changed.  To do that I introduce a
number of helpers.

As a final step I changed the type of TCGOp.args to a structure,
and annotated the places that access constant arguments.  I found
that final patch to be really ugly, so I dropped it.  But I'm
fairly confident that I've updated all of the non-constant args.

The effect of this is nearly noise, but it does reduce code size:

   text	   data	    bss	    dec	    hex	filename
6648688	2106408	4486112	13241208 ca0b78	qemu-system-alpha (before)
6627656	2106408	4502496	13236560 c9f950	qemu-system-alpha (after)

or about 21k of text (21032 bytes).  bss grows by 16384 bytes, so
the total shrinks by 4648 bytes.


r~


Richard Henderson (16):
  tcg: Merge opcode arguments into TCGOp
  tcg: Propagate args to op->args in optimizer
  tcg: Propagate args to op->args in tcg.c
  tcg: Propagate TCGOp down to allocators
  tcg: Introduce arg_temp
  tcg: Add temp_global bit to TCGTemp
  tcg: Return NULL temp for TCG_CALL_DUMMY_ARG
  tcg: Introduce temp_arg
  tcg: Use per-temp state data in liveness
  tcg: Avoid loops against variable bounds
  tcg: Change temp_allocate_frame arg to TCGTemp
  tcg: Remove unused TCG_CALL_DUMMY_TCGV
  tcg: Export temp_idx
  tcg: Use per-temp state data in optimize
  tcg: Define separate structures for TCGv_*
  tcg: Store pointers to temporaries directly in TCGArg

 tcg/optimize.c | 647 ++++++++++++++++++++++++++++++++-------------------------
 tcg/tcg-op.c   |  99 ++++-----
 tcg/tcg.c      | 610 ++++++++++++++++++++++++-----------------------------
 tcg/tcg.h      | 287 ++++++++++++++-----------
 4 files changed, 841 insertions(+), 802 deletions(-)

-- 
2.9.4


* [Qemu-devel] [PATCH 01/16] tcg: Merge opcode arguments into TCGOp
@ 2017-06-21  2:48 ` Richard Henderson
From: Richard Henderson @ 2017-06-21  2:48 UTC
  To: qemu-devel; +Cc: aurelien

Rather than have a separate buffer of 10*max_ops entries,
give each opcode 10 entries.  The result is actually a bit
smaller and should have slightly more cache locality.
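
One consequence of embedding the arguments: a whole-struct assignment
to an op would now also write all 10 argument slots.  In the diff
below, tcg_emit_op instead clears only the header, using
memset(op, 0, offsetof(TCGOp, args)), presumably to avoid that extra
work.  A minimal sketch of the idiom, assuming a simplified,
hypothetical Op type without bitfields:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ARGS 10              /* hypothetical stand-in for MAX_OPC_PARAM */

typedef struct Op {              /* hypothetical, simplified TCGOp */
    uint8_t   opc;
    uint16_t  prev, next;
    uint16_t  life;
    uintptr_t args[MAX_ARGS];    /* must stay the last member */
} Op;

/* Zero every byte that precedes args[], then fill in the live fields.
   The args[] array is deliberately left untouched; the caller stores
   only the operands that the opcode actually uses.  */
static void op_init_header(Op *op, uint8_t opc, uint16_t prev, uint16_t next)
{
    memset(op, 0, offsetof(Op, args));
    op->opc = opc;
    op->prev = prev;
    op->next = next;
}
```

The idiom relies on args being the final member, so that everything
before it is header.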

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c |  6 ++--
 tcg/tcg-op.c   | 99 +++++++++++++++++++++-------------------------------------
 tcg/tcg.c      | 98 ++++++++++++++++++++++++++-------------------------------
 tcg/tcg.h      | 33 +++++++-------------
 4 files changed, 94 insertions(+), 142 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index adfc56c..002aad6 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -576,7 +576,7 @@ void tcg_optimize(TCGContext *s)
         TCGArg tmp;
 
         TCGOp * const op = &s->gen_op_buf[oi];
-        TCGArg * const args = &s->gen_opparam_buf[op->args];
+        TCGArg * const args = op->args;
         TCGOpcode opc = op->opc;
         const TCGOpDef *def = &tcg_op_defs[opc];
 
@@ -1184,7 +1184,7 @@ void tcg_optimize(TCGContext *s)
                 uint64_t b = ((uint64_t)bh << 32) | bl;
                 TCGArg rl, rh;
                 TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32, 2);
-                TCGArg *args2 = &s->gen_opparam_buf[op2->args];
+                TCGArg *args2 = op2->args;
 
                 if (opc == INDEX_op_add2_i32) {
                     a += b;
@@ -1210,7 +1210,7 @@ void tcg_optimize(TCGContext *s)
                 uint64_t r = (uint64_t)a * b;
                 TCGArg rl, rh;
                 TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32, 2);
-                TCGArg *args2 = &s->gen_opparam_buf[op2->args];
+                TCGArg *args2 = op2->args;
 
                 rl = args[0];
                 rh = args[1];
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 87f673e..3a627c1 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -45,107 +45,78 @@ extern TCGv_i32 TCGV_HIGH_link_error(TCGv_i64);
    Up to and including filling in the forward link immediately.  We'll do
    proper termination of the end of the list after we finish translation.  */
 
-static void tcg_emit_op(TCGContext *ctx, TCGOpcode opc, int args)
+static inline TCGOp *tcg_emit_op(TCGContext *ctx, TCGOpcode opc)
 {
     int oi = ctx->gen_next_op_idx;
     int ni = oi + 1;
     int pi = oi - 1;
+    TCGOp *op = &ctx->gen_op_buf[oi];
 
     tcg_debug_assert(oi < OPC_BUF_SIZE);
     ctx->gen_op_buf[0].prev = oi;
     ctx->gen_next_op_idx = ni;
 
-    ctx->gen_op_buf[oi] = (TCGOp){
-        .opc = opc,
-        .args = args,
-        .prev = pi,
-        .next = ni
-    };
+    memset(op, 0, offsetof(TCGOp, args));
+    op->opc = opc;
+    op->prev = pi;
+    op->next = ni;
+
+    return op;
 }
 
 void tcg_gen_op1(TCGContext *ctx, TCGOpcode opc, TCGArg a1)
 {
-    int pi = ctx->gen_next_parm_idx;
-
-    tcg_debug_assert(pi + 1 <= OPPARAM_BUF_SIZE);
-    ctx->gen_next_parm_idx = pi + 1;
-    ctx->gen_opparam_buf[pi] = a1;
-
-    tcg_emit_op(ctx, opc, pi);
+    TCGOp *op = tcg_emit_op(ctx, opc);
+    op->args[0] = a1;
 }
 
 void tcg_gen_op2(TCGContext *ctx, TCGOpcode opc, TCGArg a1, TCGArg a2)
 {
-    int pi = ctx->gen_next_parm_idx;
-
-    tcg_debug_assert(pi + 2 <= OPPARAM_BUF_SIZE);
-    ctx->gen_next_parm_idx = pi + 2;
-    ctx->gen_opparam_buf[pi + 0] = a1;
-    ctx->gen_opparam_buf[pi + 1] = a2;
-
-    tcg_emit_op(ctx, opc, pi);
+    TCGOp *op = tcg_emit_op(ctx, opc);
+    op->args[0] = a1;
+    op->args[1] = a2;
 }
 
 void tcg_gen_op3(TCGContext *ctx, TCGOpcode opc, TCGArg a1,
                  TCGArg a2, TCGArg a3)
 {
-    int pi = ctx->gen_next_parm_idx;
-
-    tcg_debug_assert(pi + 3 <= OPPARAM_BUF_SIZE);
-    ctx->gen_next_parm_idx = pi + 3;
-    ctx->gen_opparam_buf[pi + 0] = a1;
-    ctx->gen_opparam_buf[pi + 1] = a2;
-    ctx->gen_opparam_buf[pi + 2] = a3;
-
-    tcg_emit_op(ctx, opc, pi);
+    TCGOp *op = tcg_emit_op(ctx, opc);
+    op->args[0] = a1;
+    op->args[1] = a2;
+    op->args[2] = a3;
 }
 
 void tcg_gen_op4(TCGContext *ctx, TCGOpcode opc, TCGArg a1,
                  TCGArg a2, TCGArg a3, TCGArg a4)
 {
-    int pi = ctx->gen_next_parm_idx;
-
-    tcg_debug_assert(pi + 4 <= OPPARAM_BUF_SIZE);
-    ctx->gen_next_parm_idx = pi + 4;
-    ctx->gen_opparam_buf[pi + 0] = a1;
-    ctx->gen_opparam_buf[pi + 1] = a2;
-    ctx->gen_opparam_buf[pi + 2] = a3;
-    ctx->gen_opparam_buf[pi + 3] = a4;
-
-    tcg_emit_op(ctx, opc, pi);
+    TCGOp *op = tcg_emit_op(ctx, opc);
+    op->args[0] = a1;
+    op->args[1] = a2;
+    op->args[2] = a3;
+    op->args[3] = a4;
 }
 
 void tcg_gen_op5(TCGContext *ctx, TCGOpcode opc, TCGArg a1,
                  TCGArg a2, TCGArg a3, TCGArg a4, TCGArg a5)
 {
-    int pi = ctx->gen_next_parm_idx;
-
-    tcg_debug_assert(pi + 5 <= OPPARAM_BUF_SIZE);
-    ctx->gen_next_parm_idx = pi + 5;
-    ctx->gen_opparam_buf[pi + 0] = a1;
-    ctx->gen_opparam_buf[pi + 1] = a2;
-    ctx->gen_opparam_buf[pi + 2] = a3;
-    ctx->gen_opparam_buf[pi + 3] = a4;
-    ctx->gen_opparam_buf[pi + 4] = a5;
-
-    tcg_emit_op(ctx, opc, pi);
+    TCGOp *op = tcg_emit_op(ctx, opc);
+    op->args[0] = a1;
+    op->args[1] = a2;
+    op->args[2] = a3;
+    op->args[3] = a4;
+    op->args[4] = a5;
 }
 
 void tcg_gen_op6(TCGContext *ctx, TCGOpcode opc, TCGArg a1, TCGArg a2,
                  TCGArg a3, TCGArg a4, TCGArg a5, TCGArg a6)
 {
-    int pi = ctx->gen_next_parm_idx;
-
-    tcg_debug_assert(pi + 6 <= OPPARAM_BUF_SIZE);
-    ctx->gen_next_parm_idx = pi + 6;
-    ctx->gen_opparam_buf[pi + 0] = a1;
-    ctx->gen_opparam_buf[pi + 1] = a2;
-    ctx->gen_opparam_buf[pi + 2] = a3;
-    ctx->gen_opparam_buf[pi + 3] = a4;
-    ctx->gen_opparam_buf[pi + 4] = a5;
-    ctx->gen_opparam_buf[pi + 5] = a6;
-
-    tcg_emit_op(ctx, opc, pi);
+    TCGOp *op = tcg_emit_op(ctx, opc);
+    op->args[0] = a1;
+    op->args[1] = a2;
+    op->args[2] = a3;
+    op->args[3] = a4;
+    op->args[4] = a5;
+    op->args[5] = a6;
 }
 
 void tcg_gen_mb(TCGBar mb_type)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 3559829..298aa0c 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -469,7 +469,6 @@ void tcg_func_start(TCGContext *s)
     s->gen_op_buf[0].next = 1;
     s->gen_op_buf[0].prev = 0;
     s->gen_next_op_idx = 1;
-    s->gen_next_parm_idx = 0;
 
     s->be = tcg_malloc(sizeof(TCGBackendData));
 }
@@ -757,9 +756,10 @@ int tcg_check_temp_count(void)
 void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
                    int nargs, TCGArg *args)
 {
-    int i, real_args, nb_rets, pi, pi_first;
+    int i, real_args, nb_rets, pi;
     unsigned sizemask, flags;
     TCGHelperInfo *info;
+    TCGOp *op;
 
     info = g_hash_table_lookup(s->helpers, (gpointer)func);
     flags = info->flags;
@@ -772,11 +772,11 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
     int orig_sizemask = sizemask;
     int orig_nargs = nargs;
     TCGv_i64 retl, reth;
+    TCGArg split_args[MAX_OPC_PARAM];
 
     TCGV_UNUSED_I64(retl);
     TCGV_UNUSED_I64(reth);
     if (sizemask != 0) {
-        TCGArg *split_args = __builtin_alloca(sizeof(TCGArg) * nargs * 2);
         for (i = real_args = 0; i < nargs; ++i) {
             int is_64bit = sizemask & (1 << (i+1)*2);
             if (is_64bit) {
@@ -811,7 +811,19 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
     }
 #endif /* TCG_TARGET_EXTEND_ARGS */
 
-    pi_first = pi = s->gen_next_parm_idx;
+    i = s->gen_next_op_idx;
+    tcg_debug_assert(i < OPC_BUF_SIZE);
+    s->gen_op_buf[0].prev = i;
+    s->gen_next_op_idx = i + 1;
+    op = &s->gen_op_buf[i];
+
+    /* Set links for sequential allocation during translation.  */
+    memset(op, 0, offsetof(TCGOp, args));
+    op->opc = INDEX_op_call;
+    op->prev = i - 1;
+    op->next = i + 1;
+
+    pi = 0;
     if (ret != TCG_CALL_DUMMY_ARG) {
 #if defined(__sparc__) && !defined(__arch64__) \
     && !defined(CONFIG_TCG_INTERPRETER)
@@ -821,31 +833,33 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
                two return temporaries, and reassemble below.  */
             retl = tcg_temp_new_i64();
             reth = tcg_temp_new_i64();
-            s->gen_opparam_buf[pi++] = GET_TCGV_I64(reth);
-            s->gen_opparam_buf[pi++] = GET_TCGV_I64(retl);
+            op->args[pi++] = GET_TCGV_I64(reth);
+            op->args[pi++] = GET_TCGV_I64(retl);
             nb_rets = 2;
         } else {
-            s->gen_opparam_buf[pi++] = ret;
+            op->args[pi++] = ret;
             nb_rets = 1;
         }
 #else
         if (TCG_TARGET_REG_BITS < 64 && (sizemask & 1)) {
 #ifdef HOST_WORDS_BIGENDIAN
-            s->gen_opparam_buf[pi++] = ret + 1;
-            s->gen_opparam_buf[pi++] = ret;
+            op->args[pi++] = ret + 1;
+            op->args[pi++] = ret;
 #else
-            s->gen_opparam_buf[pi++] = ret;
-            s->gen_opparam_buf[pi++] = ret + 1;
+            op->args[pi++] = ret;
+            op->args[pi++] = ret + 1;
 #endif
             nb_rets = 2;
         } else {
-            s->gen_opparam_buf[pi++] = ret;
+            op->args[pi++] = ret;
             nb_rets = 1;
         }
 #endif
     } else {
         nb_rets = 0;
     }
+    op->callo = nb_rets;
+
     real_args = 0;
     for (i = 0; i < nargs; i++) {
         int is_64bit = sizemask & (1 << (i+1)*2);
@@ -853,7 +867,7 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
 #ifdef TCG_TARGET_CALL_ALIGN_ARGS
             /* some targets want aligned 64 bit args */
             if (real_args & 1) {
-                s->gen_opparam_buf[pi++] = TCG_CALL_DUMMY_ARG;
+                op->args[pi++] = TCG_CALL_DUMMY_ARG;
                 real_args++;
             }
 #endif
@@ -868,42 +882,26 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
               have to get more complicated to differentiate between
               stack arguments and register arguments.  */
 #if defined(HOST_WORDS_BIGENDIAN) != defined(TCG_TARGET_STACK_GROWSUP)
-            s->gen_opparam_buf[pi++] = args[i] + 1;
-            s->gen_opparam_buf[pi++] = args[i];
+            op->args[pi++] = args[i] + 1;
+            op->args[pi++] = args[i];
 #else
-            s->gen_opparam_buf[pi++] = args[i];
-            s->gen_opparam_buf[pi++] = args[i] + 1;
+            op->args[pi++] = args[i];
+            op->args[pi++] = args[i] + 1;
 #endif
             real_args += 2;
             continue;
         }
 
-        s->gen_opparam_buf[pi++] = args[i];
+        op->args[pi++] = args[i];
         real_args++;
     }
-    s->gen_opparam_buf[pi++] = (uintptr_t)func;
-    s->gen_opparam_buf[pi++] = flags;
+    op->args[pi++] = (uintptr_t)func;
+    op->args[pi++] = flags;
+    op->calli = real_args;
 
-    i = s->gen_next_op_idx;
-    tcg_debug_assert(i < OPC_BUF_SIZE);
-    tcg_debug_assert(pi <= OPPARAM_BUF_SIZE);
-
-    /* Set links for sequential allocation during translation.  */
-    s->gen_op_buf[i] = (TCGOp){
-        .opc = INDEX_op_call,
-        .callo = nb_rets,
-        .calli = real_args,
-        .args = pi_first,
-        .prev = i - 1,
-        .next = i + 1
-    };
-
-    /* Make sure the calli field didn't overflow.  */
-    tcg_debug_assert(s->gen_op_buf[i].calli == real_args);
-
-    s->gen_op_buf[0].prev = i;
-    s->gen_next_op_idx = i + 1;
-    s->gen_next_parm_idx = pi;
+    /* Make sure the fields didn't overflow.  */
+    tcg_debug_assert(op->calli == real_args);
+    tcg_debug_assert(pi <= ARRAY_SIZE(op->args));
 
 #if defined(__sparc__) && !defined(__arch64__) \
     && !defined(CONFIG_TCG_INTERPRETER)
@@ -1063,7 +1061,7 @@ void tcg_dump_ops(TCGContext *s)
         op = &s->gen_op_buf[oi];
         c = op->opc;
         def = &tcg_op_defs[c];
-        args = &s->gen_opparam_buf[op->args];
+        args = op->args;
 
         if (c == INDEX_op_insn_start) {
             col += qemu_log("%s ----", oi != s->gen_op_buf[0].next ? "\n" : "");
@@ -1347,20 +1345,16 @@ TCGOp *tcg_op_insert_before(TCGContext *s, TCGOp *old_op,
                             TCGOpcode opc, int nargs)
 {
     int oi = s->gen_next_op_idx;
-    int pi = s->gen_next_parm_idx;
     int prev = old_op->prev;
     int next = old_op - s->gen_op_buf;
     TCGOp *new_op;
 
     tcg_debug_assert(oi < OPC_BUF_SIZE);
-    tcg_debug_assert(pi + nargs <= OPPARAM_BUF_SIZE);
     s->gen_next_op_idx = oi + 1;
-    s->gen_next_parm_idx = pi + nargs;
 
     new_op = &s->gen_op_buf[oi];
     *new_op = (TCGOp){
         .opc = opc,
-        .args = pi,
         .prev = prev,
         .next = next
     };
@@ -1374,20 +1368,16 @@ TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *old_op,
                            TCGOpcode opc, int nargs)
 {
     int oi = s->gen_next_op_idx;
-    int pi = s->gen_next_parm_idx;
     int prev = old_op - s->gen_op_buf;
     int next = old_op->next;
     TCGOp *new_op;
 
     tcg_debug_assert(oi < OPC_BUF_SIZE);
-    tcg_debug_assert(pi + nargs <= OPPARAM_BUF_SIZE);
     s->gen_next_op_idx = oi + 1;
-    s->gen_next_parm_idx = pi + nargs;
 
     new_op = &s->gen_op_buf[oi];
     *new_op = (TCGOp){
         .opc = opc,
-        .args = pi,
         .prev = prev,
         .next = next
     };
@@ -1443,7 +1433,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
         TCGArg arg;
 
         TCGOp * const op = &s->gen_op_buf[oi];
-        TCGArg * const args = &s->gen_opparam_buf[op->args];
+        TCGArg * const args = op->args;
         TCGOpcode opc = op->opc;
         const TCGOpDef *def = &tcg_op_defs[opc];
 
@@ -1681,7 +1671,7 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
 
     for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
         TCGOp *op = &s->gen_op_buf[oi];
-        TCGArg *args = &s->gen_opparam_buf[op->args];
+        TCGArg *args = op->args;
         TCGOpcode opc = op->opc;
         const TCGOpDef *def = &tcg_op_defs[opc];
         TCGLifeData arg_life = op->life;
@@ -1724,7 +1714,7 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
                                       ? INDEX_op_ld_i32
                                       : INDEX_op_ld_i64);
                     TCGOp *lop = tcg_op_insert_before(s, op, lopc, 3);
-                    TCGArg *largs = &s->gen_opparam_buf[lop->args];
+                    TCGArg *largs = lop->args;
 
                     largs[0] = dir;
                     largs[1] = temp_idx(s, its->mem_base);
@@ -1796,7 +1786,7 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
                                   ? INDEX_op_st_i32
                                   : INDEX_op_st_i64);
                 TCGOp *sop = tcg_op_insert_after(s, op, sopc, 3);
-                TCGArg *sargs = &s->gen_opparam_buf[sop->args];
+                TCGArg *sargs = sop->args;
 
                 sargs[0] = dir;
                 sargs[1] = temp_idx(s, its->mem_base);
@@ -2624,7 +2614,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     num_insns = -1;
     for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
         TCGOp * const op = &s->gen_op_buf[oi];
-        TCGArg * const args = &s->gen_opparam_buf[op->args];
+        TCGArg * const args = op->args;
         TCGOpcode opc = op->opc;
         const TCGOpDef *def = &tcg_op_defs[opc];
         TCGLifeData arg_life = op->life;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 9e37722..720e04e 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -51,8 +51,6 @@
 #define OPC_BUF_SIZE 640
 #define OPC_MAX_SIZE (OPC_BUF_SIZE - MAX_OP_PER_INSTR)
 
-#define OPPARAM_BUF_SIZE (OPC_BUF_SIZE * MAX_OPC_PARAM)
-
 #define CPU_TEMP_BUF_NLONGS 128
 
 /* Default target word size to pointer size.  */
@@ -613,33 +611,29 @@ typedef struct TCGTempSet {
 #define SYNC_ARG  1
 typedef uint16_t TCGLifeData;
 
-/* The layout here is designed to avoid crossing of a 32-bit boundary.
-   If we do so, gcc adds padding, expanding the size to 12.  */
+/* The layout here is designed to avoid crossing of a 32-bit boundary.  */
 typedef struct TCGOp {
     TCGOpcode opc   : 8;        /*  8 */
 
-    /* Index of the prev/next op, or 0 for the end of the list.  */
-    unsigned prev   : 10;       /* 18 */
-    unsigned next   : 10;       /* 28 */
-
     /* The number of out and in parameter for a call.  */
-    unsigned calli  : 4;        /* 32 */
-    unsigned callo  : 2;        /* 34 */
+    unsigned calli  : 4;        /* 12 */
+    unsigned callo  : 2;        /* 14 */
+    unsigned        : 2;        /* 16 */
 
-    /* Index of the arguments for this op, or 0 for zero-operand ops.  */
-    unsigned args   : 14;       /* 48 */
+    /* Index of the prev/next op, or 0 for the end of the list.  */
+    unsigned prev   : 16;       /* 32 */
+    unsigned next   : 16;       /* 48 */
 
     /* Lifetime data of the operands.  */
     unsigned life   : 16;       /* 64 */
+
+    /* Arguments for the opcode.  */
+    TCGArg args[MAX_OPC_PARAM];
 } TCGOp;
 
 /* Make sure operands fit in the bitfields above.  */
 QEMU_BUILD_BUG_ON(NB_OPS > (1 << 8));
-QEMU_BUILD_BUG_ON(OPC_BUF_SIZE > (1 << 10));
-QEMU_BUILD_BUG_ON(OPPARAM_BUF_SIZE > (1 << 14));
-
-/* Make sure that we don't overflow 64 bits without noticing.  */
-QEMU_BUILD_BUG_ON(sizeof(TCGOp) > 8);
+QEMU_BUILD_BUG_ON(OPC_BUF_SIZE > (1 << 16));
 
 struct TCGContext {
     uint8_t *pool_cur, *pool_end;
@@ -691,7 +685,6 @@ struct TCGContext {
 #endif
 
     int gen_next_op_idx;
-    int gen_next_parm_idx;
 
     /* Code generation.  Note that we specifically do not use tcg_insn_unit
        here, because there's too much arithmetic throughout that relies
@@ -723,7 +716,6 @@ struct TCGContext {
     TCGTemp *reg_to_temp[TCG_TARGET_NB_REGS];
 
     TCGOp gen_op_buf[OPC_BUF_SIZE];
-    TCGArg gen_opparam_buf[OPPARAM_BUF_SIZE];
 
     uint16_t gen_insn_end_off[TCG_MAX_INSNS];
     target_ulong gen_insn_data[TCG_MAX_INSNS][TARGET_INSN_START_WORDS];
@@ -734,8 +726,7 @@ extern bool parallel_cpus;
 
 static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)
 {
-    int op_argi = tcg_ctx.gen_op_buf[op_idx].args;
-    tcg_ctx.gen_opparam_buf[op_argi + arg] = v;
+    tcg_ctx.gen_op_buf[op_idx].args[arg] = v;
 }
 
 /* The number of opcodes emitted so far.  */
-- 
2.9.4


* [Qemu-devel] [PATCH 02/16] tcg: Propagate args to op->args in optimizer
@ 2017-06-21  2:48 ` Richard Henderson
From: Richard Henderson @ 2017-06-21  2:48 UTC
  To: qemu-devel; +Cc: aurelien

With the arguments now stored in TCGOp, drop the local args pointer
in tcg_optimize and the args parameter of tcg_opt_gen_movi and
tcg_opt_gen_mov, and access op->args directly.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 430 ++++++++++++++++++++++++++++++---------------------------
 1 file changed, 227 insertions(+), 203 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 002aad6..1a1c6fb 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -166,8 +166,7 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
     return false;
 }
 
-static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
-                             TCGArg dst, TCGArg val)
+static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg val)
 {
     TCGOpcode new_op = op_to_movi(op->opc);
     tcg_target_ulong mask;
@@ -184,12 +183,11 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
     }
     temps[dst].mask = mask;
 
-    args[0] = dst;
-    args[1] = val;
+    op->args[0] = dst;
+    op->args[1] = val;
 }
 
-static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
-                            TCGArg dst, TCGArg src)
+static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
 {
     if (temps_are_copies(dst, src)) {
         tcg_op_remove(s, op);
@@ -218,8 +216,8 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
         temps[dst].val = temps[src].val;
     }
 
-    args[0] = dst;
-    args[1] = src;
+    op->args[0] = dst;
+    op->args[1] = src;
 }
 
 static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
@@ -559,7 +557,7 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 void tcg_optimize(TCGContext *s)
 {
     int oi, oi_next, nb_temps, nb_globals;
-    TCGArg *prev_mb_args = NULL;
+    TCGOp *prev_mb = NULL;
 
     /* Array VALS has an element for each temp.
        If this temp holds a constant then its value is kept in VALS' element.
@@ -576,7 +574,6 @@ void tcg_optimize(TCGContext *s)
         TCGArg tmp;
 
         TCGOp * const op = &s->gen_op_buf[oi];
-        TCGArg * const args = op->args;
         TCGOpcode opc = op->opc;
         const TCGOpDef *def = &tcg_op_defs[opc];
 
@@ -588,7 +585,7 @@ void tcg_optimize(TCGContext *s)
             nb_oargs = op->callo;
             nb_iargs = op->calli;
             for (i = 0; i < nb_oargs + nb_iargs; i++) {
-                tmp = args[i];
+                tmp = op->args[i];
                 if (tmp != TCG_CALL_DUMMY_ARG) {
                     init_temp_info(tmp);
                 }
@@ -597,14 +594,14 @@ void tcg_optimize(TCGContext *s)
             nb_oargs = def->nb_oargs;
             nb_iargs = def->nb_iargs;
             for (i = 0; i < nb_oargs + nb_iargs; i++) {
-                init_temp_info(args[i]);
+                init_temp_info(op->args[i]);
             }
         }
 
         /* Do copy propagation */
         for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-            if (temp_is_copy(args[i])) {
-                args[i] = find_better_copy(s, args[i]);
+            if (temp_is_copy(op->args[i])) {
+                op->args[i] = find_better_copy(s, op->args[i]);
             }
         }
 
@@ -620,45 +617,45 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(nor):
         CASE_OP_32_64(muluh):
         CASE_OP_32_64(mulsh):
-            swap_commutative(args[0], &args[1], &args[2]);
+            swap_commutative(op->args[0], &op->args[1], &op->args[2]);
             break;
         CASE_OP_32_64(brcond):
-            if (swap_commutative(-1, &args[0], &args[1])) {
-                args[2] = tcg_swap_cond(args[2]);
+            if (swap_commutative(-1, &op->args[0], &op->args[1])) {
+                op->args[2] = tcg_swap_cond(op->args[2]);
             }
             break;
         CASE_OP_32_64(setcond):
-            if (swap_commutative(args[0], &args[1], &args[2])) {
-                args[3] = tcg_swap_cond(args[3]);
+            if (swap_commutative(op->args[0], &op->args[1], &op->args[2])) {
+                op->args[3] = tcg_swap_cond(op->args[3]);
             }
             break;
         CASE_OP_32_64(movcond):
-            if (swap_commutative(-1, &args[1], &args[2])) {
-                args[5] = tcg_swap_cond(args[5]);
+            if (swap_commutative(-1, &op->args[1], &op->args[2])) {
+                op->args[5] = tcg_swap_cond(op->args[5]);
             }
             /* For movcond, we canonicalize the "false" input reg to match
                the destination reg so that the tcg backend can implement
                a "move if true" operation.  */
-            if (swap_commutative(args[0], &args[4], &args[3])) {
-                args[5] = tcg_invert_cond(args[5]);
+            if (swap_commutative(op->args[0], &op->args[4], &op->args[3])) {
+                op->args[5] = tcg_invert_cond(op->args[5]);
             }
             break;
         CASE_OP_32_64(add2):
-            swap_commutative(args[0], &args[2], &args[4]);
-            swap_commutative(args[1], &args[3], &args[5]);
+            swap_commutative(op->args[0], &op->args[2], &op->args[4]);
+            swap_commutative(op->args[1], &op->args[3], &op->args[5]);
             break;
         CASE_OP_32_64(mulu2):
         CASE_OP_32_64(muls2):
-            swap_commutative(args[0], &args[2], &args[3]);
+            swap_commutative(op->args[0], &op->args[2], &op->args[3]);
             break;
         case INDEX_op_brcond2_i32:
-            if (swap_commutative2(&args[0], &args[2])) {
-                args[4] = tcg_swap_cond(args[4]);
+            if (swap_commutative2(&op->args[0], &op->args[2])) {
+                op->args[4] = tcg_swap_cond(op->args[4]);
             }
             break;
         case INDEX_op_setcond2_i32:
-            if (swap_commutative2(&args[1], &args[3])) {
-                args[5] = tcg_swap_cond(args[5]);
+            if (swap_commutative2(&op->args[1], &op->args[3])) {
+                op->args[5] = tcg_swap_cond(op->args[5]);
             }
             break;
         default:
@@ -673,8 +670,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(sar):
         CASE_OP_32_64(rotl):
         CASE_OP_32_64(rotr):
-            if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
-                tcg_opt_gen_movi(s, op, args, args[0], 0);
+            if (temp_is_const(op->args[1]) && temps[op->args[1]].val == 0) {
+                tcg_opt_gen_movi(s, op, op->args[0], 0);
                 continue;
             }
             break;
@@ -683,7 +680,7 @@ void tcg_optimize(TCGContext *s)
                 TCGOpcode neg_op;
                 bool have_neg;
 
-                if (temp_is_const(args[2])) {
+                if (temp_is_const(op->args[2])) {
                     /* Proceed with possible constant folding. */
                     break;
                 }
@@ -697,40 +694,45 @@ void tcg_optimize(TCGContext *s)
                 if (!have_neg) {
                     break;
                 }
-                if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
+                if (temp_is_const(op->args[1])
+                    && temps[op->args[1]].val == 0) {
                     op->opc = neg_op;
-                    reset_temp(args[0]);
-                    args[1] = args[2];
+                    reset_temp(op->args[0]);
+                    op->args[1] = op->args[2];
                     continue;
                 }
             }
             break;
         CASE_OP_32_64(xor):
         CASE_OP_32_64(nand):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
+            if (!temp_is_const(op->args[1])
+                && temp_is_const(op->args[2])
+                && temps[op->args[2]].val == -1) {
                 i = 1;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(nor):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
+            if (!temp_is_const(op->args[1])
+                && temp_is_const(op->args[2])
+                && temps[op->args[2]].val == 0) {
                 i = 1;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(andc):
-            if (!temp_is_const(args[2])
-                && temp_is_const(args[1]) && temps[args[1]].val == -1) {
+            if (!temp_is_const(op->args[2])
+                && temp_is_const(op->args[1])
+                && temps[op->args[1]].val == -1) {
                 i = 2;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(orc):
         CASE_OP_32_64(eqv):
-            if (!temp_is_const(args[2])
-                && temp_is_const(args[1]) && temps[args[1]].val == 0) {
+            if (!temp_is_const(op->args[2])
+                && temp_is_const(op->args[1])
+                && temps[op->args[1]].val == 0) {
                 i = 2;
                 goto try_not;
             }
@@ -751,8 +753,8 @@ void tcg_optimize(TCGContext *s)
                     break;
                 }
                 op->opc = not_op;
-                reset_temp(args[0]);
-                args[1] = args[i];
+                reset_temp(op->args[0]);
+                op->args[1] = op->args[i];
                 continue;
             }
         default:
@@ -771,18 +773,20 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(or):
         CASE_OP_32_64(xor):
         CASE_OP_32_64(andc):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            if (!temp_is_const(op->args[1])
+                && temp_is_const(op->args[2])
+                && temps[op->args[2]].val == 0) {
+                tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
                 continue;
             }
             break;
         CASE_OP_32_64(and):
         CASE_OP_32_64(orc):
         CASE_OP_32_64(eqv):
-            if (!temp_is_const(args[1])
-                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            if (!temp_is_const(op->args[1])
+                && temp_is_const(op->args[2])
+                && temps[op->args[2]].val == -1) {
+                tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
                 continue;
             }
             break;
@@ -796,21 +800,21 @@ void tcg_optimize(TCGContext *s)
         affected = -1;
         switch (opc) {
         CASE_OP_32_64(ext8s):
-            if ((temps[args[1]].mask & 0x80) != 0) {
+            if ((temps[op->args[1]].mask & 0x80) != 0) {
                 break;
             }
         CASE_OP_32_64(ext8u):
             mask = 0xff;
             goto and_const;
         CASE_OP_32_64(ext16s):
-            if ((temps[args[1]].mask & 0x8000) != 0) {
+            if ((temps[op->args[1]].mask & 0x8000) != 0) {
                 break;
             }
         CASE_OP_32_64(ext16u):
             mask = 0xffff;
             goto and_const;
         case INDEX_op_ext32s_i64:
-            if ((temps[args[1]].mask & 0x80000000) != 0) {
+            if ((temps[op->args[1]].mask & 0x80000000) != 0) {
                 break;
             }
         case INDEX_op_ext32u_i64:
@@ -818,110 +822,111 @@ void tcg_optimize(TCGContext *s)
             goto and_const;
 
         CASE_OP_32_64(and):
-            mask = temps[args[2]].mask;
-            if (temp_is_const(args[2])) {
+            mask = temps[op->args[2]].mask;
+            if (temp_is_const(op->args[2])) {
         and_const:
-                affected = temps[args[1]].mask & ~mask;
+                affected = temps[op->args[1]].mask & ~mask;
             }
-            mask = temps[args[1]].mask & mask;
+            mask = temps[op->args[1]].mask & mask;
             break;
 
         case INDEX_op_ext_i32_i64:
-            if ((temps[args[1]].mask & 0x80000000) != 0) {
+            if ((temps[op->args[1]].mask & 0x80000000) != 0) {
                 break;
             }
         case INDEX_op_extu_i32_i64:
             /* We do not compute affected as it is a size changing op.  */
-            mask = (uint32_t)temps[args[1]].mask;
+            mask = (uint32_t)temps[op->args[1]].mask;
             break;
 
         CASE_OP_32_64(andc):
             /* Known-zeros does not imply known-ones.  Therefore unless
-               args[2] is constant, we can't infer anything from it.  */
-            if (temp_is_const(args[2])) {
-                mask = ~temps[args[2]].mask;
+               op->args[2] is constant, we can't infer anything from it.  */
+            if (temp_is_const(op->args[2])) {
+                mask = ~temps[op->args[2]].mask;
                 goto and_const;
             }
-            /* But we certainly know nothing outside args[1] may be set. */
-            mask = temps[args[1]].mask;
+            /* But we certainly know nothing outside op->args[1] may be set. */
+            mask = temps[op->args[1]].mask;
             break;
 
         case INDEX_op_sar_i32:
-            if (temp_is_const(args[2])) {
-                tmp = temps[args[2]].val & 31;
-                mask = (int32_t)temps[args[1]].mask >> tmp;
+            if (temp_is_const(op->args[2])) {
+                tmp = temps[op->args[2]].val & 31;
+                mask = (int32_t)temps[op->args[1]].mask >> tmp;
             }
             break;
         case INDEX_op_sar_i64:
-            if (temp_is_const(args[2])) {
-                tmp = temps[args[2]].val & 63;
-                mask = (int64_t)temps[args[1]].mask >> tmp;
+            if (temp_is_const(op->args[2])) {
+                tmp = temps[op->args[2]].val & 63;
+                mask = (int64_t)temps[op->args[1]].mask >> tmp;
             }
             break;
 
         case INDEX_op_shr_i32:
-            if (temp_is_const(args[2])) {
-                tmp = temps[args[2]].val & 31;
-                mask = (uint32_t)temps[args[1]].mask >> tmp;
+            if (temp_is_const(op->args[2])) {
+                tmp = temps[op->args[2]].val & 31;
+                mask = (uint32_t)temps[op->args[1]].mask >> tmp;
             }
             break;
         case INDEX_op_shr_i64:
-            if (temp_is_const(args[2])) {
-                tmp = temps[args[2]].val & 63;
-                mask = (uint64_t)temps[args[1]].mask >> tmp;
+            if (temp_is_const(op->args[2])) {
+                tmp = temps[op->args[2]].val & 63;
+                mask = (uint64_t)temps[op->args[1]].mask >> tmp;
             }
             break;
 
         case INDEX_op_extrl_i64_i32:
-            mask = (uint32_t)temps[args[1]].mask;
+            mask = (uint32_t)temps[op->args[1]].mask;
             break;
         case INDEX_op_extrh_i64_i32:
-            mask = (uint64_t)temps[args[1]].mask >> 32;
+            mask = (uint64_t)temps[op->args[1]].mask >> 32;
             break;
 
         CASE_OP_32_64(shl):
-            if (temp_is_const(args[2])) {
-                tmp = temps[args[2]].val & (TCG_TARGET_REG_BITS - 1);
-                mask = temps[args[1]].mask << tmp;
+            if (temp_is_const(op->args[2])) {
+                tmp = temps[op->args[2]].val & (TCG_TARGET_REG_BITS - 1);
+                mask = temps[op->args[1]].mask << tmp;
             }
             break;
 
         CASE_OP_32_64(neg):
             /* Set to 1 all bits to the left of the rightmost.  */
-            mask = -(temps[args[1]].mask & -temps[args[1]].mask);
+            mask = -(temps[op->args[1]].mask & -temps[op->args[1]].mask);
             break;
 
         CASE_OP_32_64(deposit):
-            mask = deposit64(temps[args[1]].mask, args[3], args[4],
-                             temps[args[2]].mask);
+            mask = deposit64(temps[op->args[1]].mask, op->args[3],
+                             op->args[4], temps[op->args[2]].mask);
             break;
 
         CASE_OP_32_64(extract):
-            mask = extract64(temps[args[1]].mask, args[2], args[3]);
-            if (args[2] == 0) {
-                affected = temps[args[1]].mask & ~mask;
+            mask = extract64(temps[op->args[1]].mask, op->args[2], op->args[3]);
+            if (op->args[2] == 0) {
+                affected = temps[op->args[1]].mask & ~mask;
             }
             break;
         CASE_OP_32_64(sextract):
-            mask = sextract64(temps[args[1]].mask, args[2], args[3]);
-            if (args[2] == 0 && (tcg_target_long)mask >= 0) {
-                affected = temps[args[1]].mask & ~mask;
+            mask = sextract64(temps[op->args[1]].mask,
+                              op->args[2], op->args[3]);
+            if (op->args[2] == 0 && (tcg_target_long)mask >= 0) {
+                affected = temps[op->args[1]].mask & ~mask;
             }
             break;
 
         CASE_OP_32_64(or):
         CASE_OP_32_64(xor):
-            mask = temps[args[1]].mask | temps[args[2]].mask;
+            mask = temps[op->args[1]].mask | temps[op->args[2]].mask;
             break;
 
         case INDEX_op_clz_i32:
         case INDEX_op_ctz_i32:
-            mask = temps[args[2]].mask | 31;
+            mask = temps[op->args[2]].mask | 31;
             break;
 
         case INDEX_op_clz_i64:
         case INDEX_op_ctz_i64:
-            mask = temps[args[2]].mask | 63;
+            mask = temps[op->args[2]].mask | 63;
             break;
 
         case INDEX_op_ctpop_i32:
@@ -937,7 +942,7 @@ void tcg_optimize(TCGContext *s)
             break;
 
         CASE_OP_32_64(movcond):
-            mask = temps[args[3]].mask | temps[args[4]].mask;
+            mask = temps[op->args[3]].mask | temps[op->args[4]].mask;
             break;
 
         CASE_OP_32_64(ld8u):
@@ -952,7 +957,7 @@ void tcg_optimize(TCGContext *s)
 
         CASE_OP_32_64(qemu_ld):
             {
-                TCGMemOpIdx oi = args[nb_oargs + nb_iargs];
+                TCGMemOpIdx oi = op->args[nb_oargs + nb_iargs];
                 TCGMemOp mop = get_memop(oi);
                 if (!(mop & MO_SIGN)) {
                     mask = (2ULL << ((8 << (mop & MO_SIZE)) - 1)) - 1;
@@ -976,12 +981,12 @@ void tcg_optimize(TCGContext *s)
 
         if (partmask == 0) {
             tcg_debug_assert(nb_oargs == 1);
-            tcg_opt_gen_movi(s, op, args, args[0], 0);
+            tcg_opt_gen_movi(s, op, op->args[0], 0);
             continue;
         }
         if (affected == 0) {
             tcg_debug_assert(nb_oargs == 1);
-            tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
             continue;
         }
 
@@ -991,8 +996,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(mul):
         CASE_OP_32_64(muluh):
         CASE_OP_32_64(mulsh):
-            if ((temp_is_const(args[2]) && temps[args[2]].val == 0)) {
-                tcg_opt_gen_movi(s, op, args, args[0], 0);
+            if ((temp_is_const(op->args[2]) && temps[op->args[2]].val == 0)) {
+                tcg_opt_gen_movi(s, op, op->args[0], 0);
                 continue;
             }
             break;
@@ -1004,8 +1009,8 @@ void tcg_optimize(TCGContext *s)
         switch (opc) {
         CASE_OP_32_64(or):
         CASE_OP_32_64(and):
-            if (temps_are_copies(args[1], args[2])) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            if (temps_are_copies(op->args[1], op->args[2])) {
+                tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
                 continue;
             }
             break;
@@ -1018,8 +1023,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(andc):
         CASE_OP_32_64(sub):
         CASE_OP_32_64(xor):
-            if (temps_are_copies(args[1], args[2])) {
-                tcg_opt_gen_movi(s, op, args, args[0], 0);
+            if (temps_are_copies(op->args[1], op->args[2])) {
+                tcg_opt_gen_movi(s, op, op->args[0], 0);
                 continue;
             }
             break;
@@ -1032,10 +1037,10 @@ void tcg_optimize(TCGContext *s)
            allocator where needed and possible.  Also detect copies. */
         switch (opc) {
         CASE_OP_32_64(mov):
-            tcg_opt_gen_mov(s, op, args, args[0], args[1]);
+            tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
             break;
         CASE_OP_32_64(movi):
-            tcg_opt_gen_movi(s, op, args, args[0], args[1]);
+            tcg_opt_gen_movi(s, op, op->args[0], op->args[1]);
             break;
 
         CASE_OP_32_64(not):
@@ -1051,9 +1056,9 @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_extu_i32_i64:
         case INDEX_op_extrl_i64_i32:
         case INDEX_op_extrh_i64_i32:
-            if (temp_is_const(args[1])) {
-                tmp = do_constant_folding(opc, temps[args[1]].val, 0);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+            if (temp_is_const(op->args[1])) {
+                tmp = do_constant_folding(opc, temps[op->args[1]].val, 0);
+                tcg_opt_gen_movi(s, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1080,68 +1085,72 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(divu):
         CASE_OP_32_64(rem):
         CASE_OP_32_64(remu):
-            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
-                tmp = do_constant_folding(opc, temps[args[1]].val,
-                                          temps[args[2]].val);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+            if (temp_is_const(op->args[1]) && temp_is_const(op->args[2])) {
+                tmp = do_constant_folding(opc, temps[op->args[1]].val,
+                                          temps[op->args[2]].val);
+                tcg_opt_gen_movi(s, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(clz):
         CASE_OP_32_64(ctz):
-            if (temp_is_const(args[1])) {
-                TCGArg v = temps[args[1]].val;
+            if (temp_is_const(op->args[1])) {
+                TCGArg v = temps[op->args[1]].val;
                 if (v != 0) {
                     tmp = do_constant_folding(opc, v, 0);
-                    tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                    tcg_opt_gen_movi(s, op, op->args[0], tmp);
                 } else {
-                    tcg_opt_gen_mov(s, op, args, args[0], args[2]);
+                    tcg_opt_gen_mov(s, op, op->args[0], op->args[2]);
                 }
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(deposit):
-            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
-                tmp = deposit64(temps[args[1]].val, args[3], args[4],
-                                temps[args[2]].val);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+            if (temp_is_const(op->args[1]) && temp_is_const(op->args[2])) {
+                tmp = deposit64(temps[op->args[1]].val, op->args[3],
+                                op->args[4], temps[op->args[2]].val);
+                tcg_opt_gen_movi(s, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(extract):
-            if (temp_is_const(args[1])) {
-                tmp = extract64(temps[args[1]].val, args[2], args[3]);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+            if (temp_is_const(op->args[1])) {
+                tmp = extract64(temps[op->args[1]].val,
+                                op->args[2], op->args[3]);
+                tcg_opt_gen_movi(s, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(sextract):
-            if (temp_is_const(args[1])) {
-                tmp = sextract64(temps[args[1]].val, args[2], args[3]);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+            if (temp_is_const(op->args[1])) {
+                tmp = sextract64(temps[op->args[1]].val,
+                                 op->args[2], op->args[3]);
+                tcg_opt_gen_movi(s, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(setcond):
-            tmp = do_constant_folding_cond(opc, args[1], args[2], args[3]);
+            tmp = do_constant_folding_cond(opc, op->args[1],
+                                           op->args[2], op->args[3]);
             if (tmp != 2) {
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                tcg_opt_gen_movi(s, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(brcond):
-            tmp = do_constant_folding_cond(opc, args[0], args[1], args[2]);
+            tmp = do_constant_folding_cond(opc, op->args[0],
+                                           op->args[1], op->args[2]);
             if (tmp != 2) {
                 if (tmp) {
                     reset_all_temps(nb_temps);
                     op->opc = INDEX_op_br;
-                    args[0] = args[3];
+                    op->args[0] = op->args[3];
                 } else {
                     tcg_op_remove(s, op);
                 }
@@ -1150,21 +1159,22 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         CASE_OP_32_64(movcond):
-            tmp = do_constant_folding_cond(opc, args[1], args[2], args[5]);
+            tmp = do_constant_folding_cond(opc, op->args[1],
+                                           op->args[2], op->args[5]);
             if (tmp != 2) {
-                tcg_opt_gen_mov(s, op, args, args[0], args[4-tmp]);
+                tcg_opt_gen_mov(s, op, op->args[0], op->args[4-tmp]);
                 break;
             }
-            if (temp_is_const(args[3]) && temp_is_const(args[4])) {
-                tcg_target_ulong tv = temps[args[3]].val;
-                tcg_target_ulong fv = temps[args[4]].val;
-                TCGCond cond = args[5];
+            if (temp_is_const(op->args[3]) && temp_is_const(op->args[4])) {
+                tcg_target_ulong tv = temps[op->args[3]].val;
+                tcg_target_ulong fv = temps[op->args[4]].val;
+                TCGCond cond = op->args[5];
                 if (fv == 1 && tv == 0) {
                     cond = tcg_invert_cond(cond);
                 } else if (!(tv == 1 && fv == 0)) {
                     goto do_default;
                 }
-                args[3] = cond;
+                op->args[3] = cond;
                 op->opc = opc = (opc == INDEX_op_movcond_i32
                                  ? INDEX_op_setcond_i32
                                  : INDEX_op_setcond_i64);
@@ -1174,17 +1184,16 @@ void tcg_optimize(TCGContext *s)
 
         case INDEX_op_add2_i32:
         case INDEX_op_sub2_i32:
-            if (temp_is_const(args[2]) && temp_is_const(args[3])
-                && temp_is_const(args[4]) && temp_is_const(args[5])) {
-                uint32_t al = temps[args[2]].val;
-                uint32_t ah = temps[args[3]].val;
-                uint32_t bl = temps[args[4]].val;
-                uint32_t bh = temps[args[5]].val;
+            if (temp_is_const(op->args[2]) && temp_is_const(op->args[3])
+                && temp_is_const(op->args[4]) && temp_is_const(op->args[5])) {
+                uint32_t al = temps[op->args[2]].val;
+                uint32_t ah = temps[op->args[3]].val;
+                uint32_t bl = temps[op->args[4]].val;
+                uint32_t bh = temps[op->args[5]].val;
                 uint64_t a = ((uint64_t)ah << 32) | al;
                 uint64_t b = ((uint64_t)bh << 32) | bl;
                 TCGArg rl, rh;
                 TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32, 2);
-                TCGArg *args2 = op2->args;
 
                 if (opc == INDEX_op_add2_i32) {
                     a += b;
@@ -1192,10 +1201,10 @@ void tcg_optimize(TCGContext *s)
                     a -= b;
                 }
 
-                rl = args[0];
-                rh = args[1];
-                tcg_opt_gen_movi(s, op, args, rl, (int32_t)a);
-                tcg_opt_gen_movi(s, op2, args2, rh, (int32_t)(a >> 32));
+                rl = op->args[0];
+                rh = op->args[1];
+                tcg_opt_gen_movi(s, op, rl, (int32_t)a);
+                tcg_opt_gen_movi(s, op2, rh, (int32_t)(a >> 32));
 
                 /* We've done all we need to do with the movi.  Skip it.  */
                 oi_next = op2->next;
@@ -1204,18 +1213,17 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         case INDEX_op_mulu2_i32:
-            if (temp_is_const(args[2]) && temp_is_const(args[3])) {
-                uint32_t a = temps[args[2]].val;
-                uint32_t b = temps[args[3]].val;
+            if (temp_is_const(op->args[2]) && temp_is_const(op->args[3])) {
+                uint32_t a = temps[op->args[2]].val;
+                uint32_t b = temps[op->args[3]].val;
                 uint64_t r = (uint64_t)a * b;
                 TCGArg rl, rh;
                 TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32, 2);
-                TCGArg *args2 = op2->args;
 
-                rl = args[0];
-                rh = args[1];
-                tcg_opt_gen_movi(s, op, args, rl, (int32_t)r);
-                tcg_opt_gen_movi(s, op2, args2, rh, (int32_t)(r >> 32));
+                rl = op->args[0];
+                rh = op->args[1];
+                tcg_opt_gen_movi(s, op, rl, (int32_t)r);
+                tcg_opt_gen_movi(s, op2, rh, (int32_t)(r >> 32));
 
                 /* We've done all we need to do with the movi.  Skip it.  */
                 oi_next = op2->next;
@@ -1224,41 +1232,47 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         case INDEX_op_brcond2_i32:
-            tmp = do_constant_folding_cond2(&args[0], &args[2], args[4]);
+            tmp = do_constant_folding_cond2(&op->args[0], &op->args[2],
+                                            op->args[4]);
             if (tmp != 2) {
                 if (tmp) {
             do_brcond_true:
                     reset_all_temps(nb_temps);
                     op->opc = INDEX_op_br;
-                    args[0] = args[5];
+                    op->args[0] = op->args[5];
                 } else {
             do_brcond_false:
                     tcg_op_remove(s, op);
                 }
-            } else if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
-                       && temp_is_const(args[2]) && temps[args[2]].val == 0
-                       && temp_is_const(args[3]) && temps[args[3]].val == 0) {
+            } else if ((op->args[4] == TCG_COND_LT
+                        || op->args[4] == TCG_COND_GE)
+                       && temp_is_const(op->args[2])
+                       && temps[op->args[2]].val == 0
+                       && temp_is_const(op->args[3])
+                       && temps[op->args[3]].val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_brcond_high:
                 reset_all_temps(nb_temps);
                 op->opc = INDEX_op_brcond_i32;
-                args[0] = args[1];
-                args[1] = args[3];
-                args[2] = args[4];
-                args[3] = args[5];
-            } else if (args[4] == TCG_COND_EQ) {
+                op->args[0] = op->args[1];
+                op->args[1] = op->args[3];
+                op->args[2] = op->args[4];
+                op->args[3] = op->args[5];
+            } else if (op->args[4] == TCG_COND_EQ) {
                 /* Simplify EQ comparisons where one of the pairs
                    can be simplified.  */
                 tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
-                                               args[0], args[2], TCG_COND_EQ);
+                                               op->args[0], op->args[2],
+                                               TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_brcond_false;
                 } else if (tmp == 1) {
                     goto do_brcond_high;
                 }
                 tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
-                                               args[1], args[3], TCG_COND_EQ);
+                                               op->args[1], op->args[3],
+                                               TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_brcond_false;
                 } else if (tmp != 1) {
@@ -1267,21 +1281,23 @@ void tcg_optimize(TCGContext *s)
             do_brcond_low:
                 reset_all_temps(nb_temps);
                 op->opc = INDEX_op_brcond_i32;
-                args[1] = args[2];
-                args[2] = args[4];
-                args[3] = args[5];
-            } else if (args[4] == TCG_COND_NE) {
+                op->args[1] = op->args[2];
+                op->args[2] = op->args[4];
+                op->args[3] = op->args[5];
+            } else if (op->args[4] == TCG_COND_NE) {
                 /* Simplify NE comparisons where one of the pairs
                    can be simplified.  */
                 tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
-                                               args[0], args[2], TCG_COND_NE);
+                                               op->args[0], op->args[2],
+                                               TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_brcond_high;
                 } else if (tmp == 1) {
                     goto do_brcond_true;
                 }
                 tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
-                                               args[1], args[3], TCG_COND_NE);
+                                               op->args[1], op->args[3],
+                                               TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_brcond_low;
                 } else if (tmp == 1) {
@@ -1294,57 +1310,65 @@ void tcg_optimize(TCGContext *s)
             break;
 
         case INDEX_op_setcond2_i32:
-            tmp = do_constant_folding_cond2(&args[1], &args[3], args[5]);
+            tmp = do_constant_folding_cond2(&op->args[1], &op->args[3],
+                                            op->args[5]);
             if (tmp != 2) {
             do_setcond_const:
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
-            } else if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE)
-                       && temp_is_const(args[3]) && temps[args[3]].val == 0
-                       && temp_is_const(args[4]) && temps[args[4]].val == 0) {
+                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+            } else if ((op->args[5] == TCG_COND_LT
+                        || op->args[5] == TCG_COND_GE)
+                       && temp_is_const(op->args[3])
+                       && temps[op->args[3]].val == 0
+                       && temp_is_const(op->args[4])
+                       && temps[op->args[4]].val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_setcond_high:
-                reset_temp(args[0]);
-                temps[args[0]].mask = 1;
+                reset_temp(op->args[0]);
+                temps[op->args[0]].mask = 1;
                 op->opc = INDEX_op_setcond_i32;
-                args[1] = args[2];
-                args[2] = args[4];
-                args[3] = args[5];
-            } else if (args[5] == TCG_COND_EQ) {
+                op->args[1] = op->args[2];
+                op->args[2] = op->args[4];
+                op->args[3] = op->args[5];
+            } else if (op->args[5] == TCG_COND_EQ) {
                 /* Simplify EQ comparisons where one of the pairs
                    can be simplified.  */
                 tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
-                                               args[1], args[3], TCG_COND_EQ);
+                                               op->args[1], op->args[3],
+                                               TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_setcond_const;
                 } else if (tmp == 1) {
                     goto do_setcond_high;
                 }
                 tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
-                                               args[2], args[4], TCG_COND_EQ);
+                                               op->args[2], op->args[4],
+                                               TCG_COND_EQ);
                 if (tmp == 0) {
                     goto do_setcond_high;
                 } else if (tmp != 1) {
                     goto do_default;
                 }
             do_setcond_low:
-                reset_temp(args[0]);
-                temps[args[0]].mask = 1;
+                reset_temp(op->args[0]);
+                temps[op->args[0]].mask = 1;
                 op->opc = INDEX_op_setcond_i32;
-                args[2] = args[3];
-                args[3] = args[5];
-            } else if (args[5] == TCG_COND_NE) {
+                op->args[2] = op->args[3];
+                op->args[3] = op->args[5];
+            } else if (op->args[5] == TCG_COND_NE) {
                 /* Simplify NE comparisons where one of the pairs
                    can be simplified.  */
                 tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
-                                               args[1], args[3], TCG_COND_NE);
+                                               op->args[1], op->args[3],
+                                               TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_setcond_high;
                 } else if (tmp == 1) {
                     goto do_setcond_const;
                 }
                 tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
-                                               args[2], args[4], TCG_COND_NE);
+                                               op->args[2], op->args[4],
+                                               TCG_COND_NE);
                 if (tmp == 0) {
                     goto do_setcond_low;
                 } else if (tmp == 1) {
@@ -1357,7 +1381,7 @@ void tcg_optimize(TCGContext *s)
             break;
 
         case INDEX_op_call:
-            if (!(args[nb_oargs + nb_iargs + 1]
+            if (!(op->args[nb_oargs + nb_iargs + 1]
                   & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
                 for (i = 0; i < nb_globals; i++) {
                     if (test_bit(i, temps_used.l)) {
@@ -1379,11 +1403,11 @@ void tcg_optimize(TCGContext *s)
             } else {
         do_reset_output:
                 for (i = 0; i < nb_oargs; i++) {
-                    reset_temp(args[i]);
+                    reset_temp(op->args[i]);
                     /* Save the corresponding known-zero bits mask for the
                        first output argument (only one supported so far). */
                     if (i == 0) {
-                        temps[args[i]].mask = mask;
+                        temps[op->args[i]].mask = mask;
                     }
                 }
             }
@@ -1391,7 +1415,7 @@ void tcg_optimize(TCGContext *s)
         }
 
         /* Eliminate duplicate and redundant fence instructions.  */
-        if (prev_mb_args) {
+        if (prev_mb) {
             switch (opc) {
             case INDEX_op_mb:
                 /* Merge two barriers of the same type into one,
@@ -1405,7 +1429,7 @@ void tcg_optimize(TCGContext *s)
                  * barrier.  This is stricter than specified but for
                  * the purposes of TCG is better than not optimizing.
                  */
-                prev_mb_args[0] |= args[0];
+                prev_mb->args[0] |= op->args[0];
                 tcg_op_remove(s, op);
                 break;
 
@@ -1421,11 +1445,11 @@ void tcg_optimize(TCGContext *s)
             case INDEX_op_qemu_st_i64:
             case INDEX_op_call:
                 /* Opcodes that touch guest memory stop the optimization.  */
-                prev_mb_args = NULL;
+                prev_mb = NULL;
                 break;
             }
         } else if (opc == INDEX_op_mb) {
-            prev_mb_args = args;
+            prev_mb = op;
         }
     }
 }
-- 
2.9.4
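
Note on the last hunks above: the fence-merging code now remembers the
previous barrier as a TCGOp pointer (prev_mb) instead of a pointer into a
separate argument array (prev_mb_args), since the operands live inside the
op itself.  A minimal sketch of that pattern follows.  Op, Arg, the OPC_*
values and merge_barriers() are hypothetical stand-ins, not the QEMU
definitions, and the sketch resets prev_mb on any non-barrier op, whereas
the real optimizer only does so for opcodes that touch guest memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for TCGArg and TCGOp. */
typedef uintptr_t Arg;

enum { OPC_MB, OPC_OTHER, OPC_REMOVED };

typedef struct Op {
    int opc;
    Arg args[2];    /* operands stored inside the op itself */
} Op;

/* Merge each run of adjacent barriers into the first barrier of the run.
   Returns the number of ops that were marked as removed. */
static size_t merge_barriers(Op *ops, size_t n)
{
    Op *prev_mb = NULL;     /* previously: a pointer to the barrier's args */
    size_t removed = 0;

    assert(ops != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        Op *op = &ops[i];

        if (op->opc == OPC_MB) {
            if (prev_mb) {
                /* Union of both barrier types, kept in the earlier op. */
                prev_mb->args[0] |= op->args[0];
                op->opc = OPC_REMOVED;
                removed++;
            } else {
                prev_mb = op;
            }
        } else if (op->opc != OPC_REMOVED) {
            /* Simplification: any other op ends the current run. */
            prev_mb = NULL;
        }
    }
    return removed;
}
```

Holding the op pointer means the merge site reads prev_mb->args[0] and
op->args[0] symmetrically, and no second variable has to be kept in sync
with the op being visited.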

* [Qemu-devel] [PATCH 03/16] tcg: Propagate args to op->args in tcg.c
From: Richard Henderson @ 2017-06-21  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/tcg.c | 121 ++++++++++++++++++++++++++++++--------------------------------
 1 file changed, 58 insertions(+), 63 deletions(-)
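
Note on the insn_start dump hunk below: besides the args to op->args
renaming, it replaces the open-coded shift-and-or with deposit64().  The
two forms should agree as long as the low word fits in 32 bits, which is
the case in that branch (TARGET_LONG_BITS > TCG_TARGET_REG_BITS, so each
TCGArg is 32 bits wide).  A small sketch of the equivalence, under the
assumption that deposit64_sketch() below matches QEMU's deposit64()
semantics; it is a local re-implementation for illustration, and
join_shift_or() and join_deposit() are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* Local re-implementation for illustration only: replace bits
   [start, start + length) of value with the low bits of fieldval. */
static uint64_t deposit64_sketch(uint64_t value, int start, int length,
                                 uint64_t fieldval)
{
    uint64_t mask;

    assert(start >= 0 && length > 0 && length <= 64 - start);
    mask = (~0ULL >> (64 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Old form: shift the high word up and OR in the low word. */
static uint64_t join_shift_or(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* New form: deposit the high word into bits [32, 64) of the low word. */
static uint64_t join_deposit(uint32_t lo, uint32_t hi)
{
    return deposit64_sketch(lo, 32, 32, hi);
}
```

The deposit form names the field position and width explicitly, so the
cast on the high word is no longer needed at the call site.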

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 298aa0c..be5b69c 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1054,14 +1054,12 @@ void tcg_dump_ops(TCGContext *s)
     for (oi = s->gen_op_buf[0].next; oi != 0; oi = op->next) {
         int i, k, nb_oargs, nb_iargs, nb_cargs;
         const TCGOpDef *def;
-        const TCGArg *args;
         TCGOpcode c;
         int col = 0;
 
         op = &s->gen_op_buf[oi];
         c = op->opc;
         def = &tcg_op_defs[c];
-        args = op->args;
 
         if (c == INDEX_op_insn_start) {
             col += qemu_log("%s ----", oi != s->gen_op_buf[0].next ? "\n" : "");
@@ -1069,9 +1067,9 @@ void tcg_dump_ops(TCGContext *s)
             for (i = 0; i < TARGET_INSN_START_WORDS; ++i) {
                 target_ulong a;
 #if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
-                a = ((target_ulong)args[i * 2 + 1] << 32) | args[i * 2];
+                a = deposit64(op->args[i * 2], 32, 32, op->args[i * 2 + 1]);
 #else
-                a = args[i];
+                a = op->args[i];
 #endif
                 col += qemu_log(" " TARGET_FMT_lx, a);
             }
@@ -1083,14 +1081,14 @@ void tcg_dump_ops(TCGContext *s)
 
             /* function name, flags, out args */
             col += qemu_log(" %s %s,$0x%" TCG_PRIlx ",$%d", def->name,
-                            tcg_find_helper(s, args[nb_oargs + nb_iargs]),
-                            args[nb_oargs + nb_iargs + 1], nb_oargs);
+                            tcg_find_helper(s, op->args[nb_oargs + nb_iargs]),
+                            op->args[nb_oargs + nb_iargs + 1], nb_oargs);
             for (i = 0; i < nb_oargs; i++) {
                 col += qemu_log(",%s", tcg_get_arg_str_idx(s, buf, sizeof(buf),
-                                                           args[i]));
+                                                           op->args[i]));
             }
             for (i = 0; i < nb_iargs; i++) {
-                TCGArg arg = args[nb_oargs + i];
+                TCGArg arg = op->args[nb_oargs + i];
                 const char *t = "<dummy>";
                 if (arg != TCG_CALL_DUMMY_ARG) {
                     t = tcg_get_arg_str_idx(s, buf, sizeof(buf), arg);
@@ -1110,14 +1108,14 @@ void tcg_dump_ops(TCGContext *s)
                     col += qemu_log(",");
                 }
                 col += qemu_log("%s", tcg_get_arg_str_idx(s, buf, sizeof(buf),
-                                                          args[k++]));
+                                                          op->args[k++]));
             }
             for (i = 0; i < nb_iargs; i++) {
                 if (k != 0) {
                     col += qemu_log(",");
                 }
                 col += qemu_log("%s", tcg_get_arg_str_idx(s, buf, sizeof(buf),
-                                                          args[k++]));
+                                                          op->args[k++]));
             }
             switch (c) {
             case INDEX_op_brcond_i32:
@@ -1128,10 +1126,11 @@ void tcg_dump_ops(TCGContext *s)
             case INDEX_op_brcond_i64:
             case INDEX_op_setcond_i64:
             case INDEX_op_movcond_i64:
-                if (args[k] < ARRAY_SIZE(cond_name) && cond_name[args[k]]) {
-                    col += qemu_log(",%s", cond_name[args[k++]]);
+                if (op->args[k] < ARRAY_SIZE(cond_name)
+                    && cond_name[op->args[k]]) {
+                    col += qemu_log(",%s", cond_name[op->args[k++]]);
                 } else {
-                    col += qemu_log(",$0x%" TCG_PRIlx, args[k++]);
+                    col += qemu_log(",$0x%" TCG_PRIlx, op->args[k++]);
                 }
                 i = 1;
                 break;
@@ -1140,7 +1139,7 @@ void tcg_dump_ops(TCGContext *s)
             case INDEX_op_qemu_ld_i64:
             case INDEX_op_qemu_st_i64:
                 {
-                    TCGMemOpIdx oi = args[k++];
+                    TCGMemOpIdx oi = op->args[k++];
                     TCGMemOp op = get_memop(oi);
                     unsigned ix = get_mmuidx(oi);
 
@@ -1165,14 +1164,15 @@ void tcg_dump_ops(TCGContext *s)
             case INDEX_op_brcond_i32:
             case INDEX_op_brcond_i64:
             case INDEX_op_brcond2_i32:
-                col += qemu_log("%s$L%d", k ? "," : "", arg_label(args[k])->id);
+                col += qemu_log("%s$L%d", k ? "," : "",
+                                arg_label(op->args[k])->id);
                 i++, k++;
                 break;
             default:
                 break;
             }
             for (; i < nb_cargs; i++, k++) {
-                col += qemu_log("%s$0x%" TCG_PRIlx, k ? "," : "", args[k]);
+                col += qemu_log("%s$0x%" TCG_PRIlx, k ? "," : "", op->args[k]);
             }
         }
         if (op->life) {
@@ -1433,7 +1433,6 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
         TCGArg arg;
 
         TCGOp * const op = &s->gen_op_buf[oi];
-        TCGArg * const args = op->args;
         TCGOpcode opc = op->opc;
         const TCGOpDef *def = &tcg_op_defs[opc];
 
@@ -1446,12 +1445,12 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
 
                 nb_oargs = op->callo;
                 nb_iargs = op->calli;
-                call_flags = args[nb_oargs + nb_iargs + 1];
+                call_flags = op->args[nb_oargs + nb_iargs + 1];
 
                 /* pure functions can be removed if their result is unused */
                 if (call_flags & TCG_CALL_NO_SIDE_EFFECTS) {
                     for (i = 0; i < nb_oargs; i++) {
-                        arg = args[i];
+                        arg = op->args[i];
                         if (temp_state[arg] != TS_DEAD) {
                             goto do_not_remove_call;
                         }
@@ -1462,7 +1461,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
 
                     /* output args are dead */
                     for (i = 0; i < nb_oargs; i++) {
-                        arg = args[i];
+                        arg = op->args[i];
                         if (temp_state[arg] & TS_DEAD) {
                             arg_life |= DEAD_ARG << i;
                         }
@@ -1485,7 +1484,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
 
                     /* record arguments that die in this helper */
                     for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
-                        arg = args[i];
+                        arg = op->args[i];
                         if (arg != TCG_CALL_DUMMY_ARG) {
                             if (temp_state[arg] & TS_DEAD) {
                                 arg_life |= DEAD_ARG << i;
@@ -1494,7 +1493,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
                     }
                     /* input arguments are live for preceding opcodes */
                     for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
-                        arg = args[i];
+                        arg = op->args[i];
                         if (arg != TCG_CALL_DUMMY_ARG) {
                             temp_state[arg] &= ~TS_DEAD;
                         }
@@ -1506,7 +1505,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
             break;
         case INDEX_op_discard:
             /* mark the temporary as dead */
-            temp_state[args[0]] = TS_DEAD;
+            temp_state[op->args[0]] = TS_DEAD;
             break;
 
         case INDEX_op_add2_i32:
@@ -1527,15 +1526,15 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
                the low part.  The result can be optimized to a simple
                add or sub.  This happens often for x86_64 guest when the
                cpu mode is set to 32 bit.  */
-            if (temp_state[args[1]] == TS_DEAD) {
-                if (temp_state[args[0]] == TS_DEAD) {
+            if (temp_state[op->args[1]] == TS_DEAD) {
+                if (temp_state[op->args[0]] == TS_DEAD) {
                     goto do_remove;
                 }
                 /* Replace the opcode and adjust the args in place,
                    leaving 3 unused args at the end.  */
                 op->opc = opc = opc_new;
-                args[1] = args[2];
-                args[2] = args[4];
+                op->args[1] = op->args[2];
+                op->args[2] = op->args[4];
                 /* Fall through and mark the single-word operation live.  */
                 nb_iargs = 2;
                 nb_oargs = 1;
@@ -1565,21 +1564,21 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
         do_mul2:
             nb_iargs = 2;
             nb_oargs = 2;
-            if (temp_state[args[1]] == TS_DEAD) {
-                if (temp_state[args[0]] == TS_DEAD) {
+            if (temp_state[op->args[1]] == TS_DEAD) {
+                if (temp_state[op->args[0]] == TS_DEAD) {
                     /* Both parts of the operation are dead.  */
                     goto do_remove;
                 }
                 /* The high part of the operation is dead; generate the low. */
                 op->opc = opc = opc_new;
-                args[1] = args[2];
-                args[2] = args[3];
-            } else if (temp_state[args[0]] == TS_DEAD && have_opc_new2) {
+                op->args[1] = op->args[2];
+                op->args[2] = op->args[3];
+            } else if (temp_state[op->args[0]] == TS_DEAD && have_opc_new2) {
                 /* The low part of the operation is dead; generate the high. */
                 op->opc = opc = opc_new2;
-                args[0] = args[1];
-                args[1] = args[2];
-                args[2] = args[3];
+                op->args[0] = op->args[1];
+                op->args[1] = op->args[2];
+                op->args[2] = op->args[3];
             } else {
                 goto do_not_remove;
             }
@@ -1597,7 +1596,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
                implies side effects */
             if (!(def->flags & TCG_OPF_SIDE_EFFECTS) && nb_oargs != 0) {
                 for (i = 0; i < nb_oargs; i++) {
-                    if (temp_state[args[i]] != TS_DEAD) {
+                    if (temp_state[op->args[i]] != TS_DEAD) {
                         goto do_not_remove;
                     }
                 }
@@ -1607,7 +1606,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
             do_not_remove:
                 /* output args are dead */
                 for (i = 0; i < nb_oargs; i++) {
-                    arg = args[i];
+                    arg = op->args[i];
                     if (temp_state[arg] & TS_DEAD) {
                         arg_life |= DEAD_ARG << i;
                     }
@@ -1629,14 +1628,14 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
 
                 /* record arguments that die in this opcode */
                 for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-                    arg = args[i];
+                    arg = op->args[i];
                     if (temp_state[arg] & TS_DEAD) {
                         arg_life |= DEAD_ARG << i;
                     }
                 }
                 /* input arguments are live for preceding opcodes */
                 for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-                    temp_state[args[i]] &= ~TS_DEAD;
+                    temp_state[op->args[i]] &= ~TS_DEAD;
                 }
             }
             break;
@@ -1671,7 +1670,6 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
 
     for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
         TCGOp *op = &s->gen_op_buf[oi];
-        TCGArg *args = op->args;
         TCGOpcode opc = op->opc;
         const TCGOpDef *def = &tcg_op_defs[opc];
         TCGLifeData arg_life = op->life;
@@ -1683,7 +1681,7 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
         if (opc == INDEX_op_call) {
             nb_oargs = op->callo;
             nb_iargs = op->calli;
-            call_flags = args[nb_oargs + nb_iargs + 1];
+            call_flags = op->args[nb_oargs + nb_iargs + 1];
         } else {
             nb_iargs = def->nb_iargs;
             nb_oargs = def->nb_oargs;
@@ -1704,7 +1702,7 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
 
         /* Make sure that input arguments are available.  */
         for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
-            arg = args[i];
+            arg = op->args[i];
             /* Note this unsigned test catches TCG_CALL_ARG_DUMMY too.  */
             if (arg < nb_globals) {
                 dir = dir_temps[arg];
@@ -1714,11 +1712,10 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
                                       ? INDEX_op_ld_i32
                                       : INDEX_op_ld_i64);
                     TCGOp *lop = tcg_op_insert_before(s, op, lopc, 3);
-                    TCGArg *largs = lop->args;
 
-                    largs[0] = dir;
-                    largs[1] = temp_idx(s, its->mem_base);
-                    largs[2] = its->mem_offset;
+                    lop->args[0] = dir;
+                    lop->args[1] = temp_idx(s, its->mem_base);
+                    lop->args[2] = its->mem_offset;
 
                     /* Loaded, but synced with memory.  */
                     temp_state[arg] = TS_MEM;
@@ -1730,11 +1727,11 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
            No action is required except keeping temp_state up to date
            so that we reload when needed.  */
         for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
-            arg = args[i];
+            arg = op->args[i];
             if (arg < nb_globals) {
                 dir = dir_temps[arg];
                 if (dir != 0) {
-                    args[i] = dir;
+                    op->args[i] = dir;
                     changes = true;
                     if (IS_DEAD_ARG(i)) {
                         temp_state[arg] = TS_DEAD;
@@ -1765,7 +1762,7 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
 
         /* Outputs become available.  */
         for (i = 0; i < nb_oargs; i++) {
-            arg = args[i];
+            arg = op->args[i];
             if (arg >= nb_globals) {
                 continue;
             }
@@ -1773,7 +1770,7 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
             if (dir == 0) {
                 continue;
             }
-            args[i] = dir;
+            op->args[i] = dir;
             changes = true;
 
             /* The output is now live and modified.  */
@@ -1786,11 +1783,10 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
                                   ? INDEX_op_st_i32
                                   : INDEX_op_st_i64);
                 TCGOp *sop = tcg_op_insert_after(s, op, sopc, 3);
-                TCGArg *sargs = sop->args;
 
-                sargs[0] = dir;
-                sargs[1] = temp_idx(s, its->mem_base);
-                sargs[2] = its->mem_offset;
+                sop->args[0] = dir;
+                sop->args[1] = temp_idx(s, its->mem_base);
+                sop->args[2] = its->mem_offset;
 
                 temp_state[arg] = TS_MEM;
             }
@@ -2614,7 +2610,6 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     num_insns = -1;
     for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
         TCGOp * const op = &s->gen_op_buf[oi];
-        TCGArg * const args = op->args;
         TCGOpcode opc = op->opc;
         const TCGOpDef *def = &tcg_op_defs[opc];
         TCGLifeData arg_life = op->life;
@@ -2627,11 +2622,11 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
         switch (opc) {
         case INDEX_op_mov_i32:
         case INDEX_op_mov_i64:
-            tcg_reg_alloc_mov(s, def, args, arg_life);
+            tcg_reg_alloc_mov(s, def, op->args, arg_life);
             break;
         case INDEX_op_movi_i32:
         case INDEX_op_movi_i64:
-            tcg_reg_alloc_movi(s, args, arg_life);
+            tcg_reg_alloc_movi(s, op->args, arg_life);
             break;
         case INDEX_op_insn_start:
             if (num_insns >= 0) {
@@ -2641,22 +2636,22 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
             for (i = 0; i < TARGET_INSN_START_WORDS; ++i) {
                 target_ulong a;
 #if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
-                a = ((target_ulong)args[i * 2 + 1] << 32) | args[i * 2];
+                a = deposit64(op->args[i * 2], 32, 32, op->args[i * 2 + 1]);
 #else
-                a = args[i];
+                a = op->args[i];
 #endif
                 s->gen_insn_data[num_insns][i] = a;
             }
             break;
         case INDEX_op_discard:
-            temp_dead(s, &s->temps[args[0]]);
+            temp_dead(s, &s->temps[op->args[0]]);
             break;
         case INDEX_op_set_label:
             tcg_reg_alloc_bb_end(s, s->reserved_regs);
-            tcg_out_label(s, arg_label(args[0]), s->code_ptr);
+            tcg_out_label(s, arg_label(op->args[0]), s->code_ptr);
             break;
         case INDEX_op_call:
-            tcg_reg_alloc_call(s, op->callo, op->calli, args, arg_life);
+            tcg_reg_alloc_call(s, op->callo, op->calli, op->args, arg_life);
             break;
         default:
             /* Sanity check that we've not introduced any unhandled opcodes. */
@@ -2666,7 +2661,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
             /* Note: in order to speed up the code, it would be much
                faster to have specialized register allocator functions for
                some common argument patterns */
-            tcg_reg_alloc_op(s, def, opc, args, arg_life);
+            tcg_reg_alloc_op(s, def, opc, op->args, arg_life);
             break;
         }
 #ifdef CONFIG_DEBUG_TCG
-- 
2.9.4


* [Qemu-devel] [PATCH 04/16] tcg: Propagate TCGOp down to allocators
  2017-06-21  2:48 [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end Richard Henderson
                   ` (2 preceding siblings ...)
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 03/16] tcg: Propagate args to op->args in tcg.c Richard Henderson
@ 2017-06-21  2:48 ` Richard Henderson
  2017-06-26 15:08   ` Alex Bennée
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 05/16] tcg: Introduce arg_temp Richard Henderson
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2017-06-21  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
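Reader's note, not part of the commit: the allocator functions stop taking
def, opc, args and arg_life as separate parameters and instead receive the
TCGOp, deriving the rest from it.  A minimal sketch of that calling
convention change follows.  All types and names in it (Op, OpDef, op_defs,
count_dead_inputs_*) are hypothetical simplifications, and the "bit i of
life means argument i is dead" encoding is an assumption made for the
example, not the real TCGLifeData layout.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the TCG types. */
typedef uintptr_t Arg;
typedef uint16_t LifeData;

typedef struct OpDef {
    const char *name;
    int nb_oargs, nb_iargs;
} OpDef;

typedef struct Op {
    unsigned opc;      /* index into op_defs[] */
    LifeData life;     /* assumed: bit i set means args[i] is dead */
    Arg args[4];
} Op;

static const OpDef op_defs[] = {
    { "mov", 1, 1 },
    { "add", 1, 2 },
};

/* Before: the caller unpacks the op and passes each piece separately. */
static int count_dead_inputs_old(const OpDef *def, LifeData life)
{
    int i, n = 0;

    for (i = def->nb_oargs; i < def->nb_oargs + def->nb_iargs; i++) {
        if (life & (1u << i)) {
            n++;
        }
    }
    return n;
}

/* After: the callee receives the op and looks up what it needs. */
static int count_dead_inputs_new(const Op *op)
{
    const OpDef *def = &op_defs[op->opc];

    return count_dead_inputs_old(def, op->life);
}
```

With the second form the caller's switch statement shrinks to one argument
per call, which is the shape the tcg_gen_code hunk below ends up with.
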
 tcg/tcg.c | 82 +++++++++++++++++++++++++++++++--------------------------------
 1 file changed, 40 insertions(+), 42 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index be5b69c..e2248a6 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -2111,25 +2111,24 @@ static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
     }
 }
 
-static void tcg_reg_alloc_movi(TCGContext *s, const TCGArg *args,
-                               TCGLifeData arg_life)
+static void tcg_reg_alloc_movi(TCGContext *s, const TCGOp *op)
 {
-    TCGTemp *ots = &s->temps[args[0]];
-    tcg_target_ulong val = args[1];
+    TCGTemp *ots = &s->temps[op->args[0]];
+    tcg_target_ulong val = op->args[1];
 
-    tcg_reg_alloc_do_movi(s, ots, val, arg_life);
+    tcg_reg_alloc_do_movi(s, ots, val, op->life);
 }
 
-static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
-                              const TCGArg *args, TCGLifeData arg_life)
+static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
 {
+    const TCGLifeData arg_life = op->life;
     TCGRegSet allocated_regs;
     TCGTemp *ts, *ots;
     TCGType otype, itype;
 
     tcg_regset_set(allocated_regs, s->reserved_regs);
-    ots = &s->temps[args[0]];
-    ts = &s->temps[args[1]];
+    ots = &s->temps[op->args[0]];
+    ts = &s->temps[op->args[1]];
 
     /* Note that otype != itype for no-op truncation.  */
     otype = ots->type;
@@ -2159,7 +2158,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
            liveness analysis disabled). */
         tcg_debug_assert(NEED_SYNC_ARG(0));
         if (!ots->mem_allocated) {
-            temp_allocate_frame(s, args[0]);
+            temp_allocate_frame(s, op->args[0]);
         }
         tcg_out_st(s, otype, ts->reg, ots->mem_base->reg, ots->mem_offset);
         if (IS_DEAD_ARG(1)) {
@@ -2193,10 +2192,10 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
     }
 }
 
-static void tcg_reg_alloc_op(TCGContext *s, 
-                             const TCGOpDef *def, TCGOpcode opc,
-                             const TCGArg *args, TCGLifeData arg_life)
+static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
 {
+    const TCGLifeData arg_life = op->life;
+    const TCGOpDef * const def = &tcg_op_defs[op->opc];
     TCGRegSet i_allocated_regs;
     TCGRegSet o_allocated_regs;
     int i, k, nb_iargs, nb_oargs;
@@ -2207,21 +2206,24 @@ static void tcg_reg_alloc_op(TCGContext *s,
     TCGArg new_args[TCG_MAX_OP_ARGS];
     int const_args[TCG_MAX_OP_ARGS];
 
+    /* Sanity check that we've not introduced any unhandled opcodes. */
+    tcg_debug_assert(!(def->flags & TCG_OPF_NOT_PRESENT));
+
     nb_oargs = def->nb_oargs;
     nb_iargs = def->nb_iargs;
 
     /* copy constants */
     memcpy(new_args + nb_oargs + nb_iargs, 
-           args + nb_oargs + nb_iargs, 
+           op->args + nb_oargs + nb_iargs,
            sizeof(TCGArg) * def->nb_cargs);
 
     tcg_regset_set(i_allocated_regs, s->reserved_regs);
     tcg_regset_set(o_allocated_regs, s->reserved_regs);
 
     /* satisfy input constraints */ 
-    for(k = 0; k < nb_iargs; k++) {
+    for (k = 0; k < nb_iargs; k++) {
         i = def->sorted_args[nb_oargs + k];
-        arg = args[i];
+        arg = op->args[i];
         arg_ct = &def->args_ct[i];
         ts = &s->temps[arg];
 
@@ -2239,7 +2241,7 @@ static void tcg_reg_alloc_op(TCGContext *s,
             if (ts->fixed_reg) {
                 /* if fixed register, we must allocate a new register
                    if the alias is not the same register */
-                if (arg != args[arg_ct->alias_index])
+                if (arg != op->args[arg_ct->alias_index])
                     goto allocate_in_reg;
             } else {
                 /* if the input is aliased to an output and if it is
@@ -2280,7 +2282,7 @@ static void tcg_reg_alloc_op(TCGContext *s,
     /* mark dead temporaries and free the associated registers */
     for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
         if (IS_DEAD_ARG(i)) {
-            temp_dead(s, &s->temps[args[i]]);
+            temp_dead(s, &s->temps[op->args[i]]);
         }
     }
 
@@ -2304,7 +2306,7 @@ static void tcg_reg_alloc_op(TCGContext *s,
         /* satisfy the output constraints */
         for(k = 0; k < nb_oargs; k++) {
             i = def->sorted_args[k];
-            arg = args[i];
+            arg = op->args[i];
             arg_ct = &def->args_ct[i];
             ts = &s->temps[arg];
             if ((arg_ct->ct & TCG_CT_ALIAS)
@@ -2343,11 +2345,11 @@ static void tcg_reg_alloc_op(TCGContext *s,
     }
 
     /* emit instruction */
-    tcg_out_op(s, opc, new_args, const_args);
+    tcg_out_op(s, op->opc, new_args, const_args);
     
     /* move the outputs in the correct register if needed */
     for(i = 0; i < nb_oargs; i++) {
-        ts = &s->temps[args[i]];
+        ts = &s->temps[op->args[i]];
         reg = new_args[i];
         if (ts->fixed_reg && ts->reg != reg) {
             tcg_out_mov(s, ts->type, ts->reg, reg);
@@ -2366,9 +2368,11 @@ static void tcg_reg_alloc_op(TCGContext *s,
 #define STACK_DIR(x) (x)
 #endif
 
-static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs,
-                               const TCGArg * const args, TCGLifeData arg_life)
+static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
 {
+    const int nb_oargs = op->callo;
+    const int nb_iargs = op->calli;
+    const TCGLifeData arg_life = op->life;
     int flags, nb_regs, i;
     TCGReg reg;
     TCGArg arg;
@@ -2379,8 +2383,8 @@ static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs,
     int allocate_args;
     TCGRegSet allocated_regs;
 
-    func_addr = (tcg_insn_unit *)(intptr_t)args[nb_oargs + nb_iargs];
-    flags = args[nb_oargs + nb_iargs + 1];
+    func_addr = (tcg_insn_unit *)(intptr_t)op->args[nb_oargs + nb_iargs];
+    flags = op->args[nb_oargs + nb_iargs + 1];
 
     nb_regs = ARRAY_SIZE(tcg_target_call_iarg_regs);
     if (nb_regs > nb_iargs) {
@@ -2399,8 +2403,8 @@ static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs,
     }
 
     stack_offset = TCG_TARGET_CALL_STACK_OFFSET;
-    for(i = nb_regs; i < nb_iargs; i++) {
-        arg = args[nb_oargs + i];
+    for (i = nb_regs; i < nb_iargs; i++) {
+        arg = op->args[nb_oargs + i];
 #ifdef TCG_TARGET_STACK_GROWSUP
         stack_offset -= sizeof(tcg_target_long);
 #endif
@@ -2417,8 +2421,8 @@ static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs,
     
     /* assign input registers */
     tcg_regset_set(allocated_regs, s->reserved_regs);
-    for(i = 0; i < nb_regs; i++) {
-        arg = args[nb_oargs + i];
+    for (i = 0; i < nb_regs; i++) {
+        arg = op->args[nb_oargs + i];
         if (arg != TCG_CALL_DUMMY_ARG) {
             ts = &s->temps[arg];
             reg = tcg_target_call_iarg_regs[i];
@@ -2441,9 +2445,9 @@ static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs,
     }
     
     /* mark dead temporaries and free the associated registers */
-    for(i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
+    for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
         if (IS_DEAD_ARG(i)) {
-            temp_dead(s, &s->temps[args[i]]);
+            temp_dead(s, &s->temps[op->args[i]]);
         }
     }
     
@@ -2468,7 +2472,7 @@ static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs,
 
     /* assign output registers and emit moves if needed */
     for(i = 0; i < nb_oargs; i++) {
-        arg = args[i];
+        arg = op->args[i];
         ts = &s->temps[arg];
         reg = tcg_target_call_oarg_regs[i];
         tcg_debug_assert(s->reg_to_temp[reg] == NULL);
@@ -2611,8 +2615,6 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
         TCGOp * const op = &s->gen_op_buf[oi];
         TCGOpcode opc = op->opc;
-        const TCGOpDef *def = &tcg_op_defs[opc];
-        TCGLifeData arg_life = op->life;
 
         oi_next = op->next;
 #ifdef CONFIG_PROFILER
@@ -2622,11 +2624,11 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
         switch (opc) {
         case INDEX_op_mov_i32:
         case INDEX_op_mov_i64:
-            tcg_reg_alloc_mov(s, def, op->args, arg_life);
+            tcg_reg_alloc_mov(s, op);
             break;
         case INDEX_op_movi_i32:
         case INDEX_op_movi_i64:
-            tcg_reg_alloc_movi(s, op->args, arg_life);
+            tcg_reg_alloc_movi(s, op);
             break;
         case INDEX_op_insn_start:
             if (num_insns >= 0) {
@@ -2651,17 +2653,13 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
             tcg_out_label(s, arg_label(op->args[0]), s->code_ptr);
             break;
         case INDEX_op_call:
-            tcg_reg_alloc_call(s, op->callo, op->calli, op->args, arg_life);
+            tcg_reg_alloc_call(s, op);
             break;
         default:
-            /* Sanity check that we've not introduced any unhandled opcodes. */
-            if (def->flags & TCG_OPF_NOT_PRESENT) {
-                tcg_abort();
-            }
             /* Note: in order to speed up the code, it would be much
                faster to have specialized register allocator functions for
                some common argument patterns */
-            tcg_reg_alloc_op(s, def, opc, op->args, arg_life);
+            tcg_reg_alloc_op(s, op);
             break;
         }
 #ifdef CONFIG_DEBUG_TCG
-- 
2.9.4


* [Qemu-devel] [PATCH 05/16] tcg: Introduce arg_temp
  2017-06-21  2:48 [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end Richard Henderson
                   ` (3 preceding siblings ...)
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 04/16] tcg: Propagate TCGOp down to allocators Richard Henderson
@ 2017-06-21  2:48 ` Richard Henderson
  2017-06-26 16:37   ` Alex Bennée
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 06/16] tcg: Add temp_global bit to TCGTemp Richard Henderson
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2017-06-21  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
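Reader's note, not part of the commit: arg_temp() is called below without a
TCGContext argument, so it presumably resolves a TCGArg against the global
context.  The sketch assumes it simply indexes that context's temp array.
The types, the global ctx, and the temp_arg()/ *_ptr() helpers are
hypothetical stand-ins.  The point is that once every use goes through an
accessor, the encoding stored in the argument can later change from an
index to a pointer (as the cover letter describes) without touching the
callers.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t Arg;       /* stand-in for TCGArg */

typedef struct Temp {        /* stand-in for TCGTemp */
    int type;
    int temp_local;
} Temp;

typedef struct Context {     /* stand-in for TCGContext */
    Temp temps[8];
    int nb_temps;
} Context;

static Context ctx;          /* stand-in for the global tcg_ctx */

/* Index encoding (assumed for this patch): Arg indexes ctx.temps. */
static inline Temp *arg_temp(Arg a)
{
    return &ctx.temps[a];
}

/* Hypothetical inverse for the index encoding. */
static inline Arg temp_arg(Temp *t)
{
    return (Arg)(t - ctx.temps);
}

/* Pointer encoding (the series' stated end goal): Arg is the pointer. */
static inline Temp *arg_temp_ptr(Arg a)
{
    return (Temp *)a;
}

static inline Arg temp_arg_ptr(Temp *t)
{
    return (Arg)t;
}
```

A caller written as arg_temp(arg)->type reads the same under either
encoding; only the accessor body differs.
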
 tcg/optimize.c |  4 ++--
 tcg/tcg.c      | 51 +++++++++++++++++++++++++--------------------------
 tcg/tcg.h      |  5 +++++
 3 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 1a1c6fb..d8c3a7e 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -133,7 +133,7 @@ static TCGArg find_better_copy(TCGContext *s, TCGArg temp)
     }
 
     /* If it is a temp, search for a temp local. */
-    if (!s->temps[temp].temp_local) {
+    if (!arg_temp(temp)->temp_local) {
         for (i = temps[temp].next_copy ; i != temp ; i = temps[i].next_copy) {
             if (s->temps[i].temp_local) {
                 return i;
@@ -207,7 +207,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
     }
     temps[dst].mask = mask;
 
-    if (s->temps[src].type == s->temps[dst].type) {
+    if (arg_temp(src)->type == arg_temp(dst)->type) {
         temps[dst].next_copy = temps[src].next_copy;
         temps[dst].prev_copy = src;
         temps[temps[dst].next_copy].prev_copy = dst;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index e2248a6..068ac51 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -977,11 +977,10 @@ static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
     return buf;
 }
 
-static char *tcg_get_arg_str_idx(TCGContext *s, char *buf,
-                                 int buf_size, int idx)
+static char *tcg_get_arg_str(TCGContext *s, char *buf,
+                             int buf_size, TCGArg arg)
 {
-    tcg_debug_assert(idx >= 0 && idx < s->nb_temps);
-    return tcg_get_arg_str_ptr(s, buf, buf_size, &s->temps[idx]);
+    return tcg_get_arg_str_ptr(s, buf, buf_size, arg_temp(arg));
 }
 
 /* Find helper name.  */
@@ -1084,14 +1083,14 @@ void tcg_dump_ops(TCGContext *s)
                             tcg_find_helper(s, op->args[nb_oargs + nb_iargs]),
                             op->args[nb_oargs + nb_iargs + 1], nb_oargs);
             for (i = 0; i < nb_oargs; i++) {
-                col += qemu_log(",%s", tcg_get_arg_str_idx(s, buf, sizeof(buf),
-                                                           op->args[i]));
+                col += qemu_log(",%s", tcg_get_arg_str(s, buf, sizeof(buf),
+                                                       op->args[i]));
             }
             for (i = 0; i < nb_iargs; i++) {
                 TCGArg arg = op->args[nb_oargs + i];
                 const char *t = "<dummy>";
                 if (arg != TCG_CALL_DUMMY_ARG) {
-                    t = tcg_get_arg_str_idx(s, buf, sizeof(buf), arg);
+                    t = tcg_get_arg_str(s, buf, sizeof(buf), arg);
                 }
                 col += qemu_log(",%s", t);
             }
@@ -1107,15 +1106,15 @@ void tcg_dump_ops(TCGContext *s)
                 if (k != 0) {
                     col += qemu_log(",");
                 }
-                col += qemu_log("%s", tcg_get_arg_str_idx(s, buf, sizeof(buf),
-                                                          op->args[k++]));
+                col += qemu_log("%s", tcg_get_arg_str(s, buf, sizeof(buf),
+                                                      op->args[k++]));
             }
             for (i = 0; i < nb_iargs; i++) {
                 if (k != 0) {
                     col += qemu_log(",");
                 }
-                col += qemu_log("%s", tcg_get_arg_str_idx(s, buf, sizeof(buf),
-                                                          op->args[k++]));
+                col += qemu_log("%s", tcg_get_arg_str(s, buf, sizeof(buf),
+                                                      op->args[k++]));
             }
             switch (c) {
             case INDEX_op_brcond_i32:
@@ -1707,7 +1706,7 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
             if (arg < nb_globals) {
                 dir = dir_temps[arg];
                 if (dir != 0 && temp_state[arg] == TS_DEAD) {
-                    TCGTemp *its = &s->temps[arg];
+                    TCGTemp *its = arg_temp(arg);
                     TCGOpcode lopc = (its->type == TCG_TYPE_I32
                                       ? INDEX_op_ld_i32
                                       : INDEX_op_ld_i64);
@@ -1778,7 +1777,7 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
 
             /* Sync outputs upon their last write.  */
             if (NEED_SYNC_ARG(i)) {
-                TCGTemp *its = &s->temps[arg];
+                TCGTemp *its = arg_temp(arg);
                 TCGOpcode sopc = (its->type == TCG_TYPE_I32
                                   ? INDEX_op_st_i32
                                   : INDEX_op_st_i64);
@@ -1809,7 +1808,7 @@ static void dump_regs(TCGContext *s)
 
     for(i = 0; i < s->nb_temps; i++) {
         ts = &s->temps[i];
-        printf("  %10s: ", tcg_get_arg_str_idx(s, buf, sizeof(buf), i));
+        printf("  %10s: ", tcg_get_arg_str_ptr(s, buf, sizeof(buf), ts));
         switch(ts->val_type) {
         case TEMP_VAL_REG:
             printf("%s", tcg_target_reg_names[ts->reg]);
@@ -2113,7 +2112,7 @@ static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
 
 static void tcg_reg_alloc_movi(TCGContext *s, const TCGOp *op)
 {
-    TCGTemp *ots = &s->temps[op->args[0]];
+    TCGTemp *ots = arg_temp(op->args[0]);
     tcg_target_ulong val = op->args[1];
 
     tcg_reg_alloc_do_movi(s, ots, val, op->life);
@@ -2127,8 +2126,8 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
     TCGType otype, itype;
 
     tcg_regset_set(allocated_regs, s->reserved_regs);
-    ots = &s->temps[op->args[0]];
-    ts = &s->temps[op->args[1]];
+    ots = arg_temp(op->args[0]);
+    ts = arg_temp(op->args[1]);
 
     /* Note that otype != itype for no-op truncation.  */
     otype = ots->type;
@@ -2225,7 +2224,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
         i = def->sorted_args[nb_oargs + k];
         arg = op->args[i];
         arg_ct = &def->args_ct[i];
-        ts = &s->temps[arg];
+        ts = arg_temp(arg);
 
         if (ts->val_type == TEMP_VAL_CONST
             && tcg_target_const_match(ts->val, ts->type, arg_ct)) {
@@ -2282,7 +2281,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
     /* mark dead temporaries and free the associated registers */
     for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
         if (IS_DEAD_ARG(i)) {
-            temp_dead(s, &s->temps[op->args[i]]);
+            temp_dead(s, arg_temp(op->args[i]));
         }
     }
 
@@ -2308,7 +2307,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
             i = def->sorted_args[k];
             arg = op->args[i];
             arg_ct = &def->args_ct[i];
-            ts = &s->temps[arg];
+            ts = arg_temp(arg);
             if ((arg_ct->ct & TCG_CT_ALIAS)
                 && !const_args[arg_ct->alias_index]) {
                 reg = new_args[arg_ct->alias_index];
@@ -2349,7 +2348,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
     
     /* move the outputs in the correct register if needed */
     for(i = 0; i < nb_oargs; i++) {
-        ts = &s->temps[op->args[i]];
+        ts = arg_temp(op->args[i]);
         reg = new_args[i];
         if (ts->fixed_reg && ts->reg != reg) {
             tcg_out_mov(s, ts->type, ts->reg, reg);
@@ -2409,7 +2408,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
         stack_offset -= sizeof(tcg_target_long);
 #endif
         if (arg != TCG_CALL_DUMMY_ARG) {
-            ts = &s->temps[arg];
+            ts = arg_temp(arg);
             temp_load(s, ts, tcg_target_available_regs[ts->type],
                       s->reserved_regs);
             tcg_out_st(s, ts->type, ts->reg, TCG_REG_CALL_STACK, stack_offset);
@@ -2424,7 +2423,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
     for (i = 0; i < nb_regs; i++) {
         arg = op->args[nb_oargs + i];
         if (arg != TCG_CALL_DUMMY_ARG) {
-            ts = &s->temps[arg];
+            ts = arg_temp(arg);
             reg = tcg_target_call_iarg_regs[i];
             tcg_reg_free(s, reg, allocated_regs);
 
@@ -2447,7 +2446,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
     /* mark dead temporaries and free the associated registers */
     for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
         if (IS_DEAD_ARG(i)) {
-            temp_dead(s, &s->temps[op->args[i]]);
+            temp_dead(s, arg_temp(op->args[i]));
         }
     }
     
@@ -2473,7 +2472,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
     /* assign output registers and emit moves if needed */
     for(i = 0; i < nb_oargs; i++) {
         arg = op->args[i];
-        ts = &s->temps[arg];
+        ts = arg_temp(arg);
         reg = tcg_target_call_oarg_regs[i];
         tcg_debug_assert(s->reg_to_temp[reg] == NULL);
 
@@ -2646,7 +2645,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
             }
             break;
         case INDEX_op_discard:
-            temp_dead(s, &s->temps[op->args[0]]);
+            temp_dead(s, arg_temp(op->args[0]));
             break;
         case INDEX_op_set_label:
             tcg_reg_alloc_bb_end(s, s->reserved_regs);
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 720e04e..70d9fda 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -724,6 +724,11 @@ struct TCGContext {
 extern TCGContext tcg_ctx;
 extern bool parallel_cpus;
 
+static inline TCGTemp *arg_temp(TCGArg a)
+{
+    return &tcg_ctx.temps[a];
+}
+
 static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)
 {
     tcg_ctx.gen_op_buf[op_idx].args[arg] = v;
-- 
2.9.4


* [Qemu-devel] [PATCH 06/16] tcg: Add temp_global bit to TCGTemp
  2017-06-21  2:48 [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end Richard Henderson
                   ` (4 preceding siblings ...)
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 05/16] tcg: Introduce arg_temp Richard Henderson
@ 2017-06-21  2:48 ` Richard Henderson
  2017-06-27  8:39   ` Alex Bennée
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 07/16] tcg: Return NULL temp for TCG_CALL_DUMMY_ARG Richard Henderson
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2017-06-21  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

This avoids needing to test the index of a temp against nb_globals.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
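For illustration only (not part of the patch): a minimal standalone
sketch of the idea, assuming cut-down stand-ins for TCGTemp and
TCGContext.  All Demo*/demo_* names are hypothetical.  It shows that a
flag set at global-allocation time answers the same question as the
old index comparison, without needing the context or any pointer
arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for TCGTemp and TCGContext. */
typedef struct DemoTemp {
    unsigned int temp_global:1;
    unsigned int temp_local:1;
} DemoTemp;

typedef struct DemoContext {
    int nb_globals;
    int nb_temps;
    DemoTemp temps[8];
} DemoContext;

/* Like tcg_global_alloc in the patch: globals come first, and the
   bit is set once, at allocation time. */
static DemoTemp *demo_global_alloc(DemoContext *s)
{
    DemoTemp *ts;

    assert(s->nb_globals == s->nb_temps);
    assert(s->nb_temps < 8);
    s->nb_globals++;
    ts = &s->temps[s->nb_temps++];
    ts->temp_global = 1;
    ts->temp_local = 0;
    return ts;
}

/* Allocate a non-global temp, optionally a "local" one. */
static DemoTemp *demo_temp_alloc(DemoContext *s, bool local)
{
    DemoTemp *ts;

    assert(s->nb_temps < 8);
    ts = &s->temps[s->nb_temps++];
    ts->temp_global = 0;
    ts->temp_local = local;
    return ts;
}

/* Old-style test: needs the context and an index computation. */
static bool demo_is_global_by_index(const DemoContext *s, const DemoTemp *ts)
{
    return (ts - s->temps) < s->nb_globals;
}

/* New-style test: only needs the temp itself. */
static bool demo_is_global_by_bit(const DemoTemp *ts)
{
    return ts->temp_global;
}
```

Both predicates agree as long as every global is created through the
global allocator, which is the invariant the assert in
tcg_global_alloc already relies on.
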
 tcg/optimize.c | 15 ++++++++-------
 tcg/tcg.c      | 11 ++++++++---
 tcg/tcg.h      | 12 ++++++++----
 3 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index d8c3a7e..55f9e83 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -116,25 +116,26 @@ static TCGOpcode op_to_movi(TCGOpcode op)
     }
 }
 
-static TCGArg find_better_copy(TCGContext *s, TCGArg temp)
+static TCGArg find_better_copy(TCGContext *s, TCGArg arg)
 {
+    TCGTemp *ts = arg_temp(arg);
     TCGArg i;
 
     /* If this is already a global, we can't do better. */
-    if (temp < s->nb_globals) {
-        return temp;
+    if (ts->temp_global) {
+        return arg;
     }
 
     /* Search for a global first. */
-    for (i = temps[temp].next_copy ; i != temp ; i = temps[i].next_copy) {
+    for (i = temps[arg].next_copy ; i != arg; i = temps[i].next_copy) {
         if (i < s->nb_globals) {
             return i;
         }
     }
 
     /* If it is a temp, search for a temp local. */
-    if (!arg_temp(temp)->temp_local) {
-        for (i = temps[temp].next_copy ; i != temp ; i = temps[i].next_copy) {
+    if (!ts->temp_local) {
+        for (i = temps[arg].next_copy ; i != arg; i = temps[i].next_copy) {
             if (s->temps[i].temp_local) {
                 return i;
             }
@@ -142,7 +143,7 @@ static TCGArg find_better_copy(TCGContext *s, TCGArg temp)
     }
 
     /* Failure to find a better representation, return the same temp. */
-    return temp;
+    return arg;
 }
 
 static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 068ac51..0bb88b1 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -489,9 +489,14 @@ static inline TCGTemp *tcg_temp_alloc(TCGContext *s)
 
 static inline TCGTemp *tcg_global_alloc(TCGContext *s)
 {
+    TCGTemp *ts;
+
     tcg_debug_assert(s->nb_globals == s->nb_temps);
     s->nb_globals++;
-    return tcg_temp_alloc(s);
+    ts = tcg_temp_alloc(s);
+    ts->temp_global = 1;
+
+    return ts;
 }
 
 static int tcg_global_reg_new_internal(TCGContext *s, TCGType type,
@@ -967,7 +972,7 @@ static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
 {
     int idx = temp_idx(s, ts);
 
-    if (idx < s->nb_globals) {
+    if (ts->temp_global) {
         pstrcpy(buf, buf_size, ts->name);
     } else if (ts->temp_local) {
         snprintf(buf, buf_size, "loc%d", idx - s->nb_globals);
@@ -1905,7 +1910,7 @@ static void temp_free_or_dead(TCGContext *s, TCGTemp *ts, int free_or_dead)
     }
     ts->val_type = (free_or_dead < 0
                     || ts->temp_local
-                    || temp_idx(s, ts) < s->nb_globals
+                    || ts->temp_global
                     ? TEMP_VAL_MEM : TEMP_VAL_DEAD);
 }
 
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 70d9fda..3b35344 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -586,10 +586,14 @@ typedef struct TCGTemp {
     unsigned int indirect_base:1;
     unsigned int mem_coherent:1;
     unsigned int mem_allocated:1;
-    unsigned int temp_local:1; /* If true, the temp is saved across
-                                  basic blocks. Otherwise, it is not
-                                  preserved across basic blocks. */
-    unsigned int temp_allocated:1; /* never used for code gen */
+    /* If true, the temp is saved across both basic blocks and
+       translation blocks.  */
+    unsigned int temp_global:1;
+    /* If true, the temp is saved across basic blocks but dead
+       at the end of translation blocks.  If false, the temp is
+       dead at the end of basic blocks.  */
+    unsigned int temp_local:1;
+    unsigned int temp_allocated:1;
 
     tcg_target_long val;
     struct TCGTemp *mem_base;
-- 
2.9.4


* [Qemu-devel] [PATCH 07/16] tcg: Return NULL temp for TCG_CALL_DUMMY_ARG
  2017-06-21  2:48 [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end Richard Henderson
                   ` (5 preceding siblings ...)
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 06/16] tcg: Add temp_global bit to TCGTemp Richard Henderson
@ 2017-06-21  2:48 ` Richard Henderson
  2017-06-27  8:47   ` Alex Bennée
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 08/16] tcg: Introduce temp_arg Richard Henderson
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2017-06-21  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
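For illustration only (not part of the patch): a standalone sketch of
how a caller can use the NULL return to skip dummy call arguments.
The Demo*/demo_*/DEMO_* names are hypothetical, and the sentinel value
(all bits set) and the DEAD flag value are assumptions made for the
sketch, not taken from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t DemoArg;

/* Assumed sentinel for "no argument in this slot". */
#define DEMO_DUMMY_ARG ((DemoArg)-1)
/* Assumed flag value for a dead temp. */
#define DEMO_TS_DEAD 1

typedef struct DemoTemp {
    int state;
} DemoTemp;

static DemoTemp demo_temps[4];

/* Same shape as arg_temp after this patch: dummy maps to NULL. */
static DemoTemp *demo_arg_temp(DemoArg a)
{
    return a == DEMO_DUMMY_ARG ? NULL : &demo_temps[a];
}

/* Count arguments whose temp is dead, skipping dummy slots with a
   single NULL check instead of comparing against the sentinel. */
static int demo_count_dead(const DemoArg *args, int n)
{
    int i, dead = 0;

    for (i = 0; i < n; i++) {
        DemoTemp *ts = demo_arg_temp(args[i]);
        if (ts && (ts->state & DEMO_TS_DEAD)) {
            dead++;
        }
    }
    return dead;
}
```

The flip side is that any caller which may see a dummy argument has to
test the returned pointer before dereferencing it.
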
 tcg/tcg.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index 3b35344..6c357e7 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -730,7 +730,7 @@ extern bool parallel_cpus;
 
 static inline TCGTemp *arg_temp(TCGArg a)
 {
-    return &tcg_ctx.temps[a];
+    return a == TCG_CALL_DUMMY_ARG ? NULL : &tcg_ctx.temps[a];
 }
 
 static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)
-- 
2.9.4


* [Qemu-devel] [PATCH 08/16] tcg: Introduce temp_arg
  2017-06-21  2:48 [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end Richard Henderson
                   ` (6 preceding siblings ...)
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 07/16] tcg: Return NULL temp for TCG_CALL_DUMMY_ARG Richard Henderson
@ 2017-06-21  2:48 ` Richard Henderson
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 09/16] tcg: Use per-temp state data in liveness Richard Henderson
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-06-21  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
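For illustration only (not part of the patch): a standalone sketch of
temp_arg as the inverse of arg_temp while TCGArg still holds an index.
The Demo*/demo_* names and the fixed-size array are hypothetical.  The
point is that call sites written against the two helpers should not
need touching again if the encoding behind them later changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t DemoArg;

typedef struct DemoTemp {
    int type;
} DemoTemp;

typedef struct DemoContext {
    int nb_temps;
    DemoTemp temps[16];
} DemoContext;

static DemoContext demo_ctx = { .nb_temps = 3 };

/* Temp pointer -> argument value (currently: index into temps[]). */
static DemoArg demo_temp_arg(DemoTemp *ts)
{
    ptrdiff_t n = ts - demo_ctx.temps;

    assert(n >= 0 && n < demo_ctx.nb_temps);
    return (DemoArg)n;
}

/* Argument value -> temp pointer. */
static DemoTemp *demo_arg_temp(DemoArg a)
{
    return &demo_ctx.temps[a];
}
```
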
 tcg/tcg.c | 4 ++--
 tcg/tcg.h | 7 +++++++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 0bb88b1..0d758e4 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1718,7 +1718,7 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
                     TCGOp *lop = tcg_op_insert_before(s, op, lopc, 3);
 
                     lop->args[0] = dir;
-                    lop->args[1] = temp_idx(s, its->mem_base);
+                    lop->args[1] = temp_arg(its->mem_base);
                     lop->args[2] = its->mem_offset;
 
                     /* Loaded, but synced with memory.  */
@@ -1789,7 +1789,7 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
                 TCGOp *sop = tcg_op_insert_after(s, op, sopc, 3);
 
                 sop->args[0] = dir;
-                sop->args[1] = temp_idx(s, its->mem_base);
+                sop->args[1] = temp_arg(its->mem_base);
                 sop->args[2] = its->mem_offset;
 
                 temp_state[arg] = TS_MEM;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 6c357e7..80012b5 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -728,6 +728,13 @@ struct TCGContext {
 extern TCGContext tcg_ctx;
 extern bool parallel_cpus;
 
+static inline TCGArg temp_arg(TCGTemp *ts)
+{
+    ptrdiff_t n = ts - tcg_ctx.temps;
+    tcg_debug_assert(n >= 0 && n < tcg_ctx.nb_temps);
+    return n;
+}
+
 static inline TCGTemp *arg_temp(TCGArg a)
 {
     return a == TCG_CALL_DUMMY_ARG ? NULL : &tcg_ctx.temps[a];
-- 
2.9.4


* [Qemu-devel] [PATCH 09/16] tcg: Use per-temp state data in liveness
  2017-06-21  2:48 [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end Richard Henderson
                   ` (7 preceding siblings ...)
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 08/16] tcg: Introduce temp_arg Richard Henderson
@ 2017-06-21  2:48 ` Richard Henderson
  2017-06-27  8:57   ` Alex Bennée
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 10/16] tcg: Avoid loops against variable bounds Richard Henderson
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2017-06-21  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

This avoids having to allocate external memory for each temporary.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
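For illustration only (not part of the patch): a standalone sketch of
liveness state kept inside each temp rather than in a separately
allocated byte array.  The Demo*/demo_*/DEMO_* names are hypothetical
and the flag values are assumptions; the function follows the shape of
tcg_la_bb_end in the diff below.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag values for the sketch. */
enum { DEMO_TS_DEAD = 1, DEMO_TS_MEM = 2 };

typedef struct DemoTemp {
    unsigned int temp_local:1;
    uint8_t state;              /* liveness state lives in the temp */
} DemoTemp;

typedef struct DemoContext {
    int nb_globals;
    int nb_temps;
    DemoTemp temps[8];
} DemoContext;

/* End of basic block: everything is dead; globals and local temps
   must also be in memory.  No side array is passed in or allocated. */
static void demo_la_bb_end(DemoContext *s)
{
    int ng = s->nb_globals;
    int nt = s->nb_temps;
    int i;

    for (i = 0; i < ng; ++i) {
        s->temps[i].state = DEMO_TS_DEAD | DEMO_TS_MEM;
    }
    for (i = ng; i < nt; ++i) {
        s->temps[i].state = (s->temps[i].temp_local
                             ? DEMO_TS_DEAD | DEMO_TS_MEM
                             : DEMO_TS_DEAD);
    }
}
```

With the state in the temp, code that already holds a temp pointer can
read or update liveness directly, with no index into a side array.
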
 tcg/tcg.c | 232 ++++++++++++++++++++++++++++++--------------------------------
 tcg/tcg.h |   6 ++
 2 files changed, 120 insertions(+), 118 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 0d758e4..e78140b 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1399,42 +1399,54 @@ TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *old_op,
 
 /* liveness analysis: end of function: all temps are dead, and globals
    should be in memory. */
-static inline void tcg_la_func_end(TCGContext *s, uint8_t *temp_state)
+static void tcg_la_func_end(TCGContext *s)
 {
-    memset(temp_state, TS_DEAD | TS_MEM, s->nb_globals);
-    memset(temp_state + s->nb_globals, TS_DEAD, s->nb_temps - s->nb_globals);
+    int ng = s->nb_globals;
+    int nt = s->nb_temps;
+    int i;
+
+    for (i = 0; i < ng; ++i) {
+        s->temps[i].state = TS_DEAD | TS_MEM;
+    }
+    for (i = ng; i < nt; ++i) {
+        s->temps[i].state = TS_DEAD;
+    }
 }
 
 /* liveness analysis: end of basic block: all temps are dead, globals
    and local temps should be in memory. */
-static inline void tcg_la_bb_end(TCGContext *s, uint8_t *temp_state)
+static void tcg_la_bb_end(TCGContext *s)
 {
-    int i, n;
+    int ng = s->nb_globals;
+    int nt = s->nb_temps;
+    int i;
 
-    tcg_la_func_end(s, temp_state);
-    for (i = s->nb_globals, n = s->nb_temps; i < n; i++) {
-        if (s->temps[i].temp_local) {
-            temp_state[i] |= TS_MEM;
-        }
+    for (i = 0; i < ng; ++i) {
+        s->temps[i].state = TS_DEAD | TS_MEM;
+    }
+    for (i = ng; i < nt; ++i) {
+        s->temps[i].state = (s->temps[i].temp_local
+                             ? TS_DEAD | TS_MEM
+                             : TS_DEAD);
     }
 }
 
 /* Liveness analysis : update the opc_arg_life array to tell if a
    given input arguments is dead. Instructions updating dead
    temporaries are removed. */
-static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
+static void liveness_pass_1(TCGContext *s)
 {
     int nb_globals = s->nb_globals;
     int oi, oi_prev;
 
-    tcg_la_func_end(s, temp_state);
+    tcg_la_func_end(s);
 
     for (oi = s->gen_op_buf[0].prev; oi != 0; oi = oi_prev) {
         int i, nb_iargs, nb_oargs;
         TCGOpcode opc_new, opc_new2;
         bool have_opc_new2;
         TCGLifeData arg_life = 0;
-        TCGArg arg;
+        TCGTemp *arg_ts;
 
         TCGOp * const op = &s->gen_op_buf[oi];
         TCGOpcode opc = op->opc;
@@ -1454,8 +1466,8 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
                 /* pure functions can be removed if their result is unused */
                 if (call_flags & TCG_CALL_NO_SIDE_EFFECTS) {
                     for (i = 0; i < nb_oargs; i++) {
-                        arg = op->args[i];
-                        if (temp_state[arg] != TS_DEAD) {
+                        arg_ts = arg_temp(op->args[i]);
+                        if (arg_ts->state != TS_DEAD) {
                             goto do_not_remove_call;
                         }
                     }
@@ -1465,41 +1477,41 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
 
                     /* output args are dead */
                     for (i = 0; i < nb_oargs; i++) {
-                        arg = op->args[i];
-                        if (temp_state[arg] & TS_DEAD) {
+                        arg_ts = arg_temp(op->args[i]);
+                        if (arg_ts->state & TS_DEAD) {
                             arg_life |= DEAD_ARG << i;
                         }
-                        if (temp_state[arg] & TS_MEM) {
+                        if (arg_ts->state & TS_MEM) {
                             arg_life |= SYNC_ARG << i;
                         }
-                        temp_state[arg] = TS_DEAD;
+                        arg_ts->state = TS_DEAD;
                     }
 
                     if (!(call_flags & (TCG_CALL_NO_WRITE_GLOBALS |
                                         TCG_CALL_NO_READ_GLOBALS))) {
                         /* globals should go back to memory */
-                        memset(temp_state, TS_DEAD | TS_MEM, nb_globals);
+                        for (i = 0; i < nb_globals; i++) {
+                            s->temps[i].state = TS_DEAD | TS_MEM;
+                        }
                     } else if (!(call_flags & TCG_CALL_NO_READ_GLOBALS)) {
                         /* globals should be synced to memory */
                         for (i = 0; i < nb_globals; i++) {
-                            temp_state[i] |= TS_MEM;
+                            s->temps[i].state |= TS_MEM;
                         }
                     }
 
                     /* record arguments that die in this helper */
                     for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
-                        arg = op->args[i];
-                        if (arg != TCG_CALL_DUMMY_ARG) {
-                            if (temp_state[arg] & TS_DEAD) {
-                                arg_life |= DEAD_ARG << i;
-                            }
+                        arg_ts = arg_temp(op->args[i]);
+                        if (arg_ts && arg_ts->state & TS_DEAD) {
+                            arg_life |= DEAD_ARG << i;
                         }
                     }
                     /* input arguments are live for preceding opcodes */
                     for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
-                        arg = op->args[i];
-                        if (arg != TCG_CALL_DUMMY_ARG) {
-                            temp_state[arg] &= ~TS_DEAD;
+                        arg_ts = arg_temp(op->args[i]);
+                        if (arg_ts) {
+                            arg_ts->state &= ~TS_DEAD;
                         }
                     }
                 }
@@ -1509,7 +1521,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
             break;
         case INDEX_op_discard:
             /* mark the temporary as dead */
-            temp_state[op->args[0]] = TS_DEAD;
+            arg_temp(op->args[0])->state = TS_DEAD;
             break;
 
         case INDEX_op_add2_i32:
@@ -1530,8 +1542,8 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
                the low part.  The result can be optimized to a simple
                add or sub.  This happens often for x86_64 guest when the
                cpu mode is set to 32 bit.  */
-            if (temp_state[op->args[1]] == TS_DEAD) {
-                if (temp_state[op->args[0]] == TS_DEAD) {
+            if (arg_temp(op->args[1])->state == TS_DEAD) {
+                if (arg_temp(op->args[0])->state == TS_DEAD) {
                     goto do_remove;
                 }
                 /* Replace the opcode and adjust the args in place,
@@ -1568,8 +1580,8 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
         do_mul2:
             nb_iargs = 2;
             nb_oargs = 2;
-            if (temp_state[op->args[1]] == TS_DEAD) {
-                if (temp_state[op->args[0]] == TS_DEAD) {
+            if (arg_temp(op->args[1])->state == TS_DEAD) {
+                if (arg_temp(op->args[0])->state == TS_DEAD) {
                     /* Both parts of the operation are dead.  */
                     goto do_remove;
                 }
@@ -1577,7 +1589,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
                 op->opc = opc = opc_new;
                 op->args[1] = op->args[2];
                 op->args[2] = op->args[3];
-            } else if (temp_state[op->args[0]] == TS_DEAD && have_opc_new2) {
+            } else if (arg_temp(op->args[0])->state == TS_DEAD && have_opc_new2) {
                 /* The low part of the operation is dead; generate the high. */
                 op->opc = opc = opc_new2;
                 op->args[0] = op->args[1];
@@ -1600,7 +1612,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
                implies side effects */
             if (!(def->flags & TCG_OPF_SIDE_EFFECTS) && nb_oargs != 0) {
                 for (i = 0; i < nb_oargs; i++) {
-                    if (temp_state[op->args[i]] != TS_DEAD) {
+                    if (arg_temp(op->args[i])->state != TS_DEAD) {
                         goto do_not_remove;
                     }
                 }
@@ -1610,36 +1622,36 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
             do_not_remove:
                 /* output args are dead */
                 for (i = 0; i < nb_oargs; i++) {
-                    arg = op->args[i];
-                    if (temp_state[arg] & TS_DEAD) {
+                    arg_ts = arg_temp(op->args[i]);
+                    if (arg_ts->state & TS_DEAD) {
                         arg_life |= DEAD_ARG << i;
                     }
-                    if (temp_state[arg] & TS_MEM) {
+                    if (arg_ts->state & TS_MEM) {
                         arg_life |= SYNC_ARG << i;
                     }
-                    temp_state[arg] = TS_DEAD;
+                    arg_ts->state = TS_DEAD;
                 }
 
                 /* if end of basic block, update */
                 if (def->flags & TCG_OPF_BB_END) {
-                    tcg_la_bb_end(s, temp_state);
+                    tcg_la_bb_end(s);
                 } else if (def->flags & TCG_OPF_SIDE_EFFECTS) {
                     /* globals should be synced to memory */
                     for (i = 0; i < nb_globals; i++) {
-                        temp_state[i] |= TS_MEM;
+                        s->temps[i].state |= TS_MEM;
                     }
                 }
 
                 /* record arguments that die in this opcode */
                 for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-                    arg = op->args[i];
-                    if (temp_state[arg] & TS_DEAD) {
+                    arg_ts = arg_temp(op->args[i]);
+                    if (arg_ts->state & TS_DEAD) {
                         arg_life |= DEAD_ARG << i;
                     }
                 }
                 /* input arguments are live for preceding opcodes */
                 for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-                    temp_state[op->args[i]] &= ~TS_DEAD;
+                    arg_temp(op->args[i])->state &= ~TS_DEAD;
                 }
             }
             break;
@@ -1649,16 +1661,12 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
 }
 
 /* Liveness analysis: Convert indirect regs to direct temporaries.  */
-static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
+static bool liveness_pass_2(TCGContext *s)
 {
     int nb_globals = s->nb_globals;
-    int16_t *dir_temps;
     int i, oi, oi_next;
     bool changes = false;
 
-    dir_temps = tcg_malloc(nb_globals * sizeof(int16_t));
-    memset(dir_temps, 0, nb_globals * sizeof(int16_t));
-
     /* Create a temporary for each indirect global.  */
     for (i = 0; i < nb_globals; ++i) {
         TCGTemp *its = &s->temps[i];
@@ -1666,19 +1674,19 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
             TCGTemp *dts = tcg_temp_alloc(s);
             dts->type = its->type;
             dts->base_type = its->base_type;
-            dir_temps[i] = temp_idx(s, dts);
+            its->state_ptr = dts;
         }
+        /* All globals begin dead.  */
+        its->state = TS_DEAD;
     }
 
-    memset(temp_state, TS_DEAD, nb_globals);
-
     for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
         TCGOp *op = &s->gen_op_buf[oi];
         TCGOpcode opc = op->opc;
         const TCGOpDef *def = &tcg_op_defs[opc];
         TCGLifeData arg_life = op->life;
         int nb_iargs, nb_oargs, call_flags;
-        TCGArg arg, dir;
+        TCGTemp *arg_ts, *dir_ts;
 
         oi_next = op->next;
 
@@ -1706,24 +1714,20 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
 
         /* Make sure that input arguments are available.  */
         for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
-            arg = op->args[i];
-            /* Note this unsigned test catches TCG_CALL_ARG_DUMMY too.  */
-            if (arg < nb_globals) {
-                dir = dir_temps[arg];
-                if (dir != 0 && temp_state[arg] == TS_DEAD) {
-                    TCGTemp *its = arg_temp(arg);
-                    TCGOpcode lopc = (its->type == TCG_TYPE_I32
-                                      ? INDEX_op_ld_i32
-                                      : INDEX_op_ld_i64);
-                    TCGOp *lop = tcg_op_insert_before(s, op, lopc, 3);
-
-                    lop->args[0] = dir;
-                    lop->args[1] = temp_arg(its->mem_base);
-                    lop->args[2] = its->mem_offset;
-
-                    /* Loaded, but synced with memory.  */
-                    temp_state[arg] = TS_MEM;
-                }
+            arg_ts = arg_temp(op->args[i]);
+            dir_ts = arg_ts->state_ptr;
+            if (dir_ts && arg_ts->state == TS_DEAD) {
+                TCGOpcode lopc = (arg_ts->type == TCG_TYPE_I32
+                                  ? INDEX_op_ld_i32
+                                  : INDEX_op_ld_i64);
+                TCGOp *lop = tcg_op_insert_before(s, op, lopc, 3);
+
+                lop->args[0] = temp_arg(dir_ts);
+                lop->args[1] = temp_arg(arg_ts->mem_base);
+                lop->args[2] = arg_ts->mem_offset;
+
+                /* Loaded, but synced with memory.  */
+                arg_ts->state = TS_MEM;
             }
         }
 
@@ -1731,15 +1735,13 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
            No action is required except keeping temp_state up to date
            so that we reload when needed.  */
         for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
-            arg = op->args[i];
-            if (arg < nb_globals) {
-                dir = dir_temps[arg];
-                if (dir != 0) {
-                    op->args[i] = dir;
-                    changes = true;
-                    if (IS_DEAD_ARG(i)) {
-                        temp_state[arg] = TS_DEAD;
-                    }
+            arg_ts = arg_temp(op->args[i]);
+            dir_ts = arg_ts->state_ptr;
+            if (dir_ts) {
+                op->args[i] = temp_arg(dir_ts);
+                changes = true;
+                if (IS_DEAD_ARG(i)) {
+                    arg_ts->state = TS_DEAD;
                 }
             }
         }
@@ -1752,51 +1754,49 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
             for (i = 0; i < nb_globals; ++i) {
                 /* Liveness should see that globals are synced back,
                    that is, either TS_DEAD or TS_MEM.  */
-                tcg_debug_assert(dir_temps[i] == 0
-                                 || temp_state[i] != 0);
+                arg_ts = &s->temps[i];
+                tcg_debug_assert(arg_ts->state_ptr == 0
+                                 || arg_ts->state != 0);
             }
         } else {
             for (i = 0; i < nb_globals; ++i) {
                 /* Liveness should see that globals are saved back,
                    that is, TS_DEAD, waiting to be reloaded.  */
-                tcg_debug_assert(dir_temps[i] == 0
-                                 || temp_state[i] == TS_DEAD);
+                arg_ts = &s->temps[i];
+                tcg_debug_assert(arg_ts->state_ptr == 0
+                                 || arg_ts->state == TS_DEAD);
             }
         }
 
         /* Outputs become available.  */
         for (i = 0; i < nb_oargs; i++) {
-            arg = op->args[i];
-            if (arg >= nb_globals) {
-                continue;
-            }
-            dir = dir_temps[arg];
-            if (dir == 0) {
+            arg_ts = arg_temp(op->args[i]);
+            dir_ts = arg_ts->state_ptr;
+            if (!dir_ts) {
                 continue;
             }
-            op->args[i] = dir;
+            op->args[i] = temp_arg(dir_ts);
             changes = true;
 
             /* The output is now live and modified.  */
-            temp_state[arg] = 0;
+            arg_ts->state = 0;
 
             /* Sync outputs upon their last write.  */
             if (NEED_SYNC_ARG(i)) {
-                TCGTemp *its = arg_temp(arg);
-                TCGOpcode sopc = (its->type == TCG_TYPE_I32
+                TCGOpcode sopc = (arg_ts->type == TCG_TYPE_I32
                                   ? INDEX_op_st_i32
                                   : INDEX_op_st_i64);
                 TCGOp *sop = tcg_op_insert_after(s, op, sopc, 3);
 
-                sop->args[0] = dir;
-                sop->args[1] = temp_arg(its->mem_base);
-                sop->args[2] = its->mem_offset;
+                sop->args[0] = temp_arg(dir_ts);
+                sop->args[1] = temp_arg(arg_ts->mem_base);
+                sop->args[2] = arg_ts->mem_offset;
 
-                temp_state[arg] = TS_MEM;
+                arg_ts->state = TS_MEM;
             }
             /* Drop outputs that are dead.  */
             if (IS_DEAD_ARG(i)) {
-                temp_state[arg] = TS_DEAD;
+                arg_ts->state = TS_DEAD;
             }
         }
     }
@@ -2569,27 +2569,23 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     s->la_time -= profile_getclock();
 #endif
 
-    {
-        uint8_t *temp_state = tcg_malloc(s->nb_temps + s->nb_indirects);
-
-        liveness_pass_1(s, temp_state);
+    liveness_pass_1(s);
 
-        if (s->nb_indirects > 0) {
+    if (s->nb_indirects > 0) {
 #ifdef DEBUG_DISAS
-            if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP_IND)
-                         && qemu_log_in_addr_range(tb->pc))) {
-                qemu_log_lock();
-                qemu_log("OP before indirect lowering:\n");
-                tcg_dump_ops(s);
-                qemu_log("\n");
-                qemu_log_unlock();
-            }
+        if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP_IND)
+                     && qemu_log_in_addr_range(tb->pc))) {
+            qemu_log_lock();
+            qemu_log("OP before indirect lowering:\n");
+            tcg_dump_ops(s);
+            qemu_log("\n");
+            qemu_log_unlock();
+        }
 #endif
-            /* Replace indirect temps with direct temps.  */
-            if (liveness_pass_2(s, temp_state)) {
-                /* If changes were made, re-run liveness.  */
-                liveness_pass_1(s, temp_state);
-            }
+        /* Replace indirect temps with direct temps.  */
+        if (liveness_pass_2(s)) {
+            /* If changes were made, re-run liveness.  */
+            liveness_pass_1(s);
         }
     }
 
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 80012b5..1eeeca5 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -599,6 +599,12 @@ typedef struct TCGTemp {
     struct TCGTemp *mem_base;
     intptr_t mem_offset;
     const char *name;
+
+    /* Pass-specific information that can be stored for a temporary.
+       One word worth of integer data, and one pointer to data
+       allocated separately.  */
+    uintptr_t state;
+    void *state_ptr;
 } TCGTemp;
 
 typedef struct TCGContext TCGContext;
-- 
2.9.4
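
Reader's note, not part of the patch: a minimal standalone sketch of how a
pass might use the two new per-temp fields added to TCGTemp above. DemoTemp,
the DEMO_TS_* values and the demo_* functions are hypothetical stand-ins, and
only the `state` and `state_ptr` members are taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TCGTemp; only the two new fields are modeled. */
typedef struct DemoTemp {
    uintptr_t state;    /* one word of pass-specific integer data */
    void *state_ptr;    /* pass-specific data allocated separately */
} DemoTemp;

/* Illustrative state bits, loosely modeled on TS_DEAD and TS_MEM. */
enum { DEMO_TS_DEAD = 1, DEMO_TS_MEM = 2 };

/* A pass claims both fields when it starts.  DIRECT may be NULL when the
   pass has no side data to attach. */
static void demo_pass_init(DemoTemp *temps, size_t n, DemoTemp *direct)
{
    size_t i;

    assert(temps != NULL);
    for (i = 0; i < n; i++) {
        temps[i].state = DEMO_TS_DEAD;
        temps[i].state_ptr = direct ? &direct[i] : NULL;
    }
}

/* Later queries read the temp itself instead of indexing a side array. */
static int demo_is_dead(const DemoTemp *ts)
{
    return (ts->state & DEMO_TS_DEAD) != 0;
}
```

Because the fields are shared between passes, each pass has to reinitialize
them before relying on their contents.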

* [Qemu-devel] [PATCH 10/16] tcg: Avoid loops against variable bounds
  2017-06-21  2:48 [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end Richard Henderson
                   ` (8 preceding siblings ...)
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 09/16] tcg: Use per-temp state data in liveness Richard Henderson
@ 2017-06-21  2:48 ` Richard Henderson
  2017-06-27  9:01   ` Alex Bennée
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 11/16] tcg: Change temp_allocate_frame arg to TCGTemp Richard Henderson
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2017-06-21  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Copy s->nb_globals or s->nb_temps to a local variable for the purposes
of iteration.  This should allow the compiler to use low-overhead
looping constructs on some hosts.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
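Reader's note, not part of the commit: a minimal standalone sketch of the
loop rewrite described in the commit message. DemoContext and the demo_*
functions are hypothetical; whether a given compiler really reloads the bound
in the first form depends on what it can prove about aliasing, so treat the
comments as an assumption about typical code generation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down context: only what the loops touch. */
typedef struct DemoContext {
    int nb_globals;
    int *val_type;      /* one entry per temp */
} DemoContext;

/* Bound read through S on every test.  The stores through s->val_type
   might alias s->nb_globals as far as the compiler can tell, so it may
   have to reload the bound on each iteration. */
static void demo_reset_reload(DemoContext *s)
{
    int i;

    assert(s != NULL);
    for (i = 0; i < s->nb_globals; i++) {
        s->val_type[i] = 1;
    }
}

/* Bound copied to a local once, as in the patch.  The trip count is now
   fixed before the loop body runs. */
static void demo_reset_hoisted(DemoContext *s)
{
    int i, n;

    assert(s != NULL);
    for (i = 0, n = s->nb_globals; i < n; i++) {
        s->val_type[i] = 1;
    }
}
```

The two forms behave the same as long as the loop body never changes
nb_globals, which is the case for the loops touched here.
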
 tcg/tcg.c | 27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index e78140b..c228f1e 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -943,23 +943,16 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
 
 static void tcg_reg_alloc_start(TCGContext *s)
 {
-    int i;
+    int i, n;
     TCGTemp *ts;
-    for(i = 0; i < s->nb_globals; i++) {
+
+    for (i = 0, n = s->nb_globals; i < n; i++) {
         ts = &s->temps[i];
-        if (ts->fixed_reg) {
-            ts->val_type = TEMP_VAL_REG;
-        } else {
-            ts->val_type = TEMP_VAL_MEM;
-        }
+        ts->val_type = (ts->fixed_reg ? TEMP_VAL_REG : TEMP_VAL_MEM);
     }
-    for(i = s->nb_globals; i < s->nb_temps; i++) {
+    for (n = s->nb_temps; i < n; i++) {
         ts = &s->temps[i];
-        if (ts->temp_local) {
-            ts->val_type = TEMP_VAL_MEM;
-        } else {
-            ts->val_type = TEMP_VAL_DEAD;
-        }
+        ts->val_type = (ts->temp_local ? TEMP_VAL_MEM : TEMP_VAL_DEAD);
         ts->mem_allocated = 0;
         ts->fixed_reg = 0;
     }
@@ -2050,9 +2043,9 @@ static void temp_save(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs)
    temporary registers needs to be allocated to store a constant. */
 static void save_globals(TCGContext *s, TCGRegSet allocated_regs)
 {
-    int i;
+    int i, n;
 
-    for (i = 0; i < s->nb_globals; i++) {
+    for (i = 0, n = s->nb_globals; i < n; i++) {
         temp_save(s, &s->temps[i], allocated_regs);
     }
 }
@@ -2062,9 +2055,9 @@ static void save_globals(TCGContext *s, TCGRegSet allocated_regs)
    temporary registers needs to be allocated to store a constant. */
 static void sync_globals(TCGContext *s, TCGRegSet allocated_regs)
 {
-    int i;
+    int i, n;
 
-    for (i = 0; i < s->nb_globals; i++) {
+    for (i = 0, n = s->nb_globals; i < n; i++) {
         TCGTemp *ts = &s->temps[i];
         tcg_debug_assert(ts->val_type != TEMP_VAL_REG
                          || ts->fixed_reg
-- 
2.9.4

* [Qemu-devel] [PATCH 11/16] tcg: Change temp_allocate_frame arg to TCGTemp
  2017-06-21  2:48 [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end Richard Henderson
                   ` (9 preceding siblings ...)
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 10/16] tcg: Avoid loops against variable bounds Richard Henderson
@ 2017-06-21  2:48 ` Richard Henderson
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 12/16] tcg: Remove unused TCG_CALL_DUMMY_TCGV Richard Henderson
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-06-21  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/tcg.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index c228f1e..f8d96fa 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1869,10 +1869,8 @@ static void check_regs(TCGContext *s)
 }
 #endif
 
-static void temp_allocate_frame(TCGContext *s, int temp)
+static void temp_allocate_frame(TCGContext *s, TCGTemp *ts)
 {
-    TCGTemp *ts;
-    ts = &s->temps[temp];
 #if !(defined(__sparc__) && TCG_TARGET_REG_BITS == 64)
     /* Sparc64 stack is accessed with offset of 2047 */
     s->current_frame_offset = (s->current_frame_offset +
@@ -1925,7 +1923,7 @@ static void temp_sync(TCGContext *s, TCGTemp *ts,
     }
     if (!ts->mem_coherent) {
         if (!ts->mem_allocated) {
-            temp_allocate_frame(s, temp_idx(s, ts));
+            temp_allocate_frame(s, ts);
         }
         switch (ts->val_type) {
         case TEMP_VAL_CONST:
@@ -2155,7 +2153,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
            liveness analysis disabled). */
         tcg_debug_assert(NEED_SYNC_ARG(0));
         if (!ots->mem_allocated) {
-            temp_allocate_frame(s, op->args[0]);
+            temp_allocate_frame(s, ots);
         }
         tcg_out_st(s, otype, ts->reg, ots->mem_base->reg, ots->mem_offset);
         if (IS_DEAD_ARG(1)) {
-- 
2.9.4

* [Qemu-devel] [PATCH 12/16] tcg: Remove unused TCG_CALL_DUMMY_TCGV
  2017-06-21  2:48 [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end Richard Henderson
                   ` (10 preceding siblings ...)
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 11/16] tcg: Change temp_allocate_frame arg to TCGTemp Richard Henderson
@ 2017-06-21  2:48 ` Richard Henderson
  2017-06-27  9:42   ` Alex Bennée
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 13/16] tcg: Export temp_idx Richard Henderson
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2017-06-21  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/tcg.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index 1eeeca5..4f69d0c 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -503,7 +503,6 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
 #define TCG_CALL_NO_WG_SE       (TCG_CALL_NO_WG | TCG_CALL_NO_SE)
 
 /* used to align parameters */
-#define TCG_CALL_DUMMY_TCGV     MAKE_TCGV_I32(-1)
 #define TCG_CALL_DUMMY_ARG      ((TCGArg)(-1))
 
 /* Conditions.  Note that these are laid out for easy manipulation by
-- 
2.9.4

* [Qemu-devel] [PATCH 13/16] tcg: Export temp_idx
  2017-06-21  2:48 [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end Richard Henderson
                   ` (11 preceding siblings ...)
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 12/16] tcg: Remove unused TCG_CALL_DUMMY_TCGV Richard Henderson
@ 2017-06-21  2:48 ` Richard Henderson
  2017-06-27  9:46   ` Alex Bennée
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 14/16] tcg: Use per-temp state data in optimize Richard Henderson
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2017-06-21  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

At the same time, drop the TCGContext argument and use tcg_ctx instead.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
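Reader's note, not part of the commit: a minimal standalone sketch of mapping
a temp pointer to its index by pointer subtraction against one global
context, which is what lets the TCGContext argument go away. DemoTemp,
DemoContext, demo_ctx and the demo_* helpers are hypothetical stand-ins for
TCGTemp, TCGContext, tcg_ctx, temp_idx and arg_temp.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_TEMPS 16

typedef struct DemoTemp {
    int type;
} DemoTemp;

typedef struct DemoContext {
    int nb_temps;
    DemoTemp temps[DEMO_MAX_TEMPS];
} DemoContext;

/* Hypothetical single global context, standing in for tcg_ctx. */
static DemoContext demo_ctx;

/* Pointer to index: valid only for pointers into demo_ctx.temps. */
static inline size_t demo_temp_idx(const DemoTemp *ts)
{
    ptrdiff_t n = ts - demo_ctx.temps;

    assert(n >= 0 && n < demo_ctx.nb_temps);
    return (size_t)n;
}

/* Index to pointer: the inverse mapping. */
static inline DemoTemp *demo_idx_temp(size_t idx)
{
    assert(idx < (size_t)demo_ctx.nb_temps);
    return &demo_ctx.temps[idx];
}
```

The subtraction is only meaningful for pointers into the one global array,
which is the invariant the range assertion checks.
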
 tcg/tcg.c | 15 ++++-----------
 tcg/tcg.h |  7 ++++++-
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index f8d96fa..26931a7 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -473,13 +473,6 @@ void tcg_func_start(TCGContext *s)
     s->be = tcg_malloc(sizeof(TCGBackendData));
 }
 
-static inline int temp_idx(TCGContext *s, TCGTemp *ts)
-{
-    ptrdiff_t n = ts - s->temps;
-    tcg_debug_assert(n >= 0 && n < s->nb_temps);
-    return n;
-}
-
 static inline TCGTemp *tcg_temp_alloc(TCGContext *s)
 {
     int n = s->nb_temps++;
@@ -516,7 +509,7 @@ static int tcg_global_reg_new_internal(TCGContext *s, TCGType type,
     ts->name = name;
     tcg_regset_set_reg(s->reserved_regs, reg);
 
-    return temp_idx(s, ts);
+    return temp_idx(ts);
 }
 
 void tcg_set_frame(TCGContext *s, TCGReg reg, intptr_t start, intptr_t size)
@@ -605,7 +598,7 @@ int tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
         ts->mem_offset = offset;
         ts->name = name;
     }
-    return temp_idx(s, ts);
+    return temp_idx(ts);
 }
 
 static int tcg_temp_new_internal(TCGType type, int temp_local)
@@ -645,7 +638,7 @@ static int tcg_temp_new_internal(TCGType type, int temp_local)
             ts->temp_allocated = 1;
             ts->temp_local = temp_local;
         }
-        idx = temp_idx(s, ts);
+        idx = temp_idx(ts);
     }
 
 #if defined(CONFIG_DEBUG_TCG)
@@ -963,7 +956,7 @@ static void tcg_reg_alloc_start(TCGContext *s)
 static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
                                  TCGTemp *ts)
 {
-    int idx = temp_idx(s, ts);
+    int idx = temp_idx(ts);
 
     if (ts->temp_global) {
         pstrcpy(buf, buf_size, ts->name);
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 4f69d0c..b75a745 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -733,13 +733,18 @@ struct TCGContext {
 extern TCGContext tcg_ctx;
 extern bool parallel_cpus;
 
-static inline TCGArg temp_arg(TCGTemp *ts)
+static inline size_t temp_idx(TCGTemp *ts)
 {
     ptrdiff_t n = ts - tcg_ctx.temps;
     tcg_debug_assert(n >= 0 && n < tcg_ctx.nb_temps);
     return n;
 }
 
+static inline TCGArg temp_arg(TCGTemp *ts)
+{
+    return temp_idx(ts);
+}
+
 static inline TCGTemp *arg_temp(TCGArg a)
 {
     return a == TCG_CALL_DUMMY_ARG ? NULL : &tcg_ctx.temps[a];
-- 
2.9.4

* [Qemu-devel] [PATCH 14/16] tcg: Use per-temp state data in optimize
  2017-06-21  2:48 [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end Richard Henderson
                   ` (12 preceding siblings ...)
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 13/16] tcg: Export temp_idx Richard Henderson
@ 2017-06-21  2:48 ` Richard Henderson
  2017-06-27  9:59   ` Alex Bennée
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 15/16] tcg: Define separate structures for TCGv_* Richard Henderson
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2017-06-21  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

While we're touching many of the lines anyway, adjust the naming
of the functions to better distinguish when "TCGArg" vs "TCGTemp"
should be used.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
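Reader's note, not part of the commit: a minimal standalone sketch of the
copy list once it is keyed by temp pointers rather than array indices. Each
temp reaches its info through state_ptr, and copies form a circular doubly
linked ring. DemoTemp, DemoInfo and the demo_* functions are hypothetical
names; the logic follows reset_ts, ts_are_copies and the linking step in
tcg_opt_gen_mov as shown in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoTemp {
    void *state_ptr;            /* points at this temp's DemoInfo */
} DemoTemp;

typedef struct DemoInfo {
    DemoTemp *prev_copy;
    DemoTemp *next_copy;
} DemoInfo;

static inline DemoInfo *demo_info(DemoTemp *ts)
{
    return ts->state_ptr;
}

/* Attach TI to TS and make TS a ring of one. */
static void demo_init(DemoTemp *ts, DemoInfo *ti)
{
    assert(ts != NULL && ti != NULL);
    ts->state_ptr = ti;
    ti->next_copy = ts;
    ti->prev_copy = ts;
}

/* Unlink TS from whatever ring it is currently in. */
static void demo_reset(DemoTemp *ts)
{
    DemoInfo *ti = demo_info(ts);
    DemoInfo *pi = demo_info(ti->prev_copy);
    DemoInfo *ni = demo_info(ti->next_copy);

    ni->prev_copy = ti->prev_copy;
    pi->next_copy = ti->next_copy;
    ti->next_copy = ts;
    ti->prev_copy = ts;
}

/* Record DST as a copy of SRC by inserting DST right after SRC. */
static void demo_link_copy(DemoTemp *dst, DemoTemp *src)
{
    DemoInfo *di, *si, *ni;

    demo_reset(dst);
    di = demo_info(dst);
    si = demo_info(src);
    ni = demo_info(si->next_copy);

    di->next_copy = si->next_copy;
    di->prev_copy = src;
    ni->prev_copy = dst;
    si->next_copy = dst;
}

/* Walk the ring starting after A, looking for B. */
static bool demo_are_copies(DemoTemp *a, DemoTemp *b)
{
    DemoTemp *i;

    if (a == b) {
        return true;
    }
    for (i = demo_info(a)->next_copy; i != a; i = demo_info(i)->next_copy) {
        if (i == b) {
            return true;
        }
    }
    return false;
}
```

Walking the ring compares temp pointers directly, so no index arithmetic is
needed in the loop.
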
 tcg/optimize.c | 424 +++++++++++++++++++++++++++++++++------------------------
 tcg/tcg.h      |   5 +
 2 files changed, 249 insertions(+), 180 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 55f9e83..eb09ae5 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -34,34 +34,63 @@
 
 struct tcg_temp_info {
     bool is_const;
-    uint16_t prev_copy;
-    uint16_t next_copy;
+    TCGTemp *prev_copy;
+    TCGTemp *next_copy;
     tcg_target_ulong val;
     tcg_target_ulong mask;
 };
 
-static struct tcg_temp_info temps[TCG_MAX_TEMPS];
+static struct tcg_temp_info temps_[TCG_MAX_TEMPS];
 static TCGTempSet temps_used;
 
-static inline bool temp_is_const(TCGArg arg)
+static inline struct tcg_temp_info *ts_info(TCGTemp *ts)
 {
-    return temps[arg].is_const;
+    return ts->state_ptr;
 }
 
-static inline bool temp_is_copy(TCGArg arg)
+static inline struct tcg_temp_info *arg_info(TCGArg arg)
 {
-    return temps[arg].next_copy != arg;
+    return ts_info(arg_temp(arg));
+}
+
+static inline bool ts_is_const(TCGTemp *ts)
+{
+    return ts_info(ts)->is_const;
+}
+
+static inline bool arg_is_const(TCGArg arg)
+{
+    return ts_is_const(arg_temp(arg));
+}
+
+static inline bool ts_is_copy(TCGTemp *ts)
+{
+    return ts_info(ts)->next_copy != ts;
+}
+
+static inline bool arg_is_copy(TCGArg arg)
+{
+    return ts_is_copy(arg_temp(arg));
 }
 
 /* Reset TEMP's state, possibly removing the temp for the list of copies.  */
-static void reset_temp(TCGArg temp)
+static void reset_ts(TCGTemp *ts)
 {
-    temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
-    temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
-    temps[temp].next_copy = temp;
-    temps[temp].prev_copy = temp;
-    temps[temp].is_const = false;
-    temps[temp].mask = -1;
+    struct tcg_temp_info *ti = ts_info(ts);
+    struct tcg_temp_info *pi = ts_info(ti->prev_copy);
+    struct tcg_temp_info *ni = ts_info(ti->next_copy);
+
+    ni->prev_copy = ti->prev_copy;
+    pi->next_copy = ti->next_copy;
+    ti->next_copy = ts;
+    ti->prev_copy = ts;
+    ti->is_const = false;
+    ti->mask = -1;
+}
+
+static void reset_temp(TCGArg arg)
+{
+    reset_ts(arg_temp(arg));
 }
 
 /* Reset all temporaries, given that there are NB_TEMPS of them.  */
@@ -71,17 +100,26 @@ static void reset_all_temps(int nb_temps)
 }
 
 /* Initialize and activate a temporary.  */
-static void init_temp_info(TCGArg temp)
+static void init_ts_info(TCGTemp *ts)
 {
-    if (!test_bit(temp, temps_used.l)) {
-        temps[temp].next_copy = temp;
-        temps[temp].prev_copy = temp;
-        temps[temp].is_const = false;
-        temps[temp].mask = -1;
-        set_bit(temp, temps_used.l);
+    size_t idx = temp_idx(ts);
+    if (!test_bit(idx, temps_used.l)) {
+        struct tcg_temp_info *ti = &temps_[idx];
+
+        ts->state_ptr = ti;
+        ti->next_copy = ts;
+        ti->prev_copy = ts;
+        ti->is_const = false;
+        ti->mask = -1;
+        set_bit(idx, temps_used.l);
     }
 }
 
+static void init_arg_info(TCGArg arg)
+{
+    init_ts_info(arg_temp(arg));
+}
+
 static int op_bits(TCGOpcode op)
 {
     const TCGOpDef *def = &tcg_op_defs[op];
@@ -119,7 +157,7 @@ static TCGOpcode op_to_movi(TCGOpcode op)
 static TCGArg find_better_copy(TCGContext *s, TCGArg arg)
 {
     TCGTemp *ts = arg_temp(arg);
-    TCGArg i;
+    TCGTemp *i;
 
     /* If this is already a global, we can't do better. */
     if (ts->temp_global) {
@@ -127,17 +165,17 @@ static TCGArg find_better_copy(TCGContext *s, TCGArg arg)
     }
 
     /* Search for a global first. */
-    for (i = temps[arg].next_copy ; i != arg; i = temps[i].next_copy) {
-        if (i < s->nb_globals) {
-            return i;
+    for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
+        if (i->temp_global) {
+            return temp_arg(i);
         }
     }
 
     /* If it is a temp, search for a temp local. */
     if (!ts->temp_local) {
-        for (i = temps[arg].next_copy ; i != arg; i = temps[i].next_copy) {
-            if (s->temps[i].temp_local) {
-                return i;
+        for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
+            if (i->temp_local) {
+                return temp_arg(i);
             }
         }
     }
@@ -146,20 +184,20 @@ static TCGArg find_better_copy(TCGContext *s, TCGArg arg)
     return arg;
 }
 
-static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
+static bool ts_are_copies(TCGTemp *ts1, TCGTemp *ts2)
 {
-    TCGArg i;
+    TCGTemp *i;
 
-    if (arg1 == arg2) {
+    if (ts1 == ts2) {
         return true;
     }
 
-    if (!temp_is_copy(arg1) || !temp_is_copy(arg2)) {
+    if (!ts_is_copy(ts1) || !ts_is_copy(ts2)) {
         return false;
     }
 
-    for (i = temps[arg1].next_copy ; i != arg1 ; i = temps[i].next_copy) {
-        if (i == arg2) {
+    for (i = ts_info(ts1)->next_copy; i != ts1; i = ts_info(i)->next_copy) {
+        if (i == ts2) {
             return true;
         }
     }
@@ -167,22 +205,28 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
     return false;
 }
 
+static bool args_are_copies(TCGArg arg1, TCGArg arg2)
+{
+    return ts_are_copies(arg_temp(arg1), arg_temp(arg2));
+}
+
 static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg val)
 {
     TCGOpcode new_op = op_to_movi(op->opc);
     tcg_target_ulong mask;
+    struct tcg_temp_info *di = arg_info(dst);
 
     op->opc = new_op;
 
     reset_temp(dst);
-    temps[dst].is_const = true;
-    temps[dst].val = val;
+    di->is_const = true;
+    di->val = val;
     mask = val;
     if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_movi_i32) {
         /* High bits of the destination are now garbage.  */
         mask |= ~0xffffffffull;
     }
-    temps[dst].mask = mask;
+    di->mask = mask;
 
     op->args[0] = dst;
     op->args[1] = val;
@@ -190,35 +234,44 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg val)
 
 static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
 {
-    if (temps_are_copies(dst, src)) {
+    TCGTemp *dst_ts = arg_temp(dst);
+    TCGTemp *src_ts = arg_temp(src);
+    struct tcg_temp_info *di;
+    struct tcg_temp_info *si;
+    tcg_target_ulong mask;
+    TCGOpcode new_op;
+
+    if (ts_are_copies(dst_ts, src_ts)) {
         tcg_op_remove(s, op);
         return;
     }
 
-    TCGOpcode new_op = op_to_mov(op->opc);
-    tcg_target_ulong mask;
+    reset_ts(dst_ts);
+    di = ts_info(dst_ts);
+    si = ts_info(src_ts);
+    new_op = op_to_mov(op->opc);
 
     op->opc = new_op;
+    op->args[0] = dst;
+    op->args[1] = src;
 
-    reset_temp(dst);
-    mask = temps[src].mask;
+    mask = si->mask;
     if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
         /* High bits of the destination are now garbage.  */
         mask |= ~0xffffffffull;
     }
-    temps[dst].mask = mask;
-
-    if (arg_temp(src)->type == arg_temp(dst)->type) {
-        temps[dst].next_copy = temps[src].next_copy;
-        temps[dst].prev_copy = src;
-        temps[temps[dst].next_copy].prev_copy = dst;
-        temps[src].next_copy = dst;
-        temps[dst].is_const = temps[src].is_const;
-        temps[dst].val = temps[src].val;
-    }
+    di->mask = mask;
 
-    op->args[0] = dst;
-    op->args[1] = src;
+    if (src_ts->type == dst_ts->type) {
+        struct tcg_temp_info *ni = ts_info(si->next_copy);
+
+        di->next_copy = si->next_copy;
+        di->prev_copy = src_ts;
+        ni->prev_copy = dst_ts;
+        si->next_copy = dst_ts;
+        di->is_const = si->is_const;
+        di->val = si->val;
+    }
 }
 
 static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
@@ -465,18 +518,20 @@ static bool do_constant_folding_cond_eq(TCGCond c)
 static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
                                        TCGArg y, TCGCond c)
 {
-    if (temp_is_const(x) && temp_is_const(y)) {
+    tcg_target_ulong xv = arg_info(x)->val;
+    tcg_target_ulong yv = arg_info(y)->val;
+    if (arg_is_const(x) && arg_is_const(y)) {
         switch (op_bits(op)) {
         case 32:
-            return do_constant_folding_cond_32(temps[x].val, temps[y].val, c);
+            return do_constant_folding_cond_32(xv, yv, c);
         case 64:
-            return do_constant_folding_cond_64(temps[x].val, temps[y].val, c);
+            return do_constant_folding_cond_64(xv, yv, c);
         default:
             tcg_abort();
         }
-    } else if (temps_are_copies(x, y)) {
+    } else if (args_are_copies(x, y)) {
         return do_constant_folding_cond_eq(c);
-    } else if (temp_is_const(y) && temps[y].val == 0) {
+    } else if (arg_is_const(y) && yv == 0) {
         switch (c) {
         case TCG_COND_LTU:
             return 0;
@@ -496,12 +551,15 @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
     TCGArg al = p1[0], ah = p1[1];
     TCGArg bl = p2[0], bh = p2[1];
 
-    if (temp_is_const(bl) && temp_is_const(bh)) {
-        uint64_t b = ((uint64_t)temps[bh].val << 32) | (uint32_t)temps[bl].val;
+    if (arg_is_const(bl) && arg_is_const(bh)) {
+        tcg_target_ulong blv = arg_info(bl)->val;
+        tcg_target_ulong bhv = arg_info(bh)->val;
+        uint64_t b = deposit64(blv, 32, 32, bhv);
 
-        if (temp_is_const(al) && temp_is_const(ah)) {
-            uint64_t a;
-            a = ((uint64_t)temps[ah].val << 32) | (uint32_t)temps[al].val;
+        if (arg_is_const(al) && arg_is_const(ah)) {
+            tcg_target_ulong alv = arg_info(al)->val;
+            tcg_target_ulong ahv = arg_info(ah)->val;
+            uint64_t a = deposit64(alv, 32, 32, ahv);
             return do_constant_folding_cond_64(a, b, c);
         }
         if (b == 0) {
@@ -515,7 +573,7 @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
             }
         }
     }
-    if (temps_are_copies(al, bl) && temps_are_copies(ah, bh)) {
+    if (args_are_copies(al, bl) && args_are_copies(ah, bh)) {
         return do_constant_folding_cond_eq(c);
     }
     return 2;
@@ -525,8 +583,8 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
 {
     TCGArg a1 = *p1, a2 = *p2;
     int sum = 0;
-    sum += temp_is_const(a1);
-    sum -= temp_is_const(a2);
+    sum += arg_is_const(a1);
+    sum -= arg_is_const(a2);
 
     /* Prefer the constant in second argument, and then the form
        op a, a, b, which is better handled on non-RISC hosts. */
@@ -541,10 +599,10 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
 static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 {
     int sum = 0;
-    sum += temp_is_const(p1[0]);
-    sum += temp_is_const(p1[1]);
-    sum -= temp_is_const(p2[0]);
-    sum -= temp_is_const(p2[1]);
+    sum += arg_is_const(p1[0]);
+    sum += arg_is_const(p1[1]);
+    sum -= arg_is_const(p2[0]);
+    sum -= arg_is_const(p2[1]);
     if (sum > 0) {
         TCGArg t;
         t = p1[0], p1[0] = p2[0], p2[0] = t;
@@ -586,22 +644,22 @@ void tcg_optimize(TCGContext *s)
             nb_oargs = op->callo;
             nb_iargs = op->calli;
             for (i = 0; i < nb_oargs + nb_iargs; i++) {
-                tmp = op->args[i];
-                if (tmp != TCG_CALL_DUMMY_ARG) {
-                    init_temp_info(tmp);
+                TCGTemp *ts = arg_temp(op->args[i]);
+                if (ts) {
+                    init_ts_info(ts);
                 }
             }
         } else {
             nb_oargs = def->nb_oargs;
             nb_iargs = def->nb_iargs;
             for (i = 0; i < nb_oargs + nb_iargs; i++) {
-                init_temp_info(op->args[i]);
+                init_arg_info(op->args[i]);
             }
         }
 
         /* Do copy propagation */
         for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-            if (temp_is_copy(op->args[i])) {
+            if (arg_is_copy(op->args[i])) {
                 op->args[i] = find_better_copy(s, op->args[i]);
             }
         }
@@ -671,7 +729,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(sar):
         CASE_OP_32_64(rotl):
         CASE_OP_32_64(rotr):
-            if (temp_is_const(op->args[1]) && temps[op->args[1]].val == 0) {
+            if (arg_is_const(op->args[1])
+                && arg_info(op->args[1])->val == 0) {
                 tcg_opt_gen_movi(s, op, op->args[0], 0);
                 continue;
             }
@@ -681,7 +740,7 @@ void tcg_optimize(TCGContext *s)
                 TCGOpcode neg_op;
                 bool have_neg;
 
-                if (temp_is_const(op->args[2])) {
+                if (arg_is_const(op->args[2])) {
                     /* Proceed with possible constant folding. */
                     break;
                 }
@@ -695,8 +754,8 @@ void tcg_optimize(TCGContext *s)
                 if (!have_neg) {
                     break;
                 }
-                if (temp_is_const(op->args[1])
-                    && temps[op->args[1]].val == 0) {
+                if (arg_is_const(op->args[1])
+                    && arg_info(op->args[1])->val == 0) {
                     op->opc = neg_op;
                     reset_temp(op->args[0]);
                     op->args[1] = op->args[2];
@@ -706,34 +765,34 @@ void tcg_optimize(TCGContext *s)
             break;
         CASE_OP_32_64(xor):
         CASE_OP_32_64(nand):
-            if (!temp_is_const(op->args[1])
-                && temp_is_const(op->args[2])
-                && temps[op->args[2]].val == -1) {
+            if (!arg_is_const(op->args[1])
+                && arg_is_const(op->args[2])
+                && arg_info(op->args[2])->val == -1) {
                 i = 1;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(nor):
-            if (!temp_is_const(op->args[1])
-                && temp_is_const(op->args[2])
-                && temps[op->args[2]].val == 0) {
+            if (!arg_is_const(op->args[1])
+                && arg_is_const(op->args[2])
+                && arg_info(op->args[2])->val == 0) {
                 i = 1;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(andc):
-            if (!temp_is_const(op->args[2])
-                && temp_is_const(op->args[1])
-                && temps[op->args[1]].val == -1) {
+            if (!arg_is_const(op->args[2])
+                && arg_is_const(op->args[1])
+                && arg_info(op->args[1])->val == -1) {
                 i = 2;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(orc):
         CASE_OP_32_64(eqv):
-            if (!temp_is_const(op->args[2])
-                && temp_is_const(op->args[1])
-                && temps[op->args[1]].val == 0) {
+            if (!arg_is_const(op->args[2])
+                && arg_is_const(op->args[1])
+                && arg_info(op->args[1])->val == 0) {
                 i = 2;
                 goto try_not;
             }
@@ -774,9 +833,9 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(or):
         CASE_OP_32_64(xor):
         CASE_OP_32_64(andc):
-            if (!temp_is_const(op->args[1])
-                && temp_is_const(op->args[2])
-                && temps[op->args[2]].val == 0) {
+            if (!arg_is_const(op->args[1])
+                && arg_is_const(op->args[2])
+                && arg_info(op->args[2])->val == 0) {
                 tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
                 continue;
             }
@@ -784,9 +843,9 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(and):
         CASE_OP_32_64(orc):
         CASE_OP_32_64(eqv):
-            if (!temp_is_const(op->args[1])
-                && temp_is_const(op->args[2])
-                && temps[op->args[2]].val == -1) {
+            if (!arg_is_const(op->args[1])
+                && arg_is_const(op->args[2])
+                && arg_info(op->args[2])->val == -1) {
                 tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
                 continue;
             }
@@ -801,21 +860,21 @@ void tcg_optimize(TCGContext *s)
         affected = -1;
         switch (opc) {
         CASE_OP_32_64(ext8s):
-            if ((temps[op->args[1]].mask & 0x80) != 0) {
+            if ((arg_info(op->args[1])->mask & 0x80) != 0) {
                 break;
             }
         CASE_OP_32_64(ext8u):
             mask = 0xff;
             goto and_const;
         CASE_OP_32_64(ext16s):
-            if ((temps[op->args[1]].mask & 0x8000) != 0) {
+            if ((arg_info(op->args[1])->mask & 0x8000) != 0) {
                 break;
             }
         CASE_OP_32_64(ext16u):
             mask = 0xffff;
             goto and_const;
         case INDEX_op_ext32s_i64:
-            if ((temps[op->args[1]].mask & 0x80000000) != 0) {
+            if ((arg_info(op->args[1])->mask & 0x80000000) != 0) {
                 break;
             }
         case INDEX_op_ext32u_i64:
@@ -823,111 +882,114 @@ void tcg_optimize(TCGContext *s)
             goto and_const;
 
         CASE_OP_32_64(and):
-            mask = temps[op->args[2]].mask;
-            if (temp_is_const(op->args[2])) {
+            mask = arg_info(op->args[2])->mask;
+            if (arg_is_const(op->args[2])) {
         and_const:
-                affected = temps[op->args[1]].mask & ~mask;
+                affected = arg_info(op->args[1])->mask & ~mask;
             }
-            mask = temps[op->args[1]].mask & mask;
+            mask = arg_info(op->args[1])->mask & mask;
             break;
 
         case INDEX_op_ext_i32_i64:
-            if ((temps[op->args[1]].mask & 0x80000000) != 0) {
+            if ((arg_info(op->args[1])->mask & 0x80000000) != 0) {
                 break;
             }
         case INDEX_op_extu_i32_i64:
             /* We do not compute affected as it is a size changing op.  */
-            mask = (uint32_t)temps[op->args[1]].mask;
+            mask = (uint32_t)arg_info(op->args[1])->mask;
             break;
 
         CASE_OP_32_64(andc):
             /* Known-zeros does not imply known-ones.  Therefore unless
                op->args[2] is constant, we can't infer anything from it.  */
-            if (temp_is_const(op->args[2])) {
-                mask = ~temps[op->args[2]].mask;
+            if (arg_is_const(op->args[2])) {
+                mask = ~arg_info(op->args[2])->mask;
                 goto and_const;
             }
-            /* But we certainly know nothing outside op->args[1] may be set. */
-            mask = temps[op->args[1]].mask;
+            /* But we certainly know nothing outside args[1] may be set. */
+            mask = arg_info(op->args[1])->mask;
             break;
 
         case INDEX_op_sar_i32:
-            if (temp_is_const(op->args[2])) {
-                tmp = temps[op->args[2]].val & 31;
-                mask = (int32_t)temps[op->args[1]].mask >> tmp;
+            if (arg_is_const(op->args[2])) {
+                tmp = arg_info(op->args[2])->val & 31;
+                mask = (int32_t)arg_info(op->args[1])->mask >> tmp;
             }
             break;
         case INDEX_op_sar_i64:
-            if (temp_is_const(op->args[2])) {
-                tmp = temps[op->args[2]].val & 63;
-                mask = (int64_t)temps[op->args[1]].mask >> tmp;
+            if (arg_is_const(op->args[2])) {
+                tmp = arg_info(op->args[2])->val & 63;
+                mask = (int64_t)arg_info(op->args[1])->mask >> tmp;
             }
             break;
 
         case INDEX_op_shr_i32:
-            if (temp_is_const(op->args[2])) {
-                tmp = temps[op->args[2]].val & 31;
-                mask = (uint32_t)temps[op->args[1]].mask >> tmp;
+            if (arg_is_const(op->args[2])) {
+                tmp = arg_info(op->args[2])->val & 31;
+                mask = (uint32_t)arg_info(op->args[1])->mask >> tmp;
             }
             break;
         case INDEX_op_shr_i64:
-            if (temp_is_const(op->args[2])) {
-                tmp = temps[op->args[2]].val & 63;
-                mask = (uint64_t)temps[op->args[1]].mask >> tmp;
+            if (arg_is_const(op->args[2])) {
+                tmp = arg_info(op->args[2])->val & 63;
+                mask = (uint64_t)arg_info(op->args[1])->mask >> tmp;
             }
             break;
 
         case INDEX_op_extrl_i64_i32:
-            mask = (uint32_t)temps[op->args[1]].mask;
+            mask = (uint32_t)arg_info(op->args[1])->mask;
             break;
         case INDEX_op_extrh_i64_i32:
-            mask = (uint64_t)temps[op->args[1]].mask >> 32;
+            mask = (uint64_t)arg_info(op->args[1])->mask >> 32;
             break;
 
         CASE_OP_32_64(shl):
-            if (temp_is_const(op->args[2])) {
-                tmp = temps[op->args[2]].val & (TCG_TARGET_REG_BITS - 1);
-                mask = temps[op->args[1]].mask << tmp;
+            if (arg_is_const(op->args[2])) {
+                tmp = arg_info(op->args[2])->val & (TCG_TARGET_REG_BITS - 1);
+                mask = arg_info(op->args[1])->mask << tmp;
             }
             break;
 
         CASE_OP_32_64(neg):
             /* Set to 1 all bits to the left of the rightmost.  */
-            mask = -(temps[op->args[1]].mask & -temps[op->args[1]].mask);
+            mask = -(arg_info(op->args[1])->mask
+                     & -arg_info(op->args[1])->mask);
             break;
 
         CASE_OP_32_64(deposit):
-            mask = deposit64(temps[op->args[1]].mask, op->args[3],
-                             op->args[4], temps[op->args[2]].mask);
+            mask = deposit64(arg_info(op->args[1])->mask,
+                             op->args[3], op->args[4],
+                             arg_info(op->args[2])->mask);
             break;
 
         CASE_OP_32_64(extract):
-            mask = extract64(temps[op->args[1]].mask, op->args[2], op->args[3]);
+            mask = extract64(arg_info(op->args[1])->mask,
+                             op->args[2], op->args[3]);
             if (op->args[2] == 0) {
-                affected = temps[op->args[1]].mask & ~mask;
+                affected = arg_info(op->args[1])->mask & ~mask;
             }
             break;
         CASE_OP_32_64(sextract):
-            mask = sextract64(temps[op->args[1]].mask,
+            mask = sextract64(arg_info(op->args[1])->mask,
                               op->args[2], op->args[3]);
             if (op->args[2] == 0 && (tcg_target_long)mask >= 0) {
-                affected = temps[op->args[1]].mask & ~mask;
+                affected = arg_info(op->args[1])->mask & ~mask;
             }
             break;
 
         CASE_OP_32_64(or):
         CASE_OP_32_64(xor):
-            mask = temps[op->args[1]].mask | temps[op->args[2]].mask;
+            mask = arg_info(op->args[1])->mask | arg_info(op->args[2])->mask;
             break;
 
         case INDEX_op_clz_i32:
         case INDEX_op_ctz_i32:
-            mask = temps[op->args[2]].mask | 31;
+            mask = arg_info(op->args[2])->mask | 31;
             break;
 
         case INDEX_op_clz_i64:
         case INDEX_op_ctz_i64:
-            mask = temps[op->args[2]].mask | 63;
+            mask = arg_info(op->args[2])->mask | 63;
             break;
 
         case INDEX_op_ctpop_i32:
@@ -943,7 +1005,7 @@ void tcg_optimize(TCGContext *s)
             break;
 
         CASE_OP_32_64(movcond):
-            mask = temps[op->args[3]].mask | temps[op->args[4]].mask;
+            mask = arg_info(op->args[3])->mask | arg_info(op->args[4])->mask;
             break;
 
         CASE_OP_32_64(ld8u):
@@ -997,7 +1059,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(mul):
         CASE_OP_32_64(muluh):
         CASE_OP_32_64(mulsh):
-            if ((temp_is_const(op->args[2]) && temps[op->args[2]].val == 0)) {
+            if (arg_is_const(op->args[2])
+                && arg_info(op->args[2])->val == 0) {
                 tcg_opt_gen_movi(s, op, op->args[0], 0);
                 continue;
             }
@@ -1010,7 +1073,7 @@ void tcg_optimize(TCGContext *s)
         switch (opc) {
         CASE_OP_32_64(or):
         CASE_OP_32_64(and):
-            if (temps_are_copies(op->args[1], op->args[2])) {
+            if (args_are_copies(op->args[1], op->args[2])) {
                 tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
                 continue;
             }
@@ -1024,7 +1087,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(andc):
         CASE_OP_32_64(sub):
         CASE_OP_32_64(xor):
-            if (temps_are_copies(op->args[1], op->args[2])) {
+            if (args_are_copies(op->args[1], op->args[2])) {
                 tcg_opt_gen_movi(s, op, op->args[0], 0);
                 continue;
             }
@@ -1057,8 +1120,8 @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_extu_i32_i64:
         case INDEX_op_extrl_i64_i32:
         case INDEX_op_extrh_i64_i32:
-            if (temp_is_const(op->args[1])) {
-                tmp = do_constant_folding(opc, temps[op->args[1]].val, 0);
+            if (arg_is_const(op->args[1])) {
+                tmp = do_constant_folding(opc, arg_info(op->args[1])->val, 0);
                 tcg_opt_gen_movi(s, op, op->args[0], tmp);
                 break;
             }
@@ -1086,9 +1149,9 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(divu):
         CASE_OP_32_64(rem):
         CASE_OP_32_64(remu):
-            if (temp_is_const(op->args[1]) && temp_is_const(op->args[2])) {
-                tmp = do_constant_folding(opc, temps[op->args[1]].val,
-                                          temps[op->args[2]].val);
+            if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
+                tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
+                                          arg_info(op->args[2])->val);
                 tcg_opt_gen_movi(s, op, op->args[0], tmp);
                 break;
             }
@@ -1096,8 +1159,8 @@ void tcg_optimize(TCGContext *s)
 
         CASE_OP_32_64(clz):
         CASE_OP_32_64(ctz):
-            if (temp_is_const(op->args[1])) {
-                TCGArg v = temps[op->args[1]].val;
+            if (arg_is_const(op->args[1])) {
+                TCGArg v = arg_info(op->args[1])->val;
                 if (v != 0) {
                     tmp = do_constant_folding(opc, v, 0);
                     tcg_opt_gen_movi(s, op, op->args[0], tmp);
@@ -1109,17 +1172,18 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         CASE_OP_32_64(deposit):
-            if (temp_is_const(op->args[1]) && temp_is_const(op->args[2])) {
-                tmp = deposit64(temps[op->args[1]].val, op->args[3],
-                                op->args[4], temps[op->args[2]].val);
+            if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
+                tmp = deposit64(arg_info(op->args[1])->val,
+                                op->args[3], op->args[4],
+                                arg_info(op->args[2])->val);
                 tcg_opt_gen_movi(s, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
 
         CASE_OP_32_64(extract):
-            if (temp_is_const(op->args[1])) {
-                tmp = extract64(temps[op->args[1]].val,
+            if (arg_is_const(op->args[1])) {
+                tmp = extract64(arg_info(op->args[1])->val,
                                 op->args[2], op->args[3]);
                 tcg_opt_gen_movi(s, op, op->args[0], tmp);
                 break;
@@ -1127,8 +1191,8 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         CASE_OP_32_64(sextract):
-            if (temp_is_const(op->args[1])) {
-                tmp = sextract64(temps[op->args[1]].val,
+            if (arg_is_const(op->args[1])) {
+                tmp = sextract64(arg_info(op->args[1])->val,
                                  op->args[2], op->args[3]);
                 tcg_opt_gen_movi(s, op, op->args[0], tmp);
                 break;
@@ -1166,9 +1230,9 @@ void tcg_optimize(TCGContext *s)
                 tcg_opt_gen_mov(s, op, op->args[0], op->args[4-tmp]);
                 break;
             }
-            if (temp_is_const(op->args[3]) && temp_is_const(op->args[4])) {
-                tcg_target_ulong tv = temps[op->args[3]].val;
-                tcg_target_ulong fv = temps[op->args[4]].val;
+            if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
+                tcg_target_ulong tv = arg_info(op->args[3])->val;
+                tcg_target_ulong fv = arg_info(op->args[4])->val;
                 TCGCond cond = op->args[5];
                 if (fv == 1 && tv == 0) {
                     cond = tcg_invert_cond(cond);
@@ -1185,12 +1249,12 @@ void tcg_optimize(TCGContext *s)
 
         case INDEX_op_add2_i32:
         case INDEX_op_sub2_i32:
-            if (temp_is_const(op->args[2]) && temp_is_const(op->args[3])
-                && temp_is_const(op->args[4]) && temp_is_const(op->args[5])) {
-                uint32_t al = temps[op->args[2]].val;
-                uint32_t ah = temps[op->args[3]].val;
-                uint32_t bl = temps[op->args[4]].val;
-                uint32_t bh = temps[op->args[5]].val;
+            if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])
+                && arg_is_const(op->args[4]) && arg_is_const(op->args[5])) {
+                uint32_t al = arg_info(op->args[2])->val;
+                uint32_t ah = arg_info(op->args[3])->val;
+                uint32_t bl = arg_info(op->args[4])->val;
+                uint32_t bh = arg_info(op->args[5])->val;
                 uint64_t a = ((uint64_t)ah << 32) | al;
                 uint64_t b = ((uint64_t)bh << 32) | bl;
                 TCGArg rl, rh;
@@ -1214,9 +1278,9 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         case INDEX_op_mulu2_i32:
-            if (temp_is_const(op->args[2]) && temp_is_const(op->args[3])) {
-                uint32_t a = temps[op->args[2]].val;
-                uint32_t b = temps[op->args[3]].val;
+            if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])) {
+                uint32_t a = arg_info(op->args[2])->val;
+                uint32_t b = arg_info(op->args[3])->val;
                 uint64_t r = (uint64_t)a * b;
                 TCGArg rl, rh;
                 TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32, 2);
@@ -1247,10 +1311,10 @@ void tcg_optimize(TCGContext *s)
                 }
             } else if ((op->args[4] == TCG_COND_LT
                         || op->args[4] == TCG_COND_GE)
-                       && temp_is_const(op->args[2])
-                       && temps[op->args[2]].val == 0
-                       && temp_is_const(op->args[3])
-                       && temps[op->args[3]].val == 0) {
+                       && arg_is_const(op->args[2])
+                       && arg_info(op->args[2])->val == 0
+                       && arg_is_const(op->args[3])
+                       && arg_info(op->args[3])->val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_brcond_high:
@@ -1318,15 +1382,15 @@ void tcg_optimize(TCGContext *s)
                 tcg_opt_gen_movi(s, op, op->args[0], tmp);
             } else if ((op->args[5] == TCG_COND_LT
                         || op->args[5] == TCG_COND_GE)
-                       && temp_is_const(op->args[3])
-                       && temps[op->args[3]].val == 0
-                       && temp_is_const(op->args[4])
-                       && temps[op->args[4]].val == 0) {
+                       && arg_is_const(op->args[3])
+                       && arg_info(op->args[3])->val == 0
+                       && arg_is_const(op->args[4])
+                       && arg_info(op->args[4])->val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_setcond_high:
                 reset_temp(op->args[0]);
-                temps[op->args[0]].mask = 1;
+                arg_info(op->args[0])->mask = 1;
                 op->opc = INDEX_op_setcond_i32;
                 op->args[1] = op->args[2];
                 op->args[2] = op->args[4];
@@ -1352,7 +1416,7 @@ void tcg_optimize(TCGContext *s)
                 }
             do_setcond_low:
                 reset_temp(op->args[0]);
-                temps[op->args[0]].mask = 1;
+                arg_info(op->args[0])->mask = 1;
                 op->opc = INDEX_op_setcond_i32;
                 op->args[2] = op->args[3];
                 op->args[3] = op->args[5];
@@ -1386,7 +1450,7 @@ void tcg_optimize(TCGContext *s)
                   & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
                 for (i = 0; i < nb_globals; i++) {
                     if (test_bit(i, temps_used.l)) {
-                        reset_temp(i);
+                        reset_ts(&s->temps[i]);
                     }
                 }
             }
@@ -1408,7 +1472,7 @@ void tcg_optimize(TCGContext *s)
                     /* Save the corresponding known-zero bits mask for the
                        first output argument (only one supported so far). */
                     if (i == 0) {
-                        temps[op->args[i]].mask = mask;
+                        arg_info(op->args[i])->mask = mask;
                     }
                 }
             }
diff --git a/tcg/tcg.h b/tcg/tcg.h
index b75a745..018c01c 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -750,6 +750,11 @@ static inline TCGTemp *arg_temp(TCGArg a)
     return a == TCG_CALL_DUMMY_ARG ? NULL : &tcg_ctx.temps[a];
 }
 
+static inline size_t arg_index(TCGArg a)
+{
+    return a;
+}
+
 static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)
 {
     tcg_ctx.gen_op_buf[op_idx].args[arg] = v;
-- 
2.9.4
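
For readers following the known-zero mask updates in the optimizer hunks above, here is a minimal sketch of two of the rules, written outside of QEMU. The helper names (sketch_mask_and, sketch_mask_neg) are made up for illustration and do not exist in the tree; a set bit in a mask means "this bit may be nonzero".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not part of QEMU.  A mask bit of 1 means the
   corresponding bit of the value may be set; 0 means it is known zero. */

/* and: a result bit can only be set if it may be set in both inputs. */
static uint64_t sketch_mask_and(uint64_t m1, uint64_t m2)
{
    return m1 & m2;
}

/* neg: (m & -m) isolates the lowest possibly-set bit; negating that
   sets this bit and every bit to its left.  Bits below the lowest
   possibly-set bit stay known zero after negation. */
static uint64_t sketch_mask_neg(uint64_t m)
{
    return -(m & -m);
}
```

For an input mask of 0x68 the lowest possibly-set bit is bit 3, so the negated value may have any of bits 3..63 set while bits 0..2 remain known zero.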


* [Qemu-devel] [PATCH 15/16] tcg: Define separate structures for TCGv_*
  2017-06-21  2:48 [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end Richard Henderson
                   ` (13 preceding siblings ...)
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 14/16] tcg: Use per-temp state data in optimize Richard Henderson
@ 2017-06-21  2:48 ` Richard Henderson
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 16/16] tcg: Store pointers to temporaries directly in TCGArg Richard Henderson
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-06-21  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Define TCGv_i32, TCGv_i64 and TCGv_ptr as pointers to distinct
structures that wrap TCGTemp.  Pointers that devolve to TCGTemp
will tidy things up.  At present, we continue to store indices
in TCGArg.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
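As a rough illustration of the wrapping scheme described above, the following standalone sketch uses made-up names (DemoTemp, DemoV_i32, demo_*) in place of the real TCG types. It assumes only what the patch shows: each wrapper struct holds the temp as its first member, so converting between the wrapper pointer and the temp pointer does not change the address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for TCGTemp; the real structure has more fields. */
typedef struct DemoTemp {
    int type;
    long val;
} DemoTemp;

/* Distinct wrapper structs: identical layout, different C types, so the
   compiler can tell a 32-bit variable from a 64-bit one. */
typedef struct DemoV_i32_d { DemoTemp impl; } *DemoV_i32;
typedef struct DemoV_i64_d { DemoTemp impl; } *DemoV_i64;

static DemoTemp demo_temps[8];
static size_t demo_nb_temps;

/* Allocate the next temp from the fixed array. */
static DemoTemp *demo_temp_new(int type)
{
    DemoTemp *ts;

    assert(demo_nb_temps < 8);
    ts = &demo_temps[demo_nb_temps++];
    ts->type = type;
    ts->val = 0;
    return ts;
}

/* Wrap: a pointer to a struct may be converted to and from a pointer
   to its first member. */
static DemoV_i32 demo_new_i32(void) { return (DemoV_i32)demo_temp_new(32); }
static DemoV_i64 demo_new_i64(void) { return (DemoV_i64)demo_temp_new(64); }

/* Unwrap without a cast. */
static DemoTemp *demo_unwrap_i32(DemoV_i32 v) { return &v->impl; }
static DemoTemp *demo_unwrap_i64(DemoV_i64 v) { return &v->impl; }

/* Recover the array index from the temp pointer. */
static size_t demo_temp_idx(DemoTemp *ts) { return (size_t)(ts - demo_temps); }
```

Passing a DemoV_i64 to demo_unwrap_i32 would draw an incompatible-pointer-type diagnostic from the compiler, which is the type checking the wrappers are meant to provide.
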
 tcg/tcg.c |  67 +++++-------------
 tcg/tcg.h | 237 +++++++++++++++++++++++++++++++++-----------------------------
 2 files changed, 146 insertions(+), 158 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 26931a7..1ca1192 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -492,8 +492,8 @@ static inline TCGTemp *tcg_global_alloc(TCGContext *s)
     return ts;
 }
 
-static int tcg_global_reg_new_internal(TCGContext *s, TCGType type,
-                                       TCGReg reg, const char *name)
+static TCGTemp *tcg_global_reg_new_internal(TCGContext *s, TCGType type,
+                                            TCGReg reg, const char *name)
 {
     TCGTemp *ts;
 
@@ -509,47 +509,45 @@ static int tcg_global_reg_new_internal(TCGContext *s, TCGType type,
     ts->name = name;
     tcg_regset_set_reg(s->reserved_regs, reg);
 
-    return temp_idx(ts);
+    return ts;
 }
 
 void tcg_set_frame(TCGContext *s, TCGReg reg, intptr_t start, intptr_t size)
 {
-    int idx;
     s->frame_start = start;
     s->frame_end = start + size;
-    idx = tcg_global_reg_new_internal(s, TCG_TYPE_PTR, reg, "_frame");
-    s->frame_temp = &s->temps[idx];
+    s->frame_temp = tcg_global_reg_new_internal(s, TCG_TYPE_PTR, reg, "_frame");
 }
 
 TCGv_i32 tcg_global_reg_new_i32(TCGReg reg, const char *name)
 {
     TCGContext *s = &tcg_ctx;
-    int idx;
+    TCGTemp *t;
 
     if (tcg_regset_test_reg(s->reserved_regs, reg)) {
         tcg_abort();
     }
-    idx = tcg_global_reg_new_internal(s, TCG_TYPE_I32, reg, name);
-    return MAKE_TCGV_I32(idx);
+    t = tcg_global_reg_new_internal(s, TCG_TYPE_I32, reg, name);
+    return (TCGv_i32)t;
 }
 
 TCGv_i64 tcg_global_reg_new_i64(TCGReg reg, const char *name)
 {
     TCGContext *s = &tcg_ctx;
-    int idx;
+    TCGTemp *t;
 
     if (tcg_regset_test_reg(s->reserved_regs, reg)) {
         tcg_abort();
     }
-    idx = tcg_global_reg_new_internal(s, TCG_TYPE_I64, reg, name);
-    return MAKE_TCGV_I64(idx);
+    t = tcg_global_reg_new_internal(s, TCG_TYPE_I64, reg, name);
+    return (TCGv_i64)t;
 }
 
-int tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
-                                intptr_t offset, const char *name)
+TCGTemp *tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
+                                     intptr_t offset, const char *name)
 {
     TCGContext *s = &tcg_ctx;
-    TCGTemp *base_ts = &s->temps[GET_TCGV_PTR(base)];
+    TCGTemp *base_ts = &base->impl;
     TCGTemp *ts = tcg_global_alloc(s);
     int indirect_reg = 0, bigendian = 0;
 #ifdef HOST_WORDS_BIGENDIAN
@@ -598,10 +596,10 @@ int tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
         ts->mem_offset = offset;
         ts->name = name;
     }
-    return temp_idx(ts);
+    return ts;
 }
 
-static int tcg_temp_new_internal(TCGType type, int temp_local)
+TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
 {
     TCGContext *s = &tcg_ctx;
     TCGTemp *ts;
@@ -638,36 +636,18 @@ static int tcg_temp_new_internal(TCGType type, int temp_local)
             ts->temp_allocated = 1;
             ts->temp_local = temp_local;
         }
-        idx = temp_idx(ts);
     }
 
 #if defined(CONFIG_DEBUG_TCG)
     s->temps_in_use++;
 #endif
-    return idx;
-}
-
-TCGv_i32 tcg_temp_new_internal_i32(int temp_local)
-{
-    int idx;
-
-    idx = tcg_temp_new_internal(TCG_TYPE_I32, temp_local);
-    return MAKE_TCGV_I32(idx);
-}
-
-TCGv_i64 tcg_temp_new_internal_i64(int temp_local)
-{
-    int idx;
-
-    idx = tcg_temp_new_internal(TCG_TYPE_I64, temp_local);
-    return MAKE_TCGV_I64(idx);
+    return ts;
 }
 
-static void tcg_temp_free_internal(int idx)
+void tcg_temp_free_internal(TCGTemp *ts)
 {
     TCGContext *s = &tcg_ctx;
-    TCGTemp *ts;
-    int k;
+    int k, idx = temp_idx(ts);
 
 #if defined(CONFIG_DEBUG_TCG)
     s->temps_in_use--;
@@ -677,7 +657,6 @@ static void tcg_temp_free_internal(int idx)
 #endif
 
     tcg_debug_assert(idx >= s->nb_globals && idx < s->nb_temps);
-    ts = &s->temps[idx];
     tcg_debug_assert(ts->temp_allocated != 0);
     ts->temp_allocated = 0;
 
@@ -685,16 +664,6 @@ static void tcg_temp_free_internal(int idx)
     set_bit(idx, s->free_temps[k].l);
 }
 
-void tcg_temp_free_i32(TCGv_i32 arg)
-{
-    tcg_temp_free_internal(GET_TCGV_I32(arg));
-}
-
-void tcg_temp_free_i64(TCGv_i64 arg)
-{
-    tcg_temp_free_internal(GET_TCGV_I64(arg));
-}
-
 TCGv_i32 tcg_const_i32(int32_t val)
 {
     TCGv_i32 t0;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 018c01c..a5a0412 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -395,6 +395,44 @@ static inline unsigned get_alignment_bits(TCGMemOp memop)
 
 typedef tcg_target_ulong TCGArg;
 
+typedef enum TCGTempVal {
+    TEMP_VAL_DEAD,
+    TEMP_VAL_REG,
+    TEMP_VAL_MEM,
+    TEMP_VAL_CONST,
+} TCGTempVal;
+
+typedef struct TCGTemp {
+    TCGReg reg:8;
+    TCGTempVal val_type:8;
+    TCGType base_type:8;
+    TCGType type:8;
+    unsigned int fixed_reg:1;
+    unsigned int indirect_reg:1;
+    unsigned int indirect_base:1;
+    unsigned int mem_coherent:1;
+    unsigned int mem_allocated:1;
+    /* If true, the temp is saved across both basic blocks and
+       translation blocks.  */
+    unsigned int temp_global:1;
+    /* If true, the temp is saved across basic blocks but dead
+       at the end of translation blocks.  If false, the temp is
+       dead at the end of basic blocks.  */
+    unsigned int temp_local:1;
+    unsigned int temp_allocated:1;
+
+    tcg_target_long val;
+    struct TCGTemp *mem_base;
+    intptr_t mem_offset;
+    const char *name;
+
+    /* Pass-specific information that can be stored for a temporary.
+       One word worth of integer data, and one pointer to data
+       allocated separately.  */
+    uintptr_t state;
+    void *state_ptr;
+} TCGTemp;
+
 /* Define type and accessor macros for TCG variables.
 
    TCG variables are the inputs and outputs of TCG ops, as described
@@ -411,25 +449,34 @@ typedef tcg_target_ulong TCGArg;
 
    Users of tcg_gen_* don't need to know about any of the internal
    details of these, and should treat them as opaque types.
-   You won't be able to look inside them in a debugger either.
 
    Internal implementation details follow:
 
-   Note that there is no definition of the structs TCGv_i32_d etc anywhere.
-   This is deliberate, because the values we store in variables of type
-   TCGv_i32 are not really pointers-to-structures. They're just small
-   integers, but keeping them in pointer types like this means that the
-   compiler will complain if you accidentally pass a TCGv_i32 to a
-   function which takes a TCGv_i64, and so on. Only the internals of
-   TCG need to care about the actual contents of the types, and they always
-   box and unbox via the MAKE_TCGV_* and GET_TCGV_* functions.
-   Converting to and from intptr_t rather than int reduces the number
-   of sign-extension instructions that get implied on 64-bit hosts.  */
-
-typedef struct TCGv_i32_d *TCGv_i32;
-typedef struct TCGv_i64_d *TCGv_i64;
-typedef struct TCGv_ptr_d *TCGv_ptr;
+   There is an array of TCGTemp structures which describe each variable.
+   For type checking purposes, we want to distinguish one TCGTemp pointer
+   from another.  We do this by creating different structure types
+   (TCGv_i32_d, TCGv_i64_d, TCGv_ptr_d) that wrap TCGTemp or a pair of them.
+   We unwrap these within tcg-op.c when generating opcodes.  After that
+   point we only have unpaired TCGTemp structures.  */
+
+typedef struct TCGv_i32_d {
+    TCGTemp impl;
+} *TCGv_i32;
+
+typedef struct TCGv_i64_d {
+#if TCG_TARGET_REG_BITS == 32
+    struct TCGv_i32_d lo, hi;
+#else
+    TCGTemp impl;
+#endif
+} *TCGv_i64;
+
+typedef struct TCGv_ptr_d {
+    TCGTemp impl;
+} *TCGv_ptr;
+
 typedef TCGv_ptr TCGv_env;
+
 #if TARGET_LONG_BITS == 32
 #define TCGv TCGv_i32
 #elif TARGET_LONG_BITS == 64
@@ -438,53 +485,23 @@ typedef TCGv_ptr TCGv_env;
 #error Unhandled TARGET_LONG_BITS value
 #endif
 
-static inline TCGv_i32 QEMU_ARTIFICIAL MAKE_TCGV_I32(intptr_t i)
-{
-    return (TCGv_i32)i;
-}
-
-static inline TCGv_i64 QEMU_ARTIFICIAL MAKE_TCGV_I64(intptr_t i)
-{
-    return (TCGv_i64)i;
-}
-
-static inline TCGv_ptr QEMU_ARTIFICIAL MAKE_TCGV_PTR(intptr_t i)
-{
-    return (TCGv_ptr)i;
-}
-
-static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_I32(TCGv_i32 t)
-{
-    return (intptr_t)t;
-}
-
-static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_I64(TCGv_i64 t)
-{
-    return (intptr_t)t;
-}
-
-static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
-{
-    return (intptr_t)t;
-}
-
 #if TCG_TARGET_REG_BITS == 32
-#define TCGV_LOW(t) MAKE_TCGV_I32(GET_TCGV_I64(t))
-#define TCGV_HIGH(t) MAKE_TCGV_I32(GET_TCGV_I64(t) + 1)
+#define TCGV_LOW(t)  (&(t)->lo)
+#define TCGV_HIGH(t) (&(t)->hi)
 #endif
 
-#define TCGV_EQUAL_I32(a, b) (GET_TCGV_I32(a) == GET_TCGV_I32(b))
-#define TCGV_EQUAL_I64(a, b) (GET_TCGV_I64(a) == GET_TCGV_I64(b))
-#define TCGV_EQUAL_PTR(a, b) (GET_TCGV_PTR(a) == GET_TCGV_PTR(b))
+#define TCGV_EQUAL_I32(a, b)  ((a) == (b))
+#define TCGV_EQUAL_I64(a, b)  ((a) == (b))
+#define TCGV_EQUAL_PTR(a, b)  ((a) == (b))
 
 /* Dummy definition to avoid compiler warnings.  */
-#define TCGV_UNUSED_I32(x) x = MAKE_TCGV_I32(-1)
-#define TCGV_UNUSED_I64(x) x = MAKE_TCGV_I64(-1)
-#define TCGV_UNUSED_PTR(x) x = MAKE_TCGV_PTR(-1)
+#define TCGV_UNUSED_I32(x)    ((x) = NULL)
+#define TCGV_UNUSED_I64(x)    ((x) = NULL)
+#define TCGV_UNUSED_PTR(x)    ((x) = NULL)
 
-#define TCGV_IS_UNUSED_I32(x) (GET_TCGV_I32(x) == -1)
-#define TCGV_IS_UNUSED_I64(x) (GET_TCGV_I64(x) == -1)
-#define TCGV_IS_UNUSED_PTR(x) (GET_TCGV_PTR(x) == -1)
+#define TCGV_IS_UNUSED_I32(x) ((x) == NULL)
+#define TCGV_IS_UNUSED_I64(x) ((x) == NULL)
+#define TCGV_IS_UNUSED_PTR(x) ((x) == NULL)
 
 /* call flags */
 /* Helper does not read globals (either directly or through an exception). It
@@ -568,44 +585,6 @@ static inline TCGCond tcg_high_cond(TCGCond c)
     }
 }
 
-typedef enum TCGTempVal {
-    TEMP_VAL_DEAD,
-    TEMP_VAL_REG,
-    TEMP_VAL_MEM,
-    TEMP_VAL_CONST,
-} TCGTempVal;
-
-typedef struct TCGTemp {
-    TCGReg reg:8;
-    TCGTempVal val_type:8;
-    TCGType base_type:8;
-    TCGType type:8;
-    unsigned int fixed_reg:1;
-    unsigned int indirect_reg:1;
-    unsigned int indirect_base:1;
-    unsigned int mem_coherent:1;
-    unsigned int mem_allocated:1;
-    /* If true, the temp is saved across both basic blocks and
-       translation blocks.  */
-    unsigned int temp_global:1;
-    /* If true, the temp is saved across basic blocks but dead
-       at the end of translation blocks.  If false, the temp is
-       dead at the end of basic blocks.  */
-    unsigned int temp_local:1;
-    unsigned int temp_allocated:1;
-
-    tcg_target_long val;
-    struct TCGTemp *mem_base;
-    intptr_t mem_offset;
-    const char *name;
-
-    /* Pass-specific information that can be stored for a temporary.
-       One word worth of integer data, and one pointer to data
-       allocated separately.  */
-    uintptr_t state;
-    void *state_ptr;
-} TCGTemp;
-
 typedef struct TCGContext TCGContext;
 
 typedef struct TCGTempSet {
@@ -755,6 +734,36 @@ static inline size_t arg_index(TCGArg a)
     return a;
 }
 
+static inline TCGv_i32 QEMU_ARTIFICIAL MAKE_TCGV_I32(TCGArg i)
+{
+    return (TCGv_i32)arg_temp(i);
+}
+
+static inline TCGv_i64 QEMU_ARTIFICIAL MAKE_TCGV_I64(TCGArg i)
+{
+    return (TCGv_i64)arg_temp(i);
+}
+
+static inline TCGv_ptr QEMU_ARTIFICIAL MAKE_TCGV_PTR(TCGArg i)
+{
+    return (TCGv_ptr)arg_temp(i);
+}
+
+static inline TCGArg QEMU_ARTIFICIAL GET_TCGV_I32(TCGv_i32 t)
+{
+    return temp_arg((TCGTemp *)t);
+}
+
+static inline TCGArg QEMU_ARTIFICIAL GET_TCGV_I64(TCGv_i64 t)
+{
+    return temp_arg((TCGTemp *)t);
+}
+
+static inline TCGArg QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
+{
+    return temp_arg((TCGTemp *)t);
+}
+
 static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)
 {
     tcg_ctx.gen_op_buf[op_idx].args[arg] = v;
@@ -807,49 +816,59 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb);
 
 void tcg_set_frame(TCGContext *s, TCGReg reg, intptr_t start, intptr_t size);
 
-int tcg_global_mem_new_internal(TCGType, TCGv_ptr, intptr_t, const char *);
+TCGTemp *tcg_global_mem_new_internal(TCGType, TCGv_ptr, intptr_t, const char *);
+TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local);
+void tcg_temp_free_internal(TCGTemp *ts);
 
 TCGv_i32 tcg_global_reg_new_i32(TCGReg reg, const char *name);
 TCGv_i64 tcg_global_reg_new_i64(TCGReg reg, const char *name);
 
-TCGv_i32 tcg_temp_new_internal_i32(int temp_local);
-TCGv_i64 tcg_temp_new_internal_i64(int temp_local);
-
-void tcg_temp_free_i32(TCGv_i32 arg);
-void tcg_temp_free_i64(TCGv_i64 arg);
-
 static inline TCGv_i32 tcg_global_mem_new_i32(TCGv_ptr reg, intptr_t offset,
                                               const char *name)
 {
-    int idx = tcg_global_mem_new_internal(TCG_TYPE_I32, reg, offset, name);
-    return MAKE_TCGV_I32(idx);
+    TCGTemp *t = tcg_global_mem_new_internal(TCG_TYPE_I32, reg, offset, name);
+    return (TCGv_i32)t;
 }
 
 static inline TCGv_i32 tcg_temp_new_i32(void)
 {
-    return tcg_temp_new_internal_i32(0);
+    TCGTemp *t = tcg_temp_new_internal(TCG_TYPE_I32, false);
+    return (TCGv_i32)t;
 }
 
 static inline TCGv_i32 tcg_temp_local_new_i32(void)
 {
-    return tcg_temp_new_internal_i32(1);
+    TCGTemp *t = tcg_temp_new_internal(TCG_TYPE_I32, true);
+    return (TCGv_i32)t;
 }
 
 static inline TCGv_i64 tcg_global_mem_new_i64(TCGv_ptr reg, intptr_t offset,
                                               const char *name)
 {
-    int idx = tcg_global_mem_new_internal(TCG_TYPE_I64, reg, offset, name);
-    return MAKE_TCGV_I64(idx);
+    TCGTemp *t = tcg_global_mem_new_internal(TCG_TYPE_I64, reg, offset, name);
+    return (TCGv_i64)t;
 }
 
 static inline TCGv_i64 tcg_temp_new_i64(void)
 {
-    return tcg_temp_new_internal_i64(0);
+    TCGTemp *t = tcg_temp_new_internal(TCG_TYPE_I64, false);
+    return (TCGv_i64)t;
 }
 
 static inline TCGv_i64 tcg_temp_local_new_i64(void)
 {
-    return tcg_temp_new_internal_i64(1);
+    TCGTemp *t = tcg_temp_new_internal(TCG_TYPE_I64, true);
+    return (TCGv_i64)t;
+}
+
+static inline void tcg_temp_free_i32(TCGv_i32 arg)
+{
+    tcg_temp_free_internal((TCGTemp *)arg);
+}
+
+static inline void tcg_temp_free_i64(TCGv_i64 arg)
+{
+    tcg_temp_free_internal((TCGTemp *)arg);
 }
 
 #if defined(CONFIG_DEBUG_TCG)
-- 
2.9.4


* [Qemu-devel] [PATCH 16/16] tcg: Store pointers to temporaries directly in TCGArg
  2017-06-21  2:48 [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end Richard Henderson
                   ` (14 preceding siblings ...)
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 15/16] tcg: Define separate structures for TCGv_* Richard Henderson
@ 2017-06-21  2:48 ` Richard Henderson
  2017-06-21  3:43 ` [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end no-reply
  2017-06-26 16:49 ` Alex Bennée
  17 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-06-21  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
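A small standalone sketch of the encoding change, using made-up names (SketchTemp, SketchArg, sketch_*) instead of the real TCG ones. It assumes the argument word is wide enough to hold a host pointer, and that integer arithmetic on a converted pointer round-trips the way it does on common flat-address hosts; neither is guaranteed by ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for TCGArg and TCGTemp. */
typedef uintptr_t SketchArg;
typedef struct SketchTemp {
    long val;
    const char *name;
} SketchTemp;

/* With pointers stored in the argument, 0 (a null pointer) can serve
   as the dummy value because it never names a valid temp. */
#define SKETCH_DUMMY_ARG 0

#define SKETCH_NB_TEMPS 4
static SketchTemp sketch_temps[SKETCH_NB_TEMPS];

/* Encode: the argument is the address of the temp itself. */
static SketchArg sketch_temp_arg(SketchTemp *ts)
{
    return (uintptr_t)ts;
}

/* Decode: no array base or scaling is needed any more. */
static SketchTemp *sketch_arg_temp(SketchArg a)
{
    return (SketchTemp *)a;
}

/* The index is still available, at the cost of a subtraction. */
static size_t sketch_arg_index(SketchArg a)
{
    size_t n = (size_t)(sketch_arg_temp(a) - sketch_temps);

    assert(n < SKETCH_NB_TEMPS);
    return n;
}

/* The adjacent temp was "a + 1" with indices; with addresses it is
   one structure further along. */
static SketchArg sketch_arg_next(SketchArg a)
{
    return a + sizeof(SketchTemp);
}
```

The last helper mirrors the change from "+ 1" to "+ sizeof(TCGTemp)" for the high half of a 64-bit value on a 32-bit host.
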
 tcg/tcg.c |  8 ++++----
 tcg/tcg.h | 14 ++++++++------
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 1ca1192..c25f455 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -810,11 +810,11 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
 #else
         if (TCG_TARGET_REG_BITS < 64 && (sizemask & 1)) {
 #ifdef HOST_WORDS_BIGENDIAN
-            op->args[pi++] = ret + 1;
+            op->args[pi++] = ret + sizeof(TCGTemp);
             op->args[pi++] = ret;
 #else
             op->args[pi++] = ret;
-            op->args[pi++] = ret + 1;
+            op->args[pi++] = ret + sizeof(TCGTemp);
 #endif
             nb_rets = 2;
         } else {
@@ -849,11 +849,11 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
               have to get more complicated to differentiate between
               stack arguments and register arguments.  */
 #if defined(HOST_WORDS_BIGENDIAN) != defined(TCG_TARGET_STACK_GROWSUP)
-            op->args[pi++] = args[i] + 1;
+            op->args[pi++] = args[i] + sizeof(TCGTemp);
             op->args[pi++] = args[i];
 #else
             op->args[pi++] = args[i];
-            op->args[pi++] = args[i] + 1;
+            op->args[pi++] = args[i] + sizeof(TCGTemp);
 #endif
             real_args += 2;
             continue;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index a5a0412..df73b31 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -520,7 +520,7 @@ typedef TCGv_ptr TCGv_env;
 #define TCG_CALL_NO_WG_SE       (TCG_CALL_NO_WG | TCG_CALL_NO_SE)
 
 /* used to align parameters */
-#define TCG_CALL_DUMMY_ARG      ((TCGArg)(-1))
+#define TCG_CALL_DUMMY_ARG      0
 
 /* Conditions.  Note that these are laid out for easy manipulation by
    the functions below:
@@ -714,24 +714,26 @@ extern bool parallel_cpus;
 
 static inline size_t temp_idx(TCGTemp *ts)
 {
-    ptrdiff_t n = ts - tcg_ctx.temps;
-    tcg_debug_assert(n >= 0 && n < tcg_ctx.nb_temps);
+    size_t n = ts - tcg_ctx.temps;
+    tcg_debug_assert(n < tcg_ctx.nb_temps);
     return n;
 }
 
 static inline TCGArg temp_arg(TCGTemp *ts)
 {
-    return temp_idx(ts);
+    size_t n = ts - tcg_ctx.temps;
+    tcg_debug_assert(n < tcg_ctx.nb_temps);
+    return (uintptr_t)ts;
 }
 
 static inline TCGTemp *arg_temp(TCGArg a)
 {
-    return a == TCG_CALL_DUMMY_ARG ? NULL : &tcg_ctx.temps[a];
+    return (TCGTemp *)(uintptr_t)a;
 }
 
 static inline size_t arg_index(TCGArg a)
 {
-    return a;
+    return temp_idx(arg_temp(a));
 }
 
 static inline TCGv_i32 QEMU_ARTIFICIAL MAKE_TCGV_I32(TCGArg i)
-- 
2.9.4


* Re: [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end
  2017-06-21  2:48 [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end Richard Henderson
                   ` (15 preceding siblings ...)
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 16/16] tcg: Store pointers to temporaries directly in TCGArg Richard Henderson
@ 2017-06-21  3:43 ` no-reply
  2017-06-26 16:49 ` Alex Bennée
  17 siblings, 0 replies; 40+ messages in thread
From: no-reply @ 2017-06-21  3:43 UTC (permalink / raw)
  To: rth; +Cc: famz, qemu-devel, aurelien

Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end
Type: series
Message-id: 20170621024831.26019-1-rth@twiddle.net

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/1498014889-52658-1-git-send-email-wanpeng.li@hotmail.com -> patchew/1498014889-52658-1-git-send-email-wanpeng.li@hotmail.com
Switched to a new branch 'test'
4a3cb84 tcg: Store pointers to temporaries directly in TCGArg
02ed29c tcg: Define separate structures for TCGv_*
5cf0662 tcg: Use per-temp state data in optimize
5fdbde2 tcg: Export temp_idx
15a10ce tcg: Remove unused TCG_CALL_DUMMY_TCGV
4fa8938 tcg: Change temp_allocate_frame arg to TCGTemp
4c11a05 tcg: Avoid loops against variable bounds
6976828 tcg: Use per-temp state data in liveness
1f69381 tcg: Introduce temp_arg
af94763 tcg: Return NULL temp for TCG_CALL_DUMMY_ARG
cb3e123 tcg: Add temp_global bit to TCGTemp
11681e4 tcg: Introduce arg_temp
0d81916 tcg: Propagate TCGOp down to allocators
b2dc5b3 tcg: Propagate args to op->args in tcg.c
d095548 tcg: Propagate args to op->args in optimizer
33e16df tcg: Merge opcode arguments into TCGOp

=== OUTPUT BEGIN ===
Checking PATCH 1/16: tcg: Merge opcode arguments into TCGOp...
ERROR: spaces prohibited around that ':' (ctx:WxW)
#480: FILE: tcg/tcg.h:619:
+    unsigned calli  : 4;        /* 12 */
                     ^

ERROR: spaces prohibited around that ':' (ctx:WxW)
#481: FILE: tcg/tcg.h:620:
+    unsigned callo  : 2;        /* 14 */
                     ^

ERROR: space prohibited before that ':' (ctx:WxW)
#482: FILE: tcg/tcg.h:621:
+    unsigned        : 2;        /* 16 */
                     ^

ERROR: spaces prohibited around that ':' (ctx:WxW)
#487: FILE: tcg/tcg.h:624:
+    unsigned prev   : 16;       /* 32 */
                     ^

ERROR: spaces prohibited around that ':' (ctx:WxW)
#488: FILE: tcg/tcg.h:625:
+    unsigned next   : 16;       /* 48 */
                     ^

total: 5 errors, 0 warnings, 481 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/16: tcg: Propagate args to op->args in optimizer...
ERROR: spaces required around that '-' (ctx:VxV)
#644: FILE: tcg/optimize.c:1165:
+                tcg_opt_gen_mov(s, op, op->args[0], op->args[4-tmp]);
                                                               ^

total: 1 errors, 0 warnings, 912 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 3/16: tcg: Propagate args to op->args in tcg.c...
Checking PATCH 4/16: tcg: Propagate TCGOp down to allocators...
Checking PATCH 5/16: tcg: Introduce arg_temp...
Checking PATCH 6/16: tcg: Add temp_global bit to TCGTemp...
Checking PATCH 7/16: tcg: Return NULL temp for TCG_CALL_DUMMY_ARG...
Checking PATCH 8/16: tcg: Introduce temp_arg...
Checking PATCH 9/16: tcg: Use per-temp state data in liveness...
WARNING: line over 80 characters
#186: FILE: tcg/tcg.c:1572:
+            } else if (arg_temp(op->args[0])->state == TS_DEAD && have_opc_new2) {

total: 0 errors, 1 warnings, 442 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 10/16: tcg: Avoid loops against variable bounds...
Checking PATCH 11/16: tcg: Change temp_allocate_frame arg to TCGTemp...
Checking PATCH 12/16: tcg: Remove unused TCG_CALL_DUMMY_TCGV...
Checking PATCH 13/16: tcg: Export temp_idx...
Checking PATCH 14/16: tcg: Use per-temp state data in optimize...
Checking PATCH 15/16: tcg: Define separate structures for TCGv_*...
Checking PATCH 16/16: tcg: Store pointers to temporaries directly in TCGArg...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org


* Re: [Qemu-devel] [PATCH 01/16] tcg: Merge opcode arguments into TCGOp
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 01/16] tcg: Merge opcode arguments into TCGOp Richard Henderson
@ 2017-06-26 14:44   ` Alex Bennée
  2017-06-26 14:55     ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Bennée @ 2017-06-26 14:44 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien


Richard Henderson <rth@twiddle.net> writes:

> Rather than have a separate buffer of 10*max_ops entries,
> give each opcode 10 entries.  The result is actually a bit
> smaller and should have slightly more cache locality.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
<snip>

The changes look fine; some questions below:

> index 9e37722..720e04e 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -51,8 +51,6 @@
>  #define OPC_BUF_SIZE 640
>  #define OPC_MAX_SIZE (OPC_BUF_SIZE - MAX_OP_PER_INSTR)
>
> -#define OPPARAM_BUF_SIZE (OPC_BUF_SIZE * MAX_OPC_PARAM)
> -
>  #define CPU_TEMP_BUF_NLONGS 128
>
>  /* Default target word size to pointer size.  */
> @@ -613,33 +611,29 @@ typedef struct TCGTempSet {
>  #define SYNC_ARG  1
>  typedef uint16_t TCGLifeData;
>
> -/* The layout here is designed to avoid crossing of a 32-bit boundary.
> -   If we do so, gcc adds padding, expanding the size to 12.  */
> +/* The layout here is designed to avoid crossing of a 32-bit boundary.  */

This isn't correct now? Do we mean we now aim to be cache line aligned?

>  typedef struct TCGOp {
>      TCGOpcode opc   : 8;        /*  8 */
>
> -    /* Index of the prev/next op, or 0 for the end of the list.  */
> -    unsigned prev   : 10;       /* 18 */
> -    unsigned next   : 10;       /* 28 */
> -
>      /* The number of out and in parameter for a call.  */
> -    unsigned calli  : 4;        /* 32 */
> -    unsigned callo  : 2;        /* 34 */
> +    unsigned calli  : 4;        /* 12 */
> +    unsigned callo  : 2;        /* 14 */
> +    unsigned        : 2;        /* 16 */
>
> -    /* Index of the arguments for this op, or 0 for zero-operand ops.  */
> -    unsigned args   : 14;       /* 48 */
> +    /* Index of the prev/next op, or 0 for the end of the list.  */
> +    unsigned prev   : 16;       /* 32 */
> +    unsigned next   : 16;       /* 48 */
>
>      /* Lifetime data of the operands.  */
>      unsigned life   : 16;       /* 64 */
> +
> +    /* Arguments for the opcode.  */
> +    TCGArg args[MAX_OPC_PARAM];
>  } TCGOp;
>
>  /* Make sure operands fit in the bitfields above.  */
>  QEMU_BUILD_BUG_ON(NB_OPS > (1 << 8));
> -QEMU_BUILD_BUG_ON(OPC_BUF_SIZE > (1 << 10));
> -QEMU_BUILD_BUG_ON(OPPARAM_BUF_SIZE > (1 << 14));
> -
> -/* Make sure that we don't overflow 64 bits without noticing.  */
> -QEMU_BUILD_BUG_ON(sizeof(TCGOp) > 8);
> +QEMU_BUILD_BUG_ON(OPC_BUF_SIZE > (1 << 16));

OPC_BUF_SIZE is statically assigned, and we don't seem to be taking notice
of sizeof(TCGOp) anymore here. In fact, OPC_BUF_SIZE is really
MAX_TCG_OPS, right?
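
To make the bitfield concern concrete, here is a rough sketch of the kind
of compile-time check being discussed. It is not the QEMU header: Op,
BUF_SIZE, MAX_PARAM, LINK_BITS and store_link are hypothetical names, with
640 taken from the OPC_BUF_SIZE define quoted above and 10 from the commit
message. Bitfield packing is implementation-defined, so the sketch checks
only the index range, not sizeof:

```c
#include <assert.h>
#include <stdint.h>

#define BUF_SIZE   640  /* mirrors the quoted OPC_BUF_SIZE */
#define MAX_PARAM  10   /* "10 entries" per the commit message */
#define LINK_BITS  16   /* width of the prev/next fields in the patch */

/* Hypothetical stand-in for TCGOp with the arguments stored inline. */
typedef struct Op {
    unsigned opc   : 8;
    unsigned calli : 4;
    unsigned callo : 2;
    unsigned       : 2;          /* padding up to 16 bits */
    unsigned prev  : LINK_BITS;
    unsigned next  : LINK_BITS;
    unsigned life  : 16;
    uintptr_t args[MAX_PARAM];
} Op;

/* Every valid op index must be representable in prev/next. */
_Static_assert(BUF_SIZE <= (1 << LINK_BITS),
               "op index does not fit in the link bitfields");

/* Store an index in a link field and read it back. */
static unsigned store_link(unsigned idx)
{
    Op op = { 0 };
    op.next = idx;   /* unsigned bitfield: value is reduced mod 2^16 */
    return op.next;
}
```

An index of BUF_SIZE - 1 survives the round trip, while 1 << LINK_BITS
would wrap to 0, which is exactly the silent truncation the build-time
assertion is meant to rule out.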

I see TCGArg is currently target_ulong. Is this because we never leak
the host size details into generated code, save for the statically
assigned env_ptr?

I mention this because in looking at modelling SIMD registers I'm going
to need to carry a host ptr around in TCG registers that can be passed
to helpers and the like.

>
>  struct TCGContext {
>      uint8_t *pool_cur, *pool_end;
> @@ -691,7 +685,6 @@ struct TCGContext {
>  #endif
>
>      int gen_next_op_idx;
> -    int gen_next_parm_idx;
>
>      /* Code generation.  Note that we specifically do not use tcg_insn_unit
>         here, because there's too much arithmetic throughout that relies
> @@ -723,7 +716,6 @@ struct TCGContext {
>      TCGTemp *reg_to_temp[TCG_TARGET_NB_REGS];
>
>      TCGOp gen_op_buf[OPC_BUF_SIZE];
> -    TCGArg gen_opparam_buf[OPPARAM_BUF_SIZE];
>
>      uint16_t gen_insn_end_off[TCG_MAX_INSNS];
>      target_ulong gen_insn_data[TCG_MAX_INSNS][TARGET_INSN_START_WORDS];
> @@ -734,8 +726,7 @@ extern bool parallel_cpus;
>
>  static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)
>  {
> -    int op_argi = tcg_ctx.gen_op_buf[op_idx].args;
> -    tcg_ctx.gen_opparam_buf[op_argi + arg] = v;
> +    tcg_ctx.gen_op_buf[op_idx].args[arg] = v;
>  }
>
>  /* The number of opcodes emitted so far.  */


--
Alex Bennée


* Re: [Qemu-devel] [PATCH 02/16] tcg: Propagate args to op->args in optimizer
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 02/16] tcg: Propagate args to op->args in optimizer Richard Henderson
@ 2017-06-26 14:53   ` Alex Bennée
  0 siblings, 0 replies; 40+ messages in thread
From: Alex Bennée @ 2017-06-26 14:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/optimize.c | 430 ++++++++++++++++++++++++++++++---------------------------
>  1 file changed, 227 insertions(+), 203 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 002aad6..1a1c6fb 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -166,8 +166,7 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
>      return false;
>  }
>
> -static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
> -                             TCGArg dst, TCGArg val)
> +static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg val)
>  {
>      TCGOpcode new_op = op_to_movi(op->opc);
>      tcg_target_ulong mask;
> @@ -184,12 +183,11 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
>      }
>      temps[dst].mask = mask;
>
> -    args[0] = dst;
> -    args[1] = val;
> +    op->args[0] = dst;
> +    op->args[1] = val;
>  }
>
> -static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
> -                            TCGArg dst, TCGArg src)
> +static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
>  {
>      if (temps_are_copies(dst, src)) {
>          tcg_op_remove(s, op);
> @@ -218,8 +216,8 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
>          temps[dst].val = temps[src].val;
>      }
>
> -    args[0] = dst;
> -    args[1] = src;
> +    op->args[0] = dst;
> +    op->args[1] = src;
>  }
>
>  static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
> @@ -559,7 +557,7 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
>  void tcg_optimize(TCGContext *s)
>  {
>      int oi, oi_next, nb_temps, nb_globals;
> -    TCGArg *prev_mb_args = NULL;
> +    TCGOp *prev_mb = NULL;
>
>      /* Array VALS has an element for each temp.
>         If this temp holds a constant then its value is kept in VALS' element.
> @@ -576,7 +574,6 @@ void tcg_optimize(TCGContext *s)
>          TCGArg tmp;
>
>          TCGOp * const op = &s->gen_op_buf[oi];
> -        TCGArg * const args = op->args;
>          TCGOpcode opc = op->opc;
>          const TCGOpDef *def = &tcg_op_defs[opc];
>
> @@ -588,7 +585,7 @@ void tcg_optimize(TCGContext *s)
>              nb_oargs = op->callo;
>              nb_iargs = op->calli;
>              for (i = 0; i < nb_oargs + nb_iargs; i++) {
> -                tmp = args[i];
> +                tmp = op->args[i];
>                  if (tmp != TCG_CALL_DUMMY_ARG) {
>                      init_temp_info(tmp);
>                  }
> @@ -597,14 +594,14 @@ void tcg_optimize(TCGContext *s)
>              nb_oargs = def->nb_oargs;
>              nb_iargs = def->nb_iargs;
>              for (i = 0; i < nb_oargs + nb_iargs; i++) {
> -                init_temp_info(args[i]);
> +                init_temp_info(op->args[i]);
>              }
>          }
>
>          /* Do copy propagation */
>          for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
> -            if (temp_is_copy(args[i])) {
> -                args[i] = find_better_copy(s, args[i]);
> +            if (temp_is_copy(op->args[i])) {
> +                op->args[i] = find_better_copy(s, op->args[i]);
>              }
>          }
>
> @@ -620,45 +617,45 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(nor):
>          CASE_OP_32_64(muluh):
>          CASE_OP_32_64(mulsh):
> -            swap_commutative(args[0], &args[1], &args[2]);
> +            swap_commutative(op->args[0], &op->args[1], &op->args[2]);
>              break;
>          CASE_OP_32_64(brcond):
> -            if (swap_commutative(-1, &args[0], &args[1])) {
> -                args[2] = tcg_swap_cond(args[2]);
> +            if (swap_commutative(-1, &op->args[0], &op->args[1])) {
> +                op->args[2] = tcg_swap_cond(op->args[2]);
>              }
>              break;
>          CASE_OP_32_64(setcond):
> -            if (swap_commutative(args[0], &args[1], &args[2])) {
> -                args[3] = tcg_swap_cond(args[3]);
> +            if (swap_commutative(op->args[0], &op->args[1], &op->args[2])) {
> +                op->args[3] = tcg_swap_cond(op->args[3]);
>              }
>              break;
>          CASE_OP_32_64(movcond):
> -            if (swap_commutative(-1, &args[1], &args[2])) {
> -                args[5] = tcg_swap_cond(args[5]);
> +            if (swap_commutative(-1, &op->args[1], &op->args[2])) {
> +                op->args[5] = tcg_swap_cond(op->args[5]);
>              }
>              /* For movcond, we canonicalize the "false" input reg to match
>                 the destination reg so that the tcg backend can implement
>                 a "move if true" operation.  */
> -            if (swap_commutative(args[0], &args[4], &args[3])) {
> -                args[5] = tcg_invert_cond(args[5]);
> +            if (swap_commutative(op->args[0], &op->args[4], &op->args[3])) {
> +                op->args[5] = tcg_invert_cond(op->args[5]);
>              }
>              break;
>          CASE_OP_32_64(add2):
> -            swap_commutative(args[0], &args[2], &args[4]);
> -            swap_commutative(args[1], &args[3], &args[5]);
> +            swap_commutative(op->args[0], &op->args[2], &op->args[4]);
> +            swap_commutative(op->args[1], &op->args[3], &op->args[5]);
>              break;
>          CASE_OP_32_64(mulu2):
>          CASE_OP_32_64(muls2):
> -            swap_commutative(args[0], &args[2], &args[3]);
> +            swap_commutative(op->args[0], &op->args[2], &op->args[3]);
>              break;
>          case INDEX_op_brcond2_i32:
> -            if (swap_commutative2(&args[0], &args[2])) {
> -                args[4] = tcg_swap_cond(args[4]);
> +            if (swap_commutative2(&op->args[0], &op->args[2])) {
> +                op->args[4] = tcg_swap_cond(op->args[4]);
>              }
>              break;
>          case INDEX_op_setcond2_i32:
> -            if (swap_commutative2(&args[1], &args[3])) {
> -                args[5] = tcg_swap_cond(args[5]);
> +            if (swap_commutative2(&op->args[1], &op->args[3])) {
> +                op->args[5] = tcg_swap_cond(op->args[5]);
>              }
>              break;
>          default:
> @@ -673,8 +670,8 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(sar):
>          CASE_OP_32_64(rotl):
>          CASE_OP_32_64(rotr):
> -            if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
> -                tcg_opt_gen_movi(s, op, args, args[0], 0);
> +            if (temp_is_const(op->args[1]) && temps[op->args[1]].val == 0) {
> +                tcg_opt_gen_movi(s, op, op->args[0], 0);
>                  continue;
>              }
>              break;
> @@ -683,7 +680,7 @@ void tcg_optimize(TCGContext *s)
>                  TCGOpcode neg_op;
>                  bool have_neg;
>
> -                if (temp_is_const(args[2])) {
> +                if (temp_is_const(op->args[2])) {
>                      /* Proceed with possible constant folding. */
>                      break;
>                  }
> @@ -697,40 +694,45 @@ void tcg_optimize(TCGContext *s)
>                  if (!have_neg) {
>                      break;
>                  }
> -                if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
> +                if (temp_is_const(op->args[1])
> +                    && temps[op->args[1]].val == 0) {
>                      op->opc = neg_op;
> -                    reset_temp(args[0]);
> -                    args[1] = args[2];
> +                    reset_temp(op->args[0]);
> +                    op->args[1] = op->args[2];
>                      continue;
>                  }
>              }
>              break;
>          CASE_OP_32_64(xor):
>          CASE_OP_32_64(nand):
> -            if (!temp_is_const(args[1])
> -                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
> +            if (!temp_is_const(op->args[1])
> +                && temp_is_const(op->args[2])
> +                && temps[op->args[2]].val == -1) {
>                  i = 1;
>                  goto try_not;
>              }
>              break;
>          CASE_OP_32_64(nor):
> -            if (!temp_is_const(args[1])
> -                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
> +            if (!temp_is_const(op->args[1])
> +                && temp_is_const(op->args[2])
> +                && temps[op->args[2]].val == 0) {
>                  i = 1;
>                  goto try_not;
>              }
>              break;
>          CASE_OP_32_64(andc):
> -            if (!temp_is_const(args[2])
> -                && temp_is_const(args[1]) && temps[args[1]].val == -1) {
> +            if (!temp_is_const(op->args[2])
> +                && temp_is_const(op->args[1])
> +                && temps[op->args[1]].val == -1) {
>                  i = 2;
>                  goto try_not;
>              }
>              break;
>          CASE_OP_32_64(orc):
>          CASE_OP_32_64(eqv):
> -            if (!temp_is_const(args[2])
> -                && temp_is_const(args[1]) && temps[args[1]].val == 0) {
> +            if (!temp_is_const(op->args[2])
> +                && temp_is_const(op->args[1])
> +                && temps[op->args[1]].val == 0) {
>                  i = 2;
>                  goto try_not;
>              }
> @@ -751,8 +753,8 @@ void tcg_optimize(TCGContext *s)
>                      break;
>                  }
>                  op->opc = not_op;
> -                reset_temp(args[0]);
> -                args[1] = args[i];
> +                reset_temp(op->args[0]);
> +                op->args[1] = op->args[i];
>                  continue;
>              }
>          default:
> @@ -771,18 +773,20 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(or):
>          CASE_OP_32_64(xor):
>          CASE_OP_32_64(andc):
> -            if (!temp_is_const(args[1])
> -                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
> -                tcg_opt_gen_mov(s, op, args, args[0], args[1]);
> +            if (!temp_is_const(op->args[1])
> +                && temp_is_const(op->args[2])
> +                && temps[op->args[2]].val == 0) {
> +                tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
>                  continue;
>              }
>              break;
>          CASE_OP_32_64(and):
>          CASE_OP_32_64(orc):
>          CASE_OP_32_64(eqv):
> -            if (!temp_is_const(args[1])
> -                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
> -                tcg_opt_gen_mov(s, op, args, args[0], args[1]);
> +            if (!temp_is_const(op->args[1])
> +                && temp_is_const(op->args[2])
> +                && temps[op->args[2]].val == -1) {
> +                tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
>                  continue;
>              }
>              break;
> @@ -796,21 +800,21 @@ void tcg_optimize(TCGContext *s)
>          affected = -1;
>          switch (opc) {
>          CASE_OP_32_64(ext8s):
> -            if ((temps[args[1]].mask & 0x80) != 0) {
> +            if ((temps[op->args[1]].mask & 0x80) != 0) {
>                  break;
>              }
>          CASE_OP_32_64(ext8u):
>              mask = 0xff;
>              goto and_const;
>          CASE_OP_32_64(ext16s):
> -            if ((temps[args[1]].mask & 0x8000) != 0) {
> +            if ((temps[op->args[1]].mask & 0x8000) != 0) {
>                  break;
>              }
>          CASE_OP_32_64(ext16u):
>              mask = 0xffff;
>              goto and_const;
>          case INDEX_op_ext32s_i64:
> -            if ((temps[args[1]].mask & 0x80000000) != 0) {
> +            if ((temps[op->args[1]].mask & 0x80000000) != 0) {
>                  break;
>              }
>          case INDEX_op_ext32u_i64:
> @@ -818,110 +822,111 @@ void tcg_optimize(TCGContext *s)
>              goto and_const;
>
>          CASE_OP_32_64(and):
> -            mask = temps[args[2]].mask;
> -            if (temp_is_const(args[2])) {
> +            mask = temps[op->args[2]].mask;
> +            if (temp_is_const(op->args[2])) {
>          and_const:
> -                affected = temps[args[1]].mask & ~mask;
> +                affected = temps[op->args[1]].mask & ~mask;
>              }
> -            mask = temps[args[1]].mask & mask;
> +            mask = temps[op->args[1]].mask & mask;
>              break;
>
>          case INDEX_op_ext_i32_i64:
> -            if ((temps[args[1]].mask & 0x80000000) != 0) {
> +            if ((temps[op->args[1]].mask & 0x80000000) != 0) {
>                  break;
>              }
>          case INDEX_op_extu_i32_i64:
>              /* We do not compute affected as it is a size changing op.  */
> -            mask = (uint32_t)temps[args[1]].mask;
> +            mask = (uint32_t)temps[op->args[1]].mask;
>              break;
>
>          CASE_OP_32_64(andc):
>              /* Known-zeros does not imply known-ones.  Therefore unless
> -               args[2] is constant, we can't infer anything from it.  */
> -            if (temp_is_const(args[2])) {
> -                mask = ~temps[args[2]].mask;
> +               op->args[2] is constant, we can't infer anything from it.  */
> +            if (temp_is_const(op->args[2])) {
> +                mask = ~temps[op->args[2]].mask;
>                  goto and_const;
>              }
> -            /* But we certainly know nothing outside args[1] may be set. */
> -            mask = temps[args[1]].mask;
> +            /* But we certainly know nothing outside op->args[1] may be set. */
> +            mask = temps[op->args[1]].mask;
>              break;
>
>          case INDEX_op_sar_i32:
> -            if (temp_is_const(args[2])) {
> -                tmp = temps[args[2]].val & 31;
> -                mask = (int32_t)temps[args[1]].mask >> tmp;
> +            if (temp_is_const(op->args[2])) {
> +                tmp = temps[op->args[2]].val & 31;
> +                mask = (int32_t)temps[op->args[1]].mask >> tmp;
>              }
>              break;
>          case INDEX_op_sar_i64:
> -            if (temp_is_const(args[2])) {
> -                tmp = temps[args[2]].val & 63;
> -                mask = (int64_t)temps[args[1]].mask >> tmp;
> +            if (temp_is_const(op->args[2])) {
> +                tmp = temps[op->args[2]].val & 63;
> +                mask = (int64_t)temps[op->args[1]].mask >> tmp;
>              }
>              break;
>
>          case INDEX_op_shr_i32:
> -            if (temp_is_const(args[2])) {
> -                tmp = temps[args[2]].val & 31;
> -                mask = (uint32_t)temps[args[1]].mask >> tmp;
> +            if (temp_is_const(op->args[2])) {
> +                tmp = temps[op->args[2]].val & 31;
> +                mask = (uint32_t)temps[op->args[1]].mask >> tmp;
>              }
>              break;
>          case INDEX_op_shr_i64:
> -            if (temp_is_const(args[2])) {
> -                tmp = temps[args[2]].val & 63;
> -                mask = (uint64_t)temps[args[1]].mask >> tmp;
> +            if (temp_is_const(op->args[2])) {
> +                tmp = temps[op->args[2]].val & 63;
> +                mask = (uint64_t)temps[op->args[1]].mask >> tmp;
>              }
>              break;
>
>          case INDEX_op_extrl_i64_i32:
> -            mask = (uint32_t)temps[args[1]].mask;
> +            mask = (uint32_t)temps[op->args[1]].mask;
>              break;
>          case INDEX_op_extrh_i64_i32:
> -            mask = (uint64_t)temps[args[1]].mask >> 32;
> +            mask = (uint64_t)temps[op->args[1]].mask >> 32;
>              break;
>
>          CASE_OP_32_64(shl):
> -            if (temp_is_const(args[2])) {
> -                tmp = temps[args[2]].val & (TCG_TARGET_REG_BITS - 1);
> -                mask = temps[args[1]].mask << tmp;
> +            if (temp_is_const(op->args[2])) {
> +                tmp = temps[op->args[2]].val & (TCG_TARGET_REG_BITS - 1);
> +                mask = temps[op->args[1]].mask << tmp;
>              }
>              break;
>
>          CASE_OP_32_64(neg):
>              /* Set to 1 all bits to the left of the rightmost.  */
> -            mask = -(temps[args[1]].mask & -temps[args[1]].mask);
> +            mask = -(temps[op->args[1]].mask & -temps[op->args[1]].mask);
>              break;
>
>          CASE_OP_32_64(deposit):
> -            mask = deposit64(temps[args[1]].mask, args[3], args[4],
> -                             temps[args[2]].mask);
> +            mask = deposit64(temps[op->args[1]].mask, op->args[3],
> +                             op->args[4], temps[op->args[2]].mask);
>              break;
>
>          CASE_OP_32_64(extract):
> -            mask = extract64(temps[args[1]].mask, args[2], args[3]);
> -            if (args[2] == 0) {
> -                affected = temps[args[1]].mask & ~mask;
> +            mask = extract64(temps[op->args[1]].mask, op->args[2], op->args[3]);
> +            if (op->args[2] == 0) {
> +                affected = temps[op->args[1]].mask & ~mask;
>              }
>              break;
>          CASE_OP_32_64(sextract):
> -            mask = sextract64(temps[args[1]].mask, args[2], args[3]);
> -            if (args[2] == 0 && (tcg_target_long)mask >= 0) {
> -                affected = temps[args[1]].mask & ~mask;
> +            mask = sextract64(temps[op->args[1]].mask,
> +                              op->args[2], op->args[3]);
> +            if (op->args[2] == 0 && (tcg_target_long)mask >= 0) {
> +                affected = temps[op->args[1]].mask & ~mask;
>              }
>              break;
>
>          CASE_OP_32_64(or):
>          CASE_OP_32_64(xor):
> -            mask = temps[args[1]].mask | temps[args[2]].mask;
> +            mask = temps[op->args[1]].mask | temps[op->args[2]].mask;
>              break;
>
>          case INDEX_op_clz_i32:
>          case INDEX_op_ctz_i32:
> -            mask = temps[args[2]].mask | 31;
> +            mask = temps[op->args[2]].mask | 31;
>              break;
>
>          case INDEX_op_clz_i64:
>          case INDEX_op_ctz_i64:
> -            mask = temps[args[2]].mask | 63;
> +            mask = temps[op->args[2]].mask | 63;
>              break;
>
>          case INDEX_op_ctpop_i32:
> @@ -937,7 +942,7 @@ void tcg_optimize(TCGContext *s)
>              break;
>
>          CASE_OP_32_64(movcond):
> -            mask = temps[args[3]].mask | temps[args[4]].mask;
> +            mask = temps[op->args[3]].mask | temps[op->args[4]].mask;
>              break;
>
>          CASE_OP_32_64(ld8u):
> @@ -952,7 +957,7 @@ void tcg_optimize(TCGContext *s)
>
>          CASE_OP_32_64(qemu_ld):
>              {
> -                TCGMemOpIdx oi = args[nb_oargs + nb_iargs];
> +                TCGMemOpIdx oi = op->args[nb_oargs + nb_iargs];
>                  TCGMemOp mop = get_memop(oi);
>                  if (!(mop & MO_SIGN)) {
>                      mask = (2ULL << ((8 << (mop & MO_SIZE)) - 1)) - 1;
> @@ -976,12 +981,12 @@ void tcg_optimize(TCGContext *s)
>
>          if (partmask == 0) {
>              tcg_debug_assert(nb_oargs == 1);
> -            tcg_opt_gen_movi(s, op, args, args[0], 0);
> +            tcg_opt_gen_movi(s, op, op->args[0], 0);
>              continue;
>          }
>          if (affected == 0) {
>              tcg_debug_assert(nb_oargs == 1);
> -            tcg_opt_gen_mov(s, op, args, args[0], args[1]);
> +            tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
>              continue;
>          }
>
> @@ -991,8 +996,8 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(mul):
>          CASE_OP_32_64(muluh):
>          CASE_OP_32_64(mulsh):
> -            if ((temp_is_const(args[2]) && temps[args[2]].val == 0)) {
> -                tcg_opt_gen_movi(s, op, args, args[0], 0);
> +            if ((temp_is_const(op->args[2]) && temps[op->args[2]].val == 0)) {
> +                tcg_opt_gen_movi(s, op, op->args[0], 0);
>                  continue;
>              }
>              break;
> @@ -1004,8 +1009,8 @@ void tcg_optimize(TCGContext *s)
>          switch (opc) {
>          CASE_OP_32_64(or):
>          CASE_OP_32_64(and):
> -            if (temps_are_copies(args[1], args[2])) {
> -                tcg_opt_gen_mov(s, op, args, args[0], args[1]);
> +            if (temps_are_copies(op->args[1], op->args[2])) {
> +                tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
>                  continue;
>              }
>              break;
> @@ -1018,8 +1023,8 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(andc):
>          CASE_OP_32_64(sub):
>          CASE_OP_32_64(xor):
> -            if (temps_are_copies(args[1], args[2])) {
> -                tcg_opt_gen_movi(s, op, args, args[0], 0);
> +            if (temps_are_copies(op->args[1], op->args[2])) {
> +                tcg_opt_gen_movi(s, op, op->args[0], 0);
>                  continue;
>              }
>              break;
> @@ -1032,10 +1037,10 @@ void tcg_optimize(TCGContext *s)
>             allocator where needed and possible.  Also detect copies. */
>          switch (opc) {
>          CASE_OP_32_64(mov):
> -            tcg_opt_gen_mov(s, op, args, args[0], args[1]);
> +            tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
>              break;
>          CASE_OP_32_64(movi):
> -            tcg_opt_gen_movi(s, op, args, args[0], args[1]);
> +            tcg_opt_gen_movi(s, op, op->args[0], op->args[1]);
>              break;
>
>          CASE_OP_32_64(not):
> @@ -1051,9 +1056,9 @@ void tcg_optimize(TCGContext *s)
>          case INDEX_op_extu_i32_i64:
>          case INDEX_op_extrl_i64_i32:
>          case INDEX_op_extrh_i64_i32:
> -            if (temp_is_const(args[1])) {
> -                tmp = do_constant_folding(opc, temps[args[1]].val, 0);
> -                tcg_opt_gen_movi(s, op, args, args[0], tmp);
> +            if (temp_is_const(op->args[1])) {
> +                tmp = do_constant_folding(opc, temps[op->args[1]].val, 0);
> +                tcg_opt_gen_movi(s, op, op->args[0], tmp);
>                  break;
>              }
>              goto do_default;
> @@ -1080,68 +1085,72 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(divu):
>          CASE_OP_32_64(rem):
>          CASE_OP_32_64(remu):
> -            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
> -                tmp = do_constant_folding(opc, temps[args[1]].val,
> -                                          temps[args[2]].val);
> -                tcg_opt_gen_movi(s, op, args, args[0], tmp);
> +            if (temp_is_const(op->args[1]) && temp_is_const(op->args[2])) {
> +                tmp = do_constant_folding(opc, temps[op->args[1]].val,
> +                                          temps[op->args[2]].val);
> +                tcg_opt_gen_movi(s, op, op->args[0], tmp);
>                  break;
>              }
>              goto do_default;
>
>          CASE_OP_32_64(clz):
>          CASE_OP_32_64(ctz):
> -            if (temp_is_const(args[1])) {
> -                TCGArg v = temps[args[1]].val;
> +            if (temp_is_const(op->args[1])) {
> +                TCGArg v = temps[op->args[1]].val;
>                  if (v != 0) {
>                      tmp = do_constant_folding(opc, v, 0);
> -                    tcg_opt_gen_movi(s, op, args, args[0], tmp);
> +                    tcg_opt_gen_movi(s, op, op->args[0], tmp);
>                  } else {
> -                    tcg_opt_gen_mov(s, op, args, args[0], args[2]);
> +                    tcg_opt_gen_mov(s, op, op->args[0], op->args[2]);
>                  }
>                  break;
>              }
>              goto do_default;
>
>          CASE_OP_32_64(deposit):
> -            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
> -                tmp = deposit64(temps[args[1]].val, args[3], args[4],
> -                                temps[args[2]].val);
> -                tcg_opt_gen_movi(s, op, args, args[0], tmp);
> +            if (temp_is_const(op->args[1]) && temp_is_const(op->args[2])) {
> +                tmp = deposit64(temps[op->args[1]].val, op->args[3],
> +                                op->args[4], temps[op->args[2]].val);
> +                tcg_opt_gen_movi(s, op, op->args[0], tmp);
>                  break;
>              }
>              goto do_default;
>
>          CASE_OP_32_64(extract):
> -            if (temp_is_const(args[1])) {
> -                tmp = extract64(temps[args[1]].val, args[2], args[3]);
> -                tcg_opt_gen_movi(s, op, args, args[0], tmp);
> +            if (temp_is_const(op->args[1])) {
> +                tmp = extract64(temps[op->args[1]].val,
> +                                op->args[2], op->args[3]);
> +                tcg_opt_gen_movi(s, op, op->args[0], tmp);
>                  break;
>              }
>              goto do_default;
>
>          CASE_OP_32_64(sextract):
> -            if (temp_is_const(args[1])) {
> -                tmp = sextract64(temps[args[1]].val, args[2], args[3]);
> -                tcg_opt_gen_movi(s, op, args, args[0], tmp);
> +            if (temp_is_const(op->args[1])) {
> +                tmp = sextract64(temps[op->args[1]].val,
> +                                 op->args[2], op->args[3]);
> +                tcg_opt_gen_movi(s, op, op->args[0], tmp);
>                  break;
>              }
>              goto do_default;
>
>          CASE_OP_32_64(setcond):
> -            tmp = do_constant_folding_cond(opc, args[1], args[2], args[3]);
> +            tmp = do_constant_folding_cond(opc, op->args[1],
> +                                           op->args[2], op->args[3]);
>              if (tmp != 2) {
> -                tcg_opt_gen_movi(s, op, args, args[0], tmp);
> +                tcg_opt_gen_movi(s, op, op->args[0], tmp);
>                  break;
>              }
>              goto do_default;
>
>          CASE_OP_32_64(brcond):
> -            tmp = do_constant_folding_cond(opc, args[0], args[1], args[2]);
> +            tmp = do_constant_folding_cond(opc, op->args[0],
> +                                           op->args[1], op->args[2]);
>              if (tmp != 2) {
>                  if (tmp) {
>                      reset_all_temps(nb_temps);
>                      op->opc = INDEX_op_br;
> -                    args[0] = args[3];
> +                    op->args[0] = op->args[3];
>                  } else {
>                      tcg_op_remove(s, op);
>                  }
> @@ -1150,21 +1159,22 @@ void tcg_optimize(TCGContext *s)
>              goto do_default;
>
>          CASE_OP_32_64(movcond):
> -            tmp = do_constant_folding_cond(opc, args[1], args[2], args[5]);
> +            tmp = do_constant_folding_cond(opc, op->args[1],
> +                                           op->args[2], op->args[5]);
>              if (tmp != 2) {
> -                tcg_opt_gen_mov(s, op, args, args[0], args[4-tmp]);
> +                tcg_opt_gen_mov(s, op, op->args[0], op->args[4-tmp]);
>                  break;
>              }
> -            if (temp_is_const(args[3]) && temp_is_const(args[4])) {
> -                tcg_target_ulong tv = temps[args[3]].val;
> -                tcg_target_ulong fv = temps[args[4]].val;
> -                TCGCond cond = args[5];
> +            if (temp_is_const(op->args[3]) && temp_is_const(op->args[4])) {
> +                tcg_target_ulong tv = temps[op->args[3]].val;
> +                tcg_target_ulong fv = temps[op->args[4]].val;
> +                TCGCond cond = op->args[5];
>                  if (fv == 1 && tv == 0) {
>                      cond = tcg_invert_cond(cond);
>                  } else if (!(tv == 1 && fv == 0)) {
>                      goto do_default;
>                  }
> -                args[3] = cond;
> +                op->args[3] = cond;
>                  op->opc = opc = (opc == INDEX_op_movcond_i32
>                                   ? INDEX_op_setcond_i32
>                                   : INDEX_op_setcond_i64);
> @@ -1174,17 +1184,16 @@ void tcg_optimize(TCGContext *s)
>
>          case INDEX_op_add2_i32:
>          case INDEX_op_sub2_i32:
> -            if (temp_is_const(args[2]) && temp_is_const(args[3])
> -                && temp_is_const(args[4]) && temp_is_const(args[5])) {
> -                uint32_t al = temps[args[2]].val;
> -                uint32_t ah = temps[args[3]].val;
> -                uint32_t bl = temps[args[4]].val;
> -                uint32_t bh = temps[args[5]].val;
> +            if (temp_is_const(op->args[2]) && temp_is_const(op->args[3])
> +                && temp_is_const(op->args[4]) && temp_is_const(op->args[5])) {
> +                uint32_t al = temps[op->args[2]].val;
> +                uint32_t ah = temps[op->args[3]].val;
> +                uint32_t bl = temps[op->args[4]].val;
> +                uint32_t bh = temps[op->args[5]].val;
>                  uint64_t a = ((uint64_t)ah << 32) | al;
>                  uint64_t b = ((uint64_t)bh << 32) | bl;
>                  TCGArg rl, rh;
>                  TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32, 2);
> -                TCGArg *args2 = op2->args;
>
>                  if (opc == INDEX_op_add2_i32) {
>                      a += b;
> @@ -1192,10 +1201,10 @@ void tcg_optimize(TCGContext *s)
>                      a -= b;
>                  }
>
> -                rl = args[0];
> -                rh = args[1];
> -                tcg_opt_gen_movi(s, op, args, rl, (int32_t)a);
> -                tcg_opt_gen_movi(s, op2, args2, rh, (int32_t)(a >> 32));
> +                rl = op->args[0];
> +                rh = op->args[1];
> +                tcg_opt_gen_movi(s, op, rl, (int32_t)a);
> +                tcg_opt_gen_movi(s, op2, rh, (int32_t)(a >> 32));
>
>                  /* We've done all we need to do with the movi.  Skip it.  */
>                  oi_next = op2->next;
> @@ -1204,18 +1213,17 @@ void tcg_optimize(TCGContext *s)
>              goto do_default;
>
>          case INDEX_op_mulu2_i32:
> -            if (temp_is_const(args[2]) && temp_is_const(args[3])) {
> -                uint32_t a = temps[args[2]].val;
> -                uint32_t b = temps[args[3]].val;
> +            if (temp_is_const(op->args[2]) && temp_is_const(op->args[3])) {
> +                uint32_t a = temps[op->args[2]].val;
> +                uint32_t b = temps[op->args[3]].val;
>                  uint64_t r = (uint64_t)a * b;
>                  TCGArg rl, rh;
>                  TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32, 2);
> -                TCGArg *args2 = op2->args;
>
> -                rl = args[0];
> -                rh = args[1];
> -                tcg_opt_gen_movi(s, op, args, rl, (int32_t)r);
> -                tcg_opt_gen_movi(s, op2, args2, rh, (int32_t)(r >> 32));
> +                rl = op->args[0];
> +                rh = op->args[1];
> +                tcg_opt_gen_movi(s, op, rl, (int32_t)r);
> +                tcg_opt_gen_movi(s, op2, rh, (int32_t)(r >> 32));
>
>                  /* We've done all we need to do with the movi.  Skip it.  */
>                  oi_next = op2->next;
> @@ -1224,41 +1232,47 @@ void tcg_optimize(TCGContext *s)
>              goto do_default;
>
>          case INDEX_op_brcond2_i32:
> -            tmp = do_constant_folding_cond2(&args[0], &args[2], args[4]);
> +            tmp = do_constant_folding_cond2(&op->args[0], &op->args[2],
> +                                            op->args[4]);
>              if (tmp != 2) {
>                  if (tmp) {
>              do_brcond_true:
>                      reset_all_temps(nb_temps);
>                      op->opc = INDEX_op_br;
> -                    args[0] = args[5];
> +                    op->args[0] = op->args[5];
>                  } else {
>              do_brcond_false:
>                      tcg_op_remove(s, op);
>                  }
> -            } else if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
> -                       && temp_is_const(args[2]) && temps[args[2]].val == 0
> -                       && temp_is_const(args[3]) && temps[args[3]].val == 0) {
> +            } else if ((op->args[4] == TCG_COND_LT
> +                        || op->args[4] == TCG_COND_GE)
> +                       && temp_is_const(op->args[2])
> +                       && temps[op->args[2]].val == 0
> +                       && temp_is_const(op->args[3])
> +                       && temps[op->args[3]].val == 0) {
>                  /* Simplify LT/GE comparisons vs zero to a single compare
>                     vs the high word of the input.  */
>              do_brcond_high:
>                  reset_all_temps(nb_temps);
>                  op->opc = INDEX_op_brcond_i32;
> -                args[0] = args[1];
> -                args[1] = args[3];
> -                args[2] = args[4];
> -                args[3] = args[5];
> -            } else if (args[4] == TCG_COND_EQ) {
> +                op->args[0] = op->args[1];
> +                op->args[1] = op->args[3];
> +                op->args[2] = op->args[4];
> +                op->args[3] = op->args[5];
> +            } else if (op->args[4] == TCG_COND_EQ) {
>                  /* Simplify EQ comparisons where one of the pairs
>                     can be simplified.  */
>                  tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
> -                                               args[0], args[2], TCG_COND_EQ);
> +                                               op->args[0], op->args[2],
> +                                               TCG_COND_EQ);
>                  if (tmp == 0) {
>                      goto do_brcond_false;
>                  } else if (tmp == 1) {
>                      goto do_brcond_high;
>                  }
>                  tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
> -                                               args[1], args[3], TCG_COND_EQ);
> +                                               op->args[1], op->args[3],
> +                                               TCG_COND_EQ);
>                  if (tmp == 0) {
>                      goto do_brcond_false;
>                  } else if (tmp != 1) {
> @@ -1267,21 +1281,23 @@ void tcg_optimize(TCGContext *s)
>              do_brcond_low:
>                  reset_all_temps(nb_temps);
>                  op->opc = INDEX_op_brcond_i32;
> -                args[1] = args[2];
> -                args[2] = args[4];
> -                args[3] = args[5];
> -            } else if (args[4] == TCG_COND_NE) {
> +                op->args[1] = op->args[2];
> +                op->args[2] = op->args[4];
> +                op->args[3] = op->args[5];
> +            } else if (op->args[4] == TCG_COND_NE) {
>                  /* Simplify NE comparisons where one of the pairs
>                     can be simplified.  */
>                  tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
> -                                               args[0], args[2], TCG_COND_NE);
> +                                               op->args[0], op->args[2],
> +                                               TCG_COND_NE);
>                  if (tmp == 0) {
>                      goto do_brcond_high;
>                  } else if (tmp == 1) {
>                      goto do_brcond_true;
>                  }
>                  tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
> -                                               args[1], args[3], TCG_COND_NE);
> +                                               op->args[1], op->args[3],
> +                                               TCG_COND_NE);
>                  if (tmp == 0) {
>                      goto do_brcond_low;
>                  } else if (tmp == 1) {
> @@ -1294,57 +1310,65 @@ void tcg_optimize(TCGContext *s)
>              break;
>
>          case INDEX_op_setcond2_i32:
> -            tmp = do_constant_folding_cond2(&args[1], &args[3], args[5]);
> +            tmp = do_constant_folding_cond2(&op->args[1], &op->args[3],
> +                                            op->args[5]);
>              if (tmp != 2) {
>              do_setcond_const:
> -                tcg_opt_gen_movi(s, op, args, args[0], tmp);
> -            } else if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE)
> -                       && temp_is_const(args[3]) && temps[args[3]].val == 0
> -                       && temp_is_const(args[4]) && temps[args[4]].val == 0) {
> +                tcg_opt_gen_movi(s, op, op->args[0], tmp);
> +            } else if ((op->args[5] == TCG_COND_LT
> +                        || op->args[5] == TCG_COND_GE)
> +                       && temp_is_const(op->args[3])
> +                       && temps[op->args[3]].val == 0
> +                       && temp_is_const(op->args[4])
> +                       && temps[op->args[4]].val == 0) {
>                  /* Simplify LT/GE comparisons vs zero to a single compare
>                     vs the high word of the input.  */
>              do_setcond_high:
> -                reset_temp(args[0]);
> -                temps[args[0]].mask = 1;
> +                reset_temp(op->args[0]);
> +                temps[op->args[0]].mask = 1;
>                  op->opc = INDEX_op_setcond_i32;
> -                args[1] = args[2];
> -                args[2] = args[4];
> -                args[3] = args[5];
> -            } else if (args[5] == TCG_COND_EQ) {
> +                op->args[1] = op->args[2];
> +                op->args[2] = op->args[4];
> +                op->args[3] = op->args[5];
> +            } else if (op->args[5] == TCG_COND_EQ) {
>                  /* Simplify EQ comparisons where one of the pairs
>                     can be simplified.  */
>                  tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
> -                                               args[1], args[3], TCG_COND_EQ);
> +                                               op->args[1], op->args[3],
> +                                               TCG_COND_EQ);
>                  if (tmp == 0) {
>                      goto do_setcond_const;
>                  } else if (tmp == 1) {
>                      goto do_setcond_high;
>                  }
>                  tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
> -                                               args[2], args[4], TCG_COND_EQ);
> +                                               op->args[2], op->args[4],
> +                                               TCG_COND_EQ);
>                  if (tmp == 0) {
>                      goto do_setcond_high;
>                  } else if (tmp != 1) {
>                      goto do_default;
>                  }
>              do_setcond_low:
> -                reset_temp(args[0]);
> -                temps[args[0]].mask = 1;
> +                reset_temp(op->args[0]);
> +                temps[op->args[0]].mask = 1;
>                  op->opc = INDEX_op_setcond_i32;
> -                args[2] = args[3];
> -                args[3] = args[5];
> -            } else if (args[5] == TCG_COND_NE) {
> +                op->args[2] = op->args[3];
> +                op->args[3] = op->args[5];
> +            } else if (op->args[5] == TCG_COND_NE) {
>                  /* Simplify NE comparisons where one of the pairs
>                     can be simplified.  */
>                  tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
> -                                               args[1], args[3], TCG_COND_NE);
> +                                               op->args[1], op->args[3],
> +                                               TCG_COND_NE);
>                  if (tmp == 0) {
>                      goto do_setcond_high;
>                  } else if (tmp == 1) {
>                      goto do_setcond_const;
>                  }
>                  tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
> -                                               args[2], args[4], TCG_COND_NE);
> +                                               op->args[2], op->args[4],
> +                                               TCG_COND_NE);
>                  if (tmp == 0) {
>                      goto do_setcond_low;
>                  } else if (tmp == 1) {
> @@ -1357,7 +1381,7 @@ void tcg_optimize(TCGContext *s)
>              break;
>
>          case INDEX_op_call:
> -            if (!(args[nb_oargs + nb_iargs + 1]
> +            if (!(op->args[nb_oargs + nb_iargs + 1]
>                    & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
>                  for (i = 0; i < nb_globals; i++) {
>                      if (test_bit(i, temps_used.l)) {
> @@ -1379,11 +1403,11 @@ void tcg_optimize(TCGContext *s)
>              } else {
>          do_reset_output:
>                  for (i = 0; i < nb_oargs; i++) {
> -                    reset_temp(args[i]);
> +                    reset_temp(op->args[i]);
>                      /* Save the corresponding known-zero bits mask for the
>                         first output argument (only one supported so far). */
>                      if (i == 0) {
> -                        temps[args[i]].mask = mask;
> +                        temps[op->args[i]].mask = mask;
>                      }
>                  }
>              }
> @@ -1391,7 +1415,7 @@ void tcg_optimize(TCGContext *s)
>          }
>
>          /* Eliminate duplicate and redundant fence instructions.  */
> -        if (prev_mb_args) {
> +        if (prev_mb) {
>              switch (opc) {
>              case INDEX_op_mb:
>                  /* Merge two barriers of the same type into one,
> @@ -1405,7 +1429,7 @@ void tcg_optimize(TCGContext *s)
>                   * barrier.  This is stricter than specified but for
>                   * the purposes of TCG is better than not optimizing.
>                   */
> -                prev_mb_args[0] |= args[0];
> +                prev_mb->args[0] |= op->args[0];
>                  tcg_op_remove(s, op);
>                  break;
>
> @@ -1421,11 +1445,11 @@ void tcg_optimize(TCGContext *s)
>              case INDEX_op_qemu_st_i64:
>              case INDEX_op_call:
>                  /* Opcodes that touch guest memory stop the optimization.  */
> -                prev_mb_args = NULL;
> +                prev_mb = NULL;
>                  break;
>              }
>          } else if (opc == INDEX_op_mb) {
> -            prev_mb_args = args;
> +            prev_mb = op;
>          }
>      }
>  }

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée


* Re: [Qemu-devel] [PATCH 01/16] tcg: Merge opcode arguments into TCGOp
  2017-06-26 14:44   ` Alex Bennée
@ 2017-06-26 14:55     ` Richard Henderson
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-06-26 14:55 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, aurelien

On 06/26/2017 07:44 AM, Alex Bennée wrote:
>> -/* The layout here is designed to avoid crossing of a 32-bit boundary.
>> -   If we do so, gcc adds padding, expanding the size to 12.  */
>> +/* The layout here is designed to avoid crossing of a 32-bit boundary.  */
> 
> This isn't correct now? Do we mean we now aim to be cache line aligned?

I still avoid having a bitfield cross a 32-bit boundary.  Perhaps I should not 
have trimmed quite so much from the comment.
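
For illustration only, here is a minimal sketch of the effect the trimmed 
comment described.  The structs and field names are hypothetical, not the 
real TCGOp layout, and the sizes assume gcc on a common ABI such as x86-64 
System V, where a bitfield may not straddle its 32-bit storage unit.

```c
#include <assert.h>

/* Hypothetical layouts for illustration; not the real TCGOp. */

/* 8 + 8 + 16 = 32 bits: every field ends on or before the boundary. */
struct fits {
    unsigned opc  : 8;
    unsigned life : 8;
    unsigned next : 16;
};

/* 'next' would occupy bits 24..39 and so cross the 32-bit boundary.
   The compiler moves it to the next storage unit instead, which
   leaves padding behind and grows the struct. */
struct straddles {
    unsigned opc  : 8;
    unsigned life : 16;
    unsigned next : 16;
};

/* Assumption: 'unsigned' is 32 bits wide on the host. */
static_assert(sizeof(struct fits) == 4, "fits in one 32-bit unit");
static_assert(sizeof(struct straddles) > sizeof(struct fits),
              "a straddling field costs extra space");
```

Ordering the fields so that each 32-bit unit is filled exactly is what 
keeps the padding out.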

>>   /* Make sure operands fit in the bitfields above.  */
>>   QEMU_BUILD_BUG_ON(NB_OPS > (1 << 8));
>> -QEMU_BUILD_BUG_ON(OPC_BUF_SIZE > (1 << 10));
>> -QEMU_BUILD_BUG_ON(OPPARAM_BUF_SIZE > (1 << 14));
>> -
>> -/* Make sure that we don't overflow 64 bits without noticing.  */
>> -QEMU_BUILD_BUG_ON(sizeof(TCGOp) > 8);
>> +QEMU_BUILD_BUG_ON(OPC_BUF_SIZE > (1 << 16));
> 
> OPC_BUF_SIZE is statically assigned, we don't seem to be taking notice
> of sizeof(TCGOp) anymore here. In fact OPC_BUF_SIZE is really
> MAX_TCG_OPS right?

Yes, I dropped the sizeof(TCGOp) check.  I could perhaps adjust it, but the 
expression would be a bit unwieldy, since it'll vary by host now.
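
As a rough sketch of what an adjusted check might look like (an assumption 
on my part, not something from the patch): SketchOp, SketchArg and 
SKETCH_MAX_ARGS below are hypothetical stand-ins, and C11 static_assert 
stands in for QEMU_BUILD_BUG_ON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real definitions live in tcg/tcg.h. */
#define SKETCH_MAX_ARGS 14
typedef uintptr_t SketchArg;        /* host-sized, like tcg_target_ulong */

typedef struct SketchOp {
    unsigned opc   : 8;
    unsigned calli : 6;
    unsigned callo : 2;
    unsigned life  : 16;            /* first 32-bit unit is full here */
    unsigned prev  : 16;
    unsigned next  : 16;            /* second 32-bit unit is full here */
    SketchArg args[SKETCH_MAX_ARGS];
} SketchOp;

/* 8 bytes of bitfields followed by the argument array.  The second
   term is what varies by host: 4 bytes per arg on a 32-bit host,
   8 bytes per arg on a 64-bit host. */
#define SKETCH_OP_SIZE (8 + SKETCH_MAX_ARGS * sizeof(SketchArg))

static_assert(sizeof(SketchOp) == SKETCH_OP_SIZE,
              "SketchOp grew unexpectedly");
```

The check still catches accidental padding in the header, at the cost of 
spelling out the header size by hand.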

You could think of OPC_BUF_SIZE as MAX_TCG_OPS, yes.  That might be a decent 
renaming as well.

> I see TCGArg is currently target_ulong. Is this because we never leak
> the host size details into generated code save for the statically
> assigned env_ptr?

You misread.  TCGArg is tcg_target_ulong, which is a host-specific type.

> I mention this because in looking at modelling SIMD registers I'm going
> to need to carry a host ptr around in TCG registers that can be passed
> to helpers and the like.

You'll always be able to do that.
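
A minimal sketch of why that works, assuming (as on the usual hosts) that 
the argument type is as wide as a host pointer.  SketchArg, SketchTemp and 
the two helpers are hypothetical names for illustration, not the helpers 
this series introduces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for tcg_target_ulong and a TCG temporary. */
typedef uintptr_t SketchArg;

typedef struct SketchTemp {
    int reg;
} SketchTemp;

/* Store a host pointer in an argument slot. */
static inline SketchArg sketch_temp_arg(SketchTemp *ts)
{
    return (SketchArg)ts;
}

/* Recover the host pointer from an argument slot. */
static inline SketchTemp *sketch_arg_temp(SketchArg a)
{
    return (SketchTemp *)a;
}

/* The round trip is lossless only if the slot is pointer-sized. */
static_assert(sizeof(SketchArg) == sizeof(void *),
              "argument slot must hold a host pointer");
```

Since uintptr_t is defined to round-trip a pointer, nothing is lost in the 
conversion, which is the property a host pointer carried in a TCG register 
relies on.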


r~


* Re: [Qemu-devel] [PATCH 03/16] tcg: Propagate args to op->args in tcg.c
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 03/16] tcg: Propagate args to op->args in tcg.c Richard Henderson
@ 2017-06-26 15:02   ` Alex Bennée
  2017-06-26 15:07     ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Bennée @ 2017-06-26 15:02 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/tcg.c | 121 ++++++++++++++++++++++++++++++--------------------------------
>  1 file changed, 58 insertions(+), 63 deletions(-)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 298aa0c..be5b69c 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1054,14 +1054,12 @@ void tcg_dump_ops(TCGContext *s)
>      for (oi = s->gen_op_buf[0].next; oi != 0; oi = op->next) {
>          int i, k, nb_oargs, nb_iargs, nb_cargs;
>          const TCGOpDef *def;
> -        const TCGArg *args;
>          TCGOpcode c;
>          int col = 0;
>
>          op = &s->gen_op_buf[oi];
>          c = op->opc;
>          def = &tcg_op_defs[c];
> -        args = op->args;
>
>          if (c == INDEX_op_insn_start) {
>              col += qemu_log("%s ----", oi != s->gen_op_buf[0].next ? "\n" : "");
> @@ -1069,9 +1067,9 @@ void tcg_dump_ops(TCGContext *s)
>              for (i = 0; i < TARGET_INSN_START_WORDS; ++i) {
>                  target_ulong a;
>  #if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
> -                a = ((target_ulong)args[i * 2 + 1] << 32) | args[i * 2];
> +                a = deposit64(op->args[i * 2], 32, 32, op->args[i * 2 + 1]);

It doesn't now, but should we assert against overflowing the args buffer
here when dealing with encoded data? Or should that already have faulted
when the ops were planted?
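
For reference, a standalone sketch of what the new expression computes.
sketch_deposit64 below is a local reimplementation modelled on QEMU's
deposit64() so that the snippet compiles on its own; sketch_join is a
hypothetical wrapper for the call in the hunk above.

```c
#include <assert.h>
#include <stdint.h>

/* Replace 'length' bits of 'value', starting at bit 'start', with the
   low bits of 'fieldval'.  Modelled on QEMU's deposit64(). */
static inline uint64_t sketch_deposit64(uint64_t value, int start,
                                        int length, uint64_t fieldval)
{
    uint64_t mask;

    assert(start >= 0 && length > 0 && length <= 64 - start);
    mask = (~0ULL >> (64 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Join the two 32-bit halves of a 64-bit insn_start word. */
static inline uint64_t sketch_join(uint32_t lo, uint32_t hi)
{
    return sketch_deposit64(lo, 32, 32, hi);
}
```

For 32-bit halves this gives the same result as the old
((target_ulong)hi << 32) | lo, so the change is only in notation.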

>  #else
> -                a = args[i];
> +                a = op->args[i];
>  #endif
>                  col += qemu_log(" " TARGET_FMT_lx, a);
>              }
> @@ -1083,14 +1081,14 @@ void tcg_dump_ops(TCGContext *s)
>
>              /* function name, flags, out args */
>              col += qemu_log(" %s %s,$0x%" TCG_PRIlx ",$%d", def->name,
> -                            tcg_find_helper(s, args[nb_oargs + nb_iargs]),
> -                            args[nb_oargs + nb_iargs + 1], nb_oargs);
> +                            tcg_find_helper(s, op->args[nb_oargs + nb_iargs]),
> +                            op->args[nb_oargs + nb_iargs + 1], nb_oargs);
>              for (i = 0; i < nb_oargs; i++) {
>                  col += qemu_log(",%s", tcg_get_arg_str_idx(s, buf, sizeof(buf),
> -                                                           args[i]));
> +                                                           op->args[i]));
>              }
>              for (i = 0; i < nb_iargs; i++) {
> -                TCGArg arg = args[nb_oargs + i];
> +                TCGArg arg = op->args[nb_oargs + i];
>                  const char *t = "<dummy>";
>                  if (arg != TCG_CALL_DUMMY_ARG) {
>                      t = tcg_get_arg_str_idx(s, buf, sizeof(buf), arg);
> @@ -1110,14 +1108,14 @@ void tcg_dump_ops(TCGContext *s)
>                      col += qemu_log(",");
>                  }
>                  col += qemu_log("%s", tcg_get_arg_str_idx(s, buf, sizeof(buf),
> -                                                          args[k++]));
> +                                                          op->args[k++]));
>              }
>              for (i = 0; i < nb_iargs; i++) {
>                  if (k != 0) {
>                      col += qemu_log(",");
>                  }
>                  col += qemu_log("%s", tcg_get_arg_str_idx(s, buf, sizeof(buf),
> -                                                          args[k++]));
> +                                                          op->args[k++]));
>              }
>              switch (c) {
>              case INDEX_op_brcond_i32:
> @@ -1128,10 +1126,11 @@ void tcg_dump_ops(TCGContext *s)
>              case INDEX_op_brcond_i64:
>              case INDEX_op_setcond_i64:
>              case INDEX_op_movcond_i64:
> -                if (args[k] < ARRAY_SIZE(cond_name) && cond_name[args[k]]) {
> -                    col += qemu_log(",%s", cond_name[args[k++]]);
> +                if (op->args[k] < ARRAY_SIZE(cond_name)
> +                    && cond_name[op->args[k]]) {
> +                    col += qemu_log(",%s", cond_name[op->args[k++]]);
>                  } else {
> -                    col += qemu_log(",$0x%" TCG_PRIlx, args[k++]);
> +                    col += qemu_log(",$0x%" TCG_PRIlx, op->args[k++]);
>                  }
>                  i = 1;
>                  break;
> @@ -1140,7 +1139,7 @@ void tcg_dump_ops(TCGContext *s)
>              case INDEX_op_qemu_ld_i64:
>              case INDEX_op_qemu_st_i64:
>                  {
> -                    TCGMemOpIdx oi = args[k++];
> +                    TCGMemOpIdx oi = op->args[k++];
>                      TCGMemOp op = get_memop(oi);
>                      unsigned ix = get_mmuidx(oi);
>
> @@ -1165,14 +1164,15 @@ void tcg_dump_ops(TCGContext *s)
>              case INDEX_op_brcond_i32:
>              case INDEX_op_brcond_i64:
>              case INDEX_op_brcond2_i32:
> -                col += qemu_log("%s$L%d", k ? "," : "", arg_label(args[k])->id);
> +                col += qemu_log("%s$L%d", k ? "," : "",
> +                                arg_label(op->args[k])->id);
>                  i++, k++;
>                  break;
>              default:
>                  break;
>              }
>              for (; i < nb_cargs; i++, k++) {
> -                col += qemu_log("%s$0x%" TCG_PRIlx, k ? "," : "", args[k]);
> +                col += qemu_log("%s$0x%" TCG_PRIlx, k ? "," : "", op->args[k]);
>              }
>          }
>          if (op->life) {
> @@ -1433,7 +1433,6 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>          TCGArg arg;
>
>          TCGOp * const op = &s->gen_op_buf[oi];
> -        TCGArg * const args = op->args;
>          TCGOpcode opc = op->opc;
>          const TCGOpDef *def = &tcg_op_defs[opc];
>
> @@ -1446,12 +1445,12 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>
>                  nb_oargs = op->callo;
>                  nb_iargs = op->calli;
> -                call_flags = args[nb_oargs + nb_iargs + 1];
> +                call_flags = op->args[nb_oargs + nb_iargs + 1];
>
>                  /* pure functions can be removed if their result is unused */
>                  if (call_flags & TCG_CALL_NO_SIDE_EFFECTS) {
>                      for (i = 0; i < nb_oargs; i++) {
> -                        arg = args[i];
> +                        arg = op->args[i];
>                          if (temp_state[arg] != TS_DEAD) {
>                              goto do_not_remove_call;
>                          }
> @@ -1462,7 +1461,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>
>                      /* output args are dead */
>                      for (i = 0; i < nb_oargs; i++) {
> -                        arg = args[i];
> +                        arg = op->args[i];
>                          if (temp_state[arg] & TS_DEAD) {
>                              arg_life |= DEAD_ARG << i;
>                          }
> @@ -1485,7 +1484,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>
>                      /* record arguments that die in this helper */
>                      for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
> -                        arg = args[i];
> +                        arg = op->args[i];
>                          if (arg != TCG_CALL_DUMMY_ARG) {
>                              if (temp_state[arg] & TS_DEAD) {
>                                  arg_life |= DEAD_ARG << i;
> @@ -1494,7 +1493,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>                      }
>                      /* input arguments are live for preceding opcodes */
>                      for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
> -                        arg = args[i];
> +                        arg = op->args[i];
>                          if (arg != TCG_CALL_DUMMY_ARG) {
>                              temp_state[arg] &= ~TS_DEAD;
>                          }
> @@ -1506,7 +1505,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>              break;
>          case INDEX_op_discard:
>              /* mark the temporary as dead */
> -            temp_state[args[0]] = TS_DEAD;
> +            temp_state[op->args[0]] = TS_DEAD;
>              break;
>
>          case INDEX_op_add2_i32:
> @@ -1527,15 +1526,15 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>                 the low part.  The result can be optimized to a simple
>                 add or sub.  This happens often for x86_64 guest when the
>                 cpu mode is set to 32 bit.  */
> -            if (temp_state[args[1]] == TS_DEAD) {
> -                if (temp_state[args[0]] == TS_DEAD) {
> +            if (temp_state[op->args[1]] == TS_DEAD) {
> +                if (temp_state[op->args[0]] == TS_DEAD) {
>                      goto do_remove;
>                  }
>                  /* Replace the opcode and adjust the args in place,
>                     leaving 3 unused args at the end.  */
>                  op->opc = opc = opc_new;
> -                args[1] = args[2];
> -                args[2] = args[4];
> +                op->args[1] = op->args[2];
> +                op->args[2] = op->args[4];
>                  /* Fall through and mark the single-word operation live.  */
>                  nb_iargs = 2;
>                  nb_oargs = 1;
> @@ -1565,21 +1564,21 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>          do_mul2:
>              nb_iargs = 2;
>              nb_oargs = 2;
> -            if (temp_state[args[1]] == TS_DEAD) {
> -                if (temp_state[args[0]] == TS_DEAD) {
> +            if (temp_state[op->args[1]] == TS_DEAD) {
> +                if (temp_state[op->args[0]] == TS_DEAD) {
>                      /* Both parts of the operation are dead.  */
>                      goto do_remove;
>                  }
>                  /* The high part of the operation is dead; generate the low. */
>                  op->opc = opc = opc_new;
> -                args[1] = args[2];
> -                args[2] = args[3];
> -            } else if (temp_state[args[0]] == TS_DEAD && have_opc_new2) {
> +                op->args[1] = op->args[2];
> +                op->args[2] = op->args[3];
> +            } else if (temp_state[op->args[0]] == TS_DEAD && have_opc_new2) {
>                  /* The low part of the operation is dead; generate the high. */
>                  op->opc = opc = opc_new2;
> -                args[0] = args[1];
> -                args[1] = args[2];
> -                args[2] = args[3];
> +                op->args[0] = op->args[1];
> +                op->args[1] = op->args[2];
> +                op->args[2] = op->args[3];
>              } else {
>                  goto do_not_remove;
>              }
> @@ -1597,7 +1596,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>                 implies side effects */
>              if (!(def->flags & TCG_OPF_SIDE_EFFECTS) && nb_oargs != 0) {
>                  for (i = 0; i < nb_oargs; i++) {
> -                    if (temp_state[args[i]] != TS_DEAD) {
> +                    if (temp_state[op->args[i]] != TS_DEAD) {
>                          goto do_not_remove;
>                      }
>                  }
> @@ -1607,7 +1606,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>              do_not_remove:
>                  /* output args are dead */
>                  for (i = 0; i < nb_oargs; i++) {
> -                    arg = args[i];
> +                    arg = op->args[i];
>                      if (temp_state[arg] & TS_DEAD) {
>                          arg_life |= DEAD_ARG << i;
>                      }
> @@ -1629,14 +1628,14 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>
>                  /* record arguments that die in this opcode */
>                  for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
> -                    arg = args[i];
> +                    arg = op->args[i];
>                      if (temp_state[arg] & TS_DEAD) {
>                          arg_life |= DEAD_ARG << i;
>                      }
>                  }
>                  /* input arguments are live for preceding opcodes */
>                  for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
> -                    temp_state[args[i]] &= ~TS_DEAD;
> +                    temp_state[op->args[i]] &= ~TS_DEAD;
>                  }
>              }
>              break;
> @@ -1671,7 +1670,6 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
>
>      for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
>          TCGOp *op = &s->gen_op_buf[oi];
> -        TCGArg *args = op->args;
>          TCGOpcode opc = op->opc;
>          const TCGOpDef *def = &tcg_op_defs[opc];
>          TCGLifeData arg_life = op->life;
> @@ -1683,7 +1681,7 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
>          if (opc == INDEX_op_call) {
>              nb_oargs = op->callo;
>              nb_iargs = op->calli;
> -            call_flags = args[nb_oargs + nb_iargs + 1];
> +            call_flags = op->args[nb_oargs + nb_iargs + 1];
>          } else {
>              nb_iargs = def->nb_iargs;
>              nb_oargs = def->nb_oargs;
> @@ -1704,7 +1702,7 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
>
>          /* Make sure that input arguments are available.  */
>          for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
> -            arg = args[i];
> +            arg = op->args[i];
>              /* Note this unsigned test catches TCG_CALL_ARG_DUMMY too.  */
>              if (arg < nb_globals) {
>                  dir = dir_temps[arg];
> @@ -1714,11 +1712,10 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
>                                        ? INDEX_op_ld_i32
>                                        : INDEX_op_ld_i64);
>                      TCGOp *lop = tcg_op_insert_before(s, op, lopc, 3);
> -                    TCGArg *largs = lop->args;
>
> -                    largs[0] = dir;
> -                    largs[1] = temp_idx(s, its->mem_base);
> -                    largs[2] = its->mem_offset;
> +                    lop->args[0] = dir;
> +                    lop->args[1] = temp_idx(s, its->mem_base);
> +                    lop->args[2] = its->mem_offset;
>
>                      /* Loaded, but synced with memory.  */
>                      temp_state[arg] = TS_MEM;
> @@ -1730,11 +1727,11 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
>             No action is required except keeping temp_state up to date
>             so that we reload when needed.  */
>          for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
> -            arg = args[i];
> +            arg = op->args[i];
>              if (arg < nb_globals) {
>                  dir = dir_temps[arg];
>                  if (dir != 0) {
> -                    args[i] = dir;
> +                    op->args[i] = dir;
>                      changes = true;
>                      if (IS_DEAD_ARG(i)) {
>                          temp_state[arg] = TS_DEAD;
> @@ -1765,7 +1762,7 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
>
>          /* Outputs become available.  */
>          for (i = 0; i < nb_oargs; i++) {
> -            arg = args[i];
> +            arg = op->args[i];
>              if (arg >= nb_globals) {
>                  continue;
>              }
> @@ -1773,7 +1770,7 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
>              if (dir == 0) {
>                  continue;
>              }
> -            args[i] = dir;
> +            op->args[i] = dir;
>              changes = true;
>
>              /* The output is now live and modified.  */
> @@ -1786,11 +1783,10 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
>                                    ? INDEX_op_st_i32
>                                    : INDEX_op_st_i64);
>                  TCGOp *sop = tcg_op_insert_after(s, op, sopc, 3);
> -                TCGArg *sargs = sop->args;
>
> -                sargs[0] = dir;
> -                sargs[1] = temp_idx(s, its->mem_base);
> -                sargs[2] = its->mem_offset;
> +                sop->args[0] = dir;
> +                sop->args[1] = temp_idx(s, its->mem_base);
> +                sop->args[2] = its->mem_offset;
>
>                  temp_state[arg] = TS_MEM;
>              }
> @@ -2614,7 +2610,6 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      num_insns = -1;
>      for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
>          TCGOp * const op = &s->gen_op_buf[oi];
> -        TCGArg * const args = op->args;
>          TCGOpcode opc = op->opc;
>          const TCGOpDef *def = &tcg_op_defs[opc];
>          TCGLifeData arg_life = op->life;
> @@ -2627,11 +2622,11 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>          switch (opc) {
>          case INDEX_op_mov_i32:
>          case INDEX_op_mov_i64:
> -            tcg_reg_alloc_mov(s, def, args, arg_life);
> +            tcg_reg_alloc_mov(s, def, op->args, arg_life);
>              break;
>          case INDEX_op_movi_i32:
>          case INDEX_op_movi_i64:
> -            tcg_reg_alloc_movi(s, args, arg_life);
> +            tcg_reg_alloc_movi(s, op->args, arg_life);
>              break;
>          case INDEX_op_insn_start:
>              if (num_insns >= 0) {
> @@ -2641,22 +2636,22 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>              for (i = 0; i < TARGET_INSN_START_WORDS; ++i) {
>                  target_ulong a;
>  #if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
> -                a = ((target_ulong)args[i * 2 + 1] << 32) | args[i * 2];
> +                a = deposit64(op->args[i * 2], 32, 32, op->args[i * 2 + 1]);
>  #else
> -                a = args[i];
> +                a = op->args[i];
>  #endif
>                  s->gen_insn_data[num_insns][i] = a;
>              }
>              break;
>          case INDEX_op_discard:
> -            temp_dead(s, &s->temps[args[0]]);
> +            temp_dead(s, &s->temps[op->args[0]]);
>              break;
>          case INDEX_op_set_label:
>              tcg_reg_alloc_bb_end(s, s->reserved_regs);
> -            tcg_out_label(s, arg_label(args[0]), s->code_ptr);
> +            tcg_out_label(s, arg_label(op->args[0]), s->code_ptr);
>              break;
>          case INDEX_op_call:
> -            tcg_reg_alloc_call(s, op->callo, op->calli, args, arg_life);
> +            tcg_reg_alloc_call(s, op->callo, op->calli, op->args, arg_life);
>              break;
>          default:
>              /* Sanity check that we've not introduced any unhandled opcodes. */
> @@ -2666,7 +2661,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>              /* Note: in order to speed up the code, it would be much
>                 faster to have specialized register allocator functions for
>                 some common argument patterns */
> -            tcg_reg_alloc_op(s, def, opc, args, arg_life);
> +            tcg_reg_alloc_op(s, def, opc, op->args, arg_life);
>              break;
>          }
>  #ifdef CONFIG_DEBUG_TCG

Otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée

* Re: [Qemu-devel] [PATCH 03/16] tcg: Propagate args to op->args in tcg.c
  2017-06-26 15:02   ` Alex Bennée
@ 2017-06-26 15:07     ` Richard Henderson
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-06-26 15:07 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, aurelien

On 06/26/2017 08:02 AM, Alex Bennée wrote:
>>   #if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
>> -                a = ((target_ulong)args[i * 2 + 1] << 32) | args[i * 2];
>> +                a = deposit64(op->args[i * 2], 32, 32, op->args[i * 2 + 1]);
> 
> It doesn't now, but should we assert against overflowing the args
> buffer here when dealing with encoded data? Or should it have faulted
> when planting the ops?

Statically checked via preprocessor in tcg/tcg-op.h:

#elif TARGET_INSN_START_WORDS == 3
...
#else
# error "Unhandled number of operands to insn_start"
#endif

This caps insn_start at 6 host arguments (3 words, each split into two
halves), which is less than MAX_OPC_PARAM.
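
As a rough illustration of both points, the sketch below compares the old
shift-and-or form with the deposit64() form for joining two 32-bit host
arguments into one 64-bit guest word, and expresses the bound as a
compile-time check. The deposit64() here is a local stand-in modelled on
the helper in QEMU's include/qemu/bitops.h. The SKETCH_* constants and
the combine_* function names are hypothetical and exist only for this
example; the value chosen for SKETCH_MAX_OPC_PARAM is a placeholder, not
the real limit.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for QEMU's deposit64(): overwrite LENGTH bits of VALUE,
   starting at bit START, with the low bits of FIELDVAL. */
static inline uint64_t deposit64(uint64_t value, int start, int length,
                                 uint64_t fieldval)
{
    uint64_t mask;

    assert(start >= 0 && length > 0 && length <= 64 - start);
    mask = (~0ULL >> (64 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Hypothetical values, for illustration only. */
#define SKETCH_INSN_START_WORDS 3
#define SKETCH_MAX_OPC_PARAM    16

/* Each guest word takes two 32-bit host arguments, so the worst case
   must still fit in the per-op argument array. */
_Static_assert(SKETCH_INSN_START_WORDS * 2 <= SKETCH_MAX_OPC_PARAM,
               "insn_start operands must fit in the op's args array");

/* Old form: shift the high half up and OR in the low half. */
static uint64_t combine_shift_or(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* New form: deposit the high half into bits [32, 64) of the low half. */
static uint64_t combine_deposit(uint32_t lo, uint32_t hi)
{
    return deposit64(lo, 32, 32, hi);
}
```

With 32-bit inputs the two forms produce the same value; the deposit
form just names the bit range explicitly instead of relying on a cast
before the shift.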


r~

* Re: [Qemu-devel] [PATCH 04/16] tcg: Propagate TCGOp down to allocators
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 04/16] tcg: Propagate TCGOp down to allocators Richard Henderson
@ 2017-06-26 15:08   ` Alex Bennée
  0 siblings, 0 replies; 40+ messages in thread
From: Alex Bennée @ 2017-06-26 15:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  tcg/tcg.c | 82 +++++++++++++++++++++++++++++++--------------------------------
>  1 file changed, 40 insertions(+), 42 deletions(-)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index be5b69c..e2248a6 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -2111,25 +2111,24 @@ static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
>      }
>  }
>
> -static void tcg_reg_alloc_movi(TCGContext *s, const TCGArg *args,
> -                               TCGLifeData arg_life)
> +static void tcg_reg_alloc_movi(TCGContext *s, const TCGOp *op)
>  {
> -    TCGTemp *ots = &s->temps[args[0]];
> -    tcg_target_ulong val = args[1];
> +    TCGTemp *ots = &s->temps[op->args[0]];
> +    tcg_target_ulong val = op->args[1];
>
> -    tcg_reg_alloc_do_movi(s, ots, val, arg_life);
> +    tcg_reg_alloc_do_movi(s, ots, val, op->life);
>  }
>
> -static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
> -                              const TCGArg *args, TCGLifeData arg_life)
> +static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
>  {
> +    const TCGLifeData arg_life = op->life;
>      TCGRegSet allocated_regs;
>      TCGTemp *ts, *ots;
>      TCGType otype, itype;
>
>      tcg_regset_set(allocated_regs, s->reserved_regs);
> -    ots = &s->temps[args[0]];
> -    ts = &s->temps[args[1]];
> +    ots = &s->temps[op->args[0]];
> +    ts = &s->temps[op->args[1]];
>
>      /* Note that otype != itype for no-op truncation.  */
>      otype = ots->type;
> @@ -2159,7 +2158,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
>             liveness analysis disabled). */
>          tcg_debug_assert(NEED_SYNC_ARG(0));
>          if (!ots->mem_allocated) {
> -            temp_allocate_frame(s, args[0]);
> +            temp_allocate_frame(s, op->args[0]);
>          }
>          tcg_out_st(s, otype, ts->reg, ots->mem_base->reg, ots->mem_offset);
>          if (IS_DEAD_ARG(1)) {
> @@ -2193,10 +2192,10 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
>      }
>  }
>
> -static void tcg_reg_alloc_op(TCGContext *s,
> -                             const TCGOpDef *def, TCGOpcode opc,
> -                             const TCGArg *args, TCGLifeData arg_life)
> +static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
>  {
> +    const TCGLifeData arg_life = op->life;
> +    const TCGOpDef * const def = &tcg_op_defs[op->opc];
>      TCGRegSet i_allocated_regs;
>      TCGRegSet o_allocated_regs;
>      int i, k, nb_iargs, nb_oargs;
> @@ -2207,21 +2206,24 @@ static void tcg_reg_alloc_op(TCGContext *s,
>      TCGArg new_args[TCG_MAX_OP_ARGS];
>      int const_args[TCG_MAX_OP_ARGS];
>
> +    /* Sanity check that we've not introduced any unhandled opcodes. */
> +    tcg_debug_assert(!(def->flags & TCG_OPF_NOT_PRESENT));
> +
>      nb_oargs = def->nb_oargs;
>      nb_iargs = def->nb_iargs;
>
>      /* copy constants */
>      memcpy(new_args + nb_oargs + nb_iargs,
> -           args + nb_oargs + nb_iargs,
> +           op->args + nb_oargs + nb_iargs,
>             sizeof(TCGArg) * def->nb_cargs);
>
>      tcg_regset_set(i_allocated_regs, s->reserved_regs);
>      tcg_regset_set(o_allocated_regs, s->reserved_regs);
>
>      /* satisfy input constraints */
> -    for(k = 0; k < nb_iargs; k++) {
> +    for (k = 0; k < nb_iargs; k++) {
>          i = def->sorted_args[nb_oargs + k];
> -        arg = args[i];
> +        arg = op->args[i];
>          arg_ct = &def->args_ct[i];
>          ts = &s->temps[arg];
>
> @@ -2239,7 +2241,7 @@ static void tcg_reg_alloc_op(TCGContext *s,
>              if (ts->fixed_reg) {
>                  /* if fixed register, we must allocate a new register
>                     if the alias is not the same register */
> -                if (arg != args[arg_ct->alias_index])
> +                if (arg != op->args[arg_ct->alias_index])
>                      goto allocate_in_reg;
>              } else {
>                  /* if the input is aliased to an output and if it is
> @@ -2280,7 +2282,7 @@ static void tcg_reg_alloc_op(TCGContext *s,
>      /* mark dead temporaries and free the associated registers */
>      for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
>          if (IS_DEAD_ARG(i)) {
> -            temp_dead(s, &s->temps[args[i]]);
> +            temp_dead(s, &s->temps[op->args[i]]);
>          }
>      }
>
> @@ -2304,7 +2306,7 @@ static void tcg_reg_alloc_op(TCGContext *s,
>          /* satisfy the output constraints */
>          for(k = 0; k < nb_oargs; k++) {
>              i = def->sorted_args[k];
> -            arg = args[i];
> +            arg = op->args[i];
>              arg_ct = &def->args_ct[i];
>              ts = &s->temps[arg];
>              if ((arg_ct->ct & TCG_CT_ALIAS)
> @@ -2343,11 +2345,11 @@ static void tcg_reg_alloc_op(TCGContext *s,
>      }
>
>      /* emit instruction */
> -    tcg_out_op(s, opc, new_args, const_args);
> +    tcg_out_op(s, op->opc, new_args, const_args);
>
>      /* move the outputs in the correct register if needed */
>      for(i = 0; i < nb_oargs; i++) {
> -        ts = &s->temps[args[i]];
> +        ts = &s->temps[op->args[i]];
>          reg = new_args[i];
>          if (ts->fixed_reg && ts->reg != reg) {
>              tcg_out_mov(s, ts->type, ts->reg, reg);
> @@ -2366,9 +2368,11 @@ static void tcg_reg_alloc_op(TCGContext *s,
>  #define STACK_DIR(x) (x)
>  #endif
>
> -static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs,
> -                               const TCGArg * const args, TCGLifeData arg_life)
> +static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
>  {
> +    const int nb_oargs = op->callo;
> +    const int nb_iargs = op->calli;
> +    const TCGLifeData arg_life = op->life;
>      int flags, nb_regs, i;
>      TCGReg reg;
>      TCGArg arg;
> @@ -2379,8 +2383,8 @@ static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs,
>      int allocate_args;
>      TCGRegSet allocated_regs;
>
> -    func_addr = (tcg_insn_unit *)(intptr_t)args[nb_oargs + nb_iargs];
> -    flags = args[nb_oargs + nb_iargs + 1];
> +    func_addr = (tcg_insn_unit *)(intptr_t)op->args[nb_oargs + nb_iargs];
> +    flags = op->args[nb_oargs + nb_iargs + 1];
>
>      nb_regs = ARRAY_SIZE(tcg_target_call_iarg_regs);
>      if (nb_regs > nb_iargs) {
> @@ -2399,8 +2403,8 @@ static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs,
>      }
>
>      stack_offset = TCG_TARGET_CALL_STACK_OFFSET;
> -    for(i = nb_regs; i < nb_iargs; i++) {
> -        arg = args[nb_oargs + i];
> +    for (i = nb_regs; i < nb_iargs; i++) {
> +        arg = op->args[nb_oargs + i];
>  #ifdef TCG_TARGET_STACK_GROWSUP
>          stack_offset -= sizeof(tcg_target_long);
>  #endif
> @@ -2417,8 +2421,8 @@ static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs,
>
>      /* assign input registers */
>      tcg_regset_set(allocated_regs, s->reserved_regs);
> -    for(i = 0; i < nb_regs; i++) {
> -        arg = args[nb_oargs + i];
> +    for (i = 0; i < nb_regs; i++) {
> +        arg = op->args[nb_oargs + i];
>          if (arg != TCG_CALL_DUMMY_ARG) {
>              ts = &s->temps[arg];
>              reg = tcg_target_call_iarg_regs[i];
> @@ -2441,9 +2445,9 @@ static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs,
>      }
>
>      /* mark dead temporaries and free the associated registers */
> -    for(i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
> +    for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
>          if (IS_DEAD_ARG(i)) {
> -            temp_dead(s, &s->temps[args[i]]);
> +            temp_dead(s, &s->temps[op->args[i]]);
>          }
>      }
>
> @@ -2468,7 +2472,7 @@ static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs,
>
>      /* assign output registers and emit moves if needed */
>      for(i = 0; i < nb_oargs; i++) {
> -        arg = args[i];
> +        arg = op->args[i];
>          ts = &s->temps[arg];
>          reg = tcg_target_call_oarg_regs[i];
>          tcg_debug_assert(s->reg_to_temp[reg] == NULL);
> @@ -2611,8 +2615,6 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
>          TCGOp * const op = &s->gen_op_buf[oi];
>          TCGOpcode opc = op->opc;
> -        const TCGOpDef *def = &tcg_op_defs[opc];
> -        TCGLifeData arg_life = op->life;
>
>          oi_next = op->next;
>  #ifdef CONFIG_PROFILER
> @@ -2622,11 +2624,11 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>          switch (opc) {
>          case INDEX_op_mov_i32:
>          case INDEX_op_mov_i64:
> -            tcg_reg_alloc_mov(s, def, op->args, arg_life);
> +            tcg_reg_alloc_mov(s, op);
>              break;
>          case INDEX_op_movi_i32:
>          case INDEX_op_movi_i64:
> -            tcg_reg_alloc_movi(s, op->args, arg_life);
> +            tcg_reg_alloc_movi(s, op);
>              break;
>          case INDEX_op_insn_start:
>              if (num_insns >= 0) {
> @@ -2651,17 +2653,13 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>              tcg_out_label(s, arg_label(op->args[0]), s->code_ptr);
>              break;
>          case INDEX_op_call:
> -            tcg_reg_alloc_call(s, op->callo, op->calli, op->args, arg_life);
> +            tcg_reg_alloc_call(s, op);
>              break;
>          default:
> -            /* Sanity check that we've not introduced any unhandled opcodes. */
> -            if (def->flags & TCG_OPF_NOT_PRESENT) {
> -                tcg_abort();
> -            }
>              /* Note: in order to speed up the code, it would be much
>                 faster to have specialized register allocator functions for
>                 some common argument patterns */
> -            tcg_reg_alloc_op(s, def, opc, op->args, arg_life);
> +            tcg_reg_alloc_op(s, op);
>              break;
>          }
>  #ifdef CONFIG_DEBUG_TCG


--
Alex Bennée

* Re: [Qemu-devel] [PATCH 05/16] tcg: Introduce arg_temp
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 05/16] tcg: Introduce arg_temp Richard Henderson
@ 2017-06-26 16:37   ` Alex Bennée
  0 siblings, 0 replies; 40+ messages in thread
From: Alex Bennée @ 2017-06-26 16:37 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/optimize.c |  4 ++--
>  tcg/tcg.c      | 51 +++++++++++++++++++++++++--------------------------
>  tcg/tcg.h      |  5 +++++
>  3 files changed, 32 insertions(+), 28 deletions(-)

This is a patch where having diff.orderFile put headers first would be
great ;-)
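
For readers who hit the call sites before the header hunk, here is a
minimal sketch of what an index-to-pointer helper of this shape could
look like. It assumes the helper simply indexes the global
tcg_ctx.temps array, which is what the replaced &s->temps[...]
expressions suggest; the actual definition is in the tcg/tcg.h hunk at
the end of the patch. The struct layouts below are trimmed-down
stand-ins, not the real QEMU types.

```c
#include <stdint.h>

/* Simplified stand-ins; the real TCGTemp and TCGContext have many
   more fields. */
typedef uintptr_t TCGArg;

typedef struct TCGTemp {
    int type;
    unsigned int temp_local : 1;
} TCGTemp;

typedef struct TCGContext {
    int nb_temps;
    TCGTemp temps[512];
} TCGContext;

static TCGContext tcg_ctx;

/* Assumed shape of the helper: map a TCGArg holding a temp index to
   the TCGTemp it names, so callers stop spelling out &s->temps[arg]. */
static inline TCGTemp *arg_temp(TCGArg a)
{
    return &tcg_ctx.temps[a];
}
```

Routing every lookup through one helper means the encoding stored in
TCGArg can later change (the cover letter plans to store the temp
pointer itself) by editing the helper rather than each call site.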

>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 1a1c6fb..d8c3a7e 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -133,7 +133,7 @@ static TCGArg find_better_copy(TCGContext *s, TCGArg temp)
>      }
>
>      /* If it is a temp, search for a temp local. */
> -    if (!s->temps[temp].temp_local) {
> +    if (!arg_temp(temp)->temp_local) {
>          for (i = temps[temp].next_copy ; i != temp ; i = temps[i].next_copy) {
>              if (s->temps[i].temp_local) {
>                  return i;
> @@ -207,7 +207,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
>      }
>      temps[dst].mask = mask;
>
> -    if (s->temps[src].type == s->temps[dst].type) {
> +    if (arg_temp(src)->type == arg_temp(dst)->type) {
>          temps[dst].next_copy = temps[src].next_copy;
>          temps[dst].prev_copy = src;
>          temps[temps[dst].next_copy].prev_copy = dst;
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index e2248a6..068ac51 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -977,11 +977,10 @@ static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
>      return buf;
>  }
>
> -static char *tcg_get_arg_str_idx(TCGContext *s, char *buf,
> -                                 int buf_size, int idx)
> +static char *tcg_get_arg_str(TCGContext *s, char *buf,
> +                             int buf_size, TCGArg arg)
>  {
> -    tcg_debug_assert(idx >= 0 && idx < s->nb_temps);
> -    return tcg_get_arg_str_ptr(s, buf, buf_size, &s->temps[idx]);
> +    return tcg_get_arg_str_ptr(s, buf, buf_size, arg_temp(arg));
>  }
>
>  /* Find helper name.  */
> @@ -1084,14 +1083,14 @@ void tcg_dump_ops(TCGContext *s)
>                              tcg_find_helper(s, op->args[nb_oargs + nb_iargs]),
>                              op->args[nb_oargs + nb_iargs + 1], nb_oargs);
>              for (i = 0; i < nb_oargs; i++) {
> -                col += qemu_log(",%s", tcg_get_arg_str_idx(s, buf, sizeof(buf),
> -                                                           op->args[i]));
> +                col += qemu_log(",%s", tcg_get_arg_str(s, buf, sizeof(buf),
> +                                                       op->args[i]));
>              }
>              for (i = 0; i < nb_iargs; i++) {
>                  TCGArg arg = op->args[nb_oargs + i];
>                  const char *t = "<dummy>";
>                  if (arg != TCG_CALL_DUMMY_ARG) {
> -                    t = tcg_get_arg_str_idx(s, buf, sizeof(buf), arg);
> +                    t = tcg_get_arg_str(s, buf, sizeof(buf), arg);
>                  }
>                  col += qemu_log(",%s", t);
>              }
> @@ -1107,15 +1106,15 @@ void tcg_dump_ops(TCGContext *s)
>                  if (k != 0) {
>                      col += qemu_log(",");
>                  }
> -                col += qemu_log("%s", tcg_get_arg_str_idx(s, buf, sizeof(buf),
> -                                                          op->args[k++]));
> +                col += qemu_log("%s", tcg_get_arg_str(s, buf, sizeof(buf),
> +                                                      op->args[k++]));
>              }
>              for (i = 0; i < nb_iargs; i++) {
>                  if (k != 0) {
>                      col += qemu_log(",");
>                  }
> -                col += qemu_log("%s", tcg_get_arg_str_idx(s, buf, sizeof(buf),
> -                                                          op->args[k++]));
> +                col += qemu_log("%s", tcg_get_arg_str(s, buf, sizeof(buf),
> +                                                      op->args[k++]));
>              }
>              switch (c) {
>              case INDEX_op_brcond_i32:
> @@ -1707,7 +1706,7 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
>              if (arg < nb_globals) {
>                  dir = dir_temps[arg];
>                  if (dir != 0 && temp_state[arg] == TS_DEAD) {
> -                    TCGTemp *its = &s->temps[arg];
> +                    TCGTemp *its = arg_temp(arg);
>                      TCGOpcode lopc = (its->type == TCG_TYPE_I32
>                                        ? INDEX_op_ld_i32
>                                        : INDEX_op_ld_i64);
> @@ -1778,7 +1777,7 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
>
>              /* Sync outputs upon their last write.  */
>              if (NEED_SYNC_ARG(i)) {
> -                TCGTemp *its = &s->temps[arg];
> +                TCGTemp *its = arg_temp(arg);
>                  TCGOpcode sopc = (its->type == TCG_TYPE_I32
>                                    ? INDEX_op_st_i32
>                                    : INDEX_op_st_i64);
> @@ -1809,7 +1808,7 @@ static void dump_regs(TCGContext *s)
>
>      for(i = 0; i < s->nb_temps; i++) {
>          ts = &s->temps[i];
> -        printf("  %10s: ", tcg_get_arg_str_idx(s, buf, sizeof(buf), i));
> +        printf("  %10s: ", tcg_get_arg_str_ptr(s, buf, sizeof(buf), ts));
>          switch(ts->val_type) {
>          case TEMP_VAL_REG:
>              printf("%s", tcg_target_reg_names[ts->reg]);
> @@ -2113,7 +2112,7 @@ static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
>
>  static void tcg_reg_alloc_movi(TCGContext *s, const TCGOp *op)
>  {
> -    TCGTemp *ots = &s->temps[op->args[0]];
> +    TCGTemp *ots = arg_temp(op->args[0]);
>      tcg_target_ulong val = op->args[1];
>
>      tcg_reg_alloc_do_movi(s, ots, val, op->life);
> @@ -2127,8 +2126,8 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
>      TCGType otype, itype;
>
>      tcg_regset_set(allocated_regs, s->reserved_regs);
> -    ots = &s->temps[op->args[0]];
> -    ts = &s->temps[op->args[1]];
> +    ots = arg_temp(op->args[0]);
> +    ts = arg_temp(op->args[1]);
>
>      /* Note that otype != itype for no-op truncation.  */
>      otype = ots->type;
> @@ -2225,7 +2224,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
>          i = def->sorted_args[nb_oargs + k];
>          arg = op->args[i];
>          arg_ct = &def->args_ct[i];
> -        ts = &s->temps[arg];
> +        ts = arg_temp(arg);
>
>          if (ts->val_type == TEMP_VAL_CONST
>              && tcg_target_const_match(ts->val, ts->type, arg_ct)) {
> @@ -2282,7 +2281,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
>      /* mark dead temporaries and free the associated registers */
>      for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
>          if (IS_DEAD_ARG(i)) {
> -            temp_dead(s, &s->temps[op->args[i]]);
> +            temp_dead(s, arg_temp(op->args[i]));
>          }
>      }
>
> @@ -2308,7 +2307,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
>              i = def->sorted_args[k];
>              arg = op->args[i];
>              arg_ct = &def->args_ct[i];
> -            ts = &s->temps[arg];
> +            ts = arg_temp(arg);
>              if ((arg_ct->ct & TCG_CT_ALIAS)
>                  && !const_args[arg_ct->alias_index]) {
>                  reg = new_args[arg_ct->alias_index];
> @@ -2349,7 +2348,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
>
>      /* move the outputs in the correct register if needed */
>      for(i = 0; i < nb_oargs; i++) {
> -        ts = &s->temps[op->args[i]];
> +        ts = arg_temp(op->args[i]);
>          reg = new_args[i];
>          if (ts->fixed_reg && ts->reg != reg) {
>              tcg_out_mov(s, ts->type, ts->reg, reg);
> @@ -2409,7 +2408,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
>          stack_offset -= sizeof(tcg_target_long);
>  #endif
>          if (arg != TCG_CALL_DUMMY_ARG) {
> -            ts = &s->temps[arg];
> +            ts = arg_temp(arg);
>              temp_load(s, ts, tcg_target_available_regs[ts->type],
>                        s->reserved_regs);
>              tcg_out_st(s, ts->type, ts->reg, TCG_REG_CALL_STACK, stack_offset);
> @@ -2424,7 +2423,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
>      for (i = 0; i < nb_regs; i++) {
>          arg = op->args[nb_oargs + i];
>          if (arg != TCG_CALL_DUMMY_ARG) {
> -            ts = &s->temps[arg];
> +            ts = arg_temp(arg);
>              reg = tcg_target_call_iarg_regs[i];
>              tcg_reg_free(s, reg, allocated_regs);
>
> @@ -2447,7 +2446,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
>      /* mark dead temporaries and free the associated registers */
>      for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
>          if (IS_DEAD_ARG(i)) {
> -            temp_dead(s, &s->temps[op->args[i]]);
> +            temp_dead(s, arg_temp(op->args[i]));
>          }
>      }
>
> @@ -2473,7 +2472,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
>      /* assign output registers and emit moves if needed */
>      for(i = 0; i < nb_oargs; i++) {
>          arg = op->args[i];
> -        ts = &s->temps[arg];
> +        ts = arg_temp(arg);
>          reg = tcg_target_call_oarg_regs[i];
>          tcg_debug_assert(s->reg_to_temp[reg] == NULL);
>
> @@ -2646,7 +2645,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>              }
>              break;
>          case INDEX_op_discard:
> -            temp_dead(s, &s->temps[op->args[0]]);
> +            temp_dead(s, arg_temp(op->args[0]));
>              break;
>          case INDEX_op_set_label:
>              tcg_reg_alloc_bb_end(s, s->reserved_regs);
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 720e04e..70d9fda 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -724,6 +724,11 @@ struct TCGContext {
>  extern TCGContext tcg_ctx;
>  extern bool parallel_cpus;
>
> +static inline TCGTemp *arg_temp(TCGArg a)
> +{
> +    return &tcg_ctx.temps[a];
> +}
> +
>  static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)
>  {
>      tcg_ctx.gen_op_buf[op_idx].args[arg] = v;

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée


* Re: [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end
  2017-06-21  2:48 [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end Richard Henderson
                   ` (16 preceding siblings ...)
  2017-06-21  3:43 ` [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end no-reply
@ 2017-06-26 16:49 ` Alex Bennée
  2017-06-26 17:47   ` Richard Henderson
  17 siblings, 1 reply; 40+ messages in thread
From: Alex Bennée @ 2017-06-26 16:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien


Richard Henderson <rth@twiddle.net> writes:

> There are two conceptually unrelated cleanups in here, though
> the second touches many of the same lines as the first, so
> separating the two would be ugly.
>
> The first is to split gen_opparam_buf and move the pieces into
> TCGOp.  This has two effects: the operands for an op is in the
> same cacheline as the op, and we get to drop the pointer into
> gen_opparam_buf, freeing up a register and/or function argument.
>
> The second is to change what value is stored in TCGArg for each
> TCG temporary.  Rather than store the index into tcg_ctx.temps,
> store the pointer to the temp itself.  This allows us to drop
> some arithmetic on many uses of a temp within the backend.
>
> Making that second change is tricky, as we don't want to miss any
> of the places that ought to be changed.  To do that I introduce a
> number of helpers.
>
> As a final step I changed the type of TCGOp.args to a structure,
> and annotated the places that access constant arguments.  I found
> that final patch to be really ugly, so I dropped it.  But I'm
> fairly confident that I've updated all of the non-constant args.
>
> The effect of this is nearly noise, but does reduce code size,
>
>    text	   data	    bss	    dec	    hex	filename
> 6648688	2106408	4486112	13241208 ca0b78	qemu-system-alpha (before)
> 6627656	2106408	4502496	13236560 c9f950	qemu-system-alpha (after)
>
> or about 21k.

Hmm it compile tested fine on mine but:

qemu-system-sparc: /home/alex/lsrc/qemu/qemu.git/tcg/tcg.h:725: temp_arg: Assertion `n < tcg_ctx.nb_temps' failed.
Broken pipe
GTester: last random seed: R02Sd6911d835e4140adeb8780ec1bf70af1
qemu-system-sparc: /home/alex/lsrc/qemu/qemu.git/tcg/tcg.h:725: temp_arg: Assertion `n < tcg_ctx.nb_temps' failed.
Broken pipe
GTester: last random seed: R02Sff8343267ed0c224ea97a4f54e970d65
qemu-system-sparc: /home/alex/lsrc/qemu/qemu.git/tcg/tcg.h:725: temp_arg: Assertion `n < tcg_ctx.nb_temps' failed.
Broken pipe
GTester: last random seed: R02Sfb0d300ff314d4f31a4f5bb414f5f249
/home/alex/lsrc/qemu/qemu.git/tests/Makefile.include:824: recipe for target 'check-qtest-sparc' failed
make: *** [check-qtest-sparc] Error 1
test: Expected a combining operator like '-a' at index 1

And also Travis is quite un-happy:

  https://travis-ci.org/stsquad/qemu/builds/247099991


>
>
> r~
>
>
> Richard Henderson (16):
>   tcg: Merge opcode arguments into TCGOp
>   tcg: Propagate args to op->args in optimizer
>   tcg: Propagate args to op->args in tcg.c
>   tcg: Propagate TCGOp down to allocators
>   tcg: Introduce arg_temp
>   tcg: Add temp_global bit to TCGTemp
>   tcg: Return NULL temp for TCG_CALL_DUMMY_ARG
>   tcg: Introduce temp_arg
>   tcg: Use per-temp state data in liveness
>   tcg: Avoid loops against variable bounds
>   tcg: Change temp_allocate_frame arg to TCGTemp
>   tcg: Remove unused TCG_CALL_DUMMY_TCGV
>   tcg: Export temp_idx
>   tcg: Use per-temp state data in optimize
>   tcg: Define separate structures for TCGv_*
>   tcg: Store pointers to temporaries directly in TCGArg
>
>  tcg/optimize.c | 647 ++++++++++++++++++++++++++++++++-------------------------
>  tcg/tcg-op.c   |  99 ++++-----
>  tcg/tcg.c      | 610 ++++++++++++++++++++++++-----------------------------
>  tcg/tcg.h      | 287 ++++++++++++++-----------
>  4 files changed, 841 insertions(+), 802 deletions(-)


--
Alex Bennée


* Re: [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end
  2017-06-26 16:49 ` Alex Bennée
@ 2017-06-26 17:47   ` Richard Henderson
  2017-06-26 19:19     ` Alex Bennée
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2017-06-26 17:47 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, aurelien

On 06/26/2017 09:49 AM, Alex Bennée wrote:
> Hmm it compile tested fine on mine but:
> 
> qemu-system-sparc: /home/alex/lsrc/qemu/qemu.git/tcg/tcg.h:725: temp_arg: Assertion `n < tcg_ctx.nb_temps' failed.

Bah.  I fixed this, and then lost it while rebasing.
It's an uninitialized structure member.
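
The assertion text in the log suggests that temp_arg range-checks a temp's index against nb_temps before using it. A minimal standalone sketch of that kind of check follows. ModelTemp, ModelContext and the model_* helpers are hypothetical stand-ins, not the QEMU definitions, and the sketch does not try to identify which structure member was left uninitialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for TCGTemp and TCGContext (assumptions). */
typedef struct ModelTemp {
    int type;
} ModelTemp;

#define MODEL_MAX_TEMPS 8

typedef struct ModelContext {
    int nb_temps;                     /* number of temps in use */
    ModelTemp temps[MODEL_MAX_TEMPS]; /* backing array */
} ModelContext;

/* True if ts points at one of the first nb_temps entries of ctx->temps.
   ts must point somewhere inside ctx->temps for the subtraction to be valid. */
static bool model_temp_in_range(const ModelContext *ctx, const ModelTemp *ts)
{
    ptrdiff_t n = ts - ctx->temps;
    return n >= 0 && n < ctx->nb_temps;
}

/* Convert a temp pointer to its index, asserting that it is in range.
   A pointer taken from a never-initialized member would typically
   fail this check (or worse). */
static size_t model_temp_idx(const ModelContext *ctx, const ModelTemp *ts)
{
    assert(model_temp_in_range(ctx, ts));
    return (size_t)(ts - ctx->temps);
}
```

With nb_temps set to 3, model_temp_idx accepts &ctx.temps[2] and would abort on &ctx.temps[5], which is the shape of failure reported above.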

> And also Travis is quite un-happy:
> 
>    https://travis-ci.org/stsquad/qemu/builds/247099991

The only real failure I see in here is also GTESTER for sparc.
Or am I missing something?


r~


* Re: [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end
  2017-06-26 17:47   ` Richard Henderson
@ 2017-06-26 19:19     ` Alex Bennée
  0 siblings, 0 replies; 40+ messages in thread
From: Alex Bennée @ 2017-06-26 19:19 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien


Richard Henderson <rth@twiddle.net> writes:

> On 06/26/2017 09:49 AM, Alex Bennée wrote:
>> Hmm it compile tested fine on mine but:
>>
>> qemu-system-sparc: /home/alex/lsrc/qemu/qemu.git/tcg/tcg.h:725: temp_arg: Assertion `n < tcg_ctx.nb_temps' failed.
>
> Bah.  I fixed this, and then lost it while rebasing.
> It's an uninitialized structure member.
>
>> And also Travis is quite un-happy:
>>
>>    https://travis-ci.org/stsquad/qemu/builds/247099991
>
> The only real failure I see in here is also GTESTER for sparc.
> Or am I missing something?

There is also a compile failure on Travis:

  https://travis-ci.org/stsquad/qemu/jobs/247099992#L3021
  https://travis-ci.org/stsquad/qemu/jobs/247099995#L3020

I hadn't actually clicked through all the failures because I assumed
they were mostly the same. I'll re-check on v2 ;-)

>
>
> r~


--
Alex Bennée


* Re: [Qemu-devel] [PATCH 06/16] tcg: Add temp_global bit to TCGTemp
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 06/16] tcg: Add temp_global bit to TCGTemp Richard Henderson
@ 2017-06-27  8:39   ` Alex Bennée
  2017-06-27 16:17     ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Bennée @ 2017-06-27  8:39 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien


Richard Henderson <rth@twiddle.net> writes:

> This avoids needing to test the index of a temp against nb_globals.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/optimize.c | 15 ++++++++-------
>  tcg/tcg.c      | 11 ++++++++---
>  tcg/tcg.h      | 12 ++++++++----
>  3 files changed, 24 insertions(+), 14 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index d8c3a7e..55f9e83 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -116,25 +116,26 @@ static TCGOpcode op_to_movi(TCGOpcode op)
>      }
>  }
>
> -static TCGArg find_better_copy(TCGContext *s, TCGArg temp)
> +static TCGArg find_better_copy(TCGContext *s, TCGArg arg)
>  {
> +    TCGTemp *ts = arg_temp(arg);
>      TCGArg i;
>
>      /* If this is already a global, we can't do better. */
> -    if (temp < s->nb_globals) {
> -        return temp;
> +    if (ts->temp_global) {
> +        return arg;
>      }
>
>      /* Search for a global first. */
> -    for (i = temps[temp].next_copy ; i != temp ; i = temps[i].next_copy) {
> +    for (i = temps[arg].next_copy ; i != arg; i = temps[i].next_copy) {
>          if (i < s->nb_globals) {
>              return i;
>          }
>      }
>
>      /* If it is a temp, search for a temp local. */
> -    if (!arg_temp(temp)->temp_local) {
> -        for (i = temps[temp].next_copy ; i != temp ; i = temps[i].next_copy) {
> +    if (!ts->temp_local) {
> +        for (i = temps[arg].next_copy ; i != arg; i = temps[i].next_copy) {
>              if (s->temps[i].temp_local) {
>                  return i;
>              }
> @@ -142,7 +143,7 @@ static TCGArg find_better_copy(TCGContext *s, TCGArg temp)
>      }
>
>      /* Failure to find a better representation, return the same temp. */
> -    return temp;
> +    return arg;
>  }
>
>  static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 068ac51..0bb88b1 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -489,9 +489,14 @@ static inline TCGTemp *tcg_temp_alloc(TCGContext *s)
>
>  static inline TCGTemp *tcg_global_alloc(TCGContext *s)
>  {
> +    TCGTemp *ts;
> +
>      tcg_debug_assert(s->nb_globals == s->nb_temps);
>      s->nb_globals++;
> -    return tcg_temp_alloc(s);
> +    ts = tcg_temp_alloc(s);
> +    ts->temp_global = 1;
> +
> +    return ts;
>  }
>
>  static int tcg_global_reg_new_internal(TCGContext *s, TCGType type,
> @@ -967,7 +972,7 @@ static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
>  {
>      int idx = temp_idx(s, ts);
>
> -    if (idx < s->nb_globals) {
> +    if (ts->temp_global) {
>          pstrcpy(buf, buf_size, ts->name);
>      } else if (ts->temp_local) {
>          snprintf(buf, buf_size, "loc%d", idx - s->nb_globals);
> @@ -1905,7 +1910,7 @@ static void temp_free_or_dead(TCGContext *s, TCGTemp *ts, int free_or_dead)
>      }
>      ts->val_type = (free_or_dead < 0
>                      || ts->temp_local
> -                    || temp_idx(s, ts) < s->nb_globals
> +                    || ts->temp_global
>                      ? TEMP_VAL_MEM : TEMP_VAL_DEAD);
>  }
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 70d9fda..3b35344 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -586,10 +586,14 @@ typedef struct TCGTemp {
>      unsigned int indirect_base:1;
>      unsigned int mem_coherent:1;
>      unsigned int mem_allocated:1;
> -    unsigned int temp_local:1; /* If true, the temp is saved across
> -                                  basic blocks. Otherwise, it is not
> -                                  preserved across basic blocks. */
> -    unsigned int temp_allocated:1; /* never used for code gen */
> +    /* If true, the temp is saved across both basic blocks and
> +       translation blocks.  */
> +    unsigned int temp_global:1;
> +    /* If true, the temp is saved across basic blocks but dead
> +       at the end of translation blocks.  If false, the temp is
> +       dead at the end of basic blocks.  */
> +    unsigned int temp_local:1;
> +    unsigned int temp_allocated:1;

This is where my knowledge of the TCG internals gets slightly confused.
As far as I'm aware all our TranslationBlocks are Basic Blocks - they
don't have any branches until the end of the block. What is the
distinction here?

Is a temp_global truly global? I thought the guest state was fully
rectified by the time we leave the basic block.
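
For reference, the three lifetimes described in the new comment can be modelled as below. This is a standalone sketch, not QEMU code: ModelTemp, the model_* helpers and the numeric values of the state bits are assumptions, while the rules themselves follow tcg_la_bb_end and tcg_la_func_end as quoted in the patch 09 review later in this thread. The distinction only matters if a translation block can contain more than one basic block.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit values for illustration; only the TS_DEAD/TS_MEM names
   come from the quoted patches. */
enum {
    MODEL_TS_DEAD = 1, /* value is not live in a register */
    MODEL_TS_MEM  = 2  /* value must be present in its memory slot */
};

/* Hypothetical stand-in for the two TCGTemp flag bits. */
typedef struct ModelTemp {
    bool temp_global;
    bool temp_local;
} ModelTemp;

/* End of a basic block: globals and local temps are synced to memory,
   plain temps are simply dead. */
static int model_state_at_bb_end(const ModelTemp *ts)
{
    if (ts->temp_global || ts->temp_local) {
        return MODEL_TS_DEAD | MODEL_TS_MEM;
    }
    return MODEL_TS_DEAD;
}

/* End of the translation block: only globals keep a memory copy;
   local temps and plain temps are dead. */
static int model_state_at_tb_end(const ModelTemp *ts)
{
    return ts->temp_global ? (MODEL_TS_DEAD | MODEL_TS_MEM) : MODEL_TS_DEAD;
}
```

A temp_local therefore differs from a plain temp only at a basic-block boundary, and from a global only at the end of the translation block.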

>
>      tcg_target_long val;
>      struct TCGTemp *mem_base;


--
Alex Bennée


* Re: [Qemu-devel] [PATCH 07/16] tcg: Return NULL temp for TCG_CALL_DUMMY_ARG
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 07/16] tcg: Return NULL temp for TCG_CALL_DUMMY_ARG Richard Henderson
@ 2017-06-27  8:47   ` Alex Bennée
  2017-06-27 16:36     ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Bennée @ 2017-06-27  8:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/tcg.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 3b35344..6c357e7 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -730,7 +730,7 @@ extern bool parallel_cpus;
>
>  static inline TCGTemp *arg_temp(TCGArg a)
>  {
> -    return &tcg_ctx.temps[a];
> +    return a == TCG_CALL_DUMMY_ARG ? NULL : &tcg_ctx.temps[a];
>  }

Many of the callers of arg_temp don't look able to deal with a NULL
return and may well dereference the value immediately. Are we sure the
cases where TCG_CALL_DUMMY_ARG is involved are narrowly defined?
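
To make the concern concrete, here is a minimal standalone sketch of a lookup that returns NULL for a sentinel argument, together with a caller that guards against it. All names (ModelArg, MODEL_DUMMY_ARG, model_temps, the model_* helpers) are hypothetical, and the all-ones sentinel value is an assumption rather than something taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t ModelArg;               /* stand-in for TCGArg (assumption) */
#define MODEL_DUMMY_ARG ((ModelArg)-1)    /* assumed sentinel value */

typedef struct ModelTemp {
    int state;
} ModelTemp;

static ModelTemp model_temps[4];          /* stand-in for tcg_ctx.temps */

/* Same shape as the patched arg_temp: NULL for the dummy argument,
   otherwise a pointer into the temp array. */
static ModelTemp *model_arg_temp(ModelArg a)
{
    return a == MODEL_DUMMY_ARG ? NULL : &model_temps[a];
}

/* A caller that may see dummy arguments has to test for NULL before
   touching the temp. Returns how many real temps were updated. */
static int model_mark_args(const ModelArg *args, int n, int new_state)
{
    int updated = 0;

    for (int i = 0; i < n; i++) {
        ModelTemp *ts = model_arg_temp(args[i]);
        if (ts != NULL) {
            ts->state = new_state;
            updated++;
        }
    }
    return updated;
}
```

Any caller written like model_mark_args is safe; a caller that writes model_arg_temp(a)->state directly is only safe where a dummy argument cannot occur.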

>
>  static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)


--
Alex Bennée


* Re: [Qemu-devel] [PATCH 09/16] tcg: Use per-temp state data in liveness
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 09/16] tcg: Use per-temp state data in liveness Richard Henderson
@ 2017-06-27  8:57   ` Alex Bennée
  2017-06-27 16:39     ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Bennée @ 2017-06-27  8:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien


Richard Henderson <rth@twiddle.net> writes:

> This avoids having to allocate external memory for each temporary.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/tcg.c | 232 ++++++++++++++++++++++++++++++--------------------------------
>  tcg/tcg.h |   6 ++
>  2 files changed, 120 insertions(+), 118 deletions(-)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 0d758e4..e78140b 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1399,42 +1399,54 @@ TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *old_op,
>
>  /* liveness analysis: end of function: all temps are dead, and globals
>     should be in memory. */
> -static inline void tcg_la_func_end(TCGContext *s, uint8_t *temp_state)
> +static void tcg_la_func_end(TCGContext *s)
>  {
> -    memset(temp_state, TS_DEAD | TS_MEM, s->nb_globals);
> -    memset(temp_state + s->nb_globals, TS_DEAD, s->nb_temps - s->nb_globals);
> +    int ng = s->nb_globals;
> +    int nt = s->nb_temps;
> +    int i;
> +
> +    for (i = 0; i < ng; ++i) {
> +        s->temps[i].state = TS_DEAD | TS_MEM;
> +    }
> +    for (i = ng; i < nt; ++i) {
> +        s->temps[i].state = TS_DEAD;
> +    }
>  }
>
>  /* liveness analysis: end of basic block: all temps are dead, globals
>     and local temps should be in memory. */
> -static inline void tcg_la_bb_end(TCGContext *s, uint8_t *temp_state)
> +static void tcg_la_bb_end(TCGContext *s)
>  {
> -    int i, n;
> +    int ng = s->nb_globals;
> +    int nt = s->nb_temps;
> +    int i;
>
> -    tcg_la_func_end(s, temp_state);
> -    for (i = s->nb_globals, n = s->nb_temps; i < n; i++) {
> -        if (s->temps[i].temp_local) {
> -            temp_state[i] |= TS_MEM;
> -        }
> +    for (i = 0; i < ng; ++i) {
> +        s->temps[i].state = TS_DEAD | TS_MEM;
> +    }
> +    for (i = ng; i < nt; ++i) {
> +        s->temps[i].state = (s->temps[i].temp_local
> +                             ? TS_DEAD | TS_MEM
> +                             : TS_DEAD);
>      }
>  }
>
>  /* Liveness analysis : update the opc_arg_life array to tell if a
>     given input arguments is dead. Instructions updating dead
>     temporaries are removed. */
> -static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
> +static void liveness_pass_1(TCGContext *s)
>  {
>      int nb_globals = s->nb_globals;
>      int oi, oi_prev;
>
> -    tcg_la_func_end(s, temp_state);
> +    tcg_la_func_end(s);
>
>      for (oi = s->gen_op_buf[0].prev; oi != 0; oi = oi_prev) {
>          int i, nb_iargs, nb_oargs;
>          TCGOpcode opc_new, opc_new2;
>          bool have_opc_new2;
>          TCGLifeData arg_life = 0;
> -        TCGArg arg;
> +        TCGTemp *arg_ts;
>
>          TCGOp * const op = &s->gen_op_buf[oi];
>          TCGOpcode opc = op->opc;
> @@ -1454,8 +1466,8 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>                  /* pure functions can be removed if their result is unused */
>                  if (call_flags & TCG_CALL_NO_SIDE_EFFECTS) {
>                      for (i = 0; i < nb_oargs; i++) {
> -                        arg = op->args[i];
> -                        if (temp_state[arg] != TS_DEAD) {
> +                        arg_ts = arg_temp(op->args[i]);
> +                        if (arg_ts->state != TS_DEAD) {
>                              goto do_not_remove_call;
>                          }
>                      }
> @@ -1465,41 +1477,41 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>
>                      /* output args are dead */
>                      for (i = 0; i < nb_oargs; i++) {
> -                        arg = op->args[i];
> -                        if (temp_state[arg] & TS_DEAD) {
> +                        arg_ts = arg_temp(op->args[i]);
> +                        if (arg_ts->state & TS_DEAD) {
>                              arg_life |= DEAD_ARG << i;
>                          }
> -                        if (temp_state[arg] & TS_MEM) {
> +                        if (arg_ts->state & TS_MEM) {
>                              arg_life |= SYNC_ARG << i;
>                          }
> -                        temp_state[arg] = TS_DEAD;
> +                        arg_ts->state = TS_DEAD;
>                      }
>
>                      if (!(call_flags & (TCG_CALL_NO_WRITE_GLOBALS |
>                                          TCG_CALL_NO_READ_GLOBALS))) {
>                          /* globals should go back to memory */
> -                        memset(temp_state, TS_DEAD | TS_MEM, nb_globals);
> +                        for (i = 0; i < nb_globals; i++) {
> +                            s->temps[i].state = TS_DEAD | TS_MEM;
> +                        }
>                      } else if (!(call_flags & TCG_CALL_NO_READ_GLOBALS)) {
>                          /* globals should be synced to memory */
>                          for (i = 0; i < nb_globals; i++) {
> -                            temp_state[i] |= TS_MEM;
> +                            s->temps[i].state |= TS_MEM;
>                          }
>                      }
>
>                      /* record arguments that die in this helper */
>                      for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
> -                        arg = op->args[i];
> -                        if (arg != TCG_CALL_DUMMY_ARG) {
> -                            if (temp_state[arg] & TS_DEAD) {
> -                                arg_life |= DEAD_ARG << i;
> -                            }
> +                        arg_ts = arg_temp(op->args[i]);
> +                        if (arg_ts && arg_ts->state & TS_DEAD) {
> +                            arg_life |= DEAD_ARG << i;
>                          }
>                      }
>                      /* input arguments are live for preceding opcodes */
>                      for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
> -                        arg = op->args[i];
> -                        if (arg != TCG_CALL_DUMMY_ARG) {
> -                            temp_state[arg] &= ~TS_DEAD;
> +                        arg_ts = arg_temp(op->args[i]);
> +                        if (arg_ts) {
> +                            arg_ts->state &= ~TS_DEAD;
>                          }
>                      }
>                  }
> @@ -1509,7 +1521,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>              break;
>          case INDEX_op_discard:
>              /* mark the temporary as dead */
> -            temp_state[op->args[0]] = TS_DEAD;
> +            arg_temp(op->args[0])->state = TS_DEAD;
>              break;
>
>          case INDEX_op_add2_i32:
> @@ -1530,8 +1542,8 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>                 the low part.  The result can be optimized to a simple
>                 add or sub.  This happens often for x86_64 guest when the
>                 cpu mode is set to 32 bit.  */
> -            if (temp_state[op->args[1]] == TS_DEAD) {
> -                if (temp_state[op->args[0]] == TS_DEAD) {
> +            if (arg_temp(op->args[1])->state == TS_DEAD) {
> +                if (arg_temp(op->args[0])->state == TS_DEAD) {
>                      goto do_remove;
>                  }
>                  /* Replace the opcode and adjust the args in place,
> @@ -1568,8 +1580,8 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>          do_mul2:
>              nb_iargs = 2;
>              nb_oargs = 2;
> -            if (temp_state[op->args[1]] == TS_DEAD) {
> -                if (temp_state[op->args[0]] == TS_DEAD) {
> +            if (arg_temp(op->args[1])->state == TS_DEAD) {
> +                if (arg_temp(op->args[0])->state == TS_DEAD) {
>                      /* Both parts of the operation are dead.  */
>                      goto do_remove;
>                  }
> @@ -1577,7 +1589,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>                  op->opc = opc = opc_new;
>                  op->args[1] = op->args[2];
>                  op->args[2] = op->args[3];
> -            } else if (temp_state[op->args[0]] == TS_DEAD && have_opc_new2) {
> +            } else if (arg_temp(op->args[0])->state == TS_DEAD && have_opc_new2) {
>                  /* The low part of the operation is dead; generate the high. */
>                  op->opc = opc = opc_new2;
>                  op->args[0] = op->args[1];
> @@ -1600,7 +1612,7 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>                 implies side effects */
>              if (!(def->flags & TCG_OPF_SIDE_EFFECTS) && nb_oargs != 0) {
>                  for (i = 0; i < nb_oargs; i++) {
> -                    if (temp_state[op->args[i]] != TS_DEAD) {
> +                    if (arg_temp(op->args[i])->state != TS_DEAD) {
>                          goto do_not_remove;
>                      }
>                  }
> @@ -1610,36 +1622,36 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>              do_not_remove:
>                  /* output args are dead */
>                  for (i = 0; i < nb_oargs; i++) {
> -                    arg = op->args[i];
> -                    if (temp_state[arg] & TS_DEAD) {
> +                    arg_ts = arg_temp(op->args[i]);
> +                    if (arg_ts->state & TS_DEAD) {
>                          arg_life |= DEAD_ARG << i;
>                      }
> -                    if (temp_state[arg] & TS_MEM) {
> +                    if (arg_ts->state & TS_MEM) {
>                          arg_life |= SYNC_ARG << i;
>                      }
> -                    temp_state[arg] = TS_DEAD;
> +                    arg_ts->state = TS_DEAD;
>                  }
>
>                  /* if end of basic block, update */
>                  if (def->flags & TCG_OPF_BB_END) {
> -                    tcg_la_bb_end(s, temp_state);
> +                    tcg_la_bb_end(s);
>                  } else if (def->flags & TCG_OPF_SIDE_EFFECTS) {
>                      /* globals should be synced to memory */
>                      for (i = 0; i < nb_globals; i++) {
> -                        temp_state[i] |= TS_MEM;
> +                        s->temps[i].state |= TS_MEM;
>                      }
>                  }
>
>                  /* record arguments that die in this opcode */
>                  for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
> -                    arg = op->args[i];
> -                    if (temp_state[arg] & TS_DEAD) {
> +                    arg_ts = arg_temp(op->args[i]);
> +                    if (arg_ts->state & TS_DEAD) {
>                          arg_life |= DEAD_ARG << i;
>                      }
>                  }
>                  /* input arguments are live for preceding opcodes */
>                  for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
> -                    temp_state[op->args[i]] &= ~TS_DEAD;
> +                    arg_temp(op->args[i])->state &= ~TS_DEAD;
>                  }
>              }
>              break;
> @@ -1649,16 +1661,12 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
>  }
>
>  /* Liveness analysis: Convert indirect regs to direct temporaries.  */
> -static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
> +static bool liveness_pass_2(TCGContext *s)
>  {
>      int nb_globals = s->nb_globals;
> -    int16_t *dir_temps;
>      int i, oi, oi_next;
>      bool changes = false;
>
> -    dir_temps = tcg_malloc(nb_globals * sizeof(int16_t));
> -    memset(dir_temps, 0, nb_globals * sizeof(int16_t));
> -
>      /* Create a temporary for each indirect global.  */
>      for (i = 0; i < nb_globals; ++i) {
>          TCGTemp *its = &s->temps[i];
> @@ -1666,19 +1674,19 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
>              TCGTemp *dts = tcg_temp_alloc(s);
>              dts->type = its->type;
>              dts->base_type = its->base_type;
> -            dir_temps[i] = temp_idx(s, dts);
> +            its->state_ptr = dts;
>          }
> +        /* All globals begin dead.  */
> +        its->state = TS_DEAD;
>      }
>
> -    memset(temp_state, TS_DEAD, nb_globals);
> -
>      for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
>          TCGOp *op = &s->gen_op_buf[oi];
>          TCGOpcode opc = op->opc;
>          const TCGOpDef *def = &tcg_op_defs[opc];
>          TCGLifeData arg_life = op->life;
>          int nb_iargs, nb_oargs, call_flags;
> -        TCGArg arg, dir;
> +        TCGTemp *arg_ts, *dir_ts;
>
>          oi_next = op->next;
>
> @@ -1706,24 +1714,20 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
>
>          /* Make sure that input arguments are available.  */
>          for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
> -            arg = op->args[i];
> -            /* Note this unsigned test catches TCG_CALL_ARG_DUMMY too.  */
> -            if (arg < nb_globals) {

This test is gone but....

> -                dir = dir_temps[arg];
> -                if (dir != 0 && temp_state[arg] == TS_DEAD) {
> -                    TCGTemp *its = arg_temp(arg);
> -                    TCGOpcode lopc = (its->type == TCG_TYPE_I32
> -                                      ? INDEX_op_ld_i32
> -                                      : INDEX_op_ld_i64);
> -                    TCGOp *lop = tcg_op_insert_before(s, op, lopc, 3);
> -
> -                    lop->args[0] = dir;
> -                    lop->args[1] = temp_arg(its->mem_base);
> -                    lop->args[2] = its->mem_offset;
> -
> -                    /* Loaded, but synced with memory.  */
> -                    temp_state[arg] = TS_MEM;
> -                }
> +            arg_ts = arg_temp(op->args[i]);
> +            dir_ts = arg_ts->state_ptr;
> +            if (dir_ts && arg_ts->state == TS_DEAD) {

...we dereference arg_ts here. So what if it was TCG_CALL_DUMMY_ARG?
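
The removed comment relies on the comparison being unsigned. A small standalone sketch of why that filtered out the dummy argument follows; ModelArg, MODEL_DUMMY_ARG and model_is_global_index are hypothetical names, and the all-ones sentinel value is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t ModelArg;               /* unsigned stand-in for TCGArg */
#define MODEL_DUMMY_ARG ((ModelArg)-1)    /* assumed sentinel: all bits set */

/* Old-style test: an argument names a global if its index is below
   nb_globals. Because the comparison is unsigned, the sentinel compares
   as the largest possible value and is rejected without a separate check. */
static bool model_is_global_index(ModelArg arg, int nb_globals)
{
    return arg < (ModelArg)nb_globals;
}
```

Once the index test is replaced by a pointer dereference, that implicit filtering is gone, so the dummy case needs an explicit NULL check.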

> +                TCGOpcode lopc = (arg_ts->type == TCG_TYPE_I32
> +                                  ? INDEX_op_ld_i32
> +                                  : INDEX_op_ld_i64);
> +                TCGOp *lop = tcg_op_insert_before(s, op, lopc, 3);
> +
> +                lop->args[0] = temp_arg(dir_ts);
> +                lop->args[1] = temp_arg(arg_ts->mem_base);
> +                lop->args[2] = arg_ts->mem_offset;
> +
> +                /* Loaded, but synced with memory.  */
> +                arg_ts->state = TS_MEM;
>              }
>          }
>
> @@ -1731,15 +1735,13 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
>             No action is required except keeping temp_state up to date
>             so that we reload when needed.  */
>          for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
> -            arg = op->args[i];
> -            if (arg < nb_globals) {
> -                dir = dir_temps[arg];
> -                if (dir != 0) {
> -                    op->args[i] = dir;
> -                    changes = true;
> -                    if (IS_DEAD_ARG(i)) {
> -                        temp_state[arg] = TS_DEAD;
> -                    }
> +            arg_ts = arg_temp(op->args[i]);
> +            dir_ts = arg_ts->state_ptr;
> +            if (dir_ts) {
> +                op->args[i] = temp_arg(dir_ts);
> +                changes = true;
> +                if (IS_DEAD_ARG(i)) {
> +                    arg_ts->state = TS_DEAD;
>                  }
>              }
>          }
> @@ -1752,51 +1754,49 @@ static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
>              for (i = 0; i < nb_globals; ++i) {
>                  /* Liveness should see that globals are synced back,
>                     that is, either TS_DEAD or TS_MEM.  */
> -                tcg_debug_assert(dir_temps[i] == 0
> -                                 || temp_state[i] != 0);
> +                arg_ts = &s->temps[i];
> +                tcg_debug_assert(arg_ts->state_ptr == 0
> +                                 || arg_ts->state != 0);
>              }
>          } else {
>              for (i = 0; i < nb_globals; ++i) {
>                  /* Liveness should see that globals are saved back,
>                     that is, TS_DEAD, waiting to be reloaded.  */
> -                tcg_debug_assert(dir_temps[i] == 0
> -                                 || temp_state[i] == TS_DEAD);
> +                arg_ts = &s->temps[i];
> +                tcg_debug_assert(arg_ts->state_ptr == 0
> +                                 || arg_ts->state == TS_DEAD);
>              }
>          }
>
>          /* Outputs become available.  */
>          for (i = 0; i < nb_oargs; i++) {
> -            arg = op->args[i];
> -            if (arg >= nb_globals) {
> -                continue;
> -            }
> -            dir = dir_temps[arg];
> -            if (dir == 0) {
> +            arg_ts = arg_temp(op->args[i]);
> +            dir_ts = arg_ts->state_ptr;
> +            if (!dir_ts) {
>                  continue;
>              }
> -            op->args[i] = dir;
> +            op->args[i] = temp_arg(dir_ts);
>              changes = true;
>
>              /* The output is now live and modified.  */
> -            temp_state[arg] = 0;
> +            arg_ts->state = 0;
>
>              /* Sync outputs upon their last write.  */
>              if (NEED_SYNC_ARG(i)) {
> -                TCGTemp *its = arg_temp(arg);
> -                TCGOpcode sopc = (its->type == TCG_TYPE_I32
> +                TCGOpcode sopc = (arg_ts->type == TCG_TYPE_I32
>                                    ? INDEX_op_st_i32
>                                    : INDEX_op_st_i64);
>                  TCGOp *sop = tcg_op_insert_after(s, op, sopc, 3);
>
> -                sop->args[0] = dir;
> -                sop->args[1] = temp_arg(its->mem_base);
> -                sop->args[2] = its->mem_offset;
> +                sop->args[0] = temp_arg(dir_ts);
> +                sop->args[1] = temp_arg(arg_ts->mem_base);
> +                sop->args[2] = arg_ts->mem_offset;
>
> -                temp_state[arg] = TS_MEM;
> +                arg_ts->state = TS_MEM;
>              }
>              /* Drop outputs that are dead.  */
>              if (IS_DEAD_ARG(i)) {
> -                temp_state[arg] = TS_DEAD;
> +                arg_ts->state = TS_DEAD;
>              }
>          }
>      }
> @@ -2569,27 +2569,23 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>      s->la_time -= profile_getclock();
>  #endif
>
> -    {
> -        uint8_t *temp_state = tcg_malloc(s->nb_temps + s->nb_indirects);
> -
> -        liveness_pass_1(s, temp_state);
> +    liveness_pass_1(s);
>
> -        if (s->nb_indirects > 0) {
> +    if (s->nb_indirects > 0) {
>  #ifdef DEBUG_DISAS
> -            if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP_IND)
> -                         && qemu_log_in_addr_range(tb->pc))) {
> -                qemu_log_lock();
> -                qemu_log("OP before indirect lowering:\n");
> -                tcg_dump_ops(s);
> -                qemu_log("\n");
> -                qemu_log_unlock();
> -            }
> +        if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP_IND)
> +                     && qemu_log_in_addr_range(tb->pc))) {
> +            qemu_log_lock();
> +            qemu_log("OP before indirect lowering:\n");
> +            tcg_dump_ops(s);
> +            qemu_log("\n");
> +            qemu_log_unlock();
> +        }
>  #endif
> -            /* Replace indirect temps with direct temps.  */
> -            if (liveness_pass_2(s, temp_state)) {
> -                /* If changes were made, re-run liveness.  */
> -                liveness_pass_1(s, temp_state);
> -            }
> +        /* Replace indirect temps with direct temps.  */
> +        if (liveness_pass_2(s)) {
> +            /* If changes were made, re-run liveness.  */
> +            liveness_pass_1(s);
>          }
>      }
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 80012b5..1eeeca5 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -599,6 +599,12 @@ typedef struct TCGTemp {
>      struct TCGTemp *mem_base;
>      intptr_t mem_offset;
>      const char *name;
> +
> +    /* Pass-specific information that can be stored for a temporary.
> +       One word worth of integer data, and one pointer to data
> +       allocated separately.  */
> +    uintptr_t state;
> +    void *state_ptr;
>  } TCGTemp;
>
>  typedef struct TCGContext TCGContext;
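
For readers following the state/state_ptr change, here is a minimal
sketch of how a pass might use the two new per-temp fields in place of
side arrays indexed by temp number.  ToyTemp, toy_lower_input and the
TOY_TS_* values are hypothetical stand-ins, not the QEMU definitions;
the only property taken from the quoted hunks is that a state of 0
means "live" and that state_ptr, when set, points at the direct temp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, for illustration only.  */
#define TOY_TS_DEAD 1
#define TOY_TS_MEM  2

/* Hypothetical stand-in for TCGTemp, reduced to the two new fields.  */
typedef struct ToyTemp {
    uintptr_t state;   /* one word of pass-specific integer data */
    void *state_ptr;   /* pass-specific pointer; here, the direct temp */
} ToyTemp;

/* Rewrite one input argument in the style of the second liveness
   pass: return the temp that the op should reference afterwards.  */
static ToyTemp *toy_lower_input(ToyTemp *arg_ts, bool is_dead, bool *changes)
{
    ToyTemp *dir_ts;

    assert(arg_ts != NULL);
    dir_ts = arg_ts->state_ptr;
    if (!dir_ts) {
        /* Not an indirect temp; nothing to replace.  */
        return arg_ts;
    }
    *changes = true;
    if (is_dead) {
        arg_ts->state = TOY_TS_DEAD;
    }
    return dir_ts;
}
```

The point of the sketch is only that both lookups start from the temp
itself, so no separate temp_state[] or dir_temps[] array has to be
allocated and passed around.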


--
Alex Bennée


* Re: [Qemu-devel] [PATCH 10/16] tcg: Avoid loops against variable bounds
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 10/16] tcg: Avoid loops against variable bounds Richard Henderson
@ 2017-06-27  9:01   ` Alex Bennée
  0 siblings, 0 replies; 40+ messages in thread
From: Alex Bennée @ 2017-06-27  9:01 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien


Richard Henderson <rth@twiddle.net> writes:

> Copy s->nb_globals or s->nb_temps to a local variable for the purposes
> of iteration.  This should allow the compiler to use low-overhead
> looping constructs on some hosts.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/tcg.c | 27 ++++++++++-----------------
>  1 file changed, 10 insertions(+), 17 deletions(-)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index e78140b..c228f1e 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -943,23 +943,16 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
>
>  static void tcg_reg_alloc_start(TCGContext *s)
>  {
> -    int i;
> +    int i, n;
>      TCGTemp *ts;
> -    for(i = 0; i < s->nb_globals; i++) {
> +
> +    for (i = 0, n = s->nb_globals; i < n; i++) {
>          ts = &s->temps[i];
> -        if (ts->fixed_reg) {
> -            ts->val_type = TEMP_VAL_REG;
> -        } else {
> -            ts->val_type = TEMP_VAL_MEM;
> -        }
> +        ts->val_type = (ts->fixed_reg ? TEMP_VAL_REG : TEMP_VAL_MEM);
>      }
> -    for(i = s->nb_globals; i < s->nb_temps; i++) {
> +    for (n = s->nb_temps; i < n; i++) {

A one-line comment such as /* i continues on from s->nb_globals above */
might prevent momentary confusion when reading through.
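
As a minimal sketch of the pattern under discussion, with the
suggested comment in place: the types below (ToyTemp, ToyContext) and
the TOY_VAL_* values are hypothetical stand-ins reduced to the fields
the two loops touch, not the real TCG structures.

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_VAL_DEAD, TOY_VAL_REG, TOY_VAL_MEM };

/* Hypothetical stand-ins for TCGTemp and TCGContext.  */
typedef struct ToyTemp {
    int val_type;
    bool fixed_reg;
    bool temp_local;
} ToyTemp;

typedef struct ToyContext {
    int nb_globals;
    int nb_temps;
    ToyTemp temps[8];
} ToyContext;

static void toy_reg_alloc_start(ToyContext *s)
{
    int i, n;
    ToyTemp *ts;

    assert(s->nb_globals <= s->nb_temps && s->nb_temps <= 8);

    /* The bound is read into a local once; the compiler may otherwise
       be unable to prove that stores through ts leave it unchanged.  */
    for (i = 0, n = s->nb_globals; i < n; i++) {
        ts = &s->temps[i];
        ts->val_type = (ts->fixed_reg ? TOY_VAL_REG : TOY_VAL_MEM);
    }
    /* i continues on from s->nb_globals above */
    for (n = s->nb_temps; i < n; i++) {
        ts = &s->temps[i];
        ts->val_type = (ts->temp_local ? TOY_VAL_MEM : TOY_VAL_DEAD);
        ts->fixed_reg = false;
    }
}
```

Whether the hoisted bound actually yields a cheaper loop depends on
the host and compiler; the sketch only shows the shape of the code.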

>          ts = &s->temps[i];
> -        if (ts->temp_local) {
> -            ts->val_type = TEMP_VAL_MEM;
> -        } else {
> -            ts->val_type = TEMP_VAL_DEAD;
> -        }
> +        ts->val_type = (ts->temp_local ? TEMP_VAL_MEM : TEMP_VAL_DEAD);
>          ts->mem_allocated = 0;
>          ts->fixed_reg = 0;
>      }
> @@ -2050,9 +2043,9 @@ static void temp_save(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs)
>     temporary register needs to be allocated to store a constant. */
>  static void save_globals(TCGContext *s, TCGRegSet allocated_regs)
>  {
> -    int i;
> +    int i, n;
>
> -    for (i = 0; i < s->nb_globals; i++) {
> +    for (i = 0, n = s->nb_globals; i < n; i++) {
>          temp_save(s, &s->temps[i], allocated_regs);
>      }
>  }
> @@ -2062,9 +2055,9 @@ static void save_globals(TCGContext *s, TCGRegSet allocated_regs)
>     temporary register needs to be allocated to store a constant. */
>  static void sync_globals(TCGContext *s, TCGRegSet allocated_regs)
>  {
> -    int i;
> +    int i, n;
>
> -    for (i = 0; i < s->nb_globals; i++) {
> +    for (i = 0, n = s->nb_globals; i < n; i++) {
>          TCGTemp *ts = &s->temps[i];
>          tcg_debug_assert(ts->val_type != TEMP_VAL_REG
>                           || ts->fixed_reg

Otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée


* Re: [Qemu-devel] [PATCH 12/16] tcg: Remove unused TCG_CALL_DUMMY_TCGV
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 12/16] tcg: Remove unused TCG_CALL_DUMMY_TCGV Richard Henderson
@ 2017-06-27  9:42   ` Alex Bennée
  0 siblings, 0 replies; 40+ messages in thread
From: Alex Bennée @ 2017-06-27  9:42 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  tcg/tcg.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 1eeeca5..4f69d0c 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -503,7 +503,6 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
>  #define TCG_CALL_NO_WG_SE       (TCG_CALL_NO_WG | TCG_CALL_NO_SE)
>
>  /* used to align parameters */
> -#define TCG_CALL_DUMMY_TCGV     MAKE_TCGV_I32(-1)
>  #define TCG_CALL_DUMMY_ARG      ((TCGArg)(-1))
>
>  /* Conditions.  Note that these are laid out for easy manipulation by


--
Alex Bennée


* Re: [Qemu-devel] [PATCH 13/16] tcg: Export temp_idx
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 13/16] tcg: Export temp_idx Richard Henderson
@ 2017-06-27  9:46   ` Alex Bennée
  2017-06-27 16:43     ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Bennée @ 2017-06-27  9:46 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien


Richard Henderson <rth@twiddle.net> writes:

> At the same time, drop the TCGContext argument and use tcg_ctx instead.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/tcg.c | 15 ++++-----------
>  tcg/tcg.h |  7 ++++++-
>  2 files changed, 10 insertions(+), 12 deletions(-)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index f8d96fa..26931a7 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -473,13 +473,6 @@ void tcg_func_start(TCGContext *s)
>      s->be = tcg_malloc(sizeof(TCGBackendData));
>  }
>
> -static inline int temp_idx(TCGContext *s, TCGTemp *ts)
> -{
> -    ptrdiff_t n = ts - s->temps;
> -    tcg_debug_assert(n >= 0 && n < s->nb_temps);
> -    return n;
> -}
> -
>  static inline TCGTemp *tcg_temp_alloc(TCGContext *s)
>  {
>      int n = s->nb_temps++;
> @@ -516,7 +509,7 @@ static int tcg_global_reg_new_internal(TCGContext *s, TCGType type,
>      ts->name = name;
>      tcg_regset_set_reg(s->reserved_regs, reg);
>
> -    return temp_idx(s, ts);
> +    return temp_idx(ts);
>  }
>
>  void tcg_set_frame(TCGContext *s, TCGReg reg, intptr_t start, intptr_t size)
> @@ -605,7 +598,7 @@ int tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
>          ts->mem_offset = offset;
>          ts->name = name;
>      }
> -    return temp_idx(s, ts);
> +    return temp_idx(ts);
>  }
>
>  static int tcg_temp_new_internal(TCGType type, int temp_local)
> @@ -645,7 +638,7 @@ static int tcg_temp_new_internal(TCGType type, int temp_local)
>              ts->temp_allocated = 1;
>              ts->temp_local = temp_local;
>          }
> -        idx = temp_idx(s, ts);
> +        idx = temp_idx(ts);
>      }
>
>  #if defined(CONFIG_DEBUG_TCG)
> @@ -963,7 +956,7 @@ static void tcg_reg_alloc_start(TCGContext *s)
>  static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
>                                   TCGTemp *ts)
>  {
> -    int idx = temp_idx(s, ts);
> +    int idx = temp_idx(ts);
>
>      if (ts->temp_global) {
>          pstrcpy(buf, buf_size, ts->name);
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 4f69d0c..b75a745 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -733,13 +733,18 @@ struct TCGContext {
>  extern TCGContext tcg_ctx;
>  extern bool parallel_cpus;
>
> -static inline TCGArg temp_arg(TCGTemp *ts)
> +static inline size_t temp_idx(TCGTemp *ts)
>  {
>      ptrdiff_t n = ts - tcg_ctx.temps;
>      tcg_debug_assert(n >= 0 && n < tcg_ctx.nb_temps);
>      return n;
>  }
>
> +static inline TCGArg temp_arg(TCGTemp *ts)
> +{
> +    return temp_idx(ts);
> +}

I'm confused at the dropping of TCGArg in favour of size_t only for
temp_arg to implicitly cast it back. Was this meant to be part of
another patch?
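
To make the question concrete, here is a minimal sketch of the three
helpers as they stand after this patch.  ToyArg, ToyTemp, toy_ctx and
TOY_DUMMY_ARG are hypothetical stand-ins for TCGArg, TCGTemp, tcg_ctx
and TCG_CALL_DUMMY_ARG.  One possible reading, going by the cover
letter, is that the index form stays useful for array and bitmap
lookups while the argument encoding is free to change later; that is
an assumption, not something this patch states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t ToyArg;                      /* stand-in for TCGArg */
typedef struct ToyTemp { int type; } ToyTemp;  /* stand-in for TCGTemp */

#define TOY_DUMMY_ARG ((ToyArg)(-1))

/* Stand-in for the global tcg_ctx.  */
static struct {
    int nb_temps;
    ToyTemp temps[16];
} toy_ctx = { .nb_temps = 16 };

/* Position of the temp within the context's array.  */
static inline size_t toy_temp_idx(ToyTemp *ts)
{
    ptrdiff_t n = ts - toy_ctx.temps;
    assert(n >= 0 && n < toy_ctx.nb_temps);
    return n;
}

/* Value stored in op->args[] for this temp.  At this point in the
   series it is the index, implicitly converted to the arg type.  */
static inline ToyArg toy_temp_arg(ToyTemp *ts)
{
    return toy_temp_idx(ts);
}

/* Inverse of toy_temp_arg, with the dummy argument mapped to NULL.  */
static inline ToyTemp *toy_arg_temp(ToyArg a)
{
    return a == TOY_DUMMY_ARG ? NULL : &toy_ctx.temps[a];
}
```

With this shape, callers that need a bit number or array slot use the
index helper, and callers that fill in op arguments use the arg
helper, even though both currently produce the same number.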

> +
>  static inline TCGTemp *arg_temp(TCGArg a)
>  {
>      return a == TCG_CALL_DUMMY_ARG ? NULL : &tcg_ctx.temps[a];


--
Alex Bennée


* Re: [Qemu-devel] [PATCH 14/16] tcg: Use per-temp state data in optimize
  2017-06-21  2:48 ` [Qemu-devel] [PATCH 14/16] tcg: Use per-temp state data in optimize Richard Henderson
@ 2017-06-27  9:59   ` Alex Bennée
  0 siblings, 0 replies; 40+ messages in thread
From: Alex Bennée @ 2017-06-27  9:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien


Richard Henderson <rth@twiddle.net> writes:

> While we're touching many of the lines anyway, adjust the naming
> of the functions to better distinguish when "TCGArg" vs "TCGTemp"
> should be used.

Could we add definitions of TCGArg and TCGTemp into tcg/README?
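
Pending such definitions, the distinction the patch relies on can be
sketched roughly as follows: an "arg" is the operand word stored in
op->args[], a "ts" is the structure describing a temporary, and each
arg_* helper converts once and then defers to its ts_* counterpart.
Everything below (ToyArg, ToyTemp, ToyInfo and the toy_* functions) is
a hypothetical reduction for illustration, including the ring of
copies that the optimizer now links by pointer rather than by index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyTemp ToyTemp;
typedef uintptr_t ToyArg;      /* operand word, as stored in op->args[] */

/* Optimizer-private data, reached through ToyTemp.state_ptr.  */
typedef struct ToyInfo {
    bool is_const;
    ToyTemp *prev_copy;
    ToyTemp *next_copy;
} ToyInfo;

struct ToyTemp {
    void *state_ptr;
};

static ToyTemp toy_temps[4];
static ToyInfo toy_infos[4];

static ToyTemp *toy_arg_temp(ToyArg a)
{
    assert(a < 4);
    return &toy_temps[a];
}

static ToyInfo *toy_ts_info(ToyTemp *ts)
{
    return ts->state_ptr;
}

/* Attach info to a temp and make it a ring of one.  */
static void toy_init_ts_info(ToyTemp *ts)
{
    ToyInfo *ti = &toy_infos[ts - toy_temps];

    ts->state_ptr = ti;
    ti->next_copy = ts;
    ti->prev_copy = ts;
    ti->is_const = false;
}

static bool toy_ts_is_copy(ToyTemp *ts)
{
    return toy_ts_info(ts)->next_copy != ts;
}

/* The arg_* form only converts, then defers to the ts_* form.  */
static bool toy_arg_is_copy(ToyArg arg)
{
    return toy_ts_is_copy(toy_arg_temp(arg));
}

/* Link dst into src's ring, directly after src.  */
static void toy_ts_add_copy(ToyTemp *dst, ToyTemp *src)
{
    ToyInfo *di = toy_ts_info(dst);
    ToyInfo *si = toy_ts_info(src);
    ToyInfo *ni = toy_ts_info(si->next_copy);

    di->next_copy = si->next_copy;
    di->prev_copy = src;
    ni->prev_copy = dst;
    si->next_copy = dst;
}

/* Unlink ts from its ring and make it a ring of one again.  */
static void toy_reset_ts(ToyTemp *ts)
{
    ToyInfo *ti = toy_ts_info(ts);
    ToyInfo *pi = toy_ts_info(ti->prev_copy);
    ToyInfo *ni = toy_ts_info(ti->next_copy);

    ni->prev_copy = ti->prev_copy;
    pi->next_copy = ti->next_copy;
    ti->next_copy = ts;
    ti->prev_copy = ts;
    ti->is_const = false;
}

static bool toy_ts_are_copies(ToyTemp *ts1, ToyTemp *ts2)
{
    ToyTemp *i;

    if (ts1 == ts2) {
        return true;
    }
    for (i = toy_ts_info(ts1)->next_copy; i != ts1;
         i = toy_ts_info(i)->next_copy) {
        if (i == ts2) {
            return true;
        }
    }
    return false;
}
```

The sketch leaves out the constant value and mask tracking; it is
meant only to show which names take an operand word and which take a
temp pointer.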

>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/optimize.c | 424 +++++++++++++++++++++++++++++++++------------------------
>  tcg/tcg.h      |   5 +
>  2 files changed, 249 insertions(+), 180 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 55f9e83..eb09ae5 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -34,34 +34,63 @@
>
>  struct tcg_temp_info {
>      bool is_const;
> -    uint16_t prev_copy;
> -    uint16_t next_copy;
> +    TCGTemp *prev_copy;
> +    TCGTemp *next_copy;
>      tcg_target_ulong val;
>      tcg_target_ulong mask;
>  };
>
> -static struct tcg_temp_info temps[TCG_MAX_TEMPS];
> +static struct tcg_temp_info temps_[TCG_MAX_TEMPS];
>  static TCGTempSet temps_used;
>
> -static inline bool temp_is_const(TCGArg arg)
> +static inline struct tcg_temp_info *ts_info(TCGTemp *ts)
>  {
> -    return temps[arg].is_const;
> +    return ts->state_ptr;
>  }
>
> -static inline bool temp_is_copy(TCGArg arg)
> +static inline struct tcg_temp_info *arg_info(TCGArg arg)
>  {
> -    return temps[arg].next_copy != arg;
> +    return ts_info(arg_temp(arg));
> +}
> +
> +static inline bool ts_is_const(TCGTemp *ts)
> +{
> +    return ts_info(ts)->is_const;
> +}
> +
> +static inline bool arg_is_const(TCGArg arg)
> +{
> +    return ts_is_const(arg_temp(arg));
> +}
> +
> +static inline bool ts_is_copy(TCGTemp *ts)
> +{
> +    return ts_info(ts)->next_copy != ts;
> +}
> +
> +static inline bool arg_is_copy(TCGArg arg)
> +{
> +    return ts_is_copy(arg_temp(arg));
>  }
>
>  /* Reset TEMP's state, possibly removing the temp from the list of copies.  */
> -static void reset_temp(TCGArg temp)
> +static void reset_ts(TCGTemp *ts)
>  {
> -    temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
> -    temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
> -    temps[temp].next_copy = temp;
> -    temps[temp].prev_copy = temp;
> -    temps[temp].is_const = false;
> -    temps[temp].mask = -1;
> +    struct tcg_temp_info *ti = ts_info(ts);
> +    struct tcg_temp_info *pi = ts_info(ti->prev_copy);
> +    struct tcg_temp_info *ni = ts_info(ti->next_copy);
> +
> +    ni->prev_copy = ti->prev_copy;
> +    pi->next_copy = ti->next_copy;
> +    ti->next_copy = ts;
> +    ti->prev_copy = ts;
> +    ti->is_const = false;
> +    ti->mask = -1;
> +}
> +
> +static void reset_temp(TCGArg arg)
> +{
> +    reset_ts(arg_temp(arg));
>  }
>
>  /* Reset all temporaries, given that there are NB_TEMPS of them.  */
> @@ -71,17 +100,26 @@ static void reset_all_temps(int nb_temps)
>  }
>
>  /* Initialize and activate a temporary.  */
> -static void init_temp_info(TCGArg temp)
> +static void init_ts_info(TCGTemp *ts)
>  {
> -    if (!test_bit(temp, temps_used.l)) {
> -        temps[temp].next_copy = temp;
> -        temps[temp].prev_copy = temp;
> -        temps[temp].is_const = false;
> -        temps[temp].mask = -1;
> -        set_bit(temp, temps_used.l);
> +    size_t idx = temp_idx(ts);
> +    if (!test_bit(idx, temps_used.l)) {
> +        struct tcg_temp_info *ti = &temps_[idx];
> +
> +        ts->state_ptr = ti;
> +        ti->next_copy = ts;
> +        ti->prev_copy = ts;
> +        ti->is_const = false;
> +        ti->mask = -1;
> +        set_bit(idx, temps_used.l);
>      }
>  }
>
> +static void init_arg_info(TCGArg arg)
> +{
> +    init_ts_info(arg_temp(arg));
> +}
> +
>  static int op_bits(TCGOpcode op)
>  {
>      const TCGOpDef *def = &tcg_op_defs[op];
> @@ -119,7 +157,7 @@ static TCGOpcode op_to_movi(TCGOpcode op)
>  static TCGArg find_better_copy(TCGContext *s, TCGArg arg)
>  {
>      TCGTemp *ts = arg_temp(arg);
> -    TCGArg i;
> +    TCGTemp *i;
>
>      /* If this is already a global, we can't do better. */
>      if (ts->temp_global) {
> @@ -127,17 +165,17 @@ static TCGArg find_better_copy(TCGContext *s, TCGArg arg)
>      }
>
>      /* Search for a global first. */
> -    for (i = temps[arg].next_copy ; i != arg; i = temps[i].next_copy) {
> -        if (i < s->nb_globals) {
> -            return i;
> +    for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
> +        if (i->temp_global) {
> +            return temp_arg(i);
>          }
>      }
>
>      /* If it is a temp, search for a temp local. */
>      if (!ts->temp_local) {
> -        for (i = temps[arg].next_copy ; i != arg; i = temps[i].next_copy) {
> -            if (s->temps[i].temp_local) {
> -                return i;
> +        for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
> +            if (i->temp_local) {
> +                return temp_arg(i);
>              }
>          }
>      }
> @@ -146,20 +184,20 @@ static TCGArg find_better_copy(TCGContext *s, TCGArg arg)
>      return arg;
>  }
>
> -static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
> +static bool ts_are_copies(TCGTemp *ts1, TCGTemp *ts2)
>  {
> -    TCGArg i;
> +    TCGTemp *i;
>
> -    if (arg1 == arg2) {
> +    if (ts1 == ts2) {
>          return true;
>      }
>
> -    if (!temp_is_copy(arg1) || !temp_is_copy(arg2)) {
> +    if (!ts_is_copy(ts1) || !ts_is_copy(ts2)) {
>          return false;
>      }
>
> -    for (i = temps[arg1].next_copy ; i != arg1 ; i = temps[i].next_copy) {
> -        if (i == arg2) {
> +    for (i = ts_info(ts1)->next_copy; i != ts1; i = ts_info(i)->next_copy) {
> +        if (i == ts2) {
>              return true;
>          }
>      }
> @@ -167,22 +205,28 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
>      return false;
>  }
>
> +static bool args_are_copies(TCGArg arg1, TCGArg arg2)
> +{
> +    return ts_are_copies(arg_temp(arg1), arg_temp(arg2));
> +}
> +
>  static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg val)
>  {
>      TCGOpcode new_op = op_to_movi(op->opc);
>      tcg_target_ulong mask;
> +    struct tcg_temp_info *di = arg_info(dst);
>
>      op->opc = new_op;
>
>      reset_temp(dst);
> -    temps[dst].is_const = true;
> -    temps[dst].val = val;
> +    di->is_const = true;
> +    di->val = val;
>      mask = val;
>      if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_movi_i32) {
>          /* High bits of the destination are now garbage.  */
>          mask |= ~0xffffffffull;
>      }
> -    temps[dst].mask = mask;
> +    di->mask = mask;
>
>      op->args[0] = dst;
>      op->args[1] = val;
> @@ -190,35 +234,44 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg val)
>
>  static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
>  {
> -    if (temps_are_copies(dst, src)) {
> +    TCGTemp *dst_ts = arg_temp(dst);
> +    TCGTemp *src_ts = arg_temp(src);
> +    struct tcg_temp_info *di;
> +    struct tcg_temp_info *si;
> +    tcg_target_ulong mask;
> +    TCGOpcode new_op;
> +
> +    if (ts_are_copies(dst_ts, src_ts)) {
>          tcg_op_remove(s, op);
>          return;
>      }
>
> -    TCGOpcode new_op = op_to_mov(op->opc);
> -    tcg_target_ulong mask;
> +    reset_ts(dst_ts);
> +    di = ts_info(dst_ts);
> +    si = ts_info(src_ts);
> +    new_op = op_to_mov(op->opc);
>
>      op->opc = new_op;
> +    op->args[0] = dst;
> +    op->args[1] = src;
>
> -    reset_temp(dst);
> -    mask = temps[src].mask;
> +    mask = si->mask;
>      if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
>          /* High bits of the destination are now garbage.  */
>          mask |= ~0xffffffffull;
>      }
> -    temps[dst].mask = mask;
> -
> -    if (arg_temp(src)->type == arg_temp(dst)->type) {
> -        temps[dst].next_copy = temps[src].next_copy;
> -        temps[dst].prev_copy = src;
> -        temps[temps[dst].next_copy].prev_copy = dst;
> -        temps[src].next_copy = dst;
> -        temps[dst].is_const = temps[src].is_const;
> -        temps[dst].val = temps[src].val;
> -    }
> +    di->mask = mask;
>
> -    op->args[0] = dst;
> -    op->args[1] = src;
> +    if (src_ts->type == dst_ts->type) {
> +        struct tcg_temp_info *ni = ts_info(si->next_copy);
> +
> +        di->next_copy = si->next_copy;
> +        di->prev_copy = src_ts;
> +        ni->prev_copy = dst_ts;
> +        si->next_copy = dst_ts;
> +        di->is_const = si->is_const;
> +        di->val = si->val;
> +    }
>  }
>
>  static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
> @@ -465,18 +518,20 @@ static bool do_constant_folding_cond_eq(TCGCond c)
>  static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
>                                         TCGArg y, TCGCond c)
>  {
> -    if (temp_is_const(x) && temp_is_const(y)) {
> +    tcg_target_ulong xv = arg_info(x)->val;
> +    tcg_target_ulong yv = arg_info(y)->val;
> +    if (arg_is_const(x) && arg_is_const(y)) {
>          switch (op_bits(op)) {
>          case 32:
> -            return do_constant_folding_cond_32(temps[x].val, temps[y].val, c);
> +            return do_constant_folding_cond_32(xv, yv, c);
>          case 64:
> -            return do_constant_folding_cond_64(temps[x].val, temps[y].val, c);
> +            return do_constant_folding_cond_64(xv, yv, c);
>          default:
>              tcg_abort();
>          }
> -    } else if (temps_are_copies(x, y)) {
> +    } else if (args_are_copies(x, y)) {
>          return do_constant_folding_cond_eq(c);
> -    } else if (temp_is_const(y) && temps[y].val == 0) {
> +    } else if (arg_is_const(y) && yv == 0) {
>          switch (c) {
>          case TCG_COND_LTU:
>              return 0;
> @@ -496,12 +551,15 @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
>      TCGArg al = p1[0], ah = p1[1];
>      TCGArg bl = p2[0], bh = p2[1];
>
> -    if (temp_is_const(bl) && temp_is_const(bh)) {
> -        uint64_t b = ((uint64_t)temps[bh].val << 32) | (uint32_t)temps[bl].val;
> +    if (arg_is_const(bl) && arg_is_const(bh)) {
> +        tcg_target_ulong blv = arg_info(bl)->val;
> +        tcg_target_ulong bhv = arg_info(bh)->val;
> +        uint64_t b = deposit64(blv, 32, 32, bhv);
>
> -        if (temp_is_const(al) && temp_is_const(ah)) {
> -            uint64_t a;
> -            a = ((uint64_t)temps[ah].val << 32) | (uint32_t)temps[al].val;
> +        if (arg_is_const(al) && arg_is_const(ah)) {
> +            tcg_target_ulong alv = arg_info(al)->val;
> +            tcg_target_ulong ahv = arg_info(ah)->val;
> +            uint64_t a = deposit64(alv, 32, 32, ahv);
>              return do_constant_folding_cond_64(a, b, c);
>          }
>          if (b == 0) {
> @@ -515,7 +573,7 @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
>              }
>          }
>      }
> -    if (temps_are_copies(al, bl) && temps_are_copies(ah, bh)) {
> +    if (args_are_copies(al, bl) && args_are_copies(ah, bh)) {
>          return do_constant_folding_cond_eq(c);
>      }
>      return 2;
> @@ -525,8 +583,8 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
>  {
>      TCGArg a1 = *p1, a2 = *p2;
>      int sum = 0;
> -    sum += temp_is_const(a1);
> -    sum -= temp_is_const(a2);
> +    sum += arg_is_const(a1);
> +    sum -= arg_is_const(a2);
>
>      /* Prefer the constant in second argument, and then the form
>         op a, a, b, which is better handled on non-RISC hosts. */
> @@ -541,10 +599,10 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
>  static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
>  {
>      int sum = 0;
> -    sum += temp_is_const(p1[0]);
> -    sum += temp_is_const(p1[1]);
> -    sum -= temp_is_const(p2[0]);
> -    sum -= temp_is_const(p2[1]);
> +    sum += arg_is_const(p1[0]);
> +    sum += arg_is_const(p1[1]);
> +    sum -= arg_is_const(p2[0]);
> +    sum -= arg_is_const(p2[1]);
>      if (sum > 0) {
>          TCGArg t;
>          t = p1[0], p1[0] = p2[0], p2[0] = t;
> @@ -586,22 +644,22 @@ void tcg_optimize(TCGContext *s)
>              nb_oargs = op->callo;
>              nb_iargs = op->calli;
>              for (i = 0; i < nb_oargs + nb_iargs; i++) {
> -                tmp = op->args[i];
> -                if (tmp != TCG_CALL_DUMMY_ARG) {
> -                    init_temp_info(tmp);
> +                TCGTemp *ts = arg_temp(op->args[i]);
> +                if (ts) {
> +                    init_ts_info(ts);
>                  }
>              }
>          } else {
>              nb_oargs = def->nb_oargs;
>              nb_iargs = def->nb_iargs;
>              for (i = 0; i < nb_oargs + nb_iargs; i++) {
> -                init_temp_info(op->args[i]);
> +                init_arg_info(op->args[i]);
>              }
>          }
>
>          /* Do copy propagation */
>          for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
> -            if (temp_is_copy(op->args[i])) {
> +            if (arg_is_copy(op->args[i])) {
>                  op->args[i] = find_better_copy(s, op->args[i]);
>              }
>          }
> @@ -671,7 +729,8 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(sar):
>          CASE_OP_32_64(rotl):
>          CASE_OP_32_64(rotr):
> -            if (temp_is_const(op->args[1]) && temps[op->args[1]].val == 0) {
> +            if (arg_is_const(op->args[1])
> +                && arg_info(op->args[1])->val == 0) {
>                  tcg_opt_gen_movi(s, op, op->args[0], 0);
>                  continue;
>              }
> @@ -681,7 +740,7 @@ void tcg_optimize(TCGContext *s)
>                  TCGOpcode neg_op;
>                  bool have_neg;
>
> -                if (temp_is_const(op->args[2])) {
> +                if (arg_is_const(op->args[2])) {
>                      /* Proceed with possible constant folding. */
>                      break;
>                  }
> @@ -695,8 +754,8 @@ void tcg_optimize(TCGContext *s)
>                  if (!have_neg) {
>                      break;
>                  }
> -                if (temp_is_const(op->args[1])
> -                    && temps[op->args[1]].val == 0) {
> +                if (arg_is_const(op->args[1])
> +                    && arg_info(op->args[1])->val == 0) {
>                      op->opc = neg_op;
>                      reset_temp(op->args[0]);
>                      op->args[1] = op->args[2];
> @@ -706,34 +765,34 @@ void tcg_optimize(TCGContext *s)
>              break;
>          CASE_OP_32_64(xor):
>          CASE_OP_32_64(nand):
> -            if (!temp_is_const(op->args[1])
> -                && temp_is_const(op->args[2])
> -                && temps[op->args[2]].val == -1) {
> +            if (!arg_is_const(op->args[1])
> +                && arg_is_const(op->args[2])
> +                && arg_info(op->args[2])->val == -1) {
>                  i = 1;
>                  goto try_not;
>              }
>              break;
>          CASE_OP_32_64(nor):
> -            if (!temp_is_const(op->args[1])
> -                && temp_is_const(op->args[2])
> -                && temps[op->args[2]].val == 0) {
> +            if (!arg_is_const(op->args[1])
> +                && arg_is_const(op->args[2])
> +                && arg_info(op->args[2])->val == 0) {
>                  i = 1;
>                  goto try_not;
>              }
>              break;
>          CASE_OP_32_64(andc):
> -            if (!temp_is_const(op->args[2])
> -                && temp_is_const(op->args[1])
> -                && temps[op->args[1]].val == -1) {
> +            if (!arg_is_const(op->args[2])
> +                && arg_is_const(op->args[1])
> +                && arg_info(op->args[1])->val == -1) {
>                  i = 2;
>                  goto try_not;
>              }
>              break;
>          CASE_OP_32_64(orc):
>          CASE_OP_32_64(eqv):
> -            if (!temp_is_const(op->args[2])
> -                && temp_is_const(op->args[1])
> -                && temps[op->args[1]].val == 0) {
> +            if (!arg_is_const(op->args[2])
> +                && arg_is_const(op->args[1])
> +                && arg_info(op->args[1])->val == 0) {
>                  i = 2;
>                  goto try_not;
>              }
> @@ -774,9 +833,9 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(or):
>          CASE_OP_32_64(xor):
>          CASE_OP_32_64(andc):
> -            if (!temp_is_const(op->args[1])
> -                && temp_is_const(op->args[2])
> -                && temps[op->args[2]].val == 0) {
> +            if (!arg_is_const(op->args[1])
> +                && arg_is_const(op->args[2])
> +                && arg_info(op->args[2])->val == 0) {
>                  tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
>                  continue;
>              }
> @@ -784,9 +843,9 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(and):
>          CASE_OP_32_64(orc):
>          CASE_OP_32_64(eqv):
> -            if (!temp_is_const(op->args[1])
> -                && temp_is_const(op->args[2])
> -                && temps[op->args[2]].val == -1) {
> +            if (!arg_is_const(op->args[1])
> +                && arg_is_const(op->args[2])
> +                && arg_info(op->args[2])->val == -1) {
>                  tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
>                  continue;
>              }
> @@ -801,21 +860,21 @@ void tcg_optimize(TCGContext *s)
>          affected = -1;
>          switch (opc) {
>          CASE_OP_32_64(ext8s):
> -            if ((temps[op->args[1]].mask & 0x80) != 0) {
> +            if ((arg_info(op->args[1])->mask & 0x80) != 0) {
>                  break;
>              }
>          CASE_OP_32_64(ext8u):
>              mask = 0xff;
>              goto and_const;
>          CASE_OP_32_64(ext16s):
> -            if ((temps[op->args[1]].mask & 0x8000) != 0) {
> +            if ((arg_info(op->args[1])->mask & 0x8000) != 0) {
>                  break;
>              }
>          CASE_OP_32_64(ext16u):
>              mask = 0xffff;
>              goto and_const;
>          case INDEX_op_ext32s_i64:
> -            if ((temps[op->args[1]].mask & 0x80000000) != 0) {
> +            if ((arg_info(op->args[1])->mask & 0x80000000) != 0) {
>                  break;
>              }
>          case INDEX_op_ext32u_i64:
> @@ -823,111 +882,114 @@ void tcg_optimize(TCGContext *s)
>              goto and_const;
>
>          CASE_OP_32_64(and):
> -            mask = temps[op->args[2]].mask;
> -            if (temp_is_const(op->args[2])) {
> +            mask = arg_info(op->args[2])->mask;
> +            if (arg_is_const(op->args[2])) {
>          and_const:
> -                affected = temps[op->args[1]].mask & ~mask;
> +                affected = arg_info(op->args[1])->mask & ~mask;
>              }
> -            mask = temps[op->args[1]].mask & mask;
> +            mask = arg_info(op->args[1])->mask & mask;
>              break;
>
>          case INDEX_op_ext_i32_i64:
> -            if ((temps[op->args[1]].mask & 0x80000000) != 0) {
> +            if ((arg_info(op->args[1])->mask & 0x80000000) != 0) {
>                  break;
>              }
>          case INDEX_op_extu_i32_i64:
>              /* We do not compute affected as it is a size changing op.  */
> -            mask = (uint32_t)temps[op->args[1]].mask;
> +            mask = (uint32_t)arg_info(op->args[1])->mask;
>              break;
>
>          CASE_OP_32_64(andc):
>              /* Known-zeros does not imply known-ones.  Therefore unless
>                 op->args[2] is constant, we can't infer anything from it.  */
> -            if (temp_is_const(op->args[2])) {
> -                mask = ~temps[op->args[2]].mask;
> +            if (arg_is_const(op->args[2])) {
> +                mask = ~arg_info(op->args[2])->mask;
>                  goto and_const;
>              }
> -            /* But we certainly know nothing outside op->args[1] may be set. */
> -            mask = temps[op->args[1]].mask;
> +            /* But we certainly know nothing outside args[1] may be set. */
> +            mask = arg_info(op->args[1])->mask;
>              break;
>
>          case INDEX_op_sar_i32:
> -            if (temp_is_const(op->args[2])) {
> -                tmp = temps[op->args[2]].val & 31;
> -                mask = (int32_t)temps[op->args[1]].mask >> tmp;
> +            if (arg_is_const(op->args[2])) {
> +                tmp = arg_info(op->args[2])->val & 31;
> +                mask = (int32_t)arg_info(op->args[1])->mask >> tmp;
>              }
>              break;
>          case INDEX_op_sar_i64:
> -            if (temp_is_const(op->args[2])) {
> -                tmp = temps[op->args[2]].val & 63;
> -                mask = (int64_t)temps[op->args[1]].mask >> tmp;
> +            if (arg_is_const(op->args[2])) {
> +                tmp = arg_info(op->args[2])->val & 63;
> +                mask = (int64_t)arg_info(op->args[1])->mask >> tmp;
>              }
>              break;
>
>          case INDEX_op_shr_i32:
> -            if (temp_is_const(op->args[2])) {
> -                tmp = temps[op->args[2]].val & 31;
> -                mask = (uint32_t)temps[op->args[1]].mask >> tmp;
> +            if (arg_is_const(op->args[2])) {
> +                tmp = arg_info(op->args[2])->val & 31;
> +                mask = (uint32_t)arg_info(op->args[1])->mask >> tmp;
>              }
>              break;
>          case INDEX_op_shr_i64:
> -            if (temp_is_const(op->args[2])) {
> -                tmp = temps[op->args[2]].val & 63;
> -                mask = (uint64_t)temps[op->args[1]].mask >> tmp;
> +            if (arg_is_const(op->args[2])) {
> +                tmp = arg_info(op->args[2])->val & 63;
> +                mask = (uint64_t)arg_info(op->args[1])->mask >> tmp;
>              }
>              break;
>
>          case INDEX_op_extrl_i64_i32:
> -            mask = (uint32_t)temps[op->args[1]].mask;
> +            mask = (uint32_t)arg_info(op->args[1])->mask;
>              break;
>          case INDEX_op_extrh_i64_i32:
> -            mask = (uint64_t)temps[op->args[1]].mask >> 32;
> +            mask = (uint64_t)arg_info(op->args[1])->mask >> 32;
>              break;
>
>          CASE_OP_32_64(shl):
> -            if (temp_is_const(op->args[2])) {
> -                tmp = temps[op->args[2]].val & (TCG_TARGET_REG_BITS - 1);
> -                mask = temps[op->args[1]].mask << tmp;
> +            if (arg_is_const(op->args[2])) {
> +                tmp = arg_info(op->args[2])->val & (TCG_TARGET_REG_BITS - 1);
> +                mask = arg_info(op->args[1])->mask << tmp;
>              }
>              break;
>
>          CASE_OP_32_64(neg):
>              /* Set to 1 all bits to the left of the rightmost.  */
> -            mask = -(temps[op->args[1]].mask & -temps[op->args[1]].mask);
> +            mask = -(arg_info(op->args[1])->mask
> +                     & -arg_info(op->args[1])->mask);
>              break;
>
>          CASE_OP_32_64(deposit):
> -            mask = deposit64(temps[op->args[1]].mask, op->args[3],
> -                             op->args[4], temps[op->args[2]].mask);
> +            mask = deposit64(arg_info(op->args[1])->mask,
> +                             op->args[3], op->args[4],
> +                             arg_info(op->args[2])->mask);
>              break;
>
>          CASE_OP_32_64(extract):
> -            mask = extract64(temps[op->args[1]].mask, op->args[2], op->args[3]);
> +            mask = extract64(arg_info(op->args[1])->mask,
> +                             op->args[2], op->args[3]);
>              if (op->args[2] == 0) {
> -                affected = temps[op->args[1]].mask & ~mask;
> +                affected = arg_info(op->args[1])->mask & ~mask;
>              }
>              break;
>          CASE_OP_32_64(sextract):
> -            mask = sextract64(temps[op->args[1]].mask,
> +            mask = sextract64(arg_info(op->args[1])->mask,
>                                op->args[2], op->args[3]);
>              if (op->args[2] == 0 && (tcg_target_long)mask >= 0) {
> -                affected = temps[op->args[1]].mask & ~mask;
> +                affected = arg_info(op->args[1])->mask & ~mask;
>              }
>              break;
>
>          CASE_OP_32_64(or):
>          CASE_OP_32_64(xor):
> -            mask = temps[op->args[1]].mask | temps[op->args[2]].mask;
> +            mask = arg_info(op->args[1])->mask | arg_info(op->args[2])->mask;
>              break;
>
>          case INDEX_op_clz_i32:
>          case INDEX_op_ctz_i32:
> -            mask = temps[op->args[2]].mask | 31;
> +            mask = arg_info(op->args[2])->mask | 31;
>              break;
>
>          case INDEX_op_clz_i64:
>          case INDEX_op_ctz_i64:
> -            mask = temps[op->args[2]].mask | 63;
> +            mask = arg_info(op->args[2])->mask | 63;
>              break;
>
>          case INDEX_op_ctpop_i32:
> @@ -943,7 +1005,7 @@ void tcg_optimize(TCGContext *s)
>              break;
>
>          CASE_OP_32_64(movcond):
> -            mask = temps[op->args[3]].mask | temps[op->args[4]].mask;
> +            mask = arg_info(op->args[3])->mask | arg_info(op->args[4])->mask;
>              break;
>
>          CASE_OP_32_64(ld8u):
> @@ -997,7 +1059,8 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(mul):
>          CASE_OP_32_64(muluh):
>          CASE_OP_32_64(mulsh):
> -            if ((temp_is_const(op->args[2]) && temps[op->args[2]].val == 0)) {
> +            if (arg_is_const(op->args[2])
> +                && arg_info(op->args[2])->val == 0) {
>                  tcg_opt_gen_movi(s, op, op->args[0], 0);
>                  continue;
>              }
> @@ -1010,7 +1073,7 @@ void tcg_optimize(TCGContext *s)
>          switch (opc) {
>          CASE_OP_32_64(or):
>          CASE_OP_32_64(and):
> -            if (temps_are_copies(op->args[1], op->args[2])) {
> +            if (args_are_copies(op->args[1], op->args[2])) {
>                  tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
>                  continue;
>              }
> @@ -1024,7 +1087,7 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(andc):
>          CASE_OP_32_64(sub):
>          CASE_OP_32_64(xor):
> -            if (temps_are_copies(op->args[1], op->args[2])) {
> +            if (args_are_copies(op->args[1], op->args[2])) {
>                  tcg_opt_gen_movi(s, op, op->args[0], 0);
>                  continue;
>              }
> @@ -1057,8 +1120,8 @@ void tcg_optimize(TCGContext *s)
>          case INDEX_op_extu_i32_i64:
>          case INDEX_op_extrl_i64_i32:
>          case INDEX_op_extrh_i64_i32:
> -            if (temp_is_const(op->args[1])) {
> -                tmp = do_constant_folding(opc, temps[op->args[1]].val, 0);
> +            if (arg_is_const(op->args[1])) {
> +                tmp = do_constant_folding(opc, arg_info(op->args[1])->val, 0);
>                  tcg_opt_gen_movi(s, op, op->args[0], tmp);
>                  break;
>              }
> @@ -1086,9 +1149,9 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(divu):
>          CASE_OP_32_64(rem):
>          CASE_OP_32_64(remu):
> -            if (temp_is_const(op->args[1]) && temp_is_const(op->args[2])) {
> -                tmp = do_constant_folding(opc, temps[op->args[1]].val,
> -                                          temps[op->args[2]].val);
> +            if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
> +                tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
> +                                          arg_info(op->args[2])->val);
>                  tcg_opt_gen_movi(s, op, op->args[0], tmp);
>                  break;
>              }
> @@ -1096,8 +1159,8 @@ void tcg_optimize(TCGContext *s)
>
>          CASE_OP_32_64(clz):
>          CASE_OP_32_64(ctz):
> -            if (temp_is_const(op->args[1])) {
> -                TCGArg v = temps[op->args[1]].val;
> +            if (arg_is_const(op->args[1])) {
> +                TCGArg v = arg_info(op->args[1])->val;
>                  if (v != 0) {
>                      tmp = do_constant_folding(opc, v, 0);
>                      tcg_opt_gen_movi(s, op, op->args[0], tmp);
> @@ -1109,17 +1172,18 @@ void tcg_optimize(TCGContext *s)
>              goto do_default;
>
>          CASE_OP_32_64(deposit):
> -            if (temp_is_const(op->args[1]) && temp_is_const(op->args[2])) {
> -                tmp = deposit64(temps[op->args[1]].val, op->args[3],
> -                                op->args[4], temps[op->args[2]].val);
> +            if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
> +                tmp = deposit64(arg_info(op->args[1])->val,
> +                                op->args[3], op->args[4],
> +                                arg_info(op->args[2])->val);
>                  tcg_opt_gen_movi(s, op, op->args[0], tmp);
>                  break;
>              }
>              goto do_default;
>
>          CASE_OP_32_64(extract):
> -            if (temp_is_const(op->args[1])) {
> -                tmp = extract64(temps[op->args[1]].val,
> +            if (arg_is_const(op->args[1])) {
> +                tmp = extract64(arg_info(op->args[1])->val,
>                                  op->args[2], op->args[3]);
>                  tcg_opt_gen_movi(s, op, op->args[0], tmp);
>                  break;
> @@ -1127,8 +1191,8 @@ void tcg_optimize(TCGContext *s)
>              goto do_default;
>
>          CASE_OP_32_64(sextract):
> -            if (temp_is_const(op->args[1])) {
> -                tmp = sextract64(temps[op->args[1]].val,
> +            if (arg_is_const(op->args[1])) {
> +                tmp = sextract64(arg_info(op->args[1])->val,
>                                   op->args[2], op->args[3]);
>                  tcg_opt_gen_movi(s, op, op->args[0], tmp);
>                  break;
> @@ -1166,9 +1230,9 @@ void tcg_optimize(TCGContext *s)
>                  tcg_opt_gen_mov(s, op, op->args[0], op->args[4-tmp]);
>                  break;
>              }
> -            if (temp_is_const(op->args[3]) && temp_is_const(op->args[4])) {
> -                tcg_target_ulong tv = temps[op->args[3]].val;
> -                tcg_target_ulong fv = temps[op->args[4]].val;
> +            if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
> +                tcg_target_ulong tv = arg_info(op->args[3])->val;
> +                tcg_target_ulong fv = arg_info(op->args[4])->val;
>                  TCGCond cond = op->args[5];
>                  if (fv == 1 && tv == 0) {
>                      cond = tcg_invert_cond(cond);
> @@ -1185,12 +1249,12 @@ void tcg_optimize(TCGContext *s)
>
>          case INDEX_op_add2_i32:
>          case INDEX_op_sub2_i32:
> -            if (temp_is_const(op->args[2]) && temp_is_const(op->args[3])
> -                && temp_is_const(op->args[4]) && temp_is_const(op->args[5])) {
> -                uint32_t al = temps[op->args[2]].val;
> -                uint32_t ah = temps[op->args[3]].val;
> -                uint32_t bl = temps[op->args[4]].val;
> -                uint32_t bh = temps[op->args[5]].val;
> +            if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])
> +                && arg_is_const(op->args[4]) && arg_is_const(op->args[5])) {
> +                uint32_t al = arg_info(op->args[2])->val;
> +                uint32_t ah = arg_info(op->args[3])->val;
> +                uint32_t bl = arg_info(op->args[4])->val;
> +                uint32_t bh = arg_info(op->args[5])->val;
>                  uint64_t a = ((uint64_t)ah << 32) | al;
>                  uint64_t b = ((uint64_t)bh << 32) | bl;
>                  TCGArg rl, rh;
> @@ -1214,9 +1278,9 @@ void tcg_optimize(TCGContext *s)
>              goto do_default;
>
>          case INDEX_op_mulu2_i32:
> -            if (temp_is_const(op->args[2]) && temp_is_const(op->args[3])) {
> -                uint32_t a = temps[op->args[2]].val;
> -                uint32_t b = temps[op->args[3]].val;
> +            if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])) {
> +                uint32_t a = arg_info(op->args[2])->val;
> +                uint32_t b = arg_info(op->args[3])->val;
>                  uint64_t r = (uint64_t)a * b;
>                  TCGArg rl, rh;
>                  TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32, 2);
> @@ -1247,10 +1311,10 @@ void tcg_optimize(TCGContext *s)
>                  }
>              } else if ((op->args[4] == TCG_COND_LT
>                          || op->args[4] == TCG_COND_GE)
> -                       && temp_is_const(op->args[2])
> -                       && temps[op->args[2]].val == 0
> -                       && temp_is_const(op->args[3])
> -                       && temps[op->args[3]].val == 0) {
> +                       && arg_is_const(op->args[2])
> +                       && arg_info(op->args[2])->val == 0
> +                       && arg_is_const(op->args[3])
> +                       && arg_info(op->args[3])->val == 0) {
>                  /* Simplify LT/GE comparisons vs zero to a single compare
>                     vs the high word of the input.  */
>              do_brcond_high:
> @@ -1318,15 +1382,15 @@ void tcg_optimize(TCGContext *s)
>                  tcg_opt_gen_movi(s, op, op->args[0], tmp);
>              } else if ((op->args[5] == TCG_COND_LT
>                          || op->args[5] == TCG_COND_GE)
> -                       && temp_is_const(op->args[3])
> -                       && temps[op->args[3]].val == 0
> -                       && temp_is_const(op->args[4])
> -                       && temps[op->args[4]].val == 0) {
> +                       && arg_is_const(op->args[3])
> +                       && arg_info(op->args[3])->val == 0
> +                       && arg_is_const(op->args[4])
> +                       && arg_info(op->args[4])->val == 0) {
>                  /* Simplify LT/GE comparisons vs zero to a single compare
>                     vs the high word of the input.  */
>              do_setcond_high:
>                  reset_temp(op->args[0]);
> -                temps[op->args[0]].mask = 1;
> +                arg_info(op->args[0])->mask = 1;
>                  op->opc = INDEX_op_setcond_i32;
>                  op->args[1] = op->args[2];
>                  op->args[2] = op->args[4];
> @@ -1352,7 +1416,7 @@ void tcg_optimize(TCGContext *s)
>                  }
>              do_setcond_low:
>                  reset_temp(op->args[0]);
> -                temps[op->args[0]].mask = 1;
> +                arg_info(op->args[0])->mask = 1;
>                  op->opc = INDEX_op_setcond_i32;
>                  op->args[2] = op->args[3];
>                  op->args[3] = op->args[5];
> @@ -1386,7 +1450,7 @@ void tcg_optimize(TCGContext *s)
>                    & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
>                  for (i = 0; i < nb_globals; i++) {
>                      if (test_bit(i, temps_used.l)) {
> -                        reset_temp(i);
> +                        reset_ts(&s->temps[i]);
>                      }
>                  }
>              }
> @@ -1408,7 +1472,7 @@ void tcg_optimize(TCGContext *s)
>                      /* Save the corresponding known-zero bits mask for the
>                         first output argument (only one supported so far). */
>                      if (i == 0) {
> -                        temps[op->args[i]].mask = mask;
> +                        arg_info(op->args[i])->mask = mask;
>                      }
>                  }
>              }
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index b75a745..018c01c 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -750,6 +750,11 @@ static inline TCGTemp *arg_temp(TCGArg a)
>      return a == TCG_CALL_DUMMY_ARG ? NULL : &tcg_ctx.temps[a];
>  }
>
> +static inline size_t arg_index(TCGArg a)
> +{
> +    return a;
> +}
> +
>  static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)
>  {
>      tcg_ctx.gen_op_buf[op_idx].args[arg] = v;


--
Alex Bennée


* Re: [Qemu-devel] [PATCH 06/16] tcg: Add temp_global bit to TCGTemp
  2017-06-27  8:39   ` Alex Bennée
@ 2017-06-27 16:17     ` Richard Henderson
  2017-06-28  8:52       ` Alex Bennée
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2017-06-27 16:17 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, aurelien

On 06/27/2017 01:39 AM, Alex Bennée wrote:
>> +    /* If true, the temp is saved across both basic blocks and
>> +       translation blocks.  */
>> +    unsigned int temp_global:1;
>> +    /* If true, the temp is saved across basic blocks but dead
>> +       at the end of translation blocks.  If false, the temp is
>> +       dead at the end of basic blocks.  */
>> +    unsigned int temp_local:1;
>> +    unsigned int temp_allocated:1;
> 
> This is where my knowledge of the TCG internals gets slightly confused.
> As far as I'm aware all our TranslationBlocks are Basic Blocks - they
> don't have any branches until the end of the block. What is the
> distinction here?
> 
> Is a temp_global truly global? I thought the guest state was fully
> rectified by the time we leave the basic block.

TranslationBlocks are not basic blocks.  They normally stop at branches in the 
target instruction stream, but they certainly may have many branches in the tcg 
opcode stream (brcond and the like).  Consider, for instance, our 
implementation of arm32's conditional instructions.

Beyond that, I agree the language is confusing.

A temp_global is created by tcg_global_mem_new_*, generally represents a cpu 
register, and is synced back to a slot in ENV.

A temp_local is created by tcg_temp_local_new_*, and is synced to a slot in the 
local stack frame.

Something without either is simply declared dead at the end of a basic block, 
and is a source of confusion to those writing new front-ends.

Anyway, we already have all of these concepts.  The change is that before the
patch the only way to identify a temp_global was to compare its index against
tcg_ctx.nb_globals; with the patch, the temp carries the bit itself.
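
To make the three kinds of temp concrete, here is a minimal, self-contained sketch. It is not QEMU code: `DemoTemp`, `DemoFate` and the `demo_*` functions are hypothetical names invented for illustration, and only the flag names `temp_global` and `temp_local` come from the patch. It contrasts the index comparison with the flag test, and models which temps keep their value at the end of a basic block and at the end of a translation block.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for TCGTemp; only the two flag names are
   taken from the patch under discussion.  */
typedef struct DemoTemp {
    unsigned int temp_global:1;  /* kept across basic blocks and TBs */
    unsigned int temp_local:1;   /* kept across basic blocks only */
} DemoTemp;

typedef enum { DEMO_DEAD, DEMO_SAVED } DemoFate;

/* Before the patch: a global can only be recognised by comparing
   its index against the number of globals.  */
static bool demo_is_global_by_index(size_t idx, size_t nb_globals)
{
    return idx < nb_globals;
}

/* After the patch: the temp itself records that it is a global.  */
static bool demo_is_global_by_flag(const DemoTemp *ts)
{
    return ts->temp_global;
}

/* Fate of a temp's value at the end of a basic block.  */
static DemoFate demo_end_of_bb(const DemoTemp *ts)
{
    return (ts->temp_global || ts->temp_local) ? DEMO_SAVED : DEMO_DEAD;
}

/* Fate of a temp's value at the end of a translation block.  */
static DemoFate demo_end_of_tb(const DemoTemp *ts)
{
    return ts->temp_global ? DEMO_SAVED : DEMO_DEAD;
}
```

A plain temp (neither flag set) is the only kind that is dead at the end of a basic block, which is the case that tends to surprise front-end authors.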


r~


* Re: [Qemu-devel] [PATCH 07/16] tcg: Return NULL temp for TCG_CALL_DUMMY_ARG
  2017-06-27  8:47   ` Alex Bennée
@ 2017-06-27 16:36     ` Richard Henderson
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-06-27 16:36 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, aurelien

On 06/27/2017 01:47 AM, Alex Bennée wrote:
> 
> Richard Henderson <rth@twiddle.net> writes:
> 
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> ---
>>   tcg/tcg.h | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tcg/tcg.h b/tcg/tcg.h
>> index 3b35344..6c357e7 100644
>> --- a/tcg/tcg.h
>> +++ b/tcg/tcg.h
>> @@ -730,7 +730,7 @@ extern bool parallel_cpus;
>>
>>   static inline TCGTemp *arg_temp(TCGArg a)
>>   {
>> -    return &tcg_ctx.temps[a];
>> +    return a == TCG_CALL_DUMMY_ARG ? NULL : &tcg_ctx.temps[a];
>>   }
> 
> It doesn't look like many callers of arg_temp are able to deal with a
> NULL return, and they may well immediately deref the value. Are we sure
> the cases where TCG_CALL_DUMMY_ARG is involved are narrowly defined?

They only appear as arguments to a call opcode.


r~


* Re: [Qemu-devel] [PATCH 09/16] tcg: Use per-temp state data in liveness
  2017-06-27  8:57   ` Alex Bennée
@ 2017-06-27 16:39     ` Richard Henderson
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-06-27 16:39 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, aurelien

On 06/27/2017 01:57 AM, Alex Bennée wrote:
>> -            /* Note this unsigned test catches TCG_CALL_ARG_DUMMY too.  */
>> -            if (arg < nb_globals) {
> 
> This test is gone but....
> 
>> -                dir = dir_temps[arg];
>> -                if (dir != 0 && temp_state[arg] == TS_DEAD) {
>> -                    TCGTemp *its = arg_temp(arg);
>> -                    TCGOpcode lopc = (its->type == TCG_TYPE_I32
>> -                                      ? INDEX_op_ld_i32
>> -                                      : INDEX_op_ld_i64);
>> -                    TCGOp *lop = tcg_op_insert_before(s, op, lopc, 3);
>> -
>> -                    lop->args[0] = dir;
>> -                    lop->args[1] = temp_arg(its->mem_base);
>> -                    lop->args[2] = its->mem_offset;
>> -
>> -                    /* Loaded, but synced with memory.  */
>> -                    temp_state[arg] = TS_MEM;
>> -                }
>> +            arg_ts = arg_temp(op->args[i]);
>> +            dir_ts = arg_ts->state_ptr;
>> +            if (dir_ts && arg_ts->state == TS_DEAD) {
> 
> ...we de-ref arg_ts here. So what if it was a TCG_CALL_DUMMY_ARG?

Good catch.  I need to do more testing on a host that actually uses this padding...
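
The difference between the old and new tests can be sketched in isolation. This is an illustration under stated assumptions, not the actual fix: `DemoArg`, `DemoTemp`, `DEMO_DUMMY_ARG` and the `demo_*` functions are hypothetical, and the sentinel is assumed to be the all-ones value, which is consistent with the removed comment saying the unsigned test catches it. Only the shape of `demo_arg_temp` follows the `arg_temp` change in patch 07.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t DemoArg;              /* stand-in for TCGArg */
#define DEMO_DUMMY_ARG ((DemoArg)-1)    /* assumed sentinel: all bits set */

typedef struct DemoTemp {
    struct DemoTemp *state_ptr;         /* stand-in for the per-temp link */
} DemoTemp;

enum { DEMO_NB_TEMPS = 8 };
static DemoTemp demo_temps[DEMO_NB_TEMPS];

/* Same shape as arg_temp() after patch 07: the sentinel maps to NULL.  */
static DemoTemp *demo_arg_temp(DemoArg a)
{
    return a == DEMO_DUMMY_ARG ? NULL : &demo_temps[a];
}

/* Old style: the unsigned comparison rejects the sentinel for free,
   because (DemoArg)-1 is larger than any valid index.  */
static bool demo_old_is_candidate(DemoArg a, size_t nb_globals)
{
    return a < nb_globals;
}

/* New style: with the index test gone, the NULL case has to be
   checked explicitly before any field is read.  */
static bool demo_new_is_candidate(DemoArg a)
{
    DemoTemp *ts = demo_arg_temp(a);
    return ts != NULL && ts->state_ptr != NULL;
}
```

Without the `ts != NULL` check, `demo_new_is_candidate(DEMO_DUMMY_ARG)` would dereference a null pointer, which is the problem pointed out above.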


r~


* Re: [Qemu-devel] [PATCH 13/16] tcg: Export temp_idx
  2017-06-27  9:46   ` Alex Bennée
@ 2017-06-27 16:43     ` Richard Henderson
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2017-06-27 16:43 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, aurelien

On 06/27/2017 02:46 AM, Alex Bennée wrote:
> 
> Richard Henderson <rth@twiddle.net> writes:
> 
>> At the same time, drop the TCGContext argument and use tcg_ctx instead.
>>
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> ---
>>   tcg/tcg.c | 15 ++++-----------
>>   tcg/tcg.h |  7 ++++++-
>>   2 files changed, 10 insertions(+), 12 deletions(-)
>>
>> diff --git a/tcg/tcg.c b/tcg/tcg.c
>> index f8d96fa..26931a7 100644
>> --- a/tcg/tcg.c
>> +++ b/tcg/tcg.c
>> @@ -473,13 +473,6 @@ void tcg_func_start(TCGContext *s)
>>       s->be = tcg_malloc(sizeof(TCGBackendData));
>>   }
>>
>> -static inline int temp_idx(TCGContext *s, TCGTemp *ts)
>> -{
>> -    ptrdiff_t n = ts - s->temps;
>> -    tcg_debug_assert(n >= 0 && n < s->nb_temps);
>> -    return n;
>> -}
>> -
>>   static inline TCGTemp *tcg_temp_alloc(TCGContext *s)
>>   {
>>       int n = s->nb_temps++;
>> @@ -516,7 +509,7 @@ static int tcg_global_reg_new_internal(TCGContext *s, TCGType type,
>>       ts->name = name;
>>       tcg_regset_set_reg(s->reserved_regs, reg);
>>
>> -    return temp_idx(s, ts);
>> +    return temp_idx(ts);
>>   }
>>
>>   void tcg_set_frame(TCGContext *s, TCGReg reg, intptr_t start, intptr_t size)
>> @@ -605,7 +598,7 @@ int tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
>>           ts->mem_offset = offset;
>>           ts->name = name;
>>       }
>> -    return temp_idx(s, ts);
>> +    return temp_idx(ts);
>>   }
>>
>>   static int tcg_temp_new_internal(TCGType type, int temp_local)
>> @@ -645,7 +638,7 @@ static int tcg_temp_new_internal(TCGType type, int temp_local)
>>               ts->temp_allocated = 1;
>>               ts->temp_local = temp_local;
>>           }
>> -        idx = temp_idx(s, ts);
>> +        idx = temp_idx(ts);
>>       }
>>
>>   #if defined(CONFIG_DEBUG_TCG)
>> @@ -963,7 +956,7 @@ static void tcg_reg_alloc_start(TCGContext *s)
>>   static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
>>                                    TCGTemp *ts)
>>   {
>> -    int idx = temp_idx(s, ts);
>> +    int idx = temp_idx(ts);
>>
>>       if (ts->temp_global) {
>>           pstrcpy(buf, buf_size, ts->name);
>> diff --git a/tcg/tcg.h b/tcg/tcg.h
>> index 4f69d0c..b75a745 100644
>> --- a/tcg/tcg.h
>> +++ b/tcg/tcg.h
>> @@ -733,13 +733,18 @@ struct TCGContext {
>>   extern TCGContext tcg_ctx;
>>   extern bool parallel_cpus;
>>
>> -static inline TCGArg temp_arg(TCGTemp *ts)
>> +static inline size_t temp_idx(TCGTemp *ts)
>>   {
>>       ptrdiff_t n = ts - tcg_ctx.temps;
>>       tcg_debug_assert(n >= 0 && n < tcg_ctx.nb_temps);
>>       return n;
>>   }
>>
>> +static inline TCGArg temp_arg(TCGTemp *ts)
>> +{
>> +    return temp_idx(ts);
>> +}
> 
> I'm confused at the dropping of TCGArg in favour of size_t only for
> temp_arg to implicitly cast it back. Was this meant to be part of
> another patch?

It was meant to keep the types logical.  When talking about an "arg" use 
TCGArg; when talking about an index (or idx) use size_t (or quite often int, 
where this hasn't been cleaned up).

You'll see from the last patch that temp_arg no longer calls temp_idx.
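
As a rough illustration of why the two types are kept apart, the sketch below shows both encodings side by side. The names `DemoArg`, `DemoTemp` and the `demo_*` helpers are hypothetical; the index-based pair mirrors the `temp_idx`/`temp_arg` code quoted above, while the pointer-based pair is only an assumption about the direction described in the cover letter (storing the pointer to the temp in the arg), not the actual code of the last patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t DemoArg;   /* stand-in for TCGArg: an opaque op operand */

typedef struct DemoTemp {
    int placeholder;         /* contents are irrelevant to the sketch */
} DemoTemp;

enum { DEMO_NB_TEMPS = 4 };
static DemoTemp demo_temps[DEMO_NB_TEMPS];

/* An index is a position in the temps array, hence size_t.  */
static size_t demo_temp_idx(const DemoTemp *ts)
{
    ptrdiff_t n = ts - demo_temps;
    assert(n >= 0 && n < DEMO_NB_TEMPS);
    return (size_t)n;
}

/* Encoding A: the arg happens to hold the index.  */
static DemoArg demo_temp_arg_by_index(const DemoTemp *ts)
{
    return demo_temp_idx(ts);
}

static DemoTemp *demo_arg_temp_by_index(DemoArg a)
{
    return &demo_temps[a];
}

/* Encoding B: the arg holds the pointer itself; no index is involved,
   so converting in either direction needs no arithmetic.  */
static DemoArg demo_temp_arg_by_pointer(DemoTemp *ts)
{
    return (DemoArg)ts;
}

static DemoTemp *demo_arg_temp_by_pointer(DemoArg a)
{
    return (DemoTemp *)a;
}
```

Because callers only ever go through the arg/temp conversion helpers, switching from encoding A to encoding B does not change their code, while anything that genuinely needs a position in the array keeps using the index helper.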


r~


* Re: [Qemu-devel] [PATCH 06/16] tcg: Add temp_global bit to TCGTemp
  2017-06-27 16:17     ` Richard Henderson
@ 2017-06-28  8:52       ` Alex Bennée
  0 siblings, 0 replies; 40+ messages in thread
From: Alex Bennée @ 2017-06-28  8:52 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien


Richard Henderson <rth@twiddle.net> writes:

> On 06/27/2017 01:39 AM, Alex Bennée wrote:
>>> +    /* If true, the temp is saved across both basic blocks and
>>> +       translation blocks.  */
>>> +    unsigned int temp_global:1;
>>> +    /* If true, the temp is saved across basic blocks but dead
>>> +       at the end of translation blocks.  If false, the temp is
>>> +       dead at the end of basic blocks.  */
>>> +    unsigned int temp_local:1;
>>> +    unsigned int temp_allocated:1;
>>
>> This is where my knowledge of the TCG internals gets slightly confused.
>> As far as I'm aware all our TranslationBlocks are Basic Blocks - they
>> don't have any branches until the end of the block. What is the
>> distinction here?
>>
>> Is a temp_global truly global? I thought the guest state was fully
>> rectified by the time we leave the basic block.
>
> TranslationBlocks are not basic blocks.  They normally stop at
> branches in the target instruction stream, but they certainly may have
> many branches in the tcg opcode stream (brcond and the like).
> Consider, for instance, our implementation of arm32's conditional
> instructions.

Right. Re-reading the tcg/README, it does make the distinction, but the
term "basic block" is overloaded: it means different things depending on
whether one is talking about the guest instructions or the generated code.

>
> Beyond that, I agree the language is confusing.
>
> A temp_global is created by tcg_global_mem_new_*, generally represents
> a cpu register, and is synced back to a slot in ENV.
>
> A temp_local is created by tcg_temp_local_new_*, and is synced to a
> slot in the local stack frame.

The language from the README:

  A TCG "temporary" is a variable only live in a basic
  block. Temporaries are allocated explicitly in each function.

  A TCG "local temporary" is a variable only live in a function. Local
  temporaries are allocated explicitly in each function.

I must admit I hadn't quite understood the distinction. In the ARM code
the only place where tcg_temp_local_new() is used instead of tcg_temp_new()
is in the ld/strex code. I guess that is because you need to preserve its
value over a potential TCG branch?

I guess in translate-a64/gen_store_exclusive() the key lines are:

  TCGv_i64 addr = tcg_temp_local_new_i64();

  /* Copy input into a local temp so it is not trashed when the
   * basic block ends at the branch insn.
   */
  tcg_gen_mov_i64(addr, inaddr);
  tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_exclusive_addr, fail_label);

> Something without either is simply declared dead at the end of a basic
> block, and is a source of confusion to those writing new front-ends.

I'll see if I can come up with some improved wording to help new
developers in the future.

> Anyway, we already have all of these concepts.  The change is that
> before the patch the only way to identify a temp_global was to compare
> its index against tcg_ctx.nb_globals.

Indeed. As I have previously really only been a consumer of the TCG API
I'm taking the opportunity to learn more about the internals as the
vector work is likely to touch it ;-)

>
>
> r~


--
Alex Bennée


end of thread, other threads:[~2017-06-28  8:51 UTC | newest]

Thread overview: 40+ messages
2017-06-21  2:48 [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end Richard Henderson
2017-06-21  2:48 ` [Qemu-devel] [PATCH 01/16] tcg: Merge opcode arguments into TCGOp Richard Henderson
2017-06-26 14:44   ` Alex Bennée
2017-06-26 14:55     ` Richard Henderson
2017-06-21  2:48 ` [Qemu-devel] [PATCH 02/16] tcg: Propagate args to op->args in optimizer Richard Henderson
2017-06-26 14:53   ` Alex Bennée
2017-06-21  2:48 ` [Qemu-devel] [PATCH 03/16] tcg: Propagate args to op->args in tcg.c Richard Henderson
2017-06-26 15:02   ` Alex Bennée
2017-06-26 15:07     ` Richard Henderson
2017-06-21  2:48 ` [Qemu-devel] [PATCH 04/16] tcg: Propagate TCGOp down to allocators Richard Henderson
2017-06-26 15:08   ` Alex Bennée
2017-06-21  2:48 ` [Qemu-devel] [PATCH 05/16] tcg: Introduce arg_temp Richard Henderson
2017-06-26 16:37   ` Alex Bennée
2017-06-21  2:48 ` [Qemu-devel] [PATCH 06/16] tcg: Add temp_global bit to TCGTemp Richard Henderson
2017-06-27  8:39   ` Alex Bennée
2017-06-27 16:17     ` Richard Henderson
2017-06-28  8:52       ` Alex Bennée
2017-06-21  2:48 ` [Qemu-devel] [PATCH 07/16] tcg: Return NULL temp for TCG_CALL_DUMMY_ARG Richard Henderson
2017-06-27  8:47   ` Alex Bennée
2017-06-27 16:36     ` Richard Henderson
2017-06-21  2:48 ` [Qemu-devel] [PATCH 08/16] tcg: Introduce temp_arg Richard Henderson
2017-06-21  2:48 ` [Qemu-devel] [PATCH 09/16] tcg: Use per-temp state data in liveness Richard Henderson
2017-06-27  8:57   ` Alex Bennée
2017-06-27 16:39     ` Richard Henderson
2017-06-21  2:48 ` [Qemu-devel] [PATCH 10/16] tcg: Avoid loops against variable bounds Richard Henderson
2017-06-27  9:01   ` Alex Bennée
2017-06-21  2:48 ` [Qemu-devel] [PATCH 11/16] tcg: Change temp_allocate_frame arg to TCGTemp Richard Henderson
2017-06-21  2:48 ` [Qemu-devel] [PATCH 12/16] tcg: Remove unused TCG_CALL_DUMMY_TCGV Richard Henderson
2017-06-27  9:42   ` Alex Bennée
2017-06-21  2:48 ` [Qemu-devel] [PATCH 13/16] tcg: Export temp_idx Richard Henderson
2017-06-27  9:46   ` Alex Bennée
2017-06-27 16:43     ` Richard Henderson
2017-06-21  2:48 ` [Qemu-devel] [PATCH 14/16] tcg: Use per-temp state data in optimize Richard Henderson
2017-06-27  9:59   ` Alex Bennée
2017-06-21  2:48 ` [Qemu-devel] [PATCH 15/16] tcg: Define separate structures for TCGv_* Richard Henderson
2017-06-21  2:48 ` [Qemu-devel] [PATCH 16/16] tcg: Store pointers to temporaries directly in TCGArg Richard Henderson
2017-06-21  3:43 ` [Qemu-devel] [PATCH 00/16] Cleanups within TCG middle-end no-reply
2017-06-26 16:49 ` Alex Bennée
2017-06-26 17:47   ` Richard Henderson
2017-06-26 19:19     ` Alex Bennée
