* [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer
@ 2015-07-24 16:30 Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 01/10] tcg/optimize: optimize temps tracking Aurelien Jarno
` (9 more replies)
0 siblings, 10 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson
This patchset improves the optimizer in 3 different ways:
- by optimizing temp tracking using a bit array
- by allowing constants to have copies
- by distinguishing 32 <-> 64-bit conversions from moves in the
frontend, using dedicated instructions
The latter change introduces 2 new mandatory ext/extu_i32_i64 ops.
In the future we might want to allow each backend to implement only 2 of
the 3 size-changing ops, as I started to do in the following patchset:
http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg03369.html
That said, we probably want to experiment with that first and measure the
impact on performance. If wrongly implemented, things might appear to
work while bugs surface only in very rare cases, like the truncation
problem we recently hit on aarch64 and x86-64 hosts.
Alternatively, we might want to always implement all 3 ops on every
backend, to avoid multiplying code variants and, with them, bugs.
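For clarity, the intended semantics of the two new mandatory ops are a plain sign/zero extension from 32 to 64 bits; a minimal C sketch of the semantics (illustrative only, not the actual TCG implementation):

```c
#include <stdint.h>

/* ext_i32_i64: sign-extend a 32-bit value to 64 bits. */
static int64_t ext_i32_i64(int32_t x)
{
    return (int64_t)x;   /* high 32 bits replicate the sign bit */
}

/* extu_i32_i64: zero-extend a 32-bit value to 64 bits. */
static uint64_t extu_i32_i64(uint32_t x)
{
    return (uint64_t)x;  /* high 32 bits are cleared */
}
```

Making these explicit ops, rather than reusing a plain mov, is what lets the optimizer reason about the high 32 bits of the result.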
This patchset has been fully tested on x86-64 host, but only lightly
tested on aarch64, ppc64 and s390x hosts.
Aurelien Jarno (10):
tcg/optimize: optimize temps tracking
tcg/optimize: add temp_is_const and temp_is_copy functions
tcg/optimize: track const/copy status separately
tcg/optimize: allow constant to have copies
tcg: rename trunc_shr_i32 into trunc_shr_i64_i32
tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32
tcg: implement real ext_i32_i64 and extu_i32_i64 ops
tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 ops
tcg/optimize: do not remember garbage high bits for 32-bit ops
tcg: update README about size changing ops
tcg/README | 20 +++-
tcg/aarch64/tcg-target.c | 4 +
tcg/aarch64/tcg-target.h | 2 +-
tcg/i386/tcg-target.c | 5 +
tcg/i386/tcg-target.h | 2 +-
tcg/ia64/tcg-target.c | 4 +
tcg/ia64/tcg-target.h | 2 +-
tcg/optimize.c | 246 ++++++++++++++++++++++-------------------------
tcg/ppc/tcg-target.c | 6 ++
tcg/ppc/tcg-target.h | 2 +-
tcg/s390/tcg-target.c | 5 +
tcg/s390/tcg-target.h | 2 +-
tcg/sparc/tcg-target.c | 12 ++-
tcg/sparc/tcg-target.h | 2 +-
tcg/tcg-op.c | 16 ++-
tcg/tcg-opc.h | 7 +-
tcg/tcg.h | 2 +-
tcg/tci/tcg-target.c | 4 +
tcg/tci/tcg-target.h | 2 +-
tci.c | 6 +-
20 files changed, 193 insertions(+), 158 deletions(-)
--
2.1.4
^ permalink raw reply [flat|nested] 30+ messages in thread
* [Qemu-devel] [PATCH for-2.5 01/10] tcg/optimize: optimize temps tracking
2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
2015-07-27 8:21 ` Paolo Bonzini
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 02/10] tcg/optimize: add temp_is_const and temp_is_copy functions Aurelien Jarno
` (8 subsequent siblings)
9 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson
The tcg_temp_info structure uses 24 bytes per temp. Now that we emulate
vector registers on most guests, it's not uncommon to have more than 100
temps in use. This means we have to initialize more than 2kB at least
twice per TB, and often more when there are a few goto_tb.
Instead, use a TCGTempSet bit array to track which temps are in use in
the current basic block. This way only around 16 bytes need to be
initialized.
This improves the boot time of a MIPS guest on an x86-64 host by around
7% and moves tcg_optimize out of the top of the profiler list.
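The lazy-initialization pattern the patch introduces can be sketched outside of QEMU as follows; TCGTempSet, test_bit and set_bit are QEMU helpers, open-coded here for illustration:

```c
#include <limits.h>
#include <string.h>

#define MAX_TEMPS 512
#define BITS_PER_LONG (CHAR_BIT * sizeof(long))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

static struct { int state; long mask; } temps[MAX_TEMPS];
static unsigned long temps_used[BITS_TO_LONGS(MAX_TEMPS)];

/* Resetting now clears one bit per temp (a few words in total)
   instead of a full 24-byte record per temp. */
static void reset_all_temps(int nb_temps)
{
    memset(temps_used, 0, sizeof(long) * BITS_TO_LONGS(nb_temps));
}

/* Initialize a temp lazily, the first time an op references it. */
static void init_temp_info(int temp)
{
    unsigned long bit = 1UL << (temp % BITS_PER_LONG);
    if (!(temps_used[temp / BITS_PER_LONG] & bit)) {
        temps[temp].state = 0;   /* TCG_TEMP_UNDEF */
        temps[temp].mask = -1;
        temps_used[temp / BITS_PER_LONG] |= bit;
    }
}
```

The win is that temps never touched by a basic block cost nothing beyond their bit in the array.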
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/optimize.c | 32 ++++++++++++++++++++++----------
1 file changed, 22 insertions(+), 10 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index cd0e793..20e24b3 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -50,6 +50,7 @@ struct tcg_temp_info {
};
static struct tcg_temp_info temps[TCG_MAX_TEMPS];
+static TCGTempSet temps_used;
/* Reset TEMP's state to TCG_TEMP_UNDEF. If TEMP only had one copy, remove
the copy flag from the left temp. */
@@ -67,6 +68,22 @@ static void reset_temp(TCGArg temp)
temps[temp].mask = -1;
}
+/* Reset all temporaries, given that there are NB_TEMPS of them. */
+static void reset_all_temps(int nb_temps)
+{
+ memset(&temps_used.l, 0, sizeof(long) * BITS_TO_LONGS(nb_temps));
+}
+
+/* Initialize and activate a temporary. */
+static void init_temp_info(TCGArg temp)
+{
+ if (!test_bit(temp, temps_used.l)) {
+ temps[temp].state = TCG_TEMP_UNDEF;
+ temps[temp].mask = -1;
+ set_bit(temp, temps_used.l);
+ }
+}
+
static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
TCGOpcode opc, int nargs)
{
@@ -98,16 +115,6 @@ static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
return new_op;
}
-/* Reset all temporaries, given that there are NB_TEMPS of them. */
-static void reset_all_temps(int nb_temps)
-{
- int i;
- for (i = 0; i < nb_temps; i++) {
- temps[i].state = TCG_TEMP_UNDEF;
- temps[i].mask = -1;
- }
-}
-
static int op_bits(TCGOpcode op)
{
const TCGOpDef *def = &tcg_op_defs[op];
@@ -606,6 +613,11 @@ void tcg_optimize(TCGContext *s)
nb_iargs = def->nb_iargs;
}
+ /* Initialize the temps that are going to be used */
+ for (i = 0; i < nb_oargs + nb_iargs; i++) {
+ init_temp_info(args[i]);
+ }
+
/* Do copy propagation */
for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
if (temps[args[i]].state == TCG_TEMP_COPY) {
--
2.1.4
* [Qemu-devel] [PATCH for-2.5 02/10] tcg/optimize: add temp_is_const and temp_is_copy functions
2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 01/10] tcg/optimize: optimize temps tracking Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
2015-07-29 16:01 ` Alex Bennée
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately Aurelien Jarno
` (7 subsequent siblings)
9 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson
Add two accessor functions, temp_is_const and temp_is_copy, to make the
code more readable and future changes easier.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/optimize.c | 131 ++++++++++++++++++++++++++-------------------------------
1 file changed, 60 insertions(+), 71 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 20e24b3..d2b63a4 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -52,11 +52,21 @@ struct tcg_temp_info {
static struct tcg_temp_info temps[TCG_MAX_TEMPS];
static TCGTempSet temps_used;
+static inline bool temp_is_const(TCGArg arg)
+{
+ return temps[arg].state == TCG_TEMP_CONST;
+}
+
+static inline bool temp_is_copy(TCGArg arg)
+{
+ return temps[arg].state == TCG_TEMP_COPY;
+}
+
/* Reset TEMP's state to TCG_TEMP_UNDEF. If TEMP only had one copy, remove
the copy flag from the left temp. */
static void reset_temp(TCGArg temp)
{
- if (temps[temp].state == TCG_TEMP_COPY) {
+ if (temp_is_copy(temp)) {
if (temps[temp].prev_copy == temps[temp].next_copy) {
temps[temps[temp].next_copy].state = TCG_TEMP_UNDEF;
} else {
@@ -186,8 +196,7 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
return true;
}
- if (temps[arg1].state != TCG_TEMP_COPY
- || temps[arg2].state != TCG_TEMP_COPY) {
+ if (!temp_is_copy(arg1) || !temp_is_copy(arg2)) {
return false;
}
@@ -230,7 +239,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
return;
}
- if (temps[src].state == TCG_TEMP_CONST) {
+ if (temp_is_const(src)) {
tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
return;
}
@@ -248,10 +257,10 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
}
temps[dst].mask = mask;
- assert(temps[src].state != TCG_TEMP_CONST);
+ assert(!temp_is_const(src));
if (s->temps[src].type == s->temps[dst].type) {
- if (temps[src].state != TCG_TEMP_COPY) {
+ if (!temp_is_copy(src)) {
temps[src].state = TCG_TEMP_COPY;
temps[src].next_copy = src;
temps[src].prev_copy = src;
@@ -488,7 +497,7 @@ static bool do_constant_folding_cond_eq(TCGCond c)
static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
TCGArg y, TCGCond c)
{
- if (temps[x].state == TCG_TEMP_CONST && temps[y].state == TCG_TEMP_CONST) {
+ if (temp_is_const(x) && temp_is_const(y)) {
switch (op_bits(op)) {
case 32:
return do_constant_folding_cond_32(temps[x].val, temps[y].val, c);
@@ -499,7 +508,7 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
}
} else if (temps_are_copies(x, y)) {
return do_constant_folding_cond_eq(c);
- } else if (temps[y].state == TCG_TEMP_CONST && temps[y].val == 0) {
+ } else if (temp_is_const(y) && temps[y].val == 0) {
switch (c) {
case TCG_COND_LTU:
return 0;
@@ -520,12 +529,10 @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
TCGArg al = p1[0], ah = p1[1];
TCGArg bl = p2[0], bh = p2[1];
- if (temps[bl].state == TCG_TEMP_CONST
- && temps[bh].state == TCG_TEMP_CONST) {
+ if (temp_is_const(bl) && temp_is_const(bh)) {
uint64_t b = ((uint64_t)temps[bh].val << 32) | (uint32_t)temps[bl].val;
- if (temps[al].state == TCG_TEMP_CONST
- && temps[ah].state == TCG_TEMP_CONST) {
+ if (temp_is_const(al) && temp_is_const(ah)) {
uint64_t a;
a = ((uint64_t)temps[ah].val << 32) | (uint32_t)temps[al].val;
return do_constant_folding_cond_64(a, b, c);
@@ -551,8 +558,8 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
{
TCGArg a1 = *p1, a2 = *p2;
int sum = 0;
- sum += temps[a1].state == TCG_TEMP_CONST;
- sum -= temps[a2].state == TCG_TEMP_CONST;
+ sum += temp_is_const(a1);
+ sum -= temp_is_const(a2);
/* Prefer the constant in second argument, and then the form
op a, a, b, which is better handled on non-RISC hosts. */
@@ -567,10 +574,10 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
{
int sum = 0;
- sum += temps[p1[0]].state == TCG_TEMP_CONST;
- sum += temps[p1[1]].state == TCG_TEMP_CONST;
- sum -= temps[p2[0]].state == TCG_TEMP_CONST;
- sum -= temps[p2[1]].state == TCG_TEMP_CONST;
+ sum += temp_is_const(p1[0]);
+ sum += temp_is_const(p1[1]);
+ sum -= temp_is_const(p2[0]);
+ sum -= temp_is_const(p2[1]);
if (sum > 0) {
TCGArg t;
t = p1[0], p1[0] = p2[0], p2[0] = t;
@@ -620,7 +627,7 @@ void tcg_optimize(TCGContext *s)
/* Do copy propagation */
for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
- if (temps[args[i]].state == TCG_TEMP_COPY) {
+ if (temp_is_copy(args[i])) {
args[i] = find_better_copy(s, args[i]);
}
}
@@ -690,8 +697,7 @@ void tcg_optimize(TCGContext *s)
CASE_OP_32_64(sar):
CASE_OP_32_64(rotl):
CASE_OP_32_64(rotr):
- if (temps[args[1]].state == TCG_TEMP_CONST
- && temps[args[1]].val == 0) {
+ if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
tcg_opt_gen_movi(s, op, args, args[0], 0);
continue;
}
@@ -701,7 +707,7 @@ void tcg_optimize(TCGContext *s)
TCGOpcode neg_op;
bool have_neg;
- if (temps[args[2]].state == TCG_TEMP_CONST) {
+ if (temp_is_const(args[2])) {
/* Proceed with possible constant folding. */
break;
}
@@ -715,8 +721,7 @@ void tcg_optimize(TCGContext *s)
if (!have_neg) {
break;
}
- if (temps[args[1]].state == TCG_TEMP_CONST
- && temps[args[1]].val == 0) {
+ if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
op->opc = neg_op;
reset_temp(args[0]);
args[1] = args[2];
@@ -726,34 +731,30 @@ void tcg_optimize(TCGContext *s)
break;
CASE_OP_32_64(xor):
CASE_OP_32_64(nand):
- if (temps[args[1]].state != TCG_TEMP_CONST
- && temps[args[2]].state == TCG_TEMP_CONST
- && temps[args[2]].val == -1) {
+ if (!temp_is_const(args[1])
+ && temp_is_const(args[2]) && temps[args[2]].val == -1) {
i = 1;
goto try_not;
}
break;
CASE_OP_32_64(nor):
- if (temps[args[1]].state != TCG_TEMP_CONST
- && temps[args[2]].state == TCG_TEMP_CONST
- && temps[args[2]].val == 0) {
+ if (!temp_is_const(args[1])
+ && temp_is_const(args[2]) && temps[args[2]].val == 0) {
i = 1;
goto try_not;
}
break;
CASE_OP_32_64(andc):
- if (temps[args[2]].state != TCG_TEMP_CONST
- && temps[args[1]].state == TCG_TEMP_CONST
- && temps[args[1]].val == -1) {
+ if (!temp_is_const(args[2])
+ && temp_is_const(args[1]) && temps[args[1]].val == -1) {
i = 2;
goto try_not;
}
break;
CASE_OP_32_64(orc):
CASE_OP_32_64(eqv):
- if (temps[args[2]].state != TCG_TEMP_CONST
- && temps[args[1]].state == TCG_TEMP_CONST
- && temps[args[1]].val == 0) {
+ if (!temp_is_const(args[2])
+ && temp_is_const(args[1]) && temps[args[1]].val == 0) {
i = 2;
goto try_not;
}
@@ -794,9 +795,8 @@ void tcg_optimize(TCGContext *s)
CASE_OP_32_64(or):
CASE_OP_32_64(xor):
CASE_OP_32_64(andc):
- if (temps[args[1]].state != TCG_TEMP_CONST
- && temps[args[2]].state == TCG_TEMP_CONST
- && temps[args[2]].val == 0) {
+ if (!temp_is_const(args[1])
+ && temp_is_const(args[2]) && temps[args[2]].val == 0) {
tcg_opt_gen_mov(s, op, args, args[0], args[1]);
continue;
}
@@ -804,9 +804,8 @@ void tcg_optimize(TCGContext *s)
CASE_OP_32_64(and):
CASE_OP_32_64(orc):
CASE_OP_32_64(eqv):
- if (temps[args[1]].state != TCG_TEMP_CONST
- && temps[args[2]].state == TCG_TEMP_CONST
- && temps[args[2]].val == -1) {
+ if (!temp_is_const(args[1])
+ && temp_is_const(args[2]) && temps[args[2]].val == -1) {
tcg_opt_gen_mov(s, op, args, args[0], args[1]);
continue;
}
@@ -844,7 +843,7 @@ void tcg_optimize(TCGContext *s)
CASE_OP_32_64(and):
mask = temps[args[2]].mask;
- if (temps[args[2]].state == TCG_TEMP_CONST) {
+ if (temp_is_const(args[2])) {
and_const:
affected = temps[args[1]].mask & ~mask;
}
@@ -854,7 +853,7 @@ void tcg_optimize(TCGContext *s)
CASE_OP_32_64(andc):
/* Known-zeros does not imply known-ones. Therefore unless
args[2] is constant, we can't infer anything from it. */
- if (temps[args[2]].state == TCG_TEMP_CONST) {
+ if (temp_is_const(args[2])) {
mask = ~temps[args[2]].mask;
goto and_const;
}
@@ -863,26 +862,26 @@ void tcg_optimize(TCGContext *s)
break;
case INDEX_op_sar_i32:
- if (temps[args[2]].state == TCG_TEMP_CONST) {
+ if (temp_is_const(args[2])) {
tmp = temps[args[2]].val & 31;
mask = (int32_t)temps[args[1]].mask >> tmp;
}
break;
case INDEX_op_sar_i64:
- if (temps[args[2]].state == TCG_TEMP_CONST) {
+ if (temp_is_const(args[2])) {
tmp = temps[args[2]].val & 63;
mask = (int64_t)temps[args[1]].mask >> tmp;
}
break;
case INDEX_op_shr_i32:
- if (temps[args[2]].state == TCG_TEMP_CONST) {
+ if (temp_is_const(args[2])) {
tmp = temps[args[2]].val & 31;
mask = (uint32_t)temps[args[1]].mask >> tmp;
}
break;
case INDEX_op_shr_i64:
- if (temps[args[2]].state == TCG_TEMP_CONST) {
+ if (temp_is_const(args[2])) {
tmp = temps[args[2]].val & 63;
mask = (uint64_t)temps[args[1]].mask >> tmp;
}
@@ -893,7 +892,7 @@ void tcg_optimize(TCGContext *s)
break;
CASE_OP_32_64(shl):
- if (temps[args[2]].state == TCG_TEMP_CONST) {
+ if (temp_is_const(args[2])) {
tmp = temps[args[2]].val & (TCG_TARGET_REG_BITS - 1);
mask = temps[args[1]].mask << tmp;
}
@@ -974,8 +973,7 @@ void tcg_optimize(TCGContext *s)
CASE_OP_32_64(mul):
CASE_OP_32_64(muluh):
CASE_OP_32_64(mulsh):
- if ((temps[args[2]].state == TCG_TEMP_CONST
- && temps[args[2]].val == 0)) {
+ if ((temp_is_const(args[2]) && temps[args[2]].val == 0)) {
tcg_opt_gen_movi(s, op, args, args[0], 0);
continue;
}
@@ -1030,7 +1028,7 @@ void tcg_optimize(TCGContext *s)
CASE_OP_32_64(ext16u):
case INDEX_op_ext32s_i64:
case INDEX_op_ext32u_i64:
- if (temps[args[1]].state == TCG_TEMP_CONST) {
+ if (temp_is_const(args[1])) {
tmp = do_constant_folding(opc, temps[args[1]].val, 0);
tcg_opt_gen_movi(s, op, args, args[0], tmp);
break;
@@ -1038,7 +1036,7 @@ void tcg_optimize(TCGContext *s)
goto do_default;
case INDEX_op_trunc_shr_i32:
- if (temps[args[1]].state == TCG_TEMP_CONST) {
+ if (temp_is_const(args[1])) {
tmp = do_constant_folding(opc, temps[args[1]].val, args[2]);
tcg_opt_gen_movi(s, op, args, args[0], tmp);
break;
@@ -1067,8 +1065,7 @@ void tcg_optimize(TCGContext *s)
CASE_OP_32_64(divu):
CASE_OP_32_64(rem):
CASE_OP_32_64(remu):
- if (temps[args[1]].state == TCG_TEMP_CONST
- && temps[args[2]].state == TCG_TEMP_CONST) {
+ if (temp_is_const(args[1]) && temp_is_const(args[2])) {
tmp = do_constant_folding(opc, temps[args[1]].val,
temps[args[2]].val);
tcg_opt_gen_movi(s, op, args, args[0], tmp);
@@ -1077,8 +1074,7 @@ void tcg_optimize(TCGContext *s)
goto do_default;
CASE_OP_32_64(deposit):
- if (temps[args[1]].state == TCG_TEMP_CONST
- && temps[args[2]].state == TCG_TEMP_CONST) {
+ if (temp_is_const(args[1]) && temp_is_const(args[2])) {
tmp = deposit64(temps[args[1]].val, args[3], args[4],
temps[args[2]].val);
tcg_opt_gen_movi(s, op, args, args[0], tmp);
@@ -1118,10 +1114,8 @@ void tcg_optimize(TCGContext *s)
case INDEX_op_add2_i32:
case INDEX_op_sub2_i32:
- if (temps[args[2]].state == TCG_TEMP_CONST
- && temps[args[3]].state == TCG_TEMP_CONST
- && temps[args[4]].state == TCG_TEMP_CONST
- && temps[args[5]].state == TCG_TEMP_CONST) {
+ if (temp_is_const(args[2]) && temp_is_const(args[3])
+ && temp_is_const(args[4]) && temp_is_const(args[5])) {
uint32_t al = temps[args[2]].val;
uint32_t ah = temps[args[3]].val;
uint32_t bl = temps[args[4]].val;
@@ -1150,8 +1144,7 @@ void tcg_optimize(TCGContext *s)
goto do_default;
case INDEX_op_mulu2_i32:
- if (temps[args[2]].state == TCG_TEMP_CONST
- && temps[args[3]].state == TCG_TEMP_CONST) {
+ if (temp_is_const(args[2]) && temp_is_const(args[3])) {
uint32_t a = temps[args[2]].val;
uint32_t b = temps[args[3]].val;
uint64_t r = (uint64_t)a * b;
@@ -1183,10 +1176,8 @@ void tcg_optimize(TCGContext *s)
tcg_op_remove(s, op);
}
} else if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
- && temps[args[2]].state == TCG_TEMP_CONST
- && temps[args[3]].state == TCG_TEMP_CONST
- && temps[args[2]].val == 0
- && temps[args[3]].val == 0) {
+ && temp_is_const(args[2]) && temps[args[2]].val == 0
+ && temp_is_const(args[3]) && temps[args[3]].val == 0) {
/* Simplify LT/GE comparisons vs zero to a single compare
vs the high word of the input. */
do_brcond_high:
@@ -1248,10 +1239,8 @@ void tcg_optimize(TCGContext *s)
do_setcond_const:
tcg_opt_gen_movi(s, op, args, args[0], tmp);
} else if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE)
- && temps[args[3]].state == TCG_TEMP_CONST
- && temps[args[4]].state == TCG_TEMP_CONST
- && temps[args[3]].val == 0
- && temps[args[4]].val == 0) {
+ && temp_is_const(args[3]) && temps[args[3]].val == 0
+ && temp_is_const(args[4]) && temps[args[4]].val == 0) {
/* Simplify LT/GE comparisons vs zero to a single compare
vs the high word of the input. */
do_setcond_high:
--
2.1.4
* [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately
2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 01/10] tcg/optimize: optimize temps tracking Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 02/10] tcg/optimize: add temp_is_const and temp_is_copy functions Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
2015-07-27 8:25 ` Paolo Bonzini
2015-07-29 16:10 ` Alex Bennée
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies Aurelien Jarno
` (6 subsequent siblings)
9 siblings, 2 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson
Use two bools to track constants and copies instead of an enum.
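The point of the change is that an enum forces the "constant" and "copy" states to be mutually exclusive, while two independent flags let a temp be both at once, which the next patch in the series relies on. A minimal sketch of the new representation (simplified from the patch below):

```c
#include <stdbool.h>
#include <stdint.h>

struct tcg_temp_info {
    bool is_const;       /* replaces state == TCG_TEMP_CONST */
    bool is_copy;        /* replaces state == TCG_TEMP_COPY  */
    uint16_t prev_copy;
    uint16_t next_copy;
    uint64_t val;
};

/* The accessors from patch 02 now just read the flags. */
static bool temp_is_const(const struct tcg_temp_info *t) { return t->is_const; }
static bool temp_is_copy(const struct tcg_temp_info *t)  { return t->is_copy; }
```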
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/optimize.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index d2b63a4..f16eb1e 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -35,14 +35,9 @@
glue(glue(case INDEX_op_, x), _i32): \
glue(glue(case INDEX_op_, x), _i64)
-typedef enum {
- TCG_TEMP_UNDEF = 0,
- TCG_TEMP_CONST,
- TCG_TEMP_COPY,
-} tcg_temp_state;
-
struct tcg_temp_info {
- tcg_temp_state state;
+ bool is_const;
+ bool is_copy;
uint16_t prev_copy;
uint16_t next_copy;
tcg_target_ulong val;
@@ -54,12 +49,12 @@ static TCGTempSet temps_used;
static inline bool temp_is_const(TCGArg arg)
{
- return temps[arg].state == TCG_TEMP_CONST;
+ return temps[arg].is_const;
}
static inline bool temp_is_copy(TCGArg arg)
{
- return temps[arg].state == TCG_TEMP_COPY;
+ return temps[arg].is_copy;
}
/* Reset TEMP's state to TCG_TEMP_UNDEF. If TEMP only had one copy, remove
@@ -68,13 +63,14 @@ static void reset_temp(TCGArg temp)
{
if (temp_is_copy(temp)) {
if (temps[temp].prev_copy == temps[temp].next_copy) {
- temps[temps[temp].next_copy].state = TCG_TEMP_UNDEF;
+ temps[temps[temp].next_copy].is_copy = false;
} else {
temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
}
}
- temps[temp].state = TCG_TEMP_UNDEF;
+ temps[temp].is_const = false;
+ temps[temp].is_copy = false;
temps[temp].mask = -1;
}
@@ -88,7 +84,8 @@ static void reset_all_temps(int nb_temps)
static void init_temp_info(TCGArg temp)
{
if (!test_bit(temp, temps_used.l)) {
- temps[temp].state = TCG_TEMP_UNDEF;
+ temps[temp].is_const = false;
+ temps[temp].is_copy = false;
temps[temp].mask = -1;
set_bit(temp, temps_used.l);
}
@@ -218,7 +215,8 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
op->opc = new_op;
reset_temp(dst);
- temps[dst].state = TCG_TEMP_CONST;
+ temps[dst].is_const = true;
+ temps[dst].is_copy = false;
temps[dst].val = val;
mask = val;
if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_movi_i32) {
@@ -261,11 +259,13 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
if (s->temps[src].type == s->temps[dst].type) {
if (!temp_is_copy(src)) {
- temps[src].state = TCG_TEMP_COPY;
+ temps[src].is_const = false;
+ temps[src].is_copy = true;
temps[src].next_copy = src;
temps[src].prev_copy = src;
}
- temps[dst].state = TCG_TEMP_COPY;
+ temps[dst].is_const = false;
+ temps[dst].is_copy = true;
temps[dst].next_copy = temps[src].next_copy;
temps[dst].prev_copy = src;
temps[temps[dst].next_copy].prev_copy = dst;
--
2.1.4
* [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies
2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
` (2 preceding siblings ...)
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
2015-07-24 20:15 ` Richard Henderson
2015-07-29 16:12 ` Alex Bennée
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 05/10] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 Aurelien Jarno
` (5 subsequent siblings)
9 siblings, 2 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson
Now that copies and constants are tracked separately, we can allow
constants to have copies, deferring the choice between using a register
and a constant to the register allocation pass. This prevents this kind
of repeated constant reloading:
-OUT: [size=338]
+OUT: [size=298]
mov -0x4(%r14),%ebp
test %ebp,%ebp
jne 0x7ffbe9cb0ed6
mov $0x40002219f8,%rbp
mov %rbp,(%r14)
- mov $0x40002219f8,%rbp
mov $0x4000221a20,%rbx
mov %rbp,(%rbx)
mov $0x4000000000,%rbp
mov %rbp,(%r14)
- mov $0x4000000000,%rbp
mov $0x4000221d38,%rbx
mov %rbp,(%rbx)
mov $0x40002221a8,%rbp
mov %rbp,(%r14)
- mov $0x40002221a8,%rbp
mov $0x4000221d40,%rbx
mov %rbp,(%rbx)
mov $0x4000019170,%rbp
mov %rbp,(%r14)
- mov $0x4000019170,%rbp
mov $0x4000221d48,%rbx
mov %rbp,(%rbx)
mov $0x40000049ee,%rbp
mov %rbp,0x80(%r14)
mov %r14,%rdi
callq 0x7ffbe99924d0
mov $0x4000001680,%rbp
mov %rbp,0x30(%r14)
mov 0x10(%r14),%rbp
mov $0x4000001680,%rbp
mov %rbp,0x30(%r14)
mov 0x10(%r14),%rbp
shl $0x20,%rbp
mov (%r14),%rbx
mov %ebx,%ebx
mov %rbx,(%r14)
or %rbx,%rbp
mov %rbp,0x10(%r14)
mov %rbp,0x90(%r14)
mov 0x60(%r14),%rbx
mov %rbx,0x38(%r14)
mov 0x28(%r14),%rbx
mov $0x4000220e60,%r12
mov %rbx,(%r12)
mov $0x40002219c8,%rbx
mov %rbp,(%rbx)
mov 0x20(%r14),%rbp
sub $0x8,%rbp
mov $0x4000004a16,%rbx
mov %rbx,0x0(%rbp)
mov %rbp,0x20(%r14)
mov $0x19,%ebp
mov %ebp,0xa8(%r14)
mov $0x4000015110,%rbp
mov %rbp,0x80(%r14)
xor %eax,%eax
jmpq 0x7ffbebcae426
lea -0x5f6d72a(%rip),%rax # 0x7ffbe3d437b3
jmpq 0x7ffbebcae426
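The mechanism behind the diff below can be sketched as follows: a mov now links the destination into the source's circular copy list while propagating the constant status and value, instead of degrading to a movi. This is a simplified, self-contained illustration, not the QEMU code itself:

```c
#include <stdbool.h>
#include <stdint.h>

#define NB 16

static struct {
    bool is_const, is_copy;
    uint16_t prev_copy, next_copy;
    uint64_t val;
} temps[NB];

/* Record "dst = mov src": dst joins src's circular copy list and
   inherits src's constant status/value, mirroring what
   tcg_opt_gen_mov does after this change. */
static void record_mov(uint16_t dst, uint16_t src)
{
    if (!temps[src].is_copy) {
        /* src starts its own one-element ring. */
        temps[src].is_copy = true;
        temps[src].next_copy = src;
        temps[src].prev_copy = src;
    }
    temps[dst].is_const = temps[src].is_const;
    temps[dst].is_copy = true;
    temps[dst].val = temps[src].val;
    temps[dst].next_copy = temps[src].next_copy;
    temps[dst].prev_copy = src;
    temps[temps[dst].next_copy].prev_copy = dst;
    temps[src].next_copy = dst;
}
```

Because the constant value travels with the copy list, the register allocator can later pick whichever of the register or the immediate is cheaper.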
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/optimize.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index f16eb1e..48103b2 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -237,11 +237,6 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
return;
}
- if (temp_is_const(src)) {
- tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
- return;
- }
-
TCGOpcode new_op = op_to_mov(op->opc);
tcg_target_ulong mask;
@@ -255,17 +250,15 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
}
temps[dst].mask = mask;
- assert(!temp_is_const(src));
-
if (s->temps[src].type == s->temps[dst].type) {
if (!temp_is_copy(src)) {
- temps[src].is_const = false;
temps[src].is_copy = true;
temps[src].next_copy = src;
temps[src].prev_copy = src;
}
- temps[dst].is_const = false;
+ temps[dst].is_const = temps[src].is_const;
temps[dst].is_copy = true;
+ temps[dst].val = temps[src].val;
temps[dst].next_copy = temps[src].next_copy;
temps[dst].prev_copy = src;
temps[temps[dst].next_copy].prev_copy = dst;
--
2.1.4
* [Qemu-devel] [PATCH for-2.5 05/10] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32
2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
` (3 preceding siblings ...)
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
2015-07-31 6:31 ` Alex Bennée
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 06/10] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32 Aurelien Jarno
` (4 subsequent siblings)
9 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson
The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32,
and the name in the README doesn't match the name offered to the
frontends.
Always use the long name to make it clear it is a size-changing op.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/README | 2 +-
tcg/aarch64/tcg-target.h | 2 +-
tcg/i386/tcg-target.h | 2 +-
tcg/ia64/tcg-target.h | 2 +-
tcg/optimize.c | 6 +++---
tcg/ppc/tcg-target.h | 2 +-
tcg/s390/tcg-target.h | 2 +-
tcg/sparc/tcg-target.c | 4 ++--
tcg/sparc/tcg-target.h | 2 +-
tcg/tcg-op.c | 4 ++--
tcg/tcg-opc.h | 4 ++--
tcg/tcg.h | 2 +-
tcg/tci/tcg-target.h | 2 +-
13 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/tcg/README b/tcg/README
index a550ff1..61b3899 100644
--- a/tcg/README
+++ b/tcg/README
@@ -314,7 +314,7 @@ This operation would be equivalent to
dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
-* trunc_shr_i32 t0, t1, pos
+* trunc_shr_i64_i32 t0, t1, pos
For 64-bit hosts only, right shift the 64-bit input T1 by POS and
truncate to 32-bit output T0. Depending on the host, this may be
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 8aec04d..dfd8801 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -70,7 +70,7 @@ typedef enum {
#define TCG_TARGET_HAS_muls2_i32 0
#define TCG_TARGET_HAS_muluh_i32 0
#define TCG_TARGET_HAS_mulsh_i32 0
-#define TCG_TARGET_HAS_trunc_shr_i32 0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
#define TCG_TARGET_HAS_div_i64 1
#define TCG_TARGET_HAS_rem_i64 1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 25b5133..dae50ba 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -102,7 +102,7 @@ extern bool have_bmi1;
#define TCG_TARGET_HAS_mulsh_i32 0
#if TCG_TARGET_REG_BITS == 64
-#define TCG_TARGET_HAS_trunc_shr_i32 0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
#define TCG_TARGET_HAS_div2_i64 1
#define TCG_TARGET_HAS_rot_i64 1
#define TCG_TARGET_HAS_ext8s_i64 1
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index a04ed81..29902f9 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -160,7 +160,7 @@ typedef enum {
#define TCG_TARGET_HAS_muluh_i64 0
#define TCG_TARGET_HAS_mulsh_i32 0
#define TCG_TARGET_HAS_mulsh_i64 0
-#define TCG_TARGET_HAS_trunc_shr_i32 0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
#define TCG_TARGET_deposit_i32_valid(ofs, len) ((len) <= 16)
#define TCG_TARGET_deposit_i64_valid(ofs, len) ((len) <= 16)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 48103b2..56e0a17 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -301,7 +301,7 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
case INDEX_op_shr_i32:
return (uint32_t)x >> (y & 31);
- case INDEX_op_trunc_shr_i32:
+ case INDEX_op_trunc_shr_i64_i32:
case INDEX_op_shr_i64:
return (uint64_t)x >> (y & 63);
@@ -880,7 +880,7 @@ void tcg_optimize(TCGContext *s)
}
break;
- case INDEX_op_trunc_shr_i32:
+ case INDEX_op_trunc_shr_i64_i32:
mask = (uint64_t)temps[args[1]].mask >> args[2];
break;
@@ -1028,7 +1028,7 @@ void tcg_optimize(TCGContext *s)
}
goto do_default;
- case INDEX_op_trunc_shr_i32:
+ case INDEX_op_trunc_shr_i64_i32:
if (temp_is_const(args[1])) {
tmp = do_constant_folding(opc, temps[args[1]].val, args[2]);
tcg_opt_gen_movi(s, op, args, args[0], tmp);
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 7ce7048..b7e6861 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -77,7 +77,7 @@ typedef enum {
#if TCG_TARGET_REG_BITS == 64
#define TCG_TARGET_HAS_add2_i32 0
#define TCG_TARGET_HAS_sub2_i32 0
-#define TCG_TARGET_HAS_trunc_shr_i32 0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
#define TCG_TARGET_HAS_div_i64 1
#define TCG_TARGET_HAS_rem_i64 0
#define TCG_TARGET_HAS_rot_i64 1
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 91576d5..50016a8 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -72,7 +72,7 @@ typedef enum TCGReg {
#define TCG_TARGET_HAS_muls2_i32 0
#define TCG_TARGET_HAS_muluh_i32 0
#define TCG_TARGET_HAS_mulsh_i32 0
-#define TCG_TARGET_HAS_trunc_shr_i32 0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
#define TCG_TARGET_HAS_div2_i64 1
#define TCG_TARGET_HAS_rot_i64 1
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 1a870a8..b23032b 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1413,7 +1413,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
case INDEX_op_ext32u_i64:
tcg_out_arithi(s, a0, a1, 0, SHIFT_SRL);
break;
- case INDEX_op_trunc_shr_i32:
+ case INDEX_op_trunc_shr_i64_i32:
if (a2 == 0) {
tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
} else {
@@ -1533,7 +1533,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
{ INDEX_op_ext32s_i64, { "R", "r" } },
{ INDEX_op_ext32u_i64, { "R", "r" } },
- { INDEX_op_trunc_shr_i32, { "r", "R" } },
+ { INDEX_op_trunc_shr_i64_i32, { "r", "R" } },
{ INDEX_op_brcond_i64, { "RZ", "RJ" } },
{ INDEX_op_setcond_i64, { "R", "RZ", "RJ" } },
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index f584de4..336c47f 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -118,7 +118,7 @@ extern bool use_vis3_instructions;
#define TCG_TARGET_HAS_muluh_i32 0
#define TCG_TARGET_HAS_mulsh_i32 0
-#define TCG_TARGET_HAS_trunc_shr_i32 1
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 1
#define TCG_TARGET_HAS_div_i64 1
#define TCG_TARGET_HAS_rem_i64 0
#define TCG_TARGET_HAS_rot_i64 0
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 45098c3..61b64db 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1751,8 +1751,8 @@ void tcg_gen_trunc_shr_i64_i32(TCGv_i32 ret, TCGv_i64 arg, unsigned count)
tcg_gen_mov_i32(ret, TCGV_LOW(t));
tcg_temp_free_i64(t);
}
- } else if (TCG_TARGET_HAS_trunc_shr_i32) {
- tcg_gen_op3i_i32(INDEX_op_trunc_shr_i32, ret,
+ } else if (TCG_TARGET_HAS_trunc_shr_i64_i32) {
+ tcg_gen_op3i_i32(INDEX_op_trunc_shr_i64_i32, ret,
MAKE_TCGV_I32(GET_TCGV_I64(arg)), count);
} else if (count == 0) {
tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(arg)));
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 13ccb60..4a34f43 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -138,8 +138,8 @@ DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
-DEF(trunc_shr_i32, 1, 1, 1,
- IMPL(TCG_TARGET_HAS_trunc_shr_i32)
+DEF(trunc_shr_i64_i32, 1, 1, 1,
+ IMPL(TCG_TARGET_HAS_trunc_shr_i64_i32)
| (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
DEF(brcond_i64, 0, 2, 2, TCG_OPF_BB_END | IMPL64)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 231a781..e7e33b9 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -66,7 +66,7 @@ typedef uint64_t TCGRegSet;
#if TCG_TARGET_REG_BITS == 32
/* Turn some undef macros into false macros. */
-#define TCG_TARGET_HAS_trunc_shr_i32 0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
#define TCG_TARGET_HAS_div_i64 0
#define TCG_TARGET_HAS_rem_i64 0
#define TCG_TARGET_HAS_div2_i64 0
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index cbf3f9b..8b1139b 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -84,7 +84,7 @@
#define TCG_TARGET_HAS_mulsh_i32 0
#if TCG_TARGET_REG_BITS == 64
-#define TCG_TARGET_HAS_trunc_shr_i32 0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
#define TCG_TARGET_HAS_bswap16_i64 1
#define TCG_TARGET_HAS_bswap32_i64 1
#define TCG_TARGET_HAS_bswap64_i64 1
--
2.1.4
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [Qemu-devel] [PATCH for-2.5 06/10] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32
2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
` (4 preceding siblings ...)
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 05/10] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
2015-07-31 7:32 ` Alex Bennée
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 07/10] tcg: implement real ext_i32_i64 and extu_i32_i64 ops Aurelien Jarno
` (3 subsequent siblings)
9 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson
The tcg_gen_trunc_shr_i64_i32 function takes a 64-bit argument and
returns a 32-bit value. Directly call tcg_gen_op3 with the correct
types instead of calling tcg_gen_op3i_i32 and abusing the TCG types.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/tcg-op.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 61b64db..0e79fd1 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1752,8 +1752,8 @@ void tcg_gen_trunc_shr_i64_i32(TCGv_i32 ret, TCGv_i64 arg, unsigned count)
tcg_temp_free_i64(t);
}
} else if (TCG_TARGET_HAS_trunc_shr_i64_i32) {
- tcg_gen_op3i_i32(INDEX_op_trunc_shr_i64_i32, ret,
- MAKE_TCGV_I32(GET_TCGV_I64(arg)), count);
+ tcg_gen_op3(&tcg_ctx, INDEX_op_trunc_shr_i64_i32,
+ GET_TCGV_I32(ret), GET_TCGV_I64(arg), count);
} else if (count == 0) {
tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(arg)));
} else {
--
2.1.4
* [Qemu-devel] [PATCH for-2.5 07/10] tcg: implement real ext_i32_i64 and extu_i32_i64 ops
2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
` (5 preceding siblings ...)
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 06/10] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32 Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
2015-07-31 16:01 ` Alex Bennée
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 08/10] tcg/optimize: add optimizations for " Aurelien Jarno
` (2 subsequent siblings)
9 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
To: qemu-devel
Cc: Claudio Fontana, Stefan Weil, Claudio Fontana, Alexander Graf,
Blue Swirl, Aurelien Jarno, Richard Henderson
Implement real ext_i32_i64 and extu_i32_i64 ops. They ensure that a
32-bit value is always explicitly converted to a 64-bit value, rather
than silently propagated through the register allocator or the
optimizer.
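For reference, the semantics of the two new ops correspond to plain C
sign and zero extension. A minimal standalone sketch (the function names
are illustrative, not the TCG API):

```c
#include <stdint.h>

/* ext_i32_i64: sign-extend a 32-bit value into 64 bits. */
static int64_t ext_i32_i64(int32_t v)
{
    return (int64_t)v;          /* high 32 bits replicate the sign bit */
}

/* extu_i32_i64: zero-extend a 32-bit value into 64 bits. */
static uint64_t extu_i32_i64(uint32_t v)
{
    return (uint64_t)v;         /* high 32 bits are cleared */
}
```

Making these conversions explicit ops is what lets the optimizer and
register allocator reason about the high bits instead of assuming them.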
Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Claudio Fontana <claudio.fontana@huawei.com>
Cc: Claudio Fontana <claudio.fontana@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/aarch64/tcg-target.c | 4 ++++
tcg/i386/tcg-target.c | 5 +++++
tcg/ia64/tcg-target.c | 4 ++++
tcg/ppc/tcg-target.c | 6 ++++++
tcg/s390/tcg-target.c | 5 +++++
tcg/sparc/tcg-target.c | 8 ++++++--
tcg/tcg-op.c | 10 ++++------
tcg/tcg-opc.h | 3 +++
tcg/tci/tcg-target.c | 4 ++++
tci.c | 6 ++++--
10 files changed, 45 insertions(+), 10 deletions(-)
diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index b7ec4f5..7f7ab7e 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1556,6 +1556,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
case INDEX_op_ext16s_i32:
tcg_out_sxt(s, ext, MO_16, a0, a1);
break;
+ case INDEX_op_ext_i32_i64:
case INDEX_op_ext32s_i64:
tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a1);
break;
@@ -1567,6 +1568,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
case INDEX_op_ext16u_i32:
tcg_out_uxt(s, MO_16, a0, a1);
break;
+ case INDEX_op_extu_i32_i64:
case INDEX_op_ext32u_i64:
tcg_out_movr(s, TCG_TYPE_I32, a0, a1);
break;
@@ -1712,6 +1714,8 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
{ INDEX_op_ext8u_i64, { "r", "r" } },
{ INDEX_op_ext16u_i64, { "r", "r" } },
{ INDEX_op_ext32u_i64, { "r", "r" } },
+ { INDEX_op_ext_i32_i64, { "r", "r" } },
+ { INDEX_op_extu_i32_i64, { "r", "r" } },
{ INDEX_op_deposit_i32, { "r", "0", "rZ" } },
{ INDEX_op_deposit_i64, { "r", "0", "rZ" } },
diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index 4f40468..ff55499 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -2068,9 +2068,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
case INDEX_op_bswap64_i64:
tcg_out_bswap64(s, args[0]);
break;
+ case INDEX_op_extu_i32_i64:
case INDEX_op_ext32u_i64:
tcg_out_ext32u(s, args[0], args[1]);
break;
+ case INDEX_op_ext_i32_i64:
case INDEX_op_ext32s_i64:
tcg_out_ext32s(s, args[0], args[1]);
break;
@@ -2205,6 +2207,9 @@ static const TCGTargetOpDef x86_op_defs[] = {
{ INDEX_op_ext16u_i64, { "r", "r" } },
{ INDEX_op_ext32u_i64, { "r", "r" } },
+ { INDEX_op_ext_i32_i64, { "r", "r" } },
+ { INDEX_op_extu_i32_i64, { "r", "r" } },
+
{ INDEX_op_deposit_i64, { "Q", "0", "Q" } },
{ INDEX_op_movcond_i64, { "r", "r", "re", "r", "0" } },
diff --git a/tcg/ia64/tcg-target.c b/tcg/ia64/tcg-target.c
index 81cb9f7..71e79cf 100644
--- a/tcg/ia64/tcg-target.c
+++ b/tcg/ia64/tcg-target.c
@@ -2148,9 +2148,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
case INDEX_op_ext16u_i64:
tcg_out_ext(s, OPC_ZXT2_I29, args[0], args[1]);
break;
+ case INDEX_op_ext_i32_i64:
case INDEX_op_ext32s_i64:
tcg_out_ext(s, OPC_SXT4_I29, args[0], args[1]);
break;
+ case INDEX_op_extu_i32_i64:
case INDEX_op_ext32u_i64:
tcg_out_ext(s, OPC_ZXT4_I29, args[0], args[1]);
break;
@@ -2301,6 +2303,8 @@ static const TCGTargetOpDef ia64_op_defs[] = {
{ INDEX_op_ext16u_i64, { "r", "rZ"} },
{ INDEX_op_ext32s_i64, { "r", "rZ"} },
{ INDEX_op_ext32u_i64, { "r", "rZ"} },
+ { INDEX_op_ext_i32_i64, { "r", "rZ" } },
+ { INDEX_op_extu_i32_i64, { "r", "rZ" } },
{ INDEX_op_bswap16_i64, { "r", "rZ" } },
{ INDEX_op_bswap32_i64, { "r", "rZ" } },
diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
index ce8d546..1672220 100644
--- a/tcg/ppc/tcg-target.c
+++ b/tcg/ppc/tcg-target.c
@@ -2221,12 +2221,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
case INDEX_op_ext16s_i64:
c = EXTSH;
goto gen_ext;
+ case INDEX_op_ext_i32_i64:
case INDEX_op_ext32s_i64:
c = EXTSW;
goto gen_ext;
gen_ext:
tcg_out32(s, c | RS(args[1]) | RA(args[0]));
break;
+ case INDEX_op_extu_i32_i64:
+ tcg_out_ext32u(s, args[0], args[1]);
+ break;
case INDEX_op_setcond_i32:
tcg_out_setcond(s, TCG_TYPE_I32, args[3], args[0], args[1], args[2],
@@ -2503,6 +2507,8 @@ static const TCGTargetOpDef ppc_op_defs[] = {
{ INDEX_op_ext8s_i64, { "r", "r" } },
{ INDEX_op_ext16s_i64, { "r", "r" } },
{ INDEX_op_ext32s_i64, { "r", "r" } },
+ { INDEX_op_ext_i32_i64, { "r", "r" } },
+ { INDEX_op_extu_i32_i64, { "r", "r" } },
{ INDEX_op_bswap16_i64, { "r", "r" } },
{ INDEX_op_bswap32_i64, { "r", "r" } },
{ INDEX_op_bswap64_i64, { "r", "r" } },
diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index b3433ce..d4db6d3 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -2106,6 +2106,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
case INDEX_op_ext16s_i64:
tgen_ext16s(s, TCG_TYPE_I64, args[0], args[1]);
break;
+ case INDEX_op_ext_i32_i64:
case INDEX_op_ext32s_i64:
tgen_ext32s(s, args[0], args[1]);
break;
@@ -2115,6 +2116,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
case INDEX_op_ext16u_i64:
tgen_ext16u(s, TCG_TYPE_I64, args[0], args[1]);
break;
+ case INDEX_op_extu_i32_i64:
case INDEX_op_ext32u_i64:
tgen_ext32u(s, args[0], args[1]);
break;
@@ -2267,6 +2269,9 @@ static const TCGTargetOpDef s390_op_defs[] = {
{ INDEX_op_ext32s_i64, { "r", "r" } },
{ INDEX_op_ext32u_i64, { "r", "r" } },
+ { INDEX_op_ext_i32_i64, { "r", "r" } },
+ { INDEX_op_extu_i32_i64, { "r", "r" } },
+
{ INDEX_op_bswap16_i64, { "r", "r" } },
{ INDEX_op_bswap32_i64, { "r", "r" } },
{ INDEX_op_bswap64_i64, { "r", "r" } },
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index b23032b..fe75af0 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1407,9 +1407,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
case INDEX_op_divu_i64:
c = ARITH_UDIVX;
goto gen_arith;
+ case INDEX_op_ext_i32_i64:
case INDEX_op_ext32s_i64:
tcg_out_arithi(s, a0, a1, 0, SHIFT_SRA);
break;
+ case INDEX_op_extu_i32_i64:
case INDEX_op_ext32u_i64:
tcg_out_arithi(s, a0, a1, 0, SHIFT_SRL);
break;
@@ -1531,8 +1533,10 @@ static const TCGTargetOpDef sparc_op_defs[] = {
{ INDEX_op_neg_i64, { "R", "RJ" } },
{ INDEX_op_not_i64, { "R", "RJ" } },
- { INDEX_op_ext32s_i64, { "R", "r" } },
- { INDEX_op_ext32u_i64, { "R", "r" } },
+ { INDEX_op_ext32s_i64, { "R", "R" } },
+ { INDEX_op_ext32u_i64, { "R", "R" } },
+ { INDEX_op_ext_i32_i64, { "R", "r" } },
+ { INDEX_op_extu_i32_i64, { "R", "r" } },
{ INDEX_op_trunc_shr_i64_i32, { "r", "R" } },
{ INDEX_op_brcond_i64, { "RZ", "RJ" } },
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 0e79fd1..7114315 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1770,9 +1770,8 @@ void tcg_gen_extu_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
tcg_gen_mov_i32(TCGV_LOW(ret), arg);
tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
} else {
- /* Note: we assume the target supports move between
- 32 and 64 bit registers. */
- tcg_gen_ext32u_i64(ret, MAKE_TCGV_I64(GET_TCGV_I32(arg)));
+ tcg_gen_op2(&tcg_ctx, INDEX_op_extu_i32_i64,
+ GET_TCGV_I64(ret), GET_TCGV_I32(arg));
}
}
@@ -1782,9 +1781,8 @@ void tcg_gen_ext_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
tcg_gen_mov_i32(TCGV_LOW(ret), arg);
tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
} else {
- /* Note: we assume the target supports move between
- 32 and 64 bit registers. */
- tcg_gen_ext32s_i64(ret, MAKE_TCGV_I64(GET_TCGV_I32(arg)));
+ tcg_gen_op2(&tcg_ctx, INDEX_op_ext_i32_i64,
+ GET_TCGV_I64(ret), GET_TCGV_I32(arg));
}
}
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 4a34f43..f721a5a 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -138,6 +138,9 @@ DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
+/* size changing ops */
+DEF(ext_i32_i64, 1, 1, 0, IMPL64)
+DEF(extu_i32_i64, 1, 1, 0, IMPL64)
DEF(trunc_shr_i64_i32, 1, 1, 1,
IMPL(TCG_TARGET_HAS_trunc_shr_i64_i32)
| (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
diff --git a/tcg/tci/tcg-target.c b/tcg/tci/tcg-target.c
index 83472db..bbb54d4 100644
--- a/tcg/tci/tcg-target.c
+++ b/tcg/tci/tcg-target.c
@@ -210,6 +210,8 @@ static const TCGTargetOpDef tcg_target_op_defs[] = {
#if TCG_TARGET_HAS_ext32u_i64
{ INDEX_op_ext32u_i64, { R, R } },
#endif
+ { INDEX_op_ext_i32_i64, { R, R } },
+ { INDEX_op_extu_i32_i64, { R, R } },
#if TCG_TARGET_HAS_bswap16_i64
{ INDEX_op_bswap16_i64, { R, R } },
#endif
@@ -701,6 +703,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
case INDEX_op_ext16u_i64: /* Optional (TCG_TARGET_HAS_ext16u_i64). */
case INDEX_op_ext32s_i64: /* Optional (TCG_TARGET_HAS_ext32s_i64). */
case INDEX_op_ext32u_i64: /* Optional (TCG_TARGET_HAS_ext32u_i64). */
+ case INDEX_op_ext_i32_i64:
+ case INDEX_op_extu_i32_i64:
#endif /* TCG_TARGET_REG_BITS == 64 */
case INDEX_op_neg_i32: /* Optional (TCG_TARGET_HAS_neg_i32). */
case INDEX_op_not_i32: /* Optional (TCG_TARGET_HAS_not_i32). */
diff --git a/tci.c b/tci.c
index 8444948..3d6d177 100644
--- a/tci.c
+++ b/tci.c
@@ -1033,18 +1033,20 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
#endif
#if TCG_TARGET_HAS_ext32s_i64
case INDEX_op_ext32s_i64:
+#endif
+ case INDEX_op_ext_i32_i64:
t0 = *tb_ptr++;
t1 = tci_read_r32s(&tb_ptr);
tci_write_reg64(t0, t1);
break;
-#endif
#if TCG_TARGET_HAS_ext32u_i64
case INDEX_op_ext32u_i64:
+#endif
+ case INDEX_op_extu_i32_i64:
t0 = *tb_ptr++;
t1 = tci_read_r32(&tb_ptr);
tci_write_reg64(t0, t1);
break;
-#endif
#if TCG_TARGET_HAS_bswap16_i64
case INDEX_op_bswap16_i64:
TODO();
--
2.1.4
* [Qemu-devel] [PATCH for-2.5 08/10] tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 ops
2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
` (6 preceding siblings ...)
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 07/10] tcg: implement real ext_i32_i64 and extu_i32_i64 ops Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 09/10] tcg/optimize: do not remember garbage high bits for 32-bit ops Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 10/10] tcg: update README about size changing ops Aurelien Jarno
9 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson
They behave the same as ext32s_i64 and ext32u_i64 from the constant
folding and zero propagation point of view, except that they can't
be replaced by a mov, so we don't compute the affected value.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/optimize.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
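The folding rule added below mirrors the (int32_t)x and (uint32_t)x
casts already used for ext32s_i64/ext32u_i64. A hedged sketch of the
idea (hypothetical helper, not QEMU code):

```c
#include <stdint.h>

typedef uint64_t fold_arg;      /* stand-in for tcg_target_ulong */

/* Fold ext_i32_i64 (is_signed) or extu_i32_i64 applied to a constant. */
static fold_arg fold_ext_i32_i64(int is_signed, fold_arg x)
{
    return is_signed ? (fold_arg)(int64_t)(int32_t)x
                     : (fold_arg)(uint32_t)x;
}
```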
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 56e0a17..155d5c3 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -356,9 +356,11 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
CASE_OP_32_64(ext16u):
return (uint16_t)x;
+ case INDEX_op_ext_i32_i64:
case INDEX_op_ext32s_i64:
return (int32_t)x;
+ case INDEX_op_extu_i32_i64:
case INDEX_op_ext32u_i64:
return (uint32_t)x;
@@ -843,6 +845,15 @@ void tcg_optimize(TCGContext *s)
mask = temps[args[1]].mask & mask;
break;
+ case INDEX_op_ext_i32_i64:
+ if ((temps[args[1]].mask & 0x80000000) != 0) {
+ break;
+ }
+ case INDEX_op_extu_i32_i64:
+ /* We do not compute affected as it is a size changing op. */
+ mask = (uint32_t)temps[args[1]].mask;
+ break;
+
CASE_OP_32_64(andc):
/* Known-zeros does not imply known-ones. Therefore unless
args[2] is constant, we can't infer anything from it. */
@@ -1021,6 +1032,8 @@ void tcg_optimize(TCGContext *s)
CASE_OP_32_64(ext16u):
case INDEX_op_ext32s_i64:
case INDEX_op_ext32u_i64:
+ case INDEX_op_ext_i32_i64:
+ case INDEX_op_extu_i32_i64:
if (temp_is_const(args[1])) {
tmp = do_constant_folding(opc, temps[args[1]].val, 0);
tcg_opt_gen_movi(s, op, args, args[0], tmp);
--
2.1.4
* [Qemu-devel] [PATCH for-2.5 09/10] tcg/optimize: do not remember garbage high bits for 32-bit ops
2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
` (7 preceding siblings ...)
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 08/10] tcg/optimize: add optimizations for " Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 10/10] tcg: update README about size changing ops Aurelien Jarno
9 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson
Now that we have real size changing ops, we don't need to mark the high
bits of the destination as garbage. The goal of the optimizer is to
predict the value of the temps (and not of the registers) and to do
simplifications when possible. The problem is therefore not that those
bits are not counted as garbage, but that a size changing op could be
replaced by a move.
This patch is basically a revert of 24666baf, including the changes that
have been made since then.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/optimize.c | 43 ++++++++++++-------------------------------
1 file changed, 12 insertions(+), 31 deletions(-)
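The effect of the simplified mask handling can be sketched in isolation
(illustrative only, not the optimizer's actual data structures):

```c
#include <stdint.h>

/* With real size-changing ops, the known-bits mask of a 32-bit op's
 * result can simply be truncated: the high bits no longer need to be
 * recorded as garbage, because nothing reads them without going
 * through an explicit ext/extu/trunc op. */
static uint64_t mask_32bit_result(uint64_t mask)
{
    return mask & 0xffffffffu;
}
```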
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 155d5c3..15d76a0 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -209,21 +209,13 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
TCGArg dst, TCGArg val)
{
- TCGOpcode new_op = op_to_movi(op->opc);
- tcg_target_ulong mask;
-
- op->opc = new_op;
+ op->opc = op_to_movi(op->opc);
reset_temp(dst);
- temps[dst].is_const = true;
temps[dst].is_copy = false;
+ temps[dst].is_const = true;
temps[dst].val = val;
- mask = val;
- if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_movi_i32) {
- /* High bits of the destination are now garbage. */
- mask |= ~0xffffffffull;
- }
- temps[dst].mask = mask;
+ temps[dst].mask = val;
args[0] = dst;
args[1] = val;
@@ -237,18 +229,9 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
return;
}
- TCGOpcode new_op = op_to_mov(op->opc);
- tcg_target_ulong mask;
-
- op->opc = new_op;
+ op->opc = op_to_mov(op->opc);
reset_temp(dst);
- mask = temps[src].mask;
- if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
- /* High bits of the destination are now garbage. */
- mask |= ~0xffffffffull;
- }
- temps[dst].mask = mask;
if (s->temps[src].type == s->temps[dst].type) {
if (!temp_is_copy(src)) {
@@ -256,15 +239,17 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
temps[src].next_copy = src;
temps[src].prev_copy = src;
}
- temps[dst].is_const = temps[src].is_const;
temps[dst].is_copy = true;
- temps[dst].val = temps[src].val;
temps[dst].next_copy = temps[src].next_copy;
temps[dst].prev_copy = src;
temps[temps[dst].next_copy].prev_copy = dst;
temps[src].next_copy = dst;
}
+ temps[dst].is_const = temps[src].is_const;
+ temps[dst].val = temps[src].val;
+ temps[dst].mask = temps[src].mask;
+
args[0] = dst;
args[1] = src;
}
@@ -597,7 +582,7 @@ void tcg_optimize(TCGContext *s)
reset_all_temps(nb_temps);
for (oi = s->gen_first_op_idx; oi >= 0; oi = oi_next) {
- tcg_target_ulong mask, partmask, affected;
+ tcg_target_ulong mask, affected;
int nb_oargs, nb_iargs, i;
TCGArg tmp;
@@ -950,17 +935,13 @@ void tcg_optimize(TCGContext *s)
break;
}
- /* 32-bit ops generate 32-bit results. For the result is zero test
- below, we can ignore high bits, but for further optimizations we
- need to record that the high bits contain garbage. */
- partmask = mask;
+ /* 32-bit ops generate 32-bit results. */
if (!(def->flags & TCG_OPF_64BIT)) {
- mask |= ~(tcg_target_ulong)0xffffffffu;
- partmask &= 0xffffffffu;
+ mask &= 0xffffffffu;
affected &= 0xffffffffu;
}
- if (partmask == 0) {
+ if (mask == 0) {
assert(nb_oargs == 1);
tcg_opt_gen_movi(s, op, args, args[0], 0);
continue;
--
2.1.4
* [Qemu-devel] [PATCH for-2.5 10/10] tcg: update README about size changing ops
2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
` (8 preceding siblings ...)
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 09/10] tcg/optimize: do not remember garbage high bits for 32-bit ops Aurelien Jarno
@ 2015-07-24 16:30 ` Aurelien Jarno
2015-07-31 16:02 ` Alex Bennée
9 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 16:30 UTC (permalink / raw)
To: qemu-devel; +Cc: Aurelien Jarno, Richard Henderson
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
tcg/README | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/tcg/README b/tcg/README
index 61b3899..a22f251 100644
--- a/tcg/README
+++ b/tcg/README
@@ -466,13 +466,25 @@ On a 32 bit target, all 64 bit operations are converted to 32 bits. A
few specific operations must be implemented to allow it (see add2_i32,
sub2_i32, brcond2_i32).
+On a 64 bit target, values are transferred between 32-bit and 64-bit
+registers using the following ops:
+- trunc_shr_i64_i32
+- ext_i32_i64
+- extu_i32_i64
+
+They ensure that values are correctly truncated or extended when
+moved from a 32-bit to a 64-bit register or vice versa. Note that
+trunc_shr_i64_i32 is an optional op. It is not necessary to implement
+it if all of the following conditions are met:
+- 64-bit registers can hold 32-bit values
+- 32-bit values in a 64-bit register do not need to stay zero or
+ sign extended
+- all 32-bit TCG ops ignore the high part of 64-bit registers
+
Floating point operations are not supported in this version. A
previous incarnation of the code generator had full support of them,
but it is better to concentrate on integer operations first.
-On a 64 bit target, no assumption is made in TCG about the storage of
-the 32 bit values in 64 bit registers.
-
4.2) Constraints
GCC like constraints are used to define the constraints of every
--
2.1.4
* Re: [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies Aurelien Jarno
@ 2015-07-24 20:15 ` Richard Henderson
2015-07-24 22:56 ` Aurelien Jarno
2015-07-29 16:12 ` Alex Bennée
1 sibling, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2015-07-24 20:15 UTC (permalink / raw)
To: Aurelien Jarno, qemu-devel
On 07/24/2015 09:30 AM, Aurelien Jarno wrote:
> Now that copies and constants are tracked separately, we can allow
> constants to have copies, deferring the choice to use a register or a
> constant to the register allocation pass. This prevents this kind of
> regular constant reloading:
This appears to cause
$ gdb --args ./x86_64-linux-user/qemu-x86_64 /bin/ls
...
(gdb) run
Starting program: /home/rth/work/qemu/bld/x86_64-linux-user/qemu-x86_64 /bin/ls
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff6604700 (LWP 7767)]
qemu-x86_64: /home/rth/work/qemu/qemu/tcg/tcg.c:1827: tcg_reg_alloc_bb_end:
Assertion `ts->val_type == TEMP_VAL_DEAD' failed.
Program received signal SIGABRT, Aborted.
r~
* Re: [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies
2015-07-24 20:15 ` Richard Henderson
@ 2015-07-24 22:56 ` Aurelien Jarno
0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-24 22:56 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
On 2015-07-24 13:15, Richard Henderson wrote:
> On 07/24/2015 09:30 AM, Aurelien Jarno wrote:
> > Now that copies and constants are tracked separately, we can allow
> > constants to have copies, deferring the choice to use a register or a
> > constant to the register allocation pass. This prevents this kind of
> > regular constant reloading:
>
> This appears to cause
>
> $ gdb --args ./x86_64-linux-user/qemu-x86_64 /bin/ls
> ...
> (gdb) run
> Starting program: /home/rth/work/qemu/bld/x86_64-linux-user/qemu-x86_64 /bin/ls
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0x7ffff6604700 (LWP 7767)]
> qemu-x86_64: /home/rth/work/qemu/qemu/tcg/tcg.c:1827: tcg_reg_alloc_bb_end:
> Assertion `ts->val_type == TEMP_VAL_DEAD' failed.
>
> Program received signal SIGABRT, Aborted.
It looks like I had forgotten to configure with --enable-debug-tcg.
With it I am able to reproduce the problem. It seems to be a bug in the
liveness analysis or the register allocator; I will try to come up with
a patch soon.
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH for-2.5 01/10] tcg/optimize: optimize temps tracking
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 01/10] tcg/optimize: optimize temps tracking Aurelien Jarno
@ 2015-07-27 8:21 ` Paolo Bonzini
2015-07-27 9:09 ` Aurelien Jarno
0 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2015-07-27 8:21 UTC (permalink / raw)
To: Aurelien Jarno, qemu-devel; +Cc: Richard Henderson
On 24/07/2015 18:30, Aurelien Jarno wrote:
> The tcg_temp_info structure uses 24 bytes per temp. Now that we emulate
> vector registers on most guests, it's not uncommon to have more than 100
> used temps. This means we have to initialize more than 2kB at least twice
> per TB, often more when there are a few goto_tb.
>
> Instead use a TCGTempSet bit array to track which temps are in use in
> the current basic block. This means there are only around 16 bytes to
> initialize.
>
> This improves the boot time of a MIPS guest on an x86-64 host by around
> 7% and moves tcg_optimize out of the top of the profiler list.
>
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> tcg/optimize.c | 32 ++++++++++++++++++++++----------
> 1 file changed, 22 insertions(+), 10 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index cd0e793..20e24b3 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -50,6 +50,7 @@ struct tcg_temp_info {
> };
>
> static struct tcg_temp_info temps[TCG_MAX_TEMPS];
> +static TCGTempSet temps_used;
>
> /* Reset TEMP's state to TCG_TEMP_UNDEF. If TEMP only had one copy, remove
> the copy flag from the left temp. */
> @@ -67,6 +68,22 @@ static void reset_temp(TCGArg temp)
> temps[temp].mask = -1;
> }
>
> +/* Reset all temporaries, given that there are NB_TEMPS of them. */
> +static void reset_all_temps(int nb_temps)
> +{
> + memset(&temps_used.l, 0, sizeof(long) * BITS_TO_LONGS(nb_temps));
You can use bitmap_zero here.
Paolo
> +}
> +
> +/* Initialize and activate a temporary. */
> +static void init_temp_info(TCGArg temp)
> +{
> + if (!test_bit(temp, temps_used.l)) {
> + temps[temp].state = TCG_TEMP_UNDEF;
> + temps[temp].mask = -1;
> + set_bit(temp, temps_used.l);
> + }
> +}
> +
> static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
> TCGOpcode opc, int nargs)
> {
> @@ -98,16 +115,6 @@ static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
> return new_op;
> }
>
> -/* Reset all temporaries, given that there are NB_TEMPS of them. */
> -static void reset_all_temps(int nb_temps)
> -{
> - int i;
> - for (i = 0; i < nb_temps; i++) {
> - temps[i].state = TCG_TEMP_UNDEF;
> - temps[i].mask = -1;
> - }
> -}
> -
> static int op_bits(TCGOpcode op)
> {
> const TCGOpDef *def = &tcg_op_defs[op];
> @@ -606,6 +613,11 @@ void tcg_optimize(TCGContext *s)
> nb_iargs = def->nb_iargs;
> }
>
> + /* Initialize the temps that are going to be used */
> + for (i = 0; i < nb_oargs + nb_iargs; i++) {
> + init_temp_info(args[i]);
> + }
> +
> /* Do copy propagation */
> for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
> if (temps[args[i]].state == TCG_TEMP_COPY) {
>
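The bit-array tracking pattern in the quoted patch can be sketched
standalone as follows (simplified stand-ins for the QEMU bitmap helpers,
not the actual implementation):

```c
#include <stdint.h>
#include <string.h>

#define MAX_TEMPS 512
#define BITS_PER_LONG (8 * (int)sizeof(unsigned long))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

static unsigned long temps_used[BITS_TO_LONGS(MAX_TEMPS)];

/* Clearing only the words that cover nb_temps is what makes the reset
 * cheap compared to re-initializing every per-temp structure. */
static void reset_all_temps(int nb_temps)
{
    memset(temps_used, 0, sizeof(unsigned long) * BITS_TO_LONGS(nb_temps));
}

/* Lazily mark a temp as used; returns whether it was already used. */
static int test_and_set_used(int temp)
{
    unsigned long bit = 1ul << (temp % BITS_PER_LONG);
    int was_set = (temps_used[temp / BITS_PER_LONG] & bit) != 0;
    temps_used[temp / BITS_PER_LONG] |= bit;
    return was_set;
}
```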
* Re: [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately Aurelien Jarno
@ 2015-07-27 8:25 ` Paolo Bonzini
2015-07-27 9:10 ` Aurelien Jarno
2015-07-29 16:10 ` Alex Bennée
1 sibling, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2015-07-27 8:25 UTC (permalink / raw)
To: Aurelien Jarno, qemu-devel; +Cc: Richard Henderson
On 24/07/2015 18:30, Aurelien Jarno wrote:
> Use two bools to track constants and copies instead of an enum.
>
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> tcg/optimize.c | 30 +++++++++++++++---------------
> 1 file changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index d2b63a4..f16eb1e 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -35,14 +35,9 @@
> glue(glue(case INDEX_op_, x), _i32): \
> glue(glue(case INDEX_op_, x), _i64)
>
> -typedef enum {
> - TCG_TEMP_UNDEF = 0,
> - TCG_TEMP_CONST,
> - TCG_TEMP_COPY,
> -} tcg_temp_state;
> -
> struct tcg_temp_info {
> - tcg_temp_state state;
> + bool is_const;
> + bool is_copy;
Could temps[arg].is_copy be replaced by temps[arg].next_copy != arg?
For example, this:
if (temps[temp].prev_copy == temps[temp].next_copy) {
temps[temps[temp].next_copy].is_copy = false;
} else {
temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
}
would be replaced simply by
temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
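The suggestion relies on the copy list being circular: a temp with no
copies can simply point to itself, so `next_copy != arg` encodes
`is_copy` and the unlink becomes unconditional. A standalone sketch of
that invariant (simplified structures, not QEMU code):

```c
#include <stddef.h>

struct temp { size_t next_copy, prev_copy; };

/* Unlink a temp from its copy list and make it a one-element
 * circular list again; on a self-linked temp this is a no-op. */
static void reset_copy(struct temp *all, size_t i)
{
    all[all[i].next_copy].prev_copy = all[i].prev_copy;
    all[all[i].prev_copy].next_copy = all[i].next_copy;
    all[i].next_copy = all[i].prev_copy = i;
}

/* A temp is a copy iff it is not alone in its circular list. */
static int is_copy(const struct temp *all, size_t i)
{
    return all[i].next_copy != i;
}
```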
Paolo
> uint16_t prev_copy;
> uint16_t next_copy;
> tcg_target_ulong val;
> @@ -54,12 +49,12 @@ static TCGTempSet temps_used;
>
> static inline bool temp_is_const(TCGArg arg)
> {
> - return temps[arg].state == TCG_TEMP_CONST;
> + return temps[arg].is_const;
> }
>
> static inline bool temp_is_copy(TCGArg arg)
> {
> - return temps[arg].state == TCG_TEMP_COPY;
> + return temps[arg].is_copy;
> }
>
> /* Reset TEMP's state to TCG_TEMP_UNDEF. If TEMP only had one copy, remove
> @@ -68,13 +63,14 @@ static void reset_temp(TCGArg temp)
> {
> if (temp_is_copy(temp)) {
> if (temps[temp].prev_copy == temps[temp].next_copy) {
> - temps[temps[temp].next_copy].state = TCG_TEMP_UNDEF;
> + temps[temps[temp].next_copy].is_copy = false;
> } else {
> temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
> temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
> }
> }
> - temps[temp].state = TCG_TEMP_UNDEF;
> + temps[temp].is_const = false;
> + temps[temp].is_copy = false;
> temps[temp].mask = -1;
> }
>
> @@ -88,7 +84,8 @@ static void reset_all_temps(int nb_temps)
> static void init_temp_info(TCGArg temp)
> {
> if (!test_bit(temp, temps_used.l)) {
> - temps[temp].state = TCG_TEMP_UNDEF;
> + temps[temp].is_const = false;
> + temps[temp].is_copy = false;
> temps[temp].mask = -1;
> set_bit(temp, temps_used.l);
> }
> @@ -218,7 +215,8 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
> op->opc = new_op;
>
> reset_temp(dst);
> - temps[dst].state = TCG_TEMP_CONST;
> + temps[dst].is_const = true;
> + temps[dst].is_copy = false;
> temps[dst].val = val;
> mask = val;
> if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_movi_i32) {
> @@ -261,11 +259,13 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
>
> if (s->temps[src].type == s->temps[dst].type) {
> if (!temp_is_copy(src)) {
> - temps[src].state = TCG_TEMP_COPY;
> + temps[src].is_const = false;
> + temps[src].is_copy = true;
> temps[src].next_copy = src;
> temps[src].prev_copy = src;
> }
> - temps[dst].state = TCG_TEMP_COPY;
> + temps[dst].is_const = false;
> + temps[dst].is_copy = true;
> temps[dst].next_copy = temps[src].next_copy;
> temps[dst].prev_copy = src;
> temps[temps[dst].next_copy].prev_copy = dst;
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH for-2.5 01/10] tcg/optimize: optimize temps tracking
2015-07-27 8:21 ` Paolo Bonzini
@ 2015-07-27 9:09 ` Aurelien Jarno
0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-27 9:09 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, Richard Henderson
On 2015-07-27 10:21, Paolo Bonzini wrote:
>
>
> On 24/07/2015 18:30, Aurelien Jarno wrote:
> > The tcg_temp_info structure uses 24 bytes per temp. Now that we emulate
> > vector registers on most guests, it's not uncommon to have more than 100
> > used temps. This means we have to initialize more than 2kB at least twice
> > per TB, often more when there are a few goto_tb.
> >
> > Instead use a TCGTempSet bit array to track which temps are used in
> > the current basic block. This means there are only around 16 bytes to
> > initialize.
> >
> > This improves the boot time of a MIPS guest on an x86-64 host by around
> > 7% and moves tcg_optimize out of the top of the profiler list.
> >
> > Cc: Richard Henderson <rth@twiddle.net>
> > Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> > ---
> > tcg/optimize.c | 32 ++++++++++++++++++++++----------
> > 1 file changed, 22 insertions(+), 10 deletions(-)
> >
> > diff --git a/tcg/optimize.c b/tcg/optimize.c
> > index cd0e793..20e24b3 100644
> > --- a/tcg/optimize.c
> > +++ b/tcg/optimize.c
> > @@ -50,6 +50,7 @@ struct tcg_temp_info {
> > };
> >
> > static struct tcg_temp_info temps[TCG_MAX_TEMPS];
> > +static TCGTempSet temps_used;
> >
> > /* Reset TEMP's state to TCG_TEMP_UNDEF. If TEMP only had one copy, remove
> > the copy flag from the left temp. */
> > @@ -67,6 +68,22 @@ static void reset_temp(TCGArg temp)
> > temps[temp].mask = -1;
> > }
> >
> > +/* Reset all temporaries, given that there are NB_TEMPS of them. */
> > +static void reset_all_temps(int nb_temps)
> > +{
> > + memset(&temps_used.l, 0, sizeof(long) * BITS_TO_LONGS(nb_temps));
>
> You can use bitmap_zero here.
>
Indeed, thanks for the hint. That will be in v2.
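[Editorial sketch: the hint is cosmetic, since bitmap_zero is essentially a wrapper around the same memset. A rough rendering of the idiom, not the exact QEMU implementation:]

```c
#include <string.h>
#include <limits.h>

#define BITS_PER_LONG    (CHAR_BIT * sizeof(long))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Roughly what QEMU's bitmap_zero() does for the multi-word case:
 * clear enough longs to cover nbits bits. */
static inline void bitmap_zero(unsigned long *dst, long nbits)
{
    memset(dst, 0, sizeof(long) * BITS_TO_LONGS(nbits));
}
```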
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately
2015-07-27 8:25 ` Paolo Bonzini
@ 2015-07-27 9:10 ` Aurelien Jarno
0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-27 9:10 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, Richard Henderson
On 2015-07-27 10:25, Paolo Bonzini wrote:
>
>
> On 24/07/2015 18:30, Aurelien Jarno wrote:
> > Use two bools to track constants and copies instead of an enum.
> >
> > Cc: Richard Henderson <rth@twiddle.net>
> > Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> > ---
> > tcg/optimize.c | 30 +++++++++++++++---------------
> > 1 file changed, 15 insertions(+), 15 deletions(-)
> >
> > diff --git a/tcg/optimize.c b/tcg/optimize.c
> > index d2b63a4..f16eb1e 100644
> > --- a/tcg/optimize.c
> > +++ b/tcg/optimize.c
> > @@ -35,14 +35,9 @@
> > glue(glue(case INDEX_op_, x), _i32): \
> > glue(glue(case INDEX_op_, x), _i64)
> >
> > -typedef enum {
> > - TCG_TEMP_UNDEF = 0,
> > - TCG_TEMP_CONST,
> > - TCG_TEMP_COPY,
> > -} tcg_temp_state;
> > -
> > struct tcg_temp_info {
> > - tcg_temp_state state;
> > + bool is_const;
> > + bool is_copy;
>
> Could temps[arg].is_copy be replaced by temps[arg].next_copy != arg?
>
> For example, this:
>
> if (temps[temp].prev_copy == temps[temp].next_copy) {
> temps[temps[temp].next_copy].is_copy = false;
> } else {
> temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
> temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
> }
>
> would be replaced simply by
>
> temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
> temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
>
That's an interesting idea, especially as I have a patch series in
preparation that gets rid of is_const. I will try to get something like
that for v2.
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH for-2.5 02/10] tcg/optimize: add temp_is_const and temp_is_copy functions
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 02/10] tcg/optimize: add temp_is_const and temp_is_copy functions Aurelien Jarno
@ 2015-07-29 16:01 ` Alex Bennée
2015-07-29 16:25 ` Aurelien Jarno
0 siblings, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2015-07-29 16:01 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel, Richard Henderson
Aurelien Jarno <aurelien@aurel32.net> writes:
> Add two accessor functions temp_is_const and temp_is_copy, to make the
> code more readable and make code change easier.
>
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> tcg/optimize.c | 131 ++++++++++++++++++++++++++-------------------------------
> 1 file changed, 60 insertions(+), 71 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 20e24b3..d2b63a4 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -52,11 +52,21 @@ struct tcg_temp_info {
> static struct tcg_temp_info temps[TCG_MAX_TEMPS];
> static TCGTempSet temps_used;
>
> +static inline bool temp_is_const(TCGArg arg)
> +{
> + return temps[arg].state == TCG_TEMP_CONST;
> +}
> +
> +static inline bool temp_is_copy(TCGArg arg)
> +{
> + return temps[arg].state == TCG_TEMP_COPY;
> +}
> +
> /* Reset TEMP's state to TCG_TEMP_UNDEF. If TEMP only had one copy, remove
> the copy flag from the left temp. */
> static void reset_temp(TCGArg temp)
> {
> - if (temps[temp].state == TCG_TEMP_COPY) {
> + if (temp_is_copy(temp)) {
> if (temps[temp].prev_copy == temps[temp].next_copy) {
> temps[temps[temp].next_copy].state = TCG_TEMP_UNDEF;
> } else {
> @@ -186,8 +196,7 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
> return true;
> }
>
> - if (temps[arg1].state != TCG_TEMP_COPY
> - || temps[arg2].state != TCG_TEMP_COPY) {
> + if (!temp_is_copy(arg1) || !temp_is_copy(arg2)) {
> return false;
> }
>
> @@ -230,7 +239,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
> return;
> }
>
> - if (temps[src].state == TCG_TEMP_CONST) {
> + if (temp_is_const(src)) {
> tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
> return;
> }
> @@ -248,10 +257,10 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
> }
> temps[dst].mask = mask;
>
> - assert(temps[src].state != TCG_TEMP_CONST);
> + assert(!temp_is_const(src));
>
> if (s->temps[src].type == s->temps[dst].type) {
> - if (temps[src].state != TCG_TEMP_COPY) {
> + if (!temp_is_copy(src)) {
> temps[src].state = TCG_TEMP_COPY;
> temps[src].next_copy = src;
> temps[src].prev_copy = src;
> @@ -488,7 +497,7 @@ static bool do_constant_folding_cond_eq(TCGCond c)
> static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
> TCGArg y, TCGCond c)
> {
> - if (temps[x].state == TCG_TEMP_CONST && temps[y].state == TCG_TEMP_CONST) {
> + if (temp_is_const(x) && temp_is_const(y)) {
> switch (op_bits(op)) {
> case 32:
> return do_constant_folding_cond_32(temps[x].val, temps[y].val, c);
> @@ -499,7 +508,7 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
> }
> } else if (temps_are_copies(x, y)) {
> return do_constant_folding_cond_eq(c);
> - } else if (temps[y].state == TCG_TEMP_CONST && temps[y].val == 0) {
> + } else if (temp_is_const(y) && temps[y].val == 0) {
> switch (c) {
> case TCG_COND_LTU:
> return 0;
> @@ -520,12 +529,10 @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
> TCGArg al = p1[0], ah = p1[1];
> TCGArg bl = p2[0], bh = p2[1];
>
> - if (temps[bl].state == TCG_TEMP_CONST
> - && temps[bh].state == TCG_TEMP_CONST) {
> + if (temp_is_const(bl) && temp_is_const(bh)) {
> uint64_t b = ((uint64_t)temps[bh].val << 32) | (uint32_t)temps[bl].val;
>
> - if (temps[al].state == TCG_TEMP_CONST
> - && temps[ah].state == TCG_TEMP_CONST) {
> + if (temp_is_const(al) && temp_is_const(ah)) {
> uint64_t a;
> a = ((uint64_t)temps[ah].val << 32) | (uint32_t)temps[al].val;
> return do_constant_folding_cond_64(a, b, c);
> @@ -551,8 +558,8 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
> {
> TCGArg a1 = *p1, a2 = *p2;
> int sum = 0;
> - sum += temps[a1].state == TCG_TEMP_CONST;
> - sum -= temps[a2].state == TCG_TEMP_CONST;
> + sum += temp_is_const(a1);
> + sum -= temp_is_const(a2);
>
> /* Prefer the constant in second argument, and then the form
> op a, a, b, which is better handled on non-RISC hosts. */
> @@ -567,10 +574,10 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
> static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
> {
> int sum = 0;
> - sum += temps[p1[0]].state == TCG_TEMP_CONST;
> - sum += temps[p1[1]].state == TCG_TEMP_CONST;
> - sum -= temps[p2[0]].state == TCG_TEMP_CONST;
> - sum -= temps[p2[1]].state == TCG_TEMP_CONST;
> + sum += temp_is_const(p1[0]);
> + sum += temp_is_const(p1[1]);
> + sum -= temp_is_const(p2[0]);
> + sum -= temp_is_const(p2[1]);
> if (sum > 0) {
> TCGArg t;
> t = p1[0], p1[0] = p2[0], p2[0] = t;
> @@ -620,7 +627,7 @@ void tcg_optimize(TCGContext *s)
>
> /* Do copy propagation */
> for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
> - if (temps[args[i]].state == TCG_TEMP_COPY) {
> + if (temp_is_copy(args[i])) {
> args[i] = find_better_copy(s, args[i]);
> }
> }
> @@ -690,8 +697,7 @@ void tcg_optimize(TCGContext *s)
> CASE_OP_32_64(sar):
> CASE_OP_32_64(rotl):
> CASE_OP_32_64(rotr):
> - if (temps[args[1]].state == TCG_TEMP_CONST
> - && temps[args[1]].val == 0) {
> + if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
> tcg_opt_gen_movi(s, op, args, args[0], 0);
> continue;
> }
> @@ -701,7 +707,7 @@ void tcg_optimize(TCGContext *s)
> TCGOpcode neg_op;
> bool have_neg;
>
> - if (temps[args[2]].state == TCG_TEMP_CONST) {
> + if (temp_is_const(args[2])) {
> /* Proceed with possible constant folding. */
> break;
> }
> @@ -715,8 +721,7 @@ void tcg_optimize(TCGContext *s)
> if (!have_neg) {
> break;
> }
> - if (temps[args[1]].state == TCG_TEMP_CONST
> - && temps[args[1]].val == 0) {
> > + if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
This makes me wonder if we should have:
temp_is_const_val(arg, val)
to wrap up these tests even more neatly
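[Editorial sketch of what such a helper could look like, built on the accessors this patch introduces; the typedefs and array here are illustrative stand-ins, not the real TCG definitions:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t TCGArg;              /* stand-ins for the real typedefs */
typedef unsigned long tcg_target_ulong;

struct tcg_temp_info {
    bool is_const;
    tcg_target_ulong val;
};
static struct tcg_temp_info temps[16];

static inline bool temp_is_const(TCGArg arg)
{
    return temps[arg].is_const;
}

/* The suggested helper: "is a constant with this exact value" in one call. */
static inline bool temp_is_const_val(TCGArg arg, tcg_target_ulong val)
{
    return temp_is_const(arg) && temps[arg].val == val;
}
```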
> op->opc = neg_op;
> reset_temp(args[0]);
> args[1] = args[2];
> @@ -726,34 +731,30 @@ void tcg_optimize(TCGContext *s)
> break;
> CASE_OP_32_64(xor):
> CASE_OP_32_64(nand):
> - if (temps[args[1]].state != TCG_TEMP_CONST
> - && temps[args[2]].state == TCG_TEMP_CONST
> - && temps[args[2]].val == -1) {
> + if (!temp_is_const(args[1])
> + && temp_is_const(args[2]) && temps[args[2]].val == -1) {
> i = 1;
> goto try_not;
> }
> break;
> CASE_OP_32_64(nor):
> - if (temps[args[1]].state != TCG_TEMP_CONST
> - && temps[args[2]].state == TCG_TEMP_CONST
> - && temps[args[2]].val == 0) {
> + if (!temp_is_const(args[1])
> + && temp_is_const(args[2]) && temps[args[2]].val == 0) {
> i = 1;
> goto try_not;
> }
> break;
> CASE_OP_32_64(andc):
> - if (temps[args[2]].state != TCG_TEMP_CONST
> - && temps[args[1]].state == TCG_TEMP_CONST
> - && temps[args[1]].val == -1) {
> + if (!temp_is_const(args[2])
> + && temp_is_const(args[1]) && temps[args[1]].val == -1) {
> i = 2;
> goto try_not;
> }
> break;
> CASE_OP_32_64(orc):
> CASE_OP_32_64(eqv):
> - if (temps[args[2]].state != TCG_TEMP_CONST
> - && temps[args[1]].state == TCG_TEMP_CONST
> - && temps[args[1]].val == 0) {
> + if (!temp_is_const(args[2])
> + && temp_is_const(args[1]) && temps[args[1]].val == 0) {
> i = 2;
> goto try_not;
> }
> @@ -794,9 +795,8 @@ void tcg_optimize(TCGContext *s)
> CASE_OP_32_64(or):
> CASE_OP_32_64(xor):
> CASE_OP_32_64(andc):
> - if (temps[args[1]].state != TCG_TEMP_CONST
> - && temps[args[2]].state == TCG_TEMP_CONST
> - && temps[args[2]].val == 0) {
> + if (!temp_is_const(args[1])
> + && temp_is_const(args[2]) && temps[args[2]].val == 0) {
> tcg_opt_gen_mov(s, op, args, args[0], args[1]);
> continue;
> }
> @@ -804,9 +804,8 @@ void tcg_optimize(TCGContext *s)
> CASE_OP_32_64(and):
> CASE_OP_32_64(orc):
> CASE_OP_32_64(eqv):
> - if (temps[args[1]].state != TCG_TEMP_CONST
> - && temps[args[2]].state == TCG_TEMP_CONST
> - && temps[args[2]].val == -1) {
> + if (!temp_is_const(args[1])
> + && temp_is_const(args[2]) && temps[args[2]].val == -1) {
> tcg_opt_gen_mov(s, op, args, args[0], args[1]);
> continue;
> }
> @@ -844,7 +843,7 @@ void tcg_optimize(TCGContext *s)
>
> CASE_OP_32_64(and):
> mask = temps[args[2]].mask;
> - if (temps[args[2]].state == TCG_TEMP_CONST) {
> + if (temp_is_const(args[2])) {
> and_const:
> affected = temps[args[1]].mask & ~mask;
> }
> @@ -854,7 +853,7 @@ void tcg_optimize(TCGContext *s)
> CASE_OP_32_64(andc):
> /* Known-zeros does not imply known-ones. Therefore unless
> args[2] is constant, we can't infer anything from it. */
> - if (temps[args[2]].state == TCG_TEMP_CONST) {
> + if (temp_is_const(args[2])) {
> mask = ~temps[args[2]].mask;
> goto and_const;
> }
> @@ -863,26 +862,26 @@ void tcg_optimize(TCGContext *s)
> break;
>
> case INDEX_op_sar_i32:
> - if (temps[args[2]].state == TCG_TEMP_CONST) {
> + if (temp_is_const(args[2])) {
> tmp = temps[args[2]].val & 31;
> mask = (int32_t)temps[args[1]].mask >> tmp;
> }
> break;
> case INDEX_op_sar_i64:
> - if (temps[args[2]].state == TCG_TEMP_CONST) {
> + if (temp_is_const(args[2])) {
> tmp = temps[args[2]].val & 63;
> mask = (int64_t)temps[args[1]].mask >> tmp;
> }
> break;
>
> case INDEX_op_shr_i32:
> - if (temps[args[2]].state == TCG_TEMP_CONST) {
> + if (temp_is_const(args[2])) {
> tmp = temps[args[2]].val & 31;
> mask = (uint32_t)temps[args[1]].mask >> tmp;
> }
> break;
> case INDEX_op_shr_i64:
> - if (temps[args[2]].state == TCG_TEMP_CONST) {
> + if (temp_is_const(args[2])) {
> tmp = temps[args[2]].val & 63;
> mask = (uint64_t)temps[args[1]].mask >> tmp;
> }
> @@ -893,7 +892,7 @@ void tcg_optimize(TCGContext *s)
> break;
>
> CASE_OP_32_64(shl):
> - if (temps[args[2]].state == TCG_TEMP_CONST) {
> + if (temp_is_const(args[2])) {
> tmp = temps[args[2]].val & (TCG_TARGET_REG_BITS - 1);
> mask = temps[args[1]].mask << tmp;
> }
> @@ -974,8 +973,7 @@ void tcg_optimize(TCGContext *s)
> CASE_OP_32_64(mul):
> CASE_OP_32_64(muluh):
> CASE_OP_32_64(mulsh):
> - if ((temps[args[2]].state == TCG_TEMP_CONST
> - && temps[args[2]].val == 0)) {
> + if ((temp_is_const(args[2]) && temps[args[2]].val == 0)) {
> tcg_opt_gen_movi(s, op, args, args[0], 0);
> continue;
> }
> @@ -1030,7 +1028,7 @@ void tcg_optimize(TCGContext *s)
> CASE_OP_32_64(ext16u):
> case INDEX_op_ext32s_i64:
> case INDEX_op_ext32u_i64:
> - if (temps[args[1]].state == TCG_TEMP_CONST) {
> + if (temp_is_const(args[1])) {
> tmp = do_constant_folding(opc, temps[args[1]].val, 0);
> tcg_opt_gen_movi(s, op, args, args[0], tmp);
> break;
> @@ -1038,7 +1036,7 @@ void tcg_optimize(TCGContext *s)
> goto do_default;
>
> case INDEX_op_trunc_shr_i32:
> - if (temps[args[1]].state == TCG_TEMP_CONST) {
> + if (temp_is_const(args[1])) {
> tmp = do_constant_folding(opc, temps[args[1]].val, args[2]);
> tcg_opt_gen_movi(s, op, args, args[0], tmp);
> break;
> @@ -1067,8 +1065,7 @@ void tcg_optimize(TCGContext *s)
> CASE_OP_32_64(divu):
> CASE_OP_32_64(rem):
> CASE_OP_32_64(remu):
> - if (temps[args[1]].state == TCG_TEMP_CONST
> - && temps[args[2]].state == TCG_TEMP_CONST) {
> + if (temp_is_const(args[1]) && temp_is_const(args[2])) {
> tmp = do_constant_folding(opc, temps[args[1]].val,
> temps[args[2]].val);
> tcg_opt_gen_movi(s, op, args, args[0], tmp);
> @@ -1077,8 +1074,7 @@ void tcg_optimize(TCGContext *s)
> goto do_default;
>
> CASE_OP_32_64(deposit):
> - if (temps[args[1]].state == TCG_TEMP_CONST
> - && temps[args[2]].state == TCG_TEMP_CONST) {
> + if (temp_is_const(args[1]) && temp_is_const(args[2])) {
> tmp = deposit64(temps[args[1]].val, args[3], args[4],
> temps[args[2]].val);
> tcg_opt_gen_movi(s, op, args, args[0], tmp);
> @@ -1118,10 +1114,8 @@ void tcg_optimize(TCGContext *s)
>
> case INDEX_op_add2_i32:
> case INDEX_op_sub2_i32:
> - if (temps[args[2]].state == TCG_TEMP_CONST
> - && temps[args[3]].state == TCG_TEMP_CONST
> - && temps[args[4]].state == TCG_TEMP_CONST
> - && temps[args[5]].state == TCG_TEMP_CONST) {
> + if (temp_is_const(args[2]) && temp_is_const(args[3])
> + && temp_is_const(args[4]) && temp_is_const(args[5])) {
> uint32_t al = temps[args[2]].val;
> uint32_t ah = temps[args[3]].val;
> uint32_t bl = temps[args[4]].val;
> @@ -1150,8 +1144,7 @@ void tcg_optimize(TCGContext *s)
> goto do_default;
>
> case INDEX_op_mulu2_i32:
> - if (temps[args[2]].state == TCG_TEMP_CONST
> - && temps[args[3]].state == TCG_TEMP_CONST) {
> + if (temp_is_const(args[2]) && temp_is_const(args[3])) {
> uint32_t a = temps[args[2]].val;
> uint32_t b = temps[args[3]].val;
> uint64_t r = (uint64_t)a * b;
> @@ -1183,10 +1176,8 @@ void tcg_optimize(TCGContext *s)
> tcg_op_remove(s, op);
> }
> } else if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
> - && temps[args[2]].state == TCG_TEMP_CONST
> - && temps[args[3]].state == TCG_TEMP_CONST
> - && temps[args[2]].val == 0
> - && temps[args[3]].val == 0) {
> + && temp_is_const(args[2]) && temps[args[2]].val == 0
> + && temp_is_const(args[3]) && temps[args[3]].val == 0) {
> /* Simplify LT/GE comparisons vs zero to a single compare
> vs the high word of the input. */
> do_brcond_high:
> @@ -1248,10 +1239,8 @@ void tcg_optimize(TCGContext *s)
> do_setcond_const:
> tcg_opt_gen_movi(s, op, args, args[0], tmp);
> } else if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE)
> - && temps[args[3]].state == TCG_TEMP_CONST
> - && temps[args[4]].state == TCG_TEMP_CONST
> - && temps[args[3]].val == 0
> - && temps[args[4]].val == 0) {
> + && temp_is_const(args[3]) && temps[args[3]].val == 0
> + && temp_is_const(args[4]) && temps[args[4]].val == 0) {
> /* Simplify LT/GE comparisons vs zero to a single compare
> vs the high word of the input. */
> do_setcond_high:
Otherwise it looks sane to me:
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
* Re: [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately Aurelien Jarno
2015-07-27 8:25 ` Paolo Bonzini
@ 2015-07-29 16:10 ` Alex Bennée
2015-07-29 16:25 ` Aurelien Jarno
1 sibling, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2015-07-29 16:10 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel, Richard Henderson
Aurelien Jarno <aurelien@aurel32.net> writes:
> Use two bools to track constants and copies instead of an enum.
More of an explanation would be useful as to why, otherwise it seems to
me we are just increasing the size of the structure (assuming the
compiler uses the same base sizes for the bool and enum).
>
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> tcg/optimize.c | 30 +++++++++++++++---------------
> 1 file changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index d2b63a4..f16eb1e 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -35,14 +35,9 @@
> glue(glue(case INDEX_op_, x), _i32): \
> glue(glue(case INDEX_op_, x), _i64)
>
> -typedef enum {
> - TCG_TEMP_UNDEF = 0,
> - TCG_TEMP_CONST,
> - TCG_TEMP_COPY,
> -} tcg_temp_state;
> -
> struct tcg_temp_info {
> - tcg_temp_state state;
> + bool is_const;
> + bool is_copy;
> uint16_t prev_copy;
> uint16_t next_copy;
> tcg_target_ulong val;
> @@ -54,12 +49,12 @@ static TCGTempSet temps_used;
>
> static inline bool temp_is_const(TCGArg arg)
> {
> - return temps[arg].state == TCG_TEMP_CONST;
> + return temps[arg].is_const;
> }
>
> static inline bool temp_is_copy(TCGArg arg)
> {
> - return temps[arg].state == TCG_TEMP_COPY;
> + return temps[arg].is_copy;
> }
>
> /* Reset TEMP's state to TCG_TEMP_UNDEF. If TEMP only had one copy, remove
> @@ -68,13 +63,14 @@ static void reset_temp(TCGArg temp)
> {
> if (temp_is_copy(temp)) {
> if (temps[temp].prev_copy == temps[temp].next_copy) {
> - temps[temps[temp].next_copy].state = TCG_TEMP_UNDEF;
> + temps[temps[temp].next_copy].is_copy = false;
> } else {
> temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
> temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
> }
> }
> - temps[temp].state = TCG_TEMP_UNDEF;
> + temps[temp].is_const = false;
> + temps[temp].is_copy = false;
> temps[temp].mask = -1;
> }
>
> @@ -88,7 +84,8 @@ static void reset_all_temps(int nb_temps)
> static void init_temp_info(TCGArg temp)
> {
> if (!test_bit(temp, temps_used.l)) {
> - temps[temp].state = TCG_TEMP_UNDEF;
> + temps[temp].is_const = false;
> + temps[temp].is_copy = false;
> temps[temp].mask = -1;
> set_bit(temp, temps_used.l);
> }
> @@ -218,7 +215,8 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
> op->opc = new_op;
>
> reset_temp(dst);
> - temps[dst].state = TCG_TEMP_CONST;
> + temps[dst].is_const = true;
> + temps[dst].is_copy = false;
> temps[dst].val = val;
> mask = val;
> if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_movi_i32) {
> @@ -261,11 +259,13 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
>
> if (s->temps[src].type == s->temps[dst].type) {
> if (!temp_is_copy(src)) {
> - temps[src].state = TCG_TEMP_COPY;
> + temps[src].is_const = false;
> + temps[src].is_copy = true;
> temps[src].next_copy = src;
> temps[src].prev_copy = src;
> }
> - temps[dst].state = TCG_TEMP_COPY;
> + temps[dst].is_const = false;
> + temps[dst].is_copy = true;
> temps[dst].next_copy = temps[src].next_copy;
> temps[dst].prev_copy = src;
> temps[temps[dst].next_copy].prev_copy = dst;
--
Alex Bennée
* Re: [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies Aurelien Jarno
2015-07-24 20:15 ` Richard Henderson
@ 2015-07-29 16:12 ` Alex Bennée
2015-07-29 16:27 ` Aurelien Jarno
1 sibling, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2015-07-29 16:12 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel, Richard Henderson
Aurelien Jarno <aurelien@aurel32.net> writes:
> Now that copies and constants are tracked separately, we can allow
> constants to have copies, deferring the choice to use a register or a
> constant to the register allocation pass. This prevents this kind of
> repeated constant reloading:
>
> -OUT: [size=338]
> +OUT: [size=298]
> mov -0x4(%r14),%ebp
> test %ebp,%ebp
> jne 0x7ffbe9cb0ed6
> mov $0x40002219f8,%rbp
> mov %rbp,(%r14)
> - mov $0x40002219f8,%rbp
> mov $0x4000221a20,%rbx
> mov %rbp,(%rbx)
> mov $0x4000000000,%rbp
> mov %rbp,(%r14)
> - mov $0x4000000000,%rbp
> mov $0x4000221d38,%rbx
> mov %rbp,(%rbx)
> mov $0x40002221a8,%rbp
> mov %rbp,(%r14)
> - mov $0x40002221a8,%rbp
> mov $0x4000221d40,%rbx
> mov %rbp,(%rbx)
> mov $0x4000019170,%rbp
> mov %rbp,(%r14)
> - mov $0x4000019170,%rbp
> mov $0x4000221d48,%rbx
> mov %rbp,(%rbx)
> mov $0x40000049ee,%rbp
> mov %rbp,0x80(%r14)
> mov %r14,%rdi
> callq 0x7ffbe99924d0
> mov $0x4000001680,%rbp
> mov %rbp,0x30(%r14)
> mov 0x10(%r14),%rbp
> mov $0x4000001680,%rbp
> mov %rbp,0x30(%r14)
> mov 0x10(%r14),%rbp
> shl $0x20,%rbp
> mov (%r14),%rbx
> mov %ebx,%ebx
> mov %rbx,(%r14)
> or %rbx,%rbp
> mov %rbp,0x10(%r14)
> mov %rbp,0x90(%r14)
> mov 0x60(%r14),%rbx
> mov %rbx,0x38(%r14)
> mov 0x28(%r14),%rbx
> mov $0x4000220e60,%r12
> mov %rbx,(%r12)
> mov $0x40002219c8,%rbx
> mov %rbp,(%rbx)
> mov 0x20(%r14),%rbp
> sub $0x8,%rbp
> mov $0x4000004a16,%rbx
> mov %rbx,0x0(%rbp)
> mov %rbp,0x20(%r14)
> mov $0x19,%ebp
> mov %ebp,0xa8(%r14)
> mov $0x4000015110,%rbp
> mov %rbp,0x80(%r14)
> xor %eax,%eax
> jmpq 0x7ffbebcae426
> lea -0x5f6d72a(%rip),%rax # 0x7ffbe3d437b3
> jmpq 0x7ffbebcae426
>
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> tcg/optimize.c | 11 ++---------
> 1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index f16eb1e..48103b2 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -237,11 +237,6 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
> return;
> }
>
> - if (temp_is_const(src)) {
> - tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
> - return;
> - }
> -
That looks suspicious; surely we only want to drop the move if we
already have the const somewhere else?
> TCGOpcode new_op = op_to_mov(op->opc);
> tcg_target_ulong mask;
>
> @@ -255,17 +250,15 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
> }
> temps[dst].mask = mask;
>
> - assert(!temp_is_const(src));
> -
> if (s->temps[src].type == s->temps[dst].type) {
> if (!temp_is_copy(src)) {
> - temps[src].is_const = false;
> temps[src].is_copy = true;
> temps[src].next_copy = src;
> temps[src].prev_copy = src;
> }
> - temps[dst].is_const = false;
> + temps[dst].is_const = temps[src].is_const;
> temps[dst].is_copy = true;
> + temps[dst].val = temps[src].val;
> temps[dst].next_copy = temps[src].next_copy;
> temps[dst].prev_copy = src;
> temps[temps[dst].next_copy].prev_copy = dst;
--
Alex Bennée
* Re: [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately
2015-07-29 16:10 ` Alex Bennée
@ 2015-07-29 16:25 ` Aurelien Jarno
0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-29 16:25 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel, Richard Henderson
On 2015-07-29 17:10, Alex Bennée wrote:
>
> Aurelien Jarno <aurelien@aurel32.net> writes:
>
> > Use two bools to track constants and copies instead of an enum.
>
> More of an explanation would be useful as to why, otherwise it seems to
> me we are just increasing the size of the structure (assuming the
> compiler uses the same base sizes for the bool and enum).
The reason is the following patch, which needs const and copy status
tracked separately. Note that this doesn't increase the structure size,
due to alignment, and that the v2 of this patchset actually uses only
one bool. I'll add a note about why it is needed for v3.
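[Editorial sketch checking the alignment claim directly: on a typical ABI the enum occupies 4 bytes, and two 1-byte bools followed by the two uint16_t fields (plus tail padding before val) occupy the same space, so both layouts come out to the same struct size. Field types are simplified stand-ins for the real ones.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { TCG_TEMP_UNDEF, TCG_TEMP_CONST, TCG_TEMP_COPY } tcg_temp_state;

/* Layout before the patch: the enum typically occupies 4 bytes. */
struct info_enum {
    tcg_temp_state state;
    uint16_t prev_copy;
    uint16_t next_copy;
    unsigned long val;
    unsigned long mask;
};

/* Layout after the patch: bool(1) + bool(1) + two uint16_t(2+2) fill the
 * same slot the enum + two uint16_t filled, modulo padding before val. */
struct info_bools {
    bool is_const;
    bool is_copy;
    uint16_t prev_copy;
    uint16_t next_copy;
    unsigned long val;
    unsigned long mask;
};
```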
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH for-2.5 02/10] tcg/optimize: add temp_is_const and temp_is_copy functions
2015-07-29 16:01 ` Alex Bennée
@ 2015-07-29 16:25 ` Aurelien Jarno
0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-29 16:25 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel, Richard Henderson
On 2015-07-29 17:01, Alex Bennée wrote:
>
> Aurelien Jarno <aurelien@aurel32.net> writes:
>
> > Add two accessor functions temp_is_const and temp_is_copy, to make the
> > code more readable and make code change easier.
> >
> > Cc: Richard Henderson <rth@twiddle.net>
> > Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> > ---
> > tcg/optimize.c | 131 ++++++++++++++++++++++++++-------------------------------
> > 1 file changed, 60 insertions(+), 71 deletions(-)
> >
> > diff --git a/tcg/optimize.c b/tcg/optimize.c
> > index 20e24b3..d2b63a4 100644
> > --- a/tcg/optimize.c
> > +++ b/tcg/optimize.c
> > @@ -52,11 +52,21 @@ struct tcg_temp_info {
> > static struct tcg_temp_info temps[TCG_MAX_TEMPS];
> > static TCGTempSet temps_used;
> >
> > +static inline bool temp_is_const(TCGArg arg)
> > +{
> > + return temps[arg].state == TCG_TEMP_CONST;
> > +}
> > +
> > +static inline bool temp_is_copy(TCGArg arg)
> > +{
> > + return temps[arg].state == TCG_TEMP_COPY;
> > +}
> > +
> > /* Reset TEMP's state to TCG_TEMP_UNDEF. If TEMP only had one copy, remove
> > the copy flag from the left temp. */
> > static void reset_temp(TCGArg temp)
> > {
> > - if (temps[temp].state == TCG_TEMP_COPY) {
> > + if (temp_is_copy(temp)) {
> > if (temps[temp].prev_copy == temps[temp].next_copy) {
> > temps[temps[temp].next_copy].state = TCG_TEMP_UNDEF;
> > } else {
> > @@ -186,8 +196,7 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
> > return true;
> > }
> >
> > - if (temps[arg1].state != TCG_TEMP_COPY
> > - || temps[arg2].state != TCG_TEMP_COPY) {
> > + if (!temp_is_copy(arg1) || !temp_is_copy(arg2)) {
> > return false;
> > }
> >
> > @@ -230,7 +239,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
> > return;
> > }
> >
> > - if (temps[src].state == TCG_TEMP_CONST) {
> > + if (temp_is_const(src)) {
> > tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
> > return;
> > }
> > @@ -248,10 +257,10 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
> > }
> > temps[dst].mask = mask;
> >
> > - assert(temps[src].state != TCG_TEMP_CONST);
> > + assert(!temp_is_const(src));
> >
> > if (s->temps[src].type == s->temps[dst].type) {
> > - if (temps[src].state != TCG_TEMP_COPY) {
> > + if (!temp_is_copy(src)) {
> > temps[src].state = TCG_TEMP_COPY;
> > temps[src].next_copy = src;
> > temps[src].prev_copy = src;
> > @@ -488,7 +497,7 @@ static bool do_constant_folding_cond_eq(TCGCond c)
> > static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
> > TCGArg y, TCGCond c)
> > {
> > - if (temps[x].state == TCG_TEMP_CONST && temps[y].state == TCG_TEMP_CONST) {
> > + if (temp_is_const(x) && temp_is_const(y)) {
> > switch (op_bits(op)) {
> > case 32:
> > return do_constant_folding_cond_32(temps[x].val, temps[y].val, c);
> > @@ -499,7 +508,7 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
> > }
> > } else if (temps_are_copies(x, y)) {
> > return do_constant_folding_cond_eq(c);
> > - } else if (temps[y].state == TCG_TEMP_CONST && temps[y].val == 0) {
> > + } else if (temp_is_const(y) && temps[y].val == 0) {
> > switch (c) {
> > case TCG_COND_LTU:
> > return 0;
> > @@ -520,12 +529,10 @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
> > TCGArg al = p1[0], ah = p1[1];
> > TCGArg bl = p2[0], bh = p2[1];
> >
> > - if (temps[bl].state == TCG_TEMP_CONST
> > - && temps[bh].state == TCG_TEMP_CONST) {
> > + if (temp_is_const(bl) && temp_is_const(bh)) {
> > uint64_t b = ((uint64_t)temps[bh].val << 32) | (uint32_t)temps[bl].val;
> >
> > - if (temps[al].state == TCG_TEMP_CONST
> > - && temps[ah].state == TCG_TEMP_CONST) {
> > + if (temp_is_const(al) && temp_is_const(ah)) {
> > uint64_t a;
> > a = ((uint64_t)temps[ah].val << 32) | (uint32_t)temps[al].val;
> > return do_constant_folding_cond_64(a, b, c);
> > @@ -551,8 +558,8 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
> > {
> > TCGArg a1 = *p1, a2 = *p2;
> > int sum = 0;
> > - sum += temps[a1].state == TCG_TEMP_CONST;
> > - sum -= temps[a2].state == TCG_TEMP_CONST;
> > + sum += temp_is_const(a1);
> > + sum -= temp_is_const(a2);
> >
> > /* Prefer the constant in second argument, and then the form
> > op a, a, b, which is better handled on non-RISC hosts. */
> > @@ -567,10 +574,10 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
> > static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
> > {
> > int sum = 0;
> > - sum += temps[p1[0]].state == TCG_TEMP_CONST;
> > - sum += temps[p1[1]].state == TCG_TEMP_CONST;
> > - sum -= temps[p2[0]].state == TCG_TEMP_CONST;
> > - sum -= temps[p2[1]].state == TCG_TEMP_CONST;
> > + sum += temp_is_const(p1[0]);
> > + sum += temp_is_const(p1[1]);
> > + sum -= temp_is_const(p2[0]);
> > + sum -= temp_is_const(p2[1]);
> > if (sum > 0) {
> > TCGArg t;
> > t = p1[0], p1[0] = p2[0], p2[0] = t;
> > @@ -620,7 +627,7 @@ void tcg_optimize(TCGContext *s)
> >
> > /* Do copy propagation */
> > for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
> > - if (temps[args[i]].state == TCG_TEMP_COPY) {
> > + if (temp_is_copy(args[i])) {
> > args[i] = find_better_copy(s, args[i]);
> > }
> > }
> > @@ -690,8 +697,7 @@ void tcg_optimize(TCGContext *s)
> > CASE_OP_32_64(sar):
> > CASE_OP_32_64(rotl):
> > CASE_OP_32_64(rotr):
> > - if (temps[args[1]].state == TCG_TEMP_CONST
> > - && temps[args[1]].val == 0) {
> > + if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
> > tcg_opt_gen_movi(s, op, args, args[0], 0);
> > continue;
> > }
> > @@ -701,7 +707,7 @@ void tcg_optimize(TCGContext *s)
> > TCGOpcode neg_op;
> > bool have_neg;
> >
> > - if (temps[args[2]].state == TCG_TEMP_CONST) {
> > + if (temp_is_const(args[2])) {
> > /* Proceed with possible constant folding. */
> > break;
> > }
> > @@ -715,8 +721,7 @@ void tcg_optimize(TCGContext *s)
> > if (!have_neg) {
> > break;
> > }
> > - if (temps[args[1]].state == TCG_TEMP_CONST
> > - && temps[args[1]].val == 0) {
> > + if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
>
> This makes me wonder if we should have:
>
> temp_is_const_val(arg, val)
>
> to wrap up these tests even more neatly
That's something that can be added, but in that case we need both
temp_is_const and temp_is_const_val as we sometimes need to test if
a temp is a constant without necessarily checking for a particular
value.
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies
2015-07-29 16:12 ` Alex Bennée
@ 2015-07-29 16:27 ` Aurelien Jarno
2015-07-29 20:42 ` Alex Bennée
0 siblings, 1 reply; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-29 16:27 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel, Richard Henderson
On 2015-07-29 17:12, Alex Bennée wrote:
>
> Aurelien Jarno <aurelien@aurel32.net> writes:
>
> > Now that copies and constants are tracked separately, we can allow
> > constant to have copies, deferring the choice to use a register or a
> > constant to the register allocation pass. This prevents this kind of
> > regular constant reloading:
> >
> > -OUT: [size=338]
> > +OUT: [size=298]
> > mov -0x4(%r14),%ebp
> > test %ebp,%ebp
> > jne 0x7ffbe9cb0ed6
> > mov $0x40002219f8,%rbp
> > mov %rbp,(%r14)
> > - mov $0x40002219f8,%rbp
> > mov $0x4000221a20,%rbx
> > mov %rbp,(%rbx)
> > mov $0x4000000000,%rbp
> > mov %rbp,(%r14)
> > - mov $0x4000000000,%rbp
> > mov $0x4000221d38,%rbx
> > mov %rbp,(%rbx)
> > mov $0x40002221a8,%rbp
> > mov %rbp,(%r14)
> > - mov $0x40002221a8,%rbp
> > mov $0x4000221d40,%rbx
> > mov %rbp,(%rbx)
> > mov $0x4000019170,%rbp
> > mov %rbp,(%r14)
> > - mov $0x4000019170,%rbp
> > mov $0x4000221d48,%rbx
> > mov %rbp,(%rbx)
> > mov $0x40000049ee,%rbp
> > mov %rbp,0x80(%r14)
> > mov %r14,%rdi
> > callq 0x7ffbe99924d0
> > mov $0x4000001680,%rbp
> > mov %rbp,0x30(%r14)
> > mov 0x10(%r14),%rbp
> > mov $0x4000001680,%rbp
> > mov %rbp,0x30(%r14)
> > mov 0x10(%r14),%rbp
> > shl $0x20,%rbp
> > mov (%r14),%rbx
> > mov %ebx,%ebx
> > mov %rbx,(%r14)
> > or %rbx,%rbp
> > mov %rbp,0x10(%r14)
> > mov %rbp,0x90(%r14)
> > mov 0x60(%r14),%rbx
> > mov %rbx,0x38(%r14)
> > mov 0x28(%r14),%rbx
> > mov $0x4000220e60,%r12
> > mov %rbx,(%r12)
> > mov $0x40002219c8,%rbx
> > mov %rbp,(%rbx)
> > mov 0x20(%r14),%rbp
> > sub $0x8,%rbp
> > mov $0x4000004a16,%rbx
> > mov %rbx,0x0(%rbp)
> > mov %rbp,0x20(%r14)
> > mov $0x19,%ebp
> > mov %ebp,0xa8(%r14)
> > mov $0x4000015110,%rbp
> > mov %rbp,0x80(%r14)
> > xor %eax,%eax
> > jmpq 0x7ffbebcae426
> > lea -0x5f6d72a(%rip),%rax # 0x7ffbe3d437b3
> > jmpq 0x7ffbebcae426
> >
> > Cc: Richard Henderson <rth@twiddle.net>
> > Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> > ---
> > tcg/optimize.c | 11 ++---------
> > 1 file changed, 2 insertions(+), 9 deletions(-)
> >
> > diff --git a/tcg/optimize.c b/tcg/optimize.c
> > index f16eb1e..48103b2 100644
> > --- a/tcg/optimize.c
> > +++ b/tcg/optimize.c
> > @@ -237,11 +237,6 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
> > return;
> > }
> >
> > - if (temp_is_const(src)) {
> > - tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
> > - return;
> > - }
> > -
>
> That looks suspicious, surely we only want to drop the move if we
> already have the const somewhere else?
That's actually the whole point of this patchset: to avoid converting
mov into movi for constant values and thus losing the link between
temps.
At this point of the code, the only way to know that the source
temp is a constant is when it has been set using a movi before.
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies
2015-07-29 16:27 ` Aurelien Jarno
@ 2015-07-29 20:42 ` Alex Bennée
2015-07-30 7:46 ` Aurelien Jarno
0 siblings, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2015-07-29 20:42 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel, Richard Henderson
Aurelien Jarno <aurelien@aurel32.net> writes:
> On 2015-07-29 17:12, Alex Bennée wrote:
>>
>> Aurelien Jarno <aurelien@aurel32.net> writes:
>>
>> > Now that copies and constants are tracked separately, we can allow
>> > constant to have copies, deferring the choice to use a register or a
>> > constant to the register allocation pass. This prevents this kind of
>> > regular constant reloading:
>> >
>> > -OUT: [size=338]
>> > +OUT: [size=298]
>> > mov -0x4(%r14),%ebp
>> > test %ebp,%ebp
>> > jne 0x7ffbe9cb0ed6
>> > mov $0x40002219f8,%rbp
>> > mov %rbp,(%r14)
>> > - mov $0x40002219f8,%rbp
>> > mov $0x4000221a20,%rbx
>> > mov %rbp,(%rbx)
>> > mov $0x4000000000,%rbp
>> > mov %rbp,(%r14)
<snip>
>> > --- a/tcg/optimize.c
>> > +++ b/tcg/optimize.c
>> > @@ -237,11 +237,6 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
>> > return;
>> > }
>> >
>> > - if (temp_is_const(src)) {
>> > - tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
>> > - return;
>> > - }
>> > -
>>
>> That looks suspicious, surely we only want to drop the move if we
>> already have the const somewhere else?
>
> That's actually the whole point of this patchset: to avoid converting
> mov into movi for constant values and thus losing the link between
> temps.
I get that, I guess I didn't follow the previous loading of the constant
value. Maybe showing the ops in the commit message will make it clearer.
>
> At this point of the code, the only way to know that the source
> temp is a constant is when it has been set using a movi before.
OK.
--
Alex Bennée
* Re: [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies
2015-07-29 20:42 ` Alex Bennée
@ 2015-07-30 7:46 ` Aurelien Jarno
0 siblings, 0 replies; 30+ messages in thread
From: Aurelien Jarno @ 2015-07-30 7:46 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel, Richard Henderson
On 2015-07-29 21:42, Alex Bennée wrote:
>
> Aurelien Jarno <aurelien@aurel32.net> writes:
>
> > On 2015-07-29 17:12, Alex Bennée wrote:
> >>
> >> Aurelien Jarno <aurelien@aurel32.net> writes:
> >>
> >> > Now that copies and constants are tracked separately, we can allow
> >> > constant to have copies, deferring the choice to use a register or a
> >> > constant to the register allocation pass. This prevents this kind of
> >> > regular constant reloading:
> >> >
> >> > -OUT: [size=338]
> >> > +OUT: [size=298]
> >> > mov -0x4(%r14),%ebp
> >> > test %ebp,%ebp
> >> > jne 0x7ffbe9cb0ed6
> >> > mov $0x40002219f8,%rbp
> >> > mov %rbp,(%r14)
> >> > - mov $0x40002219f8,%rbp
> >> > mov $0x4000221a20,%rbx
> >> > mov %rbp,(%rbx)
> >> > mov $0x4000000000,%rbp
> >> > mov %rbp,(%r14)
> <snip>
> >> > --- a/tcg/optimize.c
> >> > +++ b/tcg/optimize.c
> >> > @@ -237,11 +237,6 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
> >> > return;
> >> > }
> >> >
> >> > - if (temp_is_const(src)) {
> >> > - tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
> >> > - return;
> >> > - }
> >> > -
> >>
> >> That looks suspicious, surely we only want to drop the move if we
> >> already have the const somewhere else?
> >
> > That's actually the whole point of this patchset: to avoid converting
> > mov into movi for constant values and thus losing the link between
> > temps.
>
> I get that, I guess I didn't follow the previous loading of the constant
> value. Maybe showing the ops in the commit message will make it clearer.
Ok, I'll do that in v3.
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH for-2.5 05/10] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 05/10] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 Aurelien Jarno
@ 2015-07-31 6:31 ` Alex Bennée
0 siblings, 0 replies; 30+ messages in thread
From: Alex Bennée @ 2015-07-31 6:31 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel, Richard Henderson
Aurelien Jarno <aurelien@aurel32.net> writes:
> The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32,
> and the name in the README doesn't match the name offered to the
> frontends.
I was tempted to suggest just naming it shr_i64_i32 as the truncation
is implicit and the slightly shorter name doesn't mess the formatting up
as much. But then I see we have another usage of trunc so:
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>
> Always use the long name to make it clear it is a size changing op.
>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> tcg/README | 2 +-
> tcg/aarch64/tcg-target.h | 2 +-
> tcg/i386/tcg-target.h | 2 +-
> tcg/ia64/tcg-target.h | 2 +-
> tcg/optimize.c | 6 +++---
> tcg/ppc/tcg-target.h | 2 +-
> tcg/s390/tcg-target.h | 2 +-
> tcg/sparc/tcg-target.c | 4 ++--
> tcg/sparc/tcg-target.h | 2 +-
> tcg/tcg-op.c | 4 ++--
> tcg/tcg-opc.h | 4 ++--
> tcg/tcg.h | 2 +-
> tcg/tci/tcg-target.h | 2 +-
> 13 files changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/tcg/README b/tcg/README
> index a550ff1..61b3899 100644
> --- a/tcg/README
> +++ b/tcg/README
> @@ -314,7 +314,7 @@ This operation would be equivalent to
>
> dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
>
> -* trunc_shr_i32 t0, t1, pos
> +* trunc_shr_i64_i32 t0, t1, pos
>
> For 64-bit hosts only, right shift the 64-bit input T1 by POS and
> truncate to 32-bit output T0. Depending on the host, this may be
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index 8aec04d..dfd8801 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -70,7 +70,7 @@ typedef enum {
> #define TCG_TARGET_HAS_muls2_i32 0
> #define TCG_TARGET_HAS_muluh_i32 0
> #define TCG_TARGET_HAS_mulsh_i32 0
> -#define TCG_TARGET_HAS_trunc_shr_i32 0
> +#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
>
> #define TCG_TARGET_HAS_div_i64 1
> #define TCG_TARGET_HAS_rem_i64 1
> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> index 25b5133..dae50ba 100644
> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -102,7 +102,7 @@ extern bool have_bmi1;
> #define TCG_TARGET_HAS_mulsh_i32 0
>
> #if TCG_TARGET_REG_BITS == 64
> -#define TCG_TARGET_HAS_trunc_shr_i32 0
> +#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
> #define TCG_TARGET_HAS_div2_i64 1
> #define TCG_TARGET_HAS_rot_i64 1
> #define TCG_TARGET_HAS_ext8s_i64 1
> diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
> index a04ed81..29902f9 100644
> --- a/tcg/ia64/tcg-target.h
> +++ b/tcg/ia64/tcg-target.h
> @@ -160,7 +160,7 @@ typedef enum {
> #define TCG_TARGET_HAS_muluh_i64 0
> #define TCG_TARGET_HAS_mulsh_i32 0
> #define TCG_TARGET_HAS_mulsh_i64 0
> -#define TCG_TARGET_HAS_trunc_shr_i32 0
> +#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
>
> #define TCG_TARGET_deposit_i32_valid(ofs, len) ((len) <= 16)
> #define TCG_TARGET_deposit_i64_valid(ofs, len) ((len) <= 16)
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 48103b2..56e0a17 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -301,7 +301,7 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
> case INDEX_op_shr_i32:
> return (uint32_t)x >> (y & 31);
>
> - case INDEX_op_trunc_shr_i32:
> + case INDEX_op_trunc_shr_i64_i32:
> case INDEX_op_shr_i64:
> return (uint64_t)x >> (y & 63);
>
> @@ -880,7 +880,7 @@ void tcg_optimize(TCGContext *s)
> }
> break;
>
> - case INDEX_op_trunc_shr_i32:
> + case INDEX_op_trunc_shr_i64_i32:
> mask = (uint64_t)temps[args[1]].mask >> args[2];
> break;
>
> @@ -1028,7 +1028,7 @@ void tcg_optimize(TCGContext *s)
> }
> goto do_default;
>
> - case INDEX_op_trunc_shr_i32:
> + case INDEX_op_trunc_shr_i64_i32:
> if (temp_is_const(args[1])) {
> tmp = do_constant_folding(opc, temps[args[1]].val, args[2]);
> tcg_opt_gen_movi(s, op, args, args[0], tmp);
> diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
> index 7ce7048..b7e6861 100644
> --- a/tcg/ppc/tcg-target.h
> +++ b/tcg/ppc/tcg-target.h
> @@ -77,7 +77,7 @@ typedef enum {
> #if TCG_TARGET_REG_BITS == 64
> #define TCG_TARGET_HAS_add2_i32 0
> #define TCG_TARGET_HAS_sub2_i32 0
> -#define TCG_TARGET_HAS_trunc_shr_i32 0
> +#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
> #define TCG_TARGET_HAS_div_i64 1
> #define TCG_TARGET_HAS_rem_i64 0
> #define TCG_TARGET_HAS_rot_i64 1
> diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
> index 91576d5..50016a8 100644
> --- a/tcg/s390/tcg-target.h
> +++ b/tcg/s390/tcg-target.h
> @@ -72,7 +72,7 @@ typedef enum TCGReg {
> #define TCG_TARGET_HAS_muls2_i32 0
> #define TCG_TARGET_HAS_muluh_i32 0
> #define TCG_TARGET_HAS_mulsh_i32 0
> -#define TCG_TARGET_HAS_trunc_shr_i32 0
> +#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
>
> #define TCG_TARGET_HAS_div2_i64 1
> #define TCG_TARGET_HAS_rot_i64 1
> diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
> index 1a870a8..b23032b 100644
> --- a/tcg/sparc/tcg-target.c
> +++ b/tcg/sparc/tcg-target.c
> @@ -1413,7 +1413,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
> case INDEX_op_ext32u_i64:
> tcg_out_arithi(s, a0, a1, 0, SHIFT_SRL);
> break;
> - case INDEX_op_trunc_shr_i32:
> + case INDEX_op_trunc_shr_i64_i32:
> if (a2 == 0) {
> tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
> } else {
> @@ -1533,7 +1533,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
>
> { INDEX_op_ext32s_i64, { "R", "r" } },
> { INDEX_op_ext32u_i64, { "R", "r" } },
> - { INDEX_op_trunc_shr_i32, { "r", "R" } },
> + { INDEX_op_trunc_shr_i64_i32, { "r", "R" } },
>
> { INDEX_op_brcond_i64, { "RZ", "RJ" } },
> { INDEX_op_setcond_i64, { "R", "RZ", "RJ" } },
> diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
> index f584de4..336c47f 100644
> --- a/tcg/sparc/tcg-target.h
> +++ b/tcg/sparc/tcg-target.h
> @@ -118,7 +118,7 @@ extern bool use_vis3_instructions;
> #define TCG_TARGET_HAS_muluh_i32 0
> #define TCG_TARGET_HAS_mulsh_i32 0
>
> -#define TCG_TARGET_HAS_trunc_shr_i32 1
> +#define TCG_TARGET_HAS_trunc_shr_i64_i32 1
> #define TCG_TARGET_HAS_div_i64 1
> #define TCG_TARGET_HAS_rem_i64 0
> #define TCG_TARGET_HAS_rot_i64 0
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 45098c3..61b64db 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -1751,8 +1751,8 @@ void tcg_gen_trunc_shr_i64_i32(TCGv_i32 ret, TCGv_i64 arg, unsigned count)
> tcg_gen_mov_i32(ret, TCGV_LOW(t));
> tcg_temp_free_i64(t);
> }
> - } else if (TCG_TARGET_HAS_trunc_shr_i32) {
> - tcg_gen_op3i_i32(INDEX_op_trunc_shr_i32, ret,
> + } else if (TCG_TARGET_HAS_trunc_shr_i64_i32) {
> + tcg_gen_op3i_i32(INDEX_op_trunc_shr_i64_i32, ret,
> MAKE_TCGV_I32(GET_TCGV_I64(arg)), count);
> } else if (count == 0) {
> tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(arg)));
> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
> index 13ccb60..4a34f43 100644
> --- a/tcg/tcg-opc.h
> +++ b/tcg/tcg-opc.h
> @@ -138,8 +138,8 @@ DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
> DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
> DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
>
> -DEF(trunc_shr_i32, 1, 1, 1,
> - IMPL(TCG_TARGET_HAS_trunc_shr_i32)
> +DEF(trunc_shr_i64_i32, 1, 1, 1,
> + IMPL(TCG_TARGET_HAS_trunc_shr_i64_i32)
> | (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
>
> DEF(brcond_i64, 0, 2, 2, TCG_OPF_BB_END | IMPL64)
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 231a781..e7e33b9 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -66,7 +66,7 @@ typedef uint64_t TCGRegSet;
>
> #if TCG_TARGET_REG_BITS == 32
> /* Turn some undef macros into false macros. */
> -#define TCG_TARGET_HAS_trunc_shr_i32 0
> +#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
> #define TCG_TARGET_HAS_div_i64 0
> #define TCG_TARGET_HAS_rem_i64 0
> #define TCG_TARGET_HAS_div2_i64 0
> diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
> index cbf3f9b..8b1139b 100644
> --- a/tcg/tci/tcg-target.h
> +++ b/tcg/tci/tcg-target.h
> @@ -84,7 +84,7 @@
> #define TCG_TARGET_HAS_mulsh_i32 0
>
> #if TCG_TARGET_REG_BITS == 64
> -#define TCG_TARGET_HAS_trunc_shr_i32 0
> +#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
> #define TCG_TARGET_HAS_bswap16_i64 1
> #define TCG_TARGET_HAS_bswap32_i64 1
> #define TCG_TARGET_HAS_bswap64_i64 1
--
Alex Bennée
* Re: [Qemu-devel] [PATCH for-2.5 06/10] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 06/10] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32 Aurelien Jarno
@ 2015-07-31 7:32 ` Alex Bennée
0 siblings, 0 replies; 30+ messages in thread
From: Alex Bennée @ 2015-07-31 7:32 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel, Richard Henderson
Aurelien Jarno <aurelien@aurel32.net> writes:
> The tcg_gen_trunc_shr_i64_i32 function takes a 64-bit argument and
> returns a 32-bit value. Directly call tcg_gen_op3 with the correct
> types instead of calling tcg_gen_op3i_i32 and abusing the TCG types.
>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> tcg/tcg-op.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 61b64db..0e79fd1 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -1752,8 +1752,8 @@ void tcg_gen_trunc_shr_i64_i32(TCGv_i32 ret, TCGv_i64 arg, unsigned count)
> tcg_temp_free_i64(t);
> }
> } else if (TCG_TARGET_HAS_trunc_shr_i64_i32) {
> - tcg_gen_op3i_i32(INDEX_op_trunc_shr_i64_i32, ret,
> - MAKE_TCGV_I32(GET_TCGV_I64(arg)), count);
> + tcg_gen_op3(&tcg_ctx, INDEX_op_trunc_shr_i64_i32,
> + GET_TCGV_I32(ret), GET_TCGV_I64(arg), count);
> } else if (count == 0) {
> tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(arg)));
> } else {
--
Alex Bennée
* Re: [Qemu-devel] [PATCH for-2.5 07/10] tcg: implement real ext_i32_i64 and extu_i32_i64 ops
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 07/10] tcg: implement real ext_i32_i64 and extu_i32_i64 ops Aurelien Jarno
@ 2015-07-31 16:01 ` Alex Bennée
2015-07-31 16:11 ` Richard Henderson
0 siblings, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2015-07-31 16:01 UTC (permalink / raw)
To: Aurelien Jarno
Cc: Claudio Fontana, Stefan Weil, Claudio Fontana, qemu-devel,
Alexander Graf, Blue Swirl, Richard Henderson
Aurelien Jarno <aurelien@aurel32.net> writes:
> Implement real ext_i32_i64 and extu_i32_i64 ops. They ensure that a
> 32-bit value is always converted to a 64-bit value and not propagated
> through the register allocator or the optimizer.
>
> Cc: Andrzej Zaborowski <balrogg@gmail.com>
> Cc: Alexander Graf <agraf@suse.de>
> Cc: Blue Swirl <blauwirbel@gmail.com>
> Cc: Claudio Fontana <claudio.fontana@huawei.com>
> Cc: Claudio Fontana <claudio.fontana@gmail.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Stefan Weil <sw@weilnetz.de>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> tcg/aarch64/tcg-target.c | 4 ++++
> tcg/i386/tcg-target.c | 5 +++++
> tcg/ia64/tcg-target.c | 4 ++++
> tcg/ppc/tcg-target.c | 6 ++++++
> tcg/s390/tcg-target.c | 5 +++++
> tcg/sparc/tcg-target.c | 8 ++++++--
> tcg/tcg-op.c | 10 ++++------
> tcg/tcg-opc.h | 3 +++
> tcg/tci/tcg-target.c | 4 ++++
> tci.c | 6 ++++--
> 10 files changed, 45 insertions(+), 10 deletions(-)
>
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index b7ec4f5..7f7ab7e 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -1556,6 +1556,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
> case INDEX_op_ext16s_i32:
> tcg_out_sxt(s, ext, MO_16, a0, a1);
> break;
> + case INDEX_op_ext_i32_i64:
> case INDEX_op_ext32s_i64:
> tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a1);
> break;
> @@ -1567,6 +1568,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
> case INDEX_op_ext16u_i32:
> tcg_out_uxt(s, MO_16, a0, a1);
> break;
> + case INDEX_op_extu_i32_i64:
> case INDEX_op_ext32u_i64:
> tcg_out_movr(s, TCG_TYPE_I32, a0, a1);
So what is the difference between extu_i32_i64 and ext32u_i64? The
README skips over this particular part of the naming convention and I
wonder if we should be clearer about that before we add more ops.
> break;
> @@ -1712,6 +1714,8 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
> { INDEX_op_ext8u_i64, { "r", "r" } },
> { INDEX_op_ext16u_i64, { "r", "r" } },
> { INDEX_op_ext32u_i64, { "r", "r" } },
> + { INDEX_op_ext_i32_i64, { "r", "r" } },
> + { INDEX_op_extu_i32_i64, { "r", "r" } },
>
> { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
> { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
> diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
> index 4f40468..ff55499 100644
> --- a/tcg/i386/tcg-target.c
> +++ b/tcg/i386/tcg-target.c
> @@ -2068,9 +2068,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
> case INDEX_op_bswap64_i64:
> tcg_out_bswap64(s, args[0]);
> break;
> + case INDEX_op_extu_i32_i64:
> case INDEX_op_ext32u_i64:
> tcg_out_ext32u(s, args[0], args[1]);
> break;
> + case INDEX_op_ext_i32_i64:
> case INDEX_op_ext32s_i64:
> tcg_out_ext32s(s, args[0], args[1]);
> break;
> @@ -2205,6 +2207,9 @@ static const TCGTargetOpDef x86_op_defs[] = {
> { INDEX_op_ext16u_i64, { "r", "r" } },
> { INDEX_op_ext32u_i64, { "r", "r" } },
>
> + { INDEX_op_ext_i32_i64, { "r", "r" } },
> + { INDEX_op_extu_i32_i64, { "r", "r" } },
> +
> { INDEX_op_deposit_i64, { "Q", "0", "Q" } },
> { INDEX_op_movcond_i64, { "r", "r", "re", "r", "0" } },
>
> diff --git a/tcg/ia64/tcg-target.c b/tcg/ia64/tcg-target.c
> index 81cb9f7..71e79cf 100644
> --- a/tcg/ia64/tcg-target.c
> +++ b/tcg/ia64/tcg-target.c
> @@ -2148,9 +2148,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
> case INDEX_op_ext16u_i64:
> tcg_out_ext(s, OPC_ZXT2_I29, args[0], args[1]);
> break;
> + case INDEX_op_ext_i32_i64:
> case INDEX_op_ext32s_i64:
> tcg_out_ext(s, OPC_SXT4_I29, args[0], args[1]);
> break;
> + case INDEX_op_extu_i32_i64:
> case INDEX_op_ext32u_i64:
> tcg_out_ext(s, OPC_ZXT4_I29, args[0], args[1]);
> break;
> @@ -2301,6 +2303,8 @@ static const TCGTargetOpDef ia64_op_defs[] = {
> { INDEX_op_ext16u_i64, { "r", "rZ"} },
> { INDEX_op_ext32s_i64, { "r", "rZ"} },
> { INDEX_op_ext32u_i64, { "r", "rZ"} },
> + { INDEX_op_ext_i32_i64, { "r", "rZ" } },
> + { INDEX_op_extu_i32_i64, { "r", "rZ" } },
>
> { INDEX_op_bswap16_i64, { "r", "rZ" } },
> { INDEX_op_bswap32_i64, { "r", "rZ" } },
> diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
> index ce8d546..1672220 100644
> --- a/tcg/ppc/tcg-target.c
> +++ b/tcg/ppc/tcg-target.c
> @@ -2221,12 +2221,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
> case INDEX_op_ext16s_i64:
> c = EXTSH;
> goto gen_ext;
> + case INDEX_op_ext_i32_i64:
> case INDEX_op_ext32s_i64:
> c = EXTSW;
> goto gen_ext;
> gen_ext:
> tcg_out32(s, c | RS(args[1]) | RA(args[0]));
> break;
> + case INDEX_op_extu_i32_i64:
> + tcg_out_ext32u(s, args[0], args[1]);
> + break;
>
> case INDEX_op_setcond_i32:
> tcg_out_setcond(s, TCG_TYPE_I32, args[3], args[0], args[1], args[2],
> @@ -2503,6 +2507,8 @@ static const TCGTargetOpDef ppc_op_defs[] = {
> { INDEX_op_ext8s_i64, { "r", "r" } },
> { INDEX_op_ext16s_i64, { "r", "r" } },
> { INDEX_op_ext32s_i64, { "r", "r" } },
> + { INDEX_op_ext_i32_i64, { "r", "r" } },
> + { INDEX_op_extu_i32_i64, { "r", "r" } },
Again, I'm getting confused about the naming here - I would read both of
those as widening from 32 to 64 bit registers without sign extension.
> { INDEX_op_bswap16_i64, { "r", "r" } },
> { INDEX_op_bswap32_i64, { "r", "r" } },
> { INDEX_op_bswap64_i64, { "r", "r" } },
> diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
> index b3433ce..d4db6d3 100644
> --- a/tcg/s390/tcg-target.c
> +++ b/tcg/s390/tcg-target.c
> @@ -2106,6 +2106,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
> case INDEX_op_ext16s_i64:
> tgen_ext16s(s, TCG_TYPE_I64, args[0], args[1]);
> break;
> + case INDEX_op_ext_i32_i64:
> case INDEX_op_ext32s_i64:
> tgen_ext32s(s, args[0], args[1]);
> break;
> @@ -2115,6 +2116,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
> case INDEX_op_ext16u_i64:
> tgen_ext16u(s, TCG_TYPE_I64, args[0], args[1]);
> break;
> + case INDEX_op_extu_i32_i64:
> case INDEX_op_ext32u_i64:
> tgen_ext32u(s, args[0], args[1]);
> break;
> @@ -2267,6 +2269,9 @@ static const TCGTargetOpDef s390_op_defs[] = {
> { INDEX_op_ext32s_i64, { "r", "r" } },
> { INDEX_op_ext32u_i64, { "r", "r" } },
>
> + { INDEX_op_ext_i32_i64, { "r", "r" } },
> + { INDEX_op_extu_i32_i64, { "r", "r" } },
> +
> { INDEX_op_bswap16_i64, { "r", "r" } },
> { INDEX_op_bswap32_i64, { "r", "r" } },
> { INDEX_op_bswap64_i64, { "r", "r" } },
> diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
> index b23032b..fe75af0 100644
> --- a/tcg/sparc/tcg-target.c
> +++ b/tcg/sparc/tcg-target.c
> @@ -1407,9 +1407,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
> case INDEX_op_divu_i64:
> c = ARITH_UDIVX;
> goto gen_arith;
> + case INDEX_op_ext_i32_i64:
> case INDEX_op_ext32s_i64:
> tcg_out_arithi(s, a0, a1, 0, SHIFT_SRA);
> break;
> + case INDEX_op_extu_i32_i64:
> case INDEX_op_ext32u_i64:
> tcg_out_arithi(s, a0, a1, 0, SHIFT_SRL);
> break;
> @@ -1531,8 +1533,10 @@ static const TCGTargetOpDef sparc_op_defs[] = {
> { INDEX_op_neg_i64, { "R", "RJ" } },
> { INDEX_op_not_i64, { "R", "RJ" } },
>
> - { INDEX_op_ext32s_i64, { "R", "r" } },
> - { INDEX_op_ext32u_i64, { "R", "r" } },
> + { INDEX_op_ext32s_i64, { "R", "R" } },
> + { INDEX_op_ext32u_i64, { "R", "R" } },
> + { INDEX_op_ext_i32_i64, { "R", "r" } },
> + { INDEX_op_extu_i32_i64, { "R", "r" } },
> { INDEX_op_trunc_shr_i64_i32, { "r", "R" } },
>
> { INDEX_op_brcond_i64, { "RZ", "RJ" } },
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 0e79fd1..7114315 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -1770,9 +1770,8 @@ void tcg_gen_extu_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
> tcg_gen_mov_i32(TCGV_LOW(ret), arg);
> tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
> } else {
> - /* Note: we assume the target supports move between
> - 32 and 64 bit registers. */
> - tcg_gen_ext32u_i64(ret, MAKE_TCGV_I64(GET_TCGV_I32(arg)));
> + tcg_gen_op2(&tcg_ctx, INDEX_op_extu_i32_i64,
> + GET_TCGV_I64(ret), GET_TCGV_I32(arg));
> }
> }
>
> @@ -1782,9 +1781,8 @@ void tcg_gen_ext_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
> tcg_gen_mov_i32(TCGV_LOW(ret), arg);
> tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
> } else {
> - /* Note: we assume the target supports move between
> - 32 and 64 bit registers. */
> - tcg_gen_ext32s_i64(ret, MAKE_TCGV_I64(GET_TCGV_I32(arg)));
> + tcg_gen_op2(&tcg_ctx, INDEX_op_ext_i32_i64,
> + GET_TCGV_I64(ret), GET_TCGV_I32(arg));
> }
> }
>
> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
> index 4a34f43..f721a5a 100644
> --- a/tcg/tcg-opc.h
> +++ b/tcg/tcg-opc.h
> @@ -138,6 +138,9 @@ DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
> DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
> DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
>
> +/* size changing ops */
> +DEF(ext_i32_i64, 1, 1, 0, IMPL64)
> +DEF(extu_i32_i64, 1, 1, 0, IMPL64)
> DEF(trunc_shr_i64_i32, 1, 1, 1,
> IMPL(TCG_TARGET_HAS_trunc_shr_i64_i32)
> | (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
> diff --git a/tcg/tci/tcg-target.c b/tcg/tci/tcg-target.c
> index 83472db..bbb54d4 100644
> --- a/tcg/tci/tcg-target.c
> +++ b/tcg/tci/tcg-target.c
> @@ -210,6 +210,8 @@ static const TCGTargetOpDef tcg_target_op_defs[] = {
> #if TCG_TARGET_HAS_ext32u_i64
> { INDEX_op_ext32u_i64, { R, R } },
> #endif
> + { INDEX_op_ext_i32_i64, { R, R } },
> + { INDEX_op_extu_i32_i64, { R, R } },
> #if TCG_TARGET_HAS_bswap16_i64
> { INDEX_op_bswap16_i64, { R, R } },
> #endif
> @@ -701,6 +703,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
> case INDEX_op_ext16u_i64: /* Optional (TCG_TARGET_HAS_ext16u_i64). */
> case INDEX_op_ext32s_i64: /* Optional (TCG_TARGET_HAS_ext32s_i64). */
> case INDEX_op_ext32u_i64: /* Optional (TCG_TARGET_HAS_ext32u_i64). */
> + case INDEX_op_ext_i32_i64:
> + case INDEX_op_extu_i32_i64:
> #endif /* TCG_TARGET_REG_BITS == 64 */
> case INDEX_op_neg_i32: /* Optional (TCG_TARGET_HAS_neg_i32). */
> case INDEX_op_not_i32: /* Optional (TCG_TARGET_HAS_not_i32). */
> diff --git a/tci.c b/tci.c
> index 8444948..3d6d177 100644
> --- a/tci.c
> +++ b/tci.c
> @@ -1033,18 +1033,20 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
> #endif
> #if TCG_TARGET_HAS_ext32s_i64
> case INDEX_op_ext32s_i64:
> +#endif
> + case INDEX_op_ext_i32_i64:
> t0 = *tb_ptr++;
> t1 = tci_read_r32s(&tb_ptr);
> tci_write_reg64(t0, t1);
> break;
> -#endif
> #if TCG_TARGET_HAS_ext32u_i64
> case INDEX_op_ext32u_i64:
> +#endif
> + case INDEX_op_extu_i32_i64:
> t0 = *tb_ptr++;
> t1 = tci_read_r32(&tb_ptr);
> tci_write_reg64(t0, t1);
> break;
> -#endif
> #if TCG_TARGET_HAS_bswap16_i64
> case INDEX_op_bswap16_i64:
> TODO();
--
Alex Bennée
* Re: [Qemu-devel] [PATCH for-2.5 10/10] tcg: update README about size changing ops
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 10/10] tcg: update README about size changing ops Aurelien Jarno
@ 2015-07-31 16:02 ` Alex Bennée
0 siblings, 0 replies; 30+ messages in thread
From: Alex Bennée @ 2015-07-31 16:02 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel, Richard Henderson
Aurelien Jarno <aurelien@aurel32.net> writes:
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
> tcg/README | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/tcg/README b/tcg/README
> index 61b3899..a22f251 100644
> --- a/tcg/README
> +++ b/tcg/README
> @@ -466,13 +466,25 @@ On a 32 bit target, all 64 bit operations are converted to 32 bits. A
> few specific operations must be implemented to allow it (see add2_i32,
> sub2_i32, brcond2_i32).
>
> +On a 64 bit target, the values are transfered between 32 and 64-bit
> +registers using the following ops:
> +- trunc_shr_i64_i32
> +- ext_i32_i64
> +- extu_i32_i64
> +
> +They ensure that the values are correctly truncated or extended when
> +moved from a 32-bit to a 64-bit register or vice-versa. Note that the
> +trunc_shr_i64_i32 is an optional op. It is not necessary to implement
> +it if all the following conditions are met:
> +- 64-bit registers can hold 32-bit values
> +- 32-bit values in a 64-bit register do not need to stay zero or
> + sign extended
> +- all 32-bit TCG ops ignore the high part of 64-bit registers
> +
> Floating point operations are not supported in this version. A
> previous incarnation of the code generator had full support of them,
> but it is better to concentrate on integer operations first.
>
> -On a 64 bit target, no assumption is made in TCG about the storage of
> -the 32 bit values in 64 bit registers.
> -
> 4.2) Constraints
>
> GCC like constraints are used to define the constraints of every
I would include patches to the README with the respective patches that
introduced the changes. It is basically an extended comment ;-)
--
Alex Bennée
* Re: [Qemu-devel] [PATCH for-2.5 07/10] tcg: implement real ext_i32_i64 and extu_i32_i64 ops
2015-07-31 16:01 ` Alex Bennée
@ 2015-07-31 16:11 ` Richard Henderson
0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2015-07-31 16:11 UTC (permalink / raw)
To: Alex Bennée, Aurelien Jarno
Cc: Claudio Fontana, Stefan Weil, Claudio Fontana, qemu-devel,
Alexander Graf, Blue Swirl
On 07/31/2015 09:01 AM, Alex Bennée wrote:
>
> Aurelien Jarno <aurelien@aurel32.net> writes:
>
>> Implement real ext_i32_i64 and extu_i32_i64 ops. They ensure that a
>> 32-bit value is always converted to a 64-bit value and not propagated
>> through the register allocator or the optimizer.
>>
>> Cc: Andrzej Zaborowski <balrogg@gmail.com>
>> Cc: Alexander Graf <agraf@suse.de>
>> Cc: Blue Swirl <blauwirbel@gmail.com>
>> Cc: Claudio Fontana <claudio.fontana@huawei.com>
>> Cc: Claudio Fontana <claudio.fontana@gmail.com>
>> Cc: Richard Henderson <rth@twiddle.net>
>> Cc: Stefan Weil <sw@weilnetz.de>
>> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
>> ---
>> tcg/aarch64/tcg-target.c | 4 ++++
>> tcg/i386/tcg-target.c | 5 +++++
>> tcg/ia64/tcg-target.c | 4 ++++
>> tcg/ppc/tcg-target.c | 6 ++++++
>> tcg/s390/tcg-target.c | 5 +++++
>> tcg/sparc/tcg-target.c | 8 ++++++--
>> tcg/tcg-op.c | 10 ++++------
>> tcg/tcg-opc.h | 3 +++
>> tcg/tci/tcg-target.c | 4 ++++
>> tci.c | 6 ++++--
>> 10 files changed, 45 insertions(+), 10 deletions(-)
>>
>> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
>> index b7ec4f5..7f7ab7e 100644
>> --- a/tcg/aarch64/tcg-target.c
>> +++ b/tcg/aarch64/tcg-target.c
>> @@ -1556,6 +1556,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>> case INDEX_op_ext16s_i32:
>> tcg_out_sxt(s, ext, MO_16, a0, a1);
>> break;
>> + case INDEX_op_ext_i32_i64:
>> case INDEX_op_ext32s_i64:
>> tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a1);
>> break;
>> @@ -1567,6 +1568,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>> case INDEX_op_ext16u_i32:
>> tcg_out_uxt(s, MO_16, a0, a1);
>> break;
>> + case INDEX_op_extu_i32_i64:
>> case INDEX_op_ext32u_i64:
>> tcg_out_movr(s, TCG_TYPE_I32, a0, a1);
>
> So what is the difference between extu_i32_i64 and ext32u_i64. The
> README skips over this particular part of the naming convention and I
> wonder if we should be clearer about that before we add more ops.
The size of the input, for one. The possibility of eliding is another.
Our current plan for x86_64 and aarch64 is to canonicalize all 32-bit
quantities to be zero-extended in the register. Primarily because for both
platforms this can be done for free. Thus exts_i32_i64 and extrl_i64_i32 will
require implementation, but extu_i32_i64 will be replaced by a move.
Similarly, mips64 would keep values sign-extended (as *required* by the
standard for all 32-bit operations), and thus exts_i32_i64 would be replaced by
a move.
Other targets will probably make extrl_i64_i32 be the move, since the 32-bit
ops don't re-extend for free.
r~
end of thread, other threads:[~2015-07-31 16:12 UTC | newest]
Thread overview: 30+ messages
2015-07-24 16:30 [Qemu-devel] [PATCH for-2.5 00/10] tcg: improve optimizer Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 01/10] tcg/optimize: optimize temps tracking Aurelien Jarno
2015-07-27 8:21 ` Paolo Bonzini
2015-07-27 9:09 ` Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 02/10] tcg/optimize: add temp_is_const and temp_is_copy functions Aurelien Jarno
2015-07-29 16:01 ` Alex Bennée
2015-07-29 16:25 ` Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 03/10] tcg/optimize: track const/copy status separately Aurelien Jarno
2015-07-27 8:25 ` Paolo Bonzini
2015-07-27 9:10 ` Aurelien Jarno
2015-07-29 16:10 ` Alex Bennée
2015-07-29 16:25 ` Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 04/10] tcg/optimize: allow constant to have copies Aurelien Jarno
2015-07-24 20:15 ` Richard Henderson
2015-07-24 22:56 ` Aurelien Jarno
2015-07-29 16:12 ` Alex Bennée
2015-07-29 16:27 ` Aurelien Jarno
2015-07-29 20:42 ` Alex Bennée
2015-07-30 7:46 ` Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 05/10] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 Aurelien Jarno
2015-07-31 6:31 ` Alex Bennée
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 06/10] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32 Aurelien Jarno
2015-07-31 7:32 ` Alex Bennée
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 07/10] tcg: implement real ext_i32_i64 and extu_i32_i64 ops Aurelien Jarno
2015-07-31 16:01 ` Alex Bennée
2015-07-31 16:11 ` Richard Henderson
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 08/10] tcg/optimize: add optimizations for " Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 09/10] tcg/optimize: do not remember garbage high bits for 32-bit ops Aurelien Jarno
2015-07-24 16:30 ` [Qemu-devel] [PATCH for-2.5 10/10] tcg: update README about size changing ops Aurelien Jarno
2015-07-31 16:02 ` Alex Bennée