* [Qemu-devel] [PATCH 00/17] queued tcg improvements
@ 2015-08-17 19:38 Richard Henderson
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 01/17] tcg/optimize: fix constant signedness Richard Henderson
                   ` (16 more replies)
  0 siblings, 17 replies; 20+ messages in thread
From: Richard Henderson @ 2015-08-17 19:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

This pull includes three independent patch sets, which were
all posted during the 2.4 freeze.

The first is a set of algorithmic improvements to tcg/optimize,
improving both its runtime and its tracking of constants.

The second is improvements to the representation of 32<->64-bit
size-changing operations.  Still to do here is to investigate how
these might best be applied to each tcg host.

The third is improvements to how guest unaligned accesses are
implemented in softmmu mode, for the four supported host processors
that themselves implement unaligned accesses.


r~


The following changes since commit 074a9925e1cfd659d5376dcaccd1436d3840e611:

  Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging (2015-08-14 16:52:34 +0100)

are available in the git repository at:

  git://github.com/rth7680/qemu.git tags/pull-tcg-20150817

for you to fetch changes up to 845fa7960c4fc7ab936e7ec45076ef230266d6cd:

  tcg/aarch64: Use softmmu fast path for unaligned accesses (2015-08-17 12:18:05 -0700)

----------------------------------------------------------------
Collected tcg improvements for 2.5

----------------------------------------------------------------
Aurelien Jarno (12):
      tcg/optimize: fix constant signedness
      tcg/optimize: optimize temps tracking
      tcg/optimize: add temp_is_const and temp_is_copy functions
      tcg/optimize: track const/copy status separately
      tcg/optimize: allow constant to have copies
      tcg: rename trunc_shr_i32 into trunc_shr_i64_i32
      tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32
      tcg: implement real ext_i32_i64 and extu_i32_i64 ops
      tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 ops
      tcg/optimize: do not remember garbage high bits for 32-bit ops
      tcg: update README about size changing ops
      tcg/i386: use softmmu fast path for unaligned accesses

Benjamin Herrenschmidt (1):
      tcg/ppc: Improve unaligned load/store handling on 64-bit backend

Richard Henderson (5):
      Merge remote-tracking branch 'aurel32/tcg-optimizer' into tcg-next-25
      tcg: Split trunc_shr_i32 opcode into extr[lh]_i64_i32
      tcg: Remove tcg_gen_trunc_i64_i32
      tcg/s390: Use softmmu fast path for unaligned accesses
      tcg/aarch64: Use softmmu fast path for unaligned accesses

 target-alpha/translate.c      |   4 +-
 target-arm/translate-a64.c    |  60 ++++-----
 target-arm/translate.c        |  46 +++----
 target-cris/translate.c       |   4 +-
 target-m68k/translate.c       |   2 +-
 target-microblaze/translate.c |   8 +-
 target-mips/translate.c       |   4 +-
 target-openrisc/translate.c   |  22 ++--
 target-s390x/translate.c      |  30 ++---
 target-sh4/translate.c        |   4 +-
 target-sparc/translate.c      |  14 +--
 target-tricore/translate.c    |  32 ++---
 target-xtensa/translate.c     |   2 +-
 tcg/README                    |  32 +++--
 tcg/aarch64/tcg-target.c      |  41 ++++--
 tcg/aarch64/tcg-target.h      |   3 +-
 tcg/i386/tcg-target.c         |  27 ++--
 tcg/i386/tcg-target.h         |   3 +-
 tcg/ia64/tcg-target.c         |   4 +
 tcg/ia64/tcg-target.h         |   3 +-
 tcg/optimize.c                | 283 +++++++++++++++++++-----------------------
 tcg/ppc/tcg-target.c          |  47 +++++--
 tcg/ppc/tcg-target.h          |   3 +-
 tcg/s390/tcg-target.c         |  31 ++++-
 tcg/s390/tcg-target.h         |   3 +-
 tcg/sparc/tcg-target.c        |  22 ++--
 tcg/sparc/tcg-target.h        |   3 +-
 tcg/tcg-op.c                  |  48 ++++---
 tcg/tcg-op.h                  |  12 +-
 tcg/tcg-opc.h                 |  10 +-
 tcg/tcg.h                     |   3 +-
 tcg/tci/tcg-target.c          |   4 +
 tcg/tci/tcg-target.h          |   3 +-
 tci.c                         |   6 +-
 34 files changed, 456 insertions(+), 367 deletions(-)


* [Qemu-devel] [PATCH 01/17] tcg/optimize: fix constant signedness
  2015-08-17 19:38 [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
@ 2015-08-17 19:38 ` Richard Henderson
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 02/17] tcg/optimize: optimize temps tracking Richard Henderson
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-08-17 19:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Aurelien Jarno

From: Aurelien Jarno <aurelien@aurel32.net>

By convention, on a 64-bit host TCG internally stores 32-bit constants
as sign-extended. This is not the case in the optimizer when a 32-bit
constant is folded.

This doesn't seem to have any consequence beyond suboptimal code
generation. For instance the x86 backend assumes sign-extended constants,
and in some rare cases uses a 32-bit unsigned immediate 0xffffffff
instead of an 8-bit signed immediate 0xff for the constant -1. This is
with a ppc guest:

before
------

 ---- 0x9f29cc
 movi_i32 tmp1,$0xffffffff
 movi_i32 tmp2,$0x0
 add2_i32 tmp0,CA,CA,tmp2,r6,tmp2
 add2_i32 tmp0,CA,tmp0,CA,tmp1,tmp2
 mov_i32 r10,tmp0

0x7fd8c7dfe90c:  xor    %ebp,%ebp
0x7fd8c7dfe90e:  mov    %ebp,%r11d
0x7fd8c7dfe911:  mov    0x18(%r14),%r9d
0x7fd8c7dfe915:  add    %r9d,%r10d
0x7fd8c7dfe918:  adc    %ebp,%r11d
0x7fd8c7dfe91b:  add    $0xffffffff,%r10d
0x7fd8c7dfe922:  adc    %ebp,%r11d
0x7fd8c7dfe925:  mov    %r11d,0x134(%r14)
0x7fd8c7dfe92c:  mov    %r10d,0x28(%r14)

after
-----

 ---- 0x9f29cc
 movi_i32 tmp1,$0xffffffffffffffff
 movi_i32 tmp2,$0x0
 add2_i32 tmp0,CA,CA,tmp2,r6,tmp2
 add2_i32 tmp0,CA,tmp0,CA,tmp1,tmp2
 mov_i32 r10,tmp0

0x7f37010d490c:  xor    %ebp,%ebp
0x7f37010d490e:  mov    %ebp,%r11d
0x7f37010d4911:  mov    0x18(%r14),%r9d
0x7f37010d4915:  add    %r9d,%r10d
0x7f37010d4918:  adc    %ebp,%r11d
0x7f37010d491b:  add    $0xffffffffffffffff,%r10d
0x7f37010d491f:  adc    %ebp,%r11d
0x7f37010d4922:  mov    %r11d,0x134(%r14)
0x7f37010d4929:  mov    %r10d,0x28(%r14)
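
To see the convention in isolation, here is a small standalone sketch (not
part of the patch): on a 64-bit host, masking keeps only the low 32 bits of
a folded constant, while casting through int32_t produces the sign-extended
form that the backends expect.

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      uint64_t res = (uint64_t)-1;            /* folded 32-bit result -1 */
      uint64_t masked = res & 0xffffffffu;    /* old: 0x00000000ffffffff */
      uint64_t extended = (int32_t)res;       /* new: 0xffffffffffffffff */
      printf("masked=%016" PRIx64 " extended=%016" PRIx64 "\n",
             masked, extended);
      return 0;
  }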

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1436544211-2769-2-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/optimize.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 18283cf..cd0e793 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -395,7 +395,7 @@ static TCGArg do_constant_folding(TCGOpcode op, TCGArg x, TCGArg y)
 {
     TCGArg res = do_constant_folding_2(op, x, y);
     if (op_bits(op) == 32) {
-        res &= 0xffffffff;
+        res = (int32_t)res;
     }
     return res;
 }
@@ -1128,8 +1128,8 @@ void tcg_optimize(TCGContext *s)
 
                 rl = args[0];
                 rh = args[1];
-                tcg_opt_gen_movi(s, op, args, rl, (uint32_t)a);
-                tcg_opt_gen_movi(s, op2, args2, rh, (uint32_t)(a >> 32));
+                tcg_opt_gen_movi(s, op, args, rl, (int32_t)a);
+                tcg_opt_gen_movi(s, op2, args2, rh, (int32_t)(a >> 32));
 
                 /* We've done all we need to do with the movi.  Skip it.  */
                 oi_next = op2->next;
@@ -1149,8 +1149,8 @@ void tcg_optimize(TCGContext *s)
 
                 rl = args[0];
                 rh = args[1];
-                tcg_opt_gen_movi(s, op, args, rl, (uint32_t)r);
-                tcg_opt_gen_movi(s, op2, args2, rh, (uint32_t)(r >> 32));
+                tcg_opt_gen_movi(s, op, args, rl, (int32_t)r);
+                tcg_opt_gen_movi(s, op2, args2, rh, (int32_t)(r >> 32));
 
                 /* We've done all we need to do with the movi.  Skip it.  */
                 oi_next = op2->next;
-- 
2.4.3


* [Qemu-devel] [PATCH 02/17] tcg/optimize: optimize temps tracking
  2015-08-17 19:38 [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 01/17] tcg/optimize: fix constant signedness Richard Henderson
@ 2015-08-17 19:38 ` Richard Henderson
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 03/17] tcg/optimize: add temp_is_const and temp_is_copy functions Richard Henderson
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-08-17 19:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Aurelien Jarno

From: Aurelien Jarno <aurelien@aurel32.net>

The tcg_temp_info structure uses 24 bytes per temp. Now that we emulate
vector registers on most guests, it's not uncommon to have more than 100
used temps. This means we have to initialize more than 2kB at least twice
per TB, often more when there are a few goto_tb.

Instead, use a TCGTempSet bit array to track which temps are in use in
the current basic block. This means there are only around 16 bytes to
initialize.

This improves the boot time of a MIPS guest on an x86-64 host by around
7% and moves tcg_optimize out of the top of the profiler list.
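
As a rough standalone sketch of the mechanism (names and sizes are
illustrative, not the actual QEMU code): clear only a small bitmap per
basic block, and initialize a temp's tracking info lazily the first time
it appears in an op.

  #include <string.h>

  #define MAX_TEMPS     512
  #define BITS_PER_WORD (8 * sizeof(unsigned long))

  static unsigned long used[(MAX_TEMPS + BITS_PER_WORD - 1) / BITS_PER_WORD];
  static struct { int state; long mask; } info[MAX_TEMPS];

  /* Start of a basic block: clear only the words covering nb_temps bits
     (a few dozen bytes), instead of re-initializing every temp's state. */
  static void reset_all_temps_sketch(int nb_temps)
  {
      memset(used, 0,
             ((nb_temps + BITS_PER_WORD - 1) / BITS_PER_WORD)
             * sizeof(unsigned long));
  }

  /* Per op argument: initialize the temp only on its first use. */
  static void init_temp_sketch(unsigned idx)
  {
      unsigned long bit = 1UL << (idx % BITS_PER_WORD);
      if (!(used[idx / BITS_PER_WORD] & bit)) {
          info[idx].state = 0;     /* TCG_TEMP_UNDEF */
          info[idx].mask = -1;
          used[idx / BITS_PER_WORD] |= bit;
      }
  }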

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index cd0e793..425c14b 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -50,6 +50,7 @@ struct tcg_temp_info {
 };
 
 static struct tcg_temp_info temps[TCG_MAX_TEMPS];
+static TCGTempSet temps_used;
 
 /* Reset TEMP's state to TCG_TEMP_UNDEF.  If TEMP only had one copy, remove
    the copy flag from the left temp.  */
@@ -67,6 +68,22 @@ static void reset_temp(TCGArg temp)
     temps[temp].mask = -1;
 }
 
+/* Reset all temporaries, given that there are NB_TEMPS of them.  */
+static void reset_all_temps(int nb_temps)
+{
+    bitmap_zero(temps_used.l, nb_temps);
+}
+
+/* Initialize and activate a temporary.  */
+static void init_temp_info(TCGArg temp)
+{
+    if (!test_bit(temp, temps_used.l)) {
+        temps[temp].state = TCG_TEMP_UNDEF;
+        temps[temp].mask = -1;
+        set_bit(temp, temps_used.l);
+    }
+}
+
 static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
                                 TCGOpcode opc, int nargs)
 {
@@ -98,16 +115,6 @@ static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
     return new_op;
 }
 
-/* Reset all temporaries, given that there are NB_TEMPS of them.  */
-static void reset_all_temps(int nb_temps)
-{
-    int i;
-    for (i = 0; i < nb_temps; i++) {
-        temps[i].state = TCG_TEMP_UNDEF;
-        temps[i].mask = -1;
-    }
-}
-
 static int op_bits(TCGOpcode op)
 {
     const TCGOpDef *def = &tcg_op_defs[op];
@@ -606,6 +613,11 @@ void tcg_optimize(TCGContext *s)
             nb_iargs = def->nb_iargs;
         }
 
+        /* Initialize the temps that are going to be used */
+        for (i = 0; i < nb_oargs + nb_iargs; i++) {
+            init_temp_info(args[i]);
+        }
+
         /* Do copy propagation */
         for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
             if (temps[args[i]].state == TCG_TEMP_COPY) {
@@ -1299,7 +1311,9 @@ void tcg_optimize(TCGContext *s)
             if (!(args[nb_oargs + nb_iargs + 1]
                   & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
                 for (i = 0; i < nb_globals; i++) {
-                    reset_temp(i);
+                    if (test_bit(i, temps_used.l)) {
+                        reset_temp(i);
+                    }
                 }
             }
             goto do_reset_output;
-- 
2.4.3


* [Qemu-devel] [PATCH 03/17] tcg/optimize: add temp_is_const and temp_is_copy functions
  2015-08-17 19:38 [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 01/17] tcg/optimize: fix constant signedness Richard Henderson
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 02/17] tcg/optimize: optimize temps tracking Richard Henderson
@ 2015-08-17 19:38 ` Richard Henderson
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 04/17] tcg/optimize: track const/copy status separately Richard Henderson
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-08-17 19:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Aurelien Jarno

From: Aurelien Jarno <aurelien@aurel32.net>

Add two accessor functions, temp_is_const and temp_is_copy, to make the
code more readable and future changes easier.

Cc: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 131 ++++++++++++++++++++++++++-------------------------------
 1 file changed, 60 insertions(+), 71 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 425c14b..719fee2 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -52,11 +52,21 @@ struct tcg_temp_info {
 static struct tcg_temp_info temps[TCG_MAX_TEMPS];
 static TCGTempSet temps_used;
 
+static inline bool temp_is_const(TCGArg arg)
+{
+    return temps[arg].state == TCG_TEMP_CONST;
+}
+
+static inline bool temp_is_copy(TCGArg arg)
+{
+    return temps[arg].state == TCG_TEMP_COPY;
+}
+
 /* Reset TEMP's state to TCG_TEMP_UNDEF.  If TEMP only had one copy, remove
    the copy flag from the left temp.  */
 static void reset_temp(TCGArg temp)
 {
-    if (temps[temp].state == TCG_TEMP_COPY) {
+    if (temp_is_copy(temp)) {
         if (temps[temp].prev_copy == temps[temp].next_copy) {
             temps[temps[temp].next_copy].state = TCG_TEMP_UNDEF;
         } else {
@@ -186,8 +196,7 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
         return true;
     }
 
-    if (temps[arg1].state != TCG_TEMP_COPY
-        || temps[arg2].state != TCG_TEMP_COPY) {
+    if (!temp_is_copy(arg1) || !temp_is_copy(arg2)) {
         return false;
     }
 
@@ -230,7 +239,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
         return;
     }
 
-    if (temps[src].state == TCG_TEMP_CONST) {
+    if (temp_is_const(src)) {
         tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
         return;
     }
@@ -248,10 +257,10 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
     }
     temps[dst].mask = mask;
 
-    assert(temps[src].state != TCG_TEMP_CONST);
+    assert(!temp_is_const(src));
 
     if (s->temps[src].type == s->temps[dst].type) {
-        if (temps[src].state != TCG_TEMP_COPY) {
+        if (!temp_is_copy(src)) {
             temps[src].state = TCG_TEMP_COPY;
             temps[src].next_copy = src;
             temps[src].prev_copy = src;
@@ -488,7 +497,7 @@ static bool do_constant_folding_cond_eq(TCGCond c)
 static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
                                        TCGArg y, TCGCond c)
 {
-    if (temps[x].state == TCG_TEMP_CONST && temps[y].state == TCG_TEMP_CONST) {
+    if (temp_is_const(x) && temp_is_const(y)) {
         switch (op_bits(op)) {
         case 32:
             return do_constant_folding_cond_32(temps[x].val, temps[y].val, c);
@@ -499,7 +508,7 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
         }
     } else if (temps_are_copies(x, y)) {
         return do_constant_folding_cond_eq(c);
-    } else if (temps[y].state == TCG_TEMP_CONST && temps[y].val == 0) {
+    } else if (temp_is_const(y) && temps[y].val == 0) {
         switch (c) {
         case TCG_COND_LTU:
             return 0;
@@ -520,12 +529,10 @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
     TCGArg al = p1[0], ah = p1[1];
     TCGArg bl = p2[0], bh = p2[1];
 
-    if (temps[bl].state == TCG_TEMP_CONST
-        && temps[bh].state == TCG_TEMP_CONST) {
+    if (temp_is_const(bl) && temp_is_const(bh)) {
         uint64_t b = ((uint64_t)temps[bh].val << 32) | (uint32_t)temps[bl].val;
 
-        if (temps[al].state == TCG_TEMP_CONST
-            && temps[ah].state == TCG_TEMP_CONST) {
+        if (temp_is_const(al) && temp_is_const(ah)) {
             uint64_t a;
             a = ((uint64_t)temps[ah].val << 32) | (uint32_t)temps[al].val;
             return do_constant_folding_cond_64(a, b, c);
@@ -551,8 +558,8 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
 {
     TCGArg a1 = *p1, a2 = *p2;
     int sum = 0;
-    sum += temps[a1].state == TCG_TEMP_CONST;
-    sum -= temps[a2].state == TCG_TEMP_CONST;
+    sum += temp_is_const(a1);
+    sum -= temp_is_const(a2);
 
     /* Prefer the constant in second argument, and then the form
        op a, a, b, which is better handled on non-RISC hosts. */
@@ -567,10 +574,10 @@ static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
 static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 {
     int sum = 0;
-    sum += temps[p1[0]].state == TCG_TEMP_CONST;
-    sum += temps[p1[1]].state == TCG_TEMP_CONST;
-    sum -= temps[p2[0]].state == TCG_TEMP_CONST;
-    sum -= temps[p2[1]].state == TCG_TEMP_CONST;
+    sum += temp_is_const(p1[0]);
+    sum += temp_is_const(p1[1]);
+    sum -= temp_is_const(p2[0]);
+    sum -= temp_is_const(p2[1]);
     if (sum > 0) {
         TCGArg t;
         t = p1[0], p1[0] = p2[0], p2[0] = t;
@@ -620,7 +627,7 @@ void tcg_optimize(TCGContext *s)
 
         /* Do copy propagation */
         for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-            if (temps[args[i]].state == TCG_TEMP_COPY) {
+            if (temp_is_copy(args[i])) {
                 args[i] = find_better_copy(s, args[i]);
             }
         }
@@ -690,8 +697,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(sar):
         CASE_OP_32_64(rotl):
         CASE_OP_32_64(rotr):
-            if (temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[1]].val == 0) {
+            if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
                 tcg_opt_gen_movi(s, op, args, args[0], 0);
                 continue;
             }
@@ -701,7 +707,7 @@ void tcg_optimize(TCGContext *s)
                 TCGOpcode neg_op;
                 bool have_neg;
 
-                if (temps[args[2]].state == TCG_TEMP_CONST) {
+                if (temp_is_const(args[2])) {
                     /* Proceed with possible constant folding. */
                     break;
                 }
@@ -715,8 +721,7 @@ void tcg_optimize(TCGContext *s)
                 if (!have_neg) {
                     break;
                 }
-                if (temps[args[1]].state == TCG_TEMP_CONST
-                    && temps[args[1]].val == 0) {
+                if (temp_is_const(args[1]) && temps[args[1]].val == 0) {
                     op->opc = neg_op;
                     reset_temp(args[0]);
                     args[1] = args[2];
@@ -726,34 +731,30 @@ void tcg_optimize(TCGContext *s)
             break;
         CASE_OP_32_64(xor):
         CASE_OP_32_64(nand):
-            if (temps[args[1]].state != TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[2]].val == -1) {
+            if (!temp_is_const(args[1])
+                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
                 i = 1;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(nor):
-            if (temps[args[1]].state != TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[2]].val == 0) {
+            if (!temp_is_const(args[1])
+                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
                 i = 1;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(andc):
-            if (temps[args[2]].state != TCG_TEMP_CONST
-                && temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[1]].val == -1) {
+            if (!temp_is_const(args[2])
+                && temp_is_const(args[1]) && temps[args[1]].val == -1) {
                 i = 2;
                 goto try_not;
             }
             break;
         CASE_OP_32_64(orc):
         CASE_OP_32_64(eqv):
-            if (temps[args[2]].state != TCG_TEMP_CONST
-                && temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[1]].val == 0) {
+            if (!temp_is_const(args[2])
+                && temp_is_const(args[1]) && temps[args[1]].val == 0) {
                 i = 2;
                 goto try_not;
             }
@@ -794,9 +795,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(or):
         CASE_OP_32_64(xor):
         CASE_OP_32_64(andc):
-            if (temps[args[1]].state != TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[2]].val == 0) {
+            if (!temp_is_const(args[1])
+                && temp_is_const(args[2]) && temps[args[2]].val == 0) {
                 tcg_opt_gen_mov(s, op, args, args[0], args[1]);
                 continue;
             }
@@ -804,9 +804,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(and):
         CASE_OP_32_64(orc):
         CASE_OP_32_64(eqv):
-            if (temps[args[1]].state != TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[2]].val == -1) {
+            if (!temp_is_const(args[1])
+                && temp_is_const(args[2]) && temps[args[2]].val == -1) {
                 tcg_opt_gen_mov(s, op, args, args[0], args[1]);
                 continue;
             }
@@ -844,7 +843,7 @@ void tcg_optimize(TCGContext *s)
 
         CASE_OP_32_64(and):
             mask = temps[args[2]].mask;
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
         and_const:
                 affected = temps[args[1]].mask & ~mask;
             }
@@ -854,7 +853,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(andc):
             /* Known-zeros does not imply known-ones.  Therefore unless
                args[2] is constant, we can't infer anything from it.  */
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
                 mask = ~temps[args[2]].mask;
                 goto and_const;
             }
@@ -863,26 +862,26 @@ void tcg_optimize(TCGContext *s)
             break;
 
         case INDEX_op_sar_i32:
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
                 tmp = temps[args[2]].val & 31;
                 mask = (int32_t)temps[args[1]].mask >> tmp;
             }
             break;
         case INDEX_op_sar_i64:
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
                 tmp = temps[args[2]].val & 63;
                 mask = (int64_t)temps[args[1]].mask >> tmp;
             }
             break;
 
         case INDEX_op_shr_i32:
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
                 tmp = temps[args[2]].val & 31;
                 mask = (uint32_t)temps[args[1]].mask >> tmp;
             }
             break;
         case INDEX_op_shr_i64:
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
                 tmp = temps[args[2]].val & 63;
                 mask = (uint64_t)temps[args[1]].mask >> tmp;
             }
@@ -893,7 +892,7 @@ void tcg_optimize(TCGContext *s)
             break;
 
         CASE_OP_32_64(shl):
-            if (temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2])) {
                 tmp = temps[args[2]].val & (TCG_TARGET_REG_BITS - 1);
                 mask = temps[args[1]].mask << tmp;
             }
@@ -974,8 +973,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(mul):
         CASE_OP_32_64(muluh):
         CASE_OP_32_64(mulsh):
-            if ((temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[2]].val == 0)) {
+            if ((temp_is_const(args[2]) && temps[args[2]].val == 0)) {
                 tcg_opt_gen_movi(s, op, args, args[0], 0);
                 continue;
             }
@@ -1030,7 +1028,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(ext16u):
         case INDEX_op_ext32s_i64:
         case INDEX_op_ext32u_i64:
-            if (temps[args[1]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[1])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val, 0);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
                 break;
@@ -1038,7 +1036,7 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         case INDEX_op_trunc_shr_i32:
-            if (temps[args[1]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[1])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val, args[2]);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
                 break;
@@ -1067,8 +1065,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(divu):
         CASE_OP_32_64(rem):
         CASE_OP_32_64(remu):
-            if (temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val,
                                           temps[args[2]].val);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
@@ -1077,8 +1074,7 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         CASE_OP_32_64(deposit):
-            if (temps[args[1]].state == TCG_TEMP_CONST
-                && temps[args[2]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[1]) && temp_is_const(args[2])) {
                 tmp = deposit64(temps[args[1]].val, args[3], args[4],
                                 temps[args[2]].val);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
@@ -1118,10 +1114,8 @@ void tcg_optimize(TCGContext *s)
 
         case INDEX_op_add2_i32:
         case INDEX_op_sub2_i32:
-            if (temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[3]].state == TCG_TEMP_CONST
-                && temps[args[4]].state == TCG_TEMP_CONST
-                && temps[args[5]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2]) && temp_is_const(args[3])
+                && temp_is_const(args[4]) && temp_is_const(args[5])) {
                 uint32_t al = temps[args[2]].val;
                 uint32_t ah = temps[args[3]].val;
                 uint32_t bl = temps[args[4]].val;
@@ -1150,8 +1144,7 @@ void tcg_optimize(TCGContext *s)
             goto do_default;
 
         case INDEX_op_mulu2_i32:
-            if (temps[args[2]].state == TCG_TEMP_CONST
-                && temps[args[3]].state == TCG_TEMP_CONST) {
+            if (temp_is_const(args[2]) && temp_is_const(args[3])) {
                 uint32_t a = temps[args[2]].val;
                 uint32_t b = temps[args[3]].val;
                 uint64_t r = (uint64_t)a * b;
@@ -1183,10 +1176,8 @@ void tcg_optimize(TCGContext *s)
                     tcg_op_remove(s, op);
                 }
             } else if ((args[4] == TCG_COND_LT || args[4] == TCG_COND_GE)
-                       && temps[args[2]].state == TCG_TEMP_CONST
-                       && temps[args[3]].state == TCG_TEMP_CONST
-                       && temps[args[2]].val == 0
-                       && temps[args[3]].val == 0) {
+                       && temp_is_const(args[2]) && temps[args[2]].val == 0
+                       && temp_is_const(args[3]) && temps[args[3]].val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_brcond_high:
@@ -1248,10 +1239,8 @@ void tcg_optimize(TCGContext *s)
             do_setcond_const:
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
             } else if ((args[5] == TCG_COND_LT || args[5] == TCG_COND_GE)
-                       && temps[args[3]].state == TCG_TEMP_CONST
-                       && temps[args[4]].state == TCG_TEMP_CONST
-                       && temps[args[3]].val == 0
-                       && temps[args[4]].val == 0) {
+                       && temp_is_const(args[3]) && temps[args[3]].val == 0
+                       && temp_is_const(args[4]) && temps[args[4]].val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_setcond_high:
-- 
2.4.3


* [Qemu-devel] [PATCH 04/17] tcg/optimize: track const/copy status separately
  2015-08-17 19:38 [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
                   ` (2 preceding siblings ...)
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 03/17] tcg/optimize: add temp_is_const and temp_is_copy functions Richard Henderson
@ 2015-08-17 19:38 ` Richard Henderson
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 05/17] tcg/optimize: allow constant to have copies Richard Henderson
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-08-17 19:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Aurelien Jarno

From: Aurelien Jarno <aurelien@aurel32.net>

Instead of using an enum that marks a temp as either a copy or a const,
track the two properties separately. This will be used in the next patch.

Constants are tracked through a bool. Copies are tracked by initializing
a temp's next_copy and prev_copy to itself, which allows the code to be
simplified a bit.
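
Copy tracking thus becomes a circular doubly-linked list threaded through
next_copy/prev_copy, where a temp whose next_copy points at itself has no
copies. A standalone sketch of that invariant (illustrative names, not the
QEMU code):

  struct copy_node {
      unsigned next_copy;   /* index of the next temp in the copy ring */
      unsigned prev_copy;   /* index of the previous temp in the copy ring */
  };

  static struct copy_node nodes[16];

  /* A temp with no copies is a ring of one: it points at itself. */
  static void init_node(unsigned t)
  {
      nodes[t].next_copy = t;
      nodes[t].prev_copy = t;
  }

  static int has_copies(unsigned t)
  {
      return nodes[t].next_copy != t;
  }

  /* Unlinking needs no special case for "last remaining copy":
     unlinking a singleton ring just rewrites its own two pointers. */
  static void unlink_node(unsigned t)
  {
      nodes[nodes[t].next_copy].prev_copy = nodes[t].prev_copy;
      nodes[nodes[t].prev_copy].next_copy = nodes[t].next_copy;
      init_node(t);
  }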

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 42 ++++++++++++++----------------------------
 1 file changed, 14 insertions(+), 28 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 719fee2..e69ab05 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -35,14 +35,8 @@
         glue(glue(case INDEX_op_, x), _i32):    \
         glue(glue(case INDEX_op_, x), _i64)
 
-typedef enum {
-    TCG_TEMP_UNDEF = 0,
-    TCG_TEMP_CONST,
-    TCG_TEMP_COPY,
-} tcg_temp_state;
-
 struct tcg_temp_info {
-    tcg_temp_state state;
+    bool is_const;
     uint16_t prev_copy;
     uint16_t next_copy;
     tcg_target_ulong val;
@@ -54,27 +48,22 @@ static TCGTempSet temps_used;
 
 static inline bool temp_is_const(TCGArg arg)
 {
-    return temps[arg].state == TCG_TEMP_CONST;
+    return temps[arg].is_const;
 }
 
 static inline bool temp_is_copy(TCGArg arg)
 {
-    return temps[arg].state == TCG_TEMP_COPY;
+    return temps[arg].next_copy != arg;
 }
 
-/* Reset TEMP's state to TCG_TEMP_UNDEF.  If TEMP only had one copy, remove
-   the copy flag from the left temp.  */
+/* Reset TEMP's state, possibly removing the temp for the list of copies.  */
 static void reset_temp(TCGArg temp)
 {
-    if (temp_is_copy(temp)) {
-        if (temps[temp].prev_copy == temps[temp].next_copy) {
-            temps[temps[temp].next_copy].state = TCG_TEMP_UNDEF;
-        } else {
-            temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
-            temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
-        }
-    }
-    temps[temp].state = TCG_TEMP_UNDEF;
+    temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
+    temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
+    temps[temp].next_copy = temp;
+    temps[temp].prev_copy = temp;
+    temps[temp].is_const = false;
     temps[temp].mask = -1;
 }
 
@@ -88,7 +77,9 @@ static void reset_all_temps(int nb_temps)
 static void init_temp_info(TCGArg temp)
 {
     if (!test_bit(temp, temps_used.l)) {
-        temps[temp].state = TCG_TEMP_UNDEF;
+        temps[temp].next_copy = temp;
+        temps[temp].prev_copy = temp;
+        temps[temp].is_const = false;
         temps[temp].mask = -1;
         set_bit(temp, temps_used.l);
     }
@@ -218,7 +209,7 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
     op->opc = new_op;
 
     reset_temp(dst);
-    temps[dst].state = TCG_TEMP_CONST;
+    temps[dst].is_const = true;
     temps[dst].val = val;
     mask = val;
     if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_movi_i32) {
@@ -260,16 +251,11 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
     assert(!temp_is_const(src));
 
     if (s->temps[src].type == s->temps[dst].type) {
-        if (!temp_is_copy(src)) {
-            temps[src].state = TCG_TEMP_COPY;
-            temps[src].next_copy = src;
-            temps[src].prev_copy = src;
-        }
-        temps[dst].state = TCG_TEMP_COPY;
         temps[dst].next_copy = temps[src].next_copy;
         temps[dst].prev_copy = src;
         temps[temps[dst].next_copy].prev_copy = dst;
         temps[src].next_copy = dst;
+        temps[dst].is_const = false;
     }
 
     args[0] = dst;
-- 
2.4.3


* [Qemu-devel] [PATCH 05/17] tcg/optimize: allow constant to have copies
  2015-08-17 19:38 [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
                   ` (3 preceding siblings ...)
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 04/17] tcg/optimize: track const/copy status separately Richard Henderson
@ 2015-08-17 19:38 ` Richard Henderson
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 06/17] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 Richard Henderson
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-08-17 19:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Aurelien Jarno

From: Aurelien Jarno <aurelien@aurel32.net>

Now that copies and constants are tracked separately, we can allow
constants to have copies, deferring the choice between using a register
or a constant to the register allocation pass. This prevents this kind
of repeated constant reloading:

-OUT: [size=338]
+OUT: [size=298]
   mov    -0x4(%r14),%ebp
   test   %ebp,%ebp
   jne    0x7ffbe9cb0ed6
   mov    $0x40002219f8,%rbp
   mov    %rbp,(%r14)
-  mov    $0x40002219f8,%rbp
   mov    $0x4000221a20,%rbx
   mov    %rbp,(%rbx)
   mov    $0x4000000000,%rbp
   mov    %rbp,(%r14)
-  mov    $0x4000000000,%rbp
   mov    $0x4000221d38,%rbx
   mov    %rbp,(%rbx)
   mov    $0x40002221a8,%rbp
   mov    %rbp,(%r14)
-  mov    $0x40002221a8,%rbp
   mov    $0x4000221d40,%rbx
   mov    %rbp,(%rbx)
   mov    $0x4000019170,%rbp
   mov    %rbp,(%r14)
-  mov    $0x4000019170,%rbp
   mov    $0x4000221d48,%rbx
   mov    %rbp,(%rbx)
   mov    $0x40000049ee,%rbp
   mov    %rbp,0x80(%r14)
   mov    %r14,%rdi
   callq  0x7ffbe99924d0
   mov    $0x4000001680,%rbp
   mov    %rbp,0x30(%r14)
   mov    0x10(%r14),%rbp
   mov    $0x4000001680,%rbp
   mov    %rbp,0x30(%r14)
   mov    0x10(%r14),%rbp
   shl    $0x20,%rbp
   mov    (%r14),%rbx
   mov    %ebx,%ebx
   mov    %rbx,(%r14)
   or     %rbx,%rbp
   mov    %rbp,0x10(%r14)
   mov    %rbp,0x90(%r14)
   mov    0x60(%r14),%rbx
   mov    %rbx,0x38(%r14)
   mov    0x28(%r14),%rbx
   mov    $0x4000220e60,%r12
   mov    %rbx,(%r12)
   mov    $0x40002219c8,%rbx
   mov    %rbp,(%rbx)
   mov    0x20(%r14),%rbp
   sub    $0x8,%rbp
   mov    $0x4000004a16,%rbx
   mov    %rbx,0x0(%rbp)
   mov    %rbp,0x20(%r14)
   mov    $0x19,%ebp
   mov    %ebp,0xa8(%r14)
   mov    $0x4000015110,%rbp
   mov    %rbp,0x80(%r14)
   xor    %eax,%eax
   jmpq   0x7ffbebcae426
   lea    -0x5f6d72a(%rip),%rax        # 0x7ffbe3d437b3
   jmpq   0x7ffbebcae426

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index e69ab05..2d3c72b 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -230,11 +230,6 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
         return;
     }
 
-    if (temp_is_const(src)) {
-        tcg_opt_gen_movi(s, op, args, dst, temps[src].val);
-        return;
-    }
-
     TCGOpcode new_op = op_to_mov(op->opc);
     tcg_target_ulong mask;
 
@@ -248,14 +243,13 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
     }
     temps[dst].mask = mask;
 
-    assert(!temp_is_const(src));
-
     if (s->temps[src].type == s->temps[dst].type) {
         temps[dst].next_copy = temps[src].next_copy;
         temps[dst].prev_copy = src;
         temps[temps[dst].next_copy].prev_copy = dst;
         temps[src].next_copy = dst;
-        temps[dst].is_const = false;
+        temps[dst].is_const = temps[src].is_const;
+        temps[dst].val = temps[src].val;
     }
 
     args[0] = dst;
-- 
2.4.3


* [Qemu-devel] [PATCH 06/17] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32
  2015-08-17 19:38 [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
                   ` (4 preceding siblings ...)
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 05/17] tcg/optimize: allow constant to have copies Richard Henderson
@ 2015-08-17 19:38 ` Richard Henderson
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 07/17] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32 Richard Henderson
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-08-17 19:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Aurelien Jarno

From: Aurelien Jarno <aurelien@aurel32.net>

The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32,
and the name in the README doesn't match the name offered to the
frontends.

Always use the long name to make it clear that it is a size-changing op.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/README               | 2 +-
 tcg/aarch64/tcg-target.h | 2 +-
 tcg/i386/tcg-target.h    | 2 +-
 tcg/ia64/tcg-target.h    | 2 +-
 tcg/optimize.c           | 6 +++---
 tcg/ppc/tcg-target.h     | 2 +-
 tcg/s390/tcg-target.h    | 2 +-
 tcg/sparc/tcg-target.c   | 4 ++--
 tcg/sparc/tcg-target.h   | 2 +-
 tcg/tcg-op.c             | 4 ++--
 tcg/tcg-opc.h            | 4 ++--
 tcg/tcg.h                | 2 +-
 tcg/tci/tcg-target.h     | 2 +-
 13 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/tcg/README b/tcg/README
index a550ff1..61b3899 100644
--- a/tcg/README
+++ b/tcg/README
@@ -314,7 +314,7 @@ This operation would be equivalent to
 
   dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
 
-* trunc_shr_i32 t0, t1, pos
+* trunc_shr_i64_i32 t0, t1, pos
 
 For 64-bit hosts only, right shift the 64-bit input T1 by POS and
 truncate to 32-bit output T0.  Depending on the host, this may be
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 8aec04d..dfd8801 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -70,7 +70,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 25b5133..dae50ba 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -102,7 +102,7 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_mulsh_i32        0
 
 #if TCG_TARGET_REG_BITS == 64
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_div2_i64         1
 #define TCG_TARGET_HAS_rot_i64          1
 #define TCG_TARGET_HAS_ext8s_i64        1
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index a04ed81..29902f9 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -160,7 +160,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muluh_i64        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 #define TCG_TARGET_HAS_mulsh_i64        0
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) ((len) <= 16)
 #define TCG_TARGET_deposit_i64_valid(ofs, len) ((len) <= 16)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 2d3c72b..8bfe7ff 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -288,7 +288,7 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
     case INDEX_op_shr_i32:
         return (uint32_t)x >> (y & 31);
 
-    case INDEX_op_trunc_shr_i32:
+    case INDEX_op_trunc_shr_i64_i32:
     case INDEX_op_shr_i64:
         return (uint64_t)x >> (y & 63);
 
@@ -867,7 +867,7 @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
-        case INDEX_op_trunc_shr_i32:
+        case INDEX_op_trunc_shr_i64_i32:
             mask = (uint64_t)temps[args[1]].mask >> args[2];
             break;
 
@@ -1015,7 +1015,7 @@ void tcg_optimize(TCGContext *s)
             }
             goto do_default;
 
-        case INDEX_op_trunc_shr_i32:
+        case INDEX_op_trunc_shr_i64_i32:
             if (temp_is_const(args[1])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val, args[2]);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 7ce7048..b7e6861 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -77,7 +77,7 @@ typedef enum {
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_add2_i32         0
 #define TCG_TARGET_HAS_sub2_i32         0
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_rot_i64          1
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 91576d5..50016a8 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -72,7 +72,7 @@ typedef enum TCGReg {
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 
 #define TCG_TARGET_HAS_div2_i64         1
 #define TCG_TARGET_HAS_rot_i64          1
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 1a870a8..b23032b 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1413,7 +1413,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext32u_i64:
         tcg_out_arithi(s, a0, a1, 0, SHIFT_SRL);
         break;
-    case INDEX_op_trunc_shr_i32:
+    case INDEX_op_trunc_shr_i64_i32:
         if (a2 == 0) {
             tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
         } else {
@@ -1533,7 +1533,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
 
     { INDEX_op_ext32s_i64, { "R", "r" } },
     { INDEX_op_ext32u_i64, { "R", "r" } },
-    { INDEX_op_trunc_shr_i32,  { "r", "R" } },
+    { INDEX_op_trunc_shr_i64_i32,  { "r", "R" } },
 
     { INDEX_op_brcond_i64, { "RZ", "RJ" } },
     { INDEX_op_setcond_i64, { "R", "RZ", "RJ" } },
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index f584de4..336c47f 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -118,7 +118,7 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 
-#define TCG_TARGET_HAS_trunc_shr_i32    1
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 1
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_rot_i64          0
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 45098c3..61b64db 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1751,8 +1751,8 @@ void tcg_gen_trunc_shr_i64_i32(TCGv_i32 ret, TCGv_i64 arg, unsigned count)
             tcg_gen_mov_i32(ret, TCGV_LOW(t));
             tcg_temp_free_i64(t);
         }
-    } else if (TCG_TARGET_HAS_trunc_shr_i32) {
-        tcg_gen_op3i_i32(INDEX_op_trunc_shr_i32, ret,
+    } else if (TCG_TARGET_HAS_trunc_shr_i64_i32) {
+        tcg_gen_op3i_i32(INDEX_op_trunc_shr_i64_i32, ret,
                          MAKE_TCGV_I32(GET_TCGV_I64(arg)), count);
     } else if (count == 0) {
         tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(arg)));
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 13ccb60..4a34f43 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -138,8 +138,8 @@ DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
 
-DEF(trunc_shr_i32, 1, 1, 1,
-    IMPL(TCG_TARGET_HAS_trunc_shr_i32)
+DEF(trunc_shr_i64_i32, 1, 1, 1,
+    IMPL(TCG_TARGET_HAS_trunc_shr_i64_i32)
     | (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
 
 DEF(brcond_i64, 0, 2, 2, TCG_OPF_BB_END | IMPL64)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 231a781..e7e33b9 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -66,7 +66,7 @@ typedef uint64_t TCGRegSet;
 
 #if TCG_TARGET_REG_BITS == 32
 /* Turn some undef macros into false macros.  */
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_div_i64          0
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_div2_i64         0
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index cbf3f9b..8b1139b 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -84,7 +84,7 @@
 #define TCG_TARGET_HAS_mulsh_i32        0
 
 #if TCG_TARGET_REG_BITS == 64
-#define TCG_TARGET_HAS_trunc_shr_i32    0
+#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
 #define TCG_TARGET_HAS_bswap16_i64      1
 #define TCG_TARGET_HAS_bswap32_i64      1
 #define TCG_TARGET_HAS_bswap64_i64      1
-- 
2.4.3


* [Qemu-devel] [PATCH 07/17] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32
  2015-08-17 19:38 [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
                   ` (5 preceding siblings ...)
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 06/17] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 Richard Henderson
@ 2015-08-17 19:38 ` Richard Henderson
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 08/17] tcg: implement real ext_i32_i64 and extu_i32_i64 ops Richard Henderson
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-08-17 19:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Aurelien Jarno

From: Aurelien Jarno <aurelien@aurel32.net>

The tcg_gen_trunc_shr_i64_i32 function takes a 64-bit argument and
returns a 32-bit value. Directly call tcg_gen_op3 with the correct
types instead of calling tcg_gen_op3i_i32 and abusing the TCG types.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/tcg-op.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 61b64db..0e79fd1 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1752,8 +1752,8 @@ void tcg_gen_trunc_shr_i64_i32(TCGv_i32 ret, TCGv_i64 arg, unsigned count)
             tcg_temp_free_i64(t);
         }
     } else if (TCG_TARGET_HAS_trunc_shr_i64_i32) {
-        tcg_gen_op3i_i32(INDEX_op_trunc_shr_i64_i32, ret,
-                         MAKE_TCGV_I32(GET_TCGV_I64(arg)), count);
+        tcg_gen_op3(&tcg_ctx, INDEX_op_trunc_shr_i64_i32,
+                    GET_TCGV_I32(ret), GET_TCGV_I64(arg), count);
     } else if (count == 0) {
         tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(arg)));
     } else {
-- 
2.4.3


* [Qemu-devel] [PATCH 08/17] tcg: implement real ext_i32_i64 and extu_i32_i64 ops
  2015-08-17 19:38 [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
                   ` (6 preceding siblings ...)
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 07/17] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32 Richard Henderson
@ 2015-08-17 19:38 ` Richard Henderson
  2015-08-17 19:51   ` Claudio Fontana
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 09/17] tcg/optimize: add optimizations for " Richard Henderson
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2015-08-17 19:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, Claudio Fontana, Stefan Weil, Claudio Fontana,
	Alexander Graf, Blue Swirl, Aurelien Jarno

From: Aurelien Jarno <aurelien@aurel32.net>

Implement real ext_i32_i64 and extu_i32_i64 ops. They ensure that a
32-bit value is always converted to a 64-bit value and not propagated
through the register allocator or the optimizer.
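
In C terms, the intended semantics of the two new ops are roughly the
following (an illustrative sketch, not the backend implementations); the
difference from ext32s_i64/ext32u_i64 is that the source operand is typed
as a 32-bit TCG value, so the size change is explicit in the opcode stream:

  #include <stdint.h>

  /* ext_i32_i64  dst, src : sign-extend a 32-bit value to 64 bits */
  static int64_t ext_i32_i64_sketch(int32_t src)
  {
      return (int64_t)src;
  }

  /* extu_i32_i64 dst, src : zero-extend a 32-bit value to 64 bits */
  static uint64_t extu_i32_i64_sketch(uint32_t src)
  {
      return (uint64_t)src;
  }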

Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Claudio Fontana <claudio.fontana@huawei.com>
Cc: Claudio Fontana <claudio.fontana@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/aarch64/tcg-target.c |  4 ++++
 tcg/i386/tcg-target.c    |  5 +++++
 tcg/ia64/tcg-target.c    |  4 ++++
 tcg/ppc/tcg-target.c     |  6 ++++++
 tcg/s390/tcg-target.c    |  5 +++++
 tcg/sparc/tcg-target.c   |  8 ++++++--
 tcg/tcg-op.c             | 10 ++++------
 tcg/tcg-opc.h            |  3 +++
 tcg/tci/tcg-target.c     |  4 ++++
 tci.c                    |  6 ++++--
 10 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index b7ec4f5..7f7ab7e 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1556,6 +1556,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext16s_i32:
         tcg_out_sxt(s, ext, MO_16, a0, a1);
         break;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a1);
         break;
@@ -1567,6 +1568,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext16u_i32:
         tcg_out_uxt(s, MO_16, a0, a1);
         break;
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         tcg_out_movr(s, TCG_TYPE_I32, a0, a1);
         break;
@@ -1712,6 +1714,8 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_ext8u_i64, { "r", "r" } },
     { INDEX_op_ext16u_i64, { "r", "r" } },
     { INDEX_op_ext32u_i64, { "r", "r" } },
+    { INDEX_op_ext_i32_i64, { "r", "r" } },
+    { INDEX_op_extu_i32_i64, { "r", "r" } },
 
     { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
     { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index 887f22f..7648f7e 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -2064,9 +2064,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_bswap64_i64:
         tcg_out_bswap64(s, args[0]);
         break;
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         tcg_out_ext32u(s, args[0], args[1]);
         break;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         tcg_out_ext32s(s, args[0], args[1]);
         break;
@@ -2201,6 +2203,9 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_ext16u_i64, { "r", "r" } },
     { INDEX_op_ext32u_i64, { "r", "r" } },
 
+    { INDEX_op_ext_i32_i64, { "r", "r" } },
+    { INDEX_op_extu_i32_i64, { "r", "r" } },
+
     { INDEX_op_deposit_i64, { "Q", "0", "Q" } },
     { INDEX_op_movcond_i64, { "r", "r", "re", "r", "0" } },
 
diff --git a/tcg/ia64/tcg-target.c b/tcg/ia64/tcg-target.c
index 81cb9f7..71e79cf 100644
--- a/tcg/ia64/tcg-target.c
+++ b/tcg/ia64/tcg-target.c
@@ -2148,9 +2148,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext16u_i64:
         tcg_out_ext(s, OPC_ZXT2_I29, args[0], args[1]);
         break;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         tcg_out_ext(s, OPC_SXT4_I29, args[0], args[1]);
         break;
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         tcg_out_ext(s, OPC_ZXT4_I29, args[0], args[1]);
         break;
@@ -2301,6 +2303,8 @@ static const TCGTargetOpDef ia64_op_defs[] = {
     { INDEX_op_ext16u_i64, { "r", "rZ"} },
     { INDEX_op_ext32s_i64, { "r", "rZ"} },
     { INDEX_op_ext32u_i64, { "r", "rZ"} },
+    { INDEX_op_ext_i32_i64, { "r", "rZ" } },
+    { INDEX_op_extu_i32_i64, { "r", "rZ" } },
 
     { INDEX_op_bswap16_i64, { "r", "rZ" } },
     { INDEX_op_bswap32_i64, { "r", "rZ" } },
diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
index 2b6eafa..31fa25c 100644
--- a/tcg/ppc/tcg-target.c
+++ b/tcg/ppc/tcg-target.c
@@ -2200,12 +2200,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_ext16s_i64:
         c = EXTSH;
         goto gen_ext;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         c = EXTSW;
         goto gen_ext;
     gen_ext:
         tcg_out32(s, c | RS(args[1]) | RA(args[0]));
         break;
+    case INDEX_op_extu_i32_i64:
+        tcg_out_ext32u(s, args[0], args[1]);
+        break;
 
     case INDEX_op_setcond_i32:
         tcg_out_setcond(s, TCG_TYPE_I32, args[3], args[0], args[1], args[2],
@@ -2482,6 +2486,8 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_ext8s_i64, { "r", "r" } },
     { INDEX_op_ext16s_i64, { "r", "r" } },
     { INDEX_op_ext32s_i64, { "r", "r" } },
+    { INDEX_op_ext_i32_i64, { "r", "r" } },
+    { INDEX_op_extu_i32_i64, { "r", "r" } },
     { INDEX_op_bswap16_i64, { "r", "r" } },
     { INDEX_op_bswap32_i64, { "r", "r" } },
     { INDEX_op_bswap64_i64, { "r", "r" } },
diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 921991e..a16fbec 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -2090,6 +2090,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext16s_i64:
         tgen_ext16s(s, TCG_TYPE_I64, args[0], args[1]);
         break;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         tgen_ext32s(s, args[0], args[1]);
         break;
@@ -2099,6 +2100,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext16u_i64:
         tgen_ext16u(s, TCG_TYPE_I64, args[0], args[1]);
         break;
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         tgen_ext32u(s, args[0], args[1]);
         break;
@@ -2251,6 +2253,9 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { INDEX_op_ext32s_i64, { "r", "r" } },
     { INDEX_op_ext32u_i64, { "r", "r" } },
 
+    { INDEX_op_ext_i32_i64, { "r", "r" } },
+    { INDEX_op_extu_i32_i64, { "r", "r" } },
+
     { INDEX_op_bswap16_i64, { "r", "r" } },
     { INDEX_op_bswap32_i64, { "r", "r" } },
     { INDEX_op_bswap64_i64, { "r", "r" } },
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index b23032b..fe75af0 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1407,9 +1407,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_divu_i64:
         c = ARITH_UDIVX;
         goto gen_arith;
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         tcg_out_arithi(s, a0, a1, 0, SHIFT_SRA);
         break;
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         tcg_out_arithi(s, a0, a1, 0, SHIFT_SRL);
         break;
@@ -1531,8 +1533,10 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_neg_i64, { "R", "RJ" } },
     { INDEX_op_not_i64, { "R", "RJ" } },
 
-    { INDEX_op_ext32s_i64, { "R", "r" } },
-    { INDEX_op_ext32u_i64, { "R", "r" } },
+    { INDEX_op_ext32s_i64, { "R", "R" } },
+    { INDEX_op_ext32u_i64, { "R", "R" } },
+    { INDEX_op_ext_i32_i64, { "R", "r" } },
+    { INDEX_op_extu_i32_i64, { "R", "r" } },
     { INDEX_op_trunc_shr_i64_i32,  { "r", "R" } },
 
     { INDEX_op_brcond_i64, { "RZ", "RJ" } },
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 0e79fd1..7114315 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1770,9 +1770,8 @@ void tcg_gen_extu_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
         tcg_gen_mov_i32(TCGV_LOW(ret), arg);
         tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
     } else {
-        /* Note: we assume the target supports move between
-           32 and 64 bit registers.  */
-        tcg_gen_ext32u_i64(ret, MAKE_TCGV_I64(GET_TCGV_I32(arg)));
+        tcg_gen_op2(&tcg_ctx, INDEX_op_extu_i32_i64,
+                    GET_TCGV_I64(ret), GET_TCGV_I32(arg));
     }
 }
 
@@ -1782,9 +1781,8 @@ void tcg_gen_ext_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
         tcg_gen_mov_i32(TCGV_LOW(ret), arg);
         tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
     } else {
-        /* Note: we assume the target supports move between
-           32 and 64 bit registers.  */
-        tcg_gen_ext32s_i64(ret, MAKE_TCGV_I64(GET_TCGV_I32(arg)));
+        tcg_gen_op2(&tcg_ctx, INDEX_op_ext_i32_i64,
+                    GET_TCGV_I64(ret), GET_TCGV_I32(arg));
     }
 }
 
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 4a34f43..f721a5a 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -138,6 +138,9 @@ DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
 
+/* size changing ops */
+DEF(ext_i32_i64, 1, 1, 0, IMPL64)
+DEF(extu_i32_i64, 1, 1, 0, IMPL64)
 DEF(trunc_shr_i64_i32, 1, 1, 1,
     IMPL(TCG_TARGET_HAS_trunc_shr_i64_i32)
     | (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
diff --git a/tcg/tci/tcg-target.c b/tcg/tci/tcg-target.c
index 83472db..bbb54d4 100644
--- a/tcg/tci/tcg-target.c
+++ b/tcg/tci/tcg-target.c
@@ -210,6 +210,8 @@ static const TCGTargetOpDef tcg_target_op_defs[] = {
 #if TCG_TARGET_HAS_ext32u_i64
     { INDEX_op_ext32u_i64, { R, R } },
 #endif
+    { INDEX_op_ext_i32_i64, { R, R } },
+    { INDEX_op_extu_i32_i64, { R, R } },
 #if TCG_TARGET_HAS_bswap16_i64
     { INDEX_op_bswap16_i64, { R, R } },
 #endif
@@ -701,6 +703,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_ext16u_i64:   /* Optional (TCG_TARGET_HAS_ext16u_i64). */
     case INDEX_op_ext32s_i64:   /* Optional (TCG_TARGET_HAS_ext32s_i64). */
     case INDEX_op_ext32u_i64:   /* Optional (TCG_TARGET_HAS_ext32u_i64). */
+    case INDEX_op_ext_i32_i64:
+    case INDEX_op_extu_i32_i64:
 #endif /* TCG_TARGET_REG_BITS == 64 */
     case INDEX_op_neg_i32:      /* Optional (TCG_TARGET_HAS_neg_i32). */
     case INDEX_op_not_i32:      /* Optional (TCG_TARGET_HAS_not_i32). */
diff --git a/tci.c b/tci.c
index 8444948..3d6d177 100644
--- a/tci.c
+++ b/tci.c
@@ -1033,18 +1033,20 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
 #endif
 #if TCG_TARGET_HAS_ext32s_i64
         case INDEX_op_ext32s_i64:
+#endif
+        case INDEX_op_ext_i32_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r32s(&tb_ptr);
             tci_write_reg64(t0, t1);
             break;
-#endif
 #if TCG_TARGET_HAS_ext32u_i64
         case INDEX_op_ext32u_i64:
+#endif
+        case INDEX_op_extu_i32_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_r32(&tb_ptr);
             tci_write_reg64(t0, t1);
             break;
-#endif
 #if TCG_TARGET_HAS_bswap16_i64
         case INDEX_op_bswap16_i64:
             TODO();
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 09/17] tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 ops
  2015-08-17 19:38 [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
                   ` (7 preceding siblings ...)
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 08/17] tcg: implement real ext_i32_i64 and extu_i32_i64 ops Richard Henderson
@ 2015-08-17 19:38 ` Richard Henderson
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 10/17] tcg/optimize: do not remember garbage high bits for 32-bit ops Richard Henderson
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-08-17 19:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Aurelien Jarno

From: Aurelien Jarno <aurelien@aurel32.net>

From the point of view of constant folding and zero propagation, they
behave the same as ext32s_i64 and ext32u_i64, except that they can't
be replaced by a mov, so we don't compute the affected value.
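
To make the folding rule concrete, here is a minimal standalone sketch
(plain C, not QEMU code; the helper names are illustrative only) of
what the optimizer computes for a known 32-bit constant, mirroring the
do_constant_folding_2 cases added below:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Folding of a known 32-bit constant: the signed op sign-extends,
   the unsigned op zero-extends. */
static uint64_t fold_ext_i32_i64(uint64_t x)  { return (uint64_t)(int64_t)(int32_t)x; }
static uint64_t fold_extu_i32_i64(uint64_t x) { return (uint32_t)x; }

int main(void)
{
    uint64_t c = 0xffff8000;    /* negative when seen as a 32-bit value */
    printf("ext_i32_i64  -> %016" PRIx64 "\n", fold_ext_i32_i64(c));   /* ffffffffffff8000 */
    printf("extu_i32_i64 -> %016" PRIx64 "\n", fold_extu_i32_i64(c));  /* 00000000ffff8000 */
    return 0;
}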

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 8bfe7ff..8f33755 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -343,9 +343,11 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
     CASE_OP_32_64(ext16u):
         return (uint16_t)x;
 
+    case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
         return (int32_t)x;
 
+    case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
         return (uint32_t)x;
 
@@ -830,6 +832,15 @@ void tcg_optimize(TCGContext *s)
             mask = temps[args[1]].mask & mask;
             break;
 
+        case INDEX_op_ext_i32_i64:
+            if ((temps[args[1]].mask & 0x80000000) != 0) {
+                break;
+            }
+        case INDEX_op_extu_i32_i64:
+            /* We do not compute affected as it is a size changing op.  */
+            mask = (uint32_t)temps[args[1]].mask;
+            break;
+
         CASE_OP_32_64(andc):
             /* Known-zeros does not imply known-ones.  Therefore unless
                args[2] is constant, we can't infer anything from it.  */
@@ -1008,6 +1019,8 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(ext16u):
         case INDEX_op_ext32s_i64:
         case INDEX_op_ext32u_i64:
+        case INDEX_op_ext_i32_i64:
+        case INDEX_op_extu_i32_i64:
             if (temp_is_const(args[1])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val, 0);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 10/17] tcg/optimize: do not remember garbage high bits for 32-bit ops
  2015-08-17 19:38 [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
                   ` (8 preceding siblings ...)
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 09/17] tcg/optimize: add optimizations for " Richard Henderson
@ 2015-08-17 19:38 ` Richard Henderson
  2015-08-18  8:35   ` Aurelien Jarno
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 11/17] tcg: update README about size changing ops Richard Henderson
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2015-08-17 19:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Aurelien Jarno

From: Aurelien Jarno <aurelien@aurel32.net>

Now that we have real size-changing ops, we don't need to mark the high
bits of the destination as garbage. The goal of the optimizer is to
predict the value of the temps (not of the registers) and to simplify
where possible. The real problem was therefore never that those bits
were not counted as garbage, but that a size-changing op could be
replaced by a move.

This patch is basically a revert of 24666baf, including the changes that
have been made since then.
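
With the garbage marking gone, the mask handling left for 32-bit ops
boils down to clipping the known-bits mask to the low 32 bits, roughly
as in this standalone sketch (plain C, not QEMU code; variable names
are illustrative):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t mask = 0x00000000ffffff00ull;  /* bits that may be non-zero in the temp */
    int op_is_64bit = 0;                    /* pretend the op is a 32-bit one */

    if (!op_is_64bit) {
        mask &= 0xffffffffu;                /* 32-bit ops generate 32-bit results */
    }
    if (mask == 0) {
        puts("result is provably zero -> replace the op with movi 0");
    } else {
        printf("known possibly-non-zero bits: %016llx\n", (unsigned long long)mask);
    }
    return 0;
}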

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/optimize.c | 41 +++++++++++------------------------------
 1 file changed, 11 insertions(+), 30 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 8f33755..166074e 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -203,20 +203,12 @@ static bool temps_are_copies(TCGArg arg1, TCGArg arg2)
 static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg *args,
                              TCGArg dst, TCGArg val)
 {
-    TCGOpcode new_op = op_to_movi(op->opc);
-    tcg_target_ulong mask;
-
-    op->opc = new_op;
+    op->opc = op_to_movi(op->opc);
 
     reset_temp(dst);
     temps[dst].is_const = true;
     temps[dst].val = val;
-    mask = val;
-    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_movi_i32) {
-        /* High bits of the destination are now garbage.  */
-        mask |= ~0xffffffffull;
-    }
-    temps[dst].mask = mask;
+    temps[dst].mask = val;
 
     args[0] = dst;
     args[1] = val;
@@ -230,28 +222,21 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg *args,
         return;
     }
 
-    TCGOpcode new_op = op_to_mov(op->opc);
-    tcg_target_ulong mask;
-
-    op->opc = new_op;
+    op->opc = op_to_mov(op->opc);
 
     reset_temp(dst);
-    mask = temps[src].mask;
-    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
-        /* High bits of the destination are now garbage.  */
-        mask |= ~0xffffffffull;
-    }
-    temps[dst].mask = mask;
 
     if (s->temps[src].type == s->temps[dst].type) {
         temps[dst].next_copy = temps[src].next_copy;
         temps[dst].prev_copy = src;
         temps[temps[dst].next_copy].prev_copy = dst;
         temps[src].next_copy = dst;
-        temps[dst].is_const = temps[src].is_const;
-        temps[dst].val = temps[src].val;
     }
 
+    temps[dst].is_const = temps[src].is_const;
+    temps[dst].val = temps[src].val;
+    temps[dst].mask = temps[src].mask;
+
     args[0] = dst;
     args[1] = src;
 }
@@ -584,7 +569,7 @@ void tcg_optimize(TCGContext *s)
     reset_all_temps(nb_temps);
 
     for (oi = s->gen_first_op_idx; oi >= 0; oi = oi_next) {
-        tcg_target_ulong mask, partmask, affected;
+        tcg_target_ulong mask, affected;
         int nb_oargs, nb_iargs, i;
         TCGArg tmp;
 
@@ -937,17 +922,13 @@ void tcg_optimize(TCGContext *s)
             break;
         }
 
-        /* 32-bit ops generate 32-bit results.  For the result is zero test
-           below, we can ignore high bits, but for further optimizations we
-           need to record that the high bits contain garbage.  */
-        partmask = mask;
+        /* 32-bit ops generate 32-bit results.  */
         if (!(def->flags & TCG_OPF_64BIT)) {
-            mask |= ~(tcg_target_ulong)0xffffffffu;
-            partmask &= 0xffffffffu;
+            mask &= 0xffffffffu;
             affected &= 0xffffffffu;
         }
 
-        if (partmask == 0) {
+        if (mask == 0) {
             assert(nb_oargs == 1);
             tcg_opt_gen_movi(s, op, args, args[0], 0);
             continue;
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 11/17] tcg: update README about size changing ops
  2015-08-17 19:38 [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
                   ` (9 preceding siblings ...)
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 10/17] tcg/optimize: do not remember garbage high bits for 32-bit ops Richard Henderson
@ 2015-08-17 19:38 ` Richard Henderson
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 12/17] tcg: Split trunc_shr_i32 opcode into extr[lh]_i64_i32 Richard Henderson
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-08-17 19:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Aurelien Jarno

From: Aurelien Jarno <aurelien@aurel32.net>

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
---
 tcg/README | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/tcg/README b/tcg/README
index 61b3899..a22f251 100644
--- a/tcg/README
+++ b/tcg/README
@@ -466,13 +466,25 @@ On a 32 bit target, all 64 bit operations are converted to 32 bits. A
 few specific operations must be implemented to allow it (see add2_i32,
 sub2_i32, brcond2_i32).
 
+On a 64 bit target, the values are transfered between 32 and 64-bit
+registers using the following ops:
+- trunc_shr_i64_i32
+- ext_i32_i64
+- extu_i32_i64
+
+They ensure that the values are correctly truncated or extended when
+moved from a 32-bit to a 64-bit register or vice-versa. Note that the
+trunc_shr_i64_i32 is an optional op. It is not necessary to implement
+it if all the following conditions are met:
+- 64-bit registers can hold 32-bit values
+- 32-bit values in a 64-bit register do not need to stay zero or
+  sign extended
+- all 32-bit TCG ops ignore the high part of 64-bit registers
+
 Floating point operations are not supported in this version. A
 previous incarnation of the code generator had full support of them,
 but it is better to concentrate on integer operations first.
 
-On a 64 bit target, no assumption is made in TCG about the storage of
-the 32 bit values in 64 bit registers.
-
 4.2) Constraints
 
 GCC like constraints are used to define the constraints of every
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 12/17] tcg: Split trunc_shr_i32 opcode into extr[lh]_i64_i32
  2015-08-17 19:38 [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
                   ` (10 preceding siblings ...)
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 11/17] tcg: update README about size changing ops Richard Henderson
@ 2015-08-17 19:38 ` Richard Henderson
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 14/17] tcg/i386: use softmmu fast path for unaligned accesses Richard Henderson
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-08-17 19:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Rather than allow arbitrary shift+trunc, only concern ourselves
with low and high parts.  This is all that was being used anyway.
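
The semantics of the two new opcodes are simply the low and high halves
of a 64-bit value, as in this standalone sketch (plain C, not QEMU code;
the function names only mirror the opcode names):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static uint32_t extrl_i64_i32(uint64_t x) { return (uint32_t)x; }         /* low half  */
static uint32_t extrh_i64_i32(uint64_t x) { return (uint32_t)(x >> 32); } /* high half */

int main(void)
{
    uint64_t v = 0x1122334455667788ull;
    printf("extrl = %08" PRIx32 "\n", extrl_i64_i32(v));  /* 55667788 */
    printf("extrh = %08" PRIx32 "\n", extrh_i64_i32(v));  /* 11223344 */
    return 0;
}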

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-tricore/translate.c | 12 ++++++------
 tcg/README                 | 14 ++++++++++----
 tcg/aarch64/tcg-target.h   |  3 ++-
 tcg/i386/tcg-target.h      |  3 ++-
 tcg/ia64/tcg-target.h      |  3 ++-
 tcg/optimize.c             | 22 +++++++++++-----------
 tcg/ppc/tcg-target.h       |  3 ++-
 tcg/s390/tcg-target.h      |  3 ++-
 tcg/sparc/tcg-target.c     | 14 +++++++-------
 tcg/sparc/tcg-target.h     |  3 ++-
 tcg/tcg-op.c               | 38 +++++++++++++++++++-------------------
 tcg/tcg-op.h               |  5 +++--
 tcg/tcg-opc.h              |  7 +++++--
 tcg/tcg.h                  |  3 ++-
 tcg/tci/tcg-target.h       |  3 ++-
 15 files changed, 77 insertions(+), 59 deletions(-)

diff --git a/target-tricore/translate.c b/target-tricore/translate.c
index 7dc7a32..70f0930 100644
--- a/target-tricore/translate.c
+++ b/target-tricore/translate.c
@@ -457,11 +457,11 @@ gen_add64_d(TCGv_i64 ret, TCGv_i64 r1, TCGv_i64 r2)
     tcg_gen_xor_i64(t1, result, r1);
     tcg_gen_xor_i64(t0, r1, r2);
     tcg_gen_andc_i64(t1, t1, t0);
-    tcg_gen_trunc_shr_i64_i32(cpu_PSW_V, t1, 32);
+    tcg_gen_extrh_i64_i32(cpu_PSW_V, t1);
     /* calc SV bit */
     tcg_gen_or_tl(cpu_PSW_SV, cpu_PSW_SV, cpu_PSW_V);
     /* calc AV/SAV bits */
-    tcg_gen_trunc_shr_i64_i32(temp, result, 32);
+    tcg_gen_extrh_i64_i32(temp, result);
     tcg_gen_add_tl(cpu_PSW_AV, temp, temp);
     tcg_gen_xor_tl(cpu_PSW_AV, temp, cpu_PSW_AV);
     /* calc SAV */
@@ -1273,7 +1273,7 @@ gen_madd64_q(TCGv rl, TCGv rh, TCGv arg1_low, TCGv arg1_high, TCGv arg2,
     tcg_gen_xor_i64(t3, t4, t1);
     tcg_gen_xor_i64(t2, t1, t2);
     tcg_gen_andc_i64(t3, t3, t2);
-    tcg_gen_trunc_shr_i64_i32(cpu_PSW_V, t3, 32);
+    tcg_gen_extrh_i64_i32(cpu_PSW_V, t3);
     /* We produce an overflow on the host if the mul before was
        (0x80000000 * 0x80000000) << 1). If this is the
        case, we negate the ovf. */
@@ -1630,11 +1630,11 @@ gen_sub64_d(TCGv_i64 ret, TCGv_i64 r1, TCGv_i64 r2)
     tcg_gen_xor_i64(t1, result, r1);
     tcg_gen_xor_i64(t0, r1, r2);
     tcg_gen_and_i64(t1, t1, t0);
-    tcg_gen_trunc_shr_i64_i32(cpu_PSW_V, t1, 32);
+    tcg_gen_extrh_i64_i32(cpu_PSW_V, t1);
     /* calc SV bit */
     tcg_gen_or_tl(cpu_PSW_SV, cpu_PSW_SV, cpu_PSW_V);
     /* calc AV/SAV bits */
-    tcg_gen_trunc_shr_i64_i32(temp, result, 32);
+    tcg_gen_extrh_i64_i32(temp, result);
     tcg_gen_add_tl(cpu_PSW_AV, temp, temp);
     tcg_gen_xor_tl(cpu_PSW_AV, temp, cpu_PSW_AV);
     /* calc SAV */
@@ -2126,7 +2126,7 @@ gen_msub64_q(TCGv rl, TCGv rh, TCGv arg1_low, TCGv arg1_high, TCGv arg2,
     tcg_gen_xor_i64(t3, t4, t1);
     tcg_gen_xor_i64(t2, t1, t2);
     tcg_gen_and_i64(t3, t3, t2);
-    tcg_gen_trunc_shr_i64_i32(cpu_PSW_V, t3, 32);
+    tcg_gen_extrh_i64_i32(cpu_PSW_V, t3);
     /* We produce an overflow on the host if the mul before was
        (0x80000000 * 0x80000000) << 1). If this is the
        case, we negate the ovf. */
diff --git a/tcg/README b/tcg/README
index a22f251..34c0775 100644
--- a/tcg/README
+++ b/tcg/README
@@ -314,11 +314,17 @@ This operation would be equivalent to
 
   dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
 
-* trunc_shr_i64_i32 t0, t1, pos
+* extrl_i64_i32 t0, t1
 
-For 64-bit hosts only, right shift the 64-bit input T1 by POS and
-truncate to 32-bit output T0.  Depending on the host, this may be
-a simple mov/shift, or may require additional canonicalization.
+For 64-bit hosts only, extract the low 32-bits of input T1 and place it
+into 32-bit output T0.  Depending on the host, this may be a simple move,
+or may require additional canonicalization.
+
+* extrh_i64_i32 t0, t1
+
+For 64-bit hosts only, extract the high 32-bits of input T1 and place it
+into 32-bit output T0.  Depending on the host, this may be a simple shift,
+or may require additional canonicalization.
 
 ********* Conditional moves
 
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index dfd8801..19a04a6 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -70,7 +70,8 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
-#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
+#define TCG_TARGET_HAS_extrl_i64_i32    0
+#define TCG_TARGET_HAS_extrh_i64_i32    0
 
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index dae50ba..92be341 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -102,7 +102,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_mulsh_i32        0
 
 #if TCG_TARGET_REG_BITS == 64
-#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
+#define TCG_TARGET_HAS_extrl_i64_i32    0
+#define TCG_TARGET_HAS_extrh_i64_i32    0
 #define TCG_TARGET_HAS_div2_i64         1
 #define TCG_TARGET_HAS_rot_i64          1
 #define TCG_TARGET_HAS_ext8s_i64        1
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index 29902f9..ae9b79f 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -160,7 +160,8 @@ typedef enum {
 #define TCG_TARGET_HAS_muluh_i64        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 #define TCG_TARGET_HAS_mulsh_i64        0
-#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
+#define TCG_TARGET_HAS_extrl_i64_i32    0
+#define TCG_TARGET_HAS_extrh_i64_i32    0
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) ((len) <= 16)
 #define TCG_TARGET_deposit_i64_valid(ofs, len) ((len) <= 16)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 166074e..1e18239 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -273,7 +273,6 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
     case INDEX_op_shr_i32:
         return (uint32_t)x >> (y & 31);
 
-    case INDEX_op_trunc_shr_i64_i32:
     case INDEX_op_shr_i64:
         return (uint64_t)x >> (y & 63);
 
@@ -333,9 +332,13 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
         return (int32_t)x;
 
     case INDEX_op_extu_i32_i64:
+    case INDEX_op_extrl_i64_i32:
     case INDEX_op_ext32u_i64:
         return (uint32_t)x;
 
+    case INDEX_op_extrh_i64_i32:
+        return (uint64_t)x >> 32;
+
     case INDEX_op_muluh_i32:
         return ((uint64_t)(uint32_t)x * (uint32_t)y) >> 32;
     case INDEX_op_mulsh_i32:
@@ -863,8 +866,11 @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
-        case INDEX_op_trunc_shr_i64_i32:
-            mask = (uint64_t)temps[args[1]].mask >> args[2];
+        case INDEX_op_extrl_i64_i32:
+            mask = (uint32_t)temps[args[1]].mask;
+            break;
+        case INDEX_op_extrh_i64_i32:
+            mask = (uint64_t)temps[args[1]].mask >> 32;
             break;
 
         CASE_OP_32_64(shl):
@@ -1002,6 +1008,8 @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_ext32u_i64:
         case INDEX_op_ext_i32_i64:
         case INDEX_op_extu_i32_i64:
+        case INDEX_op_extrl_i64_i32:
+        case INDEX_op_extrh_i64_i32:
             if (temp_is_const(args[1])) {
                 tmp = do_constant_folding(opc, temps[args[1]].val, 0);
                 tcg_opt_gen_movi(s, op, args, args[0], tmp);
@@ -1009,14 +1017,6 @@ void tcg_optimize(TCGContext *s)
             }
             goto do_default;
 
-        case INDEX_op_trunc_shr_i64_i32:
-            if (temp_is_const(args[1])) {
-                tmp = do_constant_folding(opc, temps[args[1]].val, args[2]);
-                tcg_opt_gen_movi(s, op, args, args[0], tmp);
-                break;
-            }
-            goto do_default;
-
         CASE_OP_32_64(add):
         CASE_OP_32_64(sub):
         CASE_OP_32_64(mul):
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index b7e6861..b4f0818 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -77,7 +77,8 @@ typedef enum {
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_add2_i32         0
 #define TCG_TARGET_HAS_sub2_i32         0
-#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
+#define TCG_TARGET_HAS_extrl_i64_i32    0
+#define TCG_TARGET_HAS_extrh_i64_i32    0
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_rot_i64          1
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 50016a8..d9dc038 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -72,7 +72,8 @@ typedef enum TCGReg {
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
-#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
+#define TCG_TARGET_HAS_extrl_i64_i32    0
+#define TCG_TARGET_HAS_extrh_i64_i32    0
 
 #define TCG_TARGET_HAS_div2_i64         1
 #define TCG_TARGET_HAS_rot_i64          1
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index fe75af0..87f9bcc 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1415,12 +1415,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ext32u_i64:
         tcg_out_arithi(s, a0, a1, 0, SHIFT_SRL);
         break;
-    case INDEX_op_trunc_shr_i64_i32:
-        if (a2 == 0) {
-            tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
-        } else {
-            tcg_out_arithi(s, a0, a1, a2, SHIFT_SRLX);
-        }
+    case INDEX_op_extrl_i64_i32:
+        tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
+        break;
+    case INDEX_op_extrh_i64_i32:
+        tcg_out_arithi(s, a0, a1, 32, SHIFT_SRLX);
         break;
 
     case INDEX_op_brcond_i64:
@@ -1537,7 +1536,8 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_ext32u_i64, { "R", "R" } },
     { INDEX_op_ext_i32_i64, { "R", "r" } },
     { INDEX_op_extu_i32_i64, { "R", "r" } },
-    { INDEX_op_trunc_shr_i64_i32,  { "r", "R" } },
+    { INDEX_op_extrl_i64_i32,  { "r", "R" } },
+    { INDEX_op_extrh_i64_i32,  { "r", "R" } },
 
     { INDEX_op_brcond_i64, { "RZ", "RJ" } },
     { INDEX_op_setcond_i64, { "R", "RZ", "RJ" } },
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 336c47f..2cd72d2 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -118,7 +118,8 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 
-#define TCG_TARGET_HAS_trunc_shr_i64_i32 1
+#define TCG_TARGET_HAS_extrl_i64_i32    1
+#define TCG_TARGET_HAS_extrh_i64_i32    1
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_rot_i64          0
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 7114315..0b9dd8f 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1737,28 +1737,28 @@ void tcg_gen_muls2_i64(TCGv_i64 rl, TCGv_i64 rh, TCGv_i64 arg1, TCGv_i64 arg2)
 
 /* Size changing operations.  */
 
-void tcg_gen_trunc_shr_i64_i32(TCGv_i32 ret, TCGv_i64 arg, unsigned count)
+void tcg_gen_extrl_i64_i32(TCGv_i32 ret, TCGv_i64 arg)
 {
-    tcg_debug_assert(count < 64);
     if (TCG_TARGET_REG_BITS == 32) {
-        if (count >= 32) {
-            tcg_gen_shri_i32(ret, TCGV_HIGH(arg), count - 32);
-        } else if (count == 0) {
-            tcg_gen_mov_i32(ret, TCGV_LOW(arg));
-        } else {
-            TCGv_i64 t = tcg_temp_new_i64();
-            tcg_gen_shri_i64(t, arg, count);
-            tcg_gen_mov_i32(ret, TCGV_LOW(t));
-            tcg_temp_free_i64(t);
-        }
-    } else if (TCG_TARGET_HAS_trunc_shr_i64_i32) {
-        tcg_gen_op3(&tcg_ctx, INDEX_op_trunc_shr_i64_i32,
-                    GET_TCGV_I32(ret), GET_TCGV_I64(arg), count);
-    } else if (count == 0) {
+        tcg_gen_mov_i32(ret, TCGV_LOW(arg));
+    } else if (TCG_TARGET_HAS_extrl_i64_i32) {
+        tcg_gen_op2(&tcg_ctx, INDEX_op_extrl_i64_i32,
+                    GET_TCGV_I32(ret), GET_TCGV_I64(arg));
+    } else {
         tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(arg)));
+    }
+}
+
+void tcg_gen_extrh_i64_i32(TCGv_i32 ret, TCGv_i64 arg)
+{
+    if (TCG_TARGET_REG_BITS == 32) {
+        tcg_gen_mov_i32(ret, TCGV_HIGH(arg));
+    } else if (TCG_TARGET_HAS_extrh_i64_i32) {
+        tcg_gen_op2(&tcg_ctx, INDEX_op_extrh_i64_i32,
+                    GET_TCGV_I32(ret), GET_TCGV_I64(arg));
     } else {
         TCGv_i64 t = tcg_temp_new_i64();
-        tcg_gen_shri_i64(t, arg, count);
+        tcg_gen_shri_i64(t, arg, 32);
         tcg_gen_mov_i32(ret, MAKE_TCGV_I32(GET_TCGV_I64(t)));
         tcg_temp_free_i64(t);
     }
@@ -1818,8 +1818,8 @@ void tcg_gen_extr_i64_i32(TCGv_i32 lo, TCGv_i32 hi, TCGv_i64 arg)
         tcg_gen_mov_i32(lo, TCGV_LOW(arg));
         tcg_gen_mov_i32(hi, TCGV_HIGH(arg));
     } else {
-        tcg_gen_trunc_shr_i64_i32(lo, arg, 0);
-        tcg_gen_trunc_shr_i64_i32(hi, arg, 32);
+        tcg_gen_extrl_i64_i32(lo, arg);
+        tcg_gen_extrh_i64_i32(hi, arg);
     }
 }
 
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index d1d763f..6b59eed 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -684,7 +684,8 @@ static inline void tcg_gen_neg_i64(TCGv_i64 ret, TCGv_i64 arg)
 void tcg_gen_extu_i32_i64(TCGv_i64 ret, TCGv_i32 arg);
 void tcg_gen_ext_i32_i64(TCGv_i64 ret, TCGv_i32 arg);
 void tcg_gen_concat_i32_i64(TCGv_i64 dest, TCGv_i32 low, TCGv_i32 high);
-void tcg_gen_trunc_shr_i64_i32(TCGv_i32 ret, TCGv_i64 arg, unsigned int c);
+void tcg_gen_extrl_i64_i32(TCGv_i32 ret, TCGv_i64 arg);
+void tcg_gen_extrh_i64_i32(TCGv_i32 ret, TCGv_i64 arg);
 void tcg_gen_extr_i64_i32(TCGv_i32 lo, TCGv_i32 hi, TCGv_i64 arg);
 void tcg_gen_extr32_i64(TCGv_i64 lo, TCGv_i64 hi, TCGv_i64 arg);
 
@@ -695,7 +696,7 @@ static inline void tcg_gen_concat32_i64(TCGv_i64 ret, TCGv_i64 lo, TCGv_i64 hi)
 
 static inline void tcg_gen_trunc_i64_i32(TCGv_i32 ret, TCGv_i64 arg)
 {
-    tcg_gen_trunc_shr_i64_i32(ret, arg, 0);
+    tcg_gen_extrl_i64_i32(ret, arg);
 }
 
 /* QEMU specific operations.  */
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index f721a5a..02bbf30 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -141,8 +141,11 @@ DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
 /* size changing ops */
 DEF(ext_i32_i64, 1, 1, 0, IMPL64)
 DEF(extu_i32_i64, 1, 1, 0, IMPL64)
-DEF(trunc_shr_i64_i32, 1, 1, 1,
-    IMPL(TCG_TARGET_HAS_trunc_shr_i64_i32)
+DEF(extrl_i64_i32, 1, 1, 0,
+    IMPL(TCG_TARGET_HAS_extrl_i64_i32)
+    | (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
+DEF(extrh_i64_i32, 1, 1, 0,
+    IMPL(TCG_TARGET_HAS_extrh_i64_i32)
     | (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
 
 DEF(brcond_i64, 0, 2, 2, TCG_OPF_BB_END | IMPL64)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index e7e33b9..f437824 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -66,7 +66,8 @@ typedef uint64_t TCGRegSet;
 
 #if TCG_TARGET_REG_BITS == 32
 /* Turn some undef macros into false macros.  */
-#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
+#define TCG_TARGET_HAS_extrl_i64_i32    0
+#define TCG_TARGET_HAS_extrh_i64_i32    0
 #define TCG_TARGET_HAS_div_i64          0
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_div2_i64         0
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 8b1139b..77e5952 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -84,7 +84,8 @@
 #define TCG_TARGET_HAS_mulsh_i32        0
 
 #if TCG_TARGET_REG_BITS == 64
-#define TCG_TARGET_HAS_trunc_shr_i64_i32 0
+#define TCG_TARGET_HAS_extrl_i64_i32    0
+#define TCG_TARGET_HAS_extrh_i64_i32    0
 #define TCG_TARGET_HAS_bswap16_i64      1
 #define TCG_TARGET_HAS_bswap32_i64      1
 #define TCG_TARGET_HAS_bswap64_i64      1
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 14/17] tcg/i386: use softmmu fast path for unaligned accesses
  2015-08-17 19:38 [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
                   ` (11 preceding siblings ...)
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 12/17] tcg: Split trunc_shr_i32 opcode into extr[lh]_i64_i32 Richard Henderson
@ 2015-08-17 19:38 ` Richard Henderson
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 15/17] tcg/ppc: Improve unaligned load/store handling on 64-bit backend Richard Henderson
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-08-17 19:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Aurelien Jarno

From: Aurelien Jarno <aurelien@aurel32.net>

Softmmu unaligned loads/stores currently go through the slow
path for two reasons:
  - to support unaligned accesses on hosts with strict alignment requirements
  - to correctly handle accesses crossing pages

x86 is only affected by the second reason. Compilers avoid unaligned
accesses, but they are not uncommon. We would therefore like them to
take the fast path when they don't cross pages.

For that we can use the fact that two adjacent TLB entries can't contain
the same page. Therefore accessing the TLB entry corresponding to the
first byte, but comparing its content with the page address of the last
byte, ensures that we don't cross pages. We can do this check without
adding more instructions to the TLB code (only increasing its length by
one byte) by using the LEA instruction to combine the existing move with
the size addition.
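
Concretely, the fast-path condition being encoded is roughly the
following (a standalone C sketch assuming 4K pages, not QEMU code;
names are illustrative):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool fast_path_hit(uint64_t addr, unsigned size, uint64_t tlb_tag)
{
    const uint64_t page_mask = ~(uint64_t)0xfff;   /* 4K pages for the example */
    uint64_t last = addr + (size - 1);             /* what the LEA computes */
    return (last & page_mask) == tlb_tag;          /* mismatch when we cross a page */
}

int main(void)
{
    uint64_t tag = 0x1000;                          /* TLB entry for page 0x1000 */
    printf("%d\n", fast_path_hit(0x1ffc, 4, tag));  /* 1: unaligned, stays in the page */
    printf("%d\n", fast_path_hit(0x1ffd, 4, tag));  /* 0: crosses into the next page */
    return 0;
}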

On an x86-64 host, this gives a 3% boot time improvement for a powerpc
guest and 4% for an x86-64 guest.

[rth: Tidied calculation of the offset mask]

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1436467197-2183-1-git-send-email-aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index 7648f7e..ff55499 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -1172,7 +1172,7 @@ static void * const qemu_st_helpers[16] = {
    First argument register is clobbered.  */
 
 static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
-                                    int mem_index, TCGMemOp s_bits,
+                                    int mem_index, TCGMemOp opc,
                                     tcg_insn_unit **label_ptr, int which)
 {
     const TCGReg r0 = TCG_REG_L0;
@@ -1180,6 +1180,8 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     TCGType ttype = TCG_TYPE_I32;
     TCGType htype = TCG_TYPE_I32;
     int trexw = 0, hrexw = 0;
+    int s_mask = (1 << (opc & MO_SIZE)) - 1;
+    bool aligned = (opc & MO_AMASK) == MO_ALIGN || s_mask == 0;
 
     if (TCG_TARGET_REG_BITS == 64) {
         if (TARGET_LONG_BITS == 64) {
@@ -1193,13 +1195,19 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     }
 
     tcg_out_mov(s, htype, r0, addrlo);
-    tcg_out_mov(s, ttype, r1, addrlo);
+    if (aligned) {
+        tcg_out_mov(s, ttype, r1, addrlo);
+    } else {
+        /* For unaligned access check that we don't cross pages using
+           the page address of the last byte.  */
+        tcg_out_modrm_offset(s, OPC_LEA + trexw, r1, addrlo, s_mask);
+    }
 
     tcg_out_shifti(s, SHIFT_SHR + hrexw, r0,
                    TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
 
     tgen_arithi(s, ARITH_AND + trexw, r1,
-                TARGET_PAGE_MASK | ((1 << s_bits) - 1), 0);
+                TARGET_PAGE_MASK | (aligned ? s_mask : 0), 0);
     tgen_arithi(s, ARITH_AND + hrexw, r0,
                 (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS, 0);
 
@@ -1545,7 +1553,6 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
     TCGMemOp opc;
 #if defined(CONFIG_SOFTMMU)
     int mem_index;
-    TCGMemOp s_bits;
     tcg_insn_unit *label_ptr[2];
 #endif
 
@@ -1558,9 +1565,8 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 
 #if defined(CONFIG_SOFTMMU)
     mem_index = get_mmuidx(oi);
-    s_bits = opc & MO_SIZE;
 
-    tcg_out_tlb_load(s, addrlo, addrhi, mem_index, s_bits,
+    tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
                      label_ptr, offsetof(CPUTLBEntry, addr_read));
 
     /* TLB Hit.  */
@@ -1687,7 +1693,6 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
     TCGMemOp opc;
 #if defined(CONFIG_SOFTMMU)
     int mem_index;
-    TCGMemOp s_bits;
     tcg_insn_unit *label_ptr[2];
 #endif
 
@@ -1700,9 +1705,8 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 
 #if defined(CONFIG_SOFTMMU)
     mem_index = get_mmuidx(oi);
-    s_bits = opc & MO_SIZE;
 
-    tcg_out_tlb_load(s, addrlo, addrhi, mem_index, s_bits,
+    tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
                      label_ptr, offsetof(CPUTLBEntry, addr_write));
 
     /* TLB Hit.  */
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 15/17] tcg/ppc: Improve unaligned load/store handling on 64-bit backend
  2015-08-17 19:38 [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
                   ` (12 preceding siblings ...)
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 14/17] tcg/i386: use softmmu fast path for unaligned accesses Richard Henderson
@ 2015-08-17 19:38 ` Richard Henderson
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 16/17] tcg/s390: Use softmmu fast path for unaligned accesses Richard Henderson
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-08-17 19:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Currently, we get to the slow path for any unaligned access in the
backend, because we effectively preserve the bottom address bits
below the alignment requirement when comparing with the TLB entry,
so any non-0 bit there will cause the compare to fail.

For the same number of instructions, we can instead add the access
size - 1 to the address and stick to clearing all the bottom bits.

That means that normal unaligned accesses will not fall back (the HW
will handle them fine). Only when crossing a page boundary will we
end up with a mismatch, because we'll then be pointing to the next
page, which cannot possibly be in the same TLB entry.
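
Both cases handled below can be illustrated with a small standalone
sketch (plain C with 4K pages assumed, not QEMU code; names and
constants are illustrative): when alignment is required the low bits
stay in the compare value, otherwise the page of the last byte is
compared.

#include <stdint.h>
#include <stdio.h>

static uint64_t compare_value(uint64_t addr, unsigned size, int require_align)
{
    const uint64_t page_mask = ~(uint64_t)0xfff;   /* 4K pages for the example */
    uint64_t s_mask = size - 1;

    if (require_align) {
        return addr & (page_mask | s_mask);        /* misaligned -> can never match */
    }
    return (addr + s_mask) & page_mask;            /* page of the last byte */
}

int main(void)
{
    uint64_t tlb_tag = 0x1000;          /* hypothetical tag stored in the TLB entry */
    /* 4-byte load at 0x1006: misaligned, but it does not cross the page */
    printf("alignment required:     %s\n",
           compare_value(0x1006, 4, 1) == tlb_tag ? "fast path" : "slow path");
    printf("alignment not required: %s\n",
           compare_value(0x1006, 4, 0) == tlb_tag ? "fast path" : "slow path");
    return 0;
}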

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Message-Id: <1437455978.5809.2.camel@kernel.crashing.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc/tcg-target.c | 41 +++++++++++++++++++++++++++++++----------
 1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
index 31fa25c..1672220 100644
--- a/tcg/ppc/tcg-target.c
+++ b/tcg/ppc/tcg-target.c
@@ -1361,7 +1361,7 @@ static void * const qemu_st_helpers[16] = {
    in CR7, loads the addend of the TLB into R3, and returns the register
    containing the guest address (zero-extended into R4).  Clobbers R0 and R2. */
 
-static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp s_bits,
+static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp opc,
                                TCGReg addrlo, TCGReg addrhi,
                                int mem_index, bool is_read)
 {
@@ -1371,6 +1371,7 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp s_bits,
            : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write));
     int add_off = offsetof(CPUArchState, tlb_table[mem_index][0].addend);
     TCGReg base = TCG_AREG0;
+    TCGMemOp s_bits = opc & MO_SIZE;
 
     /* Extract the page index, shifted into place for tlb index.  */
     if (TCG_TARGET_REG_BITS == 64) {
@@ -1422,17 +1423,37 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp s_bits,
        to minimize any load use delay.  */
     tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R3, TCG_REG_R3, add_off);
 
-    /* Clear the non-page, non-alignment bits from the address.  */
+    /* Clear the non-page, non-alignment bits from the address */
     if (TCG_TARGET_REG_BITS == 32 || TARGET_LONG_BITS == 32) {
+        /* We don't support unaligned accesses on 32-bits, preserve
+         * the bottom bits and thus trigger a comparison failure on
+         * unaligned accesses
+         */
         tcg_out_rlw(s, RLWINM, TCG_REG_R0, addrlo, 0,
                     (32 - s_bits) & 31, 31 - TARGET_PAGE_BITS);
-    } else if (!s_bits) {
-        tcg_out_rld(s, RLDICR, TCG_REG_R0, addrlo,
-                    0, 63 - TARGET_PAGE_BITS);
+    } else if (s_bits) {
+        /* > byte access, we need to handle alignment */
+        if ((opc & MO_AMASK) == MO_ALIGN) {
+            /* Alignment required by the front-end, same as 32-bits */
+            tcg_out_rld(s, RLDICL, TCG_REG_R0, addrlo,
+                        64 - TARGET_PAGE_BITS, TARGET_PAGE_BITS - s_bits);
+            tcg_out_rld(s, RLDICL, TCG_REG_R0, TCG_REG_R0, TARGET_PAGE_BITS, 0);
+       } else {
+           /* We support unaligned accesses, we need to make sure we fail
+            * if we cross a page boundary. The trick is to add the
+            * access_size-1 to the address before masking the low bits.
+            * That will make the address overflow to the next page if we
+            * cross a page boundary which will then force a mismatch of
+            * the TLB compare since the next page cannot possibly be in
+            * the same TLB index.
+            */
+            tcg_out32(s, ADDI | TAI(TCG_REG_R0, addrlo, (1 << s_bits) - 1));
+            tcg_out_rld(s, RLDICR, TCG_REG_R0, TCG_REG_R0,
+                        0, 63 - TARGET_PAGE_BITS);
+        }
     } else {
-        tcg_out_rld(s, RLDICL, TCG_REG_R0, addrlo,
-                    64 - TARGET_PAGE_BITS, TARGET_PAGE_BITS - s_bits);
-        tcg_out_rld(s, RLDICL, TCG_REG_R0, TCG_REG_R0, TARGET_PAGE_BITS, 0);
+        /* Byte access, just chop off the bits below the page index */
+        tcg_out_rld(s, RLDICR, TCG_REG_R0, addrlo, 0, 63 - TARGET_PAGE_BITS);
     }
 
     if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
@@ -1592,7 +1613,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
-    addrlo = tcg_out_tlb_read(s, s_bits, addrlo, addrhi, mem_index, true);
+    addrlo = tcg_out_tlb_read(s, opc, addrlo, addrhi, mem_index, true);
 
     /* Load a pointer into the current opcode w/conditional branch-link. */
     label_ptr = s->code_ptr;
@@ -1667,7 +1688,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = get_mmuidx(oi);
-    addrlo = tcg_out_tlb_read(s, s_bits, addrlo, addrhi, mem_index, false);
+    addrlo = tcg_out_tlb_read(s, opc, addrlo, addrhi, mem_index, false);
 
     /* Load a pointer into the current opcode w/conditional branch-link. */
     label_ptr = s->code_ptr;
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 16/17] tcg/s390: Use softmmu fast path for unaligned accesses
  2015-08-17 19:38 [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
                   ` (13 preceding siblings ...)
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 15/17] tcg/ppc: Improve unaligned load/store handling on 64-bit backend Richard Henderson
@ 2015-08-17 19:38 ` Richard Henderson
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 17/17] tcg/aarch64: " Richard Henderson
  2015-08-17 20:06 ` [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
  16 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-08-17 19:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/s390/tcg-target.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 96c3d65..be51c8b 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -1504,20 +1504,36 @@ QEMU_BUILD_BUG_ON(offsetof(CPUArchState, tlb_table[NB_MMU_MODES - 1][1])
 static TCGReg tcg_out_tlb_read(TCGContext* s, TCGReg addr_reg, TCGMemOp opc,
                                int mem_index, bool is_ld)
 {
-    TCGMemOp s_bits = opc & MO_SIZE;
-    uint64_t tlb_mask = TARGET_PAGE_MASK | ((1 << s_bits) - 1);
-    int ofs;
+    int s_mask = (1 << (opc & MO_SIZE)) - 1;
+    int ofs, a_off;
+    uint64_t tlb_mask;
+
+    /* For aligned accesses, we check the first byte and include the alignment
+       bits within the address.  For unaligned access, we check that we don't
+       cross pages using the address of the last byte of the access.  */
+    if ((opc & MO_AMASK) == MO_ALIGN || s_mask == 0) {
+        a_off = 0;
+        tlb_mask = TARGET_PAGE_MASK | s_mask;
+    } else {
+        a_off = s_mask;
+        tlb_mask = TARGET_PAGE_MASK;
+    }
 
     if (facilities & FACILITY_GEN_INST_EXT) {
         tcg_out_risbg(s, TCG_REG_R2, addr_reg,
                       64 - CPU_TLB_BITS - CPU_TLB_ENTRY_BITS,
                       63 - CPU_TLB_ENTRY_BITS,
                       64 + CPU_TLB_ENTRY_BITS - TARGET_PAGE_BITS, 1);
-        tgen_andi_risbg(s, TCG_REG_R3, addr_reg, tlb_mask);
+        if (a_off) {
+            tcg_out_insn(s, RX, LA, TCG_REG_R3, addr_reg, TCG_REG_NONE, a_off);
+            tgen_andi(s, TCG_TYPE_TL, TCG_REG_R3, tlb_mask);
+        } else {
+            tgen_andi_risbg(s, TCG_REG_R3, addr_reg, tlb_mask);
+        }
     } else {
         tcg_out_sh64(s, RSY_SRLG, TCG_REG_R2, addr_reg, TCG_REG_NONE,
                      TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
-        tcg_out_movi(s, TCG_TYPE_TL, TCG_REG_R3, addr_reg);
+        tcg_out_insn(s, RX, LA, TCG_REG_R3, addr_reg, TCG_REG_NONE, a_off);
         tgen_andi(s, TCG_TYPE_I64, TCG_REG_R2,
                   (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
         tgen_andi(s, TCG_TYPE_TL, TCG_REG_R3, tlb_mask);
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 17/17] tcg/aarch64: Use softmmu fast path for unaligned accesses
  2015-08-17 19:38 [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
                   ` (14 preceding siblings ...)
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 16/17] tcg/s390: Use softmmu fast path for unaligned accesses Richard Henderson
@ 2015-08-17 19:38 ` Richard Henderson
  2015-08-17 20:06 ` [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
  16 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-08-17 19:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 7f7ab7e..bc3a539 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1051,14 +1051,29 @@ static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOpIdx oi,
    slow path for the failure case, which will be patched later when finalizing
    the slow path. Generated code returns the host addend in X1,
    clobbers X0,X2,X3,TMP. */
-static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp s_bits,
+static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp opc,
                              tcg_insn_unit **label_ptr, int mem_index,
                              bool is_read)
 {
-    TCGReg base = TCG_AREG0;
     int tlb_offset = is_read ?
         offsetof(CPUArchState, tlb_table[mem_index][0].addr_read)
         : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write);
+    int s_mask = (1 << (opc & MO_SIZE)) - 1;
+    TCGReg base = TCG_AREG0, x3;
+    uint64_t tlb_mask;
+
+    /* For aligned accesses, we check the first byte and include the alignment
+       bits within the address.  For unaligned access, we check that we don't
+       cross pages using the address of the last byte of the access.  */
+    if ((opc & MO_AMASK) == MO_ALIGN || s_mask == 0) {
+        tlb_mask = TARGET_PAGE_MASK | s_mask;
+        x3 = addr_reg;
+    } else {
+        tcg_out_insn(s, 3401, ADDI, TARGET_LONG_BITS == 64,
+                     TCG_REG_X3, addr_reg, s_mask);
+        tlb_mask = TARGET_PAGE_MASK;
+        x3 = TCG_REG_X3;
+    }
 
     /* Extract the TLB index from the address into X0.
        X0<CPU_TLB_BITS:0> =
@@ -1066,11 +1081,9 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp s_bits,
     tcg_out_ubfm(s, TARGET_LONG_BITS == 64, TCG_REG_X0, addr_reg,
                  TARGET_PAGE_BITS, TARGET_PAGE_BITS + CPU_TLB_BITS);
 
-    /* Store the page mask part of the address and the low s_bits into X3.
-       Later this allows checking for equality and alignment at the same time.
-       X3 = addr_reg & (PAGE_MASK | ((1 << s_bits) - 1)) */
-    tcg_out_logicali(s, I3404_ANDI, TARGET_LONG_BITS == 64, TCG_REG_X3,
-                     addr_reg, TARGET_PAGE_MASK | ((1 << s_bits) - 1));
+    /* Store the page mask part of the address into X3.  */
+    tcg_out_logicali(s, I3404_ANDI, TARGET_LONG_BITS == 64,
+                     TCG_REG_X3, x3, tlb_mask);
 
     /* Add any "high bits" from the tlb offset to the env address into X2,
        to take advantage of the LSL12 form of the ADDI instruction.
@@ -1207,10 +1220,9 @@ static void tcg_out_qemu_ld(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
     const TCGType otype = TARGET_LONG_BITS == 64 ? TCG_TYPE_I64 : TCG_TYPE_I32;
 #ifdef CONFIG_SOFTMMU
     unsigned mem_index = get_mmuidx(oi);
-    TCGMemOp s_bits = memop & MO_SIZE;
     tcg_insn_unit *label_ptr;
 
-    tcg_out_tlb_read(s, addr_reg, s_bits, &label_ptr, mem_index, 1);
+    tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 1);
     tcg_out_qemu_ld_direct(s, memop, ext, data_reg,
                            TCG_REG_X1, otype, addr_reg);
     add_qemu_ldst_label(s, true, oi, ext, data_reg, addr_reg,
@@ -1229,14 +1241,13 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
     const TCGType otype = TARGET_LONG_BITS == 64 ? TCG_TYPE_I64 : TCG_TYPE_I32;
 #ifdef CONFIG_SOFTMMU
     unsigned mem_index = get_mmuidx(oi);
-    TCGMemOp s_bits = memop & MO_SIZE;
     tcg_insn_unit *label_ptr;
 
-    tcg_out_tlb_read(s, addr_reg, s_bits, &label_ptr, mem_index, 0);
+    tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 0);
     tcg_out_qemu_st_direct(s, memop, data_reg,
                            TCG_REG_X1, otype, addr_reg);
-    add_qemu_ldst_label(s, false, oi, s_bits == MO_64, data_reg, addr_reg,
-                        s->code_ptr, label_ptr);
+    add_qemu_ldst_label(s, false, oi, (memop & MO_SIZE)== MO_64,
+                        data_reg, addr_reg, s->code_ptr, label_ptr);
 #else /* !CONFIG_SOFTMMU */
     tcg_out_qemu_st_direct(s, memop, data_reg,
                            GUEST_BASE ? TCG_REG_GUEST_BASE : TCG_REG_XZR,
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [PATCH 08/17] tcg: implement real ext_i32_i64 and extu_i32_i64 ops
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 08/17] tcg: implement real ext_i32_i64 and extu_i32_i64 ops Richard Henderson
@ 2015-08-17 19:51   ` Claudio Fontana
  0 siblings, 0 replies; 20+ messages in thread
From: Claudio Fontana @ 2015-08-17 19:51 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, Stefan Weil, Claudio Fontana, qemu-devel,
	Alexander Graf, Blue Swirl, Aurelien Jarno

On Monday, August 17, 2015, Richard Henderson <rth@twiddle.net> wrote:

> From: Aurelien Jarno <aurelien@aurel32.net>
>
> Implement real ext_i32_i64 and extu_i32_i64 ops. They ensure that a
> 32-bit value is always converted to a 64-bit value and not propagated
> through the register allocator or the optimizer.
>
> Cc: Andrzej Zaborowski <balrogg@gmail.com>
> Cc: Alexander Graf <agraf@suse.de>
> Cc: Blue Swirl <blauwirbel@gmail.com>
> Cc: Claudio Fontana <claudio.fontana@huawei.com>
> Cc: Claudio Fontana <claudio.fontana@gmail.com>
> Cc: Richard Henderson <rth@twiddle.net>
> Cc: Stefan Weil <sw@weilnetz.de>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>


Acked-by: Claudio Fontana <claudio.fontana@huawei.com>


> ---
>  tcg/aarch64/tcg-target.c |  4 ++++
>  tcg/i386/tcg-target.c    |  5 +++++
>  tcg/ia64/tcg-target.c    |  4 ++++
>  tcg/ppc/tcg-target.c     |  6 ++++++
>  tcg/s390/tcg-target.c    |  5 +++++
>  tcg/sparc/tcg-target.c   |  8 ++++++--
>  tcg/tcg-op.c             | 10 ++++------
>  tcg/tcg-opc.h            |  3 +++
>  tcg/tci/tcg-target.c     |  4 ++++
>  tci.c                    |  6 ++++--
>  10 files changed, 45 insertions(+), 10 deletions(-)
>
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index b7ec4f5..7f7ab7e 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -1556,6 +1556,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>      case INDEX_op_ext16s_i32:
>          tcg_out_sxt(s, ext, MO_16, a0, a1);
>          break;
> +    case INDEX_op_ext_i32_i64:
>      case INDEX_op_ext32s_i64:
>          tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a1);
>          break;
> @@ -1567,6 +1568,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>      case INDEX_op_ext16u_i32:
>          tcg_out_uxt(s, MO_16, a0, a1);
>          break;
> +    case INDEX_op_extu_i32_i64:
>      case INDEX_op_ext32u_i64:
>          tcg_out_movr(s, TCG_TYPE_I32, a0, a1);
>          break;
> @@ -1712,6 +1714,8 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
>      { INDEX_op_ext8u_i64, { "r", "r" } },
>      { INDEX_op_ext16u_i64, { "r", "r" } },
>      { INDEX_op_ext32u_i64, { "r", "r" } },
> +    { INDEX_op_ext_i32_i64, { "r", "r" } },
> +    { INDEX_op_extu_i32_i64, { "r", "r" } },
>
>      { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
>      { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
> diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
> index 887f22f..7648f7e 100644
> --- a/tcg/i386/tcg-target.c
> +++ b/tcg/i386/tcg-target.c
> @@ -2064,9 +2064,11 @@ static inline void tcg_out_op(TCGContext *s,
> TCGOpcode opc,
>      case INDEX_op_bswap64_i64:
>          tcg_out_bswap64(s, args[0]);
>          break;
> +    case INDEX_op_extu_i32_i64:
>      case INDEX_op_ext32u_i64:
>          tcg_out_ext32u(s, args[0], args[1]);
>          break;
> +    case INDEX_op_ext_i32_i64:
>      case INDEX_op_ext32s_i64:
>          tcg_out_ext32s(s, args[0], args[1]);
>          break;
> @@ -2201,6 +2203,9 @@ static const TCGTargetOpDef x86_op_defs[] = {
>      { INDEX_op_ext16u_i64, { "r", "r" } },
>      { INDEX_op_ext32u_i64, { "r", "r" } },
>
> +    { INDEX_op_ext_i32_i64, { "r", "r" } },
> +    { INDEX_op_extu_i32_i64, { "r", "r" } },
> +
>      { INDEX_op_deposit_i64, { "Q", "0", "Q" } },
>      { INDEX_op_movcond_i64, { "r", "r", "re", "r", "0" } },
>
> diff --git a/tcg/ia64/tcg-target.c b/tcg/ia64/tcg-target.c
> index 81cb9f7..71e79cf 100644
> --- a/tcg/ia64/tcg-target.c
> +++ b/tcg/ia64/tcg-target.c
> @@ -2148,9 +2148,11 @@ static inline void tcg_out_op(TCGContext *s,
> TCGOpcode opc,
>      case INDEX_op_ext16u_i64:
>          tcg_out_ext(s, OPC_ZXT2_I29, args[0], args[1]);
>          break;
> +    case INDEX_op_ext_i32_i64:
>      case INDEX_op_ext32s_i64:
>          tcg_out_ext(s, OPC_SXT4_I29, args[0], args[1]);
>          break;
> +    case INDEX_op_extu_i32_i64:
>      case INDEX_op_ext32u_i64:
>          tcg_out_ext(s, OPC_ZXT4_I29, args[0], args[1]);
>          break;
> @@ -2301,6 +2303,8 @@ static const TCGTargetOpDef ia64_op_defs[] = {
>      { INDEX_op_ext16u_i64, { "r", "rZ"} },
>      { INDEX_op_ext32s_i64, { "r", "rZ"} },
>      { INDEX_op_ext32u_i64, { "r", "rZ"} },
> +    { INDEX_op_ext_i32_i64, { "r", "rZ" } },
> +    { INDEX_op_extu_i32_i64, { "r", "rZ" } },
>
>      { INDEX_op_bswap16_i64, { "r", "rZ" } },
>      { INDEX_op_bswap32_i64, { "r", "rZ" } },
> diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
> index 2b6eafa..31fa25c 100644
> --- a/tcg/ppc/tcg-target.c
> +++ b/tcg/ppc/tcg-target.c
> @@ -2200,12 +2200,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode
> opc, const TCGArg *args,
>      case INDEX_op_ext16s_i64:
>          c = EXTSH;
>          goto gen_ext;
> +    case INDEX_op_ext_i32_i64:
>      case INDEX_op_ext32s_i64:
>          c = EXTSW;
>          goto gen_ext;
>      gen_ext:
>          tcg_out32(s, c | RS(args[1]) | RA(args[0]));
>          break;
> +    case INDEX_op_extu_i32_i64:
> +        tcg_out_ext32u(s, args[0], args[1]);
> +        break;
>
>      case INDEX_op_setcond_i32:
>          tcg_out_setcond(s, TCG_TYPE_I32, args[3], args[0], args[1],
> args[2],
> @@ -2482,6 +2486,8 @@ static const TCGTargetOpDef ppc_op_defs[] = {
>      { INDEX_op_ext8s_i64, { "r", "r" } },
>      { INDEX_op_ext16s_i64, { "r", "r" } },
>      { INDEX_op_ext32s_i64, { "r", "r" } },
> +    { INDEX_op_ext_i32_i64, { "r", "r" } },
> +    { INDEX_op_extu_i32_i64, { "r", "r" } },
>      { INDEX_op_bswap16_i64, { "r", "r" } },
>      { INDEX_op_bswap32_i64, { "r", "r" } },
>      { INDEX_op_bswap64_i64, { "r", "r" } },
> diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
> index 921991e..a16fbec 100644
> --- a/tcg/s390/tcg-target.c
> +++ b/tcg/s390/tcg-target.c
> @@ -2090,6 +2090,7 @@ static inline void tcg_out_op(TCGContext *s,
> TCGOpcode opc,
>      case INDEX_op_ext16s_i64:
>          tgen_ext16s(s, TCG_TYPE_I64, args[0], args[1]);
>          break;
> +    case INDEX_op_ext_i32_i64:
>      case INDEX_op_ext32s_i64:
>          tgen_ext32s(s, args[0], args[1]);
>          break;
> @@ -2099,6 +2100,7 @@ static inline void tcg_out_op(TCGContext *s,
> TCGOpcode opc,
>      case INDEX_op_ext16u_i64:
>          tgen_ext16u(s, TCG_TYPE_I64, args[0], args[1]);
>          break;
> +    case INDEX_op_extu_i32_i64:
>      case INDEX_op_ext32u_i64:
>          tgen_ext32u(s, args[0], args[1]);
>          break;
> @@ -2251,6 +2253,9 @@ static const TCGTargetOpDef s390_op_defs[] = {
>      { INDEX_op_ext32s_i64, { "r", "r" } },
>      { INDEX_op_ext32u_i64, { "r", "r" } },
>
> +    { INDEX_op_ext_i32_i64, { "r", "r" } },
> +    { INDEX_op_extu_i32_i64, { "r", "r" } },
> +
>      { INDEX_op_bswap16_i64, { "r", "r" } },
>      { INDEX_op_bswap32_i64, { "r", "r" } },
>      { INDEX_op_bswap64_i64, { "r", "r" } },
> diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
> index b23032b..fe75af0 100644
> --- a/tcg/sparc/tcg-target.c
> +++ b/tcg/sparc/tcg-target.c
> @@ -1407,9 +1407,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>      case INDEX_op_divu_i64:
>          c = ARITH_UDIVX;
>          goto gen_arith;
> +    case INDEX_op_ext_i32_i64:
>      case INDEX_op_ext32s_i64:
>          tcg_out_arithi(s, a0, a1, 0, SHIFT_SRA);
>          break;
> +    case INDEX_op_extu_i32_i64:
>      case INDEX_op_ext32u_i64:
>          tcg_out_arithi(s, a0, a1, 0, SHIFT_SRL);
>          break;
> @@ -1531,8 +1533,10 @@ static const TCGTargetOpDef sparc_op_defs[] = {
>      { INDEX_op_neg_i64, { "R", "RJ" } },
>      { INDEX_op_not_i64, { "R", "RJ" } },
>
> -    { INDEX_op_ext32s_i64, { "R", "r" } },
> -    { INDEX_op_ext32u_i64, { "R", "r" } },
> +    { INDEX_op_ext32s_i64, { "R", "R" } },
> +    { INDEX_op_ext32u_i64, { "R", "R" } },
> +    { INDEX_op_ext_i32_i64, { "R", "r" } },
> +    { INDEX_op_extu_i32_i64, { "R", "r" } },
>      { INDEX_op_trunc_shr_i64_i32,  { "r", "R" } },
>
>      { INDEX_op_brcond_i64, { "RZ", "RJ" } },
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 0e79fd1..7114315 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -1770,9 +1770,8 @@ void tcg_gen_extu_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
>          tcg_gen_mov_i32(TCGV_LOW(ret), arg);
>          tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
>      } else {
> -        /* Note: we assume the target supports move between
> -           32 and 64 bit registers.  */
> -        tcg_gen_ext32u_i64(ret, MAKE_TCGV_I64(GET_TCGV_I32(arg)));
> +        tcg_gen_op2(&tcg_ctx, INDEX_op_extu_i32_i64,
> +                    GET_TCGV_I64(ret), GET_TCGV_I32(arg));
>      }
>  }
>
> @@ -1782,9 +1781,8 @@ void tcg_gen_ext_i32_i64(TCGv_i64 ret, TCGv_i32 arg)
>          tcg_gen_mov_i32(TCGV_LOW(ret), arg);
>          tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
>      } else {
> -        /* Note: we assume the target supports move between
> -           32 and 64 bit registers.  */
> -        tcg_gen_ext32s_i64(ret, MAKE_TCGV_I64(GET_TCGV_I32(arg)));
> +        tcg_gen_op2(&tcg_ctx, INDEX_op_ext_i32_i64,
> +                    GET_TCGV_I64(ret), GET_TCGV_I32(arg));
>      }
>  }
>
> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
> index 4a34f43..f721a5a 100644
> --- a/tcg/tcg-opc.h
> +++ b/tcg/tcg-opc.h
> @@ -138,6 +138,9 @@ DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
>  DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
>  DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
>
> +/* size changing ops */
> +DEF(ext_i32_i64, 1, 1, 0, IMPL64)
> +DEF(extu_i32_i64, 1, 1, 0, IMPL64)
>  DEF(trunc_shr_i64_i32, 1, 1, 1,
>      IMPL(TCG_TARGET_HAS_trunc_shr_i64_i32)
>      | (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
> diff --git a/tcg/tci/tcg-target.c b/tcg/tci/tcg-target.c
> index 83472db..bbb54d4 100644
> --- a/tcg/tci/tcg-target.c
> +++ b/tcg/tci/tcg-target.c
> @@ -210,6 +210,8 @@ static const TCGTargetOpDef tcg_target_op_defs[] = {
>  #if TCG_TARGET_HAS_ext32u_i64
>      { INDEX_op_ext32u_i64, { R, R } },
>  #endif
> +    { INDEX_op_ext_i32_i64, { R, R } },
> +    { INDEX_op_extu_i32_i64, { R, R } },
>  #if TCG_TARGET_HAS_bswap16_i64
>      { INDEX_op_bswap16_i64, { R, R } },
>  #endif
> @@ -701,6 +703,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>      case INDEX_op_ext16u_i64:   /* Optional (TCG_TARGET_HAS_ext16u_i64). */
>      case INDEX_op_ext32s_i64:   /* Optional (TCG_TARGET_HAS_ext32s_i64). */
>      case INDEX_op_ext32u_i64:   /* Optional (TCG_TARGET_HAS_ext32u_i64). */
> +    case INDEX_op_ext_i32_i64:
> +    case INDEX_op_extu_i32_i64:
>  #endif /* TCG_TARGET_REG_BITS == 64 */
>      case INDEX_op_neg_i32:      /* Optional (TCG_TARGET_HAS_neg_i32). */
>      case INDEX_op_not_i32:      /* Optional (TCG_TARGET_HAS_not_i32). */
> diff --git a/tci.c b/tci.c
> index 8444948..3d6d177 100644
> --- a/tci.c
> +++ b/tci.c
> @@ -1033,18 +1033,20 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
>  #endif
>  #if TCG_TARGET_HAS_ext32s_i64
>          case INDEX_op_ext32s_i64:
> +#endif
> +        case INDEX_op_ext_i32_i64:
>              t0 = *tb_ptr++;
>              t1 = tci_read_r32s(&tb_ptr);
>              tci_write_reg64(t0, t1);
>              break;
> -#endif
>  #if TCG_TARGET_HAS_ext32u_i64
>          case INDEX_op_ext32u_i64:
> +#endif
> +        case INDEX_op_extu_i32_i64:
>              t0 = *tb_ptr++;
>              t1 = tci_read_r32(&tb_ptr);
>              tci_write_reg64(t0, t1);
>              break;
> -#endif
>  #if TCG_TARGET_HAS_bswap16_i64
>          case INDEX_op_bswap16_i64:
>              TODO();
> --
> 2.4.3
>
>

-- 
Claudio Fontana
Werner-Friedmann-Bogen 28, D-80993 Muenchen
email:  claudio.fontana@gmail.com
tel mobile (it): +39 342 1714257
tel mobile (work): +49 15253060158

[-- Attachment #2: Type: text/html, Size: 15012 bytes --]
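
As a rough illustration of the opcodes added by the quoted patch (not part of the patch itself): on a 64-bit host a frontend now widens a 32-bit temp through the usual helpers, which emit the dedicated ext_i32_i64/extu_i32_i64 ops instead of retyping the temp and reusing ext32s/ext32u_i64. The widen_* wrapper names below are hypothetical and exist only for this sketch; the tcg_gen_* calls are the real API.

    /* Hypothetical wrappers around the real tcg_gen_* helpers.  */
    static TCGv_i64 widen_signed(TCGv_i32 src)
    {
        TCGv_i64 dst = tcg_temp_new_i64();
        tcg_gen_ext_i32_i64(dst, src);    /* sign-extend 32 -> 64 */
        return dst;
    }

    static TCGv_i64 widen_unsigned(TCGv_i32 src)
    {
        TCGv_i64 dst = tcg_temp_new_i64();
        tcg_gen_extu_i32_i64(dst, src);   /* zero-extend 32 -> 64 */
        return dst;
    }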

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 00/17] queued tcg improvements
  2015-08-17 19:38 [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
                   ` (15 preceding siblings ...)
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 17/17] tcg/aarch64: " Richard Henderson
@ 2015-08-17 20:06 ` Richard Henderson
  16 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2015-08-17 20:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Gah.  Forgot to change the subject tag to PULL.

On 08/17/2015 12:38 PM, Richard Henderson wrote:
> This pull includes three independent patch sets, which were
> all posted during the 2.4 freeze.
> 
> The first is algorithmic improvements to tcg/optimize, both
> improving its runtime and its tracking of constants.
> 
> The second is improvements to the representation of 32<->64-bit
> size changing operations.  Still to do here is investigate how
> these might be best applied to each tcg host.
> 
> The third is improvements to how guest unaligned accesses are
> implemented in softmmu mode, for the 4 supported  host processors
> that themselves implement unaligned accesses.
> 
> 
> r~
> 
> 
> The following changes since commit 074a9925e1cfd659d5376dcaccd1436d3840e611:
> 
>   Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging (2015-08-14 16:52:34 +0100)
> 
> are available in the git repository at:
> 
>   git://github.com/rth7680/qemu.git tags/pull-tcg-20150817
> 
> for you to fetch changes up to 845fa7960c4fc7ab936e7ec45076ef230266d6cd:
> 
>   tcg/aarch64: Use softmmu fast path for unaligned accesses (2015-08-17 12:18:05 -0700)


r~

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 10/17] tcg/optimize: do not remember garbage high bits for 32-bit ops
  2015-08-17 19:38 ` [Qemu-devel] [PATCH 10/17] tcg/optimize: do not remember garbage high bits for 32-bit ops Richard Henderson
@ 2015-08-18  8:35   ` Aurelien Jarno
  0 siblings, 0 replies; 20+ messages in thread
From: Aurelien Jarno @ 2015-08-18  8:35 UTC (permalink / raw)
  To: Richard Henderson; +Cc: peter.maydell, qemu-devel

On 2015-08-17 12:38, Richard Henderson wrote:
> From: Aurelien Jarno <aurelien@aurel32.net>
> 
> Now that we have real size changing ops, we don't need to mark high
> bits of the destination as garbage. The goal of the optimizer is to
> predict the value of the temps (and not of the registers) and do
> simplifications when possible. The problem there is therefore not the
> fact that those bits are not counted as garbage, but that a size
> changing op is replaced by a move.
> 
> This patch is basically a revert of 24666baf, including the changes that
> have been made since then.
> 
> Cc: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
> ---
>  tcg/optimize.c | 41 +++++++++++------------------------------
>  1 file changed, 11 insertions(+), 30 deletions(-)

Self NACK. This patch breaks things, given that !(def->flags & TCG_OPF_64BIT)
doesn't necessarily mean a 32-bit result, at least in the case of the call
opcode.
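
For reference, the assumption in question looks roughly like the sketch
below (a paraphrase, not a quote of the actual patch): an op whose TCGOpDef
lacks TCG_OPF_64BIT is treated as producing a 32-bit value, so its known-bits
mask gets truncated -- but INDEX_op_call also lacks that flag even though a
helper call may return a full 64-bit value.

    /* Rough sketch of the problematic assumption.  */
    const TCGOpDef *def = &tcg_op_defs[opc];
    if (!(def->flags & TCG_OPF_64BIT)) {
        mask = (uint32_t)mask;   /* wrong for a call returning i64 */
    }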

I'll try to send an updated patch that handles this correctly in the next
few days (and catch up on other QEMU things); I am currently busy with
other free software work.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-08-18  8:35 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-17 19:38 [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
2015-08-17 19:38 ` [Qemu-devel] [PATCH 01/17] tcg/optimize: fix constant signedness Richard Henderson
2015-08-17 19:38 ` [Qemu-devel] [PATCH 02/17] tcg/optimize: optimize temps tracking Richard Henderson
2015-08-17 19:38 ` [Qemu-devel] [PATCH 03/17] tcg/optimize: add temp_is_const and temp_is_copy functions Richard Henderson
2015-08-17 19:38 ` [Qemu-devel] [PATCH 04/17] tcg/optimize: track const/copy status separately Richard Henderson
2015-08-17 19:38 ` [Qemu-devel] [PATCH 05/17] tcg/optimize: allow constant to have copies Richard Henderson
2015-08-17 19:38 ` [Qemu-devel] [PATCH 06/17] tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 Richard Henderson
2015-08-17 19:38 ` [Qemu-devel] [PATCH 07/17] tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32 Richard Henderson
2015-08-17 19:38 ` [Qemu-devel] [PATCH 08/17] tcg: implement real ext_i32_i64 and extu_i32_i64 ops Richard Henderson
2015-08-17 19:51   ` Claudio Fontana
2015-08-17 19:38 ` [Qemu-devel] [PATCH 09/17] tcg/optimize: add optimizations for " Richard Henderson
2015-08-17 19:38 ` [Qemu-devel] [PATCH 10/17] tcg/optimize: do not remember garbage high bits for 32-bit ops Richard Henderson
2015-08-18  8:35   ` Aurelien Jarno
2015-08-17 19:38 ` [Qemu-devel] [PATCH 11/17] tcg: update README about size changing ops Richard Henderson
2015-08-17 19:38 ` [Qemu-devel] [PATCH 12/17] tcg: Split trunc_shr_i32 opcode into extr[lh]_i64_i32 Richard Henderson
2015-08-17 19:38 ` [Qemu-devel] [PATCH 14/17] tcg/i386: use softmmu fast path for unaligned accesses Richard Henderson
2015-08-17 19:38 ` [Qemu-devel] [PATCH 15/17] tcg/ppc: Improve unaligned load/store handling on 64-bit backend Richard Henderson
2015-08-17 19:38 ` [Qemu-devel] [PATCH 16/17] tcg/s390: Use softmmu fast path for unaligned accesses Richard Henderson
2015-08-17 19:38 ` [Qemu-devel] [PATCH 17/17] tcg/aarch64: " Richard Henderson
2015-08-17 20:06 ` [Qemu-devel] [PATCH 00/17] queued tcg improvements Richard Henderson
