* [PATCH v2 00/36] tcg 5.1 omnibus patch set
@ 2020-04-22  1:16 Richard Henderson
  2020-04-22  1:16 ` [PATCH v2 01/36] tcg: Add tcg_gen_gvec_dup_imm Richard Henderson
                   ` (36 more replies)
  0 siblings, 37 replies; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:16 UTC
  To: qemu-devel; +Cc: alex.bennee

In v1, I split this into 4 logically distinct parts.  But there are
apparently minor interdependencies: Alex reports that the later sets
would not apply standalone.

Rather than tease them apart, and then have to undo that work in
order to actually apply them later, I'll just lump them together.

So:

  Part 1, patches 1-7, tcg_gen_gvec_dup_imm, is reviewed.

  Part 2, patch 8, vector tail clearing, is reviewed, and I have
          moved the target/arm patches into a different queue.

  Part 3, patches 9-25, TYPE_CONST temporaries, is mostly unreviewed.

  Part 4, patch 26, load_dest for GVecGen2, a support patch for SVE2.

  Part 5, patches 27-36, add vector rotate patterns, is brand new.
          I include two demonstrators for target/ppc and target/s390x.
          It will also be used by SVE2.
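
As a rough illustration of what "rotate" means at the element level
(not of the TCG expansion itself), here is a plain-C model of rotating
every 32-bit lane left by the same immediate count.  The names rotl32
and rotl_lanes are hypothetical and exist only for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits; n is reduced modulo 32. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    /* Masking the right-shift count avoids a shift by 32 when n == 0. */
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Hypothetical reference: rotate every 32-bit lane by the same count. */
static void rotl_lanes(uint32_t *d, const uint32_t *a, size_t lanes,
                       unsigned n)
{
    for (size_t i = 0; i < lanes; i++) {
        d[i] = rotl32(a[i], n);
    }
}
```

Masking the count keeps n == 0 (and n == 32) well defined in C, where
shifting a 32-bit value by 32 would be undefined behaviour.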


r~

Richard Henderson (36):
  tcg: Add tcg_gen_gvec_dup_imm
  target/s390x: Use tcg_gen_gvec_dup_imm
  target/ppc: Use tcg_gen_gvec_dup_imm
  target/arm: Use tcg_gen_gvec_dup_imm
  tcg: Use tcg_gen_gvec_dup_imm in logical simplifications
  tcg: Remove tcg_gen_gvec_dup{8,16,32,64}i
  tcg: Add tcg_gen_gvec_dup_tl
  tcg: Improve vector tail clearing
  tcg: Consolidate 3 bits into enum TCGTempKind
  tcg: Add temp_readonly
  tcg: Introduce TYPE_CONST temporaries
  tcg: Use tcg_constant_i32 with icount expander
  tcg: Use tcg_constant_{i32,i64} with tcg int expanders
  tcg: Use tcg_constant_{i32,vec} with tcg vec expanders
  tcg: Use tcg_constant_{i32,i64} with tcg plugins
  tcg: Rename struct tcg_temp_info to TempOptInfo
  tcg/optimize: Adjust TempOptInfo allocation
  tcg/optimize: Use tcg_constant_internal with constant folding
  tcg/tci: Add special tci_movi_{i32,i64} opcodes
  tcg: Remove movi and dupi opcodes
  tcg: Use tcg_out_dupi_vec from temp_load
  tcg: Increase tcg_out_dupi_vec immediate to int64_t
  tcg: Add tcg_reg_alloc_dup2
  tcg/i386: Use tcg_constant_vec with tcg vec expanders
  tcg: Remove tcg_gen_dup{8,16,32,64}i_vec
  tcg: Add load_dest parameter to GVecGen2
  tcg: Fix integral argument type to tcg_gen_rot[rl]i_i{32,64}
  tcg: Implement gvec support for rotate by immediate
  tcg: Implement gvec support for rotate by vector
  tcg: Remove expansion to shift by vector from do_shifts
  tcg: Implement gvec support for rotate by scalar
  tcg/i386: Implement INDEX_op_rotl[is]_vec
  tcg/aarch64: Implement INDEX_op_rotli_vec
  tcg/ppc: Implement INDEX_op_rot[lr]v_vec
  target/ppc: Use tcg_gen_gvec_rotlv
  target/s390x: Use tcg_gen_gvec_rotl{i,s,v}

 accel/tcg/tcg-runtime.h             |  15 ++
 include/exec/gen-icount.h           |  25 +-
 include/tcg/tcg-op-gvec.h           |  25 +-
 include/tcg/tcg-op.h                |  30 +--
 include/tcg/tcg-opc.h               |  15 +-
 include/tcg/tcg.h                   |  53 +++-
 target/ppc/helper.h                 |   4 -
 target/s390x/helper.h               |   4 -
 tcg/aarch64/tcg-target.h            |   3 +
 tcg/aarch64/tcg-target.opc.h        |   1 +
 tcg/i386/tcg-target.h               |   3 +
 tcg/ppc/tcg-target.h                |   3 +
 tcg/ppc/tcg-target.opc.h            |   1 -
 accel/tcg/plugin-gen.c              |  49 ++--
 accel/tcg/tcg-runtime-gvec.c        | 144 +++++++++++
 target/arm/translate-a64.c          |  10 +-
 target/arm/translate-sve.c          |  12 +-
 target/arm/translate.c              |   9 +-
 target/ppc/int_helper.c             |  17 --
 target/ppc/translate/vmx-impl.inc.c |  40 +--
 target/ppc/translate/vsx-impl.inc.c |   2 +-
 target/s390x/translate_vx.inc.c     | 107 ++------
 target/s390x/vec_int_helper.c       |  31 ---
 tcg/aarch64/tcg-target.inc.c        |  32 ++-
 tcg/arm/tcg-target.inc.c            |   1 -
 tcg/i386/tcg-target.inc.c           | 195 ++++++++++-----
 tcg/mips/tcg-target.inc.c           |   2 -
 tcg/optimize.c                      | 204 +++++++--------
 tcg/ppc/tcg-target.inc.c            |  47 ++--
 tcg/riscv/tcg-target.inc.c          |   2 -
 tcg/s390/tcg-target.inc.c           |   2 -
 tcg/sparc/tcg-target.inc.c          |   2 -
 tcg/tcg-op-gvec.c                   | 374 +++++++++++++++++++++++-----
 tcg/tcg-op-vec.c                    | 218 +++++++++++-----
 tcg/tcg-op.c                        | 232 ++++++++---------
 tcg/tcg.c                           | 347 ++++++++++++++++++++------
 tcg/tci.c                           |   4 +-
 tcg/tci/tcg-target.inc.c            |   6 +-
 target/s390x/insn-data.def          |   4 +-
 tcg/README                          |   7 +-
 40 files changed, 1490 insertions(+), 792 deletions(-)

-- 
2.20.1




* [PATCH v2 01/36] tcg: Add tcg_gen_gvec_dup_imm
From: Richard Henderson @ 2020-04-22  1:16 UTC
  To: qemu-devel; +Cc: alex.bennee, LIU Zhiwei, David Hildenbrand

Add a version of tcg_gen_gvec_dup_* that takes both an immediate and
a vector element size operand.  This will replace the set of
tcg_gen_gvec_dup{8,16,32,64}i functions that encode the element
size within the function name.
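
A sketch of the intended semantics, assuming the immediate is reduced
to the element size selected by vece before being replicated across
the vector (so a sign-extended negative immediate behaves like its
truncated element value).  It models a single 64-bit word; the names
replicate_imm and ES8..ES64 are hypothetical, with ES8..ES64 using the
same 0..3 encoding as MO_8..MO_64.

```c
#include <stdint.h>

/* Element sizes, following the log2 encoding used by vece. */
enum { ES8 = 0, ES16 = 1, ES32 = 2, ES64 = 3 };

/*
 * Hypothetical model: keep the low (8 << vece) bits of imm and repeat
 * them across a 64-bit word.  Bits above the element width are dropped.
 */
static uint64_t replicate_imm(unsigned vece, uint64_t imm)
{
    switch (vece) {
    case ES8:
        return 0x0101010101010101ull * (uint8_t)imm;
    case ES16:
        return 0x0001000100010001ull * (uint16_t)imm;
    case ES32:
        return 0x0000000100000001ull * (uint32_t)imm;
    default:
        return imm;
    }
}
```

Under this model, the old dup8i(x) corresponds to
replicate_imm(ES8, x), and likewise for the other sizes.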

Reviewed-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-op-gvec.h | 2 ++
 tcg/tcg-op-gvec.c         | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index 74534e2480..eb0d47a42b 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -313,6 +313,8 @@ void tcg_gen_gvec_ors(unsigned vece, uint32_t dofs, uint32_t aofs,
 
 void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
                           uint32_t s, uint32_t m);
+void tcg_gen_gvec_dup_imm(unsigned vece, uint32_t dofs, uint32_t s,
+                          uint32_t m, uint64_t imm);
 void tcg_gen_gvec_dup_i32(unsigned vece, uint32_t dofs, uint32_t s,
                           uint32_t m, TCGv_i32);
 void tcg_gen_gvec_dup_i64(unsigned vece, uint32_t dofs, uint32_t s,
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 327d9588e0..593bb4542e 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -1569,6 +1569,13 @@ void tcg_gen_gvec_dup8i(uint32_t dofs, uint32_t oprsz,
     do_dup(MO_8, dofs, oprsz, maxsz, NULL, NULL, x);
 }
 
+void tcg_gen_gvec_dup_imm(unsigned vece, uint32_t dofs, uint32_t oprsz,
+                          uint32_t maxsz, uint64_t x)
+{
+    check_size_align(oprsz, maxsz, dofs);
+    do_dup(vece, dofs, oprsz, maxsz, NULL, NULL, x);
+}
+
 void tcg_gen_gvec_not(unsigned vece, uint32_t dofs, uint32_t aofs,
                       uint32_t oprsz, uint32_t maxsz)
 {
-- 
2.20.1




* [PATCH v2 02/36] target/s390x: Use tcg_gen_gvec_dup_imm
From: Richard Henderson @ 2020-04-22  1:16 UTC
  To: qemu-devel; +Cc: alex.bennee, David Hildenbrand

The gen_gvec_dupi switch is unnecessary with the new function.
Replace it with a local gen_gvec_dup_imm that takes care of the
register-to-offset conversion and length arguments.

Drop zero_vec and use gen_gvec_dup_imm with 0.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/s390x/translate_vx.inc.c | 41 +++++++--------------------------
 1 file changed, 8 insertions(+), 33 deletions(-)

diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 24558cce80..12347f8a03 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -231,8 +231,8 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
 #define gen_gvec_mov(v1, v2) \
     tcg_gen_gvec_mov(0, vec_full_reg_offset(v1), vec_full_reg_offset(v2), 16, \
                      16)
-#define gen_gvec_dup64i(v1, c) \
-    tcg_gen_gvec_dup64i(vec_full_reg_offset(v1), 16, 16, c)
+#define gen_gvec_dup_imm(es, v1, c) \
+    tcg_gen_gvec_dup_imm(es, vec_full_reg_offset(v1), 16, 16, c)
 #define gen_gvec_fn_2(fn, es, v1, v2) \
     tcg_gen_gvec_##fn(es, vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                       16, 16)
@@ -316,31 +316,6 @@ static void gen_gvec128_4_i64(gen_gvec128_4_i64_fn fn, uint8_t d, uint8_t a,
         tcg_temp_free_i64(cl);
 }
 
-static void gen_gvec_dupi(uint8_t es, uint8_t reg, uint64_t c)
-{
-    switch (es) {
-    case ES_8:
-        tcg_gen_gvec_dup8i(vec_full_reg_offset(reg), 16, 16, c);
-        break;
-    case ES_16:
-        tcg_gen_gvec_dup16i(vec_full_reg_offset(reg), 16, 16, c);
-        break;
-    case ES_32:
-        tcg_gen_gvec_dup32i(vec_full_reg_offset(reg), 16, 16, c);
-        break;
-    case ES_64:
-        gen_gvec_dup64i(reg, c);
-        break;
-    default:
-        g_assert_not_reached();
-    }
-}
-
-static void zero_vec(uint8_t reg)
-{
-    tcg_gen_gvec_dup8i(vec_full_reg_offset(reg), 16, 16, 0);
-}
-
 static void gen_addi2_i64(TCGv_i64 dl, TCGv_i64 dh, TCGv_i64 al, TCGv_i64 ah,
                           uint64_t b)
 {
@@ -396,8 +371,8 @@ static DisasJumpType op_vgbm(DisasContext *s, DisasOps *o)
          * Masks for both 64 bit elements of the vector are the same.
          * Trust tcg to produce a good constant loading.
          */
-        gen_gvec_dup64i(get_field(s, v1),
-                        generate_byte_mask(i2 & 0xff));
+        gen_gvec_dup_imm(ES_64, get_field(s, v1),
+                         generate_byte_mask(i2 & 0xff));
     } else {
         TCGv_i64 t = tcg_temp_new_i64();
 
@@ -432,7 +407,7 @@ static DisasJumpType op_vgm(DisasContext *s, DisasOps *o)
         }
     }
 
-    gen_gvec_dupi(es, get_field(s, v1), mask);
+    gen_gvec_dup_imm(es, get_field(s, v1), mask);
     return DISAS_NEXT;
 }
 
@@ -585,7 +560,7 @@ static DisasJumpType op_vllez(DisasContext *s, DisasOps *o)
 
     t = tcg_temp_new_i64();
     tcg_gen_qemu_ld_i64(t, o->addr1, get_mem_index(s), MO_TE | es);
-    zero_vec(get_field(s, v1));
+    gen_gvec_dup_imm(es, get_field(s, v1), 0);
     write_vec_element_i64(t, get_field(s, v1), enr, es);
     tcg_temp_free_i64(t);
     return DISAS_NEXT;
@@ -892,7 +867,7 @@ static DisasJumpType op_vrepi(DisasContext *s, DisasOps *o)
         return DISAS_NORETURN;
     }
 
-    gen_gvec_dupi(es, get_field(s, v1), data);
+    gen_gvec_dup_imm(es, get_field(s, v1), data);
     return DISAS_NEXT;
 }
 
@@ -1372,7 +1347,7 @@ static DisasJumpType op_vcksm(DisasContext *s, DisasOps *o)
         read_vec_element_i32(tmp, get_field(s, v2), i, ES_32);
         tcg_gen_add2_i32(tmp, sum, sum, sum, tmp, tmp);
     }
-    zero_vec(get_field(s, v1));
+    gen_gvec_dup_imm(ES_32, get_field(s, v1), 0);
     write_vec_element_i32(sum, get_field(s, v1), 1, ES_32);
 
     tcg_temp_free_i32(tmp);
-- 
2.20.1




* [PATCH v2 03/36] target/ppc: Use tcg_gen_gvec_dup_imm
From: Richard Henderson @ 2020-04-22  1:16 UTC
  To: qemu-devel; +Cc: alex.bennee, David Gibson

We can now unify the implementation of the 3 VSPLTI instructions.
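
All three instructions splat the same sign-extended 5-bit immediate
and differ only in element size, which is why one helper parameterized
by vece suffices.  A plain-C model of the resulting bit pattern for one
64-bit half of the register might look like this; sext5 and
vsplti_model are hypothetical names, and the raw 5-bit field is taken
as an input rather than decoded from a real opcode.

```c
#include <stdint.h>

/* Sign-extend a 5-bit field (0..31) to an int in [-16, 15]. */
static int sext5(unsigned field)
{
    field &= 0x1f;
    return (field & 0x10) ? (int)field - 32 : (int)field;
}

/*
 * Hypothetical model of a VSPLTI-style splat into 64 bits:
 * vece is 0, 1 or 2 for 8-, 16- or 32-bit elements.
 */
static uint64_t vsplti_model(unsigned vece, unsigned field)
{
    unsigned width = 8u << vece;
    uint64_t mask = (1ull << width) - 1;   /* element-sized mask */
    uint64_t ones = ~0ull / mask;          /* 0x0101..., 0x0001..., ... */
    uint64_t imm = (uint64_t)(int64_t)sext5(field);

    /* Truncate the sign-extended value to one element, then repeat it. */
    return ones * (imm & mask);
}
```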

Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate/vmx-impl.inc.c | 32 ++++++++++++++++-------------
 target/ppc/translate/vsx-impl.inc.c |  2 +-
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 81d5a7a341..403ed3a01c 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -1035,21 +1035,25 @@ GEN_VXRFORM_DUAL(vcmpbfp, PPC_ALTIVEC, PPC_NONE, \
 GEN_VXRFORM_DUAL(vcmpgtfp, PPC_ALTIVEC, PPC_NONE, \
                  vcmpgtud, PPC_NONE, PPC2_ALTIVEC_207)
 
-#define GEN_VXFORM_DUPI(name, tcg_op, opc2, opc3)                       \
-static void glue(gen_, name)(DisasContext *ctx)                         \
-    {                                                                   \
-        int simm;                                                       \
-        if (unlikely(!ctx->altivec_enabled)) {                          \
-            gen_exception(ctx, POWERPC_EXCP_VPU);                       \
-            return;                                                     \
-        }                                                               \
-        simm = SIMM5(ctx->opcode);                                      \
-        tcg_op(avr_full_offset(rD(ctx->opcode)), 16, 16, simm);         \
+static void gen_vsplti(DisasContext *ctx, int vece)
+{
+    int simm;
+
+    if (unlikely(!ctx->altivec_enabled)) {
+        gen_exception(ctx, POWERPC_EXCP_VPU);
+        return;
     }
 
-GEN_VXFORM_DUPI(vspltisb, tcg_gen_gvec_dup8i, 6, 12);
-GEN_VXFORM_DUPI(vspltish, tcg_gen_gvec_dup16i, 6, 13);
-GEN_VXFORM_DUPI(vspltisw, tcg_gen_gvec_dup32i, 6, 14);
+    simm = SIMM5(ctx->opcode);
+    tcg_gen_gvec_dup_imm(vece, avr_full_offset(rD(ctx->opcode)), 16, 16, simm);
+}
+
+#define GEN_VXFORM_VSPLTI(name, vece, opc2, opc3) \
+static void glue(gen_, name)(DisasContext *ctx) { gen_vsplti(ctx, vece); }
+
+GEN_VXFORM_VSPLTI(vspltisb, MO_8, 6, 12);
+GEN_VXFORM_VSPLTI(vspltish, MO_16, 6, 13);
+GEN_VXFORM_VSPLTI(vspltisw, MO_32, 6, 14);
 
 #define GEN_VXFORM_NOA(name, opc2, opc3)                                \
 static void glue(gen_, name)(DisasContext *ctx)                         \
@@ -1559,7 +1563,7 @@ GEN_VXFORM_DUAL(vsldoi, PPC_ALTIVEC, PPC_NONE,
 #undef GEN_VXRFORM_DUAL
 #undef GEN_VXRFORM1
 #undef GEN_VXRFORM
-#undef GEN_VXFORM_DUPI
+#undef GEN_VXFORM_VSPLTI
 #undef GEN_VXFORM_NOA
 #undef GEN_VXFORM_UIMM
 #undef GEN_VAFORM_PAIRED
diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index 8287e272f5..b518de46db 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -1579,7 +1579,7 @@ static void gen_xxspltib(DisasContext *ctx)
             return;
         }
     }
-    tcg_gen_gvec_dup8i(vsr_full_offset(rt), 16, 16, uim8);
+    tcg_gen_gvec_dup_imm(MO_8, vsr_full_offset(rt), 16, 16, uim8);
 }
 
 static void gen_xxsldwi(DisasContext *ctx)
-- 
2.20.1




* [PATCH v2 04/36] target/arm: Use tcg_gen_gvec_dup_imm
From: Richard Henderson @ 2020-04-22  1:16 UTC
  To: qemu-devel; +Cc: alex.bennee

In a few cases, this also lets us remove manual replication
via dup_const.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 10 +++++-----
 target/arm/translate-sve.c | 12 +++++-------
 target/arm/translate.c     |  9 ++++++---
 3 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 7580e46367..095638e09a 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -519,7 +519,7 @@ static void clear_vec_high(DisasContext *s, bool is_q, int rd)
         tcg_temp_free_i64(tcg_zero);
     }
     if (vsz > 16) {
-        tcg_gen_gvec_dup8i(ofs + 16, vsz - 16, vsz - 16, 0);
+        tcg_gen_gvec_dup_imm(MO_64, ofs + 16, vsz - 16, vsz - 16, 0);
     }
 }
 
@@ -7794,8 +7794,8 @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
 
     if (!((cmode & 0x9) == 0x1 || (cmode & 0xd) == 0x9)) {
         /* MOVI or MVNI, with MVNI negation handled above.  */
-        tcg_gen_gvec_dup64i(vec_full_reg_offset(s, rd), is_q ? 16 : 8,
-                            vec_full_reg_size(s), imm);
+        tcg_gen_gvec_dup_imm(MO_64, vec_full_reg_offset(s, rd), is_q ? 16 : 8,
+                             vec_full_reg_size(s), imm);
     } else {
         /* ORR or BIC, with BIC negation to AND handled above.  */
         if (is_neg) {
@@ -10223,8 +10223,8 @@ static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u,
         if (is_u) {
             if (shift == 8 << size) {
                 /* Shift count the same size as element size produces zero.  */
-                tcg_gen_gvec_dup8i(vec_full_reg_offset(s, rd),
-                                   is_q ? 16 : 8, vec_full_reg_size(s), 0);
+                tcg_gen_gvec_dup_imm(size, vec_full_reg_offset(s, rd),
+                                     is_q ? 16 : 8, vec_full_reg_size(s), 0);
             } else {
                 gen_gvec_fn2i(s, is_q, rd, rn, shift, tcg_gen_gvec_shri, size);
             }
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index b35bad245e..6c8bda4e4c 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -177,7 +177,7 @@ static bool do_mov_z(DisasContext *s, int rd, int rn)
 static void do_dupi_z(DisasContext *s, int rd, uint64_t word)
 {
     unsigned vsz = vec_full_reg_size(s);
-    tcg_gen_gvec_dup64i(vec_full_reg_offset(s, rd), vsz, vsz, word);
+    tcg_gen_gvec_dup_imm(MO_64, vec_full_reg_offset(s, rd), vsz, vsz, word);
 }
 
 /* Invoke a vector expander on two Pregs.  */
@@ -1453,7 +1453,7 @@ static bool do_predset(DisasContext *s, int esz, int rd, int pat, bool setflag)
         unsigned oprsz = size_for_gvec(setsz / 8);
 
         if (oprsz * 8 == setsz) {
-            tcg_gen_gvec_dup64i(ofs, oprsz, maxsz, word);
+            tcg_gen_gvec_dup_imm(MO_64, ofs, oprsz, maxsz, word);
             goto done;
         }
     }
@@ -2044,7 +2044,7 @@ static bool trans_DUP_x(DisasContext *s, arg_DUP_x *a)
             unsigned nofs = vec_reg_offset(s, a->rn, index, esz);
             tcg_gen_gvec_dup_mem(esz, dofs, nofs, vsz, vsz);
         } else {
-            tcg_gen_gvec_dup64i(dofs, vsz, vsz, 0);
+            tcg_gen_gvec_dup_imm(esz, dofs, vsz, vsz, 0);
         }
     }
     return true;
@@ -3260,9 +3260,7 @@ static bool trans_FDUP(DisasContext *s, arg_FDUP *a)
 
         /* Decode the VFP immediate.  */
         imm = vfp_expand_imm(a->esz, a->imm);
-        imm = dup_const(a->esz, imm);
-
-        tcg_gen_gvec_dup64i(dofs, vsz, vsz, imm);
+        tcg_gen_gvec_dup_imm(a->esz, dofs, vsz, vsz, imm);
     }
     return true;
 }
@@ -3276,7 +3274,7 @@ static bool trans_DUP_i(DisasContext *s, arg_DUP_i *a)
         unsigned vsz = vec_full_reg_size(s);
         int dofs = vec_full_reg_offset(s, a->rd);
 
-        tcg_gen_gvec_dup64i(dofs, vsz, vsz, dup_const(a->esz, a->imm));
+        tcg_gen_gvec_dup_imm(a->esz, dofs, vsz, vsz, a->imm);
     }
     return true;
 }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 9f9f4e19e0..af4d3ff4c9 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5386,7 +5386,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                                           MIN(shift, (8 << size) - 1),
                                           vec_size, vec_size);
                     } else if (shift >= 8 << size) {
-                        tcg_gen_gvec_dup8i(rd_ofs, vec_size, vec_size, 0);
+                        tcg_gen_gvec_dup_imm(MO_8, rd_ofs, vec_size,
+                                             vec_size, 0);
                     } else {
                         tcg_gen_gvec_shri(size, rd_ofs, rm_ofs, shift,
                                           vec_size, vec_size);
@@ -5437,7 +5438,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          * architecturally valid and results in zero.
                          */
                         if (shift >= 8 << size) {
-                            tcg_gen_gvec_dup8i(rd_ofs, vec_size, vec_size, 0);
+                            tcg_gen_gvec_dup_imm(size, rd_ofs,
+                                                 vec_size, vec_size, 0);
                         } else {
                             tcg_gen_gvec_shli(size, rd_ofs, rm_ofs, shift,
                                               vec_size, vec_size);
@@ -5783,7 +5785,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     }
                     tcg_temp_free_i64(t64);
                 } else {
-                    tcg_gen_gvec_dup32i(reg_ofs, vec_size, vec_size, imm);
+                    tcg_gen_gvec_dup_imm(MO_32, reg_ofs, vec_size,
+                                         vec_size, imm);
                 }
             }
         }
-- 
2.20.1




* [PATCH v2 05/36] tcg: Use tcg_gen_gvec_dup_imm in logical simplifications
From: Richard Henderson @ 2020-04-22  1:16 UTC
  To: qemu-devel; +Cc: alex.bennee, LIU Zhiwei

Replace uses of the outgoing tcg_gen_gvec_dup8i interface with
tcg_gen_gvec_dup_imm.
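
These simplifications rely on identities that hold for any input when
both operands are the same vector (aofs == bofs): xor and andc give all
zeros, while orc and eqv give all ones.  Because an all-zeros or
all-ones result looks the same at every element size, MO_64 is as good
as the old 8-bit duplicate.  A single-lane check in plain C, with
hypothetical helper names:

```c
#include <stdint.h>

/* One 64-bit lane of each logical operation. */
static uint64_t op_xor(uint64_t a, uint64_t b)  { return a ^ b; }
static uint64_t op_andc(uint64_t a, uint64_t b) { return a & ~b; }
static uint64_t op_orc(uint64_t a, uint64_t b)  { return a | ~b; }
static uint64_t op_eqv(uint64_t a, uint64_t b)  { return ~(a ^ b); }
```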

Reviewed-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op-gvec.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 593bb4542e..de16c027b3 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -2326,7 +2326,7 @@ void tcg_gen_gvec_xor(unsigned vece, uint32_t dofs, uint32_t aofs,
     };
 
     if (aofs == bofs) {
-        tcg_gen_gvec_dup8i(dofs, oprsz, maxsz, 0);
+        tcg_gen_gvec_dup_imm(MO_64, dofs, oprsz, maxsz, 0);
     } else {
         tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
     }
@@ -2343,7 +2343,7 @@ void tcg_gen_gvec_andc(unsigned vece, uint32_t dofs, uint32_t aofs,
     };
 
     if (aofs == bofs) {
-        tcg_gen_gvec_dup8i(dofs, oprsz, maxsz, 0);
+        tcg_gen_gvec_dup_imm(MO_64, dofs, oprsz, maxsz, 0);
     } else {
         tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
     }
@@ -2360,7 +2360,7 @@ void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs,
     };
 
     if (aofs == bofs) {
-        tcg_gen_gvec_dup8i(dofs, oprsz, maxsz, -1);
+        tcg_gen_gvec_dup_imm(MO_64, dofs, oprsz, maxsz, -1);
     } else {
         tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
     }
@@ -2411,7 +2411,7 @@ void tcg_gen_gvec_eqv(unsigned vece, uint32_t dofs, uint32_t aofs,
     };
 
     if (aofs == bofs) {
-        tcg_gen_gvec_dup8i(dofs, oprsz, maxsz, -1);
+        tcg_gen_gvec_dup_imm(MO_64, dofs, oprsz, maxsz, -1);
     } else {
         tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
     }
-- 
2.20.1




* [PATCH v2 06/36] tcg: Remove tcg_gen_gvec_dup{8,16,32,64}i
From: Richard Henderson @ 2020-04-22  1:16 UTC
  To: qemu-devel; +Cc: alex.bennee, LIU Zhiwei, David Hildenbrand

These interfaces are now unused.

Reviewed-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-op-gvec.h |  5 -----
 tcg/tcg-op-gvec.c         | 28 ----------------------------
 2 files changed, 33 deletions(-)

diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index eb0d47a42b..fa8a0c8d03 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -320,11 +320,6 @@ void tcg_gen_gvec_dup_i32(unsigned vece, uint32_t dofs, uint32_t s,
 void tcg_gen_gvec_dup_i64(unsigned vece, uint32_t dofs, uint32_t s,
                           uint32_t m, TCGv_i64);
 
-void tcg_gen_gvec_dup8i(uint32_t dofs, uint32_t s, uint32_t m, uint8_t x);
-void tcg_gen_gvec_dup16i(uint32_t dofs, uint32_t s, uint32_t m, uint16_t x);
-void tcg_gen_gvec_dup32i(uint32_t dofs, uint32_t s, uint32_t m, uint32_t x);
-void tcg_gen_gvec_dup64i(uint32_t dofs, uint32_t s, uint32_t m, uint64_t x);
-
 void tcg_gen_gvec_shli(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t shift, uint32_t oprsz, uint32_t maxsz);
 void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs,
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index de16c027b3..5a6cc19812 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -1541,34 +1541,6 @@ void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
     }
 }
 
-void tcg_gen_gvec_dup64i(uint32_t dofs, uint32_t oprsz,
-                         uint32_t maxsz, uint64_t x)
-{
-    check_size_align(oprsz, maxsz, dofs);
-    do_dup(MO_64, dofs, oprsz, maxsz, NULL, NULL, x);
-}
-
-void tcg_gen_gvec_dup32i(uint32_t dofs, uint32_t oprsz,
-                         uint32_t maxsz, uint32_t x)
-{
-    check_size_align(oprsz, maxsz, dofs);
-    do_dup(MO_32, dofs, oprsz, maxsz, NULL, NULL, x);
-}
-
-void tcg_gen_gvec_dup16i(uint32_t dofs, uint32_t oprsz,
-                         uint32_t maxsz, uint16_t x)
-{
-    check_size_align(oprsz, maxsz, dofs);
-    do_dup(MO_16, dofs, oprsz, maxsz, NULL, NULL, x);
-}
-
-void tcg_gen_gvec_dup8i(uint32_t dofs, uint32_t oprsz,
-                         uint32_t maxsz, uint8_t x)
-{
-    check_size_align(oprsz, maxsz, dofs);
-    do_dup(MO_8, dofs, oprsz, maxsz, NULL, NULL, x);
-}
-
 void tcg_gen_gvec_dup_imm(unsigned vece, uint32_t dofs, uint32_t oprsz,
                           uint32_t maxsz, uint64_t x)
 {
-- 
2.20.1




* [PATCH v2 07/36] tcg: Add tcg_gen_gvec_dup_tl
From: Richard Henderson @ 2020-04-22  1:16 UTC
  To: qemu-devel; +Cc: alex.bennee, LIU Zhiwei, David Hildenbrand

For use when a target needs to duplicate a target_ulong value,
whose width (32 or 64 bits) depends on the target configuration.
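
The alias is resolved entirely at compile time.  A standalone sketch of
the same pattern, with hypothetical dup_model_* functions that only
report the width they handle, and TARGET_LONG_BITS defaulting to 32
purely for this example:

```c
#include <stdint.h>

/* Assumed configuration for this sketch; override with -DTARGET_LONG_BITS=64. */
#ifndef TARGET_LONG_BITS
#define TARGET_LONG_BITS 32
#endif

/* Hypothetical width-specific functions standing in for the _i32/_i64 pair. */
static inline int dup_model_i32(uint32_t v) { (void)v; return 32; }
static inline int dup_model_i64(uint64_t v) { (void)v; return 64; }

#if TARGET_LONG_BITS == 64
typedef uint64_t target_ulong_model;
# define dup_model_tl  dup_model_i64
#else
typedef uint32_t target_ulong_model;
# define dup_model_tl  dup_model_i32
#endif
```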

Reviewed-by: LIU Zhiwei <zhiwei_liu@c-sky.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-op-gvec.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index fa8a0c8d03..d89f91f40e 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -320,6 +320,12 @@ void tcg_gen_gvec_dup_i32(unsigned vece, uint32_t dofs, uint32_t s,
 void tcg_gen_gvec_dup_i64(unsigned vece, uint32_t dofs, uint32_t s,
                           uint32_t m, TCGv_i64);
 
+#if TARGET_LONG_BITS == 64
+# define tcg_gen_gvec_dup_tl  tcg_gen_gvec_dup_i64
+#else
+# define tcg_gen_gvec_dup_tl  tcg_gen_gvec_dup_i32
+#endif
+
 void tcg_gen_gvec_shli(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t shift, uint32_t oprsz, uint32_t maxsz);
 void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs,
-- 
2.20.1




* [PATCH v2 08/36] tcg: Improve vector tail clearing
From: Richard Henderson @ 2020-04-22  1:16 UTC
  To: qemu-devel; +Cc: alex.bennee

Handle non-power-of-2 tails better, such as those seen with Arm
8-byte vector operations.
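
The new check_size_impl counts the full-width operations and then, for
line sizes of 16 bytes or more, adds one operation per set bit of the
remainder, since each set bit is one smaller power-of-2 chunk (16 or 8
bytes).  A standalone model of that arithmetic follows; fits_unroll,
popcount32 and the limit of 4 are assumptions for illustration only,
standing in for the real ctpop32 and MAX_UNROLL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed unroll limit for this sketch only. */
#define SKETCH_MAX_UNROLL 4

/* Portable population count (number of set bits). */
static unsigned popcount32(uint32_t x)
{
    unsigned n = 0;

    while (x) {
        x &= x - 1;     /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Model of the new size check: q operations of lnsz bytes, plus, for
 * lnsz >= 16, one smaller operation per set bit of the remainder,
 * e.g. a 24-byte remainder is one 16-byte plus one 8-byte operation.
 */
static bool fits_unroll(uint32_t oprsz, uint32_t lnsz)
{
    uint32_t q, r;

    if (oprsz < lnsz) {
        return false;
    }
    q = oprsz / lnsz;
    r = oprsz % lnsz;

    if (lnsz < 16) {
        if (r != 0) {
            return false;   /* small line sizes accept no remainder */
        }
    } else {
        q += popcount32(r);
    }
    return q <= SKETCH_MAX_UNROLL;
}
```

With these assumptions, an 80-byte SVE vector on a 32-byte line size
needs 2x32 + 1x16 = 3 operations and is accepted.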

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op-gvec.c | 82 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 63 insertions(+), 19 deletions(-)

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 5a6cc19812..43cac1a0bf 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -326,11 +326,34 @@ void tcg_gen_gvec_5_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs,
    in units of LNSZ.  This limits the expansion of inline code.  */
 static inline bool check_size_impl(uint32_t oprsz, uint32_t lnsz)
 {
-    if (oprsz % lnsz == 0) {
-        uint32_t lnct = oprsz / lnsz;
-        return lnct >= 1 && lnct <= MAX_UNROLL;
+    uint32_t q, r;
+
+    if (oprsz < lnsz) {
+        return false;
     }
-    return false;
+
+    q = oprsz / lnsz;
+    r = oprsz % lnsz;
+    tcg_debug_assert((r & 7) == 0);
+
+    if (lnsz < 16) {
+        /* For sizes below 16, accept no remainder. */
+        if (r != 0) {
+            return false;
+        }
+    } else {
+        /*
+         * Recall that ARM SVE allows vector sizes that are not a
+         * power of 2, but always a multiple of 16.  The intent is
+         * that e.g. size == 80 would be expanded with 2x32 + 1x16.
+         * In addition, expand_clr needs to handle a multiple of 8.
+         * Thus we can handle the tail with one more operation per
+         * diminishing power of 2.
+         */
+        q += ctpop32(r);
+    }
+
+    return q <= MAX_UNROLL;
 }
 
 static void expand_clr(uint32_t dofs, uint32_t maxsz);
@@ -402,22 +425,31 @@ static void gen_dup_i64(unsigned vece, TCGv_i64 out, TCGv_i64 in)
 static TCGType choose_vector_type(const TCGOpcode *list, unsigned vece,
                                   uint32_t size, bool prefer_i64)
 {
-    if (TCG_TARGET_HAS_v256 && check_size_impl(size, 32)) {
-        /*
-         * Recall that ARM SVE allows vector sizes that are not a
-         * power of 2, but always a multiple of 16.  The intent is
-         * that e.g. size == 80 would be expanded with 2x32 + 1x16.
-         * It is hard to imagine a case in which v256 is supported
-         * but v128 is not, but check anyway.
-         */
-        if (tcg_can_emit_vecop_list(list, TCG_TYPE_V256, vece)
-            && (size % 32 == 0
-                || tcg_can_emit_vecop_list(list, TCG_TYPE_V128, vece))) {
-            return TCG_TYPE_V256;
-        }
+    /*
+     * Recall that ARM SVE allows vector sizes that are not a
+     * power of 2, but always a multiple of 16.  The intent is
+     * that e.g. size == 80 would be expanded with 2x32 + 1x16.
+     * It is hard to imagine a case in which v256 is supported
+     * but v128 is not, but check anyway.
+     * In addition, expand_clr needs to handle a multiple of 8.
+     */
+    if (TCG_TARGET_HAS_v256 &&
+        check_size_impl(size, 32) &&
+        tcg_can_emit_vecop_list(list, TCG_TYPE_V256, vece) &&
+        (!(size & 16) ||
+         (TCG_TARGET_HAS_v128 &&
+          tcg_can_emit_vecop_list(list, TCG_TYPE_V128, vece))) &&
+        (!(size & 8) ||
+         (TCG_TARGET_HAS_v64 &&
+          tcg_can_emit_vecop_list(list, TCG_TYPE_V64, vece)))) {
+        return TCG_TYPE_V256;
     }
-    if (TCG_TARGET_HAS_v128 && check_size_impl(size, 16)
-        && tcg_can_emit_vecop_list(list, TCG_TYPE_V128, vece)) {
+    if (TCG_TARGET_HAS_v128 &&
+        check_size_impl(size, 16) &&
+        tcg_can_emit_vecop_list(list, TCG_TYPE_V128, vece) &&
+        (!(size & 8) ||
+         (TCG_TARGET_HAS_v64 &&
+          tcg_can_emit_vecop_list(list, TCG_TYPE_V64, vece)))) {
         return TCG_TYPE_V128;
     }
     if (TCG_TARGET_HAS_v64 && !prefer_i64 && check_size_impl(size, 8)
@@ -432,6 +464,18 @@ static void do_dup_store(TCGType type, uint32_t dofs, uint32_t oprsz,
 {
     uint32_t i = 0;
 
+    tcg_debug_assert(oprsz >= 8);
+
+    /*
+     * This may be expand_clr for the tail of an operation, e.g.
+     * oprsz == 8 && maxsz == 64.  The first 8 bytes of this store
+     * are misaligned wrt the maximum vector size, so do that first.
+     */
+    if (dofs & 8) {
+        tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V64);
+        i += 8;
+    }
+
     switch (type) {
     case TCG_TYPE_V256:
         /*
-- 
2.20.1


* [PATCH v2 09/36] tcg: Consolidate 3 bits into enum TCGTempKind
From: Richard Henderson @ 2020-04-22  1:16 UTC
  To: qemu-devel; +Cc: alex.bennee

The fixed_reg, temp_global, and temp_local bits are all related.
Combine them into a single enumeration, TCGTempKind.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h |  20 +++++---
 tcg/optimize.c    |   8 +--
 tcg/tcg.c         | 122 ++++++++++++++++++++++++++++------------------
 3 files changed, 90 insertions(+), 60 deletions(-)
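
Because the enumerators are ordered NORMAL < LOCAL < GLOBAL < FIXED,
the old bit tests become range comparisons.  The former temp_global bit
was also set for fixed-register temps (tcg_global_alloc() runs before
fixed_reg is set), hence `kind >= TEMP_GLOBAL`.  Below is a standalone
sketch of that ordering and of the per-kind state chosen by
la_bb_end().  The TS_DEAD and TS_MEM values are stand-ins and are not
taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Same order as the enum added by this patch. */
typedef enum TCGTempKind {
    TEMP_NORMAL,   /* dead at the end of every basic block */
    TEMP_LOCAL,    /* saved across basic blocks, dead at end of TB */
    TEMP_GLOBAL,   /* saved across basic blocks and TBs */
    TEMP_FIXED,    /* lives in a fixed host register */
} TCGTempKind;

/* Stand-ins for the liveness state bits. */
enum { TS_DEAD = 1, TS_MEM = 2 };

/* "Global" in the old temp_global sense: GLOBAL or FIXED. */
static bool kind_is_global(TCGTempKind k)
{
    return k >= TEMP_GLOBAL;
}

/* Liveness state at the end of a basic block, as in la_bb_end(). */
static int bb_end_state(TCGTempKind k)
{
    switch (k) {
    case TEMP_FIXED:
    case TEMP_GLOBAL:
    case TEMP_LOCAL:
        return TS_DEAD | TS_MEM;   /* value must be back in memory */
    case TEMP_NORMAL:
        return TS_DEAD;            /* value is simply discarded */
    }
    assert(!"unknown kind");
    return 0;
}
```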

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index c48bd76b0a..3534dce77f 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -480,23 +480,27 @@ typedef enum TCGTempVal {
     TEMP_VAL_CONST,
 } TCGTempVal;
 
+typedef enum TCGTempKind {
+    /* Temp is dead at the end of all basic blocks. */
+    TEMP_NORMAL,
+    /* Temp is saved across basic blocks but dead at the end of TBs. */
+    TEMP_LOCAL,
+    /* Temp is saved across both basic blocks and translation blocks. */
+    TEMP_GLOBAL,
+    /* Temp is in a fixed register. */
+    TEMP_FIXED,
+} TCGTempKind;
+
 typedef struct TCGTemp {
     TCGReg reg:8;
     TCGTempVal val_type:8;
     TCGType base_type:8;
     TCGType type:8;
-    unsigned int fixed_reg:1;
+    TCGTempKind kind:3;
     unsigned int indirect_reg:1;
     unsigned int indirect_base:1;
     unsigned int mem_coherent:1;
     unsigned int mem_allocated:1;
-    /* If true, the temp is saved across both basic blocks and
-       translation blocks.  */
-    unsigned int temp_global:1;
-    /* If true, the temp is saved across basic blocks but dead
-       at the end of translation blocks.  If false, the temp is
-       dead at the end of basic blocks.  */
-    unsigned int temp_local:1;
     unsigned int temp_allocated:1;
 
     tcg_target_long val;
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 53aa8e5329..afb4a9a5a9 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -116,21 +116,21 @@ static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
     TCGTemp *i;
 
     /* If this is already a global, we can't do better. */
-    if (ts->temp_global) {
+    if (ts->kind >= TEMP_GLOBAL) {
         return ts;
     }
 
     /* Search for a global first. */
     for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
-        if (i->temp_global) {
+        if (i->kind >= TEMP_GLOBAL) {
             return i;
         }
     }
 
     /* If it is a temp, search for a temp local. */
-    if (!ts->temp_local) {
+    if (ts->kind == TEMP_NORMAL) {
         for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
-            if (ts->temp_local) {
+            if (i->kind >= TEMP_LOCAL) {
                 return i;
             }
         }
diff --git a/tcg/tcg.c b/tcg/tcg.c
index dd4b3d7684..eaf81397a3 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1155,7 +1155,7 @@ static inline TCGTemp *tcg_global_alloc(TCGContext *s)
     tcg_debug_assert(s->nb_globals == s->nb_temps);
     s->nb_globals++;
     ts = tcg_temp_alloc(s);
-    ts->temp_global = 1;
+    ts->kind = TEMP_GLOBAL;
 
     return ts;
 }
@@ -1172,7 +1172,7 @@ static TCGTemp *tcg_global_reg_new_internal(TCGContext *s, TCGType type,
     ts = tcg_global_alloc(s);
     ts->base_type = type;
     ts->type = type;
-    ts->fixed_reg = 1;
+    ts->kind = TEMP_FIXED;
     ts->reg = reg;
     ts->name = name;
     tcg_regset_set_reg(s->reserved_regs, reg);
@@ -1199,7 +1199,7 @@ TCGTemp *tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
     bigendian = 1;
 #endif
 
-    if (!base_ts->fixed_reg) {
+    if (base_ts->kind != TEMP_FIXED) {
         /* We do not support double-indirect registers.  */
         tcg_debug_assert(!base_ts->indirect_reg);
         base_ts->indirect_base = 1;
@@ -1247,6 +1247,7 @@ TCGTemp *tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
 TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
 {
     TCGContext *s = tcg_ctx;
+    TCGTempKind kind = temp_local ? TEMP_LOCAL : TEMP_NORMAL;
     TCGTemp *ts;
     int idx, k;
 
@@ -1259,7 +1260,7 @@ TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
         ts = &s->temps[idx];
         ts->temp_allocated = 1;
         tcg_debug_assert(ts->base_type == type);
-        tcg_debug_assert(ts->temp_local == temp_local);
+        tcg_debug_assert(ts->kind == kind);
     } else {
         ts = tcg_temp_alloc(s);
         if (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64) {
@@ -1268,18 +1269,18 @@ TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
             ts->base_type = type;
             ts->type = TCG_TYPE_I32;
             ts->temp_allocated = 1;
-            ts->temp_local = temp_local;
+            ts->kind = kind;
 
             tcg_debug_assert(ts2 == ts + 1);
             ts2->base_type = TCG_TYPE_I64;
             ts2->type = TCG_TYPE_I32;
             ts2->temp_allocated = 1;
-            ts2->temp_local = temp_local;
+            ts2->kind = kind;
         } else {
             ts->base_type = type;
             ts->type = type;
             ts->temp_allocated = 1;
-            ts->temp_local = temp_local;
+            ts->kind = kind;
         }
     }
 
@@ -1336,12 +1337,12 @@ void tcg_temp_free_internal(TCGTemp *ts)
     }
 #endif
 
-    tcg_debug_assert(ts->temp_global == 0);
+    tcg_debug_assert(ts->kind < TEMP_GLOBAL);
     tcg_debug_assert(ts->temp_allocated != 0);
     ts->temp_allocated = 0;
 
     idx = temp_idx(ts);
-    k = ts->base_type + (ts->temp_local ? TCG_TYPE_COUNT : 0);
+    k = ts->base_type + (ts->kind == TEMP_NORMAL ? 0 : TCG_TYPE_COUNT);
     set_bit(idx, s->free_temps[k].l);
 }
 
@@ -1864,17 +1865,27 @@ void tcg_gen_callN(void *func, TCGTemp *ret, int nargs, TCGTemp **args)
 static void tcg_reg_alloc_start(TCGContext *s)
 {
     int i, n;
-    TCGTemp *ts;
 
-    for (i = 0, n = s->nb_globals; i < n; i++) {
-        ts = &s->temps[i];
-        ts->val_type = (ts->fixed_reg ? TEMP_VAL_REG : TEMP_VAL_MEM);
-    }
-    for (n = s->nb_temps; i < n; i++) {
-        ts = &s->temps[i];
-        ts->val_type = (ts->temp_local ? TEMP_VAL_MEM : TEMP_VAL_DEAD);
-        ts->mem_allocated = 0;
-        ts->fixed_reg = 0;
+    for (i = 0, n = s->nb_temps; i < n; i++) {
+        TCGTemp *ts = &s->temps[i];
+        TCGTempVal val = TEMP_VAL_MEM;
+
+        switch (ts->kind) {
+        case TEMP_FIXED:
+            val = TEMP_VAL_REG;
+            break;
+        case TEMP_GLOBAL:
+            break;
+        case TEMP_NORMAL:
+            val = TEMP_VAL_DEAD;
+            /* fall through */
+        case TEMP_LOCAL:
+            ts->mem_allocated = 0;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        ts->val_type = val;
     }
 
     memset(s->reg_to_temp, 0, sizeof(s->reg_to_temp));
@@ -1885,12 +1896,17 @@ static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
 {
     int idx = temp_idx(ts);
 
-    if (ts->temp_global) {
+    switch (ts->kind) {
+    case TEMP_FIXED:
+    case TEMP_GLOBAL:
         pstrcpy(buf, buf_size, ts->name);
-    } else if (ts->temp_local) {
+        break;
+    case TEMP_LOCAL:
         snprintf(buf, buf_size, "loc%d", idx - s->nb_globals);
-    } else {
+        break;
+    case TEMP_NORMAL:
         snprintf(buf, buf_size, "tmp%d", idx - s->nb_globals);
+        break;
     }
     return buf;
 }
@@ -2486,15 +2502,24 @@ static void la_bb_end(TCGContext *s, int ng, int nt)
 {
     int i;
 
-    for (i = 0; i < ng; ++i) {
-        s->temps[i].state = TS_DEAD | TS_MEM;
-        la_reset_pref(&s->temps[i]);
-    }
-    for (i = ng; i < nt; ++i) {
-        s->temps[i].state = (s->temps[i].temp_local
-                             ? TS_DEAD | TS_MEM
-                             : TS_DEAD);
-        la_reset_pref(&s->temps[i]);
+    for (i = 0; i < nt; ++i) {
+        TCGTemp *ts = &s->temps[i];
+        int state;
+
+        switch (ts->kind) {
+        case TEMP_FIXED:
+        case TEMP_GLOBAL:
+        case TEMP_LOCAL:
+            state = TS_DEAD | TS_MEM;
+            break;
+        case TEMP_NORMAL:
+            state = TS_DEAD;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        ts->state = state;
+        la_reset_pref(ts);
     }
 }
 
@@ -3069,7 +3094,8 @@ static void check_regs(TCGContext *s)
     }
     for (k = 0; k < s->nb_temps; k++) {
         ts = &s->temps[k];
-        if (ts->val_type == TEMP_VAL_REG && !ts->fixed_reg
+        if (ts->val_type == TEMP_VAL_REG
+            && ts->kind != TEMP_FIXED
             && s->reg_to_temp[ts->reg] != ts) {
             printf("Inconsistency for temp %s:\n",
                    tcg_get_arg_str_ptr(s, buf, sizeof(buf), ts));
@@ -3106,15 +3132,14 @@ static void temp_load(TCGContext *, TCGTemp *, TCGRegSet, TCGRegSet, TCGRegSet);
    mark it free; otherwise mark it dead.  */
 static void temp_free_or_dead(TCGContext *s, TCGTemp *ts, int free_or_dead)
 {
-    if (ts->fixed_reg) {
+    if (ts->kind == TEMP_FIXED) {
         return;
     }
     if (ts->val_type == TEMP_VAL_REG) {
         s->reg_to_temp[ts->reg] = NULL;
     }
     ts->val_type = (free_or_dead < 0
-                    || ts->temp_local
-                    || ts->temp_global
+                    || ts->kind != TEMP_NORMAL
                     ? TEMP_VAL_MEM : TEMP_VAL_DEAD);
 }
 
@@ -3131,7 +3156,7 @@ static inline void temp_dead(TCGContext *s, TCGTemp *ts)
 static void temp_sync(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs,
                       TCGRegSet preferred_regs, int free_or_dead)
 {
-    if (ts->fixed_reg) {
+    if (ts->kind == TEMP_FIXED) {
         return;
     }
     if (!ts->mem_coherent) {
@@ -3289,7 +3314,8 @@ static void temp_save(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs)
 {
     /* The liveness analysis already ensures that globals are back
        in memory. Keep an tcg_debug_assert for safety. */
-    tcg_debug_assert(ts->val_type == TEMP_VAL_MEM || ts->fixed_reg);
+    tcg_debug_assert(ts->val_type == TEMP_VAL_MEM
+                     || ts->kind == TEMP_FIXED);
 }
 
 /* save globals to their canonical location and assume they can be
@@ -3314,7 +3340,7 @@ static void sync_globals(TCGContext *s, TCGRegSet allocated_regs)
     for (i = 0, n = s->nb_globals; i < n; i++) {
         TCGTemp *ts = &s->temps[i];
         tcg_debug_assert(ts->val_type != TEMP_VAL_REG
-                         || ts->fixed_reg
+                         || ts->kind == TEMP_FIXED
                          || ts->mem_coherent);
     }
 }
@@ -3327,7 +3353,7 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
 
     for (i = s->nb_globals; i < s->nb_temps; i++) {
         TCGTemp *ts = &s->temps[i];
-        if (ts->temp_local) {
+        if (ts->kind == TEMP_LOCAL) {
             temp_save(s, ts, allocated_regs);
         } else {
             /* The liveness analysis already ensures that temps are dead.
@@ -3347,7 +3373,7 @@ static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
                                   TCGRegSet preferred_regs)
 {
     /* ENV should not be modified.  */
-    tcg_debug_assert(!ots->fixed_reg);
+    tcg_debug_assert(ots->kind != TEMP_FIXED);
 
     /* The movi is not explicitly generated here.  */
     if (ots->val_type == TEMP_VAL_REG) {
@@ -3387,7 +3413,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
     ts = arg_temp(op->args[1]);
 
     /* ENV should not be modified.  */
-    tcg_debug_assert(!ots->fixed_reg);
+    tcg_debug_assert(ots->kind != TEMP_FIXED);
 
     /* Note that otype != itype for no-op truncation.  */
     otype = ots->type;
@@ -3426,7 +3452,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
         }
         temp_dead(s, ots);
     } else {
-        if (IS_DEAD_ARG(1) && !ts->fixed_reg) {
+        if (IS_DEAD_ARG(1) && ts->kind != TEMP_FIXED) {
             /* the mov can be suppressed */
             if (ots->val_type == TEMP_VAL_REG) {
                 s->reg_to_temp[ots->reg] = NULL;
@@ -3448,7 +3474,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
                  * Store the source register into the destination slot
                  * and leave the destination temp as TEMP_VAL_MEM.
                  */
-                assert(!ots->fixed_reg);
+                assert(ots->kind != TEMP_FIXED);
                 if (!ts->mem_allocated) {
                     temp_allocate_frame(s, ots);
                 }
@@ -3485,7 +3511,7 @@ static void tcg_reg_alloc_dup(TCGContext *s, const TCGOp *op)
     its = arg_temp(op->args[1]);
 
     /* ENV should not be modified.  */
-    tcg_debug_assert(!ots->fixed_reg);
+    tcg_debug_assert(ots->kind != TEMP_FIXED);
 
     itype = its->type;
     vece = TCGOP_VECE(op);
@@ -3625,7 +3651,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
         i_preferred_regs = o_preferred_regs = 0;
         if (arg_ct->ct & TCG_CT_IALIAS) {
             o_preferred_regs = op->output_pref[arg_ct->alias_index];
-            if (ts->fixed_reg) {
+            if (ts->kind == TEMP_FIXED) {
                 /* if fixed register, we must allocate a new register
                    if the alias is not the same register */
                 if (arg != op->args[arg_ct->alias_index]) {
@@ -3716,7 +3742,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
             ts = arg_temp(arg);
 
             /* ENV should not be modified.  */
-            tcg_debug_assert(!ts->fixed_reg);
+            tcg_debug_assert(ts->kind != TEMP_FIXED);
 
             if ((arg_ct->ct & TCG_CT_ALIAS)
                 && !const_args[arg_ct->alias_index]) {
@@ -3758,7 +3784,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
         ts = arg_temp(op->args[i]);
 
         /* ENV should not be modified.  */
-        tcg_debug_assert(!ts->fixed_reg);
+        tcg_debug_assert(ts->kind != TEMP_FIXED);
 
         if (NEED_SYNC_ARG(i)) {
             temp_sync(s, ts, o_allocated_regs, 0, IS_DEAD_ARG(i));
@@ -3890,7 +3916,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
         ts = arg_temp(arg);
 
         /* ENV should not be modified.  */
-        tcg_debug_assert(!ts->fixed_reg);
+        tcg_debug_assert(ts->kind != TEMP_FIXED);
 
         reg = tcg_target_call_oarg_regs[i];
         tcg_debug_assert(s->reg_to_temp[reg] == NULL);
-- 
2.20.1


* [PATCH v2 10/36] tcg: Add temp_readonly
From: Richard Henderson @ 2020-04-22  1:16 UTC
  To: qemu-devel; +Cc: alex.bennee, Philippe Mathieu-Daudé

In most, but not all, of the places where we check for TEMP_FIXED,
we are really testing that we do not modify the temporary.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h |  5 +++++
 tcg/tcg.c         | 21 ++++++++++-----------
 2 files changed, 15 insertions(+), 11 deletions(-)
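
The point of the helper is that callers ask "may this temp be written?"
rather than "is this TEMP_FIXED?".  Widening the definition later then
touches only one function.  Here is a minimal sketch that uses a
trimmed-down, hypothetical Temp type in place of TCGTemp.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum TCGTempKind {
    TEMP_NORMAL,
    TEMP_LOCAL,
    TEMP_GLOBAL,
    TEMP_FIXED,
} TCGTempKind;

typedef struct Temp {      /* hypothetical stand-in for TCGTemp */
    TCGTempKind kind;
    int val;
} Temp;

/* As in this patch: only fixed-register temps are read-only so far. */
static bool temp_readonly(const Temp *ts)
{
    return ts->kind == TEMP_FIXED;
}

/* A caller asks "may I write this?", not "is this TEMP_FIXED?". */
static bool try_write(Temp *ts, int val)
{
    if (temp_readonly(ts)) {
        return false;
    }
    ts->val = val;
    return true;
}
```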

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 3534dce77f..27e1b509a6 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -678,6 +678,11 @@ struct TCGContext {
     target_ulong gen_insn_data[TCG_MAX_INSNS][TARGET_INSN_START_WORDS];
 };
 
+static inline bool temp_readonly(TCGTemp *ts)
+{
+    return ts->kind == TEMP_FIXED;
+}
+
 extern TCGContext tcg_init_ctx;
 extern __thread TCGContext *tcg_ctx;
 extern TCGv_env cpu_env;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index eaf81397a3..92b3767097 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3132,7 +3132,7 @@ static void temp_load(TCGContext *, TCGTemp *, TCGRegSet, TCGRegSet, TCGRegSet);
    mark it free; otherwise mark it dead.  */
 static void temp_free_or_dead(TCGContext *s, TCGTemp *ts, int free_or_dead)
 {
-    if (ts->kind == TEMP_FIXED) {
+    if (temp_readonly(ts)) {
         return;
     }
     if (ts->val_type == TEMP_VAL_REG) {
@@ -3156,7 +3156,7 @@ static inline void temp_dead(TCGContext *s, TCGTemp *ts)
 static void temp_sync(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs,
                       TCGRegSet preferred_regs, int free_or_dead)
 {
-    if (ts->kind == TEMP_FIXED) {
+    if (temp_readonly(ts)) {
         return;
     }
     if (!ts->mem_coherent) {
@@ -3314,8 +3314,7 @@ static void temp_save(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs)
 {
     /* The liveness analysis already ensures that globals are back
        in memory. Keep an tcg_debug_assert for safety. */
-    tcg_debug_assert(ts->val_type == TEMP_VAL_MEM
-                     || ts->kind == TEMP_FIXED);
+    tcg_debug_assert(ts->val_type == TEMP_VAL_MEM || temp_readonly(ts));
 }
 
 /* save globals to their canonical location and assume they can be
@@ -3373,7 +3372,7 @@ static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
                                   TCGRegSet preferred_regs)
 {
     /* ENV should not be modified.  */
-    tcg_debug_assert(ots->kind != TEMP_FIXED);
+    tcg_debug_assert(!temp_readonly(ots));
 
     /* The movi is not explicitly generated here.  */
     if (ots->val_type == TEMP_VAL_REG) {
@@ -3413,7 +3412,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
     ts = arg_temp(op->args[1]);
 
     /* ENV should not be modified.  */
-    tcg_debug_assert(ots->kind != TEMP_FIXED);
+    tcg_debug_assert(!temp_readonly(ots));
 
     /* Note that otype != itype for no-op truncation.  */
     otype = ots->type;
@@ -3474,7 +3473,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
                  * Store the source register into the destination slot
                  * and leave the destination temp as TEMP_VAL_MEM.
                  */
-                assert(ots->kind != TEMP_FIXED);
+                assert(!temp_readonly(ots));
                 if (!ts->mem_allocated) {
                     temp_allocate_frame(s, ots);
                 }
@@ -3511,7 +3510,7 @@ static void tcg_reg_alloc_dup(TCGContext *s, const TCGOp *op)
     its = arg_temp(op->args[1]);
 
     /* ENV should not be modified.  */
-    tcg_debug_assert(ots->kind != TEMP_FIXED);
+    tcg_debug_assert(!temp_readonly(ots));
 
     itype = its->type;
     vece = TCGOP_VECE(op);
@@ -3742,7 +3741,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
             ts = arg_temp(arg);
 
             /* ENV should not be modified.  */
-            tcg_debug_assert(ts->kind != TEMP_FIXED);
+            tcg_debug_assert(!temp_readonly(ts));
 
             if ((arg_ct->ct & TCG_CT_ALIAS)
                 && !const_args[arg_ct->alias_index]) {
@@ -3784,7 +3783,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
         ts = arg_temp(op->args[i]);
 
         /* ENV should not be modified.  */
-        tcg_debug_assert(ts->kind != TEMP_FIXED);
+        tcg_debug_assert(!temp_readonly(ts));
 
         if (NEED_SYNC_ARG(i)) {
             temp_sync(s, ts, o_allocated_regs, 0, IS_DEAD_ARG(i));
@@ -3916,7 +3915,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
         ts = arg_temp(arg);
 
         /* ENV should not be modified.  */
-        tcg_debug_assert(ts->kind != TEMP_FIXED);
+        tcg_debug_assert(!temp_readonly(ts));
 
         reg = tcg_target_call_oarg_regs[i];
         tcg_debug_assert(s->reg_to_temp[reg] == NULL);
-- 
2.20.1


* [PATCH v2 11/36] tcg: Introduce TYPE_CONST temporaries
From: Richard Henderson @ 2020-04-22  1:16 UTC
  To: qemu-devel; +Cc: alex.bennee

These will hold a single constant for the duration of the TB.
They are hashed by type and value, so that each constant has
exactly one temp within the TB.

They are not used yet; this is all infrastructure.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
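
With this patch, find_better_copy() ranks copies as follows: any
read-only copy (TEMP_FIXED or TEMP_CONST) first, then a global, then a
local, else the temp itself.  A global or local is only taken if its
kind is greater than that of the temp being replaced.  The sketch below
models this ranking over a plain array instead of the circular copy
list.  better_copy() and kind_readonly() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum TCGTempKind {
    TEMP_NORMAL,
    TEMP_LOCAL,
    TEMP_GLOBAL,
    TEMP_FIXED,
    TEMP_CONST,
} TCGTempKind;

/* Mirrors the widened temp_readonly(): kind >= TEMP_FIXED. */
static bool kind_readonly(TCGTempKind k)
{
    return k >= TEMP_FIXED;
}

/*
 * KINDS[0] is the temp being looked up; KINDS[1..n-1] are its copies.
 * Returns the index of the preferred representative.
 */
static size_t better_copy(const TCGTempKind kinds[], size_t n)
{
    size_t g = 0, l = 0;   /* 0 means "none found" */

    if (kind_readonly(kinds[0])) {
        return 0;          /* already read-only: cannot do better */
    }
    for (size_t i = 1; i < n; i++) {
        if (kind_readonly(kinds[i])) {
            return i;      /* a fixed or constant copy wins at once */
        } else if (kinds[i] > kinds[0]) {
            if (kinds[i] == TEMP_GLOBAL) {
                g = i;
            } else if (kinds[i] == TEMP_LOCAL) {
                l = i;
            }
        }
    }
    return g ? g : l ? l : 0;
}
```
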
 include/tcg/tcg.h |  27 ++++++++++-
 tcg/optimize.c    |  40 ++++++++++-------
 tcg/tcg-op-vec.c  |  17 +++++++
 tcg/tcg.c         | 111 +++++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 166 insertions(+), 29 deletions(-)
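
tcg_constant_internal() is a find-or-create lookup keyed by type and
value.  The sketch below keeps that shape but replaces the per-type
GHashTable with a linear scan over one array to stay short.  Ctx,
ConstTemp and the TYPE_* values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TYPE_I32, TYPE_I64 };   /* hypothetical stand-ins for TCGType */
#define MAX_TEMPS 64

typedef struct ConstTemp {
    int type;
    int64_t val;
} ConstTemp;

typedef struct Ctx {
    ConstTemp temps[MAX_TEMPS];
    size_t nb_temps;
} Ctx;

/* New TB: forget all constants, as tcg_func_start() clears const_table. */
static void ctx_reset(Ctx *s)
{
    s->nb_temps = 0;
}

/* Locate or create the unique constant temp for (TYPE, VAL). */
static ConstTemp *constant_internal(Ctx *s, int type, int64_t val)
{
    for (size_t i = 0; i < s->nb_temps; i++) {
        ConstTemp *ts = &s->temps[i];
        if (ts->type == type && ts->val == val) {
            return ts;          /* reuse: one temp per value per type */
        }
    }
    assert(s->nb_temps < MAX_TEMPS);
    ConstTemp *ts = &s->temps[s->nb_temps++];
    ts->type = type;
    ts->val = val;
    return ts;
}
```

In the patch itself the hash key is &ts->val, a pointer into the temp
that was just allocated, so no separate key storage is needed.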

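tcg_constant_vec() first replicates the element with dup_const().  On a
32-bit host, it then accepts the 64-bit pattern only if its two halves
are equal or if it is a sign-extended 32-bit value.  Anything else
would need INDEX_op_dup2_vec and a non-const temporary, so the patch
asserts instead.  The following standalone model covers both steps;
the VECE_* names are stand-ins for MO_8..MO_64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Element sizes, mirroring MO_8..MO_64 (vece = log2 of element bytes). */
enum { VECE_8, VECE_16, VECE_32, VECE_64 };

/* Replicate the low element of C across 64 bits, like dup_const(). */
static uint64_t dup_const_model(unsigned vece, uint64_t c)
{
    switch (vece) {
    case VECE_8:  return 0x0101010101010101ull * (uint8_t)c;
    case VECE_16: return 0x0001000100010001ull * (uint16_t)c;
    case VECE_32: return 0x0000000100000001ull * (uint32_t)c;
    default:      return c;
    }
}

/* The representability test applied when TCG_TARGET_REG_BITS == 32. */
static bool fits_32bit_host(uint64_t val)
{
    /* Equivalent to deposit64(val, 32, 32, val). */
    uint64_t halves_equal = (val & 0xffffffffull) | (val << 32);
    uint64_t sext = (uint64_t)(int32_t)val;

    return val == halves_equal || val == sext;
}
```
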
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 27e1b509a6..f72530dfda 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -489,6 +489,8 @@ typedef enum TCGTempKind {
     TEMP_GLOBAL,
     /* Temp is in a fixed register. */
     TEMP_FIXED,
+    /* Temp is a fixed constant. */
+    TEMP_CONST,
 } TCGTempKind;
 
 typedef struct TCGTemp {
@@ -664,6 +666,7 @@ struct TCGContext {
     QSIMPLEQ_HEAD(, TCGOp) plugin_ops;
 #endif
 
+    GHashTable *const_table[TCG_TYPE_COUNT];
     TCGTempSet free_temps[TCG_TYPE_COUNT * 2];
     TCGTemp temps[TCG_MAX_TEMPS]; /* globals first, temps after */
 
@@ -680,7 +683,7 @@ struct TCGContext {
 
 static inline bool temp_readonly(TCGTemp *ts)
 {
-    return ts->kind == TEMP_FIXED;
+    return ts->kind >= TEMP_FIXED;
 }
 
 extern TCGContext tcg_init_ctx;
@@ -1038,6 +1041,7 @@ TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *op, TCGOpcode opc);
 
 void tcg_optimize(TCGContext *s);
 
+/* Allocate a new temporary and initialize it with a constant. */
 TCGv_i32 tcg_const_i32(int32_t val);
 TCGv_i64 tcg_const_i64(int64_t val);
 TCGv_i32 tcg_const_local_i32(int32_t val);
@@ -1047,6 +1051,27 @@ TCGv_vec tcg_const_ones_vec(TCGType);
 TCGv_vec tcg_const_zeros_vec_matching(TCGv_vec);
 TCGv_vec tcg_const_ones_vec_matching(TCGv_vec);
 
+/*
+ * Locate or create a read-only temporary that is a constant.
+ * This kind of temporary need not and should not be freed.
+ */
+TCGTemp *tcg_constant_internal(TCGType type, tcg_target_long val);
+
+static inline TCGv_i32 tcg_constant_i32(int32_t val)
+{
+    return temp_tcgv_i32(tcg_constant_internal(TCG_TYPE_I32, val));
+}
+
+static inline TCGv_i64 tcg_constant_i64(int64_t val)
+{
+    if (TCG_TARGET_REG_BITS == 32) {
+        qemu_build_not_reached();
+    }
+    return temp_tcgv_i64(tcg_constant_internal(TCG_TYPE_I64, val));
+}
+
+TCGv_vec tcg_constant_vec(TCGType type, unsigned vece, int64_t val);
+
 #if UINTPTR_MAX == UINT32_MAX
 # define tcg_const_ptr(x)        ((TCGv_ptr)tcg_const_i32((intptr_t)(x)))
 # define tcg_const_local_ptr(x)  ((TCGv_ptr)tcg_const_local_i32((intptr_t)(x)))
diff --git a/tcg/optimize.c b/tcg/optimize.c
index afb4a9a5a9..effb47eefd 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -99,8 +99,17 @@ static void init_ts_info(struct tcg_temp_info *infos,
         ts->state_ptr = ti;
         ti->next_copy = ts;
         ti->prev_copy = ts;
-        ti->is_const = false;
-        ti->mask = -1;
+        if (ts->kind == TEMP_CONST) {
+            ti->is_const = true;
+            ti->val = ti->mask = ts->val;
+            if (TCG_TARGET_REG_BITS > 32 && ts->type == TCG_TYPE_I32) {
+                /* High bits of a 32-bit quantity are garbage.  */
+                ti->mask |= ~0xffffffffull;
+            }
+        } else {
+            ti->is_const = false;
+            ti->mask = -1;
+        }
         set_bit(idx, temps_used->l);
     }
 }
@@ -113,31 +122,28 @@ static void init_arg_info(struct tcg_temp_info *infos,
 
 static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
 {
-    TCGTemp *i;
+    TCGTemp *i, *g, *l;
 
-    /* If this is already a global, we can't do better. */
-    if (ts->kind >= TEMP_GLOBAL) {
+    /* If this is already readonly, we can't do better. */
+    if (temp_readonly(ts)) {
         return ts;
     }
 
-    /* Search for a global first. */
+    g = l = NULL;
     for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
-        if (i->kind >= TEMP_GLOBAL) {
+        if (temp_readonly(i)) {
             return i;
-        }
-    }
-
-    /* If it is a temp, search for a temp local. */
-    if (ts->kind == TEMP_NORMAL) {
-        for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
-            if (i->kind >= TEMP_LOCAL) {
-                return i;
+        } else if (i->kind > ts->kind) {
+            if (i->kind == TEMP_GLOBAL) {
+                g = i;
+            } else if (i->kind == TEMP_LOCAL) {
+                l = i;
             }
         }
     }
 
-    /* Failure to find a better representation, return the same temp. */
-    return ts;
+    /* If we didn't find a better representation, return the same temp. */
+    return g ? g : l ? l : ts;
 }
 
 static bool ts_are_copies(TCGTemp *ts1, TCGTemp *ts2)
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index b6937e8d64..f3927089a7 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -209,6 +209,23 @@ static void vec_gen_op3(TCGOpcode opc, unsigned vece,
     vec_gen_3(opc, type, vece, temp_arg(rt), temp_arg(at), temp_arg(bt));
 }
 
+TCGv_vec tcg_constant_vec(TCGType type, unsigned vece, int64_t val)
+{
+    val = dup_const(vece, val);
+
+    /*
+     * For MO_64 constants that can't be represented in tcg_target_long,
+     * we must use INDEX_op_dup2_vec, which requires a non-const temporary.
+     */
+    if (TCG_TARGET_REG_BITS == 32 &&
+        val != deposit64(val, 32, 32, val) &&
+        val != (uint64_t)(int32_t)val) {
+        g_assert_not_reached();
+    }
+
+    return temp_tcgv_vec(tcg_constant_internal(type, val));
+}
+
 void tcg_gen_mov_vec(TCGv_vec r, TCGv_vec a)
 {
     if (r != a) {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 92b3767097..59beb2bf29 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1127,6 +1127,7 @@ void tcg_func_start(TCGContext *s)
 
     /* No temps have been previously allocated for size or locality.  */
     memset(s->free_temps, 0, sizeof(s->free_temps));
+    memset(s->const_table, 0, sizeof(s->const_table));
 
     s->nb_ops = 0;
     s->nb_labels = 0;
@@ -1199,13 +1200,19 @@ TCGTemp *tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
     bigendian = 1;
 #endif
 
-    if (base_ts->kind != TEMP_FIXED) {
+    switch (base_ts->kind) {
+    case TEMP_FIXED:
+        break;
+    case TEMP_GLOBAL:
         /* We do not support double-indirect registers.  */
         tcg_debug_assert(!base_ts->indirect_reg);
         base_ts->indirect_base = 1;
         s->nb_indirects += (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64
                             ? 2 : 1);
         indirect_reg = 1;
+        break;
+    default:
+        g_assert_not_reached();
     }
 
     if (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64) {
@@ -1346,6 +1353,37 @@ void tcg_temp_free_internal(TCGTemp *ts)
     set_bit(idx, s->free_temps[k].l);
 }
 
+TCGTemp *tcg_constant_internal(TCGType type, tcg_target_long val)
+{
+    TCGContext *s = tcg_ctx;
+    GHashTable *h = s->const_table[type];
+    TCGTemp *ts;
+
+    if (h == NULL) {
+        if (sizeof(tcg_target_long) == sizeof(gint64)) {
+            h = g_hash_table_new(g_int64_hash, g_int64_equal);
+        } else if (sizeof(tcg_target_long) == sizeof(gint)) {
+            h = g_hash_table_new(g_int_hash, g_int_equal);
+        } else {
+            qemu_build_not_reached();
+        }
+        s->const_table[type] = h;
+    }
+
+    ts = g_hash_table_lookup(h, &val);
+    if (ts == NULL) {
+        ts = tcg_temp_alloc(s);
+        ts->base_type = type;
+        ts->type = type;
+        ts->kind = TEMP_CONST;
+        ts->temp_allocated = 1;
+        ts->val = val;
+        g_hash_table_insert(h, &ts->val, ts);
+    }
+
+    return ts;
+}
+
 TCGv_i32 tcg_const_i32(int32_t val)
 {
     TCGv_i32 t0;
@@ -1871,6 +1909,9 @@ static void tcg_reg_alloc_start(TCGContext *s)
         TCGTempVal val = TEMP_VAL_MEM;
 
         switch (ts->kind) {
+        case TEMP_CONST:
+            val = TEMP_VAL_CONST;
+            break;
         case TEMP_FIXED:
             val = TEMP_VAL_REG;
             break;
@@ -1907,6 +1948,26 @@ static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
     case TEMP_NORMAL:
         snprintf(buf, buf_size, "tmp%d", idx - s->nb_globals);
         break;
+    case TEMP_CONST:
+        switch (ts->type) {
+        case TCG_TYPE_I32:
+            snprintf(buf, buf_size, "$0x%x", (int32_t)ts->val);
+            break;
+#if TCG_TARGET_REG_BITS > 32
+        case TCG_TYPE_I64:
+            snprintf(buf, buf_size, "$0x%" TCG_PRIlx, ts->val);
+            break;
+#endif
+        case TCG_TYPE_V64:
+        case TCG_TYPE_V128:
+        case TCG_TYPE_V256:
+            snprintf(buf, buf_size, "v%d$0x%" TCG_PRIlx,
+                     64 << (ts->type - TCG_TYPE_V64), ts->val);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        break;
     }
     return buf;
 }
@@ -2513,6 +2574,7 @@ static void la_bb_end(TCGContext *s, int ng, int nt)
             state = TS_DEAD | TS_MEM;
             break;
         case TEMP_NORMAL:
+        case TEMP_CONST:
             state = TS_DEAD;
             break;
         default:
@@ -3132,15 +3194,28 @@ static void temp_load(TCGContext *, TCGTemp *, TCGRegSet, TCGRegSet, TCGRegSet);
    mark it free; otherwise mark it dead.  */
 static void temp_free_or_dead(TCGContext *s, TCGTemp *ts, int free_or_dead)
 {
-    if (temp_readonly(ts)) {
+    TCGTempVal new_type;
+
+    switch (ts->kind) {
+    case TEMP_FIXED:
         return;
+    case TEMP_GLOBAL:
+    case TEMP_LOCAL:
+        new_type = TEMP_VAL_MEM;
+        break;
+    case TEMP_NORMAL:
+        new_type = free_or_dead < 0 ? TEMP_VAL_MEM : TEMP_VAL_DEAD;
+        break;
+    case TEMP_CONST:
+        new_type = TEMP_VAL_CONST;
+        break;
+    default:
+        g_assert_not_reached();
     }
     if (ts->val_type == TEMP_VAL_REG) {
         s->reg_to_temp[ts->reg] = NULL;
     }
-    ts->val_type = (free_or_dead < 0
-                    || ts->kind != TEMP_NORMAL
-                    ? TEMP_VAL_MEM : TEMP_VAL_DEAD);
+    ts->val_type = new_type;
 }
 
 /* Mark a temporary as dead.  */
@@ -3156,10 +3231,7 @@ static inline void temp_dead(TCGContext *s, TCGTemp *ts)
 static void temp_sync(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs,
                       TCGRegSet preferred_regs, int free_or_dead)
 {
-    if (temp_readonly(ts)) {
-        return;
-    }
-    if (!ts->mem_coherent) {
+    if (!temp_readonly(ts) && !ts->mem_coherent) {
         if (!ts->mem_allocated) {
             temp_allocate_frame(s, ts);
         }
@@ -3352,12 +3424,22 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
 
     for (i = s->nb_globals; i < s->nb_temps; i++) {
         TCGTemp *ts = &s->temps[i];
-        if (ts->kind == TEMP_LOCAL) {
+
+        switch (ts->kind) {
+        case TEMP_LOCAL:
             temp_save(s, ts, allocated_regs);
-        } else {
+            break;
+        case TEMP_NORMAL:
             /* The liveness analysis already ensures that temps are dead.
                Keep an tcg_debug_assert for safety. */
             tcg_debug_assert(ts->val_type == TEMP_VAL_DEAD);
+            break;
+        case TEMP_CONST:
+            /* Similarly, we should have freed any allocated register. */
+            tcg_debug_assert(ts->val_type == TEMP_VAL_CONST);
+            break;
+        default:
+            g_assert_not_reached();
         }
     }
 
@@ -4148,6 +4230,13 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     }
 #endif
 
+    for (i = 0; i < TCG_TYPE_COUNT; ++i) {
+        if (s->const_table[i]) {
+            g_hash_table_destroy(s->const_table[i]);
+            s->const_table[i] = NULL;
+        }
+    }
+
     tcg_reg_alloc_start(s);
 
     s->code_buf = tb->tc.ptr;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 75+ messages in thread
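The `tcg_constant_internal` function in the patch above interns constants. Each (type, value) pair maps to exactly one read-only temp, which is found through a per-type GLib hash table keyed on the temp's own `val` field. All of those tables are destroyed before register allocation in `tcg_gen_code`, so interning lasts only as long as one translation. Below is a minimal stand-alone sketch of that scheme. It assumes a fixed-size temp pool and uses a small chained hash in place of GLib. The names `Temp`, `Ctx`, `constant_internal` and `reset_constants` are hypothetical and are not QEMU API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NB_TYPES = 2, NB_BUCKETS = 64, MAX_TEMPS = 256 };

typedef struct Temp {
    int type;            /* hypothetical stand-in for TCGType */
    int64_t val;         /* the constant value, also the hash key */
    struct Temp *next;   /* bucket chain */
} Temp;

typedef struct {
    Temp temps[MAX_TEMPS];
    int nb_temps;
    Temp *table[NB_TYPES][NB_BUCKETS];  /* one hash table per type */
} Ctx;

/* Mix the bits of V and reduce to a bucket index. */
static unsigned hash64(int64_t v)
{
    uint64_t x = (uint64_t)v;
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdull;
    x ^= x >> 33;
    return (unsigned)(x % NB_BUCKETS);
}

/* Return the unique temp holding VAL for TYPE, allocating it on first use. */
static Temp *constant_internal(Ctx *s, int type, int64_t val)
{
    Temp **bucket = &s->table[type][hash64(val)];

    for (Temp *ts = *bucket; ts; ts = ts->next) {
        if (ts->val == val) {
            return ts;          /* already interned: share it */
        }
    }
    assert(s->nb_temps < MAX_TEMPS);
    Temp *ts = &s->temps[s->nb_temps++];
    ts->type = type;
    ts->val = val;
    ts->next = *bucket;         /* push onto the bucket chain */
    *bucket = ts;
    return ts;
}

/* Forget all interned constants and recycle the pool (once per TB here). */
static void reset_constants(Ctx *s)
{
    for (int t = 0; t < NB_TYPES; t++) {
        for (int b = 0; b < NB_BUCKETS; b++) {
            s->table[t][b] = NULL;
        }
    }
    s->nb_temps = 0;
}
```

The real table is keyed on `&ts->val`, so the key lives inside the temp and no separate key allocation is needed. The sketch gets the same effect by chaining the temps themselves.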

* [PATCH v2 12/36] tcg: Use tcg_constant_i32 with icount expander
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (10 preceding siblings ...)
  2020-04-22  1:16 ` [PATCH v2 11/36] tcg: Introduce TYPE_CONST temporaries Richard Henderson
@ 2020-04-22  1:16 ` Richard Henderson
  2020-04-22 15:40   ` Alex Bennée
  2020-04-22  1:16 ` [PATCH v2 13/36] tcg: Use tcg_constant_{i32, i64} with tcg int expanders Richard Henderson
                   ` (24 subsequent siblings)
  36 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

We must do this before we adjust how tcg_out_movi_i32 works,
lest the under-the-hood poking of its immediate operand break.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/gen-icount.h | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)
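`gen_tb_start` and `gen_tb_end` use back-patching. The sub is emitted with a placeholder constant, and its op is remembered. Once the instruction count is known, that op's argument 2 is overwritten with a constant temp holding the real count. Argument 2 is the second input, because the argument order is output, input 1, input 2. Below is a self-contained sketch of the pattern, assuming a toy op buffer in which every op is a 32-bit subtract on temps. The names `Op`, `Ctx`, `new_temp`, `emit_sub`, `set_insn_param` and `interpret` are hypothetical and are not QEMU API.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_OPS = 16, MAX_TEMPS = 16 };

typedef struct {
    int args[3];               /* temp indices: output, input 1, input 2 */
} Op;                          /* every toy op computes out = in1 - in2 */

typedef struct {
    Op ops[MAX_OPS];
    int nb_ops;
    int32_t temps[MAX_TEMPS];  /* current value of each temp */
    int nb_temps;
} Ctx;

/* Allocate a temp initialised to VAL (used for both variables and constants). */
static int new_temp(Ctx *s, int32_t val)
{
    assert(s->nb_temps < MAX_TEMPS);
    s->temps[s->nb_temps] = val;
    return s->nb_temps++;
}

/* Emit ret = a - b and return the op so the caller can patch it later. */
static Op *emit_sub(Ctx *s, int ret, int a, int b)
{
    assert(s->nb_ops < MAX_OPS);
    Op *op = &s->ops[s->nb_ops++];
    op->args[0] = ret;
    op->args[1] = a;
    op->args[2] = b;
    return op;
}

/* Overwrite argument ARG of OP, playing the role of tcg_set_insn_param. */
static void set_insn_param(Op *op, int arg, int v)
{
    op->args[arg] = v;
}

/* Execute the buffer in order. */
static void interpret(Ctx *s)
{
    for (int i = 0; i < s->nb_ops; i++) {
        const Op *op = &s->ops[i];
        s->temps[op->args[0]] = s->temps[op->args[1]] - s->temps[op->args[2]];
    }
}
```

The placeholder is now a constant temp rather than an immediate operand. The patched value is therefore itself a temp argument (`tcgv_i32_arg(tcg_constant_i32(num_insns))`), not the raw count.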

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 822c43cfd3..404732518a 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -34,7 +34,7 @@ static inline void gen_io_end(void)
 
 static inline void gen_tb_start(TranslationBlock *tb)
 {
-    TCGv_i32 count, imm;
+    TCGv_i32 count;
 
     tcg_ctx->exitreq_label = gen_new_label();
     if (tb_cflags(tb) & CF_USE_ICOUNT) {
@@ -48,15 +48,13 @@ static inline void gen_tb_start(TranslationBlock *tb)
                    offsetof(ArchCPU, env));
 
     if (tb_cflags(tb) & CF_USE_ICOUNT) {
-        imm = tcg_temp_new_i32();
-        /* We emit a movi with a dummy immediate argument. Keep the insn index
-         * of the movi so that we later (when we know the actual insn count)
-         * can update the immediate argument with the actual insn count.  */
-        tcg_gen_movi_i32(imm, 0xdeadbeef);
+        /*
+         * We emit a sub with a dummy immediate argument. Keep the insn index
+         * of the sub so that we later (when we know the actual insn count)
+         * can update the argument with the actual insn count.
+         */
+        tcg_gen_sub_i32(count, count, tcg_constant_i32(0));
         icount_start_insn = tcg_last_op();
-
-        tcg_gen_sub_i32(count, count, imm);
-        tcg_temp_free_i32(imm);
     }
 
     tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, tcg_ctx->exitreq_label);
@@ -74,9 +72,12 @@ static inline void gen_tb_start(TranslationBlock *tb)
 static inline void gen_tb_end(TranslationBlock *tb, int num_insns)
 {
     if (tb_cflags(tb) & CF_USE_ICOUNT) {
-        /* Update the num_insn immediate parameter now that we know
-         * the actual insn count.  */
-        tcg_set_insn_param(icount_start_insn, 1, num_insns);
+        /*
+         * Update the num_insn immediate parameter now that we know
+         * the actual insn count.
+         */
+        tcg_set_insn_param(icount_start_insn, 2,
+                           tcgv_i32_arg(tcg_constant_i32(num_insns)));
     }
 
     gen_set_label(tcg_ctx->exitreq_label);
-- 
2.20.1




* [PATCH v2 13/36] tcg: Use tcg_constant_{i32, i64} with tcg int expanders
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (11 preceding siblings ...)
  2020-04-22  1:16 ` [PATCH v2 12/36] tcg: Use tcg_constant_i32 with icount expander Richard Henderson
@ 2020-04-22  1:16 ` Richard Henderson
  2020-04-22 16:18   ` [PATCH v2 13/36] tcg: Use tcg_constant_{i32,i64} " Alex Bennée
  2020-04-22 20:04   ` Alex Bennée
  2020-04-22  1:17 ` [PATCH v2 14/36] tcg: Use tcg_constant_{i32, vec} with tcg vec expanders Richard Henderson
                   ` (23 subsequent siblings)
  36 siblings, 2 replies; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-op.h |  13 +--
 tcg/tcg-op.c         | 216 ++++++++++++++++++++-----------------------
 2 files changed, 100 insertions(+), 129 deletions(-)
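On a 32-bit host, `tcg_gen_addi_i64` and `tcg_gen_subi_i64` now pass the immediate straight to `add2_i32` or `sub2_i32` as two constant halves, `arg2` and `arg2 >> 32`. They no longer materialise a 64-bit temp first. Those ops must perform a double-word add with carry, or a double-word subtract with borrow. Below is a small C model of that arithmetic, assuming the halves are formed as in the patch. The names `add2`, `sub2`, `addi_i64_split` and `subi_i64_split` are hypothetical helpers, not the TCG ops themselves.

```c
#include <assert.h>
#include <stdint.h>

/* (rh:rl) = (ah:al) + (bh:bl), carrying out of the low word. */
static void add2(uint32_t *rl, uint32_t *rh,
                 uint32_t al, uint32_t ah, uint32_t bl, uint32_t bh)
{
    uint32_t lo = al + bl;
    uint32_t carry = lo < al;       /* unsigned wrap-around means a carry */
    *rl = lo;
    *rh = ah + bh + carry;
}

/* (rh:rl) = (ah:al) - (bh:bl), borrowing out of the low word. */
static void sub2(uint32_t *rl, uint32_t *rh,
                 uint32_t al, uint32_t ah, uint32_t bl, uint32_t bh)
{
    uint32_t borrow = al < bl;
    *rl = al - bl;
    *rh = ah - bh - borrow;
}

/* Model of the 32-bit-host path of tcg_gen_addi_i64. */
static uint64_t addi_i64_split(uint64_t arg1, int64_t arg2)
{
    uint32_t rl, rh;
    add2(&rl, &rh, (uint32_t)arg1, (uint32_t)(arg1 >> 32),
         (uint32_t)arg2, (uint32_t)(arg2 >> 32));
    return ((uint64_t)rh << 32) | rl;
}

/* Model of the 32-bit-host path of tcg_gen_subi_i64. */
static uint64_t subi_i64_split(uint64_t arg1, int64_t arg2)
{
    uint32_t rl, rh;
    sub2(&rl, &rh, (uint32_t)arg1, (uint32_t)(arg1 >> 32),
         (uint32_t)arg2, (uint32_t)(arg2 >> 32));
    return ((uint64_t)rh << 32) | rl;
}
```

Whether `>>` on a negative `int64_t` is arithmetic or logical, the low 32 bits of `arg2 >> 32` are bits 32..63 of `arg2`. That is all the high-half constant needs.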

diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index 230db6e022..11ed9192f7 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -271,6 +271,7 @@ void tcg_gen_mb(TCGBar);
 
 /* 32 bit ops */
 
+void tcg_gen_movi_i32(TCGv_i32 ret, int32_t arg);
 void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2);
 void tcg_gen_subfi_i32(TCGv_i32 ret, int32_t arg1, TCGv_i32 arg2);
 void tcg_gen_subi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2);
@@ -349,11 +350,6 @@ static inline void tcg_gen_mov_i32(TCGv_i32 ret, TCGv_i32 arg)
     }
 }
 
-static inline void tcg_gen_movi_i32(TCGv_i32 ret, int32_t arg)
-{
-    tcg_gen_op2i_i32(INDEX_op_movi_i32, ret, arg);
-}
-
 static inline void tcg_gen_ld8u_i32(TCGv_i32 ret, TCGv_ptr arg2,
                                     tcg_target_long offset)
 {
@@ -467,6 +463,7 @@ static inline void tcg_gen_not_i32(TCGv_i32 ret, TCGv_i32 arg)
 
 /* 64 bit ops */
 
+void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg);
 void tcg_gen_addi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
 void tcg_gen_subfi_i64(TCGv_i64 ret, int64_t arg1, TCGv_i64 arg2);
 void tcg_gen_subi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
@@ -550,11 +547,6 @@ static inline void tcg_gen_mov_i64(TCGv_i64 ret, TCGv_i64 arg)
     }
 }
 
-static inline void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg)
-{
-    tcg_gen_op2i_i64(INDEX_op_movi_i64, ret, arg);
-}
-
 static inline void tcg_gen_ld8u_i64(TCGv_i64 ret, TCGv_ptr arg2,
                                     tcg_target_long offset)
 {
@@ -698,7 +690,6 @@ static inline void tcg_gen_sub_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
 
 void tcg_gen_discard_i64(TCGv_i64 arg);
 void tcg_gen_mov_i64(TCGv_i64 ret, TCGv_i64 arg);
-void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg);
 void tcg_gen_ld8u_i64(TCGv_i64 ret, TCGv_ptr arg2, tcg_target_long offset);
 void tcg_gen_ld8s_i64(TCGv_i64 ret, TCGv_ptr arg2, tcg_target_long offset);
 void tcg_gen_ld16u_i64(TCGv_i64 ret, TCGv_ptr arg2, tcg_target_long offset);
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index e2e25ebf7d..07eb661a07 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -104,15 +104,18 @@ void tcg_gen_mb(TCGBar mb_type)
 
 /* 32 bit ops */
 
+void tcg_gen_movi_i32(TCGv_i32 ret, int32_t arg)
+{
+    tcg_gen_mov_i32(ret, tcg_constant_i32(arg));
+}
+
 void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
 {
     /* some cases can be optimized here */
     if (arg2 == 0) {
         tcg_gen_mov_i32(ret, arg1);
     } else {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_add_i32(ret, arg1, t0);
-        tcg_temp_free_i32(t0);
+        tcg_gen_add_i32(ret, arg1, tcg_constant_i32(arg2));
     }
 }
 
@@ -122,9 +125,7 @@ void tcg_gen_subfi_i32(TCGv_i32 ret, int32_t arg1, TCGv_i32 arg2)
         /* Don't recurse with tcg_gen_neg_i32.  */
         tcg_gen_op2_i32(INDEX_op_neg_i32, ret, arg2);
     } else {
-        TCGv_i32 t0 = tcg_const_i32(arg1);
-        tcg_gen_sub_i32(ret, t0, arg2);
-        tcg_temp_free_i32(t0);
+        tcg_gen_sub_i32(ret, tcg_constant_i32(arg1), arg2);
     }
 }
 
@@ -134,15 +135,12 @@ void tcg_gen_subi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
     if (arg2 == 0) {
         tcg_gen_mov_i32(ret, arg1);
     } else {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_sub_i32(ret, arg1, t0);
-        tcg_temp_free_i32(t0);
+        tcg_gen_sub_i32(ret, arg1, tcg_constant_i32(arg2));
     }
 }
 
 void tcg_gen_andi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
 {
-    TCGv_i32 t0;
     /* Some cases can be optimized here.  */
     switch (arg2) {
     case 0:
@@ -165,9 +163,8 @@ void tcg_gen_andi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
         }
         break;
     }
-    t0 = tcg_const_i32(arg2);
-    tcg_gen_and_i32(ret, arg1, t0);
-    tcg_temp_free_i32(t0);
+
+    tcg_gen_and_i32(ret, arg1, tcg_constant_i32(arg2));
 }
 
 void tcg_gen_ori_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
@@ -178,9 +175,7 @@ void tcg_gen_ori_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
     } else if (arg2 == 0) {
         tcg_gen_mov_i32(ret, arg1);
     } else {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_or_i32(ret, arg1, t0);
-        tcg_temp_free_i32(t0);
+        tcg_gen_or_i32(ret, arg1, tcg_constant_i32(arg2));
     }
 }
 
@@ -193,9 +188,7 @@ void tcg_gen_xori_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
         /* Don't recurse with tcg_gen_not_i32.  */
         tcg_gen_op2_i32(INDEX_op_not_i32, ret, arg1);
     } else {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_xor_i32(ret, arg1, t0);
-        tcg_temp_free_i32(t0);
+        tcg_gen_xor_i32(ret, arg1, tcg_constant_i32(arg2));
     }
 }
 
@@ -205,9 +198,7 @@ void tcg_gen_shli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
     if (arg2 == 0) {
         tcg_gen_mov_i32(ret, arg1);
     } else {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_shl_i32(ret, arg1, t0);
-        tcg_temp_free_i32(t0);
+        tcg_gen_shl_i32(ret, arg1, tcg_constant_i32(arg2));
     }
 }
 
@@ -217,9 +208,7 @@ void tcg_gen_shri_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
     if (arg2 == 0) {
         tcg_gen_mov_i32(ret, arg1);
     } else {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_shr_i32(ret, arg1, t0);
-        tcg_temp_free_i32(t0);
+        tcg_gen_shr_i32(ret, arg1, tcg_constant_i32(arg2));
     }
 }
 
@@ -229,9 +218,7 @@ void tcg_gen_sari_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
     if (arg2 == 0) {
         tcg_gen_mov_i32(ret, arg1);
     } else {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_sar_i32(ret, arg1, t0);
-        tcg_temp_free_i32(t0);
+        tcg_gen_sar_i32(ret, arg1, tcg_constant_i32(arg2));
     }
 }
 
@@ -250,9 +237,7 @@ void tcg_gen_brcondi_i32(TCGCond cond, TCGv_i32 arg1, int32_t arg2, TCGLabel *l)
     if (cond == TCG_COND_ALWAYS) {
         tcg_gen_br(l);
     } else if (cond != TCG_COND_NEVER) {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_brcond_i32(cond, arg1, t0, l);
-        tcg_temp_free_i32(t0);
+        tcg_gen_brcond_i32(cond, arg1, tcg_constant_i32(arg2), l);
     }
 }
 
@@ -271,9 +256,7 @@ void tcg_gen_setcond_i32(TCGCond cond, TCGv_i32 ret,
 void tcg_gen_setcondi_i32(TCGCond cond, TCGv_i32 ret,
                           TCGv_i32 arg1, int32_t arg2)
 {
-    TCGv_i32 t0 = tcg_const_i32(arg2);
-    tcg_gen_setcond_i32(cond, ret, arg1, t0);
-    tcg_temp_free_i32(t0);
+    tcg_gen_setcond_i32(cond, ret, arg1, tcg_constant_i32(arg2));
 }
 
 void tcg_gen_muli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
@@ -283,9 +266,7 @@ void tcg_gen_muli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
     } else if (is_power_of_2(arg2)) {
         tcg_gen_shli_i32(ret, arg1, ctz32(arg2));
     } else {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_mul_i32(ret, arg1, t0);
-        tcg_temp_free_i32(t0);
+        tcg_gen_mul_i32(ret, arg1, tcg_constant_i32(arg2));
     }
 }
 
@@ -433,9 +414,7 @@ void tcg_gen_clz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
 
 void tcg_gen_clzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
 {
-    TCGv_i32 t = tcg_const_i32(arg2);
-    tcg_gen_clz_i32(ret, arg1, t);
-    tcg_temp_free_i32(t);
+    tcg_gen_clz_i32(ret, arg1, tcg_constant_i32(arg2));
 }
 
 void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
@@ -468,10 +447,9 @@ void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
             tcg_gen_clzi_i32(t, t, 32);
             tcg_gen_xori_i32(t, t, 31);
         }
-        z = tcg_const_i32(0);
+        z = tcg_constant_i32(0);
         tcg_gen_movcond_i32(TCG_COND_EQ, ret, arg1, z, arg2, t);
         tcg_temp_free_i32(t);
-        tcg_temp_free_i32(z);
     } else {
         gen_helper_ctz_i32(ret, arg1, arg2);
     }
@@ -487,9 +465,7 @@ void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
         tcg_gen_ctpop_i32(ret, t);
         tcg_temp_free_i32(t);
     } else {
-        TCGv_i32 t = tcg_const_i32(arg2);
-        tcg_gen_ctz_i32(ret, arg1, t);
-        tcg_temp_free_i32(t);
+        tcg_gen_ctz_i32(ret, arg1, tcg_constant_i32(arg2));
     }
 }
 
@@ -547,9 +523,7 @@ void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2)
     if (arg2 == 0) {
         tcg_gen_mov_i32(ret, arg1);
     } else if (TCG_TARGET_HAS_rot_i32) {
-        TCGv_i32 t0 = tcg_const_i32(arg2);
-        tcg_gen_rotl_i32(ret, arg1, t0);
-        tcg_temp_free_i32(t0);
+        tcg_gen_rotl_i32(ret, arg1, tcg_constant_i32(arg2));
     } else {
         TCGv_i32 t0, t1;
         t0 = tcg_temp_new_i32();
@@ -653,9 +627,8 @@ void tcg_gen_deposit_z_i32(TCGv_i32 ret, TCGv_i32 arg,
         tcg_gen_andi_i32(ret, arg, (1u << len) - 1);
     } else if (TCG_TARGET_HAS_deposit_i32
                && TCG_TARGET_deposit_i32_valid(ofs, len)) {
-        TCGv_i32 zero = tcg_const_i32(0);
+        TCGv_i32 zero = tcg_constant_i32(0);
         tcg_gen_op5ii_i32(INDEX_op_deposit_i32, ret, zero, arg, ofs, len);
-        tcg_temp_free_i32(zero);
     } else {
         /* To help two-operand hosts we prefer to zero-extend first,
            which allows ARG to stay live.  */
@@ -1052,7 +1025,7 @@ void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg)
     } else {
         TCGv_i32 t0 = tcg_temp_new_i32();
         TCGv_i32 t1 = tcg_temp_new_i32();
-        TCGv_i32 t2 = tcg_const_i32(0x00ff00ff);
+        TCGv_i32 t2 = tcg_constant_i32(0x00ff00ff);
 
                                         /* arg = abcd */
         tcg_gen_shri_i32(t0, arg, 8);   /*  t0 = .abc */
@@ -1067,7 +1040,6 @@ void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg)
 
         tcg_temp_free_i32(t0);
         tcg_temp_free_i32(t1);
-        tcg_temp_free_i32(t2);
     }
 }
 
@@ -1237,6 +1209,14 @@ void tcg_gen_mul_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
     tcg_temp_free_i64(t0);
     tcg_temp_free_i32(t1);
 }
+
+#else
+
+void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg)
+{
+    tcg_gen_mov_i64(ret, tcg_constant_i64(arg));
+}
+
 #endif /* TCG_TARGET_REG_SIZE == 32 */
 
 void tcg_gen_addi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
@@ -1244,10 +1224,12 @@ void tcg_gen_addi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
     /* some cases can be optimized here */
     if (arg2 == 0) {
         tcg_gen_mov_i64(ret, arg1);
+    } else if (TCG_TARGET_REG_BITS == 64) {
+        tcg_gen_add_i64(ret, arg1, tcg_constant_i64(arg2));
     } else {
-        TCGv_i64 t0 = tcg_const_i64(arg2);
-        tcg_gen_add_i64(ret, arg1, t0);
-        tcg_temp_free_i64(t0);
+        tcg_gen_add2_i32(TCGV_LOW(ret), TCGV_HIGH(ret),
+                         TCGV_LOW(arg1), TCGV_HIGH(arg1),
+                         tcg_constant_i32(arg2), tcg_constant_i32(arg2 >> 32));
     }
 }
 
@@ -1256,10 +1238,12 @@ void tcg_gen_subfi_i64(TCGv_i64 ret, int64_t arg1, TCGv_i64 arg2)
     if (arg1 == 0 && TCG_TARGET_HAS_neg_i64) {
         /* Don't recurse with tcg_gen_neg_i64.  */
         tcg_gen_op2_i64(INDEX_op_neg_i64, ret, arg2);
+    } else if (TCG_TARGET_REG_BITS == 64) {
+        tcg_gen_sub_i64(ret, tcg_constant_i64(arg1), arg2);
     } else {
-        TCGv_i64 t0 = tcg_const_i64(arg1);
-        tcg_gen_sub_i64(ret, t0, arg2);
-        tcg_temp_free_i64(t0);
+        tcg_gen_sub2_i32(TCGV_LOW(ret), TCGV_HIGH(ret),
+                         tcg_constant_i32(arg1), tcg_constant_i32(arg1 >> 32),
+                         TCGV_LOW(arg2), TCGV_HIGH(arg2));
     }
 }
 
@@ -1268,17 +1252,17 @@ void tcg_gen_subi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
     /* some cases can be optimized here */
     if (arg2 == 0) {
         tcg_gen_mov_i64(ret, arg1);
+    } else if (TCG_TARGET_REG_BITS == 64) {
+        tcg_gen_sub_i64(ret, arg1, tcg_constant_i64(arg2));
     } else {
-        TCGv_i64 t0 = tcg_const_i64(arg2);
-        tcg_gen_sub_i64(ret, arg1, t0);
-        tcg_temp_free_i64(t0);
+        tcg_gen_sub2_i32(TCGV_LOW(ret), TCGV_HIGH(ret),
+                         TCGV_LOW(arg1), TCGV_HIGH(arg1),
+                         tcg_constant_i32(arg2), tcg_constant_i32(arg2 >> 32));
     }
 }
 
 void tcg_gen_andi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
 {
-    TCGv_i64 t0;
-
     if (TCG_TARGET_REG_BITS == 32) {
         tcg_gen_andi_i32(TCGV_LOW(ret), TCGV_LOW(arg1), arg2);
         tcg_gen_andi_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1), arg2 >> 32);
@@ -1313,9 +1297,8 @@ void tcg_gen_andi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
         }
         break;
     }
-    t0 = tcg_const_i64(arg2);
-    tcg_gen_and_i64(ret, arg1, t0);
-    tcg_temp_free_i64(t0);
+
+    tcg_gen_and_i64(ret, arg1, tcg_constant_i64(arg2));
 }
 
 void tcg_gen_ori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
@@ -1331,9 +1314,7 @@ void tcg_gen_ori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
     } else if (arg2 == 0) {
         tcg_gen_mov_i64(ret, arg1);
     } else {
-        TCGv_i64 t0 = tcg_const_i64(arg2);
-        tcg_gen_or_i64(ret, arg1, t0);
-        tcg_temp_free_i64(t0);
+        tcg_gen_or_i64(ret, arg1, tcg_constant_i64(arg2));
     }
 }
 
@@ -1351,9 +1332,7 @@ void tcg_gen_xori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
         /* Don't recurse with tcg_gen_not_i64.  */
         tcg_gen_op2_i64(INDEX_op_not_i64, ret, arg1);
     } else {
-        TCGv_i64 t0 = tcg_const_i64(arg2);
-        tcg_gen_xor_i64(ret, arg1, t0);
-        tcg_temp_free_i64(t0);
+        tcg_gen_xor_i64(ret, arg1, tcg_constant_i64(arg2));
     }
 }
 
@@ -1415,9 +1394,7 @@ void tcg_gen_shli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
     } else if (arg2 == 0) {
         tcg_gen_mov_i64(ret, arg1);
     } else {
-        TCGv_i64 t0 = tcg_const_i64(arg2);
-        tcg_gen_shl_i64(ret, arg1, t0);
-        tcg_temp_free_i64(t0);
+        tcg_gen_shl_i64(ret, arg1, tcg_constant_i64(arg2));
     }
 }
 
@@ -1429,9 +1406,7 @@ void tcg_gen_shri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
     } else if (arg2 == 0) {
         tcg_gen_mov_i64(ret, arg1);
     } else {
-        TCGv_i64 t0 = tcg_const_i64(arg2);
-        tcg_gen_shr_i64(ret, arg1, t0);
-        tcg_temp_free_i64(t0);
+        tcg_gen_shr_i64(ret, arg1, tcg_constant_i64(arg2));
     }
 }
 
@@ -1443,9 +1418,7 @@ void tcg_gen_sari_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
     } else if (arg2 == 0) {
         tcg_gen_mov_i64(ret, arg1);
     } else {
-        TCGv_i64 t0 = tcg_const_i64(arg2);
-        tcg_gen_sar_i64(ret, arg1, t0);
-        tcg_temp_free_i64(t0);
+        tcg_gen_sar_i64(ret, arg1, tcg_constant_i64(arg2));
     }
 }
 
@@ -1468,12 +1441,17 @@ void tcg_gen_brcond_i64(TCGCond cond, TCGv_i64 arg1, TCGv_i64 arg2, TCGLabel *l)
 
 void tcg_gen_brcondi_i64(TCGCond cond, TCGv_i64 arg1, int64_t arg2, TCGLabel *l)
 {
-    if (cond == TCG_COND_ALWAYS) {
+    if (TCG_TARGET_REG_BITS == 64) {
+        tcg_gen_brcond_i64(cond, arg1, tcg_constant_i64(arg2), l);
+    } else if (cond == TCG_COND_ALWAYS) {
         tcg_gen_br(l);
     } else if (cond != TCG_COND_NEVER) {
-        TCGv_i64 t0 = tcg_const_i64(arg2);
-        tcg_gen_brcond_i64(cond, arg1, t0, l);
-        tcg_temp_free_i64(t0);
+        l->refs++;
+        tcg_gen_op6ii_i32(INDEX_op_brcond2_i32,
+                          TCGV_LOW(arg1), TCGV_HIGH(arg1),
+                          tcg_constant_i32(arg2),
+                          tcg_constant_i32(arg2 >> 32),
+                          cond, label_arg(l));
     }
 }
 
@@ -1499,9 +1477,19 @@ void tcg_gen_setcond_i64(TCGCond cond, TCGv_i64 ret,
 void tcg_gen_setcondi_i64(TCGCond cond, TCGv_i64 ret,
                           TCGv_i64 arg1, int64_t arg2)
 {
-    TCGv_i64 t0 = tcg_const_i64(arg2);
-    tcg_gen_setcond_i64(cond, ret, arg1, t0);
-    tcg_temp_free_i64(t0);
+    if (TCG_TARGET_REG_BITS == 64) {
+        tcg_gen_setcond_i64(cond, ret, arg1, tcg_constant_i64(arg2));
+    } else if (cond == TCG_COND_ALWAYS) {
+        tcg_gen_movi_i64(ret, 1);
+    } else if (cond == TCG_COND_NEVER) {
+        tcg_gen_movi_i64(ret, 0);
+    } else {
+        tcg_gen_op6i_i32(INDEX_op_setcond2_i32, TCGV_LOW(ret),
+                         TCGV_LOW(arg1), TCGV_HIGH(arg1),
+                         tcg_constant_i32(arg2),
+                         tcg_constant_i32(arg2 >> 32), cond);
+        tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+    }
 }
 
 void tcg_gen_muli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
@@ -1690,7 +1678,7 @@ void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg)
     } else {
         TCGv_i64 t0 = tcg_temp_new_i64();
         TCGv_i64 t1 = tcg_temp_new_i64();
-        TCGv_i64 t2 = tcg_const_i64(0x00ff00ff);
+        TCGv_i64 t2 = tcg_constant_i64(0x00ff00ff);
 
                                         /* arg = ....abcd */
         tcg_gen_shri_i64(t0, arg, 8);   /*  t0 = .....abc */
@@ -1706,7 +1694,6 @@ void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg)
 
         tcg_temp_free_i64(t0);
         tcg_temp_free_i64(t1);
-        tcg_temp_free_i64(t2);
     }
 }
 
@@ -1850,16 +1837,16 @@ void tcg_gen_clzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
     if (TCG_TARGET_REG_BITS == 32
         && TCG_TARGET_HAS_clz_i32
         && arg2 <= 0xffffffffu) {
-        TCGv_i32 t = tcg_const_i32((uint32_t)arg2 - 32);
-        tcg_gen_clz_i32(t, TCGV_LOW(arg1), t);
+        TCGv_i32 t = tcg_temp_new_i32();
+        tcg_gen_clzi_i32(t, TCGV_LOW(arg1), arg2 - 32);
         tcg_gen_addi_i32(t, t, 32);
         tcg_gen_clz_i32(TCGV_LOW(ret), TCGV_HIGH(arg1), t);
         tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
         tcg_temp_free_i32(t);
     } else {
-        TCGv_i64 t = tcg_const_i64(arg2);
-        tcg_gen_clz_i64(ret, arg1, t);
-        tcg_temp_free_i64(t);
+        TCGv_i64 t0 = tcg_const_i64(arg2);
+        tcg_gen_clz_i64(ret, arg1, t0);
+        tcg_temp_free_i64(t0);
     }
 }
 
@@ -1881,7 +1868,7 @@ void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
             tcg_gen_clzi_i64(t, t, 64);
             tcg_gen_xori_i64(t, t, 63);
         }
-        z = tcg_const_i64(0);
+        z = tcg_constant_i64(0);
         tcg_gen_movcond_i64(TCG_COND_EQ, ret, arg1, z, arg2, t);
         tcg_temp_free_i64(t);
         tcg_temp_free_i64(z);
@@ -1895,8 +1882,8 @@ void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
     if (TCG_TARGET_REG_BITS == 32
         && TCG_TARGET_HAS_ctz_i32
         && arg2 <= 0xffffffffu) {
-        TCGv_i32 t32 = tcg_const_i32((uint32_t)arg2 - 32);
-        tcg_gen_ctz_i32(t32, TCGV_HIGH(arg1), t32);
+        TCGv_i32 t32 = tcg_temp_new_i32();
+        tcg_gen_ctzi_i32(t32, TCGV_HIGH(arg1), arg2 - 32);
         tcg_gen_addi_i32(t32, t32, 32);
         tcg_gen_ctz_i32(TCGV_LOW(ret), TCGV_LOW(arg1), t32);
         tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
@@ -1911,9 +1898,9 @@ void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
         tcg_gen_ctpop_i64(ret, t);
         tcg_temp_free_i64(t);
     } else {
-        TCGv_i64 t64 = tcg_const_i64(arg2);
-        tcg_gen_ctz_i64(ret, arg1, t64);
-        tcg_temp_free_i64(t64);
+        TCGv_i64 t0 = tcg_const_i64(arg2);
+        tcg_gen_ctz_i64(ret, arg1, t0);
+        tcg_temp_free_i64(t0);
     }
 }
 
@@ -1969,9 +1956,7 @@ void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2)
     if (arg2 == 0) {
         tcg_gen_mov_i64(ret, arg1);
     } else if (TCG_TARGET_HAS_rot_i64) {
-        TCGv_i64 t0 = tcg_const_i64(arg2);
-        tcg_gen_rotl_i64(ret, arg1, t0);
-        tcg_temp_free_i64(t0);
+        tcg_gen_rotl_i64(ret, arg1, tcg_constant_i64(arg2));
     } else {
         TCGv_i64 t0, t1;
         t0 = tcg_temp_new_i64();
@@ -2089,9 +2074,8 @@ void tcg_gen_deposit_z_i64(TCGv_i64 ret, TCGv_i64 arg,
         tcg_gen_andi_i64(ret, arg, (1ull << len) - 1);
     } else if (TCG_TARGET_HAS_deposit_i64
                && TCG_TARGET_deposit_i64_valid(ofs, len)) {
-        TCGv_i64 zero = tcg_const_i64(0);
+        TCGv_i64 zero = tcg_constant_i64(0);
         tcg_gen_op5ii_i64(INDEX_op_deposit_i64, ret, zero, arg, ofs, len);
-        tcg_temp_free_i64(zero);
     } else {
         if (TCG_TARGET_REG_BITS == 32) {
             if (ofs >= 32) {
@@ -3102,9 +3086,8 @@ void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
 
 #ifdef CONFIG_SOFTMMU
         {
-            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
-            gen(retv, cpu_env, addr, cmpv, newv, oi);
-            tcg_temp_free_i32(oi);
+            TCGMemOpIdx oi = make_memop_idx(memop & ~MO_SIGN, idx);
+            gen(retv, cpu_env, addr, cmpv, newv, tcg_constant_i32(oi));
         }
 #else
         gen(retv, cpu_env, addr, cmpv, newv);
@@ -3147,9 +3130,8 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
 
 #ifdef CONFIG_SOFTMMU
         {
-            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop, idx));
-            gen(retv, cpu_env, addr, cmpv, newv, oi);
-            tcg_temp_free_i32(oi);
+            TCGMemOpIdx oi = make_memop_idx(memop, idx);
+            gen(retv, cpu_env, addr, cmpv, newv, tcg_constant_i32(oi));
         }
 #else
         gen(retv, cpu_env, addr, cmpv, newv);
@@ -3210,9 +3192,8 @@ static void do_atomic_op_i32(TCGv_i32 ret, TCGv addr, TCGv_i32 val,
 
 #ifdef CONFIG_SOFTMMU
     {
-        TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
-        gen(ret, cpu_env, addr, val, oi);
-        tcg_temp_free_i32(oi);
+        TCGMemOpIdx oi = make_memop_idx(memop & ~MO_SIGN, idx);
+        gen(ret, cpu_env, addr, val, tcg_constant_i32(oi));
     }
 #else
     gen(ret, cpu_env, addr, val);
@@ -3255,9 +3236,8 @@ static void do_atomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
 
 #ifdef CONFIG_SOFTMMU
         {
-            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
-            gen(ret, cpu_env, addr, val, oi);
-            tcg_temp_free_i32(oi);
+            TCGMemOpIdx oi = make_memop_idx(memop & ~MO_SIGN, idx);
+            gen(ret, cpu_env, addr, val, tcg_constant_i32(oi));
         }
 #else
         gen(ret, cpu_env, addr, val);
-- 
2.20.1
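The 32-bit-host path of `tcg_gen_clzi_i64` relies on TCG's `clz` semantics, where the second operand is the value returned when the input is zero. The low word is counted first with default `arg2 - 32`, and 32 is added to the result. That sum becomes the default for the high word. The outcome is `clz(high)` if `high != 0`, `32 + clz(low)` if only `low != 0`, and `arg2` when both words are zero. Below is a portable C model of that chaining. The names `clz32_def` and `clzi64_split` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Leading zeros of X, or DEF when X is zero (the TCG clz convention). */
static uint32_t clz32_def(uint32_t x, uint32_t def)
{
    uint32_t n = 0;

    if (x == 0) {
        return def;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Model of the split path; ARG2 is assumed to be <= 0xffffffff. */
static uint32_t clzi64_split(uint64_t arg1, uint64_t arg2)
{
    /* Unsigned wrap makes arg2 - 32 + 32 == arg2 even when arg2 < 32. */
    uint32_t t = clz32_def((uint32_t)arg1, (uint32_t)arg2 - 32);
    t += 32;
    return clz32_def((uint32_t)(arg1 >> 32), t);  /* high word of ret is 0 */
}
```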




* [PATCH v2 14/36] tcg: Use tcg_constant_{i32, vec} with tcg vec expanders
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (12 preceding siblings ...)
  2020-04-22  1:16 ` [PATCH v2 13/36] tcg: Use tcg_constant_{i32, i64} with tcg int expanders Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-22 17:00   ` [PATCH v2 14/36] tcg: Use tcg_constant_{i32,vec} " Alex Bennée
  2020-04-22  1:17 ` [PATCH v2 15/36] tcg: Use tcg_constant_{i32,i64} with tcg plugins Richard Henderson
                   ` (22 subsequent siblings)
  36 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op-vec.c | 63 ++++++++++++++++++++++++++----------------------
 1 file changed, 34 insertions(+), 29 deletions(-)
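On a 32-bit host, `tcg_gen_dupi_vec` first replicates the element across 64 bits with `dup_const`. It falls back to `dup2_vec` only when the result is neither a replicated 32-bit pattern nor a sign-extended 32-bit value. In that case the value cannot be carried in a 32-bit `tcg_target_long`. Below is a C model of that test. The names `dup_const64`, `needs_dup2` and the `ES_*` element sizes are hypothetical. The multiply-based replication is an assumption about how `dup_const` is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { ES_8, ES_16, ES_32, ES_64 };  /* stand-ins for MO_8 .. MO_64 */

/* Replicate the low element of C across all 64 bits. */
static uint64_t dup_const64(unsigned vece, uint64_t c)
{
    switch (vece) {
    case ES_8:  return 0x0101010101010101ull * (uint8_t)c;
    case ES_16: return 0x0001000100010001ull * (uint16_t)c;
    case ES_32: return 0x0000000100000001ull * (uint32_t)c;
    default:    return c;
    }
}

/* True if a 32-bit host must emit dup2_vec with separate halves. */
static bool needs_dup2(unsigned vece, uint64_t val)
{
    uint64_t v = dup_const64(vece, val);
    /* Same as v == deposit64(v, 32, 32, v). */
    bool halves_equal = (uint32_t)v == (uint32_t)(v >> 32);
    bool fits_s32 = v == (uint64_t)(int32_t)v;
    return !halves_equal && !fits_s32;
}
```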

diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index f3927089a7..655b3ae32d 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -233,25 +233,17 @@ void tcg_gen_mov_vec(TCGv_vec r, TCGv_vec a)
     }
 }
 
-#define MO_REG  (TCG_TARGET_REG_BITS == 64 ? MO_64 : MO_32)
-
-static void do_dupi_vec(TCGv_vec r, unsigned vece, TCGArg a)
-{
-    TCGTemp *rt = tcgv_vec_temp(r);
-    vec_gen_2(INDEX_op_dupi_vec, rt->base_type, vece, temp_arg(rt), a);
-}
-
 TCGv_vec tcg_const_zeros_vec(TCGType type)
 {
     TCGv_vec ret = tcg_temp_new_vec(type);
-    do_dupi_vec(ret, MO_REG, 0);
+    tcg_gen_mov_vec(ret, tcg_constant_vec(type, MO_8, 0));
     return ret;
 }
 
 TCGv_vec tcg_const_ones_vec(TCGType type)
 {
     TCGv_vec ret = tcg_temp_new_vec(type);
-    do_dupi_vec(ret, MO_REG, -1);
+    tcg_gen_mov_vec(ret, tcg_constant_vec(type, MO_8, -1));
     return ret;
 }
 
@@ -267,37 +259,50 @@ TCGv_vec tcg_const_ones_vec_matching(TCGv_vec m)
     return tcg_const_ones_vec(t->base_type);
 }
 
-void tcg_gen_dup64i_vec(TCGv_vec r, uint64_t a)
+void tcg_gen_dupi_vec(unsigned vece, TCGv_vec dest, uint64_t val)
 {
-    if (TCG_TARGET_REG_BITS == 32 && a == deposit64(a, 32, 32, a)) {
-        do_dupi_vec(r, MO_32, a);
-    } else if (TCG_TARGET_REG_BITS == 64 || a == (uint64_t)(int32_t)a) {
-        do_dupi_vec(r, MO_64, a);
-    } else {
-        TCGv_i64 c = tcg_const_i64(a);
-        tcg_gen_dup_i64_vec(MO_64, r, c);
-        tcg_temp_free_i64(c);
+    TCGType type = tcgv_vec_temp(dest)->base_type;
+
+    /*
+     * For MO_64 constants that can't be represented in tcg_target_long,
+     * we must use INDEX_op_dup2_vec.
+     */
+    if (TCG_TARGET_REG_BITS == 32) {
+        val = dup_const(vece, val);
+        if (val != deposit64(val, 32, 32, val) &&
+            val != (uint64_t)(int32_t)val) {
+            uint32_t vl = extract64(val, 0, 32);
+            uint32_t vh = extract64(val, 32, 32);
+            TCGArg al = tcgv_i32_arg(tcg_constant_i32(vl));
+            TCGArg ah = tcgv_i32_arg(tcg_constant_i32(vh));
+            TCGArg di = tcgv_vec_arg(dest);
+
+            vec_gen_3(INDEX_op_dup2_vec, type, MO_64, di, al, ah);
+            return;
+        }
     }
+
+    tcg_gen_mov_vec(dest, tcg_constant_vec(type, vece, val));
 }
 
-void tcg_gen_dup32i_vec(TCGv_vec r, uint32_t a)
+void tcg_gen_dup64i_vec(TCGv_vec dest, uint64_t val)
 {
-    do_dupi_vec(r, MO_REG, dup_const(MO_32, a));
+    tcg_gen_dupi_vec(MO_64, dest, val);
 }
 
-void tcg_gen_dup16i_vec(TCGv_vec r, uint32_t a)
+void tcg_gen_dup32i_vec(TCGv_vec dest, uint32_t val)
 {
-    do_dupi_vec(r, MO_REG, dup_const(MO_16, a));
+    tcg_gen_dupi_vec(MO_32, dest, val);
 }
 
-void tcg_gen_dup8i_vec(TCGv_vec r, uint32_t a)
+void tcg_gen_dup16i_vec(TCGv_vec dest, uint32_t val)
 {
-    do_dupi_vec(r, MO_REG, dup_const(MO_8, a));
+    tcg_gen_dupi_vec(MO_16, dest, val);
 }
 
-void tcg_gen_dupi_vec(unsigned vece, TCGv_vec r, uint64_t a)
+void tcg_gen_dup8i_vec(TCGv_vec dest, uint32_t val)
 {
-    do_dupi_vec(r, MO_REG, dup_const(vece, a));
+    tcg_gen_dupi_vec(MO_8, dest, val);
 }
 
 void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec r, TCGv_i64 a)
@@ -502,8 +507,8 @@ void tcg_gen_abs_vec(unsigned vece, TCGv_vec r, TCGv_vec a)
             if (tcg_can_emit_vec_op(INDEX_op_sari_vec, type, vece) > 0) {
                 tcg_gen_sari_vec(vece, t, a, (8 << vece) - 1);
             } else {
-                do_dupi_vec(t, MO_REG, 0);
-                tcg_gen_cmp_vec(TCG_COND_LT, vece, t, a, t);
+                tcg_gen_cmp_vec(TCG_COND_LT, vece, t, a,
+                                tcg_constant_vec(type, vece, 0));
             }
             tcg_gen_xor_vec(vece, r, a, t);
             tcg_gen_sub_vec(vece, r, r, t);
-- 
2.20.1

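A minimal C sketch of the decision tcg_gen_dupi_vec() makes on a 32-bit host. It assumes MO_8..MO_64 are the values 0..3 and re-implements dup_const() and the deposit64(val, 32, 32, val) test locally. The names sketch_dup_const and sketch_needs_dup2 are hypothetical and do not exist in QEMU. The point is that only a replicated value that is neither "same 32 bits in both halves" nor "sign-extended 32-bit" has to be split into two halves for INDEX_op_dup2_vec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Element sizes; the values 0..3 are assumed to mirror MO_8..MO_64. */
enum { SK_MO_8, SK_MO_16, SK_MO_32, SK_MO_64 };

/* Replicate the low element of c across 64 bits, as dup_const() does. */
static uint64_t sketch_dup_const(unsigned vece, uint64_t c)
{
    switch (vece) {
    case SK_MO_8:
        return 0x0101010101010101ull * (uint8_t)c;
    case SK_MO_16:
        return 0x0001000100010001ull * (uint16_t)c;
    case SK_MO_32:
        return 0x0000000100000001ull * (uint32_t)c;
    default:
        return c;
    }
}

/*
 * 32-bit host only: true when the replicated value is neither the same
 * 32-bit pattern in both halves nor a sign-extended 32-bit value, i.e.
 * when both halves must be passed separately to dup2_vec.
 */
static bool sketch_needs_dup2(unsigned vece, uint64_t val)
{
    uint64_t both_halves_low;

    val = sketch_dup_const(vece, val);
    /* Same result as deposit64(val, 32, 32, val). */
    both_halves_low = (val & 0xffffffffull) | (val << 32);
    return val != both_halves_low && val != (uint64_t)(int32_t)val;
}
```

Because MO_8, MO_16 and MO_32 replication always yields identical halves, in practice only some MO_64 constants take the dup2_vec path.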



* [PATCH v2 15/36] tcg: Use tcg_constant_{i32,i64} with tcg plugins
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (13 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 14/36] tcg: Use tcg_constant_{i32, vec} with tcg vec expanders Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-22 17:18   ` [PATCH v2 15/36] tcg: Use tcg_constant_{i32, i64} " Alex Bennée
  2020-04-22  1:17 ` [PATCH v2 16/36] tcg: Rename struct tcg_temp_info to TempOptInfo Richard Henderson
                   ` (21 subsequent siblings)
  36 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/plugin-gen.c | 49 +++++++++++++++++++-----------------------
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index 51580d51a0..e5dc9d0ca9 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -284,8 +284,8 @@ static TCGOp *copy_extu_i32_i64(TCGOp **begin_op, TCGOp *op)
     if (TCG_TARGET_REG_BITS == 32) {
         /* mov_i32 */
         op = copy_op(begin_op, op, INDEX_op_mov_i32);
-        /* movi_i32 */
-        op = copy_op(begin_op, op, INDEX_op_movi_i32);
+        /* mov_i32 w/ $0 */
+        op = copy_op(begin_op, op, INDEX_op_mov_i32);
     } else {
         /* extu_i32_i64 */
         op = copy_op(begin_op, op, INDEX_op_extu_i32_i64);
@@ -306,39 +306,34 @@ static TCGOp *copy_mov_i64(TCGOp **begin_op, TCGOp *op)
     return op;
 }
 
-static TCGOp *copy_movi_i64(TCGOp **begin_op, TCGOp *op, uint64_t v)
-{
-    if (TCG_TARGET_REG_BITS == 32) {
-        /* 2x movi_i32 */
-        op = copy_op(begin_op, op, INDEX_op_movi_i32);
-        op->args[1] = v;
-
-        op = copy_op(begin_op, op, INDEX_op_movi_i32);
-        op->args[1] = v >> 32;
-    } else {
-        /* movi_i64 */
-        op = copy_op(begin_op, op, INDEX_op_movi_i64);
-        op->args[1] = v;
-    }
-    return op;
-}
-
 static TCGOp *copy_const_ptr(TCGOp **begin_op, TCGOp *op, void *ptr)
 {
     if (UINTPTR_MAX == UINT32_MAX) {
-        /* movi_i32 */
-        op = copy_op(begin_op, op, INDEX_op_movi_i32);
-        op->args[1] = (uintptr_t)ptr;
+        /* mov_i32 */
+        op = copy_op(begin_op, op, INDEX_op_mov_i32);
+        op->args[1] = tcgv_i32_arg(tcg_constant_i32((uintptr_t)ptr));
     } else {
-        /* movi_i64 */
-        op = copy_movi_i64(begin_op, op, (uint64_t)(uintptr_t)ptr);
+        /* mov_i64 */
+        op = copy_op(begin_op, op, INDEX_op_mov_i64);
+        op->args[1] = tcgv_i64_arg(tcg_constant_i64((uintptr_t)ptr));
     }
     return op;
 }
 
 static TCGOp *copy_const_i64(TCGOp **begin_op, TCGOp *op, uint64_t v)
 {
-    return copy_movi_i64(begin_op, op, v);
+    if (TCG_TARGET_REG_BITS == 32) {
+        /* 2x mov_i32 */
+        op = copy_op(begin_op, op, INDEX_op_mov_i32);
+        op->args[1] = tcgv_i32_arg(tcg_constant_i32(v));
+        op = copy_op(begin_op, op, INDEX_op_mov_i32);
+        op->args[1] = tcgv_i32_arg(tcg_constant_i32(v >> 32));
+    } else {
+        /* mov_i64 */
+        op = copy_op(begin_op, op, INDEX_op_mov_i64);
+        op->args[1] = tcgv_i64_arg(tcg_constant_i64(v));
+    }
+    return op;
 }
 
 static TCGOp *copy_extu_tl_i64(TCGOp **begin_op, TCGOp *op)
@@ -486,8 +481,8 @@ static TCGOp *append_mem_cb(const struct qemu_plugin_dyn_cb *cb,
 
     tcg_debug_assert(type == PLUGIN_GEN_CB_MEM);
 
-    /* const_i32 == movi_i32 ("info", so it remains as is) */
-    op = copy_op(&begin_op, op, INDEX_op_movi_i32);
+    /* const_i32 == mov_i32 ("info", so it remains as is) */
+    op = copy_op(&begin_op, op, INDEX_op_mov_i32);
 
     /* const_ptr */
     op = copy_const_ptr(&begin_op, op, cb->userp);
-- 
2.20.1

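On a 32-bit host, copy_const_i64() above emits two mov_i32 ops from constant temps: first the low half of v, then the high half. A tiny sketch of that split follows; sketch_split_const_i64 is a hypothetical helper, not part of plugin-gen.c.

```c
#include <stdint.h>

/* Split v in the order copy_const_i64 emits it on a 32-bit host. */
static void sketch_split_const_i64(uint64_t v, uint32_t out[2])
{
    out[0] = (uint32_t)v;          /* first mov_i32: low 32 bits */
    out[1] = (uint32_t)(v >> 32);  /* second mov_i32: high 32 bits */
}
```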



* [PATCH v2 16/36] tcg: Rename struct tcg_temp_info to TempOptInfo
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (14 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 15/36] tcg: Use tcg_constant_{i32,i64} with tcg plugins Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-22 17:19   ` Alex Bennée
  2020-04-22  1:17 ` [PATCH v2 17/36] tcg/optimize: Adjust TempOptInfo allocation Richard Henderson
                   ` (20 subsequent siblings)
  36 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, Philippe Mathieu-Daudé

Rename the struct so that its name follows our coding style.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index effb47eefd..b86bf3d707 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -35,20 +35,20 @@
         glue(glue(case INDEX_op_, x), _i64):    \
         glue(glue(case INDEX_op_, x), _vec)
 
-struct tcg_temp_info {
+typedef struct TempOptInfo {
     bool is_const;
     TCGTemp *prev_copy;
     TCGTemp *next_copy;
     tcg_target_ulong val;
     tcg_target_ulong mask;
-};
+} TempOptInfo;
 
-static inline struct tcg_temp_info *ts_info(TCGTemp *ts)
+static inline TempOptInfo *ts_info(TCGTemp *ts)
 {
     return ts->state_ptr;
 }
 
-static inline struct tcg_temp_info *arg_info(TCGArg arg)
+static inline TempOptInfo *arg_info(TCGArg arg)
 {
     return ts_info(arg_temp(arg));
 }
@@ -71,9 +71,9 @@ static inline bool ts_is_copy(TCGTemp *ts)
 /* Reset TEMP's state, possibly removing the temp for the list of copies.  */
 static void reset_ts(TCGTemp *ts)
 {
-    struct tcg_temp_info *ti = ts_info(ts);
-    struct tcg_temp_info *pi = ts_info(ti->prev_copy);
-    struct tcg_temp_info *ni = ts_info(ti->next_copy);
+    TempOptInfo *ti = ts_info(ts);
+    TempOptInfo *pi = ts_info(ti->prev_copy);
+    TempOptInfo *ni = ts_info(ti->next_copy);
 
     ni->prev_copy = ti->prev_copy;
     pi->next_copy = ti->next_copy;
@@ -89,12 +89,12 @@ static void reset_temp(TCGArg arg)
 }
 
 /* Initialize and activate a temporary.  */
-static void init_ts_info(struct tcg_temp_info *infos,
+static void init_ts_info(TempOptInfo *infos,
                          TCGTempSet *temps_used, TCGTemp *ts)
 {
     size_t idx = temp_idx(ts);
     if (!test_bit(idx, temps_used->l)) {
-        struct tcg_temp_info *ti = &infos[idx];
+        TempOptInfo *ti = &infos[idx];
 
         ts->state_ptr = ti;
         ti->next_copy = ts;
@@ -114,7 +114,7 @@ static void init_ts_info(struct tcg_temp_info *infos,
     }
 }
 
-static void init_arg_info(struct tcg_temp_info *infos,
+static void init_arg_info(TempOptInfo *infos,
                           TCGTempSet *temps_used, TCGArg arg)
 {
     init_ts_info(infos, temps_used, arg_temp(arg));
@@ -177,7 +177,7 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg val)
     const TCGOpDef *def;
     TCGOpcode new_op;
     tcg_target_ulong mask;
-    struct tcg_temp_info *di = arg_info(dst);
+    TempOptInfo *di = arg_info(dst);
 
     def = &tcg_op_defs[op->opc];
     if (def->flags & TCG_OPF_VECTOR) {
@@ -208,8 +208,8 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
     TCGTemp *dst_ts = arg_temp(dst);
     TCGTemp *src_ts = arg_temp(src);
     const TCGOpDef *def;
-    struct tcg_temp_info *di;
-    struct tcg_temp_info *si;
+    TempOptInfo *di;
+    TempOptInfo *si;
     tcg_target_ulong mask;
     TCGOpcode new_op;
 
@@ -242,7 +242,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
     di->mask = mask;
 
     if (src_ts->type == dst_ts->type) {
-        struct tcg_temp_info *ni = ts_info(si->next_copy);
+        TempOptInfo *ni = ts_info(si->next_copy);
 
         di->next_copy = si->next_copy;
         di->prev_copy = src_ts;
@@ -605,7 +605,7 @@ void tcg_optimize(TCGContext *s)
 {
     int nb_temps, nb_globals;
     TCGOp *op, *op_next, *prev_mb = NULL;
-    struct tcg_temp_info *infos;
+    TempOptInfo *infos;
     TCGTempSet temps_used;
 
     /* Array VALS has an element for each temp.
@@ -616,7 +616,7 @@ void tcg_optimize(TCGContext *s)
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
     bitmap_zero(temps_used.l, nb_temps);
-    infos = tcg_malloc(sizeof(struct tcg_temp_info) * nb_temps);
+    infos = tcg_malloc(sizeof(TempOptInfo) * nb_temps);
 
     QTAILQ_FOREACH_SAFE(op, &s->ops, link, op_next) {
         tcg_target_ulong mask, partmask, affected;
-- 
2.20.1

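The prev_copy/next_copy fields of TempOptInfo form a circular doubly-linked ring of temps known to hold the same value. reset_ts() unlinks a temp from its ring, and tcg_opt_gen_mov() links the destination in right after the source. The sketch below simplifies this: SketchTemp is a hypothetical stand-in for TCGTemp with the info embedded rather than reached through state_ptr. The typedef follows the same CamelCase convention the patch adopts.

```c
#include <stdbool.h>

typedef struct SketchTemp SketchTemp;

/* Same naming convention as TempOptInfo: typedef'd CamelCase struct. */
typedef struct SketchCopyInfo {
    SketchTemp *prev_copy;
    SketchTemp *next_copy;
} SketchCopyInfo;

/* Hypothetical stand-in for TCGTemp; the info is embedded for brevity. */
struct SketchTemp {
    SketchCopyInfo info;
};

/* A temp starts out as a one-element ring: a copy only of itself. */
static void sketch_init(SketchTemp *ts)
{
    ts->info.prev_copy = ts;
    ts->info.next_copy = ts;
}

static bool sketch_is_copy(SketchTemp *ts)
{
    return ts->info.next_copy != ts;
}

/* Unlink ts from its ring, as reset_ts() does. */
static void sketch_reset(SketchTemp *ts)
{
    SketchTemp *p = ts->info.prev_copy;
    SketchTemp *n = ts->info.next_copy;

    n->info.prev_copy = p;
    p->info.next_copy = n;
    sketch_init(ts);
}

/* Record that dst now holds a copy of src; dst is reset first. */
static void sketch_link(SketchTemp *dst, SketchTemp *src)
{
    SketchTemp *n;

    sketch_reset(dst);
    n = src->info.next_copy;
    dst->info.next_copy = n;
    dst->info.prev_copy = src;
    n->info.prev_copy = dst;
    src->info.next_copy = dst;
}

/* Walk a's ring looking for b. */
static bool sketch_are_copies(SketchTemp *a, SketchTemp *b)
{
    SketchTemp *i;

    if (a == b) {
        return true;
    }
    for (i = a->info.next_copy; i != a; i = i->info.next_copy) {
        if (i == b) {
            return true;
        }
    }
    return false;
}
```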



* [PATCH v2 17/36] tcg/optimize: Adjust TempOptInfo allocation
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (15 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 16/36] tcg: Rename struct tcg_temp_info to TempOptInfo Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-22 17:53   ` Alex Bennée
  2020-04-22  1:17 ` [PATCH v2 18/36] tcg/optimize: Use tcg_constant_internal with constant folding Richard Henderson
                   ` (19 subsequent siblings)
  36 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Do not allocate one large array indexed by temp number.  Instead,
allocate a TempOptInfo for each temporary as it is first seen.

In general, this uses less memory, since most TBs do not touch
every target register.  It also allows us to allocate TempOptInfo
for new temps created during optimization.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 60 ++++++++++++++++++++++++++++----------------------
 1 file changed, 34 insertions(+), 26 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index b86bf3d707..d36d7e1d7f 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -89,35 +89,41 @@ static void reset_temp(TCGArg arg)
 }
 
 /* Initialize and activate a temporary.  */
-static void init_ts_info(TempOptInfo *infos,
-                         TCGTempSet *temps_used, TCGTemp *ts)
+static void init_ts_info(TCGTempSet *temps_used, TCGTemp *ts)
 {
     size_t idx = temp_idx(ts);
-    if (!test_bit(idx, temps_used->l)) {
-        TempOptInfo *ti = &infos[idx];
+    TempOptInfo *ti;
 
+    if (test_bit(idx, temps_used->l)) {
+        return;
+    }
+    set_bit(idx, temps_used->l);
+
+    ti = ts->state_ptr;
+    if (ti == NULL) {
+        ti = tcg_malloc(sizeof(TempOptInfo));
         ts->state_ptr = ti;
-        ti->next_copy = ts;
-        ti->prev_copy = ts;
-        if (ts->kind == TEMP_CONST) {
-            ti->is_const = true;
-            ti->val = ti->mask = ts->val;
-            if (TCG_TARGET_REG_BITS > 32 && ts->type == TCG_TYPE_I32) {
-                /* High bits of a 32-bit quantity are garbage.  */
-                ti->mask |= ~0xffffffffull;
-            }
-        } else {
-            ti->is_const = false;
-            ti->mask = -1;
+    }
+
+    ti->next_copy = ts;
+    ti->prev_copy = ts;
+    if (ts->kind == TEMP_CONST) {
+        ti->is_const = true;
+        ti->val = ts->val;
+        ti->mask = ts->val;
+        if (TCG_TARGET_REG_BITS > 32 && ts->type == TCG_TYPE_I32) {
+            /* High bits of a 32-bit quantity are garbage.  */
+            ti->mask |= ~0xffffffffull;
         }
-        set_bit(idx, temps_used->l);
+    } else {
+        ti->is_const = false;
+        ti->mask = -1;
     }
 }
 
-static void init_arg_info(TempOptInfo *infos,
-                          TCGTempSet *temps_used, TCGArg arg)
+static void init_arg_info(TCGTempSet *temps_used, TCGArg arg)
 {
-    init_ts_info(infos, temps_used, arg_temp(arg));
+    init_ts_info(temps_used, arg_temp(arg));
 }
 
 static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
@@ -603,9 +609,8 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 /* Propagate constants and copies, fold constant expressions. */
 void tcg_optimize(TCGContext *s)
 {
-    int nb_temps, nb_globals;
+    int nb_temps, nb_globals, i;
     TCGOp *op, *op_next, *prev_mb = NULL;
-    TempOptInfo *infos;
     TCGTempSet temps_used;
 
     /* Array VALS has an element for each temp.
@@ -615,12 +620,15 @@ void tcg_optimize(TCGContext *s)
 
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
+
     bitmap_zero(temps_used.l, nb_temps);
-    infos = tcg_malloc(sizeof(TempOptInfo) * nb_temps);
+    for (i = 0; i < nb_temps; ++i) {
+        s->temps[i].state_ptr = NULL;
+    }
 
     QTAILQ_FOREACH_SAFE(op, &s->ops, link, op_next) {
         tcg_target_ulong mask, partmask, affected;
-        int nb_oargs, nb_iargs, i;
+        int nb_oargs, nb_iargs;
         TCGArg tmp;
         TCGOpcode opc = op->opc;
         const TCGOpDef *def = &tcg_op_defs[opc];
@@ -633,14 +641,14 @@ void tcg_optimize(TCGContext *s)
             for (i = 0; i < nb_oargs + nb_iargs; i++) {
                 TCGTemp *ts = arg_temp(op->args[i]);
                 if (ts) {
-                    init_ts_info(infos, &temps_used, ts);
+                    init_ts_info(&temps_used, ts);
                 }
             }
         } else {
             nb_oargs = def->nb_oargs;
             nb_iargs = def->nb_iargs;
             for (i = 0; i < nb_oargs + nb_iargs; i++) {
-                init_arg_info(infos, &temps_used, op->args[i]);
+                init_arg_info(&temps_used, op->args[i]);
             }
         }
 
-- 
2.20.1

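A simplified sketch of the lazy scheme in the patched init_ts_info(), assuming a plain bool array in place of the temps_used bitmap and calloc() in place of tcg_malloc(). SketchTemp and SketchInfo are hypothetical stand-ins for TCGTemp and TempOptInfo. Info is allocated the first time a temp is seen. When the "used" set is cleared, for example at the end of a basic block, the existing allocation is reinitialized rather than replaced.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct SketchInfo {
    bool is_const;
    uint64_t val;
    uint64_t mask;
} SketchInfo;

/* Hypothetical stand-in for TCGTemp. */
typedef struct SketchTemp {
    bool kind_is_const;      /* plays the role of ts->kind == TEMP_CONST */
    uint64_t val;
    SketchInfo *state_ptr;   /* NULL until the temp is first seen */
} SketchTemp;

static unsigned sketch_allocs;   /* allocation counter, for illustration */

/* used[] plays the role of the temps_used bitmap. */
static void sketch_init_ts_info(bool *used, SketchTemp *temps, size_t idx)
{
    SketchTemp *ts = &temps[idx];
    SketchInfo *ti;

    if (used[idx]) {
        return;                  /* already initialized since the last reset */
    }
    used[idx] = true;

    ti = ts->state_ptr;
    if (ti == NULL) {
        /* QEMU uses its tcg_malloc() pool; this sketch never frees. */
        ti = calloc(1, sizeof(*ti));
        if (ti == NULL) {
            abort();
        }
        ts->state_ptr = ti;
        sketch_allocs++;
    }

    ti->is_const = ts->kind_is_const;
    ti->val = ts->val;
    ti->mask = ts->kind_is_const ? ts->val : UINT64_MAX;
}
```

Temps that never appear in an op never get an allocation, which is where the memory saving comes from.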



* [PATCH v2 18/36] tcg/optimize: Use tcg_constant_internal with constant folding
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (16 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 17/36] tcg/optimize: Adjust TempOptInfo allocation Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-22 18:28   ` Alex Bennée
  2020-04-22  1:17 ` [PATCH v2 19/36] tcg/tci: Add special tci_movi_{i32,i64} opcodes Richard Henderson
                   ` (18 subsequent siblings)
  36 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 106 ++++++++++++++++++++++---------------------------
 1 file changed, 48 insertions(+), 58 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index d36d7e1d7f..dd5187be31 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -178,37 +178,6 @@ static bool args_are_copies(TCGArg arg1, TCGArg arg2)
     return ts_are_copies(arg_temp(arg1), arg_temp(arg2));
 }
 
-static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg val)
-{
-    const TCGOpDef *def;
-    TCGOpcode new_op;
-    tcg_target_ulong mask;
-    TempOptInfo *di = arg_info(dst);
-
-    def = &tcg_op_defs[op->opc];
-    if (def->flags & TCG_OPF_VECTOR) {
-        new_op = INDEX_op_dupi_vec;
-    } else if (def->flags & TCG_OPF_64BIT) {
-        new_op = INDEX_op_movi_i64;
-    } else {
-        new_op = INDEX_op_movi_i32;
-    }
-    op->opc = new_op;
-    /* TCGOP_VECL and TCGOP_VECE remain unchanged.  */
-    op->args[0] = dst;
-    op->args[1] = val;
-
-    reset_temp(dst);
-    di->is_const = true;
-    di->val = val;
-    mask = val;
-    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_movi_i32) {
-        /* High bits of the destination are now garbage.  */
-        mask |= ~0xffffffffull;
-    }
-    di->mask = mask;
-}
-
 static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
 {
     TCGTemp *dst_ts = arg_temp(dst);
@@ -259,6 +228,27 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
     }
 }
 
+static void tcg_opt_gen_movi(TCGContext *s, TCGTempSet *temps_used,
+                             TCGOp *op, TCGArg dst, TCGArg val)
+{
+    const TCGOpDef *def = &tcg_op_defs[op->opc];
+    TCGType type;
+    TCGTemp *tv;
+
+    if (def->flags & TCG_OPF_VECTOR) {
+        type = TCGOP_VECL(op) + TCG_TYPE_V64;
+    } else if (def->flags & TCG_OPF_64BIT) {
+        type = TCG_TYPE_I64;
+    } else {
+        type = TCG_TYPE_I32;
+    }
+
+    /* Convert movi to mov with constant temp. */
+    tv = tcg_constant_internal(type, val);
+    init_ts_info(temps_used, tv);
+    tcg_opt_gen_mov(s, op, dst, temp_arg(tv));
+}
+
 static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
 {
     uint64_t l64, h64;
@@ -621,7 +611,7 @@ void tcg_optimize(TCGContext *s)
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
 
-    bitmap_zero(temps_used.l, nb_temps);
+    memset(&temps_used, 0, sizeof(temps_used));
     for (i = 0; i < nb_temps; ++i) {
         s->temps[i].state_ptr = NULL;
     }
@@ -727,7 +717,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(rotr):
             if (arg_is_const(op->args[1])
                 && arg_info(op->args[1])->val == 0) {
-                tcg_opt_gen_movi(s, op, op->args[0], 0);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
                 continue;
             }
             break;
@@ -1050,7 +1040,7 @@ void tcg_optimize(TCGContext *s)
 
         if (partmask == 0) {
             tcg_debug_assert(nb_oargs == 1);
-            tcg_opt_gen_movi(s, op, op->args[0], 0);
+            tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
             continue;
         }
         if (affected == 0) {
@@ -1067,7 +1057,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(mulsh):
             if (arg_is_const(op->args[2])
                 && arg_info(op->args[2])->val == 0) {
-                tcg_opt_gen_movi(s, op, op->args[0], 0);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
                 continue;
             }
             break;
@@ -1094,7 +1084,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64_VEC(sub):
         CASE_OP_32_64_VEC(xor):
             if (args_are_copies(op->args[1], op->args[2])) {
-                tcg_opt_gen_movi(s, op, op->args[0], 0);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
                 continue;
             }
             break;
@@ -1111,14 +1101,14 @@ void tcg_optimize(TCGContext *s)
             break;
         CASE_OP_32_64(movi):
         case INDEX_op_dupi_vec:
-            tcg_opt_gen_movi(s, op, op->args[0], op->args[1]);
+            tcg_opt_gen_movi(s, &temps_used, op, op->args[0], op->args[1]);
             break;
 
         case INDEX_op_dup_vec:
             if (arg_is_const(op->args[1])) {
                 tmp = arg_info(op->args[1])->val;
                 tmp = dup_const(TCGOP_VECE(op), tmp);
-                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1141,7 +1131,7 @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_extrh_i64_i32:
             if (arg_is_const(op->args[1])) {
                 tmp = do_constant_folding(opc, arg_info(op->args[1])->val, 0);
-                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1171,7 +1161,7 @@ void tcg_optimize(TCGContext *s)
             if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
                 tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
                                           arg_info(op->args[2])->val);
-                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1182,7 +1172,7 @@ void tcg_optimize(TCGContext *s)
                 TCGArg v = arg_info(op->args[1])->val;
                 if (v != 0) {
                     tmp = do_constant_folding(opc, v, 0);
-                    tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                    tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                 } else {
                     tcg_opt_gen_mov(s, op, op->args[0], op->args[2]);
                 }
@@ -1195,7 +1185,7 @@ void tcg_optimize(TCGContext *s)
                 tmp = deposit64(arg_info(op->args[1])->val,
                                 op->args[3], op->args[4],
                                 arg_info(op->args[2])->val);
-                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1204,7 +1194,7 @@ void tcg_optimize(TCGContext *s)
             if (arg_is_const(op->args[1])) {
                 tmp = extract64(arg_info(op->args[1])->val,
                                 op->args[2], op->args[3]);
-                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1213,7 +1203,7 @@ void tcg_optimize(TCGContext *s)
             if (arg_is_const(op->args[1])) {
                 tmp = sextract64(arg_info(op->args[1])->val,
                                  op->args[2], op->args[3]);
-                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1229,7 +1219,7 @@ void tcg_optimize(TCGContext *s)
                     tmp = (int32_t)(((uint32_t)v1 >> op->args[3]) |
                                     ((uint32_t)v2 << (32 - op->args[3])));
                 }
-                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1238,7 +1228,7 @@ void tcg_optimize(TCGContext *s)
             tmp = do_constant_folding_cond(opc, op->args[1],
                                            op->args[2], op->args[3]);
             if (tmp != 2) {
-                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -1248,7 +1238,7 @@ void tcg_optimize(TCGContext *s)
                                            op->args[1], op->args[2]);
             if (tmp != 2) {
                 if (tmp) {
-                    bitmap_zero(temps_used.l, nb_temps);
+                    memset(&temps_used, 0, sizeof(temps_used));
                     op->opc = INDEX_op_br;
                     op->args[0] = op->args[3];
                 } else {
@@ -1293,7 +1283,7 @@ void tcg_optimize(TCGContext *s)
                 uint64_t a = ((uint64_t)ah << 32) | al;
                 uint64_t b = ((uint64_t)bh << 32) | bl;
                 TCGArg rl, rh;
-                TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32);
+                TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_mov_i32);
 
                 if (opc == INDEX_op_add2_i32) {
                     a += b;
@@ -1303,8 +1293,8 @@ void tcg_optimize(TCGContext *s)
 
                 rl = op->args[0];
                 rh = op->args[1];
-                tcg_opt_gen_movi(s, op, rl, (int32_t)a);
-                tcg_opt_gen_movi(s, op2, rh, (int32_t)(a >> 32));
+                tcg_opt_gen_movi(s, &temps_used, op, rl, (int32_t)a);
+                tcg_opt_gen_movi(s, &temps_used, op2, rh, (int32_t)(a >> 32));
                 break;
             }
             goto do_default;
@@ -1315,12 +1305,12 @@ void tcg_optimize(TCGContext *s)
                 uint32_t b = arg_info(op->args[3])->val;
                 uint64_t r = (uint64_t)a * b;
                 TCGArg rl, rh;
-                TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32);
+                TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_mov_i32);
 
                 rl = op->args[0];
                 rh = op->args[1];
-                tcg_opt_gen_movi(s, op, rl, (int32_t)r);
-                tcg_opt_gen_movi(s, op2, rh, (int32_t)(r >> 32));
+                tcg_opt_gen_movi(s, &temps_used, op, rl, (int32_t)r);
+                tcg_opt_gen_movi(s, &temps_used, op2, rh, (int32_t)(r >> 32));
                 break;
             }
             goto do_default;
@@ -1331,7 +1321,7 @@ void tcg_optimize(TCGContext *s)
             if (tmp != 2) {
                 if (tmp) {
             do_brcond_true:
-                    bitmap_zero(temps_used.l, nb_temps);
+                    memset(&temps_used, 0, sizeof(temps_used));
                     op->opc = INDEX_op_br;
                     op->args[0] = op->args[5];
                 } else {
@@ -1347,7 +1337,7 @@ void tcg_optimize(TCGContext *s)
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_brcond_high:
-                bitmap_zero(temps_used.l, nb_temps);
+                memset(&temps_used, 0, sizeof(temps_used));
                 op->opc = INDEX_op_brcond_i32;
                 op->args[0] = op->args[1];
                 op->args[1] = op->args[3];
@@ -1373,7 +1363,7 @@ void tcg_optimize(TCGContext *s)
                     goto do_default;
                 }
             do_brcond_low:
-                bitmap_zero(temps_used.l, nb_temps);
+                memset(&temps_used, 0, sizeof(temps_used));
                 op->opc = INDEX_op_brcond_i32;
                 op->args[1] = op->args[2];
                 op->args[2] = op->args[4];
@@ -1408,7 +1398,7 @@ void tcg_optimize(TCGContext *s)
                                             op->args[5]);
             if (tmp != 2) {
             do_setcond_const:
-                tcg_opt_gen_movi(s, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
             } else if ((op->args[5] == TCG_COND_LT
                         || op->args[5] == TCG_COND_GE)
                        && arg_is_const(op->args[3])
@@ -1493,7 +1483,7 @@ void tcg_optimize(TCGContext *s)
                block, otherwise we only trash the output args.  "mask" is
                the non-zero bits mask for the first output arg.  */
             if (def->flags & TCG_OPF_BB_END) {
-                bitmap_zero(temps_used.l, nb_temps);
+                memset(&temps_used, 0, sizeof(temps_used));
             } else {
         do_reset_output:
                 for (i = 0; i < nb_oargs; i++) {
-- 
2.20.1

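Two pieces of the reworked folding are sketched below under stated assumptions. The first is how the new tcg_opt_gen_movi() picks the type of the constant temp from the op's flags; the enum order is assumed to mirror TCGType (I32, I64, then V64/V128/V256), vecl is assumed to be the offset from the 64-bit vector type as with TCGOP_VECL(op), and the flag values are hypothetical. The second is how add2_i32 with constant inputs is folded into two sign-extended 32-bit results. All sketch_* names are illustrative only.

```c
#include <stdint.h>

typedef enum {
    SK_TYPE_I32,
    SK_TYPE_I64,
    SK_TYPE_V64,
    SK_TYPE_V128,
    SK_TYPE_V256,
} SketchType;

/* Hypothetical flag bits standing in for TCG_OPF_64BIT / TCG_OPF_VECTOR. */
#define SK_OPF_64BIT  0x1u
#define SK_OPF_VECTOR 0x2u

/* Choose the type of the constant temp that replaces a movi. */
static SketchType sketch_movi_type(unsigned flags, unsigned vecl)
{
    if (flags & SK_OPF_VECTOR) {
        return (SketchType)(SK_TYPE_V64 + vecl);
    } else if (flags & SK_OPF_64BIT) {
        return SK_TYPE_I64;
    }
    return SK_TYPE_I32;
}

/* Fold add2_i32 with constant inputs into two sign-extended halves. */
static void sketch_fold_add2_i32(uint32_t al, uint32_t ah,
                                 uint32_t bl, uint32_t bh,
                                 int64_t *rl, int64_t *rh)
{
    uint64_t a = ((uint64_t)ah << 32) | al;
    uint64_t b = ((uint64_t)bh << 32) | bl;

    a += b;
    *rl = (int32_t)a;           /* low result, as (int32_t)a */
    *rh = (int32_t)(a >> 32);   /* high result, as (int32_t)(a >> 32) */
}
```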



* [PATCH v2 19/36] tcg/tci: Add special tci_movi_{i32,i64} opcodes
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (17 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 18/36] tcg/optimize: Use tcg_constant_internal with constant folding Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-22 19:02   ` Alex Bennée
  2020-04-22  1:17 ` [PATCH v2 20/36] tcg: Remove movi and dupi opcodes Richard Henderson
                   ` (17 subsequent siblings)
  36 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

The generic movi opcodes are going away, so TCI needs its own
opcodes to use internally.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-opc.h    | 8 ++++++++
 tcg/tci.c                | 4 ++--
 tcg/tci/tcg-target.inc.c | 4 ++--
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index 9288a04946..7dee9b38f7 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -268,6 +268,14 @@ DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT)
 #include "tcg-target.opc.h"
 #endif
 
+#ifdef TCG_TARGET_INTERPRETER
+/* These opcodes are only for use between the tci generator and interpreter. */
+DEF(tci_movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
+#if TCG_TARGET_REG_BITS == 64
+DEF(tci_movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
+#endif
+#endif
+
 #undef TLADDR_ARGS
 #undef DATA64_ARGS
 #undef IMPL
diff --git a/tcg/tci.c b/tcg/tci.c
index 46fe9ce63f..a6c1aaf5af 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -576,7 +576,7 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
             t1 = tci_read_r32(regs, &tb_ptr);
             tci_write_reg32(regs, t0, t1);
             break;
-        case INDEX_op_movi_i32:
+        case INDEX_op_tci_movi_i32:
             t0 = *tb_ptr++;
             t1 = tci_read_i32(&tb_ptr);
             tci_write_reg32(regs, t0, t1);
@@ -847,7 +847,7 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
             t1 = tci_read_r64(regs, &tb_ptr);
             tci_write_reg64(regs, t0, t1);
             break;
-        case INDEX_op_movi_i64:
+        case INDEX_op_tci_movi_i64:
             t0 = *tb_ptr++;
             t1 = tci_read_i64(&tb_ptr);
             tci_write_reg64(regs, t0, t1);
diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
index 992d50cb1e..1f1639df0d 100644
--- a/tcg/tci/tcg-target.inc.c
+++ b/tcg/tci/tcg-target.inc.c
@@ -530,13 +530,13 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     uint8_t *old_code_ptr = s->code_ptr;
     uint32_t arg32 = arg;
     if (type == TCG_TYPE_I32 || arg == arg32) {
-        tcg_out_op_t(s, INDEX_op_movi_i32);
+        tcg_out_op_t(s, INDEX_op_tci_movi_i32);
         tcg_out_r(s, t0);
         tcg_out32(s, arg32);
     } else {
         tcg_debug_assert(type == TCG_TYPE_I64);
 #if TCG_TARGET_REG_BITS == 64
-        tcg_out_op_t(s, INDEX_op_movi_i64);
+        tcg_out_op_t(s, INDEX_op_tci_movi_i64);
         tcg_out_r(s, t0);
         tcg_out64(s, arg);
 #else
-- 
2.20.1



* [PATCH v2 20/36] tcg: Remove movi and dupi opcodes
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (18 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 19/36] tcg/tci: Add special tci_movi_{i32,i64} opcodes Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-22  9:12   ` Aleksandar Markovic
  2020-04-22 19:03   ` Alex Bennée
  2020-04-22  1:17 ` [PATCH v2 21/36] tcg: Use tcg_out_dupi_vec from temp_load Richard Henderson
                   ` (16 subsequent siblings)
  36 siblings, 2 replies; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

These are now completely covered by mov from a
TYPE_CONST temporary.
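
The following C sketch is a heavily simplified, hypothetical model of why a separate movi opcode is redundant once constants live in temporaries. SkTemp, sk_alloc_mov and the other sk_* names are not QEMU's. When the source of a mov is a constant temp, the allocator emits a load-immediate itself, which is the job tcg_reg_alloc_do_movi keeps doing after this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified temp model; not QEMU's TCGTemp. */
typedef enum { SK_TEMP_NORMAL, SK_TEMP_CONST } SkTempKind;

typedef struct {
    SkTempKind kind;
    int64_t val;   /* meaningful only for SK_TEMP_CONST */
    int reg;       /* host register holding the value, or -1 */
} SkTemp;

typedef enum { SK_HOST_MOV_REG, SK_HOST_MOV_IMM } SkHostOp;

/* One emitted host instruction. */
typedef struct {
    SkHostOp op;
    int dst;
    int src;       /* valid for SK_HOST_MOV_REG */
    int64_t imm;   /* valid for SK_HOST_MOV_IMM */
} SkHostInsn;

/*
 * Expand "mov dst, src".  There is no movi opcode: a constant source
 * is recognised here and turned into a load-immediate.
 */
static SkHostInsn sk_alloc_mov(SkTemp *dst, const SkTemp *src, int dst_reg)
{
    SkHostInsn insn = { .dst = dst_reg };

    assert(dst->kind != SK_TEMP_CONST);   /* constant temps are read-only */
    if (src->kind == SK_TEMP_CONST) {
        insn.op = SK_HOST_MOV_IMM;
        insn.imm = src->val;
    } else {
        insn.op = SK_HOST_MOV_REG;
        insn.src = src->reg;
    }
    dst->reg = dst_reg;
    return insn;
}
```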

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-opc.h        |  3 ---
 tcg/aarch64/tcg-target.inc.c |  3 ---
 tcg/arm/tcg-target.inc.c     |  1 -
 tcg/i386/tcg-target.inc.c    |  3 ---
 tcg/mips/tcg-target.inc.c    |  2 --
 tcg/optimize.c               |  4 ----
 tcg/ppc/tcg-target.inc.c     |  3 ---
 tcg/riscv/tcg-target.inc.c   |  2 --
 tcg/s390/tcg-target.inc.c    |  2 --
 tcg/sparc/tcg-target.inc.c   |  2 --
 tcg/tcg-op-vec.c             |  1 -
 tcg/tcg.c                    | 18 +-----------------
 tcg/tci/tcg-target.inc.c     |  2 --
 13 files changed, 1 insertion(+), 45 deletions(-)

diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index 7dee9b38f7..4a9cbf5426 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -45,7 +45,6 @@ DEF(br, 0, 0, 1, TCG_OPF_BB_END)
 DEF(mb, 0, 0, 1, 0)
 
 DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
-DEF(movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
 DEF(setcond_i32, 1, 2, 1, 0)
 DEF(movcond_i32, 1, 4, 1, IMPL(TCG_TARGET_HAS_movcond_i32))
 /* load/store */
@@ -110,7 +109,6 @@ DEF(ctz_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_ctz_i32))
 DEF(ctpop_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_ctpop_i32))
 
 DEF(mov_i64, 1, 1, 0, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
-DEF(movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
 DEF(setcond_i64, 1, 2, 1, IMPL64)
 DEF(movcond_i64, 1, 4, 1, IMPL64 | IMPL(TCG_TARGET_HAS_movcond_i64))
 /* load/store */
@@ -215,7 +213,6 @@ DEF(qemu_st_i64, 0, TLADDR_ARGS + DATA64_ARGS, 1,
 #define IMPLVEC  TCG_OPF_VECTOR | IMPL(TCG_TARGET_MAYBE_vec)
 
 DEF(mov_vec, 1, 1, 0, TCG_OPF_VECTOR | TCG_OPF_NOT_PRESENT)
-DEF(dupi_vec, 1, 0, 1, TCG_OPF_VECTOR | TCG_OPF_NOT_PRESENT)
 
 DEF(dup_vec, 1, 1, 0, IMPLVEC)
 DEF(dup2_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_REG_BITS == 32))
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 843fd0ca69..7918aeb9d5 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -2261,8 +2261,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
-    case INDEX_op_movi_i64:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         g_assert_not_reached();
@@ -2467,7 +2465,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
-    case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
     default:
         g_assert_not_reached();
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 6aa7757aac..b967499fa4 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -2068,7 +2068,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
-    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         tcg_abort();
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index ec083bddcf..320a4bddd1 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2678,8 +2678,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
-    case INDEX_op_movi_i64:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         tcg_abort();
@@ -2965,7 +2963,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
-    case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
     default:
         g_assert_not_reached();
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 4d32ebc1df..09dc5a94fa 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -2155,8 +2155,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
-    case INDEX_op_movi_i64:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         tcg_abort();
diff --git a/tcg/optimize.c b/tcg/optimize.c
index dd5187be31..9a2c945dbe 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1099,10 +1099,6 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64_VEC(mov):
             tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
             break;
-        CASE_OP_32_64(movi):
-        case INDEX_op_dupi_vec:
-            tcg_opt_gen_movi(s, &temps_used, op, op->args[0], op->args[1]);
-            break;
 
         case INDEX_op_dup_vec:
             if (arg_is_const(op->args[1])) {
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index ee1f9227c1..fb390ad978 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -2967,8 +2967,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 
     case INDEX_op_mov_i32:   /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_movi_i32:  /* Always emitted via tcg_out_movi.  */
-    case INDEX_op_movi_i64:
     case INDEX_op_call:      /* Always emitted via tcg_out_call.  */
     default:
         tcg_abort();
@@ -3310,7 +3308,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         return;
 
     case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
-    case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
     default:
         g_assert_not_reached();
diff --git a/tcg/riscv/tcg-target.inc.c b/tcg/riscv/tcg-target.inc.c
index 2bc0ba71f2..ec609272ad 100644
--- a/tcg/riscv/tcg-target.inc.c
+++ b/tcg/riscv/tcg-target.inc.c
@@ -1606,8 +1606,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
-    case INDEX_op_movi_i64:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         g_assert_not_reached();
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index b07e9ff7d6..f6b003a700 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -2310,8 +2310,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
-    case INDEX_op_movi_i64:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         tcg_abort();
diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index 65fddb310d..0808b79eee 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -1591,8 +1591,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
-    case INDEX_op_movi_i64:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         tcg_abort();
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 655b3ae32d..6343046e18 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -83,7 +83,6 @@ bool tcg_can_emit_vecop_list(const TCGOpcode *list,
         case INDEX_op_xor_vec:
         case INDEX_op_mov_vec:
         case INDEX_op_dup_vec:
-        case INDEX_op_dupi_vec:
         case INDEX_op_dup2_vec:
         case INDEX_op_ld_vec:
         case INDEX_op_st_vec:
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 59beb2bf29..adb71f16ae 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1463,7 +1463,6 @@ bool tcg_op_supported(TCGOpcode op)
         return TCG_TARGET_HAS_goto_ptr;
 
     case INDEX_op_mov_i32:
-    case INDEX_op_movi_i32:
     case INDEX_op_setcond_i32:
     case INDEX_op_brcond_i32:
     case INDEX_op_ld8u_i32:
@@ -1557,7 +1556,6 @@ bool tcg_op_supported(TCGOpcode op)
         return TCG_TARGET_REG_BITS == 32;
 
     case INDEX_op_mov_i64:
-    case INDEX_op_movi_i64:
     case INDEX_op_setcond_i64:
     case INDEX_op_brcond_i64:
     case INDEX_op_ld8u_i64:
@@ -1663,7 +1661,6 @@ bool tcg_op_supported(TCGOpcode op)
 
     case INDEX_op_mov_vec:
     case INDEX_op_dup_vec:
-    case INDEX_op_dupi_vec:
     case INDEX_op_dupm_vec:
     case INDEX_op_ld_vec:
     case INDEX_op_st_vec:
@@ -3447,7 +3444,7 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
 }
 
 /*
- * Specialized code generation for INDEX_op_movi_*.
+ * Specialized code generation for INDEX_op_mov_* with a constant.
  */
 static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
                                   tcg_target_ulong val, TCGLifeData arg_life,
@@ -3470,14 +3467,6 @@ static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
     }
 }
 
-static void tcg_reg_alloc_movi(TCGContext *s, const TCGOp *op)
-{
-    TCGTemp *ots = arg_temp(op->args[0]);
-    tcg_target_ulong val = op->args[1];
-
-    tcg_reg_alloc_do_movi(s, ots, val, op->life, op->output_pref[0]);
-}
-
 /*
  * Specialized code generation for INDEX_op_mov_*.
  */
@@ -4263,11 +4252,6 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
         case INDEX_op_mov_vec:
             tcg_reg_alloc_mov(s, op);
             break;
-        case INDEX_op_movi_i32:
-        case INDEX_op_movi_i64:
-        case INDEX_op_dupi_vec:
-            tcg_reg_alloc_movi(s, op);
-            break;
         case INDEX_op_dup_vec:
             tcg_reg_alloc_dup(s, op);
             break;
diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
index 1f1639df0d..b796f4fc19 100644
--- a/tcg/tci/tcg-target.inc.c
+++ b/tcg/tci/tcg-target.inc.c
@@ -815,8 +815,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
-    case INDEX_op_movi_i64:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         tcg_abort();
-- 
2.20.1



* [PATCH v2 21/36] tcg: Use tcg_out_dupi_vec from temp_load
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (19 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 20/36] tcg: Remove movi and dupi opcodes Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-22 19:28   ` Alex Bennée
  2020-04-22  1:17 ` [PATCH v2 22/36] tcg: Increase tcg_out_dupi_vec immediate to int64_t Richard Henderson
                   ` (15 subsequent siblings)
  36 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Having dupi pass through movi is confusing and arguably wrong.
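
A sketch of the dispatch this patch adds to temp_load, assuming QEMU's dup_const semantics (replicate the low 8 << vece bits across 64 bits). sk_dup_const, sk_load_const and the SK_* names are hypothetical stand-ins; the real code calls tcg_out_movi and tcg_out_dupi_vec directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Element sizes numbered like QEMU's MO_8..MO_64 (log2 of bytes). */
enum { SK_MO_8 = 0, SK_MO_16 = 1, SK_MO_32 = 2, SK_MO_64 = 3 };

/* Replicate the low (8 << vece) bits of c across 64 bits. */
static uint64_t sk_dup_const(unsigned vece, uint64_t c)
{
    assert(vece <= SK_MO_64);
    switch (vece) {
    case SK_MO_8:
        return 0x0101010101010101ull * (uint8_t)c;
    case SK_MO_16:
        return 0x0001000100010001ull * (uint16_t)c;
    case SK_MO_32:
        return 0x0000000100000001ull * (uint32_t)c;
    default:
        return c;
    }
}

typedef enum { SK_EMIT_MOVI, SK_EMIT_DUPI_VEC } SkEmitKind;

typedef struct {
    SkEmitKind kind;
    uint64_t imm;     /* immediate handed to the chosen emitter */
} SkEmit;

/*
 * Model of the TEMP_VAL_CONST case in temp_load: scalar types go to
 * movi, vector types go straight to dupi_vec.  On a 32-bit host the
 * stored value holds only 32 bits, so it is replicated as MO_32.
 */
static SkEmit sk_load_const(bool is_vector, int host_reg_bits, uint64_t val)
{
    SkEmit e;

    if (!is_vector) {
        e.kind = SK_EMIT_MOVI;
        e.imm = val;
    } else if (host_reg_bits == 64) {
        e.kind = SK_EMIT_DUPI_VEC;
        e.imm = val;
    } else {
        e.kind = SK_EMIT_DUPI_VEC;
        e.imm = sk_dup_const(SK_MO_32, val);
    }
    return e;
}
```

On a 32-bit host the constant in a TCGTemp is only tcg_target_long wide, which is why the patch replicates it with dup_const(MO_32, ...) before handing a 64-bit pattern to tcg_out_dupi_vec.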

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c |  7 ----
 tcg/i386/tcg-target.inc.c    | 63 ++++++++++++++++++++++++------------
 tcg/ppc/tcg-target.inc.c     |  6 ----
 tcg/tcg.c                    |  8 ++++-
 4 files changed, 49 insertions(+), 35 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 7918aeb9d5..e5c9ab70a9 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -1009,13 +1009,6 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
     case TCG_TYPE_I64:
         tcg_debug_assert(rd < 32);
         break;
-
-    case TCG_TYPE_V64:
-    case TCG_TYPE_V128:
-        tcg_debug_assert(rd >= 32);
-        tcg_out_dupi_vec(s, type, rd, value);
-        return;
-
     default:
         g_assert_not_reached();
     }
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 320a4bddd1..07424f7ef9 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -977,30 +977,32 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
     }
 }
 
-static void tcg_out_movi(TCGContext *s, TCGType type,
-                         TCGReg ret, tcg_target_long arg)
+static void tcg_out_movi_vec(TCGContext *s, TCGType type,
+                             TCGReg ret, tcg_target_long arg)
+{
+    if (arg == 0) {
+        tcg_out_vex_modrm(s, OPC_PXOR, ret, ret, ret);
+        return;
+    }
+    if (arg == -1) {
+        tcg_out_vex_modrm(s, OPC_PCMPEQB, ret, ret, ret);
+        return;
+    }
+
+    int rexw = (type == TCG_TYPE_I32 ? 0 : P_REXW);
+    tcg_out_vex_modrm_pool(s, OPC_MOVD_VyEy + rexw, ret);
+    if (TCG_TARGET_REG_BITS == 64) {
+        new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
+    } else {
+        new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0);
+    }
+}
+
+static void tcg_out_movi_int(TCGContext *s, TCGType type,
+                             TCGReg ret, tcg_target_long arg)
 {
     tcg_target_long diff;
 
-    switch (type) {
-    case TCG_TYPE_I32:
-#if TCG_TARGET_REG_BITS == 64
-    case TCG_TYPE_I64:
-#endif
-        if (ret < 16) {
-            break;
-        }
-        /* fallthru */
-    case TCG_TYPE_V64:
-    case TCG_TYPE_V128:
-    case TCG_TYPE_V256:
-        tcg_debug_assert(ret >= 16);
-        tcg_out_dupi_vec(s, type, ret, arg);
-        return;
-    default:
-        g_assert_not_reached();
-    }
-
     if (arg == 0) {
         tgen_arithr(s, ARITH_XOR, ret, ret);
         return;
@@ -1029,6 +1031,25 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     tcg_out64(s, arg);
 }
 
+static void tcg_out_movi(TCGContext *s, TCGType type,
+                         TCGReg ret, tcg_target_long arg)
+{
+    switch (type) {
+    case TCG_TYPE_I32:
+#if TCG_TARGET_REG_BITS == 64
+    case TCG_TYPE_I64:
+#endif
+        if (ret < 16) {
+            tcg_out_movi_int(s, type, ret, arg);
+        } else {
+            tcg_out_movi_vec(s, type, ret, arg);
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static inline void tcg_out_pushi(TCGContext *s, tcg_target_long val)
 {
     if (val == (int8_t)val) {
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index fb390ad978..7ab1e32064 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -987,12 +987,6 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret,
         tcg_out_movi_int(s, type, ret, arg, false);
         break;
 
-    case TCG_TYPE_V64:
-    case TCG_TYPE_V128:
-        tcg_debug_assert(ret >= TCG_REG_V0);
-        tcg_out_dupi_vec(s, type, ret, arg);
-        break;
-
     default:
         g_assert_not_reached();
     }
diff --git a/tcg/tcg.c b/tcg/tcg.c
index adb71f16ae..4f1ed1d2fe 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3359,7 +3359,13 @@ static void temp_load(TCGContext *s, TCGTemp *ts, TCGRegSet desired_regs,
     case TEMP_VAL_CONST:
         reg = tcg_reg_alloc(s, desired_regs, allocated_regs,
                             preferred_regs, ts->indirect_base);
-        tcg_out_movi(s, ts->type, reg, ts->val);
+        if (ts->type <= TCG_TYPE_I64) {
+            tcg_out_movi(s, ts->type, reg, ts->val);
+        } else if (TCG_TARGET_REG_BITS == 64) {
+            tcg_out_dupi_vec(s, ts->type, reg, ts->val);
+        } else {
+            tcg_out_dupi_vec(s, ts->type, reg, dup_const(MO_32, ts->val));
+        }
         ts->mem_coherent = 0;
         break;
     case TEMP_VAL_MEM:
-- 
2.20.1



* [PATCH v2 22/36] tcg: Increase tcg_out_dupi_vec immediate to int64_t
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (20 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 21/36] tcg: Use tcg_out_dupi_vec from temp_load Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-22 19:33   ` Alex Bennée
  2020-04-22  1:17 ` [PATCH v2 23/36] tcg: Add tcg_reg_alloc_dup2 Richard Henderson
                   ` (14 subsequent siblings)
  36 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

While we don't store more than tcg_target_long in TCGTemp,
we shouldn't be limited to that for code generation.  We will
be able to use this for INDEX_op_dup2_vec with 2 constants.
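
As a hedged illustration of what the wider immediate buys on a 32-bit host, the sketch below models two decisions from the i386 and ppc hunks. A 64-bit value whose halves are equal can use a 4-byte pool entry plus a 32-bit broadcast; otherwise the value is stored as two 32-bit pool words in host memory order (low word first for i386, high word first for ppc). The sk_* helpers are hypothetical, and the pool sizes follow the i386 hunk only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when v == dup_const(MO_32, v), i.e. both 32-bit halves match. */
static bool sk_is_dup32(uint64_t v)
{
    return (uint32_t)v == (uint32_t)(v >> 32);
}

/* Bytes of pool data for a 64-bit vector constant, i386-style model. */
static unsigned sk_pool_entry_bytes(uint64_t v, int host_reg_bits)
{
    assert(host_reg_bits == 32 || host_reg_bits == 64);
    if (host_reg_bits == 32 && sk_is_dup32(v)) {
        return 4;   /* one word, expanded by a 32-bit broadcast */
    }
    return 8;
}

/*
 * Split v into the two 32-bit words a 32-bit host stores in its
 * constant pool, in memory order: low word first on a little-endian
 * host (i386 passes arg, arg >> 32), high word first on a big-endian
 * host (ppc passes val >> 32, val).
 */
static void sk_pool_words(uint64_t v, bool host_big_endian, uint32_t out[2])
{
    uint32_t lo = (uint32_t)v;
    uint32_t hi = (uint32_t)(v >> 32);

    out[0] = host_big_endian ? hi : lo;
    out[1] = host_big_endian ? lo : hi;
}
```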

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c |  2 +-
 tcg/i386/tcg-target.inc.c    | 20 ++++++++++++--------
 tcg/ppc/tcg-target.inc.c     | 15 ++++++++-------
 tcg/tcg.c                    |  4 ++--
 4 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index e5c9ab70a9..3b5a5d78c7 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -856,7 +856,7 @@ static void tcg_out_logicali(TCGContext *s, AArch64Insn insn, TCGType ext,
 }
 
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
-                             TCGReg rd, tcg_target_long v64)
+                             TCGReg rd, int64_t v64)
 {
     bool q = type == TCG_TYPE_V128;
     int cmode, imm8, i;
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 07424f7ef9..9cb627d6eb 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -945,7 +945,7 @@ static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
 }
 
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
-                             TCGReg ret, tcg_target_long arg)
+                             TCGReg ret, int64_t arg)
 {
     int vex_l = (type == TCG_TYPE_V256 ? P_VEXL : 0);
 
@@ -958,7 +958,14 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
         return;
     }
 
-    if (TCG_TARGET_REG_BITS == 64) {
+    if (TCG_TARGET_REG_BITS == 32 && arg == dup_const(MO_32, arg)) {
+        if (have_avx2) {
+            tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTD + vex_l, ret);
+        } else {
+            tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSS, ret);
+        }
+        new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0);
+    } else {
         if (type == TCG_TYPE_V64) {
             tcg_out_vex_modrm_pool(s, OPC_MOVQ_VqWq, ret);
         } else if (have_avx2) {
@@ -966,14 +973,11 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
         } else {
             tcg_out_vex_modrm_pool(s, OPC_MOVDDUP, ret);
         }
-        new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
-    } else {
-        if (have_avx2) {
-            tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTW + vex_l, ret);
+        if (TCG_TARGET_REG_BITS == 64) {
+            new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
         } else {
-            tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSS, ret);
+            new_pool_l2(s, R_386_32, s->code_ptr - 4, 0, arg, arg >> 32);
         }
-        new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0);
     }
 }
 
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 7ab1e32064..3333b55766 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -913,7 +913,7 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
 }
 
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret,
-                             tcg_target_long val)
+                             int64_t val)
 {
     uint32_t load_insn;
     int rel, low;
@@ -921,20 +921,20 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret,
 
     low = (int8_t)val;
     if (low >= -16 && low < 16) {
-        if (val == (tcg_target_long)dup_const(MO_8, low)) {
+        if (val == dup_const(MO_8, low)) {
             tcg_out32(s, VSPLTISB | VRT(ret) | ((val & 31) << 16));
             return;
         }
-        if (val == (tcg_target_long)dup_const(MO_16, low)) {
+        if (val == dup_const(MO_16, low)) {
             tcg_out32(s, VSPLTISH | VRT(ret) | ((val & 31) << 16));
             return;
         }
-        if (val == (tcg_target_long)dup_const(MO_32, low)) {
+        if (val == dup_const(MO_32, low)) {
             tcg_out32(s, VSPLTISW | VRT(ret) | ((val & 31) << 16));
             return;
         }
     }
-    if (have_isa_3_00 && val == (tcg_target_long)dup_const(MO_8, val)) {
+    if (have_isa_3_00 && val == dup_const(MO_8, val)) {
         tcg_out32(s, XXSPLTIB | VRT(ret) | ((val & 0xff) << 11));
         return;
     }
@@ -956,14 +956,15 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret,
         if (TCG_TARGET_REG_BITS == 64) {
             new_pool_label(s, val, rel, s->code_ptr, add);
         } else {
-            new_pool_l2(s, rel, s->code_ptr, add, val, val);
+            new_pool_l2(s, rel, s->code_ptr, add, val >> 32, val);
         }
     } else {
         load_insn = LVX | VRT(ret) | RB(TCG_REG_TMP1);
         if (TCG_TARGET_REG_BITS == 64) {
             new_pool_l2(s, rel, s->code_ptr, add, val, val);
         } else {
-            new_pool_l4(s, rel, s->code_ptr, add, val, val, val, val);
+            new_pool_l4(s, rel, s->code_ptr, add,
+                        val >> 32, val, val >> 32, val);
         }
     }
 
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 4f1ed1d2fe..fc1c97d586 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -117,7 +117,7 @@ static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
 static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
                              TCGReg dst, TCGReg base, intptr_t offset);
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
-                             TCGReg dst, tcg_target_long arg);
+                             TCGReg dst, int64_t arg);
 static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, unsigned vecl,
                            unsigned vece, const TCGArg *args,
                            const int *const_args);
@@ -133,7 +133,7 @@ static inline bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
     g_assert_not_reached();
 }
 static inline void tcg_out_dupi_vec(TCGContext *s, TCGType type,
-                                    TCGReg dst, tcg_target_long arg)
+                                    TCGReg dst, int64_t arg)
 {
     g_assert_not_reached();
 }
-- 
2.20.1



* [PATCH v2 23/36] tcg: Add tcg_reg_alloc_dup2
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (21 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 22/36] tcg: Increase tcg_out_dupi_vec immediate to int64_t Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-22 19:40   ` Alex Bennée
  2020-04-22  1:17 ` [PATCH v2 24/36] tcg/i386: Use tcg_constant_vec with tcg vec expanders Richard Henderson
                   ` (13 subsequent siblings)
  36 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

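There are several ways we can expand a vector dup of a 64-bit
element on a 32-bit host: if both 32-bit inputs are constants,
emit a single dupi_vec of the combined value; if they are the two
halves of one 64-bit temp, sync them to memory and try dupm_vec;
otherwise fall back to the generic expansion of dup2_vec.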
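
The choice among the expansions implemented by tcg_reg_alloc_dup2 can be modelled in a few lines of C. SkHalf, sk_choose_dup2 and sk_combine_halves are hypothetical. The model ignores register allocation and the case where tcg_out_dupm_vec declines, in which the real code also falls back to tcg_reg_alloc_op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SK_DUP2_DUPI, SK_DUP2_DUPM, SK_DUP2_GENERIC } SkDup2Strategy;

/* Hypothetical summary of one 32-bit input of dup2_vec. */
typedef struct {
    bool is_const;    /* value known at translation time */
    int32_t val;      /* signed, like tcg_target_long on a 32-bit host */
    int parent_i64;   /* id of the 64-bit temp this half belongs to, or -1 */
} SkHalf;

/* Zero-extend lo so a negative low half cannot set bits 32..63. */
static uint64_t sk_combine_halves(int32_t lo, int32_t hi)
{
    return (uint32_t)lo | ((uint64_t)hi << 32);
}

static SkDup2Strategy sk_choose_dup2(const SkHalf *lo, const SkHalf *hi,
                                     uint64_t *imm)
{
    assert(lo != NULL && hi != NULL && imm != NULL);

    /* Both halves constant: a single dupi_vec of the combined value. */
    if (lo->is_const && hi->is_const) {
        *imm = sk_combine_halves(lo->val, hi->val);
        return SK_DUP2_DUPI;
    }
    /* Halves of the same 64-bit temp: sync to memory, try dupm_vec. */
    if (lo->parent_i64 >= 0 && lo->parent_i64 == hi->parent_i64) {
        return SK_DUP2_DUPM;
    }
    /* Anything else: the backend's generic dup2_vec expansion. */
    return SK_DUP2_GENERIC;
}
```

The zero-extension matters because the halves are held as signed values on a 32-bit host: without the (uint32_t) cast a negative low half would be sign-extended over the high word, which is why the patch casts itsl->val before combining.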

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index fc1c97d586..d712d19842 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3870,6 +3870,91 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
     }
 }
 
+static void tcg_reg_alloc_dup2(TCGContext *s, const TCGOp *op)
+{
+    const TCGLifeData arg_life = op->life;
+    TCGTemp *ots, *itsl, *itsh;
+    TCGType vtype = TCGOP_VECL(op) + TCG_TYPE_V64;
+
+    /* This opcode is only valid for 32-bit hosts, for 64-bit elements. */
+    tcg_debug_assert(TCG_TARGET_REG_BITS == 32);
+    tcg_debug_assert(TCGOP_VECE(op) == MO_64);
+
+    ots = arg_temp(op->args[0]);
+    itsl = arg_temp(op->args[1]);
+    itsh = arg_temp(op->args[2]);
+
+    /* ENV should not be modified.  */
+    tcg_debug_assert(!temp_readonly(ots));
+
+    /* Allocate the output register now.  */
+    if (ots->val_type != TEMP_VAL_REG) {
+        TCGRegSet allocated_regs = s->reserved_regs;
+        TCGRegSet dup_out_regs =
+            tcg_op_defs[INDEX_op_dup_vec].args_ct[0].u.regs;
+
+        /* Make sure to not spill the input registers. */
+        if (!IS_DEAD_ARG(1) && itsl->val_type == TEMP_VAL_REG) {
+            tcg_regset_set_reg(allocated_regs, itsl->reg);
+        }
+        if (!IS_DEAD_ARG(2) && itsh->val_type == TEMP_VAL_REG) {
+            tcg_regset_set_reg(allocated_regs, itsh->reg);
+        }
+
+        ots->reg = tcg_reg_alloc(s, dup_out_regs, allocated_regs,
+                                 op->output_pref[0], ots->indirect_base);
+        ots->val_type = TEMP_VAL_REG;
+        ots->mem_coherent = 0;
+        s->reg_to_temp[ots->reg] = ots;
+    }
+
+    /* Promote dup2 of immediates to dupi_vec. */
+    if (itsl->val_type == TEMP_VAL_CONST &&
+        itsh->val_type == TEMP_VAL_CONST) {
+        tcg_out_dupi_vec(s, vtype, ots->reg,
+                         (uint32_t)itsl->val | ((uint64_t)itsh->val << 32));
+        goto done;
+    }
+
+    /* If the two inputs form one 64-bit value, try dupm_vec. */
+    if (itsl + 1 == itsh &&
+        itsl->base_type == TCG_TYPE_I64 &&
+        itsh->base_type == TCG_TYPE_I64) {
+        if (!itsl->mem_coherent) {
+            temp_sync(s, itsl, s->reserved_regs, 0, 0);
+        }
+        if (!itsh->mem_coherent) {
+            temp_sync(s, itsh, s->reserved_regs, 0, 0);
+        }
+#ifdef HOST_WORDS_BIGENDIAN
+        TCGTemp *its = itsh;
+#else
+        TCGTemp *its = itsl;
+#endif
+        if (tcg_out_dupm_vec(s, vtype, MO_64, ots->reg,
+                             its->mem_base->reg, its->mem_offset)) {
+            goto done;
+        }
+    }
+
+    /* Fall back to generic expansion. */
+    tcg_reg_alloc_op(s, op);
+    return;
+
+ done:
+    if (IS_DEAD_ARG(1)) {
+        temp_dead(s, itsl);
+    }
+    if (IS_DEAD_ARG(2)) {
+        temp_dead(s, itsh);
+    }
+    if (NEED_SYNC_ARG(0)) {
+        temp_sync(s, ots, s->reserved_regs, 0, IS_DEAD_ARG(0));
+    } else if (IS_DEAD_ARG(0)) {
+        temp_dead(s, ots);
+    }
+}
+
 #ifdef TCG_TARGET_STACK_GROWSUP
 #define STACK_DIR(x) (-(x))
 #else
@@ -4261,6 +4346,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
         case INDEX_op_dup_vec:
             tcg_reg_alloc_dup(s, op);
             break;
+        case INDEX_op_dup2_vec:
+            tcg_reg_alloc_dup2(s, op);
+            break;
         case INDEX_op_insn_start:
             if (num_insns >= 0) {
                 size_t off = tcg_current_code_size(s);
-- 
2.20.1



* [PATCH v2 24/36] tcg/i386: Use tcg_constant_vec with tcg vec expanders
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (22 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 23/36] tcg: Add tcg_reg_alloc_dup2 Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-22 19:43   ` Alex Bennée
  2020-04-22  1:17 ` [PATCH v2 25/36] tcg: Remove tcg_gen_dup{8,16,32,64}i_vec Richard Henderson
                   ` (12 subsequent siblings)
  36 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.inc.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 9cb627d6eb..deace219d2 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -3452,7 +3452,7 @@ static void expand_vec_sari(TCGType type, unsigned vece,
 static void expand_vec_mul(TCGType type, unsigned vece,
                            TCGv_vec v0, TCGv_vec v1, TCGv_vec v2)
 {
-    TCGv_vec t1, t2, t3, t4;
+    TCGv_vec t1, t2, t3, t4, zero;
 
     tcg_debug_assert(vece == MO_8);
 
@@ -3470,11 +3470,11 @@ static void expand_vec_mul(TCGType type, unsigned vece,
     case TCG_TYPE_V64:
         t1 = tcg_temp_new_vec(TCG_TYPE_V128);
         t2 = tcg_temp_new_vec(TCG_TYPE_V128);
-        tcg_gen_dup16i_vec(t2, 0);
+        zero = tcg_constant_vec(TCG_TYPE_V128, MO_8, 0);
         vec_gen_3(INDEX_op_x86_punpckl_vec, TCG_TYPE_V128, MO_8,
-                  tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(t2));
+                  tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(zero));
         vec_gen_3(INDEX_op_x86_punpckl_vec, TCG_TYPE_V128, MO_8,
-                  tcgv_vec_arg(t2), tcgv_vec_arg(t2), tcgv_vec_arg(v2));
+                  tcgv_vec_arg(t2), tcgv_vec_arg(zero), tcgv_vec_arg(v2));
         tcg_gen_mul_vec(MO_16, t1, t1, t2);
         tcg_gen_shri_vec(MO_16, t1, t1, 8);
         vec_gen_3(INDEX_op_x86_packus_vec, TCG_TYPE_V128, MO_8,
@@ -3489,15 +3489,15 @@ static void expand_vec_mul(TCGType type, unsigned vece,
         t2 = tcg_temp_new_vec(type);
         t3 = tcg_temp_new_vec(type);
         t4 = tcg_temp_new_vec(type);
-        tcg_gen_dup16i_vec(t4, 0);
+        zero = tcg_constant_vec(TCG_TYPE_V128, MO_8, 0);
         vec_gen_3(INDEX_op_x86_punpckl_vec, type, MO_8,
-                  tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(t4));
+                  tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(zero));
         vec_gen_3(INDEX_op_x86_punpckl_vec, type, MO_8,
-                  tcgv_vec_arg(t2), tcgv_vec_arg(t4), tcgv_vec_arg(v2));
+                  tcgv_vec_arg(t2), tcgv_vec_arg(zero), tcgv_vec_arg(v2));
         vec_gen_3(INDEX_op_x86_punpckh_vec, type, MO_8,
-                  tcgv_vec_arg(t3), tcgv_vec_arg(v1), tcgv_vec_arg(t4));
+                  tcgv_vec_arg(t3), tcgv_vec_arg(v1), tcgv_vec_arg(zero));
         vec_gen_3(INDEX_op_x86_punpckh_vec, type, MO_8,
-                  tcgv_vec_arg(t4), tcgv_vec_arg(t4), tcgv_vec_arg(v2));
+                  tcgv_vec_arg(t4), tcgv_vec_arg(zero), tcgv_vec_arg(v2));
         tcg_gen_mul_vec(MO_16, t1, t1, t2);
         tcg_gen_mul_vec(MO_16, t3, t3, t4);
         tcg_gen_shri_vec(MO_16, t1, t1, 8);
@@ -3525,7 +3525,7 @@ static bool expand_vec_cmp_noinv(TCGType type, unsigned vece, TCGv_vec v0,
         NEED_UMIN = 8,
         NEED_UMAX = 16,
     };
-    TCGv_vec t1, t2;
+    TCGv_vec t1, t2, t3;
     uint8_t fixup;
 
     switch (cond) {
@@ -3596,9 +3596,9 @@ static bool expand_vec_cmp_noinv(TCGType type, unsigned vece, TCGv_vec v0,
     } else if (fixup & NEED_BIAS) {
         t1 = tcg_temp_new_vec(type);
         t2 = tcg_temp_new_vec(type);
-        tcg_gen_dupi_vec(vece, t2, 1ull << ((8 << vece) - 1));
-        tcg_gen_sub_vec(vece, t1, v1, t2);
-        tcg_gen_sub_vec(vece, t2, v2, t2);
+        t3 = tcg_constant_vec(type, vece, 1ull << ((8 << vece) - 1));
+        tcg_gen_sub_vec(vece, t1, v1, t3);
+        tcg_gen_sub_vec(vece, t2, v2, t3);
         v1 = t1;
         v2 = t2;
         cond = tcg_signed_cond(cond);
-- 
2.20.1
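
The two i386 expansions touched above can be checked with a plain-C model of one lane. This is a sketch, not backend code; `model_mul8_lane`, `as_signed8` and `model_ltu8_via_signed` are hypothetical names. The first function shows why the byte multiply needs only a zero vector. Unpacking produces the words 0:x and y:0, and their 16-bit product is (x*y mod 256) << 8. After the right shift by 8, the unsigned-saturating pack never saturates. The second function models the NEED_BIAS fixup: subtracting 1 << (bits - 1) from both operands lets a signed compare produce the unsigned result.

```c
#include <assert.h>
#include <stdint.h>

/*
 * One byte lane of the MO_8 multiply expansion:
 * punpckl(v1, zero) yields the word 0:x, punpckl(zero, v2) yields y:0.
 */
static uint8_t model_mul8_lane(uint8_t x, uint8_t y)
{
    uint16_t wx = (uint16_t)x;              /* high byte 0, low byte x */
    uint16_t wy = (uint16_t)(y << 8);       /* high byte y, low byte 0 */
    uint16_t prod = (uint16_t)(wx * wy);    /* 16-bit lane multiply */
    uint16_t shr = prod >> 8;               /* shri_vec(MO_16, ..., 8) */

    /* packus saturates to 0..255, but shr never exceeds 255 here. */
    assert(shr <= 0xff);
    return (uint8_t)shr;
}

/* Reinterpret an 8-bit pattern as two's complement, portably. */
static int as_signed8(uint8_t v)
{
    return v < 0x80 ? v : (int)v - 0x100;
}

/* NEED_BIAS: bias both operands by 0x80, then compare as signed. */
static int model_ltu8_via_signed(uint8_t a, uint8_t b)
{
    int sa = as_signed8((uint8_t)(a - 0x80));
    int sb = as_signed8((uint8_t)(b - 0x80));
    return sa < sb;
}
```

Checking all 65536 byte pairs against `(uint8_t)(x * y)` and `x < y` is cheap.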




* [PATCH v2 25/36] tcg: Remove tcg_gen_dup{8,16,32,64}i_vec
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (23 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 24/36] tcg/i386: Use tcg_constant_vec with tcg vec expanders Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-23  9:11   ` Alex Bennée
  2020-04-22  1:17 ` [PATCH v2 26/36] tcg: Add load_dest parameter to GVecGen2 Richard Henderson
                   ` (11 subsequent siblings)
  36 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

These interfaces have been replaced by tcg_gen_dupi_vec
and tcg_constant_vec.
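
The deleted wrappers simply forwarded to `tcg_gen_dupi_vec` with a fixed element size. For example, `tcg_gen_dup16i_vec(t, 0)` becomes `tcg_gen_dupi_vec(MO_16, t, 0)`. The plain-C model below shows the replicated value for each element size. `model_dup_imm` is a hypothetical name. `vece` 0..3 stands for MO_8..MO_64, following the log2-of-bytes encoding described in tcg/README. The model also assumes that `val` is truncated to the element width before it is replicated.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate a (8 << vece)-bit element across a 64-bit lane. */
static uint64_t model_dup_imm(unsigned vece, uint64_t val)
{
    assert(vece <= 3);
    switch (vece) {
    case 0:
        return 0x0101010101010101ull * (uint8_t)val;   /* MO_8 */
    case 1:
        return 0x0001000100010001ull * (uint16_t)val;  /* MO_16 */
    case 2:
        return 0x0000000100000001ull * (uint32_t)val;  /* MO_32 */
    default:
        return val;                                     /* MO_64 */
    }
}
```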

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-op.h |  4 ----
 tcg/tcg-op-vec.c     | 20 --------------------
 2 files changed, 24 deletions(-)

diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index 11ed9192f7..a39eb13ff0 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -959,10 +959,6 @@ void tcg_gen_mov_vec(TCGv_vec, TCGv_vec);
 void tcg_gen_dup_i32_vec(unsigned vece, TCGv_vec, TCGv_i32);
 void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec, TCGv_i64);
 void tcg_gen_dup_mem_vec(unsigned vece, TCGv_vec, TCGv_ptr, tcg_target_long);
-void tcg_gen_dup8i_vec(TCGv_vec, uint32_t);
-void tcg_gen_dup16i_vec(TCGv_vec, uint32_t);
-void tcg_gen_dup32i_vec(TCGv_vec, uint32_t);
-void tcg_gen_dup64i_vec(TCGv_vec, uint64_t);
 void tcg_gen_dupi_vec(unsigned vece, TCGv_vec, uint64_t);
 void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 6343046e18..a9c16d85c5 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -284,26 +284,6 @@ void tcg_gen_dupi_vec(unsigned vece, TCGv_vec dest, uint64_t val)
     tcg_gen_mov_vec(dest, tcg_constant_vec(type, vece, val));
 }
 
-void tcg_gen_dup64i_vec(TCGv_vec dest, uint64_t val)
-{
-    tcg_gen_dupi_vec(MO_64, dest, val);
-}
-
-void tcg_gen_dup32i_vec(TCGv_vec dest, uint32_t val)
-{
-    tcg_gen_dupi_vec(MO_32, dest, val);
-}
-
-void tcg_gen_dup16i_vec(TCGv_vec dest, uint32_t val)
-{
-    tcg_gen_dupi_vec(MO_16, dest, val);
-}
-
-void tcg_gen_dup8i_vec(TCGv_vec dest, uint32_t val)
-{
-    tcg_gen_dupi_vec(MO_8, dest, val);
-}
-
 void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec r, TCGv_i64 a)
 {
     TCGArg ri = tcgv_vec_arg(r);
-- 
2.20.1




* [PATCH v2 26/36] tcg: Add load_dest parameter to GVecGen2
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (24 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 25/36] tcg: Remove tcg_gen_dup{8,16,32,64}i_vec Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-23  9:37   ` Alex Bennée
  2020-04-22  1:17 ` [PATCH v2 27/36] tcg: Fix integral argument type to tcg_gen_rot[rl]i_i{32,64} Richard Henderson
                   ` (10 subsequent siblings)
  36 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

We have this same parameter for GVecGen2i, GVecGen3,
and GVecGen3i.  This will make some SVE2 insns easier
to parameterize.
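
When `load_dest` is set, the expanders preload the destination element into the callback's output temporary. `fni(t1, t0)` can then read the old destination as a second source, for example to accumulate. Below is a plain-C model of the i32 expander; `model_expand_2_i32`, `acc_op` and `not_op` are hypothetical names. The model zeroes `t1` in the non-`load_dest` case. In TCG the fresh temporary is simply undefined there, so a callback used without `load_dest` must not read it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Element callback: *d is the output, a is the input. */
typedef void (*fni2_fn)(uint32_t *d, uint32_t a);

/* Walk oprsz bytes in 4-byte elements, like expand_2_i32(). */
static void model_expand_2_i32(uint32_t *dest, const uint32_t *src,
                               size_t oprsz, bool load_dest, fni2_fn fni)
{
    for (size_t i = 0; i < oprsz / 4; i++) {
        uint32_t t0 = src[i];
        uint32_t t1 = 0;        /* undefined in TCG; zeroed in the model */
        if (load_dest) {
            t1 = dest[i];       /* old destination becomes a 2nd source */
        }
        fni(&t1, t0);
        dest[i] = t1;
    }
}

/* Needs load_dest: accumulates into the old destination. */
static void acc_op(uint32_t *d, uint32_t a) { *d += a; }

/* Ignores the old destination. */
static void not_op(uint32_t *d, uint32_t a) { *d = ~a; }
```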

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-op-gvec.h |  2 ++
 tcg/tcg-op-gvec.c         | 45 ++++++++++++++++++++++++++++-----------
 2 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index d89f91f40e..cea6497341 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -109,6 +109,8 @@ typedef struct {
     uint8_t vece;
     /* Prefer i64 to v64.  */
     bool prefer_i64;
+    /* Load dest as a 2nd source operand.  */
+    bool load_dest;
 } GVecGen2;
 
 typedef struct {
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 43cac1a0bf..049a55e700 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -663,17 +663,22 @@ static void expand_clr(uint32_t dofs, uint32_t maxsz)
 
 /* Expand OPSZ bytes worth of two-operand operations using i32 elements.  */
 static void expand_2_i32(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
-                         void (*fni)(TCGv_i32, TCGv_i32))
+                         bool load_dest, void (*fni)(TCGv_i32, TCGv_i32))
 {
     TCGv_i32 t0 = tcg_temp_new_i32();
+    TCGv_i32 t1 = tcg_temp_new_i32();
     uint32_t i;
 
     for (i = 0; i < oprsz; i += 4) {
         tcg_gen_ld_i32(t0, cpu_env, aofs + i);
-        fni(t0, t0);
-        tcg_gen_st_i32(t0, cpu_env, dofs + i);
+        if (load_dest) {
+            tcg_gen_ld_i32(t1, cpu_env, dofs + i);
+        }
+        fni(t1, t0);
+        tcg_gen_st_i32(t1, cpu_env, dofs + i);
     }
     tcg_temp_free_i32(t0);
+    tcg_temp_free_i32(t1);
 }
 
 static void expand_2i_i32(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
@@ -793,17 +798,22 @@ static void expand_4_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 
 /* Expand OPSZ bytes worth of two-operand operations using i64 elements.  */
 static void expand_2_i64(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
-                         void (*fni)(TCGv_i64, TCGv_i64))
+                         bool load_dest, void (*fni)(TCGv_i64, TCGv_i64))
 {
     TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
     uint32_t i;
 
     for (i = 0; i < oprsz; i += 8) {
         tcg_gen_ld_i64(t0, cpu_env, aofs + i);
-        fni(t0, t0);
-        tcg_gen_st_i64(t0, cpu_env, dofs + i);
+        if (load_dest) {
+            tcg_gen_ld_i64(t1, cpu_env, dofs + i);
+        }
+        fni(t1, t0);
+        tcg_gen_st_i64(t1, cpu_env, dofs + i);
     }
     tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
 }
 
 static void expand_2i_i64(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
@@ -924,17 +934,23 @@ static void expand_4_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 /* Expand OPSZ bytes worth of two-operand operations using host vectors.  */
 static void expand_2_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
                          uint32_t oprsz, uint32_t tysz, TCGType type,
+                         bool load_dest,
                          void (*fni)(unsigned, TCGv_vec, TCGv_vec))
 {
     TCGv_vec t0 = tcg_temp_new_vec(type);
+    TCGv_vec t1 = tcg_temp_new_vec(type);
     uint32_t i;
 
     for (i = 0; i < oprsz; i += tysz) {
         tcg_gen_ld_vec(t0, cpu_env, aofs + i);
-        fni(vece, t0, t0);
-        tcg_gen_st_vec(t0, cpu_env, dofs + i);
+        if (load_dest) {
+            tcg_gen_ld_vec(t1, cpu_env, dofs + i);
+        }
+        fni(vece, t1, t0);
+        tcg_gen_st_vec(t1, cpu_env, dofs + i);
     }
     tcg_temp_free_vec(t0);
+    tcg_temp_free_vec(t1);
 }
 
 /* Expand OPSZ bytes worth of two-vector operands and an immediate operand
@@ -1088,7 +1104,8 @@ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs,
          * that e.g. size == 80 would be expanded with 2x32 + 1x16.
          */
         some = QEMU_ALIGN_DOWN(oprsz, 32);
-        expand_2_vec(g->vece, dofs, aofs, some, 32, TCG_TYPE_V256, g->fniv);
+        expand_2_vec(g->vece, dofs, aofs, some, 32, TCG_TYPE_V256,
+                     g->load_dest, g->fniv);
         if (some == oprsz) {
             break;
         }
@@ -1098,17 +1115,19 @@ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs,
         maxsz -= some;
         /* fallthru */
     case TCG_TYPE_V128:
-        expand_2_vec(g->vece, dofs, aofs, oprsz, 16, TCG_TYPE_V128, g->fniv);
+        expand_2_vec(g->vece, dofs, aofs, oprsz, 16, TCG_TYPE_V128,
+                     g->load_dest, g->fniv);
         break;
     case TCG_TYPE_V64:
-        expand_2_vec(g->vece, dofs, aofs, oprsz, 8, TCG_TYPE_V64, g->fniv);
+        expand_2_vec(g->vece, dofs, aofs, oprsz, 8, TCG_TYPE_V64,
+                     g->load_dest, g->fniv);
         break;
 
     case 0:
         if (g->fni8 && check_size_impl(oprsz, 8)) {
-            expand_2_i64(dofs, aofs, oprsz, g->fni8);
+            expand_2_i64(dofs, aofs, oprsz, g->load_dest, g->fni8);
         } else if (g->fni4 && check_size_impl(oprsz, 4)) {
-            expand_2_i32(dofs, aofs, oprsz, g->fni4);
+            expand_2_i32(dofs, aofs, oprsz, g->load_dest, g->fni4);
         } else {
             assert(g->fno != NULL);
             tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, g->data, g->fno);
-- 
2.20.1




* [PATCH v2 27/36] tcg: Fix integral argument type to tcg_gen_rot[rl]i_i{32,64}
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (25 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 26/36] tcg: Add load_dest parameter to GVecGen2 Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-22 10:19   ` Philippe Mathieu-Daudé
  2020-04-23  9:38   ` [PATCH v2 27/36] tcg: Fix integral argument type to tcg_gen_rot[rl]i_i{32,64} Alex Bennée
  2020-04-22  1:17 ` [PATCH v2 28/36] tcg: Implement gvec support for rotate by immediate Richard Henderson
                   ` (9 subsequent siblings)
  36 siblings, 2 replies; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

For the benefit of compatibility of function pointer types,
we have standardized on int32_t and int64_t as the integral
argument to tcg expanders.

We converted most of them in 474b2e8f0f7, but missed the rotates.
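
The function-pointer issue can be shown in isolation. The GVecGen2i callbacks take an `int32_t` or `int64_t` immediate. As a result, an initializer such as `.fni4 = tcg_gen_rotli_i32` (used by `tcg_gen_gvec_rotli` later in this series) only type-checks once the prototype uses `int32_t`. The sketch below mimics this with plain types. `fni4_imm_fn`, `model_rotli_i32` and `rot_table` are hypothetical names, and the real callbacks take `TCGv_i32` handles rather than pointers to values. Because the count is now signed, the range check must also reject negative values, as the patched `tcg_debug_assert` does.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a GVecGen2i-style .fni4 slot with an int32_t immediate. */
typedef void (*fni4_imm_fn)(uint32_t *ret, uint32_t arg1, int32_t arg2);

/* Rotate left with the corrected (signed) count type. */
static void model_rotli_i32(uint32_t *ret, uint32_t arg1, int32_t arg2)
{
    assert(arg2 >= 0 && arg2 < 32);   /* mirrors the new debug assert */
    *ret = arg2 == 0 ? arg1 : (arg1 << arg2) | (arg1 >> (32 - arg2));
}

/*
 * This compiles without a cast only because the parameter types match.
 * A variant taking `unsigned arg2` would have an incompatible type.
 */
static const fni4_imm_fn rot_table[] = { model_rotli_i32 };
```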

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-op.h |  8 ++++----
 tcg/tcg-op.c         | 16 ++++++++--------
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index a39eb13ff0..b07bf7b524 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -298,9 +298,9 @@ void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
 void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg);
 void tcg_gen_ctpop_i32(TCGv_i32 a1, TCGv_i32 a2);
 void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
-void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
+void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2);
 void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
-void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
+void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2);
 void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
                          unsigned int ofs, unsigned int len);
 void tcg_gen_deposit_z_i32(TCGv_i32 ret, TCGv_i32 arg,
@@ -490,9 +490,9 @@ void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
 void tcg_gen_clrsb_i64(TCGv_i64 ret, TCGv_i64 arg);
 void tcg_gen_ctpop_i64(TCGv_i64 a1, TCGv_i64 a2);
 void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
-void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
+void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
 void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
-void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
+void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
 void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
                          unsigned int ofs, unsigned int len);
 void tcg_gen_deposit_z_i64(TCGv_i64 ret, TCGv_i64 arg,
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 07eb661a07..202d8057c5 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -516,9 +516,9 @@ void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
     }
 }
 
-void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2)
+void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
 {
-    tcg_debug_assert(arg2 < 32);
+    tcg_debug_assert(arg2 >= 0 && arg2 < 32);
     /* some cases can be optimized here */
     if (arg2 == 0) {
         tcg_gen_mov_i32(ret, arg1);
@@ -554,9 +554,9 @@ void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
     }
 }
 
-void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2)
+void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
 {
-    tcg_debug_assert(arg2 < 32);
+    tcg_debug_assert(arg2 >= 0 && arg2 < 32);
     /* some cases can be optimized here */
     if (arg2 == 0) {
         tcg_gen_mov_i32(ret, arg1);
@@ -1949,9 +1949,9 @@ void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
     }
 }
 
-void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2)
+void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
 {
-    tcg_debug_assert(arg2 < 64);
+    tcg_debug_assert(arg2 >= 0 && arg2 < 64);
     /* some cases can be optimized here */
     if (arg2 == 0) {
         tcg_gen_mov_i64(ret, arg1);
@@ -1986,9 +1986,9 @@ void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
     }
 }
 
-void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2)
+void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
 {
-    tcg_debug_assert(arg2 < 64);
+    tcg_debug_assert(arg2 >= 0 && arg2 < 64);
     /* some cases can be optimized here */
     if (arg2 == 0) {
         tcg_gen_mov_i64(ret, arg1);
-- 
2.20.1




* [PATCH v2 28/36] tcg: Implement gvec support for rotate by immediate
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (26 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 27/36] tcg: Fix integral argument type to tcg_gen_rot[rl]i_i{32,64} Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-23 13:28   ` Alex Bennée
  2020-04-22  1:17 ` [PATCH v2 29/36] tcg: Implement gvec support for rotate by vector Richard Henderson
                   ` (8 subsequent siblings)
  36 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

No host backend support yet, but the interfaces for rotli
are in place.  Canonicalize immediate rotate to the left,
based on a survey of architectures, but provide both left
and right rotate interfaces to the translators.
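
The right-rotate entry points added here (`tcg_gen_gvec_rotri`, `tcg_gen_rotri_vec`) never emit a right rotate. They negate the count modulo the element width and call the left-rotate path. The 32-bit plain-C model below shows that identity; `model_rotl32` and `model_rotr32` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Left rotate with a count already reduced to 0..31. */
static uint32_t model_rotl32(uint32_t x, unsigned n)
{
    assert(n < 32);
    return n == 0 ? x : (x << n) | (x >> (32 - n));
}

/* Right rotate expressed as a left rotate: -shift & (bits - 1). */
static uint32_t model_rotr32(uint32_t x, int64_t shift)
{
    const int bits = 32;
    assert(shift >= 0 && shift < bits);
    return model_rotl32(x, (unsigned)(-shift & (bits - 1)));
}
```

A count of 0 maps to 0, and any other count n maps to bits - n, so no count ever reaches the element width.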

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/tcg-runtime.h      |  5 +++
 include/tcg/tcg-op-gvec.h    |  6 ++++
 include/tcg/tcg-op.h         |  2 ++
 include/tcg/tcg-opc.h        |  1 +
 include/tcg/tcg.h            |  1 +
 tcg/aarch64/tcg-target.h     |  1 +
 tcg/i386/tcg-target.h        |  1 +
 tcg/ppc/tcg-target.h         |  1 +
 accel/tcg/tcg-runtime-gvec.c | 48 +++++++++++++++++++++++++
 tcg/tcg-op-gvec.c            | 68 ++++++++++++++++++++++++++++++++++++
 tcg/tcg-op-vec.c             | 12 +++++++
 tcg/tcg.c                    |  2 ++
 tcg/README                   |  3 +-
 13 files changed, 150 insertions(+), 1 deletion(-)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 4fa61b49b4..cf10c8361e 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -259,6 +259,11 @@ DEF_HELPER_FLAGS_3(gvec_sar16i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(gvec_sar32i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(gvec_sar64i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_3(gvec_rotl8i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_rotl16i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_rotl32i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_rotl64i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(gvec_shl8v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_shl16v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_shl32v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index cea6497341..1afc3ebf03 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -334,6 +334,10 @@ void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t shift, uint32_t oprsz, uint32_t maxsz);
 void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t shift, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_rotli(unsigned vece, uint32_t dofs, uint32_t aofs,
+                        int64_t shift, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_rotri(unsigned vece, uint32_t dofs, uint32_t aofs,
+                        int64_t shift, uint32_t oprsz, uint32_t maxsz);
 
 void tcg_gen_gvec_shls(unsigned vece, uint32_t dofs, uint32_t aofs,
                        TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
@@ -388,5 +392,7 @@ void tcg_gen_vec_shr8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
 void tcg_gen_vec_shr16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
 void tcg_gen_vec_sar8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
 void tcg_gen_vec_sar16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
+void tcg_gen_vec_rotl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
+void tcg_gen_vec_rotl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
 
 #endif
diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index b07bf7b524..c624e371d5 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -986,6 +986,8 @@ void tcg_gen_umax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_shli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 void tcg_gen_shri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 void tcg_gen_sari_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
+void tcg_gen_rotli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
+void tcg_gen_rotri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 
 void tcg_gen_shls_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
 void tcg_gen_shrs_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index 4a9cbf5426..c46c096c3e 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -245,6 +245,7 @@ DEF(not_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_not_vec))
 DEF(shli_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_shi_vec))
 DEF(shri_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_shi_vec))
 DEF(sari_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_shi_vec))
+DEF(rotli_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_roti_vec))
 
 DEF(shls_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec))
 DEF(shrs_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec))
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index f72530dfda..d2034d9334 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -182,6 +182,7 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_not_vec          0
 #define TCG_TARGET_HAS_andc_vec         0
 #define TCG_TARGET_HAS_orc_vec          0
+#define TCG_TARGET_HAS_roti_vec         0
 #define TCG_TARGET_HAS_shi_vec          0
 #define TCG_TARGET_HAS_shs_vec          0
 #define TCG_TARGET_HAS_shv_vec          0
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index ca214f6909..225a597f84 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -133,6 +133,7 @@ typedef enum {
 #define TCG_TARGET_HAS_not_vec          1
 #define TCG_TARGET_HAS_neg_vec          1
 #define TCG_TARGET_HAS_abs_vec          1
+#define TCG_TARGET_HAS_roti_vec         0
 #define TCG_TARGET_HAS_shi_vec          1
 #define TCG_TARGET_HAS_shs_vec          0
 #define TCG_TARGET_HAS_shv_vec          1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index bfb3f5f6e9..23aabde992 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -183,6 +183,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_not_vec          0
 #define TCG_TARGET_HAS_neg_vec          0
 #define TCG_TARGET_HAS_abs_vec          1
+#define TCG_TARGET_HAS_roti_vec         0
 #define TCG_TARGET_HAS_shi_vec          1
 #define TCG_TARGET_HAS_shs_vec          1
 #define TCG_TARGET_HAS_shv_vec          have_avx2
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 4fa21f0e71..e57b891aa5 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -161,6 +161,7 @@ extern bool have_vsx;
 #define TCG_TARGET_HAS_not_vec          1
 #define TCG_TARGET_HAS_neg_vec          have_isa_3_00
 #define TCG_TARGET_HAS_abs_vec          0
+#define TCG_TARGET_HAS_roti_vec         0
 #define TCG_TARGET_HAS_shi_vec          0
 #define TCG_TARGET_HAS_shs_vec          0
 #define TCG_TARGET_HAS_shv_vec          1
diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
index ca449702e6..34b1030365 100644
--- a/accel/tcg/tcg-runtime-gvec.c
+++ b/accel/tcg/tcg-runtime-gvec.c
@@ -716,6 +716,54 @@ void HELPER(gvec_sar64i)(void *d, void *a, uint32_t desc)
     clear_high(d, oprsz, desc);
 }
 
+void HELPER(gvec_rotl8i)(void *d, void *a, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    int shift = simd_data(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint8_t)) {
+        *(uint8_t *)(d + i) = rol8(*(uint8_t *)(a + i), shift);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_rotl16i)(void *d, void *a, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    int shift = simd_data(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint16_t)) {
+        *(uint16_t *)(d + i) = rol16(*(uint16_t *)(a + i), shift);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_rotl32i)(void *d, void *a, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    int shift = simd_data(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint32_t)) {
+        *(uint32_t *)(d + i) = rol32(*(uint32_t *)(a + i), shift);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_rotl64i)(void *d, void *a, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    int shift = simd_data(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
+        *(uint64_t *)(d + i) = rol64(*(uint64_t *)(a + i), shift);
+    }
+    clear_high(d, oprsz, desc);
+}
+
 void HELPER(gvec_shl8v)(void *d, void *a, void *b, uint32_t desc)
 {
     intptr_t oprsz = simd_oprsz(desc);
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 049a55e700..25300b1577 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -2694,6 +2694,74 @@ void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
     }
 }
 
+void tcg_gen_vec_rotl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c)
+{
+    uint64_t mask = dup_const(MO_8, 0xff << c);
+
+    tcg_gen_shli_i64(d, a, c);
+    tcg_gen_shri_i64(a, a, 8 - c);
+    tcg_gen_andi_i64(d, d, mask);
+    tcg_gen_andi_i64(a, a, ~mask);
+    tcg_gen_or_i64(d, d, a);
+}
+
+void tcg_gen_vec_rotl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c)
+{
+    uint64_t mask = dup_const(MO_16, 0xffff << c);
+
+    tcg_gen_shli_i64(d, a, c);
+    tcg_gen_shri_i64(a, a, 16 - c);
+    tcg_gen_andi_i64(d, d, mask);
+    tcg_gen_andi_i64(a, a, ~mask);
+    tcg_gen_or_i64(d, d, a);
+}
+
+void tcg_gen_gvec_rotli(unsigned vece, uint32_t dofs, uint32_t aofs,
+                        int64_t shift, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = { INDEX_op_rotli_vec, 0 };
+    static const GVecGen2i g[4] = {
+        { .fni8 = tcg_gen_vec_rotl8i_i64,
+          .fniv = tcg_gen_rotli_vec,
+          .fno = gen_helper_gvec_rotl8i,
+          .opt_opc = vecop_list,
+          .vece = MO_8 },
+        { .fni8 = tcg_gen_vec_rotl16i_i64,
+          .fniv = tcg_gen_rotli_vec,
+          .fno = gen_helper_gvec_rotl16i,
+          .opt_opc = vecop_list,
+          .vece = MO_16 },
+        { .fni4 = tcg_gen_rotli_i32,
+          .fniv = tcg_gen_rotli_vec,
+          .fno = gen_helper_gvec_rotl32i,
+          .opt_opc = vecop_list,
+          .vece = MO_32 },
+        { .fni8 = tcg_gen_rotli_i64,
+          .fniv = tcg_gen_rotli_vec,
+          .fno = gen_helper_gvec_rotl64i,
+          .opt_opc = vecop_list,
+          .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+          .vece = MO_64 },
+    };
+
+    tcg_debug_assert(vece <= MO_64);
+    tcg_debug_assert(shift >= 0 && shift < (8 << vece));
+    if (shift == 0) {
+        tcg_gen_gvec_mov(vece, dofs, aofs, oprsz, maxsz);
+    } else {
+        tcg_gen_gvec_2i(dofs, aofs, oprsz, maxsz, shift, &g[vece]);
+    }
+}
+
+void tcg_gen_gvec_rotri(unsigned vece, uint32_t dofs, uint32_t aofs,
+                        int64_t shift, uint32_t oprsz, uint32_t maxsz)
+{
+    tcg_debug_assert(vece <= MO_64);
+    tcg_debug_assert(shift >= 0 && shift < (8 << vece));
+    tcg_gen_gvec_rotli(vece, dofs, aofs, -shift & ((8 << vece) - 1),
+                       oprsz, maxsz);
+}
+
 /*
  * Specialized generation vector shifts by a non-constant scalar.
  */
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index a9c16d85c5..845cb3de2e 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -546,6 +546,18 @@ void tcg_gen_sari_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i)
     do_shifti(INDEX_op_sari_vec, vece, r, a, i);
 }
 
+void tcg_gen_rotli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i)
+{
+    do_shifti(INDEX_op_rotli_vec, vece, r, a, i);
+}
+
+void tcg_gen_rotri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i)
+{
+    int bits = 8 << vece;
+    tcg_debug_assert(i >= 0 && i < bits);
+    do_shifti(INDEX_op_rotli_vec, vece, r, a, -i & (bits - 1));
+}
+
 void tcg_gen_cmp_vec(TCGCond cond, unsigned vece,
                      TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index d712d19842..71409073bb 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1697,6 +1697,8 @@ bool tcg_op_supported(TCGOpcode op)
     case INDEX_op_shrv_vec:
     case INDEX_op_sarv_vec:
         return have_vec && TCG_TARGET_HAS_shv_vec;
+    case INDEX_op_rotli_vec:
+        return have_vec && TCG_TARGET_HAS_roti_vec;
     case INDEX_op_ssadd_vec:
     case INDEX_op_usadd_vec:
     case INDEX_op_sssub_vec:
diff --git a/tcg/README b/tcg/README
index bfa2e4ed24..1e3e4654f4 100644
--- a/tcg/README
+++ b/tcg/README
@@ -605,10 +605,11 @@ E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32.
 
 * shri_vec   v0, v1, i2
 * sari_vec   v0, v1, i2
+* rotli_vec  v0, v1, i2
 * shrs_vec   v0, v1, s2
 * sars_vec   v0, v1, s2
 
-  Similarly for logical and arithmetic right shift.
+  Similarly for logical and arithmetic right shift, and left rotate.
 
 * shlv_vec   v0, v1, v2
 
-- 
2.20.1
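
The integer fallback `tcg_gen_vec_rotl8i_i64` in the patch above rotates eight byte lanes packed into one 64-bit value. It uses two whole-register shifts and a per-lane mask. The plain-C model below follows the same sequence and can be checked against a byte-at-a-time rotate; `dup8` and `model_vec_rotl8i` are hypothetical names. `mask` keeps the bits that stay inside their own lane after the left shift, and its complement collects the bits that wrap around to the bottom of each lane. The TCG version overwrites `a` as scratch, but this model leaves its input unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate a byte into all eight lanes (like dup_const(MO_8, c)). */
static uint64_t dup8(uint8_t c)
{
    return 0x0101010101010101ull * c;
}

/* Rotate every byte lane of a left by c, 0 <= c < 8. */
static uint64_t model_vec_rotl8i(uint64_t a, unsigned c)
{
    assert(c < 8);
    uint64_t mask = dup8((uint8_t)(0xff << c));
    uint64_t hi = (a << c) & mask;          /* bits moved up within the lane */
    uint64_t lo = (a >> (8 - c)) & ~mask;   /* bits wrapped to the lane bottom */
    return hi | lo;
}
```

The 16-bit variant is identical, with `0xffff << c` and `16 - c`.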




* [PATCH v2 29/36] tcg: Implement gvec support for rotate by vector
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (27 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 28/36] tcg: Implement gvec support for rotate by immediate Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-23 13:41   ` Alex Bennée
  2020-04-22  1:17 ` [PATCH v2 30/36] tcg: Remove expansion to shift by vector from do_shifts Richard Henderson
                   ` (7 subsequent siblings)
  36 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

No host backend support yet, but the interfaces for rotlv
and rotrv are in place.
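
The plain-C sketch below models the per-element semantics of the new rotate-by-vector ops over byte lanes. The helper bodies are not shown in this excerpt, so reducing each count modulo the element width is an assumption of the model. `rotl8_model`, `model_rotlv8` and `model_rotrv8` are hypothetical names. The right rotate reuses the left rotate with a negated count, as the immediate forms in the previous patch do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate one byte left; the count is reduced modulo 8 (assumed). */
static uint8_t rotl8_model(uint8_t x, unsigned n)
{
    n &= 7;
    return (uint8_t)((x << n) | (x >> ((8 - n) & 7)));
}

/* Lane i of d = lane i of a rotated left by lane i of b. */
static void model_rotlv8(uint8_t *d, const uint8_t *a, const uint8_t *b,
                         size_t n)
{
    for (size_t i = 0; i < n; i++) {
        d[i] = rotl8_model(a[i], b[i]);
    }
}

/* Rotate right by s, done as rotate left by (-s & 7). */
static void model_rotrv8(uint8_t *d, const uint8_t *a, const uint8_t *b,
                         size_t n)
{
    for (size_t i = 0; i < n; i++) {
        d[i] = rotl8_model(a[i], (unsigned)(-b[i] & 7));
    }
}
```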

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/tcg-runtime.h      |  10 +++
 include/tcg/tcg-op-gvec.h    |   4 ++
 include/tcg/tcg-op.h         |   2 +
 include/tcg/tcg-opc.h        |   2 +
 include/tcg/tcg.h            |   1 +
 tcg/aarch64/tcg-target.h     |   1 +
 tcg/i386/tcg-target.h        |   1 +
 tcg/ppc/tcg-target.h         |   1 +
 accel/tcg/tcg-runtime-gvec.c |  96 +++++++++++++++++++++++++++
 tcg/tcg-op-gvec.c            | 122 +++++++++++++++++++++++++++++++++++
 tcg/tcg-op-vec.c             |  83 ++++++++++++++++++++++++
 tcg/tcg.c                    |   3 +
 tcg/README                   |   4 +-
 13 files changed, 329 insertions(+), 1 deletion(-)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index cf10c8361e..4eda24e63a 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -279,6 +279,16 @@ DEF_HELPER_FLAGS_4(gvec_sar16v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_sar32v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_sar64v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(gvec_rotl8v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_rotl16v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_rotl32v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_rotl64v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_rotr8v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_rotr16v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_rotr32v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_rotr64v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(gvec_eq8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_eq16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_eq32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index 1afc3ebf03..2d768f1160 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -356,6 +356,10 @@ void tcg_gen_gvec_shrv(unsigned vece, uint32_t dofs, uint32_t aofs,
                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
 void tcg_gen_gvec_sarv(unsigned vece, uint32_t dofs, uint32_t aofs,
                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_rotlv(unsigned vece, uint32_t dofs, uint32_t aofs,
+                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_rotrv(unsigned vece, uint32_t dofs, uint32_t aofs,
+                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
 
 void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs,
                       uint32_t aofs, uint32_t bofs,
diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index c624e371d5..0468009713 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -996,6 +996,8 @@ void tcg_gen_sars_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
 void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
 void tcg_gen_shrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
 void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
+void tcg_gen_rotlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
+void tcg_gen_rotrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
 
 void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, TCGv_vec r,
                      TCGv_vec a, TCGv_vec b);
diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index c46c096c3e..d80335ba0d 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -254,6 +254,8 @@ DEF(sars_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec))
 DEF(shlv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec))
 DEF(shrv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec))
 DEF(sarv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec))
+DEF(rotlv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_rotv_vec))
+DEF(rotrv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_rotv_vec))
 
 DEF(cmp_vec, 1, 2, 1, IMPLVEC)
 
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index d2034d9334..6bb2e3fe3c 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -183,6 +183,7 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_andc_vec         0
 #define TCG_TARGET_HAS_orc_vec          0
 #define TCG_TARGET_HAS_roti_vec         0
+#define TCG_TARGET_HAS_rotv_vec         0
 #define TCG_TARGET_HAS_shi_vec          0
 #define TCG_TARGET_HAS_shs_vec          0
 #define TCG_TARGET_HAS_shv_vec          0
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 225a597f84..a5477bbc07 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -134,6 +134,7 @@ typedef enum {
 #define TCG_TARGET_HAS_neg_vec          1
 #define TCG_TARGET_HAS_abs_vec          1
 #define TCG_TARGET_HAS_roti_vec         0
+#define TCG_TARGET_HAS_rotv_vec         0
 #define TCG_TARGET_HAS_shi_vec          1
 #define TCG_TARGET_HAS_shs_vec          0
 #define TCG_TARGET_HAS_shv_vec          1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 23aabde992..4c806c97db 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -184,6 +184,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_neg_vec          0
 #define TCG_TARGET_HAS_abs_vec          1
 #define TCG_TARGET_HAS_roti_vec         0
+#define TCG_TARGET_HAS_rotv_vec         0
 #define TCG_TARGET_HAS_shi_vec          1
 #define TCG_TARGET_HAS_shs_vec          1
 #define TCG_TARGET_HAS_shv_vec          have_avx2
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index e57b891aa5..7993422526 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -162,6 +162,7 @@ extern bool have_vsx;
 #define TCG_TARGET_HAS_neg_vec          have_isa_3_00
 #define TCG_TARGET_HAS_abs_vec          0
 #define TCG_TARGET_HAS_roti_vec         0
+#define TCG_TARGET_HAS_rotv_vec         0
 #define TCG_TARGET_HAS_shi_vec          0
 #define TCG_TARGET_HAS_shs_vec          0
 #define TCG_TARGET_HAS_shv_vec          1
diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
index 34b1030365..521da4a813 100644
--- a/accel/tcg/tcg-runtime-gvec.c
+++ b/accel/tcg/tcg-runtime-gvec.c
@@ -908,6 +908,102 @@ void HELPER(gvec_sar64v)(void *d, void *a, void *b, uint32_t desc)
     clear_high(d, oprsz, desc);
 }
 
+void HELPER(gvec_rotl8v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint8_t)) {
+        uint8_t sh = *(uint8_t *)(b + i) & 7;
+        *(uint8_t *)(d + i) = rol8(*(uint8_t *)(a + i), sh);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_rotl16v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint16_t)) {
+        uint8_t sh = *(uint16_t *)(b + i) & 15;
+        *(uint16_t *)(d + i) = rol16(*(uint16_t *)(a + i), sh);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_rotl32v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint32_t)) {
+        uint8_t sh = *(uint32_t *)(b + i) & 31;
+        *(uint32_t *)(d + i) = rol32(*(uint32_t *)(a + i), sh);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_rotl64v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
+        uint8_t sh = *(uint64_t *)(b + i) & 63;
+        *(uint64_t *)(d + i) = rol64(*(uint64_t *)(a + i), sh);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_rotr8v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint8_t)) {
+        uint8_t sh = *(uint8_t *)(b + i) & 7;
+        *(uint8_t *)(d + i) = ror8(*(uint8_t *)(a + i), sh);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_rotr16v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint16_t)) {
+        uint8_t sh = *(uint16_t *)(b + i) & 15;
+        *(uint16_t *)(d + i) = ror16(*(uint16_t *)(a + i), sh);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_rotr32v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint32_t)) {
+        uint8_t sh = *(uint32_t *)(b + i) & 31;
+        *(uint32_t *)(d + i) = ror32(*(uint32_t *)(a + i), sh);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_rotr64v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
+        uint8_t sh = *(uint64_t *)(b + i) & 63;
+        *(uint64_t *)(d + i) = ror64(*(uint64_t *)(a + i), sh);
+    }
+    clear_high(d, oprsz, desc);
+}
+
 #define DO_CMP1(NAME, TYPE, OP)                                            \
 void HELPER(NAME)(void *d, void *a, void *b, uint32_t desc)                \
 {                                                                          \
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 25300b1577..2b71725883 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -3171,6 +3171,128 @@ void tcg_gen_gvec_sarv(unsigned vece, uint32_t dofs, uint32_t aofs,
     tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
 }
 
+/*
+ * Similarly for rotates.
+ */
+
+static void tcg_gen_rotlv_mod_vec(unsigned vece, TCGv_vec d,
+                                  TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t = tcg_temp_new_vec_matching(d);
+
+    tcg_gen_dupi_vec(vece, t, (8 << vece) - 1);
+    tcg_gen_and_vec(vece, t, t, b);
+    tcg_gen_rotlv_vec(vece, d, a, t);
+    tcg_temp_free_vec(t);
+}
+
+static void tcg_gen_rotl_mod_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 t = tcg_temp_new_i32();
+
+    tcg_gen_andi_i32(t, b, 31);
+    tcg_gen_rotl_i32(d, a, t);
+    tcg_temp_free_i32(t);
+}
+
+static void tcg_gen_rotl_mod_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t = tcg_temp_new_i64();
+
+    tcg_gen_andi_i64(t, b, 63);
+    tcg_gen_rotl_i64(d, a, t);
+    tcg_temp_free_i64(t);
+}
+
+void tcg_gen_gvec_rotlv(unsigned vece, uint32_t dofs, uint32_t aofs,
+                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = { INDEX_op_rotlv_vec, 0 };
+    static const GVecGen3 g[4] = {
+        { .fniv = tcg_gen_rotlv_mod_vec,
+          .fno = gen_helper_gvec_rotl8v,
+          .opt_opc = vecop_list,
+          .vece = MO_8 },
+        { .fniv = tcg_gen_rotlv_mod_vec,
+          .fno = gen_helper_gvec_rotl16v,
+          .opt_opc = vecop_list,
+          .vece = MO_16 },
+        { .fni4 = tcg_gen_rotl_mod_i32,
+          .fniv = tcg_gen_rotlv_mod_vec,
+          .fno = gen_helper_gvec_rotl32v,
+          .opt_opc = vecop_list,
+          .vece = MO_32 },
+        { .fni8 = tcg_gen_rotl_mod_i64,
+          .fniv = tcg_gen_rotlv_mod_vec,
+          .fno = gen_helper_gvec_rotl64v,
+          .opt_opc = vecop_list,
+          .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+          .vece = MO_64 },
+    };
+
+    tcg_debug_assert(vece <= MO_64);
+    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
+}
+
+static void tcg_gen_rotrv_mod_vec(unsigned vece, TCGv_vec d,
+                                  TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t = tcg_temp_new_vec_matching(d);
+
+    tcg_gen_dupi_vec(vece, t, (8 << vece) - 1);
+    tcg_gen_and_vec(vece, t, t, b);
+    tcg_gen_rotrv_vec(vece, d, a, t);
+    tcg_temp_free_vec(t);
+}
+
+static void tcg_gen_rotr_mod_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 t = tcg_temp_new_i32();
+
+    tcg_gen_andi_i32(t, b, 31);
+    tcg_gen_rotr_i32(d, a, t);
+    tcg_temp_free_i32(t);
+}
+
+static void tcg_gen_rotr_mod_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t = tcg_temp_new_i64();
+
+    tcg_gen_andi_i64(t, b, 63);
+    tcg_gen_rotr_i64(d, a, t);
+    tcg_temp_free_i64(t);
+}
+
+void tcg_gen_gvec_rotrv(unsigned vece, uint32_t dofs, uint32_t aofs,
+                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = { INDEX_op_rotrv_vec, 0 };
+    static const GVecGen3 g[4] = {
+        { .fniv = tcg_gen_rotrv_mod_vec,
+          .fno = gen_helper_gvec_rotr8v,
+          .opt_opc = vecop_list,
+          .vece = MO_8 },
+        { .fniv = tcg_gen_rotrv_mod_vec,
+          .fno = gen_helper_gvec_rotr16v,
+          .opt_opc = vecop_list,
+          .vece = MO_16 },
+        { .fni4 = tcg_gen_rotr_mod_i32,
+          .fniv = tcg_gen_rotrv_mod_vec,
+          .fno = gen_helper_gvec_rotr32v,
+          .opt_opc = vecop_list,
+          .vece = MO_32 },
+        { .fni8 = tcg_gen_rotr_mod_i64,
+          .fniv = tcg_gen_rotrv_mod_vec,
+          .fno = gen_helper_gvec_rotr64v,
+          .opt_opc = vecop_list,
+          .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+          .vece = MO_64 },
+    };
+
+    tcg_debug_assert(vece <= MO_64);
+    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
+}
+
 /* Expand OPSZ bytes worth of three-operand operations using i32 elements.  */
 static void expand_cmp_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                            uint32_t oprsz, TCGCond cond)
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 845cb3de2e..4af92d6b0a 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -129,6 +129,17 @@ bool tcg_can_emit_vecop_list(const TCGOpcode *list,
                 continue;
             }
             break;
+        case INDEX_op_rotlv_vec:
+        case INDEX_op_rotrv_vec:
+            if (tcg_can_emit_vec_op(opc == INDEX_op_rotlv_vec
+                                    ? INDEX_op_rotrv_vec
+                                    : INDEX_op_rotlv_vec, type, vece)) {
+                continue;
+            }
+            if (tcg_can_emit_vec_op(INDEX_op_shlv_vec, type, vece) &&
+                tcg_can_emit_vec_op(INDEX_op_shrv_vec, type, vece)) {
+                continue;
+            }
         default:
             break;
         }
@@ -697,6 +708,78 @@ void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
     do_op3_nofail(vece, r, a, b, INDEX_op_sarv_vec);
 }
 
+static void do_rotv(unsigned vece, TCGv_vec r, TCGv_vec a,
+                    TCGv_vec b, bool right)
+{
+    TCGTemp *rt = tcgv_vec_temp(r);
+    TCGTemp *at = tcgv_vec_temp(a);
+    TCGTemp *bt = tcgv_vec_temp(b);
+    TCGArg ri = temp_arg(rt);
+    TCGArg ai = temp_arg(at);
+    TCGArg bi = temp_arg(bt);
+    TCGType type = rt->base_type;
+    TCGOpcode opc = right ? INDEX_op_rotrv_vec : INDEX_op_rotlv_vec;
+    const TCGOpcode *hold_list;
+    TCGv_vec t;
+    int can;
+
+    tcg_debug_assert(at->base_type >= type);
+    tcg_debug_assert(bt->base_type >= type);
+    tcg_assert_listed_vecop(opc);
+
+    /* Try the requested rotate. */
+    can = tcg_can_emit_vec_op(opc, type, vece);
+    if (can) {
+        if (can > 0) {
+            vec_gen_3(opc, type, vece, ri, ai, bi);
+        } else {
+            hold_list = tcg_swap_vecop_list(NULL);
+            tcg_expand_vec_op(opc, type, vece, ri, ai, bi);
+            tcg_swap_vecop_list(hold_list);
+        }
+        return;
+    }
+
+    hold_list = tcg_swap_vecop_list(NULL);
+    t = tcg_temp_new_vec(type);
+    tcg_gen_neg_vec(vece, t, b);
+    tcg_gen_and_vec(vece, t, t, tcg_constant_vec(type, vece, (8 << vece) - 1));
+
+    /* Try the reverse rotate. */
+    opc = right ? INDEX_op_rotlv_vec : INDEX_op_rotrv_vec;
+    can = tcg_can_emit_vec_op(opc, type, vece);
+    if (can) {
+        if (can > 0) {
+            vec_gen_3(opc, type, vece, ri, ai, tcgv_vec_arg(t));
+        } else {
+            tcg_expand_vec_op(opc, type, vece, ri, ai, tcgv_vec_arg(t));
+        }
+    } else {
+        /* Fall back to shifts. */
+        if (right) {
+            tcg_gen_shlv_vec(vece, t, a, t);
+            tcg_gen_shrv_vec(vece, r, a, b);
+        } else {
+            tcg_gen_shrv_vec(vece, t, a, t);
+            tcg_gen_shlv_vec(vece, r, a, b);
+        }
+        tcg_gen_or_vec(vece, r, r, t);
+    }
+
+    tcg_temp_free_vec(t);
+    tcg_swap_vecop_list(hold_list);
+}
+
+void tcg_gen_rotlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    do_rotv(vece, r, a, b, false);
+}
+
+void tcg_gen_rotrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    do_rotv(vece, r, a, b, true);
+}
+
 static void do_shifts(unsigned vece, TCGv_vec r, TCGv_vec a,
                       TCGv_i32 s, TCGOpcode opc_s, TCGOpcode opc_v)
 {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 71409073bb..5a82464610 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1699,6 +1699,9 @@ bool tcg_op_supported(TCGOpcode op)
         return have_vec && TCG_TARGET_HAS_shv_vec;
     case INDEX_op_rotli_vec:
         return have_vec && TCG_TARGET_HAS_roti_vec;
+    case INDEX_op_rotlv_vec:
+    case INDEX_op_rotrv_vec:
+        return have_vec && TCG_TARGET_HAS_rotv_vec;
     case INDEX_op_ssadd_vec:
     case INDEX_op_usadd_vec:
     case INDEX_op_sssub_vec:
diff --git a/tcg/README b/tcg/README
index 1e3e4654f4..a64f67809b 100644
--- a/tcg/README
+++ b/tcg/README
@@ -621,8 +621,10 @@ E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32.
 
 * shrv_vec   v0, v1, v2
 * sarv_vec   v0, v1, v2
+* rotlv_vec  v0, v1, v2
+* rotrv_vec  v0, v1, v2
 
-  Similarly for logical and arithmetic right shift.
+  Similarly for logical and arithmetic right shift, and rotates.
 
 * cmp_vec  v0, v1, v2, cond
 
-- 
2.20.1

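The shift fallback in do_rotv relies on the identity rotl(a, n) = (a << n) | (a >> (-n & (w - 1))) for a count already reduced to [0, w). That form stays correct for n == 0 without ever shifting by the full element width w. Below is a minimal scalar sketch of the identity for 8-bit lanes, checked against a reference rotate that masks the count the way gvec_rotl8v does. The function names are hypothetical, and plain arrays stand in for vector registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference: rotate left by the low 3 bits of the count, like gvec_rotl8v. */
static uint8_t ref_rotl8(uint8_t x, uint8_t n)
{
    unsigned s = n & 7u;
    return s ? (uint8_t)((x << s) | (x >> (8 - s))) : x;
}

/*
 * Hypothetical model of the do_rotv fallback: the count is masked first
 * (as tcg_gen_rotlv_mod_vec does), then the right-shift count is
 * (-count) & 7, so neither shift ever reaches the full width of 8.
 */
static uint8_t shift_rotl8(uint8_t x, uint8_t n)
{
    unsigned l = n & 7u;
    unsigned r = -l & 7u;
    return (uint8_t)((x << l) | (x >> r));
}

/* Lane-by-lane rotate-by-vector over an oprsz-byte buffer. */
static void rotlv8(uint8_t *d, const uint8_t *a, const uint8_t *b,
                   size_t oprsz)
{
    for (size_t i = 0; i < oprsz; i++) {
        d[i] = shift_rotl8(a[i], b[i]);
    }
}

/* Exhaustively compare the fallback with the reference; 1 on match. */
static int fallback_matches_reference(void)
{
    for (unsigned x = 0; x < 256; x++) {
        for (unsigned n = 0; n < 256; n++) {
            if (shift_rotl8((uint8_t)x, (uint8_t)n) !=
                ref_rotl8((uint8_t)x, (uint8_t)n)) {
                return 0;
            }
        }
    }
    return 1;
}
```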



* [PATCH v2 30/36] tcg: Remove expansion to shift by vector from do_shifts
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (28 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 29/36] tcg: Implement gvec support for rotate by vector Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-22  1:17 ` [PATCH v2 31/36] tcg: Implement gvec support for rotate by scalar Richard Henderson
                   ` (6 subsequent siblings)
  36 siblings, 0 replies; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

We do not reflect this expansion in tcg_can_emit_vecop_list,
so it is unused and unusable.  Moreover, do_gvec_shifts already
performs the same expansion (splat the scalar shift count into a
vector, then shift by vector), so it is also unneeded.
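
A minimal sketch of the expansion in question, assuming 16-bit lanes and using plain C arrays in place of vector registers: the scalar count is splatted into every lane (the job of tcg_gen_dup_i32_vec in the removed code) and the result is fed to a shift-by-vector. The helper names are hypothetical; this models only the data flow, not TCG's code generation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LANES16 16   /* a 256-bit vector holds 16 lanes of 16 bits */

/* Shift-by-vector: each lane uses its own count, assumed to be < 16. */
static void shlv16(uint16_t *d, const uint16_t *a, const uint16_t *b,
                   size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        assert(b[i] < 16);
        d[i] = (uint16_t)(a[i] << b[i]);
    }
}

/* Shift-by-scalar expressed as "splat the count, then shift by vector". */
static void shls16_via_shlv(uint16_t *d, const uint16_t *a, uint32_t s,
                            size_t lanes)
{
    uint16_t splat[MAX_LANES16];

    assert(lanes <= MAX_LANES16 && s < 16);
    for (size_t i = 0; i < lanes; i++) {
        splat[i] = (uint16_t)s;          /* copy the scalar into each lane */
    }
    shlv16(d, a, splat, lanes);
}
```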

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op-vec.c | 35 +++++++++++------------------------
 1 file changed, 11 insertions(+), 24 deletions(-)

diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 4af92d6b0a..52c1b66283 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -781,7 +781,7 @@ void tcg_gen_rotrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 }
 
 static void do_shifts(unsigned vece, TCGv_vec r, TCGv_vec a,
-                      TCGv_i32 s, TCGOpcode opc_s, TCGOpcode opc_v)
+                      TCGv_i32 s, TCGOpcode opc)
 {
     TCGTemp *rt = tcgv_vec_temp(r);
     TCGTemp *at = tcgv_vec_temp(a);
@@ -790,48 +790,35 @@ static void do_shifts(unsigned vece, TCGv_vec r, TCGv_vec a,
     TCGArg ai = temp_arg(at);
     TCGArg si = temp_arg(st);
     TCGType type = rt->base_type;
-    const TCGOpcode *hold_list;
     int can;
 
     tcg_debug_assert(at->base_type >= type);
-    tcg_assert_listed_vecop(opc_s);
-    hold_list = tcg_swap_vecop_list(NULL);
-
-    can = tcg_can_emit_vec_op(opc_s, type, vece);
+    tcg_assert_listed_vecop(opc);
+    can = tcg_can_emit_vec_op(opc, type, vece);
     if (can > 0) {
-        vec_gen_3(opc_s, type, vece, ri, ai, si);
+        vec_gen_3(opc, type, vece, ri, ai, si);
     } else if (can < 0) {
-        tcg_expand_vec_op(opc_s, type, vece, ri, ai, si);
+        const TCGOpcode *hold_list = tcg_swap_vecop_list(NULL);
+        tcg_expand_vec_op(opc, type, vece, ri, ai, si);
+        tcg_swap_vecop_list(hold_list);
     } else {
-        TCGv_vec vec_s = tcg_temp_new_vec(type);
-
-        if (vece == MO_64) {
-            TCGv_i64 s64 = tcg_temp_new_i64();
-            tcg_gen_extu_i32_i64(s64, s);
-            tcg_gen_dup_i64_vec(MO_64, vec_s, s64);
-            tcg_temp_free_i64(s64);
-        } else {
-            tcg_gen_dup_i32_vec(vece, vec_s, s);
-        }
-        do_op3_nofail(vece, r, a, vec_s, opc_v);
-        tcg_temp_free_vec(vec_s);
+        g_assert_not_reached();
     }
-    tcg_swap_vecop_list(hold_list);
 }
 
 void tcg_gen_shls_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 b)
 {
-    do_shifts(vece, r, a, b, INDEX_op_shls_vec, INDEX_op_shlv_vec);
+    do_shifts(vece, r, a, b, INDEX_op_shls_vec);
 }
 
 void tcg_gen_shrs_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 b)
 {
-    do_shifts(vece, r, a, b, INDEX_op_shrs_vec, INDEX_op_shrv_vec);
+    do_shifts(vece, r, a, b, INDEX_op_shrs_vec);
 }
 
 void tcg_gen_sars_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 b)
 {
-    do_shifts(vece, r, a, b, INDEX_op_sars_vec, INDEX_op_sarv_vec);
+    do_shifts(vece, r, a, b, INDEX_op_sars_vec);
 }
 
 void tcg_gen_bitsel_vec(unsigned vece, TCGv_vec r, TCGv_vec a,
-- 
2.20.1




* [PATCH v2 31/36] tcg: Implement gvec support for rotate by scalar
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (29 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 30/36] tcg: Remove expansion to shift by vector from do_shifts Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-23 13:46   ` Alex Bennée
  2020-04-22  1:17 ` [PATCH v2 32/36] tcg/i386: Implement INDEX_op_rotl[is]_vec Richard Henderson
                   ` (5 subsequent siblings)
  36 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

There is no host backend support yet, but the interfaces for rotls
are in place.  Implement only left-rotate for now: the only known
user of vector rotate by scalar is s390x, so a right-rotate would
be both unused and untestable.
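
For reference, the semantics the new rotls interface provides can be modelled in C as below, assuming 32-bit lanes and a count already in [0, 32), as for the other shift-by-scalar operations. The helper names are hypothetical. The last function shows that, should a right rotate ever be needed, it can be expressed as a left rotate by (-s) & 31.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate one 32-bit lane left by n, with n in [0, 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n < 32);
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Rotate every lane left by the same scalar count (rotate by scalar). */
static void rotls32(uint32_t *d, const uint32_t *a, uint32_t shift,
                    size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        d[i] = rotl32(a[i], shift);
    }
}

/* A right rotate by s expressed as a left rotate by (-s) & 31. */
static uint32_t rotr32_via_rotl(uint32_t x, unsigned s)
{
    return rotl32(x, -s & 31u);
}
```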

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-op-gvec.h |  2 ++
 include/tcg/tcg-op.h      |  1 +
 include/tcg/tcg-opc.h     |  1 +
 include/tcg/tcg.h         |  1 +
 tcg/aarch64/tcg-target.h  |  1 +
 tcg/i386/tcg-target.h     |  1 +
 tcg/ppc/tcg-target.h      |  1 +
 tcg/tcg-op-gvec.c         | 22 ++++++++++++++++++++++
 tcg/tcg-op-vec.c          |  5 +++++
 tcg/tcg.c                 |  2 ++
 10 files changed, 37 insertions(+)

diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index 2d768f1160..c69a7de984 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -345,6 +345,8 @@ void tcg_gen_gvec_shrs(unsigned vece, uint32_t dofs, uint32_t aofs,
                        TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
 void tcg_gen_gvec_sars(unsigned vece, uint32_t dofs, uint32_t aofs,
                        TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_rotls(unsigned vece, uint32_t dofs, uint32_t aofs,
+                        TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
 
 /*
  * Perform vector shift by vector element, modulo the element size.
diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index 0468009713..d0319692ec 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -992,6 +992,7 @@ void tcg_gen_rotri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 void tcg_gen_shls_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
 void tcg_gen_shrs_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
 void tcg_gen_sars_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
+void tcg_gen_rotls_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
 
 void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
 void tcg_gen_shrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index d80335ba0d..d63c6bcb3d 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -250,6 +250,7 @@ DEF(rotli_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_roti_vec))
 DEF(shls_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec))
 DEF(shrs_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec))
 DEF(sars_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec))
+DEF(rotls_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_rots_vec))
 
 DEF(shlv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec))
 DEF(shrv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec))
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 6bb2e3fe3c..57d6b0216c 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -183,6 +183,7 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_andc_vec         0
 #define TCG_TARGET_HAS_orc_vec          0
 #define TCG_TARGET_HAS_roti_vec         0
+#define TCG_TARGET_HAS_rots_vec         0
 #define TCG_TARGET_HAS_rotv_vec         0
 #define TCG_TARGET_HAS_shi_vec          0
 #define TCG_TARGET_HAS_shs_vec          0
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index a5477bbc07..9bc2a5ecbe 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -134,6 +134,7 @@ typedef enum {
 #define TCG_TARGET_HAS_neg_vec          1
 #define TCG_TARGET_HAS_abs_vec          1
 #define TCG_TARGET_HAS_roti_vec         0
+#define TCG_TARGET_HAS_rots_vec         0
 #define TCG_TARGET_HAS_rotv_vec         0
 #define TCG_TARGET_HAS_shi_vec          1
 #define TCG_TARGET_HAS_shs_vec          0
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 4c806c97db..99ac1e3958 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -184,6 +184,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_neg_vec          0
 #define TCG_TARGET_HAS_abs_vec          1
 #define TCG_TARGET_HAS_roti_vec         0
+#define TCG_TARGET_HAS_rots_vec         0
 #define TCG_TARGET_HAS_rotv_vec         0
 #define TCG_TARGET_HAS_shi_vec          1
 #define TCG_TARGET_HAS_shs_vec          1
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 7993422526..4a17aebc5a 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -162,6 +162,7 @@ extern bool have_vsx;
 #define TCG_TARGET_HAS_neg_vec          have_isa_3_00
 #define TCG_TARGET_HAS_abs_vec          0
 #define TCG_TARGET_HAS_roti_vec         0
+#define TCG_TARGET_HAS_rots_vec         0
 #define TCG_TARGET_HAS_rotv_vec         0
 #define TCG_TARGET_HAS_shi_vec          0
 #define TCG_TARGET_HAS_shs_vec          0
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 2b71725883..3707c0effb 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -2976,6 +2976,28 @@ void tcg_gen_gvec_sars(unsigned vece, uint32_t dofs, uint32_t aofs,
     do_gvec_shifts(vece, dofs, aofs, shift, oprsz, maxsz, &g);
 }
 
+void tcg_gen_gvec_rotls(unsigned vece, uint32_t dofs, uint32_t aofs,
+                        TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz)
+{
+    static const GVecGen2sh g = {
+        .fni4 = tcg_gen_rotl_i32,
+        .fni8 = tcg_gen_rotl_i64,
+        .fniv_s = tcg_gen_rotls_vec,
+        .fniv_v = tcg_gen_rotlv_vec,
+        .fno = {
+            gen_helper_gvec_rotl8i,
+            gen_helper_gvec_rotl16i,
+            gen_helper_gvec_rotl32i,
+            gen_helper_gvec_rotl64i,
+        },
+        .s_list = { INDEX_op_rotls_vec, 0 },
+        .v_list = { INDEX_op_rotlv_vec, 0 },
+    };
+
+    tcg_debug_assert(vece <= MO_64);
+    do_gvec_shifts(vece, dofs, aofs, shift, oprsz, maxsz, &g);
+}
+
 /*
  * Expand D = A << (B % element bits)
  *
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 52c1b66283..1c12e31fbb 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -821,6 +821,11 @@ void tcg_gen_sars_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 b)
     do_shifts(vece, r, a, b, INDEX_op_sars_vec);
 }
 
+void tcg_gen_rotls_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s)
+{
+    do_shifts(vece, r, a, s, INDEX_op_rotls_vec);
+}
+
 void tcg_gen_bitsel_vec(unsigned vece, TCGv_vec r, TCGv_vec a,
                         TCGv_vec b, TCGv_vec c)
 {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 5a82464610..e8d06fe813 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1699,6 +1699,8 @@ bool tcg_op_supported(TCGOpcode op)
         return have_vec && TCG_TARGET_HAS_shv_vec;
     case INDEX_op_rotli_vec:
         return have_vec && TCG_TARGET_HAS_roti_vec;
+    case INDEX_op_rotls_vec:
+        return have_vec && TCG_TARGET_HAS_rots_vec;
     case INDEX_op_rotlv_vec:
     case INDEX_op_rotrv_vec:
         return have_vec && TCG_TARGET_HAS_rotv_vec;
-- 
2.20.1




* [PATCH v2 32/36] tcg/i386: Implement INDEX_op_rotl[is]_vec
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (30 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 31/36] tcg: Implement gvec support for rotate by scalar Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-22  1:17 ` [PATCH v2 33/36] tcg/aarch64: Implement INDEX_op_rotli_vec Richard Henderson
                   ` (4 subsequent siblings)
  36 siblings, 0 replies; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

We must continue to special-case 8-bit elements; the other
element sizes are trivially implemented with shifts.
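
The byte special case can be checked with a scalar model of one 16-bit lane, following the steps in the expand_vec_shi comment: duplicate the byte into both halves, shift the word left by imm, then shift right by 8. That leaves rol8(x, imm) in the low byte with a zero high byte, so PACKUSWB does not saturate. The helper names are hypothetical, and the wider-lane form assumes a nonzero count, since (8 << vece) - imm would otherwise equal the element width.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit lane of the MO_8 left-rotate expansion; imm in [0, 8). */
static uint8_t rotli8_via_words(uint8_t x, unsigned imm)
{
    uint16_t w;

    assert(imm < 8);
    w = (uint16_t)((x << 8) | x);   /* punpck*bw x,x: byte in both halves */
    w = (uint16_t)(w << imm);       /* 16-bit shift left: rol8 in high byte */
    w = (uint16_t)(w >> 8);         /* 16-bit shift right: high byte now 0 */
    return (uint8_t)w;              /* PACKUSWB: value <= 0xff, no saturation */
}

/* Wider lanes: (x << imm) | (x >> (16 - imm)), for imm in [1, 16). */
static uint16_t rotli16_via_shifts(uint16_t x, unsigned imm)
{
    assert(imm >= 1 && imm < 16);
    return (uint16_t)((x << imm) | (x >> (16 - imm)));
}
```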

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.inc.c | 85 +++++++++++++++++++++++++++++++--------
 1 file changed, 69 insertions(+), 16 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index deace219d2..6039ae4fc6 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -3255,6 +3255,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_shls_vec:
     case INDEX_op_shrs_vec:
     case INDEX_op_sars_vec:
+    case INDEX_op_rotls_vec:
     case INDEX_op_cmp_vec:
     case INDEX_op_x86_shufps_vec:
     case INDEX_op_x86_blend_vec:
@@ -3293,6 +3294,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_xor_vec:
     case INDEX_op_andc_vec:
         return 1;
+    case INDEX_op_rotli_vec:
     case INDEX_op_cmp_vec:
     case INDEX_op_cmpsel_vec:
         return -1;
@@ -3316,6 +3318,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 
     case INDEX_op_shls_vec:
     case INDEX_op_shrs_vec:
+    case INDEX_op_rotls_vec:
         return vece >= MO_16;
     case INDEX_op_sars_vec:
         return vece >= MO_16 && vece <= MO_32;
@@ -3353,7 +3356,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     }
 }
 
-static void expand_vec_shi(TCGType type, unsigned vece, bool shr,
+static void expand_vec_shi(TCGType type, unsigned vece, TCGOpcode opc,
                            TCGv_vec v0, TCGv_vec v1, TCGArg imm)
 {
     TCGv_vec t1, t2;
@@ -3363,26 +3366,31 @@ static void expand_vec_shi(TCGType type, unsigned vece, bool shr,
     t1 = tcg_temp_new_vec(type);
     t2 = tcg_temp_new_vec(type);
 
-    /* Unpack to W, shift, and repack.  Tricky bits:
-       (1) Use punpck*bw x,x to produce DDCCBBAA,
-           i.e. duplicate in other half of the 16-bit lane.
-       (2) For right-shift, add 8 so that the high half of
-           the lane becomes zero.  For left-shift, we must
-           shift up and down again.
-       (3) Step 2 leaves high half zero such that PACKUSWB
-           (pack with unsigned saturation) does not modify
-           the quantity.  */
+    /*
+     * Unpack to W, shift, and repack.  Tricky bits:
+     * (1) Use punpck*bw x,x to produce DDCCBBAA,
+     *     i.e. duplicate in other half of the 16-bit lane.
+     * (2) For right-shift, add 8 so that the high half of the lane
+     *     becomes zero.  For left-shift and left-rotate, we must
+     *     shift up and down again.
+     * (3) Step 2 leaves high half zero such that PACKUSWB
+     *     (pack with unsigned saturation) does not modify
+     *     the quantity.
+     */
     vec_gen_3(INDEX_op_x86_punpckl_vec, type, MO_8,
               tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(v1));
     vec_gen_3(INDEX_op_x86_punpckh_vec, type, MO_8,
               tcgv_vec_arg(t2), tcgv_vec_arg(v1), tcgv_vec_arg(v1));
 
-    if (shr) {
-        tcg_gen_shri_vec(MO_16, t1, t1, imm + 8);
-        tcg_gen_shri_vec(MO_16, t2, t2, imm + 8);
+    if (opc != INDEX_op_rotli_vec) {
+        imm += 8;
+    }
+    if (opc == INDEX_op_shri_vec) {
+        tcg_gen_shri_vec(MO_16, t1, t1, imm);
+        tcg_gen_shri_vec(MO_16, t2, t2, imm);
     } else {
-        tcg_gen_shli_vec(MO_16, t1, t1, imm + 8);
-        tcg_gen_shli_vec(MO_16, t2, t2, imm + 8);
+        tcg_gen_shli_vec(MO_16, t1, t1, imm);
+        tcg_gen_shli_vec(MO_16, t2, t2, imm);
         tcg_gen_shri_vec(MO_16, t1, t1, 8);
         tcg_gen_shri_vec(MO_16, t2, t2, 8);
     }
@@ -3449,6 +3457,43 @@ static void expand_vec_sari(TCGType type, unsigned vece,
     }
 }
 
+static void expand_vec_rotli(TCGType type, unsigned vece,
+                             TCGv_vec v0, TCGv_vec v1, TCGArg imm)
+{
+    TCGv_vec t;
+
+    if (vece == MO_8) {
+        expand_vec_shi(type, vece, INDEX_op_rotli_vec, v0, v1, imm);
+        return;
+    }
+
+    t = tcg_temp_new_vec(type);
+    tcg_gen_shli_vec(vece, t, v1, imm);
+    tcg_gen_shri_vec(vece, v0, v1, (8 << vece) - imm);
+    tcg_gen_or_vec(vece, v0, v0, t);
+    tcg_temp_free_vec(t);
+}
+
+static void expand_vec_rotls(TCGType type, unsigned vece,
+                             TCGv_vec v0, TCGv_vec v1, TCGv_i32 lsh)
+{
+    TCGv_i32 rsh;
+    TCGv_vec t;
+
+    tcg_debug_assert(vece != MO_8);
+
+    t = tcg_temp_new_vec(type);
+    rsh = tcg_temp_new_i32();
+
+    tcg_gen_neg_i32(rsh, lsh);
+    tcg_gen_andi_i32(rsh, rsh, (8 << vece) - 1);
+    tcg_gen_shls_vec(vece, t, v1, lsh);
+    tcg_gen_shrs_vec(vece, v0, v1, rsh);
+    tcg_gen_or_vec(vece, v0, v0, t);
+    tcg_temp_free_vec(t);
+    tcg_temp_free_i32(rsh);
+}
+
 static void expand_vec_mul(TCGType type, unsigned vece,
                            TCGv_vec v0, TCGv_vec v1, TCGv_vec v2)
 {
@@ -3658,13 +3703,21 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
     switch (opc) {
     case INDEX_op_shli_vec:
     case INDEX_op_shri_vec:
-        expand_vec_shi(type, vece, opc == INDEX_op_shri_vec, v0, v1, a2);
+        expand_vec_shi(type, vece, opc, v0, v1, a2);
         break;
 
     case INDEX_op_sari_vec:
         expand_vec_sari(type, vece, v0, v1, a2);
         break;
 
+    case INDEX_op_rotli_vec:
+        expand_vec_rotli(type, vece, v0, v1, a2);
+        break;
+
+    case INDEX_op_rotls_vec:
+        expand_vec_rotls(type, vece, v0, v1, temp_tcgv_i32(arg_temp(a2)));
+        break;
+
     case INDEX_op_mul_vec:
         v2 = temp_tcgv_vec(arg_temp(a2));
         expand_vec_mul(type, vece, v0, v1, v2);
-- 
2.20.1




* [PATCH v2 33/36] tcg/aarch64: Implement INDEX_op_rotli_vec
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (31 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 32/36] tcg/i386: Implement INDEX_op_rotl[is]_vec Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-22  1:17 ` [PATCH v2 34/36] tcg/ppc: Implement INDEX_op_rot[lr]v_vec Richard Henderson
                   ` (3 subsequent siblings)
  36 siblings, 0 replies; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC
  To: qemu-devel; +Cc: alex.bennee

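We can implement this in two instructions: USHR the input into a
temporary by the complementary amount, then use SLI to shift the
input left by the rotate amount and insert it above those bits.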
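
For illustration only, a scalar C model of one 16-bit lane (not QEMU
code; `ushr16`, `sli16` and `rotli16` are hypothetical names that
approximate the USHR and SLI semantics for a single element).  It
follows the same sequence as the expansion in this patch: shift right
by `-imm & 15`, then shift-left-insert the original value by `imm`.

```c
#include <stdint.h>

/* Hypothetical model of USHR on one 16-bit lane; shift is in 0..15 here. */
static uint16_t ushr16(uint16_t n, unsigned shift)
{
    return (uint16_t)(n >> shift);
}

/* Hypothetical model of SLI on one 16-bit lane: shift n left by `shift`
 * and insert it into d, preserving the low `shift` bits of d. */
static uint16_t sli16(uint16_t d, uint16_t n, unsigned shift)
{
    uint16_t mask = (uint16_t)(0xffffu << shift);   /* bits SLI overwrites */
    return (uint16_t)((d & ~mask) | ((n << shift) & mask));
}

/* Rotate left by imm (0..15) using the two-step USHR + SLI sequence. */
static uint16_t rotli16(uint16_t n, unsigned imm)
{
    uint16_t t = ushr16(n, -imm & 15);   /* high bits moved to the bottom */
    return sli16(t, n, imm);             /* low bits shifted up on top */
}
```

For example, `rotli16(0x1234, 4)` evaluates to 0x2341, the same as a
plain 16-bit rotate left by 4.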

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.opc.h |  1 +
 tcg/aarch64/tcg-target.inc.c | 20 +++++++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/tcg/aarch64/tcg-target.opc.h b/tcg/aarch64/tcg-target.opc.h
index 26bfd9c460..bce30accd9 100644
--- a/tcg/aarch64/tcg-target.opc.h
+++ b/tcg/aarch64/tcg-target.opc.h
@@ -12,3 +12,4 @@
  */
 
 DEF(aa64_sshl_vec, 1, 2, 0, IMPLVEC)
+DEF(aa64_sli_vec, 1, 2, 1, IMPLVEC)
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 3b5a5d78c7..4bc9b30254 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -557,6 +557,7 @@ typedef enum {
     I3614_SSHR      = 0x0f000400,
     I3614_SSRA      = 0x0f001400,
     I3614_SHL       = 0x0f005400,
+    I3614_SLI       = 0x2f005400,
     I3614_USHR      = 0x2f000400,
     I3614_USRA      = 0x2f001400,
 
@@ -2402,6 +2403,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_sari_vec:
         tcg_out_insn(s, 3614, SSHR, is_q, a0, a1, (16 << vece) - a2);
         break;
+    case INDEX_op_aa64_sli_vec:
+        tcg_out_insn(s, 3614, SLI, is_q, a0, a2, args[3] + (8 << vece));
+        break;
     case INDEX_op_shlv_vec:
         tcg_out_insn(s, 3616, USHL, is_q, vece, a0, a1, a2);
         break;
@@ -2488,6 +2492,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_shlv_vec:
     case INDEX_op_bitsel_vec:
         return 1;
+    case INDEX_op_rotli_vec:
     case INDEX_op_shrv_vec:
     case INDEX_op_sarv_vec:
         return -1;
@@ -2508,13 +2513,23 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
 {
     va_list va;
     TCGv_vec v0, v1, v2, t1;
+    TCGArg a2;
 
     va_start(va, a0);
     v0 = temp_tcgv_vec(arg_temp(a0));
     v1 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
-    v2 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
+    a2 = va_arg(va, TCGArg);
+    v2 = temp_tcgv_vec(arg_temp(a2));
 
     switch (opc) {
+    case INDEX_op_rotli_vec:
+        t1 = tcg_temp_new_vec(type);
+        tcg_gen_shri_vec(vece, t1, v1, -a2 & ((8 << vece) - 1));
+        vec_gen_4(INDEX_op_aa64_sli_vec, type, vece,
+                  tcgv_vec_arg(v0), tcgv_vec_arg(t1), tcgv_vec_arg(v1), a2);
+        tcg_temp_free_vec(t1);
+        break;
+
     case INDEX_op_shrv_vec:
     case INDEX_op_sarv_vec:
         /* Right shifts are negative left shifts for AArch64.  */
@@ -2547,6 +2562,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     static const TCGTargetOpDef lZ_l = { .args_ct_str = { "lZ", "l" } };
     static const TCGTargetOpDef r_r_r = { .args_ct_str = { "r", "r", "r" } };
     static const TCGTargetOpDef w_w_w = { .args_ct_str = { "w", "w", "w" } };
+    static const TCGTargetOpDef w_0_w = { .args_ct_str = { "w", "0", "w" } };
     static const TCGTargetOpDef w_w_wO = { .args_ct_str = { "w", "w", "wO" } };
     static const TCGTargetOpDef w_w_wN = { .args_ct_str = { "w", "w", "wN" } };
     static const TCGTargetOpDef w_w_wZ = { .args_ct_str = { "w", "w", "wZ" } };
@@ -2741,6 +2757,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         return &w_w_wZ;
     case INDEX_op_bitsel_vec:
         return &w_w_w_w;
+    case INDEX_op_aa64_sli_vec:
+        return &w_0_w;
 
     default:
         return NULL;
-- 
2.20.1




* [PATCH v2 34/36] tcg/ppc: Implement INDEX_op_rot[lr]v_vec
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (32 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 33/36] tcg/aarch64: Implement INDEX_op_rotli_vec Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-22  1:17 ` [PATCH v2 35/36] target/ppc: Use tcg_gen_gvec_rotlv Richard Henderson
                   ` (2 subsequent siblings)
  36 siblings, 0 replies; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC
  To: qemu-devel; +Cc: alex.bennee

We already supported rotlv through the target-specific ppc_rotl_vec
opcode; convert it to the generic rotlv_vec opcode.  Expand rotrv as
rotlv by the negated count, and expand rotli via expand_vec_shi.
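
A scalar sketch of why negation suffices, assuming (as the removed
vrl* helpers did with their 0x1F mask) that each lane's rotate count
is taken modulo the element width.  `rotl32_lane` and `rotr32_lane`
are hypothetical names modelling a single 32-bit lane, not QEMU code.

```c
#include <stdint.h>

/* Model of one 32-bit lane of vrlw: rotate left, count taken mod 32. */
static uint32_t rotl32_lane(uint32_t x, uint32_t n)
{
    n &= 31;
    /* Avoid the undefined shift by 32 when n == 0. */
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* rotrv expanded as in this patch: negate the count, then rotate left.
 * Since the count is reduced mod 32, -n behaves like 32 - n. */
static uint32_t rotr32_lane(uint32_t x, uint32_t n)
{
    return rotl32_lane(x, -n);
}
```

For example, `rotr32_lane(0x12345678, 8)` yields 0x78123456.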

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/ppc/tcg-target.h     |  2 +-
 tcg/ppc/tcg-target.opc.h |  1 -
 tcg/ppc/tcg-target.inc.c | 23 +++++++++++++++++++----
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 4a17aebc5a..be5b2901c3 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -163,7 +163,7 @@ extern bool have_vsx;
 #define TCG_TARGET_HAS_abs_vec          0
 #define TCG_TARGET_HAS_roti_vec         0
 #define TCG_TARGET_HAS_rots_vec         0
-#define TCG_TARGET_HAS_rotv_vec         0
+#define TCG_TARGET_HAS_rotv_vec         1
 #define TCG_TARGET_HAS_shi_vec          0
 #define TCG_TARGET_HAS_shs_vec          0
 #define TCG_TARGET_HAS_shv_vec          1
diff --git a/tcg/ppc/tcg-target.opc.h b/tcg/ppc/tcg-target.opc.h
index 1373f77e82..db514403c3 100644
--- a/tcg/ppc/tcg-target.opc.h
+++ b/tcg/ppc/tcg-target.opc.h
@@ -30,4 +30,3 @@ DEF(ppc_msum_vec, 1, 3, 0, IMPLVEC)
 DEF(ppc_muleu_vec, 1, 2, 0, IMPLVEC)
 DEF(ppc_mulou_vec, 1, 2, 0, IMPLVEC)
 DEF(ppc_pkum_vec, 1, 2, 0, IMPLVEC)
-DEF(ppc_rotl_vec, 1, 2, 0, IMPLVEC)
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 3333b55766..3f9690418f 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -2988,6 +2988,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_shlv_vec:
     case INDEX_op_shrv_vec:
     case INDEX_op_sarv_vec:
+    case INDEX_op_rotlv_vec:
         return vece <= MO_32 || have_isa_2_07;
     case INDEX_op_ssadd_vec:
     case INDEX_op_sssub_vec:
@@ -2998,6 +2999,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_shli_vec:
     case INDEX_op_shri_vec:
     case INDEX_op_sari_vec:
+    case INDEX_op_rotli_vec:
         return vece <= MO_32 || have_isa_2_07 ? -1 : 0;
     case INDEX_op_neg_vec:
         return vece >= MO_32 && have_isa_3_00;
@@ -3012,6 +3014,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
         return 0;
     case INDEX_op_bitsel_vec:
         return have_vsx;
+    case INDEX_op_rotrv_vec:
+        return -1;
     default:
         return 0;
     }
@@ -3294,7 +3298,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ppc_pkum_vec:
         insn = pkum_op[vece];
         break;
-    case INDEX_op_ppc_rotl_vec:
+    case INDEX_op_rotlv_vec:
         insn = rotl_op[vece];
         break;
     case INDEX_op_ppc_msum_vec:
@@ -3401,7 +3405,7 @@ static void expand_vec_mul(TCGType type, unsigned vece, TCGv_vec v0,
         t3 = tcg_temp_new_vec(type);
         t4 = tcg_temp_new_vec(type);
         tcg_gen_dupi_vec(MO_8, t4, -16);
-        vec_gen_3(INDEX_op_ppc_rotl_vec, type, MO_32, tcgv_vec_arg(t1),
+        vec_gen_3(INDEX_op_rotlv_vec, type, MO_32, tcgv_vec_arg(t1),
                   tcgv_vec_arg(v2), tcgv_vec_arg(t4));
         vec_gen_3(INDEX_op_ppc_mulou_vec, type, MO_16, tcgv_vec_arg(t2),
                   tcgv_vec_arg(v1), tcgv_vec_arg(v2));
@@ -3426,7 +3430,7 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
                        TCGArg a0, ...)
 {
     va_list va;
-    TCGv_vec v0, v1, v2;
+    TCGv_vec v0, v1, v2, t0;
     TCGArg a2;
 
     va_start(va, a0);
@@ -3444,6 +3448,9 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
     case INDEX_op_sari_vec:
         expand_vec_shi(type, vece, v0, v1, a2, INDEX_op_sarv_vec);
         break;
+    case INDEX_op_rotli_vec:
+        expand_vec_shi(type, vece, v0, v1, a2, INDEX_op_rotlv_vec);
+        break;
     case INDEX_op_cmp_vec:
         v2 = temp_tcgv_vec(arg_temp(a2));
         expand_vec_cmp(type, vece, v0, v1, v2, va_arg(va, TCGArg));
@@ -3452,6 +3459,13 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
         v2 = temp_tcgv_vec(arg_temp(a2));
         expand_vec_mul(type, vece, v0, v1, v2);
         break;
+    case INDEX_op_rotrv_vec:
+        v2 = temp_tcgv_vec(arg_temp(a2));
+        t0 = tcg_temp_new_vec(type);
+        tcg_gen_neg_vec(vece, t0, v2);
+        tcg_gen_rotlv_vec(vece, v0, v1, t0);
+        tcg_temp_free_vec(t0);
+        break;
     default:
         g_assert_not_reached();
     }
@@ -3656,12 +3670,13 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_shlv_vec:
     case INDEX_op_shrv_vec:
     case INDEX_op_sarv_vec:
+    case INDEX_op_rotlv_vec:
+    case INDEX_op_rotrv_vec:
     case INDEX_op_ppc_mrgh_vec:
     case INDEX_op_ppc_mrgl_vec:
     case INDEX_op_ppc_muleu_vec:
     case INDEX_op_ppc_mulou_vec:
     case INDEX_op_ppc_pkum_vec:
-    case INDEX_op_ppc_rotl_vec:
     case INDEX_op_dup2_vec:
         return &v_v_v;
     case INDEX_op_not_vec:
-- 
2.20.1




* [PATCH v2 35/36] target/ppc: Use tcg_gen_gvec_rotlv
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (33 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 34/36] tcg/ppc: Implement INDEX_op_rot[lr]v_vec Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-22  1:17 ` [PATCH v2 36/36] target/s390x: Use tcg_gen_gvec_rotl{i,s,v} Richard Henderson
  2020-04-23 13:50 ` [PATCH v2 00/36] tcg 5.1 omnibus patch set Alex Bennée
  36 siblings, 0 replies; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC
  To: qemu-devel; +Cc: alex.bennee, David Gibson

Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/helper.h                 |  4 ----
 target/ppc/int_helper.c             | 17 -----------------
 target/ppc/translate/vmx-impl.inc.c |  8 ++++----
 3 files changed, 4 insertions(+), 25 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index a95c010391..b0114fc915 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -213,10 +213,6 @@ DEF_HELPER_3(vsubuqm, void, avr, avr, avr)
 DEF_HELPER_4(vsubecuq, void, avr, avr, avr, avr)
 DEF_HELPER_4(vsubeuqm, void, avr, avr, avr, avr)
 DEF_HELPER_3(vsubcuq, void, avr, avr, avr)
-DEF_HELPER_3(vrlb, void, avr, avr, avr)
-DEF_HELPER_3(vrlh, void, avr, avr, avr)
-DEF_HELPER_3(vrlw, void, avr, avr, avr)
-DEF_HELPER_3(vrld, void, avr, avr, avr)
 DEF_HELPER_4(vsldoi, void, avr, avr, avr, i32)
 DEF_HELPER_3(vextractub, void, avr, avr, i32)
 DEF_HELPER_3(vextractuh, void, avr, avr, i32)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 6d238b989d..ee308da2ca 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1347,23 +1347,6 @@ VRFI(p, float_round_up)
 VRFI(z, float_round_to_zero)
 #undef VRFI
 
-#define VROTATE(suffix, element, mask)                                  \
-    void helper_vrl##suffix(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)   \
-    {                                                                   \
-        int i;                                                          \
-                                                                        \
-        for (i = 0; i < ARRAY_SIZE(r->element); i++) {                  \
-            unsigned int shift = b->element[i] & mask;                  \
-            r->element[i] = (a->element[i] << shift) |                  \
-                (a->element[i] >> (sizeof(a->element[0]) * 8 - shift)); \
-        }                                                               \
-    }
-VROTATE(b, u8, 0x7)
-VROTATE(h, u16, 0xF)
-VROTATE(w, u32, 0x1F)
-VROTATE(d, u64, 0x3F)
-#undef VROTATE
-
 void helper_vrsqrtefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
 {
     int i;
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 403ed3a01c..de2fd136ff 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -900,13 +900,13 @@ GEN_VXFORM3(vsubeuqm, 31, 0);
 GEN_VXFORM3(vsubecuq, 31, 0);
 GEN_VXFORM_DUAL(vsubeuqm, PPC_NONE, PPC2_ALTIVEC_207, \
             vsubecuq, PPC_NONE, PPC2_ALTIVEC_207)
-GEN_VXFORM(vrlb, 2, 0);
-GEN_VXFORM(vrlh, 2, 1);
-GEN_VXFORM(vrlw, 2, 2);
+GEN_VXFORM_V(vrlb, MO_8, tcg_gen_gvec_rotlv, 2, 0);
+GEN_VXFORM_V(vrlh, MO_16, tcg_gen_gvec_rotlv, 2, 1);
+GEN_VXFORM_V(vrlw, MO_32, tcg_gen_gvec_rotlv, 2, 2);
 GEN_VXFORM(vrlwmi, 2, 2);
 GEN_VXFORM_DUAL(vrlw, PPC_ALTIVEC, PPC_NONE, \
                 vrlwmi, PPC_NONE, PPC2_ISA300)
-GEN_VXFORM(vrld, 2, 3);
+GEN_VXFORM_V(vrld, MO_64, tcg_gen_gvec_rotlv, 2, 3);
 GEN_VXFORM(vrldmi, 2, 3);
 GEN_VXFORM_DUAL(vrld, PPC_NONE, PPC2_ALTIVEC_207, \
                 vrldmi, PPC_NONE, PPC2_ISA300)
-- 
2.20.1




* [PATCH v2 36/36] target/s390x: Use tcg_gen_gvec_rotl{i,s,v}
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (34 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 35/36] target/ppc: Use tcg_gen_gvec_rotlv Richard Henderson
@ 2020-04-22  1:17 ` Richard Henderson
  2020-04-23 13:50 ` [PATCH v2 00/36] tcg 5.1 omnibus patch set Alex Bennée
  36 siblings, 0 replies; 75+ messages in thread
From: Richard Henderson @ 2020-04-22  1:17 UTC
  To: qemu-devel; +Cc: alex.bennee, David Hildenbrand

Merge VERLLV into op_vesv and VERLL into op_ves, alongside all of
the other vector shift operations, and remove the now-unused helpers.
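
A sketch of the per-element semantics being relied on, assuming the
generic rotlv takes each count modulo the element width, just as the
removed rol16-based helper did.  Plain arrays stand in for the 128-bit
register, and `verllv16_model` is a hypothetical name, not a QEMU
helper.

```c
#include <stddef.h>
#include <stdint.h>

#define ELEMS16 (128 / 16)   /* eight 16-bit elements in a 128-bit register */

/* Rotate one 16-bit element left; the count is reduced modulo 16. */
static uint16_t rol16_mod(uint16_t x, unsigned n)
{
    n &= 15;
    return n ? (uint16_t)((x << n) | (x >> (16 - n))) : x;
}

/* Hypothetical model of VERLLV with ES=1: v1[i] = v2[i] rotl v3[i]. */
static void verllv16_model(uint16_t v1[ELEMS16], const uint16_t v2[ELEMS16],
                           const uint16_t v3[ELEMS16])
{
    for (size_t i = 0; i < ELEMS16; i++) {
        v1[i] = rol16_mod(v2[i], v3[i]);
    }
}
```

Because counts wrap modulo 16, a count of 20 behaves like 4 and a
count of 16 leaves the element unchanged.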

Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/s390x/helper.h           |  4 --
 target/s390x/translate_vx.inc.c | 66 +++++----------------------------
 target/s390x/vec_int_helper.c   | 31 ----------------
 target/s390x/insn-data.def      |  4 +-
 4 files changed, 11 insertions(+), 94 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index b5813c2ac2..b7887b552b 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -198,10 +198,6 @@ DEF_HELPER_FLAGS_4(gvec_vmlo16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vmlo32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vpopct8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vpopct16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
-DEF_HELPER_FLAGS_4(gvec_verllv8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
-DEF_HELPER_FLAGS_4(gvec_verllv16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
-DEF_HELPER_FLAGS_4(gvec_verll8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
-DEF_HELPER_FLAGS_4(gvec_verll16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_verim8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_verim16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vsl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 12347f8a03..eb767f5288 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1825,63 +1825,6 @@ static DisasJumpType op_vpopct(DisasContext *s, DisasOps *o)
     return DISAS_NEXT;
 }
 
-static void gen_rll_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
-{
-    TCGv_i32 t0 = tcg_temp_new_i32();
-
-    tcg_gen_andi_i32(t0, b, 31);
-    tcg_gen_rotl_i32(d, a, t0);
-    tcg_temp_free_i32(t0);
-}
-
-static void gen_rll_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
-{
-    TCGv_i64 t0 = tcg_temp_new_i64();
-
-    tcg_gen_andi_i64(t0, b, 63);
-    tcg_gen_rotl_i64(d, a, t0);
-    tcg_temp_free_i64(t0);
-}
-
-static DisasJumpType op_verllv(DisasContext *s, DisasOps *o)
-{
-    const uint8_t es = get_field(s, m4);
-    static const GVecGen3 g[4] = {
-        { .fno = gen_helper_gvec_verllv8, },
-        { .fno = gen_helper_gvec_verllv16, },
-        { .fni4 = gen_rll_i32, },
-        { .fni8 = gen_rll_i64, },
-    };
-
-    if (es > ES_64) {
-        gen_program_exception(s, PGM_SPECIFICATION);
-        return DISAS_NORETURN;
-    }
-
-    gen_gvec_3(get_field(s, v1), get_field(s, v2),
-               get_field(s, v3), &g[es]);
-    return DISAS_NEXT;
-}
-
-static DisasJumpType op_verll(DisasContext *s, DisasOps *o)
-{
-    const uint8_t es = get_field(s, m4);
-    static const GVecGen2s g[4] = {
-        { .fno = gen_helper_gvec_verll8, },
-        { .fno = gen_helper_gvec_verll16, },
-        { .fni4 = gen_rll_i32, },
-        { .fni8 = gen_rll_i64, },
-    };
-
-    if (es > ES_64) {
-        gen_program_exception(s, PGM_SPECIFICATION);
-        return DISAS_NORETURN;
-    }
-    gen_gvec_2s(get_field(s, v1), get_field(s, v3), o->addr1,
-                &g[es]);
-    return DISAS_NEXT;
-}
-
 static void gen_rim_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, int32_t c)
 {
     TCGv_i32 t = tcg_temp_new_i32();
@@ -1946,6 +1889,9 @@ static DisasJumpType op_vesv(DisasContext *s, DisasOps *o)
     case 0x70:
         gen_gvec_fn_3(shlv, es, v1, v2, v3);
         break;
+    case 0x73:
+        gen_gvec_fn_3(rotlv, es, v1, v2, v3);
+        break;
     case 0x7a:
         gen_gvec_fn_3(sarv, es, v1, v2, v3);
         break;
@@ -1977,6 +1923,9 @@ static DisasJumpType op_ves(DisasContext *s, DisasOps *o)
         case 0x30:
             gen_gvec_fn_2i(shli, es, v1, v3, d2);
             break;
+        case 0x33:
+            gen_gvec_fn_2i(rotli, es, v1, v3, d2);
+            break;
         case 0x3a:
             gen_gvec_fn_2i(sari, es, v1, v3, d2);
             break;
@@ -1994,6 +1943,9 @@ static DisasJumpType op_ves(DisasContext *s, DisasOps *o)
         case 0x30:
             gen_gvec_fn_2s(shls, es, v1, v3, shift);
             break;
+        case 0x33:
+            gen_gvec_fn_2s(rotls, es, v1, v3, shift);
+            break;
         case 0x3a:
             gen_gvec_fn_2s(sars, es, v1, v3, shift);
             break;
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 0d6bc13dd6..5561b3ed90 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -515,37 +515,6 @@ void HELPER(gvec_vpopct##BITS)(void *v1, const void *v2, uint32_t desc)        \
 DEF_VPOPCT(8)
 DEF_VPOPCT(16)
 
-#define DEF_VERLLV(BITS)                                                       \
-void HELPER(gvec_verllv##BITS)(void *v1, const void *v2, const void *v3,       \
-                               uint32_t desc)                                  \
-{                                                                              \
-    int i;                                                                     \
-                                                                               \
-    for (i = 0; i < (128 / BITS); i++) {                                       \
-        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
-        const uint##BITS##_t b = s390_vec_read_element##BITS(v3, i);           \
-                                                                               \
-        s390_vec_write_element##BITS(v1, i, rol##BITS(a, b));                  \
-    }                                                                          \
-}
-DEF_VERLLV(8)
-DEF_VERLLV(16)
-
-#define DEF_VERLL(BITS)                                                        \
-void HELPER(gvec_verll##BITS)(void *v1, const void *v2, uint64_t count,        \
-                              uint32_t desc)                                   \
-{                                                                              \
-    int i;                                                                     \
-                                                                               \
-    for (i = 0; i < (128 / BITS); i++) {                                       \
-        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
-                                                                               \
-        s390_vec_write_element##BITS(v1, i, rol##BITS(a, count));              \
-    }                                                                          \
-}
-DEF_VERLL(8)
-DEF_VERLL(16)
-
 #define DEF_VERIM(BITS)                                                        \
 void HELPER(gvec_verim##BITS)(void *v1, const void *v2, const void *v3,        \
                               uint32_t desc)                                   \
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 2bc77f0871..91ddaedd84 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1147,8 +1147,8 @@
 /* VECTOR POPULATION COUNT */
     F(0xe750, VPOPCT,  VRR_a, V,   0, 0, 0, 0, vpopct, 0, IF_VEC)
 /* VECTOR ELEMENT ROTATE LEFT LOGICAL */
-    F(0xe773, VERLLV,  VRR_c, V,   0, 0, 0, 0, verllv, 0, IF_VEC)
-    F(0xe733, VERLL,   VRS_a, V,   la2, 0, 0, 0, verll, 0, IF_VEC)
+    F(0xe773, VERLLV,  VRR_c, V,   0, 0, 0, 0, vesv, 0, IF_VEC)
+    F(0xe733, VERLL,   VRS_a, V,   la2, 0, 0, 0, ves, 0, IF_VEC)
 /* VECTOR ELEMENT ROTATE AND INSERT UNDER MASK */
     F(0xe772, VERIM,   VRI_d, V,   0, 0, 0, 0, verim, 0, IF_VEC)
 /* VECTOR ELEMENT SHIFT LEFT */
-- 
2.20.1




* Re: [PATCH v2 20/36] tcg: Remove movi and dupi opcodes
  2020-04-22  1:17 ` [PATCH v2 20/36] tcg: Remove movi and dupi opcodes Richard Henderson
@ 2020-04-22  9:12   ` Aleksandar Markovic
  2020-04-22 19:03   ` Alex Bennée
  1 sibling, 0 replies; 75+ messages in thread
From: Aleksandar Markovic @ 2020-04-22  9:12 UTC
  To: Richard Henderson; +Cc: alex.bennee, qemu-devel


On Wednesday, 22 April 2020, Richard Henderson <richard.henderson@linaro.org>
wrote:

> These are now completely covered by mov from a
> TYPE_CONST temporary.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---


Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>


>  include/tcg/tcg-opc.h        |  3 ---
>  tcg/aarch64/tcg-target.inc.c |  3 ---
>  tcg/arm/tcg-target.inc.c     |  1 -
>  tcg/i386/tcg-target.inc.c    |  3 ---
>  tcg/mips/tcg-target.inc.c    |  2 --
>  tcg/optimize.c               |  4 ----
>  tcg/ppc/tcg-target.inc.c     |  3 ---
>  tcg/riscv/tcg-target.inc.c   |  2 --
>  tcg/s390/tcg-target.inc.c    |  2 --
>  tcg/sparc/tcg-target.inc.c   |  2 --
>  tcg/tcg-op-vec.c             |  1 -
>  tcg/tcg.c                    | 18 +-----------------
>  tcg/tci/tcg-target.inc.c     |  2 --
>  13 files changed, 1 insertion(+), 45 deletions(-)
>
> diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
> index 7dee9b38f7..4a9cbf5426 100644
> --- a/include/tcg/tcg-opc.h
> +++ b/include/tcg/tcg-opc.h
> @@ -45,7 +45,6 @@ DEF(br, 0, 0, 1, TCG_OPF_BB_END)
>  DEF(mb, 0, 0, 1, 0)
>
>  DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
> -DEF(movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
>  DEF(setcond_i32, 1, 2, 1, 0)
>  DEF(movcond_i32, 1, 4, 1, IMPL(TCG_TARGET_HAS_movcond_i32))
>  /* load/store */
> @@ -110,7 +109,6 @@ DEF(ctz_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_ctz_i32))
>  DEF(ctpop_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_ctpop_i32))
>
>  DEF(mov_i64, 1, 1, 0, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
> -DEF(movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
>  DEF(setcond_i64, 1, 2, 1, IMPL64)
>  DEF(movcond_i64, 1, 4, 1, IMPL64 | IMPL(TCG_TARGET_HAS_movcond_i64))
>  /* load/store */
> @@ -215,7 +213,6 @@ DEF(qemu_st_i64, 0, TLADDR_ARGS + DATA64_ARGS, 1,
>  #define IMPLVEC  TCG_OPF_VECTOR | IMPL(TCG_TARGET_MAYBE_vec)
>
>  DEF(mov_vec, 1, 1, 0, TCG_OPF_VECTOR | TCG_OPF_NOT_PRESENT)
> -DEF(dupi_vec, 1, 0, 1, TCG_OPF_VECTOR | TCG_OPF_NOT_PRESENT)
>
>  DEF(dup_vec, 1, 1, 0, IMPLVEC)
>  DEF(dup2_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_REG_BITS == 32))
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index 843fd0ca69..7918aeb9d5 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -2261,8 +2261,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
> -    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> -    case INDEX_op_movi_i64:
>      case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
>      default:
>          g_assert_not_reached();
> @@ -2467,7 +2465,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
>          break;
>
>      case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
> -    case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
>      case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
>      default:
>          g_assert_not_reached();
> diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
> index 6aa7757aac..b967499fa4 100644
> --- a/tcg/arm/tcg-target.inc.c
> +++ b/tcg/arm/tcg-target.inc.c
> @@ -2068,7 +2068,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
> -    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
>      case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
>      default:
>          tcg_abort();
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index ec083bddcf..320a4bddd1 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -2678,8 +2678,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
> -    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> -    case INDEX_op_movi_i64:
>      case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
>      default:
>          tcg_abort();
> @@ -2965,7 +2963,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
>          break;
>
>      case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
> -    case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
>      case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
>      default:
>          g_assert_not_reached();
> diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
> index 4d32ebc1df..09dc5a94fa 100644
> --- a/tcg/mips/tcg-target.inc.c
> +++ b/tcg/mips/tcg-target.inc.c
> @@ -2155,8 +2155,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
> -    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> -    case INDEX_op_movi_i64:
>      case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
>      default:
>          tcg_abort();
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index dd5187be31..9a2c945dbe 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -1099,10 +1099,6 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64_VEC(mov):
>              tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
>              break;
> -        CASE_OP_32_64(movi):
> -        case INDEX_op_dupi_vec:
> -            tcg_opt_gen_movi(s, &temps_used, op, op->args[0], op->args[1]);
> -            break;
>
>          case INDEX_op_dup_vec:
>              if (arg_is_const(op->args[1])) {
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index ee1f9227c1..fb390ad978 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -2967,8 +2967,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>
>      case INDEX_op_mov_i32:   /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
> -    case INDEX_op_movi_i32:  /* Always emitted via tcg_out_movi.  */
> -    case INDEX_op_movi_i64:
>      case INDEX_op_call:      /* Always emitted via tcg_out_call.  */
>      default:
>          tcg_abort();
> @@ -3310,7 +3308,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
>          return;
>
>      case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
> -    case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
>      case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
>      default:
>          g_assert_not_reached();
> diff --git a/tcg/riscv/tcg-target.inc.c b/tcg/riscv/tcg-target.inc.c
> index 2bc0ba71f2..ec609272ad 100644
> --- a/tcg/riscv/tcg-target.inc.c
> +++ b/tcg/riscv/tcg-target.inc.c
> @@ -1606,8 +1606,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
> -    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> -    case INDEX_op_movi_i64:
>      case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
>      default:
>          g_assert_not_reached();
> diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
> index b07e9ff7d6..f6b003a700 100644
> --- a/tcg/s390/tcg-target.inc.c
> +++ b/tcg/s390/tcg-target.inc.c
> @@ -2310,8 +2310,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
> -    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> -    case INDEX_op_movi_i64:
>      case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
>      default:
>          tcg_abort();
> diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
> index 65fddb310d..0808b79eee 100644
> --- a/tcg/sparc/tcg-target.inc.c
> +++ b/tcg/sparc/tcg-target.inc.c
> @@ -1591,8 +1591,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
> -    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> -    case INDEX_op_movi_i64:
>      case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
>      default:
>          tcg_abort();
> diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
> index 655b3ae32d..6343046e18 100644
> --- a/tcg/tcg-op-vec.c
> +++ b/tcg/tcg-op-vec.c
> @@ -83,7 +83,6 @@ bool tcg_can_emit_vecop_list(const TCGOpcode *list,
>          case INDEX_op_xor_vec:
>          case INDEX_op_mov_vec:
>          case INDEX_op_dup_vec:
> -        case INDEX_op_dupi_vec:
>          case INDEX_op_dup2_vec:
>          case INDEX_op_ld_vec:
>          case INDEX_op_st_vec:
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 59beb2bf29..adb71f16ae 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1463,7 +1463,6 @@ bool tcg_op_supported(TCGOpcode op)
>          return TCG_TARGET_HAS_goto_ptr;
>
>      case INDEX_op_mov_i32:
> -    case INDEX_op_movi_i32:
>      case INDEX_op_setcond_i32:
>      case INDEX_op_brcond_i32:
>      case INDEX_op_ld8u_i32:
> @@ -1557,7 +1556,6 @@ bool tcg_op_supported(TCGOpcode op)
>          return TCG_TARGET_REG_BITS == 32;
>
>      case INDEX_op_mov_i64:
> -    case INDEX_op_movi_i64:
>      case INDEX_op_setcond_i64:
>      case INDEX_op_brcond_i64:
>      case INDEX_op_ld8u_i64:
> @@ -1663,7 +1661,6 @@ bool tcg_op_supported(TCGOpcode op)
>
>      case INDEX_op_mov_vec:
>      case INDEX_op_dup_vec:
> -    case INDEX_op_dupi_vec:
>      case INDEX_op_dupm_vec:
>      case INDEX_op_ld_vec:
>      case INDEX_op_st_vec:
> @@ -3447,7 +3444,7 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
>  }
>
>  /*
> - * Specialized code generation for INDEX_op_movi_*.
> + * Specialized code generation for INDEX_op_mov_* with a constant.
>   */
>  static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
>                                    tcg_target_ulong val, TCGLifeData arg_life,
> @@ -3470,14 +3467,6 @@ static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
>      }
>  }
>
> -static void tcg_reg_alloc_movi(TCGContext *s, const TCGOp *op)
> -{
> -    TCGTemp *ots = arg_temp(op->args[0]);
> -    tcg_target_ulong val = op->args[1];
> -
> -    tcg_reg_alloc_do_movi(s, ots, val, op->life, op->output_pref[0]);
> -}
> -
>  /*
>   * Specialized code generation for INDEX_op_mov_*.
>   */
> @@ -4263,11 +4252,6 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>          case INDEX_op_mov_vec:
>              tcg_reg_alloc_mov(s, op);
>              break;
> -        case INDEX_op_movi_i32:
> -        case INDEX_op_movi_i64:
> -        case INDEX_op_dupi_vec:
> -            tcg_reg_alloc_movi(s, op);
> -            break;
>          case INDEX_op_dup_vec:
>              tcg_reg_alloc_dup(s, op);
>              break;
> diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
> index 1f1639df0d..b796f4fc19 100644
> --- a/tcg/tci/tcg-target.inc.c
> +++ b/tcg/tci/tcg-target.inc.c
> @@ -815,8 +815,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>          break;
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
> -    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> -    case INDEX_op_movi_i64:
>      case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
>      default:
>          tcg_abort();
> --
> 2.20.1
>
>
>
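With `movi_i32`, `movi_i64` and `dupi_vec` gone, loading a constant becomes an ordinary `mov` whose source is a constant temp, and code generation picks the immediate path itself (hence the updated comment on `tcg_reg_alloc_do_movi`). A minimal sketch of that dispatch, assuming a toy two-kind temp model; `ToyTemp`, `ToyKind` and `emit_mov` are hypothetical names, not QEMU's `TCGTemp` or `tcg_reg_alloc_mov`:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified temp: either lives in a register or is a constant. */
typedef enum { TOY_REG, TOY_CONST } ToyKind;

typedef struct {
    ToyKind kind;
    int reg;        /* valid when kind == TOY_REG */
    int64_t val;    /* valid when kind == TOY_CONST */
} ToyTemp;

/*
 * Lower a generic "mov dst, src" into host-like text.  A constant source
 * takes the load-immediate path, so no separate movi opcode is needed.
 * Returns the number of characters written (as snprintf does).
 */
static int emit_mov(char *buf, size_t len, int dst_reg, const ToyTemp *src)
{
    assert(src != NULL);
    if (src->kind == TOY_CONST) {
        return snprintf(buf, len, "movi r%d, %lld", dst_reg, (long long)src->val);
    }
    if (src->reg == dst_reg) {
        if (len > 0) {
            buf[0] = '\0';   /* move to itself: nothing to emit */
        }
        return 0;
    }
    return snprintf(buf, len, "mov r%d, r%d", dst_reg, src->reg);
}
```

A source temp that lives in a register still produces a plain register move; only the constant case turns into `movi`.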



* Re: [PATCH v2 27/36] tcg: Fix integral argument type to tcg_gen_rot[rl]i_i{32, 64}
  2020-04-22  1:17 ` [PATCH v2 27/36] tcg: Fix integral argument type to tcg_gen_rot[rl]i_i{32, 64} Richard Henderson
@ 2020-04-22 10:19   ` Philippe Mathieu-Daudé
  2020-04-23  9:38   ` [PATCH v2 27/36] tcg: Fix integral argument type to tcg_gen_rot[rl]i_i{32,64} Alex Bennée
  1 sibling, 0 replies; 75+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-04-22 10:19 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 4/22/20 3:17 AM, Richard Henderson wrote:
> For the benefit of compatibility of function pointer types,
> we have standardized on int32_t and int64_t as the integral
> argument to tcg expanders.
> 
> We converted most of them in 474b2e8f0f7, but missed the rotates.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/tcg/tcg-op.h |  8 ++++----
>  tcg/tcg-op.c         | 16 ++++++++--------
>  2 files changed, 12 insertions(+), 12 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
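To illustrate the function-pointer point: when every immediate-form expander takes its immediate as `int32_t`, they can share one pointer type and sit in one table. A sketch with hypothetical names (`GenImmFn`, `toy_shli`, `toy_rotli`, `ToyReg`); these are not the real `tcg_gen_*` prototypes:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "register" state the toy expanders operate on. */
typedef struct { uint32_t val; } ToyReg;

/* Every immediate-form expander takes its immediate as int32_t. */
typedef void (*GenImmFn)(ToyReg *ret, const ToyReg *arg, int32_t imm);

static void toy_shli(ToyReg *ret, const ToyReg *arg, int32_t imm)
{
    ret->val = arg->val << (imm & 31);
}

/* Rotate left; avoids the undefined shift by 32 when the count is 0. */
static void toy_rotli(ToyReg *ret, const ToyReg *arg, int32_t imm)
{
    unsigned n = (unsigned)imm & 31;
    ret->val = n ? (arg->val << n) | (arg->val >> (32 - n)) : arg->val;
}

/*
 * Both functions share one signature, so they fit in one table.
 * Had toy_rotli taken "unsigned imm", this initializer would need a cast,
 * and calling through the mismatched pointer type would be undefined.
 */
static const GenImmFn imm_expanders[] = { toy_shli, toy_rotli };
```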



* Re: [PATCH v2 09/36] tcg: Consolidate 3 bits into enum TCGTempKind
  2020-04-22  1:16 ` [PATCH v2 09/36] tcg: Consolidate 3 bits into enum TCGTempKind Richard Henderson
@ 2020-04-22 11:25   ` Alex Bennée
  2020-04-22 19:58   ` Aleksandar Markovic
  1 sibling, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 11:25 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> The temp_fixed, temp_global, temp_local bits are all related.
> Combine them into a single enumeration.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/tcg/tcg.h |  20 +++++---
>  tcg/optimize.c    |   8 +--
>  tcg/tcg.c         | 122 ++++++++++++++++++++++++++++------------------
>  3 files changed, 90 insertions(+), 60 deletions(-)
>
> diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
> index c48bd76b0a..3534dce77f 100644
> --- a/include/tcg/tcg.h
> +++ b/include/tcg/tcg.h
> @@ -480,23 +480,27 @@ typedef enum TCGTempVal {
>      TEMP_VAL_CONST,
>  } TCGTempVal;
>  
> +typedef enum TCGTempKind {
> +    /* Temp is dead at the end of all basic blocks. */
> +    TEMP_NORMAL,
> +    /* Temp is saved across basic blocks but dead at the end of TBs. */
> +    TEMP_LOCAL,
> +    /* Temp is saved across both basic blocks and translation blocks. */
> +    TEMP_GLOBAL,
> +    /* Temp is in a fixed register. */
> +    TEMP_FIXED,
> +} TCGTempKind;
> +
<snip>
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -116,21 +116,21 @@ static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
>      TCGTemp *i;
>  
>      /* If this is already a global, we can't do better. */
> -    if (ts->temp_global) {
> +    if (ts->kind >= TEMP_GLOBAL) {
>          return ts;
>      }
>  
>      /* Search for a global first. */
>      for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
> -        if (i->temp_global) {
> +        if (i->kind >= TEMP_GLOBAL) {
>              return i;
>          }
>      }
>  
>      /* If it is a temp, search for a temp local. */
> -    if (!ts->temp_local) {
> +    if (ts->kind == TEMP_NORMAL) {
>          for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
> -            if (ts->temp_local) {
> +            if (i->kind >= TEMP_LOCAL) {
>                  return i;
>              }

I was confused as to why these were not equality tests, since being of one
kind does not imply the properties of another. But I see the logic is
simplified further in later patches.

<snip>
>  
>      memset(s->reg_to_temp, 0, sizeof(s->reg_to_temp));
> @@ -1885,12 +1896,17 @@ static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
>  {
>      int idx = temp_idx(ts);
>  
> -    if (ts->temp_global) {
> +    switch (ts->kind) {
> +    case TEMP_FIXED:
> +    case TEMP_GLOBAL:
>          pstrcpy(buf, buf_size, ts->name);
> -    } else if (ts->temp_local) {
> +        break;
> +    case TEMP_LOCAL:
>          snprintf(buf, buf_size, "loc%d", idx - s->nb_globals);
> -    } else {
> +        break;
> +    case TEMP_NORMAL:
>          snprintf(buf, buf_size, "tmp%d", idx - s->nb_globals);
> +        break;
>      }
>      return buf;

Random aside: if TCG is firmly staying part of QEMU, we should consider
modernising some of the string handling here.

Anyway:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée
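A sketch of why `>=` works in place of the old bits: the enumerators are declared in order of increasing lifetime, so `kind >= TEMP_GLOBAL` means "global or fixed" (the `tcg_get_arg_str_ptr` hunk names both the same way) and `kind >= TEMP_LOCAL` means "anything but a plain temp". The enum below is a hypothetical mirror of that ordering, and `best_copy` simply prefers the highest kind, which is a simplification of `find_better_copy`:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical mirror of the TCGTempKind ordering from this patch:
 * each enumerator lives at least as long as the ones before it.
 */
typedef enum {
    K_NORMAL,   /* dead at the end of every basic block */
    K_LOCAL,    /* survives basic blocks, dead at the end of the TB */
    K_GLOBAL,   /* survives basic blocks and TBs */
    K_FIXED,    /* pinned in a fixed host register (e.g. env) */
} ToyKind;

/* "Global or stronger": one comparison instead of testing two kinds. */
static bool survives_tb(ToyKind k)
{
    return k >= K_GLOBAL;
}

/* "Local or stronger": anything that is not a plain temp. */
static bool survives_bb(ToyKind k)
{
    return k >= K_LOCAL;
}

/* Index of the most persistent copy among N candidates. */
static int best_copy(const ToyKind *kinds, int n)
{
    int best = 0;
    assert(n > 0);
    for (int i = 1; i < n; i++) {
        if (kinds[i] > kinds[best]) {
            best = i;
        }
    }
    return best;
}
```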



* Re: [PATCH v2 10/36] tcg: Add temp_readonly
  2020-04-22  1:16 ` [PATCH v2 10/36] tcg: Add temp_readonly Richard Henderson
@ 2020-04-22 11:26   ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 11:26 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Philippe Mathieu-Daudé


Richard Henderson <richard.henderson@linaro.org> writes:

> In most, but not all, places that we check for TEMP_FIXED,
> we are really testing that we do not modify the temporary.
>
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH v2 11/36] tcg: Introduce TYPE_CONST temporaries
  2020-04-22  1:16 ` [PATCH v2 11/36] tcg: Introduce TYPE_CONST temporaries Richard Henderson
@ 2020-04-22 15:17   ` Alex Bennée
  2020-04-22 16:55     ` Richard Henderson
  0 siblings, 1 reply; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 15:17 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> These will hold a single constant for the duration of the TB.
> They are hashed, so that each value has one temp across the TB.
>
> Not used yet, this is all infrastructure.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/tcg/tcg.h |  27 ++++++++++-
>  tcg/optimize.c    |  40 ++++++++++-------
>  tcg/tcg-op-vec.c  |  17 +++++++
>  tcg/tcg.c         | 111 +++++++++++++++++++++++++++++++++++++++++-----
>  4 files changed, 166 insertions(+), 29 deletions(-)
>
> diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
> index 27e1b509a6..f72530dfda 100644
> --- a/include/tcg/tcg.h
> +++ b/include/tcg/tcg.h
> @@ -489,6 +489,8 @@ typedef enum TCGTempKind {
>      TEMP_GLOBAL,
>      /* Temp is in a fixed register. */
>      TEMP_FIXED,
> +    /* Temp is a fixed constant. */
> +    TEMP_CONST,
>  } TCGTempKind;
>  
>  typedef struct TCGTemp {
> @@ -664,6 +666,7 @@ struct TCGContext {
>      QSIMPLEQ_HEAD(, TCGOp) plugin_ops;
>  #endif
>  
> +    GHashTable *const_table[TCG_TYPE_COUNT];
>      TCGTempSet free_temps[TCG_TYPE_COUNT * 2];
>      TCGTemp temps[TCG_MAX_TEMPS]; /* globals first, temps after */
>  
> @@ -680,7 +683,7 @@ struct TCGContext {
>  
>  static inline bool temp_readonly(TCGTemp *ts)
>  {
> -    return ts->kind == TEMP_FIXED;
> +    return ts->kind >= TEMP_FIXED;

I should have clarified in the previous patch - TEMP_FIXED is a fixed
value, e.g. env or other internal pointer which we won't be overwriting
or otherwise trashing anywhere in the block?

>  }
>  
>  extern TCGContext tcg_init_ctx;
> @@ -1038,6 +1041,7 @@ TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *op, TCGOpcode opc);
>  
>  void tcg_optimize(TCGContext *s);
>  
> +/* Allocate a new temporary and initialize it with a constant. */
>  TCGv_i32 tcg_const_i32(int32_t val);
>  TCGv_i64 tcg_const_i64(int64_t val);
>  TCGv_i32 tcg_const_local_i32(int32_t val);
> @@ -1047,6 +1051,27 @@ TCGv_vec tcg_const_ones_vec(TCGType);
>  TCGv_vec tcg_const_zeros_vec_matching(TCGv_vec);
>  TCGv_vec tcg_const_ones_vec_matching(TCGv_vec);
>  
> +/*
> + * Locate or create a read-only temporary that is a constant.
> + * This kind of temporary need not and should not be freed.
> + */
> +TCGTemp *tcg_constant_internal(TCGType type, tcg_target_long val);
> +
> +static inline TCGv_i32 tcg_constant_i32(int32_t val)
> +{
> +    return temp_tcgv_i32(tcg_constant_internal(TCG_TYPE_I32, val));
> +}
> +
> +static inline TCGv_i64 tcg_constant_i64(int64_t val)
> +{
> +    if (TCG_TARGET_REG_BITS == 32) {
> +        qemu_build_not_reached();
> +    }
> +    return temp_tcgv_i64(tcg_constant_internal(TCG_TYPE_I64, val));
> +}
> +
> +TCGv_vec tcg_constant_vec(TCGType type, unsigned vece, int64_t val);
> +
>  #if UINTPTR_MAX == UINT32_MAX
>  # define tcg_const_ptr(x)        ((TCGv_ptr)tcg_const_i32((intptr_t)(x)))
>  # define tcg_const_local_ptr(x)  ((TCGv_ptr)tcg_const_local_i32((intptr_t)(x)))
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index afb4a9a5a9..effb47eefd 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -99,8 +99,17 @@ static void init_ts_info(struct tcg_temp_info *infos,
>          ts->state_ptr = ti;
>          ti->next_copy = ts;
>          ti->prev_copy = ts;
> -        ti->is_const = false;
> -        ti->mask = -1;
> +        if (ts->kind == TEMP_CONST) {
> +            ti->is_const = true;
> +            ti->val = ti->mask = ts->val;
> +            if (TCG_TARGET_REG_BITS > 32 && ts->type == TCG_TYPE_I32) {
> +                /* High bits of a 32-bit quantity are garbage.  */
> +                ti->mask |= ~0xffffffffull;
> +            }
> +        } else {
> +            ti->is_const = false;
> +            ti->mask = -1;
> +        }
>          set_bit(idx, temps_used->l);
>      }
>  }
> @@ -113,31 +122,28 @@ static void init_arg_info(struct tcg_temp_info *infos,
>  
>  static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
>  {
> -    TCGTemp *i;
> +    TCGTemp *i, *g, *l;
>  
> -    /* If this is already a global, we can't do better. */
> -    if (ts->kind >= TEMP_GLOBAL) {
> +    /* If this is already readonly, we can't do better. */
> +    if (temp_readonly(ts)) {
>          return ts;
>      }
>  
> -    /* Search for a global first. */
> +    g = l = NULL;
>      for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
> -        if (i->kind >= TEMP_GLOBAL) {
> +        if (temp_readonly(i)) {
>              return i;
> -        }
> -    }
> -
> -    /* If it is a temp, search for a temp local. */
> -    if (ts->kind == TEMP_NORMAL) {
> -        for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
> -            if (i->kind >= TEMP_LOCAL) {
> -                return i;
> +        } else if (i->kind > ts->kind) {
> +            if (i->kind == TEMP_GLOBAL) {
> +                g = i;
> +            } else if (i->kind == TEMP_LOCAL) {
> +                l = i;
>              }
>          }
>      }
>  
> -    /* Failure to find a better representation, return the same temp. */
> -    return ts;
> +    /* If we didn't find a better representation, return the same temp. */
> +    return g ? g : l ? l : ts;
>  }
>  
>  static bool ts_are_copies(TCGTemp *ts1, TCGTemp *ts2)
> diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
> index b6937e8d64..f3927089a7 100644
> --- a/tcg/tcg-op-vec.c
> +++ b/tcg/tcg-op-vec.c
> @@ -209,6 +209,23 @@ static void vec_gen_op3(TCGOpcode opc, unsigned vece,
>      vec_gen_3(opc, type, vece, temp_arg(rt), temp_arg(at), temp_arg(bt));
>  }
>  
> +TCGv_vec tcg_constant_vec(TCGType type, unsigned vece, int64_t val)
> +{
> +    val = dup_const(vece, val);
> +
> +    /*
> +     * For MO_64 constants that can't be represented in tcg_target_long,
> +     * we must use INDEX_op_dup2_vec, which requires a non-const temporary.
> +     */
> +    if (TCG_TARGET_REG_BITS == 32 &&
> +        val != deposit64(val, 32, 32, val) &&
> +        val != (uint64_t)(int32_t)val) {
> +        g_assert_not_reached();
> +    }
> +
> +    return temp_tcgv_vec(tcg_constant_internal(type, val));
> +}
> +
>  void tcg_gen_mov_vec(TCGv_vec r, TCGv_vec a)
>  {
>      if (r != a) {
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 92b3767097..59beb2bf29 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1127,6 +1127,7 @@ void tcg_func_start(TCGContext *s)
>  
>      /* No temps have been previously allocated for size or locality.  */
>      memset(s->free_temps, 0, sizeof(s->free_temps));
> +    memset(s->const_table, 0, sizeof(s->const_table));
>  
>      s->nb_ops = 0;
>      s->nb_labels = 0;
> @@ -1199,13 +1200,19 @@ TCGTemp *tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
>      bigendian = 1;
>  #endif
>  
> -    if (base_ts->kind != TEMP_FIXED) {
> +    switch (base_ts->kind) {
> +    case TEMP_FIXED:
> +        break;
> +    case TEMP_GLOBAL:
>          /* We do not support double-indirect registers.  */
>          tcg_debug_assert(!base_ts->indirect_reg);
>          base_ts->indirect_base = 1;
>          s->nb_indirects += (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64
>                              ? 2 : 1);
>          indirect_reg = 1;
> +        break;
> +    default:
> +        g_assert_not_reached();
>      }
>  
>      if (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64) {
> @@ -1346,6 +1353,37 @@ void tcg_temp_free_internal(TCGTemp *ts)
>      set_bit(idx, s->free_temps[k].l);
>  }
>  
> +TCGTemp *tcg_constant_internal(TCGType type, tcg_target_long val)
> +{
> +    TCGContext *s = tcg_ctx;
> +    GHashTable *h = s->const_table[type];
> +    TCGTemp *ts;
> +
> +    if (h == NULL) {
> +        if (sizeof(tcg_target_long) == sizeof(gint64)) {
> +            h = g_hash_table_new(g_int64_hash, g_int64_equal);
> +        } else if (sizeof(tcg_target_long) == sizeof(gint)) {
> +            h = g_hash_table_new(g_int_hash, g_int_equal);
> +        } else {
> +            qemu_build_not_reached();
> +        }
> +        s->const_table[type] = h;
> +    }
> +
> +    ts = g_hash_table_lookup(h, &val);
> +    if (ts == NULL) {
> +        ts = tcg_temp_alloc(s);
> +        ts->base_type = type;
> +        ts->type = type;
> +        ts->kind = TEMP_CONST;
> +        ts->temp_allocated = 1;
> +        ts->val = val;
> +        g_hash_table_insert(h, &ts->val, ts);

I worried about the efficiency of using a hash table for a small number of
constants, but GLib's implementation starts with 8 buckets and grows as
needed, so it seems a reasonable approach.

Anyway:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
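The per-TB interning done by `tcg_constant_internal` can be sketched without GLib: look the value up, and allocate a new temp only on a miss, so repeated requests for one value return the same temp. This is a minimal sketch assuming a fixed-size open-addressing table; `ConstTable`, `intern_const` and the capacity are hypothetical, while the patch itself uses one `GHashTable` per type keyed on `&ts->val` and clears the tables in `tcg_func_start`:

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 64   /* hypothetical fixed capacity, power of two */

typedef struct {
    int64_t key[TABLE_SIZE];
    int     temp[TABLE_SIZE];   /* temp index, or -1 if slot unused */
    int     next_temp;          /* next temp index to hand out */
} ConstTable;

static void const_table_init(ConstTable *t)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        t->temp[i] = -1;
    }
    t->next_temp = 0;
}

/* Return the temp holding VAL, allocating one on first use. */
static int intern_const(ConstTable *t, int64_t val)
{
    /* Multiplicative hash; the top 6 bits select one of 64 slots. */
    uint64_t h = (uint64_t)val * 0x9e3779b97f4a7c15ull;
    unsigned slot = (unsigned)(h >> 58);

    for (int probe = 0; probe < TABLE_SIZE; probe++) {
        unsigned i = (slot + probe) & (TABLE_SIZE - 1);
        if (t->temp[i] < 0) {           /* miss: create the temp */
            t->key[i] = val;
            t->temp[i] = t->next_temp++;
            return t->temp[i];
        }
        if (t->key[i] == val) {         /* hit: share the existing temp */
            return t->temp[i];
        }
    }
    assert(!"constant table full");
    return -1;
}
```

Requesting 0 twice yields the same temp index; a different value gets a fresh one.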

-- 
Alex Bennée
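For the `init_ts_info` hunk above: a `TEMP_CONST` temp starts out fully known. Assuming `mask` is read as the set of bits that may be nonzero, a constant's mask is its own value, widened on a 64-bit host for `TCG_TYPE_I32` because the high half of the register is garbage. A standalone sketch with hypothetical helpers `const_mask` and `bit_known_zero`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: the "may be nonzero" mask the optimizer would
 * start from for a constant temp.  On a 64-bit host a 32-bit value's
 * upper half is garbage, so those bits must be treated as unknown.
 */
static uint64_t const_mask(uint64_t val, bool is_i32, bool host_is_64bit)
{
    uint64_t mask = val;
    if (host_is_64bit && is_i32) {
        mask |= ~0xffffffffull;
    }
    return mask;
}

/* A bit is provably zero when it is clear in the mask. */
static bool bit_known_zero(uint64_t mask, unsigned bit)
{
    assert(bit < 64);
    return ((mask >> bit) & 1) == 0;
}
```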



* Re: [PATCH v2 12/36] tcg: Use tcg_constant_i32 with icount expander
  2020-04-22  1:16 ` [PATCH v2 12/36] tcg: Use tcg_constant_i32 with icount expander Richard Henderson
@ 2020-04-22 15:40   ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 15:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> We must do this before we adjust how tcg_out_movi_i32,
> lest the under-the-hood poking that we do be broken.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  include/exec/gen-icount.h | 25 +++++++++++++------------
>  1 file changed, 13 insertions(+), 12 deletions(-)
>
> diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
> index 822c43cfd3..404732518a 100644
> --- a/include/exec/gen-icount.h
> +++ b/include/exec/gen-icount.h
> @@ -34,7 +34,7 @@ static inline void gen_io_end(void)
>  
>  static inline void gen_tb_start(TranslationBlock *tb)
>  {
> -    TCGv_i32 count, imm;
> +    TCGv_i32 count;
>  
>      tcg_ctx->exitreq_label = gen_new_label();
>      if (tb_cflags(tb) & CF_USE_ICOUNT) {
> @@ -48,15 +48,13 @@ static inline void gen_tb_start(TranslationBlock *tb)
>                     offsetof(ArchCPU, env));
>  
>      if (tb_cflags(tb) & CF_USE_ICOUNT) {
> -        imm = tcg_temp_new_i32();
> -        /* We emit a movi with a dummy immediate argument. Keep the insn index
> -         * of the movi so that we later (when we know the actual insn count)
> -         * can update the immediate argument with the actual insn count.  */
> -        tcg_gen_movi_i32(imm, 0xdeadbeef);
> +        /*
> +         * We emit a sub with a dummy immediate argument. Keep the insn index
> +         * of the sub so that we later (when we know the actual insn count)
> +         * can update the argument with the actual insn count.
> +         */
> +        tcg_gen_sub_i32(count, count, tcg_constant_i32(0));
>          icount_start_insn = tcg_last_op();
> -
> -        tcg_gen_sub_i32(count, count, imm);
> -        tcg_temp_free_i32(imm);
>      }
>  
>      tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, tcg_ctx->exitreq_label);
> @@ -74,9 +72,12 @@ static inline void gen_tb_start(TranslationBlock *tb)
>  static inline void gen_tb_end(TranslationBlock *tb, int num_insns)
>  {
>      if (tb_cflags(tb) & CF_USE_ICOUNT) {
> -        /* Update the num_insn immediate parameter now that we know
> -         * the actual insn count.  */
> -        tcg_set_insn_param(icount_start_insn, 1, num_insns);
> +        /*
> +         * Update the num_insn immediate parameter now that we know
> +         * the actual insn count.
> +         */
> +        tcg_set_insn_param(icount_start_insn, 2,
> +                           tcgv_i32_arg(tcg_constant_i32(num_insns)));
>      }
>  
>      gen_set_label(tcg_ctx->exitreq_label);


-- 
Alex Bennée
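The icount change keeps the emit-now, patch-later pattern, but the operand patched at the end of the TB is now argument 2 of the `sub` (a constant temp) rather than argument 1 of a `movi` (a raw immediate). A minimal sketch of the pattern on a hypothetical op list; `ToyOp`, `ToyBlock`, `emit` and `set_param` are illustrative only, and for brevity the patched operand holds a plain number rather than the argument of a `tcg_constant_i32()` temp:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical three-operand op: args hold temp ids or immediates. */
typedef struct {
    const char *name;
    int64_t args[3];
} ToyOp;

typedef struct {
    ToyOp ops[16];
    int nb_ops;
} ToyBlock;

/* Append an op and return a pointer to it so the caller can patch it. */
static ToyOp *emit(ToyBlock *b, const char *name,
                   int64_t a0, int64_t a1, int64_t a2)
{
    assert(b->nb_ops < 16);
    ToyOp *op = &b->ops[b->nb_ops++];
    op->name = name;
    op->args[0] = a0;
    op->args[1] = a1;
    op->args[2] = a2;
    return op;
}

/* Overwrite one argument of an already-emitted op. */
static void set_param(ToyOp *op, int idx, int64_t val)
{
    assert(idx >= 0 && idx < 3);
    op->args[idx] = val;
}
```

Emit the `sub` with a placeholder, keep the returned op, and patch argument 2 once the instruction count is known.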



* Re: [PATCH v2 13/36] tcg: Use tcg_constant_{i32,i64} with tcg int expanders
  2020-04-22  1:16 ` [PATCH v2 13/36] tcg: Use tcg_constant_{i32, i64} with tcg int expanders Richard Henderson
@ 2020-04-22 16:18   ` Alex Bennée
  2020-04-22 17:02     ` Richard Henderson
  2020-04-22 20:04   ` Alex Bennée
  1 sibling, 1 reply; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 16:18 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/tcg/tcg-op.h |  13 +--
>  tcg/tcg-op.c         | 216 ++++++++++++++++++++-----------------------
>  2 files changed, 100 insertions(+), 129 deletions(-)
>
> diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
> index 230db6e022..11ed9192f7 100644
> --- a/include/tcg/tcg-op.h
> +++ b/include/tcg/tcg-op.h
<snip>
> @@ -1468,12 +1441,17 @@ void tcg_gen_brcond_i64(TCGCond cond, TCGv_i64 arg1, TCGv_i64 arg2, TCGLabel *l)
>  
>  void tcg_gen_brcondi_i64(TCGCond cond, TCGv_i64 arg1, int64_t arg2, TCGLabel *l)
>  {
> -    if (cond == TCG_COND_ALWAYS) {
> +    if (TCG_TARGET_REG_BITS == 64) {
> +        tcg_gen_brcond_i64(cond, arg1, tcg_constant_i64(arg2), l);
> +    } else if (cond == TCG_COND_ALWAYS) {
>          tcg_gen_br(l);
>      } else if (cond != TCG_COND_NEVER) {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_brcond_i64(cond, arg1, t0, l);
> -        tcg_temp_free_i64(t0);
> +        l->refs++;

Hmm is this a separate fix?

> +        tcg_gen_op6ii_i32(INDEX_op_brcond2_i32,
> +                          TCGV_LOW(arg1), TCGV_HIGH(arg1),
> +                          tcg_constant_i32(arg2),
> +                          tcg_constant_i32(arg2 >> 32),
> +                          cond, label_arg(l));
>      }
>  }
<snip>

otherwise lgtm:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH v2 11/36] tcg: Introduce TYPE_CONST temporaries
  2020-04-22 15:17   ` Alex Bennée
@ 2020-04-22 16:55     ` Richard Henderson
  0 siblings, 0 replies; 75+ messages in thread
From: Richard Henderson @ 2020-04-22 16:55 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 4/22/20 8:17 AM, Alex Bennée wrote:
>>  static inline bool temp_readonly(TCGTemp *ts)
>>  {
>> -    return ts->kind == TEMP_FIXED;
>> +    return ts->kind >= TEMP_FIXED;
> 
> I should have clarified in the previous patch - TEMP_FIXED is a fixed
> value, e.g. env or other internal pointer which we won't be overwriting
> or otherwise trashing anywhere in the block?

Correct.  Only env, actually.  There are (currently) no other internal pointers
that are fixed.


r~
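Appending `TEMP_CONST` after `TEMP_FIXED` is what keeps `temp_readonly()` a single comparison: neither a fixed register such as env nor an interned constant, which every user of that value shares, may be overwritten. A hypothetical mirror of the check (`ToyKind`, `toy_readonly` and `toy_assign` are not QEMU names):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    T_NORMAL, T_LOCAL, T_GLOBAL,
    T_FIXED,    /* fixed register, e.g. env: never reassigned */
    T_CONST,    /* interned constant shared across the TB: never reassigned */
} ToyKind;

typedef struct {
    ToyKind kind;
    int64_t val;
} ToyTemp;

/* Mirrors the patched test: everything from the fixed kind upward is read-only. */
static bool toy_readonly(const ToyTemp *t)
{
    return t->kind >= T_FIXED;
}

/* Hypothetical write helper that refuses to clobber a read-only temp. */
static bool toy_assign(ToyTemp *t, int64_t v)
{
    if (toy_readonly(t)) {
        return false;
    }
    t->val = v;
    return true;
}
```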



* Re: [PATCH v2 14/36] tcg: Use tcg_constant_{i32,vec} with tcg vec expanders
  2020-04-22  1:17 ` [PATCH v2 14/36] tcg: Use tcg_constant_{i32, vec} with tcg vec expanders Richard Henderson
@ 2020-04-22 17:00   ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 17:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  tcg/tcg-op-vec.c | 63 ++++++++++++++++++++++++++----------------------
>  1 file changed, 34 insertions(+), 29 deletions(-)
>
> diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
> index f3927089a7..655b3ae32d 100644
> --- a/tcg/tcg-op-vec.c
> +++ b/tcg/tcg-op-vec.c
> @@ -233,25 +233,17 @@ void tcg_gen_mov_vec(TCGv_vec r, TCGv_vec a)
>      }
>  }
>  
> -#define MO_REG  (TCG_TARGET_REG_BITS == 64 ? MO_64 : MO_32)
> -
> -static void do_dupi_vec(TCGv_vec r, unsigned vece, TCGArg a)
> -{
> -    TCGTemp *rt = tcgv_vec_temp(r);
> -    vec_gen_2(INDEX_op_dupi_vec, rt->base_type, vece, temp_arg(rt), a);
> -}
> -
>  TCGv_vec tcg_const_zeros_vec(TCGType type)
>  {
>      TCGv_vec ret = tcg_temp_new_vec(type);
> -    do_dupi_vec(ret, MO_REG, 0);
> +    tcg_gen_mov_vec(ret, tcg_constant_vec(type, MO_8, 0));
>      return ret;
>  }
>  
>  TCGv_vec tcg_const_ones_vec(TCGType type)
>  {
>      TCGv_vec ret = tcg_temp_new_vec(type);
> -    do_dupi_vec(ret, MO_REG, -1);
> +    tcg_gen_mov_vec(ret, tcg_constant_vec(type, MO_8, -1));
>      return ret;
>  }
>  
> @@ -267,37 +259,50 @@ TCGv_vec tcg_const_ones_vec_matching(TCGv_vec m)
>      return tcg_const_ones_vec(t->base_type);
>  }
>  
> -void tcg_gen_dup64i_vec(TCGv_vec r, uint64_t a)
> +void tcg_gen_dupi_vec(unsigned vece, TCGv_vec dest, uint64_t val)
>  {
> -    if (TCG_TARGET_REG_BITS == 32 && a == deposit64(a, 32, 32, a)) {
> -        do_dupi_vec(r, MO_32, a);
> -    } else if (TCG_TARGET_REG_BITS == 64 || a == (uint64_t)(int32_t)a) {
> -        do_dupi_vec(r, MO_64, a);
> -    } else {
> -        TCGv_i64 c = tcg_const_i64(a);
> -        tcg_gen_dup_i64_vec(MO_64, r, c);
> -        tcg_temp_free_i64(c);
> +    TCGType type = tcgv_vec_temp(dest)->base_type;
> +
> +    /*
> +     * For MO_64 constants that can't be represented in tcg_target_long,
> +     * we must use INDEX_op_dup2_vec.
> +     */
> +    if (TCG_TARGET_REG_BITS == 32) {
> +        val = dup_const(vece, val);
> +        if (val != deposit64(val, 32, 32, val) &&
> +            val != (uint64_t)(int32_t)val) {
> +            uint32_t vl = extract64(val, 0, 32);
> +            uint32_t vh = extract64(val, 32, 32);
> +            TCGArg al = tcgv_i32_arg(tcg_constant_i32(vl));
> +            TCGArg ah = tcgv_i32_arg(tcg_constant_i32(vh));
> +            TCGArg di = tcgv_vec_arg(dest);
> +
> +            vec_gen_3(INDEX_op_dup2_vec, type, MO_64, di, al, ah);
> +            return;
> +        }
>      }
> +
> +    tcg_gen_mov_vec(dest, tcg_constant_vec(type, vece, val));
>  }
>  
> -void tcg_gen_dup32i_vec(TCGv_vec r, uint32_t a)
> +void tcg_gen_dup64i_vec(TCGv_vec dest, uint64_t val)
>  {
> -    do_dupi_vec(r, MO_REG, dup_const(MO_32, a));
> +    tcg_gen_dupi_vec(MO_64, dest, val);
>  }
>  
> -void tcg_gen_dup16i_vec(TCGv_vec r, uint32_t a)
> +void tcg_gen_dup32i_vec(TCGv_vec dest, uint32_t val)
>  {
> -    do_dupi_vec(r, MO_REG, dup_const(MO_16, a));
> +    tcg_gen_dupi_vec(MO_32, dest, val);
>  }
>  
> -void tcg_gen_dup8i_vec(TCGv_vec r, uint32_t a)
> +void tcg_gen_dup16i_vec(TCGv_vec dest, uint32_t val)
>  {
> -    do_dupi_vec(r, MO_REG, dup_const(MO_8, a));
> +    tcg_gen_dupi_vec(MO_16, dest, val);
>  }
>  
> -void tcg_gen_dupi_vec(unsigned vece, TCGv_vec r, uint64_t a)
> +void tcg_gen_dup8i_vec(TCGv_vec dest, uint32_t val)
>  {
> -    do_dupi_vec(r, MO_REG, dup_const(vece, a));
> +    tcg_gen_dupi_vec(MO_8, dest, val);
>  }
>  
>  void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec r, TCGv_i64 a)
> @@ -502,8 +507,8 @@ void tcg_gen_abs_vec(unsigned vece, TCGv_vec r, TCGv_vec a)
>              if (tcg_can_emit_vec_op(INDEX_op_sari_vec, type, vece) > 0) {
>                  tcg_gen_sari_vec(vece, t, a, (8 << vece) - 1);
>              } else {
> -                do_dupi_vec(t, MO_REG, 0);
> -                tcg_gen_cmp_vec(TCG_COND_LT, vece, t, a, t);
> +                tcg_gen_cmp_vec(TCG_COND_LT, vece, t, a,
> +                                tcg_constant_vec(type, vece, 0));
>              }
>              tcg_gen_xor_vec(vece, r, a, t);
>              tcg_gen_sub_vec(vece, r, r, t);


-- 
Alex Bennée
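On a 32-bit host, `tcg_gen_dupi_vec` falls back to `dup2_vec` only when the replicated 64-bit value is neither a repeated 32-bit pattern nor a sign-extended 32-bit value. The sketch below re-implements that test with hypothetical helpers (`toy_dup_const`, `toy_deposit64`, `needs_dup2`) written to follow the semantics of QEMU's `dup_const` and `deposit64` as used in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Element sizes as log2(bytes), matching the MO_8..MO_64 convention. */
enum { TOY_MO_8, TOY_MO_16, TOY_MO_32, TOY_MO_64 };

/* Replicate the low element of C across 64 bits. */
static uint64_t toy_dup_const(unsigned vece, uint64_t c)
{
    switch (vece) {
    case TOY_MO_8:  return 0x0101010101010101ull * (uint8_t)c;
    case TOY_MO_16: return 0x0001000100010001ull * (uint16_t)c;
    case TOY_MO_32: return 0x0000000100000001ull * (uint32_t)c;
    default:        return c;
    }
}

/* Replace LEN bits of VALUE starting at START with the low bits of FIELD. */
static uint64_t toy_deposit64(uint64_t value, int start, int len, uint64_t field)
{
    uint64_t mask = (~0ull >> (64 - len)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/*
 * True when a 32-bit host cannot express VAL as a single tcg_target_long
 * and must build it from two 32-bit halves (the dup2_vec path).
 */
static bool needs_dup2(unsigned vece, uint64_t val)
{
    val = toy_dup_const(vece, val);
    return val != toy_deposit64(val, 32, 32, val) &&
           val != (uint64_t)(int32_t)val;
}
```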



* Re: [PATCH v2 13/36] tcg: Use tcg_constant_{i32,i64} with tcg int expanders
  2020-04-22 16:18   ` [PATCH v2 13/36] tcg: Use tcg_constant_{i32,i64} " Alex Bennée
@ 2020-04-22 17:02     ` Richard Henderson
  2020-04-22 17:57       ` Alex Bennée
  0 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-22 17:02 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 4/22/20 9:18 AM, Alex Bennée wrote:
>>  void tcg_gen_brcondi_i64(TCGCond cond, TCGv_i64 arg1, int64_t arg2, TCGLabel *l)
>>  {
>> -    if (cond == TCG_COND_ALWAYS) {
>> +    if (TCG_TARGET_REG_BITS == 64) {
>> +        tcg_gen_brcond_i64(cond, arg1, tcg_constant_i64(arg2), l);
>> +    } else if (cond == TCG_COND_ALWAYS) {
>>          tcg_gen_br(l);
>>      } else if (cond != TCG_COND_NEVER) {
>> -        TCGv_i64 t0 = tcg_const_i64(arg2);
>> -        tcg_gen_brcond_i64(cond, arg1, t0, l);
>> -        tcg_temp_free_i64(t0);
>> +        l->refs++;
> 
> Hmm is this a separate fix?

No, it's expanding what tcg_gen_brcond_i64 would do for TCG_TARGET_REG_BITS == 32.

>> +        tcg_gen_op6ii_i32(INDEX_op_brcond2_i32,
>> +                          TCGV_LOW(arg1), TCGV_HIGH(arg1),
>> +                          tcg_constant_i32(arg2),
>> +                          tcg_constant_i32(arg2 >> 32),
>> +                          cond, label_arg(l));

Because we have two separate TCGv_i32 values from tcg_constant_i32(), which
cannot be packaged up as one TCGv_i64 and accessed with TCGV_HIGH/LOW.


r~
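On a 32-bit host the 64-bit immediate is split into two 32-bit constants, `arg2` and `arg2 >> 32`, and `brcond2_i32` compares the register pair against them. The sketch below checks that a signed less-than evaluated on (high, low) halves agrees with the direct 64-bit comparison; `split64` and `lt2` are hypothetical helpers, and only the signed LT condition is modelled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Split a 64-bit value the way the patch does: low word, then v >> 32. */
static void split64(int64_t v, uint32_t *lo, int32_t *hi)
{
    *lo = (uint32_t)v;
    *hi = (int32_t)(v >> 32);
}

/*
 * Signed 64-bit "a < b" computed from 32-bit halves: the high words
 * decide (signed); only when they are equal do the low words decide,
 * and then as unsigned quantities.
 */
static bool lt2(uint32_t alo, int32_t ahi, uint32_t blo, int32_t bhi)
{
    if (ahi != bhi) {
        return ahi < bhi;
    }
    return alo < blo;
}
```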



* Re: [PATCH v2 15/36] tcg: Use tcg_constant_{i32, i64} with tcg plugins
  2020-04-22  1:17 ` [PATCH v2 15/36] tcg: Use tcg_constant_{i32,i64} with tcg plugins Richard Henderson
@ 2020-04-22 17:18   ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 17:18 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  accel/tcg/plugin-gen.c | 49 +++++++++++++++++++-----------------------
>  1 file changed, 22 insertions(+), 27 deletions(-)
>
> diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
> index 51580d51a0..e5dc9d0ca9 100644
> --- a/accel/tcg/plugin-gen.c
> +++ b/accel/tcg/plugin-gen.c
> @@ -284,8 +284,8 @@ static TCGOp *copy_extu_i32_i64(TCGOp **begin_op, TCGOp *op)
>      if (TCG_TARGET_REG_BITS == 32) {
>          /* mov_i32 */
>          op = copy_op(begin_op, op, INDEX_op_mov_i32);
> -        /* movi_i32 */
> -        op = copy_op(begin_op, op, INDEX_op_movi_i32);
> +        /* mov_i32 w/ $0 */
> +        op = copy_op(begin_op, op, INDEX_op_mov_i32);
>      } else {
>          /* extu_i32_i64 */
>          op = copy_op(begin_op, op, INDEX_op_extu_i32_i64);
> @@ -306,39 +306,34 @@ static TCGOp *copy_mov_i64(TCGOp **begin_op, TCGOp *op)
>      return op;
>  }
>  
> -static TCGOp *copy_movi_i64(TCGOp **begin_op, TCGOp *op, uint64_t v)
> -{
> -    if (TCG_TARGET_REG_BITS == 32) {
> -        /* 2x movi_i32 */
> -        op = copy_op(begin_op, op, INDEX_op_movi_i32);
> -        op->args[1] = v;
> -
> -        op = copy_op(begin_op, op, INDEX_op_movi_i32);
> -        op->args[1] = v >> 32;
> -    } else {
> -        /* movi_i64 */
> -        op = copy_op(begin_op, op, INDEX_op_movi_i64);
> -        op->args[1] = v;
> -    }
> -    return op;
> -}
> -
>  static TCGOp *copy_const_ptr(TCGOp **begin_op, TCGOp *op, void *ptr)
>  {
>      if (UINTPTR_MAX == UINT32_MAX) {
> -        /* movi_i32 */
> -        op = copy_op(begin_op, op, INDEX_op_movi_i32);
> -        op->args[1] = (uintptr_t)ptr;
> +        /* mov_i32 */
> +        op = copy_op(begin_op, op, INDEX_op_mov_i32);
> +        op->args[1] = tcgv_i32_arg(tcg_constant_i32((uintptr_t)ptr));
>      } else {
> -        /* movi_i64 */
> -        op = copy_movi_i64(begin_op, op, (uint64_t)(uintptr_t)ptr);
> +        /* mov_i64 */
> +        op = copy_op(begin_op, op, INDEX_op_mov_i64);
> +        op->args[1] = tcgv_i64_arg(tcg_constant_i64((uintptr_t)ptr));
>      }
>      return op;
>  }
>  
>  static TCGOp *copy_const_i64(TCGOp **begin_op, TCGOp *op, uint64_t v)
>  {
> -    return copy_movi_i64(begin_op, op, v);
> +    if (TCG_TARGET_REG_BITS == 32) {
> +        /* 2x mov_i32 */
> +        op = copy_op(begin_op, op, INDEX_op_mov_i32);
> +        op->args[1] = tcgv_i32_arg(tcg_constant_i32(v));
> +        op = copy_op(begin_op, op, INDEX_op_mov_i32);
> +        op->args[1] = tcgv_i32_arg(tcg_constant_i32(v >> 32));
> +    } else {
> +        /* mov_i64 */
> +        op = copy_op(begin_op, op, INDEX_op_mov_i64);
> +        op->args[1] = tcgv_i64_arg(tcg_constant_i64(v));
> +    }
> +    return op;
>  }
>  
>  static TCGOp *copy_extu_tl_i64(TCGOp **begin_op, TCGOp *op)
> @@ -486,8 +481,8 @@ static TCGOp *append_mem_cb(const struct qemu_plugin_dyn_cb *cb,
>  
>      tcg_debug_assert(type == PLUGIN_GEN_CB_MEM);
>  
> -    /* const_i32 == movi_i32 ("info", so it remains as is) */
> -    op = copy_op(&begin_op, op, INDEX_op_movi_i32);
> +    /* const_i32 == mov_i32 ("info", so it remains as is) */
> +    op = copy_op(&begin_op, op, INDEX_op_mov_i32);
>  
>      /* const_ptr */
>      op = copy_const_ptr(&begin_op, op, cb->userp);
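On a 32-bit host, copy_const_i64() in the hunk above emits two mov_i32 ops: one is fed the low word of v, the other is fed v >> 32. The sketch below shows that split and its inverse without any QEMU types, assuming nothing beyond standard C. Pair32, split_u64() and join_u64() are hypothetical names used only for illustration. They are not QEMU APIs.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two 32-bit halves that TCG tracks
 * as TCGV_LOW()/TCGV_HIGH() of a TCGv_i64 on a 32-bit host. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} Pair32;

/* Split v the same way copy_const_i64() feeds its two mov_i32 ops:
 * the low word is v truncated to 32 bits, the high word is v >> 32. */
static Pair32 split_u64(uint64_t v)
{
    Pair32 p;
    p.lo = (uint32_t)v;          /* truncation keeps bits 0..31 */
    p.hi = (uint32_t)(v >> 32);  /* logical shift brings bits 32..63 down */
    return p;
}

/* Reassemble the halves; join_u64(split_u64(v)) == v for every v. */
static uint64_t join_u64(Pair32 p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}
```

For example, split_u64(0x123456789abcdef0) yields lo = 0x9abcdef0 and hi = 0x12345678.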


-- 
Alex Bennée



* Re: [PATCH v2 16/36] tcg: Rename struct tcg_temp_info to TempOptInfo
  2020-04-22  1:17 ` [PATCH v2 16/36] tcg: Rename struct tcg_temp_info to TempOptInfo Richard Henderson
@ 2020-04-22 17:19   ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 17:19 UTC
  To: Richard Henderson; +Cc: qemu-devel, Philippe Mathieu-Daudé


Richard Henderson <richard.henderson@linaro.org> writes:

> Fix this name to match our coding style.
>
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  tcg/optimize.c | 32 ++++++++++++++++----------------
>  1 file changed, 16 insertions(+), 16 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index effb47eefd..b86bf3d707 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -35,20 +35,20 @@
>          glue(glue(case INDEX_op_, x), _i64):    \
>          glue(glue(case INDEX_op_, x), _vec)
>  
> -struct tcg_temp_info {
> +typedef struct TempOptInfo {
>      bool is_const;
>      TCGTemp *prev_copy;
>      TCGTemp *next_copy;
>      tcg_target_ulong val;
>      tcg_target_ulong mask;
> -};
> +} TempOptInfo;
>  
> -static inline struct tcg_temp_info *ts_info(TCGTemp *ts)
> +static inline TempOptInfo *ts_info(TCGTemp *ts)
>  {
>      return ts->state_ptr;
>  }
>  
> -static inline struct tcg_temp_info *arg_info(TCGArg arg)
> +static inline TempOptInfo *arg_info(TCGArg arg)
>  {
>      return ts_info(arg_temp(arg));
>  }
> @@ -71,9 +71,9 @@ static inline bool ts_is_copy(TCGTemp *ts)
>  /* Reset TEMP's state, possibly removing the temp for the list of copies.  */
>  static void reset_ts(TCGTemp *ts)
>  {
> -    struct tcg_temp_info *ti = ts_info(ts);
> -    struct tcg_temp_info *pi = ts_info(ti->prev_copy);
> -    struct tcg_temp_info *ni = ts_info(ti->next_copy);
> +    TempOptInfo *ti = ts_info(ts);
> +    TempOptInfo *pi = ts_info(ti->prev_copy);
> +    TempOptInfo *ni = ts_info(ti->next_copy);
>  
>      ni->prev_copy = ti->prev_copy;
>      pi->next_copy = ti->next_copy;
> @@ -89,12 +89,12 @@ static void reset_temp(TCGArg arg)
>  }
>  
>  /* Initialize and activate a temporary.  */
> -static void init_ts_info(struct tcg_temp_info *infos,
> +static void init_ts_info(TempOptInfo *infos,
>                           TCGTempSet *temps_used, TCGTemp *ts)
>  {
>      size_t idx = temp_idx(ts);
>      if (!test_bit(idx, temps_used->l)) {
> -        struct tcg_temp_info *ti = &infos[idx];
> +        TempOptInfo *ti = &infos[idx];
>  
>          ts->state_ptr = ti;
>          ti->next_copy = ts;
> @@ -114,7 +114,7 @@ static void init_ts_info(struct tcg_temp_info *infos,
>      }
>  }
>  
> -static void init_arg_info(struct tcg_temp_info *infos,
> +static void init_arg_info(TempOptInfo *infos,
>                            TCGTempSet *temps_used, TCGArg arg)
>  {
>      init_ts_info(infos, temps_used, arg_temp(arg));
> @@ -177,7 +177,7 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg val)
>      const TCGOpDef *def;
>      TCGOpcode new_op;
>      tcg_target_ulong mask;
> -    struct tcg_temp_info *di = arg_info(dst);
> +    TempOptInfo *di = arg_info(dst);
>  
>      def = &tcg_op_defs[op->opc];
>      if (def->flags & TCG_OPF_VECTOR) {
> @@ -208,8 +208,8 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
>      TCGTemp *dst_ts = arg_temp(dst);
>      TCGTemp *src_ts = arg_temp(src);
>      const TCGOpDef *def;
> -    struct tcg_temp_info *di;
> -    struct tcg_temp_info *si;
> +    TempOptInfo *di;
> +    TempOptInfo *si;
>      tcg_target_ulong mask;
>      TCGOpcode new_op;
>  
> @@ -242,7 +242,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
>      di->mask = mask;
>  
>      if (src_ts->type == dst_ts->type) {
> -        struct tcg_temp_info *ni = ts_info(si->next_copy);
> +        TempOptInfo *ni = ts_info(si->next_copy);
>  
>          di->next_copy = si->next_copy;
>          di->prev_copy = src_ts;
> @@ -605,7 +605,7 @@ void tcg_optimize(TCGContext *s)
>  {
>      int nb_temps, nb_globals;
>      TCGOp *op, *op_next, *prev_mb = NULL;
> -    struct tcg_temp_info *infos;
> +    TempOptInfo *infos;
>      TCGTempSet temps_used;
>  
>      /* Array VALS has an element for each temp.
> @@ -616,7 +616,7 @@ void tcg_optimize(TCGContext *s)
>      nb_temps = s->nb_temps;
>      nb_globals = s->nb_globals;
>      bitmap_zero(temps_used.l, nb_temps);
> -    infos = tcg_malloc(sizeof(struct tcg_temp_info) * nb_temps);
> +    infos = tcg_malloc(sizeof(TempOptInfo) * nb_temps);
>  
>      QTAILQ_FOREACH_SAFE(op, &s->ops, link, op_next) {
>          tcg_target_ulong mask, partmask, affected;
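The prev_copy/next_copy fields of TempOptInfo form a circular doubly linked list of temps known to hold the same value. reset_ts() unlinks a temp, and tcg_opt_gen_mov() splices the destination in right after the source. Below is a minimal standalone sketch of just that list discipline, under the assumption that nodes can be modelled as bare structs. CopyNode and the copy_*() helpers are hypothetical names, not the optimizer's real ones.

```c
/* Minimal model of TempOptInfo's copy tracking: each node sits on a
 * circular doubly linked list of temps holding the same value.
 * A node alone on its list points at itself. */
typedef struct CopyNode {
    struct CopyNode *prev_copy;
    struct CopyNode *next_copy;
} CopyNode;

/* Like init_ts_info(): a fresh temp is a copy only of itself. */
static void copy_init(CopyNode *n)
{
    n->prev_copy = n;
    n->next_copy = n;
}

/* Like ts_is_copy(): true if some other temp shares this value. */
static int copy_is_copy(const CopyNode *n)
{
    return n->next_copy != n;
}

/* Like reset_ts(): unlink n from its list and make it a singleton. */
static void copy_reset(CopyNode *n)
{
    n->next_copy->prev_copy = n->prev_copy;
    n->prev_copy->next_copy = n->next_copy;
    copy_init(n);
}

/* Like the tail of tcg_opt_gen_mov(): after "dst = src", splice dst
 * in right after src.  dst must already be alone on its own list. */
static void copy_add_after(CopyNode *src, CopyNode *dst)
{
    dst->next_copy = src->next_copy;
    dst->prev_copy = src;
    src->next_copy->prev_copy = dst;
    src->next_copy = dst;
}
```

Because the list is circular, both unlinking and splicing are O(1) and need no special cases for the head or tail.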


-- 
Alex Bennée



* Re: [PATCH v2 17/36] tcg/optimize: Adjust TempOptInfo allocation
  2020-04-22  1:17 ` [PATCH v2 17/36] tcg/optimize: Adjust TempOptInfo allocation Richard Henderson
@ 2020-04-22 17:53   ` Alex Bennée
  2020-04-22 18:28     ` Alex Bennée
  0 siblings, 1 reply; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 17:53 UTC
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Do not allocate a large block for indexing.  Instead, allocate
> for each temporary as they are seen.
>
> In general, this will use less memory, if we consider that most
> TBs do not touch every target register.  This also allows us to
> allocate TempOptInfo for new temps created during optimization.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  tcg/optimize.c | 60 ++++++++++++++++++++++++++++----------------------
>  1 file changed, 34 insertions(+), 26 deletions(-)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index b86bf3d707..d36d7e1d7f 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -89,35 +89,41 @@ static void reset_temp(TCGArg arg)
>  }
>  
>  /* Initialize and activate a temporary.  */
> -static void init_ts_info(TempOptInfo *infos,
> -                         TCGTempSet *temps_used, TCGTemp *ts)
> +static void init_ts_info(TCGTempSet *temps_used, TCGTemp *ts)
>  {
>      size_t idx = temp_idx(ts);
> -    if (!test_bit(idx, temps_used->l)) {
> -        TempOptInfo *ti = &infos[idx];
> +    TempOptInfo *ti;
>  
> +    if (test_bit(idx, temps_used->l)) {
> +        return;
> +    }
> +    set_bit(idx, temps_used->l);
> +
> +    ti = ts->state_ptr;
> +    if (ti == NULL) {
> +        ti = tcg_malloc(sizeof(TempOptInfo));
>          ts->state_ptr = ti;
> -        ti->next_copy = ts;
> -        ti->prev_copy = ts;
> -        if (ts->kind == TEMP_CONST) {
> -            ti->is_const = true;
> -            ti->val = ti->mask = ts->val;
> -            if (TCG_TARGET_REG_BITS > 32 && ts->type == TCG_TYPE_I32) {
> -                /* High bits of a 32-bit quantity are garbage.  */
> -                ti->mask |= ~0xffffffffull;
> -            }
> -        } else {
> -            ti->is_const = false;
> -            ti->mask = -1;
> +    }
> +
> +    ti->next_copy = ts;
> +    ti->prev_copy = ts;
> +    if (ts->kind == TEMP_CONST) {
> +        ti->is_const = true;
> +        ti->val = ts->val;
> +        ti->mask = ts->val;
> +        if (TCG_TARGET_REG_BITS > 32 && ts->type == TCG_TYPE_I32) {
> +            /* High bits of a 32-bit quantity are garbage.  */
> +            ti->mask |= ~0xffffffffull;
>          }
> -        set_bit(idx, temps_used->l);
> +    } else {
> +        ti->is_const = false;
> +        ti->mask = -1;
>      }
>  }
>  
> -static void init_arg_info(TempOptInfo *infos,
> -                          TCGTempSet *temps_used, TCGArg arg)
> +static void init_arg_info(TCGTempSet *temps_used, TCGArg arg)
>  {
> -    init_ts_info(infos, temps_used, arg_temp(arg));
> +    init_ts_info(temps_used, arg_temp(arg));
>  }
>  
>  static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
> @@ -603,9 +609,8 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
>  /* Propagate constants and copies, fold constant expressions. */
>  void tcg_optimize(TCGContext *s)
>  {
> -    int nb_temps, nb_globals;
> +    int nb_temps, nb_globals, i;
>      TCGOp *op, *op_next, *prev_mb = NULL;
> -    TempOptInfo *infos;
>      TCGTempSet temps_used;
>  
>      /* Array VALS has an element for each temp.
> @@ -615,12 +620,15 @@ void tcg_optimize(TCGContext *s)
>  
>      nb_temps = s->nb_temps;
>      nb_globals = s->nb_globals;
> +
>      bitmap_zero(temps_used.l, nb_temps);
> -    infos = tcg_malloc(sizeof(TempOptInfo) * nb_temps);
> +    for (i = 0; i < nb_temps; ++i) {
> +        s->temps[i].state_ptr = NULL;
> +    }
>  
>      QTAILQ_FOREACH_SAFE(op, &s->ops, link, op_next) {
>          tcg_target_ulong mask, partmask, affected;
> -        int nb_oargs, nb_iargs, i;
> +        int nb_oargs, nb_iargs;
>          TCGArg tmp;
>          TCGOpcode opc = op->opc;
>          const TCGOpDef *def = &tcg_op_defs[opc];
> @@ -633,14 +641,14 @@ void tcg_optimize(TCGContext *s)
>              for (i = 0; i < nb_oargs + nb_iargs; i++) {
>                  TCGTemp *ts = arg_temp(op->args[i]);
>                  if (ts) {
> -                    init_ts_info(infos, &temps_used, ts);
> +                    init_ts_info(&temps_used, ts);
>                  }
>              }
>          } else {
>              nb_oargs = def->nb_oargs;
>              nb_iargs = def->nb_iargs;
>              for (i = 0; i < nb_oargs + nb_iargs; i++) {
> -                init_arg_info(infos, &temps_used, op->args[i]);
> +                init_arg_info(&temps_used, op->args[i]);
>              }
>          }
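The new init_ts_info() allocates a temp's info on first sight, using ts->state_ptr as the cache and the temps_used bitmap to avoid re-initializing within one pass. The sketch below is a simplified model, assuming a 64-bit host. It uses malloc() where TCG uses its per-TB tcg_malloc() pool, so the caller frees. Temp, OptInfo and init_temp_info() are hypothetical names. It also shows why the mask of an i32 constant gets its high 32 bits forced on.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    bool is_const;
    uint64_t val;
    uint64_t mask;        /* bits that may be nonzero */
} OptInfo;

typedef struct {
    bool is_const_temp;   /* stands in for ts->kind == TEMP_CONST */
    bool is_i32;          /* stands in for ts->type == TCG_TYPE_I32 */
    uint64_t val;
    OptInfo *state_ptr;   /* NULL until first use */
} Temp;

/* Return the info for ts, allocating it lazily.  used[] plays the role
 * of the temps_used bitmap in tcg_optimize(). */
static OptInfo *init_temp_info(Temp *ts, bool *used, size_t idx)
{
    OptInfo *ti = ts->state_ptr;

    if (used[idx]) {
        return ti;                  /* already initialized this pass */
    }
    used[idx] = true;

    if (ti == NULL) {
        ti = malloc(sizeof(*ti));
        if (ti == NULL) {
            abort();
        }
        ts->state_ptr = ti;
    }

    if (ts->is_const_temp) {
        ti->is_const = true;
        ti->val = ts->val;
        ti->mask = ts->val;
        if (ts->is_i32) {
            /* High bits of a 32-bit quantity are garbage on a 64-bit
             * host, so treat them as possibly nonzero. */
            ti->mask |= ~(uint64_t)0xffffffffu;
        }
    } else {
        ti->is_const = false;
        ti->mask = UINT64_MAX;      /* nothing is known */
    }
    return ti;
}
```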


-- 
Alex Bennée



* Re: [PATCH v2 13/36] tcg: Use tcg_constant_{i32,i64} with tcg int expanders
  2020-04-22 17:02     ` Richard Henderson
@ 2020-04-22 17:57       ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 17:57 UTC
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> On 4/22/20 9:18 AM, Alex Bennée wrote:
>>>  void tcg_gen_brcondi_i64(TCGCond cond, TCGv_i64 arg1, int64_t arg2, TCGLabel *l)
>>>  {
>>> -    if (cond == TCG_COND_ALWAYS) {
>>> +    if (TCG_TARGET_REG_BITS == 64) {
>>> +        tcg_gen_brcond_i64(cond, arg1, tcg_constant_i64(arg2), l);
>>> +    } else if (cond == TCG_COND_ALWAYS) {
>>>          tcg_gen_br(l);
>>>      } else if (cond != TCG_COND_NEVER) {
>>> -        TCGv_i64 t0 = tcg_const_i64(arg2);
>>> -        tcg_gen_brcond_i64(cond, arg1, t0, l);
>>> -        tcg_temp_free_i64(t0);
>>> +        l->refs++;
>> 
>> Hmm is this a separate fix?
>
> No, it's expanding what tcg_gen_brcond_i64 would do for TCG_TARGET_REG_BITS == 32.
>
>>> +        tcg_gen_op6ii_i32(INDEX_op_brcond2_i32,
>>> +                          TCGV_LOW(arg1), TCGV_HIGH(arg1),
>>> +                          tcg_constant_i32(arg2),
>>> +                          tcg_constant_i32(arg2 >> 32),
>>> +                          cond, label_arg(l));
>
> Because we have two separate TCGv_i32, from tcg_constant_i32(), which cannot be
> packaged up with TCGV_HIGH/LOW.
>
>
> r~

OK I see that now - the r-b stands ;-)
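On a 32-bit host, the brcondi_i64 expansion discussed above hands brcond2_i32 the two halves of arg1 and two separate constant halves of arg2. The sketch below shows what such a comparison of 64-bit values given as 32-bit halves has to compute. It assumes a hypothetical four-entry Cond enum, which is smaller than TCG's real TCGCond, and a hypothetical helper cond2_holds(). The key point is that the high halves decide, and the low halves are always compared unsigned.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of condition codes (TCGCond has more). */
typedef enum { COND_EQ, COND_NE, COND_LT, COND_LTU } Cond;

/* Evaluate "(ah:al) c (bh:bl)" where each operand is a 64-bit value
 * split into high and low 32-bit halves. */
static bool cond2_holds(Cond c, uint32_t al, uint32_t ah,
                        uint32_t bl, uint32_t bh)
{
    switch (c) {
    case COND_EQ:
        return al == bl && ah == bh;
    case COND_NE:
        return al != bl || ah != bh;
    case COND_LT:
        /* Signed: the high halves carry the sign and are compared as
         * signed; on a tie the low halves are compared as unsigned. */
        if (ah != bh) {
            return (int32_t)ah < (int32_t)bh;
        }
        return al < bl;
    case COND_LTU:
        if (ah != bh) {
            return ah < bh;
        }
        return al < bl;
    }
    return false;
}
```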

-- 
Alex Bennée



* Re: [PATCH v2 17/36] tcg/optimize: Adjust TempOptInfo allocation
  2020-04-22 17:53   ` Alex Bennée
@ 2020-04-22 18:28     ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 18:28 UTC
  To: Richard Henderson; +Cc: qemu-devel


Alex Bennée <alex.bennee@linaro.org> writes:

> Richard Henderson <richard.henderson@linaro.org> writes:
>
>> Do not allocate a large block for indexing.  Instead, allocate
>> for each temporary as they are seen.
>>
>> In general, this will use less memory, if we consider that most
>> TBs do not touch every target register.  This also allows us to
>> allocate TempOptInfo for new temps created during optimization.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
<snip>
>>  
>> -static void init_arg_info(TempOptInfo *infos,
>> -                          TCGTempSet *temps_used, TCGArg arg)
>> +static void init_arg_info(TCGTempSet *temps_used, TCGArg arg)
>>  {
>> -    init_ts_info(infos, temps_used, arg_temp(arg));
>> +    init_ts_info(temps_used, arg_temp(arg));
>>  }

Although I've noticed that this function is only called once, whereas
other callers use init_ts_info directly. Any reason to keep it around?

-- 
Alex Bennée



* Re: [PATCH v2 18/36] tcg/optimize: Use tcg_constant_internal with constant folding
  2020-04-22  1:17 ` [PATCH v2 18/36] tcg/optimize: Use tcg_constant_internal with constant folding Richard Henderson
@ 2020-04-22 18:28   ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 18:28 UTC
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH v2 19/36] tcg/tci: Add special tci_movi_{i32,i64} opcodes
  2020-04-22  1:17 ` [PATCH v2 19/36] tcg/tci: Add special tci_movi_{i32,i64} opcodes Richard Henderson
@ 2020-04-22 19:02   ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 19:02 UTC
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> The normal movi opcodes are going away.  We need something
> for TCI to use internally.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  include/tcg/tcg-opc.h    | 8 ++++++++
>  tcg/tci.c                | 4 ++--
>  tcg/tci/tcg-target.inc.c | 4 ++--
>  3 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
> index 9288a04946..7dee9b38f7 100644
> --- a/include/tcg/tcg-opc.h
> +++ b/include/tcg/tcg-opc.h
> @@ -268,6 +268,14 @@ DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT)
>  #include "tcg-target.opc.h"
>  #endif
>  
> +#ifdef TCG_TARGET_INTERPRETER
> +/* These opcodes are only for use between the tci generator and interpreter. */
> +DEF(tci_movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
> +#if TCG_TARGET_REG_BITS == 64
> +DEF(tci_movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
> +#endif
> +#endif
> +
>  #undef TLADDR_ARGS
>  #undef DATA64_ARGS
>  #undef IMPL
> diff --git a/tcg/tci.c b/tcg/tci.c
> index 46fe9ce63f..a6c1aaf5af 100644
> --- a/tcg/tci.c
> +++ b/tcg/tci.c
> @@ -576,7 +576,7 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
>              t1 = tci_read_r32(regs, &tb_ptr);
>              tci_write_reg32(regs, t0, t1);
>              break;
> -        case INDEX_op_movi_i32:
> +        case INDEX_op_tci_movi_i32:
>              t0 = *tb_ptr++;
>              t1 = tci_read_i32(&tb_ptr);
>              tci_write_reg32(regs, t0, t1);
> @@ -847,7 +847,7 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
>              t1 = tci_read_r64(regs, &tb_ptr);
>              tci_write_reg64(regs, t0, t1);
>              break;
> -        case INDEX_op_movi_i64:
> +        case INDEX_op_tci_movi_i64:
>              t0 = *tb_ptr++;
>              t1 = tci_read_i64(&tb_ptr);
>              tci_write_reg64(regs, t0, t1);
> diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
> index 992d50cb1e..1f1639df0d 100644
> --- a/tcg/tci/tcg-target.inc.c
> +++ b/tcg/tci/tcg-target.inc.c
> @@ -530,13 +530,13 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
>      uint8_t *old_code_ptr = s->code_ptr;
>      uint32_t arg32 = arg;
>      if (type == TCG_TYPE_I32 || arg == arg32) {
> -        tcg_out_op_t(s, INDEX_op_movi_i32);
> +        tcg_out_op_t(s, INDEX_op_tci_movi_i32);
>          tcg_out_r(s, t0);
>          tcg_out32(s, arg32);
>      } else {
>          tcg_debug_assert(type == TCG_TYPE_I64);
>  #if TCG_TARGET_REG_BITS == 64
> -        tcg_out_op_t(s, INDEX_op_movi_i64);
> +        tcg_out_op_t(s, INDEX_op_tci_movi_i64);
>          tcg_out_r(s, t0);
>          tcg_out64(s, arg);
>  #else
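The TCI tcg_out_movi() above picks the short tci_movi_i32 form whenever the type is I32 or the value round-trips through uint32_t (arg == arg32), and falls back to tci_movi_i64 otherwise. The sketch below shows only that size decision, assuming a 64-bit host. The opcode byte values and emit_movi() are hypothetical. The real encoder also does some bookkeeping (note old_code_ptr in the diff) that this sketch leaves out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcode bytes; the real values come from tcg-opc.h. */
enum { OP_MOVI_I32 = 1, OP_MOVI_I64 = 2 };

/* Emit "reg = arg" into buf and return the number of bytes written.
 * A 4-byte immediate is used when the type is i32 or when arg is
 * exactly representable as a zero-extended uint32_t. */
static size_t emit_movi(uint8_t *buf, uint8_t reg, int64_t arg,
                        bool type_is_i32)
{
    uint32_t arg32 = (uint32_t)arg;
    size_t n = 0;

    if (type_is_i32 || arg == (int64_t)arg32) {
        buf[n++] = OP_MOVI_I32;
        buf[n++] = reg;
        memcpy(buf + n, &arg32, sizeof(arg32));   /* host byte order */
        n += sizeof(arg32);
    } else {
        buf[n++] = OP_MOVI_I64;
        buf[n++] = reg;
        memcpy(buf + n, &arg, sizeof(arg));
        n += sizeof(arg);
    }
    return n;
}
```

Note that -1 as an I64 value does not take the short form: zero-extending 0xffffffff gives 4294967295, not -1.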


-- 
Alex Bennée



* Re: [PATCH v2 20/36] tcg: Remove movi and dupi opcodes
  2020-04-22  1:17 ` [PATCH v2 20/36] tcg: Remove movi and dupi opcodes Richard Henderson
  2020-04-22  9:12   ` Aleksandar Markovic
@ 2020-04-22 19:03   ` Alex Bennée
  1 sibling, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 19:03 UTC
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> These are now completely covered by mov from a
> TYPE_CONST temporary.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  include/tcg/tcg-opc.h        |  3 ---
>  tcg/aarch64/tcg-target.inc.c |  3 ---
>  tcg/arm/tcg-target.inc.c     |  1 -
>  tcg/i386/tcg-target.inc.c    |  3 ---
>  tcg/mips/tcg-target.inc.c    |  2 --
>  tcg/optimize.c               |  4 ----
>  tcg/ppc/tcg-target.inc.c     |  3 ---
>  tcg/riscv/tcg-target.inc.c   |  2 --
>  tcg/s390/tcg-target.inc.c    |  2 --
>  tcg/sparc/tcg-target.inc.c   |  2 --
>  tcg/tcg-op-vec.c             |  1 -
>  tcg/tcg.c                    | 18 +-----------------
>  tcg/tci/tcg-target.inc.c     |  2 --
>  13 files changed, 1 insertion(+), 45 deletions(-)
>
> diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
> index 7dee9b38f7..4a9cbf5426 100644
> --- a/include/tcg/tcg-opc.h
> +++ b/include/tcg/tcg-opc.h
> @@ -45,7 +45,6 @@ DEF(br, 0, 0, 1, TCG_OPF_BB_END)
>  DEF(mb, 0, 0, 1, 0)
>  
>  DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
> -DEF(movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
>  DEF(setcond_i32, 1, 2, 1, 0)
>  DEF(movcond_i32, 1, 4, 1, IMPL(TCG_TARGET_HAS_movcond_i32))
>  /* load/store */
> @@ -110,7 +109,6 @@ DEF(ctz_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_ctz_i32))
>  DEF(ctpop_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_ctpop_i32))
>  
>  DEF(mov_i64, 1, 1, 0, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
> -DEF(movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
>  DEF(setcond_i64, 1, 2, 1, IMPL64)
>  DEF(movcond_i64, 1, 4, 1, IMPL64 | IMPL(TCG_TARGET_HAS_movcond_i64))
>  /* load/store */
> @@ -215,7 +213,6 @@ DEF(qemu_st_i64, 0, TLADDR_ARGS + DATA64_ARGS, 1,
>  #define IMPLVEC  TCG_OPF_VECTOR | IMPL(TCG_TARGET_MAYBE_vec)
>  
>  DEF(mov_vec, 1, 1, 0, TCG_OPF_VECTOR | TCG_OPF_NOT_PRESENT)
> -DEF(dupi_vec, 1, 0, 1, TCG_OPF_VECTOR | TCG_OPF_NOT_PRESENT)
>  
>  DEF(dup_vec, 1, 1, 0, IMPLVEC)
>  DEF(dup2_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_REG_BITS == 32))
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index 843fd0ca69..7918aeb9d5 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -2261,8 +2261,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>  
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
> -    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> -    case INDEX_op_movi_i64:
>      case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
>      default:
>          g_assert_not_reached();
> @@ -2467,7 +2465,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
>          break;
>  
>      case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
> -    case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
>      case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
>      default:
>          g_assert_not_reached();
> diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
> index 6aa7757aac..b967499fa4 100644
> --- a/tcg/arm/tcg-target.inc.c
> +++ b/tcg/arm/tcg-target.inc.c
> @@ -2068,7 +2068,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>  
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
> -    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
>      case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
>      default:
>          tcg_abort();
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index ec083bddcf..320a4bddd1 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -2678,8 +2678,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
> -    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> -    case INDEX_op_movi_i64:
>      case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
>      default:
>          tcg_abort();
> @@ -2965,7 +2963,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
>          break;
>  
>      case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
> -    case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
>      case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
>      default:
>          g_assert_not_reached();
> diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
> index 4d32ebc1df..09dc5a94fa 100644
> --- a/tcg/mips/tcg-target.inc.c
> +++ b/tcg/mips/tcg-target.inc.c
> @@ -2155,8 +2155,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
> -    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> -    case INDEX_op_movi_i64:
>      case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
>      default:
>          tcg_abort();
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index dd5187be31..9a2c945dbe 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -1099,10 +1099,6 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64_VEC(mov):
>              tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
>              break;
> -        CASE_OP_32_64(movi):
> -        case INDEX_op_dupi_vec:
> -            tcg_opt_gen_movi(s, &temps_used, op, op->args[0], op->args[1]);
> -            break;
>  
>          case INDEX_op_dup_vec:
>              if (arg_is_const(op->args[1])) {
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index ee1f9227c1..fb390ad978 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -2967,8 +2967,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>  
>      case INDEX_op_mov_i32:   /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
> -    case INDEX_op_movi_i32:  /* Always emitted via tcg_out_movi.  */
> -    case INDEX_op_movi_i64:
>      case INDEX_op_call:      /* Always emitted via tcg_out_call.  */
>      default:
>          tcg_abort();
> @@ -3310,7 +3308,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
>          return;
>  
>      case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
> -    case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
>      case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
>      default:
>          g_assert_not_reached();
> diff --git a/tcg/riscv/tcg-target.inc.c b/tcg/riscv/tcg-target.inc.c
> index 2bc0ba71f2..ec609272ad 100644
> --- a/tcg/riscv/tcg-target.inc.c
> +++ b/tcg/riscv/tcg-target.inc.c
> @@ -1606,8 +1606,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>  
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
> -    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> -    case INDEX_op_movi_i64:
>      case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
>      default:
>          g_assert_not_reached();
> diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
> index b07e9ff7d6..f6b003a700 100644
> --- a/tcg/s390/tcg-target.inc.c
> +++ b/tcg/s390/tcg-target.inc.c
> @@ -2310,8 +2310,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>  
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
> -    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> -    case INDEX_op_movi_i64:
>      case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
>      default:
>          tcg_abort();
> diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
> index 65fddb310d..0808b79eee 100644
> --- a/tcg/sparc/tcg-target.inc.c
> +++ b/tcg/sparc/tcg-target.inc.c
> @@ -1591,8 +1591,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>  
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
> -    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> -    case INDEX_op_movi_i64:
>      case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
>      default:
>          tcg_abort();
> diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
> index 655b3ae32d..6343046e18 100644
> --- a/tcg/tcg-op-vec.c
> +++ b/tcg/tcg-op-vec.c
> @@ -83,7 +83,6 @@ bool tcg_can_emit_vecop_list(const TCGOpcode *list,
>          case INDEX_op_xor_vec:
>          case INDEX_op_mov_vec:
>          case INDEX_op_dup_vec:
> -        case INDEX_op_dupi_vec:
>          case INDEX_op_dup2_vec:
>          case INDEX_op_ld_vec:
>          case INDEX_op_st_vec:
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 59beb2bf29..adb71f16ae 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1463,7 +1463,6 @@ bool tcg_op_supported(TCGOpcode op)
>          return TCG_TARGET_HAS_goto_ptr;
>  
>      case INDEX_op_mov_i32:
> -    case INDEX_op_movi_i32:
>      case INDEX_op_setcond_i32:
>      case INDEX_op_brcond_i32:
>      case INDEX_op_ld8u_i32:
> @@ -1557,7 +1556,6 @@ bool tcg_op_supported(TCGOpcode op)
>          return TCG_TARGET_REG_BITS == 32;
>  
>      case INDEX_op_mov_i64:
> -    case INDEX_op_movi_i64:
>      case INDEX_op_setcond_i64:
>      case INDEX_op_brcond_i64:
>      case INDEX_op_ld8u_i64:
> @@ -1663,7 +1661,6 @@ bool tcg_op_supported(TCGOpcode op)
>  
>      case INDEX_op_mov_vec:
>      case INDEX_op_dup_vec:
> -    case INDEX_op_dupi_vec:
>      case INDEX_op_dupm_vec:
>      case INDEX_op_ld_vec:
>      case INDEX_op_st_vec:
> @@ -3447,7 +3444,7 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
>  }
>  
>  /*
> - * Specialized code generation for INDEX_op_movi_*.
> + * Specialized code generation for INDEX_op_mov_* with a constant.
>   */
>  static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
>                                    tcg_target_ulong val, TCGLifeData arg_life,
> @@ -3470,14 +3467,6 @@ static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
>      }
>  }
>  
> -static void tcg_reg_alloc_movi(TCGContext *s, const TCGOp *op)
> -{
> -    TCGTemp *ots = arg_temp(op->args[0]);
> -    tcg_target_ulong val = op->args[1];
> -
> -    tcg_reg_alloc_do_movi(s, ots, val, op->life, op->output_pref[0]);
> -}
> -
>  /*
>   * Specialized code generation for INDEX_op_mov_*.
>   */
> @@ -4263,11 +4252,6 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
>          case INDEX_op_mov_vec:
>              tcg_reg_alloc_mov(s, op);
>              break;
> -        case INDEX_op_movi_i32:
> -        case INDEX_op_movi_i64:
> -        case INDEX_op_dupi_vec:
> -            tcg_reg_alloc_movi(s, op);
> -            break;
>          case INDEX_op_dup_vec:
>              tcg_reg_alloc_dup(s, op);
>              break;
> diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
> index 1f1639df0d..b796f4fc19 100644
> --- a/tcg/tci/tcg-target.inc.c
> +++ b/tcg/tci/tcg-target.inc.c
> @@ -815,8 +815,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>          break;
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
> -    case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> -    case INDEX_op_movi_i64:
>      case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
>      default:
>          tcg_abort();
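Dropping movi and dupi works because a constant is now an ordinary (read-only) temporary, so "mov dst, const_temp" carries the same information. Asking twice for the same constant should hand back the same temp. The sketch below models that interning with a fixed-size linear table. It assumes a hypothetical ConstPool/pool_constant() API and is not how tcg_constant_i32() is implemented.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { TYPE_I32, TYPE_I64 } Type;

/* One interned constant: its type and value identify it uniquely. */
typedef struct {
    Type type;
    int64_t val;
} ConstTemp;

#define MAX_CONSTS 64

typedef struct {
    ConstTemp temps[MAX_CONSTS];
    size_t n;
} ConstPool;

/* Return the temp for (type, val), creating it on first request.
 * Linear search keeps the sketch short; a real implementation would
 * likely want a hash table.  Returns NULL when the pool is full. */
static ConstTemp *pool_constant(ConstPool *p, Type type, int64_t val)
{
    for (size_t i = 0; i < p->n; i++) {
        if (p->temps[i].type == type && p->temps[i].val == val) {
            return &p->temps[i];
        }
    }
    if (p->n == MAX_CONSTS) {
        return NULL;
    }
    p->temps[p->n].type = type;
    p->temps[p->n].val = val;
    return &p->temps[p->n++];
}
```

Keying on the type as well as the value keeps an i32 42 and an i64 42 distinct, since they live in different kinds of registers.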


-- 
Alex Bennée



* Re: [PATCH v2 21/36] tcg: Use tcg_out_dupi_vec from temp_load
  2020-04-22  1:17 ` [PATCH v2 21/36] tcg: Use tcg_out_dupi_vec from temp_load Richard Henderson
@ 2020-04-22 19:28   ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 19:28 UTC
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Having dupi pass through movi is confusing and arguably wrong.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
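The i386 part of this patch splits tcg_out_movi() by register class: numbers below 16 are general-purpose registers, and the rest go to the new tcg_out_movi_vec(). tcg_out_movi_vec() special-cases 0 (PXOR reg, reg) and -1 (PCMPEQB reg, reg, which sets every byte to 0xff) and loads anything else from the constant pool. The sketch below captures only that decision logic. The enum and both helper names are hypothetical.

```c
#include <stdint.h>

/* How a scalar constant is materialized in an x86 vector register. */
typedef enum {
    VEC_CONST_PXOR,     /* x ^ x == 0, no memory access needed  */
    VEC_CONST_PCMPEQB,  /* x == x in every byte gives all-ones  */
    VEC_CONST_POOL,     /* anything else: MOVD/MOVQ from pool   */
} VecConstStrategy;

static VecConstStrategy movi_vec_strategy(int64_t arg)
{
    if (arg == 0) {
        return VEC_CONST_PXOR;
    }
    if (arg == -1) {
        return VEC_CONST_PCMPEQB;
    }
    return VEC_CONST_POOL;
}

/* Register-class dispatch used by the new tcg_out_movi(): register
 * numbers 0..15 are integer registers, 16 and up are vector ones. */
static int movi_uses_vector_path(int reg)
{
    return reg >= 16;
}
```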

> ---
>  tcg/aarch64/tcg-target.inc.c |  7 ----
>  tcg/i386/tcg-target.inc.c    | 63 ++++++++++++++++++++++++------------
>  tcg/ppc/tcg-target.inc.c     |  6 ----
>  tcg/tcg.c                    |  8 ++++-
>  4 files changed, 49 insertions(+), 35 deletions(-)
>
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index 7918aeb9d5..e5c9ab70a9 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -1009,13 +1009,6 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
>      case TCG_TYPE_I64:
>          tcg_debug_assert(rd < 32);
>          break;
> -
> -    case TCG_TYPE_V64:
> -    case TCG_TYPE_V128:
> -        tcg_debug_assert(rd >= 32);
> -        tcg_out_dupi_vec(s, type, rd, value);
> -        return;
> -
>      default:
>          g_assert_not_reached();
>      }
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index 320a4bddd1..07424f7ef9 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -977,30 +977,32 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
>      }
>  }
>  
> -static void tcg_out_movi(TCGContext *s, TCGType type,
> -                         TCGReg ret, tcg_target_long arg)
> +static void tcg_out_movi_vec(TCGContext *s, TCGType type,
> +                             TCGReg ret, tcg_target_long arg)
> +{
> +    if (arg == 0) {
> +        tcg_out_vex_modrm(s, OPC_PXOR, ret, ret, ret);
> +        return;
> +    }
> +    if (arg == -1) {
> +        tcg_out_vex_modrm(s, OPC_PCMPEQB, ret, ret, ret);
> +        return;
> +    }
> +
> +    int rexw = (type == TCG_TYPE_I32 ? 0 : P_REXW);
> +    tcg_out_vex_modrm_pool(s, OPC_MOVD_VyEy + rexw, ret);
> +    if (TCG_TARGET_REG_BITS == 64) {
> +        new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
> +    } else {
> +        new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0);
> +    }
> +}
> +
> +static void tcg_out_movi_int(TCGContext *s, TCGType type,
> +                             TCGReg ret, tcg_target_long arg)
>  {
>      tcg_target_long diff;
>  
> -    switch (type) {
> -    case TCG_TYPE_I32:
> -#if TCG_TARGET_REG_BITS == 64
> -    case TCG_TYPE_I64:
> -#endif
> -        if (ret < 16) {
> -            break;
> -        }
> -        /* fallthru */
> -    case TCG_TYPE_V64:
> -    case TCG_TYPE_V128:
> -    case TCG_TYPE_V256:
> -        tcg_debug_assert(ret >= 16);
> -        tcg_out_dupi_vec(s, type, ret, arg);
> -        return;
> -    default:
> -        g_assert_not_reached();
> -    }
> -
>      if (arg == 0) {
>          tgen_arithr(s, ARITH_XOR, ret, ret);
>          return;
> @@ -1029,6 +1031,25 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
>      tcg_out64(s, arg);
>  }
>  
> +static void tcg_out_movi(TCGContext *s, TCGType type,
> +                         TCGReg ret, tcg_target_long arg)
> +{
> +    switch (type) {
> +    case TCG_TYPE_I32:
> +#if TCG_TARGET_REG_BITS == 64
> +    case TCG_TYPE_I64:
> +#endif
> +        if (ret < 16) {
> +            tcg_out_movi_int(s, type, ret, arg);
> +        } else {
> +            tcg_out_movi_vec(s, type, ret, arg);
> +        }
> +        break;
> +    default:
> +        g_assert_not_reached();
> +    }
> +}
> +
>  static inline void tcg_out_pushi(TCGContext *s, tcg_target_long val)
>  {
>      if (val == (int8_t)val) {
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index fb390ad978..7ab1e32064 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -987,12 +987,6 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret,
>          tcg_out_movi_int(s, type, ret, arg, false);
>          break;
>  
> -    case TCG_TYPE_V64:
> -    case TCG_TYPE_V128:
> -        tcg_debug_assert(ret >= TCG_REG_V0);
> -        tcg_out_dupi_vec(s, type, ret, arg);
> -        break;
> -
>      default:
>          g_assert_not_reached();
>      }
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index adb71f16ae..4f1ed1d2fe 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -3359,7 +3359,13 @@ static void temp_load(TCGContext *s, TCGTemp *ts, TCGRegSet desired_regs,
>      case TEMP_VAL_CONST:
>          reg = tcg_reg_alloc(s, desired_regs, allocated_regs,
>                              preferred_regs, ts->indirect_base);
> -        tcg_out_movi(s, ts->type, reg, ts->val);
> +        if (ts->type <= TCG_TYPE_I64) {
> +            tcg_out_movi(s, ts->type, reg, ts->val);
> +        } else if (TCG_TARGET_REG_BITS == 64) {
> +            tcg_out_dupi_vec(s, ts->type, reg, ts->val);
> +        } else {
> +            tcg_out_dupi_vec(s, ts->type, reg, dup_const(MO_32, ts->val));
> +        }
>          ts->mem_coherent = 0;
>          break;
>      case TEMP_VAL_MEM:


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 22/36] tcg: Increase tcg_out_dupi_vec immediate to int64_t
  2020-04-22  1:17 ` [PATCH v2 22/36] tcg: Increase tcg_out_dupi_vec immediate to int64_t Richard Henderson
@ 2020-04-22 19:33   ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 19:33 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> While we don't store more than tcg_target_long in TCGTemp,
> we shouldn't be limited to that for code generation.  We will
> be able to use this for INDEX_op_dup2_vec with 2 constants.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  tcg/aarch64/tcg-target.inc.c |  2 +-
>  tcg/i386/tcg-target.inc.c    | 20 ++++++++++++--------
>  tcg/ppc/tcg-target.inc.c     | 15 ++++++++-------
>  tcg/tcg.c                    |  4 ++--
>  4 files changed, 23 insertions(+), 18 deletions(-)
>
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index e5c9ab70a9..3b5a5d78c7 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -856,7 +856,7 @@ static void tcg_out_logicali(TCGContext *s, AArch64Insn insn, TCGType ext,
>  }
>  
>  static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
> -                             TCGReg rd, tcg_target_long v64)
> +                             TCGReg rd, int64_t v64)
>  {
>      bool q = type == TCG_TYPE_V128;
>      int cmode, imm8, i;
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index 07424f7ef9..9cb627d6eb 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -945,7 +945,7 @@ static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
>  }
>  
>  static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
> -                             TCGReg ret, tcg_target_long arg)
> +                             TCGReg ret, int64_t arg)
>  {
>      int vex_l = (type == TCG_TYPE_V256 ? P_VEXL : 0);
>  
> @@ -958,7 +958,14 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
>          return;
>      }
>  
> -    if (TCG_TARGET_REG_BITS == 64) {
> +    if (TCG_TARGET_REG_BITS == 32 && arg == dup_const(MO_32, arg)) {
> +        if (have_avx2) {
> +            tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTW + vex_l, ret);
> +        } else {
> +            tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSS, ret);
> +        }
> +        new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0);
> +    } else {
>          if (type == TCG_TYPE_V64) {
>              tcg_out_vex_modrm_pool(s, OPC_MOVQ_VqWq, ret);
>          } else if (have_avx2) {
> @@ -966,14 +973,11 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
>          } else {
>              tcg_out_vex_modrm_pool(s, OPC_MOVDDUP, ret);
>          }
> -        new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
> -    } else {
> -        if (have_avx2) {
> -            tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTW + vex_l, ret);
> +        if (TCG_TARGET_REG_BITS == 64) {
> +            new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
>          } else {
> -            tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSS, ret);
> +            new_pool_l2(s, R_386_32, s->code_ptr - 4, 0, arg, arg >> 32);
>          }
> -        new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0);
>      }
>  }
>  
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index 7ab1e32064..3333b55766 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -913,7 +913,7 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
>  }
>  
>  static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret,
> -                             tcg_target_long val)
> +                             int64_t val)
>  {
>      uint32_t load_insn;
>      int rel, low;
> @@ -921,20 +921,20 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret,
>  
>      low = (int8_t)val;
>      if (low >= -16 && low < 16) {
> -        if (val == (tcg_target_long)dup_const(MO_8, low)) {
> +        if (val == dup_const(MO_8, low)) {
>              tcg_out32(s, VSPLTISB | VRT(ret) | ((val & 31) << 16));
>              return;
>          }
> -        if (val == (tcg_target_long)dup_const(MO_16, low)) {
> +        if (val == dup_const(MO_16, low)) {
>              tcg_out32(s, VSPLTISH | VRT(ret) | ((val & 31) << 16));
>              return;
>          }
> -        if (val == (tcg_target_long)dup_const(MO_32, low)) {
> +        if (val == dup_const(MO_32, low)) {
>              tcg_out32(s, VSPLTISW | VRT(ret) | ((val & 31) << 16));
>              return;
>          }
>      }
> -    if (have_isa_3_00 && val == (tcg_target_long)dup_const(MO_8, val)) {
> +    if (have_isa_3_00 && val == dup_const(MO_8, val)) {
>          tcg_out32(s, XXSPLTIB | VRT(ret) | ((val & 0xff) << 11));
>          return;
>      }
> @@ -956,14 +956,15 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret,
>          if (TCG_TARGET_REG_BITS == 64) {
>              new_pool_label(s, val, rel, s->code_ptr, add);
>          } else {
> -            new_pool_l2(s, rel, s->code_ptr, add, val, val);
> +            new_pool_l2(s, rel, s->code_ptr, add, val >> 32, val);
>          }
>      } else {
>          load_insn = LVX | VRT(ret) | RB(TCG_REG_TMP1);
>          if (TCG_TARGET_REG_BITS == 64) {
>              new_pool_l2(s, rel, s->code_ptr, add, val, val);
>          } else {
> -            new_pool_l4(s, rel, s->code_ptr, add, val, val, val, val);
> +            new_pool_l4(s, rel, s->code_ptr, add,
> +                        val >> 32, val, val >> 32, val);
>          }
>      }
>  
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 4f1ed1d2fe..fc1c97d586 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -117,7 +117,7 @@ static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
>  static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
>                               TCGReg dst, TCGReg base, intptr_t offset);
>  static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
> -                             TCGReg dst, tcg_target_long arg);
> +                             TCGReg dst, int64_t arg);
>  static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, unsigned vecl,
>                             unsigned vece, const TCGArg *args,
>                             const int *const_args);
> @@ -133,7 +133,7 @@ static inline bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
>      g_assert_not_reached();
>  }
>  static inline void tcg_out_dupi_vec(TCGContext *s, TCGType type,
> -                                    TCGReg dst, tcg_target_long arg)
> +                                    TCGReg dst, int64_t arg)
>  {
>      g_assert_not_reached();
>  }


-- 
Alex Bennée



* Re: [PATCH v2 23/36] tcg: Add tcg_reg_alloc_dup2
  2020-04-22  1:17 ` [PATCH v2 23/36] tcg: Add tcg_reg_alloc_dup2 Richard Henderson
@ 2020-04-22 19:40   ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 19:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> There are several ways we can expand a vector dup of a 64-bit
> element on a 32-bit host.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/tcg.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 88 insertions(+)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index fc1c97d586..d712d19842 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -3870,6 +3870,91 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
>      }
>  }
>  
> +static void tcg_reg_alloc_dup2(TCGContext *s, const TCGOp *op)
> +{
> +    const TCGLifeData arg_life = op->life;
> +    TCGTemp *ots, *itsl, *itsh;
> +    TCGType vtype = TCGOP_VECL(op) + TCG_TYPE_V64;
> +
> +    /* This opcode is only valid for 32-bit hosts, for 64-bit elements. */
> +    tcg_debug_assert(TCG_TARGET_REG_BITS == 32);

Given this, maybe the whole function should be wrapped in an #if
TCG_TARGET_REG_BITS == 32 guard. Most of the other code that refers
to this has something similar.

Otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée
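
This reply does not quote the patch body past the assertion. What follows is therefore only one plausible way to handle the case where both 32-bit inputs of the dup are constants, not the patch's code. The two halves are combined into a 64-bit element. The sketch then picks the smallest element size whose replication reproduces that element, so the cheapest splat can be used. `combine_halves` and `smallest_vece` are hypothetical. The 0..3 encoding for 8- to 64-bit elements is assumed to match MO_8..MO_64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: rebuild the 64-bit element from its low and high halves,
 * as a 32-bit host must when both inputs are constants. */
static uint64_t combine_halves(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical: smallest log2 element size (0 = 8-bit ... 3 = 64-bit)
 * whose replication across 64 bits reproduces v. */
static unsigned smallest_vece(uint64_t v)
{
    if (v == 0x0101010101010101ull * (uint8_t)v) {
        return 0;
    }
    if (v == 0x0001000100010001ull * (uint16_t)v) {
        return 1;
    }
    if (v == 0x0000000100000001ull * (uint32_t)v) {
        return 2;
    }
    return 3;   /* genuinely needs a 64-bit element */
}
```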



* Re: [PATCH v2 24/36] tcg/i386: Use tcg_constant_vec with tcg vec expanders
  2020-04-22  1:17 ` [PATCH v2 24/36] tcg/i386: Use tcg_constant_vec with tcg vec expanders Richard Henderson
@ 2020-04-22 19:43   ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 19:43 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  tcg/i386/tcg-target.inc.c | 26 +++++++++++++-------------
>  1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index 9cb627d6eb..deace219d2 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -3452,7 +3452,7 @@ static void expand_vec_sari(TCGType type, unsigned vece,
>  static void expand_vec_mul(TCGType type, unsigned vece,
>                             TCGv_vec v0, TCGv_vec v1, TCGv_vec v2)
>  {
> -    TCGv_vec t1, t2, t3, t4;
> +    TCGv_vec t1, t2, t3, t4, zero;
>  
>      tcg_debug_assert(vece == MO_8);
>  
> @@ -3470,11 +3470,11 @@ static void expand_vec_mul(TCGType type, unsigned vece,
>      case TCG_TYPE_V64:
>          t1 = tcg_temp_new_vec(TCG_TYPE_V128);
>          t2 = tcg_temp_new_vec(TCG_TYPE_V128);
> -        tcg_gen_dup16i_vec(t2, 0);
> +        zero = tcg_constant_vec(TCG_TYPE_V128, MO_8, 0);
>          vec_gen_3(INDEX_op_x86_punpckl_vec, TCG_TYPE_V128, MO_8,
> -                  tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(t2));
> +                  tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(zero));
>          vec_gen_3(INDEX_op_x86_punpckl_vec, TCG_TYPE_V128, MO_8,
> -                  tcgv_vec_arg(t2), tcgv_vec_arg(t2), tcgv_vec_arg(v2));
> +                  tcgv_vec_arg(t2), tcgv_vec_arg(zero), tcgv_vec_arg(v2));
>          tcg_gen_mul_vec(MO_16, t1, t1, t2);
>          tcg_gen_shri_vec(MO_16, t1, t1, 8);
>          vec_gen_3(INDEX_op_x86_packus_vec, TCG_TYPE_V128, MO_8,
> @@ -3489,15 +3489,15 @@ static void expand_vec_mul(TCGType type, unsigned vece,
>          t2 = tcg_temp_new_vec(type);
>          t3 = tcg_temp_new_vec(type);
>          t4 = tcg_temp_new_vec(type);
> -        tcg_gen_dup16i_vec(t4, 0);
> +        zero = tcg_constant_vec(TCG_TYPE_V128, MO_8, 0);
>          vec_gen_3(INDEX_op_x86_punpckl_vec, type, MO_8,
> -                  tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(t4));
> +                  tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(zero));
>          vec_gen_3(INDEX_op_x86_punpckl_vec, type, MO_8,
> -                  tcgv_vec_arg(t2), tcgv_vec_arg(t4), tcgv_vec_arg(v2));
> +                  tcgv_vec_arg(t2), tcgv_vec_arg(zero), tcgv_vec_arg(v2));
>          vec_gen_3(INDEX_op_x86_punpckh_vec, type, MO_8,
> -                  tcgv_vec_arg(t3), tcgv_vec_arg(v1), tcgv_vec_arg(t4));
> +                  tcgv_vec_arg(t3), tcgv_vec_arg(v1), tcgv_vec_arg(zero));
>          vec_gen_3(INDEX_op_x86_punpckh_vec, type, MO_8,
> -                  tcgv_vec_arg(t4), tcgv_vec_arg(t4), tcgv_vec_arg(v2));
> +                  tcgv_vec_arg(t4), tcgv_vec_arg(zero), tcgv_vec_arg(v2));
>          tcg_gen_mul_vec(MO_16, t1, t1, t2);
>          tcg_gen_mul_vec(MO_16, t3, t3, t4);
>          tcg_gen_shri_vec(MO_16, t1, t1, 8);
> @@ -3525,7 +3525,7 @@ static bool expand_vec_cmp_noinv(TCGType type, unsigned vece, TCGv_vec v0,
>          NEED_UMIN = 8,
>          NEED_UMAX = 16,
>      };
> -    TCGv_vec t1, t2;
> +    TCGv_vec t1, t2, t3;
>      uint8_t fixup;
>  
>      switch (cond) {
> @@ -3596,9 +3596,9 @@ static bool expand_vec_cmp_noinv(TCGType type, unsigned vece, TCGv_vec v0,
>      } else if (fixup & NEED_BIAS) {
>          t1 = tcg_temp_new_vec(type);
>          t2 = tcg_temp_new_vec(type);
> -        tcg_gen_dupi_vec(vece, t2, 1ull << ((8 << vece) - 1));
> -        tcg_gen_sub_vec(vece, t1, v1, t2);
> -        tcg_gen_sub_vec(vece, t2, v2, t2);
> +        t3 = tcg_constant_vec(type, vece, 1ull << ((8 << vece) - 1));
> +        tcg_gen_sub_vec(vece, t1, v1, t3);
> +        tcg_gen_sub_vec(vece, t2, v2, t3);
>          v1 = t1;
>          v2 = t2;
>          cond = tcg_signed_cond(cond);


-- 
Alex Bennée
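
The MO_8 branch of expand_vec_mul() multiplies bytes by widening them into 16-bit lanes. The scalar model below follows one lane. It assumes x86 punpckl places its first source operand in the low byte of each 16-bit lane. The low 8 bits of a * b then land in bits 8..15 of the 16-bit product. The right shift by 8 brings them down, so packus never saturates. The patch now reads the zero vector from a separate `zero` operand instead of writing into t2/t4. Presumably this is because a constant temporary from tcg_constant_vec() is read-only (compare patch 10, "tcg: Add temp_readonly"). `mul8_via_16bit_lane` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of the MO_8 multiply expansion:
 *   punpckl(t1, v1, zero) -> 16-bit lane = a, zero-extended
 *   punpckl(t2, zero, v2) -> 16-bit lane = b << 8
 *   16-bit mul, shift right by 8, then packus back to bytes. */
static uint8_t mul8_via_16bit_lane(uint8_t a, uint8_t b)
{
    uint16_t t1 = a;                                  /* low byte a, high byte 0 */
    uint16_t t2 = (uint16_t)(b << 8);                 /* low byte 0, high byte b */
    uint16_t prod = (uint16_t)((uint32_t)t1 * t2);    /* keep low 16 bits only */
    uint16_t shifted = prod >> 8;                     /* (a * b) & 0xff, <= 255 */
    return (uint8_t)shifted;                          /* packus cannot saturate */
}
```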


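The NEED_BIAS fixup in expand_vec_cmp_noinv() turns an unsigned comparison into a signed one. It subtracts 1 << (element_bits - 1) from both operands, and that bias now comes from the constant temporary t3. Modulo 2^bits, the subtraction just flips the sign bit. That maps 0..255 onto -128..127 for bytes and preserves order. A minimal 8-bit scalar sketch follows (`ult8_via_signed` is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unsigned a < b using only a signed compare, modelling NEED_BIAS for
 * vece = MO_8. The conversion to int8_t is implementation-defined for
 * values above 127; GCC and Clang reduce modulo 256. */
static bool ult8_via_signed(uint8_t a, uint8_t b)
{
    const unsigned vece = 0;                                /* MO_8 */
    const uint8_t bias = (uint8_t)(1u << ((8 << vece) - 1)); /* 0x80 */
    int8_t sa = (int8_t)(uint8_t)(a - bias);
    int8_t sb = (int8_t)(uint8_t)(b - bias);
    return sa < sb;
}
```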

* Re: [PATCH v2 09/36] tcg: Consolidate 3 bits into enum TCGTempKind
  2020-04-22  1:16 ` [PATCH v2 09/36] tcg: Consolidate 3 bits into enum TCGTempKind Richard Henderson
  2020-04-22 11:25   ` Alex Bennée
@ 2020-04-22 19:58   ` Aleksandar Markovic
  2020-04-23  9:00     ` Philippe Mathieu-Daudé
  1 sibling, 1 reply; 75+ messages in thread
From: Aleksandar Markovic @ 2020-04-22 19:58 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Alex Bennée, QEMU Developers

On Wed, 22 Apr 2020 at 03:27, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The temp_fixed, temp_global, temp_local bits are all related.
> Combine them into a single enumeration.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/tcg/tcg.h |  20 +++++---
>  tcg/optimize.c    |   8 +--
>  tcg/tcg.c         | 122 ++++++++++++++++++++++++++++------------------
>  3 files changed, 90 insertions(+), 60 deletions(-)
>
> diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
> index c48bd76b0a..3534dce77f 100644
> --- a/include/tcg/tcg.h
> +++ b/include/tcg/tcg.h
> @@ -480,23 +480,27 @@ typedef enum TCGTempVal {
>      TEMP_VAL_CONST,
>  } TCGTempVal;
>
> +typedef enum TCGTempKind {
> +    /* Temp is dead at the end of all basic blocks. */
> +    TEMP_NORMAL,
> +    /* Temp is saved across basic blocks but dead at the end of TBs. */
> +    TEMP_LOCAL,
> +    /* Temp is saved across both basic blocks and translation blocks. */
> +    TEMP_GLOBAL,
> +    /* Temp is in a fixed register. */
> +    TEMP_FIXED,
> +} TCGTempKind;
> +
>  typedef struct TCGTemp {
>      TCGReg reg:8;
>      TCGTempVal val_type:8;
>      TCGType base_type:8;
>      TCGType type:8;
> -    unsigned int fixed_reg:1;
> +    TCGTempKind kind:3;
>      unsigned int indirect_reg:1;
>      unsigned int indirect_base:1;
>      unsigned int mem_coherent:1;
>      unsigned int mem_allocated:1;
> -    /* If true, the temp is saved across both basic blocks and
> -       translation blocks.  */
> -    unsigned int temp_global:1;
> -    /* If true, the temp is saved across basic blocks but dead
> -       at the end of translation blocks.  If false, the temp is
> -       dead at the end of basic blocks.  */
> -    unsigned int temp_local:1;
>      unsigned int temp_allocated:1;
>
>      tcg_target_long val;
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 53aa8e5329..afb4a9a5a9 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -116,21 +116,21 @@ static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
>      TCGTemp *i;
>
>      /* If this is already a global, we can't do better. */
> -    if (ts->temp_global) {
> +    if (ts->kind >= TEMP_GLOBAL) {
>          return ts;
>      }
>
>      /* Search for a global first. */
>      for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
> -        if (i->temp_global) {
> +        if (i->kind >= TEMP_GLOBAL) {
>              return i;
>          }
>      }
>
>      /* If it is a temp, search for a temp local. */
> -    if (!ts->temp_local) {
> +    if (ts->kind == TEMP_NORMAL) {
>          for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
> -            if (ts->temp_local) {
> +            if (i->kind >= TEMP_LOCAL) {
>                  return i;
>              }
>          }
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index dd4b3d7684..eaf81397a3 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1155,7 +1155,7 @@ static inline TCGTemp *tcg_global_alloc(TCGContext *s)
>      tcg_debug_assert(s->nb_globals == s->nb_temps);
>      s->nb_globals++;
>      ts = tcg_temp_alloc(s);
> -    ts->temp_global = 1;
> +    ts->kind = TEMP_GLOBAL;
>
>      return ts;
>  }
> @@ -1172,7 +1172,7 @@ static TCGTemp *tcg_global_reg_new_internal(TCGContext *s, TCGType type,
>      ts = tcg_global_alloc(s);
>      ts->base_type = type;
>      ts->type = type;
> -    ts->fixed_reg = 1;
> +    ts->kind = TEMP_FIXED;
>      ts->reg = reg;
>      ts->name = name;
>      tcg_regset_set_reg(s->reserved_regs, reg);
> @@ -1199,7 +1199,7 @@ TCGTemp *tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
>      bigendian = 1;
>  #endif
>
> -    if (!base_ts->fixed_reg) {
> +    if (base_ts->kind != TEMP_FIXED) {
>          /* We do not support double-indirect registers.  */
>          tcg_debug_assert(!base_ts->indirect_reg);
>          base_ts->indirect_base = 1;
> @@ -1247,6 +1247,7 @@ TCGTemp *tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
>  TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
>  {
>      TCGContext *s = tcg_ctx;
> +    TCGTempKind kind = temp_local ? TEMP_LOCAL : TEMP_NORMAL;
>      TCGTemp *ts;
>      int idx, k;
>
> @@ -1259,7 +1260,7 @@ TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
>          ts = &s->temps[idx];
>          ts->temp_allocated = 1;
>          tcg_debug_assert(ts->base_type == type);
> -        tcg_debug_assert(ts->temp_local == temp_local);
> +        tcg_debug_assert(ts->kind == kind);
>      } else {
>          ts = tcg_temp_alloc(s);
>          if (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64) {
> @@ -1268,18 +1269,18 @@ TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
>              ts->base_type = type;
>              ts->type = TCG_TYPE_I32;
>              ts->temp_allocated = 1;
> -            ts->temp_local = temp_local;
> +            ts->kind = kind;
>
>              tcg_debug_assert(ts2 == ts + 1);
>              ts2->base_type = TCG_TYPE_I64;
>              ts2->type = TCG_TYPE_I32;
>              ts2->temp_allocated = 1;
> -            ts2->temp_local = temp_local;
> +            ts2->kind = kind;
>          } else {
>              ts->base_type = type;
>              ts->type = type;
>              ts->temp_allocated = 1;
> -            ts->temp_local = temp_local;
> +            ts->kind = kind;
>          }
>      }
>
> @@ -1336,12 +1337,12 @@ void tcg_temp_free_internal(TCGTemp *ts)
>      }
>  #endif
>
> -    tcg_debug_assert(ts->temp_global == 0);
> +    tcg_debug_assert(ts->kind < TEMP_GLOBAL);
>      tcg_debug_assert(ts->temp_allocated != 0);
>      ts->temp_allocated = 0;
>
>      idx = temp_idx(ts);
> -    k = ts->base_type + (ts->temp_local ? TCG_TYPE_COUNT : 0);
> +    k = ts->base_type + (ts->kind == TEMP_NORMAL ? 0 : TCG_TYPE_COUNT);
>      set_bit(idx, s->free_temps[k].l);
>  }
>
> @@ -1864,17 +1865,27 @@ void tcg_gen_callN(void *func, TCGTemp *ret, int nargs, TCGTemp **args)
>  static void tcg_reg_alloc_start(TCGContext *s)
>  {
>      int i, n;
> -    TCGTemp *ts;
>
> -    for (i = 0, n = s->nb_globals; i < n; i++) {
> -        ts = &s->temps[i];
> -        ts->val_type = (ts->fixed_reg ? TEMP_VAL_REG : TEMP_VAL_MEM);
> -    }
> -    for (n = s->nb_temps; i < n; i++) {
> -        ts = &s->temps[i];
> -        ts->val_type = (ts->temp_local ? TEMP_VAL_MEM : TEMP_VAL_DEAD);
> -        ts->mem_allocated = 0;
> -        ts->fixed_reg = 0;
> +    for (i = 0, n = s->nb_temps; i < n; i++) {
> +        TCGTemp *ts = &s->temps[i];
> +        TCGTempVal val = TEMP_VAL_MEM;
> +
> +        switch (ts->kind) {
> +        case TEMP_FIXED:
> +            val = TEMP_VAL_REG;
> +            break;
> +        case TEMP_GLOBAL:
> +            break;
> +        case TEMP_NORMAL:
> +            val = TEMP_VAL_DEAD;
> +            /* fall through */
> +        case TEMP_LOCAL:
> +            ts->mem_allocated = 0;
> +            break;
> +        default:
> +            g_assert_not_reached();
> +        }
> +        ts->val_type = val;
>      }
>
>      memset(s->reg_to_temp, 0, sizeof(s->reg_to_temp));
> @@ -1885,12 +1896,17 @@ static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
>  {
>      int idx = temp_idx(ts);
>
> -    if (ts->temp_global) {
> +    switch (ts->kind) {
> +    case TEMP_FIXED:
> +    case TEMP_GLOBAL:
>          pstrcpy(buf, buf_size, ts->name);
> -    } else if (ts->temp_local) {
> +        break;
> +    case TEMP_LOCAL:
>          snprintf(buf, buf_size, "loc%d", idx - s->nb_globals);
> -    } else {
> +        break;
> +    case TEMP_NORMAL:
>          snprintf(buf, buf_size, "tmp%d", idx - s->nb_globals);
> +        break;
>      }

Hmm, why doesn't this switch have:

        default:
            g_assert_not_reached();

like the other ones?

Aleksandar
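
The question has a compile-time side. GCC's -Wswitch (enabled by -Wall) warns about an enumerator that has no case label, but only when the switch has no default label. A default that calls g_assert_not_reached() instead catches an out-of-range value at run time. The sketch below shows the default variant for the switch in tcg_get_arg_str_ptr(). `Kind`, `K_*` and `kind_prefix` are hypothetical, and abort() stands in for g_assert_not_reached().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the TCGTempKind values from the patch. */
typedef enum {
    K_NORMAL,
    K_LOCAL,
    K_GLOBAL,
    K_FIXED,
} Kind;

/* With a default label, a corrupted kind aborts at run time, but -Wswitch
 * would no longer report a newly added enumerator that lacks a case. */
static const char *kind_prefix(Kind k)
{
    switch (k) {
    case K_FIXED:
    case K_GLOBAL:
        return "global";   /* the real code copies ts->name here */
    case K_LOCAL:
        return "loc";
    case K_NORMAL:
        return "tmp";
    default:
        abort();           /* stands in for g_assert_not_reached() */
    }
}
```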

>      return buf;
>  }
> @@ -2486,15 +2502,24 @@ static void la_bb_end(TCGContext *s, int ng, int nt)
>  {
>      int i;
>
> -    for (i = 0; i < ng; ++i) {
> -        s->temps[i].state = TS_DEAD | TS_MEM;
> -        la_reset_pref(&s->temps[i]);
> -    }
> -    for (i = ng; i < nt; ++i) {
> -        s->temps[i].state = (s->temps[i].temp_local
> -                             ? TS_DEAD | TS_MEM
> -                             : TS_DEAD);
> -        la_reset_pref(&s->temps[i]);
> +    for (i = 0; i < nt; ++i) {
> +        TCGTemp *ts = &s->temps[i];
> +        int state;
> +
> +        switch (ts->kind) {
> +        case TEMP_FIXED:
> +        case TEMP_GLOBAL:
> +        case TEMP_LOCAL:
> +            state = TS_DEAD | TS_MEM;
> +            break;
> +        case TEMP_NORMAL:
> +            state = TS_DEAD;
> +            break;
> +        default:
> +            g_assert_not_reached();
> +        }
> +        ts->state = state;
> +        la_reset_pref(ts);
>      }
>  }
>
> @@ -3069,7 +3094,8 @@ static void check_regs(TCGContext *s)
>      }
>      for (k = 0; k < s->nb_temps; k++) {
>          ts = &s->temps[k];
> -        if (ts->val_type == TEMP_VAL_REG && !ts->fixed_reg
> +        if (ts->val_type == TEMP_VAL_REG
> +            && ts->kind != TEMP_FIXED
>              && s->reg_to_temp[ts->reg] != ts) {
>              printf("Inconsistency for temp %s:\n",
>                     tcg_get_arg_str_ptr(s, buf, sizeof(buf), ts));
> @@ -3106,15 +3132,14 @@ static void temp_load(TCGContext *, TCGTemp *, TCGRegSet, TCGRegSet, TCGRegSet);
>     mark it free; otherwise mark it dead.  */
>  static void temp_free_or_dead(TCGContext *s, TCGTemp *ts, int free_or_dead)
>  {
> -    if (ts->fixed_reg) {
> +    if (ts->kind == TEMP_FIXED) {
>          return;
>      }
>      if (ts->val_type == TEMP_VAL_REG) {
>          s->reg_to_temp[ts->reg] = NULL;
>      }
>      ts->val_type = (free_or_dead < 0
> -                    || ts->temp_local
> -                    || ts->temp_global
> +                    || ts->kind != TEMP_NORMAL
>                      ? TEMP_VAL_MEM : TEMP_VAL_DEAD);
>  }
>
> @@ -3131,7 +3156,7 @@ static inline void temp_dead(TCGContext *s, TCGTemp *ts)
>  static void temp_sync(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs,
>                        TCGRegSet preferred_regs, int free_or_dead)
>  {
> -    if (ts->fixed_reg) {
> +    if (ts->kind == TEMP_FIXED) {
>          return;
>      }
>      if (!ts->mem_coherent) {
> @@ -3289,7 +3314,8 @@ static void temp_save(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs)
>  {
>      /* The liveness analysis already ensures that globals are back
>         in memory. Keep an tcg_debug_assert for safety. */
> -    tcg_debug_assert(ts->val_type == TEMP_VAL_MEM || ts->fixed_reg);
> +    tcg_debug_assert(ts->val_type == TEMP_VAL_MEM
> +                     || ts->kind == TEMP_FIXED);
>  }
>
>  /* save globals to their canonical location and assume they can be
> @@ -3314,7 +3340,7 @@ static void sync_globals(TCGContext *s, TCGRegSet allocated_regs)
>      for (i = 0, n = s->nb_globals; i < n; i++) {
>          TCGTemp *ts = &s->temps[i];
>          tcg_debug_assert(ts->val_type != TEMP_VAL_REG
> -                         || ts->fixed_reg
> +                         || ts->kind == TEMP_FIXED
>                           || ts->mem_coherent);
>      }
>  }
> @@ -3327,7 +3353,7 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
>
>      for (i = s->nb_globals; i < s->nb_temps; i++) {
>          TCGTemp *ts = &s->temps[i];
> -        if (ts->temp_local) {
> +        if (ts->kind == TEMP_LOCAL) {
>              temp_save(s, ts, allocated_regs);
>          } else {
>              /* The liveness analysis already ensures that temps are dead.
> @@ -3347,7 +3373,7 @@ static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
>                                    TCGRegSet preferred_regs)
>  {
>      /* ENV should not be modified.  */
> -    tcg_debug_assert(!ots->fixed_reg);
> +    tcg_debug_assert(ots->kind != TEMP_FIXED);
>
>      /* The movi is not explicitly generated here.  */
>      if (ots->val_type == TEMP_VAL_REG) {
> @@ -3387,7 +3413,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
>      ts = arg_temp(op->args[1]);
>
>      /* ENV should not be modified.  */
> -    tcg_debug_assert(!ots->fixed_reg);
> +    tcg_debug_assert(ots->kind != TEMP_FIXED);
>
>      /* Note that otype != itype for no-op truncation.  */
>      otype = ots->type;
> @@ -3426,7 +3452,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
>          }
>          temp_dead(s, ots);
>      } else {
> -        if (IS_DEAD_ARG(1) && !ts->fixed_reg) {
> +        if (IS_DEAD_ARG(1) && ts->kind != TEMP_FIXED) {
>              /* the mov can be suppressed */
>              if (ots->val_type == TEMP_VAL_REG) {
>                  s->reg_to_temp[ots->reg] = NULL;
> @@ -3448,7 +3474,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
>                   * Store the source register into the destination slot
>                   * and leave the destination temp as TEMP_VAL_MEM.
>                   */
> -                assert(!ots->fixed_reg);
> +                assert(ots->kind != TEMP_FIXED);
>                  if (!ts->mem_allocated) {
>                      temp_allocate_frame(s, ots);
>                  }
> @@ -3485,7 +3511,7 @@ static void tcg_reg_alloc_dup(TCGContext *s, const TCGOp *op)
>      its = arg_temp(op->args[1]);
>
>      /* ENV should not be modified.  */
> -    tcg_debug_assert(!ots->fixed_reg);
> +    tcg_debug_assert(ots->kind != TEMP_FIXED);
>
>      itype = its->type;
>      vece = TCGOP_VECE(op);
> @@ -3625,7 +3651,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
>          i_preferred_regs = o_preferred_regs = 0;
>          if (arg_ct->ct & TCG_CT_IALIAS) {
>              o_preferred_regs = op->output_pref[arg_ct->alias_index];
> -            if (ts->fixed_reg) {
> +            if (ts->kind == TEMP_FIXED) {
>                  /* if fixed register, we must allocate a new register
>                     if the alias is not the same register */
>                  if (arg != op->args[arg_ct->alias_index]) {
> @@ -3716,7 +3742,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
>              ts = arg_temp(arg);
>
>              /* ENV should not be modified.  */
> -            tcg_debug_assert(!ts->fixed_reg);
> +            tcg_debug_assert(ts->kind != TEMP_FIXED);
>
>              if ((arg_ct->ct & TCG_CT_ALIAS)
>                  && !const_args[arg_ct->alias_index]) {
> @@ -3758,7 +3784,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
>          ts = arg_temp(op->args[i]);
>
>          /* ENV should not be modified.  */
> -        tcg_debug_assert(!ts->fixed_reg);
> +        tcg_debug_assert(ts->kind != TEMP_FIXED);
>
>          if (NEED_SYNC_ARG(i)) {
>              temp_sync(s, ts, o_allocated_regs, 0, IS_DEAD_ARG(i));
> @@ -3890,7 +3916,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
>          ts = arg_temp(arg);
>
>          /* ENV should not be modified.  */
> -        tcg_debug_assert(!ts->fixed_reg);
> +        tcg_debug_assert(ts->kind != TEMP_FIXED);
>
>          reg = tcg_target_call_oarg_regs[i];
>          tcg_debug_assert(s->reg_to_temp[reg] == NULL);
> --
> 2.20.1
>
>
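
The patch orders TCGTempKind from shortest-lived to longest-lived. As a result, several old bit tests become range or equality tests on one field. `kind >= TEMP_GLOBAL` covers globals and fixed registers, and `kind != TEMP_NORMAL` covers anything kept in memory at basic-block end. The find_better_copy() hunk also corrects the inner loop to test `i` rather than `ts`. Below is a minimal model of these predicates. `Kind`, `KIND_*` and the helper functions are hypothetical names, not QEMU's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical copy of the enum order from the patch: each kind lives at
 * least as long as the ones before it. */
typedef enum {
    KIND_NORMAL,   /* dead at the end of every basic block */
    KIND_LOCAL,    /* survives basic blocks, dead at the end of the TB */
    KIND_GLOBAL,   /* survives basic blocks and TBs */
    KIND_FIXED,    /* global held in a fixed host register */
} Kind;

/* "kind >= TEMP_GLOBAL": globals and fixed registers alike. */
static bool is_global_like(Kind k) { return k >= KIND_GLOBAL; }

/* Matches the TS_DEAD | TS_MEM cases in la_bb_end(). */
static bool saved_at_bb_end(Kind k) { return k != KIND_NORMAL; }

/* Free-list index as computed in tcg_temp_free_internal(): normal and
 * non-normal temps of the same base type use separate lists. */
static int free_list_index(int base_type, int type_count, Kind k)
{
    return base_type + (k == KIND_NORMAL ? 0 : type_count);
}
```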



* Re: [PATCH v2 13/36] tcg: Use tcg_constant_{i32,i64} with tcg int expanders
  2020-04-22  1:16 ` [PATCH v2 13/36] tcg: Use tcg_constant_{i32, i64} with tcg int expanders Richard Henderson
  2020-04-22 16:18   ` [PATCH v2 13/36] tcg: Use tcg_constant_{i32,i64} " Alex Bennée
@ 2020-04-22 20:04   ` Alex Bennée
  2020-04-23 23:13     ` Richard Henderson
  1 sibling, 1 reply; 75+ messages in thread
From: Alex Bennée @ 2020-04-22 20:04 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

We have a regression. Setting up a build dir with:

  ../../configure --disable-tools --disable-docs --target-list=sparc-softmmu,sparc64-softmmu
  make -j30 && make check-acceptance

And then running a bisect between HEAD and master:

  git bisect run /bin/sh -c "cd builds/bisect && make -j30 && ./tests/venv/bin/avocado run ./tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_sparc_ss20"

The bisect fingers:

  a4d42b76dd29818e4f393c4c3eb59601b0015b2f is the first bad commit
  commit a4d42b76dd29818e4f393c4c3eb59601b0015b2f
  Author: Richard Henderson <richard.henderson@linaro.org>
  Date:   Tue Apr 21 18:16:59 2020 -0700

      tcg: Use tcg_constant_{i32,i64} with tcg int expanders

      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
      Message-Id: <20200422011722.13287-14-richard.henderson@linaro.org>

  :040000 040000 45283ae0961f2794f5f15e09c29f160372fb5fae 92939e91645a5cf4fc36d475ff5dddd0839a7314 M      include
  :040000 040000 1083f94f8f045924fbf1e1f9c116f05827c25345 31a5dfc97636fcd0a114b910095b11cb767a22db M      tcg
  bisect run success

> ---
>  include/tcg/tcg-op.h |  13 +--
>  tcg/tcg-op.c         | 216 ++++++++++++++++++++-----------------------
>  2 files changed, 100 insertions(+), 129 deletions(-)
>
> diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
> index 230db6e022..11ed9192f7 100644
> --- a/include/tcg/tcg-op.h
> +++ b/include/tcg/tcg-op.h
> @@ -271,6 +271,7 @@ void tcg_gen_mb(TCGBar);
>  
>  /* 32 bit ops */
>  
> +void tcg_gen_movi_i32(TCGv_i32 ret, int32_t arg);
>  void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2);
>  void tcg_gen_subfi_i32(TCGv_i32 ret, int32_t arg1, TCGv_i32 arg2);
>  void tcg_gen_subi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2);
> @@ -349,11 +350,6 @@ static inline void tcg_gen_mov_i32(TCGv_i32 ret, TCGv_i32 arg)
>      }
>  }
>  
> -static inline void tcg_gen_movi_i32(TCGv_i32 ret, int32_t arg)
> -{
> -    tcg_gen_op2i_i32(INDEX_op_movi_i32, ret, arg);
> -}
> -
>  static inline void tcg_gen_ld8u_i32(TCGv_i32 ret, TCGv_ptr arg2,
>                                      tcg_target_long offset)
>  {
> @@ -467,6 +463,7 @@ static inline void tcg_gen_not_i32(TCGv_i32 ret, TCGv_i32 arg)
>  
>  /* 64 bit ops */
>  
> +void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg);
>  void tcg_gen_addi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
>  void tcg_gen_subfi_i64(TCGv_i64 ret, int64_t arg1, TCGv_i64 arg2);
>  void tcg_gen_subi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
> @@ -550,11 +547,6 @@ static inline void tcg_gen_mov_i64(TCGv_i64 ret, TCGv_i64 arg)
>      }
>  }
>  
> -static inline void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg)
> -{
> -    tcg_gen_op2i_i64(INDEX_op_movi_i64, ret, arg);
> -}
> -
>  static inline void tcg_gen_ld8u_i64(TCGv_i64 ret, TCGv_ptr arg2,
>                                      tcg_target_long offset)
>  {
> @@ -698,7 +690,6 @@ static inline void tcg_gen_sub_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>  
>  void tcg_gen_discard_i64(TCGv_i64 arg);
>  void tcg_gen_mov_i64(TCGv_i64 ret, TCGv_i64 arg);
> -void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg);
>  void tcg_gen_ld8u_i64(TCGv_i64 ret, TCGv_ptr arg2, tcg_target_long offset);
>  void tcg_gen_ld8s_i64(TCGv_i64 ret, TCGv_ptr arg2, tcg_target_long offset);
>  void tcg_gen_ld16u_i64(TCGv_i64 ret, TCGv_ptr arg2, tcg_target_long offset);
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index e2e25ebf7d..07eb661a07 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -104,15 +104,18 @@ void tcg_gen_mb(TCGBar mb_type)
>  
>  /* 32 bit ops */
>  
> +void tcg_gen_movi_i32(TCGv_i32 ret, int32_t arg)
> +{
> +    tcg_gen_mov_i32(ret, tcg_constant_i32(arg));
> +}
> +
>  void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>  {
>      /* some cases can be optimized here */
>      if (arg2 == 0) {
>          tcg_gen_mov_i32(ret, arg1);
>      } else {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_add_i32(ret, arg1, t0);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_add_i32(ret, arg1, tcg_constant_i32(arg2));
>      }
>  }
>  
> @@ -122,9 +125,7 @@ void tcg_gen_subfi_i32(TCGv_i32 ret, int32_t arg1, TCGv_i32 arg2)
>          /* Don't recurse with tcg_gen_neg_i32.  */
>          tcg_gen_op2_i32(INDEX_op_neg_i32, ret, arg2);
>      } else {
> -        TCGv_i32 t0 = tcg_const_i32(arg1);
> -        tcg_gen_sub_i32(ret, t0, arg2);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_sub_i32(ret, tcg_constant_i32(arg1), arg2);
>      }
>  }
>  
> @@ -134,15 +135,12 @@ void tcg_gen_subi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>      if (arg2 == 0) {
>          tcg_gen_mov_i32(ret, arg1);
>      } else {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_sub_i32(ret, arg1, t0);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_sub_i32(ret, arg1, tcg_constant_i32(arg2));
>      }
>  }
>  
>  void tcg_gen_andi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>  {
> -    TCGv_i32 t0;
>      /* Some cases can be optimized here.  */
>      switch (arg2) {
>      case 0:
> @@ -165,9 +163,8 @@ void tcg_gen_andi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>          }
>          break;
>      }
> -    t0 = tcg_const_i32(arg2);
> -    tcg_gen_and_i32(ret, arg1, t0);
> -    tcg_temp_free_i32(t0);
> +
> +    tcg_gen_and_i32(ret, arg1, tcg_constant_i32(arg2));
>  }
>  
>  void tcg_gen_ori_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
> @@ -178,9 +175,7 @@ void tcg_gen_ori_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>      } else if (arg2 == 0) {
>          tcg_gen_mov_i32(ret, arg1);
>      } else {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_or_i32(ret, arg1, t0);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_or_i32(ret, arg1, tcg_constant_i32(arg2));
>      }
>  }
>  
> @@ -193,9 +188,7 @@ void tcg_gen_xori_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>          /* Don't recurse with tcg_gen_not_i32.  */
>          tcg_gen_op2_i32(INDEX_op_not_i32, ret, arg1);
>      } else {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_xor_i32(ret, arg1, t0);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_xor_i32(ret, arg1, tcg_constant_i32(arg2));
>      }
>  }
>  
> @@ -205,9 +198,7 @@ void tcg_gen_shli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>      if (arg2 == 0) {
>          tcg_gen_mov_i32(ret, arg1);
>      } else {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_shl_i32(ret, arg1, t0);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_shl_i32(ret, arg1, tcg_constant_i32(arg2));
>      }
>  }
>  
> @@ -217,9 +208,7 @@ void tcg_gen_shri_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>      if (arg2 == 0) {
>          tcg_gen_mov_i32(ret, arg1);
>      } else {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_shr_i32(ret, arg1, t0);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_shr_i32(ret, arg1, tcg_constant_i32(arg2));
>      }
>  }
>  
> @@ -229,9 +218,7 @@ void tcg_gen_sari_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>      if (arg2 == 0) {
>          tcg_gen_mov_i32(ret, arg1);
>      } else {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_sar_i32(ret, arg1, t0);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_sar_i32(ret, arg1, tcg_constant_i32(arg2));
>      }
>  }
>  
> @@ -250,9 +237,7 @@ void tcg_gen_brcondi_i32(TCGCond cond, TCGv_i32 arg1, int32_t arg2, TCGLabel *l)
>      if (cond == TCG_COND_ALWAYS) {
>          tcg_gen_br(l);
>      } else if (cond != TCG_COND_NEVER) {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_brcond_i32(cond, arg1, t0, l);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_brcond_i32(cond, arg1, tcg_constant_i32(arg2), l);
>      }
>  }
>  
> @@ -271,9 +256,7 @@ void tcg_gen_setcond_i32(TCGCond cond, TCGv_i32 ret,
>  void tcg_gen_setcondi_i32(TCGCond cond, TCGv_i32 ret,
>                            TCGv_i32 arg1, int32_t arg2)
>  {
> -    TCGv_i32 t0 = tcg_const_i32(arg2);
> -    tcg_gen_setcond_i32(cond, ret, arg1, t0);
> -    tcg_temp_free_i32(t0);
> +    tcg_gen_setcond_i32(cond, ret, arg1, tcg_constant_i32(arg2));
>  }
>  
>  void tcg_gen_muli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
> @@ -283,9 +266,7 @@ void tcg_gen_muli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>      } else if (is_power_of_2(arg2)) {
>          tcg_gen_shli_i32(ret, arg1, ctz32(arg2));
>      } else {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_mul_i32(ret, arg1, t0);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_mul_i32(ret, arg1, tcg_constant_i32(arg2));
>      }
>  }
>  
> @@ -433,9 +414,7 @@ void tcg_gen_clz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
>  
>  void tcg_gen_clzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
>  {
> -    TCGv_i32 t = tcg_const_i32(arg2);
> -    tcg_gen_clz_i32(ret, arg1, t);
> -    tcg_temp_free_i32(t);
> +    tcg_gen_clz_i32(ret, arg1, tcg_constant_i32(arg2));
>  }
>  
>  void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
> @@ -468,10 +447,9 @@ void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
>              tcg_gen_clzi_i32(t, t, 32);
>              tcg_gen_xori_i32(t, t, 31);
>          }
> -        z = tcg_const_i32(0);
> +        z = tcg_constant_i32(0);
>          tcg_gen_movcond_i32(TCG_COND_EQ, ret, arg1, z, arg2, t);
>          tcg_temp_free_i32(t);
> -        tcg_temp_free_i32(z);
>      } else {
>          gen_helper_ctz_i32(ret, arg1, arg2);
>      }
> @@ -487,9 +465,7 @@ void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
>          tcg_gen_ctpop_i32(ret, t);
>          tcg_temp_free_i32(t);
>      } else {
> -        TCGv_i32 t = tcg_const_i32(arg2);
> -        tcg_gen_ctz_i32(ret, arg1, t);
> -        tcg_temp_free_i32(t);
> +        tcg_gen_ctz_i32(ret, arg1, tcg_constant_i32(arg2));
>      }
>  }
>  
> @@ -547,9 +523,7 @@ void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2)
>      if (arg2 == 0) {
>          tcg_gen_mov_i32(ret, arg1);
>      } else if (TCG_TARGET_HAS_rot_i32) {
> -        TCGv_i32 t0 = tcg_const_i32(arg2);
> -        tcg_gen_rotl_i32(ret, arg1, t0);
> -        tcg_temp_free_i32(t0);
> +        tcg_gen_rotl_i32(ret, arg1, tcg_constant_i32(arg2));
>      } else {
>          TCGv_i32 t0, t1;
>          t0 = tcg_temp_new_i32();
> @@ -653,9 +627,8 @@ void tcg_gen_deposit_z_i32(TCGv_i32 ret, TCGv_i32 arg,
>          tcg_gen_andi_i32(ret, arg, (1u << len) - 1);
>      } else if (TCG_TARGET_HAS_deposit_i32
>                 && TCG_TARGET_deposit_i32_valid(ofs, len)) {
> -        TCGv_i32 zero = tcg_const_i32(0);
> +        TCGv_i32 zero = tcg_constant_i32(0);
>          tcg_gen_op5ii_i32(INDEX_op_deposit_i32, ret, zero, arg, ofs, len);
> -        tcg_temp_free_i32(zero);
>      } else {
>          /* To help two-operand hosts we prefer to zero-extend first,
>             which allows ARG to stay live.  */
> @@ -1052,7 +1025,7 @@ void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg)
>      } else {
>          TCGv_i32 t0 = tcg_temp_new_i32();
>          TCGv_i32 t1 = tcg_temp_new_i32();
> -        TCGv_i32 t2 = tcg_const_i32(0x00ff00ff);
> +        TCGv_i32 t2 = tcg_constant_i32(0x00ff00ff);
>  
>                                          /* arg = abcd */
>          tcg_gen_shri_i32(t0, arg, 8);   /*  t0 = .abc */
> @@ -1067,7 +1040,6 @@ void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg)
>  
>          tcg_temp_free_i32(t0);
>          tcg_temp_free_i32(t1);
> -        tcg_temp_free_i32(t2);
>      }
>  }
>  
> @@ -1237,6 +1209,14 @@ void tcg_gen_mul_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>      tcg_temp_free_i64(t0);
>      tcg_temp_free_i32(t1);
>  }
> +
> +#else
> +
> +void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg)
> +{
> +    tcg_gen_mov_i64(ret, tcg_constant_i64(arg));
> +}
> +
>  #endif /* TCG_TARGET_REG_SIZE == 32 */
>  
>  void tcg_gen_addi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
> @@ -1244,10 +1224,12 @@ void tcg_gen_addi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>      /* some cases can be optimized here */
>      if (arg2 == 0) {
>          tcg_gen_mov_i64(ret, arg1);
> +    } else if (TCG_TARGET_REG_BITS == 64) {
> +        tcg_gen_add_i64(ret, arg1, tcg_constant_i64(arg2));
>      } else {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_add_i64(ret, arg1, t0);
> -        tcg_temp_free_i64(t0);
> +        tcg_gen_add2_i32(TCGV_LOW(ret), TCGV_HIGH(ret),
> +                         TCGV_LOW(arg1), TCGV_HIGH(arg1),
> +                         tcg_constant_i32(arg2), tcg_constant_i32(arg2 >> 32));
>      }
>  }
>  
> @@ -1256,10 +1238,12 @@ void tcg_gen_subfi_i64(TCGv_i64 ret, int64_t arg1, TCGv_i64 arg2)
>      if (arg1 == 0 && TCG_TARGET_HAS_neg_i64) {
>          /* Don't recurse with tcg_gen_neg_i64.  */
>          tcg_gen_op2_i64(INDEX_op_neg_i64, ret, arg2);
> +    } else if (TCG_TARGET_REG_BITS == 64) {
> +        tcg_gen_sub_i64(ret, tcg_constant_i64(arg1), arg2);
>      } else {
> -        TCGv_i64 t0 = tcg_const_i64(arg1);
> -        tcg_gen_sub_i64(ret, t0, arg2);
> -        tcg_temp_free_i64(t0);
> +        tcg_gen_sub2_i32(TCGV_LOW(ret), TCGV_HIGH(ret),
> +                         tcg_constant_i32(arg1), tcg_constant_i32(arg1 >> 32),
> +                         TCGV_LOW(arg2), TCGV_HIGH(arg2));
>      }
>  }
>  
> @@ -1268,17 +1252,17 @@ void tcg_gen_subi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>      /* some cases can be optimized here */
>      if (arg2 == 0) {
>          tcg_gen_mov_i64(ret, arg1);
> +    } else if (TCG_TARGET_REG_BITS == 64) {
> +        tcg_gen_sub_i64(ret, arg1, tcg_constant_i64(arg2));
>      } else {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_sub_i64(ret, arg1, t0);
> -        tcg_temp_free_i64(t0);
> +        tcg_gen_sub2_i32(TCGV_LOW(ret), TCGV_HIGH(ret),
> +                         TCGV_LOW(arg1), TCGV_HIGH(arg1),
> +                         tcg_constant_i32(arg2), tcg_constant_i32(arg2 >> 32));
>      }
>  }
>  
>  void tcg_gen_andi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>  {
> -    TCGv_i64 t0;
> -
>      if (TCG_TARGET_REG_BITS == 32) {
>          tcg_gen_andi_i32(TCGV_LOW(ret), TCGV_LOW(arg1), arg2);
>          tcg_gen_andi_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1), arg2 >> 32);
> @@ -1313,9 +1297,8 @@ void tcg_gen_andi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>          }
>          break;
>      }
> -    t0 = tcg_const_i64(arg2);
> -    tcg_gen_and_i64(ret, arg1, t0);
> -    tcg_temp_free_i64(t0);
> +
> +    tcg_gen_and_i64(ret, arg1, tcg_constant_i64(arg2));
>  }
>  
>  void tcg_gen_ori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
> @@ -1331,9 +1314,7 @@ void tcg_gen_ori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>      } else if (arg2 == 0) {
>          tcg_gen_mov_i64(ret, arg1);
>      } else {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_or_i64(ret, arg1, t0);
> -        tcg_temp_free_i64(t0);
> +        tcg_gen_or_i64(ret, arg1, tcg_constant_i64(arg2));
>      }
>  }
>  
> @@ -1351,9 +1332,7 @@ void tcg_gen_xori_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>          /* Don't recurse with tcg_gen_not_i64.  */
>          tcg_gen_op2_i64(INDEX_op_not_i64, ret, arg1);
>      } else {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_xor_i64(ret, arg1, t0);
> -        tcg_temp_free_i64(t0);
> +        tcg_gen_xor_i64(ret, arg1, tcg_constant_i64(arg2));
>      }
>  }
>  
> @@ -1415,9 +1394,7 @@ void tcg_gen_shli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>      } else if (arg2 == 0) {
>          tcg_gen_mov_i64(ret, arg1);
>      } else {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_shl_i64(ret, arg1, t0);
> -        tcg_temp_free_i64(t0);
> +        tcg_gen_shl_i64(ret, arg1, tcg_constant_i64(arg2));
>      }
>  }
>  
> @@ -1429,9 +1406,7 @@ void tcg_gen_shri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>      } else if (arg2 == 0) {
>          tcg_gen_mov_i64(ret, arg1);
>      } else {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_shr_i64(ret, arg1, t0);
> -        tcg_temp_free_i64(t0);
> +        tcg_gen_shr_i64(ret, arg1, tcg_constant_i64(arg2));
>      }
>  }
>  
> @@ -1443,9 +1418,7 @@ void tcg_gen_sari_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>      } else if (arg2 == 0) {
>          tcg_gen_mov_i64(ret, arg1);
>      } else {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_sar_i64(ret, arg1, t0);
> -        tcg_temp_free_i64(t0);
> +        tcg_gen_sar_i64(ret, arg1, tcg_constant_i64(arg2));
>      }
>  }
>  
> @@ -1468,12 +1441,17 @@ void tcg_gen_brcond_i64(TCGCond cond, TCGv_i64 arg1, TCGv_i64 arg2, TCGLabel *l)
>  
>  void tcg_gen_brcondi_i64(TCGCond cond, TCGv_i64 arg1, int64_t arg2, TCGLabel *l)
>  {
> -    if (cond == TCG_COND_ALWAYS) {
> +    if (TCG_TARGET_REG_BITS == 64) {
> +        tcg_gen_brcond_i64(cond, arg1, tcg_constant_i64(arg2), l);
> +    } else if (cond == TCG_COND_ALWAYS) {
>          tcg_gen_br(l);
>      } else if (cond != TCG_COND_NEVER) {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_brcond_i64(cond, arg1, t0, l);
> -        tcg_temp_free_i64(t0);
> +        l->refs++;
> +        tcg_gen_op6ii_i32(INDEX_op_brcond2_i32,
> +                          TCGV_LOW(arg1), TCGV_HIGH(arg1),
> +                          tcg_constant_i32(arg2),
> +                          tcg_constant_i32(arg2 >> 32),
> +                          cond, label_arg(l));
>      }
>  }
>  
> @@ -1499,9 +1477,19 @@ void tcg_gen_setcond_i64(TCGCond cond, TCGv_i64 ret,
>  void tcg_gen_setcondi_i64(TCGCond cond, TCGv_i64 ret,
>                            TCGv_i64 arg1, int64_t arg2)
>  {
> -    TCGv_i64 t0 = tcg_const_i64(arg2);
> -    tcg_gen_setcond_i64(cond, ret, arg1, t0);
> -    tcg_temp_free_i64(t0);
> +    if (TCG_TARGET_REG_BITS == 64) {
> +        tcg_gen_setcond_i64(cond, ret, arg1, tcg_constant_i64(arg2));
> +    } else if (cond == TCG_COND_ALWAYS) {
> +        tcg_gen_movi_i64(ret, 1);
> +    } else if (cond == TCG_COND_NEVER) {
> +        tcg_gen_movi_i64(ret, 0);
> +    } else {
> +        tcg_gen_op6i_i32(INDEX_op_setcond2_i32, TCGV_LOW(ret),
> +                         TCGV_LOW(arg1), TCGV_HIGH(arg1),
> +                         tcg_constant_i32(arg2),
> +                         tcg_constant_i32(arg2 >> 32), cond);
> +        tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
> +    }
>  }
>  
>  void tcg_gen_muli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
> @@ -1690,7 +1678,7 @@ void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg)
>      } else {
>          TCGv_i64 t0 = tcg_temp_new_i64();
>          TCGv_i64 t1 = tcg_temp_new_i64();
> -        TCGv_i64 t2 = tcg_const_i64(0x00ff00ff);
> +        TCGv_i64 t2 = tcg_constant_i64(0x00ff00ff);
>  
>                                          /* arg = ....abcd */
>          tcg_gen_shri_i64(t0, arg, 8);   /*  t0 = .....abc */
> @@ -1706,7 +1694,6 @@ void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg)
>  
>          tcg_temp_free_i64(t0);
>          tcg_temp_free_i64(t1);
> -        tcg_temp_free_i64(t2);
>      }
>  }
>  
> @@ -1850,16 +1837,16 @@ void tcg_gen_clzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
>      if (TCG_TARGET_REG_BITS == 32
>          && TCG_TARGET_HAS_clz_i32
>          && arg2 <= 0xffffffffu) {
> -        TCGv_i32 t = tcg_const_i32((uint32_t)arg2 - 32);
> -        tcg_gen_clz_i32(t, TCGV_LOW(arg1), t);
> +        TCGv_i32 t = tcg_temp_new_i32();
> +        tcg_gen_clzi_i32(t, TCGV_LOW(arg1), arg2 - 32);
>          tcg_gen_addi_i32(t, t, 32);
>          tcg_gen_clz_i32(TCGV_LOW(ret), TCGV_HIGH(arg1), t);
>          tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
>          tcg_temp_free_i32(t);
>      } else {
> -        TCGv_i64 t = tcg_const_i64(arg2);
> -        tcg_gen_clz_i64(ret, arg1, t);
> -        tcg_temp_free_i64(t);
> +        TCGv_i64 t0 = tcg_const_i64(arg2);
> +        tcg_gen_clz_i64(ret, arg1, t0);
> +        tcg_temp_free_i64(t0);
>      }
>  }
>  
> @@ -1881,7 +1868,7 @@ void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>              tcg_gen_clzi_i64(t, t, 64);
>              tcg_gen_xori_i64(t, t, 63);
>          }
> -        z = tcg_const_i64(0);
> +        z = tcg_constant_i64(0);
>          tcg_gen_movcond_i64(TCG_COND_EQ, ret, arg1, z, arg2, t);
>          tcg_temp_free_i64(t);
>          tcg_temp_free_i64(z);
> @@ -1895,8 +1882,8 @@ void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
>      if (TCG_TARGET_REG_BITS == 32
>          && TCG_TARGET_HAS_ctz_i32
>          && arg2 <= 0xffffffffu) {
> -        TCGv_i32 t32 = tcg_const_i32((uint32_t)arg2 - 32);
> -        tcg_gen_ctz_i32(t32, TCGV_HIGH(arg1), t32);
> +        TCGv_i32 t32 = tcg_temp_new_i32();
> +        tcg_gen_ctzi_i32(t32, TCGV_HIGH(arg1), arg2 - 32);
>          tcg_gen_addi_i32(t32, t32, 32);
>          tcg_gen_ctz_i32(TCGV_LOW(ret), TCGV_LOW(arg1), t32);
>          tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
> @@ -1911,9 +1898,9 @@ void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
>          tcg_gen_ctpop_i64(ret, t);
>          tcg_temp_free_i64(t);
>      } else {
> -        TCGv_i64 t64 = tcg_const_i64(arg2);
> -        tcg_gen_ctz_i64(ret, arg1, t64);
> -        tcg_temp_free_i64(t64);
> +        TCGv_i64 t0 = tcg_const_i64(arg2);
> +        tcg_gen_ctz_i64(ret, arg1, t0);
> +        tcg_temp_free_i64(t0);
>      }
>  }
>  
> @@ -1969,9 +1956,7 @@ void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2)
>      if (arg2 == 0) {
>          tcg_gen_mov_i64(ret, arg1);
>      } else if (TCG_TARGET_HAS_rot_i64) {
> -        TCGv_i64 t0 = tcg_const_i64(arg2);
> -        tcg_gen_rotl_i64(ret, arg1, t0);
> -        tcg_temp_free_i64(t0);
> +        tcg_gen_rotl_i64(ret, arg1, tcg_constant_i64(arg2));
>      } else {
>          TCGv_i64 t0, t1;
>          t0 = tcg_temp_new_i64();
> @@ -2089,9 +2074,8 @@ void tcg_gen_deposit_z_i64(TCGv_i64 ret, TCGv_i64 arg,
>          tcg_gen_andi_i64(ret, arg, (1ull << len) - 1);
>      } else if (TCG_TARGET_HAS_deposit_i64
>                 && TCG_TARGET_deposit_i64_valid(ofs, len)) {
> -        TCGv_i64 zero = tcg_const_i64(0);
> +        TCGv_i64 zero = tcg_constant_i64(0);
>          tcg_gen_op5ii_i64(INDEX_op_deposit_i64, ret, zero, arg, ofs, len);
> -        tcg_temp_free_i64(zero);
>      } else {
>          if (TCG_TARGET_REG_BITS == 32) {
>              if (ofs >= 32) {
> @@ -3102,9 +3086,8 @@ void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
>  
>  #ifdef CONFIG_SOFTMMU
>          {
> -            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
> -            gen(retv, cpu_env, addr, cmpv, newv, oi);
> -            tcg_temp_free_i32(oi);
> +            TCGMemOpIdx oi = make_memop_idx(memop & ~MO_SIGN, idx);
> +            gen(retv, cpu_env, addr, cmpv, newv, tcg_constant_i32(oi));
>          }
>  #else
>          gen(retv, cpu_env, addr, cmpv, newv);
> @@ -3147,9 +3130,8 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
>  
>  #ifdef CONFIG_SOFTMMU
>          {
> -            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop, idx));
> -            gen(retv, cpu_env, addr, cmpv, newv, oi);
> -            tcg_temp_free_i32(oi);
> +            TCGMemOpIdx oi = make_memop_idx(memop, idx);
> +            gen(retv, cpu_env, addr, cmpv, newv, tcg_constant_i32(oi));
>          }
>  #else
>          gen(retv, cpu_env, addr, cmpv, newv);
> @@ -3210,9 +3192,8 @@ static void do_atomic_op_i32(TCGv_i32 ret, TCGv addr, TCGv_i32 val,
>  
>  #ifdef CONFIG_SOFTMMU
>      {
> -        TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
> -        gen(ret, cpu_env, addr, val, oi);
> -        tcg_temp_free_i32(oi);
> +        TCGMemOpIdx oi = make_memop_idx(memop & ~MO_SIGN, idx);
> +        gen(ret, cpu_env, addr, val, tcg_constant_i32(oi));
>      }
>  #else
>      gen(ret, cpu_env, addr, val);
> @@ -3255,9 +3236,8 @@ static void do_atomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
>  
>  #ifdef CONFIG_SOFTMMU
>          {
> -            TCGv_i32 oi = tcg_const_i32(make_memop_idx(memop & ~MO_SIGN, idx));
> -            gen(ret, cpu_env, addr, val, oi);
> -            tcg_temp_free_i32(oi);
> +            TCGMemOpIdx oi = make_memop_idx(memop & ~MO_SIGN, idx);
> +            gen(ret, cpu_env, addr, val, tcg_constant_i32(oi));
>          }
>  #else
>          gen(ret, cpu_env, addr, val);


-- 
Alex Bennée



* Re: [PATCH v2 09/36] tcg: Consolidate 3 bits into enum TCGTempKind
  2020-04-22 19:58   ` Aleksandar Markovic
@ 2020-04-23  9:00     ` Philippe Mathieu-Daudé
  2020-04-23 15:40       ` Richard Henderson
  0 siblings, 1 reply; 75+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-04-23  9:00 UTC (permalink / raw)
  To: Aleksandar Markovic, Richard Henderson; +Cc: Alex Bennée, QEMU Developers

On 4/22/20 9:58 PM, Aleksandar Markovic wrote:
> On Wed, 22 Apr 2020 at 03:27, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> The temp_fixed, temp_global, temp_local bits are all related.
>> Combine them into a single enumeration.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  include/tcg/tcg.h |  20 +++++---
>>  tcg/optimize.c    |   8 +--
>>  tcg/tcg.c         | 122 ++++++++++++++++++++++++++++------------------
>>  3 files changed, 90 insertions(+), 60 deletions(-)
>>
>> diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
>> index c48bd76b0a..3534dce77f 100644
>> --- a/include/tcg/tcg.h
>> +++ b/include/tcg/tcg.h
>> @@ -480,23 +480,27 @@ typedef enum TCGTempVal {
>>      TEMP_VAL_CONST,
>>  } TCGTempVal;
>>
>> +typedef enum TCGTempKind {
>> +    /* Temp is dead at the end of all basic blocks. */
>> +    TEMP_NORMAL,
>> +    /* Temp is saved across basic blocks but dead at the end of TBs. */
>> +    TEMP_LOCAL,
>> +    /* Temp is saved across both basic blocks and translation blocks. */
>> +    TEMP_GLOBAL,
>> +    /* Temp is in a fixed register. */
>> +    TEMP_FIXED,

4 cases, so currently 2 bits are enough.

>> +} TCGTempKind;
>> +
>>  typedef struct TCGTemp {
>>      TCGReg reg:8;
>>      TCGTempVal val_type:8;
>>      TCGType base_type:8;
>>      TCGType type:8;
>> -    unsigned int fixed_reg:1;
>> +    TCGTempKind kind:3;

But in case you plan to support more cases...
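A minimal stand-alone sketch of the width arithmetic behind this remark. The `TempKind` copy, the `TEMP_KIND_COUNT` sentinel, `bits_needed()` and `struct temp_2bit` are hypothetical illustrations, not QEMU code. Four kinds occupy the values 0..3 and fit in a 2-bit unsigned field. A fifth kind, such as the constant temporaries introduced later in this series, would need 3 bits.

```c
#include <assert.h>

/*
 * Stand-alone copy of the four kinds from the patch, in the same order,
 * so that "kind >= TEMP_GLOBAL" still means "global or fixed".
 */
typedef enum {
    TEMP_NORMAL,
    TEMP_LOCAL,
    TEMP_GLOBAL,
    TEMP_FIXED,
    TEMP_KIND_COUNT     /* hypothetical sentinel, not in the patch */
} TempKind;

/* Smallest unsigned bit-field width that can hold every value 0..max. */
static unsigned bits_needed(unsigned max)
{
    unsigned n = 1;

    /* Stop at 32 so we never shift a 32-bit value by its full width. */
    while (n < 32 && (max >> n) != 0) {
        n++;
    }
    return n;
}

/* Hypothetical struct: a 2-bit field is enough for the current 4 kinds. */
struct temp_2bit {
    unsigned int kind : 2;
};
```

Sizing the field as 3 bits up front, as the patch does, simply leaves room for that fifth value without touching the struct layout again.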

>>      unsigned int indirect_reg:1;
>>      unsigned int indirect_base:1;
>>      unsigned int mem_coherent:1;
>>      unsigned int mem_allocated:1;
>> -    /* If true, the temp is saved across both basic blocks and
>> -       translation blocks.  */
>> -    unsigned int temp_global:1;
>> -    /* If true, the temp is saved across basic blocks but dead
>> -       at the end of translation blocks.  If false, the temp is
>> -       dead at the end of basic blocks.  */
>> -    unsigned int temp_local:1;
>>      unsigned int temp_allocated:1;
>>
>>      tcg_target_long val;
>> diff --git a/tcg/optimize.c b/tcg/optimize.c
>> index 53aa8e5329..afb4a9a5a9 100644
>> --- a/tcg/optimize.c
>> +++ b/tcg/optimize.c
>> @@ -116,21 +116,21 @@ static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
>>      TCGTemp *i;
>>
>>      /* If this is already a global, we can't do better. */
>> -    if (ts->temp_global) {
>> +    if (ts->kind >= TEMP_GLOBAL) {
>>          return ts;
>>      }
>>
>>      /* Search for a global first. */
>>      for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
>> -        if (i->temp_global) {
>> +        if (i->kind >= TEMP_GLOBAL) {
>>              return i;
>>          }
>>      }
>>
>>      /* If it is a temp, search for a temp local. */
>> -    if (!ts->temp_local) {
>> +    if (ts->kind == TEMP_NORMAL) {
>>          for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
>> -            if (ts->temp_local) {
>> +            if (i->kind >= TEMP_LOCAL) {
>>                  return i;
>>              }
>>          }
>> diff --git a/tcg/tcg.c b/tcg/tcg.c
>> index dd4b3d7684..eaf81397a3 100644
>> --- a/tcg/tcg.c
>> +++ b/tcg/tcg.c
>> @@ -1155,7 +1155,7 @@ static inline TCGTemp *tcg_global_alloc(TCGContext *s)
>>      tcg_debug_assert(s->nb_globals == s->nb_temps);
>>      s->nb_globals++;
>>      ts = tcg_temp_alloc(s);
>> -    ts->temp_global = 1;
>> +    ts->kind = TEMP_GLOBAL;
>>
>>      return ts;
>>  }
>> @@ -1172,7 +1172,7 @@ static TCGTemp *tcg_global_reg_new_internal(TCGContext *s, TCGType type,
>>      ts = tcg_global_alloc(s);
>>      ts->base_type = type;
>>      ts->type = type;
>> -    ts->fixed_reg = 1;
>> +    ts->kind = TEMP_FIXED;
>>      ts->reg = reg;
>>      ts->name = name;
>>      tcg_regset_set_reg(s->reserved_regs, reg);
>> @@ -1199,7 +1199,7 @@ TCGTemp *tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
>>      bigendian = 1;
>>  #endif
>>
>> -    if (!base_ts->fixed_reg) {
>> +    if (base_ts->kind != TEMP_FIXED) {
>>          /* We do not support double-indirect registers.  */
>>          tcg_debug_assert(!base_ts->indirect_reg);
>>          base_ts->indirect_base = 1;
>> @@ -1247,6 +1247,7 @@ TCGTemp *tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
>>  TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
>>  {
>>      TCGContext *s = tcg_ctx;
>> +    TCGTempKind kind = temp_local ? TEMP_LOCAL : TEMP_NORMAL;
>>      TCGTemp *ts;
>>      int idx, k;
>>
>> @@ -1259,7 +1260,7 @@ TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
>>          ts = &s->temps[idx];
>>          ts->temp_allocated = 1;
>>          tcg_debug_assert(ts->base_type == type);
>> -        tcg_debug_assert(ts->temp_local == temp_local);
>> +        tcg_debug_assert(ts->kind == kind);
>>      } else {
>>          ts = tcg_temp_alloc(s);
>>          if (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64) {
>> @@ -1268,18 +1269,18 @@ TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
>>              ts->base_type = type;
>>              ts->type = TCG_TYPE_I32;
>>              ts->temp_allocated = 1;
>> -            ts->temp_local = temp_local;
>> +            ts->kind = kind;
>>
>>              tcg_debug_assert(ts2 == ts + 1);
>>              ts2->base_type = TCG_TYPE_I64;
>>              ts2->type = TCG_TYPE_I32;
>>              ts2->temp_allocated = 1;
>> -            ts2->temp_local = temp_local;
>> +            ts2->kind = kind;
>>          } else {
>>              ts->base_type = type;
>>              ts->type = type;
>>              ts->temp_allocated = 1;
>> -            ts->temp_local = temp_local;
>> +            ts->kind = kind;
>>          }
>>      }
>>
>> @@ -1336,12 +1337,12 @@ void tcg_temp_free_internal(TCGTemp *ts)
>>      }
>>  #endif
>>
>> -    tcg_debug_assert(ts->temp_global == 0);
>> +    tcg_debug_assert(ts->kind < TEMP_GLOBAL);
>>      tcg_debug_assert(ts->temp_allocated != 0);
>>      ts->temp_allocated = 0;
>>
>>      idx = temp_idx(ts);
>> -    k = ts->base_type + (ts->temp_local ? TCG_TYPE_COUNT : 0);
>> +    k = ts->base_type + (ts->kind == TEMP_NORMAL ? 0 : TCG_TYPE_COUNT);
>>      set_bit(idx, s->free_temps[k].l);
>>  }
>>
>> @@ -1864,17 +1865,27 @@ void tcg_gen_callN(void *func, TCGTemp *ret, int nargs, TCGTemp **args)
>>  static void tcg_reg_alloc_start(TCGContext *s)
>>  {
>>      int i, n;
>> -    TCGTemp *ts;
>>
>> -    for (i = 0, n = s->nb_globals; i < n; i++) {
>> -        ts = &s->temps[i];
>> -        ts->val_type = (ts->fixed_reg ? TEMP_VAL_REG : TEMP_VAL_MEM);
>> -    }
>> -    for (n = s->nb_temps; i < n; i++) {
>> -        ts = &s->temps[i];
>> -        ts->val_type = (ts->temp_local ? TEMP_VAL_MEM : TEMP_VAL_DEAD);
>> -        ts->mem_allocated = 0;
>> -        ts->fixed_reg = 0;
>> +    for (i = 0, n = s->nb_temps; i < n; i++) {
>> +        TCGTemp *ts = &s->temps[i];
>> +        TCGTempVal val = TEMP_VAL_MEM;
>> +
>> +        switch (ts->kind) {
>> +        case TEMP_FIXED:
>> +            val = TEMP_VAL_REG;
>> +            break;
>> +        case TEMP_GLOBAL:
>> +            break;
>> +        case TEMP_NORMAL:
>> +            val = TEMP_VAL_DEAD;
>> +            /* fall through */
>> +        case TEMP_LOCAL:
>> +            ts->mem_allocated = 0;
>> +            break;
>> +        default:
>> +            g_assert_not_reached();
>> +        }
>> +        ts->val_type = val;
>>      }
>>
>>      memset(s->reg_to_temp, 0, sizeof(s->reg_to_temp));
>> @@ -1885,12 +1896,17 @@ static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
>>  {
>>      int idx = temp_idx(ts);
>>
>> -    if (ts->temp_global) {
>> +    switch (ts->kind) {
>> +    case TEMP_FIXED:
>> +    case TEMP_GLOBAL:
>>          pstrcpy(buf, buf_size, ts->name);
>> -    } else if (ts->temp_local) {
>> +        break;
>> +    case TEMP_LOCAL:
>>          snprintf(buf, buf_size, "loc%d", idx - s->nb_globals);
>> -    } else {
>> +        break;
>> +    case TEMP_NORMAL:
>>          snprintf(buf, buf_size, "tmp%d", idx - s->nb_globals);
>> +        break;
>>      }
> 
> Hmm, why this switch doesn't have:
> 
>         default:
>             g_assert_not_reached();
> 
> like the other ones?

... then all switches should have a default case, as Aleksandar noticed.

With the default case fixed:
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

> 
> Aleksandar
> 
>>      return buf;
>>  }
>> @@ -2486,15 +2502,24 @@ static void la_bb_end(TCGContext *s, int ng, int nt)
>>  {
>>      int i;
>>
>> -    for (i = 0; i < ng; ++i) {
>> -        s->temps[i].state = TS_DEAD | TS_MEM;
>> -        la_reset_pref(&s->temps[i]);
>> -    }
>> -    for (i = ng; i < nt; ++i) {
>> -        s->temps[i].state = (s->temps[i].temp_local
>> -                             ? TS_DEAD | TS_MEM
>> -                             : TS_DEAD);
>> -        la_reset_pref(&s->temps[i]);
>> +    for (i = 0; i < nt; ++i) {
>> +        TCGTemp *ts = &s->temps[i];
>> +        int state;
>> +
>> +        switch (ts->kind) {
>> +        case TEMP_FIXED:
>> +        case TEMP_GLOBAL:
>> +        case TEMP_LOCAL:
>> +            state = TS_DEAD | TS_MEM;
>> +            break;
>> +        case TEMP_NORMAL:
>> +            state = TS_DEAD;
>> +            break;
>> +        default:
>> +            g_assert_not_reached();
>> +        }
>> +        ts->state = state;
>> +        la_reset_pref(ts);
>>      }
>>  }
>>
>> @@ -3069,7 +3094,8 @@ static void check_regs(TCGContext *s)
>>      }
>>      for (k = 0; k < s->nb_temps; k++) {
>>          ts = &s->temps[k];
>> -        if (ts->val_type == TEMP_VAL_REG && !ts->fixed_reg
>> +        if (ts->val_type == TEMP_VAL_REG
>> +            && ts->kind != TEMP_FIXED
>>              && s->reg_to_temp[ts->reg] != ts) {
>>              printf("Inconsistency for temp %s:\n",
>>                     tcg_get_arg_str_ptr(s, buf, sizeof(buf), ts));
>> @@ -3106,15 +3132,14 @@ static void temp_load(TCGContext *, TCGTemp *, TCGRegSet, TCGRegSet, TCGRegSet);
>>     mark it free; otherwise mark it dead.  */
>>  static void temp_free_or_dead(TCGContext *s, TCGTemp *ts, int free_or_dead)
>>  {
>> -    if (ts->fixed_reg) {
>> +    if (ts->kind == TEMP_FIXED) {
>>          return;
>>      }
>>      if (ts->val_type == TEMP_VAL_REG) {
>>          s->reg_to_temp[ts->reg] = NULL;
>>      }
>>      ts->val_type = (free_or_dead < 0
>> -                    || ts->temp_local
>> -                    || ts->temp_global
>> +                    || ts->kind != TEMP_NORMAL
>>                      ? TEMP_VAL_MEM : TEMP_VAL_DEAD);
>>  }
>>
>> @@ -3131,7 +3156,7 @@ static inline void temp_dead(TCGContext *s, TCGTemp *ts)
>>  static void temp_sync(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs,
>>                        TCGRegSet preferred_regs, int free_or_dead)
>>  {
>> -    if (ts->fixed_reg) {
>> +    if (ts->kind == TEMP_FIXED) {
>>          return;
>>      }
>>      if (!ts->mem_coherent) {
>> @@ -3289,7 +3314,8 @@ static void temp_save(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs)
>>  {
>>      /* The liveness analysis already ensures that globals are back
>>         in memory. Keep an tcg_debug_assert for safety. */
>> -    tcg_debug_assert(ts->val_type == TEMP_VAL_MEM || ts->fixed_reg);
>> +    tcg_debug_assert(ts->val_type == TEMP_VAL_MEM
>> +                     || ts->kind == TEMP_FIXED);
>>  }
>>
>>  /* save globals to their canonical location and assume they can be
>> @@ -3314,7 +3340,7 @@ static void sync_globals(TCGContext *s, TCGRegSet allocated_regs)
>>      for (i = 0, n = s->nb_globals; i < n; i++) {
>>          TCGTemp *ts = &s->temps[i];
>>          tcg_debug_assert(ts->val_type != TEMP_VAL_REG
>> -                         || ts->fixed_reg
>> +                         || ts->kind == TEMP_FIXED
>>                           || ts->mem_coherent);
>>      }
>>  }
>> @@ -3327,7 +3353,7 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
>>
>>      for (i = s->nb_globals; i < s->nb_temps; i++) {
>>          TCGTemp *ts = &s->temps[i];
>> -        if (ts->temp_local) {
>> +        if (ts->kind == TEMP_LOCAL) {
>>              temp_save(s, ts, allocated_regs);
>>          } else {
>>              /* The liveness analysis already ensures that temps are dead.
>> @@ -3347,7 +3373,7 @@ static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
>>                                    TCGRegSet preferred_regs)
>>  {
>>      /* ENV should not be modified.  */
>> -    tcg_debug_assert(!ots->fixed_reg);
>> +    tcg_debug_assert(ots->kind != TEMP_FIXED);
>>
>>      /* The movi is not explicitly generated here.  */
>>      if (ots->val_type == TEMP_VAL_REG) {
>> @@ -3387,7 +3413,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
>>      ts = arg_temp(op->args[1]);
>>
>>      /* ENV should not be modified.  */
>> -    tcg_debug_assert(!ots->fixed_reg);
>> +    tcg_debug_assert(ots->kind != TEMP_FIXED);
>>
>>      /* Note that otype != itype for no-op truncation.  */
>>      otype = ots->type;
>> @@ -3426,7 +3452,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
>>          }
>>          temp_dead(s, ots);
>>      } else {
>> -        if (IS_DEAD_ARG(1) && !ts->fixed_reg) {
>> +        if (IS_DEAD_ARG(1) && ts->kind != TEMP_FIXED) {
>>              /* the mov can be suppressed */
>>              if (ots->val_type == TEMP_VAL_REG) {
>>                  s->reg_to_temp[ots->reg] = NULL;
>> @@ -3448,7 +3474,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
>>                   * Store the source register into the destination slot
>>                   * and leave the destination temp as TEMP_VAL_MEM.
>>                   */
>> -                assert(!ots->fixed_reg);
>> +                assert(ots->kind != TEMP_FIXED);
>>                  if (!ts->mem_allocated) {
>>                      temp_allocate_frame(s, ots);
>>                  }
>> @@ -3485,7 +3511,7 @@ static void tcg_reg_alloc_dup(TCGContext *s, const TCGOp *op)
>>      its = arg_temp(op->args[1]);
>>
>>      /* ENV should not be modified.  */
>> -    tcg_debug_assert(!ots->fixed_reg);
>> +    tcg_debug_assert(ots->kind != TEMP_FIXED);
>>
>>      itype = its->type;
>>      vece = TCGOP_VECE(op);
>> @@ -3625,7 +3651,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
>>          i_preferred_regs = o_preferred_regs = 0;
>>          if (arg_ct->ct & TCG_CT_IALIAS) {
>>              o_preferred_regs = op->output_pref[arg_ct->alias_index];
>> -            if (ts->fixed_reg) {
>> +            if (ts->kind == TEMP_FIXED) {
>>                  /* if fixed register, we must allocate a new register
>>                     if the alias is not the same register */
>>                  if (arg != op->args[arg_ct->alias_index]) {
>> @@ -3716,7 +3742,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
>>              ts = arg_temp(arg);
>>
>>              /* ENV should not be modified.  */
>> -            tcg_debug_assert(!ts->fixed_reg);
>> +            tcg_debug_assert(ts->kind != TEMP_FIXED);
>>
>>              if ((arg_ct->ct & TCG_CT_ALIAS)
>>                  && !const_args[arg_ct->alias_index]) {
>> @@ -3758,7 +3784,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
>>          ts = arg_temp(op->args[i]);
>>
>>          /* ENV should not be modified.  */
>> -        tcg_debug_assert(!ts->fixed_reg);
>> +        tcg_debug_assert(ts->kind != TEMP_FIXED);
>>
>>          if (NEED_SYNC_ARG(i)) {
>>              temp_sync(s, ts, o_allocated_regs, 0, IS_DEAD_ARG(i));
>> @@ -3890,7 +3916,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
>>          ts = arg_temp(arg);
>>
>>          /* ENV should not be modified.  */
>> -        tcg_debug_assert(!ts->fixed_reg);
>> +        tcg_debug_assert(ts->kind != TEMP_FIXED);
>>
>>          reg = tcg_target_call_oarg_regs[i];
>>          tcg_debug_assert(s->reg_to_temp[reg] == NULL);
>> --
>> 2.20.1
>>
>>
> 


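The patch replaces the separate temp_global/temp_local/fixed_reg bit tests with a switch on a single kind field. Below is a minimal plain-C sketch of that mapping, modeled on la_bb_end() and on the `ts->kind < TEMP_GLOBAL` assertion in tcg_temp_free_internal(). The names (TempKind, KIND_*, ST_*) are hypothetical stand-ins for TCGTempKind and TS_DEAD/TS_MEM. The enumerator order is an assumption, chosen so that "< KIND_GLOBAL" means "normal or local". abort() stands in for g_assert_not_reached().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mirror of the four kinds used in the patch (order assumed). */
typedef enum { KIND_NORMAL, KIND_LOCAL, KIND_GLOBAL, KIND_FIXED } TempKind;

/* Liveness-state bits, modeled on TS_DEAD / TS_MEM. */
enum { ST_DEAD = 1, ST_MEM = 2 };

/* State at the end of a basic block, as assigned by la_bb_end() in the patch:
 * normal temps are simply dead; every other kind is dead with a memory copy. */
static int bb_end_state(TempKind kind)
{
    switch (kind) {
    case KIND_FIXED:
    case KIND_GLOBAL:
    case KIND_LOCAL:
        return ST_DEAD | ST_MEM;
    case KIND_NORMAL:
        return ST_DEAD;
    default:
        abort();    /* stands in for g_assert_not_reached() */
    }
}

/* Only normal and local temps may be freed, mirroring kind < TEMP_GLOBAL. */
static int is_freeable(TempKind kind)
{
    return kind < KIND_GLOBAL;
}
```

With one enum field, a temp cannot be in an impossible combination such as "global and local", which the three independent bits allowed.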
^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 25/36] tcg: Remove tcg_gen_dup{8,16,32,64}i_vec
  2020-04-22  1:17 ` [PATCH v2 25/36] tcg: Remove tcg_gen_dup{8,16,32,64}i_vec Richard Henderson
@ 2020-04-23  9:11   ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-23  9:11 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> These interfaces have been replaced by tcg_gen_dupi_vec
> and tcg_constant_vec.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  include/tcg/tcg-op.h |  4 ----
>  tcg/tcg-op-vec.c     | 20 --------------------
>  2 files changed, 24 deletions(-)
>
> diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
> index 11ed9192f7..a39eb13ff0 100644
> --- a/include/tcg/tcg-op.h
> +++ b/include/tcg/tcg-op.h
> @@ -959,10 +959,6 @@ void tcg_gen_mov_vec(TCGv_vec, TCGv_vec);
>  void tcg_gen_dup_i32_vec(unsigned vece, TCGv_vec, TCGv_i32);
>  void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec, TCGv_i64);
>  void tcg_gen_dup_mem_vec(unsigned vece, TCGv_vec, TCGv_ptr, tcg_target_long);
> -void tcg_gen_dup8i_vec(TCGv_vec, uint32_t);
> -void tcg_gen_dup16i_vec(TCGv_vec, uint32_t);
> -void tcg_gen_dup32i_vec(TCGv_vec, uint32_t);
> -void tcg_gen_dup64i_vec(TCGv_vec, uint64_t);
>  void tcg_gen_dupi_vec(unsigned vece, TCGv_vec, uint64_t);
>  void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>  void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
> diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
> index 6343046e18..a9c16d85c5 100644
> --- a/tcg/tcg-op-vec.c
> +++ b/tcg/tcg-op-vec.c
> @@ -284,26 +284,6 @@ void tcg_gen_dupi_vec(unsigned vece, TCGv_vec dest, uint64_t val)
>      tcg_gen_mov_vec(dest, tcg_constant_vec(type, vece, val));
>  }
>  
> -void tcg_gen_dup64i_vec(TCGv_vec dest, uint64_t val)
> -{
> -    tcg_gen_dupi_vec(MO_64, dest, val);
> -}
> -
> -void tcg_gen_dup32i_vec(TCGv_vec dest, uint32_t val)
> -{
> -    tcg_gen_dupi_vec(MO_32, dest, val);
> -}
> -
> -void tcg_gen_dup16i_vec(TCGv_vec dest, uint32_t val)
> -{
> -    tcg_gen_dupi_vec(MO_16, dest, val);
> -}
> -
> -void tcg_gen_dup8i_vec(TCGv_vec dest, uint32_t val)
> -{
> -    tcg_gen_dupi_vec(MO_8, dest, val);
> -}
> -
>  void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec r, TCGv_i64 a)
>  {
>      TCGArg ri = tcgv_vec_arg(r);

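The removed helpers encoded the element size in their names; the replacement tcg_gen_dupi_vec() takes it as the vece argument (for example, tcg_gen_dup8i_vec(d, x) becomes tcg_gen_dupi_vec(MO_8, d, x), as the diff shows). A plain-C sketch of the replication that vece selects follows; it is a model, not QEMU code, and the ELT_* names are hypothetical, mirroring the MO_8..MO_64 values (0..3).

```c
#include <assert.h>
#include <stdint.h>

/* Element sizes as log2(bytes), mirroring MO_8..MO_64. */
enum { ELT_8 = 0, ELT_16 = 1, ELT_32 = 2, ELT_64 = 3 };

/* Replicate the low (8 << vece) bits of val across a 64-bit word.
 * Multiplying by a "one in every lane" constant copies the truncated
 * value into each lane without carries between lanes. */
static uint64_t dup_imm64(unsigned vece, uint64_t val)
{
    switch (vece) {
    case ELT_8:
        return 0x0101010101010101ull * (uint8_t)val;
    case ELT_16:
        return 0x0001000100010001ull * (uint16_t)val;
    case ELT_32:
        return 0x0000000100000001ull * (uint32_t)val;
    case ELT_64:
        return val;
    default:
        assert(!"invalid element size");
        return 0;
    }
}
```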

-- 
Alex Bennée



* Re: [PATCH v2 26/36] tcg: Add load_dest parameter to GVecGen2
  2020-04-22  1:17 ` [PATCH v2 26/36] tcg: Add load_dest parameter to GVecGen2 Richard Henderson
@ 2020-04-23  9:37   ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-23  9:37 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> We have this same parameter for GVecGen2i, GVecGen3,
> and GVecGen3i.  This will make some SVE2 insns easier
> to parameterize.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  include/tcg/tcg-op-gvec.h |  2 ++
>  tcg/tcg-op-gvec.c         | 45 ++++++++++++++++++++++++++++-----------
>  2 files changed, 34 insertions(+), 13 deletions(-)
>
> diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
> index d89f91f40e..cea6497341 100644
> --- a/include/tcg/tcg-op-gvec.h
> +++ b/include/tcg/tcg-op-gvec.h
> @@ -109,6 +109,8 @@ typedef struct {
>      uint8_t vece;
>      /* Prefer i64 to v64.  */
>      bool prefer_i64;
> +    /* Load dest as a 2nd source operand.  */
> +    bool load_dest;
>  } GVecGen2;
>  
>  typedef struct {
> diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
> index 43cac1a0bf..049a55e700 100644
> --- a/tcg/tcg-op-gvec.c
> +++ b/tcg/tcg-op-gvec.c
> @@ -663,17 +663,22 @@ static void expand_clr(uint32_t dofs, uint32_t maxsz)
>  
>  /* Expand OPSZ bytes worth of two-operand operations using i32 elements.  */
>  static void expand_2_i32(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
> -                         void (*fni)(TCGv_i32, TCGv_i32))
> +                         bool load_dest, void (*fni)(TCGv_i32, TCGv_i32))
>  {
>      TCGv_i32 t0 = tcg_temp_new_i32();
> +    TCGv_i32 t1 = tcg_temp_new_i32();
>      uint32_t i;
>  
>      for (i = 0; i < oprsz; i += 4) {
>          tcg_gen_ld_i32(t0, cpu_env, aofs + i);
> -        fni(t0, t0);
> -        tcg_gen_st_i32(t0, cpu_env, dofs + i);
> +        if (load_dest) {
> +            tcg_gen_ld_i32(t1, cpu_env, dofs + i);
> +        }
> +        fni(t1, t0);
> +        tcg_gen_st_i32(t1, cpu_env, dofs + i);
>      }
>      tcg_temp_free_i32(t0);
> +    tcg_temp_free_i32(t1);
>  }
>  
>  static void expand_2i_i32(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
> @@ -793,17 +798,22 @@ static void expand_4_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
>  
>  /* Expand OPSZ bytes worth of two-operand operations using i64 elements.  */
>  static void expand_2_i64(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
> -                         void (*fni)(TCGv_i64, TCGv_i64))
> +                         bool load_dest, void (*fni)(TCGv_i64, TCGv_i64))
>  {
>      TCGv_i64 t0 = tcg_temp_new_i64();
> +    TCGv_i64 t1 = tcg_temp_new_i64();
>      uint32_t i;
>  
>      for (i = 0; i < oprsz; i += 8) {
>          tcg_gen_ld_i64(t0, cpu_env, aofs + i);
> -        fni(t0, t0);
> -        tcg_gen_st_i64(t0, cpu_env, dofs + i);
> +        if (load_dest) {
> +            tcg_gen_ld_i64(t1, cpu_env, dofs + i);
> +        }
> +        fni(t1, t0);
> +        tcg_gen_st_i64(t1, cpu_env, dofs + i);
>      }
>      tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(t1);
>  }
>  
>  static void expand_2i_i64(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
> @@ -924,17 +934,23 @@ static void expand_4_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
>  /* Expand OPSZ bytes worth of two-operand operations using host vectors.  */
>  static void expand_2_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
>                           uint32_t oprsz, uint32_t tysz, TCGType type,
> +                         bool load_dest,
>                           void (*fni)(unsigned, TCGv_vec, TCGv_vec))
>  {
>      TCGv_vec t0 = tcg_temp_new_vec(type);
> +    TCGv_vec t1 = tcg_temp_new_vec(type);
>      uint32_t i;
>  
>      for (i = 0; i < oprsz; i += tysz) {
>          tcg_gen_ld_vec(t0, cpu_env, aofs + i);
> -        fni(vece, t0, t0);
> -        tcg_gen_st_vec(t0, cpu_env, dofs + i);
> +        if (load_dest) {
> +            tcg_gen_ld_vec(t1, cpu_env, dofs + i);
> +        }
> +        fni(vece, t1, t0);
> +        tcg_gen_st_vec(t1, cpu_env, dofs + i);
>      }
>      tcg_temp_free_vec(t0);
> +    tcg_temp_free_vec(t1);
>  }
>  
>  /* Expand OPSZ bytes worth of two-vector operands and an immediate operand
> @@ -1088,7 +1104,8 @@ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs,
>           * that e.g. size == 80 would be expanded with 2x32 + 1x16.
>           */
>          some = QEMU_ALIGN_DOWN(oprsz, 32);
> -        expand_2_vec(g->vece, dofs, aofs, some, 32, TCG_TYPE_V256, g->fniv);
> +        expand_2_vec(g->vece, dofs, aofs, some, 32, TCG_TYPE_V256,
> +                     g->load_dest, g->fniv);
>          if (some == oprsz) {
>              break;
>          }
> @@ -1098,17 +1115,19 @@ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs,
>          maxsz -= some;
>          /* fallthru */
>      case TCG_TYPE_V128:
> -        expand_2_vec(g->vece, dofs, aofs, oprsz, 16, TCG_TYPE_V128, g->fniv);
> +        expand_2_vec(g->vece, dofs, aofs, oprsz, 16, TCG_TYPE_V128,
> +                     g->load_dest, g->fniv);
>          break;
>      case TCG_TYPE_V64:
> -        expand_2_vec(g->vece, dofs, aofs, oprsz, 8, TCG_TYPE_V64, g->fniv);
> +        expand_2_vec(g->vece, dofs, aofs, oprsz, 8, TCG_TYPE_V64,
> +                     g->load_dest, g->fniv);
>          break;
>  
>      case 0:
>          if (g->fni8 && check_size_impl(oprsz, 8)) {
> -            expand_2_i64(dofs, aofs, oprsz, g->fni8);
> +            expand_2_i64(dofs, aofs, oprsz, g->load_dest, g->fni8);
>          } else if (g->fni4 && check_size_impl(oprsz, 4)) {
> -            expand_2_i32(dofs, aofs, oprsz, g->fni4);
> +            expand_2_i32(dofs, aofs, oprsz, g->load_dest, g->fni4);
>          } else {
>              assert(g->fno != NULL);
>              tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, g->data, g->fno);

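The effect of load_dest on the expansion loops can be modeled with plain arrays. Here is a minimal sketch of the patched expand_2_i32() loop: t0 holds the source element; t1 is the output and, when load_dest is set, is first loaded from the destination so the callback can read the old value. The names are hypothetical. In TCG, t1 is simply left unset when load_dest is false; the model zeroes it to keep C behaviour defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-element callback: *d is the output (and, with load_dest, also an input). */
typedef void (*elt_fn)(uint32_t *d, uint32_t a);

/* Model of the i32 loop in expand_2_i32() after the patch. */
static void expand_2_model(uint32_t *dest, const uint32_t *src, size_t n,
                           int load_dest, elt_fn fn)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t t0 = src[i];
        uint32_t t1 = 0;            /* model only: TCG leaves this temp unset */
        if (load_dest) {
            t1 = dest[i];           /* dest becomes a second source operand */
        }
        fn(&t1, t0);
        dest[i] = t1;
    }
}

/* Ignores the old destination value, so load_dest is not needed. */
static void neg_fn(uint32_t *d, uint32_t a) { *d = 0u - a; }

/* Accumulates into the destination, so it requires load_dest. */
static void acc_fn(uint32_t *d, uint32_t a) { *d += a; }
```

Accumulating operations like acc_fn are the kind of two-operand SVE2 patterns the commit message alludes to.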

-- 
Alex Bennée



* Re: [PATCH v2 27/36] tcg: Fix integral argument type to tcg_gen_rot[rl]i_i{32,64}
  2020-04-22  1:17 ` [PATCH v2 27/36] tcg: Fix integral argument type to tcg_gen_rot[rl]i_i{32, 64} Richard Henderson
  2020-04-22 10:19   ` Philippe Mathieu-Daudé
@ 2020-04-23  9:38   ` Alex Bennée
  1 sibling, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-23  9:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> For the benefit of compatibility of function pointer types,
> we have standardized on int32_t and int64_t as the integral
> argument to tcg expanders.
>
> We converted most of them in 474b2e8f0f7, but missed the rotates.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  include/tcg/tcg-op.h |  8 ++++----
>  tcg/tcg-op.c         | 16 ++++++++--------
>  2 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
> index a39eb13ff0..b07bf7b524 100644
> --- a/include/tcg/tcg-op.h
> +++ b/include/tcg/tcg-op.h
> @@ -298,9 +298,9 @@ void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
>  void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg);
>  void tcg_gen_ctpop_i32(TCGv_i32 a1, TCGv_i32 a2);
>  void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
> -void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
> +void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2);
>  void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
> -void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
> +void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2);
>  void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
>                           unsigned int ofs, unsigned int len);
>  void tcg_gen_deposit_z_i32(TCGv_i32 ret, TCGv_i32 arg,
> @@ -490,9 +490,9 @@ void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
>  void tcg_gen_clrsb_i64(TCGv_i64 ret, TCGv_i64 arg);
>  void tcg_gen_ctpop_i64(TCGv_i64 a1, TCGv_i64 a2);
>  void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
> -void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
> +void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
>  void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
> -void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
> +void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
>  void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
>                           unsigned int ofs, unsigned int len);
>  void tcg_gen_deposit_z_i64(TCGv_i64 ret, TCGv_i64 arg,
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 07eb661a07..202d8057c5 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -516,9 +516,9 @@ void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
>      }
>  }
>  
> -void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2)
> +void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>  {
> -    tcg_debug_assert(arg2 < 32);
> +    tcg_debug_assert(arg2 >= 0 && arg2 < 32);
>      /* some cases can be optimized here */
>      if (arg2 == 0) {
>          tcg_gen_mov_i32(ret, arg1);
> @@ -554,9 +554,9 @@ void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
>      }
>  }
>  
> -void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2)
> +void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>  {
> -    tcg_debug_assert(arg2 < 32);
> +    tcg_debug_assert(arg2 >= 0 && arg2 < 32);
>      /* some cases can be optimized here */
>      if (arg2 == 0) {
>          tcg_gen_mov_i32(ret, arg1);
> @@ -1949,9 +1949,9 @@ void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>      }
>  }
>  
> -void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2)
> +void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>  {
> -    tcg_debug_assert(arg2 < 64);
> +    tcg_debug_assert(arg2 >= 0 && arg2 < 64);
>      /* some cases can be optimized here */
>      if (arg2 == 0) {
>          tcg_gen_mov_i64(ret, arg1);
> @@ -1986,9 +1986,9 @@ void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>      }
>  }
>  
> -void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2)
> +void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>  {
> -    tcg_debug_assert(arg2 < 64);
> +    tcg_debug_assert(arg2 >= 0 && arg2 < 64);
>      /* some cases can be optimized here */
>      if (arg2 == 0) {
>          tcg_gen_mov_i64(ret, arg1);

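Why the immediate's type matters: expanders stored in one function-pointer table must share a prototype, so an unsigned immediate on the rotates would need casts next to the int32_t shifts. Once the type is signed, `arg2 < 32` no longer rejects negative counts by wraparound, which is why the patch adds `arg2 >= 0 &&`. The plain-C sketch below uses hypothetical names and operates on values rather than TCG temps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical common signature for "operation with immediate" helpers. */
typedef uint32_t (*imm_op_i32)(uint32_t a, int32_t imm);

static uint32_t shli_model(uint32_t a, int32_t sh)
{
    assert(sh >= 0 && sh < 32);     /* a signed count needs both bounds */
    return a << sh;
}

static uint32_t rotli_model(uint32_t a, int32_t sh)
{
    assert(sh >= 0 && sh < 32);
    /* (32 - sh) & 31 keeps the right shift in range when sh == 0. */
    return (a << sh) | (a >> ((32 - sh) & 31));
}

/* With a shared int32_t immediate type, both fit one table without casts. */
static const imm_op_i32 imm_ops[] = { shli_model, rotli_model };
```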

-- 
Alex Bennée



* Re: [PATCH v2 28/36] tcg: Implement gvec support for rotate by immediate
  2020-04-22  1:17 ` [PATCH v2 28/36] tcg: Implement gvec support for rotate by immediate Richard Henderson
@ 2020-04-23 13:28   ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-23 13:28 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> No host backend support yet, but the interfaces for rotli
> are in place.  Canonicalize immediate rotate to the left,
> based on a survey of architectures, but provide both left
> and right rotate interfaces to the translators.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

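Canonicalizing immediate rotates to the left works because a right rotate by n equals a left rotate by (width - n) mod width, so a right-rotate interface can forward to the left-rotate one. Here is a minimal plain-C sketch on byte elements; the gvec_*_model names are hypothetical and are not QEMU's actual tcg_gen_gvec_* functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t rotl8(uint8_t x, unsigned sh)
{
    sh &= 7;
    return (uint8_t)((x << sh) | (x >> ((8 - sh) & 7)));
}

/* Left rotate every byte of a vector by the same immediate. */
static void gvec_rotli8_model(uint8_t *d, const uint8_t *a, size_t n,
                              unsigned sh)
{
    for (size_t i = 0; i < n; i++) {
        d[i] = rotl8(a[i], sh);
    }
}

/* Right rotate by sh == left rotate by (8 - sh) mod 8: only one form
 * needs to be lowered to host code. */
static void gvec_rotri8_model(uint8_t *d, const uint8_t *a, size_t n,
                              unsigned sh)
{
    gvec_rotli8_model(d, a, n, (8 - sh) & 7);
}
```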
-- 
Alex Bennée



* Re: [PATCH v2 29/36] tcg: Implement gvec support for rotate by vector
  2020-04-22  1:17 ` [PATCH v2 29/36] tcg: Implement gvec support for rotate by vector Richard Henderson
@ 2020-04-23 13:41   ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-23 13:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> No host backend support yet, but the interfaces for rotlv
> and rotrv are in place.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

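Rotate by vector differs from rotate by immediate in that each element has its own count, taken from the corresponding element of a second vector. A plain-C sketch on 16-bit elements follows. The names are hypothetical, and masking each count modulo the element width is an assumption made here, not something stated in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t rotl16(uint16_t x, unsigned sh)
{
    sh &= 15;                               /* assumption: count mod 16 */
    return (uint16_t)((x << sh) | (x >> ((16 - sh) & 15)));
}

static uint16_t rotr16(uint16_t x, unsigned sh)
{
    return rotl16(x, (16 - (sh & 15)) & 15);
}

/* d[i] = a[i] rotated left by b[i]. */
static void rotlv16_model(uint16_t *d, const uint16_t *a,
                          const uint16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        d[i] = rotl16(a[i], b[i]);
    }
}

/* d[i] = a[i] rotated right by b[i]. */
static void rotrv16_model(uint16_t *d, const uint16_t *a,
                          const uint16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        d[i] = rotr16(a[i], b[i]);
    }
}
```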
-- 
Alex Bennée



* Re: [PATCH v2 31/36] tcg: Implement gvec support for rotate by scalar
  2020-04-22  1:17 ` [PATCH v2 31/36] tcg: Implement gvec support for rotate by scalar Richard Henderson
@ 2020-04-23 13:46   ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-23 13:46 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> No host backend support yet, but the interfaces for rotls
> are in place.  Only implement left-rotate for now, as the
> only known use of vector rotate by scalar is s390x, so any
> right-rotate would be unused and untestable.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

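Rotate by scalar sits between the other two forms: every element uses the same count, as with an immediate, but the count is a run-time value rather than a translation-time constant. Consistent with the commit message, the plain-C sketch below models only the left rotate. The function name is hypothetical, and masking the count modulo 32 is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate every 32-bit element left by one run-time scalar count. */
static void rotls32_model(uint32_t *d, const uint32_t *a, size_t n,
                          uint32_t count)
{
    unsigned sh = count & 31;   /* assumption: count taken modulo 32 */
    for (size_t i = 0; i < n; i++) {
        d[i] = (a[i] << sh) | (a[i] >> ((32 - sh) & 31));
    }
}
```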
-- 
Alex Bennée



* Re: [PATCH v2 00/36] tcg 5.1 omnibus patch set
  2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
                   ` (35 preceding siblings ...)
  2020-04-22  1:17 ` [PATCH v2 36/36] target/s390x: Use tcg_gen_gvec_rotl{i,s,v} Richard Henderson
@ 2020-04-23 13:50 ` Alex Bennée
  36 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-23 13:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> For v1, I had split this into 4 logically distinct parts.  But
> apparently there are minor interdependencies, because the later
> sets would not apply standalone, says Alex.
>
> Rather than tease them apart, and then have to undo that work
> in order to actually apply them later, I'll just lump them.
>
> So:
>
>   Part 1, patches 1-7, tcg_gen_gvec_dup_imm, is reviewed.
>
>   Part 2, patch 8, vector tail clearing, is reviewed, and I have
>           moved the target/arm patches into a different queue.
>
>   Part 3, patches 9-25, TYPE_CONST temporaries, is mostly unreviewed.
>
>   Part 4, patch 26, load_dest for GVecGen2, a support patch for SVE2.
>
>   Part 5, patches 27-36, add vector rotate patterns, is brand new.
>           I include two demonstrators for target/ppc and target/s390x.

I've done my review pass for now. I made it through all the core code
but my brain was too frazzled to look at the back end generation code so
I'll have another go at that on the next revision when the sparc
regression is figured out.

-- 
Alex Bennée



* Re: [PATCH v2 09/36] tcg: Consolidate 3 bits into enum TCGTempKind
  2020-04-23  9:00     ` Philippe Mathieu-Daudé
@ 2020-04-23 15:40       ` Richard Henderson
  2020-04-23 17:24         ` Daniel P. Berrangé
  0 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-23 15:40 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Aleksandar Markovic
  Cc: Alex Bennée, QEMU Developers

On 4/23/20 2:00 AM, Philippe Mathieu-Daudé wrote:
>>> @@ -1885,12 +1896,17 @@ static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
>>>  {
>>>      int idx = temp_idx(ts);
>>>
>>> -    if (ts->temp_global) {
>>> +    switch (ts->kind) {
>>> +    case TEMP_FIXED:
>>> +    case TEMP_GLOBAL:
>>>          pstrcpy(buf, buf_size, ts->name);
>>> -    } else if (ts->temp_local) {
>>> +        break;
>>> +    case TEMP_LOCAL:
>>>          snprintf(buf, buf_size, "loc%d", idx - s->nb_globals);
>>> -    } else {
>>> +        break;
>>> +    case TEMP_NORMAL:
>>>          snprintf(buf, buf_size, "tmp%d", idx - s->nb_globals);
>>> +        break;
>>>      }
>>
>> Hmm, why this switch doesn't have:
>>
>>         default:
>>             g_assert_not_reached();
>>
>> like the other ones?
> 
> ... then all switch should have a default case, as noticed Aleksandar.

There's a bit of a conflict between wanting to use -Werror -Wswitch, and making
sure every switch has a default.

With the former, you get a compiler error of the form

error: enumeration value ‘FOO’ not handled in switch

which lets you easily find places that need adjustment when enumerators are added.

With the latter, you only get a runtime failure, which can be more difficult to
find if you've missed one.

We do not always have the option of relying on -Wswitch, if there are other
compounding warnings such as uninitialized variables.

In this instance, we can rely on -Wswitch, and I see no reason to add a default
case.


r~

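The argument above can be sketched as follows. A switch that names every enumerator and has no default, compiled with -Wswitch (enabled by -Wall in GCC) and -Werror, fails to build as soon as a new enumerator is added without a matching case. The sketch mirrors tcg_get_arg_str_ptr(). The names are hypothetical, and the index arithmetic (idx - s->nb_globals) is simplified away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { KIND_NORMAL, KIND_LOCAL, KIND_GLOBAL, KIND_FIXED } TempKind;

/* Deliberately no default label: if a TempKind is added without a case
 * here, -Wswitch reports "enumeration value ... not handled in switch". */
static char *temp_name(char *buf, size_t size, TempKind kind,
                       const char *name, int idx)
{
    buf[0] = '\0';      /* keep buf defined even if no case matches */
    switch (kind) {
    case KIND_FIXED:
    case KIND_GLOBAL:
        snprintf(buf, size, "%s", name);
        break;
    case KIND_LOCAL:
        snprintf(buf, size, "loc%d", idx);
        break;
    case KIND_NORMAL:
        snprintf(buf, size, "tmp%d", idx);
        break;
    }
    return buf;
}
```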


* Re: [PATCH v2 09/36] tcg: Consolidate 3 bits into enum TCGTempKind
  2020-04-23 15:40       ` Richard Henderson
@ 2020-04-23 17:24         ` Daniel P. Berrangé
  2020-04-23 23:11           ` Richard Henderson
  0 siblings, 1 reply; 75+ messages in thread
From: Daniel P. Berrangé @ 2020-04-23 17:24 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Aleksandar Markovic, Alex Bennée,
	Philippe Mathieu-Daudé,
	QEMU Developers

On Thu, Apr 23, 2020 at 08:40:10AM -0700, Richard Henderson wrote:
> On 4/23/20 2:00 AM, Philippe Mathieu-Daudé wrote:
> >>> @@ -1885,12 +1896,17 @@ static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
> >>>  {
> >>>      int idx = temp_idx(ts);
> >>>
> >>> -    if (ts->temp_global) {
> >>> +    switch (ts->kind) {
> >>> +    case TEMP_FIXED:
> >>> +    case TEMP_GLOBAL:
> >>>          pstrcpy(buf, buf_size, ts->name);
> >>> -    } else if (ts->temp_local) {
> >>> +        break;
> >>> +    case TEMP_LOCAL:
> >>>          snprintf(buf, buf_size, "loc%d", idx - s->nb_globals);
> >>> -    } else {
> >>> +        break;
> >>> +    case TEMP_NORMAL:
> >>>          snprintf(buf, buf_size, "tmp%d", idx - s->nb_globals);
> >>> +        break;
> >>>      }
> >>
> >> Hmm, why this switch doesn't have:
> >>
> >>         default:
> >>             g_assert_not_reached();
> >>
> >> like the other ones?
> > 
> > ... then all switch should have a default case, as noticed Aleksandar.
> 
> There's a bit of a conflict between wanting to use -Werror -Wswitch, and making
> sure every switch has a default.
> 
> With the former, you get a compiler error of the form
> 
> error: enumeration value ‘FOO’ not handled in switch
> 
> which lets you easily find places that need adjustment enumerators are added.

FYI,  -Wswitch-enum can deal with this. This gives a warning about
missing enum cases, even if there is a default statement:

[quote]
'-Wswitch-enum'
     Warn whenever a 'switch' statement has an index of enumerated type
     and lacks a 'case' for one or more of the named codes of that
     enumeration.  'case' labels outside the enumeration range also
     provoke warnings when this option is used.  The only difference
     between '-Wswitch' and this option is that this option gives a
     warning about an omitted enumeration code even if there is a
     'default' label.
[/quote]

If we want to have a default: in every switch, then we could also
use -Wswitch-default too !

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


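The combination Daniel describes can be sketched as follows. Listing every enumerator keeps -Wswitch-enum useful, because it still reports a missing case even when a default label exists. The default then only handles values outside the enumeration at run time. The names are hypothetical. Returning "invalid" instead of calling g_assert_not_reached() is a choice made here so the behaviour can be tested.

```c
#include <assert.h>

typedef enum { KIND_NORMAL, KIND_LOCAL, KIND_GLOBAL, KIND_FIXED } TempKind;

/* Every enumerator is listed, so -Wswitch-enum stays quiet today and warns
 * if a kind is added later; the default catches out-of-range values. */
static const char *kind_str(TempKind kind)
{
    switch (kind) {
    case KIND_NORMAL: return "normal";
    case KIND_LOCAL:  return "local";
    case KIND_GLOBAL: return "global";
    case KIND_FIXED:  return "fixed";
    default:          return "invalid";
    }
}
```

As Richard notes in his reply, the cost of -Wswitch-enum is that every switch over a large enumeration (such as the TCG opcode list) must name every enumerator.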


* Re: [PATCH v2 09/36] tcg: Consolidate 3 bits into enum TCGTempKind
  2020-04-23 17:24         ` Daniel P. Berrangé
@ 2020-04-23 23:11           ` Richard Henderson
  2020-04-24  9:08             ` Daniel P. Berrangé
  0 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-23 23:11 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Aleksandar Markovic, Alex Bennée,
	Philippe Mathieu-Daudé,
	QEMU Developers

On 4/23/20 10:24 AM, Daniel P. Berrangé wrote:
> On Thu, Apr 23, 2020 at 08:40:10AM -0700, Richard Henderson wrote:
>> On 4/23/20 2:00 AM, Philippe Mathieu-Daudé wrote:
>>>>> @@ -1885,12 +1896,17 @@ static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
>>>>>  {
>>>>>      int idx = temp_idx(ts);
>>>>>
>>>>> -    if (ts->temp_global) {
>>>>> +    switch (ts->kind) {
>>>>> +    case TEMP_FIXED:
>>>>> +    case TEMP_GLOBAL:
>>>>>          pstrcpy(buf, buf_size, ts->name);
>>>>> -    } else if (ts->temp_local) {
>>>>> +        break;
>>>>> +    case TEMP_LOCAL:
>>>>>          snprintf(buf, buf_size, "loc%d", idx - s->nb_globals);
>>>>> -    } else {
>>>>> +        break;
>>>>> +    case TEMP_NORMAL:
>>>>>          snprintf(buf, buf_size, "tmp%d", idx - s->nb_globals);
>>>>> +        break;
>>>>>      }
>>>>
>>>> Hmm, why this switch doesn't have:
>>>>
>>>>         default:
>>>>             g_assert_not_reached();
>>>>
>>>> like the other ones?
>>>
>>> ... then all switch should have a default case, as noticed Aleksandar.
>>
>> There's a bit of a conflict between wanting to use -Werror -Wswitch, and making
>> sure every switch has a default.
>>
>> With the former, you get a compiler error of the form
>>
>> error: enumeration value ‘FOO’ not handled in switch
>>
>> which lets you easily find places that need adjustment when enumerators are added.
> 
> FYI,  -Wswitch-enum can deal with this. This gives a warning about
> missing enum cases, even if there is a default statement:
> 
> [quote]
> '-Wswitch-enum'
>      Warn whenever a 'switch' statement has an index of enumerated type
>      and lacks a 'case' for one or more of the named codes of that
>      enumeration.  'case' labels outside the enumeration range also
>      provoke warnings when this option is used.  The only difference
>      between '-Wswitch' and this option is that this option gives a
>      warning about an omitted enumeration code even if there is a
>      'default' label.

This warning, IMO, is useless.

All you need is one enumeration with 100 elements -- e.g. TCGOp -- and you
certainly will not want to have to add all enumerators to every switch.
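A minimal sketch of the trade-off being discussed. It uses a simplified local enum that borrows the four TCGTempKind names from the hunk above (a stand-in, not QEMU's actual definition); the function names are hypothetical. Built with gcc -Wall, which enables -Wswitch, adding a fifth enumerator would make the first function warn, while the second would stay silent because of its default label. Only -Wswitch-enum flags the second one, and it already does so today for TEMP_GLOBAL and TEMP_FIXED.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the TCGTempKind values in the hunk above. */
typedef enum {
    TEMP_NORMAL,
    TEMP_LOCAL,
    TEMP_GLOBAL,
    TEMP_FIXED,
} TempKind;

/* No default label: if a new enumerator is added to TempKind,
 * -Wswitch reports "enumeration value ... not handled in switch" here. */
static const char *kind_prefix(TempKind k)
{
    switch (k) {
    case TEMP_FIXED:
    case TEMP_GLOBAL:
        return "global";
    case TEMP_LOCAL:
        return "loc";
    case TEMP_NORMAL:
        return "tmp";
    }
    assert(!"unhandled TempKind");  /* like g_assert_not_reached() */
    return NULL;
}

/* With a default label, -Wswitch says nothing about missing
 * enumerators; -Wswitch-enum still warns about TEMP_GLOBAL/TEMP_FIXED. */
static const char *kind_prefix_default(TempKind k)
{
    switch (k) {
    case TEMP_LOCAL:
        return "loc";
    case TEMP_NORMAL:
        return "tmp";
    default:
        return "global";
    }
}
```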


r~



* Re: [PATCH v2 13/36] tcg: Use tcg_constant_{i32,i64} with tcg int expanders
  2020-04-22 20:04   ` Alex Bennée
@ 2020-04-23 23:13     ` Richard Henderson
  2020-04-24 13:23       ` Alex Bennée
  0 siblings, 1 reply; 75+ messages in thread
From: Richard Henderson @ 2020-04-23 23:13 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 4/22/20 1:04 PM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> 
> We have a regression. Setting up a build dir with:
> 
>   ../../configure --disable-tools --disable-docs --target-list=sparc-softmmu,sparc64-softmmu
>   make -j30 && make check-acceptance
> 
> And then running a bisect between HEAD and master:
> 
>   git bisect run /bin/sh -c "cd builds/bisect && make -j30 && ./tests/venv/bin/avocado run ./tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_sparc_ss20"
> 
> Fingers:
> 
>   a4d42b76dd29818e4f393c4c3eb59601b0015b2f is the first bad commit
>   commit a4d42b76dd29818e4f393c4c3eb59601b0015b2f
>   Author: Richard Henderson <richard.henderson@linaro.org>
>   Date:   Tue Apr 21 18:16:59 2020 -0700
> 
>       tcg: Use tcg_constant_{i32,i64} with tcg int expanders
> 
>       Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>       Message-Id: <20200422011722.13287-14-richard.henderson@linaro.org>

Ho hum.  I can reproduce this, but after a day of debugging I'm no closer to
figuring out what's wrong than when I started.

I'm going to put this whole section of TEMP_CONST to the side for now.


r~



* Re: [PATCH v2 09/36] tcg: Consolidate 3 bits into enum TCGTempKind
  2020-04-23 23:11           ` Richard Henderson
@ 2020-04-24  9:08             ` Daniel P. Berrangé
  0 siblings, 0 replies; 75+ messages in thread
From: Daniel P. Berrangé @ 2020-04-24  9:08 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Aleksandar Markovic, Alex Bennée,
	Philippe Mathieu-Daudé,
	QEMU Developers

On Thu, Apr 23, 2020 at 04:11:14PM -0700, Richard Henderson wrote:
> On 4/23/20 10:24 AM, Daniel P. Berrangé wrote:
> > On Thu, Apr 23, 2020 at 08:40:10AM -0700, Richard Henderson wrote:
> >> On 4/23/20 2:00 AM, Philippe Mathieu-Daudé wrote:
> >>>>> @@ -1885,12 +1896,17 @@ static char *tcg_get_arg_str_ptr(TCGContext *s, char *buf, int buf_size,
> >>>>>  {
> >>>>>      int idx = temp_idx(ts);
> >>>>>
> >>>>> -    if (ts->temp_global) {
> >>>>> +    switch (ts->kind) {
> >>>>> +    case TEMP_FIXED:
> >>>>> +    case TEMP_GLOBAL:
> >>>>>          pstrcpy(buf, buf_size, ts->name);
> >>>>> -    } else if (ts->temp_local) {
> >>>>> +        break;
> >>>>> +    case TEMP_LOCAL:
> >>>>>          snprintf(buf, buf_size, "loc%d", idx - s->nb_globals);
> >>>>> -    } else {
> >>>>> +        break;
> >>>>> +    case TEMP_NORMAL:
> >>>>>          snprintf(buf, buf_size, "tmp%d", idx - s->nb_globals);
> >>>>> +        break;
> >>>>>      }
> >>>>
> >>>> Hmm, why doesn't this switch have:
> >>>>
> >>>>         default:
> >>>>             g_assert_not_reached();
> >>>>
> >>>> like the other ones?
> >>>
> >>> ... then every switch should have a default case, as Aleksandar noticed.
> >>
> >> There's a bit of a conflict between wanting to use -Werror -Wswitch, and making
> >> sure every switch has a default.
> >>
> >> With the former, you get a compiler error of the form
> >>
> >> error: enumeration value ‘FOO’ not handled in switch
> >>
> >> which lets you easily find places that need adjustment when enumerators are added.
> > 
> > FYI,  -Wswitch-enum can deal with this. This gives a warning about
> > missing enum cases, even if there is a default statement:
> > 
> > [quote]
> > '-Wswitch-enum'
> >      Warn whenever a 'switch' statement has an index of enumerated type
> >      and lacks a 'case' for one or more of the named codes of that
> >      enumeration.  'case' labels outside the enumeration range also
> >      provoke warnings when this option is used.  The only difference
> >      between '-Wswitch' and this option is that this option gives a
> >      warning about an omitted enumeration code even if there is a
> >      'default' label.
> 
> This warning, IMO, is useless.
> 
> All you need is one enumeration with 100 elements -- e.g. TCGOp -- and you
> certainly will not want to have to add all enumerators to every switch.

It depends how many of these exceptions you have. If there are only a
small handful of exceptions like this, then you can use Pragmas to
selectively disable the warning in those few cases, while still
benefitting from it across the rest of the code.  If there are a lot
of such exceptions though, then I agree it is impractical.
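A minimal sketch of the selective-exemption approach, using GCC's diagnostic pragmas (also honoured by clang). The enum and function are hypothetical stand-ins for a large enum such as the opcode list mentioned above; the point is that -Wswitch-enum is switched off only for the one function that deliberately handles a few cases, and stays in force for the rest of the file.

```c
#include <assert.h>

/* Hypothetical stand-in for a large opcode enum. */
typedef enum {
    OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_SHL, OP_SHR,
    OP_COUNT  /* number of real opcodes */
} Opcode;

/* Only the shifts matter here, so listing every opcode would be noise.
 * Exempt just this function from -Wswitch-enum. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wswitch-enum"
static int op_is_shift(Opcode op)
{
    assert(op < OP_COUNT);
    switch (op) {
    case OP_SHL:
    case OP_SHR:
        return 1;
    default:
        return 0;
    }
}
#pragma GCC diagnostic pop
/* From here on, -Wswitch-enum applies again. */
```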

Regards,
Daniel




* Re: [PATCH v2 13/36] tcg: Use tcg_constant_{i32,i64} with tcg int expanders
  2020-04-23 23:13     ` Richard Henderson
@ 2020-04-24 13:23       ` Alex Bennée
  0 siblings, 0 replies; 75+ messages in thread
From: Alex Bennée @ 2020-04-24 13:23 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> On 4/22/20 1:04 PM, Alex Bennée wrote:
>> 
>> Richard Henderson <richard.henderson@linaro.org> writes:
>> 
>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> 
>> We have a regression. Setting up a build dir with:
>> 
>>   ../../configure --disable-tools --disable-docs --target-list=sparc-softmmu,sparc64-softmmu
>>   make -j30 && make check-acceptance
>> 
>> And then running a bisect between HEAD and master:
>> 
>>   git bisect run /bin/sh -c "cd builds/bisect && make -j30 && ./tests/venv/bin/avocado run ./tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_sparc_ss20"
>> 
>> Fingers:
>> 
>>   a4d42b76dd29818e4f393c4c3eb59601b0015b2f is the first bad commit
>>   commit a4d42b76dd29818e4f393c4c3eb59601b0015b2f
>>   Author: Richard Henderson <richard.henderson@linaro.org>
>>   Date:   Tue Apr 21 18:16:59 2020 -0700
>> 
>>       tcg: Use tcg_constant_{i32,i64} with tcg int expanders
>> 
>>       Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>       Message-Id: <20200422011722.13287-14-richard.henderson@linaro.org>
>
> Ho hum.  I can reproduce this, but after a day of debugging I'm no closer to
> figuring out what's wrong than when I started.
>
> I'm going to put this whole section of TEMP_CONST to the side for now.

From my own poking around I can say the hang occurs when you first
introduce just:

  void tcg_gen_movi_i32(TCGv_i32 ret, int32_t arg)
  {
      tcg_gen_mov_i32(ret, tcg_constant_i32(arg));
  }

and nothing else. Which indicates the problem has to be in the core
plumbing itself. This is odd because all the other architectures are
fine. I wonder if there is something special about sparc's constant
generation?
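To make the movi-as-mov snippet above concrete, here is a toy model of constant temporaries. It assumes, as the patch titles "Introduce TYPE_CONST temporaries" and "Add temp_readonly" suggest, that each constant value is backed by a single shared, read-only temp. All names (ToyTemp, toy_constant_i32, toy_temp_readonly, ...) are hypothetical; this is not QEMU's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model: a fixed-size pool of temporaries. */
enum { MAX_TEMPS = 64 };

typedef enum { KIND_NORMAL, KIND_CONST } ToyKind;

typedef struct {
    ToyKind kind;
    uint32_t val;  /* fixed value for KIND_CONST, current value otherwise */
} ToyTemp;

static ToyTemp temps[MAX_TEMPS];
static int nb_temps;

static bool toy_temp_readonly(const ToyTemp *t)
{
    return t->kind == KIND_CONST;
}

/* Return the one constant temp holding VAL, creating it on first use.
 * A linear search stands in for a real lookup structure. */
static ToyTemp *toy_constant_i32(uint32_t val)
{
    for (int i = 0; i < nb_temps; i++) {
        if (temps[i].kind == KIND_CONST && temps[i].val == val) {
            return &temps[i];
        }
    }
    assert(nb_temps < MAX_TEMPS);
    temps[nb_temps] = (ToyTemp){ .kind = KIND_CONST, .val = val };
    return &temps[nb_temps++];
}

static ToyTemp *toy_new_temp(void)
{
    assert(nb_temps < MAX_TEMPS);
    temps[nb_temps] = (ToyTemp){ .kind = KIND_NORMAL, .val = 0 };
    return &temps[nb_temps++];
}

/* Copy SRC into DST; constants must never be written. */
static void toy_mov(ToyTemp *dst, const ToyTemp *src)
{
    assert(!toy_temp_readonly(dst));
    dst->val = src->val;
}

/* movi expressed as a plain mov from the shared constant temp. */
static void toy_movi_i32(ToyTemp *ret, uint32_t arg)
{
    toy_mov(ret, toy_constant_i32(arg));
}
```

The design point the sketch illustrates: because one temp is shared by every use of a value, a stray write to it would change that value everywhere, which is why the toy write path asserts that the destination is not read-only.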

Eyeballing the numbers, it does seem like sparc generates more negative
numbers than ARM does, although ARM does generate some. I thought I'd
just have a check to see what happens so I looked at the first
occurrence in the sparc test:

  0x00006224:  sethi  %hi(0xffdcf000), %g6
  0x00006228:  mov  %g6, %g6      ! 0xffdcf000
  0x0000622c:  sethi  %hi(0xffd00000), %g4
  0x00006230:  mov  %g4, %g4      ! 0xffd00000
  0x00006234:  sub  %g6, %g4, %g6
  0x00006238:  sub  %g1, %g6, %g3
  0x0000623c:  sethi  %hi(0x1000), %g5
  0x00006240:  sub  %g3, %g5, %g3
  0x00006244:  sub  %g3, %g5, %g3

Which seems to be translated into ops ok:

   ---- 00006224 00006228
   mov_i32 g6,$0xffdcf000

   ---- 00006228 0000622c

   ---- 0000622c 00006230
   mov_i32 g4,$0xffd00000

   ---- 00006230 00006234

   ---- 00006234 00006238
   sub_i32 tmp0,g6,g4
   mov_i32 g6,tmp0

   ---- 00006238 0000623c
   sub_i32 tmp0,g1,g6
   mov_i32 g3,tmp0

   ---- 0000623c 00006240
   mov_i32 g5,$0x1000

   ---- 00006240 00006244
   sub_i32 tmp0,g3,g5
   mov_i32 g3,tmp0

   ---- 00006244 00006248
   sub_i32 tmp0,g3,g5
   mov_i32 g3,tmp0

and it looks like it's doing the expected constant folding here.

   ---- 00006224 00006228

   ---- 00006228 0000622c

   ---- 0000622c 00006230

   ---- 00006230 00006234

   ---- 00006234 00006238
   movi_i32 tmp0,$0xcf000                   pref=0xffff
   mov_i32 g6,tmp0                          dead: 1  pref=0xffff

   ---- 00006238 0000623c
   sub_i32 tmp0,g1,g6                       dead: 1 2  pref=0xffff
   mov_i32 g3,tmp0                          dead: 1  pref=0xffff

   ---- 0000623c 00006240
   mov_i32 g5,$0x1000                       sync: 0  dead: 0  pref=0xffff

   ---- 00006240 00006244
   sub_i32 tmp0,g3,$0x1000                  dead: 1  pref=0xffff
   mov_i32 g3,tmp0                          dead: 1  pref=0xffff

   ---- 00006244 00006248
   sub_i32 tmp0,g3,$0x1000                  dead: 1  pref=0xffff
   mov_i32 g3,tmp0                          sync: 0  dead: 1  pref=0xf038
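The folded value can be checked by hand. A minimal sketch, assuming the usual SPARC V8 semantics of sethi (write the upper 22 bits of the immediate, clear the low 10); the helper names are hypothetical. For the constants in this trace the low 10 bits are already zero, so the value after the sethi/mov pair is the constant itself, and 0xffdcf000 - 0xffd00000 wraps to 0xcf000 in 32 bits, which is the `movi_i32 tmp0,$0xcf000` seen above.

```c
#include <assert.h>
#include <stdint.h>

/* "sethi %hi(x), rd": keep the upper 22 bits of x, clear the low 10. */
static uint32_t sethi_hi(uint32_t x)
{
    return x & ~UINT32_C(0x3ff);
}

/* 32-bit subtraction, wrapping modulo 2^32 like sub_i32. */
static uint32_t sub_i32(uint32_t a, uint32_t b)
{
    return a - b;
}
```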


One other data point is it is certainly in the optimisation phase that
things go wrong because:

  //#define USE_TCG_OPTIMIZATIONS

means the test passes.
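For readers unfamiliar with what the optimisation phase does, here is a deliberately tiny forward constant-folding pass over a toy IR. It is a hypothetical illustration of the general technique, not QEMU's tcg/optimize.c, and it does not try to reproduce the dump above exactly (for instance, it rewrites copies of known constants into movi). It assumes straight-line code within a single block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy IR, loosely modelled on the op dump above (hypothetical). */
typedef enum { T_MOVI, T_MOV, T_SUB } ToyOpc;

typedef struct {
    ToyOpc opc;
    int dst;        /* destination temp index */
    int src1, src2; /* source temp indices (ignored when unused) */
    uint32_t imm;   /* immediate for T_MOVI */
} ToyOp;

enum { NB_TOY_TEMPS = 16 };

/* Track which temps hold a known constant and rewrite ops whose inputs
 * are all known into T_MOVI. */
static void toy_fold(ToyOp *ops, int n)
{
    bool known[NB_TOY_TEMPS] = { false };
    uint32_t val[NB_TOY_TEMPS] = { 0 };

    for (int i = 0; i < n; i++) {
        ToyOp *op = &ops[i];

        /* Every opcode listed, no default: -Wswitch flags new ones. */
        switch (op->opc) {
        case T_MOV:
            if (known[op->src1]) {
                op->imm = val[op->src1];
                op->opc = T_MOVI;
            }
            break;
        case T_SUB:
            if (known[op->src1] && known[op->src2]) {
                op->imm = val[op->src1] - val[op->src2];
                op->opc = T_MOVI;
            }
            break;
        case T_MOVI:
            break;
        }

        /* Record what the (possibly rewritten) op leaves in dst. */
        known[op->dst] = (op->opc == T_MOVI);
        if (known[op->dst]) {
            val[op->dst] = op->imm;
        }
    }
}
```

Feeding it the first instructions of the trace (g6 = 0xffdcf000, g4 = 0xffd00000, tmp0 = g6 - g4, g6 = tmp0, tmp0 = g1 - g6) folds the first subtraction to 0xcf000 and leaves the second alone, because g1 is not a known constant.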


>
>
> r~


-- 
Alex Bennée



end of thread (last message 2020-04-24 13:24 UTC)

Thread overview: 75+ messages
2020-04-22  1:16 [PATCH v2 00/36] tcg 5.1 omnibus patch set Richard Henderson
2020-04-22  1:16 ` [PATCH v2 01/36] tcg: Add tcg_gen_gvec_dup_imm Richard Henderson
2020-04-22  1:16 ` [PATCH v2 02/36] target/s390x: Use tcg_gen_gvec_dup_imm Richard Henderson
2020-04-22  1:16 ` [PATCH v2 03/36] target/ppc: " Richard Henderson
2020-04-22  1:16 ` [PATCH v2 04/36] target/arm: " Richard Henderson
2020-04-22  1:16 ` [PATCH v2 05/36] tcg: Use tcg_gen_gvec_dup_imm in logical simplifications Richard Henderson
2020-04-22  1:16 ` [PATCH v2 06/36] tcg: Remove tcg_gen_gvec_dup{8,16,32,64}i Richard Henderson
2020-04-22  1:16 ` [PATCH v2 07/36] tcg: Add tcg_gen_gvec_dup_tl Richard Henderson
2020-04-22  1:16 ` [PATCH v2 08/36] tcg: Improve vector tail clearing Richard Henderson
2020-04-22  1:16 ` [PATCH v2 09/36] tcg: Consolidate 3 bits into enum TCGTempKind Richard Henderson
2020-04-22 11:25   ` Alex Bennée
2020-04-22 19:58   ` Aleksandar Markovic
2020-04-23  9:00     ` Philippe Mathieu-Daudé
2020-04-23 15:40       ` Richard Henderson
2020-04-23 17:24         ` Daniel P. Berrangé
2020-04-23 23:11           ` Richard Henderson
2020-04-24  9:08             ` Daniel P. Berrangé
2020-04-22  1:16 ` [PATCH v2 10/36] tcg: Add temp_readonly Richard Henderson
2020-04-22 11:26   ` Alex Bennée
2020-04-22  1:16 ` [PATCH v2 11/36] tcg: Introduce TYPE_CONST temporaries Richard Henderson
2020-04-22 15:17   ` Alex Bennée
2020-04-22 16:55     ` Richard Henderson
2020-04-22  1:16 ` [PATCH v2 12/36] tcg: Use tcg_constant_i32 with icount expander Richard Henderson
2020-04-22 15:40   ` Alex Bennée
2020-04-22  1:16 ` [PATCH v2 13/36] tcg: Use tcg_constant_{i32, i64} with tcg int expanders Richard Henderson
2020-04-22 16:18   ` [PATCH v2 13/36] tcg: Use tcg_constant_{i32,i64} " Alex Bennée
2020-04-22 17:02     ` Richard Henderson
2020-04-22 17:57       ` Alex Bennée
2020-04-22 20:04   ` Alex Bennée
2020-04-23 23:13     ` Richard Henderson
2020-04-24 13:23       ` Alex Bennée
2020-04-22  1:17 ` [PATCH v2 14/36] tcg: Use tcg_constant_{i32, vec} with tcg vec expanders Richard Henderson
2020-04-22 17:00   ` [PATCH v2 14/36] tcg: Use tcg_constant_{i32,vec} " Alex Bennée
2020-04-22  1:17 ` [PATCH v2 15/36] tcg: Use tcg_constant_{i32,i64} with tcg plugins Richard Henderson
2020-04-22 17:18   ` [PATCH v2 15/36] tcg: Use tcg_constant_{i32, i64} " Alex Bennée
2020-04-22  1:17 ` [PATCH v2 16/36] tcg: Rename struct tcg_temp_info to TempOptInfo Richard Henderson
2020-04-22 17:19   ` Alex Bennée
2020-04-22  1:17 ` [PATCH v2 17/36] tcg/optimize: Adjust TempOptInfo allocation Richard Henderson
2020-04-22 17:53   ` Alex Bennée
2020-04-22 18:28     ` Alex Bennée
2020-04-22  1:17 ` [PATCH v2 18/36] tcg/optimize: Use tcg_constant_internal with constant folding Richard Henderson
2020-04-22 18:28   ` Alex Bennée
2020-04-22  1:17 ` [PATCH v2 19/36] tcg/tci: Add special tci_movi_{i32,i64} opcodes Richard Henderson
2020-04-22 19:02   ` Alex Bennée
2020-04-22  1:17 ` [PATCH v2 20/36] tcg: Remove movi and dupi opcodes Richard Henderson
2020-04-22  9:12   ` Aleksandar Markovic
2020-04-22 19:03   ` Alex Bennée
2020-04-22  1:17 ` [PATCH v2 21/36] tcg: Use tcg_out_dupi_vec from temp_load Richard Henderson
2020-04-22 19:28   ` Alex Bennée
2020-04-22  1:17 ` [PATCH v2 22/36] tcg: Increase tcg_out_dupi_vec immediate to int64_t Richard Henderson
2020-04-22 19:33   ` Alex Bennée
2020-04-22  1:17 ` [PATCH v2 23/36] tcg: Add tcg_reg_alloc_dup2 Richard Henderson
2020-04-22 19:40   ` Alex Bennée
2020-04-22  1:17 ` [PATCH v2 24/36] tcg/i386: Use tcg_constant_vec with tcg vec expanders Richard Henderson
2020-04-22 19:43   ` Alex Bennée
2020-04-22  1:17 ` [PATCH v2 25/36] tcg: Remove tcg_gen_dup{8,16,32,64}i_vec Richard Henderson
2020-04-23  9:11   ` Alex Bennée
2020-04-22  1:17 ` [PATCH v2 26/36] tcg: Add load_dest parameter to GVecGen2 Richard Henderson
2020-04-23  9:37   ` Alex Bennée
2020-04-22  1:17 ` [PATCH v2 27/36] tcg: Fix integral argument type to tcg_gen_rot[rl]i_i{32, 64} Richard Henderson
2020-04-22 10:19   ` Philippe Mathieu-Daudé
2020-04-23  9:38   ` [PATCH v2 27/36] tcg: Fix integral argument type to tcg_gen_rot[rl]i_i{32,64} Alex Bennée
2020-04-22  1:17 ` [PATCH v2 28/36] tcg: Implement gvec support for rotate by immediate Richard Henderson
2020-04-23 13:28   ` Alex Bennée
2020-04-22  1:17 ` [PATCH v2 29/36] tcg: Implement gvec support for rotate by vector Richard Henderson
2020-04-23 13:41   ` Alex Bennée
2020-04-22  1:17 ` [PATCH v2 30/36] tcg: Remove expansion to shift by vector from do_shifts Richard Henderson
2020-04-22  1:17 ` [PATCH v2 31/36] tcg: Implement gvec support for rotate by scalar Richard Henderson
2020-04-23 13:46   ` Alex Bennée
2020-04-22  1:17 ` [PATCH v2 32/36] tcg/i386: Implement INDEX_op_rotl[is]_vec Richard Henderson
2020-04-22  1:17 ` [PATCH v2 33/36] tcg/aarch64: Implement INDEX_op_rotli_vec Richard Henderson
2020-04-22  1:17 ` [PATCH v2 34/36] tcg/ppc: Implement INDEX_op_rot[lr]v_vec Richard Henderson
2020-04-22  1:17 ` [PATCH v2 35/36] target/ppc: Use tcg_gen_gvec_rotlv Richard Henderson
2020-04-22  1:17 ` [PATCH v2 36/36] target/s390x: Use tcg_gen_gvec_rotl{i,s,v} Richard Henderson
2020-04-23 13:50 ` [PATCH v2 00/36] tcg 5.1 omnibus patch set Alex Bennée
