All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 00/38] tcg vector improvements
@ 2019-04-20  7:34 Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 01/38] target/arm: Fill in .opc for cmtst_op Richard Henderson
                   ` (40 more replies)
  0 siblings, 41 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Based-on: tcg-next, which at present is only tcg_gen_extract2.

The dupm patches have been on list before, with a larger context
of supporting tcg/ppc.  The rest of the set was written to support
David's s390 vector patches.  In particular:

(1) Add vector absolute value.
(2) Add vector shift by non-constant scalar.
(3) Add vector shift by vector.
(4) Add vector select.
(5) Be more precise in handling target-specific vector expansions.

And then there's a set of bugs that I encountered while working
on this across x86, aa64, and ppc hosts.  Tested primarily with
aa64 as the guest, via RISU.


r~


David Hildenbrand (1):
  tcg: Implement tcg_gen_gvec_3i()

Richard Henderson (37):
  target/arm: Fill in .opc for cmtst_op
  tcg: Assert fixed_reg is read-only
  tcg: Return bool success from tcg_out_mov
  tcg: Support cross-class moves without instruction support
  tcg: Allow add_vec, sub_vec, neg_vec, not_vec to be expanded
  tcg: Promote tcg_out_{dup,dupi}_vec to backend interface
  tcg: Manually expand INDEX_op_dup_vec
  tcg: Add tcg_out_dupm_vec to the backend interface
  tcg/i386: Implement tcg_out_dupm_vec
  tcg/aarch64: Implement tcg_out_dupm_vec
  tcg: Add INDEX_op_dup_mem_vec
  tcg: Add gvec expanders for variable shift
  tcg/i386: Support vector variable shift opcodes
  tcg/aarch64: Support vector variable shift opcodes
  tcg: Specify optional vector requirements with a list
  tcg: Add gvec expanders for vector shift by scalar
  tcg/i386: Support vector scalar shift opcodes
  tcg: Add support for integer absolute value
  tcg: Add support for vector absolute value
  target/arm: Use tcg_gen_abs_i64 and tcg_gen_gvec_abs
  target/cris: Use tcg_gen_abs_tl
  target/ppc: Use tcg_gen_abs_tl
  target/s390x: Use tcg_gen_abs_i64
  target/xtensa: Use tcg_gen_abs_i32
  tcg/i386: Support vector absolute value
  tcg/aarch64: Support vector absolute value
  tcg: Add support for vector comparison select
  tcg/i386: Support vector comparison select value
  tcg/aarch64: Support vector comparison select value
  target/ppc: Use vector variable shifts for VS{L,R,RA}{B,H,W,D}
  target/arm: Vectorize USHL and SSHL
  tcg/aarch64: Do not advertise minmax for MO_64
  tcg: Do not recreate INDEX_op_neg_vec unless supported
  tcg: Introduce do_op3_nofail for vector expansion
  tcg: Expand vector minmax using cmp+cmpsel
  tcg/aarch64: Use MVNI for expansion of dupi
  tcg/aarch64: Use ORRI and BICI for vector logical operations

 accel/tcg/tcg-runtime.h             |  20 +
 target/arm/helper.h                 |  17 +-
 target/arm/translate.h              |   6 +
 target/ppc/helper.h                 |  24 +-
 tcg/aarch64/tcg-target.h            |   4 +-
 tcg/aarch64/tcg-target.opc.h        |   2 +
 tcg/i386/tcg-target.h               |   6 +-
 tcg/i386/tcg-target.opc.h           |   1 -
 tcg/tcg-op-gvec.h                   |  60 +-
 tcg/tcg-op.h                        |  16 +
 tcg/tcg-opc.h                       |   3 +
 tcg/tcg.h                           |  20 +
 accel/tcg/tcg-runtime-gvec.c        | 180 ++++++
 target/arm/neon_helper.c            |  38 --
 target/arm/translate-a64.c          |  59 +-
 target/arm/translate-sve.c          |   9 +-
 target/arm/translate.c              | 432 ++++++++++---
 target/arm/vec_helper.c             | 176 ++++++
 target/cris/translate.c             |   9 +-
 target/ppc/int_helper.c             |   6 +-
 target/ppc/translate.c              |  80 +--
 target/ppc/translate/vmx-impl.inc.c | 175 +++++-
 target/s390x/translate.c            |   8 +-
 target/xtensa/translate.c           |   9 +-
 tcg/aarch64/tcg-target.inc.c        | 227 ++++++-
 tcg/arm/tcg-target.inc.c            |   7 +-
 tcg/i386/tcg-target.inc.c           | 176 +++++-
 tcg/mips/tcg-target.inc.c           |   3 +-
 tcg/optimize.c                      |   8 +-
 tcg/ppc/tcg-target.inc.c            |   3 +-
 tcg/riscv/tcg-target.inc.c          |   5 +-
 tcg/s390/tcg-target.inc.c           |   3 +-
 tcg/sparc/tcg-target.inc.c          |   3 +-
 tcg/tcg-op-gvec.c                   | 917 +++++++++++++++++++++++-----
 tcg/tcg-op-vec.c                    | 259 +++++++-
 tcg/tcg-op.c                        |  20 +
 tcg/tcg.c                           | 256 ++++++--
 tcg/tci/tcg-target.inc.c            |   3 +-
 tcg/README                          |  16 +
 39 files changed, 2699 insertions(+), 567 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 01/38] target/arm: Fill in .opc for cmtst_op
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-23  8:00   ` David Hildenbrand
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 02/38] tcg: Assert fixed_reg is read-only Richard Henderson
                   ` (39 subsequent siblings)
  40 siblings, 1 reply; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

This allows us to fall back to integers if the tcg backend
does not support comparisons in the given vece.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index d408e4d7ef..13e2dc6562 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -6140,16 +6140,20 @@ static void gen_cmtst_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
 const GVecGen3 cmtst_op[4] = {
     { .fni4 = gen_helper_neon_tst_u8,
       .fniv = gen_cmtst_vec,
+      .opc = INDEX_op_cmp_vec,
       .vece = MO_8 },
     { .fni4 = gen_helper_neon_tst_u16,
       .fniv = gen_cmtst_vec,
+      .opc = INDEX_op_cmp_vec,
       .vece = MO_16 },
     { .fni4 = gen_cmtst_i32,
       .fniv = gen_cmtst_vec,
+      .opc = INDEX_op_cmp_vec,
       .vece = MO_32 },
     { .fni8 = gen_cmtst_i64,
       .fniv = gen_cmtst_vec,
       .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+      .opc = INDEX_op_cmp_vec,
       .vece = MO_64 },
 };
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 02/38] tcg: Assert fixed_reg is read-only
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 01/38] target/arm: Fill in .opc for cmtst_op Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-23  8:03   ` David Hildenbrand
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 03/38] tcg: Return bool success from tcg_out_mov Richard Henderson
                   ` (38 subsequent siblings)
  40 siblings, 1 reply; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

The only fixed_reg is cpu_env, and it should not be modified
during any TB.  Therefore code that tries to special-case moves
into a fixed_reg is dead.  Remove it.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg.c | 85 +++++++++++++++++++++++++------------------------------
 1 file changed, 38 insertions(+), 47 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index ade6050982..4f77a957b0 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3279,11 +3279,8 @@ static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
                                   tcg_target_ulong val, TCGLifeData arg_life,
                                   TCGRegSet preferred_regs)
 {
-    if (ots->fixed_reg) {
-        /* For fixed registers, we do not do any constant propagation.  */
-        tcg_out_movi(s, ots->type, ots->reg, val);
-        return;
-    }
+    /* ENV should not be modified.  */
+    tcg_debug_assert(!ots->fixed_reg);
 
     /* The movi is not explicitly generated here.  */
     if (ots->val_type == TEMP_VAL_REG) {
@@ -3319,6 +3316,9 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
     ots = arg_temp(op->args[0]);
     ts = arg_temp(op->args[1]);
 
+    /* ENV should not be modified.  */
+    tcg_debug_assert(!ots->fixed_reg);
+
     /* Note that otype != itype for no-op truncation.  */
     otype = ots->type;
     itype = ts->type;
@@ -3343,7 +3343,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
     }
 
     tcg_debug_assert(ts->val_type == TEMP_VAL_REG);
-    if (IS_DEAD_ARG(0) && !ots->fixed_reg) {
+    if (IS_DEAD_ARG(0)) {
         /* mov to a non-saved dead register makes no sense (even with
            liveness analysis disabled). */
         tcg_debug_assert(NEED_SYNC_ARG(0));
@@ -3356,7 +3356,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
         }
         temp_dead(s, ots);
     } else {
-        if (IS_DEAD_ARG(1) && !ts->fixed_reg && !ots->fixed_reg) {
+        if (IS_DEAD_ARG(1) && !ts->fixed_reg) {
             /* the mov can be suppressed */
             if (ots->val_type == TEMP_VAL_REG) {
                 s->reg_to_temp[ots->reg] = NULL;
@@ -3509,6 +3509,10 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
             arg = op->args[i];
             arg_ct = &def->args_ct[i];
             ts = arg_temp(arg);
+
+            /* ENV should not be modified.  */
+            tcg_debug_assert(!ts->fixed_reg);
+
             if ((arg_ct->ct & TCG_CT_ALIAS)
                 && !const_args[arg_ct->alias_index]) {
                 reg = new_args[arg_ct->alias_index];
@@ -3517,29 +3521,19 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
                                     i_allocated_regs | o_allocated_regs,
                                     op->output_pref[k], ts->indirect_base);
             } else {
-                /* if fixed register, we try to use it */
-                reg = ts->reg;
-                if (ts->fixed_reg &&
-                    tcg_regset_test_reg(arg_ct->u.regs, reg)) {
-                    goto oarg_end;
-                }
                 reg = tcg_reg_alloc(s, arg_ct->u.regs, o_allocated_regs,
                                     op->output_pref[k], ts->indirect_base);
             }
             tcg_regset_set_reg(o_allocated_regs, reg);
-            /* if a fixed register is used, then a move will be done afterwards */
-            if (!ts->fixed_reg) {
-                if (ts->val_type == TEMP_VAL_REG) {
-                    s->reg_to_temp[ts->reg] = NULL;
-                }
-                ts->val_type = TEMP_VAL_REG;
-                ts->reg = reg;
-                /* temp value is modified, so the value kept in memory is
-                   potentially not the same */
-                ts->mem_coherent = 0;
-                s->reg_to_temp[reg] = ts;
+            if (ts->val_type == TEMP_VAL_REG) {
+                s->reg_to_temp[ts->reg] = NULL;
             }
-        oarg_end:
+            ts->val_type = TEMP_VAL_REG;
+            ts->reg = reg;
+            /* temp value is modified, so the value kept in memory is
+               potentially not the same */
+            ts->mem_coherent = 0;
+            s->reg_to_temp[reg] = ts;
             new_args[i] = reg;
         }
     }
@@ -3555,10 +3549,10 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
     /* move the outputs in the correct register if needed */
     for(i = 0; i < nb_oargs; i++) {
         ts = arg_temp(op->args[i]);
-        reg = new_args[i];
-        if (ts->fixed_reg && ts->reg != reg) {
-            tcg_out_mov(s, ts->type, ts->reg, reg);
-        }
+
+        /* ENV should not be modified.  */
+        tcg_debug_assert(!ts->fixed_reg);
+
         if (NEED_SYNC_ARG(i)) {
             temp_sync(s, ts, o_allocated_regs, 0, IS_DEAD_ARG(i));
         } else if (IS_DEAD_ARG(i)) {
@@ -3679,26 +3673,23 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
     for(i = 0; i < nb_oargs; i++) {
         arg = op->args[i];
         ts = arg_temp(arg);
+
+        /* ENV should not be modified.  */
+        tcg_debug_assert(!ts->fixed_reg);
+
         reg = tcg_target_call_oarg_regs[i];
         tcg_debug_assert(s->reg_to_temp[reg] == NULL);
-
-        if (ts->fixed_reg) {
-            if (ts->reg != reg) {
-                tcg_out_mov(s, ts->type, ts->reg, reg);
-            }
-        } else {
-            if (ts->val_type == TEMP_VAL_REG) {
-                s->reg_to_temp[ts->reg] = NULL;
-            }
-            ts->val_type = TEMP_VAL_REG;
-            ts->reg = reg;
-            ts->mem_coherent = 0;
-            s->reg_to_temp[reg] = ts;
-            if (NEED_SYNC_ARG(i)) {
-                temp_sync(s, ts, allocated_regs, 0, IS_DEAD_ARG(i));
-            } else if (IS_DEAD_ARG(i)) {
-                temp_dead(s, ts);
-            }
+        if (ts->val_type == TEMP_VAL_REG) {
+            s->reg_to_temp[ts->reg] = NULL;
+        }
+        ts->val_type = TEMP_VAL_REG;
+        ts->reg = reg;
+        ts->mem_coherent = 0;
+        s->reg_to_temp[reg] = ts;
+        if (NEED_SYNC_ARG(i)) {
+            temp_sync(s, ts, allocated_regs, 0, IS_DEAD_ARG(i));
+        } else if (IS_DEAD_ARG(i)) {
+            temp_dead(s, ts);
         }
     }
 }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 03/38] tcg: Return bool success from tcg_out_mov
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 01/38] target/arm: Fill in .opc for cmtst_op Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 02/38] tcg: Assert fixed_reg is read-only Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20 10:56   ` Philippe Mathieu-Daudé
  2019-04-23  8:27   ` David Hildenbrand
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 04/38] tcg: Support cross-class moves without instruction support Richard Henderson
                   ` (37 subsequent siblings)
  40 siblings, 2 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

This patch merely changes the interface, aborting on all failures,
of which there are currently none.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c |  5 +++--
 tcg/arm/tcg-target.inc.c     |  7 +++++--
 tcg/i386/tcg-target.inc.c    |  5 +++--
 tcg/mips/tcg-target.inc.c    |  3 ++-
 tcg/ppc/tcg-target.inc.c     |  3 ++-
 tcg/riscv/tcg-target.inc.c   |  5 +++--
 tcg/s390/tcg-target.inc.c    |  3 ++-
 tcg/sparc/tcg-target.inc.c   |  3 ++-
 tcg/tcg.c                    | 14 ++++++++++----
 tcg/tci/tcg-target.inc.c     |  3 ++-
 10 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 8b93598bce..b2d3f9c0a5 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -938,10 +938,10 @@ static void tcg_out_ldst(TCGContext *s, AArch64Insn insn, TCGReg rd,
     tcg_out_ldst_r(s, insn, rd, rn, TCG_TYPE_I64, TCG_REG_TMP);
 }
 
-static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
+static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
     if (ret == arg) {
-        return;
+        return true;
     }
     switch (type) {
     case TCG_TYPE_I32:
@@ -970,6 +970,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
     default:
         g_assert_not_reached();
     }
+    return true;
 }
 
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 6873b0cf95..34e6652142 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -2275,10 +2275,13 @@ static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
     return false;
 }
 
-static inline void tcg_out_mov(TCGContext *s, TCGType type,
+static inline bool tcg_out_mov(TCGContext *s, TCGType type,
                                TCGReg ret, TCGReg arg)
 {
-    tcg_out_dat_reg(s, COND_AL, ARITH_MOV, ret, 0, arg, SHIFT_IMM_LSL(0));
+    if (ret != arg) {
+        tcg_out_dat_reg(s, COND_AL, ARITH_MOV, ret, 0, arg, SHIFT_IMM_LSL(0));
+    }
+    return true;
 }
 
 static inline void tcg_out_movi(TCGContext *s, TCGType type,
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 1fa833840e..817a167767 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -809,12 +809,12 @@ static inline void tgen_arithr(TCGContext *s, int subop, int dest, int src)
     tcg_out_modrm(s, OPC_ARITH_GvEv + (subop << 3) + ext, dest, src);
 }
 
-static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
+static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
     int rexw = 0;
 
     if (arg == ret) {
-        return;
+        return true;
     }
     switch (type) {
     case TCG_TYPE_I64:
@@ -852,6 +852,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
     default:
         g_assert_not_reached();
     }
+    return true;
 }
 
 static void tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 8a92e916dd..f31ebb43bf 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -558,13 +558,14 @@ static inline void tcg_out_dsra(TCGContext *s, TCGReg rd, TCGReg rt, TCGArg sa)
     tcg_out_opc_sa64(s, OPC_DSRA, OPC_DSRA32, rd, rt, sa);
 }
 
-static inline void tcg_out_mov(TCGContext *s, TCGType type,
+static inline bool tcg_out_mov(TCGContext *s, TCGType type,
                                TCGReg ret, TCGReg arg)
 {
     /* Simple reg-reg move, optimising out the 'do nothing' case */
     if (ret != arg) {
         tcg_out_opc_reg(s, OPC_OR, ret, arg, TCG_REG_ZERO);
     }
+    return true;
 }
 
 static void tcg_out_movi(TCGContext *s, TCGType type,
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 773690f1d9..ec8e336be8 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -566,12 +566,13 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt,
                              TCGReg base, tcg_target_long offset);
 
-static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
+static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
     tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
     if (ret != arg) {
         tcg_out32(s, OR | SAB(arg, ret, arg));
     }
+    return true;
 }
 
 static inline void tcg_out_rld(TCGContext *s, int op, TCGReg ra, TCGReg rs,
diff --git a/tcg/riscv/tcg-target.inc.c b/tcg/riscv/tcg-target.inc.c
index b785f4acb7..e2bf1c2c6e 100644
--- a/tcg/riscv/tcg-target.inc.c
+++ b/tcg/riscv/tcg-target.inc.c
@@ -515,10 +515,10 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
  * TCG intrinsics
  */
 
-static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
+static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
     if (ret == arg) {
-        return;
+        return true;
     }
     switch (type) {
     case TCG_TYPE_I32:
@@ -528,6 +528,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
     default:
         g_assert_not_reached();
     }
+    return true;
 }
 
 static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 7db90b3bae..eb22188d1d 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -548,7 +548,7 @@ static void tcg_out_sh32(TCGContext* s, S390Opcode op, TCGReg dest,
     tcg_out_insn_RS(s, op, dest, sh_reg, 0, sh_imm);
 }
 
-static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg dst, TCGReg src)
+static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg dst, TCGReg src)
 {
     if (src != dst) {
         if (type == TCG_TYPE_I32) {
@@ -557,6 +557,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg dst, TCGReg src)
             tcg_out_insn(s, RRE, LGR, dst, src);
         }
     }
+    return true;
 }
 
 static const S390Opcode lli_insns[4] = {
diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index 7a61839dc1..83295955a7 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -407,12 +407,13 @@ static void tcg_out_arithc(TCGContext *s, TCGReg rd, TCGReg rs1,
               | (val2const ? INSN_IMM13(val2) : INSN_RS2(val2)));
 }
 
-static inline void tcg_out_mov(TCGContext *s, TCGType type,
+static inline bool tcg_out_mov(TCGContext *s, TCGType type,
                                TCGReg ret, TCGReg arg)
 {
     if (ret != arg) {
         tcg_out_arith(s, ret, arg, TCG_REG_G0, ARITH_OR);
     }
+    return true;
 }
 
 static inline void tcg_out_sethi(TCGContext *s, TCGReg ret, uint32_t arg)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 4f77a957b0..b083faacd2 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -102,7 +102,7 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
                                            const char *ct_str, TCGType type);
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
                        intptr_t arg2);
-static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
+static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
 static void tcg_out_movi(TCGContext *s, TCGType type,
                          TCGReg ret, tcg_target_long arg);
 static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
@@ -3372,7 +3372,9 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
                                          allocated_regs, preferred_regs,
                                          ots->indirect_base);
             }
-            tcg_out_mov(s, otype, ots->reg, ts->reg);
+            if (!tcg_out_mov(s, otype, ots->reg, ts->reg)) {
+                abort();
+            }
         }
         ots->val_type = TEMP_VAL_REG;
         ots->mem_coherent = 0;
@@ -3472,7 +3474,9 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
                       i_allocated_regs, 0);
             reg = tcg_reg_alloc(s, arg_ct->u.regs, i_allocated_regs,
                                 o_preferred_regs, ts->indirect_base);
-            tcg_out_mov(s, ts->type, reg, ts->reg);
+            if (!tcg_out_mov(s, ts->type, reg, ts->reg)) {
+                abort();
+            }
         }
         new_args[i] = reg;
         const_args[i] = 0;
@@ -3629,7 +3633,9 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
             if (ts->val_type == TEMP_VAL_REG) {
                 if (ts->reg != reg) {
                     tcg_reg_free(s, reg, allocated_regs);
-                    tcg_out_mov(s, ts->type, reg, ts->reg);
+                    if (!tcg_out_mov(s, ts->type, reg, ts->reg)) {
+                        abort();
+                    }
                 }
             } else {
                 TCGRegSet arg_set = 0;
diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
index 0015a98485..992d50cb1e 100644
--- a/tcg/tci/tcg-target.inc.c
+++ b/tcg/tci/tcg-target.inc.c
@@ -509,7 +509,7 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
-static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
+static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
     uint8_t *old_code_ptr = s->code_ptr;
     tcg_debug_assert(ret != arg);
@@ -521,6 +521,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
     tcg_out_r(s, ret);
     tcg_out_r(s, arg);
     old_code_ptr[1] = s->code_ptr - old_code_ptr;
+    return true;
 }
 
 static void tcg_out_movi(TCGContext *s, TCGType type,
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 04/38] tcg: Support cross-class moves without instruction support
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (2 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 03/38] tcg: Return bool success from tcg_out_mov Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-23  8:29   ` David Hildenbrand
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 05/38] tcg: Allow add_vec, sub_vec, neg_vec, not_vec to be expanded Richard Henderson
                   ` (36 subsequent siblings)
  40 siblings, 1 reply; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

PowerPC Altivec does not support direct moves between vector registers
and general registers.  So when tcg_out_mov fails, we can use the
backing memory for the temporary to perform the move.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index b083faacd2..d3dcfe3dca 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3373,7 +3373,18 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
                                          ots->indirect_base);
             }
             if (!tcg_out_mov(s, otype, ots->reg, ts->reg)) {
-                abort();
+                /* Cross register class move not supported.
+                   Store the source register into the destination slot
+                   and leave the destination temp as TEMP_VAL_MEM.  */
+                assert(!ots->fixed_reg);
+                if (!ts->mem_allocated) {
+                    temp_allocate_frame(s, ots);
+                }
+                tcg_out_st(s, ts->type, ts->reg,
+                           ots->mem_base->reg, ots->mem_offset);
+                ots->mem_coherent = 1;
+                temp_free_or_dead(s, ots, -1);
+                return;
             }
         }
         ots->val_type = TEMP_VAL_REG;
@@ -3475,7 +3486,11 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
             reg = tcg_reg_alloc(s, arg_ct->u.regs, i_allocated_regs,
                                 o_preferred_regs, ts->indirect_base);
             if (!tcg_out_mov(s, ts->type, reg, ts->reg)) {
-                abort();
+                /* Cross register class move not supported.  Sync the
+                   temp back to its slot and load from there.  */
+                temp_sync(s, ts, i_allocated_regs, 0, 0);
+                tcg_out_ld(s, ts->type, reg,
+                           ts->mem_base->reg, ts->mem_offset);
             }
         }
         new_args[i] = reg;
@@ -3634,7 +3649,11 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
                 if (ts->reg != reg) {
                     tcg_reg_free(s, reg, allocated_regs);
                     if (!tcg_out_mov(s, ts->type, reg, ts->reg)) {
-                        abort();
+                        /* Cross register class move not supported.  Sync the
+                           temp back to its slot and load from there.  */
+                        temp_sync(s, ts, allocated_regs, 0, 0);
+                        tcg_out_ld(s, ts->type, reg,
+                                   ts->mem_base->reg, ts->mem_offset);
                     }
                 }
             } else {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 05/38] tcg: Allow add_vec, sub_vec, neg_vec, not_vec to be expanded
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (3 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 04/38] tcg: Support cross-class moves without instruction support Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-23  8:33   ` David Hildenbrand
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 06/38] tcg: Promote tcg_out_{dup, dupi}_vec to backend interface Richard Henderson
                   ` (35 subsequent siblings)
  40 siblings, 1 reply; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op-vec.c | 49 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 27f65600c3..cfb18682b1 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -226,16 +226,6 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr b, TCGArg o, TCGType low_type)
     vec_gen_3(INDEX_op_st_vec, low_type, 0, ri, bi, o);
 }
 
-void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
-{
-    vec_gen_op3(INDEX_op_add_vec, vece, r, a, b);
-}
-
-void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
-{
-    vec_gen_op3(INDEX_op_sub_vec, vece, r, a, b);
-}
-
 void tcg_gen_and_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
     vec_gen_op3(INDEX_op_and_vec, 0, r, a, b);
@@ -296,11 +286,30 @@ void tcg_gen_eqv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
     tcg_gen_not_vec(0, r, r);
 }
 
+static bool do_op2(unsigned vece, TCGv_vec r, TCGv_vec a, TCGOpcode opc)
+{
+    TCGTemp *rt = tcgv_vec_temp(r);
+    TCGTemp *at = tcgv_vec_temp(a);
+    TCGArg ri = temp_arg(rt);
+    TCGArg ai = temp_arg(at);
+    TCGType type = rt->base_type;
+    int can;
+
+    tcg_debug_assert(at->base_type >= type);
+    can = tcg_can_emit_vec_op(opc, type, vece);
+    if (can > 0) {
+        vec_gen_2(opc, type, vece, ri, ai);
+    } else if (can < 0) {
+        tcg_expand_vec_op(opc, type, vece, ri, ai);
+    } else {
+        return false;
+    }
+    return true;
+}
+
 void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a)
 {
-    if (TCG_TARGET_HAS_not_vec) {
-        vec_gen_op2(INDEX_op_not_vec, 0, r, a);
-    } else {
+    if (!TCG_TARGET_HAS_not_vec || !do_op2(vece, r, a, INDEX_op_not_vec)) {
         TCGv_vec t = tcg_const_ones_vec_matching(r);
         tcg_gen_xor_vec(0, r, a, t);
         tcg_temp_free_vec(t);
@@ -309,9 +318,7 @@ void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a)
 
 void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a)
 {
-    if (TCG_TARGET_HAS_neg_vec) {
-        vec_gen_op2(INDEX_op_neg_vec, vece, r, a);
-    } else {
+    if (!TCG_TARGET_HAS_neg_vec || !do_op2(vece, r, a, INDEX_op_neg_vec)) {
         TCGv_vec t = tcg_const_zeros_vec_matching(r);
         tcg_gen_sub_vec(vece, r, t, a);
         tcg_temp_free_vec(t);
@@ -409,6 +416,16 @@ static void do_op3(unsigned vece, TCGv_vec r, TCGv_vec a,
     }
 }
 
+void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    do_op3(vece, r, a, b, INDEX_op_add_vec);
+}
+
+void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    do_op3(vece, r, a, b, INDEX_op_sub_vec);
+}
+
 void tcg_gen_mul_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
     do_op3(vece, r, a, b, INDEX_op_mul_vec);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 06/38] tcg: Promote tcg_out_{dup, dupi}_vec to backend interface
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (4 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 05/38] tcg: Allow add_vec, sub_vec, neg_vec, not_vec to be expanded Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 07/38] tcg: Manually expand INDEX_op_dup_vec Richard Henderson
                   ` (34 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

The i386 backend already has these functions, and the aarch64
backend could easily split out one.  Nothing is done with these
functions yet, but this will aid register allocation of
INDEX_op_dup_vec in a later patch.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 12 ++++++++++--
 tcg/i386/tcg-target.inc.c    |  3 ++-
 tcg/tcg.c                    | 14 ++++++++++++++
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index b2d3f9c0a5..116ebd8c1a 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -799,7 +799,7 @@ static void tcg_out_logicali(TCGContext *s, AArch64Insn insn, TCGType ext,
 }
 
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
-                             TCGReg rd, uint64_t v64)
+                             TCGReg rd, tcg_target_long v64)
 {
     int op, cmode, imm8;
 
@@ -814,6 +814,14 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
     }
 }
 
+static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
+                            TCGReg rd, TCGReg rs)
+{
+    int is_q = type - TCG_TYPE_V64;
+    tcg_out_insn(s, 3605, DUP, is_q, rd, rs, 1 << vece, 0);
+    return true;
+}
+
 static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
                          tcg_target_long value)
 {
@@ -2197,7 +2205,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         tcg_out_insn(s, 3617, NOT, is_q, 0, a0, a1);
         break;
     case INDEX_op_dup_vec:
-        tcg_out_insn(s, 3605, DUP, is_q, a0, a1, 1 << vece, 0);
+        tcg_out_dup_vec(s, type, vece, a0, a1);
         break;
     case INDEX_op_shli_vec:
         tcg_out_insn(s, 3614, SHL, is_q, a0, a1, a2 + (8 << vece));
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 817a167767..04e3d37b05 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -855,7 +855,7 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
     return true;
 }
 
-static void tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
+static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
                             TCGReg r, TCGReg a)
 {
     if (have_avx2) {
@@ -888,6 +888,7 @@ static void tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
             g_assert_not_reached();
         }
     }
+    return true;
 }
 
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
diff --git a/tcg/tcg.c b/tcg/tcg.c
index d3dcfe3dca..5ed9c7bee5 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -108,10 +108,24 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
                        const int *const_args);
 #if TCG_TARGET_MAYBE_vec
+static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
+                            TCGReg dst, TCGReg src);
+static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
+                             TCGReg dst, tcg_target_long arg);
 static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, unsigned vecl,
                            unsigned vece, const TCGArg *args,
                            const int *const_args);
 #else
+static inline bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
+                                   TCGReg dst, TCGReg src)
+{
+    g_assert_not_reached();
+}
+static inline void tcg_out_dupi_vec(TCGContext *s, TCGType type,
+                                    TCGReg dst, tcg_target_long arg)
+{
+    g_assert_not_reached();
+}
 static inline void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, unsigned vecl,
                                   unsigned vece, const TCGArg *args,
                                   const int *const_args)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 07/38] tcg: Manually expand INDEX_op_dup_vec
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (5 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 06/38] tcg: Promote tcg_out_{dup, dupi}_vec to backend interface Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 08/38] tcg: Add tcg_out_dupm_vec to the backend interface Richard Henderson
                   ` (33 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

This case is similar to INDEX_op_mov_* in that we need to do
different things depending on the current location of the source.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c |   9 ++--
 tcg/i386/tcg-target.inc.c    |   8 ++-
 tcg/tcg.c                    | 102 +++++++++++++++++++++++++++++++++++
 3 files changed, 109 insertions(+), 10 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 116ebd8c1a..b272822969 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -2104,10 +2104,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_mov_vec:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_movi_i64:
-    case INDEX_op_dupi_vec:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         g_assert_not_reached();
@@ -2204,9 +2202,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_not_vec:
         tcg_out_insn(s, 3617, NOT, is_q, 0, a0, a1);
         break;
-    case INDEX_op_dup_vec:
-        tcg_out_dup_vec(s, type, vece, a0, a1);
-        break;
     case INDEX_op_shli_vec:
         tcg_out_insn(s, 3614, SHL, is_q, a0, a1, a2 + (8 << vece));
         break;
@@ -2250,6 +2245,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
             }
         }
         break;
+
+    case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
+    case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
+    case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
     default:
         g_assert_not_reached();
     }
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 04e3d37b05..49691c4f56 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2601,10 +2601,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-    case INDEX_op_mov_vec:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_movi_i64:
-    case INDEX_op_dupi_vec:
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
     default:
         tcg_abort();
@@ -2793,9 +2791,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_st_vec:
         tcg_out_st(s, type, a0, a1, a2);
         break;
-    case INDEX_op_dup_vec:
-        tcg_out_dup_vec(s, type, vece, a0, a1);
-        break;
 
     case INDEX_op_x86_shufps_vec:
         insn = OPC_SHUFPS;
@@ -2837,6 +2832,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         tcg_out8(s, a2);
         break;
 
+    case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
+    case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
+    case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
     default:
         g_assert_not_reached();
     }
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 5ed9c7bee5..55498b63d7 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3410,6 +3410,105 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
     }
 }
 
+static void tcg_reg_alloc_dup(TCGContext *s, const TCGOp *op)
+{
+    const TCGLifeData arg_life = op->life;
+    TCGRegSet dup_out_regs, dup_in_regs;
+    TCGTemp *its, *ots;
+    TCGType itype, vtype;
+    unsigned vece;
+    bool ok;
+
+    ots = arg_temp(op->args[0]);
+    its = arg_temp(op->args[1]);
+
+    /* There should be no fixed vector registers.  */
+    tcg_debug_assert(!ots->fixed_reg);
+
+    itype = its->type;
+    vece = TCGOP_VECE(op);
+    vtype = TCGOP_VECL(op) + TCG_TYPE_V64;
+
+    if (its->val_type == TEMP_VAL_CONST) {
+        /* Propagate constant via movi -> dupi.  */
+        tcg_target_ulong val = its->val;
+        if (IS_DEAD_ARG(1)) {
+            temp_dead(s, its);
+        }
+        tcg_reg_alloc_do_movi(s, ots, val, arg_life, op->output_pref[0]);
+        return;
+    }
+
+    dup_out_regs = tcg_op_defs[INDEX_op_dup_vec].args_ct[0].u.regs;
+    dup_in_regs = tcg_op_defs[INDEX_op_dup_vec].args_ct[1].u.regs;
+
+    /* Allocate the output register now.  */
+    if (ots->val_type != TEMP_VAL_REG) {
+        TCGRegSet allocated_regs = s->reserved_regs;
+
+        if (!IS_DEAD_ARG(1) && its->val_type == TEMP_VAL_REG) {
+            /* Make sure to not spill the input register. */
+            tcg_regset_set_reg(allocated_regs, its->reg);
+        }
+        ots->reg = tcg_reg_alloc(s, dup_out_regs, allocated_regs,
+                                 op->output_pref[0], ots->indirect_base);
+        ots->val_type = TEMP_VAL_REG;
+        ots->mem_coherent = 0;
+        s->reg_to_temp[ots->reg] = ots;
+    }
+
+    switch (its->val_type) {
+    case TEMP_VAL_REG:
+        /*
+         * The dup constriaints must be broad, covering all possible VECE.
+         * However, tcg_op_dup_vec() gets to see the VECE and we allow it
+         * to fail, indicating that extra moves are required for that case.
+         */
+        if (tcg_regset_test_reg(dup_in_regs, its->reg)) {
+            if (tcg_out_dup_vec(s, vtype, vece, ots->reg, its->reg)) {
+                goto done;
+            }
+            /* Try again from memory or a vector input register.  */
+        }
+        if (!its->mem_coherent) {
+            /*
+             * The input register is not synced, and so an extra store
+             * would be required to use memory.  Attempt an integer-vector
+             * register move first.  We do not have a TCGRegSet for this.
+             */
+            if (tcg_out_mov(s, itype, ots->reg, its->reg)) {
+                break;
+            }
+            /* Sync the temp back to its slot and load from there.  */
+            temp_sync(s, its, s->reserved_regs, 0, 0);
+        }
+        /* fall through */
+
+    case TEMP_VAL_MEM:
+        /* TODO: dup from memory */
+        tcg_out_ld(s, itype, ots->reg, its->mem_base->reg, its->mem_offset);
+        break;
+
+    default:
+        g_assert_not_reached();
+    }
+
+    /* We now have a vector input register, so dup must succeed. */
+    ok = tcg_out_dup_vec(s, vtype, vece, ots->reg, ots->reg);
+    tcg_debug_assert(ok);
+
+ done:
+    if (IS_DEAD_ARG(1)) {
+        temp_dead(s, its);
+    }
+    if (NEED_SYNC_ARG(0)) {
+        temp_sync(s, ots, s->reserved_regs, 0, 0);
+    }
+    if (IS_DEAD_ARG(0)) {
+        temp_dead(s, ots);
+    }
+}
+
 static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
 {
     const TCGLifeData arg_life = op->life;
@@ -3978,6 +4077,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
         case INDEX_op_dupi_vec:
             tcg_reg_alloc_movi(s, op);
             break;
+        case INDEX_op_dup_vec:
+            tcg_reg_alloc_dup(s, op);
+            break;
         case INDEX_op_insn_start:
             if (num_insns >= 0) {
                 size_t off = tcg_current_code_size(s);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 08/38] tcg: Add tcg_out_dupm_vec to the backend interface
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (6 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 07/38] tcg: Manually expand INDEX_op_dup_vec Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 09/38] tcg/i386: Implement tcg_out_dupm_vec Richard Henderson
                   ` (32 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Currently stubbed out in all backends that support vectors.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c |  6 ++++++
 tcg/i386/tcg-target.inc.c    |  7 +++++++
 tcg/tcg.c                    | 19 ++++++++++++++++++-
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index b272822969..3f95930e88 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -822,6 +822,12 @@ static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
     return true;
 }
 
+static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
+                             TCGReg r, TCGReg base, intptr_t offset)
+{
+    return false;
+}
+
 static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
                          tcg_target_long value)
 {
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 49691c4f56..0be7d8f589 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -891,6 +891,13 @@ static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
     return true;
 }
 
+static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
+                             TCGReg r, TCGReg base, intptr_t offset)
+{
+    return false;
+}
+
+
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
                              TCGReg ret, tcg_target_long arg)
 {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 55498b63d7..1c34c08791 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -110,6 +110,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 #if TCG_TARGET_MAYBE_vec
 static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
                             TCGReg dst, TCGReg src);
+static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
+                             TCGReg dst, TCGReg base, intptr_t offset);
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
                              TCGReg dst, tcg_target_long arg);
 static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, unsigned vecl,
@@ -121,6 +123,11 @@ static inline bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
 {
     g_assert_not_reached();
 }
+static inline bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
+                                    TCGReg dst, TCGReg base, intptr_t offset)
+{
+    g_assert_not_reached();
+}
 static inline void tcg_out_dupi_vec(TCGContext *s, TCGType type,
                                     TCGReg dst, tcg_target_long arg)
 {
@@ -3416,6 +3423,7 @@ static void tcg_reg_alloc_dup(TCGContext *s, const TCGOp *op)
     TCGRegSet dup_out_regs, dup_in_regs;
     TCGTemp *its, *ots;
     TCGType itype, vtype;
+    intptr_t endian_fixup;
     unsigned vece;
     bool ok;
 
@@ -3485,7 +3493,16 @@ static void tcg_reg_alloc_dup(TCGContext *s, const TCGOp *op)
         /* fall through */
 
     case TEMP_VAL_MEM:
-        /* TODO: dup from memory */
+#ifdef HOST_WORDS_BIGENDIAN
+        endian_fixup = itype == TCG_TYPE_I32 ? 4 : 8;
+        endian_fixup -= 1 << vece;
+#else
+        endian_fixup = 0;
+#endif
+        if (tcg_out_dupm_vec(s, vtype, vece, ots->reg, its->mem_base->reg,
+                             its->mem_offset + endian_fixup)) {
+            goto done;
+        }
         tcg_out_ld(s, itype, ots->reg, its->mem_base->reg, its->mem_offset);
         break;
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 09/38] tcg/i386: Implement tcg_out_dupm_vec
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (7 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 08/38] tcg: Add tcg_out_dupm_vec to the backend interface Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 10/38] tcg/aarch64: " Richard Henderson
                   ` (31 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

At the same time, improve tcg_out_dupi_vec wrt broadcast
from the constant pool.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.inc.c | 57 +++++++++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 14 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 0be7d8f589..fcabc1bdf2 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -358,7 +358,6 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_MOVBE_MyGy  (0xf1 | P_EXT38)
 #define OPC_MOVD_VyEy   (0x6e | P_EXT | P_DATA16)
 #define OPC_MOVD_EyVy   (0x7e | P_EXT | P_DATA16)
-#define OPC_MOVDDUP     (0x12 | P_EXT | P_SIMDF2)
 #define OPC_MOVDQA_VxWx (0x6f | P_EXT | P_DATA16)
 #define OPC_MOVDQA_WxVx (0x7f | P_EXT | P_DATA16)
 #define OPC_MOVDQU_VxWx (0x6f | P_EXT | P_SIMDF3)
@@ -458,6 +457,10 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_UD2         (0x0b | P_EXT)
 #define OPC_VPBLENDD    (0x02 | P_EXT3A | P_DATA16)
 #define OPC_VPBLENDVB   (0x4c | P_EXT3A | P_DATA16)
+#define OPC_VPINSRB     (0x20 | P_EXT3A | P_DATA16)
+#define OPC_VPINSRW     (0xc4 | P_EXT | P_DATA16)
+#define OPC_VBROADCASTSS (0x18 | P_EXT38 | P_DATA16)
+#define OPC_VBROADCASTSD (0x19 | P_EXT38 | P_DATA16)
 #define OPC_VPBROADCASTB (0x78 | P_EXT38 | P_DATA16)
 #define OPC_VPBROADCASTW (0x79 | P_EXT38 | P_DATA16)
 #define OPC_VPBROADCASTD (0x58 | P_EXT38 | P_DATA16)
@@ -855,16 +858,17 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
     return true;
 }
 
+static const int avx2_dup_insn[4] = {
+    OPC_VPBROADCASTB, OPC_VPBROADCASTW,
+    OPC_VPBROADCASTD, OPC_VPBROADCASTQ,
+};
+
 static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
                             TCGReg r, TCGReg a)
 {
     if (have_avx2) {
-        static const int dup_insn[4] = {
-            OPC_VPBROADCASTB, OPC_VPBROADCASTW,
-            OPC_VPBROADCASTD, OPC_VPBROADCASTQ,
-        };
         int vex_l = (type == TCG_TYPE_V256 ? P_VEXL : 0);
-        tcg_out_vex_modrm(s, dup_insn[vece] + vex_l, r, 0, a);
+        tcg_out_vex_modrm(s, avx2_dup_insn[vece] + vex_l, r, 0, a);
     } else {
         switch (vece) {
         case MO_8:
@@ -894,10 +898,35 @@ static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
 static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
                              TCGReg r, TCGReg base, intptr_t offset)
 {
-    return false;
+    if (have_avx2) {
+        int vex_l = (type == TCG_TYPE_V256 ? P_VEXL : 0);
+        tcg_out_vex_modrm_offset(s, avx2_dup_insn[vece] + vex_l,
+                                 r, 0, base, offset);
+    } else {
+        switch (vece) {
+        case MO_64:
+            tcg_out_vex_modrm_offset(s, OPC_VBROADCASTSD, r, 0, base, offset);
+            break;
+        case MO_32:
+            tcg_out_vex_modrm_offset(s, OPC_VBROADCASTSS, r, 0, base, offset);
+            break;
+        case MO_16:
+            tcg_out_vex_modrm_offset(s, OPC_VPINSRW, r, r, base, offset);
+            tcg_out8(s, 0); /* imm8 */
+            tcg_out_dup_vec(s, type, vece, r, r);
+            break;
+        case MO_8:
+            tcg_out_vex_modrm_offset(s, OPC_VPINSRB, r, r, base, offset);
+            tcg_out8(s, 0); /* imm8 */
+            tcg_out_dup_vec(s, type, vece, r, r);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
+    return true;
 }
 
-
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
                              TCGReg ret, tcg_target_long arg)
 {
@@ -918,16 +947,16 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
         } else if (have_avx2) {
             tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTQ + vex_l, ret);
         } else {
-            tcg_out_vex_modrm_pool(s, OPC_MOVDDUP, ret);
+            tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSD, ret);
         }
         new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
-    } else if (have_avx2) {
-        tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTD + vex_l, ret);
-        new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0);
     } else {
-        tcg_out_vex_modrm_pool(s, OPC_MOVD_VyEy, ret);
+        if (have_avx2) {
+            tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSD + vex_l, ret);
+        } else {
+            tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSS, ret);
+        }
         new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0);
-        tcg_out_dup_vec(s, type, MO_32, ret, ret);
     }
 }
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 10/38] tcg/aarch64: Implement tcg_out_dupm_vec
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (8 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 09/38] tcg/i386: Implement tcg_out_dupm_vec Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 11/38] tcg: Add INDEX_op_dup_mem_vec Richard Henderson
                   ` (30 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 38 ++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 3f95930e88..1db4e22365 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -381,6 +381,9 @@ typedef enum {
     I3207_BLR       = 0xd63f0000,
     I3207_RET       = 0xd65f0000,
 
+    /* AdvSIMD load/store single structure.  */
+    I3303_LD1R      = 0x0d40c000,
+
     /* Load literal for loading the address at pc-relative offset */
     I3305_LDR       = 0x58000000,
     I3305_LDR_v64   = 0x5c000000,
@@ -414,6 +417,8 @@ typedef enum {
     I3312_LDRVQ     = 0x3c000000 | 3 << 22 | 0 << 30,
     I3312_STRVQ     = 0x3c000000 | 2 << 22 | 0 << 30,
 
+
+
     I3312_TO_I3310  = 0x00200800,
     I3312_TO_I3313  = 0x01000000,
 
@@ -566,7 +571,14 @@ static inline uint32_t tcg_in32(TCGContext *s)
 #define tcg_out_insn(S, FMT, OP, ...) \
     glue(tcg_out_insn_,FMT)(S, glue(glue(glue(I,FMT),_),OP), ## __VA_ARGS__)
 
-static void tcg_out_insn_3305(TCGContext *s, AArch64Insn insn, int imm19, TCGReg rt)
+static void tcg_out_insn_3303(TCGContext *s, AArch64Insn insn, bool q,
+                              TCGReg rt, TCGReg rn, unsigned size)
+{
+    tcg_out32(s, insn | (rt & 0x1f) | (rn << 5) | (size << 10) | (q << 30));
+}
+
+static void tcg_out_insn_3305(TCGContext *s, AArch64Insn insn,
+                              int imm19, TCGReg rt)
 {
     tcg_out32(s, insn | (imm19 & 0x7ffff) << 5 | rt);
 }
@@ -825,7 +837,29 @@ static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
 static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
                              TCGReg r, TCGReg base, intptr_t offset)
 {
-    return false;
+    if (offset != 0) {
+        AArch64Insn add_insn = I3401_ADDI;
+        TCGReg temp = TCG_REG_TMP;
+
+        if (offset < 0) {
+            add_insn = I3401_SUBI;
+            offset = -offset;
+        }
+        if (offset <= 0xfff) {
+            tcg_out_insn_3401(s, add_insn, 1, temp, base, offset);
+        } else if (offset <= 0xffffff) {
+            tcg_out_insn_3401(s, add_insn, 1, temp, base, offset & 0xfff000);
+            if (offset & 0xfff) {
+                tcg_out_insn_3401(s, add_insn, 1, temp, base, offset & 0xfff);
+            }
+        } else {
+            tcg_out_movi(s, TCG_TYPE_PTR, temp, offset);
+            tcg_out_insn(s, 3502, ADD, 1, temp, temp, base);
+        }
+        base = temp;
+    }
+    tcg_out_insn(s, 3303, LD1R, type == TCG_TYPE_V128, r, base, vece);
+    return true;
 }
 
 static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 11/38] tcg: Add INDEX_op_dup_mem_vec
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (9 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 10/38] tcg/aarch64: " Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 12/38] tcg: Add gvec expanders for variable shift Richard Henderson
                   ` (29 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Allow the backend to expand dup from memory directly, instead of
forcing the value into a temp first.  This is especially important
if integer/vector register moves do not exist.

Note that officially tcg_out_dupm_vec is allowed to fail.
If it did, we could fix this up relatively easily:

  VECE == 32/64:
    Load the value into a vector register, then dup.
    Both of these must work.

  VECE == 8/16:
    If the value happens to be at an offset such that an aligned
    load would place the desired value in the least significant
    end of the register, go ahead and load w/garbage in high bits.

    Load the value w/INDEX_op_ld{8,16}_i32.
    Attempt a move directly to vector reg, which may fail.
    Store the value into the backing store for OTS.
    Load the value into the vector reg w/TCG_TYPE_I32, which must work.
    Duplicate from the vector reg into itself, which must work.

All of which is well and good, except that all supported
hosts can support dupm for all vece, so all of the failure
paths would be dead code and untestable.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op.h                 |  1 +
 tcg/tcg-opc.h                |  1 +
 tcg/aarch64/tcg-target.inc.c |  4 ++
 tcg/i386/tcg-target.inc.c    |  4 ++
 tcg/tcg-op-gvec.c            | 88 +++++++++++++++++++-----------------
 tcg/tcg-op-vec.c             | 11 +++++
 tcg/tcg.c                    |  1 +
 7 files changed, 69 insertions(+), 41 deletions(-)

diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 1f1824c30a..9fff9864f6 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -954,6 +954,7 @@ void tcg_gen_atomic_umax_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 void tcg_gen_mov_vec(TCGv_vec, TCGv_vec);
 void tcg_gen_dup_i32_vec(unsigned vece, TCGv_vec, TCGv_i32);
 void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec, TCGv_i64);
+void tcg_gen_dup_mem_vec(unsigned vece, TCGv_vec, TCGv_ptr, tcg_target_long);
 void tcg_gen_dup8i_vec(TCGv_vec, uint32_t);
 void tcg_gen_dup16i_vec(TCGv_vec, uint32_t);
 void tcg_gen_dup32i_vec(TCGv_vec, uint32_t);
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 1bad6e4208..4bf71f261f 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -219,6 +219,7 @@ DEF(dup2_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_REG_BITS == 32))
 
 DEF(ld_vec, 1, 1, 1, IMPLVEC)
 DEF(st_vec, 0, 2, 1, IMPLVEC)
+DEF(dupm_vec, 1, 1, 1, IMPLVEC)
 
 DEF(add_vec, 1, 2, 0, IMPLVEC)
 DEF(sub_vec, 1, 2, 0, IMPLVEC)
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 1db4e22365..1c9f4b0cb3 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -2188,6 +2188,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_st_vec:
         tcg_out_st(s, type, a0, a1, a2);
         break;
+    case INDEX_op_dupm_vec:
+        tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
+        break;
     case INDEX_op_add_vec:
         tcg_out_insn(s, 3616, ADD, is_q, vece, a0, a1, a2);
         break;
@@ -2520,6 +2523,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         return &w_w;
     case INDEX_op_ld_vec:
     case INDEX_op_st_vec:
+    case INDEX_op_dupm_vec:
         return &w_r;
     case INDEX_op_dup_vec:
         return &w_wr;
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index fcabc1bdf2..4c42a2430d 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2827,6 +2827,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_st_vec:
         tcg_out_st(s, type, a0, a1, a2);
         break;
+    case INDEX_op_dupm_vec:
+        tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
+        break;
 
     case INDEX_op_x86_shufps_vec:
         insn = OPC_SHUFPS;
@@ -3113,6 +3116,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 
     case INDEX_op_ld_vec:
     case INDEX_op_st_vec:
+    case INDEX_op_dupm_vec:
         return &x_r;
 
     case INDEX_op_add_vec:
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 0996ef0812..f056018713 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -390,6 +390,40 @@ static TCGType choose_vector_type(TCGOpcode op, unsigned vece, uint32_t size,
     return 0;
 }
 
+static void do_dup_store(TCGType type, uint32_t dofs, uint32_t oprsz,
+                         uint32_t maxsz, TCGv_vec t_vec)
+{
+    uint32_t i = 0;
+
+    switch (type) {
+    case TCG_TYPE_V256:
+        /* Recall that ARM SVE allows vector sizes that are not a
+         * power of 2, but always a multiple of 16.  The intent is
+         * that e.g. size == 80 would be expanded with 2x32 + 1x16.
+         */
+        for (; i + 32 <= oprsz; i += 32) {
+            tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V256);
+        }
+        /* fallthru */
+    case TCG_TYPE_V128:
+        for (; i + 16 <= oprsz; i += 16) {
+            tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V128);
+        }
+        break;
+    case TCG_TYPE_V64:
+        for (; i < oprsz; i += 8) {
+            tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V64);
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    if (oprsz < maxsz) {
+        expand_clr(dofs + oprsz, maxsz - oprsz);
+    }
+}
+
 /* Set OPRSZ bytes at DOFS to replications of IN_32, IN_64 or IN_C.
  * Only one of IN_32 or IN_64 may be set;
  * IN_C is used if IN_32 and IN_64 are unset.
@@ -429,49 +463,11 @@ static void do_dup(unsigned vece, uint32_t dofs, uint32_t oprsz,
         } else if (in_64) {
             tcg_gen_dup_i64_vec(vece, t_vec, in_64);
         } else {
-            switch (vece) {
-            case MO_8:
-                tcg_gen_dup8i_vec(t_vec, in_c);
-                break;
-            case MO_16:
-                tcg_gen_dup16i_vec(t_vec, in_c);
-                break;
-            case MO_32:
-                tcg_gen_dup32i_vec(t_vec, in_c);
-                break;
-            default:
-                tcg_gen_dup64i_vec(t_vec, in_c);
-                break;
-            }
+            tcg_gen_dupi_vec(vece, t_vec, in_c);
         }
-
-        i = 0;
-        switch (type) {
-        case TCG_TYPE_V256:
-            /* Recall that ARM SVE allows vector sizes that are not a
-             * power of 2, but always a multiple of 16.  The intent is
-             * that e.g. size == 80 would be expanded with 2x32 + 1x16.
-             */
-            for (; i + 32 <= oprsz; i += 32) {
-                tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V256);
-            }
-            /* fallthru */
-        case TCG_TYPE_V128:
-            for (; i + 16 <= oprsz; i += 16) {
-                tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V128);
-            }
-            break;
-        case TCG_TYPE_V64:
-            for (; i < oprsz; i += 8) {
-                tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V64);
-            }
-            break;
-        default:
-            g_assert_not_reached();
-        }
-
+        do_dup_store(type, dofs, oprsz, maxsz, t_vec);
         tcg_temp_free_vec(t_vec);
-        goto done;
+        return;
     }
 
     /* Otherwise, inline with an integer type, unless "large".  */
@@ -1287,6 +1283,16 @@ void tcg_gen_gvec_dup_i64(unsigned vece, uint32_t dofs, uint32_t oprsz,
 void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
                           uint32_t oprsz, uint32_t maxsz)
 {
+    if (vece <= MO_64) {
+        TCGType type = choose_vector_type(0, vece, oprsz, 0);
+        if (type != 0) {
+            TCGv_vec t_vec = tcg_temp_new_vec(type);
+            tcg_gen_dup_mem_vec(vece, t_vec, cpu_env, aofs);
+            do_dup_store(type, dofs, oprsz, maxsz, t_vec);
+            tcg_temp_free_vec(t_vec);
+            return;
+        }
+    }
     if (vece <= MO_32) {
         TCGv_i32 in = tcg_temp_new_i32();
         switch (vece) {
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index cfb18682b1..ce7987b858 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -194,6 +194,17 @@ void tcg_gen_dup_i32_vec(unsigned vece, TCGv_vec r, TCGv_i32 a)
     vec_gen_2(INDEX_op_dup_vec, type, vece, ri, ai);
 }
 
+void tcg_gen_dup_mem_vec(unsigned vece, TCGv_vec r, TCGv_ptr b,
+                         tcg_target_long ofs)
+{
+    TCGArg ri = tcgv_vec_arg(r);
+    TCGArg bi = tcgv_ptr_arg(b);
+    TCGTemp *rt = arg_temp(ri);
+    TCGType type = rt->base_type;
+
+    vec_gen_3(INDEX_op_dupm_vec, type, vece, ri, bi, ofs);
+}
+
 static void vec_gen_ldst(TCGOpcode opc, TCGv_vec r, TCGv_ptr b, TCGArg o)
 {
     TCGArg ri = tcgv_vec_arg(r);
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 1c34c08791..0b0b228bb5 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1605,6 +1605,7 @@ bool tcg_op_supported(TCGOpcode op)
     case INDEX_op_mov_vec:
     case INDEX_op_dup_vec:
     case INDEX_op_dupi_vec:
+    case INDEX_op_dupm_vec:
     case INDEX_op_ld_vec:
     case INDEX_op_st_vec:
     case INDEX_op_add_vec:
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 12/38] tcg: Add gvec expanders for variable shift
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (10 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 11/38] tcg: Add INDEX_op_dup_mem_vec Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-23 19:04   ` David Hildenbrand
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 13/38] tcg/i386: Support vector variable shift opcodes Richard Henderson
                   ` (28 subsequent siblings)
  40 siblings, 1 reply; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/tcg-runtime.h      |  15 ++++
 tcg/tcg-op-gvec.h            |   7 ++
 tcg/tcg-op.h                 |   4 ++
 accel/tcg/tcg-runtime-gvec.c | 132 +++++++++++++++++++++++++++++++++++
 tcg/tcg-op-gvec.c            |  87 +++++++++++++++++++++++
 tcg/tcg-op-vec.c             |  15 ++++
 6 files changed, 260 insertions(+)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index dfe325625c..ed3ce5fd91 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -254,6 +254,21 @@ DEF_HELPER_FLAGS_3(gvec_sar16i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(gvec_sar32i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(gvec_sar64i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(gvec_shl8v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_shl16v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_shl32v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_shl64v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_shr8v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_shr16v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_shr32v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_shr64v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_sar8v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sar16v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sar32v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sar64v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(gvec_eq8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_eq16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_eq32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index 850da32ded..1cd18a959a 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -294,6 +294,13 @@ void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t shift, uint32_t oprsz, uint32_t maxsz);
 
+void tcg_gen_gvec_shlv(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_shrv(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_sarv(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
+
 void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs,
                       uint32_t aofs, uint32_t bofs,
                       uint32_t oprsz, uint32_t maxsz);
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 9fff9864f6..833c6330b5 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -986,6 +986,10 @@ void tcg_gen_shli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 void tcg_gen_shri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 void tcg_gen_sari_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 
+void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
+void tcg_gen_shrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
+void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
+
 void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, TCGv_vec r,
                      TCGv_vec a, TCGv_vec b);
 
diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
index e2c6f24262..7b88f5590c 100644
--- a/accel/tcg/tcg-runtime-gvec.c
+++ b/accel/tcg/tcg-runtime-gvec.c
@@ -725,6 +725,138 @@ void HELPER(gvec_sar64i)(void *d, void *a, uint32_t desc)
     clear_high(d, oprsz, desc);
 }
 
+void HELPER(gvec_shl8v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint8_t)) {
+        *(uint8_t *)(d + i) = *(uint8_t *)(a + i) << *(uint8_t *)(b + i);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_shl16v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint16_t)) {
+        *(uint16_t *)(d + i) = *(uint16_t *)(a + i) << *(uint16_t *)(b + i);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_shl32v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint32_t)) {
+        *(uint32_t *)(d + i) = *(uint32_t *)(a + i) << *(uint32_t *)(b + i);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_shl64v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
+        *(uint64_t *)(d + i) = *(uint64_t *)(a + i) << *(uint64_t *)(b + i);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_shr8v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint8_t)) {
+        *(uint8_t *)(d + i) = *(uint8_t *)(a + i) >> *(uint8_t *)(b + i);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_shr16v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint16_t)) {
+        *(uint16_t *)(d + i) = *(uint16_t *)(a + i) >> *(uint16_t *)(b + i);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_shr32v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint32_t)) {
+        *(uint32_t *)(d + i) = *(uint32_t *)(a + i) >> *(uint32_t *)(b + i);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_shr64v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
+        *(uint64_t *)(d + i) = *(uint64_t *)(a + i) >> *(uint64_t *)(b + i);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_sar8v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(vec8)) {
+        *(int8_t *)(d + i) = *(int8_t *)(a + i) >> *(int8_t *)(b + i);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_sar16v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(int16_t)) {
+        *(int16_t *)(d + i) = *(int16_t *)(a + i) >> *(int16_t *)(b + i);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_sar32v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(vec32)) {
+        *(int32_t *)(d + i) = *(int32_t *)(a + i) >> *(int32_t *)(b + i);
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_sar64v)(void *d, void *a, void *b, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(vec64)) {
+        *(int64_t *)(d + i) = *(int64_t *)(a + i) >> *(int64_t *)(b + i);
+    }
+    clear_high(d, oprsz, desc);
+}
+
 /* If vectors are enabled, the compiler fills in -1 for true.
    Otherwise, we must take care of this by hand.  */
 #ifdef CONFIG_VECTOR16
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index f056018713..5d28184045 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -2382,6 +2382,93 @@ void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
     }
 }
 
+void tcg_gen_gvec_shlv(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const GVecGen3 g[4] = {
+        { .fniv = tcg_gen_shlv_vec,
+          .fno = gen_helper_gvec_shl8v,
+          .opc = INDEX_op_shlv_vec,
+          .vece = MO_8 },
+        { .fniv = tcg_gen_shlv_vec,
+          .fno = gen_helper_gvec_shl16v,
+          .opc = INDEX_op_shlv_vec,
+          .vece = MO_16 },
+        { .fni4 = tcg_gen_shl_i32,
+          .fniv = tcg_gen_shlv_vec,
+          .fno = gen_helper_gvec_shl32v,
+          .opc = INDEX_op_shlv_vec,
+          .vece = MO_32 },
+        { .fni8 = tcg_gen_shl_i64,
+          .fniv = tcg_gen_shlv_vec,
+          .fno = gen_helper_gvec_shl64v,
+          .opc = INDEX_op_shlv_vec,
+          .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+          .vece = MO_64 },
+    };
+
+    tcg_debug_assert(vece <= MO_64);
+    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
+}
+
+void tcg_gen_gvec_shrv(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const GVecGen3 g[4] = {
+        { .fniv = tcg_gen_shrv_vec,
+          .fno = gen_helper_gvec_shr8v,
+          .opc = INDEX_op_shrv_vec,
+          .vece = MO_8 },
+        { .fniv = tcg_gen_shrv_vec,
+          .fno = gen_helper_gvec_shr16v,
+          .opc = INDEX_op_shrv_vec,
+          .vece = MO_16 },
+        { .fni4 = tcg_gen_shr_i32,
+          .fniv = tcg_gen_shrv_vec,
+          .fno = gen_helper_gvec_shr32v,
+          .opc = INDEX_op_shrv_vec,
+          .vece = MO_32 },
+        { .fni8 = tcg_gen_shr_i64,
+          .fniv = tcg_gen_shrv_vec,
+          .fno = gen_helper_gvec_shr64v,
+          .opc = INDEX_op_shrv_vec,
+          .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+          .vece = MO_64 },
+    };
+
+    tcg_debug_assert(vece <= MO_64);
+    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
+}
+
+void tcg_gen_gvec_sarv(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const GVecGen3 g[4] = {
+        { .fniv = tcg_gen_sarv_vec,
+          .fno = gen_helper_gvec_sar8v,
+          .opc = INDEX_op_sarv_vec,
+          .vece = MO_8 },
+        { .fniv = tcg_gen_sarv_vec,
+          .fno = gen_helper_gvec_sar16v,
+          .opc = INDEX_op_sarv_vec,
+          .vece = MO_16 },
+        { .fni4 = tcg_gen_sar_i32,
+          .fniv = tcg_gen_sarv_vec,
+          .fno = gen_helper_gvec_sar32v,
+          .opc = INDEX_op_sarv_vec,
+          .vece = MO_32 },
+        { .fni8 = tcg_gen_sar_i64,
+          .fniv = tcg_gen_sarv_vec,
+          .fno = gen_helper_gvec_sar64v,
+          .opc = INDEX_op_sarv_vec,
+          .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+          .vece = MO_64 },
+    };
+
+    tcg_debug_assert(vece <= MO_64);
+    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
+}
+
 /* Expand OPSZ bytes worth of three-operand operations using i32 elements.  */
 static void expand_cmp_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                            uint32_t oprsz, TCGCond cond)
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index ce7987b858..6601cb8a8f 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -481,3 +481,18 @@ void tcg_gen_umax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
     do_op3(vece, r, a, b, INDEX_op_umax_vec);
 }
+
+void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    do_op3(vece, r, a, b, INDEX_op_shlv_vec);
+}
+
+void tcg_gen_shrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    do_op3(vece, r, a, b, INDEX_op_shrv_vec);
+}
+
+void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
+{
+    do_op3(vece, r, a, b, INDEX_op_sarv_vec);
+}
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 13/38] tcg/i386: Support vector variable shift opcodes
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (11 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 12/38] tcg: Add gvec expanders for variable shift Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 14/38] tcg/aarch64: " Richard Henderson
                   ` (27 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.h     |  2 +-
 tcg/i386/tcg-target.inc.c | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 241bf19413..b240633455 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -184,7 +184,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_neg_vec          0
 #define TCG_TARGET_HAS_shi_vec          1
 #define TCG_TARGET_HAS_shs_vec          0
-#define TCG_TARGET_HAS_shv_vec          0
+#define TCG_TARGET_HAS_shv_vec          have_avx2
 #define TCG_TARGET_HAS_cmp_vec          1
 #define TCG_TARGET_HAS_mul_vec          1
 #define TCG_TARGET_HAS_sat_vec          1
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 4c42a2430d..04e609c7b2 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -467,6 +467,11 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_VPBROADCASTQ (0x59 | P_EXT38 | P_DATA16)
 #define OPC_VPERMQ      (0x00 | P_EXT3A | P_DATA16 | P_REXW)
 #define OPC_VPERM2I128  (0x46 | P_EXT3A | P_DATA16 | P_VEXL)
+#define OPC_VPSLLVD     (0x47 | P_EXT38 | P_DATA16)
+#define OPC_VPSLLVQ     (0x47 | P_EXT38 | P_DATA16 | P_REXW)
+#define OPC_VPSRAVD     (0x46 | P_EXT38 | P_DATA16)
+#define OPC_VPSRLVD     (0x45 | P_EXT38 | P_DATA16)
+#define OPC_VPSRLVQ     (0x45 | P_EXT38 | P_DATA16 | P_REXW)
 #define OPC_VZEROUPPER  (0x77 | P_EXT)
 #define OPC_XCHG_ax_r32	(0x90)
 
@@ -2705,6 +2710,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     static int const umax_insn[4] = {
         OPC_PMAXUB, OPC_PMAXUW, OPC_PMAXUD, OPC_UD2
     };
+    static int const shlv_insn[4] = {
+        /* TODO: AVX512 adds support for MO_16.  */
+        OPC_UD2, OPC_UD2, OPC_VPSLLVD, OPC_VPSLLVQ
+    };
+    static int const shrv_insn[4] = {
+        /* TODO: AVX512 adds support for MO_16.  */
+        OPC_UD2, OPC_UD2, OPC_VPSRLVD, OPC_VPSRLVQ
+    };
+    static int const sarv_insn[4] = {
+        /* TODO: AVX512 adds support for MO_16, MO_64.  */
+        OPC_UD2, OPC_UD2, OPC_VPSRAVD, OPC_UD2
+    };
 
     TCGType type = vecl + TCG_TYPE_V64;
     int insn, sub;
@@ -2757,6 +2774,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_umax_vec:
         insn = umax_insn[vece];
         goto gen_simd;
+    case INDEX_op_shlv_vec:
+        insn = shlv_insn[vece];
+        goto gen_simd;
+    case INDEX_op_shrv_vec:
+        insn = shrv_insn[vece];
+        goto gen_simd;
+    case INDEX_op_sarv_vec:
+        insn = sarv_insn[vece];
+        goto gen_simd;
     case INDEX_op_x86_punpckl_vec:
         insn = punpckl_insn[vece];
         goto gen_simd;
@@ -3134,6 +3160,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_umin_vec:
     case INDEX_op_smax_vec:
     case INDEX_op_umax_vec:
+    case INDEX_op_shlv_vec:
+    case INDEX_op_shrv_vec:
+    case INDEX_op_sarv_vec:
     case INDEX_op_cmp_vec:
     case INDEX_op_x86_shufps_vec:
     case INDEX_op_x86_blend_vec:
@@ -3191,6 +3220,12 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
         }
         return 1;
 
+    case INDEX_op_shlv_vec:
+    case INDEX_op_shrv_vec:
+        return have_avx2 && vece >= MO_32;
+    case INDEX_op_sarv_vec:
+        return have_avx2 && vece == MO_32;
+
     case INDEX_op_mul_vec:
         if (vece == MO_8) {
             /* We can expand the operation for MO_8.  */
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 14/38] tcg/aarch64: Support vector variable shift opcodes
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (12 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 13/38] tcg/i386: Support vector variable shift opcodes Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 15/38] tcg: Implement tcg_gen_gvec_3i() Richard Henderson
                   ` (26 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.h     |  2 +-
 tcg/aarch64/tcg-target.opc.h |  2 ++
 tcg/aarch64/tcg-target.inc.c | 42 ++++++++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index ce2bb1f90b..f5640a229b 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -134,7 +134,7 @@ typedef enum {
 #define TCG_TARGET_HAS_neg_vec          1
 #define TCG_TARGET_HAS_shi_vec          1
 #define TCG_TARGET_HAS_shs_vec          0
-#define TCG_TARGET_HAS_shv_vec          0
+#define TCG_TARGET_HAS_shv_vec          1
 #define TCG_TARGET_HAS_cmp_vec          1
 #define TCG_TARGET_HAS_mul_vec          1
 #define TCG_TARGET_HAS_sat_vec          1
diff --git a/tcg/aarch64/tcg-target.opc.h b/tcg/aarch64/tcg-target.opc.h
index 4816a6c3d4..59e1d3f7f7 100644
--- a/tcg/aarch64/tcg-target.opc.h
+++ b/tcg/aarch64/tcg-target.opc.h
@@ -1,3 +1,5 @@
 /* Target-specific opcodes for host vector expansion.  These will be
    emitted by tcg_expand_vec_op.  For those familiar with GCC internals,
    consider these to be UNSPEC with names.  */
+
+DEF(aa64_sshl_vec, 1, 2, 0, IMPLVEC)
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 1c9f4b0cb3..7d2a8213ec 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -538,12 +538,14 @@ typedef enum {
     I3616_CMEQ      = 0x2e208c00,
     I3616_SMAX      = 0x0e206400,
     I3616_SMIN      = 0x0e206c00,
+    I3616_SSHL      = 0x0e204400,
     I3616_SQADD     = 0x0e200c00,
     I3616_SQSUB     = 0x0e202c00,
     I3616_UMAX      = 0x2e206400,
     I3616_UMIN      = 0x2e206c00,
     I3616_UQADD     = 0x2e200c00,
     I3616_UQSUB     = 0x2e202c00,
+    I3616_USHL      = 0x2e204400,
 
     /* AdvSIMD two-reg misc.  */
     I3617_CMGT0     = 0x0e208800,
@@ -2254,6 +2256,12 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_sari_vec:
         tcg_out_insn(s, 3614, SSHR, is_q, a0, a1, (16 << vece) - a2);
         break;
+    case INDEX_op_shlv_vec:
+        tcg_out_insn(s, 3616, USHL, is_q, vece, a0, a1, a2);
+        break;
+    case INDEX_op_aa64_sshl_vec:
+        tcg_out_insn(s, 3616, SSHL, is_q, vece, a0, a1, a2);
+        break;
     case INDEX_op_cmp_vec:
         {
             TCGCond cond = args[3];
@@ -2321,7 +2329,11 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_smin_vec:
     case INDEX_op_umax_vec:
     case INDEX_op_umin_vec:
+    case INDEX_op_shlv_vec:
         return 1;
+    case INDEX_op_shrv_vec:
+    case INDEX_op_sarv_vec:
+        return -1;
     case INDEX_op_mul_vec:
         return vece < MO_64;
 
@@ -2333,6 +2345,32 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
                        TCGArg a0, ...)
 {
+    va_list va;
+    TCGv_vec v0, v1, v2, t1;
+
+    va_start(va, a0);
+    v0 = temp_tcgv_vec(arg_temp(a0));
+    v1 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
+    v2 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
+
+    switch (opc) {
+    case INDEX_op_shrv_vec:
+    case INDEX_op_sarv_vec:
+        /* Right shifts are negative left shifts for AArch64.  */
+        t1 = tcg_temp_new_vec(type);
+        tcg_gen_neg_vec(vece, t1, v2);
+        opc = (opc == INDEX_op_shrv_vec
+               ? INDEX_op_shlv_vec : INDEX_op_aa64_sshl_vec);
+        vec_gen_3(opc, type, vece, tcgv_vec_arg(v0),
+                  tcgv_vec_arg(v1), tcgv_vec_arg(t1));
+        tcg_temp_free_vec(t1);
+        break;
+
+    default:
+        g_assert_not_reached();
+    }
+
+    va_end(va);
 }
 
 static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
@@ -2514,6 +2552,10 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_smin_vec:
     case INDEX_op_umax_vec:
     case INDEX_op_umin_vec:
+    case INDEX_op_shlv_vec:
+    case INDEX_op_shrv_vec:
+    case INDEX_op_sarv_vec:
+    case INDEX_op_aa64_sshl_vec:
         return &w_w_w;
     case INDEX_op_not_vec:
     case INDEX_op_neg_vec:
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 15/38] tcg: Implement tcg_gen_gvec_3i()
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (13 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 14/38] tcg/aarch64: " Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 16/38] tcg: Specify optional vector requirements with a list Richard Henderson
                   ` (25 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

From: David Hildenbrand <david@redhat.com>

Let's add tcg_gen_gvec_3i(), similar to tcg_gen_gvec_2i(), however
without introducing "gen_helper_gvec_3i *fnoi", as it isn't needed
for now.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190416185301.25344-2-david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op-gvec.h |  24 ++++++++
 tcg/tcg-op-gvec.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 163 insertions(+)

diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index 1cd18a959a..aaeb6e5d8b 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -164,6 +164,27 @@ typedef struct {
     bool load_dest;
 } GVecGen3;
 
+typedef struct {
+    /*
+     * Expand inline as a 64-bit or 32-bit integer. Only one of these will be
+     * non-NULL.
+     */
+    void (*fni8)(TCGv_i64, TCGv_i64, TCGv_i64, int64_t);
+    void (*fni4)(TCGv_i32, TCGv_i32, TCGv_i32, int32_t);
+    /* Expand inline with a host vector type.  */
+    void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec, int64_t);
+    /* Expand out-of-line helper w/descriptor, data in descriptor.  */
+    gen_helper_gvec_3 *fno;
+    /* The opcode, if any, to which this corresponds.  */
+    TCGOpcode opc;
+    /* The vector element size, if applicable.  */
+    uint8_t vece;
+    /* Prefer i64 to v64.  */
+    bool prefer_i64;
+    /* Load dest as a 3rd source operand.  */
+    bool load_dest;
+} GVecGen3i;
+
 typedef struct {
     /* Expand inline as a 64-bit or 32-bit integer.
        Only one of these will be non-NULL.  */
@@ -193,6 +214,9 @@ void tcg_gen_gvec_2s(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
                      uint32_t maxsz, TCGv_i64 c, const GVecGen2s *);
 void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                     uint32_t oprsz, uint32_t maxsz, const GVecGen3 *);
+void tcg_gen_gvec_3i(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                     uint32_t oprsz, uint32_t maxsz, int64_t c,
+                     const GVecGen3i *);
 void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs,
                     uint32_t oprsz, uint32_t maxsz, const GVecGen4 *);
 
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 5d28184045..3eb9126f50 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -659,6 +659,29 @@ static void expand_3_i32(uint32_t dofs, uint32_t aofs,
     tcg_temp_free_i32(t0);
 }
 
+static void expand_3i_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                          uint32_t oprsz, int32_t c, bool load_dest,
+                          void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32, int32_t))
+{
+    TCGv_i32 t0 = tcg_temp_new_i32();
+    TCGv_i32 t1 = tcg_temp_new_i32();
+    TCGv_i32 t2 = tcg_temp_new_i32();
+    uint32_t i;
+
+    for (i = 0; i < oprsz; i += 4) {
+        tcg_gen_ld_i32(t0, cpu_env, aofs + i);
+        tcg_gen_ld_i32(t1, cpu_env, bofs + i);
+        if (load_dest) {
+            tcg_gen_ld_i32(t2, cpu_env, dofs + i);
+        }
+        fni(t2, t0, t1, c);
+        tcg_gen_st_i32(t2, cpu_env, dofs + i);
+    }
+    tcg_temp_free_i32(t0);
+    tcg_temp_free_i32(t1);
+    tcg_temp_free_i32(t2);
+}
+
 /* Expand OPSZ bytes worth of three-operand operations using i32 elements.  */
 static void expand_4_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                          uint32_t cofs, uint32_t oprsz, bool write_aofs,
@@ -766,6 +789,29 @@ static void expand_3_i64(uint32_t dofs, uint32_t aofs,
     tcg_temp_free_i64(t0);
 }
 
+static void expand_3i_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                          uint32_t oprsz, int64_t c, bool load_dest,
+                          void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64, int64_t))
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    uint32_t i;
+
+    for (i = 0; i < oprsz; i += 8) {
+        tcg_gen_ld_i64(t0, cpu_env, aofs + i);
+        tcg_gen_ld_i64(t1, cpu_env, bofs + i);
+        if (load_dest) {
+            tcg_gen_ld_i64(t2, cpu_env, dofs + i);
+        }
+        fni(t2, t0, t1, c);
+        tcg_gen_st_i64(t2, cpu_env, dofs + i);
+    }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
 /* Expand OPSZ bytes worth of three-operand operations using i64 elements.  */
 static void expand_4_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                          uint32_t cofs, uint32_t oprsz, bool write_aofs,
@@ -879,6 +925,35 @@ static void expand_3_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
     tcg_temp_free_vec(t0);
 }
 
+/*
+ * Expand OPSZ bytes worth of three-vector operands and an immediate operand
+ * using host vectors.
+ */
+static void expand_3i_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
+                          uint32_t bofs, uint32_t oprsz, uint32_t tysz,
+                          TCGType type, int64_t c, bool load_dest,
+                          void (*fni)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec,
+                                      int64_t))
+{
+    TCGv_vec t0 = tcg_temp_new_vec(type);
+    TCGv_vec t1 = tcg_temp_new_vec(type);
+    TCGv_vec t2 = tcg_temp_new_vec(type);
+    uint32_t i;
+
+    for (i = 0; i < oprsz; i += tysz) {
+        tcg_gen_ld_vec(t0, cpu_env, aofs + i);
+        tcg_gen_ld_vec(t1, cpu_env, bofs + i);
+        if (load_dest) {
+            tcg_gen_ld_vec(t2, cpu_env, dofs + i);
+        }
+        fni(vece, t2, t0, t1, c);
+        tcg_gen_st_vec(t2, cpu_env, dofs + i);
+    }
+    tcg_temp_free_vec(t0);
+    tcg_temp_free_vec(t1);
+    tcg_temp_free_vec(t2);
+}
+
 /* Expand OPSZ bytes worth of four-operand operations using host vectors.  */
 static void expand_4_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
                          uint32_t bofs, uint32_t cofs, uint32_t oprsz,
@@ -1170,6 +1245,70 @@ void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     }
 }
 
+/* Expand a vector operation with three vectors and an immediate.  */
+void tcg_gen_gvec_3i(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                     uint32_t oprsz, uint32_t maxsz, int64_t c,
+                     const GVecGen3i *g)
+{
+    TCGType type;
+    uint32_t some;
+
+    check_size_align(oprsz, maxsz, dofs | aofs | bofs);
+    check_overlap_3(dofs, aofs, bofs, maxsz);
+
+    type = 0;
+    if (g->fniv) {
+        type = choose_vector_type(g->opc, g->vece, oprsz, g->prefer_i64);
+    }
+    switch (type) {
+    case TCG_TYPE_V256:
+        /*
+         * Recall that ARM SVE allows vector sizes that are not a
+         * power of 2, but always a multiple of 16.  The intent is
+         * that e.g. size == 80 would be expanded with 2x32 + 1x16.
+         */
+        some = QEMU_ALIGN_DOWN(oprsz, 32);
+        expand_3i_vec(g->vece, dofs, aofs, bofs, some, 32, TCG_TYPE_V256,
+                      c, g->load_dest, g->fniv);
+        if (some == oprsz) {
+            break;
+        }
+        dofs += some;
+        aofs += some;
+        bofs += some;
+        oprsz -= some;
+        maxsz -= some;
+        /* fallthru */
+    case TCG_TYPE_V128:
+        expand_3i_vec(g->vece, dofs, aofs, bofs, oprsz, 16, TCG_TYPE_V128,
+                      c, g->load_dest, g->fniv);
+        break;
+    case TCG_TYPE_V64:
+        expand_3i_vec(g->vece, dofs, aofs, bofs, oprsz, 8, TCG_TYPE_V64,
+                      c, g->load_dest, g->fniv);
+        break;
+
+    case 0:
+        if (g->fni8 && check_size_impl(oprsz, 8)) {
+            expand_3i_i64(dofs, aofs, bofs, oprsz, c, g->load_dest, g->fni8);
+        } else if (g->fni4 && check_size_impl(oprsz, 4)) {
+            expand_3i_i32(dofs, aofs, bofs, oprsz, c, g->load_dest, g->fni4);
+        } else {
+            assert(g->fno != NULL);
+            tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, c, g->fno);
+            return;
+        }
+        break;
+
+    default:
+        g_assert_not_reached();
+    }
+
+    if (oprsz < maxsz) {
+        expand_clr(dofs + oprsz, maxsz - oprsz);
+    }
+}
+
 /* Expand a vector four-operand operation.  */
 void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs,
                     uint32_t oprsz, uint32_t maxsz, const GVecGen4 *g)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 16/38] tcg: Specify optional vector requirements with a list
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (14 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 15/38] tcg: Implement tcg_gen_gvec_3i() Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 17/38] tcg: Add gvec expanders for vector shift by scalar Richard Henderson
                   ` (24 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Replace the single opcode in .opc with a null-terminated
array in .opt_opc.  We still require that all opcodes be
used with the same .vece.

Validate the contents of this list with CONFIG_DEBUG_TCG.
All tcg_gen_*_vec functions will check any list active
during .fniv expansion.  Swap the active list in and out
as we expand other opcodes, or take control away from the
front-end function.

Convert all existing vector aware front ends.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op-gvec.h                   |  24 +-
 tcg/tcg.h                           |  18 ++
 target/arm/translate-sve.c          |   9 +-
 target/arm/translate.c              | 127 +++++++----
 target/ppc/translate/vmx-impl.inc.c |   7 +-
 tcg/tcg-op-gvec.c                   | 341 ++++++++++++++++++----------
 tcg/tcg-op-vec.c                    |  18 ++
 7 files changed, 365 insertions(+), 179 deletions(-)

diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index aaeb6e5d8b..a0e0902f6c 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -91,8 +91,8 @@ typedef struct {
     void (*fniv)(unsigned, TCGv_vec, TCGv_vec);
     /* Expand out-of-line helper w/descriptor.  */
     gen_helper_gvec_2 *fno;
-    /* The opcode, if any, to which this corresponds.  */
-    TCGOpcode opc;
+    /* The optional opcodes, if any, utilized by .fniv.  */
+    const TCGOpcode *opt_opc;
     /* The data argument to the out-of-line helper.  */
     int32_t data;
     /* The vector element size, if applicable.  */
@@ -112,8 +112,8 @@ typedef struct {
     gen_helper_gvec_2 *fno;
     /* Expand out-of-line helper w/descriptor, data as argument.  */
     gen_helper_gvec_2i *fnoi;
-    /* The opcode, if any, to which this corresponds.  */
-    TCGOpcode opc;
+    /* The optional opcodes, if any, utilized by .fniv.  */
+    const TCGOpcode *opt_opc;
     /* The vector element size, if applicable.  */
     uint8_t vece;
     /* Prefer i64 to v64.  */
@@ -131,8 +131,8 @@ typedef struct {
     void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec);
     /* Expand out-of-line helper w/descriptor.  */
     gen_helper_gvec_2i *fno;
-    /* The opcode, if any, to which this corresponds.  */
-    TCGOpcode opc;
+    /* The optional opcodes, if any, utilized by .fniv.  */
+    const TCGOpcode *opt_opc;
     /* The data argument to the out-of-line helper.  */
     uint32_t data;
     /* The vector element size, if applicable.  */
@@ -152,8 +152,8 @@ typedef struct {
     void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec);
     /* Expand out-of-line helper w/descriptor.  */
     gen_helper_gvec_3 *fno;
-    /* The opcode, if any, to which this corresponds.  */
-    TCGOpcode opc;
+    /* The optional opcodes, if any, utilized by .fniv.  */
+    const TCGOpcode *opt_opc;
     /* The data argument to the out-of-line helper.  */
     int32_t data;
     /* The vector element size, if applicable.  */
@@ -175,8 +175,8 @@ typedef struct {
     void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec, int64_t);
     /* Expand out-of-line helper w/descriptor, data in descriptor.  */
     gen_helper_gvec_3 *fno;
-    /* The opcode, if any, to which this corresponds.  */
-    TCGOpcode opc;
+    /* The optional opcodes, if any, utilized by .fniv.  */
+    const TCGOpcode *opt_opc;
     /* The vector element size, if applicable.  */
     uint8_t vece;
     /* Prefer i64 to v64.  */
@@ -194,8 +194,8 @@ typedef struct {
     void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec, TCGv_vec);
     /* Expand out-of-line helper w/descriptor.  */
     gen_helper_gvec_4 *fno;
-    /* The opcode, if any, to which this corresponds.  */
-    TCGOpcode opc;
+    /* The optional opcodes, if any, utilized by .fniv.  */
+    const TCGOpcode *opt_opc;
     /* The data argument to the out-of-line helper.  */
     int32_t data;
     /* The vector element size, if applicable.  */
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 7b1c15b40b..48d4d2e03e 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -694,6 +694,7 @@ struct TCGContext {
     QSIMPLEQ_HEAD(, TCGLabel) labels;
     int temps_in_use;
     int goto_tb_issue_mask;
+    const TCGOpcode *vecop_list;
 #endif
 
     /* Code generation.  Note that we specifically do not use tcg_insn_unit
@@ -1493,4 +1494,21 @@ void helper_atomic_sto_le_mmu(CPUArchState *env, target_ulong addr, Int128 val,
 void helper_atomic_sto_be_mmu(CPUArchState *env, target_ulong addr, Int128 val,
                               TCGMemOpIdx oi, uintptr_t retaddr);
 
+#ifdef CONFIG_DEBUG_TCG
+void tcg_assert_listed_vecop(TCGOpcode);
+#else
+static inline void tcg_assert_listed_vecop(TCGOpcode op) { }
+#endif
+
+static inline const TCGOpcode *tcg_swap_vecop_list(const TCGOpcode *n)
+{
+#ifdef CONFIG_DEBUG_TCG
+    const TCGOpcode *o = tcg_ctx->vecop_list;
+    tcg_ctx->vecop_list = n;
+    return o;
+#else
+    return NULL;
+#endif
+}
+
 #endif /* TCG_H */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 245cd82621..0682c0d32b 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3302,29 +3302,30 @@ static bool trans_SUB_zzi(DisasContext *s, arg_rri_esz *a)
 
 static bool trans_SUBR_zzi(DisasContext *s, arg_rri_esz *a)
 {
+    static const TCGOpcode vecop_list[] = { INDEX_op_sub_vec, 0 };
     static const GVecGen2s op[4] = {
         { .fni8 = tcg_gen_vec_sub8_i64,
           .fniv = tcg_gen_sub_vec,
           .fno = gen_helper_sve_subri_b,
-          .opc = INDEX_op_sub_vec,
+          .opt_opc = vecop_list,
           .vece = MO_8,
           .scalar_first = true },
         { .fni8 = tcg_gen_vec_sub16_i64,
           .fniv = tcg_gen_sub_vec,
           .fno = gen_helper_sve_subri_h,
-          .opc = INDEX_op_sub_vec,
+          .opt_opc = vecop_list,
           .vece = MO_16,
           .scalar_first = true },
         { .fni4 = tcg_gen_sub_i32,
           .fniv = tcg_gen_sub_vec,
           .fno = gen_helper_sve_subri_s,
-          .opc = INDEX_op_sub_vec,
+          .opt_opc = vecop_list,
           .vece = MO_32,
           .scalar_first = true },
         { .fni8 = tcg_gen_sub_i64,
           .fniv = tcg_gen_sub_vec,
           .fno = gen_helper_sve_subri_d,
-          .opc = INDEX_op_sub_vec,
+          .opt_opc = vecop_list,
           .prefer_i64 = TCG_TARGET_REG_BITS == 64,
           .vece = MO_64,
           .scalar_first = true }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 13e2dc6562..83a008e945 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5772,27 +5772,31 @@ static void gen_ssra_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh)
     tcg_gen_add_vec(vece, d, d, a);
 }
 
+static const TCGOpcode vecop_list_ssra[] = {
+    INDEX_op_sari_vec, INDEX_op_add_vec, 0
+};
+
 const GVecGen2i ssra_op[4] = {
     { .fni8 = gen_ssra8_i64,
       .fniv = gen_ssra_vec,
       .load_dest = true,
-      .opc = INDEX_op_sari_vec,
+      .opt_opc = vecop_list_ssra,
       .vece = MO_8 },
     { .fni8 = gen_ssra16_i64,
       .fniv = gen_ssra_vec,
       .load_dest = true,
-      .opc = INDEX_op_sari_vec,
+      .opt_opc = vecop_list_ssra,
       .vece = MO_16 },
     { .fni4 = gen_ssra32_i32,
       .fniv = gen_ssra_vec,
       .load_dest = true,
-      .opc = INDEX_op_sari_vec,
+      .opt_opc = vecop_list_ssra,
       .vece = MO_32 },
     { .fni8 = gen_ssra64_i64,
       .fniv = gen_ssra_vec,
       .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+      .opt_opc = vecop_list_ssra,
       .load_dest = true,
-      .opc = INDEX_op_sari_vec,
       .vece = MO_64 },
 };
 
@@ -5826,27 +5830,31 @@ static void gen_usra_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh)
     tcg_gen_add_vec(vece, d, d, a);
 }
 
+static const TCGOpcode vecop_list_usra[] = {
+    INDEX_op_shri_vec, INDEX_op_add_vec, 0
+};
+
 const GVecGen2i usra_op[4] = {
     { .fni8 = gen_usra8_i64,
       .fniv = gen_usra_vec,
       .load_dest = true,
-      .opc = INDEX_op_shri_vec,
+      .opt_opc = vecop_list_usra,
       .vece = MO_8, },
     { .fni8 = gen_usra16_i64,
       .fniv = gen_usra_vec,
       .load_dest = true,
-      .opc = INDEX_op_shri_vec,
+      .opt_opc = vecop_list_usra,
       .vece = MO_16, },
     { .fni4 = gen_usra32_i32,
       .fniv = gen_usra_vec,
       .load_dest = true,
-      .opc = INDEX_op_shri_vec,
+      .opt_opc = vecop_list_usra,
       .vece = MO_32, },
     { .fni8 = gen_usra64_i64,
       .fniv = gen_usra_vec,
       .prefer_i64 = TCG_TARGET_REG_BITS == 64,
       .load_dest = true,
-      .opc = INDEX_op_shri_vec,
+      .opt_opc = vecop_list_usra,
       .vece = MO_64, },
 };
 
@@ -5904,27 +5912,29 @@ static void gen_shr_ins_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh)
     }
 }
 
+static const TCGOpcode vecop_list_sri[] = { INDEX_op_shri_vec, 0 };
+
 const GVecGen2i sri_op[4] = {
     { .fni8 = gen_shr8_ins_i64,
       .fniv = gen_shr_ins_vec,
       .load_dest = true,
-      .opc = INDEX_op_shri_vec,
+      .opt_opc = vecop_list_sri,
       .vece = MO_8 },
     { .fni8 = gen_shr16_ins_i64,
       .fniv = gen_shr_ins_vec,
       .load_dest = true,
-      .opc = INDEX_op_shri_vec,
+      .opt_opc = vecop_list_sri,
       .vece = MO_16 },
     { .fni4 = gen_shr32_ins_i32,
       .fniv = gen_shr_ins_vec,
       .load_dest = true,
-      .opc = INDEX_op_shri_vec,
+      .opt_opc = vecop_list_sri,
       .vece = MO_32 },
     { .fni8 = gen_shr64_ins_i64,
       .fniv = gen_shr_ins_vec,
       .prefer_i64 = TCG_TARGET_REG_BITS == 64,
       .load_dest = true,
-      .opc = INDEX_op_shri_vec,
+      .opt_opc = vecop_list_sri,
       .vece = MO_64 },
 };
 
@@ -5980,27 +5990,29 @@ static void gen_shl_ins_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh)
     }
 }
 
+static const TCGOpcode vecop_list_sli[] = { INDEX_op_shli_vec, 0 };
+
 const GVecGen2i sli_op[4] = {
     { .fni8 = gen_shl8_ins_i64,
       .fniv = gen_shl_ins_vec,
       .load_dest = true,
-      .opc = INDEX_op_shli_vec,
+      .opt_opc = vecop_list_sli,
       .vece = MO_8 },
     { .fni8 = gen_shl16_ins_i64,
       .fniv = gen_shl_ins_vec,
       .load_dest = true,
-      .opc = INDEX_op_shli_vec,
+      .opt_opc = vecop_list_sli,
       .vece = MO_16 },
     { .fni4 = gen_shl32_ins_i32,
       .fniv = gen_shl_ins_vec,
       .load_dest = true,
-      .opc = INDEX_op_shli_vec,
+      .opt_opc = vecop_list_sli,
       .vece = MO_32 },
     { .fni8 = gen_shl64_ins_i64,
       .fniv = gen_shl_ins_vec,
       .prefer_i64 = TCG_TARGET_REG_BITS == 64,
       .load_dest = true,
-      .opc = INDEX_op_shli_vec,
+      .opt_opc = vecop_list_sli,
       .vece = MO_64 },
 };
 
@@ -6067,51 +6079,60 @@ static void gen_mls_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
 /* Note that while NEON does not support VMLA and VMLS as 64-bit ops,
  * these tables are shared with AArch64 which does support them.
  */
+
+static const TCGOpcode vecop_list_mla[] = {
+    INDEX_op_mul_vec, INDEX_op_add_vec, 0
+};
+
+static const TCGOpcode vecop_list_mls[] = {
+    INDEX_op_mul_vec, INDEX_op_sub_vec, 0
+};
+
 const GVecGen3 mla_op[4] = {
     { .fni4 = gen_mla8_i32,
       .fniv = gen_mla_vec,
-      .opc = INDEX_op_mul_vec,
       .load_dest = true,
+      .opt_opc = vecop_list_mla,
       .vece = MO_8 },
     { .fni4 = gen_mla16_i32,
       .fniv = gen_mla_vec,
-      .opc = INDEX_op_mul_vec,
       .load_dest = true,
+      .opt_opc = vecop_list_mla,
       .vece = MO_16 },
     { .fni4 = gen_mla32_i32,
       .fniv = gen_mla_vec,
-      .opc = INDEX_op_mul_vec,
       .load_dest = true,
+      .opt_opc = vecop_list_mla,
       .vece = MO_32 },
     { .fni8 = gen_mla64_i64,
       .fniv = gen_mla_vec,
-      .opc = INDEX_op_mul_vec,
       .prefer_i64 = TCG_TARGET_REG_BITS == 64,
       .load_dest = true,
+      .opt_opc = vecop_list_mla,
       .vece = MO_64 },
 };
 
 const GVecGen3 mls_op[4] = {
     { .fni4 = gen_mls8_i32,
       .fniv = gen_mls_vec,
-      .opc = INDEX_op_mul_vec,
       .load_dest = true,
+      .opt_opc = vecop_list_mls,
       .vece = MO_8 },
     { .fni4 = gen_mls16_i32,
       .fniv = gen_mls_vec,
-      .opc = INDEX_op_mul_vec,
       .load_dest = true,
+      .opt_opc = vecop_list_mls,
       .vece = MO_16 },
     { .fni4 = gen_mls32_i32,
       .fniv = gen_mls_vec,
-      .opc = INDEX_op_mul_vec,
       .load_dest = true,
+      .opt_opc = vecop_list_mls,
       .vece = MO_32 },
     { .fni8 = gen_mls64_i64,
       .fniv = gen_mls_vec,
-      .opc = INDEX_op_mul_vec,
       .prefer_i64 = TCG_TARGET_REG_BITS == 64,
       .load_dest = true,
+      .opt_opc = vecop_list_mls,
       .vece = MO_64 },
 };
 
@@ -6137,23 +6158,25 @@ static void gen_cmtst_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
     tcg_gen_cmp_vec(TCG_COND_NE, vece, d, d, a);
 }
 
+static const TCGOpcode vecop_list_cmtst[] = { INDEX_op_cmp_vec, 0 };
+
 const GVecGen3 cmtst_op[4] = {
     { .fni4 = gen_helper_neon_tst_u8,
       .fniv = gen_cmtst_vec,
-      .opc = INDEX_op_cmp_vec,
+      .opt_opc = vecop_list_cmtst,
       .vece = MO_8 },
     { .fni4 = gen_helper_neon_tst_u16,
       .fniv = gen_cmtst_vec,
-      .opc = INDEX_op_cmp_vec,
+      .opt_opc = vecop_list_cmtst,
       .vece = MO_16 },
     { .fni4 = gen_cmtst_i32,
       .fniv = gen_cmtst_vec,
-      .opc = INDEX_op_cmp_vec,
+      .opt_opc = vecop_list_cmtst,
       .vece = MO_32 },
     { .fni8 = gen_cmtst_i64,
       .fniv = gen_cmtst_vec,
       .prefer_i64 = TCG_TARGET_REG_BITS == 64,
-      .opc = INDEX_op_cmp_vec,
+      .opt_opc = vecop_list_cmtst,
       .vece = MO_64 },
 };
 
@@ -6168,26 +6191,30 @@ static void gen_uqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
     tcg_temp_free_vec(x);
 }
 
+static const TCGOpcode vecop_list_uqadd[] = {
+    INDEX_op_usadd_vec, INDEX_op_cmp_vec, INDEX_op_add_vec, 0
+};
+
 const GVecGen4 uqadd_op[4] = {
     { .fniv = gen_uqadd_vec,
       .fno = gen_helper_gvec_uqadd_b,
-      .opc = INDEX_op_usadd_vec,
       .write_aofs = true,
+      .opt_opc = vecop_list_uqadd,
       .vece = MO_8 },
     { .fniv = gen_uqadd_vec,
       .fno = gen_helper_gvec_uqadd_h,
-      .opc = INDEX_op_usadd_vec,
       .write_aofs = true,
+      .opt_opc = vecop_list_uqadd,
       .vece = MO_16 },
     { .fniv = gen_uqadd_vec,
       .fno = gen_helper_gvec_uqadd_s,
-      .opc = INDEX_op_usadd_vec,
       .write_aofs = true,
+      .opt_opc = vecop_list_uqadd,
       .vece = MO_32 },
     { .fniv = gen_uqadd_vec,
       .fno = gen_helper_gvec_uqadd_d,
-      .opc = INDEX_op_usadd_vec,
       .write_aofs = true,
+      .opt_opc = vecop_list_uqadd,
       .vece = MO_64 },
 };
 
@@ -6202,25 +6229,29 @@ static void gen_sqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
     tcg_temp_free_vec(x);
 }
 
+static const TCGOpcode vecop_list_sqadd[] = {
+    INDEX_op_ssadd_vec, INDEX_op_cmp_vec, INDEX_op_add_vec, 0
+};
+
 const GVecGen4 sqadd_op[4] = {
     { .fniv = gen_sqadd_vec,
       .fno = gen_helper_gvec_sqadd_b,
-      .opc = INDEX_op_ssadd_vec,
+      .opt_opc = vecop_list_sqadd,
       .write_aofs = true,
       .vece = MO_8 },
     { .fniv = gen_sqadd_vec,
       .fno = gen_helper_gvec_sqadd_h,
-      .opc = INDEX_op_ssadd_vec,
+      .opt_opc = vecop_list_sqadd,
       .write_aofs = true,
       .vece = MO_16 },
     { .fniv = gen_sqadd_vec,
       .fno = gen_helper_gvec_sqadd_s,
-      .opc = INDEX_op_ssadd_vec,
+      .opt_opc = vecop_list_sqadd,
       .write_aofs = true,
       .vece = MO_32 },
     { .fniv = gen_sqadd_vec,
       .fno = gen_helper_gvec_sqadd_d,
-      .opc = INDEX_op_ssadd_vec,
+      .opt_opc = vecop_list_sqadd,
       .write_aofs = true,
       .vece = MO_64 },
 };
@@ -6236,25 +6267,29 @@ static void gen_uqsub_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
     tcg_temp_free_vec(x);
 }
 
+static const TCGOpcode vecop_list_uqsub[] = {
+    INDEX_op_ussub_vec, INDEX_op_cmp_vec, INDEX_op_sub_vec, 0
+};
+
 const GVecGen4 uqsub_op[4] = {
     { .fniv = gen_uqsub_vec,
       .fno = gen_helper_gvec_uqsub_b,
-      .opc = INDEX_op_ussub_vec,
+      .opt_opc = vecop_list_uqsub,
       .write_aofs = true,
       .vece = MO_8 },
     { .fniv = gen_uqsub_vec,
       .fno = gen_helper_gvec_uqsub_h,
-      .opc = INDEX_op_ussub_vec,
+      .opt_opc = vecop_list_uqsub,
       .write_aofs = true,
       .vece = MO_16 },
     { .fniv = gen_uqsub_vec,
       .fno = gen_helper_gvec_uqsub_s,
-      .opc = INDEX_op_ussub_vec,
+      .opt_opc = vecop_list_uqsub,
       .write_aofs = true,
       .vece = MO_32 },
     { .fniv = gen_uqsub_vec,
       .fno = gen_helper_gvec_uqsub_d,
-      .opc = INDEX_op_ussub_vec,
+      .opt_opc = vecop_list_uqsub,
       .write_aofs = true,
       .vece = MO_64 },
 };
@@ -6270,25 +6305,29 @@ static void gen_sqsub_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
     tcg_temp_free_vec(x);
 }
 
+static const TCGOpcode vecop_list_sqsub[] = {
+    INDEX_op_sssub_vec, INDEX_op_cmp_vec, INDEX_op_sub_vec, 0
+};
+
 const GVecGen4 sqsub_op[4] = {
     { .fniv = gen_sqsub_vec,
       .fno = gen_helper_gvec_sqsub_b,
-      .opc = INDEX_op_sssub_vec,
+      .opt_opc = vecop_list_sqsub,
       .write_aofs = true,
       .vece = MO_8 },
     { .fniv = gen_sqsub_vec,
       .fno = gen_helper_gvec_sqsub_h,
-      .opc = INDEX_op_sssub_vec,
+      .opt_opc = vecop_list_sqsub,
       .write_aofs = true,
       .vece = MO_16 },
     { .fniv = gen_sqsub_vec,
       .fno = gen_helper_gvec_sqsub_s,
-      .opc = INDEX_op_sssub_vec,
+      .opt_opc = vecop_list_sqsub,
       .write_aofs = true,
       .vece = MO_32 },
     { .fniv = gen_sqsub_vec,
       .fno = gen_helper_gvec_sqsub_d,
-      .opc = INDEX_op_sssub_vec,
+      .opt_opc = vecop_list_sqsub,
       .write_aofs = true,
       .vece = MO_64 },
 };
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index eb10c533ca..c83e605a00 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -562,10 +562,15 @@ static void glue(glue(gen_, NAME), _vec)(unsigned vece, TCGv_vec t,     \
 }                                                                       \
 static void glue(gen_, NAME)(DisasContext *ctx)                         \
 {                                                                       \
+    static const TCGOpcode vecop_list[] = {                             \
+        glue(glue(INDEX_op_, NORM), _vec),                              \
+        glue(glue(INDEX_op_, SAT), _vec),                               \
+        INDEX_op_cmp_vec, 0                                             \
+    };                                                                  \
     static const GVecGen4 g = {                                         \
         .fniv = glue(glue(gen_, NAME), _vec),                           \
         .fno = glue(gen_helper_, NAME),                                 \
-        .opc = glue(glue(INDEX_op_, SAT), _vec),                        \
+        .opt_opc = vecop_list,                                          \
         .write_aofs = true,                                             \
         .vece = VECE,                                                   \
     };                                                                  \
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 3eb9126f50..40858a83e0 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -26,6 +26,76 @@
 
 #define MAX_UNROLL  4
 
+/*
+ * Vector optional opcode tracking.
+ * Except for the basic logical operations (and, or, xor), and
+ * data movement (mov, ld, st, dupi), many vector opcodes are
+ * optional and may not be supported on the host.  Thank Intel
+ * for the irregularity in their instruction set.
+ *
+ * The gvec expanders allow custom vector operations to be composed,
+ * generally via the .fniv callback in the GVecGen* structures.  At
+ * the same time, in deciding whether to use this hook we need to
+ * know if the host supports the required operations.  This is
+ * presented as an array of opcodes, terminated by 0.  Each opcode
+ * is assumed to be expanded with the given VECE.
+ *
+ * For debugging, we want to validate this array.  Therefore, when
+ * tcg_ctx->vec_opt_opc is non-NULL, the tcg_gen_*_vec expanders
+ * will validate that their opcode is present in the list.
+ */
+#ifdef CONFIG_DEBUG_TCG
+void tcg_assert_listed_vecop(TCGOpcode op)
+{
+    const TCGOpcode *p = tcg_ctx->vecop_list;
+    if (p) {
+        for (; *p; ++p) {
+            if (*p == op) {
+                return;
+            }
+        }
+        g_assert_not_reached();
+    }
+}
+
+static const TCGOpcode vecop_list_empty[1];
+#else
+#define vecop_list_empty NULL
+#endif
+
+static bool tcg_can_emit_vecop_list(const TCGOpcode *list,
+                                    TCGType type, unsigned vece)
+{
+    if (list == NULL) {
+        return true;
+    }
+
+    for (; *list; ++list) {
+        TCGOpcode opc = *list;
+
+        if (tcg_can_emit_vec_op(opc, type, vece)) {
+            continue;
+        }
+
+        /*
+         * The opcode list is created by front ends based on what they
+         * actually invoke.  We must mirror the logic in tcg-op-vec.c
+         * for generic expansions using other opcodes.
+         */
+        switch (opc) {
+        case INDEX_op_neg_vec:
+            if (tcg_can_emit_vec_op(INDEX_op_sub_vec, type, vece)) {
+                continue;
+            }
+            break;
+        default:
+            break;
+        }
+        return false;
+    }
+    return true;
+}
+
 /* Verify vector size and alignment rules.  OFS should be the OR of all
    of the operand offsets so that we can check them all at once.  */
 static void check_size_align(uint32_t oprsz, uint32_t maxsz, uint32_t ofs)
@@ -360,31 +430,29 @@ static void gen_dup_i64(unsigned vece, TCGv_i64 out, TCGv_i64 in)
  * on elements of size VECE in the selected type.  Do not select V64 if
  * PREFER_I64 is true.  Return 0 if no vector type is selected.
  */
-static TCGType choose_vector_type(TCGOpcode op, unsigned vece, uint32_t size,
-                                  bool prefer_i64)
+static TCGType choose_vector_type(const TCGOpcode *list, unsigned vece,
+                                  uint32_t size, bool prefer_i64)
 {
     if (TCG_TARGET_HAS_v256 && check_size_impl(size, 32)) {
-        if (op == 0) {
-            return TCG_TYPE_V256;
-        }
-        /* Recall that ARM SVE allows vector sizes that are not a
+        /*
+         * Recall that ARM SVE allows vector sizes that are not a
          * power of 2, but always a multiple of 16.  The intent is
          * that e.g. size == 80 would be expanded with 2x32 + 1x16.
          * It is hard to imagine a case in which v256 is supported
          * but v128 is not, but check anyway.
          */
-        if (tcg_can_emit_vec_op(op, TCG_TYPE_V256, vece)
+        if (tcg_can_emit_vecop_list(list, TCG_TYPE_V256, vece)
             && (size % 32 == 0
-                || tcg_can_emit_vec_op(op, TCG_TYPE_V128, vece))) {
+                || tcg_can_emit_vecop_list(list, TCG_TYPE_V128, vece))) {
             return TCG_TYPE_V256;
         }
     }
     if (TCG_TARGET_HAS_v128 && check_size_impl(size, 16)
-        && (op == 0 || tcg_can_emit_vec_op(op, TCG_TYPE_V128, vece))) {
+        && tcg_can_emit_vecop_list(list, TCG_TYPE_V128, vece)) {
         return TCG_TYPE_V128;
     }
     if (TCG_TARGET_HAS_v64 && !prefer_i64 && check_size_impl(size, 8)
-        && (op == 0 || tcg_can_emit_vec_op(op, TCG_TYPE_V64, vece))) {
+        && tcg_can_emit_vecop_list(list, TCG_TYPE_V64, vece)) {
         return TCG_TYPE_V64;
     }
     return 0;
@@ -452,7 +520,7 @@ static void do_dup(unsigned vece, uint32_t dofs, uint32_t oprsz,
     /* Implement inline with a vector type, if possible.
      * Prefer integer when 64-bit host and no variable dup.
      */
-    type = choose_vector_type(0, vece, oprsz,
+    type = choose_vector_type(NULL, vece, oprsz,
                               (TCG_TARGET_REG_BITS == 64 && in_32 == NULL
                                && (in_64 == NULL || vece == MO_64)));
     if (type != 0) {
@@ -987,6 +1055,8 @@ static void expand_4_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs,
                     uint32_t oprsz, uint32_t maxsz, const GVecGen2 *g)
 {
+    const TCGOpcode *this_list = g->opt_opc ? : vecop_list_empty;
+    const TCGOpcode *hold_list = tcg_swap_vecop_list(this_list);
     TCGType type;
     uint32_t some;
 
@@ -995,7 +1065,7 @@ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs,
 
     type = 0;
     if (g->fniv) {
-        type = choose_vector_type(g->opc, g->vece, oprsz, g->prefer_i64);
+        type = choose_vector_type(g->opt_opc, g->vece, oprsz, g->prefer_i64);
     }
     switch (type) {
     case TCG_TYPE_V256:
@@ -1028,13 +1098,14 @@ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs,
         } else {
             assert(g->fno != NULL);
             tcg_gen_gvec_2_ool(dofs, aofs, oprsz, maxsz, g->data, g->fno);
-            return;
+            oprsz = maxsz;
         }
         break;
 
     default:
         g_assert_not_reached();
     }
+    tcg_swap_vecop_list(hold_list);
 
     if (oprsz < maxsz) {
         expand_clr(dofs + oprsz, maxsz - oprsz);
@@ -1045,6 +1116,8 @@ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_2i(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
                      uint32_t maxsz, int64_t c, const GVecGen2i *g)
 {
+    const TCGOpcode *this_list = g->opt_opc ? : vecop_list_empty;
+    const TCGOpcode *hold_list = tcg_swap_vecop_list(this_list);
     TCGType type;
     uint32_t some;
 
@@ -1053,7 +1126,7 @@ void tcg_gen_gvec_2i(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
 
     type = 0;
     if (g->fniv) {
-        type = choose_vector_type(g->opc, g->vece, oprsz, g->prefer_i64);
+        type = choose_vector_type(g->opt_opc, g->vece, oprsz, g->prefer_i64);
     }
     switch (type) {
     case TCG_TYPE_V256:
@@ -1095,13 +1168,14 @@ void tcg_gen_gvec_2i(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
                                     maxsz, c, g->fnoi);
                 tcg_temp_free_i64(tcg_c);
             }
-            return;
+            oprsz = maxsz;
         }
         break;
 
     default:
         g_assert_not_reached();
     }
+    tcg_swap_vecop_list(hold_list);
 
     if (oprsz < maxsz) {
         expand_clr(dofs + oprsz, maxsz - oprsz);
@@ -1119,9 +1193,11 @@ void tcg_gen_gvec_2s(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
 
     type = 0;
     if (g->fniv) {
-        type = choose_vector_type(g->opc, g->vece, oprsz, g->prefer_i64);
+        type = choose_vector_type(g->opt_opc, g->vece, oprsz, g->prefer_i64);
     }
     if (type != 0) {
+        const TCGOpcode *this_list = g->opt_opc ? : vecop_list_empty;
+        const TCGOpcode *hold_list = tcg_swap_vecop_list(this_list);
         TCGv_vec t_vec = tcg_temp_new_vec(type);
         uint32_t some;
 
@@ -1159,6 +1235,7 @@ void tcg_gen_gvec_2s(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
             g_assert_not_reached();
         }
         tcg_temp_free_vec(t_vec);
+        tcg_swap_vecop_list(hold_list);
     } else if (g->fni8 && check_size_impl(oprsz, 8)) {
         TCGv_i64 t64 = tcg_temp_new_i64();
 
@@ -1186,6 +1263,8 @@ void tcg_gen_gvec_2s(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
 void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                     uint32_t oprsz, uint32_t maxsz, const GVecGen3 *g)
 {
+    const TCGOpcode *this_list = g->opt_opc ? : vecop_list_empty;
+    const TCGOpcode *hold_list = tcg_swap_vecop_list(this_list);
     TCGType type;
     uint32_t some;
 
@@ -1194,7 +1273,7 @@ void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 
     type = 0;
     if (g->fniv) {
-        type = choose_vector_type(g->opc, g->vece, oprsz, g->prefer_i64);
+        type = choose_vector_type(g->opt_opc, g->vece, oprsz, g->prefer_i64);
     }
     switch (type) {
     case TCG_TYPE_V256:
@@ -1232,13 +1311,14 @@ void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,
             assert(g->fno != NULL);
             tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz,
                                maxsz, g->data, g->fno);
-            return;
+            oprsz = maxsz;
         }
         break;
 
     default:
         g_assert_not_reached();
     }
+    tcg_swap_vecop_list(hold_list);
 
     if (oprsz < maxsz) {
         expand_clr(dofs + oprsz, maxsz - oprsz);
@@ -1250,6 +1330,8 @@ void tcg_gen_gvec_3i(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                      uint32_t oprsz, uint32_t maxsz, int64_t c,
                      const GVecGen3i *g)
 {
+    const TCGOpcode *this_list = g->opt_opc ? : vecop_list_empty;
+    const TCGOpcode *hold_list = tcg_swap_vecop_list(this_list);
     TCGType type;
     uint32_t some;
 
@@ -1258,7 +1340,7 @@ void tcg_gen_gvec_3i(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 
     type = 0;
     if (g->fniv) {
-        type = choose_vector_type(g->opc, g->vece, oprsz, g->prefer_i64);
+        type = choose_vector_type(g->opt_opc, g->vece, oprsz, g->prefer_i64);
     }
     switch (type) {
     case TCG_TYPE_V256:
@@ -1296,13 +1378,14 @@ void tcg_gen_gvec_3i(uint32_t dofs, uint32_t aofs, uint32_t bofs,
         } else {
             assert(g->fno != NULL);
             tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, c, g->fno);
-            return;
+            oprsz = maxsz;
         }
         break;
 
     default:
         g_assert_not_reached();
     }
+    tcg_swap_vecop_list(hold_list);
 
     if (oprsz < maxsz) {
         expand_clr(dofs + oprsz, maxsz - oprsz);
@@ -1313,6 +1396,8 @@ void tcg_gen_gvec_3i(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs,
                     uint32_t oprsz, uint32_t maxsz, const GVecGen4 *g)
 {
+    const TCGOpcode *this_list = g->opt_opc ? : vecop_list_empty;
+    const TCGOpcode *hold_list = tcg_swap_vecop_list(this_list);
     TCGType type;
     uint32_t some;
 
@@ -1321,7 +1406,7 @@ void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs,
 
     type = 0;
     if (g->fniv) {
-        type = choose_vector_type(g->opc, g->vece, oprsz, g->prefer_i64);
+        type = choose_vector_type(g->opt_opc, g->vece, oprsz, g->prefer_i64);
     }
     switch (type) {
     case TCG_TYPE_V256:
@@ -1362,13 +1447,14 @@ void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs,
             assert(g->fno != NULL);
             tcg_gen_gvec_4_ool(dofs, aofs, bofs, cofs,
                                oprsz, maxsz, g->data, g->fno);
-            return;
+            oprsz = maxsz;
         }
         break;
 
     default:
         g_assert_not_reached();
     }
+    tcg_swap_vecop_list(hold_list);
 
     if (oprsz < maxsz) {
         expand_clr(dofs + oprsz, maxsz - oprsz);
@@ -1423,7 +1509,7 @@ void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
                           uint32_t oprsz, uint32_t maxsz)
 {
     if (vece <= MO_64) {
-        TCGType type = choose_vector_type(0, vece, oprsz, 0);
+        TCGType type = choose_vector_type(NULL, vece, oprsz, 0);
         if (type != 0) {
             TCGv_vec t_vec = tcg_temp_new_vec(type);
             tcg_gen_dup_mem_vec(vece, t_vec, cpu_env, aofs);
@@ -1573,6 +1659,8 @@ void tcg_gen_vec_add32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
     tcg_temp_free_i64(t2);
 }
 
+static const TCGOpcode vecop_list_add[] = { INDEX_op_add_vec, 0 };
+
 void tcg_gen_gvec_add(unsigned vece, uint32_t dofs, uint32_t aofs,
                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
@@ -1580,22 +1668,22 @@ void tcg_gen_gvec_add(unsigned vece, uint32_t dofs, uint32_t aofs,
         { .fni8 = tcg_gen_vec_add8_i64,
           .fniv = tcg_gen_add_vec,
           .fno = gen_helper_gvec_add8,
-          .opc = INDEX_op_add_vec,
+          .opt_opc = vecop_list_add,
           .vece = MO_8 },
         { .fni8 = tcg_gen_vec_add16_i64,
           .fniv = tcg_gen_add_vec,
           .fno = gen_helper_gvec_add16,
-          .opc = INDEX_op_add_vec,
+          .opt_opc = vecop_list_add,
           .vece = MO_16 },
         { .fni4 = tcg_gen_add_i32,
           .fniv = tcg_gen_add_vec,
           .fno = gen_helper_gvec_add32,
-          .opc = INDEX_op_add_vec,
+          .opt_opc = vecop_list_add,
           .vece = MO_32 },
         { .fni8 = tcg_gen_add_i64,
           .fniv = tcg_gen_add_vec,
           .fno = gen_helper_gvec_add64,
-          .opc = INDEX_op_add_vec,
+          .opt_opc = vecop_list_add,
           .prefer_i64 = TCG_TARGET_REG_BITS == 64,
           .vece = MO_64 },
     };
@@ -1611,22 +1699,22 @@ void tcg_gen_gvec_adds(unsigned vece, uint32_t dofs, uint32_t aofs,
         { .fni8 = tcg_gen_vec_add8_i64,
           .fniv = tcg_gen_add_vec,
           .fno = gen_helper_gvec_adds8,
-          .opc = INDEX_op_add_vec,
+          .opt_opc = vecop_list_add,
           .vece = MO_8 },
         { .fni8 = tcg_gen_vec_add16_i64,
           .fniv = tcg_gen_add_vec,
           .fno = gen_helper_gvec_adds16,
-          .opc = INDEX_op_add_vec,
+          .opt_opc = vecop_list_add,
           .vece = MO_16 },
         { .fni4 = tcg_gen_add_i32,
           .fniv = tcg_gen_add_vec,
           .fno = gen_helper_gvec_adds32,
-          .opc = INDEX_op_add_vec,
+          .opt_opc = vecop_list_add,
           .vece = MO_32 },
         { .fni8 = tcg_gen_add_i64,
           .fniv = tcg_gen_add_vec,
           .fno = gen_helper_gvec_adds64,
-          .opc = INDEX_op_add_vec,
+          .opt_opc = vecop_list_add,
           .prefer_i64 = TCG_TARGET_REG_BITS == 64,
           .vece = MO_64 },
     };
@@ -1643,6 +1731,8 @@ void tcg_gen_gvec_addi(unsigned vece, uint32_t dofs, uint32_t aofs,
     tcg_temp_free_i64(tmp);
 }
 
+static const TCGOpcode vecop_list_sub[] = { INDEX_op_sub_vec, 0 };
+
 void tcg_gen_gvec_subs(unsigned vece, uint32_t dofs, uint32_t aofs,
                        TCGv_i64 c, uint32_t oprsz, uint32_t maxsz)
 {
@@ -1650,22 +1740,22 @@ void tcg_gen_gvec_subs(unsigned vece, uint32_t dofs, uint32_t aofs,
         { .fni8 = tcg_gen_vec_sub8_i64,
           .fniv = tcg_gen_sub_vec,
           .fno = gen_helper_gvec_subs8,
-          .opc = INDEX_op_sub_vec,
+          .opt_opc = vecop_list_sub,
           .vece = MO_8 },
         { .fni8 = tcg_gen_vec_sub16_i64,
           .fniv = tcg_gen_sub_vec,
           .fno = gen_helper_gvec_subs16,
-          .opc = INDEX_op_sub_vec,
+          .opt_opc = vecop_list_sub,
           .vece = MO_16 },
         { .fni4 = tcg_gen_sub_i32,
           .fniv = tcg_gen_sub_vec,
           .fno = gen_helper_gvec_subs32,
-          .opc = INDEX_op_sub_vec,
+          .opt_opc = vecop_list_sub,
           .vece = MO_32 },
         { .fni8 = tcg_gen_sub_i64,
           .fniv = tcg_gen_sub_vec,
           .fno = gen_helper_gvec_subs64,
-          .opc = INDEX_op_sub_vec,
+          .opt_opc = vecop_list_sub,
           .prefer_i64 = TCG_TARGET_REG_BITS == 64,
           .vece = MO_64 },
     };
@@ -1729,22 +1819,22 @@ void tcg_gen_gvec_sub(unsigned vece, uint32_t dofs, uint32_t aofs,
         { .fni8 = tcg_gen_vec_sub8_i64,
           .fniv = tcg_gen_sub_vec,
           .fno = gen_helper_gvec_sub8,
-          .opc = INDEX_op_sub_vec,
+          .opt_opc = vecop_list_sub,
           .vece = MO_8 },
         { .fni8 = tcg_gen_vec_sub16_i64,
           .fniv = tcg_gen_sub_vec,
           .fno = gen_helper_gvec_sub16,
-          .opc = INDEX_op_sub_vec,
+          .opt_opc = vecop_list_sub,
           .vece = MO_16 },
         { .fni4 = tcg_gen_sub_i32,
           .fniv = tcg_gen_sub_vec,
           .fno = gen_helper_gvec_sub32,
-          .opc = INDEX_op_sub_vec,
+          .opt_opc = vecop_list_sub,
           .vece = MO_32 },
         { .fni8 = tcg_gen_sub_i64,
           .fniv = tcg_gen_sub_vec,
           .fno = gen_helper_gvec_sub64,
-          .opc = INDEX_op_sub_vec,
+          .opt_opc = vecop_list_sub,
           .prefer_i64 = TCG_TARGET_REG_BITS == 64,
           .vece = MO_64 },
     };
@@ -1753,27 +1843,29 @@ void tcg_gen_gvec_sub(unsigned vece, uint32_t dofs, uint32_t aofs,
     tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
 }
 
+static const TCGOpcode vecop_list_mul[] = { INDEX_op_mul_vec, 0 };
+
 void tcg_gen_gvec_mul(unsigned vece, uint32_t dofs, uint32_t aofs,
                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
     static const GVecGen3 g[4] = {
         { .fniv = tcg_gen_mul_vec,
           .fno = gen_helper_gvec_mul8,
-          .opc = INDEX_op_mul_vec,
+          .opt_opc = vecop_list_mul,
           .vece = MO_8 },
         { .fniv = tcg_gen_mul_vec,
           .fno = gen_helper_gvec_mul16,
-          .opc = INDEX_op_mul_vec,
+          .opt_opc = vecop_list_mul,
           .vece = MO_16 },
         { .fni4 = tcg_gen_mul_i32,
           .fniv = tcg_gen_mul_vec,
           .fno = gen_helper_gvec_mul32,
-          .opc = INDEX_op_mul_vec,
+          .opt_opc = vecop_list_mul,
           .vece = MO_32 },
         { .fni8 = tcg_gen_mul_i64,
           .fniv = tcg_gen_mul_vec,
           .fno = gen_helper_gvec_mul64,
-          .opc = INDEX_op_mul_vec,
+          .opt_opc = vecop_list_mul,
           .prefer_i64 = TCG_TARGET_REG_BITS == 64,
           .vece = MO_64 },
     };
@@ -1788,21 +1880,21 @@ void tcg_gen_gvec_muls(unsigned vece, uint32_t dofs, uint32_t aofs,
     static const GVecGen2s g[4] = {
         { .fniv = tcg_gen_mul_vec,
           .fno = gen_helper_gvec_muls8,
-          .opc = INDEX_op_mul_vec,
+          .opt_opc = vecop_list_mul,
           .vece = MO_8 },
         { .fniv = tcg_gen_mul_vec,
           .fno = gen_helper_gvec_muls16,
-          .opc = INDEX_op_mul_vec,
+          .opt_opc = vecop_list_mul,
           .vece = MO_16 },
         { .fni4 = tcg_gen_mul_i32,
           .fniv = tcg_gen_mul_vec,
           .fno = gen_helper_gvec_muls32,
-          .opc = INDEX_op_mul_vec,
+          .opt_opc = vecop_list_mul,
           .vece = MO_32 },
         { .fni8 = tcg_gen_mul_i64,
           .fniv = tcg_gen_mul_vec,
           .fno = gen_helper_gvec_muls64,
-          .opc = INDEX_op_mul_vec,
+          .opt_opc = vecop_list_mul,
           .prefer_i64 = TCG_TARGET_REG_BITS == 64,
           .vece = MO_64 },
     };
@@ -1822,22 +1914,23 @@ void tcg_gen_gvec_muli(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_ssadd(unsigned vece, uint32_t dofs, uint32_t aofs,
                         uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
+    static const TCGOpcode vecop_list[] = { INDEX_op_ssadd_vec, 0 };
     static const GVecGen3 g[4] = {
         { .fniv = tcg_gen_ssadd_vec,
           .fno = gen_helper_gvec_ssadd8,
-          .opc = INDEX_op_ssadd_vec,
+          .opt_opc = vecop_list,
           .vece = MO_8 },
         { .fniv = tcg_gen_ssadd_vec,
           .fno = gen_helper_gvec_ssadd16,
-          .opc = INDEX_op_ssadd_vec,
+          .opt_opc = vecop_list,
           .vece = MO_16 },
         { .fniv = tcg_gen_ssadd_vec,
           .fno = gen_helper_gvec_ssadd32,
-          .opc = INDEX_op_ssadd_vec,
+          .opt_opc = vecop_list,
           .vece = MO_32 },
         { .fniv = tcg_gen_ssadd_vec,
           .fno = gen_helper_gvec_ssadd64,
-          .opc = INDEX_op_ssadd_vec,
+          .opt_opc = vecop_list,
           .vece = MO_64 },
     };
     tcg_debug_assert(vece <= MO_64);
@@ -1847,22 +1940,23 @@ void tcg_gen_gvec_ssadd(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_sssub(unsigned vece, uint32_t dofs, uint32_t aofs,
                         uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
+    static const TCGOpcode vecop_list[] = { INDEX_op_sssub_vec, 0 };
     static const GVecGen3 g[4] = {
         { .fniv = tcg_gen_sssub_vec,
           .fno = gen_helper_gvec_sssub8,
-          .opc = INDEX_op_sssub_vec,
+          .opt_opc = vecop_list,
           .vece = MO_8 },
         { .fniv = tcg_gen_sssub_vec,
           .fno = gen_helper_gvec_sssub16,
-          .opc = INDEX_op_sssub_vec,
+          .opt_opc = vecop_list,
           .vece = MO_16 },
         { .fniv = tcg_gen_sssub_vec,
           .fno = gen_helper_gvec_sssub32,
-          .opc = INDEX_op_sssub_vec,
+          .opt_opc = vecop_list,
           .vece = MO_32 },
         { .fniv = tcg_gen_sssub_vec,
           .fno = gen_helper_gvec_sssub64,
-          .opc = INDEX_op_sssub_vec,
+          .opt_opc = vecop_list,
           .vece = MO_64 },
     };
     tcg_debug_assert(vece <= MO_64);
@@ -1888,24 +1982,25 @@ static void tcg_gen_usadd_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 void tcg_gen_gvec_usadd(unsigned vece, uint32_t dofs, uint32_t aofs,
                         uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
+    static const TCGOpcode vecop_list[] = { INDEX_op_usadd_vec, 0 };
     static const GVecGen3 g[4] = {
         { .fniv = tcg_gen_usadd_vec,
           .fno = gen_helper_gvec_usadd8,
-          .opc = INDEX_op_usadd_vec,
+          .opt_opc = vecop_list,
           .vece = MO_8 },
         { .fniv = tcg_gen_usadd_vec,
           .fno = gen_helper_gvec_usadd16,
-          .opc = INDEX_op_usadd_vec,
+          .opt_opc = vecop_list,
           .vece = MO_16 },
         { .fni4 = tcg_gen_usadd_i32,
           .fniv = tcg_gen_usadd_vec,
           .fno = gen_helper_gvec_usadd32,
-          .opc = INDEX_op_usadd_vec,
+          .opt_opc = vecop_list,
           .vece = MO_32 },
         { .fni8 = tcg_gen_usadd_i64,
           .fniv = tcg_gen_usadd_vec,
           .fno = gen_helper_gvec_usadd64,
-          .opc = INDEX_op_usadd_vec,
+          .opt_opc = vecop_list,
           .vece = MO_64 }
     };
     tcg_debug_assert(vece <= MO_64);
@@ -1931,24 +2026,25 @@ static void tcg_gen_ussub_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 void tcg_gen_gvec_ussub(unsigned vece, uint32_t dofs, uint32_t aofs,
                         uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
+    static const TCGOpcode vecop_list[] = { INDEX_op_ussub_vec, 0 };
     static const GVecGen3 g[4] = {
         { .fniv = tcg_gen_ussub_vec,
           .fno = gen_helper_gvec_ussub8,
-          .opc = INDEX_op_ussub_vec,
+          .opt_opc = vecop_list,
           .vece = MO_8 },
         { .fniv = tcg_gen_ussub_vec,
           .fno = gen_helper_gvec_ussub16,
-          .opc = INDEX_op_ussub_vec,
+          .opt_opc = vecop_list,
           .vece = MO_16 },
         { .fni4 = tcg_gen_ussub_i32,
           .fniv = tcg_gen_ussub_vec,
           .fno = gen_helper_gvec_ussub32,
-          .opc = INDEX_op_ussub_vec,
+          .opt_opc = vecop_list,
           .vece = MO_32 },
         { .fni8 = tcg_gen_ussub_i64,
           .fniv = tcg_gen_ussub_vec,
           .fno = gen_helper_gvec_ussub64,
-          .opc = INDEX_op_ussub_vec,
+          .opt_opc = vecop_list,
           .vece = MO_64 }
     };
     tcg_debug_assert(vece <= MO_64);
@@ -1958,24 +2054,25 @@ void tcg_gen_gvec_ussub(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_smin(unsigned vece, uint32_t dofs, uint32_t aofs,
                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
+    static const TCGOpcode vecop_list[] = { INDEX_op_smin_vec, 0 };
     static const GVecGen3 g[4] = {
         { .fniv = tcg_gen_smin_vec,
           .fno = gen_helper_gvec_smin8,
-          .opc = INDEX_op_smin_vec,
+          .opt_opc = vecop_list,
           .vece = MO_8 },
         { .fniv = tcg_gen_smin_vec,
           .fno = gen_helper_gvec_smin16,
-          .opc = INDEX_op_smin_vec,
+          .opt_opc = vecop_list,
           .vece = MO_16 },
         { .fni4 = tcg_gen_smin_i32,
           .fniv = tcg_gen_smin_vec,
           .fno = gen_helper_gvec_smin32,
-          .opc = INDEX_op_smin_vec,
+          .opt_opc = vecop_list,
           .vece = MO_32 },
         { .fni8 = tcg_gen_smin_i64,
           .fniv = tcg_gen_smin_vec,
           .fno = gen_helper_gvec_smin64,
-          .opc = INDEX_op_smin_vec,
+          .opt_opc = vecop_list,
           .vece = MO_64 }
     };
     tcg_debug_assert(vece <= MO_64);
@@ -1985,24 +2082,25 @@ void tcg_gen_gvec_smin(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_umin(unsigned vece, uint32_t dofs, uint32_t aofs,
                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
+    static const TCGOpcode vecop_list[] = { INDEX_op_umin_vec, 0 };
     static const GVecGen3 g[4] = {
         { .fniv = tcg_gen_umin_vec,
           .fno = gen_helper_gvec_umin8,
-          .opc = INDEX_op_umin_vec,
+          .opt_opc = vecop_list,
           .vece = MO_8 },
         { .fniv = tcg_gen_umin_vec,
           .fno = gen_helper_gvec_umin16,
-          .opc = INDEX_op_umin_vec,
+          .opt_opc = vecop_list,
           .vece = MO_16 },
         { .fni4 = tcg_gen_umin_i32,
           .fniv = tcg_gen_umin_vec,
           .fno = gen_helper_gvec_umin32,
-          .opc = INDEX_op_umin_vec,
+          .opt_opc = vecop_list,
           .vece = MO_32 },
         { .fni8 = tcg_gen_umin_i64,
           .fniv = tcg_gen_umin_vec,
           .fno = gen_helper_gvec_umin64,
-          .opc = INDEX_op_umin_vec,
+          .opt_opc = vecop_list,
           .vece = MO_64 }
     };
     tcg_debug_assert(vece <= MO_64);
@@ -2012,24 +2110,25 @@ void tcg_gen_gvec_umin(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_smax(unsigned vece, uint32_t dofs, uint32_t aofs,
                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
+    static const TCGOpcode vecop_list[] = { INDEX_op_smax_vec, 0 };
     static const GVecGen3 g[4] = {
         { .fniv = tcg_gen_smax_vec,
           .fno = gen_helper_gvec_smax8,
-          .opc = INDEX_op_smax_vec,
+          .opt_opc = vecop_list,
           .vece = MO_8 },
         { .fniv = tcg_gen_smax_vec,
           .fno = gen_helper_gvec_smax16,
-          .opc = INDEX_op_smax_vec,
+          .opt_opc = vecop_list,
           .vece = MO_16 },
         { .fni4 = tcg_gen_smax_i32,
           .fniv = tcg_gen_smax_vec,
           .fno = gen_helper_gvec_smax32,
-          .opc = INDEX_op_smax_vec,
+          .opt_opc = vecop_list,
           .vece = MO_32 },
         { .fni8 = tcg_gen_smax_i64,
           .fniv = tcg_gen_smax_vec,
           .fno = gen_helper_gvec_smax64,
-          .opc = INDEX_op_smax_vec,
+          .opt_opc = vecop_list,
           .vece = MO_64 }
     };
     tcg_debug_assert(vece <= MO_64);
@@ -2039,24 +2138,25 @@ void tcg_gen_gvec_smax(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_umax(unsigned vece, uint32_t dofs, uint32_t aofs,
                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
+    static const TCGOpcode vecop_list[] = { INDEX_op_umax_vec, 0 };
     static const GVecGen3 g[4] = {
         { .fniv = tcg_gen_umax_vec,
           .fno = gen_helper_gvec_umax8,
-          .opc = INDEX_op_umax_vec,
+          .opt_opc = vecop_list,
           .vece = MO_8 },
         { .fniv = tcg_gen_umax_vec,
           .fno = gen_helper_gvec_umax16,
-          .opc = INDEX_op_umax_vec,
+          .opt_opc = vecop_list,
           .vece = MO_16 },
         { .fni4 = tcg_gen_umax_i32,
           .fniv = tcg_gen_umax_vec,
           .fno = gen_helper_gvec_umax32,
-          .opc = INDEX_op_umax_vec,
+          .opt_opc = vecop_list,
           .vece = MO_32 },
         { .fni8 = tcg_gen_umax_i64,
           .fniv = tcg_gen_umax_vec,
           .fno = gen_helper_gvec_umax64,
-          .opc = INDEX_op_umax_vec,
+          .opt_opc = vecop_list,
           .vece = MO_64 }
     };
     tcg_debug_assert(vece <= MO_64);
@@ -2110,26 +2210,27 @@ void tcg_gen_vec_neg32_i64(TCGv_i64 d, TCGv_i64 b)
 void tcg_gen_gvec_neg(unsigned vece, uint32_t dofs, uint32_t aofs,
                       uint32_t oprsz, uint32_t maxsz)
 {
+    static const TCGOpcode vecop_list[] = { INDEX_op_neg_vec, 0 };
     static const GVecGen2 g[4] = {
         { .fni8 = tcg_gen_vec_neg8_i64,
           .fniv = tcg_gen_neg_vec,
           .fno = gen_helper_gvec_neg8,
-          .opc = INDEX_op_neg_vec,
+          .opt_opc = vecop_list,
           .vece = MO_8 },
         { .fni8 = tcg_gen_vec_neg16_i64,
           .fniv = tcg_gen_neg_vec,
           .fno = gen_helper_gvec_neg16,
-          .opc = INDEX_op_neg_vec,
+          .opt_opc = vecop_list,
           .vece = MO_16 },
         { .fni4 = tcg_gen_neg_i32,
           .fniv = tcg_gen_neg_vec,
           .fno = gen_helper_gvec_neg32,
-          .opc = INDEX_op_neg_vec,
+          .opt_opc = vecop_list,
           .vece = MO_32 },
         { .fni8 = tcg_gen_neg_i64,
           .fniv = tcg_gen_neg_vec,
           .fno = gen_helper_gvec_neg64,
-          .opc = INDEX_op_neg_vec,
+          .opt_opc = vecop_list,
           .prefer_i64 = TCG_TARGET_REG_BITS == 64,
           .vece = MO_64 },
     };
@@ -2145,7 +2246,6 @@ void tcg_gen_gvec_and(unsigned vece, uint32_t dofs, uint32_t aofs,
         .fni8 = tcg_gen_and_i64,
         .fniv = tcg_gen_and_vec,
         .fno = gen_helper_gvec_and,
-        .opc = INDEX_op_and_vec,
         .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     };
 
@@ -2163,7 +2263,6 @@ void tcg_gen_gvec_or(unsigned vece, uint32_t dofs, uint32_t aofs,
         .fni8 = tcg_gen_or_i64,
         .fniv = tcg_gen_or_vec,
         .fno = gen_helper_gvec_or,
-        .opc = INDEX_op_or_vec,
         .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     };
 
@@ -2181,7 +2280,6 @@ void tcg_gen_gvec_xor(unsigned vece, uint32_t dofs, uint32_t aofs,
         .fni8 = tcg_gen_xor_i64,
         .fniv = tcg_gen_xor_vec,
         .fno = gen_helper_gvec_xor,
-        .opc = INDEX_op_xor_vec,
         .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     };
 
@@ -2199,7 +2297,6 @@ void tcg_gen_gvec_andc(unsigned vece, uint32_t dofs, uint32_t aofs,
         .fni8 = tcg_gen_andc_i64,
         .fniv = tcg_gen_andc_vec,
         .fno = gen_helper_gvec_andc,
-        .opc = INDEX_op_andc_vec,
         .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     };
 
@@ -2217,7 +2314,6 @@ void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, uint32_t aofs,
         .fni8 = tcg_gen_orc_i64,
         .fniv = tcg_gen_orc_vec,
         .fno = gen_helper_gvec_orc,
-        .opc = INDEX_op_orc_vec,
         .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     };
 
@@ -2283,7 +2379,6 @@ static const GVecGen2s gop_ands = {
     .fni8 = tcg_gen_and_i64,
     .fniv = tcg_gen_and_vec,
     .fno = gen_helper_gvec_ands,
-    .opc = INDEX_op_and_vec,
     .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     .vece = MO_64
 };
@@ -2309,7 +2404,6 @@ static const GVecGen2s gop_xors = {
     .fni8 = tcg_gen_xor_i64,
     .fniv = tcg_gen_xor_vec,
     .fno = gen_helper_gvec_xors,
-    .opc = INDEX_op_xor_vec,
     .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     .vece = MO_64
 };
@@ -2335,7 +2429,6 @@ static const GVecGen2s gop_ors = {
     .fni8 = tcg_gen_or_i64,
     .fniv = tcg_gen_or_vec,
     .fno = gen_helper_gvec_ors,
-    .opc = INDEX_op_or_vec,
     .prefer_i64 = TCG_TARGET_REG_BITS == 64,
     .vece = MO_64
 };
@@ -2374,26 +2467,27 @@ void tcg_gen_vec_shl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c)
 void tcg_gen_gvec_shli(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t shift, uint32_t oprsz, uint32_t maxsz)
 {
+    static const TCGOpcode vecop_list[] = { INDEX_op_shli_vec, 0 };
     static const GVecGen2i g[4] = {
         { .fni8 = tcg_gen_vec_shl8i_i64,
           .fniv = tcg_gen_shli_vec,
           .fno = gen_helper_gvec_shl8i,
-          .opc = INDEX_op_shli_vec,
+          .opt_opc = vecop_list,
           .vece = MO_8 },
         { .fni8 = tcg_gen_vec_shl16i_i64,
           .fniv = tcg_gen_shli_vec,
           .fno = gen_helper_gvec_shl16i,
-          .opc = INDEX_op_shli_vec,
+          .opt_opc = vecop_list,
           .vece = MO_16 },
         { .fni4 = tcg_gen_shli_i32,
           .fniv = tcg_gen_shli_vec,
           .fno = gen_helper_gvec_shl32i,
-          .opc = INDEX_op_shli_vec,
+          .opt_opc = vecop_list,
           .vece = MO_32 },
         { .fni8 = tcg_gen_shli_i64,
           .fniv = tcg_gen_shli_vec,
           .fno = gen_helper_gvec_shl64i,
-          .opc = INDEX_op_shli_vec,
+          .opt_opc = vecop_list,
           .prefer_i64 = TCG_TARGET_REG_BITS == 64,
           .vece = MO_64 },
     };
@@ -2424,26 +2518,27 @@ void tcg_gen_vec_shr16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c)
 void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t shift, uint32_t oprsz, uint32_t maxsz)
 {
+    static const TCGOpcode vecop_list[] = { INDEX_op_shri_vec, 0 };
     static const GVecGen2i g[4] = {
         { .fni8 = tcg_gen_vec_shr8i_i64,
           .fniv = tcg_gen_shri_vec,
           .fno = gen_helper_gvec_shr8i,
-          .opc = INDEX_op_shri_vec,
+          .opt_opc = vecop_list,
           .vece = MO_8 },
         { .fni8 = tcg_gen_vec_shr16i_i64,
           .fniv = tcg_gen_shri_vec,
           .fno = gen_helper_gvec_shr16i,
-          .opc = INDEX_op_shri_vec,
+          .opt_opc = vecop_list,
           .vece = MO_16 },
         { .fni4 = tcg_gen_shri_i32,
           .fniv = tcg_gen_shri_vec,
           .fno = gen_helper_gvec_shr32i,
-          .opc = INDEX_op_shri_vec,
+          .opt_opc = vecop_list,
           .vece = MO_32 },
         { .fni8 = tcg_gen_shri_i64,
           .fniv = tcg_gen_shri_vec,
           .fno = gen_helper_gvec_shr64i,
-          .opc = INDEX_op_shri_vec,
+          .opt_opc = vecop_list,
           .prefer_i64 = TCG_TARGET_REG_BITS == 64,
           .vece = MO_64 },
     };
@@ -2488,26 +2583,27 @@ void tcg_gen_vec_sar16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c)
 void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t shift, uint32_t oprsz, uint32_t maxsz)
 {
+    static const TCGOpcode vecop_list[] = { INDEX_op_sari_vec, 0 };
     static const GVecGen2i g[4] = {
         { .fni8 = tcg_gen_vec_sar8i_i64,
           .fniv = tcg_gen_sari_vec,
           .fno = gen_helper_gvec_sar8i,
-          .opc = INDEX_op_sari_vec,
+          .opt_opc = vecop_list,
           .vece = MO_8 },
         { .fni8 = tcg_gen_vec_sar16i_i64,
           .fniv = tcg_gen_sari_vec,
           .fno = gen_helper_gvec_sar16i,
-          .opc = INDEX_op_sari_vec,
+          .opt_opc = vecop_list,
           .vece = MO_16 },
         { .fni4 = tcg_gen_sari_i32,
           .fniv = tcg_gen_sari_vec,
           .fno = gen_helper_gvec_sar32i,
-          .opc = INDEX_op_sari_vec,
+          .opt_opc = vecop_list,
           .vece = MO_32 },
         { .fni8 = tcg_gen_sari_i64,
           .fniv = tcg_gen_sari_vec,
           .fno = gen_helper_gvec_sar64i,
-          .opc = INDEX_op_sari_vec,
+          .opt_opc = vecop_list,
           .prefer_i64 = TCG_TARGET_REG_BITS == 64,
           .vece = MO_64 },
     };
@@ -2524,24 +2620,25 @@ void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_shlv(unsigned vece, uint32_t dofs, uint32_t aofs,
                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
+    static const TCGOpcode vecop_list[] = { INDEX_op_shlv_vec, 0 };
     static const GVecGen3 g[4] = {
         { .fniv = tcg_gen_shlv_vec,
           .fno = gen_helper_gvec_shl8v,
-          .opc = INDEX_op_shlv_vec,
+          .opt_opc = vecop_list,
           .vece = MO_8 },
         { .fniv = tcg_gen_shlv_vec,
           .fno = gen_helper_gvec_shl16v,
-          .opc = INDEX_op_shlv_vec,
+          .opt_opc = vecop_list,
           .vece = MO_16 },
         { .fni4 = tcg_gen_shl_i32,
           .fniv = tcg_gen_shlv_vec,
           .fno = gen_helper_gvec_shl32v,
-          .opc = INDEX_op_shlv_vec,
+          .opt_opc = vecop_list,
           .vece = MO_32 },
         { .fni8 = tcg_gen_shl_i64,
           .fniv = tcg_gen_shlv_vec,
           .fno = gen_helper_gvec_shl64v,
-          .opc = INDEX_op_shlv_vec,
+          .opt_opc = vecop_list,
           .prefer_i64 = TCG_TARGET_REG_BITS == 64,
           .vece = MO_64 },
     };
@@ -2553,24 +2650,25 @@ void tcg_gen_gvec_shlv(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_shrv(unsigned vece, uint32_t dofs, uint32_t aofs,
                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
+    static const TCGOpcode vecop_list[] = { INDEX_op_shrv_vec, 0 };
     static const GVecGen3 g[4] = {
         { .fniv = tcg_gen_shrv_vec,
           .fno = gen_helper_gvec_shr8v,
-          .opc = INDEX_op_shrv_vec,
+          .opt_opc = vecop_list,
           .vece = MO_8 },
         { .fniv = tcg_gen_shrv_vec,
           .fno = gen_helper_gvec_shr16v,
-          .opc = INDEX_op_shrv_vec,
+          .opt_opc = vecop_list,
           .vece = MO_16 },
         { .fni4 = tcg_gen_shr_i32,
           .fniv = tcg_gen_shrv_vec,
           .fno = gen_helper_gvec_shr32v,
-          .opc = INDEX_op_shrv_vec,
+          .opt_opc = vecop_list,
           .vece = MO_32 },
         { .fni8 = tcg_gen_shr_i64,
           .fniv = tcg_gen_shrv_vec,
           .fno = gen_helper_gvec_shr64v,
-          .opc = INDEX_op_shrv_vec,
+          .opt_opc = vecop_list,
           .prefer_i64 = TCG_TARGET_REG_BITS == 64,
           .vece = MO_64 },
     };
@@ -2582,24 +2680,25 @@ void tcg_gen_gvec_shrv(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_sarv(unsigned vece, uint32_t dofs, uint32_t aofs,
                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
+    static const TCGOpcode vecop_list[] = { INDEX_op_sarv_vec, 0 };
     static const GVecGen3 g[4] = {
         { .fniv = tcg_gen_sarv_vec,
           .fno = gen_helper_gvec_sar8v,
-          .opc = INDEX_op_sarv_vec,
+          .opt_opc = vecop_list,
           .vece = MO_8 },
         { .fniv = tcg_gen_sarv_vec,
           .fno = gen_helper_gvec_sar16v,
-          .opc = INDEX_op_sarv_vec,
+          .opt_opc = vecop_list,
           .vece = MO_16 },
         { .fni4 = tcg_gen_sar_i32,
           .fniv = tcg_gen_sarv_vec,
           .fno = gen_helper_gvec_sar32v,
-          .opc = INDEX_op_sarv_vec,
+          .opt_opc = vecop_list,
           .vece = MO_32 },
         { .fni8 = tcg_gen_sar_i64,
           .fniv = tcg_gen_sarv_vec,
           .fno = gen_helper_gvec_sar64v,
-          .opc = INDEX_op_sarv_vec,
+          .opt_opc = vecop_list,
           .prefer_i64 = TCG_TARGET_REG_BITS == 64,
           .vece = MO_64 },
     };
@@ -2667,6 +2766,7 @@ void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs,
                       uint32_t aofs, uint32_t bofs,
                       uint32_t oprsz, uint32_t maxsz)
 {
+    static const TCGOpcode cmp_list[] = { INDEX_op_cmp_vec, 0 };
     static gen_helper_gvec_3 * const eq_fn[4] = {
         gen_helper_gvec_eq8, gen_helper_gvec_eq16,
         gen_helper_gvec_eq32, gen_helper_gvec_eq64
@@ -2699,6 +2799,8 @@ void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs,
         [TCG_COND_LTU] = ltu_fn,
         [TCG_COND_LEU] = leu_fn,
     };
+
+    const TCGOpcode *hold_list;
     TCGType type;
     uint32_t some;
 
@@ -2711,10 +2813,12 @@ void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs,
         return;
     }
 
-    /* Implement inline with a vector type, if possible.
+    /*
+     * Implement inline with a vector type, if possible.
      * Prefer integer when 64-bit host and 64-bit comparison.
      */
-    type = choose_vector_type(INDEX_op_cmp_vec, vece, oprsz,
+    hold_list = tcg_swap_vecop_list(cmp_list);
+    type = choose_vector_type(cmp_list, vece, oprsz,
                               TCG_TARGET_REG_BITS == 64 && vece == MO_64);
     switch (type) {
     case TCG_TYPE_V256:
@@ -2756,13 +2860,14 @@ void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs,
                 assert(fn != NULL);
             }
             tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, 0, fn[vece]);
-            return;
+            oprsz = maxsz;
         }
         break;
 
     default:
         g_assert_not_reached();
     }
+    tcg_swap_vecop_list(hold_list);
 
     if (oprsz < maxsz) {
         expand_clr(dofs + oprsz, maxsz - oprsz);
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 6601cb8a8f..8d80e37f72 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -307,11 +307,14 @@ static bool do_op2(unsigned vece, TCGv_vec r, TCGv_vec a, TCGOpcode opc)
     int can;
 
     tcg_debug_assert(at->base_type >= type);
+    tcg_assert_listed_vecop(opc);
     can = tcg_can_emit_vec_op(opc, type, vece);
     if (can > 0) {
         vec_gen_2(opc, type, vece, ri, ai);
     } else if (can < 0) {
+        const TCGOpcode *hold_list = tcg_swap_vecop_list(NULL);
         tcg_expand_vec_op(opc, type, vece, ri, ai);
+        tcg_swap_vecop_list(hold_list);
     } else {
         return false;
     }
@@ -329,11 +332,17 @@ void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a)
 
 void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a)
 {
+    const TCGOpcode *hold_list;
+
+    tcg_assert_listed_vecop(INDEX_op_neg_vec);
+    hold_list = tcg_swap_vecop_list(NULL);
+
     if (!TCG_TARGET_HAS_neg_vec || !do_op2(vece, r, a, INDEX_op_neg_vec)) {
         TCGv_vec t = tcg_const_zeros_vec_matching(r);
         tcg_gen_sub_vec(vece, r, t, a);
         tcg_temp_free_vec(t);
     }
+    tcg_swap_vecop_list(hold_list);
 }
 
 static void do_shifti(TCGOpcode opc, unsigned vece,
@@ -348,6 +357,7 @@ static void do_shifti(TCGOpcode opc, unsigned vece,
 
     tcg_debug_assert(at->base_type == type);
     tcg_debug_assert(i >= 0 && i < (8 << vece));
+    tcg_assert_listed_vecop(opc);
 
     if (i == 0) {
         tcg_gen_mov_vec(r, a);
@@ -361,8 +371,10 @@ static void do_shifti(TCGOpcode opc, unsigned vece,
         /* We leave the choice of expansion via scalar or vector shift
            to the target.  Often, but not always, dupi can feed a vector
            shift easier than a scalar.  */
+        const TCGOpcode *hold_list = tcg_swap_vecop_list(NULL);
         tcg_debug_assert(can < 0);
         tcg_expand_vec_op(opc, type, vece, ri, ai, i);
+        tcg_swap_vecop_list(hold_list);
     }
 }
 
@@ -395,12 +407,15 @@ void tcg_gen_cmp_vec(TCGCond cond, unsigned vece,
 
     tcg_debug_assert(at->base_type >= type);
     tcg_debug_assert(bt->base_type >= type);
+    tcg_assert_listed_vecop(INDEX_op_cmp_vec);
     can = tcg_can_emit_vec_op(INDEX_op_cmp_vec, type, vece);
     if (can > 0) {
         vec_gen_4(INDEX_op_cmp_vec, type, vece, ri, ai, bi, cond);
     } else {
+        const TCGOpcode *hold_list = tcg_swap_vecop_list(NULL);
         tcg_debug_assert(can < 0);
         tcg_expand_vec_op(INDEX_op_cmp_vec, type, vece, ri, ai, bi, cond);
+        tcg_swap_vecop_list(hold_list);
     }
 }
 
@@ -418,12 +433,15 @@ static void do_op3(unsigned vece, TCGv_vec r, TCGv_vec a,
 
     tcg_debug_assert(at->base_type >= type);
     tcg_debug_assert(bt->base_type >= type);
+    tcg_assert_listed_vecop(opc);
     can = tcg_can_emit_vec_op(opc, type, vece);
     if (can > 0) {
         vec_gen_3(opc, type, vece, ri, ai, bi);
     } else {
+        const TCGOpcode *hold_list = tcg_swap_vecop_list(NULL);
         tcg_debug_assert(can < 0);
         tcg_expand_vec_op(opc, type, vece, ri, ai, bi);
+        tcg_swap_vecop_list(hold_list);
     }
 }
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 17/38] tcg: Add gvec expanders for vector shift by scalar
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (15 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 16/38] tcg: Specify optional vector requirements with a list Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-23 18:58   ` David Hildenbrand
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 18/38] tcg/i386: Support vector scalar shift opcodes Richard Henderson
                   ` (23 subsequent siblings)
  40 siblings, 1 reply; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op-gvec.h |   7 ++
 tcg/tcg-op.h      |   4 +
 tcg/tcg-op-gvec.c | 210 ++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg-op-vec.c  |  54 ++++++++++++
 4 files changed, 275 insertions(+)

diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index a0e0902f6c..f9c6058e92 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -318,6 +318,13 @@ void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t shift, uint32_t oprsz, uint32_t maxsz);
 
+void tcg_gen_gvec_shls(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_shrs(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_sars(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
+
 void tcg_gen_gvec_shlv(unsigned vece, uint32_t dofs, uint32_t aofs,
                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
 void tcg_gen_gvec_shrv(unsigned vece, uint32_t dofs, uint32_t aofs,
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 833c6330b5..472b73cb38 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -986,6 +986,10 @@ void tcg_gen_shli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 void tcg_gen_shri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 void tcg_gen_sari_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 
+void tcg_gen_shls_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
+void tcg_gen_shrs_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
+void tcg_gen_sars_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
+
 void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
 void tcg_gen_shrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
 void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 40858a83e0..4eb0747ddd 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -2617,6 +2617,216 @@ void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
     }
 }
 
+/*
+ * Specialized generation vector shifts by a non-constant scalar.
+ */
+
+static void expand_2sh_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
+                           uint32_t oprsz, uint32_t tysz, TCGType type,
+                           TCGv_i32 shift,
+                           void (*fni)(unsigned, TCGv_vec, TCGv_vec, TCGv_i32))
+{
+    TCGv_vec t0 = tcg_temp_new_vec(type);
+    uint32_t i;
+
+    for (i = 0; i < oprsz; i += tysz) {
+        tcg_gen_ld_vec(t0, cpu_env, aofs + i);
+        fni(vece, t0, t0, shift);
+        tcg_gen_st_vec(t0, cpu_env, dofs + i);
+    }
+    tcg_temp_free_vec(t0);
+}
+
+static void do_shifts(unsigned vece, uint32_t dofs, uint32_t aofs,
+                      TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz,
+                      void (*fni4)(TCGv_i32, TCGv_i32, TCGv_i32),
+                      void (*fni8)(TCGv_i64, TCGv_i64, TCGv_i64),
+                      void (*fniv_s)(unsigned, TCGv_vec, TCGv_vec, TCGv_i32),
+                      void (*fniv_v)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec),
+                      gen_helper_gvec_2 *fno,
+                      const TCGOpcode *s_list, const TCGOpcode *v_list)
+{
+    TCGType type;
+    uint32_t some;
+
+    check_size_align(oprsz, maxsz, dofs | aofs);
+    check_overlap_2(dofs, aofs, maxsz);
+
+    /* If the backend has a scalar expansion, great.  */
+    type = choose_vector_type(s_list, vece, oprsz, vece == MO_64);
+    if (type) {
+        const TCGOpcode *hold_list = tcg_swap_vecop_list(NULL);
+        switch (type) {
+        case TCG_TYPE_V256:
+            some = QEMU_ALIGN_DOWN(oprsz, 32);
+            expand_2sh_vec(vece, dofs, aofs, some, 32,
+                           TCG_TYPE_V256, shift, fniv_s);
+            if (some == oprsz) {
+                break;
+            }
+            dofs += some;
+            aofs += some;
+            oprsz -= some;
+            maxsz -= some;
+            /* fallthru */
+        case TCG_TYPE_V128:
+            expand_2sh_vec(vece, dofs, aofs, oprsz, 16,
+                           TCG_TYPE_V128, shift, fniv_s);
+            break;
+        case TCG_TYPE_V64:
+            expand_2sh_vec(vece, dofs, aofs, oprsz, 8,
+                           TCG_TYPE_V64, shift, fniv_s);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        tcg_swap_vecop_list(hold_list);
+        goto clear_tail;
+    }
+
+    /* If the backend supports variable vector shifts, also cool.  */
+    type = choose_vector_type(v_list, vece, oprsz, vece == MO_64);
+    if (type) {
+        const TCGOpcode *hold_list = tcg_swap_vecop_list(NULL);
+        TCGv_vec v_shift = tcg_temp_new_vec(type);
+
+        if (vece == MO_64) {
+            TCGv_i64 sh64 = tcg_temp_new_i64();
+            tcg_gen_extu_i32_i64(sh64, shift);
+            tcg_gen_dup_i64_vec(MO_64, v_shift, sh64);
+            tcg_temp_free_i64(sh64);
+        } else {
+            tcg_gen_dup_i32_vec(vece, v_shift, shift);
+        }
+
+        switch (type) {
+        case TCG_TYPE_V256:
+            some = QEMU_ALIGN_DOWN(oprsz, 32);
+            expand_2s_vec(vece, dofs, aofs, some, 32, TCG_TYPE_V256,
+                          v_shift, false, fniv_v);
+            if (some == oprsz) {
+                break;
+            }
+            dofs += some;
+            aofs += some;
+            oprsz -= some;
+            maxsz -= some;
+            /* fallthru */
+        case TCG_TYPE_V128:
+            expand_2s_vec(vece, dofs, aofs, oprsz, 16, TCG_TYPE_V128,
+                          v_shift, false, fniv_v);
+            break;
+        case TCG_TYPE_V64:
+            expand_2s_vec(vece, dofs, aofs, oprsz, 8, TCG_TYPE_V64,
+                          v_shift, false, fniv_v);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        tcg_temp_free_vec(v_shift);
+        tcg_swap_vecop_list(hold_list);
+        goto clear_tail;
+    }
+
+    /* Otherwise fall back to integral... */
+    if (fni8) {
+        TCGv_i64 sh64 = tcg_temp_new_i64();
+        tcg_gen_extu_i32_i64(sh64, shift);
+        expand_2s_i64(dofs, aofs, oprsz, sh64, false, fni8);
+        tcg_temp_free_i64(sh64);
+        goto clear_tail;
+    }
+    if (fni4) {
+        expand_2s_i32(dofs, aofs, oprsz, shift, false, fni4);
+        goto clear_tail;
+    }
+
+    /* Otherwise fall back to out of line.  */
+    tcg_debug_assert(fno);
+    {
+        TCGv_ptr a0 = tcg_temp_new_ptr();
+        TCGv_ptr a1 = tcg_temp_new_ptr();
+        TCGv_i32 desc = tcg_temp_new_i32();
+
+        tcg_gen_shli_i32(desc, shift, SIMD_DATA_SHIFT);
+        tcg_gen_ori_i32(desc, desc, simd_desc(oprsz, maxsz, 0));
+        tcg_gen_addi_ptr(a0, cpu_env, dofs);
+        tcg_gen_addi_ptr(a1, cpu_env, aofs);
+
+        fno(a0, a1, desc);
+
+        tcg_temp_free_ptr(a0);
+        tcg_temp_free_ptr(a1);
+        tcg_temp_free_i32(desc);
+        return;
+    }
+
+ clear_tail:
+    if (oprsz < maxsz) {
+        expand_clr(dofs + oprsz, maxsz - oprsz);
+    }
+}
+
+void tcg_gen_gvec_shls(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode scalar_list[] = { INDEX_op_shls_vec, 0 };
+    static const TCGOpcode vector_list[] = { INDEX_op_shlv_vec, 0 };
+    static gen_helper_gvec_2 * const fno[4] = {
+        gen_helper_gvec_shl8i,
+        gen_helper_gvec_shl16i,
+        gen_helper_gvec_shl32i,
+        gen_helper_gvec_shl64i,
+    };
+
+    tcg_debug_assert(vece <= MO_64);
+    do_shifts(vece, dofs, aofs, shift, oprsz, maxsz,
+              vece == MO_32 ? tcg_gen_shl_i32 : NULL,
+              vece == MO_64 ? tcg_gen_shl_i64 : NULL,
+              tcg_gen_shls_vec, tcg_gen_shlv_vec, fno[vece],
+              scalar_list, vector_list);
+}
+
+void tcg_gen_gvec_shrs(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode scalar_list[] = { INDEX_op_shrs_vec, 0 };
+    static const TCGOpcode vector_list[] = { INDEX_op_shrv_vec, 0 };
+    static gen_helper_gvec_2 * const fno[4] = {
+        gen_helper_gvec_shr8i,
+        gen_helper_gvec_shr16i,
+        gen_helper_gvec_shr32i,
+        gen_helper_gvec_shr64i,
+    };
+
+    tcg_debug_assert(vece <= MO_64);
+    do_shifts(vece, dofs, aofs, shift, oprsz, maxsz,
+              vece == MO_32 ? tcg_gen_shr_i32 : NULL,
+              vece == MO_64 ? tcg_gen_shr_i64 : NULL,
+              tcg_gen_shrs_vec, tcg_gen_shrv_vec, fno[vece],
+              scalar_list, vector_list);
+}
+
+void tcg_gen_gvec_sars(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode scalar_list[] = { INDEX_op_sars_vec, 0 };
+    static const TCGOpcode vector_list[] = { INDEX_op_sarv_vec, 0 };
+    static gen_helper_gvec_2 * const fno[4] = {
+        gen_helper_gvec_sar8i,
+        gen_helper_gvec_sar16i,
+        gen_helper_gvec_sar32i,
+        gen_helper_gvec_sar64i,
+    };
+
+    tcg_debug_assert(vece <= MO_64);
+    do_shifts(vece, dofs, aofs, shift, oprsz, maxsz,
+              vece == MO_32 ? tcg_gen_sar_i32 : NULL,
+              vece == MO_64 ? tcg_gen_sar_i64 : NULL,
+              tcg_gen_sars_vec, tcg_gen_sarv_vec, fno[vece],
+              scalar_list, vector_list);
+}
+
 void tcg_gen_gvec_shlv(unsigned vece, uint32_t dofs, uint32_t aofs,
                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 8d80e37f72..5ec8e0b1a0 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -514,3 +514,57 @@ void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
     do_op3(vece, r, a, b, INDEX_op_sarv_vec);
 }
+
+static void do_shifts(unsigned vece, TCGv_vec r, TCGv_vec a,
+                      TCGv_i32 s, TCGOpcode opc_s, TCGOpcode opc_v)
+{
+    TCGTemp *rt = tcgv_vec_temp(r);
+    TCGTemp *at = tcgv_vec_temp(a);
+    TCGTemp *st = tcgv_i32_temp(s);
+    TCGArg ri = temp_arg(rt);
+    TCGArg ai = temp_arg(at);
+    TCGArg si = temp_arg(st);
+    TCGType type = rt->base_type;
+    const TCGOpcode *hold_list;
+    int can;
+
+    tcg_debug_assert(at->base_type >= type);
+    tcg_assert_listed_vecop(opc_s);
+    hold_list = tcg_swap_vecop_list(NULL);
+
+    can = tcg_can_emit_vec_op(opc_s, type, vece);
+    if (can > 0) {
+        vec_gen_3(opc_s, type, vece, ri, ai, si);
+    } else if (can < 0) {
+        tcg_expand_vec_op(opc_s, type, vece, ri, ai, si);
+    } else {
+        TCGv_vec vec_s = tcg_temp_new_vec(type);
+
+        if (vece == MO_64) {
+            TCGv_i64 s64 = tcg_temp_new_i64();
+            tcg_gen_extu_i32_i64(s64, s);
+            tcg_gen_dup_i64_vec(MO_64, vec_s, s64);
+            tcg_temp_free_i64(s64);
+        } else {
+            tcg_gen_dup_i32_vec(vece, vec_s, s);
+        }
+        do_op3(vece, r, a, vec_s, opc_v);
+        tcg_temp_free_vec(vec_s);
+    }
+    tcg_swap_vecop_list(hold_list);
+}
+
+void tcg_gen_shls_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 b)
+{
+    do_shifts(vece, r, a, b, INDEX_op_shls_vec, INDEX_op_shlv_vec);
+}
+
+void tcg_gen_shrs_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 b)
+{
+    do_shifts(vece, r, a, b, INDEX_op_shrs_vec, INDEX_op_shrv_vec);
+}
+
+void tcg_gen_sars_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 b)
+{
+    do_shifts(vece, r, a, b, INDEX_op_sars_vec, INDEX_op_sarv_vec);
+}
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 18/38] tcg/i386: Support vector scalar shift opcodes
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (16 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 17/38] tcg: Add gvec expanders for vector shift by scalar Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 19/38] tcg: Add support for integer absolute value Richard Henderson
                   ` (22 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.h     |  2 +-
 tcg/i386/tcg-target.inc.c | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index b240633455..618aa520d2 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -183,7 +183,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_not_vec          0
 #define TCG_TARGET_HAS_neg_vec          0
 #define TCG_TARGET_HAS_shi_vec          1
-#define TCG_TARGET_HAS_shs_vec          0
+#define TCG_TARGET_HAS_shs_vec          1
 #define TCG_TARGET_HAS_shv_vec          have_avx2
 #define TCG_TARGET_HAS_cmp_vec          1
 #define TCG_TARGET_HAS_mul_vec          1
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 04e609c7b2..85b68e4326 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -420,6 +420,14 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_PSHIFTW_Ib  (0x71 | P_EXT | P_DATA16) /* /2 /6 /4 */
 #define OPC_PSHIFTD_Ib  (0x72 | P_EXT | P_DATA16) /* /2 /6 /4 */
 #define OPC_PSHIFTQ_Ib  (0x73 | P_EXT | P_DATA16) /* /2 /6 /4 */
+#define OPC_PSLLW       (0xf1 | P_EXT | P_DATA16)
+#define OPC_PSLLD       (0xf2 | P_EXT | P_DATA16)
+#define OPC_PSLLQ       (0xf3 | P_EXT | P_DATA16)
+#define OPC_PSRAW       (0xe1 | P_EXT | P_DATA16)
+#define OPC_PSRAD       (0xe2 | P_EXT | P_DATA16)
+#define OPC_PSRLW       (0xd1 | P_EXT | P_DATA16)
+#define OPC_PSRLD       (0xd2 | P_EXT | P_DATA16)
+#define OPC_PSRLQ       (0xd3 | P_EXT | P_DATA16)
 #define OPC_PSUBB       (0xf8 | P_EXT | P_DATA16)
 #define OPC_PSUBW       (0xf9 | P_EXT | P_DATA16)
 #define OPC_PSUBD       (0xfa | P_EXT | P_DATA16)
@@ -2722,6 +2730,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         /* TODO: AVX512 adds support for MO_16, MO_64.  */
         OPC_UD2, OPC_UD2, OPC_VPSRAVD, OPC_UD2
     };
+    static int const shls_insn[4] = {
+        OPC_UD2, OPC_PSLLW, OPC_PSLLD, OPC_PSLLQ
+    };
+    static int const shrs_insn[4] = {
+        OPC_UD2, OPC_PSRLW, OPC_PSRLD, OPC_PSRLQ
+    };
+    static int const sars_insn[4] = {
+        OPC_UD2, OPC_PSRAW, OPC_PSRAD, OPC_UD2
+    };
 
     TCGType type = vecl + TCG_TYPE_V64;
     int insn, sub;
@@ -2783,6 +2800,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_sarv_vec:
         insn = sarv_insn[vece];
         goto gen_simd;
+    case INDEX_op_shls_vec:
+        insn = shls_insn[vece];
+        goto gen_simd;
+    case INDEX_op_shrs_vec:
+        insn = shrs_insn[vece];
+        goto gen_simd;
+    case INDEX_op_sars_vec:
+        insn = sars_insn[vece];
+        goto gen_simd;
     case INDEX_op_x86_punpckl_vec:
         insn = punpckl_insn[vece];
         goto gen_simd;
@@ -3163,6 +3189,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_shlv_vec:
     case INDEX_op_shrv_vec:
     case INDEX_op_sarv_vec:
+    case INDEX_op_shls_vec:
+    case INDEX_op_shrs_vec:
+    case INDEX_op_sars_vec:
     case INDEX_op_cmp_vec:
     case INDEX_op_x86_shufps_vec:
     case INDEX_op_x86_blend_vec:
@@ -3220,6 +3249,12 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
         }
         return 1;
 
+    case INDEX_op_shls_vec:
+    case INDEX_op_shrs_vec:
+        return vece >= MO_16;
+    case INDEX_op_sars_vec:
+        return vece >= MO_16 && vece <= MO_32;
+
     case INDEX_op_shlv_vec:
     case INDEX_op_shrv_vec:
         return have_avx2 && vece >= MO_32;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 19/38] tcg: Add support for integer absolute value
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (17 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 18/38] tcg/i386: Support vector scalar shift opcodes Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-23  8:52   ` Philippe Mathieu-Daudé
  2019-04-23 18:37   ` David Hildenbrand
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 20/38] tcg: Add support for vector " Richard Henderson
                   ` (21 subsequent siblings)
  40 siblings, 2 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Remove a function of the same name from target/arm/.
Use a branchless implementation of abs that gcc uses for x86.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op.h           |  5 +++++
 target/arm/translate.c | 10 ----------
 tcg/tcg-op.c           | 20 ++++++++++++++++++++
 3 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 472b73cb38..660fe205d0 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -335,6 +335,7 @@ void tcg_gen_smin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_smax_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_umin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_umax_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
+void tcg_gen_abs_i32(TCGv_i32, TCGv_i32);
 
 static inline void tcg_gen_discard_i32(TCGv_i32 arg)
 {
@@ -534,6 +535,7 @@ void tcg_gen_smin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_smax_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_umin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_umax_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
+void tcg_gen_abs_i64(TCGv_i64, TCGv_i64);
 
 #if TCG_TARGET_REG_BITS == 64
 static inline void tcg_gen_discard_i64(TCGv_i64 arg)
@@ -973,6 +975,7 @@ void tcg_gen_nor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_eqv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
 void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
+void tcg_gen_abs_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
 void tcg_gen_ssadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_usadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_sssub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
@@ -1019,6 +1022,7 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
 #define tcg_gen_addi_tl tcg_gen_addi_i64
 #define tcg_gen_sub_tl tcg_gen_sub_i64
 #define tcg_gen_neg_tl tcg_gen_neg_i64
+#define tcg_gen_abs_tl tcg_gen_abs_i64
 #define tcg_gen_subfi_tl tcg_gen_subfi_i64
 #define tcg_gen_subi_tl tcg_gen_subi_i64
 #define tcg_gen_and_tl tcg_gen_and_i64
@@ -1131,6 +1135,7 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
 #define tcg_gen_addi_tl tcg_gen_addi_i32
 #define tcg_gen_sub_tl tcg_gen_sub_i32
 #define tcg_gen_neg_tl tcg_gen_neg_i32
+#define tcg_gen_abs_tl tcg_gen_abs_i32
 #define tcg_gen_subfi_tl tcg_gen_subfi_i32
 #define tcg_gen_subi_tl tcg_gen_subi_i32
 #define tcg_gen_and_tl tcg_gen_and_i32
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 83a008e945..721171794d 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -603,16 +603,6 @@ static void gen_sar(TCGv_i32 dest, TCGv_i32 t0, TCGv_i32 t1)
     tcg_temp_free_i32(tmp1);
 }
 
-static void tcg_gen_abs_i32(TCGv_i32 dest, TCGv_i32 src)
-{
-    TCGv_i32 c0 = tcg_const_i32(0);
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_neg_i32(tmp, src);
-    tcg_gen_movcond_i32(TCG_COND_GT, dest, src, c0, src, tmp);
-    tcg_temp_free_i32(c0);
-    tcg_temp_free_i32(tmp);
-}
-
 static void shifter_out_im(TCGv_i32 var, int shift)
 {
     if (shift == 0) {
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index a00d1df37e..0ac291f1c4 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1091,6 +1091,16 @@ void tcg_gen_umax_i32(TCGv_i32 ret, TCGv_i32 a, TCGv_i32 b)
     tcg_gen_movcond_i32(TCG_COND_LTU, ret, a, b, b, a);
 }
 
+void tcg_gen_abs_i32(TCGv_i32 ret, TCGv_i32 a)
+{
+    TCGv_i32 t = tcg_temp_new_i32();
+
+    tcg_gen_sari_i32(t, a, 31);
+    tcg_gen_xor_i32(ret, a, t);
+    tcg_gen_sub_i32(ret, ret, t);
+    tcg_temp_free_i32(t);
+}
+
 /* 64-bit ops */
 
 #if TCG_TARGET_REG_BITS == 32
@@ -2548,6 +2558,16 @@ void tcg_gen_umax_i64(TCGv_i64 ret, TCGv_i64 a, TCGv_i64 b)
     tcg_gen_movcond_i64(TCG_COND_LTU, ret, a, b, b, a);
 }
 
+void tcg_gen_abs_i64(TCGv_i64 ret, TCGv_i64 a)
+{
+    TCGv_i64 t = tcg_temp_new_i64();
+
+    tcg_gen_sari_i64(t, a, 63);
+    tcg_gen_xor_i64(ret, a, t);
+    tcg_gen_sub_i64(ret, ret, t);
+    tcg_temp_free_i64(t);
+}
+
 /* Size changing operations.  */
 
 void tcg_gen_extrl_i64_i32(TCGv_i32 ret, TCGv_i64 arg)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 20/38] tcg: Add support for vector absolute value
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (18 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 19/38] tcg: Add support for integer absolute value Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-23 18:35   ` David Hildenbrand
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 21/38] target/arm: Use tcg_gen_abs_i64 and tcg_gen_gvec_abs Richard Henderson
                   ` (20 subsequent siblings)
  40 siblings, 1 reply; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/tcg-runtime.h      |  5 +++
 tcg/aarch64/tcg-target.h     |  1 +
 tcg/i386/tcg-target.h        |  1 +
 tcg/tcg-op-gvec.h            |  2 +
 tcg/tcg-opc.h                |  1 +
 tcg/tcg.h                    |  1 +
 accel/tcg/tcg-runtime-gvec.c | 48 ++++++++++++++++++++++++
 tcg/tcg-op-gvec.c            | 71 ++++++++++++++++++++++++++++++++++++
 tcg/tcg-op-vec.c             | 31 ++++++++++++++++
 tcg/tcg.c                    |  2 +
 tcg/README                   |  4 ++
 11 files changed, 167 insertions(+)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index ed3ce5fd91..6d73dc2d65 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -225,6 +225,11 @@ DEF_HELPER_FLAGS_3(gvec_neg16, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(gvec_neg32, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(gvec_neg64, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_3(gvec_abs8, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_abs16, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_abs32, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_abs64, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_3(gvec_not, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_and, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_or, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index f5640a229b..21d06d928c 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -132,6 +132,7 @@ typedef enum {
 #define TCG_TARGET_HAS_orc_vec          1
 #define TCG_TARGET_HAS_not_vec          1
 #define TCG_TARGET_HAS_neg_vec          1
+#define TCG_TARGET_HAS_abs_vec          0
 #define TCG_TARGET_HAS_shi_vec          1
 #define TCG_TARGET_HAS_shs_vec          0
 #define TCG_TARGET_HAS_shv_vec          1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 618aa520d2..7445f05885 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -182,6 +182,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_orc_vec          0
 #define TCG_TARGET_HAS_not_vec          0
 #define TCG_TARGET_HAS_neg_vec          0
+#define TCG_TARGET_HAS_abs_vec          0
 #define TCG_TARGET_HAS_shi_vec          1
 #define TCG_TARGET_HAS_shs_vec          1
 #define TCG_TARGET_HAS_shv_vec          have_avx2
diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index f9c6058e92..46f58febbf 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -228,6 +228,8 @@ void tcg_gen_gvec_not(unsigned vece, uint32_t dofs, uint32_t aofs,
                       uint32_t oprsz, uint32_t maxsz);
 void tcg_gen_gvec_neg(unsigned vece, uint32_t dofs, uint32_t aofs,
                       uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_abs(unsigned vece, uint32_t dofs, uint32_t aofs,
+                      uint32_t oprsz, uint32_t maxsz);
 
 void tcg_gen_gvec_add(unsigned vece, uint32_t dofs, uint32_t aofs,
                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 4bf71f261f..4a2dd116eb 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -225,6 +225,7 @@ DEF(add_vec, 1, 2, 0, IMPLVEC)
 DEF(sub_vec, 1, 2, 0, IMPLVEC)
 DEF(mul_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_mul_vec))
 DEF(neg_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_neg_vec))
+DEF(abs_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_abs_vec))
 DEF(ssadd_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_sat_vec))
 DEF(usadd_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_sat_vec))
 DEF(sssub_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_sat_vec))
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 48d4d2e03e..986055fdfa 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -176,6 +176,7 @@ typedef uint64_t TCGRegSet;
     && !defined(TCG_TARGET_HAS_v128) \
     && !defined(TCG_TARGET_HAS_v256)
 #define TCG_TARGET_MAYBE_vec            0
+#define TCG_TARGET_HAS_abs_vec          0
 #define TCG_TARGET_HAS_neg_vec          0
 #define TCG_TARGET_HAS_not_vec          0
 #define TCG_TARGET_HAS_andc_vec         0
diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
index 7b88f5590c..dd08095773 100644
--- a/accel/tcg/tcg-runtime-gvec.c
+++ b/accel/tcg/tcg-runtime-gvec.c
@@ -398,6 +398,54 @@ void HELPER(gvec_neg64)(void *d, void *a, uint32_t desc)
     clear_high(d, oprsz, desc);
 }
 
+void HELPER(gvec_abs8)(void *d, void *a, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(int8_t)) {
+        int8_t aa = *(int8_t *)(a + i);
+        *(int8_t *)(d + i) = aa < 0 ? -aa : aa;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_abs16)(void *d, void *a, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(int16_t)) {
+        int16_t aa = *(int16_t *)(a + i);
+        *(int16_t *)(d + i) = aa < 0 ? -aa : aa;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_abs32)(void *d, void *a, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(int32_t)) {
+        int32_t aa = *(int32_t *)(a + i);
+        *(int32_t *)(d + i) = aa < 0 ? -aa : aa;
+    }
+    clear_high(d, oprsz, desc);
+}
+
+void HELPER(gvec_abs64)(void *d, void *a, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(int64_t)) {
+        int64_t aa = *(int64_t *)(a + i);
+        *(int64_t *)(d + i) = aa < 0 ? -aa : aa;
+    }
+    clear_high(d, oprsz, desc);
+}
+
 void HELPER(gvec_mov)(void *d, void *a, uint32_t desc)
 {
     intptr_t oprsz = simd_oprsz(desc);
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 4eb0747ddd..87d5a01cc9 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -88,6 +88,14 @@ static bool tcg_can_emit_vecop_list(const TCGOpcode *list,
                 continue;
             }
             break;
+        case INDEX_op_abs_vec:
+            if (tcg_can_emit_vec_op(INDEX_op_sub_vec, type, vece)
+                && (tcg_can_emit_vec_op(INDEX_op_smax_vec, type, vece) > 0
+                    || tcg_can_emit_vec_op(INDEX_op_sari_vec, type, vece) > 0
+                    || tcg_can_emit_vec_op(INDEX_op_cmp_vec, type, vece))) {
+                continue;
+            }
+            break;
         default:
             break;
         }
@@ -2239,6 +2247,69 @@ void tcg_gen_gvec_neg(unsigned vece, uint32_t dofs, uint32_t aofs,
     tcg_gen_gvec_2(dofs, aofs, oprsz, maxsz, &g[vece]);
 }
 
+static void gen_absv_mask(TCGv_i64 d, TCGv_i64 b, unsigned vece)
+{
+    TCGv_i64 t = tcg_temp_new_i64();
+    int nbit = 8 << vece;
+
+    /* Create -1 for each negative element.  */
+    tcg_gen_shri_i64(t, b, nbit - 1);
+    tcg_gen_andi_i64(t, t, dup_const(vece, 1));
+    tcg_gen_muli_i64(t, t, (1 << nbit) - 1);
+
+    /*
+     * Invert (via xor -1) and add one (via sub -1).
+     * Because of the ordering the msb is cleared,
+     * so we never have carry into the next element.
+     */
+    tcg_gen_xor_i64(d, b, t);
+    tcg_gen_sub_i64(d, d, t);
+
+    tcg_temp_free_i64(t);
+}
+
+static void tcg_gen_vec_abs8_i64(TCGv_i64 d, TCGv_i64 b)
+{
+    gen_absv_mask(d, b, MO_8);
+}
+
+static void tcg_gen_vec_abs16_i64(TCGv_i64 d, TCGv_i64 b)
+{
+    gen_absv_mask(d, b, MO_16);
+}
+
+void tcg_gen_gvec_abs(unsigned vece, uint32_t dofs, uint32_t aofs,
+                      uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode vecop_list[] = { INDEX_op_abs_vec, 0 };
+    static const GVecGen2 g[4] = {
+        { .fni8 = tcg_gen_vec_abs8_i64,
+          .fniv = tcg_gen_abs_vec,
+          .fno = gen_helper_gvec_abs8,
+          .opt_opc = vecop_list,
+          .vece = MO_8 },
+        { .fni8 = tcg_gen_vec_abs16_i64,
+          .fniv = tcg_gen_abs_vec,
+          .fno = gen_helper_gvec_abs16,
+          .opt_opc = vecop_list,
+          .vece = MO_16 },
+        { .fni4 = tcg_gen_abs_i32,
+          .fniv = tcg_gen_abs_vec,
+          .fno = gen_helper_gvec_abs32,
+          .opt_opc = vecop_list,
+          .vece = MO_32 },
+        { .fni8 = tcg_gen_abs_i64,
+          .fniv = tcg_gen_abs_vec,
+          .fno = gen_helper_gvec_abs64,
+          .opt_opc = vecop_list,
+          .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+          .vece = MO_64 },
+    };
+
+    tcg_debug_assert(vece <= MO_64);
+    tcg_gen_gvec_2(dofs, aofs, oprsz, maxsz, &g[vece]);
+}
+
 void tcg_gen_gvec_and(unsigned vece, uint32_t dofs, uint32_t aofs,
                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
 {
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 5ec8e0b1a0..b7f21145bb 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -345,6 +345,37 @@ void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a)
     tcg_swap_vecop_list(hold_list);
 }
 
+void tcg_gen_abs_vec(unsigned vece, TCGv_vec r, TCGv_vec a)
+{
+    const TCGOpcode *hold_list;
+
+    tcg_assert_listed_vecop(INDEX_op_abs_vec);
+    hold_list = tcg_swap_vecop_list(NULL);
+
+    if (!do_op2(vece, r, a, INDEX_op_abs_vec)) {
+        TCGType type = tcgv_vec_temp(r)->base_type;
+        TCGv_vec t = tcg_temp_new_vec(type);
+
+        tcg_debug_assert(tcg_can_emit_vec_op(INDEX_op_sub_vec, type, vece));
+        if (tcg_can_emit_vec_op(INDEX_op_smax_vec, type, vece) > 0) {
+            tcg_gen_neg_vec(vece, t, a);
+            tcg_gen_smax_vec(vece, r, a, t);
+        } else {
+            if (tcg_can_emit_vec_op(INDEX_op_sari_vec, type, vece) > 0) {
+                tcg_gen_sari_vec(vece, t, a, (8 << vece) - 1);
+            } else {
+                do_dupi_vec(t, MO_REG, 0);
+                tcg_gen_cmp_vec(TCG_COND_LT, vece, t, a, t);
+            }
+            tcg_gen_xor_vec(vece, r, a, t);
+            tcg_gen_sub_vec(vece, r, r, t);
+        }
+
+        tcg_temp_free_vec(t);
+    }
+    tcg_swap_vecop_list(hold_list);
+}
+
 static void do_shifti(TCGOpcode opc, unsigned vece,
                       TCGv_vec r, TCGv_vec a, int64_t i)
 {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 0b0b228bb5..86a95a636b 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1621,6 +1621,8 @@ bool tcg_op_supported(TCGOpcode op)
         return have_vec && TCG_TARGET_HAS_not_vec;
     case INDEX_op_neg_vec:
         return have_vec && TCG_TARGET_HAS_neg_vec;
+    case INDEX_op_abs_vec:
+        return have_vec && TCG_TARGET_HAS_abs_vec;
     case INDEX_op_andc_vec:
         return have_vec && TCG_TARGET_HAS_andc_vec;
     case INDEX_op_orc_vec:
diff --git a/tcg/README b/tcg/README
index c30e5418a6..cbdfd3b6bc 100644
--- a/tcg/README
+++ b/tcg/README
@@ -561,6 +561,10 @@ E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32.
 
   Similarly, v0 = -v1.
 
+* abs_vec   v0, v1
+
+  Similarly, v0 = v1 < 0 ? -v1 : v1, in elements across the vector.
+
 * smin_vec:
 * umin_vec:
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 21/38] target/arm: Use tcg_gen_abs_i64 and tcg_gen_gvec_abs
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (19 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 20/38] tcg: Add support for vector " Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 22/38] target/cris: Use tcg_gen_abs_tl Richard Henderson
                   ` (19 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h        |  2 --
 target/arm/neon_helper.c   |  5 -----
 target/arm/translate-a64.c | 41 +++++---------------------------------
 target/arm/translate.c     | 11 +++-------
 4 files changed, 8 insertions(+), 51 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index a09566f795..3d90b5be66 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -347,8 +347,6 @@ DEF_HELPER_2(neon_ceq_u8, i32, i32, i32)
 DEF_HELPER_2(neon_ceq_u16, i32, i32, i32)
 DEF_HELPER_2(neon_ceq_u32, i32, i32, i32)
 
-DEF_HELPER_1(neon_abs_s8, i32, i32)
-DEF_HELPER_1(neon_abs_s16, i32, i32)
 DEF_HELPER_1(neon_clz_u8, i32, i32)
 DEF_HELPER_1(neon_clz_u16, i32, i32)
 DEF_HELPER_1(neon_cls_s8, i32, i32)
diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
index ed1c6fc41c..4259056723 100644
--- a/target/arm/neon_helper.c
+++ b/target/arm/neon_helper.c
@@ -1228,11 +1228,6 @@ NEON_VOP(ceq_u16, neon_u16, 2)
 NEON_VOP(ceq_u32, neon_u32, 1)
 #undef NEON_FN
 
-#define NEON_FN(dest, src, dummy) dest = (src < 0) ? -src : src
-NEON_VOP1(abs_s8, neon_s8, 4)
-NEON_VOP1(abs_s16, neon_s16, 2)
-#undef NEON_FN
-
 /* Count Leading Sign/Zero Bits.  */
 static inline int do_clz8(uint8_t x)
 {
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index dcdeb80176..fd8921565e 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -9468,11 +9468,7 @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u,
         if (u) {
             tcg_gen_neg_i64(tcg_rd, tcg_rn);
         } else {
-            TCGv_i64 tcg_zero = tcg_const_i64(0);
-            tcg_gen_neg_i64(tcg_rd, tcg_rn);
-            tcg_gen_movcond_i64(TCG_COND_GT, tcg_rd, tcg_rn, tcg_zero,
-                                tcg_rn, tcg_rd);
-            tcg_temp_free_i64(tcg_zero);
+            tcg_gen_abs_i64(tcg_rd, tcg_rn);
         }
         break;
     case 0x2f: /* FABS */
@@ -12366,11 +12362,12 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
         }
         break;
     case 0xb:
-        if (u) { /* NEG */
+        if (u) { /* ABS, NEG */
             gen_gvec_fn2(s, is_q, rd, rn, tcg_gen_gvec_neg, size);
-            return;
+        } else {
+            gen_gvec_fn2(s, is_q, rd, rn, tcg_gen_gvec_abs, size);
         }
-        break;
+        return;
     }
 
     if (size == 3) {
@@ -12438,17 +12435,6 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
                         gen_helper_neon_qabs_s32(tcg_res, cpu_env, tcg_op);
                     }
                     break;
-                case 0xb: /* ABS, NEG */
-                    if (u) {
-                        tcg_gen_neg_i32(tcg_res, tcg_op);
-                    } else {
-                        TCGv_i32 tcg_zero = tcg_const_i32(0);
-                        tcg_gen_neg_i32(tcg_res, tcg_op);
-                        tcg_gen_movcond_i32(TCG_COND_GT, tcg_res, tcg_op,
-                                            tcg_zero, tcg_op, tcg_res);
-                        tcg_temp_free_i32(tcg_zero);
-                    }
-                    break;
                 case 0x2f: /* FABS */
                     gen_helper_vfp_abss(tcg_res, tcg_op);
                     break;
@@ -12561,23 +12547,6 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
                     tcg_temp_free_i32(tcg_zero);
                     break;
                 }
-                case 0xb: /* ABS, NEG */
-                    if (u) {
-                        TCGv_i32 tcg_zero = tcg_const_i32(0);
-                        if (size) {
-                            gen_helper_neon_sub_u16(tcg_res, tcg_zero, tcg_op);
-                        } else {
-                            gen_helper_neon_sub_u8(tcg_res, tcg_zero, tcg_op);
-                        }
-                        tcg_temp_free_i32(tcg_zero);
-                    } else {
-                        if (size) {
-                            gen_helper_neon_abs_s16(tcg_res, tcg_op);
-                        } else {
-                            gen_helper_neon_abs_s8(tcg_res, tcg_op);
-                        }
-                    }
-                    break;
                 case 0x4: /* CLS, CLZ */
                     if (u) {
                         if (size == 0) {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 721171794d..911ad0bdab 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -8031,6 +8031,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 case NEON_2RM_VNEG:
                     tcg_gen_gvec_neg(size, rd_ofs, rm_ofs, vec_size, vec_size);
                     break;
+                case NEON_2RM_VABS:
+                    tcg_gen_gvec_abs(size, rd_ofs, rm_ofs, vec_size, vec_size);
+                    break;
 
                 default:
                 elementwise:
@@ -8136,14 +8139,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                             }
                             tcg_temp_free_i32(tmp2);
                             break;
-                        case NEON_2RM_VABS:
-                            switch(size) {
-                            case 0: gen_helper_neon_abs_s8(tmp, tmp); break;
-                            case 1: gen_helper_neon_abs_s16(tmp, tmp); break;
-                            case 2: tcg_gen_abs_i32(tmp, tmp); break;
-                            default: abort();
-                            }
-                            break;
                         case NEON_2RM_VCGT0_F:
                         {
                             TCGv_ptr fpstatus = get_fpstatus_ptr(1);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 22/38] target/cris: Use tcg_gen_abs_tl
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (20 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 21/38] target/arm: Use tcg_gen_abs_i64 and tcg_gen_gvec_abs Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-23 10:09   ` Philippe Mathieu-Daudé
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 23/38] target/ppc: " Richard Henderson
                   ` (18 subsequent siblings)
  40 siblings, 1 reply; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/cris/translate.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index 11b2c11174..0374718f66 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -1685,18 +1685,11 @@ static int dec_cmp_r(CPUCRISState *env, DisasContext *dc)
 
 static int dec_abs_r(CPUCRISState *env, DisasContext *dc)
 {
-    TCGv t0;
-
     LOG_DIS("abs $r%u, $r%u\n",
             dc->op1, dc->op2);
     cris_cc_mask(dc, CC_MASK_NZ);
 
-    t0 = tcg_temp_new();
-    tcg_gen_sari_tl(t0, cpu_R[dc->op1], 31);
-    tcg_gen_xor_tl(cpu_R[dc->op2], cpu_R[dc->op1], t0);
-    tcg_gen_sub_tl(cpu_R[dc->op2], cpu_R[dc->op2], t0);
-    tcg_temp_free(t0);
-
+    tcg_gen_abs_tl(cpu_R[dc->op2], cpu_R[dc->op1]);
     cris_alu(dc, CC_OP_MOVE,
             cpu_R[dc->op2], cpu_R[dc->op2], cpu_R[dc->op2], 4);
     return 2;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 23/38] target/ppc: Use tcg_gen_abs_tl
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (21 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 22/38] target/cris: Use tcg_gen_abs_tl Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 24/38] target/s390x: Use tcg_gen_abs_i64 Richard Henderson
                   ` (17 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/translate.c | 80 +++++++++++++++++-------------------------
 1 file changed, 32 insertions(+), 48 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index badc1ae1a3..97b8e8ddaf 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -5013,39 +5013,27 @@ static void gen_ecowx(DisasContext *ctx)
 /* abs - abs. */
 static void gen_abs(DisasContext *ctx)
 {
-    TCGLabel *l1 = gen_new_label();
-    TCGLabel *l2 = gen_new_label();
-    tcg_gen_brcondi_tl(TCG_COND_GE, cpu_gpr[rA(ctx->opcode)], 0, l1);
-    tcg_gen_neg_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
-    tcg_gen_br(l2);
-    gen_set_label(l1);
-    tcg_gen_mov_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
-    gen_set_label(l2);
-    if (unlikely(Rc(ctx->opcode) != 0))
-        gen_set_Rc0(ctx, cpu_gpr[rD(ctx->opcode)]);
+    TCGv d = cpu_gpr[rD(ctx->opcode)];
+    TCGv a = cpu_gpr[rA(ctx->opcode)];
+
+    tcg_gen_abs_tl(d, a);
+    if (unlikely(Rc(ctx->opcode) != 0)) {
+        gen_set_Rc0(ctx, d);
+    }
 }
 
 /* abso - abso. */
 static void gen_abso(DisasContext *ctx)
 {
-    TCGLabel *l1 = gen_new_label();
-    TCGLabel *l2 = gen_new_label();
-    TCGLabel *l3 = gen_new_label();
-    /* Start with XER OV disabled, the most likely case */
-    tcg_gen_movi_tl(cpu_ov, 0);
-    tcg_gen_brcondi_tl(TCG_COND_GE, cpu_gpr[rA(ctx->opcode)], 0, l2);
-    tcg_gen_brcondi_tl(TCG_COND_NE, cpu_gpr[rA(ctx->opcode)], 0x80000000, l1);
-    tcg_gen_movi_tl(cpu_ov, 1);
-    tcg_gen_movi_tl(cpu_so, 1);
-    tcg_gen_br(l2);
-    gen_set_label(l1);
-    tcg_gen_neg_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
-    tcg_gen_br(l3);
-    gen_set_label(l2);
-    tcg_gen_mov_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
-    gen_set_label(l3);
-    if (unlikely(Rc(ctx->opcode) != 0))
-        gen_set_Rc0(ctx, cpu_gpr[rD(ctx->opcode)]);
+    TCGv d = cpu_gpr[rD(ctx->opcode)];
+    TCGv a = cpu_gpr[rA(ctx->opcode)];
+
+    tcg_gen_setcondi_tl(TCG_COND_EQ, cpu_ov, a, 0x80000000);
+    tcg_gen_abs_tl(d, a);
+    tcg_gen_or_tl(cpu_so, cpu_so, cpu_ov);
+    if (unlikely(Rc(ctx->opcode) != 0)) {
+        gen_set_Rc0(ctx, d);
+    }
 }
 
 /* clcs */
@@ -5265,33 +5253,29 @@ static void gen_mulo(DisasContext *ctx)
 /* nabs - nabs. */
 static void gen_nabs(DisasContext *ctx)
 {
-    TCGLabel *l1 = gen_new_label();
-    TCGLabel *l2 = gen_new_label();
-    tcg_gen_brcondi_tl(TCG_COND_GT, cpu_gpr[rA(ctx->opcode)], 0, l1);
-    tcg_gen_mov_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
-    tcg_gen_br(l2);
-    gen_set_label(l1);
-    tcg_gen_neg_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
-    gen_set_label(l2);
-    if (unlikely(Rc(ctx->opcode) != 0))
-        gen_set_Rc0(ctx, cpu_gpr[rD(ctx->opcode)]);
+    TCGv d = cpu_gpr[rD(ctx->opcode)];
+    TCGv a = cpu_gpr[rA(ctx->opcode)];
+
+    tcg_gen_abs_tl(d, a);
+    tcg_gen_neg_tl(d, d);
+    if (unlikely(Rc(ctx->opcode) != 0)) {
+        gen_set_Rc0(ctx, d);
+    }
 }
 
 /* nabso - nabso. */
 static void gen_nabso(DisasContext *ctx)
 {
-    TCGLabel *l1 = gen_new_label();
-    TCGLabel *l2 = gen_new_label();
-    tcg_gen_brcondi_tl(TCG_COND_GT, cpu_gpr[rA(ctx->opcode)], 0, l1);
-    tcg_gen_mov_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
-    tcg_gen_br(l2);
-    gen_set_label(l1);
-    tcg_gen_neg_tl(cpu_gpr[rD(ctx->opcode)], cpu_gpr[rA(ctx->opcode)]);
-    gen_set_label(l2);
+    TCGv d = cpu_gpr[rD(ctx->opcode)];
+    TCGv a = cpu_gpr[rA(ctx->opcode)];
+
+    tcg_gen_abs_tl(d, a);
+    tcg_gen_neg_tl(d, d);
     /* nabs never overflows */
     tcg_gen_movi_tl(cpu_ov, 0);
-    if (unlikely(Rc(ctx->opcode) != 0))
-        gen_set_Rc0(ctx, cpu_gpr[rD(ctx->opcode)]);
+    if (unlikely(Rc(ctx->opcode) != 0)) {
+        gen_set_Rc0(ctx, d);
+    }
 }
 
 /* rlmi - rlmi. */
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 24/38] target/s390x: Use tcg_gen_abs_i64
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (22 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 23/38] target/ppc: " Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-23 18:40   ` David Hildenbrand
  2019-04-23 22:12   ` Philippe Mathieu-Daudé
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 25/38] target/xtensa: Use tcg_gen_abs_i32 Richard Henderson
                   ` (16 subsequent siblings)
  40 siblings, 2 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/s390x/translate.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 0afa8f7ca5..030129acbb 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -1407,13 +1407,7 @@ static DisasJumpType help_branch(DisasContext *s, DisasCompare *c,
 
 static DisasJumpType op_abs(DisasContext *s, DisasOps *o)
 {
-    TCGv_i64 z, n;
-    z = tcg_const_i64(0);
-    n = tcg_temp_new_i64();
-    tcg_gen_neg_i64(n, o->in2);
-    tcg_gen_movcond_i64(TCG_COND_LT, o->out, o->in2, z, n, o->in2);
-    tcg_temp_free_i64(n);
-    tcg_temp_free_i64(z);
+    tcg_gen_abs_i64(o->out, o->in2);
     return DISAS_NEXT;
 }
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 25/38] target/xtensa: Use tcg_gen_abs_i32
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (23 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 24/38] target/s390x: Use tcg_gen_abs_i64 Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-23 22:14   ` Philippe Mathieu-Daudé
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 26/38] tcg/i386: Support vector absolute value Richard Henderson
                   ` (15 subsequent siblings)
  40 siblings, 1 reply; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/xtensa/translate.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/target/xtensa/translate.c b/target/xtensa/translate.c
index 65561d2c49..62be8a6f6a 100644
--- a/target/xtensa/translate.c
+++ b/target/xtensa/translate.c
@@ -1707,14 +1707,7 @@ void restore_state_to_opc(CPUXtensaState *env, TranslationBlock *tb,
 static void translate_abs(DisasContext *dc, const OpcodeArg arg[],
                           const uint32_t par[])
 {
-    TCGv_i32 zero = tcg_const_i32(0);
-    TCGv_i32 neg = tcg_temp_new_i32();
-
-    tcg_gen_neg_i32(neg, arg[1].in);
-    tcg_gen_movcond_i32(TCG_COND_GE, arg[0].out,
-                        arg[1].in, zero, arg[1].in, neg);
-    tcg_temp_free(neg);
-    tcg_temp_free(zero);
+    tcg_gen_abs_i32(arg[0].out, arg[1].in);
 }
 
 static void translate_add(DisasContext *dc, const OpcodeArg arg[],
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 26/38] tcg/i386: Support vector absolute value
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (24 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 25/38] target/xtensa: Use tcg_gen_abs_i32 Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 27/38] tcg/aarch64: " Richard Henderson
                   ` (14 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.h     |  2 +-
 tcg/i386/tcg-target.inc.c | 15 +++++++++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 7445f05885..66f16fbe3c 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -182,7 +182,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_orc_vec          0
 #define TCG_TARGET_HAS_not_vec          0
 #define TCG_TARGET_HAS_neg_vec          0
-#define TCG_TARGET_HAS_abs_vec          0
+#define TCG_TARGET_HAS_abs_vec          1
 #define TCG_TARGET_HAS_shi_vec          1
 #define TCG_TARGET_HAS_shs_vec          1
 #define TCG_TARGET_HAS_shv_vec          have_avx2
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 85b68e4326..3dae0bf0c5 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -369,6 +369,9 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_MOVSLQ	(0x63 | P_REXW)
 #define OPC_MOVZBL	(0xb6 | P_EXT)
 #define OPC_MOVZWL	(0xb7 | P_EXT)
+#define OPC_PABSB       (0x1c | P_EXT38 | P_DATA16)
+#define OPC_PABSW       (0x1d | P_EXT38 | P_DATA16)
+#define OPC_PABSD       (0x1e | P_EXT38 | P_DATA16)
 #define OPC_PACKSSDW    (0x6b | P_EXT | P_DATA16)
 #define OPC_PACKSSWB    (0x63 | P_EXT | P_DATA16)
 #define OPC_PACKUSDW    (0x2b | P_EXT38 | P_DATA16)
@@ -2739,6 +2742,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     static int const sars_insn[4] = {
         OPC_UD2, OPC_PSRAW, OPC_PSRAD, OPC_UD2
     };
+    static int const abs_insn[4] = {
+        /* TODO: AVX512 adds support for MO_64.  */
+        OPC_PABSB, OPC_PABSW, OPC_PABSD, OPC_UD2
+    };
 
     TCGType type = vecl + TCG_TYPE_V64;
     int insn, sub;
@@ -2827,6 +2834,11 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         insn = OPC_PUNPCKLDQ;
         goto gen_simd;
 #endif
+    case INDEX_op_abs_vec:
+        insn = abs_insn[vece];
+        a2 = a1;
+        a1 = 0;
+        goto gen_simd;
     gen_simd:
         tcg_debug_assert(insn != OPC_UD2);
         if (type == TCG_TYPE_V256) {
@@ -3204,6 +3216,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_dup2_vec:
 #endif
         return &x_x_x;
+    case INDEX_op_abs_vec:
     case INDEX_op_dup_vec:
     case INDEX_op_shli_vec:
     case INDEX_op_shri_vec:
@@ -3281,6 +3294,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_umin_vec:
     case INDEX_op_umax_vec:
         return vece <= MO_32 ? 1 : -1;
+    case INDEX_op_abs_vec:
+        return vece <= MO_32;
 
     default:
         return 0;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 27/38] tcg/aarch64: Support vector absolute value
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (25 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 26/38] tcg/i386: Support vector absolute value Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 28/38] tcg: Add support for vector comparison select Richard Henderson
                   ` (13 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.h     | 2 +-
 tcg/aarch64/tcg-target.inc.c | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 21d06d928c..e43554c3c7 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -132,7 +132,7 @@ typedef enum {
 #define TCG_TARGET_HAS_orc_vec          1
 #define TCG_TARGET_HAS_not_vec          1
 #define TCG_TARGET_HAS_neg_vec          1
-#define TCG_TARGET_HAS_abs_vec          0
+#define TCG_TARGET_HAS_abs_vec          1
 #define TCG_TARGET_HAS_shi_vec          1
 #define TCG_TARGET_HAS_shs_vec          0
 #define TCG_TARGET_HAS_shv_vec          1
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 7d2a8213ec..cf891defd4 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -554,6 +554,7 @@ typedef enum {
     I3617_CMGE0     = 0x2e208800,
     I3617_CMLE0     = 0x2e20a800,
     I3617_NOT       = 0x2e205800,
+    I3617_ABS       = 0x0e20b800,
     I3617_NEG       = 0x2e20b800,
 
     /* System instructions.  */
@@ -2205,6 +2206,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_neg_vec:
         tcg_out_insn(s, 3617, NEG, is_q, vece, a0, a1);
         break;
+    case INDEX_op_abs_vec:
+        tcg_out_insn(s, 3617, ABS, is_q, vece, a0, a1);
+        break;
     case INDEX_op_and_vec:
         tcg_out_insn(s, 3616, AND, is_q, 0, a0, a1, a2);
         break;
@@ -2316,6 +2320,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_andc_vec:
     case INDEX_op_orc_vec:
     case INDEX_op_neg_vec:
+    case INDEX_op_abs_vec:
     case INDEX_op_not_vec:
     case INDEX_op_cmp_vec:
     case INDEX_op_shli_vec:
@@ -2559,6 +2564,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         return &w_w_w;
     case INDEX_op_not_vec:
     case INDEX_op_neg_vec:
+    case INDEX_op_abs_vec:
     case INDEX_op_shli_vec:
     case INDEX_op_shri_vec:
     case INDEX_op_sari_vec:
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 28/38] tcg: Add support for vector comparison select
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (26 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 27/38] tcg/aarch64: " Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 29/38] tcg/i386: Support vector comparison select value Richard Henderson
                   ` (12 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

At present, only tcg_gen_cmpsel_vec added, which can be used by
other target-specific vector expanders.  It is not clear whether
a full gvec expander would be worthwhile, given the unspecified
nature of the selector.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.h |  1 +
 tcg/i386/tcg-target.h    |  1 +
 tcg/tcg-op.h             |  2 ++
 tcg/tcg-opc.h            |  1 +
 tcg/tcg.h                |  1 +
 tcg/tcg-op-gvec.c        |  3 +++
 tcg/tcg-op-vec.c         | 37 +++++++++++++++++++++++++++++++++++++
 tcg/tcg.c                |  2 ++
 tcg/README               | 12 ++++++++++++
 9 files changed, 60 insertions(+)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index e43554c3c7..e1135e930a 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -140,6 +140,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mul_vec          1
 #define TCG_TARGET_HAS_sat_vec          1
 #define TCG_TARGET_HAS_minmax_vec       1
+#define TCG_TARGET_HAS_cmpsel_vec       0
 
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP     1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 66f16fbe3c..683e029980 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -190,6 +190,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_mul_vec          1
 #define TCG_TARGET_HAS_sat_vec          1
 #define TCG_TARGET_HAS_minmax_vec       1
+#define TCG_TARGET_HAS_cmpsel_vec       0
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
     (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 660fe205d0..6c4cd0aa14 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -999,6 +999,8 @@ void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
 
 void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, TCGv_vec r,
                      TCGv_vec a, TCGv_vec b);
+void tcg_gen_cmpsel_vec(unsigned vece, TCGv_vec r, TCGv_vec s,
+                        TCGv_vec a, TCGv_vec b);
 
 void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset);
 void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset);
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 4a2dd116eb..05fb9e3f37 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -255,6 +255,7 @@ DEF(shrv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec))
 DEF(sarv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec))
 
 DEF(cmp_vec, 1, 2, 1, IMPLVEC)
+DEF(cmpsel_vec, 1, 3, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_cmpsel_vec))
 
 DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT)
 
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 986055fdfa..1abee6cbe5 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -187,6 +187,7 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_mul_vec          0
 #define TCG_TARGET_HAS_sat_vec          0
 #define TCG_TARGET_HAS_minmax_vec       0
+#define TCG_TARGET_HAS_cmpsel_vec       0
 #else
 #define TCG_TARGET_MAYBE_vec            1
 #endif
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 87d5a01cc9..e7029d26f4 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -96,6 +96,9 @@ static bool tcg_can_emit_vecop_list(const TCGOpcode *list,
                 continue;
             }
             break;
+        case INDEX_op_cmpsel_vec:
+            /* Fallback expansion uses only required logial ops.  */
+            continue;
         default:
             break;
         }
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index b7f21145bb..7d8f7b490a 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -599,3 +599,40 @@ void tcg_gen_sars_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 b)
 {
     do_shifts(vece, r, a, b, INDEX_op_sars_vec, INDEX_op_sarv_vec);
 }
+
+void tcg_gen_cmpsel_vec(unsigned vece, TCGv_vec r, TCGv_vec s,
+                        TCGv_vec a, TCGv_vec b)
+{
+    TCGTemp *rt = tcgv_vec_temp(r);
+    TCGTemp *st = tcgv_vec_temp(s);
+    TCGTemp *at = tcgv_vec_temp(a);
+    TCGTemp *bt = tcgv_vec_temp(b);
+    TCGArg ri = temp_arg(rt);
+    TCGArg si = temp_arg(st);
+    TCGArg ai = temp_arg(at);
+    TCGArg bi = temp_arg(bt);
+    TCGType type = rt->base_type;
+    const TCGOpcode *hold_list;
+    int can;
+
+    tcg_debug_assert(st->base_type >= type);
+    tcg_debug_assert(at->base_type >= type);
+    tcg_debug_assert(bt->base_type >= type);
+    tcg_assert_listed_vecop(INDEX_op_cmpsel_vec);
+    hold_list = tcg_swap_vecop_list(NULL);
+
+    can = tcg_can_emit_vec_op(INDEX_op_cmpsel_vec, type, vece);
+    if (can > 0) {
+        vec_gen_4(INDEX_op_cmpsel_vec, type, vece, ri, si, ai, bi);
+    } else if (can < 0) {
+        tcg_expand_vec_op(INDEX_op_cmpsel_vec, type, vece, ri, si, ai, bi);
+    } else {
+        TCGv_vec t = tcg_temp_new_vec(type);
+
+        tcg_gen_and_vec(vece, t, a, s);
+        tcg_gen_andc_vec(vece, r, b, s);
+        tcg_gen_or_vec(vece, r, r, t);
+        tcg_temp_free_vec(t);
+    }
+    tcg_swap_vecop_list(hold_list);
+}
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 86a95a636b..0c68bd5cf5 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1651,6 +1651,8 @@ bool tcg_op_supported(TCGOpcode op)
     case INDEX_op_smax_vec:
     case INDEX_op_umax_vec:
         return have_vec && TCG_TARGET_HAS_minmax_vec;
+    case INDEX_op_cmpsel_vec:
+        return have_vec && TCG_TARGET_HAS_cmpsel_vec;
 
     default:
         tcg_debug_assert(op > INDEX_op_last_generic && op < NB_OPS);
diff --git a/tcg/README b/tcg/README
index cbdfd3b6bc..16977a8d62 100644
--- a/tcg/README
+++ b/tcg/README
@@ -627,6 +627,18 @@ E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32.
 
   Compare vectors by element, storing -1 for true and 0 for false.
 
+* cmpsel_vec v0, v1, v2, v3
+
+  If an element in v1 is "true", select the corresponding element
+  from v2, else select the corresponding element from v3, storing
+  the result in v0.  Nominally, this can be implemented as
+
+    v0 = (v2 & v1) | (v3 & ~v1)
+
+  HOWEVER, it is unspecified which bits of v1 are significant.
+  Thus each element of v1 must be -1 or 0, as if by the result
+  of cmp_vec.
+
 *********
 
 Note 1: Some shortcuts are defined when the last operand is known to be
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 29/38] tcg/i386: Support vector comparison select value
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (27 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 28/38] tcg: Add support for vector comparison select Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 30/38] tcg/aarch64: " Richard Henderson
                   ` (11 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

We already had backend support for this feature, but with a
backend-specific opcode.  Remove the old name, and reorder
the arguments to match the generic opcode.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.h     |  2 +-
 tcg/i386/tcg-target.opc.h |  1 -
 tcg/i386/tcg-target.inc.c | 13 ++++++-------
 3 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 683e029980..acdf96b99d 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -190,7 +190,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_mul_vec          1
 #define TCG_TARGET_HAS_sat_vec          1
 #define TCG_TARGET_HAS_minmax_vec       1
-#define TCG_TARGET_HAS_cmpsel_vec       0
+#define TCG_TARGET_HAS_cmpsel_vec       1
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
     (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
diff --git a/tcg/i386/tcg-target.opc.h b/tcg/i386/tcg-target.opc.h
index e5fa88ba25..d761cca0cd 100644
--- a/tcg/i386/tcg-target.opc.h
+++ b/tcg/i386/tcg-target.opc.h
@@ -3,7 +3,6 @@
    consider these to be UNSPEC with names.  */
 
 DEF(x86_shufps_vec, 1, 2, 1, IMPLVEC)
-DEF(x86_vpblendvb_vec, 1, 3, 0, IMPLVEC)
 DEF(x86_blend_vec, 1, 2, 1, IMPLVEC)
 DEF(x86_packss_vec, 1, 2, 0, IMPLVEC)
 DEF(x86_packus_vec, 1, 2, 0, IMPLVEC)
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 3dae0bf0c5..50296b4b2f 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2921,13 +2921,13 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         tcg_out8(s, sub);
         break;
 
-    case INDEX_op_x86_vpblendvb_vec:
+    case INDEX_op_cmpsel_vec:
         insn = OPC_VPBLENDVB;
         if (type == TCG_TYPE_V256) {
             insn |= P_VEXL;
         }
-        tcg_out_vex_modrm(s, insn, a0, a1, a2);
-        tcg_out8(s, args[3] << 4);
+        tcg_out_vex_modrm(s, insn, a0, a2, args[3]);
+        tcg_out8(s, a1 << 4);
         break;
 
     case INDEX_op_x86_psrldq_vec:
@@ -3223,7 +3223,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_sari_vec:
     case INDEX_op_x86_psrldq_vec:
         return &x_x;
-    case INDEX_op_x86_vpblendvb_vec:
+    case INDEX_op_cmpsel_vec:
         return &x_x_x_x;
 
     default:
@@ -3241,6 +3241,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_or_vec:
     case INDEX_op_xor_vec:
     case INDEX_op_andc_vec:
+    case INDEX_op_cmpsel_vec:
         return 1;
     case INDEX_op_cmp_vec:
         return -1;
@@ -3537,9 +3538,7 @@ static void expand_vec_minmax(TCGType type, unsigned vece,
         TCGv_vec t2;
         t2 = v1, v1 = v2, v2 = t2;
     }
-    vec_gen_4(INDEX_op_x86_vpblendvb_vec, type, vece,
-              tcgv_vec_arg(v0), tcgv_vec_arg(v1),
-              tcgv_vec_arg(v2), tcgv_vec_arg(t1));
+    tcg_gen_cmpsel_vec(vece, v0, t1, v1, v2);
     tcg_temp_free_vec(t1);
 }
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 30/38] tcg/aarch64: Support vector comparison select value
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (28 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 29/38] tcg/i386: Support vector comparison select value Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 31/38] target/ppc: Use vector variable shifts for VS{L, R, RA}{B, H, W, D} Richard Henderson
                   ` (10 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

The instruction set has 3 insns that perform the same operation,
only varying in which operand must overlap the destination.  We
can represent the operation without overlap and choose based on
the operands seen.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.h     |  2 +-
 tcg/aarch64/tcg-target.inc.c | 24 +++++++++++++++++++++++-
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index e1135e930a..e030bf3c8f 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -140,7 +140,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mul_vec          1
 #define TCG_TARGET_HAS_sat_vec          1
 #define TCG_TARGET_HAS_minmax_vec       1
-#define TCG_TARGET_HAS_cmpsel_vec       0
+#define TCG_TARGET_HAS_cmpsel_vec       1
 
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP     1
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index cf891defd4..84d402acd8 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -525,6 +525,9 @@ typedef enum {
     I3616_ADD       = 0x0e208400,
     I3616_AND       = 0x0e201c00,
     I3616_BIC       = 0x0e601c00,
+    I3616_BIF       = 0x2ee01c00,
+    I3616_BIT       = 0x2ea01c00,
+    I3616_BSL       = 0x2e601c00,
     I3616_EOR       = 0x2e201c00,
     I3616_MUL       = 0x0e209c00,
     I3616_ORR       = 0x0ea01c00,
@@ -2178,7 +2181,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 
     TCGType type = vecl + TCG_TYPE_V64;
     unsigned is_q = vecl;
-    TCGArg a0, a1, a2;
+    TCGArg a0, a1, a2, a3;
 
     a0 = args[0];
     a1 = args[1];
@@ -2301,6 +2304,20 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_cmpsel_vec:
+        a3 = args[3];
+        if (a0 == a3) {
+            tcg_out_insn(s, 3616, BIT, is_q, 0, a0, a2, a1);
+        } else if (a0 == a2) {
+            tcg_out_insn(s, 3616, BIF, is_q, 0, a0, a3, a1);
+        } else {
+            if (a0 != a1) {
+                tcg_out_mov(s, type, a0, a1);
+            }
+            tcg_out_insn(s, 3616, BSL, is_q, 0, a0, a2, a3);
+        }
+        break;
+
     case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
@@ -2323,6 +2340,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_abs_vec:
     case INDEX_op_not_vec:
     case INDEX_op_cmp_vec:
+    case INDEX_op_cmpsel_vec:
     case INDEX_op_shli_vec:
     case INDEX_op_shri_vec:
     case INDEX_op_sari_vec:
@@ -2405,6 +2423,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         = { .args_ct_str = { "r", "r", "rA", "rZ", "rZ" } };
     static const TCGTargetOpDef add2
         = { .args_ct_str = { "r", "r", "rZ", "rZ", "rA", "rMZ" } };
+    static const TCGTargetOpDef w_w_w_w
+        = { .args_ct_str = { "w", "w", "w", "w" } };
 
     switch (op) {
     case INDEX_op_goto_ptr:
@@ -2577,6 +2597,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         return &w_wr;
     case INDEX_op_cmp_vec:
         return &w_w_wZ;
+    case INDEX_op_cmpsel_vec:
+        return &w_w_w_w;
 
     default:
         return NULL;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 31/38] target/ppc: Use vector variable shifts for VS{L, R, RA}{B, H, W, D}
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (29 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 30/38] tcg/aarch64: " Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 32/38] target/arm: Vectorize USHL and SSHL Richard Henderson
                   ` (9 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/helper.h                 |  24 ++--
 target/ppc/int_helper.c             |   6 +-
 target/ppc/translate/vmx-impl.inc.c | 168 ++++++++++++++++++++++++++--
 3 files changed, 172 insertions(+), 26 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 638a6e99c4..5416dc55ce 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -180,18 +180,18 @@ DEF_HELPER_3(vmuloub, void, avr, avr, avr)
 DEF_HELPER_3(vmulouh, void, avr, avr, avr)
 DEF_HELPER_3(vmulouw, void, avr, avr, avr)
 DEF_HELPER_3(vmuluwm, void, avr, avr, avr)
-DEF_HELPER_3(vsrab, void, avr, avr, avr)
-DEF_HELPER_3(vsrah, void, avr, avr, avr)
-DEF_HELPER_3(vsraw, void, avr, avr, avr)
-DEF_HELPER_3(vsrad, void, avr, avr, avr)
-DEF_HELPER_3(vsrb, void, avr, avr, avr)
-DEF_HELPER_3(vsrh, void, avr, avr, avr)
-DEF_HELPER_3(vsrw, void, avr, avr, avr)
-DEF_HELPER_3(vsrd, void, avr, avr, avr)
-DEF_HELPER_3(vslb, void, avr, avr, avr)
-DEF_HELPER_3(vslh, void, avr, avr, avr)
-DEF_HELPER_3(vslw, void, avr, avr, avr)
-DEF_HELPER_3(vsld, void, avr, avr, avr)
+DEF_HELPER_4(vsrab, void, avr, avr, avr, i32)
+DEF_HELPER_4(vsrah, void, avr, avr, avr, i32)
+DEF_HELPER_4(vsraw, void, avr, avr, avr, i32)
+DEF_HELPER_4(vsrad, void, avr, avr, avr, i32)
+DEF_HELPER_4(vsrb, void, avr, avr, avr, i32)
+DEF_HELPER_4(vsrh, void, avr, avr, avr, i32)
+DEF_HELPER_4(vsrw, void, avr, avr, avr, i32)
+DEF_HELPER_4(vsrd, void, avr, avr, avr, i32)
+DEF_HELPER_4(vslb, void, avr, avr, avr, i32)
+DEF_HELPER_4(vslh, void, avr, avr, avr, i32)
+DEF_HELPER_4(vslw, void, avr, avr, avr, i32)
+DEF_HELPER_4(vsld, void, avr, avr, avr, i32)
 DEF_HELPER_3(vslo, void, avr, avr, avr)
 DEF_HELPER_3(vsro, void, avr, avr, avr)
 DEF_HELPER_3(vsrv, void, avr, avr, avr)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 162add561e..35ec1ccdfb 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1770,7 +1770,8 @@ VSHIFT(r, 0)
 #undef VSHIFT
 
 #define VSL(suffix, element, mask)                                      \
-    void helper_vsl##suffix(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)   \
+    void helper_vsl##suffix(ppc_avr_t *r, ppc_avr_t *a,                 \
+                            ppc_avr_t *b, uint32_t desc)                \
     {                                                                   \
         int i;                                                          \
                                                                         \
@@ -1958,7 +1959,8 @@ VNEG(vnegd, s64)
 #undef VNEG
 
 #define VSR(suffix, element, mask)                                      \
-    void helper_vsr##suffix(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)   \
+    void helper_vsr##suffix(ppc_avr_t *r, ppc_avr_t *a,                 \
+                            ppc_avr_t *b, uint32_t desc)                \
     {                                                                   \
         int i;                                                          \
                                                                         \
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index c83e605a00..8cc2e99963 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -511,6 +511,150 @@ static void gen_vmrgow(DisasContext *ctx)
     tcg_temp_free_i64(avr);
 }
 
+static void gen_vsl_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t = tcg_temp_new_vec_matching(b);
+    tcg_gen_dupi_vec(vece, t, (8 << vece) - 1);
+    tcg_gen_and_vec(vece, b, b, t);
+    tcg_temp_free_vec(t);
+    tcg_gen_shlv_vec(vece, d, a, b);
+}
+
+static void gen_vslw_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    tcg_gen_andi_i32(b, b, 31);
+    tcg_gen_shl_i32(d, a, b);
+}
+
+static void gen_vsld_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    tcg_gen_andi_i64(b, b, 63);
+    tcg_gen_shl_i64(d, a, b);
+}
+
+static void gen__vsl(unsigned vece, uint32_t dofs, uint32_t aofs,
+                     uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode shlv_list[] = { INDEX_op_shlv_vec, 0 };
+    static const GVecGen3 g[4] = {
+        { .fniv = gen_vsl_vec,
+          .fno = gen_helper_vslb,
+          .opt_opc = shlv_list,
+          .vece = MO_8 },
+        { .fniv = gen_vsl_vec,
+          .fno = gen_helper_vslh,
+          .opt_opc = shlv_list,
+          .vece = MO_16 },
+        { .fni4 = gen_vslw_i32,
+          .fniv = gen_vsl_vec,
+          .fno = gen_helper_vslw,
+          .opt_opc = shlv_list,
+          .vece = MO_32 },
+        { .fni8 = gen_vsld_i64,
+          .fniv = gen_vsl_vec,
+          .fno = gen_helper_vsld,
+          .opt_opc = shlv_list,
+          .vece = MO_64 }
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
+}
+
+static void gen_vsr_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t = tcg_temp_new_vec_matching(b);
+    tcg_gen_dupi_vec(vece, t, (8 << vece) - 1);
+    tcg_gen_and_vec(vece, b, b, t);
+    tcg_temp_free_vec(t);
+    tcg_gen_shrv_vec(vece, d, a, b);
+}
+
+static void gen_vsrw_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    tcg_gen_andi_i32(b, b, 31);
+    tcg_gen_shr_i32(d, a, b);
+}
+
+static void gen_vsrd_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    tcg_gen_andi_i64(b, b, 63);
+    tcg_gen_shr_i64(d, a, b);
+}
+
+static void gen__vsr(unsigned vece, uint32_t dofs, uint32_t aofs,
+                     uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode shrv_list[] = { INDEX_op_shrv_vec, 0 };
+    static const GVecGen3 g[4] = {
+        { .fniv = gen_vsr_vec,
+          .fno = gen_helper_vsrb,
+          .opt_opc = shrv_list,
+          .vece = MO_8 },
+        { .fniv = gen_vsr_vec,
+          .fno = gen_helper_vsrh,
+          .opt_opc = shrv_list,
+          .vece = MO_16 },
+        { .fni4 = gen_vsrw_i32,
+          .fniv = gen_vsr_vec,
+          .fno = gen_helper_vsrw,
+          .opt_opc = shrv_list,
+          .vece = MO_32 },
+        { .fni8 = gen_vsrd_i64,
+          .fniv = gen_vsr_vec,
+          .fno = gen_helper_vsrd,
+          .opt_opc = shrv_list,
+          .vece = MO_64 }
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
+}
+
+static void gen_vsra_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec t = tcg_temp_new_vec_matching(b);
+    tcg_gen_dupi_vec(vece, t, (8 << vece) - 1);
+    tcg_gen_and_vec(vece, b, b, t);
+    tcg_temp_free_vec(t);
+    tcg_gen_sarv_vec(vece, d, a, b);
+}
+
+static void gen_vsraw_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    tcg_gen_andi_i32(b, b, 31);
+    tcg_gen_sar_i32(d, a, b);
+}
+
+static void gen_vsrad_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    tcg_gen_andi_i64(b, b, 63);
+    tcg_gen_sar_i64(d, a, b);
+}
+
+static void gen__vsra(unsigned vece, uint32_t dofs, uint32_t aofs,
+                      uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode sarv_list[] = { INDEX_op_sarv_vec, 0 };
+    static const GVecGen3 g[4] = {
+        { .fniv = gen_vsra_vec,
+          .fno = gen_helper_vsrab,
+          .opt_opc = sarv_list,
+          .vece = MO_8 },
+        { .fniv = gen_vsra_vec,
+          .fno = gen_helper_vsrah,
+          .opt_opc = sarv_list,
+          .vece = MO_16 },
+        { .fni4 = gen_vsraw_i32,
+          .fniv = gen_vsra_vec,
+          .fno = gen_helper_vsraw,
+          .opt_opc = sarv_list,
+          .vece = MO_32 },
+        { .fni8 = gen_vsrad_i64,
+          .fniv = gen_vsra_vec,
+          .fno = gen_helper_vsrad,
+          .opt_opc = sarv_list,
+          .vece = MO_64 }
+    };
+    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
+}
+
 GEN_VXFORM(vmuloub, 4, 0);
 GEN_VXFORM(vmulouh, 4, 1);
 GEN_VXFORM(vmulouw, 4, 2);
@@ -526,21 +670,21 @@ GEN_VXFORM(vmuleuw, 4, 10);
 GEN_VXFORM(vmulesb, 4, 12);
 GEN_VXFORM(vmulesh, 4, 13);
 GEN_VXFORM(vmulesw, 4, 14);
-GEN_VXFORM(vslb, 2, 4);
-GEN_VXFORM(vslh, 2, 5);
-GEN_VXFORM(vslw, 2, 6);
+GEN_VXFORM_V(vslb, MO_8, gen__vsl, 2, 4);
+GEN_VXFORM_V(vslh, MO_16, gen__vsl, 2, 5);
+GEN_VXFORM_V(vslw, MO_32, gen__vsl, 2, 6);
+GEN_VXFORM_V(vsld, MO_64, gen__vsl, 2, 23);
 GEN_VXFORM(vrlwnm, 2, 6);
 GEN_VXFORM_DUAL(vslw, PPC_ALTIVEC, PPC_NONE, \
                 vrlwnm, PPC_NONE, PPC2_ISA300)
-GEN_VXFORM(vsld, 2, 23);
-GEN_VXFORM(vsrb, 2, 8);
-GEN_VXFORM(vsrh, 2, 9);
-GEN_VXFORM(vsrw, 2, 10);
-GEN_VXFORM(vsrd, 2, 27);
-GEN_VXFORM(vsrab, 2, 12);
-GEN_VXFORM(vsrah, 2, 13);
-GEN_VXFORM(vsraw, 2, 14);
-GEN_VXFORM(vsrad, 2, 15);
+GEN_VXFORM_V(vsrb, MO_8, gen__vsr, 2, 8);
+GEN_VXFORM_V(vsrh, MO_16, gen__vsr, 2, 9);
+GEN_VXFORM_V(vsrw, MO_32, gen__vsr, 2, 10);
+GEN_VXFORM_V(vsrd, MO_64, gen__vsr, 2, 27);
+GEN_VXFORM_V(vsrab, MO_8, gen__vsra, 2, 12);
+GEN_VXFORM_V(vsrah, MO_16, gen__vsra, 2, 13);
+GEN_VXFORM_V(vsraw, MO_32, gen__vsra, 2, 14);
+GEN_VXFORM_V(vsrad, MO_64, gen__vsra, 2, 15);
 GEN_VXFORM(vsrv, 2, 28);
 GEN_VXFORM(vslv, 2, 29);
 GEN_VXFORM(vslo, 6, 16);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 32/38] target/arm: Vectorize USHL and SSHL
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (30 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 31/38] target/ppc: Use vector variable shifts for VS{L, R, RA}{B, H, W, D} Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 33/38] tcg/aarch64: Do not advertise minmax for MO_64 Richard Henderson
                   ` (8 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

These instructions shift left or right depending on the sign
of the input, and 7 bits are significant to the shift.  This
requires several masks and selects in addition to the actual
shifts to form the complete answer.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h        |  15 +-
 target/arm/translate.h     |   6 +
 target/arm/neon_helper.c   |  33 -----
 target/arm/translate-a64.c |  18 +--
 target/arm/translate.c     | 288 +++++++++++++++++++++++++++++++++++--
 target/arm/vec_helper.c    | 176 +++++++++++++++++++++++
 6 files changed, 470 insertions(+), 66 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 3d90b5be66..1c0de661fb 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -292,14 +292,8 @@ DEF_HELPER_2(neon_abd_s16, i32, i32, i32)
 DEF_HELPER_2(neon_abd_u32, i32, i32, i32)
 DEF_HELPER_2(neon_abd_s32, i32, i32, i32)
 
-DEF_HELPER_2(neon_shl_u8, i32, i32, i32)
-DEF_HELPER_2(neon_shl_s8, i32, i32, i32)
 DEF_HELPER_2(neon_shl_u16, i32, i32, i32)
 DEF_HELPER_2(neon_shl_s16, i32, i32, i32)
-DEF_HELPER_2(neon_shl_u32, i32, i32, i32)
-DEF_HELPER_2(neon_shl_s32, i32, i32, i32)
-DEF_HELPER_2(neon_shl_u64, i64, i64, i64)
-DEF_HELPER_2(neon_shl_s64, i64, i64, i64)
 DEF_HELPER_2(neon_rshl_u8, i32, i32, i32)
 DEF_HELPER_2(neon_rshl_s8, i32, i32, i32)
 DEF_HELPER_2(neon_rshl_u16, i32, i32, i32)
@@ -686,6 +680,15 @@ DEF_HELPER_FLAGS_2(frint64_s, TCG_CALL_NO_RWG, f32, f32, ptr)
 DEF_HELPER_FLAGS_2(frint32_d, TCG_CALL_NO_RWG, f64, f64, ptr)
 DEF_HELPER_FLAGS_2(frint64_d, TCG_CALL_NO_RWG, f64, f64, ptr)
 
+DEF_HELPER_FLAGS_4(gvec_sshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 912cc2a4a5..633668fa1b 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -244,6 +244,8 @@ extern const GVecGen3 bif_op;
 extern const GVecGen3 mla_op[4];
 extern const GVecGen3 mls_op[4];
 extern const GVecGen3 cmtst_op[4];
+extern const GVecGen3 sshl_op[4];
+extern const GVecGen3 ushl_op[4];
 extern const GVecGen2i ssra_op[4];
 extern const GVecGen2i usra_op[4];
 extern const GVecGen2i sri_op[4];
@@ -253,6 +255,10 @@ extern const GVecGen4 sqadd_op[4];
 extern const GVecGen4 uqsub_op[4];
 extern const GVecGen4 sqsub_op[4];
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
+void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
+void gen_ushl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 
 /*
  * Forward to the isar_feature_* tests given a DisasContext pointer.
diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
index 4259056723..c581ffb7d3 100644
--- a/target/arm/neon_helper.c
+++ b/target/arm/neon_helper.c
@@ -615,24 +615,9 @@ NEON_VOP(abd_u32, neon_u32, 1)
     } else { \
         dest = src1 << tmp; \
     }} while (0)
-NEON_VOP(shl_u8, neon_u8, 4)
 NEON_VOP(shl_u16, neon_u16, 2)
-NEON_VOP(shl_u32, neon_u32, 1)
 #undef NEON_FN
 
-uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
-{
-    int8_t shift = (int8_t)shiftop;
-    if (shift >= 64 || shift <= -64) {
-        val = 0;
-    } else if (shift < 0) {
-        val >>= -shift;
-    } else {
-        val <<= shift;
-    }
-    return val;
-}
-
 #define NEON_FN(dest, src1, src2) do { \
     int8_t tmp; \
     tmp = (int8_t)src2; \
@@ -645,27 +630,9 @@ uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
     } else { \
         dest = src1 << tmp; \
     }} while (0)
-NEON_VOP(shl_s8, neon_s8, 4)
 NEON_VOP(shl_s16, neon_s16, 2)
-NEON_VOP(shl_s32, neon_s32, 1)
 #undef NEON_FN
 
-uint64_t HELPER(neon_shl_s64)(uint64_t valop, uint64_t shiftop)
-{
-    int8_t shift = (int8_t)shiftop;
-    int64_t val = valop;
-    if (shift >= 64) {
-        val = 0;
-    } else if (shift <= -64) {
-        val >>= 63;
-    } else if (shift < 0) {
-        val >>= -shift;
-    } else {
-        val <<= shift;
-    }
-    return val;
-}
-
 #define NEON_FN(dest, src1, src2) do { \
     int8_t tmp; \
     tmp = (int8_t)src2; \
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index fd8921565e..c30f99c7cd 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -8845,9 +8845,9 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
         break;
     case 0x8: /* SSHL, USHL */
         if (u) {
-            gen_helper_neon_shl_u64(tcg_rd, tcg_rn, tcg_rm);
+            gen_ushl_i64(tcg_rd, tcg_rn, tcg_rm);
         } else {
-            gen_helper_neon_shl_s64(tcg_rd, tcg_rn, tcg_rm);
+            gen_sshl_i64(tcg_rd, tcg_rn, tcg_rm);
         }
         break;
     case 0x9: /* SQSHL, UQSHL */
@@ -11242,6 +11242,10 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                        is_q ? 16 : 8, vec_full_reg_size(s),
                        (u ? uqsub_op : sqsub_op) + size);
         return;
+    case 0x08: /* SSHL, USHL */
+        gen_gvec_op3(s, is_q, rd, rn, rm,
+                     u ? &ushl_op[size] : &sshl_op[size]);
+        return;
     case 0x0c: /* SMAX, UMAX */
         if (u) {
             gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
@@ -11357,16 +11361,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                 genfn = fns[size][u];
                 break;
             }
-            case 0x8: /* SSHL, USHL */
-            {
-                static NeonGenTwoOpFn * const fns[3][2] = {
-                    { gen_helper_neon_shl_s8, gen_helper_neon_shl_u8 },
-                    { gen_helper_neon_shl_s16, gen_helper_neon_shl_u16 },
-                    { gen_helper_neon_shl_s32, gen_helper_neon_shl_u32 },
-                };
-                genfn = fns[size][u];
-                break;
-            }
             case 0x9: /* SQSHL, UQSHL */
             {
                 static NeonGenTwoOpEnvFn * const fns[3][2] = {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 911ad0bdab..1fd31228f7 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5285,13 +5285,13 @@ static inline void gen_neon_shift_narrow(int size, TCGv_i32 var, TCGv_i32 shift,
         if (u) {
             switch (size) {
             case 1: gen_helper_neon_shl_u16(var, var, shift); break;
-            case 2: gen_helper_neon_shl_u32(var, var, shift); break;
+            case 2: gen_ushl_i32(var, var, shift); break;
             default: abort();
             }
         } else {
             switch (size) {
             case 1: gen_helper_neon_shl_s16(var, var, shift); break;
-            case 2: gen_helper_neon_shl_s32(var, var, shift); break;
+            case 2: gen_sshl_i32(var, var, shift); break;
             default: abort();
             }
         }
@@ -6170,6 +6170,270 @@ const GVecGen3 cmtst_op[4] = {
       .vece = MO_64 },
 };
 
+void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 lval = tcg_temp_new_i32();
+    TCGv_i32 rval = tcg_temp_new_i32();
+    TCGv_i32 lsh = tcg_temp_new_i32();
+    TCGv_i32 rsh = tcg_temp_new_i32();
+    TCGv_i32 zero = tcg_const_i32(0);
+    TCGv_i32 max = tcg_const_i32(32);
+
+    /*
+     * Perform possibly out of range shifts, trusting that the operation
+     * does not trap.  Discard unused results after the fact.
+     */
+    tcg_gen_ext8s_i32(lsh, b);
+    tcg_gen_neg_i32(rsh, lsh);
+    tcg_gen_shl_i32(lval, a, lsh);
+    tcg_gen_shr_i32(rval, a, rsh);
+    tcg_gen_movcond_i32(TCG_COND_LTU, d, lsh, max, lval, zero);
+    tcg_gen_movcond_i32(TCG_COND_LTU, d, rsh, max, rval, d);
+
+    tcg_temp_free_i32(lval);
+    tcg_temp_free_i32(rval);
+    tcg_temp_free_i32(lsh);
+    tcg_temp_free_i32(rsh);
+    tcg_temp_free_i32(zero);
+    tcg_temp_free_i32(max);
+}
+
+void gen_ushl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 lval = tcg_temp_new_i64();
+    TCGv_i64 rval = tcg_temp_new_i64();
+    TCGv_i64 lsh = tcg_temp_new_i64();
+    TCGv_i64 rsh = tcg_temp_new_i64();
+    TCGv_i64 zero = tcg_const_i64(0);
+    TCGv_i64 max = tcg_const_i64(64);
+
+    /*
+     * Perform possibly out of range shifts, trusting that the operation
+     * does not trap.  Discard unused results after the fact.
+     */
+    tcg_gen_ext8s_i64(lsh, b);
+    tcg_gen_neg_i64(rsh, lsh);
+    tcg_gen_shl_i64(lval, a, lsh);
+    tcg_gen_shr_i64(rval, a, rsh);
+    tcg_gen_movcond_i64(TCG_COND_LTU, d, lsh, max, lval, zero);
+    tcg_gen_movcond_i64(TCG_COND_LTU, d, rsh, max, rval, d);
+
+    tcg_temp_free_i64(lval);
+    tcg_temp_free_i64(rval);
+    tcg_temp_free_i64(lsh);
+    tcg_temp_free_i64(rsh);
+    tcg_temp_free_i64(zero);
+    tcg_temp_free_i64(max);
+}
+
+static void gen_ushl_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec lval = tcg_temp_new_vec_matching(d);
+    TCGv_vec rval = tcg_temp_new_vec_matching(d);
+    TCGv_vec lsh = tcg_temp_new_vec_matching(d);
+    TCGv_vec rsh = tcg_temp_new_vec_matching(d);
+    TCGv_vec msk, max;
+
+    /*
+     * Since we don't have a sign-extend vector primitive, negate and mask.
+     * With the out-of-range check below, we'll select the correct answer.
+     */
+    tcg_gen_neg_vec(vece, rsh, b);
+    if (vece == MO_8) {
+        tcg_gen_mov_vec(lsh, b);
+    } else {
+        msk = tcg_temp_new_vec_matching(d);
+        tcg_gen_dupi_vec(vece, msk, 0xff);
+        tcg_gen_and_vec(vece, lsh, b, msk);
+        tcg_gen_and_vec(vece, rsh, rsh, msk);
+        tcg_temp_free_vec(msk);
+    }
+
+    /*
+     * Perform possibly out of range shifts, trusting that the operation
+     * does not trap.  Discard unused results after the fact.
+     */
+    tcg_gen_shlv_vec(vece, lval, a, lsh);
+    tcg_gen_shrv_vec(vece, rval, a, rsh);
+
+    max = tcg_temp_new_vec_matching(d);
+    tcg_gen_dupi_vec(vece, max, 8 << vece);
+    tcg_gen_cmp_vec(TCG_COND_LTU, vece, lsh, lsh, max);
+    tcg_gen_cmp_vec(TCG_COND_LTU, vece, rsh, rsh, max);
+    tcg_temp_free_vec(max);
+
+    tcg_gen_and_vec(vece, lval, lval, lsh);
+    tcg_gen_and_vec(vece, rval, rval, rsh);
+    tcg_gen_or_vec(vece, d, lval, rval);
+
+    tcg_temp_free_vec(lval);
+    tcg_temp_free_vec(rval);
+    tcg_temp_free_vec(lsh);
+    tcg_temp_free_vec(rsh);
+}
+
+static const TCGOpcode ushl_list[] = {
+    INDEX_op_neg_vec, INDEX_op_shlv_vec,
+    INDEX_op_shrv_vec, INDEX_op_cmp_vec, 0
+};
+
+const GVecGen3 ushl_op[4] = {
+    { .fniv = gen_ushl_vec,
+      .fno = gen_helper_gvec_ushl_b,
+      .opt_opc = ushl_list,
+      .vece = MO_8 },
+    { .fniv = gen_ushl_vec,
+      .fno = gen_helper_gvec_ushl_h,
+      .opt_opc = ushl_list,
+      .vece = MO_16 },
+    { .fni4 = gen_ushl_i32,
+      .fniv = gen_ushl_vec,
+      .fno = gen_helper_gvec_ushl_s,
+      .opt_opc = ushl_list,
+      .vece = MO_32 },
+    { .fni8 = gen_ushl_i64,
+      .fniv = gen_ushl_vec,
+      .fno = gen_helper_gvec_ushl_d,
+      .opt_opc = ushl_list,
+      .vece = MO_64 },
+};
+
+void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 lval = tcg_temp_new_i32();
+    TCGv_i32 rval = tcg_temp_new_i32();
+    TCGv_i32 lsh = tcg_temp_new_i32();
+    TCGv_i32 rsh = tcg_temp_new_i32();
+    TCGv_i32 zero = tcg_const_i32(0);
+    TCGv_i32 max = tcg_const_i32(31);
+
+    /*
+     * Perform possibly out of range shifts, trusting that the operation
+     * does not trap.  Discard unused results after the fact.
+     */
+    tcg_gen_ext8s_i32(lsh, b);
+    tcg_gen_neg_i32(rsh, lsh);
+    tcg_gen_shl_i32(lval, a, lsh);
+    tcg_gen_umin_i32(rsh, rsh, max);
+    tcg_gen_sar_i32(rval, a, rsh);
+    tcg_gen_movcond_i32(TCG_COND_LEU, lval, lsh, max, lval, zero);
+    tcg_gen_movcond_i32(TCG_COND_LT, d, lsh, zero, rval, lval);
+
+    tcg_temp_free_i32(lval);
+    tcg_temp_free_i32(rval);
+    tcg_temp_free_i32(lsh);
+    tcg_temp_free_i32(rsh);
+    tcg_temp_free_i32(zero);
+    tcg_temp_free_i32(max);
+}
+
+void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 lval = tcg_temp_new_i64();
+    TCGv_i64 rval = tcg_temp_new_i64();
+    TCGv_i64 lsh = tcg_temp_new_i64();
+    TCGv_i64 rsh = tcg_temp_new_i64();
+    TCGv_i64 zero = tcg_const_i64(0);
+    TCGv_i64 max = tcg_const_i64(63);
+
+    /*
+     * Perform possibly out of range shifts, trusting that the operation
+     * does not trap.  Discard unused results after the fact.
+     */
+    tcg_gen_ext8s_i64(lsh, b);
+    tcg_gen_neg_i64(rsh, lsh);
+    tcg_gen_shl_i64(lval, a, lsh);
+    tcg_gen_umin_i64(rsh, rsh, max);
+    tcg_gen_sar_i64(rval, a, rsh);
+    tcg_gen_movcond_i64(TCG_COND_LEU, lval, lsh, max, lval, zero);
+    tcg_gen_movcond_i64(TCG_COND_LT, d, lsh, zero, rval, lval);
+
+    tcg_temp_free_i64(lval);
+    tcg_temp_free_i64(rval);
+    tcg_temp_free_i64(lsh);
+    tcg_temp_free_i64(rsh);
+    tcg_temp_free_i64(zero);
+    tcg_temp_free_i64(max);
+}
+
+static void gen_sshl_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+    TCGv_vec lval = tcg_temp_new_vec_matching(d);
+    TCGv_vec rval = tcg_temp_new_vec_matching(d);
+    TCGv_vec lsh = tcg_temp_new_vec_matching(d);
+    TCGv_vec rsh = tcg_temp_new_vec_matching(d);
+    TCGv_vec msk = tcg_temp_new_vec_matching(d);
+    TCGv_vec max = tcg_temp_new_vec_matching(d);
+
+    /*
+     * Since we don't have a sign-extend vector primitive, negate and mask.
+     * With the out-of-range check below, we'll select the correct answer.
+     */
+    tcg_gen_neg_vec(vece, rsh, b);
+    if (vece == MO_8) {
+        tcg_gen_mov_vec(lsh, b);
+    } else {
+        tcg_gen_dupi_vec(vece, msk, 0xff);
+        tcg_gen_and_vec(vece, lsh, b, msk);
+        tcg_gen_and_vec(vece, rsh, rsh, msk);
+    }
+
+    /* Bound rsh so out of bound right shift gets -1.  */
+    tcg_gen_dupi_vec(vece, max, (8 << vece) - 1);
+    tcg_gen_umin_vec(vece, rsh, rsh, max);
+
+    /* Select between left and right shift.  */
+    tcg_gen_dupi_vec(vece, msk, vece == MO_8 ? 0 : 0x80);
+    tcg_gen_cmp_vec(TCG_COND_LT, vece, msk, lsh, msk);
+
+    tcg_gen_shlv_vec(vece, lval, a, lsh);
+    tcg_gen_sarv_vec(vece, rval, a, rsh);
+
+    /* Select in-bound left shift.  */
+    tcg_gen_cmp_vec(TCG_COND_GT, vece, lsh, lsh, max);
+
+    /* Merge the selections above.  */
+    tcg_gen_andc_vec(vece, lval, lval, lsh);
+    if (vece == MO_8) {
+        tcg_gen_cmpsel_vec(vece, d, msk, rval, lval);
+    } else {
+        tcg_gen_cmpsel_vec(vece, d, msk, lval, rval);
+    }
+
+    tcg_temp_free_vec(lval);
+    tcg_temp_free_vec(rval);
+    tcg_temp_free_vec(lsh);
+    tcg_temp_free_vec(rsh);
+    tcg_temp_free_vec(msk);
+    tcg_temp_free_vec(max);
+}
+
+static const TCGOpcode sshl_list[] = {
+    INDEX_op_neg_vec, INDEX_op_umin_vec, INDEX_op_shlv_vec,
+    INDEX_op_sarv_vec, INDEX_op_cmp_vec, INDEX_op_cmpsel_vec, 0
+};
+
+const GVecGen3 sshl_op[4] = {
+    { .fniv = gen_sshl_vec,
+      .fno = gen_helper_gvec_sshl_b,
+      .opt_opc = sshl_list,
+      .vece = MO_8 },
+    { .fniv = gen_sshl_vec,
+      .fno = gen_helper_gvec_sshl_h,
+      .opt_opc = sshl_list,
+      .vece = MO_16 },
+    { .fni4 = gen_sshl_i32,
+      .fniv = gen_sshl_vec,
+      .fno = gen_helper_gvec_sshl_s,
+      .opt_opc = sshl_list,
+      .vece = MO_32 },
+    { .fni8 = gen_sshl_i64,
+      .fniv = gen_sshl_vec,
+      .fno = gen_helper_gvec_sshl_d,
+      .opt_opc = sshl_list,
+      .vece = MO_64 },
+};
+
 static void gen_uqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
                           TCGv_vec a, TCGv_vec b)
 {
@@ -6573,6 +6837,11 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                                   vec_size, vec_size);
             }
             return 0;
+
+        case NEON_3R_VSHL:
+            tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
+                           u ? &ushl_op[size] : &sshl_op[size]);
+            return 0;
         }
 
         if (size == 3) {
@@ -6581,13 +6850,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 neon_load_reg64(cpu_V0, rn + pass);
                 neon_load_reg64(cpu_V1, rm + pass);
                 switch (op) {
-                case NEON_3R_VSHL:
-                    if (u) {
-                        gen_helper_neon_shl_u64(cpu_V0, cpu_V1, cpu_V0);
-                    } else {
-                        gen_helper_neon_shl_s64(cpu_V0, cpu_V1, cpu_V0);
-                    }
-                    break;
                 case NEON_3R_VQSHL:
                     if (u) {
                         gen_helper_neon_qshl_u64(cpu_V0, cpu_env,
@@ -6622,7 +6884,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         }
         pairwise = 0;
         switch (op) {
-        case NEON_3R_VSHL:
         case NEON_3R_VQSHL:
         case NEON_3R_VRSHL:
         case NEON_3R_VQRSHL:
@@ -6702,9 +6963,6 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VHSUB:
             GEN_NEON_INTEGER_OP(hsub);
             break;
-        case NEON_3R_VSHL:
-            GEN_NEON_INTEGER_OP(shl);
-            break;
         case NEON_3R_VQSHL:
             GEN_NEON_INTEGER_OP_ENV(qshl);
             break;
@@ -7113,9 +7371,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                             }
                         } else {
                             if (input_unsigned) {
-                                gen_helper_neon_shl_u64(cpu_V0, in, tmp64);
+                                gen_ushl_i64(cpu_V0, in, tmp64);
                             } else {
-                                gen_helper_neon_shl_s64(cpu_V0, in, tmp64);
+                                gen_sshl_i64(cpu_V0, in, tmp64);
                             }
                         }
                         tmp = tcg_temp_new_i32();
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index dedef62403..9f8eee5611 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -1046,3 +1046,179 @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status, desc,
                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16));
 }
+
+void HELPER(gvec_sshl_b)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    int8_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz; ++i) {
+        int8_t mm = m[i];
+        int8_t nn = n[i];
+        int8_t res = 0;
+        if (mm >= 0) {
+            if (mm < 8) {
+                res = nn << mm;
+            }
+        } else {
+            res = nn >> (mm > -8 ? -mm : 7);
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_sshl_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    int16_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 2; ++i) {
+        int8_t mm = m[i];   /* only 8 bits of shift are significant */
+        int16_t nn = n[i];
+        int16_t res = 0;
+        if (mm >= 0) {
+            if (mm < 16) {
+                res = nn << mm;
+            }
+        } else {
+            res = nn >> (mm > -16 ? -mm : 15);
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_sshl_s)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    int32_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 4; ++i) {
+        int8_t mm = m[i];   /* only 8 bits of shift are significant */
+        int32_t nn = n[i];
+        int32_t res = 0;
+        if (mm >= 0) {
+            if (mm < 32) {
+                res = nn << mm;
+            }
+        } else {
+            res = nn >> (mm > -32 ? -mm : 31);
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_sshl_d)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    int64_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 8; ++i) {
+        int8_t mm = m[i];   /* only 8 bits of shift are significant */
+        int64_t nn = n[i];
+        int64_t res = 0;
+        if (mm >= 0) {
+            if (mm < 64) {
+                res = nn << mm;
+            }
+        } else {
+            res = nn >> (mm > -64 ? -mm : 63);
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_ushl_b)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint8_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz; ++i) {
+        int8_t mm = m[i];
+        uint8_t nn = n[i];
+        uint8_t res = 0;
+        if (mm >= 0) {
+            if (mm < 8) {
+                res = nn << mm;
+            }
+        } else {
+            if (mm > -8) {
+                res = nn >> -mm;
+            }
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_ushl_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint16_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 2; ++i) {
+        int8_t mm = m[i];   /* only 8 bits of shift are significant */
+        uint16_t nn = n[i];
+        uint16_t res = 0;
+        if (mm >= 0) {
+            if (mm < 16) {
+                res = nn << mm;
+            }
+        } else {
+            if (mm > -16) {
+                res = nn >> -mm;
+            }
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_ushl_s)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint32_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 4; ++i) {
+        int8_t mm = m[i];   /* only 8 bits of shift are significant */
+        uint32_t nn = n[i];
+        uint32_t res = 0;
+        if (mm >= 0) {
+            if (mm < 32) {
+                res = nn << mm;
+            }
+        } else {
+            if (mm > -32) {
+                res = nn >> -mm;
+            }
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_ushl_d)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint64_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 8; ++i) {
+        int8_t mm = m[i];   /* only 8 bits of shift are significant */
+        uint64_t nn = n[i];
+        uint64_t res = 0;
+        if (mm >= 0) {
+            if (mm < 64) {
+                res = nn << mm;
+            }
+        } else {
+            if (mm > -64) {
+                res = nn >> -mm;
+            }
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 33/38] tcg/aarch64: Do not advertise minmax for MO_64
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (31 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 32/38] target/arm: Vectorize USHL and SSHL Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 34/38] tcg: Do not recreate INDEX_op_neg_vec unless supported Richard Henderson
                   ` (7 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

The min/max instructions are not available for 64-bit elements.

Fixes: 93f332a50371
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 84d402acd8..e68e4de08c 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -2348,16 +2348,16 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_sssub_vec:
     case INDEX_op_usadd_vec:
     case INDEX_op_ussub_vec:
-    case INDEX_op_smax_vec:
-    case INDEX_op_smin_vec:
-    case INDEX_op_umax_vec:
-    case INDEX_op_umin_vec:
     case INDEX_op_shlv_vec:
         return 1;
     case INDEX_op_shrv_vec:
     case INDEX_op_sarv_vec:
         return -1;
     case INDEX_op_mul_vec:
+    case INDEX_op_smax_vec:
+    case INDEX_op_smin_vec:
+    case INDEX_op_umax_vec:
+    case INDEX_op_umin_vec:
         return vece < MO_64;
 
     default:
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 34/38] tcg: Do not recreate INDEX_op_neg_vec unless supported
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (32 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 33/38] tcg/aarch64: Do not advertise minmax for MO_64 Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 35/38] tcg: Introduce do_op3_nofail for vector expansion Richard Henderson
                   ` (6 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Use tcg_can_emit_vec_op instead of just TCG_TARGET_HAS_neg_vec,
so that we check the type and vece for the actual operation.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 5150c38a25..24faa06260 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -734,9 +734,13 @@ void tcg_optimize(TCGContext *s)
                 } else if (opc == INDEX_op_sub_i64) {
                     neg_op = INDEX_op_neg_i64;
                     have_neg = TCG_TARGET_HAS_neg_i64;
-                } else {
+                } else if (TCG_TARGET_HAS_neg_vec) {
+                    TCGType type = TCGOP_VECL(op) + TCG_TYPE_V64;
+                    unsigned vece = TCGOP_VECE(op);
                     neg_op = INDEX_op_neg_vec;
-                    have_neg = TCG_TARGET_HAS_neg_vec;
+                    have_neg = tcg_can_emit_vec_op(neg_op, type, vece) > 0;
+                } else {
+                    break;
                 }
                 if (!have_neg) {
                     break;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 35/38] tcg: Introduce do_op3_nofail for vector expansion
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (33 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 34/38] tcg: Do not recreate INDEX_op_neg_vec unless supported Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 36/38] tcg: Expand vector minmax using cmp+cmpsel Richard Henderson
                   ` (5 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

This makes do_op3 match do_op2 in allowing for failure,
and thus fall back expansions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op-vec.c | 45 +++++++++++++++++++++++++++------------------
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 7d8f7b490a..5868a51270 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -450,7 +450,7 @@ void tcg_gen_cmp_vec(TCGCond cond, unsigned vece,
     }
 }
 
-static void do_op3(unsigned vece, TCGv_vec r, TCGv_vec a,
+static bool do_op3(unsigned vece, TCGv_vec r, TCGv_vec a,
                    TCGv_vec b, TCGOpcode opc)
 {
     TCGTemp *rt = tcgv_vec_temp(r);
@@ -468,82 +468,91 @@ static void do_op3(unsigned vece, TCGv_vec r, TCGv_vec a,
     can = tcg_can_emit_vec_op(opc, type, vece);
     if (can > 0) {
         vec_gen_3(opc, type, vece, ri, ai, bi);
-    } else {
+    } else if (can < 0) {
         const TCGOpcode *hold_list = tcg_swap_vecop_list(NULL);
-        tcg_debug_assert(can < 0);
         tcg_expand_vec_op(opc, type, vece, ri, ai, bi);
         tcg_swap_vecop_list(hold_list);
+    } else {
+        return false;
     }
+    return true;
+}
+
+static void do_op3_nofail(unsigned vece, TCGv_vec r, TCGv_vec a,
+                          TCGv_vec b, TCGOpcode opc)
+{
+    bool ok = do_op3(vece, r, a, b, opc);
+    tcg_debug_assert(ok);
 }
 
 void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_add_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_add_vec);
 }
 
 void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_sub_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_sub_vec);
 }
 
 void tcg_gen_mul_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_mul_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_mul_vec);
 }
 
 void tcg_gen_ssadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_ssadd_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_ssadd_vec);
 }
 
 void tcg_gen_usadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_usadd_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_usadd_vec);
 }
 
 void tcg_gen_sssub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_sssub_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_sssub_vec);
 }
 
 void tcg_gen_ussub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_ussub_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_ussub_vec);
 }
 
 void tcg_gen_smin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_smin_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_smin_vec);
 }
 
 void tcg_gen_umin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_umin_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_umin_vec);
 }
 
 void tcg_gen_smax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_smax_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_smax_vec);
 }
 
 void tcg_gen_umax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_umax_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_umax_vec);
 }
 
 void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_shlv_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_shlv_vec);
 }
 
 void tcg_gen_shrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_shrv_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_shrv_vec);
 }
 
 void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_sarv_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_sarv_vec);
 }
 
 static void do_shifts(unsigned vece, TCGv_vec r, TCGv_vec a,
@@ -579,7 +588,7 @@ static void do_shifts(unsigned vece, TCGv_vec r, TCGv_vec a,
         } else {
             tcg_gen_dup_i32_vec(vece, vec_s, s);
         }
-        do_op3(vece, r, a, vec_s, opc_v);
+        do_op3_nofail(vece, r, a, vec_s, opc_v);
         tcg_temp_free_vec(vec_s);
     }
     tcg_swap_vecop_list(hold_list);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 36/38] tcg: Expand vector minmax using cmp+cmpsel
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (34 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 35/38] tcg: Introduce do_op3_nofail for vector expansion Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 37/38] tcg/aarch64: Use MVNI for expansion of dupi Richard Henderson
                   ` (4 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op-gvec.c |  8 ++++++++
 tcg/tcg-op-vec.c  | 19 +++++++++++++++----
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index e7029d26f4..dddb00719a 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -99,6 +99,14 @@ static bool tcg_can_emit_vecop_list(const TCGOpcode *list,
         case INDEX_op_cmpsel_vec:
             /* Fallback expansion uses only required logial ops.  */
             continue;
+        case INDEX_op_smin_vec:
+        case INDEX_op_smax_vec:
+        case INDEX_op_umin_vec:
+        case INDEX_op_umax_vec:
+            if (tcg_can_emit_vec_op(INDEX_op_cmp_vec, type, vece)) {
+                continue;
+            }
+            break;
         default:
             break;
         }
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 5868a51270..43abeb0674 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -520,24 +520,35 @@ void tcg_gen_ussub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
     do_op3_nofail(vece, r, a, b, INDEX_op_ussub_vec);
 }
 
+static void do_minmax(unsigned vece, TCGv_vec r, TCGv_vec a,
+                      TCGv_vec b, TCGOpcode opc, TCGCond cond)
+{
+    if (!do_op3(vece, r, a, b, opc)) {
+        TCGv_vec t = tcg_temp_new_vec_matching(r);
+        tcg_gen_cmp_vec(cond, vece, t, a, b);
+        tcg_gen_cmpsel_vec(vece, r, t, a, b);
+        tcg_temp_free_vec(t);
+    }
+}
+
 void tcg_gen_smin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3_nofail(vece, r, a, b, INDEX_op_smin_vec);
+    do_minmax(vece, r, a, b, INDEX_op_smin_vec, TCG_COND_GT);
 }
 
 void tcg_gen_umin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3_nofail(vece, r, a, b, INDEX_op_umin_vec);
+    do_minmax(vece, r, a, b, INDEX_op_umin_vec, TCG_COND_LTU);
 }
 
 void tcg_gen_smax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3_nofail(vece, r, a, b, INDEX_op_smax_vec);
+    do_minmax(vece, r, a, b, INDEX_op_smax_vec, TCG_COND_GT);
 }
 
 void tcg_gen_umax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3_nofail(vece, r, a, b, INDEX_op_umax_vec);
+    do_minmax(vece, r, a, b, INDEX_op_umax_vec, TCG_COND_GTU);
 }
 
 void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 37/38] tcg/aarch64: Use MVNI for expansion of dupi
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (35 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 36/38] tcg: Expand vector minmax using cmp+cmpsel Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 38/38] tcg/aarch64: Use ORRI and BICI for vector logical operations Richard Henderson
                   ` (3 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index e68e4de08c..20c8699f79 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -513,6 +513,7 @@ typedef enum {
 
     /* AdvSIMD modified immediate */
     I3606_MOVI      = 0x0f000400,
+    I3606_MVNI      = 0x2f000400,
 
     /* AdvSIMD shift by immediate */
     I3614_SSHR      = 0x0f000400,
@@ -823,6 +824,9 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
 
     if (is_fimm(v64, &op, &cmode, &imm8)) {
         tcg_out_insn(s, 3606, MOVI, type == TCG_TYPE_V128, rd, op, cmode, imm8);
+    } else if (is_fimm(~v64, &op, &cmode, &imm8)
+               && op == 0 && ((1 << cmode) & 0x03555)) {
+        tcg_out_insn(s, 3606, MVNI, type == TCG_TYPE_V128, rd, 0, cmode, imm8);
     } else if (type == TCG_TYPE_V128) {
         new_pool_l2(s, R_AARCH64_CONDBR19, s->code_ptr, 0, v64, v64);
         tcg_out_insn(s, 3305, LDR_v128, 0, rd);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCH 38/38] tcg/aarch64: Use ORRI and BICI for vector logical operations
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (36 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 37/38] tcg/aarch64: Use MVNI for expansion of dupi Richard Henderson
@ 2019-04-20  7:34 ` Richard Henderson
  2019-04-20  8:09 ` [Qemu-devel] [PATCH 00/38] tcg vector improvements no-reply
                   ` (2 subsequent siblings)
  40 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-20  7:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 77 ++++++++++++++++++++++++++++++------
 1 file changed, 65 insertions(+), 12 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 20c8699f79..1e79a60fb2 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -119,6 +119,8 @@ static inline bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 #define TCG_CT_CONST_LIMM 0x200
 #define TCG_CT_CONST_ZERO 0x400
 #define TCG_CT_CONST_MONE 0x800
+#define TCG_CT_CONST_PVI  0x1000
+#define TCG_CT_CONST_IVI  0x2000
 
 /* parse target specific constraints */
 static const char *target_parse_constraint(TCGArgConstraint *ct,
@@ -154,6 +156,12 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
     case 'M': /* minus one */
         ct->ct |= TCG_CT_CONST_MONE;
         break;
+    case 'P': /* vector positive immediate */
+        ct->ct |= TCG_CT_CONST_PVI;
+        break;
+    case 'I': /* vector inverted immediate */
+        ct->ct |= TCG_CT_CONST_IVI;
+        break;
     case 'Z': /* zero */
         ct->ct |= TCG_CT_CONST_ZERO;
         break;
@@ -294,6 +302,7 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
                                   const TCGArgConstraint *arg_ct)
 {
     int ct = arg_ct->ct;
+    int op, cmode, imm8;
 
     if (ct & TCG_CT_CONST) {
         return 1;
@@ -313,6 +322,16 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
     if ((ct & TCG_CT_CONST_MONE) && val == -1) {
         return 1;
     }
+    if ((ct & TCG_CT_CONST_PVI) &&
+        is_fimm(val, &op, &cmode, &imm8) &&
+        op == 0 && ((1 << cmode) & 0x555)) {
+        return 1;
+    }
+    if ((ct & TCG_CT_CONST_IVI) &&
+        is_fimm(~val, &op, &cmode, &imm8) &&
+        op == 0 && ((1 << cmode) & 0x555)) {
+        return 1;
+    }
 
     return 0;
 }
@@ -514,6 +533,8 @@ typedef enum {
     /* AdvSIMD modified immediate */
     I3606_MOVI      = 0x0f000400,
     I3606_MVNI      = 0x2f000400,
+    I3606_ORRI      = 0x0f001400,
+    I3606_BICI      = 0x2f001400,
 
     /* AdvSIMD shift by immediate */
     I3614_SSHR      = 0x0f000400,
@@ -2217,20 +2238,48 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         tcg_out_insn(s, 3617, ABS, is_q, vece, a0, a1);
         break;
     case INDEX_op_and_vec:
-        tcg_out_insn(s, 3616, AND, is_q, 0, a0, a1, a2);
+        if (const_args[2]) {
+            int op, cmode, imm8;
+            is_fimm(~a2, &op, &cmode, &imm8);
+            tcg_out_mov(s, type, a0, a1);
+            tcg_out_insn(s, 3606, BICI, is_q, a0, 0, cmode, imm8);
+        } else {
+            tcg_out_insn(s, 3616, AND, is_q, 0, a0, a1, a2);
+        }
         break;
     case INDEX_op_or_vec:
-        tcg_out_insn(s, 3616, ORR, is_q, 0, a0, a1, a2);
+        if (const_args[2]) {
+            int op, cmode, imm8;
+            is_fimm(a2, &op, &cmode, &imm8);
+            tcg_out_mov(s, type, a0, a1);
+            tcg_out_insn(s, 3606, ORRI, is_q, a0, 0, cmode, imm8);
+        } else {
+            tcg_out_insn(s, 3616, ORR, is_q, 0, a0, a1, a2);
+        }
+        break;
+    case INDEX_op_andc_vec:
+        if (const_args[2]) {
+            int op, cmode, imm8;
+            is_fimm(a2, &op, &cmode, &imm8);
+            tcg_out_mov(s, type, a0, a1);
+            tcg_out_insn(s, 3606, BICI, is_q, a0, 0, cmode, imm8);
+        } else {
+            tcg_out_insn(s, 3616, BIC, is_q, 0, a0, a1, a2);
+        }
+        break;
+    case INDEX_op_orc_vec:
+        if (const_args[2]) {
+            int op, cmode, imm8;
+            is_fimm(~a2, &op, &cmode, &imm8);
+            tcg_out_mov(s, type, a0, a1);
+            tcg_out_insn(s, 3606, ORRI, is_q, a0, 0, cmode, imm8);
+        } else {
+            tcg_out_insn(s, 3616, ORN, is_q, 0, a0, a1, a2);
+        }
         break;
     case INDEX_op_xor_vec:
         tcg_out_insn(s, 3616, EOR, is_q, 0, a0, a1, a2);
         break;
-    case INDEX_op_andc_vec:
-        tcg_out_insn(s, 3616, BIC, is_q, 0, a0, a1, a2);
-        break;
-    case INDEX_op_orc_vec:
-        tcg_out_insn(s, 3616, ORN, is_q, 0, a0, a1, a2);
-        break;
     case INDEX_op_ssadd_vec:
         tcg_out_insn(s, 3616, SQADD, is_q, vece, a0, a1, a2);
         break;
@@ -2413,6 +2462,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     static const TCGTargetOpDef lZ_l = { .args_ct_str = { "lZ", "l" } };
     static const TCGTargetOpDef r_r_r = { .args_ct_str = { "r", "r", "r" } };
     static const TCGTargetOpDef w_w_w = { .args_ct_str = { "w", "w", "w" } };
+    static const TCGTargetOpDef w_w_wI = { .args_ct_str = { "w", "w", "wI" } };
+    static const TCGTargetOpDef w_w_wP = { .args_ct_str = { "w", "w", "wP" } };
     static const TCGTargetOpDef w_w_wZ = { .args_ct_str = { "w", "w", "wZ" } };
     static const TCGTargetOpDef r_r_ri = { .args_ct_str = { "r", "r", "ri" } };
     static const TCGTargetOpDef r_r_rA = { .args_ct_str = { "r", "r", "rA" } };
@@ -2568,11 +2619,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_add_vec:
     case INDEX_op_sub_vec:
     case INDEX_op_mul_vec:
-    case INDEX_op_and_vec:
-    case INDEX_op_or_vec:
     case INDEX_op_xor_vec:
-    case INDEX_op_andc_vec:
-    case INDEX_op_orc_vec:
     case INDEX_op_ssadd_vec:
     case INDEX_op_sssub_vec:
     case INDEX_op_usadd_vec:
@@ -2586,6 +2633,12 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_sarv_vec:
     case INDEX_op_aa64_sshl_vec:
         return &w_w_w;
+    case INDEX_op_or_vec:
+    case INDEX_op_andc_vec:
+        return &w_w_wP;
+    case INDEX_op_and_vec:
+    case INDEX_op_orc_vec:
+        return &w_w_wI;
     case INDEX_op_not_vec:
     case INDEX_op_neg_vec:
     case INDEX_op_abs_vec:
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 00/38] tcg vector improvements
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (37 preceding siblings ...)
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 38/38] tcg/aarch64: Use ORRI and BICI for vector logical operations Richard Henderson
@ 2019-04-20  8:09 ` no-reply
  2019-04-23 19:15 ` David Hildenbrand
  2019-04-29 19:28 ` David Hildenbrand
  40 siblings, 0 replies; 69+ messages in thread
From: no-reply @ 2019-04-20  8:09 UTC (permalink / raw)
  To: richard.henderson; +Cc: fam, qemu-devel, david

Patchew URL: https://patchew.org/QEMU/20190420073442.7488-1-richard.henderson@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20190420073442.7488-1-richard.henderson@linaro.org
Subject: [Qemu-devel] [PATCH 00/38] tcg vector improvements

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]               patchew/20190420073442.7488-1-richard.henderson@linaro.org -> patchew/20190420073442.7488-1-richard.henderson@linaro.org
Switched to a new branch 'test'
eaace97609 tcg/aarch64: Use ORRI and BICI for vector logical operations
a701168d9c tcg/aarch64: Use MVNI for expansion of dupi
508dc19d39 tcg: Expand vector minmax using cmp+cmpsel
6976ef828e tcg: Introduce do_op3_nofail for vector expansion
2d6f5f7050 tcg: Do not recreate INDEX_op_neg_vec unless supported
7425b741da tcg/aarch64: Do not advertise minmax for MO_64
54ed8f1e33 target/arm: Vectorize USHL and SSHL
23ef118db8 target/ppc: Use vector variable shifts for VS{L, R, RA}{B, H, W, D}
65cae51501 tcg/aarch64: Support vector comparison select value
28392fb9a2 tcg/i386: Support vector comparison select value
a7efba49a2 tcg: Add support for vector comparison select
2aded2eb78 tcg/aarch64: Support vector absolute value
4537dd40fc tcg/i386: Support vector absolute value
f81e912312 target/xtensa: Use tcg_gen_abs_i32
3c3292af32 target/s390x: Use tcg_gen_abs_i64
e81e04de28 target/ppc: Use tcg_gen_abs_tl
1869b46302 target/cris: Use tcg_gen_abs_tl
74420abd9d target/arm: Use tcg_gen_abs_i64 and tcg_gen_gvec_abs
72e6fb61b1 tcg: Add support for vector absolute value
b6885d0400 tcg: Add support for integer absolute value
9c178a385e tcg/i386: Support vector scalar shift opcodes
8a79bb2407 tcg: Add gvec expanders for vector shift by scalar
c611d5ab1d tcg: Specify optional vector requirements with a list
2d22c62f7d tcg: Implement tcg_gen_gvec_3i()
3e06422a97 tcg/aarch64: Support vector variable shift opcodes
1d95e80f8b tcg/i386: Support vector variable shift opcodes
8f6ed1c661 tcg: Add gvec expanders for variable shift
8c573e5a9c tcg: Add INDEX_op_dup_mem_vec
ca9a67767b tcg/aarch64: Implement tcg_out_dupm_vec
24cd3652da tcg/i386: Implement tcg_out_dupm_vec
9a14d7cf98 tcg: Add tcg_out_dupm_vec to the backend interface
504eb1c2ce tcg: Manually expand INDEX_op_dup_vec
71309aeb54 tcg: Promote tcg_out_{dup, dupi}_vec to backend interface
24aee13663 tcg: Allow add_vec, sub_vec, neg_vec, not_vec to be expanded
0ccf0219e9 tcg: Support cross-class moves without instruction support
476aacd9e7 tcg: Return bool success from tcg_out_mov
41a8017efa tcg: Assert fixed_reg is read-only
d2482ee256 target/arm: Fill in .opc for cmtst_op

=== OUTPUT BEGIN ===
1/38 Checking commit d2482ee25652 (target/arm: Fill in .opc for cmtst_op)
2/38 Checking commit 41a8017efa97 (tcg: Assert fixed_reg is read-only)
WARNING: Block comments use a leading /* on a separate line
#103: FILE: tcg/tcg.c:3529:
+            /* temp value is modified, so the value kept in memory is

WARNING: Block comments use * on subsequent lines
#104: FILE: tcg/tcg.c:3530:
+            /* temp value is modified, so the value kept in memory is
+               potentially not the same */

WARNING: Block comments use a trailing */ on a separate line
#104: FILE: tcg/tcg.c:3530:
+               potentially not the same */

total: 0 errors, 3 warnings, 140 lines checked

Patch 2/38 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
3/38 Checking commit 476aacd9e75c (tcg: Return bool success from tcg_out_mov)
4/38 Checking commit 0ccf0219e9e3 (tcg: Support cross-class moves without instruction support)
WARNING: Block comments use a leading /* on a separate line
#24: FILE: tcg/tcg.c:3372:
+                /* Cross register class move not supported.

WARNING: Block comments use * on subsequent lines
#25: FILE: tcg/tcg.c:3373:
+                /* Cross register class move not supported.
+                   Store the source register into the destination slot

WARNING: Block comments use a trailing */ on a separate line
#26: FILE: tcg/tcg.c:3374:
+                   and leave the destination temp as TEMP_VAL_MEM.  */

WARNING: Block comments use a leading /* on a separate line
#44: FILE: tcg/tcg.c:3485:
+                /* Cross register class move not supported.  Sync the

WARNING: Block comments use * on subsequent lines
#45: FILE: tcg/tcg.c:3486:
+                /* Cross register class move not supported.  Sync the
+                   temp back to its slot and load from there.  */

WARNING: Block comments use a trailing */ on a separate line
#45: FILE: tcg/tcg.c:3486:
+                   temp back to its slot and load from there.  */

WARNING: Block comments use a leading /* on a separate line
#57: FILE: tcg/tcg.c:3648:
+                        /* Cross register class move not supported.  Sync the

WARNING: Block comments use * on subsequent lines
#58: FILE: tcg/tcg.c:3649:
+                        /* Cross register class move not supported.  Sync the
+                           temp back to its slot and load from there.  */

WARNING: Block comments use a trailing */ on a separate line
#58: FILE: tcg/tcg.c:3649:
+                           temp back to its slot and load from there.  */

total: 0 errors, 9 warnings, 43 lines checked

Patch 4/38 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
5/38 Checking commit 24aee1366381 (tcg: Allow add_vec, sub_vec, neg_vec, not_vec to be expanded)
6/38 Checking commit 71309aeb54f3 (tcg: Promote tcg_out_{dup, dupi}_vec to backend interface)
7/38 Checking commit 504eb1c2ce3e (tcg: Manually expand INDEX_op_dup_vec)
8/38 Checking commit 9a14d7cf98a8 (tcg: Add tcg_out_dupm_vec to the backend interface)
9/38 Checking commit 24cd3652da7f (tcg/i386: Implement tcg_out_dupm_vec)
10/38 Checking commit ca9a67767b3f (tcg/aarch64: Implement tcg_out_dupm_vec)
11/38 Checking commit 8c573e5a9cf7 (tcg: Add INDEX_op_dup_mem_vec)
WARNING: Block comments use a leading /* on a separate line
#96: FILE: tcg/tcg-op-gvec.c:400:
+        /* Recall that ARM SVE allows vector sizes that are not a

total: 0 errors, 1 warnings, 178 lines checked

Patch 11/38 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
12/38 Checking commit 8f6ed1c661ad (tcg: Add gvec expanders for variable shift)
13/38 Checking commit 1d95e80f8b9b (tcg/i386: Support vector variable shift opcodes)
14/38 Checking commit 3e06422a97e0 (tcg/aarch64: Support vector variable shift opcodes)
15/38 Checking commit 2d22c62f7d91 (tcg: Implement tcg_gen_gvec_3i())
16/38 Checking commit c611d5ab1dd7 (tcg: Specify optional vector requirements with a list)
17/38 Checking commit 8a79bb240796 (tcg: Add gvec expanders for vector shift by scalar)
18/38 Checking commit 9c178a385e74 (tcg/i386: Support vector scalar shift opcodes)
19/38 Checking commit b6885d0400cb (tcg: Add support for integer absolute value)
20/38 Checking commit 72e6fb61b122 (tcg: Add support for vector absolute value)
21/38 Checking commit 74420abd9dc1 (target/arm: Use tcg_gen_abs_i64 and tcg_gen_gvec_abs)
22/38 Checking commit 1869b46302aa (target/cris: Use tcg_gen_abs_tl)
23/38 Checking commit e81e04de2818 (target/ppc: Use tcg_gen_abs_tl)
24/38 Checking commit 3c3292af3271 (target/s390x: Use tcg_gen_abs_i64)
25/38 Checking commit f81e912312f6 (target/xtensa: Use tcg_gen_abs_i32)
26/38 Checking commit 4537dd40fc11 (tcg/i386: Support vector absolute value)
27/38 Checking commit 2aded2eb782d (tcg/aarch64: Support vector absolute value)
28/38 Checking commit a7efba49a2b5 (tcg: Add support for vector comparison select)
29/38 Checking commit 28392fb9a2f4 (tcg/i386: Support vector comparison select value)
30/38 Checking commit 65cae5150118 (tcg/aarch64: Support vector comparison select value)
31/38 Checking commit 23ef118db83a (target/ppc: Use vector variable shifts for VS{L, R, RA}{B, H, W, D})
32/38 Checking commit 54ed8f1e33a8 (target/arm: Vectorize USHL and SSHL)
ERROR: trailing statements should be on next line
#161: FILE: target/arm/translate.c:5288:
+            case 2: gen_ushl_i32(var, var, shift); break;

ERROR: trailing statements should be on next line
#168: FILE: target/arm/translate.c:5294:
+            case 2: gen_sshl_i32(var, var, shift); break;

total: 2 errors, 0 warnings, 650 lines checked

Patch 32/38 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

33/38 Checking commit 7425b741da09 (tcg/aarch64: Do not advertise minmax for MO_64)
34/38 Checking commit 2d6f5f70509e (tcg: Do not recreate INDEX_op_neg_vec unless supported)
35/38 Checking commit 6976ef828e8b (tcg: Introduce do_op3_nofail for vector expansion)
36/38 Checking commit 508dc19d3978 (tcg: Expand vector minmax using cmp+cmpsel)
37/38 Checking commit a701168d9cbe (tcg/aarch64: Use MVNI for expansion of dupi)
38/38 Checking commit eaace9760937 (tcg/aarch64: Use ORRI and BICI for vector logical operations)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20190420073442.7488-1-richard.henderson@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 03/38] tcg: Return bool success from tcg_out_mov
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 03/38] tcg: Return bool success from tcg_out_mov Richard Henderson
@ 2019-04-20 10:56   ` Philippe Mathieu-Daudé
  2019-04-23  8:27   ` David Hildenbrand
  1 sibling, 0 replies; 69+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-04-20 10:56 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: david

On 4/20/19 9:34 AM, Richard Henderson wrote:
> This patch merely changes the interface, aborting on all failures,
> of which there are currently none.
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

> ---
>  tcg/aarch64/tcg-target.inc.c |  5 +++--
>  tcg/arm/tcg-target.inc.c     |  7 +++++--
>  tcg/i386/tcg-target.inc.c    |  5 +++--
>  tcg/mips/tcg-target.inc.c    |  3 ++-
>  tcg/ppc/tcg-target.inc.c     |  3 ++-
>  tcg/riscv/tcg-target.inc.c   |  5 +++--
>  tcg/s390/tcg-target.inc.c    |  3 ++-
>  tcg/sparc/tcg-target.inc.c   |  3 ++-
>  tcg/tcg.c                    | 14 ++++++++++----
>  tcg/tci/tcg-target.inc.c     |  3 ++-
>  10 files changed, 34 insertions(+), 17 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index 8b93598bce..b2d3f9c0a5 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -938,10 +938,10 @@ static void tcg_out_ldst(TCGContext *s, AArch64Insn insn, TCGReg rd,
>      tcg_out_ldst_r(s, insn, rd, rn, TCG_TYPE_I64, TCG_REG_TMP);
>  }
>  
> -static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
> +static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>  {
>      if (ret == arg) {
> -        return;
> +        return true;
>      }
>      switch (type) {
>      case TCG_TYPE_I32:
> @@ -970,6 +970,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>      default:
>          g_assert_not_reached();
>      }
> +    return true;
>  }
>  
>  static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
> diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
> index 6873b0cf95..34e6652142 100644
> --- a/tcg/arm/tcg-target.inc.c
> +++ b/tcg/arm/tcg-target.inc.c
> @@ -2275,10 +2275,13 @@ static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
>      return false;
>  }
>  
> -static inline void tcg_out_mov(TCGContext *s, TCGType type,
> +static inline bool tcg_out_mov(TCGContext *s, TCGType type,
>                                 TCGReg ret, TCGReg arg)
>  {
> -    tcg_out_dat_reg(s, COND_AL, ARITH_MOV, ret, 0, arg, SHIFT_IMM_LSL(0));
> +    if (ret != arg) {
> +        tcg_out_dat_reg(s, COND_AL, ARITH_MOV, ret, 0, arg, SHIFT_IMM_LSL(0));
> +    }
> +    return true;
>  }
>  
>  static inline void tcg_out_movi(TCGContext *s, TCGType type,
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index 1fa833840e..817a167767 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -809,12 +809,12 @@ static inline void tgen_arithr(TCGContext *s, int subop, int dest, int src)
>      tcg_out_modrm(s, OPC_ARITH_GvEv + (subop << 3) + ext, dest, src);
>  }
>  
> -static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
> +static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>  {
>      int rexw = 0;
>  
>      if (arg == ret) {
> -        return;
> +        return true;
>      }
>      switch (type) {
>      case TCG_TYPE_I64:
> @@ -852,6 +852,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>      default:
>          g_assert_not_reached();
>      }
> +    return true;
>  }
>  
>  static void tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
> diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
> index 8a92e916dd..f31ebb43bf 100644
> --- a/tcg/mips/tcg-target.inc.c
> +++ b/tcg/mips/tcg-target.inc.c
> @@ -558,13 +558,14 @@ static inline void tcg_out_dsra(TCGContext *s, TCGReg rd, TCGReg rt, TCGArg sa)
>      tcg_out_opc_sa64(s, OPC_DSRA, OPC_DSRA32, rd, rt, sa);
>  }
>  
> -static inline void tcg_out_mov(TCGContext *s, TCGType type,
> +static inline bool tcg_out_mov(TCGContext *s, TCGType type,
>                                 TCGReg ret, TCGReg arg)
>  {
>      /* Simple reg-reg move, optimising out the 'do nothing' case */
>      if (ret != arg) {
>          tcg_out_opc_reg(s, OPC_OR, ret, arg, TCG_REG_ZERO);
>      }
> +    return true;
>  }
>  
>  static void tcg_out_movi(TCGContext *s, TCGType type,
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index 773690f1d9..ec8e336be8 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -566,12 +566,13 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
>  static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt,
>                               TCGReg base, tcg_target_long offset);
>  
> -static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
> +static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>  {
>      tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
>      if (ret != arg) {
>          tcg_out32(s, OR | SAB(arg, ret, arg));
>      }
> +    return true;
>  }
>  
>  static inline void tcg_out_rld(TCGContext *s, int op, TCGReg ra, TCGReg rs,
> diff --git a/tcg/riscv/tcg-target.inc.c b/tcg/riscv/tcg-target.inc.c
> index b785f4acb7..e2bf1c2c6e 100644
> --- a/tcg/riscv/tcg-target.inc.c
> +++ b/tcg/riscv/tcg-target.inc.c
> @@ -515,10 +515,10 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
>   * TCG intrinsics
>   */
>  
> -static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
> +static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>  {
>      if (ret == arg) {
> -        return;
> +        return true;
>      }
>      switch (type) {
>      case TCG_TYPE_I32:
> @@ -528,6 +528,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>      default:
>          g_assert_not_reached();
>      }
> +    return true;
>  }
>  
>  static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
> diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
> index 7db90b3bae..eb22188d1d 100644
> --- a/tcg/s390/tcg-target.inc.c
> +++ b/tcg/s390/tcg-target.inc.c
> @@ -548,7 +548,7 @@ static void tcg_out_sh32(TCGContext* s, S390Opcode op, TCGReg dest,
>      tcg_out_insn_RS(s, op, dest, sh_reg, 0, sh_imm);
>  }
>  
> -static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg dst, TCGReg src)
> +static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg dst, TCGReg src)
>  {
>      if (src != dst) {
>          if (type == TCG_TYPE_I32) {
> @@ -557,6 +557,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg dst, TCGReg src)
>              tcg_out_insn(s, RRE, LGR, dst, src);
>          }
>      }
> +    return true;
>  }
>  
>  static const S390Opcode lli_insns[4] = {
> diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
> index 7a61839dc1..83295955a7 100644
> --- a/tcg/sparc/tcg-target.inc.c
> +++ b/tcg/sparc/tcg-target.inc.c
> @@ -407,12 +407,13 @@ static void tcg_out_arithc(TCGContext *s, TCGReg rd, TCGReg rs1,
>                | (val2const ? INSN_IMM13(val2) : INSN_RS2(val2)));
>  }
>  
> -static inline void tcg_out_mov(TCGContext *s, TCGType type,
> +static inline bool tcg_out_mov(TCGContext *s, TCGType type,
>                                 TCGReg ret, TCGReg arg)
>  {
>      if (ret != arg) {
>          tcg_out_arith(s, ret, arg, TCG_REG_G0, ARITH_OR);
>      }
> +    return true;
>  }
>  
>  static inline void tcg_out_sethi(TCGContext *s, TCGReg ret, uint32_t arg)
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 4f77a957b0..b083faacd2 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -102,7 +102,7 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
>                                             const char *ct_str, TCGType type);
>  static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
>                         intptr_t arg2);
> -static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
> +static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
>  static void tcg_out_movi(TCGContext *s, TCGType type,
>                           TCGReg ret, tcg_target_long arg);
>  static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
> @@ -3372,7 +3372,9 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
>                                           allocated_regs, preferred_regs,
>                                           ots->indirect_base);
>              }
> -            tcg_out_mov(s, otype, ots->reg, ts->reg);
> +            if (!tcg_out_mov(s, otype, ots->reg, ts->reg)) {
> +                abort();
> +            }
>          }
>          ots->val_type = TEMP_VAL_REG;
>          ots->mem_coherent = 0;
> @@ -3472,7 +3474,9 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
>                        i_allocated_regs, 0);
>              reg = tcg_reg_alloc(s, arg_ct->u.regs, i_allocated_regs,
>                                  o_preferred_regs, ts->indirect_base);
> -            tcg_out_mov(s, ts->type, reg, ts->reg);
> +            if (!tcg_out_mov(s, ts->type, reg, ts->reg)) {
> +                abort();
> +            }
>          }
>          new_args[i] = reg;
>          const_args[i] = 0;
> @@ -3629,7 +3633,9 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
>              if (ts->val_type == TEMP_VAL_REG) {
>                  if (ts->reg != reg) {
>                      tcg_reg_free(s, reg, allocated_regs);
> -                    tcg_out_mov(s, ts->type, reg, ts->reg);
> +                    if (!tcg_out_mov(s, ts->type, reg, ts->reg)) {
> +                        abort();
> +                    }
>                  }
>              } else {
>                  TCGRegSet arg_set = 0;
> diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
> index 0015a98485..992d50cb1e 100644
> --- a/tcg/tci/tcg-target.inc.c
> +++ b/tcg/tci/tcg-target.inc.c
> @@ -509,7 +509,7 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
>      old_code_ptr[1] = s->code_ptr - old_code_ptr;
>  }
>  
> -static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
> +static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>  {
>      uint8_t *old_code_ptr = s->code_ptr;
>      tcg_debug_assert(ret != arg);
> @@ -521,6 +521,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>      tcg_out_r(s, ret);
>      tcg_out_r(s, arg);
>      old_code_ptr[1] = s->code_ptr - old_code_ptr;
> +    return true;
>  }
>  
>  static void tcg_out_movi(TCGContext *s, TCGType type,
> 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 01/38] target/arm: Fill in .opc for cmtst_op
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 01/38] target/arm: Fill in .opc for cmtst_op Richard Henderson
@ 2019-04-23  8:00   ` David Hildenbrand
  0 siblings, 0 replies; 69+ messages in thread
From: David Hildenbrand @ 2019-04-23  8:00 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 20.04.19 09:34, Richard Henderson wrote:
> This allows us to fall back to integers if the tcg backend
> does not support comparisons in the given vece.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index d408e4d7ef..13e2dc6562 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -6140,16 +6140,20 @@ static void gen_cmtst_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
>  const GVecGen3 cmtst_op[4] = {
>      { .fni4 = gen_helper_neon_tst_u8,
>        .fniv = gen_cmtst_vec,
> +      .opc = INDEX_op_cmp_vec,
>        .vece = MO_8 },
>      { .fni4 = gen_helper_neon_tst_u16,
>        .fniv = gen_cmtst_vec,
> +      .opc = INDEX_op_cmp_vec,
>        .vece = MO_16 },
>      { .fni4 = gen_cmtst_i32,
>        .fniv = gen_cmtst_vec,
> +      .opc = INDEX_op_cmp_vec,
>        .vece = MO_32 },
>      { .fni8 = gen_cmtst_i64,
>        .fniv = gen_cmtst_vec,
>        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
> +      .opc = INDEX_op_cmp_vec,
>        .vece = MO_64 },
>  };
>  
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 02/38] tcg: Assert fixed_reg is read-only
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 02/38] tcg: Assert fixed_reg is read-only Richard Henderson
@ 2019-04-23  8:03   ` David Hildenbrand
  0 siblings, 0 replies; 69+ messages in thread
From: David Hildenbrand @ 2019-04-23  8:03 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 20.04.19 09:34, Richard Henderson wrote:
> The only fixed_reg is cpu_env, and it should not be modified
> during any TB.  Therefore code that tries to special-case moves
> into a fixed_reg is dead.  Remove it.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/tcg.c | 85 +++++++++++++++++++++++++------------------------------
>  1 file changed, 38 insertions(+), 47 deletions(-)
> 
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index ade6050982..4f77a957b0 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -3279,11 +3279,8 @@ static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp *ots,
>                                    tcg_target_ulong val, TCGLifeData arg_life,
>                                    TCGRegSet preferred_regs)
>  {
> -    if (ots->fixed_reg) {
> -        /* For fixed registers, we do not do any constant propagation.  */
> -        tcg_out_movi(s, ots->type, ots->reg, val);
> -        return;
> -    }
> +    /* ENV should not be modified.  */
> +    tcg_debug_assert(!ots->fixed_reg);
>  
>      /* The movi is not explicitly generated here.  */
>      if (ots->val_type == TEMP_VAL_REG) {
> @@ -3319,6 +3316,9 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
>      ots = arg_temp(op->args[0]);
>      ts = arg_temp(op->args[1]);
>  
> +    /* ENV should not be modified.  */
> +    tcg_debug_assert(!ots->fixed_reg);
> +
>      /* Note that otype != itype for no-op truncation.  */
>      otype = ots->type;
>      itype = ts->type;
> @@ -3343,7 +3343,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
>      }
>  
>      tcg_debug_assert(ts->val_type == TEMP_VAL_REG);
> -    if (IS_DEAD_ARG(0) && !ots->fixed_reg) {
> +    if (IS_DEAD_ARG(0)) {
>          /* mov to a non-saved dead register makes no sense (even with
>             liveness analysis disabled). */
>          tcg_debug_assert(NEED_SYNC_ARG(0));
> @@ -3356,7 +3356,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
>          }
>          temp_dead(s, ots);
>      } else {
> -        if (IS_DEAD_ARG(1) && !ts->fixed_reg && !ots->fixed_reg) {
> +        if (IS_DEAD_ARG(1) && !ts->fixed_reg) {
>              /* the mov can be suppressed */
>              if (ots->val_type == TEMP_VAL_REG) {
>                  s->reg_to_temp[ots->reg] = NULL;
> @@ -3509,6 +3509,10 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
>              arg = op->args[i];
>              arg_ct = &def->args_ct[i];
>              ts = arg_temp(arg);
> +
> +            /* ENV should not be modified.  */
> +            tcg_debug_assert(!ts->fixed_reg);
> +
>              if ((arg_ct->ct & TCG_CT_ALIAS)
>                  && !const_args[arg_ct->alias_index]) {
>                  reg = new_args[arg_ct->alias_index];
> @@ -3517,29 +3521,19 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
>                                      i_allocated_regs | o_allocated_regs,
>                                      op->output_pref[k], ts->indirect_base);
>              } else {
> -                /* if fixed register, we try to use it */
> -                reg = ts->reg;
> -                if (ts->fixed_reg &&
> -                    tcg_regset_test_reg(arg_ct->u.regs, reg)) {
> -                    goto oarg_end;
> -                }
>                  reg = tcg_reg_alloc(s, arg_ct->u.regs, o_allocated_regs,
>                                      op->output_pref[k], ts->indirect_base);
>              }
>              tcg_regset_set_reg(o_allocated_regs, reg);
> -            /* if a fixed register is used, then a move will be done afterwards */
> -            if (!ts->fixed_reg) {
> -                if (ts->val_type == TEMP_VAL_REG) {
> -                    s->reg_to_temp[ts->reg] = NULL;
> -                }
> -                ts->val_type = TEMP_VAL_REG;
> -                ts->reg = reg;
> -                /* temp value is modified, so the value kept in memory is
> -                   potentially not the same */
> -                ts->mem_coherent = 0;
> -                s->reg_to_temp[reg] = ts;
> +            if (ts->val_type == TEMP_VAL_REG) {
> +                s->reg_to_temp[ts->reg] = NULL;
>              }
> -        oarg_end:
> +            ts->val_type = TEMP_VAL_REG;
> +            ts->reg = reg;
> +            /* temp value is modified, so the value kept in memory is
> +               potentially not the same */
> +            ts->mem_coherent = 0;
> +            s->reg_to_temp[reg] = ts;
>              new_args[i] = reg;
>          }
>      }
> @@ -3555,10 +3549,10 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
>      /* move the outputs in the correct register if needed */
>      for(i = 0; i < nb_oargs; i++) {
>          ts = arg_temp(op->args[i]);
> -        reg = new_args[i];
> -        if (ts->fixed_reg && ts->reg != reg) {
> -            tcg_out_mov(s, ts->type, ts->reg, reg);
> -        }
> +
> +        /* ENV should not be modified.  */
> +        tcg_debug_assert(!ts->fixed_reg);
> +
>          if (NEED_SYNC_ARG(i)) {
>              temp_sync(s, ts, o_allocated_regs, 0, IS_DEAD_ARG(i));
>          } else if (IS_DEAD_ARG(i)) {
> @@ -3679,26 +3673,23 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
>      for(i = 0; i < nb_oargs; i++) {
>          arg = op->args[i];
>          ts = arg_temp(arg);
> +
> +        /* ENV should not be modified.  */
> +        tcg_debug_assert(!ts->fixed_reg);
> +
>          reg = tcg_target_call_oarg_regs[i];
>          tcg_debug_assert(s->reg_to_temp[reg] == NULL);
> -
> -        if (ts->fixed_reg) {
> -            if (ts->reg != reg) {
> -                tcg_out_mov(s, ts->type, ts->reg, reg);
> -            }
> -        } else {
> -            if (ts->val_type == TEMP_VAL_REG) {
> -                s->reg_to_temp[ts->reg] = NULL;
> -            }
> -            ts->val_type = TEMP_VAL_REG;
> -            ts->reg = reg;
> -            ts->mem_coherent = 0;
> -            s->reg_to_temp[reg] = ts;
> -            if (NEED_SYNC_ARG(i)) {
> -                temp_sync(s, ts, allocated_regs, 0, IS_DEAD_ARG(i));
> -            } else if (IS_DEAD_ARG(i)) {
> -                temp_dead(s, ts);
> -            }
> +        if (ts->val_type == TEMP_VAL_REG) {
> +            s->reg_to_temp[ts->reg] = NULL;
> +        }
> +        ts->val_type = TEMP_VAL_REG;
> +        ts->reg = reg;
> +        ts->mem_coherent = 0;
> +        s->reg_to_temp[reg] = ts;
> +        if (NEED_SYNC_ARG(i)) {
> +            temp_sync(s, ts, allocated_regs, 0, IS_DEAD_ARG(i));
> +        } else if (IS_DEAD_ARG(i)) {
> +            temp_dead(s, ts);
>          }
>      }
>  }
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 03/38] tcg: Return bool success from tcg_out_mov
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 03/38] tcg: Return bool success from tcg_out_mov Richard Henderson
  2019-04-20 10:56   ` Philippe Mathieu-Daudé
@ 2019-04-23  8:27   ` David Hildenbrand
  1 sibling, 0 replies; 69+ messages in thread
From: David Hildenbrand @ 2019-04-23  8:27 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 20.04.19 09:34, Richard Henderson wrote:
> This patch merely changes the interface, aborting on all failures,
> of which there are currently none.
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/aarch64/tcg-target.inc.c |  5 +++--
>  tcg/arm/tcg-target.inc.c     |  7 +++++--
>  tcg/i386/tcg-target.inc.c    |  5 +++--
>  tcg/mips/tcg-target.inc.c    |  3 ++-
>  tcg/ppc/tcg-target.inc.c     |  3 ++-
>  tcg/riscv/tcg-target.inc.c   |  5 +++--
>  tcg/s390/tcg-target.inc.c    |  3 ++-
>  tcg/sparc/tcg-target.inc.c   |  3 ++-
>  tcg/tcg.c                    | 14 ++++++++++----
>  tcg/tci/tcg-target.inc.c     |  3 ++-
>  10 files changed, 34 insertions(+), 17 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index 8b93598bce..b2d3f9c0a5 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -938,10 +938,10 @@ static void tcg_out_ldst(TCGContext *s, AArch64Insn insn, TCGReg rd,
>      tcg_out_ldst_r(s, insn, rd, rn, TCG_TYPE_I64, TCG_REG_TMP);
>  }
>  
> -static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
> +static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>  {
>      if (ret == arg) {
> -        return;
> +        return true;
>      }
>      switch (type) {
>      case TCG_TYPE_I32:
> @@ -970,6 +970,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>      default:
>          g_assert_not_reached();
>      }
> +    return true;
>  }
>  
>  static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
> diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
> index 6873b0cf95..34e6652142 100644
> --- a/tcg/arm/tcg-target.inc.c
> +++ b/tcg/arm/tcg-target.inc.c
> @@ -2275,10 +2275,13 @@ static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
>      return false;
>  }
>  
> -static inline void tcg_out_mov(TCGContext *s, TCGType type,
> +static inline bool tcg_out_mov(TCGContext *s, TCGType type,
>                                 TCGReg ret, TCGReg arg)
>  {
> -    tcg_out_dat_reg(s, COND_AL, ARITH_MOV, ret, 0, arg, SHIFT_IMM_LSL(0));
> +    if (ret != arg) {
> +        tcg_out_dat_reg(s, COND_AL, ARITH_MOV, ret, 0, arg, SHIFT_IMM_LSL(0));
> +    }
> +    return true;
>  }
>  
>  static inline void tcg_out_movi(TCGContext *s, TCGType type,
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index 1fa833840e..817a167767 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -809,12 +809,12 @@ static inline void tgen_arithr(TCGContext *s, int subop, int dest, int src)
>      tcg_out_modrm(s, OPC_ARITH_GvEv + (subop << 3) + ext, dest, src);
>  }
>  
> -static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
> +static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>  {
>      int rexw = 0;
>  
>      if (arg == ret) {
> -        return;
> +        return true;
>      }
>      switch (type) {
>      case TCG_TYPE_I64:
> @@ -852,6 +852,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>      default:
>          g_assert_not_reached();
>      }
> +    return true;
>  }
>  
>  static void tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
> diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
> index 8a92e916dd..f31ebb43bf 100644
> --- a/tcg/mips/tcg-target.inc.c
> +++ b/tcg/mips/tcg-target.inc.c
> @@ -558,13 +558,14 @@ static inline void tcg_out_dsra(TCGContext *s, TCGReg rd, TCGReg rt, TCGArg sa)
>      tcg_out_opc_sa64(s, OPC_DSRA, OPC_DSRA32, rd, rt, sa);
>  }
>  
> -static inline void tcg_out_mov(TCGContext *s, TCGType type,
> +static inline bool tcg_out_mov(TCGContext *s, TCGType type,
>                                 TCGReg ret, TCGReg arg)
>  {
>      /* Simple reg-reg move, optimising out the 'do nothing' case */
>      if (ret != arg) {
>          tcg_out_opc_reg(s, OPC_OR, ret, arg, TCG_REG_ZERO);
>      }
> +    return true;
>  }
>  
>  static void tcg_out_movi(TCGContext *s, TCGType type,
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index 773690f1d9..ec8e336be8 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -566,12 +566,13 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
>  static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt,
>                               TCGReg base, tcg_target_long offset);
>  
> -static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
> +static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>  {
>      tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
>      if (ret != arg) {
>          tcg_out32(s, OR | SAB(arg, ret, arg));
>      }
> +    return true;
>  }
>  
>  static inline void tcg_out_rld(TCGContext *s, int op, TCGReg ra, TCGReg rs,
> diff --git a/tcg/riscv/tcg-target.inc.c b/tcg/riscv/tcg-target.inc.c
> index b785f4acb7..e2bf1c2c6e 100644
> --- a/tcg/riscv/tcg-target.inc.c
> +++ b/tcg/riscv/tcg-target.inc.c
> @@ -515,10 +515,10 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
>   * TCG intrinsics
>   */
>  
> -static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
> +static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>  {
>      if (ret == arg) {
> -        return;
> +        return true;
>      }
>      switch (type) {
>      case TCG_TYPE_I32:
> @@ -528,6 +528,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>      default:
>          g_assert_not_reached();
>      }
> +    return true;
>  }
>  
>  static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
> diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
> index 7db90b3bae..eb22188d1d 100644
> --- a/tcg/s390/tcg-target.inc.c
> +++ b/tcg/s390/tcg-target.inc.c
> @@ -548,7 +548,7 @@ static void tcg_out_sh32(TCGContext* s, S390Opcode op, TCGReg dest,
>      tcg_out_insn_RS(s, op, dest, sh_reg, 0, sh_imm);
>  }
>  
> -static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg dst, TCGReg src)
> +static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg dst, TCGReg src)
>  {
>      if (src != dst) {
>          if (type == TCG_TYPE_I32) {
> @@ -557,6 +557,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg dst, TCGReg src)
>              tcg_out_insn(s, RRE, LGR, dst, src);
>          }
>      }
> +    return true;
>  }
>  
>  static const S390Opcode lli_insns[4] = {
> diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
> index 7a61839dc1..83295955a7 100644
> --- a/tcg/sparc/tcg-target.inc.c
> +++ b/tcg/sparc/tcg-target.inc.c
> @@ -407,12 +407,13 @@ static void tcg_out_arithc(TCGContext *s, TCGReg rd, TCGReg rs1,
>                | (val2const ? INSN_IMM13(val2) : INSN_RS2(val2)));
>  }
>  
> -static inline void tcg_out_mov(TCGContext *s, TCGType type,
> +static inline bool tcg_out_mov(TCGContext *s, TCGType type,
>                                 TCGReg ret, TCGReg arg)
>  {
>      if (ret != arg) {
>          tcg_out_arith(s, ret, arg, TCG_REG_G0, ARITH_OR);
>      }
> +    return true;
>  }
>  
>  static inline void tcg_out_sethi(TCGContext *s, TCGReg ret, uint32_t arg)
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 4f77a957b0..b083faacd2 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -102,7 +102,7 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
>                                             const char *ct_str, TCGType type);
>  static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
>                         intptr_t arg2);
> -static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
> +static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
>  static void tcg_out_movi(TCGContext *s, TCGType type,
>                           TCGReg ret, tcg_target_long arg);
>  static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
> @@ -3372,7 +3372,9 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
>                                           allocated_regs, preferred_regs,
>                                           ots->indirect_base);
>              }
> -            tcg_out_mov(s, otype, ots->reg, ts->reg);
> +            if (!tcg_out_mov(s, otype, ots->reg, ts->reg)) {
> +                abort();
> +            }
>          }
>          ots->val_type = TEMP_VAL_REG;
>          ots->mem_coherent = 0;
> @@ -3472,7 +3474,9 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
>                        i_allocated_regs, 0);
>              reg = tcg_reg_alloc(s, arg_ct->u.regs, i_allocated_regs,
>                                  o_preferred_regs, ts->indirect_base);
> -            tcg_out_mov(s, ts->type, reg, ts->reg);
> +            if (!tcg_out_mov(s, ts->type, reg, ts->reg)) {
> +                abort();
> +            }
>          }
>          new_args[i] = reg;
>          const_args[i] = 0;
> @@ -3629,7 +3633,9 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
>              if (ts->val_type == TEMP_VAL_REG) {
>                  if (ts->reg != reg) {
>                      tcg_reg_free(s, reg, allocated_regs);
> -                    tcg_out_mov(s, ts->type, reg, ts->reg);
> +                    if (!tcg_out_mov(s, ts->type, reg, ts->reg)) {
> +                        abort();
> +                    }
>                  }
>              } else {
>                  TCGRegSet arg_set = 0;
> diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
> index 0015a98485..992d50cb1e 100644
> --- a/tcg/tci/tcg-target.inc.c
> +++ b/tcg/tci/tcg-target.inc.c
> @@ -509,7 +509,7 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
>      old_code_ptr[1] = s->code_ptr - old_code_ptr;
>  }
>  
> -static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
> +static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>  {
>      uint8_t *old_code_ptr = s->code_ptr;
>      tcg_debug_assert(ret != arg);
> @@ -521,6 +521,7 @@ static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>      tcg_out_r(s, ret);
>      tcg_out_r(s, arg);
>      old_code_ptr[1] = s->code_ptr - old_code_ptr;
> +    return true;
>  }
>  
>  static void tcg_out_movi(TCGContext *s, TCGType type,
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 04/38] tcg: Support cross-class moves without instruction support
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 04/38] tcg: Support cross-class moves without instruction support Richard Henderson
@ 2019-04-23  8:29   ` David Hildenbrand
  0 siblings, 0 replies; 69+ messages in thread
From: David Hildenbrand @ 2019-04-23  8:29 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 20.04.19 09:34, Richard Henderson wrote:
> PowerPC Altivec does not support direct moves between vector registers
> and general registers.  So when tcg_out_mov fails, we can use the
> backing memory for the temporary to perform the move.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/tcg.c | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index b083faacd2..d3dcfe3dca 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -3373,7 +3373,18 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp *op)
>                                           ots->indirect_base);
>              }
>              if (!tcg_out_mov(s, otype, ots->reg, ts->reg)) {
> -                abort();
> +                /* Cross register class move not supported.
> +                   Store the source register into the destination slot
> +                   and leave the destination temp as TEMP_VAL_MEM.  */
> +                assert(!ots->fixed_reg);
> +                if (!ts->mem_allocated) {
> +                    temp_allocate_frame(s, ots);
> +                }
> +                tcg_out_st(s, ts->type, ts->reg,
> +                           ots->mem_base->reg, ots->mem_offset);
> +                ots->mem_coherent = 1;
> +                temp_free_or_dead(s, ots, -1);
> +                return;
>              }
>          }
>          ots->val_type = TEMP_VAL_REG;
> @@ -3475,7 +3486,11 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
>              reg = tcg_reg_alloc(s, arg_ct->u.regs, i_allocated_regs,
>                                  o_preferred_regs, ts->indirect_base);
>              if (!tcg_out_mov(s, ts->type, reg, ts->reg)) {
> -                abort();
> +                /* Cross register class move not supported.  Sync the
> +                   temp back to its slot and load from there.  */
> +                temp_sync(s, ts, i_allocated_regs, 0, 0);
> +                tcg_out_ld(s, ts->type, reg,
> +                           ts->mem_base->reg, ts->mem_offset);
>              }
>          }
>          new_args[i] = reg;
> @@ -3634,7 +3649,11 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
>                  if (ts->reg != reg) {
>                      tcg_reg_free(s, reg, allocated_regs);
>                      if (!tcg_out_mov(s, ts->type, reg, ts->reg)) {
> -                        abort();
> +                        /* Cross register class move not supported.  Sync the
> +                           temp back to its slot and load from there.  */
> +                        temp_sync(s, ts, allocated_regs, 0, 0);
> +                        tcg_out_ld(s, ts->type, reg,
> +                                   ts->mem_base->reg, ts->mem_offset);
>                      }
>                  }
>              } else {
> 

Acked-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 05/38] tcg: Allow add_vec, sub_vec, neg_vec, not_vec to be expanded
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 05/38] tcg: Allow add_vec, sub_vec, neg_vec, not_vec to be expanded Richard Henderson
@ 2019-04-23  8:33   ` David Hildenbrand
  0 siblings, 0 replies; 69+ messages in thread
From: David Hildenbrand @ 2019-04-23  8:33 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 20.04.19 09:34, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/tcg-op-vec.c | 49 ++++++++++++++++++++++++++++++++----------------
>  1 file changed, 33 insertions(+), 16 deletions(-)
> 
> diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
> index 27f65600c3..cfb18682b1 100644
> --- a/tcg/tcg-op-vec.c
> +++ b/tcg/tcg-op-vec.c
> @@ -226,16 +226,6 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr b, TCGArg o, TCGType low_type)
>      vec_gen_3(INDEX_op_st_vec, low_type, 0, ri, bi, o);
>  }
>  
> -void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
> -{
> -    vec_gen_op3(INDEX_op_add_vec, vece, r, a, b);
> -}
> -
> -void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
> -{
> -    vec_gen_op3(INDEX_op_sub_vec, vece, r, a, b);
> -}
> -
>  void tcg_gen_and_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
>  {
>      vec_gen_op3(INDEX_op_and_vec, 0, r, a, b);
> @@ -296,11 +286,30 @@ void tcg_gen_eqv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
>      tcg_gen_not_vec(0, r, r);
>  }
>  
> +static bool do_op2(unsigned vece, TCGv_vec r, TCGv_vec a, TCGOpcode opc)
> +{
> +    TCGTemp *rt = tcgv_vec_temp(r);
> +    TCGTemp *at = tcgv_vec_temp(a);
> +    TCGArg ri = temp_arg(rt);
> +    TCGArg ai = temp_arg(at);
> +    TCGType type = rt->base_type;
> +    int can;
> +
> +    tcg_debug_assert(at->base_type >= type);
> +    can = tcg_can_emit_vec_op(opc, type, vece);
> +    if (can > 0) {
> +        vec_gen_2(opc, type, vece, ri, ai);
> +    } else if (can < 0) {
> +        tcg_expand_vec_op(opc, type, vece, ri, ai);
> +    } else {
> +        return false;
> +    }
> +    return true;
> +}

I'd do

    if (can > 0) {
       vec_gen_2(opc, type, vece, ri, ai);
       return true;
    } else if (can < 0) {
       tcg_expand_vec_op(opc, type, vece, ri, ai);
       return true;
    }
    return false;


Or less readable "return can != 0;"

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 19/38] tcg: Add support for integer absolute value
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 19/38] tcg: Add support for integer absolute value Richard Henderson
@ 2019-04-23  8:52   ` Philippe Mathieu-Daudé
  2019-04-23 18:37   ` David Hildenbrand
  1 sibling, 0 replies; 69+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-04-23  8:52 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: david

On 4/20/19 9:34 AM, Richard Henderson wrote:
> Remove a function of the same name from target/arm/.
> Use a branchless implementation of abs that gcc uses for x86.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/tcg-op.h           |  5 +++++
>  target/arm/translate.c | 10 ----------
>  tcg/tcg-op.c           | 20 ++++++++++++++++++++
>  3 files changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index 472b73cb38..660fe205d0 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -335,6 +335,7 @@ void tcg_gen_smin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_smax_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_umin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_umax_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
> +void tcg_gen_abs_i32(TCGv_i32, TCGv_i32);
>  
>  static inline void tcg_gen_discard_i32(TCGv_i32 arg)
>  {
> @@ -534,6 +535,7 @@ void tcg_gen_smin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_smax_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_umin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_umax_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
> +void tcg_gen_abs_i64(TCGv_i64, TCGv_i64);
>  
>  #if TCG_TARGET_REG_BITS == 64
>  static inline void tcg_gen_discard_i64(TCGv_i64 arg)
> @@ -973,6 +975,7 @@ void tcg_gen_nor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>  void tcg_gen_eqv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>  void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
>  void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
> +void tcg_gen_abs_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
>  void tcg_gen_ssadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>  void tcg_gen_usadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>  void tcg_gen_sssub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
> @@ -1019,6 +1022,7 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
>  #define tcg_gen_addi_tl tcg_gen_addi_i64
>  #define tcg_gen_sub_tl tcg_gen_sub_i64
>  #define tcg_gen_neg_tl tcg_gen_neg_i64
> +#define tcg_gen_abs_tl tcg_gen_abs_i64
>  #define tcg_gen_subfi_tl tcg_gen_subfi_i64
>  #define tcg_gen_subi_tl tcg_gen_subi_i64
>  #define tcg_gen_and_tl tcg_gen_and_i64
> @@ -1131,6 +1135,7 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
>  #define tcg_gen_addi_tl tcg_gen_addi_i32
>  #define tcg_gen_sub_tl tcg_gen_sub_i32
>  #define tcg_gen_neg_tl tcg_gen_neg_i32
> +#define tcg_gen_abs_tl tcg_gen_abs_i32
>  #define tcg_gen_subfi_tl tcg_gen_subfi_i32
>  #define tcg_gen_subi_tl tcg_gen_subi_i32
>  #define tcg_gen_and_tl tcg_gen_and_i32
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 83a008e945..721171794d 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -603,16 +603,6 @@ static void gen_sar(TCGv_i32 dest, TCGv_i32 t0, TCGv_i32 t1)
>      tcg_temp_free_i32(tmp1);
>  }
>  
> -static void tcg_gen_abs_i32(TCGv_i32 dest, TCGv_i32 src)
> -{
> -    TCGv_i32 c0 = tcg_const_i32(0);
> -    TCGv_i32 tmp = tcg_temp_new_i32();
> -    tcg_gen_neg_i32(tmp, src);
> -    tcg_gen_movcond_i32(TCG_COND_GT, dest, src, c0, src, tmp);
> -    tcg_temp_free_i32(c0);
> -    tcg_temp_free_i32(tmp);
> -}
> -
>  static void shifter_out_im(TCGv_i32 var, int shift)
>  {
>      if (shift == 0) {
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index a00d1df37e..0ac291f1c4 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -1091,6 +1091,16 @@ void tcg_gen_umax_i32(TCGv_i32 ret, TCGv_i32 a, TCGv_i32 b)
>      tcg_gen_movcond_i32(TCG_COND_LTU, ret, a, b, b, a);
>  }
>  
> +void tcg_gen_abs_i32(TCGv_i32 ret, TCGv_i32 a)
> +{
> +    TCGv_i32 t = tcg_temp_new_i32();
> +
> +    tcg_gen_sari_i32(t, a, 31);
> +    tcg_gen_xor_i32(ret, a, t);
> +    tcg_gen_sub_i32(ret, ret, t);
> +    tcg_temp_free_i32(t);
> +}
> +
>  /* 64-bit ops */
>  
>  #if TCG_TARGET_REG_BITS == 32
> @@ -2548,6 +2558,16 @@ void tcg_gen_umax_i64(TCGv_i64 ret, TCGv_i64 a, TCGv_i64 b)
>      tcg_gen_movcond_i64(TCG_COND_LTU, ret, a, b, b, a);
>  }
>  
> +void tcg_gen_abs_i64(TCGv_i64 ret, TCGv_i64 a)
> +{
> +    TCGv_i64 t = tcg_temp_new_i64();
> +
> +    tcg_gen_sari_i64(t, a, 63);
> +    tcg_gen_xor_i64(ret, a, t);
> +    tcg_gen_sub_i64(ret, ret, t);
> +    tcg_temp_free_i64(t);
> +}
> +
>  /* Size changing operations.  */
>  
>  void tcg_gen_extrl_i64_i32(TCGv_i32 ret, TCGv_i64 arg)
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 22/38] target/cris: Use tcg_gen_abs_tl
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 22/38] target/cris: Use tcg_gen_abs_tl Richard Henderson
@ 2019-04-23 10:09   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 69+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-04-23 10:09 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: david

On 4/20/19 9:34 AM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/cris/translate.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/target/cris/translate.c b/target/cris/translate.c
> index 11b2c11174..0374718f66 100644
> --- a/target/cris/translate.c
> +++ b/target/cris/translate.c
> @@ -1685,18 +1685,11 @@ static int dec_cmp_r(CPUCRISState *env, DisasContext *dc)
>  
>  static int dec_abs_r(CPUCRISState *env, DisasContext *dc)
>  {
> -    TCGv t0;
> -
>      LOG_DIS("abs $r%u, $r%u\n",
>              dc->op1, dc->op2);
>      cris_cc_mask(dc, CC_MASK_NZ);
>  
> -    t0 = tcg_temp_new();
> -    tcg_gen_sari_tl(t0, cpu_R[dc->op1], 31);
> -    tcg_gen_xor_tl(cpu_R[dc->op2], cpu_R[dc->op1], t0);
> -    tcg_gen_sub_tl(cpu_R[dc->op2], cpu_R[dc->op2], t0);
> -    tcg_temp_free(t0);
> -
> +    tcg_gen_abs_tl(cpu_R[dc->op2], cpu_R[dc->op1]);
>      cris_alu(dc, CC_OP_MOVE,
>              cpu_R[dc->op2], cpu_R[dc->op2], cpu_R[dc->op2], 4);
>      return 2;
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 20/38] tcg: Add support for vector absolute value
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 20/38] tcg: Add support for vector " Richard Henderson
@ 2019-04-23 18:35   ` David Hildenbrand
  0 siblings, 0 replies; 69+ messages in thread
From: David Hildenbrand @ 2019-04-23 18:35 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 20.04.19 09:34, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  accel/tcg/tcg-runtime.h      |  5 +++
>  tcg/aarch64/tcg-target.h     |  1 +
>  tcg/i386/tcg-target.h        |  1 +
>  tcg/tcg-op-gvec.h            |  2 +
>  tcg/tcg-opc.h                |  1 +
>  tcg/tcg.h                    |  1 +
>  accel/tcg/tcg-runtime-gvec.c | 48 ++++++++++++++++++++++++
>  tcg/tcg-op-gvec.c            | 71 ++++++++++++++++++++++++++++++++++++
>  tcg/tcg-op-vec.c             | 31 ++++++++++++++++
>  tcg/tcg.c                    |  2 +
>  tcg/README                   |  4 ++
>  11 files changed, 167 insertions(+)
> 
> diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
> index ed3ce5fd91..6d73dc2d65 100644
> --- a/accel/tcg/tcg-runtime.h
> +++ b/accel/tcg/tcg-runtime.h
> @@ -225,6 +225,11 @@ DEF_HELPER_FLAGS_3(gvec_neg16, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_3(gvec_neg32, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_3(gvec_neg64, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
>  
> +DEF_HELPER_FLAGS_3(gvec_abs8, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_3(gvec_abs16, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_3(gvec_abs32, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_3(gvec_abs64, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
> +
>  DEF_HELPER_FLAGS_3(gvec_not, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_4(gvec_and, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_4(gvec_or, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index f5640a229b..21d06d928c 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -132,6 +132,7 @@ typedef enum {
>  #define TCG_TARGET_HAS_orc_vec          1
>  #define TCG_TARGET_HAS_not_vec          1
>  #define TCG_TARGET_HAS_neg_vec          1
> +#define TCG_TARGET_HAS_abs_vec          0
>  #define TCG_TARGET_HAS_shi_vec          1
>  #define TCG_TARGET_HAS_shs_vec          0
>  #define TCG_TARGET_HAS_shv_vec          1
> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> index 618aa520d2..7445f05885 100644
> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -182,6 +182,7 @@ extern bool have_avx2;
>  #define TCG_TARGET_HAS_orc_vec          0
>  #define TCG_TARGET_HAS_not_vec          0
>  #define TCG_TARGET_HAS_neg_vec          0
> +#define TCG_TARGET_HAS_abs_vec          0
>  #define TCG_TARGET_HAS_shi_vec          1
>  #define TCG_TARGET_HAS_shs_vec          1
>  #define TCG_TARGET_HAS_shv_vec          have_avx2
> diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
> index f9c6058e92..46f58febbf 100644
> --- a/tcg/tcg-op-gvec.h
> +++ b/tcg/tcg-op-gvec.h
> @@ -228,6 +228,8 @@ void tcg_gen_gvec_not(unsigned vece, uint32_t dofs, uint32_t aofs,
>                        uint32_t oprsz, uint32_t maxsz);
>  void tcg_gen_gvec_neg(unsigned vece, uint32_t dofs, uint32_t aofs,
>                        uint32_t oprsz, uint32_t maxsz);
> +void tcg_gen_gvec_abs(unsigned vece, uint32_t dofs, uint32_t aofs,
> +                      uint32_t oprsz, uint32_t maxsz);
>  
>  void tcg_gen_gvec_add(unsigned vece, uint32_t dofs, uint32_t aofs,
>                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
> index 4bf71f261f..4a2dd116eb 100644
> --- a/tcg/tcg-opc.h
> +++ b/tcg/tcg-opc.h
> @@ -225,6 +225,7 @@ DEF(add_vec, 1, 2, 0, IMPLVEC)
>  DEF(sub_vec, 1, 2, 0, IMPLVEC)
>  DEF(mul_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_mul_vec))
>  DEF(neg_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_neg_vec))
> +DEF(abs_vec, 1, 1, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_abs_vec))
>  DEF(ssadd_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_sat_vec))
>  DEF(usadd_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_sat_vec))
>  DEF(sssub_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_sat_vec))
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 48d4d2e03e..986055fdfa 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -176,6 +176,7 @@ typedef uint64_t TCGRegSet;
>      && !defined(TCG_TARGET_HAS_v128) \
>      && !defined(TCG_TARGET_HAS_v256)
>  #define TCG_TARGET_MAYBE_vec            0
> +#define TCG_TARGET_HAS_abs_vec          0
>  #define TCG_TARGET_HAS_neg_vec          0
>  #define TCG_TARGET_HAS_not_vec          0
>  #define TCG_TARGET_HAS_andc_vec         0
> diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
> index 7b88f5590c..dd08095773 100644
> --- a/accel/tcg/tcg-runtime-gvec.c
> +++ b/accel/tcg/tcg-runtime-gvec.c
> @@ -398,6 +398,54 @@ void HELPER(gvec_neg64)(void *d, void *a, uint32_t desc)
>      clear_high(d, oprsz, desc);
>  }
>  
> +void HELPER(gvec_abs8)(void *d, void *a, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(int8_t)) {
> +        int8_t aa = *(int8_t *)(a + i);
> +        *(int8_t *)(d + i) = aa < 0 ? -aa : aa;
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
> +void HELPER(gvec_abs16)(void *d, void *a, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(int16_t)) {
> +        int16_t aa = *(int16_t *)(a + i);
> +        *(int16_t *)(d + i) = aa < 0 ? -aa : aa;
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
> +void HELPER(gvec_abs32)(void *d, void *a, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(int32_t)) {
> +        int32_t aa = *(int32_t *)(a + i);
> +        *(int32_t *)(d + i) = aa < 0 ? -aa : aa;
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
> +void HELPER(gvec_abs64)(void *d, void *a, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(int64_t)) {
> +        int64_t aa = *(int64_t *)(a + i);
> +        *(int64_t *)(d + i) = aa < 0 ? -aa : aa;
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
>  void HELPER(gvec_mov)(void *d, void *a, uint32_t desc)
>  {
>      intptr_t oprsz = simd_oprsz(desc);
> diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
> index 4eb0747ddd..87d5a01cc9 100644
> --- a/tcg/tcg-op-gvec.c
> +++ b/tcg/tcg-op-gvec.c
> @@ -88,6 +88,14 @@ static bool tcg_can_emit_vecop_list(const TCGOpcode *list,
>                  continue;
>              }
>              break;
> +        case INDEX_op_abs_vec:
> +            if (tcg_can_emit_vec_op(INDEX_op_sub_vec, type, vece)
> +                && (tcg_can_emit_vec_op(INDEX_op_smax_vec, type, vece) > 0
> +                    || tcg_can_emit_vec_op(INDEX_op_sari_vec, type, vece) > 0
> +                    || tcg_can_emit_vec_op(INDEX_op_cmp_vec, type, vece))) {
> +                continue;
> +            }
> +            break;
>          default:
>              break;
>          }
> @@ -2239,6 +2247,69 @@ void tcg_gen_gvec_neg(unsigned vece, uint32_t dofs, uint32_t aofs,
>      tcg_gen_gvec_2(dofs, aofs, oprsz, maxsz, &g[vece]);
>  }
>  
> +static void gen_absv_mask(TCGv_i64 d, TCGv_i64 b, unsigned vece)
> +{
> +    TCGv_i64 t = tcg_temp_new_i64();
> +    int nbit = 8 << vece;
> +
> +    /* Create -1 for each negative element.  */
> +    tcg_gen_shri_i64(t, b, nbit - 1);
> +    tcg_gen_andi_i64(t, t, dup_const(vece, 1));
> +    tcg_gen_muli_i64(t, t, (1 << nbit) - 1);
> +
> +    /*
> +     * Invert (via xor -1) and add one (via sub -1).
> +     * Because of the ordering the msb is cleared,
> +     * so we never have carry into the next element.
> +     */
> +    tcg_gen_xor_i64(d, b, t);
> +    tcg_gen_sub_i64(d, d, t);
> +
> +    tcg_temp_free_i64(t);
> +}
> +
> +static void tcg_gen_vec_abs8_i64(TCGv_i64 d, TCGv_i64 b)
> +{
> +    gen_absv_mask(d, b, MO_8);
> +}
> +
> +static void tcg_gen_vec_abs16_i64(TCGv_i64 d, TCGv_i64 b)
> +{
> +    gen_absv_mask(d, b, MO_16);
> +}
> +
> +void tcg_gen_gvec_abs(unsigned vece, uint32_t dofs, uint32_t aofs,
> +                      uint32_t oprsz, uint32_t maxsz)
> +{
> +    static const TCGOpcode vecop_list[] = { INDEX_op_abs_vec, 0 };
> +    static const GVecGen2 g[4] = {
> +        { .fni8 = tcg_gen_vec_abs8_i64,
> +          .fniv = tcg_gen_abs_vec,
> +          .fno = gen_helper_gvec_abs8,
> +          .opt_opc = vecop_list,
> +          .vece = MO_8 },
> +        { .fni8 = tcg_gen_vec_abs16_i64,
> +          .fniv = tcg_gen_abs_vec,
> +          .fno = gen_helper_gvec_abs16,
> +          .opt_opc = vecop_list,
> +          .vece = MO_16 },
> +        { .fni4 = tcg_gen_abs_i32,
> +          .fniv = tcg_gen_abs_vec,
> +          .fno = gen_helper_gvec_abs32,
> +          .opt_opc = vecop_list,
> +          .vece = MO_32 },
> +        { .fni8 = tcg_gen_abs_i64,
> +          .fniv = tcg_gen_abs_vec,
> +          .fno = gen_helper_gvec_abs64,
> +          .opt_opc = vecop_list,
> +          .prefer_i64 = TCG_TARGET_REG_BITS == 64,
> +          .vece = MO_64 },
> +    };
> +
> +    tcg_debug_assert(vece <= MO_64);
> +    tcg_gen_gvec_2(dofs, aofs, oprsz, maxsz, &g[vece]);
> +}
> +
>  void tcg_gen_gvec_and(unsigned vece, uint32_t dofs, uint32_t aofs,
>                        uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
>  {
> diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
> index 5ec8e0b1a0..b7f21145bb 100644
> --- a/tcg/tcg-op-vec.c
> +++ b/tcg/tcg-op-vec.c
> @@ -345,6 +345,37 @@ void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a)
>      tcg_swap_vecop_list(hold_list);
>  }
>  
> +void tcg_gen_abs_vec(unsigned vece, TCGv_vec r, TCGv_vec a)
> +{
> +    const TCGOpcode *hold_list;
> +
> +    tcg_assert_listed_vecop(INDEX_op_abs_vec);
> +    hold_list = tcg_swap_vecop_list(NULL);
> +
> +    if (!do_op2(vece, r, a, INDEX_op_abs_vec)) {
> +        TCGType type = tcgv_vec_temp(r)->base_type;
> +        TCGv_vec t = tcg_temp_new_vec(type);
> +
> +        tcg_debug_assert(tcg_can_emit_vec_op(INDEX_op_sub_vec, type, vece));
> +        if (tcg_can_emit_vec_op(INDEX_op_smax_vec, type, vece) > 0) {
> +            tcg_gen_neg_vec(vece, t, a);
> +            tcg_gen_smax_vec(vece, r, a, t);
> +        } else {
> +            if (tcg_can_emit_vec_op(INDEX_op_sari_vec, type, vece) > 0) {
> +                tcg_gen_sari_vec(vece, t, a, (8 << vece) - 1);
> +            } else {
> +                do_dupi_vec(t, MO_REG, 0);
> +                tcg_gen_cmp_vec(TCG_COND_LT, vece, t, a, t);
> +            }
> +            tcg_gen_xor_vec(vece, r, a, t);
> +            tcg_gen_sub_vec(vece, r, r, t);
> +        }
> +
> +        tcg_temp_free_vec(t);
> +    }
> +    tcg_swap_vecop_list(hold_list);
> +}
> +
>  static void do_shifti(TCGOpcode opc, unsigned vece,
>                        TCGv_vec r, TCGv_vec a, int64_t i)
>  {
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 0b0b228bb5..86a95a636b 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1621,6 +1621,8 @@ bool tcg_op_supported(TCGOpcode op)
>          return have_vec && TCG_TARGET_HAS_not_vec;
>      case INDEX_op_neg_vec:
>          return have_vec && TCG_TARGET_HAS_neg_vec;
> +    case INDEX_op_abs_vec:
> +        return have_vec && TCG_TARGET_HAS_abs_vec;
>      case INDEX_op_andc_vec:
>          return have_vec && TCG_TARGET_HAS_andc_vec;
>      case INDEX_op_orc_vec:
> diff --git a/tcg/README b/tcg/README
> index c30e5418a6..cbdfd3b6bc 100644
> --- a/tcg/README
> +++ b/tcg/README
> @@ -561,6 +561,10 @@ E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32.
>  
>    Similarly, v0 = -v1.
>  
> +* abs_vec   v0, v1
> +
> +  Similarly, v0 = v1 < 0 ? -v1 : v1, in elements across the vector.
> +
>  * smin_vec:
>  * umin_vec:
>  
> 

Thanks, using this for VECTOR LOAD POSITIVE on s390x.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 19/38] tcg: Add support for integer absolute value
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 19/38] tcg: Add support for integer absolute value Richard Henderson
  2019-04-23  8:52   ` Philippe Mathieu-Daudé
@ 2019-04-23 18:37   ` David Hildenbrand
  2019-04-23 22:09     ` Philippe Mathieu-Daudé
  1 sibling, 1 reply; 69+ messages in thread
From: David Hildenbrand @ 2019-04-23 18:37 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 20.04.19 09:34, Richard Henderson wrote:
> Remove a function of the same name from target/arm/.
> Use a branchless implementation of abs that gcc uses for x86.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/tcg-op.h           |  5 +++++
>  target/arm/translate.c | 10 ----------
>  tcg/tcg-op.c           | 20 ++++++++++++++++++++
>  3 files changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index 472b73cb38..660fe205d0 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -335,6 +335,7 @@ void tcg_gen_smin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_smax_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_umin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_umax_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
> +void tcg_gen_abs_i32(TCGv_i32, TCGv_i32);
>  
>  static inline void tcg_gen_discard_i32(TCGv_i32 arg)
>  {
> @@ -534,6 +535,7 @@ void tcg_gen_smin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_smax_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_umin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_umax_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
> +void tcg_gen_abs_i64(TCGv_i64, TCGv_i64);
>  
>  #if TCG_TARGET_REG_BITS == 64
>  static inline void tcg_gen_discard_i64(TCGv_i64 arg)
> @@ -973,6 +975,7 @@ void tcg_gen_nor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>  void tcg_gen_eqv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>  void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
>  void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
> +void tcg_gen_abs_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
>  void tcg_gen_ssadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>  void tcg_gen_usadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>  void tcg_gen_sssub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
> @@ -1019,6 +1022,7 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
>  #define tcg_gen_addi_tl tcg_gen_addi_i64
>  #define tcg_gen_sub_tl tcg_gen_sub_i64
>  #define tcg_gen_neg_tl tcg_gen_neg_i64
> +#define tcg_gen_abs_tl tcg_gen_abs_i64
>  #define tcg_gen_subfi_tl tcg_gen_subfi_i64
>  #define tcg_gen_subi_tl tcg_gen_subi_i64
>  #define tcg_gen_and_tl tcg_gen_and_i64
> @@ -1131,6 +1135,7 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
>  #define tcg_gen_addi_tl tcg_gen_addi_i32
>  #define tcg_gen_sub_tl tcg_gen_sub_i32
>  #define tcg_gen_neg_tl tcg_gen_neg_i32
> +#define tcg_gen_abs_tl tcg_gen_abs_i32
>  #define tcg_gen_subfi_tl tcg_gen_subfi_i32
>  #define tcg_gen_subi_tl tcg_gen_subi_i32
>  #define tcg_gen_and_tl tcg_gen_and_i32
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 83a008e945..721171794d 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -603,16 +603,6 @@ static void gen_sar(TCGv_i32 dest, TCGv_i32 t0, TCGv_i32 t1)
>      tcg_temp_free_i32(tmp1);
>  }
>  
> -static void tcg_gen_abs_i32(TCGv_i32 dest, TCGv_i32 src)
> -{
> -    TCGv_i32 c0 = tcg_const_i32(0);
> -    TCGv_i32 tmp = tcg_temp_new_i32();
> -    tcg_gen_neg_i32(tmp, src);
> -    tcg_gen_movcond_i32(TCG_COND_GT, dest, src, c0, src, tmp);
> -    tcg_temp_free_i32(c0);
> -    tcg_temp_free_i32(tmp);
> -}
> -
>  static void shifter_out_im(TCGv_i32 var, int shift)
>  {
>      if (shift == 0) {
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index a00d1df37e..0ac291f1c4 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -1091,6 +1091,16 @@ void tcg_gen_umax_i32(TCGv_i32 ret, TCGv_i32 a, TCGv_i32 b)
>      tcg_gen_movcond_i32(TCG_COND_LTU, ret, a, b, b, a);
>  }
>  
> +void tcg_gen_abs_i32(TCGv_i32 ret, TCGv_i32 a)
> +{
> +    TCGv_i32 t = tcg_temp_new_i32();
> +
> +    tcg_gen_sari_i32(t, a, 31);
> +    tcg_gen_xor_i32(ret, a, t);
> +    tcg_gen_sub_i32(ret, ret, t);
> +    tcg_temp_free_i32(t);
> +}
> +
>  /* 64-bit ops */
>  
>  #if TCG_TARGET_REG_BITS == 32
> @@ -2548,6 +2558,16 @@ void tcg_gen_umax_i64(TCGv_i64 ret, TCGv_i64 a, TCGv_i64 b)
>      tcg_gen_movcond_i64(TCG_COND_LTU, ret, a, b, b, a);
>  }
>  
> +void tcg_gen_abs_i64(TCGv_i64 ret, TCGv_i64 a)
> +{
> +    TCGv_i64 t = tcg_temp_new_i64();
> +
> +    tcg_gen_sari_i64(t, a, 63);
> +    tcg_gen_xor_i64(ret, a, t);
> +    tcg_gen_sub_i64(ret, ret, t);
> +    tcg_temp_free_i64(t);
> +}
> +
>  /* Size changing operations.  */
>  
>  void tcg_gen_extrl_i64_i32(TCGv_i32 ret, TCGv_i64 arg)
> 

Nice trick

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 24/38] target/s390x: Use tcg_gen_abs_i64
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 24/38] target/s390x: Use tcg_gen_abs_i64 Richard Henderson
@ 2019-04-23 18:40   ` David Hildenbrand
  2019-04-23 22:12   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 69+ messages in thread
From: David Hildenbrand @ 2019-04-23 18:40 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 20.04.19 09:34, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/s390x/translate.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/target/s390x/translate.c b/target/s390x/translate.c
> index 0afa8f7ca5..030129acbb 100644
> --- a/target/s390x/translate.c
> +++ b/target/s390x/translate.c
> @@ -1407,13 +1407,7 @@ static DisasJumpType help_branch(DisasContext *s, DisasCompare *c,
>  
>  static DisasJumpType op_abs(DisasContext *s, DisasOps *o)
>  {
> -    TCGv_i64 z, n;
> -    z = tcg_const_i64(0);
> -    n = tcg_temp_new_i64();
> -    tcg_gen_neg_i64(n, o->in2);
> -    tcg_gen_movcond_i64(TCG_COND_LT, o->out, o->in2, z, n, o->in2);
> -    tcg_temp_free_i64(n);
> -    tcg_temp_free_i64(z);
> +    tcg_gen_abs_i64(o->out, o->in2);
>      return DISAS_NEXT;
>  }
>  
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 17/38] tcg: Add gvec expanders for vector shift by scalar
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 17/38] tcg: Add gvec expanders for vector shift by scalar Richard Henderson
@ 2019-04-23 18:58   ` David Hildenbrand
  2019-04-23 19:21     ` Richard Henderson
  0 siblings, 1 reply; 69+ messages in thread
From: David Hildenbrand @ 2019-04-23 18:58 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 20.04.19 09:34, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/tcg-op-gvec.h |   7 ++
>  tcg/tcg-op.h      |   4 +
>  tcg/tcg-op-gvec.c | 210 ++++++++++++++++++++++++++++++++++++++++++++++
>  tcg/tcg-op-vec.c  |  54 ++++++++++++
>  4 files changed, 275 insertions(+)
> 
> diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
> index a0e0902f6c..f9c6058e92 100644
> --- a/tcg/tcg-op-gvec.h
> +++ b/tcg/tcg-op-gvec.h
> @@ -318,6 +318,13 @@ void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs,
>  void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
>                         int64_t shift, uint32_t oprsz, uint32_t maxsz);
>  
> +void tcg_gen_gvec_shls(unsigned vece, uint32_t dofs, uint32_t aofs,
> +                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
> +void tcg_gen_gvec_shrs(unsigned vece, uint32_t dofs, uint32_t aofs,
> +                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
> +void tcg_gen_gvec_sars(unsigned vece, uint32_t dofs, uint32_t aofs,
> +                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);

I assume all irrelevant bits of the shift have to be masked off by the
caller, right?

On s390x, I would use it for (one variant of) VECTOR ELEMENT SHIFT like
this:


+static DisasJumpType op_ves(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    const uint8_t d2 = get_field(s->fields, d2) &
+                       (NUM_VEC_ELEMENT_BITS(es) - 1);
+    const uint8_t v1 = get_field(s->fields, v1);
+    const uint8_t v3 = get_field(s->fields, v3);
+    TCGv_i32 shift;
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    shift = tcg_temp_new_i32();
+    tcg_gen_extrl_i64_i32(shift, o->addr1);
+    tcg_gen_andi_i32(shift, shift, NUM_VEC_ELEMENT_BITS(es) - 1);
+
+    switch (s->fields->op2) {
+    case 0x30:
+        if (likely(!get_field(s->fields, b2))) {
+            gen_gvec_fn_2i(shli, es, v1, v3, d2);
+        } else {
+            gen_gvec_fn_2s(shls, es, v1, v3, shift);
+        }
+        break;
+    case 0x3a:
+        if (likely(!get_field(s->fields, b2))) {
+            gen_gvec_fn_2i(sari, es, v1, v3, d2);
+        } else {
+            gen_gvec_fn_2s(sars, es, v1, v3, shift);
+        }
+        break;
+    case 0x38:
+        if (likely(!get_field(s->fields, b2))) {
+            gen_gvec_fn_2i(shri, es, v1, v3, d2);
+        } else {
+            gen_gvec_fn_2s(shrs, es, v1, v3, shift);
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    tcg_temp_free_i32(shift);
+    return DISAS_NEXT;
+}

Does it still make sense to special-case the const immediate case?

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 12/38] tcg: Add gvec expanders for variable shift
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 12/38] tcg: Add gvec expanders for variable shift Richard Henderson
@ 2019-04-23 19:04   ` David Hildenbrand
  2019-04-23 19:28     ` Richard Henderson
  0 siblings, 1 reply; 69+ messages in thread
From: David Hildenbrand @ 2019-04-23 19:04 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

In order to use this on s390x for VECTOR ELEMENT SHIFT, like

+static DisasJumpType op_vesv(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    const uint8_t v1 = get_field(s->fields, v1);
+    const uint8_t v2 = get_field(s->fields, v2);
+    const uint8_t v3 = get_field(s->fields, v3);
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    switch (s->fields->op2) {
+    case 0x70:
+        gen_gvec_fn_3(shlv, es, v1, v2, v3);
+        break;
+    case 0x7a:
+        gen_gvec_fn_3(sarv, es, v1, v2, v3);
+        break;
+    case 0x78:
+        gen_gvec_fn_3(shrv, es, v1, v2, v3);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    return DISAS_NEXT;
+}

We need to mask of invalid bits from the shift. Can that be added?


On 20.04.19 09:34, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  accel/tcg/tcg-runtime.h      |  15 ++++
>  tcg/tcg-op-gvec.h            |   7 ++
>  tcg/tcg-op.h                 |   4 ++
>  accel/tcg/tcg-runtime-gvec.c | 132 +++++++++++++++++++++++++++++++++++
>  tcg/tcg-op-gvec.c            |  87 +++++++++++++++++++++++
>  tcg/tcg-op-vec.c             |  15 ++++
>  6 files changed, 260 insertions(+)
> 
> diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
> index dfe325625c..ed3ce5fd91 100644
> --- a/accel/tcg/tcg-runtime.h
> +++ b/accel/tcg/tcg-runtime.h
> @@ -254,6 +254,21 @@ DEF_HELPER_FLAGS_3(gvec_sar16i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_3(gvec_sar32i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_3(gvec_sar64i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
>  
> +DEF_HELPER_FLAGS_4(gvec_shl8v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(gvec_shl16v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(gvec_shl32v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(gvec_shl64v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +
> +DEF_HELPER_FLAGS_4(gvec_shr8v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(gvec_shr16v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(gvec_shr32v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(gvec_shr64v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +
> +DEF_HELPER_FLAGS_4(gvec_sar8v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(gvec_sar16v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(gvec_sar32v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(gvec_sar64v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +
>  DEF_HELPER_FLAGS_4(gvec_eq8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_4(gvec_eq16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_4(gvec_eq32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
> index 850da32ded..1cd18a959a 100644
> --- a/tcg/tcg-op-gvec.h
> +++ b/tcg/tcg-op-gvec.h
> @@ -294,6 +294,13 @@ void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs,
>  void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
>                         int64_t shift, uint32_t oprsz, uint32_t maxsz);
>  
> +void tcg_gen_gvec_shlv(unsigned vece, uint32_t dofs, uint32_t aofs,
> +                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
> +void tcg_gen_gvec_shrv(unsigned vece, uint32_t dofs, uint32_t aofs,
> +                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
> +void tcg_gen_gvec_sarv(unsigned vece, uint32_t dofs, uint32_t aofs,
> +                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
> +
>  void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs,
>                        uint32_t aofs, uint32_t bofs,
>                        uint32_t oprsz, uint32_t maxsz);
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index 9fff9864f6..833c6330b5 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -986,6 +986,10 @@ void tcg_gen_shli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
>  void tcg_gen_shri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
>  void tcg_gen_sari_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
>  
> +void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
> +void tcg_gen_shrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
> +void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
> +
>  void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, TCGv_vec r,
>                       TCGv_vec a, TCGv_vec b);
>  
> diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
> index e2c6f24262..7b88f5590c 100644
> --- a/accel/tcg/tcg-runtime-gvec.c
> +++ b/accel/tcg/tcg-runtime-gvec.c
> @@ -725,6 +725,138 @@ void HELPER(gvec_sar64i)(void *d, void *a, uint32_t desc)
>      clear_high(d, oprsz, desc);
>  }
>  
> +void HELPER(gvec_shl8v)(void *d, void *a, void *b, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(uint8_t)) {
> +        *(uint8_t *)(d + i) = *(uint8_t *)(a + i) << *(uint8_t *)(b + i);
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
> +void HELPER(gvec_shl16v)(void *d, void *a, void *b, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(uint16_t)) {
> +        *(uint16_t *)(d + i) = *(uint16_t *)(a + i) << *(uint16_t *)(b + i);
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
> +void HELPER(gvec_shl32v)(void *d, void *a, void *b, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(uint32_t)) {
> +        *(uint32_t *)(d + i) = *(uint32_t *)(a + i) << *(uint32_t *)(b + i);
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
> +void HELPER(gvec_shl64v)(void *d, void *a, void *b, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
> +        *(uint64_t *)(d + i) = *(uint64_t *)(a + i) << *(uint64_t *)(b + i);
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
> +void HELPER(gvec_shr8v)(void *d, void *a, void *b, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(uint8_t)) {
> +        *(uint8_t *)(d + i) = *(uint8_t *)(a + i) >> *(uint8_t *)(b + i);
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
> +void HELPER(gvec_shr16v)(void *d, void *a, void *b, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(uint16_t)) {
> +        *(uint16_t *)(d + i) = *(uint16_t *)(a + i) >> *(uint16_t *)(b + i);
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
> +void HELPER(gvec_shr32v)(void *d, void *a, void *b, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(uint32_t)) {
> +        *(uint32_t *)(d + i) = *(uint32_t *)(a + i) >> *(uint32_t *)(b + i);
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
> +void HELPER(gvec_shr64v)(void *d, void *a, void *b, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
> +        *(uint64_t *)(d + i) = *(uint64_t *)(a + i) >> *(uint64_t *)(b + i);
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
> +void HELPER(gvec_sar8v)(void *d, void *a, void *b, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(vec8)) {
> +        *(int8_t *)(d + i) = *(int8_t *)(a + i) >> *(int8_t *)(b + i);
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
> +void HELPER(gvec_sar16v)(void *d, void *a, void *b, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(int16_t)) {
> +        *(int16_t *)(d + i) = *(int16_t *)(a + i) >> *(int16_t *)(b + i);
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
> +void HELPER(gvec_sar32v)(void *d, void *a, void *b, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(vec32)) {
> +        *(int32_t *)(d + i) = *(int32_t *)(a + i) >> *(int32_t *)(b + i);
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
> +void HELPER(gvec_sar64v)(void *d, void *a, void *b, uint32_t desc)
> +{
> +    intptr_t oprsz = simd_oprsz(desc);
> +    intptr_t i;
> +
> +    for (i = 0; i < oprsz; i += sizeof(vec64)) {
> +        *(int64_t *)(d + i) = *(int64_t *)(a + i) >> *(int64_t *)(b + i);
> +    }
> +    clear_high(d, oprsz, desc);
> +}
> +
>  /* If vectors are enabled, the compiler fills in -1 for true.
>     Otherwise, we must take care of this by hand.  */
>  #ifdef CONFIG_VECTOR16
> diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
> index f056018713..5d28184045 100644
> --- a/tcg/tcg-op-gvec.c
> +++ b/tcg/tcg-op-gvec.c
> @@ -2382,6 +2382,93 @@ void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
>      }
>  }
>  
> +void tcg_gen_gvec_shlv(unsigned vece, uint32_t dofs, uint32_t aofs,
> +                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
> +{
> +    static const GVecGen3 g[4] = {
> +        { .fniv = tcg_gen_shlv_vec,
> +          .fno = gen_helper_gvec_shl8v,
> +          .opc = INDEX_op_shlv_vec,
> +          .vece = MO_8 },
> +        { .fniv = tcg_gen_shlv_vec,
> +          .fno = gen_helper_gvec_shl16v,
> +          .opc = INDEX_op_shlv_vec,
> +          .vece = MO_16 },
> +        { .fni4 = tcg_gen_shl_i32,
> +          .fniv = tcg_gen_shlv_vec,
> +          .fno = gen_helper_gvec_shl32v,
> +          .opc = INDEX_op_shlv_vec,
> +          .vece = MO_32 },
> +        { .fni8 = tcg_gen_shl_i64,
> +          .fniv = tcg_gen_shlv_vec,
> +          .fno = gen_helper_gvec_shl64v,
> +          .opc = INDEX_op_shlv_vec,
> +          .prefer_i64 = TCG_TARGET_REG_BITS == 64,
> +          .vece = MO_64 },
> +    };
> +
> +    tcg_debug_assert(vece <= MO_64);
> +    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
> +}
> +
> +void tcg_gen_gvec_shrv(unsigned vece, uint32_t dofs, uint32_t aofs,
> +                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
> +{
> +    static const GVecGen3 g[4] = {
> +        { .fniv = tcg_gen_shrv_vec,
> +          .fno = gen_helper_gvec_shr8v,
> +          .opc = INDEX_op_shrv_vec,
> +          .vece = MO_8 },
> +        { .fniv = tcg_gen_shrv_vec,
> +          .fno = gen_helper_gvec_shr16v,
> +          .opc = INDEX_op_shrv_vec,
> +          .vece = MO_16 },
> +        { .fni4 = tcg_gen_shr_i32,
> +          .fniv = tcg_gen_shrv_vec,
> +          .fno = gen_helper_gvec_shr32v,
> +          .opc = INDEX_op_shrv_vec,
> +          .vece = MO_32 },
> +        { .fni8 = tcg_gen_shr_i64,
> +          .fniv = tcg_gen_shrv_vec,
> +          .fno = gen_helper_gvec_shr64v,
> +          .opc = INDEX_op_shrv_vec,
> +          .prefer_i64 = TCG_TARGET_REG_BITS == 64,
> +          .vece = MO_64 },
> +    };
> +
> +    tcg_debug_assert(vece <= MO_64);
> +    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
> +}
> +
> +void tcg_gen_gvec_sarv(unsigned vece, uint32_t dofs, uint32_t aofs,
> +                       uint32_t bofs, uint32_t oprsz, uint32_t maxsz)
> +{
> +    static const GVecGen3 g[4] = {
> +        { .fniv = tcg_gen_sarv_vec,
> +          .fno = gen_helper_gvec_sar8v,
> +          .opc = INDEX_op_sarv_vec,
> +          .vece = MO_8 },
> +        { .fniv = tcg_gen_sarv_vec,
> +          .fno = gen_helper_gvec_sar16v,
> +          .opc = INDEX_op_sarv_vec,
> +          .vece = MO_16 },
> +        { .fni4 = tcg_gen_sar_i32,
> +          .fniv = tcg_gen_sarv_vec,
> +          .fno = gen_helper_gvec_sar32v,
> +          .opc = INDEX_op_sarv_vec,
> +          .vece = MO_32 },
> +        { .fni8 = tcg_gen_sar_i64,
> +          .fniv = tcg_gen_sarv_vec,
> +          .fno = gen_helper_gvec_sar64v,
> +          .opc = INDEX_op_sarv_vec,
> +          .prefer_i64 = TCG_TARGET_REG_BITS == 64,
> +          .vece = MO_64 },
> +    };
> +
> +    tcg_debug_assert(vece <= MO_64);
> +    tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g[vece]);
> +}
> +
>  /* Expand OPSZ bytes worth of three-operand operations using i32 elements.  */
>  static void expand_cmp_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
>                             uint32_t oprsz, TCGCond cond)
> diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
> index ce7987b858..6601cb8a8f 100644
> --- a/tcg/tcg-op-vec.c
> +++ b/tcg/tcg-op-vec.c
> @@ -481,3 +481,18 @@ void tcg_gen_umax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
>  {
>      do_op3(vece, r, a, b, INDEX_op_umax_vec);
>  }
> +
> +void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
> +{
> +    do_op3(vece, r, a, b, INDEX_op_shlv_vec);
> +}
> +
> +void tcg_gen_shrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
> +{
> +    do_op3(vece, r, a, b, INDEX_op_shrv_vec);
> +}
> +
> +void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
> +{
> +    do_op3(vece, r, a, b, INDEX_op_sarv_vec);
> +}
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 00/38] tcg vector improvements
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (38 preceding siblings ...)
  2019-04-20  8:09 ` [Qemu-devel] [PATCH 00/38] tcg vector improvements no-reply
@ 2019-04-23 19:15 ` David Hildenbrand
  2019-04-23 20:26   ` Richard Henderson
  2019-04-29 19:28 ` David Hildenbrand
  40 siblings, 1 reply; 69+ messages in thread
From: David Hildenbrand @ 2019-04-23 19:15 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 20.04.19 09:34, Richard Henderson wrote:
> Based-on: tcg-next, which at present is only tcg_gen_extract2.
> 
> The dupm patches have been on list before, with a larger context
> of supporting tcg/ppc.  The rest of the set was written to support
> David's s390 vector patches.  In particular:
> 
> (1) Add vector absolute value.
> (2) Add vector shift by non-constant scalar.
> (3) Add vector shift by vector.
> (4) Add vector select.

Remind me, is this for VECTOR SELECT on s390x where we already added a
vector variant? At least VECTOR SELECT on s390x works on bit, not
element granularity.

> (5) Be more precise in handling target-specific vector expansions.
> 
> And then there's a set of bugs that I encountered while working
> on this across x86, aa64, and ppc hosts.  Tested primarily with
> aa64 as the guest, via RISU.
> 
> 
> r~


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 17/38] tcg: Add gvec expanders for vector shift by scalar
  2019-04-23 18:58   ` David Hildenbrand
@ 2019-04-23 19:21     ` Richard Henderson
  2019-04-23 21:05       ` David Hildenbrand
  0 siblings, 1 reply; 69+ messages in thread
From: Richard Henderson @ 2019-04-23 19:21 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel

On 4/23/19 11:58 AM, David Hildenbrand wrote:
>> +void tcg_gen_gvec_shls(unsigned vece, uint32_t dofs, uint32_t aofs,
>> +                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
>> +void tcg_gen_gvec_shrs(unsigned vece, uint32_t dofs, uint32_t aofs,
>> +                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
>> +void tcg_gen_gvec_sars(unsigned vece, uint32_t dofs, uint32_t aofs,
>> +                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
> 
> I assume all irrelevant bits of the shift have to be masked off by the
> caller, right?

Correct, just like for integers.

> 
> On s390x, I would use it for (one variant of) VECTOR ELEMENT SHIFT like
> this:
> 
> 
> +static DisasJumpType op_ves(DisasContext *s, DisasOps *o)
> +{
> +    const uint8_t es = get_field(s->fields, m4);
> +    const uint8_t d2 = get_field(s->fields, d2) &
> +                       (NUM_VEC_ELEMENT_BITS(es) - 1);
> +    const uint8_t v1 = get_field(s->fields, v1);
> +    const uint8_t v3 = get_field(s->fields, v3);
> +    TCGv_i32 shift;
> +
> +    if (es > ES_64) {
> +        gen_program_exception(s, PGM_SPECIFICATION);
> +        return DISAS_NORETURN;
> +    }
> +
> +    shift = tcg_temp_new_i32();
> +    tcg_gen_extrl_i64_i32(shift, o->addr1);
> +    tcg_gen_andi_i32(shift, shift, NUM_VEC_ELEMENT_BITS(es) - 1);
> +
> +    switch (s->fields->op2) {
> +    case 0x30:
> +        if (likely(!get_field(s->fields, b2))) {
> +            gen_gvec_fn_2i(shli, es, v1, v3, d2);
> +        } else {
> +            gen_gvec_fn_2s(shls, es, v1, v3, shift);
> +        }
> +        break;
> +    case 0x3a:
> +        if (likely(!get_field(s->fields, b2))) {
> +            gen_gvec_fn_2i(sari, es, v1, v3, d2);
> +        } else {
> +            gen_gvec_fn_2s(sars, es, v1, v3, shift);
> +        }
> +        break;
> +    case 0x38:
> +        if (likely(!get_field(s->fields, b2))) {
> +            gen_gvec_fn_2i(shri, es, v1, v3, d2);
> +        } else {
> +            gen_gvec_fn_2s(shrs, es, v1, v3, shift);
> +        }
> +        break;
> +    default:
> +        g_assert_not_reached();
> +    }
> +    tcg_temp_free_i32(shift);
> +    return DISAS_NEXT;
> +}

Looks plausible.  I might have hoisted the b2 == 0 check,
and avoid the other tcg arithmetic when unused.

> Does it still make sense to special-case the const immediate case?

Yes.  We cannot turn non-constant scalar shift into immediate shift, when it
can be shown that the scalar is constant.

x86 (and s390, obviously) has all 3 forms of shift.
aarch64 and powerpc are missing the scalar form, having
only the immediate and vector forms.

The expansion that we do when a form is missing may make it
very difficult to undo via constant propagation.


r~

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 12/38] tcg: Add gvec expanders for variable shift
  2019-04-23 19:04   ` David Hildenbrand
@ 2019-04-23 19:28     ` Richard Henderson
  2019-04-23 21:02       ` David Hildenbrand
  0 siblings, 1 reply; 69+ messages in thread
From: Richard Henderson @ 2019-04-23 19:28 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel

On 4/23/19 12:04 PM, David Hildenbrand wrote:
> In order to use this on s390x for VECTOR ELEMENT SHIFT, like
> 
> +static DisasJumpType op_vesv(DisasContext *s, DisasOps *o)
> +{
> +    const uint8_t es = get_field(s->fields, m4);
> +    const uint8_t v1 = get_field(s->fields, v1);
> +    const uint8_t v2 = get_field(s->fields, v2);
> +    const uint8_t v3 = get_field(s->fields, v3);
> +
> +    if (es > ES_64) {
> +        gen_program_exception(s, PGM_SPECIFICATION);
> +        return DISAS_NORETURN;
> +    }
> +
> +    switch (s->fields->op2) {
> +    case 0x70:
> +        gen_gvec_fn_3(shlv, es, v1, v2, v3);
> +        break;
> +    case 0x7a:
> +        gen_gvec_fn_3(sarv, es, v1, v2, v3);
> +        break;
> +    case 0x78:
> +        gen_gvec_fn_3(shrv, es, v1, v2, v3);
> +        break;
> +    default:
> +        g_assert_not_reached();
> +    }
> +
> +    return DISAS_NEXT;
> +}
> 
> We need to mask of invalid bits from the shift. Can that be added?

Yes, I do exactly this in patch 31 for target/ppc.


r~

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 00/38] tcg vector improvements
  2019-04-23 19:15 ` David Hildenbrand
@ 2019-04-23 20:26   ` Richard Henderson
  2019-04-23 20:31     ` David Hildenbrand
  0 siblings, 1 reply; 69+ messages in thread
From: Richard Henderson @ 2019-04-23 20:26 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel

On 4/23/19 12:15 PM, David Hildenbrand wrote:
> On 20.04.19 09:34, Richard Henderson wrote:
>> Based-on: tcg-next, which at present is only tcg_gen_extract2.
>>
>> The dupm patches have been on list before, with a larger context
>> of supporting tcg/ppc.  The rest of the set was written to support
>> David's s390 vector patches.  In particular:
>>
>> (1) Add vector absolute value.
>> (2) Add vector shift by non-constant scalar.
>> (3) Add vector shift by vector.
>> (4) Add vector select.
> 
> Remind me, is this for VECTOR SELECT on s390x where we already added a
> vector variant? At least VECTOR SELECT on s390x works on bit, not
> element granularity.


No, this was more for implementing _vec helpers, where we can
reasonably use element granularity (since that's all x86 has).

I thought about adding a bitsel alongside cmpsel, to work on
bits like this, but haven't yet.


r~

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 00/38] tcg vector improvements
  2019-04-23 20:26   ` Richard Henderson
@ 2019-04-23 20:31     ` David Hildenbrand
  0 siblings, 0 replies; 69+ messages in thread
From: David Hildenbrand @ 2019-04-23 20:31 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 23.04.19 22:26, Richard Henderson wrote:
> On 4/23/19 12:15 PM, David Hildenbrand wrote:
>> On 20.04.19 09:34, Richard Henderson wrote:
>>> Based-on: tcg-next, which at present is only tcg_gen_extract2.
>>>
>>> The dupm patches have been on list before, with a larger context
>>> of supporting tcg/ppc.  The rest of the set was written to support
>>> David's s390 vector patches.  In particular:
>>>
>>> (1) Add vector absolute value.
>>> (2) Add vector shift by non-constant scalar.
>>> (3) Add vector shift by vector.
>>> (4) Add vector select.
>>
>> Remind me, is this for VECTOR SELECT on s390x where we already added a
>> vector variant? At least VECTOR SELECT on s390x works on bit, not
>> element granularity.
> 
> 
> No, this was more for implementing _vec helpers, where we can
> reasonably use element granularity (since that's all x86 has).
> 
> I thought about adding a bitsel alongside cmpsel, to work on
> bits like this, but haven't yet.
> 

Makes sense, thanks!


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 12/38] tcg: Add gvec expanders for variable shift
  2019-04-23 19:28     ` Richard Henderson
@ 2019-04-23 21:02       ` David Hildenbrand
  2019-04-23 21:40         ` Richard Henderson
  0 siblings, 1 reply; 69+ messages in thread
From: David Hildenbrand @ 2019-04-23 21:02 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 23.04.19 21:28, Richard Henderson wrote:
> On 4/23/19 12:04 PM, David Hildenbrand wrote:
>> In order to use this on s390x for VECTOR ELEMENT SHIFT, like
>>
>> +static DisasJumpType op_vesv(DisasContext *s, DisasOps *o)
>> +{
>> +    const uint8_t es = get_field(s->fields, m4);
>> +    const uint8_t v1 = get_field(s->fields, v1);
>> +    const uint8_t v2 = get_field(s->fields, v2);
>> +    const uint8_t v3 = get_field(s->fields, v3);
>> +
>> +    if (es > ES_64) {
>> +        gen_program_exception(s, PGM_SPECIFICATION);
>> +        return DISAS_NORETURN;
>> +    }
>> +
>> +    switch (s->fields->op2) {
>> +    case 0x70:
>> +        gen_gvec_fn_3(shlv, es, v1, v2, v3);
>> +        break;
>> +    case 0x7a:
>> +        gen_gvec_fn_3(sarv, es, v1, v2, v3);
>> +        break;
>> +    case 0x78:
>> +        gen_gvec_fn_3(shrv, es, v1, v2, v3);
>> +        break;
>> +    default:
>> +        g_assert_not_reached();
>> +    }
>> +
>> +    return DISAS_NEXT;
>> +}
>>
>> We need to mask of invalid bits from the shift. Can that be added?
> 
> Yes, I do exactly this in patch 31 for target/ppc.
> 
> 
> r~
> 

Got it, so not via a generic gvec expansion. Thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 17/38] tcg: Add gvec expanders for vector shift by scalar
  2019-04-23 19:21     ` Richard Henderson
@ 2019-04-23 21:05       ` David Hildenbrand
  0 siblings, 0 replies; 69+ messages in thread
From: David Hildenbrand @ 2019-04-23 21:05 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 23.04.19 21:21, Richard Henderson wrote:
> On 4/23/19 11:58 AM, David Hildenbrand wrote:
>>> +void tcg_gen_gvec_shls(unsigned vece, uint32_t dofs, uint32_t aofs,
>>> +                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
>>> +void tcg_gen_gvec_shrs(unsigned vece, uint32_t dofs, uint32_t aofs,
>>> +                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
>>> +void tcg_gen_gvec_sars(unsigned vece, uint32_t dofs, uint32_t aofs,
>>> +                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
>>
>> I assume all irrelevant bits of the shift have to be masked off by the
>> caller, right?
> 
> Correct, just like for integers.
> 
>>
>> On s390x, I would use it for (one variant of) VECTOR ELEMENT SHIFT like
>> this:
>>
>>
>> +static DisasJumpType op_ves(DisasContext *s, DisasOps *o)
>> +{
>> +    const uint8_t es = get_field(s->fields, m4);
>> +    const uint8_t d2 = get_field(s->fields, d2) &
>> +                       (NUM_VEC_ELEMENT_BITS(es) - 1);
>> +    const uint8_t v1 = get_field(s->fields, v1);
>> +    const uint8_t v3 = get_field(s->fields, v3);
>> +    TCGv_i32 shift;
>> +
>> +    if (es > ES_64) {
>> +        gen_program_exception(s, PGM_SPECIFICATION);
>> +        return DISAS_NORETURN;
>> +    }
>> +
>> +    shift = tcg_temp_new_i32();
>> +    tcg_gen_extrl_i64_i32(shift, o->addr1);
>> +    tcg_gen_andi_i32(shift, shift, NUM_VEC_ELEMENT_BITS(es) - 1);
>> +
>> +    switch (s->fields->op2) {
>> +    case 0x30:
>> +        if (likely(!get_field(s->fields, b2))) {
>> +            gen_gvec_fn_2i(shli, es, v1, v3, d2);
>> +        } else {
>> +            gen_gvec_fn_2s(shls, es, v1, v3, shift);
>> +        }
>> +        break;
>> +    case 0x3a:
>> +        if (likely(!get_field(s->fields, b2))) {
>> +            gen_gvec_fn_2i(sari, es, v1, v3, d2);
>> +        } else {
>> +            gen_gvec_fn_2s(sars, es, v1, v3, shift);
>> +        }
>> +        break;
>> +    case 0x38:
>> +        if (likely(!get_field(s->fields, b2))) {
>> +            gen_gvec_fn_2i(shri, es, v1, v3, d2);
>> +        } else {
>> +            gen_gvec_fn_2s(shrs, es, v1, v3, shift);
>> +        }
>> +        break;
>> +    default:
>> +        g_assert_not_reached();
>> +    }
>> +    tcg_temp_free_i32(shift);
>> +    return DISAS_NEXT;
>> +}
> 
> Looks plausible.  I might have hoisted the b2 == 0 check,
> and avoid the other tcg arithmetic when unused.

Makes sense, will do. Thanks!


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 12/38] tcg: Add gvec expanders for variable shift
  2019-04-23 21:02       ` David Hildenbrand
@ 2019-04-23 21:40         ` Richard Henderson
  2019-04-23 21:57           ` David Hildenbrand
  0 siblings, 1 reply; 69+ messages in thread
From: Richard Henderson @ 2019-04-23 21:40 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel

On 4/23/19 2:02 PM, David Hildenbrand wrote:
> On 23.04.19 21:28, Richard Henderson wrote:
>> On 4/23/19 12:04 PM, David Hildenbrand wrote:
>>> In order to use this on s390x for VECTOR ELEMENT SHIFT, like
>>>
>>> +static DisasJumpType op_vesv(DisasContext *s, DisasOps *o)
>>> +{
>>> +    const uint8_t es = get_field(s->fields, m4);
>>> +    const uint8_t v1 = get_field(s->fields, v1);
>>> +    const uint8_t v2 = get_field(s->fields, v2);
>>> +    const uint8_t v3 = get_field(s->fields, v3);
>>> +
>>> +    if (es > ES_64) {
>>> +        gen_program_exception(s, PGM_SPECIFICATION);
>>> +        return DISAS_NORETURN;
>>> +    }
>>> +
>>> +    switch (s->fields->op2) {
>>> +    case 0x70:
>>> +        gen_gvec_fn_3(shlv, es, v1, v2, v3);
>>> +        break;
>>> +    case 0x7a:
>>> +        gen_gvec_fn_3(sarv, es, v1, v2, v3);
>>> +        break;
>>> +    case 0x78:
>>> +        gen_gvec_fn_3(shrv, es, v1, v2, v3);
>>> +        break;
>>> +    default:
>>> +        g_assert_not_reached();
>>> +    }
>>> +
>>> +    return DISAS_NEXT;
>>> +}
>>>
>>> We need to mask of invalid bits from the shift. Can that be added?
>>
>> Yes, I do exactly this in patch 31 for target/ppc.
>>
>>
>> r~
>>
> 
> Got it, so not via a generic gvec expansion. Thanks!

I do wonder if I *should* make the truncating expansion the
generic gvec expansion.  It would be usable from two targets
at least...

Because the one's that *don't* truncate in hardware will
still have to have their own custom expansion code.

Thoughts?


r~

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 12/38] tcg: Add gvec expanders for variable shift
  2019-04-23 21:40         ` Richard Henderson
@ 2019-04-23 21:57           ` David Hildenbrand
  0 siblings, 0 replies; 69+ messages in thread
From: David Hildenbrand @ 2019-04-23 21:57 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 23.04.19 23:40, Richard Henderson wrote:
> On 4/23/19 2:02 PM, David Hildenbrand wrote:
>> On 23.04.19 21:28, Richard Henderson wrote:
>>> On 4/23/19 12:04 PM, David Hildenbrand wrote:
>>>> In order to use this on s390x for VECTOR ELEMENT SHIFT, like
>>>>
>>>> +static DisasJumpType op_vesv(DisasContext *s, DisasOps *o)
>>>> +{
>>>> +    const uint8_t es = get_field(s->fields, m4);
>>>> +    const uint8_t v1 = get_field(s->fields, v1);
>>>> +    const uint8_t v2 = get_field(s->fields, v2);
>>>> +    const uint8_t v3 = get_field(s->fields, v3);
>>>> +
>>>> +    if (es > ES_64) {
>>>> +        gen_program_exception(s, PGM_SPECIFICATION);
>>>> +        return DISAS_NORETURN;
>>>> +    }
>>>> +
>>>> +    switch (s->fields->op2) {
>>>> +    case 0x70:
>>>> +        gen_gvec_fn_3(shlv, es, v1, v2, v3);
>>>> +        break;
>>>> +    case 0x7a:
>>>> +        gen_gvec_fn_3(sarv, es, v1, v2, v3);
>>>> +        break;
>>>> +    case 0x78:
>>>> +        gen_gvec_fn_3(shrv, es, v1, v2, v3);
>>>> +        break;
>>>> +    default:
>>>> +        g_assert_not_reached();
>>>> +    }
>>>> +
>>>> +    return DISAS_NEXT;
>>>> +}
>>>>
>>>> We need to mask of invalid bits from the shift. Can that be added?
>>>
>>> Yes, I do exactly this in patch 31 for target/ppc.
>>>
>>>
>>> r~
>>>
>>
>> Got it, so not via a generic gvec expansion. Thanks!
> 
> I do wonder if I *should* make the truncating expansion the
> generic gvec expansion.  It would be usable from two targets
> at least...
> 
> Because the one's that *don't* truncate in hardware will
> still have to have their own custom expansion code.
> 
> Thoughts?
> 

Exactly what I had in mind. I wonder if the same applies to all shift
helpers/expansions. Expected behavior (mask off bits) is always better
than unexpected behavior. But of course I might be wrong. At least, here
I would vote for doing the truncation in the generic gvec expansion.

> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 19/38] tcg: Add support for integer absolute value
  2019-04-23 18:37   ` David Hildenbrand
@ 2019-04-23 22:09     ` Philippe Mathieu-Daudé
  2019-04-23 22:29       ` Richard Henderson
  0 siblings, 1 reply; 69+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-04-23 22:09 UTC (permalink / raw)
  To: David Hildenbrand, Richard Henderson, qemu-devel, Edgar E. Iglesias

On 4/23/19 8:37 PM, David Hildenbrand wrote:
> On 20.04.19 09:34, Richard Henderson wrote:
>> Remove a function of the same name from target/arm/.
>> Use a branchless implementation of abs that gcc uses for x86.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  tcg/tcg-op.h           |  5 +++++
>>  target/arm/translate.c | 10 ----------
>>  tcg/tcg-op.c           | 20 ++++++++++++++++++++
>>  3 files changed, 25 insertions(+), 10 deletions(-)
>>
>> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
>> index 472b73cb38..660fe205d0 100644
>> --- a/tcg/tcg-op.h
>> +++ b/tcg/tcg-op.h
>> @@ -335,6 +335,7 @@ void tcg_gen_smin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
>>  void tcg_gen_smax_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
>>  void tcg_gen_umin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
>>  void tcg_gen_umax_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
>> +void tcg_gen_abs_i32(TCGv_i32, TCGv_i32);
>>  
>>  static inline void tcg_gen_discard_i32(TCGv_i32 arg)
>>  {
>> @@ -534,6 +535,7 @@ void tcg_gen_smin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
>>  void tcg_gen_smax_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
>>  void tcg_gen_umin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
>>  void tcg_gen_umax_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
>> +void tcg_gen_abs_i64(TCGv_i64, TCGv_i64);
>>  
>>  #if TCG_TARGET_REG_BITS == 64
>>  static inline void tcg_gen_discard_i64(TCGv_i64 arg)
>> @@ -973,6 +975,7 @@ void tcg_gen_nor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>>  void tcg_gen_eqv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>>  void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
>>  void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
>> +void tcg_gen_abs_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
>>  void tcg_gen_ssadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>>  void tcg_gen_usadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>>  void tcg_gen_sssub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>> @@ -1019,6 +1022,7 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
>>  #define tcg_gen_addi_tl tcg_gen_addi_i64
>>  #define tcg_gen_sub_tl tcg_gen_sub_i64
>>  #define tcg_gen_neg_tl tcg_gen_neg_i64
>> +#define tcg_gen_abs_tl tcg_gen_abs_i64
>>  #define tcg_gen_subfi_tl tcg_gen_subfi_i64
>>  #define tcg_gen_subi_tl tcg_gen_subi_i64
>>  #define tcg_gen_and_tl tcg_gen_and_i64
>> @@ -1131,6 +1135,7 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
>>  #define tcg_gen_addi_tl tcg_gen_addi_i32
>>  #define tcg_gen_sub_tl tcg_gen_sub_i32
>>  #define tcg_gen_neg_tl tcg_gen_neg_i32
>> +#define tcg_gen_abs_tl tcg_gen_abs_i32
>>  #define tcg_gen_subfi_tl tcg_gen_subfi_i32
>>  #define tcg_gen_subi_tl tcg_gen_subi_i32
>>  #define tcg_gen_and_tl tcg_gen_and_i32
>> diff --git a/target/arm/translate.c b/target/arm/translate.c
>> index 83a008e945..721171794d 100644
>> --- a/target/arm/translate.c
>> +++ b/target/arm/translate.c
>> @@ -603,16 +603,6 @@ static void gen_sar(TCGv_i32 dest, TCGv_i32 t0, TCGv_i32 t1)
>>      tcg_temp_free_i32(tmp1);
>>  }
>>  
>> -static void tcg_gen_abs_i32(TCGv_i32 dest, TCGv_i32 src)
>> -{
>> -    TCGv_i32 c0 = tcg_const_i32(0);
>> -    TCGv_i32 tmp = tcg_temp_new_i32();
>> -    tcg_gen_neg_i32(tmp, src);
>> -    tcg_gen_movcond_i32(TCG_COND_GT, dest, src, c0, src, tmp);
>> -    tcg_temp_free_i32(c0);
>> -    tcg_temp_free_i32(tmp);
>> -}
>> -
>>  static void shifter_out_im(TCGv_i32 var, int shift)
>>  {
>>      if (shift == 0) {
>> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
>> index a00d1df37e..0ac291f1c4 100644
>> --- a/tcg/tcg-op.c
>> +++ b/tcg/tcg-op.c
>> @@ -1091,6 +1091,16 @@ void tcg_gen_umax_i32(TCGv_i32 ret, TCGv_i32 a, TCGv_i32 b)
>>      tcg_gen_movcond_i32(TCG_COND_LTU, ret, a, b, b, a);
>>  }
>>  
>> +void tcg_gen_abs_i32(TCGv_i32 ret, TCGv_i32 a)
>> +{
>> +    TCGv_i32 t = tcg_temp_new_i32();
>> +
>> +    tcg_gen_sari_i32(t, a, 31);
>> +    tcg_gen_xor_i32(ret, a, t);
>> +    tcg_gen_sub_i32(ret, ret, t);
>> +    tcg_temp_free_i32(t);
>> +}
>> +
>>  /* 64-bit ops */
>>  
>>  #if TCG_TARGET_REG_BITS == 32
>> @@ -2548,6 +2558,16 @@ void tcg_gen_umax_i64(TCGv_i64 ret, TCGv_i64 a, TCGv_i64 b)
>>      tcg_gen_movcond_i64(TCG_COND_LTU, ret, a, b, b, a);
>>  }
>>  
>> +void tcg_gen_abs_i64(TCGv_i64 ret, TCGv_i64 a)
>> +{
>> +    TCGv_i64 t = tcg_temp_new_i64();
>> +
>> +    tcg_gen_sari_i64(t, a, 63);
>> +    tcg_gen_xor_i64(ret, a, t);
>> +    tcg_gen_sub_i64(ret, ret, t);
>> +    tcg_temp_free_i64(t);
>> +}
>> +
>>  /* Size changing operations.  */
>>  
>>  void tcg_gen_extrl_i64_i32(TCGv_i32 ret, TCGv_i64 arg)
>>
> 
> Nice trick

Per commit 7dcfb0897b99, I think it's worth a:

Inspired-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>

> 
> Reviewed-by: David Hildenbrand <david@redhat.com>
> 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 24/38] target/s390x: Use tcg_gen_abs_i64
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 24/38] target/s390x: Use tcg_gen_abs_i64 Richard Henderson
  2019-04-23 18:40   ` David Hildenbrand
@ 2019-04-23 22:12   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 69+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-04-23 22:12 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: david

On 4/20/19 9:34 AM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/s390x/translate.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/target/s390x/translate.c b/target/s390x/translate.c
> index 0afa8f7ca5..030129acbb 100644
> --- a/target/s390x/translate.c
> +++ b/target/s390x/translate.c
> @@ -1407,13 +1407,7 @@ static DisasJumpType help_branch(DisasContext *s, DisasCompare *c,
>  
>  static DisasJumpType op_abs(DisasContext *s, DisasOps *o)
>  {
> -    TCGv_i64 z, n;
> -    z = tcg_const_i64(0);
> -    n = tcg_temp_new_i64();
> -    tcg_gen_neg_i64(n, o->in2);
> -    tcg_gen_movcond_i64(TCG_COND_LT, o->out, o->in2, z, n, o->in2);
> -    tcg_temp_free_i64(n);
> -    tcg_temp_free_i64(z);
> +    tcg_gen_abs_i64(o->out, o->in2);
>      return DISAS_NEXT;
>  }
>  
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 25/38] target/xtensa: Use tcg_gen_abs_i32
  2019-04-20  7:34 ` [Qemu-devel] [PATCH 25/38] target/xtensa: Use tcg_gen_abs_i32 Richard Henderson
@ 2019-04-23 22:14   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 69+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-04-23 22:14 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: david

On 4/20/19 9:34 AM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/xtensa/translate.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/target/xtensa/translate.c b/target/xtensa/translate.c
> index 65561d2c49..62be8a6f6a 100644
> --- a/target/xtensa/translate.c
> +++ b/target/xtensa/translate.c
> @@ -1707,14 +1707,7 @@ void restore_state_to_opc(CPUXtensaState *env, TranslationBlock *tb,
>  static void translate_abs(DisasContext *dc, const OpcodeArg arg[],
>                            const uint32_t par[])
>  {
> -    TCGv_i32 zero = tcg_const_i32(0);
> -    TCGv_i32 neg = tcg_temp_new_i32();
> -
> -    tcg_gen_neg_i32(neg, arg[1].in);
> -    tcg_gen_movcond_i32(TCG_COND_GE, arg[0].out,
> -                        arg[1].in, zero, arg[1].in, neg);
> -    tcg_temp_free(neg);
> -    tcg_temp_free(zero);
> +    tcg_gen_abs_i32(arg[0].out, arg[1].in);
>  }
>  
>  static void translate_add(DisasContext *dc, const OpcodeArg arg[],
> 

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 19/38] tcg: Add support for integer absolute value
  2019-04-23 22:09     ` Philippe Mathieu-Daudé
@ 2019-04-23 22:29       ` Richard Henderson
  2019-04-23 23:05         ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 69+ messages in thread
From: Richard Henderson @ 2019-04-23 22:29 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	David Hildenbrand, qemu-devel, Edgar E. Iglesias

On 4/23/19 3:09 PM, Philippe Mathieu-Daudé wrote:
> On 4/23/19 8:37 PM, David Hildenbrand wrote:
>> On 20.04.19 09:34, Richard Henderson wrote:
>>> Remove a function of the same name from target/arm/.
>>> Use a branchless implementation of abs that gcc uses for x86.
>>>
>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>> ---
>>>  tcg/tcg-op.h           |  5 +++++
>>>  target/arm/translate.c | 10 ----------
>>>  tcg/tcg-op.c           | 20 ++++++++++++++++++++
>>>  3 files changed, 25 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
>>> index 472b73cb38..660fe205d0 100644
>>> --- a/tcg/tcg-op.h
>>> +++ b/tcg/tcg-op.h
>>> @@ -335,6 +335,7 @@ void tcg_gen_smin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
>>>  void tcg_gen_smax_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
>>>  void tcg_gen_umin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
>>>  void tcg_gen_umax_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
>>> +void tcg_gen_abs_i32(TCGv_i32, TCGv_i32);
>>>  
>>>  static inline void tcg_gen_discard_i32(TCGv_i32 arg)
>>>  {
>>> @@ -534,6 +535,7 @@ void tcg_gen_smin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
>>>  void tcg_gen_smax_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
>>>  void tcg_gen_umin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
>>>  void tcg_gen_umax_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
>>> +void tcg_gen_abs_i64(TCGv_i64, TCGv_i64);
>>>  
>>>  #if TCG_TARGET_REG_BITS == 64
>>>  static inline void tcg_gen_discard_i64(TCGv_i64 arg)
>>> @@ -973,6 +975,7 @@ void tcg_gen_nor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>>>  void tcg_gen_eqv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>>>  void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
>>>  void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
>>> +void tcg_gen_abs_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
>>>  void tcg_gen_ssadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>>>  void tcg_gen_usadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>>>  void tcg_gen_sssub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>>> @@ -1019,6 +1022,7 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
>>>  #define tcg_gen_addi_tl tcg_gen_addi_i64
>>>  #define tcg_gen_sub_tl tcg_gen_sub_i64
>>>  #define tcg_gen_neg_tl tcg_gen_neg_i64
>>> +#define tcg_gen_abs_tl tcg_gen_abs_i64
>>>  #define tcg_gen_subfi_tl tcg_gen_subfi_i64
>>>  #define tcg_gen_subi_tl tcg_gen_subi_i64
>>>  #define tcg_gen_and_tl tcg_gen_and_i64
>>> @@ -1131,6 +1135,7 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
>>>  #define tcg_gen_addi_tl tcg_gen_addi_i32
>>>  #define tcg_gen_sub_tl tcg_gen_sub_i32
>>>  #define tcg_gen_neg_tl tcg_gen_neg_i32
>>> +#define tcg_gen_abs_tl tcg_gen_abs_i32
>>>  #define tcg_gen_subfi_tl tcg_gen_subfi_i32
>>>  #define tcg_gen_subi_tl tcg_gen_subi_i32
>>>  #define tcg_gen_and_tl tcg_gen_and_i32
>>> diff --git a/target/arm/translate.c b/target/arm/translate.c
>>> index 83a008e945..721171794d 100644
>>> --- a/target/arm/translate.c
>>> +++ b/target/arm/translate.c
>>> @@ -603,16 +603,6 @@ static void gen_sar(TCGv_i32 dest, TCGv_i32 t0, TCGv_i32 t1)
>>>      tcg_temp_free_i32(tmp1);
>>>  }
>>>  
>>> -static void tcg_gen_abs_i32(TCGv_i32 dest, TCGv_i32 src)
>>> -{
>>> -    TCGv_i32 c0 = tcg_const_i32(0);
>>> -    TCGv_i32 tmp = tcg_temp_new_i32();
>>> -    tcg_gen_neg_i32(tmp, src);
>>> -    tcg_gen_movcond_i32(TCG_COND_GT, dest, src, c0, src, tmp);
>>> -    tcg_temp_free_i32(c0);
>>> -    tcg_temp_free_i32(tmp);
>>> -}
>>> -
>>>  static void shifter_out_im(TCGv_i32 var, int shift)
>>>  {
>>>      if (shift == 0) {
>>> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
>>> index a00d1df37e..0ac291f1c4 100644
>>> --- a/tcg/tcg-op.c
>>> +++ b/tcg/tcg-op.c
>>> @@ -1091,6 +1091,16 @@ void tcg_gen_umax_i32(TCGv_i32 ret, TCGv_i32 a, TCGv_i32 b)
>>>      tcg_gen_movcond_i32(TCG_COND_LTU, ret, a, b, b, a);
>>>  }
>>>  
>>> +void tcg_gen_abs_i32(TCGv_i32 ret, TCGv_i32 a)
>>> +{
>>> +    TCGv_i32 t = tcg_temp_new_i32();
>>> +
>>> +    tcg_gen_sari_i32(t, a, 31);
>>> +    tcg_gen_xor_i32(ret, a, t);
>>> +    tcg_gen_sub_i32(ret, ret, t);
>>> +    tcg_temp_free_i32(t);
>>> +}
>>> +
>>>  /* 64-bit ops */
>>>  
>>>  #if TCG_TARGET_REG_BITS == 32
>>> @@ -2548,6 +2558,16 @@ void tcg_gen_umax_i64(TCGv_i64 ret, TCGv_i64 a, TCGv_i64 b)
>>>      tcg_gen_movcond_i64(TCG_COND_LTU, ret, a, b, b, a);
>>>  }
>>>  
>>> +void tcg_gen_abs_i64(TCGv_i64 ret, TCGv_i64 a)
>>> +{
>>> +    TCGv_i64 t = tcg_temp_new_i64();
>>> +
>>> +    tcg_gen_sari_i64(t, a, 63);
>>> +    tcg_gen_xor_i64(ret, a, t);
>>> +    tcg_gen_sub_i64(ret, ret, t);
>>> +    tcg_temp_free_i64(t);
>>> +}
>>> +
>>>  /* Size changing operations.  */
>>>  
>>>  void tcg_gen_extrl_i64_i32(TCGv_i32 ret, TCGv_i64 arg)
>>>
>>
>> Nice trick
> 
> Per commit 7dcfb0897b99, I think it's worth a:
> 
> Inspired-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>

*shrug* As per the comment, I got the sequence from gcc -O2 -S.


r~

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 19/38] tcg: Add support for integer absolute value
  2019-04-23 22:29       ` Richard Henderson
@ 2019-04-23 23:05         ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 69+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-04-23 23:05 UTC (permalink / raw)
  To: Richard Henderson, David Hildenbrand, qemu-devel, Edgar E. Iglesias

On 4/24/19 12:29 AM, Richard Henderson wrote:
> On 4/23/19 3:09 PM, Philippe Mathieu-Daudé wrote:
>> On 4/23/19 8:37 PM, David Hildenbrand wrote:
>>> On 20.04.19 09:34, Richard Henderson wrote:
>>>> Remove a function of the same name from target/arm/.
>>>> Use a branchless implementation of abs that gcc uses for x86.
>>>>
>>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>>> ---
>>>>  tcg/tcg-op.h           |  5 +++++
>>>>  target/arm/translate.c | 10 ----------
>>>>  tcg/tcg-op.c           | 20 ++++++++++++++++++++
>>>>  3 files changed, 25 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
>>>> index 472b73cb38..660fe205d0 100644
>>>> --- a/tcg/tcg-op.h
>>>> +++ b/tcg/tcg-op.h
>>>> @@ -335,6 +335,7 @@ void tcg_gen_smin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
>>>>  void tcg_gen_smax_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
>>>>  void tcg_gen_umin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
>>>>  void tcg_gen_umax_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
>>>> +void tcg_gen_abs_i32(TCGv_i32, TCGv_i32);
>>>>  
>>>>  static inline void tcg_gen_discard_i32(TCGv_i32 arg)
>>>>  {
>>>> @@ -534,6 +535,7 @@ void tcg_gen_smin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
>>>>  void tcg_gen_smax_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
>>>>  void tcg_gen_umin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
>>>>  void tcg_gen_umax_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
>>>> +void tcg_gen_abs_i64(TCGv_i64, TCGv_i64);
>>>>  
>>>>  #if TCG_TARGET_REG_BITS == 64
>>>>  static inline void tcg_gen_discard_i64(TCGv_i64 arg)
>>>> @@ -973,6 +975,7 @@ void tcg_gen_nor_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>>>>  void tcg_gen_eqv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>>>>  void tcg_gen_not_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
>>>>  void tcg_gen_neg_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
>>>> +void tcg_gen_abs_vec(unsigned vece, TCGv_vec r, TCGv_vec a);
>>>>  void tcg_gen_ssadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>>>>  void tcg_gen_usadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>>>>  void tcg_gen_sssub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
>>>> @@ -1019,6 +1022,7 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
>>>>  #define tcg_gen_addi_tl tcg_gen_addi_i64
>>>>  #define tcg_gen_sub_tl tcg_gen_sub_i64
>>>>  #define tcg_gen_neg_tl tcg_gen_neg_i64
>>>> +#define tcg_gen_abs_tl tcg_gen_abs_i64
>>>>  #define tcg_gen_subfi_tl tcg_gen_subfi_i64
>>>>  #define tcg_gen_subi_tl tcg_gen_subi_i64
>>>>  #define tcg_gen_and_tl tcg_gen_and_i64
>>>> @@ -1131,6 +1135,7 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
>>>>  #define tcg_gen_addi_tl tcg_gen_addi_i32
>>>>  #define tcg_gen_sub_tl tcg_gen_sub_i32
>>>>  #define tcg_gen_neg_tl tcg_gen_neg_i32
>>>> +#define tcg_gen_abs_tl tcg_gen_abs_i32
>>>>  #define tcg_gen_subfi_tl tcg_gen_subfi_i32
>>>>  #define tcg_gen_subi_tl tcg_gen_subi_i32
>>>>  #define tcg_gen_and_tl tcg_gen_and_i32
>>>> diff --git a/target/arm/translate.c b/target/arm/translate.c
>>>> index 83a008e945..721171794d 100644
>>>> --- a/target/arm/translate.c
>>>> +++ b/target/arm/translate.c
>>>> @@ -603,16 +603,6 @@ static void gen_sar(TCGv_i32 dest, TCGv_i32 t0, TCGv_i32 t1)
>>>>      tcg_temp_free_i32(tmp1);
>>>>  }
>>>>  
>>>> -static void tcg_gen_abs_i32(TCGv_i32 dest, TCGv_i32 src)
>>>> -{
>>>> -    TCGv_i32 c0 = tcg_const_i32(0);
>>>> -    TCGv_i32 tmp = tcg_temp_new_i32();
>>>> -    tcg_gen_neg_i32(tmp, src);
>>>> -    tcg_gen_movcond_i32(TCG_COND_GT, dest, src, c0, src, tmp);
>>>> -    tcg_temp_free_i32(c0);
>>>> -    tcg_temp_free_i32(tmp);
>>>> -}
>>>> -
>>>>  static void shifter_out_im(TCGv_i32 var, int shift)
>>>>  {
>>>>      if (shift == 0) {
>>>> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
>>>> index a00d1df37e..0ac291f1c4 100644
>>>> --- a/tcg/tcg-op.c
>>>> +++ b/tcg/tcg-op.c
>>>> @@ -1091,6 +1091,16 @@ void tcg_gen_umax_i32(TCGv_i32 ret, TCGv_i32 a, TCGv_i32 b)
>>>>      tcg_gen_movcond_i32(TCG_COND_LTU, ret, a, b, b, a);
>>>>  }
>>>>  
>>>> +void tcg_gen_abs_i32(TCGv_i32 ret, TCGv_i32 a)
>>>> +{
>>>> +    TCGv_i32 t = tcg_temp_new_i32();
>>>> +
>>>> +    tcg_gen_sari_i32(t, a, 31);
>>>> +    tcg_gen_xor_i32(ret, a, t);
>>>> +    tcg_gen_sub_i32(ret, ret, t);
>>>> +    tcg_temp_free_i32(t);
>>>> +}
>>>> +
>>>>  /* 64-bit ops */
>>>>  
>>>>  #if TCG_TARGET_REG_BITS == 32
>>>> @@ -2548,6 +2558,16 @@ void tcg_gen_umax_i64(TCGv_i64 ret, TCGv_i64 a, TCGv_i64 b)
>>>>      tcg_gen_movcond_i64(TCG_COND_LTU, ret, a, b, b, a);
>>>>  }
>>>>  
>>>> +void tcg_gen_abs_i64(TCGv_i64 ret, TCGv_i64 a)
>>>> +{
>>>> +    TCGv_i64 t = tcg_temp_new_i64();
>>>> +
>>>> +    tcg_gen_sari_i64(t, a, 63);
>>>> +    tcg_gen_xor_i64(ret, a, t);
>>>> +    tcg_gen_sub_i64(ret, ret, t);
>>>> +    tcg_temp_free_i64(t);
>>>> +}
>>>> +
>>>>  /* Size changing operations.  */
>>>>  
>>>>  void tcg_gen_extrl_i64_i32(TCGv_i32 ret, TCGv_i64 arg)
>>>>
>>>
>>> Nice trick
>>
>> Per commit 7dcfb0897b99, I think it's worth a:
>>
>> Inspired-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
> 
> *shrug* As per the comment, I got the sequence from gcc -O2 -S.

Now I understand better your comment "Use a branchless implementation of
abs that gcc uses for x86". Previously I misunderstood it =)

Back to commit 7dcfb0897b99, eventually Edgar figured the same trick
from GCC.

Regards,

Phil.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 00/38] tcg vector improvements
  2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
                   ` (39 preceding siblings ...)
  2019-04-23 19:15 ` David Hildenbrand
@ 2019-04-29 19:28 ` David Hildenbrand
  2019-04-29 20:19   ` Richard Henderson
  40 siblings, 1 reply; 69+ messages in thread
From: David Hildenbrand @ 2019-04-29 19:28 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 20.04.19 09:34, Richard Henderson wrote:
> Based-on: tcg-next, which at present is only tcg_gen_extract2.
> 
> The dupm patches have been on list before, with a larger context
> of supporting tcg/ppc.  The rest of the set was written to support
> David's s390 vector patches.  In particular:
> 
> (1) Add vector absolute value.
> (2) Add vector shift by non-constant scalar.
> (3) Add vector shift by vector.
> (4) Add vector select.
> (5) Be more precise in handling target-specific vector expansions.
> 
> And then there's a set of bugs that I encountered while working
> on this across x86, aa64, and ppc hosts.  Tested primarily with
> aa64 as the guest, via RISU.
> 
> 
> r~

Hi Richard,

what are your plans with this series? (and shlv and friends?)

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCH 00/38] tcg vector improvements
  2019-04-29 19:28 ` David Hildenbrand
@ 2019-04-29 20:19   ` Richard Henderson
  0 siblings, 0 replies; 69+ messages in thread
From: Richard Henderson @ 2019-04-29 20:19 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel

On 4/29/19 12:28 PM, David Hildenbrand wrote:
> Hi Richard,
> 
> what are your plans with this series? (and shlv and friends?)
> 

I expect to submit them this week, barring any other comment on the patches
themselves.

r~

^ permalink raw reply	[flat|nested] 69+ messages in thread

end of thread, other threads:[~2019-04-29 20:20 UTC | newest]

Thread overview: 69+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-20  7:34 [Qemu-devel] [PATCH 00/38] tcg vector improvements Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 01/38] target/arm: Fill in .opc for cmtst_op Richard Henderson
2019-04-23  8:00   ` David Hildenbrand
2019-04-20  7:34 ` [Qemu-devel] [PATCH 02/38] tcg: Assert fixed_reg is read-only Richard Henderson
2019-04-23  8:03   ` David Hildenbrand
2019-04-20  7:34 ` [Qemu-devel] [PATCH 03/38] tcg: Return bool success from tcg_out_mov Richard Henderson
2019-04-20 10:56   ` Philippe Mathieu-Daudé
2019-04-23  8:27   ` David Hildenbrand
2019-04-20  7:34 ` [Qemu-devel] [PATCH 04/38] tcg: Support cross-class moves without instruction support Richard Henderson
2019-04-23  8:29   ` David Hildenbrand
2019-04-20  7:34 ` [Qemu-devel] [PATCH 05/38] tcg: Allow add_vec, sub_vec, neg_vec, not_vec to be expanded Richard Henderson
2019-04-23  8:33   ` David Hildenbrand
2019-04-20  7:34 ` [Qemu-devel] [PATCH 06/38] tcg: Promote tcg_out_{dup, dupi}_vec to backend interface Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 07/38] tcg: Manually expand INDEX_op_dup_vec Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 08/38] tcg: Add tcg_out_dupm_vec to the backend interface Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 09/38] tcg/i386: Implement tcg_out_dupm_vec Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 10/38] tcg/aarch64: " Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 11/38] tcg: Add INDEX_op_dup_mem_vec Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 12/38] tcg: Add gvec expanders for variable shift Richard Henderson
2019-04-23 19:04   ` David Hildenbrand
2019-04-23 19:28     ` Richard Henderson
2019-04-23 21:02       ` David Hildenbrand
2019-04-23 21:40         ` Richard Henderson
2019-04-23 21:57           ` David Hildenbrand
2019-04-20  7:34 ` [Qemu-devel] [PATCH 13/38] tcg/i386: Support vector variable shift opcodes Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 14/38] tcg/aarch64: " Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 15/38] tcg: Implement tcg_gen_gvec_3i() Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 16/38] tcg: Specify optional vector requirements with a list Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 17/38] tcg: Add gvec expanders for vector shift by scalar Richard Henderson
2019-04-23 18:58   ` David Hildenbrand
2019-04-23 19:21     ` Richard Henderson
2019-04-23 21:05       ` David Hildenbrand
2019-04-20  7:34 ` [Qemu-devel] [PATCH 18/38] tcg/i386: Support vector scalar shift opcodes Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 19/38] tcg: Add support for integer absolute value Richard Henderson
2019-04-23  8:52   ` Philippe Mathieu-Daudé
2019-04-23 18:37   ` David Hildenbrand
2019-04-23 22:09     ` Philippe Mathieu-Daudé
2019-04-23 22:29       ` Richard Henderson
2019-04-23 23:05         ` Philippe Mathieu-Daudé
2019-04-20  7:34 ` [Qemu-devel] [PATCH 20/38] tcg: Add support for vector " Richard Henderson
2019-04-23 18:35   ` David Hildenbrand
2019-04-20  7:34 ` [Qemu-devel] [PATCH 21/38] target/arm: Use tcg_gen_abs_i64 and tcg_gen_gvec_abs Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 22/38] target/cris: Use tcg_gen_abs_tl Richard Henderson
2019-04-23 10:09   ` Philippe Mathieu-Daudé
2019-04-20  7:34 ` [Qemu-devel] [PATCH 23/38] target/ppc: " Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 24/38] target/s390x: Use tcg_gen_abs_i64 Richard Henderson
2019-04-23 18:40   ` David Hildenbrand
2019-04-23 22:12   ` Philippe Mathieu-Daudé
2019-04-20  7:34 ` [Qemu-devel] [PATCH 25/38] target/xtensa: Use tcg_gen_abs_i32 Richard Henderson
2019-04-23 22:14   ` Philippe Mathieu-Daudé
2019-04-20  7:34 ` [Qemu-devel] [PATCH 26/38] tcg/i386: Support vector absolute value Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 27/38] tcg/aarch64: " Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 28/38] tcg: Add support for vector comparison select Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 29/38] tcg/i386: Support vector comparison select value Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 30/38] tcg/aarch64: " Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 31/38] target/ppc: Use vector variable shifts for VS{L, R, RA}{B, H, W, D} Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 32/38] target/arm: Vectorize USHL and SSHL Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 33/38] tcg/aarch64: Do not advertise minmax for MO_64 Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 34/38] tcg: Do not recreate INDEX_op_neg_vec unless supported Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 35/38] tcg: Introduce do_op3_nofail for vector expansion Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 36/38] tcg: Expand vector minmax using cmp+cmpsel Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 37/38] tcg/aarch64: Use MVNI for expansion of dupi Richard Henderson
2019-04-20  7:34 ` [Qemu-devel] [PATCH 38/38] tcg/aarch64: Use ORRI and BICI for vector logical operations Richard Henderson
2019-04-20  8:09 ` [Qemu-devel] [PATCH 00/38] tcg vector improvements no-reply
2019-04-23 19:15 ` David Hildenbrand
2019-04-23 20:26   ` Richard Henderson
2019-04-23 20:31     ` David Hildenbrand
2019-04-29 19:28 ` David Hildenbrand
2019-04-29 20:19   ` Richard Henderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.