* [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes
@ 2019-06-29 13:00 Richard Henderson
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 01/16] tcg/ppc: Introduce Altivec registers Richard Henderson
                   ` (17 more replies)
  0 siblings, 18 replies; 40+ messages in thread
From: Richard Henderson @ 2019-06-29 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, amarkovic, hsp.cat7

Changes since v5:
  * Disable runtime altivec detection until all of the required
    opcodes are implemented.
    Because dup2 was last, that really means all of the pure altivec
    bits, so the initial patches are not bisectable in any meaningful
    sense.  I thought about reshuffling dup2 earlier, but that created
    too many conflicts and I was too lazy.
  * Rearranged the patches a little bit to make sure that each
    one actually builds, which was not the case before.
  * Folded in the fix to tcg_out_mem_long, as discussed in the
    followup within the v4 thread.

Changes since v4:
  * Patch 1, "tcg/ppc: Introduce Altivec registers", is divided into
    ten smaller patches.
  * The net result (code-wise) is unchanged between the former patch 1
    and the ten new patches.
  * The remaining patches (2-7) from v4 are applied verbatim.
  * This means that, code-wise, v5 and v4 do not differ.
  * v5 is intended to ease debugging and to better organize the code.

Changes since v3:
  * Add support for bitsel, with the vsx xxsel insn.
  * Rely on the new relocation overflow handling, so
    we don't require 3 insns for a vector load.

Changes since v2:
  * Several generic tcg patches to improve dup vs dupi vs dupm.
    In particular, if a global temp (like guest r10) is not in
    a host register, we should duplicate from memory instead of
    loading to an integer register, spilling to stack, loading
    to a vector register, and then duplicating.
  * I have more confidence that 32-bit ppc host should work
    this time around.  No testing on that front yet, but I've
    unified some code sequences with 64-bit ppc host.
  * Base altivec now supports V128 only.  Moved V64 support to
    Power7 (v2.06), which has 64-bit load/store.
  * Dropped support for 64-bit vector multiply using Power8.
    The expansion was too large compared to using integer regs.

Richard Henderson (16):
  tcg/ppc: Introduce Altivec registers
  tcg/ppc: Introduce macro VX4()
  tcg/ppc: Introduce macros VRT(), VRA(), VRB(), VRC()
  tcg/ppc: Enable tcg backend vector compilation
  tcg/ppc: Add support for load/store/logic/comparison
  tcg/ppc: Add support for vector maximum/minimum
  tcg/ppc: Add support for vector add/subtract
  tcg/ppc: Add support for vector saturated add/subtract
  tcg/ppc: Prepare case for vector multiply
  tcg/ppc: Support vector shift by immediate
  tcg/ppc: Support vector multiply
  tcg/ppc: Support vector dup2
  tcg/ppc: Enable Altivec detection
  tcg/ppc: Update vector support to v2.06
  tcg/ppc: Update vector support to v2.07
  tcg/ppc: Update vector support to v3.00

 tcg/ppc/tcg-target.h     |   39 +-
 tcg/ppc/tcg-target.opc.h |   13 +
 tcg/ppc/tcg-target.inc.c | 1091 +++++++++++++++++++++++++++++++++++---
 3 files changed, 1076 insertions(+), 67 deletions(-)
 create mode 100644 tcg/ppc/tcg-target.opc.h

-- 
2.17.1




* [Qemu-devel] [PATCH v6 01/16] tcg/ppc: Introduce Altivec registers
  2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
@ 2019-06-29 13:00 ` Richard Henderson
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 02/16] tcg/ppc: Introduce macro VX4() Richard Henderson
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2019-06-29 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, amarkovic, hsp.cat7

Altivec supports 32 128-bit vector registers, conventionally named
v0 through v31.
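
For orientation, a minimal sketch (not part of the patch) of how the new
numbering is used: the vector registers follow the 32 GPRs in the TCGReg
enum, so TCG_REG_V0 has the value 32 and the low five bits of a vector
TCGReg recover the hardware VR number.  The helper name below is
hypothetical, introduced only for this illustration.

/* Sketch only: not part of the series. */
static inline int hw_vr_num(TCGReg r)
{
    tcg_debug_assert(r >= TCG_REG_V0);  /* vectors occupy TCGReg 32..63 */
    return r & 31;                      /* equivalently: r - TCG_REG_V0 */
}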

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
---
 tcg/ppc/tcg-target.h     | 11 ++++-
 tcg/ppc/tcg-target.inc.c | 88 +++++++++++++++++++++++++---------------
 2 files changed, 65 insertions(+), 34 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 7627fb62d3..690fa744e1 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -31,7 +31,7 @@
 # define TCG_TARGET_REG_BITS  32
 #endif
 
-#define TCG_TARGET_NB_REGS 32
+#define TCG_TARGET_NB_REGS 64
 #define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
 
@@ -45,6 +45,15 @@ typedef enum {
     TCG_REG_R24, TCG_REG_R25, TCG_REG_R26, TCG_REG_R27,
     TCG_REG_R28, TCG_REG_R29, TCG_REG_R30, TCG_REG_R31,
 
+    TCG_REG_V0,  TCG_REG_V1,  TCG_REG_V2,  TCG_REG_V3,
+    TCG_REG_V4,  TCG_REG_V5,  TCG_REG_V6,  TCG_REG_V7,
+    TCG_REG_V8,  TCG_REG_V9,  TCG_REG_V10, TCG_REG_V11,
+    TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15,
+    TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19,
+    TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23,
+    TCG_REG_V24, TCG_REG_V25, TCG_REG_V26, TCG_REG_V27,
+    TCG_REG_V28, TCG_REG_V29, TCG_REG_V30, TCG_REG_V31,
+
     TCG_REG_CALL_STACK = TCG_REG_R1,
     TCG_AREG0 = TCG_REG_R27
 } TCGReg;
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 852b8940fb..8e1bba7824 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -42,6 +42,9 @@
 # define TCG_REG_TMP1   TCG_REG_R12
 #endif
 
+#define TCG_VEC_TMP1    TCG_REG_V0
+#define TCG_VEC_TMP2    TCG_REG_V1
+
 #define TCG_REG_TB     TCG_REG_R31
 #define USE_REG_TB     (TCG_TARGET_REG_BITS == 64)
 
@@ -72,39 +75,15 @@ bool have_isa_3_00;
 #endif
 
 #ifdef CONFIG_DEBUG_TCG
-static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
-    "r0",
-    "r1",
-    "r2",
-    "r3",
-    "r4",
-    "r5",
-    "r6",
-    "r7",
-    "r8",
-    "r9",
-    "r10",
-    "r11",
-    "r12",
-    "r13",
-    "r14",
-    "r15",
-    "r16",
-    "r17",
-    "r18",
-    "r19",
-    "r20",
-    "r21",
-    "r22",
-    "r23",
-    "r24",
-    "r25",
-    "r26",
-    "r27",
-    "r28",
-    "r29",
-    "r30",
-    "r31"
+static const char tcg_target_reg_names[TCG_TARGET_NB_REGS][4] = {
+    "r0",  "r1",  "r2",  "r3",  "r4",  "r5",  "r6",  "r7",
+    "r8",  "r9",  "r10", "r11", "r12", "r13", "r14", "r15",
+    "r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23",
+    "r24", "r25", "r26", "r27", "r28", "r29", "r30", "r31",
+    "v0",  "v1",  "v2",  "v3",  "v4",  "v5",  "v6",  "v7",
+    "v8",  "v9",  "v10", "v11", "v12", "v13", "v14", "v15",
+    "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23",
+    "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31",
 };
 #endif
 
@@ -139,6 +118,26 @@ static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_R5,
     TCG_REG_R4,
     TCG_REG_R3,
+
+    /* V0 and V1 reserved as temporaries; V20 - V31 are call-saved */
+    TCG_REG_V2,   /* call clobbered, vectors */
+    TCG_REG_V3,
+    TCG_REG_V4,
+    TCG_REG_V5,
+    TCG_REG_V6,
+    TCG_REG_V7,
+    TCG_REG_V8,
+    TCG_REG_V9,
+    TCG_REG_V10,
+    TCG_REG_V11,
+    TCG_REG_V12,
+    TCG_REG_V13,
+    TCG_REG_V14,
+    TCG_REG_V15,
+    TCG_REG_V16,
+    TCG_REG_V17,
+    TCG_REG_V18,
+    TCG_REG_V19,
 };
 
 static const int tcg_target_call_iarg_regs[] = {
@@ -2808,6 +2807,27 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R11);
     tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R12);
 
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V0);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V1);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V2);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V3);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V4);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V5);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V6);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V7);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V8);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V9);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V10);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V11);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V12);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V13);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V14);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V15);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V16);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V17);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V18);
+    tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V19);
+
     s->reserved_regs = 0;
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_R0); /* tcg temp */
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_R1); /* stack pointer */
@@ -2818,6 +2838,8 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_R13); /* thread pointer */
 #endif
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP1); /* mem temp */
+    tcg_regset_set_reg(s->reserved_regs, TCG_VEC_TMP1);
+    tcg_regset_set_reg(s->reserved_regs, TCG_VEC_TMP2);
     if (USE_REG_TB) {
         tcg_regset_set_reg(s->reserved_regs, TCG_REG_TB);  /* tb->tc_ptr */
     }
-- 
2.17.1




* [Qemu-devel] [PATCH v6 02/16] tcg/ppc: Introduce macro VX4()
  2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 01/16] tcg/ppc: Introduce Altivec registers Richard Henderson
@ 2019-06-29 13:00 ` Richard Henderson
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 03/16] tcg/ppc: Introduce macros VRT(), VRA(), VRB(), VRC() Richard Henderson
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2019-06-29 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, amarkovic, hsp.cat7

Introduce macro VX4() used for encoding Altivec instructions.
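
As a rough illustration (not part of the patch): VX-form Altivec
instructions use primary opcode 4 plus an 11-bit extended opcode in the
low bits of the word, so VX4() simply ORs the extended opcode onto
OPCD(4).  Assuming OPCD() places the primary opcode in the top six bits,
as it does for the existing encodings in this file, a definition added
later in this series expands as:

/* Illustration only; VAND is defined in a later patch of this series. */
#define VAND  VX4(1028)   /* == (4 << 26) | 1028 == 0x10000404 */
/* The VRT/VRA/VRB register fields are ORed in separately when emitting. */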

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
---
 tcg/ppc/tcg-target.inc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 8e1bba7824..9e560db993 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -319,6 +319,7 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define XO31(opc) (OPCD(31)|((opc)<<1))
 #define XO58(opc) (OPCD(58)|(opc))
 #define XO62(opc) (OPCD(62)|(opc))
+#define VX4(opc)  (OPCD(4)|(opc))
 
 #define B      OPCD( 18)
 #define BC     OPCD( 16)
-- 
2.17.1




* [Qemu-devel] [PATCH v6 03/16] tcg/ppc: Introduce macros VRT(), VRA(), VRB(), VRC()
  2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 01/16] tcg/ppc: Introduce Altivec registers Richard Henderson
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 02/16] tcg/ppc: Introduce macro VX4() Richard Henderson
@ 2019-06-29 13:00 ` Richard Henderson
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 04/16] tcg/ppc: Enable tcg backend vector compilation Richard Henderson
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2019-06-29 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, amarkovic, hsp.cat7

Introduce macros VRT(), VRA(), VRB(), VRC() used for encoding
the register operand fields of Altivec instructions.
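
For illustration (not part of the patch), these field macros compose with
the VX4() opcodes from the previous patch when an instruction word is
emitted; later patches in this series do essentially:

/* Sketch: dst/a/b are vector TCGRegs, low five bits = VR number. */
tcg_out32(s, VAND | VRT(dst) | VRA(a) | VRB(b));  /* vand dst,a,b */

i.e. VRT(), VRA(), VRB() and VRC() place the 5-bit register numbers at
bit offsets 21, 16, 11 and 6 of the instruction word, respectively.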

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
---
 tcg/ppc/tcg-target.inc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 9e560db993..cfbd7ff12c 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -473,6 +473,11 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define MB64(b) ((b)<<5)
 #define FXM(b) (1 << (19 - (b)))
 
+#define VRT(r)  (((r) & 31) << 21)
+#define VRA(r)  (((r) & 31) << 16)
+#define VRB(r)  (((r) & 31) << 11)
+#define VRC(r)  (((r) & 31) <<  6)
+
 #define LK    1
 
 #define TAB(t, a, b) (RT(t) | RA(a) | RB(b))
-- 
2.17.1




* [Qemu-devel] [PATCH v6 04/16] tcg/ppc: Enable tcg backend vector compilation
  2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
                   ` (2 preceding siblings ...)
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 03/16] tcg/ppc: Introduce macros VRT(), VRA(), VRB(), VRC() Richard Henderson
@ 2019-06-29 13:00 ` Richard Henderson
  2019-06-30  9:46   ` Aleksandar Markovic
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 05/16] tcg/ppc: Add support for load/store/logic/comparison Richard Henderson
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2019-06-29 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, amarkovic, hsp.cat7

Introduce all of the flags required to enable tcg backend vector support,
and a runtime flag to indicate the host supports Altivec instructions.

For now, do not actually set have_isa_altivec to true, because we have not
yet added all of the code needed to generate all of the required insns.
However, we must define these flags in order to disable the ifndefs that
would otherwise create stub versions of the functions added here.

The change to tcg_out_movi works around a buglet in tcg.c wherein if we
do not define tcg_out_dupi_vec we get a declared-but-not-defined Werror,
but if we only declare it we get a defined-but-not-used Werror.  We need
this change to tcg_out_movi eventually anyway, so it's no biggie.
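
For reference, a sketch (not part of this patch) of what the runtime
detection will presumably look like once it is switched on later in the
series; PPC_FEATURE_HAS_ALTIVEC is the AT_HWCAP bit named in the
placeholder comment in the hunk below:

/* Hypothetical final form of the detection hunk in tcg_target_init(). */
if (hwcap & PPC_FEATURE_HAS_ALTIVEC) {
    have_isa_altivec = true;
}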

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
---
 tcg/ppc/tcg-target.h     | 25 ++++++++++++++++
 tcg/ppc/tcg-target.opc.h |  5 ++++
 tcg/ppc/tcg-target.inc.c | 65 ++++++++++++++++++++++++++++++++++++++--
 3 files changed, 92 insertions(+), 3 deletions(-)
 create mode 100644 tcg/ppc/tcg-target.opc.h

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 690fa744e1..f6283f468b 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -58,6 +58,7 @@ typedef enum {
     TCG_AREG0 = TCG_REG_R27
 } TCGReg;
 
+extern bool have_isa_altivec;
 extern bool have_isa_2_06;
 extern bool have_isa_3_00;
 
@@ -135,6 +136,30 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_mulsh_i64        1
 #endif
 
+/*
+ * While technically Altivec could support V64, it has no 64-bit store
+ * instruction and substituting two 32-bit stores makes the generated
+ * code quite large.
+ */
+#define TCG_TARGET_HAS_v64              0
+#define TCG_TARGET_HAS_v128             have_isa_altivec
+#define TCG_TARGET_HAS_v256             0
+
+#define TCG_TARGET_HAS_andc_vec         0
+#define TCG_TARGET_HAS_orc_vec          0
+#define TCG_TARGET_HAS_not_vec          0
+#define TCG_TARGET_HAS_neg_vec          0
+#define TCG_TARGET_HAS_abs_vec          0
+#define TCG_TARGET_HAS_shi_vec          0
+#define TCG_TARGET_HAS_shs_vec          0
+#define TCG_TARGET_HAS_shv_vec          0
+#define TCG_TARGET_HAS_cmp_vec          0
+#define TCG_TARGET_HAS_mul_vec          0
+#define TCG_TARGET_HAS_sat_vec          0
+#define TCG_TARGET_HAS_minmax_vec       0
+#define TCG_TARGET_HAS_bitsel_vec       0
+#define TCG_TARGET_HAS_cmpsel_vec       0
+
 void flush_icache_range(uintptr_t start, uintptr_t stop);
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 
diff --git a/tcg/ppc/tcg-target.opc.h b/tcg/ppc/tcg-target.opc.h
new file mode 100644
index 0000000000..fa680dd6a0
--- /dev/null
+++ b/tcg/ppc/tcg-target.opc.h
@@ -0,0 +1,5 @@
+/*
+ * Target-specific opcodes for host vector expansion.  These will be
+ * emitted by tcg_expand_vec_op.  For those familiar with GCC internals,
+ * consider these to be UNSPEC with names.
+ */
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index cfbd7ff12c..b938e9aac5 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -64,6 +64,7 @@
 
 static tcg_insn_unit *tb_ret_addr;
 
+bool have_isa_altivec;
 bool have_isa_2_06;
 bool have_isa_3_00;
 
@@ -717,10 +718,31 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
     }
 }
 
-static inline void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret,
-                                tcg_target_long arg)
+static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret,
+                             tcg_target_long val)
 {
-    tcg_out_movi_int(s, type, ret, arg, false);
+    g_assert_not_reached();
+}
+
+static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret,
+                         tcg_target_long arg)
+{
+    switch (type) {
+    case TCG_TYPE_I32:
+    case TCG_TYPE_I64:
+        tcg_debug_assert(ret < TCG_REG_V0);
+        tcg_out_movi_int(s, type, ret, arg, false);
+        break;
+
+    case TCG_TYPE_V64:
+    case TCG_TYPE_V128:
+        tcg_debug_assert(ret >= TCG_REG_V0);
+        tcg_out_dupi_vec(s, type, ret, arg);
+        break;
+
+    default:
+        g_assert_not_reached();
+    }
 }
 
 static bool mask_operand(uint32_t c, int *mb, int *me)
@@ -2605,6 +2627,36 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     }
 }
 
+int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
+{
+    g_assert_not_reached();
+}
+
+static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
+                            TCGReg dst, TCGReg src)
+{
+    g_assert_not_reached();
+}
+
+static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
+                             TCGReg out, TCGReg base, intptr_t offset)
+{
+    g_assert_not_reached();
+}
+
+static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
+                           unsigned vecl, unsigned vece,
+                           const TCGArg *args, const int *const_args)
+{
+    g_assert_not_reached();
+}
+
+void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
+                       TCGArg a0, ...)
+{
+    g_assert_not_reached();
+}
+
 static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 {
     static const TCGTargetOpDef r = { .args_ct_str = { "r" } };
@@ -2787,6 +2839,9 @@ static void tcg_target_init(TCGContext *s)
     unsigned long hwcap = qemu_getauxval(AT_HWCAP);
     unsigned long hwcap2 = qemu_getauxval(AT_HWCAP2);
 
+    if (hwcap & /* PPC_FEATURE_HAS_ALTIVEC -- NOT YET */ 0) {
+        have_isa_altivec = true;
+    }
     if (hwcap & PPC_FEATURE_ARCH_2_06) {
         have_isa_2_06 = true;
     }
@@ -2798,6 +2853,10 @@ static void tcg_target_init(TCGContext *s)
 
     tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff;
     tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff;
+    if (have_isa_altivec) {
+        tcg_target_available_regs[TCG_TYPE_V64] = 0xffffffff00000000ull;
+        tcg_target_available_regs[TCG_TYPE_V128] = 0xffffffff00000000ull;
+    }
 
     tcg_target_call_clobber_regs = 0;
     tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R0);
-- 
2.17.1




* [Qemu-devel] [PATCH v6 05/16] tcg/ppc: Add support for load/store/logic/comparison
  2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
                   ` (3 preceding siblings ...)
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 04/16] tcg/ppc: Enable tcg backend vector compilation Richard Henderson
@ 2019-06-29 13:00 ` Richard Henderson
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 06/16] tcg/ppc: Add support for vector maximum/minimum Richard Henderson
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2019-06-29 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, amarkovic, hsp.cat7

Add various bits and pieces related mostly to load and store
operations.  In that context, logic, compare, and splat Altivec
instructions are used, so support for emitting them is included
in this patch as well.
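
As an illustration (not part of the patch) of why extra vector
instructions are needed for scalar-sized accesses: base Altivec has no
scalar load into a vector register, so a 32-bit load goes through lvewx
plus a vsldoi rotate.  A minimal sketch using the helpers added in this
patch (the function name is hypothetical; it mirrors the TCG_TYPE_I32
vector path of tcg_out_ld in the diff below):

/* Sketch only, assuming a 4-byte aligned offset. */
static void sketch_ld_i32_into_vr(TCGContext *s, TCGReg vd,
                                  TCGReg base, intptr_t offset)
{
    int shift = (offset - 4) & 0xc;                   /* rotate amount */
    tcg_out_mem_long(s, 0, LVEWX, vd, base, offset);  /* word -> some element */
    if (shift) {
        tcg_out_vsldoi(s, vd, vd, vd, shift);         /* into expected slot */
    }
}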

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
---
 tcg/ppc/tcg-target.h     |   6 +-
 tcg/ppc/tcg-target.inc.c | 472 ++++++++++++++++++++++++++++++++++++---
 2 files changed, 442 insertions(+), 36 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index f6283f468b..b66a808259 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -145,15 +145,15 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_v128             have_isa_altivec
 #define TCG_TARGET_HAS_v256             0
 
-#define TCG_TARGET_HAS_andc_vec         0
+#define TCG_TARGET_HAS_andc_vec         1
 #define TCG_TARGET_HAS_orc_vec          0
-#define TCG_TARGET_HAS_not_vec          0
+#define TCG_TARGET_HAS_not_vec          1
 #define TCG_TARGET_HAS_neg_vec          0
 #define TCG_TARGET_HAS_abs_vec          0
 #define TCG_TARGET_HAS_shi_vec          0
 #define TCG_TARGET_HAS_shs_vec          0
 #define TCG_TARGET_HAS_shv_vec          0
-#define TCG_TARGET_HAS_cmp_vec          0
+#define TCG_TARGET_HAS_cmp_vec          1
 #define TCG_TARGET_HAS_mul_vec          0
 #define TCG_TARGET_HAS_sat_vec          0
 #define TCG_TARGET_HAS_minmax_vec       0
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index b938e9aac5..87c418ebf4 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -233,6 +233,10 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
         ct->ct |= TCG_CT_REG;
         ct->u.regs = 0xffffffff;
         break;
+    case 'v':
+        ct->ct |= TCG_CT_REG;
+        ct->u.regs = 0xffffffff00000000ull;
+        break;
     case 'L':                   /* qemu_ld constraint */
         ct->ct |= TCG_CT_REG;
         ct->u.regs = 0xffffffff;
@@ -462,6 +466,39 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 
 #define NOP    ORI  /* ori 0,0,0 */
 
+#define LVX        XO31(103)
+#define LVEBX      XO31(7)
+#define LVEHX      XO31(39)
+#define LVEWX      XO31(71)
+
+#define STVX       XO31(231)
+#define STVEWX     XO31(199)
+
+#define VCMPEQUB   VX4(6)
+#define VCMPEQUH   VX4(70)
+#define VCMPEQUW   VX4(134)
+#define VCMPGTSB   VX4(774)
+#define VCMPGTSH   VX4(838)
+#define VCMPGTSW   VX4(902)
+#define VCMPGTUB   VX4(518)
+#define VCMPGTUH   VX4(582)
+#define VCMPGTUW   VX4(646)
+
+#define VAND       VX4(1028)
+#define VANDC      VX4(1092)
+#define VNOR       VX4(1284)
+#define VOR        VX4(1156)
+#define VXOR       VX4(1220)
+
+#define VSPLTB     VX4(524)
+#define VSPLTH     VX4(588)
+#define VSPLTW     VX4(652)
+#define VSPLTISB   VX4(780)
+#define VSPLTISH   VX4(844)
+#define VSPLTISW   VX4(908)
+
+#define VSLDOI     VX4(44)
+
 #define RT(r) ((r)<<21)
 #define RS(r) ((r)<<21)
 #define RA(r) ((r)<<16)
@@ -535,6 +572,8 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
 {
     tcg_insn_unit *target;
+    int16_t lo;
+    int32_t hi;
 
     value += addend;
     target = (tcg_insn_unit *)value;
@@ -556,6 +595,20 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
         }
         *code_ptr = (*code_ptr & ~0xfffc) | (value & 0xfffc);
         break;
+    case R_PPC_ADDR32:
+        /*
+         * We are abusing this relocation type.  Again, this points to
+         * a pair of insns, lis + load.  This is an absolute address
+         * relocation for PPC32 so the lis cannot be removed.
+         */
+        lo = value;
+        hi = value - lo;
+        if (hi + lo != value) {
+            return false;
+        }
+        code_ptr[0] = deposit32(code_ptr[0], 0, 16, hi >> 16);
+        code_ptr[1] = deposit32(code_ptr[1], 0, 16, lo);
+        break;
     default:
         g_assert_not_reached();
     }
@@ -567,9 +620,29 @@ static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt,
 
 static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
-    tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
-    if (ret != arg) {
-        tcg_out32(s, OR | SAB(arg, ret, arg));
+    if (ret == arg) {
+        return true;
+    }
+    switch (type) {
+    case TCG_TYPE_I64:
+        tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+        /* fallthru */
+    case TCG_TYPE_I32:
+        if (ret < TCG_REG_V0 && arg < TCG_REG_V0) {
+            tcg_out32(s, OR | SAB(arg, ret, arg));
+            break;
+        } else if (ret < TCG_REG_V0 || arg < TCG_REG_V0) {
+            /* Altivec does not support vector/integer moves.  */
+            return false;
+        }
+        /* fallthru */
+    case TCG_TYPE_V64:
+    case TCG_TYPE_V128:
+        tcg_debug_assert(ret >= TCG_REG_V0 && arg >= TCG_REG_V0);
+        tcg_out32(s, VOR | VRT(ret) | VRA(arg) | VRB(arg));
+        break;
+    default:
+        g_assert_not_reached();
     }
     return true;
 }
@@ -721,7 +794,52 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret,
                              tcg_target_long val)
 {
-    g_assert_not_reached();
+    uint32_t load_insn;
+    int rel, low;
+    intptr_t add;
+
+    low = (int8_t)val;
+    if (low >= -16 && low < 16) {
+        if (val == (tcg_target_long)dup_const(MO_8, low)) {
+            tcg_out32(s, VSPLTISB | VRT(ret) | ((val & 31) << 16));
+            return;
+        }
+        if (val == (tcg_target_long)dup_const(MO_16, low)) {
+            tcg_out32(s, VSPLTISH | VRT(ret) | ((val & 31) << 16));
+            return;
+        }
+        if (val == (tcg_target_long)dup_const(MO_32, low)) {
+            tcg_out32(s, VSPLTISW | VRT(ret) | ((val & 31) << 16));
+            return;
+        }
+    }
+
+    /*
+     * Otherwise we must load the value from the constant pool.
+     */
+    if (USE_REG_TB) {
+        rel = R_PPC_ADDR16;
+        add = -(intptr_t)s->code_gen_ptr;
+    } else {
+        rel = R_PPC_ADDR32;
+        add = 0;
+    }
+
+    load_insn = LVX | VRT(ret) | RB(TCG_REG_TMP1);
+    if (TCG_TARGET_REG_BITS == 64) {
+        new_pool_l2(s, rel, s->code_ptr, add, val, val);
+    } else {
+        new_pool_l4(s, rel, s->code_ptr, add, val, val, val, val);
+    }
+
+    if (USE_REG_TB) {
+        tcg_out32(s, ADDI | TAI(TCG_REG_TMP1, 0, 0));
+        load_insn |= RA(TCG_REG_TB);
+    } else {
+        tcg_out32(s, ADDIS | TAI(TCG_REG_TMP1, 0, 0));
+        tcg_out32(s, ADDI | TAI(TCG_REG_TMP1, TCG_REG_TMP1, 0));
+    }
+    tcg_out32(s, load_insn);
 }
 
 static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret,
@@ -881,7 +999,7 @@ static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt,
         align = 3;
         /* FALLTHRU */
     default:
-        if (rt != TCG_REG_R0) {
+        if (rt > TCG_REG_R0 && rt < TCG_REG_V0) {
             rs = rt;
             break;
         }
@@ -895,13 +1013,13 @@ static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt,
     }
 
     /* For unaligned, or very large offsets, use the indexed form.  */
-    if (offset & align || offset != (int32_t)offset) {
+    if (offset & align || offset != (int32_t)offset || opi == 0) {
         if (rs == base) {
             rs = TCG_REG_R0;
         }
         tcg_debug_assert(!is_store || rs != rt);
         tcg_out_movi(s, TCG_TYPE_PTR, rs, orig);
-        tcg_out32(s, opx | TAB(rt, base, rs));
+        tcg_out32(s, opx | TAB(rt & 31, base, rs));
         return;
     }
 
@@ -922,36 +1040,102 @@ static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt,
         base = rs;
     }
     if (opi != ADDI || base != rt || l0 != 0) {
-        tcg_out32(s, opi | TAI(rt, base, l0));
+        tcg_out32(s, opi | TAI(rt & 31, base, l0));
     }
 }
 
-static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
-                              TCGReg arg1, intptr_t arg2)
+static void tcg_out_vsldoi(TCGContext *s, TCGReg ret,
+                           TCGReg va, TCGReg vb, int shb)
 {
-    int opi, opx;
-
-    tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
-    if (type == TCG_TYPE_I32) {
-        opi = LWZ, opx = LWZX;
-    } else {
-        opi = LD, opx = LDX;
-    }
-    tcg_out_mem_long(s, opi, opx, ret, arg1, arg2);
+    tcg_out32(s, VSLDOI | VRT(ret) | VRA(va) | VRB(vb) | (shb << 6));
 }
 
-static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
-                              TCGReg arg1, intptr_t arg2)
+static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
+                       TCGReg base, intptr_t offset)
 {
-    int opi, opx;
+    int shift;
 
-    tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
-    if (type == TCG_TYPE_I32) {
-        opi = STW, opx = STWX;
-    } else {
-        opi = STD, opx = STDX;
+    switch (type) {
+    case TCG_TYPE_I32:
+        if (ret < TCG_REG_V0) {
+            tcg_out_mem_long(s, LWZ, LWZX, ret, base, offset);
+            break;
+        }
+        assert((offset & 3) == 0);
+        tcg_out_mem_long(s, 0, LVEWX, ret, base, offset);
+        shift = (offset - 4) & 0xc;
+        if (shift) {
+            tcg_out_vsldoi(s, ret, ret, ret, shift);
+        }
+        break;
+    case TCG_TYPE_I64:
+        if (ret < TCG_REG_V0) {
+            tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+            tcg_out_mem_long(s, LD, LDX, ret, base, offset);
+            break;
+        }
+        /* fallthru */
+    case TCG_TYPE_V64:
+        tcg_debug_assert(ret >= TCG_REG_V0);
+        assert((offset & 7) == 0);
+        tcg_out_mem_long(s, 0, LVX, ret, base, offset & -16);
+        if (offset & 8) {
+            tcg_out_vsldoi(s, ret, ret, ret, 8);
+        }
+        break;
+    case TCG_TYPE_V128:
+        tcg_debug_assert(ret >= TCG_REG_V0);
+        assert((offset & 15) == 0);
+        tcg_out_mem_long(s, 0, LVX, ret, base, offset);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
+                              TCGReg base, intptr_t offset)
+{
+    int shift;
+
+    switch (type) {
+    case TCG_TYPE_I32:
+        if (arg < TCG_REG_V0) {
+            tcg_out_mem_long(s, STW, STWX, arg, base, offset);
+            break;
+        }
+        assert((offset & 3) == 0);
+        shift = (offset - 4) & 0xc;
+        if (shift) {
+            tcg_out_vsldoi(s, TCG_VEC_TMP1, arg, arg, shift);
+            arg = TCG_VEC_TMP1;
+        }
+        tcg_out_mem_long(s, 0, STVEWX, arg, base, offset);
+        break;
+    case TCG_TYPE_I64:
+        if (arg < TCG_REG_V0) {
+            tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+            tcg_out_mem_long(s, STD, STDX, arg, base, offset);
+            break;
+        }
+        /* fallthru */
+    case TCG_TYPE_V64:
+        tcg_debug_assert(arg >= TCG_REG_V0);
+        assert((offset & 7) == 0);
+        if (offset & 8) {
+            tcg_out_vsldoi(s, TCG_VEC_TMP1, arg, arg, 8);
+            arg = TCG_VEC_TMP1;
+        }
+        tcg_out_mem_long(s, 0, STVEWX, arg, base, offset);
+        tcg_out_mem_long(s, 0, STVEWX, arg, base, offset + 4);
+        break;
+    case TCG_TYPE_V128:
+        tcg_debug_assert(arg >= TCG_REG_V0);
+        tcg_out_mem_long(s, 0, STVX, arg, base, offset);
+        break;
+    default:
+        g_assert_not_reached();
     }
-    tcg_out_mem_long(s, opi, opx, arg, arg1, arg2);
 }
 
 static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
@@ -2629,32 +2813,236 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 
 int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 {
-    g_assert_not_reached();
+    switch (opc) {
+    case INDEX_op_and_vec:
+    case INDEX_op_or_vec:
+    case INDEX_op_xor_vec:
+    case INDEX_op_andc_vec:
+    case INDEX_op_not_vec:
+        return 1;
+    case INDEX_op_cmp_vec:
+        return vece <= MO_32 ? -1 : 0;
+    default:
+        return 0;
+    }
 }
 
 static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
                             TCGReg dst, TCGReg src)
 {
-    g_assert_not_reached();
+    tcg_debug_assert(dst >= TCG_REG_V0);
+    tcg_debug_assert(src >= TCG_REG_V0);
+
+    /*
+     * Recall we use (or emulate) VSX integer loads, so the integer is
+     * right justified within the left (zero-index) double-word.
+     */
+    switch (vece) {
+    case MO_8:
+        tcg_out32(s, VSPLTB | VRT(dst) | VRB(src) | (7 << 16));
+        break;
+    case MO_16:
+        tcg_out32(s, VSPLTH | VRT(dst) | VRB(src) | (3 << 16));
+        break;
+    case MO_32:
+        tcg_out32(s, VSPLTW | VRT(dst) | VRB(src) | (1 << 16));
+        break;
+    case MO_64:
+        tcg_out_vsldoi(s, TCG_VEC_TMP1, src, src, 8);
+        tcg_out_vsldoi(s, dst, TCG_VEC_TMP1, src, 8);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    return true;
 }
 
 static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
                              TCGReg out, TCGReg base, intptr_t offset)
 {
-    g_assert_not_reached();
+    int elt;
+
+    tcg_debug_assert(out >= TCG_REG_V0);
+    switch (vece) {
+    case MO_8:
+        tcg_out_mem_long(s, 0, LVEBX, out, base, offset);
+        elt = extract32(offset, 0, 4);
+#ifndef HOST_WORDS_BIGENDIAN
+        elt ^= 15;
+#endif
+        tcg_out32(s, VSPLTB | VRT(out) | VRB(out) | (elt << 16));
+        break;
+    case MO_16:
+        assert((offset & 1) == 0);
+        tcg_out_mem_long(s, 0, LVEHX, out, base, offset);
+        elt = extract32(offset, 1, 3);
+#ifndef HOST_WORDS_BIGENDIAN
+        elt ^= 7;
+#endif
+        tcg_out32(s, VSPLTH | VRT(out) | VRB(out) | (elt << 16));
+        break;
+    case MO_32:
+        assert((offset & 3) == 0);
+        tcg_out_mem_long(s, 0, LVEWX, out, base, offset);
+        elt = extract32(offset, 2, 2);
+#ifndef HOST_WORDS_BIGENDIAN
+        elt ^= 3;
+#endif
+        tcg_out32(s, VSPLTW | VRT(out) | VRB(out) | (elt << 16));
+        break;
+    case MO_64:
+        assert((offset & 7) == 0);
+        tcg_out_mem_long(s, 0, LVX, out, base, offset & -16);
+        tcg_out_vsldoi(s, TCG_VEC_TMP1, out, out, 8);
+        elt = extract32(offset, 3, 1);
+#ifndef HOST_WORDS_BIGENDIAN
+        elt = !elt;
+#endif
+        if (elt) {
+            tcg_out_vsldoi(s, out, out, TCG_VEC_TMP1, 8);
+        } else {
+            tcg_out_vsldoi(s, out, TCG_VEC_TMP1, out, 8);
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    return true;
 }
 
 static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
                            unsigned vecl, unsigned vece,
                            const TCGArg *args, const int *const_args)
 {
-    g_assert_not_reached();
+    static const uint32_t
+        eq_op[4]  = { VCMPEQUB, VCMPEQUH, VCMPEQUW, 0 },
+        gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, 0 },
+        gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, 0 };
+
+    TCGType type = vecl + TCG_TYPE_V64;
+    TCGArg a0 = args[0], a1 = args[1], a2 = args[2];
+    uint32_t insn;
+
+    switch (opc) {
+    case INDEX_op_ld_vec:
+        tcg_out_ld(s, type, a0, a1, a2);
+        return;
+    case INDEX_op_st_vec:
+        tcg_out_st(s, type, a0, a1, a2);
+        return;
+    case INDEX_op_dupm_vec:
+        tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
+        return;
+
+    case INDEX_op_and_vec:
+        insn = VAND;
+        break;
+    case INDEX_op_or_vec:
+        insn = VOR;
+        break;
+    case INDEX_op_xor_vec:
+        insn = VXOR;
+        break;
+    case INDEX_op_andc_vec:
+        insn = VANDC;
+        break;
+    case INDEX_op_not_vec:
+        insn = VNOR;
+        a2 = a1;
+        break;
+
+    case INDEX_op_cmp_vec:
+        switch (args[3]) {
+        case TCG_COND_EQ:
+            insn = eq_op[vece];
+            break;
+        case TCG_COND_GT:
+            insn = gts_op[vece];
+            break;
+        case TCG_COND_GTU:
+            insn = gtu_op[vece];
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        break;
+
+    case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
+    case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
+    case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
+    default:
+        g_assert_not_reached();
+    }
+
+    tcg_debug_assert(insn != 0);
+    tcg_out32(s, insn | VRT(a0) | VRA(a1) | VRB(a2));
+}
+
+static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0,
+                           TCGv_vec v1, TCGv_vec v2, TCGCond cond)
+{
+    bool need_swap = false, need_inv = false;
+
+    tcg_debug_assert(vece <= MO_32);
+
+    switch (cond) {
+    case TCG_COND_EQ:
+    case TCG_COND_GT:
+    case TCG_COND_GTU:
+        break;
+    case TCG_COND_NE:
+    case TCG_COND_LE:
+    case TCG_COND_LEU:
+        need_inv = true;
+        break;
+    case TCG_COND_LT:
+    case TCG_COND_LTU:
+        need_swap = true;
+        break;
+    case TCG_COND_GE:
+    case TCG_COND_GEU:
+        need_swap = need_inv = true;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    if (need_inv) {
+        cond = tcg_invert_cond(cond);
+    }
+    if (need_swap) {
+        TCGv_vec t1;
+        t1 = v1, v1 = v2, v2 = t1;
+        cond = tcg_swap_cond(cond);
+    }
+
+    vec_gen_4(INDEX_op_cmp_vec, type, vece, tcgv_vec_arg(v0),
+              tcgv_vec_arg(v1), tcgv_vec_arg(v2), cond);
+
+    if (need_inv) {
+        tcg_gen_not_vec(vece, v0, v0);
+    }
 }
 
 void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
                        TCGArg a0, ...)
 {
-    g_assert_not_reached();
+    va_list va;
+    TCGv_vec v0, v1, v2;
+
+    va_start(va, a0);
+    v0 = temp_tcgv_vec(arg_temp(a0));
+    v1 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
+    v2 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
+
+    switch (opc) {
+    case INDEX_op_cmp_vec:
+        expand_vec_cmp(type, vece, v0, v1, v2, va_arg(va, TCGArg));
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    va_end(va);
 }
 
 static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
@@ -2694,6 +3082,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         = { .args_ct_str = { "r", "r", "r", "r", "rI", "rZM" } };
     static const TCGTargetOpDef sub2
         = { .args_ct_str = { "r", "r", "rI", "rZM", "r", "r" } };
+    static const TCGTargetOpDef v_r = { .args_ct_str = { "v", "r" } };
+    static const TCGTargetOpDef v_v = { .args_ct_str = { "v", "v" } };
+    static const TCGTargetOpDef v_v_v = { .args_ct_str = { "v", "v", "v" } };
 
     switch (op) {
     case INDEX_op_goto_ptr:
@@ -2829,6 +3220,21 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         return (TCG_TARGET_REG_BITS == 64 ? &S_S
                 : TARGET_LONG_BITS == 32 ? &S_S_S : &S_S_S_S);
 
+    case INDEX_op_and_vec:
+    case INDEX_op_or_vec:
+    case INDEX_op_xor_vec:
+    case INDEX_op_andc_vec:
+    case INDEX_op_orc_vec:
+    case INDEX_op_cmp_vec:
+        return &v_v_v;
+    case INDEX_op_not_vec:
+    case INDEX_op_dup_vec:
+        return &v_v;
+    case INDEX_op_ld_vec:
+    case INDEX_op_st_vec:
+    case INDEX_op_dupm_vec:
+        return &v_r;
+
     default:
         return NULL;
     }
-- 
2.17.1




* [Qemu-devel] [PATCH v6 06/16] tcg/ppc: Add support for vector maximum/minimum
  2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
                   ` (4 preceding siblings ...)
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 05/16] tcg/ppc: Add support for load/store/logic/comparison Richard Henderson
@ 2019-06-29 13:00 ` Richard Henderson
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 07/16] tcg/ppc: Add support for vector add/subtract Richard Henderson
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2019-06-29 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, amarkovic, hsp.cat7

Add support for vector maximum/minimum using Altivec instructions
VMAXSB, VMAXSH, VMAXSW, VMAXUB, VMAXUH, VMAXUW, and
VMINSB, VMINSH, VMINSW, VMINUB, VMINUH, VMINUW.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
---
 tcg/ppc/tcg-target.h     |  2 +-
 tcg/ppc/tcg-target.inc.c | 40 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index b66a808259..a86ed57303 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -156,7 +156,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_cmp_vec          1
 #define TCG_TARGET_HAS_mul_vec          0
 #define TCG_TARGET_HAS_sat_vec          0
-#define TCG_TARGET_HAS_minmax_vec       0
+#define TCG_TARGET_HAS_minmax_vec       1
 #define TCG_TARGET_HAS_bitsel_vec       0
 #define TCG_TARGET_HAS_cmpsel_vec       0
 
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 87c418ebf4..9c5630dc8a 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -474,6 +474,19 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define STVX       XO31(231)
 #define STVEWX     XO31(199)
 
+#define VMAXSB     VX4(258)
+#define VMAXSH     VX4(322)
+#define VMAXSW     VX4(386)
+#define VMAXUB     VX4(2)
+#define VMAXUH     VX4(66)
+#define VMAXUW     VX4(130)
+#define VMINSB     VX4(770)
+#define VMINSH     VX4(834)
+#define VMINSW     VX4(898)
+#define VMINUB     VX4(514)
+#define VMINUH     VX4(578)
+#define VMINUW     VX4(642)
+
 #define VCMPEQUB   VX4(6)
 #define VCMPEQUH   VX4(70)
 #define VCMPEQUW   VX4(134)
@@ -2820,6 +2833,11 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_andc_vec:
     case INDEX_op_not_vec:
         return 1;
+    case INDEX_op_smax_vec:
+    case INDEX_op_smin_vec:
+    case INDEX_op_umax_vec:
+    case INDEX_op_umin_vec:
+        return vece <= MO_32;
     case INDEX_op_cmp_vec:
         return vece <= MO_32 ? -1 : 0;
     default:
@@ -2917,7 +2935,11 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     static const uint32_t
         eq_op[4]  = { VCMPEQUB, VCMPEQUH, VCMPEQUW, 0 },
         gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, 0 },
-        gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, 0 };
+        gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, 0 },
+        umin_op[4] = { VMINUB, VMINUH, VMINUW, 0 },
+        smin_op[4] = { VMINSB, VMINSH, VMINSW, 0 },
+        umax_op[4] = { VMAXUB, VMAXUH, VMAXUW, 0 },
+        smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, 0 };
 
     TCGType type = vecl + TCG_TYPE_V64;
     TCGArg a0 = args[0], a1 = args[1], a2 = args[2];
@@ -2934,6 +2956,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
         return;
 
+    case INDEX_op_smin_vec:
+        insn = smin_op[vece];
+        break;
+    case INDEX_op_umin_vec:
+        insn = umin_op[vece];
+        break;
+    case INDEX_op_smax_vec:
+        insn = smax_op[vece];
+        break;
+    case INDEX_op_umax_vec:
+        insn = umax_op[vece];
+        break;
     case INDEX_op_and_vec:
         insn = VAND;
         break;
@@ -3226,6 +3260,10 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_andc_vec:
     case INDEX_op_orc_vec:
     case INDEX_op_cmp_vec:
+    case INDEX_op_smax_vec:
+    case INDEX_op_smin_vec:
+    case INDEX_op_umax_vec:
+    case INDEX_op_umin_vec:
         return &v_v_v;
     case INDEX_op_not_vec:
     case INDEX_op_dup_vec:
-- 
2.17.1




* [Qemu-devel] [PATCH v6 07/16] tcg/ppc: Add support for vector add/subtract
  2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
                   ` (5 preceding siblings ...)
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 06/16] tcg/ppc: Add support for vector maximum/minimum Richard Henderson
@ 2019-06-29 13:00 ` Richard Henderson
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 08/16] tcg/ppc: Add support for vector saturated add/subtract Richard Henderson
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2019-06-29 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, amarkovic, hsp.cat7

Add support for vector add/subtract using Altivec instructions:
VADDUBM, VADDUHM, VADDUWM, VSUBUBM, VSUBUHM, VSUBUWM.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
---
 tcg/ppc/tcg-target.inc.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 9c5630dc8a..c31694cc78 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -474,6 +474,14 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define STVX       XO31(231)
 #define STVEWX     XO31(199)
 
+#define VADDUBM    VX4(0)
+#define VADDUHM    VX4(64)
+#define VADDUWM    VX4(128)
+
+#define VSUBUBM    VX4(1024)
+#define VSUBUHM    VX4(1088)
+#define VSUBUWM    VX4(1152)
+
 #define VMAXSB     VX4(258)
 #define VMAXSH     VX4(322)
 #define VMAXSW     VX4(386)
@@ -2833,6 +2841,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_andc_vec:
     case INDEX_op_not_vec:
         return 1;
+    case INDEX_op_add_vec:
+    case INDEX_op_sub_vec:
     case INDEX_op_smax_vec:
     case INDEX_op_smin_vec:
     case INDEX_op_umax_vec:
@@ -2933,6 +2943,8 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
                            const TCGArg *args, const int *const_args)
 {
     static const uint32_t
+        add_op[4] = { VADDUBM, VADDUHM, VADDUWM, 0 },
+        sub_op[4] = { VSUBUBM, VSUBUHM, VSUBUWM, 0 },
         eq_op[4]  = { VCMPEQUB, VCMPEQUH, VCMPEQUW, 0 },
         gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, 0 },
         gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, 0 },
@@ -2956,6 +2968,12 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
         return;
 
+    case INDEX_op_add_vec:
+        insn = add_op[vece];
+        break;
+    case INDEX_op_sub_vec:
+        insn = sub_op[vece];
+        break;
     case INDEX_op_smin_vec:
         insn = smin_op[vece];
         break;
@@ -3254,6 +3272,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         return (TCG_TARGET_REG_BITS == 64 ? &S_S
                 : TARGET_LONG_BITS == 32 ? &S_S_S : &S_S_S_S);
 
+    case INDEX_op_add_vec:
+    case INDEX_op_sub_vec:
     case INDEX_op_and_vec:
     case INDEX_op_or_vec:
     case INDEX_op_xor_vec:
-- 
2.17.1




* [Qemu-devel] [PATCH v6 08/16] tcg/ppc: Add support for vector saturated add/subtract
  2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
                   ` (6 preceding siblings ...)
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 07/16] tcg/ppc: Add support for vector add/subtract Richard Henderson
@ 2019-06-29 13:00 ` Richard Henderson
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 09/16] tcg/ppc: Prepare case for vector multiply Richard Henderson
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2019-06-29 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, amarkovic, hsp.cat7

Add support for vector saturated add/subtract using Altivec
instructions:
VADDSBS, VADDSHS, VADDSWS, VADDUBS, VADDUHS, VADDUWS, and
VSUBSBS, VSUBSHS, VSUBSWS, VSUBUBS, VSUBUHS, VSUBUWS.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
---
 tcg/ppc/tcg-target.h     |  2 +-
 tcg/ppc/tcg-target.inc.c | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index a86ed57303..368c250c6a 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -155,7 +155,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_shv_vec          0
 #define TCG_TARGET_HAS_cmp_vec          1
 #define TCG_TARGET_HAS_mul_vec          0
-#define TCG_TARGET_HAS_sat_vec          0
+#define TCG_TARGET_HAS_sat_vec          1
 #define TCG_TARGET_HAS_minmax_vec       1
 #define TCG_TARGET_HAS_bitsel_vec       0
 #define TCG_TARGET_HAS_cmpsel_vec       0
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index c31694cc78..307e809fad 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -474,12 +474,24 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define STVX       XO31(231)
 #define STVEWX     XO31(199)
 
+#define VADDSBS    VX4(768)
+#define VADDUBS    VX4(512)
 #define VADDUBM    VX4(0)
+#define VADDSHS    VX4(832)
+#define VADDUHS    VX4(576)
 #define VADDUHM    VX4(64)
+#define VADDSWS    VX4(896)
+#define VADDUWS    VX4(640)
 #define VADDUWM    VX4(128)
 
+#define VSUBSBS    VX4(1792)
+#define VSUBUBS    VX4(1536)
 #define VSUBUBM    VX4(1024)
+#define VSUBSHS    VX4(1856)
+#define VSUBUHS    VX4(1600)
 #define VSUBUHM    VX4(1088)
+#define VSUBSWS    VX4(1920)
+#define VSUBUWS    VX4(1664)
 #define VSUBUWM    VX4(1152)
 
 #define VMAXSB     VX4(258)
@@ -2847,6 +2859,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_smin_vec:
     case INDEX_op_umax_vec:
     case INDEX_op_umin_vec:
+    case INDEX_op_ssadd_vec:
+    case INDEX_op_sssub_vec:
+    case INDEX_op_usadd_vec:
+    case INDEX_op_ussub_vec:
         return vece <= MO_32;
     case INDEX_op_cmp_vec:
         return vece <= MO_32 ? -1 : 0;
@@ -2948,6 +2964,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         eq_op[4]  = { VCMPEQUB, VCMPEQUH, VCMPEQUW, 0 },
         gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, 0 },
         gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, 0 },
+        ssadd_op[4] = { VADDSBS, VADDSHS, VADDSWS, 0 },
+        usadd_op[4] = { VADDUBS, VADDUHS, VADDUWS, 0 },
+        sssub_op[4] = { VSUBSBS, VSUBSHS, VSUBSWS, 0 },
+        ussub_op[4] = { VSUBUBS, VSUBUHS, VSUBUWS, 0 },
         umin_op[4] = { VMINUB, VMINUH, VMINUW, 0 },
         smin_op[4] = { VMINSB, VMINSH, VMINSW, 0 },
         umax_op[4] = { VMAXUB, VMAXUH, VMAXUW, 0 },
@@ -2974,6 +2994,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_sub_vec:
         insn = sub_op[vece];
         break;
+    case INDEX_op_ssadd_vec:
+        insn = ssadd_op[vece];
+        break;
+    case INDEX_op_sssub_vec:
+        insn = sssub_op[vece];
+        break;
+    case INDEX_op_usadd_vec:
+        insn = usadd_op[vece];
+        break;
+    case INDEX_op_ussub_vec:
+        insn = ussub_op[vece];
+        break;
     case INDEX_op_smin_vec:
         insn = smin_op[vece];
         break;
@@ -3280,6 +3312,10 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_andc_vec:
     case INDEX_op_orc_vec:
     case INDEX_op_cmp_vec:
+    case INDEX_op_ssadd_vec:
+    case INDEX_op_sssub_vec:
+    case INDEX_op_usadd_vec:
+    case INDEX_op_ussub_vec:
     case INDEX_op_smax_vec:
     case INDEX_op_smin_vec:
     case INDEX_op_umax_vec:
-- 
2.17.1




* [Qemu-devel] [PATCH v6 09/16] tcg/ppc: Prepare case for vector multiply
  2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
                   ` (7 preceding siblings ...)
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 08/16] tcg/ppc: Add support for vector saturated add/subtract Richard Henderson
@ 2019-06-29 13:00 ` Richard Henderson
  2019-06-30  9:52   ` Aleksandar Markovic
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 10/16] tcg/ppc: Support vector shift by immediate Richard Henderson
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2019-06-29 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, amarkovic, hsp.cat7

This line is just preparation for the full vector multiply support
added in subsequent patches.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
---
 tcg/ppc/tcg-target.inc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 307e809fad..e19400609c 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -3306,6 +3306,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 
     case INDEX_op_add_vec:
     case INDEX_op_sub_vec:
+    case INDEX_op_mul_vec:
     case INDEX_op_and_vec:
     case INDEX_op_or_vec:
     case INDEX_op_xor_vec:
-- 
2.17.1




* [Qemu-devel] [PATCH v6 10/16] tcg/ppc: Support vector shift by immediate
  2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
                   ` (8 preceding siblings ...)
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 09/16] tcg/ppc: Prepare case for vector multiply Richard Henderson
@ 2019-06-29 13:00 ` Richard Henderson
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 11/16] tcg/ppc: Support vector multiply Richard Henderson
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2019-06-29 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, amarkovic, hsp.cat7

For Altivec, this is done via vector shift by vector, with the
immediate shift count first splat into a vector register.
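
For illustration (not part of the patch), a concrete instance of the
expansion performed by expand_vec_shi() in the diff below, for a halfword
right shift by 3 (the names v0/v1 and the MO_16 choice are just for this
example):

/* Sketch only: shri_vec, vece = MO_16, imm = 3. */
TCGv_vec t1 = tcg_temp_new_vec(type);
tcg_gen_dupi_vec(MO_8, t1, 3 & ((8 << MO_16) - 1));  /* splat 3 into every byte */
vec_gen_3(INDEX_op_shrv_vec, type, MO_16, tcgv_vec_arg(v0),
          tcgv_vec_arg(v1), tcgv_vec_arg(t1));       /* becomes vsrh */
tcg_temp_free_vec(t1);

The byte-granularity splat is sufficient because the Altivec
shift-by-vector instructions only consume the low log2(element bits)
bits of each element of the shift operand.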

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
---
 tcg/ppc/tcg-target.h     |  2 +-
 tcg/ppc/tcg-target.inc.c | 58 ++++++++++++++++++++++++++++++++++++++--
 2 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 368c250c6a..766706fd30 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -152,7 +152,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_abs_vec          0
 #define TCG_TARGET_HAS_shi_vec          0
 #define TCG_TARGET_HAS_shs_vec          0
-#define TCG_TARGET_HAS_shv_vec          0
+#define TCG_TARGET_HAS_shv_vec          1
 #define TCG_TARGET_HAS_cmp_vec          1
 #define TCG_TARGET_HAS_mul_vec          0
 #define TCG_TARGET_HAS_sat_vec          1
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index e19400609c..7ddef950f7 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -517,6 +517,16 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define VCMPGTUH   VX4(582)
 #define VCMPGTUW   VX4(646)
 
+#define VSLB       VX4(260)
+#define VSLH       VX4(324)
+#define VSLW       VX4(388)
+#define VSRB       VX4(516)
+#define VSRH       VX4(580)
+#define VSRW       VX4(644)
+#define VSRAB      VX4(772)
+#define VSRAH      VX4(836)
+#define VSRAW      VX4(900)
+
 #define VAND       VX4(1028)
 #define VANDC      VX4(1092)
 #define VNOR       VX4(1284)
@@ -2863,8 +2873,14 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_sssub_vec:
     case INDEX_op_usadd_vec:
     case INDEX_op_ussub_vec:
+    case INDEX_op_shlv_vec:
+    case INDEX_op_shrv_vec:
+    case INDEX_op_sarv_vec:
         return vece <= MO_32;
     case INDEX_op_cmp_vec:
+    case INDEX_op_shli_vec:
+    case INDEX_op_shri_vec:
+    case INDEX_op_sari_vec:
         return vece <= MO_32 ? -1 : 0;
     default:
         return 0;
@@ -2971,7 +2987,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         umin_op[4] = { VMINUB, VMINUH, VMINUW, 0 },
         smin_op[4] = { VMINSB, VMINSH, VMINSW, 0 },
         umax_op[4] = { VMAXUB, VMAXUH, VMAXUW, 0 },
-        smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, 0 };
+        smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, 0 },
+        shlv_op[4] = { VSLB, VSLH, VSLW, 0 },
+        shrv_op[4] = { VSRB, VSRH, VSRW, 0 },
+        sarv_op[4] = { VSRAB, VSRAH, VSRAW, 0 };
 
     TCGType type = vecl + TCG_TYPE_V64;
     TCGArg a0 = args[0], a1 = args[1], a2 = args[2];
@@ -3018,6 +3037,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_umax_vec:
         insn = umax_op[vece];
         break;
+    case INDEX_op_shlv_vec:
+        insn = shlv_op[vece];
+        break;
+    case INDEX_op_shrv_vec:
+        insn = shrv_op[vece];
+        break;
+    case INDEX_op_sarv_vec:
+        insn = sarv_op[vece];
+        break;
     case INDEX_op_and_vec:
         insn = VAND;
         break;
@@ -3062,6 +3090,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     tcg_out32(s, insn | VRT(a0) | VRA(a1) | VRB(a2));
 }
 
+static void expand_vec_shi(TCGType type, unsigned vece, TCGv_vec v0,
+                           TCGv_vec v1, TCGArg imm, TCGOpcode opci)
+{
+    TCGv_vec t1 = tcg_temp_new_vec(type);
+
+    /* Splat w/bytes for xxspltib.  */
+    tcg_gen_dupi_vec(MO_8, t1, imm & ((8 << vece) - 1));
+    vec_gen_3(opci, type, vece, tcgv_vec_arg(v0),
+              tcgv_vec_arg(v1), tcgv_vec_arg(t1));
+    tcg_temp_free_vec(t1);
+}
+
 static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0,
                            TCGv_vec v1, TCGv_vec v2, TCGCond cond)
 {
@@ -3113,14 +3153,25 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
 {
     va_list va;
     TCGv_vec v0, v1, v2;
+    TCGArg a2;
 
     va_start(va, a0);
     v0 = temp_tcgv_vec(arg_temp(a0));
     v1 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
-    v2 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
+    a2 = va_arg(va, TCGArg);
 
     switch (opc) {
+    case INDEX_op_shli_vec:
+        expand_vec_shi(type, vece, v0, v1, a2, INDEX_op_shlv_vec);
+        break;
+    case INDEX_op_shri_vec:
+        expand_vec_shi(type, vece, v0, v1, a2, INDEX_op_shrv_vec);
+        break;
+    case INDEX_op_sari_vec:
+        expand_vec_shi(type, vece, v0, v1, a2, INDEX_op_sarv_vec);
+        break;
     case INDEX_op_cmp_vec:
+        v2 = temp_tcgv_vec(arg_temp(a2));
         expand_vec_cmp(type, vece, v0, v1, v2, va_arg(va, TCGArg));
         break;
     default:
@@ -3321,6 +3372,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_smin_vec:
     case INDEX_op_umax_vec:
     case INDEX_op_umin_vec:
+    case INDEX_op_shlv_vec:
+    case INDEX_op_shrv_vec:
+    case INDEX_op_sarv_vec:
         return &v_v_v;
     case INDEX_op_not_vec:
     case INDEX_op_dup_vec:
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v6 11/16] tcg/ppc: Support vector multiply
  2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
                   ` (9 preceding siblings ...)
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 10/16] tcg/ppc: Support vector shift by immediate Richard Henderson
@ 2019-06-29 13:00 ` Richard Henderson
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 12/16] tcg/ppc: Support vector dup2 Richard Henderson
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2019-06-29 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, amarkovic, hsp.cat7

For Altivec, this is always an expansion.
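
For 32-bit elements the expansion builds the product from 16-bit
partial products; a minimal standalone sketch (not QEMU code) of the
identity used by expand_vec_mul below, with the corresponding Altivec
insns noted in the comments:

    #include <stdio.h>
    #include <stdint.h>

    /* a * b mod 2^32, rebuilt from 16-bit halves */
    static uint32_t mul32_from_halves(uint32_t a, uint32_t b)
    {
        uint32_t al = a & 0xffff, ah = a >> 16;
        uint32_t bl = b & 0xffff, bh = b >> 16;

        uint32_t lo  = al * bl;               /* vmulouh */
        uint32_t mid = ah * bl + al * bh;     /* vmsumuhm, b rotated by 16 */

        return lo + (mid << 16);              /* vslw by 16, then vadduwm */
    }

    int main(void)
    {
        uint32_t a = 0x12345678, b = 0x9abcdef0;
        printf("%08x %08x\n", mul32_from_halves(a, b), a * b);
        return 0;
    }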

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
---
 tcg/ppc/tcg-target.h     |   2 +-
 tcg/ppc/tcg-target.opc.h |   8 +++
 tcg/ppc/tcg-target.inc.c | 112 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 120 insertions(+), 2 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 766706fd30..a130192cbd 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -154,7 +154,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_shs_vec          0
 #define TCG_TARGET_HAS_shv_vec          1
 #define TCG_TARGET_HAS_cmp_vec          1
-#define TCG_TARGET_HAS_mul_vec          0
+#define TCG_TARGET_HAS_mul_vec          1
 #define TCG_TARGET_HAS_sat_vec          1
 #define TCG_TARGET_HAS_minmax_vec       1
 #define TCG_TARGET_HAS_bitsel_vec       0
diff --git a/tcg/ppc/tcg-target.opc.h b/tcg/ppc/tcg-target.opc.h
index fa680dd6a0..db24a11987 100644
--- a/tcg/ppc/tcg-target.opc.h
+++ b/tcg/ppc/tcg-target.opc.h
@@ -3,3 +3,11 @@
  * emitted by tcg_expand_vec_op.  For those familiar with GCC internals,
  * consider these to be UNSPEC with names.
  */
+
+DEF(ppc_mrgh_vec, 1, 2, 0, IMPLVEC)
+DEF(ppc_mrgl_vec, 1, 2, 0, IMPLVEC)
+DEF(ppc_msum_vec, 1, 3, 0, IMPLVEC)
+DEF(ppc_muleu_vec, 1, 2, 0, IMPLVEC)
+DEF(ppc_mulou_vec, 1, 2, 0, IMPLVEC)
+DEF(ppc_pkum_vec, 1, 2, 0, IMPLVEC)
+DEF(ppc_rotl_vec, 1, 2, 0, IMPLVEC)
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 7ddef950f7..cb604b76a3 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -526,6 +526,25 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define VSRAB      VX4(772)
 #define VSRAH      VX4(836)
 #define VSRAW      VX4(900)
+#define VRLB       VX4(4)
+#define VRLH       VX4(68)
+#define VRLW       VX4(132)
+
+#define VMULEUB    VX4(520)
+#define VMULEUH    VX4(584)
+#define VMULOUB    VX4(8)
+#define VMULOUH    VX4(72)
+#define VMSUMUHM   VX4(38)
+
+#define VMRGHB     VX4(12)
+#define VMRGHH     VX4(76)
+#define VMRGHW     VX4(140)
+#define VMRGLB     VX4(268)
+#define VMRGLH     VX4(332)
+#define VMRGLW     VX4(396)
+
+#define VPKUHUM    VX4(14)
+#define VPKUWUM    VX4(78)
 
 #define VAND       VX4(1028)
 #define VANDC      VX4(1092)
@@ -2878,6 +2897,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_sarv_vec:
         return vece <= MO_32;
     case INDEX_op_cmp_vec:
+    case INDEX_op_mul_vec:
     case INDEX_op_shli_vec:
     case INDEX_op_shri_vec:
     case INDEX_op_sari_vec:
@@ -2990,7 +3010,13 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, 0 },
         shlv_op[4] = { VSLB, VSLH, VSLW, 0 },
         shrv_op[4] = { VSRB, VSRH, VSRW, 0 },
-        sarv_op[4] = { VSRAB, VSRAH, VSRAW, 0 };
+        sarv_op[4] = { VSRAB, VSRAH, VSRAW, 0 },
+        mrgh_op[4] = { VMRGHB, VMRGHH, VMRGHW, 0 },
+        mrgl_op[4] = { VMRGLB, VMRGLH, VMRGLW, 0 },
+        muleu_op[4] = { VMULEUB, VMULEUH, 0, 0 },
+        mulou_op[4] = { VMULOUB, VMULOUH, 0, 0 },
+        pkum_op[4] = { VPKUHUM, VPKUWUM, 0, 0 },
+        rotl_op[4] = { VRLB, VRLH, VRLW, 0 };
 
     TCGType type = vecl + TCG_TYPE_V64;
     TCGArg a0 = args[0], a1 = args[1], a2 = args[2];
@@ -3079,6 +3105,29 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_ppc_mrgh_vec:
+        insn = mrgh_op[vece];
+        break;
+    case INDEX_op_ppc_mrgl_vec:
+        insn = mrgl_op[vece];
+        break;
+    case INDEX_op_ppc_muleu_vec:
+        insn = muleu_op[vece];
+        break;
+    case INDEX_op_ppc_mulou_vec:
+        insn = mulou_op[vece];
+        break;
+    case INDEX_op_ppc_pkum_vec:
+        insn = pkum_op[vece];
+        break;
+    case INDEX_op_ppc_rotl_vec:
+        insn = rotl_op[vece];
+        break;
+    case INDEX_op_ppc_msum_vec:
+        tcg_debug_assert(vece == MO_16);
+        tcg_out32(s, VMSUMUHM | VRT(a0) | VRA(a1) | VRB(a2) | VRC(args[3]));
+        return;
+
     case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
@@ -3148,6 +3197,53 @@ static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0,
     }
 }
 
+static void expand_vec_mul(TCGType type, unsigned vece, TCGv_vec v0,
+                           TCGv_vec v1, TCGv_vec v2)
+{
+    TCGv_vec t1 = tcg_temp_new_vec(type);
+    TCGv_vec t2 = tcg_temp_new_vec(type);
+    TCGv_vec t3, t4;
+
+    switch (vece) {
+    case MO_8:
+    case MO_16:
+        vec_gen_3(INDEX_op_ppc_muleu_vec, type, vece, tcgv_vec_arg(t1),
+                  tcgv_vec_arg(v1), tcgv_vec_arg(v2));
+        vec_gen_3(INDEX_op_ppc_mulou_vec, type, vece, tcgv_vec_arg(t2),
+                  tcgv_vec_arg(v1), tcgv_vec_arg(v2));
+        vec_gen_3(INDEX_op_ppc_mrgh_vec, type, vece + 1, tcgv_vec_arg(v0),
+                  tcgv_vec_arg(t1), tcgv_vec_arg(t2));
+        vec_gen_3(INDEX_op_ppc_mrgl_vec, type, vece + 1, tcgv_vec_arg(t1),
+                  tcgv_vec_arg(t1), tcgv_vec_arg(t2));
+        vec_gen_3(INDEX_op_ppc_pkum_vec, type, vece, tcgv_vec_arg(v0),
+                  tcgv_vec_arg(v0), tcgv_vec_arg(t1));
+	break;
+
+    case MO_32:
+        t3 = tcg_temp_new_vec(type);
+        t4 = tcg_temp_new_vec(type);
+        tcg_gen_dupi_vec(MO_8, t4, -16);
+        vec_gen_3(INDEX_op_ppc_rotl_vec, type, MO_32, tcgv_vec_arg(t1),
+                  tcgv_vec_arg(v2), tcgv_vec_arg(t4));
+        vec_gen_3(INDEX_op_ppc_mulou_vec, type, MO_16, tcgv_vec_arg(t2),
+                  tcgv_vec_arg(v1), tcgv_vec_arg(v2));
+        tcg_gen_dupi_vec(MO_8, t3, 0);
+        vec_gen_4(INDEX_op_ppc_msum_vec, type, MO_16, tcgv_vec_arg(t3),
+                  tcgv_vec_arg(v1), tcgv_vec_arg(t1), tcgv_vec_arg(t3));
+        vec_gen_3(INDEX_op_shlv_vec, type, MO_32, tcgv_vec_arg(t3),
+                  tcgv_vec_arg(t3), tcgv_vec_arg(t4));
+        tcg_gen_add_vec(MO_32, v0, t2, t3);
+        tcg_temp_free_vec(t3);
+        tcg_temp_free_vec(t4);
+        break;
+
+    default:
+        g_assert_not_reached();
+    }
+    tcg_temp_free_vec(t1);
+    tcg_temp_free_vec(t2);
+}
+
 void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
                        TCGArg a0, ...)
 {
@@ -3174,6 +3270,10 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
         v2 = temp_tcgv_vec(arg_temp(a2));
         expand_vec_cmp(type, vece, v0, v1, v2, va_arg(va, TCGArg));
         break;
+    case INDEX_op_mul_vec:
+        v2 = temp_tcgv_vec(arg_temp(a2));
+        expand_vec_mul(type, vece, v0, v1, v2);
+        break;
     default:
         g_assert_not_reached();
     }
@@ -3220,6 +3320,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     static const TCGTargetOpDef v_r = { .args_ct_str = { "v", "r" } };
     static const TCGTargetOpDef v_v = { .args_ct_str = { "v", "v" } };
     static const TCGTargetOpDef v_v_v = { .args_ct_str = { "v", "v", "v" } };
+    static const TCGTargetOpDef v_v_v_v
+        = { .args_ct_str = { "v", "v", "v", "v" } };
 
     switch (op) {
     case INDEX_op_goto_ptr:
@@ -3375,6 +3477,12 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_shlv_vec:
     case INDEX_op_shrv_vec:
     case INDEX_op_sarv_vec:
+    case INDEX_op_ppc_mrgh_vec:
+    case INDEX_op_ppc_mrgl_vec:
+    case INDEX_op_ppc_muleu_vec:
+    case INDEX_op_ppc_mulou_vec:
+    case INDEX_op_ppc_pkum_vec:
+    case INDEX_op_ppc_rotl_vec:
         return &v_v_v;
     case INDEX_op_not_vec:
     case INDEX_op_dup_vec:
@@ -3383,6 +3491,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_st_vec:
     case INDEX_op_dupm_vec:
         return &v_r;
+    case INDEX_op_ppc_msum_vec:
+        return &v_v_v_v;
 
     default:
         return NULL;
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v6 12/16] tcg/ppc: Support vector dup2
  2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
                   ` (10 preceding siblings ...)
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 11/16] tcg/ppc: Support vector multiply Richard Henderson
@ 2019-06-29 13:00 ` Richard Henderson
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 13/16] tcg/ppc: Enable Altivec detection Richard Henderson
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2019-06-29 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, amarkovic, hsp.cat7

This is only used for 32-bit hosts.
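
A minimal standalone sketch (not QEMU code) of the lane shuffle done by
the vmrghw plus two vsldoi below, using the same xLxx/xHxx notation as
the comments in the diff:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t L = 0x4c4c4c4c, H = 0x48484848, x = 0xdddddddd;
        uint32_t a1[4] = { x, L, x, x };              /* low half  */
        uint32_t a2[4] = { x, H, x, x };              /* high half */

        /* vmrghw a0,a2,a1: interleave high words     -> xxHL */
        uint32_t a0[4]  = { a2[0], a1[0], a2[1], a1[1] };
        /* vsldoi tmp,a0,a0,8: shift left eight bytes -> HLxx */
        uint32_t tmp[4] = { a0[2], a0[3], a0[0], a0[1] };
        /* vsldoi a0,a0,tmp,8:                        -> HLHL */
        uint32_t r[4]   = { a0[2], a0[3], tmp[0], tmp[1] };

        printf("%08x %08x %08x %08x\n", r[0], r[1], r[2], r[3]);
        return 0;
    }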

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
---
 tcg/ppc/tcg-target.inc.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index cb604b76a3..9a44670180 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -3105,6 +3105,14 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_dup2_vec:
+        assert(TCG_TARGET_REG_BITS == 32);
+        /* With inputs a1 = xLxx, a2 = xHxx  */
+        tcg_out32(s, VMRGHW | VRT(a0) | VRA(a2) | VRB(a1));  /* a0  = xxHL */
+        tcg_out_vsldoi(s, TCG_VEC_TMP1, a0, a0, 8);          /* tmp = HLxx */
+        tcg_out_vsldoi(s, a0, a0, TCG_VEC_TMP1, 8);          /* a0  = HLHL */
+        return;
+
     case INDEX_op_ppc_mrgh_vec:
         insn = mrgh_op[vece];
         break;
@@ -3483,6 +3491,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_ppc_mulou_vec:
     case INDEX_op_ppc_pkum_vec:
     case INDEX_op_ppc_rotl_vec:
+    case INDEX_op_dup2_vec:
         return &v_v_v;
     case INDEX_op_not_vec:
     case INDEX_op_dup_vec:
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v6 13/16] tcg/ppc: Enable Altivec detection
  2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
                   ` (11 preceding siblings ...)
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 12/16] tcg/ppc: Support vector dup2 Richard Henderson
@ 2019-06-29 13:00 ` Richard Henderson
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 14/16] tcg/ppc: Update vector support to v2.06 Richard Henderson
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2019-06-29 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, amarkovic, hsp.cat7

Now that we have implemented the required tcg operations,
we can enable detection of host vector support.
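
A minimal standalone sketch (not QEMU code) of the same runtime check,
using plain glibc getauxval() in place of qemu_getauxval(); the header
names assume a Linux/ppc host:

    #include <stdio.h>
    #include <sys/auxv.h>        /* getauxval, AT_HWCAP */
    #include <asm/cputable.h>    /* PPC_FEATURE_HAS_ALTIVEC, ppc only */

    int main(void)
    {
        unsigned long hwcap = getauxval(AT_HWCAP);

        printf("host %s Altivec\n",
               (hwcap & PPC_FEATURE_HAS_ALTIVEC) ? "has" : "lacks");
        return 0;
    }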

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/ppc/tcg-target.inc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 9a44670180..c6defd4df7 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -3513,7 +3513,7 @@ static void tcg_target_init(TCGContext *s)
     unsigned long hwcap = qemu_getauxval(AT_HWCAP);
     unsigned long hwcap2 = qemu_getauxval(AT_HWCAP2);
 
-    if (hwcap & /* PPC_FEATURE_HAS_ALTIVEC -- NOT YET */ 0) {
+    if (hwcap & PPC_FEATURE_HAS_ALTIVEC) {
         have_isa_altivec = true;
     }
     if (hwcap & PPC_FEATURE_ARCH_2_06) {
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v6 14/16] tcg/ppc: Update vector support to v2.06
  2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
                   ` (12 preceding siblings ...)
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 13/16] tcg/ppc: Enable Altivec detection Richard Henderson
@ 2019-06-29 13:00 ` Richard Henderson
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 15/16] tcg/ppc: Update vector support to v2.07 Richard Henderson
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2019-06-29 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, amarkovic, hsp.cat7

This includes double-word loads and stores, double-word load and splat,
double-word permute, and bit select, all of which require multiple
operations in the base Altivec instruction set.
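
A minimal standalone sketch (not QEMU code) of the per-bit semantics of
the new bit select, which xxsel provides in a single instruction; the
operand order here is for illustration only:

    #include <stdio.h>
    #include <stdint.h>

    /* per bit: take from a where sel is set, else from b */
    static uint64_t bitsel(uint64_t sel, uint64_t a, uint64_t b)
    {
        return (a & sel) | (b & ~sel);
    }

    int main(void)
    {
        printf("%016llx\n",
               (unsigned long long)bitsel(0x00ff00ff00ff00ffull,
                                          0x1111111111111111ull,
                                          0x2222222222222222ull));
        return 0;
    }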

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
---
 tcg/ppc/tcg-target.h     |  5 ++--
 tcg/ppc/tcg-target.inc.c | 51 ++++++++++++++++++++++++++++++++++++----
 2 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index a130192cbd..40544f996d 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -60,6 +60,7 @@ typedef enum {
 
 extern bool have_isa_altivec;
 extern bool have_isa_2_06;
+extern bool have_isa_2_06_vsx;
 extern bool have_isa_3_00;
 
 /* optional instructions automatically implemented */
@@ -141,7 +142,7 @@ extern bool have_isa_3_00;
  * instruction and substituting two 32-bit stores makes the generated
  * code quite large.
  */
-#define TCG_TARGET_HAS_v64              0
+#define TCG_TARGET_HAS_v64              have_isa_2_06_vsx
 #define TCG_TARGET_HAS_v128             have_isa_altivec
 #define TCG_TARGET_HAS_v256             0
 
@@ -157,7 +158,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_mul_vec          1
 #define TCG_TARGET_HAS_sat_vec          1
 #define TCG_TARGET_HAS_minmax_vec       1
-#define TCG_TARGET_HAS_bitsel_vec       0
+#define TCG_TARGET_HAS_bitsel_vec       have_isa_2_06_vsx
 #define TCG_TARGET_HAS_cmpsel_vec       0
 
 void flush_icache_range(uintptr_t start, uintptr_t stop);
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index c6defd4df7..50d1b5612c 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -66,6 +66,7 @@ static tcg_insn_unit *tb_ret_addr;
 
 bool have_isa_altivec;
 bool have_isa_2_06;
+bool have_isa_2_06_vsx;
 bool have_isa_3_00;
 
 #define HAVE_ISA_2_06  have_isa_2_06
@@ -470,9 +471,12 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define LVEBX      XO31(7)
 #define LVEHX      XO31(39)
 #define LVEWX      XO31(71)
+#define LXSDX      XO31(588)      /* v2.06 */
+#define LXVDSX     XO31(332)      /* v2.06 */
 
 #define STVX       XO31(231)
 #define STVEWX     XO31(199)
+#define STXSDX     XO31(716)      /* v2.06 */
 
 #define VADDSBS    VX4(768)
 #define VADDUBS    VX4(512)
@@ -561,6 +565,9 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 
 #define VSLDOI     VX4(44)
 
+#define XXPERMDI   (OPCD(60) | (10 << 3))   /* v2.06 */
+#define XXSEL      (OPCD(60) | (3 << 4))    /* v2.06 */
+
 #define RT(r) ((r)<<21)
 #define RS(r) ((r)<<21)
 #define RA(r) ((r)<<16)
@@ -887,11 +894,21 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret,
         add = 0;
     }
 
-    load_insn = LVX | VRT(ret) | RB(TCG_REG_TMP1);
-    if (TCG_TARGET_REG_BITS == 64) {
-        new_pool_l2(s, rel, s->code_ptr, add, val, val);
+    if (have_isa_2_06_vsx) {
+        load_insn = type == TCG_TYPE_V64 ? LXSDX : LXVDSX;
+        load_insn |= VRT(ret) | RB(TCG_REG_TMP1) | 1;
+        if (TCG_TARGET_REG_BITS == 64) {
+            new_pool_label(s, val, rel, s->code_ptr, add);
+        } else {
+            new_pool_l2(s, rel, s->code_ptr, add, val, val);
+        }
     } else {
-        new_pool_l4(s, rel, s->code_ptr, add, val, val, val, val);
+        load_insn = LVX | VRT(ret) | RB(TCG_REG_TMP1);
+        if (TCG_TARGET_REG_BITS == 64) {
+            new_pool_l2(s, rel, s->code_ptr, add, val, val);
+        } else {
+            new_pool_l4(s, rel, s->code_ptr, add, val, val, val, val);
+        }
     }
 
     if (USE_REG_TB) {
@@ -1139,6 +1156,10 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
         /* fallthru */
     case TCG_TYPE_V64:
         tcg_debug_assert(ret >= TCG_REG_V0);
+        if (have_isa_2_06_vsx) {
+            tcg_out_mem_long(s, 0, LXSDX | 1, ret, base, offset);
+            break;
+        }
         assert((offset & 7) == 0);
         tcg_out_mem_long(s, 0, LVX, ret, base, offset & -16);
         if (offset & 8) {
@@ -1183,6 +1204,10 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
         /* fallthru */
     case TCG_TYPE_V64:
         tcg_debug_assert(arg >= TCG_REG_V0);
+        if (have_isa_2_06_vsx) {
+            tcg_out_mem_long(s, 0, STXSDX | 1, arg, base, offset);
+            break;
+        }
         assert((offset & 7) == 0);
         if (offset & 8) {
             tcg_out_vsldoi(s, TCG_VEC_TMP1, arg, arg, 8);
@@ -2902,6 +2927,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_shri_vec:
     case INDEX_op_sari_vec:
         return vece <= MO_32 ? -1 : 0;
+    case INDEX_op_bitsel_vec:
+        return have_isa_2_06_vsx;
     default:
         return 0;
     }
@@ -2928,6 +2955,10 @@ static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
         tcg_out32(s, VSPLTW | VRT(dst) | VRB(src) | (1 << 16));
         break;
     case MO_64:
+        if (have_isa_2_06_vsx) {
+            tcg_out32(s, XXPERMDI | 7 | VRT(dst) | VRA(src) | VRB(src));
+            break;
+        }
         tcg_out_vsldoi(s, TCG_VEC_TMP1, src, src, 8);
         tcg_out_vsldoi(s, dst, TCG_VEC_TMP1, src, 8);
         break;
@@ -2971,6 +3002,10 @@ static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
         tcg_out32(s, VSPLTW | VRT(out) | VRB(out) | (elt << 16));
         break;
     case MO_64:
+        if (have_isa_2_06_vsx) {
+            tcg_out_mem_long(s, 0, LXVDSX | 1, out, base, offset);
+            break;
+        }
         assert((offset & 7) == 0);
         tcg_out_mem_long(s, 0, LVX, out, base, offset & -16);
         tcg_out_vsldoi(s, TCG_VEC_TMP1, out, out, 8);
@@ -3105,6 +3140,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_bitsel_vec:
+        tcg_out32(s, XXSEL | 0xf | VRT(a0) | VRC(a1) | VRB(a2) | VRA(args[3]));
+        return;
+
     case INDEX_op_dup2_vec:
         assert(TCG_TARGET_REG_BITS == 32);
         /* With inputs a1 = xLxx, a2 = xHxx  */
@@ -3500,6 +3539,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_st_vec:
     case INDEX_op_dupm_vec:
         return &v_r;
+    case INDEX_op_bitsel_vec:
     case INDEX_op_ppc_msum_vec:
         return &v_v_v_v;
 
@@ -3518,6 +3558,9 @@ static void tcg_target_init(TCGContext *s)
     }
     if (hwcap & PPC_FEATURE_ARCH_2_06) {
         have_isa_2_06 = true;
+        if (hwcap & PPC_FEATURE_HAS_VSX) {
+            have_isa_2_06_vsx = true;
+        }
     }
 #ifdef PPC_FEATURE2_ARCH_3_00
     if (hwcap2 & PPC_FEATURE2_ARCH_3_00) {
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v6 15/16] tcg/ppc: Update vector support to v2.07
  2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
                   ` (13 preceding siblings ...)
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 14/16] tcg/ppc: Update vector support to v2.06 Richard Henderson
@ 2019-06-29 13:00 ` Richard Henderson
  2019-06-30 11:50   ` Aleksandar Markovic
  2019-06-30 13:37   ` Aleksandar Markovic
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 16/16] tcg/ppc: Update vector support to v3.00 Richard Henderson
                   ` (2 subsequent siblings)
  17 siblings, 2 replies; 40+ messages in thread
From: Richard Henderson @ 2019-06-29 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, amarkovic, hsp.cat7

This includes single-word loads and stores, lots of double-word
arithmetic, and a few extra logical operations.
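
Of the new logical operations, only orc is wired up as a tcg op here;
a minimal standalone sketch (not QEMU code) of its per-bit semantics,
which vorc now provides in one instruction instead of a not plus an or:

    #include <stdio.h>
    #include <stdint.h>

    /* or-with-complement: a | ~b, per bit */
    static uint64_t orc(uint64_t a, uint64_t b)
    {
        return a | ~b;
    }

    int main(void)
    {
        printf("%016llx\n",
               (unsigned long long)orc(0x00000000ffffffffull,
                                       0x0f0f0f0f0f0f0f0full));
        return 0;
    }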

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
---
 tcg/ppc/tcg-target.h     |   3 +-
 tcg/ppc/tcg-target.inc.c | 128 ++++++++++++++++++++++++++++++---------
 2 files changed, 103 insertions(+), 28 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 40544f996d..b8355d0a56 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -61,6 +61,7 @@ typedef enum {
 extern bool have_isa_altivec;
 extern bool have_isa_2_06;
 extern bool have_isa_2_06_vsx;
+extern bool have_isa_2_07_vsx;
 extern bool have_isa_3_00;
 
 /* optional instructions automatically implemented */
@@ -147,7 +148,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_v256             0
 
 #define TCG_TARGET_HAS_andc_vec         1
-#define TCG_TARGET_HAS_orc_vec          0
+#define TCG_TARGET_HAS_orc_vec          have_isa_2_07_vsx
 #define TCG_TARGET_HAS_not_vec          1
 #define TCG_TARGET_HAS_neg_vec          0
 #define TCG_TARGET_HAS_abs_vec          0
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 50d1b5612c..af86ab07dd 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -67,6 +67,7 @@ static tcg_insn_unit *tb_ret_addr;
 bool have_isa_altivec;
 bool have_isa_2_06;
 bool have_isa_2_06_vsx;
+bool have_isa_2_07_vsx;
 bool have_isa_3_00;
 
 #define HAVE_ISA_2_06  have_isa_2_06
@@ -473,10 +474,12 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define LVEWX      XO31(71)
 #define LXSDX      XO31(588)      /* v2.06 */
 #define LXVDSX     XO31(332)      /* v2.06 */
+#define LXSIWZX    XO31(12)       /* v2.07 */
 
 #define STVX       XO31(231)
 #define STVEWX     XO31(199)
 #define STXSDX     XO31(716)      /* v2.06 */
+#define STXSIWX    XO31(140)      /* v2.07 */
 
 #define VADDSBS    VX4(768)
 #define VADDUBS    VX4(512)
@@ -487,6 +490,7 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define VADDSWS    VX4(896)
 #define VADDUWS    VX4(640)
 #define VADDUWM    VX4(128)
+#define VADDUDM    VX4(192)       /* v2.07 */
 
 #define VSUBSBS    VX4(1792)
 #define VSUBUBS    VX4(1536)
@@ -497,47 +501,62 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define VSUBSWS    VX4(1920)
 #define VSUBUWS    VX4(1664)
 #define VSUBUWM    VX4(1152)
+#define VSUBUDM    VX4(1216)      /* v2.07 */
 
 #define VMAXSB     VX4(258)
 #define VMAXSH     VX4(322)
 #define VMAXSW     VX4(386)
+#define VMAXSD     VX4(450)       /* v2.07 */
 #define VMAXUB     VX4(2)
 #define VMAXUH     VX4(66)
 #define VMAXUW     VX4(130)
+#define VMAXUD     VX4(194)       /* v2.07 */
 #define VMINSB     VX4(770)
 #define VMINSH     VX4(834)
 #define VMINSW     VX4(898)
+#define VMINSD     VX4(962)       /* v2.07 */
 #define VMINUB     VX4(514)
 #define VMINUH     VX4(578)
 #define VMINUW     VX4(642)
+#define VMINUD     VX4(706)       /* v2.07 */
 
 #define VCMPEQUB   VX4(6)
 #define VCMPEQUH   VX4(70)
 #define VCMPEQUW   VX4(134)
+#define VCMPEQUD   VX4(199)       /* v2.07 */
 #define VCMPGTSB   VX4(774)
 #define VCMPGTSH   VX4(838)
 #define VCMPGTSW   VX4(902)
+#define VCMPGTSD   VX4(967)       /* v2.07 */
 #define VCMPGTUB   VX4(518)
 #define VCMPGTUH   VX4(582)
 #define VCMPGTUW   VX4(646)
+#define VCMPGTUD   VX4(711)       /* v2.07 */
 
 #define VSLB       VX4(260)
 #define VSLH       VX4(324)
 #define VSLW       VX4(388)
+#define VSLD       VX4(1476)      /* v2.07 */
 #define VSRB       VX4(516)
 #define VSRH       VX4(580)
 #define VSRW       VX4(644)
+#define VSRD       VX4(1732)      /* v2.07 */
 #define VSRAB      VX4(772)
 #define VSRAH      VX4(836)
 #define VSRAW      VX4(900)
+#define VSRAD      VX4(964)       /* v2.07 */
 #define VRLB       VX4(4)
 #define VRLH       VX4(68)
 #define VRLW       VX4(132)
+#define VRLD       VX4(196)       /* v2.07 */
 
 #define VMULEUB    VX4(520)
 #define VMULEUH    VX4(584)
+#define VMULEUW    VX4(648)       /* v2.07 */
 #define VMULOUB    VX4(8)
 #define VMULOUH    VX4(72)
+#define VMULOUW    VX4(136)       /* v2.07 */
+#define VMULUWM    VX4(137)       /* v2.07 */
 #define VMSUMUHM   VX4(38)
 
 #define VMRGHB     VX4(12)
@@ -555,6 +574,9 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define VNOR       VX4(1284)
 #define VOR        VX4(1156)
 #define VXOR       VX4(1220)
+#define VEQV       VX4(1668)      /* v2.07 */
+#define VNAND      VX4(1412)      /* v2.07 */
+#define VORC       VX4(1348)      /* v2.07 */
 
 #define VSPLTB     VX4(524)
 #define VSPLTH     VX4(588)
@@ -568,6 +590,11 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define XXPERMDI   (OPCD(60) | (10 << 3))   /* v2.06 */
 #define XXSEL      (OPCD(60) | (3 << 4))    /* v2.06 */
 
+#define MFVSRD     XO31(51)       /* v2.07 */
+#define MFVSRWZ    XO31(115)      /* v2.07 */
+#define MTVSRD     XO31(179)      /* v2.07 */
+#define MTVSRWZ    XO31(243)      /* v2.07 */
+
 #define RT(r) ((r)<<21)
 #define RS(r) ((r)<<21)
 #define RA(r) ((r)<<16)
@@ -697,12 +724,27 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
         tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
         /* fallthru */
     case TCG_TYPE_I32:
-        if (ret < TCG_REG_V0 && arg < TCG_REG_V0) {
-            tcg_out32(s, OR | SAB(arg, ret, arg));
-            break;
-        } else if (ret < TCG_REG_V0 || arg < TCG_REG_V0) {
-            /* Altivec does not support vector/integer moves.  */
-            return false;
+        if (ret < TCG_REG_V0) {
+            if (arg < TCG_REG_V0) {
+                tcg_out32(s, OR | SAB(arg, ret, arg));
+                break;
+            } else if (have_isa_2_07_vsx) {
+                tcg_out32(s, (type == TCG_TYPE_I32 ? MFVSRWZ : MFVSRD)
+                          | VRT(arg) | RA(ret) | 1);
+                break;
+            } else {
+                /* Altivec does not support vector->integer moves.  */
+                return false;
+            }
+        } else if (arg < TCG_REG_V0) {
+            if (have_isa_2_07_vsx) {
+                tcg_out32(s, (type == TCG_TYPE_I32 ? MTVSRWZ : MTVSRD)
+                          | VRT(ret) | RA(arg) | 1);
+                break;
+            } else {
+                /* Altivec does not support integer->vector moves.  */
+                return false;
+            }
         }
         /* fallthru */
     case TCG_TYPE_V64:
@@ -1140,6 +1182,10 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
             tcg_out_mem_long(s, LWZ, LWZX, ret, base, offset);
             break;
         }
+        if (have_isa_2_07_vsx) {
+            tcg_out_mem_long(s, 0, LXSIWZX | 1, ret, base, offset);
+            break;
+        }
         assert((offset & 3) == 0);
         tcg_out_mem_long(s, 0, LVEWX, ret, base, offset);
         shift = (offset - 4) & 0xc;
@@ -1187,6 +1233,10 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
             tcg_out_mem_long(s, STW, STWX, arg, base, offset);
             break;
         }
+        if (have_isa_2_07_vsx) {
+            tcg_out_mem_long(s, 0, STXSIWX | 1, arg, base, offset);
+            break;
+        }
         assert((offset & 3) == 0);
         shift = (offset - 4) & 0xc;
         if (shift) {
@@ -2907,26 +2957,37 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_andc_vec:
     case INDEX_op_not_vec:
         return 1;
+    case INDEX_op_orc_vec:
+        return have_isa_2_07_vsx;
     case INDEX_op_add_vec:
     case INDEX_op_sub_vec:
     case INDEX_op_smax_vec:
     case INDEX_op_smin_vec:
     case INDEX_op_umax_vec:
     case INDEX_op_umin_vec:
+    case INDEX_op_shlv_vec:
+    case INDEX_op_shrv_vec:
+    case INDEX_op_sarv_vec:
+        return vece <= MO_32 || have_isa_2_07_vsx;
     case INDEX_op_ssadd_vec:
     case INDEX_op_sssub_vec:
     case INDEX_op_usadd_vec:
     case INDEX_op_ussub_vec:
-    case INDEX_op_shlv_vec:
-    case INDEX_op_shrv_vec:
-    case INDEX_op_sarv_vec:
         return vece <= MO_32;
     case INDEX_op_cmp_vec:
-    case INDEX_op_mul_vec:
     case INDEX_op_shli_vec:
     case INDEX_op_shri_vec:
     case INDEX_op_sari_vec:
-        return vece <= MO_32 ? -1 : 0;
+        return vece <= MO_32 || have_isa_2_07_vsx ? -1 : 0;
+    case INDEX_op_mul_vec:
+        switch (vece) {
+        case MO_8:
+        case MO_16:
+            return -1;
+        case MO_32:
+            return have_isa_2_07_vsx ? 1 : -1;
+        }
+        return 0;
     case INDEX_op_bitsel_vec:
         return have_isa_2_06_vsx;
     default:
@@ -3030,28 +3091,28 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
                            const TCGArg *args, const int *const_args)
 {
     static const uint32_t
-        add_op[4] = { VADDUBM, VADDUHM, VADDUWM, 0 },
-        sub_op[4] = { VSUBUBM, VSUBUHM, VSUBUWM, 0 },
-        eq_op[4]  = { VCMPEQUB, VCMPEQUH, VCMPEQUW, 0 },
-        gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, 0 },
-        gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, 0 },
+        add_op[4] = { VADDUBM, VADDUHM, VADDUWM, VADDUDM },
+        sub_op[4] = { VSUBUBM, VSUBUHM, VSUBUWM, VSUBUDM },
+        eq_op[4]  = { VCMPEQUB, VCMPEQUH, VCMPEQUW, VCMPEQUD },
+        gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, VCMPGTSD },
+        gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, VCMPGTUD },
         ssadd_op[4] = { VADDSBS, VADDSHS, VADDSWS, 0 },
         usadd_op[4] = { VADDUBS, VADDUHS, VADDUWS, 0 },
         sssub_op[4] = { VSUBSBS, VSUBSHS, VSUBSWS, 0 },
         ussub_op[4] = { VSUBUBS, VSUBUHS, VSUBUWS, 0 },
-        umin_op[4] = { VMINUB, VMINUH, VMINUW, 0 },
-        smin_op[4] = { VMINSB, VMINSH, VMINSW, 0 },
-        umax_op[4] = { VMAXUB, VMAXUH, VMAXUW, 0 },
-        smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, 0 },
-        shlv_op[4] = { VSLB, VSLH, VSLW, 0 },
-        shrv_op[4] = { VSRB, VSRH, VSRW, 0 },
-        sarv_op[4] = { VSRAB, VSRAH, VSRAW, 0 },
+        umin_op[4] = { VMINUB, VMINUH, VMINUW, VMINUD },
+        smin_op[4] = { VMINSB, VMINSH, VMINSW, VMINSD },
+        umax_op[4] = { VMAXUB, VMAXUH, VMAXUW, VMAXUD },
+        smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, VMAXSD },
+        shlv_op[4] = { VSLB, VSLH, VSLW, VSLD },
+        shrv_op[4] = { VSRB, VSRH, VSRW, VSRD },
+        sarv_op[4] = { VSRAB, VSRAH, VSRAW, VSRAD },
         mrgh_op[4] = { VMRGHB, VMRGHH, VMRGHW, 0 },
         mrgl_op[4] = { VMRGLB, VMRGLH, VMRGLW, 0 },
-        muleu_op[4] = { VMULEUB, VMULEUH, 0, 0 },
-        mulou_op[4] = { VMULOUB, VMULOUH, 0, 0 },
+        muleu_op[4] = { VMULEUB, VMULEUH, VMULEUW, 0 },
+        mulou_op[4] = { VMULOUB, VMULOUH, VMULOUW, 0 },
         pkum_op[4] = { VPKUHUM, VPKUWUM, 0, 0 },
-        rotl_op[4] = { VRLB, VRLH, VRLW, 0 };
+        rotl_op[4] = { VRLB, VRLH, VRLW, VRLD };
 
     TCGType type = vecl + TCG_TYPE_V64;
     TCGArg a0 = args[0], a1 = args[1], a2 = args[2];
@@ -3074,6 +3135,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_sub_vec:
         insn = sub_op[vece];
         break;
+    case INDEX_op_mul_vec:
+        tcg_debug_assert(vece == MO_32 && have_isa_2_07_vsx);
+        insn = VMULUWM;
+        break;
     case INDEX_op_ssadd_vec:
         insn = ssadd_op[vece];
         break;
@@ -3123,6 +3188,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         insn = VNOR;
         a2 = a1;
         break;
+    case INDEX_op_orc_vec:
+        insn = VORC;
+        break;
 
     case INDEX_op_cmp_vec:
         switch (args[3]) {
@@ -3203,7 +3271,7 @@ static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0,
 {
     bool need_swap = false, need_inv = false;
 
-    tcg_debug_assert(vece <= MO_32);
+    tcg_debug_assert(vece <= MO_32 || have_isa_2_07_vsx);
 
     switch (cond) {
     case TCG_COND_EQ:
@@ -3267,6 +3335,7 @@ static void expand_vec_mul(TCGType type, unsigned vece, TCGv_vec v0,
 	break;
 
     case MO_32:
+        tcg_debug_assert(!have_isa_2_07_vsx);
         t3 = tcg_temp_new_vec(type);
         t4 = tcg_temp_new_vec(type);
         tcg_gen_dupi_vec(MO_8, t4, -16);
@@ -3562,6 +3631,11 @@ static void tcg_target_init(TCGContext *s)
             have_isa_2_06_vsx = true;
         }
     }
+    if (hwcap2 & PPC_FEATURE2_ARCH_2_07) {
+        if (hwcap & PPC_FEATURE_HAS_VSX) {
+            have_isa_2_07_vsx = true;
+        }
+    }
 #ifdef PPC_FEATURE2_ARCH_3_00
     if (hwcap2 & PPC_FEATURE2_ARCH_3_00) {
         have_isa_3_00 = true;
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v6 16/16] tcg/ppc: Update vector support to v3.00
  2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
                   ` (14 preceding siblings ...)
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 15/16] tcg/ppc: Update vector support to v2.07 Richard Henderson
@ 2019-06-29 13:00 ` Richard Henderson
  2019-06-29 13:37 ` [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes no-reply
  2019-06-30 17:58 ` Mark Cave-Ayland
  17 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2019-06-29 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, amarkovic, hsp.cat7

This includes vector load/store with immediate offset, some extra
move and splat insns, compare ne, and negate.
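
One of the new splat insns, xxspltib, lets any constant that is a
replicated byte be materialized without touching the constant pool;
a minimal standalone sketch (not QEMU code) of the check used in
tcg_out_dupi_vec below, with a local stand-in for dup_const(MO_8, val):

    #include <stdio.h>
    #include <stdint.h>

    /* replicate the low byte across 64 bits, like dup_const(MO_8, v) */
    static uint64_t dup_const_mo8(uint64_t v)
    {
        return (v & 0xff) * 0x0101010101010101ull;
    }

    int main(void)
    {
        int64_t vals[] = { 0x4242424242424242ll, -1, 0x1234 };

        for (int i = 0; i < 3; i++) {
            int splat = (vals[i] == (int64_t)dup_const_mo8(vals[i]));
            printf("%016llx -> %s\n", (unsigned long long)vals[i],
                   splat ? "xxspltib" : "constant pool");
        }
        return 0;
    }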

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
---
 tcg/ppc/tcg-target.h     |   3 +-
 tcg/ppc/tcg-target.inc.c | 103 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 94 insertions(+), 12 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index b8355d0a56..533f0ef510 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -63,6 +63,7 @@ extern bool have_isa_2_06;
 extern bool have_isa_2_06_vsx;
 extern bool have_isa_2_07_vsx;
 extern bool have_isa_3_00;
+extern bool have_isa_3_00_vsx;
 
 /* optional instructions automatically implemented */
 #define TCG_TARGET_HAS_ext8u_i32        0 /* andi */
@@ -150,7 +151,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_andc_vec         1
 #define TCG_TARGET_HAS_orc_vec          have_isa_2_07_vsx
 #define TCG_TARGET_HAS_not_vec          1
-#define TCG_TARGET_HAS_neg_vec          0
+#define TCG_TARGET_HAS_neg_vec          have_isa_3_00_vsx
 #define TCG_TARGET_HAS_abs_vec          0
 #define TCG_TARGET_HAS_shi_vec          0
 #define TCG_TARGET_HAS_shs_vec          0
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index af86ab07dd..6715f29d4a 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -69,6 +69,7 @@ bool have_isa_2_06;
 bool have_isa_2_06_vsx;
 bool have_isa_2_07_vsx;
 bool have_isa_3_00;
+bool have_isa_3_00_vsx;
 
 #define HAVE_ISA_2_06  have_isa_2_06
 #define HAVE_ISEL      have_isa_2_06
@@ -475,11 +476,16 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define LXSDX      XO31(588)      /* v2.06 */
 #define LXVDSX     XO31(332)      /* v2.06 */
 #define LXSIWZX    XO31(12)       /* v2.07 */
+#define LXV        (OPCD(61) | 1) /* v3.00 */
+#define LXSD       (OPCD(57) | 2) /* v3.00 */
+#define LXVWSX     XO31(364)      /* v3.00 */
 
 #define STVX       XO31(231)
 #define STVEWX     XO31(199)
 #define STXSDX     XO31(716)      /* v2.06 */
 #define STXSIWX    XO31(140)      /* v2.07 */
+#define STXV       (OPCD(61) | 5) /* v3.00 */
+#define STXSD      (OPCD(61) | 2) /* v3.00 */
 
 #define VADDSBS    VX4(768)
 #define VADDUBS    VX4(512)
@@ -503,6 +509,9 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define VSUBUWM    VX4(1152)
 #define VSUBUDM    VX4(1216)      /* v2.07 */
 
+#define VNEGW      (VX4(1538) | (6 << 16))  /* v3.00 */
+#define VNEGD      (VX4(1538) | (7 << 16))  /* v3.00 */
+
 #define VMAXSB     VX4(258)
 #define VMAXSH     VX4(322)
 #define VMAXSW     VX4(386)
@@ -532,6 +541,9 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define VCMPGTUH   VX4(582)
 #define VCMPGTUW   VX4(646)
 #define VCMPGTUD   VX4(711)       /* v2.07 */
+#define VCMPNEB    VX4(7)         /* v3.00 */
+#define VCMPNEH    VX4(71)        /* v3.00 */
+#define VCMPNEW    VX4(135)       /* v3.00 */
 
 #define VSLB       VX4(260)
 #define VSLH       VX4(324)
@@ -589,11 +601,14 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 
 #define XXPERMDI   (OPCD(60) | (10 << 3))   /* v2.06 */
 #define XXSEL      (OPCD(60) | (3 << 4))    /* v2.06 */
+#define XXSPLTIB   (OPCD(60) | (360 << 1))  /* v3.00 */
 
 #define MFVSRD     XO31(51)       /* v2.07 */
 #define MFVSRWZ    XO31(115)      /* v2.07 */
 #define MTVSRD     XO31(179)      /* v2.07 */
 #define MTVSRWZ    XO31(179)      /* v2.07 */
+#define MTVSRDD    XO31(435)      /* v3.00 */
+#define MTVSRWS    XO31(403)      /* v3.00 */
 
 #define RT(r) ((r)<<21)
 #define RS(r) ((r)<<21)
@@ -924,6 +939,10 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret,
             return;
         }
     }
+    if (have_isa_3_00_vsx && val == (tcg_target_long)dup_const(MO_8, val)) {
+        tcg_out32(s, XXSPLTIB | VRT(ret) | ((val & 0xff) << 11) | 1);
+        return;
+    }
 
     /*
      * Otherwise we must load the value from the constant pool.
@@ -1112,7 +1131,7 @@ static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt,
                              TCGReg base, tcg_target_long offset)
 {
     tcg_target_long orig = offset, l0, l1, extra = 0, align = 0;
-    bool is_store = false;
+    bool is_int_store = false;
     TCGReg rs = TCG_REG_TMP1;
 
     switch (opi) {
@@ -1125,11 +1144,20 @@ static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt,
             break;
         }
         break;
+    case LXSD:
+    case STXSD:
+        align = 3;
+        break;
+    case LXV: case LXV | 8:
+    case STXV: case STXV | 8:
+        /* The |8 cases force altivec registers.  */
+        align = 15;
+        break;
     case STD:
         align = 3;
         /* FALLTHRU */
     case STB: case STH: case STW:
-        is_store = true;
+        is_int_store = true;
         break;
     }
 
@@ -1138,7 +1166,7 @@ static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt,
         if (rs == base) {
             rs = TCG_REG_R0;
         }
-        tcg_debug_assert(!is_store || rs != rt);
+        tcg_debug_assert(!is_int_store || rs != rt);
         tcg_out_movi(s, TCG_TYPE_PTR, rs, orig);
         tcg_out32(s, opx | TAB(rt & 31, base, rs));
         return;
@@ -1203,7 +1231,8 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
     case TCG_TYPE_V64:
         tcg_debug_assert(ret >= TCG_REG_V0);
         if (have_isa_2_06_vsx) {
-            tcg_out_mem_long(s, 0, LXSDX | 1, ret, base, offset);
+            tcg_out_mem_long(s, have_isa_3_00_vsx ? LXSD : 0, LXSDX | 1,
+                             ret, base, offset);
             break;
         }
         assert((offset & 7) == 0);
@@ -1215,7 +1244,8 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
     case TCG_TYPE_V128:
         tcg_debug_assert(ret >= TCG_REG_V0);
         assert((offset & 15) == 0);
-        tcg_out_mem_long(s, 0, LVX, ret, base, offset);
+        tcg_out_mem_long(s, have_isa_3_00_vsx ? LXV | 8 : 0, LVX,
+                         ret, base, offset);
         break;
     default:
         g_assert_not_reached();
@@ -1255,7 +1285,8 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
     case TCG_TYPE_V64:
         tcg_debug_assert(arg >= TCG_REG_V0);
         if (have_isa_2_06_vsx) {
-            tcg_out_mem_long(s, 0, STXSDX | 1, arg, base, offset);
+            tcg_out_mem_long(s, have_isa_3_00_vsx ? STXSD : 0,
+                             STXSDX | 1, arg, base, offset);
             break;
         }
         assert((offset & 7) == 0);
@@ -1268,7 +1299,8 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
         break;
     case TCG_TYPE_V128:
         tcg_debug_assert(arg >= TCG_REG_V0);
-        tcg_out_mem_long(s, 0, STVX, arg, base, offset);
+        tcg_out_mem_long(s, have_isa_3_00_vsx ? STXV | 8 : 0, STVX,
+                         arg, base, offset);
         break;
     default:
         g_assert_not_reached();
@@ -2979,6 +3011,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_shri_vec:
     case INDEX_op_sari_vec:
         return vece <= MO_32 || have_isa_2_07_vsx ? -1 : 0;
+    case INDEX_op_neg_vec:
+        return vece >= MO_32 && have_isa_3_00_vsx;
     case INDEX_op_mul_vec:
         switch (vece) {
         case MO_8:
@@ -2999,7 +3033,22 @@ static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
                             TCGReg dst, TCGReg src)
 {
     tcg_debug_assert(dst >= TCG_REG_V0);
-    tcg_debug_assert(src >= TCG_REG_V0);
+
+    /* Splat from integer reg allowed via constraints for v3.00.  */
+    if (src < TCG_REG_V0) {
+        tcg_debug_assert(have_isa_3_00_vsx);
+        switch (vece) {
+        case MO_64:
+            tcg_out32(s, MTVSRDD | 1 | VRT(dst) | RA(src) | RB(src));
+            return true;
+        case MO_32:
+            tcg_out32(s, MTVSRWS | 1 | VRT(dst) | RA(src));
+            return true;
+        default:
+            /* Fail, so that we fall back on either dupm or mov+dup.  */
+            return false;
+        }
+    }
 
     /*
      * Recall we use (or emulate) VSX integer loads, so the integer is
@@ -3037,7 +3086,11 @@ static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
     tcg_debug_assert(out >= TCG_REG_V0);
     switch (vece) {
     case MO_8:
-        tcg_out_mem_long(s, 0, LVEBX, out, base, offset);
+        if (have_isa_3_00_vsx) {
+            tcg_out_mem_long(s, LXV | 8, LVX, out, base, offset & -16);
+        } else {
+            tcg_out_mem_long(s, 0, LVEBX, out, base, offset);
+        }
         elt = extract32(offset, 0, 4);
 #ifndef HOST_WORDS_BIGENDIAN
         elt ^= 15;
@@ -3046,7 +3099,11 @@ static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
         break;
     case MO_16:
         assert((offset & 1) == 0);
-        tcg_out_mem_long(s, 0, LVEHX, out, base, offset);
+        if (have_isa_3_00_vsx) {
+            tcg_out_mem_long(s, LXV | 8, LVX, out, base, offset & -16);
+        } else {
+            tcg_out_mem_long(s, 0, LVEHX, out, base, offset);
+        }
         elt = extract32(offset, 1, 3);
 #ifndef HOST_WORDS_BIGENDIAN
         elt ^= 7;
@@ -3054,6 +3111,10 @@ static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
         tcg_out32(s, VSPLTH | VRT(out) | VRB(out) | (elt << 16));
         break;
     case MO_32:
+        if (have_isa_3_00_vsx) {
+            tcg_out_mem_long(s, 0, LXVWSX | 1, out, base, offset);
+            break;
+        }
         assert((offset & 3) == 0);
         tcg_out_mem_long(s, 0, LVEWX, out, base, offset);
         elt = extract32(offset, 2, 2);
@@ -3093,7 +3154,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     static const uint32_t
         add_op[4] = { VADDUBM, VADDUHM, VADDUWM, VADDUDM },
         sub_op[4] = { VSUBUBM, VSUBUHM, VSUBUWM, VSUBUDM },
+        neg_op[4] = { 0, 0, VNEGW, VNEGD },
         eq_op[4]  = { VCMPEQUB, VCMPEQUH, VCMPEQUW, VCMPEQUD },
+        ne_op[4]  = { VCMPNEB, VCMPNEH, VCMPNEW, 0 },
         gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, VCMPGTSD },
         gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, VCMPGTUD },
         ssadd_op[4] = { VADDSBS, VADDSHS, VADDSWS, 0 },
@@ -3135,6 +3198,11 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_sub_vec:
         insn = sub_op[vece];
         break;
+    case INDEX_op_neg_vec:
+        insn = neg_op[vece];
+        a2 = a1;
+        a1 = 0;
+        break;
     case INDEX_op_mul_vec:
         tcg_debug_assert(vece == MO_32 && have_isa_2_07_vsx);
         insn = VMULUWM;
@@ -3197,6 +3265,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         case TCG_COND_EQ:
             insn = eq_op[vece];
             break;
+        case TCG_COND_NE:
+            insn = ne_op[vece];
+            break;
         case TCG_COND_GT:
             insn = gts_op[vece];
             break;
@@ -3279,6 +3350,10 @@ static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0,
     case TCG_COND_GTU:
         break;
     case TCG_COND_NE:
+        if (have_isa_3_00_vsx && vece <= MO_32) {
+            break;
+        }
+        /* fall through */
     case TCG_COND_LE:
     case TCG_COND_LEU:
         need_inv = true;
@@ -3434,6 +3509,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     static const TCGTargetOpDef sub2
         = { .args_ct_str = { "r", "r", "rI", "rZM", "r", "r" } };
     static const TCGTargetOpDef v_r = { .args_ct_str = { "v", "r" } };
+    static const TCGTargetOpDef v_vr = { .args_ct_str = { "v", "vr" } };
     static const TCGTargetOpDef v_v = { .args_ct_str = { "v", "v" } };
     static const TCGTargetOpDef v_v_v = { .args_ct_str = { "v", "v", "v" } };
     static const TCGTargetOpDef v_v_v_v
@@ -3602,8 +3678,10 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_dup2_vec:
         return &v_v_v;
     case INDEX_op_not_vec:
-    case INDEX_op_dup_vec:
+    case INDEX_op_neg_vec:
         return &v_v;
+    case INDEX_op_dup_vec:
+        return have_isa_3_00_vsx ? &v_vr : &v_v;
     case INDEX_op_ld_vec:
     case INDEX_op_st_vec:
     case INDEX_op_dupm_vec:
@@ -3639,6 +3717,9 @@ static void tcg_target_init(TCGContext *s)
 #ifdef PPC_FEATURE2_ARCH_3_00
     if (hwcap2 & PPC_FEATURE2_ARCH_3_00) {
         have_isa_3_00 = true;
+        if (hwcap & PPC_FEATURE_HAS_VSX) {
+            have_isa_3_00_vsx = true;
+        }
     }
 #endif
 
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes
  2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
                   ` (15 preceding siblings ...)
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 16/16] tcg/ppc: Update vector support to v3.00 Richard Henderson
@ 2019-06-29 13:37 ` no-reply
  2019-06-30 17:58 ` Mark Cave-Ayland
  17 siblings, 0 replies; 40+ messages in thread
From: no-reply @ 2019-06-29 13:37 UTC (permalink / raw)
  To: richard.henderson; +Cc: mark.cave-ayland, qemu-devel, amarkovic, hsp.cat7

Patchew URL: https://patchew.org/QEMU/20190629130017.2973-1-richard.henderson@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes
Message-id: 20190629130017.2973-1-richard.henderson@linaro.org
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
735e428 tcg/ppc: Update vector support to v3.00
d5df8ce tcg/ppc: Update vector support to v2.07
70bae8c tcg/ppc: Update vector support to v2.06
cdcb6fd tcg/ppc: Enable Altivec detection
5eca04a tcg/ppc: Support vector dup2
9a92a5b tcg/ppc: Support vector multiply
9dcbbb5 tcg/ppc: Support vector shift by immediate
5707cff tcg/ppc: Prepare case for vector multiply
4e8c856 tcg/ppc: Add support for vector saturated add/subtract
8542349 tcg/ppc: Add support for vector add/subtract
09dcca3 tcg/ppc: Add support for vector maximum/minimum
940d802 tcg/ppc: Add support for load/store/logic/comparison
1354b48 tcg/ppc: Enable tcg backend vector compilation
ce65dc7 tcg/ppc: Introduce macros VRT(), VRA(), VRB(), VRC()
c15e076 tcg/ppc: Introduce macro VX4()
a351796 tcg/ppc: Introduce Altivec registers

=== OUTPUT BEGIN ===
1/16 Checking commit a35179674cf2 (tcg/ppc: Introduce Altivec registers)
2/16 Checking commit c15e076c7d0f (tcg/ppc: Introduce macro VX4())
ERROR: spaces required around that '|' (ctx:VxV)
#21: FILE: tcg/ppc/tcg-target.inc.c:322:
+#define VX4(opc)  (OPCD(4)|(opc))
                           ^

total: 1 errors, 0 warnings, 7 lines checked

Patch 2/16 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

3/16 Checking commit ce65dc76f743 (tcg/ppc: Introduce macros VRT(), VRA(), VRB(), VRC())
4/16 Checking commit 1354b48a4dce (tcg/ppc: Enable tcg backend vector compilation)
WARNING: Block comments use a leading /* on a separate line
#155: FILE: tcg/ppc/tcg-target.inc.c:2842:
+    if (hwcap & /* PPC_FEATURE_HAS_ALTIVEC -- NOT YET */ 0) {

WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#173: 
new file mode 100644

total: 0 errors, 2 warnings, 138 lines checked

Patch 4/16 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
5/16 Checking commit 940d8027994d (tcg/ppc: Add support for load/store/logic/comparison)
6/16 Checking commit 09dcca3c9f87 (tcg/ppc: Add support for vector maximum/minimum)
7/16 Checking commit 8542349a45f4 (tcg/ppc: Add support for vector add/subtract)
8/16 Checking commit 4e8c8565186d (tcg/ppc: Add support for vector saturated add/subtract)
9/16 Checking commit 5707cff60faf (tcg/ppc: Prepare case for vector multiply)
10/16 Checking commit 9dcbbb561046 (tcg/ppc: Support vector shift by immediate)
11/16 Checking commit 9a92a5bffebd (tcg/ppc: Support vector multiply)
ERROR: code indent should never use tabs
#133: FILE: tcg/ppc/tcg-target.inc.c:3220:
+^Ibreak;$

total: 1 errors, 0 warnings, 185 lines checked

Patch 11/16 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

12/16 Checking commit 5eca04a86aea (tcg/ppc: Support vector dup2)
13/16 Checking commit cdcb6fdbe190 (tcg/ppc: Enable Altivec detection)
14/16 Checking commit 70bae8c6b3df (tcg/ppc: Update vector support to v2.06)
15/16 Checking commit d5df8cec3718 (tcg/ppc: Update vector support to v2.07)
16/16 Checking commit 735e428f5f2a (tcg/ppc: Update vector support to v3.00)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20190629130017.2973-1-richard.henderson@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 04/16] tcg/ppc: Enable tcg backend vector compilation
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 04/16] tcg/ppc: Enable tcg backend vector compilation Richard Henderson
@ 2019-06-30  9:46   ` Aleksandar Markovic
  2019-06-30 10:48     ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Aleksandar Markovic @ 2019-06-30  9:46 UTC (permalink / raw)
  To: Richard Henderson; +Cc: mark.cave-ayland, qemu-devel, amarkovic, hsp.cat7

On Saturday, June 29, 2019, Richard Henderson <richard.henderson@linaro.org>
wrote:

> Introduce all of the flags required to enable tcg backend vector support,
> and a runtime flag to indicate the host supports Altivec instructions.
>
>
If two flags have different purposes and usage, it is better that they
have different names (perhaps one of them should have the suffix
“_runtime”).

Also, I am not sure that Altivec can be referred to as an ISA; it is a
part/extension of an ISA, so “isa” seems superfluous here.

The checkpatch warning should also be addressed.


> For now, do not actually set have_isa_altivec to true, because we have not
> yet added all of the code to actually generate all of the required insns.
> However, we must define these flags in order to disable ifndefs that create
> stub versions of the functions added here.
>
> The change to tcg_out_movi works around a buglet in tcg.c wherein if we
> do not define tcg_out_dupi_vec we get a declared but not defined Werror,
> but if we only declare it we get a defined but not used Werror.  We need
> this change to tcg_out_movi eventually anyway, so it's no biggie.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
> ---
>  tcg/ppc/tcg-target.h     | 25 ++++++++++++++++
>  tcg/ppc/tcg-target.opc.h |  5 ++++
>  tcg/ppc/tcg-target.inc.c | 65 ++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 92 insertions(+), 3 deletions(-)
>  create mode 100644 tcg/ppc/tcg-target.opc.h
>
> diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
> index 690fa744e1..f6283f468b 100644
> --- a/tcg/ppc/tcg-target.h
> +++ b/tcg/ppc/tcg-target.h
> @@ -58,6 +58,7 @@ typedef enum {
>      TCG_AREG0 = TCG_REG_R27
>  } TCGReg;
>
> +extern bool have_isa_altivec;
>  extern bool have_isa_2_06;
>  extern bool have_isa_3_00;
>
> @@ -135,6 +136,30 @@ extern bool have_isa_3_00;
>  #define TCG_TARGET_HAS_mulsh_i64        1
>  #endif
>
> +/*
> + * While technically Altivec could support V64, it has no 64-bit store
> + * instruction and substituting two 32-bit stores makes the generated
> + * code quite large.
> + */
> +#define TCG_TARGET_HAS_v64              0
> +#define TCG_TARGET_HAS_v128             have_isa_altivec
> +#define TCG_TARGET_HAS_v256             0
> +
> +#define TCG_TARGET_HAS_andc_vec         0
> +#define TCG_TARGET_HAS_orc_vec          0
> +#define TCG_TARGET_HAS_not_vec          0
> +#define TCG_TARGET_HAS_neg_vec          0
> +#define TCG_TARGET_HAS_abs_vec          0
> +#define TCG_TARGET_HAS_shi_vec          0
> +#define TCG_TARGET_HAS_shs_vec          0
> +#define TCG_TARGET_HAS_shv_vec          0
> +#define TCG_TARGET_HAS_cmp_vec          0
> +#define TCG_TARGET_HAS_mul_vec          0
> +#define TCG_TARGET_HAS_sat_vec          0
> +#define TCG_TARGET_HAS_minmax_vec       0
> +#define TCG_TARGET_HAS_bitsel_vec       0
> +#define TCG_TARGET_HAS_cmpsel_vec       0
> +
>  void flush_icache_range(uintptr_t start, uintptr_t stop);
>  void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
>
> diff --git a/tcg/ppc/tcg-target.opc.h b/tcg/ppc/tcg-target.opc.h
> new file mode 100644
> index 0000000000..fa680dd6a0
> --- /dev/null
> +++ b/tcg/ppc/tcg-target.opc.h
> @@ -0,0 +1,5 @@
> +/*
> + * Target-specific opcodes for host vector expansion.  These will be
> + * emitted by tcg_expand_vec_op.  For those familiar with GCC internals,
> + * consider these to be UNSPEC with names.
> + */
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index cfbd7ff12c..b938e9aac5 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -64,6 +64,7 @@
>
>  static tcg_insn_unit *tb_ret_addr;
>
> +bool have_isa_altivec;
>  bool have_isa_2_06;
>  bool have_isa_3_00;
>
> @@ -717,10 +718,31 @@ static void tcg_out_movi_int(TCGContext *s, TCGType
> type, TCGReg ret,
>      }
>  }
>
> -static inline void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret,
> -                                tcg_target_long arg)
> +static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret,
> +                             tcg_target_long val)
>  {
> -    tcg_out_movi_int(s, type, ret, arg, false);
> +    g_assert_not_reached();
> +}
> +
> +static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret,
> +                         tcg_target_long arg)
> +{
> +    switch (type) {
> +    case TCG_TYPE_I32:
> +    case TCG_TYPE_I64:
> +        tcg_debug_assert(ret < TCG_REG_V0);
> +        tcg_out_movi_int(s, type, ret, arg, false);
> +        break;
> +
> +    case TCG_TYPE_V64:
> +    case TCG_TYPE_V128:
> +        tcg_debug_assert(ret >= TCG_REG_V0);
> +        tcg_out_dupi_vec(s, type, ret, arg);
> +        break;
> +
> +    default:
> +        g_assert_not_reached();
> +    }
>  }
>
>  static bool mask_operand(uint32_t c, int *mb, int *me)
> @@ -2605,6 +2627,36 @@ static void tcg_out_op(TCGContext *s, TCGOpcode
> opc, const TCGArg *args,
>      }
>  }
>
> +int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
> +{
> +    g_assert_not_reached();
> +}
> +
> +static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
> +                            TCGReg dst, TCGReg src)
> +{
> +    g_assert_not_reached();
> +}
> +
> +static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
> +                             TCGReg out, TCGReg base, intptr_t offset)
> +{
> +    g_assert_not_reached();
> +}
> +
> +static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
> +                           unsigned vecl, unsigned vece,
> +                           const TCGArg *args, const int *const_args)
> +{
> +    g_assert_not_reached();
> +}
> +
> +void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
> +                       TCGArg a0, ...)
> +{
> +    g_assert_not_reached();
> +}
> +
>  static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
>  {
>      static const TCGTargetOpDef r = { .args_ct_str = { "r" } };
> @@ -2787,6 +2839,9 @@ static void tcg_target_init(TCGContext *s)
>      unsigned long hwcap = qemu_getauxval(AT_HWCAP);
>      unsigned long hwcap2 = qemu_getauxval(AT_HWCAP2);
>
> +    if (hwcap & /* PPC_FEATURE_HAS_ALTIVEC -- NOT YET */ 0) {
> +        have_isa_altivec = true;
> +    }
>      if (hwcap & PPC_FEATURE_ARCH_2_06) {
>          have_isa_2_06 = true;
>      }
> @@ -2798,6 +2853,10 @@ static void tcg_target_init(TCGContext *s)
>
>      tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff;
>      tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff;
> +    if (have_isa_altivec) {
> +        tcg_target_available_regs[TCG_TYPE_V64] = 0xffffffff00000000ull;
> +        tcg_target_available_regs[TCG_TYPE_V128] = 0xffffffff00000000ull;
> +    }
>
>      tcg_target_call_clobber_regs = 0;
>      tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R0);
> --
> 2.17.1
>
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 09/16] tcg/ppc: Prepare case for vector multiply
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 09/16] tcg/ppc: Prepare case for vector multiply Richard Henderson
@ 2019-06-30  9:52   ` Aleksandar Markovic
  2019-06-30 10:49     ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Aleksandar Markovic @ 2019-06-30  9:52 UTC (permalink / raw)
  To: Richard Henderson; +Cc: mark.cave-ayland, qemu-devel, amarkovic, hsp.cat7

On Saturday, June 29, 2019, Richard Henderson <richard.henderson@linaro.org>
wrote:

> This line is just preparation for full vector multiply support
> in some of subsequent patches.
>
>
This patch should be squashed into the patch implementing multiply.



> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
> ---
>  tcg/ppc/tcg-target.inc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index 307e809fad..e19400609c 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -3306,6 +3306,7 @@ static const TCGTargetOpDef
> *tcg_target_op_def(TCGOpcode op)
>
>      case INDEX_op_add_vec:
>      case INDEX_op_sub_vec:
> +    case INDEX_op_mul_vec:
>      case INDEX_op_and_vec:
>      case INDEX_op_or_vec:
>      case INDEX_op_xor_vec:
> --
> 2.17.1
>
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 04/16] tcg/ppc: Enable tcg backend vector compilation
  2019-06-30  9:46   ` Aleksandar Markovic
@ 2019-06-30 10:48     ` Richard Henderson
  2019-06-30 11:45       ` Aleksandar Markovic
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2019-06-30 10:48 UTC (permalink / raw)
  To: Aleksandar Markovic; +Cc: mark.cave-ayland, qemu-devel, amarkovic, hsp.cat7

On 6/30/19 11:46 AM, Aleksandar Markovic wrote:
> 
> 
> On Saturday, June 29, 2019, Richard Henderson <richard.henderson@linaro.org
> <mailto:richard.henderson@linaro.org>> wrote:
> 
>     Introduce all of the flags required to enable tcg backend vector support,
>     and a runtime flag to indicate the host supports Altivec instructions.
> 
> 
> If two flags have different purpose and usage, it is better that they
> have different names. (perhaps one of them should have the suffix “_runtime“)

Huh?  They do have different names.  Very different names.

> Also, I am not sure if Altiveec can be reffered as isa, it is a part/extension
> of an isa, so “isa” seems  superfluous here.

It also matches the other existing names, so I'll leave it as is.

> checkpatch warning should also be honored.

It's bogus.

> WARNING: Block comments use a leading /* on a separate line
> #155: FILE: tcg/ppc/tcg-target.inc.c:2842:
> +    if (hwcap & /* PPC_FEATURE_HAS_ALTIVEC -- NOT YET */ 0) {

It's not a block comment; the whole thing is on one line.
I have no idea why it doesn't notice.

In any case, this goes away in patch 13.


r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 09/16] tcg/ppc: Prepare case for vector multiply
  2019-06-30  9:52   ` Aleksandar Markovic
@ 2019-06-30 10:49     ` Richard Henderson
  2019-06-30 11:35       ` Aleksandar Markovic
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2019-06-30 10:49 UTC (permalink / raw)
  To: Aleksandar Markovic; +Cc: mark.cave-ayland, qemu-devel, amarkovic, hsp.cat7

On 6/30/19 11:52 AM, Aleksandar Markovic wrote:
> 
> 
> On Saturday, June 29, 2019, Richard Henderson <richard.henderson@linaro.org
> <mailto:richard.henderson@linaro.org>> wrote:
> 
>     This line is just preparation for full vector multiply support
>     in some of subsequent patches.
> 
> 
> This patch should be aquashed into the patch on implementing multiply.

Yes it should.

Incidentally, why did you split it out in the first place?


r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 09/16] tcg/ppc: Prepare case for vector multiply
  2019-06-30 10:49     ` Richard Henderson
@ 2019-06-30 11:35       ` Aleksandar Markovic
  0 siblings, 0 replies; 40+ messages in thread
From: Aleksandar Markovic @ 2019-06-30 11:35 UTC (permalink / raw)
  To: Richard Henderson; +Cc: mark.cave-ayland, qemu-devel, amarkovic, hsp.cat7

On Jun 30, 2019 12:49 PM, "Richard Henderson" <richard.henderson@linaro.org>
wrote:
>
> On 6/30/19 11:52 AM, Aleksandar Markovic wrote:
> >
> >
> > On Saturday, June 29, 2019, Richard Henderson <
richard.henderson@linaro.org
> > <mailto:richard.henderson@linaro.org>> wrote:
> >
> >     This line is just preparation for full vector multiply support
> >     in some of subsequent patches.
> >
> >
> > This patch should be aquashed into the patch on implementing multiply.
>
> Yes it should.
>
> Incidentally, why did you split it out in the first place?
>

I wanted to split patch 1 from v4 into smaller patches and use the
remaining v4 patches as-is, so I did not want to meld this segment (from
patch 1) into one of the remaining patches (mul); otherwise the remaining
patches wouldn't be as-is.

But v5 was done mainly for debugging purposes. Normally, the two mentioned
patches on “multiply” should obviously be melded.

>
> r~

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 04/16] tcg/ppc: Enable tcg backend vector compilation
  2019-06-30 10:48     ` Richard Henderson
@ 2019-06-30 11:45       ` Aleksandar Markovic
  0 siblings, 0 replies; 40+ messages in thread
From: Aleksandar Markovic @ 2019-06-30 11:45 UTC (permalink / raw)
  To: Richard Henderson; +Cc: mark.cave-ayland, qemu-devel, amarkovic, hsp.cat7

On Jun 30, 2019 12:48 PM, "Richard Henderson" <richard.henderson@linaro.org>
wrote:
>
> On 6/30/19 11:46 AM, Aleksandar Markovic wrote:
> >
> >
> > On Saturday, June 29, 2019, Richard Henderson <
richard.henderson@linaro.org
> > <mailto:richard.henderson@linaro.org>> wrote:
> >
> >     Introduce all of the flags required to enable tcg backend vector
support,
> >     and a runtime flag to indicate the host supports Altivec
instructions.
> >
> >
> > If two flags have different purpose and usage, it is better that they
> > have different names. (perhaps one of them should have the suffix
“_runtime“)
>
> Huh?  They do have different names.  Very different names.
>

They do. If you leave the same name, you would make any search for that
name during future debugging/development more difficult.

> > Also, I am not sure if Altiveec can be reffered as isa, it is a
part/extension
> > of an isa, so “isa” seems  superfluous here.
>
> It also matches the other existing names, so I'll leave it as is.
>

If something is wrong in the old code, it does not mean one should continue
the same practice.

> > checkpatch warning should also be honored.
>
> It's bogus.

I don't think it is bogus. The comment should be converted to a regular
one-line or perhaps multi-line comment placed before the if statement.
Although it may be correct in the sense of C syntax, no one expects a
comment to be inlined into an if condition, and it makes the code feel
obfuscated rather than clear.
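
For illustration, one way the quoted hunk might look with the comment moved
onto its own line (just a sketch of the suggestion, not a revision of the
patch):

    /*
     * PPC_FEATURE_HAS_ALTIVEC -- NOT YET: the mask is forced to zero so
     * detection stays disabled until all required opcodes are implemented.
     */
    if (hwcap & 0) {
        have_isa_altivec = true;
    }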

> > WARNING: Block comments use a leading /* on a separate line
> > #155: FILE: tcg/ppc/tcg-target.inc.c:2842:
> > +    if (hwcap & /* PPC_FEATURE_HAS_ALTIVEC -- NOT YET */ 0) {
>
> It's not a block comment; the whole thing is on one line.
> I have no idea why it doesn't notice.
>
> In any case, this goes away in patch 13.
>
>
> r~

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 15/16] tcg/ppc: Update vector support to v2.07
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 15/16] tcg/ppc: Update vector support to v2.07 Richard Henderson
@ 2019-06-30 11:50   ` Aleksandar Markovic
  2019-06-30 13:37   ` Aleksandar Markovic
  1 sibling, 0 replies; 40+ messages in thread
From: Aleksandar Markovic @ 2019-06-30 11:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: mark.cave-ayland, qemu-devel, amarkovic, hsp.cat7

On Jun 29, 2019 3:14 PM, "Richard Henderson" <richard.henderson@linaro.org>
wrote:
>
> This includes single-word loads and stores, lots of double-word
> arithmetic, and a few extra logical operations.
>

This patch should be split into several units (treating shift, compare,
etc. separately).

The same goes for other similar patches in this series.

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
> ---
>  tcg/ppc/tcg-target.h     |   3 +-
>  tcg/ppc/tcg-target.inc.c | 128 ++++++++++++++++++++++++++++++---------
>  2 files changed, 103 insertions(+), 28 deletions(-)
>
> diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
> index 40544f996d..b8355d0a56 100644
> --- a/tcg/ppc/tcg-target.h
> +++ b/tcg/ppc/tcg-target.h
> @@ -61,6 +61,7 @@ typedef enum {
>  extern bool have_isa_altivec;
>  extern bool have_isa_2_06;
>  extern bool have_isa_2_06_vsx;
> +extern bool have_isa_2_07_vsx;
>  extern bool have_isa_3_00;
>
>  /* optional instructions automatically implemented */
> @@ -147,7 +148,7 @@ extern bool have_isa_3_00;
>  #define TCG_TARGET_HAS_v256             0
>
>  #define TCG_TARGET_HAS_andc_vec         1
> -#define TCG_TARGET_HAS_orc_vec          0
> +#define TCG_TARGET_HAS_orc_vec          have_isa_2_07_vsx
>  #define TCG_TARGET_HAS_not_vec          1
>  #define TCG_TARGET_HAS_neg_vec          0
>  #define TCG_TARGET_HAS_abs_vec          0
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index 50d1b5612c..af86ab07dd 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -67,6 +67,7 @@ static tcg_insn_unit *tb_ret_addr;
>  bool have_isa_altivec;
>  bool have_isa_2_06;
>  bool have_isa_2_06_vsx;
> +bool have_isa_2_07_vsx;
>  bool have_isa_3_00;
>
>  #define HAVE_ISA_2_06  have_isa_2_06
> @@ -473,10 +474,12 @@ static int tcg_target_const_match(tcg_target_long
val, TCGType type,
>  #define LVEWX      XO31(71)
>  #define LXSDX      XO31(588)      /* v2.06 */
>  #define LXVDSX     XO31(332)      /* v2.06 */
> +#define LXSIWZX    XO31(12)       /* v2.07 */
>
>  #define STVX       XO31(231)
>  #define STVEWX     XO31(199)
>  #define STXSDX     XO31(716)      /* v2.06 */
> +#define STXSIWX    XO31(140)      /* v2.07 */
>
>  #define VADDSBS    VX4(768)
>  #define VADDUBS    VX4(512)
> @@ -487,6 +490,7 @@ static int tcg_target_const_match(tcg_target_long
val, TCGType type,
>  #define VADDSWS    VX4(896)
>  #define VADDUWS    VX4(640)
>  #define VADDUWM    VX4(128)
> +#define VADDUDM    VX4(192)       /* v2.07 */
>
>  #define VSUBSBS    VX4(1792)
>  #define VSUBUBS    VX4(1536)
> @@ -497,47 +501,62 @@ static int tcg_target_const_match(tcg_target_long
val, TCGType type,
>  #define VSUBSWS    VX4(1920)
>  #define VSUBUWS    VX4(1664)
>  #define VSUBUWM    VX4(1152)
> +#define VSUBUDM    VX4(1216)      /* v2.07 */
>
>  #define VMAXSB     VX4(258)
>  #define VMAXSH     VX4(322)
>  #define VMAXSW     VX4(386)
> +#define VMAXSD     VX4(450)       /* v2.07 */
>  #define VMAXUB     VX4(2)
>  #define VMAXUH     VX4(66)
>  #define VMAXUW     VX4(130)
> +#define VMAXUD     VX4(194)       /* v2.07 */
>  #define VMINSB     VX4(770)
>  #define VMINSH     VX4(834)
>  #define VMINSW     VX4(898)
> +#define VMINSD     VX4(962)       /* v2.07 */
>  #define VMINUB     VX4(514)
>  #define VMINUH     VX4(578)
>  #define VMINUW     VX4(642)
> +#define VMINUD     VX4(706)       /* v2.07 */
>
>  #define VCMPEQUB   VX4(6)
>  #define VCMPEQUH   VX4(70)
>  #define VCMPEQUW   VX4(134)
> +#define VCMPEQUD   VX4(199)       /* v2.07 */
>  #define VCMPGTSB   VX4(774)
>  #define VCMPGTSH   VX4(838)
>  #define VCMPGTSW   VX4(902)
> +#define VCMPGTSD   VX4(967)       /* v2.07 */
>  #define VCMPGTUB   VX4(518)
>  #define VCMPGTUH   VX4(582)
>  #define VCMPGTUW   VX4(646)
> +#define VCMPGTUD   VX4(711)       /* v2.07 */
>
>  #define VSLB       VX4(260)
>  #define VSLH       VX4(324)
>  #define VSLW       VX4(388)
> +#define VSLD       VX4(1476)      /* v2.07 */
>  #define VSRB       VX4(516)
>  #define VSRH       VX4(580)
>  #define VSRW       VX4(644)
> +#define VSRD       VX4(1732)      /* v2.07 */
>  #define VSRAB      VX4(772)
>  #define VSRAH      VX4(836)
>  #define VSRAW      VX4(900)
> +#define VSRAD      VX4(964)       /* v2.07 */
>  #define VRLB       VX4(4)
>  #define VRLH       VX4(68)
>  #define VRLW       VX4(132)
> +#define VRLD       VX4(196)       /* v2.07 */
>
>  #define VMULEUB    VX4(520)
>  #define VMULEUH    VX4(584)
> +#define VMULEUW    VX4(648)       /* v2.07 */
>  #define VMULOUB    VX4(8)
>  #define VMULOUH    VX4(72)
> +#define VMULOUW    VX4(136)       /* v2.07 */
> +#define VMULUWM    VX4(137)       /* v2.07 */
>  #define VMSUMUHM   VX4(38)
>
>  #define VMRGHB     VX4(12)
> @@ -555,6 +574,9 @@ static int tcg_target_const_match(tcg_target_long
val, TCGType type,
>  #define VNOR       VX4(1284)
>  #define VOR        VX4(1156)
>  #define VXOR       VX4(1220)
> +#define VEQV       VX4(1668)      /* v2.07 */
> +#define VNAND      VX4(1412)      /* v2.07 */
> +#define VORC       VX4(1348)      /* v2.07 */
>
>  #define VSPLTB     VX4(524)
>  #define VSPLTH     VX4(588)
> @@ -568,6 +590,11 @@ static int tcg_target_const_match(tcg_target_long
val, TCGType type,
>  #define XXPERMDI   (OPCD(60) | (10 << 3))   /* v2.06 */
>  #define XXSEL      (OPCD(60) | (3 << 4))    /* v2.06 */
>
> +#define MFVSRD     XO31(51)       /* v2.07 */
> +#define MFVSRWZ    XO31(115)      /* v2.07 */
> +#define MTVSRD     XO31(179)      /* v2.07 */
> +#define MTVSRWZ    XO31(179)      /* v2.07 */
> +
>  #define RT(r) ((r)<<21)
>  #define RS(r) ((r)<<21)
>  #define RA(r) ((r)<<16)
> @@ -697,12 +724,27 @@ static bool tcg_out_mov(TCGContext *s, TCGType
type, TCGReg ret, TCGReg arg)
>          tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
>          /* fallthru */
>      case TCG_TYPE_I32:
> -        if (ret < TCG_REG_V0 && arg < TCG_REG_V0) {
> -            tcg_out32(s, OR | SAB(arg, ret, arg));
> -            break;
> -        } else if (ret < TCG_REG_V0 || arg < TCG_REG_V0) {
> -            /* Altivec does not support vector/integer moves.  */
> -            return false;
> +        if (ret < TCG_REG_V0) {
> +            if (arg < TCG_REG_V0) {
> +                tcg_out32(s, OR | SAB(arg, ret, arg));
> +                break;
> +            } else if (have_isa_2_07_vsx) {
> +                tcg_out32(s, (type == TCG_TYPE_I32 ? MFVSRWZ : MFVSRD)
> +                          | VRT(arg) | RA(ret) | 1);
> +                break;
> +            } else {
> +                /* Altivec does not support vector->integer moves.  */
> +                return false;
> +            }
> +        } else if (arg < TCG_REG_V0) {
> +            if (have_isa_2_07_vsx) {
> +                tcg_out32(s, (type == TCG_TYPE_I32 ? MTVSRWZ : MTVSRD)
> +                          | VRT(ret) | RA(arg) | 1);
> +                break;
> +            } else {
> +                /* Altivec does not support integer->vector moves.  */
> +                return false;
> +            }
>          }
>          /* fallthru */
>      case TCG_TYPE_V64:
> @@ -1140,6 +1182,10 @@ static void tcg_out_ld(TCGContext *s, TCGType
type, TCGReg ret,
>              tcg_out_mem_long(s, LWZ, LWZX, ret, base, offset);
>              break;
>          }
> +        if (have_isa_2_07_vsx) {
> +            tcg_out_mem_long(s, 0, LXSIWZX | 1, ret, base, offset);
> +            break;
> +        }
>          assert((offset & 3) == 0);
>          tcg_out_mem_long(s, 0, LVEWX, ret, base, offset);
>          shift = (offset - 4) & 0xc;
> @@ -1187,6 +1233,10 @@ static void tcg_out_st(TCGContext *s, TCGType
type, TCGReg arg,
>              tcg_out_mem_long(s, STW, STWX, arg, base, offset);
>              break;
>          }
> +        if (have_isa_2_07_vsx) {
> +            tcg_out_mem_long(s, 0, STXSIWX | 1, arg, base, offset);
> +            break;
> +        }
>          assert((offset & 3) == 0);
>          shift = (offset - 4) & 0xc;
>          if (shift) {
> @@ -2907,26 +2957,37 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType
type, unsigned vece)
>      case INDEX_op_andc_vec:
>      case INDEX_op_not_vec:
>          return 1;
> +    case INDEX_op_orc_vec:
> +        return have_isa_2_07_vsx;
>      case INDEX_op_add_vec:
>      case INDEX_op_sub_vec:
>      case INDEX_op_smax_vec:
>      case INDEX_op_smin_vec:
>      case INDEX_op_umax_vec:
>      case INDEX_op_umin_vec:
> +    case INDEX_op_shlv_vec:
> +    case INDEX_op_shrv_vec:
> +    case INDEX_op_sarv_vec:
> +        return vece <= MO_32 || have_isa_2_07_vsx;
>      case INDEX_op_ssadd_vec:
>      case INDEX_op_sssub_vec:
>      case INDEX_op_usadd_vec:
>      case INDEX_op_ussub_vec:
> -    case INDEX_op_shlv_vec:
> -    case INDEX_op_shrv_vec:
> -    case INDEX_op_sarv_vec:
>          return vece <= MO_32;
>      case INDEX_op_cmp_vec:
> -    case INDEX_op_mul_vec:
>      case INDEX_op_shli_vec:
>      case INDEX_op_shri_vec:
>      case INDEX_op_sari_vec:
> -        return vece <= MO_32 ? -1 : 0;
> +        return vece <= MO_32 || have_isa_2_07_vsx ? -1 : 0;
> +    case INDEX_op_mul_vec:
> +        switch (vece) {
> +        case MO_8:
> +        case MO_16:
> +            return -1;
> +        case MO_32:
> +            return have_isa_2_07_vsx ? 1 : -1;
> +        }
> +        return 0;
>      case INDEX_op_bitsel_vec:
>          return have_isa_2_06_vsx;
>      default:
> @@ -3030,28 +3091,28 @@ static void tcg_out_vec_op(TCGContext *s,
TCGOpcode opc,
>                             const TCGArg *args, const int *const_args)
>  {
>      static const uint32_t
> -        add_op[4] = { VADDUBM, VADDUHM, VADDUWM, 0 },
> -        sub_op[4] = { VSUBUBM, VSUBUHM, VSUBUWM, 0 },
> -        eq_op[4]  = { VCMPEQUB, VCMPEQUH, VCMPEQUW, 0 },
> -        gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, 0 },
> -        gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, 0 },
> +        add_op[4] = { VADDUBM, VADDUHM, VADDUWM, VADDUDM },
> +        sub_op[4] = { VSUBUBM, VSUBUHM, VSUBUWM, VSUBUDM },
> +        eq_op[4]  = { VCMPEQUB, VCMPEQUH, VCMPEQUW, VCMPEQUD },
> +        gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, VCMPGTSD },
> +        gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, VCMPGTUD },
>          ssadd_op[4] = { VADDSBS, VADDSHS, VADDSWS, 0 },
>          usadd_op[4] = { VADDUBS, VADDUHS, VADDUWS, 0 },
>          sssub_op[4] = { VSUBSBS, VSUBSHS, VSUBSWS, 0 },
>          ussub_op[4] = { VSUBUBS, VSUBUHS, VSUBUWS, 0 },
> -        umin_op[4] = { VMINUB, VMINUH, VMINUW, 0 },
> -        smin_op[4] = { VMINSB, VMINSH, VMINSW, 0 },
> -        umax_op[4] = { VMAXUB, VMAXUH, VMAXUW, 0 },
> -        smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, 0 },
> -        shlv_op[4] = { VSLB, VSLH, VSLW, 0 },
> -        shrv_op[4] = { VSRB, VSRH, VSRW, 0 },
> -        sarv_op[4] = { VSRAB, VSRAH, VSRAW, 0 },
> +        umin_op[4] = { VMINUB, VMINUH, VMINUW, VMINUD },
> +        smin_op[4] = { VMINSB, VMINSH, VMINSW, VMINSD },
> +        umax_op[4] = { VMAXUB, VMAXUH, VMAXUW, VMAXUD },
> +        smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, VMAXSD },
> +        shlv_op[4] = { VSLB, VSLH, VSLW, VSLD },
> +        shrv_op[4] = { VSRB, VSRH, VSRW, VSRD },
> +        sarv_op[4] = { VSRAB, VSRAH, VSRAW, VSRAD },
>          mrgh_op[4] = { VMRGHB, VMRGHH, VMRGHW, 0 },
>          mrgl_op[4] = { VMRGLB, VMRGLH, VMRGLW, 0 },
> -        muleu_op[4] = { VMULEUB, VMULEUH, 0, 0 },
> -        mulou_op[4] = { VMULOUB, VMULOUH, 0, 0 },
> +        muleu_op[4] = { VMULEUB, VMULEUH, VMULEUW, 0 },
> +        mulou_op[4] = { VMULOUB, VMULOUH, VMULOUW, 0 },
>          pkum_op[4] = { VPKUHUM, VPKUWUM, 0, 0 },
> -        rotl_op[4] = { VRLB, VRLH, VRLW, 0 };
> +        rotl_op[4] = { VRLB, VRLH, VRLW, VRLD };
>
>      TCGType type = vecl + TCG_TYPE_V64;
>      TCGArg a0 = args[0], a1 = args[1], a2 = args[2];
> @@ -3074,6 +3135,10 @@ static void tcg_out_vec_op(TCGContext *s,
TCGOpcode opc,
>      case INDEX_op_sub_vec:
>          insn = sub_op[vece];
>          break;
> +    case INDEX_op_mul_vec:
> +        tcg_debug_assert(vece == MO_32 && have_isa_2_07_vsx);
> +        insn = VMULUWM;
> +        break;
>      case INDEX_op_ssadd_vec:
>          insn = ssadd_op[vece];
>          break;
> @@ -3123,6 +3188,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode
opc,
>          insn = VNOR;
>          a2 = a1;
>          break;
> +    case INDEX_op_orc_vec:
> +        insn = VORC;
> +        break;
>
>      case INDEX_op_cmp_vec:
>          switch (args[3]) {
> @@ -3203,7 +3271,7 @@ static void expand_vec_cmp(TCGType type, unsigned
vece, TCGv_vec v0,
>  {
>      bool need_swap = false, need_inv = false;
>
> -    tcg_debug_assert(vece <= MO_32);
> +    tcg_debug_assert(vece <= MO_32 || have_isa_2_07_vsx);
>
>      switch (cond) {
>      case TCG_COND_EQ:
> @@ -3267,6 +3335,7 @@ static void expand_vec_mul(TCGType type, unsigned
vece, TCGv_vec v0,
>         break;
>
>      case MO_32:
> +        tcg_debug_assert(!have_isa_2_07_vsx);
>          t3 = tcg_temp_new_vec(type);
>          t4 = tcg_temp_new_vec(type);
>          tcg_gen_dupi_vec(MO_8, t4, -16);
> @@ -3562,6 +3631,11 @@ static void tcg_target_init(TCGContext *s)
>              have_isa_2_06_vsx = true;
>          }
>      }
> +    if (hwcap2 & PPC_FEATURE2_ARCH_2_07) {
> +        if (hwcap & PPC_FEATURE_HAS_VSX) {
> +            have_isa_2_07_vsx = true;
> +        }
> +    }
>  #ifdef PPC_FEATURE2_ARCH_3_00
>      if (hwcap2 & PPC_FEATURE2_ARCH_3_00) {
>          have_isa_3_00 = true;
> --
> 2.17.1
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 15/16] tcg/ppc: Update vector support to v2.07
  2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 15/16] tcg/ppc: Update vector support to v2.07 Richard Henderson
  2019-06-30 11:50   ` Aleksandar Markovic
@ 2019-06-30 13:37   ` Aleksandar Markovic
  2019-06-30 15:12     ` Richard Henderson
  1 sibling, 1 reply; 40+ messages in thread
From: Aleksandar Markovic @ 2019-06-30 13:37 UTC (permalink / raw)
  To: Richard Henderson; +Cc: mark.cave-ayland, qemu-devel, amarkovic, hsp.cat7

On Jun 29, 2019 3:14 PM, "Richard Henderson" <richard.henderson@linaro.org>
wrote:
>
> This includes single-word loads and stores, lots of double-word
> arithmetic, and a few extra logical operations.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com>
> ---
>  tcg/ppc/tcg-target.h     |   3 +-
>  tcg/ppc/tcg-target.inc.c | 128 ++++++++++++++++++++++++++++++---------
>  2 files changed, 103 insertions(+), 28 deletions(-)
>
> diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
> index 40544f996d..b8355d0a56 100644
> --- a/tcg/ppc/tcg-target.h
> +++ b/tcg/ppc/tcg-target.h
> @@ -61,6 +61,7 @@ typedef enum {
>  extern bool have_isa_altivec;
>  extern bool have_isa_2_06;
>  extern bool have_isa_2_06_vsx;
> +extern bool have_isa_2_07_vsx;
>  extern bool have_isa_3_00;
>
>  /* optional instructions automatically implemented */
> @@ -147,7 +148,7 @@ extern bool have_isa_3_00;
>  #define TCG_TARGET_HAS_v256             0
>
>  #define TCG_TARGET_HAS_andc_vec         1
> -#define TCG_TARGET_HAS_orc_vec          0
> +#define TCG_TARGET_HAS_orc_vec          have_isa_2_07_vsx
>  #define TCG_TARGET_HAS_not_vec          1
>  #define TCG_TARGET_HAS_neg_vec          0
>  #define TCG_TARGET_HAS_abs_vec          0
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index 50d1b5612c..af86ab07dd 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -67,6 +67,7 @@ static tcg_insn_unit *tb_ret_addr;
>  bool have_isa_altivec;
>  bool have_isa_2_06;
>  bool have_isa_2_06_vsx;
> +bool have_isa_2_07_vsx;

Does this flag indicate support for PowerISA 2.07 or VSX?

If VSX support is implied by PowerISA 2.07, then the “_vsx” suffix is
really not needed. If not, why are there two flavors of “2_06” flag
variables (with and without _vsx), and only one flavor of 2.07 (with _vsx)?

>  bool have_isa_3_00;
>
>  #define HAVE_ISA_2_06  have_isa_2_06
> @@ -473,10 +474,12 @@ static int tcg_target_const_match(tcg_target_long
val, TCGType type,
>  #define LVEWX      XO31(71)
>  #define LXSDX      XO31(588)      /* v2.06 */
>  #define LXVDSX     XO31(332)      /* v2.06 */
> +#define LXSIWZX    XO31(12)       /* v2.07 */
>
>  #define STVX       XO31(231)
>  #define STVEWX     XO31(199)
>  #define STXSDX     XO31(716)      /* v2.06 */
> +#define STXSIWX    XO31(140)      /* v2.07 */
>
>  #define VADDSBS    VX4(768)
>  #define VADDUBS    VX4(512)
> @@ -487,6 +490,7 @@ static int tcg_target_const_match(tcg_target_long
val, TCGType type,
>  #define VADDSWS    VX4(896)
>  #define VADDUWS    VX4(640)
>  #define VADDUWM    VX4(128)
> +#define VADDUDM    VX4(192)       /* v2.07 */
>
>  #define VSUBSBS    VX4(1792)
>  #define VSUBUBS    VX4(1536)
> @@ -497,47 +501,62 @@ static int tcg_target_const_match(tcg_target_long
val, TCGType type,
>  #define VSUBSWS    VX4(1920)
>  #define VSUBUWS    VX4(1664)
>  #define VSUBUWM    VX4(1152)
> +#define VSUBUDM    VX4(1216)      /* v2.07 */
>
>  #define VMAXSB     VX4(258)
>  #define VMAXSH     VX4(322)
>  #define VMAXSW     VX4(386)
> +#define VMAXSD     VX4(450)       /* v2.07 */
>  #define VMAXUB     VX4(2)
>  #define VMAXUH     VX4(66)
>  #define VMAXUW     VX4(130)
> +#define VMAXUD     VX4(194)       /* v2.07 */
>  #define VMINSB     VX4(770)
>  #define VMINSH     VX4(834)
>  #define VMINSW     VX4(898)
> +#define VMINSD     VX4(962)       /* v2.07 */
>  #define VMINUB     VX4(514)
>  #define VMINUH     VX4(578)
>  #define VMINUW     VX4(642)
> +#define VMINUD     VX4(706)       /* v2.07 */
>
>  #define VCMPEQUB   VX4(6)
>  #define VCMPEQUH   VX4(70)
>  #define VCMPEQUW   VX4(134)
> +#define VCMPEQUD   VX4(199)       /* v2.07 */
>  #define VCMPGTSB   VX4(774)
>  #define VCMPGTSH   VX4(838)
>  #define VCMPGTSW   VX4(902)
> +#define VCMPGTSD   VX4(967)       /* v2.07 */
>  #define VCMPGTUB   VX4(518)
>  #define VCMPGTUH   VX4(582)
>  #define VCMPGTUW   VX4(646)
> +#define VCMPGTUD   VX4(711)       /* v2.07 */
>
>  #define VSLB       VX4(260)
>  #define VSLH       VX4(324)
>  #define VSLW       VX4(388)
> +#define VSLD       VX4(1476)      /* v2.07 */
>  #define VSRB       VX4(516)
>  #define VSRH       VX4(580)
>  #define VSRW       VX4(644)
> +#define VSRD       VX4(1732)      /* v2.07 */
>  #define VSRAB      VX4(772)
>  #define VSRAH      VX4(836)
>  #define VSRAW      VX4(900)
> +#define VSRAD      VX4(964)       /* v2.07 */
>  #define VRLB       VX4(4)
>  #define VRLH       VX4(68)
>  #define VRLW       VX4(132)
> +#define VRLD       VX4(196)       /* v2.07 */
>
>  #define VMULEUB    VX4(520)
>  #define VMULEUH    VX4(584)
> +#define VMULEUW    VX4(648)       /* v2.07 */
>  #define VMULOUB    VX4(8)
>  #define VMULOUH    VX4(72)
> +#define VMULOUW    VX4(136)       /* v2.07 */
> +#define VMULUWM    VX4(137)       /* v2.07 */
>  #define VMSUMUHM   VX4(38)
>
>  #define VMRGHB     VX4(12)
> @@ -555,6 +574,9 @@ static int tcg_target_const_match(tcg_target_long
val, TCGType type,
>  #define VNOR       VX4(1284)
>  #define VOR        VX4(1156)
>  #define VXOR       VX4(1220)
> +#define VEQV       VX4(1668)      /* v2.07 */
> +#define VNAND      VX4(1412)      /* v2.07 */
> +#define VORC       VX4(1348)      /* v2.07 */
>
>  #define VSPLTB     VX4(524)
>  #define VSPLTH     VX4(588)
> @@ -568,6 +590,11 @@ static int tcg_target_const_match(tcg_target_long
val, TCGType type,
>  #define XXPERMDI   (OPCD(60) | (10 << 3))   /* v2.06 */
>  #define XXSEL      (OPCD(60) | (3 << 4))    /* v2.06 */
>
> +#define MFVSRD     XO31(51)       /* v2.07 */
> +#define MFVSRWZ    XO31(115)      /* v2.07 */
> +#define MTVSRD     XO31(179)      /* v2.07 */
> +#define MTVSRWZ    XO31(179)      /* v2.07 */
> +
>  #define RT(r) ((r)<<21)
>  #define RS(r) ((r)<<21)
>  #define RA(r) ((r)<<16)
> @@ -697,12 +724,27 @@ static bool tcg_out_mov(TCGContext *s, TCGType
type, TCGReg ret, TCGReg arg)
>          tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
>          /* fallthru */
>      case TCG_TYPE_I32:
> -        if (ret < TCG_REG_V0 && arg < TCG_REG_V0) {
> -            tcg_out32(s, OR | SAB(arg, ret, arg));
> -            break;
> -        } else if (ret < TCG_REG_V0 || arg < TCG_REG_V0) {
> -            /* Altivec does not support vector/integer moves.  */
> -            return false;
> +        if (ret < TCG_REG_V0) {
> +            if (arg < TCG_REG_V0) {
> +                tcg_out32(s, OR | SAB(arg, ret, arg));
> +                break;
> +            } else if (have_isa_2_07_vsx) {
> +                tcg_out32(s, (type == TCG_TYPE_I32 ? MFVSRWZ : MFVSRD)
> +                          | VRT(arg) | RA(ret) | 1);
> +                break;
> +            } else {
> +                /* Altivec does not support vector->integer moves.  */
> +                return false;
> +            }
> +        } else if (arg < TCG_REG_V0) {
> +            if (have_isa_2_07_vsx) {
> +                tcg_out32(s, (type == TCG_TYPE_I32 ? MTVSRWZ : MTVSRD)
> +                          | VRT(ret) | RA(arg) | 1);
> +                break;
> +            } else {
> +                /* Altivec does not support integer->vector moves.  */
> +                return false;
> +            }
>          }
>          /* fallthru */
>      case TCG_TYPE_V64:
> @@ -1140,6 +1182,10 @@ static void tcg_out_ld(TCGContext *s, TCGType
type, TCGReg ret,
>              tcg_out_mem_long(s, LWZ, LWZX, ret, base, offset);
>              break;
>          }
> +        if (have_isa_2_07_vsx) {
> +            tcg_out_mem_long(s, 0, LXSIWZX | 1, ret, base, offset);
> +            break;
> +        }
>          assert((offset & 3) == 0);
>          tcg_out_mem_long(s, 0, LVEWX, ret, base, offset);
>          shift = (offset - 4) & 0xc;
> @@ -1187,6 +1233,10 @@ static void tcg_out_st(TCGContext *s, TCGType
type, TCGReg arg,
>              tcg_out_mem_long(s, STW, STWX, arg, base, offset);
>              break;
>          }
> +        if (have_isa_2_07_vsx) {
> +            tcg_out_mem_long(s, 0, STXSIWX | 1, arg, base, offset);
> +            break;
> +        }
>          assert((offset & 3) == 0);
>          shift = (offset - 4) & 0xc;
>          if (shift) {
> @@ -2907,26 +2957,37 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType
type, unsigned vece)
>      case INDEX_op_andc_vec:
>      case INDEX_op_not_vec:
>          return 1;
> +    case INDEX_op_orc_vec:
> +        return have_isa_2_07_vsx;
>      case INDEX_op_add_vec:
>      case INDEX_op_sub_vec:
>      case INDEX_op_smax_vec:
>      case INDEX_op_smin_vec:
>      case INDEX_op_umax_vec:
>      case INDEX_op_umin_vec:
> +    case INDEX_op_shlv_vec:
> +    case INDEX_op_shrv_vec:
> +    case INDEX_op_sarv_vec:
> +        return vece <= MO_32 || have_isa_2_07_vsx;
>      case INDEX_op_ssadd_vec:
>      case INDEX_op_sssub_vec:
>      case INDEX_op_usadd_vec:
>      case INDEX_op_ussub_vec:
> -    case INDEX_op_shlv_vec:
> -    case INDEX_op_shrv_vec:
> -    case INDEX_op_sarv_vec:
>          return vece <= MO_32;
>      case INDEX_op_cmp_vec:
> -    case INDEX_op_mul_vec:
>      case INDEX_op_shli_vec:
>      case INDEX_op_shri_vec:
>      case INDEX_op_sari_vec:
> -        return vece <= MO_32 ? -1 : 0;
> +        return vece <= MO_32 || have_isa_2_07_vsx ? -1 : 0;
> +    case INDEX_op_mul_vec:
> +        switch (vece) {
> +        case MO_8:
> +        case MO_16:
> +            return -1;
> +        case MO_32:
> +            return have_isa_2_07_vsx ? 1 : -1;
> +        }
> +        return 0;
>      case INDEX_op_bitsel_vec:
>          return have_isa_2_06_vsx;
>      default:
> @@ -3030,28 +3091,28 @@ static void tcg_out_vec_op(TCGContext *s,
TCGOpcode opc,
>                             const TCGArg *args, const int *const_args)
>  {
>      static const uint32_t
> -        add_op[4] = { VADDUBM, VADDUHM, VADDUWM, 0 },
> -        sub_op[4] = { VSUBUBM, VSUBUHM, VSUBUWM, 0 },
> -        eq_op[4]  = { VCMPEQUB, VCMPEQUH, VCMPEQUW, 0 },
> -        gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, 0 },
> -        gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, 0 },
> +        add_op[4] = { VADDUBM, VADDUHM, VADDUWM, VADDUDM },
> +        sub_op[4] = { VSUBUBM, VSUBUHM, VSUBUWM, VSUBUDM },
> +        eq_op[4]  = { VCMPEQUB, VCMPEQUH, VCMPEQUW, VCMPEQUD },
> +        gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, VCMPGTSD },
> +        gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, VCMPGTUD },
>          ssadd_op[4] = { VADDSBS, VADDSHS, VADDSWS, 0 },
>          usadd_op[4] = { VADDUBS, VADDUHS, VADDUWS, 0 },
>          sssub_op[4] = { VSUBSBS, VSUBSHS, VSUBSWS, 0 },
>          ussub_op[4] = { VSUBUBS, VSUBUHS, VSUBUWS, 0 },
> -        umin_op[4] = { VMINUB, VMINUH, VMINUW, 0 },
> -        smin_op[4] = { VMINSB, VMINSH, VMINSW, 0 },
> -        umax_op[4] = { VMAXUB, VMAXUH, VMAXUW, 0 },
> -        smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, 0 },
> -        shlv_op[4] = { VSLB, VSLH, VSLW, 0 },
> -        shrv_op[4] = { VSRB, VSRH, VSRW, 0 },
> -        sarv_op[4] = { VSRAB, VSRAH, VSRAW, 0 },
> +        umin_op[4] = { VMINUB, VMINUH, VMINUW, VMINUD },
> +        smin_op[4] = { VMINSB, VMINSH, VMINSW, VMINSD },
> +        umax_op[4] = { VMAXUB, VMAXUH, VMAXUW, VMAXUD },
> +        smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, VMAXSD },
> +        shlv_op[4] = { VSLB, VSLH, VSLW, VSLD },
> +        shrv_op[4] = { VSRB, VSRH, VSRW, VSRD },
> +        sarv_op[4] = { VSRAB, VSRAH, VSRAW, VSRAD },
>          mrgh_op[4] = { VMRGHB, VMRGHH, VMRGHW, 0 },
>          mrgl_op[4] = { VMRGLB, VMRGLH, VMRGLW, 0 },
> -        muleu_op[4] = { VMULEUB, VMULEUH, 0, 0 },
> -        mulou_op[4] = { VMULOUB, VMULOUH, 0, 0 },
> +        muleu_op[4] = { VMULEUB, VMULEUH, VMULEUW, 0 },
> +        mulou_op[4] = { VMULOUB, VMULOUH, VMULOUW, 0 },
>          pkum_op[4] = { VPKUHUM, VPKUWUM, 0, 0 },
> -        rotl_op[4] = { VRLB, VRLH, VRLW, 0 };
> +        rotl_op[4] = { VRLB, VRLH, VRLW, VRLD };
>
>      TCGType type = vecl + TCG_TYPE_V64;
>      TCGArg a0 = args[0], a1 = args[1], a2 = args[2];
> @@ -3074,6 +3135,10 @@ static void tcg_out_vec_op(TCGContext *s,
TCGOpcode opc,
>      case INDEX_op_sub_vec:
>          insn = sub_op[vece];
>          break;
> +    case INDEX_op_mul_vec:
> +        tcg_debug_assert(vece == MO_32 && have_isa_2_07_vsx);
> +        insn = VMULUWM;
> +        break;
>      case INDEX_op_ssadd_vec:
>          insn = ssadd_op[vece];
>          break;
> @@ -3123,6 +3188,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode
opc,
>          insn = VNOR;
>          a2 = a1;
>          break;
> +    case INDEX_op_orc_vec:
> +        insn = VORC;
> +        break;
>
>      case INDEX_op_cmp_vec:
>          switch (args[3]) {
> @@ -3203,7 +3271,7 @@ static void expand_vec_cmp(TCGType type, unsigned
vece, TCGv_vec v0,
>  {
>      bool need_swap = false, need_inv = false;
>
> -    tcg_debug_assert(vece <= MO_32);
> +    tcg_debug_assert(vece <= MO_32 || have_isa_2_07_vsx);
>
>      switch (cond) {
>      case TCG_COND_EQ:
> @@ -3267,6 +3335,7 @@ static void expand_vec_mul(TCGType type, unsigned
vece, TCGv_vec v0,
>         break;
>
>      case MO_32:
> +        tcg_debug_assert(!have_isa_2_07_vsx);
>          t3 = tcg_temp_new_vec(type);
>          t4 = tcg_temp_new_vec(type);
>          tcg_gen_dupi_vec(MO_8, t4, -16);
> @@ -3562,6 +3631,11 @@ static void tcg_target_init(TCGContext *s)
>              have_isa_2_06_vsx = true;
>          }
>      }
> +    if (hwcap2 & PPC_FEATURE2_ARCH_2_07) {
> +        if (hwcap & PPC_FEATURE_HAS_VSX) {
> +            have_isa_2_07_vsx = true;
> +        }
> +    }
>  #ifdef PPC_FEATURE2_ARCH_3_00
>      if (hwcap2 & PPC_FEATURE2_ARCH_3_00) {
>          have_isa_3_00 = true;
> --
> 2.17.1
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 15/16] tcg/ppc: Update vector support to v2.07
  2019-06-30 13:37   ` Aleksandar Markovic
@ 2019-06-30 15:12     ` Richard Henderson
  2019-07-01  3:57       ` Aleksandar Markovic
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2019-06-30 15:12 UTC (permalink / raw)
  To: Aleksandar Markovic; +Cc: mark.cave-ayland, qemu-devel, amarkovic, hsp.cat7

On 6/30/19 3:37 PM, Aleksandar Markovic wrote:
>>  bool have_isa_2_06;
>>  bool have_isa_2_06_vsx;
>> +bool have_isa_2_07_vsx;
> 
> Does this flag indicate support for PowerISA 2.07 or VSX?

VSX & 2.07,

>> +    if (hwcap2 & PPC_FEATURE2_ARCH_2_07) {
>> +        if (hwcap & PPC_FEATURE_HAS_VSX) {
>> +            have_isa_2_07_vsx = true;
>> +        }
>> +    }

Like so.

While it would have been possible to have one single have_isa_vsx, we would
then also have to check a second flag to see which revision.  Therefore I
created these composite flags so that we only have to check one.
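
A minimal standalone sketch of that trade-off, with hypothetical helper and
flag names (the split flags do not exist in the patch):

    #include <stdbool.h>

    /* Composite flag, computed once from hwcap/hwcap2 at init,
     * as in the detection code quoted above. */
    static bool have_isa_2_07_vsx;

    /* Hypothetical split flags, shown only for comparison. */
    static bool have_isa_2_07;
    static bool have_vsx;

    static bool can_emit_with_composite(void)
    {
        return have_isa_2_07_vsx;          /* one test per use site */
    }

    static bool can_emit_with_split(void)
    {
        return have_isa_2_07 && have_vsx;  /* two tests per use site */
    }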


r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes
  2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
                   ` (16 preceding siblings ...)
  2019-06-29 13:37 ` [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes no-reply
@ 2019-06-30 17:58 ` Mark Cave-Ayland
  2019-07-01 10:30   ` Richard Henderson
  17 siblings, 1 reply; 40+ messages in thread
From: Mark Cave-Ayland @ 2019-06-30 17:58 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: amarkovic, hsp.cat7

On 29/06/2019 14:00, Richard Henderson wrote:

> Changes since v5:
>   * Disable runtime altivec detection until all of the required
>     opcodes are implemented.
>     Because dup2 was last, that really means all of the pure altivec
>     bits, so the initial patches are not bisectable in any meaningful
>     sense.  I thought about reshuffling dup2 earlier, but that created
>     too many conflicts and I was too lazy.
>   * Rearranged the patches a little bit to make sure that each
>     one actually builds, which was not the case before.
>   * Folded in the fix to tcg_out_mem_long, as discussed in the
>     followup within the v4 thread.
> 
> Changes since v4:
>   * Patch 1, "tcg/ppc: Introduce Altivec registers", is divided into
>     ten smaller patches.
>   * The net result (code-wise) is not changed between former patch 1
>     and ten new patches.
>   * Remaining (2-7) patches from v4 are applied verbatim.
>   * This means that code-wise v5 and v4 do not differ.
>   * v5 is devised to help debugging, and to better organize the code.
> 
> Changes since v3:
>   * Add support for bitsel, with the vsx xxsel insn.
>   * Rely on the new relocation overflow handling, so
>     we don't require 3 insns for a vector load.
> 
> Changes since v2:
>   * Several generic tcg patches to improve dup vs dupi vs dupm.
>     In particular, if a global temp (like guest r10) is not in
>     a host register, we should duplicate from memory instead of
>     loading to an integer register, spilling to stack, loading
>     to a vector register, and then duplicating.
>   * I have more confidence that 32-bit ppc host should work
>     this time around.  No testing on that front yet, but I've
>     unified some code sequences with 64-bit ppc host.
>   * Base altivec now supports V128 only.  Moved V64 support to
>     Power7 (v2.06), which has 64-bit load/store.
>   * Dropped support for 64-bit vector multiply using Power8.
>     The expansion was too large compared to using integer regs.
> 
> Richard Henderson (16):
>   tcg/ppc: Introduce Altivec registers
>   tcg/ppc: Introduce macro VX4()
>   tcg/ppc: Introduce macros VRT(), VRA(), VRB(), VRC()
>   tcg/ppc: Enable tcg backend vector compilation
>   tcg/ppc: Add support for load/store/logic/comparison
>   tcg/ppc: Add support for vector maximum/minimum
>   tcg/ppc: Add support for vector add/subtract
>   tcg/ppc: Add support for vector saturated add/subtract
>   tcg/ppc: Prepare case for vector multiply
>   tcg/ppc: Support vector shift by immediate
>   tcg/ppc: Support vector multiply
>   tcg/ppc: Support vector dup2
>   tcg/ppc: Enable Altivec detection
>   tcg/ppc: Update vector support to v2.06
>   tcg/ppc: Update vector support to v2.07
>   tcg/ppc: Update vector support to v3.00
> 
>  tcg/ppc/tcg-target.h     |   39 +-
>  tcg/ppc/tcg-target.opc.h |   13 +
>  tcg/ppc/tcg-target.inc.c | 1091 +++++++++++++++++++++++++++++++++++---
>  3 files changed, 1076 insertions(+), 67 deletions(-)
>  create mode 100644 tcg/ppc/tcg-target.opc.h

I don't have space for a full set of images on the G4, however I've tried boot tests
on installer CDs for MacOS 9, OS X 10.2, Linux and HelenOS and it looks good here.

Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> [PPC32]


ATB,

Mark.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 15/16] tcg/ppc: Update vector support to v2.07
  2019-06-30 15:12     ` Richard Henderson
@ 2019-07-01  3:57       ` Aleksandar Markovic
  2019-07-01 10:29         ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Aleksandar Markovic @ 2019-07-01  3:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: mark.cave-ayland, qemu-devel, amarkovic, hsp.cat7

On Jun 30, 2019 5:12 PM, "Richard Henderson" <richard.henderson@linaro.org>
wrote:
>
> On 6/30/19 3:37 PM, Aleksandar Markovic wrote:
> >>  bool have_isa_2_06;
> >>  bool have_isa_2_06_vsx;
> >> +bool have_isa_2_07_vsx;
> >
> > Does this flag indicate support for PowerISA 2.07 or VSX?
>
> VSX & 2.07,
>
> >> +    if (hwcap2 & PPC_FEATURE2_ARCH_2_07) {
> >> +        if (hwcap & PPC_FEATURE_HAS_VSX) {
> >> +            have_isa_2_07_vsx = true;
> >> +        }
> >> +    }
>
> Like so.
>
> While it would have been possible to have one single have_isa_vsx, we
would
> then also have to check a second flag to see which revision.  Therefore I
> created these composite flags so that we only have to check one.
>

Yes, but in this patch, for example, among other things, support for
doubleword integer max/min vector operations is implemented. Why is the
existence of that support dependent on VSX (PPC_FEATURE_HAS_VSX)?

>
> r~

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 15/16] tcg/ppc: Update vector support to v2.07
  2019-07-01  3:57       ` Aleksandar Markovic
@ 2019-07-01 10:29         ` Richard Henderson
  2019-07-01 11:41           ` Aleksandar Markovic
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2019-07-01 10:29 UTC (permalink / raw)
  To: Aleksandar Markovic; +Cc: mark.cave-ayland, qemu-devel, amarkovic, hsp.cat7

On 7/1/19 5:57 AM, Aleksandar Markovic wrote:
> 
> On Jun 30, 2019 5:12 PM, "Richard Henderson" <richard.henderson@linaro.org
> <mailto:richard.henderson@linaro.org>> wrote:
>>
>> On 6/30/19 3:37 PM, Aleksandar Markovic wrote:
>> >>  bool have_isa_2_06;
>> >>  bool have_isa_2_06_vsx;
>> >> +bool have_isa_2_07_vsx;
>> >
>> > Does this flag indicate support for PowerISA 2.07 or VSX?
>>
>> VSX & 2.07,
>>
>> >> +    if (hwcap2 & PPC_FEATURE2_ARCH_2_07) {
>> >> +        if (hwcap & PPC_FEATURE_HAS_VSX) {
>> >> +            have_isa_2_07_vsx = true;
>> >> +        }
>> >> +    }
>>
>> Like so.
>>
>> While it would have been possible to have one single have_isa_vsx, we would
>> then also have to check a second flag to see which revision.  Therefore I
>> created these composite flags so that we only have to check one.
>>
> 
> Yes, but, in this patch, for example, among other things, the support for
> doubleword integer max/min vector operation is implemented. Why is the
> existence of that support dependant on VSX (PPC_FEATURE_HAS_VSX)?

Because otherwise the instruction doesn't exist?


r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes
  2019-06-30 17:58 ` Mark Cave-Ayland
@ 2019-07-01 10:30   ` Richard Henderson
  2019-07-01 18:34     ` Howard Spoelstra
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2019-07-01 10:30 UTC (permalink / raw)
  To: Mark Cave-Ayland, qemu-devel; +Cc: amarkovic, hsp.cat7

On 6/30/19 7:58 PM, Mark Cave-Ayland wrote:
> I don't have space for a full set of images on the G4, however I've tried boot tests
> on installer CDs for MacOS 9, OS X 10.2, Linux and HelenOS and it looks good here.
> 
> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> [PPC32]

Thanks!


r!



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 15/16] tcg/ppc: Update vector support to v2.07
  2019-07-01 10:29         ` Richard Henderson
@ 2019-07-01 11:41           ` Aleksandar Markovic
  2019-07-02 14:25             ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Aleksandar Markovic @ 2019-07-01 11:41 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Mark Cave-Ayland, QEMU Developers, Aleksandar Markovic, Howard Spoelstra

On Mon, Jul 1, 2019 at 12:29 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 7/1/19 5:57 AM, Aleksandar Markovic wrote:
> >
> > On Jun 30, 2019 5:12 PM, "Richard Henderson" <richard.henderson@linaro.org
> > <mailto:richard.henderson@linaro.org>> wrote:
> >>
> >> On 6/30/19 3:37 PM, Aleksandar Markovic wrote:
> >> >>  bool have_isa_2_06;
> >> >>  bool have_isa_2_06_vsx;
> >> >> +bool have_isa_2_07_vsx;
> >> >
> >> > Does this flag indicate support for PowerISA 2.07 or VSX?
> >>
> >> VSX & 2.07,
> >>
> >> >> +    if (hwcap2 & PPC_FEATURE2_ARCH_2_07) {
> >> >> +        if (hwcap & PPC_FEATURE_HAS_VSX) {
> >> >> +            have_isa_2_07_vsx = true;
> >> >> +        }
> >> >> +    }
> >>
> >> Like so.
> >>
> >> While it would have been possible to have one single have_isa_vsx, we would
> >> then also have to check a second flag to see which revision.  Therefore I
> >> created these composite flags so that we only have to check one.
> >>
> >
> > Yes, but, in this patch, for example, among other things, the support for
> > doubleword integer max/min vector operation is implemented. Why is the
> > existence of that support dependant on VSX (PPC_FEATURE_HAS_VSX)?
>
> Because otherwise the instruction doesn't exist?
>

If we go back to my example, it appears to me that the doubleword
integer max/min Altivec instructions do not depend on VSX in any
way, or at least I did not find anything in the Altivec docs that
mentions it (I could be wrong). The same concern applies to the
majority of Altivec instructions used in this patch. What is your
reason for considering all of them to need VSX?

Regards,
Aleksandar

>
> r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes
  2019-07-01 10:30   ` Richard Henderson
@ 2019-07-01 18:34     ` Howard Spoelstra
  2019-09-03 17:02       ` Mark Cave-Ayland
  0 siblings, 1 reply; 40+ messages in thread
From: Howard Spoelstra @ 2019-07-01 18:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Mark Cave-Ayland, qemu-devel qemu-devel, amarkovic

On Mon, Jul 1, 2019 at 12:30 PM Richard Henderson <
richard.henderson@linaro.org> wrote:

> On 6/30/19 7:58 PM, Mark Cave-Ayland wrote:
> > I don't have space for a full set of images on the G4, however I've
> tried boot tests
> > on installer CDs for MacOS 9, OS X 10.2, Linux and HelenOS and it looks
> good here.
> >
> > Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> [PPC32]
>
> Thanks!
>
> Hi

I just compiled the v6 set applied to current master on my G5, Ubuntu 16.
Command line:
./qemu-system-ppc -L pc-bios -boot c -m 512 -M mac99,via=pmu \
-netdev user,id=net1 -device sungem,netdev=net1 \
-drive file=10.3.img,format=raw,media=disk \

With no specific cpu set, the Mac OS 9.2 hard disk image and 9.2 iso do not
get to the desktop; they just hang while still in the OpenBIOS window. They
need -cpu G4 on the command line to get to the desktop.

OSX 10.3 installed image boots to desktop.
OSX 10.3 iso boots to installer
OSX 10.4 installed image boots to desktop.
OSX 10.4 iso boots to installer
OSX 10.5 installed image boots to desktop.
OSX 10.5 iso boots to installer

So there seems to be a difference between hosts: if run on a G4 host there
is no need to add -cpu G4 to run Mac OS 9.x, while there is when run on a
G5 host.

Best,
Howard

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 15/16] tcg/ppc: Update vector support to v2.07
  2019-07-01 11:41           ` Aleksandar Markovic
@ 2019-07-02 14:25             ` Richard Henderson
  2019-07-10 10:52               ` Aleksandar Markovic
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2019-07-02 14:25 UTC (permalink / raw)
  To: Aleksandar Markovic
  Cc: Mark Cave-Ayland, QEMU Developers, Aleksandar Markovic, Howard Spoelstra

On 7/1/19 1:41 PM, Aleksandar Markovic wrote:
> If we go back to my example, it appears to me that doubleword
> integer max/min Altivec instruction do not depend on VSX in any
> way, or, at least, I did not find anything in Altivec docs that
> mentions it (I could be wrong).

You are correct, for the case of min/max -- and indeed all of the other
arithmetic added in this patch -- we do not need VSX.

However, the load/store instructions added by this patch do require VSX.

AFAIK, there is exactly one v2.07 core design, the power8.
It has both Altivec and VSX, so it's really only a technicality
to check both v2.07 + Altivec + VSX, but I do anyway.  It does
not seem worthwhile to decompose these checks further.
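
Concretely, the split that the combined check folds together, sketched with
hypothetical flag names (have_isa_2_07, have_vsx -- the patch only has the
single have_isa_2_07_vsx):

    #include <stdbool.h>

    static bool have_isa_2_07;
    static bool have_vsx;

    /* v2.07 Altivec arithmetic (VADDUDM, VMINSD, VSLD, ...) would only
     * need the revision check. */
    static bool can_emit_v2_07_arith(void)
    {
        return have_isa_2_07;
    }

    /* The scalar-word load/store (LXSIWZX, STXSIWX) and the GPR<->VR
     * moves (MFVSRD, MTVSRD, ...) added in this patch are VSX opcodes,
     * so they would need both. */
    static bool can_emit_v2_07_vsx_ldst(void)
    {
        return have_isa_2_07 && have_vsx;
    }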


r~


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 15/16] tcg/ppc: Update vector support to v2.07
  2019-07-02 14:25             ` Richard Henderson
@ 2019-07-10 10:52               ` Aleksandar Markovic
  0 siblings, 0 replies; 40+ messages in thread
From: Aleksandar Markovic @ 2019-07-10 10:52 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Mark Cave-Ayland, QEMU Developers, Aleksandar Markovic, Howard Spoelstra

On Jul 2, 2019 4:26 PM, "Richard Henderson" <richard.henderson@linaro.org>
wrote:
>
> On 7/1/19 1:41 PM, Aleksandar Markovic wrote:
> > If we go back to my example, it appears to me that doubleword
> > integer max/min Altivec instruction do not depend on VSX in any
> > way, or, at least, I did not find anything in Altivec docs that
> > mentions it (I could be wrong).
>
> You are correct, for the case of min/max -- and indeed all of the other
> arithmetic added in this patch -- we do not need VSX.
>
> However, the load/store instructions added by this patch do require VSX.
>
> AFAIK, there is exactly one v2.07 core design, the power8.
> It has both Altivec and VSX, so it's really only a technicality
> to check both v2.07 + Altivec + VSX, but I do anyway.  It does
> not seem worthwhile to decompose these checks further.
>

What did you achieve with such an assumption? I see this:

- you have one flag less in your code, saving some 7-8 lines of source
(this includes flag initialization)

- you made the QEMU executable around 20 bytes shorter

- you shortened QEMU initialization by almost 10 ns

(feel free to add something if I missed it)

However, here is the price: you made the code dependent not only on the ISA
documentation, but also on market forces (whether some CPU configurations
are physically produced or not). Secondly, analysis of the code is
significantly more difficult.

Those are huge steps backward.

You should stick to the documentation only, and refactor the patch in that
light.

Regards,
Aleksandar

>
> r~

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes
  2019-07-01 18:34     ` Howard Spoelstra
@ 2019-09-03 17:02       ` Mark Cave-Ayland
  2019-09-03 17:37         ` Aleksandar Markovic
  0 siblings, 1 reply; 40+ messages in thread
From: Mark Cave-Ayland @ 2019-09-03 17:02 UTC (permalink / raw)
  To: Howard Spoelstra, Richard Henderson; +Cc: qemu-devel qemu-devel, amarkovic

On 01/07/2019 19:34, Howard Spoelstra wrote:

> On Mon, Jul 1, 2019 at 12:30 PM Richard Henderson <
> richard.henderson@linaro.org> wrote:
> 
>> On 6/30/19 7:58 PM, Mark Cave-Ayland wrote:
>>> I don't have space for a full set of images on the G4, however I've
>> tried boot tests
>>> on installer CDs for MacOS 9, OS X 10.2, Linux and HelenOS and it looks
>> good here.
>>>
>>> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> [PPC32]
>>
>> Thanks!
>>
>> Hi
> 
> I just compiled the v6 set applied to current master on my G5, Ubuntu 16.
> command line:
> ./qemu-system-ppc -L pc-bios -boot c m 512 -M mac99,via=pmu \
> -netdev user,id=net1 -device sungem,netdev=net1 \
> -drive file=10.3.img,format=raw,media=disk \
> 
> With no specific cpu set, Mac OS 9.2 hard disk image and 9.2 iso do not get
> to the desktop, they just hang while still in the openbios window. They
> need -cpu G4 on the command line to get to the desktop.
> 
> OSX 10.3 installed image boots to desktop.
> OSX 10.3 iso boots to installer
> OSX 10.4 installed image boots to desktop.
> OSX 10.4 iso boot to installer
> OSX 10.5 installed image boots to desktop.
> OSX 10.5 iso boots to installer
> 
> So there seems to be a difference between hosts: If ran on a G4 host there
> is no need to add -cpu G4 to run Mac OS 9.x, while there is when ran on a
> G5 host.

Are there any outstanding issues with this patchset now, or is it ready to be merged?
I'm really looking forward to seeing the improved performance when testing QEMU on my
Mac Mini :)


ATB,

Mark.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes
  2019-09-03 17:02       ` Mark Cave-Ayland
@ 2019-09-03 17:37         ` Aleksandar Markovic
  2019-09-03 18:32           ` Mark Cave-Ayland
  0 siblings, 1 reply; 40+ messages in thread
From: Aleksandar Markovic @ 2019-09-03 17:37 UTC (permalink / raw)
  To: Mark Cave-Ayland
  Cc: Richard Henderson, qemu-devel qemu-devel, Aleksandar Markovic,
	Howard Spoelstra

On Tue, Sep 3, 2019 at 7:05 PM Mark Cave-Ayland <
mark.cave-ayland@ilande.co.uk> wrote:

> On 01/07/2019 19:34, Howard Spoelstra wrote:
>
> > On Mon, Jul 1, 2019 at 12:30 PM Richard Henderson <
> > richard.henderson@linaro.org> wrote:
> >
> >> On 6/30/19 7:58 PM, Mark Cave-Ayland wrote:
> >>> I don't have space for a full set of images on the G4, however I've
> >> tried boot tests
> >>> on installer CDs for MacOS 9, OS X 10.2, Linux and HelenOS and it looks
> >> good here.
> >>>
> >>> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> [PPC32]
> >>
> >> Thanks!
> >>
> >> Hi
> >
> > I just compiled the v6 set applied to current master on my G5, Ubuntu 16.
> > command line:
> > ./qemu-system-ppc -L pc-bios -boot c m 512 -M mac99,via=pmu \
> > -netdev user,id=net1 -device sungem,netdev=net1 \
> > -drive file=10.3.img,format=raw,media=disk \
> >
> > With no specific cpu set, Mac OS 9.2 hard disk image and 9.2 iso do not
> get
> > to the desktop, they just hang while still in the openbios window. They
> > need -cpu G4 on the command line to get to the desktop.
> >
> > OSX 10.3 installed image boots to desktop.
> > OSX 10.3 iso boots to installer
> > OSX 10.4 installed image boots to desktop.
> > OSX 10.4 iso boot to installer
> > OSX 10.5 installed image boots to desktop.
> > OSX 10.5 iso boots to installer
> >
> > So there seems to be a difference between hosts: If ran on a G4 host
> there
> > is no need to add -cpu G4 to run Mac OS 9.x, while there is when ran on a
> > G5 host.
>
> Are there any outstanding issues with this patchset now, or is it ready to
> be merged?
> I'm really looking forward to seeing the improved performance when testing
> QEMU on my
> Mac Mini :)
>
>
Howard pointed out some illogical quirks of the command line:

> If ran on a G4 host there is no need to add -cpu G4 to run Mac OS 9.x,
> while there is when ran on a G5 host.

I am not sure whether Howard is saying that this is a consequence of this
series, though.

Overall, I think this is a very good series - however, I had a number of
minor objections to multiple patches that don't affect (or affect only in a
minimal way) the provided functionality. Those objections have not been
addressed, nor properly discussed, but I do think they should be addressed
in order to get the series into better shape before upstreaming.

Thanks,
Aleksandar


> ATB,
>
> Mark.
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes
  2019-09-03 17:37         ` Aleksandar Markovic
@ 2019-09-03 18:32           ` Mark Cave-Ayland
  2019-09-05 11:43             ` Aleksandar Markovic
  0 siblings, 1 reply; 40+ messages in thread
From: Mark Cave-Ayland @ 2019-09-03 18:32 UTC (permalink / raw)
  To: Aleksandar Markovic
  Cc: Richard Henderson, qemu-devel qemu-devel, Aleksandar Markovic,
	Howard Spoelstra

On 03/09/2019 18:37, Aleksandar Markovic wrote:

> On Tue, Sep 3, 2019 at 7:05 PM Mark Cave-Ayland <
> mark.cave-ayland@ilande.co.uk> wrote:
> 
>> On 01/07/2019 19:34, Howard Spoelstra wrote:
>>
>>> On Mon, Jul 1, 2019 at 12:30 PM Richard Henderson <
>>> richard.henderson@linaro.org> wrote:
>>>
>>>> On 6/30/19 7:58 PM, Mark Cave-Ayland wrote:
>>>>> I don't have space for a full set of images on the G4, however I've
>>>> tried boot tests
>>>>> on installer CDs for MacOS 9, OS X 10.2, Linux and HelenOS and it looks
>>>> good here.
>>>>>
>>>>> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> [PPC32]
>>>>
>>>> Thanks!
>>>>
>>>> Hi
>>>
>>> I just compiled the v6 set applied to current master on my G5, Ubuntu 16.
>>> command line:
>>> ./qemu-system-ppc -L pc-bios -boot c m 512 -M mac99,via=pmu \
>>> -netdev user,id=net1 -device sungem,netdev=net1 \
>>> -drive file=10.3.img,format=raw,media=disk \
>>>
>>> With no specific cpu set, Mac OS 9.2 hard disk image and 9.2 iso do not
>> get
>>> to the desktop, they just hang while still in the openbios window. They
>>> need -cpu G4 on the command line to get to the desktop.
>>>
>>> OSX 10.3 installed image boots to desktop.
>>> OSX 10.3 iso boots to installer
>>> OSX 10.4 installed image boots to desktop.
>>> OSX 10.4 iso boot to installer
>>> OSX 10.5 installed image boots to desktop.
>>> OSX 10.5 iso boots to installer
>>>
>>> So there seems to be a difference between hosts: If ran on a G4 host
>> there
>>> is no need to add -cpu G4 to run Mac OS 9.x, while there is when ran on a
>>> G5 host.
>>
>> Are there any outstanding issues with this patchset now, or is it ready to
>> be merged?
>> I'm really looking forward to seeing the improved performance when testing
>> QEMU on my
>> Mac Mini :)
>>
>>
> Howard pointed to some illogical quirks of command line:
> 
>> If ran on a G4 host there is no need to add -cpu G4 to run Mac OS 9.x,
>> while there is when ran on a G5 host.
> 
> I am not sure if Howard says that this is a consequence of this series
> though.

No, that has been an existing issue for a long time :)

> Overall, I think this is a very good series - however, I had a number of
> minor
> objections to multiple patches, that don't affect (or affect in a minimal
> way)
> provided functionality - those objections are not addressed, nor properly
> discussed - but I do think they should be addressed in order to get the
> series
> in a better shape before upstreaming.

I've had a quick look at some of your review comments, and certainly I can see how
the earlier revisions have benefited from your feedback. There has been a lot of
positive discussion, and Richard has taken the time to respond and update the
patchset over several weeks to its latest revision.

AFAICT the only remaining issue is the one related to the ISA flags, but to me this
isn't something that should prevent the patchset being merged. I can certainly see
how the current flags implementation may not be considered technically correct, but
from your comments it doesn't look like something that would be particularly
difficult to change at a later date either.

The things that are important to me are i) is the patchset functionally correct and
ii) is it understandable and maintainable. I would say that the first point is
clearly true (both Howard and I have spent a lot of time testing it), and given
that I had to delve into these patches to fix the R2 register issue on 32-bit PPC,
I can confirm that the contents of the patches were a reasonably accurate
representation of the changes described within. And that's from someone like me who
is mostly still a TCG beginner :)

From a slightly more selfish position as the PPC Mac machine maintainer, these
patches make a significant difference to me in that they reduce the MacOS boot times
during everyday testing. For someone like myself who works on QEMU as a hobby
outside of family life and a full-time job, those few minutes are really important
and add up quickly.

I would really like these patches to be merged soon, since the worst thing that can
happen is that the patchset ends up bit-rotting, and then all the time and effort
that Richard, Howard, David, myself and indeed you have put into writing, testing
and reviewing the patches will end up going to waste.


ATB,

Mark.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes
  2019-09-03 18:32           ` Mark Cave-Ayland
@ 2019-09-05 11:43             ` Aleksandar Markovic
  2019-09-27 12:13               ` Aleksandar Markovic
  0 siblings, 1 reply; 40+ messages in thread
From: Aleksandar Markovic @ 2019-09-05 11:43 UTC (permalink / raw)
  To: Mark Cave-Ayland
  Cc: Richard Henderson, qemu-devel qemu-devel, Aleksandar Markovic,
	Howard Spoelstra

ping for Richard

03.09.2019. 20.34, "Mark Cave-Ayland" <mark.cave-ayland@ilande.co.uk> је
написао/ла:
>
> On 03/09/2019 18:37, Aleksandar Markovic wrote:
>
> > On Tue, Sep 3, 2019 at 7:05 PM Mark Cave-Ayland <
> > mark.cave-ayland@ilande.co.uk> wrote:
> >
> >> On 01/07/2019 19:34, Howard Spoelstra wrote:
> >>
> >>> On Mon, Jul 1, 2019 at 12:30 PM Richard Henderson <
> >>> richard.henderson@linaro.org> wrote:
> >>>
> >>>> On 6/30/19 7:58 PM, Mark Cave-Ayland wrote:
> >>>>> I don't have space for a full set of images on the G4, however I've
> >>>> tried boot tests
> >>>>> on installer CDs for MacOS 9, OS X 10.2, Linux and HelenOS and it
looks
> >>>> good here.
> >>>>>
> >>>>> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> [PPC32]
> >>>>
> >>>> Thanks!
> >>>>
> >>>> Hi
> >>>
> >>> I just compiled the v6 set applied to current master on my G5, Ubuntu
16.
> >>> command line:
> >>> ./qemu-system-ppc -L pc-bios -boot c m 512 -M mac99,via=pmu \
> >>> -netdev user,id=net1 -device sungem,netdev=net1 \
> >>> -drive file=10.3.img,format=raw,media=disk \
> >>>
> >>> With no specific cpu set, Mac OS 9.2 hard disk image and 9.2 iso do
not
> >> get
> >>> to the desktop, they just hang while still in the openbios window.
They
> >>> need -cpu G4 on the command line to get to the desktop.
> >>>
> >>> OSX 10.3 installed image boots to desktop.
> >>> OSX 10.3 iso boots to installer
> >>> OSX 10.4 installed image boots to desktop.
> >>> OSX 10.4 iso boot to installer
> >>> OSX 10.5 installed image boots to desktop.
> >>> OSX 10.5 iso boots to installer
> >>>
> >>> So there seems to be a difference between hosts: If ran on a G4 host
> >> there
> >>> is no need to add -cpu G4 to run Mac OS 9.x, while there is when ran
on a
> >>> G5 host.
> >>
> >> Are there any outstanding issues with this patchset now, or is it
ready to
> >> be merged?
> >> I'm really looking forward to seeing the improved performance when
testing
> >> QEMU on my
> >> Mac Mini :)
> >>
> >>
> > Howard pointed to some illogical quirks of command line:
> >
> >> If ran on a G4 host there is no need to add -cpu G4 to run Mac OS 9.x,
> >> while there is when ran on a G5 host.
> >
> > I am not sure if Howard says that this is a consequence of this series
> > though.
>
> No, that has been an existing issue for a long time :)
>
> > Overall, I think this is a very good series - however, I had a number of
> > minor
> > objections to multiple patches, that don't affect (or affect in a
minimal
> > way)
> > provided functionality - those objections are not addressed, nor
properly
> > discussed - but I do think they should be addressed in order to get the
> > series
> > in a better shape before upstreaming.
>
> I've had a quick look at some of your review comments, and certainly I
can see how
> the earlier revisions have benefited from your feedback. There has been a
lot of
> positive discussion, and Richard has taken the time to respond and update
the
> patchset over several weeks to its latest revision.
>
> AFAICT the only remaining issue is that related to the ISA flags, but to
me this
> isn't something that should prevent the patchset being merged. I can
certainly see
> how the current flags implementation may not be considered technically
correct, but
> then from your comments I don't see that it would be something that would
be
> particularly difficult to change at a later date either.
>
> The things that are important to me are i) is the patchset functionally
correct and
> ii) is it understandable and maintainable. I would say that the first
point is
> clearly true (both myself and Howard have spent a lot of time testing
it), and given
> that I had to delve into these patches to fix the R2 register issue on
32-bit PPC
> then I can confirm that the contents of the patches were a reasonably
accurate
> representation of the changes described within. And that's from someone
like me who
> is mostly still a TCG beginner :)
>
> From a slightly more selfish position as the PPC Mac machine maintainer,
these
> patches make a significant difference to me in that they reduce the MacOS
boot times
> during everyday testing. Now for someone like myself who works on QEMU as
a hobby
> outside of family life and a full time job, those few minutes are really
important to
> me and soon add up really quickly during testing.
>
> I would really like these patches to be merged soon, since the worst
thing that can
> happen is that the patchset ends up bit-rotting and then all the time and
effort put
> into writing, testing and reviewing the patches by Richard, Howard,
David, myself and
> indeed your review time will all end up going to waste.
>
>
> ATB,
>
> Mark.
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes
  2019-09-05 11:43             ` Aleksandar Markovic
@ 2019-09-27 12:13               ` Aleksandar Markovic
  0 siblings, 0 replies; 40+ messages in thread
From: Aleksandar Markovic @ 2019-09-27 12:13 UTC (permalink / raw)
  To: Mark Cave-Ayland
  Cc: Richard Henderson, qemu-devel qemu-devel, Aleksandar Markovic,
	Howard Spoelstra

ping

05.09.2019. 13.43, "Aleksandar Markovic" <aleksandar.m.mail@gmail.com> је
написао/ла:
>
>
> ping for Richard
>
> 03.09.2019. 20.34, "Mark Cave-Ayland" <mark.cave-ayland@ilande.co.uk> је
написао/ла:
> >
> > On 03/09/2019 18:37, Aleksandar Markovic wrote:
> >
> > > On Tue, Sep 3, 2019 at 7:05 PM Mark Cave-Ayland <
> > > mark.cave-ayland@ilande.co.uk> wrote:
> > >
> > >> On 01/07/2019 19:34, Howard Spoelstra wrote:
> > >>
> > >>> On Mon, Jul 1, 2019 at 12:30 PM Richard Henderson <
> > >>> richard.henderson@linaro.org> wrote:
> > >>>
> > >>>> On 6/30/19 7:58 PM, Mark Cave-Ayland wrote:
> > >>>>> I don't have space for a full set of images on the G4, however
I've
> > >>>> tried boot tests
> > >>>>> on installer CDs for MacOS 9, OS X 10.2, Linux and HelenOS and it
looks
> > >>>> good here.
> > >>>>>
> > >>>>> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
[PPC32]
> > >>>>
> > >>>> Thanks!
> > >>>>
> > >>>> Hi
> > >>>
> > >>> I just compiled the v6 set applied to current master on my G5,
Ubuntu 16.
> > >>> command line:
> > >>> ./qemu-system-ppc -L pc-bios -boot c m 512 -M mac99,via=pmu \
> > >>> -netdev user,id=net1 -device sungem,netdev=net1 \
> > >>> -drive file=10.3.img,format=raw,media=disk \
> > >>>
> > >>> With no specific cpu set, Mac OS 9.2 hard disk image and 9.2 iso do
not
> > >> get
> > >>> to the desktop, they just hang while still in the openbios window.
They
> > >>> need -cpu G4 on the command line to get to the desktop.
> > >>>
> > >>> OSX 10.3 installed image boots to desktop.
> > >>> OSX 10.3 iso boots to installer
> > >>> OSX 10.4 installed image boots to desktop.
> > >>> OSX 10.4 iso boot to installer
> > >>> OSX 10.5 installed image boots to desktop.
> > >>> OSX 10.5 iso boots to installer
> > >>>
> > >>> So there seems to be a difference between hosts: If ran on a G4 host
> > >> there
> > >>> is no need to add -cpu G4 to run Mac OS 9.x, while there is when
ran on a
> > >>> G5 host.
> > >>
> > >> Are there any outstanding issues with this patchset now, or is it
ready to
> > >> be merged?
> > >> I'm really looking forward to seeing the improved performance when
testing
> > >> QEMU on my
> > >> Mac Mini :)
> > >>
> > >>
> > > Howard pointed to some illogical quirks of command line:
> > >
> > >> If ran on a G4 host there is no need to add -cpu G4 to run Mac OS
9.x,
> > >> while there is when ran on a G5 host.
> > >
> > > I am not sure if Howard says that this is a consequence of this series
> > > though.
> >
> > No, that has been an existing issue for a long time :)
> >
> > > Overall, I think this is a very good series - however, I had a number
of
> > > minor
> > > objections to multiple patches, that don't affect (or affect in a
minimal
> > > way)
> > > provided functionality - those objections are not addressed, nor
properly
> > > discussed - but I do think they should be addressed in order to get
the
> > > series
> > > in a better shape before upstreaming.
> >
> > I've had a quick look at some of your review comments, and certainly I
can see how
> > the earlier revisions have benefited from your feedback. There has been
a lot of
> > positive discussion, and Richard has taken the time to respond and
update the
> > patchset over several weeks to its latest revision.
> >
> > AFAICT the only remaining issue is that related to the ISA flags, but
to me this
> > isn't something that should prevent the patchset being merged. I can
certainly see
> > how the current flags implementation may not be considered technically
correct, but
> > then from your comments I don't see that it would be something that
would be
> > particularly difficult to change at a later date either.
> >
> > The things that are important to me are i) is the patchset functionally
correct and
> > ii) is it understandable and maintainable. I would say that the first
point is
> > clearly true (both myself and Howard have spent a lot of time testing
it), and given
> > that I had to delve into these patches to fix the R2 register issue on
32-bit PPC
> > then I can confirm that the contents of the patches were a reasonably
accurate
> > representation of the changes described within. And that's from someone
like me who
> > is mostly still a TCG beginner :)
> >
> > From a slightly more selfish position as the PPC Mac machine
maintainer, these
> > patches make a significant difference to me in that they reduce the
MacOS boot times
> > during everyday testing. Now for someone like myself who works on QEMU
as a hobby
> > outside of family life and a full time job, those few minutes are
really important to
> > me and soon add up really quickly during testing.
> >
> > I would really like these patches to be merged soon, since the worst
thing that can
> > happen is that the patchset ends up bit-rotting and then all the time
and effort put
> > into writing, testing and reviewing the patches by Richard, Howard,
David, myself and
> > indeed your review time will all end up going to waste.
> >
> >
> > ATB,
> >
> > Mark.
> >

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2019-09-27 14:13 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-29 13:00 [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes Richard Henderson
2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 01/16] tcg/ppc: Introduce Altivec registers Richard Henderson
2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 02/16] tcg/ppc: Introduce macro VX4() Richard Henderson
2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 03/16] tcg/ppc: Introduce macros VRT(), VRA(), VRB(), VRC() Richard Henderson
2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 04/16] tcg/ppc: Enable tcg backend vector compilation Richard Henderson
2019-06-30  9:46   ` Aleksandar Markovic
2019-06-30 10:48     ` Richard Henderson
2019-06-30 11:45       ` Aleksandar Markovic
2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 05/16] tcg/ppc: Add support for load/store/logic/comparison Richard Henderson
2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 06/16] tcg/ppc: Add support for vector maximum/minimum Richard Henderson
2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 07/16] tcg/ppc: Add support for vector add/subtract Richard Henderson
2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 08/16] tcg/ppc: Add support for vector saturated add/subtract Richard Henderson
2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 09/16] tcg/ppc: Prepare case for vector multiply Richard Henderson
2019-06-30  9:52   ` Aleksandar Markovic
2019-06-30 10:49     ` Richard Henderson
2019-06-30 11:35       ` Aleksandar Markovic
2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 10/16] tcg/ppc: Support vector shift by immediate Richard Henderson
2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 11/16] tcg/ppc: Support vector multiply Richard Henderson
2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 12/16] tcg/ppc: Support vector dup2 Richard Henderson
2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 13/16] tcg/ppc: Enable Altivec detection Richard Henderson
2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 14/16] tcg/ppc: Update vector support to v2.06 Richard Henderson
2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 15/16] tcg/ppc: Update vector support to v2.07 Richard Henderson
2019-06-30 11:50   ` Aleksandar Markovic
2019-06-30 13:37   ` Aleksandar Markovic
2019-06-30 15:12     ` Richard Henderson
2019-07-01  3:57       ` Aleksandar Markovic
2019-07-01 10:29         ` Richard Henderson
2019-07-01 11:41           ` Aleksandar Markovic
2019-07-02 14:25             ` Richard Henderson
2019-07-10 10:52               ` Aleksandar Markovic
2019-06-29 13:00 ` [Qemu-devel] [PATCH v6 16/16] tcg/ppc: Update vector support to v3.00 Richard Henderson
2019-06-29 13:37 ` [Qemu-devel] [PATCH v6 00/16] tcg/ppc: Add vector opcodes no-reply
2019-06-30 17:58 ` Mark Cave-Ayland
2019-07-01 10:30   ` Richard Henderson
2019-07-01 18:34     ` Howard Spoelstra
2019-09-03 17:02       ` Mark Cave-Ayland
2019-09-03 17:37         ` Aleksandar Markovic
2019-09-03 18:32           ` Mark Cave-Ayland
2019-09-05 11:43             ` Aleksandar Markovic
2019-09-27 12:13               ` Aleksandar Markovic
