* [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1
@ 2019-02-26 11:38 David Hildenbrand
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 01/33] s390x/tcg: Define vector instruction formats David Hildenbrand
                   ` (33 more replies)
  0 siblings, 34 replies; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

This is the first part of vector instruction support for s390x. Parts
will be sent and reviewed piece by piece.

Part 1: Vector Support Instructions
Part 2: Vector Integer Instructions
Part 3: Vector String Instructions
Part 4: Vector Floating-Point Instructions

The current state can be found at (kept updated):
    https://github.com/davidhildenbrand/qemu/tree/vx
It is based on
    https://github.com/cohuck/qemu/tree/s390-next

To make use of vector instructions on my branch, make sure to specify
"-cpu max" for now.

With the current state I can boot a Linux kernel + user space compiled with
SIMD support. This allows booting distributions compiled exclusively for
z13, which require SIMD support. Also, I have a growing set of tests for
kvm-unit-tests, which I cross-test on a real s390x system.

In this part, the basic infrastructure and all Vector Support Instructions
introduced with the "Vector Facility" are added. The Vector Extension
Facilities are not considered for now.

We make use of the existing gvec expansion + ool (out-of-line) support.
This will be used heavily, especially in part 2 (Integer Instructions),
where we can actually reuse quite a few existing gvec expansions.

David Hildenbrand (33):
  s390x/tcg: Define vector instruction formats
  s390x/tcg: Check vector register instructions at central point
  s390x: Add one temporary vector register in CPU state for TCG
  s390x/tcg: Utilities for vector instruction helpers
  s390x/tcg: Implement VECTOR GATHER ELEMENT
  s390x/tcg: Implement VECTOR GENERATE BYTE MASK
  s390x/tcg: Implement VECTOR GENERATE MASK
  s390x/tcg: Implement VECTOR LOAD
  s390x/tcg: Implement VECTOR LOAD AND REPLICATE
  s390x/tcg: Implement VECTOR LOAD ELEMENT
  s390x/tcg: Implement VECTOR LOAD ELEMENT IMMEDIATE
  s390x/tcg: Implement VECTOR LOAD GR FROM VR ELEMENT
  s390x/tcg: Implement VECTOR LOAD LOGICAL ELEMENT AND ZERO
  s390x/tcg: Implement VECTOR LOAD MULTIPLE
  s390x/tcg: Implement VECTOR LOAD TO BLOCK BOUNDARY
  s390x/tcg: Implement VECTOR LOAD VR ELEMENT FROM GR
  s390x/tcg: Implement VECTOR LOAD VR FROM GRS DISJOINT
  s390x/tcg: Implement VECTOR LOAD WITH LENGTH
  s390x/tcg: Implement VECTOR MERGE (HIGH|LOW)
  s390x/tcg: Implement VECTOR PACK
  s390x/tcg: Implement VECTOR PACK (LOGICAL) SATURATE
  s390x/tcg: Implement VECTOR PERMUTE
  s390x/tcg: Implement VECTOR PERMUTE DOUBLEWORD IMMEDIATE
  s390x/tcg: Implement VECTOR REPLICATE
  s390x/tcg: Implement VECTOR REPLICATE IMMEDIATE
  s390x/tcg: Implement VECTOR SCATTER ELEMENT
  s390x/tcg: Implement VECTOR SELECT
  s390x/tcg: Implement VECTOR SIGN EXTEND TO DOUBLEWORD
  s390x/tcg: Implement VECTOR STORE
  s390x/tcg: Implement VECTOR STORE ELEMENT
  s390x/tcg: Implement VECTOR STORE MULTIPLE
  s390x/tcg: Implement VECTOR STORE WITH LENGTH
  s390x/tcg: Implement VECTOR UNPACK *

 target/s390x/Makefile.objs      |   1 +
 target/s390x/cpu.h              |  18 +
 target/s390x/helper.h           |  11 +
 target/s390x/insn-data.def      |  83 +++
 target/s390x/insn-format.def    |  25 +
 target/s390x/translate.c        |  64 ++-
 target/s390x/translate_vx.inc.c | 885 ++++++++++++++++++++++++++++++++
 target/s390x/vec.h              |  31 ++
 target/s390x/vec_helper.c       | 220 ++++++++
 9 files changed, 1337 insertions(+), 1 deletion(-)
 create mode 100644 target/s390x/translate_vx.inc.c
 create mode 100644 target/s390x/vec.h
 create mode 100644 target/s390x/vec_helper.c

-- 
2.17.2


* [Qemu-devel] [PATCH v1 01/33] s390x/tcg: Define vector instruction formats
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
@ 2019-02-26 11:38 ` David Hildenbrand
  2019-02-26 18:24   ` Richard Henderson
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 02/33] s390x/tcg: Check vector register instructions at central point David Hildenbrand
                   ` (32 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

These are the new instruction formats related to vector instructions as
of the z14 (a.k.a. latest PoP).

As v2 appears (like x2 in VRX) with d2/b2 in VRV, we have to assign it a
higher field number to avoid collisions.

Properly take care of the MSB (to be able to address 32 registers) for
each vector register field stored in the RXB field (bits 36 - 39 for all
vector instructions). As we have 32 vector registers and the
"v" fields are only 4 bits in size, the 5th bit is stored in the RXB.
We use a new type to indicate that the MSB has to be fetched from the
RXB.
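
To illustrate the RXB handling, here is a minimal stand-alone sketch (not
the QEMU code itself). It assumes the instruction is held left-aligned in
a 64-bit value with bit 0 being the MSB, as the extract64() calls in this
patch suggest. insn_bits() and vreg_number() are hypothetical names used
only for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extract "len" bits starting at bit position "beg" of a left-aligned
 * 64-bit instruction image, where bit 0 is the most significant bit.
 */
static uint64_t insn_bits(uint64_t insn, unsigned beg, unsigned len)
{
    assert(len >= 1 && len <= 8 && beg + len <= 64);
    return (insn << beg) >> (64 - len);
}

/*
 * Combine a 4-bit "v" field with its RXB bit into a 5-bit register number.
 * Returns -1 for a field position that has no RXB bit assigned.
 */
static int vreg_number(uint64_t insn, unsigned beg)
{
    unsigned rxb_bit;

    switch (beg) {
    case 8:  rxb_bit = 36; break;
    case 12: rxb_bit = 37; break;
    case 16: rxb_bit = 38; break;
    case 32: rxb_bit = 39; break;
    default: return -1;
    }
    /* The RXB bit becomes bit 4 (value 16) of the register number. */
    return (int)(insn_bits(insn, beg, 4) | (insn_bits(insn, rxb_bit, 1) << 4));
}
```

With a v1 field of 0x5 (field at bit 8) and RXB bit 36 set, the sketch
yields register 21; without the RXB bit it yields register 5.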

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-format.def | 25 +++++++++++++++++++++++
 target/s390x/translate.c     | 39 +++++++++++++++++++++++++++++++++++-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/target/s390x/insn-format.def b/target/s390x/insn-format.def
index 4297ff4165..6253edbd19 100644
--- a/target/s390x/insn-format.def
+++ b/target/s390x/insn-format.def
@@ -54,3 +54,28 @@ F4(SS_e,  R(1, 8),     BD(2,16,20), R(3,12),     BD(4,32,36))
 F3(SS_f,  BD(1,16,20), L(2,8,8),    BD(2,32,36))
 F2(SSE,   BD(1,16,20), BD(2,32,36))
 F3(SSF,   BD(1,16,20), BD(2,32,36), R(3,8))
+F3(VRI_a, V(1,8),      I(2,16,16),  M(3,32))
+F4(VRI_b, V(1,8),      I(2,16,8),   I(3,24,8),   M(4,32))
+F4(VRI_c, V(1,8),      V(3,12),     I(2,16,16),  M(4,32))
+F5(VRI_d, V(1,8),      V(2,12),     V(3,16),     I(4,24,8),   M(5,32))
+F5(VRI_e, V(1,8),      V(2,12),     I(3,16,12),  M(5,28),     M(4,32))
+F5(VRI_f, V(1,8),      V(2,12),     V(3,16),     M(5,24),     I(4,28,8))
+F5(VRI_g, V(1,8),      V(2,12),     I(4,16,8),   M(5,24),     I(3,28,8))
+F3(VRI_h, V(1,8),      I(2,16,16),  I(3,32,4))
+F4(VRI_i, V(1,8),      R(2,12),     M(4,24),     I(3,28,8))
+F5(VRR_a, V(1,8),      V(2,12),     M(5,24),     M(4,28),     M(3,32))
+F5(VRR_b, V(1,8),      V(2,12),     V(3,16),     M(5,24),     M(4,32))
+F6(VRR_c, V(1,8),      V(2,12),     V(3,16),     M(6,24),     M(5,28),  M(4,32))
+F6(VRR_d, V(1,8),      V(2,12),     V(3,16),     M(5,20),     M(6,24),  V(4,32))
+F6(VRR_e, V(1,8),      V(2,12),     V(3,16),     M(6,20),     M(5,28),  V(4,32))
+F3(VRR_f, V(1,8),      R(2,12),     R(3,16))
+F1(VRR_g, V(1,12))
+F3(VRR_h, V(1,12),     V(2,16),     M(3,24))
+F3(VRR_i, R(1,8),      V(2,12),     M(3,24))
+F4(VRS_a, V(1,8),      V(3,12),     BD(2,16,20), M(4,32))
+F4(VRS_b, V(1,8),      R(3,12),     BD(2,16,20), M(4,32))
+F4(VRS_c, R(1,8),      V(3,12),     BD(2,16,20), M(4,32))
+F3(VRS_d, R(3,12),     BD(2,16,20), V(1,32))
+F4(VRV,   V(1,8),      V(2,12),     BD(2,16,20), M(3,32))
+F3(VRX,   V(1,8),      BXD(2),      M(3,32))
+F3(VSI,   I(3,8,8),    BD(2,16,20), V(1,32))
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 41fb466bb4..1d8030f8cd 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -985,6 +985,7 @@ static void free_compare(DisasCompare *c)
 #define F3(N, X1, X2, X3)             F0(N)
 #define F4(N, X1, X2, X3, X4)         F0(N)
 #define F5(N, X1, X2, X3, X4, X5)     F0(N)
+#define F6(N, X1, X2, X3, X4, X5, X6) F0(N)
 
 typedef enum {
 #include "insn-format.def"
@@ -996,6 +997,7 @@ typedef enum {
 #undef F3
 #undef F4
 #undef F5
+#undef F6
 
 /* Define a structure to hold the decoded fields.  We'll store each inside
    an array indexed by an enum.  In order to conserve memory, we'll arrange
@@ -1010,6 +1012,8 @@ enum DisasFieldIndexO {
     FLD_O_m1,
     FLD_O_m3,
     FLD_O_m4,
+    FLD_O_m5,
+    FLD_O_m6,
     FLD_O_b1,
     FLD_O_b2,
     FLD_O_b4,
@@ -1023,7 +1027,11 @@ enum DisasFieldIndexO {
     FLD_O_i2,
     FLD_O_i3,
     FLD_O_i4,
-    FLD_O_i5
+    FLD_O_i5,
+    FLD_O_v1,
+    FLD_O_v2,
+    FLD_O_v3,
+    FLD_O_v4,
 };
 
 enum DisasFieldIndexC {
@@ -1031,6 +1039,7 @@ enum DisasFieldIndexC {
     FLD_C_m1 = 0,
     FLD_C_b1 = 0,
     FLD_C_i1 = 0,
+    FLD_C_v1 = 0,
 
     FLD_C_r2 = 1,
     FLD_C_b2 = 1,
@@ -1039,20 +1048,25 @@ enum DisasFieldIndexC {
     FLD_C_r3 = 2,
     FLD_C_m3 = 2,
     FLD_C_i3 = 2,
+    FLD_C_v3 = 2,
 
     FLD_C_m4 = 3,
     FLD_C_b4 = 3,
     FLD_C_i4 = 3,
     FLD_C_l1 = 3,
+    FLD_C_v4 = 3,
 
     FLD_C_i5 = 4,
     FLD_C_d1 = 4,
+    FLD_C_m5 = 4,
 
     FLD_C_d2 = 5,
+    FLD_C_m6 = 5,
 
     FLD_C_d4 = 6,
     FLD_C_x2 = 6,
     FLD_C_l2 = 6,
+    FLD_C_v2 = 6,
 
     NUM_C_FIELD = 7
 };
@@ -1097,6 +1111,7 @@ typedef struct DisasFormatInfo {
 
 #define R(N, B)       {  B,  4, 0, FLD_C_r##N, FLD_O_r##N }
 #define M(N, B)       {  B,  4, 0, FLD_C_m##N, FLD_O_m##N }
+#define V(N, B)       {  B,  4, 3, FLD_C_v##N, FLD_O_v##N }
 #define BD(N, BB, BD) { BB,  4, 0, FLD_C_b##N, FLD_O_b##N }, \
                       { BD, 12, 0, FLD_C_d##N, FLD_O_d##N }
 #define BXD(N)        { 16,  4, 0, FLD_C_b##N, FLD_O_b##N }, \
@@ -1116,6 +1131,7 @@ typedef struct DisasFormatInfo {
 #define F3(N, X1, X2, X3)         { { X1, X2, X3 } },
 #define F4(N, X1, X2, X3, X4)     { { X1, X2, X3, X4 } },
 #define F5(N, X1, X2, X3, X4, X5) { { X1, X2, X3, X4, X5 } },
+#define F6(N, X1, X2, X3, X4, X5, X6)       { { X1, X2, X3, X4, X5, X6 } },
 
 static const DisasFormatInfo format_info[] = {
 #include "insn-format.def"
@@ -1127,8 +1143,10 @@ static const DisasFormatInfo format_info[] = {
 #undef F3
 #undef F4
 #undef F5
+#undef F6
 #undef R
 #undef M
+#undef V
 #undef BD
 #undef BXD
 #undef BDL
@@ -6119,6 +6137,25 @@ static void extract_field(DisasFields *o, const DisasField *f, uint64_t insn)
     case 2: /* dl+dh split, signed 20 bit. */
         r = ((int8_t)r << 12) | (r >> 8);
         break;
+    case 3: /* MSB stored in RXB */
+        g_assert(f->size == 4);
+        switch (f->beg) {
+        case 8:
+            r |= extract64(insn, 63 - 36, 1) << 4;
+            break;
+        case 12:
+            r |= extract64(insn, 63 - 37, 1) << 4;
+            break;
+        case 16:
+            r |= extract64(insn, 63 - 38, 1) << 4;
+            break;
+        case 32:
+            r |= extract64(insn, 63 - 39, 1) << 4;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        break;
     default:
         abort();
     }
-- 
2.17.2


* [Qemu-devel] [PATCH v1 02/33] s390x/tcg: Check vector register instructions at central point
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 01/33] s390x/tcg: Define vector instruction formats David Hildenbrand
@ 2019-02-26 11:38 ` David Hildenbrand
  2019-02-26 18:26   ` Richard Henderson
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 03/33] s390x: Add one temporary vector register in CPU state for TCG David Hildenbrand
                   ` (31 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Check vector register instructions at a central point. We'll use a new
instruction flag to flag all vector instructions (IF_VEC) and handle it
very similarly to AFP, whereby we use another unused position in the PSW
mask to store the state of vector register enablement per translation block.
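
The idea can be modelled outside of QEMU roughly as follows. This is only
a sketch: the constants are copied from this patch, while the shift of 31
is an assumption about FLAG_MASK_PSW_SHIFT, and tb_flags_from_cr0() and
vector_dxc() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define PSW_MASK_UNUSED_3   0x1000000000000000ULL
#define CR0_VECTOR          0x0000000000020000ULL
#define SKETCH_PSW_SHIFT    31      /* assumed value of FLAG_MASK_PSW_SHIFT */
#define FLAG_MASK_VECTOR    ((uint32_t)(PSW_MASK_UNUSED_3 >> SKETCH_PSW_SHIFT))
#define IF_VEC              0x0040
#define DXC_VECTOR          0xfe

/* Mirror the CR0 vector enablement bit into the per-TB flags. */
static uint32_t tb_flags_from_cr0(uint64_t cr0)
{
    uint32_t flags = 0;

    /* The shifted PSW position has to fit into the 32-bit TB flags. */
    assert((PSW_MASK_UNUSED_3 >> SKETCH_PSW_SHIFT) <= UINT32_MAX);
    if (cr0 & CR0_VECTOR) {
        flags |= FLAG_MASK_VECTOR;
    }
    return flags;
}

/*
 * Return the data exception code to raise at translation time, or 0 if
 * the instruction may be translated normally.
 */
static int vector_dxc(uint32_t tb_flags, unsigned insn_flags)
{
    if ((insn_flags & IF_VEC) && !(tb_flags & FLAG_MASK_VECTOR)) {
        return DXC_VECTOR;
    }
    return 0;
}
```

Because the flag is part of the TB flags, a change of the CR0 bit selects
a different set of translation blocks, so the check costs nothing at run
time.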

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/cpu.h       |  7 +++++++
 target/s390x/translate.c | 12 ++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index b71ac5183d..cb6d77053a 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -257,6 +257,7 @@ extern const struct VMStateDescription vmstate_s390_cpu;
 /* PSW defines */
 #undef PSW_MASK_PER
 #undef PSW_MASK_UNUSED_2
+#undef PSW_MASK_UNUSED_3
 #undef PSW_MASK_DAT
 #undef PSW_MASK_IO
 #undef PSW_MASK_EXT
@@ -276,6 +277,7 @@ extern const struct VMStateDescription vmstate_s390_cpu;
 
 #define PSW_MASK_PER            0x4000000000000000ULL
 #define PSW_MASK_UNUSED_2       0x2000000000000000ULL
+#define PSW_MASK_UNUSED_3       0x1000000000000000ULL
 #define PSW_MASK_DAT            0x0400000000000000ULL
 #define PSW_MASK_IO             0x0200000000000000ULL
 #define PSW_MASK_EXT            0x0100000000000000ULL
@@ -323,12 +325,14 @@ extern const struct VMStateDescription vmstate_s390_cpu;
 
 /* we'll use some unused PSW positions to store CR flags in tb flags */
 #define FLAG_MASK_AFP           (PSW_MASK_UNUSED_2 >> FLAG_MASK_PSW_SHIFT)
+#define FLAG_MASK_VECTOR        (PSW_MASK_UNUSED_3 >> FLAG_MASK_PSW_SHIFT)
 
 /* Control register 0 bits */
 #define CR0_LOWPROT             0x0000000010000000ULL
 #define CR0_SECONDARY           0x0000000004000000ULL
 #define CR0_EDAT                0x0000000000800000ULL
 #define CR0_AFP                 0x0000000000040000ULL
+#define CR0_VECTOR              0x0000000000020000ULL
 #define CR0_EMERGENCY_SIGNAL_SC 0x0000000000004000ULL
 #define CR0_EXTERNAL_CALL_SC    0x0000000000002000ULL
 #define CR0_CKC_SC              0x0000000000000800ULL
@@ -373,6 +377,9 @@ static inline void cpu_get_tb_cpu_state(CPUS390XState* env, target_ulong *pc,
     if (env->cregs[0] & CR0_AFP) {
         *flags |= FLAG_MASK_AFP;
     }
+    if (env->cregs[0] & CR0_VECTOR) {
+        *flags |= FLAG_MASK_VECTOR;
+    }
 }
 
 /* PER bits from control register 9 */
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 1d8030f8cd..d52c02c572 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -1203,6 +1203,7 @@ typedef struct {
 #define IF_BFP      0x0008      /* binary floating point instruction */
 #define IF_DFP      0x0010      /* decimal floating point instruction */
 #define IF_PRIV     0x0020      /* privileged instruction */
+#define IF_VEC      0x0040      /* vector instruction */
 
 struct DisasInsn {
     unsigned opc:16;
@@ -6337,11 +6338,22 @@ static DisasJumpType translate_one(CPUS390XState *env, DisasContext *s)
             if (insn->flags & IF_DFP) {
                 dxc = 3;
             }
+            if (insn->flags & IF_VEC) {
+                dxc = 0xfe;
+            }
             if (dxc) {
                 gen_data_exception(dxc);
                 return DISAS_NORETURN;
             }
         }
+
+        /* if vector instructions not enabled, executing them is forbidden */
+        if (insn->flags & IF_VEC) {
+            if (!((s->base.tb->flags & FLAG_MASK_VECTOR))) {
+                gen_data_exception(0xfe);
+                return DISAS_NORETURN;
+            }
+        }
     }
 
     /* Check for insn specification exceptions.  */
-- 
2.17.2


* [Qemu-devel] [PATCH v1 03/33] s390x: Add one temporary vector register in CPU state for TCG
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 01/33] s390x/tcg: Define vector instruction formats David Hildenbrand
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 02/33] s390x/tcg: Check vector register instructions at central point David Hildenbrand
@ 2019-02-26 11:38 ` David Hildenbrand
  2019-02-26 18:36   ` Richard Henderson
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 04/33] s390x/tcg: Utilities for vector instruction helpers David Hildenbrand
                   ` (30 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

We sometimes want to work on a temporary vector register instead of the
actual destination, because source and destination might overlap. An
alternative would be loading the vector into two i64 variables, but then
separate handling for accessing the vector elements would be needed.
Using a temporary register is easier. Add one for now as that seems to be
enough.
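
To show why overlap matters, here is a small stand-alone model in plain C
(not TCG code). Vec128 and merge_high_bytes() are hypothetical names; the
operation interleaves the first eight bytes of two vectors, which is only
meant as an example of an instruction reading elements it has not written
yet.

```c
#include <assert.h>
#include <stdint.h>

typedef struct Vec128 {
    uint8_t byte[16];
} Vec128;

/*
 * Interleave the first eight bytes of a and b into dst. dst may alias a
 * or b, so the result is built in a temporary and copied at the end.
 * Writing to dst directly would clobber a->byte[1] before it is read
 * whenever dst == a.
 */
static void merge_high_bytes(Vec128 *dst, const Vec128 *a, const Vec128 *b)
{
    Vec128 tmp;
    int i;

    assert(dst && a && b);
    for (i = 0; i < 8; i++) {
        tmp.byte[2 * i] = a->byte[i];
        tmp.byte[2 * i + 1] = b->byte[i];
    }
    *dst = tmp;
}
```

The temporary vector register in the CPU state plays the role of "tmp"
here, just on the level of generated code.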

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/cpu.h       | 11 +++++++++++
 target/s390x/translate.c |  3 +++
 2 files changed, 14 insertions(+)

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index cb6d77053a..a8dc0b2b83 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -67,6 +67,17 @@ struct CPUS390XState {
      * vregs[0][0] -> vregs[15][0] are 16 floating point registers
      */
     CPU_DoubleU vregs[32][2];  /* vector registers */
+#ifdef CONFIG_TCG
+#define TMP_VREG_0   33
+    /*
+     * Temporary vector registers used while processing vector instructions
+     * in TCG. This is helpful e.g. when source and destination registers
+     * overlap for certain instructions in translate functions. Content valid
+     * only within execution of one translated block, therefore no migration is
+     * needed. Resets don't matter, but it has to be properly aligned.
+     */
+    CPU_DoubleU tmp_vregs[1][2];
+#endif
     uint32_t aregs[16];    /* access registers */
     uint8_t riccb[64];     /* runtime instrumentation control */
     uint64_t gscb[4];      /* guarded storage control */
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index d52c02c572..8733d19182 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -147,6 +147,9 @@ void s390x_translate_init(void)
 
 static inline int vec_full_reg_offset(uint8_t reg)
 {
+    if (reg == TMP_VREG_0) {
+        return offsetof(CPUS390XState, tmp_vregs[0][0].d);
+    }
     g_assert(reg < 32);
     return offsetof(CPUS390XState, vregs[reg][0].d);
 }
-- 
2.17.2


* [Qemu-devel] [PATCH v1 04/33] s390x/tcg: Utilities for vector instruction helpers
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (2 preceding siblings ...)
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 03/33] s390x: Add one temporary vector register in CPU state for TCG David Hildenbrand
@ 2019-02-26 11:38 ` David Hildenbrand
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 05/33] s390x/tcg: Implement VECTOR GATHER ELEMENT David Hildenbrand
                   ` (29 subsequent siblings)
  33 siblings, 0 replies; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

We'll have to read/write vector elements quite frequently from helpers.
The tricky bit is properly taking care of endianness. Handle it similarly
to aarch64.
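
As a cross-check for the element numbering, the same lookup can be written
in a host-independent way using shifts on the two 64-bit values. This is
only a sketch for comparison, not what the patch does; vec_read_element()
is a hypothetical name and "es" is the log2 of the element size in bytes
(0 for bytes up to 3 for doublewords).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Read element "enr" of size (1 << es) bytes from a vector stored as two
 * host 64-bit values, with element 0 being the leftmost (most
 * significant) element of dw[0].
 */
static uint64_t vec_read_element(const uint64_t dw[2], unsigned enr,
                                 unsigned es)
{
    unsigned bytes, per_dw, shift;
    uint64_t mask;

    assert(es <= 3);
    bytes = 1u << es;           /* element size in bytes */
    per_dw = 8 / bytes;         /* elements per doubleword */
    assert(enr < 2 * per_dw);

    /* Elements are counted from the most significant end. */
    shift = 8 * bytes * (per_dw - 1 - enr % per_dw);
    mask = bytes == 8 ? ~0ULL : (1ULL << (8 * bytes)) - 1;
    return (dw[enr / per_dw] >> shift) & mask;
}
```

On a little-endian host, byte element 0 is the most significant byte of
dw[0], which sits at byte offset 7 in memory. That is the same index that
"enr ^ 7" computes.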

target/s390x/vec_helper.c will later also contain vector support
instruction helpers.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/Makefile.objs |  1 +
 target/s390x/vec.h         | 31 +++++++++++++
 target/s390x/vec_helper.c  | 90 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 122 insertions(+)
 create mode 100644 target/s390x/vec.h
 create mode 100644 target/s390x/vec_helper.c

diff --git a/target/s390x/Makefile.objs b/target/s390x/Makefile.objs
index 22a9a9927a..68eeee3d2f 100644
--- a/target/s390x/Makefile.objs
+++ b/target/s390x/Makefile.objs
@@ -1,6 +1,7 @@
 obj-y += cpu.o cpu_models.o cpu_features.o gdbstub.o interrupt.o helper.o
 obj-$(CONFIG_TCG) += translate.o cc_helper.o excp_helper.o fpu_helper.o
 obj-$(CONFIG_TCG) += int_helper.o mem_helper.o misc_helper.o crypto_helper.o
+obj-$(CONFIG_TCG) += vec_helper.o
 obj-$(CONFIG_SOFTMMU) += machine.o ioinst.o arch_dump.o mmu_helper.o diag.o
 obj-$(CONFIG_SOFTMMU) += sigp.o
 obj-$(CONFIG_KVM) += kvm.o
diff --git a/target/s390x/vec.h b/target/s390x/vec.h
new file mode 100644
index 0000000000..c03be1a9c9
--- /dev/null
+++ b/target/s390x/vec.h
@@ -0,0 +1,31 @@
+/*
+ * QEMU TCG support -- s390x vector utilities
+ *
+ * Copyright (C) 2019 Red Hat Inc
+ *
+ * Authors:
+ *   David Hildenbrand <david@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef S390X_VEC_H
+#define S390X_VEC_H
+
+typedef union S390Vector {
+    uint64_t doubleword[2];
+    uint32_t word[4];
+    uint16_t halfword[8];
+    uint8_t byte[16];
+} S390Vector;
+
+uint8_t s390_vec_read_element8(const S390Vector *v, uint8_t enr);
+uint16_t s390_vec_read_element16(const S390Vector *v, uint8_t enr);
+uint32_t s390_vec_read_element32(const S390Vector *v, uint8_t enr);
+uint64_t s390_vec_read_element64(const S390Vector *v, uint8_t enr);
+void s390_vec_write_element8(S390Vector *v, uint8_t enr, uint8_t data);
+void s390_vec_write_element16(S390Vector *v, uint8_t enr, uint16_t data);
+void s390_vec_write_element32(S390Vector *v, uint8_t enr, uint32_t data);
+void s390_vec_write_element64(S390Vector *v, uint8_t enr, uint64_t data);
+
+#endif /* S390X_VEC_H */
diff --git a/target/s390x/vec_helper.c b/target/s390x/vec_helper.c
new file mode 100644
index 0000000000..3e21e440ba
--- /dev/null
+++ b/target/s390x/vec_helper.c
@@ -0,0 +1,90 @@
+/*
+ * QEMU TCG support -- s390x vector support instructions and utilities
+ *
+ * Copyright (C) 2019 Red Hat Inc
+ *
+ * Authors:
+ *   David Hildenbrand <david@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "vec.h"
+#include "tcg/tcg.h"
+
+/*
+ * Each vector is stored as two 64bit host values. So when talking about
+ * byte/halfword/word numbers, we have to take care of proper translation
+ * between element numbers.
+ *
+ * Big Endian (target/possible host)
+ * B:  [ 0][ 1][ 2][ 3][ 4][ 5][ 6][ 7] - [ 8][ 9][10][11][12][13][14][15]
+ * HW: [     0][     1][     2][     3] - [     4][     5][     6][     7]
+ * W:  [             0][             1] - [             2][             3]
+ * DW: [                             0] - [                             1]
+ *
+ * Little Endian (possible host)
+ * B:  [ 7][ 6][ 5][ 4][ 3][ 2][ 1][ 0] - [15][14][13][12][11][10][ 9][ 8]
+ * HW: [     3][     2][     1][     0] - [     7][     6][     5][     4]
+ * W:  [             1][             0] - [             3][             2]
+ * DW: [                             0] - [                             1]
+ */
+#ifndef HOST_WORDS_BIGENDIAN
+#define H1(x)  ((x) ^ 7)
+#define H2(x)  ((x) ^ 3)
+#define H4(x)  ((x) ^ 1)
+#else
+#define H1(x)  (x)
+#define H2(x)  (x)
+#define H4(x)  (x)
+#endif
+
+uint8_t s390_vec_read_element8(const S390Vector *v, uint8_t enr)
+{
+    g_assert(enr < 16);
+    return v->byte[H1(enr)];
+}
+
+uint16_t s390_vec_read_element16(const S390Vector *v, uint8_t enr)
+{
+    g_assert(enr < 8);
+    return v->halfword[H2(enr)];
+}
+
+uint32_t s390_vec_read_element32(const S390Vector *v, uint8_t enr)
+{
+    g_assert(enr < 4);
+    return v->word[H4(enr)];
+}
+
+uint64_t s390_vec_read_element64(const S390Vector *v, uint8_t enr)
+{
+    g_assert(enr < 2);
+    return v->doubleword[enr];
+}
+
+void s390_vec_write_element8(S390Vector *v, uint8_t enr, uint8_t data)
+{
+    g_assert(enr < 16);
+    v->byte[H1(enr)] = data;
+}
+
+void s390_vec_write_element16(S390Vector *v, uint8_t enr, uint16_t data)
+{
+    g_assert(enr < 8);
+    v->halfword[H2(enr)] = data;
+}
+
+void s390_vec_write_element32(S390Vector *v, uint8_t enr, uint32_t data)
+{
+    g_assert(enr < 4);
+    v->word[H4(enr)] = data;
+}
+
+void s390_vec_write_element64(S390Vector *v, uint8_t enr, uint64_t data)
+{
+    g_assert(enr < 2);
+    v->doubleword[enr] = data;
+}
-- 
2.17.2


* [Qemu-devel] [PATCH v1 05/33] s390x/tcg: Implement VECTOR GATHER ELEMENT
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (3 preceding siblings ...)
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 04/33] s390x/tcg: Utilities for vector instruction helpers David Hildenbrand
@ 2019-02-26 11:38 ` David Hildenbrand
  2019-02-26 18:44   ` Richard Henderson
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 06/33] s390x/tcg: Implement VECTOR GENERATE BYTE MASK David Hildenbrand
                   ` (28 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Let's start with a more involved one, but it is the first in the list
of vector support instructions (introduced with the vector facility).

The good thing is that this requires a lot of basic infrastructure:
reading and writing vector elements, checking element validity, and loading
vector elements from memory. Storing will be added later, once needed.
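
For readers unfamiliar with the instruction, the following plain C model
sketches what the word variant does, following op_vge() in this patch: the
selected element of v2 is added to the base + displacement address, and
the value loaded from there replaces the same element of v1. gather_word()
is a hypothetical name, guest memory is modelled as a flat buffer, and the
arrays are indexed by element number, so host endianness is ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_WORD_ELEMENTS 4u

/*
 * Returns 0 on success and -1 where the real instruction would raise an
 * exception (invalid element number or inaccessible memory).
 */
static int gather_word(uint32_t v1[4], const uint32_t v2[4], unsigned enr,
                       uint64_t base_plus_disp,
                       const uint8_t *mem, size_t mem_size)
{
    uint64_t addr;

    assert(v1 && v2 && mem);
    /* Same test as valid_vec_element(): no bits beyond the element count. */
    if (enr & ~(NUM_WORD_ELEMENTS - 1)) {
        return -1;
    }
    addr = base_plus_disp + v2[enr];
    if (mem_size < 4 || addr > mem_size - 4) {
        return -1;
    }
    /* Guest memory is big endian. */
    v1[enr] = (uint32_t)mem[addr] << 24 | (uint32_t)mem[addr + 1] << 16 |
              (uint32_t)mem[addr + 2] << 8 | (uint32_t)mem[addr + 3];
    return 0;
}
```

Only the addressed element of v1 changes. All other elements keep their
previous content.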

All vector instruction related translation functions will reside in
translate_vx.inc.c, to be included in translate.c - similar to how
other architectures handle it.

While at it, directly add some documentation (which contains parts about
things added in follow-up patches, but splitting this up does not make
too much sense).

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |   6 ++
 target/s390x/translate.c        |   2 +
 target/s390x/translate_vx.inc.c | 138 ++++++++++++++++++++++++++++++++
 3 files changed, 146 insertions(+)
 create mode 100644 target/s390x/translate_vx.inc.c

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 61b750a855..2b06cc9130 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -972,6 +972,12 @@
     D(0xb93e, KIMD,    RRE,   MSA,  0, 0, 0, 0, msa, 0, S390_FEAT_TYPE_KIMD)
     D(0xb93f, KLMD,    RRE,   MSA,  0, 0, 0, 0, msa, 0, S390_FEAT_TYPE_KLMD)
 
+/* === Vector Support Instructions === */
+
+/* VECTOR GATHER ELEMENT */
+    E(0xe713, VGEF,    VRV,   V,   la2, 0, 0, 0, vge, 0, MO_32, IF_VEC)
+    E(0xe712, VGEG,    VRV,   V,   la2, 0, 0, 0, vge, 0, MO_64, IF_VEC)
+
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
     E(0xb250, CSP,     RRE,   Z,   r1_32u, ra2, r1_P, 0, csp, 0, MO_TEUL, IF_PRIV)
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 8733d19182..3935bc8bb7 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -5123,6 +5123,8 @@ static DisasJumpType op_mpcifc(DisasContext *s, DisasOps *o)
 }
 #endif
 
+#include "translate_vx.inc.c"
+
 /* ====================================================================== */
 /* The "Cc OUTput" generators.  Given the generated output (and in some cases
    the original inputs), update the various cc data structures in order to
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
new file mode 100644
index 0000000000..56f403e40d
--- /dev/null
+++ b/target/s390x/translate_vx.inc.c
@@ -0,0 +1,138 @@
+/*
+ * QEMU TCG support -- s390x vector instruction translation functions
+ *
+ * Copyright (C) 2019 Red Hat Inc
+ *
+ * Authors:
+ *   David Hildenbrand <david@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+/*
+ * For most instructions that use the same element size for reads and
+ * writes, we can use real gvec vector expansion, which potentially uses
+ * real host vector instructions. As they only work up to 64 bit elements,
+ * 128 bit elements (vector is a single element) have to be handled
+ * differently. Operations that are too complicated to encode via TCG ops
+ * are handled via gvec ool (out-of-line) handlers.
+ *
+ * As soon as instructions use different element sizes for reads and writes
+ * or access elements "out of their element scope" we expand them manually
+ * in fancy loops, as gvec expansion does not deal with actual element
+ * numbers and does also not support access to other elements.
+ *
+ * 128 bit elements:
+ *  As we only have i32/i64, such elements have to be loaded into two
+ *  i64 values and can then be processed e.g. by tcg_gen_add2_i64.
+ *
+ * Sizes:
+ *  On s390x, the operand size (oprsz) and the maximum size (maxsz) are
+ *  always 16 (128 bit). What gvec code calls "vece", s390x calls "es",
+ *  a.k.a. "element size". These values nicely map to MO_8 ... MO_64. Only
+ *  128 bit element size has to be treated in a special way (MO_64 + 1).
+ *
+ * CC handling:
+ *  As gvec ool-helpers can currently not return values (besides via
+ *  pointers like vectors or cpu_env), whenever we have to set the CC and
+ *  can't conclude the value from the result vector, we will directly
+ *  set it in "env->cc_op" and mark it as static via "set_cc_static()".
+ *  Whenever this is done, the helper writes globals (cc_op).
+ */
+
+#define NUM_VEC_ELEMENT_BYTES(es) (1 << (es))
+#define NUM_VEC_ELEMENTS(es) (16 / NUM_VEC_ELEMENT_BYTES(es))
+
+static inline bool valid_vec_element(uint8_t enr, TCGMemOp es)
+{
+    return !(enr & ~(NUM_VEC_ELEMENTS(es) - 1));
+}
+
+static void read_vec_element_i64(TCGv_i64 dst, uint8_t reg, uint8_t enr,
+                                 TCGMemOp memop)
+{
+    const int offs = vec_reg_offset(reg, enr, memop & MO_SIZE);
+
+    switch (memop) {
+    case MO_8:
+        tcg_gen_ld8u_i64(dst, cpu_env, offs);
+        break;
+    case MO_16:
+        tcg_gen_ld16u_i64(dst, cpu_env, offs);
+        break;
+    case MO_32:
+        tcg_gen_ld32u_i64(dst, cpu_env, offs);
+        break;
+    case MO_8 | MO_SIGN:
+        tcg_gen_ld8s_i64(dst, cpu_env, offs);
+        break;
+    case MO_16 | MO_SIGN:
+        tcg_gen_ld16s_i64(dst, cpu_env, offs);
+        break;
+    case MO_32 | MO_SIGN:
+        tcg_gen_ld32s_i64(dst, cpu_env, offs);
+        break;
+    case MO_64:
+    case MO_64 | MO_SIGN:
+        tcg_gen_ld_i64(dst, cpu_env, offs);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void write_vec_element_i64(TCGv_i64 src, int reg, uint8_t enr,
+                                  TCGMemOp memop)
+{
+    const int offs = vec_reg_offset(reg, enr, memop & MO_SIZE);
+
+    switch (memop) {
+    case MO_8:
+        tcg_gen_st8_i64(src, cpu_env, offs);
+        break;
+    case MO_16:
+        tcg_gen_st16_i64(src, cpu_env, offs);
+        break;
+    case MO_32:
+        tcg_gen_st32_i64(src, cpu_env, offs);
+        break;
+    case MO_64:
+        tcg_gen_st_i64(src, cpu_env, offs);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void load_vec_element(DisasContext *s, uint8_t reg, uint8_t enr,
+                             TCGv_i64 addr, uint8_t es)
+{
+    TCGv_i64 tmp = tcg_temp_new_i64();
+
+    tcg_gen_qemu_ld_i64(tmp, addr, get_mem_index(s), MO_TE | es);
+    write_vec_element_i64(tmp, reg, enr, es);
+
+    tcg_temp_free_i64(tmp);
+}
+
+static DisasJumpType op_vge(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = s->insn->data;
+    const uint8_t enr = get_field(s->fields, m3);
+    TCGv_i64 tmp;
+
+    if (!valid_vec_element(enr, es)) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    tmp = tcg_temp_new_i64();
+    read_vec_element_i64(tmp, get_field(s->fields, v2), enr, es);
+    tcg_gen_add_i64(o->addr1, o->addr1, tmp);
+    gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 0);
+
+    load_vec_element(s, get_field(s->fields, v1), enr, o->addr1, es);
+    tcg_temp_free_i64(tmp);
+    return DISAS_NEXT;
+}
-- 
2.17.2


* [Qemu-devel] [PATCH v1 06/33] s390x/tcg: Implement VECTOR GENERATE BYTE MASK
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (4 preceding siblings ...)
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 05/33] s390x/tcg: Implement VECTOR GATHER ELEMENT David Hildenbrand
@ 2019-02-26 11:38 ` David Hildenbrand
  2019-02-26 19:12   ` Richard Henderson
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 07/33] s390x/tcg: Implement VECTOR GENERATE MASK David Hildenbrand
                   ` (27 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

As we are working on byte elements, we can use i32 for element access.
Add write_vec_element_i32().
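
For illustration only (not part of the patch): a minimal stand-alone model
of the mask expansion done by op_vgbm(), assuming the leftmost immediate
bit controls byte element 0. The name vgbm_model() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of VECTOR GENERATE BYTE MASK: every bit of the
 * 16-bit immediate selects 0xff or 0x00 for one byte element. The
 * leftmost (most significant) bit controls byte element 0.
 */
static void vgbm_model(uint16_t i2, uint8_t out[16])
{
    assert(out);

    for (int i = 0; i < 16; i++) {
        /* same bit selection as extract32(i2, 15 - i, 1) in the patch */
        out[i] = ((i2 >> (15 - i)) & 1) ? 0xff : 0x00;
    }
}
```

The patch emits one byte store per element at translation time; the model
only shows which immediate bit ends up in which byte.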

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 39 +++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 2b06cc9130..1bdfcf8130 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -977,6 +977,8 @@
 /* VECTOR GATHER ELEMENT */
     E(0xe713, VGEF,    VRV,   V,   la2, 0, 0, 0, vge, 0, MO_32, IF_VEC)
     E(0xe712, VGEG,    VRV,   V,   la2, 0, 0, 0, vge, 0, MO_64, IF_VEC)
+/* VECTOR GENERATE BYTE MASK */
+    F(0xe744, VGBM,    VRI_a, V,   0, 0, 0, 0, vgbm, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 56f403e40d..7775401dd3 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -105,6 +105,26 @@ static void write_vec_element_i64(TCGv_i64 src, int reg, uint8_t enr,
     }
 }
 
+static void write_vec_element_i32(TCGv_i32 src, int reg, uint8_t enr,
+                                  TCGMemOp memop)
+{
+    const int offs = vec_reg_offset(reg, enr, memop & MO_SIZE);
+
+    switch (memop) {
+    case MO_8:
+        tcg_gen_st8_i32(src, cpu_env, offs);
+        break;
+    case MO_16:
+        tcg_gen_st16_i32(src, cpu_env, offs);
+        break;
+    case MO_32:
+        tcg_gen_st_i32(src, cpu_env, offs);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static void load_vec_element(DisasContext *s, uint8_t reg, uint8_t enr,
                              TCGv_i64 addr, uint8_t es)
 {
@@ -136,3 +156,22 @@ static DisasJumpType op_vge(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(tmp);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vgbm(DisasContext *s, DisasOps *o)
+{
+    const uint16_t i2 = get_field(s->fields, i2);
+    TCGv_i32 ones = tcg_const_i32(-1u);
+    TCGv_i32 zeroes = tcg_const_i32(0);
+    int i;
+
+    for (i = 0; i < 16; i++) {
+        if (extract32(i2, 15 - i, 1)) {
+            write_vec_element_i32(ones, get_field(s->fields, v1), i, MO_8);
+        } else {
+            write_vec_element_i32(zeroes, get_field(s->fields, v1), i, MO_8);
+        }
+    }
+    tcg_temp_free_i32(ones);
+    tcg_temp_free_i32(zeroes);
+    return DISAS_NEXT;
+}
-- 
2.17.2


* [Qemu-devel] [PATCH v1 07/33] s390x/tcg: Implement VECTOR GENERATE MASK
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (5 preceding siblings ...)
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 06/33] s390x/tcg: Implement VECTOR GENERATE BYTE MASK David Hildenbrand
@ 2019-02-26 11:38 ` David Hildenbrand
  2019-02-26 21:16   ` David Hildenbrand
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 08/33] s390x/tcg: Implement VECTOR LOAD David Hildenbrand
                   ` (26 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

This is the first instruction that uses gvec expansion for duplicating
elements. We will use macros for most gvec calls to simplify translating
vector numbers into offsets (and to not have to worry about oprsz and
maxsz).
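
For illustration only (not part of the patch): a stand-alone sketch of the
mask computation in op_vgm(), including the wrap-around case where the
start bit lies to the right of the end bit. The name vgm_mask_model() is
hypothetical; bit 0 is the leftmost bit of the element, as in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the VECTOR GENERATE MASK element value: set all
 * bits from i2 to i3 (inclusive, big-endian bit numbering), wrapping
 * from the last bit of the element back to bit 0 if i2 > i3.
 */
static uint64_t vgm_mask_model(unsigned es, unsigned i2, unsigned i3)
{
    assert(es <= 3);                /* MO_8 .. MO_64 */

    const unsigned bits = 8u << es; /* element size in bits */
    uint64_t mask = 0;

    i2 &= bits - 1;
    i3 &= bits - 1;
    for (unsigned i = i2; ; i = (i + 1) % bits) {
        mask |= 1ull << (bits - i - 1);
        if (i == i3) {
            break;
        }
    }
    return mask;
}
```

As the element value is known at translation time, the patch only has to
move the constant into a temporary and duplicate it across the vector.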

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate.c        |  1 +
 target/s390x/translate_vx.inc.c | 34 +++++++++++++++++++++++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 1bdfcf8130..a3a0df7788 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -979,6 +979,8 @@
     E(0xe712, VGEG,    VRV,   V,   la2, 0, 0, 0, vge, 0, MO_64, IF_VEC)
 /* VECTOR GENERATE BYTE MASK */
     F(0xe744, VGBM,    VRI_a, V,   0, 0, 0, 0, vgbm, 0, IF_VEC)
+/* VECTOR GENERATE MASK */
+    F(0xe746, VGM,     VRI_b, V,   0, 0, 0, 0, vgm, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 3935bc8bb7..56c146f91e 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -34,6 +34,7 @@
 #include "disas/disas.h"
 #include "exec/exec-all.h"
 #include "tcg-op.h"
+#include "tcg-op-gvec.h"
 #include "qemu/log.h"
 #include "qemu/host-utils.h"
 #include "exec/cpu_ldst.h"
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 7775401dd3..ed63b2ca22 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -43,6 +43,7 @@
 
 #define NUM_VEC_ELEMENT_BYTES(es) (1 << (es))
 #define NUM_VEC_ELEMENTS(es) (16 / NUM_VEC_ELEMENT_BYTES(es))
+#define NUM_VEC_ELEMENT_BITS(es) (NUM_VEC_ELEMENT_BYTES(es) * BITS_PER_BYTE)
 
 static inline bool valid_vec_element(uint8_t enr, TCGMemOp es)
 {
@@ -136,6 +137,9 @@ static void load_vec_element(DisasContext *s, uint8_t reg, uint8_t enr,
     tcg_temp_free_i64(tmp);
 }
 
+#define gen_gvec_dup_i64(es, v1, c) \
+    tcg_gen_gvec_dup_i64(es, vec_full_reg_offset(v1), 16, 16, c)
+
 static DisasJumpType op_vge(DisasContext *s, DisasOps *o)
 {
     const uint8_t es = s->insn->data;
@@ -175,3 +179,33 @@ static DisasJumpType op_vgbm(DisasContext *s, DisasOps *o)
     tcg_temp_free_i32(zeroes);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vgm(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    const uint8_t bits = NUM_VEC_ELEMENT_BITS(es);
+    const uint8_t i2 = get_field(s->fields, i2) & (bits - 1);
+    const uint8_t i3 = get_field(s->fields, i3) & (bits - 1);
+    uint64_t mask = 0;
+    TCGv_i64 tmp;
+    int i;
+
+    if (es > MO_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    /* generate the mask - take care of wrapping */
+    for (i = i2; ; i = (i + 1) % bits) {
+        mask |= 1ull << (bits - i - 1);
+        if (i == i3) {
+            break;
+        }
+    }
+
+    tmp = tcg_temp_new_i64();
+    tcg_gen_movi_i64(tmp, mask);
+    gen_gvec_dup_i64(es, get_field(s->fields, v1), tmp);
+    tcg_temp_free_i64(tmp);
+    return DISAS_NEXT;
+}
-- 
2.17.2


* [Qemu-devel] [PATCH v1 08/33] s390x/tcg: Implement VECTOR LOAD
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (6 preceding siblings ...)
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 07/33] s390x/tcg: Implement VECTOR GENERATE MASK David Hildenbrand
@ 2019-02-26 11:38 ` David Hildenbrand
  2019-02-27 15:39   ` Richard Henderson
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 09/33] s390x/tcg: Implement VECTOR LOAD AND REPLICATE David Hildenbrand
                   ` (25 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

When loading from memory, load to our temporary vector first, so in case
we get an access exception on the second 64 bit element, the vector
won't get modified.

Loading with strange alignment from the end of the address space will
not properly wrap; we can ignore that for now.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  3 +++
 target/s390x/translate_vx.inc.c | 18 ++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index a3a0df7788..c6dd70f2fd 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -981,6 +981,9 @@
     F(0xe744, VGBM,    VRI_a, V,   0, 0, 0, 0, vgbm, 0, IF_VEC)
 /* VECTOR GENERATE MASK */
     F(0xe746, VGM,     VRI_b, V,   0, 0, 0, 0, vgm, 0, IF_VEC)
+/* VECTOR LOAD */
+    F(0xe706, VL,      VRX,   V,   la2, 0, 0, 0, vl, 0, IF_VEC)
+    F(0xe756, VLR,     VRR_a, V,   0, 0, 0, 0, vlr, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index ed63b2ca22..9af5639bfe 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -139,6 +139,9 @@ static void load_vec_element(DisasContext *s, uint8_t reg, uint8_t enr,
 
 #define gen_gvec_dup_i64(es, v1, c) \
     tcg_gen_gvec_dup_i64(es, vec_full_reg_offset(v1), 16, 16, c)
+#define gen_gvec_mov(v1, v2) \
+    tcg_gen_gvec_mov(0, vec_full_reg_offset(v1), vec_full_reg_offset(v2), 16, \
+                     16)
 
 static DisasJumpType op_vge(DisasContext *s, DisasOps *o)
 {
@@ -209,3 +212,18 @@ static DisasJumpType op_vgm(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(tmp);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vl(DisasContext *s, DisasOps *o)
+{
+    load_vec_element(s, TMP_VREG_0, 0, o->addr1, MO_64);
+    gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
+    load_vec_element(s, TMP_VREG_0, 1, o->addr1, MO_64);
+    gen_gvec_mov(get_field(s->fields, v1), TMP_VREG_0);
+    return DISAS_NEXT;
+}
+
+static DisasJumpType op_vlr(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_mov(get_field(s->fields, v1), get_field(s->fields, v2));
+    return DISAS_NEXT;
+}
-- 
2.17.2


* [Qemu-devel] [PATCH v1 09/33] s390x/tcg: Implement VECTOR LOAD AND REPLICATE
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (7 preceding siblings ...)
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 08/33] s390x/tcg: Implement VECTOR LOAD David Hildenbrand
@ 2019-02-26 11:38 ` David Hildenbrand
  2019-02-27 15:40   ` Richard Henderson
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 10/33] s390x/tcg: Implement VECTOR LOAD ELEMENT David Hildenbrand
                   ` (24 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

We can use tcg_gen_gvec_dup_i64() to carry out the duplication.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index c6dd70f2fd..5475f04561 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -984,6 +984,8 @@
 /* VECTOR LOAD */
     F(0xe706, VL,      VRX,   V,   la2, 0, 0, 0, vl, 0, IF_VEC)
     F(0xe756, VLR,     VRR_a, V,   0, 0, 0, 0, vlr, 0, IF_VEC)
+/* VECTOR LOAD AND REPLICATE */
+    F(0xe705, VLREP,   VRX,   V,   la2, 0, 0, 0, vlrep, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 9af5639bfe..b898910cd9 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -227,3 +227,20 @@ static DisasJumpType op_vlr(DisasContext *s, DisasOps *o)
     gen_gvec_mov(get_field(s->fields, v1), get_field(s->fields, v2));
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vlrep(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m3);
+    TCGv_i64 tmp;
+
+    if (es > MO_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    tmp = tcg_temp_new_i64();
+    tcg_gen_qemu_ld_i64(tmp, o->addr1, get_mem_index(s), MO_TE | es);
+    gen_gvec_dup_i64(es, get_field(s->fields, v1), tmp);
+    tcg_temp_free_i64(tmp);
+    return DISAS_NEXT;
+}
-- 
2.17.2


* [Qemu-devel] [PATCH v1 10/33] s390x/tcg: Implement VECTOR LOAD ELEMENT
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (8 preceding siblings ...)
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 09/33] s390x/tcg: Implement VECTOR LOAD AND REPLICATE David Hildenbrand
@ 2019-02-26 11:38 ` David Hildenbrand
  2019-02-27 15:42   ` Richard Henderson
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 11/33] s390x/tcg: Implement VECTOR LOAD ELEMENT IMMEDIATE David Hildenbrand
                   ` (23 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Fairly easy: load with the desired size and store the result into the
right element.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  5 +++++
 target/s390x/translate_vx.inc.c | 18 ++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 5475f04561..960ee8f0a8 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -986,6 +986,11 @@
     F(0xe756, VLR,     VRR_a, V,   0, 0, 0, 0, vlr, 0, IF_VEC)
 /* VECTOR LOAD AND REPLICATE */
     F(0xe705, VLREP,   VRX,   V,   la2, 0, 0, 0, vlrep, 0, IF_VEC)
+/* VECTOR LOAD ELEMENT */
+    E(0xe700, VLEB,    VRX,   V,   la2, 0, 0, 0, vle, 0, MO_8, IF_VEC)
+    E(0xe701, VLEH,    VRX,   V,   la2, 0, 0, 0, vle, 0, MO_16, IF_VEC)
+    E(0xe703, VLEF,    VRX,   V,   la2, 0, 0, 0, vle, 0, MO_32, IF_VEC)
+    E(0xe702, VLEG,    VRX,   V,   la2, 0, 0, 0, vle, 0, MO_64, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index b898910cd9..b11628af50 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -244,3 +244,21 @@ static DisasJumpType op_vlrep(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(tmp);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vle(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = s->insn->data;
+    const uint8_t enr = get_field(s->fields, m3);
+    TCGv_i64 tmp;
+
+    if (!valid_vec_element(enr, es)) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    tmp = tcg_temp_new_i64();
+    tcg_gen_qemu_ld_i64(tmp, o->addr1, get_mem_index(s), MO_TE | es);
+    write_vec_element_i64(tmp, get_field(s->fields, v1), enr, es);
+    tcg_temp_free_i64(tmp);
+    return DISAS_NEXT;
+}
-- 
2.17.2


* [Qemu-devel] [PATCH v1 11/33] s390x/tcg: Implement VECTOR LOAD ELEMENT IMMEDIATE
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (9 preceding siblings ...)
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 10/33] s390x/tcg: Implement VECTOR LOAD ELEMENT David Hildenbrand
@ 2019-02-26 11:38 ` David Hildenbrand
  2019-02-27 15:44   ` Richard Henderson
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 12/33] s390x/tcg: Implement VECTOR LOAD GR FROM VR ELEMENT David Hildenbrand
                   ` (22 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Take care of properly sign-extending the immediate.
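
For illustration only (not part of the patch): a stand-alone sketch of what
ends up in the element, assuming the 16-bit immediate is sign-extended to
64 bit and the store then keeps only the low-order bytes of the element
size. The name vlei_value_model() is hypothetical. Like the cast in the
patch, it relies on the usual two's-complement conversion to int16_t.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of VECTOR LOAD ELEMENT IMMEDIATE: sign-extend the
 * 16-bit immediate, then truncate to the element size (es = 0..3 for
 * 8, 16, 32 and 64 bit elements).
 */
static uint64_t vlei_value_model(uint16_t i2, unsigned es)
{
    assert(es <= 3);

    const uint64_t extended = (uint64_t)(int64_t)(int16_t)i2;
    const uint64_t keep = es == 3 ? UINT64_MAX
                                  : (1ull << (8u << es)) - 1;

    return extended & keep;
}
```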

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  5 +++++
 target/s390x/translate_vx.inc.c | 17 +++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 960ee8f0a8..46610e808f 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -991,6 +991,11 @@
     E(0xe701, VLEH,    VRX,   V,   la2, 0, 0, 0, vle, 0, MO_16, IF_VEC)
     E(0xe703, VLEF,    VRX,   V,   la2, 0, 0, 0, vle, 0, MO_32, IF_VEC)
     E(0xe702, VLEG,    VRX,   V,   la2, 0, 0, 0, vle, 0, MO_64, IF_VEC)
+/* VECTOR LOAD ELEMENT IMMEDIATE */
+    E(0xe740, VLEIB,   VRI_a, V,   0, 0, 0, 0, vlei, 0, MO_8, IF_VEC)
+    E(0xe741, VLEIH,   VRI_a, V,   0, 0, 0, 0, vlei, 0, MO_16, IF_VEC)
+    E(0xe743, VLEIF,   VRI_a, V,   0, 0, 0, 0, vlei, 0, MO_32, IF_VEC)
+    E(0xe742, VLEIG,   VRI_a, V,   0, 0, 0, 0, vlei, 0, MO_64, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index b11628af50..1bf654ff4e 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -262,3 +262,20 @@ static DisasJumpType op_vle(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(tmp);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vlei(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = s->insn->data;
+    const uint8_t enr = get_field(s->fields, m3);
+    TCGv_i64 tmp;
+
+    if (!valid_vec_element(enr, es)) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    tmp = tcg_const_i64((int16_t)get_field(s->fields, i2));
+    write_vec_element_i64(tmp, get_field(s->fields, v1), enr, es);
+    tcg_temp_free_i64(tmp);
+    return DISAS_NEXT;
+}
-- 
2.17.2


* [Qemu-devel] [PATCH v1 12/33] s390x/tcg: Implement VECTOR LOAD GR FROM VR ELEMENT
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (10 preceding siblings ...)
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 11/33] s390x/tcg: Implement VECTOR LOAD ELEMENT IMMEDIATE David Hildenbrand
@ 2019-02-26 11:38 ` David Hildenbrand
  2019-02-27 15:53   ` Richard Henderson
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 13/33] s390x/tcg: Implement VECTOR LOAD LOGICAL ELEMENT AND ZERO David Hildenbrand
                   ` (21 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

To avoid a helper, we have to do the actual calculation of the element
address (offset in cpu_env + cpu_env) manually. Factor that out into
get_vec_element_ptr_i64(). The same logic will be reused for "VECTOR
LOAD VR ELEMENT FROM GR".
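
For illustration only (not part of the patch): a stand-alone sketch of the
offset arithmetic that get_vec_element_ptr_i64() emits as TCG ops,
assuming each vector is stored as two 64 bit host values. The function
name and the host_big_endian parameter are hypothetical; the patch uses
HOST_WORDS_BIGENDIAN at compile time instead.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: byte offset of element "enr" (element size
 * 1 << es bytes) inside a 16-byte vector stored as two host-endian
 * doublewords. On little-endian hosts the position inside each
 * doubleword is mirrored, which the xor takes care of.
 */
static unsigned vec_element_byte_offset(unsigned enr, unsigned es,
                                        int host_big_endian)
{
    assert(es <= 3);

    const unsigned bytes = 1u << es;
    /* mask off invalid parts of the element number, then scale */
    unsigned offs = (enr & (16 / bytes - 1)) * bytes;

    if (!host_big_endian) {
        offs ^= 8 - bytes;
    }
    return offs;
}
```

The final pointer in the patch is this offset plus the offset of the
vector register inside the CPU state plus cpu_env.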

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 55 +++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 46610e808f..f4201ff55a 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -996,6 +996,8 @@
     E(0xe741, VLEIH,   VRI_a, V,   0, 0, 0, 0, vlei, 0, MO_16, IF_VEC)
     E(0xe743, VLEIF,   VRI_a, V,   0, 0, 0, 0, vlei, 0, MO_32, IF_VEC)
     E(0xe742, VLEIG,   VRI_a, V,   0, 0, 0, 0, vlei, 0, MO_64, IF_VEC)
+/* VECTOR LOAD GR FROM VR ELEMENT */
+    F(0xe721, VLGV,    VRS_c, V,   la2, 0, r1, 0, vlgv, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 1bf654ff4e..a02a3ba81f 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -137,6 +137,28 @@ static void load_vec_element(DisasContext *s, uint8_t reg, uint8_t enr,
     tcg_temp_free_i64(tmp);
 }
 
+static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
+                                    uint8_t es)
+{
+    TCGv_i64 tmp = tcg_temp_new_i64();
+
+    /* mask off invalid parts from the element nr */
+    tcg_gen_andi_i64(tmp, enr, NUM_VEC_ELEMENTS(es) - 1);
+
+    /* convert it to an offset relative to cpu_env (see vec_reg_offset()) */
+    tcg_gen_muli_i64(tmp, tmp, NUM_VEC_ELEMENT_BYTES(es));
+#ifndef HOST_WORDS_BIGENDIAN
+    tcg_gen_xori_i64(tmp, tmp, 8 - NUM_VEC_ELEMENT_BYTES(es));
+#endif
+    tcg_gen_addi_i64(tmp, tmp, vec_full_reg_offset(reg));
+
+    /* generate the final ptr by adding cpu_env */
+    tcg_gen_trunc_i64_ptr(ptr, tmp);
+    tcg_gen_add_ptr(ptr, ptr, cpu_env);
+
+    tcg_temp_free_i64(tmp);
+}
+
 #define gen_gvec_dup_i64(es, v1, c) \
     tcg_gen_gvec_dup_i64(es, vec_full_reg_offset(v1), 16, 16, c)
 #define gen_gvec_mov(v1, v2) \
@@ -279,3 +301,36 @@ static DisasJumpType op_vlei(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(tmp);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vlgv(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    TCGv_ptr ptr;
+
+    if (es > MO_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    ptr = tcg_temp_new_ptr();
+    get_vec_element_ptr_i64(ptr, get_field(s->fields, v3), o->addr1, es);
+    switch (es) {
+    case MO_8:
+        tcg_gen_ld8u_i64(o->out, ptr, 0);
+        break;
+    case MO_16:
+        tcg_gen_ld16u_i64(o->out, ptr, 0);
+        break;
+    case MO_32:
+        tcg_gen_ld32u_i64(o->out, ptr, 0);
+        break;
+    case MO_64:
+        tcg_gen_ld_i64(o->out, ptr, 0);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    tcg_temp_free_ptr(ptr);
+
+    return DISAS_NEXT;
+}
-- 
2.17.2


* [Qemu-devel] [PATCH v1 13/33] s390x/tcg: Implement VECTOR LOAD LOGICAL ELEMENT AND ZERO
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (11 preceding siblings ...)
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 12/33] s390x/tcg: Implement VECTOR LOAD GR FROM VR ELEMENT David Hildenbrand
@ 2019-02-26 11:38 ` David Hildenbrand
  2019-02-27 15:56   ` Richard Henderson
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 14/33] s390x/tcg: Implement VECTOR LOAD MULTIPLE David Hildenbrand
                   ` (20 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Fairly easy: zero out the vector before we load the desired element.
Use a temporary vector so we don't modify the target vector on
exceptions.
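
For illustration only (not part of the patch): the element numbers in the
switch of op_vllez() follow from "rightmost sub-element of the leftmost
doubleword". A stand-alone sketch, covering only element sizes 0..3 and
not the m3 == 6 case of the enhancement facility; the name
vllez_element() is hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical model: element number that VECTOR LOAD LOGICAL ELEMENT
 * AND ZERO loads for element size m3 (0..3), or -1 if m3 is not one of
 * the four basic element sizes.
 */
static int vllez_element(unsigned m3)
{
    if (m3 > 3) {
        return -1;
    }

    const int enr = (8 >> m3) - 1;

    /* the selected element ends exactly at byte 8 of the vector */
    assert(((enr + 1) << m3) == 8);
    return enr;
}
```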

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 43 +++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index f4201ff55a..46a0739703 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -998,6 +998,8 @@
     E(0xe742, VLEIG,   VRI_a, V,   0, 0, 0, 0, vlei, 0, MO_64, IF_VEC)
 /* VECTOR LOAD GR FROM VR ELEMENT */
     F(0xe721, VLGV,    VRS_c, V,   la2, 0, r1, 0, vlgv, 0, IF_VEC)
+/* VECTOR LOAD LOGICAL ELEMENT AND ZERO */
+    F(0xe704, VLLEZ,   VRX,   V,   la2, 0, 0, 0, vllez, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index a02a3ba81f..301408d1f2 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -165,6 +165,11 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
     tcg_gen_gvec_mov(0, vec_full_reg_offset(v1), vec_full_reg_offset(v2), 16, \
                      16)
 
+static void zero_vec(uint8_t reg)
+{
+    tcg_gen_gvec_dup8i(vec_full_reg_offset(reg), 16, 16, 0);
+}
+
 static DisasJumpType op_vge(DisasContext *s, DisasOps *o)
 {
     const uint8_t es = s->insn->data;
@@ -334,3 +339,41 @@ static DisasJumpType op_vlgv(DisasContext *s, DisasOps *o)
 
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vllez(DisasContext *s, DisasOps *o)
+{
+    uint8_t es = get_field(s->fields, m3);
+    uint8_t enr;
+
+    switch (es) {
+    /* rightmost sub-element of leftmost doubleword */
+    case MO_8:
+        enr = 7;
+        break;
+    case MO_16:
+        enr = 3;
+        break;
+    case MO_32:
+        enr = 1;
+        break;
+    case MO_64:
+        enr = 0;
+        break;
+    /* leftmost sub-element of leftmost doubleword */
+    case 6:
+        if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+            es = MO_32;
+            enr = 0;
+            break;
+        }
+        /* fallthrough */
+    default:
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    zero_vec(TMP_VREG_0);
+    load_vec_element(s, TMP_VREG_0, enr, o->addr1, es);
+    gen_gvec_mov(get_field(s->fields, v1), TMP_VREG_0);
+    return DISAS_NEXT;
+}
-- 
2.17.2


* [Qemu-devel] [PATCH v1 14/33] s390x/tcg: Implement VECTOR LOAD MULTIPLE
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (12 preceding siblings ...)
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 13/33] s390x/tcg: Implement VECTOR LOAD LOGICAL ELEMENT AND ZERO David Hildenbrand
@ 2019-02-26 11:38 ` David Hildenbrand
  2019-02-27 16:02   ` Richard Henderson
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 15/33] s390x/tcg: Implement VECTOR LOAD TO BLOCK BOUNDARY David Hildenbrand
                   ` (19 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Also fairly easy to implement. One issue we have is that exceptions will
result in some vectors already being modified. At least handle it
consistently per vector by using a temporary vector. Good enough for
now, add a FIXME.
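
For illustration only (not part of the patch): a stand-alone sketch of the
register range check done before any vector is loaded. The name
vlm_register_count() is hypothetical; register numbers are assumed to be
in the range 0..31.

```c
#include <assert.h>

/*
 * Hypothetical model: number of vector registers VECTOR LOAD MULTIPLE
 * would load for the range v1..v3, or -1 if the range is invalid
 * (v3 below v1, or more than 16 registers).
 */
static int vlm_register_count(unsigned v1, unsigned v3)
{
    assert(v1 < 32 && v3 < 32);

    if (v3 < v1 || v3 - v1 + 1 > 16) {
        return -1;
    }
    return (int)(v3 - v1 + 1);
}
```

Each register in a valid range corresponds to two consecutive 8-byte loads
in the patch, staged through the temporary vector.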

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 26 ++++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 46a0739703..65ff8bbd2e 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1000,6 +1000,8 @@
     F(0xe721, VLGV,    VRS_c, V,   la2, 0, r1, 0, vlgv, 0, IF_VEC)
 /* VECTOR LOAD LOGICAL ELEMENT AND ZERO */
     F(0xe704, VLLEZ,   VRX,   V,   la2, 0, 0, 0, vllez, 0, IF_VEC)
+/* VECTOR LOAD MULTIPLE */
+    F(0xe736, VLM,     VRS_a, V,   la2, 0, 0, 0, vlm, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 301408d1f2..c9f57afd4a 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -377,3 +377,29 @@ static DisasJumpType op_vllez(DisasContext *s, DisasOps *o)
     gen_gvec_mov(get_field(s->fields, v1), TMP_VREG_0);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vlm(DisasContext *s, DisasOps *o)
+{
+    const uint8_t v3 = get_field(s->fields, v3);
+    uint8_t v1 = get_field(s->fields, v1);
+
+    if (v3 < v1 || (v3 - v1 + 1) > 16) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    /*
+     * FIXME: On exceptions we must not modify any vector.
+     */
+    for (;; v1++) {
+        load_vec_element(s, TMP_VREG_0, 0, o->addr1, MO_64);
+        gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
+        load_vec_element(s, TMP_VREG_0, 1, o->addr1, MO_64);
+        gen_gvec_mov(v1, TMP_VREG_0);
+        if (v1 == v3) {
+            break;
+        }
+        gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
+    }
+    return DISAS_NEXT;
+}
-- 
2.17.2


* [Qemu-devel] [PATCH v1 15/33] s390x/tcg: Implement VECTOR LOAD TO BLOCK BOUNDARY
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (13 preceding siblings ...)
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 14/33] s390x/tcg: Implement VECTOR LOAD MULTIPLE David Hildenbrand
@ 2019-02-26 11:38 ` David Hildenbrand
  2019-02-27 16:08   ` Richard Henderson
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 16/33] s390x/tcg: Implement VECTOR LOAD VR ELEMENT FROM GR David Hildenbrand
                   ` (18 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Very similar to LOAD COUNT TO BLOCK BOUNDARY, but instead of only
calculating, the actual vector is loaded. Use a temporary vector to
not modify the real vector on exceptions. Initialize that one to zero,
to not leak any data.

As we don't have gvec ool handlers for single vectors, just calculate
the vector address manually.

We can reuse the helper later on for VECTOR LOAD WITH LENGTH. In fact,
we are going to name it "vll" right from the beginning, because that's
a better match.
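
For illustration only (not part of the patch): a stand-alone sketch of the
"bytes until the next block boundary" computation that op_vlbb() emits as
an OR followed by a negation, plus the clamp to 16 bytes done in the
helper. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: number of bytes from addr up to (not including)
 * the next block boundary, for a block size of 64 << m3 bytes. An
 * address sitting on a boundary yields a full block.
 */
static uint64_t bytes_to_block_boundary(uint64_t addr, unsigned m3)
{
    assert(m3 <= 6);

    const uint64_t block_size = 1ull << (m3 + 6);

    /* -(addr | -block_size), written with well-defined unsigned math */
    return 0 - (addr | (0 - block_size));
}

/* Number of bytes actually loaded into the 16-byte vector. */
static unsigned vlbb_load_count(uint64_t addr, unsigned m3)
{
    const uint64_t bytes = bytes_to_block_boundary(addr, m3);

    return bytes < 16 ? (unsigned)bytes : 16;
}
```

The result equals block_size - (addr % block_size); the OR/negate form
needs only two ops and no division.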

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  3 +++
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 25 +++++++++++++++++++++++++
 target/s390x/vec_helper.c       | 20 ++++++++++++++++++++
 4 files changed, 50 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index bb659257f6..6c745ba0f6 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -124,6 +124,9 @@ DEF_HELPER_5(msa, i32, env, i32, i32, i32, i32)
 DEF_HELPER_FLAGS_1(stpt, TCG_CALL_NO_RWG, i64, env)
 DEF_HELPER_FLAGS_1(stck, TCG_CALL_NO_RWG_SE, i64, env)
 
+/* === Vector Support Instructions === */
+DEF_HELPER_FLAGS_4(vll, TCG_CALL_NO_WG, void, env, ptr, i64, i64)
+
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
 DEF_HELPER_4(diag, void, env, i32, i32, i32)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 65ff8bbd2e..2ab88938ff 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1002,6 +1002,8 @@
     F(0xe704, VLLEZ,   VRX,   V,   la2, 0, 0, 0, vllez, 0, IF_VEC)
 /* VECTOR LOAD MULTIPLE */
     F(0xe736, VLM,     VRS_a, V,   la2, 0, 0, 0, vlm, 0, IF_VEC)
+/* VECTOR LOAD TO BLOCK BOUNDARY */
+    F(0xe707, VLBB,    VRX,   V,   la2, 0, 0, 0, vlbb, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index c9f57afd4a..b5ed3bd89f 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -403,3 +403,28 @@ static DisasJumpType op_vlm(DisasContext *s, DisasOps *o)
     }
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vlbb(DisasContext *s, DisasOps *o)
+{
+    const int64_t block_size = (1ull << (get_field(s->fields, m3) + 6));
+    const int v1_offs = vec_full_reg_offset(get_field(s->fields, v1));
+    TCGv_ptr a0;
+    TCGv_i64 bytes;
+
+    if (get_field(s->fields, m3) > 6) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    bytes = tcg_temp_new_i64();
+    a0 = tcg_temp_new_ptr();
+    /* calculate the number of bytes until the next block boundary */
+    tcg_gen_ori_i64(bytes, o->addr1, -block_size);
+    tcg_gen_neg_i64(bytes, bytes);
+
+    tcg_gen_addi_ptr(a0, cpu_env, v1_offs);
+    gen_helper_vll(cpu_env, a0, o->addr1, bytes);
+    tcg_temp_free_i64(bytes);
+    tcg_temp_free_ptr(a0);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_helper.c b/target/s390x/vec_helper.c
index 3e21e440ba..d2f510ed07 100644
--- a/target/s390x/vec_helper.c
+++ b/target/s390x/vec_helper.c
@@ -11,8 +11,13 @@
  */
 #include "qemu/osdep.h"
 #include "qemu-common.h"
+#include "cpu.h"
+#include "internal.h"
 #include "vec.h"
 #include "tcg/tcg.h"
+#include "exec/helper-proto.h"
+#include "exec/cpu_ldst.h"
+#include "exec/exec-all.h"
 
 /*
  * Each vector is stored as two 64bit host values. So when talking about
@@ -88,3 +93,18 @@ void s390_vec_write_element64(S390Vector *v, uint8_t enr, uint64_t data)
     g_assert(enr < 2);
     v->doubleword[enr] = data;
 }
+
+void HELPER(vll)(CPUS390XState *env, void *v1, uint64_t addr, uint64_t bytes)
+{
+    S390Vector tmp = {};
+    int i;
+
+    bytes = MIN(bytes, 16);
+    for (i = 0; i < bytes; i++) {
+        uint8_t byte = cpu_ldub_data_ra(env, addr, GETPC());
+
+        s390_vec_write_element8(&tmp, i, byte);
+        addr = wrap_address(env, addr + 1);
+    }
+    *(S390Vector *)v1 = tmp;
+}
-- 
2.17.2

* [Qemu-devel] [PATCH v1 16/33] s390x/tcg: Implement VECTOR LOAD VR ELEMENT FROM GR
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (14 preceding siblings ...)
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 15/33] s390x/tcg: Implement VECTOR LOAD TO BLOCK BOUNDARY David Hildenbrand
@ 2019-02-26 11:38 ` David Hildenbrand
  2019-02-27 16:08   ` Richard Henderson
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 17/33] s390x/tcg: Implement VECTOR LOAD VR FROM GRS DISJOINT David Hildenbrand
                   ` (17 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Very similar to VECTOR LOAD GR FROM VR ELEMENT, just the opposite
direction.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 2ab88938ff..8b6957b750 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1004,6 +1004,8 @@
     F(0xe736, VLM,     VRS_a, V,   la2, 0, 0, 0, vlm, 0, IF_VEC)
 /* VECTOR LOAD TO BLOCK BOUNDARY */
     F(0xe707, VLBB,    VRX,   V,   la2, 0, 0, 0, vlbb, 0, IF_VEC)
+/* VECTOR LOAD VR ELEMENT FROM GR */
+    F(0xe722, VLVG,    VRS_b, V,   la2, r3, 0, 0, vlvg, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index b5ed3bd89f..edf471b8a7 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -428,3 +428,36 @@ static DisasJumpType op_vlbb(DisasContext *s, DisasOps *o)
     tcg_temp_free_ptr(a0);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vlvg(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    TCGv_ptr ptr;
+
+    if (es > MO_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    ptr = tcg_temp_new_ptr();
+    get_vec_element_ptr_i64(ptr, get_field(s->fields, v1), o->addr1, es);
+    switch (es) {
+    case MO_8:
+        tcg_gen_st8_i64(o->in2, ptr, 0);
+        break;
+    case MO_16:
+        tcg_gen_st16_i64(o->in2, ptr, 0);
+        break;
+    case MO_32:
+        tcg_gen_st32_i64(o->in2, ptr, 0);
+        break;
+    case MO_64:
+        tcg_gen_st_i64(o->in2, ptr, 0);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    tcg_temp_free_ptr(ptr);
+
+    return DISAS_NEXT;
+}
-- 
2.17.2

* [Qemu-devel] [PATCH v1 17/33] s390x/tcg: Implement VECTOR LOAD VR FROM GRS DISJOINT
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (15 preceding siblings ...)
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 16/33] s390x/tcg: Implement VECTOR LOAD VR ELEMENT FROM GR David Hildenbrand
@ 2019-02-26 11:38 ` David Hildenbrand
  2019-02-27 16:10   ` Richard Henderson
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 18/33] s390x/tcg: Implement VECTOR LOAD WITH LENGTH David Hildenbrand
                   ` (16 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Fairly easy, just load from two GPRs into a single vector.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      | 2 ++
 target/s390x/translate_vx.inc.c | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 8b6957b750..1594366d7e 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1006,6 +1006,8 @@
     F(0xe707, VLBB,    VRX,   V,   la2, 0, 0, 0, vlbb, 0, IF_VEC)
 /* VECTOR LOAD VR ELEMENT FROM GR */
     F(0xe722, VLVG,    VRS_b, V,   la2, r3, 0, 0, vlvg, 0, IF_VEC)
+/* VECTOR LOAD VR FROM GRS DISJOINT */
+    F(0xe762, VLVGP,   VRR_f, V,   r2, r3, 0, 0, vlvgp, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index edf471b8a7..93cbc9328f 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -461,3 +461,10 @@ static DisasJumpType op_vlvg(DisasContext *s, DisasOps *o)
 
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vlvgp(DisasContext *s, DisasOps *o)
+{
+    write_vec_element_i64(o->in1, get_field(s->fields, v1), 0, MO_64);
+    write_vec_element_i64(o->in2, get_field(s->fields, v1), 1, MO_64);
+    return DISAS_NEXT;
+}
-- 
2.17.2

* [Qemu-devel] [PATCH v1 18/33] s390x/tcg: Implement VECTOR LOAD WITH LENGTH
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (16 preceding siblings ...)
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 17/33] s390x/tcg: Implement VECTOR LOAD VR FROM GRS DISJOINT David Hildenbrand
@ 2019-02-26 11:39 ` David Hildenbrand
  2019-02-27 16:12   ` Richard Henderson
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 19/33] s390x/tcg: Implement VECTOR MERGE (HIGH|LOW) David Hildenbrand
                   ` (15 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

We can reuse the helper introduced along with VECTOR LOAD TO BLOCK
BOUNDARY. We just have to take care of converting the highest index into
a length.
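
As a rough model of the index-to-length conversion (plain C, not QEMU
code; the function name is hypothetical): the 32-bit value from r3 is
zero-extended, incremented in 64 bit, and the helper then caps the
result at 16 bytes.

```c
#include <stdint.h>

/* Number of bytes loaded for a given r3 value (only bits 32-63 count). */
static unsigned vll_bytes_loaded(uint64_t r3)
{
    /* zero-extend the highest index, then turn it into a length */
    const uint64_t length = (uint64_t)(uint32_t)r3 + 1;

    /* the helper loads at most one full vector */
    return length < 16 ? (unsigned)length : 16;
}
```

Doing the increment in 64 bit means a highest index of 0xffffffff does
not wrap around to a length of 0.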

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate.c        |  7 +++++++
 target/s390x/translate_vx.inc.c | 13 +++++++++++++
 3 files changed, 22 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 1594366d7e..2a9ac9cebc 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1008,6 +1008,8 @@
     F(0xe722, VLVG,    VRS_b, V,   la2, r3, 0, 0, vlvg, 0, IF_VEC)
 /* VECTOR LOAD VR FROM GRS DISJOINT */
     F(0xe762, VLVGP,   VRR_f, V,   r2, r3, 0, 0, vlvgp, 0, IF_VEC)
+/* VECTOR LOAD WITH LENGTH */
+    F(0xe737, VLL,     VRS_b, V,   la2, r3_32u, 0, 0, vll, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 56c146f91e..b43de96429 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -5797,6 +5797,13 @@ static void in2_r3_sr32(DisasContext *s, DisasFields *f, DisasOps *o)
 }
 #define SPEC_in2_r3_sr32 0
 
+static void in2_r3_32u(DisasContext *s, DisasFields *f, DisasOps *o)
+{
+    o->in2 = tcg_temp_new_i64();
+    tcg_gen_ext32u_i64(o->in2, regs[get_field(f, r3)]);
+}
+#define SPEC_in2_r3_32u 0
+
 static void in2_r2_32s(DisasContext *s, DisasFields *f, DisasOps *o)
 {
     o->in2 = tcg_temp_new_i64();
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 93cbc9328f..37f312fbb4 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -468,3 +468,16 @@ static DisasJumpType op_vlvgp(DisasContext *s, DisasOps *o)
     write_vec_element_i64(o->in2, get_field(s->fields, v1), 1, MO_64);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vll(DisasContext *s, DisasOps *o)
+{
+    const int v1_offs = vec_full_reg_offset(get_field(s->fields, v1));
+    TCGv_ptr a0 = tcg_temp_new_ptr();
+
+    /* convert highest index into an actual length */
+    tcg_gen_addi_i64(o->in2, o->in2, 1);
+    tcg_gen_addi_ptr(a0, cpu_env, v1_offs);
+    gen_helper_vll(cpu_env, a0, o->addr1, o->in2);
+    tcg_temp_free_ptr(a0);
+    return DISAS_NEXT;
+}
-- 
2.17.2

* [Qemu-devel] [PATCH v1 19/33] s390x/tcg: Implement VECTOR MERGE (HIGH|LOW)
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (17 preceding siblings ...)
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 18/33] s390x/tcg: Implement VECTOR LOAD WITH LENGTH David Hildenbrand
@ 2019-02-26 11:39 ` David Hildenbrand
  2019-02-27 16:14   ` Richard Henderson
  2019-02-27 16:20   ` Richard Henderson
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 20/33] s390x/tcg: Implement VECTOR PACK David Hildenbrand
                   ` (14 subsequent siblings)
  33 siblings, 2 replies; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

We cannot use gvec expansion as source and destination elements
have different element numbers. So we'll expand using a fancy loop.
Also, we have to take care of overlapping source and target registers and
use a temporary register in case they do.
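
The index mapping of the loop can be sketched on plain byte arrays (a
model only, assuming vectors stored as 16 bytes in big-endian element
order; merge_model is a made-up name, not a QEMU function):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* es is log2 of the element size in bytes (0 = byte ... 3 = doubleword). */
static void merge_model(uint8_t dst[16], const uint8_t v2[16],
                        const uint8_t v3[16], unsigned es, bool high)
{
    const unsigned size = 1u << es;
    const unsigned num = 16 / size;
    uint8_t tmp[16];
    unsigned dst_idx;

    for (dst_idx = 0; dst_idx < num; dst_idx++) {
        /* even destination elements come from v2, odd ones from v3 */
        const uint8_t *src = (dst_idx % 2 == 0) ? v2 : v3;
        /* MERGE LOW starts in the second half of the sources */
        const unsigned src_idx = dst_idx / 2 + (high ? 0 : num / 2);

        memcpy(&tmp[dst_idx * size], &src[src_idx * size], size);
    }
    /* write the result last, so dst may alias v2 or v3 */
    memcpy(dst, tmp, 16);
}
```

The local tmp buffer plays the role of the temporary vector register:
without it, writing dst while it aliases a source would clobber
elements that are still to be read.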

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  4 +++
 target/s390x/translate_vx.inc.c | 43 +++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 2a9ac9cebc..51003cf917 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1010,6 +1010,10 @@
     F(0xe762, VLVGP,   VRR_f, V,   r2, r3, 0, 0, vlvgp, 0, IF_VEC)
 /* VECTOR LOAD WITH LENGTH */
     F(0xe737, VLL,     VRS_b, V,   la2, r3_32u, 0, 0, vll, 0, IF_VEC)
+/* VECTOR MERGE HIGH */
+    F(0xe761, VMRH,    VRR_c, V,   0, 0, 0, 0, vmr, 0, IF_VEC)
+/* VECTOR MERGE LOW */
+    F(0xe760, VMRL,    VRR_c, V,   0, 0, 0, 0, vmr, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 37f312fbb4..64a5ee55ca 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -481,3 +481,46 @@ static DisasJumpType op_vll(DisasContext *s, DisasOps *o)
     tcg_temp_free_ptr(a0);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vmr(DisasContext *s, DisasOps *o)
+{
+    const uint8_t v1 = get_field(s->fields, v1);
+    const uint8_t v2 = get_field(s->fields, v2);
+    const uint8_t v3 = get_field(s->fields, v3);
+    const uint8_t es = get_field(s->fields, m4);
+    const bool high = s->fields->op2 == 0x61;
+    int dst_idx, src_idx;
+    uint8_t dst_v = v1;
+    TCGv_i64 tmp;
+
+    if (es > MO_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    /* Source and destination overlap -> use a temporary register */
+    if (v1 == v2 || v1 == v3) {
+        dst_v = TMP_VREG_0;
+    }
+
+    tmp = tcg_temp_new_i64();
+    for (dst_idx = 0; dst_idx < NUM_VEC_ELEMENTS(es); dst_idx++) {
+        src_idx = dst_idx / 2;
+        if (!high) {
+            src_idx += NUM_VEC_ELEMENTS(es) / 2;
+        }
+        if (dst_idx % 2 == 0) {
+            read_vec_element_i64(tmp, v2, src_idx, es);
+        } else {
+            read_vec_element_i64(tmp, v3, src_idx, es);
+        }
+        write_vec_element_i64(tmp, dst_v, dst_idx, es);
+    }
+    tcg_temp_free_i64(tmp);
+
+    /* move the temporary to the destination */
+    if (dst_v != v1) {
+        gen_gvec_mov(v1, dst_v);
+    }
+    return DISAS_NEXT;
+}
-- 
2.17.2

* [Qemu-devel] [PATCH v1 20/33] s390x/tcg: Implement VECTOR PACK
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (18 preceding siblings ...)
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 19/33] s390x/tcg: Implement VECTOR MERGE (HIGH|LOW) David Hildenbrand
@ 2019-02-26 11:39 ` David Hildenbrand
  2019-02-27 23:11   ` Richard Henderson
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 21/33] s390x/tcg: Implement VECTOR PACK (LOGICAL) SATURATE David Hildenbrand
                   ` (13 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

We cannot use gvec expansion as the element size of source and
destination differs. So expand manually. Luckily, VECTOR PACK does not
care about saturation or setting the CC, so it can be implemented
without a helper. We have to watch out for overlapping source and
destination registers and use a temporary register in this case.
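
A byte-array model of the truncating pack may help to follow the loop
(a sketch only, assuming vectors stored as 16 bytes in big-endian
element order; pack_model is a hypothetical name, not part of QEMU):

```c
#include <stdint.h>
#include <string.h>

/* src_es is log2 of the source element size in bytes (1, 2 or 3). */
static void pack_model(uint8_t dst[16], const uint8_t v2[16],
                       const uint8_t v3[16], unsigned src_es)
{
    const unsigned src_size = 1u << src_es;
    const unsigned dst_size = src_size / 2;
    const unsigned num_src = 16 / src_size;
    const unsigned num_dst = 16 / dst_size;
    uint8_t tmp[16];
    unsigned dst_idx;

    for (dst_idx = 0; dst_idx < num_dst; dst_idx++) {
        /* first half of the result comes from v2, second half from v3 */
        const uint8_t *src = dst_idx < num_src ? v2 : v3;
        const unsigned src_idx = dst_idx % num_src;

        /* keep only the low-order half of each source element */
        memcpy(&tmp[dst_idx * dst_size],
               &src[src_idx * src_size + dst_size], dst_size);
    }
    /* write the result last, so dst may alias v2 or v3 */
    memcpy(dst, tmp, 16);
}
```

With big-endian element order, the low-order half of an element is its
trailing bytes, which is why the copy starts at offset dst_size within
each source element.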

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 41 +++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 51003cf917..8374a663bd 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1014,6 +1014,8 @@
     F(0xe761, VMRH,    VRR_c, V,   0, 0, 0, 0, vmr, 0, IF_VEC)
 /* VECTOR MERGE LOW */
     F(0xe760, VMRL,    VRR_c, V,   0, 0, 0, 0, vmr, 0, IF_VEC)
+/* VECTOR PACK */
+    F(0xe794, VPK,     VRR_c, V,   0, 0, 0, 0, vpk, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 64a5ee55ca..842ff6a02f 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -524,3 +524,44 @@ static DisasJumpType op_vmr(DisasContext *s, DisasOps *o)
     }
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vpk(DisasContext *s, DisasOps *o)
+{
+    const uint8_t v1 = get_field(s->fields, v1);
+    const uint8_t v2 = get_field(s->fields, v2);
+    const uint8_t v3 = get_field(s->fields, v3);
+    const uint8_t src_es = get_field(s->fields, m4);
+    const uint8_t dst_es = src_es - 1;
+    uint8_t dst_v = v1;
+    int dst_idx, src_idx;
+    TCGv_i64 tmp;
+
+    if (src_es == MO_8 || src_es > MO_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    /* Source and destination overlap -> use a temporary register */
+    if (v1 == v2 || v1 == v3) {
+        dst_v = TMP_VREG_0;
+    }
+
+    tmp = tcg_temp_new_i64();
+    for (dst_idx = 0; dst_idx < NUM_VEC_ELEMENTS(dst_es); dst_idx++) {
+        src_idx = dst_idx;
+        if (src_idx < NUM_VEC_ELEMENTS(src_es)) {
+            read_vec_element_i64(tmp, v2, src_idx, src_es);
+        } else {
+            src_idx -= NUM_VEC_ELEMENTS(src_es);
+            read_vec_element_i64(tmp, v3, src_idx, src_es);
+        }
+        write_vec_element_i64(tmp, dst_v, dst_idx, dst_es);
+    }
+    tcg_temp_free_i64(tmp);
+
+    /* move the temporary to the destination */
+    if (dst_v != v1) {
+        gen_gvec_mov(v1, dst_v);
+    }
+    return DISAS_NEXT;
+}
-- 
2.17.2

* [Qemu-devel] [PATCH v1 21/33] s390x/tcg: Implement VECTOR PACK (LOGICAL) SATURATE
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (19 preceding siblings ...)
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 20/33] s390x/tcg: Implement VECTOR PACK David Hildenbrand
@ 2019-02-26 11:39 ` David Hildenbrand
  2019-02-27 23:18   ` Richard Henderson
  2019-02-27 23:24   ` Richard Henderson
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 22/33] s390x/tcg: Implement VECTOR PERMUTE David Hildenbrand
                   ` (12 subsequent siblings)
  33 siblings, 2 replies; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

We'll implement both via gvec ool helpers. As these can't return
values, we'll return the CC via env->cc_op. Generate different C
functions for the different cases using macros.

In the future we might want to do a translation like VECTOR PACK or
use separate handlers in case no CC update is needed. As Linux does
not seem to use these instructions right now, there is no need to tune
for performance.
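
To illustrate what the macro-generated functions compute, here is a
hand-expanded sketch for the 32 -> 16 bit case together with the CC
selection (plain C with made-up names, not the generated QEMU code):

```c
#include <stdint.h>

/* Signed saturation (VECTOR PACK SATURATE), 32 -> 16 bit. */
static int16_t pack_saturate_s32(int32_t src, int *saturated)
{
    if (src > INT16_MAX) {
        (*saturated)++;
        return INT16_MAX;
    } else if (src < INT16_MIN) {
        (*saturated)++;
        return INT16_MIN;
    }
    return (int16_t)src;
}

/* Unsigned saturation (VECTOR PACK LOGICAL SATURATE), 32 -> 16 bit. */
static uint16_t pack_saturate_u32(uint32_t src, int *saturated)
{
    if (src > UINT16_MAX) {
        (*saturated)++;
        return UINT16_MAX;
    }
    return (uint16_t)src;
}

/* CC 0: no saturation, CC 1: some elements saturated, CC 3: all. */
static int pack_cc(int saturated, int num_elements)
{
    if (saturated == num_elements) {
        return 3;
    }
    return saturated ? 1 : 0;
}
```

The saturation counter is shared across all elements of one
instruction, so the CC can be derived once after the loop.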

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  6 +++
 target/s390x/insn-data.def      |  4 ++
 target/s390x/translate_vx.inc.c | 37 ++++++++++++++++
 target/s390x/vec_helper.c       | 75 +++++++++++++++++++++++++++++++++
 4 files changed, 122 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 6c745ba0f6..4ea51618a5 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -126,6 +126,12 @@ DEF_HELPER_FLAGS_1(stck, TCG_CALL_NO_RWG_SE, i64, env)
 
 /* === Vector Support Instructions === */
 DEF_HELPER_FLAGS_4(vll, TCG_CALL_NO_WG, void, env, ptr, i64, i64)
+DEF_HELPER_5(gvec_vpks16, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_5(gvec_vpks32, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_5(gvec_vpks64, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_5(gvec_vpkls16, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_5(gvec_vpkls32, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_5(gvec_vpkls64, void, ptr, cptr, cptr, env, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 8374a663bd..c0a011c118 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1016,6 +1016,10 @@
     F(0xe760, VMRL,    VRR_c, V,   0, 0, 0, 0, vmr, 0, IF_VEC)
 /* VECTOR PACK */
     F(0xe794, VPK,     VRR_c, V,   0, 0, 0, 0, vpk, 0, IF_VEC)
+/* VECTOR PACK SATURATE */
+    F(0xe797, VPKS,    VRR_b, V,   0, 0, 0, 0, vpks, 0, IF_VEC)
+/* VECTOR PACK LOGICAL SATURATE */
+    F(0xe795, VPKLS,   VRR_b, V,   0, 0, 0, 0, vpks, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 842ff6a02f..d70ae3db3c 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -159,6 +159,9 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
     tcg_temp_free_i64(tmp);
 }
 
+#define gen_gvec_3_ptr(v1, v2, v3, ptr, data, fn) \
+    tcg_gen_gvec_3_ptr(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                       vec_full_reg_offset(v3), ptr, 16, 16, data, fn)
 #define gen_gvec_dup_i64(es, v1, c) \
     tcg_gen_gvec_dup_i64(es, vec_full_reg_offset(v1), 16, 16, c)
 #define gen_gvec_mov(v1, v2) \
@@ -565,3 +568,37 @@ static DisasJumpType op_vpk(DisasContext *s, DisasOps *o)
     }
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vpks(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    const uint8_t m5 = get_field(s->fields, m5);
+    static gen_helper_gvec_3_ptr * vpks[3] = {
+        gen_helper_gvec_vpks16,
+        gen_helper_gvec_vpks32,
+        gen_helper_gvec_vpks64,
+    };
+    static gen_helper_gvec_3_ptr * vpkls[3] = {
+        gen_helper_gvec_vpkls16,
+        gen_helper_gvec_vpkls32,
+        gen_helper_gvec_vpkls64,
+    };
+
+    if (es == MO_8 || es > MO_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    /* TODO: Separate translation/handlers in case we don't update the CC. */
+    if (s->fields->op2 == 0x97) {
+        gen_gvec_3_ptr(get_field(s->fields, v1), get_field(s->fields, v2),
+                       get_field(s->fields, v3), cpu_env, m5, vpks[es - 1]);
+    } else {
+        gen_gvec_3_ptr(get_field(s->fields, v1), get_field(s->fields, v2),
+                       get_field(s->fields, v3), cpu_env, m5, vpkls[es - 1]);
+    }
+    if (m5 & 0x1) {
+        set_cc_static(s);
+    }
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_helper.c b/target/s390x/vec_helper.c
index d2f510ed07..9974471cc8 100644
--- a/target/s390x/vec_helper.c
+++ b/target/s390x/vec_helper.c
@@ -15,6 +15,7 @@
 #include "internal.h"
 #include "vec.h"
 #include "tcg/tcg.h"
+#include "tcg/tcg-gvec-desc.h"
 #include "exec/helper-proto.h"
 #include "exec/cpu_ldst.h"
 #include "exec/exec-all.h"
@@ -108,3 +109,77 @@ void HELPER(vll)(CPUS390XState *env, void *v1, uint64_t addr, uint64_t bytes)
     }
     *(S390Vector *)v1 = tmp;
 }
+
+#define DEF_VPK_HFN(_BITS, _TBITS)                                             \
+typedef uint##_TBITS##_t (*vpk##_BITS##_fn)(uint##_BITS##_t, int *);           \
+static void vpk##_BITS##_hfn(CPUS390XState *env, S390Vector *v1,               \
+                             const S390Vector *v2, const S390Vector *v3,       \
+                             uint8_t m5, vpk##_BITS##_fn fn)                   \
+{                                                                              \
+    const uint8_t set_cc = m5 & 0x1;                                           \
+    int i, saturated = 0;                                                      \
+    S390Vector tmp;                                                            \
+                                                                               \
+    for (i = 0; i < (128 / _TBITS); i++) {                                     \
+        uint##_BITS##_t src;                                                   \
+                                                                               \
+        if (i < (128 / _BITS)) {                                               \
+            src = s390_vec_read_element##_BITS(v2, i);                         \
+        } else {                                                               \
+            src = s390_vec_read_element##_BITS(v3, i - (128 / _BITS));         \
+        }                                                                      \
+        s390_vec_write_element##_TBITS(&tmp, i, fn(src, &saturated));          \
+    }                                                                          \
+    *v1 = tmp;                                                                 \
+    if (set_cc) {                                                              \
+        if (saturated == i) {                                                  \
+            env->cc_op = 3;                                                    \
+        } else if (saturated) {                                                \
+            env->cc_op = 1;                                                    \
+        } else {                                                               \
+            env->cc_op = 0;                                                    \
+        }                                                                      \
+    }                                                                          \
+}
+DEF_VPK_HFN(64, 32)
+DEF_VPK_HFN(32, 16)
+DEF_VPK_HFN(16, 8)
+
+#define DEF_VPKS(_BITS, _TBITS)                                                \
+static uint##_TBITS##_t vpks##_BITS##e(uint##_BITS##_t src, int *saturated)    \
+{                                                                              \
+    if ((int##_BITS##_t)src > INT##_TBITS##_MAX) {                             \
+        (*saturated)++;                                                        \
+        return INT##_TBITS##_MAX;                                              \
+    } else if ((int##_BITS##_t)src < INT##_TBITS##_MIN) {                      \
+        (*saturated)++;                                                        \
+        return INT##_TBITS##_MIN;                                              \
+    }                                                                          \
+    return src;                                                                \
+}                                                                              \
+void HELPER(gvec_vpks##_BITS)(void *v1, const void *v2, const void *v3,        \
+                              CPUS390XState *env, uint32_t desc)               \
+{                                                                              \
+    vpk##_BITS##_hfn(env, v1, v2, v3, simd_data(desc), vpks##_BITS##e);        \
+}
+DEF_VPKS(64, 32)
+DEF_VPKS(32, 16)
+DEF_VPKS(16, 8)
+
+#define DEF_VPKLS(_BITS, _TBITS)                                               \
+static uint##_TBITS##_t vpkls##_BITS##e(uint##_BITS##_t src, int *saturated)   \
+{                                                                              \
+    if (src > UINT##_TBITS##_MAX) {                                            \
+        (*saturated)++;                                                        \
+        return UINT##_TBITS##_MAX;                                             \
+    }                                                                          \
+    return src;                                                                \
+}                                                                              \
+void HELPER(gvec_vpkls##_BITS)(void *v1, const void *v2, const void *v3,       \
+                               CPUS390XState *env, uint32_t desc)              \
+{                                                                              \
+    vpk##_BITS##_hfn(env, v1, v2, v3, simd_data(desc), vpkls##_BITS##e);       \
+}
+DEF_VPKLS(64, 32)
+DEF_VPKLS(32, 16)
+DEF_VPKLS(16, 8)
-- 
2.17.2

* [Qemu-devel] [PATCH v1 22/33] s390x/tcg: Implement VECTOR PERMUTE
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (20 preceding siblings ...)
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 21/33] s390x/tcg: Implement VECTOR PACK (LOGICAL) SATURATE David Hildenbrand
@ 2019-02-26 11:39 ` David Hildenbrand
  2019-02-27 23:21   ` Richard Henderson
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 23/33] s390x/tcg: Implement VECTOR PERMUTE DOUBLEWORD IMMEDIATE David Hildenbrand
                   ` (11 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Take care of overlapping inputs and outputs by using a temporary vector.
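
The selection logic can be modeled on plain byte arrays (a sketch with
a hypothetical function name, not the QEMU helper itself): the low five
bits of each selector byte index into the 32 bytes formed by v2
followed by v3.

```c
#include <stdint.h>
#include <string.h>

static void vperm_model(uint8_t dst[16], const uint8_t v2[16],
                        const uint8_t v3[16], const uint8_t v4[16])
{
    uint8_t tmp[16];
    int i;

    for (i = 0; i < 16; i++) {
        /* only bits 3-7 of each selector byte are used */
        const uint8_t selector = v4[i] & 0x1f;

        tmp[i] = selector < 16 ? v2[selector] : v3[selector - 16];
    }
    /* write the result last, so dst may alias any of the inputs */
    memcpy(dst, tmp, 16);
}
```

Building the result in tmp first keeps the model correct even when the
destination is also used as a source or as the selector vector.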

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  1 +
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 12 ++++++++++++
 target/s390x/vec_helper.c       | 20 ++++++++++++++++++++
 4 files changed, 35 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 4ea51618a5..969b124f6a 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -132,6 +132,7 @@ DEF_HELPER_5(gvec_vpks64, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_5(gvec_vpkls16, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_5(gvec_vpkls32, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_5(gvec_vpkls64, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vperm, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index c0a011c118..b4b4be651b 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1020,6 +1020,8 @@
     F(0xe797, VPKS,    VRR_b, V,   0, 0, 0, 0, vpks, 0, IF_VEC)
 /* VECTOR PACK LOGICAL SATURATE */
     F(0xe795, VPKLS,   VRR_b, V,   0, 0, 0, 0, vpks, 0, IF_VEC)
+/* VECTOR PERMUTE */
+    F(0xe78c, VPERM,   VRR_e, V,   0, 0, 0, 0, vperm, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index d70ae3db3c..a57d4816c4 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -162,6 +162,10 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
 #define gen_gvec_3_ptr(v1, v2, v3, ptr, data, fn) \
     tcg_gen_gvec_3_ptr(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                        vec_full_reg_offset(v3), ptr, 16, 16, data, fn)
+#define gen_gvec_4_ool(v1, v2, v3, v4, data, fn) \
+    tcg_gen_gvec_4_ool(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                       vec_full_reg_offset(v3), vec_full_reg_offset(v4), \
+                       16, 16, data, fn)
 #define gen_gvec_dup_i64(es, v1, c) \
     tcg_gen_gvec_dup_i64(es, vec_full_reg_offset(v1), 16, 16, c)
 #define gen_gvec_mov(v1, v2) \
@@ -602,3 +606,11 @@ static DisasJumpType op_vpks(DisasContext *s, DisasOps *o)
     }
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vperm(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_4_ool(get_field(s->fields, v1), get_field(s->fields, v2),
+                   get_field(s->fields, v3), get_field(s->fields, v4),
+                   0, gen_helper_gvec_vperm);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_helper.c b/target/s390x/vec_helper.c
index 9974471cc8..bf8a91cdfa 100644
--- a/target/s390x/vec_helper.c
+++ b/target/s390x/vec_helper.c
@@ -183,3 +183,23 @@ void HELPER(gvec_vpkls##_BITS)(void *v1, const void *v2, const void *v3,       \
 DEF_VPKLS(64, 32)
 DEF_VPKLS(32, 16)
 DEF_VPKLS(16, 8)
+
+void HELPER(gvec_vperm)(void *v1, const void *v2, const void *v3,
+                        const void *v4, uint32_t desc)
+{
+    S390Vector tmp;
+    int i;
+
+    for (i = 0; i < 16; i++) {
+        const uint8_t selector = s390_vec_read_element8(v4, i) & 0x1f;
+        uint8_t byte;
+
+        if (selector < 16) {
+            byte = s390_vec_read_element8(v2, selector);
+        } else {
+            byte = s390_vec_read_element8(v3, selector - 16);
+        }
+        s390_vec_write_element8(&tmp, i, byte);
+    }
+    *(S390Vector *)v1 = tmp;
+}
-- 
2.17.2

* [Qemu-devel] [PATCH v1 23/33] s390x/tcg: Implement VECTOR PERMUTE DOUBLEWORD IMMEDIATE
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (21 preceding siblings ...)
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 22/33] s390x/tcg: Implement VECTOR PERMUTE David Hildenbrand
@ 2019-02-26 11:39 ` David Hildenbrand
  2019-02-27 23:26   ` Richard Henderson
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 24/33] s390x/tcg: Implement VECTOR REPLICATE David Hildenbrand
                   ` (10 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Read the whole input before modifying the destination vector.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
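Not part of the patch: a minimal sketch in plain C of the semantics the
handler below implements, assuming a hypothetical vec128 type whose
doubleword 0 is the leftmost one. Both inputs are read before anything is
written, which is what the commit message asks for when v1 aliases v2 or v3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: two doublewords, index 0 being the leftmost. */
typedef struct { uint64_t d[2]; } vec128;

/* Bit value 4 of m4 picks the doubleword of v2, bit value 1 that of v3. */
static void vpdi_model(vec128 *v1, const vec128 *v2, const vec128 *v3,
                       uint8_t m4)
{
    assert(v1 && v2 && v3);

    /* Read both inputs first: v1 may be the same register as v2 or v3. */
    const uint64_t t0 = v2->d[(m4 >> 2) & 1];
    const uint64_t t1 = v3->d[m4 & 1];

    v1->d[0] = t0;
    v1->d[1] = t1;
}
```

Writing v1->d[0] before reading v3 would corrupt the result whenever v1 and
v3 are the same register and m4 selects doubleword 0 of v3.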
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 16 ++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index b4b4be651b..eb4dea2e2d 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1022,6 +1022,8 @@
     F(0xe795, VPKLS,   VRR_b, V,   0, 0, 0, 0, vpks, 0, IF_VEC)
 /* VECTOR PERMUTE */
     F(0xe78c, VPERM,   VRR_e, V,   0, 0, 0, 0, vperm, 0, IF_VEC)
+/* VECTOR PERMUTE DOUBLEWORD IMMEDIATE */
+    F(0xe784, VPDI,    VRR_c, V,   0, 0, 0, 0, vpdi, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index a57d4816c4..e67b47f262 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -614,3 +614,19 @@ static DisasJumpType op_vperm(DisasContext *s, DisasOps *o)
                    0, gen_helper_gvec_vperm);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vpdi(DisasContext *s, DisasOps *o)
+{
+    const uint8_t i2 = extract32(get_field(s->fields, m4), 2, 1);
+    const uint8_t i3 = extract32(get_field(s->fields, m4), 0, 1);
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    read_vec_element_i64(t0, get_field(s->fields, v2), i2, MO_64);
+    read_vec_element_i64(t1, get_field(s->fields, v3), i3, MO_64);
+    write_vec_element_i64(t0, get_field(s->fields, v1), 0, MO_64);
+    write_vec_element_i64(t1, get_field(s->fields, v1), 1, MO_64);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    return DISAS_NEXT;
+}
-- 
2.17.2

* [Qemu-devel] [PATCH v1 24/33] s390x/tcg: Implement VECTOR REPLICATE
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (22 preceding siblings ...)
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 23/33] s390x/tcg: Implement VECTOR PERMUTE DOUBLEWORD IMMEDIATE David Hildenbrand
@ 2019-02-26 11:39 ` David Hildenbrand
  2019-02-27 23:29   ` Richard Henderson
  2019-02-27 23:31   ` Richard Henderson
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 25/33] s390x/tcg: Implement VECTOR REPLICATE IMMEDIATE David Hildenbrand
                   ` (9 subsequent siblings)
  33 siblings, 2 replies; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Load the element and replicate it using gvec_dup.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
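Not part of the patch: a rough plain-C model of "load the element and
replicate it", assuming a hypothetical vec128 byte array in leftmost-first
order. The return value -1 merely stands in for the specification exception.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: 16 bytes in architectural (leftmost-first) order. */
typedef struct { uint8_t b[16]; } vec128;

/* es is log2 of the element size in bytes (0..3), enr the element number. */
static int vrep_model(vec128 *v1, const vec128 *v3, unsigned enr, unsigned es)
{
    uint8_t elem[8];
    unsigned size, i;

    assert(v1 && v3);
    if (es > 3 || enr >= (16u >> es)) {
        return -1; /* stands in for the specification exception */
    }
    size = 1u << es;
    /* Copy the element out first, as v1 and v3 may be the same register. */
    memcpy(elem, &v3->b[enr * size], size);
    for (i = 0; i < 16; i += size) {
        memcpy(&v1->b[i], elem, size);
    }
    return 0;
}
```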
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index eb4dea2e2d..d2efe6bba2 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1024,6 +1024,8 @@
     F(0xe78c, VPERM,   VRR_e, V,   0, 0, 0, 0, vperm, 0, IF_VEC)
 /* VECTOR PERMUTE DOUBLEWORD IMMEDIATE */
     F(0xe784, VPDI,    VRR_c, V,   0, 0, 0, 0, vpdi, 0, IF_VEC)
+/* VECTOR REPLICATE */
+    F(0xe74d, VREP,    VRI_c, V,   0, 0, 0, 0, vrep, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index e67b47f262..c261e56c6b 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -630,3 +630,21 @@ static DisasJumpType op_vpdi(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(t1);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vrep(DisasContext *s, DisasOps *o)
+{
+    const uint8_t enr = get_field(s->fields, i2);
+    const uint8_t es = get_field(s->fields, m4);
+    TCGv_i64 tmp;
+
+    if (es > MO_64 || !valid_vec_element(enr, es)) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    tmp = tcg_temp_new_i64();
+    read_vec_element_i64(tmp, get_field(s->fields, v3), enr, es);
+    gen_gvec_dup_i64(es, get_field(s->fields, v1), tmp);
+    tcg_temp_free_i64(tmp);
+    return DISAS_NEXT;
+}
-- 
2.17.2

* [Qemu-devel] [PATCH v1 25/33] s390x/tcg: Implement VECTOR REPLICATE IMMEDIATE
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (23 preceding siblings ...)
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 24/33] s390x/tcg: Implement VECTOR REPLICATE David Hildenbrand
@ 2019-02-26 11:39 ` David Hildenbrand
  2019-02-27 23:39   ` Richard Henderson
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 26/33] s390x/tcg: Implement VECTOR SCATTER ELEMENT David Hildenbrand
                   ` (8 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Like VECTOR REPLICATE, but the element to be replicated comes from an
immediate.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
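Not part of the patch: a small sketch of the value that ends up in every
element, assuming the immediate is sign-extended from 16 bits and then
truncated to the element size. vrepi_element is a hypothetical name used
only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Value stored into every element of size (1 << es) bytes. */
static uint64_t vrepi_element(uint16_t i2, unsigned es)
{
    /* Sign-extend the 16-bit immediate to 64 bits first. */
    const uint64_t data = (uint64_t)(int64_t)(int16_t)i2;

    assert(es <= 3);
    /* Keep only the low bits that fit into one element. */
    return es == 3 ? data : data & ((UINT64_C(1) << (8u << es)) - 1);
}
```

For byte elements only the low 8 bits of the immediate survive, so the sign
extension matters for word and doubleword elements only.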
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index d2efe6bba2..9aa508547b 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1026,6 +1026,8 @@
     F(0xe784, VPDI,    VRR_c, V,   0, 0, 0, 0, vpdi, 0, IF_VEC)
 /* VECTOR REPLICATE */
     F(0xe74d, VREP,    VRI_c, V,   0, 0, 0, 0, vrep, 0, IF_VEC)
+/* VECTOR REPLICATE IMMEDIATE */
+    F(0xe745, VREPI,   VRI_a, V,   0, 0, 0, 0, vrepi, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index c261e56c6b..761f3dc723 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -648,3 +648,21 @@ static DisasJumpType op_vrep(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(tmp);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vrepi(DisasContext *s, DisasOps *o)
+{
+    const int64_t data = (int16_t)get_field(s->fields, i2);
+    const uint8_t es = get_field(s->fields, m3);
+    TCGv_i64 tmp;
+
+    if (es > MO_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    tmp = tcg_temp_new_i64();
+    tcg_gen_movi_i64(tmp, data);
+    gen_gvec_dup_i64(es, get_field(s->fields, v1), tmp);
+    tcg_temp_free_i64(tmp);
+    return DISAS_NEXT;
+}
-- 
2.17.2

* [Qemu-devel] [PATCH v1 26/33] s390x/tcg: Implement VECTOR SCATTER ELEMENT
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (24 preceding siblings ...)
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 25/33] s390x/tcg: Implement VECTOR REPLICATE IMMEDIATE David Hildenbrand
@ 2019-02-26 11:39 ` David Hildenbrand
  2019-02-27 23:40   ` Richard Henderson
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 27/33] s390x/tcg: Implement VECTOR SELECT David Hildenbrand
                   ` (7 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Similar to VECTOR GATHER ELEMENT, but the other direction.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
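Not part of the patch: a simplified model of the word variant (VSCEF),
assuming a hypothetical flat memory buffer instead of guest memory and
ignoring address wrapping. Element enr of v2 supplies the zero-extended
index, element enr of v1 the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat memory; real guest accesses go through the MMU. */
static int vscef_model(uint8_t *mem, size_t mem_size, uint64_t base,
                       const uint32_t v1[4], const uint32_t v2[4],
                       unsigned enr)
{
    uint64_t addr;
    unsigned i;

    assert(mem && v1 && v2);
    if (enr > 3) {
        return -1; /* stands in for the specification exception */
    }
    /* The zero-extended index element is added to the base address. */
    addr = base + v2[enr];
    if (addr > mem_size || mem_size - addr < 4) {
        return -2; /* out of bounds in this toy memory model */
    }
    /* Store the data element in big-endian byte order. */
    for (i = 0; i < 4; i++) {
        mem[addr + i] = (uint8_t)(v1[enr] >> (24 - 8 * i));
    }
    return 0;
}
```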
 target/s390x/insn-data.def      |  3 +++
 target/s390x/translate_vx.inc.c | 22 ++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 9aa508547b..4159ec36f9 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1028,6 +1028,9 @@
     F(0xe74d, VREP,    VRI_c, V,   0, 0, 0, 0, vrep, 0, IF_VEC)
 /* VECTOR REPLICATE IMMEDIATE */
     F(0xe745, VREPI,   VRI_a, V,   0, 0, 0, 0, vrepi, 0, IF_VEC)
+/* VECTOR SCATTER ELEMENT */
+    E(0xe71b, VSCEF,   VRV,   V,   la2, 0, 0, 0, vsce, 0, MO_32, IF_VEC)
+    E(0xe71a, VSCEG,   VRV,   V,   la2, 0, 0, 0, vsce, 0, MO_64, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 761f3dc723..344ac36f93 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -666,3 +666,25 @@ static DisasJumpType op_vrepi(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(tmp);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vsce(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = s->insn->data;
+    const uint8_t enr = get_field(s->fields, m3);
+    TCGv_i64 tmp;
+
+    if (!valid_vec_element(enr, es)) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    tmp = tcg_temp_new_i64();
+    read_vec_element_i64(tmp, get_field(s->fields, v2), enr, es);
+    tcg_gen_add_i64(o->addr1, o->addr1, tmp);
+    gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 0);
+
+    read_vec_element_i64(tmp, get_field(s->fields, v1), enr, es);
+    tcg_gen_qemu_st_i64(tmp, o->addr1, get_mem_index(s), MO_TE | es);
+    tcg_temp_free_i64(tmp);
+    return DISAS_NEXT;
+}
-- 
2.17.2

* [Qemu-devel] [PATCH v1 27/33] s390x/tcg: Implement VECTOR SELECT
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (25 preceding siblings ...)
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 26/33] s390x/tcg: Implement VECTOR SCATTER ELEMENT David Hildenbrand
@ 2019-02-26 11:39 ` David Hildenbrand
  2019-02-27 23:42   ` Richard Henderson
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 28/33] s390x/tcg: Implement VECTOR SIGN EXTEND TO DOUBLEWORD David Hildenbrand
                   ` (6 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Provide an implementation based on i64 and on real host vectors.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
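Not part of the patch: the bitwise select on a single 64-bit lane, written
as plain C. sel64 mirrors the not/and/and/or sequence used below; sel64_xor
is an equivalent formulation shown only as a cross-check and is not claimed
to be what the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* For each bit: take it from a where c is 1, from b where c is 0. */
static uint64_t sel64(uint64_t a, uint64_t b, uint64_t c)
{
    return (a & c) | (b & ~c);
}

/* Equivalent form with one operation fewer, handy as a cross-check. */
static uint64_t sel64_xor(uint64_t a, uint64_t b, uint64_t c)
{
    return b ^ ((a ^ b) & c);
}
```

Because the operation is purely bitwise, the element size does not matter,
which is why both an i64 and a host-vector expansion can share the logic.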
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 43 +++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 4159ec36f9..a8d43b588c 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1031,6 +1031,8 @@
 /* VECTOR SCATTER ELEMENT */
     E(0xe71b, VSCEF,   VRV,   V,   la2, 0, 0, 0, vsce, 0, MO_32, IF_VEC)
     E(0xe71a, VSCEG,   VRV,   V,   la2, 0, 0, 0, vsce, 0, MO_64, IF_VEC)
+/* VECTOR SELECT */
+    F(0xe78d, VSEL,    VRR_e, V,   0, 0, 0, 0, vsel, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 344ac36f93..d3463c9ef3 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -162,6 +162,10 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
 #define gen_gvec_3_ptr(v1, v2, v3, ptr, data, fn) \
     tcg_gen_gvec_3_ptr(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                        vec_full_reg_offset(v3), ptr, 16, 16, data, fn)
+#define gen_gvec_4(v1, v2, v3, v4, gen) \
+    tcg_gen_gvec_4(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                   vec_full_reg_offset(v3), vec_full_reg_offset(v4), \
+                   16, 16, gen)
 #define gen_gvec_4_ool(v1, v2, v3, v4, data, fn) \
     tcg_gen_gvec_4_ool(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                        vec_full_reg_offset(v3), vec_full_reg_offset(v4), \
@@ -688,3 +692,42 @@ static DisasJumpType op_vsce(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(tmp);
     return DISAS_NEXT;
 }
+
+static void gen_sel_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 c)
+{
+    TCGv_i64 t = tcg_temp_new_i64();
+
+    /* bit in c not set -> copy bit from b */
+    tcg_gen_not_i64(t, c);
+    tcg_gen_and_i64(t, b, t);
+    /* bit in c set -> copy bit from a */
+    tcg_gen_and_i64(d, a, c);
+    /* merge the results */
+    tcg_gen_or_i64(d, d, t);
+    tcg_temp_free_i64(t);
+}
+
+static void gen_sel_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b,
+                        TCGv_vec c)
+{
+    TCGv_vec t = tcg_temp_new_vec_matching(d);
+
+    tcg_gen_not_vec(vece, t, c);
+    tcg_gen_and_vec(vece, t, t, b);
+    tcg_gen_and_vec(vece, d, a, c);
+    tcg_gen_or_vec(vece, d, d, t);
+    tcg_temp_free_vec(t);
+}
+
+static DisasJumpType op_vsel(DisasContext *s, DisasOps *o)
+{
+    static const GVecGen4 gvec_op = {
+        .fni8 = gen_sel_i64,
+        .fniv = gen_sel_vec,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+    };
+
+    gen_gvec_4(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), get_field(s->fields, v4), &gvec_op);
+    return DISAS_NEXT;
+}
-- 
2.17.2

* [Qemu-devel] [PATCH v1 28/33] s390x/tcg: Implement VECTOR SIGN EXTEND TO DOUBLEWORD
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (26 preceding siblings ...)
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 27/33] s390x/tcg: Implement VECTOR SELECT David Hildenbrand
@ 2019-02-26 11:39 ` David Hildenbrand
  2019-02-27 23:43   ` Richard Henderson
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 29/33] s390x/tcg: Implement VECTOR STORE David Hildenbrand
                   ` (5 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Load both elements signed and store them into the two 64-bit elements.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
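Not part of the patch: a plain-C sketch of the operation, assuming a
hypothetical vec128 made of two doublewords. The source element is the
rightmost one within each doubleword, which is why the handler below uses
indices 7/15, 3/7 and 1/3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: two doublewords, index 0 being the leftmost. */
typedef struct { uint64_t d[2]; } vec128;

/* es is log2 of the source element size in bytes (0..2). */
static int vseg_model(vec128 *v1, const vec128 *v2, unsigned es)
{
    unsigned bits, i;

    assert(v1 && v2);
    if (es > 2) {
        return -1; /* stands in for the specification exception */
    }
    bits = 8u << es;
    for (i = 0; i < 2; i++) {
        /* The rightmost element of doubleword i is its low 'bits' bits. */
        const uint64_t low = v2->d[i] & ((UINT64_C(1) << bits) - 1);
        const uint64_t sign = UINT64_C(1) << (bits - 1);

        /* Sign-extend using unsigned arithmetic only. */
        v1->d[i] = (low ^ sign) - sign;
    }
    return 0;
}
```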
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index a8d43b588c..ab3309c54b 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1033,6 +1033,8 @@
     E(0xe71a, VSCEG,   VRV,   V,   la2, 0, 0, 0, vsce, 0, MO_64, IF_VEC)
 /* VECTOR SELECT */
     F(0xe78d, VSEL,    VRR_e, V,   0, 0, 0, 0, vsel, 0, IF_VEC)
+/* VECTOR SIGN EXTEND TO DOUBLEWORD */
+    F(0xe75f, VSEG,    VRR_a, V,   0, 0, 0, 0, vseg, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index d3463c9ef3..23cdae0970 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -731,3 +731,36 @@ static DisasJumpType op_vsel(DisasContext *s, DisasOps *o)
                get_field(s->fields, v3), get_field(s->fields, v4), &gvec_op);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vseg(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m3);
+    int idx1, idx2;
+    TCGv_i64 tmp;
+
+    switch (es) {
+    case MO_8:
+        idx1 = 7;
+        idx2 = 15;
+        break;
+    case MO_16:
+        idx1 = 3;
+        idx2 = 7;
+        break;
+    case MO_32:
+        idx1 = 1;
+        idx2 = 3;
+        break;
+    default:
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    tmp = tcg_temp_new_i64();
+    read_vec_element_i64(tmp, get_field(s->fields, v2), idx1, es | MO_SIGN);
+    write_vec_element_i64(tmp, get_field(s->fields, v1), 0, MO_64);
+    read_vec_element_i64(tmp, get_field(s->fields, v2), idx2, es | MO_SIGN);
+    write_vec_element_i64(tmp, get_field(s->fields, v1), 1, MO_64);
+    tcg_temp_free_i64(tmp);
+    return DISAS_NEXT;
+}
-- 
2.17.2

* [Qemu-devel] [PATCH v1 29/33] s390x/tcg: Implement VECTOR STORE
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (27 preceding siblings ...)
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 28/33] s390x/tcg: Implement VECTOR SIGN EXTEND TO DOUBLEWORD David Hildenbrand
@ 2019-02-26 11:39 ` David Hildenbrand
  2019-02-27 23:46   ` Richard Henderson
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 30/33] s390x/tcg: Implement VECTOR STORE ELEMENT David Hildenbrand
                   ` (4 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Add a FIXME regarding exceptions during the second store.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 22 ++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index ab3309c54b..2b18f4ab54 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1035,6 +1035,8 @@
     F(0xe78d, VSEL,    VRR_e, V,   0, 0, 0, 0, vsel, 0, IF_VEC)
 /* VECTOR SIGN EXTEND TO DOUBLEWORD */
     F(0xe75f, VSEG,    VRR_a, V,   0, 0, 0, 0, vseg, 0, IF_VEC)
+/* VECTOR STORE */
+    F(0xe70e, VST,     VRX,   V,   la2, 0, 0, 0, vst, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 23cdae0970..69b12e79a1 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -137,6 +137,17 @@ static void load_vec_element(DisasContext *s, uint8_t reg, uint8_t enr,
     tcg_temp_free_i64(tmp);
 }
 
+static void store_vec_element(DisasContext *s, uint8_t reg, uint8_t enr,
+                              TCGv_i64 addr, TCGMemOp es)
+{
+    TCGv_i64 tmp = tcg_temp_new_i64();
+
+    read_vec_element_i64(tmp, reg, enr, es);
+    tcg_gen_qemu_st_i64(tmp, addr, get_mem_index(s), MO_TE | es);
+
+    tcg_temp_free_i64(tmp);
+}
+
 static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
                                     uint8_t es)
 {
@@ -764,3 +775,14 @@ static DisasJumpType op_vseg(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(tmp);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vst(DisasContext *s, DisasOps *o)
+{
+    /*
+     * FIXME: On exceptions we must not modify any memory.
+     */
+    store_vec_element(s, get_field(s->fields, v1), 0, o->addr1, MO_64);
+    gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
+    store_vec_element(s, get_field(s->fields, v1), 1, o->addr1, MO_64);
+    return DISAS_NEXT;
+}
-- 
2.17.2

* [Qemu-devel] [PATCH v1 30/33] s390x/tcg: Implement VECTOR STORE ELEMENT
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (28 preceding siblings ...)
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 29/33] s390x/tcg: Implement VECTOR STORE David Hildenbrand
@ 2019-02-26 11:39 ` David Hildenbrand
  2019-02-27 23:47   ` Richard Henderson
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 31/33] s390x/tcg: Implement VECTOR STORE MULTIPLE David Hildenbrand
                   ` (3 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

As we only store one element, there is nothing to consider regarding
exceptions.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  5 +++++
 target/s390x/translate_vx.inc.c | 18 ++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 2b18f4ab54..bf9786120b 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1037,6 +1037,11 @@
     F(0xe75f, VSEG,    VRR_a, V,   0, 0, 0, 0, vseg, 0, IF_VEC)
 /* VECTOR STORE */
     F(0xe70e, VST,     VRX,   V,   la2, 0, 0, 0, vst, 0, IF_VEC)
+/* VECTOR STORE ELEMENT */
+    E(0xe708, VSTEB,   VRX,   V,   la2, 0, 0, 0, vste, 0, MO_8, IF_VEC)
+    E(0xe709, VSTEH,   VRX,   V,   la2, 0, 0, 0, vste, 0, MO_16, IF_VEC)
+    E(0xe70b, VSTEF,   VRX,   V,   la2, 0, 0, 0, vste, 0, MO_32, IF_VEC)
+    E(0xe70a, VSTEG,   VRX,   V,   la2, 0, 0, 0, vste, 0, MO_64, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 69b12e79a1..9ec135d1a9 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -786,3 +786,21 @@ static DisasJumpType op_vst(DisasContext *s, DisasOps *o)
     store_vec_element(s, get_field(s->fields, v1), 1, o->addr1, MO_64);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vste(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = s->insn->data;
+    const uint8_t enr = get_field(s->fields, m3);
+    TCGv_i64 tmp;
+
+    if (!valid_vec_element(enr, es)) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    tmp = tcg_temp_new_i64();
+    read_vec_element_i64(tmp, get_field(s->fields, v1), enr, es);
+    tcg_gen_qemu_st_i64(tmp, o->addr1, get_mem_index(s), MO_TE | es);
+    tcg_temp_free_i64(tmp);
+    return DISAS_NEXT;
+}
-- 
2.17.2

* [Qemu-devel] [PATCH v1 31/33] s390x/tcg: Implement VECTOR STORE MULTIPLE
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (29 preceding siblings ...)
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 30/33] s390x/tcg: Implement VECTOR STORE ELEMENT David Hildenbrand
@ 2019-02-26 11:39 ` David Hildenbrand
  2019-02-27 23:48   ` Richard Henderson
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 32/33] s390x/tcg: Implement VECTOR STORE WITH LENGTH David Hildenbrand
                   ` (2 subsequent siblings)
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Similar to VECTOR LOAD MULTIPLE, just the opposite direction.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
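Not part of the patch: a tiny sketch of the register range check and the
resulting store size, with -1 standing in for the specification exception.
vstm_length is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes VSTM would store, or -1 for an invalid register range. */
static int vstm_length(unsigned v1, unsigned v3)
{
    assert(v1 < 32 && v3 < 32);
    if (v3 < v1 || (v3 - v1 + 1) > 16) {
        return -1; /* stands in for the specification exception */
    }
    /* Each vector register contributes 16 bytes. */
    return (int)(v3 - v1 + 1) * 16;
}
```

At most 16 registers, i.e. 256 bytes, can be stored by a single instruction.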
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 25 +++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index bf9786120b..60e4895f60 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1042,6 +1042,8 @@
     E(0xe709, VSTEH,   VRX,   V,   la2, 0, 0, 0, vste, 0, MO_16, IF_VEC)
     E(0xe70b, VSTEF,   VRX,   V,   la2, 0, 0, 0, vste, 0, MO_32, IF_VEC)
     E(0xe70a, VSTEG,   VRX,   V,   la2, 0, 0, 0, vste, 0, MO_64, IF_VEC)
+/* VECTOR STORE MULTIPLE */
+    F(0xe73e, VSTM,    VRS_a, V,   la2, 0, 0, 0, vstm, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 9ec135d1a9..7e7c96c974 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -804,3 +804,28 @@ static DisasJumpType op_vste(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(tmp);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vstm(DisasContext *s, DisasOps *o)
+{
+    const uint8_t v3 = get_field(s->fields, v3);
+    uint8_t v1 = get_field(s->fields, v1);
+
+    if (v3 < v1 || (v3 - v1 + 1) > 16) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    /*
+     * FIXME: On exceptions we must not modify any memory.
+     */
+    for (;; v1++) {
+        store_vec_element(s, v1, 0, o->addr1, MO_64);
+        gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
+        store_vec_element(s, v1, 1, o->addr1, MO_64);
+        if (v1 == v3) {
+            break;
+        }
+        gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
+    }
+    return DISAS_NEXT;
+}
-- 
2.17.2

* [Qemu-devel] [PATCH v1 32/33] s390x/tcg: Implement VECTOR STORE WITH LENGTH
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (30 preceding siblings ...)
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 31/33] s390x/tcg: Implement VECTOR STORE MULTIPLE David Hildenbrand
@ 2019-02-26 11:39 ` David Hildenbrand
  2019-02-27 23:49   ` Richard Henderson
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 33/33] s390x/tcg: Implement VECTOR UNPACK * David Hildenbrand
  2019-02-28  7:24 ` [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Very similar to VECTOR LOAD WITH LENGTH, just the opposite direction.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
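Not part of the patch: a plain-C model of the length handling, assuming a
hypothetical destination buffer in place of guest memory. The register
operand holds the highest byte index, so it is incremented to get a length
and then capped at the vector size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* dst stands in for guest memory at the second-operand address. */
static size_t vstl_model(uint8_t *dst, const uint8_t v1[16],
                         uint32_t highest_index)
{
    /* Convert the highest byte index into a length; 64 bit avoids overflow. */
    uint64_t bytes = (uint64_t)highest_index + 1;
    size_t i;

    assert(dst && v1);
    if (bytes > 16) {
        bytes = 16;
    }
    /* Store the leftmost 'bytes' bytes of the vector. */
    for (i = 0; i < bytes; i++) {
        dst[i] = v1[i];
    }
    return (size_t)bytes;
}
```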
 target/s390x/helper.h           |  1 +
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 13 +++++++++++++
 target/s390x/vec_helper.c       | 15 +++++++++++++++
 4 files changed, 31 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 969b124f6a..df449f4c53 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -133,6 +133,7 @@ DEF_HELPER_5(gvec_vpkls16, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_5(gvec_vpkls32, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_5(gvec_vpkls64, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vperm, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(vstl, TCG_CALL_NO_WG, void, env, cptr, i64, i64)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 60e4895f60..5d4d2ecc7e 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1044,6 +1044,8 @@
     E(0xe70a, VSTEG,   VRX,   V,   la2, 0, 0, 0, vste, 0, MO_64, IF_VEC)
 /* VECTOR STORE MULTIPLE */
     F(0xe73e, VSTM,    VRS_a, V,   la2, 0, 0, 0, vstm, 0, IF_VEC)
+/* VECTOR STORE WITH LENGTH */
+    F(0xe73f, VSTL,    VRS_b, V,   la2, r3_32u, 0, 0, vstl, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 7e7c96c974..d87f5bafcf 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -829,3 +829,16 @@ static DisasJumpType op_vstm(DisasContext *s, DisasOps *o)
     }
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vstl(DisasContext *s, DisasOps *o)
+{
+    const int v1_offs = vec_full_reg_offset(get_field(s->fields, v1));
+    TCGv_ptr a0 = tcg_temp_new_ptr();
+
+    /* convert highest index into an actual length */
+    tcg_gen_addi_i64(o->in2, o->in2, 1);
+    tcg_gen_addi_ptr(a0, cpu_env, v1_offs);
+    gen_helper_vstl(cpu_env, a0, o->addr1, o->in2);
+    tcg_temp_free_ptr(a0);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_helper.c b/target/s390x/vec_helper.c
index bf8a91cdfa..eddc925101 100644
--- a/target/s390x/vec_helper.c
+++ b/target/s390x/vec_helper.c
@@ -203,3 +203,18 @@ void HELPER(gvec_vperm)(void *v1, const void *v2, const void *v3,
     }
     *(S390Vector *)v1 = tmp;
 }
+
+void HELPER(vstl)(CPUS390XState *env, const void *v1, uint64_t addr,
+                  uint64_t bytes)
+{
+    int i;
+
+    /* FIXME: On exceptions we must not modify any memory. */
+    bytes = MIN(bytes, 16);
+    for (i = 0; i < bytes; i++) {
+        const uint8_t byte = s390_vec_read_element8(v1, i);
+
+        cpu_stb_data_ra(env, addr, byte, GETPC());
+        addr = wrap_address(env, addr + 1);
+    }
+}
-- 
2.17.2

* [Qemu-devel] [PATCH v1 33/33] s390x/tcg: Implement VECTOR UNPACK *
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (31 preceding siblings ...)
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 32/33] s390x/tcg: Implement VECTOR STORE WITH LENGTH David Hildenbrand
@ 2019-02-26 11:39 ` David Hildenbrand
  2019-02-28  0:03   ` Richard Henderson
  2019-02-28  7:24 ` [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
  33 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 11:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Combine all variants in a single handler. As source and destination
have different element sizes, we can't use gvec expansion. Expand
manually. Also watch out for overlapping source and destination and
use a temporary register in that case.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
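Not part of the patch: a plain-C sketch of the manual expansion, assuming a
hypothetical vec128 byte array in leftmost-first order. The local tmp plays
the role of the temporary register; this sketch always goes through it,
whereas the handler below only does so when source and destination overlap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 16 bytes in architectural (leftmost-first) order. */
typedef struct { uint8_t b[16]; } vec128;

/* Read big-endian element idx of size (1 << es) bytes, zero-extended. */
static uint64_t read_elem(const vec128 *v, unsigned idx, unsigned es)
{
    const unsigned size = 1u << es;
    uint64_t val = 0;
    unsigned i;

    for (i = 0; i < size; i++) {
        val = (val << 8) | v->b[idx * size + i];
    }
    return val;
}

/* Write the low (1 << es) bytes of val as big-endian element idx. */
static void write_elem(vec128 *v, unsigned idx, unsigned es, uint64_t val)
{
    const unsigned size = 1u << es;
    unsigned i;

    for (i = 0; i < size; i++) {
        v->b[idx * size + i] = (uint8_t)(val >> (8 * (size - 1 - i)));
    }
}

static int vup_model(vec128 *v1, const vec128 *v2, unsigned src_es,
                     int high, int logical)
{
    const unsigned dst_es = src_es + 1;
    vec128 tmp;
    unsigned dst_idx;

    assert(v1 && v2);
    if (src_es > 2) {
        return -1; /* stands in for the specification exception */
    }
    for (dst_idx = 0; dst_idx < (16u >> dst_es); dst_idx++) {
        /* The "low" variants start at the middle of the source vector. */
        const unsigned src_idx = dst_idx + (high ? 0 : (16u >> src_es) / 2);
        uint64_t val = read_elem(v2, src_idx, src_es);

        if (!logical) {
            const uint64_t sign = UINT64_C(1) << ((8u << src_es) - 1);

            val = (val ^ sign) - sign; /* sign-extend */
        }
        write_elem(&tmp, dst_idx, dst_es, val);
    }
    *v1 = tmp;
    return 0;
}
```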
 target/s390x/insn-data.def      |  8 +++++++
 target/s390x/translate_vx.inc.c | 41 +++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 5d4d2ecc7e..2c49c63c59 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1046,6 +1046,14 @@
     F(0xe73e, VSTM,    VRS_a, V,   la2, 0, 0, 0, vstm, 0, IF_VEC)
 /* VECTOR STORE WITH LENGTH */
     F(0xe73f, VSTL,    VRS_b, V,   la2, r3_32u, 0, 0, vstl, 0, IF_VEC)
+/* VECTOR UNPACK HIGH */
+    F(0xe7d7, VUPH,    VRR_a, V,   0, 0, 0, 0, vup, 0, IF_VEC)
+/* VECTOR UNPACK LOGICAL HIGH */
+    F(0xe7d5, VUPLH,   VRR_a, V,   0, 0, 0, 0, vup, 0, IF_VEC)
+/* VECTOR UNPACK LOW */
+    F(0xe7d6, VUPL,    VRR_a, V,   0, 0, 0, 0, vup, 0, IF_VEC)
+/* VECTOR UNPACK LOGICAL LOW */
+    F(0xe7d4, VUPLL,   VRR_a, V,   0, 0, 0, 0, vup, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index d87f5bafcf..fde8b06953 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -842,3 +842,44 @@ static DisasJumpType op_vstl(DisasContext *s, DisasOps *o)
     tcg_temp_free_ptr(a0);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vup(DisasContext *s, DisasOps *o)
+{
+    const bool high = s->fields->op2 == 0xd7 || s->fields->op2 == 0xd5;
+    const bool logical = s->fields->op2 == 0xd4 || s->fields->op2 == 0xd5;
+    const uint8_t v1 = get_field(s->fields, v1);
+    const uint8_t v2 = get_field(s->fields, v2);
+    const uint8_t src_es = get_field(s->fields, m3);
+    const uint8_t dst_es = src_es + 1;
+    uint8_t dst_v = v1;
+    int dst_idx, src_idx;
+    TCGv_i64 tmp;
+
+    if (src_es > MO_32) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    /* Source and destination overlap -> use a temporary register */
+    if (v1 == v2) {
+        dst_v = TMP_VREG_0;
+    }
+
+    tmp = tcg_temp_new_i64();
+    for (dst_idx = 0; dst_idx < NUM_VEC_ELEMENTS(dst_es); dst_idx++) {
+        src_idx = dst_idx;
+        if (!high) {
+            src_idx += NUM_VEC_ELEMENTS(src_es) / 2;
+        }
+        read_vec_element_i64(tmp, v2, src_idx,
+                             src_es | (logical ? 0 : MO_SIGN));
+        write_vec_element_i64(tmp, dst_v, dst_idx, dst_es);
+    }
+    tcg_temp_free_i64(tmp);
+
+    /* move the temporary to the destination */
+    if (dst_v != v1) {
+        gen_gvec_mov(v1, dst_v);
+    }
+    return DISAS_NEXT;
+}
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 01/33] s390x/tcg: Define vector instruction formats
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 01/33] s390x/tcg: Define vector instruction formats David Hildenbrand
@ 2019-02-26 18:24   ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-26 18:24 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:38 AM, David Hildenbrand wrote:
> These are the new instruction formats related to vector instructions as
> up to the z14 (a.k.a. latest PoP).
> 
> As v2 appears (like x2 in VRX) with d2/b2 in VRV, we have to assign it a
> higher field number to avoid collisions.
> 
> Properly take care of the MSB (to be able to address 32 registers) for
> each vector register field stored in the RXB field (bits 36-39 for all
> vector instructions). As we have 32 vector registers and the
> "v" fields are only 4 bits in size, the 5th bit is stored in the RXB.
> We use a new type to indicate that the MSB has to be fetched from the
> RXB.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-format.def | 25 +++++++++++++++++++++++
>  target/s390x/translate.c     | 39 +++++++++++++++++++++++++++++++++++-
>  2 files changed, 63 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
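
For readers following along, a minimal sketch of the register-number
decoding described in the commit message, assuming the usual layout in
which the leftmost RXB bit extends the first vector register field, the
next bit the second field, and so on. vreg_number() and its parameters
are hypothetical names, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Combine a 4-bit vector register field with its extension bit from RXB.
 * 'rxb' is the 4-bit RXB field; 'pos' says which of the (up to) four
 * vector register fields is meant, with 0 being the first one.
 * Assumption: RXB bit 0 (leftmost, value 8) belongs to field 0.
 */
static uint8_t vreg_number(uint8_t field4, uint8_t rxb, int pos)
{
    assert(field4 < 16 && rxb < 16 && pos >= 0 && pos < 4);
    const uint8_t msb = (rxb >> (3 - pos)) & 1;

    return (uint8_t)((msb << 4) | field4);
}
```

For example, a first field of 3 with RXB = 8 would select register 19,
while the same field value in the second position would select register 3.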


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 02/33] s390x/tcg: Check vector register instructions at central point
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 02/33] s390x/tcg: Check vector register instructions at central point David Hildenbrand
@ 2019-02-26 18:26   ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-26 18:26 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:38 AM, David Hildenbrand wrote:
> Check them at a central point. We'll use a new instruction flag to
> flag all vector instructions (IF_VEC) and handle it very similar to
> AFP, whereby we use another unused position in the PSW mask to store
> the state of vector register enablement per translation block.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/cpu.h       |  7 +++++++
>  target/s390x/translate.c | 12 ++++++++++++
>  2 files changed, 19 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 03/33] s390x: Add one temporary vector register in CPU state for TCG
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 03/33] s390x: Add one temporary vector register in CPU state for TCG David Hildenbrand
@ 2019-02-26 18:36   ` Richard Henderson
  2019-02-26 18:45     ` David Hildenbrand
  0 siblings, 1 reply; 94+ messages in thread
From: Richard Henderson @ 2019-02-26 18:36 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:38 AM, David Hildenbrand wrote:
> We sometimes want to work on a temporary vector register instead of the
> actual destination, because source and destination might overlap. An
> alternative would be loading the vector into two i64 variables, but then
> separate handling for accessing the vector elements would be needed.
> This is easier. Add one for now as that seems to be enough.

Hmm, I'll reserve judgment until I see how this is used.

For ARM SVE, I would allocate this temporary on the stack within the helper,
and move one of the operands out of the way.  E.g.

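void HELPER(foo)(void *vd, void *vx, void *vy)
{
    VectorReg tmp;
    TYPE *d, *x, *y;

    if (vx == vd || vy == vd) {
        /* An input aliases the destination: read it from a copy. */
        tmp = *(VectorReg *)vd;
        if (vx == vd) {
            vx = &tmp;
        }
        if (vy == vd) {
            vy = &tmp;
        }
    }
    /* Take the typed pointers only after the inputs were redirected. */
    d = vd, x = vx, y = vy;

    /* process d, x, y as normal */
}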

This minimized the amount of code inline.  However, SVE vectors are quite a bit
larger, at 256 bytes, so the copy itself was out of line most of the time anyway.

Provisionally,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 05/33] s390x/tcg: Implement VECTOR GATHER ELEMENT
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 05/33] s390x/tcg: Implement VECTOR GATHER ELEMENT David Hildenbrand
@ 2019-02-26 18:44   ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-26 18:44 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:38 AM, David Hildenbrand wrote:
> Let's start with a more involved one, but it is the first in the list
> of vector support instructions (introduced with the vector facility).
> 
> Good thing is, we need a lot of basic infrastructure for this. Reading
> and writing vector elements, checking element validity as well as loading
> vector elements from memory. Storing will be added later, once needed.
> 
> All vector instruction related translation functions will reside in
> translate_vx.inc.c, to be included in translate.c - similar to how
> other architectures handle it.
> 
> While at it, directly add some documentation (which contains parts about
> things added in follow-up patches, but splitting this up does not make
> too much sense).
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 03/33] s390x: Add one temporary vector register in CPU state for TCG
  2019-02-26 18:36   ` Richard Henderson
@ 2019-02-26 18:45     ` David Hildenbrand
  0 siblings, 0 replies; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 18:45 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 26.02.19 19:36, Richard Henderson wrote:
> On 2/26/19 3:38 AM, David Hildenbrand wrote:
>> We sometimes want to work on a temporary vector register instead of the
>> actual destination, because source and destination might overlap. An
>> alternative would be loading the vector into two i64 variables, but then
>> separate handling for accessing the vector elements would be needed.
>> This is easier. Add one for now as that seems to be enough.
> 
> Hmm, I'll reserve judgment until I see how this is used.
> 
> For ARM SVE, I would allocate this temporary on the stack within the helper,
> and move one of the operands out of the way.  E.g.

Yes, I do the same for helpers. This, however, is for TCG-translated code :)

E.g. see

[PATCH v1 08/33] s390x/tcg: Implement VECTOR LOAD
[PATCH v1 19/33] s390x/tcg: Implement VECTOR MERGE (HIGH|LOW)
[PATCH v1 33/33] s390x/tcg: Implement VECTOR UNPACK *


> 
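> void HELPER(foo)(void *vd, void *vx, void *vy)
> {
>     VectorReg tmp;
>     TYPE *d, *x, *y;
> 
>     if (vx == vd || vy == vd) {
>         /* An input aliases the destination: read it from a copy. */
>         tmp = *(VectorReg *)vd;
>         if (vx == vd) {
>             vx = &tmp;
>         }
>         if (vy == vd) {
>             vy = &tmp;
>         }
>     }
>     /* Take the typed pointers only after the inputs were redirected. */
>     d = vd, x = vx, y = vy;
> 
>     /* process d, x, y as normal */
> }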
> 
> This minimized the amount of code inline.  However, SVE vectors are quite a bit
> larger, at 256 bytes, so the copy itself was out of line most of the time anyway.
> 
> Provisionally,
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 06/33] s390x/tcg: Implement VECTOR GENERATE BYTE MASK
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 06/33] s390x/tcg: Implement VECTOR GENERATE BYTE MASK David Hildenbrand
@ 2019-02-26 19:12   ` Richard Henderson
  2019-02-26 19:23     ` David Hildenbrand
  0 siblings, 1 reply; 94+ messages in thread
From: Richard Henderson @ 2019-02-26 19:12 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:38 AM, David Hildenbrand wrote:
> +static DisasJumpType op_vgbm(DisasContext *s, DisasOps *o)
> +{
> +    const uint16_t i2 = get_field(s->fields, i2);
> +    TCGv_i32 ones = tcg_const_i32(-1u);
> +    TCGv_i32 zeroes = tcg_const_i32(0);
> +    int i;
> +
> +    for (i = 0; i < 16; i++) {
> +        if (extract32(i2, 15 - i, 1)) {
> +            write_vec_element_i32(ones, get_field(s->fields, v1), i, MO_8);
> +        } else {
> +            write_vec_element_i32(zeroes, get_field(s->fields, v1), i, MO_8);
> +        }
> +    }
> +    tcg_temp_free_i32(ones);
> +    tcg_temp_free_i32(zeroes);
> +    return DISAS_NEXT;
> +}

While this works, it's not in the spirit of

> Programming Note: VECTOR GENERATE BYTE
> MASK is the preferred method for setting a vector
> register to all zeroes or ones.

Better, I think, with

uint64_t generate_byte_mask(uint8_t mask)
{
    uint64_t r = 0;
    int i;
    for (i = 0; i < 8; i++) {
        if ((mask >> i) & 1) {
            r |= 0xffull << (i * 8);
        }
    }
    return r;
}

    if (i2 == (i2 & 0xff) * 0x0101) {
        /* masks for both halves of the vector are the same.
           trust tcg to produce a good constant loading.  */
        tcg_gen_gvec_dup64i(vec_full_reg_offset(s, v1), 16, 16,
                            generate_byte_mask(i2 & 0xff));
    } else {
        TCGv_i64 t = tcg_temp_new_i64();
        tcg_gen_movi_i64(t, generate_byte_mask(i2 >> 8));
        write_vec_element_i64(t, v1, 0, MO_64);
        tcg_gen_movi_i64(t, generate_byte_mask(i2 & 0xff));
        write_vec_element_i64(t, v1, 1, MO_64);
        tcg_temp_free_i64(t);
    }

Somewhere behind tcg_gen_gvec_dup64i, I check to see if the constant can be
decomposed further, which will eventually bottom out at

	vpxor	%xmm0,%xmm0,%xmm0		// all zeros
	vpcmpeq	%xmm0,%xmm0,%xmm0		// all ones

and even more interesting combinations for tcg/aarch64.
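
To make the suggestion easy to check on the host, here is a standalone
sketch of the same mask expansion. generate_byte_mask() follows the
snippet above; vgbm_halves_equal() and vgbm_model() are hypothetical
helper names added for illustration, and nothing here touches TCG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Expand each bit of an 8-bit mask into one full byte of a 64-bit value. */
static uint64_t generate_byte_mask(uint8_t mask)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        if ((mask >> i) & 1) {
            r |= 0xffull << (i * 8);
        }
    }
    return r;
}

/* True if both bytes of the immediate are equal, so one dup suffices. */
static bool vgbm_halves_equal(uint16_t i2)
{
    return i2 == (i2 & 0xff) * 0x0101;
}

/* Compute both doublewords of the result; 'hi' is element 0 (leftmost). */
static void vgbm_model(uint16_t i2, uint64_t *hi, uint64_t *lo)
{
    assert(hi && lo);
    *hi = generate_byte_mask((uint8_t)(i2 >> 8));
    *lo = generate_byte_mask((uint8_t)(i2 & 0xff));
}
```

The all-zeroes and all-ones immediates (0x0000 and 0xffff) both take the
"halves equal" path, which is the case the Programming Note cares about.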



r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 06/33] s390x/tcg: Implement VECTOR GENERATE BYTE MASK
  2019-02-26 19:12   ` Richard Henderson
@ 2019-02-26 19:23     ` David Hildenbrand
  2019-02-26 21:23       ` David Hildenbrand
  0 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 19:23 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 26.02.19 20:12, Richard Henderson wrote:
> On 2/26/19 3:38 AM, David Hildenbrand wrote:
>> +static DisasJumpType op_vgbm(DisasContext *s, DisasOps *o)
>> +{
>> +    const uint16_t i2 = get_field(s->fields, i2);
>> +    TCGv_i32 ones = tcg_const_i32(-1u);
>> +    TCGv_i32 zeroes = tcg_const_i32(0);
>> +    int i;
>> +
>> +    for (i = 0; i < 16; i++) {
>> +        if (extract32(i2, 15 - i, 1)) {
>> +            write_vec_element_i32(ones, get_field(s->fields, v1), i, MO_8);
>> +        } else {
>> +            write_vec_element_i32(zeroes, get_field(s->fields, v1), i, MO_8);
>> +        }
>> +    }
>> +    tcg_temp_free_i32(ones);
>> +    tcg_temp_free_i32(zeroes);
>> +    return DISAS_NEXT;
>> +}
> 
> While this works, it's not in the spirit of
> 
>> Programming Note: VECTOR GENERATE BYTE
>> MASK is the preferred method for setting a vector
>> register to all zeroes or ones.

Good point, I skipped that note so far.

> 
> Better, I think, with

Many instructions to implement, so little time to fine tune stuff so
far. However I have tests for VGBM, so I can easily get it working. Will
play with it!

> 
> uint64_t generate_byte_mask(uint8_t mask)
> {
>     uint64_t r = 0;
>     int i;
>     for (i = 0; i < 8; i++) {
>         if ((mask >> i) & 1) {
>             r |= 0xffull << (i * 8);
>         }
>     }
>     return r;
> }
> 
>     if (i2 == (i2 & 0xff) * 0x0101) {
>         /* masks for both halves of the vector are the same.
>            trust tcg to produce a good constant loading.  */
>         tcg_gen_gvec_dup64i(vec_full_reg_offset(s, v1), 16, 16,
>                             generate_byte_mask(i2 & 0xff));
>     } else {
>         TCGv_i64 t = tcg_temp_new_i64();
>         tcg_gen_movi_i64(t, generate_byte_mask(i2 >> 8));
>         write_vec_element_i64(t, v1, 0, MO_64);
>         tcg_gen_movi_i64(t, generate_byte_mask(i2 & 0xff));
>         write_vec_element_i64(t, v1, 1, MO_64);
>         tcg_temp_free_i64(t);
>     }
> 
> Somewhere behind tcg_gen_gvec_dup64i, I check to see if the constant can be
> decomposed further, which will eventually bottom out at
> 
> 	vpxor	%xmm0,%xmm0,%xmm0		// all zeros
> 	vpcmpeq	%xmm0,%xmm0,%xmm0		// all ones
> 
> and even more interesting combinations for tcg/aarch64.
> 
> 

At this point I want to highlight how helpful your reviews are. Amazing! :)

> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 07/33] s390x/tcg: Implement VECTOR GENERATE MASK
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 07/33] s390x/tcg: Implement VECTOR GENERATE MASK David Hildenbrand
@ 2019-02-26 21:16   ` David Hildenbrand
  2019-02-27 15:29     ` Richard Henderson
  0 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 21:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson

On 26.02.19 12:38, David Hildenbrand wrote:
> This is the first instruction that uses gvec expansion for duplicating
> elements. We will use macros for most gvec calls to simplify translating
> vector numbers into offsets (and to not have to worry about oprsz and
> maxsz).
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate.c        |  1 +
>  target/s390x/translate_vx.inc.c | 34 +++++++++++++++++++++++++++++++++
>  3 files changed, 37 insertions(+)
> 
> diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
> index 1bdfcf8130..a3a0df7788 100644
> --- a/target/s390x/insn-data.def
> +++ b/target/s390x/insn-data.def
> @@ -979,6 +979,8 @@
>      E(0xe712, VGEG,    VRV,   V,   la2, 0, 0, 0, vge, 0, MO_64, IF_VEC)
>  /* VECTOR GENERATE BYTE MASK */
>      F(0xe744, VGBM,    VRI_a, V,   0, 0, 0, 0, vgbm, 0, IF_VEC)
> +/* VECTOR GENERATE MASK */
> +    F(0xe746, VGM,     VRI_b, V,   0, 0, 0, 0, vgm, 0, IF_VEC)
>  
>  #ifndef CONFIG_USER_ONLY
>  /* COMPARE AND SWAP AND PURGE */
> diff --git a/target/s390x/translate.c b/target/s390x/translate.c
> index 3935bc8bb7..56c146f91e 100644
> --- a/target/s390x/translate.c
> +++ b/target/s390x/translate.c
> @@ -34,6 +34,7 @@
>  #include "disas/disas.h"
>  #include "exec/exec-all.h"
>  #include "tcg-op.h"
> +#include "tcg-op-gvec.h"
>  #include "qemu/log.h"
>  #include "qemu/host-utils.h"
>  #include "exec/cpu_ldst.h"
> diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
> index 7775401dd3..ed63b2ca22 100644
> --- a/target/s390x/translate_vx.inc.c
> +++ b/target/s390x/translate_vx.inc.c
> @@ -43,6 +43,7 @@
>  
>  #define NUM_VEC_ELEMENT_BYTES(es) (1 << (es))
>  #define NUM_VEC_ELEMENTS(es) (16 / NUM_VEC_ELEMENT_BYTES(es))
> +#define NUM_VEC_ELEMENT_BITS(es) (NUM_VEC_ELEMENT_BYTES(es) * BITS_PER_BYTE)
>  
>  static inline bool valid_vec_element(uint8_t enr, TCGMemOp es)
>  {
> @@ -136,6 +137,9 @@ static void load_vec_element(DisasContext *s, uint8_t reg, uint8_t enr,
>      tcg_temp_free_i64(tmp);
>  }
>  
> +#define gen_gvec_dup_i64(es, v1, c) \
> +    tcg_gen_gvec_dup_i64(es, vec_full_reg_offset(v1), 16, 16, c)
> +
>  static DisasJumpType op_vge(DisasContext *s, DisasOps *o)
>  {
>      const uint8_t es = s->insn->data;
> @@ -175,3 +179,33 @@ static DisasJumpType op_vgbm(DisasContext *s, DisasOps *o)
>      tcg_temp_free_i32(zeroes);
>      return DISAS_NEXT;
>  }
> +
> +static DisasJumpType op_vgm(DisasContext *s, DisasOps *o)
> +{
> +    const uint8_t es = get_field(s->fields, m4);
> +    const uint8_t bits = NUM_VEC_ELEMENT_BITS(es);
> +    const uint8_t i2 = get_field(s->fields, i2) & (bits - 1);
> +    const uint8_t i3 = get_field(s->fields, i3) & (bits - 1);
> +    uint64_t mask = 0;
> +    TCGv_i64 tmp;
> +    int i;
> +
> +    if (es > MO_64) {
> +        gen_program_exception(s, PGM_SPECIFICATION);
> +        return DISAS_NORETURN;
> +    }
> +
> +    /* generate the mask - take care of wrapping */
> +    for (i = i2; ; i = (i + 1) % bits) {
> +        mask |= 1ull << (bits - i - 1);
> +        if (i == i3) {
> +            break;
> +        }
> +    }
> +
> +    tmp = tcg_temp_new_i64();
> +    tcg_gen_movi_i64(tmp, mask);
> +    gen_gvec_dup_i64(es, get_field(s->fields, v1), tmp);

Richard, shall I better convert this into

switch (es) {
case MO_8:
	tcg_gen_gvec_dup8i(..., 16, 16, mask);
	break;
case MO_16:
	tcg_gen_gvec_dup16i(..., 16, 16, mask);
	break;
...
}

?

Thanks

> +    tcg_temp_free_i64(tmp);
> +    return DISAS_NEXT;
> +}
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 06/33] s390x/tcg: Implement VECTOR GENERATE BYTE MASK
  2019-02-26 19:23     ` David Hildenbrand
@ 2019-02-26 21:23       ` David Hildenbrand
  0 siblings, 0 replies; 94+ messages in thread
From: David Hildenbrand @ 2019-02-26 21:23 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 26.02.19 20:23, David Hildenbrand wrote:
> On 26.02.19 20:12, Richard Henderson wrote:
>> On 2/26/19 3:38 AM, David Hildenbrand wrote:
>>> +static DisasJumpType op_vgbm(DisasContext *s, DisasOps *o)
>>> +{
>>> +    const uint16_t i2 = get_field(s->fields, i2);
>>> +    TCGv_i32 ones = tcg_const_i32(-1u);
>>> +    TCGv_i32 zeroes = tcg_const_i32(0);
>>> +    int i;
>>> +
>>> +    for (i = 0; i < 16; i++) {
>>> +        if (extract32(i2, 15 - i, 1)) {
>>> +            write_vec_element_i32(ones, get_field(s->fields, v1), i, MO_8);
>>> +        } else {
>>> +            write_vec_element_i32(zeroes, get_field(s->fields, v1), i, MO_8);
>>> +        }
>>> +    }
>>> +    tcg_temp_free_i32(ones);
>>> +    tcg_temp_free_i32(zeroes);
>>> +    return DISAS_NEXT;
>>> +}
>>
>> While this works, it's not in the spirit of
>>
>>> Programming Note: VECTOR GENERATE BYTE
>>> MASK is the preferred method for setting a vector
>>> register to all zeroes or ones.
> 
> Good point, I skipped that note so far.
> 
>>
>> Better, I think, with
> 
> Many instructions to implement, so little time to fine tune stuff so
> far. However I have tests for VGBM, so I can easily get it working. Will
> play with it!
> 
>>
>> uint64_t generate_byte_mask(uint8_t mask)
>> {
>>     uint64_t r = 0;
>>     int i;
>>     for (i = 0; i < 8; i++) {
>>         if ((mask >> i) & 1) {
>>             r |= 0xffull << (i * 8);
>>         }
>>     }
>>     return r;
>> }
>>
>>     if (i2 == (i2 & 0xff) * 0x0101) {
>>         /* masks for both halves of the vector are the same.
>>            trust tcg to produce a good constant loading.  */
>>         tcg_gen_gvec_dup64i(vec_full_reg_offset(s, v1), 16, 16,
>>                             generate_byte_mask(i2 & 0xff));
>>     } else {
>>         TCGv_i64 t = tcg_temp_new_i64();
>>         tcg_gen_movi_i64(t, generate_byte_mask(i2 >> 8));
>>         write_vec_element_i64(t, v1, 0, MO_64);
>>         tcg_gen_movi_i64(t, generate_byte_mask(i2 & 0xff));
>>         write_vec_element_i64(t, v1, 1, MO_64);
>>         tcg_temp_free_i64(t);
>>     }
>>
>> Somewhere behind tcg_gen_gvec_dup64i, I check to see if the constant can be
>> decomposed further, which will eventually bottom out at
>>
>> 	vpxor	%xmm0,%xmm0,%xmm0		// all zeros
>> 	vpcmpeq	%xmm0,%xmm0,%xmm0		// all ones
>>

Just tested with minor adaptions, works like a charm!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 07/33] s390x/tcg: Implement VECTOR GENERATE MASK
  2019-02-26 21:16   ` David Hildenbrand
@ 2019-02-27 15:29     ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 15:29 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 1:16 PM, David Hildenbrand wrote:
>> +    tmp = tcg_temp_new_i64();
>> +    tcg_gen_movi_i64(tmp, mask);
>> +    gen_gvec_dup_i64(es, get_field(s->fields, v1), tmp);
> Richard, shall I better convert this into
> 
> switch (es) {
> case MO_8:
> 	tcg_gen_gvec_dup8i(..., 16, 16, mask)
> 	break;
> case MO_16:
> 	tcg_gen_gvec_dup16i(..., 16, 16, mask)
> 	break;
> ...
> };
> 
> ?

Yes, that would be better.

I see code in tcg/optimize.c that should have propagated the constant, but
it's better to emit the correct opcode in the first place when it is easy like
this.

With that,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
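
As a reference for the constant being duplicated, a minimal host-side
sketch of the per-element mask computation from the patch, including the
wrap-around case. vgm_element_mask() is a hypothetical name; the loop
mirrors the one in op_vgm(). Since the result always fits into one
element, it can be handed to an immediate dup of the matching size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build the mask for one element of size (8 << es) bits. Bits are
 * numbered from the left (bit 0 is the most significant bit); the range
 * i2..i3 wraps around the element if i2 > i3.
 */
static uint64_t vgm_element_mask(int es, uint8_t i2, uint8_t i3)
{
    assert(es >= 0 && es <= 3);
    const unsigned bits = 8u << es;
    const unsigned start = i2 & (bits - 1);
    const unsigned end = i3 & (bits - 1);
    uint64_t mask = 0;

    for (unsigned i = start; ; i = (i + 1) % bits) {
        mask |= 1ull << (bits - i - 1);
        if (i == end) {
            break;
        }
    }
    return mask;
}
```

For byte elements, the range 2..5 gives 0x3c, and the wrapping range 6..1
gives 0xc3.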

r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 08/33] s390x/tcg: Implement VECTOR LOAD
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 08/33] s390x/tcg: Implement VECTOR LOAD David Hildenbrand
@ 2019-02-27 15:39   ` Richard Henderson
  2019-02-28  7:48     ` David Hildenbrand
  0 siblings, 1 reply; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 15:39 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:38 AM, David Hildenbrand wrote:
> +static DisasJumpType op_vl(DisasContext *s, DisasOps *o)
> +{
> +    load_vec_element(s, TMP_VREG_0, 0, o->addr1, MO_64);
> +    gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
> +    load_vec_element(s, TMP_VREG_0, 1, o->addr1, MO_64);
> +    gen_gvec_mov(get_field(s->fields, v1), TMP_VREG_0);
> +    return DISAS_NEXT;
> +}

Isn't it just as easy to load two TCGv_i64 temps and store into the correct
vector afterward?

Also, it is easy to honor the required alignment:

    TCGMemOp mop1, mop2;

    if (m3 < 3) {
        mop1 = mop2 = MO_TEQ;
    } else if (m3 == 3) {
        mop1 = mop2 = MO_TEQ | MO_ALIGN;
    } else {
        mop1 = MO_TEQ | MO_ALIGN_16;
        mop2 = MO_TEQ | MO_ALIGN;
    }
    tcg_gen_qemu_ld_i64(tmp1, o->addr1, mem_idx, mop1);
    gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
    tcg_gen_qemu_ld_i64(tmp2, o->addr1, mem_idx, mop2);
    write_vec_element_i64(tmp1, v1, 0, MO_64);
    write_vec_element_i64(tmp2, v1, 1, MO_64);


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 09/33] s390x/tcg: Implement VECTOR LOAD AND REPLICATE
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 09/33] s390x/tcg: Implement VECTOR LOAD AND REPLICATE David Hildenbrand
@ 2019-02-27 15:40   ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 15:40 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:38 AM, David Hildenbrand wrote:
> We can use tcg_gen_gvec_dup_i64() to carry out the duplication.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 17 +++++++++++++++++
>  2 files changed, 19 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 10/33] s390x/tcg: Implement VECTOR LOAD ELEMENT
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 10/33] s390x/tcg: Implement VECTOR LOAD ELEMENT David Hildenbrand
@ 2019-02-27 15:42   ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 15:42 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:38 AM, David Hildenbrand wrote:
> Fairly easy, load with desired size and store it into the right element.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  5 +++++
>  target/s390x/translate_vx.inc.c | 18 ++++++++++++++++++
>  2 files changed, 23 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 11/33] s390x/tcg: Implement VECTOR LOAD ELEMENT IMMEDIATE
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 11/33] s390x/tcg: Implement VECTOR LOAD ELEMENT IMMEDIATE David Hildenbrand
@ 2019-02-27 15:44   ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 15:44 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:38 AM, David Hildenbrand wrote:
> Take care of properly sign-extending the immediate.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  5 +++++
>  target/s390x/translate_vx.inc.c | 17 +++++++++++++++++
>  2 files changed, 22 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 12/33] s390x/tcg: Implement VECTOR LOAD GR FROM VR ELEMENT
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 12/33] s390x/tcg: Implement VECTOR LOAD GR FROM VR ELEMENT David Hildenbrand
@ 2019-02-27 15:53   ` Richard Henderson
  2019-02-28  8:27     ` David Hildenbrand
  0 siblings, 1 reply; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 15:53 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:38 AM, David Hildenbrand wrote:
> To avoid a helper, we have to do the actual calculation of the element
> address (offset in cpu_env + cpu_env) manually. Factor that out into
> get_vec_element_ptr_i64(). The same logic will be reused for "VECTOR
> LOAD VR ELEMENT FROM GR".
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 55 +++++++++++++++++++++++++++++++++
>  2 files changed, 57 insertions(+)
> 
> diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
> index 46610e808f..f4201ff55a 100644
> --- a/target/s390x/insn-data.def
> +++ b/target/s390x/insn-data.def
> @@ -996,6 +996,8 @@
>      E(0xe741, VLEIH,   VRI_a, V,   0, 0, 0, 0, vlei, 0, MO_16, IF_VEC)
>      E(0xe743, VLEIF,   VRI_a, V,   0, 0, 0, 0, vlei, 0, MO_32, IF_VEC)
>      E(0xe742, VLEIG,   VRI_a, V,   0, 0, 0, 0, vlei, 0, MO_64, IF_VEC)
> +/* VECTOR LOAD GR FROM VR ELEMENT */
> +    F(0xe721, VLGV,    VRS_c, V,   la2, 0, r1, 0, vlgv, 0, IF_VEC)
>  
>  #ifndef CONFIG_USER_ONLY
>  /* COMPARE AND SWAP AND PURGE */
> diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
> index 1bf654ff4e..a02a3ba81f 100644
> --- a/target/s390x/translate_vx.inc.c
> +++ b/target/s390x/translate_vx.inc.c
> @@ -137,6 +137,28 @@ static void load_vec_element(DisasContext *s, uint8_t reg, uint8_t enr,
>      tcg_temp_free_i64(tmp);
>  }
>  
> +static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
> +                                    uint8_t es)
> +{
> +    TCGv_i64 tmp = tcg_temp_new_i64();
> +
> +    /* mask off invalid parts from the element nr */
> +    tcg_gen_andi_i64(tmp, enr, NUM_VEC_ELEMENTS(es) - 1);
> +
> +    /* convert it to an element offset relative to cpu_env (vec_reg_offset()) */
> +    tcg_gen_muli_i64(tmp, tmp, NUM_VEC_ELEMENT_BYTES(es));

Or
  tcg_gen_shli_i64(tmp, tmp, es);


> +    /* generate the final ptr by adding cpu_env */
> +    tcg_gen_trunc_i64_ptr(ptr, tmp);
> +    tcg_gen_add_ptr(ptr, ptr, cpu_env);

Sadly, there's nothing in the optimizer that will propagate this...

> +    case MO_8:
> +        tcg_gen_ld8u_i64(o->out, ptr, 0);

... into this.

Is it easy for you to objdump|grep some binaries to tell if my hunch is correct,
in that virtually all direct element access is with a constant, i.e. with c(r0)
as the address?

It would be nice if this could be (o->out, cpu_env, ofs) for those cases...

But what's here is correct, and what I'm suggesting is mere refinement,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
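
A small host-side sketch of the offset arithmetic under discussion,
showing that the multiply and the shift agree. vec_element_byte_offset()
is a hypothetical name; the sketch deliberately ignores the register's
base offset within the CPU state and any host byte-order adjustment, and
only models "mask the element number, then scale by the element size".

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of element 'enr' (taken modulo the element count). */
static uint64_t vec_element_byte_offset(uint64_t enr, int es)
{
    assert(es >= 0 && es <= 3);
    const uint64_t nelem = 16u >> es;          /* elements per 16-byte vector */
    const uint64_t idx = enr & (nelem - 1);    /* mask off invalid parts */
    const uint64_t by_mul = idx * (1u << es);  /* muli by element size */
    const uint64_t by_shl = idx << es;         /* shli by es */

    assert(by_mul == by_shl);
    return by_shl;
}
```

When the element number is a translation-time constant, this whole
computation folds into a constant offset, which is the refinement
suggested above.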


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 13/33] s390x/tcg: Implement VECTOR LOAD LOGICAL ELEMENT AND ZERO
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 13/33] s390x/tcg: Implement VECTOR LOAD LOGICAL ELEMENT AND ZERO David Hildenbrand
@ 2019-02-27 15:56   ` Richard Henderson
  2019-02-28  8:30     ` David Hildenbrand
  0 siblings, 1 reply; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 15:56 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:38 AM, David Hildenbrand wrote:
> +    zero_vec(TMP_VREG_0);
> +    load_vec_element(s, TMP_VREG_0, enr, o->addr1, es);
> +    gen_gvec_mov(get_field(s->fields, v1), TMP_VREG_0);

load into TCGv_i64, zero real dest, store into real dest.


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 14/33] s390x/tcg: Implement VECTOR LOAD MULTIPLE
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 14/33] s390x/tcg: Implement VECTOR LOAD MULTIPLE David Hildenbrand
@ 2019-02-27 16:02   ` Richard Henderson
  2019-02-28  8:36     ` David Hildenbrand
  0 siblings, 1 reply; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 16:02 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:38 AM, David Hildenbrand wrote:
> Also fairly easy to implement. One issue we have is that exceptions will
> result in some vectors already being modified. At least handle it
> consistently per vector by using a temporary vector. Good enough for
> now, add a FIXME.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 26 ++++++++++++++++++++++++++
>  2 files changed, 28 insertions(+)

I suppose the fixme is good enough.  For the record, I think you could do the
check with just two loads -- the first and last quadword.  After that, none of
the other loads can fault, and you can store everything else into the
destination vectors as you read them.
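
The reasoning behind the two-load check can be sketched on the host,
assuming access faults are decided per 4 KiB page and ignoring address
wrapping: up to 16 quadwords cover at most 256 bytes and therefore touch
at most two pages, namely those of the first and of the last byte. All
names below (byte_ok(), probe_first_last(), ...) are hypothetical and
only model page permissions as a bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define QUADWORD 16u

/* Hypothetical permission model: page p is accessible iff bit p is set. */
static bool byte_ok(uint32_t page_map, uint64_t addr)
{
    const uint64_t page = addr / PAGE_SIZE;

    assert(page < 32);
    return (page_map >> page) & 1;
}

/* A quadword touches at most the pages of its first and last byte. */
static bool quadword_ok(uint32_t page_map, uint64_t addr)
{
    return byte_ok(page_map, addr) && byte_ok(page_map, addr + QUADWORD - 1);
}

/* Probe only the first and the last of 'count' consecutive quadwords. */
static bool probe_first_last(uint32_t page_map, uint64_t addr, unsigned count)
{
    assert(count >= 1 && count <= 16);
    return quadword_ok(page_map, addr) &&
           quadword_ok(page_map, addr + (uint64_t)(count - 1) * QUADWORD);
}

/* Reference check: test every byte of the range. */
static bool check_every_byte(uint32_t page_map, uint64_t addr, unsigned count)
{
    for (uint64_t i = 0; i < (uint64_t)count * QUADWORD; i++) {
        if (!byte_ok(page_map, addr + i)) {
            return false;
        }
    }
    return true;
}
```

Under these assumptions probe_first_last() and check_every_byte() agree
for every start address and register count, so once both probes succeed
the remaining loads can write straight into the destination vectors.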

Also missing for the fixme: MO_ALIGN{,_16}.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 15/33] s390x/tcg: Implement VECTOR LOAD TO BLOCK BOUNDARY
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 15/33] s390x/tcg: Implement VECTOR LOAD TO BLOCK BOUNDARY David Hildenbrand
@ 2019-02-27 16:08   ` Richard Henderson
  2019-02-28  8:40     ` David Hildenbrand
  0 siblings, 1 reply; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 16:08 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:38 AM, David Hildenbrand wrote:
> +void HELPER(vll)(CPUS390XState *env, void *v1, uint64_t addr, uint64_t bytes)
> +{
> +    S390Vector tmp = {};
> +    int i;
> +
> +    bytes = MIN(bytes, 16);
> +    for (i = 0; i < bytes; i++) {
> +        uint8_t byte = cpu_ldub_data_ra(env, addr, GETPC());
> +
> +        s390_vec_write_element8(&tmp, i, byte);
> +        addr = wrap_address(env, addr + 1);
> +    }

TODO:

    if (likely(bytes >= 16)) {
        uint64_t t0 = cpu_ldq_data_ra(env, addr, GETPC());
        uint64_t t1 = cpu_ldq_data_ra(env, wrap_address(env, addr + 8), GETPC());
        s390_vec_write_element64(v1, 0, t0);
        s390_vec_write_element64(v1, 1, t1);
    } else {
        // byte loop
    }

But what you have is correct, so
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 16/33] s390x/tcg: Implement VECTOR LOAD VR ELEMENT FROM GR
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 16/33] s390x/tcg: Implement VECTOR LOAD VR ELEMENT FROM GR David Hildenbrand
@ 2019-02-27 16:08   ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 16:08 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:38 AM, David Hildenbrand wrote:
> Very similar to VECTOR LOAD GR FROM VR ELEMENT, just the opposite
> direction.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 33 +++++++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+)

Similar comment re constant offset.  ;-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 17/33] s390x/tcg: Implement VECTOR LOAD VR FROM GRS DISJOINT
  2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 17/33] s390x/tcg: Implement VECTOR LOAD VR FROM GRS DISJOINT David Hildenbrand
@ 2019-02-27 16:10   ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 16:10 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:38 AM, David Hildenbrand wrote:
> Fairly easy, just load from two gprs into a single vector.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      | 2 ++
>  target/s390x/translate_vx.inc.c | 7 +++++++
>  2 files changed, 9 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 18/33] s390x/tcg: Implement VECTOR LOAD WITH LENGTH
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 18/33] s390x/tcg: Implement VECTOR LOAD WITH LENGTH David Hildenbrand
@ 2019-02-27 16:12   ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 16:12 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> We can reuse the helper introduced along with VECTOR LOAD TO BLOCK
> BOUNDARY. We just have to take care of converting the highest index into
> a length.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate.c        |  7 +++++++
>  target/s390x/translate_vx.inc.c | 13 +++++++++++++
>  3 files changed, 22 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 19/33] s390x/tcg: Implement VECTOR MERGE (HIGH|LOW)
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 19/33] s390x/tcg: Implement VECTOR MERGE (HIGH|LOW) David Hildenbrand
@ 2019-02-27 16:14   ` Richard Henderson
  2019-02-27 16:20   ` Richard Henderson
  1 sibling, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 16:14 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> We cannot use gvec expansion as source and destination elements
> have different element numbers. So we'll expand using a fancy loop.
> Also, we have to take care of overlapping source and target registers and
> use a temporary register in case they do.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  4 +++
>  target/s390x/translate_vx.inc.c | 43 +++++++++++++++++++++++++++++++++
>  2 files changed, 47 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 19/33] s390x/tcg: Implement VECTOR MERGE (HIGH|LOW)
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 19/33] s390x/tcg: Implement VECTOR MERGE (HIGH|LOW) David Hildenbrand
  2019-02-27 16:14   ` Richard Henderson
@ 2019-02-27 16:20   ` Richard Henderson
  2019-02-28  8:54     ` David Hildenbrand
  1 sibling, 1 reply; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 16:20 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> +    for (dst_idx = 0; dst_idx < NUM_VEC_ELEMENTS(es); dst_idx++) {
> +        src_idx = dst_idx / 2;
> +        if (!high) {
> +            src_idx += NUM_VEC_ELEMENTS(es) / 2;
> +        }
> +        if (dst_idx % 2 == 0) {
> +            read_vec_element_i64(tmp, v2, src_idx, es);
> +        } else {
> +            read_vec_element_i64(tmp, v3, src_idx, es);
> +        }
> +        write_vec_element_i64(tmp, dst_v, dst_idx, es);
> +    }

TODO: Note that you do not need a vector temporary here, so long as you load
both source elements before writing, and you iterate in the proper direction.

For VMRL, iterate forward as you do now.  The element access order for MO_32:

 read  v2: 2   3
 read  v3:   2   3
 write v1: 0 1 2 3

For VMRH, iterate backward:

 read  v2: 1   0
 read  v3:   1   0
 write v1: 3 2 1 0
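
The access order above can be modelled in plain C. This is only a
sketch, assuming four 32-bit elements per vector and that "high" means
the leftmost (lowest-numbered) elements; the function names are made up
for illustration and are not part of the patch.

```c
#include <stdint.h>

#define N 4 /* number of 32-bit elements in a 128-bit vector */

/*
 * Merge low: interleave elements N/2..N-1 of v2 and v3, walking forward.
 * v1 may alias v2 or v3: both sources of a pair are read before writing,
 * and a write to 2*i, 2*i+1 never reaches a source index still to be read.
 */
static void merge_low_inplace(uint32_t *v1, const uint32_t *v2,
                              const uint32_t *v3)
{
    for (int i = 0; i < N / 2; i++) {
        uint32_t a = v2[N / 2 + i];
        uint32_t b = v3[N / 2 + i];

        v1[2 * i] = a;
        v1[2 * i + 1] = b;
    }
}

/*
 * Merge high: interleave elements 0..N/2-1 of v2 and v3, walking backward
 * so that the low-numbered sources are overwritten last.
 */
static void merge_high_inplace(uint32_t *v1, const uint32_t *v2,
                               const uint32_t *v3)
{
    for (int i = N / 2 - 1; i >= 0; i--) {
        uint32_t a = v2[i];
        uint32_t b = v3[i];

        v1[2 * i] = a;
        v1[2 * i + 1] = b;
    }
}
```

Calling either function with v1 equal to v2 or v3 gives the same result
as going through a temporary vector.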


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 20/33] s390x/tcg: Implement VECTOR PACK
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 20/33] s390x/tcg: Implement VECTOR PACK David Hildenbrand
@ 2019-02-27 23:11   ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 23:11 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> We cannot use gvec expansion as the element size of source and
> destination differs. So expand manually. Luckily, VECTOR PACK does not
> care about saturation or setting the CC, so it can be implemented
> without a helper. We have to watch out for overlapping source and
> destination registers and use a temporary register in this case.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 41 +++++++++++++++++++++++++++++++++
>  2 files changed, 43 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 21/33] s390x/tcg: Implement VECTOR PACK (LOGICAL) SATURATE
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 21/33] s390x/tcg: Implement VECTOR PACK (LOGICAL) SATURATE David Hildenbrand
@ 2019-02-27 23:18   ` Richard Henderson
  2019-02-27 23:24   ` Richard Henderson
  1 sibling, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 23:18 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> We'll implement both via gvec ool helpers. As these can't return
> values, we'll return the CC via env->cc_op. Generate different C
> functions for the different cases using macros.
> 
> In the future we might want to do a translation like VECTOR PACK or
> use separate handlers in case no CC update is needed. As linux does
> not seem to use the function right now, no need to tune for performance.

Fair enough.

> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/helper.h           |  6 +++
>  target/s390x/insn-data.def      |  4 ++
>  target/s390x/translate_vx.inc.c | 37 ++++++++++++++++
>  target/s390x/vec_helper.c       | 75 +++++++++++++++++++++++++++++++++
>  4 files changed, 122 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 22/33] s390x/tcg: Implement VECTOR PERMUTE
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 22/33] s390x/tcg: Implement VECTOR PERMUTE David Hildenbrand
@ 2019-02-27 23:21   ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 23:21 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> Take care of overlapping inputs and outputs by using a temporary vector.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/helper.h           |  1 +
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 12 ++++++++++++
>  target/s390x/vec_helper.c       | 20 ++++++++++++++++++++
>  4 files changed, 35 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 21/33] s390x/tcg: Implement VECTOR PACK (LOGICAL) SATURATE
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 21/33] s390x/tcg: Implement VECTOR PACK (LOGICAL) SATURATE David Hildenbrand
  2019-02-27 23:18   ` Richard Henderson
@ 2019-02-27 23:24   ` Richard Henderson
  1 sibling, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 23:24 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> We'll implement both via gvec ool helpers. As these can't return
> values, we'll return the CC via env->cc_op. Generate different C
> functions for the different cases using macros.
> 
> In the future we might want to do a translation like VECTOR PACK or
> use separate handlers in case no CC update is needed. As linux does
> not seem to use the function right now, no need to tune for performance.

Fair enough.

> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/helper.h           |  6 +++
>  target/s390x/insn-data.def      |  4 ++
>  target/s390x/translate_vx.inc.c | 37 ++++++++++++++++
>  target/s390x/vec_helper.c       | 75 +++++++++++++++++++++++++++++++++
>  4 files changed, 122 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 23/33] s390x/tcg: Implement VECTOR PERMUTE DOUBLEWORD IMMEDIATE
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 23/33] s390x/tcg: Implement VECTOR PERMUTE DOUBLEWORD IMMEDIATE David Hildenbrand
@ 2019-02-27 23:26   ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 23:26 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> Read the whole input before modifying the destination vector.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 16 ++++++++++++++++
>  2 files changed, 18 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 24/33] s390x/tcg: Implement VECTOR REPLICATE
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 24/33] s390x/tcg: Implement VECTOR REPLICATE David Hildenbrand
@ 2019-02-27 23:29   ` Richard Henderson
  2019-02-27 23:31   ` Richard Henderson
  1 sibling, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 23:29 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> Load the element and replicate it using gvec_dup.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 18 ++++++++++++++++++
>  2 files changed, 20 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 24/33] s390x/tcg: Implement VECTOR REPLICATE
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 24/33] s390x/tcg: Implement VECTOR REPLICATE David Hildenbrand
  2019-02-27 23:29   ` Richard Henderson
@ 2019-02-27 23:31   ` Richard Henderson
  1 sibling, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 23:31 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> Load the element and replicate it using gvec_dup.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 18 ++++++++++++++++++
>  2 files changed, 20 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 25/33] s390x/tcg: Implement VECTOR REPLICATE IMMEDIATE
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 25/33] s390x/tcg: Implement VECTOR REPLICATE IMMEDIATE David Hildenbrand
@ 2019-02-27 23:39   ` Richard Henderson
  2019-02-28  9:07     ` David Hildenbrand
  0 siblings, 1 reply; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 23:39 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> +    tmp = tcg_temp_new_i64();
> +    tcg_gen_movi_i64(tmp, data);
> +    gen_gvec_dup_i64(es, get_field(s->fields, v1), tmp);
> +    tcg_temp_free_i64(tmp);
> +    return DISAS_NEXT;

Reuse the dupi8, dupi16, ... switch from one of the other patches upthread?


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 26/33] s390x/tcg: Implement VECTOR SCATTER ELEMENT
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 26/33] s390x/tcg: Implement VECTOR SCATTER ELEMENT David Hildenbrand
@ 2019-02-27 23:40   ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 23:40 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> Similar to VECTOR GATHER ELEMENT, but the other direction.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  3 +++
>  target/s390x/translate_vx.inc.c | 22 ++++++++++++++++++++++
>  2 files changed, 25 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 27/33] s390x/tcg: Implement VECTOR SELECT
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 27/33] s390x/tcg: Implement VECTOR SELECT David Hildenbrand
@ 2019-02-27 23:42   ` Richard Henderson
  2019-02-28  9:09     ` David Hildenbrand
  0 siblings, 1 reply; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 23:42 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> +    tcg_gen_not_vec(vece, t, c);
> +    tcg_gen_and_vec(vece, t, t, b);

tcg_gen_andc_vec(vece, t, b, c);

> +    tcg_gen_not_i64(t, c);
> +    tcg_gen_and_i64(t, b, t);

Likewise.
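
For reference, a scalar sketch of what the suggestion amounts to,
assuming the usual bit-select semantics (take the bit from a where the
mask c is set, otherwise from b). The helper names are hypothetical;
only the identity "not followed by and" == "and with complement" is the
point.

```c
#include <stdint.h>

/* a AND NOT b, i.e. what a single and-with-complement operation computes. */
static inline uint64_t andc64(uint64_t a, uint64_t b)
{
    return a & ~b;
}

/* Bitwise select: bits of a where c is 1, bits of b where c is 0. */
static inline uint64_t bitselect64(uint64_t a, uint64_t b, uint64_t c)
{
    return (a & c) | andc64(b, c);
}
```

Folding the not into the and saves one operation per expansion and lets
hosts with a native and-with-complement instruction use it directly.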


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 28/33] s390x/tcg: Implement VECTOR SIGN EXTEND TO DOUBLEWORD
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 28/33] s390x/tcg: Implement VECTOR SIGN EXTEND TO DOUBLEWORD David Hildenbrand
@ 2019-02-27 23:43   ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 23:43 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> Load both elements signed and store them into the two 64 bit elements.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 33 +++++++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 29/33] s390x/tcg: Implement VECTOR STORE
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 29/33] s390x/tcg: Implement VECTOR STORE David Hildenbrand
@ 2019-02-27 23:46   ` Richard Henderson
  2019-02-28  9:11     ` David Hildenbrand
  0 siblings, 1 reply; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 23:46 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> +static DisasJumpType op_vst(DisasContext *s, DisasOps *o)
> +{
> +    /*
> +     * FIXME: On exceptions we must not modify any memory.
> +     */
> +    store_vec_element(s, get_field(s->fields, v1), 0, o->addr1, MO_64);
> +    gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
> +    store_vec_element(s, get_field(s->fields, v1), 1, o->addr1, MO_64);
> +    return DISAS_NEXT;

Should handle alignment though.

FWIW, there is a probe_write function that can be called to make sure a region
is writable before actually accessing it.  But this is common enough that we
should probably just handle 16-byte quantities as a native tcg type.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 30/33] s390x/tcg: Implement VECTOR STORE ELEMENT
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 30/33] s390x/tcg: Implement VECTOR STORE ELEMENT David Hildenbrand
@ 2019-02-27 23:47   ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 23:47 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> As we only store one element, there is nothing to consider regarding
> exceptions.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  5 +++++
>  target/s390x/translate_vx.inc.c | 18 ++++++++++++++++++
>  2 files changed, 23 insertions(+)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 31/33] s390x/tcg: Implement VECTOR STORE MULTIPLE
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 31/33] s390x/tcg: Implement VECTOR STORE MULTIPLE David Hildenbrand
@ 2019-02-27 23:48   ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 23:48 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> Similar to VECTOR LOAD MULTIPLE, just the opposite direction.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 25 +++++++++++++++++++++++++
>  2 files changed, 27 insertions(+)

Same fixme wrt alignment too.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 32/33] s390x/tcg: Implement VECTOR STORE WITH LENGTH
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 32/33] s390x/tcg: Implement VECTOR STORE WITH LENGTH David Hildenbrand
@ 2019-02-27 23:49   ` Richard Henderson
  2019-02-28  9:13     ` David Hildenbrand
  0 siblings, 1 reply; 94+ messages in thread
From: Richard Henderson @ 2019-02-27 23:49 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> Very similar to VECTOR LOAD WITH LENGTH, just the opposite direction.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/helper.h           |  1 +
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 13 +++++++++++++
>  target/s390x/vec_helper.c       | 15 +++++++++++++++
>  4 files changed, 31 insertions(+)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 33/33] s390x/tcg: Implement VECTOR UNPACK *
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 33/33] s390x/tcg: Implement VECTOR UNPACK * David Hildenbrand
@ 2019-02-28  0:03   ` Richard Henderson
  2019-02-28  9:28     ` David Hildenbrand
  0 siblings, 1 reply; 94+ messages in thread
From: Richard Henderson @ 2019-02-28  0:03 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/26/19 3:39 AM, David Hildenbrand wrote:
> Combine all variants in a single handler. As source and destination
> have different element sizes, we can't use gvec expansion. Expand
> manually. Also watch out for overlapping source and destination and
> use a temporary register in that case.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  8 +++++++
>  target/s390x/translate_vx.inc.c | 41 +++++++++++++++++++++++++++++++++
>  2 files changed, 49 insertions(+)

This works as is, so
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

But the same comment applies wrt iteration order and not needing a temporary.
High unpack can iterate backward, while low unpack can iterate forward, with no
lost data.
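
A byte-level sketch of that iteration order, assuming a 16-byte vector
with big-endian element layout and showing only the zero-extending
(logical) variant. All names here are illustrative and not taken from
the patch.

```c
#include <stdint.h>

/* Read the big-endian element number idx, each element being `bytes` wide. */
static uint64_t read_elem(const uint8_t *v, unsigned idx, unsigned bytes)
{
    uint64_t val = 0;

    for (unsigned k = 0; k < bytes; k++) {
        val = (val << 8) | v[idx * bytes + k];
    }
    return val;
}

/* Write val as big-endian element number idx, each element `bytes` wide. */
static void write_elem(uint8_t *v, unsigned idx, unsigned bytes, uint64_t val)
{
    for (unsigned k = 0; k < bytes; k++) {
        v[idx * bytes + k] = (uint8_t)(val >> (8 * (bytes - 1 - k)));
    }
}

/*
 * Zero-extend src_bytes-wide elements (1, 2 or 4) of v2 into elements twice
 * as wide in v1. v1 may alias v2: the high variant walks backward and the
 * low variant walks forward, so no source is overwritten before it is read.
 */
static void unpack_logical(uint8_t *v1, const uint8_t *v2,
                           unsigned src_bytes, int high)
{
    unsigned n_dst = 16 / (2 * src_bytes);

    if (high) {
        for (int i = (int)n_dst - 1; i >= 0; i--) {
            uint64_t val = read_elem(v2, (unsigned)i, src_bytes);

            write_elem(v1, (unsigned)i, 2 * src_bytes, val);
        }
    } else {
        for (unsigned i = 0; i < n_dst; i++) {
            uint64_t val = read_elem(v2, n_dst + i, src_bytes);

            write_elem(v1, i, 2 * src_bytes, val);
        }
    }
}
```

A sign-extending variant would differ only in how the value is widened
after the read; the iteration order stays the same.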


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1
  2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
                   ` (32 preceding siblings ...)
  2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 33/33] s390x/tcg: Implement VECTOR UNPACK * David Hildenbrand
@ 2019-02-28  7:24 ` David Hildenbrand
  33 siblings, 0 replies; 94+ messages in thread
From: David Hildenbrand @ 2019-02-28  7:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson

On 26.02.19 12:38, David Hildenbrand wrote:
> This is the first part of vector instruction support for s390x. Parts
> will be sent and reviewed piece by piece.
> 
> Part 1: Vector Support Instructions
> Part 2: Vector Integer Instructions
> Part 3: Vector String Instructions
> Part 4: Vector Floating-Point Instructions
> 
> The current state can be found at (kept updated):
>     https://github.com/davidhildenbrand/qemu/tree/vx
> It is based on
>     https://github.com/cohuck/qemu/tree/s390-next
> 
> To make use of vector instructions on my branch, make sure to specify
> "-cpu max" for now.
> 
> With the current state I can boot Linux kernel + user space compiled with
> SIMD support. This allows to boot distributions compiled exclusively for
> z13, requiring SIMD support. Also, I have a growing set of tests for
> kvm-unit-tests which I cross-test on a real s390x system.
> 
> In this part, the basic infrastructure and all Vector Support Instructions
> introduced with the "Vector Facility" are added. The Vector Extension
> Facilities are not considered for now.
> 
> We make use of the existing gvec expansion + ool (out-of-line) support.
> This will be heavily used especially for part 2 (Integer Instructions)
> where we can actually reuse quite some existing gvec expansions.
> 

I'll most probably introduce and use something like

#define ES_8    MO_8
#define ES_16   MO_16
#define ES_32   MO_32
#define ES_64   MO_64
#define ES_128  4

That will make handling of ES_128 nicer

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 08/33] s390x/tcg: Implement VECTOR LOAD
  2019-02-27 15:39   ` Richard Henderson
@ 2019-02-28  7:48     ` David Hildenbrand
  2019-02-28 16:34       ` Richard Henderson
  0 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-28  7:48 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 27.02.19 16:39, Richard Henderson wrote:
> On 2/26/19 3:38 AM, David Hildenbrand wrote:
>> +static DisasJumpType op_vl(DisasContext *s, DisasOps *o)
>> +{
>> +    load_vec_element(s, TMP_VREG_0, 0, o->addr1, MO_64);
>> +    gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
>> +    load_vec_element(s, TMP_VREG_0, 1, o->addr1, MO_64);
>> +    gen_gvec_mov(get_field(s->fields, v1), TMP_VREG_0);
>> +    return DISAS_NEXT;
>> +}
> 
> Isn't it just as easy to load two TCGv_i64 temps and store into the correct
> vector afterward?

Yes it is, using the existing helpers was just easier. I guess I'll
change that.

> 
> Also, it is easy to honor the required alignment:

I think that would be wrong. It is only an alignment hint.

"Setting the alignment hint to a non-zero value
that doesn’t correspond to the alignment of the second operand may
reduce performance on some models."

So we must not inject an exception when unaligned. This, however would
be the result of MO_ALIGN, right?

In essence, this is just an optimization for real hardware and can be
ignored by us completely.

> 
>     TCGMemOp mop1, mop2;
> 
>     if (m3 < 3) {
>         mop1 = mop2 = MO_TEQ;
>     } else if (m3 == 3) {
>         mop1 = mop2 = MO_TEQ | MO_ALIGN;
>     } else {
>         mop1 = MO_TEQ | MO_ALIGN_16;
>         mop2 = MO_TEQ | MO_ALIGN;
>     }
>     tcg_gen_qemu_ld_i64(tmp1, o->addr1, mem_idx, mop1);
>     gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
>     tcg_gen_qemu_ld_i64(tmp2, o->addr1, mem_idx, mop2);
>     write_vec_element_i64(tmp1, v1, 0, MO_64);
>     write_vec_element_i64(tmp2, v1, 1, MO_64);
> 
> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 12/33] s390x/tcg: Implement VECTOR LOAD GR FROM VR ELEMENT
  2019-02-27 15:53   ` Richard Henderson
@ 2019-02-28  8:27     ` David Hildenbrand
  2019-02-28 17:10       ` Richard Henderson
  0 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-28  8:27 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 27.02.19 16:53, Richard Henderson wrote:
> On 2/26/19 3:38 AM, David Hildenbrand wrote:
>> To avoid a helper, we have to do the actual calculation of the element
>> address (offset in cpu_env + cpu_env) manually. Factor that out into
>> get_vec_element_ptr_i64(). The same logic will be reused for "VECTOR
>> LOAD VR ELEMENT FROM GR".
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  target/s390x/insn-data.def      |  2 ++
>>  target/s390x/translate_vx.inc.c | 55 +++++++++++++++++++++++++++++++++
>>  2 files changed, 57 insertions(+)
>>
>> diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
>> index 46610e808f..f4201ff55a 100644
>> --- a/target/s390x/insn-data.def
>> +++ b/target/s390x/insn-data.def
>> @@ -996,6 +996,8 @@
>>      E(0xe741, VLEIH,   VRI_a, V,   0, 0, 0, 0, vlei, 0, MO_16, IF_VEC)
>>      E(0xe743, VLEIF,   VRI_a, V,   0, 0, 0, 0, vlei, 0, MO_32, IF_VEC)
>>      E(0xe742, VLEIG,   VRI_a, V,   0, 0, 0, 0, vlei, 0, MO_64, IF_VEC)
>> +/* VECTOR LOAD GR FROM VR ELEMENT */
>> +    F(0xe721, VLGV,    VRS_c, V,   la2, 0, r1, 0, vlgv, 0, IF_VEC)
>>  
>>  #ifndef CONFIG_USER_ONLY
>>  /* COMPARE AND SWAP AND PURGE */
>> diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
>> index 1bf654ff4e..a02a3ba81f 100644
>> --- a/target/s390x/translate_vx.inc.c
>> +++ b/target/s390x/translate_vx.inc.c
>> @@ -137,6 +137,28 @@ static void load_vec_element(DisasContext *s, uint8_t reg, uint8_t enr,
>>      tcg_temp_free_i64(tmp);
>>  }
>>  
>> +static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
>> +                                    uint8_t es)
>> +{
>> +    TCGv_i64 tmp = tcg_temp_new_i64();
>> +
>> +    /* mask off invalid parts from the element nr */
>> +    tcg_gen_andi_i64(tmp, enr, NUM_VEC_ELEMENTS(es) - 1);
>> +
>> +    /* convert it to an element offset relative to cpu_env (vec_reg_offset()) */
>> +    tcg_gen_muli_i64(tmp, tmp, NUM_VEC_ELEMENT_BYTES(es));
> 
> Or
>   tcg_gen_shli_i64(tmp, tmp, es);


Makes sense!
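
The equivalence is easy to check in isolation. A minimal sketch,
assuming es is the log2 of the element size in bytes (0 for bytes up to
3 for doublewords) and a 16-byte vector; the function names are
hypothetical, and any host-endianness adjustment the real offset
calculation may need is left out.

```c
#include <stdint.h>

/* Number of elements of size class es in a 16-byte vector. */
static inline unsigned num_vec_elements(unsigned es)
{
    return 16u >> es;
}

/* Size in bytes of one element of size class es. */
static inline unsigned vec_element_bytes(unsigned es)
{
    return 1u << es;
}

/* Byte offset of element enr inside the vector, using a multiply. */
static inline uint64_t element_offset_mul(uint64_t enr, unsigned es)
{
    return (enr & (num_vec_elements(es) - 1)) * vec_element_bytes(es);
}

/* Same offset using a shift, since the element size is a power of two. */
static inline uint64_t element_offset_shift(uint64_t enr, unsigned es)
{
    return (enr & (num_vec_elements(es) - 1)) << es;
}
```

Both forms mask off the invalid bits of the element number first, so
out-of-range element numbers wrap around instead of indexing past the
vector.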

> 
> 
>> +    /* generate the final ptr by adding cpu_env */
>> +    tcg_gen_trunc_i64_ptr(ptr, tmp);
>> +    tcg_gen_add_ptr(ptr, ptr, cpu_env);
> 
> Sadly, there's nothing in the optimizer that will propagate this...
> 
>> +    case MO_8:
>> +        tcg_gen_ld8u_i64(o->out, ptr, 0);
> 
> ... into this.
> 
> Is it easy for you objdump|grep some binaries to tell if my hunch is correct,
> in that virtually all direct element access is with a constant, i.e. with c(r0)
> as the address?
> 
> It would be nice if this could be (o->out, cpu_env, ofs) for those cases...
> 
> But what's here is correct, and what I'm suggesting is mere refinement,

I can do it quick and dirty, run a z13 compiled kernel+user space and
print if we really only have constants here. IMHO it makes perfect sense
to have a fast path for that.

 
+    /* fast path if we don't need the register content */
+    if (!get_field(s->fields, b2)) {
+        uint8_t enr = get_field(s->fields, d2) & (NUM_VEC_ELEMENTS(es) - 1);
+
+        read_vec_element_i64(o->out, get_field(s->fields, v3), enr, es);
+        return DISAS_NEXT;
+    }
+

Should do the trick, right?

Thanks!

> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 13/33] s390x/tcg: Implement VECTOR LOAD LOGICAL ELEMENT AND ZERO
  2019-02-27 15:56   ` Richard Henderson
@ 2019-02-28  8:30     ` David Hildenbrand
  0 siblings, 0 replies; 94+ messages in thread
From: David Hildenbrand @ 2019-02-28  8:30 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 27.02.19 16:56, Richard Henderson wrote:
> On 2/26/19 3:38 AM, David Hildenbrand wrote:
>> +    zero_vec(TMP_VREG_0);
>> +    load_vec_element(s, TMP_VREG_0, enr, o->addr1, es);
>> +    gen_gvec_mov(get_field(s->fields, v1), TMP_VREG_0);
> 
> load into TCGv_i64, zero real dest, store into real dest.
> 

Yes, can do!
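
A rough model of that ordering in plain C, under the assumption that only
the load can fault: load into a scalar first, then zero the real
destination, then store. guest_load64() and load_and_zero() are
hypothetical stand-ins, and the element is simply placed in the leftmost
doubleword (the doubleword-element case).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guest load that can fail, standing in for a faulting access. */
static bool guest_load64(const uint64_t *mem, size_t nwords, size_t idx,
                         uint64_t *val)
{
    if (idx >= nwords) {
        return false;           /* models an access exception */
    }
    *val = mem[idx];
    return true;
}

/*
 * Load first, then zero the real destination, then store the element.
 * If the load faults, the destination is left untouched, so no temporary
 * vector register is needed.
 */
static bool load_and_zero(uint64_t dst[2], const uint64_t *mem, size_t nwords,
                          size_t idx)
{
    uint64_t tmp;

    assert(dst != NULL);
    if (!guest_load64(mem, nwords, idx, &tmp)) {
        return false;
    }
    memset(dst, 0, 2 * sizeof(dst[0]));
    dst[0] = tmp;
    return true;
}
```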

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 14/33] s390x/tcg: Implement VECTOR LOAD MULTIPLE
  2019-02-27 16:02   ` Richard Henderson
@ 2019-02-28  8:36     ` David Hildenbrand
  2019-02-28 17:15       ` Richard Henderson
  0 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-28  8:36 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 27.02.19 17:02, Richard Henderson wrote:
> On 2/26/19 3:38 AM, David Hildenbrand wrote:
>> Also fairly easy to implement. One issue we have is that exceptions will
>> result in some vectors already being modified. At least handle it
>> consistently per vector by using a temporary vector. Good enough for
>> now, add a FIXME.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  target/s390x/insn-data.def      |  2 ++
>>  target/s390x/translate_vx.inc.c | 26 ++++++++++++++++++++++++++
>>  2 files changed, 28 insertions(+)
> 
> I suppose the fixme is good enough.  For the record, I think you could do the
> check with just two loads -- the first and last quadword.  After that, none of
> the other loads can fault, and you can store everything else into the
> destination vectors as you read them.

Aren't such approaches prone to races if other VCPUs invalidate page
tables/TLB entries?

(or am I messing up things and the MMU of this VCPU won't be touched
while in this block and once we touched all applicable pages, it cannot
fail anymore?)

> 
> Also missing for the fixme: MO_ALIGN{,_16}.

Just like the other occurrence, I think MO_ALIGN would be wrong.

"Setting the alignment hint to a non-zero value
that doesn’t correspond to the alignment of the
second operand may reduce performance on
some models."

> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 

Thanks!

> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 15/33] s390x/tcg: Implement VECTOR LOAD TO BLOCK BOUNDARY
  2019-02-27 16:08   ` Richard Henderson
@ 2019-02-28  8:40     ` David Hildenbrand
  0 siblings, 0 replies; 94+ messages in thread
From: David Hildenbrand @ 2019-02-28  8:40 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 27.02.19 17:08, Richard Henderson wrote:
> On 2/26/19 3:38 AM, David Hildenbrand wrote:
>> +void HELPER(vll)(CPUS390XState *env, void *v1, uint64_t addr, uint64_t bytes)
>> +{
>> +    S390Vector tmp = {};
>> +    int i;
>> +
>> +    bytes = MIN(bytes, 16);
>> +    for (i = 0; i < bytes; i++) {
>> +        uint8_t byte = cpu_ldub_data_ra(env, addr, GETPC());
>> +
>> +        s390_vec_write_element8(&tmp, i, byte);
>> +        addr = wrap_address(env, addr + 1);
>> +    }
> 
> TODO:
> 
>     if (likely(bytes >= 16)) {
>         uint64_t t0 = cpu_ldq_data_ra(env, addr, GETPC());
>         uint64_t t1 = cpu_ldq_data_ra(env, addr, GETPC());

adding + wrapping the address of course.

>         s390_vec_write_element64(v1, 0, t0);
>         s390_vec_write_element64(v1, 1, t1);
>     } else {
>         // byte loop
>     }
> 
> But what you have is correct, so

Makes sense and gets rid of the MIN(), so changed :)

Thanks!
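
For reference, a self-contained model of the resulting helper logic on a
flat byte array (ldq_be() and vll_model() are hypothetical names; address
wrapping and exceptions are not modelled). The second doubleword is loaded
from addr + 8.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Big-endian 64-bit load from a flat byte array modelling guest memory. */
static uint64_t ldq_be(const uint8_t *mem, uint64_t addr)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | mem[addr + i];
    }
    return v;
}

/*
 * 'bytes' is the number of bytes to load. Anything >= 16 fills the whole
 * vector; shorter lengths leave the remaining bytes zero.
 * vec[0] is the leftmost doubleword.
 */
static void vll_model(uint64_t vec[2], const uint8_t *mem, uint64_t addr,
                      uint64_t bytes)
{
    assert(vec != NULL && mem != NULL);
    if (bytes >= 16) {
        /* fast path: two doubleword loads */
        vec[0] = ldq_be(mem, addr);
        vec[1] = ldq_be(mem, addr + 8);
    } else {
        uint8_t tmp[16] = { 0 };

        memcpy(tmp, mem + addr, bytes);   /* stands in for the byte loop */
        vec[0] = ldq_be(tmp, 0);
        vec[1] = ldq_be(tmp, 8);
    }
}
```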

> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 19/33] s390x/tcg: Implement VECTOR MERGE (HIGH|LOW)
  2019-02-27 16:20   ` Richard Henderson
@ 2019-02-28  8:54     ` David Hildenbrand
  0 siblings, 0 replies; 94+ messages in thread
From: David Hildenbrand @ 2019-02-28  8:54 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 27.02.19 17:20, Richard Henderson wrote:
> On 2/26/19 3:39 AM, David Hildenbrand wrote:
>> +    for (dst_idx = 0; dst_idx < NUM_VEC_ELEMENTS(es); dst_idx++) {
>> +        src_idx = dst_idx / 2;
>> +        if (!high) {
>> +            src_idx += NUM_VEC_ELEMENTS(es) / 2;
>> +        }
>> +        if (dst_idx % 2 == 0) {
>> +            read_vec_element_i64(tmp, v2, src_idx, es);
>> +        } else {
>> +            read_vec_element_i64(tmp, v3, src_idx, es);
>> +        }
>> +        write_vec_element_i64(tmp, dst_v, dst_idx, es);
>> +    }
> 
> TODO: Note that you do not need a vector temporary here, so long as you load
> both source elements before writing, and you iterate in the proper direction.
> 
> For VMRL, iterate forward as you do now.  The element access order for MO_32:
> 
>  read  v2: 2   3
>  read  v3:   2   3
>  write v1: 0 1 2 3
> 
> For VMRH, iterate backward:
> 
>  read  v2: 1   0
>  read  v3:   1   0
>  write v1: 3 2 1 0
> 
> 
> r~
> 

Let's have a look for VMRH when iterating forward (My brain is a little
slow in the morning):

v1[0] = v2[0]
v1[1] = v3[0]
v1[2] = v2[1]
v1[3] = v3[1]

If all would overlap

v1[0] = v1[0]
v1[1] = v1[0] -> v1[0] already modified
v1[2] = v1[1] -> v1[1] already modified
v1[3] = v1[1] -> v1[1] already modified

When iterating backwards:

v1[3] = v3[1]
v1[2] = v2[1]
v1[1] = v3[0]
v1[0] = v2[0]

If all would overlap

v1[3] = v1[1]
v1[2] = v1[1]
v1[1] = v1[0]
v1[0] = v1[0]


VMRL when iterating forward:

v1[0] = v2[2]
v1[1] = v3[2]
v1[2] = v2[3]
v1[3] = v3[3]

If all would overlap

v1[0] = v1[2]
v1[1] = v1[2]
v1[2] = v1[3]
v1[3] = v1[3]

Perfect :) I'll split up the two cases! Thanks!
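
The same reasoning as a small simulation on plain arrays for the MO_32
case. merge_inplace() is a hypothetical model, not the TCG expansion; it
only shows that walking backward for the high merge and forward for the
low merge reads every source element before its slot is overwritten, even
when all three operands are the same register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define N 4   /* four word-sized elements, i.e. the MO_32 case */

/*
 * In-place VECTOR MERGE model. v1 may alias v2 and/or v3.
 * High merge walks the destination backward, low merge forward.
 */
static void merge_inplace(uint32_t *v1, const uint32_t *v2,
                          const uint32_t *v3, bool high)
{
    assert(v1 && v2 && v3);
    for (int k = 0; k < N; k++) {
        int dst_idx = high ? N - 1 - k : k;
        int src_idx = dst_idx / 2 + (high ? 0 : N / 2);

        /* even destination elements come from v2, odd ones from v3 */
        v1[dst_idx] = (dst_idx % 2 == 0) ? v2[src_idx] : v3[src_idx];
    }
}
```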

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 25/33] s390x/tcg: Implement VECTOR REPLICATE IMMEDIATE
  2019-02-27 23:39   ` Richard Henderson
@ 2019-02-28  9:07     ` David Hildenbrand
  0 siblings, 0 replies; 94+ messages in thread
From: David Hildenbrand @ 2019-02-28  9:07 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 28.02.19 00:39, Richard Henderson wrote:
> On 2/26/19 3:39 AM, David Hildenbrand wrote:
>> +    tmp = tcg_temp_new_i64();
>> +    tcg_gen_movi_i64(tmp, data);
>> +    gen_gvec_dup_i64(es, get_field(s->fields, v1), tmp);
>> +    tcg_temp_free_i64(tmp);
>> +    return DISAS_NEXT;
> 
> Reuse the dupi8, dupi16, ... switch from one of the other patches upthread?

Yes, makes sense!

> 
> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 27/33] s390x/tcg: Implement VECTOR SELECT
  2019-02-27 23:42   ` Richard Henderson
@ 2019-02-28  9:09     ` David Hildenbrand
  0 siblings, 0 replies; 94+ messages in thread
From: David Hildenbrand @ 2019-02-28  9:09 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 28.02.19 00:42, Richard Henderson wrote:
> On 2/26/19 3:39 AM, David Hildenbrand wrote:
>> +    tcg_gen_not_vec(vece, t, c);
>> +    tcg_gen_and_vec(vece, t, t, b);
> 
> tcg_gen_andc_vec(vece, t, b, c);
> 
>> +    tcg_gen_not_i64(t, c);
>> +    tcg_gen_and_i64(t, b, t);
> 
> Likewise.
> 

Changed, thanks.
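
As a scalar illustration of the change, assuming the usual select
semantics (bits of a where c is set, bits of b elsewhere): the not + and
pair collapses into one and-with-complement. andc64() and select64() are
hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* and-with-complement: a & ~b, the scalar analogue of an "andc" op. */
static uint64_t andc64(uint64_t a, uint64_t b)
{
    return a & ~b;
}

/* Bitwise select: take bits of a where c is 1, bits of b where c is 0. */
static uint64_t select64(uint64_t a, uint64_t b, uint64_t c)
{
    uint64_t t = andc64(b, c);   /* replaces the separate not + and */

    assert(((a & c) & t) == 0);  /* the two halves never overlap */
    return (a & c) | t;
}
```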


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 29/33] s390x/tcg: Implement VECTOR STORE
  2019-02-27 23:46   ` Richard Henderson
@ 2019-02-28  9:11     ` David Hildenbrand
  0 siblings, 0 replies; 94+ messages in thread
From: David Hildenbrand @ 2019-02-28  9:11 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 28.02.19 00:46, Richard Henderson wrote:
> On 2/26/19 3:39 AM, David Hildenbrand wrote:
>> +static DisasJumpType op_vst(DisasContext *s, DisasOps *o)
>> +{
>> +    /*
>> +     * FIXME: On exceptions we must not modify any memory.
>> +     */
>> +    store_vec_element(s, get_field(s->fields, v1), 0, o->addr1, MO_64);
>> +    gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
>> +    store_vec_element(s, get_field(s->fields, v1), 1, o->addr1, MO_64);
>> +    return DISAS_NEXT;
> 
> Should handle alignment though.

Again, as unaligned access must not trigger an exception, I don't think
we can use MO_ALIGN.

> 
> FWIW, there is a probe_write function that can be called to make sure a region
> is writable before actually accessing it.  But this is common enough that we
> should probably just handle 16-byte quantities as a native tcg type.

That would make most sense. It won't help for the VECTOR_STORE_MULTIPLE
part, though.

I'll keep it as is for now.

Thanks!

> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 32/33] s390x/tcg: Implement VECTOR STORE WITH LENGTH
  2019-02-27 23:49   ` Richard Henderson
@ 2019-02-28  9:13     ` David Hildenbrand
  0 siblings, 0 replies; 94+ messages in thread
From: David Hildenbrand @ 2019-02-28  9:13 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 28.02.19 00:49, Richard Henderson wrote:
> On 2/26/19 3:39 AM, David Hildenbrand wrote:
>> Very similar to VECTOR LOAD WITH LENGTH, just the opposite direction.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  target/s390x/helper.h           |  1 +
>>  target/s390x/insn-data.def      |  2 ++
>>  target/s390x/translate_vx.inc.c | 13 +++++++++++++
>>  target/s390x/vec_helper.c       | 15 +++++++++++++++
>>  4 files changed, 31 insertions(+)
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

I'll also add a fast path for storing with lengths >= 16.

Thanks.

> r~
> 
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 33/33] s390x/tcg: Implement VECTOR UNPACK *
  2019-02-28  0:03   ` Richard Henderson
@ 2019-02-28  9:28     ` David Hildenbrand
  2019-02-28 10:54       ` David Hildenbrand
  0 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-28  9:28 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 28.02.19 01:03, Richard Henderson wrote:
> On 2/26/19 3:39 AM, David Hildenbrand wrote:
>> Combine all variants in a single handler. As source and destination
>> have different element sizes, we can't use gvec expansion. Expand
>> manually. Also watch out for overlapping source and destination and
>> use a temporary register in that case.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  target/s390x/insn-data.def      |  8 +++++++
>>  target/s390x/translate_vx.inc.c | 41 +++++++++++++++++++++++++++++++++
>>  2 files changed, 49 insertions(+)
> 
> This works as is, so
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> But the same comment applies wrt iteration order and not needing a temporary.
> High unpack can iterate backward, while low unpack can iterate forward, with no
> lost data.

I'll fix that right away. I guess vector pack cannot be handled like this.

The only way to get rid of the temporary would be to load both elements
from v2 and v3 and then write the two (half sized) elements in v1.

I'll have a look.

Thanks!
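
For the unpack part, a small byte-level simulation of the iteration order,
assuming a logical (zero-extending) unpack from byte to halfword elements
and a register image in big-endian element order. unpack_u8_to_u16() is a
hypothetical model and dst may alias src.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * In-place unpack of 8 byte elements into 8 halfword elements.
 * High unpack takes source bytes 0..7 and walks backward;
 * low unpack takes source bytes 8..15 and walks forward.
 */
static void unpack_u8_to_u16(uint8_t *dst, const uint8_t *src, bool high)
{
    assert(dst && src);
    for (int k = 0; k < 8; k++) {
        int i = high ? 7 - k : k;
        uint8_t val = src[(high ? 0 : 8) + i];   /* read before writing */

        dst[2 * i] = 0;          /* most significant byte of the halfword */
        dst[2 * i + 1] = val;    /* least significant byte */
    }
}
```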

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 33/33] s390x/tcg: Implement VECTOR UNPACK *
  2019-02-28  9:28     ` David Hildenbrand
@ 2019-02-28 10:54       ` David Hildenbrand
  2019-02-28 18:22         ` Richard Henderson
  0 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-28 10:54 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 28.02.19 10:28, David Hildenbrand wrote:
> On 28.02.19 01:03, Richard Henderson wrote:
>> On 2/26/19 3:39 AM, David Hildenbrand wrote:
>>> Combine all variants in a single handler. As source and destination
>>> have different element sizes, we can't use gvec expansion. Expand
>>> manually. Also watch out for overlapping source and destination and
>>> use a temporary register in that case.
>>>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>  target/s390x/insn-data.def      |  8 +++++++
>>>  target/s390x/translate_vx.inc.c | 41 +++++++++++++++++++++++++++++++++
>>>  2 files changed, 49 insertions(+)
>>
>> This works as is, so
>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>>
>> But the same comment applies wrt iteration order and not needing a temporary.
>> High unpack can iterate backward, while low unpack can iterate forward, with no
>> lost data.
> 
> I'll fix that right away. I guess vector pack cannot be handled like this.
> 
> The only way to get rid of the temporary would be to load both elements
> from v2 and v3 and then write the two (half sized) elements in v1.
> 
> I'll have a look.

Hmm, as v2 and v3 are handled concatenated it is not that easy. I am not
sure if we can handle this without a temporary vector.

I thought about packing them first interleaved

v2 = [v2e0, v2e1]
v3 = [v3e0, v3e1]
v1 = [v2e0_packed, v3e0_packed, v2e1_packed, v3e1_packed]

And then restoring the right order

v1 = [v2e0_packed, v2e1_packed, v3e0_packed, v3e1_packed]

But then the second operation seems to be the problem. That shuffling
would have to be hard coded as far as I can see. (shuffling with MO_8 is
nasty -> 14 elements have to be exchanged, in my opinion possibly needing
14 temporary variables)

Of course, we can also simply detect duplicates and if so, call into a
helper.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 08/33] s390x/tcg: Implement VECTOR LOAD
  2019-02-28  7:48     ` David Hildenbrand
@ 2019-02-28 16:34       ` Richard Henderson
  2019-02-28 16:40         ` David Hildenbrand
  0 siblings, 1 reply; 94+ messages in thread
From: Richard Henderson @ 2019-02-28 16:34 UTC (permalink / raw)
  To: David Hildenbrand, Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 2/27/19 11:48 PM, David Hildenbrand wrote:
> I think that would be wrong. It is only an alignment hint.
> 
> "Setting the alignment hint to a non-zero value
> that doesn’t correspond to the alignment of the second operand may
> reduce performance on some models."
> 
> So we must not inject an exception when unaligned. This, however, would
> be the result of MO_ALIGN, right?

Ah, I didn't get that an alignment exception is not raised.  (I do find that
odd.  If the user is asserting a given alignment, why would we not tell him if
he is wrong?)

So, yes, ignore all of this from me -- leave MO_ALIGN off.


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 08/33] s390x/tcg: Implement VECTOR LOAD
  2019-02-28 16:34       ` Richard Henderson
@ 2019-02-28 16:40         ` David Hildenbrand
  0 siblings, 0 replies; 94+ messages in thread
From: David Hildenbrand @ 2019-02-28 16:40 UTC (permalink / raw)
  To: Richard Henderson, Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 28.02.19 17:34, Richard Henderson wrote:
> On 2/27/19 11:48 PM, David Hildenbrand wrote:
>> I think that would be wrong. It is only an alignment hint.
>>
>> "Setting the alignment hint to a non-zero value
>> that doesn’t correspond to the alignment of the second operand may
>> reduce performance on some models."
>>
>> So we must not inject an exception when unaligned. This, however, would
>> be the result of MO_ALIGN, right?
> 
> Ah, I didn't get that an alignment exception is not raised.  (I do find that
> odd.  If the user is asserting a given alignment, why would we not tell him if
> he is wrong?)

I was wondering the same thing. Most probably because they didn't
specify that that field has to contain 0 when introducing the
instruction. And as they added the alignment constraint only on new
hardware generations (z14), it could result for some instructions where
stuff "used to work" to suddenly report an exception.

> 
> So, yes, ignore all of this from me -- leave MO_ALIGN off.
> 
> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 12/33] s390x/tcg: Implement VECTOR LOAD GR FROM VR ELEMENT
  2019-02-28  8:27     ` David Hildenbrand
@ 2019-02-28 17:10       ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-02-28 17:10 UTC (permalink / raw)
  To: David Hildenbrand, Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 2/28/19 12:27 AM, David Hildenbrand wrote:
> +    /* fast path if we don't need the register content */
> +    if (!get_field(s->fields, b2)) {
> +        uint8_t enr = get_field(s->fields, d2) & (NUM_VEC_ELEMENTS(es) - 1);
> +
> +        read_vec_element_i64(o->out, get_field(s->fields, v3), enr, es);
> +        return DISAS_NEXT;
> +    }
> +
> 
> Should do the trick, right?

Yep!


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 14/33] s390x/tcg: Implement VECTOR LOAD MULTIPLE
  2019-02-28  8:36     ` David Hildenbrand
@ 2019-02-28 17:15       ` Richard Henderson
  2019-02-28 19:05         ` David Hildenbrand
  0 siblings, 1 reply; 94+ messages in thread
From: Richard Henderson @ 2019-02-28 17:15 UTC (permalink / raw)
  To: David Hildenbrand, Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 2/28/19 12:36 AM, David Hildenbrand wrote:
> On 27.02.19 17:02, Richard Henderson wrote:
>> On 2/26/19 3:38 AM, David Hildenbrand wrote:
>>> Also fairly easy to implement. One issue we have is that exceptions will
>>> result in some vectors already being modified. At least handle it
>>> consistently per vector by using a temporary vector. Good enough for
>>> now, add a FIXME.
>>>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>  target/s390x/insn-data.def      |  2 ++
>>>  target/s390x/translate_vx.inc.c | 26 ++++++++++++++++++++++++++
>>>  2 files changed, 28 insertions(+)
>>
>> I suppose the fixme is good enough.  For the record, I think you could do the
>> check with just two loads -- the first and last quadword.  After that, none of
>> the other loads can fault, and you can store everything else into the
>> destination vectors as you read them.
> 
> Aren't such approaches prone to races if other VCPUs invalidate page
> tables/TLB entries?

No, because...

> (or am I messing up things and the MMU of this VCPU won't be touched
> while in this block and once we touched all applicable pages, it cannot
> fail anymore?)

Correct.

If vcpu 1 does a global invalidate, the time at which vcpu 2 acknowledges that
invalidate is somewhat fluid.  VCPU 2 will see an interrupt, exit at a TB
boundary, and then acknowledge.

VCPU 1 has to wait for the ack before it knows the operation is complete.

Thus no race within any given instruction's execution.
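
A quick arithmetic check of the two-load idea, under the assumptions that
at most 16 registers (256 bytes) are loaded and that guest pages are 4096
bytes: the range then spans at most two pages, and the first and last
quadword touch both. pages_spanned() is a hypothetical helper; address
wrapping is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed guest page size */

/* Number of pages spanned by 'nregs' consecutive 16-byte vectors at 'addr'. */
static unsigned pages_spanned(uint64_t addr, unsigned nregs)
{
    assert(nregs >= 1 && nregs <= 16);
    uint64_t last = addr + 16ull * nregs - 1;

    return (unsigned)(last / MODEL_PAGE_SIZE - addr / MODEL_PAGE_SIZE) + 1;
}
```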


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 33/33] s390x/tcg: Implement VECTOR UNPACK *
  2019-02-28 10:54       ` David Hildenbrand
@ 2019-02-28 18:22         ` Richard Henderson
  2019-02-28 19:45           ` David Hildenbrand
  0 siblings, 1 reply; 94+ messages in thread
From: Richard Henderson @ 2019-02-28 18:22 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 2/28/19 2:54 AM, David Hildenbrand wrote:
> Hmm, as v2 and v3 are handled concatenated it is not that easy. I am not
> sure if we can handle this without a temporary vector.
> 
> I thought about packing them first interleaved
> 
> v2 = [v2e0, v2e1]
> v3 = [v3e0, v3e1]
> v1 = [v2e0_packed, v3e0_packed, v2e1_packed, v3e1_packed]
> 
> And then restoring the right order
> 
> v1 = [v2e0_packed, v2e1_packed, v3e0_packed, v3e1_packed]
> 
> But then the second operation seems to be the problem. That shuffling
> would have to be hard coded as far as I can see. (shuffling with MO_8 is
> nasty -> 14 elements have to be exchanged, in my opinion possibly needing
> 14 temporary variables)

I suppose you could do it in registers.

  load_element_i64(t1, v2, 0, es);
  for (i = 1; i < N; i++) {
    load_element_i64(t3, v2, i, es);
    tcg_gen_deposit_i64(t1, t1, t3, i << es, 1 << es);
  }
  // repeat for v3 into t2
  // store t1,t2 into v1.

Now you have only 3 temporaries, which is manageable.

The only question, when it comes to MO_8, is whether the code expansion of this
is reasonable (16 byte loads, 15 deposits, 2 stores -- minimum 33 insns,
probably 48 for x86_64 host), or whether a helper function would be better in
the end.  But then the same is true for all of the other merge & unpack
operations wrt MO_8.
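
To make the register-based idea concrete, a standalone model for packing
halfword elements into bytes. This is a sketch under assumptions, not the
TCG code: deposit64_model(), pack_u16_to_u8() and vpk_model() are
hypothetical names, offsets and lengths are given in bits, and element 0
is taken to be the most significant part of each doubleword.

```c
#include <assert.h>
#include <stdint.h>

/* Insert the low 'len' bits of 'field' into 'value' at bit offset 'ofs'. */
static uint64_t deposit64_model(uint64_t value, unsigned ofs, unsigned len,
                                uint64_t field)
{
    assert(len > 0 && len < 64 && ofs + len <= 64);
    uint64_t mask = ((1ull << len) - 1) << ofs;

    return (value & ~mask) | ((field << ofs) & mask);
}

/* Truncate 8 halfword elements to bytes and collect them in one doubleword. */
static uint64_t pack_u16_to_u8(const uint16_t src[8])
{
    uint64_t acc = 0;

    for (unsigned i = 0; i < 8; i++) {
        /* element 0 is leftmost, i.e. most significant */
        acc = deposit64_model(acc, 64 - 8 * (i + 1), 8, src[i]);
    }
    return acc;
}

/* Both sources are fully read into t1/t2 before the destination is written. */
static void vpk_model(uint64_t v1[2], const uint16_t v2[8],
                      const uint16_t v3[8])
{
    uint64_t t1 = pack_u16_to_u8(v2);
    uint64_t t2 = pack_u16_to_u8(v3);

    v1[0] = t1;
    v1[1] = t2;
}
```

Because nothing is stored until both accumulators are complete, overlap
between the destination and either source does not matter in this scheme.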


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 14/33] s390x/tcg: Implement VECTOR LOAD MULTIPLE
  2019-02-28 17:15       ` Richard Henderson
@ 2019-02-28 19:05         ` David Hildenbrand
  2019-03-01  6:34           ` Richard Henderson
  0 siblings, 1 reply; 94+ messages in thread
From: David Hildenbrand @ 2019-02-28 19:05 UTC (permalink / raw)
  To: Richard Henderson, Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 28.02.19 18:15, Richard Henderson wrote:
> On 2/28/19 12:36 AM, David Hildenbrand wrote:
>> On 27.02.19 17:02, Richard Henderson wrote:
>>> On 2/26/19 3:38 AM, David Hildenbrand wrote:
>>>> Also fairly easy to implement. One issue we have is that exceptions will
>>>> result in some vectors already being modified. At least handle it
>>>> consistently per vector by using a temporary vector. Good enough for
>>>> now, add a FIXME.
>>>>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>  target/s390x/insn-data.def      |  2 ++
>>>>  target/s390x/translate_vx.inc.c | 26 ++++++++++++++++++++++++++
>>>>  2 files changed, 28 insertions(+)
>>>
>>> I suppose the fixme is good enough.  For the record, I think you could do the
>>> check with just two loads -- the first and last quadword.  After that, none of
>>> the other loads can fault, and you can store everything else into the
>>> destination vectors as you read them.
>>
>> Aren't such approaches prone to races if other VCPUs invalidate page
>> tables/TLB entries?
> 
> No, because...
> 
>> (or am I messing up things and the MMU of this VCPU won't be touched
>> while in this block and once we touched all applicable pages, it cannot
>> fail anymore?)
> 
> Correct.
> 
> If vcpu 1 does a global invalidate, the time at which vcpu 2 acknowledges that
> invalidate is somewhat fluid.  VCPU 2 will see an interrupt, exit at a TB
> boundary, and then acknowledge.
> 
> VCPU 1 has to wait for the ack before it knows the operation is complete.
> 
> Thus no race within any given instruction's execution.

Okay, rings a bell, thanks! :)

So for writing from helpers, I can use probe_write(). What about testing
write access from TCG code?

I could do a load, followed by a store of the loaded value. This should
work in most cases (but could possibly be observed by somebody really
wanting to observe it - which is highly unlikely).


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 33/33] s390x/tcg: Implement VECTOR UNPACK *
  2019-02-28 18:22         ` Richard Henderson
@ 2019-02-28 19:45           ` David Hildenbrand
  0 siblings, 0 replies; 94+ messages in thread
From: David Hildenbrand @ 2019-02-28 19:45 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 28.02.19 19:22, Richard Henderson wrote:
> On 2/28/19 2:54 AM, David Hildenbrand wrote:
>> Hmm, as v2 and v3 are handled concatenated it is not that easy. I am not
>> sure if we can handle this without a temporary vector.
>>
>> I thought about packing them first interleaved
>>
>> v2 = [v2e0, v2e1]
>> v3 = [v3e0, v3e1]
>> v1 = [v2e0_packed, v3e0_packed, v2e1_packed, v3e1_packed]
>>
>> And then restoring the right order
>>
>> v1 = [v2e0_packed, v2e1_packed, v3e0_packed, v3e1_packed]
>>
>> But then the second operation seems to be the problem. That shuffling
>> would have to be hard coded as far as I can see. (shuffling with MO_8 is
>> nasty -> 14 elements have to be exchanged, in my opinion possibly needing
>> 14 temporary variables)
> 
> I suppose you could do it in registers.
> 
>   load_element_i64(t1, v2, 0, es);
>   for (i = 1; i < N; i++) {
>     load_element_i64(t3, v2, i, es);
>     tcg_gen_deposit_i64(t1, t1, t3, i << es, 1 << es);
>   }
>   // repeat for v3 into t2
>   // store t1,t2 into v1.
> 
> Now you have only 3 temporaries, which is manageable.
> 
> The only question, when it comes to MO_8, is whether the code expansion of this
> is reasonable (16 byte loads, 15 deposits, 2 stores -- minimum 33 insns,
> probably 48 for x86_64 host), or whether a helper function would be better in
> the end.  But then the same is true for all of the other merge & unpack
> operations wrt MO_8.

And it would only apply when dst==src. Will have a try what looks "less
ugly" :) Thanks!

> 
> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [Qemu-devel] [PATCH v1 14/33] s390x/tcg: Implement VECTOR LOAD MULTIPLE
  2019-02-28 19:05         ` David Hildenbrand
@ 2019-03-01  6:34           ` Richard Henderson
  0 siblings, 0 replies; 94+ messages in thread
From: Richard Henderson @ 2019-03-01  6:34 UTC (permalink / raw)
  To: David Hildenbrand, Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 2/28/19 11:05 AM, David Hildenbrand wrote:
> So for writing from helpers, I can use probe_write(). What about testing
> write access from TCG code?
> 
> I could do a load, followed by a store of the loaded value. This should
> work in most cases (but could possibly be observed by somebody really
> wanting to observe it - which is highly unlikely).

I would call a helper for probe_write.
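
What such a helper buys can be modelled without any QEMU API: check the
whole range for write access first, then store, so that a failing access
leaves memory untouched. struct guest_mem, probe_write_model() and
store16_checked() are hypothetical, and the 4096-byte page size is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_NUM_PAGES 2u

/* Hypothetical guest memory: two pages with per-page write permission. */
struct guest_mem {
    uint8_t data[MODEL_NUM_PAGES * MODEL_PAGE_SIZE];
    bool writable[MODEL_NUM_PAGES];
};

/* Stand-in for a write probe: true if every page in the range is writable. */
static bool probe_write_model(const struct guest_mem *m, uint64_t addr,
                              uint64_t len)
{
    assert(len > 0 && addr + len <= sizeof(m->data));
    for (uint64_t p = addr / MODEL_PAGE_SIZE;
         p <= (addr + len - 1) / MODEL_PAGE_SIZE; p++) {
        if (!m->writable[p]) {
            return false;
        }
    }
    return true;
}

/* Store 16 bytes only after the whole range has been probed. */
static bool store16_checked(struct guest_mem *m, uint64_t addr,
                            const uint8_t src[16])
{
    if (!probe_write_model(m, addr, 16)) {
        return false;            /* models the exception; memory untouched */
    }
    memcpy(&m->data[addr], src, 16);
    return true;
}
```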


r~

^ permalink raw reply	[flat|nested] 94+ messages in thread

end of thread, other threads:[~2019-03-01  6:34 UTC | newest]

Thread overview: 94+ messages
2019-02-26 11:38 [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 01/33] s390x/tcg: Define vector instruction formats David Hildenbrand
2019-02-26 18:24   ` Richard Henderson
2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 02/33] s390x/tcg: Check vector register instructions at central point David Hildenbrand
2019-02-26 18:26   ` Richard Henderson
2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 03/33] s390x: Add one temporary vector register in CPU state for TCG David Hildenbrand
2019-02-26 18:36   ` Richard Henderson
2019-02-26 18:45     ` David Hildenbrand
2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 04/33] s390x/tcg: Utilities for vector instruction helpers David Hildenbrand
2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 05/33] s390x/tcg: Implement VECTOR GATHER ELEMENT David Hildenbrand
2019-02-26 18:44   ` Richard Henderson
2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 06/33] s390x/tcg: Implement VECTOR GENERATE BYTE MASK David Hildenbrand
2019-02-26 19:12   ` Richard Henderson
2019-02-26 19:23     ` David Hildenbrand
2019-02-26 21:23       ` David Hildenbrand
2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 07/33] s390x/tcg: Implement VECTOR GENERATE MASK David Hildenbrand
2019-02-26 21:16   ` David Hildenbrand
2019-02-27 15:29     ` Richard Henderson
2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 08/33] s390x/tcg: Implement VECTOR LOAD David Hildenbrand
2019-02-27 15:39   ` Richard Henderson
2019-02-28  7:48     ` David Hildenbrand
2019-02-28 16:34       ` Richard Henderson
2019-02-28 16:40         ` David Hildenbrand
2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 09/33] s390x/tcg: Implement VECTOR LOAD AND REPLICATE David Hildenbrand
2019-02-27 15:40   ` Richard Henderson
2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 10/33] s390x/tcg: Implement VECTOR LOAD ELEMENT David Hildenbrand
2019-02-27 15:42   ` Richard Henderson
2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 11/33] s390x/tcg: Implement VECTOR LOAD ELEMENT IMMEDIATE David Hildenbrand
2019-02-27 15:44   ` Richard Henderson
2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 12/33] s390x/tcg: Implement VECTOR LOAD GR FROM VR ELEMENT David Hildenbrand
2019-02-27 15:53   ` Richard Henderson
2019-02-28  8:27     ` David Hildenbrand
2019-02-28 17:10       ` Richard Henderson
2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 13/33] s390x/tcg: Implement VECTOR LOAD LOGICAL ELEMENT AND ZERO David Hildenbrand
2019-02-27 15:56   ` Richard Henderson
2019-02-28  8:30     ` David Hildenbrand
2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 14/33] s390x/tcg: Implement VECTOR LOAD MULTIPLE David Hildenbrand
2019-02-27 16:02   ` Richard Henderson
2019-02-28  8:36     ` David Hildenbrand
2019-02-28 17:15       ` Richard Henderson
2019-02-28 19:05         ` David Hildenbrand
2019-03-01  6:34           ` Richard Henderson
2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 15/33] s390x/tcg: Implement VECTOR LOAD TO BLOCK BOUNDARY David Hildenbrand
2019-02-27 16:08   ` Richard Henderson
2019-02-28  8:40     ` David Hildenbrand
2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 16/33] s390x/tcg: Implement VECTOR LOAD VR ELEMENT FROM GR David Hildenbrand
2019-02-27 16:08   ` Richard Henderson
2019-02-26 11:38 ` [Qemu-devel] [PATCH v1 17/33] s390x/tcg: Implement VECTOR LOAD VR FROM GRS DISJOINT David Hildenbrand
2019-02-27 16:10   ` Richard Henderson
2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 18/33] s390x/tcg: Implement VECTOR LOAD WITH LENGTH David Hildenbrand
2019-02-27 16:12   ` Richard Henderson
2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 19/33] s390x/tcg: Implement VECTOR MERGE (HIGH|LOW) David Hildenbrand
2019-02-27 16:14   ` Richard Henderson
2019-02-27 16:20   ` Richard Henderson
2019-02-28  8:54     ` David Hildenbrand
2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 20/33] s390x/tcg: Implement VECTOR PACK David Hildenbrand
2019-02-27 23:11   ` Richard Henderson
2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 21/33] s390x/tcg: Implement VECTOR PACK (LOGICAL) SATURATE David Hildenbrand
2019-02-27 23:18   ` Richard Henderson
2019-02-27 23:24   ` Richard Henderson
2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 22/33] s390x/tcg: Implement VECTOR PERMUTE David Hildenbrand
2019-02-27 23:21   ` Richard Henderson
2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 23/33] s390x/tcg: Implement VECTOR PERMUTE DOUBLEWORD IMMEDIATE David Hildenbrand
2019-02-27 23:26   ` Richard Henderson
2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 24/33] s390x/tcg: Implement VECTOR REPLICATE David Hildenbrand
2019-02-27 23:29   ` Richard Henderson
2019-02-27 23:31   ` Richard Henderson
2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 25/33] s390x/tcg: Implement VECTOR REPLICATE IMMEDIATE David Hildenbrand
2019-02-27 23:39   ` Richard Henderson
2019-02-28  9:07     ` David Hildenbrand
2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 26/33] s390x/tcg: Implement VECTOR SCATTER ELEMENT David Hildenbrand
2019-02-27 23:40   ` Richard Henderson
2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 27/33] s390x/tcg: Implement VECTOR SELECT David Hildenbrand
2019-02-27 23:42   ` Richard Henderson
2019-02-28  9:09     ` David Hildenbrand
2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 28/33] s390x/tcg: Implement VECTOR SIGN EXTEND TO DOUBLEWORD David Hildenbrand
2019-02-27 23:43   ` Richard Henderson
2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 29/33] s390x/tcg: Implement VECTOR STORE David Hildenbrand
2019-02-27 23:46   ` Richard Henderson
2019-02-28  9:11     ` David Hildenbrand
2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 30/33] s390x/tcg: Implement VECTOR STORE ELEMENT David Hildenbrand
2019-02-27 23:47   ` Richard Henderson
2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 31/33] s390x/tcg: Implement VECTOR STORE MULTIPLE David Hildenbrand
2019-02-27 23:48   ` Richard Henderson
2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 32/33] s390x/tcg: Implement VECTOR STORE WITH LENGTH David Hildenbrand
2019-02-27 23:49   ` Richard Henderson
2019-02-28  9:13     ` David Hildenbrand
2019-02-26 11:39 ` [Qemu-devel] [PATCH v1 33/33] s390x/tcg: Implement VECTOR UNPACK * David Hildenbrand
2019-02-28  0:03   ` Richard Henderson
2019-02-28  9:28     ` David Hildenbrand
2019-02-28 10:54       ` David Hildenbrand
2019-02-28 18:22         ` Richard Henderson
2019-02-28 19:45           ` David Hildenbrand
2019-02-28  7:24 ` [Qemu-devel] [PATCH v1 00/33] s390x/tcg: Vector Instruction Support Part 1 David Hildenbrand
