All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v1 00/41] s390x/tcg: Vector Instruction Support Part 2
@ 2019-04-11 10:07 ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

This is the second part of vector instruction support for s390x. It is
based on the series soon to land upstream:
    [PATCH 0/9] tcg: Add tcg_gen_extract2_{i32,i64}

Part 1: Vector Support Instructions
Part 2: Vector Integer Instructions
Part 3: Vector String Instructions
Part 4: Vector Floating-Point Instructions

The current state can be found at (kept updated):
    https://github.com/davidhildenbrand/qemu/tree/vx

With the current state I can boot Linux kernel + user space compiled with
SIMD support. This allows to boot distributions compiled exclusively for
z13, requiring SIMD support. Also, it is now possible to build a complete
kernel using rpmbuild as quite some issues have been sorted out.

In this part, all Vector Integer Instructions introduced with the
"Vector Facility" are added. Some instructions part of the Vector Extension
Facilities are also added.

David Hildenbrand (41):
  tcg: Implement tcg_gen_gvec_3i()
  s390x/tcg: Implement VECTOR ADD
  s390x/tcg: Implement VECTOR ADD COMPUTE CARRY
  s390x/tcg: Implement VECTOR ADD WITH CARRY
  s390x/tcg: Implement VECTOR ADD WITH CARRY COMPUTE CARRY
  s390x/tcg: Implement VECTOR AND (WITH COMPLEMENT)
  s390x/tcg: Implement VECTOR AVERAGE
  s390x/tcg: Implement VECTOR AVERAGE LOGICAL
  s390x/tcg: Implement VECTOR CHECKSUM
  s390x/tcg: Implement VECTOR ELEMENT COMPARE *
  s390x/tcg: Implement VECTOR COMPARE *
  s390x/tcg: Implement VECTOR COUNT LEADING ZEROS
  s390x/tcg: Implement VECTOR COUNT TRAILING ZEROS
  s390x/tcg: Implement VECTOR EXCLUSIVE OR
  s390x/tcg: Implement VECTOR GALOIS FIELD MULTIPLY SUM (AND ACCUMULATE)
  s390x/tcg: Implement VECTOR LOAD COMPLEMENT
  s390x/tcg: Implement VECTOR LOAD POSITIVE
  s390x/tcg: Implement VECTOR (MAXIMUM|MINIMUM) (LOGICAL)
  s390x/tcg: Implement VECTOR MULTIPLY AND ADD *
  s390x/tcg: Implement VECTOR MULTIPLY *
  s390x/tcg: Implement VECTOR NAND
  s390x/tcg: Implement VECTOR NOR
  s390x/tcg: Implement VECTOR NOT EXCLUSIVE OR
  s390x/tcg: Implement VECTOR OR
  s390x/tcg: Implement VECTOR OR WITH COMPLEMENT
  s390x/tcg: Implement VECTOR POPULATION COUNT
  s390x/tcg: Implement VECTOR ELEMENT ROTATE LEFT LOGICAL
  s390x/tcg: Implement VECTOR ELEMENT ROTATE AND INSERT UNDER MASK
  s390x/tcg: Implement VECTOR ELEMENT SHIFT
  s390x/tcg: Implement VECTOR SHIFT LEFT (BY BYTE)
  s390x/tcg: Implement VECTOR SHIFT LEFT DOUBLE BY BYTE
  s390x/tcg: Implement VECTOR SHIFT RIGHT ARITHMETIC
  s390x/tcg: Implement VECTOR SHIFT RIGHT LOGICAL *
  s390x/tcg: Implement VECTOR SUBTRACT
  s390x/tcg: Implement VECTOR SUBTRACT COMPUTE BORROW INDICATION
  s390x/tcg: Implement VECTOR SUBTRACT WITH BORROW INDICATION
  s390x/tcg: Implement VECTOR SUBTRACT WITH BORROW COMPUTE BORROW
    INDICATION
  s390x/tcg: Implement VECTOR SUM ACROSS DOUBLEWORD
  s390x/tcg: Implement VECTOR SUM ACROSS QUADWORD
  s390x/tcg: Implement VECTOR SUM ACROSS WORD
  s390x/tcg: Implement VECTOR TEST UNDER MASK

 target/s390x/Makefile.objs      |    2 +-
 target/s390x/cc_helper.c        |   17 +
 target/s390x/helper.c           |    1 +
 target/s390x/helper.h           |   91 ++
 target/s390x/insn-data.def      |  137 +++
 target/s390x/internal.h         |    1 +
 target/s390x/translate.c        |    2 +
 target/s390x/translate_vx.inc.c | 1393 +++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   |  839 +++++++++++++++++++
 tcg/tcg-op-gvec.c               |  139 +++
 tcg/tcg-op-gvec.h               |   24 +
 11 files changed, 2645 insertions(+), 1 deletion(-)
 create mode 100644 target/s390x/vec_int_helper.c

-- 
2.20.1

^ permalink raw reply	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 00/41] s390x/tcg: Vector Instruction Support Part 2
@ 2019-04-11 10:07 ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

This is the second part of vector instruction support for s390x. It is
based on the series soon to land upstream:
    [PATCH 0/9] tcg: Add tcg_gen_extract2_{i32,i64}

Part 1: Vector Support Instructions
Part 2: Vector Integer Instructions
Part 3: Vector String Instructions
Part 4: Vector Floating-Point Instructions

The current state can be found at (kept updated):
    https://github.com/davidhildenbrand/qemu/tree/vx

With the current state I can boot Linux kernel + user space compiled with
SIMD support. This allows to boot distributions compiled exclusively for
z13, requiring SIMD support. Also, it is now possible to build a complete
kernel using rpmbuild as quite some issues have been sorted out.

In this part, all Vector Integer Instructions introduced with the
"Vector Facility" are added. Some instructions part of the Vector Extension
Facilities are also added.

David Hildenbrand (41):
  tcg: Implement tcg_gen_gvec_3i()
  s390x/tcg: Implement VECTOR ADD
  s390x/tcg: Implement VECTOR ADD COMPUTE CARRY
  s390x/tcg: Implement VECTOR ADD WITH CARRY
  s390x/tcg: Implement VECTOR ADD WITH CARRY COMPUTE CARRY
  s390x/tcg: Implement VECTOR AND (WITH COMPLEMENT)
  s390x/tcg: Implement VECTOR AVERAGE
  s390x/tcg: Implement VECTOR AVERAGE LOGICAL
  s390x/tcg: Implement VECTOR CHECKSUM
  s390x/tcg: Implement VECTOR ELEMENT COMPARE *
  s390x/tcg: Implement VECTOR COMPARE *
  s390x/tcg: Implement VECTOR COUNT LEADING ZEROS
  s390x/tcg: Implement VECTOR COUNT TRAILING ZEROS
  s390x/tcg: Implement VECTOR EXCLUSIVE OR
  s390x/tcg: Implement VECTOR GALOIS FIELD MULTIPLY SUM (AND ACCUMULATE)
  s390x/tcg: Implement VECTOR LOAD COMPLEMENT
  s390x/tcg: Implement VECTOR LOAD POSITIVE
  s390x/tcg: Implement VECTOR (MAXIMUM|MINIMUM) (LOGICAL)
  s390x/tcg: Implement VECTOR MULTIPLY AND ADD *
  s390x/tcg: Implement VECTOR MULTIPLY *
  s390x/tcg: Implement VECTOR NAND
  s390x/tcg: Implement VECTOR NOR
  s390x/tcg: Implement VECTOR NOT EXCLUSIVE OR
  s390x/tcg: Implement VECTOR OR
  s390x/tcg: Implement VECTOR OR WITH COMPLEMENT
  s390x/tcg: Implement VECTOR POPULATION COUNT
  s390x/tcg: Implement VECTOR ELEMENT ROTATE LEFT LOGICAL
  s390x/tcg: Implement VECTOR ELEMENT ROTATE AND INSERT UNDER MASK
  s390x/tcg: Implement VECTOR ELEMENT SHIFT
  s390x/tcg: Implement VECTOR SHIFT LEFT (BY BYTE)
  s390x/tcg: Implement VECTOR SHIFT LEFT DOUBLE BY BYTE
  s390x/tcg: Implement VECTOR SHIFT RIGHT ARITHMETIC
  s390x/tcg: Implement VECTOR SHIFT RIGHT LOGICAL *
  s390x/tcg: Implement VECTOR SUBTRACT
  s390x/tcg: Implement VECTOR SUBTRACT COMPUTE BORROW INDICATION
  s390x/tcg: Implement VECTOR SUBTRACT WITH BORROW INDICATION
  s390x/tcg: Implement VECTOR SUBTRACT WITH BORROW COMPUTE BORROW
    INDICATION
  s390x/tcg: Implement VECTOR SUM ACROSS DOUBLEWORD
  s390x/tcg: Implement VECTOR SUM ACROSS QUADWORD
  s390x/tcg: Implement VECTOR SUM ACROSS WORD
  s390x/tcg: Implement VECTOR TEST UNDER MASK

 target/s390x/Makefile.objs      |    2 +-
 target/s390x/cc_helper.c        |   17 +
 target/s390x/helper.c           |    1 +
 target/s390x/helper.h           |   91 ++
 target/s390x/insn-data.def      |  137 +++
 target/s390x/internal.h         |    1 +
 target/s390x/translate.c        |    2 +
 target/s390x/translate_vx.inc.c | 1393 +++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   |  839 +++++++++++++++++++
 tcg/tcg-op-gvec.c               |  139 +++
 tcg/tcg-op-gvec.h               |   24 +
 11 files changed, 2645 insertions(+), 1 deletion(-)
 create mode 100644 target/s390x/vec_int_helper.c

-- 
2.20.1



^ permalink raw reply	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 01/41] tcg: Implement tcg_gen_gvec_3i()
@ 2019-04-11 10:07   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Let's add tcg_gen_gvec_3i(), similar to tcg_gen_gvec_2i(), however
without introducing "gen_helper_gvec_3i *fnoi", as it isn't needed
for now.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 tcg/tcg-op-gvec.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg-op-gvec.h |  24 ++++++++
 2 files changed, 163 insertions(+)

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 0996ef0812..f831adb4e7 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -663,6 +663,29 @@ static void expand_3_i32(uint32_t dofs, uint32_t aofs,
     tcg_temp_free_i32(t0);
 }
 
+static void expand_3i_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                          uint32_t oprsz, int32_t c, bool load_dest,
+                          void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32, int32_t))
+{
+    TCGv_i32 t0 = tcg_temp_new_i32();
+    TCGv_i32 t1 = tcg_temp_new_i32();
+    TCGv_i32 t2 = tcg_temp_new_i32();
+    uint32_t i;
+
+    for (i = 0; i < oprsz; i += 4) {
+        tcg_gen_ld_i32(t0, cpu_env, aofs + i);
+        tcg_gen_ld_i32(t1, cpu_env, bofs + i);
+        if (load_dest) {
+            tcg_gen_ld_i32(t2, cpu_env, dofs + i);
+        }
+        fni(t2, t0, t1, c);
+        tcg_gen_st_i32(t2, cpu_env, dofs + i);
+    }
+    tcg_temp_free_i32(t0);
+    tcg_temp_free_i32(t1);
+    tcg_temp_free_i32(t2);
+}
+
 /* Expand OPSZ bytes worth of three-operand operations using i32 elements.  */
 static void expand_4_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                          uint32_t cofs, uint32_t oprsz, bool write_aofs,
@@ -770,6 +793,29 @@ static void expand_3_i64(uint32_t dofs, uint32_t aofs,
     tcg_temp_free_i64(t0);
 }
 
+static void expand_3i_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                          uint32_t oprsz, int64_t c, bool load_dest,
+                          void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64, int64_t))
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    uint32_t i;
+
+    for (i = 0; i < oprsz; i += 8) {
+        tcg_gen_ld_i64(t0, cpu_env, aofs + i);
+        tcg_gen_ld_i64(t1, cpu_env, bofs + i);
+        if (load_dest) {
+            tcg_gen_ld_i64(t2, cpu_env, dofs + i);
+        }
+        fni(t2, t0, t1, c);
+        tcg_gen_st_i64(t2, cpu_env, dofs + i);
+    }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
 /* Expand OPSZ bytes worth of three-operand operations using i64 elements.  */
 static void expand_4_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                          uint32_t cofs, uint32_t oprsz, bool write_aofs,
@@ -883,6 +929,35 @@ static void expand_3_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
     tcg_temp_free_vec(t0);
 }
 
+/*
+ * Expand OPSZ bytes worth of three-vector operands and an immediate operand
+ * using host vectors.
+ */
+static void expand_3i_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
+                          uint32_t bofs, uint32_t oprsz, uint32_t tysz,
+                          TCGType type, int64_t c, bool load_dest,
+                          void (*fni)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec,
+                                      int64_t))
+{
+    TCGv_vec t0 = tcg_temp_new_vec(type);
+    TCGv_vec t1 = tcg_temp_new_vec(type);
+    TCGv_vec t2 = tcg_temp_new_vec(type);
+    uint32_t i;
+
+    for (i = 0; i < oprsz; i += tysz) {
+        tcg_gen_ld_vec(t0, cpu_env, aofs + i);
+        tcg_gen_ld_vec(t1, cpu_env, bofs + i);
+        if (load_dest) {
+            tcg_gen_ld_vec(t2, cpu_env, dofs + i);
+        }
+        fni(vece, t2, t0, t1, c);
+        tcg_gen_st_vec(t2, cpu_env, dofs + i);
+    }
+    tcg_temp_free_vec(t0);
+    tcg_temp_free_vec(t1);
+    tcg_temp_free_vec(t2);
+}
+
 /* Expand OPSZ bytes worth of four-operand operations using host vectors.  */
 static void expand_4_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
                          uint32_t bofs, uint32_t cofs, uint32_t oprsz,
@@ -1174,6 +1249,70 @@ void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     }
 }
 
+/* Expand a vector operation with three vectors and an immediate.  */
+void tcg_gen_gvec_3i(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                     uint32_t oprsz, uint32_t maxsz, int64_t c,
+                     const GVecGen3i *g)
+{
+    TCGType type;
+    uint32_t some;
+
+    check_size_align(oprsz, maxsz, dofs | aofs | bofs);
+    check_overlap_3(dofs, aofs, bofs, maxsz);
+
+    type = 0;
+    if (g->fniv) {
+        type = choose_vector_type(g->opc, g->vece, oprsz, g->prefer_i64);
+    }
+    switch (type) {
+    case TCG_TYPE_V256:
+        /*
+         * Recall that ARM SVE allows vector sizes that are not a
+         * power of 2, but always a multiple of 16.  The intent is
+         * that e.g. size == 80 would be expanded with 2x32 + 1x16.
+         */
+        some = QEMU_ALIGN_DOWN(oprsz, 32);
+        expand_3i_vec(g->vece, dofs, aofs, bofs, some, 32, TCG_TYPE_V256,
+                      c, g->load_dest, g->fniv);
+        if (some == oprsz) {
+            break;
+        }
+        dofs += some;
+        aofs += some;
+        bofs += some;
+        oprsz -= some;
+        maxsz -= some;
+        /* fallthru */
+    case TCG_TYPE_V128:
+        expand_3i_vec(g->vece, dofs, aofs, bofs, oprsz, 16, TCG_TYPE_V128,
+                      c, g->load_dest, g->fniv);
+        break;
+    case TCG_TYPE_V64:
+        expand_3i_vec(g->vece, dofs, aofs, bofs, oprsz, 8, TCG_TYPE_V64,
+                      c, g->load_dest, g->fniv);
+        break;
+
+    case 0:
+        if (g->fni8 && check_size_impl(oprsz, 8)) {
+            expand_3i_i64(dofs, aofs, bofs, oprsz, c, g->load_dest, g->fni8);
+        } else if (g->fni4 && check_size_impl(oprsz, 4)) {
+            expand_3i_i32(dofs, aofs, bofs, oprsz, c, g->load_dest, g->fni4);
+        } else {
+            assert(g->fno != NULL);
+            tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, c, g->fno);
+            return;
+        }
+        break;
+
+    default:
+        g_assert_not_reached();
+    }
+
+    if (oprsz < maxsz) {
+        expand_clr(dofs + oprsz, maxsz - oprsz);
+    }
+}
+
 /* Expand a vector four-operand operation.  */
 void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs,
                     uint32_t oprsz, uint32_t maxsz, const GVecGen4 *g)
diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index 850da32ded..c093243c4c 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -164,6 +164,27 @@ typedef struct {
     bool load_dest;
 } GVecGen3;
 
+typedef struct {
+    /*
+     * Expand inline as a 64-bit or 32-bit integer. Only one of these will be
+     * non-NULL.
+     */
+    void (*fni8)(TCGv_i64, TCGv_i64, TCGv_i64, int64_t);
+    void (*fni4)(TCGv_i32, TCGv_i32, TCGv_i32, int32_t);
+    /* Expand inline with a host vector type.  */
+    void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec, int64_t);
+    /* Expand out-of-line helper w/descriptor, data in descriptor.  */
+    gen_helper_gvec_3 *fno;
+    /* The opcode, if any, to which this corresponds.  */
+    TCGOpcode opc;
+    /* The vector element size, if applicable.  */
+    uint8_t vece;
+    /* Prefer i64 to v64.  */
+    bool prefer_i64;
+    /* Load dest as a 3rd source operand.  */
+    bool load_dest;
+} GVecGen3i;
+
 typedef struct {
     /* Expand inline as a 64-bit or 32-bit integer.
        Only one of these will be non-NULL.  */
@@ -193,6 +214,9 @@ void tcg_gen_gvec_2s(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
                      uint32_t maxsz, TCGv_i64 c, const GVecGen2s *);
 void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                     uint32_t oprsz, uint32_t maxsz, const GVecGen3 *);
+void tcg_gen_gvec_3i(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                     uint32_t oprsz, uint32_t maxsz, int64_t c,
+                     const GVecGen3i *);
 void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs,
                     uint32_t oprsz, uint32_t maxsz, const GVecGen4 *);
 
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 01/41] tcg: Implement tcg_gen_gvec_3i()
@ 2019-04-11 10:07   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Let's add tcg_gen_gvec_3i(), similar to tcg_gen_gvec_2i(), however
without introducing "gen_helper_gvec_3i *fnoi", as it isn't needed
for now.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 tcg/tcg-op-gvec.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg-op-gvec.h |  24 ++++++++
 2 files changed, 163 insertions(+)

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 0996ef0812..f831adb4e7 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -663,6 +663,29 @@ static void expand_3_i32(uint32_t dofs, uint32_t aofs,
     tcg_temp_free_i32(t0);
 }
 
+static void expand_3i_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                          uint32_t oprsz, int32_t c, bool load_dest,
+                          void (*fni)(TCGv_i32, TCGv_i32, TCGv_i32, int32_t))
+{
+    TCGv_i32 t0 = tcg_temp_new_i32();
+    TCGv_i32 t1 = tcg_temp_new_i32();
+    TCGv_i32 t2 = tcg_temp_new_i32();
+    uint32_t i;
+
+    for (i = 0; i < oprsz; i += 4) {
+        tcg_gen_ld_i32(t0, cpu_env, aofs + i);
+        tcg_gen_ld_i32(t1, cpu_env, bofs + i);
+        if (load_dest) {
+            tcg_gen_ld_i32(t2, cpu_env, dofs + i);
+        }
+        fni(t2, t0, t1, c);
+        tcg_gen_st_i32(t2, cpu_env, dofs + i);
+    }
+    tcg_temp_free_i32(t0);
+    tcg_temp_free_i32(t1);
+    tcg_temp_free_i32(t2);
+}
+
 /* Expand OPSZ bytes worth of three-operand operations using i32 elements.  */
 static void expand_4_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                          uint32_t cofs, uint32_t oprsz, bool write_aofs,
@@ -770,6 +793,29 @@ static void expand_3_i64(uint32_t dofs, uint32_t aofs,
     tcg_temp_free_i64(t0);
 }
 
+static void expand_3i_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                          uint32_t oprsz, int64_t c, bool load_dest,
+                          void (*fni)(TCGv_i64, TCGv_i64, TCGv_i64, int64_t))
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    uint32_t i;
+
+    for (i = 0; i < oprsz; i += 8) {
+        tcg_gen_ld_i64(t0, cpu_env, aofs + i);
+        tcg_gen_ld_i64(t1, cpu_env, bofs + i);
+        if (load_dest) {
+            tcg_gen_ld_i64(t2, cpu_env, dofs + i);
+        }
+        fni(t2, t0, t1, c);
+        tcg_gen_st_i64(t2, cpu_env, dofs + i);
+    }
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+}
+
 /* Expand OPSZ bytes worth of three-operand operations using i64 elements.  */
 static void expand_4_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                          uint32_t cofs, uint32_t oprsz, bool write_aofs,
@@ -883,6 +929,35 @@ static void expand_3_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
     tcg_temp_free_vec(t0);
 }
 
+/*
+ * Expand OPSZ bytes worth of three-vector operands and an immediate operand
+ * using host vectors.
+ */
+static void expand_3i_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
+                          uint32_t bofs, uint32_t oprsz, uint32_t tysz,
+                          TCGType type, int64_t c, bool load_dest,
+                          void (*fni)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec,
+                                      int64_t))
+{
+    TCGv_vec t0 = tcg_temp_new_vec(type);
+    TCGv_vec t1 = tcg_temp_new_vec(type);
+    TCGv_vec t2 = tcg_temp_new_vec(type);
+    uint32_t i;
+
+    for (i = 0; i < oprsz; i += tysz) {
+        tcg_gen_ld_vec(t0, cpu_env, aofs + i);
+        tcg_gen_ld_vec(t1, cpu_env, bofs + i);
+        if (load_dest) {
+            tcg_gen_ld_vec(t2, cpu_env, dofs + i);
+        }
+        fni(vece, t2, t0, t1, c);
+        tcg_gen_st_vec(t2, cpu_env, dofs + i);
+    }
+    tcg_temp_free_vec(t0);
+    tcg_temp_free_vec(t1);
+    tcg_temp_free_vec(t2);
+}
+
 /* Expand OPSZ bytes worth of four-operand operations using host vectors.  */
 static void expand_4_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
                          uint32_t bofs, uint32_t cofs, uint32_t oprsz,
@@ -1174,6 +1249,70 @@ void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,
     }
 }
 
+/* Expand a vector operation with three vectors and an immediate.  */
+void tcg_gen_gvec_3i(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                     uint32_t oprsz, uint32_t maxsz, int64_t c,
+                     const GVecGen3i *g)
+{
+    TCGType type;
+    uint32_t some;
+
+    check_size_align(oprsz, maxsz, dofs | aofs | bofs);
+    check_overlap_3(dofs, aofs, bofs, maxsz);
+
+    type = 0;
+    if (g->fniv) {
+        type = choose_vector_type(g->opc, g->vece, oprsz, g->prefer_i64);
+    }
+    switch (type) {
+    case TCG_TYPE_V256:
+        /*
+         * Recall that ARM SVE allows vector sizes that are not a
+         * power of 2, but always a multiple of 16.  The intent is
+         * that e.g. size == 80 would be expanded with 2x32 + 1x16.
+         */
+        some = QEMU_ALIGN_DOWN(oprsz, 32);
+        expand_3i_vec(g->vece, dofs, aofs, bofs, some, 32, TCG_TYPE_V256,
+                      c, g->load_dest, g->fniv);
+        if (some == oprsz) {
+            break;
+        }
+        dofs += some;
+        aofs += some;
+        bofs += some;
+        oprsz -= some;
+        maxsz -= some;
+        /* fallthru */
+    case TCG_TYPE_V128:
+        expand_3i_vec(g->vece, dofs, aofs, bofs, oprsz, 16, TCG_TYPE_V128,
+                      c, g->load_dest, g->fniv);
+        break;
+    case TCG_TYPE_V64:
+        expand_3i_vec(g->vece, dofs, aofs, bofs, oprsz, 8, TCG_TYPE_V64,
+                      c, g->load_dest, g->fniv);
+        break;
+
+    case 0:
+        if (g->fni8 && check_size_impl(oprsz, 8)) {
+            expand_3i_i64(dofs, aofs, bofs, oprsz, c, g->load_dest, g->fni8);
+        } else if (g->fni4 && check_size_impl(oprsz, 4)) {
+            expand_3i_i32(dofs, aofs, bofs, oprsz, c, g->load_dest, g->fni4);
+        } else {
+            assert(g->fno != NULL);
+            tcg_gen_gvec_3_ool(dofs, aofs, bofs, oprsz, maxsz, c, g->fno);
+            return;
+        }
+        break;
+
+    default:
+        g_assert_not_reached();
+    }
+
+    if (oprsz < maxsz) {
+        expand_clr(dofs + oprsz, maxsz - oprsz);
+    }
+}
+
 /* Expand a vector four-operand operation.  */
 void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs,
                     uint32_t oprsz, uint32_t maxsz, const GVecGen4 *g)
diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index 850da32ded..c093243c4c 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -164,6 +164,27 @@ typedef struct {
     bool load_dest;
 } GVecGen3;
 
+typedef struct {
+    /*
+     * Expand inline as a 64-bit or 32-bit integer. Only one of these will be
+     * non-NULL.
+     */
+    void (*fni8)(TCGv_i64, TCGv_i64, TCGv_i64, int64_t);
+    void (*fni4)(TCGv_i32, TCGv_i32, TCGv_i32, int32_t);
+    /* Expand inline with a host vector type.  */
+    void (*fniv)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec, int64_t);
+    /* Expand out-of-line helper w/descriptor, data in descriptor.  */
+    gen_helper_gvec_3 *fno;
+    /* The opcode, if any, to which this corresponds.  */
+    TCGOpcode opc;
+    /* The vector element size, if applicable.  */
+    uint8_t vece;
+    /* Prefer i64 to v64.  */
+    bool prefer_i64;
+    /* Load dest as a 3rd source operand.  */
+    bool load_dest;
+} GVecGen3i;
+
 typedef struct {
     /* Expand inline as a 64-bit or 32-bit integer.
        Only one of these will be non-NULL.  */
@@ -193,6 +214,9 @@ void tcg_gen_gvec_2s(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
                      uint32_t maxsz, TCGv_i64 c, const GVecGen2s *);
 void tcg_gen_gvec_3(uint32_t dofs, uint32_t aofs, uint32_t bofs,
                     uint32_t oprsz, uint32_t maxsz, const GVecGen3 *);
+void tcg_gen_gvec_3i(uint32_t dofs, uint32_t aofs, uint32_t bofs,
+                     uint32_t oprsz, uint32_t maxsz, int64_t c,
+                     const GVecGen3i *);
 void tcg_gen_gvec_4(uint32_t dofs, uint32_t aofs, uint32_t bofs, uint32_t cofs,
                     uint32_t oprsz, uint32_t maxsz, const GVecGen4 *);
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 02/41] s390x/tcg: Implement VECTOR ADD
@ 2019-04-11 10:07   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Introduce two types of fancy new helpers that will be reused a couple of
times

1. gen_gvec_fn_3: Call an existing tcg_gen_gvec_X function with 3
   parameters, simplifying parameter passing
2. gen_gvec128_3_i64: Call a function that performs 128 bit calculations
   using two 64 bit values per vector.

Luckily, for VECTOR ADD we already have everything we need.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  5 ++++
 target/s390x/translate_vx.inc.c | 52 +++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 71fa9b8d6c..74a0ccc770 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1054,6 +1054,11 @@
 /* VECTOR UNPACK LOGICAL LOW */
     F(0xe7d4, VUPLL,   VRR_a, V,   0, 0, 0, 0, vup, 0, IF_VEC)
 
+/* === Vector Integer Instructions === */
+
+/* VECTOR ADD */
+    F(0xe7f3, VA,      VRR_c, V,   0, 0, 0, 0, va, 0, IF_VEC)
+
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
     E(0xb250, CSP,     RRE,   Z,   r1_32u, ra2, r1_P, 0, csp, 0, MO_TEUL, IF_PRIV)
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 76f9a5d939..2f84ea0511 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -157,6 +157,41 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
                      16)
 #define gen_gvec_dup64i(v1, c) \
     tcg_gen_gvec_dup64i(vec_full_reg_offset(v1), 16, 16, c)
+#define gen_gvec_fn_3(fn, es, v1, v2, v3) \
+    tcg_gen_gvec_##fn(es, vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                      vec_full_reg_offset(v3), 16, 16)
+
+/*
+ * Helper to carry out a 128 bit vector computation using 2 i64 values per
+ * vector.
+ */
+typedef void (*gen_gvec128_3_i64_fn)(TCGv_i64 dl, TCGv_i64 dh, TCGv_i64 al,
+                                     TCGv_i64 ah, TCGv_i64 bl, TCGv_i64 bh);
+static void gen_gvec128_3_i64(gen_gvec128_3_i64_fn fn, uint8_t d, uint8_t a,
+                              uint8_t b)
+{
+        TCGv_i64 dh = tcg_temp_new_i64();
+        TCGv_i64 dl = tcg_temp_new_i64();
+        TCGv_i64 ah = tcg_temp_new_i64();
+        TCGv_i64 al = tcg_temp_new_i64();
+        TCGv_i64 bh = tcg_temp_new_i64();
+        TCGv_i64 bl = tcg_temp_new_i64();
+
+        read_vec_element_i64(ah, a, 0, ES_64);
+        read_vec_element_i64(al, a, 1, ES_64);
+        read_vec_element_i64(bh, b, 0, ES_64);
+        read_vec_element_i64(bl, b, 1, ES_64);
+        fn(dl, dh, al, ah, bl, bh);
+        write_vec_element_i64(dh, d, 0, ES_64);
+        write_vec_element_i64(dl, d, 1, ES_64);
+
+        tcg_temp_free_i64(dh);
+        tcg_temp_free_i64(dl);
+        tcg_temp_free_i64(ah);
+        tcg_temp_free_i64(al);
+        tcg_temp_free_i64(bh);
+        tcg_temp_free_i64(bl);
+}
 
 static void gen_gvec_dupi(uint8_t es, uint8_t reg, uint64_t c)
 {
@@ -933,3 +968,20 @@ static DisasJumpType op_vup(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(tmp);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_va(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+
+    if (es > ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    } else if (es == ES_128) {
+        gen_gvec128_3_i64(tcg_gen_add2_i64, get_field(s->fields, v1),
+                          get_field(s->fields, v2), get_field(s->fields, v3));
+        return DISAS_NEXT;
+    }
+    gen_gvec_fn_3(add, es, get_field(s->fields, v1), get_field(s->fields, v2),
+                  get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 02/41] s390x/tcg: Implement VECTOR ADD
@ 2019-04-11 10:07   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Introduce two types of fancy new helpers that will be reused a couple of
times

1. gen_gvec_fn_3: Call an existing tcg_gen_gvec_X function with 3
   parameters, simplifying parameter passing
2. gen_gvec128_3_i64: Call a function that performs 128 bit calculations
   using two 64 bit values per vector.

Luckily, for VECTOR ADD we already have everything we need.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  5 ++++
 target/s390x/translate_vx.inc.c | 52 +++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 71fa9b8d6c..74a0ccc770 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1054,6 +1054,11 @@
 /* VECTOR UNPACK LOGICAL LOW */
     F(0xe7d4, VUPLL,   VRR_a, V,   0, 0, 0, 0, vup, 0, IF_VEC)
 
+/* === Vector Integer Instructions === */
+
+/* VECTOR ADD */
+    F(0xe7f3, VA,      VRR_c, V,   0, 0, 0, 0, va, 0, IF_VEC)
+
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
     E(0xb250, CSP,     RRE,   Z,   r1_32u, ra2, r1_P, 0, csp, 0, MO_TEUL, IF_PRIV)
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 76f9a5d939..2f84ea0511 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -157,6 +157,41 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
                      16)
 #define gen_gvec_dup64i(v1, c) \
     tcg_gen_gvec_dup64i(vec_full_reg_offset(v1), 16, 16, c)
+#define gen_gvec_fn_3(fn, es, v1, v2, v3) \
+    tcg_gen_gvec_##fn(es, vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                      vec_full_reg_offset(v3), 16, 16)
+
+/*
+ * Helper to carry out a 128 bit vector computation using 2 i64 values per
+ * vector.
+ */
+typedef void (*gen_gvec128_3_i64_fn)(TCGv_i64 dl, TCGv_i64 dh, TCGv_i64 al,
+                                     TCGv_i64 ah, TCGv_i64 bl, TCGv_i64 bh);
+static void gen_gvec128_3_i64(gen_gvec128_3_i64_fn fn, uint8_t d, uint8_t a,
+                              uint8_t b)
+{
+        TCGv_i64 dh = tcg_temp_new_i64();
+        TCGv_i64 dl = tcg_temp_new_i64();
+        TCGv_i64 ah = tcg_temp_new_i64();
+        TCGv_i64 al = tcg_temp_new_i64();
+        TCGv_i64 bh = tcg_temp_new_i64();
+        TCGv_i64 bl = tcg_temp_new_i64();
+
+        read_vec_element_i64(ah, a, 0, ES_64);
+        read_vec_element_i64(al, a, 1, ES_64);
+        read_vec_element_i64(bh, b, 0, ES_64);
+        read_vec_element_i64(bl, b, 1, ES_64);
+        fn(dl, dh, al, ah, bl, bh);
+        write_vec_element_i64(dh, d, 0, ES_64);
+        write_vec_element_i64(dl, d, 1, ES_64);
+
+        tcg_temp_free_i64(dh);
+        tcg_temp_free_i64(dl);
+        tcg_temp_free_i64(ah);
+        tcg_temp_free_i64(al);
+        tcg_temp_free_i64(bh);
+        tcg_temp_free_i64(bl);
+}
 
 static void gen_gvec_dupi(uint8_t es, uint8_t reg, uint64_t c)
 {
@@ -933,3 +968,20 @@ static DisasJumpType op_vup(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(tmp);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_va(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+
+    if (es > ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    } else if (es == ES_128) {
+        gen_gvec128_3_i64(tcg_gen_add2_i64, get_field(s->fields, v1),
+                          get_field(s->fields, v2), get_field(s->fields, v3));
+        return DISAS_NEXT;
+    }
+    gen_gvec_fn_3(add, es, get_field(s->fields, v1), get_field(s->fields, v2),
+                  get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 03/41] s390x/tcg: Implement VECTOR ADD COMPUTE CARRY
@ 2019-04-11 10:07   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Only 64 bit handling is really easy. 128 bit handling is performed
via an ool handler, introducing s390_vec_add() that will be reused later.

8/16/32 bit handling is black magic inspired by gen_addv_mask(). If
there is every a bug detected in there, throw it away and simply use
ool helpers for 8/16 bit handling and something like 64 bit handling for
32 bit handling.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/Makefile.objs      |  2 +-
 target/s390x/helper.h           |  3 ++
 target/s390x/insn-data.def      |  2 +
 target/s390x/translate_vx.inc.c | 74 +++++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 47 +++++++++++++++++++++
 5 files changed, 127 insertions(+), 1 deletion(-)
 create mode 100644 target/s390x/vec_int_helper.c

diff --git a/target/s390x/Makefile.objs b/target/s390x/Makefile.objs
index 68eeee3d2f..993ac93ed6 100644
--- a/target/s390x/Makefile.objs
+++ b/target/s390x/Makefile.objs
@@ -1,7 +1,7 @@
 obj-y += cpu.o cpu_models.o cpu_features.o gdbstub.o interrupt.o helper.o
 obj-$(CONFIG_TCG) += translate.o cc_helper.o excp_helper.o fpu_helper.o
 obj-$(CONFIG_TCG) += int_helper.o mem_helper.o misc_helper.o crypto_helper.o
-obj-$(CONFIG_TCG) += vec_helper.o
+obj-$(CONFIG_TCG) += vec_helper.o vec_int_helper.o
 obj-$(CONFIG_SOFTMMU) += machine.o ioinst.o arch_dump.o mmu_helper.o diag.o
 obj-$(CONFIG_SOFTMMU) += sigp.o
 obj-$(CONFIG_KVM) += kvm.o
diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 0b494a2fd2..2c1b223248 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -145,6 +145,9 @@ DEF_HELPER_5(gvec_vpkls_cc64, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vperm, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(vstl, TCG_CALL_NO_WG, void, env, cptr, i64, i64)
 
+/* === Vector Integer Instructions === */
+DEF_HELPER_FLAGS_4(gvec_vacc128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
 DEF_HELPER_4(diag, void, env, i32, i32, i32)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 74a0ccc770..f0e62b9aa8 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1058,6 +1058,8 @@
 
 /* VECTOR ADD */
     F(0xe7f3, VA,      VRR_c, V,   0, 0, 0, 0, va, 0, IF_VEC)
+/* VECTOR ADD COMPUTE CARRY */
+    F(0xe7f1, VACC,    VRR_c, V,   0, 0, 0, 0, vacc, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 2f84ea0511..c3bc47f1a9 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -136,6 +136,9 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
     tcg_temp_free_i64(tmp);
 }
 
+#define gen_gvec_3(v1, v2, v3, gen) \
+    tcg_gen_gvec_3(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                   vec_full_reg_offset(v3), 16, 16, gen)
 #define gen_gvec_3_ool(v1, v2, v3, data, fn) \
     tcg_gen_gvec_3_ool(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                        vec_full_reg_offset(v3), 16, 16, data, fn)
@@ -985,3 +988,74 @@ static DisasJumpType op_va(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static void gen_acc(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, uint8_t es)
+{
+    const uint8_t msb_bit_nr = NUM_VEC_ELEMENT_BITS(es) - 1;
+    TCGv_i64 msb_mask = tcg_const_i64(dup_const(es, 1ull << msb_bit_nr));
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 t3 = tcg_temp_new_i64();
+
+    /* Calculate the carry into the MSB, ignoring the old MSBs */
+    tcg_gen_andc_i64(t1, a, msb_mask);
+    tcg_gen_andc_i64(t2, b, msb_mask);
+    tcg_gen_add_i64(t1, t1, t2);
+    /* Calculate the MSB without any carry into it */
+    tcg_gen_xor_i64(t3, a, b);
+    /* Calculate the carry out of the MSB in the MSB bit position */
+    tcg_gen_and_i64(d, a, b);
+    tcg_gen_and_i64(t1, t1, t3);
+    tcg_gen_or_i64(d, d, t1);
+    /* Isolate and shift the carry into position */
+    tcg_gen_and_i64(d, d, msb_mask);
+    tcg_gen_shri_i64(d, d, msb_bit_nr);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+    tcg_temp_free_i64(t3);
+}
+
+static void gen_acc8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    gen_acc(d, a, b, ES_8);
+}
+
+static void gen_acc16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    gen_acc(d, a, b, ES_16);
+}
+
+static void gen_acc32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    gen_acc(d, a, b, ES_32);
+}
+
+static void gen_acc_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t = tcg_temp_new_i64();
+
+    tcg_gen_add_i64(t, a, b);
+    tcg_gen_setcond_i64(TCG_COND_LTU, d, t, b);
+    tcg_temp_free_i64(t);
+}
+
+static DisasJumpType op_vacc(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen3 g[5] = {
+        { .fni8 = gen_acc8_i64, },
+        { .fni8 = gen_acc16_i64, },
+        { .fni8 = gen_acc32_i64, },
+        { .fni8 = gen_acc_i64, },
+        { .fno = gen_helper_gvec_vacc128, },
+    };
+
+    if (es > ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_3(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
new file mode 100644
index 0000000000..0b232571bc
--- /dev/null
+++ b/target/s390x/vec_int_helper.c
@@ -0,0 +1,47 @@
+/*
+ * QEMU TCG support -- s390x vector integer instruction support
+ *
+ * Copyright (C) 2019 Red Hat Inc
+ *
+ * Authors:
+ *   David Hildenbrand <david@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "cpu.h"
+#include "vec.h"
+#include "exec/helper-proto.h"
+
+/*
+ * Add two 128 bit vectors, returning the carry.
+ */
+static bool s390_vec_add(S390Vector *d, const S390Vector *a,
+                         const S390Vector *b)
+{
+    bool low_carry = false, high_carry = false;
+
+    if (a->doubleword[0] + b->doubleword[0] < a->doubleword[0]) {
+        high_carry = true;
+    }
+    if (a->doubleword[1] + b->doubleword[1] < a->doubleword[1]) {
+        low_carry = true;
+        if (a->doubleword[0] == b->doubleword[0]) {
+            high_carry = true;
+        }
+    }
+    d->doubleword[0] = a->doubleword[0] + b->doubleword[0] + low_carry;
+    d->doubleword[1] = a->doubleword[1] + b->doubleword[1];
+    return high_carry;
+}
+
+void HELPER(gvec_vacc128)(void *v1, const void *v2, const void *v3,
+                          uint32_t desc)
+{
+    S390Vector tmp, *dst = v1;
+
+    dst->doubleword[0] = 0;
+    dst->doubleword[1] = s390_vec_add(&tmp, v2, v3);
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 03/41] s390x/tcg: Implement VECTOR ADD COMPUTE CARRY
@ 2019-04-11 10:07   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Only 64 bit handling is really easy. 128 bit handling is performed
via an ool handler, introducing s390_vec_add() that will be reused later.

8/16/32 bit handling is black magic inspired by gen_addv_mask(). If
there is every a bug detected in there, throw it away and simply use
ool helpers for 8/16 bit handling and something like 64 bit handling for
32 bit handling.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/Makefile.objs      |  2 +-
 target/s390x/helper.h           |  3 ++
 target/s390x/insn-data.def      |  2 +
 target/s390x/translate_vx.inc.c | 74 +++++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 47 +++++++++++++++++++++
 5 files changed, 127 insertions(+), 1 deletion(-)
 create mode 100644 target/s390x/vec_int_helper.c

diff --git a/target/s390x/Makefile.objs b/target/s390x/Makefile.objs
index 68eeee3d2f..993ac93ed6 100644
--- a/target/s390x/Makefile.objs
+++ b/target/s390x/Makefile.objs
@@ -1,7 +1,7 @@
 obj-y += cpu.o cpu_models.o cpu_features.o gdbstub.o interrupt.o helper.o
 obj-$(CONFIG_TCG) += translate.o cc_helper.o excp_helper.o fpu_helper.o
 obj-$(CONFIG_TCG) += int_helper.o mem_helper.o misc_helper.o crypto_helper.o
-obj-$(CONFIG_TCG) += vec_helper.o
+obj-$(CONFIG_TCG) += vec_helper.o vec_int_helper.o
 obj-$(CONFIG_SOFTMMU) += machine.o ioinst.o arch_dump.o mmu_helper.o diag.o
 obj-$(CONFIG_SOFTMMU) += sigp.o
 obj-$(CONFIG_KVM) += kvm.o
diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 0b494a2fd2..2c1b223248 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -145,6 +145,9 @@ DEF_HELPER_5(gvec_vpkls_cc64, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vperm, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(vstl, TCG_CALL_NO_WG, void, env, cptr, i64, i64)
 
+/* === Vector Integer Instructions === */
+DEF_HELPER_FLAGS_4(gvec_vacc128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
 DEF_HELPER_4(diag, void, env, i32, i32, i32)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 74a0ccc770..f0e62b9aa8 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1058,6 +1058,8 @@
 
 /* VECTOR ADD */
     F(0xe7f3, VA,      VRR_c, V,   0, 0, 0, 0, va, 0, IF_VEC)
+/* VECTOR ADD COMPUTE CARRY */
+    F(0xe7f1, VACC,    VRR_c, V,   0, 0, 0, 0, vacc, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 2f84ea0511..c3bc47f1a9 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -136,6 +136,9 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
     tcg_temp_free_i64(tmp);
 }
 
+#define gen_gvec_3(v1, v2, v3, gen) \
+    tcg_gen_gvec_3(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                   vec_full_reg_offset(v3), 16, 16, gen)
 #define gen_gvec_3_ool(v1, v2, v3, data, fn) \
     tcg_gen_gvec_3_ool(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                        vec_full_reg_offset(v3), 16, 16, data, fn)
@@ -985,3 +988,74 @@ static DisasJumpType op_va(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static void gen_acc(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, uint8_t es)
+{
+    const uint8_t msb_bit_nr = NUM_VEC_ELEMENT_BITS(es) - 1;
+    TCGv_i64 msb_mask = tcg_const_i64(dup_const(es, 1ull << msb_bit_nr));
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+    TCGv_i64 t3 = tcg_temp_new_i64();
+
+    /* Calculate the carry into the MSB, ignoring the old MSBs */
+    tcg_gen_andc_i64(t1, a, msb_mask);
+    tcg_gen_andc_i64(t2, b, msb_mask);
+    tcg_gen_add_i64(t1, t1, t2);
+    /* Calculate the MSB without any carry into it */
+    tcg_gen_xor_i64(t3, a, b);
+    /* Calculate the carry out of the MSB in the MSB bit position */
+    tcg_gen_and_i64(d, a, b);
+    tcg_gen_and_i64(t1, t1, t3);
+    tcg_gen_or_i64(d, d, t1);
+    /* Isolate and shift the carry into position */
+    tcg_gen_and_i64(d, d, msb_mask);
+    tcg_gen_shri_i64(d, d, msb_bit_nr);
+
+    tcg_temp_free_i64(t1);
+    tcg_temp_free_i64(t2);
+    tcg_temp_free_i64(t3);
+}
+
+static void gen_acc8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    gen_acc(d, a, b, ES_8);
+}
+
+static void gen_acc16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    gen_acc(d, a, b, ES_16);
+}
+
+static void gen_acc32_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    gen_acc(d, a, b, ES_32);
+}
+
+static void gen_acc_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t = tcg_temp_new_i64();
+
+    tcg_gen_add_i64(t, a, b);
+    tcg_gen_setcond_i64(TCG_COND_LTU, d, t, b);
+    tcg_temp_free_i64(t);
+}
+
+static DisasJumpType op_vacc(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen3 g[5] = {
+        { .fni8 = gen_acc8_i64, },
+        { .fni8 = gen_acc16_i64, },
+        { .fni8 = gen_acc32_i64, },
+        { .fni8 = gen_acc_i64, },
+        { .fno = gen_helper_gvec_vacc128, },
+    };
+
+    if (es > ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_3(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
new file mode 100644
index 0000000000..0b232571bc
--- /dev/null
+++ b/target/s390x/vec_int_helper.c
@@ -0,0 +1,47 @@
+/*
+ * QEMU TCG support -- s390x vector integer instruction support
+ *
+ * Copyright (C) 2019 Red Hat Inc
+ *
+ * Authors:
+ *   David Hildenbrand <david@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "cpu.h"
+#include "vec.h"
+#include "exec/helper-proto.h"
+
+/*
+ * Add two 128 bit vectors, returning the carry.
+ */
+static bool s390_vec_add(S390Vector *d, const S390Vector *a,
+                         const S390Vector *b)
+{
+    bool low_carry = false, high_carry = false;
+
+    if (a->doubleword[0] + b->doubleword[0] < a->doubleword[0]) {
+        high_carry = true;
+    }
+    if (a->doubleword[1] + b->doubleword[1] < a->doubleword[1]) {
+        low_carry = true;
+        if (a->doubleword[0] == b->doubleword[0]) {
+            high_carry = true;
+        }
+    }
+    d->doubleword[0] = a->doubleword[0] + b->doubleword[0] + low_carry;
+    d->doubleword[1] = a->doubleword[1] + b->doubleword[1];
+    return high_carry;
+}
+
+void HELPER(gvec_vacc128)(void *v1, const void *v2, const void *v3,
+                          uint32_t desc)
+{
+    S390Vector tmp, *dst = v1;
+
+    dst->doubleword[0] = 0;
+    dst->doubleword[1] = s390_vec_add(&tmp, v2, v3);
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 04/41] s390x/tcg: Implement VECTOR ADD WITH CARRY
@ 2019-04-11 10:07   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Only slightly ugly, perform two additions. At least it is only supported
for 128 bit elements.

Introduce gen_gvec128_4_i64() similar to gen_gvec128_3_i64().

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 63 +++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index f0e62b9aa8..38d1e22a6d 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1060,6 +1060,8 @@
     F(0xe7f3, VA,      VRR_c, V,   0, 0, 0, 0, va, 0, IF_VEC)
 /* VECTOR ADD COMPUTE CARRY */
     F(0xe7f1, VACC,    VRR_c, V,   0, 0, 0, 0, vacc, 0, IF_VEC)
+/* VECTOR ADD WITH CARRY */
+    F(0xe7bb, VAC,     VRR_d, V,   0, 0, 0, 0, vac, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index c3bc47f1a9..111b0b7c69 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -196,6 +196,41 @@ static void gen_gvec128_3_i64(gen_gvec128_3_i64_fn fn, uint8_t d, uint8_t a,
         tcg_temp_free_i64(bl);
 }
 
+typedef void (*gen_gvec128_4_i64_fn)(TCGv_i64 dl, TCGv_i64 dh, TCGv_i64 al,
+                                     TCGv_i64 ah, TCGv_i64 bl, TCGv_i64 bh,
+                                     TCGv_i64 cl, TCGv_i64 ch);
+static void gen_gvec128_4_i64(gen_gvec128_4_i64_fn fn, uint8_t d, uint8_t a,
+                              uint8_t b, uint8_t c)
+{
+        TCGv_i64 dh = tcg_temp_new_i64();
+        TCGv_i64 dl = tcg_temp_new_i64();
+        TCGv_i64 ah = tcg_temp_new_i64();
+        TCGv_i64 al = tcg_temp_new_i64();
+        TCGv_i64 bh = tcg_temp_new_i64();
+        TCGv_i64 bl = tcg_temp_new_i64();
+        TCGv_i64 ch = tcg_temp_new_i64();
+        TCGv_i64 cl = tcg_temp_new_i64();
+
+        read_vec_element_i64(ah, a, 0, ES_64);
+        read_vec_element_i64(al, a, 1, ES_64);
+        read_vec_element_i64(bh, b, 0, ES_64);
+        read_vec_element_i64(bl, b, 1, ES_64);
+        read_vec_element_i64(ch, c, 0, ES_64);
+        read_vec_element_i64(cl, c, 1, ES_64);
+        fn(dl, dh, al, ah, bl, bh, cl, ch);
+        write_vec_element_i64(dh, d, 0, ES_64);
+        write_vec_element_i64(dl, d, 1, ES_64);
+
+        tcg_temp_free_i64(dh);
+        tcg_temp_free_i64(dl);
+        tcg_temp_free_i64(ah);
+        tcg_temp_free_i64(al);
+        tcg_temp_free_i64(bh);
+        tcg_temp_free_i64(bl);
+        tcg_temp_free_i64(ch);
+        tcg_temp_free_i64(cl);
+}
+
 static void gen_gvec_dupi(uint8_t es, uint8_t reg, uint64_t c)
 {
     switch (es) {
@@ -1059,3 +1094,31 @@ static DisasJumpType op_vacc(DisasContext *s, DisasOps *o)
                get_field(s->fields, v3), &g[es]);
     return DISAS_NEXT;
 }
+
+static void gen_ac2_i64(TCGv_i64 dl, TCGv_i64 dh, TCGv_i64 al, TCGv_i64 ah,
+                        TCGv_i64 bl, TCGv_i64 bh, TCGv_i64 cl, TCGv_i64 ch)
+{
+    TCGv_i64 tl = tcg_temp_new_i64();
+    TCGv_i64 th = tcg_const_i64(0);
+
+    /* extract the carry only */
+    tcg_gen_extract_i64(tl, cl, 0, 1);
+    tcg_gen_add2_i64(dl, dh, al, ah, bl, bh);
+    tcg_gen_add2_i64(dl, dh, dl, dh, tl, th);
+
+    tcg_temp_free_i64(tl);
+    tcg_temp_free_i64(th);
+}
+
+static DisasJumpType op_vac(DisasContext *s, DisasOps *o)
+{
+    if (get_field(s->fields, m5) != ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_gvec128_4_i64(gen_ac2_i64, get_field(s->fields, v1),
+                      get_field(s->fields, v2), get_field(s->fields, v3),
+                      get_field(s->fields, v4));
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 04/41] s390x/tcg: Implement VECTOR ADD WITH CARRY
@ 2019-04-11 10:07   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Only slightly ugly, perform two additions. At least it is only supported
for 128 bit elements.

Introduce gen_gvec128_4_i64() similar to gen_gvec128_3_i64().

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 63 +++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index f0e62b9aa8..38d1e22a6d 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1060,6 +1060,8 @@
     F(0xe7f3, VA,      VRR_c, V,   0, 0, 0, 0, va, 0, IF_VEC)
 /* VECTOR ADD COMPUTE CARRY */
     F(0xe7f1, VACC,    VRR_c, V,   0, 0, 0, 0, vacc, 0, IF_VEC)
+/* VECTOR ADD WITH CARRY */
+    F(0xe7bb, VAC,     VRR_d, V,   0, 0, 0, 0, vac, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index c3bc47f1a9..111b0b7c69 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -196,6 +196,41 @@ static void gen_gvec128_3_i64(gen_gvec128_3_i64_fn fn, uint8_t d, uint8_t a,
         tcg_temp_free_i64(bl);
 }
 
+typedef void (*gen_gvec128_4_i64_fn)(TCGv_i64 dl, TCGv_i64 dh, TCGv_i64 al,
+                                     TCGv_i64 ah, TCGv_i64 bl, TCGv_i64 bh,
+                                     TCGv_i64 cl, TCGv_i64 ch);
+static void gen_gvec128_4_i64(gen_gvec128_4_i64_fn fn, uint8_t d, uint8_t a,
+                              uint8_t b, uint8_t c)
+{
+        TCGv_i64 dh = tcg_temp_new_i64();
+        TCGv_i64 dl = tcg_temp_new_i64();
+        TCGv_i64 ah = tcg_temp_new_i64();
+        TCGv_i64 al = tcg_temp_new_i64();
+        TCGv_i64 bh = tcg_temp_new_i64();
+        TCGv_i64 bl = tcg_temp_new_i64();
+        TCGv_i64 ch = tcg_temp_new_i64();
+        TCGv_i64 cl = tcg_temp_new_i64();
+
+        read_vec_element_i64(ah, a, 0, ES_64);
+        read_vec_element_i64(al, a, 1, ES_64);
+        read_vec_element_i64(bh, b, 0, ES_64);
+        read_vec_element_i64(bl, b, 1, ES_64);
+        read_vec_element_i64(ch, c, 0, ES_64);
+        read_vec_element_i64(cl, c, 1, ES_64);
+        fn(dl, dh, al, ah, bl, bh, cl, ch);
+        write_vec_element_i64(dh, d, 0, ES_64);
+        write_vec_element_i64(dl, d, 1, ES_64);
+
+        tcg_temp_free_i64(dh);
+        tcg_temp_free_i64(dl);
+        tcg_temp_free_i64(ah);
+        tcg_temp_free_i64(al);
+        tcg_temp_free_i64(bh);
+        tcg_temp_free_i64(bl);
+        tcg_temp_free_i64(ch);
+        tcg_temp_free_i64(cl);
+}
+
 static void gen_gvec_dupi(uint8_t es, uint8_t reg, uint64_t c)
 {
     switch (es) {
@@ -1059,3 +1094,31 @@ static DisasJumpType op_vacc(DisasContext *s, DisasOps *o)
                get_field(s->fields, v3), &g[es]);
     return DISAS_NEXT;
 }
+
+static void gen_ac2_i64(TCGv_i64 dl, TCGv_i64 dh, TCGv_i64 al, TCGv_i64 ah,
+                        TCGv_i64 bl, TCGv_i64 bh, TCGv_i64 cl, TCGv_i64 ch)
+{
+    TCGv_i64 tl = tcg_temp_new_i64();
+    TCGv_i64 th = tcg_const_i64(0);
+
+    /* extract the carry only */
+    tcg_gen_extract_i64(tl, cl, 0, 1);
+    tcg_gen_add2_i64(dl, dh, al, ah, bl, bh);
+    tcg_gen_add2_i64(dl, dh, dl, dh, tl, th);
+
+    tcg_temp_free_i64(tl);
+    tcg_temp_free_i64(th);
+}
+
+static DisasJumpType op_vac(DisasContext *s, DisasOps *o)
+{
+    if (get_field(s->fields, m5) != ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_gvec128_4_i64(gen_ac2_i64, get_field(s->fields, v1),
+                      get_field(s->fields, v2), get_field(s->fields, v3),
+                      get_field(s->fields, v4));
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 05/41] s390x/tcg: Implement VECTOR ADD WITH CARRY COMPUTE CARRY
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Again, use a helper as calculating the carry is even more involved than
for VECTOR ADD COMPUTE CARRY.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  1 +
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 13 +++++++++++++
 target/s390x/vec_int_helper.c   | 16 ++++++++++++++++
 4 files changed, 32 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 2c1b223248..e1847e8877 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -147,6 +147,7 @@ DEF_HELPER_FLAGS_4(vstl, TCG_CALL_NO_WG, void, env, cptr, i64, i64)
 
 /* === Vector Integer Instructions === */
 DEF_HELPER_FLAGS_4(gvec_vacc128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vaccc128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 38d1e22a6d..a531b21908 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1062,6 +1062,8 @@
     F(0xe7f1, VACC,    VRR_c, V,   0, 0, 0, 0, vacc, 0, IF_VEC)
 /* VECTOR ADD WITH CARRY */
     F(0xe7bb, VAC,     VRR_d, V,   0, 0, 0, 0, vac, 0, IF_VEC)
+/* VECTOR ADD WITH CARRY COMPUTE CARRY */
+    F(0xe7b9, VACCC,   VRR_d, V,   0, 0, 0, 0, vaccc, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 111b0b7c69..a264aa0c5a 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1122,3 +1122,16 @@ static DisasJumpType op_vac(DisasContext *s, DisasOps *o)
                       get_field(s->fields, v4));
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vaccc(DisasContext *s, DisasOps *o)
+{
+    if (get_field(s->fields, m5) != ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_gvec_4_ool(get_field(s->fields, v1), get_field(s->fields, v2),
+                   get_field(s->fields, v3), get_field(s->fields, v4), 0,
+                   gen_helper_gvec_vaccc128);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 0b232571bc..97fc559da0 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -45,3 +45,19 @@ void HELPER(gvec_vacc128)(void *v1, const void *v2, const void *v3,
     dst->doubleword[0] = 0;
     dst->doubleword[1] = s390_vec_add(&tmp, v2, v3);
 }
+
+void HELPER(gvec_vaccc128)(void *v1, const void *v2, const void *v3,
+                           const void *v4, uint32_t desc)
+{
+    const S390Vector old_carry = {
+        .doubleword[0] = 0,
+        .doubleword[1] = ((S390Vector *)v4)->doubleword[1] & 1,
+    };
+    S390Vector tmp, *dst = v1;
+    bool carry;
+
+    carry = s390_vec_add(&tmp, v2, v3);
+    carry |= s390_vec_add(&tmp, &tmp, &old_carry);
+    dst->doubleword[0] = 0;
+    dst->doubleword[1] = carry;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 05/41] s390x/tcg: Implement VECTOR ADD WITH CARRY COMPUTE CARRY
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Again, use a helper as calculating the carry is even more involved than
for VECTOR ADD COMPUTE CARRY.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  1 +
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 13 +++++++++++++
 target/s390x/vec_int_helper.c   | 16 ++++++++++++++++
 4 files changed, 32 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 2c1b223248..e1847e8877 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -147,6 +147,7 @@ DEF_HELPER_FLAGS_4(vstl, TCG_CALL_NO_WG, void, env, cptr, i64, i64)
 
 /* === Vector Integer Instructions === */
 DEF_HELPER_FLAGS_4(gvec_vacc128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vaccc128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 38d1e22a6d..a531b21908 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1062,6 +1062,8 @@
     F(0xe7f1, VACC,    VRR_c, V,   0, 0, 0, 0, vacc, 0, IF_VEC)
 /* VECTOR ADD WITH CARRY */
     F(0xe7bb, VAC,     VRR_d, V,   0, 0, 0, 0, vac, 0, IF_VEC)
+/* VECTOR ADD WITH CARRY COMPUTE CARRY */
+    F(0xe7b9, VACCC,   VRR_d, V,   0, 0, 0, 0, vaccc, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 111b0b7c69..a264aa0c5a 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1122,3 +1122,16 @@ static DisasJumpType op_vac(DisasContext *s, DisasOps *o)
                       get_field(s->fields, v4));
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vaccc(DisasContext *s, DisasOps *o)
+{
+    if (get_field(s->fields, m5) != ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_gvec_4_ool(get_field(s->fields, v1), get_field(s->fields, v2),
+                   get_field(s->fields, v3), get_field(s->fields, v4), 0,
+                   gen_helper_gvec_vaccc128);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 0b232571bc..97fc559da0 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -45,3 +45,19 @@ void HELPER(gvec_vacc128)(void *v1, const void *v2, const void *v3,
     dst->doubleword[0] = 0;
     dst->doubleword[1] = s390_vec_add(&tmp, v2, v3);
 }
+
+void HELPER(gvec_vaccc128)(void *v1, const void *v2, const void *v3,
+                           const void *v4, uint32_t desc)
+{
+    const S390Vector old_carry = {
+        .doubleword[0] = 0,
+        .doubleword[1] = ((S390Vector *)v4)->doubleword[1] & 1,
+    };
+    S390Vector tmp, *dst = v1;
+    bool carry;
+
+    carry = s390_vec_add(&tmp, v2, v3);
+    carry |= s390_vec_add(&tmp, &tmp, &old_carry);
+    dst->doubleword[0] = 0;
+    dst->doubleword[1] = carry;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 06/41] s390x/tcg: Implement VECTOR AND (WITH COMPLEMENT)
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Easy, as we can reuse existing gvec helpers.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  4 ++++
 target/s390x/translate_vx.inc.c | 14 ++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index a531b21908..456d5597ca 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1064,6 +1064,10 @@
     F(0xe7bb, VAC,     VRR_d, V,   0, 0, 0, 0, vac, 0, IF_VEC)
 /* VECTOR ADD WITH CARRY COMPUTE CARRY */
     F(0xe7b9, VACCC,   VRR_d, V,   0, 0, 0, 0, vaccc, 0, IF_VEC)
+/* VECTOR AND */
+    F(0xe768, VN,      VRR_c, V,   0, 0, 0, 0, vn, 0, IF_VEC)
+/* VECTOR AND WITH COMPLEMENT */
+    F(0xe769, VNC,     VRR_c, V,   0, 0, 0, 0, vnc, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index a264aa0c5a..aaa247e855 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1135,3 +1135,17 @@ static DisasJumpType op_vaccc(DisasContext *s, DisasOps *o)
                    gen_helper_gvec_vaccc128);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vn(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_fn_3(and, ES_8, get_field(s->fields, v1), get_field(s->fields, v2),
+                  get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
+
+static DisasJumpType op_vnc(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_fn_3(andc, ES_8, get_field(s->fields, v1),
+                  get_field(s->fields, v2), get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 06/41] s390x/tcg: Implement VECTOR AND (WITH COMPLEMENT)
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Easy, as we can reuse existing gvec helpers.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  4 ++++
 target/s390x/translate_vx.inc.c | 14 ++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index a531b21908..456d5597ca 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1064,6 +1064,10 @@
     F(0xe7bb, VAC,     VRR_d, V,   0, 0, 0, 0, vac, 0, IF_VEC)
 /* VECTOR ADD WITH CARRY COMPUTE CARRY */
     F(0xe7b9, VACCC,   VRR_d, V,   0, 0, 0, 0, vaccc, 0, IF_VEC)
+/* VECTOR AND */
+    F(0xe768, VN,      VRR_c, V,   0, 0, 0, 0, vn, 0, IF_VEC)
+/* VECTOR AND WITH COMPLEMENT */
+    F(0xe769, VNC,     VRR_c, V,   0, 0, 0, 0, vnc, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index a264aa0c5a..aaa247e855 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1135,3 +1135,17 @@ static DisasJumpType op_vaccc(DisasContext *s, DisasOps *o)
                    gen_helper_gvec_vaccc128);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vn(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_fn_3(and, ES_8, get_field(s->fields, v1), get_field(s->fields, v2),
+                  get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
+
+static DisasJumpType op_vnc(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_fn_3(andc, ES_8, get_field(s->fields, v1),
+                  get_field(s->fields, v2), get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 07/41] s390x/tcg: Implement VECTOR AVERAGE
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Handle 32/64-bit elements via gvec expansion and the 8/16 bits via
ool helpers.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  2 ++
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 63 +++++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 16 +++++++++
 4 files changed, 83 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index e1847e8877..2b6b716909 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -148,6 +148,8 @@ DEF_HELPER_FLAGS_4(vstl, TCG_CALL_NO_WG, void, env, cptr, i64, i64)
 /* === Vector Integer Instructions === */
 DEF_HELPER_FLAGS_4(gvec_vacc128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_5(gvec_vaccc128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vavg8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vavg16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 456d5597ca..6f8b42e327 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1068,6 +1068,8 @@
     F(0xe768, VN,      VRR_c, V,   0, 0, 0, 0, vn, 0, IF_VEC)
 /* VECTOR AND WITH COMPLEMENT */
     F(0xe769, VNC,     VRR_c, V,   0, 0, 0, 0, vnc, 0, IF_VEC)
+/* VECTOR AVERAGE */
+    F(0xe7f2, VAVG,    VRR_c, V,   0, 0, 0, 0, vavg, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index aaa247e855..50e03bf151 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -256,6 +256,17 @@ static void zero_vec(uint8_t reg)
     tcg_gen_gvec_dup8i(vec_full_reg_offset(reg), 16, 16, 0);
 }
 
+static void gen_addi2_i64(TCGv_i64 dl, TCGv_i64 dh, TCGv_i64 al, TCGv_i64 ah,
+                          uint64_t b)
+{
+    TCGv_i64 bl = tcg_const_i64(b);
+    TCGv_i64 bh = tcg_const_i64(0);
+
+    tcg_gen_add2_i64(dl, dh, al, ah, bl, bh);
+    tcg_temp_free_i64(bl);
+    tcg_temp_free_i64(bh);
+}
+
 static DisasJumpType op_vge(DisasContext *s, DisasOps *o)
 {
     const uint8_t es = s->insn->data;
@@ -1149,3 +1160,55 @@ static DisasJumpType op_vnc(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v2), get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static void gen_avg_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    tcg_gen_ext_i32_i64(t0, a);
+    tcg_gen_ext_i32_i64(t1, b);
+    tcg_gen_add_i64(t0, t0, t1);
+    tcg_gen_addi_i64(t0, t0, 1);
+    tcg_gen_shri_i64(t0, t0, 1);
+    tcg_gen_extrl_i64_i32(d, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+}
+
+static void gen_avg_i64(TCGv_i64 dl, TCGv_i64 al, TCGv_i64 bl)
+{
+    TCGv_i64 dh = tcg_temp_new_i64();
+    TCGv_i64 ah = tcg_temp_new_i64();
+    TCGv_i64 bh = tcg_temp_new_i64();
+
+    /* extending the sign by one bit is sufficient */
+    tcg_gen_extract_i64(ah, al, 63, 1);
+    tcg_gen_extract_i64(bh, bl, 63, 1);
+    tcg_gen_add2_i64(dl, dh, al, ah, bl, bh);
+    gen_addi2_i64(dl, dh, dl, dh, 1);
+    tcg_gen_extract2_i64(dl, dl, dh, 1);
+
+    tcg_temp_free_i64(dh);
+    tcg_temp_free_i64(ah);
+    tcg_temp_free_i64(bh);
+}
+static DisasJumpType op_vavg(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen3 g[4] = {
+        { .fno = gen_helper_gvec_vavg8, },
+        { .fno = gen_helper_gvec_vavg16, },
+        { .fni4 = gen_avg_i32, },
+        { .fni8 = gen_avg_i64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_3(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 97fc559da0..149cfaaeae 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -61,3 +61,19 @@ void HELPER(gvec_vaccc128)(void *v1, const void *v2, const void *v3,
     dst->doubleword[0] = 0;
     dst->doubleword[1] = carry;
 }
+
+#define DEF_VAVG(BITS)                                                         \
+void HELPER(gvec_vavg##BITS)(void *v1, const void *v2, const void *v3,         \
+                             uint32_t desc)                                    \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const int32_t a = (int##BITS##_t)s390_vec_read_element##BITS(v2, i);   \
+        const int32_t b = (int##BITS##_t)s390_vec_read_element##BITS(v3, i);   \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, (a + b + 1) >> 1);                 \
+    }                                                                          \
+}
+DEF_VAVG(8)
+DEF_VAVG(16)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 07/41] s390x/tcg: Implement VECTOR AVERAGE
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Handle 32/64-bit elements via gvec expansion and the 8/16 bits via
ool helpers.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  2 ++
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 63 +++++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 16 +++++++++
 4 files changed, 83 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index e1847e8877..2b6b716909 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -148,6 +148,8 @@ DEF_HELPER_FLAGS_4(vstl, TCG_CALL_NO_WG, void, env, cptr, i64, i64)
 /* === Vector Integer Instructions === */
 DEF_HELPER_FLAGS_4(gvec_vacc128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_5(gvec_vaccc128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vavg8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vavg16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 456d5597ca..6f8b42e327 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1068,6 +1068,8 @@
     F(0xe768, VN,      VRR_c, V,   0, 0, 0, 0, vn, 0, IF_VEC)
 /* VECTOR AND WITH COMPLEMENT */
     F(0xe769, VNC,     VRR_c, V,   0, 0, 0, 0, vnc, 0, IF_VEC)
+/* VECTOR AVERAGE */
+    F(0xe7f2, VAVG,    VRR_c, V,   0, 0, 0, 0, vavg, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index aaa247e855..50e03bf151 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -256,6 +256,17 @@ static void zero_vec(uint8_t reg)
     tcg_gen_gvec_dup8i(vec_full_reg_offset(reg), 16, 16, 0);
 }
 
+static void gen_addi2_i64(TCGv_i64 dl, TCGv_i64 dh, TCGv_i64 al, TCGv_i64 ah,
+                          uint64_t b)
+{
+    TCGv_i64 bl = tcg_const_i64(b);
+    TCGv_i64 bh = tcg_const_i64(0);
+
+    tcg_gen_add2_i64(dl, dh, al, ah, bl, bh);
+    tcg_temp_free_i64(bl);
+    tcg_temp_free_i64(bh);
+}
+
 static DisasJumpType op_vge(DisasContext *s, DisasOps *o)
 {
     const uint8_t es = s->insn->data;
@@ -1149,3 +1160,55 @@ static DisasJumpType op_vnc(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v2), get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static void gen_avg_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    tcg_gen_ext_i32_i64(t0, a);
+    tcg_gen_ext_i32_i64(t1, b);
+    tcg_gen_add_i64(t0, t0, t1);
+    tcg_gen_addi_i64(t0, t0, 1);
+    tcg_gen_shri_i64(t0, t0, 1);
+    tcg_gen_extrl_i64_i32(d, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+}
+
+static void gen_avg_i64(TCGv_i64 dl, TCGv_i64 al, TCGv_i64 bl)
+{
+    TCGv_i64 dh = tcg_temp_new_i64();
+    TCGv_i64 ah = tcg_temp_new_i64();
+    TCGv_i64 bh = tcg_temp_new_i64();
+
+    /* extending the sign by one bit is sufficient */
+    tcg_gen_extract_i64(ah, al, 63, 1);
+    tcg_gen_extract_i64(bh, bl, 63, 1);
+    tcg_gen_add2_i64(dl, dh, al, ah, bl, bh);
+    gen_addi2_i64(dl, dh, dl, dh, 1);
+    tcg_gen_extract2_i64(dl, dl, dh, 1);
+
+    tcg_temp_free_i64(dh);
+    tcg_temp_free_i64(ah);
+    tcg_temp_free_i64(bh);
+}
+static DisasJumpType op_vavg(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen3 g[4] = {
+        { .fno = gen_helper_gvec_vavg8, },
+        { .fno = gen_helper_gvec_vavg16, },
+        { .fni4 = gen_avg_i32, },
+        { .fni8 = gen_avg_i64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_3(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 97fc559da0..149cfaaeae 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -61,3 +61,19 @@ void HELPER(gvec_vaccc128)(void *v1, const void *v2, const void *v3,
     dst->doubleword[0] = 0;
     dst->doubleword[1] = carry;
 }
+
+#define DEF_VAVG(BITS)                                                         \
+void HELPER(gvec_vavg##BITS)(void *v1, const void *v2, const void *v3,         \
+                             uint32_t desc)                                    \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const int32_t a = (int##BITS##_t)s390_vec_read_element##BITS(v2, i);   \
+        const int32_t b = (int##BITS##_t)s390_vec_read_element##BITS(v3, i);   \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, (a + b + 1) >> 1);                 \
+    }                                                                          \
+}
+DEF_VAVG(8)
+DEF_VAVG(16)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 08/41] s390x/tcg: Implement VECTOR AVERAGE LOGICAL
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Similar to VECTOR AVERAGE but without sign extension.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  2 ++
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 48 +++++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 16 +++++++++++
 4 files changed, 68 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 2b6b716909..04a3f5fb2e 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -150,6 +150,8 @@ DEF_HELPER_FLAGS_4(gvec_vacc128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_5(gvec_vaccc128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vavg8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vavg16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vavgl8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vavgl16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 6f8b42e327..9889dc0b01 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1070,6 +1070,8 @@
     F(0xe769, VNC,     VRR_c, V,   0, 0, 0, 0, vnc, 0, IF_VEC)
 /* VECTOR AVERAGE */
     F(0xe7f2, VAVG,    VRR_c, V,   0, 0, 0, 0, vavg, 0, IF_VEC)
+/* VECTOR AVERAGE LOGICAL */
+    F(0xe7f0, VAVGL,   VRR_c, V,   0, 0, 0, 0, vavgl, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 50e03bf151..a190ac57ee 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1212,3 +1212,51 @@ static DisasJumpType op_vavg(DisasContext *s, DisasOps *o)
                get_field(s->fields, v3), &g[es]);
     return DISAS_NEXT;
 }
+
+static void gen_avgl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    tcg_gen_extu_i32_i64(t0, a);
+    tcg_gen_extu_i32_i64(t1, b);
+    tcg_gen_add_i64(t0, t0, t1);
+    tcg_gen_addi_i64(t0, t0, 1);
+    tcg_gen_shri_i64(t0, t0, 1);
+    tcg_gen_extrl_i64_i32(d, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+}
+
+static void gen_avgl_i64(TCGv_i64 dl, TCGv_i64 al, TCGv_i64 bl)
+{
+    TCGv_i64 dh = tcg_temp_new_i64();
+    TCGv_i64 zero = tcg_const_i64(0);
+
+    tcg_gen_add2_i64(dl, dh, al, zero, bl, zero);
+    gen_addi2_i64(dl, dh, dl, dh, 1);
+    tcg_gen_extract2_i64(dl, dl, dh, 1);
+
+    tcg_temp_free_i64(dh);
+    tcg_temp_free_i64(zero);
+}
+
+static DisasJumpType op_vavgl(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen3 g[4] = {
+        { .fno = gen_helper_gvec_vavgl8, },
+        { .fno = gen_helper_gvec_vavgl16, },
+        { .fni4 = gen_avgl_i32, },
+        { .fni8 = gen_avgl_i64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_3(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 149cfaaeae..c2aeb12108 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -77,3 +77,19 @@ void HELPER(gvec_vavg##BITS)(void *v1, const void *v2, const void *v3,         \
 }
 DEF_VAVG(8)
 DEF_VAVG(16)
+
+#define DEF_VAVGL(BITS)                                                        \
+void HELPER(gvec_vavgl##BITS)(void *v1, const void *v2, const void *v3,        \
+                              uint32_t desc)                                   \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+        const uint##BITS##_t b = s390_vec_read_element##BITS(v3, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, (a + b + 1) >> 1);                 \
+    }                                                                          \
+}
+DEF_VAVGL(8)
+DEF_VAVGL(16)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 08/41] s390x/tcg: Implement VECTOR AVERAGE LOGICAL
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Similar to VECTOR AVERAGE but without sign extension.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  2 ++
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 48 +++++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 16 +++++++++++
 4 files changed, 68 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 2b6b716909..04a3f5fb2e 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -150,6 +150,8 @@ DEF_HELPER_FLAGS_4(gvec_vacc128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_5(gvec_vaccc128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vavg8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vavg16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vavgl8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vavgl16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 6f8b42e327..9889dc0b01 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1070,6 +1070,8 @@
     F(0xe769, VNC,     VRR_c, V,   0, 0, 0, 0, vnc, 0, IF_VEC)
 /* VECTOR AVERAGE */
     F(0xe7f2, VAVG,    VRR_c, V,   0, 0, 0, 0, vavg, 0, IF_VEC)
+/* VECTOR AVERAGE LOGICAL */
+    F(0xe7f0, VAVGL,   VRR_c, V,   0, 0, 0, 0, vavgl, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 50e03bf151..a190ac57ee 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1212,3 +1212,51 @@ static DisasJumpType op_vavg(DisasContext *s, DisasOps *o)
                get_field(s->fields, v3), &g[es]);
     return DISAS_NEXT;
 }
+
+static void gen_avgl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    tcg_gen_extu_i32_i64(t0, a);
+    tcg_gen_extu_i32_i64(t1, b);
+    tcg_gen_add_i64(t0, t0, t1);
+    tcg_gen_addi_i64(t0, t0, 1);
+    tcg_gen_shri_i64(t0, t0, 1);
+    tcg_gen_extrl_i64_i32(d, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+}
+
+static void gen_avgl_i64(TCGv_i64 dl, TCGv_i64 al, TCGv_i64 bl)
+{
+    TCGv_i64 dh = tcg_temp_new_i64();
+    TCGv_i64 zero = tcg_const_i64(0);
+
+    tcg_gen_add2_i64(dl, dh, al, zero, bl, zero);
+    gen_addi2_i64(dl, dh, dl, dh, 1);
+    tcg_gen_extract2_i64(dl, dl, dh, 1);
+
+    tcg_temp_free_i64(dh);
+    tcg_temp_free_i64(zero);
+}
+
+static DisasJumpType op_vavgl(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen3 g[4] = {
+        { .fno = gen_helper_gvec_vavgl8, },
+        { .fno = gen_helper_gvec_vavgl16, },
+        { .fni4 = gen_avgl_i32, },
+        { .fni8 = gen_avgl_i64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_3(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 149cfaaeae..c2aeb12108 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -77,3 +77,19 @@ void HELPER(gvec_vavg##BITS)(void *v1, const void *v2, const void *v3,         \
 }
 DEF_VAVG(8)
 DEF_VAVG(16)
+
+#define DEF_VAVGL(BITS)                                                        \
+void HELPER(gvec_vavgl##BITS)(void *v1, const void *v2, const void *v3,        \
+                              uint32_t desc)                                   \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+        const uint##BITS##_t b = s390_vec_read_element##BITS(v3, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, (a + b + 1) >> 1);                 \
+    }                                                                          \
+}
+DEF_VAVGL(8)
+DEF_VAVGL(16)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 09/41] s390x/tcg: Implement VECTOR CHECKSUM
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Time to introduce read_vec_element_i32 and write_vec_element_i32.
Take proper care of properly adding the carry.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 +
 target/s390x/translate_vx.inc.c | 67 +++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 9889dc0b01..64459465c5 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1072,6 +1072,8 @@
     F(0xe7f2, VAVG,    VRR_c, V,   0, 0, 0, 0, vavg, 0, IF_VEC)
 /* VECTOR AVERAGE LOGICAL */
     F(0xe7f0, VAVGL,   VRR_c, V,   0, 0, 0, 0, vavgl, 0, IF_VEC)
+/* VECTOR CHECKSUM */
+    F(0xe766, VCKSM,   VRR_c, V,   0, 0, 0, 0, vcksm, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index a190ac57ee..7a7e185d43 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -90,6 +90,33 @@ static void read_vec_element_i64(TCGv_i64 dst, uint8_t reg, uint8_t enr,
     }
 }
 
+static void read_vec_element_i32(TCGv_i32 dst, uint8_t reg, uint8_t enr,
+                                 TCGMemOp memop)
+{
+    const int offs = vec_reg_offset(reg, enr, memop & MO_SIZE);
+
+    switch (memop) {
+    case ES_8:
+        tcg_gen_ld8u_i32(dst, cpu_env, offs);
+        break;
+    case ES_16:
+        tcg_gen_ld16u_i32(dst, cpu_env, offs);
+        break;
+    case ES_8 | MO_SIGN:
+        tcg_gen_ld8s_i32(dst, cpu_env, offs);
+        break;
+    case ES_16 | MO_SIGN:
+        tcg_gen_ld16s_i32(dst, cpu_env, offs);
+        break;
+    case ES_32:
+    case ES_32 | MO_SIGN:
+        tcg_gen_ld_i32(dst, cpu_env, offs);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static void write_vec_element_i64(TCGv_i64 src, int reg, uint8_t enr,
                                   TCGMemOp memop)
 {
@@ -113,6 +140,25 @@ static void write_vec_element_i64(TCGv_i64 src, int reg, uint8_t enr,
     }
 }
 
+static void write_vec_element_i32(TCGv_i32 src, int reg, uint8_t enr,
+                                  TCGMemOp memop)
+{
+    const int offs = vec_reg_offset(reg, enr, memop & MO_SIZE);
+
+    switch (memop) {
+    case ES_8:
+        tcg_gen_st8_i32(src, cpu_env, offs);
+        break;
+    case ES_16:
+        tcg_gen_st16_i32(src, cpu_env, offs);
+        break;
+    case ES_32:
+        tcg_gen_st_i32(src, cpu_env, offs);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
 
 static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
                                     uint8_t es)
@@ -1260,3 +1306,24 @@ static DisasJumpType op_vavgl(DisasContext *s, DisasOps *o)
                get_field(s->fields, v3), &g[es]);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vcksm(DisasContext *s, DisasOps *o)
+{
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    TCGv_i32 sum = tcg_temp_new_i32();
+    int i;
+
+    read_vec_element_i32(sum, get_field(s->fields, v3), 1, ES_32);
+    for (i = 0; i < 4; i++) {
+        read_vec_element_i32(tmp, get_field(s->fields, v2), i, ES_32);
+        tcg_gen_add_i32(sum, sum, tmp);
+        tcg_gen_setcond_i32(TCG_COND_LTU, tmp, sum, tmp);
+        tcg_gen_add_i32(sum, sum, tmp);
+    }
+    zero_vec(get_field(s->fields, v1));
+    write_vec_element_i32(sum, get_field(s->fields, v1), 1, ES_32);
+
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(sum);
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 09/41] s390x/tcg: Implement VECTOR CHECKSUM
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Time to introduce read_vec_element_i32 and write_vec_element_i32.
Take proper care of properly adding the carry.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 +
 target/s390x/translate_vx.inc.c | 67 +++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 9889dc0b01..64459465c5 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1072,6 +1072,8 @@
     F(0xe7f2, VAVG,    VRR_c, V,   0, 0, 0, 0, vavg, 0, IF_VEC)
 /* VECTOR AVERAGE LOGICAL */
     F(0xe7f0, VAVGL,   VRR_c, V,   0, 0, 0, 0, vavgl, 0, IF_VEC)
+/* VECTOR CHECKSUM */
+    F(0xe766, VCKSM,   VRR_c, V,   0, 0, 0, 0, vcksm, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index a190ac57ee..7a7e185d43 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -90,6 +90,33 @@ static void read_vec_element_i64(TCGv_i64 dst, uint8_t reg, uint8_t enr,
     }
 }
 
+static void read_vec_element_i32(TCGv_i32 dst, uint8_t reg, uint8_t enr,
+                                 TCGMemOp memop)
+{
+    const int offs = vec_reg_offset(reg, enr, memop & MO_SIZE);
+
+    switch (memop) {
+    case ES_8:
+        tcg_gen_ld8u_i32(dst, cpu_env, offs);
+        break;
+    case ES_16:
+        tcg_gen_ld16u_i32(dst, cpu_env, offs);
+        break;
+    case ES_8 | MO_SIGN:
+        tcg_gen_ld8s_i32(dst, cpu_env, offs);
+        break;
+    case ES_16 | MO_SIGN:
+        tcg_gen_ld16s_i32(dst, cpu_env, offs);
+        break;
+    case ES_32:
+    case ES_32 | MO_SIGN:
+        tcg_gen_ld_i32(dst, cpu_env, offs);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static void write_vec_element_i64(TCGv_i64 src, int reg, uint8_t enr,
                                   TCGMemOp memop)
 {
@@ -113,6 +140,25 @@ static void write_vec_element_i64(TCGv_i64 src, int reg, uint8_t enr,
     }
 }
 
+static void write_vec_element_i32(TCGv_i32 src, int reg, uint8_t enr,
+                                  TCGMemOp memop)
+{
+    const int offs = vec_reg_offset(reg, enr, memop & MO_SIZE);
+
+    switch (memop) {
+    case ES_8:
+        tcg_gen_st8_i32(src, cpu_env, offs);
+        break;
+    case ES_16:
+        tcg_gen_st16_i32(src, cpu_env, offs);
+        break;
+    case ES_32:
+        tcg_gen_st_i32(src, cpu_env, offs);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
 
 static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
                                     uint8_t es)
@@ -1260,3 +1306,24 @@ static DisasJumpType op_vavgl(DisasContext *s, DisasOps *o)
                get_field(s->fields, v3), &g[es]);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vcksm(DisasContext *s, DisasOps *o)
+{
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    TCGv_i32 sum = tcg_temp_new_i32();
+    int i;
+
+    read_vec_element_i32(sum, get_field(s->fields, v3), 1, ES_32);
+    for (i = 0; i < 4; i++) {
+        read_vec_element_i32(tmp, get_field(s->fields, v2), i, ES_32);
+        tcg_gen_add_i32(sum, sum, tmp);
+        tcg_gen_setcond_i32(TCG_COND_LTU, tmp, sum, tmp);
+        tcg_gen_add_i32(sum, sum, tmp);
+    }
+    zero_vec(get_field(s->fields, v1));
+    write_vec_element_i32(sum, get_field(s->fields, v1), 1, ES_32);
+
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(sum);
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 10/41] s390x/tcg: Implement VECTOR ELEMENT COMPARE *
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Fairly easy to implement, we can make use of the existing CC helpers
cmps64 and cmpu64 - we siply have to sign extend the elements.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  4 ++++
 target/s390x/translate_vx.inc.c | 20 ++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 64459465c5..52e398f515 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1074,6 +1074,10 @@
     F(0xe7f0, VAVGL,   VRR_c, V,   0, 0, 0, 0, vavgl, 0, IF_VEC)
 /* VECTOR CHECKSUM */
     F(0xe766, VCKSM,   VRR_c, V,   0, 0, 0, 0, vcksm, 0, IF_VEC)
+/* VECTOR ELEMENT COMPARE */
+    F(0xe7db, VEC,     VRR_a, V,   0, 0, 0, 0, vec, cmps64, IF_VEC)
+/* VECTOR ELEMENT COMPARE LOGICAL */
+    F(0xe7d9, VECL,    VRR_a, V,   0, 0, 0, 0, vec, cmpu64, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 7a7e185d43..c7462d1bb1 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1327,3 +1327,23 @@ static DisasJumpType op_vcksm(DisasContext *s, DisasOps *o)
     tcg_temp_free_i32(sum);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vec(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m3);
+    const uint8_t enr = NUM_VEC_ELEMENTS(es) / 2 - 1;
+    const bool logical = s->fields->op2 == 0xd9;
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    o->in1 = tcg_temp_new_i64();
+    o->in2 = tcg_temp_new_i64();
+    read_vec_element_i64(o->in1, get_field(s->fields, v1), enr,
+                         es | logical ? 0 : MO_SIGN);
+    read_vec_element_i64(o->in2, get_field(s->fields, v2), enr,
+                         es | logical ? 0 : MO_SIGN);
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 10/41] s390x/tcg: Implement VECTOR ELEMENT COMPARE *
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Fairly easy to implement, we can make use of the existing CC helpers
cmps64 and cmpu64 - we siply have to sign extend the elements.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  4 ++++
 target/s390x/translate_vx.inc.c | 20 ++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 64459465c5..52e398f515 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1074,6 +1074,10 @@
     F(0xe7f0, VAVGL,   VRR_c, V,   0, 0, 0, 0, vavgl, 0, IF_VEC)
 /* VECTOR CHECKSUM */
     F(0xe766, VCKSM,   VRR_c, V,   0, 0, 0, 0, vcksm, 0, IF_VEC)
+/* VECTOR ELEMENT COMPARE */
+    F(0xe7db, VEC,     VRR_a, V,   0, 0, 0, 0, vec, cmps64, IF_VEC)
+/* VECTOR ELEMENT COMPARE LOGICAL */
+    F(0xe7d9, VECL,    VRR_a, V,   0, 0, 0, 0, vec, cmpu64, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 7a7e185d43..c7462d1bb1 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1327,3 +1327,23 @@ static DisasJumpType op_vcksm(DisasContext *s, DisasOps *o)
     tcg_temp_free_i32(sum);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vec(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m3);
+    const uint8_t enr = NUM_VEC_ELEMENTS(es) / 2 - 1;
+    const bool logical = s->fields->op2 == 0xd9;
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    o->in1 = tcg_temp_new_i64();
+    o->in2 = tcg_temp_new_i64();
+    read_vec_element_i64(o->in1, get_field(s->fields, v1), enr,
+                         es | logical ? 0 : MO_SIGN);
+    read_vec_element_i64(o->in2, get_field(s->fields, v2), enr,
+                         es | logical ? 0 : MO_SIGN);
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 11/41] s390x/tcg: Implement VECTOR COMPARE *
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

To carry out the comparison, we can reuse the existing gvec comparison
function. In case the CC is to be computed, save the result vector
and compute the CC lazily. The result is a vector consisting of all 1's
for elements that matched and 0's for elements that didn't match.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/cc_helper.c        | 17 +++++++++++++++++
 target/s390x/helper.c           |  1 +
 target/s390x/insn-data.def      |  6 ++++++
 target/s390x/internal.h         |  1 +
 target/s390x/translate.c        |  1 +
 target/s390x/translate_vx.inc.c | 28 ++++++++++++++++++++++++++++
 6 files changed, 54 insertions(+)

diff --git a/target/s390x/cc_helper.c b/target/s390x/cc_helper.c
index 0e467bf2b6..a00294f183 100644
--- a/target/s390x/cc_helper.c
+++ b/target/s390x/cc_helper.c
@@ -402,6 +402,20 @@ static uint32_t cc_calc_lcbb(uint64_t dst)
     return dst == 16 ? 0 : 3;
 }
 
+static uint32_t cc_calc_vc(uint64_t low, uint64_t high)
+{
+    if (high == -1ull && low == -1ull) {
+        /* all elements match */
+        return 0;
+    } else if (high == 0 && low == 0) {
+        /* no elements match */
+        return 3;
+    } else {
+        /* some elements but not all match */
+        return 1;
+    }
+}
+
 static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op,
                                   uint64_t src, uint64_t dst, uint64_t vr)
 {
@@ -514,6 +528,9 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op,
     case CC_OP_LCBB:
         r = cc_calc_lcbb(dst);
         break;
+    case CC_OP_VC:
+        r = cc_calc_vc(src, dst);
+        break;
 
     case CC_OP_NZ_F32:
         r = set_cc_nz_f32(dst);
diff --git a/target/s390x/helper.c b/target/s390x/helper.c
index 8e9573221c..946de15503 100644
--- a/target/s390x/helper.c
+++ b/target/s390x/helper.c
@@ -418,6 +418,7 @@ const char *cc_name(enum cc_op cc_op)
         [CC_OP_SLA_64]    = "CC_OP_SLA_64",
         [CC_OP_FLOGR]     = "CC_OP_FLOGR",
         [CC_OP_LCBB]      = "CC_OP_LCBB",
+        [CC_OP_VC]        = "CC_OP_VC",
     };
 
     return cc_names[cc_op];
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 52e398f515..1d159cb201 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1078,6 +1078,12 @@
     F(0xe7db, VEC,     VRR_a, V,   0, 0, 0, 0, vec, cmps64, IF_VEC)
 /* VECTOR ELEMENT COMPARE LOGICAL */
     F(0xe7d9, VECL,    VRR_a, V,   0, 0, 0, 0, vec, cmpu64, IF_VEC)
+/* VECTOR COMPARE EQUAL */
+    E(0xe7f8, VCEQ,    VRR_b, V,   0, 0, 0, 0, vc, 0, TCG_COND_EQ, IF_VEC)
+/* VECTOR COMPARE HIGH */
+    E(0xe7fb, VCH,     VRR_b, V,   0, 0, 0, 0, vc, 0, TCG_COND_GT, IF_VEC)
+/* VECTOR COMPARE HIGH LOGICAL */
+    E(0xe7f9, VCHL,    VRR_b, V,   0, 0, 0, 0, vc, 0, TCG_COND_GTU, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/internal.h b/target/s390x/internal.h
index 3b4855c175..00b00fece5 100644
--- a/target/s390x/internal.h
+++ b/target/s390x/internal.h
@@ -200,6 +200,7 @@ enum cc_op {
     CC_OP_SLA_64,               /* Calculate shift left signed (64bit) */
     CC_OP_FLOGR,                /* find leftmost one */
     CC_OP_LCBB,                 /* load count to block boundary */
+    CC_OP_VC,                   /* vector compare result */
     CC_OP_MAX
 };
 
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 0afa8f7ca5..a800aa9dc9 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -572,6 +572,7 @@ static void gen_op_calc_cc(DisasContext *s)
     case CC_OP_SLA_32:
     case CC_OP_SLA_64:
     case CC_OP_NZ_F128:
+    case CC_OP_VC:
         /* 2 arguments */
         gen_helper_calc_cc(cc_op, cpu_env, local_cc_op, cc_src, cc_dst, dummy);
         break;
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index c7462d1bb1..ceb805a406 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1347,3 +1347,31 @@ static DisasJumpType op_vec(DisasContext *s, DisasOps *o)
                          es | logical ? 0 : MO_SIGN);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vc(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    TCGCond cond = s->insn->data;
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    tcg_gen_gvec_cmp(cond, es,
+                     vec_full_reg_offset(get_field(s->fields, v1)),
+                     vec_full_reg_offset(get_field(s->fields, v2)),
+                     vec_full_reg_offset(get_field(s->fields, v3)), 16, 16);
+    if (get_field(s->fields, m5) & 0x1) {
+        TCGv_i64 low = tcg_temp_new_i64();
+        TCGv_i64 high = tcg_temp_new_i64();
+
+        read_vec_element_i64(high, get_field(s->fields, v1), 0, ES_64);
+        read_vec_element_i64(low, get_field(s->fields, v1), 1, ES_64);
+        gen_op_update2_cc_i64(s, CC_OP_VC, low, high);
+
+        tcg_temp_free_i64(low);
+        tcg_temp_free_i64(high);
+    }
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 11/41] s390x/tcg: Implement VECTOR COMPARE *
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

To carry out the comparison, we can reuse the existing gvec comparison
function. In case the CC is to be computed, save the result vector
and compute the CC lazily. The result is a vector consisting of all 1's
for elements that matched and 0's for elements that didn't match.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/cc_helper.c        | 17 +++++++++++++++++
 target/s390x/helper.c           |  1 +
 target/s390x/insn-data.def      |  6 ++++++
 target/s390x/internal.h         |  1 +
 target/s390x/translate.c        |  1 +
 target/s390x/translate_vx.inc.c | 28 ++++++++++++++++++++++++++++
 6 files changed, 54 insertions(+)

diff --git a/target/s390x/cc_helper.c b/target/s390x/cc_helper.c
index 0e467bf2b6..a00294f183 100644
--- a/target/s390x/cc_helper.c
+++ b/target/s390x/cc_helper.c
@@ -402,6 +402,20 @@ static uint32_t cc_calc_lcbb(uint64_t dst)
     return dst == 16 ? 0 : 3;
 }
 
+static uint32_t cc_calc_vc(uint64_t low, uint64_t high)
+{
+    if (high == -1ull && low == -1ull) {
+        /* all elements match */
+        return 0;
+    } else if (high == 0 && low == 0) {
+        /* no elements match */
+        return 3;
+    } else {
+        /* some elements but not all match */
+        return 1;
+    }
+}
+
 static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op,
                                   uint64_t src, uint64_t dst, uint64_t vr)
 {
@@ -514,6 +528,9 @@ static uint32_t do_calc_cc(CPUS390XState *env, uint32_t cc_op,
     case CC_OP_LCBB:
         r = cc_calc_lcbb(dst);
         break;
+    case CC_OP_VC:
+        r = cc_calc_vc(src, dst);
+        break;
 
     case CC_OP_NZ_F32:
         r = set_cc_nz_f32(dst);
diff --git a/target/s390x/helper.c b/target/s390x/helper.c
index 8e9573221c..946de15503 100644
--- a/target/s390x/helper.c
+++ b/target/s390x/helper.c
@@ -418,6 +418,7 @@ const char *cc_name(enum cc_op cc_op)
         [CC_OP_SLA_64]    = "CC_OP_SLA_64",
         [CC_OP_FLOGR]     = "CC_OP_FLOGR",
         [CC_OP_LCBB]      = "CC_OP_LCBB",
+        [CC_OP_VC]        = "CC_OP_VC",
     };
 
     return cc_names[cc_op];
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 52e398f515..1d159cb201 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1078,6 +1078,12 @@
     F(0xe7db, VEC,     VRR_a, V,   0, 0, 0, 0, vec, cmps64, IF_VEC)
 /* VECTOR ELEMENT COMPARE LOGICAL */
     F(0xe7d9, VECL,    VRR_a, V,   0, 0, 0, 0, vec, cmpu64, IF_VEC)
+/* VECTOR COMPARE EQUAL */
+    E(0xe7f8, VCEQ,    VRR_b, V,   0, 0, 0, 0, vc, 0, TCG_COND_EQ, IF_VEC)
+/* VECTOR COMPARE HIGH */
+    E(0xe7fb, VCH,     VRR_b, V,   0, 0, 0, 0, vc, 0, TCG_COND_GT, IF_VEC)
+/* VECTOR COMPARE HIGH LOGICAL */
+    E(0xe7f9, VCHL,    VRR_b, V,   0, 0, 0, 0, vc, 0, TCG_COND_GTU, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/internal.h b/target/s390x/internal.h
index 3b4855c175..00b00fece5 100644
--- a/target/s390x/internal.h
+++ b/target/s390x/internal.h
@@ -200,6 +200,7 @@ enum cc_op {
     CC_OP_SLA_64,               /* Calculate shift left signed (64bit) */
     CC_OP_FLOGR,                /* find leftmost one */
     CC_OP_LCBB,                 /* load count to block boundary */
+    CC_OP_VC,                   /* vector compare result */
     CC_OP_MAX
 };
 
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 0afa8f7ca5..a800aa9dc9 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -572,6 +572,7 @@ static void gen_op_calc_cc(DisasContext *s)
     case CC_OP_SLA_32:
     case CC_OP_SLA_64:
     case CC_OP_NZ_F128:
+    case CC_OP_VC:
         /* 2 arguments */
         gen_helper_calc_cc(cc_op, cpu_env, local_cc_op, cc_src, cc_dst, dummy);
         break;
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index c7462d1bb1..ceb805a406 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1347,3 +1347,31 @@ static DisasJumpType op_vec(DisasContext *s, DisasOps *o)
                          es | logical ? 0 : MO_SIGN);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vc(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    TCGCond cond = s->insn->data;
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    tcg_gen_gvec_cmp(cond, es,
+                     vec_full_reg_offset(get_field(s->fields, v1)),
+                     vec_full_reg_offset(get_field(s->fields, v2)),
+                     vec_full_reg_offset(get_field(s->fields, v3)), 16, 16);
+    if (get_field(s->fields, m5) & 0x1) {
+        TCGv_i64 low = tcg_temp_new_i64();
+        TCGv_i64 high = tcg_temp_new_i64();
+
+        read_vec_element_i64(high, get_field(s->fields, v1), 0, ES_64);
+        read_vec_element_i64(low, get_field(s->fields, v1), 1, ES_64);
+        gen_op_update2_cc_i64(s, CC_OP_VC, low, high);
+
+        tcg_temp_free_i64(low);
+        tcg_temp_free_i64(high);
+    }
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 12/41] s390x/tcg: Implement VECTOR COUNT LEADING ZEROS
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

For 8/16, use the 32 bit variant and properly subtract the added
leading zero bits.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  2 ++
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 31 +++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 14 ++++++++++++++
 4 files changed, 49 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 04a3f5fb2e..e25e1467ae 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -152,6 +152,8 @@ DEF_HELPER_FLAGS_4(gvec_vavg8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vavg16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vavgl8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vavgl16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_3(gvec_vclz8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
+DEF_HELPER_FLAGS_3(gvec_vclz16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 1d159cb201..be3c07aafb 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1084,6 +1084,8 @@
     E(0xe7fb, VCH,     VRR_b, V,   0, 0, 0, 0, vc, 0, TCG_COND_GT, IF_VEC)
 /* VECTOR COMPARE HIGH LOGICAL */
     E(0xe7f9, VCHL,    VRR_b, V,   0, 0, 0, 0, vc, 0, TCG_COND_GTU, IF_VEC)
+/* VECTOR COUNT LEADING ZEROS */
+    F(0xe753, VCLZ,    VRR_a, V,   0, 0, 0, 0, vclz, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index ceb805a406..299924a7cc 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -182,6 +182,9 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
     tcg_temp_free_i64(tmp);
 }
 
+#define gen_gvec_2(v1, v2, gen) \
+    tcg_gen_gvec_2(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                   16, 16, gen)
 #define gen_gvec_3(v1, v2, v3, gen) \
     tcg_gen_gvec_3(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                    vec_full_reg_offset(v3), 16, 16, gen)
@@ -1375,3 +1378,31 @@ static DisasJumpType op_vc(DisasContext *s, DisasOps *o)
     }
     return DISAS_NEXT;
 }
+
+static void gen_clz_i32(TCGv_i32 d, TCGv_i32 a)
+{
+    tcg_gen_clzi_i32(d, a, 32);
+}
+
+static void gen_clz_i64(TCGv_i64 d, TCGv_i64 a)
+{
+    tcg_gen_clzi_i64(d, a, 64);
+}
+
+static DisasJumpType op_vclz(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m3);
+    static const GVecGen2 g[4] = {
+        { .fno = gen_helper_gvec_vclz8, },
+        { .fno = gen_helper_gvec_vclz16, },
+        { .fni4 = gen_clz_i32, },
+        { .fni8 = gen_clz_i64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_2(get_field(s->fields, v1), get_field(s->fields, v2), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index c2aeb12108..366962cfbe 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -93,3 +93,17 @@ void HELPER(gvec_vavgl##BITS)(void *v1, const void *v2, const void *v3,        \
 }
 DEF_VAVGL(8)
 DEF_VAVGL(16)
+
+#define DEF_VCLZ(BITS)                                                         \
+void HELPER(gvec_vclz##BITS)(void *v1, const void *v2, uint32_t desc)          \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, clz32(a) - 32 + BITS);             \
+    }                                                                          \
+}
+DEF_VCLZ(8)
+DEF_VCLZ(16)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 12/41] s390x/tcg: Implement VECTOR COUNT LEADING ZEROS
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

For 8/16, use the 32 bit variant and properly subtract the added
leading zero bits.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  2 ++
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 31 +++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 14 ++++++++++++++
 4 files changed, 49 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 04a3f5fb2e..e25e1467ae 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -152,6 +152,8 @@ DEF_HELPER_FLAGS_4(gvec_vavg8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vavg16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vavgl8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vavgl16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_3(gvec_vclz8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
+DEF_HELPER_FLAGS_3(gvec_vclz16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 1d159cb201..be3c07aafb 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1084,6 +1084,8 @@
     E(0xe7fb, VCH,     VRR_b, V,   0, 0, 0, 0, vc, 0, TCG_COND_GT, IF_VEC)
 /* VECTOR COMPARE HIGH LOGICAL */
     E(0xe7f9, VCHL,    VRR_b, V,   0, 0, 0, 0, vc, 0, TCG_COND_GTU, IF_VEC)
+/* VECTOR COUNT LEADING ZEROS */
+    F(0xe753, VCLZ,    VRR_a, V,   0, 0, 0, 0, vclz, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index ceb805a406..299924a7cc 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -182,6 +182,9 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
     tcg_temp_free_i64(tmp);
 }
 
+#define gen_gvec_2(v1, v2, gen) \
+    tcg_gen_gvec_2(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                   16, 16, gen)
 #define gen_gvec_3(v1, v2, v3, gen) \
     tcg_gen_gvec_3(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                    vec_full_reg_offset(v3), 16, 16, gen)
@@ -1375,3 +1378,31 @@ static DisasJumpType op_vc(DisasContext *s, DisasOps *o)
     }
     return DISAS_NEXT;
 }
+
+static void gen_clz_i32(TCGv_i32 d, TCGv_i32 a)
+{
+    tcg_gen_clzi_i32(d, a, 32);
+}
+
+static void gen_clz_i64(TCGv_i64 d, TCGv_i64 a)
+{
+    tcg_gen_clzi_i64(d, a, 64);
+}
+
+static DisasJumpType op_vclz(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m3);
+    static const GVecGen2 g[4] = {
+        { .fno = gen_helper_gvec_vclz8, },
+        { .fno = gen_helper_gvec_vclz16, },
+        { .fni4 = gen_clz_i32, },
+        { .fni8 = gen_clz_i64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_2(get_field(s->fields, v1), get_field(s->fields, v2), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index c2aeb12108..366962cfbe 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -93,3 +93,17 @@ void HELPER(gvec_vavgl##BITS)(void *v1, const void *v2, const void *v3,        \
 }
 DEF_VAVGL(8)
 DEF_VAVGL(16)
+
+#define DEF_VCLZ(BITS)                                                         \
+void HELPER(gvec_vclz##BITS)(void *v1, const void *v2, uint32_t desc)          \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, clz32(a) - 32 + BITS);             \
+    }                                                                          \
+}
+DEF_VCLZ(8)
+DEF_VCLZ(16)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 13/41] s390x/tcg: Implement VECTOR COUNT TRAILING ZEROS
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Implement it similar to VECTOR COUNT LEADING ZEROS.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  2 ++
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 28 ++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 14 ++++++++++++++
 4 files changed, 46 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index e25e1467ae..83e5070821 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -154,6 +154,8 @@ DEF_HELPER_FLAGS_4(gvec_vavgl8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vavgl16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vclz8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vclz16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
+DEF_HELPER_FLAGS_3(gvec_vctz8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
+DEF_HELPER_FLAGS_3(gvec_vctz16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index be3c07aafb..a355b7f62f 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1086,6 +1086,8 @@
     E(0xe7f9, VCHL,    VRR_b, V,   0, 0, 0, 0, vc, 0, TCG_COND_GTU, IF_VEC)
 /* VECTOR COUNT LEADING ZEROS */
     F(0xe753, VCLZ,    VRR_a, V,   0, 0, 0, 0, vclz, 0, IF_VEC)
+/* VECTOR COUNT TRAILING ZEROS */
+    F(0xe752, VCTZ,    VRR_a, V,   0, 0, 0, 0, vctz, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 299924a7cc..23d5870dc5 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1406,3 +1406,31 @@ static DisasJumpType op_vclz(DisasContext *s, DisasOps *o)
     gen_gvec_2(get_field(s->fields, v1), get_field(s->fields, v2), &g[es]);
     return DISAS_NEXT;
 }
+
+static void gen_ctz_i32(TCGv_i32 d, TCGv_i32 a)
+{
+    tcg_gen_ctzi_i32(d, a, 32);
+}
+
+static void gen_ctz_i64(TCGv_i64 d, TCGv_i64 a)
+{
+    tcg_gen_ctzi_i64(d, a, 64);
+}
+
+static DisasJumpType op_vctz(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m3);
+    static const GVecGen2 g[4] = {
+        { .fno = gen_helper_gvec_vctz8, },
+        { .fno = gen_helper_gvec_vctz16, },
+        { .fni4 = gen_ctz_i32, },
+        { .fni8 = gen_ctz_i64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_2(get_field(s->fields, v1), get_field(s->fields, v2), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 366962cfbe..c589f92765 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -107,3 +107,17 @@ void HELPER(gvec_vclz##BITS)(void *v1, const void *v2, uint32_t desc)          \
 }
 DEF_VCLZ(8)
 DEF_VCLZ(16)
+
+#define DEF_VCTZ(BITS)                                                         \
+void HELPER(gvec_vctz##BITS)(void *v1, const void *v2, uint32_t desc)          \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, ctz32(a));                         \
+    }                                                                          \
+}
+DEF_VCTZ(8)
+DEF_VCTZ(16)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 13/41] s390x/tcg: Implement VECTOR COUNT TRAILING ZEROS
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Implement it similar to VECTOR COUNT LEADING ZEROS.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  2 ++
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 28 ++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 14 ++++++++++++++
 4 files changed, 46 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index e25e1467ae..83e5070821 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -154,6 +154,8 @@ DEF_HELPER_FLAGS_4(gvec_vavgl8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vavgl16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vclz8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vclz16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
+DEF_HELPER_FLAGS_3(gvec_vctz8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
+DEF_HELPER_FLAGS_3(gvec_vctz16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index be3c07aafb..a355b7f62f 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1086,6 +1086,8 @@
     E(0xe7f9, VCHL,    VRR_b, V,   0, 0, 0, 0, vc, 0, TCG_COND_GTU, IF_VEC)
 /* VECTOR COUNT LEADING ZEROS */
     F(0xe753, VCLZ,    VRR_a, V,   0, 0, 0, 0, vclz, 0, IF_VEC)
+/* VECTOR COUNT TRAILING ZEROS */
+    F(0xe752, VCTZ,    VRR_a, V,   0, 0, 0, 0, vctz, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 299924a7cc..23d5870dc5 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1406,3 +1406,31 @@ static DisasJumpType op_vclz(DisasContext *s, DisasOps *o)
     gen_gvec_2(get_field(s->fields, v1), get_field(s->fields, v2), &g[es]);
     return DISAS_NEXT;
 }
+
+static void gen_ctz_i32(TCGv_i32 d, TCGv_i32 a)
+{
+    tcg_gen_ctzi_i32(d, a, 32);
+}
+
+static void gen_ctz_i64(TCGv_i64 d, TCGv_i64 a)
+{
+    tcg_gen_ctzi_i64(d, a, 64);
+}
+
+static DisasJumpType op_vctz(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m3);
+    static const GVecGen2 g[4] = {
+        { .fno = gen_helper_gvec_vctz8, },
+        { .fno = gen_helper_gvec_vctz16, },
+        { .fni4 = gen_ctz_i32, },
+        { .fni8 = gen_ctz_i64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_2(get_field(s->fields, v1), get_field(s->fields, v2), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 366962cfbe..c589f92765 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -107,3 +107,17 @@ void HELPER(gvec_vclz##BITS)(void *v1, const void *v2, uint32_t desc)          \
 }
 DEF_VCLZ(8)
 DEF_VCLZ(16)
+
+#define DEF_VCTZ(BITS)                                                         \
+void HELPER(gvec_vctz##BITS)(void *v1, const void *v2, uint32_t desc)          \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, ctz32(a));                         \
+    }                                                                          \
+}
+DEF_VCTZ(8)
+DEF_VCTZ(16)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 14/41] s390x/tcg: Implement VECTOR EXCLUSIVE OR
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Easy, we can reuse an existing gvec helper.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      | 2 ++
 target/s390x/translate_vx.inc.c | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index a355b7f62f..b8400c191a 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1088,6 +1088,8 @@
     F(0xe753, VCLZ,    VRR_a, V,   0, 0, 0, 0, vclz, 0, IF_VEC)
 /* VECTOR COUNT TRAILING ZEROS */
     F(0xe752, VCTZ,    VRR_a, V,   0, 0, 0, 0, vctz, 0, IF_VEC)
+/* VECTOR EXCLUSIVE OR */
+    F(0xe76d, VX,      VRR_c, V,   0, 0, 0, 0, vx, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 23d5870dc5..85a0f20b90 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1434,3 +1434,10 @@ static DisasJumpType op_vctz(DisasContext *s, DisasOps *o)
     gen_gvec_2(get_field(s->fields, v1), get_field(s->fields, v2), &g[es]);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vx(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_fn_3(xor, ES_8, get_field(s->fields, v1), get_field(s->fields, v2),
+                 get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 14/41] s390x/tcg: Implement VECTOR EXCLUSIVE OR
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Easy, we can reuse an existing gvec helper.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      | 2 ++
 target/s390x/translate_vx.inc.c | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index a355b7f62f..b8400c191a 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1088,6 +1088,8 @@
     F(0xe753, VCLZ,    VRR_a, V,   0, 0, 0, 0, vclz, 0, IF_VEC)
 /* VECTOR COUNT TRAILING ZEROS */
     F(0xe752, VCTZ,    VRR_a, V,   0, 0, 0, 0, vctz, 0, IF_VEC)
+/* VECTOR EXCLUSIVE OR */
+    F(0xe76d, VX,      VRR_c, V,   0, 0, 0, 0, vx, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 23d5870dc5..85a0f20b90 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1434,3 +1434,10 @@ static DisasJumpType op_vctz(DisasContext *s, DisasOps *o)
     gen_gvec_2(get_field(s->fields, v1), get_field(s->fields, v2), &g[es]);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vx(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_fn_3(xor, ES_8, get_field(s->fields, v1), get_field(s->fields, v2),
+                 get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 15/41] s390x/tcg: Implement VECTOR GALOIS FIELD MULTIPLY SUM (AND ACCUMULATE)
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

A galois field multiplication in field 2 is like binary multiplication,
however instead of doing ordinary binary additions, xor's are performed.
So no carries are considered.

Implement all variants via helpers. s390_vec_sar() and s390_vec_shr()
will be reused later on.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |   8 ++
 target/s390x/insn-data.def      |   4 +
 target/s390x/translate_vx.inc.c |  38 ++++++++
 target/s390x/vec_int_helper.c   | 168 ++++++++++++++++++++++++++++++++
 4 files changed, 218 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 83e5070821..18a3df6b07 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -156,6 +156,14 @@ DEF_HELPER_FLAGS_3(gvec_vclz8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vclz16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vctz8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vctz16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vgfm8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vgfm16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vgfm32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vgfm64, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vgfma8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vgfma16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vgfma32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vgfma64, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index b8400c191a..add174b793 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1090,6 +1090,10 @@
     F(0xe752, VCTZ,    VRR_a, V,   0, 0, 0, 0, vctz, 0, IF_VEC)
 /* VECTOR EXCLUSIVE OR */
     F(0xe76d, VX,      VRR_c, V,   0, 0, 0, 0, vx, 0, IF_VEC)
+/* VECTOR GALOIS FIELD MULTIPLY SUM */
+    F(0xe7b4, VGFM,    VRR_c, V,   0, 0, 0, 0, vgfm, 0, IF_VEC)
+/* VECTOR GALOIS FIELD MULTIPLY SUM AND ACCUMULATE */
+    F(0xe7bc, VGFMA,   VRR_d, V,   0, 0, 0, 0, vgfma, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 85a0f20b90..fe36d027b6 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1441,3 +1441,41 @@ static DisasJumpType op_vx(DisasContext *s, DisasOps *o)
                  get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vgfm(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen3 g[4] = {
+        { .fno = gen_helper_gvec_vgfm8, },
+        { .fno = gen_helper_gvec_vgfm16, },
+        { .fno = gen_helper_gvec_vgfm32, },
+        { .fno = gen_helper_gvec_vgfm64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_3(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), &g[es]);
+    return DISAS_NEXT;
+}
+
+static DisasJumpType op_vgfma(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m5);
+    static const GVecGen4 g[4] = {
+        { .fno = gen_helper_gvec_vgfma8, },
+        { .fno = gen_helper_gvec_vgfma16, },
+        { .fno = gen_helper_gvec_vgfma32, },
+        { .fno = gen_helper_gvec_vgfma64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_4(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), get_field(s->fields, v4), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index c589f92765..99dd0653f6 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -37,6 +37,60 @@ static bool s390_vec_add(S390Vector *d, const S390Vector *a,
     return high_carry;
 }
 
+static bool s390_vec_is_zero(const S390Vector *v)
+{
+    return !v->doubleword[0] && !v->doubleword[1];
+}
+
+static void s390_vec_xor(S390Vector *res, const S390Vector *a,
+                         const S390Vector *b)
+{
+    res->doubleword[0] = a->doubleword[0] ^ b->doubleword[0];
+    res->doubleword[1] = a->doubleword[1] ^ b->doubleword[1];
+}
+
+static void s390_vec_shl(S390Vector *d, const S390Vector *a, uint64_t count)
+{
+    uint64_t tmp;
+
+    g_assert(count < 128);
+    if (count == 0) {
+        d->doubleword[0] = a->doubleword[0];
+        d->doubleword[1] = a->doubleword[1];
+    } else if (count == 64) {
+        d->doubleword[0] = a->doubleword[1];
+        d->doubleword[1] = 0;
+    } else if (count < 64) {
+        tmp = extract64(a->doubleword[1], 64 - count, count);
+        d->doubleword[1] = a->doubleword[1] << count;
+        d->doubleword[0] = (a->doubleword[0] << count) | tmp;
+    } else {
+        d->doubleword[0] = a->doubleword[1] << (count - 64);
+        d->doubleword[1] = 0;
+    }
+}
+
+static void s390_vec_shr(S390Vector *d, const S390Vector *a, uint64_t count)
+{
+    uint64_t tmp;
+
+    g_assert(count < 128);
+    if (count == 0) {
+        d->doubleword[0] = a->doubleword[0];
+        d->doubleword[1] = a->doubleword[1];
+    } else if (count == 64) {
+        d->doubleword[1] = a->doubleword[0];
+        d->doubleword[0] = 0;
+    } else if (count < 64) {
+        tmp = a->doubleword[1] >> count;
+        d->doubleword[1] = deposit64(tmp, 64 - count, count, a->doubleword[0]);
+        d->doubleword[0] = a->doubleword[0] >> count;
+    } else {
+        d->doubleword[1] = a->doubleword[0] >> (count - 64);
+        d->doubleword[0] = 0;
+    }
+}
+
 void HELPER(gvec_vacc128)(void *v1, const void *v2, const void *v3,
                           uint32_t desc)
 {
@@ -121,3 +175,117 @@ void HELPER(gvec_vctz##BITS)(void *v1, const void *v2, uint32_t desc)          \
 }
 DEF_VCTZ(8)
 DEF_VCTZ(16)
+
+/* like binary multiplication, but XOR instead of addition */
+#define DEF_GALOIS_MULTIPLY(BITS, TBITS)                                       \
+static uint##TBITS##_t galois_multiply##BITS(uint##TBITS##_t a,                \
+                                             uint##TBITS##_t b)                \
+{                                                                              \
+    uint##TBITS##_t res = 0;                                                   \
+                                                                               \
+    while (b) {                                                                \
+        if (b & 0x1) {                                                         \
+            res = res ^ a;                                                     \
+        }                                                                      \
+        a = a << 1;                                                            \
+        b = b >> 1;                                                            \
+    }                                                                          \
+    return res;                                                                \
+}
+DEF_GALOIS_MULTIPLY(8, 16)
+DEF_GALOIS_MULTIPLY(16, 32)
+DEF_GALOIS_MULTIPLY(32, 64)
+
+static S390Vector galois_multiply64(uint64_t a, uint64_t b)
+{
+    S390Vector res = {};
+    S390Vector va = {
+        .doubleword[1] = a,
+    };
+    S390Vector vb = {
+        .doubleword[1] = b,
+    };
+
+    while (!s390_vec_is_zero(&vb)) {
+        if (vb.doubleword[1] & 0x1) {
+            s390_vec_xor(&res, &res, &va);
+        }
+        s390_vec_shl(&va, &va, 1);
+        s390_vec_shr(&vb, &vb, 1);
+    }
+    return res;
+}
+
+#define DEF_VGFM(BITS, TBITS)                                                  \
+void HELPER(gvec_vgfm##BITS)(void *v1, const void *v2, const void *v3,         \
+                             uint32_t desc)                                    \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / TBITS); i++) {                                      \
+        uint##BITS##_t a = s390_vec_read_element##BITS(v2, i * 2);             \
+        uint##BITS##_t b = s390_vec_read_element##BITS(v3, i * 2);             \
+        uint##TBITS##_t d = galois_multiply##BITS(a, b);                       \
+                                                                               \
+        a = s390_vec_read_element##BITS(v2, i * 2 + 1);                        \
+        b = s390_vec_read_element##BITS(v3, i * 2 + 1);                        \
+        d = d ^ galois_multiply32(a, b);                                       \
+        s390_vec_write_element##TBITS(v1, i, d);                               \
+    }                                                                          \
+}
+DEF_VGFM(8, 16)
+DEF_VGFM(16, 32)
+DEF_VGFM(32, 64)
+
+void HELPER(gvec_vgfm64)(void *v1, const void *v2, const void *v3,
+                         uint32_t desc)
+{
+    S390Vector tmp1, tmp2;
+    uint64_t a, b;
+
+    a = s390_vec_read_element64(v2, 0);
+    b = s390_vec_read_element64(v3, 0);
+    tmp1 = galois_multiply64(a, b);
+    a = s390_vec_read_element64(v2, 1);
+    b = s390_vec_read_element64(v3, 1);
+    tmp2 = galois_multiply64(a, b);
+    s390_vec_xor(v1, &tmp1, &tmp2);
+}
+
+#define DEF_VGFMA(BITS, TBITS)                                                 \
+void HELPER(gvec_vgfma##BITS)(void *v1, const void *v2, const void *v3,        \
+                              const void *v4, uint32_t desc)                   \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / TBITS); i++) {                                      \
+        uint##BITS##_t a = s390_vec_read_element##BITS(v2, i * 2);             \
+        uint##BITS##_t b = s390_vec_read_element##BITS(v3, i * 2);             \
+        uint##TBITS##_t d = galois_multiply##BITS(a, b);                       \
+                                                                               \
+        a = s390_vec_read_element##BITS(v2, i * 2 + 1);                        \
+        b = s390_vec_read_element##BITS(v3, i * 2 + 1);                        \
+        d = d ^ galois_multiply32(a, b);                                       \
+        d = d ^ s390_vec_read_element##TBITS(v4, i);                           \
+        s390_vec_write_element##TBITS(v1, i, d);                               \
+    }                                                                          \
+}
+DEF_VGFMA(8, 16)
+DEF_VGFMA(16, 32)
+DEF_VGFMA(32, 64)
+
+void HELPER(gvec_vgfma64)(void *v1, const void *v2, const void *v3,
+                          const void *v4, uint32_t desc)
+{
+    S390Vector tmp1, tmp2;
+    uint64_t a, b;
+
+    a = s390_vec_read_element64(v2, 0);
+    b = s390_vec_read_element64(v3, 0);
+    tmp1 = galois_multiply64(a, b);
+    a = s390_vec_read_element64(v2, 1);
+    b = s390_vec_read_element64(v3, 1);
+    tmp2 = galois_multiply64(a, b);
+    s390_vec_xor(&tmp1, &tmp1, &tmp2);
+    s390_vec_xor(v1, &tmp1, v4);
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 15/41] s390x/tcg: Implement VECTOR GALOIS FIELD MULTIPLY SUM (AND ACCUMULATE)
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

A galois field multiplication in field 2 is like binary multiplication,
however instead of doing ordinary binary additions, xor's are performed.
So no carries are considered.

Implement all variants via helpers. s390_vec_sar() and s390_vec_shr()
will be reused later on.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |   8 ++
 target/s390x/insn-data.def      |   4 +
 target/s390x/translate_vx.inc.c |  38 ++++++++
 target/s390x/vec_int_helper.c   | 168 ++++++++++++++++++++++++++++++++
 4 files changed, 218 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 83e5070821..18a3df6b07 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -156,6 +156,14 @@ DEF_HELPER_FLAGS_3(gvec_vclz8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vclz16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vctz8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vctz16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vgfm8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vgfm16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vgfm32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vgfm64, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vgfma8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vgfma16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vgfma32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vgfma64, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index b8400c191a..add174b793 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1090,6 +1090,10 @@
     F(0xe752, VCTZ,    VRR_a, V,   0, 0, 0, 0, vctz, 0, IF_VEC)
 /* VECTOR EXCLUSIVE OR */
     F(0xe76d, VX,      VRR_c, V,   0, 0, 0, 0, vx, 0, IF_VEC)
+/* VECTOR GALOIS FIELD MULTIPLY SUM */
+    F(0xe7b4, VGFM,    VRR_c, V,   0, 0, 0, 0, vgfm, 0, IF_VEC)
+/* VECTOR GALOIS FIELD MULTIPLY SUM AND ACCUMULATE */
+    F(0xe7bc, VGFMA,   VRR_d, V,   0, 0, 0, 0, vgfma, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 85a0f20b90..fe36d027b6 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1441,3 +1441,41 @@ static DisasJumpType op_vx(DisasContext *s, DisasOps *o)
                  get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vgfm(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen3 g[4] = {
+        { .fno = gen_helper_gvec_vgfm8, },
+        { .fno = gen_helper_gvec_vgfm16, },
+        { .fno = gen_helper_gvec_vgfm32, },
+        { .fno = gen_helper_gvec_vgfm64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_3(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), &g[es]);
+    return DISAS_NEXT;
+}
+
+static DisasJumpType op_vgfma(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m5);
+    static const GVecGen4 g[4] = {
+        { .fno = gen_helper_gvec_vgfma8, },
+        { .fno = gen_helper_gvec_vgfma16, },
+        { .fno = gen_helper_gvec_vgfma32, },
+        { .fno = gen_helper_gvec_vgfma64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_4(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), get_field(s->fields, v4), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index c589f92765..99dd0653f6 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -37,6 +37,60 @@ static bool s390_vec_add(S390Vector *d, const S390Vector *a,
     return high_carry;
 }
 
+static bool s390_vec_is_zero(const S390Vector *v)
+{
+    return !v->doubleword[0] && !v->doubleword[1];
+}
+
+static void s390_vec_xor(S390Vector *res, const S390Vector *a,
+                         const S390Vector *b)
+{
+    res->doubleword[0] = a->doubleword[0] ^ b->doubleword[0];
+    res->doubleword[1] = a->doubleword[1] ^ b->doubleword[1];
+}
+
+static void s390_vec_shl(S390Vector *d, const S390Vector *a, uint64_t count)
+{
+    uint64_t tmp;
+
+    g_assert(count < 128);
+    if (count == 0) {
+        d->doubleword[0] = a->doubleword[0];
+        d->doubleword[1] = a->doubleword[1];
+    } else if (count == 64) {
+        d->doubleword[0] = a->doubleword[1];
+        d->doubleword[1] = 0;
+    } else if (count < 64) {
+        tmp = extract64(a->doubleword[1], 64 - count, count);
+        d->doubleword[1] = a->doubleword[1] << count;
+        d->doubleword[0] = (a->doubleword[0] << count) | tmp;
+    } else {
+        d->doubleword[0] = a->doubleword[1] << (count - 64);
+        d->doubleword[1] = 0;
+    }
+}
+
+static void s390_vec_shr(S390Vector *d, const S390Vector *a, uint64_t count)
+{
+    uint64_t tmp;
+
+    g_assert(count < 128);
+    if (count == 0) {
+        d->doubleword[0] = a->doubleword[0];
+        d->doubleword[1] = a->doubleword[1];
+    } else if (count == 64) {
+        d->doubleword[1] = a->doubleword[0];
+        d->doubleword[0] = 0;
+    } else if (count < 64) {
+        tmp = a->doubleword[1] >> count;
+        d->doubleword[1] = deposit64(tmp, 64 - count, count, a->doubleword[0]);
+        d->doubleword[0] = a->doubleword[0] >> count;
+    } else {
+        d->doubleword[1] = a->doubleword[0] >> (count - 64);
+        d->doubleword[0] = 0;
+    }
+}
+
 void HELPER(gvec_vacc128)(void *v1, const void *v2, const void *v3,
                           uint32_t desc)
 {
@@ -121,3 +175,117 @@ void HELPER(gvec_vctz##BITS)(void *v1, const void *v2, uint32_t desc)          \
 }
 DEF_VCTZ(8)
 DEF_VCTZ(16)
+
+/* like binary multiplication, but XOR instead of addition */
+#define DEF_GALOIS_MULTIPLY(BITS, TBITS)                                       \
+static uint##TBITS##_t galois_multiply##BITS(uint##TBITS##_t a,                \
+                                             uint##TBITS##_t b)                \
+{                                                                              \
+    uint##TBITS##_t res = 0;                                                   \
+                                                                               \
+    while (b) {                                                                \
+        if (b & 0x1) {                                                         \
+            res = res ^ a;                                                     \
+        }                                                                      \
+        a = a << 1;                                                            \
+        b = b >> 1;                                                            \
+    }                                                                          \
+    return res;                                                                \
+}
+DEF_GALOIS_MULTIPLY(8, 16)
+DEF_GALOIS_MULTIPLY(16, 32)
+DEF_GALOIS_MULTIPLY(32, 64)
+
+static S390Vector galois_multiply64(uint64_t a, uint64_t b)
+{
+    S390Vector res = {};
+    S390Vector va = {
+        .doubleword[1] = a,
+    };
+    S390Vector vb = {
+        .doubleword[1] = b,
+    };
+
+    while (!s390_vec_is_zero(&vb)) {
+        if (vb.doubleword[1] & 0x1) {
+            s390_vec_xor(&res, &res, &va);
+        }
+        s390_vec_shl(&va, &va, 1);
+        s390_vec_shr(&vb, &vb, 1);
+    }
+    return res;
+}
+
+#define DEF_VGFM(BITS, TBITS)                                                  \
+void HELPER(gvec_vgfm##BITS)(void *v1, const void *v2, const void *v3,         \
+                             uint32_t desc)                                    \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / TBITS); i++) {                                      \
+        uint##BITS##_t a = s390_vec_read_element##BITS(v2, i * 2);             \
+        uint##BITS##_t b = s390_vec_read_element##BITS(v3, i * 2);             \
+        uint##TBITS##_t d = galois_multiply##BITS(a, b);                       \
+                                                                               \
+        a = s390_vec_read_element##BITS(v2, i * 2 + 1);                        \
+        b = s390_vec_read_element##BITS(v3, i * 2 + 1);                        \
+        d = d ^ galois_multiply32(a, b);                                       \
+        s390_vec_write_element##TBITS(v1, i, d);                               \
+    }                                                                          \
+}
+DEF_VGFM(8, 16)
+DEF_VGFM(16, 32)
+DEF_VGFM(32, 64)
+
+void HELPER(gvec_vgfm64)(void *v1, const void *v2, const void *v3,
+                         uint32_t desc)
+{
+    S390Vector tmp1, tmp2;
+    uint64_t a, b;
+
+    a = s390_vec_read_element64(v2, 0);
+    b = s390_vec_read_element64(v3, 0);
+    tmp1 = galois_multiply64(a, b);
+    a = s390_vec_read_element64(v2, 1);
+    b = s390_vec_read_element64(v3, 1);
+    tmp2 = galois_multiply64(a, b);
+    s390_vec_xor(v1, &tmp1, &tmp2);
+}
+
+#define DEF_VGFMA(BITS, TBITS)                                                 \
+void HELPER(gvec_vgfma##BITS)(void *v1, const void *v2, const void *v3,        \
+                              const void *v4, uint32_t desc)                   \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / TBITS); i++) {                                      \
+        uint##BITS##_t a = s390_vec_read_element##BITS(v2, i * 2);             \
+        uint##BITS##_t b = s390_vec_read_element##BITS(v3, i * 2);             \
+        uint##TBITS##_t d = galois_multiply##BITS(a, b);                       \
+                                                                               \
+        a = s390_vec_read_element##BITS(v2, i * 2 + 1);                        \
+        b = s390_vec_read_element##BITS(v3, i * 2 + 1);                        \
+        d = d ^ galois_multiply32(a, b);                                       \
+        d = d ^ s390_vec_read_element##TBITS(v4, i);                           \
+        s390_vec_write_element##TBITS(v1, i, d);                               \
+    }                                                                          \
+}
+DEF_VGFMA(8, 16)
+DEF_VGFMA(16, 32)
+DEF_VGFMA(32, 64)
+
+void HELPER(gvec_vgfma64)(void *v1, const void *v2, const void *v3,
+                          const void *v4, uint32_t desc)
+{
+    S390Vector tmp1, tmp2;
+    uint64_t a, b;
+
+    a = s390_vec_read_element64(v2, 0);
+    b = s390_vec_read_element64(v3, 0);
+    tmp1 = galois_multiply64(a, b);
+    a = s390_vec_read_element64(v2, 1);
+    b = s390_vec_read_element64(v3, 1);
+    tmp2 = galois_multiply64(a, b);
+    s390_vec_xor(&tmp1, &tmp1, &tmp2);
+    s390_vec_xor(v1, &tmp1, v4);
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 16/41] s390x/tcg: Implement VECTOR LOAD COMPLEMENT
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

We can reuse an existing gvec helper for negating the values.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index add174b793..07868ff082 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1094,6 +1094,8 @@
     F(0xe7b4, VGFM,    VRR_c, V,   0, 0, 0, 0, vgfm, 0, IF_VEC)
 /* VECTOR GALOIS FIELD MULTIPLY SUM AND ACCUMULATE */
     F(0xe7bc, VGFMA,   VRR_d, V,   0, 0, 0, 0, vgfma, 0, IF_VEC)
+/* VECTOR LOAD COMPLEMENT */
+    F(0xe7de, VLC,     VRR_a, V,   0, 0, 0, 0, vlc, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index fe36d027b6..28436cb01a 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -209,6 +209,9 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
                      16)
 #define gen_gvec_dup64i(v1, c) \
     tcg_gen_gvec_dup64i(vec_full_reg_offset(v1), 16, 16, c)
+#define gen_gvec_fn_2(fn, es, v1, v2) \
+    tcg_gen_gvec_##fn(es, vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                      16, 16)
 #define gen_gvec_fn_3(fn, es, v1, v2, v3) \
     tcg_gen_gvec_##fn(es, vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                       vec_full_reg_offset(v3), 16, 16)
@@ -1479,3 +1482,17 @@ static DisasJumpType op_vgfma(DisasContext *s, DisasOps *o)
                get_field(s->fields, v3), get_field(s->fields, v4), &g[es]);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vlc(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m3);
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_gvec_fn_2(neg, ES_8, get_field(s->fields, v1),
+                  get_field(s->fields, v2));
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 16/41] s390x/tcg: Implement VECTOR LOAD COMPLEMENT
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

We can reuse an existing gvec helper for negating the values.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index add174b793..07868ff082 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1094,6 +1094,8 @@
     F(0xe7b4, VGFM,    VRR_c, V,   0, 0, 0, 0, vgfm, 0, IF_VEC)
 /* VECTOR GALOIS FIELD MULTIPLY SUM AND ACCUMULATE */
     F(0xe7bc, VGFMA,   VRR_d, V,   0, 0, 0, 0, vgfma, 0, IF_VEC)
+/* VECTOR LOAD COMPLEMENT */
+    F(0xe7de, VLC,     VRR_a, V,   0, 0, 0, 0, vlc, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index fe36d027b6..28436cb01a 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -209,6 +209,9 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
                      16)
 #define gen_gvec_dup64i(v1, c) \
     tcg_gen_gvec_dup64i(vec_full_reg_offset(v1), 16, 16, c)
+#define gen_gvec_fn_2(fn, es, v1, v2) \
+    tcg_gen_gvec_##fn(es, vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                      16, 16)
 #define gen_gvec_fn_3(fn, es, v1, v2, v3) \
     tcg_gen_gvec_##fn(es, vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                       vec_full_reg_offset(v3), 16, 16)
@@ -1479,3 +1482,17 @@ static DisasJumpType op_vgfma(DisasContext *s, DisasOps *o)
                get_field(s->fields, v3), get_field(s->fields, v4), &g[es]);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vlc(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m3);
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_gvec_fn_2(neg, ES_8, get_field(s->fields, v1),
+                  get_field(s->fields, v2));
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 17/41] s390x/tcg: Implement VECTOR LOAD POSITIVE
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Similar to VECTOR LOAD COMPLEMENT but unfortunately we don't have a
gvec helper.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  2 ++
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 40 +++++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 14 ++++++++++++
 4 files changed, 58 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 18a3df6b07..065c4c6ea3 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -164,6 +164,8 @@ DEF_HELPER_FLAGS_5(gvec_vgfma8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i3
 DEF_HELPER_FLAGS_5(gvec_vgfma16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_5(gvec_vgfma32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_5(gvec_vgfma64, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_3(gvec_vlp8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
+DEF_HELPER_FLAGS_3(gvec_vlp16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 07868ff082..fc8886ff42 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1096,6 +1096,8 @@
     F(0xe7bc, VGFMA,   VRR_d, V,   0, 0, 0, 0, vgfma, 0, IF_VEC)
 /* VECTOR LOAD COMPLEMENT */
     F(0xe7de, VLC,     VRR_a, V,   0, 0, 0, 0, vlc, 0, IF_VEC)
+/* VECTOR LOAD POSITIVE */
+    F(0xe7df, VLP,     VRR_a, V,   0, 0, 0, 0, vlp, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 28436cb01a..8cab3f876a 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1496,3 +1496,43 @@ static DisasJumpType op_vlc(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v2));
     return DISAS_NEXT;
 }
+
+static void gen_lp_i32(TCGv_i32 d, TCGv_i32 a)
+{
+    TCGv_i32 zero = tcg_const_i32(0);
+    TCGv_i32 neg = tcg_temp_new_i32();
+
+    tcg_gen_neg_i32(neg, a);
+    tcg_gen_movcond_i32(TCG_COND_LT, d, a, zero, neg, a);
+    tcg_temp_free_i32(neg);
+    tcg_temp_free_i32(zero);
+}
+
+static void gen_lp_i64(TCGv_i64 d, TCGv_i64 a)
+{
+    TCGv_i64 zero = tcg_const_i64(0);
+    TCGv_i64 neg = tcg_temp_new_i64();
+
+    tcg_gen_neg_i64(neg, a);
+    tcg_gen_movcond_i64(TCG_COND_LT, d, a, zero, neg, a);
+    tcg_temp_free_i64(neg);
+    tcg_temp_free_i64(zero);
+}
+
+static DisasJumpType op_vlp(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m3);
+    static const GVecGen2 g[4] = {
+        { .fno = gen_helper_gvec_vlp8, },
+        { .fno = gen_helper_gvec_vlp16, },
+        { .fni4 = gen_lp_i32, },
+        { .fni8 = gen_lp_i64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_2(get_field(s->fields, v1), get_field(s->fields, v2), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 99dd0653f6..574f707abf 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -289,3 +289,17 @@ void HELPER(gvec_vgfma64)(void *v1, const void *v2, const void *v3,
     s390_vec_xor(&tmp1, &tmp1, &tmp2);
     s390_vec_xor(v1, &tmp1, v4);
 }
+
+#define DEF_VLP(BITS)                                                          \
+void HELPER(gvec_vlp##BITS)(void *v1, const void *v2, uint32_t desc)           \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const int##BITS##_t a = s390_vec_read_element##BITS(v2, i);            \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, a < 0 ? -a : a);                   \
+    }                                                                          \
+}
+DEF_VLP(8)
+DEF_VLP(16)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 17/41] s390x/tcg: Implement VECTOR LOAD POSITIVE
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Similar to VECTOR LOAD COMPLEMENT but unfortunately we don't have a
gvec helper.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  2 ++
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 40 +++++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 14 ++++++++++++
 4 files changed, 58 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 18a3df6b07..065c4c6ea3 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -164,6 +164,8 @@ DEF_HELPER_FLAGS_5(gvec_vgfma8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i3
 DEF_HELPER_FLAGS_5(gvec_vgfma16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_5(gvec_vgfma32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_5(gvec_vgfma64, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_3(gvec_vlp8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
+DEF_HELPER_FLAGS_3(gvec_vlp16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 07868ff082..fc8886ff42 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1096,6 +1096,8 @@
     F(0xe7bc, VGFMA,   VRR_d, V,   0, 0, 0, 0, vgfma, 0, IF_VEC)
 /* VECTOR LOAD COMPLEMENT */
     F(0xe7de, VLC,     VRR_a, V,   0, 0, 0, 0, vlc, 0, IF_VEC)
+/* VECTOR LOAD POSITIVE */
+    F(0xe7df, VLP,     VRR_a, V,   0, 0, 0, 0, vlp, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 28436cb01a..8cab3f876a 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1496,3 +1496,43 @@ static DisasJumpType op_vlc(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v2));
     return DISAS_NEXT;
 }
+
+static void gen_lp_i32(TCGv_i32 d, TCGv_i32 a)
+{
+    TCGv_i32 zero = tcg_const_i32(0);
+    TCGv_i32 neg = tcg_temp_new_i32();
+
+    tcg_gen_neg_i32(neg, a);
+    tcg_gen_movcond_i32(TCG_COND_LT, d, a, zero, neg, a);
+    tcg_temp_free_i32(neg);
+    tcg_temp_free_i32(zero);
+}
+
+static void gen_lp_i64(TCGv_i64 d, TCGv_i64 a)
+{
+    TCGv_i64 zero = tcg_const_i64(0);
+    TCGv_i64 neg = tcg_temp_new_i64();
+
+    tcg_gen_neg_i64(neg, a);
+    tcg_gen_movcond_i64(TCG_COND_LT, d, a, zero, neg, a);
+    tcg_temp_free_i64(neg);
+    tcg_temp_free_i64(zero);
+}
+
+static DisasJumpType op_vlp(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m3);
+    static const GVecGen2 g[4] = {
+        { .fno = gen_helper_gvec_vlp8, },
+        { .fno = gen_helper_gvec_vlp16, },
+        { .fni4 = gen_lp_i32, },
+        { .fni8 = gen_lp_i64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_2(get_field(s->fields, v1), get_field(s->fields, v2), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 99dd0653f6..574f707abf 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -289,3 +289,17 @@ void HELPER(gvec_vgfma64)(void *v1, const void *v2, const void *v3,
     s390_vec_xor(&tmp1, &tmp1, &tmp2);
     s390_vec_xor(v1, &tmp1, v4);
 }
+
+#define DEF_VLP(BITS)                                                          \
+void HELPER(gvec_vlp##BITS)(void *v1, const void *v2, uint32_t desc)           \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const int##BITS##_t a = s390_vec_read_element##BITS(v2, i);            \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, a < 0 ? -a : a);                   \
+    }                                                                          \
+}
+DEF_VLP(8)
+DEF_VLP(16)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 18/41] s390x/tcg: Implement VECTOR (MAXIMUM|MINIMUM) (LOGICAL)
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Luckily, we already have gvec helpers for all four cases.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  8 ++++++++
 target/s390x/translate_vx.inc.c | 31 +++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index fc8886ff42..b22d9f0f6a 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1098,6 +1098,14 @@
     F(0xe7de, VLC,     VRR_a, V,   0, 0, 0, 0, vlc, 0, IF_VEC)
 /* VECTOR LOAD POSITIVE */
     F(0xe7df, VLP,     VRR_a, V,   0, 0, 0, 0, vlp, 0, IF_VEC)
+/* VECTOR MAXIMUM */
+    F(0xe7ff, VMX,     VRR_c, V,   0, 0, 0, 0, vmx, 0, IF_VEC)
+/* VECTOR MAXIMUM LOGICAL */
+    F(0xe7fd, VMXL,    VRR_c, V,   0, 0, 0, 0, vmx, 0, IF_VEC)
+/* VECTOR MINIMUM */
+    F(0xe7fe, VMN,     VRR_c, V,   0, 0, 0, 0, vmx, 0, IF_VEC)
+/* VECTOR MINIMUM LOGICAL */
+    F(0xe7fc, VMNL,    VRR_c, V,   0, 0, 0, 0, vmx, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 8cab3f876a..aae1ff107a 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1536,3 +1536,34 @@ static DisasJumpType op_vlp(DisasContext *s, DisasOps *o)
     gen_gvec_2(get_field(s->fields, v1), get_field(s->fields, v2), &g[es]);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vmx(DisasContext *s, DisasOps *o)
+{
+    const uint8_t v1 = get_field(s->fields, v1);
+    const uint8_t v2 = get_field(s->fields, v2);
+    const uint8_t v3 = get_field(s->fields, v3);
+    const uint8_t es = get_field(s->fields, m4);
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    switch (s->fields->op2) {
+    case 0xff:
+        gen_gvec_fn_3(smax, es, v1, v2, v3);
+        break;
+    case 0xfd:
+        gen_gvec_fn_3(umax, es, v1, v2, v3);
+        break;
+    case 0xfe:
+        gen_gvec_fn_3(smin, es, v1, v2, v3);
+        break;
+    case 0xfc:
+        gen_gvec_fn_3(umin, es, v1, v2, v3);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 18/41] s390x/tcg: Implement VECTOR (MAXIMUM|MINIMUM) (LOGICAL)
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Luckily, we already have gvec helpers for all four cases.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  8 ++++++++
 target/s390x/translate_vx.inc.c | 31 +++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index fc8886ff42..b22d9f0f6a 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1098,6 +1098,14 @@
     F(0xe7de, VLC,     VRR_a, V,   0, 0, 0, 0, vlc, 0, IF_VEC)
 /* VECTOR LOAD POSITIVE */
     F(0xe7df, VLP,     VRR_a, V,   0, 0, 0, 0, vlp, 0, IF_VEC)
+/* VECTOR MAXIMUM */
+    F(0xe7ff, VMX,     VRR_c, V,   0, 0, 0, 0, vmx, 0, IF_VEC)
+/* VECTOR MAXIMUM LOGICAL */
+    F(0xe7fd, VMXL,    VRR_c, V,   0, 0, 0, 0, vmx, 0, IF_VEC)
+/* VECTOR MINIMUM */
+    F(0xe7fe, VMN,     VRR_c, V,   0, 0, 0, 0, vmx, 0, IF_VEC)
+/* VECTOR MINIMUM LOGICAL */
+    F(0xe7fc, VMNL,    VRR_c, V,   0, 0, 0, 0, vmx, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 8cab3f876a..aae1ff107a 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1536,3 +1536,34 @@ static DisasJumpType op_vlp(DisasContext *s, DisasOps *o)
     gen_gvec_2(get_field(s->fields, v1), get_field(s->fields, v2), &g[es]);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vmx(DisasContext *s, DisasOps *o)
+{
+    const uint8_t v1 = get_field(s->fields, v1);
+    const uint8_t v2 = get_field(s->fields, v2);
+    const uint8_t v3 = get_field(s->fields, v3);
+    const uint8_t es = get_field(s->fields, m4);
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    switch (s->fields->op2) {
+    case 0xff:
+        gen_gvec_fn_3(smax, es, v1, v2, v3);
+        break;
+    case 0xfd:
+        gen_gvec_fn_3(umax, es, v1, v2, v3);
+        break;
+    case 0xfe:
+        gen_gvec_fn_3(smin, es, v1, v2, v3);
+        break;
+    case 0xfc:
+        gen_gvec_fn_3(umin, es, v1, v2, v3);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 19/41] s390x/tcg: Implement VECTOR MULTIPLY AND ADD *
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Quite some variants to handle. At least handle some 32-bit element
variants via gvec expansion (we could also handle 16/32-bit variants
for ODD and EVEN easily via gvec expansion, but let's keep it simple
for now).

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  18 +++++
 target/s390x/insn-data.def      |  14 ++++
 target/s390x/translate_vx.inc.c | 122 +++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 123 ++++++++++++++++++++++++++++++++
 4 files changed, 277 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 065c4c6ea3..b73a35107e 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -166,6 +166,24 @@ DEF_HELPER_FLAGS_5(gvec_vgfma32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i
 DEF_HELPER_FLAGS_5(gvec_vgfma64, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vlp8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vlp16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmal8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmal16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmah8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmah16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmalh8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmalh16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmae8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmae16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmae32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmale8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmale16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmale32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmao8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmao16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmao32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmalo8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmalo16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmalo32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index b22d9f0f6a..7ccec0544f 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1106,6 +1106,20 @@
     F(0xe7fe, VMN,     VRR_c, V,   0, 0, 0, 0, vmx, 0, IF_VEC)
 /* VECTOR MINIMUM LOGICAL */
     F(0xe7fc, VMNL,    VRR_c, V,   0, 0, 0, 0, vmx, 0, IF_VEC)
+/* VECTOR MULTIPLY AND ADD LOW */
+    F(0xe7aa, VMAL,    VRR_d, V,   0, 0, 0, 0, vma, 0, IF_VEC)
+/* VECTOR MULTIPLY AND ADD HIGH */
+    F(0xe7ab, VMAH,    VRR_d, V,   0, 0, 0, 0, vma, 0, IF_VEC)
+/* VECTOR MULTIPLY AND ADD LOGICAL HIGH */
+    F(0xe7a9, VMALH,   VRR_d, V,   0, 0, 0, 0, vma, 0, IF_VEC)
+/* VECTOR MULTIPLY AND ADD EVEN */
+    F(0xe7ae, VMAE,    VRR_d, V,   0, 0, 0, 0, vma, 0, IF_VEC)
+/* VECTOR MULTIPLY AND ADD LOGICAL EVEN */
+    F(0xe7ac, VMALE,   VRR_d, V,   0, 0, 0, 0, vma, 0, IF_VEC)
+/* VECTOR MULTIPLY AND ADD ODD */
+    F(0xe7af, VMAO,    VRR_d, V,   0, 0, 0, 0, vma, 0, IF_VEC)
+/* VECTOR MULTIPLY AND ADD LOGICAL ODD */
+    F(0xe7ad, VMALO,   VRR_d, V,   0, 0, 0, 0, vma, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index aae1ff107a..4967af6a07 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1567,3 +1567,125 @@ static DisasJumpType op_vmx(DisasContext *s, DisasOps *o)
     }
     return DISAS_NEXT;
 }
+
+static void gen_mal_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, TCGv_i32 c)
+{
+    TCGv_i32 t0 = tcg_temp_new_i32();
+
+    tcg_gen_mul_i32(t0, a, b);
+    tcg_gen_add_i32(d, t0, c);
+
+    tcg_temp_free_i32(t0);
+}
+
+static void gen_mah_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, TCGv_i32 c)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    tcg_gen_ext_i32_i64(t0, a);
+    tcg_gen_ext_i32_i64(t1, b);
+    tcg_gen_ext_i32_i64(t2, c);
+    tcg_gen_mul_i64(t0, t0, t1);
+    tcg_gen_add_i64(t0, t0, t2);
+    tcg_gen_extrh_i64_i32(d, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+}
+
+static void gen_malh_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, TCGv_i32 c)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    tcg_gen_extu_i32_i64(t0, a);
+    tcg_gen_extu_i32_i64(t1, b);
+    tcg_gen_extu_i32_i64(t2, c);
+    tcg_gen_mul_i64(t0, t0, t1);
+    tcg_gen_add_i64(t0, t0, t2);
+    tcg_gen_extrh_i64_i32(d, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+}
+
+static DisasJumpType op_vma(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m5);
+    static const GVecGen4 g_vmal[3] = {
+        { .fno = gen_helper_gvec_vmal8, },
+        { .fno = gen_helper_gvec_vmal16, },
+        { .fni4 = gen_mal_i32, },
+    };
+    static const GVecGen4 g_vmah[3] = {
+        { .fno = gen_helper_gvec_vmah8, },
+        { .fno = gen_helper_gvec_vmah16, },
+        { .fni4 = gen_mah_i32, },
+    };
+    static const GVecGen4 g_vmalh[3] = {
+        { .fno = gen_helper_gvec_vmalh8, },
+        { .fno = gen_helper_gvec_vmalh16, },
+        { .fni4 = gen_malh_i32, },
+    };
+    static const GVecGen4 g_vmae[3] = {
+        { .fno = gen_helper_gvec_vmae8, },
+        { .fno = gen_helper_gvec_vmae16, },
+        { .fno = gen_helper_gvec_vmae32, },
+    };
+    static const GVecGen4 g_vmale[3] = {
+        { .fno = gen_helper_gvec_vmale8, },
+        { .fno = gen_helper_gvec_vmale16, },
+        { .fno = gen_helper_gvec_vmale32, },
+    };
+    static const GVecGen4 g_vmao[3] = {
+        { .fno = gen_helper_gvec_vmao8, },
+        { .fno = gen_helper_gvec_vmao16, },
+        { .fno = gen_helper_gvec_vmao32, },
+    };
+    static const GVecGen4 g_vmalo[3] = {
+        { .fno = gen_helper_gvec_vmalo8, },
+        { .fno = gen_helper_gvec_vmalo16, },
+        { .fno = gen_helper_gvec_vmalo32, },
+    };
+    const GVecGen4 *fn;
+
+    if (es > ES_32) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    switch (s->fields->op2) {
+    case 0xaa:
+        fn = &g_vmal[es];
+        break;
+    case 0xab:
+        fn = &g_vmah[es];
+        break;
+    case 0xa9:
+        fn = &g_vmalh[es];
+        break;
+    case 0xae:
+        fn = &g_vmae[es];
+        break;
+    case 0xac:
+        fn = &g_vmale[es];
+        break;
+    case 0xaf:
+        fn = &g_vmao[es];
+        break;
+    case 0xad:
+        fn = &g_vmalo[es];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    gen_gvec_4(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), get_field(s->fields, v4), fn);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 574f707abf..424f248325 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -303,3 +303,126 @@ void HELPER(gvec_vlp##BITS)(void *v1, const void *v2, uint32_t desc)           \
 }
 DEF_VLP(8)
 DEF_VLP(16)
+
+#define DEF_VMAL(BITS)                                                         \
+void HELPER(gvec_vmal##BITS)(void *v1, const void *v2, const void *v3,         \
+                             const void *v4, uint32_t desc)                    \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+        const uint##BITS##_t b = s390_vec_read_element##BITS(v3, i);           \
+        const uint##BITS##_t c = s390_vec_read_element##BITS(v4, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, a * b + c);                        \
+    }                                                                          \
+}
+DEF_VMAL(8)
+DEF_VMAL(16)
+
+#define DEF_VMAH(BITS)                                                         \
+void HELPER(gvec_vmah##BITS)(void *v1, const void *v2, const void *v3,         \
+                             const void *v4, uint32_t desc)                    \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const int32_t a = (int##BITS##_t)s390_vec_read_element##BITS(v2, i);   \
+        const int32_t b = (int##BITS##_t)s390_vec_read_element##BITS(v3, i);   \
+        const int32_t c = (int##BITS##_t)s390_vec_read_element##BITS(v4, i);   \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, (a * b + c) >> BITS);              \
+    }                                                                          \
+}
+DEF_VMAH(8)
+DEF_VMAH(16)
+
+#define DEF_VMALH(BITS)                                                        \
+void HELPER(gvec_vmalh##BITS)(void *v1, const void *v2, const void *v3,        \
+                              const void *v4, uint32_t desc)                   \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+        const uint##BITS##_t b = s390_vec_read_element##BITS(v3, i);           \
+        const uint##BITS##_t c = s390_vec_read_element##BITS(v4, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, (a * b + c) >> BITS);              \
+    }                                                                          \
+}
+DEF_VMALH(8)
+DEF_VMALH(16)
+
+#define DEF_VMAE(BITS, TBITS)                                                  \
+void HELPER(gvec_vmae##BITS)(void *v1, const void *v2, const void *v3,         \
+                             const void *v4, uint32_t desc)                    \
+{                                                                              \
+    int i, j;                                                                  \
+                                                                               \
+    for (i = 0, j = 0; i < (128 / TBITS); i++, j += 2) {                       \
+        int##TBITS##_t a = (int##BITS##_t)s390_vec_read_element##BITS(v2, j);  \
+        int##TBITS##_t b = (int##BITS##_t)s390_vec_read_element##BITS(v3, j);  \
+        int##TBITS##_t c = (int##BITS##_t)s390_vec_read_element##BITS(v4, j);  \
+                                                                               \
+        s390_vec_write_element##TBITS(v1, i, a * b + c);                       \
+    }                                                                          \
+}
+DEF_VMAE(8, 16)
+DEF_VMAE(16, 32)
+DEF_VMAE(32, 64)
+
+#define DEF_VMALE(BITS, TBITS)                                                 \
+void HELPER(gvec_vmale##BITS)(void *v1, const void *v2, const void *v3,        \
+                              const void *v4, uint32_t desc)                   \
+{                                                                              \
+    int i, j;                                                                  \
+                                                                               \
+    for (i = 0, j = 0; i < (128 / TBITS); i++, j += 2) {                       \
+        uint##TBITS##_t a = s390_vec_read_element##BITS(v2, j);                \
+        uint##TBITS##_t b = s390_vec_read_element##BITS(v3, j);                \
+        uint##TBITS##_t c = s390_vec_read_element##BITS(v4, j);                \
+                                                                               \
+        s390_vec_write_element##TBITS(v1, i, a * b + c);                       \
+    }                                                                          \
+}
+DEF_VMALE(8, 16)
+DEF_VMALE(16, 32)
+DEF_VMALE(32, 64)
+
+#define DEF_VMAO(BITS, TBITS)                                                  \
+void HELPER(gvec_vmao##BITS)(void *v1, const void *v2, const void *v3,         \
+                             const void *v4, uint32_t desc)                    \
+{                                                                              \
+    int i, j;                                                                  \
+                                                                               \
+    for (i = 0, j = 1; i < (128 / TBITS); i++, j += 2) {                       \
+        int##TBITS##_t a = (int##BITS##_t)s390_vec_read_element##BITS(v2, j);  \
+        int##TBITS##_t b = (int##BITS##_t)s390_vec_read_element##BITS(v3, j);  \
+        int##TBITS##_t c = (int##BITS##_t)s390_vec_read_element##BITS(v4, j);  \
+                                                                               \
+        s390_vec_write_element##TBITS(v1, i, a * b + c);                       \
+    }                                                                          \
+}
+DEF_VMAO(8, 16)
+DEF_VMAO(16, 32)
+DEF_VMAO(32, 64)
+
+#define DEF_VMALO(BITS, TBITS)                                                 \
+void HELPER(gvec_vmalo##BITS)(void *v1, const void *v2, const void *v3,        \
+                              const void *v4, uint32_t desc)                   \
+{                                                                              \
+    int i, j;                                                                  \
+                                                                               \
+    for (i = 0, j = 1; i < (128 / TBITS); i++, j += 2) {                       \
+        uint##TBITS##_t a = s390_vec_read_element##BITS(v2, j);                \
+        uint##TBITS##_t b = s390_vec_read_element##BITS(v3, j);                \
+        uint##TBITS##_t c = s390_vec_read_element##BITS(v4, j);                \
+                                                                               \
+        s390_vec_write_element##TBITS(v1, i, a * b + c);                       \
+    }                                                                          \
+}
+DEF_VMALO(8, 16)
+DEF_VMALO(16, 32)
+DEF_VMALO(32, 64)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 19/41] s390x/tcg: Implement VECTOR MULTIPLY AND ADD *
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Quite some variants to handle. At least handle some 32-bit element
variants via gvec expansion (we could also handle 16/32-bit variants
for ODD and EVEN easily via gvec expansion, but let's keep it simple
for now).

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  18 +++++
 target/s390x/insn-data.def      |  14 ++++
 target/s390x/translate_vx.inc.c | 122 +++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 123 ++++++++++++++++++++++++++++++++
 4 files changed, 277 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 065c4c6ea3..b73a35107e 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -166,6 +166,24 @@ DEF_HELPER_FLAGS_5(gvec_vgfma32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i
 DEF_HELPER_FLAGS_5(gvec_vgfma64, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vlp8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vlp16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmal8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmal16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmah8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmah16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmalh8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmalh16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmae8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmae16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmae32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmale8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmale16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmale32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmao8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmao16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmao32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmalo8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmalo16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vmalo32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index b22d9f0f6a..7ccec0544f 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1106,6 +1106,20 @@
     F(0xe7fe, VMN,     VRR_c, V,   0, 0, 0, 0, vmx, 0, IF_VEC)
 /* VECTOR MINIMUM LOGICAL */
     F(0xe7fc, VMNL,    VRR_c, V,   0, 0, 0, 0, vmx, 0, IF_VEC)
+/* VECTOR MULTIPLY AND ADD LOW */
+    F(0xe7aa, VMAL,    VRR_d, V,   0, 0, 0, 0, vma, 0, IF_VEC)
+/* VECTOR MULTIPLY AND ADD HIGH */
+    F(0xe7ab, VMAH,    VRR_d, V,   0, 0, 0, 0, vma, 0, IF_VEC)
+/* VECTOR MULTIPLY AND ADD LOGICAL HIGH */
+    F(0xe7a9, VMALH,   VRR_d, V,   0, 0, 0, 0, vma, 0, IF_VEC)
+/* VECTOR MULTIPLY AND ADD EVEN */
+    F(0xe7ae, VMAE,    VRR_d, V,   0, 0, 0, 0, vma, 0, IF_VEC)
+/* VECTOR MULTIPLY AND ADD LOGICAL EVEN */
+    F(0xe7ac, VMALE,   VRR_d, V,   0, 0, 0, 0, vma, 0, IF_VEC)
+/* VECTOR MULTIPLY AND ADD ODD */
+    F(0xe7af, VMAO,    VRR_d, V,   0, 0, 0, 0, vma, 0, IF_VEC)
+/* VECTOR MULTIPLY AND ADD LOGICAL ODD */
+    F(0xe7ad, VMALO,   VRR_d, V,   0, 0, 0, 0, vma, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index aae1ff107a..4967af6a07 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1567,3 +1567,125 @@ static DisasJumpType op_vmx(DisasContext *s, DisasOps *o)
     }
     return DISAS_NEXT;
 }
+
+static void gen_mal_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, TCGv_i32 c)
+{
+    TCGv_i32 t0 = tcg_temp_new_i32();
+
+    tcg_gen_mul_i32(t0, a, b);
+    tcg_gen_add_i32(d, t0, c);
+
+    tcg_temp_free_i32(t0);
+}
+
+static void gen_mah_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, TCGv_i32 c)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    tcg_gen_ext_i32_i64(t0, a);
+    tcg_gen_ext_i32_i64(t1, b);
+    tcg_gen_ext_i32_i64(t2, c);
+    tcg_gen_mul_i64(t0, t0, t1);
+    tcg_gen_add_i64(t0, t0, t2);
+    tcg_gen_extrh_i64_i32(d, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+}
+
+static void gen_malh_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, TCGv_i32 c)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+    TCGv_i64 t2 = tcg_temp_new_i64();
+
+    tcg_gen_extu_i32_i64(t0, a);
+    tcg_gen_extu_i32_i64(t1, b);
+    tcg_gen_extu_i32_i64(t2, c);
+    tcg_gen_mul_i64(t0, t0, t1);
+    tcg_gen_add_i64(t0, t0, t2);
+    tcg_gen_extrh_i64_i32(d, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+}
+
+static DisasJumpType op_vma(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m5);
+    static const GVecGen4 g_vmal[3] = {
+        { .fno = gen_helper_gvec_vmal8, },
+        { .fno = gen_helper_gvec_vmal16, },
+        { .fni4 = gen_mal_i32, },
+    };
+    static const GVecGen4 g_vmah[3] = {
+        { .fno = gen_helper_gvec_vmah8, },
+        { .fno = gen_helper_gvec_vmah16, },
+        { .fni4 = gen_mah_i32, },
+    };
+    static const GVecGen4 g_vmalh[3] = {
+        { .fno = gen_helper_gvec_vmalh8, },
+        { .fno = gen_helper_gvec_vmalh16, },
+        { .fni4 = gen_malh_i32, },
+    };
+    static const GVecGen4 g_vmae[3] = {
+        { .fno = gen_helper_gvec_vmae8, },
+        { .fno = gen_helper_gvec_vmae16, },
+        { .fno = gen_helper_gvec_vmae32, },
+    };
+    static const GVecGen4 g_vmale[3] = {
+        { .fno = gen_helper_gvec_vmale8, },
+        { .fno = gen_helper_gvec_vmale16, },
+        { .fno = gen_helper_gvec_vmale32, },
+    };
+    static const GVecGen4 g_vmao[3] = {
+        { .fno = gen_helper_gvec_vmao8, },
+        { .fno = gen_helper_gvec_vmao16, },
+        { .fno = gen_helper_gvec_vmao32, },
+    };
+    static const GVecGen4 g_vmalo[3] = {
+        { .fno = gen_helper_gvec_vmalo8, },
+        { .fno = gen_helper_gvec_vmalo16, },
+        { .fno = gen_helper_gvec_vmalo32, },
+    };
+    const GVecGen4 *fn;
+
+    if (es > ES_32) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    switch (s->fields->op2) {
+    case 0xaa:
+        fn = &g_vmal[es];
+        break;
+    case 0xab:
+        fn = &g_vmah[es];
+        break;
+    case 0xa9:
+        fn = &g_vmalh[es];
+        break;
+    case 0xae:
+        fn = &g_vmae[es];
+        break;
+    case 0xac:
+        fn = &g_vmale[es];
+        break;
+    case 0xaf:
+        fn = &g_vmao[es];
+        break;
+    case 0xad:
+        fn = &g_vmalo[es];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    gen_gvec_4(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), get_field(s->fields, v4), fn);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 574f707abf..424f248325 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -303,3 +303,126 @@ void HELPER(gvec_vlp##BITS)(void *v1, const void *v2, uint32_t desc)           \
 }
 DEF_VLP(8)
 DEF_VLP(16)
+
+#define DEF_VMAL(BITS)                                                         \
+void HELPER(gvec_vmal##BITS)(void *v1, const void *v2, const void *v3,         \
+                             const void *v4, uint32_t desc)                    \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+        const uint##BITS##_t b = s390_vec_read_element##BITS(v3, i);           \
+        const uint##BITS##_t c = s390_vec_read_element##BITS(v4, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, a * b + c);                        \
+    }                                                                          \
+}
+DEF_VMAL(8)
+DEF_VMAL(16)
+
+#define DEF_VMAH(BITS)                                                         \
+void HELPER(gvec_vmah##BITS)(void *v1, const void *v2, const void *v3,         \
+                             const void *v4, uint32_t desc)                    \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const int32_t a = (int##BITS##_t)s390_vec_read_element##BITS(v2, i);   \
+        const int32_t b = (int##BITS##_t)s390_vec_read_element##BITS(v3, i);   \
+        const int32_t c = (int##BITS##_t)s390_vec_read_element##BITS(v4, i);   \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, (a * b + c) >> BITS);              \
+    }                                                                          \
+}
+DEF_VMAH(8)
+DEF_VMAH(16)
+
+#define DEF_VMALH(BITS)                                                        \
+void HELPER(gvec_vmalh##BITS)(void *v1, const void *v2, const void *v3,        \
+                              const void *v4, uint32_t desc)                   \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+        const uint##BITS##_t b = s390_vec_read_element##BITS(v3, i);           \
+        const uint##BITS##_t c = s390_vec_read_element##BITS(v4, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, (a * b + c) >> BITS);              \
+    }                                                                          \
+}
+DEF_VMALH(8)
+DEF_VMALH(16)
+
+#define DEF_VMAE(BITS, TBITS)                                                  \
+void HELPER(gvec_vmae##BITS)(void *v1, const void *v2, const void *v3,         \
+                             const void *v4, uint32_t desc)                    \
+{                                                                              \
+    int i, j;                                                                  \
+                                                                               \
+    for (i = 0, j = 0; i < (128 / TBITS); i++, j += 2) {                       \
+        int##TBITS##_t a = (int##BITS##_t)s390_vec_read_element##BITS(v2, j);  \
+        int##TBITS##_t b = (int##BITS##_t)s390_vec_read_element##BITS(v3, j);  \
+        int##TBITS##_t c = (int##BITS##_t)s390_vec_read_element##BITS(v4, j);  \
+                                                                               \
+        s390_vec_write_element##TBITS(v1, i, a * b + c);                       \
+    }                                                                          \
+}
+DEF_VMAE(8, 16)
+DEF_VMAE(16, 32)
+DEF_VMAE(32, 64)
+
+#define DEF_VMALE(BITS, TBITS)                                                 \
+void HELPER(gvec_vmale##BITS)(void *v1, const void *v2, const void *v3,        \
+                              const void *v4, uint32_t desc)                   \
+{                                                                              \
+    int i, j;                                                                  \
+                                                                               \
+    for (i = 0, j = 0; i < (128 / TBITS); i++, j += 2) {                       \
+        uint##TBITS##_t a = s390_vec_read_element##BITS(v2, j);                \
+        uint##TBITS##_t b = s390_vec_read_element##BITS(v3, j);                \
+        uint##TBITS##_t c = s390_vec_read_element##BITS(v4, j);                \
+                                                                               \
+        s390_vec_write_element##TBITS(v1, i, a * b + c);                       \
+    }                                                                          \
+}
+DEF_VMALE(8, 16)
+DEF_VMALE(16, 32)
+DEF_VMALE(32, 64)
+
+#define DEF_VMAO(BITS, TBITS)                                                  \
+void HELPER(gvec_vmao##BITS)(void *v1, const void *v2, const void *v3,         \
+                             const void *v4, uint32_t desc)                    \
+{                                                                              \
+    int i, j;                                                                  \
+                                                                               \
+    for (i = 0, j = 1; i < (128 / TBITS); i++, j += 2) {                       \
+        int##TBITS##_t a = (int##BITS##_t)s390_vec_read_element##BITS(v2, j);  \
+        int##TBITS##_t b = (int##BITS##_t)s390_vec_read_element##BITS(v3, j);  \
+        int##TBITS##_t c = (int##BITS##_t)s390_vec_read_element##BITS(v4, j);  \
+                                                                               \
+        s390_vec_write_element##TBITS(v1, i, a * b + c);                       \
+    }                                                                          \
+}
+DEF_VMAO(8, 16)
+DEF_VMAO(16, 32)
+DEF_VMAO(32, 64)
+
+#define DEF_VMALO(BITS, TBITS)                                                 \
+void HELPER(gvec_vmalo##BITS)(void *v1, const void *v2, const void *v3,        \
+                              const void *v4, uint32_t desc)                   \
+{                                                                              \
+    int i, j;                                                                  \
+                                                                               \
+    for (i = 0, j = 1; i < (128 / TBITS); i++, j += 2) {                       \
+        uint##TBITS##_t a = s390_vec_read_element##BITS(v2, j);                \
+        uint##TBITS##_t b = s390_vec_read_element##BITS(v3, j);                \
+        uint##TBITS##_t c = s390_vec_read_element##BITS(v4, j);                \
+                                                                               \
+        s390_vec_write_element##TBITS(v1, i, a * b + c);                       \
+    }                                                                          \
+}
+DEF_VMALO(8, 16)
+DEF_VMALO(16, 32)
+DEF_VMALO(32, 64)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 20/41] s390x/tcg: Implement VECTOR MULTIPLY *
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Yet another set of variants. Implement it similar to VECTOR MULTIPLY AND
ADD *. At least for one variant we have a gvec helper we can reuse.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  16 +++++
 target/s390x/insn-data.def      |  14 +++++
 target/s390x/translate_vx.inc.c | 100 ++++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 100 ++++++++++++++++++++++++++++++++
 4 files changed, 230 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index b73a35107e..a44cc462ae 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -184,6 +184,22 @@ DEF_HELPER_FLAGS_5(gvec_vmao32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i3
 DEF_HELPER_FLAGS_5(gvec_vmalo8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_5(gvec_vmalo16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_5(gvec_vmalo32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmh8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmh16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmlh8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmlh16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vme8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vme16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vme32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmle8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmle16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmle32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmo8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmo16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmo32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmlo8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmlo16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmlo32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 7ccec0544f..2c794a2744 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1120,6 +1120,20 @@
     F(0xe7af, VMAO,    VRR_d, V,   0, 0, 0, 0, vma, 0, IF_VEC)
 /* VECTOR MULTIPLY AND ADD LOGICAL ODD */
     F(0xe7ad, VMALO,   VRR_d, V,   0, 0, 0, 0, vma, 0, IF_VEC)
+/* VECTOR MULTIPLY HIGH */
+    F(0xe7a3, VMH,     VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
+/* VECTOR MULTIPLY LOGICAL HIGH */
+    F(0xe7a1, VMLH,    VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
+/* VECTOR MULTIPLY LOW */
+    F(0xe7a2, VML,     VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
+/* VECTOR MULTIPLY EVEN */
+    F(0xe7a6, VME,     VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
+/* VECTOR MULTIPLY LOGICAL EVEN */
+    F(0xe7a4, VMLE,    VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
+/* VECTOR MULTIPLY ODD */
+    F(0xe7a7, VMO,     VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
+/* VECTOR MULTIPLY LOGICAL ODD */
+    F(0xe7a5, VMLO,    VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 4967af6a07..53bbb4a2ce 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1689,3 +1689,103 @@ static DisasJumpType op_vma(DisasContext *s, DisasOps *o)
                get_field(s->fields, v3), get_field(s->fields, v4), fn);
     return DISAS_NEXT;
 }
+
+static void gen_mh_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    tcg_gen_ext_i32_i64(t0, a);
+    tcg_gen_ext_i32_i64(t1, b);
+    tcg_gen_mul_i64(t0, t0, t1);
+    tcg_gen_extrh_i64_i32(d, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+}
+
+static void gen_mlh_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    tcg_gen_extu_i32_i64(t0, a);
+    tcg_gen_extu_i32_i64(t1, b);
+    tcg_gen_mul_i64(t0, t0, t1);
+    tcg_gen_extrh_i64_i32(d, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+}
+
+static DisasJumpType op_vm(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen3 g_vmh[3] = {
+        { .fno = gen_helper_gvec_vmh8, },
+        { .fno = gen_helper_gvec_vmh16, },
+        { .fni4 = gen_mh_i32, },
+    };
+    static const GVecGen3 g_vmlh[3] = {
+        { .fno = gen_helper_gvec_vmlh8, },
+        { .fno = gen_helper_gvec_vmlh16, },
+        { .fni4 = gen_mlh_i32, },
+    };
+    static const GVecGen3 g_vme[3] = {
+        { .fno = gen_helper_gvec_vme8, },
+        { .fno = gen_helper_gvec_vme16, },
+        { .fno = gen_helper_gvec_vme32, },
+    };
+    static const GVecGen3 g_vmle[3] = {
+        { .fno = gen_helper_gvec_vmle8, },
+        { .fno = gen_helper_gvec_vmle16, },
+        { .fno = gen_helper_gvec_vmle32, },
+    };
+    static const GVecGen3 g_vmo[3] = {
+        { .fno = gen_helper_gvec_vmo8, },
+        { .fno = gen_helper_gvec_vmo16, },
+        { .fno = gen_helper_gvec_vmo32, },
+    };
+    static const GVecGen3 g_vmlo[3] = {
+        { .fno = gen_helper_gvec_vmlo8, },
+        { .fno = gen_helper_gvec_vmlo16, },
+        { .fno = gen_helper_gvec_vmlo32, },
+    };
+    const GVecGen3 *fn;
+
+    if (es > ES_32) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    switch (s->fields->op2) {
+    case 0xa2:
+        gen_gvec_fn_3(mul, es, get_field(s->fields, v1),
+                      get_field(s->fields, v2), get_field(s->fields, v3));
+        return DISAS_NEXT;
+    case 0xa3:
+        fn = &g_vmh[es];
+        break;
+    case 0xa1:
+        fn = &g_vmlh[es];
+        break;
+    case 0xa6:
+        fn = &g_vme[es];
+        break;
+    case 0xa4:
+        fn = &g_vmle[es];
+        break;
+    case 0xa7:
+        fn = &g_vmo[es];
+        break;
+    case 0xa5:
+        fn = &g_vmlo[es];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    gen_gvec_3(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), fn);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 424f248325..b818c513a9 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -426,3 +426,103 @@ void HELPER(gvec_vmalo##BITS)(void *v1, const void *v2, const void *v3,        \
 DEF_VMALO(8, 16)
 DEF_VMALO(16, 32)
 DEF_VMALO(32, 64)
+
+#define DEF_VMH(BITS)                                                          \
+void HELPER(gvec_vmh##BITS)(void *v1, const void *v2, const void *v3,          \
+                            uint32_t desc)                                     \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const int32_t a = (int##BITS##_t)s390_vec_read_element##BITS(v2, i);   \
+        const int32_t b = (int##BITS##_t)s390_vec_read_element##BITS(v3, i);   \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, (a * b) >> BITS);                  \
+    }                                                                          \
+}
+DEF_VMH(8)
+DEF_VMH(16)
+
+#define DEF_VMLH(BITS)                                                         \
+void HELPER(gvec_vmlh##BITS)(void *v1, const void *v2, const void *v3,         \
+                             uint32_t desc)                                    \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+        const uint##BITS##_t b = s390_vec_read_element##BITS(v3, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, (a * b) >> BITS);                  \
+    }                                                                          \
+}
+DEF_VMLH(8)
+DEF_VMLH(16)
+
+#define DEF_VME(BITS, TBITS)                                                   \
+void HELPER(gvec_vme##BITS)(void *v1, const void *v2, const void *v3,          \
+                            uint32_t desc)                                     \
+{                                                                              \
+    int i, j;                                                                  \
+                                                                               \
+    for (i = 0, j = 0; i < (128 / TBITS); i++, j += 2) {                       \
+        int##TBITS##_t a = (int##BITS##_t)s390_vec_read_element##BITS(v2, j);  \
+        int##TBITS##_t b = (int##BITS##_t)s390_vec_read_element##BITS(v3, j);  \
+                                                                               \
+        s390_vec_write_element##TBITS(v1, i, a * b);                           \
+    }                                                                          \
+}
+DEF_VME(8, 16)
+DEF_VME(16, 32)
+DEF_VME(32, 64)
+
+#define DEF_VMLE(BITS, TBITS)                                                  \
+void HELPER(gvec_vmle##BITS)(void *v1, const void *v2, const void *v3,         \
+                             uint32_t desc)                                    \
+{                                                                              \
+    int i, j;                                                                  \
+                                                                               \
+    for (i = 0, j = 0; i < (128 / TBITS); i++, j += 2) {                       \
+        const uint##TBITS##_t a = s390_vec_read_element##BITS(v2, j);          \
+        const uint##TBITS##_t b = s390_vec_read_element##BITS(v3, j);          \
+                                                                               \
+        s390_vec_write_element##TBITS(v1, i, a * b);                           \
+    }                                                                          \
+}
+DEF_VMLE(8, 16)
+DEF_VMLE(16, 32)
+DEF_VMLE(32, 64)
+
+#define DEF_VMO(BITS, TBITS)                                                   \
+void HELPER(gvec_vmo##BITS)(void *v1, const void *v2, const void *v3,          \
+                            uint32_t desc)                                     \
+{                                                                              \
+    int i, j;                                                                  \
+                                                                               \
+    for (i = 0, j = 1; i < (128 / TBITS); i++, j += 2) {                       \
+        int##TBITS##_t a = (int##BITS##_t)s390_vec_read_element##BITS(v2, j);  \
+        int##TBITS##_t b = (int##BITS##_t)s390_vec_read_element##BITS(v3, j);  \
+                                                                               \
+        s390_vec_write_element##TBITS(v1, i, a * b);                           \
+    }                                                                          \
+}
+DEF_VMO(8, 16)
+DEF_VMO(16, 32)
+DEF_VMO(32, 64)
+
+#define DEF_VMLO(BITS, TBITS)                                                  \
+void HELPER(gvec_vmlo##BITS)(void *v1, const void *v2, const void *v3,         \
+                             uint32_t desc)                                    \
+{                                                                              \
+    int i, j;                                                                  \
+                                                                               \
+    for (i = 0, j = 0; i < (128 / TBITS); i++, j += 2) {                       \
+        const uint##TBITS##_t a = s390_vec_read_element##BITS(v2, j);          \
+        const uint##TBITS##_t b = s390_vec_read_element##BITS(v3, j);          \
+                                                                               \
+        s390_vec_write_element##TBITS(v1, i, a * b);                           \
+    }                                                                          \
+}
+DEF_VMLO(8, 16)
+DEF_VMLO(16, 32)
+DEF_VMLO(32, 64)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 20/41] s390x/tcg: Implement VECTOR MULTIPLY *
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Yet another set of variants. Implement it similar to VECTOR MULTIPLY AND
ADD *. At least for one variant we have a gvec helper we can reuse.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  16 +++++
 target/s390x/insn-data.def      |  14 +++++
 target/s390x/translate_vx.inc.c | 100 ++++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 100 ++++++++++++++++++++++++++++++++
 4 files changed, 230 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index b73a35107e..a44cc462ae 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -184,6 +184,22 @@ DEF_HELPER_FLAGS_5(gvec_vmao32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i3
 DEF_HELPER_FLAGS_5(gvec_vmalo8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_5(gvec_vmalo16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_5(gvec_vmalo32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmh8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmh16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmlh8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmlh16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vme8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vme16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vme32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmle8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmle16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmle32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmo8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmo16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmo32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmlo8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmlo16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vmlo32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 7ccec0544f..2c794a2744 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1120,6 +1120,20 @@
     F(0xe7af, VMAO,    VRR_d, V,   0, 0, 0, 0, vma, 0, IF_VEC)
 /* VECTOR MULTIPLY AND ADD LOGICAL ODD */
     F(0xe7ad, VMALO,   VRR_d, V,   0, 0, 0, 0, vma, 0, IF_VEC)
+/* VECTOR MULTIPLY HIGH */
+    F(0xe7a3, VMH,     VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
+/* VECTOR MULTIPLY LOGICAL HIGH */
+    F(0xe7a1, VMLH,    VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
+/* VECTOR MULTIPLY LOW */
+    F(0xe7a2, VML,     VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
+/* VECTOR MULTIPLY EVEN */
+    F(0xe7a6, VME,     VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
+/* VECTOR MULTIPLY LOGICAL EVEN */
+    F(0xe7a4, VMLE,    VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
+/* VECTOR MULTIPLY ODD */
+    F(0xe7a7, VMO,     VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
+/* VECTOR MULTIPLY LOGICAL ODD */
+    F(0xe7a5, VMLO,    VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 4967af6a07..53bbb4a2ce 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1689,3 +1689,103 @@ static DisasJumpType op_vma(DisasContext *s, DisasOps *o)
                get_field(s->fields, v3), get_field(s->fields, v4), fn);
     return DISAS_NEXT;
 }
+
+static void gen_mh_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    tcg_gen_ext_i32_i64(t0, a);
+    tcg_gen_ext_i32_i64(t1, b);
+    tcg_gen_mul_i64(t0, t0, t1);
+    tcg_gen_extrh_i64_i32(d, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+}
+
+static void gen_mlh_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    tcg_gen_extu_i32_i64(t0, a);
+    tcg_gen_extu_i32_i64(t1, b);
+    tcg_gen_mul_i64(t0, t0, t1);
+    tcg_gen_extrh_i64_i32(d, t0);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+}
+
+static DisasJumpType op_vm(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen3 g_vmh[3] = {
+        { .fno = gen_helper_gvec_vmh8, },
+        { .fno = gen_helper_gvec_vmh16, },
+        { .fni4 = gen_mh_i32, },
+    };
+    static const GVecGen3 g_vmlh[3] = {
+        { .fno = gen_helper_gvec_vmlh8, },
+        { .fno = gen_helper_gvec_vmlh16, },
+        { .fni4 = gen_mlh_i32, },
+    };
+    static const GVecGen3 g_vme[3] = {
+        { .fno = gen_helper_gvec_vme8, },
+        { .fno = gen_helper_gvec_vme16, },
+        { .fno = gen_helper_gvec_vme32, },
+    };
+    static const GVecGen3 g_vmle[3] = {
+        { .fno = gen_helper_gvec_vmle8, },
+        { .fno = gen_helper_gvec_vmle16, },
+        { .fno = gen_helper_gvec_vmle32, },
+    };
+    static const GVecGen3 g_vmo[3] = {
+        { .fno = gen_helper_gvec_vmo8, },
+        { .fno = gen_helper_gvec_vmo16, },
+        { .fno = gen_helper_gvec_vmo32, },
+    };
+    static const GVecGen3 g_vmlo[3] = {
+        { .fno = gen_helper_gvec_vmlo8, },
+        { .fno = gen_helper_gvec_vmlo16, },
+        { .fno = gen_helper_gvec_vmlo32, },
+    };
+    const GVecGen3 *fn;
+
+    if (es > ES_32) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    switch (s->fields->op2) {
+    case 0xa2:
+        gen_gvec_fn_3(mul, es, get_field(s->fields, v1),
+                      get_field(s->fields, v2), get_field(s->fields, v3));
+        return DISAS_NEXT;
+    case 0xa3:
+        fn = &g_vmh[es];
+        break;
+    case 0xa1:
+        fn = &g_vmlh[es];
+        break;
+    case 0xa6:
+        fn = &g_vme[es];
+        break;
+    case 0xa4:
+        fn = &g_vmle[es];
+        break;
+    case 0xa7:
+        fn = &g_vmo[es];
+        break;
+    case 0xa5:
+        fn = &g_vmlo[es];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    gen_gvec_3(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), fn);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 424f248325..b818c513a9 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -426,3 +426,103 @@ void HELPER(gvec_vmalo##BITS)(void *v1, const void *v2, const void *v3,        \
 DEF_VMALO(8, 16)
 DEF_VMALO(16, 32)
 DEF_VMALO(32, 64)
+
+#define DEF_VMH(BITS)                                                          \
+void HELPER(gvec_vmh##BITS)(void *v1, const void *v2, const void *v3,          \
+                            uint32_t desc)                                     \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const int32_t a = (int##BITS##_t)s390_vec_read_element##BITS(v2, i);   \
+        const int32_t b = (int##BITS##_t)s390_vec_read_element##BITS(v3, i);   \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, (a * b) >> BITS);                  \
+    }                                                                          \
+}
+DEF_VMH(8)
+DEF_VMH(16)
+
+#define DEF_VMLH(BITS)                                                         \
+void HELPER(gvec_vmlh##BITS)(void *v1, const void *v2, const void *v3,         \
+                             uint32_t desc)                                    \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+        const uint##BITS##_t b = s390_vec_read_element##BITS(v3, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, (a * b) >> BITS);                  \
+    }                                                                          \
+}
+DEF_VMLH(8)
+DEF_VMLH(16)
+
+#define DEF_VME(BITS, TBITS)                                                   \
+void HELPER(gvec_vme##BITS)(void *v1, const void *v2, const void *v3,          \
+                            uint32_t desc)                                     \
+{                                                                              \
+    int i, j;                                                                  \
+                                                                               \
+    for (i = 0, j = 0; i < (128 / TBITS); i++, j += 2) {                       \
+        int##TBITS##_t a = (int##BITS##_t)s390_vec_read_element##BITS(v2, j);  \
+        int##TBITS##_t b = (int##BITS##_t)s390_vec_read_element##BITS(v3, j);  \
+                                                                               \
+        s390_vec_write_element##TBITS(v1, i, a * b);                           \
+    }                                                                          \
+}
+DEF_VME(8, 16)
+DEF_VME(16, 32)
+DEF_VME(32, 64)
+
+#define DEF_VMLE(BITS, TBITS)                                                  \
+void HELPER(gvec_vmle##BITS)(void *v1, const void *v2, const void *v3,         \
+                             uint32_t desc)                                    \
+{                                                                              \
+    int i, j;                                                                  \
+                                                                               \
+    for (i = 0, j = 0; i < (128 / TBITS); i++, j += 2) {                       \
+        const uint##TBITS##_t a = s390_vec_read_element##BITS(v2, j);          \
+        const uint##TBITS##_t b = s390_vec_read_element##BITS(v3, j);          \
+                                                                               \
+        s390_vec_write_element##TBITS(v1, i, a * b);                           \
+    }                                                                          \
+}
+DEF_VMLE(8, 16)
+DEF_VMLE(16, 32)
+DEF_VMLE(32, 64)
+
+#define DEF_VMO(BITS, TBITS)                                                   \
+void HELPER(gvec_vmo##BITS)(void *v1, const void *v2, const void *v3,          \
+                            uint32_t desc)                                     \
+{                                                                              \
+    int i, j;                                                                  \
+                                                                               \
+    for (i = 0, j = 1; i < (128 / TBITS); i++, j += 2) {                       \
+        int##TBITS##_t a = (int##BITS##_t)s390_vec_read_element##BITS(v2, j);  \
+        int##TBITS##_t b = (int##BITS##_t)s390_vec_read_element##BITS(v3, j);  \
+                                                                               \
+        s390_vec_write_element##TBITS(v1, i, a * b);                           \
+    }                                                                          \
+}
+DEF_VMO(8, 16)
+DEF_VMO(16, 32)
+DEF_VMO(32, 64)
+
+#define DEF_VMLO(BITS, TBITS)                                                  \
+void HELPER(gvec_vmlo##BITS)(void *v1, const void *v2, const void *v3,         \
+                             uint32_t desc)                                    \
+{                                                                              \
+    int i, j;                                                                  \
+                                                                               \
+    for (i = 0, j = 0; i < (128 / TBITS); i++, j += 2) {                       \
+        const uint##TBITS##_t a = s390_vec_read_element##BITS(v2, j);          \
+        const uint##TBITS##_t b = s390_vec_read_element##BITS(v3, j);          \
+                                                                               \
+        s390_vec_write_element##TBITS(v1, i, a * b);                           \
+    }                                                                          \
+}
+DEF_VMLO(8, 16)
+DEF_VMLO(16, 32)
+DEF_VMLO(32, 64)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 21/41] s390x/tcg: Implement VECTOR NAND
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Part of vector enhancements facility 1, but easy to implement.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      | 2 ++
 target/s390x/translate.c        | 1 +
 target/s390x/translate_vx.inc.c | 7 +++++++
 3 files changed, 10 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 2c794a2744..bc8b84e1c2 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1134,6 +1134,8 @@
     F(0xe7a7, VMO,     VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
 /* VECTOR MULTIPLY LOGICAL ODD */
     F(0xe7a5, VMLO,    VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
+/* VECTOR NAND */
+    F(0xe76e, VNN,     VRR_c, VE,  0, 0, 0, 0, vnn, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index a800aa9dc9..c6378b2b53 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -6099,6 +6099,7 @@ enum DisasInsnEnum {
 #define FAC_PCI         S390_FEAT_ZPCI /* z/PCI facility */
 #define FAC_AIS         S390_FEAT_ADAPTER_INT_SUPPRESSION
 #define FAC_V           S390_FEAT_VECTOR /* vector facility */
+#define FAC_VE          S390_FEAT_VECTOR_ENH /* vector enhancements facility 1 */
 
 static const DisasInsn insn_info[] = {
 #include "insn-data.def"
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 53bbb4a2ce..aa01d6274c 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1789,3 +1789,10 @@ static DisasJumpType op_vm(DisasContext *s, DisasOps *o)
                get_field(s->fields, v3), fn);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vnn(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_fn_3(nand, ES_8, get_field(s->fields, v1),
+                  get_field(s->fields, v2), get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 21/41] s390x/tcg: Implement VECTOR NAND
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Part of vector enhancements facility 1, but easy to implement.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      | 2 ++
 target/s390x/translate.c        | 1 +
 target/s390x/translate_vx.inc.c | 7 +++++++
 3 files changed, 10 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 2c794a2744..bc8b84e1c2 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1134,6 +1134,8 @@
     F(0xe7a7, VMO,     VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
 /* VECTOR MULTIPLY LOGICAL ODD */
     F(0xe7a5, VMLO,    VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
+/* VECTOR NAND */
+    F(0xe76e, VNN,     VRR_c, VE,  0, 0, 0, 0, vnn, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index a800aa9dc9..c6378b2b53 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -6099,6 +6099,7 @@ enum DisasInsnEnum {
 #define FAC_PCI         S390_FEAT_ZPCI /* z/PCI facility */
 #define FAC_AIS         S390_FEAT_ADAPTER_INT_SUPPRESSION
 #define FAC_V           S390_FEAT_VECTOR /* vector facility */
+#define FAC_VE          S390_FEAT_VECTOR_ENH /* vector enhancements facility 1 */
 
 static const DisasInsn insn_info[] = {
 #include "insn-data.def"
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 53bbb4a2ce..aa01d6274c 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1789,3 +1789,10 @@ static DisasJumpType op_vm(DisasContext *s, DisasOps *o)
                get_field(s->fields, v3), fn);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vnn(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_fn_3(nand, ES_8, get_field(s->fields, v1),
+                  get_field(s->fields, v2), get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 22/41] s390x/tcg: Implement VECTOR NOR
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      | 2 ++
 target/s390x/translate_vx.inc.c | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index bc8b84e1c2..4983867a44 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1136,6 +1136,8 @@
     F(0xe7a5, VMLO,    VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
 /* VECTOR NAND */
     F(0xe76e, VNN,     VRR_c, VE,  0, 0, 0, 0, vnn, 0, IF_VEC)
+/* VECTOR NOR */
+    F(0xe76b, VNO,     VRR_c, V,   0, 0, 0, 0, vno, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index aa01d6274c..b78f1bb604 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1796,3 +1796,10 @@ static DisasJumpType op_vnn(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v2), get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vno(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_fn_3(nor, ES_8, get_field(s->fields, v1), get_field(s->fields, v2),
+                  get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 22/41] s390x/tcg: Implement VECTOR NOR
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      | 2 ++
 target/s390x/translate_vx.inc.c | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index bc8b84e1c2..4983867a44 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1136,6 +1136,8 @@
     F(0xe7a5, VMLO,    VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
 /* VECTOR NAND */
     F(0xe76e, VNN,     VRR_c, VE,  0, 0, 0, 0, vnn, 0, IF_VEC)
+/* VECTOR NOR */
+    F(0xe76b, VNO,     VRR_c, V,   0, 0, 0, 0, vno, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index aa01d6274c..b78f1bb604 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1796,3 +1796,10 @@ static DisasJumpType op_vnn(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v2), get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vno(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_fn_3(nor, ES_8, get_field(s->fields, v1), get_field(s->fields, v2),
+                  get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 23/41] s390x/tcg: Implement VECTOR NOT EXCLUSIVE OR
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Again, part of vector enhancement facility 1. The operation corresponds
to an bitwise equality check.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      | 2 ++
 target/s390x/translate_vx.inc.c | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 4983867a44..b549b76b96 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1138,6 +1138,8 @@
     F(0xe76e, VNN,     VRR_c, VE,  0, 0, 0, 0, vnn, 0, IF_VEC)
 /* VECTOR NOR */
     F(0xe76b, VNO,     VRR_c, V,   0, 0, 0, 0, vno, 0, IF_VEC)
+/* VECTOR NOT EXCLUSIVE OR */
+    F(0xe76c, VNX,     VRR_c, VE,  0, 0, 0, 0, vnx, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index b78f1bb604..df6cf514b2 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1803,3 +1803,10 @@ static DisasJumpType op_vno(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vnx(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_fn_3(eqv, ES_8, get_field(s->fields, v1), get_field(s->fields, v2),
+                  get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 23/41] s390x/tcg: Implement VECTOR NOT EXCLUSIVE OR
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Again, part of vector enhancement facility 1. The operation corresponds
to an bitwise equality check.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      | 2 ++
 target/s390x/translate_vx.inc.c | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 4983867a44..b549b76b96 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1138,6 +1138,8 @@
     F(0xe76e, VNN,     VRR_c, VE,  0, 0, 0, 0, vnn, 0, IF_VEC)
 /* VECTOR NOR */
     F(0xe76b, VNO,     VRR_c, V,   0, 0, 0, 0, vno, 0, IF_VEC)
+/* VECTOR NOT EXCLUSIVE OR */
+    F(0xe76c, VNX,     VRR_c, VE,  0, 0, 0, 0, vnx, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index b78f1bb604..df6cf514b2 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1803,3 +1803,10 @@ static DisasJumpType op_vno(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vnx(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_fn_3(eqv, ES_8, get_field(s->fields, v1), get_field(s->fields, v2),
+                  get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 24/41] s390x/tcg: Implement VECTOR OR
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Reuse a gvec helper.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      | 2 ++
 target/s390x/translate_vx.inc.c | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index b549b76b96..fb74374a0a 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1140,6 +1140,8 @@
     F(0xe76b, VNO,     VRR_c, V,   0, 0, 0, 0, vno, 0, IF_VEC)
 /* VECTOR NOT EXCLUSIVE OR */
     F(0xe76c, VNX,     VRR_c, VE,  0, 0, 0, 0, vnx, 0, IF_VEC)
+/* VECTOR OR */
+    F(0xe76a, VO,      VRR_c, V,   0, 0, 0, 0, vo, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index df6cf514b2..b0b54a49a3 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1810,3 +1810,10 @@ static DisasJumpType op_vnx(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vo(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_fn_3(or, ES_8, get_field(s->fields, v1), get_field(s->fields, v2),
+                  get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 24/41] s390x/tcg: Implement VECTOR OR
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Reuse a gvec helper.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      | 2 ++
 target/s390x/translate_vx.inc.c | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index b549b76b96..fb74374a0a 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1140,6 +1140,8 @@
     F(0xe76b, VNO,     VRR_c, V,   0, 0, 0, 0, vno, 0, IF_VEC)
 /* VECTOR NOT EXCLUSIVE OR */
     F(0xe76c, VNX,     VRR_c, VE,  0, 0, 0, 0, vnx, 0, IF_VEC)
+/* VECTOR OR */
+    F(0xe76a, VO,      VRR_c, V,   0, 0, 0, 0, vo, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index df6cf514b2..b0b54a49a3 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1810,3 +1810,10 @@ static DisasJumpType op_vnx(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vo(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_fn_3(or, ES_8, get_field(s->fields, v1), get_field(s->fields, v2),
+                  get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 25/41] s390x/tcg: Implement VECTOR OR WITH COMPLEMENT
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Again, vector enhancements facility 1 material.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      | 2 ++
 target/s390x/translate_vx.inc.c | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index fb74374a0a..52171252be 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1142,6 +1142,8 @@
     F(0xe76c, VNX,     VRR_c, VE,  0, 0, 0, 0, vnx, 0, IF_VEC)
 /* VECTOR OR */
     F(0xe76a, VO,      VRR_c, V,   0, 0, 0, 0, vo, 0, IF_VEC)
+/* VECTOR OR WITH COMPLEMENT */
+    F(0xe76f, VOC,     VRR_c, VE,  0, 0, 0, 0, voc, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index b0b54a49a3..08ebc7fc4c 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1817,3 +1817,10 @@ static DisasJumpType op_vo(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_voc(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_fn_3(orc, ES_8, get_field(s->fields, v1), get_field(s->fields, v2),
+                  get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 25/41] s390x/tcg: Implement VECTOR OR WITH COMPLEMENT
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Again, vector enhancements facility 1 material.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      | 2 ++
 target/s390x/translate_vx.inc.c | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index fb74374a0a..52171252be 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1142,6 +1142,8 @@
     F(0xe76c, VNX,     VRR_c, VE,  0, 0, 0, 0, vnx, 0, IF_VEC)
 /* VECTOR OR */
     F(0xe76a, VO,      VRR_c, V,   0, 0, 0, 0, vo, 0, IF_VEC)
+/* VECTOR OR WITH COMPLEMENT */
+    F(0xe76f, VOC,     VRR_c, VE,  0, 0, 0, 0, voc, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index b0b54a49a3..08ebc7fc4c 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1817,3 +1817,10 @@ static DisasJumpType op_vo(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_voc(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_fn_3(orc, ES_8, get_field(s->fields, v1), get_field(s->fields, v2),
+                  get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 26/41] s390x/tcg: Implement VECTOR POPULATION COUNT
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Similar to VECTOR COUNT TRAILING ZEROES.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  2 ++
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 19 +++++++++++++++++++
 target/s390x/vec_int_helper.c   | 14 ++++++++++++++
 4 files changed, 37 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index a44cc462ae..a306378950 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -200,6 +200,8 @@ DEF_HELPER_FLAGS_4(gvec_vmo32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vmlo8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vmlo16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vmlo32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_3(gvec_vpopct8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
+DEF_HELPER_FLAGS_3(gvec_vpopct16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 52171252be..0f786d6ab1 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1144,6 +1144,8 @@
     F(0xe76a, VO,      VRR_c, V,   0, 0, 0, 0, vo, 0, IF_VEC)
 /* VECTOR OR WITH COMPLEMENT */
     F(0xe76f, VOC,     VRR_c, VE,  0, 0, 0, 0, voc, 0, IF_VEC)
+/* VECTOR POPULATION COUNT */
+    F(0xe750, VPOPCT,  VRR_a, V,   0, 0, 0, 0, vpopct, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 08ebc7fc4c..df17b8242d 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1824,3 +1824,22 @@ static DisasJumpType op_voc(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vpopct(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m3);
+    static const GVecGen2 g[4] = {
+        { .fno = gen_helper_gvec_vpopct8, },
+        { .fno = gen_helper_gvec_vpopct16, },
+        { .fni4 = tcg_gen_ctpop_i32, },
+        { .fni8 = tcg_gen_ctpop_i64, },
+    };
+
+    if (es > ES_64 || (es != ES_8 && !s390_has_feat(S390_FEAT_VECTOR_ENH))) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_gvec_2(get_field(s->fields, v1), get_field(s->fields, v2), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index b818c513a9..f49d5c2ffb 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -526,3 +526,17 @@ void HELPER(gvec_vmlo##BITS)(void *v1, const void *v2, const void *v3,         \
 DEF_VMLO(8, 16)
 DEF_VMLO(16, 32)
 DEF_VMLO(32, 64)
+
+#define DEF_VPOPCT(BITS)                                                       \
+void HELPER(gvec_vpopct##BITS)(void *v1, const void *v2, uint32_t desc)        \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, ctpop32(a));                       \
+    }                                                                          \
+}
+DEF_VPOPCT(8)
+DEF_VPOPCT(16)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 26/41] s390x/tcg: Implement VECTOR POPULATION COUNT
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Similar to VECTOR COUNT TRAILING ZEROES.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  2 ++
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 19 +++++++++++++++++++
 target/s390x/vec_int_helper.c   | 14 ++++++++++++++
 4 files changed, 37 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index a44cc462ae..a306378950 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -200,6 +200,8 @@ DEF_HELPER_FLAGS_4(gvec_vmo32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vmlo8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vmlo16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vmlo32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_3(gvec_vpopct8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
+DEF_HELPER_FLAGS_3(gvec_vpopct16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 52171252be..0f786d6ab1 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1144,6 +1144,8 @@
     F(0xe76a, VO,      VRR_c, V,   0, 0, 0, 0, vo, 0, IF_VEC)
 /* VECTOR OR WITH COMPLEMENT */
     F(0xe76f, VOC,     VRR_c, VE,  0, 0, 0, 0, voc, 0, IF_VEC)
+/* VECTOR POPULATION COUNT */
+    F(0xe750, VPOPCT,  VRR_a, V,   0, 0, 0, 0, vpopct, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 08ebc7fc4c..df17b8242d 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1824,3 +1824,22 @@ static DisasJumpType op_voc(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vpopct(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m3);
+    static const GVecGen2 g[4] = {
+        { .fno = gen_helper_gvec_vpopct8, },
+        { .fno = gen_helper_gvec_vpopct16, },
+        { .fni4 = tcg_gen_ctpop_i32, },
+        { .fni8 = tcg_gen_ctpop_i64, },
+    };
+
+    if (es > ES_64 || (es != ES_8 && !s390_has_feat(S390_FEAT_VECTOR_ENH))) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_gvec_2(get_field(s->fields, v1), get_field(s->fields, v2), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index b818c513a9..f49d5c2ffb 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -526,3 +526,17 @@ void HELPER(gvec_vmlo##BITS)(void *v1, const void *v2, const void *v3,         \
 DEF_VMLO(8, 16)
 DEF_VMLO(16, 32)
 DEF_VMLO(32, 64)
+
+#define DEF_VPOPCT(BITS)                                                       \
+void HELPER(gvec_vpopct##BITS)(void *v1, const void *v2, uint32_t desc)        \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, ctpop32(a));                       \
+    }                                                                          \
+}
+DEF_VPOPCT(8)
+DEF_VPOPCT(16)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 27/41] s390x/tcg: Implement VECTOR ELEMENT ROTATE LEFT LOGICAL
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Take care of properly taking the modulo of the count. We might later
want to come back and create a variant of VERLL where the base register
is 0, resulting in an immediate.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  4 +++
 target/s390x/insn-data.def      |  3 ++
 target/s390x/translate_vx.inc.c | 60 +++++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 40 ++++++++++++++++++++++
 4 files changed, 107 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index a306378950..f0efaf9cd5 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -202,6 +202,10 @@ DEF_HELPER_FLAGS_4(gvec_vmlo16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vmlo32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vpopct8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vpopct16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_verllv8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_verllv16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_verll8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_verll16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 0f786d6ab1..e765c15941 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1146,6 +1146,9 @@
     F(0xe76f, VOC,     VRR_c, VE,  0, 0, 0, 0, voc, 0, IF_VEC)
 /* VECTOR POPULATION COUNT */
     F(0xe750, VPOPCT,  VRR_a, V,   0, 0, 0, 0, vpopct, 0, IF_VEC)
+/* VECTOR ELEMENT ROTATE LEFT LOGICAL */
+    F(0xe773, VERLLV,  VRR_c, V,   0, 0, 0, 0, verllv, 0, IF_VEC)
+    F(0xe733, VERLL,   VRS_a, V,   la2, 0, 0, 0, verll, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index df17b8242d..92c14174da 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -185,6 +185,9 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
 #define gen_gvec_2(v1, v2, gen) \
     tcg_gen_gvec_2(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                    16, 16, gen)
+#define gen_gvec_2s(v1, v2, c, gen) \
+    tcg_gen_gvec_2s(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                    16, 16, c, gen)
 #define gen_gvec_3(v1, v2, v3, gen) \
     tcg_gen_gvec_3(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                    vec_full_reg_offset(v3), 16, 16, gen)
@@ -1843,3 +1846,60 @@ static DisasJumpType op_vpopct(DisasContext *s, DisasOps *o)
     gen_gvec_2(get_field(s->fields, v1), get_field(s->fields, v2), &g[es]);
     return DISAS_NEXT;
 }
+
+static void gen_rll_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 t0 = tcg_temp_new_i32();
+
+    tcg_gen_andi_i32(t0, b, 31);
+    tcg_gen_rotl_i32(d, a, t0);
+    tcg_temp_free_i32(t0);
+}
+
+static void gen_rll_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+
+    tcg_gen_andi_i64(t0, b, 63);
+    tcg_gen_rotl_i64(d, a, t0);
+    tcg_temp_free_i64(t0);
+}
+
+static DisasJumpType op_verllv(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen3 g[4] = {
+        { .fno = gen_helper_gvec_verllv8, },
+        { .fno = gen_helper_gvec_verllv16, },
+        { .fni4 = gen_rll_i32, },
+        { .fni8 = gen_rll_i64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_gvec_3(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), &g[es]);
+    return DISAS_NEXT;
+}
+
+static DisasJumpType op_verll(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen2s g[4] = {
+        { .fno = gen_helper_gvec_verll8, },
+        { .fno = gen_helper_gvec_verll16, },
+        { .fni4 = gen_rll_i32, },
+        { .fni8 = gen_rll_i64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_2s(get_field(s->fields, v1), get_field(s->fields, v3), o->addr1,
+                &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index f49d5c2ffb..ed67fa73fb 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -540,3 +540,43 @@ void HELPER(gvec_vpopct##BITS)(void *v1, const void *v2, uint32_t desc)        \
 }
 DEF_VPOPCT(8)
 DEF_VPOPCT(16)
+
+#define DEF_ROTL(BITS)                                                         \
+static uint##BITS##_t rotl##BITS(uint##BITS##_t a, uint8_t count)              \
+{                                                                              \
+    count &= BITS - 1;                                                         \
+    return (a << count) | (a >> (BITS - count));                               \
+}
+DEF_ROTL(8)
+DEF_ROTL(16)
+
+#define DEF_VERLLV(BITS)                                                       \
+void HELPER(gvec_verllv##BITS)(void *v1, const void *v2, const void *v3,       \
+                               uint32_t desc)                                  \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+        const uint##BITS##_t b = s390_vec_read_element##BITS(v3, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, rotl##BITS(a, b));                 \
+    }                                                                          \
+}
+DEF_VERLLV(8)
+DEF_VERLLV(16)
+
+#define DEF_VERLL(BITS)                                                        \
+void HELPER(gvec_verll##BITS)(void *v1, const void *v2, uint64_t count,        \
+                              uint32_t desc)                                   \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, rotl##BITS(a, count));             \
+    }                                                                          \
+}
+DEF_VERLL(8)
+DEF_VERLL(16)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 27/41] s390x/tcg: Implement VECTOR ELEMENT ROTATE LEFT LOGICAL
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Take care of properly taking the modulo of the count. We might later
want to come back and create a variant of VERLL where the base register
is 0, resulting in an immediate.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  4 +++
 target/s390x/insn-data.def      |  3 ++
 target/s390x/translate_vx.inc.c | 60 +++++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 40 ++++++++++++++++++++++
 4 files changed, 107 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index a306378950..f0efaf9cd5 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -202,6 +202,10 @@ DEF_HELPER_FLAGS_4(gvec_vmlo16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vmlo32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vpopct8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vpopct16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_verllv8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_verllv16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_verll8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_verll16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 0f786d6ab1..e765c15941 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1146,6 +1146,9 @@
     F(0xe76f, VOC,     VRR_c, VE,  0, 0, 0, 0, voc, 0, IF_VEC)
 /* VECTOR POPULATION COUNT */
     F(0xe750, VPOPCT,  VRR_a, V,   0, 0, 0, 0, vpopct, 0, IF_VEC)
+/* VECTOR ELEMENT ROTATE LEFT LOGICAL */
+    F(0xe773, VERLLV,  VRR_c, V,   0, 0, 0, 0, verllv, 0, IF_VEC)
+    F(0xe733, VERLL,   VRS_a, V,   la2, 0, 0, 0, verll, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index df17b8242d..92c14174da 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -185,6 +185,9 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
 #define gen_gvec_2(v1, v2, gen) \
     tcg_gen_gvec_2(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                    16, 16, gen)
+#define gen_gvec_2s(v1, v2, c, gen) \
+    tcg_gen_gvec_2s(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                    16, 16, c, gen)
 #define gen_gvec_3(v1, v2, v3, gen) \
     tcg_gen_gvec_3(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                    vec_full_reg_offset(v3), 16, 16, gen)
@@ -1843,3 +1846,60 @@ static DisasJumpType op_vpopct(DisasContext *s, DisasOps *o)
     gen_gvec_2(get_field(s->fields, v1), get_field(s->fields, v2), &g[es]);
     return DISAS_NEXT;
 }
+
+static void gen_rll_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    TCGv_i32 t0 = tcg_temp_new_i32();
+
+    tcg_gen_andi_i32(t0, b, 31);
+    tcg_gen_rotl_i32(d, a, t0);
+    tcg_temp_free_i32(t0);
+}
+
+static void gen_rll_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+
+    tcg_gen_andi_i64(t0, b, 63);
+    tcg_gen_rotl_i64(d, a, t0);
+    tcg_temp_free_i64(t0);
+}
+
+static DisasJumpType op_verllv(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen3 g[4] = {
+        { .fno = gen_helper_gvec_verllv8, },
+        { .fno = gen_helper_gvec_verllv16, },
+        { .fni4 = gen_rll_i32, },
+        { .fni8 = gen_rll_i64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_gvec_3(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), &g[es]);
+    return DISAS_NEXT;
+}
+
+static DisasJumpType op_verll(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen2s g[4] = {
+        { .fno = gen_helper_gvec_verll8, },
+        { .fno = gen_helper_gvec_verll16, },
+        { .fni4 = gen_rll_i32, },
+        { .fni8 = gen_rll_i64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_2s(get_field(s->fields, v1), get_field(s->fields, v3), o->addr1,
+                &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index f49d5c2ffb..ed67fa73fb 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -540,3 +540,43 @@ void HELPER(gvec_vpopct##BITS)(void *v1, const void *v2, uint32_t desc)        \
 }
 DEF_VPOPCT(8)
 DEF_VPOPCT(16)
+
+#define DEF_ROTL(BITS)                                                         \
+static uint##BITS##_t rotl##BITS(uint##BITS##_t a, uint8_t count)              \
+{                                                                              \
+    count &= BITS - 1;                                                         \
+    return (a << count) | (a >> (BITS - count));                               \
+}
+DEF_ROTL(8)
+DEF_ROTL(16)
+
+#define DEF_VERLLV(BITS)                                                       \
+void HELPER(gvec_verllv##BITS)(void *v1, const void *v2, const void *v3,       \
+                               uint32_t desc)                                  \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+        const uint##BITS##_t b = s390_vec_read_element##BITS(v3, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, rotl##BITS(a, b));                 \
+    }                                                                          \
+}
+DEF_VERLLV(8)
+DEF_VERLLV(16)
+
+#define DEF_VERLL(BITS)                                                        \
+void HELPER(gvec_verll##BITS)(void *v1, const void *v2, uint64_t count,        \
+                              uint32_t desc)                                   \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, rotl##BITS(a, count));             \
+    }                                                                          \
+}
+DEF_VERLL(8)
+DEF_VERLL(16)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 28/41] s390x/tcg: Implement VECTOR ELEMENT ROTATE AND INSERT UNDER MASK
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Use the new vector expansion for GVecGen3i. In the ool helpers, reuse
the rotation funvtions introduced with VECTOR ELEMENT ROTATE LEFT
LOGICAL.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  2 ++
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 53 +++++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 19 ++++++++++++
 4 files changed, 76 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index f0efaf9cd5..bfde7e3cc6 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -206,6 +206,8 @@ DEF_HELPER_FLAGS_4(gvec_verllv8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_verllv16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_verll8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_verll16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_verim8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_verim16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index e765c15941..59c323a796 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1149,6 +1149,8 @@
 /* VECTOR ELEMENT ROTATE LEFT LOGICAL */
     F(0xe773, VERLLV,  VRR_c, V,   0, 0, 0, 0, verllv, 0, IF_VEC)
     F(0xe733, VERLL,   VRS_a, V,   la2, 0, 0, 0, verll, 0, IF_VEC)
+/* VECTOR ELEMENT ROTATE AND INSERT UNDER MASK */
+    F(0xe772, VERIM,   VRI_d, V,   0, 0, 0, 0, verim, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 92c14174da..a6169b9827 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -197,6 +197,9 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
 #define gen_gvec_3_ptr(v1, v2, v3, ptr, data, fn) \
     tcg_gen_gvec_3_ptr(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                        vec_full_reg_offset(v3), ptr, 16, 16, data, fn)
+#define gen_gvec_3i(v1, v2, v3, c, gen) \
+    tcg_gen_gvec_3i(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                    vec_full_reg_offset(v3), c, 16, 16, gen)
 #define gen_gvec_4(v1, v2, v3, v4, gen) \
     tcg_gen_gvec_4(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                    vec_full_reg_offset(v3), vec_full_reg_offset(v4), \
@@ -1903,3 +1906,53 @@ static DisasJumpType op_verll(DisasContext *s, DisasOps *o)
                 &g[es]);
     return DISAS_NEXT;
 }
+
+static void gen_rim_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, int32_t c)
+{
+    TCGv_i32 t0 = tcg_temp_new_i32();
+    TCGv_i32 t1 = tcg_temp_new_i32();
+
+    tcg_gen_andc_i32(t0, a, b);
+    tcg_gen_rotli_i32(t1, a, c & 31);
+    tcg_gen_and_i32(t1, t1, b);
+    tcg_gen_or_i32(d, t0, t1);
+
+    tcg_temp_free_i32(t0);
+    tcg_temp_free_i32(t1);
+}
+
+static void gen_rim_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, int64_t c)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    tcg_gen_andc_i64(t0, a, b);
+    tcg_gen_rotli_i64(t1, a, c & 63);
+    tcg_gen_and_i64(t1, t1, b);
+    tcg_gen_or_i64(d, t0, t1);
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+}
+
+static DisasJumpType op_verim(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m5);
+    const uint8_t i4 = get_field(s->fields, i4) &
+                       (NUM_VEC_ELEMENT_BITS(es) - 1);
+    static const GVecGen3i g[4] = {
+        { .fno = gen_helper_gvec_verim8, },
+        { .fno = gen_helper_gvec_verim16, },
+        { .fni4 = gen_rim_i32, },
+        { .fni8 = gen_rim_i64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_gvec_3i(get_field(s->fields, v1), get_field(s->fields, v2),
+                get_field(s->fields, v3), i4, &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index ed67fa73fb..6dc31003b9 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -14,6 +14,7 @@
 #include "cpu.h"
 #include "vec.h"
 #include "exec/helper-proto.h"
+#include "tcg/tcg-gvec-desc.h"
 
 /*
  * Add two 128 bit vectors, returning the carry.
@@ -580,3 +581,21 @@ void HELPER(gvec_verll##BITS)(void *v1, const void *v2, uint64_t count,        \
 }
 DEF_VERLL(8)
 DEF_VERLL(16)
+
+#define DEF_VERIM(BITS)                                                        \
+void HELPER(gvec_verim##BITS)(void *v1, const void *v2, const void *v3,        \
+                              uint32_t desc)                                   \
+{                                                                              \
+    const uint8_t count = simd_data(desc);                                     \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+        const uint##BITS##_t mask = s390_vec_read_element##BITS(v3, i);        \
+        const uint##BITS##_t d = (a & ~mask) | (rotl##BITS(a, count) & mask);  \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, d);                                \
+    }                                                                          \
+}
+DEF_VERIM(8)
+DEF_VERIM(16)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 28/41] s390x/tcg: Implement VECTOR ELEMENT ROTATE AND INSERT UNDER MASK
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Use the new vector expansion for GVecGen3i. In the ool helpers, reuse
the rotation funvtions introduced with VECTOR ELEMENT ROTATE LEFT
LOGICAL.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  2 ++
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 53 +++++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 19 ++++++++++++
 4 files changed, 76 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index f0efaf9cd5..bfde7e3cc6 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -206,6 +206,8 @@ DEF_HELPER_FLAGS_4(gvec_verllv8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_verllv16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_verll8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_verll16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_verim8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_verim16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index e765c15941..59c323a796 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1149,6 +1149,8 @@
 /* VECTOR ELEMENT ROTATE LEFT LOGICAL */
     F(0xe773, VERLLV,  VRR_c, V,   0, 0, 0, 0, verllv, 0, IF_VEC)
     F(0xe733, VERLL,   VRS_a, V,   la2, 0, 0, 0, verll, 0, IF_VEC)
+/* VECTOR ELEMENT ROTATE AND INSERT UNDER MASK */
+    F(0xe772, VERIM,   VRI_d, V,   0, 0, 0, 0, verim, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 92c14174da..a6169b9827 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -197,6 +197,9 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
 #define gen_gvec_3_ptr(v1, v2, v3, ptr, data, fn) \
     tcg_gen_gvec_3_ptr(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                        vec_full_reg_offset(v3), ptr, 16, 16, data, fn)
+#define gen_gvec_3i(v1, v2, v3, c, gen) \
+    tcg_gen_gvec_3i(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                    vec_full_reg_offset(v3), c, 16, 16, gen)
 #define gen_gvec_4(v1, v2, v3, v4, gen) \
     tcg_gen_gvec_4(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                    vec_full_reg_offset(v3), vec_full_reg_offset(v4), \
@@ -1903,3 +1906,53 @@ static DisasJumpType op_verll(DisasContext *s, DisasOps *o)
                 &g[es]);
     return DISAS_NEXT;
 }
+
+static void gen_rim_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, int32_t c)
+{
+    TCGv_i32 t0 = tcg_temp_new_i32();
+    TCGv_i32 t1 = tcg_temp_new_i32();
+
+    tcg_gen_andc_i32(t0, a, b);
+    tcg_gen_rotli_i32(t1, a, c & 31);
+    tcg_gen_and_i32(t1, t1, b);
+    tcg_gen_or_i32(d, t0, t1);
+
+    tcg_temp_free_i32(t0);
+    tcg_temp_free_i32(t1);
+}
+
+static void gen_rim_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, int64_t c)
+{
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    tcg_gen_andc_i64(t0, a, b);
+    tcg_gen_rotli_i64(t1, a, c & 63);
+    tcg_gen_and_i64(t1, t1, b);
+    tcg_gen_or_i64(d, t0, t1);
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+}
+
+static DisasJumpType op_verim(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m5);
+    const uint8_t i4 = get_field(s->fields, i4) &
+                       (NUM_VEC_ELEMENT_BITS(es) - 1);
+    static const GVecGen3i g[4] = {
+        { .fno = gen_helper_gvec_verim8, },
+        { .fno = gen_helper_gvec_verim16, },
+        { .fni4 = gen_rim_i32, },
+        { .fni8 = gen_rim_i64, },
+    };
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_gvec_3i(get_field(s->fields, v1), get_field(s->fields, v2),
+                get_field(s->fields, v3), i4, &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index ed67fa73fb..6dc31003b9 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -14,6 +14,7 @@
 #include "cpu.h"
 #include "vec.h"
 #include "exec/helper-proto.h"
+#include "tcg/tcg-gvec-desc.h"
 
 /*
  * Add two 128 bit vectors, returning the carry.
@@ -580,3 +581,21 @@ void HELPER(gvec_verll##BITS)(void *v1, const void *v2, uint64_t count,        \
 }
 DEF_VERLL(8)
 DEF_VERLL(16)
+
+#define DEF_VERIM(BITS)                                                        \
+void HELPER(gvec_verim##BITS)(void *v1, const void *v2, const void *v3,        \
+                              uint32_t desc)                                   \
+{                                                                              \
+    const uint8_t count = simd_data(desc);                                     \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+        const uint##BITS##_t mask = s390_vec_read_element##BITS(v3, i);        \
+        const uint##BITS##_t d = (a & ~mask) | (rotl##BITS(a, count) & mask);  \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, d);                                \
+    }                                                                          \
+}
+DEF_VERIM(8)
+DEF_VERIM(16)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 29/41] s390x/tcg: Implement VECTOR ELEMENT SHIFT
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Only for one special case we can reuse real gvec helpers. Mostly
rely on oom helpers.

One important thing to take care of is always to properly mask of
unused bits from the shift count.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  18 +++++
 target/s390x/insn-data.def      |   9 +++
 target/s390x/translate_vx.inc.c | 113 ++++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   |  99 ++++++++++++++++++++++++++++
 4 files changed, 239 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index bfde7e3cc6..a04d1d8948 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -208,6 +208,24 @@ DEF_HELPER_FLAGS_4(gvec_verll8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_verll16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_verim8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_verim16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_veslv8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_veslv16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_veslv32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_veslv64, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrav8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrav16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrav32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrav64, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrlv8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrlv16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrlv32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrlv64, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vesl8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vesl16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vesra8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vesra16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrl8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrl16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 59c323a796..f4b67bda7e 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1151,6 +1151,15 @@
     F(0xe733, VERLL,   VRS_a, V,   la2, 0, 0, 0, verll, 0, IF_VEC)
 /* VECTOR ELEMENT ROTATE AND INSERT UNDER MASK */
     F(0xe772, VERIM,   VRI_d, V,   0, 0, 0, 0, verim, 0, IF_VEC)
+/* VECTOR ELEMENT SHIFT LEFT */
+    F(0xe770, VESLV,   VRR_c, V,   0, 0, 0, 0, vesv, 0, IF_VEC)
+    F(0xe730, VESL,    VRS_a, V,   la2, 0, 0, 0, ves, 0, IF_VEC)
+/* VECTOR ELEMENT SHIFT RIGHT ARITHMETIC */
+    F(0xe77a, VESRAV,  VRR_c, V,   0, 0, 0, 0, vesv, 0, IF_VEC)
+    F(0xe73a, VESRA,   VRS_a, V,   la2, 0, 0, 0, ves, 0, IF_VEC)
+/* VECTOR ELEMENT SHIFT RIGHT LOGICAL */
+    F(0xe778, VESRLV,  VRR_c, V,   0, 0, 0, 0, vesv, 0, IF_VEC)
+    F(0xe738, VESRL,   VRS_a, V,   la2, 0, 0, 0, ves, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index a6169b9827..7553e4069e 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -218,6 +218,9 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
 #define gen_gvec_fn_2(fn, es, v1, v2) \
     tcg_gen_gvec_##fn(es, vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                       16, 16)
+#define gen_gvec_fn_2i(fn, es, v1, v2, c) \
+    tcg_gen_gvec_##fn(es, vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                      c, 16, 16)
 #define gen_gvec_fn_3(fn, es, v1, v2, v3) \
     tcg_gen_gvec_##fn(es, vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                       vec_full_reg_offset(v3), 16, 16)
@@ -1956,3 +1959,113 @@ static DisasJumpType op_verim(DisasContext *s, DisasOps *o)
                 get_field(s->fields, v3), i4, &g[es]);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vesv(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen3 g_veslv[4] = {
+        { .fno = gen_helper_gvec_veslv8, },
+        { .fno = gen_helper_gvec_veslv16, },
+        { .fno = gen_helper_gvec_veslv32, },
+        { .fno = gen_helper_gvec_veslv64, },
+    };
+    static const GVecGen3 g_vesrav[4] = {
+        { .fno = gen_helper_gvec_vesrav8, },
+        { .fno = gen_helper_gvec_vesrav16, },
+        { .fno = gen_helper_gvec_vesrav32, },
+        { .fno = gen_helper_gvec_vesrav64, },
+    };
+    static const GVecGen3 g_vesrlv[4] = {
+        { .fno = gen_helper_gvec_vesrlv8, },
+        { .fno = gen_helper_gvec_vesrlv16, },
+        { .fno = gen_helper_gvec_vesrlv32, },
+        { .fno = gen_helper_gvec_vesrlv64, },
+    };
+    const GVecGen3 *fn;
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    switch (s->fields->op2) {
+    case 0x70:
+        fn = &g_veslv[es];
+        break;
+    case 0x7a:
+        fn = &g_vesrav[es];
+        break;
+    case 0x78:
+        fn = &g_vesrlv[es];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    gen_gvec_3(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), fn);
+    return DISAS_NEXT;
+}
+
+static DisasJumpType op_ves(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    const uint8_t d2 = get_field(s->fields, d2) &
+                       (NUM_VEC_ELEMENT_BITS(es) - 1);
+    const uint8_t v1 = get_field(s->fields, v1);
+    const uint8_t v3 = get_field(s->fields, v3);
+    static const GVecGen2s g_vesl[4] = {
+        { .fno = gen_helper_gvec_vesl8, },
+        { .fno = gen_helper_gvec_vesl16, },
+        { .fni4 = tcg_gen_shl_i32, },
+        { .fni8 = tcg_gen_shl_i64, },
+    };
+    static const GVecGen2s g_vesra[4] = {
+        { .fno = gen_helper_gvec_vesra8, },
+        { .fno = gen_helper_gvec_vesra16, },
+        { .fni4 = tcg_gen_sar_i32, },
+        { .fni8 = tcg_gen_sar_i64, },
+    };
+    static const GVecGen2s g_vesrl[4] = {
+        { .fno = gen_helper_gvec_vesrl8, },
+        { .fno = gen_helper_gvec_vesrl16, },
+        { .fni4 = tcg_gen_shr_i32, },
+        { .fni8 = tcg_gen_shr_i64, },
+    };
+    const GVecGen2s *fn;
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    switch (s->fields->op2) {
+    case 0x30:
+        if (likely(!get_field(s->fields, b2))) {
+            gen_gvec_fn_2i(shli, es, v1, v3, d2);
+            return DISAS_NEXT;
+        }
+        fn = &g_vesl[es];
+        break;
+    case 0x3a:
+        if (likely(!get_field(s->fields, b2))) {
+            gen_gvec_fn_2i(sari, es, v1, v3, d2);
+            return DISAS_NEXT;
+        }
+        fn = &g_vesra[es];
+        break;
+    case 0x38:
+        if (likely(!get_field(s->fields, b2))) {
+            gen_gvec_fn_2i(shri, es, v1, v3, d2);
+            return DISAS_NEXT;
+        }
+        fn = &g_vesrl[es];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    tcg_gen_andi_i64(o->addr1, o->addr1, NUM_VEC_ELEMENT_BITS(es) - 1);
+    gen_gvec_2s(v1, v3, o->addr1, fn);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 6dc31003b9..8cc736b287 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -599,3 +599,102 @@ void HELPER(gvec_verim##BITS)(void *v1, const void *v2, const void *v3,        \
 }
 DEF_VERIM(8)
 DEF_VERIM(16)
+
+#define DEF_VESLV(BITS)                                                        \
+void HELPER(gvec_veslv##BITS)(void *v1, const void *v2, const void *v3,        \
+                              uint32_t desc)                                   \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+        const uint8_t shift = s390_vec_read_element##BITS(v3, i) & (BITS - 1); \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, a << shift);                       \
+    }                                                                          \
+}
+DEF_VESLV(8)
+DEF_VESLV(16)
+DEF_VESLV(32)
+DEF_VESLV(64)
+
+#define DEF_VESRAV(BITS)                                                       \
+void HELPER(gvec_vesrav##BITS)(void *v1, const void *v2, const void *v3,       \
+                               uint32_t desc)                                  \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const int##BITS##_t a = s390_vec_read_element##BITS(v2, i);            \
+        const uint8_t shift = s390_vec_read_element##BITS(v3, i) & (BITS - 1); \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, a >> shift);                       \
+    }                                                                          \
+}
+DEF_VESRAV(8)
+DEF_VESRAV(16)
+DEF_VESRAV(32)
+DEF_VESRAV(64)
+
+#define DEF_VESRLV(BITS)                                                       \
+void HELPER(gvec_vesrlv##BITS)(void *v1, const void *v2, const void *v3,       \
+                               uint32_t desc)                                  \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+        const uint8_t shift = s390_vec_read_element##BITS(v3, i) & (BITS - 1); \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, a >> shift);                       \
+    }                                                                          \
+}
+DEF_VESRLV(8)
+DEF_VESRLV(16)
+DEF_VESRLV(32)
+DEF_VESRLV(64)
+
+#define DEF_VESL(BITS)                                                         \
+void HELPER(gvec_vesl##BITS)(void *v1, const void *v3, uint64_t shift,         \
+                             uint32_t desc)                                    \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v3, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, a << shift);                       \
+    }                                                                          \
+}
+DEF_VESL(8)
+DEF_VESL(16)
+
+#define DEF_VESRA(BITS)                                                        \
+void HELPER(gvec_vesra##BITS)(void *v1, const void *v3, uint64_t shift,        \
+                              uint32_t desc)                                   \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const int##BITS##_t a = s390_vec_read_element##BITS(v3, i);            \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, a >> shift);                       \
+    }                                                                          \
+}
+DEF_VESRA(8)
+DEF_VESRA(16)
+
+#define DEF_VESRL(BITS)                                                        \
+void HELPER(gvec_vesrl##BITS)(void *v1, const void *v3, uint64_t shift,        \
+                              uint32_t desc)                                   \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v3, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, a >> shift);                       \
+    }                                                                          \
+}
+DEF_VESRL(8)
+DEF_VESRL(16)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 29/41] s390x/tcg: Implement VECTOR ELEMENT SHIFT
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Only for one special case we can reuse real gvec helpers. Mostly
rely on oom helpers.

One important thing to take care of is always to properly mask of
unused bits from the shift count.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  18 +++++
 target/s390x/insn-data.def      |   9 +++
 target/s390x/translate_vx.inc.c | 113 ++++++++++++++++++++++++++++++++
 target/s390x/vec_int_helper.c   |  99 ++++++++++++++++++++++++++++
 4 files changed, 239 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index bfde7e3cc6..a04d1d8948 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -208,6 +208,24 @@ DEF_HELPER_FLAGS_4(gvec_verll8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_verll16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_verim8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_verim16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_veslv8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_veslv16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_veslv32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_veslv64, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrav8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrav16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrav32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrav64, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrlv8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrlv16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrlv32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrlv64, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vesl8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vesl16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vesra8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vesra16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrl8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vesrl16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 59c323a796..f4b67bda7e 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1151,6 +1151,15 @@
     F(0xe733, VERLL,   VRS_a, V,   la2, 0, 0, 0, verll, 0, IF_VEC)
 /* VECTOR ELEMENT ROTATE AND INSERT UNDER MASK */
     F(0xe772, VERIM,   VRI_d, V,   0, 0, 0, 0, verim, 0, IF_VEC)
+/* VECTOR ELEMENT SHIFT LEFT */
+    F(0xe770, VESLV,   VRR_c, V,   0, 0, 0, 0, vesv, 0, IF_VEC)
+    F(0xe730, VESL,    VRS_a, V,   la2, 0, 0, 0, ves, 0, IF_VEC)
+/* VECTOR ELEMENT SHIFT RIGHT ARITHMETIC */
+    F(0xe77a, VESRAV,  VRR_c, V,   0, 0, 0, 0, vesv, 0, IF_VEC)
+    F(0xe73a, VESRA,   VRS_a, V,   la2, 0, 0, 0, ves, 0, IF_VEC)
+/* VECTOR ELEMENT SHIFT RIGHT LOGICAL */
+    F(0xe778, VESRLV,  VRR_c, V,   0, 0, 0, 0, vesv, 0, IF_VEC)
+    F(0xe738, VESRL,   VRS_a, V,   la2, 0, 0, 0, ves, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index a6169b9827..7553e4069e 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -218,6 +218,9 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
 #define gen_gvec_fn_2(fn, es, v1, v2) \
     tcg_gen_gvec_##fn(es, vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                       16, 16)
+#define gen_gvec_fn_2i(fn, es, v1, v2, c) \
+    tcg_gen_gvec_##fn(es, vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                      c, 16, 16)
 #define gen_gvec_fn_3(fn, es, v1, v2, v3) \
     tcg_gen_gvec_##fn(es, vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                       vec_full_reg_offset(v3), 16, 16)
@@ -1956,3 +1959,113 @@ static DisasJumpType op_verim(DisasContext *s, DisasOps *o)
                 get_field(s->fields, v3), i4, &g[es]);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vesv(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen3 g_veslv[4] = {
+        { .fno = gen_helper_gvec_veslv8, },
+        { .fno = gen_helper_gvec_veslv16, },
+        { .fno = gen_helper_gvec_veslv32, },
+        { .fno = gen_helper_gvec_veslv64, },
+    };
+    static const GVecGen3 g_vesrav[4] = {
+        { .fno = gen_helper_gvec_vesrav8, },
+        { .fno = gen_helper_gvec_vesrav16, },
+        { .fno = gen_helper_gvec_vesrav32, },
+        { .fno = gen_helper_gvec_vesrav64, },
+    };
+    static const GVecGen3 g_vesrlv[4] = {
+        { .fno = gen_helper_gvec_vesrlv8, },
+        { .fno = gen_helper_gvec_vesrlv16, },
+        { .fno = gen_helper_gvec_vesrlv32, },
+        { .fno = gen_helper_gvec_vesrlv64, },
+    };
+    const GVecGen3 *fn;
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    switch (s->fields->op2) {
+    case 0x70:
+        fn = &g_veslv[es];
+        break;
+    case 0x7a:
+        fn = &g_vesrav[es];
+        break;
+    case 0x78:
+        fn = &g_vesrlv[es];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    gen_gvec_3(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), fn);
+    return DISAS_NEXT;
+}
+
+static DisasJumpType op_ves(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    const uint8_t d2 = get_field(s->fields, d2) &
+                       (NUM_VEC_ELEMENT_BITS(es) - 1);
+    const uint8_t v1 = get_field(s->fields, v1);
+    const uint8_t v3 = get_field(s->fields, v3);
+    static const GVecGen2s g_vesl[4] = {
+        { .fno = gen_helper_gvec_vesl8, },
+        { .fno = gen_helper_gvec_vesl16, },
+        { .fni4 = tcg_gen_shl_i32, },
+        { .fni8 = tcg_gen_shl_i64, },
+    };
+    static const GVecGen2s g_vesra[4] = {
+        { .fno = gen_helper_gvec_vesra8, },
+        { .fno = gen_helper_gvec_vesra16, },
+        { .fni4 = tcg_gen_sar_i32, },
+        { .fni8 = tcg_gen_sar_i64, },
+    };
+    static const GVecGen2s g_vesrl[4] = {
+        { .fno = gen_helper_gvec_vesrl8, },
+        { .fno = gen_helper_gvec_vesrl16, },
+        { .fni4 = tcg_gen_shr_i32, },
+        { .fni8 = tcg_gen_shr_i64, },
+    };
+    const GVecGen2s *fn;
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    switch (s->fields->op2) {
+    case 0x30:
+        if (likely(!get_field(s->fields, b2))) {
+            gen_gvec_fn_2i(shli, es, v1, v3, d2);
+            return DISAS_NEXT;
+        }
+        fn = &g_vesl[es];
+        break;
+    case 0x3a:
+        if (likely(!get_field(s->fields, b2))) {
+            gen_gvec_fn_2i(sari, es, v1, v3, d2);
+            return DISAS_NEXT;
+        }
+        fn = &g_vesra[es];
+        break;
+    case 0x38:
+        if (likely(!get_field(s->fields, b2))) {
+            gen_gvec_fn_2i(shri, es, v1, v3, d2);
+            return DISAS_NEXT;
+        }
+        fn = &g_vesrl[es];
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    tcg_gen_andi_i64(o->addr1, o->addr1, NUM_VEC_ELEMENT_BITS(es) - 1);
+    gen_gvec_2s(v1, v3, o->addr1, fn);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 6dc31003b9..8cc736b287 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -599,3 +599,102 @@ void HELPER(gvec_verim##BITS)(void *v1, const void *v2, const void *v3,        \
 }
 DEF_VERIM(8)
 DEF_VERIM(16)
+
+#define DEF_VESLV(BITS)                                                        \
+void HELPER(gvec_veslv##BITS)(void *v1, const void *v2, const void *v3,        \
+                              uint32_t desc)                                   \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+        const uint8_t shift = s390_vec_read_element##BITS(v3, i) & (BITS - 1); \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, a << shift);                       \
+    }                                                                          \
+}
+DEF_VESLV(8)
+DEF_VESLV(16)
+DEF_VESLV(32)
+DEF_VESLV(64)
+
+#define DEF_VESRAV(BITS)                                                       \
+void HELPER(gvec_vesrav##BITS)(void *v1, const void *v2, const void *v3,       \
+                               uint32_t desc)                                  \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const int##BITS##_t a = s390_vec_read_element##BITS(v2, i);            \
+        const uint8_t shift = s390_vec_read_element##BITS(v3, i) & (BITS - 1); \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, a >> shift);                       \
+    }                                                                          \
+}
+DEF_VESRAV(8)
+DEF_VESRAV(16)
+DEF_VESRAV(32)
+DEF_VESRAV(64)
+
+#define DEF_VESRLV(BITS)                                                       \
+void HELPER(gvec_vesrlv##BITS)(void *v1, const void *v2, const void *v3,       \
+                               uint32_t desc)                                  \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+        const uint8_t shift = s390_vec_read_element##BITS(v3, i) & (BITS - 1); \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, a >> shift);                       \
+    }                                                                          \
+}
+DEF_VESRLV(8)
+DEF_VESRLV(16)
+DEF_VESRLV(32)
+DEF_VESRLV(64)
+
+#define DEF_VESL(BITS)                                                         \
+void HELPER(gvec_vesl##BITS)(void *v1, const void *v3, uint64_t shift,         \
+                             uint32_t desc)                                    \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v3, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, a << shift);                       \
+    }                                                                          \
+}
+DEF_VESL(8)
+DEF_VESL(16)
+
+#define DEF_VESRA(BITS)                                                        \
+void HELPER(gvec_vesra##BITS)(void *v1, const void *v3, uint64_t shift,        \
+                              uint32_t desc)                                   \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const int##BITS##_t a = s390_vec_read_element##BITS(v3, i);            \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, a >> shift);                       \
+    }                                                                          \
+}
+DEF_VESRA(8)
+DEF_VESRA(16)
+
+#define DEF_VESRL(BITS)                                                        \
+void HELPER(gvec_vesrl##BITS)(void *v1, const void *v3, uint64_t shift,        \
+                              uint32_t desc)                                   \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v3, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, a >> shift);                       \
+    }                                                                          \
+}
+DEF_VESRL(8)
+DEF_VESRL(16)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 30/41] s390x/tcg: Implement VECTOR SHIFT LEFT (BY BYTE)
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

We can reuse the existing 128-bit shift utility function.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  1 +
 target/s390x/insn-data.def      |  4 ++++
 target/s390x/translate_vx.inc.c | 20 ++++++++++++++++++++
 target/s390x/vec_int_helper.c   |  6 ++++++
 4 files changed, 31 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index a04d1d8948..67037f6de6 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -226,6 +226,7 @@ DEF_HELPER_FLAGS_4(gvec_vesra8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vesra16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vesrl8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vesrl16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vsl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index f4b67bda7e..2621e433cd 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1160,6 +1160,10 @@
 /* VECTOR ELEMENT SHIFT RIGHT LOGICAL */
     F(0xe778, VESRLV,  VRR_c, V,   0, 0, 0, 0, vesv, 0, IF_VEC)
     F(0xe738, VESRL,   VRS_a, V,   la2, 0, 0, 0, ves, 0, IF_VEC)
+/* VECTOR SHIFT LEFT */
+    F(0xe774, VSL,     VRR_c, V,   0, 0, 0, 0, vsl, 0, IF_VEC)
+/* VECTOR SHIFT LEFT BY BYTE */
+    F(0xe775, VSLB,    VRR_c, V,   0, 0, 0, 0, vsl, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 7553e4069e..c08710fd45 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -188,6 +188,9 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
 #define gen_gvec_2s(v1, v2, c, gen) \
     tcg_gen_gvec_2s(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                     16, 16, c, gen)
+#define gen_gvec_2i_ool(v1, v2, c, data, fn) \
+    tcg_gen_gvec_2i_ool(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                        c, 16, 16, data, fn)
 #define gen_gvec_3(v1, v2, v3, gen) \
     tcg_gen_gvec_3(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                    vec_full_reg_offset(v3), 16, 16, gen)
@@ -2069,3 +2072,20 @@ static DisasJumpType op_ves(DisasContext *s, DisasOps *o)
     gen_gvec_2s(v1, v3, o->addr1, fn);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vsl(DisasContext *s, DisasOps *o)
+{
+    TCGv_i64 shift = tcg_temp_new_i64();
+
+    read_vec_element_i64(shift, get_field(s->fields, v3), 7, ES_8);
+    if (s->fields->op2 == 0x74) {
+        tcg_gen_andi_i64(shift, shift, 0x7);
+    } else {
+        tcg_gen_andi_i64(shift, shift, 0x78);
+    }
+
+    gen_gvec_2i_ool(get_field(s->fields, v1), get_field(s->fields, v2),
+                    shift, 0, gen_helper_gvec_vsl);
+    tcg_temp_free_i64(shift);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 8cc736b287..b1a3a25f9f 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -698,3 +698,9 @@ void HELPER(gvec_vesrl##BITS)(void *v1, const void *v3, uint64_t shift,        \
 }
 DEF_VESRL(8)
 DEF_VESRL(16)
+
+void HELPER(gvec_vsl)(void *v1, const void *v2, uint64_t count,
+                      uint32_t desc)
+{
+    s390_vec_shl(v1, v2, count);
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 30/41] s390x/tcg: Implement VECTOR SHIFT LEFT (BY BYTE)
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

We can reuse the existing 128-bit shift utility function.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  1 +
 target/s390x/insn-data.def      |  4 ++++
 target/s390x/translate_vx.inc.c | 20 ++++++++++++++++++++
 target/s390x/vec_int_helper.c   |  6 ++++++
 4 files changed, 31 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index a04d1d8948..67037f6de6 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -226,6 +226,7 @@ DEF_HELPER_FLAGS_4(gvec_vesra8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vesra16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vesrl8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vesrl16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vsl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index f4b67bda7e..2621e433cd 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1160,6 +1160,10 @@
 /* VECTOR ELEMENT SHIFT RIGHT LOGICAL */
     F(0xe778, VESRLV,  VRR_c, V,   0, 0, 0, 0, vesv, 0, IF_VEC)
     F(0xe738, VESRL,   VRS_a, V,   la2, 0, 0, 0, ves, 0, IF_VEC)
+/* VECTOR SHIFT LEFT */
+    F(0xe774, VSL,     VRR_c, V,   0, 0, 0, 0, vsl, 0, IF_VEC)
+/* VECTOR SHIFT LEFT BY BYTE */
+    F(0xe775, VSLB,    VRR_c, V,   0, 0, 0, 0, vsl, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 7553e4069e..c08710fd45 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -188,6 +188,9 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
 #define gen_gvec_2s(v1, v2, c, gen) \
     tcg_gen_gvec_2s(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                     16, 16, c, gen)
+#define gen_gvec_2i_ool(v1, v2, c, data, fn) \
+    tcg_gen_gvec_2i_ool(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                        c, 16, 16, data, fn)
 #define gen_gvec_3(v1, v2, v3, gen) \
     tcg_gen_gvec_3(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                    vec_full_reg_offset(v3), 16, 16, gen)
@@ -2069,3 +2072,20 @@ static DisasJumpType op_ves(DisasContext *s, DisasOps *o)
     gen_gvec_2s(v1, v3, o->addr1, fn);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vsl(DisasContext *s, DisasOps *o)
+{
+    TCGv_i64 shift = tcg_temp_new_i64();
+
+    read_vec_element_i64(shift, get_field(s->fields, v3), 7, ES_8);
+    if (s->fields->op2 == 0x74) {
+        tcg_gen_andi_i64(shift, shift, 0x7);
+    } else {
+        tcg_gen_andi_i64(shift, shift, 0x78);
+    }
+
+    gen_gvec_2i_ool(get_field(s->fields, v1), get_field(s->fields, v2),
+                    shift, 0, gen_helper_gvec_vsl);
+    tcg_temp_free_i64(shift);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 8cc736b287..b1a3a25f9f 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -698,3 +698,9 @@ void HELPER(gvec_vesrl##BITS)(void *v1, const void *v3, uint64_t shift,        \
 }
 DEF_VESRL(8)
 DEF_VESRL(16)
+
+void HELPER(gvec_vsl)(void *v1, const void *v2, uint64_t count,
+                      uint32_t desc)
+{
+    s390_vec_shl(v1, v2, count);
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 31/41] s390x/tcg: Implement VECTOR SHIFT LEFT DOUBLE BY BYTE
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Implement it via an ool helper. reusing the existing shift helpers.
In case the starting index is 0, it is basically a copy of v2 to v1.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  1 +
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 15 +++++++++++++++
 target/s390x/vec_int_helper.c   | 20 ++++++++++++++++++++
 4 files changed, 38 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 67037f6de6..a433f57009 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -227,6 +227,7 @@ DEF_HELPER_FLAGS_4(gvec_vesra16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vesrl8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vesrl16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vsl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vsldb, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 2621e433cd..76aec5a21f 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1164,6 +1164,8 @@
     F(0xe774, VSL,     VRR_c, V,   0, 0, 0, 0, vsl, 0, IF_VEC)
 /* VECTOR SHIFT LEFT BY BYTE */
     F(0xe775, VSLB,    VRR_c, V,   0, 0, 0, 0, vsl, 0, IF_VEC)
+/* VECTOR SHIFT LEFT DOUBLE BY BYTE */
+    F(0xe777, VSLDB,   VRI_d, V,   0, 0, 0, 0, vsldb, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index c08710fd45..221b729ee0 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2089,3 +2089,18 @@ static DisasJumpType op_vsl(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(shift);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vsldb(DisasContext *s, DisasOps *o)
+{
+    int src_idx = get_field(s->fields, i4) & 0xf;
+
+    if (src_idx == 0) {
+        gen_gvec_mov(get_field(s->fields, v1), get_field(s->fields, v2));
+    } else {
+        gen_gvec_3_ool(get_field(s->fields, v1), get_field(s->fields, v2),
+                       get_field(s->fields, v3), src_idx,
+                       gen_helper_gvec_vsldb);
+        return DISAS_NEXT;
+    }
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index b1a3a25f9f..8b922e717f 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -43,6 +43,13 @@ static bool s390_vec_is_zero(const S390Vector *v)
     return !v->doubleword[0] && !v->doubleword[1];
 }
 
+static void s390_vec_or(S390Vector *res, const S390Vector *a,
+                        const S390Vector *b)
+{
+    res->doubleword[0] = a->doubleword[0] | b->doubleword[0];
+    res->doubleword[1] = a->doubleword[1] | b->doubleword[1];
+}
+
 static void s390_vec_xor(S390Vector *res, const S390Vector *a,
                          const S390Vector *b)
 {
@@ -704,3 +711,16 @@ void HELPER(gvec_vsl)(void *v1, const void *v2, uint64_t count,
 {
     s390_vec_shl(v1, v2, count);
 }
+
+void HELPER(gvec_vsldb)(void *v1, const void *v2, const void *v3,
+                        uint32_t desc)
+{
+    const uint8_t src_idx = simd_data(desc);
+    S390Vector t0;
+    S390Vector t1;
+
+    g_assert(src_idx > 0 && src_idx < 16);
+    s390_vec_shl(&t0, v2, src_idx * 8);
+    s390_vec_shr(&t1, v3, 128 - src_idx * 8);
+    s390_vec_or(v1, &t0, &t1);
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 31/41] s390x/tcg: Implement VECTOR SHIFT LEFT DOUBLE BY BYTE
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Implement it via an ool helper. reusing the existing shift helpers.
In case the starting index is 0, it is basically a copy of v2 to v1.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  1 +
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 15 +++++++++++++++
 target/s390x/vec_int_helper.c   | 20 ++++++++++++++++++++
 4 files changed, 38 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 67037f6de6..a433f57009 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -227,6 +227,7 @@ DEF_HELPER_FLAGS_4(gvec_vesra16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vesrl8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vesrl16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vsl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vsldb, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 2621e433cd..76aec5a21f 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1164,6 +1164,8 @@
     F(0xe774, VSL,     VRR_c, V,   0, 0, 0, 0, vsl, 0, IF_VEC)
 /* VECTOR SHIFT LEFT BY BYTE */
     F(0xe775, VSLB,    VRR_c, V,   0, 0, 0, 0, vsl, 0, IF_VEC)
+/* VECTOR SHIFT LEFT DOUBLE BY BYTE */
+    F(0xe777, VSLDB,   VRI_d, V,   0, 0, 0, 0, vsldb, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index c08710fd45..221b729ee0 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2089,3 +2089,18 @@ static DisasJumpType op_vsl(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(shift);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vsldb(DisasContext *s, DisasOps *o)
+{
+    int src_idx = get_field(s->fields, i4) & 0xf;
+
+    if (src_idx == 0) {
+        gen_gvec_mov(get_field(s->fields, v1), get_field(s->fields, v2));
+    } else {
+        gen_gvec_3_ool(get_field(s->fields, v1), get_field(s->fields, v2),
+                       get_field(s->fields, v3), src_idx,
+                       gen_helper_gvec_vsldb);
+        return DISAS_NEXT;
+    }
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index b1a3a25f9f..8b922e717f 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -43,6 +43,13 @@ static bool s390_vec_is_zero(const S390Vector *v)
     return !v->doubleword[0] && !v->doubleword[1];
 }
 
+static void s390_vec_or(S390Vector *res, const S390Vector *a,
+                        const S390Vector *b)
+{
+    res->doubleword[0] = a->doubleword[0] | b->doubleword[0];
+    res->doubleword[1] = a->doubleword[1] | b->doubleword[1];
+}
+
 static void s390_vec_xor(S390Vector *res, const S390Vector *a,
                          const S390Vector *b)
 {
@@ -704,3 +711,16 @@ void HELPER(gvec_vsl)(void *v1, const void *v2, uint64_t count,
 {
     s390_vec_shl(v1, v2, count);
 }
+
+void HELPER(gvec_vsldb)(void *v1, const void *v2, const void *v3,
+                        uint32_t desc)
+{
+    const uint8_t src_idx = simd_data(desc);
+    S390Vector t0;
+    S390Vector t1;
+
+    g_assert(src_idx > 0 && src_idx < 16);
+    s390_vec_shl(&t0, v2, src_idx * 8);
+    s390_vec_shr(&t1, v3, 128 - src_idx * 8);
+    s390_vec_or(v1, &t0, &t1);
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 32/41] s390x/tcg: Implement VECTOR SHIFT RIGHT ARITHMETIC
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Similar to VECTOR SHIFT LEFT ARITHMETIC. Add s390_vec_sar() similar to
s390_vec_shr().

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  1 +
 target/s390x/insn-data.def      |  4 ++++
 target/s390x/translate_vx.inc.c | 17 +++++++++++++++++
 target/s390x/vec_int_helper.c   | 26 ++++++++++++++++++++++++++
 4 files changed, 48 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index a433f57009..54a861c179 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -228,6 +228,7 @@ DEF_HELPER_FLAGS_4(gvec_vesrl8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vesrl16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vsl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vsldb, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vsra, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 76aec5a21f..587de3eaac 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1166,6 +1166,10 @@
     F(0xe775, VSLB,    VRR_c, V,   0, 0, 0, 0, vsl, 0, IF_VEC)
 /* VECTOR SHIFT LEFT DOUBLE BY BYTE */
     F(0xe777, VSLDB,   VRI_d, V,   0, 0, 0, 0, vsldb, 0, IF_VEC)
+/* VECTOR SHIFT RIGHT ARITHMETIC */
+    F(0xe77e, VSRA,    VRR_c, V,   0, 0, 0, 0, vsra, 0, IF_VEC)
+/* VECTOR SHIFT RIGHT ARITHMETIC BY BYTE */
+    F(0xe77f, VSRAB,   VRR_c, V,   0, 0, 0, 0, vsra, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 221b729ee0..8c44dcf471 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2104,3 +2104,20 @@ static DisasJumpType op_vsldb(DisasContext *s, DisasOps *o)
     }
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vsra(DisasContext *s, DisasOps *o)
+{
+    TCGv_i64 shift = tcg_temp_new_i64();
+
+    read_vec_element_i64(shift, get_field(s->fields, v3), 7, ES_8);
+    if (s->fields->op2 == 0x7e) {
+        tcg_gen_andi_i64(shift, shift, 0x7);
+    } else {
+        tcg_gen_andi_i64(shift, shift, 0x78);
+    }
+
+    gen_gvec_2i_ool(get_field(s->fields, v1), get_field(s->fields, v2),
+                    shift, 0, gen_helper_gvec_vsra);
+    tcg_temp_free_i64(shift);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 8b922e717f..220e6647ff 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -78,6 +78,26 @@ static void s390_vec_shl(S390Vector *d, const S390Vector *a, uint64_t count)
     }
 }
 
+static void s390_vec_sar(S390Vector *d, const S390Vector *a, uint64_t count)
+{
+    uint64_t tmp;
+
+    if (count == 0) {
+        d->doubleword[0] = a->doubleword[0];
+        d->doubleword[1] = a->doubleword[1];
+    } else if (count == 64) {
+        d->doubleword[1] = a->doubleword[0];
+        d->doubleword[0] = 0;
+    } else if (count < 64) {
+        tmp = a->doubleword[1] >> count;
+        d->doubleword[1] = deposit64(tmp, 64 - count, count, a->doubleword[0]);
+        d->doubleword[0] = (int64_t)a->doubleword[0] >> count;
+    } else {
+        d->doubleword[1] = (int64_t)a->doubleword[0] >> (count - 64);
+        d->doubleword[0] = 0;
+    }
+}
+
 static void s390_vec_shr(S390Vector *d, const S390Vector *a, uint64_t count)
 {
     uint64_t tmp;
@@ -724,3 +744,9 @@ void HELPER(gvec_vsldb)(void *v1, const void *v2, const void *v3,
     s390_vec_shr(&t1, v3, 128 - src_idx * 8);
     s390_vec_or(v1, &t0, &t1);
 }
+
+void HELPER(gvec_vsra)(void *v1, const void *v2, uint64_t count,
+                       uint32_t desc)
+{
+    s390_vec_sar(v1, v2, count);
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 32/41] s390x/tcg: Implement VECTOR SHIFT RIGHT ARITHMETIC
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Similar to VECTOR SHIFT LEFT ARITHMETIC. Add s390_vec_sar() similar to
s390_vec_shr().

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  1 +
 target/s390x/insn-data.def      |  4 ++++
 target/s390x/translate_vx.inc.c | 17 +++++++++++++++++
 target/s390x/vec_int_helper.c   | 26 ++++++++++++++++++++++++++
 4 files changed, 48 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index a433f57009..54a861c179 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -228,6 +228,7 @@ DEF_HELPER_FLAGS_4(gvec_vesrl8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vesrl16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vsl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vsldb, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vsra, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 76aec5a21f..587de3eaac 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1166,6 +1166,10 @@
     F(0xe775, VSLB,    VRR_c, V,   0, 0, 0, 0, vsl, 0, IF_VEC)
 /* VECTOR SHIFT LEFT DOUBLE BY BYTE */
     F(0xe777, VSLDB,   VRI_d, V,   0, 0, 0, 0, vsldb, 0, IF_VEC)
+/* VECTOR SHIFT RIGHT ARITHMETIC */
+    F(0xe77e, VSRA,    VRR_c, V,   0, 0, 0, 0, vsra, 0, IF_VEC)
+/* VECTOR SHIFT RIGHT ARITHMETIC BY BYTE */
+    F(0xe77f, VSRAB,   VRR_c, V,   0, 0, 0, 0, vsra, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 221b729ee0..8c44dcf471 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2104,3 +2104,20 @@ static DisasJumpType op_vsldb(DisasContext *s, DisasOps *o)
     }
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vsra(DisasContext *s, DisasOps *o)
+{
+    TCGv_i64 shift = tcg_temp_new_i64();
+
+    read_vec_element_i64(shift, get_field(s->fields, v3), 7, ES_8);
+    if (s->fields->op2 == 0x7e) {
+        tcg_gen_andi_i64(shift, shift, 0x7);
+    } else {
+        tcg_gen_andi_i64(shift, shift, 0x78);
+    }
+
+    gen_gvec_2i_ool(get_field(s->fields, v1), get_field(s->fields, v2),
+                    shift, 0, gen_helper_gvec_vsra);
+    tcg_temp_free_i64(shift);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 8b922e717f..220e6647ff 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -78,6 +78,26 @@ static void s390_vec_shl(S390Vector *d, const S390Vector *a, uint64_t count)
     }
 }
 
+static void s390_vec_sar(S390Vector *d, const S390Vector *a, uint64_t count)
+{
+    uint64_t tmp;
+
+    if (count == 0) {
+        d->doubleword[0] = a->doubleword[0];
+        d->doubleword[1] = a->doubleword[1];
+    } else if (count == 64) {
+        d->doubleword[1] = a->doubleword[0];
+        d->doubleword[0] = 0;
+    } else if (count < 64) {
+        tmp = a->doubleword[1] >> count;
+        d->doubleword[1] = deposit64(tmp, 64 - count, count, a->doubleword[0]);
+        d->doubleword[0] = (int64_t)a->doubleword[0] >> count;
+    } else {
+        d->doubleword[1] = (int64_t)a->doubleword[0] >> (count - 64);
+        d->doubleword[0] = 0;
+    }
+}
+
 static void s390_vec_shr(S390Vector *d, const S390Vector *a, uint64_t count)
 {
     uint64_t tmp;
@@ -724,3 +744,9 @@ void HELPER(gvec_vsldb)(void *v1, const void *v2, const void *v3,
     s390_vec_shr(&t1, v3, 128 - src_idx * 8);
     s390_vec_or(v1, &t0, &t1);
 }
+
+void HELPER(gvec_vsra)(void *v1, const void *v2, uint64_t count,
+                       uint32_t desc)
+{
+    s390_vec_sar(v1, v2, count);
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 33/41] s390x/tcg: Implement VECTOR SHIFT RIGHT LOGICAL *
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Similar to VECTOR SHIFT RIGHT ARITHMETICAL.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  1 +
 target/s390x/insn-data.def      |  4 ++++
 target/s390x/translate_vx.inc.c | 17 +++++++++++++++++
 target/s390x/vec_int_helper.c   |  6 ++++++
 4 files changed, 28 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 54a861c179..af7fb10f76 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -229,6 +229,7 @@ DEF_HELPER_FLAGS_4(gvec_vesrl16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vsl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vsldb, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vsra, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vsrl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 587de3eaac..f3bf9edfca 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1170,6 +1170,10 @@
     F(0xe77e, VSRA,    VRR_c, V,   0, 0, 0, 0, vsra, 0, IF_VEC)
 /* VECTOR SHIFT RIGHT ARITHMETIC BY BYTE */
     F(0xe77f, VSRAB,   VRR_c, V,   0, 0, 0, 0, vsra, 0, IF_VEC)
+/* VECTOR SHIFT RIGHT LOGICAL */
+    F(0xe77c, VSRL,    VRR_c, V,   0, 0, 0, 0, vsrl, 0, IF_VEC)
+/* VECTOR SHIFT RIGHT LOGICAL BY BYTE */
+    F(0xe77d, VSRLB,   VRR_c, V,   0, 0, 0, 0, vsrl, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 8c44dcf471..af8ad71084 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2121,3 +2121,20 @@ static DisasJumpType op_vsra(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(shift);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vsrl(DisasContext *s, DisasOps *o)
+{
+    TCGv_i64 shift = tcg_temp_new_i64();
+
+    read_vec_element_i64(shift, get_field(s->fields, v3), 7, ES_8);
+    if (s->fields->op2 == 0x7c) {
+        tcg_gen_andi_i64(shift, shift, 0x7);
+    } else {
+        tcg_gen_andi_i64(shift, shift, 0x78);
+    }
+
+    gen_gvec_2i_ool(get_field(s->fields, v1), get_field(s->fields, v2),
+                    shift, 0, gen_helper_gvec_vsrl);
+    tcg_temp_free_i64(shift);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 220e6647ff..12502b48e8 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -750,3 +750,9 @@ void HELPER(gvec_vsra)(void *v1, const void *v2, uint64_t count,
 {
     s390_vec_sar(v1, v2, count);
 }
+
+void HELPER(gvec_vsrl)(void *v1, const void *v2, uint64_t count,
+                       uint32_t desc)
+{
+    s390_vec_shr(v1, v2, count);
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 33/41] s390x/tcg: Implement VECTOR SHIFT RIGHT LOGICAL *
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Similar to VECTOR SHIFT RIGHT ARITHMETICAL.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  1 +
 target/s390x/insn-data.def      |  4 ++++
 target/s390x/translate_vx.inc.c | 17 +++++++++++++++++
 target/s390x/vec_int_helper.c   |  6 ++++++
 4 files changed, 28 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 54a861c179..af7fb10f76 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -229,6 +229,7 @@ DEF_HELPER_FLAGS_4(gvec_vesrl16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vsl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vsldb, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vsra, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vsrl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 587de3eaac..f3bf9edfca 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1170,6 +1170,10 @@
     F(0xe77e, VSRA,    VRR_c, V,   0, 0, 0, 0, vsra, 0, IF_VEC)
 /* VECTOR SHIFT RIGHT ARITHMETIC BY BYTE */
     F(0xe77f, VSRAB,   VRR_c, V,   0, 0, 0, 0, vsra, 0, IF_VEC)
+/* VECTOR SHIFT RIGHT LOGICAL */
+    F(0xe77c, VSRL,    VRR_c, V,   0, 0, 0, 0, vsrl, 0, IF_VEC)
+/* VECTOR SHIFT RIGHT LOGICAL BY BYTE */
+    F(0xe77d, VSRLB,   VRR_c, V,   0, 0, 0, 0, vsrl, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 8c44dcf471..af8ad71084 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2121,3 +2121,20 @@ static DisasJumpType op_vsra(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(shift);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vsrl(DisasContext *s, DisasOps *o)
+{
+    TCGv_i64 shift = tcg_temp_new_i64();
+
+    read_vec_element_i64(shift, get_field(s->fields, v3), 7, ES_8);
+    if (s->fields->op2 == 0x7c) {
+        tcg_gen_andi_i64(shift, shift, 0x7);
+    } else {
+        tcg_gen_andi_i64(shift, shift, 0x78);
+    }
+
+    gen_gvec_2i_ool(get_field(s->fields, v1), get_field(s->fields, v2),
+                    shift, 0, gen_helper_gvec_vsrl);
+    tcg_temp_free_i64(shift);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 220e6647ff..12502b48e8 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -750,3 +750,9 @@ void HELPER(gvec_vsra)(void *v1, const void *v2, uint64_t count,
 {
     s390_vec_sar(v1, v2, count);
 }
+
+void HELPER(gvec_vsrl)(void *v1, const void *v2, uint64_t count,
+                       uint32_t desc)
+{
+    s390_vec_shr(v1, v2, count);
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 34/41] s390x/tcg: Implement VECTOR SUBTRACT
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

We can use tcg_gen_sub2_i64() to do 128-bit subtraction and otherwise
existing gvec helpers.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index f3bf9edfca..58a61f41ef 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1174,6 +1174,8 @@
     F(0xe77c, VSRL,    VRR_c, V,   0, 0, 0, 0, vsrl, 0, IF_VEC)
 /* VECTOR SHIFT RIGHT LOGICAL BY BYTE */
     F(0xe77d, VSRLB,   VRR_c, V,   0, 0, 0, 0, vsrl, 0, IF_VEC)
+/* VECTOR SUBTRACT */
+    F(0xe7f7, VS,      VRR_c, V,   0, 0, 0, 0, vs, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index af8ad71084..83463155f6 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2138,3 +2138,20 @@ static DisasJumpType op_vsrl(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(shift);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vs(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+
+    if (es > ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    } else if (es == ES_128) {
+        gen_gvec128_3_i64(tcg_gen_sub2_i64, get_field(s->fields, v1),
+                          get_field(s->fields, v2), get_field(s->fields, v3));
+        return DISAS_NEXT;
+    }
+    gen_gvec_fn_3(sub, es, get_field(s->fields, v1), get_field(s->fields, v2),
+                  get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 34/41] s390x/tcg: Implement VECTOR SUBTRACT
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

We can use tcg_gen_sub2_i64() to do 128-bit subtraction and otherwise
existing gvec helpers.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index f3bf9edfca..58a61f41ef 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1174,6 +1174,8 @@
     F(0xe77c, VSRL,    VRR_c, V,   0, 0, 0, 0, vsrl, 0, IF_VEC)
 /* VECTOR SHIFT RIGHT LOGICAL BY BYTE */
     F(0xe77d, VSRLB,   VRR_c, V,   0, 0, 0, 0, vsrl, 0, IF_VEC)
+/* VECTOR SUBTRACT */
+    F(0xe7f7, VS,      VRR_c, V,   0, 0, 0, 0, vs, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index af8ad71084..83463155f6 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2138,3 +2138,20 @@ static DisasJumpType op_vsrl(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(shift);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vs(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+
+    if (es > ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    } else if (es == ES_128) {
+        gen_gvec128_3_i64(tcg_gen_sub2_i64, get_field(s->fields, v1),
+                          get_field(s->fields, v2), get_field(s->fields, v3));
+        return DISAS_NEXT;
+    }
+    gen_gvec_fn_3(sub, es, get_field(s->fields, v1), get_field(s->fields, v2),
+                  get_field(s->fields, v3));
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 35/41] s390x/tcg: Implement VECTOR SUBTRACT COMPUTE BORROW INDICATION
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Let's keep it simple for now and handle 8/16/128 bit elements via helpers.
Especially for 8/16, we could come up with some bit tricks.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  3 +++
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 30 +++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 47 +++++++++++++++++++++++++++++++++
 4 files changed, 82 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index af7fb10f76..33e3e003f8 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -230,6 +230,9 @@ DEF_HELPER_FLAGS_4(gvec_vsl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vsldb, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vsra, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vsrl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vscbi8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vscbi16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vscbi128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 58a61f41ef..94de3c9c7d 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1176,6 +1176,8 @@
     F(0xe77d, VSRLB,   VRR_c, V,   0, 0, 0, 0, vsrl, 0, IF_VEC)
 /* VECTOR SUBTRACT */
     F(0xe7f7, VS,      VRR_c, V,   0, 0, 0, 0, vs, 0, IF_VEC)
+/* VECTOR SUBTRACT COMPUTE BORROW INDICATION */
+    F(0xe7f5, VSCBI,   VRR_c, V,   0, 0, 0, 0, vscbi, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 83463155f6..7770ca4101 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2155,3 +2155,33 @@ static DisasJumpType op_vs(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static void gen_scbi_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    tcg_gen_setcond_i32(TCG_COND_LTU, d, a, b);
+}
+
+static void gen_scbi_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    tcg_gen_setcond_i64(TCG_COND_LTU, d, a, b);
+}
+
+static DisasJumpType op_vscbi(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen3 g[5] = {
+        { .fno = gen_helper_gvec_vscbi8, },
+        { .fno = gen_helper_gvec_vscbi16, },
+        { .fni4 = gen_scbi_i32, },
+        { .fni8 = gen_scbi_i64, },
+        { .fno = gen_helper_gvec_vscbi128, },
+    };
+
+    if (es > ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_3(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 12502b48e8..699b399a26 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -38,6 +38,27 @@ static bool s390_vec_add(S390Vector *d, const S390Vector *a,
     return high_carry;
 }
 
+/*
+ * Subtract two 128 bit vectors, returning the borrow.
+ */
+static bool s390_vec_sub(S390Vector *d, const S390Vector *a,
+                         const S390Vector *b)
+{
+    bool low_borrow = false, high_borrow = false;
+
+    if (a->doubleword[0] < b->doubleword[0]) {
+        high_borrow = true;
+    } else if (a->doubleword[1] < b->doubleword[0]) {
+        low_borrow = true;
+        if (a->doubleword[0] == b->doubleword[0]) {
+            high_borrow = true;
+        }
+    }
+    d->doubleword[0] = a->doubleword[0] - b->doubleword[0] - low_borrow;
+    d->doubleword[1] = a->doubleword[1] - b->doubleword[1];
+    return high_borrow;
+}
+
 static bool s390_vec_is_zero(const S390Vector *v)
 {
     return !v->doubleword[0] && !v->doubleword[1];
@@ -756,3 +777,29 @@ void HELPER(gvec_vsrl)(void *v1, const void *v2, uint64_t count,
 {
     s390_vec_shr(v1, v2, count);
 }
+
+#define DEF_VSCBI(BITS)                                                        \
+void HELPER(gvec_vscbi##BITS)(void *v1, const void *v2, const void *v3,        \
+                              uint32_t desc)                                   \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+        const uint##BITS##_t b = s390_vec_read_element##BITS(v3, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, a < b);                            \
+    }                                                                          \
+}
+DEF_VSCBI(8)
+DEF_VSCBI(16)
+
+void HELPER(gvec_vscbi128)(void *v1, const void *v2, const void *v3,
+                           uint32_t desc)
+{
+    S390Vector *dst = v1;
+    S390Vector tmp;
+
+    dst->doubleword[0] = 0;
+    dst->doubleword[1] = s390_vec_sub(&tmp, v2, v3);
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 35/41] s390x/tcg: Implement VECTOR SUBTRACT COMPUTE BORROW INDICATION
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Let's keep it simple for now and handle 8/16/128 bit elements via helpers.
Especially for 8/16, we could come up with some bit tricks.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  3 +++
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 30 +++++++++++++++++++++
 target/s390x/vec_int_helper.c   | 47 +++++++++++++++++++++++++++++++++
 4 files changed, 82 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index af7fb10f76..33e3e003f8 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -230,6 +230,9 @@ DEF_HELPER_FLAGS_4(gvec_vsl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vsldb, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vsra, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vsrl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vscbi8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vscbi16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_4(gvec_vscbi128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 58a61f41ef..94de3c9c7d 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1176,6 +1176,8 @@
     F(0xe77d, VSRLB,   VRR_c, V,   0, 0, 0, 0, vsrl, 0, IF_VEC)
 /* VECTOR SUBTRACT */
     F(0xe7f7, VS,      VRR_c, V,   0, 0, 0, 0, vs, 0, IF_VEC)
+/* VECTOR SUBTRACT COMPUTE BORROW INDICATION */
+    F(0xe7f5, VSCBI,   VRR_c, V,   0, 0, 0, 0, vscbi, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 83463155f6..7770ca4101 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2155,3 +2155,33 @@ static DisasJumpType op_vs(DisasContext *s, DisasOps *o)
                   get_field(s->fields, v3));
     return DISAS_NEXT;
 }
+
+static void gen_scbi_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+    tcg_gen_setcond_i32(TCG_COND_LTU, d, a, b);
+}
+
+static void gen_scbi_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+    tcg_gen_setcond_i64(TCG_COND_LTU, d, a, b);
+}
+
+static DisasJumpType op_vscbi(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    static const GVecGen3 g[5] = {
+        { .fno = gen_helper_gvec_vscbi8, },
+        { .fno = gen_helper_gvec_vscbi16, },
+        { .fni4 = gen_scbi_i32, },
+        { .fni8 = gen_scbi_i64, },
+        { .fno = gen_helper_gvec_vscbi128, },
+    };
+
+    if (es > ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    gen_gvec_3(get_field(s->fields, v1), get_field(s->fields, v2),
+               get_field(s->fields, v3), &g[es]);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 12502b48e8..699b399a26 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -38,6 +38,27 @@ static bool s390_vec_add(S390Vector *d, const S390Vector *a,
     return high_carry;
 }
 
+/*
+ * Subtract two 128 bit vectors, returning the borrow.
+ */
+static bool s390_vec_sub(S390Vector *d, const S390Vector *a,
+                         const S390Vector *b)
+{
+    bool low_borrow = false, high_borrow = false;
+
+    if (a->doubleword[0] < b->doubleword[0]) {
+        high_borrow = true;
+    } else if (a->doubleword[1] < b->doubleword[0]) {
+        low_borrow = true;
+        if (a->doubleword[0] == b->doubleword[0]) {
+            high_borrow = true;
+        }
+    }
+    d->doubleword[0] = a->doubleword[0] - b->doubleword[0] - low_borrow;
+    d->doubleword[1] = a->doubleword[1] - b->doubleword[1];
+    return high_borrow;
+}
+
 static bool s390_vec_is_zero(const S390Vector *v)
 {
     return !v->doubleword[0] && !v->doubleword[1];
@@ -756,3 +777,29 @@ void HELPER(gvec_vsrl)(void *v1, const void *v2, uint64_t count,
 {
     s390_vec_shr(v1, v2, count);
 }
+
+#define DEF_VSCBI(BITS)                                                        \
+void HELPER(gvec_vscbi##BITS)(void *v1, const void *v2, const void *v3,        \
+                              uint32_t desc)                                   \
+{                                                                              \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
+        const uint##BITS##_t b = s390_vec_read_element##BITS(v3, i);           \
+                                                                               \
+        s390_vec_write_element##BITS(v1, i, a < b);                            \
+    }                                                                          \
+}
+DEF_VSCBI(8)
+DEF_VSCBI(16)
+
+void HELPER(gvec_vscbi128)(void *v1, const void *v2, const void *v3,
+                           uint32_t desc)
+{
+    S390Vector *dst = v1;
+    S390Vector tmp;
+
+    dst->doubleword[0] = 0;
+    dst->doubleword[1] = s390_vec_sub(&tmp, v2, v3);
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 36/41] s390x/tcg: Implement VECTOR SUBTRACT WITH BORROW INDICATION
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Fairly easy as only 128-bit handling is required. Simply perform the
subtraction and then subtract the borrow.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 27 +++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 94de3c9c7d..a60d8531dc 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1178,6 +1178,8 @@
     F(0xe7f7, VS,      VRR_c, V,   0, 0, 0, 0, vs, 0, IF_VEC)
 /* VECTOR SUBTRACT COMPUTE BORROW INDICATION */
     F(0xe7f5, VSCBI,   VRR_c, V,   0, 0, 0, 0, vscbi, 0, IF_VEC)
+/* VECTOR SUBTRACT WITH BORROW INDICATION */
+    F(0xe7bf, VSBI,    VRR_d, V,   0, 0, 0, 0, vsbi, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 7770ca4101..3f60b97654 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2185,3 +2185,30 @@ static DisasJumpType op_vscbi(DisasContext *s, DisasOps *o)
                get_field(s->fields, v3), &g[es]);
     return DISAS_NEXT;
 }
+
+static void gen_sbi2_i64(TCGv_i64 dl, TCGv_i64 dh, TCGv_i64 al, TCGv_i64 ah,
+                         TCGv_i64 bl, TCGv_i64 bh, TCGv_i64 cl, TCGv_i64 ch)
+{
+    TCGv_i64 tl = tcg_temp_new_i64();
+    TCGv_i64 th = tcg_const_i64(0);
+
+    /* extract the borrow only */
+    tcg_gen_extract_i64(tl, cl, 0, 1);
+    tcg_gen_sub2_i64(dl, dh, al, ah, bl, bh);
+    tcg_gen_sub2_i64(dl, dh, dl, dh, tl, th);
+    tcg_temp_free_i64(tl);
+    tcg_temp_free_i64(th);
+}
+
+static DisasJumpType op_vsbi(DisasContext *s, DisasOps *o)
+{
+    if (get_field(s->fields, m5) != ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_gvec128_4_i64(gen_sbi2_i64, get_field(s->fields, v1),
+                      get_field(s->fields, v2), get_field(s->fields, v3),
+                      get_field(s->fields, v4));
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 36/41] s390x/tcg: Implement VECTOR SUBTRACT WITH BORROW INDICATION
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Fairly easy as only 128-bit handling is required. Simply perform the
subtraction and then subtract the borrow.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 27 +++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 94de3c9c7d..a60d8531dc 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1178,6 +1178,8 @@
     F(0xe7f7, VS,      VRR_c, V,   0, 0, 0, 0, vs, 0, IF_VEC)
 /* VECTOR SUBTRACT COMPUTE BORROW INDICATION */
     F(0xe7f5, VSCBI,   VRR_c, V,   0, 0, 0, 0, vscbi, 0, IF_VEC)
+/* VECTOR SUBTRACT WITH BORROW INDICATION */
+    F(0xe7bf, VSBI,    VRR_d, V,   0, 0, 0, 0, vsbi, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 7770ca4101..3f60b97654 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2185,3 +2185,30 @@ static DisasJumpType op_vscbi(DisasContext *s, DisasOps *o)
                get_field(s->fields, v3), &g[es]);
     return DISAS_NEXT;
 }
+
+static void gen_sbi2_i64(TCGv_i64 dl, TCGv_i64 dh, TCGv_i64 al, TCGv_i64 ah,
+                         TCGv_i64 bl, TCGv_i64 bh, TCGv_i64 cl, TCGv_i64 ch)
+{
+    TCGv_i64 tl = tcg_temp_new_i64();
+    TCGv_i64 th = tcg_const_i64(0);
+
+    /* extract the borrow only */
+    tcg_gen_extract_i64(tl, cl, 0, 1);
+    tcg_gen_sub2_i64(dl, dh, al, ah, bl, bh);
+    tcg_gen_sub2_i64(dl, dh, dl, dh, tl, th);
+    tcg_temp_free_i64(tl);
+    tcg_temp_free_i64(th);
+}
+
+static DisasJumpType op_vsbi(DisasContext *s, DisasOps *o)
+{
+    if (get_field(s->fields, m5) != ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_gvec128_4_i64(gen_sbi2_i64, get_field(s->fields, v1),
+                      get_field(s->fields, v2), get_field(s->fields, v3),
+                      get_field(s->fields, v4));
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 37/41] s390x/tcg: Implement VECTOR SUBTRACT WITH BORROW COMPUTE BORROW INDICATION
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Reuse s390_vec_sub() to perform two 128-bit subtraction, calculating the
borrow.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  1 +
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 13 +++++++++++++
 target/s390x/vec_int_helper.c   | 16 ++++++++++++++++
 4 files changed, 32 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 33e3e003f8..d040e4cd07 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -233,6 +233,7 @@ DEF_HELPER_FLAGS_4(gvec_vsrl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vscbi8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vscbi16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vscbi128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vsbcbi128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index a60d8531dc..a8d90517f6 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1180,6 +1180,8 @@
     F(0xe7f5, VSCBI,   VRR_c, V,   0, 0, 0, 0, vscbi, 0, IF_VEC)
 /* VECTOR SUBTRACT WITH BORROW INDICATION */
     F(0xe7bf, VSBI,    VRR_d, V,   0, 0, 0, 0, vsbi, 0, IF_VEC)
+/* VECTOR SUBTRACT WITH BORROW COMPUTE BORROW INDICATION */
+    F(0xe7bd, VSBCBI,  VRR_d, V,   0, 0, 0, 0, vsbcbi, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 3f60b97654..fd232ba6c3 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2212,3 +2212,16 @@ static DisasJumpType op_vsbi(DisasContext *s, DisasOps *o)
                       get_field(s->fields, v4));
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vsbcbi(DisasContext *s, DisasOps *o)
+{
+    if (get_field(s->fields, m5) != ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_gvec_4_ool(get_field(s->fields, v1), get_field(s->fields, v2),
+                   get_field(s->fields, v3), get_field(s->fields, v4), 0,
+                   gen_helper_gvec_vsbcbi128);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 699b399a26..95686e79fd 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -803,3 +803,19 @@ void HELPER(gvec_vscbi128)(void *v1, const void *v2, const void *v3,
     dst->doubleword[0] = 0;
     dst->doubleword[1] = s390_vec_sub(&tmp, v2, v3);
 }
+
+void HELPER(gvec_vsbcbi128)(void *v1, const void *v2, const void *v3,
+                            const void *v4, uint32_t desc)
+{
+    const S390Vector old_borrow = {
+        .doubleword[0] = 0,
+        .doubleword[1] = ((S390Vector *)v4)->doubleword[1] & 1,
+    };
+    S390Vector tmp, *dst = v1;
+    bool borrow;
+
+    borrow = s390_vec_sub(&tmp, v2, v3);
+    borrow |= s390_vec_sub(&tmp, &tmp, &old_borrow);
+    dst->doubleword[0] = 0;
+    dst->doubleword[1] = borrow;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 37/41] s390x/tcg: Implement VECTOR SUBTRACT WITH BORROW COMPUTE BORROW INDICATION
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Reuse s390_vec_sub() to perform two 128-bit subtraction, calculating the
borrow.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  1 +
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 13 +++++++++++++
 target/s390x/vec_int_helper.c   | 16 ++++++++++++++++
 4 files changed, 32 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 33e3e003f8..d040e4cd07 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -233,6 +233,7 @@ DEF_HELPER_FLAGS_4(gvec_vsrl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_vscbi8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vscbi16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vscbi128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vsbcbi128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index a60d8531dc..a8d90517f6 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1180,6 +1180,8 @@
     F(0xe7f5, VSCBI,   VRR_c, V,   0, 0, 0, 0, vscbi, 0, IF_VEC)
 /* VECTOR SUBTRACT WITH BORROW INDICATION */
     F(0xe7bf, VSBI,    VRR_d, V,   0, 0, 0, 0, vsbi, 0, IF_VEC)
+/* VECTOR SUBTRACT WITH BORROW COMPUTE BORROW INDICATION */
+    F(0xe7bd, VSBCBI,  VRR_d, V,   0, 0, 0, 0, vsbcbi, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 3f60b97654..fd232ba6c3 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2212,3 +2212,16 @@ static DisasJumpType op_vsbi(DisasContext *s, DisasOps *o)
                       get_field(s->fields, v4));
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vsbcbi(DisasContext *s, DisasOps *o)
+{
+    if (get_field(s->fields, m5) != ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_gvec_4_ool(get_field(s->fields, v1), get_field(s->fields, v2),
+                   get_field(s->fields, v3), get_field(s->fields, v4), 0,
+                   gen_helper_gvec_vsbcbi128);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 699b399a26..95686e79fd 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -803,3 +803,19 @@ void HELPER(gvec_vscbi128)(void *v1, const void *v2, const void *v3,
     dst->doubleword[0] = 0;
     dst->doubleword[1] = s390_vec_sub(&tmp, v2, v3);
 }
+
+void HELPER(gvec_vsbcbi128)(void *v1, const void *v2, const void *v3,
+                            const void *v4, uint32_t desc)
+{
+    const S390Vector old_borrow = {
+        .doubleword[0] = 0,
+        .doubleword[1] = ((S390Vector *)v4)->doubleword[1] & 1,
+    };
+    S390Vector tmp, *dst = v1;
+    bool borrow;
+
+    borrow = s390_vec_sub(&tmp, v2, v3);
+    borrow |= s390_vec_sub(&tmp, &tmp, &old_borrow);
+    dst->doubleword[0] = 0;
+    dst->doubleword[1] = borrow;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 38/41] s390x/tcg: Implement VECTOR SUM ACROSS DOUBLEWORD
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Perform the calculations without a helper. Only 16 bit or 32 bit values
have to be added.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 29 +++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index a8d90517f6..dd37003082 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1182,6 +1182,8 @@
     F(0xe7bf, VSBI,    VRR_d, V,   0, 0, 0, 0, vsbi, 0, IF_VEC)
 /* VECTOR SUBTRACT WITH BORROW COMPUTE BORROW INDICATION */
     F(0xe7bd, VSBCBI,  VRR_d, V,   0, 0, 0, 0, vsbcbi, 0, IF_VEC)
+/* VECTOR SUM ACROSS DOUBLEWORD */
+    F(0xe765, VSUMG,   VRR_c, V,   0, 0, 0, 0, vsumg, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index fd232ba6c3..2168a519fa 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2225,3 +2225,32 @@ static DisasJumpType op_vsbcbi(DisasContext *s, DisasOps *o)
                    gen_helper_gvec_vsbcbi128);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vsumg(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    TCGv_i64 sum, tmp;
+    uint8_t dst_idx;
+
+    if (es == ES_8 || es > ES_32) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    sum = tcg_temp_new_i64();
+    tmp = tcg_temp_new_i64();
+    for (dst_idx = 0; dst_idx < 2; dst_idx++) {
+        uint8_t idx = dst_idx * NUM_VEC_ELEMENTS(es) / 2;
+        const uint8_t max_idx = idx + NUM_VEC_ELEMENTS(es) / 2 - 1;
+
+        read_vec_element_i64(sum, get_field(s->fields, v3), max_idx, es);
+        for (; idx <= max_idx; idx++) {
+            read_vec_element_i64(tmp, get_field(s->fields, v2), idx, es);
+            tcg_gen_add_i64(sum, sum, tmp);
+        }
+        write_vec_element_i64(sum, get_field(s->fields, v1), dst_idx, ES_64);
+    }
+    tcg_temp_free_i64(sum);
+    tcg_temp_free_i64(tmp);
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 38/41] s390x/tcg: Implement VECTOR SUM ACROSS DOUBLEWORD
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Perform the calculations without a helper. Only 16 bit or 32 bit values
have to be added.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 29 +++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index a8d90517f6..dd37003082 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1182,6 +1182,8 @@
     F(0xe7bf, VSBI,    VRR_d, V,   0, 0, 0, 0, vsbi, 0, IF_VEC)
 /* VECTOR SUBTRACT WITH BORROW COMPUTE BORROW INDICATION */
     F(0xe7bd, VSBCBI,  VRR_d, V,   0, 0, 0, 0, vsbcbi, 0, IF_VEC)
+/* VECTOR SUM ACROSS DOUBLEWORD */
+    F(0xe765, VSUMG,   VRR_c, V,   0, 0, 0, 0, vsumg, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index fd232ba6c3..2168a519fa 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2225,3 +2225,32 @@ static DisasJumpType op_vsbcbi(DisasContext *s, DisasOps *o)
                    gen_helper_gvec_vsbcbi128);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vsumg(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    TCGv_i64 sum, tmp;
+    uint8_t dst_idx;
+
+    if (es == ES_8 || es > ES_32) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    sum = tcg_temp_new_i64();
+    tmp = tcg_temp_new_i64();
+    for (dst_idx = 0; dst_idx < 2; dst_idx++) {
+        uint8_t idx = dst_idx * NUM_VEC_ELEMENTS(es) / 2;
+        const uint8_t max_idx = idx + NUM_VEC_ELEMENTS(es) / 2 - 1;
+
+        read_vec_element_i64(sum, get_field(s->fields, v3), max_idx, es);
+        for (; idx <= max_idx; idx++) {
+            read_vec_element_i64(tmp, get_field(s->fields, v2), idx, es);
+            tcg_gen_add_i64(sum, sum, tmp);
+        }
+        write_vec_element_i64(sum, get_field(s->fields, v1), dst_idx, ES_64);
+    }
+    tcg_temp_free_i64(sum);
+    tcg_temp_free_i64(tmp);
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 39/41] s390x/tcg: Implement VECTOR SUM ACROSS QUADWORD
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Similar to VECTOR SUM ACROSS DOUBLEWORD, however without a loop and
using 128-bit calculations.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index dd37003082..2483ee01d7 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1184,6 +1184,8 @@
     F(0xe7bd, VSBCBI,  VRR_d, V,   0, 0, 0, 0, vsbcbi, 0, IF_VEC)
 /* VECTOR SUM ACROSS DOUBLEWORD */
     F(0xe765, VSUMG,   VRR_c, V,   0, 0, 0, 0, vsumg, 0, IF_VEC)
+/* VECTOR SUM ACROSS QUADWORD */
+    F(0xe767, VSUMQ,   VRR_c, V,   0, 0, 0, 0, vsumq, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 2168a519fa..995c2b4461 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2254,3 +2254,35 @@ static DisasJumpType op_vsumg(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(tmp);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vsumq(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    const uint8_t max_idx = NUM_VEC_ELEMENTS(es) - 1;
+    TCGv_i64 sumh, suml, zero, tmpl;
+    uint8_t idx;
+
+    if (es < ES_32 || es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    sumh = tcg_const_i64(0);
+    suml = tcg_temp_new_i64();
+    zero = tcg_const_i64(0);
+    tmpl = tcg_temp_new_i64();
+
+    read_vec_element_i64(suml, get_field(s->fields, v3), max_idx, es);
+    for (idx = 0; idx <= max_idx; idx++) {
+        read_vec_element_i64(tmpl, get_field(s->fields, v2), idx, es);
+        tcg_gen_add2_i64(suml, sumh, suml, sumh, tmpl, zero);
+    }
+    write_vec_element_i64(sumh, get_field(s->fields, v1), 0, ES_64);
+    write_vec_element_i64(suml, get_field(s->fields, v1), 1, ES_64);
+
+    tcg_temp_free_i64(sumh);
+    tcg_temp_free_i64(suml);
+    tcg_temp_free_i64(zero);
+    tcg_temp_free_i64(tmpl);
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 39/41] s390x/tcg: Implement VECTOR SUM ACROSS QUADWORD
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Similar to VECTOR SUM ACROSS DOUBLEWORD, however without a loop and
using 128-bit calculations.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index dd37003082..2483ee01d7 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1184,6 +1184,8 @@
     F(0xe7bd, VSBCBI,  VRR_d, V,   0, 0, 0, 0, vsbcbi, 0, IF_VEC)
 /* VECTOR SUM ACROSS DOUBLEWORD */
     F(0xe765, VSUMG,   VRR_c, V,   0, 0, 0, 0, vsumg, 0, IF_VEC)
+/* VECTOR SUM ACROSS QUADWORD */
+    F(0xe767, VSUMQ,   VRR_c, V,   0, 0, 0, 0, vsumq, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 2168a519fa..995c2b4461 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2254,3 +2254,35 @@ static DisasJumpType op_vsumg(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(tmp);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vsumq(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    const uint8_t max_idx = NUM_VEC_ELEMENTS(es) - 1;
+    TCGv_i64 sumh, suml, zero, tmpl;
+    uint8_t idx;
+
+    if (es < ES_32 || es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    sumh = tcg_const_i64(0);
+    suml = tcg_temp_new_i64();
+    zero = tcg_const_i64(0);
+    tmpl = tcg_temp_new_i64();
+
+    read_vec_element_i64(suml, get_field(s->fields, v3), max_idx, es);
+    for (idx = 0; idx <= max_idx; idx++) {
+        read_vec_element_i64(tmpl, get_field(s->fields, v2), idx, es);
+        tcg_gen_add2_i64(suml, sumh, suml, sumh, tmpl, zero);
+    }
+    write_vec_element_i64(sumh, get_field(s->fields, v1), 0, ES_64);
+    write_vec_element_i64(suml, get_field(s->fields, v1), 1, ES_64);
+
+    tcg_temp_free_i64(sumh);
+    tcg_temp_free_i64(suml);
+    tcg_temp_free_i64(zero);
+    tcg_temp_free_i64(tmpl);
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 40/41] s390x/tcg: Implement VECTOR SUM ACROSS WORD
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Similar to VECTOR SUM ACROSS DOUBLEWORD.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 29 +++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 2483ee01d7..a52db41388 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1186,6 +1186,8 @@
     F(0xe765, VSUMG,   VRR_c, V,   0, 0, 0, 0, vsumg, 0, IF_VEC)
 /* VECTOR SUM ACROSS QUADWORD */
     F(0xe767, VSUMQ,   VRR_c, V,   0, 0, 0, 0, vsumq, 0, IF_VEC)
+/* VECTOR SUM ACROSS WORD */
+    F(0xe764, VSUM,    VRR_c, V,   0, 0, 0, 0, vsum, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 995c2b4461..59a9885892 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2286,3 +2286,32 @@ static DisasJumpType op_vsumq(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(tmpl);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vsum(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    TCGv_i32 sum, tmp;
+    uint8_t dst_idx;
+
+    if (es > ES_16) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    sum = tcg_temp_new_i32();
+    tmp = tcg_temp_new_i32();
+    for (dst_idx = 0; dst_idx < 4; dst_idx++) {
+        uint8_t idx = dst_idx * NUM_VEC_ELEMENTS(es) / 4;
+        const uint8_t max_idx = idx + NUM_VEC_ELEMENTS(es) / 4 - 1;
+
+        read_vec_element_i32(sum, get_field(s->fields, v3), max_idx, es);
+        for (; idx <= max_idx; idx++) {
+            read_vec_element_i32(tmp, get_field(s->fields, v2), idx, es);
+            tcg_gen_add_i32(sum, sum, tmp);
+        }
+        write_vec_element_i32(sum, get_field(s->fields, v1), dst_idx, ES_32);
+    }
+    tcg_temp_free_i32(sum);
+    tcg_temp_free_i32(tmp);
+    return DISAS_NEXT;
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 40/41] s390x/tcg: Implement VECTOR SUM ACROSS WORD
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Similar to VECTOR SUM ACROSS DOUBLEWORD.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 29 +++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 2483ee01d7..a52db41388 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1186,6 +1186,8 @@
     F(0xe765, VSUMG,   VRR_c, V,   0, 0, 0, 0, vsumg, 0, IF_VEC)
 /* VECTOR SUM ACROSS QUADWORD */
     F(0xe767, VSUMQ,   VRR_c, V,   0, 0, 0, 0, vsumq, 0, IF_VEC)
+/* VECTOR SUM ACROSS WORD */
+    F(0xe764, VSUM,    VRR_c, V,   0, 0, 0, 0, vsum, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 995c2b4461..59a9885892 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -2286,3 +2286,32 @@ static DisasJumpType op_vsumq(DisasContext *s, DisasOps *o)
     tcg_temp_free_i64(tmpl);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vsum(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s->fields, m4);
+    TCGv_i32 sum, tmp;
+    uint8_t dst_idx;
+
+    if (es > ES_16) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    sum = tcg_temp_new_i32();
+    tmp = tcg_temp_new_i32();
+    for (dst_idx = 0; dst_idx < 4; dst_idx++) {
+        uint8_t idx = dst_idx * NUM_VEC_ELEMENTS(es) / 4;
+        const uint8_t max_idx = idx + NUM_VEC_ELEMENTS(es) / 4 - 1;
+
+        read_vec_element_i32(sum, get_field(s->fields, v3), max_idx, es);
+        for (; idx <= max_idx; idx++) {
+            read_vec_element_i32(tmp, get_field(s->fields, v2), idx, es);
+            tcg_gen_add_i32(sum, sum, tmp);
+        }
+        write_vec_element_i32(sum, get_field(s->fields, v1), dst_idx, ES_32);
+    }
+    tcg_temp_free_i32(sum);
+    tcg_temp_free_i32(tmp);
+    return DISAS_NEXT;
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 41/41] s390x/tcg: Implement VECTOR TEST UNDER MASK
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Thomas Huth, Cornelia Huck, Richard Henderson,
	David Hildenbrand

Let's return the cc value directly via cpu_env. Unfortunately there
isn't a simple way to calculate the value lazily - one would have to
calculate and store e.g. the population count of the mask and the
result so it can be evaluated in a cc helper.

But as VTM only sets the cc, we can assume the value will be needed soon
either way.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  1 +
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 11 +++++++++++
 target/s390x/vec_int_helper.c   | 18 ++++++++++++++++++
 4 files changed, 32 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index d040e4cd07..200d1730f4 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -234,6 +234,7 @@ DEF_HELPER_FLAGS_4(gvec_vscbi8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vscbi16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vscbi128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_5(gvec_vsbcbi128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_4(gvec_vtm, void, ptr, cptr, env, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index a52db41388..e61475bdc4 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1188,6 +1188,8 @@
     F(0xe767, VSUMQ,   VRR_c, V,   0, 0, 0, 0, vsumq, 0, IF_VEC)
 /* VECTOR SUM ACROSS WORD */
     F(0xe764, VSUM,    VRR_c, V,   0, 0, 0, 0, vsum, 0, IF_VEC)
+/* VECTOR TEST UNDER MASK */
+    F(0xe7d8, VTM,     VRR_a, V,   0, 0, 0, 0, vtm, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 59a9885892..eacf68ec28 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -191,6 +191,9 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
 #define gen_gvec_2i_ool(v1, v2, c, data, fn) \
     tcg_gen_gvec_2i_ool(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                         c, 16, 16, data, fn)
+#define gen_gvec_2_ptr(v1, v2, ptr, data, fn) \
+    tcg_gen_gvec_2_ptr(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                       ptr, 16, 16, data, fn)
 #define gen_gvec_3(v1, v2, v3, gen) \
     tcg_gen_gvec_3(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                    vec_full_reg_offset(v3), 16, 16, gen)
@@ -2315,3 +2318,11 @@ static DisasJumpType op_vsum(DisasContext *s, DisasOps *o)
     tcg_temp_free_i32(tmp);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vtm(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_2_ptr(get_field(s->fields, v1), get_field(s->fields, v2),
+                   cpu_env, 0, gen_helper_gvec_vtm);
+    set_cc_static(s);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 95686e79fd..f5bdaff633 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -819,3 +819,21 @@ void HELPER(gvec_vsbcbi128)(void *v1, const void *v2, const void *v3,
     dst->doubleword[0] = 0;
     dst->doubleword[1] = borrow;
 }
+
+void HELPER(gvec_vtm)(void *v1, const void *v2, CPUS390XState *env,
+                      uint32_t desc)
+{
+    S390Vector tmp;
+
+    s390_vec_and(&tmp, v1, v2);
+    if (s390_vec_is_zero(&tmp)) {
+        /* Selected bits all zeros; or all mask bits zero */
+        env->cc_op = 0;
+    } else if (s390_vec_equal(&tmp, v2)) {
+        /* Selected bits all ones */
+        env->cc_op = 3;
+    } else {
+        /* Selected bits a mix of zeros and ones */
+        env->cc_op = 1;
+    }
+}
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 152+ messages in thread

* [Qemu-devel] [PATCH v1 41/41] s390x/tcg: Implement VECTOR TEST UNDER MASK
@ 2019-04-11 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-11 10:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, David Hildenbrand, Thomas Huth,
	Richard Henderson

Let's return the cc value directly via cpu_env. Unfortunately there
isn't a simple way to calculate the value lazily - one would have to
calculate and store e.g. the population count of the mask and the
result so it can be evaluated in a cc helper.

But as VTM only sets the cc, we can assume the value will be needed soon
either way.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  1 +
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.inc.c | 11 +++++++++++
 target/s390x/vec_int_helper.c   | 18 ++++++++++++++++++
 4 files changed, 32 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index d040e4cd07..200d1730f4 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -234,6 +234,7 @@ DEF_HELPER_FLAGS_4(gvec_vscbi8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vscbi16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vscbi128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_5(gvec_vsbcbi128, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, cptr, i32)
+DEF_HELPER_4(gvec_vtm, void, ptr, cptr, env, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index a52db41388..e61475bdc4 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1188,6 +1188,8 @@
     F(0xe767, VSUMQ,   VRR_c, V,   0, 0, 0, 0, vsumq, 0, IF_VEC)
 /* VECTOR SUM ACROSS WORD */
     F(0xe764, VSUM,    VRR_c, V,   0, 0, 0, 0, vsum, 0, IF_VEC)
+/* VECTOR TEST UNDER MASK */
+    F(0xe7d8, VTM,     VRR_a, V,   0, 0, 0, 0, vtm, 0, IF_VEC)
 
 #ifndef CONFIG_USER_ONLY
 /* COMPARE AND SWAP AND PURGE */
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 59a9885892..eacf68ec28 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -191,6 +191,9 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t reg, TCGv_i64 enr,
 #define gen_gvec_2i_ool(v1, v2, c, data, fn) \
     tcg_gen_gvec_2i_ool(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                         c, 16, 16, data, fn)
+#define gen_gvec_2_ptr(v1, v2, ptr, data, fn) \
+    tcg_gen_gvec_2_ptr(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
+                       ptr, 16, 16, data, fn)
 #define gen_gvec_3(v1, v2, v3, gen) \
     tcg_gen_gvec_3(vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
                    vec_full_reg_offset(v3), 16, 16, gen)
@@ -2315,3 +2318,11 @@ static DisasJumpType op_vsum(DisasContext *s, DisasOps *o)
     tcg_temp_free_i32(tmp);
     return DISAS_NEXT;
 }
+
+static DisasJumpType op_vtm(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_2_ptr(get_field(s->fields, v1), get_field(s->fields, v2),
+                   cpu_env, 0, gen_helper_gvec_vtm);
+    set_cc_static(s);
+    return DISAS_NEXT;
+}
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 95686e79fd..f5bdaff633 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -819,3 +819,21 @@ void HELPER(gvec_vsbcbi128)(void *v1, const void *v2, const void *v3,
     dst->doubleword[0] = 0;
     dst->doubleword[1] = borrow;
 }
+
+void HELPER(gvec_vtm)(void *v1, const void *v2, CPUS390XState *env,
+                      uint32_t desc)
+{
+    S390Vector tmp;
+
+    s390_vec_and(&tmp, v1, v2);
+    if (s390_vec_is_zero(&tmp)) {
+        /* Selected bits all zeros; or all mask bits zero */
+        env->cc_op = 0;
+    } else if (s390_vec_equal(&tmp, v2)) {
+        /* Selected bits all ones */
+        env->cc_op = 3;
+    } else {
+        /* Selected bits a mix of zeros and ones */
+        env->cc_op = 1;
+    }
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 02/41] s390x/tcg: Implement VECTOR ADD
@ 2019-04-12 18:28     ` Richard Henderson
  0 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 18:28 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Thomas Huth, Cornelia Huck

On 4/11/19 12:07 AM, David Hildenbrand wrote:
> Introduce two types of fancy new helpers that will be reused a couple of
> times
> 
> 1. gen_gvec_fn_3: Call an existing tcg_gen_gvec_X function with 3
>    parameters, simplifying parameter passing
> 2. gen_gvec128_3_i64: Call a function that performs 128 bit calculations
>    using two 64 bit values per vector.
> 
> Luckily, for VECTOR ADD we already have everything we need.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  5 ++++
>  target/s390x/translate_vx.inc.c | 52 +++++++++++++++++++++++++++++++++
>  2 files changed, 57 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 02/41] s390x/tcg: Implement VECTOR ADD
@ 2019-04-12 18:28     ` Richard Henderson
  0 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 18:28 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 4/11/19 12:07 AM, David Hildenbrand wrote:
> Introduce two types of fancy new helpers that will be reused a couple of
> times
> 
> 1. gen_gvec_fn_3: Call an existing tcg_gen_gvec_X function with 3
>    parameters, simplifying parameter passing
> 2. gen_gvec128_3_i64: Call a function that performs 128 bit calculations
>    using two 64 bit values per vector.
> 
> Luckily, for VECTOR ADD we already have everything we need.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  5 ++++
>  target/s390x/translate_vx.inc.c | 52 +++++++++++++++++++++++++++++++++
>  2 files changed, 57 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 03/41] s390x/tcg: Implement VECTOR ADD COMPUTE CARRY
@ 2019-04-12 21:05     ` Richard Henderson
  0 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 21:05 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Thomas Huth, Cornelia Huck

On 4/11/19 12:07 AM, David Hildenbrand wrote:
> +    static const GVecGen3 g[5] = {
> +        { .fni8 = gen_acc8_i64, },
> +        { .fni8 = gen_acc16_i64, },
> +        { .fni8 = gen_acc32_i64, },
> +        { .fni8 = gen_acc_i64, },
> +        { .fno = gen_helper_gvec_vacc128, },
> +    };

Vector versions of the first four are fairly simple too.

static void gen_acc_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
{
    tcgv_vec t = tcg_temp_new_vec_matching(d);

    tcg_gen_add_vec(vece, t, a, b);
    tcg_gen_cmp_vec(TCG_COND_LTU, vece, d, r, a);  /* produces -1 for carry */
    tcg_gen_neg_vec(vece, d, d);                 /* convert to +1 for carry */
}

  { .fni8 = gen_acc8_i64,
    .fniv = gen_acc_vec,
    .opc = INDEX_op_cmp_vec,
    .vece = MO_8 },
  ...


I'm surprised that you're expanding the 128-bit addition out-of-line.
One possible expansion is

  tcg_gen_add2_i64(tl, th, al, zero, bl, zero);
  tcg_gen_add2_i64(tl, th, th, zero, ah, zero);
  tcg_gen_add2_i64(tl, th, tl, th, bl, zero);
  /* carry out in th */


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 03/41] s390x/tcg: Implement VECTOR ADD COMPUTE CARRY
@ 2019-04-12 21:05     ` Richard Henderson
  0 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 21:05 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 4/11/19 12:07 AM, David Hildenbrand wrote:
> +    static const GVecGen3 g[5] = {
> +        { .fni8 = gen_acc8_i64, },
> +        { .fni8 = gen_acc16_i64, },
> +        { .fni8 = gen_acc32_i64, },
> +        { .fni8 = gen_acc_i64, },
> +        { .fno = gen_helper_gvec_vacc128, },
> +    };

Vector versions of the first four are fairly simple too.

static void gen_acc_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
{
    tcgv_vec t = tcg_temp_new_vec_matching(d);

    tcg_gen_add_vec(vece, t, a, b);
    tcg_gen_cmp_vec(TCG_COND_LTU, vece, d, r, a);  /* produces -1 for carry */
    tcg_gen_neg_vec(vece, d, d);                 /* convert to +1 for carry */
}

  { .fni8 = gen_acc8_i64,
    .fniv = gen_acc_vec,
    .opc = INDEX_op_cmp_vec,
    .vece = MO_8 },
  ...


I'm surprised that you're expanding the 128-bit addition out-of-line.
One possible expansion is

  tcg_gen_add2_i64(tl, th, al, zero, bl, zero);
  tcg_gen_add2_i64(tl, th, th, zero, ah, zero);
  tcg_gen_add2_i64(tl, th, tl, th, bl, zero);
  /* carry out in th */


r~


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 04/41] s390x/tcg: Implement VECTOR ADD WITH CARRY
  2019-04-11 10:07   ` David Hildenbrand
  (?)
@ 2019-04-12 21:36   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 21:36 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:07 AM, David Hildenbrand wrote:
> Only slightly ugly, perform two additions. At least it is only supported
> for 128 bit elements.
> 
> Introduce gen_gvec128_4_i64() similar to gen_gvec128_3_i64().
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 63 +++++++++++++++++++++++++++++++++
>  2 files changed, 65 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 05/41] s390x/tcg: Implement VECTOR ADD WITH CARRY COMPUTE CARRY
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-12 21:58   ` Richard Henderson
  2019-04-16  8:40     ` David Hildenbrand
  -1 siblings, 1 reply; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 21:58 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> +static DisasJumpType op_vaccc(DisasContext *s, DisasOps *o)
> +{
> +    if (get_field(s->fields, m5) != ES_128) {
> +        gen_program_exception(s, PGM_SPECIFICATION);
> +        return DISAS_NORETURN;
> +    }
> +
> +    gen_gvec_4_ool(get_field(s->fields, v1), get_field(s->fields, v2),
> +                   get_field(s->fields, v3), get_field(s->fields, v4), 0,
> +                   gen_helper_gvec_vaccc128);
> +    return DISAS_NEXT;
> +}

An inline expansion could be

One possible expansion is

  tcg_gen_andi_i64(tl, cl, 1);
  tcg_gen_add2_i64(tl, th, tl, zero, al, zero);
  tcg_gen_add2_i64(tl, th, tl, th, bl, zero);
  tcg_gen_add2_i64(tl, th, th, zero, ah, zero);
  tcg_gen_add2_i64(tl, th, tl, th, bl, zero);
  /* carry out in th */

This is 8 insns for the addition vs the hw optimal 6, but we're not exactly an
optimizing compiler either.  ;-)


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 06/41] s390x/tcg: Implement VECTOR AND (WITH COMPLEMENT)
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-12 21:59   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 21:59 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Easy, as we can reuse existing gvec helpers.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  4 ++++
>  target/s390x/translate_vx.inc.c | 14 ++++++++++++++
>  2 files changed, 18 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 07/41] s390x/tcg: Implement VECTOR AVERAGE
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-12 22:34   ` Richard Henderson
  2019-04-16  8:52     ` David Hildenbrand
  -1 siblings, 1 reply; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 22:34 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> +}
> +static DisasJumpType op_vavg(DisasContext *s, DisasOps *o)
> +{

Watch your spacing.


> +    static const GVecGen3 g[4] = {
> +        { .fno = gen_helper_gvec_vavg8, },
> +        { .fno = gen_helper_gvec_vavg16, },
> +        { .fni4 = gen_avg_i32, },
> +        { .fni8 = gen_avg_i64, },
> +    };

Pondering possible vector expansions.  I think one possibility is

  t1 = (a >> 1) + (b >> 1);

We still have the two "0.5 bits" to add back in, plus we round up by adding
another 0.5.  This means if either lsb is set, then we have carry in to the 1's
bit.  So:

  t1 = t1 + ((a | b) & 1);

Which leads to

  tcg_gen_sari_vec(vece, t0, a, 1);
  tcg_gen_sari_vec(vece, t1, b, 1);
  tcg_gen_or_vec(vece, t2, a, b);
  tcg_gen_add_vec(vece, t0, t0, t1);
  tcg_gen_dupi_vec(vece, t1, 1);
  tcg_gen_and_vec(vece, t2, t2, t1);
  tcg_gen_add_vec(vece, t0, t0, t2);

  { .fnv = gen_avg_vec,
    .fno = gen_helper_gvec_vavg8,
    .opc = INDEX_op_sari_vec },

But what you have here is correct and the above is mere optimization so,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 08/41] s390x/tcg: Implement VECTOR AVERAGE LOGICAL
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-12 22:35   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 22:35 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Similar to VECTOR AVERAGE but without sign extension.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/helper.h           |  2 ++
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 48 +++++++++++++++++++++++++++++++++
>  target/s390x/vec_int_helper.c   | 16 +++++++++++
>  4 files changed, 68 insertions(+)

The vector expansion for vavg applies here too, except with shri instead of
sari.  And of course you code is correct as-is.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 09/41] s390x/tcg: Implement VECTOR CHECKSUM
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-12 23:01   ` Richard Henderson
  2019-04-16  8:58     ` David Hildenbrand
  -1 siblings, 1 reply; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 23:01 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> +    read_vec_element_i32(sum, get_field(s->fields, v3), 1, ES_32);
> +    for (i = 0; i < 4; i++) {
> +        read_vec_element_i32(tmp, get_field(s->fields, v2), i, ES_32);
> +        tcg_gen_add_i32(sum, sum, tmp);
> +        tcg_gen_setcond_i32(TCG_COND_LTU, tmp, sum, tmp);
> +        tcg_gen_add_i32(sum, sum, tmp);
> +    }
> +    zero_vec(get_field(s->fields, v1));
> +    write_vec_element_i32(sum, get_field(s->fields, v1), 1, ES_32);

It seems like it should be possible to implement this with i64, and fold the
carry around at the end -- 2 insns instead of 12 for managing carry.  But I
can't quite tell if that produces the same results.

You could use

  tcg_gen_add2_i32(sum, tmp, sum, zero, tmp, zero);
  tcg_gen_add_i32(sum, sum, tmp);

instead of computing carry manually with setcond.

That said, your code exactly matches the language in the manual, so

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 10/41] s390x/tcg: Implement VECTOR ELEMENT COMPARE *
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-12 23:14   ` Richard Henderson
  2019-04-16  9:05     ` David Hildenbrand
  -1 siblings, 1 reply; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 23:14 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> +                         es | logical ? 0 : MO_SIGN);

Incorrect operator precedence.  You need:

  es | (logical ? 0 : MO_SIGN)

or

  logical ? es : es | MO_SIGN

And perhaps cse this expression into a temporary
and not replicate it between the two reads.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 11/41] s390x/tcg: Implement VECTOR COMPARE *
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-12 23:17   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 23:17 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> To carry out the comparison, we can reuse the existing gvec comparison
> function. In case the CC is to be computed, save the result vector
> and compute the CC lazily. The result is a vector consisting of all 1's
> for elements that matched and 0's for elements that didn't match.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/cc_helper.c        | 17 +++++++++++++++++
>  target/s390x/helper.c           |  1 +
>  target/s390x/insn-data.def      |  6 ++++++
>  target/s390x/internal.h         |  1 +
>  target/s390x/translate.c        |  1 +
>  target/s390x/translate_vx.inc.c | 28 ++++++++++++++++++++++++++++
>  6 files changed, 54 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 12/41] s390x/tcg: Implement VECTOR COUNT LEADING ZEROS
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-12 23:21   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 23:21 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> For 8/16, use the 32 bit variant and properly subtract the added
> leading zero bits.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/helper.h           |  2 ++
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 31 +++++++++++++++++++++++++++++++
>  target/s390x/vec_int_helper.c   | 14 ++++++++++++++
>  4 files changed, 49 insertions(+)
> 
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 13/41] s390x/tcg: Implement VECTOR COUNT TRAILING ZEROS
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-12 23:23   ` Richard Henderson
  2019-04-16  9:07     ` David Hildenbrand
  -1 siblings, 1 reply; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 23:23 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> +        s390_vec_write_element##BITS(v1, i, ctz32(a));                         

Wrong result for a == 0.  You need a ? ctz32(a) : BITS.


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 14/41] s390x/tcg: Implement VECTOR EXCLUSIVE OR
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-12 23:23   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 23:23 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Easy, we can reuse an existing gvec helper.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      | 2 ++
>  target/s390x/translate_vx.inc.c | 7 +++++++
>  2 files changed, 9 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 15/41] s390x/tcg: Implement VECTOR GALOIS FIELD MULTIPLY SUM (AND ACCUMULATE)
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-12 23:44   ` Richard Henderson
  2019-04-16  9:10     ` David Hildenbrand
  -1 siblings, 1 reply; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 23:44 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> A galois field multiplication in field 2 is like binary multiplication,
> however instead of doing ordinary binary additions, xor's are performed.
> So no carries are considered.
> 
> Implement all variants via helpers. s390_vec_sar() and s390_vec_shr()
> will be reused later on.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/helper.h           |   8 ++
>  target/s390x/insn-data.def      |   4 +
>  target/s390x/translate_vx.inc.c |  38 ++++++++
>  target/s390x/vec_int_helper.c   | 168 ++++++++++++++++++++++++++++++++
>  4 files changed, 218 insertions(+)

FYI, this is now the 4th copy of this operation.

  arm: pmull
  x86: pclmulqdq
  ppc: vpmsum[bhwd]

We really should promote this to generic.  But that can come later,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 16/41] s390x/tcg: Implement VECTOR LOAD COMPLEMENT
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-12 23:47   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 23:47 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> We can reuse an existing gvec helper for negating the values.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 17 +++++++++++++++++
>  2 files changed, 19 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 17/41] s390x/tcg: Implement VECTOR LOAD POSITIVE
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-12 23:50   ` Richard Henderson
  2019-04-16  9:16     ` David Hildenbrand
  -1 siblings, 1 reply; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 23:50 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Similar to VECTOR LOAD COMPLEMENT but unfortunately we don't have a
> gvec helper.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/helper.h           |  2 ++
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 40 +++++++++++++++++++++++++++++++++
>  target/s390x/vec_int_helper.c   | 14 ++++++++++++
>  4 files changed, 58 insertions(+)

I would be happy to add ABS as a gvec primitive, if you like.
Expandable with MAX or CMP on hosts without ABS.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 18/41] s390x/tcg: Implement VECTOR (MAXIMUM|MINIMUM) (LOGICAL)
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-12 23:51   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-12 23:51 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Luckily, we already have gvec helpers for all four cases.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  8 ++++++++
>  target/s390x/translate_vx.inc.c | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 39 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 19/41] s390x/tcg: Implement VECTOR MULTIPLY AND ADD *
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  0:01   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  0:01 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Quite some variants to handle. At least handle some 32-bit element
> variants via gvec expansion (we could also handle 16/32-bit variants
> for ODD and EVEN easily via gvec expansion, but let's keep it simple
> for now).
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/helper.h           |  18 +++++
>  target/s390x/insn-data.def      |  14 ++++
>  target/s390x/translate_vx.inc.c | 122 +++++++++++++++++++++++++++++++
>  target/s390x/vec_int_helper.c   | 123 ++++++++++++++++++++++++++++++++
>  4 files changed, 277 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 20/41] s390x/tcg: Implement VECTOR MULTIPLY *
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  0:04   ` Richard Henderson
  2019-04-16  9:23     ` David Hildenbrand
  -1 siblings, 1 reply; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  0:04 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> +static void gen_mh_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
> +{
> +    TCGv_i64 t0 = tcg_temp_new_i64();
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +
> +    tcg_gen_ext_i32_i64(t0, a);
> +    tcg_gen_ext_i32_i64(t1, b);
> +    tcg_gen_mul_i64(t0, t0, t1);
> +    tcg_gen_extrh_i64_i32(d, t0);

This is

  tcg_gen_muls2_i32(tmp, d, a, b);

Similarly for mlh w/ mulu2.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 21/41] s390x/tcg: Implement VECTOR NAND
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  0:05   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  0:05 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Part of vector enhancements facility 1, but easy to implement.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      | 2 ++
>  target/s390x/translate.c        | 1 +
>  target/s390x/translate_vx.inc.c | 7 +++++++
>  3 files changed, 10 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 22/41] s390x/tcg: Implement VECTOR NOR
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  0:05   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  0:05 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      | 2 ++
>  target/s390x/translate_vx.inc.c | 7 +++++++
>  2 files changed, 9 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 23/41] s390x/tcg: Implement VECTOR NOT EXCLUSIVE OR
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  0:06   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  0:06 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Again, part of vector enhancement facility 1. The operation corresponds
> to an bitwise equality check.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      | 2 ++
>  target/s390x/translate_vx.inc.c | 7 +++++++
>  2 files changed, 9 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 24/41] s390x/tcg: Implement VECTOR OR
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  0:06   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  0:06 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Reuse a gvec helper.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      | 2 ++
>  target/s390x/translate_vx.inc.c | 7 +++++++
>  2 files changed, 9 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 25/41] s390x/tcg: Implement VECTOR OR WITH COMPLEMENT
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  0:07   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  0:07 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Again, vector enhancements facility 1 material.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      | 2 ++
>  target/s390x/translate_vx.inc.c | 7 +++++++
>  2 files changed, 9 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 26/41] s390x/tcg: Implement VECTOR POPULATION COUNT
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  0:08   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  0:08 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Similar to VECTOR COUNT TRAILING ZEROES.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/helper.h           |  2 ++
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 19 +++++++++++++++++++
>  target/s390x/vec_int_helper.c   | 14 ++++++++++++++
>  4 files changed, 37 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 27/41] s390x/tcg: Implement VECTOR ELEMENT ROTATE LEFT LOGICAL
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  0:15   ` Richard Henderson
  2019-04-16  9:27     ` David Hildenbrand
  -1 siblings, 1 reply; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  0:15 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> +#define DEF_ROTL(BITS)                                                         \
> +static uint##BITS##_t rotl##BITS(uint##BITS##_t a, uint8_t count)            \
> +{                                                                            \
> +    count &= BITS - 1;                                                       \
> +    return (a << count) | (a >> (BITS - count));                             \
> +}
> +DEF_ROTL(8)
> +DEF_ROTL(16)

We already have rol8 and rol16 in <qemu/bitops.h> for this.
Otherwise,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

This does point out that I should go ahead and fill out the
variable shift and rotate patterns in gvec...


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 28/41] s390x/tcg: Implement VECTOR ELEMENT ROTATE AND INSERT UNDER MASK
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  0:29   ` Richard Henderson
  2019-04-16  9:35     ` David Hildenbrand
  -1 siblings, 1 reply; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  0:29 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> +static void gen_rim_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, int32_t c)
> +{
> +    TCGv_i32 t0 = tcg_temp_new_i32();
> +    TCGv_i32 t1 = tcg_temp_new_i32();
> +
> +    tcg_gen_andc_i32(t0, a, b);
> +    tcg_gen_rotli_i32(t1, a, c & 31);
> +    tcg_gen_and_i32(t1, t1, b);
> +    tcg_gen_or_i32(d, t0, t1);

The ANDC and ROTL look to be in the wrong order.

"For each bit in the third operand (b) that is one,
the corresponding bit *of the rotated elements* in
the second operand replaces the corresponding bit in
the first operand".

I think you need

    tcg_gen_rotli_i32(a, a, c & 31);
    tcg_gen_and_i32(a, a, b);
    tcg_gen_andc_i32(d, d, b);
    tcg_gen_or_i32(d, d, a);

with

  { .fni4 = gen_rim_32, .load_dest = true },

> +     const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
> +     const uint##BITS##_t mask = s390_vec_read_element##BITS(v3, i);        \
> +     const uint##BITS##_t d = (a & ~mask) | (rotl##BITS(a, count) & mask);  \

Again, this seems to be missing the insert into "the first operand", i.e.
loading from v1 as well.


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 29/41] s390x/tcg: Implement VECTOR ELEMENT SHIFT
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  0:31   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  0:31 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Only for one special case we can reuse real gvec helpers. Mostly
> rely on oom helpers.
> 
> One important thing to take care of is always to properly mask of
> unused bits from the shift count.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/helper.h           |  18 +++++
>  target/s390x/insn-data.def      |   9 +++
>  target/s390x/translate_vx.inc.c | 113 ++++++++++++++++++++++++++++++++
>  target/s390x/vec_int_helper.c   |  99 ++++++++++++++++++++++++++++
>  4 files changed, 239 insertions(+)

Again, I should fill out these in gvec...

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 30/41] s390x/tcg: Implement VECTOR SHIFT LEFT (BY BYTE)
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  0:36   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  0:36 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> We can reuse the existing 128-bit shift utility function.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/helper.h           |  1 +
>  target/s390x/insn-data.def      |  4 ++++
>  target/s390x/translate_vx.inc.c | 20 ++++++++++++++++++++
>  target/s390x/vec_int_helper.c   |  6 ++++++
>  4 files changed, 31 insertions(+)


Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 31/41] s390x/tcg: Implement VECTOR SHIFT LEFT DOUBLE BY BYTE
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  0:54   ` Richard Henderson
  2019-04-16  9:45     ` David Hildenbrand
  -1 siblings, 1 reply; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  0:54 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> +static DisasJumpType op_vsldb(DisasContext *s, DisasOps *o)
> +{
> +    int src_idx = get_field(s->fields, i4) & 0xf;
> +
> +    if (src_idx == 0) {
> +        gen_gvec_mov(get_field(s->fields, v1), get_field(s->fields, v2));
> +    } else {
> +        gen_gvec_3_ool(get_field(s->fields, v1), get_field(s->fields, v2),
> +                       get_field(s->fields, v3), src_idx,
> +                       gen_helper_gvec_vsldb);
> +        return DISAS_NEXT;

You could also expand this inline using your new extract2 primitive.

  int i4 = get_field(s->fields, i4);
  int left_shift, right_shift;

  left_shift = (i4 & 7) * 8;
  right_shift = 64 - left_shift;

  if ((i4 & 8) == 0) {
      read_vec_element_i64(t0, get_field(s->fields, v2), 0, ES_64);
      read_vec_element_i64(t1, get_field(s->fields, v2), 1, ES_64);
      read_vec_element_i64(t2, get_field(s->fields, v3), 0, ES_64);
  } else {
      read_vec_element_i64(t0, get_field(s->fields, v2), 1, ES_64);
      read_vec_element_i64(t1, get_field(s->fields, v3), 0, ES_64);
      read_vec_element_i64(t2, get_field(s->fields, v3), 1, ES_64);
  }
  tcg_gen_extract2_i64(t0, t1, t0, right_shift);
  tcg_gen_extract2_i64(t1, t2, t1, right_shift);
  write_vec_element_i64(t0, get_field(s->fields, v1), 0, ES_64);
  write_vec_element_i64(t1, get_field(s->fields, v1), 1, ES_64);


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 32/41] s390x/tcg: Implement VECTOR SHIFT RIGHT ARITHMETIC
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  5:48   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  5:48 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Similar to VECTOR SHIFT LEFT ARITHMETIC. Add s390_vec_sar() similar to
> s390_vec_shr().
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/helper.h           |  1 +
>  target/s390x/insn-data.def      |  4 ++++
>  target/s390x/translate_vx.inc.c | 17 +++++++++++++++++
>  target/s390x/vec_int_helper.c   | 26 ++++++++++++++++++++++++++
>  4 files changed, 48 insertions(+)


Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 33/41] s390x/tcg: Implement VECTOR SHIFT RIGHT LOGICAL *
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  5:48   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  5:48 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Similar to VECTOR SHIFT RIGHT ARITHMETICAL.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/helper.h           |  1 +
>  target/s390x/insn-data.def      |  4 ++++
>  target/s390x/translate_vx.inc.c | 17 +++++++++++++++++
>  target/s390x/vec_int_helper.c   |  6 ++++++
>  4 files changed, 28 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 34/41] s390x/tcg: Implement VECTOR SUBTRACT
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  5:49   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  5:49 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> We can use tcg_gen_sub2_i64() to do 128-bit subtraction and otherwise
> existing gvec helpers.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 17 +++++++++++++++++
>  2 files changed, 19 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 35/41] s390x/tcg: Implement VECTOR SUBTRACT COMPUTE BORROW INDICATION
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  5:51   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  5:51 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Let's keep it simple for now and handle 8/16/128 bit elements via helpers.
> Especially for 8/16, we could come up with some bit tricks.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/helper.h           |  3 +++
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 30 +++++++++++++++++++++
>  target/s390x/vec_int_helper.c   | 47 +++++++++++++++++++++++++++++++++
>  4 files changed, 82 insertions(+)

Similar comments to add carry, really,
but what you have is not wrong.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 36/41] s390x/tcg: Implement VECTOR SUBTRACT WITH BORROW INDICATION
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  5:52   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  5:52 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Fairly easy as only 128-bit handling is required. Simply perform the
> subtraction and then subtract the borrow.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 27 +++++++++++++++++++++++++++
>  2 files changed, 29 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 37/41] s390x/tcg: Implement VECTOR SUBTRACT WITH BORROW COMPUTE BORROW INDICATION
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  6:11   ` Richard Henderson
  2019-04-16 18:26     ` David Hildenbrand
  -1 siblings, 1 reply; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  6:11 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> +static DisasJumpType op_vsbcbi(DisasContext *s, DisasOps *o)
> +{
> +    if (get_field(s->fields, m5) != ES_128) {
> +        gen_program_exception(s, PGM_SPECIFICATION);
> +        return DISAS_NORETURN;
> +    }
> +
> +    gen_gvec_4_ool(get_field(s->fields, v1), get_field(s->fields, v2),
> +                   get_field(s->fields, v3), get_field(s->fields, v4), 0,
> +                   gen_helper_gvec_vsbcbi128);
> +    return DISAS_NEXT;
> +}


Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

I'm sure this can be done similarly to add with carry compute carry, but it's
harder to reason with sub2.  Something like

	tcg_gen_andi_i64(tl, cl, 1);
	tcg_gen_sub2_i64(tl, th, al, zero, tl, zero);
	tcg_gen_sub2_i64(tl, th, tl, th, bl, zero);
	tcg_gen_andi_i64(tl, th, 1);
	tcg_gen_sub2_i64(tl, th, ah, zero, tl, zero);
	tcg_gen_sub2_i64(tl, th, tl, th, bh, zero);
	tcg_gen_andi_i64(tl, th, 1);
	/* result in tl */


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 38/41] s390x/tcg: Implement VECTOR SUM ACROSS DOUBLEWORD
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  6:15   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  6:15 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Perform the calculations without a helper. Only 16 bit or 32 bit values
> have to be added.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 29 +++++++++++++++++++++++++++++
>  2 files changed, 31 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 39/41] s390x/tcg: Implement VECTOR SUM ACROSS QUADWORD
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  6:17   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  6:17 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> +    read_vec_element_i64(suml, get_field(s->fields, v3), max_idx, es);
> +    for (idx = 0; idx <= max_idx; idx++) {
> +        read_vec_element_i64(tmpl, get_field(s->fields, v2), idx, es);
> +        tcg_gen_add2_i64(suml, sumh, suml, sumh, tmpl, zero);
> +    }
> +    write_vec_element_i64(sumh, get_field(s->fields, v1), 0, ES_64);
> +    write_vec_element_i64(suml, get_field(s->fields, v1), 1, ES_64);

It's a long way around for ES_32, as there will never be overflow into bit 65.
 But I guess it's not wrong.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 40/41] s390x/tcg: Implement VECTOR SUM ACROSS WORD
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  6:19   ` Richard Henderson
  -1 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  6:19 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> Similar to VECTOR SUM ACROSS DOUBLEWORD.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/insn-data.def      |  2 ++
>  target/s390x/translate_vx.inc.c | 29 +++++++++++++++++++++++++++++
>  2 files changed, 31 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 41/41] s390x/tcg: Implement VECTOR TEST UNDER MASK
  2019-04-11 10:08   ` David Hildenbrand
  (?)
@ 2019-04-13  6:28   ` Richard Henderson
  2019-04-16 18:20     ` David Hildenbrand
  -1 siblings, 1 reply; 152+ messages in thread
From: Richard Henderson @ 2019-04-13  6:28 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 4/11/19 12:08 AM, David Hildenbrand wrote:
> +void HELPER(gvec_vtm)(void *v1, const void *v2, CPUS390XState *env,
> +                      uint32_t desc)
> +{
> +    S390Vector tmp;
> +
> +    s390_vec_and(&tmp, v1, v2);
> +    if (s390_vec_is_zero(&tmp)) {
> +        /* Selected bits all zeros; or all mask bits zero */
> +        env->cc_op = 0;
> +    } else if (s390_vec_equal(&tmp, v2)) {
> +        /* Selected bits all ones */
> +        env->cc_op = 3;
> +    } else {
> +        /* Selected bits a mix of zeros and ones */
> +        env->cc_op = 1;
> +    }
> +}

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

However, if you return this value, then you can do

DEF_HELPER_FLAGS_4(gvec_vtm, TCG_CALL_NO_RWG_SE, i32, cptr, cptr)

static DisasJumpType op_vtm(DisasContext *s, DisasOps *o)
{
    TCGv_ptr p1 = tcg_temp_new_ptr();
    TCGv_ptr p2 = tcg_temp_new_ptr();

    tcg_gen_addi_ptr(p1, cpu_env,
        vec_full_reg_offset(get_field(s->fields, v1)));
    tcg_gen_addi_ptr(p2, cpu_env,
        vec_full_reg_offset(get_field(s->fields, v2)));
    gen_helper_gvec_vtm(cc_op, p1, p2);
    tcg_temp_free_ptr(p1);
    tcg_temp_free_ptr(p2);
    set_cc_static(s);
    return DISAS_NEXT;
}

Perhaps it doesn't matter though, since use of vtm probably implies a jump,
which implies end of TB, which means that registers are going to get saved to
backing store anyway.


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 03/41] s390x/tcg: Implement VECTOR ADD COMPUTE CARRY
@ 2019-04-16  8:01       ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16  8:01 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x, Thomas Huth, Cornelia Huck

On 12.04.19 23:05, Richard Henderson wrote:
> On 4/11/19 12:07 AM, David Hildenbrand wrote:
>> +    static const GVecGen3 g[5] = {
>> +        { .fni8 = gen_acc8_i64, },
>> +        { .fni8 = gen_acc16_i64, },
>> +        { .fni8 = gen_acc32_i64, },
>> +        { .fni8 = gen_acc_i64, },
>> +        { .fno = gen_helper_gvec_vacc128, },
>> +    };
> 
> Vector versions of the first four are fairly simple too.
> 
> static void gen_acc_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
> {
>     tcgv_vec t = tcg_temp_new_vec_matching(d);
> 
>     tcg_gen_add_vec(vece, t, a, b);
>     tcg_gen_cmp_vec(TCG_COND_LTU, vece, d, r, a);  /* produces -1 for carry */
>     tcg_gen_neg_vec(vece, d, d);                 /* convert to +1 for carry */
> }
> 
>   { .fni8 = gen_acc8_i64,
>     .fniv = gen_acc_vec,
>     .opc = INDEX_op_cmp_vec,
>     .vece = MO_8 },
>   ...
> 

Indeed, I didn't really explore vector operations yet. This is more
compact than I expected :)

> 
> I'm surprised that you're expanding the 128-bit addition out-of-line.
> One possible expansion is
> 
>   tcg_gen_add2_i64(tl, th, al, zero, bl, zero);
>   tcg_gen_add2_i64(tl, th, th, zero, ah, zero);
>   tcg_gen_add2_i64(tl, th, tl, th, bl, zero);
>   /* carry out in th */

Nice trick. Just so I get it right, the third line should actually be

tcg_gen_add2_i64(tl, th, tl, th, bh, zero);

right? Thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 03/41] s390x/tcg: Implement VECTOR ADD COMPUTE CARRY
@ 2019-04-16  8:01       ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16  8:01 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 12.04.19 23:05, Richard Henderson wrote:
> On 4/11/19 12:07 AM, David Hildenbrand wrote:
>> +    static const GVecGen3 g[5] = {
>> +        { .fni8 = gen_acc8_i64, },
>> +        { .fni8 = gen_acc16_i64, },
>> +        { .fni8 = gen_acc32_i64, },
>> +        { .fni8 = gen_acc_i64, },
>> +        { .fno = gen_helper_gvec_vacc128, },
>> +    };
> 
> Vector versions of the first four are fairly simple too.
> 
> static void gen_acc_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
> {
>     tcgv_vec t = tcg_temp_new_vec_matching(d);
> 
>     tcg_gen_add_vec(vece, t, a, b);
>     tcg_gen_cmp_vec(TCG_COND_LTU, vece, d, r, a);  /* produces -1 for carry */
>     tcg_gen_neg_vec(vece, d, d);                 /* convert to +1 for carry */
> }
> 
>   { .fni8 = gen_acc8_i64,
>     .fniv = gen_acc_vec,
>     .opc = INDEX_op_cmp_vec,
>     .vece = MO_8 },
>   ...
> 

Indeed, I didn't really explore vector operations yet. This is more
compact than I expected :)

> 
> I'm surprised that you're expanding the 128-bit addition out-of-line.
> One possible expansion is
> 
>   tcg_gen_add2_i64(tl, th, al, zero, bl, zero);
>   tcg_gen_add2_i64(tl, th, th, zero, ah, zero);
>   tcg_gen_add2_i64(tl, th, tl, th, bl, zero);
>   /* carry out in th */

Nice trick. Just so I get it right, the third line should actually be

tcg_gen_add2_i64(tl, th, tl, th, bh, zero);

right? Thanks!

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 03/41] s390x/tcg: Implement VECTOR ADD COMPUTE CARRY
@ 2019-04-16  8:17         ` Richard Henderson
  0 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-16  8:17 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Thomas Huth, Cornelia Huck

On 4/15/19 10:01 PM, David Hildenbrand wrote:
> On 12.04.19 23:05, Richard Henderson wrote:
>> On 4/11/19 12:07 AM, David Hildenbrand wrote:
>>> +    static const GVecGen3 g[5] = {
>>> +        { .fni8 = gen_acc8_i64, },
>>> +        { .fni8 = gen_acc16_i64, },
>>> +        { .fni8 = gen_acc32_i64, },
>>> +        { .fni8 = gen_acc_i64, },
>>> +        { .fno = gen_helper_gvec_vacc128, },
>>> +    };
>>
>> Vector versions of the first four are fairly simple too.
>>
>> static void gen_acc_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
>> {
>>     tcgv_vec t = tcg_temp_new_vec_matching(d);
>>
>>     tcg_gen_add_vec(vece, t, a, b);
>>     tcg_gen_cmp_vec(TCG_COND_LTU, vece, d, r, a);  /* produces -1 for carry */
>>     tcg_gen_neg_vec(vece, d, d);                 /* convert to +1 for carry */
>> }
>>
>>   { .fni8 = gen_acc8_i64,
>>     .fniv = gen_acc_vec,
>>     .opc = INDEX_op_cmp_vec,
>>     .vece = MO_8 },
>>   ...
>>
> 
> Indeed, I didn't really explore vector operations yet. This is more
> compact than I expected :)

:-)

That said, in implementing vector variable shifts today,
I've come up with a representational problem here.
You may want to hold off on these until I can address them.


>>   tcg_gen_add2_i64(tl, th, al, zero, bl, zero);
>>   tcg_gen_add2_i64(tl, th, th, zero, ah, zero);
>>   tcg_gen_add2_i64(tl, th, tl, th, bl, zero);
>>   /* carry out in th */
> 
> Nice trick. Just so I get it right, the third line should actually be
> 
> tcg_gen_add2_i64(tl, th, tl, th, bh, zero);
> 
> right? Thanks!

Oops, yes, typo indeed.


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 03/41] s390x/tcg: Implement VECTOR ADD COMPUTE CARRY
@ 2019-04-16  8:17         ` Richard Henderson
  0 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-16  8:17 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 4/15/19 10:01 PM, David Hildenbrand wrote:
> On 12.04.19 23:05, Richard Henderson wrote:
>> On 4/11/19 12:07 AM, David Hildenbrand wrote:
>>> +    static const GVecGen3 g[5] = {
>>> +        { .fni8 = gen_acc8_i64, },
>>> +        { .fni8 = gen_acc16_i64, },
>>> +        { .fni8 = gen_acc32_i64, },
>>> +        { .fni8 = gen_acc_i64, },
>>> +        { .fno = gen_helper_gvec_vacc128, },
>>> +    };
>>
>> Vector versions of the first four are fairly simple too.
>>
>> static void gen_acc_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
>> {
>>     tcgv_vec t = tcg_temp_new_vec_matching(d);
>>
>>     tcg_gen_add_vec(vece, t, a, b);
>>     tcg_gen_cmp_vec(TCG_COND_LTU, vece, d, r, a);  /* produces -1 for carry */
>>     tcg_gen_neg_vec(vece, d, d);                 /* convert to +1 for carry */
>> }
>>
>>   { .fni8 = gen_acc8_i64,
>>     .fniv = gen_acc_vec,
>>     .opc = INDEX_op_cmp_vec,
>>     .vece = MO_8 },
>>   ...
>>
> 
> Indeed, I didn't really explore vector operations yet. This is more
> compact than I expected :)

:-)

That said, in implementing vector variable shifts today,
I've come up with a representational problem here.
You may want to hold off on these until I can address them.


>>   tcg_gen_add2_i64(tl, th, al, zero, bl, zero);
>>   tcg_gen_add2_i64(tl, th, th, zero, ah, zero);
>>   tcg_gen_add2_i64(tl, th, tl, th, bl, zero);
>>   /* carry out in th */
> 
> Nice trick. Just so I get it right, the third line should actually be
> 
> tcg_gen_add2_i64(tl, th, tl, th, bh, zero);
> 
> right? Thanks!

Oops, yes, typo indeed.


r~



^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 03/41] s390x/tcg: Implement VECTOR ADD COMPUTE CARRY
@ 2019-04-16  8:33           ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16  8:33 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x, Thomas Huth, Cornelia Huck

>>
>> Indeed, I didn't really explore vector operations yet. This is more
>> compact than I expected :)
> 
> :-)
> 
> That said, in implementing vector variable shifts today,
> I've come up with a representational problem here.
> You may want to hold off on these until I can address them.

I assume you mean vector helpers *in general* in this file. Yes, we can
add them later.

What exact problem are you dealing with?

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 03/41] s390x/tcg: Implement VECTOR ADD COMPUTE CARRY
@ 2019-04-16  8:33           ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16  8:33 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

>>
>> Indeed, I didn't really explore vector operations yet. This is more
>> compact than I expected :)
> 
> :-)
> 
> That said, in implementing vector variable shifts today,
> I've come up with a representational problem here.
> You may want to hold off on these until I can address them.

I assume you mean vector helpers *in general* in this file. Yes, we can
add them later.

What exact problem are you dealing with?

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 05/41] s390x/tcg: Implement VECTOR ADD WITH CARRY COMPUTE CARRY
  2019-04-12 21:58   ` Richard Henderson
@ 2019-04-16  8:40     ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16  8:40 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 12.04.19 23:58, Richard Henderson wrote:
> On 4/11/19 12:08 AM, David Hildenbrand wrote:
>> +static DisasJumpType op_vaccc(DisasContext *s, DisasOps *o)
>> +{
>> +    if (get_field(s->fields, m5) != ES_128) {
>> +        gen_program_exception(s, PGM_SPECIFICATION);
>> +        return DISAS_NORETURN;
>> +    }
>> +
>> +    gen_gvec_4_ool(get_field(s->fields, v1), get_field(s->fields, v2),
>> +                   get_field(s->fields, v3), get_field(s->fields, v4), 0,
>> +                   gen_helper_gvec_vaccc128);
>> +    return DISAS_NEXT;
>> +}
> 
> An inline expansion could be
> 
> One possible expansion is
> 
>   tcg_gen_andi_i64(tl, cl, 1);
>   tcg_gen_add2_i64(tl, th, tl, zero, al, zero);
>   tcg_gen_add2_i64(tl, th, tl, th, bl, zero);
>   tcg_gen_add2_i64(tl, th, th, zero, ah, zero);
>   tcg_gen_add2_i64(tl, th, tl, th, bl, zero);
>   /* carry out in th */
> 
> This is 8 insns for the addition vs the hw optimal 6, but we're not exactly an
> optimizing compiler either.  ;-)
> 

Yes, very nice! Thanks!

> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 03/41] s390x/tcg: Implement VECTOR ADD COMPUTE CARRY
@ 2019-04-16  8:43             ` Richard Henderson
  0 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-16  8:43 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Thomas Huth, Cornelia Huck

On 4/15/19 10:33 PM, David Hildenbrand wrote:
>>>
>>> Indeed, I didn't really explore vector operations yet. This is more
>>> compact than I expected :)
>>
>> :-)
>>
>> That said, in implementing vector variable shifts today,
>> I've come up with a representational problem here.
>> You may want to hold off on these until I can address them.
> 
> I assume you mean vector helpers *in general* in this file. Yes, we can
> add them later.
> 
> What exact problem are you dealing with?

The .opc field lets you only specify one opcode which is optional in the
backend on which you depend.  In writing support for AArch64 USHL, I find that
I needed 3 optional opcodes.

I'm thinking of a mass change whereby .opc becomes a pointer to an array with
terminator.  But then, for debugging purposes, I think I need to validate that
array, so that it's not missing things that ought to be specified, but which
happen to be supported by the current host.


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 03/41] s390x/tcg: Implement VECTOR ADD COMPUTE CARRY
@ 2019-04-16  8:43             ` Richard Henderson
  0 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-16  8:43 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 4/15/19 10:33 PM, David Hildenbrand wrote:
>>>
>>> Indeed, I didn't really explore vector operations yet. This is more
>>> compact than I expected :)
>>
>> :-)
>>
>> That said, in implementing vector variable shifts today,
>> I've come up with a representational problem here.
>> You may want to hold off on these until I can address them.
> 
> I assume you mean vector helpers *in general* in this file. Yes, we can
> add them later.
> 
> What exact problem are you dealing with?

The .opc field lets you only specify one opcode which is optional in the
backend on which you depend.  In writing support for AArch64 USHL, I find that
I needed 3 optional opcodes.

I'm thinking of a mass change whereby .opc becomes a pointer to an array with
terminator.  But then, for debugging purposes, I think I need to validate that
array, so that it's not missing things that ought to be specified, but which
happen to be supported by the current host.


r~


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 03/41] s390x/tcg: Implement VECTOR ADD COMPUTE CARRY
@ 2019-04-16  8:46               ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16  8:46 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x, Thomas Huth, Cornelia Huck

On 16.04.19 10:43, Richard Henderson wrote:
> On 4/15/19 10:33 PM, David Hildenbrand wrote:
>>>>
>>>> Indeed, I didn't really explore vector operations yet. This is more
>>>> compact than I expected :)
>>>
>>> :-)
>>>
>>> That said, in implementing vector variable shifts today,
>>> I've come up with a representational problem here.
>>> You may want to hold off on these until I can address them.
>>
>> I assume you mean vector helpers *in general* in this file. Yes, we can
>> add them later.
>>
>> What exact problem are you dealing with?
> 
> The .opc field lets you only specify one opcode which is optional in the
> backend on which you depend.  In writing support for AArch64 USHL, I find that
> I needed 3 optional opcodes.

I was asking myself this exact thing when looking at the opc field in
the example you gave ("which instruction is one supposed to indicate
here") :)

> 
> I'm thinking of a mass change whereby .opc becomes a pointer to an array with
> terminator.  But then, for debugging purposes, I think I need to validate that
> array, so that it's not missing things that ought to be specified, but which
> happen to be supported by the current host.

Makes sense!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 03/41] s390x/tcg: Implement VECTOR ADD COMPUTE CARRY
@ 2019-04-16  8:46               ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16  8:46 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 16.04.19 10:43, Richard Henderson wrote:
> On 4/15/19 10:33 PM, David Hildenbrand wrote:
>>>>
>>>> Indeed, I didn't really explore vector operations yet. This is more
>>>> compact than I expected :)
>>>
>>> :-)
>>>
>>> That said, in implementing vector variable shifts today,
>>> I've come up with a representational problem here.
>>> You may want to hold off on these until I can address them.
>>
>> I assume you mean vector helpers *in general* in this file. Yes, we can
>> add them later.
>>
>> What exact problem are you dealing with?
> 
> The .opc field lets you only specify one opcode which is optional in the
> backend on which you depend.  In writing support for AArch64 USHL, I find that
> I needed 3 optional opcodes.

I was asking myself this exact thing when looking at the opc field in
the example you gave ("which instruction is one supposed to indicate
here") :)

> 
> I'm thinking of a mass change whereby .opc becomes a pointer to an array with
> terminator.  But then, for debugging purposes, I think I need to validate that
> array, so that it's not missing things that ought to be specified, but which
> happen to be supported by the current host.

Makes sense!

-- 

Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 07/41] s390x/tcg: Implement VECTOR AVERAGE
  2019-04-12 22:34   ` Richard Henderson
@ 2019-04-16  8:52     ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16  8:52 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 13.04.19 00:34, Richard Henderson wrote:
> On 4/11/19 12:08 AM, David Hildenbrand wrote:
>> +}
>> +static DisasJumpType op_vavg(DisasContext *s, DisasOps *o)
>> +{
> 
> Watch your spacing.

Whoops.

> 
> 
>> +    static const GVecGen3 g[4] = {
>> +        { .fno = gen_helper_gvec_vavg8, },
>> +        { .fno = gen_helper_gvec_vavg16, },
>> +        { .fni4 = gen_avg_i32, },
>> +        { .fni8 = gen_avg_i64, },
>> +    };
> 
> Pondering possible vector expansions.  I think one possibility is
> 
>   t1 = (a >> 1) + (b >> 1);
> 
> We still have the two "0.5 bits" to add back in, plus we round up by adding
> another 0.5.  This means if either lsb is set, then we have carry in to the 1's
> bit.  So:
> 
>   t1 = t1 + ((a | b) & 1);
> 
> Which leads to
> 
>   tcg_gen_sari_vec(vece, t0, a, 1);
>   tcg_gen_sari_vec(vece, t1, b, 1);
>   tcg_gen_or_vec(vece, t2, a, b);
>   tcg_gen_add_vec(vece, t0, t0, t1);
>   tcg_gen_dupi_vec(vece, t1, 1);
>   tcg_gen_and_vec(vece, t2, t2, t1);
>   tcg_gen_add_vec(vece, t0, t0, t2);
> 
>   { .fnv = gen_avg_vec,
>     .fno = gen_helper_gvec_vavg8,
>     .opc = INDEX_op_sari_vec },
> 
> But what you have here is correct and the above is mere optimization so,
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 

Looks sane, as discussed, let's handle vector expansions later.

Thanks!

> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 09/41] s390x/tcg: Implement VECTOR CHECKSUM
  2019-04-12 23:01   ` Richard Henderson
@ 2019-04-16  8:58     ` David Hildenbrand
  2019-04-16  9:08       ` Richard Henderson
  0 siblings, 1 reply; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16  8:58 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 13.04.19 01:01, Richard Henderson wrote:
> On 4/11/19 12:08 AM, David Hildenbrand wrote:
>> +    read_vec_element_i32(sum, get_field(s->fields, v3), 1, ES_32);
>> +    for (i = 0; i < 4; i++) {
>> +        read_vec_element_i32(tmp, get_field(s->fields, v2), i, ES_32);
>> +        tcg_gen_add_i32(sum, sum, tmp);
>> +        tcg_gen_setcond_i32(TCG_COND_LTU, tmp, sum, tmp);
>> +        tcg_gen_add_i32(sum, sum, tmp);
>> +    }
>> +    zero_vec(get_field(s->fields, v1));
>> +    write_vec_element_i32(sum, get_field(s->fields, v1), 1, ES_32);
> 
> It seems like it should be possible to implement this with i64, and fold the
> carry around at the end -- 2 insns instead of 12 for managing carry.  But I
> can't quite tell if that produces the same results.

I had the same in mind but also wasn't sure if it would produce the
exact same result. Feels like it should.
> 
> You could use
> 
>   tcg_gen_add2_i32(sum, tmp, sum, zero, tmp, zero);
>   tcg_gen_add_i32(sum, sum, tmp);

That makes perfect sense, I will use that for now, thanks!

> 
> instead of computing carry manually with setcond.
> 
> That said, your code exactly matches the language in the manual, so
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 10/41] s390x/tcg: Implement VECTOR ELEMENT COMPARE *
  2019-04-12 23:14   ` Richard Henderson
@ 2019-04-16  9:05     ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16  9:05 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 13.04.19 01:14, Richard Henderson wrote:
> On 4/11/19 12:08 AM, David Hildenbrand wrote:
>> +                         es | logical ? 0 : MO_SIGN);
> 
> Incorrect operator precedence.  You need:
> 
>   es | (logical ? 0 : MO_SIGN)
> 
> or
> 
>   logical ? es : es | MO_SIGN
> 
> And perhaps cse this expression into a temporary
> and not replicate it between the two reads.
> 
> Otherwise,
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> 
> r~
> 

Thanks, good catch! I'll do it like this

+static DisasJumpType op_vec(DisasContext *s, DisasOps *o)
+{
+    uint8_t es = get_field(s->fields, m3);
+    const uint8_t enr = NUM_VEC_ELEMENTS(es) / 2 - 1;
+
+    if (es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+    if (s->fields->op2 == 0xdb) {
+        es |= MO_SIGN;
+    }
+
+    o->in1 = tcg_temp_new_i64();
+    o->in2 = tcg_temp_new_i64();
+    read_vec_element_i64(o->in1, get_field(s->fields, v1), enr, es);
+    read_vec_element_i64(o->in2, get_field(s->fields, v2), enr, es);
+    return DISAS_NEXT;
+}


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 13/41] s390x/tcg: Implement VECTOR COUNT TRAILING ZEROS
  2019-04-12 23:23   ` Richard Henderson
@ 2019-04-16  9:07     ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16  9:07 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 13.04.19 01:23, Richard Henderson wrote:
> a ? ctz32(a) : BITS

Nice catch, thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 09/41] s390x/tcg: Implement VECTOR CHECKSUM
  2019-04-16  8:58     ` David Hildenbrand
@ 2019-04-16  9:08       ` Richard Henderson
  2019-04-16  9:13         ` David Hildenbrand
  0 siblings, 1 reply; 152+ messages in thread
From: Richard Henderson @ 2019-04-16  9:08 UTC (permalink / raw)
  To: David Hildenbrand, Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 4/15/19 10:58 PM, David Hildenbrand wrote:
>> You could use
>>
>>   tcg_gen_add2_i32(sum, tmp, sum, zero, tmp, zero);
>>   tcg_gen_add_i32(sum, sum, tmp);
> That makes perfect sense, I will use that for now, thanks!
> 

Here's a funny one.  We can do this in one operation:

  tcg_gen_add2_i32(tmp, sum, sum, sum, tmp, tmp);

The lower (sum+tmp) carries into the upper (sum+tmp).
We take the upper result and discard the lower.


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 15/41] s390x/tcg: Implement VECTOR GALOIS FIELD MULTIPLY SUM (AND ACCUMULATE)
  2019-04-12 23:44   ` Richard Henderson
@ 2019-04-16  9:10     ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16  9:10 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 13.04.19 01:44, Richard Henderson wrote:
> On 4/11/19 12:08 AM, David Hildenbrand wrote:
>> A galois field multiplication in field 2 is like binary multiplication,
>> however instead of doing ordinary binary additions, xor's are performed.
>> So no carries are considered.
>>
>> Implement all variants via helpers. s390_vec_sar() and s390_vec_shr()
>> will be reused later on.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  target/s390x/helper.h           |   8 ++
>>  target/s390x/insn-data.def      |   4 +
>>  target/s390x/translate_vx.inc.c |  38 ++++++++
>>  target/s390x/vec_int_helper.c   | 168 ++++++++++++++++++++++++++++++++
>>  4 files changed, 218 insertions(+)
> 
> FYI, this is now the 4th copy of this operation.
> 
>   arm: pmull
>   x86: pclmulqdq
>   ppc: vpmsum[bhwd]
> 
> We really should promote this to generic.  But that can come later,

huh, I tried my best to search for anything related to galois, but seems
like only s390x uses that terminology.  :) Well at least I learned how
it is supposed to be calculated.

Thanks!

> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 09/41] s390x/tcg: Implement VECTOR CHECKSUM
  2019-04-16  9:08       ` Richard Henderson
@ 2019-04-16  9:13         ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16  9:13 UTC (permalink / raw)
  To: Richard Henderson, Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 16.04.19 11:08, Richard Henderson wrote:
> On 4/15/19 10:58 PM, David Hildenbrand wrote:
>>> You could use
>>>
>>>   tcg_gen_add2_i32(sum, tmp, sum, zero, tmp, zero);
>>>   tcg_gen_add_i32(sum, sum, tmp);
>> That makes perfect sense, I will use that for now, thanks!
>>
> 
> Here's a funny one.  We can do this in one operation:
> 
>   tcg_gen_add2_i32(tmp, sum, sum, sum, tmp, tmp);

:D I had to look at it 10 times. Very nice trick.

> 
> The lower (sum+tmp) carries into the upper (sum+tmp).
> We take the upper result and discard the lower.
> 
> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 17/41] s390x/tcg: Implement VECTOR LOAD POSITIVE
  2019-04-12 23:50   ` Richard Henderson
@ 2019-04-16  9:16     ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16  9:16 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 13.04.19 01:50, Richard Henderson wrote:
> On 4/11/19 12:08 AM, David Hildenbrand wrote:
>> Similar to VECTOR LOAD COMPLEMENT but unfortunately we don't have a
>> gvec helper.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  target/s390x/helper.h           |  2 ++
>>  target/s390x/insn-data.def      |  2 ++
>>  target/s390x/translate_vx.inc.c | 40 +++++++++++++++++++++++++++++++++
>>  target/s390x/vec_int_helper.c   | 14 ++++++++++++
>>  4 files changed, 58 insertions(+)
> 
> I would be happy to add ABS as a gvec primitive, if you like.
> Expandable with MAX or CMP on hosts without ABS.

Sure, that would be great. Once you have that one implemented we can
throw most of this stuff away (either before or after merging). Thanks!

> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 20/41] s390x/tcg: Implement VECTOR MULTIPLY *
  2019-04-13  0:04   ` Richard Henderson
@ 2019-04-16  9:23     ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16  9:23 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 13.04.19 02:04, Richard Henderson wrote:
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Changed, thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 27/41] s390x/tcg: Implement VECTOR ELEMENT ROTATE LEFT LOGICAL
  2019-04-13  0:15   ` Richard Henderson
@ 2019-04-16  9:27     ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16  9:27 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 13.04.19 02:15, Richard Henderson wrote:
> On 4/11/19 12:08 AM, David Hildenbrand wrote:
>> +#define DEF_ROTL(BITS)                                                         \
>> +static uint##BITS##_t rotl##BITS(uint##BITS##_t a, uint8_t count)            \
>> +{                                                                            \
>> +    count &= BITS - 1;                                                       \
>> +    return (a << count) | (a >> (BITS - count));                             \
>> +}
>> +DEF_ROTL(8)
>> +DEF_ROTL(16)
> 
> We already have rol8 and rol16 in <qemu/bitops.h> for this.
> Otherwise,
> 

Thanks, missed these helpers!

> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> This does point out that I should go ahead and fill out the
> variable shift and rotate patterns in gvec...
> 
> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 28/41] s390x/tcg: Implement VECTOR ELEMENT ROTATE AND INSERT UNDER MASK
  2019-04-13  0:29   ` Richard Henderson
@ 2019-04-16  9:35     ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16  9:35 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 13.04.19 02:29, Richard Henderson wrote:
> On 4/11/19 12:08 AM, David Hildenbrand wrote:
>> +static void gen_rim_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, int32_t c)
>> +{
>> +    TCGv_i32 t0 = tcg_temp_new_i32();
>> +    TCGv_i32 t1 = tcg_temp_new_i32();
>> +
>> +    tcg_gen_andc_i32(t0, a, b);
>> +    tcg_gen_rotli_i32(t1, a, c & 31);
>> +    tcg_gen_and_i32(t1, t1, b);
>> +    tcg_gen_or_i32(d, t0, t1);
> 
> The ANDC and ROTL look to be in the wrong order.
> 
> "For each bit in the third operand (b) that is one,
> the corresponding bit *of the rotated elements* in
> the second operand replaces the corresponding bit in
> the first operand".
> 
> I think you need
> 
>     tcg_gen_rotli_i32(a, a, c & 31);
>     tcg_gen_and_i32(a, a, b);
>     tcg_gen_andc_i32(d, d, b);
>     tcg_gen_or_i32(d, d, a);
> 
> with
> 
>   { .fni4 = gen_rim_32, .load_dest = true },
> 
>> +     const uint##BITS##_t a = s390_vec_read_element##BITS(v2, i);           \
>> +     const uint##BITS##_t mask = s390_vec_read_element##BITS(v3, i);        \
>> +     const uint##BITS##_t d = (a & ~mask) | (rotl##BITS(a, count) & mask);  \
> 
> Again, this seems to be missing the insert into "the first operand", i.e.
> loading from v1 as well.

Yes indeed, I misinterpreted/misread the PoP. Nice catch! (as usual,
excellent review)

> 
> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 31/41] s390x/tcg: Implement VECTOR SHIFT LEFT DOUBLE BY BYTE
  2019-04-13  0:54   ` Richard Henderson
@ 2019-04-16  9:45     ` David Hildenbrand
  2019-04-16 15:21       ` Richard Henderson
  0 siblings, 1 reply; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16  9:45 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 13.04.19 02:54, Richard Henderson wrote:
> On 4/11/19 12:08 AM, David Hildenbrand wrote:
>> +static DisasJumpType op_vsldb(DisasContext *s, DisasOps *o)
>> +{
>> +    int src_idx = get_field(s->fields, i4) & 0xf;
>> +
>> +    if (src_idx == 0) {
>> +        gen_gvec_mov(get_field(s->fields, v1), get_field(s->fields, v2));
>> +    } else {
>> +        gen_gvec_3_ool(get_field(s->fields, v1), get_field(s->fields, v2),
>> +                       get_field(s->fields, v3), src_idx,
>> +                       gen_helper_gvec_vsldb);
>> +        return DISAS_NEXT;
> 
> You could also expand this inline using your new extract2 primitive.
> 
>   int i4 = get_field(s->fields, i4);
>   int left_shift, right_shift;
> 
>   left_shift = (i4 & 7) * 8;
>   right_shift = 64 - left_shift;
> 
>   if ((i4 & 8) == 0) {
>       read_vec_element_i64(t0, get_field(s->fields, v2), 0, ES_64);
>       read_vec_element_i64(t1, get_field(s->fields, v2), 1, ES_64);
>       read_vec_element_i64(t2, get_field(s->fields, v3), 0, ES_64);
>   } else {
>       read_vec_element_i64(t0, get_field(s->fields, v2), 1, ES_64);
>       read_vec_element_i64(t1, get_field(s->fields, v3), 0, ES_64);
>       read_vec_element_i64(t2, get_field(s->fields, v3), 1, ES_64);
>   }
>   tcg_gen_extract2_i64(t0, t1, t0, right_shift);
>   tcg_gen_extract2_i64(t1, t2, t1, right_shift);

Trying to understand the magic, left_shift is really only used to to
calculate right_shift, right?

>   write_vec_element_i64(t0, get_field(s->fields, v1), 0, ES_64);
>   write_vec_element_i64(t1, get_field(s->fields, v1), 1, ES_64);
> 
> 
> r~
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 31/41] s390x/tcg: Implement VECTOR SHIFT LEFT DOUBLE BY BYTE
  2019-04-16  9:45     ` David Hildenbrand
@ 2019-04-16 15:21       ` Richard Henderson
  0 siblings, 0 replies; 152+ messages in thread
From: Richard Henderson @ 2019-04-16 15:21 UTC (permalink / raw)
  To: David Hildenbrand, Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 4/15/19 11:45 PM, David Hildenbrand wrote:
> On 13.04.19 02:54, Richard Henderson wrote:
>> On 4/11/19 12:08 AM, David Hildenbrand wrote:
>>> +static DisasJumpType op_vsldb(DisasContext *s, DisasOps *o)
>>> +{
>>> +    int src_idx = get_field(s->fields, i4) & 0xf;
>>> +
>>> +    if (src_idx == 0) {
>>> +        gen_gvec_mov(get_field(s->fields, v1), get_field(s->fields, v2));
>>> +    } else {
>>> +        gen_gvec_3_ool(get_field(s->fields, v1), get_field(s->fields, v2),
>>> +                       get_field(s->fields, v3), src_idx,
>>> +                       gen_helper_gvec_vsldb);
>>> +        return DISAS_NEXT;
>>
>> You could also expand this inline using your new extract2 primitive.
>>
>>   int i4 = get_field(s->fields, i4);
>>   int left_shift, right_shift;
>>
>>   left_shift = (i4 & 7) * 8;
>>   right_shift = 64 - left_shift;
>>
>>   if ((i4 & 8) == 0) {
>>       read_vec_element_i64(t0, get_field(s->fields, v2), 0, ES_64);
>>       read_vec_element_i64(t1, get_field(s->fields, v2), 1, ES_64);
>>       read_vec_element_i64(t2, get_field(s->fields, v3), 0, ES_64);
>>   } else {
>>       read_vec_element_i64(t0, get_field(s->fields, v2), 1, ES_64);
>>       read_vec_element_i64(t1, get_field(s->fields, v3), 0, ES_64);
>>       read_vec_element_i64(t2, get_field(s->fields, v3), 1, ES_64);
>>   }
>>   tcg_gen_extract2_i64(t0, t1, t0, right_shift);
>>   tcg_gen_extract2_i64(t1, t2, t1, right_shift);
> 
> Trying to understand the magic, left_shift is really only used to to
> calculate right_shift, right?

Yes.  I thought that was clearer as a separate step.


r~

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 41/41] s390x/tcg: Implement VECTOR TEST UNDER MASK
  2019-04-13  6:28   ` Richard Henderson
@ 2019-04-16 18:20     ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16 18:20 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 13.04.19 08:28, Richard Henderson wrote:
> On 4/11/19 12:08 AM, David Hildenbrand wrote:
>> +void HELPER(gvec_vtm)(void *v1, const void *v2, CPUS390XState *env,
>> +                      uint32_t desc)
>> +{
>> +    S390Vector tmp;
>> +
>> +    s390_vec_and(&tmp, v1, v2);
>> +    if (s390_vec_is_zero(&tmp)) {
>> +        /* Selected bits all zeros; or all mask bits zero */
>> +        env->cc_op = 0;
>> +    } else if (s390_vec_equal(&tmp, v2)) {
>> +        /* Selected bits all ones */
>> +        env->cc_op = 3;
>> +    } else {
>> +        /* Selected bits a mix of zeros and ones */
>> +        env->cc_op = 1;
>> +    }
>> +}
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> However, if you return this value, then you can do
> 
> DEF_HELPER_FLAGS_4(gvec_vtm, TCG_CALL_NO_RWG_SE, i32, cptr, cptr)
> 
> static DisasJumpType op_vtm(DisasContext *s, DisasOps *o)
> {
>     TCGv_ptr p1 = tcg_temp_new_ptr();
>     TCGv_ptr p2 = tcg_temp_new_ptr();
> 
>     tcg_gen_addi_ptr(p1, cpu_env,
>         vec_full_reg_offset(get_field(s->fields, v1)));
>     tcg_gen_addi_ptr(p2, cpu_env,
>         vec_full_reg_offset(get_field(s->fields, v2)));
>     gen_helper_gvec_vtm(cc_op, p1, p2);
>     tcg_temp_free_ptr(p1);
>     tcg_temp_free_ptr(p2);
>     set_cc_static(s);
>     return DISAS_NEXT;
> }
> 
> Perhaps it doesn't matter though, since use of vtm probably implies a jump,
> which implies end of TB, which means that registers are going to get saved to
> backing store anyway.

Had a similar idea, but I guess it doesn't really matter. This way, we
can at least use standardized gvec ool helpers. Will leave it as is for
now, thanks!


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 152+ messages in thread

* Re: [Qemu-devel] [PATCH v1 37/41] s390x/tcg: Implement VECTOR SUBTRACT WITH BORROW COMPUTE BORROW INDICATION
  2019-04-13  6:11   ` Richard Henderson
@ 2019-04-16 18:26     ` David Hildenbrand
  0 siblings, 0 replies; 152+ messages in thread
From: David Hildenbrand @ 2019-04-16 18:26 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Thomas Huth, Richard Henderson

On 13.04.19 08:11, Richard Henderson wrote:
> On 4/11/19 12:08 AM, David Hildenbrand wrote:
>> +static DisasJumpType op_vsbcbi(DisasContext *s, DisasOps *o)
>> +{
>> +    if (get_field(s->fields, m5) != ES_128) {
>> +        gen_program_exception(s, PGM_SPECIFICATION);
>> +        return DISAS_NORETURN;
>> +    }
>> +
>> +    gen_gvec_4_ool(get_field(s->fields, v1), get_field(s->fields, v2),
>> +                   get_field(s->fields, v3), get_field(s->fields, v4), 0,
>> +                   gen_helper_gvec_vsbcbi128);
>> +    return DISAS_NEXT;
>> +}
> 
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> I'm sure this can be done similarly to add with carry compute carry, but it's
> harder to reason with sub2.  Something like
> 
> 	tcg_gen_andi_i64(tl, cl, 1);
> 	tcg_gen_sub2_i64(tl, th, al, zero, tl, zero);
> 	tcg_gen_sub2_i64(tl, th, tl, th, bl, zero);
> 	tcg_gen_andi_i64(tl, th, 1);
> 	tcg_gen_sub2_i64(tl, th, ah, zero, tl, zero);
> 	tcg_gen_sub2_i64(tl, th, tl, th, bh, zero);
> 	tcg_gen_andi_i64(tl, th, 1);
> 	/* result in tl */

Indeed, looks good to me. The only thing to care about is to convert -1
to 1, so we get a proper borrow.

Thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 152+ messages in thread

end of thread, other threads:[~2019-04-16 18:26 UTC | newest]

Thread overview: 152+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-11 10:07 [Qemu-devel] [PATCH v1 00/41] s390x/tcg: Vector Instruction Support Part 2 David Hildenbrand
2019-04-11 10:07 ` David Hildenbrand
2019-04-11 10:07 ` [Qemu-devel] [PATCH v1 01/41] tcg: Implement tcg_gen_gvec_3i() David Hildenbrand
2019-04-11 10:07   ` David Hildenbrand
2019-04-11 10:07 ` [Qemu-devel] [PATCH v1 02/41] s390x/tcg: Implement VECTOR ADD David Hildenbrand
2019-04-11 10:07   ` David Hildenbrand
2019-04-12 18:28   ` Richard Henderson
2019-04-12 18:28     ` Richard Henderson
2019-04-11 10:07 ` [Qemu-devel] [PATCH v1 03/41] s390x/tcg: Implement VECTOR ADD COMPUTE CARRY David Hildenbrand
2019-04-11 10:07   ` David Hildenbrand
2019-04-12 21:05   ` Richard Henderson
2019-04-12 21:05     ` Richard Henderson
2019-04-16  8:01     ` David Hildenbrand
2019-04-16  8:01       ` David Hildenbrand
2019-04-16  8:17       ` Richard Henderson
2019-04-16  8:17         ` Richard Henderson
2019-04-16  8:33         ` David Hildenbrand
2019-04-16  8:33           ` David Hildenbrand
2019-04-16  8:43           ` Richard Henderson
2019-04-16  8:43             ` Richard Henderson
2019-04-16  8:46             ` David Hildenbrand
2019-04-16  8:46               ` David Hildenbrand
2019-04-11 10:07 ` [Qemu-devel] [PATCH v1 04/41] s390x/tcg: Implement VECTOR ADD WITH CARRY David Hildenbrand
2019-04-11 10:07   ` David Hildenbrand
2019-04-12 21:36   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 05/41] s390x/tcg: Implement VECTOR ADD WITH CARRY COMPUTE CARRY David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-12 21:58   ` Richard Henderson
2019-04-16  8:40     ` David Hildenbrand
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 06/41] s390x/tcg: Implement VECTOR AND (WITH COMPLEMENT) David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-12 21:59   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 07/41] s390x/tcg: Implement VECTOR AVERAGE David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-12 22:34   ` Richard Henderson
2019-04-16  8:52     ` David Hildenbrand
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 08/41] s390x/tcg: Implement VECTOR AVERAGE LOGICAL David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-12 22:35   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 09/41] s390x/tcg: Implement VECTOR CHECKSUM David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-12 23:01   ` Richard Henderson
2019-04-16  8:58     ` David Hildenbrand
2019-04-16  9:08       ` Richard Henderson
2019-04-16  9:13         ` David Hildenbrand
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 10/41] s390x/tcg: Implement VECTOR ELEMENT COMPARE * David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-12 23:14   ` Richard Henderson
2019-04-16  9:05     ` David Hildenbrand
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 11/41] s390x/tcg: Implement VECTOR " David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-12 23:17   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 12/41] s390x/tcg: Implement VECTOR COUNT LEADING ZEROS David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-12 23:21   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 13/41] s390x/tcg: Implement VECTOR COUNT TRAILING ZEROS David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-12 23:23   ` Richard Henderson
2019-04-16  9:07     ` David Hildenbrand
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 14/41] s390x/tcg: Implement VECTOR EXCLUSIVE OR David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-12 23:23   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 15/41] s390x/tcg: Implement VECTOR GALOIS FIELD MULTIPLY SUM (AND ACCUMULATE) David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-12 23:44   ` Richard Henderson
2019-04-16  9:10     ` David Hildenbrand
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 16/41] s390x/tcg: Implement VECTOR LOAD COMPLEMENT David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-12 23:47   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 17/41] s390x/tcg: Implement VECTOR LOAD POSITIVE David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-12 23:50   ` Richard Henderson
2019-04-16  9:16     ` David Hildenbrand
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 18/41] s390x/tcg: Implement VECTOR (MAXIMUM|MINIMUM) (LOGICAL) David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-12 23:51   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 19/41] s390x/tcg: Implement VECTOR MULTIPLY AND ADD * David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  0:01   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 20/41] s390x/tcg: Implement VECTOR MULTIPLY * David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  0:04   ` Richard Henderson
2019-04-16  9:23     ` David Hildenbrand
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 21/41] s390x/tcg: Implement VECTOR NAND David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  0:05   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 22/41] s390x/tcg: Implement VECTOR NOR David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  0:05   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 23/41] s390x/tcg: Implement VECTOR NOT EXCLUSIVE OR David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  0:06   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 24/41] s390x/tcg: Implement VECTOR OR David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  0:06   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 25/41] s390x/tcg: Implement VECTOR OR WITH COMPLEMENT David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  0:07   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 26/41] s390x/tcg: Implement VECTOR POPULATION COUNT David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  0:08   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 27/41] s390x/tcg: Implement VECTOR ELEMENT ROTATE LEFT LOGICAL David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  0:15   ` Richard Henderson
2019-04-16  9:27     ` David Hildenbrand
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 28/41] s390x/tcg: Implement VECTOR ELEMENT ROTATE AND INSERT UNDER MASK David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  0:29   ` Richard Henderson
2019-04-16  9:35     ` David Hildenbrand
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 29/41] s390x/tcg: Implement VECTOR ELEMENT SHIFT David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  0:31   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 30/41] s390x/tcg: Implement VECTOR SHIFT LEFT (BY BYTE) David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  0:36   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 31/41] s390x/tcg: Implement VECTOR SHIFT LEFT DOUBLE BY BYTE David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  0:54   ` Richard Henderson
2019-04-16  9:45     ` David Hildenbrand
2019-04-16 15:21       ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 32/41] s390x/tcg: Implement VECTOR SHIFT RIGHT ARITHMETIC David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  5:48   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 33/41] s390x/tcg: Implement VECTOR SHIFT RIGHT LOGICAL * David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  5:48   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 34/41] s390x/tcg: Implement VECTOR SUBTRACT David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  5:49   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 35/41] s390x/tcg: Implement VECTOR SUBTRACT COMPUTE BORROW INDICATION David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  5:51   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 36/41] s390x/tcg: Implement VECTOR SUBTRACT WITH " David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  5:52   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 37/41] s390x/tcg: Implement VECTOR SUBTRACT WITH BORROW COMPUTE " David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  6:11   ` Richard Henderson
2019-04-16 18:26     ` David Hildenbrand
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 38/41] s390x/tcg: Implement VECTOR SUM ACROSS DOUBLEWORD David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  6:15   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 39/41] s390x/tcg: Implement VECTOR SUM ACROSS QUADWORD David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  6:17   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 40/41] s390x/tcg: Implement VECTOR SUM ACROSS WORD David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  6:19   ` Richard Henderson
2019-04-11 10:08 ` [Qemu-devel] [PATCH v1 41/41] s390x/tcg: Implement VECTOR TEST UNDER MASK David Hildenbrand
2019-04-11 10:08   ` David Hildenbrand
2019-04-13  6:28   ` Richard Henderson
2019-04-16 18:20     ` David Hildenbrand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.