[PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14


* [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14
@ 2020-09-30 14:55 David Hildenbrand
  2020-09-30 14:55 ` [PATCH v1 01/20] softfloat: Implement float128_(min|minnum|minnummag|max|maxnum|maxnummag) David Hildenbrand
                   ` (22 more replies)
  0 siblings, 23 replies; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Thomas Huth, David Hildenbrand, Cornelia Huck,
	Richard Henderson, Christian Borntraeger, qemu-s390x,
	Alex Bennée, Aurelien Jarno

This series adds support for the "Vector enhancements facility" and bumps
the QEMU CPU model up to a stripped-down z14.

I have yet to find a way to get more test coverage: it looks like some of
the functions aren't used anywhere yet (e.g., VECTOR FP MAXIMUM), and writing
unit tests to cover all functions and cases is just nasty. But I might be
wrong; I'm planning to at least test the basic functionality of all newly
added instructions.

I have to make excessive use of C macros again to cover the different element
sizes (32/64/128 bit). Any advice on cleaning things up is welcome.
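For readers unfamiliar with the pattern, a minimal standalone sketch of the kind of macro instantiation meant here: one function body, stamped out per element width via `##` token pasting, looping over 128 / BITS elements. Host `float`/`double` stand in for softfloat's float32/float64, elements are accessed in host byte order (not s390 element numbering), and all names (`vec128`, `fp32`, `fp64`, `DEF_VADD`, `vadd32`, `vadd64`) are hypothetical, not QEMU's.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit vector register, stored as raw bytes. */
typedef struct {
    uint8_t b[16];
} vec128;

/* Stand-ins for softfloat types; assumes 32-bit float and 64-bit double. */
typedef float  fp32;
typedef double fp64;

/*
 * Generate vadd<BITS>(): 128 / BITS elements, each read and written via
 * memcpy. The result is built in a temporary so that d may alias a or b.
 */
#define DEF_VADD(BITS)                                               \
static void vadd##BITS(vec128 *d, const vec128 *a, const vec128 *b)  \
{                                                                    \
    vec128 tmp;                                                      \
    for (int i = 0; i < 128 / BITS; i++) {                           \
        fp##BITS x, y, r;                                            \
        memcpy(&x, &a->b[i * (BITS / 8)], sizeof(x));                \
        memcpy(&y, &b->b[i * (BITS / 8)], sizeof(y));                \
        r = x + y;                                                   \
        memcpy(&tmp.b[i * (BITS / 8)], &r, sizeof(r));               \
    }                                                                \
    *d = tmp;                                                        \
}
DEF_VADD(32)
DEF_VADD(64)
#undef DEF_VADD
```

The trade-off is the usual one: a single body to review per operation, at the cost of code that is harder to step through in a debugger and error messages that point at the macro invocation.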

This is based on:
    "[PATCH v2 0/9] s390x/tcg: Implement some z14 facilities"
    "[PATCH v2 00/10] softfloat: Implement float128_muladd"

Based-on: 20200928122717.30586-1-david@redhat.com
Based-on: 20200925152047.709901-1-richard.henderson@linaro.org

David Hildenbrand (20):
  softfloat: Implement
    float128_(min|minnum|minnummag|max|maxnum|maxnummag)
  s390x/tcg: Implement VECTOR BIT PERMUTE
  s390x/tcg: Implement VECTOR MULTIPLY SUM LOGICAL
  s390x/tcg: Implement 32/128 bit for VECTOR FP ADD
  s390x/tcg: Implement 32/128 bit for VECTOR FP DIVIDE
  s390x/tcg: Implement 32/128 bit for VECTOR FP MULTIPLY
  s390x/tcg: Implement 32/128 bit for VECTOR FP SUBTRACT
  s390x/tcg: Implement 32/128 bit for VECTOR FP COMPARE (AND SIGNAL)
    SCALAR
  s390x/tcg: Implement 32/128 bit for VECTOR FP COMPARE *
  s390x/tcg: Implement 32/128 bit for VECTOR LOAD FP INTEGER
  s390x/tcg: Implement 64 bit for VECTOR FP LOAD LENGTHENED
  s390x/tcg: Implement 128 bit for VECTOR FP LOAD ROUNDED
  s390x/tcg: Implement 32/128 bit for VECTOR FP PERFORM SIGN OPERATION
  s390x/tcg: Implement 32/128 bit for VECTOR FP SQUARE ROOT
  s390x/tcg: Implement 32/128 bit for VECTOR FP TEST DATA CLASS
    IMMEDIATE
  s390x/tcg: Implement 32/128bit for VECTOR FP MULTIPLY AND
    (ADD|SUBTRACT)
  s390x/tcg: Implement VECTOR FP NEGATIVE MULTIPLY AND (ADD|SUBTRACT)
  s390x/tcg: Implement VECTOR FP (MAXIMUM|MINIMUM)
  s390x/tcg: We support Vector enhancements facility
  s390x/cpumodel: Bump up QEMU model to a stripped-down IBM z14 GA2

 fpu/softfloat.c                 |  100 +++
 hw/s390x/s390-virtio-ccw.c      |    2 +
 include/fpu/softfloat.h         |    6 +
 target/s390x/cpu_models.c       |    4 +-
 target/s390x/gen-features.c     |   14 +-
 target/s390x/helper.h           |   72 ++
 target/s390x/insn-data.def      |   12 +
 target/s390x/translate_vx.c.inc |  625 ++++++++++++---
 target/s390x/vec_fpu_helper.c   | 1302 ++++++++++++++++++++++---------
 target/s390x/vec_helper.c       |   22 +
 10 files changed, 1681 insertions(+), 478 deletions(-)

-- 
2.26.2




* [PATCH v1 01/20] softfloat: Implement float128_(min|minnum|minnummag|max|maxnum|maxnummag)
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-09-30 16:10   ` Alex Bennée
  2020-09-30 14:55 ` [PATCH v1 02/20] s390x/tcg: Implement VECTOR BIT PERMUTE David Hildenbrand
                   ` (21 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Thomas Huth, David Hildenbrand, Cornelia Huck,
	Richard Henderson, qemu-s390x, Alex Bennée, Aurelien Jarno

Implementation inspired by minmax_floats(). Unfortunately, we don't have
any tests we can simply adjust/unlock.
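Once NaNs are handled, float128_minmax() reduces to a lexicographic magnitude compare on (exponent, high fraction, low fraction) followed by three XOR-based selections. A standalone sketch of just that selection step, assuming NaNs were already filtered out; the `parts` struct and the function names are hypothetical stand-ins for FloatParts128 and the patch's inline logic.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical decomposed non-NaN operand (stand-in for FloatParts128).
 * Magnitude is ordered by (exp, frac_hi, frac_lo); as in the patch, zero
 * would be encoded with exp = INT_MIN and infinity with exp = INT_MAX.
 */
typedef struct {
    bool sign;
    int exp;
    uint64_t frac_hi, frac_lo;
} parts;

static bool mag_less(const parts *a, const parts *b)
{
    if (a->exp != b->exp) {
        return a->exp < b->exp;
    }
    if (a->frac_hi != b->frac_hi) {
        return a->frac_hi < b->frac_hi;
    }
    return a->frac_lo < b->frac_lo;
}

static bool mag_equal(const parts *a, const parts *b)
{
    return a->exp == b->exp && a->frac_hi == b->frac_hi &&
           a->frac_lo == b->frac_lo;
}

/* Returns true if b should be selected, false if a should be. */
static bool minmax_pick_b(const parts *a, const parts *b, bool ismin,
                          bool ismag)
{
    bool a_less = mag_less(a, b);

    if (ismag && !mag_equal(a, b)) {
        /* *mag variants: larger (max) or smaller (min) magnitude wins */
        return a_less ^ ismin;
    }
    if (a->sign == b->sign) {
        /* Same sign: for negatives, larger magnitude means smaller value */
        return a->sign ^ a_less ^ ismin;
    }
    /* Different signs: max picks the positive one, min the negative one */
    return a->sign ^ ismin;
}
```

Equal magnitudes in the *mag variants fall through to the signed comparison, which is what makes maxnummag(-2, +2) return +2 and max(-0, +0) return +0.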

Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: "Alex Bennée" <alex.bennee@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 fpu/softfloat.c         | 100 ++++++++++++++++++++++++++++++++++++++++
 include/fpu/softfloat.h |   6 +++
 2 files changed, 106 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 9af75b9146..9463c5ea56 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -621,6 +621,8 @@ static inline FloatParts float64_unpack_raw(float64 f)
     return unpack_raw(float64_params, f);
 }
 
+static void float128_unpack(FloatParts128 *p, float128 a, float_status *status);
+
 /* Pack a float from parts, but do not canonicalize.  */
 static inline uint64_t pack_raw(FloatFmt fmt, FloatParts p)
 {
@@ -3180,6 +3182,89 @@ static FloatParts minmax_floats(FloatParts a, FloatParts b, bool ismin,
     }
 }
 
+static float128 float128_minmax(float128 a, float128 b, bool ismin, bool ieee,
+                                bool ismag, float_status *s)
+{
+    FloatParts128 pa, pb;
+    int a_exp, b_exp;
+    bool a_less;
+
+    float128_unpack(&pa, a, s);
+    float128_unpack(&pb, b, s);
+
+    if (unlikely(is_nan(pa.cls) || is_nan(pb.cls))) {
+        /* See comment in minmax_floats() */
+        if (ieee && !is_snan(pa.cls) && !is_snan(pb.cls)) {
+            if (is_nan(pa.cls) && !is_nan(pb.cls)) {
+                return b;
+            } else if (is_nan(pb.cls) && !is_nan(pa.cls)) {
+                return a;
+            }
+        }
+
+        /* Similar logic to pick_nan(), avoiding re-packing. */
+        if (is_snan(pa.cls) || is_snan(pb.cls)) {
+            s->float_exception_flags |= float_flag_invalid;
+        }
+        if (s->default_nan_mode) {
+            return float128_default_nan(s);
+        }
+        if (pickNaN(pa.cls, pb.cls,
+                    pa.frac0 > pb.frac0 ||
+                    (pa.frac0 == pb.frac0 && pa.frac1 > pb.frac1) ||
+                    (pa.frac0 == pb.frac0 && pa.frac1 == pb.frac1 &&
+                     pa.sign < pb.sign), s)) {
+            return is_snan(pb.cls) ? float128_silence_nan(b, s) : b;
+        }
+        return is_snan(pa.cls) ? float128_silence_nan(a, s) : a;
+    }
+
+    switch (pa.cls) {
+    case float_class_normal:
+        a_exp = pa.exp;
+        break;
+    case float_class_inf:
+        a_exp = INT_MAX;
+        break;
+    case float_class_zero:
+        a_exp = INT_MIN;
+        break;
+    default:
+        g_assert_not_reached();
+        break;
+    }
+    switch (pb.cls) {
+    case float_class_normal:
+        b_exp = pb.exp;
+        break;
+    case float_class_inf:
+        b_exp = INT_MAX;
+        break;
+    case float_class_zero:
+        b_exp = INT_MIN;
+        break;
+    default:
+        g_assert_not_reached();
+        break;
+    }
+
+    a_less = a_exp < b_exp;
+    if (a_exp == b_exp) {
+        a_less = pa.frac0 < pb.frac0;
+        if (pa.frac0 == pb.frac0) {
+            a_less = pa.frac1 < pb.frac1;
+        }
+    }
+
+    if (ismag &&
+        (a_exp != b_exp || pa.frac0 != pb.frac0 || pa.frac1 != pb.frac1)) {
+        return a_less ^ ismin ? b : a;
+    } else if (pa.sign == pb.sign) {
+        return pa.sign ^ a_less ^ ismin ? b : a;
+    }
+    return pa.sign ^ ismin ? b : a;
+}
+
 #define MINMAX(sz, name, ismin, isiee, ismag)                           \
 float ## sz float ## sz ## _ ## name(float ## sz a, float ## sz b,      \
                                      float_status *s)                   \
@@ -3214,6 +3299,21 @@ MINMAX(64, maxnummag, false, true, true)
 
 #undef MINMAX
 
+#define F128_MINMAX(name, ismin, isiee, ismag)                          \
+float128 float128_ ## name(float128 a, float128 b, float_status *s)     \
+{                                                                       \
+    return float128_minmax(a, b, ismin, isiee, ismag, s);               \
+}
+
+F128_MINMAX(min, true, false, false)
+F128_MINMAX(minnum, true, true, false)
+F128_MINMAX(minnummag, true, true, true)
+F128_MINMAX(max, false, false, false)
+F128_MINMAX(maxnum, false, true, false)
+F128_MINMAX(maxnummag, false, true, true)
+
+#undef F128_MINMAX
+
 #define BF16_MINMAX(name, ismin, isiee, ismag)                          \
 bfloat16 bfloat16_ ## name(bfloat16 a, bfloat16 b, float_status *s)     \
 {                                                                       \
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index a38433deb4..4fab2ef6f4 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -1201,6 +1201,12 @@ float128 float128_muladd(float128, float128, float128, int,
 float128 float128_sqrt(float128, float_status *status);
 FloatRelation float128_compare(float128, float128, float_status *status);
 FloatRelation float128_compare_quiet(float128, float128, float_status *status);
+float128 float128_min(float128, float128, float_status *status);
+float128 float128_max(float128, float128, float_status *status);
+float128 float128_minnum(float128, float128, float_status *status);
+float128 float128_maxnum(float128, float128, float_status *status);
+float128 float128_minnummag(float128, float128, float_status *status);
+float128 float128_maxnummag(float128, float128, float_status *status);
 bool float128_is_quiet_nan(float128, float_status *status);
 bool float128_is_signaling_nan(float128, float_status *status);
 float128 float128_silence_nan(float128, float_status *status);
-- 
2.26.2




* [PATCH v1 02/20] s390x/tcg: Implement VECTOR BIT PERMUTE
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
  2020-09-30 14:55 ` [PATCH v1 01/20] softfloat: Implement float128_(min|minnum|minnummag|max|maxnum|maxnummag) David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-10-01 15:17   ` Richard Henderson
  2020-09-30 14:55 ` [PATCH v1 03/20] s390x/tcg: Implement VECTOR MULTIPLY SUM LOGICAL David Hildenbrand
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Richard Henderson, Thomas Huth,
	David Hildenbrand

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  1 +
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.c.inc |  8 ++++++++
 target/s390x/vec_helper.c       | 22 ++++++++++++++++++++++
 4 files changed, 33 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 55bd1551e6..f579fd38a7 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -126,6 +126,7 @@ DEF_HELPER_FLAGS_1(stck, TCG_CALL_NO_RWG_SE, i64, env)
 DEF_HELPER_FLAGS_3(probe_write_access, TCG_CALL_NO_WG, void, env, i64, i64)
 
 /* === Vector Support Instructions === */
+DEF_HELPER_FLAGS_4(gvec_vbperm, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(vll, TCG_CALL_NO_WG, void, env, ptr, i64, i64)
 DEF_HELPER_FLAGS_4(gvec_vpk16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vpk32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index d3bcdfd67b..b55cb44f60 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -989,6 +989,8 @@
 
 /* === Vector Support Instructions === */
 
+/* VECTOR BIT PERMUTE */
+    E(0xe785, VBPERM,  VRR_c, VE,  0, 0, 0, 0, vbperm, 0, 0, IF_VEC)
 /* VECTOR GATHER ELEMENT */
     E(0xe713, VGEF,    VRV,   V,   la2, 0, 0, 0, vge, 0, ES_32, IF_VEC)
     E(0xe712, VGEG,    VRV,   V,   la2, 0, 0, 0, vge, 0, ES_64, IF_VEC)
diff --git a/target/s390x/translate_vx.c.inc b/target/s390x/translate_vx.c.inc
index eb767f5288..44f54a79f4 100644
--- a/target/s390x/translate_vx.c.inc
+++ b/target/s390x/translate_vx.c.inc
@@ -327,6 +327,14 @@ static void gen_addi2_i64(TCGv_i64 dl, TCGv_i64 dh, TCGv_i64 al, TCGv_i64 ah,
     tcg_temp_free_i64(bh);
 }
 
+static DisasJumpType op_vbperm(DisasContext *s, DisasOps *o)
+{
+    gen_gvec_3_ool(get_field(s, v1), get_field(s, v2), get_field(s, v3), 0,
+                   gen_helper_gvec_vbperm);
+
+    return DISAS_NEXT;
+}
+
 static DisasJumpType op_vge(DisasContext *s, DisasOps *o)
 {
     const uint8_t es = s->insn->data;
diff --git a/target/s390x/vec_helper.c b/target/s390x/vec_helper.c
index 986e7cc825..7d9843f37f 100644
--- a/target/s390x/vec_helper.c
+++ b/target/s390x/vec_helper.c
@@ -19,6 +19,28 @@
 #include "exec/cpu_ldst.h"
 #include "exec/exec-all.h"
 
+void HELPER(gvec_vbperm)(void *v1, const void *v2, const void *v3,
+                         uint32_t desc)
+{
+    S390Vector tmp = {};
+    uint16_t result = 0;
+    int i;
+
+    for (i = 0; i < 16; i++) {
+        const uint8_t bit_nr = s390_vec_read_element8(v3, i);
+        uint16_t bit;
+
+        if (bit_nr >= 128) {
+            continue;
+        }
+        bit = !!(s390_vec_read_element8(v2, bit_nr / 8) &
+                 (0x80 >> (bit_nr % 8)));
+        result |= (bit << (15 - i));
+    }
+    s390_vec_write_element16(&tmp, 3, result);
+    *(S390Vector *)v1 = tmp;
+}
+
 void HELPER(vll)(CPUS390XState *env, void *v1, uint64_t addr, uint64_t bytes)
 {
     if (likely(bytes >= 16)) {
-- 
2.26.2




* [PATCH v1 03/20] s390x/tcg: Implement VECTOR MULTIPLY SUM LOGICAL
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
  2020-09-30 14:55 ` [PATCH v1 01/20] softfloat: Implement float128_(min|minnum|minnummag|max|maxnum|maxnummag) David Hildenbrand
  2020-09-30 14:55 ` [PATCH v1 02/20] s390x/tcg: Implement VECTOR BIT PERMUTE David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-10-01 15:26   ` Richard Henderson
  2020-09-30 14:55 ` [PATCH v1 04/20] s390x/tcg: Implement 32/128 bit for VECTOR FP ADD David Hildenbrand
                   ` (19 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Richard Henderson, Thomas Huth,
	David Hildenbrand

Fortunately, we only need the Doubleword implementation.
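For reference, what VMSL computes with doubleword elements can be written as a single 128-bit expression. A minimal sketch, assuming a compiler that provides the GCC/Clang `unsigned __int128` extension; `vmsl64` and its parameters are hypothetical names, each source operand is modelled as two 64-bit elements (index 0 = even, index 1 = odd), and v4 as one 128-bit value. The patch builds the same sum from 64-bit halves with tcg_gen_mulu2_i64() and tcg_gen_add2_i64(), using m6 bits 3 and 2 (as extracted by extract32()) to request the shift of the even and odd product, respectively.

```c
#include <stdbool.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension (assumption) */

/*
 * Sketch of VMSL with doubleword elements:
 *   v1 = (v2[0] * v3[0] << even_shift) + (v2[1] * v3[1] << odd_shift) + v4
 * All arithmetic wraps modulo 2^128, like the TCG add2 sequence.
 */
static u128 vmsl64(const uint64_t v2[2], const uint64_t v3[2], u128 v4,
                   bool even_shift, bool odd_shift)
{
    u128 even = (u128)v2[0] * v3[0];   /* full 128-bit product */
    u128 odd = (u128)v2[1] * v3[1];

    if (even_shift) {
        even <<= 1;                    /* bit shifted out of bit 127 is lost */
    }
    if (odd_shift) {
        odd <<= 1;
    }
    return even + odd + v4;
}
```

Because a 64x64 product always fits in 128 bits, only the doubled products and the final additions can wrap, which matches the carry-out being discarded in the TCG version.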

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/insn-data.def      |  2 ++
 target/s390x/translate_vx.c.inc | 52 +++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index b55cb44f60..da7fe6f21c 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1151,6 +1151,8 @@
     F(0xe7a7, VMO,     VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
 /* VECTOR MULTIPLY LOGICAL ODD */
     F(0xe7a5, VMLO,    VRR_c, V,   0, 0, 0, 0, vm, 0, IF_VEC)
+/* VECTOR MULTIPLY SUM LOGICAL */
+    F(0xe7b8, VMSL,    VRR_d, VE,  0, 0, 0, 0, vmsl, 0, IF_VEC)
 /* VECTOR NAND */
     F(0xe76e, VNN,     VRR_c, VE,  0, 0, 0, 0, vnn, 0, IF_VEC)
 /* VECTOR NOR */
diff --git a/target/s390x/translate_vx.c.inc b/target/s390x/translate_vx.c.inc
index 44f54a79f4..4c1b430013 100644
--- a/target/s390x/translate_vx.c.inc
+++ b/target/s390x/translate_vx.c.inc
@@ -1779,6 +1779,58 @@ static DisasJumpType op_vm(DisasContext *s, DisasOps *o)
     return DISAS_NEXT;
 }
 
+static DisasJumpType op_vmsl(DisasContext *s, DisasOps *o)
+{
+    TCGv_i64 l1, h1, l2, h2;
+
+    if (get_field(s, m5) != ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    l1 = tcg_temp_new_i64();
+    h1 = tcg_temp_new_i64();
+    l2 = tcg_temp_new_i64();
+    h2 = tcg_temp_new_i64();
+
+    /* Multiply both even elements from v2 and v3 */
+    read_vec_element_i64(l1, get_field(s, v2), 0, ES_64);
+    read_vec_element_i64(h1, get_field(s, v3), 0, ES_64);
+    tcg_gen_mulu2_i64(l1, h1, l1, h1);
+    /* Shift result left by one bit if requested */
+    if (extract32(get_field(s, m6), 3, 1)) {
+        tcg_gen_extract2_i64(h1, l1, h1, 63);
+        tcg_gen_shli_i64(l1, l1, 1);
+    }
+
+    /* Multiply both odd elements from v2 and v3 */
+    read_vec_element_i64(l2, get_field(s, v2), 1, ES_64);
+    read_vec_element_i64(h2, get_field(s, v3), 1, ES_64);
+    tcg_gen_mulu2_i64(l2, h2, l2, h2);
+    /* Shift result left by one bit if requested */
+    if (extract32(get_field(s, m6), 2, 1)) {
+        tcg_gen_extract2_i64(h2, l2, h2, 63);
+        tcg_gen_shli_i64(l2, l2, 1);
+    }
+
+    /* Add both intermediate results */
+    tcg_gen_add2_i64(l1, h1, l1, h1, l2, h2);
+    /* Add whole v4 */
+    read_vec_element_i64(h2, get_field(s, v4), 0, ES_64);
+    read_vec_element_i64(l2, get_field(s, v4), 1, ES_64);
+    tcg_gen_add2_i64(l1, h1, l1, h1, l2, h2);
+
+    /* Store final result into v1. */
+    write_vec_element_i64(h1, get_field(s, v1), 0, ES_64);
+    write_vec_element_i64(l1, get_field(s, v1), 1, ES_64);
+
+    tcg_temp_free_i64(l1);
+    tcg_temp_free_i64(h1);
+    tcg_temp_free_i64(l2);
+    tcg_temp_free_i64(h2);
+    return DISAS_NEXT;
+}
+
 static DisasJumpType op_vnn(DisasContext *s, DisasOps *o)
 {
     gen_gvec_fn_3(nand, ES_8, get_field(s, v1),
-- 
2.26.2




* [PATCH v1 04/20] s390x/tcg: Implement 32/128 bit for VECTOR FP ADD
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (2 preceding siblings ...)
  2020-09-30 14:55 ` [PATCH v1 03/20] s390x/tcg: Implement VECTOR MULTIPLY SUM LOGICAL David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-10-01 15:45   ` Richard Henderson
  2020-10-01 16:08   ` Richard Henderson
  2020-09-30 14:55 ` [PATCH v1 05/20] s390x/tcg: Implement 32/128 bit for VECTOR FP DIVIDE David Hildenbrand
                   ` (18 subsequent siblings)
  22 siblings, 2 replies; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Richard Henderson, Thomas Huth,
	David Hildenbrand

In the 128-bit case, we always have a single element.

Add some helpers that allow generically reading/writing 32/64/128-bit
floats. Convert the existing implementation of vop64_3 into a macro that
deals with float* instead of uint*; the other users keep working, as
    typedef uint32_t float32;
    typedef uint64_t float64;

Most of them will get converted next.
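To illustrate the element layout the generic read/write helpers have to respect, a minimal sketch with a hypothetical big-endian byte-array register (`vreg`) and a hypothetical `f128bits` pair standing in for float128's high/low halves. QEMU's S390Vector actually stores two host-order doublewords, so this only models the numbering: element n of width W starts at byte n * W / 8, and a 128-bit float is always element 0, high doubleword first.

```c
#include <stdint.h>

/* Hypothetical big-endian vector register: b[0] is the leftmost byte. */
typedef struct {
    uint8_t b[16];
} vreg;

/* Hypothetical container mirroring float128's high/low split. */
typedef struct {
    uint64_t high, low;
} f128bits;

static uint64_t read_be(const uint8_t *p, int bytes)
{
    uint64_t v = 0;

    for (int i = 0; i < bytes; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

static void write_be(uint8_t *p, int bytes, uint64_t v)
{
    for (int i = bytes - 1; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

static uint32_t vreg_read32(const vreg *v, int enr)
{
    return (uint32_t)read_be(&v->b[enr * 4], 4);
}

static uint64_t vreg_read64(const vreg *v, int enr)
{
    return read_be(&v->b[enr * 8], 8);
}

/* A 128-bit element is the whole register: no element number needed. */
static f128bits vreg_read128(const vreg *v)
{
    f128bits f = { read_be(&v->b[0], 8), read_be(&v->b[8], 8) };
    return f;
}

static void vreg_write128(vreg *v, f128bits f)
{
    write_be(&v->b[0], 8, f.high);
    write_be(&v->b[8], 8, f.low);
}
```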

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |   3 +
 target/s390x/translate_vx.c.inc |  31 +++++++--
 target/s390x/vec_fpu_helper.c   | 119 +++++++++++++++++++++-----------
 3 files changed, 107 insertions(+), 46 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index f579fd38a7..3d59f143e0 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -247,8 +247,11 @@ DEF_HELPER_6(gvec_vstrc_cc_rt16, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_6(gvec_vstrc_cc_rt32, void, ptr, cptr, cptr, cptr, env, i32)
 
 /* === Vector Floating-Point Instructions */
+DEF_HELPER_FLAGS_5(gvec_vfa32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfa32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfa64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfa64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfa128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_4(gvec_wfc64, void, cptr, cptr, env, i32)
 DEF_HELPER_4(gvec_wfk64, void, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfce64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
diff --git a/target/s390x/translate_vx.c.inc b/target/s390x/translate_vx.c.inc
index 4c1b430013..2ba2170b16 100644
--- a/target/s390x/translate_vx.c.inc
+++ b/target/s390x/translate_vx.c.inc
@@ -2504,16 +2504,27 @@ static DisasJumpType op_vfa(DisasContext *s, DisasOps *o)
     const uint8_t fpf = get_field(s, m4);
     const uint8_t m5 = get_field(s, m5);
     const bool se = extract32(m5, 3, 1);
-    gen_helper_gvec_3_ptr *fn;
-
-    if (fpf != FPF_LONG || extract32(m5, 0, 3)) {
-        gen_program_exception(s, PGM_SPECIFICATION);
-        return DISAS_NORETURN;
-    }
+    gen_helper_gvec_3_ptr *fn = NULL;
 
     switch (s->fields.op2) {
     case 0xe3:
-        fn = se ? gen_helper_gvec_vfa64s : gen_helper_gvec_vfa64;
+        switch (fpf) {
+        case FPF_SHORT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+                fn = se ? gen_helper_gvec_vfa32s : gen_helper_gvec_vfa32;
+            }
+            break;
+        case FPF_LONG:
+            fn = se ? gen_helper_gvec_vfa64s : gen_helper_gvec_vfa64;
+            break;
+        case FPF_EXT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+                fn = gen_helper_gvec_vfa128;
+            }
+            break;
+        default:
+            break;
+        }
         break;
     case 0xe5:
         fn = se ? gen_helper_gvec_vfd64s : gen_helper_gvec_vfd64;
@@ -2527,6 +2538,12 @@ static DisasJumpType op_vfa(DisasContext *s, DisasOps *o)
     default:
         g_assert_not_reached();
     }
+
+    if (!fn || extract32(m5, 0, 3)) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
     gen_gvec_3_ptr(get_field(s, v1), get_field(s, v2),
                    get_field(s, v3), cpu_env, 0, fn);
     return DISAS_NEXT;
diff --git a/target/s390x/vec_fpu_helper.c b/target/s390x/vec_fpu_helper.c
index c1564e819b..ae803ba602 100644
--- a/target/s390x/vec_fpu_helper.c
+++ b/target/s390x/vec_fpu_helper.c
@@ -78,6 +78,40 @@ static void handle_ieee_exc(CPUS390XState *env, uint8_t vxc, uint8_t vec_exc,
     }
 }
 
+static float32 s390_vec_read_float32(const S390Vector *v, uint8_t enr)
+{
+    return make_float32(s390_vec_read_element32(v, enr));
+}
+
+static float64 s390_vec_read_float64(const S390Vector *v, uint8_t enr)
+{
+    return make_float64(s390_vec_read_element64(v, enr));
+}
+
+static float128 s390_vec_read_float128(const S390Vector *v, uint8_t enr)
+{
+    g_assert(enr == 0);
+    return make_float128(s390_vec_read_element64(v, 0),
+                         s390_vec_read_element64(v, 1));
+}
+
+static void s390_vec_write_float32(S390Vector *v, uint8_t enr, float32 data)
+{
+    s390_vec_write_element32(v, enr, data);
+}
+
+static void s390_vec_write_float64(S390Vector *v, uint8_t enr, float64 data)
+{
+    s390_vec_write_element64(v, enr, data);
+}
+
+static void s390_vec_write_float128(S390Vector *v, uint8_t enr, float128 data)
+{
+    g_assert(enr == 0);
+    s390_vec_write_element64(v, 0, data.high);
+    s390_vec_write_element64(v, 1, data.low);
+}
+
 typedef uint64_t (*vop64_2_fn)(uint64_t a, float_status *s);
 static void vop64_2(S390Vector *v1, const S390Vector *v2, CPUS390XState *env,
                     bool s, bool XxC, uint8_t erm, vop64_2_fn fn,
@@ -102,45 +136,52 @@ static void vop64_2(S390Vector *v1, const S390Vector *v2, CPUS390XState *env,
     *v1 = tmp;
 }
 
-typedef uint64_t (*vop64_3_fn)(uint64_t a, uint64_t b, float_status *s);
-static void vop64_3(S390Vector *v1, const S390Vector *v2, const S390Vector *v3,
-                    CPUS390XState *env, bool s, vop64_3_fn fn,
-                    uintptr_t retaddr)
-{
-    uint8_t vxc, vec_exc = 0;
-    S390Vector tmp = {};
-    int i;
-
-    for (i = 0; i < 2; i++) {
-        const uint64_t a = s390_vec_read_element64(v2, i);
-        const uint64_t b = s390_vec_read_element64(v3, i);
-
-        s390_vec_write_element64(&tmp, i, fn(a, b, &env->fpu_status));
-        vxc = check_ieee_exc(env, i, false, &vec_exc);
-        if (s || vxc) {
-            break;
-        }
-    }
-    handle_ieee_exc(env, vxc, vec_exc, retaddr);
-    *v1 = tmp;
-}
-
-static uint64_t vfa64(uint64_t a, uint64_t b, float_status *s)
-{
-    return float64_add(a, b, s);
-}
-
-void HELPER(gvec_vfa64)(void *v1, const void *v2, const void *v3,
-                        CPUS390XState *env, uint32_t desc)
-{
-    vop64_3(v1, v2, v3, env, false, vfa64, GETPC());
-}
-
-void HELPER(gvec_vfa64s)(void *v1, const void *v2, const void *v3,
-                         CPUS390XState *env, uint32_t desc)
-{
-    vop64_3(v1, v2, v3, env, true, vfa64, GETPC());
-}
+#define DEF_VOP_3(BITS)                                                        \
+typedef float##BITS (*vop##BITS##_3_fn)(float##BITS a, float##BITS b,          \
+                                        float_status *s);                      \
+static void vop##BITS##_3(S390Vector *v1, const S390Vector *v2,                \
+                          const S390Vector *v3, CPUS390XState *env, bool s,    \
+                          vop##BITS##_3_fn fn, uintptr_t retaddr)              \
+{                                                                              \
+    uint8_t vxc, vec_exc = 0;                                                  \
+    S390Vector tmp = {};                                                       \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const float##BITS a = s390_vec_read_float##BITS(v2, i);                \
+        const float##BITS b = s390_vec_read_float##BITS(v3, i);                \
+                                                                               \
+        s390_vec_write_float##BITS(&tmp, i, fn(a, b, &env->fpu_status));       \
+        vxc = check_ieee_exc(env, i, false, &vec_exc);                         \
+        if (s || vxc) {                                                        \
+            break;                                                             \
+        }                                                                      \
+    }                                                                          \
+    handle_ieee_exc(env, vxc, vec_exc, retaddr);                               \
+    *v1 = tmp;                                                                 \
+}
+DEF_VOP_3(32)
+DEF_VOP_3(64)
+DEF_VOP_3(128)
+
+#define DEF_GVEC_FVA(BITS)                                                     \
+void HELPER(gvec_vfa##BITS)(void *v1, const void *v2, const void *v3,          \
+                            CPUS390XState *env, uint32_t desc)                 \
+{                                                                              \
+    vop##BITS##_3(v1, v2, v3, env, false, float##BITS##_add, GETPC());         \
+}
+DEF_GVEC_FVA(32)
+DEF_GVEC_FVA(64)
+DEF_GVEC_FVA(128)
+
+#define DEF_GVEC_FVA_S(BITS)                                                   \
+void HELPER(gvec_vfa##BITS##s)(void *v1, const void *v2, const void *v3,       \
+                               CPUS390XState *env, uint32_t desc)              \
+{                                                                              \
+    vop##BITS##_3(v1, v2, v3, env, true, float##BITS##_add, GETPC());          \
+}
+DEF_GVEC_FVA_S(32)
+DEF_GVEC_FVA_S(64)
 
 static int wfc64(const S390Vector *v1, const S390Vector *v2,
                  CPUS390XState *env, bool signal, uintptr_t retaddr)
-- 
2.26.2




* [PATCH v1 05/20] s390x/tcg: Implement 32/128 bit for VECTOR FP DIVIDE
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (3 preceding siblings ...)
  2020-09-30 14:55 ` [PATCH v1 04/20] s390x/tcg: Implement 32/128 bit for VECTOR FP ADD David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-09-30 14:55 ` [PATCH v1 06/20] s390x/tcg: Implement 32/128 bit for VECTOR FP MULTIPLY David Hildenbrand
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Richard Henderson, Thomas Huth,
	David Hildenbrand

Just like VECTOR FP ADD.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  3 +++
 target/s390x/translate_vx.c.inc | 18 +++++++++++++++++-
 target/s390x/vec_fpu_helper.c   | 28 +++++++++++++++-------------
 3 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 3d59f143e0..3dfd480fc1 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -274,8 +274,11 @@ DEF_HELPER_FLAGS_4(gvec_vcgd64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vcgd64s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vclgd64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vclgd64s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfd32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfd32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfd64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfd64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfd128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfi64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfi64s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfll32, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
diff --git a/target/s390x/translate_vx.c.inc b/target/s390x/translate_vx.c.inc
index 2ba2170b16..ea1b2732bc 100644
--- a/target/s390x/translate_vx.c.inc
+++ b/target/s390x/translate_vx.c.inc
@@ -2527,7 +2527,23 @@ static DisasJumpType op_vfa(DisasContext *s, DisasOps *o)
         }
         break;
     case 0xe5:
-        fn = se ? gen_helper_gvec_vfd64s : gen_helper_gvec_vfd64;
+        switch (fpf) {
+        case FPF_SHORT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+                fn = se ? gen_helper_gvec_vfd32s : gen_helper_gvec_vfd32;
+            }
+            break;
+        case FPF_LONG:
+            fn = se ? gen_helper_gvec_vfd64s : gen_helper_gvec_vfd64;
+            break;
+        case FPF_EXT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+                fn = gen_helper_gvec_vfd128;
+            }
+            break;
+        default:
+            break;
+        }
         break;
     case 0xe7:
         fn = se ? gen_helper_gvec_vfm64s : gen_helper_gvec_vfm64;
diff --git a/target/s390x/vec_fpu_helper.c b/target/s390x/vec_fpu_helper.c
index ae803ba602..cfa143b62a 100644
--- a/target/s390x/vec_fpu_helper.c
+++ b/target/s390x/vec_fpu_helper.c
@@ -411,22 +411,24 @@ void HELPER(gvec_vclgd64s)(void *v1, const void *v2, CPUS390XState *env,
     vop64_2(v1, v2, env, true, XxC, erm, vclgd64, GETPC());
 }
 
-static uint64_t vfd64(uint64_t a, uint64_t b, float_status *s)
-{
-    return float64_div(a, b, s);
-}
-
-void HELPER(gvec_vfd64)(void *v1, const void *v2, const void *v3,
-                        CPUS390XState *env, uint32_t desc)
-{
-    vop64_3(v1, v2, v3, env, false, vfd64, GETPC());
+#define DEF_GVEC_FVD(BITS)                                                     \
+void HELPER(gvec_vfd##BITS)(void *v1, const void *v2, const void *v3,          \
+                            CPUS390XState *env, uint32_t desc)                 \
+{                                                                              \
+    vop##BITS##_3(v1, v2, v3, env, false, float##BITS##_div, GETPC());         \
 }
+DEF_GVEC_FVD(32)
+DEF_GVEC_FVD(64)
+DEF_GVEC_FVD(128)
 
-void HELPER(gvec_vfd64s)(void *v1, const void *v2, const void *v3,
-                         CPUS390XState *env, uint32_t desc)
-{
-    vop64_3(v1, v2, v3, env, true, vfd64, GETPC());
+#define DEF_GVEC_FVD_S(BITS)                                                   \
+void HELPER(gvec_vfd##BITS##s)(void *v1, const void *v2, const void *v3,       \
+                               CPUS390XState *env, uint32_t desc)              \
+{                                                                              \
+    vop##BITS##_3(v1, v2, v3, env, true, float##BITS##_div, GETPC());          \
 }
+DEF_GVEC_FVD_S(32)
+DEF_GVEC_FVD_S(64)
 
 static uint64_t vfi64(uint64_t a, float_status *s)
 {
-- 
2.26.2
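
DEF_GVEC_FVD and DEF_GVEC_FVD_S generate one helper per element width
by pasting BITS into both the helper name and the softfloat call. The
standalone sketch below shows the same token-pasting technique under
stated assumptions. Plain C float/double division stands in for
float32_div()/float64_div(), so there is no float_status and no IEEE
exception handling. The f32/f64 typedefs and the vec_div##BITS() names
are hypothetical. The 128-bit case is omitted because C has no portable
binary128 type. The single-element handling (compute element 0 and leave
the rest zeroed) mirrors the "S390Vector tmp = {}" and
"if (s || vxc) break;" pattern visible in vfc##BITS() later in this
series.

```c
#include <string.h>

/* Hypothetical stand-ins for softfloat's float32/float64 types and ops. */
typedef float f32;
typedef double f64;

static f32 f32_div(f32 a, f32 b) { return a / b; }
static f64 f64_div(f64 a, f64 b) { return a / b; }

/*
 * Generate vec_div32() and vec_div64(). Each divides a 128-bit "vector"
 * holding 128 / BITS elements. With single != 0, only element 0 is
 * computed and the remaining result elements stay zero. Results go
 * through tmp, so dst may alias a or b.
 */
#define DEF_VEC_DIV(BITS)                                              \
static void vec_div##BITS(f##BITS *dst, const f##BITS *a,              \
                          const f##BITS *b, int single)                \
{                                                                      \
    f##BITS tmp[128 / BITS];                                           \
                                                                       \
    memset(tmp, 0, sizeof(tmp));                                       \
    for (int i = 0; i < 128 / BITS; i++) {                             \
        tmp[i] = f##BITS##_div(a[i], b[i]);                            \
        if (single) {                                                  \
            break; /* single-element form: element 0 only */           \
        }                                                              \
    }                                                                  \
    memcpy(dst, tmp, sizeof(tmp));                                     \
}
DEF_VEC_DIV(32)
DEF_VEC_DIV(64)
```

Adding another width then only needs another DEF_VEC_DIV(n) line plus
matching fn/fnn_div definitions, which is the trade-off the cover letter
describes for the heavy macro use.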




* [PATCH v1 06/20] s390x/tcg: Implement 32/128 bit for VECTOR FP MULTIPLY
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (4 preceding siblings ...)
  2020-09-30 14:55 ` [PATCH v1 05/20] s390x/tcg: Implement 32/128 bit for VECTOR FP DIVIDE David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-09-30 14:55 ` [PATCH v1 07/20] s390x/tcg: Implement 32/128 bit for VECTOR FP SUBTRACT David Hildenbrand
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Richard Henderson, Thomas Huth,
	David Hildenbrand

Just like VECTOR FP ADD: add 32-bit and 128-bit helpers, which are only
available with the vector-enhancements facility.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  3 +++
 target/s390x/translate_vx.c.inc | 18 +++++++++++++++++-
 target/s390x/vec_fpu_helper.c   | 28 +++++++++++++++-------------
 3 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 3dfd480fc1..a7a902ed9c 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -285,8 +285,11 @@ DEF_HELPER_FLAGS_4(gvec_vfll32, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfll32s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vflr64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vflr64s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfm32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfm32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfm64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfm64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfm128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_6(gvec_vfma64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_6(gvec_vfma64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_6(gvec_vfms64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
diff --git a/target/s390x/translate_vx.c.inc b/target/s390x/translate_vx.c.inc
index ea1b2732bc..65385ce5ee 100644
--- a/target/s390x/translate_vx.c.inc
+++ b/target/s390x/translate_vx.c.inc
@@ -2546,7 +2546,23 @@ static DisasJumpType op_vfa(DisasContext *s, DisasOps *o)
         }
         break;
     case 0xe7:
-        fn = se ? gen_helper_gvec_vfm64s : gen_helper_gvec_vfm64;
+        switch (fpf) {
+        case FPF_SHORT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+                fn = se ? gen_helper_gvec_vfm32s : gen_helper_gvec_vfm32;
+            }
+            break;
+        case FPF_LONG:
+            fn = se ? gen_helper_gvec_vfm64s : gen_helper_gvec_vfm64;
+            break;
+        case FPF_EXT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+                fn = gen_helper_gvec_vfm128;
+            }
+            break;
+        default:
+            break;
+        }
         break;
     case 0xe2:
         fn = se ? gen_helper_gvec_vfs64s : gen_helper_gvec_vfs64;
diff --git a/target/s390x/vec_fpu_helper.c b/target/s390x/vec_fpu_helper.c
index cfa143b62a..335d540622 100644
--- a/target/s390x/vec_fpu_helper.c
+++ b/target/s390x/vec_fpu_helper.c
@@ -531,22 +531,24 @@ void HELPER(gvec_vflr64s)(void *v1, const void *v2, CPUS390XState *env,
     vflr64(v1, v2, env, true, XxC, erm, GETPC());
 }
 
-static uint64_t vfm64(uint64_t a, uint64_t b, float_status *s)
-{
-    return float64_mul(a, b, s);
-}
-
-void HELPER(gvec_vfm64)(void *v1, const void *v2, const void *v3,
-                        CPUS390XState *env, uint32_t desc)
-{
-    vop64_3(v1, v2, v3, env, false, vfm64, GETPC());
+#define DEF_GVEC_FVM(BITS)                                                     \
+void HELPER(gvec_vfm##BITS)(void *v1, const void *v2, const void *v3,          \
+                            CPUS390XState *env, uint32_t desc)                 \
+{                                                                              \
+    vop##BITS##_3(v1, v2, v3, env, false, float##BITS##_mul, GETPC());         \
 }
+DEF_GVEC_FVM(32)
+DEF_GVEC_FVM(64)
+DEF_GVEC_FVM(128)
 
-void HELPER(gvec_vfm64s)(void *v1, const void *v2, const void *v3,
-                         CPUS390XState *env, uint32_t desc)
-{
-    vop64_3(v1, v2, v3, env, true, vfm64, GETPC());
+#define DEF_GVEC_FVM_S(BITS)                                                   \
+void HELPER(gvec_vfm##BITS##s)(void *v1, const void *v2, const void *v3,       \
+                               CPUS390XState *env, uint32_t desc)              \
+{                                                                              \
+    vop##BITS##_3(v1, v2, v3, env, true, float##BITS##_mul, GETPC());          \
 }
+DEF_GVEC_FVM_S(32)
+DEF_GVEC_FVM_S(64)
 
 static void vfma64(S390Vector *v1, const S390Vector *v2, const S390Vector *v3,
                    const S390Vector *v4, CPUS390XState *env, bool s, int flags,
-- 
2.26.2
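
Each opcode in op_vfa() now switches on the FP format. The long format
stays available unconditionally, short and extended require the
vector-enhancements facility, and anything else leaves fn NULL. A
minimal sketch of that selection logic in isolation follows. It assumes
FPF_SHORT/FPF_LONG/FPF_EXT are 2/3/4 and that the caller turns a NULL
result into a specification exception, as op_wfc() and op_vfc() in this
series do. The vfm* functions and has_vector_enh (standing in for
s390_has_feat(S390_FEAT_VECTOR_ENH)) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed encodings of the FP-format field. */
enum { FPF_SHORT = 2, FPF_LONG = 3, FPF_EXT = 4 };

/* Hypothetical helpers; each returns an ID so the selection can be checked. */
typedef int (*helper_fn)(void);
static int vfm32(void)  { return 32; }
static int vfm32s(void) { return 33; }
static int vfm64(void)  { return 64; }
static int vfm64s(void) { return 65; }
static int vfm128(void) { return 128; }

/*
 * Pick the helper for a format and the single-element (se) bit.
 * NULL tells the caller to raise a specification exception.
 */
static helper_fn select_vfm(int fpf, bool se, bool has_vector_enh)
{
    helper_fn fn = NULL;

    switch (fpf) {
    case FPF_SHORT:
        if (has_vector_enh) {
            fn = se ? vfm32s : vfm32;
        }
        break;
    case FPF_LONG:
        /* part of the base vector facility, no feature check needed */
        fn = se ? vfm64s : vfm64;
        break;
    case FPF_EXT:
        /* one 128-bit element per register, so se selects nothing different */
        if (has_vector_enh) {
            fn = vfm128;
        }
        break;
    default:
        break;
    }
    return fn;
}
```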




* [PATCH v1 07/20] s390x/tcg: Implement 32/128 bit for VECTOR FP SUBTRACT
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (5 preceding siblings ...)
  2020-09-30 14:55 ` [PATCH v1 06/20] s390x/tcg: Implement 32/128 bit for VECTOR FP MULTIPLY David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-09-30 14:55 ` [PATCH v1 08/20] s390x/tcg: Implement 32/128 bit for VECTOR FP COMPARE (AND SIGNAL) SCALAR David Hildenbrand
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Richard Henderson, Thomas Huth,
	David Hildenbrand

Just like VECTOR FP ADD: add 32-bit and 128-bit helpers, which are only
available with the vector-enhancements facility.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  3 +++
 target/s390x/translate_vx.c.inc | 18 +++++++++++++++++-
 target/s390x/vec_fpu_helper.c   | 28 +++++++++++++++-------------
 3 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index a7a902ed9c..ab41555764 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -296,8 +296,11 @@ DEF_HELPER_FLAGS_6(gvec_vfms64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env
 DEF_HELPER_FLAGS_6(gvec_vfms64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfsq64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfsq64s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfs32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfs32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfs64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfs64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfs128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_4(gvec_vftci64, void, ptr, cptr, env, i32)
 DEF_HELPER_4(gvec_vftci64s, void, ptr, cptr, env, i32)
 
diff --git a/target/s390x/translate_vx.c.inc b/target/s390x/translate_vx.c.inc
index 65385ce5ee..91d4e74a68 100644
--- a/target/s390x/translate_vx.c.inc
+++ b/target/s390x/translate_vx.c.inc
@@ -2565,7 +2565,23 @@ static DisasJumpType op_vfa(DisasContext *s, DisasOps *o)
         }
         break;
     case 0xe2:
-        fn = se ? gen_helper_gvec_vfs64s : gen_helper_gvec_vfs64;
+        switch (fpf) {
+        case FPF_SHORT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+                fn = se ? gen_helper_gvec_vfs32s : gen_helper_gvec_vfs32;
+            }
+            break;
+        case FPF_LONG:
+            fn = se ? gen_helper_gvec_vfs64s : gen_helper_gvec_vfs64;
+            break;
+        case FPF_EXT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+                fn = gen_helper_gvec_vfs128;
+            }
+            break;
+        default:
+            break;
+        }
         break;
     default:
         g_assert_not_reached();
diff --git a/target/s390x/vec_fpu_helper.c b/target/s390x/vec_fpu_helper.c
index 335d540622..799e7f793e 100644
--- a/target/s390x/vec_fpu_helper.c
+++ b/target/s390x/vec_fpu_helper.c
@@ -615,22 +615,24 @@ void HELPER(gvec_vfsq64s)(void *v1, const void *v2, CPUS390XState *env,
     vop64_2(v1, v2, env, true, false, 0, vfsq64, GETPC());
 }
 
-static uint64_t vfs64(uint64_t a, uint64_t b, float_status *s)
-{
-    return float64_sub(a, b, s);
-}
-
-void HELPER(gvec_vfs64)(void *v1, const void *v2, const void *v3,
-                        CPUS390XState *env, uint32_t desc)
-{
-    vop64_3(v1, v2, v3, env, false, vfs64, GETPC());
+#define DEF_GVEC_FVS(BITS)                                                     \
+void HELPER(gvec_vfs##BITS)(void *v1, const void *v2, const void *v3,          \
+                            CPUS390XState *env, uint32_t desc)                 \
+{                                                                              \
+    vop##BITS##_3(v1, v2, v3, env, false, float##BITS##_sub, GETPC());         \
 }
+DEF_GVEC_FVS(32)
+DEF_GVEC_FVS(64)
+DEF_GVEC_FVS(128)
 
-void HELPER(gvec_vfs64s)(void *v1, const void *v2, const void *v3,
-                         CPUS390XState *env, uint32_t desc)
-{
-    vop64_3(v1, v2, v3, env, true, vfs64, GETPC());
+#define DEF_GVEC_FVS_S(BITS)                                                   \
+void HELPER(gvec_vfs##BITS##s)(void *v1, const void *v2, const void *v3,       \
+                               CPUS390XState *env, uint32_t desc)              \
+{                                                                              \
+    vop##BITS##_3(v1, v2, v3, env, true, float##BITS##_sub, GETPC());          \
 }
+DEF_GVEC_FVS_S(32)
+DEF_GVEC_FVS_S(64)
 
 static int vftci64(S390Vector *v1, const S390Vector *v2, CPUS390XState *env,
                    bool s, uint16_t i3)
-- 
2.26.2




* [PATCH v1 08/20] s390x/tcg: Implement 32/128 bit for VECTOR FP COMPARE (AND SIGNAL) SCALAR
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (6 preceding siblings ...)
  2020-09-30 14:55 ` [PATCH v1 07/20] s390x/tcg: Implement 32/128 bit for VECTOR FP SUBTRACT David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-10-01 15:52   ` Richard Henderson
  2020-09-30 14:55 ` [PATCH v1 09/20] s390x/tcg: Implement 32/128 bit for VECTOR FP COMPARE * David Hildenbrand
                   ` (14 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Richard Henderson, Thomas Huth,
	David Hildenbrand

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  4 +++
 target/s390x/translate_vx.c.inc | 38 +++++++++++++++-----
 target/s390x/vec_fpu_helper.c   | 64 +++++++++++++++++++--------------
 3 files changed, 72 insertions(+), 34 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index ab41555764..6bf4d3e7d0 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -252,8 +252,12 @@ DEF_HELPER_FLAGS_5(gvec_vfa32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfa64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfa64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfa128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_4(gvec_wfc32, void, cptr, cptr, env, i32)
+DEF_HELPER_4(gvec_wfk32, void, cptr, cptr, env, i32)
 DEF_HELPER_4(gvec_wfc64, void, cptr, cptr, env, i32)
 DEF_HELPER_4(gvec_wfk64, void, cptr, cptr, env, i32)
+DEF_HELPER_4(gvec_wfc128, void, cptr, cptr, env, i32)
+DEF_HELPER_4(gvec_wfk128, void, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfce64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfce64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_5(gvec_vfce64_cc, void, ptr, cptr, cptr, env, i32)
diff --git a/target/s390x/translate_vx.c.inc b/target/s390x/translate_vx.c.inc
index 91d4e74a68..cc745784e5 100644
--- a/target/s390x/translate_vx.c.inc
+++ b/target/s390x/translate_vx.c.inc
@@ -2601,19 +2601,41 @@ static DisasJumpType op_wfc(DisasContext *s, DisasOps *o)
 {
     const uint8_t fpf = get_field(s, m3);
     const uint8_t m4 = get_field(s, m4);
+    gen_helper_gvec_2_ptr *fn = NULL;
 
-    if (fpf != FPF_LONG || m4) {
+    switch (fpf) {
+    case FPF_SHORT:
+        if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+            fn = gen_helper_gvec_wfk32;
+            if (s->fields.op2 == 0xcb) {
+                fn = gen_helper_gvec_wfc32;
+            }
+        }
+        break;
+    case FPF_LONG:
+        fn = gen_helper_gvec_wfk64;
+        if (s->fields.op2 == 0xcb) {
+            fn = gen_helper_gvec_wfc64;
+        }
+        break;
+    case FPF_EXT:
+        if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+            fn = gen_helper_gvec_wfk128;
+            if (s->fields.op2 == 0xcb) {
+                fn = gen_helper_gvec_wfc128;
+            }
+        }
+        break;
+    default:
+        break;
+    }
+
+    if (!fn || m4) {
         gen_program_exception(s, PGM_SPECIFICATION);
         return DISAS_NORETURN;
     }
 
-    if (s->fields.op2 == 0xcb) {
-        gen_gvec_2_ptr(get_field(s, v1), get_field(s, v2),
-                       cpu_env, 0, gen_helper_gvec_wfc64);
-    } else {
-        gen_gvec_2_ptr(get_field(s, v1), get_field(s, v2),
-                       cpu_env, 0, gen_helper_gvec_wfk64);
-    }
+    gen_gvec_2_ptr(get_field(s, v1), get_field(s, v2), cpu_env, 0, fn);
     set_cc_static(s);
     return DISAS_NEXT;
 }
diff --git a/target/s390x/vec_fpu_helper.c b/target/s390x/vec_fpu_helper.c
index 799e7f793e..1b78b6c088 100644
--- a/target/s390x/vec_fpu_helper.c
+++ b/target/s390x/vec_fpu_helper.c
@@ -183,37 +183,49 @@ void HELPER(gvec_vfa##BITS##s)(void *v1, const void *v2, const void *v3,       \
 DEF_GVEC_FVA_S(32)
 DEF_GVEC_FVA_S(64)
 
-static int wfc64(const S390Vector *v1, const S390Vector *v2,
-                 CPUS390XState *env, bool signal, uintptr_t retaddr)
-{
-    /* only the zero-indexed elements are compared */
-    const float64 a = s390_vec_read_element64(v1, 0);
-    const float64 b = s390_vec_read_element64(v2, 0);
-    uint8_t vxc, vec_exc = 0;
-    int cmp;
-
-    if (signal) {
-        cmp = float64_compare(a, b, &env->fpu_status);
-    } else {
-        cmp = float64_compare_quiet(a, b, &env->fpu_status);
-    }
-    vxc = check_ieee_exc(env, 0, false, &vec_exc);
-    handle_ieee_exc(env, vxc, vec_exc, retaddr);
-
-    return float_comp_to_cc(env, cmp);
+#define DEF_WFC(BITS)                                                          \
+static int wfc##BITS(const S390Vector *v1, const S390Vector *v2,               \
+                     CPUS390XState *env, bool signal, uintptr_t retaddr)       \
+{                                                                              \
+    /* only the zero-indexed elements are compared */                          \
+    const float##BITS a = s390_vec_read_float##BITS(v1, 0);                    \
+    const float##BITS b = s390_vec_read_float##BITS(v2, 0);                    \
+    uint8_t vxc, vec_exc = 0;                                                  \
+    int cmp;                                                                   \
+                                                                               \
+    if (signal) {                                                              \
+        cmp = float##BITS##_compare(a, b, &env->fpu_status);                   \
+    } else {                                                                   \
+        cmp = float##BITS##_compare_quiet(a, b, &env->fpu_status);             \
+    }                                                                          \
+    vxc = check_ieee_exc(env, 0, false, &vec_exc);                             \
+    handle_ieee_exc(env, vxc, vec_exc, retaddr);                               \
+                                                                               \
+    return float_comp_to_cc(env, cmp);                                         \
 }
+DEF_WFC(32)
+DEF_WFC(64)
+DEF_WFC(128)
 
-void HELPER(gvec_wfc64)(const void *v1, const void *v2, CPUS390XState *env,
-                        uint32_t desc)
-{
-    env->cc_op = wfc64(v1, v2, env, false, GETPC());
+#define DEF_GVEC_WFC(BITS)                                                     \
+void HELPER(gvec_wfc##BITS)(const void *v1, const void *v2, CPUS390XState *env,\
+                            uint32_t desc)                                     \
+{                                                                              \
+    env->cc_op = wfc##BITS(v1, v2, env, false, GETPC());                       \
 }
+DEF_GVEC_WFC(32)
+DEF_GVEC_WFC(64)
+DEF_GVEC_WFC(128)
 
-void HELPER(gvec_wfk64)(const void *v1, const void *v2, CPUS390XState *env,
-                        uint32_t desc)
-{
-    env->cc_op = wfc64(v1, v2, env, true, GETPC());
+#define DEF_GVEC_WFK(BITS)                                                     \
+void HELPER(gvec_wfk##BITS)(const void *v1, const void *v2, CPUS390XState *env,\
+                            uint32_t desc)                                     \
+{                                                                              \
+    env->cc_op = wfc##BITS(v1, v2, env, true, GETPC());                        \
 }
+DEF_GVEC_WFK(32)
+DEF_GVEC_WFK(64)
+DEF_GVEC_WFK(128)
 
 typedef bool (*vfc64_fn)(float64 a, float64 b, float_status *status);
 static int vfc64(S390Vector *v1, const S390Vector *v2, const S390Vector *v3,
-- 
2.26.2
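
DEF_WFC compares only element 0 of each operand and converts the
softfloat relation into a condition code with float_comp_to_cc().
Assuming that conversion uses the usual s390x mapping (equal 0, first
operand low 1, first operand high 2, unordered 3), the sketch below
reproduces it with C99's quiet comparison macros in place of
float##BITS##_compare_quiet(). It does not model WFK's signaling
behaviour (invalid operation on any NaN operand) or the 128-bit width,
and wfc_cc##BITS() is a hypothetical name.

```c
#include <math.h>

/*
 * Generate wfc_cc32() and wfc_cc64(): compare element 0 of two operands
 * and return the condition code. isunordered/isless/isgreater are quiet
 * comparisons, so QNaN operands do not raise FE_INVALID.
 */
#define DEF_WFC_CC(BITS, TYPE)                                         \
static int wfc_cc##BITS(TYPE a, TYPE b)                                \
{                                                                      \
    if (isunordered(a, b)) {                                           \
        return 3; /* at least one operand is a NaN */                  \
    } else if (isless(a, b)) {                                         \
        return 1; /* first operand low */                              \
    } else if (isgreater(a, b)) {                                      \
        return 2; /* first operand high */                             \
    }                                                                  \
    return 0; /* equal, including -0 vs +0 */                          \
}
DEF_WFC_CC(32, float)
DEF_WFC_CC(64, double)
```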




* [PATCH v1 09/20] s390x/tcg: Implement 32/128 bit for VECTOR FP COMPARE *
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (7 preceding siblings ...)
  2020-09-30 14:55 ` [PATCH v1 08/20] s390x/tcg: Implement 32/128 bit for VECTOR FP COMPARE (AND SIGNAL) SCALAR David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-10-01 16:12   ` Richard Henderson
  2020-09-30 14:55 ` [PATCH v1 10/20] s390x/tcg: Implement 32/128 bit for VECTOR LOAD FP INTEGER David Hildenbrand
                   ` (13 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Richard Henderson, Thomas Huth,
	David Hildenbrand

In addition to the 32/128-bit variants, we also have to support the
"Signal-on-QNaN (SQ)" bit. Pass it to the helpers as a simple flag in
the descriptor data; I don't feel like duplicating all helpers and
coming up with names like "...s_cc_sq".

Signed-off-by: David Hildenbrand <david@redhat.com>
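
Because SQ only changes which predicate the helper calls, it can ride
along in the gvec descriptor (read back with simd_data(desc)) instead of
doubling the number of helpers. A minimal sketch of that choice together
with the condition-code logic of vfc##BITS(), for two 64-bit elements
only: plain doubles replace softfloat, and eq_signaling() and
invalid_flag are hypothetical stand-ins for float64_eq() and the
softfloat exception flags. SNaN handling is not modeled, and unlike the
real helper the loop does not stop early when an IEEE exception is
detected.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

typedef bool (*vfc_fn)(double a, double b);

/* Quiet predicate: '==' is a quiet comparison in C. */
static bool eq_quiet(double a, double b) { return a == b; }

/* Hypothetical signaling predicate: records "invalid" for any NaN. */
static bool invalid_flag;
static bool eq_signaling(double a, double b)
{
    if (isnan(a) || isnan(b)) {
        invalid_flag = true;
    }
    return a == b;
}

/*
 * Compare two vectors of two doubles. Matching elements become all ones
 * in mask, others zero. Returns 0 if every compared element matched
 * (only element 0 when single), 1 if some matched, 3 if none matched.
 */
static int vfc64_sketch(uint64_t mask[2], const double a[2],
                        const double b[2], bool single, bool sq)
{
    /* one flag selects the predicate instead of a separate helper */
    vfc_fn fn = sq ? eq_signaling : eq_quiet;
    int match = 0;

    mask[0] = mask[1] = 0;
    for (int i = 0; i < 2; i++) {
        if (fn(a[i], b[i])) {
            match++;
            mask[i] = UINT64_MAX;
        }
        if (single) {
            break;
        }
    }
    if (match) {
        return single || match == 2 ? 0 : 1;
    }
    return 3;
}
```

The all-ones mask corresponds to float64_ones in the patch, and the
return value follows the "s || match == (128 / BITS) ? 0 : 1" / 3 logic
of vfc##BITS().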
---
 target/s390x/helper.h           |  18 +++
 target/s390x/translate_vx.c.inc |  91 +++++++++---
 target/s390x/vec_fpu_helper.c   | 250 ++++++++++++++++++++++----------
 3 files changed, 258 insertions(+), 101 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 6bf4d3e7d0..538d55420b 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -258,18 +258,36 @@ DEF_HELPER_4(gvec_wfc64, void, cptr, cptr, env, i32)
 DEF_HELPER_4(gvec_wfk64, void, cptr, cptr, env, i32)
 DEF_HELPER_4(gvec_wfc128, void, cptr, cptr, env, i32)
 DEF_HELPER_4(gvec_wfk128, void, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfce32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfce32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfce64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfce64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfce128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_5(gvec_vfce32_cc, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_5(gvec_vfce32s_cc, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_5(gvec_vfce64_cc, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_5(gvec_vfce64s_cc, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_5(gvec_vfce128_cc, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfch32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfch32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfch64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfch64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfch128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_5(gvec_vfch32_cc, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_5(gvec_vfch32s_cc, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_5(gvec_vfch64_cc, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_5(gvec_vfch64s_cc, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_5(gvec_vfch128_cc, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfche32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfche32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfche64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfche64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfche128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_5(gvec_vfche32_cc, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_5(gvec_vfche32s_cc, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_5(gvec_vfche64_cc, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_5(gvec_vfche64s_cc, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_5(gvec_vfche128_cc, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vcdg64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vcdg64s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vcdlg64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
diff --git a/target/s390x/translate_vx.c.inc b/target/s390x/translate_vx.c.inc
index cc745784e5..fd1cd6f6d5 100644
--- a/target/s390x/translate_vx.c.inc
+++ b/target/s390x/translate_vx.c.inc
@@ -2646,45 +2646,90 @@ static DisasJumpType op_vfc(DisasContext *s, DisasOps *o)
     const uint8_t m5 = get_field(s, m5);
     const uint8_t m6 = get_field(s, m6);
     const bool se = extract32(m5, 3, 1);
+    const bool sq = extract32(m5, 2, 1);
     const bool cs = extract32(m6, 0, 1);
-    gen_helper_gvec_3_ptr *fn;
-
-    if (fpf != FPF_LONG || extract32(m5, 0, 3) || extract32(m6, 1, 3)) {
-        gen_program_exception(s, PGM_SPECIFICATION);
-        return DISAS_NORETURN;
-    }
+    gen_helper_gvec_3_ptr *fn = NULL;
 
-    if (cs) {
-        switch (s->fields.op2) {
-        case 0xe8:
-            fn = se ? gen_helper_gvec_vfce64s_cc : gen_helper_gvec_vfce64_cc;
+    switch (s->fields.op2) {
+    case 0xe8:
+        switch (fpf) {
+        case FPF_SHORT:
+            fn = se ? gen_helper_gvec_vfce32s : gen_helper_gvec_vfce32;
+            if (cs) {
+                fn = se ? gen_helper_gvec_vfce32s_cc :
+                          gen_helper_gvec_vfce32_cc;
+            }
             break;
-        case 0xeb:
-            fn = se ? gen_helper_gvec_vfch64s_cc : gen_helper_gvec_vfch64_cc;
+        case FPF_LONG:
+            fn = se ? gen_helper_gvec_vfce64s : gen_helper_gvec_vfce64;
+            if (cs) {
+                fn = se ? gen_helper_gvec_vfce64s_cc :
+                          gen_helper_gvec_vfce64_cc;
+            }
             break;
-        case 0xea:
-            fn = se ? gen_helper_gvec_vfche64s_cc : gen_helper_gvec_vfche64_cc;
+        case FPF_EXT:
+            fn = cs ? gen_helper_gvec_vfce128_cc : gen_helper_gvec_vfce128;
             break;
         default:
-            g_assert_not_reached();
+            break;
         }
-    } else {
-        switch (s->fields.op2) {
-        case 0xe8:
-            fn = se ? gen_helper_gvec_vfce64s : gen_helper_gvec_vfce64;
+        break;
+    case 0xeb:
+        switch (fpf) {
+        case FPF_SHORT:
+            fn = se ? gen_helper_gvec_vfch32s : gen_helper_gvec_vfch32;
+            if (cs) {
+                fn = se ? gen_helper_gvec_vfch32s_cc : gen_helper_gvec_vfch32_cc;
+            }
             break;
-        case 0xeb:
+        case FPF_LONG:
             fn = se ? gen_helper_gvec_vfch64s : gen_helper_gvec_vfch64;
+            if (cs) {
+                fn = se ? gen_helper_gvec_vfch64s_cc : gen_helper_gvec_vfch64_cc;
+            }
+            break;
+        case FPF_EXT:
+            fn = cs ? gen_helper_gvec_vfch128_cc : gen_helper_gvec_vfch128;
             break;
-        case 0xea:
+        default:
+            break;
+        }
+        break;
+    case 0xea:
+        switch (fpf) {
+        case FPF_SHORT:
+            fn = se ? gen_helper_gvec_vfche32s : gen_helper_gvec_vfche32;
+            if (cs) {
+                fn = se ? gen_helper_gvec_vfche32s_cc :
+                          gen_helper_gvec_vfche32_cc;
+            }
+            break;
+        case FPF_LONG:
             fn = se ? gen_helper_gvec_vfche64s : gen_helper_gvec_vfche64;
+            if (cs) {
+                fn = se ? gen_helper_gvec_vfche64s_cc :
+                          gen_helper_gvec_vfche64_cc;
+            }
+            break;
+        case FPF_EXT:
+            fn = cs ? gen_helper_gvec_vfche128_cc : gen_helper_gvec_vfche128;
             break;
         default:
-            g_assert_not_reached();
+            break;
         }
+    default:
+        break;
     }
+
+    if (!fn || extract32(m5, 0, 2) || extract32(m6, 1, 3) ||
+        (!s390_has_feat(S390_FEAT_VECTOR_ENH) && (fpf != FPF_LONG || sq))) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    /* Pass the "sq" flag as data. */
     gen_gvec_3_ptr(get_field(s, v1), get_field(s, v2),
-                   get_field(s, v3), cpu_env, 0, fn);
+                   get_field(s, v3), cpu_env, sq, fn);
     if (cs) {
         set_cc_static(s);
     }
diff --git a/target/s390x/vec_fpu_helper.c b/target/s390x/vec_fpu_helper.c
index 1b78b6c088..e8ae608da6 100644
--- a/target/s390x/vec_fpu_helper.c
+++ b/target/s390x/vec_fpu_helper.c
@@ -20,6 +20,10 @@
 #include "exec/helper-proto.h"
 #include "fpu/softfloat.h"
 
+const float32 float32_ones = make_float32(-1u);
+const float64 float64_ones = make_float64(-1ull);
+const float128 float128_ones = make_float128(-1ull, -1ull);
+
 #define VIC_INVALID         0x1
 #define VIC_DIVBYZERO       0x2
 #define VIC_OVERFLOW        0x3
@@ -227,109 +231,199 @@ DEF_GVEC_WFK(32)
 DEF_GVEC_WFK(64)
 DEF_GVEC_WFK(128)
 
-typedef bool (*vfc64_fn)(float64 a, float64 b, float_status *status);
-static int vfc64(S390Vector *v1, const S390Vector *v2, const S390Vector *v3,
-                 CPUS390XState *env, bool s, vfc64_fn fn, uintptr_t retaddr)
-{
-    uint8_t vxc, vec_exc = 0;
-    S390Vector tmp = {};
-    int match = 0;
-    int i;
-
-    for (i = 0; i < 2; i++) {
-        const float64 a = s390_vec_read_element64(v2, i);
-        const float64 b = s390_vec_read_element64(v3, i);
-
-        /* swap the order of the parameters, so we can use existing functions */
-        if (fn(b, a, &env->fpu_status)) {
-            match++;
-            s390_vec_write_element64(&tmp, i, -1ull);
-        }
-        vxc = check_ieee_exc(env, i, false, &vec_exc);
-        if (s || vxc) {
-            break;
-        }
-    }
-
-    handle_ieee_exc(env, vxc, vec_exc, retaddr);
-    *v1 = tmp;
-    if (match) {
-        return s || match == 2 ? 0 : 1;
-    }
-    return 3;
+#define DEF_VFC(BITS)                                                          \
+typedef bool (*vfc##BITS##_fn)(float##BITS a, float##BITS b,                   \
+                               float_status *status);                          \
+static int vfc##BITS(S390Vector *v1, const S390Vector *v2,                     \
+                     const S390Vector *v3, CPUS390XState *env, bool s,         \
+                     vfc##BITS##_fn fn, uintptr_t retaddr)                     \
+{                                                                              \
+    uint8_t vxc, vec_exc = 0;                                                  \
+    S390Vector tmp = {};                                                       \
+    int match = 0;                                                             \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const float##BITS a = s390_vec_read_float##BITS(v2, i);                \
+        const float##BITS b = s390_vec_read_float##BITS(v3, i);                \
+                                                                               \
+        /* swap the parameters, so we can use existing functions */            \
+        if (fn(b, a, &env->fpu_status)) {                                      \
+            match++;                                                           \
+            s390_vec_write_float##BITS(&tmp, i, float##BITS##_ones);           \
+        }                                                                      \
+        vxc = check_ieee_exc(env, i, false, &vec_exc);                         \
+        if (s || vxc) {                                                        \
+            break;                                                             \
+        }                                                                      \
+    }                                                                          \
+                                                                               \
+    handle_ieee_exc(env, vxc, vec_exc, retaddr);                               \
+    *v1 = tmp;                                                                 \
+    if (match) {                                                               \
+        return s || match == (128 / BITS) ? 0 : 1;                             \
+    }                                                                          \
+    return 3;                                                                  \
 }
+DEF_VFC(32)
+DEF_VFC(64)
+DEF_VFC(128)
 
-void HELPER(gvec_vfce64)(void *v1, const void *v2, const void *v3,
-                         CPUS390XState *env, uint32_t desc)
-{
-    vfc64(v1, v2, v3, env, false, float64_eq_quiet, GETPC());
+#define DEF_GVEC_VFCE(BITS)                                                    \
+void HELPER(gvec_vfce##BITS)(void *v1, const void *v2, const void *v3,         \
+                             CPUS390XState *env, uint32_t desc)                \
+{                                                                              \
+    const bool sq = simd_data(desc);                                           \
+                                                                               \
+    vfc##BITS(v1, v2, v3, env, false,                                          \
+              sq ? float##BITS##_eq : float##BITS##_eq_quiet, GETPC());        \
 }
+DEF_GVEC_VFCE(32)
+DEF_GVEC_VFCE(64)
+DEF_GVEC_VFCE(128)
 
-void HELPER(gvec_vfce64s)(void *v1, const void *v2, const void *v3,
-                          CPUS390XState *env, uint32_t desc)
-{
-    vfc64(v1, v2, v3, env, true, float64_eq_quiet, GETPC());
+#define DEF_GVEC_VFCE_S(BITS)                                                  \
+void HELPER(gvec_vfce##BITS##s)(void *v1, const void *v2, const void *v3,      \
+                                CPUS390XState *env, uint32_t desc)             \
+{                                                                              \
+    const bool sq = simd_data(desc);                                           \
+                                                                               \
+    vfc##BITS(v1, v2, v3, env, true,                                           \
+              sq ? float##BITS##_eq : float##BITS##_eq_quiet, GETPC());        \
 }
+DEF_GVEC_VFCE_S(32)
+DEF_GVEC_VFCE_S(64)
 
-void HELPER(gvec_vfce64_cc)(void *v1, const void *v2, const void *v3,
-                            CPUS390XState *env, uint32_t desc)
-{
-    env->cc_op = vfc64(v1, v2, v3, env, false, float64_eq_quiet, GETPC());
+#define DEF_GVEC_VFCE_CC(BITS)                                                 \
+void HELPER(gvec_vfce##BITS##_cc)(void *v1, const void *v2, const void *v3,    \
+                                  CPUS390XState *env, uint32_t desc)           \
+{                                                                              \
+    const bool sq = simd_data(desc);                                           \
+                                                                               \
+    env->cc_op = vfc##BITS(v1, v2, v3, env, false,                             \
+                           sq ? float##BITS##_eq : float##BITS##_eq_quiet,     \
+                           GETPC());                                           \
 }
+DEF_GVEC_VFCE_CC(32)
+DEF_GVEC_VFCE_CC(64)
+DEF_GVEC_VFCE_CC(128)
 
-void HELPER(gvec_vfce64s_cc)(void *v1, const void *v2, const void *v3,
-                            CPUS390XState *env, uint32_t desc)
-{
-    env->cc_op = vfc64(v1, v2, v3, env, true, float64_eq_quiet, GETPC());
+#define DEF_GVEC_VFCE_S_CC(BITS)                                               \
+void HELPER(gvec_vfce##BITS##s_cc)(void *v1, const void *v2, const void *v3,   \
+                                   CPUS390XState *env, uint32_t desc)          \
+{                                                                              \
+    const bool sq = simd_data(desc);                                           \
+                                                                               \
+    env->cc_op = vfc##BITS(v1, v2, v3, env, true,                              \
+                           sq ? float##BITS##_eq : float##BITS##_eq_quiet,     \
+                           GETPC());                                           \
 }
+DEF_GVEC_VFCE_S_CC(32)
+DEF_GVEC_VFCE_S_CC(64)
 
-void HELPER(gvec_vfch64)(void *v1, const void *v2, const void *v3,
-                         CPUS390XState *env, uint32_t desc)
-{
-    vfc64(v1, v2, v3, env, false, float64_lt_quiet, GETPC());
+#define DEF_GVEC_VFCH(BITS)                                                    \
+void HELPER(gvec_vfch##BITS)(void *v1, const void *v2, const void *v3,         \
+                             CPUS390XState *env, uint32_t desc)                \
+{                                                                              \
+    const bool sq = simd_data(desc);                                           \
+                                                                               \
+    vfc##BITS(v1, v2, v3, env, false,                                          \
+              sq ? float##BITS##_lt : float##BITS##_lt_quiet, GETPC());        \
 }
+DEF_GVEC_VFCH(32)
+DEF_GVEC_VFCH(64)
+DEF_GVEC_VFCH(128)
 
-void HELPER(gvec_vfch64s)(void *v1, const void *v2, const void *v3,
-                          CPUS390XState *env, uint32_t desc)
-{
-    vfc64(v1, v2, v3, env, true, float64_lt_quiet, GETPC());
+#define DEF_GVEC_VFCH_S(BITS)                                                  \
+void HELPER(gvec_vfch##BITS##s)(void *v1, const void *v2, const void *v3,      \
+                                CPUS390XState *env, uint32_t desc)             \
+{                                                                              \
+    const bool sq = simd_data(desc);                                           \
+                                                                               \
+    vfc##BITS(v1, v2, v3, env, true,                                           \
+              sq ? float##BITS##_lt : float##BITS##_lt_quiet, GETPC());        \
 }
+DEF_GVEC_VFCH_S(32)
+DEF_GVEC_VFCH_S(64)
 
-void HELPER(gvec_vfch64_cc)(void *v1, const void *v2, const void *v3,
-                            CPUS390XState *env, uint32_t desc)
-{
-    env->cc_op = vfc64(v1, v2, v3, env, false, float64_lt_quiet, GETPC());
+#define DEF_GVEC_VFCH_CC(BITS)                                                 \
+void HELPER(gvec_vfch##BITS##_cc)(void *v1, const void *v2, const void *v3,    \
+                                  CPUS390XState *env, uint32_t desc)           \
+{                                                                              \
+    const bool sq = simd_data(desc);                                           \
+                                                                               \
+    env->cc_op = vfc##BITS(v1, v2, v3, env, false,                             \
+                           sq ? float##BITS##_lt : float##BITS##_lt_quiet,     \
+                           GETPC());                                           \
 }
+DEF_GVEC_VFCH_CC(32)
+DEF_GVEC_VFCH_CC(64)
+DEF_GVEC_VFCH_CC(128)
 
-void HELPER(gvec_vfch64s_cc)(void *v1, const void *v2, const void *v3,
-                             CPUS390XState *env, uint32_t desc)
-{
-    env->cc_op = vfc64(v1, v2, v3, env, true, float64_lt_quiet, GETPC());
+#define DEF_GVEC_VFCH_S_CC(BITS)                                               \
+void HELPER(gvec_vfch##BITS##s_cc)(void *v1, const void *v2, const void *v3,   \
+                                   CPUS390XState *env, uint32_t desc)          \
+{                                                                              \
+    const bool sq = simd_data(desc);                                           \
+                                                                               \
+    env->cc_op = vfc##BITS(v1, v2, v3, env, true,                              \
+                           sq ? float##BITS##_lt : float##BITS##_lt_quiet,     \
+                           GETPC());                                           \
 }
+DEF_GVEC_VFCH_S_CC(32)
+DEF_GVEC_VFCH_S_CC(64)
 
-void HELPER(gvec_vfche64)(void *v1, const void *v2, const void *v3,
-                          CPUS390XState *env, uint32_t desc)
-{
-    vfc64(v1, v2, v3, env, false, float64_le_quiet, GETPC());
+#define DEF_GVEC_VFCHE(BITS)                                                   \
+void HELPER(gvec_vfche##BITS)(void *v1, const void *v2, const void *v3,        \
+                              CPUS390XState *env, uint32_t desc)               \
+{                                                                              \
+    const bool sq = simd_data(desc);                                           \
+                                                                               \
+    vfc##BITS(v1, v2, v3, env, false,                                          \
+              sq ? float##BITS##_le : float##BITS##_le_quiet, GETPC());        \
 }
+DEF_GVEC_VFCHE(32)
+DEF_GVEC_VFCHE(64)
+DEF_GVEC_VFCHE(128)
 
-void HELPER(gvec_vfche64s)(void *v1, const void *v2, const void *v3,
-                           CPUS390XState *env, uint32_t desc)
-{
-    vfc64(v1, v2, v3, env, true, float64_le_quiet, GETPC());
+#define DEF_GVEC_VFCHE_S(BITS)                                                 \
+void HELPER(gvec_vfche##BITS##s)(void *v1, const void *v2, const void *v3,     \
+                                 CPUS390XState *env, uint32_t desc)            \
+{                                                                              \
+    const bool sq = simd_data(desc);                                           \
+                                                                               \
+    vfc##BITS(v1, v2, v3, env, true,                                           \
+              sq ? float##BITS##_le : float##BITS##_le_quiet, GETPC());        \
 }
+DEF_GVEC_VFCHE_S(32)
+DEF_GVEC_VFCHE_S(64)
 
-void HELPER(gvec_vfche64_cc)(void *v1, const void *v2, const void *v3,
-                             CPUS390XState *env, uint32_t desc)
-{
-    env->cc_op = vfc64(v1, v2, v3, env, false, float64_le_quiet, GETPC());
+#define DEF_GVEC_VFCHE_CC(BITS)                                                \
+void HELPER(gvec_vfche##BITS##_cc)(void *v1, const void *v2, const void *v3,   \
+                                   CPUS390XState *env, uint32_t desc)          \
+{                                                                              \
+    const bool sq = simd_data(desc);                                           \
+                                                                               \
+    env->cc_op = vfc##BITS(v1, v2, v3, env, false,                             \
+                           sq ? float##BITS##_le : float##BITS##_le_quiet,     \
+                           GETPC());                                           \
 }
+DEF_GVEC_VFCHE_CC(32)
+DEF_GVEC_VFCHE_CC(64)
+DEF_GVEC_VFCHE_CC(128)
 
-void HELPER(gvec_vfche64s_cc)(void *v1, const void *v2, const void *v3,
-                              CPUS390XState *env, uint32_t desc)
-{
-    env->cc_op = vfc64(v1, v2, v3, env, true, float64_le_quiet, GETPC());
+#define DEF_GVEC_VFCHE_S_CC(BITS)                                              \
+void HELPER(gvec_vfche##BITS##s_cc)(void *v1, const void *v2, const void *v3,  \
+                                    CPUS390XState *env, uint32_t desc)         \
+{                                                                              \
+    const bool sq = simd_data(desc);                                           \
+                                                                               \
+    env->cc_op = vfc##BITS(v1, v2, v3, env, true,                              \
+                           sq ? float##BITS##_le : float##BITS##_le_quiet,     \
+                           GETPC());                                           \
 }
+DEF_GVEC_VFCHE_S_CC(32)
+DEF_GVEC_VFCHE_S_CC(64)
 
 static uint64_t vcdg64(uint64_t a, float_status *s)
 {
-- 
2.26.2



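The condition-code rule at the end of vfc##BITS can be checked on its own. The following is a minimal host-side sketch, not QEMU code. It uses host `double` instead of softfloat, and `vfc_cc()` and `vfch64_model()` are hypothetical names. It models VECTOR FP COMPARE HIGH on two 64-bit lanes, including the operand swap and the quiet/signaling choice taken from `simd_data(desc)`. C's `isless()` stands in for a quiet comparison and the `<` operator for a signaling one.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Condition code as computed at the end of vfc##BITS (hypothetical standalone copy). */
static int vfc_cc(int match, int nelems, bool s)
{
    assert(match >= 0 && match <= nelems);
    if (match) {
        /* "s || match == (128 / BITS) ? 0 : 1" parses as "(s || ...) ? 0 : 1" */
        return (s || match == nelems) ? 0 : 1;
    }
    return 3;
}

/*
 * Hypothetical VECTOR FP COMPARE HIGH model on two host doubles.
 * Lane i becomes all ones if a[i] > b[i]. As in the helper, the operands
 * are swapped so that a "less than" primitive can be reused.
 * signaling=true mirrors sq selecting float64_lt over float64_lt_quiet.
 */
static int vfch64_model(uint64_t out[2], const double a[2], const double b[2],
                        bool s, bool signaling)
{
    int match = 0;

    out[0] = out[1] = 0;
    for (int i = 0; i < 2; i++) {
        /* '<' may raise FE_INVALID on a NaN operand; isless() does not */
        const bool hit = signaling ? (b[i] < a[i]) : isless(b[i], a[i]);

        if (hit) {
            match++;
            out[i] = UINT64_MAX;
        }
        if (s) {
            break;      /* single-element form: element 0 only */
        }
    }
    return vfc_cc(match, 2, s);
}
```

Note that the patch's `s || match == (128 / BITS) ? 0 : 1` relies on `||` binding more tightly than `?:`; the sketch spells that reading out with explicit parentheses.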

* [PATCH v1 10/20] s390x/tcg: Implement 32/128 bit for VECTOR LOAD FP INTEGER
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (8 preceding siblings ...)
  2020-09-30 14:55 ` [PATCH v1 09/20] s390x/tcg: Implement 32/128 bit for VECTOR FP COMPARE * David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-09-30 14:55 ` [PATCH v1 11/20] s390x/tcg: Implement 64 bit for VECTOR FP LOAD LENGTHENED David Hildenbrand
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Richard Henderson, Thomas Huth,
	David Hildenbrand

Convert vop64_2 into a macro, similar to what was already done for vop64_3.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  3 ++
 target/s390x/translate_vx.c.inc | 51 ++++++++++++++-----
 target/s390x/vec_fpu_helper.c   | 90 ++++++++++++++++++---------------
 3 files changed, 91 insertions(+), 53 deletions(-)

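A reduced, standalone sketch of the token-pasting pattern behind DEF_VOP_2 follows. It illustrates the pattern under stated assumptions and is not the QEMU code. Host `float`/`double` stand in for softfloat's float32/float64 because there is no portable host float128. `Vec128`, `DEF_ACCESSORS` and `DEF_VOP_2_MODEL` are hypothetical, and rounding-mode and IEEE-exception handling are omitted. The loop bound `sizeof(Vec128) / sizeof(T)` plays the role of `128 / BITS`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 128-bit register; elements use host array order here,
 * whereas the real S390Vector numbers elements big-endian. */
typedef struct {
    unsigned char b[16];
} Vec128;

/* Generate bounds-checked element accessors for one element type. */
#define DEF_ACCESSORS(T, NAME)                                              \
static T read_##NAME(const Vec128 *v, size_t i)                             \
{                                                                           \
    T x;                                                                    \
    assert(i < sizeof(Vec128) / sizeof(T));                                 \
    memcpy(&x, v->b + i * sizeof(T), sizeof(T));                            \
    return x;                                                               \
}                                                                           \
static void write_##NAME(Vec128 *v, size_t i, T x)                          \
{                                                                           \
    assert(i < sizeof(Vec128) / sizeof(T));                                 \
    memcpy(v->b + i * sizeof(T), &x, sizeof(T));                            \
}

/* One macro stamps out a per-width unary loop, in the spirit of DEF_VOP_2. */
#define DEF_VOP_2_MODEL(T, NAME)                                            \
typedef T (*vop_##NAME##_fn)(T a);                                          \
static void vop_##NAME##_2(Vec128 *v1, const Vec128 *v2, bool s,            \
                           vop_##NAME##_fn fn)                              \
{                                                                           \
    Vec128 tmp = {{0}};   /* unprocessed elements end up zero */            \
                                                                            \
    for (size_t i = 0; i < sizeof(Vec128) / sizeof(T); i++) {               \
        write_##NAME(&tmp, i, fn(read_##NAME(v2, i)));                      \
        if (s) {                                                            \
            break;        /* single-element form: element 0 only */         \
        }                                                                   \
    }                                                                       \
    *v1 = tmp;            /* write back last, so v1 may alias v2 */         \
}

DEF_ACCESSORS(float, f32)
DEF_ACCESSORS(double, f64)
DEF_VOP_2_MODEL(float, f32)
DEF_VOP_2_MODEL(double, f64)

/* Example operations: round toward zero, a stand-in for float##BITS##_round_to_int. */
static float trunc_f32(float a) { return (float)(long long)a; }
static double trunc_f64(double a) { return (double)(long long)a; }
```

Each instantiation line produces one width-specific function, which is the trade-off the cover letter mentions: less duplicated code at the cost of heavier macro use.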
diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 538d55420b..ae9f855b05 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -301,8 +301,11 @@ DEF_HELPER_FLAGS_5(gvec_vfd32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfd64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfd64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfd128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_4(gvec_vfi32, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
+DEF_HELPER_FLAGS_4(gvec_vfi32s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfi64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfi64s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
+DEF_HELPER_FLAGS_4(gvec_vfi128, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfll32, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfll32s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vflr64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
diff --git a/target/s390x/translate_vx.c.inc b/target/s390x/translate_vx.c.inc
index fd1cd6f6d5..f6aed65ff5 100644
--- a/target/s390x/translate_vx.c.inc
+++ b/target/s390x/translate_vx.c.inc
@@ -2742,35 +2742,62 @@ static DisasJumpType op_vcdg(DisasContext *s, DisasOps *o)
     const uint8_t m4 = get_field(s, m4);
     const uint8_t erm = get_field(s, m5);
     const bool se = extract32(m4, 3, 1);
-    gen_helper_gvec_2_ptr *fn;
-
-    if (fpf != FPF_LONG || extract32(m4, 0, 2) || erm > 7 || erm == 2) {
-        gen_program_exception(s, PGM_SPECIFICATION);
-        return DISAS_NORETURN;
-    }
+    gen_helper_gvec_2_ptr *fn = NULL;
 
     switch (s->fields.op2) {
     case 0xc3:
-        fn = se ? gen_helper_gvec_vcdg64s : gen_helper_gvec_vcdg64;
+        if (fpf == FPF_LONG) {
+            fn = se ? gen_helper_gvec_vcdg64s : gen_helper_gvec_vcdg64;
+        }
         break;
     case 0xc1:
-        fn = se ? gen_helper_gvec_vcdlg64s : gen_helper_gvec_vcdlg64;
+        if (fpf == FPF_LONG) {
+            fn = se ? gen_helper_gvec_vcdlg64s : gen_helper_gvec_vcdlg64;
+        }
         break;
     case 0xc2:
-        fn = se ? gen_helper_gvec_vcgd64s : gen_helper_gvec_vcgd64;
+        if (fpf == FPF_LONG) {
+            fn = se ? gen_helper_gvec_vcgd64s : gen_helper_gvec_vcgd64;
+        }
         break;
     case 0xc0:
-        fn = se ? gen_helper_gvec_vclgd64s : gen_helper_gvec_vclgd64;
+        if (fpf == FPF_LONG) {
+            fn = se ? gen_helper_gvec_vclgd64s : gen_helper_gvec_vclgd64;
+        }
         break;
     case 0xc7:
-        fn = se ? gen_helper_gvec_vfi64s : gen_helper_gvec_vfi64;
+        switch (fpf) {
+        case FPF_SHORT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+                fn = se ? gen_helper_gvec_vfi32s : gen_helper_gvec_vfi32;
+            }
+            break;
+        case FPF_LONG:
+            fn = se ? gen_helper_gvec_vfi64s : gen_helper_gvec_vfi64;
+            break;
+        case FPF_EXT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+                fn = gen_helper_gvec_vfi128;
+            }
+            break;
+        default:
+            break;
+        }
         break;
     case 0xc5:
-        fn = se ? gen_helper_gvec_vflr64s : gen_helper_gvec_vflr64;
+        if (fpf == FPF_LONG) {
+            fn = se ? gen_helper_gvec_vflr64s : gen_helper_gvec_vflr64;
+        }
         break;
     default:
         g_assert_not_reached();
     }
+
+    if (!fn || extract32(m4, 0, 2) || erm > 7 || erm == 2) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
     gen_gvec_2_ptr(get_field(s, v1), get_field(s, v2), cpu_env,
                    deposit32(m4, 4, 4, erm), fn);
     return DISAS_NEXT;
diff --git a/target/s390x/vec_fpu_helper.c b/target/s390x/vec_fpu_helper.c
index e8ae608da6..9bc7f5c8d7 100644
--- a/target/s390x/vec_fpu_helper.c
+++ b/target/s390x/vec_fpu_helper.c
@@ -116,29 +116,33 @@ static void s390_vec_write_float128(S390Vector *v, uint8_t enr, float128 data)
     s390_vec_write_element64(v, 1, data.low);
 }
 
-typedef uint64_t (*vop64_2_fn)(uint64_t a, float_status *s);
-static void vop64_2(S390Vector *v1, const S390Vector *v2, CPUS390XState *env,
-                    bool s, bool XxC, uint8_t erm, vop64_2_fn fn,
-                    uintptr_t retaddr)
-{
-    uint8_t vxc, vec_exc = 0;
-    S390Vector tmp = {};
-    int i, old_mode;
-
-    old_mode = s390_swap_bfp_rounding_mode(env, erm);
-    for (i = 0; i < 2; i++) {
-        const uint64_t a = s390_vec_read_element64(v2, i);
-
-        s390_vec_write_element64(&tmp, i, fn(a, &env->fpu_status));
-        vxc = check_ieee_exc(env, i, XxC, &vec_exc);
-        if (s || vxc) {
-            break;
-        }
-    }
-    s390_restore_bfp_rounding_mode(env, old_mode);
-    handle_ieee_exc(env, vxc, vec_exc, retaddr);
-    *v1 = tmp;
+#define DEF_VOP_2(BITS)                                                        \
+typedef float##BITS (*vop##BITS##_2_fn)(float##BITS a, float_status *s);       \
+static void vop##BITS##_2(S390Vector *v1, const S390Vector *v2,                \
+                          CPUS390XState *env, bool s, bool XxC, uint8_t erm,   \
+                          vop##BITS##_2_fn fn, uintptr_t retaddr)              \
+{                                                                              \
+    uint8_t vxc, vec_exc = 0;                                                  \
+    S390Vector tmp = {};                                                       \
+    int i, old_mode;                                                           \
+                                                                               \
+    old_mode = s390_swap_bfp_rounding_mode(env, erm);                          \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const float##BITS a = s390_vec_read_float##BITS(v2, i);                \
+                                                                               \
+        s390_vec_write_float##BITS(&tmp, i, fn(a, &env->fpu_status));          \
+        vxc = check_ieee_exc(env, i, XxC, &vec_exc);                           \
+        if (s || vxc) {                                                        \
+            break;                                                             \
+        }                                                                      \
+    }                                                                          \
+    s390_restore_bfp_rounding_mode(env, old_mode);                             \
+    handle_ieee_exc(env, vxc, vec_exc, retaddr);                               \
+    *v1 = tmp;                                                                 \
 }
+DEF_VOP_2(32)
+DEF_VOP_2(64)
+DEF_VOP_2(128)
 
 #define DEF_VOP_3(BITS)                                                        \
 typedef float##BITS (*vop##BITS##_3_fn)(float##BITS a, float##BITS b,          \
@@ -536,28 +540,32 @@ void HELPER(gvec_vfd##BITS##s)(void *v1, const void *v2, const void *v3,       \
 DEF_GVEC_FVD_S(32)
 DEF_GVEC_FVD_S(64)
 
-static uint64_t vfi64(uint64_t a, float_status *s)
-{
-    return float64_round_to_int(a, s);
-}
-
-void HELPER(gvec_vfi64)(void *v1, const void *v2, CPUS390XState *env,
-                        uint32_t desc)
-{
-    const uint8_t erm = extract32(simd_data(desc), 4, 4);
-    const bool XxC = extract32(simd_data(desc), 2, 1);
-
-    vop64_2(v1, v2, env, false, XxC, erm, vfi64, GETPC());
+#define DEF_GVEC_VFI(BITS)                                                     \
+void HELPER(gvec_vfi##BITS)(void *v1, const void *v2, CPUS390XState *env,      \
+                            uint32_t desc)                                     \
+{                                                                              \
+    const uint8_t erm = extract32(simd_data(desc), 4, 4);                      \
+    const bool XxC = extract32(simd_data(desc), 2, 1);                         \
+                                                                               \
+    vop##BITS##_2(v1, v2, env, false, XxC, erm, float##BITS##_round_to_int,    \
+                  GETPC());                                                    \
 }
+DEF_GVEC_VFI(32)
+DEF_GVEC_VFI(64)
+DEF_GVEC_VFI(128)
 
-void HELPER(gvec_vfi64s)(void *v1, const void *v2, CPUS390XState *env,
-                         uint32_t desc)
-{
-    const uint8_t erm = extract32(simd_data(desc), 4, 4);
-    const bool XxC = extract32(simd_data(desc), 2, 1);
-
-    vop64_2(v1, v2, env, true, XxC, erm, vfi64, GETPC());
+#define DEF_GVEC_VFI_S(BITS)                                                   \
+void HELPER(gvec_vfi##BITS##s)(void *v1, const void *v2, CPUS390XState *env,   \
+                               uint32_t desc)                                  \
+{                                                                              \
+    const uint8_t erm = extract32(simd_data(desc), 4, 4);                      \
+    const bool XxC = extract32(simd_data(desc), 2, 1);                         \
+                                                                               \
+    vop##BITS##_2(v1, v2, env, true, XxC, erm, float##BITS##_round_to_int,     \
+                  GETPC());                                                    \
 }
+DEF_GVEC_VFI_S(32)
+DEF_GVEC_VFI_S(64)
 
 static void vfll32(S390Vector *v1, const S390Vector *v2, CPUS390XState *env,
                    bool s, uintptr_t retaddr)
-- 
2.26.2



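The reworked op_vcdg() in patch 10 leaves `fn` as NULL for any format that is not implemented, or that requires the Vector enhancements facility when the facility is absent. It then performs a single specification check afterwards. Below is a minimal model of the 0xc7 (VECTOR LOAD FP INTEGER) branch. The FPF_* values are assumptions, and a hypothetical enum replaces the gen_helper_* pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-ins for the FPF_* constants used by translate_vx.c.inc */
enum { FPF_SHORT = 2, FPF_LONG = 3, FPF_EXT = 4 };

/* Hypothetical names for the helpers op_vcdg() can pick for opcode 0xc7 */
enum vfi_helper { VFI_NONE, VFI_32, VFI_32S, VFI_64, VFI_64S, VFI_128 };

static enum vfi_helper vfi_select(uint8_t fpf, uint8_t m4, uint8_t erm,
                                  bool vector_enh)
{
    const bool se = (m4 >> 3) & 1;        /* extract32(m4, 3, 1) */
    enum vfi_helper fn = VFI_NONE;

    assert(erm <= 15);                    /* erm comes from the 4-bit m5 field */
    switch (fpf) {
    case FPF_SHORT:
        if (vector_enh) {
            fn = se ? VFI_32S : VFI_32;
        }
        break;
    case FPF_LONG:
        fn = se ? VFI_64S : VFI_64;
        break;
    case FPF_EXT:
        if (vector_enh) {
            fn = VFI_128;                 /* one element, so no separate "s" variant */
        }
        break;
    default:
        break;
    }

    /* single specification-exception check, as after the switch in the patch */
    if (fn == VFI_NONE || (m4 & 3) || erm > 7 || erm == 2) {
        return VFI_NONE;
    }
    return fn;
}
```

Folding the "unsupported format" case into `fn == NULL` keeps one exit path for PGM_SPECIFICATION, so adding a width later only touches the switch.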

* [PATCH v1 11/20] s390x/tcg: Implement 64 bit for VECTOR FP LOAD LENGTHENED
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (9 preceding siblings ...)
  2020-09-30 14:55 ` [PATCH v1 10/20] s390x/tcg: Implement 32/128 bit for VECTOR LOAD FP INTEGER David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-10-01 16:19   ` Richard Henderson
  2020-09-30 14:55 ` [PATCH v1 12/20] s390x/tcg: Implement 128 bit for VECTOR FP LOAD ROUNDED David Hildenbrand
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Richard Henderson, Thomas Huth,
	David Hildenbrand

64 bit -> 128 bit: there is only a single result element.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  1 +
 target/s390x/translate_vx.c.inc | 21 ++++++++++++++++-----
 target/s390x/vec_fpu_helper.c   | 13 +++++++++++++
 3 files changed, 30 insertions(+), 5 deletions(-)

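Because gvec_vfll64 reads only the even element and produces one 128-bit result, the bit-level effect for ordinary values can be sketched without softfloat. This is a hypothetical model that handles only zeros and normal numbers. Widening those is exact, so no rounding is involved. It is not float64_to_float128(), which also covers subnormals, infinities and NaNs and reports exceptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a vector register as two doublewords, dw[0] being
 * element 0 (the even element) as read by s390_vec_read_element64(v, 0). */
typedef struct {
    uint64_t dw[2];
} Vec128;

/*
 * Widen the float64 in element 0 of v2 into a float128 occupying all of v1.
 * Only zero and normal numbers are handled; returns false otherwise.
 */
static bool vfll64_model(Vec128 *v1, const Vec128 *v2)
{
    const uint64_t in = v2->dw[0];          /* the odd element is ignored */
    const uint64_t sign = in >> 63;
    const uint64_t exp = (in >> 52) & 0x7ff;
    const uint64_t frac = in & ((UINT64_C(1) << 52) - 1);
    uint64_t exp128 = 0;                    /* stays 0 for +/-0.0 */

    if (exp == 0x7ff || (exp == 0 && frac != 0)) {
        return false;                       /* Inf/NaN/subnormal: not modelled */
    }
    if (exp != 0) {
        /* rebias from 1023 (11-bit exponent) to 16383 (15-bit exponent) */
        exp128 = exp + (16383 - 1023);
        assert(exp128 > 0 && exp128 < 0x7fff);
    }
    /* the 52 fraction bits become the top of the 112-bit fraction (<< 60) */
    v1->dw[0] = (sign << 63) | (exp128 << 48) | (frac >> 4);
    v1->dw[1] = (frac & 0xf) << 60;
    return true;
}
```

For example, 1.0 (0x3FF0000000000000) becomes 0x3FFF000000000000 in the high doubleword with an all-zero low doubleword.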
diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index ae9f855b05..e643672ec4 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -308,6 +308,7 @@ DEF_HELPER_FLAGS_4(gvec_vfi64s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfi128, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfll32, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfll32s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
+DEF_HELPER_FLAGS_4(gvec_vfll64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vflr64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vflr64s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfm32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
diff --git a/target/s390x/translate_vx.c.inc b/target/s390x/translate_vx.c.inc
index f6aed65ff5..ff697f3470 100644
--- a/target/s390x/translate_vx.c.inc
+++ b/target/s390x/translate_vx.c.inc
@@ -2807,16 +2807,27 @@ static DisasJumpType op_vfll(DisasContext *s, DisasOps *o)
 {
     const uint8_t fpf = get_field(s, m3);
     const uint8_t m4 = get_field(s, m4);
-    gen_helper_gvec_2_ptr *fn = gen_helper_gvec_vfll32;
+    const bool se = extract32(m4, 3, 1);
+    gen_helper_gvec_2_ptr *fn = NULL;
 
-    if (fpf != FPF_SHORT || extract32(m4, 0, 3)) {
+    switch (fpf) {
+    case FPF_SHORT:
+        fn = se ? gen_helper_gvec_vfll32s : gen_helper_gvec_vfll32;
+        break;
+    case FPF_LONG:
+        if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+            fn = gen_helper_gvec_vfll64;
+        }
+        break;
+    default:
+        break;
+    }
+
+    if (!fn || extract32(m4, 0, 3)) {
         gen_program_exception(s, PGM_SPECIFICATION);
         return DISAS_NORETURN;
     }
 
-    if (extract32(m4, 3, 1)) {
-        fn = gen_helper_gvec_vfll32s;
-    }
     gen_gvec_2_ptr(get_field(s, v1), get_field(s, v2), cpu_env,
                    0, fn);
     return DISAS_NEXT;
diff --git a/target/s390x/vec_fpu_helper.c b/target/s390x/vec_fpu_helper.c
index 9bc7f5c8d7..5ded2ccbcd 100644
--- a/target/s390x/vec_fpu_helper.c
+++ b/target/s390x/vec_fpu_helper.c
@@ -602,6 +602,19 @@ void HELPER(gvec_vfll32s)(void *v1, const void *v2, CPUS390XState *env,
     vfll32(v1, v2, env, true, GETPC());
 }
 
+void HELPER(gvec_vfll64)(void *v1, const void *v2, CPUS390XState *env,
+                         uint32_t desc)
+{
+    /* load from even element */
+    float128 ret = float64_to_float128(s390_vec_read_float64(v2, 0),
+                                       &env->fpu_status);
+    uint8_t vxc, vec_exc = 0;
+
+    vxc = check_ieee_exc(env, 0, false, &vec_exc);
+    handle_ieee_exc(env, vxc, vec_exc, GETPC());
+    s390_vec_write_float128(v1, 0, ret);
+}
+
 static void vflr64(S390Vector *v1, const S390Vector *v2, CPUS390XState *env,
                    bool s, bool XxC, uint8_t erm, uintptr_t retaddr)
 {
-- 
2.26.2




* [PATCH v1 12/20] s390x/tcg: Implement 128 bit for VECTOR FP LOAD ROUNDED
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (10 preceding siblings ...)
  2020-09-30 14:55 ` [PATCH v1 11/20] s390x/tcg: Implement 64 bit for VECTOR FP LOAD LENGTHENED David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-10-01 16:21   ` Richard Henderson
  2020-09-30 14:55 ` [PATCH v1 13/20] s390x/tcg: Implement 32/128 bit for VECTOR FP PERFORM SIGN OPERATION David Hildenbrand
                   ` (10 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Richard Henderson, Thomas Huth,
	David Hildenbrand

128 bit -> 64 bit: there is only a single element to process.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  1 +
 target/s390x/translate_vx.c.inc | 11 ++++++++++-
 target/s390x/vec_fpu_helper.c   | 19 +++++++++++++++++++
 3 files changed, 30 insertions(+), 1 deletion(-)

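The reverse direction needs rounding. The sketch below models gvec_vflr128 for round-to-nearest-even only. The helper itself instead installs the rounding mode selected by `erm` via s390_swap_bfp_rounding_mode(). `vflr128_model()` is a hypothetical name, and subnormal, overflowing, infinite and NaN cases are deliberately left out. Like the helper, it writes only the even element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t dw[2];   /* dw[0]: high doubleword / even element */
} Vec128;

/*
 * Round the float128 in v2 to a float64 in element 0 of v1 using
 * round-to-nearest-even only. Zero and results in the float64 normal
 * range are handled; returns false otherwise.
 */
static bool vflr128_model(Vec128 *v1, const Vec128 *v2)
{
    const uint64_t hi = v2->dw[0], lo = v2->dw[1];
    const uint64_t sign = hi >> 63;
    const int64_t exp = (int64_t)((hi >> 48) & 0x7fff);
    /* top 52 of the 112 fraction bits: 48 from hi, 4 from lo */
    uint64_t frac = ((hi & ((UINT64_C(1) << 48) - 1)) << 4) | (lo >> 60);
    const uint64_t rest = lo & ((UINT64_C(1) << 60) - 1);  /* 60 dropped bits */
    const uint64_t half = UINT64_C(1) << 59;
    int64_t exp64;

    if (exp == 0 && frac == 0 && rest == 0) {
        v1->dw[0] = sign << 63;             /* signed zero */
        return true;
    }
    if (exp == 0 || exp == 0x7fff) {
        return false;                       /* subnormal/Inf/NaN: not modelled */
    }
    exp64 = exp - 16383 + 1023;
    if (rest > half || (rest == half && (frac & 1))) {
        frac++;
        if (frac == (UINT64_C(1) << 52)) {  /* carry out of the fraction */
            frac = 0;
            exp64++;
        }
    }
    if (exp64 < 1 || exp64 > 2046) {
        return false;                       /* underflow/overflow: not modelled */
    }
    assert(frac < (UINT64_C(1) << 52));
    v1->dw[0] = (sign << 63) | ((uint64_t)exp64 << 52) | frac;
    /* v1->dw[1] (the odd element) is left alone: architecturally unpredictable */
    return true;
}
```

Unlike widening, this conversion can be inexact, which is why the helper passes XxC to check_ieee_exc() while gvec_vfll64 does not.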
diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index e643672ec4..79e3fa14f8 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -311,6 +311,7 @@ DEF_HELPER_FLAGS_4(gvec_vfll32s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfll64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vflr64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vflr64s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
+DEF_HELPER_FLAGS_4(gvec_vflr128, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfm32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfm32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfm64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
diff --git a/target/s390x/translate_vx.c.inc b/target/s390x/translate_vx.c.inc
index ff697f3470..0b21e8789f 100644
--- a/target/s390x/translate_vx.c.inc
+++ b/target/s390x/translate_vx.c.inc
@@ -2785,8 +2785,17 @@ static DisasJumpType op_vcdg(DisasContext *s, DisasOps *o)
         }
         break;
     case 0xc5:
-        if (fpf == FPF_LONG) {
+        switch (fpf) {
+        case FPF_LONG:
             fn = se ? gen_helper_gvec_vflr64s : gen_helper_gvec_vflr64;
+            break;
+        case FPF_EXT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+                fn = gen_helper_gvec_vflr128;
+            }
+            break;
+        default:
+            break;
         }
         break;
     default:
diff --git a/target/s390x/vec_fpu_helper.c b/target/s390x/vec_fpu_helper.c
index 5ded2ccbcd..f8ebd04516 100644
--- a/target/s390x/vec_fpu_helper.c
+++ b/target/s390x/vec_fpu_helper.c
@@ -658,6 +658,25 @@ void HELPER(gvec_vflr64s)(void *v1, const void *v2, CPUS390XState *env,
     vflr64(v1, v2, env, true, XxC, erm, GETPC());
 }
 
+void HELPER(gvec_vflr128)(void *v1, const void *v2, CPUS390XState *env,
+                          uint32_t desc)
+{
+    const uint8_t erm = extract32(simd_data(desc), 4, 4);
+    const bool XxC = extract32(simd_data(desc), 2, 1);
+    uint8_t vxc, vec_exc = 0;
+    int old_mode;
+    float64 ret;
+
+    old_mode = s390_swap_bfp_rounding_mode(env, erm);
+    ret = float128_to_float64(s390_vec_read_float128(v2, 0), &env->fpu_status);
+    vxc = check_ieee_exc(env, 0, XxC, &vec_exc);
+    s390_restore_bfp_rounding_mode(env, old_mode);
+    handle_ieee_exc(env, vxc, vec_exc, GETPC());
+
+    /* place at even element, odd element is unpredictable */
+    s390_vec_write_float64(v1, 0, ret);
+}
+
 #define DEF_GVEC_FVM(BITS)                                                     \
 void HELPER(gvec_vfm##BITS)(void *v1, const void *v2, const void *v3,          \
                             CPUS390XState *env, uint32_t desc)                 \
-- 
2.26.2




* [PATCH v1 13/20] s390x/tcg: Implement 32/128 bit for VECTOR FP PERFORM SIGN OPERATION
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (11 preceding siblings ...)
  2020-09-30 14:55 ` [PATCH v1 12/20] s390x/tcg: Implement 128 bit for VECTOR FP LOAD ROUNDED David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-10-01 16:24   ` Richard Henderson
  2020-09-30 14:55 ` [PATCH v1 14/20] s390x/tcg: Implement 32/128 bit for VECTOR FP SQUARE ROOT David Hildenbrand
                   ` (9 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Richard Henderson, Thomas Huth,
	David Hildenbrand

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/translate_vx.c.inc | 100 +++++++++++++++++++++-----------
 1 file changed, 67 insertions(+), 33 deletions(-)

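VECTOR FP PERFORM SIGN OPERATION only touches the sign bit, which lets the translator emit plain gvec xori/ori/andi operations instead of calling a softfloat helper. Below is a hypothetical per-lane model for the 32-bit, non-single-element case. `fpso32()` and `vfpso32_model()` are illustrative names, not QEMU functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* m5 encodings as handled in op_vfpso() */
enum { FPSO_COMPLEMENT = 0, FPSO_NEGATIVE = 1, FPSO_POSITIVE = 2 };

/* Pure bit operation on a float32 bit pattern: no rounding, no IEEE exceptions. */
static uint32_t fpso32(uint32_t bits, int m5)
{
    const uint32_t sign = UINT32_C(1) << 31;

    /* m5 > 2 is rejected with a specification exception at translation time */
    assert(m5 >= FPSO_COMPLEMENT && m5 <= FPSO_POSITIVE);
    switch (m5) {
    case FPSO_COMPLEMENT:
        return bits ^ sign;        /* like gen_gvec_fn_2i(xori, ...) */
    case FPSO_NEGATIVE:
        return bits | sign;        /* like gen_gvec_fn_2i(ori, ...) */
    default:
        return bits & (sign - 1);  /* like gen_gvec_fn_2i(andi, ...) */
    }
}

/* Non-single-element form: apply to all four 32-bit lanes of the vector. */
static void vfpso32_model(uint32_t v1[4], const uint32_t v2[4], int m5)
{
    for (size_t i = 0; i < 4; i++) {
        v1[i] = fpso32(v2[i], m5);
    }
}
```

NaNs pass through with only their sign changed (0x7FC00000 becomes 0xFFC00000 when complemented), consistent with the operation being a pure bit manipulation.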
diff --git a/target/s390x/translate_vx.c.inc b/target/s390x/translate_vx.c.inc
index 0b21e8789f..ee79d97e19 100644
--- a/target/s390x/translate_vx.c.inc
+++ b/target/s390x/translate_vx.c.inc
@@ -2872,48 +2872,82 @@ static DisasJumpType op_vfpso(DisasContext *s, DisasOps *o)
     const uint8_t fpf = get_field(s, m3);
     const uint8_t m4 = get_field(s, m4);
     const uint8_t m5 = get_field(s, m5);
+    const bool se = extract32(m4, 3, 1);
     TCGv_i64 tmp;
 
-    if (fpf != FPF_LONG || extract32(m4, 0, 3) || m5 > 2) {
+    if ((fpf != FPF_LONG && !s390_has_feat(S390_FEAT_VECTOR_ENH)) ||
+        extract32(m4, 0, 3) || m5 > 2) {
         gen_program_exception(s, PGM_SPECIFICATION);
         return DISAS_NORETURN;
     }
 
-    if (extract32(m4, 3, 1)) {
-        tmp = tcg_temp_new_i64();
-        read_vec_element_i64(tmp, v2, 0, ES_64);
-        switch (m5) {
-        case 0:
-            /* sign bit is inverted (complement) */
-            tcg_gen_xori_i64(tmp, tmp, 1ull << 63);
-            break;
-        case 1:
-            /* sign bit is set to one (negative) */
-            tcg_gen_ori_i64(tmp, tmp, 1ull << 63);
-            break;
-        case 2:
-            /* sign bit is set to zero (positive) */
-            tcg_gen_andi_i64(tmp, tmp, (1ull << 63) - 1);
-            break;
+    switch (fpf) {
+    case FPF_SHORT:
+        if (!se) {
+            switch (m5) {
+            case 0:
+                /* sign bit is inverted (complement) */
+                gen_gvec_fn_2i(xori, ES_32, v1, v2, 1ull << 31);
+                break;
+            case 1:
+                /* sign bit is set to one (negative) */
+                gen_gvec_fn_2i(ori, ES_32, v1, v2, 1ull << 31);
+                break;
+            case 2:
+                /* sign bit is set to zero (positive) */
+                gen_gvec_fn_2i(andi, ES_32, v1, v2, (1ull << 31) - 1);
+                break;
+            }
+            return DISAS_NEXT;
         }
-        write_vec_element_i64(tmp, v1, 0, ES_64);
-        tcg_temp_free_i64(tmp);
-    } else {
-        switch (m5) {
-        case 0:
-            /* sign bit is inverted (complement) */
-            gen_gvec_fn_2i(xori, ES_64, v1, v2, 1ull << 63);
-            break;
-        case 1:
-            /* sign bit is set to one (negative) */
-            gen_gvec_fn_2i(ori, ES_64, v1, v2, 1ull << 63);
-            break;
-        case 2:
-            /* sign bit is set to zero (positive) */
-            gen_gvec_fn_2i(andi, ES_64, v1, v2, (1ull << 63) - 1);
-            break;
+        break;
+    case FPF_LONG:
+        if (!se) {
+            switch (m5) {
+            case 0:
+                /* sign bit is inverted (complement) */
+                gen_gvec_fn_2i(xori, ES_64, v1, v2, 1ull << 63);
+                break;
+            case 1:
+                /* sign bit is set to one (negative) */
+                gen_gvec_fn_2i(ori, ES_64, v1, v2, 1ull << 63);
+                break;
+            case 2:
+                /* sign bit is set to zero (positive) */
+                gen_gvec_fn_2i(andi, ES_64, v1, v2, (1ull << 63) - 1);
+                break;
+            }
+            return DISAS_NEXT;
         }
+        break;
+    case FPF_EXT:
+        /* Only a single element. */
+        break;
+    default:
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    /* With a single element, we are only interested in bit 0. */
+    tmp = tcg_temp_new_i64();
+    read_vec_element_i64(tmp, v2, 0, ES_64);
+    switch (m5) {
+    case 0:
+        /* sign bit is inverted (complement) */
+        tcg_gen_xori_i64(tmp, tmp, 1ull << 63);
+        break;
+    case 1:
+        /* sign bit is set to one (negative) */
+        tcg_gen_ori_i64(tmp, tmp, 1ull << 63);
+        break;
+    case 2:
+        /* sign bit is set to zero (positive) */
+        tcg_gen_andi_i64(tmp, tmp, (1ull << 63) - 1);
+        break;
     }
+    write_vec_element_i64(tmp, v1, 0, ES_64);
+    tcg_temp_free_i64(tmp);
+
     return DISAS_NEXT;
 }
 
-- 
2.26.2
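
VECTOR FP PERFORM SIGN OPERATION needs no softfloat helper. All three m5 encodings touch only the sign bit, so each one maps onto a single XOR, OR or AND with a mask. The standalone sketch below applies the same masks to raw binary32/binary64 bit patterns. The `sign_op` enum and the function names are illustrative, not QEMU API. For binary128 the sign sits in the high doubleword. The same is true of a single binary32 element 0, whose sign bit is also the leftmost bit of doubleword 0. That is why the single-element path can handle both cases by operating on 64-bit element 0 only.

```c
#include <stdint.h>

/* Illustrative encodings mirroring the m5 values handled by op_vfpso(). */
enum sign_op { SIGN_COMPLEMENT = 0, SIGN_NEGATIVE = 1, SIGN_POSITIVE = 2 };

/* Apply a sign operation to a binary32 bit pattern. */
static uint32_t sign_op32(uint32_t bits, enum sign_op op)
{
    const uint32_t sign = UINT32_C(1) << 31;

    switch (op) {
    case SIGN_COMPLEMENT:
        return bits ^ sign;   /* invert the sign bit */
    case SIGN_NEGATIVE:
        return bits | sign;   /* force the sign bit to one */
    case SIGN_POSITIVE:
    default:
        return bits & ~sign;  /* force the sign bit to zero */
    }
}

/* Same for binary64; for binary128, apply it to the high doubleword only. */
static uint64_t sign_op64(uint64_t bits, enum sign_op op)
{
    const uint64_t sign = UINT64_C(1) << 63;

    switch (op) {
    case SIGN_COMPLEMENT:
        return bits ^ sign;
    case SIGN_NEGATIVE:
        return bits | sign;
    case SIGN_POSITIVE:
    default:
        return bits & ~sign;
    }
}
```

For example, `sign_op32(0x3f800000, SIGN_COMPLEMENT)` turns 1.0f into -1.0f (0xbf800000).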



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v1 14/20] s390x/tcg: Implement 32/128 bit for VECTOR FP SQUARE ROOT
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (12 preceding siblings ...)
  2020-09-30 14:55 ` [PATCH v1 13/20] s390x/tcg: Implement 32/128 bit for VECTOR FP PERFORM SIGN OPERATION David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-09-30 14:55 ` [PATCH v1 15/20] s390x/tcg: Implement 32/128 bit for VECTOR FP TEST DATA CLASS IMMEDIATE David Hildenbrand
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Richard Henderson, Thomas Huth,
	David Hildenbrand

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  3 +++
 target/s390x/translate_vx.c.inc | 26 +++++++++++++++++++++-----
 target/s390x/vec_fpu_helper.c   | 28 +++++++++++++++-------------
 3 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 79e3fa14f8..bee283e3d4 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -321,8 +321,11 @@ DEF_HELPER_FLAGS_6(gvec_vfma64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env
 DEF_HELPER_FLAGS_6(gvec_vfma64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_6(gvec_vfms64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_6(gvec_vfms64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_4(gvec_vfsq32, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
+DEF_HELPER_FLAGS_4(gvec_vfsq32s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfsq64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfsq64s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
+DEF_HELPER_FLAGS_4(gvec_vfsq128, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfs32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfs32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfs64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
diff --git a/target/s390x/translate_vx.c.inc b/target/s390x/translate_vx.c.inc
index ee79d97e19..7d4811ccf7 100644
--- a/target/s390x/translate_vx.c.inc
+++ b/target/s390x/translate_vx.c.inc
@@ -2955,16 +2955,32 @@ static DisasJumpType op_vfsq(DisasContext *s, DisasOps *o)
 {
     const uint8_t fpf = get_field(s, m3);
     const uint8_t m4 = get_field(s, m4);
-    gen_helper_gvec_2_ptr *fn = gen_helper_gvec_vfsq64;
+    const bool se = extract32(m4, 3, 1);
+    gen_helper_gvec_2_ptr *fn = NULL;
 
-    if (fpf != FPF_LONG || extract32(m4, 0, 3)) {
+    switch (fpf) {
+    case FPF_SHORT:
+        if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+            fn = se ? gen_helper_gvec_vfsq32s : gen_helper_gvec_vfsq32;
+        }
+        break;
+    case FPF_LONG:
+        fn = se ? gen_helper_gvec_vfsq64s : gen_helper_gvec_vfsq64;
+        break;
+    case FPF_EXT:
+        if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+            fn = gen_helper_gvec_vfsq128;
+        }
+        break;
+    default:
+        break;
+    }
+
+    if (!fn || extract32(m4, 0, 3)) {
         gen_program_exception(s, PGM_SPECIFICATION);
         return DISAS_NORETURN;
     }
 
-    if (extract32(m4, 3, 1)) {
-        fn = gen_helper_gvec_vfsq64s;
-    }
     gen_gvec_2_ptr(get_field(s, v1), get_field(s, v2), cpu_env,
                    0, fn);
     return DISAS_NEXT;
diff --git a/target/s390x/vec_fpu_helper.c b/target/s390x/vec_fpu_helper.c
index f8ebd04516..b7045e85d6 100644
--- a/target/s390x/vec_fpu_helper.c
+++ b/target/s390x/vec_fpu_helper.c
@@ -744,22 +744,24 @@ void HELPER(gvec_vfms64s)(void *v1, const void *v2, const void *v3,
     vfma64(v1, v2, v3, v4, env, true, float_muladd_negate_c, GETPC());
 }
 
-static uint64_t vfsq64(uint64_t a, float_status *s)
-{
-    return float64_sqrt(a, s);
-}
-
-void HELPER(gvec_vfsq64)(void *v1, const void *v2, CPUS390XState *env,
-                         uint32_t desc)
-{
-    vop64_2(v1, v2, env, false, false, 0, vfsq64, GETPC());
+#define DEF_GVEC_VFSQ(BITS)                                                    \
+void HELPER(gvec_vfsq##BITS)(void *v1, const void *v2, CPUS390XState *env,     \
+                             uint32_t desc)                                    \
+{                                                                              \
+    vop##BITS##_2(v1, v2, env, false, false, 0, float##BITS##_sqrt, GETPC());  \
 }
+DEF_GVEC_VFSQ(32)
+DEF_GVEC_VFSQ(64)
+DEF_GVEC_VFSQ(128)
 
-void HELPER(gvec_vfsq64s)(void *v1, const void *v2, CPUS390XState *env,
-                          uint32_t desc)
-{
-    vop64_2(v1, v2, env, true, false, 0, vfsq64, GETPC());
+#define DEF_GVEC_VFSQ_S(BITS)                                                  \
+void HELPER(gvec_vfsq##BITS##s)(void *v1, const void *v2, CPUS390XState *env,  \
+                                uint32_t desc)                                 \
+{                                                                              \
+    vop##BITS##_2(v1, v2, env, true, false, 0, float##BITS##_sqrt, GETPC());   \
 }
+DEF_GVEC_VFSQ_S(32)
+DEF_GVEC_VFSQ_S(64)
 
 #define DEF_GVEC_FVS(BITS)                                                     \
 void HELPER(gvec_vfs##BITS)(void *v1, const void *v2, const void *v3,          \
-- 
2.26.2
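
The translator change in op_vfsq() follows one pattern:

- Pick a helper for each floating-point format.
- Leave it NULL when the format is unknown, or when it needs the vector-enhancements facility and that facility is absent.
- Raise a specification exception for a NULL helper or for nonzero reserved m4 bits.

A hedged, standalone sketch of that decision logic follows. The `helper_id` values are hypothetical stand-ins for the `gen_helper_gvec_vfsq*` pointers. The format codes assume the architected values 2 (short), 3 (long) and 4 (extended).

```c
#include <stdbool.h>

/* Floating-point format codes, assumed to match FPF_SHORT/LONG/EXT. */
enum { FPF_SHORT = 2, FPF_LONG = 3, FPF_EXT = 4 };

/* Hypothetical identifiers standing in for gen_helper_gvec_vfsq*. */
typedef enum {
    HELPER_NONE = 0,    /* no valid helper: specification exception */
    HELPER_SQ32,
    HELPER_SQ32S,
    HELPER_SQ64,
    HELPER_SQ64S,
    HELPER_SQ128,
} helper_id;

static helper_id select_vfsq(unsigned fpf, unsigned m4, bool has_vector_enh)
{
    const bool se = (m4 >> 3) & 1;   /* same as extract32(m4, 3, 1) */
    helper_id fn = HELPER_NONE;

    switch (fpf) {
    case FPF_SHORT:
        if (has_vector_enh) {
            fn = se ? HELPER_SQ32S : HELPER_SQ32;
        }
        break;
    case FPF_LONG:
        fn = se ? HELPER_SQ64S : HELPER_SQ64;
        break;
    case FPF_EXT:
        if (has_vector_enh) {
            fn = HELPER_SQ128;
        }
        break;
    default:
        /* Unknown formats never fall back to some helper. */
        break;
    }

    /* Reserved m4 bits, i.e. extract32(m4, 0, 3), must be zero. */
    if (m4 & 7) {
        return HELPER_NONE;
    }
    return fn;
}
```

A caller would treat `HELPER_NONE` the way op_vfsq() treats `fn == NULL`: emit PGM_SPECIFICATION and return DISAS_NORETURN.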


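The DEF_GVEC_VFSQ and DEF_GVEC_VFSQ_S macros use token pasting to generate one helper per element width. The sketch below applies the same technique to a generic lane-wise loop. It is hypothetical and simplified:

- `DEF_LANE_OP_2` and the generated `lane_op32_2`/`lane_op64_2` are not the QEMU `vop##BITS##_2` helpers.
- Plain integer arrays replace S390Vector.
- A function pointer replaces the softfloat call.
- `flip_sign*` are example operations only.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical generator: DEF_LANE_OP_2(BITS) defines lane_op<BITS>_2(),
 * which applies fn to each BITS-wide element of a 128-bit vector, or only
 * to element 0 when s (single element) is set.
 */
#define DEF_LANE_OP_2(BITS)                                                \
static void lane_op##BITS##_2(uint##BITS##_t *dst,                         \
                              const uint##BITS##_t *src, bool s,           \
                              uint##BITS##_t (*fn)(uint##BITS##_t))        \
{                                                                          \
    uint##BITS##_t tmp[128 / BITS] = { 0 };  /* untouched lanes stay 0 */  \
    int i;                                                                 \
                                                                           \
    for (i = 0; i < 128 / BITS; i++) {                                     \
        tmp[i] = fn(src[i]);                                               \
        if (s) {                                                           \
            break;                                                         \
        }                                                                  \
    }                                                                      \
    /* Copy out last, so dst may alias src. */                             \
    for (i = 0; i < 128 / BITS; i++) {                                     \
        dst[i] = tmp[i];                                                   \
    }                                                                      \
}
DEF_LANE_OP_2(32)
DEF_LANE_OP_2(64)

/* Example element operations: invert the IEEE sign bit. */
static uint32_t flip_sign32(uint32_t x)
{
    return x ^ (UINT32_C(1) << 31);
}

static uint64_t flip_sign64(uint64_t x)
{
    return x ^ (UINT64_C(1) << 63);
}
```

The result is built in a zero-initialized temporary, as in the helpers in this series. With `s` set, only element 0 carries a result and the remaining elements read as zero.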

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v1 15/20] s390x/tcg: Implement 32/128 bit for VECTOR FP TEST DATA CLASS IMMEDIATE
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (13 preceding siblings ...)
  2020-09-30 14:55 ` [PATCH v1 14/20] s390x/tcg: Implement 32/128 bit for VECTOR FP SQUARE ROOT David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-10-01 16:30   ` Richard Henderson
  2020-09-30 14:55 ` [PATCH v1 16/20] s390x/tcg: Implement 32/128bit for VECTOR FP MULTIPLY AND (ADD|SUBTRACT) David Hildenbrand
                   ` (7 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Richard Henderson, Thomas Huth,
	David Hildenbrand

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  3 ++
 target/s390x/translate_vx.c.inc | 26 ++++++++---
 target/s390x/vec_fpu_helper.c   | 76 +++++++++++++++++++--------------
 3 files changed, 69 insertions(+), 36 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index bee283e3d4..c2ded83669 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -331,8 +331,11 @@ DEF_HELPER_FLAGS_5(gvec_vfs32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfs64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfs64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfs128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_4(gvec_vftci32, void, ptr, cptr, env, i32)
+DEF_HELPER_4(gvec_vftci32s, void, ptr, cptr, env, i32)
 DEF_HELPER_4(gvec_vftci64, void, ptr, cptr, env, i32)
 DEF_HELPER_4(gvec_vftci64s, void, ptr, cptr, env, i32)
+DEF_HELPER_4(gvec_vftci128, void, ptr, cptr, env, i32)
 
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_3(servc, i32, env, i64, i64)
diff --git a/target/s390x/translate_vx.c.inc b/target/s390x/translate_vx.c.inc
index 7d4811ccf7..6bd599b319 100644
--- a/target/s390x/translate_vx.c.inc
+++ b/target/s390x/translate_vx.c.inc
@@ -2991,16 +2991,32 @@ static DisasJumpType op_vftci(DisasContext *s, DisasOps *o)
     const uint16_t i3 = get_field(s, i3);
     const uint8_t fpf = get_field(s, m4);
     const uint8_t m5 = get_field(s, m5);
-    gen_helper_gvec_2_ptr *fn = gen_helper_gvec_vftci64;
+    const bool se = extract32(m5, 3, 1);
+    gen_helper_gvec_2_ptr *fn = NULL;
 
-    if (fpf != FPF_LONG || extract32(m5, 0, 3)) {
+    switch (fpf) {
+    case FPF_SHORT:
+        if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+            fn = se ? gen_helper_gvec_vftci32s : gen_helper_gvec_vftci32;
+        }
+        break;
+    case FPF_LONG:
+        fn = se ? gen_helper_gvec_vftci64s : gen_helper_gvec_vftci64;
+        break;
+    case FPF_EXT:
+        if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+            fn = gen_helper_gvec_vftci128;
+        }
+        break;
+    default:
+        break;
+    }
+
+    if (!fn || extract32(m5, 0, 3)) {
         gen_program_exception(s, PGM_SPECIFICATION);
         return DISAS_NORETURN;
     }
 
-    if (extract32(m5, 3, 1)) {
-        fn = gen_helper_gvec_vftci64s;
-    }
     gen_gvec_2_ptr(get_field(s, v1), get_field(s, v2), cpu_env, i3, fn);
     set_cc_static(s);
     return DISAS_NEXT;
diff --git a/target/s390x/vec_fpu_helper.c b/target/s390x/vec_fpu_helper.c
index b7045e85d6..f18f0ae8e2 100644
--- a/target/s390x/vec_fpu_helper.c
+++ b/target/s390x/vec_fpu_helper.c
@@ -23,6 +23,9 @@
 const float32 float32_ones = make_float32(-1u);
 const float64 float64_ones = make_float64(-1ull);
 const float128 float128_ones = make_float128(-1ull, -1ull);
+const float32 float32_zeroes = make_float32(0);
+const float64 float64_zeroes = make_float64(0);
+const float128 float128_zeroes = make_float128(0, 0);
 
 #define VIC_INVALID         0x1
 #define VIC_DIVBYZERO       0x2
@@ -782,39 +785,50 @@ void HELPER(gvec_vfs##BITS##s)(void *v1, const void *v2, const void *v3,       \
 DEF_GVEC_FVS_S(32)
 DEF_GVEC_FVS_S(64)
 
-static int vftci64(S390Vector *v1, const S390Vector *v2, CPUS390XState *env,
-                   bool s, uint16_t i3)
-{
-    int i, match = 0;
-
-    for (i = 0; i < 2; i++) {
-        float64 a = s390_vec_read_element64(v2, i);
-
-        if (float64_dcmask(env, a) & i3) {
-            match++;
-            s390_vec_write_element64(v1, i, -1ull);
-        } else {
-            s390_vec_write_element64(v1, i, 0);
-        }
-        if (s) {
-            break;
-        }
-    }
-
-    if (match) {
-        return s || match == 2 ? 0 : 1;
-    }
-    return 3;
+#define DEF_VFTCI(BITS)                                                        \
+static int vftci##BITS(S390Vector *v1, const S390Vector *v2,                   \
+                       CPUS390XState *env, bool s, uint16_t i3)                \
+{                                                                              \
+    int i, match = 0;                                                          \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        float##BITS a = s390_vec_read_float##BITS(v2, i);                      \
+                                                                               \
+        if (float##BITS##_dcmask(env, a) & i3) {                               \
+            match++;                                                           \
+            s390_vec_write_float##BITS(v1, i, float##BITS##_ones);             \
+        } else {                                                               \
+            s390_vec_write_float##BITS(v1, i, float##BITS##_zeroes);           \
+        }                                                                      \
+        if (s) {                                                               \
+            break;                                                             \
+        }                                                                      \
+    }                                                                          \
+                                                                               \
+    if (match) {                                                               \
+        return s || match == (128 / BITS) ? 0 : 1;                             \
+    }                                                                          \
+    return 3;                                                                  \
 }
+DEF_VFTCI(32)
+DEF_VFTCI(64)
+DEF_VFTCI(128)
 
-void HELPER(gvec_vftci64)(void *v1, const void *v2, CPUS390XState *env,
-                          uint32_t desc)
-{
-    env->cc_op = vftci64(v1, v2, env, false, simd_data(desc));
+#define DEF_GVEC_VFTCI(BITS)                                                   \
+void HELPER(gvec_vftci##BITS)(void *v1, const void *v2, CPUS390XState *env,    \
+                              uint32_t desc)                                   \
+{                                                                              \
+    env->cc_op = vftci##BITS(v1, v2, env, false, simd_data(desc));             \
 }
+DEF_GVEC_VFTCI(32)
+DEF_GVEC_VFTCI(64)
+DEF_GVEC_VFTCI(128)
 
-void HELPER(gvec_vftci64s)(void *v1, const void *v2, CPUS390XState *env,
-                           uint32_t desc)
-{
-    env->cc_op = vftci64(v1, v2, env, true, simd_data(desc));
+#define DEF_GVEC_VFTCI_S(BITS)                                                 \
+void HELPER(gvec_vftci##BITS##s)(void *v1, const void *v2, CPUS390XState *env, \
+                                 uint32_t desc)                                \
+{                                                                              \
+    env->cc_op = vftci##BITS(v1, v2, env, true, simd_data(desc));              \
 }
+DEF_GVEC_VFTCI_S(32)
+DEF_GVEC_VFTCI_S(64)
-- 
2.26.2
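
The condition code computed at the end of vftci##BITS() depends only on how many elements were tested and how many of them matched:

- cc 0: every tested element matched.
- cc 1: some elements matched, but not all.
- cc 3: no element matched.

The standalone sketch below reproduces that logic. `vftci_count()` and `vftci_cc()` are hypothetical names. The `matches[]` array stands in for `float##BITS##_dcmask(env, a) & i3` being nonzero.

```c
#include <stdbool.h>

/*
 * Count matches the way the helper loop does: stop after element 0
 * when only a single element is tested.
 */
static int vftci_count(const bool *matches, int nelems, bool single)
{
    int i, match = 0;

    for (i = 0; i < nelems; i++) {
        if (matches[i]) {
            match++;
        }
        if (single) {
            break;
        }
    }
    return match;
}

/* cc 0: all (or the single) element matched, cc 1: some, cc 3: none. */
static int vftci_cc(int match, int nelems, bool single)
{
    if (match) {
        return single || match == nelems ? 0 : 1;
    }
    return 3;
}
```

The 128-bit variant has only one element, so a match always yields cc 0 and a miss yields cc 3.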



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v1 16/20] s390x/tcg: Implement 32/128bit for VECTOR FP MULTIPLY AND (ADD|SUBTRACT)
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (14 preceding siblings ...)
  2020-09-30 14:55 ` [PATCH v1 15/20] s390x/tcg: Implement 32/128 bit for VECTOR FP TEST DATA CLASS IMMEDIATE David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-09-30 14:55 ` [PATCH v1 17/20] s390x/tcg: Implement VECTOR FP NEGATIVE " David Hildenbrand
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Richard Henderson, Thomas Huth,
	David Hildenbrand

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  6 ++
 target/s390x/translate_vx.c.inc | 48 +++++++++++++---
 target/s390x/vec_fpu_helper.c   | 98 ++++++++++++++++++++-------------
 3 files changed, 107 insertions(+), 45 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index c2ded83669..e4d60299dc 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -317,10 +317,16 @@ DEF_HELPER_FLAGS_5(gvec_vfm32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfm64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfm64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfm128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_6(gvec_vfma32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_6(gvec_vfma32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_6(gvec_vfma64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_6(gvec_vfma64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_6(gvec_vfma128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_6(gvec_vfms32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_6(gvec_vfms32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_6(gvec_vfms64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_6(gvec_vfms64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_6(gvec_vfms128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfsq32, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfsq32s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfsq64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
diff --git a/target/s390x/translate_vx.c.inc b/target/s390x/translate_vx.c.inc
index 6bd599b319..5d31498cc1 100644
--- a/target/s390x/translate_vx.c.inc
+++ b/target/s390x/translate_vx.c.inc
@@ -2847,18 +2847,52 @@ static DisasJumpType op_vfma(DisasContext *s, DisasOps *o)
     const uint8_t m5 = get_field(s, m5);
     const uint8_t fpf = get_field(s, m6);
     const bool se = extract32(m5, 3, 1);
-    gen_helper_gvec_4_ptr *fn;
+    gen_helper_gvec_4_ptr *fn = NULL;
 
-    if (fpf != FPF_LONG || extract32(m5, 0, 3)) {
+    switch (s->fields.op2) {
+    case 0x8f:
+        switch (fpf) {
+        case FPF_SHORT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+                fn = se ? gen_helper_gvec_vfma32s : gen_helper_gvec_vfma32;
+            }
+            break;
+        case FPF_LONG:
+            fn = se ? gen_helper_gvec_vfma64s : gen_helper_gvec_vfma64;
+            break;
+        case FPF_EXT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+                fn = gen_helper_gvec_vfma128;
+            }
+            break;
+        }
+        break;
+    case 0x8e:
+        switch (fpf) {
+        case FPF_SHORT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+                fn = se ? gen_helper_gvec_vfms32s : gen_helper_gvec_vfms32;
+            }
+            break;
+        case FPF_LONG:
+            fn = se ? gen_helper_gvec_vfms64s : gen_helper_gvec_vfms64;
+            break;
+        case FPF_EXT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
+                fn = gen_helper_gvec_vfms128;
+            }
+            break;
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    if (!fn || extract32(m5, 0, 3)) {
         gen_program_exception(s, PGM_SPECIFICATION);
         return DISAS_NORETURN;
     }
 
-    if (s->fields.op2 == 0x8f) {
-        fn = se ? gen_helper_gvec_vfma64s : gen_helper_gvec_vfma64;
-    } else {
-        fn = se ? gen_helper_gvec_vfms64s : gen_helper_gvec_vfms64;
-    }
     gen_gvec_4_ptr(get_field(s, v1), get_field(s, v2),
                    get_field(s, v3), get_field(s, v4), cpu_env,
                    0, fn);
diff --git a/target/s390x/vec_fpu_helper.c b/target/s390x/vec_fpu_helper.c
index f18f0ae8e2..0b25718365 100644
--- a/target/s390x/vec_fpu_helper.c
+++ b/target/s390x/vec_fpu_helper.c
@@ -699,53 +699,75 @@ void HELPER(gvec_vfm##BITS##s)(void *v1, const void *v2, const void *v3,       \
 DEF_GVEC_FVM_S(32)
 DEF_GVEC_FVM_S(64)
 
-static void vfma64(S390Vector *v1, const S390Vector *v2, const S390Vector *v3,
-                   const S390Vector *v4, CPUS390XState *env, bool s, int flags,
-                   uintptr_t retaddr)
-{
-    uint8_t vxc, vec_exc = 0;
-    S390Vector tmp = {};
-    int i;
-
-    for (i = 0; i < 2; i++) {
-        const uint64_t a = s390_vec_read_element64(v2, i);
-        const uint64_t b = s390_vec_read_element64(v3, i);
-        const uint64_t c = s390_vec_read_element64(v4, i);
-        uint64_t ret = float64_muladd(a, b, c, flags, &env->fpu_status);
-
-        s390_vec_write_element64(&tmp, i, ret);
-        vxc = check_ieee_exc(env, i, false, &vec_exc);
-        if (s || vxc) {
-            break;
-        }
-    }
-    handle_ieee_exc(env, vxc, vec_exc, retaddr);
-    *v1 = tmp;
+#define DEF_VFMA(BITS)                                                         \
+static void vfma##BITS(S390Vector *v1, const S390Vector *v2,                   \
+                       const S390Vector *v3, const S390Vector *v4,             \
+                       CPUS390XState *env, bool s, int flags,                  \
+                       uintptr_t retaddr)                                      \
+{                                                                              \
+    uint8_t vxc, vec_exc = 0;                                                  \
+    S390Vector tmp = {};                                                       \
+    int i;                                                                     \
+                                                                               \
+    for (i = 0; i < (128 / BITS); i++) {                                       \
+        const float##BITS a = s390_vec_read_float##BITS(v2, i);                \
+        const float##BITS b = s390_vec_read_float##BITS(v3, i);                \
+        const float##BITS c = s390_vec_read_float##BITS(v4, i);                \
+        float##BITS ret = float##BITS##_muladd(a, b, c, flags,                 \
+                                               &env->fpu_status);              \
+                                                                               \
+        s390_vec_write_float##BITS(&tmp, i, ret);                              \
+        vxc = check_ieee_exc(env, i, false, &vec_exc);                         \
+        if (s || vxc) {                                                        \
+            break;                                                             \
+        }                                                                      \
+    }                                                                          \
+    handle_ieee_exc(env, vxc, vec_exc, retaddr);                               \
+    *v1 = tmp;                                                                 \
 }
+DEF_VFMA(32)
+DEF_VFMA(64)
+DEF_VFMA(128)
 
-void HELPER(gvec_vfma64)(void *v1, const void *v2, const void *v3,
-                         const void *v4, CPUS390XState *env, uint32_t desc)
-{
-    vfma64(v1, v2, v3, v4, env, false, 0, GETPC());
+#define DEF_GVEC_VFMA(BITS)                                                    \
+void HELPER(gvec_vfma##BITS)(void *v1, const void *v2, const void *v3,         \
+                             const void *v4, CPUS390XState *env, uint32_t desc)\
+{                                                                              \
+    vfma##BITS(v1, v2, v3, v4, env, false, 0, GETPC());                        \
 }
+DEF_GVEC_VFMA(32)
+DEF_GVEC_VFMA(64)
+DEF_GVEC_VFMA(128)
 
-void HELPER(gvec_vfma64s)(void *v1, const void *v2, const void *v3,
-                         const void *v4, CPUS390XState *env, uint32_t desc)
-{
-    vfma64(v1, v2, v3, v4, env, true, 0, GETPC());
+#define DEF_GVEC_VFMA_S(BITS)                                                  \
+void HELPER(gvec_vfma##BITS##s)(void *v1, const void *v2, const void *v3,      \
+                                const void *v4, CPUS390XState *env,            \
+                                uint32_t desc)                                 \
+{                                                                              \
+    vfma##BITS(v1, v2, v3, v4, env, true, 0, GETPC());                         \
 }
+DEF_GVEC_VFMA_S(32)
+DEF_GVEC_VFMA_S(64)
 
-void HELPER(gvec_vfms64)(void *v1, const void *v2, const void *v3,
-                         const void *v4, CPUS390XState *env, uint32_t desc)
-{
-    vfma64(v1, v2, v3, v4, env, false, float_muladd_negate_c, GETPC());
+#define DEF_GVEC_VFMS(BITS)                                                    \
+void HELPER(gvec_vfms##BITS)(void *v1, const void *v2, const void *v3,         \
+                             const void *v4, CPUS390XState *env, uint32_t desc)\
+{                                                                              \
+    vfma##BITS(v1, v2, v3, v4, env, false, float_muladd_negate_c, GETPC());    \
 }
+DEF_GVEC_VFMS(32)
+DEF_GVEC_VFMS(64)
+DEF_GVEC_VFMS(128)
 
-void HELPER(gvec_vfms64s)(void *v1, const void *v2, const void *v3,
-                         const void *v4, CPUS390XState *env, uint32_t desc)
-{
-    vfma64(v1, v2, v3, v4, env, true, float_muladd_negate_c, GETPC());
+#define DEF_GVEC_VFMS_S(BITS)                                                  \
+void HELPER(gvec_vfms##BITS##s)(void *v1, const void *v2, const void *v3,      \
+                                const void *v4, CPUS390XState *env,            \
+                                uint32_t desc)                                 \
+{                                                                              \
+    vfma##BITS(v1, v2, v3, v4, env, true, float_muladd_negate_c, GETPC());     \
 }
+DEF_GVEC_VFMS_S(32)
+DEF_GVEC_VFMS_S(64)
 
 #define DEF_GVEC_VFSQ(BITS)                                                    \
 void HELPER(gvec_vfsq##BITS)(void *v1, const void *v2, CPUS390XState *env,     \
-- 
2.26.2
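
VFMA and VFMS share vfma##BITS() and differ only in the softfloat flags passed in. float_muladd_negate_c turns a * b + c into a * b - c. The next patch passes float_muladd_negate_result for VFNMA, giving -(a * b + c). Assuming VFNMS combines both flags (its helper is cut off in this excerpt), it yields -(a * b - c).

The sketch below shows that flag mapping using hypothetical `MULADD_*` constants. It uses plain `double` arithmetic, so unlike float##BITS##_muladd it does not guarantee a single rounding. It only illustrates the sign handling.

```c
/*
 * Hypothetical flag bits standing in for softfloat's
 * float_muladd_negate_c and float_muladd_negate_result.
 */
enum {
    MULADD_NEGATE_C      = 1 << 0,
    MULADD_NEGATE_RESULT = 1 << 1,
};

/* Not a guaranteed fused multiply-add: illustrates the flags only. */
static double muladd_sketch(double a, double b, double c, int flags)
{
    double r;

    if (flags & MULADD_NEGATE_C) {
        c = -c;                     /* VFMS: a * b - c */
    }
    r = a * b + c;
    if (flags & MULADD_NEGATE_RESULT) {
        r = -r;                     /* VFNMA/VFNMS: negate the sum */
    }
    return r;
}
```

With a = 2, b = 3 and c = 1, the four flag combinations give 7, 5, -7 and -5. These values are exact, so rounding plays no role.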



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v1 17/20] s390x/tcg: Implement VECTOR FP NEGATIVE MULTIPLY AND (ADD|SUBTRACT)
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (15 preceding siblings ...)
  2020-09-30 14:55 ` [PATCH v1 16/20] s390x/tcg: Implement 32/128bit for VECTOR FP MULTIPLY AND (ADD|SUBTRACT) David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-09-30 14:55 ` [PATCH v1 18/20] s390x/tcg: Implement VECTOR FP (MAXIMUM|MINIMUM) David Hildenbrand
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Richard Henderson, Thomas Huth,
	David Hildenbrand

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           | 10 ++++++++
 target/s390x/insn-data.def      |  4 +++
 target/s390x/translate_vx.c.inc | 26 +++++++++++++++++++
 target/s390x/vec_fpu_helper.c   | 45 +++++++++++++++++++++++++++++++++
 4 files changed, 85 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index e4d60299dc..6b4a6c5185 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -327,6 +327,16 @@ DEF_HELPER_FLAGS_6(gvec_vfms32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, en
 DEF_HELPER_FLAGS_6(gvec_vfms64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_6(gvec_vfms64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_6(gvec_vfms128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_6(gvec_vfnma32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_6(gvec_vfnma32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_6(gvec_vfnma64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_6(gvec_vfnma64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_6(gvec_vfnma128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_6(gvec_vfnms32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_6(gvec_vfnms32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_6(gvec_vfnms64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_6(gvec_vfnms64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_6(gvec_vfnms128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfsq32, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfsq32s, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vfsq64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index da7fe6f21c..082de27298 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1259,6 +1259,10 @@
     F(0xe78f, VFMA,    VRR_e, V,   0, 0, 0, 0, vfma, 0, IF_VEC)
 /* VECTOR FP MULTIPLY AND SUBTRACT */
     F(0xe78e, VFMS,    VRR_e, V,   0, 0, 0, 0, vfma, 0, IF_VEC)
+/* VECTOR FP NEGATIVE MULTIPLY AND ADD */
+    F(0xe79f, VFNMA,   VRR_e, VE,  0, 0, 0, 0, vfma, 0, IF_VEC)
+/* VECTOR FP NEGATIVE MULTIPLY AND SUBTRACT */
+    F(0xe79e, VFNMS,   VRR_e, VE,  0, 0, 0, 0, vfma, 0, IF_VEC)
 /* VECTOR FP PERFORM SIGN OPERATION */
     F(0xe7cc, VFPSO,   VRR_a, V,   0, 0, 0, 0, vfpso, 0, IF_VEC)
 /* VECTOR FP SQUARE ROOT */
diff --git a/target/s390x/translate_vx.c.inc b/target/s390x/translate_vx.c.inc
index 5d31498cc1..40e452f552 100644
--- a/target/s390x/translate_vx.c.inc
+++ b/target/s390x/translate_vx.c.inc
@@ -2884,6 +2884,32 @@ static DisasJumpType op_vfma(DisasContext *s, DisasOps *o)
             break;
         }
         break;
+    case 0x9f:
+        switch (fpf) {
+        case FPF_SHORT:
+            fn = se ? gen_helper_gvec_vfnma32s : gen_helper_gvec_vfnma32;
+            break;
+        case FPF_LONG:
+            fn = se ? gen_helper_gvec_vfnma64s : gen_helper_gvec_vfnma64;
+            break;
+        default:
+            fn = gen_helper_gvec_vfnma128;
+            break;
+        }
+        break;
+    case 0x9e:
+        switch (fpf) {
+        case FPF_SHORT:
+            fn = se ? gen_helper_gvec_vfnms32s : gen_helper_gvec_vfnms32;
+            break;
+        case FPF_LONG:
+            fn = se ? gen_helper_gvec_vfnms64s : gen_helper_gvec_vfnms64;
+            break;
+        default:
+            fn = gen_helper_gvec_vfnms128;
+            break;
+        }
+        break;
     default:
         g_assert_not_reached();
     }
diff --git a/target/s390x/vec_fpu_helper.c b/target/s390x/vec_fpu_helper.c
index 0b25718365..92858c8c59 100644
--- a/target/s390x/vec_fpu_helper.c
+++ b/target/s390x/vec_fpu_helper.c
@@ -769,6 +769,51 @@ void HELPER(gvec_vfms##BITS##s)(void *v1, const void *v2, const void *v3,      \
 DEF_GVEC_VFMS_S(32)
 DEF_GVEC_VFMS_S(64)
 
+#define DEF_GVEC_VFNMA(BITS)                                                   \
+void HELPER(gvec_vfnma##BITS)(void *v1, const void *v2, const void *v3,        \
+                              const void *v4, CPUS390XState *env,              \
+                              uint32_t desc)                                   \
+{                                                                              \
+    vfma##BITS(v1, v2, v3, v4, env, false, float_muladd_negate_result,         \
+               GETPC());                                                       \
+}
+DEF_GVEC_VFNMA(32)
+DEF_GVEC_VFNMA(64)
+DEF_GVEC_VFNMA(128)
+
+#define DEF_GVEC_VFNMA_S(BITS)                                                 \
+void HELPER(gvec_vfnma##BITS##s)(void *v1, const void *v2, const void *v3,     \
+                                 const void *v4, CPUS390XState *env,           \
+                                 uint32_t desc)                                \
+{                                                                              \
+    vfma##BITS(v1, v2, v3, v4, env, true, float_muladd_negate_result, GETPC());\
+}
+DEF_GVEC_VFNMA_S(32)
+DEF_GVEC_VFNMA_S(64)
+
+#define DEF_GVEC_VFNMS(BITS)                                                   \
+void HELPER(gvec_vfnms##BITS)(void *v1, const void *v2, const void *v3,        \
+                              const void *v4, CPUS390XState *env,              \
+                              uint32_t desc)                                   \
+{                                                                              \
+    vfma##BITS(v1, v2, v3, v4, env, false,                                     \
+               float_muladd_negate_c | float_muladd_negate_result, GETPC());   \
+}
+DEF_GVEC_VFNMS(32)
+DEF_GVEC_VFNMS(64)
+DEF_GVEC_VFNMS(128)
+
+#define DEF_GVEC_VFNMS_S(BITS)                                                 \
+void HELPER(gvec_vfnms##BITS##s)(void *v1, const void *v2, const void *v3,     \
+                                 const void *v4, CPUS390XState *env,           \
+                                 uint32_t desc)                                \
+{                                                                              \
+    vfma##BITS(v1, v2, v3, v4, env, true,                                      \
+               float_muladd_negate_c | float_muladd_negate_result, GETPC());   \
+}
+DEF_GVEC_VFNMS_S(32)
+DEF_GVEC_VFNMS_S(64)
+
 #define DEF_GVEC_VFSQ(BITS)                                                    \
 void HELPER(gvec_vfsq##BITS)(void *v1, const void *v2, CPUS390XState *env,     \
                              uint32_t desc)                                    \
-- 
2.26.2




* [PATCH v1 18/20] s390x/tcg: Implement VECTOR FP (MAXIMUM|MINIMUM)
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (16 preceding siblings ...)
  2020-09-30 14:55 ` [PATCH v1 17/20] s390x/tcg: Implement VECTOR FP NEGATIVE " David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-10-01 16:49   ` Richard Henderson
  2020-09-30 14:55 ` [PATCH v1 19/20] s390x/tcg: We support Vector enhancements facility David Hildenbrand
                   ` (4 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Richard Henderson, Thomas Huth,
	David Hildenbrand

For the IEEE functions, we can reuse the softfloat implementations. The
other functions are implemented generically for 32/64/128 bit. They
carefully handle all the special cases according to the tables defined
in the PoP.

We could select the function at translation time instead, but that
would require 20*(3+2) helpers (10 function codes each for MIN and MAX,
times three element sizes plus two single-element variants). Instead,
each helper looks up the function in a table indexed by m6 at runtime.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h           |  10 ++
 target/s390x/insn-data.def      |   4 +
 target/s390x/translate_vx.c.inc |  44 +++++
 target/s390x/vec_fpu_helper.c   | 300 ++++++++++++++++++++++++++++++++
 4 files changed, 358 insertions(+)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 6b4a6c5185..b2f8ccc60d 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -317,6 +317,16 @@ DEF_HELPER_FLAGS_5(gvec_vfm32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfm64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfm64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfm128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfmax32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfmax32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfmax64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfmax64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfmax128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfmin32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfmin32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfmin64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfmin64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_5(gvec_vfmin128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_6(gvec_vfma32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_6(gvec_vfma32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_6(gvec_vfma64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)
diff --git a/target/s390x/insn-data.def b/target/s390x/insn-data.def
index 082de27298..e9a3fdbc5a 100644
--- a/target/s390x/insn-data.def
+++ b/target/s390x/insn-data.def
@@ -1253,6 +1253,10 @@
     F(0xe7c4, VFLL,    VRR_a, V,   0, 0, 0, 0, vfll, 0, IF_VEC)
 /* VECTOR LOAD ROUNDED */
     F(0xe7c5, VFLR,    VRR_a, V,   0, 0, 0, 0, vcdg, 0, IF_VEC)
+/* VECTOR FP MAXIMUM */
+    F(0xe7ef, VFMAX,   VRR_c, VE,  0, 0, 0, 0, vfmax, 0, IF_VEC)
+/* VECTOR FP MINIMUM */
+    F(0xe7ee, VFMIN,   VRR_c, VE,  0, 0, 0, 0, vfmax, 0, IF_VEC)
 /* VECTOR FP MULTIPLY */
     F(0xe7e7, VFM,     VRR_c, V,   0, 0, 0, 0, vfa, 0, IF_VEC)
 /* VECTOR FP MULTIPLY AND ADD */
diff --git a/target/s390x/translate_vx.c.inc b/target/s390x/translate_vx.c.inc
index 40e452f552..e2bde426e0 100644
--- a/target/s390x/translate_vx.c.inc
+++ b/target/s390x/translate_vx.c.inc
@@ -2842,6 +2842,50 @@ static DisasJumpType op_vfll(DisasContext *s, DisasOps *o)
     return DISAS_NEXT;
 }
 
+static DisasJumpType op_vfmax(DisasContext *s, DisasOps *o)
+{
+    const bool se = extract32(get_field(s, m5), 3, 1);
+    const uint8_t fpf = get_field(s, m4);
+    const uint8_t m6 = get_field(s, m6);
+    gen_helper_gvec_3_ptr *fn;
+
+    if (m6 == 5 || m6 == 6 || m6 == 7 || m6 >= 13) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    switch (fpf) {
+    case FPF_SHORT:
+        if (s->fields.op2 == 0xef) {
+            fn = se ? gen_helper_gvec_vfmax32s : gen_helper_gvec_vfmax32;
+        } else {
+            fn = se ? gen_helper_gvec_vfmin32s : gen_helper_gvec_vfmin32;
+        }
+        break;
+    case FPF_LONG:
+        if (s->fields.op2 == 0xef) {
+            fn = se ? gen_helper_gvec_vfmax64s : gen_helper_gvec_vfmax64;
+        } else {
+            fn = se ? gen_helper_gvec_vfmin64s : gen_helper_gvec_vfmin64;
+        }
+        break;
+    case FPF_EXT:
+        if (s->fields.op2 == 0xef) {
+            fn = gen_helper_gvec_vfmax128;
+        } else {
+            fn = gen_helper_gvec_vfmin128;
+        }
+        break;
+    default:
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_gvec_3_ptr(get_field(s, v1), get_field(s, v2), get_field(s, v3),
+                   cpu_env, m6, fn);
+    return DISAS_NEXT;
+}
+
 static DisasJumpType op_vfma(DisasContext *s, DisasOps *o)
 {
     const uint8_t m5 = get_field(s, m5);
diff --git a/target/s390x/vec_fpu_helper.c b/target/s390x/vec_fpu_helper.c
index 92858c8c59..80c6b644bf 100644
--- a/target/s390x/vec_fpu_helper.c
+++ b/target/s390x/vec_fpu_helper.c
@@ -899,3 +899,303 @@ void HELPER(gvec_vftci##BITS##s)(void *v1, const void *v2, CPUS390XState *env, \
 }
 DEF_GVEC_VFTCI_S(32)
 DEF_GVEC_VFTCI_S(64)
+
+typedef enum S390MinMaxType {
+    s390_minmax_java_math_min,
+    s390_minmax_java_math_max,
+    s390_minmax_c_macro_min,
+    s390_minmax_c_macro_max,
+    s390_minmax_fmin,
+    s390_minmax_fmax,
+    s390_minmax_cpp_alg_min,
+    s390_minmax_cpp_alg_max,
+} S390MinMaxType;
+
+#define S390_MINMAX(BITS, TYPE)                                                \
+static float##BITS TYPE##BITS(float##BITS a, float##BITS b, float_status *s)   \
+{                                                                              \
+    const bool zero_a = float##BITS##_is_zero(a);                              \
+    const bool zero_b = float##BITS##_is_zero(b);                              \
+    const bool inf_a = float##BITS##_is_infinity(a);                           \
+    const bool inf_b = float##BITS##_is_infinity(b);                           \
+    const bool nan_a = float##BITS##_is_any_nan(a);                            \
+    const bool nan_b = float##BITS##_is_any_nan(b);                            \
+    const bool neg_a = float##BITS##_is_neg(a);                                \
+    const bool neg_b = float##BITS##_is_neg(b);                                \
+                                                                               \
+    if (unlikely(nan_a || nan_b)) {                                            \
+        const bool sig_a = float##BITS##_is_signaling_nan(a, s);               \
+        const bool sig_b = float##BITS##_is_signaling_nan(b, s);               \
+                                                                               \
+        if (sig_a || sig_b) {                                                  \
+            s->float_exception_flags |= float_flag_invalid;                    \
+        }                                                                      \
+        switch (TYPE) {                                                        \
+        case s390_minmax_java_math_min:                                        \
+        case s390_minmax_java_math_max:                                        \
+            if (sig_a) {                                                       \
+                return float##BITS##_silence_nan(a, s);                        \
+            } else if (sig_b) {                                                \
+                return float##BITS##_silence_nan(b, s);                        \
+            }                                                                  \
+            /* fall through */                                                 \
+        case s390_minmax_fmin:                                                 \
+        case s390_minmax_fmax:                                                 \
+            return nan_a ? a : b;                                              \
+        case s390_minmax_c_macro_min:                                          \
+        case s390_minmax_c_macro_max:                                          \
+            s->float_exception_flags |= float_flag_invalid;                    \
+            return b;                                                          \
+        case s390_minmax_cpp_alg_min:                                          \
+        case s390_minmax_cpp_alg_max:                                          \
+            s->float_exception_flags |= float_flag_invalid;                    \
+            return a;                                                          \
+        default:                                                               \
+            g_assert_not_reached();                                            \
+        }                                                                      \
+    } else if (unlikely(inf_a && inf_b)) {                                     \
+        switch (TYPE) {                                                        \
+        case s390_minmax_java_math_min:                                        \
+            return neg_a && !neg_b ? a : b;                                    \
+        case s390_minmax_java_math_max:                                        \
+        case s390_minmax_fmax:                                                 \
+        case s390_minmax_cpp_alg_max:                                          \
+            return neg_a && !neg_b ? b : a;                                    \
+        case s390_minmax_c_macro_min:                                          \
+        case s390_minmax_cpp_alg_min:                                          \
+            return neg_b ? b : a;                                              \
+        case s390_minmax_c_macro_max:                                          \
+            return !neg_a && neg_b ? a : b;                                    \
+        case s390_minmax_fmin:                                                 \
+            return !neg_a && neg_b ? b : a;                                    \
+        default:                                                               \
+            g_assert_not_reached();                                            \
+        }                                                                      \
+    } else if (unlikely(zero_a && zero_b)) {                                   \
+        switch (TYPE) {                                                        \
+        case s390_minmax_java_math_min:                                        \
+            return neg_a && !neg_b ? a : b;                                    \
+        case s390_minmax_java_math_max:                                        \
+        case s390_minmax_fmax:                                                 \
+            return neg_a && !neg_b ? b : a;                                    \
+        case s390_minmax_c_macro_min:                                          \
+        case s390_minmax_c_macro_max:                                          \
+            return b;                                                          \
+        case s390_minmax_fmin:                                                 \
+            return !neg_a && neg_b ? b : a;                                    \
+        case s390_minmax_cpp_alg_min:                                          \
+        case s390_minmax_cpp_alg_max:                                          \
+            return a;                                                          \
+        default:                                                               \
+            g_assert_not_reached();                                            \
+        }                                                                      \
+    }                                                                          \
+                                                                               \
+    /* We can process all remaining cases using simple comparison. */          \
+    switch (TYPE) {                                                            \
+    case s390_minmax_java_math_min:                                            \
+    case s390_minmax_c_macro_min:                                              \
+    case s390_minmax_fmin:                                                     \
+    case s390_minmax_cpp_alg_min:                                              \
+        if (float##BITS##_le_quiet(a, b, s)) {                                 \
+            return a;                                                          \
+        }                                                                      \
+        return b;                                                              \
+    case s390_minmax_java_math_max:                                            \
+    case s390_minmax_c_macro_max:                                              \
+    case s390_minmax_fmax:                                                     \
+    case s390_minmax_cpp_alg_max:                                              \
+        if (float##BITS##_le_quiet(a, b, s)) {                                 \
+            return b;                                                          \
+        }                                                                      \
+        return a;                                                              \
+    default:                                                                   \
+        g_assert_not_reached();                                                \
+    }                                                                          \
+}
+
+#define S390_MINMAX_ABS(BITS, TYPE)                                            \
+static float##BITS TYPE##_abs##BITS(float##BITS a, float##BITS b,              \
+                                    float_status *s)                           \
+{                                                                              \
+    return TYPE##BITS(float##BITS##_abs(a), float##BITS##_abs(b), s);          \
+}
+
+S390_MINMAX(32, s390_minmax_java_math_min)
+S390_MINMAX(32, s390_minmax_java_math_max)
+S390_MINMAX(32, s390_minmax_c_macro_min)
+S390_MINMAX(32, s390_minmax_c_macro_max)
+S390_MINMAX(32, s390_minmax_fmin)
+S390_MINMAX(32, s390_minmax_fmax)
+S390_MINMAX(32, s390_minmax_cpp_alg_min)
+S390_MINMAX(32, s390_minmax_cpp_alg_max)
+S390_MINMAX_ABS(32, s390_minmax_java_math_min)
+S390_MINMAX_ABS(32, s390_minmax_java_math_max)
+S390_MINMAX_ABS(32, s390_minmax_c_macro_min)
+S390_MINMAX_ABS(32, s390_minmax_c_macro_max)
+S390_MINMAX_ABS(32, s390_minmax_fmin)
+S390_MINMAX_ABS(32, s390_minmax_fmax)
+S390_MINMAX_ABS(32, s390_minmax_cpp_alg_min)
+S390_MINMAX_ABS(32, s390_minmax_cpp_alg_max)
+
+S390_MINMAX(64, s390_minmax_java_math_min)
+S390_MINMAX(64, s390_minmax_java_math_max)
+S390_MINMAX(64, s390_minmax_c_macro_min)
+S390_MINMAX(64, s390_minmax_c_macro_max)
+S390_MINMAX(64, s390_minmax_fmin)
+S390_MINMAX(64, s390_minmax_fmax)
+S390_MINMAX(64, s390_minmax_cpp_alg_min)
+S390_MINMAX(64, s390_minmax_cpp_alg_max)
+S390_MINMAX_ABS(64, s390_minmax_java_math_min)
+S390_MINMAX_ABS(64, s390_minmax_java_math_max)
+S390_MINMAX_ABS(64, s390_minmax_c_macro_min)
+S390_MINMAX_ABS(64, s390_minmax_c_macro_max)
+S390_MINMAX_ABS(64, s390_minmax_fmin)
+S390_MINMAX_ABS(64, s390_minmax_fmax)
+S390_MINMAX_ABS(64, s390_minmax_cpp_alg_min)
+S390_MINMAX_ABS(64, s390_minmax_cpp_alg_max)
+
+S390_MINMAX(128, s390_minmax_java_math_min)
+S390_MINMAX(128, s390_minmax_java_math_max)
+S390_MINMAX(128, s390_minmax_c_macro_min)
+S390_MINMAX(128, s390_minmax_c_macro_max)
+S390_MINMAX(128, s390_minmax_fmin)
+S390_MINMAX(128, s390_minmax_fmax)
+S390_MINMAX(128, s390_minmax_cpp_alg_min)
+S390_MINMAX(128, s390_minmax_cpp_alg_max)
+S390_MINMAX_ABS(128, s390_minmax_java_math_min)
+S390_MINMAX_ABS(128, s390_minmax_java_math_max)
+S390_MINMAX_ABS(128, s390_minmax_c_macro_min)
+S390_MINMAX_ABS(128, s390_minmax_c_macro_max)
+S390_MINMAX_ABS(128, s390_minmax_fmin)
+S390_MINMAX_ABS(128, s390_minmax_fmax)
+S390_MINMAX_ABS(128, s390_minmax_cpp_alg_min)
+S390_MINMAX_ABS(128, s390_minmax_cpp_alg_max)
+
+static vop32_3_fn const vfmax_fns32[16] = {
+    [0] = float32_maxnum,
+    [1] = s390_minmax_java_math_max32,
+    [2] = s390_minmax_c_macro_max32,
+    [3] = s390_minmax_cpp_alg_max32,
+    [4] = s390_minmax_fmax32,
+    [8] = float32_maxnummag,
+    [9] = s390_minmax_java_math_max_abs32,
+    [10] = s390_minmax_c_macro_max_abs32,
+    [11] = s390_minmax_cpp_alg_max_abs32,
+    [12] = s390_minmax_fmax_abs32,
+};
+
+static vop64_3_fn const vfmax_fns64[16] = {
+    [0] = float64_maxnum,
+    [1] = s390_minmax_java_math_max64,
+    [2] = s390_minmax_c_macro_max64,
+    [3] = s390_minmax_cpp_alg_max64,
+    [4] = s390_minmax_fmax64,
+    [8] = float64_maxnummag,
+    [9] = s390_minmax_java_math_max_abs64,
+    [10] = s390_minmax_c_macro_max_abs64,
+    [11] = s390_minmax_cpp_alg_max_abs64,
+    [12] = s390_minmax_fmax_abs64,
+};
+
+static vop128_3_fn const vfmax_fns128[16] = {
+    [0] = float128_maxnum,
+    [1] = s390_minmax_java_math_max128,
+    [2] = s390_minmax_c_macro_max128,
+    [3] = s390_minmax_cpp_alg_max128,
+    [4] = s390_minmax_fmax128,
+    [8] = float128_maxnummag,
+    [9] = s390_minmax_java_math_max_abs128,
+    [10] = s390_minmax_c_macro_max_abs128,
+    [11] = s390_minmax_cpp_alg_max_abs128,
+    [12] = s390_minmax_fmax_abs128,
+};
+
+#define DEF_GVEC_VFMAX(BITS)                                                   \
+void HELPER(gvec_vfmax##BITS)(void *v1, const void *v2, const void *v3,        \
+                              CPUS390XState *env, uint32_t desc)               \
+{                                                                              \
+    vop##BITS##_3_fn fn = vfmax_fns##BITS[simd_data(desc)];                    \
+                                                                               \
+    g_assert(fn);                                                              \
+    vop##BITS##_3(v1, v2, v3, env, false, fn, GETPC());                        \
+}
+DEF_GVEC_VFMAX(32)
+DEF_GVEC_VFMAX(64)
+DEF_GVEC_VFMAX(128)
+
+#define DEF_GVEC_VFMAX_S(BITS)                                                 \
+void HELPER(gvec_vfmax##BITS##s)(void *v1, const void *v2, const void *v3,     \
+                                 CPUS390XState *env, uint32_t desc)            \
+{                                                                              \
+    vop##BITS##_3_fn fn = vfmax_fns##BITS[simd_data(desc)];                    \
+                                                                               \
+    g_assert(fn);                                                              \
+    vop##BITS##_3(v1, v2, v3, env, true, fn, GETPC());                         \
+}
+DEF_GVEC_VFMAX_S(32)
+DEF_GVEC_VFMAX_S(64)
+
+static vop32_3_fn const vfmin_fns32[16] = {
+    [0] = float32_minnum,
+    [1] = s390_minmax_java_math_min32,
+    [2] = s390_minmax_c_macro_min32,
+    [3] = s390_minmax_cpp_alg_min32,
+    [4] = s390_minmax_fmin32,
+    [8] = float32_minnummag,
+    [9] = s390_minmax_java_math_min_abs32,
+    [10] = s390_minmax_c_macro_min_abs32,
+    [11] = s390_minmax_cpp_alg_min_abs32,
+    [12] = s390_minmax_fmin_abs32,
+};
+
+static vop64_3_fn const vfmin_fns64[16] = {
+    [0] = float64_minnum,
+    [1] = s390_minmax_java_math_min64,
+    [2] = s390_minmax_c_macro_min64,
+    [3] = s390_minmax_cpp_alg_min64,
+    [4] = s390_minmax_fmin64,
+    [8] = float64_minnummag,
+    [9] = s390_minmax_java_math_min_abs64,
+    [10] = s390_minmax_c_macro_min_abs64,
+    [11] = s390_minmax_cpp_alg_min_abs64,
+    [12] = s390_minmax_fmin_abs64,
+};
+
+static vop128_3_fn const vfmin_fns128[16] = {
+    [0] = float128_minnum,
+    [1] = s390_minmax_java_math_min128,
+    [2] = s390_minmax_c_macro_min128,
+    [3] = s390_minmax_cpp_alg_min128,
+    [4] = s390_minmax_fmin128,
+    [8] = float128_minnummag,
+    [9] = s390_minmax_java_math_min_abs128,
+    [10] = s390_minmax_c_macro_min_abs128,
+    [11] = s390_minmax_cpp_alg_min_abs128,
+    [12] = s390_minmax_fmin_abs128,
+};
+
+#define DEF_GVEC_VFMIN(BITS)                                                   \
+void HELPER(gvec_vfmin##BITS)(void *v1, const void *v2, const void *v3,        \
+                              CPUS390XState *env, uint32_t desc)               \
+{                                                                              \
+    vop##BITS##_3_fn fn = vfmin_fns##BITS[simd_data(desc)];                    \
+                                                                               \
+    g_assert(fn);                                                              \
+    vop##BITS##_3(v1, v2, v3, env, false, fn, GETPC());                        \
+}
+DEF_GVEC_VFMIN(32)
+DEF_GVEC_VFMIN(64)
+DEF_GVEC_VFMIN(128)
+
+#define DEF_GVEC_VFMIN_S(BITS)                                                 \
+void HELPER(gvec_vfmin##BITS##s)(void *v1, const void *v2, const void *v3,     \
+                                 CPUS390XState *env, uint32_t desc)            \
+{                                                                              \
+    vop##BITS##_3_fn fn = vfmin_fns##BITS[simd_data(desc)];                    \
+                                                                               \
+    g_assert(fn);                                                              \
+    vop##BITS##_3(v1, v2, v3, env, true, fn, GETPC());                         \
+}
+DEF_GVEC_VFMIN_S(32)
+DEF_GVEC_VFMIN_S(64)
-- 
2.26.2




* [PATCH v1 19/20] s390x/tcg: We support Vector enhancements facility
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (17 preceding siblings ...)
  2020-09-30 14:55 ` [PATCH v1 18/20] s390x/tcg: Implement VECTOR FP (MAXIMUM|MINIMUM) David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-10-01 16:50   ` Richard Henderson
  2020-09-30 14:55 ` [PATCH v1 20/20] s390x/cpumodel: Bump up QEMU model to a stripped-down IBM z14 GA2 David Hildenbrand
                   ` (3 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC
  To: qemu-devel
  Cc: qemu-s390x, Cornelia Huck, Richard Henderson, Thomas Huth,
	David Hildenbrand

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/gen-features.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 21c1e912fd..a7bad36f35 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -718,6 +718,7 @@ static uint16_t qemu_MAX[] = {
     S390_FEAT_INSTRUCTION_EXEC_PROT,
     S390_FEAT_MISC_INSTRUCTION_EXT2,
     S390_FEAT_MSA_EXT_8,
+    S390_FEAT_VECTOR_ENH,
 };
 
 /****** END FEATURE DEFS ******/
-- 
2.26.2




* [PATCH v1 20/20] s390x/cpumodel: Bump up QEMU model to a stripped-down IBM z14 GA2
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (18 preceding siblings ...)
  2020-09-30 14:55 ` [PATCH v1 19/20] s390x/tcg: We support Vector enhancements facility David Hildenbrand
@ 2020-09-30 14:55 ` David Hildenbrand
  2020-10-01 16:52   ` Richard Henderson
  2020-09-30 15:35 ` [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 no-reply
                   ` (2 subsequent siblings)
  22 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2020-09-30 14:55 UTC
  To: qemu-devel
  Cc: Thomas Huth, David Hildenbrand, Cornelia Huck, Richard Henderson,
	Christian Borntraeger, qemu-s390x

TCG implements everything needed to run a basic z14 OS and software.

Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/s390x/s390-virtio-ccw.c  |  2 ++
 target/s390x/cpu_models.c   |  4 ++--
 target/s390x/gen-features.c | 15 +++++++++------
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 3106bbea33..5f9931d509 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -812,7 +812,9 @@ DEFINE_CCW_MACHINE(5_2, "5.2", true);
 
 static void ccw_machine_5_1_instance_options(MachineState *machine)
 {
+    static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V5_1 };
     ccw_machine_5_2_instance_options(machine);
+    s390_set_qemu_cpu_model(0x2964, 13, 2, qemu_cpu_feat);
 }
 
 static void ccw_machine_5_1_class_options(MachineClass *mc)
diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index b97e9596ab..c613ea87e8 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -88,8 +88,8 @@ static S390CPUDef s390_cpu_defs[] = {
     CPUDEF_INIT(0x8562, 15, 1, 47, 0x08000000U, "gen15b", "IBM 8562 GA1"),
 };
 
-#define QEMU_MAX_CPU_TYPE 0x2964
-#define QEMU_MAX_CPU_GEN 13
+#define QEMU_MAX_CPU_TYPE 0x3906
+#define QEMU_MAX_CPU_GEN 14
 #define QEMU_MAX_CPU_EC_GA 2
 static const S390FeatInit qemu_max_cpu_feat_init = { S390_FEAT_LIST_QEMU_MAX };
 static S390FeatBitmap qemu_max_cpu_feat;
diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index a7bad36f35..017b8ac95e 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -704,23 +704,25 @@ static uint16_t qemu_V4_1[] = {
     S390_FEAT_VECTOR,
 };
 
-static uint16_t qemu_LATEST[] = {
+static uint16_t qemu_V5_1[] = {
     S390_FEAT_ACCESS_EXCEPTION_FS_INDICATION,
     S390_FEAT_SIDE_EFFECT_ACCESS_ESOP2,
     S390_FEAT_ESOP,
 };
 
-/* add all new definitions before this point */
-static uint16_t qemu_MAX[] = {
-    /* generates a dependency warning, leave it out for now */
-    S390_FEAT_MSA_EXT_5,
-    /* features introduced after the z13 */
+static uint16_t qemu_LATEST[] = {
     S390_FEAT_INSTRUCTION_EXEC_PROT,
     S390_FEAT_MISC_INSTRUCTION_EXT2,
     S390_FEAT_MSA_EXT_8,
     S390_FEAT_VECTOR_ENH,
 };
 
+/* add all new definitions before this point */
+static uint16_t qemu_MAX[] = {
+    /* generates a dependency warning, leave it out for now */
+    S390_FEAT_MSA_EXT_5,
+};
+
 /****** END FEATURE DEFS ******/
 
 #define _YEARS  "2016"
@@ -837,6 +839,7 @@ static FeatGroupDefSpec QemuFeatDef[] = {
     QEMU_FEAT_INITIALIZER(V3_1),
     QEMU_FEAT_INITIALIZER(V4_0),
     QEMU_FEAT_INITIALIZER(V4_1),
+    QEMU_FEAT_INITIALIZER(V5_1),
     QEMU_FEAT_INITIALIZER(LATEST),
     QEMU_FEAT_INITIALIZER(MAX),
 };
-- 
2.26.2




* Re: [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (19 preceding siblings ...)
  2020-09-30 14:55 ` [PATCH v1 20/20] s390x/cpumodel: Bump up QEMU model to a stripped-down IBM z14 GA2 David Hildenbrand
@ 2020-09-30 15:35 ` no-reply
  2020-10-01 15:07 ` Richard Henderson
  2021-05-05 10:55 ` David Hildenbrand
  22 siblings, 0 replies; 47+ messages in thread
From: no-reply @ 2020-09-30 15:35 UTC
  To: david
  Cc: peter.maydell, thuth, david, cohuck, richard.henderson,
	qemu-devel, borntraeger, qemu-s390x, alex.bennee, aurelien

Patchew URL: https://patchew.org/QEMU/20200930145523.71087-1-david@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20200930145523.71087-1-david@redhat.com
Subject: [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
855b893 s390x/cpumodel: Bump up QEMU model to a stripped-down IBM z14 GA2
9e2c5a3 s390x/tcg: We support Vector enhancements facility
91355af s390x/tcg: Implement VECTOR FP (MAXIMUM|MINIMUM)
22ae891 s390x/tcg: Implement VECTOR FP NEGATIVE MULTIPLY AND (ADD|SUBTRACT)
599e1d8 s390x/tcg: Implement 32/128bit for VECTOR FP MULTIPLY AND (ADD|SUBTRACT)
2d46f63 s390x/tcg: Implement 32/128 bit for VECTOR FP TEST DATA CLASS IMMEDIATE
7f46bf1 s390x/tcg: Implement 32/128 bit for VECTOR FP SQUARE ROOT
0870741 s390x/tcg: Implement 32/128 bit for VECTOR FP PERFORM SIGN OPERATION
743a4a3 s390x/tcg: Implement 128 bit for VECTOR FP LOAD ROUNDED
793d28c s390x/tcg: Implement 64 bit for VECTOR FP LOAD LENGTHENED
ff85384 s390x/tcg: Implement 32/128 bit for VECTOR LOAD FP INTEGER
de00280 s390x/tcg: Implement 32/128 bit for VECTOR FP COMPARE *
ba7aec7 s390x/tcg: Implement 32/128 bit for VECTOR FP COMPARE (AND SIGNAL) SCALAR
74af73a s390x/tcg: Implement 32/128 bit for VECTOR FP SUBTRACT
59a6f7a s390x/tcg: Implement 32/128 bit for VECTOR FP MULTIPLY
c0903b4 s390x/tcg: Implement 32/128 bit for VECTOR FP DIVIDE
8af7fca s390x/tcg: Implement 32/128 bit for VECTOR FP ADD
e103776 s390x/tcg: Implement VECTOR MULTIPLY SUM LOGICAL
092690a s390x/tcg: Implement VECTOR BIT PERMUTE
46d60ed softfloat: Implement float128_(min|minnum|minnummag|max|maxnum|maxnummag)

=== OUTPUT BEGIN ===
1/20 Checking commit 46d60ede64da (softfloat: Implement float128_(min|minnum|minnummag|max|maxnum|maxnummag))
2/20 Checking commit 092690a01e4b (s390x/tcg: Implement VECTOR BIT PERMUTE)
3/20 Checking commit e1037765c06f (s390x/tcg: Implement VECTOR MULTIPLY SUM LOGICAL)
4/20 Checking commit 8af7fcaee5ef (s390x/tcg: Implement 32/128 bit for VECTOR FP ADD)
ERROR: space prohibited between function name and open parenthesis '('
#162: FILE: target/s390x/vec_fpu_helper.c:140:
+typedef float##BITS (*vop##BITS##_3_fn)(float##BITS a, float##BITS b,          \

total: 1 errors, 0 warnings, 183 lines checked

Patch 4/20 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

5/20 Checking commit c0903b4e139a (s390x/tcg: Implement 32/128 bit for VECTOR FP DIVIDE)
6/20 Checking commit 59a6f7a3d4ca (s390x/tcg: Implement 32/128 bit for VECTOR FP MULTIPLY)
7/20 Checking commit 74af73a36417 (s390x/tcg: Implement 32/128 bit for VECTOR FP SUBTRACT)
8/20 Checking commit ba7aec76b6d7 (s390x/tcg: Implement 32/128 bit for VECTOR FP COMPARE (AND SIGNAL) SCALAR)
WARNING: Block comments use a leading /* on a separate line
#113: FILE: target/s390x/vec_fpu_helper.c:190:
+    /* only the zero-indexed elements are compared */                          \

total: 0 errors, 1 warnings, 136 lines checked

Patch 8/20 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
9/20 Checking commit de00280993e0 (s390x/tcg: Implement 32/128 bit for VECTOR FP COMPARE *)
WARNING: line over 80 characters
#26: FILE: target/s390x/helper.h:262:
+DEF_HELPER_FLAGS_5(gvec_vfce32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#29: FILE: target/s390x/helper.h:265:
+DEF_HELPER_FLAGS_5(gvec_vfce128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#36: FILE: target/s390x/helper.h:272:
+DEF_HELPER_FLAGS_5(gvec_vfch32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#39: FILE: target/s390x/helper.h:275:
+DEF_HELPER_FLAGS_5(gvec_vfch128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#45: FILE: target/s390x/helper.h:281:
+DEF_HELPER_FLAGS_5(gvec_vfche32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#46: FILE: target/s390x/helper.h:282:
+DEF_HELPER_FLAGS_5(gvec_vfche32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#49: FILE: target/s390x/helper.h:285:
+DEF_HELPER_FLAGS_5(gvec_vfche128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)

WARNING: Block comments use a leading /* on a separate line
#250: FILE: target/s390x/vec_fpu_helper.c:250:
+        /* swap the parameters, so we can use existing functions */            \

total: 0 errors, 8 warnings, 445 lines checked

Patch 9/20 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
10/20 Checking commit ff85384a58d4 (s390x/tcg: Implement 32/128 bit for VECTOR LOAD FP INTEGER)
ERROR: space prohibited between function name and open parenthesis '('
#140: FILE: target/s390x/vec_fpu_helper.c:120:
+typedef float##BITS (*vop##BITS##_2_fn)(float##BITS a, float_status *s);       \

total: 1 errors, 0 warnings, 191 lines checked

Patch 10/20 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

11/20 Checking commit 793d28cefc57 (s390x/tcg: Implement 64 bit for VECTOR FP LOAD LENGTHENED)
12/20 Checking commit 743a4a3474e5 (s390x/tcg: Implement 128 bit for VECTOR FP LOAD ROUNDED)
13/20 Checking commit 08707417ca0e (s390x/tcg: Implement 32/128 bit for VECTOR FP PERFORM SIGN OPERATION)
14/20 Checking commit 7f46bf12ce3e (s390x/tcg: Implement 32/128 bit for VECTOR FP SQUARE ROOT)
15/20 Checking commit 2d46f639d6dd (s390x/tcg: Implement 32/128 bit for VECTOR FP TEST DATA CLASS IMMEDIATE)
16/20 Checking commit 599e1d8a08ff (s390x/tcg: Implement 32/128bit for VECTOR FP MULTIPLY AND (ADD|SUBTRACT))
WARNING: line over 80 characters
#20: FILE: target/s390x/helper.h:320:
+DEF_HELPER_FLAGS_6(gvec_vfma32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#21: FILE: target/s390x/helper.h:321:
+DEF_HELPER_FLAGS_6(gvec_vfma32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#24: FILE: target/s390x/helper.h:324:
+DEF_HELPER_FLAGS_6(gvec_vfma128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#25: FILE: target/s390x/helper.h:325:
+DEF_HELPER_FLAGS_6(gvec_vfms32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#26: FILE: target/s390x/helper.h:326:
+DEF_HELPER_FLAGS_6(gvec_vfms32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#29: FILE: target/s390x/helper.h:329:
+DEF_HELPER_FLAGS_6(gvec_vfms128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)

total: 0 errors, 6 warnings, 188 lines checked

Patch 16/20 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
17/20 Checking commit 22ae891fa9a5 (s390x/tcg: Implement VECTOR FP NEGATIVE MULTIPLY AND (ADD|SUBTRACT))
WARNING: line over 80 characters
#20: FILE: target/s390x/helper.h:330:
+DEF_HELPER_FLAGS_6(gvec_vfnma32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#21: FILE: target/s390x/helper.h:331:
+DEF_HELPER_FLAGS_6(gvec_vfnma32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#22: FILE: target/s390x/helper.h:332:
+DEF_HELPER_FLAGS_6(gvec_vfnma64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#23: FILE: target/s390x/helper.h:333:
+DEF_HELPER_FLAGS_6(gvec_vfnma64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#24: FILE: target/s390x/helper.h:334:
+DEF_HELPER_FLAGS_6(gvec_vfnma128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#25: FILE: target/s390x/helper.h:335:
+DEF_HELPER_FLAGS_6(gvec_vfnms32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#26: FILE: target/s390x/helper.h:336:
+DEF_HELPER_FLAGS_6(gvec_vfnms32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#27: FILE: target/s390x/helper.h:337:
+DEF_HELPER_FLAGS_6(gvec_vfnms64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#28: FILE: target/s390x/helper.h:338:
+DEF_HELPER_FLAGS_6(gvec_vfnms64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#29: FILE: target/s390x/helper.h:339:
+DEF_HELPER_FLAGS_6(gvec_vfnms128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, cptr, env, i32)

total: 0 errors, 10 warnings, 109 lines checked

Patch 17/20 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
18/20 Checking commit 91355af73073 (s390x/tcg: Implement VECTOR FP (MAXIMUM|MINIMUM))
WARNING: line over 80 characters
#28: FILE: target/s390x/helper.h:320:
+DEF_HELPER_FLAGS_5(gvec_vfmax32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#29: FILE: target/s390x/helper.h:321:
+DEF_HELPER_FLAGS_5(gvec_vfmax32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#30: FILE: target/s390x/helper.h:322:
+DEF_HELPER_FLAGS_5(gvec_vfmax64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#31: FILE: target/s390x/helper.h:323:
+DEF_HELPER_FLAGS_5(gvec_vfmax64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#32: FILE: target/s390x/helper.h:324:
+DEF_HELPER_FLAGS_5(gvec_vfmax128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#33: FILE: target/s390x/helper.h:325:
+DEF_HELPER_FLAGS_5(gvec_vfmin32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#34: FILE: target/s390x/helper.h:326:
+DEF_HELPER_FLAGS_5(gvec_vfmin32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#35: FILE: target/s390x/helper.h:327:
+DEF_HELPER_FLAGS_5(gvec_vfmin64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#36: FILE: target/s390x/helper.h:328:
+DEF_HELPER_FLAGS_5(gvec_vfmin64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)

WARNING: line over 80 characters
#37: FILE: target/s390x/helper.h:329:
+DEF_HELPER_FLAGS_5(gvec_vfmin128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)

WARNING: Block comments use a leading /* on a separate line
#158: FILE: target/s390x/vec_fpu_helper.c:941:
+            /* fall through */                                                 \

WARNING: Block comments use a leading /* on a separate line
#211: FILE: target/s390x/vec_fpu_helper.c:994:
+    /* We can process all remaining cases using simple comparison. */          \

total: 0 errors, 12 warnings, 379 lines checked

Patch 18/20 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
19/20 Checking commit 9e2c5a3530bd (s390x/tcg: We support Vector enhancements facility)
20/20 Checking commit 855b893d0a4d (s390x/cpumodel: Bump up QEMU model to a stripped-down IBM z14 GA2)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200930145523.71087-1-david@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [PATCH v1 01/20] softfloat: Implement float128_(min|minnum|minnummag|max|maxnum|maxnummag)
  2020-09-30 14:55 ` [PATCH v1 01/20] softfloat: Implement float128_(min|minnum|minnummag|max|maxnum|maxnummag) David Hildenbrand
@ 2020-09-30 16:10   ` Alex Bennée
  2020-10-01 12:40     ` David Hildenbrand
  0 siblings, 1 reply; 47+ messages in thread
From: Alex Bennée @ 2020-09-30 16:10 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Peter Maydell, Thomas Huth, Cornelia Huck, Richard Henderson,
	qemu-devel, qemu-s390x, Aurelien Jarno


David Hildenbrand <david@redhat.com> writes:

> Implementation inspired by minmax_floats(). Unfortunately, we don't have
> any tests we can simply adjust/unlock.
>
> Cc: Aurelien Jarno <aurelien@aurel32.net>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Cc: "Alex Bennée" <alex.bennee@linaro.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  fpu/softfloat.c         | 100 ++++++++++++++++++++++++++++++++++++++++
>  include/fpu/softfloat.h |   6 +++
>  2 files changed, 106 insertions(+)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 9af75b9146..9463c5ea56 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -621,6 +621,8 @@ static inline FloatParts float64_unpack_raw(float64 f)
>      return unpack_raw(float64_params, f);
>  }
>  
> +static void float128_unpack(FloatParts128 *p, float128 a, float_status *status);
> +
>  /* Pack a float from parts, but do not canonicalize.  */
>  static inline uint64_t pack_raw(FloatFmt fmt, FloatParts p)
>  {
> @@ -3180,6 +3182,89 @@ static FloatParts minmax_floats(FloatParts a, FloatParts b, bool ismin,
>      }
>  }

It would be desirable to share as much logic for this as possible with
the existing minmax_floats code. I appreciate at some point we end up
having to deal with fractions and we haven't found a good way to
efficiently handle dealing with FloatParts and FloatParts128 in the same
unrolled code, however:

>  
> +static float128 float128_minmax(float128 a, float128 b, bool ismin, bool ieee,
> +                                bool ismag, float_status *s)
> +{
> +    FloatParts128 pa, pb;
> +    int a_exp, b_exp;
> +    bool a_less;
> +
> +    float128_unpack(&pa, a, s);
> +    float128_unpack(&pb, b, s);
> +

From here:
> +    if (unlikely(is_nan(pa.cls) || is_nan(pb.cls))) {
> +        /* See comment in minmax_floats() */
> +        if (ieee && !is_snan(pa.cls) && !is_snan(pb.cls)) {
> +            if (is_nan(pa.cls) && !is_nan(pb.cls)) {
> +                return b;
> +            } else if (is_nan(pb.cls) && !is_nan(pa.cls)) {
> +                return a;
> +            }
> +        }
> +
> +        /* Similar logic to pick_nan(), avoiding re-packing. */
> +        if (is_snan(pa.cls) || is_snan(pb.cls)) {
> +            s->float_exception_flags |= float_flag_invalid;
> +        }
> +        if (s->default_nan_mode) {
> +            return float128_default_nan(s);
> +        }

to here is common logic - is there any way we could share it?

> +        if (pickNaN(pa.cls, pb.cls,
> +                    pa.frac0 > pb.frac0 ||
> +                    (pa.frac0 == pb.frac0 && pa.frac1 > pb.frac1) ||
> +                    (pa.frac0 == pb.frac0 && pa.frac1 == pb.frac1 &&
> +                     pa.sign < pb.sign), s)) {
> +            return is_snan(pb.cls) ? float128_silence_nan(b, s) : b;
> +        }
> +        return is_snan(pa.cls) ? float128_silence_nan(a, s) : a;
> +    }
> +
> +    switch (pa.cls) {
> +    case float_class_normal:
> +        a_exp = pa.exp;
> +        break;
> +    case float_class_inf:
> +        a_exp = INT_MAX;
> +        break;
> +    case float_class_zero:
> +        a_exp = INT_MIN;
> +        break;
> +    default:
> +        g_assert_not_reached();
> +        break;
> +    }

Likewise I wonder if there is scope for a float_minmax_exp helper that
could be shared here?

> +    switch (pb.cls) {
> +    case float_class_normal:
> +        b_exp = pb.exp;
> +        break;
> +    case float_class_inf:
> +        b_exp = INT_MAX;
> +        break;
> +    case float_class_zero:
> +        b_exp = INT_MIN;
> +        break;
> +    default:
> +        g_assert_not_reached();
> +        break;
> +    }
> +
> +    a_less = a_exp < b_exp;
> +    if (a_exp == b_exp) {
> +        a_less = pa.frac0 < pb.frac0;
> +        if (pa.frac0 == pb.frac0) {
> +            a_less = pa.frac1 < pb.frac1;
> +        }
> +    }
> +
> +    if (ismag &&
> +        (a_exp != b_exp || pa.frac0 != pb.frac0 || pa.frac1 != pb.frac1)) {
> +        return a_less ^ ismin ? b : a;
> +    } else if (pa.sign == pb.sign) {
> +        return pa.sign ^ a_less ^ ismin ? b : a;
> +    }
> +    return pa.sign ^ ismin ? b : a;
> +}

Otherwise it seems sane to me.

-- 
Alex Bennée



* Re: [PATCH v1 01/20] softfloat: Implement float128_(min|minnum|minnummag|max|maxnum|maxnummag)
  2020-09-30 16:10   ` Alex Bennée
@ 2020-10-01 12:40     ` David Hildenbrand
  2020-10-01 13:15       ` Alex Bennée
  0 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2020-10-01 12:40 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Peter Maydell, Thomas Huth, Cornelia Huck, Richard Henderson,
	qemu-devel, qemu-s390x, Aurelien Jarno

On 30.09.20 18:10, Alex Bennée wrote:
> 
> David Hildenbrand <david@redhat.com> writes:
> 
>> Implementation inspired by minmax_floats(). Unfortunately, we don't have
>> any tests we can simply adjust/unlock.
>>
>> Cc: Aurelien Jarno <aurelien@aurel32.net>
>> Cc: Peter Maydell <peter.maydell@linaro.org>
>> Cc: "Alex Bennée" <alex.bennee@linaro.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  fpu/softfloat.c         | 100 ++++++++++++++++++++++++++++++++++++++++
>>  include/fpu/softfloat.h |   6 +++
>>  2 files changed, 106 insertions(+)
>>
>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>> index 9af75b9146..9463c5ea56 100644
>> --- a/fpu/softfloat.c
>> +++ b/fpu/softfloat.c
>> @@ -621,6 +621,8 @@ static inline FloatParts float64_unpack_raw(float64 f)
>>      return unpack_raw(float64_params, f);
>>  }
>>  
>> +static void float128_unpack(FloatParts128 *p, float128 a, float_status *status);
>> +
>>  /* Pack a float from parts, but do not canonicalize.  */
>>  static inline uint64_t pack_raw(FloatFmt fmt, FloatParts p)
>>  {
>> @@ -3180,6 +3182,89 @@ static FloatParts minmax_floats(FloatParts a, FloatParts b, bool ismin,
>>      }
>>  }
> 
> It would be desirable to share as much logic for this as possible with
> the existing minmax_floats code. I appreciate at some point we end up
> having to deal with fractions and we haven't found a good way to
> efficiently handle dealing with FloatParts and FloatParts128 in the same
> unrolled code, however:
> 
>>  
>> +static float128 float128_minmax(float128 a, float128 b, bool ismin, bool ieee,
>> +                                bool ismag, float_status *s)
>> +{
>> +    FloatParts128 pa, pb;
>> +    int a_exp, b_exp;
>> +    bool a_less;
>> +
>> +    float128_unpack(&pa, a, s);
>> +    float128_unpack(&pb, b, s);
>> +
> 
> From here:
>> +    if (unlikely(is_nan(pa.cls) || is_nan(pb.cls))) {
>> +        /* See comment in minmax_floats() */
>> +        if (ieee && !is_snan(pa.cls) && !is_snan(pb.cls)) {
>> +            if (is_nan(pa.cls) && !is_nan(pb.cls)) {
>> +                return b;
>> +            } else if (is_nan(pb.cls) && !is_nan(pa.cls)) {
>> +                return a;
>> +            }
>> +        }
>> +
>> +        /* Similar logic to pick_nan(), avoiding re-packing. */
>> +        if (is_snan(pa.cls) || is_snan(pb.cls)) {
>> +            s->float_exception_flags |= float_flag_invalid;
>> +        }
>> +        if (s->default_nan_mode) {
>> +            return float128_default_nan(s);
>> +        }
> 
> to here is common logic - is there any way we could share it?

I can try to factor it out, similar to pickNaN() - passing weird boolean
flags and such. It most certainly won't win in a beauty contest, that's
for sure.
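For illustration, the shared NaN prologue discussed above could be factored into a format-independent helper that only sees classification booleans and returns what the caller should do next, leaving the format-specific pickNaN() comparison and NaN silencing with the caller. This is a minimal sketch under assumptions, not the code that ended up in softfloat: minmax_nan_prologue(), the minmax_nan_action enum and SKETCH_FLAG_INVALID (standing in for float_flag_invalid) are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* What the caller should do after the shared NaN prologue (hypothetical). */
typedef enum {
    MINMAX_NAN_NONE,    /* no NaN operand: continue with the normal compare */
    MINMAX_NAN_RET_A,   /* return operand a unchanged */
    MINMAX_NAN_RET_B,   /* return operand b unchanged */
    MINMAX_NAN_DEFAULT, /* return the format's default NaN */
    MINMAX_NAN_PICK,    /* let the format-specific pickNaN() choose */
} minmax_nan_action;

/* Stand-in for float_flag_invalid in this sketch. */
#define SKETCH_FLAG_INVALID 0x01

static minmax_nan_action minmax_nan_prologue(bool a_nan, bool a_snan,
                                             bool b_nan, bool b_snan,
                                             bool ieee, bool default_nan_mode,
                                             int *exc_flags)
{
    /* A signaling NaN is always also a NaN. */
    assert(!a_snan || a_nan);
    assert(!b_snan || b_nan);

    if (!a_nan && !b_nan) {
        return MINMAX_NAN_NONE;
    }
    /* IEEE mode without sNaNs: a lone quiet NaN yields the other operand. */
    if (ieee && !a_snan && !b_snan) {
        if (a_nan && !b_nan) {
            return MINMAX_NAN_RET_B;
        } else if (b_nan && !a_nan) {
            return MINMAX_NAN_RET_A;
        }
    }
    /* As in the patch: any sNaN input raises invalid. */
    if (a_snan || b_snan) {
        *exc_flags |= SKETCH_FLAG_INVALID;
    }
    if (default_nan_mode) {
        return MINMAX_NAN_DEFAULT;
    }
    return MINMAX_NAN_PICK;
}
```

The float16/32/64 and float128 callers would then differ only in how they act on MINMAX_NAN_PICK, which is where the fraction comparison needs the format-specific parts.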

> 
>> +        if (pickNaN(pa.cls, pb.cls,
>> +                    pa.frac0 > pb.frac0 ||
>> +                    (pa.frac0 == pb.frac0 && pa.frac1 > pb.frac1) ||
>> +                    (pa.frac0 == pb.frac0 && pa.frac1 == pb.frac1 &&
>> +                     pa.sign < pb.sign), s)) {
>> +            return is_snan(pb.cls) ? float128_silence_nan(b, s) : b;
>> +        }
>> +        return is_snan(pa.cls) ? float128_silence_nan(a, s) : a;
>> +    }
>> +
>> +    switch (pa.cls) {
>> +    case float_class_normal:
>> +        a_exp = pa.exp;
>> +        break;
>> +    case float_class_inf:
>> +        a_exp = INT_MAX;
>> +        break;
>> +    case float_class_zero:
>> +        a_exp = INT_MIN;
>> +        break;
>> +    default:
>> +        g_assert_not_reached();
>> +        break;
>> +    }
> 
> Likewise I wonder if there is scope for a float_minmax_exp helper that
> could be shared here?

I'll try, but I have the feeling that it might make the code harder to
read than actually help. Will give it a try.
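One possible shape for the float_minmax_exp helper suggested above, together with the magnitude/sign selection from the tail of float128_minmax(), using simplified stand-in types. This is a sketch under assumptions: sk_class, sk_parts128 and minmax_select() are hypothetical, frac_hi/frac_lo play the roles of frac0/frac1 (frac0 compared first, as the patch does), and NaN operands are assumed to have been handled beforehand.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for softfloat's non-NaN float classes. */
typedef enum {
    SK_CLASS_ZERO,
    SK_CLASS_NORMAL,
    SK_CLASS_INF,
} sk_class;

/* Simplified 128-bit parts; frac_hi/frac_lo play the roles of frac0/frac1. */
typedef struct {
    sk_class cls;
    bool sign;
    int exp;
    uint64_t frac_hi;
    uint64_t frac_lo;
} sk_parts128;

/*
 * Hypothetical float_minmax_exp(): map class + exponent onto an int so that
 * zero orders below every normal number and infinity above it.
 */
static int float_minmax_exp(sk_class cls, int exp)
{
    switch (cls) {
    case SK_CLASS_NORMAL:
        return exp;
    case SK_CLASS_INF:
        return INT_MAX;
    case SK_CLASS_ZERO:
        return INT_MIN;
    }
    assert(!"unreachable");
    return 0;
}

/* Same selection as the patch's tail; returns a pointer to a or b. */
static const sk_parts128 *minmax_select(const sk_parts128 *a,
                                        const sk_parts128 *b,
                                        bool ismin, bool ismag)
{
    int a_exp = float_minmax_exp(a->cls, a->exp);
    int b_exp = float_minmax_exp(b->cls, b->exp);
    bool a_less = a_exp < b_exp;
    bool same_mag;

    if (a_exp == b_exp) {
        a_less = a->frac_hi < b->frac_hi;
        if (a->frac_hi == b->frac_hi) {
            a_less = a->frac_lo < b->frac_lo;
        }
    }
    same_mag = a_exp == b_exp && a->frac_hi == b->frac_hi &&
               a->frac_lo == b->frac_lo;

    if (ismag && !same_mag) {
        return a_less ^ ismin ? b : a;
    } else if (a->sign == b->sign) {
        return a->sign ^ a_less ^ ismin ? b : a;
    }
    return a->sign ^ ismin ? b : a;
}
```

Whether this reads better than the two open-coded switch statements is exactly the readability trade-off raised in this reply.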

Thanks!

-- 
Thanks,

David / dhildenb




* Re: [PATCH v1 01/20] softfloat: Implement float128_(min|minnum|minnummag|max|maxnum|maxnummag)
  2020-10-01 12:40     ` David Hildenbrand
@ 2020-10-01 13:15       ` Alex Bennée
  2021-05-05 14:54         ` David Hildenbrand
  0 siblings, 1 reply; 47+ messages in thread
From: Alex Bennée @ 2020-10-01 13:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Peter Maydell, Thomas Huth, Cornelia Huck, Richard Henderson,
	qemu-devel, qemu-s390x, Aurelien Jarno


David Hildenbrand <david@redhat.com> writes:

> On 30.09.20 18:10, Alex Bennée wrote:
>> 
>> David Hildenbrand <david@redhat.com> writes:
>> 
>>> Implementation inspired by minmax_floats(). Unfortunately, we don't have
>>> any tests we can simply adjust/unlock.
>>>
>>> Cc: Aurelien Jarno <aurelien@aurel32.net>
>>> Cc: Peter Maydell <peter.maydell@linaro.org>
>>> Cc: "Alex Bennée" <alex.bennee@linaro.org>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>  fpu/softfloat.c         | 100 ++++++++++++++++++++++++++++++++++++++++
>>>  include/fpu/softfloat.h |   6 +++
>>>  2 files changed, 106 insertions(+)
>>>
>>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>>> index 9af75b9146..9463c5ea56 100644
>>> --- a/fpu/softfloat.c
>>> +++ b/fpu/softfloat.c
>>> @@ -621,6 +621,8 @@ static inline FloatParts float64_unpack_raw(float64 f)
>>>      return unpack_raw(float64_params, f);
>>>  }
>>>  
>>> +static void float128_unpack(FloatParts128 *p, float128 a, float_status *status);
>>> +
>>>  /* Pack a float from parts, but do not canonicalize.  */
>>>  static inline uint64_t pack_raw(FloatFmt fmt, FloatParts p)
>>>  {
>>> @@ -3180,6 +3182,89 @@ static FloatParts minmax_floats(FloatParts a, FloatParts b, bool ismin,
>>>      }
>>>  }
>> 
>> It would be desirable to share as much logic for this as possible with
>> the existing minmax_floats code. I appreciate at some point we end up
>> having to deal with fractions and we haven't found a good way to
>> efficiently handle dealing with FloatParts and FloatParts128 in the same
>> unrolled code, however:
>> 
>>>  
>>> +static float128 float128_minmax(float128 a, float128 b, bool ismin, bool ieee,
>>> +                                bool ismag, float_status *s)
>>> +{
>>> +    FloatParts128 pa, pb;
>>> +    int a_exp, b_exp;
>>> +    bool a_less;
>>> +
>>> +    float128_unpack(&pa, a, s);
>>> +    float128_unpack(&pb, b, s);
>>> +
>> 
>> From here:
>>> +    if (unlikely(is_nan(pa.cls) || is_nan(pb.cls))) {
>>> +        /* See comment in minmax_floats() */
>>> +        if (ieee && !is_snan(pa.cls) && !is_snan(pb.cls)) {
>>> +            if (is_nan(pa.cls) && !is_nan(pb.cls)) {
>>> +                return b;
>>> +            } else if (is_nan(pb.cls) && !is_nan(pa.cls)) {
>>> +                return a;
>>> +            }
>>> +        }
>>> +
>>> +        /* Similar logic to pick_nan(), avoiding re-packing. */
>>> +        if (is_snan(pa.cls) || is_snan(pb.cls)) {
>>> +            s->float_exception_flags |= float_flag_invalid;
>>> +        }
>>> +        if (s->default_nan_mode) {
>>> +            return float128_default_nan(s);
>>> +        }
>> 
>> to here is common logic - is there any way we could share it?
>
> I can try to factor it out, similar to pickNaN() - passing weird boolean
> flags and such. It most certainly won't win in a beauty contest, that's
> for sure.
>> 
>> Likewise I wonder if there is scope for a float_minmax_exp helper that
>> could be shared here?
>
> I'll try, but I have the feeling that it might make the code harder to
> read than actually help. Will give it a try.

Give it a try. If it really does become harder to follow, we'll stick
with the duplication. However, if we can have common code, you'll at
least know that the NaN handling and minmax behaviour for float128 is
partially tested by the 16/32/64 bit float code.

>
> Thanks!


-- 
Alex Bennée



* Re: [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (20 preceding siblings ...)
  2020-09-30 15:35 ` [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 no-reply
@ 2020-10-01 15:07 ` Richard Henderson
  2020-10-07 13:09   ` David Hildenbrand
  2021-05-05 10:55 ` David Hildenbrand
  22 siblings, 1 reply; 47+ messages in thread
From: Richard Henderson @ 2020-10-01 15:07 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Peter Maydell, Thomas Huth, Cornelia Huck, Christian Borntraeger,
	qemu-s390x, Alex Bennée, Aurelien Jarno

On 9/30/20 9:55 AM, David Hildenbrand wrote:
> This series adds support for the "Vector enhancements facility" and bumps
> the qemu CPU model up to a stripped-down z14.
> 
> I have yet to find a way to get more test coverage - looks like some of
> the functions aren't used anywhere yet (e.g., VECTOR FP MAXIMUM), writing
> unit tests to cover all functions and cases is just nasty. But I might be
> wrong - I'm planning to at least test basic functionality of all newly added
> instructions.

This is where RISU can be helpful.  Auto-generate 100k random variations,
record known good results on hardware, verify that replay on qemu produces the
same results.


r~



* Re: [PATCH v1 02/20] s390x/tcg: Implement VECTOR BIT PERMUTE
  2020-09-30 14:55 ` [PATCH v1 02/20] s390x/tcg: Implement VECTOR BIT PERMUTE David Hildenbrand
@ 2020-10-01 15:17   ` Richard Henderson
  2020-10-01 17:28     ` David Hildenbrand
  0 siblings, 1 reply; 47+ messages in thread
From: Richard Henderson @ 2020-10-01 15:17 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 9/30/20 9:55 AM, David Hildenbrand wrote:
> +        bit = !!(s390_vec_read_element8(v2, bit_nr / 8) &
> +                 (0x80 >> (bit_nr % 8)));

I think this would be clearer as

  bit = (s390_vec_read_element8(v2, bit_nr / 8)
         >> (7 - (bit_nr % 8))) & 1;

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
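The two spellings of the VECTOR BIT PERMUTE bit test above agree for every byte value and bit offset. A quick standalone check, as a sketch: the byte argument stands in for s390_vec_read_element8(v2, bit_nr / 8), the function names are hypothetical, and bit 0 is taken as the most significant bit of the byte, as the 0x80 >> mask implies.

```c
#include <assert.h>
#include <stdint.h>

/* Spelling used in the patch: mask the bit in place, then normalize with !!. */
static int vbperm_bit_mask(uint8_t byte, unsigned bit_nr)
{
    return !!(byte & (0x80 >> (bit_nr % 8)));
}

/* Spelling suggested in the review: shift the bit down to bit 0, then & 1. */
static int vbperm_bit_shift(uint8_t byte, unsigned bit_nr)
{
    return (byte >> (7 - (bit_nr % 8))) & 1;
}

/* Exhaustively compare both spellings over all byte values and bit offsets. */
static void vbperm_check_all(void)
{
    for (unsigned v = 0; v < 256; v++) {
        for (unsigned n = 0; n < 8; n++) {
            assert(vbperm_bit_mask((uint8_t)v, n) ==
                   vbperm_bit_shift((uint8_t)v, n));
        }
    }
}
```

The shift form yields 0 or 1 directly, so no !! normalization is needed.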



* Re: [PATCH v1 03/20] s390x/tcg: Implement VECTOR MULTIPLY SUM LOGICAL
  2020-09-30 14:55 ` [PATCH v1 03/20] s390x/tcg: Implement VECTOR MULTIPLY SUM LOGICAL David Hildenbrand
@ 2020-10-01 15:26   ` Richard Henderson
  2020-10-01 17:30     ` David Hildenbrand
  0 siblings, 1 reply; 47+ messages in thread
From: Richard Henderson @ 2020-10-01 15:26 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 9/30/20 9:55 AM, David Hildenbrand wrote:
> +    /* Multiply both even elements from v2 and v3 */
> +    read_vec_element_i64(l1, get_field(s, v2), 0, ES_64);
> +    read_vec_element_i64(h1, get_field(s, v3), 0, ES_64);
> +    tcg_gen_mulu2_i64(l1, h1, l1, h1);
> +    /* Shift result left by one bit if requested */
> +    if (extract32(get_field(s, m6), 3, 1)) {
> +        tcg_gen_extract2_i64(h1, l1, h1, 63);
> +        tcg_gen_shli_i64(l1, l1, 1);
> +    }

Not a bug, but some hosts require 3 insns for extract2 (so 4 total for this
sequence).

This doubling can also be had via add2:

    tcg_gen_add2_i64(l1, h1, l1, h1, l1, h1);

At which point most hosts will require only 2 insns for this sequence.  The two
hosts that don't have a carry bit (mips, riscv), will still be able to perform
the add in 3 insns.

So add is never more expensive and sometimes half as expensive.

Regardless,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
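A host-side C model (not TCG code) of why the two sequences are interchangeable: doubling the 128-bit product held in h1:l1 is a left shift by one, and adding the pair to itself with carry produces the same bits. This only illustrates equivalence, not the per-host instruction counts discussed above; u128_pair and the function names are hypothetical, and the comments map each line to the TCG op it models.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit value as two 64-bit halves, like the l1/h1 pair from mulu2. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_pair;

/* Models extract2_i64(h1, l1, h1, 63) followed by shli_i64(l1, l1, 1). */
static u128_pair double_via_extract2(u128_pair v)
{
    u128_pair r;

    r.hi = (v.hi << 1) | (v.lo >> 63); /* low 64 bits of (h1:l1) >> 63 */
    r.lo = v.lo << 1;
    return r;
}

/* Models add2_i64(l1, h1, l1, h1, l1, h1): (h1:l1) + (h1:l1) with carry. */
static u128_pair double_via_add2(u128_pair v)
{
    u128_pair r;

    r.lo = v.lo + v.lo;
    r.hi = v.hi + v.hi + (r.lo < v.lo); /* carry out of the low half */
    return r;
}

/* Both sequences must produce identical halves for any input. */
static void check_doubling(uint64_t lo, uint64_t hi)
{
    u128_pair v = { lo, hi };
    u128_pair a = double_via_extract2(v);
    u128_pair b = double_via_add2(v);

    assert(a.lo == b.lo && a.hi == b.hi);
}
```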



* Re: [PATCH v1 04/20] s390x/tcg: Implement 32/128 bit for VECTOR FP ADD
  2020-09-30 14:55 ` [PATCH v1 04/20] s390x/tcg: Implement 32/128 bit for VECTOR FP ADD David Hildenbrand
@ 2020-10-01 15:45   ` Richard Henderson
  2020-10-01 16:08   ` Richard Henderson
  1 sibling, 0 replies; 47+ messages in thread
From: Richard Henderson @ 2020-10-01 15:45 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 9/30/20 9:55 AM, David Hildenbrand wrote:
> -typedef uint64_t (*vop64_3_fn)(uint64_t a, uint64_t b, float_status *s);
> -static void vop64_3(S390Vector *v1, const S390Vector *v2, const S390Vector *v3,
> -                    CPUS390XState *env, bool s, vop64_3_fn fn,
> -                    uintptr_t retaddr)
> -{
> -    uint8_t vxc, vec_exc = 0;
> -    S390Vector tmp = {};
> -    int i;
> -
> -    for (i = 0; i < 2; i++) {
> -        const uint64_t a = s390_vec_read_element64(v2, i);
> -        const uint64_t b = s390_vec_read_element64(v3, i);
> -
> -        s390_vec_write_element64(&tmp, i, fn(a, b, &env->fpu_status));
> -        vxc = check_ieee_exc(env, i, false, &vec_exc);
> -        if (s || vxc) {
> -            break;
> -        }
> -    }
> -    handle_ieee_exc(env, vxc, vec_exc, retaddr);
> -    *v1 = tmp;
> -}
...
> +#define DEF_VOP_3(BITS)                                                        \
> +typedef float##BITS (*vop##BITS##_3_fn)(float##BITS a, float##BITS b,          \
> +                                        float_status *s);                      \
> +static void vop##BITS##_3(S390Vector *v1, const S390Vector *v2,                \
> +                          const S390Vector *v3, CPUS390XState *env, bool s,    \
> +                          vop##BITS##_3_fn fn, uintptr_t retaddr)              \
> +{                                                                              \
> +    uint8_t vxc, vec_exc = 0;                                                  \
> +    S390Vector tmp = {};                                                       \
> +    int i;                                                                     \
> +                                                                               \
> +    for (i = 0; i < (128 / BITS); i++) {                                       \
> +        const float##BITS a = s390_vec_read_float##BITS(v2, i);                \
> +        const float##BITS b = s390_vec_read_float##BITS(v3, i);                \
> +                                                                               \
> +        s390_vec_write_float##BITS(&tmp, i, fn(a, b, &env->fpu_status));       \
> +        vxc = check_ieee_exc(env, i, false, &vec_exc);                         \
> +        if (s || vxc) {                                                        \
> +            break;                                                             \
> +        }                                                                      \
> +    }                                                                          \
> +    handle_ieee_exc(env, vxc, vec_exc, retaddr);                               \
> +    *v1 = tmp;                                                                 \
> +}
> +DEF_VOP_3(32)
> +DEF_VOP_3(64)
> +DEF_VOP_3(128)

While this works, you won't be able to step through this function in the
debugger anymore, because it now has one source line: at the point of expansion.

We do have plenty of these around the code base, I know.  This is small enough
that I think it's reasonable to simply have three copies, one for each type.

> +#define DEF_GVEC_FVA(BITS)                                                     \
> +void HELPER(gvec_vfa##BITS)(void *v1, const void *v2, const void *v3,          \
> +                            CPUS390XState *env, uint32_t desc)                 \
> +{                                                                              \
> +    vop##BITS##_3(v1, v2, v3, env, false, float##BITS##_add, GETPC());         \
> +}
> +DEF_GVEC_FVA(32)
> +DEF_GVEC_FVA(64)
> +DEF_GVEC_FVA(128)
> +
> +#define DEF_GVEC_FVA_S(BITS)                                                   \
> +void HELPER(gvec_vfa##BITS##s)(void *v1, const void *v2, const void *v3,       \
> +                               CPUS390XState *env, uint32_t desc)              \
> +{                                                                              \
> +    vop##BITS##_3(v1, v2, v3, env, true, float##BITS##_add, GETPC());          \
> +}
> +DEF_GVEC_FVA_S(32)
> +DEF_GVEC_FVA_S(64)
I think you're defining these macros with the wrong parameters.  Think of how
to use the same macros for all of add/sub/etc.

E.g.

#define DEF_FOP3_B(NAME, OP, BITS)                                   \
  void HELPER(gvec_##NAME##BITS)(void *v1, const void *v2,           \
      const void *v3, CPUS390XState *env, uint32_t desc)             \
  {                                                                  \
    vop##BITS##_3(v1, v2, v3, env, false,                            \
                  float##BITS##_##OP, GETPC());                      \
  }                                                                  \
  void HELPER(gvec_##NAME##BITS##s)(void *v1, const void *v2,        \
      const void *v3, CPUS390XState *env, uint32_t desc)             \
  {                                                                  \
    vop##BITS##_3(v1, v2, v3, env, true,                             \
                  float##BITS##_##OP, GETPC());                      \
  }

#define DEF_FOP3(NAME, OP) \
  DEF_FOP3_B(NAME, OP, 32) \
  DEF_FOP3_B(NAME, OP, 64) \
  DEF_FOP3_B(NAME, OP, 128)

DEF_FOP3(vfa, add)
DEF_FOP3(vfd, div)
DEF_FOP3(vfm, mul)

etc.
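A self-contained sketch of that layering, compilable outside QEMU, may make it concrete. Everything here is a stand-in and an assumption: `float`/`double` replace softfloat's `float32`/`float64`, a small union replaces `S390Vector`, there is no `float_status` or IEEE exception handling, and the `helper_gvec_*` names only imitate what `HELPER()` would generate. In line with the debugging remark above, the element loops are written out once per width, and only the thin helpers come from the macros.

```c
#include <stdbool.h>

/* Stand-in for S390Vector: 128 bits viewed as 4 x float or 2 x double. */
typedef union {
    float f32[4];
    double f64[2];
} Vec;

/* Stand-ins for softfloat's float32_add, float64_mul, ... */
static float f32_add(float a, float b) { return a + b; }
static float f32_mul(float a, float b) { return a * b; }
static double f64_add(double a, double b) { return a + b; }
static double f64_mul(double a, double b) { return a * b; }

/* One plain, debuggable loop per element width. With s set, only element 0
 * is computed and the other elements stay zero. */
static void vop32_3(Vec *v1, const Vec *v2, const Vec *v3, bool s,
                    float (*fn)(float, float))
{
    Vec tmp = {{0}};

    for (int i = 0; i < 4; i++) {
        tmp.f32[i] = fn(v2->f32[i], v3->f32[i]);
        if (s) {
            break;
        }
    }
    *v1 = tmp;
}

static void vop64_3(Vec *v1, const Vec *v2, const Vec *v3, bool s,
                    double (*fn)(double, double))
{
    Vec tmp = {{0}};

    for (int i = 0; i < 2; i++) {
        tmp.f64[i] = fn(v2->f64[i], v3->f64[i]);
        if (s) {
            break;
        }
    }
    *v1 = tmp;
}

/* One macro per (name, operation, width) emits both the full-vector and
 * the single-element ("s") helper. */
#define DEF_FOP3_B(NAME, OP, BITS)                                          \
    void helper_gvec_##NAME##BITS(Vec *v1, const Vec *v2, const Vec *v3)    \
    {                                                                       \
        vop##BITS##_3(v1, v2, v3, false, f##BITS##_##OP);                   \
    }                                                                       \
    void helper_gvec_##NAME##BITS##s(Vec *v1, const Vec *v2, const Vec *v3) \
    {                                                                       \
        vop##BITS##_3(v1, v2, v3, true, f##BITS##_##OP);                    \
    }

#define DEF_FOP3(NAME, OP)   \
    DEF_FOP3_B(NAME, OP, 32) \
    DEF_FOP3_B(NAME, OP, 64)

DEF_FOP3(vfa, add)
DEF_FOP3(vfm, mul)
```

Adding another operation is then one `DEF_FOP3(name, op)` line, and the per-width loops remain ordinary functions a debugger can step through.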


r~



* Re: [PATCH v1 08/20] s390x/tcg: Implement 32/128 bit for VECTOR FP COMPARE (AND SIGNAL) SCALAR
  2020-09-30 14:55 ` [PATCH v1 08/20] s390x/tcg: Implement 32/128 bit for VECTOR FP COMPARE (AND SIGNAL) SCALAR David Hildenbrand
@ 2020-10-01 15:52   ` Richard Henderson
  0 siblings, 0 replies; 47+ messages in thread
From: Richard Henderson @ 2020-10-01 15:52 UTC
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 9/30/20 9:55 AM, David Hildenbrand wrote:
> @@ -2601,19 +2601,41 @@ static DisasJumpType op_wfc(DisasContext *s, DisasOps *o)
>  {
>      const uint8_t fpf = get_field(s, m3);
>      const uint8_t m4 = get_field(s, m4);
> +    gen_helper_gvec_2_ptr *fn = NULL;
>  
> -    if (fpf != FPF_LONG || m4) {
> +    switch (fpf) {
> +    case FPF_SHORT:
> +        if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
> +            fn = gen_helper_gvec_wfk32;
> +            if (s->fields.op2 == 0xcb) {

Hoist and name this comparison (e.g. bool signal = ...).

> -static int wfc64(const S390Vector *v1, const S390Vector *v2,
> -                 CPUS390XState *env, bool signal, uintptr_t retaddr)
> -{
> -    /* only the zero-indexed elements are compared */
> -    const float64 a = s390_vec_read_element64(v1, 0);
> -    const float64 b = s390_vec_read_element64(v2, 0);
> -    uint8_t vxc, vec_exc = 0;
> -    int cmp;
> -
> -    if (signal) {
> -        cmp = float64_compare(a, b, &env->fpu_status);
> -    } else {
> -        cmp = float64_compare_quiet(a, b, &env->fpu_status);
> -    }
> -    vxc = check_ieee_exc(env, 0, false, &vec_exc);
> -    handle_ieee_exc(env, vxc, vec_exc, retaddr);
> -
> -    return float_comp_to_cc(env, cmp);
> +#define DEF_WFC(BITS)                                                          \
> +static int wfc##BITS(const S390Vector *v1, const S390Vector *v2,               \
> +                     CPUS390XState *env, bool signal, uintptr_t retaddr)       \
> +{                                                                              \
> +    /* only the zero-indexed elements are compared */                          \
> +    const float##BITS a = s390_vec_read_float##BITS(v1, 0);                    \
> +    const float##BITS b = s390_vec_read_float##BITS(v2, 0);                    \
> +    uint8_t vxc, vec_exc = 0;                                                  \
> +    int cmp;                                                                   \
> +                                                                               \
> +    if (signal) {                                                              \
> +        cmp = float##BITS##_compare(a, b, &env->fpu_status);                   \
> +    } else {                                                                   \
> +        cmp = float##BITS##_compare_quiet(a, b, &env->fpu_status);             \
> +    }                                                                          \
> +    vxc = check_ieee_exc(env, 0, false, &vec_exc);                             \
> +    handle_ieee_exc(env, vxc, vec_exc, retaddr);                               \
> +                                                                               \
> +    return float_comp_to_cc(env, cmp);                                         \
>  }
> +DEF_WFC(32)
> +DEF_WFC(64)
> +DEF_WFC(128)

So, same issue here vs debugging.

If you keep this macroized, I don't see the value in two levels of macros...

> +#define DEF_GVEC_WFC(BITS)                                                     \
> +void HELPER(gvec_wfc##BITS)(const void *v1, const void *v2, CPUS390XState *env,\
> +                            uint32_t desc)                                     \
> +{                                                                              \
> +    env->cc_op = wfc##BITS(v1, v2, env, false, GETPC());                       \
>  }
> +DEF_GVEC_WFC(32)
> +DEF_GVEC_WFC(64)
> +DEF_GVEC_WFC(128)
>  
> +#define DEF_GVEC_WFK(BITS)                                                     \
> +void HELPER(gvec_wfk##BITS)(const void *v1, const void *v2, CPUS390XState *env,\
> +                            uint32_t desc)                                     \
> +{                                                                              \
> +    env->cc_op = wfc##BITS(v1, v2, env, true, GETPC());                        \
>  }
> +DEF_GVEC_WFK(32)
> +DEF_GVEC_WFK(64)
> +DEF_GVEC_WFK(128)

These could be folded in to the first macro via parameters.
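One way to read the suggestion, as a standalone sketch: a single `DEF_WFC(BITS, TYPE)` emits the shared compare plus both the WFC (quiet) and WFK (signaling) entry points. This is not the QEMU code. Scalars replace vectors (only element 0 is compared anyway), `Status` is a hypothetical stand-in for `float_status` that models only an invalid-operation flag, the condition code is returned instead of being stored in `env->cc_op`, and the quiet compare's SNaN handling is left out.

```c
#include <math.h>
#include <stdbool.h>

/* Stand-in for float_status: only an "invalid operation" flag. */
typedef struct {
    bool invalid;
} Status;

/* Compare a and b, returning the s390 condition code:
 * 0 equal, 1 first operand low, 2 first operand high, 3 unordered.
 * The signaling variant flags invalid for any NaN operand; the quiet
 * variant would flag SNaNs too, which this sketch does not model. */
#define DEF_WFC(BITS, TYPE)                                           \
    static int wfc##BITS(TYPE a, TYPE b, Status *st, bool signal)     \
    {                                                                 \
        if (isnan(a) || isnan(b)) {                                   \
            if (signal) {                                             \
                st->invalid = true;                                   \
            }                                                         \
            return 3;                                                 \
        }                                                             \
        return a == b ? 0 : (a < b ? 1 : 2);                          \
    }                                                                 \
    int helper_gvec_wfc##BITS(TYPE a, TYPE b, Status *st)             \
    {                                                                 \
        return wfc##BITS(a, b, st, false);                            \
    }                                                                 \
    int helper_gvec_wfk##BITS(TYPE a, TYPE b, Status *st)             \
    {                                                                 \
        return wfc##BITS(a, b, st, true);                             \
    }

DEF_WFC(32, float)
DEF_WFC(64, double)
```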


r~




* Re: [PATCH v1 04/20] s390x/tcg: Implement 32/128 bit for VECTOR FP ADD
  2020-09-30 14:55 ` [PATCH v1 04/20] s390x/tcg: Implement 32/128 bit for VECTOR FP ADD David Hildenbrand
  2020-10-01 15:45   ` Richard Henderson
@ 2020-10-01 16:08   ` Richard Henderson
  2020-10-01 17:08     ` David Hildenbrand
  1 sibling, 1 reply; 47+ messages in thread
From: Richard Henderson @ 2020-10-01 16:08 UTC
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 9/30/20 9:55 AM, David Hildenbrand wrote:
> +        case FPF_LONG:
> +            fn = se ? gen_helper_gvec_vfa64s : gen_helper_gvec_vfa64;
> +            break;

BTW, any reason not to pass SE as data, like you do later for SQ?  Or
potentially the entire M field as is?

Just wondering if it would help tidy up here...


r~



* Re: [PATCH v1 09/20] s390x/tcg: Implement 32/128 bit for VECTOR FP COMPARE *
  2020-09-30 14:55 ` [PATCH v1 09/20] s390x/tcg: Implement 32/128 bit for VECTOR FP COMPARE * David Hildenbrand
@ 2020-10-01 16:12   ` Richard Henderson
  0 siblings, 0 replies; 47+ messages in thread
From: Richard Henderson @ 2020-10-01 16:12 UTC
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 9/30/20 9:55 AM, David Hildenbrand wrote:
> -void HELPER(gvec_vfce64)(void *v1, const void *v2, const void *v3,
> -                         CPUS390XState *env, uint32_t desc)
> -{
> -    vfc64(v1, v2, v3, env, false, float64_eq_quiet, GETPC());
> +#define DEF_GVEC_VFCE(BITS)                                                    \
> +void HELPER(gvec_vfce##BITS)(void *v1, const void *v2, const void *v3,         \
> +                             CPUS390XState *env, uint32_t desc)                \
> +{                                                                              \
> +    const bool sq = simd_data(desc);                                           \
> +                                                                               \
> +    vfc##BITS(v1, v2, v3, env, false,                                          \
> +              sq ? float##BITS##_eq : float##BITS##_eq_quiet, GETPC());        \
>  }
> +DEF_GVEC_VFCE(32)
> +DEF_GVEC_VFCE(64)
> +DEF_GVEC_VFCE(128)
>  
> -void HELPER(gvec_vfce64s)(void *v1, const void *v2, const void *v3,
> -                          CPUS390XState *env, uint32_t desc)
> -{
> -    vfc64(v1, v2, v3, env, true, float64_eq_quiet, GETPC());
> +#define DEF_GVEC_VFCE_S(BITS)                                                  \
> +void HELPER(gvec_vfce##BITS##s)(void *v1, const void *v2, const void *v3,      \
> +                                CPUS390XState *env, uint32_t desc)             \
> +{                                                                              \
> +    const bool sq = simd_data(desc);                                           \
> +                                                                               \
> +    vfc##BITS(v1, v2, v3, env, true,                                           \
> +              sq ? float##BITS##_eq : float##BITS##_eq_quiet, GETPC());        \
>  }
> +DEF_GVEC_VFCE_S(32)
> +DEF_GVEC_VFCE_S(64)
>  
> -void HELPER(gvec_vfce64_cc)(void *v1, const void *v2, const void *v3,
> -                            CPUS390XState *env, uint32_t desc)
> -{
> -    env->cc_op = vfc64(v1, v2, v3, env, false, float64_eq_quiet, GETPC());
> +#define DEF_GVEC_VFCE_CC(BITS)                                                 \
> +void HELPER(gvec_vfce##BITS##_cc)(void *v1, const void *v2, const void *v3,    \
> +                                  CPUS390XState *env, uint32_t desc)           \
> +{                                                                              \
> +    const bool sq = simd_data(desc);                                           \
> +                                                                               \
> +    env->cc_op = vfc##BITS(v1, v2, v3, env, false,                             \
> +                           sq ? float##BITS##_eq : float##BITS##_eq_quiet,     \
> +                           GETPC());                                           \
>  }
> +DEF_GVEC_VFCE_CC(32)
> +DEF_GVEC_VFCE_CC(64)
> +DEF_GVEC_VFCE_CC(128)
>  
> -void HELPER(gvec_vfce64s_cc)(void *v1, const void *v2, const void *v3,
> -                            CPUS390XState *env, uint32_t desc)
> -{
> -    env->cc_op = vfc64(v1, v2, v3, env, true, float64_eq_quiet, GETPC());
> +#define DEF_GVEC_VFCE_S_CC(BITS)                                               \
> +void HELPER(gvec_vfce##BITS##s_cc)(void *v1, const void *v2, const void *v3,   \
> +                                   CPUS390XState *env, uint32_t desc)          \
> +{                                                                              \
> +    const bool sq = simd_data(desc);                                           \
> +                                                                               \
> +    env->cc_op = vfc##BITS(v1, v2, v3, env, true,                              \
> +                           sq ? float##BITS##_eq : float##BITS##_eq_quiet,     \
> +                           GETPC());                                           \
>  }
> +DEF_GVEC_VFCE_S_CC(32)
> +DEF_GVEC_VFCE_S_CC(64)

These macros are at the wrong level.  You shouldn't need separate macros for EQ
vs LT, etc.
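A hedged sketch of what the "right level" could look like: the predicate becomes a macro parameter, so one `DEF_VFC_B(NAME, OP, BITS)` emits the plain, `s`, `_cc` and `s_cc` helpers for any relation. All types and names are hypothetical stand-ins (`float`/`double` for softfloat types, a union for `S390Vector`, `gt` for whichever relation the real code uses), the SQ choice between signaling and quiet predicates is omitted, and the result convention (true elements all ones; cc 0 = all true, 1 = mixed, 3 = none true) is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for S390Vector. */
typedef union {
    float f32[4];
    double f64[2];
    uint32_t u32[4];
    uint64_t u64[2];
} Vec;

/* Stand-in predicates for softfloat's float32_eq, ... */
static bool f32_eq(float a, float b) { return a == b; }
static bool f32_gt(float a, float b) { return a > b; }
static bool f64_eq(double a, double b) { return a == b; }
static bool f64_gt(double a, double b) { return a > b; }

/* Per-width loops: true elements become all ones, false elements zero.
 * With s set, only element 0 is compared. Returns the condition code. */
static int vfc32(Vec *v1, const Vec *v2, const Vec *v3, bool s,
                 bool (*fn)(float, float))
{
    Vec tmp = {{0}};
    int match = 0, n = 0;

    for (int i = 0; i < 4; i++) {
        n++;
        if (fn(v2->f32[i], v3->f32[i])) {
            tmp.u32[i] = UINT32_MAX;
            match++;
        }
        if (s) {
            break;
        }
    }
    *v1 = tmp;
    return match == n ? 0 : (match ? 1 : 3);
}

static int vfc64(Vec *v1, const Vec *v2, const Vec *v3, bool s,
                 bool (*fn)(double, double))
{
    Vec tmp = {{0}};
    int match = 0, n = 0;

    for (int i = 0; i < 2; i++) {
        n++;
        if (fn(v2->f64[i], v3->f64[i])) {
            tmp.u64[i] = UINT64_MAX;
            match++;
        }
        if (s) {
            break;
        }
    }
    *v1 = tmp;
    return match == n ? 0 : (match ? 1 : 3);
}

#define DEF_VFC_B(NAME, OP, BITS)                                              \
    void helper_gvec_##NAME##BITS(Vec *v1, const Vec *v2, const Vec *v3)       \
    {                                                                          \
        vfc##BITS(v1, v2, v3, false, f##BITS##_##OP);                          \
    }                                                                          \
    void helper_gvec_##NAME##BITS##s(Vec *v1, const Vec *v2, const Vec *v3)    \
    {                                                                          \
        vfc##BITS(v1, v2, v3, true, f##BITS##_##OP);                           \
    }                                                                          \
    int helper_gvec_##NAME##BITS##_cc(Vec *v1, const Vec *v2, const Vec *v3)   \
    {                                                                          \
        return vfc##BITS(v1, v2, v3, false, f##BITS##_##OP);                   \
    }                                                                          \
    int helper_gvec_##NAME##BITS##s_cc(Vec *v1, const Vec *v2, const Vec *v3)  \
    {                                                                          \
        return vfc##BITS(v1, v2, v3, true, f##BITS##_##OP);                    \
    }

#define DEF_VFC(NAME, OP)   \
    DEF_VFC_B(NAME, OP, 32) \
    DEF_VFC_B(NAME, OP, 64)

DEF_VFC(vfce, eq)
DEF_VFC(vfch, gt)
```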


r~



* Re: [PATCH v1 11/20] s390x/tcg: Implement 64 bit for VECTOR FP LOAD LENGTHENED
  2020-09-30 14:55 ` [PATCH v1 11/20] s390x/tcg: Implement 64 bit for VECTOR FP LOAD LENGTHENED David Hildenbrand
@ 2020-10-01 16:19   ` Richard Henderson
  0 siblings, 0 replies; 47+ messages in thread
From: Richard Henderson @ 2020-10-01 16:19 UTC
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 9/30/20 9:55 AM, David Hildenbrand wrote:
> 64 bit -> 128 bit, there is only a single final element.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/helper.h           |  1 +
>  target/s390x/translate_vx.c.inc | 21 ++++++++++++++++-----
>  target/s390x/vec_fpu_helper.c   | 13 +++++++++++++
>  3 files changed, 30 insertions(+), 5 deletions(-)


Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~




* Re: [PATCH v1 12/20] s390x/tcg: Implement 128 bit for VECTOR FP LOAD ROUNDED
  2020-09-30 14:55 ` [PATCH v1 12/20] s390x/tcg: Implement 128 bit for VECTOR FP LOAD ROUNDED David Hildenbrand
@ 2020-10-01 16:21   ` Richard Henderson
  0 siblings, 0 replies; 47+ messages in thread
From: Richard Henderson @ 2020-10-01 16:21 UTC
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 9/30/20 9:55 AM, David Hildenbrand wrote:
> 128 bit -> 64 bit, there is only a single element to process.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/helper.h           |  1 +
>  target/s390x/translate_vx.c.inc | 11 ++++++++++-
>  target/s390x/vec_fpu_helper.c   | 19 +++++++++++++++++++
>  3 files changed, 30 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~




* Re: [PATCH v1 13/20] s390x/tcg: Implement 32/128 bit for VECTOR FP PERFORM SIGN OPERATION
  2020-09-30 14:55 ` [PATCH v1 13/20] s390x/tcg: Implement 32/128 bit for VECTOR FP PERFORM SIGN OPERATION David Hildenbrand
@ 2020-10-01 16:24   ` Richard Henderson
  0 siblings, 0 replies; 47+ messages in thread
From: Richard Henderson @ 2020-10-01 16:24 UTC
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 9/30/20 9:55 AM, David Hildenbrand wrote:
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/translate_vx.c.inc | 100 +++++++++++++++++++++-----------
>  1 file changed, 67 insertions(+), 33 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~




* Re: [PATCH v1 15/20] s390x/tcg: Implement 32/128 bit for VECTOR FP TEST DATA CLASS IMMEDIATE
  2020-09-30 14:55 ` [PATCH v1 15/20] s390x/tcg: Implement 32/128 bit for VECTOR FP TEST DATA CLASS IMMEDIATE David Hildenbrand
@ 2020-10-01 16:30   ` Richard Henderson
  0 siblings, 0 replies; 47+ messages in thread
From: Richard Henderson @ 2020-10-01 16:30 UTC
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 9/30/20 9:55 AM, David Hildenbrand wrote:
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/helper.h           |  3 ++
>  target/s390x/translate_vx.c.inc | 26 ++++++++---
>  target/s390x/vec_fpu_helper.c   | 76 +++++++++++++++++++--------------
>  3 files changed, 69 insertions(+), 36 deletions(-)
> 
> diff --git a/target/s390x/helper.h b/target/s390x/helper.h
> index bee283e3d4..c2ded83669 100644
> --- a/target/s390x/helper.h
> +++ b/target/s390x/helper.h
> @@ -331,8 +331,11 @@ DEF_HELPER_FLAGS_5(gvec_vfs32s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
>  DEF_HELPER_FLAGS_5(gvec_vfs64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
>  DEF_HELPER_FLAGS_5(gvec_vfs64s, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
>  DEF_HELPER_FLAGS_5(gvec_vfs128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
> +DEF_HELPER_4(gvec_vftci32, void, ptr, cptr, env, i32)
> +DEF_HELPER_4(gvec_vftci32s, void, ptr, cptr, env, i32)
>  DEF_HELPER_4(gvec_vftci64, void, ptr, cptr, env, i32)
>  DEF_HELPER_4(gvec_vftci64s, void, ptr, cptr, env, i32)
> +DEF_HELPER_4(gvec_vftci128, void, ptr, cptr, env, i32)
>  
>  #ifndef CONFIG_USER_ONLY
>  DEF_HELPER_3(servc, i32, env, i64, i64)
> diff --git a/target/s390x/translate_vx.c.inc b/target/s390x/translate_vx.c.inc
> index 7d4811ccf7..6bd599b319 100644
> --- a/target/s390x/translate_vx.c.inc
> +++ b/target/s390x/translate_vx.c.inc
> @@ -2991,16 +2991,32 @@ static DisasJumpType op_vftci(DisasContext *s, DisasOps *o)
>      const uint16_t i3 = get_field(s, i3);
>      const uint8_t fpf = get_field(s, m4);
>      const uint8_t m5 = get_field(s, m5);
> -    gen_helper_gvec_2_ptr *fn = gen_helper_gvec_vftci64;
> +    const bool se = extract32(m5, 3, 1);
> +    gen_helper_gvec_2_ptr *fn = NULL;
>  
> -    if (fpf != FPF_LONG || extract32(m5, 0, 3)) {
> +    switch (fpf) {
> +    case FPF_SHORT:
> +        if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
> +            fn = se ? gen_helper_gvec_vftci32s : gen_helper_gvec_vftci32;
> +        }
> +        break;
> +    case FPF_LONG:
> +        fn = se ? gen_helper_gvec_vftci64s : gen_helper_gvec_vftci64;
> +        break;
> +    case FPF_EXT:
> +        if (s390_has_feat(S390_FEAT_VECTOR_ENH)) {
> +            fn = gen_helper_gvec_vftci128;
> +        }
> +        break;
> +    default:
> +        break;
> +    }
> +
> +    if (!fn || extract32(m5, 0, 3)) {
>          gen_program_exception(s, PGM_SPECIFICATION);
>          return DISAS_NORETURN;
>      }
>  
> -    if (extract32(m5, 3, 1)) {
> -        fn = gen_helper_gvec_vftci64s;
> -    }
>      gen_gvec_2_ptr(get_field(s, v1), get_field(s, v2), cpu_env, i3, fn);
>      set_cc_static(s);
>      return DISAS_NEXT;
> diff --git a/target/s390x/vec_fpu_helper.c b/target/s390x/vec_fpu_helper.c
> index b7045e85d6..f18f0ae8e2 100644
> --- a/target/s390x/vec_fpu_helper.c
> +++ b/target/s390x/vec_fpu_helper.c
> @@ -23,6 +23,9 @@
>  const float32 float32_ones = make_float32(-1u);
>  const float64 float64_ones = make_float64(-1ull);
>  const float128 float128_ones = make_float128(-1ull, -1ull);
> +const float32 float32_zeroes = make_float32(0);
> +const float64 float64_zeroes = make_float64(0);
> +const float128 float128_zeroes = make_float128(0, 0);

These already exist as "zero" not "zeroes".

Otherwise looks ok, modulo the same comments as for all the other macros in
this series.


r~



* Re: [PATCH v1 18/20] s390x/tcg: Implement VECTOR FP (MAXIMUM|MINIMUM)
  2020-09-30 14:55 ` [PATCH v1 18/20] s390x/tcg: Implement VECTOR FP (MAXIMUM|MINIMUM) David Hildenbrand
@ 2020-10-01 16:49   ` Richard Henderson
  0 siblings, 0 replies; 47+ messages in thread
From: Richard Henderson @ 2020-10-01 16:49 UTC
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 9/30/20 9:55 AM, David Hildenbrand wrote:
> +typedef enum S390MinMaxType {
> +    s390_minmax_java_math_min,
> +    s390_minmax_java_math_max,
> +    s390_minmax_c_macro_min,
> +    s390_minmax_c_macro_max,
> +    s390_minmax_fmin,
> +    s390_minmax_fmax,
> +    s390_minmax_cpp_alg_min,
> +    s390_minmax_cpp_alg_max,
> +} S390MinMaxType;

I think you'd do just as well making this enum match M6, so that no translation
is necessary.
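A sketch of an enum whose values are the M6 encodings themselves, read off the indices used in the quoted `vfmax_fns*` tables (0 to 4 for the five types, plus 8 for the magnitude variants). The enumerator names and the decode helper are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values equal the M6 encodings, so no translation table is needed. */
typedef enum S390MinMaxType {
    S390_MINMAX_TYPE_IEEE = 0,    /* IEEE maxNum / minNum */
    S390_MINMAX_TYPE_JAVA = 1,    /* Java Math.max() / Math.min() */
    S390_MINMAX_TYPE_C_MACRO = 2, /* C-style macro */
    S390_MINMAX_TYPE_CPP = 3,     /* C++ algorithm std::max() / std::min() */
    S390_MINMAX_TYPE_F = 4,       /* C fmax() / fmin() */
} S390MinMaxType;

#define S390_MINMAX_ABS 8         /* compare magnitudes instead of values */

/* Split M6 into (type, is_abs); returns false for reserved encodings. */
static bool decode_minmax_m6(uint8_t m6, S390MinMaxType *type, bool *is_abs)
{
    const uint8_t t = m6 & 7;

    if (m6 > 15 || t > S390_MINMAX_TYPE_F) {
        return false;
    }
    *type = (S390MinMaxType)t;
    *is_abs = m6 & S390_MINMAX_ABS;
    return true;
}
```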

> +
> +#define S390_MINMAX(BITS, TYPE)                                                \
> +static float##BITS TYPE##BITS(float##BITS a, float##BITS b, float_status *s)   \
> +{                                                                              \
> +    const bool zero_a = float##BITS##_is_infinity(a);                          \
> +    const bool zero_b = float##BITS##_is_infinity(b);                          \
> +    const bool inf_a = float##BITS##_is_infinity(a);                           \
> +    const bool inf_b = float##BITS##_is_infinity(b);                           \
> +    const bool nan_a = float##BITS##_is_infinity(a);                           \
> +    const bool nan_b = float##BITS##_is_infinity(b);                           \

Wrong lookups.

Note that you've already got float*_dcmask which you could use to help out
here.  You just need some named constants to help with the 12 different bits.
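For illustration, a standalone sketch of named constants for the 12-bit class mask, with the zero/infinity/NaN lookups from the quoted macro expressed through it. The bit layout is an assumption matching the s390 TEST DATA CLASS ordering (a positive/negative pair each for zero, normal, subnormal, infinity, quiet NaN and signaling NaN, most-significant first). `dcmask64()` is a plain re-implementation on raw IEEE binary64 bits, not QEMU's `float64_dcmask()`, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; each class owns a (positive, negative) bit pair. */
#define DCMASK_ZERO           0x0c00
#define DCMASK_NORMAL         0x0300
#define DCMASK_SUBNORMAL      0x00c0
#define DCMASK_INFINITY       0x0030
#define DCMASK_QUIET_NAN      0x000c
#define DCMASK_SIGNALING_NAN  0x0003
#define DCMASK_NAN            (DCMASK_QUIET_NAN | DCMASK_SIGNALING_NAN)
#define DCMASK_NEGATIVE       0x0555

/* Classify raw binary64 bits into exactly one of the 12 mask bits. */
static uint16_t dcmask64(uint64_t bits)
{
    const int neg = bits >> 63;
    const uint64_t exp = (bits >> 52) & 0x7ff;
    const uint64_t frac = bits & 0x000fffffffffffffull;
    int n; /* 0 zero, 2 normal, 4 subnormal, 6 infinity, 8 QNaN, 10 SNaN */

    if (exp == 0) {
        n = frac ? 4 : 0;
    } else if (exp != 0x7ff) {
        n = 2;
    } else if (frac == 0) {
        n = 6;
    } else {
        n = (frac >> 51) ? 8 : 10; /* quiet bit set means QNaN */
    }
    return 1 << (11 - n - neg);
}

/* The three lookups from the quoted macro, each reading the right class. */
typedef struct {
    bool zero, inf, nan;
} FpClass;

static FpClass classify64(uint64_t bits)
{
    const uint16_t m = dcmask64(bits);
    const FpClass c = {
        .zero = m & DCMASK_ZERO,
        .inf = m & DCMASK_INFINITY,
        .nan = m & DCMASK_NAN,
    };
    return c;
}
```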

> +        switch (TYPE) {                                                        \
> +        case s390_minmax_java_math_min:                                        \
> +        case s390_minmax_java_math_max:                                        \

I think you should make TYPE a function parameter, and not replicate this
function N times, and so you also don't need

> +static vop32_3_fn const vfmax_fns32[16] = {
> +    [0] = float32_maxnum,
> +    [1] = s390_minmax_java_math_max32,
> +    [2] = s390_minmax_c_macro_max32,
> +    [3] = s390_minmax_cpp_alg_max32,
> +    [4] = s390_minmax_fmax32,
> +    [8] = float32_maxnummag,
> +    [9] = s390_minmax_java_math_max_abs32,
> +    [10] = s390_minmax_c_macro_max_abs32,
> +    [11] = s390_minmax_cpp_alg_max_abs32,
> +    [12] = s390_minmax_fmax_abs32,
> +};
> +
> +static vop64_3_fn const vfmax_fns64[16] = {
> +    [0] = float64_maxnum,
> +    [1] = s390_minmax_java_math_max64,
> +    [2] = s390_minmax_c_macro_max64,
> +    [3] = s390_minmax_cpp_alg_max64,
> +    [4] = s390_minmax_fmax64,
> +    [8] = float64_maxnummag,
> +    [9] = s390_minmax_java_math_max_abs64,
> +    [10] = s390_minmax_c_macro_max_abs64,
> +    [11] = s390_minmax_cpp_alg_max_abs64,
> +    [12] = s390_minmax_fmax_abs64,
> +};
> +
> +static vop128_3_fn const vfmax_fns128[16] = {
> +    [0] = float128_maxnum,
> +    [1] = s390_minmax_java_math_max128,
> +    [2] = s390_minmax_c_macro_max128,
> +    [3] = s390_minmax_cpp_alg_max128,
> +    [4] = s390_minmax_fmax128,
> +    [8] = float128_maxnummag,
> +    [9] = s390_minmax_java_math_max_abs128,
> +    [10] = s390_minmax_c_macro_max_abs128,
> +    [11] = s390_minmax_cpp_alg_max_abs128,
> +    [12] = s390_minmax_fmax_abs128,
> +};

these tables.

I think that the bulk of your minmax could be done exclusively with dcmask, so
there could be a common non-macroized function that returns an indication of
whether A or B (possibly silenced) should be the result, or if we should use
one of your two compare cases.
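A possible shape for that common function, as a hedged sketch: it inspects only the two class masks, so it needs no per-width copies, and tells the caller to take A, take B, or fall back to an ordinary comparison. The NaN and signed-zero rules below are hypothetical maxNum-like ones chosen for illustration; the real per-type behaviour would have to follow the Principles of Operation. The mask constants assume the same layout as TEST DATA CLASS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed mask layout: +/- pairs, zero in the top bits, SNaN at the bottom. */
#define DCMASK_ZERO           0x0c00
#define DCMASK_QUIET_NAN      0x000c
#define DCMASK_SIGNALING_NAN  0x0003
#define DCMASK_NAN            (DCMASK_QUIET_NAN | DCMASK_SIGNALING_NAN)
#define DCMASK_NEGATIVE       0x0555

typedef enum {
    MINMAX_PICK_A,  /* result is A (to be silenced if it is an SNaN) */
    MINMAX_PICK_B,  /* result is B (to be silenced if it is an SNaN) */
    MINMAX_COMPARE, /* no special case: an ordinary comparison decides */
} MinMaxAction;

/* Width-independent decision for a "maximum" under hypothetical rules. */
static MinMaxAction max_action(uint16_t mask_a, uint16_t mask_b)
{
    const bool nan_a = mask_a & DCMASK_NAN;
    const bool nan_b = mask_b & DCMASK_NAN;

    if (nan_a || nan_b) {
        /* An SNaN operand becomes the (silenced) result. */
        if (mask_a & DCMASK_SIGNALING_NAN) {
            return MINMAX_PICK_A;
        }
        if (mask_b & DCMASK_SIGNALING_NAN) {
            return MINMAX_PICK_B;
        }
        /* Quiet NaNs only: prefer the operand that is a number. */
        return (nan_a && !nan_b) ? MINMAX_PICK_B : MINMAX_PICK_A;
    }
    if ((mask_a & DCMASK_ZERO) && (mask_b & DCMASK_ZERO)) {
        /* Treat -0 as smaller than +0. */
        return (mask_a & DCMASK_NEGATIVE) ? MINMAX_PICK_B : MINMAX_PICK_A;
    }
    return MINMAX_COMPARE;
}
```

The per-width code would then only compute the two masks, call this once, and perform the comparison (or silencing) for the case it returns.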

BTW, for your two "simple comparison" cases, have we eliminated all of the
special cases?  Could we in fact be calling floatN_min/max instead of
floatN_le_quiet?


r~



* Re: [PATCH v1 19/20] s390x/tcg: We support Vector enhancements facility
  2020-09-30 14:55 ` [PATCH v1 19/20] s390x/tcg: We support Vector enhancements facility David Hildenbrand
@ 2020-10-01 16:50   ` Richard Henderson
  0 siblings, 0 replies; 47+ messages in thread
From: Richard Henderson @ 2020-10-01 16:50 UTC
  To: David Hildenbrand, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 9/30/20 9:55 AM, David Hildenbrand wrote:
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/gen-features.c | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~




* Re: [PATCH v1 20/20] s390x/cpumodel: Bump up QEMU model to a stripped-down IBM z14 GA2
  2020-09-30 14:55 ` [PATCH v1 20/20] s390x/cpumodel: Bump up QEMU model to a stripped-down IBM z14 GA2 David Hildenbrand
@ 2020-10-01 16:52   ` Richard Henderson
  0 siblings, 0 replies; 47+ messages in thread
From: Richard Henderson @ 2020-10-01 16:52 UTC
  To: David Hildenbrand, qemu-devel
  Cc: Christian Borntraeger, qemu-s390x, Cornelia Huck, Thomas Huth

On 9/30/20 9:55 AM, David Hildenbrand wrote:
> TCG implements everything we need to run basic z14 OS+software.
> 
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/s390x/s390-virtio-ccw.c  |  2 ++
>  target/s390x/cpu_models.c   |  4 ++--
>  target/s390x/gen-features.c | 15 +++++++++------
>  3 files changed, 13 insertions(+), 8 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~




* Re: [PATCH v1 04/20] s390x/tcg: Implement 32/128 bit for VECTOR FP ADD
  2020-10-01 16:08   ` Richard Henderson
@ 2020-10-01 17:08     ` David Hildenbrand
  0 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2020-10-01 17:08 UTC
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 01.10.20 18:08, Richard Henderson wrote:
> On 9/30/20 9:55 AM, David Hildenbrand wrote:
>> +        case FPF_LONG:
>> +            fn = se ? gen_helper_gvec_vfa64s : gen_helper_gvec_vfa64;
>> +            break;
> 
> BTW, any reason not to pass SE as data, like you do later for SQ?  Or
> potentially the entire M field as is?

Having a separate implementation for "se" is desirable, because the
compiler can optimize out the complete loop. If we simply pass the M
field to the helper, I'm not sure how likely it is that the compiler
will specialize (would have to double check).

(if we decide to remove all "s" helpers here, we'd better do it for all
 helpers)

> 
> Just wondering if it would help tidy up here...
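To illustrate the trade-off described above, a small standalone sketch with plain `double` arrays, hypothetical helper names, and an assumed bit-0 encoding for the run-time flag. Whether a particular compiler actually folds the constant and drops the loop after inlining is not guaranteed; the sketch only shows the two shapes being compared.

```c
#include <stdbool.h>

/* Shared body; s limits the operation to element 0 (the rest become 0). */
static inline void vfa64_body(double r[2], const double a[2],
                              const double b[2], bool s)
{
    double tmp[2] = {0, 0};

    for (int i = 0; i < 2; i++) {
        tmp[i] = a[i] + b[i];
        if (s) {
            break;
        }
    }
    r[0] = tmp[0];
    r[1] = tmp[1];
}

/* Option 1: separate entry points. s is a compile-time constant in each,
 * so after inlining the compiler can drop the test (and the loop for s). */
void helper_vfa64(double r[2], const double a[2], const double b[2])
{
    vfa64_body(r, a, b, false);
}

void helper_vfa64s(double r[2], const double a[2], const double b[2])
{
    vfa64_body(r, a, b, true);
}

/* Option 2: one entry point that decodes the flag at run time, as it would
 * if SE were passed as descriptor data (bit 0 here is an assumption). */
void helper_vfa64_data(double r[2], const double a[2], const double b[2],
                       unsigned data)
{
    vfa64_body(r, a, b, data & 1);
}
```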



-- 
Thanks,

David / dhildenb




* Re: [PATCH v1 02/20] s390x/tcg: Implement VECTOR BIT PERMUTE
  2020-10-01 15:17   ` Richard Henderson
@ 2020-10-01 17:28     ` David Hildenbrand
  0 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2020-10-01 17:28 UTC
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 01.10.20 17:17, Richard Henderson wrote:
> On 9/30/20 9:55 AM, David Hildenbrand wrote:
>> +        bit = !!(s390_vec_read_element8(v2, bit_nr / 8) &
>> +                 (0x80 >> (bit_nr % 8)));
> 
> I think this would be clearer as
> 
>   bit = (s390_vec_read_element8(v2, bit_nr / 8)
>          >> (7 - (bit_nr % 8))) & 1;

Can do, thanks!
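The two spellings can be checked against each other in isolation. In this sketch a plain byte array stands in for `s390_vec_read_element8()` (an assumption), with bit 0 being the most-significant bit of byte 0.

```c
#include <stdint.h>

/* Original form: mask the byte, then normalize to 0/1. */
static int bit_mask_form(const uint8_t v[16], unsigned bit_nr)
{
    return !!(v[bit_nr / 8] & (0x80 >> (bit_nr % 8)));
}

/* Suggested form: shift the wanted bit down to position 0, then mask. */
static int bit_shift_form(const uint8_t v[16], unsigned bit_nr)
{
    return (v[bit_nr / 8] >> (7 - (bit_nr % 8))) & 1;
}
```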

> 
> Otherwise,
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> 
> r~
> 


-- 
Thanks,

David / dhildenb




* Re: [PATCH v1 03/20] s390x/tcg: Implement VECTOR MULTIPLY SUM LOGICAL
  2020-10-01 15:26   ` Richard Henderson
@ 2020-10-01 17:30     ` David Hildenbrand
  0 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2020-10-01 17:30 UTC
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x, Cornelia Huck, Thomas Huth

On 01.10.20 17:26, Richard Henderson wrote:
> On 9/30/20 9:55 AM, David Hildenbrand wrote:
>> +    /* Multiply both even elements from v2 and v3 */
>> +    read_vec_element_i64(l1, get_field(s, v2), 0, ES_64);
>> +    read_vec_element_i64(h1, get_field(s, v3), 0, ES_64);
>> +    tcg_gen_mulu2_i64(l1, h1, l1, h1);
>> +    /* Shift result left by one bit if requested */
>> +    if (extract32(get_field(s, m6), 3, 1)) {
>> +        tcg_gen_extract2_i64(h1, l1, h1, 63);
>> +        tcg_gen_shli_i64(l1, l1, 1);
>> +    }
> 
> Not a bug, but some hosts require 3 insns for extract2 (so 4 total for this
> sequence).
> 
> This doubling can also be had via add2:
> 
>     tcg_gen_add2_i64(l1, h1, l1, h1, l1, h1);

Took me longer than it should to realize this is really just doubling
the value ... will use tcg_gen_add2_i64() and add a comment.
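Why the two sequences agree can be checked on a 128-bit value held in two 64-bit halves: shifting the pair left by one (what `extract2` plus `shli` compute) is the same as adding the pair to itself with a carry from the low half into the high half (what `add2` computes). The `U128` type and both function names are hypothetical; they mimic the TCG ops in plain C.

```c
#include <stdint.h>

/* A 128-bit value as two 64-bit halves, like the (l1, h1) pair above. */
typedef struct {
    uint64_t lo, hi;
} U128;

/* extract2_i64(h1, l1, h1, 63) then shli_i64(l1, l1, 1): shift left by 1. */
static U128 shl1_extract2(U128 x)
{
    U128 r;

    r.hi = (x.hi << 1) | (x.lo >> 63);
    r.lo = x.lo << 1;
    return r;
}

/* add2_i64(l1, h1, l1, h1, l1, h1): add the pair to itself. */
static U128 double_add2(U128 x)
{
    U128 r;

    r.lo = x.lo + x.lo;
    r.hi = x.hi + x.hi + (r.lo < x.lo); /* carry out of the low half */
    return r;
}
```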

Thanks!


-- 
Thanks,

David / dhildenb




* Re: [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14
  2020-10-01 15:07 ` Richard Henderson
@ 2020-10-07 13:09   ` David Hildenbrand
  0 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2020-10-07 13:09 UTC
  To: Richard Henderson, qemu-devel
  Cc: Peter Maydell, Thomas Huth, Cornelia Huck, Christian Borntraeger,
	qemu-s390x, Alex Bennée, Aurelien Jarno

On 01.10.20 17:07, Richard Henderson wrote:
> On 9/30/20 9:55 AM, David Hildenbrand wrote:
>> This series adds support for the "Vector enhancements facility" and bumps
>> the qemu CPU model to a stripped-down z14.
>>
>> I have yet to find a way to get more test coverage - looks like some of
>> the functions aren't used anywhere yet (e.g., VECTOR FP MAXIMUM), writing
>> unit tests to cover all functions and cases is just nasty. But I might be
>> wrong - I'm planning to at least test basic functionality of all newly added
>> instructions.
> 
> This is where RISU can be helpful.  Auto-generate 100k random variations,
> record known good results on hardware, verify that replay on qemu produces the
> same results.

Makes sense; however, some corner cases (e.g., MAX(+0, -0)) still might
have to be handled manually.
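As a rough, in-process analogue of that workflow (not RISU itself, which compares hardware traces against QEMU), the sketch below runs a reference and a candidate implementation over pseudo-random bit patterns and over a short list of hand-picked corner cases such as MAX(+0, -0), which random patterns almost never produce. The reference rules, the xorshift generator and all names are assumptions chosen for illustration.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

typedef double (*max_fn)(double a, double b);

/* Hypothetical reference: quiet NaNs lose to numbers, +0 beats -0. */
static double ref_max(double a, double b)
{
    if (isnan(a)) {
        return b;
    }
    if (isnan(b)) {
        return a;
    }
    if (a == 0 && b == 0) {
        return signbit(a) ? b : a;
    }
    return a > b ? a : b;
}

/* Candidate under test: a naive C-style maximum. */
static double naive_max(double a, double b)
{
    return a > b ? a : b;
}

static double from_bits(uint64_t u)
{
    double d;

    memcpy(&d, &u, sizeof(d));
    return d;
}

/* xorshift64: cheap, deterministic pseudo-random bit patterns. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;

    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Results match if both are NaN or if they are bit-identical. */
static int same_result(double x, double y)
{
    if (isnan(x) && isnan(y)) {
        return 1;
    }
    return memcmp(&x, &y, sizeof(x)) == 0;
}

static unsigned count_mismatches(max_fn ref, max_fn cand, unsigned n,
                                 uint64_t seed)
{
    /* Corner cases that random bit patterns rarely hit. */
    static const uint64_t corners[][2] = {
        {0x0000000000000000ull, 0x8000000000000000ull}, /* +0, -0 */
        {0x8000000000000000ull, 0x0000000000000000ull}, /* -0, +0 */
        {0x3ff0000000000000ull, 0x7ff8000000000000ull}, /* 1.0, QNaN */
    };
    unsigned bad = 0;

    for (size_t i = 0; i < sizeof(corners) / sizeof(corners[0]); i++) {
        double a = from_bits(corners[i][0]);
        double b = from_bits(corners[i][1]);
        bad += !same_result(ref(a, b), cand(a, b));
    }
    for (unsigned i = 0; i < n; i++) {
        double a = from_bits(xorshift64(&seed));
        double b = from_bits(xorshift64(&seed));
        bad += !same_result(ref(a, b), cand(a, b));
    }
    return bad;
}
```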

It might take me a while to address the feedback (I'm fairly busy as
usual ...). Thanks!

-- 
Thanks,

David / dhildenb




* Re: [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14
  2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
                   ` (21 preceding siblings ...)
  2020-10-01 15:07 ` Richard Henderson
@ 2021-05-05 10:55 ` David Hildenbrand
  22 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2021-05-05 10:55 UTC
  To: qemu-devel, Richard Henderson
  Cc: Peter Maydell, Thomas Huth, Cornelia Huck, Christian Borntraeger,
	qemu-s390x, Alex Bennée, Aurelien Jarno

On 30.09.20 16:55, David Hildenbrand wrote:
> This series adds support for the "Vector enhancements facility" and bumps
> the qemu CPU model to a stripped-down z14.
> 
> I have yet to find a way to get more test coverage - looks like some of
> the functions aren't used anywhere yet (e.g., VECTOR FP MAXIMUM), writing
> unit tests to cover all functions and cases is just nasty. But I might be
> wrong - I'm planning to at least test basic functionality of all newly added
> instructions.
> 
> I have to make excessive use of C macros again to cover different element
> sizes (32/64/128 bit). Any advice on cleaning things up is welcome.
> 
> This is based on:
>      "[PATCH v2 0/9] s390x/tcg: Implement some z14 facilities"
>      "[PATCH v2 00/10] softfloat: Implement float128_muladd"
> 
> Based-on: 20200928122717.30586-1-david@redhat.com
> Based-on: 20200925152047.709901-1-richard.henderson@linaro.org

Hi Richard,

I'll have a new series ready soonish. I did what you suggested and 
started generating random (valid) Vector FP instructions with (valid) 
random parameters, executed on randomly generated vectors. It's looking 
pretty good by now.

I'll still have to see to which degree I can address feedback on 
"softfloat: Implement float128_(min|minnum|minnummag|max|maxnum|maxnummag)"

Anyhow, I'd need your "softfloat: Implement float128_muladd" series -- 
any idea when you might get to continue working on that? Thanks!

-- 
Thanks,

David / dhildenb




* Re: [PATCH v1 01/20] softfloat: Implement float128_(min|minnum|minnummag|max|maxnum|maxnummag)
  2020-10-01 13:15       ` Alex Bennée
@ 2021-05-05 14:54         ` David Hildenbrand
  2021-05-10  9:57           ` Alex Bennée
  0 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2021-05-05 14:54 UTC
  To: Alex Bennée
  Cc: Peter Maydell, Thomas Huth, Cornelia Huck, Richard Henderson,
	qemu-devel, qemu-s390x, Aurelien Jarno

On 01.10.20 15:15, Alex Bennée wrote:
> 
> David Hildenbrand <david@redhat.com> writes:
> 
>> On 30.09.20 18:10, Alex Bennée wrote:
>>>
>>> David Hildenbrand <david@redhat.com> writes:
>>>
>>>> Implementation inspired by minmax_floats(). Unfortunately, we don't have
>>>> any tests we can simply adjust/unlock.
>>>>
>>>> Cc: Aurelien Jarno <aurelien@aurel32.net>
>>>> Cc: Peter Maydell <peter.maydell@linaro.org>
>>>> Cc: "Alex Bennée" <alex.bennee@linaro.org>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>   fpu/softfloat.c         | 100 ++++++++++++++++++++++++++++++++++++++++
>>>>   include/fpu/softfloat.h |   6 +++
>>>>   2 files changed, 106 insertions(+)
>>>>
>>>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>>>> index 9af75b9146..9463c5ea56 100644
>>>> --- a/fpu/softfloat.c
>>>> +++ b/fpu/softfloat.c
>>>> @@ -621,6 +621,8 @@ static inline FloatParts float64_unpack_raw(float64 f)
>>>>       return unpack_raw(float64_params, f);
>>>>   }
>>>>   
>>>> +static void float128_unpack(FloatParts128 *p, float128 a, float_status *status);
>>>> +
>>>>   /* Pack a float from parts, but do not canonicalize.  */
>>>>   static inline uint64_t pack_raw(FloatFmt fmt, FloatParts p)
>>>>   {
>>>> @@ -3180,6 +3182,89 @@ static FloatParts minmax_floats(FloatParts a, FloatParts b, bool ismin,
>>>>       }
>>>>   }
>>>
>>> It would be desirable to share as much logic for this as possible with
>>> the existing minmax_floats code. I appreciate at some point we end up
>>> having to deal with fractions and we haven't found a good way to
>>> efficiently handle dealing with FloatParts and FloatParts128 in the same
>>> unrolled code, however:
>>>
>>>>   
>>>> +static float128 float128_minmax(float128 a, float128 b, bool ismin, bool ieee,
>>>> +                                bool ismag, float_status *s)
>>>> +{
>>>> +    FloatParts128 pa, pb;
>>>> +    int a_exp, b_exp;
>>>> +    bool a_less;
>>>> +
>>>> +    float128_unpack(&pa, a, s);
>>>> +    float128_unpack(&pb, b, s);
>>>> +
>>>
>>>  From here:
>>>> +    if (unlikely(is_nan(pa.cls) || is_nan(pb.cls))) {
>>>> +        /* See comment in minmax_floats() */
>>>> +        if (ieee && !is_snan(pa.cls) && !is_snan(pb.cls)) {
>>>> +            if (is_nan(pa.cls) && !is_nan(pb.cls)) {
>>>> +                return b;
>>>> +            } else if (is_nan(pb.cls) && !is_nan(pa.cls)) {
>>>> +                return a;
>>>> +            }
>>>> +        }
>>>> +
>>>> +        /* Similar logic to pick_nan(), avoiding re-packing. */
>>>> +        if (is_snan(pa.cls) || is_snan(pb.cls)) {
>>>> +            s->float_exception_flags |= float_flag_invalid;
>>>> +        }
>>>> +        if (s->default_nan_mode) {
>>>> +            return float128_default_nan(s);
>>>> +        }
>>>
>>> to here is common logic - is there any way we could share it?
>>
>> I can try to factor it out, similar to pickNaN() - passing weird boolean
>> flags and such. It most certainly won't win in a beauty contest, that's
>> for sure.
>>>
>>> Likewise I wonder if there is scope for a float_minmax_exp helper that
>>> could be shared here?
>>
>> I'll try, but I have the feeling that it might make the code harder to
>> read than actually help. Will give it a try.
> 
> Give it a try - if it really does become harder to follow then we'll
> stick with the duplication. However, if we can have common code, you'll
> know that at least the NaN handling and minmax behaviour for float128 will
> be partially tested by the 16/32/64 float code.

(finally had time to look into this)

What about something like this:



 From dd6cf176c840fc34a095cb2a158032a994aca5ef Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Tue, 29 Sep 2020 14:36:26 +0200
Subject: [PATCH] softfloat: Implement
  float128_(min|minnum|minnummag|max|maxnum|maxnummag)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Implementation inspired by minmax_floats(). Unfortunately, we don't have
any tests we can simply adjust/unlock.

Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: "Alex Bennée" <alex.bennee@linaro.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
  fpu/softfloat.c         | 141 ++++++++++++++++++++++++++++++++--------
  include/fpu/softfloat.h |   6 ++
  2 files changed, 120 insertions(+), 27 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index db7d3a39db..f1014f6d47 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -525,6 +525,18 @@ typedef struct {
      bool sign;
  } FloatParts128;
  
+static inline FloatParts128 floatparts64_to_128(FloatParts a)
+{
+    FloatParts128 res = {
+        .frac0 = a.frac,
+        .exp = a.exp,
+        .cls = a.cls,
+        .sign = a.sign,
+    };
+
+    return res;
+}
+
  #define DECOMPOSED_BINARY_POINT    (64 - 2)
  #define DECOMPOSED_IMPLICIT_BIT    (1ull << DECOMPOSED_BINARY_POINT)
  #define DECOMPOSED_OVERFLOW_BIT    (DECOMPOSED_IMPLICIT_BIT << 1)
@@ -621,6 +633,8 @@ static inline FloatParts float64_unpack_raw(float64 f)
      return unpack_raw(float64_params, f);
  }
  
+static void float128_unpack(FloatParts128 *p, float128 a, float_status *status);
+
  /* Pack a float from parts, but do not canonicalize.  */
  static inline uint64_t pack_raw(FloatFmt fmt, FloatParts p)
  {
@@ -3093,6 +3107,14 @@ bfloat16 uint16_to_bfloat16(uint16_t a, float_status *status)
      return uint64_to_bfloat16_scalbn(a, 0, status);
  }
  
+typedef enum MinMaxRes {
+    MINMAX_RES_A,
+    MINMAX_RES_B,
+    MINMAX_RES_SILENCE_A,
+    MINMAX_RES_SILENCE_B,
+    MINMAX_RES_DEFAULT_NAN,
+} MinMaxRes;
+
  /* Float Min/Max */
  /* min() and max() functions. These can't be implemented as
   * 'compare and pick one input' because that would mishandle
@@ -3109,27 +3131,36 @@ bfloat16 uint16_to_bfloat16(uint16_t a, float_status *status)
   * minnummag() and maxnummag() functions correspond to minNumMag()
   * and minNumMag() from the IEEE-754 2008.
   */
-static FloatParts minmax_floats(FloatParts a, FloatParts b, bool ismin,
-                                bool ieee, bool ismag, float_status *s)
+static MinMaxRes minmax_floats128(FloatParts128 a, FloatParts128 b, bool ismin,
+                                  bool ieee, bool ismag, float_status *s)
  {
      if (unlikely(is_nan(a.cls) || is_nan(b.cls))) {
-        if (ieee) {
-            /* Takes two floating-point values `a' and `b', one of
-             * which is a NaN, and returns the appropriate NaN
-             * result. If either `a' or `b' is a signaling NaN,
-             * the invalid exception is raised.
-             */
-            if (is_snan(a.cls) || is_snan(b.cls)) {
-                return pick_nan(a, b, s);
-            } else if (is_nan(a.cls) && !is_nan(b.cls)) {
-                return b;
+        if (ieee && !is_snan(a.cls) && !is_snan(b.cls)) {
+            if (is_nan(a.cls) && !is_nan(b.cls)) {
+                return MINMAX_RES_B;
              } else if (is_nan(b.cls) && !is_nan(a.cls)) {
-                return a;
+                return MINMAX_RES_A;
              }
          }
-        return pick_nan(a, b, s);
+
+        /* Similar logic to pick_nan(), avoiding re-packing. */
+        if (is_snan(a.cls) || is_snan(b.cls)) {
+            s->float_exception_flags |= float_flag_invalid;
+        }
+        if (s->default_nan_mode) {
+            return MINMAX_RES_DEFAULT_NAN;
+        }
+        if (pickNaN(a.cls, b.cls,
+                    a.frac0 > b.frac0 ||
+                    (a.frac0 == b.frac0 && a.frac1 > b.frac1) ||
+                    (a.frac0 == b.frac0 && a.frac1 == b.frac1 &&
+                     a.sign < b.sign), s)) {
+            return is_snan(b.cls) ? MINMAX_RES_SILENCE_B : MINMAX_RES_B;
+        }
+        return is_snan(a.cls) ? MINMAX_RES_SILENCE_A : MINMAX_RES_A;
      } else {
          int a_exp, b_exp;
+        bool a_less;
  
          switch (a.cls) {
          case float_class_normal:
@@ -3160,23 +3191,44 @@ static FloatParts minmax_floats(FloatParts a, FloatParts b, bool ismin,
              break;
          }
  
-        if (ismag && (a_exp != b_exp || a.frac != b.frac)) {
-            bool a_less = a_exp < b_exp;
-            if (a_exp == b_exp) {
-                a_less = a.frac < b.frac;
+        a_less = a_exp < b_exp;
+        if (a_exp == b_exp) {
+            a_less = a.frac0 < b.frac0;
+            if (a.frac0 == b.frac0) {
+                a_less = a.frac1 < b.frac1;
              }
-            return a_less ^ ismin ? b : a;
          }
  
-        if (a.sign == b.sign) {
-            bool a_less = a_exp < b_exp;
-            if (a_exp == b_exp) {
-                a_less = a.frac < b.frac;
-            }
-            return a.sign ^ a_less ^ ismin ? b : a;
-        } else {
-            return a.sign ^ ismin ? b : a;
+        if (ismag &&
+            (a_exp != b_exp || a.frac0 != b.frac0 || a.frac1 != b.frac1)) {
+            return a_less ^ ismin ? MINMAX_RES_B : MINMAX_RES_A;
+        } else if (a.sign == b.sign) {
+            return a.sign ^ a_less ^ ismin ? MINMAX_RES_B : MINMAX_RES_A;
          }
+        return a.sign ^ ismin ? MINMAX_RES_B : MINMAX_RES_A;
+    }
+}
+
+static FloatParts minmax_floats(FloatParts a, FloatParts b, bool ismin,
+                                bool ieee, bool ismag, float_status *s)
+{
+    FloatParts128 ta = floatparts64_to_128(a);
+    FloatParts128 tb = floatparts64_to_128(b);
+    MinMaxRes res = minmax_floats128(ta, tb, ismin, ieee, ismag, s);
+
+    switch (res) {
+    case MINMAX_RES_A:
+        return a;
+    case MINMAX_RES_B:
+        return b;
+    case MINMAX_RES_SILENCE_A:
+        return parts_silence_nan(a, s);
+    case MINMAX_RES_SILENCE_B:
+        return parts_silence_nan(b, s);
+    case MINMAX_RES_DEFAULT_NAN:
+        return parts_default_nan(s);
+    default:
+        g_assert_not_reached();
      }
  }
  
@@ -3233,6 +3285,41 @@ BF16_MINMAX(maxnummag, false, true, true)
  
  #undef BF16_MINMAX
  
+#define F128_MINMAX(name, ismin, isiee, ismag)                          \
+float128 float128_ ## name(float128 a, float128 b, float_status *s)     \
+{                                                                       \
+    FloatParts128 pa, pb;                                               \
+    MinMaxRes res;                                                      \
+                                                                        \
+    float128_unpack(&pa, a, s);                                         \
+    float128_unpack(&pb, b, s);                                         \
+    res = minmax_floats128(pa, pb, ismin, isiee, ismag, s);             \
+                                                                        \
+    switch (res) {                                                      \
+    case MINMAX_RES_A:                                                  \
+        return a;                                                       \
+    case MINMAX_RES_B:                                                  \
+        return b;                                                       \
+    case MINMAX_RES_SILENCE_A:                                          \
+        return float128_silence_nan(a, s);                              \
+    case MINMAX_RES_SILENCE_B:                                          \
+        return float128_silence_nan(b, s);                              \
+    case MINMAX_RES_DEFAULT_NAN:                                        \
+        return float128_default_nan(s);                                 \
+    default:                                                            \
+        g_assert_not_reached();                                         \
+    }                                                                   \
+}
+
+F128_MINMAX(min, true, false, false)
+F128_MINMAX(minnum, true, true, false)
+F128_MINMAX(minnummag, true, true, true)
+F128_MINMAX(max, false, false, false)
+F128_MINMAX(maxnum, false, true, false)
+F128_MINMAX(maxnummag, false, true, true)
+
+#undef F128_MINMAX
+
  /* Floating point compare */
  static FloatRelation compare_floats(FloatParts a, FloatParts b, bool is_quiet,
                                      float_status *s)
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index a38433deb4..4fab2ef6f4 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -1201,6 +1201,12 @@ float128 float128_muladd(float128, float128, float128, int,
  float128 float128_sqrt(float128, float_status *status);
  FloatRelation float128_compare(float128, float128, float_status *status);
  FloatRelation float128_compare_quiet(float128, float128, float_status *status);
+float128 float128_min(float128, float128, float_status *status);
+float128 float128_max(float128, float128, float_status *status);
+float128 float128_minnum(float128, float128, float_status *status);
+float128 float128_maxnum(float128, float128, float_status *status);
+float128 float128_minnummag(float128, float128, float_status *status);
+float128 float128_maxnummag(float128, float128, float_status *status);
  bool float128_is_quiet_nan(float128, float_status *status);
  bool float128_is_signaling_nan(float128, float_status *status);
  float128 float128_silence_nan(float128, float_status *status);
-- 
2.30.2
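
To illustrate the idea behind the refactoring above (decide once on a format-independent view, then let each format materialize the result), here is a self-contained sketch on plain `double`. It is not the QEMU code: `Parts2`, `minmax_decide()` and `minmax_double()` are hypothetical names, the NaN path always returns a default NaN instead of applying the target's pickNaN() rules, and comparing the raw biased exponent and fraction words stands in for the canonicalized FloatParts classes. Widening a 64-bit format with `frac1 = 0`, as floatparts64_to_128() does, keeps the lexicographic comparison correct.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit binary64");

/* Hypothetical, simplified mirror of the patch's MinMaxRes idea. */
typedef enum { RES_A, RES_B, RES_DEFAULT_NAN } MinMaxRes;

/* Format-independent view; a 64-bit format widens with frac1 = 0. */
typedef struct {
    bool sign, is_nan, is_snan;
    int exp;        /* raw biased exponent */
    uint64_t frac0; /* high fraction word */
    uint64_t frac1; /* low fraction word, always 0 for double */
} Parts2;

static Parts2 parts_from_double(double d)
{
    uint64_t bits;
    Parts2 p;

    memcpy(&bits, &d, sizeof(bits));
    p.sign = bits >> 63;
    p.exp = (int)((bits >> 52) & 0x7ff);
    p.frac0 = bits & 0x000fffffffffffffULL;
    p.frac1 = 0;
    p.is_nan = p.exp == 0x7ff && p.frac0 != 0;
    p.is_snan = p.is_nan && !(p.frac0 & (1ULL << 51)); /* quiet bit clear */
    return p;
}

/* Decide once; callers materialize the result in their own format. */
static MinMaxRes minmax_decide(Parts2 a, Parts2 b, bool ismin, bool ieee,
                               bool ismag, bool *invalid)
{
    bool a_less, same_mag;

    if (a.is_nan || b.is_nan) {
        if (ieee && !a.is_snan && !b.is_snan) {
            if (!b.is_nan) {
                return RES_B; /* minNum/maxNum: prefer the number */
            }
            if (!a.is_nan) {
                return RES_A;
            }
        }
        if (a.is_snan || b.is_snan) {
            *invalid = true;
        }
        return RES_DEFAULT_NAN; /* simplification: no pickNaN() rules */
    }

    /* Raw (exp, frac0, frac1) order matches magnitude order, incl. 0 and inf. */
    a_less = a.exp < b.exp;
    if (a.exp == b.exp) {
        a_less = a.frac0 < b.frac0;
        if (a.frac0 == b.frac0) {
            a_less = a.frac1 < b.frac1;
        }
    }
    same_mag = a.exp == b.exp && a.frac0 == b.frac0 && a.frac1 == b.frac1;

    if (ismag && !same_mag) {
        return a_less ^ ismin ? RES_B : RES_A;
    } else if (a.sign == b.sign) {
        return a.sign ^ a_less ^ ismin ? RES_B : RES_A;
    }
    return a.sign ^ ismin ? RES_B : RES_A;
}

/* Per-format wrapper: only this part knows how to build a double result. */
static double minmax_double(double a, double b, bool ismin, bool ieee,
                            bool ismag, bool *invalid)
{
    switch (minmax_decide(parts_from_double(a), parts_from_double(b),
                          ismin, ieee, ismag, invalid)) {
    case RES_A:
        return a;
    case RES_B:
        return b;
    default:
        return NAN; /* stand-in for the target's default NaN */
    }
}
```

Keeping NaN selection and ordering in a single decision function means the 16/32/64-bit paths exercise the same logic as the 128-bit path, which was the point raised in review.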



-- 
Thanks,

David / dhildenb




* Re: [PATCH v1 01/20] softfloat: Implement float128_(min|minnum|minnummag|max|maxnum|maxnummag)
  2021-05-05 14:54         ` David Hildenbrand
@ 2021-05-10  9:57           ` Alex Bennée
  2021-05-10 10:00             ` David Hildenbrand
  0 siblings, 1 reply; 47+ messages in thread
From: Alex Bennée @ 2021-05-10  9:57 UTC
  To: David Hildenbrand
  Cc: Peter Maydell, Thomas Huth, Cornelia Huck, Richard Henderson,
	qemu-devel, qemu-s390x, Aurelien Jarno


David Hildenbrand <david@redhat.com> writes:

> (finally had time to look into this)
>
> What about something like this:
>

I was just about to look this morning but I see Richard has posted his
mega series:

  From: Richard Henderson <richard.henderson@linaro.org>
  Subject: [PATCH 00/72] Convert floatx80 and float128 to FloatParts
  Date: Fri,  7 May 2021 18:46:50 -0700
  Message-Id: <20210508014802.892561-1-richard.henderson@linaro.org>

which I think also includes the
float128_(min|minnum|minnummag|max|maxnum|maxnummag) functions. I'm
going to have a look at that first if that's ok.



-- 
Alex Bennée



* Re: [PATCH v1 01/20] softfloat: Implement float128_(min|minnum|minnummag|max|maxnum|maxnummag)
  2021-05-10  9:57           ` Alex Bennée
@ 2021-05-10 10:00             ` David Hildenbrand
  0 siblings, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2021-05-10 10:00 UTC
  To: Alex Bennée
  Cc: Peter Maydell, Thomas Huth, Cornelia Huck, Richard Henderson,
	qemu-devel, qemu-s390x, Aurelien Jarno

On 10.05.21 11:57, Alex Bennée wrote:
> 
> David Hildenbrand <david@redhat.com> writes:
> 
>> On 01.10.20 15:15, Alex Bennée wrote:
>>> David Hildenbrand <david@redhat.com> writes:
>>>
>>>> On 30.09.20 18:10, Alex Bennée wrote:
>>>>>
>>>>> David Hildenbrand <david@redhat.com> writes:
>>>>>
>>>>>> Implementation inspired by minmax_floats(). Unfortuantely, we don't have
>>>>>> any tests we can simply adjust/unlock.
>>>>>>
>>>>>> Cc: Aurelien Jarno <aurelien@aurel32.net>
>>>>>> Cc: Peter Maydell <peter.maydell@linaro.org>
>>>>>> Cc: "Alex Bennée" <alex.bennee@linaro.org>
>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>>> ---
>>>>>>    fpu/softfloat.c         | 100 ++++++++++++++++++++++++++++++++++++++++
>>>>>>    include/fpu/softfloat.h |   6 +++
>>>>>>    2 files changed, 106 insertions(+)
>>>>>>
>>>>>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>>>>>> index 9af75b9146..9463c5ea56 100644
>>>>>> --- a/fpu/softfloat.c
>>>>>> +++ b/fpu/softfloat.c
>>>>>> @@ -621,6 +621,8 @@ static inline FloatParts float64_unpack_raw(float64 f)
>>>>>>        return unpack_raw(float64_params, f);
>>>>>>    }
>>>>>>    +static void float128_unpack(FloatParts128 *p, float128 a,
>>>>>> float_status *status);
>>>>>> +
>>>>>>    /* Pack a float from parts, but do not canonicalize.  */
>>>>>>    static inline uint64_t pack_raw(FloatFmt fmt, FloatParts p)
>>>>>>    {
>>>>>> @@ -3180,6 +3182,89 @@ static FloatParts minmax_floats(FloatParts a, FloatParts b, bool ismin,
>>>>>>        }
>>>>>>    }
>>>>>
>>>>> It would be desirable to share as much logic for this as possible with
>>>>> the existing minmax_floats code. I appreciate at some point we end up
>>>>> having to deal with fractions and we haven't found a good way to
>>>>> efficiently handle dealing with FloatParts and FloatParts128 in the same
>>>>> unrolled code, however:
>>>>>
>>>>>>    +static float128 float128_minmax(float128 a, float128 b, bool
>>>>>> ismin, bool ieee,
>>>>>> +                                bool ismag, float_status *s)
>>>>>> +{
>>>>>> +    FloatParts128 pa, pb;
>>>>>> +    int a_exp, b_exp;
>>>>>> +    bool a_less;
>>>>>> +
>>>>>> +    float128_unpack(&pa, a, s);
>>>>>> +    float128_unpack(&pb, b, s);
>>>>>> +
>>>>>
>>>>>   From here:
>>>>>> +    if (unlikely(is_nan(pa.cls) || is_nan(pb.cls))) {
>>>>>> +        /* See comment in minmax_floats() */
>>>>>> +        if (ieee && !is_snan(pa.cls) && !is_snan(pb.cls)) {
>>>>>> +            if (is_nan(pa.cls) && !is_nan(pb.cls)) {
>>>>>> +                return b;
>>>>>> +            } else if (is_nan(pb.cls) && !is_nan(pa.cls)) {
>>>>>> +                return a;
>>>>>> +            }
>>>>>> +        }
>>>>>> +
>>>>>> +        /* Similar logic to pick_nan(), avoiding re-packing. */
>>>>>> +        if (is_snan(pa.cls) || is_snan(pb.cls)) {
>>>>>> +            s->float_exception_flags |= float_flag_invalid;
>>>>>> +        }
>>>>>> +        if (s->default_nan_mode) {
>>>>>> +            return float128_default_nan(s);
>>>>>> +        }
>>>>>
>>>>> to here is common logic - is there anyway we could share it?
>>>>
>>>> I can try to factor it out, similar to pickNaN() - passing weird boolean
>>>> flags and such. It most certainly won't win in a beauty contest, that's
>>>> for sure.
>>>>>
>>>>> Likewise I wonder if there is scope for a float_minmax_exp helper that
>>>>> could be shared here?
>>>>
>>>> I'll try, but I have the feeling that it might make the code harder to
>>>> read than actually help. Will give it a try.
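One way a shared float_minmax_exp helper might split the work, again as a hypothetical standalone sketch: `float_minmax_exp()`, `frac128_cmp()` and `minmax_pick_b()` are illustrative names, and the sign/magnitude logic is a reconstruction of the usual min/max ordering rules rather than a copy of minmax_floats(). Only the fraction comparison depends on the width; the decision itself works on plain ints and bools.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operand classes, mirroring softfloat's FloatClass. */
typedef enum {
    cls_zero, cls_normal, cls_inf, cls_qnan, cls_snan
} FloatClass;

/*
 * Hypothetical float_minmax_exp(): map a non-NaN operand to an exponent
 * that orders zero < any normal number < infinity.
 */
static int float_minmax_exp(FloatClass cls, int exp)
{
    switch (cls) {
    case cls_normal:
        return exp;
    case cls_inf:
        return INT_MAX;
    default:
        /* NaNs must already have been handled by the caller. */
        assert(cls == cls_zero);
        return INT_MIN;
    }
}

/* Size-specific part: compare 128-bit fractions stored as hi/lo halves. */
static int frac128_cmp(uint64_t a_hi, uint64_t a_lo,
                       uint64_t b_hi, uint64_t b_lo)
{
    if (a_hi != b_hi) {
        return a_hi < b_hi ? -1 : 1;
    }
    if (a_lo != b_lo) {
        return a_lo < b_lo ? -1 : 1;
    }
    return 0;
}

/*
 * Width-independent decision for two non-NaN operands: returns true if the
 * result is b. frac_cmp is negative, zero or positive as a's fraction is
 * below, equal to or above b's. -0 is treated as smaller than +0.
 */
static bool minmax_pick_b(bool a_sign, int a_exp, bool b_sign, int b_exp,
                          int frac_cmp, bool ismin, bool ismag)
{
    bool mag_less = a_exp != b_exp ? a_exp < b_exp : frac_cmp < 0;
    bool mag_equal = a_exp == b_exp && frac_cmp == 0;

    /* *nummag variants: the operand with the smaller/larger magnitude... */
    if (ismag && !mag_equal) {
        return mag_less ^ ismin;
    }
    /* ...otherwise (and for equal magnitudes) compare signed values. */
    if (a_sign == b_sign) {
        return a_sign ^ mag_less ^ ismin;
    }
    return a_sign ^ ismin;
}
```

For float64 the caller would compute `frac_cmp` from a single `uint64_t`; for float128 it would use `frac128_cmp()` on the two halves, and both paths would share `minmax_pick_b()`.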
>>> Give it a try - if it really does become harder to follow, then we'll
>>> stick with the duplication. However, if we can have common code, you'll
>>> at least know that the NaN handling and minmax behaviour for float128 will
>>> be partially tested by the 16/32/64-bit float code.
>>
>> (finally had time to look into this)
>>
>> What about something like this:
>>
> 
> I was just about to look at this this morning, but I see Richard has posted his
> mega series:
> 
>    From: Richard Henderson <richard.henderson@linaro.org>
>    Subject: [PATCH 00/72] Convert floatx80 and float128 to FloatParts
>    Date: Fri,  7 May 2021 18:46:50 -0700
>    Message-Id: <20210508014802.892561-1-richard.henderson@linaro.org>
> 
> which I think also includes the
> float128_(min|minnum|minnummag|max|maxnum|maxnummag) functions. I'm
> going to have a look at that first if that's ok.

Sure, I have the gut feeling that it follows a similar approach :)


-- 
Thanks,

David / dhildenb




end of thread, other threads:[~2021-05-10 10:06 UTC | newest]

Thread overview: 47+ messages
2020-09-30 14:55 [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 David Hildenbrand
2020-09-30 14:55 ` [PATCH v1 01/20] softfloat: Implement float128_(min|minnum|minnummag|max|maxnum|maxnummag) David Hildenbrand
2020-09-30 16:10   ` Alex Bennée
2020-10-01 12:40     ` David Hildenbrand
2020-10-01 13:15       ` Alex Bennée
2021-05-05 14:54         ` David Hildenbrand
2021-05-10  9:57           ` Alex Bennée
2021-05-10 10:00             ` David Hildenbrand
2020-09-30 14:55 ` [PATCH v1 02/20] s390x/tcg: Implement VECTOR BIT PERMUTE David Hildenbrand
2020-10-01 15:17   ` Richard Henderson
2020-10-01 17:28     ` David Hildenbrand
2020-09-30 14:55 ` [PATCH v1 03/20] s390x/tcg: Implement VECTOR MULTIPLY SUM LOGICAL David Hildenbrand
2020-10-01 15:26   ` Richard Henderson
2020-10-01 17:30     ` David Hildenbrand
2020-09-30 14:55 ` [PATCH v1 04/20] s390x/tcg: Implement 32/128 bit for VECTOR FP ADD David Hildenbrand
2020-10-01 15:45   ` Richard Henderson
2020-10-01 16:08   ` Richard Henderson
2020-10-01 17:08     ` David Hildenbrand
2020-09-30 14:55 ` [PATCH v1 05/20] s390x/tcg: Implement 32/128 bit for VECTOR FP DIVIDE David Hildenbrand
2020-09-30 14:55 ` [PATCH v1 06/20] s390x/tcg: Implement 32/128 bit for VECTOR FP MULTIPLY David Hildenbrand
2020-09-30 14:55 ` [PATCH v1 07/20] s390x/tcg: Implement 32/128 bit for VECTOR FP SUBTRACT David Hildenbrand
2020-09-30 14:55 ` [PATCH v1 08/20] s390x/tcg: Implement 32/128 bit for VECTOR FP COMPARE (AND SIGNAL) SCALAR David Hildenbrand
2020-10-01 15:52   ` Richard Henderson
2020-09-30 14:55 ` [PATCH v1 09/20] s390x/tcg: Implement 32/128 bit for VECTOR FP COMPARE * David Hildenbrand
2020-10-01 16:12   ` Richard Henderson
2020-09-30 14:55 ` [PATCH v1 10/20] s390x/tcg: Implement 32/128 bit for VECTOR LOAD FP INTEGER David Hildenbrand
2020-09-30 14:55 ` [PATCH v1 11/20] s390x/tcg: Implement 64 bit for VECTOR FP LOAD LENGTHENED David Hildenbrand
2020-10-01 16:19   ` Richard Henderson
2020-09-30 14:55 ` [PATCH v1 12/20] s390x/tcg: Implement 128 bit for VECTOR FP LOAD ROUNDED David Hildenbrand
2020-10-01 16:21   ` Richard Henderson
2020-09-30 14:55 ` [PATCH v1 13/20] s390x/tcg: Implement 32/128 bit for VECTOR FP PERFORM SIGN OPERATION David Hildenbrand
2020-10-01 16:24   ` Richard Henderson
2020-09-30 14:55 ` [PATCH v1 14/20] s390x/tcg: Implement 32/128 bit for VECTOR FP SQUARE ROOT David Hildenbrand
2020-09-30 14:55 ` [PATCH v1 15/20] s390x/tcg: Implement 32/128 bit for VECTOR FP TEST DATA CLASS IMMEDIATE David Hildenbrand
2020-10-01 16:30   ` Richard Henderson
2020-09-30 14:55 ` [PATCH v1 16/20] s390x/tcg: Implement 32/128bit for VECTOR FP MULTIPLY AND (ADD|SUBTRACT) David Hildenbrand
2020-09-30 14:55 ` [PATCH v1 17/20] s390x/tcg: Implement VECTOR FP NEGATIVE " David Hildenbrand
2020-09-30 14:55 ` [PATCH v1 18/20] s390x/tcg: Implement VECTOR FP (MAXIMUM|MINIMUM) David Hildenbrand
2020-10-01 16:49   ` Richard Henderson
2020-09-30 14:55 ` [PATCH v1 19/20] s390x/tcg: We support Vector enhancements facility David Hildenbrand
2020-10-01 16:50   ` Richard Henderson
2020-09-30 14:55 ` [PATCH v1 20/20] s390x/cpumodel: Bump up QEMU model to a stripped-down IBM z14 GA2 David Hildenbrand
2020-10-01 16:52   ` Richard Henderson
2020-09-30 15:35 ` [PATCH v1 00/20] s390x/tcg: Implement Vector enhancements facility and switch to z14 no-reply
2020-10-01 15:07 ` Richard Henderson
2020-10-07 13:09   ` David Hildenbrand
2021-05-05 10:55 ` David Hildenbrand
