* [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2
@ 2022-04-28  9:46 David Hildenbrand
  2022-04-28  9:46 ` [PATCH v6 01/13] target/s390x: Fix writeback to v1 in helper_vstl David Hildenbrand
                   ` (13 more replies)
  0 siblings, 14 replies; 26+ messages in thread
From: David Hildenbrand @ 2022-04-28  9:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, David Hildenbrand, Cornelia Huck, Richard Henderson,
	Eric Farman, David Miller, Halil Pasic, qemu-s390x,
	Christian Borntraeger

Implement Vector-Enhancements Facility 2 for s390x

resolves: https://gitlab.com/qemu-project/qemu/-/issues/738

implements:
    VECTOR LOAD ELEMENTS REVERSED               (VLER)
    VECTOR LOAD BYTE REVERSED ELEMENTS          (VLBR)
    VECTOR LOAD BYTE REVERSED ELEMENT           (VLEBRH, VLEBRF, VLEBRG)
    VECTOR LOAD BYTE REVERSED ELEMENT AND ZERO  (VLLEBRZ)
    VECTOR LOAD BYTE REVERSED ELEMENT AND REPLICATE (VLBRREP)
    VECTOR STORE ELEMENTS REVERSED              (VSTER)
    VECTOR STORE BYTE REVERSED ELEMENTS         (VSTBR)
    VECTOR STORE BYTE REVERSED ELEMENT          (VSTEBRH, VSTEBRF, VSTEBRG)
    VECTOR SHIFT LEFT DOUBLE BY BIT             (VSLD)
    VECTOR SHIFT RIGHT DOUBLE BY BIT            (VSRD)
    VECTOR STRING SEARCH                        (VSTRS)

modifies:
    VECTOR FP CONVERT FROM FIXED                (VCFPS)
    VECTOR FP CONVERT FROM LOGICAL              (VCFPL)
    VECTOR FP CONVERT TO FIXED                  (VCSFP)
    VECTOR FP CONVERT TO LOGICAL                (VCLFP)
    VECTOR SHIFT LEFT                           (VSL)
    VECTOR SHIFT RIGHT ARITHMETIC               (VSRA)
    VECTOR SHIFT RIGHT LOGICAL                  (VSRL)
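
To make the difference between the "elements reversed" and "byte reversed
elements" loads listed above concrete, here is a rough sketch on a plain
16-byte buffer. It follows the usual reading of the instruction names and is
not taken from the patches; the ex_ helpers are hypothetical, and dst must
not overlap src:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reverse the order of the elements; bytes inside each element stay put. */
static void ex_elements_reversed(uint8_t dst[16], const uint8_t src[16],
                                 unsigned esize)
{
    const unsigned n = 16 / esize;
    unsigned i;

    for (i = 0; i < n; i++) {
        memcpy(dst + (n - 1 - i) * esize, src + i * esize, esize);
    }
}

/* Keep the order of the elements; reverse the bytes inside each element. */
static void ex_byte_reversed_elements(uint8_t dst[16], const uint8_t src[16],
                                      unsigned esize)
{
    unsigned i, j;

    for (i = 0; i < 16; i += esize) {
        for (j = 0; j < esize; j++) {
            dst[i + j] = src[i + esize - 1 - j];
        }
    }
}

/* Self-check with bytes 0..15 and 4-byte elements. */
static void ex_reverse_selfcheck(void)
{
    uint8_t src[16], a[16], b[16];
    unsigned i;

    for (i = 0; i < 16; i++) {
        src[i] = (uint8_t)i;
    }
    ex_elements_reversed(a, src, 4);
    ex_byte_reversed_elements(b, src, 4);
    assert(a[0] == 12 && a[3] == 15 && a[4] == 8 && a[15] == 3);
    assert(b[0] == 3 && b[3] == 0 && b[4] == 7 && b[15] == 12);
}
```

With bytes 0..15 and 4-byte elements, the first variant starts with
12 13 14 15 and the second with 3 2 1 0.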


v5 -> v6:
* Move the helper_vstl fix to patch #1
* Include max CPU model cleanups
* "target/s390x: add S390_FEAT_VECTOR_ENH2 to qemu CPU model"
 -> Take care of compat machines
* "tests/tcg/s390x: Tests for Vector Enhancements Facility 2"
 -> Add missing newline to end of header file
 -> Resolve simple conflict in Makefile

Cc: Thomas Huth <thuth@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: David Miller <dmiller423@gmail.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Eric Farman <farman@linux.ibm.com>

David Hildenbrand (2):
  s390x/cpu_models: drop "msa5" from the TCG "max" model
  s390x/cpu_models: make "max" match the unmodified "qemu" CPU model
    under TCG

David Miller (9):
  target/s390x: vxeh2: vector convert short/32b
  target/s390x: vxeh2: vector string search
  target/s390x: vxeh2: Update for changes to vector shifts
  target/s390x: vxeh2: vector shift double by bit
  target/s390x: vxeh2: vector {load, store} elements reversed
  target/s390x: vxeh2: vector {load, store} byte reversed elements
  target/s390x: vxeh2: vector {load, store} byte reversed element
  target/s390x: add S390_FEAT_VECTOR_ENH2 to qemu CPU model
  tests/tcg/s390x: Tests for Vector Enhancements Facility 2

Richard Henderson (2):
  target/s390x: Fix writeback to v1 in helper_vstl
  tcg: Implement tcg_gen_{h,w}swap_{i32,i64}

 hw/s390x/s390-virtio-ccw.c           |   3 +
 include/tcg/tcg-op.h                 |   6 +
 target/s390x/cpu_models.c            |  26 +-
 target/s390x/gen-features.c          |  14 +-
 target/s390x/helper.h                |  13 +
 target/s390x/tcg/insn-data.def       |  40 ++-
 target/s390x/tcg/translate.c         |   3 +-
 target/s390x/tcg/translate_vx.c.inc  | 461 ++++++++++++++++++++++++---
 target/s390x/tcg/vec_fpu_helper.c    |  31 ++
 target/s390x/tcg/vec_helper.c        |   2 -
 target/s390x/tcg/vec_int_helper.c    |  55 ++++
 target/s390x/tcg/vec_string_helper.c |  99 ++++++
 tcg/tcg-op.c                         |  30 ++
 tests/tcg/s390x/Makefile.target      |   8 +
 tests/tcg/s390x/vx.h                 |  19 ++
 tests/tcg/s390x/vxeh2_vcvt.c         |  88 +++++
 tests/tcg/s390x/vxeh2_vlstr.c        | 139 ++++++++
 tests/tcg/s390x/vxeh2_vs.c           |  93 ++++++
 18 files changed, 1051 insertions(+), 79 deletions(-)
 create mode 100644 tests/tcg/s390x/vx.h
 create mode 100644 tests/tcg/s390x/vxeh2_vcvt.c
 create mode 100644 tests/tcg/s390x/vxeh2_vlstr.c
 create mode 100644 tests/tcg/s390x/vxeh2_vs.c

-- 
2.35.1




* [PATCH v6 01/13] target/s390x: Fix writeback to v1 in helper_vstl
From: David Hildenbrand @ 2022-04-28  9:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, David Hildenbrand, Cornelia Huck, Richard Henderson,
	Eric Farman, David Miller, Halil Pasic, qemu-s390x,
	Christian Borntraeger

From: Richard Henderson <richard.henderson@linaro.org>

Fixes: 0e0a5b49ad58 ("s390x/tcg: Implement VECTOR STORE WITH LENGTH")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: David Miller <dmiller423@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/tcg/vec_helper.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/target/s390x/tcg/vec_helper.c b/target/s390x/tcg/vec_helper.c
index ededf13cf0..48d86722b2 100644
--- a/target/s390x/tcg/vec_helper.c
+++ b/target/s390x/tcg/vec_helper.c
@@ -200,7 +200,6 @@ void HELPER(vstl)(CPUS390XState *env, const void *v1, uint64_t addr,
         addr = wrap_address(env, addr + 8);
         cpu_stq_data_ra(env, addr, s390_vec_read_element64(v1, 1), GETPC());
     } else {
-        S390Vector tmp = {};
         int i;
 
         for (i = 0; i < bytes; i++) {
@@ -209,6 +208,5 @@ void HELPER(vstl)(CPUS390XState *env, const void *v1, uint64_t addr,
             cpu_stb_data_ra(env, addr, byte, GETPC());
             addr = wrap_address(env, addr + 1);
         }
-        *(S390Vector *)v1 = tmp;
     }
 }
-- 
2.35.1




* [PATCH v6 02/13] s390x/cpu_models: drop "msa5" from the TCG "max" model
From: David Hildenbrand @ 2022-04-28  9:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, David Hildenbrand, Cornelia Huck, Richard Henderson,
	Eric Farman, David Miller, Halil Pasic, qemu-s390x,
	Christian Borntraeger

We don't include the "msa5" feature in the "qemu" model because it
generates a warning. The PoP states:

"The message-security-assist extension 5 requires
the secure-hash-algorithm (SHA-512) capabilities of
the message-security-assist extension 2 as a
prerequisite. (March, 2015)"

As SHA-512 won't be supported in the near future, let's just drop the
feature from the "max" model. This avoids the warning and allows us to
make the "max" model match the "qemu" model (except for compat
machines). We don't lose much, as we only implement the function stubs
for MSA, excluding any real subfunctions.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/897
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/gen-features.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 22846121c4..7b4430f9de 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -743,8 +743,6 @@ static uint16_t qemu_LATEST[] = {
 };
 /* add all new definitions before this point */
 static uint16_t qemu_MAX[] = {
-    /* generates a dependency warning, leave it out for now */
-    S390_FEAT_MSA_EXT_5,
 };
 
 /****** END FEATURE DEFS ******/
-- 
2.35.1




* [PATCH v6 03/13] s390x/cpu_models: make "max" match the unmodified "qemu" CPU model under TCG
From: David Hildenbrand @ 2022-04-28  9:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, David Hildenbrand, Cornelia Huck, Richard Henderson,
	Eric Farman, David Miller, Halil Pasic, qemu-s390x,
	Christian Borntraeger

Before we were able to bump up the qemu CPU model to a z13, we included
some experimental features during development in the "max" model only.
Nowadays, the "max" model corresponds exactly to the "qemu" CPU model
of the latest QEMU machine under TCG.

Let's remove all the special casing, effectively making both models
match completely from now on, and clean up.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/cpu_models.c   | 26 +++++++-------------------
 target/s390x/gen-features.c | 11 ++++++-----
 2 files changed, 13 insertions(+), 24 deletions(-)

diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 6d71428056..1a562d2801 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -89,7 +89,6 @@ static S390CPUDef s390_cpu_defs[] = {
 #define QEMU_MAX_CPU_TYPE 0x8561
 #define QEMU_MAX_CPU_GEN 15
 #define QEMU_MAX_CPU_EC_GA 1
-static const S390FeatInit qemu_max_cpu_feat_init = { S390_FEAT_LIST_QEMU_MAX };
 static S390FeatBitmap qemu_max_cpu_feat;
 
 /* features part of a base model but not relevant for finding a base model */
@@ -728,7 +727,6 @@ static void s390_cpu_model_initfn(Object *obj)
     }
 }
 
-static S390CPUDef s390_qemu_cpu_def;
 static S390CPUModel s390_qemu_cpu_model;
 
 /* Set the qemu CPU model (on machine initialization). Must not be called
@@ -742,17 +740,8 @@ void s390_set_qemu_cpu_model(uint16_t type, uint8_t gen, uint8_t ec_ga,
     g_assert(def);
     g_assert(QTAILQ_EMPTY_RCU(&cpus));
 
-    /* TCG emulates some features that can usually not be enabled with
-     * the emulated machine generation. Make sure they can be enabled
-     * when using the QEMU model by adding them to full_feat. We have
-     * to copy the definition to do that.
-     */
-    memcpy(&s390_qemu_cpu_def, def, sizeof(s390_qemu_cpu_def));
-    bitmap_or(s390_qemu_cpu_def.full_feat, s390_qemu_cpu_def.full_feat,
-              qemu_max_cpu_feat, S390_FEAT_MAX);
-
     /* build the CPU model */
-    s390_qemu_cpu_model.def = &s390_qemu_cpu_def;
+    s390_qemu_cpu_model.def = def;
     bitmap_zero(s390_qemu_cpu_model.features, S390_FEAT_MAX);
     s390_init_feat_bitmap(feat_init, s390_qemu_cpu_model.features);
 }
@@ -885,9 +874,8 @@ static void s390_max_cpu_model_class_init(ObjectClass *oc, void *data)
 
     /*
      * The "max" model is neither static nor migration safe. Under KVM
-     * it represents the "host" model. Under TCG it represents some kind of
-     * "qemu" CPU model without compat handling and maybe with some additional
-     * CPU features that are not yet unlocked in the "qemu" model.
+     * it represents the "host" model. Under TCG it represents the "qemu" CPU
+     * model of the latest QEMU machine.
      */
     xcc->desc =
         "Enables all features supported by the accelerator in the current host";
@@ -966,13 +954,13 @@ static void init_ignored_base_feat(void)
 
 static void register_types(void)
 {
-    static const S390FeatInit qemu_latest_init = { S390_FEAT_LIST_QEMU_LATEST };
+    static const S390FeatInit qemu_max_init = { S390_FEAT_LIST_QEMU_MAX };
     int i;
 
     init_ignored_base_feat();
 
     /* init all bitmaps from gnerated data initially */
-    s390_init_feat_bitmap(qemu_max_cpu_feat_init, qemu_max_cpu_feat);
+    s390_init_feat_bitmap(qemu_max_init, qemu_max_cpu_feat);
     for (i = 0; i < ARRAY_SIZE(s390_cpu_defs); i++) {
         s390_init_feat_bitmap(s390_cpu_defs[i].base_init,
                               s390_cpu_defs[i].base_feat);
@@ -982,9 +970,9 @@ static void register_types(void)
                               s390_cpu_defs[i].full_feat);
     }
 
-    /* initialize the qemu model with latest definition */
+    /* initialize the qemu model with the maximum definition ("max" model) */
     s390_set_qemu_cpu_model(QEMU_MAX_CPU_TYPE, QEMU_MAX_CPU_GEN,
-                            QEMU_MAX_CPU_EC_GA, qemu_latest_init);
+                            QEMU_MAX_CPU_EC_GA, qemu_max_init);
 
     for (i = 0; i < ARRAY_SIZE(s390_cpu_defs); i++) {
         char *base_name = s390_base_cpu_type_name(s390_cpu_defs[i].name);
diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 7b4430f9de..ec7d8ceab5 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -738,11 +738,13 @@ static uint16_t qemu_V6_2[] = {
     S390_FEAT_VECTOR_ENH,
 };
 
-static uint16_t qemu_LATEST[] = {
-    S390_FEAT_MISC_INSTRUCTION_EXT3,
-};
-/* add all new definitions before this point */
+/*
+ * Features for the "qemu" CPU model of the latest QEMU machine and the "max"
+ * CPU model under TCG. Don't include features that are not part of the full
+ * feature set of the current "max" CPU model generation.
+ */
 static uint16_t qemu_MAX[] = {
+    S390_FEAT_MISC_INSTRUCTION_EXT3,
 };
 
 /****** END FEATURE DEFS ******/
@@ -864,7 +866,6 @@ static FeatGroupDefSpec QemuFeatDef[] = {
     QEMU_FEAT_INITIALIZER(V4_1),
     QEMU_FEAT_INITIALIZER(V6_0),
     QEMU_FEAT_INITIALIZER(V6_2),
-    QEMU_FEAT_INITIALIZER(LATEST),
     QEMU_FEAT_INITIALIZER(MAX),
 };
 
-- 
2.35.1




* [PATCH v6 04/13] tcg: Implement tcg_gen_{h,w}swap_{i32,i64}
From: David Hildenbrand @ 2022-04-28  9:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, David Hildenbrand, Cornelia Huck, Richard Henderson,
	Eric Farman, David Miller, Halil Pasic, qemu-s390x,
	Christian Borntraeger

From: Richard Henderson <richard.henderson@linaro.org>

Swap half-words (16-bit) and words (32-bit) within a larger value.
Mirrors functions of the same names within include/qemu/bitops.h.
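
As an illustration only, the same swaps can be written on plain integers in
C. This is a standalone sketch, not the code from bitops.h or from this
patch; the ex_ names are made up for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left; only called with 0 < n < width, so the shifts are defined. */
static uint32_t ex_rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

static uint64_t ex_rol64(uint64_t x, unsigned n)
{
    return (x << n) | (x >> (64 - n));
}

/* Swapping the two 16-bit halves of a 32-bit value is a rotate by 16. */
static uint32_t ex_hswap32(uint32_t h)
{
    return ex_rol32(h, 16);
}

/* Reverse the order of the four 16-bit half-words of a 64-bit value. */
static uint64_t ex_hswap64(uint64_t h)
{
    const uint64_t m = 0x0000ffff0000ffffull;

    h = ex_rol64(h, 32);                      /* swap the 32-bit words */
    return ((h & m) << 16) | ((h >> 16) & m); /* swap halves in each word */
}

/* Swapping the two 32-bit words of a 64-bit value is a rotate by 32. */
static uint64_t ex_wswap64(uint64_t h)
{
    return ex_rol64(h, 32);
}

/* Self-check against values worked out by hand. */
static void ex_swap_selfcheck(void)
{
    assert(ex_hswap32(0x00112233u) == 0x22330011u);
    assert(ex_hswap64(0x0011223344556677ull) == 0x6677445522330011ull);
    assert(ex_wswap64(0x0011223344556677ull) == 0x4455667700112233ull);
}
```

For 0x0011223344556677, the half-word swap gives 0x6677445522330011 and the
word swap gives 0x4455667700112233.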

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: David Miller <dmiller423@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/tcg/tcg-op.h |  6 ++++++
 tcg/tcg-op.c         | 30 ++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index caa0a63612..b09b8b4a05 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -332,6 +332,7 @@ void tcg_gen_ext8u_i32(TCGv_i32 ret, TCGv_i32 arg);
 void tcg_gen_ext16u_i32(TCGv_i32 ret, TCGv_i32 arg);
 void tcg_gen_bswap16_i32(TCGv_i32 ret, TCGv_i32 arg, int flags);
 void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg);
+void tcg_gen_hswap_i32(TCGv_i32 ret, TCGv_i32 arg);
 void tcg_gen_smin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_smax_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_umin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
@@ -531,6 +532,8 @@ void tcg_gen_ext32u_i64(TCGv_i64 ret, TCGv_i64 arg);
 void tcg_gen_bswap16_i64(TCGv_i64 ret, TCGv_i64 arg, int flags);
 void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg, int flags);
 void tcg_gen_bswap64_i64(TCGv_i64 ret, TCGv_i64 arg);
+void tcg_gen_hswap_i64(TCGv_i64 ret, TCGv_i64 arg);
+void tcg_gen_wswap_i64(TCGv_i64 ret, TCGv_i64 arg);
 void tcg_gen_smin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_smax_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_umin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
@@ -1077,6 +1080,8 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
 #define tcg_gen_bswap32_tl tcg_gen_bswap32_i64
 #define tcg_gen_bswap64_tl tcg_gen_bswap64_i64
 #define tcg_gen_bswap_tl tcg_gen_bswap64_i64
+#define tcg_gen_hswap_tl tcg_gen_hswap_i64
+#define tcg_gen_wswap_tl tcg_gen_wswap_i64
 #define tcg_gen_concat_tl_i64 tcg_gen_concat32_i64
 #define tcg_gen_extr_i64_tl tcg_gen_extr32_i64
 #define tcg_gen_andc_tl tcg_gen_andc_i64
@@ -1192,6 +1197,7 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
 #define tcg_gen_bswap16_tl tcg_gen_bswap16_i32
 #define tcg_gen_bswap32_tl(D, S, F) tcg_gen_bswap32_i32(D, S)
 #define tcg_gen_bswap_tl tcg_gen_bswap32_i32
+#define tcg_gen_hswap_tl tcg_gen_hswap_i32
 #define tcg_gen_concat_tl_i64 tcg_gen_concat_i32_i64
 #define tcg_gen_extr_i64_tl tcg_gen_extr_i64_i32
 #define tcg_gen_andc_tl tcg_gen_andc_i32
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 5d48537927..019fab00cc 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1056,6 +1056,12 @@ void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg)
     }
 }
 
+void tcg_gen_hswap_i32(TCGv_i32 ret, TCGv_i32 arg)
+{
+    /* Swapping 2 16-bit elements is a rotate. */
+    tcg_gen_rotli_i32(ret, arg, 16);
+}
+
 void tcg_gen_smin_i32(TCGv_i32 ret, TCGv_i32 a, TCGv_i32 b)
 {
     tcg_gen_movcond_i32(TCG_COND_LT, ret, a, b, a, b);
@@ -1792,6 +1798,30 @@ void tcg_gen_bswap64_i64(TCGv_i64 ret, TCGv_i64 arg)
     }
 }
 
+void tcg_gen_hswap_i64(TCGv_i64 ret, TCGv_i64 arg)
+{
+    uint64_t m = 0x0000ffff0000ffffull;
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    /* See include/qemu/bitops.h, hswap64. */
+    tcg_gen_rotli_i64(t1, arg, 32);
+    tcg_gen_andi_i64(t0, t1, m);
+    tcg_gen_shli_i64(t0, t0, 16);
+    tcg_gen_shri_i64(t1, t1, 16);
+    tcg_gen_andi_i64(t1, t1, m);
+    tcg_gen_or_i64(ret, t0, t1);
+
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+}
+
+void tcg_gen_wswap_i64(TCGv_i64 ret, TCGv_i64 arg)
+{
+    /* Swapping 2 32-bit elements is a rotate. */
+    tcg_gen_rotli_i64(ret, arg, 32);
+}
+
 void tcg_gen_not_i64(TCGv_i64 ret, TCGv_i64 arg)
 {
     if (TCG_TARGET_REG_BITS == 32) {
-- 
2.35.1




* [PATCH v6 05/13] target/s390x: vxeh2: vector convert short/32b
From: David Hildenbrand @ 2022-04-28  9:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, David Hildenbrand, Cornelia Huck, Richard Henderson,
	Eric Farman, David Miller, Halil Pasic, qemu-s390x,
	Christian Borntraeger

From: David Miller <dmiller423@gmail.com>

Signed-off-by: David Miller <dmiller423@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h               |  4 +++
 target/s390x/tcg/translate_vx.c.inc | 44 ++++++++++++++++++++++++++---
 target/s390x/tcg/vec_fpu_helper.c   | 31 ++++++++++++++++++++
 3 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 69f69cf718..7cbcbd7f0b 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -275,6 +275,10 @@ DEF_HELPER_FLAGS_5(gvec_vfche64, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32
 DEF_HELPER_5(gvec_vfche64_cc, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_FLAGS_5(gvec_vfche128, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
 DEF_HELPER_5(gvec_vfche128_cc, void, ptr, cptr, cptr, env, i32)
+DEF_HELPER_FLAGS_4(gvec_vcdg32, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
+DEF_HELPER_FLAGS_4(gvec_vcdlg32, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
+DEF_HELPER_FLAGS_4(gvec_vcgd32, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
+DEF_HELPER_FLAGS_4(gvec_vclgd32, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vcdg64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vcdlg64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
 DEF_HELPER_FLAGS_4(gvec_vcgd64, TCG_CALL_NO_WG, void, ptr, cptr, env, i32)
diff --git a/target/s390x/tcg/translate_vx.c.inc b/target/s390x/tcg/translate_vx.c.inc
index b829ce0c7c..be9407d1ed 100644
--- a/target/s390x/tcg/translate_vx.c.inc
+++ b/target/s390x/tcg/translate_vx.c.inc
@@ -2720,23 +2720,59 @@ static DisasJumpType op_vcdg(DisasContext *s, DisasOps *o)
 
     switch (s->fields.op2) {
     case 0xc3:
-        if (fpf == FPF_LONG) {
+        switch (fpf) {
+        case FPF_LONG:
             fn = gen_helper_gvec_vcdg64;
+            break;
+        case FPF_SHORT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH2)) {
+                fn = gen_helper_gvec_vcdg32;
+            }
+            break;
+        default:
+            break;
         }
         break;
     case 0xc1:
-        if (fpf == FPF_LONG) {
+        switch (fpf) {
+        case FPF_LONG:
             fn = gen_helper_gvec_vcdlg64;
+            break;
+        case FPF_SHORT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH2)) {
+                fn = gen_helper_gvec_vcdlg32;
+            }
+            break;
+        default:
+            break;
         }
         break;
     case 0xc2:
-        if (fpf == FPF_LONG) {
+        switch (fpf) {
+        case FPF_LONG:
             fn = gen_helper_gvec_vcgd64;
+            break;
+        case FPF_SHORT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH2)) {
+                fn = gen_helper_gvec_vcgd32;
+            }
+            break;
+        default:
+            break;
         }
         break;
     case 0xc0:
-        if (fpf == FPF_LONG) {
+        switch (fpf) {
+        case FPF_LONG:
             fn = gen_helper_gvec_vclgd64;
+            break;
+        case FPF_SHORT:
+            if (s390_has_feat(S390_FEAT_VECTOR_ENH2)) {
+                fn = gen_helper_gvec_vclgd32;
+            }
+            break;
+        default:
+            break;
         }
         break;
     case 0xc7:
diff --git a/target/s390x/tcg/vec_fpu_helper.c b/target/s390x/tcg/vec_fpu_helper.c
index aa2cc8e4a6..2a618a1093 100644
--- a/target/s390x/tcg/vec_fpu_helper.c
+++ b/target/s390x/tcg/vec_fpu_helper.c
@@ -175,6 +175,30 @@ static void vop128_2(S390Vector *v1, const S390Vector *v2, CPUS390XState *env,
     *v1 = tmp;
 }
 
+static float32 vcdg32(float32 a, float_status *s)
+{
+    return int32_to_float32(a, s);
+}
+
+static float32 vcdlg32(float32 a, float_status *s)
+{
+    return uint32_to_float32(a, s);
+}
+
+static float32 vcgd32(float32 a, float_status *s)
+{
+    const float32 tmp = float32_to_int32(a, s);
+
+    return float32_is_any_nan(a) ? INT32_MIN : tmp;
+}
+
+static float32 vclgd32(float32 a, float_status *s)
+{
+    const float32 tmp = float32_to_uint32(a, s);
+
+    return float32_is_any_nan(a) ? 0 : tmp;
+}
+
 static float64 vcdg64(float64 a, float_status *s)
 {
     return int64_to_float64(a, s);
@@ -210,6 +234,9 @@ void HELPER(gvec_##NAME##BITS)(void *v1, const void *v2, CPUS390XState *env,   \
     vop##BITS##_2(v1, v2, env, se, XxC, erm, FN, GETPC());                     \
 }
 
+#define DEF_GVEC_VOP2_32(NAME)                                                 \
+DEF_GVEC_VOP2_FN(NAME, NAME##32, 32)
+
 #define DEF_GVEC_VOP2_64(NAME)                                                 \
 DEF_GVEC_VOP2_FN(NAME, NAME##64, 64)
 
@@ -218,6 +245,10 @@ DEF_GVEC_VOP2_FN(NAME, float32_##OP, 32)                                       \
 DEF_GVEC_VOP2_FN(NAME, float64_##OP, 64)                                       \
 DEF_GVEC_VOP2_FN(NAME, float128_##OP, 128)
 
+DEF_GVEC_VOP2_32(vcdg)
+DEF_GVEC_VOP2_32(vcdlg)
+DEF_GVEC_VOP2_32(vcgd)
+DEF_GVEC_VOP2_32(vclgd)
 DEF_GVEC_VOP2_64(vcdg)
 DEF_GVEC_VOP2_64(vcdlg)
 DEF_GVEC_VOP2_64(vcgd)
-- 
2.35.1




* [PATCH v6 06/13] target/s390x: vxeh2: vector string search
From: David Hildenbrand @ 2022-04-28  9:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, David Hildenbrand, Cornelia Huck, Richard Henderson,
	Eric Farman, David Miller, Halil Pasic, qemu-s390x,
	Christian Borntraeger

From: David Miller <dmiller423@gmail.com>

Signed-off-by: David Miller <dmiller423@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/helper.h                |  6 ++
 target/s390x/tcg/insn-data.def       |  2 +
 target/s390x/tcg/translate.c         |  3 +-
 target/s390x/tcg/translate_vx.c.inc  | 25 +++++++
 target/s390x/tcg/vec_string_helper.c | 99 ++++++++++++++++++++++++++++
 5 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 7cbcbd7f0b..7412130883 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -246,6 +246,12 @@ DEF_HELPER_6(gvec_vstrc_cc32, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_6(gvec_vstrc_cc_rt8, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_6(gvec_vstrc_cc_rt16, void, ptr, cptr, cptr, cptr, env, i32)
 DEF_HELPER_6(gvec_vstrc_cc_rt32, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_6(gvec_vstrs_8, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_6(gvec_vstrs_16, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_6(gvec_vstrs_32, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_6(gvec_vstrs_zs8, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_6(gvec_vstrs_zs16, void, ptr, cptr, cptr, cptr, env, i32)
+DEF_HELPER_6(gvec_vstrs_zs32, void, ptr, cptr, cptr, cptr, env, i32)
 
 /* === Vector Floating-Point Instructions */
 DEF_HELPER_FLAGS_5(gvec_vfa32, TCG_CALL_NO_WG, void, ptr, cptr, cptr, env, i32)
diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
index 6c8a8b229f..46add91a0e 100644
--- a/target/s390x/tcg/insn-data.def
+++ b/target/s390x/tcg/insn-data.def
@@ -1246,6 +1246,8 @@
     F(0xe75c, VISTR,   VRR_a, V,   0, 0, 0, 0, vistr, 0, IF_VEC)
 /* VECTOR STRING RANGE COMPARE */
     F(0xe78a, VSTRC,   VRR_d, V,   0, 0, 0, 0, vstrc, 0, IF_VEC)
+/* VECTOR STRING SEARCH */
+    F(0xe78b, VSTRS,   VRR_d, VE2, 0, 0, 0, 0, vstrs, 0, IF_VEC)
 
 /* === Vector Floating-Point Instructions */
 
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 8f092dab95..b40cb84bae 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -6222,7 +6222,8 @@ enum DisasInsnEnum {
 #define FAC_PCI         S390_FEAT_ZPCI /* z/PCI facility */
 #define FAC_AIS         S390_FEAT_ADAPTER_INT_SUPPRESSION
 #define FAC_V           S390_FEAT_VECTOR /* vector facility */
-#define FAC_VE          S390_FEAT_VECTOR_ENH /* vector enhancements facility 1 */
+#define FAC_VE          S390_FEAT_VECTOR_ENH  /* vector enhancements facility 1 */
+#define FAC_VE2         S390_FEAT_VECTOR_ENH2 /* vector enhancements facility 2 */
 #define FAC_MIE2        S390_FEAT_MISC_INSTRUCTION_EXT2 /* miscellaneous-instruction-extensions facility 2 */
 #define FAC_MIE3        S390_FEAT_MISC_INSTRUCTION_EXT3 /* miscellaneous-instruction-extensions facility 3 */
 
diff --git a/target/s390x/tcg/translate_vx.c.inc b/target/s390x/tcg/translate_vx.c.inc
index be9407d1ed..8ddbd440e2 100644
--- a/target/s390x/tcg/translate_vx.c.inc
+++ b/target/s390x/tcg/translate_vx.c.inc
@@ -2497,6 +2497,31 @@ static DisasJumpType op_vstrc(DisasContext *s, DisasOps *o)
     return DISAS_NEXT;
 }
 
+static DisasJumpType op_vstrs(DisasContext *s, DisasOps *o)
+{
+    typedef void (*helper_vstrs)(TCGv_ptr, TCGv_ptr, TCGv_ptr,
+                                 TCGv_ptr, TCGv_ptr, TCGv_i32);
+    static const helper_vstrs fns[3][2] = {
+        { gen_helper_gvec_vstrs_8, gen_helper_gvec_vstrs_zs8 },
+        { gen_helper_gvec_vstrs_16, gen_helper_gvec_vstrs_zs16 },
+        { gen_helper_gvec_vstrs_32, gen_helper_gvec_vstrs_zs32 },
+    };
+    const uint8_t es = get_field(s, m5);
+    const uint8_t m6 = get_field(s, m6);
+    const bool zs = extract32(m6, 1, 1);
+
+    if (es > ES_32 || m6 & ~2) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_gvec_4_ptr(get_field(s, v1), get_field(s, v2),
+                   get_field(s, v3), get_field(s, v4),
+                   cpu_env, 0, fns[es][zs]);
+    set_cc_static(s);
+    return DISAS_NEXT;
+}
+
 static DisasJumpType op_vfa(DisasContext *s, DisasOps *o)
 {
     const uint8_t fpf = get_field(s, m4);
diff --git a/target/s390x/tcg/vec_string_helper.c b/target/s390x/tcg/vec_string_helper.c
index f8b54bba4a..9b85becdfb 100644
--- a/target/s390x/tcg/vec_string_helper.c
+++ b/target/s390x/tcg/vec_string_helper.c
@@ -470,3 +470,102 @@ void HELPER(gvec_vstrc_cc_rt##BITS)(void *v1, const void *v2, const void *v3,  \
 DEF_VSTRC_CC_RT_HELPER(8)
 DEF_VSTRC_CC_RT_HELPER(16)
 DEF_VSTRC_CC_RT_HELPER(32)
+
+static int vstrs(S390Vector *v1, const S390Vector *v2, const S390Vector *v3,
+                 const S390Vector *v4, uint8_t es, bool zs)
+{
+    int substr_elen, substr_0, str_elen, i, j, k, cc;
+    int nelem = 16 >> es;
+    bool eos = false;
+
+    substr_elen = s390_vec_read_element8(v4, 7) >> es;
+
+    /* If ZS, bound substr length by min(nelem, strlen(v3)). */
+    if (zs) {
+        substr_elen = MIN(substr_elen, nelem);
+        for (i = 0; i < substr_elen; i++) {
+            if (s390_vec_read_element(v3, i, es) == 0) {
+                substr_elen = i;
+                break;
+            }
+        }
+    }
+
+    if (substr_elen == 0) {
+        cc = 2; /* full match for degenerate case of empty substr */
+        k = 0;
+        goto done;
+    }
+
+    /* If ZS, look for eos in the searched string. */
+    if (zs) {
+        for (k = 0; k < nelem; k++) {
+            if (s390_vec_read_element(v2, k, es) == 0) {
+                eos = true;
+                break;
+            }
+        }
+        str_elen = k;
+    } else {
+        str_elen = nelem;
+    }
+
+    substr_0 = s390_vec_read_element(v3, 0, es);
+
+    for (k = 0; ; k++) {
+        for (; k < str_elen; k++) {
+            if (s390_vec_read_element(v2, k, es) == substr_0) {
+                break;
+            }
+        }
+
+        /* If we reached the end of the string, no match. */
+        if (k == str_elen) {
+            cc = eos; /* no match (with or without zero char) */
+            goto done;
+        }
+
+        /* If the substring is only one char, match. */
+        if (substr_elen == 1) {
+            cc = 2; /* full match */
+            goto done;
+        }
+
+        /* If the match begins at the last char, we have a partial match. */
+        if (k == str_elen - 1) {
+            cc = 3; /* partial match */
+            goto done;
+        }
+
+        i = MIN(nelem, k + substr_elen);
+        for (j = k + 1; j < i; j++) {
+            uint32_t e2 = s390_vec_read_element(v2, j, es);
+            uint32_t e3 = s390_vec_read_element(v3, j - k, es);
+            if (e2 != e3) {
+                break;
+            }
+        }
+        if (j == i) {
+            /* Matched up until "end". */
+            cc = i - k == substr_elen ? 2 : 3; /* full or partial match */
+            goto done;
+        }
+    }
+
+ done:
+    s390_vec_write_element64(v1, 0, k << es);
+    s390_vec_write_element64(v1, 1, 0);
+    return cc;
+}
+
+#define DEF_VSTRS_HELPER(BITS)                                             \
+void QEMU_FLATTEN HELPER(gvec_vstrs_##BITS)(void *v1, const void *v2,      \
+    const void *v3, const void *v4, CPUS390XState *env, uint32_t desc)     \
+    { env->cc_op = vstrs(v1, v2, v3, v4, MO_##BITS, false); }              \
+void QEMU_FLATTEN HELPER(gvec_vstrs_zs##BITS)(void *v1, const void *v2,    \
+    const void *v3, const void *v4, CPUS390XState *env, uint32_t desc)     \
+    { env->cc_op = vstrs(v1, v2, v3, v4, MO_##BITS, true); }
+
+DEF_VSTRS_HELPER(8)
+DEF_VSTRS_HELPER(16)
+DEF_VSTRS_HELPER(32)
-- 
2.35.1
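
For readers following the condition-code logic of vstrs() in the patch above, here is a minimal standalone sketch restricted to byte elements. It mirrors the control flow of the helper as posted; the name vstrs_model and the *pos out-parameter are hypothetical and stand in for the value written to v1. It is an illustration under those assumptions, not the QEMU implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { NELEM = 16 };   /* byte elements in one 128-bit vector */

/*
 * Hypothetical byte-element model of the search.  Returns the condition
 * code (0/1: no match without/with a zero byte, 2: full match,
 * 3: partial match) and stores the resulting byte index in *pos.
 */
static int vstrs_model(const uint8_t *str, const uint8_t *sub,
                       int sub_len, bool zs, int *pos)
{
    int str_len = NELEM;
    bool eos = false;
    int j, k;

    assert(sub_len >= 0);

    if (zs) {
        /* Bound the substring by the vector size and its first zero. */
        if (sub_len > NELEM) {
            sub_len = NELEM;
        }
        for (j = 0; j < sub_len; j++) {
            if (sub[j] == 0) {
                sub_len = j;
                break;
            }
        }
    }
    if (sub_len == 0) {
        *pos = 0;
        return 2;               /* an empty substring always matches */
    }
    if (zs) {
        /* The searched string ends at its first zero byte, if any. */
        for (k = 0; k < NELEM; k++) {
            if (str[k] == 0) {
                eos = true;
                break;
            }
        }
        str_len = k;
    }

    for (k = 0; k < str_len; k++) {
        int end = k + sub_len < NELEM ? k + sub_len : NELEM;

        if (str[k] != sub[0]) {
            continue;
        }
        *pos = k;
        if (sub_len == 1) {
            return 2;
        }
        if (k == str_len - 1) {
            return 3;           /* match starts at the last element */
        }
        for (j = k + 1; j < end; j++) {
            if (str[j] != sub[j - k]) {
                break;
            }
        }
        if (j == end) {
            /* Full match only if the whole substring fit in the vector. */
            return end - k == sub_len ? 2 : 3;
        }
    }
    *pos = str_len;
    return eos ? 1 : 0;
}
```

Searching "the quick brown " for "quick" gives a full match at index 4, while "n dog" runs off the end of the vector and is reported as a partial match at index 14.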




* [PATCH v6 07/13] target/s390x: vxeh2: Update for changes to vector shifts
  2022-04-28  9:46 [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2 David Hildenbrand
                   ` (5 preceding siblings ...)
  2022-04-28  9:47 ` [PATCH v6 06/13] target/s390x: vxeh2: vector string search David Hildenbrand
@ 2022-04-28  9:47 ` David Hildenbrand
  2022-04-28  9:47 ` [PATCH v6 08/13] target/s390x: vxeh2: vector shift double by bit David Hildenbrand
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: David Hildenbrand @ 2022-04-28  9:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, David Hildenbrand, Cornelia Huck, Richard Henderson,
	Eric Farman, David Miller, Halil Pasic, qemu-s390x,
	Christian Borntraeger

From: David Miller <dmiller423@gmail.com>

Signed-off-by: David Miller <dmiller423@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
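
A minimal standalone sketch of the per-byte semantics that the gvec_vsl_ve2 and gvec_vsrl_ve2 helpers below implement: each byte uses its own 3-bit shift count from v3 and takes its fill bits from the neighbouring byte of v2. Plain byte arrays stand in for S390Vector, and the *_model names are hypothetical. The arithmetic variant differs only in sign-filling byte 0.

```c
#include <stdint.h>
#include <string.h>

/* Shift left: byte i takes its low-order fill bits from byte i + 1. */
static void vsl_ve2_model(uint8_t dst[16], const uint8_t src[16],
                          const uint8_t cnt[16])
{
    uint8_t tmp[16];

    for (int i = 0; i < 16; i++) {
        unsigned sh = cnt[i] & 7;
        unsigned right = i < 15 ? src[i + 1] : 0;
        unsigned window = ((unsigned)src[i] << 8) | right;

        tmp[i] = (uint8_t)((window << sh) >> 8);
    }
    /* Write back through a temporary so dst may alias src or cnt. */
    memcpy(dst, tmp, sizeof(tmp));
}

/* Shift right logical: byte i takes its high-order fill bits from byte i - 1. */
static void vsrl_ve2_model(uint8_t dst[16], const uint8_t src[16],
                           const uint8_t cnt[16])
{
    uint8_t tmp[16];

    for (int i = 0; i < 16; i++) {
        unsigned sh = cnt[i] & 7;
        unsigned left = i > 0 ? src[i - 1] : 0;
        unsigned window = (left << 8) | src[i];

        tmp[i] = (uint8_t)(window >> sh);
    }
    memcpy(dst, tmp, sizeof(tmp));
}
```

With every source byte 0x81 and every count 1, the left shift yields 0x03 in bytes 0..14 and 0x02 in byte 15; the logical right shift yields 0x40 in byte 0 and 0xc0 elsewhere.
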
 target/s390x/helper.h               |  3 ++
 target/s390x/tcg/insn-data.def      | 12 ++---
 target/s390x/tcg/translate_vx.c.inc | 75 ++++++++++++-----------------
 target/s390x/tcg/vec_int_helper.c   | 55 +++++++++++++++++++++
 4 files changed, 95 insertions(+), 50 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 7412130883..bf33d86f74 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -203,8 +203,11 @@ DEF_HELPER_FLAGS_3(gvec_vpopct16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_verim8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_verim16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vsl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vsl_ve2, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vsra, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vsra_ve2, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vsrl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
+DEF_HELPER_FLAGS_4(gvec_vsrl_ve2, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vscbi8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vscbi16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_4(gvec_vtm, void, ptr, cptr, env, i32)
diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
index 46add91a0e..f487a64abf 100644
--- a/target/s390x/tcg/insn-data.def
+++ b/target/s390x/tcg/insn-data.def
@@ -1204,19 +1204,19 @@
     F(0xe778, VESRLV,  VRR_c, V,   0, 0, 0, 0, vesv, 0, IF_VEC)
     F(0xe738, VESRL,   VRS_a, V,   la2, 0, 0, 0, ves, 0, IF_VEC)
 /* VECTOR SHIFT LEFT */
-    F(0xe774, VSL,     VRR_c, V,   0, 0, 0, 0, vsl, 0, IF_VEC)
+    E(0xe774, VSL,     VRR_c, V,   0, 0, 0, 0, vsl, 0, 0, IF_VEC)
 /* VECTOR SHIFT LEFT BY BYTE */
-    F(0xe775, VSLB,    VRR_c, V,   0, 0, 0, 0, vsl, 0, IF_VEC)
+    E(0xe775, VSLB,    VRR_c, V,   0, 0, 0, 0, vsl, 0, 1, IF_VEC)
 /* VECTOR SHIFT LEFT DOUBLE BY BYTE */
     F(0xe777, VSLDB,   VRI_d, V,   0, 0, 0, 0, vsldb, 0, IF_VEC)
 /* VECTOR SHIFT RIGHT ARITHMETIC */
-    F(0xe77e, VSRA,    VRR_c, V,   0, 0, 0, 0, vsra, 0, IF_VEC)
+    E(0xe77e, VSRA,    VRR_c, V,   0, 0, 0, 0, vsra, 0, 0, IF_VEC)
 /* VECTOR SHIFT RIGHT ARITHMETIC BY BYTE */
-    F(0xe77f, VSRAB,   VRR_c, V,   0, 0, 0, 0, vsra, 0, IF_VEC)
+    E(0xe77f, VSRAB,   VRR_c, V,   0, 0, 0, 0, vsra, 0, 1, IF_VEC)
 /* VECTOR SHIFT RIGHT LOGICAL */
-    F(0xe77c, VSRL,    VRR_c, V,   0, 0, 0, 0, vsrl, 0, IF_VEC)
+    E(0xe77c, VSRL,    VRR_c, V,   0, 0, 0, 0, vsrl, 0, 0, IF_VEC)
 /* VECTOR SHIFT RIGHT LOGICAL BY BYTE */
-    F(0xe77d, VSRLB,   VRR_c, V,   0, 0, 0, 0, vsrl, 0, IF_VEC)
+    E(0xe77d, VSRLB,   VRR_c, V,   0, 0, 0, 0, vsrl, 0, 1, IF_VEC)
 /* VECTOR SUBTRACT */
     F(0xe7f7, VS,      VRR_c, V,   0, 0, 0, 0, vs, 0, IF_VEC)
 /* VECTOR SUBTRACT COMPUTE BORROW INDICATION */
diff --git a/target/s390x/tcg/translate_vx.c.inc b/target/s390x/tcg/translate_vx.c.inc
index 8ddbd440e2..81673ea68f 100644
--- a/target/s390x/tcg/translate_vx.c.inc
+++ b/target/s390x/tcg/translate_vx.c.inc
@@ -2018,23 +2018,44 @@ static DisasJumpType op_ves(DisasContext *s, DisasOps *o)
     return DISAS_NEXT;
 }
 
-static DisasJumpType op_vsl(DisasContext *s, DisasOps *o)
+static DisasJumpType gen_vsh_by_byte(DisasContext *s, DisasOps *o,
+                                      gen_helper_gvec_2i *gen,
+                                      gen_helper_gvec_3 *gen_ve2)
 {
-    TCGv_i64 shift = tcg_temp_new_i64();
+    bool byte = s->insn->data;
 
-    read_vec_element_i64(shift, get_field(s, v3), 7, ES_8);
-    if (s->fields.op2 == 0x74) {
-        tcg_gen_andi_i64(shift, shift, 0x7);
+    if (!byte && s390_has_feat(S390_FEAT_VECTOR_ENH2)) {
+        gen_gvec_3_ool(get_field(s, v1), get_field(s, v2),
+                       get_field(s, v3), 0, gen_ve2);
     } else {
-        tcg_gen_andi_i64(shift, shift, 0x78);
-    }
+        TCGv_i64 shift = tcg_temp_new_i64();
 
-    gen_gvec_2i_ool(get_field(s, v1), get_field(s, v2),
-                    shift, 0, gen_helper_gvec_vsl);
-    tcg_temp_free_i64(shift);
+        read_vec_element_i64(shift, get_field(s, v3), 7, ES_8);
+        tcg_gen_andi_i64(shift, shift, byte ? 0x78 : 7);
+        gen_gvec_2i_ool(get_field(s, v1), get_field(s, v2), shift, 0, gen);
+        tcg_temp_free_i64(shift);
+    }
     return DISAS_NEXT;
 }
 
+static DisasJumpType op_vsl(DisasContext *s, DisasOps *o)
+{
+    return gen_vsh_by_byte(s, o, gen_helper_gvec_vsl,
+                            gen_helper_gvec_vsl_ve2);
+}
+
+static DisasJumpType op_vsra(DisasContext *s, DisasOps *o)
+{
+    return gen_vsh_by_byte(s, o, gen_helper_gvec_vsra,
+                            gen_helper_gvec_vsra_ve2);
+}
+
+static DisasJumpType op_vsrl(DisasContext *s, DisasOps *o)
+{
+    return gen_vsh_by_byte(s, o, gen_helper_gvec_vsrl,
+                            gen_helper_gvec_vsrl_ve2);
+}
+
 static DisasJumpType op_vsldb(DisasContext *s, DisasOps *o)
 {
     const uint8_t i4 = get_field(s, i4) & 0xf;
@@ -2064,40 +2085,6 @@ static DisasJumpType op_vsldb(DisasContext *s, DisasOps *o)
     return DISAS_NEXT;
 }
 
-static DisasJumpType op_vsra(DisasContext *s, DisasOps *o)
-{
-    TCGv_i64 shift = tcg_temp_new_i64();
-
-    read_vec_element_i64(shift, get_field(s, v3), 7, ES_8);
-    if (s->fields.op2 == 0x7e) {
-        tcg_gen_andi_i64(shift, shift, 0x7);
-    } else {
-        tcg_gen_andi_i64(shift, shift, 0x78);
-    }
-
-    gen_gvec_2i_ool(get_field(s, v1), get_field(s, v2),
-                    shift, 0, gen_helper_gvec_vsra);
-    tcg_temp_free_i64(shift);
-    return DISAS_NEXT;
-}
-
-static DisasJumpType op_vsrl(DisasContext *s, DisasOps *o)
-{
-    TCGv_i64 shift = tcg_temp_new_i64();
-
-    read_vec_element_i64(shift, get_field(s, v3), 7, ES_8);
-    if (s->fields.op2 == 0x7c) {
-        tcg_gen_andi_i64(shift, shift, 0x7);
-    } else {
-        tcg_gen_andi_i64(shift, shift, 0x78);
-    }
-
-    gen_gvec_2i_ool(get_field(s, v1), get_field(s, v2),
-                    shift, 0, gen_helper_gvec_vsrl);
-    tcg_temp_free_i64(shift);
-    return DISAS_NEXT;
-}
-
 static DisasJumpType op_vs(DisasContext *s, DisasOps *o)
 {
     const uint8_t es = get_field(s, m4);
diff --git a/target/s390x/tcg/vec_int_helper.c b/target/s390x/tcg/vec_int_helper.c
index b44859ee16..53ab5c5eb3 100644
--- a/target/s390x/tcg/vec_int_helper.c
+++ b/target/s390x/tcg/vec_int_helper.c
@@ -539,18 +539,73 @@ void HELPER(gvec_vsl)(void *v1, const void *v2, uint64_t count,
     s390_vec_shl(v1, v2, count);
 }
 
+void HELPER(gvec_vsl_ve2)(void *v1, const void *v2, const void *v3,
+                          uint32_t desc)
+{
+    S390Vector tmp;
+    uint32_t sh, e0, e1 = 0;
+    int i;
+
+    for (i = 15; i >= 0; --i, e1 = e0) {
+        e0 = s390_vec_read_element8(v2, i);
+        sh = s390_vec_read_element8(v3, i) & 7;
+
+        s390_vec_write_element8(&tmp, i, rol32(e0 | (e1 << 24), sh));
+    }
+
+    *(S390Vector *)v1 = tmp;
+}
+
 void HELPER(gvec_vsra)(void *v1, const void *v2, uint64_t count,
                        uint32_t desc)
 {
     s390_vec_sar(v1, v2, count);
 }
 
+void HELPER(gvec_vsra_ve2)(void *v1, const void *v2, const void *v3,
+                           uint32_t desc)
+{
+    S390Vector tmp;
+    uint32_t sh, e0, e1 = 0;
+    int i = 0;
+
+    /* Only byte 0 is special: it is sign-extended. */
+    e0 = (int32_t)(int8_t)s390_vec_read_element8(v2, i);
+    sh = s390_vec_read_element8(v3, i) & 7;
+    s390_vec_write_element8(&tmp, i, e0 >> sh);
+
+    e1 = e0;
+    for (i = 1; i < 16; ++i, e1 = e0) {
+        e0 = s390_vec_read_element8(v2, i);
+        sh = s390_vec_read_element8(v3, i) & 7;
+        s390_vec_write_element8(&tmp, i, (e0 | e1 << 8) >> sh);
+    }
+
+    *(S390Vector *)v1 = tmp;
+}
+
 void HELPER(gvec_vsrl)(void *v1, const void *v2, uint64_t count,
                        uint32_t desc)
 {
     s390_vec_shr(v1, v2, count);
 }
 
+void HELPER(gvec_vsrl_ve2)(void *v1, const void *v2, const void *v3,
+                           uint32_t desc)
+{
+    S390Vector tmp;
+    uint32_t sh, e0, e1 = 0;
+
+    for (int i = 0; i < 16; ++i, e1 = e0) {
+        e0 = s390_vec_read_element8(v2, i);
+        sh = s390_vec_read_element8(v3, i) & 7;
+
+        s390_vec_write_element8(&tmp, i, (e0 | (e1 << 8)) >> sh);
+    }
+
+    *(S390Vector *)v1 = tmp;
+}
+
 #define DEF_VSCBI(BITS)                                                        \
 void HELPER(gvec_vscbi##BITS)(void *v1, const void *v2, const void *v3,        \
                               uint32_t desc)                                   \
-- 
2.35.1




* [PATCH v6 08/13] target/s390x: vxeh2: vector shift double by bit
  2022-04-28  9:46 [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2 David Hildenbrand
                   ` (6 preceding siblings ...)
  2022-04-28  9:47 ` [PATCH v6 07/13] target/s390x: vxeh2: Update for changes to vector shifts David Hildenbrand
@ 2022-04-28  9:47 ` David Hildenbrand
  2022-04-28  9:47 ` [PATCH v6 09/13] target/s390x: vxeh2: vector {load, store} elements reversed David Hildenbrand
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: David Hildenbrand @ 2022-04-28  9:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, David Hildenbrand, Cornelia Huck, Richard Henderson,
	Eric Farman, David Miller, Halil Pasic, qemu-s390x,
	Christian Borntraeger

From: David Miller <dmiller423@gmail.com>

Signed-off-by: David Miller <dmiller423@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
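
A minimal standalone sketch of how the two extract2 operations in op_vsld and op_vsrd below combine into a 128-bit result. The extract2() function here is a plain C model of the funnel shift (low 64 bits of ah:al shifted right by ofs); the vec128 type and the *_model names are hypothetical, and only the bit-shift forms (counts 0..7) are modelled.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t hi, lo; } vec128;   /* hi = doubleword 0 */

/* Low 64 bits of the 128-bit value ah:al shifted right by ofs. */
static uint64_t extract2(uint64_t al, uint64_t ah, unsigned ofs)
{
    assert(ofs <= 64);
    if (ofs == 0) {
        return al;
    }
    if (ofs == 64) {
        return ah;
    }
    return (al >> ofs) | (ah << (64 - ofs));
}

/* Leftmost 128 bits of (v2:v3) after shifting left by n bits. */
static vec128 vsld_model(vec128 v2, vec128 v3, unsigned n)
{
    const unsigned right_shift = 64 - n;
    vec128 r;

    assert(n < 8);
    r.hi = extract2(v2.lo, v2.hi, right_shift);
    r.lo = extract2(v3.hi, v2.lo, right_shift);
    return r;
}

/* Rightmost 128 bits of (v2:v3) after shifting right by n bits. */
static vec128 vsrd_model(vec128 v2, vec128 v3, unsigned n)
{
    vec128 r;

    assert(n < 8);
    r.hi = extract2(v3.hi, v2.lo, n);
    r.lo = extract2(v3.lo, v3.hi, n);
    return r;
}
```

A shift count of 0 makes vsld_model return v2 and vsrd_model return v3, which is why the model treats offsets 0 and 64 as plain moves rather than shifting by the full register width.
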
 target/s390x/tcg/insn-data.def      |  6 +++-
 target/s390x/tcg/translate_vx.c.inc | 55 +++++++++++++++++++++++++----
 2 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
index f487a64abf..98a31a557d 100644
--- a/target/s390x/tcg/insn-data.def
+++ b/target/s390x/tcg/insn-data.def
@@ -1207,12 +1207,16 @@
     E(0xe774, VSL,     VRR_c, V,   0, 0, 0, 0, vsl, 0, 0, IF_VEC)
 /* VECTOR SHIFT LEFT BY BYTE */
     E(0xe775, VSLB,    VRR_c, V,   0, 0, 0, 0, vsl, 0, 1, IF_VEC)
+/* VECTOR SHIFT LEFT DOUBLE BY BIT */
+    E(0xe786, VSLD,    VRI_d, VE2, 0, 0, 0, 0, vsld, 0, 0, IF_VEC)
 /* VECTOR SHIFT LEFT DOUBLE BY BYTE */
-    F(0xe777, VSLDB,   VRI_d, V,   0, 0, 0, 0, vsldb, 0, IF_VEC)
+    E(0xe777, VSLDB,   VRI_d, V,   0, 0, 0, 0, vsld, 0, 1, IF_VEC)
 /* VECTOR SHIFT RIGHT ARITHMETIC */
     E(0xe77e, VSRA,    VRR_c, V,   0, 0, 0, 0, vsra, 0, 0, IF_VEC)
 /* VECTOR SHIFT RIGHT ARITHMETIC BY BYTE */
     E(0xe77f, VSRAB,   VRR_c, V,   0, 0, 0, 0, vsra, 0, 1, IF_VEC)
+/* VECTOR SHIFT RIGHT DOUBLE BY BIT */
+    F(0xe787, VSRD,    VRI_d, VE2, 0, 0, 0, 0, vsrd, 0, IF_VEC)
 /* VECTOR SHIFT RIGHT LOGICAL */
     E(0xe77c, VSRL,    VRR_c, V,   0, 0, 0, 0, vsrl, 0, 0, IF_VEC)
 /* VECTOR SHIFT RIGHT LOGICAL BY BYTE */
diff --git a/target/s390x/tcg/translate_vx.c.inc b/target/s390x/tcg/translate_vx.c.inc
index 81673ea68f..cb6540673d 100644
--- a/target/s390x/tcg/translate_vx.c.inc
+++ b/target/s390x/tcg/translate_vx.c.inc
@@ -2056,14 +2056,23 @@ static DisasJumpType op_vsrl(DisasContext *s, DisasOps *o)
                             gen_helper_gvec_vsrl_ve2);
 }
 
-static DisasJumpType op_vsldb(DisasContext *s, DisasOps *o)
+static DisasJumpType op_vsld(DisasContext *s, DisasOps *o)
 {
-    const uint8_t i4 = get_field(s, i4) & 0xf;
-    const int left_shift = (i4 & 7) * 8;
-    const int right_shift = 64 - left_shift;
-    TCGv_i64 t0 = tcg_temp_new_i64();
-    TCGv_i64 t1 = tcg_temp_new_i64();
-    TCGv_i64 t2 = tcg_temp_new_i64();
+    const bool byte = s->insn->data;
+    const uint8_t mask = byte ? 15 : 7;
+    const uint8_t mul  = byte ?  8 : 1;
+    const uint8_t i4   = get_field(s, i4);
+    const int right_shift = 64 - (i4 & 7) * mul;
+    TCGv_i64 t0, t1, t2;
+
+    if (i4 & ~mask) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
 
     if ((i4 & 8) == 0) {
         read_vec_element_i64(t0, get_field(s, v2), 0, ES_64);
@@ -2074,8 +2083,40 @@ static DisasJumpType op_vsldb(DisasContext *s, DisasOps *o)
         read_vec_element_i64(t1, get_field(s, v3), 0, ES_64);
         read_vec_element_i64(t2, get_field(s, v3), 1, ES_64);
     }
+
     tcg_gen_extract2_i64(t0, t1, t0, right_shift);
     tcg_gen_extract2_i64(t1, t2, t1, right_shift);
+
+    write_vec_element_i64(t0, get_field(s, v1), 0, ES_64);
+    write_vec_element_i64(t1, get_field(s, v1), 1, ES_64);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    tcg_temp_free(t2);
+    return DISAS_NEXT;
+}
+
+static DisasJumpType op_vsrd(DisasContext *s, DisasOps *o)
+{
+    const uint8_t i4 = get_field(s, i4);
+    TCGv_i64 t0, t1, t2;
+
+    if (i4 & ~7) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    t2 = tcg_temp_new_i64();
+
+    read_vec_element_i64(t0, get_field(s, v2), 1, ES_64);
+    read_vec_element_i64(t1, get_field(s, v3), 0, ES_64);
+    read_vec_element_i64(t2, get_field(s, v3), 1, ES_64);
+
+    tcg_gen_extract2_i64(t0, t1, t0, i4);
+    tcg_gen_extract2_i64(t1, t2, t1, i4);
+
     write_vec_element_i64(t0, get_field(s, v1), 0, ES_64);
     write_vec_element_i64(t1, get_field(s, v1), 1, ES_64);
 
-- 
2.35.1




* [PATCH v6 09/13] target/s390x: vxeh2: vector {load, store} elements reversed
  2022-04-28  9:46 [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2 David Hildenbrand
                   ` (7 preceding siblings ...)
  2022-04-28  9:47 ` [PATCH v6 08/13] target/s390x: vxeh2: vector shift double by bit David Hildenbrand
@ 2022-04-28  9:47 ` David Hildenbrand
  2022-04-28  9:47 ` [PATCH v6 10/13] target/s390x: vxeh2: vector {load, store} byte reversed elements David Hildenbrand
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: David Hildenbrand @ 2022-04-28  9:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, David Hildenbrand, Cornelia Huck, Richard Henderson,
	Eric Farman, David Miller, Halil Pasic, qemu-s390x,
	Christian Borntraeger

From: David Miller <dmiller423@gmail.com>

Signed-off-by: David Miller <dmiller423@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
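
A minimal standalone sketch of the element reversal performed by op_vler below: load the two big-endian doublewords swapped, then reverse the halfwords or words inside each doubleword. The helpers hswap64() and wswap64() are plain C models of the corresponding TCG swaps; vec128, ld_be64 and vler_model are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t hi, lo; } vec128;   /* hi = doubleword 0 */

/* Load a big-endian doubleword from memory. */
static uint64_t ld_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Reverse the four halfwords of a doubleword: abcd -> dcba. */
static uint64_t hswap64(uint64_t x)
{
    const uint64_t m = 0x0000ffff0000ffffULL;

    x = (x << 32) | (x >> 32);
    return ((x & m) << 16) | ((x >> 16) & m);
}

/* Swap the two words of a doubleword: ab -> ba. */
static uint64_t wswap64(uint64_t x)
{
    return (x << 32) | (x >> 32);
}

/* es: 1 = halfword, 2 = word, 3 = doubleword elements. */
static vec128 vler_model(const uint8_t mem[16], unsigned es)
{
    vec128 r;

    assert(es >= 1 && es <= 3);

    /* Begin with the two doublewords swapped... */
    r.lo = ld_be64(mem);
    r.hi = ld_be64(mem + 8);

    /* ...then reverse the smaller elements within each doubleword. */
    if (es == 1) {
        r.hi = hswap64(r.hi);
        r.lo = hswap64(r.lo);
    } else if (es == 2) {
        r.hi = wswap64(r.hi);
        r.lo = wswap64(r.lo);
    }
    return r;
}
```

For memory bytes 0x00..0x0f and halfword elements, the register ends up holding 0x0e0f, 0x0c0d, ..., 0x0001: the elements are reversed while the bytes inside each element keep their order.
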
 target/s390x/tcg/insn-data.def      |  4 ++
 target/s390x/tcg/translate_vx.c.inc | 84 +++++++++++++++++++++++++++++
 2 files changed, 88 insertions(+)

diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
index 98a31a557d..b524541a7d 100644
--- a/target/s390x/tcg/insn-data.def
+++ b/target/s390x/tcg/insn-data.def
@@ -1037,6 +1037,8 @@
     E(0xe741, VLEIH,   VRI_a, V,   0, 0, 0, 0, vlei, 0, ES_16, IF_VEC)
     E(0xe743, VLEIF,   VRI_a, V,   0, 0, 0, 0, vlei, 0, ES_32, IF_VEC)
     E(0xe742, VLEIG,   VRI_a, V,   0, 0, 0, 0, vlei, 0, ES_64, IF_VEC)
+/* VECTOR LOAD ELEMENTS REVERSED */
+    F(0xe607, VLER,    VRX,   VE2, la2, 0, 0, 0, vler, 0, IF_VEC)
 /* VECTOR LOAD GR FROM VR ELEMENT */
     F(0xe721, VLGV,    VRS_c, V,   la2, 0, r1, 0, vlgv, 0, IF_VEC)
 /* VECTOR LOAD LOGICAL ELEMENT AND ZERO */
@@ -1082,6 +1084,8 @@
     E(0xe709, VSTEH,   VRX,   V,   la2, 0, 0, 0, vste, 0, ES_16, IF_VEC)
     E(0xe70b, VSTEF,   VRX,   V,   la2, 0, 0, 0, vste, 0, ES_32, IF_VEC)
     E(0xe70a, VSTEG,   VRX,   V,   la2, 0, 0, 0, vste, 0, ES_64, IF_VEC)
+/* VECTOR STORE ELEMENTS REVERSED */
+    F(0xe60f, VSTER,   VRX,   VE2, la2, 0, 0, 0, vster, 0, IF_VEC)
 /* VECTOR STORE MULTIPLE */
     F(0xe73e, VSTM,    VRS_a, V,   la2, 0, 0, 0, vstm, 0, IF_VEC)
 /* VECTOR STORE WITH LENGTH */
diff --git a/target/s390x/tcg/translate_vx.c.inc b/target/s390x/tcg/translate_vx.c.inc
index cb6540673d..7667a995c8 100644
--- a/target/s390x/tcg/translate_vx.c.inc
+++ b/target/s390x/tcg/translate_vx.c.inc
@@ -492,6 +492,46 @@ static DisasJumpType op_vlei(DisasContext *s, DisasOps *o)
     return DISAS_NEXT;
 }
 
+static DisasJumpType op_vler(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s, m3);
+
+    if (es < ES_16 || es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    TCGv_i64 t0 = tcg_temp_new_i64();
+    TCGv_i64 t1 = tcg_temp_new_i64();
+
+    /* Begin with the two doublewords swapped... */
+    tcg_gen_qemu_ld_i64(t1, o->addr1, get_mem_index(s), MO_TEUQ);
+    gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
+    tcg_gen_qemu_ld_i64(t0, o->addr1, get_mem_index(s), MO_TEUQ);
+
+    /* ... then swap smaller elements within the doublewords as required. */
+    switch (es) {
+    case MO_16:
+        tcg_gen_hswap_i64(t1, t1);
+        tcg_gen_hswap_i64(t0, t0);
+        break;
+    case MO_32:
+        tcg_gen_wswap_i64(t1, t1);
+        tcg_gen_wswap_i64(t0, t0);
+        break;
+    case MO_64:
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    write_vec_element_i64(t0, get_field(s, v1), 0, ES_64);
+    write_vec_element_i64(t1, get_field(s, v1), 1, ES_64);
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    return DISAS_NEXT;
+}
+
 static DisasJumpType op_vlgv(DisasContext *s, DisasOps *o)
 {
     const uint8_t es = get_field(s, m4);
@@ -976,6 +1016,50 @@ static DisasJumpType op_vste(DisasContext *s, DisasOps *o)
     return DISAS_NEXT;
 }
 
+static DisasJumpType op_vster(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s, m3);
+    TCGv_i64 t0, t1;
+
+    if (es < ES_16 || es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    /* Probe write access before actually modifying memory */
+    gen_helper_probe_write_access(cpu_env, o->addr1, tcg_constant_i64(16));
+
+    /* Begin with the two doublewords swapped... */
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+    read_vec_element_i64(t1,  get_field(s, v1), 0, ES_64);
+    read_vec_element_i64(t0,  get_field(s, v1), 1, ES_64);
+
+    /* ... then swap smaller elements within the doublewords as required. */
+    switch (es) {
+    case MO_16:
+        tcg_gen_hswap_i64(t1, t1);
+        tcg_gen_hswap_i64(t0, t0);
+        break;
+    case MO_32:
+        tcg_gen_wswap_i64(t1, t1);
+        tcg_gen_wswap_i64(t0, t0);
+        break;
+    case MO_64:
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    tcg_gen_qemu_st_i64(t0, o->addr1, get_mem_index(s), MO_TEUQ);
+    gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
+    tcg_gen_qemu_st_i64(t1, o->addr1, get_mem_index(s), MO_TEUQ);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    return DISAS_NEXT;
+}
+
 static DisasJumpType op_vstm(DisasContext *s, DisasOps *o)
 {
     const uint8_t v3 = get_field(s, v3);
-- 
2.35.1




* [PATCH v6 10/13] target/s390x: vxeh2: vector {load, store} byte reversed elements
  2022-04-28  9:46 [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2 David Hildenbrand
                   ` (8 preceding siblings ...)
  2022-04-28  9:47 ` [PATCH v6 09/13] target/s390x: vxeh2: vector {load, store} elements reversed David Hildenbrand
@ 2022-04-28  9:47 ` David Hildenbrand
  2022-04-28  9:47 ` [PATCH v6 11/13] target/s390x: vxeh2: vector {load, store} byte reversed element David Hildenbrand
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: David Hildenbrand @ 2022-04-28  9:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, David Hildenbrand, Cornelia Huck, Richard Henderson,
	Eric Farman, David Miller, Halil Pasic, qemu-s390x,
	Christian Borntraeger

From: David Miller <dmiller423@gmail.com>

Signed-off-by: David Miller <dmiller423@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
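
A minimal standalone sketch of the approach taken by op_vlbr below: a little-endian doubleword load reverses all eight bytes, which also reverses the order of any smaller elements, so a halfword or word swap puts the elements back in place. For the 128-bit case the two doublewords are simply loaded swapped. The names vec128, ld_le64 and vlbr_model are hypothetical; hswap64() and wswap64() are plain C models of the TCG swaps.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t hi, lo; } vec128;   /* hi = doubleword 0 */

/* Load a little-endian (byte-reversed) doubleword from memory. */
static uint64_t ld_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Reverse the four halfwords of a doubleword: abcd -> dcba. */
static uint64_t hswap64(uint64_t x)
{
    const uint64_t m = 0x0000ffff0000ffffULL;

    x = (x << 32) | (x >> 32);
    return ((x & m) << 16) | ((x >> 16) & m);
}

/* Swap the two words of a doubleword: ab -> ba. */
static uint64_t wswap64(uint64_t x)
{
    return (x << 32) | (x >> 32);
}

/* es: 1 = halfword, 2 = word, 3 = doubleword, 4 = quadword elements. */
static vec128 vlbr_model(const uint8_t mem[16], unsigned es)
{
    vec128 r;

    assert(es >= 1 && es <= 4);

    if (es == 4) {
        /* One 128-bit element: byte-reverse and swap the doublewords. */
        r.lo = ld_le64(mem);
        r.hi = ld_le64(mem + 8);
        return r;
    }

    r.hi = ld_le64(mem);
    r.lo = ld_le64(mem + 8);
    if (es == 1) {
        r.hi = hswap64(r.hi);
        r.lo = hswap64(r.lo);
    } else if (es == 2) {
        r.hi = wswap64(r.hi);
        r.lo = wswap64(r.lo);
    }
    return r;
}
```

For memory bytes 0x00..0x0f and halfword elements, the register holds 0x0100, 0x0302, ..., 0x0f0e: each element stays in place while its bytes are reversed.
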
 target/s390x/tcg/insn-data.def      |   4 +
 target/s390x/tcg/translate_vx.c.inc | 113 ++++++++++++++++++++++++++++
 2 files changed, 117 insertions(+)

diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
index b524541a7d..ee6e1dc9e5 100644
--- a/target/s390x/tcg/insn-data.def
+++ b/target/s390x/tcg/insn-data.def
@@ -1027,6 +1027,8 @@
     F(0xe756, VLR,     VRR_a, V,   0, 0, 0, 0, vlr, 0, IF_VEC)
 /* VECTOR LOAD AND REPLICATE */
     F(0xe705, VLREP,   VRX,   V,   la2, 0, 0, 0, vlrep, 0, IF_VEC)
+/* VECTOR LOAD BYTE REVERSED ELEMENTS */
+    F(0xe606, VLBR,    VRX,   VE2, la2, 0, 0, 0, vlbr, 0, IF_VEC)
 /* VECTOR LOAD ELEMENT */
     E(0xe700, VLEB,    VRX,   V,   la2, 0, 0, 0, vle, 0, ES_8, IF_VEC)
     E(0xe701, VLEH,    VRX,   V,   la2, 0, 0, 0, vle, 0, ES_16, IF_VEC)
@@ -1079,6 +1081,8 @@
     F(0xe75f, VSEG,    VRR_a, V,   0, 0, 0, 0, vseg, 0, IF_VEC)
 /* VECTOR STORE */
     F(0xe70e, VST,     VRX,   V,   la2, 0, 0, 0, vst, 0, IF_VEC)
+/* VECTOR STORE BYTE REVERSED ELEMENTS */
+    F(0xe60e, VSTBR,    VRX,   VE2, la2, 0, 0, 0, vstbr, 0, IF_VEC)
 /* VECTOR STORE ELEMENT */
     E(0xe708, VSTEB,   VRX,   V,   la2, 0, 0, 0, vste, 0, ES_8, IF_VEC)
     E(0xe709, VSTEH,   VRX,   V,   la2, 0, 0, 0, vste, 0, ES_16, IF_VEC)
diff --git a/target/s390x/tcg/translate_vx.c.inc b/target/s390x/tcg/translate_vx.c.inc
index 7667a995c8..75f3fd7edd 100644
--- a/target/s390x/tcg/translate_vx.c.inc
+++ b/target/s390x/tcg/translate_vx.c.inc
@@ -457,6 +457,62 @@ static DisasJumpType op_vlrep(DisasContext *s, DisasOps *o)
     return DISAS_NEXT;
 }
 
+static DisasJumpType op_vlbr(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s, m3);
+    TCGv_i64 t0, t1;
+
+    if (es < ES_16 || es > ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+
+
+    if (es == ES_128) {
+        tcg_gen_qemu_ld_i64(t1, o->addr1, get_mem_index(s), MO_LEUQ);
+        gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
+        tcg_gen_qemu_ld_i64(t0, o->addr1, get_mem_index(s), MO_LEUQ);
+        goto write;
+    }
+
+    /* Begin with byte reversed doublewords... */
+    tcg_gen_qemu_ld_i64(t0, o->addr1, get_mem_index(s), MO_LEUQ);
+    gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
+    tcg_gen_qemu_ld_i64(t1, o->addr1, get_mem_index(s), MO_LEUQ);
+
+    /*
+     * For 16 and 32-bit elements, the doubleword bswap also reversed
+     * the order of the elements.  Perform a larger order swap to put
+     * them back into place.  The 128-bit "element" was handled above
+     * by loading the two doublewords swapped.
+     */
+    switch (es) {
+    case ES_16:
+        tcg_gen_hswap_i64(t0, t0);
+        tcg_gen_hswap_i64(t1, t1);
+        break;
+    case ES_32:
+        tcg_gen_wswap_i64(t0, t0);
+        tcg_gen_wswap_i64(t1, t1);
+        break;
+    case ES_64:
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+write:
+    write_vec_element_i64(t0, get_field(s, v1), 0, ES_64);
+    write_vec_element_i64(t1, get_field(s, v1), 1, ES_64);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    return DISAS_NEXT;
+}
+
 static DisasJumpType op_vle(DisasContext *s, DisasOps *o)
 {
     const uint8_t es = s->insn->data;
@@ -998,6 +1054,63 @@ static DisasJumpType op_vst(DisasContext *s, DisasOps *o)
     return DISAS_NEXT;
 }
 
+static DisasJumpType op_vstbr(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s, m3);
+    TCGv_i64 t0, t1;
+
+    if (es < ES_16 || es > ES_128) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    /* Probe write access before actually modifying memory */
+    gen_helper_probe_write_access(cpu_env, o->addr1, tcg_constant_i64(16));
+
+    t0 = tcg_temp_new_i64();
+    t1 = tcg_temp_new_i64();
+
+
+    if (es == ES_128) {
+        read_vec_element_i64(t1, get_field(s, v1), 0, ES_64);
+        read_vec_element_i64(t0, get_field(s, v1), 1, ES_64);
+        goto write;
+    }
+
+    read_vec_element_i64(t0, get_field(s, v1), 0, ES_64);
+    read_vec_element_i64(t1, get_field(s, v1), 1, ES_64);
+
+    /*
+     * For 16 and 32-bit elements, the doubleword bswap below will
+     * reverse the order of the elements.  Perform a larger order
+     * swap to put them back into place.  The 128-bit "element" was
+     * handled above by reading the two doublewords swapped.
+     */
+    switch (es) {
+    case MO_16:
+        tcg_gen_hswap_i64(t0, t0);
+        tcg_gen_hswap_i64(t1, t1);
+        break;
+    case MO_32:
+        tcg_gen_wswap_i64(t0, t0);
+        tcg_gen_wswap_i64(t1, t1);
+        break;
+    case MO_64:
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+write:
+    tcg_gen_qemu_st_i64(t0, o->addr1, get_mem_index(s), MO_LEUQ);
+    gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
+    tcg_gen_qemu_st_i64(t1, o->addr1, get_mem_index(s), MO_LEUQ);
+
+    tcg_temp_free(t0);
+    tcg_temp_free(t1);
+    return DISAS_NEXT;
+}
+
 static DisasJumpType op_vste(DisasContext *s, DisasOps *o)
 {
     const uint8_t es = s->insn->data;
-- 
2.35.1




* [PATCH v6 11/13] target/s390x: vxeh2: vector {load, store} byte reversed element
  2022-04-28  9:46 [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2 David Hildenbrand
                   ` (9 preceding siblings ...)
  2022-04-28  9:47 ` [PATCH v6 10/13] target/s390x: vxeh2: vector {load, store} byte reversed elements David Hildenbrand
@ 2022-04-28  9:47 ` David Hildenbrand
  2022-04-28  9:47 ` [PATCH v6 12/13] target/s390x: add S390_FEAT_VECTOR_ENH2 to qemu CPU model David Hildenbrand
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: David Hildenbrand @ 2022-04-28  9:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, David Hildenbrand, Cornelia Huck, Richard Henderson,
	Eric Farman, David Miller, Halil Pasic, qemu-s390x,
	Christian Borntraeger

From: David Miller <dmiller423@gmail.com>

Signed-off-by: David Miller <dmiller423@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
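
A minimal standalone sketch of the m3 decoding used by op_vllebrz below: values 1 to 3 load a byte-reversed halfword, word or doubleword right-aligned into doubleword 0, while m3 == 6 loads a word and places it in the leftmost half of doubleword 0. The names vec128, ld_le and vllebrz_model are hypothetical, and a false return stands in for the specification exception.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t hi, lo; } vec128;   /* hi = doubleword 0 */

/* Load a little-endian (byte-reversed) value of the given size. */
static uint64_t ld_le(const uint8_t *p, unsigned bytes)
{
    uint64_t v = 0;

    for (unsigned i = bytes; i-- > 0;) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Returns false where the translator would raise a specification exception. */
static bool vllebrz_model(const uint8_t *mem, unsigned m3, vec128 *out)
{
    unsigned es, lshift;

    switch (m3) {
    case 1:                     /* halfword */
    case 2:                     /* word */
    case 3:                     /* doubleword */
        es = m3;
        lshift = 0;
        break;
    case 6:                     /* word, placed in the leftmost half */
        es = 2;
        lshift = 32;
        break;
    default:
        return false;
    }

    out->hi = ld_le(mem, 1u << es) << lshift;
    out->lo = 0;                /* the rest of the vector is zeroed */
    return true;
}
```

With memory bytes 0x11 0x22 0x33 0x44, m3 == 2 produces 0x0000000044332211 in doubleword 0 and m3 == 6 produces 0x4433221100000000.
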
 target/s390x/tcg/insn-data.def      | 12 ++++
 target/s390x/tcg/translate_vx.c.inc | 85 +++++++++++++++++++++++++++++
 2 files changed, 97 insertions(+)

diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
index ee6e1dc9e5..5e448bb2c4 100644
--- a/target/s390x/tcg/insn-data.def
+++ b/target/s390x/tcg/insn-data.def
@@ -1027,6 +1027,14 @@
     F(0xe756, VLR,     VRR_a, V,   0, 0, 0, 0, vlr, 0, IF_VEC)
 /* VECTOR LOAD AND REPLICATE */
     F(0xe705, VLREP,   VRX,   V,   la2, 0, 0, 0, vlrep, 0, IF_VEC)
+/* VECTOR LOAD BYTE REVERSED ELEMENT */
+    E(0xe601, VLEBRH,  VRX,   VE2, la2, 0, 0, 0, vlebr, 0, ES_16, IF_VEC)
+    E(0xe603, VLEBRF,  VRX,   VE2, la2, 0, 0, 0, vlebr, 0, ES_32, IF_VEC)
+    E(0xe602, VLEBRG,  VRX,   VE2, la2, 0, 0, 0, vlebr, 0, ES_64, IF_VEC)
+/* VECTOR LOAD BYTE REVERSED ELEMENT AND REPLICATE */
+    F(0xe605, VLBRREP, VRX,   VE2, la2, 0, 0, 0, vlbrrep, 0, IF_VEC)
+/* VECTOR LOAD BYTE REVERSED ELEMENT AND ZERO */
+    F(0xe604, VLLEBRZ, VRX,   VE2, la2, 0, 0, 0, vllebrz, 0, IF_VEC)
 /* VECTOR LOAD BYTE REVERSED ELEMENTS */
     F(0xe606, VLBR,    VRX,   VE2, la2, 0, 0, 0, vlbr, 0, IF_VEC)
 /* VECTOR LOAD ELEMENT */
@@ -1081,6 +1089,10 @@
     F(0xe75f, VSEG,    VRR_a, V,   0, 0, 0, 0, vseg, 0, IF_VEC)
 /* VECTOR STORE */
     F(0xe70e, VST,     VRX,   V,   la2, 0, 0, 0, vst, 0, IF_VEC)
+/* VECTOR STORE BYTE REVERSED ELEMENT */
+    E(0xe609, VSTEBRH,  VRX,   VE2, la2, 0, 0, 0, vstebr, 0, ES_16, IF_VEC)
+    E(0xe60b, VSTEBRF,  VRX,   VE2, la2, 0, 0, 0, vstebr, 0, ES_32, IF_VEC)
+    E(0xe60a, VSTEBRG,  VRX,   VE2, la2, 0, 0, 0, vstebr, 0, ES_64, IF_VEC)
 /* VECTOR STORE BYTE REVERSED ELEMENTS */
     F(0xe60e, VSTBR,    VRX,   VE2, la2, 0, 0, 0, vstbr, 0, IF_VEC)
 /* VECTOR STORE ELEMENT */
diff --git a/target/s390x/tcg/translate_vx.c.inc b/target/s390x/tcg/translate_vx.c.inc
index 75f3fd7edd..3526ba3e3b 100644
--- a/target/s390x/tcg/translate_vx.c.inc
+++ b/target/s390x/tcg/translate_vx.c.inc
@@ -457,6 +457,73 @@ static DisasJumpType op_vlrep(DisasContext *s, DisasOps *o)
     return DISAS_NEXT;
 }
 
+static DisasJumpType op_vlebr(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = s->insn->data;
+    const uint8_t enr = get_field(s, m3);
+    TCGv_i64 tmp;
+
+    if (!valid_vec_element(enr, es)) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    tmp = tcg_temp_new_i64();
+    tcg_gen_qemu_ld_i64(tmp, o->addr1, get_mem_index(s), MO_LE | es);
+    write_vec_element_i64(tmp, get_field(s, v1), enr, es);
+    tcg_temp_free_i64(tmp);
+    return DISAS_NEXT;
+}
+
+static DisasJumpType op_vlbrrep(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = get_field(s, m3);
+    TCGv_i64 tmp;
+
+    if (es < ES_16 || es > ES_64) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    tmp = tcg_temp_new_i64();
+    tcg_gen_qemu_ld_i64(tmp, o->addr1, get_mem_index(s), MO_LE | es);
+    gen_gvec_dup_i64(es, get_field(s, v1), tmp);
+    tcg_temp_free_i64(tmp);
+    return DISAS_NEXT;
+}
+
+static DisasJumpType op_vllebrz(DisasContext *s, DisasOps *o)
+{
+    const uint8_t m3 = get_field(s, m3);
+    TCGv_i64 tmp;
+    int es, lshift;
+
+    switch (m3) {
+    case ES_16:
+    case ES_32:
+    case ES_64:
+        es = m3;
+        lshift = 0;
+        break;
+    case 6:
+        es = ES_32;
+        lshift = 32;
+        break;
+    default:
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    tmp = tcg_temp_new_i64();
+    tcg_gen_qemu_ld_i64(tmp, o->addr1, get_mem_index(s), MO_LE | es);
+    tcg_gen_shli_i64(tmp, tmp, lshift);
+
+    write_vec_element_i64(tmp, get_field(s, v1), 0, ES_64);
+    write_vec_element_i64(tcg_constant_i64(0), get_field(s, v1), 1, ES_64);
+    tcg_temp_free_i64(tmp);
+    return DISAS_NEXT;
+}
+
 static DisasJumpType op_vlbr(DisasContext *s, DisasOps *o)
 {
     const uint8_t es = get_field(s, m3);
@@ -1054,6 +1121,24 @@ static DisasJumpType op_vst(DisasContext *s, DisasOps *o)
     return DISAS_NEXT;
 }
 
+static DisasJumpType op_vstebr(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = s->insn->data;
+    const uint8_t enr = get_field(s, m3);
+    TCGv_i64 tmp;
+
+    if (!valid_vec_element(enr, es)) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    tmp = tcg_temp_new_i64();
+    read_vec_element_i64(tmp, get_field(s, v1), enr, es);
+    tcg_gen_qemu_st_i64(tmp, o->addr1, get_mem_index(s), MO_LE | es);
+    tcg_temp_free_i64(tmp);
+    return DISAS_NEXT;
+}
+
 static DisasJumpType op_vstbr(DisasContext *s, DisasOps *o)
 {
     const uint8_t es = get_field(s, m3);
-- 
2.35.1
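
As a reading aid for op_vllebrz in the patch above, here is a minimal sketch in portable C of the m3 handling that the translator implements. It mirrors only what the diff shows: m3 values 1 to 3 load a byte-reversed element right-aligned into the leftmost doubleword, m3 = 6 loads a 32-bit element and shifts it left by 32, the rightmost doubleword is zeroed, and anything else raises a specification exception. The names Vec128, load_le and model_vllebrz are hypothetical and not part of QEMU; the byte values are taken from the ss64 constant in the vxeh2_vlstr test.

```c
#include <assert.h>
#include <stdint.h>

/* Element-size codes carried in m3 (same values as ES16/ES32/ES64 in vx.h). */
enum { ES_16 = 1, ES_32 = 2, ES_64 = 3 };

/* Hypothetical stand-in for a vector register; d[0] is the leftmost doubleword. */
typedef struct { uint64_t d[2]; } Vec128;

/* Little-endian load of (1 << es) bytes: first byte in memory is least significant. */
static uint64_t load_le(const uint8_t *mem, int es)
{
    uint64_t val = 0;

    for (int i = (1 << es) - 1; i >= 0; i--) {
        val = (val << 8) | mem[i];
    }
    return val;
}

/* Returns 0 on success, -1 where the translator raises PGM_SPECIFICATION. */
static int model_vllebrz(Vec128 *v1, const uint8_t *mem, uint8_t m3)
{
    int es, lshift;

    switch (m3) {
    case ES_16:
    case ES_32:
    case ES_64:
        es = m3;
        lshift = 0;
        break;
    case 6:                 /* 32-bit element, left-aligned in doubleword 0 */
        es = ES_32;
        lshift = 32;
        break;
    default:
        return -1;
    }

    v1->d[0] = load_le(mem, es) << lshift;
    v1->d[1] = 0;
    return 0;
}

/* Re-checks the vllebrz expectation from the vxeh2_vlstr test. */
static void check_vllebrz_model(void)
{
    /* 0xFEEDFACE0BADBEEF as it sits in big-endian guest memory. */
    static const uint8_t ss64[8] = {
        0xFE, 0xED, 0xFA, 0xCE, 0x0B, 0xAD, 0xBE, 0xEF };
    Vec128 vd = { { ~0ull, ~0ull } };

    assert(model_vllebrz(&vd, ss64 + 3, ES_32) == 0);
    assert(vd.d[0] == 0x00000000BEAD0BCEull && vd.d[1] == 0);
}
```

Calling check_vllebrz_model() reproduces the value the test looks for in word 1 of the result. The sketch deliberately ignores everything the real translator does around the load (memory index, exceptions on access, TCG temporaries).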




* [PATCH v6 12/13] target/s390x: add S390_FEAT_VECTOR_ENH2 to qemu CPU model
  2022-04-28  9:46 [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2 David Hildenbrand
                   ` (10 preceding siblings ...)
  2022-04-28  9:47 ` [PATCH v6 11/13] target/s390x: vxeh2: vector {load, store} byte reversed element David Hildenbrand
@ 2022-04-28  9:47 ` David Hildenbrand
  2022-04-28  9:47 ` [PATCH v6 13/13] tests/tcg/s390x: Tests for Vector Enhancements Facility 2 David Hildenbrand
  2022-05-02  7:20 ` [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements " Thomas Huth
  13 siblings, 0 replies; 26+ messages in thread
From: David Hildenbrand @ 2022-04-28  9:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, David Hildenbrand, Cornelia Huck, Richard Henderson,
	Eric Farman, David Miller, Halil Pasic, qemu-s390x,
	Christian Borntraeger

From: David Miller <dmiller423@gmail.com>

Signed-off-by: David Miller <dmiller423@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
[ dh: take care of compat machines ]
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/s390x/s390-virtio-ccw.c  | 3 +++
 target/s390x/gen-features.c | 7 ++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 8fa488d13a..047cca0487 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -803,7 +803,10 @@ DEFINE_CCW_MACHINE(7_1, "7.1", true);
 
 static void ccw_machine_7_0_instance_options(MachineState *machine)
 {
+    static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V7_0 };
+
     ccw_machine_7_1_instance_options(machine);
+    s390_set_qemu_cpu_model(0x8561, 15, 1, qemu_cpu_feat);
 }
 
 static void ccw_machine_7_0_class_options(MachineClass *mc)
diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index ec7d8ceab5..c03ec2c9a9 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -738,13 +738,17 @@ static uint16_t qemu_V6_2[] = {
     S390_FEAT_VECTOR_ENH,
 };
 
+static uint16_t qemu_V7_0[] = {
+    S390_FEAT_MISC_INSTRUCTION_EXT3,
+};
+
 /*
  * Features for the "qemu" CPU model of the latest QEMU machine and the "max"
  * CPU model under TCG. Don't include features that are not part of the full
  * feature set of the current "max" CPU model generation.
  */
 static uint16_t qemu_MAX[] = {
-    S390_FEAT_MISC_INSTRUCTION_EXT3,
+    S390_FEAT_VECTOR_ENH2,
 };
 
 /****** END FEATURE DEFS ******/
@@ -866,6 +870,7 @@ static FeatGroupDefSpec QemuFeatDef[] = {
     QEMU_FEAT_INITIALIZER(V4_1),
     QEMU_FEAT_INITIALIZER(V6_0),
     QEMU_FEAT_INITIALIZER(V6_2),
+    QEMU_FEAT_INITIALIZER(V7_0),
     QEMU_FEAT_INITIALIZER(MAX),
 };
 
-- 
2.35.1




* [PATCH v6 13/13] tests/tcg/s390x: Tests for Vector Enhancements Facility 2
  2022-04-28  9:46 [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2 David Hildenbrand
                   ` (11 preceding siblings ...)
  2022-04-28  9:47 ` [PATCH v6 12/13] target/s390x: add S390_FEAT_VECTOR_ENH2 to qemu CPU model David Hildenbrand
@ 2022-04-28  9:47 ` David Hildenbrand
  2022-05-02  8:12   ` Thomas Huth
  2022-05-02  9:35   ` Thomas Huth
  2022-05-02  7:20 ` [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements " Thomas Huth
  13 siblings, 2 replies; 26+ messages in thread
From: David Hildenbrand @ 2022-04-28  9:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, David Hildenbrand, Cornelia Huck, Richard Henderson,
	Eric Farman, David Miller, Halil Pasic, qemu-s390x,
	Christian Borntraeger

From: David Miller <dmiller423@gmail.com>

Signed-off-by: David Miller <dmiller423@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 tests/tcg/s390x/Makefile.target |   8 ++
 tests/tcg/s390x/vx.h            |  19 +++++
 tests/tcg/s390x/vxeh2_vcvt.c    |  88 ++++++++++++++++++++
 tests/tcg/s390x/vxeh2_vlstr.c   | 139 ++++++++++++++++++++++++++++++++
 tests/tcg/s390x/vxeh2_vs.c      |  93 +++++++++++++++++++++
 5 files changed, 347 insertions(+)
 create mode 100644 tests/tcg/s390x/vx.h
 create mode 100644 tests/tcg/s390x/vxeh2_vcvt.c
 create mode 100644 tests/tcg/s390x/vxeh2_vlstr.c
 create mode 100644 tests/tcg/s390x/vxeh2_vs.c

diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index f0d474a245..e50d617f21 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -17,6 +17,14 @@ TESTS+=trap
 TESTS+=signals-s390x
 TESTS+=branch-relative-long
 
+VECTOR_TESTS=vxeh2_vs
+VECTOR_TESTS+=vxeh2_vcvt
+VECTOR_TESTS+=vxeh2_vlstr
+
+TESTS+=$(VECTOR_TESTS)
+
+$(VECTOR_TESTS): CFLAGS+=-march=z15 -O2
+
 ifneq ($(HAVE_GDB_BIN),)
 GDB_SCRIPT=$(SRC_PATH)/tests/guest-debug/run-test.py
 
diff --git a/tests/tcg/s390x/vx.h b/tests/tcg/s390x/vx.h
new file mode 100644
index 0000000000..02e7fd518a
--- /dev/null
+++ b/tests/tcg/s390x/vx.h
@@ -0,0 +1,19 @@
+#ifndef QEMU_TESTS_S390X_VX_H
+#define QEMU_TESTS_S390X_VX_H
+
+typedef union S390Vector {
+    uint64_t d[2];  /* doubleword */
+    uint32_t w[4];  /* word */
+    uint16_t h[8];  /* halfword */
+    uint8_t  b[16]; /* byte */
+    float    f[4];  /* float32 */
+    double   fd[2]; /* float64 */
+    __uint128_t v;
+} S390Vector;
+
+#define ES8  0
+#define ES16 1
+#define ES32 2
+#define ES64 3
+
+#endif /* QEMU_TESTS_S390X_VX_H */
diff --git a/tests/tcg/s390x/vxeh2_vcvt.c b/tests/tcg/s390x/vxeh2_vcvt.c
new file mode 100644
index 0000000000..d6e551c16e
--- /dev/null
+++ b/tests/tcg/s390x/vxeh2_vcvt.c
@@ -0,0 +1,88 @@
+/*
+ * vxeh2_vcvt: vector-enhancements facility 2 vector convert
+ */
+#include <stdint.h>
+#include "vx.h"
+
+#define M_S 8
+#define M4_XxC 4
+#define M4_def M4_XxC
+
+static inline void vcfps(S390Vector *v1, S390Vector *v2,
+    const uint8_t m3,  const uint8_t m4,  const uint8_t m5)
+{
+    asm volatile("vcfps %[v1], %[v2], %[m3], %[m4], %[m5]\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [m3]  "i" (m3)
+                , [m4]  "i" (m4)
+                , [m5]  "i" (m5));
+}
+
+static inline void vcfpl(S390Vector *v1, S390Vector *v2,
+    const uint8_t m3,  const uint8_t m4,  const uint8_t m5)
+{
+    asm volatile("vcfpl %[v1], %[v2], %[m3], %[m4], %[m5]\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [m3]  "i" (m3)
+                , [m4]  "i" (m4)
+                , [m5]  "i" (m5));
+}
+
+static inline void vcsfp(S390Vector *v1, S390Vector *v2,
+    const uint8_t m3,  const uint8_t m4,  const uint8_t m5)
+{
+    asm volatile("vcsfp %[v1], %[v2], %[m3], %[m4], %[m5]\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [m3]  "i" (m3)
+                , [m4]  "i" (m4)
+                , [m5]  "i" (m5));
+}
+
+static inline void vclfp(S390Vector *v1, S390Vector *v2,
+    const uint8_t m3,  const uint8_t m4,  const uint8_t m5)
+{
+    asm volatile("vclfp %[v1], %[v2], %[m3], %[m4], %[m5]\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [m3]  "i" (m3)
+                , [m4]  "i" (m4)
+                , [m5]  "i" (m5));
+}
+
+int main(int argc, char *argv[])
+{
+    S390Vector vd;
+    S390Vector vs_i32 = { .w[0] = 1, .w[1] = 64, .w[2] = 1024, .w[3] = -10 };
+    S390Vector vs_u32 = { .w[0] = 2, .w[1] = 32, .w[2] = 4096, .w[3] = 8888 };
+    S390Vector vs_f32 = { .f[0] = 3.987, .f[1] = 5.123,
+                          .f[2] = 4.499, .f[3] = 0.512 };
+
+    vd.d[0] = vd.d[1] = 0;
+    vcfps(&vd, &vs_i32, 2, M4_def, 0);
+    if (1 != vd.f[0] || 1024 != vd.f[2] || 64 != vd.f[1] || -10 != vd.f[3]) {
+        return 1;
+    }
+
+    vd.d[0] = vd.d[1] = 0;
+    vcfpl(&vd, &vs_u32, 2, M4_def, 0);
+    if (2 != vd.f[0] || 4096 != vd.f[2] || 32 != vd.f[1] || 8888 != vd.f[3]) {
+        return 1;
+    }
+
+    vd.d[0] = vd.d[1] = 0;
+    vcsfp(&vd, &vs_f32, 2, M4_def, 0);
+    if (4 != vd.w[0] || 4 != vd.w[2] || 5 != vd.w[1] || 1 != vd.w[3]) {
+        return 1;
+    }
+
+    vd.d[0] = vd.d[1] = 0;
+    vclfp(&vd, &vs_f32, 2, M4_def, 0);
+    if (4 != vd.w[0] || 4 != vd.w[2] || 5 != vd.w[1] || 1 != vd.w[3]) {
+        return 1;
+    }
+
+    return 0;
+}
diff --git a/tests/tcg/s390x/vxeh2_vlstr.c b/tests/tcg/s390x/vxeh2_vlstr.c
new file mode 100644
index 0000000000..5677bf7c29
--- /dev/null
+++ b/tests/tcg/s390x/vxeh2_vlstr.c
@@ -0,0 +1,139 @@
+/*
+ * vxeh2_vlstr: vector-enhancements facility 2 vector load/store reversed
+ */
+#include <stdint.h>
+#include "vx.h"
+
+#define vtst(v1, v2) \
+    if (v1.d[0] != v2.d[0] || v1.d[1] != v2.d[1]) { \
+        return 1;     \
+    }
+
+static inline void vler(S390Vector *v1, const void *va, uint8_t m3)
+{
+    asm volatile("vler %[v1], 0(%[va]), %[m3]\n"
+                : [v1] "+v" (v1->v)
+                : [va]  "d" (va)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+static inline void vster(S390Vector *v1, const void *va, uint8_t m3)
+{
+    asm volatile("vster %[v1], 0(%[va]), %[m3]\n"
+                : [va] "+d" (va)
+                : [v1]  "v" (v1->v)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+static inline void vlbr(S390Vector *v1, void *va, const uint8_t m3)
+{
+    asm volatile("vlbr %[v1], 0(%[va]), %[m3]\n"
+                : [v1] "+v" (v1->v)
+                : [va]  "d" (va)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+static inline void vstbr(S390Vector *v1, void *va, const uint8_t m3)
+{
+    asm volatile("vstbr %[v1], 0(%[va]), %[m3]\n"
+                : [va] "+d" (va)
+                : [v1]  "v" (v1->v)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+
+static inline void vlebrh(S390Vector *v1, void *va, const uint8_t m3)
+{
+    asm volatile("vlebrh %[v1], 0(%[va]), %[m3]\n"
+                : [v1] "+v" (v1->v)
+                : [va]  "d" (va)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+static inline void vstebrh(S390Vector *v1, void *va, const uint8_t m3)
+{
+    asm volatile("vstebrh %[v1], 0(%[va]), %[m3]\n"
+                : [va] "+d" (va)
+                : [v1]  "v" (v1->v)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+static inline void vllebrz(S390Vector *v1, void *va, const uint8_t m3)
+{
+    asm volatile("vllebrz %[v1], 0(%[va]), %[m3]\n"
+                : [v1] "+v" (v1->v)
+                : [va]  "d" (va)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+static inline void vlbrrep(S390Vector *v1, void *va, const uint8_t m3)
+{
+    asm volatile("vlbrrep %[v1], 0(%[va]), %[m3]\n"
+                : [v1] "+v" (v1->v)
+                : [va]  "d" (va)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+int main(int argc, char *argv[])
+{
+    S390Vector vd = { .d[0] = 0, .d[1] = 0 };
+    S390Vector vs = { .d[0] = 0x8FEEDDCCBBAA9988ull,
+                      .d[1] = 0x7766554433221107ull };
+
+    const S390Vector vt_v_er16 = {
+        .h[0] = 0x1107, .h[1] = 0x3322, .h[2] = 0x5544, .h[3] = 0x7766,
+        .h[4] = 0x9988, .h[5] = 0xBBAA, .h[6] = 0xDDCC, .h[7] = 0x8FEE };
+
+    const S390Vector vt_v_br16 = {
+        .h[0] = 0xEE8F, .h[1] = 0xCCDD, .h[2] = 0xAABB, .h[3] = 0x8899,
+        .h[4] = 0x6677, .h[5] = 0x4455, .h[6] = 0x2233, .h[7] = 0x0711 };
+
+    int ix;
+    uint64_t ss64 = 0xFEEDFACE0BADBEEFull, sd64 = 0;
+
+    vler(&vd, &vs, ES16);
+    vtst(vd, vt_v_er16);
+
+    vster(&vs, &vd, ES16);
+    vtst(vd, vt_v_er16);
+
+    vlbr(&vd, &vs, ES16);
+    vtst(vd, vt_v_br16);
+
+    vstbr(&vs, &vd, ES16);
+    vtst(vd, vt_v_br16);
+
+    vlebrh(&vd, &ss64, 5);
+    if (0xEDFE != vd.h[5]) {
+        return 1;
+    }
+
+    vstebrh(&vs, (uint8_t *)&sd64 + 4, 7);
+    if (0x0000000007110000ull != sd64) {
+        return 1;
+    }
+
+    vllebrz(&vd, (uint8_t *)&ss64 + 3, 2);
+    for (ix = 0; ix < 4; ix++) {
+        if (vd.w[ix] != (ix != 1 ? 0 : 0xBEAD0BCE)) {
+            return 1;
+        }
+    }
+
+    vlbrrep(&vd, (uint8_t *)&ss64 + 4, 1);
+    for (ix = 0; ix < 8; ix++) {
+        if (0xAD0B != vd.h[ix]) {
+            return 1;
+        }
+    }
+
+    return 0;
+}
diff --git a/tests/tcg/s390x/vxeh2_vs.c b/tests/tcg/s390x/vxeh2_vs.c
new file mode 100644
index 0000000000..b7ef419d79
--- /dev/null
+++ b/tests/tcg/s390x/vxeh2_vs.c
@@ -0,0 +1,93 @@
+/*
+ * vxeh2_vs: vector-enhancements facility 2 vector shift
+ */
+#include <stdint.h>
+#include "vx.h"
+
+#define vtst(v1, v2) \
+    if (v1.d[0] != v2.d[0] || v1.d[1] != v2.d[1]) { \
+        return 1;     \
+    }
+
+static inline void vsl(S390Vector *v1, S390Vector *v2, S390Vector *v3)
+{
+    asm volatile("vsl %[v1], %[v2], %[v3]\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [v3]  "v" (v3->v));
+}
+
+static inline void vsra(S390Vector *v1, S390Vector *v2, S390Vector *v3)
+{
+    asm volatile("vsra %[v1], %[v2], %[v3]\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [v3]  "v" (v3->v));
+}
+
+static inline void vsrl(S390Vector *v1, S390Vector *v2, S390Vector *v3)
+{
+    asm volatile("vsrl %[v1], %[v2], %[v3]\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [v3]  "v" (v3->v));
+}
+
+static inline void vsld(S390Vector *v1, S390Vector *v2,
+    S390Vector *v3, const uint8_t I)
+{
+    asm volatile("vsld %[v1], %[v2], %[v3], %[I]\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [v3]  "v" (v3->v)
+                , [I]   "i" (I & 7));
+}
+
+static inline void vsrd(S390Vector *v1, S390Vector *v2,
+    S390Vector *v3, const uint8_t I)
+{
+    asm volatile("vsrd %[v1], %[v2], %[v3], %[I]\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [v3]  "v" (v3->v)
+                , [I]   "i" (I & 7));
+}
+
+int main(int argc, char *argv[])
+{
+    const S390Vector vt_vsl  = { .d[0] = 0x7FEDBB32D5AA311Dull,
+                                 .d[1] = 0xBB65AA10912220C0ull };
+    const S390Vector vt_vsra = { .d[0] = 0xF1FE6E7399AA5466ull,
+                                 .d[1] = 0x0E762A5188221044ull };
+    const S390Vector vt_vsrl = { .d[0] = 0x11FE6E7399AA5466ull,
+                                 .d[1] = 0x0E762A5188221044ull };
+    const S390Vector vt_vsld = { .d[0] = 0x7F76EE65DD54CC43ull,
+                                 .d[1] = 0xBB32AA2199108838ull };
+    const S390Vector vt_vsrd = { .d[0] = 0x0E060802040E000Aull,
+                                 .d[1] = 0x0C060802040E000Aull };
+    S390Vector vs  = { .d[0] = 0x8FEEDDCCBBAA9988ull,
+                       .d[1] = 0x7766554433221107ull };
+    S390Vector  vd = { .d[0] = 0, .d[1] = 0 };
+    S390Vector vsi = { .d[0] = 0, .d[1] = 0 };
+
+    for (int ix = 0; ix < 16; ix++) {
+        vsi.b[ix] = (1 + (5 ^ ~ix)) & 7;
+    }
+
+    vsl(&vd, &vs, &vsi);
+    vtst(vd, vt_vsl);
+
+    vsra(&vd, &vs, &vsi);
+    vtst(vd, vt_vsra);
+
+    vsrl(&vd, &vs, &vsi);
+    vtst(vd, vt_vsrl);
+
+    vsld(&vd, &vs, &vsi, 3);
+    vtst(vd, vt_vsld);
+
+    vsrd(&vd, &vs, &vsi, 15);
+    vtst(vd, vt_vsrd);
+
+    return 0;
+}
-- 
2.35.1
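
To make the expected vectors in vxeh2_vlstr.c easier to verify by hand, the following sketch models the two reversal flavours in portable C: "elements reversed" (as exercised by vler/vster) swaps the order of the elements but keeps the bytes inside each one, while "byte reversed elements" (vlbr/vstbr) keeps the element order and reverses the bytes within each element. The function names are hypothetical, the vector is modelled as 16 bytes in left-to-right order, and this is only an illustration of the semantics the test relies on, not the QEMU implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reverse the order of the elements; bytes inside an element stay as-is. */
static void elements_reversed(uint8_t dst[16], const uint8_t src[16], int es)
{
    const int size = 1 << es;

    for (int i = 0; i < 16 / size; i++) {
        memcpy(dst + 16 - (i + 1) * size, src + i * size, size);
    }
}

/* Keep the element order; reverse the bytes inside each element. */
static void byte_reversed_elements(uint8_t dst[16], const uint8_t src[16],
                                   int es)
{
    const int size = 1 << es;

    for (int i = 0; i < 16; i++) {
        dst[i] = src[(i / size) * size + (size - 1 - i % size)];
    }
}

/* Re-derives vt_v_er16 and vt_v_br16 from vs, using halfwords (es = 1). */
static void check_reversal_model(void)
{
    /* vs from the test, written out byte by byte from left to right. */
    static const uint8_t vs[16] = {
        0x8F, 0xEE, 0xDD, 0xCC, 0xBB, 0xAA, 0x99, 0x88,
        0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11, 0x07 };
    static const uint8_t er16[16] = {   /* vt_v_er16 */
        0x11, 0x07, 0x33, 0x22, 0x55, 0x44, 0x77, 0x66,
        0x99, 0x88, 0xBB, 0xAA, 0xDD, 0xCC, 0x8F, 0xEE };
    static const uint8_t br16[16] = {   /* vt_v_br16 */
        0xEE, 0x8F, 0xCC, 0xDD, 0xAA, 0xBB, 0x88, 0x99,
        0x66, 0x77, 0x44, 0x55, 0x22, 0x33, 0x07, 0x11 };
    uint8_t out[16];

    elements_reversed(out, vs, 1);
    assert(memcmp(out, er16, 16) == 0);

    byte_reversed_elements(out, vs, 1);
    assert(memcmp(out, br16, 16) == 0);
}
```

Both helpers assume that dst and src do not overlap. With es = 3 the first helper simply swaps the two doublewords, which is a quick sanity check on the indexing.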




* Re: [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2
  2022-04-28  9:46 [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2 David Hildenbrand
                   ` (12 preceding siblings ...)
  2022-04-28  9:47 ` [PATCH v6 13/13] tests/tcg/s390x: Tests for Vector Enhancements Facility 2 David Hildenbrand
@ 2022-05-02  7:20 ` Thomas Huth
  2022-05-02 15:52   ` David Hildenbrand
  13 siblings, 1 reply; 26+ messages in thread
From: Thomas Huth @ 2022-05-02  7:20 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel, David Miller
  Cc: Eric Farman, Cornelia Huck, Richard Henderson, Halil Pasic,
	qemu-s390x, Christian Borntraeger

On 28/04/2022 11.46, David Hildenbrand wrote:
> Implement Vector-Enhancements Facility 2 for s390x
> 
> resolves: https://gitlab.com/qemu-project/qemu/-/issues/738
> 
> implements:
>      VECTOR LOAD ELEMENTS REVERSED               (VLER)
>      VECTOR LOAD BYTE REVERSED ELEMENTS          (VLBR)
>      VECTOR LOAD BYTE REVERSED ELEMENT           (VLEBRH, VLEBRF, VLEBRG)
>      VECTOR LOAD BYTE REVERSED ELEMENT AND ZERO  (VLLEBRZ)
>      VECTOR LOAD BYTE REVERSED ELEMENT AND REPLICATE (VLBRREP)
>      VECTOR STORE ELEMENTS REVERSED              (VSTER)
>      VECTOR STORE BYTE REVERSED ELEMENTS         (VSTBR)
>      VECTOR STORE BYTE REVERSED ELEMENT          (VSTEBRH, VSTEBRF, VSTEBRG)
>      VECTOR SHIFT LEFT DOUBLE BY BIT             (VSLD)
>      VECTOR SHIFT RIGHT DOUBLE BY BIT            (VSRD)
>      VECTOR STRING SEARCH                        (VSTRS)
> 
>      modifies:
>      VECTOR FP CONVERT FROM FIXED                (VCFPS)
>      VECTOR FP CONVERT FROM LOGICAL              (VCFPL)
>      VECTOR FP CONVERT TO FIXED                  (VCSFP)
>      VECTOR FP CONVERT TO LOGICAL                (VCLFP)
>      VECTOR SHIFT LEFT                           (VSL)
>      VECTOR SHIFT RIGHT ARITHMETIC               (VSRA)
>      VECTOR SHIFT RIGHT LOGICAL                  (VSRL)

Thanks, queued to my s390x-next branch now:

  https://gitlab.com/thuth/qemu/-/commits/s390x-next/

  Thomas





* Re: [PATCH v6 13/13] tests/tcg/s390x: Tests for Vector Enhancements Facility 2
  2022-04-28  9:47 ` [PATCH v6 13/13] tests/tcg/s390x: Tests for Vector Enhancements Facility 2 David Hildenbrand
@ 2022-05-02  8:12   ` Thomas Huth
  2022-05-02  9:10     ` Thomas Huth
  2022-05-02  9:35   ` Thomas Huth
  1 sibling, 1 reply; 26+ messages in thread
From: Thomas Huth @ 2022-05-02  8:12 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel, David Miller, Christian Borntraeger
  Cc: Halil Pasic, qemu-s390x, Cornelia Huck, Richard Henderson, Eric Farman

On 28/04/2022 11.47, David Hildenbrand wrote:
> From: David Miller <dmiller423@gmail.com>
> 
> Signed-off-by: David Miller <dmiller423@gmail.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> Tested-by: Thomas Huth <thuth@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>   tests/tcg/s390x/Makefile.target |   8 ++
>   tests/tcg/s390x/vx.h            |  19 +++++
>   tests/tcg/s390x/vxeh2_vcvt.c    |  88 ++++++++++++++++++++
>   tests/tcg/s390x/vxeh2_vlstr.c   | 139 ++++++++++++++++++++++++++++++++
>   tests/tcg/s390x/vxeh2_vs.c      |  93 +++++++++++++++++++++
>   5 files changed, 347 insertions(+)
>   create mode 100644 tests/tcg/s390x/vx.h
>   create mode 100644 tests/tcg/s390x/vxeh2_vcvt.c
>   create mode 100644 tests/tcg/s390x/vxeh2_vlstr.c
>   create mode 100644 tests/tcg/s390x/vxeh2_vs.c
> 
> diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
> index f0d474a245..e50d617f21 100644
> --- a/tests/tcg/s390x/Makefile.target
> +++ b/tests/tcg/s390x/Makefile.target
> @@ -17,6 +17,14 @@ TESTS+=trap
>   TESTS+=signals-s390x
>   TESTS+=branch-relative-long
>   
> +VECTOR_TESTS=vxeh2_vs
> +VECTOR_TESTS+=vxeh2_vcvt
> +VECTOR_TESTS+=vxeh2_vlstr
> +
> +TESTS+=$(VECTOR_TESTS)
> +
> +$(VECTOR_TESTS): CFLAGS+=-march=z15 -O2

I'm sorry, but this still fails in the QEMU CI:

https://gitlab.com/thuth/qemu/-/jobs/2401500348

s390x-linux-gnu-gcc: error: unrecognized argument in option '-march=z15'

I think we either have to switch to manually encoded instructions again, or 
add a check to the Makefile and only add the tests if the compiler supports 
-march=z15 ...? Opinions? Preferences?

  Thomas




* Re: [PATCH v6 13/13] tests/tcg/s390x: Tests for Vector Enhancements Facility 2
  2022-05-02  8:12   ` Thomas Huth
@ 2022-05-02  9:10     ` Thomas Huth
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Huth @ 2022-05-02  9:10 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel, David Miller, Christian Borntraeger
  Cc: Halil Pasic, qemu-s390x, Cornelia Huck, Richard Henderson, Eric Farman

On 02/05/2022 10.12, Thomas Huth wrote:
> On 28/04/2022 11.47, David Hildenbrand wrote:
>> From: David Miller <dmiller423@gmail.com>
>>
>> Signed-off-by: David Miller <dmiller423@gmail.com>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> Tested-by: Thomas Huth <thuth@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   tests/tcg/s390x/Makefile.target |   8 ++
>>   tests/tcg/s390x/vx.h            |  19 +++++
>>   tests/tcg/s390x/vxeh2_vcvt.c    |  88 ++++++++++++++++++++
>>   tests/tcg/s390x/vxeh2_vlstr.c   | 139 ++++++++++++++++++++++++++++++++
>>   tests/tcg/s390x/vxeh2_vs.c      |  93 +++++++++++++++++++++
>>   5 files changed, 347 insertions(+)
>>   create mode 100644 tests/tcg/s390x/vx.h
>>   create mode 100644 tests/tcg/s390x/vxeh2_vcvt.c
>>   create mode 100644 tests/tcg/s390x/vxeh2_vlstr.c
>>   create mode 100644 tests/tcg/s390x/vxeh2_vs.c
>>
>> diff --git a/tests/tcg/s390x/Makefile.target 
>> b/tests/tcg/s390x/Makefile.target
>> index f0d474a245..e50d617f21 100644
>> --- a/tests/tcg/s390x/Makefile.target
>> +++ b/tests/tcg/s390x/Makefile.target
>> @@ -17,6 +17,14 @@ TESTS+=trap
>>   TESTS+=signals-s390x
>>   TESTS+=branch-relative-long
>> +VECTOR_TESTS=vxeh2_vs
>> +VECTOR_TESTS+=vxeh2_vcvt
>> +VECTOR_TESTS+=vxeh2_vlstr
>> +
>> +TESTS+=$(VECTOR_TESTS)
>> +
>> +$(VECTOR_TESTS): CFLAGS+=-march=z15 -O2
> 
> I'm sorry, but this still fails in the QEMU CI:
> 
> https://gitlab.com/thuth/qemu/-/jobs/2401500348
> 
> s390x-linux-gnu-gcc: error: unrecognized argument in option '-march=z15'
> 
> I think we either have to switch to manually encoded instructions again, or 
> add a check to the Makefile and only add the tests if the compiler supports 
> -march=z15 ...? Opinions? Preferences?

I just tried it, and something like this seems to do the job:

diff a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -20,11 +20,11 @@ TESTS+=branch-relative-long
  VECTOR_TESTS=vxeh2_vs
  VECTOR_TESTS+=vxeh2_vcvt
  VECTOR_TESTS+=vxeh2_vlstr
-
-TESTS+=$(VECTOR_TESTS)
-
  $(VECTOR_TESTS): CFLAGS+=-march=z15 -O2
  
+TESTS+=$(if $(shell $(CC) -march=z15 -S -o /dev/null -xc /dev/null \
+                        >/dev/null 2>&1 && echo OK),$(VECTOR_TESTS))
+
  ifneq ($(HAVE_GDB_BIN),)
  GDB_SCRIPT=$(SRC_PATH)/tests/guest-debug/run-test.py
  
Does that look reasonable?

  Thomas




* Re: [PATCH v6 13/13] tests/tcg/s390x: Tests for Vector Enhancements Facility 2
  2022-04-28  9:47 ` [PATCH v6 13/13] tests/tcg/s390x: Tests for Vector Enhancements Facility 2 David Hildenbrand
  2022-05-02  8:12   ` Thomas Huth
@ 2022-05-02  9:35   ` Thomas Huth
  1 sibling, 0 replies; 26+ messages in thread
From: Thomas Huth @ 2022-05-02  9:35 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Eric Farman, Cornelia Huck, Richard Henderson, David Miller,
	Halil Pasic, qemu-s390x, Christian Borntraeger

On 28/04/2022 11.47, David Hildenbrand wrote:
> From: David Miller <dmiller423@gmail.com>
> 
> Signed-off-by: David Miller <dmiller423@gmail.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> Tested-by: Thomas Huth <thuth@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
[...]
> diff --git a/tests/tcg/s390x/vxeh2_vlstr.c b/tests/tcg/s390x/vxeh2_vlstr.c
> new file mode 100644
> index 0000000000..5677bf7c29
> --- /dev/null
> +++ b/tests/tcg/s390x/vxeh2_vlstr.c
> @@ -0,0 +1,139 @@
> +/*
> + * vxeh2_vlstr: vector-enhancements facility 2 vector load/store reversed
> + */
> +#include <stdint.h>
> +#include "vx.h"
> +
> +#define vtst(v1, v2) \
> +    if (v1.d[0] != v2.d[0] || v1.d[1] != v2.d[1]) { \
> +        return 1;     \
> +    }
> +
> +static inline void vler(S390Vector *v1, const void *va, uint8_t m3)
> +{
> +    asm volatile("vler %[v1], 0(%[va]), %[m3]\n"
> +                : [v1] "+v" (v1->v)
> +                : [va]  "d" (va)
> +                , [m3]  "i" (m3)
> +                : "memory");
> +}

The vxeh2_vlstr test fails when compiled with Clang instead of GCC: Clang 
picks register r0 for the operands that use the "d" constraint in the 
inline assembly here, and r0 cannot serve as a base register. Switching 
those operands to the "a" (address register) constraint fixes it:

diff a/tests/tcg/s390x/vxeh2_vlstr.c b/tests/tcg/s390x/vxeh2_vlstr.c
--- a/tests/tcg/s390x/vxeh2_vlstr.c
+++ b/tests/tcg/s390x/vxeh2_vlstr.c
@@ -13,7 +13,7 @@ static inline void vler(S390Vector *v1, const void *va, 
uint8_t m3)
  {
      asm volatile("vler %[v1], 0(%[va]), %[m3]\n"
                  : [v1] "+v" (v1->v)
-                : [va]  "d" (va)
+                : [va]  "a" (va)
                  , [m3]  "i" (m3)
                  : "memory");
  }
@@ -21,7 +21,7 @@ static inline void vler(S390Vector *v1, const void *va, 
uint8_t m3)
  static inline void vster(S390Vector *v1, const void *va, uint8_t m3)
  {
      asm volatile("vster %[v1], 0(%[va]), %[m3]\n"
-                : [va] "+d" (va)
+                : [va] "+a" (va)
                  : [v1]  "v" (v1->v)
                  , [m3]  "i" (m3)
                  : "memory");
@@ -31,7 +31,7 @@ static inline void vlbr(S390Vector *v1, void *va, const 
uint8_t m3)
  {
      asm volatile("vlbr %[v1], 0(%[va]), %[m3]\n"
                  : [v1] "+v" (v1->v)
-                : [va]  "d" (va)
+                : [va]  "a" (va)
                  , [m3]  "i" (m3)
                  : "memory");
  }
@@ -39,7 +39,7 @@ static inline void vlbr(S390Vector *v1, void *va, const 
uint8_t m3)
  static inline void vstbr(S390Vector *v1, void *va, const uint8_t m3)
  {
      asm volatile("vstbr %[v1], 0(%[va]), %[m3]\n"
-                : [va] "+d" (va)
+                : [va] "+a" (va)
                  : [v1]  "v" (v1->v)
                  , [m3]  "i" (m3)
                  : "memory");
@@ -50,7 +50,7 @@ static inline void vlebrh(S390Vector *v1, void *va, const 
uint8_t m3)
  {
      asm volatile("vlebrh %[v1], 0(%[va]), %[m3]\n"
                  : [v1] "+v" (v1->v)
-                : [va]  "d" (va)
+                : [va]  "a" (va)
                  , [m3]  "i" (m3)
                  : "memory");
  }
@@ -58,7 +58,7 @@ static inline void vlebrh(S390Vector *v1, void *va, const 
uint8_t m3)
  static inline void vstebrh(S390Vector *v1, void *va, const uint8_t m3)
  {
      asm volatile("vstebrh %[v1], 0(%[va]), %[m3]\n"
-                : [va] "+d" (va)
+                : [va] "+a" (va)
                  : [v1]  "v" (v1->v)
                  , [m3]  "i" (m3)
                  : "memory");
@@ -68,7 +68,7 @@ static inline void vllebrz(S390Vector *v1, void *va, const 
uint8_t m3)
  {
      asm volatile("vllebrz %[v1], 0(%[va]), %[m3]\n"
                  : [v1] "+v" (v1->v)
-                : [va]  "d" (va)
+                : [va]  "a" (va)
                  , [m3]  "i" (m3)
                  : "memory");
  }
@@ -77,7 +77,7 @@ static inline void vlbrrep(S390Vector *v1, void *va, const 
uint8_t m3)
  {
      asm volatile("vlbrrep %[v1], 0(%[va]), %[m3]\n"
                  : [v1] "+v" (v1->v)
-                : [va]  "d" (va)
+                : [va]  "a" (va)
                  , [m3]  "i" (m3)
                  : "memory");
  }

I'll fix it up in my queue, so no need to resend.
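
For readers wondering why r0 is a problem in these operands: on s390x a base or index register field of 0 means "no register", so it contributes zero to the address instead of the contents of general register 0. The "d" constraint allows any general register, while "a" excludes r0. A minimal sketch of the address calculation in portable C, with hypothetical names and with addressing-mode wraparound deliberately left out:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the D(X,B) effective address: register number 0 in the index
 * or base field selects "no register", not general register 0.
 */
static uint64_t effective_address(const uint64_t gpr[16], unsigned index,
                                  unsigned base, uint32_t disp)
{
    const uint64_t x = index ? gpr[index] : 0;
    const uint64_t b = base ? gpr[base] : 0;

    return x + b + disp;
}

/* Shows the difference between a pointer held in r5 and one held in r0. */
static void check_base_register_model(void)
{
    uint64_t gpr[16] = { 0 };

    gpr[0] = 0x1000;    /* pointer the compiler happened to place in r0 */
    gpr[5] = 0x1000;    /* same pointer in r5 */

    assert(effective_address(gpr, 0, 5, 0) == 0x1000);  /* 0(%r5) */
    assert(effective_address(gpr, 0, 0, 0) == 0);       /* 0(%r0) */
}
```

In this model, 0(%r0) resolves to address 0 no matter what r0 holds, so the instruction would not access the buffer the test passes in.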

  Thomas




* Re: [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2
  2022-05-02  7:20 ` [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements " Thomas Huth
@ 2022-05-02 15:52   ` David Hildenbrand
  2022-05-02 16:06     ` David Miller
  0 siblings, 1 reply; 26+ messages in thread
From: David Hildenbrand @ 2022-05-02 15:52 UTC (permalink / raw)
  To: Thomas Huth, qemu-devel, David Miller
  Cc: Eric Farman, Cornelia Huck, Richard Henderson, Halil Pasic,
	qemu-s390x, Christian Borntraeger

On 02.05.22 09:20, Thomas Huth wrote:
> On 28/04/2022 11.46, David Hildenbrand wrote:
>> Implement Vector-Enhancements Facility 2 for s390x
>>
>> resolves: https://gitlab.com/qemu-project/qemu/-/issues/738
>>
>> implements:
>>      VECTOR LOAD ELEMENTS REVERSED               (VLER)
>>      VECTOR LOAD BYTE REVERSED ELEMENTS          (VLBR)
>>      VECTOR LOAD BYTE REVERSED ELEMENT           (VLEBRH, VLEBRF, VLEBRG)
>>      VECTOR LOAD BYTE REVERSED ELEMENT AND ZERO  (VLLEBRZ)
>>      VECTOR LOAD BYTE REVERSED ELEMENT AND REPLICATE (VLBRREP)
>>      VECTOR STORE ELEMENTS REVERSED              (VSTER)
>>      VECTOR STORE BYTE REVERSED ELEMENTS         (VSTBR)
>>      VECTOR STORE BYTE REVERSED ELEMENT          (VSTEBRH, VSTEBRF, VSTEBRG)
>>      VECTOR SHIFT LEFT DOUBLE BY BIT             (VSLD)
>>      VECTOR SHIFT RIGHT DOUBLE BY BIT            (VSRD)
>>      VECTOR STRING SEARCH                        (VSTRS)
>>
>>      modifies:
>>      VECTOR FP CONVERT FROM FIXED                (VCFPS)
>>      VECTOR FP CONVERT FROM LOGICAL              (VCFPL)
>>      VECTOR FP CONVERT TO FIXED                  (VCSFP)
>>      VECTOR FP CONVERT TO LOGICAL                (VCLFP)
>>      VECTOR SHIFT LEFT                           (VSL)
>>      VECTOR SHIFT RIGHT ARITHMETIC               (VSRA)
>>      VECTOR SHIFT RIGHT LOGICAL                  (VSRL)
> 
> Thanks, queued to my s390x-next branch now:
> 
>   https://gitlab.com/thuth/qemu/-/commits/s390x-next/
>
Thanks for fixing that up. At this point I would have suggested excluding
the tests for now.

-- 
Thanks,

David / dhildenb




* Re: [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2
  2022-05-02 15:52   ` David Hildenbrand
@ 2022-05-02 16:06     ` David Miller
  2022-05-03  6:55       ` Thomas Huth
  0 siblings, 1 reply; 26+ messages in thread
From: David Miller @ 2022-05-02 16:06 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Thomas Huth, Eric Farman, Cornelia Huck, Richard Henderson,
	qemu-devel, Halil Pasic, qemu-s390x, Christian Borntraeger

There was also the patch that had them as .insn in the other series of emails.
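
Editorial note: the point of the .insn variant is that the assembler builds
the instruction from a raw opcode template plus operands, so the test
sources do not depend on the assembler knowing the z15 mnemonics. As a
rough sketch of what that means for the VRX format (V1, D2(X2,B2), M3), the
portable helper below places the operand fields into a 48-bit template such
as 0xE60000000007. The name encode_vrx() is hypothetical, and the field
positions are my reading of the VRX layout rather than something taken from
the patch.

```c
#include <stdint.h>

/*
 * Assemble a 48-bit VRX-format instruction from an opcode template,
 * e.g. 0xE60000000007 as used with ".insn vrx" for VLER.
 * Layout (most significant bit first): op, V1, X2, B2, D2, M3, RXB, op.
 * (Hypothetical helper for illustration only.)
 */
static uint64_t encode_vrx(uint64_t opcode, unsigned v1, unsigned x2,
                           unsigned b2, unsigned d2, unsigned m3)
{
    uint64_t insn = opcode;
    /* Vector registers 16-31 keep their high bit in the RXB field. */
    uint64_t rxb = (v1 & 0x10) ? 0x8 : 0x0;

    insn |= (uint64_t)(v1 & 0xF) << 36;   /* V1, low four bits */
    insn |= (uint64_t)(x2 & 0xF) << 32;   /* index register */
    insn |= (uint64_t)(b2 & 0xF) << 28;   /* base register */
    insn |= (uint64_t)(d2 & 0xFFF) << 16; /* 12-bit displacement */
    insn |= (uint64_t)(m3 & 0xF) << 12;   /* mask / element size */
    insn |= rxb << 8;
    return insn;
}
```

Under these assumptions, encode_vrx(0xE60000000007, 1, 0, 2, 0, 1)
corresponds to "vler %v1, 0(%r2), 1" and gives the bytes E6 10 20 00 10 07.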




* Re: [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2
  2022-05-02 16:06     ` David Miller
@ 2022-05-03  6:55       ` Thomas Huth
  2022-05-03 14:42         ` David Miller
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Huth @ 2022-05-03  6:55 UTC (permalink / raw)
  To: David Miller, David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Richard Henderson, Christian Borntraeger,
	Cornelia Huck, Halil Pasic, Eric Farman

  Hi!

On 02/05/2022 18.06, David Miller wrote:
> There was also the patch that had them as .insn in the other series of emails.

Sorry, I missed that patch, could you please point me to the mail on 
https://lore.kernel.org/qemu-devel/ ? I remember that there was a discussion 
about the vri-d encoding, but I apparently missed the patch that came out of 
this discussion...

  Thomas





* Re: [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2
  2022-05-03  6:55       ` Thomas Huth
@ 2022-05-03 14:42         ` David Miller
  2022-05-03 14:57           ` David Miller
  2022-05-04  9:10           ` Thomas Huth
  0 siblings, 2 replies; 26+ messages in thread
From: David Miller @ 2022-05-03 14:42 UTC (permalink / raw)
  To: Thomas Huth
  Cc: David Hildenbrand, qemu-devel, qemu-s390x, Richard Henderson,
	Christian Borntraeger, Cornelia Huck, Halil Pasic, Eric Farman

[-- Attachment #1: Type: text/plain, Size: 2811 bytes --]

Sorry, it was in the discussion of the v4 patches, as an attachment.
Mail thread:
[PATCH v4 10/11] tests/tcg/s390x: Tests for Vector Enhancements Facility 2
So it likely never made it to the mailing list.

I've reattached it and will also forward the patch (by itself) to the
mailing list.

I think the other solution, skipping the tests when the compiler doesn't
support z15, works just as well.

I just thought I'd bring it back up as I saw discussion about it.

Thanks
- David Miller







[-- Attachment #2: s390x-tcg-vector-tests-insn.patch --]
[-- Type: text/x-patch, Size: 12208 bytes --]

From bb6bf2f9529c4d76db9a9eff2ff7fa1235657103 Mon Sep 17 00:00:00 2001
From: David Miller <dmiller423@gmail.com>
Date: Mon, 21 Mar 2022 16:58:57 -0400
Subject: [PATCH v5 10/11] tests/tcg/s390x: Tests for Vector Enhancements
 Facility 2

Signed-off-by: David Miller <dmiller423@gmail.com>
---
 tests/tcg/s390x/Makefile.target |   8 ++
 tests/tcg/s390x/vx.h            |  19 +++++
 tests/tcg/s390x/vxeh2_vcvt.c    |  88 ++++++++++++++++++++
 tests/tcg/s390x/vxeh2_vlstr.c   | 139 ++++++++++++++++++++++++++++++++
 tests/tcg/s390x/vxeh2_vs.c      |  95 ++++++++++++++++++++++
 5 files changed, 349 insertions(+)
 create mode 100644 tests/tcg/s390x/vx.h
 create mode 100644 tests/tcg/s390x/vxeh2_vcvt.c
 create mode 100644 tests/tcg/s390x/vxeh2_vlstr.c
 create mode 100644 tests/tcg/s390x/vxeh2_vs.c

diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 8c9b6a13ce..921a056dd1 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -16,6 +16,14 @@ TESTS+=shift
 TESTS+=trap
 TESTS+=signals-s390x
 
+VECTOR_TESTS=vxeh2_vs
+VECTOR_TESTS+=vxeh2_vcvt
+VECTOR_TESTS+=vxeh2_vlstr
+
+TESTS+=$(VECTOR_TESTS)
+
+$(VECTOR_TESTS): CFLAGS+=-march=z15 -O2
+
 ifneq ($(HAVE_GDB_BIN),)
 GDB_SCRIPT=$(SRC_PATH)/tests/guest-debug/run-test.py
 
diff --git a/tests/tcg/s390x/vx.h b/tests/tcg/s390x/vx.h
new file mode 100644
index 0000000000..2e66f8b714
--- /dev/null
+++ b/tests/tcg/s390x/vx.h
@@ -0,0 +1,19 @@
+#ifndef QEMU_TESTS_S390X_VX_H
+#define QEMU_TESTS_S390X_VX_H
+
+typedef union S390Vector {
+    uint64_t d[2];  /* doubleword */
+    uint32_t w[4];  /* word */
+    uint16_t h[8];  /* halfword */
+    uint8_t  b[16]; /* byte */
+    float    f[4];  /* float32 */
+    double   fd[2]; /* float64 */
+    __uint128_t v;
+} S390Vector;
+
+#define ES8  0
+#define ES16 1
+#define ES32 2
+#define ES64 3
+
+#endif
\ No newline at end of file
diff --git a/tests/tcg/s390x/vxeh2_vcvt.c b/tests/tcg/s390x/vxeh2_vcvt.c
new file mode 100644
index 0000000000..2e46841ab5
--- /dev/null
+++ b/tests/tcg/s390x/vxeh2_vcvt.c
@@ -0,0 +1,88 @@
+/*
+ * vxeh2_vcvt: vector-enhancements facility 2 vector convert *
+ */
+#include <stdint.h>
+#include "vx.h"
+
+#define M_S 8
+#define M4_XxC 4
+#define M4_def M4_XxC
+
+static inline void vcfps(S390Vector *v1, S390Vector *v2,
+    const uint8_t m3,  const uint8_t m4,  const uint8_t m5)
+{
+    asm volatile(".insn vrr, 0xE700000000C3, %[v1], %[v2], 0, %[m3], %[m4], %[m5]\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [m3]  "i" (m3)
+                , [m4]  "i" (m4)
+                , [m5]  "i" (m5));
+}
+
+static inline void vcfpl(S390Vector *v1, S390Vector *v2,
+    const uint8_t m3,  const uint8_t m4,  const uint8_t m5)
+{
+    asm volatile(".insn vrr, 0xE700000000C1, %[v1], %[v2], 0, %[m3], %[m4], %[m5]\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [m3]  "i" (m3)
+                , [m4]  "i" (m4)
+                , [m5]  "i" (m5));
+}
+
+static inline void vcsfp(S390Vector *v1, S390Vector *v2,
+    const uint8_t m3,  const uint8_t m4,  const uint8_t m5)
+{
+    asm volatile(".insn vrr, 0xE700000000C2, %[v1], %[v2], 0, %[m3], %[m4], %[m5]\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [m3]  "i" (m3)
+                , [m4]  "i" (m4)
+                , [m5]  "i" (m5));
+}
+
+static inline void vclfp(S390Vector *v1, S390Vector *v2,
+    const uint8_t m3,  const uint8_t m4,  const uint8_t m5)
+{
+    asm volatile(".insn vrr, 0xE700000000C0, %[v1], %[v2], 0, %[m3], %[m4], %[m5]\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [m3]  "i" (m3)
+                , [m4]  "i" (m4)
+                , [m5]  "i" (m5));
+}
+
+int main(int argc, char *argv[])
+{
+    S390Vector vd;
+    S390Vector vs_i32 = { .w[0] = 1, .w[1] = 64, .w[2] = 1024, .w[3] = -10 };
+    S390Vector vs_u32 = { .w[0] = 2, .w[1] = 32, .w[2] = 4096, .w[3] = 8888 };
+    S390Vector vs_f32 = { .f[0] = 3.987, .f[1] = 5.123,
+                          .f[2] = 4.499, .f[3] = 0.512 };
+
+    vd.d[0] = vd.d[1] = 0;
+    vcfps(&vd, &vs_i32, 2, M4_def, 0);
+    if (1 != vd.f[0] || 1024 != vd.f[2] || 64 != vd.f[1] || -10 != vd.f[3]) {
+        return 1;
+    }
+
+    vd.d[0] = vd.d[1] = 0;
+    vcfpl(&vd, &vs_u32, 2, M4_def, 0);
+    if (2 != vd.f[0] || 4096 != vd.f[2] || 32 != vd.f[1] || 8888 != vd.f[3]) {
+        return 1;
+    }
+
+    vd.d[0] = vd.d[1] = 0;
+    vcsfp(&vd, &vs_f32, 2, M4_def, 0);
+    if (4 != vd.w[0] || 4 != vd.w[2] || 5 != vd.w[1] || 1 != vd.w[3]) {
+        return 1;
+    }
+
+    vd.d[0] = vd.d[1] = 0;
+    vclfp(&vd, &vs_f32, 2, M4_def, 0);
+    if (4 != vd.w[0] || 4 != vd.w[2] || 5 != vd.w[1] || 1 != vd.w[3]) {
+        return 1;
+    }
+
+    return 0;
+}
diff --git a/tests/tcg/s390x/vxeh2_vlstr.c b/tests/tcg/s390x/vxeh2_vlstr.c
new file mode 100644
index 0000000000..770691a4e8
--- /dev/null
+++ b/tests/tcg/s390x/vxeh2_vlstr.c
@@ -0,0 +1,139 @@
+/*
+ * vxeh2_vlstr: vector-enhancements facility 2 vector load/store reversed *
+ */
+#include <stdint.h>
+#include "vx.h"
+
+#define vtst(v1, v2) \
+    if (v1.d[0] != v2.d[0] || v1.d[1] != v2.d[1]) { \
+        return 1;     \
+    }
+
+static inline void vler(S390Vector *v1, const void *va, uint8_t m3)
+{
+    asm volatile(".insn vrx, 0xE60000000007, %[v1], 0(%[va]), %[m3]\n"
+                : [v1] "+v" (v1->v)
+                : [va]  "a" (va)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+static inline void vster(S390Vector *v1, const void *va, uint8_t m3)
+{
+    asm volatile(".insn vrx, 0xE6000000000F, %[v1], 0(%[va]), %[m3]\n"
+                : [va] "+a" (va)
+                : [v1]  "v" (v1->v)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+static inline void vlbr(S390Vector *v1, void *va, const uint8_t m3)
+{
+    asm volatile(".insn vrx, 0xE60000000006, %[v1], 0(%[va]), %[m3]\n"
+                : [v1] "+v" (v1->v)
+                : [va]  "a" (va)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+static inline void vstbr(S390Vector *v1, void *va, const uint8_t m3)
+{
+    asm volatile(".insn vrx, 0xE6000000000E, %[v1], 0(%[va]), %[m3]\n"
+                : [va] "+a" (va)
+                : [v1]  "v" (v1->v)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+
+static inline void vlebrh(S390Vector *v1, void *va, const uint8_t m3)
+{
+    asm volatile(".insn vrx, 0xE60000000001, %[v1], 0(%[va]), %[m3]\n"
+                : [v1] "+v" (v1->v)
+                : [va]  "a" (va)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+static inline void vstebrh(S390Vector *v1, void *va, const uint8_t m3)
+{
+    asm volatile(".insn vrx, 0xE60000000009, %[v1], 0(%[va]), %[m3]\n"
+                : [va] "+a" (va)
+                : [v1]  "v" (v1->v)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+static inline void vllebrz(S390Vector *v1, void *va, const uint8_t m3)
+{
+    asm volatile(".insn vrx, 0xE60000000004, %[v1], 0(%[va]), %[m3]\n"
+                : [v1] "+v" (v1->v)
+                : [va]  "a" (va)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+static inline void vlbrrep(S390Vector *v1, void *va, const uint8_t m3)
+{
+    asm volatile(".insn vrx, 0xE60000000005, %[v1], 0(%[va]), %[m3]\n"
+                : [v1] "+v" (v1->v)
+                : [va]  "a" (va)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+int main(int argc, char *argv[])
+{
+    S390Vector vd = { .d[0] = 0, .d[1] = 0 };
+    S390Vector vs = { .d[0] = 0x8FEEDDCCBBAA9988ull,
+                      .d[1] = 0x7766554433221107ull };
+
+    const S390Vector vt_v_er16 = {
+        .h[0] = 0x1107, .h[1] = 0x3322, .h[2] = 0x5544, .h[3] = 0x7766,
+        .h[4] = 0x9988, .h[5] = 0xBBAA, .h[6] = 0xDDCC, .h[7] = 0x8FEE };
+
+    const S390Vector vt_v_br16 = {
+        .h[0] = 0xEE8F, .h[1] = 0xCCDD, .h[2] = 0xAABB, .h[3] = 0x8899,
+        .h[4] = 0x6677, .h[5] = 0x4455, .h[6] = 0x2233, .h[7] = 0x0711 };
+
+    int ix;
+    uint64_t ss64 = 0xFEEDFACE0BADBEEFull, sd64 = 0;
+
+    vler(&vd, &vs, ES16);
+    vtst(vd, vt_v_er16);
+
+    vster(&vs, &vd, ES16);
+    vtst(vd, vt_v_er16);
+
+    vlbr(&vd, &vs, ES16);
+    vtst(vd, vt_v_br16);
+
+    vstbr(&vs, &vd, ES16);
+    vtst(vd, vt_v_br16);
+
+    vlebrh(&vd, &ss64, 5);
+    if (0xEDFE != vd.h[5]) {
+        return 1;
+    }
+
+    vstebrh(&vs, (uint8_t *)&sd64 + 4, 7);
+    if (0x0000000007110000ull != sd64) {
+        return 1;
+    }
+
+    vllebrz(&vd, (uint8_t *)&ss64 + 3, 2);
+    for (ix = 0; ix < 4; ix++) {
+        if (vd.w[ix] != (ix != 1 ? 0 : 0xBEAD0BCE)) {
+            return 1;
+        }
+    }
+
+    vlbrrep(&vd, (uint8_t *)&ss64 + 4, 1);
+    for (ix = 0; ix < 8; ix++) {
+        if (0xAD0B != vd.h[ix]) {
+            return 1;
+        }
+    }
+
+    return 0;
+}
diff --git a/tests/tcg/s390x/vxeh2_vs.c b/tests/tcg/s390x/vxeh2_vs.c
new file mode 100644
index 0000000000..78f5c9a8be
--- /dev/null
+++ b/tests/tcg/s390x/vxeh2_vs.c
@@ -0,0 +1,95 @@
+/*
+ * vxeh2_vs: vector-enhancements facility 2 vector shift
+ */
+#include <stdint.h>
+#include "vx.h"
+
+#define vtst(v1, v2) \
+    if (v1.d[0] != v2.d[0] || v1.d[1] != v2.d[1]) { \
+        return 1;     \
+    }
+
+static inline void vsl(S390Vector *v1, S390Vector *v2, S390Vector *v3)
+{
+    asm volatile(".insn vrr, 0xE70000000074, %[v1], %[v2], %[v3], 0,0,0\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [v3]  "v" (v3->v));
+}
+
+static inline void vsra(S390Vector *v1, S390Vector *v2, S390Vector *v3)
+{
+    asm volatile(".insn vrr, 0xE7000000007E, %[v1], %[v2], %[v3], 0,0,0\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [v3]  "v" (v3->v));
+}
+
+static inline void vsrl(S390Vector *v1, S390Vector *v2, S390Vector *v3)
+{
+    asm volatile(".insn vrr, 0xE7000000007C, %[v1], %[v2], %[v3], 0,0,0\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [v3]  "v" (v3->v));
+}
+
+static inline void vsld(S390Vector *v1, S390Vector *v2,
+    S390Vector *v3, const uint8_t I)
+{
+    /* vri-d as vrr */
+    asm volatile(".insn vrr, 0xE70000000086, %[v1], %[v2], %[v3], 0, %[I], 0\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [v3]  "v" (v3->v)
+                , [I]   "i" (I & 7));
+}
+
+static inline void vsrd(S390Vector *v1, S390Vector *v2,
+    S390Vector *v3, const uint8_t I)
+{
+    /* vri-d as vrr */
+    asm volatile(".insn vrr, 0xE70000000087, %[v1], %[v2], %[v3], 0, %[I], 0\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [v3]  "v" (v3->v)
+                , [I]   "i" (I & 7));
+}
+
+int main(int argc, char *argv[])
+{
+    const S390Vector vt_vsl  = { .d[0] = 0x7FEDBB32D5AA311Dull,
+                                 .d[1] = 0xBB65AA10912220C0ull };
+    const S390Vector vt_vsra = { .d[0] = 0xF1FE6E7399AA5466ull,
+                                 .d[1] = 0x0E762A5188221044ull };
+    const S390Vector vt_vsrl = { .d[0] = 0x11FE6E7399AA5466ull,
+                                 .d[1] = 0x0E762A5188221044ull };
+    const S390Vector vt_vsld = { .d[0] = 0x7F76EE65DD54CC43ull,
+                                 .d[1] = 0xBB32AA2199108838ull };
+    const S390Vector vt_vsrd = { .d[0] = 0x0E060802040E000Aull,
+                                 .d[1] = 0x0C060802040E000Aull };
+    S390Vector vs  = { .d[0] = 0x8FEEDDCCBBAA9988ull,
+                       .d[1] = 0x7766554433221107ull };
+    S390Vector  vd = { .d[0] = 0, .d[1] = 0 };
+    S390Vector vsi = { .d[0] = 0, .d[1] = 0 };
+
+    for (int ix = 0; ix < 16; ix++) {
+        vsi.b[ix] = (1 + (5 ^ ~ix)) & 7;
+    }
+
+    vsl(&vd, &vs, &vsi);
+    vtst(vd, vt_vsl);
+
+    vsra(&vd, &vs, &vsi);
+    vtst(vd, vt_vsra);
+
+    vsrl(&vd, &vs, &vsi);
+    vtst(vd, vt_vsrl);
+
+    vsld(&vd, &vs, &vsi, 3);
+    vtst(vd, vt_vsld);
+
+    vsrd(&vd, &vs, &vsi, 15);
+    vtst(vd, vt_vsrd);
+
+    return 0;
+}
-- 
2.32.0



* Re: [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2
  2022-05-03 14:42         ` David Miller
@ 2022-05-03 14:57           ` David Miller
  2022-05-03 14:57             ` David Miller
  2022-05-04  8:28             ` Thomas Huth
  2022-05-04  9:10           ` Thomas Huth
  1 sibling, 2 replies; 26+ messages in thread
From: David Miller @ 2022-05-03 14:57 UTC (permalink / raw)
  To: Thomas Huth
  Cc: David Hildenbrand, qemu-devel, qemu-s390x, Richard Henderson,
	Christian Borntraeger, Cornelia Huck, Halil Pasic, Eric Farman

It looks like Google disabled password access early; nothing makes it
work anymore.
They had plans to disable 'less secure apps' in May, but I thought it
was the end of the month.
I'll try copy/paste as plain text as well, though I know it will likely
screw it up.



* Re: [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2
  2022-05-03 14:57           ` David Miller
@ 2022-05-03 14:57             ` David Miller
  2022-05-04  8:28             ` Thomas Huth
  1 sibling, 0 replies; 26+ messages in thread
From: David Miller @ 2022-05-03 14:57 UTC (permalink / raw)
  To: Thomas Huth
  Cc: David Hildenbrand, qemu-devel, qemu-s390x, Richard Henderson,
	Christian Borntraeger, Cornelia Huck, Halil Pasic, Eric Farman

From bb6bf2f9529c4d76db9a9eff2ff7fa1235657103 Mon Sep 17 00:00:00 2001
From: David Miller <dmiller423@gmail.com>
Date: Mon, 21 Mar 2022 16:58:57 -0400
Subject: [PATCH v5 10/11] tests/tcg/s390x: Tests for Vector Enhancements
 Facility 2

Signed-off-by: David Miller <dmiller423@gmail.com>
---
 tests/tcg/s390x/Makefile.target |   8 ++
 tests/tcg/s390x/vx.h            |  19 +++++
 tests/tcg/s390x/vxeh2_vcvt.c    |  88 ++++++++++++++++++++
 tests/tcg/s390x/vxeh2_vlstr.c   | 139 ++++++++++++++++++++++++++++++++
 tests/tcg/s390x/vxeh2_vs.c      |  95 ++++++++++++++++++++++
 5 files changed, 349 insertions(+)
 create mode 100644 tests/tcg/s390x/vx.h
 create mode 100644 tests/tcg/s390x/vxeh2_vcvt.c
 create mode 100644 tests/tcg/s390x/vxeh2_vlstr.c
 create mode 100644 tests/tcg/s390x/vxeh2_vs.c

diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 8c9b6a13ce..921a056dd1 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -16,6 +16,14 @@ TESTS+=shift
 TESTS+=trap
 TESTS+=signals-s390x

+VECTOR_TESTS=vxeh2_vs
+VECTOR_TESTS+=vxeh2_vcvt
+VECTOR_TESTS+=vxeh2_vlstr
+
+TESTS+=$(VECTOR_TESTS)
+
+$(VECTOR_TESTS): CFLAGS+=-march=z15 -O2
+
 ifneq ($(HAVE_GDB_BIN),)
 GDB_SCRIPT=$(SRC_PATH)/tests/guest-debug/run-test.py

diff --git a/tests/tcg/s390x/vx.h b/tests/tcg/s390x/vx.h
new file mode 100644
index 0000000000..2e66f8b714
--- /dev/null
+++ b/tests/tcg/s390x/vx.h
@@ -0,0 +1,19 @@
+#ifndef QEMU_TESTS_S390X_VX_H
+#define QEMU_TESTS_S390X_VX_H
+
+typedef union S390Vector {
+    uint64_t d[2];  /* doubleword */
+    uint32_t w[4];  /* word */
+    uint16_t h[8];  /* halfword */
+    uint8_t  b[16]; /* byte */
+    float    f[4];  /* float32 */
+    double   fd[2]; /* float64 */
+    __uint128_t v;
+} S390Vector;
+
+#define ES8  0
+#define ES16 1
+#define ES32 2
+#define ES64 3
+
+#endif
\ No newline at end of file
diff --git a/tests/tcg/s390x/vxeh2_vcvt.c b/tests/tcg/s390x/vxeh2_vcvt.c
new file mode 100644
index 0000000000..2e46841ab5
--- /dev/null
+++ b/tests/tcg/s390x/vxeh2_vcvt.c
@@ -0,0 +1,88 @@
+/*
+ * vxeh2_vcvt: vector-enhancements facility 2 vector convert *
+ */
+#include <stdint.h>
+#include "vx.h"
+
+#define M_S 8
+#define M4_XxC 4
+#define M4_def M4_XxC
+
+static inline void vcfps(S390Vector *v1, S390Vector *v2,
+    const uint8_t m3,  const uint8_t m4,  const uint8_t m5)
+{
+    asm volatile(".insn vrr, 0xE700000000C3, %[v1], %[v2], 0, %[m3], %[m4], %[m5]\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [m3]  "i" (m3)
+                , [m4]  "i" (m4)
+                , [m5]  "i" (m5));
+}
+
+static inline void vcfpl(S390Vector *v1, S390Vector *v2,
+    const uint8_t m3,  const uint8_t m4,  const uint8_t m5)
+{
+    asm volatile(".insn vrr, 0xE700000000C1, %[v1], %[v2], 0, %[m3], %[m4], %[m5]\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [m3]  "i" (m3)
+                , [m4]  "i" (m4)
+                , [m5]  "i" (m5));
+}
+
+static inline void vcsfp(S390Vector *v1, S390Vector *v2,
+    const uint8_t m3,  const uint8_t m4,  const uint8_t m5)
+{
+    asm volatile(".insn vrr, 0xE700000000C2, %[v1], %[v2], 0, %[m3], %[m4], %[m5]\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [m3]  "i" (m3)
+                , [m4]  "i" (m4)
+                , [m5]  "i" (m5));
+}
+
+static inline void vclfp(S390Vector *v1, S390Vector *v2,
+    const uint8_t m3,  const uint8_t m4,  const uint8_t m5)
+{
+    asm volatile(".insn vrr, 0xE700000000C0, %[v1], %[v2], 0, %[m3], %[m4], %[m5]\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [m3]  "i" (m3)
+                , [m4]  "i" (m4)
+                , [m5]  "i" (m5));
+}
+
+int main(int argc, char *argv[])
+{
+    S390Vector vd;
+    S390Vector vs_i32 = { .w[0] = 1, .w[1] = 64, .w[2] = 1024, .w[3] = -10 };
+    S390Vector vs_u32 = { .w[0] = 2, .w[1] = 32, .w[2] = 4096, .w[3] = 8888 };
+    S390Vector vs_f32 = { .f[0] = 3.987, .f[1] = 5.123,
+                          .f[2] = 4.499, .f[3] = 0.512 };
+
+    vd.d[0] = vd.d[1] = 0;
+    vcfps(&vd, &vs_i32, 2, M4_def, 0);
+    if (1 != vd.f[0] || 1024 != vd.f[2] || 64 != vd.f[1] || -10 != vd.f[3]) {
+        return 1;
+    }
+
+    vd.d[0] = vd.d[1] = 0;
+    vcfpl(&vd, &vs_u32, 2, M4_def, 0);
+    if (2 != vd.f[0] || 4096 != vd.f[2] || 32 != vd.f[1] || 8888 != vd.f[3]) {
+        return 1;
+    }
+
+    vd.d[0] = vd.d[1] = 0;
+    vcsfp(&vd, &vs_f32, 2, M4_def, 0);
+    if (4 != vd.w[0] || 4 != vd.w[2] || 5 != vd.w[1] || 1 != vd.w[3]) {
+        return 1;
+    }
+
+    vd.d[0] = vd.d[1] = 0;
+    vclfp(&vd, &vs_f32, 2, M4_def, 0);
+    if (4 != vd.w[0] || 4 != vd.w[2] || 5 != vd.w[1] || 1 != vd.w[3]) {
+        return 1;
+    }
+
+    return 0;
+}
diff --git a/tests/tcg/s390x/vxeh2_vlstr.c b/tests/tcg/s390x/vxeh2_vlstr.c
new file mode 100644
index 0000000000..770691a4e8
--- /dev/null
+++ b/tests/tcg/s390x/vxeh2_vlstr.c
@@ -0,0 +1,139 @@
+/*
+ * vxeh2_vlstr: vector-enhancements facility 2 vector load/store reversed *
+ */
+#include <stdint.h>
+#include "vx.h"
+
+#define vtst(v1, v2) \
+    if (v1.d[0] != v2.d[0] || v1.d[1] != v2.d[1]) { \
+        return 1;     \
+    }
+
+static inline void vler(S390Vector *v1, const void *va, uint8_t m3)
+{
+    asm volatile(".insn vrx, 0xE60000000007, %[v1], 0(%[va]), %[m3]\n"
+                : [v1] "+v" (v1->v)
+                : [va]  "a" (va)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+static inline void vster(S390Vector *v1, const void *va, uint8_t m3)
+{
+    asm volatile(".insn vrx, 0xE6000000000F, %[v1], 0(%[va]), %[m3]\n"
+                : [va] "+a" (va)
+                : [v1]  "v" (v1->v)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+static inline void vlbr(S390Vector *v1, void *va, const uint8_t m3)
+{
+    asm volatile(".insn vrx, 0xE60000000006, %[v1], 0(%[va]), %[m3]\n"
+                : [v1] "+v" (v1->v)
+                : [va]  "a" (va)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+static inline void vstbr(S390Vector *v1, void *va, const uint8_t m3)
+{
+    asm volatile(".insn vrx, 0xE6000000000E, %[v1], 0(%[va]), %[m3]\n"
+                : [va] "+a" (va)
+                : [v1]  "v" (v1->v)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+
+static inline void vlebrh(S390Vector *v1, void *va, const uint8_t m3)
+{
+    asm volatile(".insn vrx, 0xE60000000001, %[v1], 0(%[va]), %[m3]\n"
+                : [v1] "+v" (v1->v)
+                : [va]  "a" (va)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+static inline void vstebrh(S390Vector *v1, void *va, const uint8_t m3)
+{
+    asm volatile(".insn vrx, 0xE60000000009, %[v1], 0(%[va]), %[m3]\n"
+                : [va] "+a" (va)
+                : [v1]  "v" (v1->v)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+static inline void vllebrz(S390Vector *v1, void *va, const uint8_t m3)
+{
+    asm volatile(".insn vrx, 0xE60000000004, %[v1], 0(%[va]), %[m3]\n"
+                : [v1] "+v" (v1->v)
+                : [va]  "a" (va)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+static inline void vlbrrep(S390Vector *v1, void *va, const uint8_t m3)
+{
+    asm volatile(".insn vrx, 0xE60000000005, %[v1], 0(%[va]), %[m3]\n"
+                : [v1] "+v" (v1->v)
+                : [va]  "a" (va)
+                , [m3]  "i" (m3)
+                : "memory");
+}
+
+int main(int argc, char *argv[])
+{
+    S390Vector vd = { .d[0] = 0, .d[1] = 0 };
+    S390Vector vs = { .d[0] = 0x8FEEDDCCBBAA9988ull,
+                      .d[1] = 0x7766554433221107ull };
+
+    const S390Vector vt_v_er16 = {
+        .h[0] = 0x1107, .h[1] = 0x3322, .h[2] = 0x5544, .h[3] = 0x7766,
+        .h[4] = 0x9988, .h[5] = 0xBBAA, .h[6] = 0xDDCC, .h[7] = 0x8FEE };
+
+    const S390Vector vt_v_br16 = {
+        .h[0] = 0xEE8F, .h[1] = 0xCCDD, .h[2] = 0xAABB, .h[3] = 0x8899,
+        .h[4] = 0x6677, .h[5] = 0x4455, .h[6] = 0x2233, .h[7] = 0x0711 };
+
+    int ix;
+    uint64_t ss64 = 0xFEEDFACE0BADBEEFull, sd64 = 0;
+
+    vler(&vd, &vs, ES16);
+    vtst(vd, vt_v_er16);
+
+    vster(&vs, &vd, ES16);
+    vtst(vd, vt_v_er16);
+
+    vlbr(&vd, &vs, ES16);
+    vtst(vd, vt_v_br16);
+
+    vstbr(&vs, &vd, ES16);
+    vtst(vd, vt_v_br16);
+
+    vlebrh(&vd, &ss64, 5);
+    if (0xEDFE != vd.h[5]) {
+        return 1;
+    }
+
+    vstebrh(&vs, (uint8_t *)&sd64 + 4, 7);
+    if (0x0000000007110000ull != sd64) {
+        return 1;
+    }
+
+    vllebrz(&vd, (uint8_t *)&ss64 + 3, 2);
+    for (ix = 0; ix < 4; ix++) {
+        if (vd.w[ix] != (ix != 1 ? 0 : 0xBEAD0BCE)) {
+            return 1;
+        }
+    }
+
+    vlbrrep(&vd, (uint8_t *)&ss64 + 4, 1);
+    for (ix = 0; ix < 8; ix++) {
+        if (0xAD0B != vd.h[ix]) {
+            return 1;
+        }
+    }
+
+    return 0;
+}
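
For readers without an s390x toolchain, the ES16 expectations in
vxeh2_vlstr.c can be cross-checked with a small portable model. The
sketch below is not part of the patch: model_vler, model_vlbr and be16
are hypothetical helper names, and it assumes the vector is viewed as
16 bytes in big-endian order, as it is on s390x.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 16

/*
 * Model of VLER: reverse the order of the es-byte elements while keeping
 * the bytes inside each element as they are. dst and src must not overlap.
 */
static void model_vler(uint8_t dst[VEC_BYTES], const uint8_t src[VEC_BYTES],
                       unsigned es)
{
    unsigned n;

    assert(es != 0 && VEC_BYTES % es == 0);
    n = VEC_BYTES / es;
    for (unsigned e = 0; e < n; e++) {
        memcpy(dst + e * es, src + (n - 1 - e) * es, es);
    }
}

/*
 * Model of VLBR: keep the element order, reverse the bytes inside each
 * es-byte element. dst and src must not overlap.
 */
static void model_vlbr(uint8_t dst[VEC_BYTES], const uint8_t src[VEC_BYTES],
                       unsigned es)
{
    assert(es != 0 && VEC_BYTES % es == 0);
    for (unsigned e = 0; e < VEC_BYTES / es; e++) {
        for (unsigned b = 0; b < es; b++) {
            dst[e * es + b] = src[e * es + (es - 1 - b)];
        }
    }
}

/* Read halfword element idx (0..7) from the big-endian byte view. */
static uint16_t be16(const uint8_t v[VEC_BYTES], unsigned idx)
{
    return (uint16_t)((v[2 * idx] << 8) | v[2 * idx + 1]);
}
```

Fed with the bytes of vs from the test (8F EE DD CC BB AA 99 88 77 66 55
44 33 22 11 07) and es = 2, model_vler yields halfwords 0x1107 ... 0x8FEE
and model_vlbr yields 0xEE8F ... 0x0711, which are the values listed in
vt_v_er16 and vt_v_br16.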
diff --git a/tests/tcg/s390x/vxeh2_vs.c b/tests/tcg/s390x/vxeh2_vs.c
new file mode 100644
index 0000000000..78f5c9a8be
--- /dev/null
+++ b/tests/tcg/s390x/vxeh2_vs.c
@@ -0,0 +1,95 @@
+/*
+ * vxeh2_vs: vector-enhancements facility 2 vector shift
+ */
+#include <stdint.h>
+#include "vx.h"
+
+#define vtst(v1, v2) \
+    if (v1.d[0] != v2.d[0] || v1.d[1] != v2.d[1]) { \
+        return 1;     \
+    }
+
+static inline void vsl(S390Vector *v1, S390Vector *v2, S390Vector *v3)
+{
+    asm volatile(".insn vrr, 0xE70000000074, %[v1], %[v2], %[v3], 0,0,0\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [v3]  "v" (v3->v));
+}
+
+static inline void vsra(S390Vector *v1, S390Vector *v2, S390Vector *v3)
+{
+    asm volatile(".insn vrr, 0xE7000000007E, %[v1], %[v2], %[v3], 0,0,0\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [v3]  "v" (v3->v));
+}
+
+static inline void vsrl(S390Vector *v1, S390Vector *v2, S390Vector *v3)
+{
+    asm volatile(".insn vrr, 0xE7000000007C, %[v1], %[v2], %[v3], 0,0,0\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [v3]  "v" (v3->v));
+}
+
+static inline void vsld(S390Vector *v1, S390Vector *v2,
+    S390Vector *v3, const uint8_t I)
+{
+    /* vri-d as vrr */
+    asm volatile(".insn vrr, 0xE70000000086, %[v1], %[v2], %[v3], 0, %[I], 0\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [v3]  "v" (v3->v)
+                , [I]   "i" (I & 7));
+}
+
+static inline void vsrd(S390Vector *v1, S390Vector *v2,
+    S390Vector *v3, const uint8_t I)
+{
+    /* vri-d as vrr */
+    asm volatile(".insn vrr, 0xE70000000087, %[v1], %[v2], %[v3], 0, %[I], 0\n"
+                : [v1] "=v" (v1->v)
+                : [v2]  "v" (v2->v)
+                , [v3]  "v" (v3->v)
+                , [I]   "i" (I & 7));
+}
+
+int main(int argc, char *argv[])
+{
+    const S390Vector vt_vsl  = { .d[0] = 0x7FEDBB32D5AA311Dull,
+                                 .d[1] = 0xBB65AA10912220C0ull };
+    const S390Vector vt_vsra = { .d[0] = 0xF1FE6E7399AA5466ull,
+                                 .d[1] = 0x0E762A5188221044ull };
+    const S390Vector vt_vsrl = { .d[0] = 0x11FE6E7399AA5466ull,
+                                 .d[1] = 0x0E762A5188221044ull };
+    const S390Vector vt_vsld = { .d[0] = 0x7F76EE65DD54CC43ull,
+                                 .d[1] = 0xBB32AA2199108838ull };
+    const S390Vector vt_vsrd = { .d[0] = 0x0E060802040E000Aull,
+                                 .d[1] = 0x0C060802040E000Aull };
+    S390Vector vs  = { .d[0] = 0x8FEEDDCCBBAA9988ull,
+                       .d[1] = 0x7766554433221107ull };
+    S390Vector  vd = { .d[0] = 0, .d[1] = 0 };
+    S390Vector vsi = { .d[0] = 0, .d[1] = 0 };
+
+    for (int ix = 0; ix < 16; ix++) {
+        vsi.b[ix] = (1 + (5 ^ ~ix)) & 7;
+    }
+
+    vsl(&vd, &vs, &vsi);
+    vtst(vd, vt_vsl);
+
+    vsra(&vd, &vs, &vsi);
+    vtst(vd, vt_vsra);
+
+    vsrl(&vd, &vs, &vsi);
+    vtst(vd, vt_vsrl);
+
+    vsld(&vd, &vs, &vsi, 3);
+    vtst(vd, vt_vsld);
+
+    vsrd(&vd, &vs, &vsi, 15);
+    vtst(vd, vt_vsrd);
+
+    return 0;
+}
-- 
2.32.0
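
The VSLD/VSRD expectations in vxeh2_vs.c can likewise be reproduced off
target. The following is a minimal sketch, not part of the patch, under
the assumption that VSLD shifts the 256-bit concatenation of the second
and third operand left by 0..7 bits and keeps the leftmost 128 bits,
while VSRD shifts it right and keeps the rightmost 128 bits. Vec128,
model_vsld, model_vsrd and make_vsi are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* hi is doubleword 0 (leftmost), lo is doubleword 1 (rightmost). */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} Vec128;

/* Shift a||b left by n bits (0..7) and keep the leftmost 128 bits. */
static Vec128 model_vsld(Vec128 a, Vec128 b, unsigned n)
{
    Vec128 r = a;

    assert(n <= 7);
    if (n != 0) {   /* avoid an undefined shift by 64 when n == 0 */
        r.hi = (a.hi << n) | (a.lo >> (64 - n));
        r.lo = (a.lo << n) | (b.hi >> (64 - n));
    }
    return r;
}

/* Shift a||b right by n bits (0..7) and keep the rightmost 128 bits. */
static Vec128 model_vsrd(Vec128 a, Vec128 b, unsigned n)
{
    Vec128 r = b;

    assert(n <= 7);
    if (n != 0) {   /* avoid an undefined shift by 64 when n == 0 */
        r.hi = (a.lo << (64 - n)) | (b.hi >> n);
        r.lo = (b.hi << (64 - n)) | (b.lo >> n);
    }
    return r;
}

/* Build the third operand the same way the loop in the test's main() does. */
static Vec128 make_vsi(void)
{
    Vec128 v = { 0, 0 };

    for (int ix = 0; ix < 16; ix++) {
        uint64_t byte = (uint64_t)((1 + (5 ^ ~ix)) & 7);

        if (ix < 8) {
            v.hi |= byte << (8 * (7 - ix));
        } else {
            v.lo |= byte << (8 * (15 - ix));
        }
    }
    return v;
}
```

With vs from the test as the second operand and make_vsi() as the third,
model_vsld(..., 3) and model_vsrd(..., 15 & 7) give the values stored in
vt_vsld and vt_vsrd. The mask mirrors the "I & 7" applied in the vsld()
and vsrd() wrappers.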

On Tue, May 3, 2022 at 10:57 AM David Miller <dmiller423@gmail.com> wrote:
>
> It looks like Google disabled password access early; nothing
> makes it work anymore.
> They had plans to disable 'less secure apps' in May, but I thought it
> was at the end of the month.
> I'll try copy/paste as plain text as well, though I know it will likely
> screw it up.
>
> On Tue, May 3, 2022 at 10:42 AM David Miller <dmiller423@gmail.com> wrote:
> >
> > Sorry, it was in the discussion of the v4 patches, as an attachment.
> > Mail thread:
> > [PATCH v4 10/11] tests/tcg/s390x: Tests for Vector Enhancements Facility 2
> > So it likely never made it to the mailing list.
> >
> > I've reattached and will forward the patch (by itself) to the mailing list.
> >
> > I think the other solution, which ignores the tests if the compiler
> > doesn't support z15, works just as well.
> >
> > I just thought I'd bring it back up as I saw discussion about it.
> >
> > Thanks
> > - David Miller
> >
> >
> >
> >
> >
> >
> > On Tue, May 3, 2022 at 2:55 AM Thomas Huth <thuth@redhat.com> wrote:
> > >
> > >   Hi!
> > >
> > > On 02/05/2022 18.06, David Miller wrote:
> > > > There was also the patch that had them as .insn in the other series of emails.
> > >
> > > Sorry, I missed that patch, could you please point me to the mail on
> > > https://lore.kernel.org/qemu-devel/ ? I remember that there was a discussion
> > > about the vri-d encoding, but I apparently missed the patch that came out of
> > > this discussion...
> > >
> > >   Thomas
> > >
> > > > On Mon, May 2, 2022 at 11:52 AM David Hildenbrand <david@redhat.com> wrote:
> > > >>
> > > >> On 02.05.22 09:20, Thomas Huth wrote:
> > > >>> On 28/04/2022 11.46, David Hildenbrand wrote:
> > > >>>> Implement Vector-Enhancements Facility 2 for s390x
> > > >>>>
> > > >>>> resolves: https://gitlab.com/qemu-project/qemu/-/issues/738
> > > >>>>
> > > >>>> implements:
> > > >>>>       VECTOR LOAD ELEMENTS REVERSED               (VLER)
> > > >>>>       VECTOR LOAD BYTE REVERSED ELEMENTS          (VLBR)
> > > >>>>       VECTOR LOAD BYTE REVERSED ELEMENT           (VLEBRH, VLEBRF, VLEBRG)
> > > >>>>       VECTOR LOAD BYTE REVERSED ELEMENT AND ZERO  (VLLEBRZ)
> > > >>>>       VECTOR LOAD BYTE REVERSED ELEMENT AND REPLICATE (VLBRREP)
> > > >>>>       VECTOR STORE ELEMENTS REVERSED              (VSTER)
> > > >>>>       VECTOR STORE BYTE REVERSED ELEMENTS         (VSTBR)
> > > >>>>       VECTOR STORE BYTE REVERSED ELEMENTS         (VSTEBRH, VSTEBRF, VSTEBRG)
> > > >>>>       VECTOR SHIFT LEFT DOUBLE BY BIT             (VSLD)
> > > >>>>       VECTOR SHIFT RIGHT DOUBLE BY BIT            (VSRD)
> > > >>>>       VECTOR STRING SEARCH                        (VSTRS)
> > > >>>>
> > > >>>>       modifies:
> > > >>>>       VECTOR FP CONVERT FROM FIXED                (VCFPS)
> > > >>>>       VECTOR FP CONVERT FROM LOGICAL              (VCFPL)
> > > >>>>       VECTOR FP CONVERT TO FIXED                  (VCSFP)
> > > >>>>       VECTOR FP CONVERT TO LOGICAL                (VCLFP)
> > > >>>>       VECTOR SHIFT LEFT                           (VSL)
> > > >>>>       VECTOR SHIFT RIGHT ARITHMETIC               (VSRA)
> > > >>>>       VECTOR SHIFT RIGHT LOGICAL                  (VSRL)
> > > >>>
> > > >>> Thanks, queued to my s390x-next branch now:
> > > >>>
> > > >>>    https://gitlab.com/thuth/qemu/-/commits/s390x-next/
> > > >>>
> > > >> Thanks for fixing up. At this point I would have suggested to exclude
> > > >> the tests for now.
> > > >>
> > > >> --
> > > >> Thanks,
> > > >>
> > > >> David / dhildenb
> > > >>
> > > >
> > >



* Re: [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2
  2022-05-03 14:57           ` David Miller
  2022-05-03 14:57             ` David Miller
@ 2022-05-04  8:28             ` Thomas Huth
  1 sibling, 0 replies; 26+ messages in thread
From: Thomas Huth @ 2022-05-04  8:28 UTC (permalink / raw)
  To: David Miller
  Cc: David Hildenbrand, qemu-devel, qemu-s390x, Richard Henderson,
	Christian Borntraeger, Cornelia Huck, Halil Pasic, Eric Farman

On 03/05/2022 16.57, David Miller wrote:
> It looks like Google disabled password access early; nothing
> makes it work anymore.

Uh, that's ugly! I hope you'll figure out a way to work around that problem!

> They had plans to disable 'less secure apps' in May, but I thought it
> was at the end of the month.
> I'll try copy/paste as plain text as well, though I know it will likely
> screw it up.

Yup, that plain text patch didn't apply anymore - so I went for the 
attachment from your previous mail this time instead (hoping that you'll 
find another way for using git-send-email again in the future).

  Thomas




* Re: [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2
  2022-05-03 14:42         ` David Miller
  2022-05-03 14:57           ` David Miller
@ 2022-05-04  9:10           ` Thomas Huth
  1 sibling, 0 replies; 26+ messages in thread
From: Thomas Huth @ 2022-05-04  9:10 UTC (permalink / raw)
  To: David Miller
  Cc: David Hildenbrand, qemu-devel, qemu-s390x, Richard Henderson,
	Christian Borntraeger, Cornelia Huck, Halil Pasic, Eric Farman

On 03/05/2022 16.42, David Miller wrote:
> Sorry, it was in the discussion of the v4 patches, as an attachment.
> Mail thread:
> [PATCH v4 10/11] tests/tcg/s390x: Tests for Vector Enhancements Facility 2
> So it likely never made it to the mailing list.
> 
> I've reattached and will forward the patch (by itself) to the mailing list.
> 
> I think the other solution, which ignores the tests if the compiler
> doesn't support z15, works just as well.
> 
> I just thought I'd bring it back up as I saw discussion about it.

Ok, I now gave this a try ... and while this should now work fine with older 
versions of gcc/binutils, it's failing with Clang now:

   BUILD   s390x-linux-user guest-tests
/home/thuth/devel/qemu/tests/tcg/s390x/vxeh2_vs.c:14:18: error: couldn't allocate output register for constraint 'v'
     asm volatile(".insn vrr, 0xE70000000074, %[v1], %[v2], %[v3], 0,0,0\n"
                  ^
/home/thuth/devel/qemu/tests/tcg/s390x/vxeh2_vs.c:22:18: error: couldn't allocate output register for constraint 'v'
     asm volatile(".insn vrr, 0xE7000000007E, %[v1], %[v2], %[v3], 0,0,0\n"
                  ^
/home/thuth/devel/qemu/tests/tcg/s390x/vxeh2_vs.c:30:18: error: couldn't allocate output register for constraint 'v'
     asm volatile(".insn vrr, 0xE7000000007C, %[v1], %[v2], %[v3], 0,0,0\n"
                  ^
/home/thuth/devel/qemu/tests/tcg/s390x/vxeh2_vs.c:40:18: error: couldn't allocate output register for constraint 'v'
     asm volatile(".insn vrr, 0xE70000000086, %[v1], %[v2], %[v3], 0, %[I], 0\n"
                  ^
/home/thuth/devel/qemu/tests/tcg/s390x/vxeh2_vs.c:51:18: error: couldn't allocate output register for constraint 'v'
     asm volatile(".insn vrr, 0xE70000000087, %[v1], %[v2], %[v3], 0, %[I], 0\n"
                  ^
5 errors generated.
make[1]: *** [../Makefile.target:109: vxeh2_vs] Error 1
make[1]: *** Waiting for unfinished jobs....
/home/thuth/devel/qemu/tests/tcg/s390x/vxeh2_vcvt.c:14:18: error: couldn't allocate output register for constraint 'v'
     asm volatile(".insn vrr, 0xE700000000C3, %[v1], %[v2], 0, %[m3], %[m4], %[m5]\n"
                  ^
/home/thuth/devel/qemu/tests/tcg/s390x/vxeh2_vcvt.c:25:18: error: couldn't allocate output register for constraint 'v'
     asm volatile(".insn vrr, 0xE700000000C1, %[v1], %[v2], 0, %[m3], %[m4], %[m5]\n"
                  ^
/home/thuth/devel/qemu/tests/tcg/s390x/vxeh2_vcvt.c:36:18: error: couldn't allocate output register for constraint 'v'
     asm volatile(".insn vrr, 0xE700000000C2, %[v1], %[v2], 0, %[m3], %[m4], %[m5]\n"
                  ^
/home/thuth/devel/qemu/tests/tcg/s390x/vxeh2_vcvt.c:47:18: error: couldn't allocate output register for constraint 'v'
     asm volatile(".insn vrr, 0xE700000000C0, %[v1], %[v2], 0, %[m3], %[m4], %[m5]\n"
                  ^
4 errors generated.

...

Thus I think I'd rather go with the other approach, which checks for 
the availability of -march=z15.

  Thomas




Thread overview: 26+ messages
2022-04-28  9:46 [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements Facility 2 David Hildenbrand
2022-04-28  9:46 ` [PATCH v6 01/13] target/s390x: Fix writeback to v1 in helper_vstl David Hildenbrand
2022-04-28  9:46 ` [PATCH v6 02/13] s390x/cpu_models: drop "msa5" from the TCG "max" model David Hildenbrand
2022-04-28  9:46 ` [PATCH v6 03/13] s390x/cpu_models: make "max" match the unmodified "qemu" CPU model under TCG David Hildenbrand
2022-04-28  9:46 ` [PATCH v6 04/13] tcg: Implement tcg_gen_{h,w}swap_{i32,i64} David Hildenbrand
2022-04-28  9:47 ` [PATCH v6 05/13] target/s390x: vxeh2: vector convert short/32b David Hildenbrand
2022-04-28  9:47 ` [PATCH v6 06/13] target/s390x: vxeh2: vector string search David Hildenbrand
2022-04-28  9:47 ` [PATCH v6 07/13] target/s390x: vxeh2: Update for changes to vector shifts David Hildenbrand
2022-04-28  9:47 ` [PATCH v6 08/13] target/s390x: vxeh2: vector shift double by bit David Hildenbrand
2022-04-28  9:47 ` [PATCH v6 09/13] target/s390x: vxeh2: vector {load, store} elements reversed David Hildenbrand
2022-04-28  9:47 ` [PATCH v6 10/13] target/s390x: vxeh2: vector {load, store} byte reversed elements David Hildenbrand
2022-04-28  9:47 ` [PATCH v6 11/13] target/s390x: vxeh2: vector {load, store} byte reversed element David Hildenbrand
2022-04-28  9:47 ` [PATCH v6 12/13] target/s390x: add S390_FEAT_VECTOR_ENH2 to qemu CPU model David Hildenbrand
2022-04-28  9:47 ` [PATCH v6 13/13] tests/tcg/s390x: Tests for Vector Enhancements Facility 2 David Hildenbrand
2022-05-02  8:12   ` Thomas Huth
2022-05-02  9:10     ` Thomas Huth
2022-05-02  9:35   ` Thomas Huth
2022-05-02  7:20 ` [PATCH v6 00/13] s390x/tcg: Implement Vector-Enhancements " Thomas Huth
2022-05-02 15:52   ` David Hildenbrand
2022-05-02 16:06     ` David Miller
2022-05-03  6:55       ` Thomas Huth
2022-05-03 14:42         ` David Miller
2022-05-03 14:57           ` David Miller
2022-05-03 14:57             ` David Miller
2022-05-04  8:28             ` Thomas Huth
2022-05-04  9:10           ` Thomas Huth
