All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/9] target/s390x: Use Int128 for float128 and retxl
@ 2022-10-21  7:29 Richard Henderson
  2022-10-21  7:29 ` [PATCH 1/9] target/s390x: Use a single return for helper_divs32/u32 Richard Henderson
                   ` (8 more replies)
  0 siblings, 9 replies; 35+ messages in thread
From: Richard Henderson @ 2022-10-21  7:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-s390x

Removing retxl seemed like the easiest test case for TCGv_i128.

I don't quite get rid of retxl, because it is still used by LPQ,
which will be convertable directly to generic tcg ops once I'm
further along with the main improvements to atomic operations.

Tested only via the little test case, but that was sufficient
to show a bug or two with TCI.  The insn-data.def patterns look
quite regular now, so that's a pleasant outcome.


r~


Richard Henderson (9):
  target/s390x: Use a single return for helper_divs32/u32
  target/s390x: Use a single return for helper_divs64/u64
  target/s390x: Use Int128 for return from CLST
  target/s390x: Use Int128 for return from CKSM
  target/s390x: Use Int128 for return from TRE
  target/s390x: Copy wout_x1 to wout_x1_P
  tests/tcg/s390x: Add long-double.c
  target/s390x: Use Int128 for returning float128
  target/s390x: Use Int128 for passing float128

 target/s390x/helper.h           |  52 +++++------
 target/s390x/tcg/fpu_helper.c   | 103 ++++++++++-----------
 target/s390x/tcg/int_helper.c   |  64 +++++--------
 target/s390x/tcg/mem_helper.c   |  25 +++---
 target/s390x/tcg/translate.c    | 154 ++++++++++++++++++++++----------
 tests/tcg/s390x/long-double.c   |  24 +++++
 target/s390x/tcg/insn-data.def  |  58 ++++++------
 tests/tcg/s390x/Makefile.target |   1 +
 8 files changed, 264 insertions(+), 217 deletions(-)
 create mode 100644 tests/tcg/s390x/long-double.c

-- 
2.34.1



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/9] target/s390x: Use a single return for helper_divs32/u32
  2022-10-21  7:29 [PATCH 0/9] target/s390x: Use Int128 for float128 and retxl Richard Henderson
@ 2022-10-21  7:29 ` Richard Henderson
  2022-10-21 11:25   ` Ilya Leoshkevich
                     ` (3 more replies)
  2022-10-21  7:29 ` [PATCH 2/9] target/s390x: Use a single return for helper_divs64/u64 Richard Henderson
                   ` (7 subsequent siblings)
  8 siblings, 4 replies; 35+ messages in thread
From: Richard Henderson @ 2022-10-21  7:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-s390x

Pack the quotient and remainder into a single uint64_t.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/s390x/helper.h         |  2 +-
 target/s390x/tcg/int_helper.c | 26 +++++++++++++-------------
 target/s390x/tcg/translate.c  | 10 ++++++----
 3 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index bf33d86f74..97a9668eef 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -10,7 +10,7 @@ DEF_HELPER_FLAGS_4(clc, TCG_CALL_NO_WG, i32, env, i32, i64, i64)
 DEF_HELPER_3(mvcl, i32, env, i32, i32)
 DEF_HELPER_3(clcl, i32, env, i32, i32)
 DEF_HELPER_FLAGS_4(clm, TCG_CALL_NO_WG, i32, env, i32, i32, i64)
-DEF_HELPER_FLAGS_3(divs32, TCG_CALL_NO_WG, s64, env, s64, s64)
+DEF_HELPER_FLAGS_3(divs32, TCG_CALL_NO_WG, i64, env, s64, s64)
 DEF_HELPER_FLAGS_3(divu32, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(divs64, TCG_CALL_NO_WG, s64, env, s64, s64)
 DEF_HELPER_FLAGS_4(divu64, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
diff --git a/target/s390x/tcg/int_helper.c b/target/s390x/tcg/int_helper.c
index 954542388a..130b8bd4d2 100644
--- a/target/s390x/tcg/int_helper.c
+++ b/target/s390x/tcg/int_helper.c
@@ -34,45 +34,45 @@
 #endif
 
 /* 64/32 -> 32 signed division */
-int64_t HELPER(divs32)(CPUS390XState *env, int64_t a, int64_t b64)
+uint64_t HELPER(divs32)(CPUS390XState *env, int64_t a, int64_t b64)
 {
-    int32_t ret, b = b64;
-    int64_t q;
+    int32_t b = b64;
+    int64_t q, r;
 
     if (b == 0) {
         tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
     }
 
-    ret = q = a / b;
-    env->retxl = a % b;
+    q = a / b;
+    r = a % b;
 
     /* Catch non-representable quotient.  */
-    if (ret != q) {
+    if (q != (int32_t)q) {
         tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
     }
 
-    return ret;
+    return deposit64(r, 32, 32, q);
 }
 
 /* 64/32 -> 32 unsigned division */
 uint64_t HELPER(divu32)(CPUS390XState *env, uint64_t a, uint64_t b64)
 {
-    uint32_t ret, b = b64;
-    uint64_t q;
+    uint32_t b = b64;
+    uint64_t q, r;
 
     if (b == 0) {
         tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
     }
 
-    ret = q = a / b;
-    env->retxl = a % b;
+    q = a / b;
+    r = a % b;
 
     /* Catch non-representable quotient.  */
-    if (ret != q) {
+    if (q != (uint32_t)q) {
         tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
     }
 
-    return ret;
+    return deposit64(r, 32, 32, q);
 }
 
 /* 64/64 -> 64 signed division */
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 1d2dddab1c..525317c9df 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2395,15 +2395,17 @@ static DisasJumpType op_diag(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_divs32(DisasContext *s, DisasOps *o)
 {
-    gen_helper_divs32(o->out2, cpu_env, o->in1, o->in2);
-    return_low128(o->out);
+    gen_helper_divs32(o->out, cpu_env, o->in1, o->in2);
+    tcg_gen_ext32u_i64(o->out2, o->out);
+    tcg_gen_shri_i64(o->out, o->out, 32);
     return DISAS_NEXT;
 }
 
 static DisasJumpType op_divu32(DisasContext *s, DisasOps *o)
 {
-    gen_helper_divu32(o->out2, cpu_env, o->in1, o->in2);
-    return_low128(o->out);
+    gen_helper_divu32(o->out, cpu_env, o->in1, o->in2);
+    tcg_gen_ext32u_i64(o->out2, o->out);
+    tcg_gen_shri_i64(o->out, o->out, 32);
     return DISAS_NEXT;
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/9] target/s390x: Use a single return for helper_divs64/u64
  2022-10-21  7:29 [PATCH 0/9] target/s390x: Use Int128 for float128 and retxl Richard Henderson
  2022-10-21  7:29 ` [PATCH 1/9] target/s390x: Use a single return for helper_divs32/u32 Richard Henderson
@ 2022-10-21  7:29 ` Richard Henderson
  2022-10-24 13:16   ` Philippe Mathieu-Daudé
  2022-10-25 20:47   ` Ilya Leoshkevich
  2022-10-21  7:30 ` [PATCH 3/9] target/s390x: Use Int128 for return from CLST Richard Henderson
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 35+ messages in thread
From: Richard Henderson @ 2022-10-21  7:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-s390x

Pack the quotient and remainder into a single Int128.
Use the divu128 primitive to remove the cpu_abort on
32-bit hosts.

This is the first use of Int128 as a return value.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/s390x/helper.h         |  4 ++--
 target/s390x/tcg/int_helper.c | 38 +++++++++--------------------------
 target/s390x/tcg/translate.c  | 14 +++++++++----
 3 files changed, 21 insertions(+), 35 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 97a9668eef..29986557ed 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -12,8 +12,8 @@ DEF_HELPER_3(clcl, i32, env, i32, i32)
 DEF_HELPER_FLAGS_4(clm, TCG_CALL_NO_WG, i32, env, i32, i32, i64)
 DEF_HELPER_FLAGS_3(divs32, TCG_CALL_NO_WG, i64, env, s64, s64)
 DEF_HELPER_FLAGS_3(divu32, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_3(divs64, TCG_CALL_NO_WG, s64, env, s64, s64)
-DEF_HELPER_FLAGS_4(divu64, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
+DEF_HELPER_FLAGS_3(divs64, TCG_CALL_NO_WG, i128, env, s64, s64)
+DEF_HELPER_FLAGS_4(divu64, TCG_CALL_NO_WG, i128, env, i64, i64, i64)
 DEF_HELPER_3(srst, void, env, i32, i32)
 DEF_HELPER_3(srstu, void, env, i32, i32)
 DEF_HELPER_4(clst, i64, env, i64, i64, i64)
diff --git a/target/s390x/tcg/int_helper.c b/target/s390x/tcg/int_helper.c
index 130b8bd4d2..7dc919a102 100644
--- a/target/s390x/tcg/int_helper.c
+++ b/target/s390x/tcg/int_helper.c
@@ -76,46 +76,26 @@ uint64_t HELPER(divu32)(CPUS390XState *env, uint64_t a, uint64_t b64)
 }
 
 /* 64/64 -> 64 signed division */
-int64_t HELPER(divs64)(CPUS390XState *env, int64_t a, int64_t b)
+Int128 HELPER(divs64)(CPUS390XState *env, int64_t a, int64_t b)
 {
     /* Catch divide by zero, and non-representable quotient (MIN / -1).  */
     if (b == 0 || (b == -1 && a == (1ll << 63))) {
         tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
     }
-    env->retxl = a % b;
-    return a / b;
+    return int128_make128(a % b, a / b);
 }
 
 /* 128 -> 64/64 unsigned division */
-uint64_t HELPER(divu64)(CPUS390XState *env, uint64_t ah, uint64_t al,
-                        uint64_t b)
+Int128 HELPER(divu64)(CPUS390XState *env, uint64_t ah, uint64_t al, uint64_t b)
 {
-    uint64_t ret;
-    /* Signal divide by zero.  */
-    if (b == 0) {
-        tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
-    }
-    if (ah == 0) {
-        /* 64 -> 64/64 case */
-        env->retxl = al % b;
-        ret = al / b;
-    } else {
-        /* ??? Move i386 idivq helper to host-utils.  */
-#ifdef CONFIG_INT128
-        __uint128_t a = ((__uint128_t)ah << 64) | al;
-        __uint128_t q = a / b;
-        env->retxl = a % b;
-        ret = q;
-        if (ret != q) {
-            tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
+    if (b != 0) {
+        uint64_t r = divu128(&al, &ah, b);
+        if (ah == 0) {
+            return int128_make128(r, al);
         }
-#else
-        /* 32-bit hosts would need special wrapper functionality - just abort if
-           we encounter such a case; it's very unlikely anyways. */
-        cpu_abort(env_cpu(env), "128 -> 64/64 division not implemented\n");
-#endif
     }
-    return ret;
+    /* divide by zero or overflow */
+    tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
 }
 
 uint64_t HELPER(cvd)(int32_t reg)
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 525317c9df..a3f5a55891 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2411,15 +2411,21 @@ static DisasJumpType op_divu32(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_divs64(DisasContext *s, DisasOps *o)
 {
-    gen_helper_divs64(o->out2, cpu_env, o->in1, o->in2);
-    return_low128(o->out);
+    TCGv_i128 t = tcg_temp_new_i128();
+
+    gen_helper_divs64(t, cpu_env, o->in1, o->in2);
+    tcg_gen_extr_i128_i64(o->out, o->out2, t);
+    tcg_temp_free_i128(t);
     return DISAS_NEXT;
 }
 
 static DisasJumpType op_divu64(DisasContext *s, DisasOps *o)
 {
-    gen_helper_divu64(o->out2, cpu_env, o->out, o->out2, o->in2);
-    return_low128(o->out);
+    TCGv_i128 t = tcg_temp_new_i128();
+
+    gen_helper_divu64(t, cpu_env, o->out, o->out2, o->in2);
+    tcg_gen_extr_i128_i64(o->out, o->out2, t);
+    tcg_temp_free_i128(t);
     return DISAS_NEXT;
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 3/9] target/s390x: Use Int128 for return from CLST
  2022-10-21  7:29 [PATCH 0/9] target/s390x: Use Int128 for float128 and retxl Richard Henderson
  2022-10-21  7:29 ` [PATCH 1/9] target/s390x: Use a single return for helper_divs32/u32 Richard Henderson
  2022-10-21  7:29 ` [PATCH 2/9] target/s390x: Use a single return for helper_divs64/u64 Richard Henderson
@ 2022-10-21  7:30 ` Richard Henderson
  2022-10-24 13:15   ` Philippe Mathieu-Daudé
                     ` (2 more replies)
  2022-10-21  7:30 ` [PATCH 4/9] target/s390x: Use Int128 for return from CKSM Richard Henderson
                   ` (5 subsequent siblings)
  8 siblings, 3 replies; 35+ messages in thread
From: Richard Henderson @ 2022-10-21  7:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-s390x

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/s390x/helper.h         |  2 +-
 target/s390x/tcg/mem_helper.c | 11 ++++-------
 target/s390x/tcg/translate.c  |  8 ++++++--
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 29986557ed..ba9ed11896 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -16,7 +16,7 @@ DEF_HELPER_FLAGS_3(divs64, TCG_CALL_NO_WG, i128, env, s64, s64)
 DEF_HELPER_FLAGS_4(divu64, TCG_CALL_NO_WG, i128, env, i64, i64, i64)
 DEF_HELPER_3(srst, void, env, i32, i32)
 DEF_HELPER_3(srstu, void, env, i32, i32)
-DEF_HELPER_4(clst, i64, env, i64, i64, i64)
+DEF_HELPER_4(clst, i128, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(mvn, TCG_CALL_NO_WG, void, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(mvo, TCG_CALL_NO_WG, void, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(mvpg, TCG_CALL_NO_WG, i32, env, i64, i32, i32)
diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index 3758b9e688..5a299ca9a2 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -886,7 +886,7 @@ void HELPER(srstu)(CPUS390XState *env, uint32_t r1, uint32_t r2)
 }
 
 /* unsigned string compare (c is string terminator) */
-uint64_t HELPER(clst)(CPUS390XState *env, uint64_t c, uint64_t s1, uint64_t s2)
+Int128 HELPER(clst)(CPUS390XState *env, uint64_t c, uint64_t s1, uint64_t s2)
 {
     uintptr_t ra = GETPC();
     uint32_t len;
@@ -904,23 +904,20 @@ uint64_t HELPER(clst)(CPUS390XState *env, uint64_t c, uint64_t s1, uint64_t s2)
             if (v1 == c) {
                 /* Equal.  CC=0, and don't advance the registers.  */
                 env->cc_op = 0;
-                env->retxl = s2;
-                return s1;
+                return int128_make128(s2, s1);
             }
         } else {
             /* Unequal.  CC={1,2}, and advance the registers.  Note that
                the terminator need not be zero, but the string that contains
                the terminator is by definition "low".  */
             env->cc_op = (v1 == c ? 1 : v2 == c ? 2 : v1 < v2 ? 1 : 2);
-            env->retxl = s2 + len;
-            return s1 + len;
+            return int128_make128(s2 + len, s1 + len);
         }
     }
 
     /* CPU-determined bytes equal; advance the registers.  */
     env->cc_op = 3;
-    env->retxl = s2 + len;
-    return s1 + len;
+    return int128_make128(s2 + len, s1 + len);
 }
 
 /* move page */
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index a3f5a55891..63ce131cbf 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2164,9 +2164,13 @@ static DisasJumpType op_clm(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_clst(DisasContext *s, DisasOps *o)
 {
-    gen_helper_clst(o->in1, cpu_env, regs[0], o->in1, o->in2);
+    TCGv_i128 pair = tcg_temp_new_i128();
+
+    gen_helper_clst(pair, cpu_env, regs[0], o->in1, o->in2);
+    tcg_gen_extr_i128_i64(o->in2, o->in1, pair);
+    tcg_temp_free_i128(pair);
+
     set_cc_static(s);
-    return_low128(o->in2);
     return DISAS_NEXT;
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 4/9] target/s390x: Use Int128 for return from CKSM
  2022-10-21  7:29 [PATCH 0/9] target/s390x: Use Int128 for float128 and retxl Richard Henderson
                   ` (2 preceding siblings ...)
  2022-10-21  7:30 ` [PATCH 3/9] target/s390x: Use Int128 for return from CLST Richard Henderson
@ 2022-10-21  7:30 ` Richard Henderson
  2022-10-24 13:17   ` Philippe Mathieu-Daudé
  2022-10-27 10:36   ` Ilya Leoshkevich
  2022-10-21  7:30 ` [PATCH 5/9] target/s390x: Use Int128 for return from TRE Richard Henderson
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 35+ messages in thread
From: Richard Henderson @ 2022-10-21  7:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-s390x

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/s390x/helper.h         | 2 +-
 target/s390x/tcg/mem_helper.c | 7 +++----
 target/s390x/tcg/translate.c  | 6 ++++--
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index ba9ed11896..16045b6dbe 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -103,7 +103,7 @@ DEF_HELPER_4(tre, i64, env, i64, i64, i64)
 DEF_HELPER_4(trt, i32, env, i32, i64, i64)
 DEF_HELPER_4(trtr, i32, env, i32, i64, i64)
 DEF_HELPER_5(trXX, i32, env, i32, i32, i32, i32)
-DEF_HELPER_4(cksm, i64, env, i64, i64, i64)
+DEF_HELPER_4(cksm, i128, env, i64, i64, i64)
 DEF_HELPER_FLAGS_5(calc_cc, TCG_CALL_NO_RWG_SE, i32, env, i32, i64, i64, i64)
 DEF_HELPER_FLAGS_2(sfpc, TCG_CALL_NO_WG, void, env, i64)
 DEF_HELPER_FLAGS_2(sfas, TCG_CALL_NO_WG, void, env, i64)
diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index 5a299ca9a2..bb6714aee4 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -1350,8 +1350,8 @@ uint32_t HELPER(clclu)(CPUS390XState *env, uint32_t r1, uint64_t a2,
 }
 
 /* checksum */
-uint64_t HELPER(cksm)(CPUS390XState *env, uint64_t r1,
-                      uint64_t src, uint64_t src_len)
+Int128 HELPER(cksm)(CPUS390XState *env, uint64_t r1,
+                    uint64_t src, uint64_t src_len)
 {
     uintptr_t ra = GETPC();
     uint64_t max_len, len;
@@ -1392,8 +1392,7 @@ uint64_t HELPER(cksm)(CPUS390XState *env, uint64_t r1,
     env->cc_op = (len == src_len ? 0 : 3);
 
     /* Return both cksm and processed length.  */
-    env->retxl = cksm;
-    return len;
+    return int128_make128(cksm, len);
 }
 
 void HELPER(pack)(CPUS390XState *env, uint32_t len, uint64_t dest, uint64_t src)
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 63ce131cbf..224046551f 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2041,11 +2041,13 @@ static DisasJumpType op_cxlgb(DisasContext *s, DisasOps *o)
 static DisasJumpType op_cksm(DisasContext *s, DisasOps *o)
 {
     int r2 = get_field(s, r2);
+    TCGv_i128 pair = tcg_temp_new_i128();
     TCGv_i64 len = tcg_temp_new_i64();
 
-    gen_helper_cksm(len, cpu_env, o->in1, o->in2, regs[r2 + 1]);
+    gen_helper_cksm(pair, cpu_env, o->in1, o->in2, regs[r2 + 1]);
     set_cc_static(s);
-    return_low128(o->out);
+    tcg_gen_extr_i128_i64(o->out, len, pair);
+    tcg_temp_free_i128(pair);
 
     tcg_gen_add_i64(regs[r2], regs[r2], len);
     tcg_gen_sub_i64(regs[r2 + 1], regs[r2 + 1], len);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 5/9] target/s390x: Use Int128 for return from TRE
  2022-10-21  7:29 [PATCH 0/9] target/s390x: Use Int128 for float128 and retxl Richard Henderson
                   ` (3 preceding siblings ...)
  2022-10-21  7:30 ` [PATCH 4/9] target/s390x: Use Int128 for return from CKSM Richard Henderson
@ 2022-10-21  7:30 ` Richard Henderson
  2022-10-24 13:17   ` Philippe Mathieu-Daudé
  2022-10-27 10:40   ` Ilya Leoshkevich
  2022-10-21  7:30 ` [PATCH 6/9] target/s390x: Copy wout_x1 to wout_x1_P Richard Henderson
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 35+ messages in thread
From: Richard Henderson @ 2022-10-21  7:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-s390x

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/s390x/helper.h         | 2 +-
 target/s390x/tcg/mem_helper.c | 7 +++----
 target/s390x/tcg/translate.c  | 7 +++++--
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 16045b6dbe..aaea177246 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -99,7 +99,7 @@ DEF_HELPER_FLAGS_4(unpka, TCG_CALL_NO_WG, i32, env, i64, i32, i64)
 DEF_HELPER_FLAGS_4(unpku, TCG_CALL_NO_WG, i32, env, i64, i32, i64)
 DEF_HELPER_FLAGS_3(tp, TCG_CALL_NO_WG, i32, env, i64, i32)
 DEF_HELPER_FLAGS_4(tr, TCG_CALL_NO_WG, void, env, i32, i64, i64)
-DEF_HELPER_4(tre, i64, env, i64, i64, i64)
+DEF_HELPER_4(tre, i128, env, i64, i64, i64)
 DEF_HELPER_4(trt, i32, env, i32, i64, i64)
 DEF_HELPER_4(trtr, i32, env, i32, i64, i64)
 DEF_HELPER_5(trXX, i32, env, i32, i32, i32, i32)
diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index bb6714aee4..caf8c408ef 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -1632,8 +1632,8 @@ void HELPER(tr)(CPUS390XState *env, uint32_t len, uint64_t array,
     do_helper_tr(env, len, array, trans, GETPC());
 }
 
-uint64_t HELPER(tre)(CPUS390XState *env, uint64_t array,
-                     uint64_t len, uint64_t trans)
+Int128 HELPER(tre)(CPUS390XState *env, uint64_t array,
+                   uint64_t len, uint64_t trans)
 {
     uintptr_t ra = GETPC();
     uint8_t end = env->regs[0] & 0xff;
@@ -1668,8 +1668,7 @@ uint64_t HELPER(tre)(CPUS390XState *env, uint64_t array,
     }
 
     env->cc_op = cc;
-    env->retxl = len - i;
-    return array + i;
+    return int128_make128(len - i, array + i);
 }
 
 static inline uint32_t do_helper_trt(CPUS390XState *env, int len,
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 224046551f..8dc4f4b5ec 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -4905,8 +4905,11 @@ static DisasJumpType op_tr(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_tre(DisasContext *s, DisasOps *o)
 {
-    gen_helper_tre(o->out, cpu_env, o->out, o->out2, o->in2);
-    return_low128(o->out2);
+    TCGv_i128 pair = tcg_temp_new_i128();
+
+    gen_helper_tre(pair, cpu_env, o->out, o->out2, o->in2);
+    tcg_gen_extr_i128_i64(o->out2, o->out, pair);
+    tcg_temp_free_i128(pair);
     set_cc_static(s);
     return DISAS_NEXT;
 }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 6/9] target/s390x: Copy wout_x1 to wout_x1_P
  2022-10-21  7:29 [PATCH 0/9] target/s390x: Use Int128 for float128 and retxl Richard Henderson
                   ` (4 preceding siblings ...)
  2022-10-21  7:30 ` [PATCH 5/9] target/s390x: Use Int128 for return from TRE Richard Henderson
@ 2022-10-21  7:30 ` Richard Henderson
  2022-10-27 10:57   ` Ilya Leoshkevich
  2022-10-21  7:30 ` [PATCH 7/9] tests/tcg/s390x: Add long-double.c Richard Henderson
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2022-10-21  7:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-s390x

Make a copy of wout_x1 before modifying it, as wout_x1_P
emphasizing that it operates on the out/out2 pair.  The insns
that use x1_P are data movement that will not change to Int128.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/s390x/tcg/translate.c   |  8 ++++++++
 target/s390x/tcg/insn-data.def | 12 ++++++------
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 8dc4f4b5ec..15c1a2067e 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -5518,6 +5518,14 @@ static void wout_x1(DisasContext *s, DisasOps *o)
 }
 #define SPEC_wout_x1 SPEC_r1_f128
 
+static void wout_x1_P(DisasContext *s, DisasOps *o)
+{
+    int f1 = get_field(s, r1);
+    store_freg(f1, o->out);
+    store_freg(f1 + 2, o->out2);
+}
+#define SPEC_wout_x1_P SPEC_r1_f128
+
 static void wout_cond_r1r2_32(DisasContext *s, DisasOps *o)
 {
     if (get_field(s, r1) != get_field(s, r2)) {
diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
index 6382ceabfc..83e246ab16 100644
--- a/target/s390x/tcg/insn-data.def
+++ b/target/s390x/tcg/insn-data.def
@@ -422,7 +422,7 @@
     F(0x3800, LER,     RR_a,  Z,   0, e2, 0, cond_e1e2, mov2, 0, IF_AFP1 | IF_AFP2)
     F(0x7800, LE,      RX_a,  Z,   0, m2_32u, 0, e1, mov2, 0, IF_AFP1)
     F(0xed64, LEY,     RXY_a, LD,  0, m2_32u, 0, e1, mov2, 0, IF_AFP1)
-    F(0xb365, LXR,     RRE,   Z,   x2h, x2l, 0, x1, movx, 0, IF_AFP1)
+    F(0xb365, LXR,     RRE,   Z,   x2h, x2l, 0, x1_P, movx, 0, IF_AFP1)
 /* LOAD IMMEDIATE */
     C(0xc001, LGFI,    RIL_a, EI,  0, i2, 0, r1, mov2, 0)
 /* LOAD RELATIVE LONG */
@@ -461,7 +461,7 @@
     C(0xe332, LTGF,    RXY_a, GIE, 0, a2, r1, 0, ld32s, s64)
     F(0xb302, LTEBR,   RRE,   Z,   0, e2, 0, cond_e1e2, mov2, f32, IF_BFP)
     F(0xb312, LTDBR,   RRE,   Z,   0, f2, 0, f1, mov2, f64, IF_BFP)
-    F(0xb342, LTXBR,   RRE,   Z,   x2h, x2l, 0, x1, movx, f128, IF_BFP)
+    F(0xb342, LTXBR,   RRE,   Z,   x2h, x2l, 0, x1_P, movx, f128, IF_BFP)
 /* LOAD AND TRAP */
     C(0xe39f, LAT,     RXY_a, LAT, 0, m2_32u, r1, 0, lat, 0)
     C(0xe385, LGAT,    RXY_a, LAT, 0, a2, r1, 0, lgat, 0)
@@ -483,7 +483,7 @@
     C(0xb913, LCGFR,   RRE,   Z,   0, r2_32s, r1, 0, neg, neg64)
     F(0xb303, LCEBR,   RRE,   Z,   0, e2, new, e1, negf32, f32, IF_BFP)
     F(0xb313, LCDBR,   RRE,   Z,   0, f2, new, f1, negf64, f64, IF_BFP)
-    F(0xb343, LCXBR,   RRE,   Z,   x2h, x2l, new_P, x1, negf128, f128, IF_BFP)
+    F(0xb343, LCXBR,   RRE,   Z,   x2h, x2l, new_P, x1_P, negf128, f128, IF_BFP)
     F(0xb373, LCDFR,   RRE,   FPSSH, 0, f2, new, f1, negf64, 0, IF_AFP1 | IF_AFP2)
 /* LOAD COUNT TO BLOCK BOUNDARY */
     C(0xe727, LCBB,    RXE,   V,   la2, 0, r1, 0, lcbb, 0)
@@ -552,7 +552,7 @@
     C(0xb911, LNGFR,   RRE,   Z,   0, r2_32s, r1, 0, nabs, nabs64)
     F(0xb301, LNEBR,   RRE,   Z,   0, e2, new, e1, nabsf32, f32, IF_BFP)
     F(0xb311, LNDBR,   RRE,   Z,   0, f2, new, f1, nabsf64, f64, IF_BFP)
-    F(0xb341, LNXBR,   RRE,   Z,   x2h, x2l, new_P, x1, nabsf128, f128, IF_BFP)
+    F(0xb341, LNXBR,   RRE,   Z,   x2h, x2l, new_P, x1_P, nabsf128, f128, IF_BFP)
     F(0xb371, LNDFR,   RRE,   FPSSH, 0, f2, new, f1, nabsf64, 0, IF_AFP1 | IF_AFP2)
 /* LOAD ON CONDITION */
     C(0xb9f2, LOCR,    RRF_c, LOC, r1, r2, new, r1_32, loc, 0)
@@ -577,7 +577,7 @@
     C(0xb910, LPGFR,   RRE,   Z,   0, r2_32s, r1, 0, abs, abs64)
     F(0xb300, LPEBR,   RRE,   Z,   0, e2, new, e1, absf32, f32, IF_BFP)
     F(0xb310, LPDBR,   RRE,   Z,   0, f2, new, f1, absf64, f64, IF_BFP)
-    F(0xb340, LPXBR,   RRE,   Z,   x2h, x2l, new_P, x1, absf128, f128, IF_BFP)
+    F(0xb340, LPXBR,   RRE,   Z,   x2h, x2l, new_P, x1_P, absf128, f128, IF_BFP)
     F(0xb370, LPDFR,   RRE,   FPSSH, 0, f2, new, f1, absf64, 0, IF_AFP1 | IF_AFP2)
 /* LOAD REVERSED */
     C(0xb91f, LRVR,    RRE,   Z,   0, r2_32u, new, r1_32, rev32, 0)
@@ -588,7 +588,7 @@
 /* LOAD ZERO */
     F(0xb374, LZER,    RRE,   Z,   0, 0, 0, e1, zero, 0, IF_AFP1)
     F(0xb375, LZDR,    RRE,   Z,   0, 0, 0, f1, zero, 0, IF_AFP1)
-    F(0xb376, LZXR,    RRE,   Z,   0, 0, 0, x1, zero2, 0, IF_AFP1)
+    F(0xb376, LZXR,    RRE,   Z,   0, 0, 0, x1_P, zero2, 0, IF_AFP1)
 
 /* LOAD FPC */
     F(0xb29d, LFPC,    S,     Z,   0, m2_32u, 0, 0, sfpc, 0, IF_BFP)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 7/9] tests/tcg/s390x: Add long-double.c
  2022-10-21  7:29 [PATCH 0/9] target/s390x: Use Int128 for float128 and retxl Richard Henderson
                   ` (5 preceding siblings ...)
  2022-10-21  7:30 ` [PATCH 6/9] target/s390x: Copy wout_x1 to wout_x1_P Richard Henderson
@ 2022-10-21  7:30 ` Richard Henderson
  2022-10-24 13:19   ` Philippe Mathieu-Daudé
  2022-10-27 11:04   ` Ilya Leoshkevich
  2022-10-21  7:30 ` [PATCH 8/9] target/s390x: Use Int128 for returning float128 Richard Henderson
  2022-10-21  7:30 ` [PATCH 9/9] target/s390x: Use Int128 for passing float128 Richard Henderson
  8 siblings, 2 replies; 35+ messages in thread
From: Richard Henderson @ 2022-10-21  7:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-s390x

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tests/tcg/s390x/long-double.c   | 24 ++++++++++++++++++++++++
 tests/tcg/s390x/Makefile.target |  1 +
 2 files changed, 25 insertions(+)
 create mode 100644 tests/tcg/s390x/long-double.c

diff --git a/tests/tcg/s390x/long-double.c b/tests/tcg/s390x/long-double.c
new file mode 100644
index 0000000000..757a6262fd
--- /dev/null
+++ b/tests/tcg/s390x/long-double.c
@@ -0,0 +1,24 @@
+/*
+ * Perform some basic arithmetic with long double, as a sanity check.
+ * With small integral numbers, we can cross-check with integers.
+ */
+
+#include <assert.h>
+
+int main()
+{
+    int i, j;
+
+    for (i = 1; i < 5; i++) {
+        for (j = 1; j < 5; j++) {
+            long double la = (long double)i + j;
+            long double lm = (long double)i * j;
+            long double ls = (long double)i - j;
+
+            assert(la == i + j);
+            assert(lm == i * j);
+            assert(ls == i - j);
+        }
+    }
+    return 0;
+}
diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index c830313e67..627668e1ce 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -17,6 +17,7 @@ TESTS+=trap
 TESTS+=signals-s390x
 TESTS+=branch-relative-long
 TESTS+=noexec
+TESTS+=long-double
 
 Z14_TESTS=vfminmax
 vfminmax: LDFLAGS+=-lm
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 8/9] target/s390x: Use Int128 for returning float128
  2022-10-21  7:29 [PATCH 0/9] target/s390x: Use Int128 for float128 and retxl Richard Henderson
                   ` (6 preceding siblings ...)
  2022-10-21  7:30 ` [PATCH 7/9] tests/tcg/s390x: Add long-double.c Richard Henderson
@ 2022-10-21  7:30 ` Richard Henderson
  2022-10-25  9:58   ` Philippe Mathieu-Daudé
  2022-10-27 11:18   ` Ilya Leoshkevich
  2022-10-21  7:30 ` [PATCH 9/9] target/s390x: Use Int128 for passing float128 Richard Henderson
  8 siblings, 2 replies; 35+ messages in thread
From: Richard Henderson @ 2022-10-21  7:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-s390x

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/s390x/helper.h          | 22 +++++++--------
 target/s390x/tcg/fpu_helper.c  | 29 ++++++++++----------
 target/s390x/tcg/translate.c   | 49 +++++++++++++++++++---------------
 target/s390x/tcg/insn-data.def | 20 +++++++-------
 4 files changed, 63 insertions(+), 57 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index aaea177246..429131a85e 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -31,32 +31,32 @@ DEF_HELPER_4(clcle, i32, env, i32, i64, i32)
 DEF_HELPER_4(clclu, i32, env, i32, i64, i32)
 DEF_HELPER_3(cegb, i64, env, s64, i32)
 DEF_HELPER_3(cdgb, i64, env, s64, i32)
-DEF_HELPER_3(cxgb, i64, env, s64, i32)
+DEF_HELPER_3(cxgb, i128, env, s64, i32)
 DEF_HELPER_3(celgb, i64, env, i64, i32)
 DEF_HELPER_3(cdlgb, i64, env, i64, i32)
-DEF_HELPER_3(cxlgb, i64, env, i64, i32)
+DEF_HELPER_3(cxlgb, i128, env, i64, i32)
 DEF_HELPER_4(cdsg, void, env, i64, i32, i32)
 DEF_HELPER_4(cdsg_parallel, void, env, i64, i32, i32)
 DEF_HELPER_4(csst, i32, env, i32, i64, i64)
 DEF_HELPER_4(csst_parallel, i32, env, i32, i64, i64)
 DEF_HELPER_FLAGS_3(aeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(adb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(axb, TCG_CALL_NO_WG, i64, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_5(axb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
 DEF_HELPER_FLAGS_3(seb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(sdb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(sxb, TCG_CALL_NO_WG, i64, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_5(sxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
 DEF_HELPER_FLAGS_3(deb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(ddb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(dxb, TCG_CALL_NO_WG, i64, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_5(dxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
 DEF_HELPER_FLAGS_3(meeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(mdeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(mdb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(mxb, TCG_CALL_NO_WG, i64, env, i64, i64, i64, i64)
-DEF_HELPER_FLAGS_4(mxdb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
+DEF_HELPER_FLAGS_5(mxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_4(mxdb, TCG_CALL_NO_WG, i128, env, i64, i64, i64)
 DEF_HELPER_FLAGS_2(ldeb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_4(ldxb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
-DEF_HELPER_FLAGS_2(lxdb, TCG_CALL_NO_WG, i64, env, i64)
-DEF_HELPER_FLAGS_2(lxeb, TCG_CALL_NO_WG, i64, env, i64)
+DEF_HELPER_FLAGS_2(lxdb, TCG_CALL_NO_WG, i128, env, i64)
+DEF_HELPER_FLAGS_2(lxeb, TCG_CALL_NO_WG, i128, env, i64)
 DEF_HELPER_FLAGS_3(ledb, TCG_CALL_NO_WG, i64, env, i64, i32)
 DEF_HELPER_FLAGS_4(lexb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
 DEF_HELPER_FLAGS_3(ceb, TCG_CALL_NO_WG_SE, i32, env, i64, i64)
@@ -79,7 +79,7 @@ DEF_HELPER_3(clfdb, i64, env, i64, i32)
 DEF_HELPER_4(clfxb, i64, env, i64, i64, i32)
 DEF_HELPER_FLAGS_3(fieb, TCG_CALL_NO_WG, i64, env, i64, i32)
 DEF_HELPER_FLAGS_3(fidb, TCG_CALL_NO_WG, i64, env, i64, i32)
-DEF_HELPER_FLAGS_4(fixb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
+DEF_HELPER_FLAGS_4(fixb, TCG_CALL_NO_WG, i128, env, i64, i64, i32)
 DEF_HELPER_FLAGS_4(maeb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(madb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(mseb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
@@ -89,7 +89,7 @@ DEF_HELPER_FLAGS_3(tcdb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
 DEF_HELPER_FLAGS_4(tcxb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64, i64)
 DEF_HELPER_FLAGS_2(sqeb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_2(sqdb, TCG_CALL_NO_WG, i64, env, i64)
-DEF_HELPER_FLAGS_3(sqxb, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(sqxb, TCG_CALL_NO_WG, i128, env, i64, i64)
 DEF_HELPER_FLAGS_1(cvd, TCG_CALL_NO_RWG_SE, i64, s32)
 DEF_HELPER_FLAGS_4(pack, TCG_CALL_NO_WG, void, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(pka, TCG_CALL_NO_WG, void, env, i64, i64, i32)
diff --git a/target/s390x/tcg/fpu_helper.c b/target/s390x/tcg/fpu_helper.c
index 4067205405..a584794be6 100644
--- a/target/s390x/tcg/fpu_helper.c
+++ b/target/s390x/tcg/fpu_helper.c
@@ -34,7 +34,10 @@
 #define HELPER_LOG(x...)
 #endif
 
-#define RET128(F) (env->retxl = F.low, F.high)
+static inline Int128 RET128(float128 f)
+{
+    return int128_make128(f.low, f.high);
+}
 
 uint8_t s390_softfloat_exc_to_ieee(unsigned int exc)
 {
@@ -224,7 +227,7 @@ uint64_t HELPER(adb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
 }
 
 /* 128-bit FP addition */
-uint64_t HELPER(axb)(CPUS390XState *env, uint64_t ah, uint64_t al,
+Int128 HELPER(axb)(CPUS390XState *env, uint64_t ah, uint64_t al,
                      uint64_t bh, uint64_t bl)
 {
     float128 ret = float128_add(make_float128(ah, al),
@@ -251,7 +254,7 @@ uint64_t HELPER(sdb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
 }
 
 /* 128-bit FP subtraction */
-uint64_t HELPER(sxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
+Int128 HELPER(sxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
                      uint64_t bh, uint64_t bl)
 {
     float128 ret = float128_sub(make_float128(ah, al),
@@ -278,7 +281,7 @@ uint64_t HELPER(ddb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
 }
 
 /* 128-bit FP division */
-uint64_t HELPER(dxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
+Int128 HELPER(dxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
                      uint64_t bh, uint64_t bl)
 {
     float128 ret = float128_div(make_float128(ah, al),
@@ -314,7 +317,7 @@ uint64_t HELPER(mdeb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
 }
 
 /* 128-bit FP multiplication */
-uint64_t HELPER(mxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
+Int128 HELPER(mxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
                      uint64_t bh, uint64_t bl)
 {
     float128 ret = float128_mul(make_float128(ah, al),
@@ -325,8 +328,7 @@ uint64_t HELPER(mxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
 }
 
 /* 128/64-bit FP multiplication */
-uint64_t HELPER(mxdb)(CPUS390XState *env, uint64_t ah, uint64_t al,
-                      uint64_t f2)
+Int128 HELPER(mxdb)(CPUS390XState *env, uint64_t ah, uint64_t al, uint64_t f2)
 {
     float128 ret = float64_to_float128(f2, &env->fpu_status);
     ret = float128_mul(make_float128(ah, al), ret, &env->fpu_status);
@@ -355,7 +357,7 @@ uint64_t HELPER(ldxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
 }
 
 /* convert 64-bit float to 128-bit float */
-uint64_t HELPER(lxdb)(CPUS390XState *env, uint64_t f2)
+Int128 HELPER(lxdb)(CPUS390XState *env, uint64_t f2)
 {
     float128 ret = float64_to_float128(f2, &env->fpu_status);
     handle_exceptions(env, false, GETPC());
@@ -363,7 +365,7 @@ uint64_t HELPER(lxdb)(CPUS390XState *env, uint64_t f2)
 }
 
 /* convert 32-bit float to 128-bit float */
-uint64_t HELPER(lxeb)(CPUS390XState *env, uint64_t f2)
+Int128 HELPER(lxeb)(CPUS390XState *env, uint64_t f2)
 {
     float128 ret = float32_to_float128(f2, &env->fpu_status);
     handle_exceptions(env, false, GETPC());
@@ -486,7 +488,7 @@ uint64_t HELPER(cdgb)(CPUS390XState *env, int64_t v2, uint32_t m34)
 }
 
 /* convert 64-bit int to 128-bit float */
-uint64_t HELPER(cxgb)(CPUS390XState *env, int64_t v2, uint32_t m34)
+Int128 HELPER(cxgb)(CPUS390XState *env, int64_t v2, uint32_t m34)
 {
     int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
     float128 ret = int64_to_float128(v2, &env->fpu_status);
@@ -519,7 +521,7 @@ uint64_t HELPER(cdlgb)(CPUS390XState *env, uint64_t v2, uint32_t m34)
 }
 
 /* convert 64-bit uint to 128-bit float */
-uint64_t HELPER(cxlgb)(CPUS390XState *env, uint64_t v2, uint32_t m34)
+Int128 HELPER(cxlgb)(CPUS390XState *env, uint64_t v2, uint32_t m34)
 {
     int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
     float128 ret = uint64_to_float128(v2, &env->fpu_status);
@@ -748,8 +750,7 @@ uint64_t HELPER(fidb)(CPUS390XState *env, uint64_t f2, uint32_t m34)
 }
 
 /* round to integer 128-bit */
-uint64_t HELPER(fixb)(CPUS390XState *env, uint64_t ah, uint64_t al,
-                      uint32_t m34)
+Int128 HELPER(fixb)(CPUS390XState *env, uint64_t ah, uint64_t al, uint32_t m34)
 {
     int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
     float128 ret = float128_round_to_int(make_float128(ah, al),
@@ -890,7 +891,7 @@ uint64_t HELPER(sqdb)(CPUS390XState *env, uint64_t f2)
 }
 
 /* square root 128-bit */
-uint64_t HELPER(sqxb)(CPUS390XState *env, uint64_t ah, uint64_t al)
+Int128 HELPER(sqxb)(CPUS390XState *env, uint64_t ah, uint64_t al)
 {
     float128 ret = float128_sqrt(make_float128(ah, al), &env->fpu_status);
     handle_exceptions(env, false, GETPC());
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 15c1a2067e..d1ffbb8710 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -1103,6 +1103,7 @@ typedef struct {
     bool g_out, g_out2, g_in1, g_in2;
     TCGv_i64 out, out2, in1, in2;
     TCGv_i64 addr1;
+    TCGv_i128 out_128;
 } DisasOps;
 
 /* Instructions can place constraints on their operands, raising specification
@@ -1461,8 +1462,7 @@ static DisasJumpType op_adb(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_axb(DisasContext *s, DisasOps *o)
 {
-    gen_helper_axb(o->out, cpu_env, o->out, o->out2, o->in1, o->in2);
-    return_low128(o->out2);
+    gen_helper_axb(o->out_128, cpu_env, o->out, o->out2, o->in1, o->in2);
     return DISAS_NEXT;
 }
 
@@ -1995,9 +1995,8 @@ static DisasJumpType op_cxgb(DisasContext *s, DisasOps *o)
     if (!m34) {
         return DISAS_NORETURN;
     }
-    gen_helper_cxgb(o->out, cpu_env, o->in2, m34);
+    gen_helper_cxgb(o->out_128, cpu_env, o->in2, m34);
     tcg_temp_free_i32(m34);
-    return_low128(o->out2);
     return DISAS_NEXT;
 }
 
@@ -2032,7 +2031,7 @@ static DisasJumpType op_cxlgb(DisasContext *s, DisasOps *o)
     if (!m34) {
         return DISAS_NORETURN;
     }
-    gen_helper_cxlgb(o->out, cpu_env, o->in2, m34);
+    gen_helper_cxlgb(o->out_128, cpu_env, o->in2, m34);
     tcg_temp_free_i32(m34);
     return_low128(o->out2);
     return DISAS_NEXT;
@@ -2449,8 +2448,7 @@ static DisasJumpType op_ddb(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_dxb(DisasContext *s, DisasOps *o)
 {
-    gen_helper_dxb(o->out, cpu_env, o->out, o->out2, o->in1, o->in2);
-    return_low128(o->out2);
+    gen_helper_dxb(o->out_128, cpu_env, o->out, o->out2, o->in1, o->in2);
     return DISAS_NEXT;
 }
 
@@ -2555,8 +2553,7 @@ static DisasJumpType op_fixb(DisasContext *s, DisasOps *o)
     if (!m34) {
         return DISAS_NORETURN;
     }
-    gen_helper_fixb(o->out, cpu_env, o->in1, o->in2, m34);
-    return_low128(o->out2);
+    gen_helper_fixb(o->out_128, cpu_env, o->in1, o->in2, m34);
     tcg_temp_free_i32(m34);
     return DISAS_NEXT;
 }
@@ -2868,14 +2865,13 @@ static DisasJumpType op_lexb(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_lxdb(DisasContext *s, DisasOps *o)
 {
-    gen_helper_lxdb(o->out, cpu_env, o->in2);
-    return_low128(o->out2);
+    gen_helper_lxdb(o->out_128, cpu_env, o->in2);
     return DISAS_NEXT;
 }
 
 static DisasJumpType op_lxeb(DisasContext *s, DisasOps *o)
 {
-    gen_helper_lxeb(o->out, cpu_env, o->in2);
+    gen_helper_lxeb(o->out_128, cpu_env, o->in2);
     return_low128(o->out2);
     return DISAS_NEXT;
 }
@@ -3590,15 +3586,13 @@ static DisasJumpType op_mdb(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_mxb(DisasContext *s, DisasOps *o)
 {
-    gen_helper_mxb(o->out, cpu_env, o->out, o->out2, o->in1, o->in2);
-    return_low128(o->out2);
+    gen_helper_mxb(o->out_128, cpu_env, o->out, o->out2, o->in1, o->in2);
     return DISAS_NEXT;
 }
 
 static DisasJumpType op_mxdb(DisasContext *s, DisasOps *o)
 {
-    gen_helper_mxdb(o->out, cpu_env, o->out, o->out2, o->in2);
-    return_low128(o->out2);
+    gen_helper_mxdb(o->out_128, cpu_env, o->out, o->out2, o->in2);
     return DISAS_NEXT;
 }
 
@@ -4063,8 +4057,7 @@ static DisasJumpType op_sdb(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_sxb(DisasContext *s, DisasOps *o)
 {
-    gen_helper_sxb(o->out, cpu_env, o->out, o->out2, o->in1, o->in2);
-    return_low128(o->out2);
+    gen_helper_sxb(o->out_128, cpu_env, o->out, o->out2, o->in1, o->in2);
     return DISAS_NEXT;
 }
 
@@ -4082,8 +4075,7 @@ static DisasJumpType op_sqdb(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_sqxb(DisasContext *s, DisasOps *o)
 {
-    gen_helper_sqxb(o->out, cpu_env, o->in1, o->in2);
-    return_low128(o->out2);
+    gen_helper_sqxb(o->out_128, cpu_env, o->in1, o->in2);
     return DISAS_NEXT;
 }
 
@@ -5395,6 +5387,14 @@ static void prep_new_P(DisasContext *s, DisasOps *o)
 }
 #define SPEC_prep_new_P 0
 
+static void prep_new_x(DisasContext *s, DisasOps *o)
+{
+    o->out = tcg_temp_new_i64();
+    o->out2 = tcg_temp_new_i64();
+    o->out_128 = tcg_temp_new_i128();
+}
+#define SPEC_prep_new_x 0
+
 static void prep_r1(DisasContext *s, DisasOps *o)
 {
     o->out = regs[get_field(s, r1)];
@@ -5411,11 +5411,12 @@ static void prep_r1_P(DisasContext *s, DisasOps *o)
 }
 #define SPEC_prep_r1_P SPEC_r1_even
 
-/* Whenever we need x1 in addition to other inputs, we'll load it to out/out2 */
 static void prep_x1(DisasContext *s, DisasOps *o)
 {
     o->out = load_freg(get_field(s, r1));
     o->out2 = load_freg(get_field(s, r1) + 2);
+    o->out_128 = tcg_temp_new_i128();
+    tcg_gen_concat_i64_i128(o->out_128, o->out2, o->out);
 }
 #define SPEC_prep_x1 SPEC_r1_f128
 
@@ -5513,6 +5514,8 @@ static void wout_f1(DisasContext *s, DisasOps *o)
 static void wout_x1(DisasContext *s, DisasOps *o)
 {
     int f1 = get_field(s, r1);
+
+    tcg_gen_extr_i128_i64(o->out2, o->out, o->out_128);
     store_freg(f1, o->out);
     store_freg(f1 + 2, o->out2);
 }
@@ -6582,7 +6585,9 @@ static DisasJumpType translate_one(CPUS390XState *env, DisasContext *s)
     if (o.addr1) {
         tcg_temp_free_i64(o.addr1);
     }
-
+    if (o.out_128) {
+        tcg_temp_free_i128(o.out_128);
+    }
     /* io should be the last instruction in tb when icount is enabled */
     if (unlikely(icount && ret == DISAS_NEXT)) {
         ret = DISAS_TOO_MANY;
diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
index 83e246ab16..20bf20c766 100644
--- a/target/s390x/tcg/insn-data.def
+++ b/target/s390x/tcg/insn-data.def
@@ -306,10 +306,10 @@
 /* CONVERT FROM FIXED */
     F(0xb394, CEFBR,   RRF_e, Z,   0, r2_32s, new, e1, cegb, 0, IF_BFP)
     F(0xb395, CDFBR,   RRF_e, Z,   0, r2_32s, new, f1, cdgb, 0, IF_BFP)
-    F(0xb396, CXFBR,   RRF_e, Z,   0, r2_32s, new_P, x1, cxgb, 0, IF_BFP)
+    F(0xb396, CXFBR,   RRF_e, Z,   0, r2_32s, new_x, x1, cxgb, 0, IF_BFP)
     F(0xb3a4, CEGBR,   RRF_e, Z,   0, r2_o, new, e1, cegb, 0, IF_BFP)
     F(0xb3a5, CDGBR,   RRF_e, Z,   0, r2_o, new, f1, cdgb, 0, IF_BFP)
-    F(0xb3a6, CXGBR,   RRF_e, Z,   0, r2_o, new_P, x1, cxgb, 0, IF_BFP)
+    F(0xb3a6, CXGBR,   RRF_e, Z,   0, r2_o, new_x, x1, cxgb, 0, IF_BFP)
 /* CONVERT TO LOGICAL */
     F(0xb39c, CLFEBR,  RRF_e, FPE, 0, e2, new, r1_32, clfeb, 0, IF_BFP)
     F(0xb39d, CLFDBR,  RRF_e, FPE, 0, f2, new, r1_32, clfdb, 0, IF_BFP)
@@ -320,10 +320,10 @@
 /* CONVERT FROM LOGICAL */
     F(0xb390, CELFBR,  RRF_e, FPE, 0, r2_32u, new, e1, celgb, 0, IF_BFP)
     F(0xb391, CDLFBR,  RRF_e, FPE, 0, r2_32u, new, f1, cdlgb, 0, IF_BFP)
-    F(0xb392, CXLFBR,  RRF_e, FPE, 0, r2_32u, new_P, x1, cxlgb, 0, IF_BFP)
+    F(0xb392, CXLFBR,  RRF_e, FPE, 0, r2_32u, new_x, x1, cxlgb, 0, IF_BFP)
     F(0xb3a0, CELGBR,  RRF_e, FPE, 0, r2_o, new, e1, celgb, 0, IF_BFP)
     F(0xb3a1, CDLGBR,  RRF_e, FPE, 0, r2_o, new, f1, cdlgb, 0, IF_BFP)
-    F(0xb3a2, CXLGBR,  RRF_e, FPE, 0, r2_o, new_P, x1, cxlgb, 0, IF_BFP)
+    F(0xb3a2, CXLGBR,  RRF_e, FPE, 0, r2_o, new_x, x1, cxlgb, 0, IF_BFP)
 
 /* CONVERT UTF-8 TO UTF-16 */
     D(0xb2a7, CU12,    RRF_c, Z,   0, 0, 0, 0, cuXX, 0, 12)
@@ -597,15 +597,15 @@
 /* LOAD FP INTEGER */
     F(0xb357, FIEBR,   RRF_e, Z,   0, e2, new, e1, fieb, 0, IF_BFP)
     F(0xb35f, FIDBR,   RRF_e, Z,   0, f2, new, f1, fidb, 0, IF_BFP)
-    F(0xb347, FIXBR,   RRF_e, Z,   x2h, x2l, new_P, x1, fixb, 0, IF_BFP)
+    F(0xb347, FIXBR,   RRF_e, Z,   x2h, x2l, new_x, x1, fixb, 0, IF_BFP)
 
 /* LOAD LENGTHENED */
     F(0xb304, LDEBR,   RRE,   Z,   0, e2, new, f1, ldeb, 0, IF_BFP)
-    F(0xb305, LXDBR,   RRE,   Z,   0, f2, new_P, x1, lxdb, 0, IF_BFP)
-    F(0xb306, LXEBR,   RRE,   Z,   0, e2, new_P, x1, lxeb, 0, IF_BFP)
+    F(0xb305, LXDBR,   RRE,   Z,   0, f2, new_x, x1, lxdb, 0, IF_BFP)
+    F(0xb306, LXEBR,   RRE,   Z,   0, e2, new_x, x1, lxeb, 0, IF_BFP)
     F(0xed04, LDEB,    RXE,   Z,   0, m2_32u, new, f1, ldeb, 0, IF_BFP)
-    F(0xed05, LXDB,    RXE,   Z,   0, m2_64, new_P, x1, lxdb, 0, IF_BFP)
-    F(0xed06, LXEB,    RXE,   Z,   0, m2_32u, new_P, x1, lxeb, 0, IF_BFP)
+    F(0xed05, LXDB,    RXE,   Z,   0, m2_64, new_x, x1, lxdb, 0, IF_BFP)
+    F(0xed06, LXEB,    RXE,   Z,   0, m2_32u, new_x, x1, lxeb, 0, IF_BFP)
     F(0xb324, LDER,    RXE,   Z,   0, e2, new, f1, lde, 0, IF_AFP1)
     F(0xed24, LDE,     RXE,   Z,   0, m2_32u, new, f1, lde, 0, IF_AFP1)
 /* LOAD ROUNDED */
@@ -835,7 +835,7 @@
 /* SQUARE ROOT */
     F(0xb314, SQEBR,   RRE,   Z,   0, e2, new, e1, sqeb, 0, IF_BFP)
     F(0xb315, SQDBR,   RRE,   Z,   0, f2, new, f1, sqdb, 0, IF_BFP)
-    F(0xb316, SQXBR,   RRE,   Z,   x2h, x2l, new_P, x1, sqxb, 0, IF_BFP)
+    F(0xb316, SQXBR,   RRE,   Z,   x2h, x2l, new_x, x1, sqxb, 0, IF_BFP)
     F(0xed14, SQEB,    RXE,   Z,   0, m2_32u, new, e1, sqeb, 0, IF_BFP)
     F(0xed15, SQDB,    RXE,   Z,   0, m2_64, new, f1, sqdb, 0, IF_BFP)
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 9/9] target/s390x: Use Int128 for passing float128
  2022-10-21  7:29 [PATCH 0/9] target/s390x: Use Int128 for float128 and retxl Richard Henderson
                   ` (7 preceding siblings ...)
  2022-10-21  7:30 ` [PATCH 8/9] target/s390x: Use Int128 for returning float128 Richard Henderson
@ 2022-10-21  7:30 ` Richard Henderson
  2022-10-24 18:01   ` Philippe Mathieu-Daudé
  2022-11-02  9:38   ` Ilya Leoshkevich
  8 siblings, 2 replies; 35+ messages in thread
From: Richard Henderson @ 2022-10-21  7:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-s390x

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/s390x/helper.h          | 32 ++++++-------
 target/s390x/tcg/fpu_helper.c  | 88 ++++++++++++++--------------------
 target/s390x/tcg/translate.c   | 76 ++++++++++++++++++++---------
 target/s390x/tcg/insn-data.def | 30 ++++++------
 4 files changed, 121 insertions(+), 105 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 429131a85e..481b9019f9 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -41,55 +41,55 @@ DEF_HELPER_4(csst, i32, env, i32, i64, i64)
 DEF_HELPER_4(csst_parallel, i32, env, i32, i64, i64)
 DEF_HELPER_FLAGS_3(aeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(adb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(axb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_3(axb, TCG_CALL_NO_WG, i128, env, i128, i128)
 DEF_HELPER_FLAGS_3(seb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(sdb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(sxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_3(sxb, TCG_CALL_NO_WG, i128, env, i128, i128)
 DEF_HELPER_FLAGS_3(deb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(ddb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(dxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_3(dxb, TCG_CALL_NO_WG, i128, env, i128, i128)
 DEF_HELPER_FLAGS_3(meeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(mdeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(mdb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(mxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
-DEF_HELPER_FLAGS_4(mxdb, TCG_CALL_NO_WG, i128, env, i64, i64, i64)
+DEF_HELPER_FLAGS_3(mxb, TCG_CALL_NO_WG, i128, env, i128, i128)
+DEF_HELPER_FLAGS_3(mxdb, TCG_CALL_NO_WG, i128, env, i128, i64)
 DEF_HELPER_FLAGS_2(ldeb, TCG_CALL_NO_WG, i64, env, i64)
-DEF_HELPER_FLAGS_4(ldxb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
+DEF_HELPER_FLAGS_3(ldxb, TCG_CALL_NO_WG, i64, env, i128, i32)
 DEF_HELPER_FLAGS_2(lxdb, TCG_CALL_NO_WG, i128, env, i64)
 DEF_HELPER_FLAGS_2(lxeb, TCG_CALL_NO_WG, i128, env, i64)
 DEF_HELPER_FLAGS_3(ledb, TCG_CALL_NO_WG, i64, env, i64, i32)
-DEF_HELPER_FLAGS_4(lexb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
+DEF_HELPER_FLAGS_3(lexb, TCG_CALL_NO_WG, i64, env, i128, i32)
 DEF_HELPER_FLAGS_3(ceb, TCG_CALL_NO_WG_SE, i32, env, i64, i64)
 DEF_HELPER_FLAGS_3(cdb, TCG_CALL_NO_WG_SE, i32, env, i64, i64)
-DEF_HELPER_FLAGS_5(cxb, TCG_CALL_NO_WG_SE, i32, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_3(cxb, TCG_CALL_NO_WG_SE, i32, env, i128, i128)
 DEF_HELPER_FLAGS_3(keb, TCG_CALL_NO_WG, i32, env, i64, i64)
 DEF_HELPER_FLAGS_3(kdb, TCG_CALL_NO_WG, i32, env, i64, i64)
-DEF_HELPER_FLAGS_5(kxb, TCG_CALL_NO_WG, i32, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_3(kxb, TCG_CALL_NO_WG, i32, env, i128, i128)
 DEF_HELPER_3(cgeb, i64, env, i64, i32)
 DEF_HELPER_3(cgdb, i64, env, i64, i32)
-DEF_HELPER_4(cgxb, i64, env, i64, i64, i32)
+DEF_HELPER_3(cgxb, i64, env, i128, i32)
 DEF_HELPER_3(cfeb, i64, env, i64, i32)
 DEF_HELPER_3(cfdb, i64, env, i64, i32)
-DEF_HELPER_4(cfxb, i64, env, i64, i64, i32)
+DEF_HELPER_3(cfxb, i64, env, i128, i32)
 DEF_HELPER_3(clgeb, i64, env, i64, i32)
 DEF_HELPER_3(clgdb, i64, env, i64, i32)
-DEF_HELPER_4(clgxb, i64, env, i64, i64, i32)
+DEF_HELPER_3(clgxb, i64, env, i128, i32)
 DEF_HELPER_3(clfeb, i64, env, i64, i32)
 DEF_HELPER_3(clfdb, i64, env, i64, i32)
-DEF_HELPER_4(clfxb, i64, env, i64, i64, i32)
+DEF_HELPER_3(clfxb, i64, env, i128, i32)
 DEF_HELPER_FLAGS_3(fieb, TCG_CALL_NO_WG, i64, env, i64, i32)
 DEF_HELPER_FLAGS_3(fidb, TCG_CALL_NO_WG, i64, env, i64, i32)
-DEF_HELPER_FLAGS_4(fixb, TCG_CALL_NO_WG, i128, env, i64, i64, i32)
+DEF_HELPER_FLAGS_3(fixb, TCG_CALL_NO_WG, i128, env, i128, i32)
 DEF_HELPER_FLAGS_4(maeb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(madb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(mseb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(msdb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_3(tceb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
 DEF_HELPER_FLAGS_3(tcdb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
-DEF_HELPER_FLAGS_4(tcxb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64, i64)
+DEF_HELPER_FLAGS_3(tcxb, TCG_CALL_NO_RWG_SE, i32, env, i128, i64)
 DEF_HELPER_FLAGS_2(sqeb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_2(sqdb, TCG_CALL_NO_WG, i64, env, i64)
-DEF_HELPER_FLAGS_3(sqxb, TCG_CALL_NO_WG, i128, env, i64, i64)
+DEF_HELPER_FLAGS_2(sqxb, TCG_CALL_NO_WG, i128, env, i128)
 DEF_HELPER_FLAGS_1(cvd, TCG_CALL_NO_RWG_SE, i64, s32)
 DEF_HELPER_FLAGS_4(pack, TCG_CALL_NO_WG, void, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(pka, TCG_CALL_NO_WG, void, env, i64, i64, i32)
diff --git a/target/s390x/tcg/fpu_helper.c b/target/s390x/tcg/fpu_helper.c
index a584794be6..5a322e3f87 100644
--- a/target/s390x/tcg/fpu_helper.c
+++ b/target/s390x/tcg/fpu_helper.c
@@ -39,6 +39,11 @@ static inline Int128 RET128(float128 f)
     return int128_make128(f.low, f.high);
 }
 
+static inline float128 ARG128(Int128 i)
+{
+    return make_float128(int128_gethi(i), int128_getlo(i));
+}
+
 uint8_t s390_softfloat_exc_to_ieee(unsigned int exc)
 {
     uint8_t s390_exc = 0;
@@ -227,12 +232,9 @@ uint64_t HELPER(adb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
 }
 
 /* 128-bit FP addition */
-Int128 HELPER(axb)(CPUS390XState *env, uint64_t ah, uint64_t al,
-                     uint64_t bh, uint64_t bl)
+Int128 HELPER(axb)(CPUS390XState *env, Int128 a, Int128 b)
 {
-    float128 ret = float128_add(make_float128(ah, al),
-                                make_float128(bh, bl),
-                                &env->fpu_status);
+    float128 ret = float128_add(ARG128(a), ARG128(b), &env->fpu_status);
     handle_exceptions(env, false, GETPC());
     return RET128(ret);
 }
@@ -254,12 +256,9 @@ uint64_t HELPER(sdb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
 }
 
 /* 128-bit FP subtraction */
-Int128 HELPER(sxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
-                     uint64_t bh, uint64_t bl)
+Int128 HELPER(sxb)(CPUS390XState *env, Int128 a, Int128 b)
 {
-    float128 ret = float128_sub(make_float128(ah, al),
-                                make_float128(bh, bl),
-                                &env->fpu_status);
+    float128 ret = float128_sub(ARG128(a), ARG128(b), &env->fpu_status);
     handle_exceptions(env, false, GETPC());
     return RET128(ret);
 }
@@ -281,12 +280,9 @@ uint64_t HELPER(ddb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
 }
 
 /* 128-bit FP division */
-Int128 HELPER(dxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
-                     uint64_t bh, uint64_t bl)
+Int128 HELPER(dxb)(CPUS390XState *env, Int128 a, Int128 b)
 {
-    float128 ret = float128_div(make_float128(ah, al),
-                                make_float128(bh, bl),
-                                &env->fpu_status);
+    float128 ret = float128_div(ARG128(a), ARG128(b), &env->fpu_status);
     handle_exceptions(env, false, GETPC());
     return RET128(ret);
 }
@@ -317,21 +313,18 @@ uint64_t HELPER(mdeb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
 }
 
 /* 128-bit FP multiplication */
-Int128 HELPER(mxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
-                     uint64_t bh, uint64_t bl)
+Int128 HELPER(mxb)(CPUS390XState *env, Int128 a, Int128 b)
 {
-    float128 ret = float128_mul(make_float128(ah, al),
-                                make_float128(bh, bl),
-                                &env->fpu_status);
+    float128 ret = float128_mul(ARG128(a), ARG128(b), &env->fpu_status);
     handle_exceptions(env, false, GETPC());
     return RET128(ret);
 }
 
 /* 128/64-bit FP multiplication */
-Int128 HELPER(mxdb)(CPUS390XState *env, uint64_t ah, uint64_t al, uint64_t f2)
+Int128 HELPER(mxdb)(CPUS390XState *env, Int128 a, uint64_t f2)
 {
     float128 ret = float64_to_float128(f2, &env->fpu_status);
-    ret = float128_mul(make_float128(ah, al), ret, &env->fpu_status);
+    ret = float128_mul(ARG128(a), ret, &env->fpu_status);
     handle_exceptions(env, false, GETPC());
     return RET128(ret);
 }
@@ -345,11 +338,10 @@ uint64_t HELPER(ldeb)(CPUS390XState *env, uint64_t f2)
 }
 
 /* convert 128-bit float to 64-bit float */
-uint64_t HELPER(ldxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
-                      uint32_t m34)
+uint64_t HELPER(ldxb)(CPUS390XState *env, Int128 a, uint32_t m34)
 {
     int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
-    float64 ret = float128_to_float64(make_float128(ah, al), &env->fpu_status);
+    float64 ret = float128_to_float64(ARG128(a), &env->fpu_status);
 
     s390_restore_bfp_rounding_mode(env, old_mode);
     handle_exceptions(env, xxc_from_m34(m34), GETPC());
@@ -384,11 +376,10 @@ uint64_t HELPER(ledb)(CPUS390XState *env, uint64_t f2, uint32_t m34)
 }
 
 /* convert 128-bit float to 32-bit float */
-uint64_t HELPER(lexb)(CPUS390XState *env, uint64_t ah, uint64_t al,
-                      uint32_t m34)
+uint64_t HELPER(lexb)(CPUS390XState *env, Int128 a, uint32_t m34)
 {
     int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
-    float32 ret = float128_to_float32(make_float128(ah, al), &env->fpu_status);
+    float32 ret = float128_to_float32(ARG128(a), &env->fpu_status);
 
     s390_restore_bfp_rounding_mode(env, old_mode);
     handle_exceptions(env, xxc_from_m34(m34), GETPC());
@@ -412,11 +403,9 @@ uint32_t HELPER(cdb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
 }
 
 /* 128-bit FP compare */
-uint32_t HELPER(cxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
-                     uint64_t bh, uint64_t bl)
+uint32_t HELPER(cxb)(CPUS390XState *env, Int128 a, Int128 b)
 {
-    FloatRelation cmp = float128_compare_quiet(make_float128(ah, al),
-                                               make_float128(bh, bl),
+    FloatRelation cmp = float128_compare_quiet(ARG128(a), ARG128(b),
                                                &env->fpu_status);
     handle_exceptions(env, false, GETPC());
     return float_comp_to_cc(env, cmp);
@@ -564,10 +553,10 @@ uint64_t HELPER(cgdb)(CPUS390XState *env, uint64_t v2, uint32_t m34)
 }
 
 /* convert 128-bit float to 64-bit int */
-uint64_t HELPER(cgxb)(CPUS390XState *env, uint64_t h, uint64_t l, uint32_t m34)
+uint64_t HELPER(cgxb)(CPUS390XState *env, Int128 i2, uint32_t m34)
 {
     int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
-    float128 v2 = make_float128(h, l);
+    float128 v2 = ARG128(i2);
     int64_t ret = float128_to_int64(v2, &env->fpu_status);
     uint32_t cc = set_cc_conv_f128(v2, &env->fpu_status);
 
@@ -613,10 +602,10 @@ uint64_t HELPER(cfdb)(CPUS390XState *env, uint64_t v2, uint32_t m34)
 }
 
 /* convert 128-bit float to 32-bit int */
-uint64_t HELPER(cfxb)(CPUS390XState *env, uint64_t h, uint64_t l, uint32_t m34)
+uint64_t HELPER(cfxb)(CPUS390XState *env, Int128 i2, uint32_t m34)
 {
     int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
-    float128 v2 = make_float128(h, l);
+    float128 v2 = ARG128(i2);
     int32_t ret = float128_to_int32(v2, &env->fpu_status);
     uint32_t cc = set_cc_conv_f128(v2, &env->fpu_status);
 
@@ -662,10 +651,10 @@ uint64_t HELPER(clgdb)(CPUS390XState *env, uint64_t v2, uint32_t m34)
 }
 
 /* convert 128-bit float to 64-bit uint */
-uint64_t HELPER(clgxb)(CPUS390XState *env, uint64_t h, uint64_t l, uint32_t m34)
+uint64_t HELPER(clgxb)(CPUS390XState *env, Int128 i2, uint32_t m34)
 {
     int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
-    float128 v2 = make_float128(h, l);
+    float128 v2 = ARG128(i2);
     uint64_t ret = float128_to_uint64(v2, &env->fpu_status);
     uint32_t cc = set_cc_conv_f128(v2, &env->fpu_status);
 
@@ -711,10 +700,10 @@ uint64_t HELPER(clfdb)(CPUS390XState *env, uint64_t v2, uint32_t m34)
 }
 
 /* convert 128-bit float to 32-bit uint */
-uint64_t HELPER(clfxb)(CPUS390XState *env, uint64_t h, uint64_t l, uint32_t m34)
+uint64_t HELPER(clfxb)(CPUS390XState *env, Int128 i2, uint32_t m34)
 {
     int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
-    float128 v2 = make_float128(h, l);
+    float128 v2 = ARG128(i2);
     uint32_t ret = float128_to_uint32(v2, &env->fpu_status);
     uint32_t cc = set_cc_conv_f128(v2, &env->fpu_status);
 
@@ -750,11 +739,10 @@ uint64_t HELPER(fidb)(CPUS390XState *env, uint64_t f2, uint32_t m34)
 }
 
 /* round to integer 128-bit */
-Int128 HELPER(fixb)(CPUS390XState *env, uint64_t ah, uint64_t al, uint32_t m34)
+Int128 HELPER(fixb)(CPUS390XState *env, Int128 a, uint32_t m34)
 {
     int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
-    float128 ret = float128_round_to_int(make_float128(ah, al),
-                                         &env->fpu_status);
+    float128 ret = float128_round_to_int(ARG128(a), &env->fpu_status);
 
     s390_restore_bfp_rounding_mode(env, old_mode);
     handle_exceptions(env, xxc_from_m34(m34), GETPC());
@@ -778,11 +766,9 @@ uint32_t HELPER(kdb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
 }
 
 /* 128-bit FP compare and signal */
-uint32_t HELPER(kxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
-                     uint64_t bh, uint64_t bl)
+uint32_t HELPER(kxb)(CPUS390XState *env, Int128 a, Int128 b)
 {
-    FloatRelation cmp = float128_compare(make_float128(ah, al),
-                                         make_float128(bh, bl),
+    FloatRelation cmp = float128_compare(ARG128(a), ARG128(b),
                                          &env->fpu_status);
     handle_exceptions(env, false, GETPC());
     return float_comp_to_cc(env, cmp);
@@ -869,9 +855,9 @@ uint32_t HELPER(tcdb)(CPUS390XState *env, uint64_t v1, uint64_t m2)
 }
 
 /* test data class 128-bit */
-uint32_t HELPER(tcxb)(CPUS390XState *env, uint64_t ah, uint64_t al, uint64_t m2)
+uint32_t HELPER(tcxb)(CPUS390XState *env, Int128 a, uint64_t m2)
 {
-    return (m2 & float128_dcmask(env, make_float128(ah, al))) != 0;
+    return (m2 & float128_dcmask(env, ARG128(a))) != 0;
 }
 
 /* square root 32-bit */
@@ -891,9 +877,9 @@ uint64_t HELPER(sqdb)(CPUS390XState *env, uint64_t f2)
 }
 
 /* square root 128-bit */
-Int128 HELPER(sqxb)(CPUS390XState *env, uint64_t ah, uint64_t al)
+Int128 HELPER(sqxb)(CPUS390XState *env, Int128 a)
 {
-    float128 ret = float128_sqrt(make_float128(ah, al), &env->fpu_status);
+    float128 ret = float128_sqrt(ARG128(a), &env->fpu_status);
     handle_exceptions(env, false, GETPC());
     return RET128(ret);
 }
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index d1ffbb8710..8023bbab2f 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -305,6 +305,18 @@ static TCGv_i64 load_freg32_i64(int reg)
     return r;
 }
 
+static TCGv_i128 load_freg_128(int reg)
+{
+    TCGv_i64 h = load_freg(reg);
+    TCGv_i64 l = load_freg(reg + 2);
+    TCGv_i128 r = tcg_temp_new_i128();
+
+    tcg_gen_concat_i64_i128(r, l, h);
+    tcg_temp_free_i64(h);
+    tcg_temp_free_i64(l);
+    return r;
+}
+
 static void store_reg(int reg, TCGv_i64 v)
 {
     tcg_gen_mov_i64(regs[reg], v);
@@ -1103,7 +1115,7 @@ typedef struct {
     bool g_out, g_out2, g_in1, g_in2;
     TCGv_i64 out, out2, in1, in2;
     TCGv_i64 addr1;
-    TCGv_i128 out_128;
+    TCGv_i128 out_128, in1_128, in2_128;
 } DisasOps;
 
 /* Instructions can place constraints on their operands, raising specification
@@ -1462,7 +1474,7 @@ static DisasJumpType op_adb(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_axb(DisasContext *s, DisasOps *o)
 {
-    gen_helper_axb(o->out_128, cpu_env, o->out, o->out2, o->in1, o->in2);
+    gen_helper_axb(o->out_128, cpu_env, o->in1_128, o->in2_128);
     return DISAS_NEXT;
 }
 
@@ -1778,7 +1790,7 @@ static DisasJumpType op_cdb(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_cxb(DisasContext *s, DisasOps *o)
 {
-    gen_helper_cxb(cc_op, cpu_env, o->out, o->out2, o->in1, o->in2);
+    gen_helper_cxb(cc_op, cpu_env, o->in1_128, o->in2_128);
     set_cc_static(s);
     return DISAS_NEXT;
 }
@@ -1841,7 +1853,7 @@ static DisasJumpType op_cfxb(DisasContext *s, DisasOps *o)
     if (!m34) {
         return DISAS_NORETURN;
     }
-    gen_helper_cfxb(o->out, cpu_env, o->in1, o->in2, m34);
+    gen_helper_cfxb(o->out, cpu_env, o->in2_128, m34);
     tcg_temp_free_i32(m34);
     set_cc_static(s);
     return DISAS_NEXT;
@@ -1880,7 +1892,7 @@ static DisasJumpType op_cgxb(DisasContext *s, DisasOps *o)
     if (!m34) {
         return DISAS_NORETURN;
     }
-    gen_helper_cgxb(o->out, cpu_env, o->in1, o->in2, m34);
+    gen_helper_cgxb(o->out, cpu_env, o->in2_128, m34);
     tcg_temp_free_i32(m34);
     set_cc_static(s);
     return DISAS_NEXT;
@@ -1919,7 +1931,7 @@ static DisasJumpType op_clfxb(DisasContext *s, DisasOps *o)
     if (!m34) {
         return DISAS_NORETURN;
     }
-    gen_helper_clfxb(o->out, cpu_env, o->in1, o->in2, m34);
+    gen_helper_clfxb(o->out, cpu_env, o->in2_128, m34);
     tcg_temp_free_i32(m34);
     set_cc_static(s);
     return DISAS_NEXT;
@@ -1958,7 +1970,7 @@ static DisasJumpType op_clgxb(DisasContext *s, DisasOps *o)
     if (!m34) {
         return DISAS_NORETURN;
     }
-    gen_helper_clgxb(o->out, cpu_env, o->in1, o->in2, m34);
+    gen_helper_clgxb(o->out, cpu_env, o->in2_128, m34);
     tcg_temp_free_i32(m34);
     set_cc_static(s);
     return DISAS_NEXT;
@@ -2448,7 +2460,7 @@ static DisasJumpType op_ddb(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_dxb(DisasContext *s, DisasOps *o)
 {
-    gen_helper_dxb(o->out_128, cpu_env, o->out, o->out2, o->in1, o->in2);
+    gen_helper_dxb(o->out_128, cpu_env, o->in1_128, o->in2_128);
     return DISAS_NEXT;
 }
 
@@ -2553,7 +2565,7 @@ static DisasJumpType op_fixb(DisasContext *s, DisasOps *o)
     if (!m34) {
         return DISAS_NORETURN;
     }
-    gen_helper_fixb(o->out_128, cpu_env, o->in1, o->in2, m34);
+    gen_helper_fixb(o->out_128, cpu_env, o->in2_128, m34);
     tcg_temp_free_i32(m34);
     return DISAS_NEXT;
 }
@@ -2772,7 +2784,7 @@ static DisasJumpType op_kdb(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_kxb(DisasContext *s, DisasOps *o)
 {
-    gen_helper_kxb(cc_op, cpu_env, o->out, o->out2, o->in1, o->in2);
+    gen_helper_kxb(cc_op, cpu_env, o->in1_128, o->in2_128);
     set_cc_static(s);
     return DISAS_NEXT;
 }
@@ -2846,7 +2858,7 @@ static DisasJumpType op_ldxb(DisasContext *s, DisasOps *o)
     if (!m34) {
         return DISAS_NORETURN;
     }
-    gen_helper_ldxb(o->out, cpu_env, o->in1, o->in2, m34);
+    gen_helper_ldxb(o->out, cpu_env, o->in2_128, m34);
     tcg_temp_free_i32(m34);
     return DISAS_NEXT;
 }
@@ -2858,7 +2870,7 @@ static DisasJumpType op_lexb(DisasContext *s, DisasOps *o)
     if (!m34) {
         return DISAS_NORETURN;
     }
-    gen_helper_lexb(o->out, cpu_env, o->in1, o->in2, m34);
+    gen_helper_lexb(o->out, cpu_env, o->in2_128, m34);
     tcg_temp_free_i32(m34);
     return DISAS_NEXT;
 }
@@ -3586,13 +3598,13 @@ static DisasJumpType op_mdb(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_mxb(DisasContext *s, DisasOps *o)
 {
-    gen_helper_mxb(o->out_128, cpu_env, o->out, o->out2, o->in1, o->in2);
+    gen_helper_mxb(o->out_128, cpu_env, o->in1_128, o->in2_128);
     return DISAS_NEXT;
 }
 
 static DisasJumpType op_mxdb(DisasContext *s, DisasOps *o)
 {
-    gen_helper_mxdb(o->out_128, cpu_env, o->out, o->out2, o->in2);
+    gen_helper_mxdb(o->out_128, cpu_env, o->in1_128, o->in2);
     return DISAS_NEXT;
 }
 
@@ -4057,7 +4069,7 @@ static DisasJumpType op_sdb(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_sxb(DisasContext *s, DisasOps *o)
 {
-    gen_helper_sxb(o->out_128, cpu_env, o->out, o->out2, o->in1, o->in2);
+    gen_helper_sxb(o->out_128, cpu_env, o->in1_128, o->in2_128);
     return DISAS_NEXT;
 }
 
@@ -4075,7 +4087,7 @@ static DisasJumpType op_sqdb(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_sqxb(DisasContext *s, DisasOps *o)
 {
-    gen_helper_sqxb(o->out_128, cpu_env, o->in1, o->in2);
+    gen_helper_sqxb(o->out_128, cpu_env, o->in2_128);
     return DISAS_NEXT;
 }
 
@@ -4854,7 +4866,7 @@ static DisasJumpType op_tcdb(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_tcxb(DisasContext *s, DisasOps *o)
 {
-    gen_helper_tcxb(cc_op, cpu_env, o->out, o->out2, o->in2);
+    gen_helper_tcxb(cc_op, cpu_env, o->in1_128, o->in2);
     set_cc_static(s);
     return DISAS_NEXT;
 }
@@ -5389,8 +5401,6 @@ static void prep_new_P(DisasContext *s, DisasOps *o)
 
 static void prep_new_x(DisasContext *s, DisasOps *o)
 {
-    o->out = tcg_temp_new_i64();
-    o->out2 = tcg_temp_new_i64();
     o->out_128 = tcg_temp_new_i128();
 }
 #define SPEC_prep_new_x 0
@@ -5413,10 +5423,7 @@ static void prep_r1_P(DisasContext *s, DisasOps *o)
 
 static void prep_x1(DisasContext *s, DisasOps *o)
 {
-    o->out = load_freg(get_field(s, r1));
-    o->out2 = load_freg(get_field(s, r1) + 2);
-    o->out_128 = tcg_temp_new_i128();
-    tcg_gen_concat_i64_i128(o->out_128, o->out2, o->out);
+    o->out_128 = load_freg_128(get_field(s, r1));
 }
 #define SPEC_prep_x1 SPEC_r1_f128
 
@@ -5515,6 +5522,11 @@ static void wout_x1(DisasContext *s, DisasOps *o)
 {
     int f1 = get_field(s, r1);
 
+    /* Split out_128 into out+out2 for cout_f128. */
+    tcg_debug_assert(o->out == NULL);
+    o->out = tcg_temp_new_i64();
+    o->out2 = tcg_temp_new_i64();
+
     tcg_gen_extr_i128_i64(o->out2, o->out, o->out_128);
     store_freg(f1, o->out);
     store_freg(f1 + 2, o->out2);
@@ -5757,6 +5769,12 @@ static void in1_f1(DisasContext *s, DisasOps *o)
 }
 #define SPEC_in1_f1 0
 
+static void in1_x1(DisasContext *s, DisasOps *o)
+{
+    o->in1_128 = load_freg_128(get_field(s, r1));
+}
+#define SPEC_in1_x1 SPEC_r2_f128
+
 /* Load the high double word of an extended (128-bit) format FP number */
 static void in1_x2h(DisasContext *s, DisasOps *o)
 {
@@ -5966,6 +5984,12 @@ static void in2_f2(DisasContext *s, DisasOps *o)
 }
 #define SPEC_in2_f2 0
 
+static void in2_x2(DisasContext *s, DisasOps *o)
+{
+    o->in2_128 = load_freg_128(get_field(s, r2));
+}
+#define SPEC_in2_x2 SPEC_r2_f128
+
 /* Load the low double word of an extended (128-bit) format FP number */
 static void in2_x2l(DisasContext *s, DisasOps *o)
 {
@@ -6588,6 +6612,12 @@ static DisasJumpType translate_one(CPUS390XState *env, DisasContext *s)
     if (o.out_128) {
         tcg_temp_free_i128(o.out_128);
     }
+    if (o.in1_128) {
+        tcg_temp_free_i128(o.in1_128);
+    }
+    if (o.in2_128) {
+        tcg_temp_free_i128(o.in2_128);
+    }
     /* io should be the last instruction in tb when icount is enabled */
     if (unlikely(icount && ret == DISAS_NEXT)) {
         ret = DISAS_TOO_MANY;
diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
index 20bf20c766..26523746d6 100644
--- a/target/s390x/tcg/insn-data.def
+++ b/target/s390x/tcg/insn-data.def
@@ -34,7 +34,7 @@
     C(0xe318, AGF,     RXY_a, Z,   r1, m2_32s, r1, 0, add, adds64)
     F(0xb30a, AEBR,    RRE,   Z,   e1, e2, new, e1, aeb, f32, IF_BFP)
     F(0xb31a, ADBR,    RRE,   Z,   f1, f2, new, f1, adb, f64, IF_BFP)
-    F(0xb34a, AXBR,    RRE,   Z,   x2h, x2l, x1, x1, axb, f128, IF_BFP)
+    F(0xb34a, AXBR,    RRE,   Z,   x1, x2, new_x, x1, axb, f128, IF_BFP)
     F(0xed0a, AEB,     RXE,   Z,   e1, m2_32u, new, e1, aeb, f32, IF_BFP)
     F(0xed1a, ADB,     RXE,   Z,   f1, m2_64, new, f1, adb, f64, IF_BFP)
 /* ADD HIGH */
@@ -172,13 +172,13 @@
     C(0xe330, CGF,     RXY_a, Z,   r1_o, m2_32s, 0, 0, 0, cmps64)
     F(0xb309, CEBR,    RRE,   Z,   e1, e2, 0, 0, ceb, 0, IF_BFP)
     F(0xb319, CDBR,    RRE,   Z,   f1, f2, 0, 0, cdb, 0, IF_BFP)
-    F(0xb349, CXBR,    RRE,   Z,   x2h, x2l, x1, 0, cxb, 0, IF_BFP)
+    F(0xb349, CXBR,    RRE,   Z,   x1, x2, 0, 0, cxb, 0, IF_BFP)
     F(0xed09, CEB,     RXE,   Z,   e1, m2_32u, 0, 0, ceb, 0, IF_BFP)
     F(0xed19, CDB,     RXE,   Z,   f1, m2_64, 0, 0, cdb, 0, IF_BFP)
 /* COMPARE AND SIGNAL */
     F(0xb308, KEBR,    RRE,   Z,   e1, e2, 0, 0, keb, 0, IF_BFP)
     F(0xb318, KDBR,    RRE,   Z,   f1, f2, 0, 0, kdb, 0, IF_BFP)
-    F(0xb348, KXBR,    RRE,   Z,   x2h, x2l, x1, 0, kxb, 0, IF_BFP)
+    F(0xb348, KXBR,    RRE,   Z,   x1, x2, 0, 0, kxb, 0, IF_BFP)
     F(0xed08, KEB,     RXE,   Z,   e1, m2_32u, 0, 0, keb, 0, IF_BFP)
     F(0xed18, KDB,     RXE,   Z,   f1, m2_64, 0, 0, kdb, 0, IF_BFP)
 /* COMPARE IMMEDIATE */
@@ -299,10 +299,10 @@
 /* CONVERT TO FIXED */
     F(0xb398, CFEBR,   RRF_e, Z,   0, e2, new, r1_32, cfeb, 0, IF_BFP)
     F(0xb399, CFDBR,   RRF_e, Z,   0, f2, new, r1_32, cfdb, 0, IF_BFP)
-    F(0xb39a, CFXBR,   RRF_e, Z,   x2h, x2l, new, r1_32, cfxb, 0, IF_BFP)
+    F(0xb39a, CFXBR,   RRF_e, Z,   0, x2, new, r1_32, cfxb, 0, IF_BFP)
     F(0xb3a8, CGEBR,   RRF_e, Z,   0, e2, r1, 0, cgeb, 0, IF_BFP)
     F(0xb3a9, CGDBR,   RRF_e, Z,   0, f2, r1, 0, cgdb, 0, IF_BFP)
-    F(0xb3aa, CGXBR,   RRF_e, Z,   x2h, x2l, r1, 0, cgxb, 0, IF_BFP)
+    F(0xb3aa, CGXBR,   RRF_e, Z,   0, x2, r1, 0, cgxb, 0, IF_BFP)
 /* CONVERT FROM FIXED */
     F(0xb394, CEFBR,   RRF_e, Z,   0, r2_32s, new, e1, cegb, 0, IF_BFP)
     F(0xb395, CDFBR,   RRF_e, Z,   0, r2_32s, new, f1, cdgb, 0, IF_BFP)
@@ -313,10 +313,10 @@
 /* CONVERT TO LOGICAL */
     F(0xb39c, CLFEBR,  RRF_e, FPE, 0, e2, new, r1_32, clfeb, 0, IF_BFP)
     F(0xb39d, CLFDBR,  RRF_e, FPE, 0, f2, new, r1_32, clfdb, 0, IF_BFP)
-    F(0xb39e, CLFXBR,  RRF_e, FPE, x2h, x2l, new, r1_32, clfxb, 0, IF_BFP)
+    F(0xb39e, CLFXBR,  RRF_e, FPE, 0, x2, new, r1_32, clfxb, 0, IF_BFP)
     F(0xb3ac, CLGEBR,  RRF_e, FPE, 0, e2, r1, 0, clgeb, 0, IF_BFP)
     F(0xb3ad, CLGDBR,  RRF_e, FPE, 0, f2, r1, 0, clgdb, 0, IF_BFP)
-    F(0xb3ae, CLGXBR,  RRF_e, FPE, x2h, x2l, r1, 0, clgxb, 0, IF_BFP)
+    F(0xb3ae, CLGXBR,  RRF_e, FPE, 0, x2, r1, 0, clgxb, 0, IF_BFP)
 /* CONVERT FROM LOGICAL */
     F(0xb390, CELFBR,  RRF_e, FPE, 0, r2_32u, new, e1, celgb, 0, IF_BFP)
     F(0xb391, CDLFBR,  RRF_e, FPE, 0, r2_32u, new, f1, cdlgb, 0, IF_BFP)
@@ -343,7 +343,7 @@
     C(0x5d00, D,       RX_a,  Z,   r1_D32, m2_32s, new_P, r1_P32, divs32, 0)
     F(0xb30d, DEBR,    RRE,   Z,   e1, e2, new, e1, deb, 0, IF_BFP)
     F(0xb31d, DDBR,    RRE,   Z,   f1, f2, new, f1, ddb, 0, IF_BFP)
-    F(0xb34d, DXBR,    RRE,   Z,   x2h, x2l, x1, x1, dxb, 0, IF_BFP)
+    F(0xb34d, DXBR,    RRE,   Z,   x1, x2, new_x, x1, dxb, 0, IF_BFP)
     F(0xed0d, DEB,     RXE,   Z,   e1, m2_32u, new, e1, deb, 0, IF_BFP)
     F(0xed1d, DDB,     RXE,   Z,   f1, m2_64, new, f1, ddb, 0, IF_BFP)
 /* DIVIDE LOGICAL */
@@ -597,7 +597,7 @@
 /* LOAD FP INTEGER */
     F(0xb357, FIEBR,   RRF_e, Z,   0, e2, new, e1, fieb, 0, IF_BFP)
     F(0xb35f, FIDBR,   RRF_e, Z,   0, f2, new, f1, fidb, 0, IF_BFP)
-    F(0xb347, FIXBR,   RRF_e, Z,   x2h, x2l, new_x, x1, fixb, 0, IF_BFP)
+    F(0xb347, FIXBR,   RRF_e, Z,   0, x2, new_x, x1, fixb, 0, IF_BFP)
 
 /* LOAD LENGTHENED */
     F(0xb304, LDEBR,   RRE,   Z,   0, e2, new, f1, ldeb, 0, IF_BFP)
@@ -610,8 +610,8 @@
     F(0xed24, LDE,     RXE,   Z,   0, m2_32u, new, f1, lde, 0, IF_AFP1)
 /* LOAD ROUNDED */
     F(0xb344, LEDBR,   RRF_e, Z,   0, f2, new, e1, ledb, 0, IF_BFP)
-    F(0xb345, LDXBR,   RRF_e, Z,   x2h, x2l, new, f1, ldxb, 0, IF_BFP)
-    F(0xb346, LEXBR,   RRF_e, Z,   x2h, x2l, new, e1, lexb, 0, IF_BFP)
+    F(0xb345, LDXBR,   RRF_e, Z,   0, x2, new, f1, ldxb, 0, IF_BFP)
+    F(0xb346, LEXBR,   RRF_e, Z,   0, x2, new, e1, lexb, 0, IF_BFP)
 
 /* LOAD MULTIPLE */
     C(0x9800, LM,      RS_a,  Z,   0, a2, 0, 0, lm32, 0)
@@ -666,7 +666,7 @@
     C(0xe384, MG,      RXY_a, MIE2,r1p1_o, m2_64, r1_P, 0, muls128, 0)
     F(0xb317, MEEBR,   RRE,   Z,   e1, e2, new, e1, meeb, 0, IF_BFP)
     F(0xb31c, MDBR,    RRE,   Z,   f1, f2, new, f1, mdb, 0, IF_BFP)
-    F(0xb34c, MXBR,    RRE,   Z,   x2h, x2l, x1, x1, mxb, 0, IF_BFP)
+    F(0xb34c, MXBR,    RRE,   Z,   x1, x2, new_x, x1, mxb, 0, IF_BFP)
     F(0xb30c, MDEBR,   RRE,   Z,   f1, e2, new, f1, mdeb, 0, IF_BFP)
     F(0xb307, MXDBR,   RRE,   Z,   0, f2, x1, x1, mxdb, 0, IF_BFP)
     F(0xed17, MEEB,    RXE,   Z,   e1, m2_32u, new, e1, meeb, 0, IF_BFP)
@@ -835,7 +835,7 @@
 /* SQUARE ROOT */
     F(0xb314, SQEBR,   RRE,   Z,   0, e2, new, e1, sqeb, 0, IF_BFP)
     F(0xb315, SQDBR,   RRE,   Z,   0, f2, new, f1, sqdb, 0, IF_BFP)
-    F(0xb316, SQXBR,   RRE,   Z,   x2h, x2l, new_x, x1, sqxb, 0, IF_BFP)
+    F(0xb316, SQXBR,   RRE,   Z,   0, x2, new_x, x1, sqxb, 0, IF_BFP)
     F(0xed14, SQEB,    RXE,   Z,   0, m2_32u, new, e1, sqeb, 0, IF_BFP)
     F(0xed15, SQDB,    RXE,   Z,   0, m2_64, new, f1, sqdb, 0, IF_BFP)
 
@@ -913,7 +913,7 @@
     C(0xe319, SGF,     RXY_a, Z,   r1, m2_32s, r1, 0, sub, subs64)
     F(0xb30b, SEBR,    RRE,   Z,   e1, e2, new, e1, seb, f32, IF_BFP)
     F(0xb31b, SDBR,    RRE,   Z,   f1, f2, new, f1, sdb, f64, IF_BFP)
-    F(0xb34b, SXBR,    RRE,   Z,   x2h, x2l, x1, x1, sxb, f128, IF_BFP)
+    F(0xb34b, SXBR,    RRE,   Z,   x1, x2, new_x, x1, sxb, f128, IF_BFP)
     F(0xed0b, SEB,     RXE,   Z,   e1, m2_32u, new, e1, seb, f32, IF_BFP)
     F(0xed1b, SDB,     RXE,   Z,   f1, m2_64, new, f1, sdb, f64, IF_BFP)
 /* SUBTRACT HALFWORD */
@@ -957,7 +957,7 @@
 /* TEST DATA CLASS */
     F(0xed10, TCEB,    RXE,   Z,   e1, a2, 0, 0, tceb, 0, IF_BFP)
     F(0xed11, TCDB,    RXE,   Z,   f1, a2, 0, 0, tcdb, 0, IF_BFP)
-    F(0xed12, TCXB,    RXE,   Z,   0, a2, x1, 0, tcxb, 0, IF_BFP)
+    F(0xed12, TCXB,    RXE,   Z,   x1, a2, 0, 0, tcxb, 0, IF_BFP)
 
 /* TEST DECIMAL */
     C(0xebc0, TP,      RSL,   E2,  la1, 0, 0, 0, tp, 0)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/9] target/s390x: Use a single return for helper_divs32/u32
  2022-10-21  7:29 ` [PATCH 1/9] target/s390x: Use a single return for helper_divs32/u32 Richard Henderson
@ 2022-10-21 11:25   ` Ilya Leoshkevich
  2022-10-25  9:59   ` Philippe Mathieu-Daudé
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2022-10-21 11:25 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On Fri, 2022-10-21 at 17:29 +1000, Richard Henderson wrote:
> Pack the quotient and remainder into a single uint64_t.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/s390x/helper.h         |  2 +-
>  target/s390x/tcg/int_helper.c | 26 +++++++++++++-------------
>  target/s390x/tcg/translate.c  | 10 ++++++----
>  3 files changed, 20 insertions(+), 18 deletions(-)

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/9] target/s390x: Use Int128 for return from CLST
  2022-10-21  7:30 ` [PATCH 3/9] target/s390x: Use Int128 for return from CLST Richard Henderson
@ 2022-10-24 13:15   ` Philippe Mathieu-Daudé
  2022-10-25 21:22   ` Ilya Leoshkevich
  2022-10-25 21:30   ` [PATCH 0/1] " Ilya Leoshkevich
  2 siblings, 0 replies; 35+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-10-24 13:15 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On 21/10/22 09:30, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/s390x/helper.h         |  2 +-
>   target/s390x/tcg/mem_helper.c | 11 ++++-------
>   target/s390x/tcg/translate.c  |  8 ++++++--
>   3 files changed, 11 insertions(+), 10 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/9] target/s390x: Use a single return for helper_divs64/u64
  2022-10-21  7:29 ` [PATCH 2/9] target/s390x: Use a single return for helper_divs64/u64 Richard Henderson
@ 2022-10-24 13:16   ` Philippe Mathieu-Daudé
  2022-10-25 20:47   ` Ilya Leoshkevich
  1 sibling, 0 replies; 35+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-10-24 13:16 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On 21/10/22 09:29, Richard Henderson wrote:
> Pack the quotient and remainder into a single Int128.
> Use the divu128 primitive to remove the cpu_abort on
> 32-bit hosts.
> 
> This is the first use of Int128 as a return value.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/s390x/helper.h         |  4 ++--
>   target/s390x/tcg/int_helper.c | 38 +++++++++--------------------------
>   target/s390x/tcg/translate.c  | 14 +++++++++----
>   3 files changed, 21 insertions(+), 35 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/9] target/s390x: Use Int128 for return from CKSM
  2022-10-21  7:30 ` [PATCH 4/9] target/s390x: Use Int128 for return from CKSM Richard Henderson
@ 2022-10-24 13:17   ` Philippe Mathieu-Daudé
  2022-10-27 10:36   ` Ilya Leoshkevich
  1 sibling, 0 replies; 35+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-10-24 13:17 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On 21/10/22 09:30, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/s390x/helper.h         | 2 +-
>   target/s390x/tcg/mem_helper.c | 7 +++----
>   target/s390x/tcg/translate.c  | 6 ++++--
>   3 files changed, 8 insertions(+), 7 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/9] target/s390x: Use Int128 for return from TRE
  2022-10-21  7:30 ` [PATCH 5/9] target/s390x: Use Int128 for return from TRE Richard Henderson
@ 2022-10-24 13:17   ` Philippe Mathieu-Daudé
  2022-10-27 10:40   ` Ilya Leoshkevich
  1 sibling, 0 replies; 35+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-10-24 13:17 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On 21/10/22 09:30, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/s390x/helper.h         | 2 +-
>   target/s390x/tcg/mem_helper.c | 7 +++----
>   target/s390x/tcg/translate.c  | 7 +++++--
>   3 files changed, 9 insertions(+), 7 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/9] tests/tcg/s390x: Add long-double.c
  2022-10-21  7:30 ` [PATCH 7/9] tests/tcg/s390x: Add long-double.c Richard Henderson
@ 2022-10-24 13:19   ` Philippe Mathieu-Daudé
  2022-10-27 11:04   ` Ilya Leoshkevich
  1 sibling, 0 replies; 35+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-10-24 13:19 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On 21/10/22 09:30, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   tests/tcg/s390x/long-double.c   | 24 ++++++++++++++++++++++++
>   tests/tcg/s390x/Makefile.target |  1 +
>   2 files changed, 25 insertions(+)
>   create mode 100644 tests/tcg/s390x/long-double.c

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 9/9] target/s390x: Use Int128 for passing float128
  2022-10-21  7:30 ` [PATCH 9/9] target/s390x: Use Int128 for passing float128 Richard Henderson
@ 2022-10-24 18:01   ` Philippe Mathieu-Daudé
  2022-10-24 22:31     ` Richard Henderson
  2022-11-02  9:38   ` Ilya Leoshkevich
  1 sibling, 1 reply; 35+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-10-24 18:01 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On 21/10/22 09:30, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/s390x/helper.h          | 32 ++++++-------
>   target/s390x/tcg/fpu_helper.c  | 88 ++++++++++++++--------------------
>   target/s390x/tcg/translate.c   | 76 ++++++++++++++++++++---------
>   target/s390x/tcg/insn-data.def | 30 ++++++------
>   4 files changed, 121 insertions(+), 105 deletions(-)

> diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
> index d1ffbb8710..8023bbab2f 100644
> --- a/target/s390x/tcg/translate.c
> +++ b/target/s390x/tcg/translate.c
> @@ -305,6 +305,18 @@ static TCGv_i64 load_freg32_i64(int reg)
>       return r;
>   }
>   
> +static TCGv_i128 load_freg_128(int reg)
> +{
> +    TCGv_i64 h = load_freg(reg);
> +    TCGv_i64 l = load_freg(reg + 2);
> +    TCGv_i128 r = tcg_temp_new_i128();

Maybe rename as load_freg_new_128() to make emphasis on the returned
TCGv need to be freed? Otherwise:

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

> +    tcg_gen_concat_i64_i128(r, l, h);
> +    tcg_temp_free_i64(h);
> +    tcg_temp_free_i64(l);
> +    return r;
> +}




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 9/9] target/s390x: Use Int128 for passing float128
  2022-10-24 18:01   ` Philippe Mathieu-Daudé
@ 2022-10-24 22:31     ` Richard Henderson
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2022-10-24 22:31 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel; +Cc: qemu-s390x

On 10/25/22 04:01, Philippe Mathieu-Daudé wrote:
> On 21/10/22 09:30, Richard Henderson wrote:
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   target/s390x/helper.h          | 32 ++++++-------
>>   target/s390x/tcg/fpu_helper.c  | 88 ++++++++++++++--------------------
>>   target/s390x/tcg/translate.c   | 76 ++++++++++++++++++++---------
>>   target/s390x/tcg/insn-data.def | 30 ++++++------
>>   4 files changed, 121 insertions(+), 105 deletions(-)
> 
>> diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
>> index d1ffbb8710..8023bbab2f 100644
>> --- a/target/s390x/tcg/translate.c
>> +++ b/target/s390x/tcg/translate.c
>> @@ -305,6 +305,18 @@ static TCGv_i64 load_freg32_i64(int reg)
>>       return r;
>>   }
>> +static TCGv_i128 load_freg_128(int reg)
>> +{
>> +    TCGv_i64 h = load_freg(reg);
>> +    TCGv_i64 l = load_freg(reg + 2);
>> +    TCGv_i128 r = tcg_temp_new_i128();
> 
> Maybe rename as load_freg_new_128() to make emphasis on the returned
> TCGv need to be freed?

It's no different from the other load_freg* functions just above.  As with those, the 
result is assigned to one of the DisasOps slots, and all of those slots are freed at the 
end of each instruction.

r~


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 8/9] target/s390x: Use Int128 for returning float128
  2022-10-21  7:30 ` [PATCH 8/9] target/s390x: Use Int128 for returning float128 Richard Henderson
@ 2022-10-25  9:58   ` Philippe Mathieu-Daudé
  2022-10-27 11:18   ` Ilya Leoshkevich
  1 sibling, 0 replies; 35+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-10-25  9:58 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On 21/10/22 09:30, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/s390x/helper.h          | 22 +++++++--------
>   target/s390x/tcg/fpu_helper.c  | 29 ++++++++++----------
>   target/s390x/tcg/translate.c   | 49 +++++++++++++++++++---------------
>   target/s390x/tcg/insn-data.def | 20 +++++++-------
>   4 files changed, 63 insertions(+), 57 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/9] target/s390x: Use a single return for helper_divs32/u32
  2022-10-21  7:29 ` [PATCH 1/9] target/s390x: Use a single return for helper_divs32/u32 Richard Henderson
  2022-10-21 11:25   ` Ilya Leoshkevich
@ 2022-10-25  9:59   ` Philippe Mathieu-Daudé
  2022-11-01 11:09   ` Ilya Leoshkevich
  2022-11-01 11:13   ` [PATCH] tests/tcg/s390x: Add div.c Ilya Leoshkevich
  3 siblings, 0 replies; 35+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-10-25  9:59 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On 21/10/22 09:29, Richard Henderson wrote:
> Pack the quotient and remainder into a single uint64_t.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/s390x/helper.h         |  2 +-
>   target/s390x/tcg/int_helper.c | 26 +++++++++++++-------------
>   target/s390x/tcg/translate.c  | 10 ++++++----
>   3 files changed, 20 insertions(+), 18 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/9] target/s390x: Use a single return for helper_divs64/u64
  2022-10-21  7:29 ` [PATCH 2/9] target/s390x: Use a single return for helper_divs64/u64 Richard Henderson
  2022-10-24 13:16   ` Philippe Mathieu-Daudé
@ 2022-10-25 20:47   ` Ilya Leoshkevich
  1 sibling, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2022-10-25 20:47 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On Fri, Oct 21, 2022 at 05:29:59PM +1000, Richard Henderson wrote:
> Pack the quotient and remainder into a single Int128.
> Use the divu128 primitive to remove the cpu_abort on
> 32-bit hosts.
> 
> This is the first use of Int128 as a return value.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/9] target/s390x: Use Int128 for return from CLST
  2022-10-21  7:30 ` [PATCH 3/9] target/s390x: Use Int128 for return from CLST Richard Henderson
  2022-10-24 13:15   ` Philippe Mathieu-Daudé
@ 2022-10-25 21:22   ` Ilya Leoshkevich
  2022-10-25 21:30   ` [PATCH 0/1] " Ilya Leoshkevich
  2 siblings, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2022-10-25 21:22 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On Fri, Oct 21, 2022 at 05:30:00PM +1000, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/s390x/helper.h         |  2 +-
>  target/s390x/tcg/mem_helper.c | 11 ++++-------
>  target/s390x/tcg/translate.c  |  8 ++++++--
>  3 files changed, 11 insertions(+), 10 deletions(-)

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>

I wanted to make sure the ordering within a pair was right and wrote a
small test. Feel free to add it to the series:



From: Ilya Leoshkevich <iii@linux.ibm.com>
Subject: [PATCH] tests/tcg/s390x: Add clst.c

Add a basic test to prevent regressions.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 tests/tcg/s390x/Makefile.target |  1 +
 tests/tcg/s390x/clst.c          | 82 +++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)
 create mode 100644 tests/tcg/s390x/clst.c

diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 627668e1ce9..ad2e34b1859 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -18,6 +18,7 @@ TESTS+=signals-s390x
 TESTS+=branch-relative-long
 TESTS+=noexec
 TESTS+=long-double
+TESTS+=clst
 
 Z14_TESTS=vfminmax
 vfminmax: LDFLAGS+=-lm
diff --git a/tests/tcg/s390x/clst.c b/tests/tcg/s390x/clst.c
new file mode 100644
index 00000000000..ed2fe7326c3
--- /dev/null
+++ b/tests/tcg/s390x/clst.c
@@ -0,0 +1,82 @@
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+
+static int clst(char sep, const char **s1, const char **s2)
+{
+    const char *r1 = *s1;
+    const char *r2 = *s2;
+    int cc;
+
+    do {
+        register int r0 asm("r0") = sep;
+
+        asm("clst %[r1],%[r2]\n"
+            "ipm %[cc]\n"
+            "srl %[cc],28"
+            : [r1] "+r" (r1), [r2] "+r" (r2), "+r" (r0), [cc] "=r" (cc)
+            :
+            : "cc");
+        *s1 = r1;
+        *s2 = r2;
+    } while (cc == 3);
+
+    return cc;
+}
+
+static const struct test {
+    const char *name;
+    char sep;
+    const char *s1;
+    const char *s2;
+    int exp_cc;
+    int exp_off;
+} tests[] = {
+    {
+        .name = "cc0",
+        .sep = 0,
+        .s1 = "aa",
+        .s2 = "aa",
+        .exp_cc = 0,
+        .exp_off = 0,
+    },
+    {
+        .name = "cc1",
+        .sep = 1,
+        .s1 = "a\x01",
+        .s2 = "aa\x01",
+        .exp_cc = 1,
+        .exp_off = 1,
+    },
+    {
+        .name = "cc2",
+        .sep = 2,
+        .s1 = "abc\x02",
+        .s2 = "abb\x02",
+        .exp_cc = 2,
+        .exp_off = 2,
+    },
+};
+
+int main(void)
+{
+    const struct test *t;
+    const char *s1, *s2;
+    size_t i;
+    int cc;
+
+    for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
+        t = &tests[i];
+        s1 = t->s1;
+        s2 = t->s2;
+        cc = clst(t->sep, &s1, &s2);
+        if (cc != t->exp_cc ||
+                s1 != t->s1 + t->exp_off ||
+                s2 != t->s2 + t->exp_off) {
+            fprintf(stderr, "%s\n", t->name);
+            return EXIT_FAILURE;
+        }
+    }
+
+    return EXIT_SUCCESS;
+}
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 0/1] Re: target/s390x: Use Int128 for return from CLST
  2022-10-21  7:30 ` [PATCH 3/9] target/s390x: Use Int128 for return from CLST Richard Henderson
  2022-10-24 13:15   ` Philippe Mathieu-Daudé
  2022-10-25 21:22   ` Ilya Leoshkevich
@ 2022-10-25 21:30   ` Ilya Leoshkevich
  2022-10-25 21:30     ` [PATCH 1/1] tests/tcg/s390x: Add clst.c Ilya Leoshkevich
  2 siblings, 1 reply; 35+ messages in thread
From: Ilya Leoshkevich @ 2022-10-25 21:30 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Ilya Leoshkevich

I wanted to make sure that the pair elements were not mixed up and
wrote a small test. Feel free to add it to the series.

Ilya Leoshkevich (1):
  tests/tcg/s390x: Add clst.c

 tests/tcg/s390x/Makefile.target |  1 +
 tests/tcg/s390x/clst.c          | 82 +++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)
 create mode 100644 tests/tcg/s390x/clst.c

-- 
2.37.2



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/1] tests/tcg/s390x: Add clst.c
  2022-10-25 21:30   ` [PATCH 0/1] " Ilya Leoshkevich
@ 2022-10-25 21:30     ` Ilya Leoshkevich
  0 siblings, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2022-10-25 21:30 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Ilya Leoshkevich

Add a basic test to prevent regressions.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 tests/tcg/s390x/Makefile.target |  1 +
 tests/tcg/s390x/clst.c          | 82 +++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)
 create mode 100644 tests/tcg/s390x/clst.c

diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 627668e1ce9..ad2e34b1859 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -18,6 +18,7 @@ TESTS+=signals-s390x
 TESTS+=branch-relative-long
 TESTS+=noexec
 TESTS+=long-double
+TESTS+=clst
 
 Z14_TESTS=vfminmax
 vfminmax: LDFLAGS+=-lm
diff --git a/tests/tcg/s390x/clst.c b/tests/tcg/s390x/clst.c
new file mode 100644
index 00000000000..ed2fe7326c3
--- /dev/null
+++ b/tests/tcg/s390x/clst.c
@@ -0,0 +1,82 @@
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+
+static int clst(char sep, const char **s1, const char **s2)
+{
+    const char *r1 = *s1;
+    const char *r2 = *s2;
+    int cc;
+
+    do {
+        register int r0 asm("r0") = sep;
+
+        asm("clst %[r1],%[r2]\n"
+            "ipm %[cc]\n"
+            "srl %[cc],28"
+            : [r1] "+r" (r1), [r2] "+r" (r2), "+r" (r0), [cc] "=r" (cc)
+            :
+            : "cc");
+        *s1 = r1;
+        *s2 = r2;
+    } while (cc == 3);
+
+    return cc;
+}
+
+static const struct test {
+    const char *name;
+    char sep;
+    const char *s1;
+    const char *s2;
+    int exp_cc;
+    int exp_off;
+} tests[] = {
+    {
+        .name = "cc0",
+        .sep = 0,
+        .s1 = "aa",
+        .s2 = "aa",
+        .exp_cc = 0,
+        .exp_off = 0,
+    },
+    {
+        .name = "cc1",
+        .sep = 1,
+        .s1 = "a\x01",
+        .s2 = "aa\x01",
+        .exp_cc = 1,
+        .exp_off = 1,
+    },
+    {
+        .name = "cc2",
+        .sep = 2,
+        .s1 = "abc\x02",
+        .s2 = "abb\x02",
+        .exp_cc = 2,
+        .exp_off = 2,
+    },
+};
+
+int main(void)
+{
+    const struct test *t;
+    const char *s1, *s2;
+    size_t i;
+    int cc;
+
+    for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
+        t = &tests[i];
+        s1 = t->s1;
+        s2 = t->s2;
+        cc = clst(t->sep, &s1, &s2);
+        if (cc != t->exp_cc ||
+                s1 != t->s1 + t->exp_off ||
+                s2 != t->s2 + t->exp_off) {
+            fprintf(stderr, "%s\n", t->name);
+            return EXIT_FAILURE;
+        }
+    }
+
+    return EXIT_SUCCESS;
+}
-- 
2.37.2



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/9] target/s390x: Use Int128 for return from CKSM
  2022-10-21  7:30 ` [PATCH 4/9] target/s390x: Use Int128 for return from CKSM Richard Henderson
  2022-10-24 13:17   ` Philippe Mathieu-Daudé
@ 2022-10-27 10:36   ` Ilya Leoshkevich
  1 sibling, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2022-10-27 10:36 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On Fri, Oct 21, 2022 at 05:30:01PM +1000, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/s390x/helper.h         | 2 +-
>  target/s390x/tcg/mem_helper.c | 7 +++----
>  target/s390x/tcg/translate.c  | 6 ++++--
>  3 files changed, 8 insertions(+), 7 deletions(-)

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/9] target/s390x: Use Int128 for return from TRE
  2022-10-21  7:30 ` [PATCH 5/9] target/s390x: Use Int128 for return from TRE Richard Henderson
  2022-10-24 13:17   ` Philippe Mathieu-Daudé
@ 2022-10-27 10:40   ` Ilya Leoshkevich
  1 sibling, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2022-10-27 10:40 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On Fri, Oct 21, 2022 at 05:30:02PM +1000, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/s390x/helper.h         | 2 +-
>  target/s390x/tcg/mem_helper.c | 7 +++----
>  target/s390x/tcg/translate.c  | 7 +++++--
>  3 files changed, 9 insertions(+), 7 deletions(-)

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 6/9] target/s390x: Copy wout_x1 to wout_x1_P
  2022-10-21  7:30 ` [PATCH 6/9] target/s390x: Copy wout_x1 to wout_x1_P Richard Henderson
@ 2022-10-27 10:57   ` Ilya Leoshkevich
  0 siblings, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2022-10-27 10:57 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On Fri, Oct 21, 2022 at 05:30:03PM +1000, Richard Henderson wrote:
> Make a copy of wout_x1 before modifying it, as wout_x1_P
> emphasizing that it operates on the out/out2 pair.  The insns
> that use x1_P are data movement that will not change to Int128.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/s390x/tcg/translate.c   |  8 ++++++++
>  target/s390x/tcg/insn-data.def | 12 ++++++------
>  2 files changed, 14 insertions(+), 6 deletions(-)

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/9] tests/tcg/s390x: Add long-double.c
  2022-10-21  7:30 ` [PATCH 7/9] tests/tcg/s390x: Add long-double.c Richard Henderson
  2022-10-24 13:19   ` Philippe Mathieu-Daudé
@ 2022-10-27 11:04   ` Ilya Leoshkevich
  1 sibling, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2022-10-27 11:04 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On Fri, Oct 21, 2022 at 05:30:04PM +1000, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tests/tcg/s390x/long-double.c   | 24 ++++++++++++++++++++++++
>  tests/tcg/s390x/Makefile.target |  1 +
>  2 files changed, 25 insertions(+)
>  create mode 100644 tests/tcg/s390x/long-double.c

It might be better to do this in asm in order to be sure that a
compiler doesn't perform any magic. But at least as of today gcc
generates all the "interesting" instructions from this code.

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 8/9] target/s390x: Use Int128 for returning float128
  2022-10-21  7:30 ` [PATCH 8/9] target/s390x: Use Int128 for returning float128 Richard Henderson
  2022-10-25  9:58   ` Philippe Mathieu-Daudé
@ 2022-10-27 11:18   ` Ilya Leoshkevich
  2022-10-27 11:31     ` Richard Henderson
  1 sibling, 1 reply; 35+ messages in thread
From: Ilya Leoshkevich @ 2022-10-27 11:18 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On Fri, Oct 21, 2022 at 05:30:05PM +1000, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/s390x/helper.h          | 22 +++++++--------
>  target/s390x/tcg/fpu_helper.c  | 29 ++++++++++----------
>  target/s390x/tcg/translate.c   | 49 +++++++++++++++++++---------------
>  target/s390x/tcg/insn-data.def | 20 +++++++-------
>  4 files changed, 63 insertions(+), 57 deletions(-)
> 

> @@ -2032,7 +2031,7 @@ static DisasJumpType op_cxlgb(DisasContext *s, DisasOps *o)
>      if (!m34) {
>          return DISAS_NORETURN;
>      }
> -    gen_helper_cxlgb(o->out, cpu_env, o->in2, m34);
> +    gen_helper_cxlgb(o->out_128, cpu_env, o->in2, m34);
>      tcg_temp_free_i32(m34);
>      return_low128(o->out2);
>      return DISAS_NEXT;

Do we still need return_low128() here?

>  static DisasJumpType op_lxeb(DisasContext *s, DisasOps *o)
>  {
> -    gen_helper_lxeb(o->out, cpu_env, o->in2);
> +    gen_helper_lxeb(o->out_128, cpu_env, o->in2);
>      return_low128(o->out2);
>      return DISAS_NEXT;
>  }

Same question.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 8/9] target/s390x: Use Int128 for returning float128
  2022-10-27 11:18   ` Ilya Leoshkevich
@ 2022-10-27 11:31     ` Richard Henderson
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2022-10-27 11:31 UTC (permalink / raw)
  To: Ilya Leoshkevich, qemu-devel; +Cc: qemu-s390x

On 10/27/22 21:18, Ilya Leoshkevich wrote:
> On Fri, Oct 21, 2022 at 05:30:05PM +1000, Richard Henderson wrote:
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   target/s390x/helper.h          | 22 +++++++--------
>>   target/s390x/tcg/fpu_helper.c  | 29 ++++++++++----------
>>   target/s390x/tcg/translate.c   | 49 +++++++++++++++++++---------------
>>   target/s390x/tcg/insn-data.def | 20 +++++++-------
>>   4 files changed, 63 insertions(+), 57 deletions(-)
>>
> 
>> @@ -2032,7 +2031,7 @@ static DisasJumpType op_cxlgb(DisasContext *s, DisasOps *o)
>>       if (!m34) {
>>           return DISAS_NORETURN;
>>       }
>> -    gen_helper_cxlgb(o->out, cpu_env, o->in2, m34);
>> +    gen_helper_cxlgb(o->out_128, cpu_env, o->in2, m34);
>>       tcg_temp_free_i32(m34);
>>       return_low128(o->out2);
>>       return DISAS_NEXT;
> 
> Do we still need return_low128() here?
> 
>>   static DisasJumpType op_lxeb(DisasContext *s, DisasOps *o)
>>   {
>> -    gen_helper_lxeb(o->out, cpu_env, o->in2);
>> +    gen_helper_lxeb(o->out_128, cpu_env, o->in2);
>>       return_low128(o->out2);
>>       return DISAS_NEXT;
>>   }
> 
> Same question.

Nope, that should have gone away.  Oops.


r~


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/9] target/s390x: Use a single return for helper_divs32/u32
  2022-10-21  7:29 ` [PATCH 1/9] target/s390x: Use a single return for helper_divs32/u32 Richard Henderson
  2022-10-21 11:25   ` Ilya Leoshkevich
  2022-10-25  9:59   ` Philippe Mathieu-Daudé
@ 2022-11-01 11:09   ` Ilya Leoshkevich
  2022-11-01 11:13   ` [PATCH] tests/tcg/s390x: Add div.c Ilya Leoshkevich
  3 siblings, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2022-11-01 11:09 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On Fri, Oct 21, 2022 at 05:29:58PM +1000, Richard Henderson wrote:
> Pack the quotient and remainder into a single uint64_t.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/s390x/helper.h         |  2 +-
>  target/s390x/tcg/int_helper.c | 26 +++++++++++++-------------
>  target/s390x/tcg/translate.c  | 10 ++++++----
>  3 files changed, 20 insertions(+), 18 deletions(-)
> 
> diff --git a/target/s390x/helper.h b/target/s390x/helper.h
> index bf33d86f74..97a9668eef 100644
> --- a/target/s390x/helper.h
> +++ b/target/s390x/helper.h
> @@ -10,7 +10,7 @@ DEF_HELPER_FLAGS_4(clc, TCG_CALL_NO_WG, i32, env, i32, i64, i64)
>  DEF_HELPER_3(mvcl, i32, env, i32, i32)
>  DEF_HELPER_3(clcl, i32, env, i32, i32)
>  DEF_HELPER_FLAGS_4(clm, TCG_CALL_NO_WG, i32, env, i32, i32, i64)
> -DEF_HELPER_FLAGS_3(divs32, TCG_CALL_NO_WG, s64, env, s64, s64)
> +DEF_HELPER_FLAGS_3(divs32, TCG_CALL_NO_WG, i64, env, s64, s64)
>  DEF_HELPER_FLAGS_3(divu32, TCG_CALL_NO_WG, i64, env, i64, i64)
>  DEF_HELPER_FLAGS_3(divs64, TCG_CALL_NO_WG, s64, env, s64, s64)
>  DEF_HELPER_FLAGS_4(divu64, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
> diff --git a/target/s390x/tcg/int_helper.c b/target/s390x/tcg/int_helper.c
> index 954542388a..130b8bd4d2 100644
> --- a/target/s390x/tcg/int_helper.c
> +++ b/target/s390x/tcg/int_helper.c
> @@ -34,45 +34,45 @@
>  #endif
>  
>  /* 64/32 -> 32 signed division */
> -int64_t HELPER(divs32)(CPUS390XState *env, int64_t a, int64_t b64)
> +uint64_t HELPER(divs32)(CPUS390XState *env, int64_t a, int64_t b64)
>  {
> -    int32_t ret, b = b64;
> -    int64_t q;
> +    int32_t b = b64;
> +    int64_t q, r;
>  
>      if (b == 0) {
>          tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
>      }
>  
> -    ret = q = a / b;
> -    env->retxl = a % b;
> +    q = a / b;
> +    r = a % b;
>  
>      /* Catch non-representable quotient.  */
> -    if (ret != q) {
> +    if (q != (int32_t)q) {
>          tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
>      }
>  
> -    return ret;
> +    return deposit64(r, 32, 32, q);
>  }
>  
>  /* 64/32 -> 32 unsigned division */
>  uint64_t HELPER(divu32)(CPUS390XState *env, uint64_t a, uint64_t b64)
>  {
> -    uint32_t ret, b = b64;
> -    uint64_t q;
> +    uint32_t b = b64;
> +    uint64_t q, r;
>  
>      if (b == 0) {
>          tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
>      }
>  
> -    ret = q = a / b;
> -    env->retxl = a % b;
> +    q = a / b;
> +    r = a % b;
>  
>      /* Catch non-representable quotient.  */
> -    if (ret != q) {
> +    if (q != (uint32_t)q) {
>          tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
>      }
>  
> -    return ret;
> +    return deposit64(r, 32, 32, q);
>  }
>  
>  /* 64/64 -> 64 signed division */
> diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
> index 1d2dddab1c..525317c9df 100644
> --- a/target/s390x/tcg/translate.c
> +++ b/target/s390x/tcg/translate.c
> @@ -2395,15 +2395,17 @@ static DisasJumpType op_diag(DisasContext *s, DisasOps *o)
>  
>  static DisasJumpType op_divs32(DisasContext *s, DisasOps *o)
>  {
> -    gen_helper_divs32(o->out2, cpu_env, o->in1, o->in2);
> -    return_low128(o->out);
> +    gen_helper_divs32(o->out, cpu_env, o->in1, o->in2);
> +    tcg_gen_ext32u_i64(o->out2, o->out);
> +    tcg_gen_shri_i64(o->out, o->out, 32);
>      return DISAS_NEXT;
>  }
>  
>  static DisasJumpType op_divu32(DisasContext *s, DisasOps *o)
>  {
> -    gen_helper_divu32(o->out2, cpu_env, o->in1, o->in2);
> -    return_low128(o->out);
> +    gen_helper_divu32(o->out, cpu_env, o->in1, o->in2);
> +    tcg_gen_ext32u_i64(o->out2, o->out);
> +    tcg_gen_shri_i64(o->out, o->out, 32);
>      return DISAS_NEXT;
>  }
>  
> -- 
> 2.34.1

Hi,

The wasmtime testsuite was failing and bisect pointed to this commit.
Apparently this needs a fixup:


--- a/target/s390x/tcg/int_helper.c
+++ b/target/s390x/tcg/int_helper.c
@@ -51,7 +51,7 @@ uint64_t HELPER(divs32)(CPUS390XState *env, int64_t a, int64_t b64)
         tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
     }
 
-    return deposit64(r, 32, 32, q);
+    return deposit64(q, 32, 32, r);
 }
 
 /* 64/32 -> 32 unsigned division */
@@ -72,7 +72,7 @@ uint64_t HELPER(divu32)(CPUS390XState *env, uint64_t a, uint64_t b64)
         tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
     }
 
-    return deposit64(r, 32, 32, q);
+    return deposit64(q, 32, 32, r);
 }
 
 /* 64/64 -> 64 signed division */


Currently we return out = r | (q << 32) here.
op_divu32 makes out2 = r, out = q from this.
Finally, r1_P32 stores r1 = q, r1+1 = r.
But DLR wants the opposite:

    The remainder is placed in bit
    positions 32-63 of general register R1, and the quo-
    tient is placed in bit positions 32-63 of general regis-
    ter R1 + 1.

Ditto DR.

Best regards,
Ilya


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH] tests/tcg/s390x: Add div.c
  2022-10-21  7:29 ` [PATCH 1/9] target/s390x: Use a single return for helper_divs32/u32 Richard Henderson
                     ` (2 preceding siblings ...)
  2022-11-01 11:09   ` Ilya Leoshkevich
@ 2022-11-01 11:13   ` Ilya Leoshkevich
  3 siblings, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2022-11-01 11:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Ilya Leoshkevich

Add a basic test to prevent regressions.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 tests/tcg/s390x/Makefile.target |  1 +
 tests/tcg/s390x/div.c           | 40 +++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)
 create mode 100644 tests/tcg/s390x/div.c

diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index c882db7a78a..f2ec7587387 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -26,6 +26,7 @@ TESTS+=branch-relative-long
 TESTS+=noexec
 TESTS+=clst
 TESTS+=long-double
+TESTS+=div
 
 Z13_TESTS=vistr
 $(Z13_TESTS): CFLAGS+=-march=z13 -O2
diff --git a/tests/tcg/s390x/div.c b/tests/tcg/s390x/div.c
new file mode 100644
index 00000000000..58072956147
--- /dev/null
+++ b/tests/tcg/s390x/div.c
@@ -0,0 +1,40 @@
+#include <assert.h>
+#include <stdint.h>
+
+static void test_dr(void)
+{
+    register int32_t r0 asm("r0") = -1;
+    register int32_t r1 asm("r1") = -4241;
+    int32_t b = 101, q, r;
+
+    asm("dr %[r0],%[b]"
+        : [r0] "+r" (r0), [r1] "+r" (r1)
+        : [b] "r" (b)
+        : "cc");
+    q = r1;
+    r = r0;
+    assert(q == -41);
+    assert(r == -100);
+}
+
+static void test_dlr(void)
+{
+    register uint32_t r0 asm("r0") = 0;
+    register uint32_t r1 asm("r1") = 4243;
+    uint32_t b = 101, q, r;
+
+    asm("dlr %[r0],%[b]"
+        : [r0] "+r" (r0), [r1] "+r" (r1)
+        : [b] "r" (b)
+        : "cc");
+    q = r1;
+    r = r0;
+    assert(q == 42);
+    assert(r == 1);
+}
+
+int main(void)
+{
+    test_dr();
+    test_dlr();
+}
-- 
2.37.2



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 9/9] target/s390x: Use Int128 for passing float128
  2022-10-21  7:30 ` [PATCH 9/9] target/s390x: Use Int128 for passing float128 Richard Henderson
  2022-10-24 18:01   ` Philippe Mathieu-Daudé
@ 2022-11-02  9:38   ` Ilya Leoshkevich
  2022-11-02  9:47     ` Richard Henderson
  1 sibling, 1 reply; 35+ messages in thread
From: Ilya Leoshkevich @ 2022-11-02  9:38 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On Fri, Oct 21, 2022 at 05:30:06PM +1000, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/s390x/helper.h          | 32 ++++++-------
>  target/s390x/tcg/fpu_helper.c  | 88 ++++++++++++++--------------------
>  target/s390x/tcg/translate.c   | 76 ++++++++++++++++++++---------
>  target/s390x/tcg/insn-data.def | 30 ++++++------
>  4 files changed, 121 insertions(+), 105 deletions(-)
> 
> diff --git a/target/s390x/helper.h b/target/s390x/helper.h
> index 429131a85e..481b9019f9 100644
> --- a/target/s390x/helper.h
> +++ b/target/s390x/helper.h
> @@ -41,55 +41,55 @@ DEF_HELPER_4(csst, i32, env, i32, i64, i64)
>  DEF_HELPER_4(csst_parallel, i32, env, i32, i64, i64)
>  DEF_HELPER_FLAGS_3(aeb, TCG_CALL_NO_WG, i64, env, i64, i64)
>  DEF_HELPER_FLAGS_3(adb, TCG_CALL_NO_WG, i64, env, i64, i64)
> -DEF_HELPER_FLAGS_5(axb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
> +DEF_HELPER_FLAGS_3(axb, TCG_CALL_NO_WG, i128, env, i128, i128)
>  DEF_HELPER_FLAGS_3(seb, TCG_CALL_NO_WG, i64, env, i64, i64)
>  DEF_HELPER_FLAGS_3(sdb, TCG_CALL_NO_WG, i64, env, i64, i64)
> -DEF_HELPER_FLAGS_5(sxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
> +DEF_HELPER_FLAGS_3(sxb, TCG_CALL_NO_WG, i128, env, i128, i128)
>  DEF_HELPER_FLAGS_3(deb, TCG_CALL_NO_WG, i64, env, i64, i64)
>  DEF_HELPER_FLAGS_3(ddb, TCG_CALL_NO_WG, i64, env, i64, i64)
> -DEF_HELPER_FLAGS_5(dxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
> +DEF_HELPER_FLAGS_3(dxb, TCG_CALL_NO_WG, i128, env, i128, i128)
>  DEF_HELPER_FLAGS_3(meeb, TCG_CALL_NO_WG, i64, env, i64, i64)
>  DEF_HELPER_FLAGS_3(mdeb, TCG_CALL_NO_WG, i64, env, i64, i64)
>  DEF_HELPER_FLAGS_3(mdb, TCG_CALL_NO_WG, i64, env, i64, i64)
> -DEF_HELPER_FLAGS_5(mxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
> -DEF_HELPER_FLAGS_4(mxdb, TCG_CALL_NO_WG, i128, env, i64, i64, i64)
> +DEF_HELPER_FLAGS_3(mxb, TCG_CALL_NO_WG, i128, env, i128, i128)
> +DEF_HELPER_FLAGS_3(mxdb, TCG_CALL_NO_WG, i128, env, i128, i64)
>  DEF_HELPER_FLAGS_2(ldeb, TCG_CALL_NO_WG, i64, env, i64)
> -DEF_HELPER_FLAGS_4(ldxb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
> +DEF_HELPER_FLAGS_3(ldxb, TCG_CALL_NO_WG, i64, env, i128, i32)
>  DEF_HELPER_FLAGS_2(lxdb, TCG_CALL_NO_WG, i128, env, i64)
>  DEF_HELPER_FLAGS_2(lxeb, TCG_CALL_NO_WG, i128, env, i64)
>  DEF_HELPER_FLAGS_3(ledb, TCG_CALL_NO_WG, i64, env, i64, i32)
> -DEF_HELPER_FLAGS_4(lexb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
> +DEF_HELPER_FLAGS_3(lexb, TCG_CALL_NO_WG, i64, env, i128, i32)
>  DEF_HELPER_FLAGS_3(ceb, TCG_CALL_NO_WG_SE, i32, env, i64, i64)
>  DEF_HELPER_FLAGS_3(cdb, TCG_CALL_NO_WG_SE, i32, env, i64, i64)
> -DEF_HELPER_FLAGS_5(cxb, TCG_CALL_NO_WG_SE, i32, env, i64, i64, i64, i64)
> +DEF_HELPER_FLAGS_3(cxb, TCG_CALL_NO_WG_SE, i32, env, i128, i128)
>  DEF_HELPER_FLAGS_3(keb, TCG_CALL_NO_WG, i32, env, i64, i64)
>  DEF_HELPER_FLAGS_3(kdb, TCG_CALL_NO_WG, i32, env, i64, i64)
> -DEF_HELPER_FLAGS_5(kxb, TCG_CALL_NO_WG, i32, env, i64, i64, i64, i64)
> +DEF_HELPER_FLAGS_3(kxb, TCG_CALL_NO_WG, i32, env, i128, i128)
>  DEF_HELPER_3(cgeb, i64, env, i64, i32)
>  DEF_HELPER_3(cgdb, i64, env, i64, i32)
> -DEF_HELPER_4(cgxb, i64, env, i64, i64, i32)
> +DEF_HELPER_3(cgxb, i64, env, i128, i32)
>  DEF_HELPER_3(cfeb, i64, env, i64, i32)
>  DEF_HELPER_3(cfdb, i64, env, i64, i32)
> -DEF_HELPER_4(cfxb, i64, env, i64, i64, i32)
> +DEF_HELPER_3(cfxb, i64, env, i128, i32)
>  DEF_HELPER_3(clgeb, i64, env, i64, i32)
>  DEF_HELPER_3(clgdb, i64, env, i64, i32)
> -DEF_HELPER_4(clgxb, i64, env, i64, i64, i32)
> +DEF_HELPER_3(clgxb, i64, env, i128, i32)
>  DEF_HELPER_3(clfeb, i64, env, i64, i32)
>  DEF_HELPER_3(clfdb, i64, env, i64, i32)
> -DEF_HELPER_4(clfxb, i64, env, i64, i64, i32)
> +DEF_HELPER_3(clfxb, i64, env, i128, i32)
>  DEF_HELPER_FLAGS_3(fieb, TCG_CALL_NO_WG, i64, env, i64, i32)
>  DEF_HELPER_FLAGS_3(fidb, TCG_CALL_NO_WG, i64, env, i64, i32)
> -DEF_HELPER_FLAGS_4(fixb, TCG_CALL_NO_WG, i128, env, i64, i64, i32)
> +DEF_HELPER_FLAGS_3(fixb, TCG_CALL_NO_WG, i128, env, i128, i32)
>  DEF_HELPER_FLAGS_4(maeb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
>  DEF_HELPER_FLAGS_4(madb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
>  DEF_HELPER_FLAGS_4(mseb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
>  DEF_HELPER_FLAGS_4(msdb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
>  DEF_HELPER_FLAGS_3(tceb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
>  DEF_HELPER_FLAGS_3(tcdb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
> -DEF_HELPER_FLAGS_4(tcxb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64, i64)
> +DEF_HELPER_FLAGS_3(tcxb, TCG_CALL_NO_RWG_SE, i32, env, i128, i64)
>  DEF_HELPER_FLAGS_2(sqeb, TCG_CALL_NO_WG, i64, env, i64)
>  DEF_HELPER_FLAGS_2(sqdb, TCG_CALL_NO_WG, i64, env, i64)
> -DEF_HELPER_FLAGS_3(sqxb, TCG_CALL_NO_WG, i128, env, i64, i64)
> +DEF_HELPER_FLAGS_2(sqxb, TCG_CALL_NO_WG, i128, env, i128)
>  DEF_HELPER_FLAGS_1(cvd, TCG_CALL_NO_RWG_SE, i64, s32)
>  DEF_HELPER_FLAGS_4(pack, TCG_CALL_NO_WG, void, env, i32, i64, i64)
>  DEF_HELPER_FLAGS_4(pka, TCG_CALL_NO_WG, void, env, i64, i64, i32)
> diff --git a/target/s390x/tcg/fpu_helper.c b/target/s390x/tcg/fpu_helper.c
> index a584794be6..5a322e3f87 100644
> --- a/target/s390x/tcg/fpu_helper.c
> +++ b/target/s390x/tcg/fpu_helper.c
> @@ -39,6 +39,11 @@ static inline Int128 RET128(float128 f)
>      return int128_make128(f.low, f.high);
>  }
>  
> +static inline float128 ARG128(Int128 i)
> +{
> +    return make_float128(int128_gethi(i), int128_getlo(i));
> +}
> +
>  uint8_t s390_softfloat_exc_to_ieee(unsigned int exc)
>  {
>      uint8_t s390_exc = 0;
> @@ -227,12 +232,9 @@ uint64_t HELPER(adb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
>  }
>  
>  /* 128-bit FP addition */
> -Int128 HELPER(axb)(CPUS390XState *env, uint64_t ah, uint64_t al,
> -                     uint64_t bh, uint64_t bl)
> +Int128 HELPER(axb)(CPUS390XState *env, Int128 a, Int128 b)
>  {
> -    float128 ret = float128_add(make_float128(ah, al),
> -                                make_float128(bh, bl),
> -                                &env->fpu_status);
> +    float128 ret = float128_add(ARG128(a), ARG128(b), &env->fpu_status);
>      handle_exceptions(env, false, GETPC());
>      return RET128(ret);
>  }
> @@ -254,12 +256,9 @@ uint64_t HELPER(sdb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
>  }
>  
>  /* 128-bit FP subtraction */
> -Int128 HELPER(sxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
> -                     uint64_t bh, uint64_t bl)
> +Int128 HELPER(sxb)(CPUS390XState *env, Int128 a, Int128 b)
>  {
> -    float128 ret = float128_sub(make_float128(ah, al),
> -                                make_float128(bh, bl),
> -                                &env->fpu_status);
> +    float128 ret = float128_sub(ARG128(a), ARG128(b), &env->fpu_status);
>      handle_exceptions(env, false, GETPC());
>      return RET128(ret);
>  }
> @@ -281,12 +280,9 @@ uint64_t HELPER(ddb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
>  }
>  
>  /* 128-bit FP division */
> -Int128 HELPER(dxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
> -                     uint64_t bh, uint64_t bl)
> +Int128 HELPER(dxb)(CPUS390XState *env, Int128 a, Int128 b)
>  {
> -    float128 ret = float128_div(make_float128(ah, al),
> -                                make_float128(bh, bl),
> -                                &env->fpu_status);
> +    float128 ret = float128_div(ARG128(a), ARG128(b), &env->fpu_status);
>      handle_exceptions(env, false, GETPC());
>      return RET128(ret);
>  }
> @@ -317,21 +313,18 @@ uint64_t HELPER(mdeb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
>  }
>  
>  /* 128-bit FP multiplication */
> -Int128 HELPER(mxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
> -                     uint64_t bh, uint64_t bl)
> +Int128 HELPER(mxb)(CPUS390XState *env, Int128 a, Int128 b)
>  {
> -    float128 ret = float128_mul(make_float128(ah, al),
> -                                make_float128(bh, bl),
> -                                &env->fpu_status);
> +    float128 ret = float128_mul(ARG128(a), ARG128(b), &env->fpu_status);
>      handle_exceptions(env, false, GETPC());
>      return RET128(ret);
>  }
>  
>  /* 128/64-bit FP multiplication */
> -Int128 HELPER(mxdb)(CPUS390XState *env, uint64_t ah, uint64_t al, uint64_t f2)
> +Int128 HELPER(mxdb)(CPUS390XState *env, Int128 a, uint64_t f2)
>  {
>      float128 ret = float64_to_float128(f2, &env->fpu_status);
> -    ret = float128_mul(make_float128(ah, al), ret, &env->fpu_status);
> +    ret = float128_mul(ARG128(a), ret, &env->fpu_status);
>      handle_exceptions(env, false, GETPC());
>      return RET128(ret);
>  }
> @@ -345,11 +338,10 @@ uint64_t HELPER(ldeb)(CPUS390XState *env, uint64_t f2)
>  }
>  
>  /* convert 128-bit float to 64-bit float */
> -uint64_t HELPER(ldxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
> -                      uint32_t m34)
> +uint64_t HELPER(ldxb)(CPUS390XState *env, Int128 a, uint32_t m34)
>  {
>      int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
> -    float64 ret = float128_to_float64(make_float128(ah, al), &env->fpu_status);
> +    float64 ret = float128_to_float64(ARG128(a), &env->fpu_status);
>  
>      s390_restore_bfp_rounding_mode(env, old_mode);
>      handle_exceptions(env, xxc_from_m34(m34), GETPC());
> @@ -384,11 +376,10 @@ uint64_t HELPER(ledb)(CPUS390XState *env, uint64_t f2, uint32_t m34)
>  }
>  
>  /* convert 128-bit float to 32-bit float */
> -uint64_t HELPER(lexb)(CPUS390XState *env, uint64_t ah, uint64_t al,
> -                      uint32_t m34)
> +uint64_t HELPER(lexb)(CPUS390XState *env, Int128 a, uint32_t m34)
>  {
>      int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
> -    float32 ret = float128_to_float32(make_float128(ah, al), &env->fpu_status);
> +    float32 ret = float128_to_float32(ARG128(a), &env->fpu_status);
>  
>      s390_restore_bfp_rounding_mode(env, old_mode);
>      handle_exceptions(env, xxc_from_m34(m34), GETPC());
> @@ -412,11 +403,9 @@ uint32_t HELPER(cdb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
>  }
>  
>  /* 128-bit FP compare */
> -uint32_t HELPER(cxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
> -                     uint64_t bh, uint64_t bl)
> +uint32_t HELPER(cxb)(CPUS390XState *env, Int128 a, Int128 b)
>  {
> -    FloatRelation cmp = float128_compare_quiet(make_float128(ah, al),
> -                                               make_float128(bh, bl),
> +    FloatRelation cmp = float128_compare_quiet(ARG128(a), ARG128(b),
>                                                 &env->fpu_status);
>      handle_exceptions(env, false, GETPC());
>      return float_comp_to_cc(env, cmp);
> @@ -564,10 +553,10 @@ uint64_t HELPER(cgdb)(CPUS390XState *env, uint64_t v2, uint32_t m34)
>  }
>  
>  /* convert 128-bit float to 64-bit int */
> -uint64_t HELPER(cgxb)(CPUS390XState *env, uint64_t h, uint64_t l, uint32_t m34)
> +uint64_t HELPER(cgxb)(CPUS390XState *env, Int128 i2, uint32_t m34)
>  {
>      int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
> -    float128 v2 = make_float128(h, l);
> +    float128 v2 = ARG128(i2);
>      int64_t ret = float128_to_int64(v2, &env->fpu_status);
>      uint32_t cc = set_cc_conv_f128(v2, &env->fpu_status);
>  
> @@ -613,10 +602,10 @@ uint64_t HELPER(cfdb)(CPUS390XState *env, uint64_t v2, uint32_t m34)
>  }
>  
>  /* convert 128-bit float to 32-bit int */
> -uint64_t HELPER(cfxb)(CPUS390XState *env, uint64_t h, uint64_t l, uint32_t m34)
> +uint64_t HELPER(cfxb)(CPUS390XState *env, Int128 i2, uint32_t m34)
>  {
>      int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
> -    float128 v2 = make_float128(h, l);
> +    float128 v2 = ARG128(i2);
>      int32_t ret = float128_to_int32(v2, &env->fpu_status);
>      uint32_t cc = set_cc_conv_f128(v2, &env->fpu_status);
>  
> @@ -662,10 +651,10 @@ uint64_t HELPER(clgdb)(CPUS390XState *env, uint64_t v2, uint32_t m34)
>  }
>  
>  /* convert 128-bit float to 64-bit uint */
> -uint64_t HELPER(clgxb)(CPUS390XState *env, uint64_t h, uint64_t l, uint32_t m34)
> +uint64_t HELPER(clgxb)(CPUS390XState *env, Int128 i2, uint32_t m34)
>  {
>      int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
> -    float128 v2 = make_float128(h, l);
> +    float128 v2 = ARG128(i2);
>      uint64_t ret = float128_to_uint64(v2, &env->fpu_status);
>      uint32_t cc = set_cc_conv_f128(v2, &env->fpu_status);
>  
> @@ -711,10 +700,10 @@ uint64_t HELPER(clfdb)(CPUS390XState *env, uint64_t v2, uint32_t m34)
>  }
>  
>  /* convert 128-bit float to 32-bit uint */
> -uint64_t HELPER(clfxb)(CPUS390XState *env, uint64_t h, uint64_t l, uint32_t m34)
> +uint64_t HELPER(clfxb)(CPUS390XState *env, Int128 i2, uint32_t m34)
>  {
>      int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
> -    float128 v2 = make_float128(h, l);
> +    float128 v2 = ARG128(i2);
>      uint32_t ret = float128_to_uint32(v2, &env->fpu_status);
>      uint32_t cc = set_cc_conv_f128(v2, &env->fpu_status);
>  
> @@ -750,11 +739,10 @@ uint64_t HELPER(fidb)(CPUS390XState *env, uint64_t f2, uint32_t m34)
>  }
>  
>  /* round to integer 128-bit */
> -Int128 HELPER(fixb)(CPUS390XState *env, uint64_t ah, uint64_t al, uint32_t m34)
> +Int128 HELPER(fixb)(CPUS390XState *env, Int128 a, uint32_t m34)
>  {
>      int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
> -    float128 ret = float128_round_to_int(make_float128(ah, al),
> -                                         &env->fpu_status);
> +    float128 ret = float128_round_to_int(ARG128(a), &env->fpu_status);
>  
>      s390_restore_bfp_rounding_mode(env, old_mode);
>      handle_exceptions(env, xxc_from_m34(m34), GETPC());
> @@ -778,11 +766,9 @@ uint32_t HELPER(kdb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
>  }
>  
>  /* 128-bit FP compare and signal */
> -uint32_t HELPER(kxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
> -                     uint64_t bh, uint64_t bl)
> +uint32_t HELPER(kxb)(CPUS390XState *env, Int128 a, Int128 b)
>  {
> -    FloatRelation cmp = float128_compare(make_float128(ah, al),
> -                                         make_float128(bh, bl),
> +    FloatRelation cmp = float128_compare(ARG128(a), ARG128(b),
>                                           &env->fpu_status);
>      handle_exceptions(env, false, GETPC());
>      return float_comp_to_cc(env, cmp);
> @@ -869,9 +855,9 @@ uint32_t HELPER(tcdb)(CPUS390XState *env, uint64_t v1, uint64_t m2)
>  }
>  
>  /* test data class 128-bit */
> -uint32_t HELPER(tcxb)(CPUS390XState *env, uint64_t ah, uint64_t al, uint64_t m2)
> +uint32_t HELPER(tcxb)(CPUS390XState *env, Int128 a, uint64_t m2)
>  {
> -    return (m2 & float128_dcmask(env, make_float128(ah, al))) != 0;
> +    return (m2 & float128_dcmask(env, ARG128(a))) != 0;
>  }
>  
>  /* square root 32-bit */
> @@ -891,9 +877,9 @@ uint64_t HELPER(sqdb)(CPUS390XState *env, uint64_t f2)
>  }
>  
>  /* square root 128-bit */
> -Int128 HELPER(sqxb)(CPUS390XState *env, uint64_t ah, uint64_t al)
> +Int128 HELPER(sqxb)(CPUS390XState *env, Int128 a)
>  {
> -    float128 ret = float128_sqrt(make_float128(ah, al), &env->fpu_status);
> +    float128 ret = float128_sqrt(ARG128(a), &env->fpu_status);
>      handle_exceptions(env, false, GETPC());
>      return RET128(ret);
>  }
> diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
> index d1ffbb8710..8023bbab2f 100644
> --- a/target/s390x/tcg/translate.c
> +++ b/target/s390x/tcg/translate.c
> @@ -305,6 +305,18 @@ static TCGv_i64 load_freg32_i64(int reg)
>      return r;
>  }
>  
> +static TCGv_i128 load_freg_128(int reg)
> +{
> +    TCGv_i64 h = load_freg(reg);
> +    TCGv_i64 l = load_freg(reg + 2);
> +    TCGv_i128 r = tcg_temp_new_i128();
> +
> +    tcg_gen_concat_i64_i128(r, l, h);
> +    tcg_temp_free_i64(h);
> +    tcg_temp_free_i64(l);
> +    return r;
> +}
> +
>  static void store_reg(int reg, TCGv_i64 v)
>  {
>      tcg_gen_mov_i64(regs[reg], v);
> @@ -1103,7 +1115,7 @@ typedef struct {
>      bool g_out, g_out2, g_in1, g_in2;
>      TCGv_i64 out, out2, in1, in2;
>      TCGv_i64 addr1;
> -    TCGv_i128 out_128;
> +    TCGv_i128 out_128, in1_128, in2_128;
>  } DisasOps;
>  
>  /* Instructions can place constraints on their operands, raising specification
> @@ -1462,7 +1474,7 @@ static DisasJumpType op_adb(DisasContext *s, DisasOps *o)
>  
>  static DisasJumpType op_axb(DisasContext *s, DisasOps *o)
>  {
> -    gen_helper_axb(o->out_128, cpu_env, o->out, o->out2, o->in1, o->in2);
> +    gen_helper_axb(o->out_128, cpu_env, o->in1_128, o->in2_128);
>      return DISAS_NEXT;
>  }
>  
> @@ -1778,7 +1790,7 @@ static DisasJumpType op_cdb(DisasContext *s, DisasOps *o)
>  
>  static DisasJumpType op_cxb(DisasContext *s, DisasOps *o)
>  {
> -    gen_helper_cxb(cc_op, cpu_env, o->out, o->out2, o->in1, o->in2);
> +    gen_helper_cxb(cc_op, cpu_env, o->in1_128, o->in2_128);
>      set_cc_static(s);
>      return DISAS_NEXT;
>  }
> @@ -1841,7 +1853,7 @@ static DisasJumpType op_cfxb(DisasContext *s, DisasOps *o)
>      if (!m34) {
>          return DISAS_NORETURN;
>      }
> -    gen_helper_cfxb(o->out, cpu_env, o->in1, o->in2, m34);
> +    gen_helper_cfxb(o->out, cpu_env, o->in2_128, m34);
>      tcg_temp_free_i32(m34);
>      set_cc_static(s);
>      return DISAS_NEXT;
> @@ -1880,7 +1892,7 @@ static DisasJumpType op_cgxb(DisasContext *s, DisasOps *o)
>      if (!m34) {
>          return DISAS_NORETURN;
>      }
> -    gen_helper_cgxb(o->out, cpu_env, o->in1, o->in2, m34);
> +    gen_helper_cgxb(o->out, cpu_env, o->in2_128, m34);
>      tcg_temp_free_i32(m34);
>      set_cc_static(s);
>      return DISAS_NEXT;
> @@ -1919,7 +1931,7 @@ static DisasJumpType op_clfxb(DisasContext *s, DisasOps *o)
>      if (!m34) {
>          return DISAS_NORETURN;
>      }
> -    gen_helper_clfxb(o->out, cpu_env, o->in1, o->in2, m34);
> +    gen_helper_clfxb(o->out, cpu_env, o->in2_128, m34);
>      tcg_temp_free_i32(m34);
>      set_cc_static(s);
>      return DISAS_NEXT;
> @@ -1958,7 +1970,7 @@ static DisasJumpType op_clgxb(DisasContext *s, DisasOps *o)
>      if (!m34) {
>          return DISAS_NORETURN;
>      }
> -    gen_helper_clgxb(o->out, cpu_env, o->in1, o->in2, m34);
> +    gen_helper_clgxb(o->out, cpu_env, o->in2_128, m34);
>      tcg_temp_free_i32(m34);
>      set_cc_static(s);
>      return DISAS_NEXT;
> @@ -2448,7 +2460,7 @@ static DisasJumpType op_ddb(DisasContext *s, DisasOps *o)
>  
>  static DisasJumpType op_dxb(DisasContext *s, DisasOps *o)
>  {
> -    gen_helper_dxb(o->out_128, cpu_env, o->out, o->out2, o->in1, o->in2);
> +    gen_helper_dxb(o->out_128, cpu_env, o->in1_128, o->in2_128);
>      return DISAS_NEXT;
>  }
>  
> @@ -2553,7 +2565,7 @@ static DisasJumpType op_fixb(DisasContext *s, DisasOps *o)
>      if (!m34) {
>          return DISAS_NORETURN;
>      }
> -    gen_helper_fixb(o->out_128, cpu_env, o->in1, o->in2, m34);
> +    gen_helper_fixb(o->out_128, cpu_env, o->in2_128, m34);
>      tcg_temp_free_i32(m34);
>      return DISAS_NEXT;
>  }
> @@ -2772,7 +2784,7 @@ static DisasJumpType op_kdb(DisasContext *s, DisasOps *o)
>  
>  static DisasJumpType op_kxb(DisasContext *s, DisasOps *o)
>  {
> -    gen_helper_kxb(cc_op, cpu_env, o->out, o->out2, o->in1, o->in2);
> +    gen_helper_kxb(cc_op, cpu_env, o->in1_128, o->in2_128);
>      set_cc_static(s);
>      return DISAS_NEXT;
>  }
> @@ -2846,7 +2858,7 @@ static DisasJumpType op_ldxb(DisasContext *s, DisasOps *o)
>      if (!m34) {
>          return DISAS_NORETURN;
>      }
> -    gen_helper_ldxb(o->out, cpu_env, o->in1, o->in2, m34);
> +    gen_helper_ldxb(o->out, cpu_env, o->in2_128, m34);
>      tcg_temp_free_i32(m34);
>      return DISAS_NEXT;
>  }
> @@ -2858,7 +2870,7 @@ static DisasJumpType op_lexb(DisasContext *s, DisasOps *o)
>      if (!m34) {
>          return DISAS_NORETURN;
>      }
> -    gen_helper_lexb(o->out, cpu_env, o->in1, o->in2, m34);
> +    gen_helper_lexb(o->out, cpu_env, o->in2_128, m34);
>      tcg_temp_free_i32(m34);
>      return DISAS_NEXT;
>  }
> @@ -3586,13 +3598,13 @@ static DisasJumpType op_mdb(DisasContext *s, DisasOps *o)
>  
>  static DisasJumpType op_mxb(DisasContext *s, DisasOps *o)
>  {
> -    gen_helper_mxb(o->out_128, cpu_env, o->out, o->out2, o->in1, o->in2);
> +    gen_helper_mxb(o->out_128, cpu_env, o->in1_128, o->in2_128);
>      return DISAS_NEXT;
>  }
>  
>  static DisasJumpType op_mxdb(DisasContext *s, DisasOps *o)
>  {
> -    gen_helper_mxdb(o->out_128, cpu_env, o->out, o->out2, o->in2);
> +    gen_helper_mxdb(o->out_128, cpu_env, o->in1_128, o->in2);
>      return DISAS_NEXT;
>  }
>  
> @@ -4057,7 +4069,7 @@ static DisasJumpType op_sdb(DisasContext *s, DisasOps *o)
>  
>  static DisasJumpType op_sxb(DisasContext *s, DisasOps *o)
>  {
> -    gen_helper_sxb(o->out_128, cpu_env, o->out, o->out2, o->in1, o->in2);
> +    gen_helper_sxb(o->out_128, cpu_env, o->in1_128, o->in2_128);
>      return DISAS_NEXT;
>  }
>  
> @@ -4075,7 +4087,7 @@ static DisasJumpType op_sqdb(DisasContext *s, DisasOps *o)
>  
>  static DisasJumpType op_sqxb(DisasContext *s, DisasOps *o)
>  {
> -    gen_helper_sqxb(o->out_128, cpu_env, o->in1, o->in2);
> +    gen_helper_sqxb(o->out_128, cpu_env, o->in2_128);
>      return DISAS_NEXT;
>  }
>  
> @@ -4854,7 +4866,7 @@ static DisasJumpType op_tcdb(DisasContext *s, DisasOps *o)
>  
>  static DisasJumpType op_tcxb(DisasContext *s, DisasOps *o)
>  {
> -    gen_helper_tcxb(cc_op, cpu_env, o->out, o->out2, o->in2);
> +    gen_helper_tcxb(cc_op, cpu_env, o->in1_128, o->in2);
>      set_cc_static(s);
>      return DISAS_NEXT;
>  }
> @@ -5389,8 +5401,6 @@ static void prep_new_P(DisasContext *s, DisasOps *o)
>  
>  static void prep_new_x(DisasContext *s, DisasOps *o)
>  {
> -    o->out = tcg_temp_new_i64();
> -    o->out2 = tcg_temp_new_i64();
>      o->out_128 = tcg_temp_new_i128();
>  }
>  #define SPEC_prep_new_x 0
> @@ -5413,10 +5423,7 @@ static void prep_r1_P(DisasContext *s, DisasOps *o)
>  
>  static void prep_x1(DisasContext *s, DisasOps *o)
>  {
> -    o->out = load_freg(get_field(s, r1));
> -    o->out2 = load_freg(get_field(s, r1) + 2);
> -    o->out_128 = tcg_temp_new_i128();
> -    tcg_gen_concat_i64_i128(o->out_128, o->out2, o->out);
> +    o->out_128 = load_freg_128(get_field(s, r1));
>  }
>  #define SPEC_prep_x1 SPEC_r1_f128
>  
> @@ -5515,6 +5522,11 @@ static void wout_x1(DisasContext *s, DisasOps *o)
>  {
>      int f1 = get_field(s, r1);
>  
> +    /* Split out_128 into out+out2 for cout_f128. */
> +    tcg_debug_assert(o->out == NULL);
> +    o->out = tcg_temp_new_i64();
> +    o->out2 = tcg_temp_new_i64();
> +
>      tcg_gen_extr_i128_i64(o->out2, o->out, o->out_128);
>      store_freg(f1, o->out);
>      store_freg(f1 + 2, o->out2);
> @@ -5757,6 +5769,12 @@ static void in1_f1(DisasContext *s, DisasOps *o)
>  }
>  #define SPEC_in1_f1 0
>  
> +static void in1_x1(DisasContext *s, DisasOps *o)
> +{
> +    o->in1_128 = load_freg_128(get_field(s, r1));
> +}
> +#define SPEC_in1_x1 SPEC_r2_f128
> +
>  /* Load the high double word of an extended (128-bit) format FP number */
>  static void in1_x2h(DisasContext *s, DisasOps *o)
>  {
> @@ -5966,6 +5984,12 @@ static void in2_f2(DisasContext *s, DisasOps *o)
>  }
>  #define SPEC_in2_f2 0
>  
> +static void in2_x2(DisasContext *s, DisasOps *o)
> +{
> +    o->in2_128 = load_freg_128(get_field(s, r2));
> +}
> +#define SPEC_in2_x2 SPEC_r2_f128
> +
>  /* Load the low double word of an extended (128-bit) format FP number */
>  static void in2_x2l(DisasContext *s, DisasOps *o)
>  {
> @@ -6588,6 +6612,12 @@ static DisasJumpType translate_one(CPUS390XState *env, DisasContext *s)
>      if (o.out_128) {
>          tcg_temp_free_i128(o.out_128);
>      }
> +    if (o.in1_128) {
> +        tcg_temp_free_i128(o.in1_128);
> +    }
> +    if (o.in2_128) {
> +        tcg_temp_free_i128(o.in2_128);
> +    }
>      /* io should be the last instruction in tb when icount is enabled */
>      if (unlikely(icount && ret == DISAS_NEXT)) {
>          ret = DISAS_TOO_MANY;
> diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
> index 20bf20c766..26523746d6 100644
> --- a/target/s390x/tcg/insn-data.def
> +++ b/target/s390x/tcg/insn-data.def
> @@ -34,7 +34,7 @@
>      C(0xe318, AGF,     RXY_a, Z,   r1, m2_32s, r1, 0, add, adds64)
>      F(0xb30a, AEBR,    RRE,   Z,   e1, e2, new, e1, aeb, f32, IF_BFP)
>      F(0xb31a, ADBR,    RRE,   Z,   f1, f2, new, f1, adb, f64, IF_BFP)
> -    F(0xb34a, AXBR,    RRE,   Z,   x2h, x2l, x1, x1, axb, f128, IF_BFP)
> +    F(0xb34a, AXBR,    RRE,   Z,   x1, x2, new_x, x1, axb, f128, IF_BFP)
>      F(0xed0a, AEB,     RXE,   Z,   e1, m2_32u, new, e1, aeb, f32, IF_BFP)
>      F(0xed1a, ADB,     RXE,   Z,   f1, m2_64, new, f1, adb, f64, IF_BFP)
>  /* ADD HIGH */
> @@ -172,13 +172,13 @@
>      C(0xe330, CGF,     RXY_a, Z,   r1_o, m2_32s, 0, 0, 0, cmps64)
>      F(0xb309, CEBR,    RRE,   Z,   e1, e2, 0, 0, ceb, 0, IF_BFP)
>      F(0xb319, CDBR,    RRE,   Z,   f1, f2, 0, 0, cdb, 0, IF_BFP)
> -    F(0xb349, CXBR,    RRE,   Z,   x2h, x2l, x1, 0, cxb, 0, IF_BFP)
> +    F(0xb349, CXBR,    RRE,   Z,   x1, x2, 0, 0, cxb, 0, IF_BFP)
>      F(0xed09, CEB,     RXE,   Z,   e1, m2_32u, 0, 0, ceb, 0, IF_BFP)
>      F(0xed19, CDB,     RXE,   Z,   f1, m2_64, 0, 0, cdb, 0, IF_BFP)
>  /* COMPARE AND SIGNAL */
>      F(0xb308, KEBR,    RRE,   Z,   e1, e2, 0, 0, keb, 0, IF_BFP)
>      F(0xb318, KDBR,    RRE,   Z,   f1, f2, 0, 0, kdb, 0, IF_BFP)
> -    F(0xb348, KXBR,    RRE,   Z,   x2h, x2l, x1, 0, kxb, 0, IF_BFP)
> +    F(0xb348, KXBR,    RRE,   Z,   x1, x2, 0, 0, kxb, 0, IF_BFP)
>      F(0xed08, KEB,     RXE,   Z,   e1, m2_32u, 0, 0, keb, 0, IF_BFP)
>      F(0xed18, KDB,     RXE,   Z,   f1, m2_64, 0, 0, kdb, 0, IF_BFP)
>  /* COMPARE IMMEDIATE */
> @@ -299,10 +299,10 @@
>  /* CONVERT TO FIXED */
>      F(0xb398, CFEBR,   RRF_e, Z,   0, e2, new, r1_32, cfeb, 0, IF_BFP)
>      F(0xb399, CFDBR,   RRF_e, Z,   0, f2, new, r1_32, cfdb, 0, IF_BFP)
> -    F(0xb39a, CFXBR,   RRF_e, Z,   x2h, x2l, new, r1_32, cfxb, 0, IF_BFP)
> +    F(0xb39a, CFXBR,   RRF_e, Z,   0, x2, new, r1_32, cfxb, 0, IF_BFP)
>      F(0xb3a8, CGEBR,   RRF_e, Z,   0, e2, r1, 0, cgeb, 0, IF_BFP)
>      F(0xb3a9, CGDBR,   RRF_e, Z,   0, f2, r1, 0, cgdb, 0, IF_BFP)
> -    F(0xb3aa, CGXBR,   RRF_e, Z,   x2h, x2l, r1, 0, cgxb, 0, IF_BFP)
> +    F(0xb3aa, CGXBR,   RRF_e, Z,   0, x2, r1, 0, cgxb, 0, IF_BFP)
>  /* CONVERT FROM FIXED */
>      F(0xb394, CEFBR,   RRF_e, Z,   0, r2_32s, new, e1, cegb, 0, IF_BFP)
>      F(0xb395, CDFBR,   RRF_e, Z,   0, r2_32s, new, f1, cdgb, 0, IF_BFP)
> @@ -313,10 +313,10 @@
>  /* CONVERT TO LOGICAL */
>      F(0xb39c, CLFEBR,  RRF_e, FPE, 0, e2, new, r1_32, clfeb, 0, IF_BFP)
>      F(0xb39d, CLFDBR,  RRF_e, FPE, 0, f2, new, r1_32, clfdb, 0, IF_BFP)
> -    F(0xb39e, CLFXBR,  RRF_e, FPE, x2h, x2l, new, r1_32, clfxb, 0, IF_BFP)
> +    F(0xb39e, CLFXBR,  RRF_e, FPE, 0, x2, new, r1_32, clfxb, 0, IF_BFP)
>      F(0xb3ac, CLGEBR,  RRF_e, FPE, 0, e2, r1, 0, clgeb, 0, IF_BFP)
>      F(0xb3ad, CLGDBR,  RRF_e, FPE, 0, f2, r1, 0, clgdb, 0, IF_BFP)
> -    F(0xb3ae, CLGXBR,  RRF_e, FPE, x2h, x2l, r1, 0, clgxb, 0, IF_BFP)
> +    F(0xb3ae, CLGXBR,  RRF_e, FPE, 0, x2, r1, 0, clgxb, 0, IF_BFP)
>  /* CONVERT FROM LOGICAL */
>      F(0xb390, CELFBR,  RRF_e, FPE, 0, r2_32u, new, e1, celgb, 0, IF_BFP)
>      F(0xb391, CDLFBR,  RRF_e, FPE, 0, r2_32u, new, f1, cdlgb, 0, IF_BFP)
> @@ -343,7 +343,7 @@
>      C(0x5d00, D,       RX_a,  Z,   r1_D32, m2_32s, new_P, r1_P32, divs32, 0)
>      F(0xb30d, DEBR,    RRE,   Z,   e1, e2, new, e1, deb, 0, IF_BFP)
>      F(0xb31d, DDBR,    RRE,   Z,   f1, f2, new, f1, ddb, 0, IF_BFP)
> -    F(0xb34d, DXBR,    RRE,   Z,   x2h, x2l, x1, x1, dxb, 0, IF_BFP)
> +    F(0xb34d, DXBR,    RRE,   Z,   x1, x2, new_x, x1, dxb, 0, IF_BFP)
>      F(0xed0d, DEB,     RXE,   Z,   e1, m2_32u, new, e1, deb, 0, IF_BFP)
>      F(0xed1d, DDB,     RXE,   Z,   f1, m2_64, new, f1, ddb, 0, IF_BFP)
>  /* DIVIDE LOGICAL */
> @@ -597,7 +597,7 @@
>  /* LOAD FP INTEGER */
>      F(0xb357, FIEBR,   RRF_e, Z,   0, e2, new, e1, fieb, 0, IF_BFP)
>      F(0xb35f, FIDBR,   RRF_e, Z,   0, f2, new, f1, fidb, 0, IF_BFP)
> -    F(0xb347, FIXBR,   RRF_e, Z,   x2h, x2l, new_x, x1, fixb, 0, IF_BFP)
> +    F(0xb347, FIXBR,   RRF_e, Z,   0, x2, new_x, x1, fixb, 0, IF_BFP)
>  
>  /* LOAD LENGTHENED */
>      F(0xb304, LDEBR,   RRE,   Z,   0, e2, new, f1, ldeb, 0, IF_BFP)
> @@ -610,8 +610,8 @@
>      F(0xed24, LDE,     RXE,   Z,   0, m2_32u, new, f1, lde, 0, IF_AFP1)
>  /* LOAD ROUNDED */
>      F(0xb344, LEDBR,   RRF_e, Z,   0, f2, new, e1, ledb, 0, IF_BFP)
> -    F(0xb345, LDXBR,   RRF_e, Z,   x2h, x2l, new, f1, ldxb, 0, IF_BFP)
> -    F(0xb346, LEXBR,   RRF_e, Z,   x2h, x2l, new, e1, lexb, 0, IF_BFP)
> +    F(0xb345, LDXBR,   RRF_e, Z,   0, x2, new, f1, ldxb, 0, IF_BFP)
> +    F(0xb346, LEXBR,   RRF_e, Z,   0, x2, new, e1, lexb, 0, IF_BFP)
>  
>  /* LOAD MULTIPLE */
>      C(0x9800, LM,      RS_a,  Z,   0, a2, 0, 0, lm32, 0)
> @@ -666,7 +666,7 @@
>      C(0xe384, MG,      RXY_a, MIE2,r1p1_o, m2_64, r1_P, 0, muls128, 0)
>      F(0xb317, MEEBR,   RRE,   Z,   e1, e2, new, e1, meeb, 0, IF_BFP)
>      F(0xb31c, MDBR,    RRE,   Z,   f1, f2, new, f1, mdb, 0, IF_BFP)
> -    F(0xb34c, MXBR,    RRE,   Z,   x2h, x2l, x1, x1, mxb, 0, IF_BFP)
> +    F(0xb34c, MXBR,    RRE,   Z,   x1, x2, new_x, x1, mxb, 0, IF_BFP)
>      F(0xb30c, MDEBR,   RRE,   Z,   f1, e2, new, f1, mdeb, 0, IF_BFP)
>      F(0xb307, MXDBR,   RRE,   Z,   0, f2, x1, x1, mxdb, 0, IF_BFP)
>      F(0xed17, MEEB,    RXE,   Z,   e1, m2_32u, new, e1, meeb, 0, IF_BFP)
> @@ -835,7 +835,7 @@
>  /* SQUARE ROOT */
>      F(0xb314, SQEBR,   RRE,   Z,   0, e2, new, e1, sqeb, 0, IF_BFP)
>      F(0xb315, SQDBR,   RRE,   Z,   0, f2, new, f1, sqdb, 0, IF_BFP)
> -    F(0xb316, SQXBR,   RRE,   Z,   x2h, x2l, new_x, x1, sqxb, 0, IF_BFP)
> +    F(0xb316, SQXBR,   RRE,   Z,   0, x2, new_x, x1, sqxb, 0, IF_BFP)
>      F(0xed14, SQEB,    RXE,   Z,   0, m2_32u, new, e1, sqeb, 0, IF_BFP)
>      F(0xed15, SQDB,    RXE,   Z,   0, m2_64, new, f1, sqdb, 0, IF_BFP)
>  
> @@ -913,7 +913,7 @@
>      C(0xe319, SGF,     RXY_a, Z,   r1, m2_32s, r1, 0, sub, subs64)
>      F(0xb30b, SEBR,    RRE,   Z,   e1, e2, new, e1, seb, f32, IF_BFP)
>      F(0xb31b, SDBR,    RRE,   Z,   f1, f2, new, f1, sdb, f64, IF_BFP)
> -    F(0xb34b, SXBR,    RRE,   Z,   x2h, x2l, x1, x1, sxb, f128, IF_BFP)
> +    F(0xb34b, SXBR,    RRE,   Z,   x1, x2, new_x, x1, sxb, f128, IF_BFP)
>      F(0xed0b, SEB,     RXE,   Z,   e1, m2_32u, new, e1, seb, f32, IF_BFP)
>      F(0xed1b, SDB,     RXE,   Z,   f1, m2_64, new, f1, sdb, f64, IF_BFP)
>  /* SUBTRACT HALFWORD */
> @@ -957,7 +957,7 @@
>  /* TEST DATA CLASS */
>      F(0xed10, TCEB,    RXE,   Z,   e1, a2, 0, 0, tceb, 0, IF_BFP)
>      F(0xed11, TCDB,    RXE,   Z,   f1, a2, 0, 0, tcdb, 0, IF_BFP)
> -    F(0xed12, TCXB,    RXE,   Z,   0, a2, x1, 0, tcxb, 0, IF_BFP)
> +    F(0xed12, TCXB,    RXE,   Z,   x1, a2, 0, 0, tcxb, 0, IF_BFP)
>  
>  /* TEST DECIMAL */
>      C(0xebc0, TP,      RSL,   E2,  la1, 0, 0, 0, tp, 0)
> -- 
> 2.34.1
> 
> 

Hi,

I ran valgrind's testsuite with this patch, and their bpf-4 test
triggered an assertion in the

    (insn->spec & SPEC_r2_f128 && !is_fp_pair(get_field(s, r2)))

condition. The following fixup helped:


--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -5771,14 +5771,14 @@ static void in1_x1(DisasContext *s, DisasOps *o)
 {
     o->in1_128 = load_freg_128(get_field(s, r1));
 }
-#define SPEC_in1_x1 SPEC_r2_f128
+#define SPEC_in1_x1 SPEC_r1_f128
 
 /* Load the high double word of an extended (128-bit) format FP number */
 static void in1_x2h(DisasContext *s, DisasOps *o)
 {
     o->in1 = load_freg(get_field(s, r2));
 }
-#define SPEC_in1_x2h SPEC_r2_f128
+#define SPEC_in1_x2h SPEC_r1_f128
 
 static void in1_f3(DisasContext *s, DisasOps *o)
 {


Best regards,
Ilya


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 9/9] target/s390x: Use Int128 for passing float128
  2022-11-02  9:38   ` Ilya Leoshkevich
@ 2022-11-02  9:47     ` Richard Henderson
  2022-11-02 10:05       ` Ilya Leoshkevich
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2022-11-02  9:47 UTC (permalink / raw)
  To: Ilya Leoshkevich, qemu-devel; +Cc: qemu-s390x

On 11/2/22 20:38, Ilya Leoshkevich wrote:
> On Fri, Oct 21, 2022 at 05:30:06PM +1000, Richard Henderson wrote:
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   target/s390x/helper.h          | 32 ++++++-------
>>   target/s390x/tcg/fpu_helper.c  | 88 ++++++++++++++--------------------
>>   target/s390x/tcg/translate.c   | 76 ++++++++++++++++++++---------
>>   target/s390x/tcg/insn-data.def | 30 ++++++------
>>   4 files changed, 121 insertions(+), 105 deletions(-)
>>
>> diff --git a/target/s390x/helper.h b/target/s390x/helper.h
>> index 429131a85e..481b9019f9 100644
>> --- a/target/s390x/helper.h
>> +++ b/target/s390x/helper.h
>> @@ -41,55 +41,55 @@ DEF_HELPER_4(csst, i32, env, i32, i64, i64)
>>   DEF_HELPER_4(csst_parallel, i32, env, i32, i64, i64)
>>   DEF_HELPER_FLAGS_3(aeb, TCG_CALL_NO_WG, i64, env, i64, i64)
>>   DEF_HELPER_FLAGS_3(adb, TCG_CALL_NO_WG, i64, env, i64, i64)
>> -DEF_HELPER_FLAGS_5(axb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
>> +DEF_HELPER_FLAGS_3(axb, TCG_CALL_NO_WG, i128, env, i128, i128)
>>   DEF_HELPER_FLAGS_3(seb, TCG_CALL_NO_WG, i64, env, i64, i64)
>>   DEF_HELPER_FLAGS_3(sdb, TCG_CALL_NO_WG, i64, env, i64, i64)
>> -DEF_HELPER_FLAGS_5(sxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
>> +DEF_HELPER_FLAGS_3(sxb, TCG_CALL_NO_WG, i128, env, i128, i128)
>>   DEF_HELPER_FLAGS_3(deb, TCG_CALL_NO_WG, i64, env, i64, i64)
>>   DEF_HELPER_FLAGS_3(ddb, TCG_CALL_NO_WG, i64, env, i64, i64)
>> -DEF_HELPER_FLAGS_5(dxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
>> +DEF_HELPER_FLAGS_3(dxb, TCG_CALL_NO_WG, i128, env, i128, i128)
>>   DEF_HELPER_FLAGS_3(meeb, TCG_CALL_NO_WG, i64, env, i64, i64)
>>   DEF_HELPER_FLAGS_3(mdeb, TCG_CALL_NO_WG, i64, env, i64, i64)
>>   DEF_HELPER_FLAGS_3(mdb, TCG_CALL_NO_WG, i64, env, i64, i64)
>> -DEF_HELPER_FLAGS_5(mxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
>> -DEF_HELPER_FLAGS_4(mxdb, TCG_CALL_NO_WG, i128, env, i64, i64, i64)
>> +DEF_HELPER_FLAGS_3(mxb, TCG_CALL_NO_WG, i128, env, i128, i128)
>> +DEF_HELPER_FLAGS_3(mxdb, TCG_CALL_NO_WG, i128, env, i128, i64)
>>   DEF_HELPER_FLAGS_2(ldeb, TCG_CALL_NO_WG, i64, env, i64)
>> -DEF_HELPER_FLAGS_4(ldxb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
>> +DEF_HELPER_FLAGS_3(ldxb, TCG_CALL_NO_WG, i64, env, i128, i32)
>>   DEF_HELPER_FLAGS_2(lxdb, TCG_CALL_NO_WG, i128, env, i64)
>>   DEF_HELPER_FLAGS_2(lxeb, TCG_CALL_NO_WG, i128, env, i64)
>>   DEF_HELPER_FLAGS_3(ledb, TCG_CALL_NO_WG, i64, env, i64, i32)
>> -DEF_HELPER_FLAGS_4(lexb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
>> +DEF_HELPER_FLAGS_3(lexb, TCG_CALL_NO_WG, i64, env, i128, i32)
>>   DEF_HELPER_FLAGS_3(ceb, TCG_CALL_NO_WG_SE, i32, env, i64, i64)
>>   DEF_HELPER_FLAGS_3(cdb, TCG_CALL_NO_WG_SE, i32, env, i64, i64)
>> -DEF_HELPER_FLAGS_5(cxb, TCG_CALL_NO_WG_SE, i32, env, i64, i64, i64, i64)
>> +DEF_HELPER_FLAGS_3(cxb, TCG_CALL_NO_WG_SE, i32, env, i128, i128)
>>   DEF_HELPER_FLAGS_3(keb, TCG_CALL_NO_WG, i32, env, i64, i64)
>>   DEF_HELPER_FLAGS_3(kdb, TCG_CALL_NO_WG, i32, env, i64, i64)
>> -DEF_HELPER_FLAGS_5(kxb, TCG_CALL_NO_WG, i32, env, i64, i64, i64, i64)
>> +DEF_HELPER_FLAGS_3(kxb, TCG_CALL_NO_WG, i32, env, i128, i128)
>>   DEF_HELPER_3(cgeb, i64, env, i64, i32)
>>   DEF_HELPER_3(cgdb, i64, env, i64, i32)
>> -DEF_HELPER_4(cgxb, i64, env, i64, i64, i32)
>> +DEF_HELPER_3(cgxb, i64, env, i128, i32)
>>   DEF_HELPER_3(cfeb, i64, env, i64, i32)
>>   DEF_HELPER_3(cfdb, i64, env, i64, i32)
>> -DEF_HELPER_4(cfxb, i64, env, i64, i64, i32)
>> +DEF_HELPER_3(cfxb, i64, env, i128, i32)
>>   DEF_HELPER_3(clgeb, i64, env, i64, i32)
>>   DEF_HELPER_3(clgdb, i64, env, i64, i32)
>> -DEF_HELPER_4(clgxb, i64, env, i64, i64, i32)
>> +DEF_HELPER_3(clgxb, i64, env, i128, i32)
>>   DEF_HELPER_3(clfeb, i64, env, i64, i32)
>>   DEF_HELPER_3(clfdb, i64, env, i64, i32)
>> -DEF_HELPER_4(clfxb, i64, env, i64, i64, i32)
>> +DEF_HELPER_3(clfxb, i64, env, i128, i32)
>>   DEF_HELPER_FLAGS_3(fieb, TCG_CALL_NO_WG, i64, env, i64, i32)
>>   DEF_HELPER_FLAGS_3(fidb, TCG_CALL_NO_WG, i64, env, i64, i32)
>> -DEF_HELPER_FLAGS_4(fixb, TCG_CALL_NO_WG, i128, env, i64, i64, i32)
>> +DEF_HELPER_FLAGS_3(fixb, TCG_CALL_NO_WG, i128, env, i128, i32)
>>   DEF_HELPER_FLAGS_4(maeb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
>>   DEF_HELPER_FLAGS_4(madb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
>>   DEF_HELPER_FLAGS_4(mseb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
>>   DEF_HELPER_FLAGS_4(msdb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
>>   DEF_HELPER_FLAGS_3(tceb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
>>   DEF_HELPER_FLAGS_3(tcdb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
>> -DEF_HELPER_FLAGS_4(tcxb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64, i64)
>> +DEF_HELPER_FLAGS_3(tcxb, TCG_CALL_NO_RWG_SE, i32, env, i128, i64)
>>   DEF_HELPER_FLAGS_2(sqeb, TCG_CALL_NO_WG, i64, env, i64)
>>   DEF_HELPER_FLAGS_2(sqdb, TCG_CALL_NO_WG, i64, env, i64)
>> -DEF_HELPER_FLAGS_3(sqxb, TCG_CALL_NO_WG, i128, env, i64, i64)
>> +DEF_HELPER_FLAGS_2(sqxb, TCG_CALL_NO_WG, i128, env, i128)
>>   DEF_HELPER_FLAGS_1(cvd, TCG_CALL_NO_RWG_SE, i64, s32)
>>   DEF_HELPER_FLAGS_4(pack, TCG_CALL_NO_WG, void, env, i32, i64, i64)
>>   DEF_HELPER_FLAGS_4(pka, TCG_CALL_NO_WG, void, env, i64, i64, i32)
>> diff --git a/target/s390x/tcg/fpu_helper.c b/target/s390x/tcg/fpu_helper.c
>> index a584794be6..5a322e3f87 100644
>> --- a/target/s390x/tcg/fpu_helper.c
>> +++ b/target/s390x/tcg/fpu_helper.c
>> @@ -39,6 +39,11 @@ static inline Int128 RET128(float128 f)
>>       return int128_make128(f.low, f.high);
>>   }
>>   
>> +static inline float128 ARG128(Int128 i)
>> +{
>> +    return make_float128(int128_gethi(i), int128_getlo(i));
>> +}
>> +
>>   uint8_t s390_softfloat_exc_to_ieee(unsigned int exc)
>>   {
>>       uint8_t s390_exc = 0;
>> @@ -227,12 +232,9 @@ uint64_t HELPER(adb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
>>   }
>>   
>>   /* 128-bit FP addition */
>> -Int128 HELPER(axb)(CPUS390XState *env, uint64_t ah, uint64_t al,
>> -                     uint64_t bh, uint64_t bl)
>> +Int128 HELPER(axb)(CPUS390XState *env, Int128 a, Int128 b)
>>   {
>> -    float128 ret = float128_add(make_float128(ah, al),
>> -                                make_float128(bh, bl),
>> -                                &env->fpu_status);
>> +    float128 ret = float128_add(ARG128(a), ARG128(b), &env->fpu_status);
>>       handle_exceptions(env, false, GETPC());
>>       return RET128(ret);
>>   }
>> @@ -254,12 +256,9 @@ uint64_t HELPER(sdb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
>>   }
>>   
>>   /* 128-bit FP subtraction */
>> -Int128 HELPER(sxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
>> -                     uint64_t bh, uint64_t bl)
>> +Int128 HELPER(sxb)(CPUS390XState *env, Int128 a, Int128 b)
>>   {
>> -    float128 ret = float128_sub(make_float128(ah, al),
>> -                                make_float128(bh, bl),
>> -                                &env->fpu_status);
>> +    float128 ret = float128_sub(ARG128(a), ARG128(b), &env->fpu_status);
>>       handle_exceptions(env, false, GETPC());
>>       return RET128(ret);
>>   }
>> @@ -281,12 +280,9 @@ uint64_t HELPER(ddb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
>>   }
>>   
>>   /* 128-bit FP division */
>> -Int128 HELPER(dxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
>> -                     uint64_t bh, uint64_t bl)
>> +Int128 HELPER(dxb)(CPUS390XState *env, Int128 a, Int128 b)
>>   {
>> -    float128 ret = float128_div(make_float128(ah, al),
>> -                                make_float128(bh, bl),
>> -                                &env->fpu_status);
>> +    float128 ret = float128_div(ARG128(a), ARG128(b), &env->fpu_status);
>>       handle_exceptions(env, false, GETPC());
>>       return RET128(ret);
>>   }
>> @@ -317,21 +313,18 @@ uint64_t HELPER(mdeb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
>>   }
>>   
>>   /* 128-bit FP multiplication */
>> -Int128 HELPER(mxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
>> -                     uint64_t bh, uint64_t bl)
>> +Int128 HELPER(mxb)(CPUS390XState *env, Int128 a, Int128 b)
>>   {
>> -    float128 ret = float128_mul(make_float128(ah, al),
>> -                                make_float128(bh, bl),
>> -                                &env->fpu_status);
>> +    float128 ret = float128_mul(ARG128(a), ARG128(b), &env->fpu_status);
>>       handle_exceptions(env, false, GETPC());
>>       return RET128(ret);
>>   }
>>   
>>   /* 128/64-bit FP multiplication */
>> -Int128 HELPER(mxdb)(CPUS390XState *env, uint64_t ah, uint64_t al, uint64_t f2)
>> +Int128 HELPER(mxdb)(CPUS390XState *env, Int128 a, uint64_t f2)
>>   {
>>       float128 ret = float64_to_float128(f2, &env->fpu_status);
>> -    ret = float128_mul(make_float128(ah, al), ret, &env->fpu_status);
>> +    ret = float128_mul(ARG128(a), ret, &env->fpu_status);
>>       handle_exceptions(env, false, GETPC());
>>       return RET128(ret);
>>   }
>> @@ -345,11 +338,10 @@ uint64_t HELPER(ldeb)(CPUS390XState *env, uint64_t f2)
>>   }
>>   
>>   /* convert 128-bit float to 64-bit float */
>> -uint64_t HELPER(ldxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
>> -                      uint32_t m34)
>> +uint64_t HELPER(ldxb)(CPUS390XState *env, Int128 a, uint32_t m34)
>>   {
>>       int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
>> -    float64 ret = float128_to_float64(make_float128(ah, al), &env->fpu_status);
>> +    float64 ret = float128_to_float64(ARG128(a), &env->fpu_status);
>>   
>>       s390_restore_bfp_rounding_mode(env, old_mode);
>>       handle_exceptions(env, xxc_from_m34(m34), GETPC());
>> @@ -384,11 +376,10 @@ uint64_t HELPER(ledb)(CPUS390XState *env, uint64_t f2, uint32_t m34)
>>   }
>>   
>>   /* convert 128-bit float to 32-bit float */
>> -uint64_t HELPER(lexb)(CPUS390XState *env, uint64_t ah, uint64_t al,
>> -                      uint32_t m34)
>> +uint64_t HELPER(lexb)(CPUS390XState *env, Int128 a, uint32_t m34)
>>   {
>>       int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
>> -    float32 ret = float128_to_float32(make_float128(ah, al), &env->fpu_status);
>> +    float32 ret = float128_to_float32(ARG128(a), &env->fpu_status);
>>   
>>       s390_restore_bfp_rounding_mode(env, old_mode);
>>       handle_exceptions(env, xxc_from_m34(m34), GETPC());
>> @@ -412,11 +403,9 @@ uint32_t HELPER(cdb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
>>   }
>>   
>>   /* 128-bit FP compare */
>> -uint32_t HELPER(cxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
>> -                     uint64_t bh, uint64_t bl)
>> +uint32_t HELPER(cxb)(CPUS390XState *env, Int128 a, Int128 b)
>>   {
>> -    FloatRelation cmp = float128_compare_quiet(make_float128(ah, al),
>> -                                               make_float128(bh, bl),
>> +    FloatRelation cmp = float128_compare_quiet(ARG128(a), ARG128(b),
>>                                                  &env->fpu_status);
>>       handle_exceptions(env, false, GETPC());
>>       return float_comp_to_cc(env, cmp);
>> @@ -564,10 +553,10 @@ uint64_t HELPER(cgdb)(CPUS390XState *env, uint64_t v2, uint32_t m34)
>>   }
>>   
>>   /* convert 128-bit float to 64-bit int */
>> -uint64_t HELPER(cgxb)(CPUS390XState *env, uint64_t h, uint64_t l, uint32_t m34)
>> +uint64_t HELPER(cgxb)(CPUS390XState *env, Int128 i2, uint32_t m34)
>>   {
>>       int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
>> -    float128 v2 = make_float128(h, l);
>> +    float128 v2 = ARG128(i2);
>>       int64_t ret = float128_to_int64(v2, &env->fpu_status);
>>       uint32_t cc = set_cc_conv_f128(v2, &env->fpu_status);
>>   
>> @@ -613,10 +602,10 @@ uint64_t HELPER(cfdb)(CPUS390XState *env, uint64_t v2, uint32_t m34)
>>   }
>>   
>>   /* convert 128-bit float to 32-bit int */
>> -uint64_t HELPER(cfxb)(CPUS390XState *env, uint64_t h, uint64_t l, uint32_t m34)
>> +uint64_t HELPER(cfxb)(CPUS390XState *env, Int128 i2, uint32_t m34)
>>   {
>>       int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
>> -    float128 v2 = make_float128(h, l);
>> +    float128 v2 = ARG128(i2);
>>       int32_t ret = float128_to_int32(v2, &env->fpu_status);
>>       uint32_t cc = set_cc_conv_f128(v2, &env->fpu_status);
>>   
>> @@ -662,10 +651,10 @@ uint64_t HELPER(clgdb)(CPUS390XState *env, uint64_t v2, uint32_t m34)
>>   }
>>   
>>   /* convert 128-bit float to 64-bit uint */
>> -uint64_t HELPER(clgxb)(CPUS390XState *env, uint64_t h, uint64_t l, uint32_t m34)
>> +uint64_t HELPER(clgxb)(CPUS390XState *env, Int128 i2, uint32_t m34)
>>   {
>>       int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
>> -    float128 v2 = make_float128(h, l);
>> +    float128 v2 = ARG128(i2);
>>       uint64_t ret = float128_to_uint64(v2, &env->fpu_status);
>>       uint32_t cc = set_cc_conv_f128(v2, &env->fpu_status);
>>   
>> @@ -711,10 +700,10 @@ uint64_t HELPER(clfdb)(CPUS390XState *env, uint64_t v2, uint32_t m34)
>>   }
>>   
>>   /* convert 128-bit float to 32-bit uint */
>> -uint64_t HELPER(clfxb)(CPUS390XState *env, uint64_t h, uint64_t l, uint32_t m34)
>> +uint64_t HELPER(clfxb)(CPUS390XState *env, Int128 i2, uint32_t m34)
>>   {
>>       int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
>> -    float128 v2 = make_float128(h, l);
>> +    float128 v2 = ARG128(i2);
>>       uint32_t ret = float128_to_uint32(v2, &env->fpu_status);
>>       uint32_t cc = set_cc_conv_f128(v2, &env->fpu_status);
>>   
>> @@ -750,11 +739,10 @@ uint64_t HELPER(fidb)(CPUS390XState *env, uint64_t f2, uint32_t m34)
>>   }
>>   
>>   /* round to integer 128-bit */
>> -Int128 HELPER(fixb)(CPUS390XState *env, uint64_t ah, uint64_t al, uint32_t m34)
>> +Int128 HELPER(fixb)(CPUS390XState *env, Int128 a, uint32_t m34)
>>   {
>>       int old_mode = s390_swap_bfp_rounding_mode(env, round_from_m34(m34));
>> -    float128 ret = float128_round_to_int(make_float128(ah, al),
>> -                                         &env->fpu_status);
>> +    float128 ret = float128_round_to_int(ARG128(a), &env->fpu_status);
>>   
>>       s390_restore_bfp_rounding_mode(env, old_mode);
>>       handle_exceptions(env, xxc_from_m34(m34), GETPC());
>> @@ -778,11 +766,9 @@ uint32_t HELPER(kdb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
>>   }
>>   
>>   /* 128-bit FP compare and signal */
>> -uint32_t HELPER(kxb)(CPUS390XState *env, uint64_t ah, uint64_t al,
>> -                     uint64_t bh, uint64_t bl)
>> +uint32_t HELPER(kxb)(CPUS390XState *env, Int128 a, Int128 b)
>>   {
>> -    FloatRelation cmp = float128_compare(make_float128(ah, al),
>> -                                         make_float128(bh, bl),
>> +    FloatRelation cmp = float128_compare(ARG128(a), ARG128(b),
>>                                            &env->fpu_status);
>>       handle_exceptions(env, false, GETPC());
>>       return float_comp_to_cc(env, cmp);
>> @@ -869,9 +855,9 @@ uint32_t HELPER(tcdb)(CPUS390XState *env, uint64_t v1, uint64_t m2)
>>   }
>>   
>>   /* test data class 128-bit */
>> -uint32_t HELPER(tcxb)(CPUS390XState *env, uint64_t ah, uint64_t al, uint64_t m2)
>> +uint32_t HELPER(tcxb)(CPUS390XState *env, Int128 a, uint64_t m2)
>>   {
>> -    return (m2 & float128_dcmask(env, make_float128(ah, al))) != 0;
>> +    return (m2 & float128_dcmask(env, ARG128(a))) != 0;
>>   }
>>   
>>   /* square root 32-bit */
>> @@ -891,9 +877,9 @@ uint64_t HELPER(sqdb)(CPUS390XState *env, uint64_t f2)
>>   }
>>   
>>   /* square root 128-bit */
>> -Int128 HELPER(sqxb)(CPUS390XState *env, uint64_t ah, uint64_t al)
>> +Int128 HELPER(sqxb)(CPUS390XState *env, Int128 a)
>>   {
>> -    float128 ret = float128_sqrt(make_float128(ah, al), &env->fpu_status);
>> +    float128 ret = float128_sqrt(ARG128(a), &env->fpu_status);
>>       handle_exceptions(env, false, GETPC());
>>       return RET128(ret);
>>   }
>> diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
>> index d1ffbb8710..8023bbab2f 100644
>> --- a/target/s390x/tcg/translate.c
>> +++ b/target/s390x/tcg/translate.c
>> @@ -305,6 +305,18 @@ static TCGv_i64 load_freg32_i64(int reg)
>>       return r;
>>   }
>>   
>> +static TCGv_i128 load_freg_128(int reg)
>> +{
>> +    TCGv_i64 h = load_freg(reg);
>> +    TCGv_i64 l = load_freg(reg + 2);
>> +    TCGv_i128 r = tcg_temp_new_i128();
>> +
>> +    tcg_gen_concat_i64_i128(r, l, h);
>> +    tcg_temp_free_i64(h);
>> +    tcg_temp_free_i64(l);
>> +    return r;
>> +}
>> +
>>   static void store_reg(int reg, TCGv_i64 v)
>>   {
>>       tcg_gen_mov_i64(regs[reg], v);
>> @@ -1103,7 +1115,7 @@ typedef struct {
>>       bool g_out, g_out2, g_in1, g_in2;
>>       TCGv_i64 out, out2, in1, in2;
>>       TCGv_i64 addr1;
>> -    TCGv_i128 out_128;
>> +    TCGv_i128 out_128, in1_128, in2_128;
>>   } DisasOps;
>>   
>>   /* Instructions can place constraints on their operands, raising specification
>> @@ -1462,7 +1474,7 @@ static DisasJumpType op_adb(DisasContext *s, DisasOps *o)
>>   
>>   static DisasJumpType op_axb(DisasContext *s, DisasOps *o)
>>   {
>> -    gen_helper_axb(o->out_128, cpu_env, o->out, o->out2, o->in1, o->in2);
>> +    gen_helper_axb(o->out_128, cpu_env, o->in1_128, o->in2_128);
>>       return DISAS_NEXT;
>>   }
>>   
>> @@ -1778,7 +1790,7 @@ static DisasJumpType op_cdb(DisasContext *s, DisasOps *o)
>>   
>>   static DisasJumpType op_cxb(DisasContext *s, DisasOps *o)
>>   {
>> -    gen_helper_cxb(cc_op, cpu_env, o->out, o->out2, o->in1, o->in2);
>> +    gen_helper_cxb(cc_op, cpu_env, o->in1_128, o->in2_128);
>>       set_cc_static(s);
>>       return DISAS_NEXT;
>>   }
>> @@ -1841,7 +1853,7 @@ static DisasJumpType op_cfxb(DisasContext *s, DisasOps *o)
>>       if (!m34) {
>>           return DISAS_NORETURN;
>>       }
>> -    gen_helper_cfxb(o->out, cpu_env, o->in1, o->in2, m34);
>> +    gen_helper_cfxb(o->out, cpu_env, o->in2_128, m34);
>>       tcg_temp_free_i32(m34);
>>       set_cc_static(s);
>>       return DISAS_NEXT;
>> @@ -1880,7 +1892,7 @@ static DisasJumpType op_cgxb(DisasContext *s, DisasOps *o)
>>       if (!m34) {
>>           return DISAS_NORETURN;
>>       }
>> -    gen_helper_cgxb(o->out, cpu_env, o->in1, o->in2, m34);
>> +    gen_helper_cgxb(o->out, cpu_env, o->in2_128, m34);
>>       tcg_temp_free_i32(m34);
>>       set_cc_static(s);
>>       return DISAS_NEXT;
>> @@ -1919,7 +1931,7 @@ static DisasJumpType op_clfxb(DisasContext *s, DisasOps *o)
>>       if (!m34) {
>>           return DISAS_NORETURN;
>>       }
>> -    gen_helper_clfxb(o->out, cpu_env, o->in1, o->in2, m34);
>> +    gen_helper_clfxb(o->out, cpu_env, o->in2_128, m34);
>>       tcg_temp_free_i32(m34);
>>       set_cc_static(s);
>>       return DISAS_NEXT;
>> @@ -1958,7 +1970,7 @@ static DisasJumpType op_clgxb(DisasContext *s, DisasOps *o)
>>       if (!m34) {
>>           return DISAS_NORETURN;
>>       }
>> -    gen_helper_clgxb(o->out, cpu_env, o->in1, o->in2, m34);
>> +    gen_helper_clgxb(o->out, cpu_env, o->in2_128, m34);
>>       tcg_temp_free_i32(m34);
>>       set_cc_static(s);
>>       return DISAS_NEXT;
>> @@ -2448,7 +2460,7 @@ static DisasJumpType op_ddb(DisasContext *s, DisasOps *o)
>>   
>>   static DisasJumpType op_dxb(DisasContext *s, DisasOps *o)
>>   {
>> -    gen_helper_dxb(o->out_128, cpu_env, o->out, o->out2, o->in1, o->in2);
>> +    gen_helper_dxb(o->out_128, cpu_env, o->in1_128, o->in2_128);
>>       return DISAS_NEXT;
>>   }
>>   
>> @@ -2553,7 +2565,7 @@ static DisasJumpType op_fixb(DisasContext *s, DisasOps *o)
>>       if (!m34) {
>>           return DISAS_NORETURN;
>>       }
>> -    gen_helper_fixb(o->out_128, cpu_env, o->in1, o->in2, m34);
>> +    gen_helper_fixb(o->out_128, cpu_env, o->in2_128, m34);
>>       tcg_temp_free_i32(m34);
>>       return DISAS_NEXT;
>>   }
>> @@ -2772,7 +2784,7 @@ static DisasJumpType op_kdb(DisasContext *s, DisasOps *o)
>>   
>>   static DisasJumpType op_kxb(DisasContext *s, DisasOps *o)
>>   {
>> -    gen_helper_kxb(cc_op, cpu_env, o->out, o->out2, o->in1, o->in2);
>> +    gen_helper_kxb(cc_op, cpu_env, o->in1_128, o->in2_128);
>>       set_cc_static(s);
>>       return DISAS_NEXT;
>>   }
>> @@ -2846,7 +2858,7 @@ static DisasJumpType op_ldxb(DisasContext *s, DisasOps *o)
>>       if (!m34) {
>>           return DISAS_NORETURN;
>>       }
>> -    gen_helper_ldxb(o->out, cpu_env, o->in1, o->in2, m34);
>> +    gen_helper_ldxb(o->out, cpu_env, o->in2_128, m34);
>>       tcg_temp_free_i32(m34);
>>       return DISAS_NEXT;
>>   }
>> @@ -2858,7 +2870,7 @@ static DisasJumpType op_lexb(DisasContext *s, DisasOps *o)
>>       if (!m34) {
>>           return DISAS_NORETURN;
>>       }
>> -    gen_helper_lexb(o->out, cpu_env, o->in1, o->in2, m34);
>> +    gen_helper_lexb(o->out, cpu_env, o->in2_128, m34);
>>       tcg_temp_free_i32(m34);
>>       return DISAS_NEXT;
>>   }
>> @@ -3586,13 +3598,13 @@ static DisasJumpType op_mdb(DisasContext *s, DisasOps *o)
>>   
>>   static DisasJumpType op_mxb(DisasContext *s, DisasOps *o)
>>   {
>> -    gen_helper_mxb(o->out_128, cpu_env, o->out, o->out2, o->in1, o->in2);
>> +    gen_helper_mxb(o->out_128, cpu_env, o->in1_128, o->in2_128);
>>       return DISAS_NEXT;
>>   }
>>   
>>   static DisasJumpType op_mxdb(DisasContext *s, DisasOps *o)
>>   {
>> -    gen_helper_mxdb(o->out_128, cpu_env, o->out, o->out2, o->in2);
>> +    gen_helper_mxdb(o->out_128, cpu_env, o->in1_128, o->in2);
>>       return DISAS_NEXT;
>>   }
>>   
>> @@ -4057,7 +4069,7 @@ static DisasJumpType op_sdb(DisasContext *s, DisasOps *o)
>>   
>>   static DisasJumpType op_sxb(DisasContext *s, DisasOps *o)
>>   {
>> -    gen_helper_sxb(o->out_128, cpu_env, o->out, o->out2, o->in1, o->in2);
>> +    gen_helper_sxb(o->out_128, cpu_env, o->in1_128, o->in2_128);
>>       return DISAS_NEXT;
>>   }
>>   
>> @@ -4075,7 +4087,7 @@ static DisasJumpType op_sqdb(DisasContext *s, DisasOps *o)
>>   
>>   static DisasJumpType op_sqxb(DisasContext *s, DisasOps *o)
>>   {
>> -    gen_helper_sqxb(o->out_128, cpu_env, o->in1, o->in2);
>> +    gen_helper_sqxb(o->out_128, cpu_env, o->in2_128);
>>       return DISAS_NEXT;
>>   }
>>   
>> @@ -4854,7 +4866,7 @@ static DisasJumpType op_tcdb(DisasContext *s, DisasOps *o)
>>   
>>   static DisasJumpType op_tcxb(DisasContext *s, DisasOps *o)
>>   {
>> -    gen_helper_tcxb(cc_op, cpu_env, o->out, o->out2, o->in2);
>> +    gen_helper_tcxb(cc_op, cpu_env, o->in1_128, o->in2);
>>       set_cc_static(s);
>>       return DISAS_NEXT;
>>   }
>> @@ -5389,8 +5401,6 @@ static void prep_new_P(DisasContext *s, DisasOps *o)
>>   
>>   static void prep_new_x(DisasContext *s, DisasOps *o)
>>   {
>> -    o->out = tcg_temp_new_i64();
>> -    o->out2 = tcg_temp_new_i64();
>>       o->out_128 = tcg_temp_new_i128();
>>   }
>>   #define SPEC_prep_new_x 0
>> @@ -5413,10 +5423,7 @@ static void prep_r1_P(DisasContext *s, DisasOps *o)
>>   
>>   static void prep_x1(DisasContext *s, DisasOps *o)
>>   {
>> -    o->out = load_freg(get_field(s, r1));
>> -    o->out2 = load_freg(get_field(s, r1) + 2);
>> -    o->out_128 = tcg_temp_new_i128();
>> -    tcg_gen_concat_i64_i128(o->out_128, o->out2, o->out);
>> +    o->out_128 = load_freg_128(get_field(s, r1));
>>   }
>>   #define SPEC_prep_x1 SPEC_r1_f128
>>   
>> @@ -5515,6 +5522,11 @@ static void wout_x1(DisasContext *s, DisasOps *o)
>>   {
>>       int f1 = get_field(s, r1);
>>   
>> +    /* Split out_128 into out+out2 for cout_f128. */
>> +    tcg_debug_assert(o->out == NULL);
>> +    o->out = tcg_temp_new_i64();
>> +    o->out2 = tcg_temp_new_i64();
>> +
>>       tcg_gen_extr_i128_i64(o->out2, o->out, o->out_128);
>>       store_freg(f1, o->out);
>>       store_freg(f1 + 2, o->out2);
>> @@ -5757,6 +5769,12 @@ static void in1_f1(DisasContext *s, DisasOps *o)
>>   }
>>   #define SPEC_in1_f1 0
>>   
>> +static void in1_x1(DisasContext *s, DisasOps *o)
>> +{
>> +    o->in1_128 = load_freg_128(get_field(s, r1));
>> +}
>> +#define SPEC_in1_x1 SPEC_r2_f128
>> +
>>   /* Load the high double word of an extended (128-bit) format FP number */
>>   static void in1_x2h(DisasContext *s, DisasOps *o)
>>   {
>> @@ -5966,6 +5984,12 @@ static void in2_f2(DisasContext *s, DisasOps *o)
>>   }
>>   #define SPEC_in2_f2 0
>>   
>> +static void in2_x2(DisasContext *s, DisasOps *o)
>> +{
>> +    o->in2_128 = load_freg_128(get_field(s, r2));
>> +}
>> +#define SPEC_in2_x2 SPEC_r2_f128
>> +
>>   /* Load the low double word of an extended (128-bit) format FP number */
>>   static void in2_x2l(DisasContext *s, DisasOps *o)
>>   {
>> @@ -6588,6 +6612,12 @@ static DisasJumpType translate_one(CPUS390XState *env, DisasContext *s)
>>       if (o.out_128) {
>>           tcg_temp_free_i128(o.out_128);
>>       }
>> +    if (o.in1_128) {
>> +        tcg_temp_free_i128(o.in1_128);
>> +    }
>> +    if (o.in2_128) {
>> +        tcg_temp_free_i128(o.in2_128);
>> +    }
>>       /* io should be the last instruction in tb when icount is enabled */
>>       if (unlikely(icount && ret == DISAS_NEXT)) {
>>           ret = DISAS_TOO_MANY;
>> diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
>> index 20bf20c766..26523746d6 100644
>> --- a/target/s390x/tcg/insn-data.def
>> +++ b/target/s390x/tcg/insn-data.def
>> @@ -34,7 +34,7 @@
>>       C(0xe318, AGF,     RXY_a, Z,   r1, m2_32s, r1, 0, add, adds64)
>>       F(0xb30a, AEBR,    RRE,   Z,   e1, e2, new, e1, aeb, f32, IF_BFP)
>>       F(0xb31a, ADBR,    RRE,   Z,   f1, f2, new, f1, adb, f64, IF_BFP)
>> -    F(0xb34a, AXBR,    RRE,   Z,   x2h, x2l, x1, x1, axb, f128, IF_BFP)
>> +    F(0xb34a, AXBR,    RRE,   Z,   x1, x2, new_x, x1, axb, f128, IF_BFP)
>>       F(0xed0a, AEB,     RXE,   Z,   e1, m2_32u, new, e1, aeb, f32, IF_BFP)
>>       F(0xed1a, ADB,     RXE,   Z,   f1, m2_64, new, f1, adb, f64, IF_BFP)
>>   /* ADD HIGH */
>> @@ -172,13 +172,13 @@
>>       C(0xe330, CGF,     RXY_a, Z,   r1_o, m2_32s, 0, 0, 0, cmps64)
>>       F(0xb309, CEBR,    RRE,   Z,   e1, e2, 0, 0, ceb, 0, IF_BFP)
>>       F(0xb319, CDBR,    RRE,   Z,   f1, f2, 0, 0, cdb, 0, IF_BFP)
>> -    F(0xb349, CXBR,    RRE,   Z,   x2h, x2l, x1, 0, cxb, 0, IF_BFP)
>> +    F(0xb349, CXBR,    RRE,   Z,   x1, x2, 0, 0, cxb, 0, IF_BFP)
>>       F(0xed09, CEB,     RXE,   Z,   e1, m2_32u, 0, 0, ceb, 0, IF_BFP)
>>       F(0xed19, CDB,     RXE,   Z,   f1, m2_64, 0, 0, cdb, 0, IF_BFP)
>>   /* COMPARE AND SIGNAL */
>>       F(0xb308, KEBR,    RRE,   Z,   e1, e2, 0, 0, keb, 0, IF_BFP)
>>       F(0xb318, KDBR,    RRE,   Z,   f1, f2, 0, 0, kdb, 0, IF_BFP)
>> -    F(0xb348, KXBR,    RRE,   Z,   x2h, x2l, x1, 0, kxb, 0, IF_BFP)
>> +    F(0xb348, KXBR,    RRE,   Z,   x1, x2, 0, 0, kxb, 0, IF_BFP)
>>       F(0xed08, KEB,     RXE,   Z,   e1, m2_32u, 0, 0, keb, 0, IF_BFP)
>>       F(0xed18, KDB,     RXE,   Z,   f1, m2_64, 0, 0, kdb, 0, IF_BFP)
>>   /* COMPARE IMMEDIATE */
>> @@ -299,10 +299,10 @@
>>   /* CONVERT TO FIXED */
>>       F(0xb398, CFEBR,   RRF_e, Z,   0, e2, new, r1_32, cfeb, 0, IF_BFP)
>>       F(0xb399, CFDBR,   RRF_e, Z,   0, f2, new, r1_32, cfdb, 0, IF_BFP)
>> -    F(0xb39a, CFXBR,   RRF_e, Z,   x2h, x2l, new, r1_32, cfxb, 0, IF_BFP)
>> +    F(0xb39a, CFXBR,   RRF_e, Z,   0, x2, new, r1_32, cfxb, 0, IF_BFP)
>>       F(0xb3a8, CGEBR,   RRF_e, Z,   0, e2, r1, 0, cgeb, 0, IF_BFP)
>>       F(0xb3a9, CGDBR,   RRF_e, Z,   0, f2, r1, 0, cgdb, 0, IF_BFP)
>> -    F(0xb3aa, CGXBR,   RRF_e, Z,   x2h, x2l, r1, 0, cgxb, 0, IF_BFP)
>> +    F(0xb3aa, CGXBR,   RRF_e, Z,   0, x2, r1, 0, cgxb, 0, IF_BFP)
>>   /* CONVERT FROM FIXED */
>>       F(0xb394, CEFBR,   RRF_e, Z,   0, r2_32s, new, e1, cegb, 0, IF_BFP)
>>       F(0xb395, CDFBR,   RRF_e, Z,   0, r2_32s, new, f1, cdgb, 0, IF_BFP)
>> @@ -313,10 +313,10 @@
>>   /* CONVERT TO LOGICAL */
>>       F(0xb39c, CLFEBR,  RRF_e, FPE, 0, e2, new, r1_32, clfeb, 0, IF_BFP)
>>       F(0xb39d, CLFDBR,  RRF_e, FPE, 0, f2, new, r1_32, clfdb, 0, IF_BFP)
>> -    F(0xb39e, CLFXBR,  RRF_e, FPE, x2h, x2l, new, r1_32, clfxb, 0, IF_BFP)
>> +    F(0xb39e, CLFXBR,  RRF_e, FPE, 0, x2, new, r1_32, clfxb, 0, IF_BFP)
>>       F(0xb3ac, CLGEBR,  RRF_e, FPE, 0, e2, r1, 0, clgeb, 0, IF_BFP)
>>       F(0xb3ad, CLGDBR,  RRF_e, FPE, 0, f2, r1, 0, clgdb, 0, IF_BFP)
>> -    F(0xb3ae, CLGXBR,  RRF_e, FPE, x2h, x2l, r1, 0, clgxb, 0, IF_BFP)
>> +    F(0xb3ae, CLGXBR,  RRF_e, FPE, 0, x2, r1, 0, clgxb, 0, IF_BFP)
>>   /* CONVERT FROM LOGICAL */
>>       F(0xb390, CELFBR,  RRF_e, FPE, 0, r2_32u, new, e1, celgb, 0, IF_BFP)
>>       F(0xb391, CDLFBR,  RRF_e, FPE, 0, r2_32u, new, f1, cdlgb, 0, IF_BFP)
>> @@ -343,7 +343,7 @@
>>       C(0x5d00, D,       RX_a,  Z,   r1_D32, m2_32s, new_P, r1_P32, divs32, 0)
>>       F(0xb30d, DEBR,    RRE,   Z,   e1, e2, new, e1, deb, 0, IF_BFP)
>>       F(0xb31d, DDBR,    RRE,   Z,   f1, f2, new, f1, ddb, 0, IF_BFP)
>> -    F(0xb34d, DXBR,    RRE,   Z,   x2h, x2l, x1, x1, dxb, 0, IF_BFP)
>> +    F(0xb34d, DXBR,    RRE,   Z,   x1, x2, new_x, x1, dxb, 0, IF_BFP)
>>       F(0xed0d, DEB,     RXE,   Z,   e1, m2_32u, new, e1, deb, 0, IF_BFP)
>>       F(0xed1d, DDB,     RXE,   Z,   f1, m2_64, new, f1, ddb, 0, IF_BFP)
>>   /* DIVIDE LOGICAL */
>> @@ -597,7 +597,7 @@
>>   /* LOAD FP INTEGER */
>>       F(0xb357, FIEBR,   RRF_e, Z,   0, e2, new, e1, fieb, 0, IF_BFP)
>>       F(0xb35f, FIDBR,   RRF_e, Z,   0, f2, new, f1, fidb, 0, IF_BFP)
>> -    F(0xb347, FIXBR,   RRF_e, Z,   x2h, x2l, new_x, x1, fixb, 0, IF_BFP)
>> +    F(0xb347, FIXBR,   RRF_e, Z,   0, x2, new_x, x1, fixb, 0, IF_BFP)
>>   
>>   /* LOAD LENGTHENED */
>>       F(0xb304, LDEBR,   RRE,   Z,   0, e2, new, f1, ldeb, 0, IF_BFP)
>> @@ -610,8 +610,8 @@
>>       F(0xed24, LDE,     RXE,   Z,   0, m2_32u, new, f1, lde, 0, IF_AFP1)
>>   /* LOAD ROUNDED */
>>       F(0xb344, LEDBR,   RRF_e, Z,   0, f2, new, e1, ledb, 0, IF_BFP)
>> -    F(0xb345, LDXBR,   RRF_e, Z,   x2h, x2l, new, f1, ldxb, 0, IF_BFP)
>> -    F(0xb346, LEXBR,   RRF_e, Z,   x2h, x2l, new, e1, lexb, 0, IF_BFP)
>> +    F(0xb345, LDXBR,   RRF_e, Z,   0, x2, new, f1, ldxb, 0, IF_BFP)
>> +    F(0xb346, LEXBR,   RRF_e, Z,   0, x2, new, e1, lexb, 0, IF_BFP)
>>   
>>   /* LOAD MULTIPLE */
>>       C(0x9800, LM,      RS_a,  Z,   0, a2, 0, 0, lm32, 0)
>> @@ -666,7 +666,7 @@
>>       C(0xe384, MG,      RXY_a, MIE2,r1p1_o, m2_64, r1_P, 0, muls128, 0)
>>       F(0xb317, MEEBR,   RRE,   Z,   e1, e2, new, e1, meeb, 0, IF_BFP)
>>       F(0xb31c, MDBR,    RRE,   Z,   f1, f2, new, f1, mdb, 0, IF_BFP)
>> -    F(0xb34c, MXBR,    RRE,   Z,   x2h, x2l, x1, x1, mxb, 0, IF_BFP)
>> +    F(0xb34c, MXBR,    RRE,   Z,   x1, x2, new_x, x1, mxb, 0, IF_BFP)
>>       F(0xb30c, MDEBR,   RRE,   Z,   f1, e2, new, f1, mdeb, 0, IF_BFP)
>>       F(0xb307, MXDBR,   RRE,   Z,   0, f2, x1, x1, mxdb, 0, IF_BFP)
>>       F(0xed17, MEEB,    RXE,   Z,   e1, m2_32u, new, e1, meeb, 0, IF_BFP)
>> @@ -835,7 +835,7 @@
>>   /* SQUARE ROOT */
>>       F(0xb314, SQEBR,   RRE,   Z,   0, e2, new, e1, sqeb, 0, IF_BFP)
>>       F(0xb315, SQDBR,   RRE,   Z,   0, f2, new, f1, sqdb, 0, IF_BFP)
>> -    F(0xb316, SQXBR,   RRE,   Z,   x2h, x2l, new_x, x1, sqxb, 0, IF_BFP)
>> +    F(0xb316, SQXBR,   RRE,   Z,   0, x2, new_x, x1, sqxb, 0, IF_BFP)
>>       F(0xed14, SQEB,    RXE,   Z,   0, m2_32u, new, e1, sqeb, 0, IF_BFP)
>>       F(0xed15, SQDB,    RXE,   Z,   0, m2_64, new, f1, sqdb, 0, IF_BFP)
>>   
>> @@ -913,7 +913,7 @@
>>       C(0xe319, SGF,     RXY_a, Z,   r1, m2_32s, r1, 0, sub, subs64)
>>       F(0xb30b, SEBR,    RRE,   Z,   e1, e2, new, e1, seb, f32, IF_BFP)
>>       F(0xb31b, SDBR,    RRE,   Z,   f1, f2, new, f1, sdb, f64, IF_BFP)
>> -    F(0xb34b, SXBR,    RRE,   Z,   x2h, x2l, x1, x1, sxb, f128, IF_BFP)
>> +    F(0xb34b, SXBR,    RRE,   Z,   x1, x2, new_x, x1, sxb, f128, IF_BFP)
>>       F(0xed0b, SEB,     RXE,   Z,   e1, m2_32u, new, e1, seb, f32, IF_BFP)
>>       F(0xed1b, SDB,     RXE,   Z,   f1, m2_64, new, f1, sdb, f64, IF_BFP)
>>   /* SUBTRACT HALFWORD */
>> @@ -957,7 +957,7 @@
>>   /* TEST DATA CLASS */
>>       F(0xed10, TCEB,    RXE,   Z,   e1, a2, 0, 0, tceb, 0, IF_BFP)
>>       F(0xed11, TCDB,    RXE,   Z,   f1, a2, 0, 0, tcdb, 0, IF_BFP)
>> -    F(0xed12, TCXB,    RXE,   Z,   0, a2, x1, 0, tcxb, 0, IF_BFP)
>> +    F(0xed12, TCXB,    RXE,   Z,   x1, a2, 0, 0, tcxb, 0, IF_BFP)
>>   
>>   /* TEST DECIMAL */
>>       C(0xebc0, TP,      RSL,   E2,  la1, 0, 0, 0, tp, 0)
>> -- 
>> 2.34.1
>>
>>
> 
> Hi,
> 
> I ran valgrind's testsuite with this patch, and their bpf-4 test
> triggered an assertion in the
> 
>      (insn->spec & SPEC_r2_f128 && !is_fp_pair(get_field(s, r2)))
> 
> condition. The following fixup helped:
> 
> 
> --- a/target/s390x/tcg/translate.c
> +++ b/target/s390x/tcg/translate.c
> @@ -5771,14 +5771,14 @@ static void in1_x1(DisasContext *s, DisasOps *o)
>   {
>       o->in1_128 = load_freg_128(get_field(s, r1));
>   }
> -#define SPEC_in1_x1 SPEC_r2_f128
> +#define SPEC_in1_x1 SPEC_r1_f128

Looks right, thanks.

>   
>   /* Load the high double word of an extended (128-bit) format FP number */
>   static void in1_x2h(DisasContext *s, DisasOps *o)
>   {
>       o->in1 = load_freg(get_field(s, r2));
>   }
> -#define SPEC_in1_x2h SPEC_r2_f128
> +#define SPEC_in1_x2h SPEC_r1_f128

This looks wrong.


r~


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 9/9] target/s390x: Use Int128 for passing float128
  2022-11-02  9:47     ` Richard Henderson
@ 2022-11-02 10:05       ` Ilya Leoshkevich
  0 siblings, 0 replies; 35+ messages in thread
From: Ilya Leoshkevich @ 2022-11-02 10:05 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: qemu-s390x

On Wed, Nov 02, 2022 at 08:47:24PM +1100, Richard Henderson wrote:
> On 11/2/22 20:38, Ilya Leoshkevich wrote:
> > On Fri, Oct 21, 2022 at 05:30:06PM +1000, Richard Henderson wrote:
> > > Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> > > ---
> > >   target/s390x/helper.h          | 32 ++++++-------
> > >   target/s390x/tcg/fpu_helper.c  | 88 ++++++++++++++--------------------
> > >   target/s390x/tcg/translate.c   | 76 ++++++++++++++++++++---------
> > >   target/s390x/tcg/insn-data.def | 30 ++++++------
> > >   4 files changed, 121 insertions(+), 105 deletions(-)

...

> > Hi,
> > 
> > I ran valgrind's testsuite with this patch, and their bpf-4 test
> > triggered an assertion in the
> > 
> >      (insn->spec & SPEC_r2_f128 && !is_fp_pair(get_field(s, r2)))
> > 
> > condition. The following fixup helped:
> > 
> > 
> > --- a/target/s390x/tcg/translate.c
> > +++ b/target/s390x/tcg/translate.c
> > @@ -5771,14 +5771,14 @@ static void in1_x1(DisasContext *s, DisasOps *o)
> >   {
> >       o->in1_128 = load_freg_128(get_field(s, r1));
> >   }
> > -#define SPEC_in1_x1 SPEC_r2_f128
> > +#define SPEC_in1_x1 SPEC_r1_f128
> 
> Looks right, thanks.
> 
> >   /* Load the high double word of an extended (128-bit) format FP number */
> >   static void in1_x2h(DisasContext *s, DisasOps *o)
> >   {
> >       o->in1 = load_freg(get_field(s, r2));
> >   }
> > -#define SPEC_in1_x2h SPEC_r2_f128
> > +#define SPEC_in1_x2h SPEC_r1_f128
> 
> This looks wrong.

Oh, right - we do get_field(r2) here.
Only the first hunk is necessary.

> r~


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2022-11-02 10:07 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-21  7:29 [PATCH 0/9] target/s390x: Use Int128 for float128 and retxl Richard Henderson
2022-10-21  7:29 ` [PATCH 1/9] target/s390x: Use a single return for helper_divs32/u32 Richard Henderson
2022-10-21 11:25   ` Ilya Leoshkevich
2022-10-25  9:59   ` Philippe Mathieu-Daudé
2022-11-01 11:09   ` Ilya Leoshkevich
2022-11-01 11:13   ` [PATCH] tests/tcg/s390x: Add div.c Ilya Leoshkevich
2022-10-21  7:29 ` [PATCH 2/9] target/s390x: Use a single return for helper_divs64/u64 Richard Henderson
2022-10-24 13:16   ` Philippe Mathieu-Daudé
2022-10-25 20:47   ` Ilya Leoshkevich
2022-10-21  7:30 ` [PATCH 3/9] target/s390x: Use Int128 for return from CLST Richard Henderson
2022-10-24 13:15   ` Philippe Mathieu-Daudé
2022-10-25 21:22   ` Ilya Leoshkevich
2022-10-25 21:30   ` [PATCH 0/1] " Ilya Leoshkevich
2022-10-25 21:30     ` [PATCH 1/1] tests/tcg/s390x: Add clst.c Ilya Leoshkevich
2022-10-21  7:30 ` [PATCH 4/9] target/s390x: Use Int128 for return from CKSM Richard Henderson
2022-10-24 13:17   ` Philippe Mathieu-Daudé
2022-10-27 10:36   ` Ilya Leoshkevich
2022-10-21  7:30 ` [PATCH 5/9] target/s390x: Use Int128 for return from TRE Richard Henderson
2022-10-24 13:17   ` Philippe Mathieu-Daudé
2022-10-27 10:40   ` Ilya Leoshkevich
2022-10-21  7:30 ` [PATCH 6/9] target/s390x: Copy wout_x1 to wout_x1_P Richard Henderson
2022-10-27 10:57   ` Ilya Leoshkevich
2022-10-21  7:30 ` [PATCH 7/9] tests/tcg/s390x: Add long-double.c Richard Henderson
2022-10-24 13:19   ` Philippe Mathieu-Daudé
2022-10-27 11:04   ` Ilya Leoshkevich
2022-10-21  7:30 ` [PATCH 8/9] target/s390x: Use Int128 for returning float128 Richard Henderson
2022-10-25  9:58   ` Philippe Mathieu-Daudé
2022-10-27 11:18   ` Ilya Leoshkevich
2022-10-27 11:31     ` Richard Henderson
2022-10-21  7:30 ` [PATCH 9/9] target/s390x: Use Int128 for passing float128 Richard Henderson
2022-10-24 18:01   ` Philippe Mathieu-Daudé
2022-10-24 22:31     ` Richard Henderson
2022-11-02  9:38   ` Ilya Leoshkevich
2022-11-02  9:47     ` Richard Henderson
2022-11-02 10:05       ` Ilya Leoshkevich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.