All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang
@ 2022-03-03 17:20 matheus.ferst
  2022-03-03 17:20 ` [PATCH v2 1/5] tests/tcg/ppc64le: use inline asm instead of __builtin_mtfsf matheus.ferst
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: matheus.ferst @ 2022-03-03 17:20 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: thuth, Matheus Ferst, danielhb413, richard.henderson, groug,
	philippe.mathieu.daude, clg, mrezanin, alex.bennee, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

As the configuration scripts used -mbig and -mlittle, building PPC tests
with Clang was silently skipped. With the patch to fix these options[1],
"make check-tcg" fails because of build and runtime errors. This patch
series tries to fix some of these problems.

The first patch fixes tests/tcg/ppc64le/mtfsf.c by removing the GCC-only
builtins used to emit mtfsf and mffs. We can emit these insns with
inline asm instead.

The second patch addresses differences in the output of float_madds.c.
The __builtin_fmaf used in this test emits fmadds with GCC and
xsmaddasp with LLVM. The first insn had rounding errors fixed in
d04ca895dc7f ("target/ppc: Add helpers for fmadds et al"), we apply a
similar fix to xsmaddasp.

Then we have build and runtime errors of bcdsub. According to GCC
docs[2], the '-mpower8-vector' flag provides some bcdsub builtins, so
it'd be reasonable to assume that the rest of the toolchain knows about
the insn if the compiler accepts this flag. Clang has supported this
flag since version 3.6[3], but the insn and builtins were only added in
LLVM 14[4]. I couldn't find a good config-time solution, so we use
__has_builtin to check for __builtin_bcdsub at compile-time and, if not
available, emit the opcode with a ".long." If __has_builtin itself is
not available, we assume that -mpower8-vector is enough and that the
insn is available.

Even building with a newer Clang that accepts the bcdsub insn, the test
fails at runtime because LLVM doesn't like "__int128" in inline asm. No
error or warning is emitted, but the generated code only loads one
doubleword of the VSR. The third patch avoids this issue by passing the
VSR values in GPR pairs, as we did in
84ade98e87ea ("target/ppc: do not silence snan in xscvspdpn").

The last patch fixes tests/tcg/ppc64le/non_signalling_xscv.c build with
-mabi=elfv1. Clang only recognizes VSX register in the clobber list of
inline asm when using ELFv2, so we use VSRs >= 32 and list them by their
Altivec name.

Finally, the insns tested by tests/tcg/ppc64le/byte_reverse.c are not
yet supported by LLVM. Since the configuration script uses '-mpower10'
to check for POWER10 support and Clang doesn't support this flag,
"make check-tcg" doesn't fail. We should probably change this check in
the future, but since LLVM support of POWER10 seems incomplete, I guess
we can leave it for now.

v2:
 - New patch to address non_signalling_xscv.c build problems with Clang
   and ELFv1;
 - Rework of bcdsub patch to work with LLVM < 14 and avoid vector types.

[1] https://lists.gnu.org/archive/html/qemu-devel/2022-01/msg06506.html
[2] https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/PowerPC-AltiVec_002fVSX-Built-in-Functions.html
[3] https://github.com/llvm/llvm-project/commit/59eb767e11d4ffefb5f55409524e5c8416b2b0db
[4] https://github.com/llvm/llvm-project/commit/c933c2eb334660c131f4afc9d194fafb0cec0423

Matheus Ferst (5):
  tests/tcg/ppc64le: use inline asm instead of __builtin_mtfsf
  target/ppc: change xs[n]madd[am]sp to use float64r32_muladd
  tests/tcg/ppc64le: drop __int128 usage in bcdsub
  tests/tcg/ppc64le: emit bcdsub with .long when needed
  tests/tcg/ppc64le: Use Altivec register names in clobbler list

 target/ppc/fpu_helper.c                 |  93 +++++++----------
 tests/tcg/ppc64le/bcdsub.c              | 126 +++++++++++-------------
 tests/tcg/ppc64le/mtfsf.c               |  19 ++--
 tests/tcg/ppc64le/non_signalling_xscv.c |  16 +--
 4 files changed, 112 insertions(+), 142 deletions(-)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/5] tests/tcg/ppc64le: use inline asm instead of __builtin_mtfsf
  2022-03-03 17:20 [PATCH v2 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang matheus.ferst
@ 2022-03-03 17:20 ` matheus.ferst
  2022-03-03 18:46   ` Richard Henderson
  2022-03-03 17:20 ` [PATCH v2 2/5] target/ppc: change xs[n]madd[am]sp to use float64r32_muladd matheus.ferst
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: matheus.ferst @ 2022-03-03 17:20 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: thuth, Matheus Ferst, danielhb413, richard.henderson, groug,
	philippe.mathieu.daude, clg, mrezanin, alex.bennee, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

LLVM/Clang does not support __builtin_mtfsf.

Acked-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 tests/tcg/ppc64le/mtfsf.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/tests/tcg/ppc64le/mtfsf.c b/tests/tcg/ppc64le/mtfsf.c
index b3d31f3637..bed5b1afa4 100644
--- a/tests/tcg/ppc64le/mtfsf.c
+++ b/tests/tcg/ppc64le/mtfsf.c
@@ -1,8 +1,12 @@
 #include <stdlib.h>
+#include <stdint.h>
 #include <assert.h>
 #include <signal.h>
 #include <sys/prctl.h>
 
+#define MTFSF(FLM, FRB) asm volatile ("mtfsf %0, %1" :: "i" (FLM), "f" (FRB))
+#define MFFS(FRT) asm("mffs %0" : "=f" (FRT))
+
 #define FPSCR_VE     7  /* Floating-point invalid operation exception enable */
 #define FPSCR_VXSOFT 10 /* Floating-point invalid operation exception (soft) */
 #define FPSCR_FI     17 /* Floating-point fraction inexact                   */
@@ -21,10 +25,7 @@ void sigfpe_handler(int sig, siginfo_t *si, void *ucontext)
 
 int main(void)
 {
-    union {
-        double d;
-        long long ll;
-    } fpscr;
+    uint64_t fpscr;
 
     struct sigaction sa = {
         .sa_sigaction = sigfpe_handler,
@@ -40,10 +41,9 @@ int main(void)
     prctl(PR_SET_FPEXC, PR_FP_EXC_PRECISE);
 
     /* First test if the FI bit is being set correctly */
-    fpscr.ll = FP_FI;
-    __builtin_mtfsf(0b11111111, fpscr.d);
-    fpscr.d = __builtin_mffs();
-    assert((fpscr.ll & FP_FI) != 0);
+    MTFSF(0b11111111, FP_FI);
+    MFFS(fpscr);
+    assert((fpscr & FP_FI) != 0);
 
     /* Then test if the deferred exception is being called correctly */
     sigaction(SIGFPE, &sa, NULL);
@@ -54,8 +54,7 @@ int main(void)
      * But if a different exception is chosen si_code check should
      * change accordingly.
      */
-    fpscr.ll = FP_VE | FP_VXSOFT;
-    __builtin_mtfsf(0b11111111, fpscr.d);
+    MTFSF(0b11111111, FP_VE | FP_VXSOFT);
 
     return 1;
 }
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/5] target/ppc: change xs[n]madd[am]sp to use float64r32_muladd
  2022-03-03 17:20 [PATCH v2 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang matheus.ferst
  2022-03-03 17:20 ` [PATCH v2 1/5] tests/tcg/ppc64le: use inline asm instead of __builtin_mtfsf matheus.ferst
@ 2022-03-03 17:20 ` matheus.ferst
  2022-03-03 18:49   ` Richard Henderson
  2022-03-03 17:20 ` [RFC PATCH v2 3/5] tests/tcg/ppc64le: drop __int128 usage in bcdsub matheus.ferst
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: matheus.ferst @ 2022-03-03 17:20 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: thuth, Matheus Ferst, danielhb413, richard.henderson, groug,
	philippe.mathieu.daude, clg, mrezanin, alex.bennee, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Change VSX Scalar Multiply-Add/Subtract Type-A/M Single Precision
helpers to use float64r32_muladd. This method should correctly handle
all rounding modes, so the workaround for float_round_nearest_even can
be dropped.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 target/ppc/fpu_helper.c | 93 ++++++++++++++++-------------------------
 1 file changed, 35 insertions(+), 58 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 8f970288f5..c973968ed6 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -1916,22 +1916,19 @@ void helper_xsdivqp(CPUPPCState *env, uint32_t opcode,
  *   fld   - vsr_t field (VsrD(*) or VsrW(*))
  *   sfprf - set FPRF
  */
-#define VSX_RE(op, nels, tp, fld, sfprf, r2sp)                                \
+#define VSX_RE(op, nels, tp, op_tp, fld, sfprf)                               \
 void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)              \
 {                                                                             \
     ppc_vsr_t t = { };                                                        \
-    int i;                                                                    \
+    int i, flags;                                                             \
                                                                               \
     helper_reset_fpstatus(env);                                               \
                                                                               \
     for (i = 0; i < nels; i++) {                                              \
-        if (unlikely(tp##_is_signaling_nan(xb->fld, &env->fp_status))) {      \
+        t.fld = op_tp##_div(tp##_one, xb->fld, &env->fp_status);              \
+        flags = get_float_exception_flags(&env->fp_status);                   \
+        if (unlikely(flags & float_flag_invalid_snan)) {                      \
             float_invalid_op_vxsnan(env, GETPC());                            \
-        }                                                                     \
-        t.fld = tp##_div(tp##_one, xb->fld, &env->fp_status);                 \
-                                                                              \
-        if (r2sp) {                                                           \
-            t.fld = do_frsp(env, t.fld, GETPC());                             \
         }                                                                     \
                                                                               \
         if (sfprf) {                                                          \
@@ -1943,10 +1940,10 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)              \
     do_float_check_status(env, GETPC());                                      \
 }
 
-VSX_RE(xsredp, 1, float64, VsrD(0), 1, 0)
-VSX_RE(xsresp, 1, float64, VsrD(0), 1, 1)
-VSX_RE(xvredp, 2, float64, VsrD(i), 0, 0)
-VSX_RE(xvresp, 4, float32, VsrW(i), 0, 0)
+VSX_RE(xsredp, 1, float64, float64, VsrD(0), 1)
+VSX_RE(xsresp, 1, float64, float64r32, VsrD(0), 1)
+VSX_RE(xvredp, 2, float64, float64, VsrD(i), 0)
+VSX_RE(xvresp, 4, float32, float32, VsrW(i), 0)
 
 /*
  * VSX_SQRT - VSX floating point square root
@@ -1998,10 +1995,11 @@ VSX_SQRT(xvsqrtsp, 4, float32, VsrW(i), 0, 0)
  *   op    - instruction mnemonic
  *   nels  - number of elements (1, 2 or 4)
  *   tp    - type (float32 or float64)
+ *   op_tp - operation type
  *   fld   - vsr_t field (VsrD(*) or VsrW(*))
  *   sfprf - set FPRF
  */
-#define VSX_RSQRTE(op, nels, tp, fld, sfprf, r2sp)                           \
+#define VSX_RSQRTE(op, nels, tp, op_tp, fld, sfprf)                          \
 void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)             \
 {                                                                            \
     ppc_vsr_t t = { };                                                       \
@@ -2012,15 +2010,12 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)             \
     for (i = 0; i < nels; i++) {                                             \
         float_status tstat = env->fp_status;                                 \
         set_float_exception_flags(0, &tstat);                                \
-        t.fld = tp##_sqrt(xb->fld, &tstat);                                  \
-        t.fld = tp##_div(tp##_one, t.fld, &tstat);                           \
+        t.fld = op_tp##_sqrt(xb->fld, &tstat);                               \
+        t.fld = op_tp##_div(tp##_one, t.fld, &tstat);                        \
         env->fp_status.float_exception_flags |= tstat.float_exception_flags; \
         if (unlikely(tstat.float_exception_flags & float_flag_invalid)) {    \
             float_invalid_op_sqrt(env, tstat.float_exception_flags,          \
                                   sfprf, GETPC());                           \
-        }                                                                    \
-        if (r2sp) {                                                          \
-            t.fld = do_frsp(env, t.fld, GETPC());                            \
         }                                                                    \
                                                                              \
         if (sfprf) {                                                         \
@@ -2032,10 +2027,10 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)             \
     do_float_check_status(env, GETPC());                                     \
 }
 
-VSX_RSQRTE(xsrsqrtedp, 1, float64, VsrD(0), 1, 0)
-VSX_RSQRTE(xsrsqrtesp, 1, float64, VsrD(0), 1, 1)
-VSX_RSQRTE(xvrsqrtedp, 2, float64, VsrD(i), 0, 0)
-VSX_RSQRTE(xvrsqrtesp, 4, float32, VsrW(i), 0, 0)
+VSX_RSQRTE(xsrsqrtedp, 1, float64, float64, VsrD(0), 1)
+VSX_RSQRTE(xsrsqrtesp, 1, float64, float64r32, VsrD(0), 1)
+VSX_RSQRTE(xvrsqrtedp, 2, float64, float64, VsrD(i), 0)
+VSX_RSQRTE(xvrsqrtesp, 4, float32, float32, VsrW(i), 0)
 
 /*
  * VSX_TDIV - VSX floating point test for divide
@@ -2156,9 +2151,8 @@ VSX_TSQRT(xvtsqrtsp, 4, float32, VsrW(i), -126, 23)
  *   maddflgs - flags for the float*muladd routine that control the
  *           various forms (madd, msub, nmadd, nmsub)
  *   sfprf - set FPRF
- *   r2sp  - round intermediate double precision result to single precision
  */
-#define VSX_MADD(op, nels, tp, fld, maddflgs, sfprf, r2sp)                    \
+#define VSX_MADD(op, nels, tp, fld, maddflgs, sfprf)                          \
 void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                             \
                  ppc_vsr_t *s1, ppc_vsr_t *s2, ppc_vsr_t *s3)                 \
 {                                                                             \
@@ -2170,20 +2164,7 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                             \
     for (i = 0; i < nels; i++) {                                              \
         float_status tstat = env->fp_status;                                  \
         set_float_exception_flags(0, &tstat);                                 \
-        if (r2sp && (tstat.float_rounding_mode == float_round_nearest_even)) {\
-            /*                                                                \
-             * Avoid double rounding errors by rounding the intermediate      \
-             * result to odd.                                                 \
-             */                                                               \
-            set_float_rounding_mode(float_round_to_zero, &tstat);             \
-            t.fld = tp##_muladd(s1->fld, s3->fld, s2->fld,                    \
-                                maddflgs, &tstat);                            \
-            t.fld |= (get_float_exception_flags(&tstat) &                     \
-                      float_flag_inexact) != 0;                               \
-        } else {                                                              \
-            t.fld = tp##_muladd(s1->fld, s3->fld, s2->fld,                    \
-                                maddflgs, &tstat);                            \
-        }                                                                     \
+        t.fld = tp##_muladd(s1->fld, s3->fld, s2->fld, maddflgs, &tstat);     \
         env->fp_status.float_exception_flags |= tstat.float_exception_flags;  \
                                                                               \
         if (unlikely(tstat.float_exception_flags & float_flag_invalid)) {     \
@@ -2191,10 +2172,6 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                             \
                                   sfprf, GETPC());                            \
         }                                                                     \
                                                                               \
-        if (r2sp) {                                                           \
-            t.fld = do_frsp(env, t.fld, GETPC());                             \
-        }                                                                     \
-                                                                              \
         if (sfprf) {                                                          \
             helper_compute_fprf_float64(env, t.fld);                          \
         }                                                                     \
@@ -2203,24 +2180,24 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                             \
     do_float_check_status(env, GETPC());                                      \
 }
 
-VSX_MADD(XSMADDDP, 1, float64, VsrD(0), MADD_FLGS, 1, 0)
-VSX_MADD(XSMSUBDP, 1, float64, VsrD(0), MSUB_FLGS, 1, 0)
-VSX_MADD(XSNMADDDP, 1, float64, VsrD(0), NMADD_FLGS, 1, 0)
-VSX_MADD(XSNMSUBDP, 1, float64, VsrD(0), NMSUB_FLGS, 1, 0)
-VSX_MADD(XSMADDSP, 1, float64, VsrD(0), MADD_FLGS, 1, 1)
-VSX_MADD(XSMSUBSP, 1, float64, VsrD(0), MSUB_FLGS, 1, 1)
-VSX_MADD(XSNMADDSP, 1, float64, VsrD(0), NMADD_FLGS, 1, 1)
-VSX_MADD(XSNMSUBSP, 1, float64, VsrD(0), NMSUB_FLGS, 1, 1)
+VSX_MADD(XSMADDDP, 1, float64, VsrD(0), MADD_FLGS, 1)
+VSX_MADD(XSMSUBDP, 1, float64, VsrD(0), MSUB_FLGS, 1)
+VSX_MADD(XSNMADDDP, 1, float64, VsrD(0), NMADD_FLGS, 1)
+VSX_MADD(XSNMSUBDP, 1, float64, VsrD(0), NMSUB_FLGS, 1)
+VSX_MADD(XSMADDSP, 1, float64r32, VsrD(0), MADD_FLGS, 1)
+VSX_MADD(XSMSUBSP, 1, float64r32, VsrD(0), MSUB_FLGS, 1)
+VSX_MADD(XSNMADDSP, 1, float64r32, VsrD(0), NMADD_FLGS, 1)
+VSX_MADD(XSNMSUBSP, 1, float64r32, VsrD(0), NMSUB_FLGS, 1)
 
-VSX_MADD(xvmadddp, 2, float64, VsrD(i), MADD_FLGS, 0, 0)
-VSX_MADD(xvmsubdp, 2, float64, VsrD(i), MSUB_FLGS, 0, 0)
-VSX_MADD(xvnmadddp, 2, float64, VsrD(i), NMADD_FLGS, 0, 0)
-VSX_MADD(xvnmsubdp, 2, float64, VsrD(i), NMSUB_FLGS, 0, 0)
+VSX_MADD(xvmadddp, 2, float64, VsrD(i), MADD_FLGS, 0)
+VSX_MADD(xvmsubdp, 2, float64, VsrD(i), MSUB_FLGS, 0)
+VSX_MADD(xvnmadddp, 2, float64, VsrD(i), NMADD_FLGS, 0)
+VSX_MADD(xvnmsubdp, 2, float64, VsrD(i), NMSUB_FLGS, 0)
 
-VSX_MADD(xvmaddsp, 4, float32, VsrW(i), MADD_FLGS, 0, 0)
-VSX_MADD(xvmsubsp, 4, float32, VsrW(i), MSUB_FLGS, 0, 0)
-VSX_MADD(xvnmaddsp, 4, float32, VsrW(i), NMADD_FLGS, 0, 0)
-VSX_MADD(xvnmsubsp, 4, float32, VsrW(i), NMSUB_FLGS, 0, 0)
+VSX_MADD(xvmaddsp, 4, float32, VsrW(i), MADD_FLGS, 0)
+VSX_MADD(xvmsubsp, 4, float32, VsrW(i), MSUB_FLGS, 0)
+VSX_MADD(xvnmaddsp, 4, float32, VsrW(i), NMADD_FLGS, 0)
+VSX_MADD(xvnmsubsp, 4, float32, VsrW(i), NMSUB_FLGS, 0)
 
 /*
  * VSX_MADDQ - VSX floating point quad-precision muliply/add
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v2 3/5] tests/tcg/ppc64le: drop __int128 usage in bcdsub
  2022-03-03 17:20 [PATCH v2 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang matheus.ferst
  2022-03-03 17:20 ` [PATCH v2 1/5] tests/tcg/ppc64le: use inline asm instead of __builtin_mtfsf matheus.ferst
  2022-03-03 17:20 ` [PATCH v2 2/5] target/ppc: change xs[n]madd[am]sp to use float64r32_muladd matheus.ferst
@ 2022-03-03 17:20 ` matheus.ferst
  2022-03-03 18:57   ` Richard Henderson
  2022-03-03 17:20 ` [RFC PATCH v2 4/5] tests/tcg/ppc64le: emit bcdsub with .long when needed matheus.ferst
  2022-03-03 17:20 ` [PATCH v2 5/5] tests/tcg/ppc64le: Use Altivec register names in clobbler list matheus.ferst
  4 siblings, 1 reply; 14+ messages in thread
From: matheus.ferst @ 2022-03-03 17:20 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: thuth, Matheus Ferst, danielhb413, richard.henderson, groug,
	philippe.mathieu.daude, clg, mrezanin, alex.bennee, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Using __int128 with inline asm constraints like "v" generates incorrect
code when compiling with LLVM/Clang (e.g., only one doubleword of the
VSR is loaded). Instead, use a GPR pair to pass the 128-bits value and
load the VSR with mtvsrd/xxmrghd.

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
I'm avoiding newer insns like mtvsrdd/mfvsrld to move values between VSR
and GPR so we can run this test in a POWER8 machine.

v2:
 - No vector types to avoid endian-dependent code.
---
 tests/tcg/ppc64le/bcdsub.c | 117 +++++++++++++++++--------------------
 1 file changed, 53 insertions(+), 64 deletions(-)

diff --git a/tests/tcg/ppc64le/bcdsub.c b/tests/tcg/ppc64le/bcdsub.c
index 8c188cae6d..c9ca5357cb 100644
--- a/tests/tcg/ppc64le/bcdsub.c
+++ b/tests/tcg/ppc64le/bcdsub.c
@@ -1,6 +1,7 @@
 #include <assert.h>
 #include <unistd.h>
 #include <signal.h>
+#include <stdint.h>
 
 #define CRF_LT  (1 << 3)
 #define CRF_GT  (1 << 2)
@@ -8,21 +9,32 @@
 #define CRF_SO  (1 << 0)
 #define UNDEF   0
 
-#define BCDSUB(vra, vrb, ps)                    \
-    asm ("bcdsub. %1,%2,%3,%4;"                 \
-         "mfocrf %0,0b10;"                      \
-         : "=r" (cr), "=v" (vrt)                \
-         : "v" (vra), "v" (vrb), "i" (ps)       \
-         : );
+#define BCDSUB(AH, AL, BH, BL, PS)                          \
+    asm ("mtvsrd 32, %3\n\t"                                \
+         "mtvsrd 33, %4\n\t"                                \
+         "xxmrghd 32, 32, 33\n\t"                           \
+         "mtvsrd 33, %5\n\t"                                \
+         "mtvsrd 34, %6\n\t"                                \
+         "xxmrghd 33, 33, 34\n\t"                           \
+         "bcdsub. 0, 0, 1, %7\n\t"                          \
+         "mfocrf %0, 0b10\n\t"                              \
+         "mfvsrd %1, 32\n\t"                                \
+         "xxswapd 32, 32\n\t"                               \
+         "mfvsrd %2, 32\n\t"                                \
+         : "=r" (cr), "=r" (th), "=r" (tl)                  \
+         : "r" (AH), "r" (AL), "r" (BH), "r" (BL), "i" (PS) \
+         : "v0", "v1", "v2");
 
-#define TEST(vra, vrb, ps, exp_res, exp_cr6)    \
+#define TEST(AH, AL, BH, BL, PS, TH, TL, CR6)   \
     do {                                        \
-        __int128 vrt = 0;                       \
         int cr = 0;                             \
-        BCDSUB(vra, vrb, ps);                   \
-        if (exp_res)                            \
-            assert(vrt == exp_res);             \
-        assert((cr >> 4) == exp_cr6);           \
+        uint64_t th, tl;                        \
+        BCDSUB(AH, AL, BH, BL, PS);             \
+        if (TH || TL) {                         \
+            assert(tl == TL);                   \
+            assert(th == TH);                   \
+        }                                       \
+        assert((cr >> 4) == CR6);               \
     } while (0)
 
 
@@ -33,13 +45,13 @@
  */
 void test_bcdsub_eq(void)
 {
-    __int128 a, b;
-
     /* maximum positive BCD value */
-    a = b = (((__int128) 0x9999999999999999) << 64 | 0x999999999999999c);
-
-    TEST(a, b, 0, 0xc, CRF_EQ);
-    TEST(a, b, 1, 0xf, CRF_EQ);
+    TEST(0x9999999999999999, 0x999999999999999c,
+         0x9999999999999999, 0x999999999999999c,
+         0, 0x0, 0xc, CRF_EQ);
+    TEST(0x9999999999999999, 0x999999999999999c,
+         0x9999999999999999, 0x999999999999999c,
+         1, 0x0, 0xf, CRF_EQ);
 }
 
 /*
@@ -49,21 +61,16 @@ void test_bcdsub_eq(void)
  */
 void test_bcdsub_gt(void)
 {
-    __int128 a, b, c;
+    /* maximum positive and negative one BCD values */
+    TEST(0x9999999999999999, 0x999999999999999c, 0x0, 0x1d, 0,
+         0x0, 0xc, (CRF_GT | CRF_SO));
+    TEST(0x9999999999999999, 0x999999999999999c, 0x0, 0x1d, 1,
+         0x0, 0xf, (CRF_GT | CRF_SO));
 
-    /* maximum positive BCD value */
-    a = (((__int128) 0x9999999999999999) << 64 | 0x999999999999999c);
-
-    /* negative one BCD value */
-    b = (__int128) 0x1d;
-
-    TEST(a, b, 0, 0xc, (CRF_GT | CRF_SO));
-    TEST(a, b, 1, 0xf, (CRF_GT | CRF_SO));
-
-    c = (((__int128) 0x9999999999999999) << 64 | 0x999999999999998c);
-
-    TEST(c, b, 0, a, CRF_GT);
-    TEST(c, b, 1, (a | 0x3), CRF_GT);
+    TEST(0x9999999999999999, 0x999999999999998c, 0x0, 0x1d, 0,
+         0x9999999999999999, 0x999999999999999c, CRF_GT);
+    TEST(0x9999999999999999, 0x999999999999998c, 0x0, 0x1d, 1,
+         0x9999999999999999, 0x999999999999999f, CRF_GT);
 }
 
 /*
@@ -73,45 +80,27 @@ void test_bcdsub_gt(void)
  */
 void test_bcdsub_lt(void)
 {
-    __int128 a, b;
+    /* positive zero and positive one BCD values */
+    TEST(0x0, 0xc, 0x0, 0x1c, 0, 0x0, 0x1d, CRF_LT);
+    TEST(0x0, 0xc, 0x0, 0x1c, 1, 0x0, 0x1d, CRF_LT);
 
-    /* positive zero BCD value */
-    a = (__int128) 0xc;
-
-    /* positive one BCD value */
-    b = (__int128) 0x1c;
-
-    TEST(a, b, 0, 0x1d, CRF_LT);
-    TEST(a, b, 1, 0x1d, CRF_LT);
-
-    /* maximum negative BCD value */
-    a = (((__int128) 0x9999999999999999) << 64 | 0x999999999999999d);
-
-    /* positive one BCD value */
-    b = (__int128) 0x1c;
-
-    TEST(a, b, 0, 0xd, (CRF_LT | CRF_SO));
-    TEST(a, b, 1, 0xd, (CRF_LT | CRF_SO));
+    /* maximum negative and positive one BCD values */
+    TEST(0x9999999999999999, 0x999999999999999d, 0x0, 0x1c, 0,
+         0x0, 0xd, (CRF_LT | CRF_SO));
+    TEST(0x9999999999999999, 0x999999999999999d, 0x0, 0x1c, 1,
+         0x0, 0xd, (CRF_LT | CRF_SO));
 }
 
 void test_bcdsub_invalid(void)
 {
-    __int128 a, b;
+    TEST(0x0, 0x1c, 0x0, 0xf00, 0, UNDEF, UNDEF, CRF_SO);
+    TEST(0x0, 0x1c, 0x0, 0xf00, 1, UNDEF, UNDEF, CRF_SO);
 
-    /* positive one BCD value */
-    a = (__int128) 0x1c;
-    b = 0xf00;
+    TEST(0x0, 0xf00, 0x0, 0x1c, 0, UNDEF, UNDEF, CRF_SO);
+    TEST(0x0, 0xf00, 0x0, 0x1c, 1, UNDEF, UNDEF, CRF_SO);
 
-    TEST(a, b, 0, UNDEF, CRF_SO);
-    TEST(a, b, 1, UNDEF, CRF_SO);
-
-    TEST(b, a, 0, UNDEF, CRF_SO);
-    TEST(b, a, 1, UNDEF, CRF_SO);
-
-    a = 0xbad;
-
-    TEST(a, b, 0, UNDEF, CRF_SO);
-    TEST(a, b, 1, UNDEF, CRF_SO);
+    TEST(0x0, 0xbad, 0x0, 0xf00, 0, UNDEF, UNDEF, CRF_SO);
+    TEST(0x0, 0xbad, 0x0, 0xf00, 1, UNDEF, UNDEF, CRF_SO);
 }
 
 int main(void)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v2 4/5] tests/tcg/ppc64le: emit bcdsub with .long when needed
  2022-03-03 17:20 [PATCH v2 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang matheus.ferst
                   ` (2 preceding siblings ...)
  2022-03-03 17:20 ` [RFC PATCH v2 3/5] tests/tcg/ppc64le: drop __int128 usage in bcdsub matheus.ferst
@ 2022-03-03 17:20 ` matheus.ferst
  2022-03-03 19:19   ` Richard Henderson
  2022-03-03 17:20 ` [PATCH v2 5/5] tests/tcg/ppc64le: Use Altivec register names in clobbler list matheus.ferst
  4 siblings, 1 reply; 14+ messages in thread
From: matheus.ferst @ 2022-03-03 17:20 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: thuth, Matheus Ferst, danielhb413, richard.henderson, groug,
	philippe.mathieu.daude, clg, mrezanin, alex.bennee, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Based on GCC docs[1], we use the '-mpower8-vector' flag at config-time
to detect the toolchain support to the bcdsub instruction. LLVM/Clang
supports this flag since version 3.6[2], but the instruction and related
builtins were only added in LLVM 14[3]. In the absence of other means to
detect this support at config-time, we resort to __has_builtin to
identify the presence of __builtin_bcdsub at compile-time. If the
builtin is not available, the instruction is emitted with a ".long".

[1] https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/PowerPC-AltiVec_002fVSX-Built-in-Functions.html
[2] https://github.com/llvm/llvm-project/commit/59eb767e11d4ffefb5f55409524e5c8416b2b0db
[3] https://github.com/llvm/llvm-project/commit/c933c2eb334660c131f4afc9d194fafb0cec0423

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 tests/tcg/ppc64le/bcdsub.c | 55 +++++++++++++++++++++-----------------
 1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/tests/tcg/ppc64le/bcdsub.c b/tests/tcg/ppc64le/bcdsub.c
index c9ca5357cb..445d50f07d 100644
--- a/tests/tcg/ppc64le/bcdsub.c
+++ b/tests/tcg/ppc64le/bcdsub.c
@@ -9,32 +9,37 @@
 #define CRF_SO  (1 << 0)
 #define UNDEF   0
 
-#define BCDSUB(AH, AL, BH, BL, PS)                          \
-    asm ("mtvsrd 32, %3\n\t"                                \
-         "mtvsrd 33, %4\n\t"                                \
-         "xxmrghd 32, 32, 33\n\t"                           \
-         "mtvsrd 33, %5\n\t"                                \
-         "mtvsrd 34, %6\n\t"                                \
-         "xxmrghd 33, 33, 34\n\t"                           \
-         "bcdsub. 0, 0, 1, %7\n\t"                          \
-         "mfocrf %0, 0b10\n\t"                              \
-         "mfvsrd %1, 32\n\t"                                \
-         "xxswapd 32, 32\n\t"                               \
-         "mfvsrd %2, 32\n\t"                                \
-         : "=r" (cr), "=r" (th), "=r" (tl)                  \
-         : "r" (AH), "r" (AL), "r" (BH), "r" (BL), "i" (PS) \
-         : "v0", "v1", "v2");
+#if defined(__has_builtin) && !__has_builtin(__builtin_bcdsub)
+#define BCDSUB(T, A, B, PS) \
+    ".long 4 << 26 | (" #T ") << 21 | (" #A ") << 16 | (" #B ") << 11"  \
+    " | 1 << 10 | (" #PS ") << 9 | 65\n\t"
+#else
+#define BCDSUB(T, A, B, PS) "bcdsub. " #T ", " #A ", " #B ", " #PS "\n\t"
+#endif
 
-#define TEST(AH, AL, BH, BL, PS, TH, TL, CR6)   \
-    do {                                        \
-        int cr = 0;                             \
-        uint64_t th, tl;                        \
-        BCDSUB(AH, AL, BH, BL, PS);             \
-        if (TH || TL) {                         \
-            assert(tl == TL);                   \
-            assert(th == TH);                   \
-        }                                       \
-        assert((cr >> 4) == CR6);               \
+#define TEST(AH, AL, BH, BL, PS, TH, TL, CR6)                   \
+    do {                                                        \
+        int cr = 0;                                             \
+        uint64_t th, tl;                                        \
+        asm ("mtvsrd 32, %3\n\t"                                \
+             "mtvsrd 33, %4\n\t"                                \
+             "xxmrghd 32, 32, 33\n\t"                           \
+             "mtvsrd 33, %5\n\t"                                \
+             "mtvsrd 34, %6\n\t"                                \
+             "xxmrghd 33, 33, 34\n\t"                           \
+             BCDSUB(0, 0, 1, PS)                                \
+             "mfocrf %0, 0b10\n\t"                              \
+             "mfvsrd %1, 32\n\t"                                \
+             "xxswapd 32, 32\n\t"                               \
+             "mfvsrd %2, 32\n\t"                                \
+             : "=r" (cr), "=r" (th), "=r" (tl)                  \
+             : "r" (AH), "r" (AL), "r" (BH), "r" (BL)           \
+             : "v0", "v1", "v2");                               \
+        if (TH || TL) {                                         \
+            assert(tl == TL);                                   \
+            assert(th == TH);                                   \
+        }                                                       \
+        assert((cr >> 4) == CR6);                               \
     } while (0)
 
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 5/5] tests/tcg/ppc64le: Use Altivec register names in clobbler list
  2022-03-03 17:20 [PATCH v2 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang matheus.ferst
                   ` (3 preceding siblings ...)
  2022-03-03 17:20 ` [RFC PATCH v2 4/5] tests/tcg/ppc64le: emit bcdsub with .long when needed matheus.ferst
@ 2022-03-03 17:20 ` matheus.ferst
  2022-03-03 19:30   ` Richard Henderson
  4 siblings, 1 reply; 14+ messages in thread
From: matheus.ferst @ 2022-03-03 17:20 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: thuth, Matheus Ferst, danielhb413, richard.henderson, groug,
	philippe.mathieu.daude, clg, mrezanin, alex.bennee, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

LLVM/Clang doesn't know the VSX registers when compiling with
-mabi=elfv1. Use only registers >= 32 and list them with their Altivec
name.

Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 tests/tcg/ppc64le/non_signalling_xscv.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/tests/tcg/ppc64le/non_signalling_xscv.c b/tests/tcg/ppc64le/non_signalling_xscv.c
index 91e25cad46..836df71ef0 100644
--- a/tests/tcg/ppc64le/non_signalling_xscv.c
+++ b/tests/tcg/ppc64le/non_signalling_xscv.c
@@ -6,16 +6,16 @@
 #define TEST(INSN, B_HI, B_LO, T_HI, T_LO) \
     do {                                                                \
         uint64_t th, tl, bh = B_HI, bl = B_LO;                          \
-        asm("mtvsrd 0, %2\n\t"                                          \
-            "mtvsrd 1, %3\n\t"                                          \
-            "xxmrghd 0, 0, 1\n\t"                                       \
-            INSN " 0, 0\n\t"                                            \
-            "mfvsrd %0, 0\n\t"                                          \
-            "xxswapd 0, 0\n\t"                                          \
-            "mfvsrd %1, 0\n\t"                                          \
+        asm("mtvsrd 32, %2\n\t"                                         \
+            "mtvsrd 33, %3\n\t"                                         \
+            "xxmrghd 32, 32, 33\n\t"                                    \
+            INSN " 32, 32\n\t"                                          \
+            "mfvsrd %0, 32\n\t"                                         \
+            "xxswapd 32, 32\n\t"                                        \
+            "mfvsrd %1, 32\n\t"                                         \
             : "=r" (th), "=r" (tl)                                      \
             : "r" (bh), "r" (bl)                                        \
-            : "vs0", "vs1");                                            \
+            : "v0", "v1");                                              \
         printf(INSN "(0x%016" PRIx64 "%016" PRIx64 ") = 0x%016" PRIx64  \
                "%016" PRIx64 "\n", bh, bl, th, tl);                     \
         assert(th == T_HI && tl == T_LO);                               \
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/5] tests/tcg/ppc64le: use inline asm instead of __builtin_mtfsf
  2022-03-03 17:20 ` [PATCH v2 1/5] tests/tcg/ppc64le: use inline asm instead of __builtin_mtfsf matheus.ferst
@ 2022-03-03 18:46   ` Richard Henderson
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2022-03-03 18:46 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: thuth, danielhb413, groug, philippe.mathieu.daude, clg, mrezanin,
	alex.bennee, david

On 3/3/22 07:20, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> LLVM/Clang does not support __builtin_mtfsf.
> 
> Acked-by: Alex Bennée<alex.bennee@linaro.org>
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   tests/tcg/ppc64le/mtfsf.c | 19 +++++++++----------
>   1 file changed, 9 insertions(+), 10 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/5] target/ppc: change xs[n]madd[am]sp to use float64r32_muladd
  2022-03-03 17:20 ` [PATCH v2 2/5] target/ppc: change xs[n]madd[am]sp to use float64r32_muladd matheus.ferst
@ 2022-03-03 18:49   ` Richard Henderson
  2022-03-03 20:53     ` Matheus K. Ferst
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2022-03-03 18:49 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: thuth, danielhb413, groug, philippe.mathieu.daude, clg, mrezanin,
	alex.bennee, david

On 3/3/22 07:20, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
> 
> Change VSX Scalar Multiply-Add/Subtract Type-A/M Single Precision
> helpers to use float64r32_muladd. This method should correctly handle
> all rounding modes, so the workaround for float_round_nearest_even can
> be dropped.
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
> ---
>   target/ppc/fpu_helper.c | 93 ++++++++++++++++-------------------------
>   1 file changed, 35 insertions(+), 58 deletions(-)
> 
> diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
> index 8f970288f5..c973968ed6 100644
> --- a/target/ppc/fpu_helper.c
> +++ b/target/ppc/fpu_helper.c
> @@ -1916,22 +1916,19 @@ void helper_xsdivqp(CPUPPCState *env, uint32_t opcode,
>    *   fld   - vsr_t field (VsrD(*) or VsrW(*))
>    *   sfprf - set FPRF
>    */
> -#define VSX_RE(op, nels, tp, fld, sfprf, r2sp)                                \
> +#define VSX_RE(op, nels, tp, op_tp, fld, sfprf)                               \
>   void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)              \
>   {                                                                             \
>       ppc_vsr_t t = { };                                                        \
> -    int i;                                                                    \
> +    int i, flags;                                                             \
>                                                                                 \
>       helper_reset_fpstatus(env);                                               \
>                                                                                 \
>       for (i = 0; i < nels; i++) {                                              \
> -        if (unlikely(tp##_is_signaling_nan(xb->fld, &env->fp_status))) {      \
> +        t.fld = op_tp##_div(tp##_one, xb->fld, &env->fp_status);              \
> +        flags = get_float_exception_flags(&env->fp_status);                   \
> +        if (unlikely(flags & float_flag_invalid_snan)) {   

You seem to have squashed the change to recip-estimate into the same patch as muladd.


> -#define VSX_RSQRTE(op, nels, tp, fld, sfprf, r2sp)                           \
> +#define VSX_RSQRTE(op, nels, tp, op_tp, fld, sfprf)                          \

And recip-sqrt-estimate.

I guess it's ok to squash, since it's all related, but you should update the patch 
description if you leave it this way.


r~


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v2 3/5] tests/tcg/ppc64le: drop __int128 usage in bcdsub
  2022-03-03 17:20 ` [RFC PATCH v2 3/5] tests/tcg/ppc64le: drop __int128 usage in bcdsub matheus.ferst
@ 2022-03-03 18:57   ` Richard Henderson
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2022-03-03 18:57 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: thuth, danielhb413, groug, philippe.mathieu.daude, clg, mrezanin,
	alex.bennee, david

On 3/3/22 07:20, matheus.ferst@eldorado.org.br wrote:
> I'm avoiding newer insns like mtvsrdd/mfvsrld to move values between VSR
> and GPR so we can run this test in a POWER8 machine.

This...

> +#define BCDSUB(AH, AL, BH, BL, PS)                          \
> +    asm ("mtvsrd 32, %3\n\t"                                \
> +         "mtvsrd 33, %4\n\t"                                \
> +         "xxmrghd 32, 32, 33\n\t"                           \
> +         "mtvsrd 33, %5\n\t"                                \
> +         "mtvsrd 34, %6\n\t"                                \
> +         "xxmrghd 33, 33, 34\n\t"                           \
> +         "bcdsub. 0, 0, 1, %7\n\t"                          \
> +         "mfocrf %0, 0b10\n\t"                              \
> +         "mfvsrd %1, 32\n\t"                                \
> +         "xxswapd 32, 32\n\t"                               \
> +         "mfvsrd %2, 32\n\t"                                \
> +         : "=r" (cr), "=r" (th), "=r" (tl)                  \
> +         : "r" (AH), "r" (AL), "r" (BH), "r" (BL), "i" (PS) \
> +         : "v0", "v1", "v2");

... belongs here as a comment.

> +        if (TH || TL) {                         \

Would be clearer with TH != UNDEF || TL != UNDEF.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v2 4/5] tests/tcg/ppc64le: emit bcdsub with .long when needed
  2022-03-03 17:20 ` [RFC PATCH v2 4/5] tests/tcg/ppc64le: emit bcdsub with .long when needed matheus.ferst
@ 2022-03-03 19:19   ` Richard Henderson
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2022-03-03 19:19 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: thuth, danielhb413, groug, philippe.mathieu.daude, clg, mrezanin,
	alex.bennee, david

On 3/3/22 07:20, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst<matheus.ferst@eldorado.org.br>
> 
> Based on GCC docs[1], we use the '-mpower8-vector' flag at config-time
> to detect the toolchain support to the bcdsub instruction. LLVM/Clang
> supports this flag since version 3.6[2], but the instruction and related
> builtins were only added in LLVM 14[3]. In the absence of other means to
> detect this support at config-time, we resort to __has_builtin to
> identify the presence of __builtin_bcdsub at compile-time. If the
> builtin is not available, the instruction is emitted with a ".long".
> 
> [1]https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/PowerPC-AltiVec_002fVSX-Built-in-Functions.html
> [2]https://github.com/llvm/llvm-project/commit/59eb767e11d4ffefb5f55409524e5c8416b2b0db
> [3]https://github.com/llvm/llvm-project/commit/c933c2eb334660c131f4afc9d194fafb0cec0423
> 
> Signed-off-by: Matheus Ferst<matheus.ferst@eldorado.org.br>
> ---
>   tests/tcg/ppc64le/bcdsub.c | 55 +++++++++++++++++++++-----------------
>   1 file changed, 30 insertions(+), 25 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 5/5] tests/tcg/ppc64le: Use Altivec register names in clobbler list
  2022-03-03 17:20 ` [PATCH v2 5/5] tests/tcg/ppc64le: Use Altivec register names in clobbler list matheus.ferst
@ 2022-03-03 19:30   ` Richard Henderson
  2022-03-03 20:53     ` Matheus K. Ferst
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2022-03-03 19:30 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: thuth, danielhb413, groug, philippe.mathieu.daude, clg, mrezanin,
	alex.bennee, david

On 3/3/22 07:20, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
> 
> LLVM/Clang doesn't know the VSX registers when compiling with
> -mabi=elfv1. Use only registers >= 32 and list them with their Altivec
> name.
> 
> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>

This description isn't quite right.  The change to the m[tf]vsr insns is a generic bug 
fix, and not related to Clang.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/5] target/ppc: change xs[n]madd[am]sp to use float64r32_muladd
  2022-03-03 18:49   ` Richard Henderson
@ 2022-03-03 20:53     ` Matheus K. Ferst
  0 siblings, 0 replies; 14+ messages in thread
From: Matheus K. Ferst @ 2022-03-03 20:53 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel, qemu-ppc
  Cc: thuth, danielhb413, groug, philippe.mathieu.daude, clg, mrezanin,
	alex.bennee, david

On 03/03/2022 15:49, Richard Henderson wrote:
> On 3/3/22 07:20, matheus.ferst@eldorado.org.br wrote:
>> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>
>> Change VSX Scalar Multiply-Add/Subtract Type-A/M Single Precision
>> helpers to use float64r32_muladd. This method should correctly handle
>> all rounding modes, so the workaround for float_round_nearest_even can
>> be dropped.
>>
>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
>> ---
>>   target/ppc/fpu_helper.c | 93 ++++++++++++++++-------------------------
>>   1 file changed, 35 insertions(+), 58 deletions(-)
>>
>> diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
>> index 8f970288f5..c973968ed6 100644
>> --- a/target/ppc/fpu_helper.c
>> +++ b/target/ppc/fpu_helper.c
>> @@ -1916,22 +1916,19 @@ void helper_xsdivqp(CPUPPCState *env, uint32_t 
>> opcode,
>>    *   fld   - vsr_t field (VsrD(*) or VsrW(*))
>>    *   sfprf - set FPRF
>>    */
>> -#define VSX_RE(op, nels, tp, fld, sfprf, 
>> r2sp)                                \
>> +#define VSX_RE(op, nels, tp, op_tp, fld, 
>> sfprf)                               \
>>   void helper_##op(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t 
>> *xb)              \
>>   
>> {                                                                             
>> \
>>       ppc_vsr_t t = { 
>> };                                                        \
>> -    int 
>> i;                                                                    \
>> +    int i, 
>> flags;                                                             \
>>                                                                                 
>> \
>>       
>> helper_reset_fpstatus(env);                                               
>> \
>>                                                                                 
>> \
>>       for (i = 0; i < nels; i++) 
>> {                                              \
>> -        if (unlikely(tp##_is_signaling_nan(xb->fld, 
>> &env->fp_status))) {      \
>> +        t.fld = op_tp##_div(tp##_one, xb->fld, 
>> &env->fp_status);              \
>> +        flags = 
>> get_float_exception_flags(&env->fp_status);                   \
>> +        if (unlikely(flags & float_flag_invalid_snan)) {
> 
> You seem to have squashed the change to recip-estimate into the same 
> patch as muladd.
> 
> 
>> -#define VSX_RSQRTE(op, nels, tp, fld, sfprf, 
>> r2sp)                           \
>> +#define VSX_RSQRTE(op, nels, tp, op_tp, fld, 
>> sfprf)                          \
> 
> And recip-sqrt-estimate.
> 
> I guess it's ok to squash, since it's all related, but you should update 
> the patch
> description if you leave it this way.
> 

Sorry, I cherry-picked the wrong branch. This patch should just be a 
rebase of v1. I'll send the changes to 
VSX_{ADD_SUB,MUL,DIV,RE,SQRT,RSQRTE} in a separate patch series since 
it's not test-related.

Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Analista de Software
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 5/5] tests/tcg/ppc64le: Use Altivec register names in clobbler list
  2022-03-03 19:30   ` Richard Henderson
@ 2022-03-03 20:53     ` Matheus K. Ferst
  2022-03-03 21:05       ` Richard Henderson
  0 siblings, 1 reply; 14+ messages in thread
From: Matheus K. Ferst @ 2022-03-03 20:53 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel, qemu-ppc
  Cc: thuth, danielhb413, groug, philippe.mathieu.daude, clg, mrezanin,
	alex.bennee, david

On 03/03/2022 16:30, Richard Henderson wrote:
> On 3/3/22 07:20, matheus.ferst@eldorado.org.br wrote:
>> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>
>> LLVM/Clang doesn't know the VSX registers when compiling with
>> -mabi=elfv1. Use only registers >= 32 and list them with their Altivec
>> name.
>>
>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
> 
> This description isn't quite right.  The change to the m[tf]vsr insns is 
> a generic bug
> fix, and not related to Clang.
> 

I'm not sure if I understood. I'm targeting the Clang problem with this 
patch, is something else being fixed by this change?
AFAICT, the old "mtvsrd 0, %2" and "mfvsrd %0, 0" were correct, I'm just 
changing from VSR 0 to VSR 32 to allow the clobber with Clang, but GCC 
doesn't seem to have this limitation with ELFv1.

Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Analista de Software
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 5/5] tests/tcg/ppc64le: Use Altivec register names in clobbler list
  2022-03-03 20:53     ` Matheus K. Ferst
@ 2022-03-03 21:05       ` Richard Henderson
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2022-03-03 21:05 UTC (permalink / raw)
  To: Matheus K. Ferst, qemu-devel, qemu-ppc
  Cc: thuth, danielhb413, groug, philippe.mathieu.daude, clg, mrezanin,
	alex.bennee, david

On 3/3/22 10:53, Matheus K. Ferst wrote:
> On 03/03/2022 16:30, Richard Henderson wrote:
>> On 3/3/22 07:20, matheus.ferst@eldorado.org.br wrote:
>>> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>>
>>> LLVM/Clang doesn't know the VSX registers when compiling with
>>> -mabi=elfv1. Use only registers >= 32 and list them with their Altivec
>>> name.
>>>
>>> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
>>
>> This description isn't quite right.  The change to the m[tf]vsr insns is a generic bug
>> fix, and not related to Clang.
>>
> 
> I'm not sure if I understood. I'm targeting the Clang problem with this patch, is 
> something else being fixed by this change?
> AFAICT, the old "mtvsrd 0, %2" and "mfvsrd %0, 0" were correct, I'm just changing from VSR 
> 0 to VSR 32 to allow the clobber with Clang, but GCC doesn't seem to have this limitation 
> with ELFv1.

Oh, whoops, I mis-read the patch.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-03-03 21:28 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-03 17:20 [PATCH v2 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang matheus.ferst
2022-03-03 17:20 ` [PATCH v2 1/5] tests/tcg/ppc64le: use inline asm instead of __builtin_mtfsf matheus.ferst
2022-03-03 18:46   ` Richard Henderson
2022-03-03 17:20 ` [PATCH v2 2/5] target/ppc: change xs[n]madd[am]sp to use float64r32_muladd matheus.ferst
2022-03-03 18:49   ` Richard Henderson
2022-03-03 20:53     ` Matheus K. Ferst
2022-03-03 17:20 ` [RFC PATCH v2 3/5] tests/tcg/ppc64le: drop __int128 usage in bcdsub matheus.ferst
2022-03-03 18:57   ` Richard Henderson
2022-03-03 17:20 ` [RFC PATCH v2 4/5] tests/tcg/ppc64le: emit bcdsub with .long when needed matheus.ferst
2022-03-03 19:19   ` Richard Henderson
2022-03-03 17:20 ` [PATCH v2 5/5] tests/tcg/ppc64le: Use Altivec register names in clobbler list matheus.ferst
2022-03-03 19:30   ` Richard Henderson
2022-03-03 20:53     ` Matheus K. Ferst
2022-03-03 21:05       ` Richard Henderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.