All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang
@ 2022-03-04 16:54 matheus.ferst
  2022-03-04 16:54 ` [PATCH v3 1/5] tests/tcg/ppc64le: use inline asm instead of __builtin_mtfsf matheus.ferst
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: matheus.ferst @ 2022-03-04 16:54 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: thuth, Matheus Ferst, danielhb413, richard.henderson, groug,
	philippe.mathieu.daude, clg, mrezanin, alex.bennee, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

As the configuration scripts used -mbig and -mlittle, building PPC tests
with Clang was silently skipped. With the patch to fix these options[1],
"make check-tcg" fails because of build and runtime errors. This patch
series tries to fix some of these problems.

The first patch fixes tests/tcg/ppc64le/mtfsf.c by removing the GCC-only
builtins used to emit mtfsf and mffs. We can emit these insns with
inline asm instead.

The second patch addresses differences in the output of float_madds.c.
The __builtin_fmaf used in this test emits fmadds with GCC and
xsmaddasp with LLVM. The first insn had rounding errors fixed in
d04ca895dc7f ("target/ppc: Add helpers for fmadds et al"), we apply a
similar fix to xsmaddasp.

Then we have build and runtime errors of bcdsub. According to GCC
docs[2], the '-mpower8-vector' flag provides some bcdsub builtins, so
it'd be reasonable to assume that the rest of the toolchain knows about
the insn if the compiler accepts this flag. Clang has supported this
flag since version 3.6[3], but the insn and builtins were only added in
LLVM 14[4]. I couldn't find a good config-time solution, so we use
__has_builtin to check for __builtin_bcdsub at compile-time and, if not
available, emit the opcode with a ".long." If __has_builtin itself is
not available, we assume that -mpower8-vector is enough and that the
insn is available.

Even building with a newer Clang that accepts the bcdsub insn, the test
fails at runtime because LLVM doesn't like "__int128" in inline asm. No
error or warning is emitted, but the generated code only loads one
doubleword of the VSR. The third patch avoids this issue by passing the
VSR values in GPR pairs, as we did in
84ade98e87ea ("target/ppc: do not silence snan in xscvspdpn").

The last patch fixes tests/tcg/ppc64le/non_signalling_xscv.c build with
-mabi=elfv1. Clang only recognizes VSX register in the clobber list of
inline asm when using ELFv2, so we use VSRs >= 32 and list them by their
Altivec name.

Finally, the insns tested by tests/tcg/ppc64le/byte_reverse.c are not
yet supported by LLVM. Since the configuration script uses '-mpower10'
to check for POWER10 support and Clang doesn't support this flag,
"make check-tcg" doesn't fail. We should probably change this check in
the future, but since LLVM support of POWER10 seems incomplete, I guess
we can leave it for now.

v3:
 - Removed unrelated changes in patch 2
 - Fixed __has_builtin check in bcdsub

v2:
 - New patch to address non_signalling_xscv.c build problems with Clang
   and ELFv1;
 - Rework of bcdsub patch to work with LLVM < 14 and avoid vector types.

[1] https://lists.gnu.org/archive/html/qemu-devel/2022-01/msg06506.html
[2] https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/PowerPC-AltiVec_002fVSX-Built-in-Functions.html
[3] https://github.com/llvm/llvm-project/commit/59eb767e11d4ffefb5f55409524e5c8416b2b0db
[4] https://github.com/llvm/llvm-project/commit/c933c2eb334660c131f4afc9d194fafb0cec0423

Matheus Ferst (5):
  tests/tcg/ppc64le: use inline asm instead of __builtin_mtfsf
  target/ppc: change xs[n]madd[am]sp to use float64r32_muladd
  tests/tcg/ppc64le: drop __int128 usage in bcdsub
  tests/tcg/ppc64le: emit bcdsub with .long when needed
  tests/tcg/ppc64le: Use Altivec register names in clobber list

 target/ppc/fpu_helper.c                 |  54 ++++------
 tests/tcg/ppc64le/bcdsub.c              | 138 ++++++++++++------------
 tests/tcg/ppc64le/mtfsf.c               |  19 ++--
 tests/tcg/ppc64le/non_signalling_xscv.c |  16 +--
 4 files changed, 106 insertions(+), 121 deletions(-)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v3 1/5] tests/tcg/ppc64le: use inline asm instead of __builtin_mtfsf
  2022-03-04 16:54 [PATCH v3 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang matheus.ferst
@ 2022-03-04 16:54 ` matheus.ferst
  2022-03-04 16:54 ` [PATCH v3 2/5] target/ppc: change xs[n]madd[am]sp to use float64r32_muladd matheus.ferst
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: matheus.ferst @ 2022-03-04 16:54 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: thuth, Matheus Ferst, danielhb413, richard.henderson, groug,
	philippe.mathieu.daude, clg, mrezanin, alex.bennee, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

LLVM/Clang does not support __builtin_mtfsf.

Acked-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 tests/tcg/ppc64le/mtfsf.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/tests/tcg/ppc64le/mtfsf.c b/tests/tcg/ppc64le/mtfsf.c
index b3d31f3637..bed5b1afa4 100644
--- a/tests/tcg/ppc64le/mtfsf.c
+++ b/tests/tcg/ppc64le/mtfsf.c
@@ -1,8 +1,12 @@
 #include <stdlib.h>
+#include <stdint.h>
 #include <assert.h>
 #include <signal.h>
 #include <sys/prctl.h>
 
+#define MTFSF(FLM, FRB) asm volatile ("mtfsf %0, %1" :: "i" (FLM), "f" (FRB))
+#define MFFS(FRT) asm("mffs %0" : "=f" (FRT))
+
 #define FPSCR_VE     7  /* Floating-point invalid operation exception enable */
 #define FPSCR_VXSOFT 10 /* Floating-point invalid operation exception (soft) */
 #define FPSCR_FI     17 /* Floating-point fraction inexact                   */
@@ -21,10 +25,7 @@ void sigfpe_handler(int sig, siginfo_t *si, void *ucontext)
 
 int main(void)
 {
-    union {
-        double d;
-        long long ll;
-    } fpscr;
+    uint64_t fpscr;
 
     struct sigaction sa = {
         .sa_sigaction = sigfpe_handler,
@@ -40,10 +41,9 @@ int main(void)
     prctl(PR_SET_FPEXC, PR_FP_EXC_PRECISE);
 
     /* First test if the FI bit is being set correctly */
-    fpscr.ll = FP_FI;
-    __builtin_mtfsf(0b11111111, fpscr.d);
-    fpscr.d = __builtin_mffs();
-    assert((fpscr.ll & FP_FI) != 0);
+    MTFSF(0b11111111, FP_FI);
+    MFFS(fpscr);
+    assert((fpscr & FP_FI) != 0);
 
     /* Then test if the deferred exception is being called correctly */
     sigaction(SIGFPE, &sa, NULL);
@@ -54,8 +54,7 @@ int main(void)
      * But if a different exception is chosen si_code check should
      * change accordingly.
      */
-    fpscr.ll = FP_VE | FP_VXSOFT;
-    __builtin_mtfsf(0b11111111, fpscr.d);
+    MTFSF(0b11111111, FP_VE | FP_VXSOFT);
 
     return 1;
 }
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v3 2/5] target/ppc: change xs[n]madd[am]sp to use float64r32_muladd
  2022-03-04 16:54 [PATCH v3 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang matheus.ferst
  2022-03-04 16:54 ` [PATCH v3 1/5] tests/tcg/ppc64le: use inline asm instead of __builtin_mtfsf matheus.ferst
@ 2022-03-04 16:54 ` matheus.ferst
  2022-03-04 16:54 ` [PATCH v3 3/5] tests/tcg/ppc64le: drop __int128 usage in bcdsub matheus.ferst
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: matheus.ferst @ 2022-03-04 16:54 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: thuth, Matheus Ferst, danielhb413, richard.henderson, groug,
	philippe.mathieu.daude, clg, mrezanin, alex.bennee, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Change VSX Scalar Multiply-Add/Subtract Type-A/M Single Precision
helpers to use float64r32_muladd. This method should correctly handle
all rounding modes, so the workaround for float_round_nearest_even can
be dropped.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v3:
 - Removed unrelated changes;
---
 target/ppc/fpu_helper.c | 54 ++++++++++++++---------------------------
 1 file changed, 18 insertions(+), 36 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 8f970288f5..2cad05c9cf 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2156,9 +2156,8 @@ VSX_TSQRT(xvtsqrtsp, 4, float32, VsrW(i), -126, 23)
  *   maddflgs - flags for the float*muladd routine that control the
  *           various forms (madd, msub, nmadd, nmsub)
  *   sfprf - set FPRF
- *   r2sp  - round intermediate double precision result to single precision
  */
-#define VSX_MADD(op, nels, tp, fld, maddflgs, sfprf, r2sp)                    \
+#define VSX_MADD(op, nels, tp, fld, maddflgs, sfprf)                          \
 void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                             \
                  ppc_vsr_t *s1, ppc_vsr_t *s2, ppc_vsr_t *s3)                 \
 {                                                                             \
@@ -2170,20 +2169,7 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                             \
     for (i = 0; i < nels; i++) {                                              \
         float_status tstat = env->fp_status;                                  \
         set_float_exception_flags(0, &tstat);                                 \
-        if (r2sp && (tstat.float_rounding_mode == float_round_nearest_even)) {\
-            /*                                                                \
-             * Avoid double rounding errors by rounding the intermediate      \
-             * result to odd.                                                 \
-             */                                                               \
-            set_float_rounding_mode(float_round_to_zero, &tstat);             \
-            t.fld = tp##_muladd(s1->fld, s3->fld, s2->fld,                    \
-                                maddflgs, &tstat);                            \
-            t.fld |= (get_float_exception_flags(&tstat) &                     \
-                      float_flag_inexact) != 0;                               \
-        } else {                                                              \
-            t.fld = tp##_muladd(s1->fld, s3->fld, s2->fld,                    \
-                                maddflgs, &tstat);                            \
-        }                                                                     \
+        t.fld = tp##_muladd(s1->fld, s3->fld, s2->fld, maddflgs, &tstat);     \
         env->fp_status.float_exception_flags |= tstat.float_exception_flags;  \
                                                                               \
         if (unlikely(tstat.float_exception_flags & float_flag_invalid)) {     \
@@ -2191,10 +2177,6 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                             \
                                   sfprf, GETPC());                            \
         }                                                                     \
                                                                               \
-        if (r2sp) {                                                           \
-            t.fld = do_frsp(env, t.fld, GETPC());                             \
-        }                                                                     \
-                                                                              \
         if (sfprf) {                                                          \
             helper_compute_fprf_float64(env, t.fld);                          \
         }                                                                     \
@@ -2203,24 +2185,24 @@ void helper_##op(CPUPPCState *env, ppc_vsr_t *xt,                             \
     do_float_check_status(env, GETPC());                                      \
 }
 
-VSX_MADD(XSMADDDP, 1, float64, VsrD(0), MADD_FLGS, 1, 0)
-VSX_MADD(XSMSUBDP, 1, float64, VsrD(0), MSUB_FLGS, 1, 0)
-VSX_MADD(XSNMADDDP, 1, float64, VsrD(0), NMADD_FLGS, 1, 0)
-VSX_MADD(XSNMSUBDP, 1, float64, VsrD(0), NMSUB_FLGS, 1, 0)
-VSX_MADD(XSMADDSP, 1, float64, VsrD(0), MADD_FLGS, 1, 1)
-VSX_MADD(XSMSUBSP, 1, float64, VsrD(0), MSUB_FLGS, 1, 1)
-VSX_MADD(XSNMADDSP, 1, float64, VsrD(0), NMADD_FLGS, 1, 1)
-VSX_MADD(XSNMSUBSP, 1, float64, VsrD(0), NMSUB_FLGS, 1, 1)
+VSX_MADD(XSMADDDP, 1, float64, VsrD(0), MADD_FLGS, 1)
+VSX_MADD(XSMSUBDP, 1, float64, VsrD(0), MSUB_FLGS, 1)
+VSX_MADD(XSNMADDDP, 1, float64, VsrD(0), NMADD_FLGS, 1)
+VSX_MADD(XSNMSUBDP, 1, float64, VsrD(0), NMSUB_FLGS, 1)
+VSX_MADD(XSMADDSP, 1, float64r32, VsrD(0), MADD_FLGS, 1)
+VSX_MADD(XSMSUBSP, 1, float64r32, VsrD(0), MSUB_FLGS, 1)
+VSX_MADD(XSNMADDSP, 1, float64r32, VsrD(0), NMADD_FLGS, 1)
+VSX_MADD(XSNMSUBSP, 1, float64r32, VsrD(0), NMSUB_FLGS, 1)
 
-VSX_MADD(xvmadddp, 2, float64, VsrD(i), MADD_FLGS, 0, 0)
-VSX_MADD(xvmsubdp, 2, float64, VsrD(i), MSUB_FLGS, 0, 0)
-VSX_MADD(xvnmadddp, 2, float64, VsrD(i), NMADD_FLGS, 0, 0)
-VSX_MADD(xvnmsubdp, 2, float64, VsrD(i), NMSUB_FLGS, 0, 0)
+VSX_MADD(xvmadddp, 2, float64, VsrD(i), MADD_FLGS, 0)
+VSX_MADD(xvmsubdp, 2, float64, VsrD(i), MSUB_FLGS, 0)
+VSX_MADD(xvnmadddp, 2, float64, VsrD(i), NMADD_FLGS, 0)
+VSX_MADD(xvnmsubdp, 2, float64, VsrD(i), NMSUB_FLGS, 0)
 
-VSX_MADD(xvmaddsp, 4, float32, VsrW(i), MADD_FLGS, 0, 0)
-VSX_MADD(xvmsubsp, 4, float32, VsrW(i), MSUB_FLGS, 0, 0)
-VSX_MADD(xvnmaddsp, 4, float32, VsrW(i), NMADD_FLGS, 0, 0)
-VSX_MADD(xvnmsubsp, 4, float32, VsrW(i), NMSUB_FLGS, 0, 0)
+VSX_MADD(xvmaddsp, 4, float32, VsrW(i), MADD_FLGS, 0)
+VSX_MADD(xvmsubsp, 4, float32, VsrW(i), MSUB_FLGS, 0)
+VSX_MADD(xvnmaddsp, 4, float32, VsrW(i), NMADD_FLGS, 0)
+VSX_MADD(xvnmsubsp, 4, float32, VsrW(i), NMSUB_FLGS, 0)
 
 /*
  * VSX_MADDQ - VSX floating point quad-precision muliply/add
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v3 3/5] tests/tcg/ppc64le: drop __int128 usage in bcdsub
  2022-03-04 16:54 [PATCH v3 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang matheus.ferst
  2022-03-04 16:54 ` [PATCH v3 1/5] tests/tcg/ppc64le: use inline asm instead of __builtin_mtfsf matheus.ferst
  2022-03-04 16:54 ` [PATCH v3 2/5] target/ppc: change xs[n]madd[am]sp to use float64r32_muladd matheus.ferst
@ 2022-03-04 16:54 ` matheus.ferst
  2022-03-04 16:54 ` [PATCH v3 4/5] tests/tcg/ppc64le: emit bcdsub with .long when needed matheus.ferst
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: matheus.ferst @ 2022-03-04 16:54 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: thuth, Matheus Ferst, danielhb413, richard.henderson, groug,
	philippe.mathieu.daude, clg, mrezanin, alex.bennee, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Using __int128 with inline asm constraints like "v" generates incorrect
code when compiling with LLVM/Clang (e.g., only one doubleword of the
VSR is loaded). Instead, use a GPR pair to pass the 128-bits value and
load the VSR with mtvsrd/xxmrghd.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 tests/tcg/ppc64le/bcdsub.c | 123 +++++++++++++++++--------------------
 1 file changed, 58 insertions(+), 65 deletions(-)

diff --git a/tests/tcg/ppc64le/bcdsub.c b/tests/tcg/ppc64le/bcdsub.c
index 8c188cae6d..12da19b78e 100644
--- a/tests/tcg/ppc64le/bcdsub.c
+++ b/tests/tcg/ppc64le/bcdsub.c
@@ -1,6 +1,7 @@
 #include <assert.h>
 #include <unistd.h>
 #include <signal.h>
+#include <stdint.h>
 
 #define CRF_LT  (1 << 3)
 #define CRF_GT  (1 << 2)
@@ -8,24 +9,39 @@
 #define CRF_SO  (1 << 0)
 #define UNDEF   0
 
-#define BCDSUB(vra, vrb, ps)                    \
-    asm ("bcdsub. %1,%2,%3,%4;"                 \
-         "mfocrf %0,0b10;"                      \
-         : "=r" (cr), "=v" (vrt)                \
-         : "v" (vra), "v" (vrb), "i" (ps)       \
-         : );
+/*
+ * Use GPR pairs to load the VSR values and place the resulting VSR and CR6 in
+ * th, tl, and cr. Note that we avoid newer instructions (e.g., mtvsrdd/mfvsrld)
+ * so we can run this test on POWER8 machines.
+ */
+#define BCDSUB(AH, AL, BH, BL, PS)                          \
+    asm ("mtvsrd 32, %3\n\t"                                \
+         "mtvsrd 33, %4\n\t"                                \
+         "xxmrghd 32, 32, 33\n\t"                           \
+         "mtvsrd 33, %5\n\t"                                \
+         "mtvsrd 34, %6\n\t"                                \
+         "xxmrghd 33, 33, 34\n\t"                           \
+         "bcdsub. 0, 0, 1, %7\n\t"                          \
+         "mfocrf %0, 0b10\n\t"                              \
+         "mfvsrd %1, 32\n\t"                                \
+         "xxswapd 32, 32\n\t"                               \
+         "mfvsrd %2, 32\n\t"                                \
+         : "=r" (cr), "=r" (th), "=r" (tl)                  \
+         : "r" (AH), "r" (AL), "r" (BH), "r" (BL), "i" (PS) \
+         : "v0", "v1", "v2");
 
-#define TEST(vra, vrb, ps, exp_res, exp_cr6)    \
+#define TEST(AH, AL, BH, BL, PS, TH, TL, CR6)   \
     do {                                        \
-        __int128 vrt = 0;                       \
         int cr = 0;                             \
-        BCDSUB(vra, vrb, ps);                   \
-        if (exp_res)                            \
-            assert(vrt == exp_res);             \
-        assert((cr >> 4) == exp_cr6);           \
+        uint64_t th, tl;                        \
+        BCDSUB(AH, AL, BH, BL, PS);             \
+        if (TH != UNDEF || TL != UNDEF) {       \
+            assert(tl == TL);                   \
+            assert(th == TH);                   \
+        }                                       \
+        assert((cr >> 4) == CR6);               \
     } while (0)
 
-
 /*
  * Unbounded result is equal to zero:
  *   sign = (PS) ? 0b1111 : 0b1100
@@ -33,13 +49,13 @@
  */
 void test_bcdsub_eq(void)
 {
-    __int128 a, b;
-
     /* maximum positive BCD value */
-    a = b = (((__int128) 0x9999999999999999) << 64 | 0x999999999999999c);
-
-    TEST(a, b, 0, 0xc, CRF_EQ);
-    TEST(a, b, 1, 0xf, CRF_EQ);
+    TEST(0x9999999999999999, 0x999999999999999c,
+         0x9999999999999999, 0x999999999999999c,
+         0, 0x0, 0xc, CRF_EQ);
+    TEST(0x9999999999999999, 0x999999999999999c,
+         0x9999999999999999, 0x999999999999999c,
+         1, 0x0, 0xf, CRF_EQ);
 }
 
 /*
@@ -49,21 +65,16 @@ void test_bcdsub_eq(void)
  */
 void test_bcdsub_gt(void)
 {
-    __int128 a, b, c;
+    /* maximum positive and negative one BCD values */
+    TEST(0x9999999999999999, 0x999999999999999c, 0x0, 0x1d, 0,
+         0x0, 0xc, (CRF_GT | CRF_SO));
+    TEST(0x9999999999999999, 0x999999999999999c, 0x0, 0x1d, 1,
+         0x0, 0xf, (CRF_GT | CRF_SO));
 
-    /* maximum positive BCD value */
-    a = (((__int128) 0x9999999999999999) << 64 | 0x999999999999999c);
-
-    /* negative one BCD value */
-    b = (__int128) 0x1d;
-
-    TEST(a, b, 0, 0xc, (CRF_GT | CRF_SO));
-    TEST(a, b, 1, 0xf, (CRF_GT | CRF_SO));
-
-    c = (((__int128) 0x9999999999999999) << 64 | 0x999999999999998c);
-
-    TEST(c, b, 0, a, CRF_GT);
-    TEST(c, b, 1, (a | 0x3), CRF_GT);
+    TEST(0x9999999999999999, 0x999999999999998c, 0x0, 0x1d, 0,
+         0x9999999999999999, 0x999999999999999c, CRF_GT);
+    TEST(0x9999999999999999, 0x999999999999998c, 0x0, 0x1d, 1,
+         0x9999999999999999, 0x999999999999999f, CRF_GT);
 }
 
 /*
@@ -73,45 +84,27 @@ void test_bcdsub_gt(void)
  */
 void test_bcdsub_lt(void)
 {
-    __int128 a, b;
+    /* positive zero and positive one BCD values */
+    TEST(0x0, 0xc, 0x0, 0x1c, 0, 0x0, 0x1d, CRF_LT);
+    TEST(0x0, 0xc, 0x0, 0x1c, 1, 0x0, 0x1d, CRF_LT);
 
-    /* positive zero BCD value */
-    a = (__int128) 0xc;
-
-    /* positive one BCD value */
-    b = (__int128) 0x1c;
-
-    TEST(a, b, 0, 0x1d, CRF_LT);
-    TEST(a, b, 1, 0x1d, CRF_LT);
-
-    /* maximum negative BCD value */
-    a = (((__int128) 0x9999999999999999) << 64 | 0x999999999999999d);
-
-    /* positive one BCD value */
-    b = (__int128) 0x1c;
-
-    TEST(a, b, 0, 0xd, (CRF_LT | CRF_SO));
-    TEST(a, b, 1, 0xd, (CRF_LT | CRF_SO));
+    /* maximum negative and positive one BCD values */
+    TEST(0x9999999999999999, 0x999999999999999d, 0x0, 0x1c, 0,
+         0x0, 0xd, (CRF_LT | CRF_SO));
+    TEST(0x9999999999999999, 0x999999999999999d, 0x0, 0x1c, 1,
+         0x0, 0xd, (CRF_LT | CRF_SO));
 }
 
 void test_bcdsub_invalid(void)
 {
-    __int128 a, b;
+    TEST(0x0, 0x1c, 0x0, 0xf00, 0, UNDEF, UNDEF, CRF_SO);
+    TEST(0x0, 0x1c, 0x0, 0xf00, 1, UNDEF, UNDEF, CRF_SO);
 
-    /* positive one BCD value */
-    a = (__int128) 0x1c;
-    b = 0xf00;
+    TEST(0x0, 0xf00, 0x0, 0x1c, 0, UNDEF, UNDEF, CRF_SO);
+    TEST(0x0, 0xf00, 0x0, 0x1c, 1, UNDEF, UNDEF, CRF_SO);
 
-    TEST(a, b, 0, UNDEF, CRF_SO);
-    TEST(a, b, 1, UNDEF, CRF_SO);
-
-    TEST(b, a, 0, UNDEF, CRF_SO);
-    TEST(b, a, 1, UNDEF, CRF_SO);
-
-    a = 0xbad;
-
-    TEST(a, b, 0, UNDEF, CRF_SO);
-    TEST(a, b, 1, UNDEF, CRF_SO);
+    TEST(0x0, 0xbad, 0x0, 0xf00, 0, UNDEF, UNDEF, CRF_SO);
+    TEST(0x0, 0xbad, 0x0, 0xf00, 1, UNDEF, UNDEF, CRF_SO);
 }
 
 int main(void)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v3 4/5] tests/tcg/ppc64le: emit bcdsub with .long when needed
  2022-03-04 16:54 [PATCH v3 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang matheus.ferst
                   ` (2 preceding siblings ...)
  2022-03-04 16:54 ` [PATCH v3 3/5] tests/tcg/ppc64le: drop __int128 usage in bcdsub matheus.ferst
@ 2022-03-04 16:54 ` matheus.ferst
  2022-03-04 16:54 ` [PATCH v3 5/5] tests/tcg/ppc64le: Use Altivec register names in clobber list matheus.ferst
  2022-03-05  8:06 ` [PATCH v3 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang Cédric Le Goater
  5 siblings, 0 replies; 7+ messages in thread
From: matheus.ferst @ 2022-03-04 16:54 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: thuth, Matheus Ferst, danielhb413, richard.henderson, groug,
	philippe.mathieu.daude, clg, mrezanin, alex.bennee, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

Based on GCC docs[1], we use the '-mpower8-vector' flag at config-time
to detect the toolchain support to the bcdsub instruction. LLVM/Clang
supports this flag since version 3.6[2], but the instruction and related
builtins were only added in LLVM 14[3]. In the absence of other means to
detect this support at config-time, we resort to __has_builtin to
identify the presence of __builtin_bcdsub at compile-time. If the
builtin is not available, the instruction is emitted with a ".long".

[1] https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/PowerPC-AltiVec_002fVSX-Built-in-Functions.html
[2] https://github.com/llvm/llvm-project/commit/59eb767e11d4ffefb5f55409524e5c8416b2b0db
[3] https://github.com/llvm/llvm-project/commit/c933c2eb334660c131f4afc9d194fafb0cec0423

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
v3:
 - Fixed __has_builtin check.
---
 tests/tcg/ppc64le/bcdsub.c | 71 ++++++++++++++++++++++----------------
 1 file changed, 41 insertions(+), 30 deletions(-)

diff --git a/tests/tcg/ppc64le/bcdsub.c b/tests/tcg/ppc64le/bcdsub.c
index 12da19b78e..87c8c44a44 100644
--- a/tests/tcg/ppc64le/bcdsub.c
+++ b/tests/tcg/ppc64le/bcdsub.c
@@ -9,37 +9,48 @@
 #define CRF_SO  (1 << 0)
 #define UNDEF   0
 
-/*
- * Use GPR pairs to load the VSR values and place the resulting VSR and CR6 in
- * th, tl, and cr. Note that we avoid newer instructions (e.g., mtvsrdd/mfvsrld)
- * so we can run this test on POWER8 machines.
- */
-#define BCDSUB(AH, AL, BH, BL, PS)                          \
-    asm ("mtvsrd 32, %3\n\t"                                \
-         "mtvsrd 33, %4\n\t"                                \
-         "xxmrghd 32, 32, 33\n\t"                           \
-         "mtvsrd 33, %5\n\t"                                \
-         "mtvsrd 34, %6\n\t"                                \
-         "xxmrghd 33, 33, 34\n\t"                           \
-         "bcdsub. 0, 0, 1, %7\n\t"                          \
-         "mfocrf %0, 0b10\n\t"                              \
-         "mfvsrd %1, 32\n\t"                                \
-         "xxswapd 32, 32\n\t"                               \
-         "mfvsrd %2, 32\n\t"                                \
-         : "=r" (cr), "=r" (th), "=r" (tl)                  \
-         : "r" (AH), "r" (AL), "r" (BH), "r" (BL), "i" (PS) \
-         : "v0", "v1", "v2");
+#ifdef __has_builtin
+#if !__has_builtin(__builtin_bcdsub)
+#define NO_BUILTIN_BCDSUB
+#endif
+#endif
 
-#define TEST(AH, AL, BH, BL, PS, TH, TL, CR6)   \
-    do {                                        \
-        int cr = 0;                             \
-        uint64_t th, tl;                        \
-        BCDSUB(AH, AL, BH, BL, PS);             \
-        if (TH != UNDEF || TL != UNDEF) {       \
-            assert(tl == TL);                   \
-            assert(th == TH);                   \
-        }                                       \
-        assert((cr >> 4) == CR6);               \
+#ifdef NO_BUILTIN_BCDSUB
+#define BCDSUB(T, A, B, PS) \
+    ".long 4 << 26 | (" #T ") << 21 | (" #A ") << 16 | (" #B ") << 11"  \
+    " | 1 << 10 | (" #PS ") << 9 | 65\n\t"
+#else
+#define BCDSUB(T, A, B, PS) "bcdsub. " #T ", " #A ", " #B ", " #PS "\n\t"
+#endif
+
+#define TEST(AH, AL, BH, BL, PS, TH, TL, CR6)                                  \
+    do {                                                                       \
+        int cr = 0;                                                            \
+        uint64_t th, tl;                                                       \
+        /*                                                                     \
+         * Use GPR pairs to load the VSR values and place the resulting VSR and\
+         * CR6 in th, tl, and cr. Note that we avoid newer instructions (e.g., \
+         * mtvsrdd/mfvsrld) so we can run this test on POWER8 machines.        \
+         */                                                                    \
+        asm ("mtvsrd 32, %3\n\t"                                               \
+             "mtvsrd 33, %4\n\t"                                               \
+             "xxmrghd 32, 32, 33\n\t"                                          \
+             "mtvsrd 33, %5\n\t"                                               \
+             "mtvsrd 34, %6\n\t"                                               \
+             "xxmrghd 33, 33, 34\n\t"                                          \
+             BCDSUB(0, 0, 1, PS)                                               \
+             "mfocrf %0, 0b10\n\t"                                             \
+             "mfvsrd %1, 32\n\t"                                               \
+             "xxswapd 32, 32\n\t"                                              \
+             "mfvsrd %2, 32\n\t"                                               \
+             : "=r" (cr), "=r" (th), "=r" (tl)                                 \
+             : "r" (AH), "r" (AL), "r" (BH), "r" (BL)                          \
+             : "v0", "v1", "v2");                                              \
+        if (TH != UNDEF || TL != UNDEF) {                                      \
+            assert(tl == TL);                                                  \
+            assert(th == TH);                                                  \
+        }                                                                      \
+        assert((cr >> 4) == CR6);                                              \
     } while (0)
 
 /*
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v3 5/5] tests/tcg/ppc64le: Use Altivec register names in clobber list
  2022-03-04 16:54 [PATCH v3 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang matheus.ferst
                   ` (3 preceding siblings ...)
  2022-03-04 16:54 ` [PATCH v3 4/5] tests/tcg/ppc64le: emit bcdsub with .long when needed matheus.ferst
@ 2022-03-04 16:54 ` matheus.ferst
  2022-03-05  8:06 ` [PATCH v3 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang Cédric Le Goater
  5 siblings, 0 replies; 7+ messages in thread
From: matheus.ferst @ 2022-03-04 16:54 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc
  Cc: thuth, Matheus Ferst, danielhb413, richard.henderson, groug,
	philippe.mathieu.daude, clg, mrezanin, alex.bennee, david

From: Matheus Ferst <matheus.ferst@eldorado.org.br>

LLVM/Clang doesn't know the VSX registers when compiling with
-mabi=elfv1. Use only registers >= 32 and list them with their Altivec
name.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
---
 tests/tcg/ppc64le/non_signalling_xscv.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/tests/tcg/ppc64le/non_signalling_xscv.c b/tests/tcg/ppc64le/non_signalling_xscv.c
index 91e25cad46..836df71ef0 100644
--- a/tests/tcg/ppc64le/non_signalling_xscv.c
+++ b/tests/tcg/ppc64le/non_signalling_xscv.c
@@ -6,16 +6,16 @@
 #define TEST(INSN, B_HI, B_LO, T_HI, T_LO) \
     do {                                                                \
         uint64_t th, tl, bh = B_HI, bl = B_LO;                          \
-        asm("mtvsrd 0, %2\n\t"                                          \
-            "mtvsrd 1, %3\n\t"                                          \
-            "xxmrghd 0, 0, 1\n\t"                                       \
-            INSN " 0, 0\n\t"                                            \
-            "mfvsrd %0, 0\n\t"                                          \
-            "xxswapd 0, 0\n\t"                                          \
-            "mfvsrd %1, 0\n\t"                                          \
+        asm("mtvsrd 32, %2\n\t"                                         \
+            "mtvsrd 33, %3\n\t"                                         \
+            "xxmrghd 32, 32, 33\n\t"                                    \
+            INSN " 32, 32\n\t"                                          \
+            "mfvsrd %0, 32\n\t"                                         \
+            "xxswapd 32, 32\n\t"                                        \
+            "mfvsrd %1, 32\n\t"                                         \
             : "=r" (th), "=r" (tl)                                      \
             : "r" (bh), "r" (bl)                                        \
-            : "vs0", "vs1");                                            \
+            : "v0", "v1");                                              \
         printf(INSN "(0x%016" PRIx64 "%016" PRIx64 ") = 0x%016" PRIx64  \
                "%016" PRIx64 "\n", bh, bl, th, tl);                     \
         assert(th == T_HI && tl == T_LO);                               \
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang
  2022-03-04 16:54 [PATCH v3 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang matheus.ferst
                   ` (4 preceding siblings ...)
  2022-03-04 16:54 ` [PATCH v3 5/5] tests/tcg/ppc64le: Use Altivec register names in clobber list matheus.ferst
@ 2022-03-05  8:06 ` Cédric Le Goater
  5 siblings, 0 replies; 7+ messages in thread
From: Cédric Le Goater @ 2022-03-05  8:06 UTC (permalink / raw)
  To: matheus.ferst, qemu-devel, qemu-ppc
  Cc: thuth, danielhb413, richard.henderson, groug,
	philippe.mathieu.daude, mrezanin, alex.bennee, david

On 3/4/22 17:54, matheus.ferst@eldorado.org.br wrote:
> From: Matheus Ferst <matheus.ferst@eldorado.org.br>
> 
> As the configuration scripts used -mbig and -mlittle, building PPC tests
> with Clang was silently skipped. With the patch to fix these options[1],
> "make check-tcg" fails because of build and runtime errors. This patch
> series tries to fix some of these problems.
> 
> The first patch fixes tests/tcg/ppc64le/mtfsf.c by removing the GCC-only
> builtins used to emit mtfsf and mffs. We can emit these insns with
> inline asm instead.
> 
> The second patch addresses differences in the output of float_madds.c.
> The __builtin_fmaf used in this test emits fmadds with GCC and
> xsmaddasp with LLVM. The first insn had rounding errors fixed in
> d04ca895dc7f ("target/ppc: Add helpers for fmadds et al"), we apply a
> similar fix to xsmaddasp.
> 
> Then we have build and runtime errors of bcdsub. According to GCC
> docs[2], the '-mpower8-vector' flag provides some bcdsub builtins, so
> it'd be reasonable to assume that the rest of the toolchain knows about
> the insn if the compiler accepts this flag. Clang has supported this
> flag since version 3.6[3], but the insn and builtins were only added in
> LLVM 14[4]. I couldn't find a good config-time solution, so we use
> __has_builtin to check for __builtin_bcdsub at compile-time and, if not
> available, emit the opcode with a ".long." If __has_builtin itself is
> not available, we assume that -mpower8-vector is enough and that the
> insn is available.
> 
> Even building with a newer Clang that accepts the bcdsub insn, the test
> fails at runtime because LLVM doesn't like "__int128" in inline asm. No
> error or warning is emitted, but the generated code only loads one
> doubleword of the VSR. The third patch avoids this issue by passing the
> VSR values in GPR pairs, as we did in
> 84ade98e87ea ("target/ppc: do not silence snan in xscvspdpn").
> 
> The last patch fixes tests/tcg/ppc64le/non_signalling_xscv.c build with
> -mabi=elfv1. Clang only recognizes VSX register in the clobber list of
> inline asm when using ELFv2, so we use VSRs >= 32 and list them by their
> Altivec name.
> 
> Finally, the insns tested by tests/tcg/ppc64le/byte_reverse.c are not
> yet supported by LLVM. Since the configuration script uses '-mpower10'
> to check for POWER10 support and Clang doesn't support this flag,
> "make check-tcg" doesn't fail. We should probably change this check in
> the future, but since LLVM support of POWER10 seems incomplete, I guess
> we can leave it for now.
> 
> v3:
>   - Removed unrelated changes in patch 2
>   - Fixed __has_builtin check in bcdsub
> 
> v2:
>   - New patch to address non_signalling_xscv.c build problems with Clang
>     and ELFv1;
>   - Rework of bcdsub patch to work with LLVM < 14 and avoid vector types.
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2022-01/msg06506.html
> [2] https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/PowerPC-AltiVec_002fVSX-Built-in-Functions.html
> [3] https://github.com/llvm/llvm-project/commit/59eb767e11d4ffefb5f55409524e5c8416b2b0db
> [4] https://github.com/llvm/llvm-project/commit/c933c2eb334660c131f4afc9d194fafb0cec0423
> 
> Matheus Ferst (5):
>    tests/tcg/ppc64le: use inline asm instead of __builtin_mtfsf
>    target/ppc: change xs[n]madd[am]sp to use float64r32_muladd
>    tests/tcg/ppc64le: drop __int128 usage in bcdsub
>    tests/tcg/ppc64le: emit bcdsub with .long when needed
>    tests/tcg/ppc64le: Use Altivec register names in clobber list
> 
>   target/ppc/fpu_helper.c                 |  54 ++++------
>   tests/tcg/ppc64le/bcdsub.c              | 138 ++++++++++++------------
>   tests/tcg/ppc64le/mtfsf.c               |  19 ++--
>   tests/tcg/ppc64le/non_signalling_xscv.c |  16 +--
>   4 files changed, 106 insertions(+), 121 deletions(-)
> 

Queued for ppc-7.0.

Thanks,

C.



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-03-05  8:13 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-04 16:54 [PATCH v3 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang matheus.ferst
2022-03-04 16:54 ` [PATCH v3 1/5] tests/tcg/ppc64le: use inline asm instead of __builtin_mtfsf matheus.ferst
2022-03-04 16:54 ` [PATCH v3 2/5] target/ppc: change xs[n]madd[am]sp to use float64r32_muladd matheus.ferst
2022-03-04 16:54 ` [PATCH v3 3/5] tests/tcg/ppc64le: drop __int128 usage in bcdsub matheus.ferst
2022-03-04 16:54 ` [PATCH v3 4/5] tests/tcg/ppc64le: emit bcdsub with .long when needed matheus.ferst
2022-03-04 16:54 ` [PATCH v3 5/5] tests/tcg/ppc64le: Use Altivec register names in clobber list matheus.ferst
2022-03-05  8:06 ` [PATCH v3 0/5] tests/tcg/ppc64le: fix the build of TCG tests with Clang Cédric Le Goater

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.