* [PATCH v2 00/10] softfloat: Implement float128_muladd
@ 2020-09-25 15:20 Richard Henderson
  2020-09-25 15:20 ` [PATCH v2 01/10] softfloat: Use mulu64 for mul64To128 Richard Henderson
                   ` (10 more replies)
  0 siblings, 11 replies; 23+ messages in thread
From: Richard Henderson @ 2020-09-25 15:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: bharata, alex.bennee, david

Plus assorted cleanups, passes tests/fp/fp-test.

Changes in v2:
  * Add UInt256 type (david)
  * Rewrite and inline shift256RightJamming.  This keeps the whole
    UInt256 in registers, avoiding long sequences of loads and stores.
  * Add x86_64 assembly for double shifts.  I don't know why the
    compiler can't recognize this pattern, but swapping values in
    and out of %cl (the only register in the base isa that can
    hold a variable shift) is really ugly.
  * Add ppc64 assembly.


r~


Richard Henderson (10):
  softfloat: Use mulu64 for mul64To128
  softfloat: Use int128.h for some operations
  softfloat: Tidy a * b + inf return
  softfloat: Add float_cmask and constants
  softfloat: Inline pick_nan_muladd into its caller
  softfloat: Implement float128_muladd
  softfloat: Use x86_64 assembly for {add,sub}{192,256}
  softfloat: Use x86_64 assembly for sh[rl]_double
  softfloat: Use aarch64 assembly for {add,sub}{192,256}
  softfloat: Use ppc64 assembly for {add,sub}{192,256}

 include/fpu/softfloat-macros.h | 109 +++---
 include/fpu/softfloat.h        |   2 +
 fpu/softfloat.c                | 620 ++++++++++++++++++++++++++++++---
 tests/fp/fp-test.c             |   2 +-
 tests/fp/wrap.c.inc            |  12 +
 5 files changed, 652 insertions(+), 93 deletions(-)

-- 
2.25.1




* [PATCH v2 01/10] softfloat: Use mulu64 for mul64To128
  2020-09-25 15:20 [PATCH v2 00/10] softfloat: Implement float128_muladd Richard Henderson
@ 2020-09-25 15:20 ` Richard Henderson
  2020-10-15 19:08   ` Alex Bennée
  2020-09-25 15:20 ` [PATCH v2 02/10] softfloat: Use int128.h for some operations Richard Henderson
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 2020-09-25 15:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: bharata, alex.bennee, david

Via host-utils.h, we use a host widening multiply for
64-bit hosts, and a common subroutine for 32-bit hosts.
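
For reference, a tiny usage sketch of mulu64() (not part of the patch; the
example function is made up, and only illustrates the low-result-first
argument order that the new mul64To128() below relies on):

#include "qemu/osdep.h"
#include "qemu/host-utils.h"

static void mulu64_example(void)
{
    uint64_t lo, hi;

    /* 0xffffffffffffffff * 2 == 0x1_fffffffffffffffe */
    mulu64(&lo, &hi, UINT64_MAX, 2);
    g_assert(hi == 1 && lo == UINT64_MAX - 1);
}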

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-macros.h | 24 ++++--------------------
 1 file changed, 4 insertions(+), 20 deletions(-)

diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index a35ec2893a..57845f8af0 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -83,6 +83,7 @@ this code that are retained.
 #define FPU_SOFTFLOAT_MACROS_H
 
 #include "fpu/softfloat-types.h"
+#include "qemu/host-utils.h"
 
 /*----------------------------------------------------------------------------
 | Shifts `a' right by the number of bits given in `count'.  If any nonzero
@@ -515,27 +516,10 @@ static inline void
 | `z0Ptr' and `z1Ptr'.
 *----------------------------------------------------------------------------*/
 
-static inline void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr )
+static inline void
+mul64To128(uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr)
 {
-    uint32_t aHigh, aLow, bHigh, bLow;
-    uint64_t z0, zMiddleA, zMiddleB, z1;
-
-    aLow = a;
-    aHigh = a>>32;
-    bLow = b;
-    bHigh = b>>32;
-    z1 = ( (uint64_t) aLow ) * bLow;
-    zMiddleA = ( (uint64_t) aLow ) * bHigh;
-    zMiddleB = ( (uint64_t) aHigh ) * bLow;
-    z0 = ( (uint64_t) aHigh ) * bHigh;
-    zMiddleA += zMiddleB;
-    z0 += ( ( (uint64_t) ( zMiddleA < zMiddleB ) )<<32 ) + ( zMiddleA>>32 );
-    zMiddleA <<= 32;
-    z1 += zMiddleA;
-    z0 += ( z1 < zMiddleA );
-    *z1Ptr = z1;
-    *z0Ptr = z0;
-
+    mulu64(z1Ptr, z0Ptr, a, b);
 }
 
 /*----------------------------------------------------------------------------
-- 
2.25.1




* [PATCH v2 02/10] softfloat: Use int128.h for some operations
  2020-09-25 15:20 [PATCH v2 00/10] softfloat: Implement float128_muladd Richard Henderson
  2020-09-25 15:20 ` [PATCH v2 01/10] softfloat: Use mulu64 for mul64To128 Richard Henderson
@ 2020-09-25 15:20 ` Richard Henderson
  2020-10-15 19:10   ` Alex Bennée
  2020-09-25 15:20 ` [PATCH v2 03/10] softfloat: Tidy a * b + inf return Richard Henderson
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 2020-09-25 15:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: bharata, alex.bennee, david

Use our Int128, which wraps the compiler's __int128_t,
instead of open-coding left shifts and arithmetic.
We'd need to extend Int128 to have unsigned operations
to replace more than these three.
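
A small usage sketch of the Int128 helpers involved (illustrative only, not
part of the patch; the example function is made up and just shows the
argument order and carry behaviour of the routines used in the diff below):

#include "qemu/osdep.h"
#include "qemu/int128.h"

static void int128_example(void)
{
    /* int128_make128() takes the low 64 bits first, then the high 64. */
    Int128 a = int128_make128(UINT64_MAX, 0);
    Int128 b = int128_make128(1, 0);
    Int128 z = int128_add(a, b);

    /* The carry propagates into the high half. */
    g_assert(int128_gethi(z) == 1);
    g_assert(int128_getlo(z) == 0);
}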

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-macros.h | 39 +++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index 57845f8af0..95d88d05b8 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -84,6 +84,7 @@ this code that are retained.
 
 #include "fpu/softfloat-types.h"
 #include "qemu/host-utils.h"
+#include "qemu/int128.h"
 
 /*----------------------------------------------------------------------------
 | Shifts `a' right by the number of bits given in `count'.  If any nonzero
@@ -352,13 +353,11 @@ static inline void shortShift128Left(uint64_t a0, uint64_t a1, int count,
 static inline void shift128Left(uint64_t a0, uint64_t a1, int count,
                                 uint64_t *z0Ptr, uint64_t *z1Ptr)
 {
-    if (count < 64) {
-        *z1Ptr = a1 << count;
-        *z0Ptr = count == 0 ? a0 : (a0 << count) | (a1 >> (-count & 63));
-    } else {
-        *z1Ptr = 0;
-        *z0Ptr = a1 << (count - 64);
-    }
+    Int128 a = int128_make128(a1, a0);
+    Int128 z = int128_lshift(a, count);
+
+    *z0Ptr = int128_gethi(z);
+    *z1Ptr = int128_getlo(z);
 }
 
 /*----------------------------------------------------------------------------
@@ -405,15 +404,15 @@ static inline void
 *----------------------------------------------------------------------------*/
 
 static inline void
- add128(
-     uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr )
+add128(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1,
+       uint64_t *z0Ptr, uint64_t *z1Ptr)
 {
-    uint64_t z1;
-
-    z1 = a1 + b1;
-    *z1Ptr = z1;
-    *z0Ptr = a0 + b0 + ( z1 < a1 );
+    Int128 a = int128_make128(a1, a0);
+    Int128 b = int128_make128(b1, b0);
+    Int128 z = int128_add(a, b);
 
+    *z0Ptr = int128_gethi(z);
+    *z1Ptr = int128_getlo(z);
 }
 
 /*----------------------------------------------------------------------------
@@ -463,13 +462,15 @@ static inline void
 *----------------------------------------------------------------------------*/
 
 static inline void
- sub128(
-     uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr )
+sub128(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1,
+       uint64_t *z0Ptr, uint64_t *z1Ptr)
 {
+    Int128 a = int128_make128(a1, a0);
+    Int128 b = int128_make128(b1, b0);
+    Int128 z = int128_sub(a, b);
 
-    *z1Ptr = a1 - b1;
-    *z0Ptr = a0 - b0 - ( a1 < b1 );
-
+    *z0Ptr = int128_gethi(z);
+    *z1Ptr = int128_getlo(z);
 }
 
 /*----------------------------------------------------------------------------
-- 
2.25.1




* [PATCH v2 03/10] softfloat: Tidy a * b + inf return
  2020-09-25 15:20 [PATCH v2 00/10] softfloat: Implement float128_muladd Richard Henderson
  2020-09-25 15:20 ` [PATCH v2 01/10] softfloat: Use mulu64 for mul64To128 Richard Henderson
  2020-09-25 15:20 ` [PATCH v2 02/10] softfloat: Use int128.h for some operations Richard Henderson
@ 2020-09-25 15:20 ` Richard Henderson
  2020-10-16  9:40   ` Alex Bennée
  2020-10-16 17:04   ` Philippe Mathieu-Daudé
  2020-09-25 15:20 ` [PATCH v2 04/10] softfloat: Add float_cmask and constants Richard Henderson
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 23+ messages in thread
From: Richard Henderson @ 2020-09-25 15:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: bharata, alex.bennee, david

No reason to set values in 'a', when we already
have float_class_inf in 'c', and can flip that sign.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 67cfa0fd82..9db55d2b11 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1380,9 +1380,8 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
             s->float_exception_flags |= float_flag_invalid;
             return parts_default_nan(s);
         } else {
-            a.cls = float_class_inf;
-            a.sign = c.sign ^ sign_flip;
-            return a;
+            c.sign ^= sign_flip;
+            return c;
         }
     }
 
-- 
2.25.1




* [PATCH v2 04/10] softfloat: Add float_cmask and constants
  2020-09-25 15:20 [PATCH v2 00/10] softfloat: Implement float128_muladd Richard Henderson
                   ` (2 preceding siblings ...)
  2020-09-25 15:20 ` [PATCH v2 03/10] softfloat: Tidy a * b + inf return Richard Henderson
@ 2020-09-25 15:20 ` Richard Henderson
  2020-10-16  9:44   ` Alex Bennée
  2020-09-25 15:20 ` [PATCH v2 05/10] softfloat: Inline pick_nan_muladd into its caller Richard Henderson
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 2020-09-25 15:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: bharata, alex.bennee, david

Testing more than one class at a time is better done with masks.
This reduces the static branch count.
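
E.g. (just a sketch, not part of the patch; either_is_nan() is made up and
assumes the float_cmask helpers and constants added below):

static bool either_is_nan(FloatClass a_cls, FloatClass b_cls)
{
    /* One OR and one test replace four separate class comparisons. */
    int ab_mask = float_cmask(a_cls) | float_cmask(b_cls);

    return (ab_mask & float_cmask_anynan) != 0;
}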

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 9db55d2b11..3e625c47cd 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -469,6 +469,20 @@ typedef enum __attribute__ ((__packed__)) {
     float_class_snan,
 } FloatClass;
 
+#define float_cmask(bit)  (1u << (bit))
+
+enum {
+    float_cmask_zero    = float_cmask(float_class_zero),
+    float_cmask_normal  = float_cmask(float_class_normal),
+    float_cmask_inf     = float_cmask(float_class_inf),
+    float_cmask_qnan    = float_cmask(float_class_qnan),
+    float_cmask_snan    = float_cmask(float_class_snan),
+
+    float_cmask_infzero = float_cmask_zero | float_cmask_inf,
+    float_cmask_anynan  = float_cmask_qnan | float_cmask_snan,
+};
+
+
 /* Simple helpers for checking if, or what kind of, NaN we have */
 static inline __attribute__((unused)) bool is_nan(FloatClass c)
 {
@@ -1335,24 +1349,27 @@ bfloat16 QEMU_FLATTEN bfloat16_mul(bfloat16 a, bfloat16 b, float_status *status)
 static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
                                 int flags, float_status *s)
 {
-    bool inf_zero = ((1 << a.cls) | (1 << b.cls)) ==
-                    ((1 << float_class_inf) | (1 << float_class_zero));
-    bool p_sign;
+    bool inf_zero, p_sign;
     bool sign_flip = flags & float_muladd_negate_result;
     FloatClass p_class;
     uint64_t hi, lo;
     int p_exp;
+    int ab_mask, abc_mask;
+
+    ab_mask = float_cmask(a.cls) | float_cmask(b.cls);
+    abc_mask = float_cmask(c.cls) | ab_mask;
+    inf_zero = ab_mask == float_cmask_infzero;
 
     /* It is implementation-defined whether the cases of (0,inf,qnan)
      * and (inf,0,qnan) raise InvalidOperation or not (and what QNaN
      * they return if they do), so we have to hand this information
      * off to the target-specific pick-a-NaN routine.
      */
-    if (is_nan(a.cls) || is_nan(b.cls) || is_nan(c.cls)) {
+    if (unlikely(abc_mask & float_cmask_anynan)) {
         return pick_nan_muladd(a, b, c, inf_zero, s);
     }
 
-    if (inf_zero) {
+    if (unlikely(inf_zero)) {
         s->float_exception_flags |= float_flag_invalid;
         return parts_default_nan(s);
     }
@@ -1367,9 +1384,9 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
         p_sign ^= 1;
     }
 
-    if (a.cls == float_class_inf || b.cls == float_class_inf) {
+    if (ab_mask & float_cmask_inf) {
         p_class = float_class_inf;
-    } else if (a.cls == float_class_zero || b.cls == float_class_zero) {
+    } else if (ab_mask & float_cmask_zero) {
         p_class = float_class_zero;
     } else {
         p_class = float_class_normal;
-- 
2.25.1




* [PATCH v2 05/10] softfloat: Inline pick_nan_muladd into its caller
  2020-09-25 15:20 [PATCH v2 00/10] softfloat: Implement float128_muladd Richard Henderson
                   ` (3 preceding siblings ...)
  2020-09-25 15:20 ` [PATCH v2 04/10] softfloat: Add float_cmask and constants Richard Henderson
@ 2020-09-25 15:20 ` Richard Henderson
  2020-10-16 16:20   ` Alex Bennée
  2020-09-25 15:20 ` [PATCH v2 06/10] softfloat: Implement float128_muladd Richard Henderson
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 2020-09-25 15:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: bharata, alex.bennee, david

Because of FloatParts, there will only ever be one caller.
Inlining allows us to re-use abc_mask for the snan test.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 75 +++++++++++++++++++++++--------------------------
 1 file changed, 35 insertions(+), 40 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 3e625c47cd..e038434a07 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -929,45 +929,6 @@ static FloatParts pick_nan(FloatParts a, FloatParts b, float_status *s)
     return a;
 }
 
-static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
-                                  bool inf_zero, float_status *s)
-{
-    int which;
-
-    if (is_snan(a.cls) || is_snan(b.cls) || is_snan(c.cls)) {
-        s->float_exception_flags |= float_flag_invalid;
-    }
-
-    which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);
-
-    if (s->default_nan_mode) {
-        /* Note that this check is after pickNaNMulAdd so that function
-         * has an opportunity to set the Invalid flag.
-         */
-        which = 3;
-    }
-
-    switch (which) {
-    case 0:
-        break;
-    case 1:
-        a = b;
-        break;
-    case 2:
-        a = c;
-        break;
-    case 3:
-        return parts_default_nan(s);
-    default:
-        g_assert_not_reached();
-    }
-
-    if (is_snan(a.cls)) {
-        return parts_silence_nan(a, s);
-    }
-    return a;
-}
-
 /*
  * Returns the result of adding or subtracting the values of the
  * floating-point values `a' and `b'. The operation is performed
@@ -1366,7 +1327,41 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
      * off to the target-specific pick-a-NaN routine.
      */
     if (unlikely(abc_mask & float_cmask_anynan)) {
-        return pick_nan_muladd(a, b, c, inf_zero, s);
+        int which;
+
+        if (unlikely(abc_mask & float_cmask_snan)) {
+            float_raise(float_flag_invalid, s);
+        }
+
+        which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);
+
+        if (s->default_nan_mode) {
+            /*
+             * Note that this check is after pickNaNMulAdd so that function
+             * has an opportunity to set the Invalid flag for inf_zero.
+             */
+            which = 3;
+        }
+
+        switch (which) {
+        case 0:
+            break;
+        case 1:
+            a = b;
+            break;
+        case 2:
+            a = c;
+            break;
+        case 3:
+            return parts_default_nan(s);
+        default:
+            g_assert_not_reached();
+        }
+
+        if (is_snan(a.cls)) {
+            return parts_silence_nan(a, s);
+        }
+        return a;
     }
 
     if (unlikely(inf_zero)) {
-- 
2.25.1




* [PATCH v2 06/10] softfloat: Implement float128_muladd
  2020-09-25 15:20 [PATCH v2 00/10] softfloat: Implement float128_muladd Richard Henderson
                   ` (4 preceding siblings ...)
  2020-09-25 15:20 ` [PATCH v2 05/10] softfloat: Inline pick_nan_muladd into its caller Richard Henderson
@ 2020-09-25 15:20 ` Richard Henderson
  2020-10-16 16:31   ` Alex Bennée
  2020-09-25 15:20 ` [PATCH v2 07/10] softfloat: Use x86_64 assembly for {add, sub}{192, 256} Richard Henderson
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 2020-09-25 15:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: bharata, alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat.h |   2 +
 fpu/softfloat.c         | 416 +++++++++++++++++++++++++++++++++++++++-
 tests/fp/fp-test.c      |   2 +-
 tests/fp/wrap.c.inc     |  12 ++
 4 files changed, 430 insertions(+), 2 deletions(-)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 78ad5ca738..a38433deb4 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -1196,6 +1196,8 @@ float128 float128_sub(float128, float128, float_status *status);
 float128 float128_mul(float128, float128, float_status *status);
 float128 float128_div(float128, float128, float_status *status);
 float128 float128_rem(float128, float128, float_status *status);
+float128 float128_muladd(float128, float128, float128, int,
+                         float_status *status);
 float128 float128_sqrt(float128, float_status *status);
 FloatRelation float128_compare(float128, float128, float_status *status);
 FloatRelation float128_compare_quiet(float128, float128, float_status *status);
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index e038434a07..49de31fec2 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -512,11 +512,19 @@ static inline __attribute__((unused)) bool is_qnan(FloatClass c)
 
 typedef struct {
     uint64_t frac;
-    int32_t  exp;
+    int32_t exp;
     FloatClass cls;
     bool sign;
 } FloatParts;
 
+/* Similar for float128.  */
+typedef struct {
+    uint64_t frac0, frac1;
+    int32_t exp;
+    FloatClass cls;
+    bool sign;
+} FloatParts128;
+
 #define DECOMPOSED_BINARY_POINT    (64 - 2)
 #define DECOMPOSED_IMPLICIT_BIT    (1ull << DECOMPOSED_BINARY_POINT)
 #define DECOMPOSED_OVERFLOW_BIT    (DECOMPOSED_IMPLICIT_BIT << 1)
@@ -4574,6 +4582,46 @@ static void
 
 }
 
+/*----------------------------------------------------------------------------
+| Returns the parts of floating-point value `a'.
+*----------------------------------------------------------------------------*/
+
+static void float128_unpack(FloatParts128 *p, float128 a, float_status *status)
+{
+    p->sign = extractFloat128Sign(a);
+    p->exp = extractFloat128Exp(a);
+    p->frac0 = extractFloat128Frac0(a);
+    p->frac1 = extractFloat128Frac1(a);
+
+    if (p->exp == 0) {
+        if ((p->frac0 | p->frac1) == 0) {
+            p->cls = float_class_zero;
+        } else if (status->flush_inputs_to_zero) {
+            float_raise(float_flag_input_denormal, status);
+            p->cls = float_class_zero;
+            p->frac0 = p->frac1 = 0;
+        } else {
+            normalizeFloat128Subnormal(p->frac0, p->frac1, &p->exp,
+                                       &p->frac0, &p->frac1);
+            p->exp -= 0x3fff;
+            p->cls = float_class_normal;
+        }
+    } else if (p->exp == 0x7fff) {
+        if ((p->frac0 | p->frac1) == 0) {
+            p->cls = float_class_inf;
+        } else if (float128_is_signaling_nan(a, status)) {
+            p->cls = float_class_snan;
+        } else {
+            p->cls = float_class_qnan;
+        }
+    } else {
+        /* Add the implicit bit. */
+        p->frac0 |= UINT64_C(0x0001000000000000);
+        p->exp -= 0x3fff;
+        p->cls = float_class_normal;
+    }
+}
+
 /*----------------------------------------------------------------------------
 | Packs the sign `zSign', the exponent `zExp', and the significand formed
 | by the concatenation of `zSig0' and `zSig1' into a quadruple-precision
@@ -7205,6 +7253,372 @@ float128 float128_mul(float128 a, float128 b, float_status *status)
 
 }
 
+typedef struct UInt256 {
+    /* Indexed big-endian, to match the rest of softfloat numbering. */
+    uint64_t w[4];
+} UInt256;
+
+static inline uint64_t shl_double(uint64_t h, uint64_t l, unsigned lsh)
+{
+    return lsh ? (h << lsh) | (l >> (64 - lsh)) : h;
+}
+
+static inline uint64_t shr_double(uint64_t h, uint64_t l, unsigned rsh)
+{
+    return rsh ? (h << (64 - rsh)) | (l >> rsh) : l;
+}
+
+static void shortShift256Left(UInt256 *p, unsigned lsh)
+{
+    if (lsh != 0) {
+        p->w[0] = shl_double(p->w[0], p->w[1], lsh);
+        p->w[1] = shl_double(p->w[1], p->w[2], lsh);
+        p->w[2] = shl_double(p->w[2], p->w[3], lsh);
+        p->w[3] <<= lsh;
+    }
+}
+
+static inline void shift256RightJamming(UInt256 *p, unsigned count)
+{
+    uint64_t out, p0, p1, p2, p3;
+
+    p0 = p->w[0];
+    p1 = p->w[1];
+    p2 = p->w[2];
+    p3 = p->w[3];
+
+    if (unlikely(count == 0)) {
+        return;
+    } else if (likely(count < 64)) {
+        out = 0;
+    } else if (likely(count < 256)) {
+        if (count < 128) {
+            out = p3;
+            p3 = p2;
+            p2 = p1;
+            p1 = p0;
+            p0 = 0;
+        } else if (count < 192) {
+            out = p2 | p3;
+            p3 = p1;
+            p2 = p0;
+            p1 = 0;
+            p0 = 0;
+        } else {
+            out = p1 | p2 | p3;
+            p3 = p0;
+            p2 = 0;
+            p1 = 0;
+            p0 = 0;
+        }
+        count &= 63;
+        if (count == 0) {
+            goto done;
+        }
+    } else {
+        out = p0 | p1 | p2 | p3;
+        p3 = 0;
+        p2 = 0;
+        p1 = 0;
+        p0 = 0;
+        goto done;
+    }
+
+    out |= shr_double(p3, 0, count);
+    p3 = shr_double(p2, p3, count);
+    p2 = shr_double(p1, p2, count);
+    p1 = shr_double(p0, p1, count);
+    p0 = p0 >> count;
+
+ done:
+    p->w[3] = p3 | (out != 0);
+    p->w[2] = p2;
+    p->w[1] = p1;
+    p->w[0] = p0;
+}
+
+/* R = A - B */
+static void sub256(UInt256 *r, UInt256 *a, UInt256 *b)
+{
+    bool borrow = false;
+
+    for (int i = 3; i >= 0; --i) {
+        uint64_t at = a->w[i];
+        uint64_t bt = b->w[i];
+        uint64_t rt = at - bt;
+
+        if (borrow) {
+            borrow = at <= bt;
+            rt -= 1;
+        } else {
+            borrow = at < bt;
+        }
+        r->w[i] = rt;
+    }
+}
+
+/* A = -A */
+static void neg256(UInt256 *a)
+{
+    /*
+     * Recall that -X - 1 = ~X, and that since this is negation,
+     * once we find a non-zero number, all subsequent words will
+     * have borrow-in, and thus use NOT.
+     */
+    uint64_t t = a->w[3];
+    if (likely(t)) {
+        a->w[3] = -t;
+        goto not2;
+    }
+    t = a->w[2];
+    if (likely(t)) {
+        a->w[2] = -t;
+        goto not1;
+    }
+    t = a->w[1];
+    if (likely(t)) {
+        a->w[1] = -t;
+        goto not0;
+    }
+    a->w[0] = -a->w[0];
+    return;
+ not2:
+    a->w[2] = ~a->w[2];
+ not1:
+    a->w[1] = ~a->w[1];
+ not0:
+    a->w[0] = ~a->w[0];
+}
+
+/* A += B */
+static void add256(UInt256 *a, UInt256 *b)
+{
+    bool carry = false;
+
+    for (int i = 3; i >= 0; --i) {
+        uint64_t bt = b->w[i];
+        uint64_t at = a->w[i] + bt;
+
+        if (carry) {
+            at += 1;
+            carry = at <= bt;
+        } else {
+            carry = at < bt;
+        }
+        a->w[i] = at;
+    }
+}
+
+float128 float128_muladd(float128 a_f, float128 b_f, float128 c_f,
+                         int flags, float_status *status)
+{
+    bool inf_zero, p_sign, sign_flip;
+    int p_exp, exp_diff, shift, ab_mask, abc_mask;
+    FloatParts128 a, b, c;
+    FloatClass p_cls;
+    UInt256 p_frac, c_frac;
+
+    float128_unpack(&a, a_f, status);
+    float128_unpack(&b, b_f, status);
+    float128_unpack(&c, c_f, status);
+
+    ab_mask = float_cmask(a.cls) | float_cmask(b.cls);
+    abc_mask = float_cmask(c.cls) | ab_mask;
+    inf_zero = ab_mask == float_cmask_infzero;
+
+    /* If any input is a NaN, select the required result. */
+    if (unlikely(abc_mask & float_cmask_anynan)) {
+        if (unlikely(abc_mask & float_cmask_snan)) {
+            float_raise(float_flag_invalid, status);
+        }
+
+        int which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, status);
+        if (status->default_nan_mode) {
+            which = 3;
+        }
+        switch (which) {
+        case 0:
+            break;
+        case 1:
+            a_f = b_f;
+            a.cls = b.cls;
+            break;
+        case 2:
+            a_f = c_f;
+            a.cls = c.cls;
+            break;
+        case 3:
+            return float128_default_nan(status);
+        }
+        if (is_snan(a.cls)) {
+            return float128_silence_nan(a_f, status);
+        }
+        return a_f;
+    }
+
+    /* After dealing with input NaNs, look for Inf * Zero. */
+    if (unlikely(inf_zero)) {
+        float_raise(float_flag_invalid, status);
+        return float128_default_nan(status);
+    }
+
+    p_sign = a.sign ^ b.sign;
+
+    if (flags & float_muladd_negate_c) {
+        c.sign ^= 1;
+    }
+    if (flags & float_muladd_negate_product) {
+        p_sign ^= 1;
+    }
+    sign_flip = (flags & float_muladd_negate_result);
+
+    if (ab_mask & float_cmask_inf) {
+        p_cls = float_class_inf;
+    } else if (ab_mask & float_cmask_zero) {
+        p_cls = float_class_zero;
+    } else {
+        p_cls = float_class_normal;
+    }
+
+    if (c.cls == float_class_inf) {
+        if (p_cls == float_class_inf && p_sign != c.sign) {
+            /* +Inf + -Inf = NaN */
+            float_raise(float_flag_invalid, status);
+            return float128_default_nan(status);
+        }
+        /* Inf + Inf = Inf of the proper sign; reuse the return below. */
+        p_cls = float_class_inf;
+        p_sign = c.sign;
+    }
+
+    if (p_cls == float_class_inf) {
+        return packFloat128(p_sign ^ sign_flip, 0x7fff, 0, 0);
+    }
+
+    if (p_cls == float_class_zero) {
+        if (c.cls == float_class_zero) {
+            if (p_sign != c.sign) {
+                p_sign = status->float_rounding_mode == float_round_down;
+            }
+            return packFloat128(p_sign ^ sign_flip, 0, 0, 0);
+        }
+
+        if (flags & float_muladd_halve_result) {
+            c.exp -= 1;
+        }
+        return roundAndPackFloat128(c.sign ^ sign_flip,
+                                    c.exp + 0x3fff - 1,
+                                    c.frac0, c.frac1, 0, status);
+    }
+
+    /* a & b should be normals now... */
+    assert(a.cls == float_class_normal && b.cls == float_class_normal);
+
+    /* Multiply of 2 113-bit numbers produces a 226-bit result.  */
+    mul128To256(a.frac0, a.frac1, b.frac0, b.frac1,
+                &p_frac.w[0], &p_frac.w[1], &p_frac.w[2], &p_frac.w[3]);
+
+    /* Realign the binary point at bit 48 of p_frac[0].  */
+    shift = clz64(p_frac.w[0]) - 15;
+    shortShift256Left(&p_frac, shift);
+    p_exp = a.exp + b.exp - (shift - 16);
+    exp_diff = p_exp - c.exp;
+
+    /* Extend the fraction part of the addend to 256 bits.  */
+    c_frac.w[0] = c.frac0;
+    c_frac.w[1] = c.frac1;
+    c_frac.w[2] = 0;
+    c_frac.w[3] = 0;
+
+    /* Add or subtract C from the intermediate product. */
+    if (c.cls == float_class_zero) {
+        /* Fall through to rounding after addition (with zero). */
+    } else if (p_sign != c.sign) {
+        /* Subtraction */
+        if (exp_diff < 0) {
+            shift256RightJamming(&p_frac, -exp_diff);
+            sub256(&p_frac, &c_frac, &p_frac);
+            p_exp = c.exp;
+            p_sign ^= 1;
+        } else if (exp_diff > 0) {
+            shift256RightJamming(&c_frac, exp_diff);
+            sub256(&p_frac, &p_frac, &c_frac);
+        } else {
+            /* Low 128 bits of C are known to be zero. */
+            sub128(p_frac.w[0], p_frac.w[1], c_frac.w[0], c_frac.w[1],
+                   &p_frac.w[0], &p_frac.w[1]);
+            /*
+             * Since we have normalized to bit 48 of p_frac[0],
+             * a negative result means C > P and we need to invert.
+             */
+            if ((int64_t)p_frac.w[0] < 0) {
+                neg256(&p_frac);
+                p_sign ^= 1;
+            }
+        }
+
+        /*
+         * Gross normalization of the 256-bit subtraction result.
+         * Fine tuning below shared with addition.
+         */
+        if (p_frac.w[0] != 0) {
+            /* nothing to do */
+        } else if (p_frac.w[1] != 0) {
+            p_exp -= 64;
+            p_frac.w[0] = p_frac.w[1];
+            p_frac.w[1] = p_frac.w[2];
+            p_frac.w[2] = p_frac.w[3];
+            p_frac.w[3] = 0;
+        } else if (p_frac.w[2] != 0) {
+            p_exp -= 128;
+            p_frac.w[0] = p_frac.w[2];
+            p_frac.w[1] = p_frac.w[3];
+            p_frac.w[2] = 0;
+            p_frac.w[3] = 0;
+        } else if (p_frac.w[3] != 0) {
+            p_exp -= 192;
+            p_frac.w[0] = p_frac.w[3];
+            p_frac.w[1] = 0;
+            p_frac.w[2] = 0;
+            p_frac.w[3] = 0;
+        } else {
+            /* Subtraction was exact: result is zero. */
+            p_sign = status->float_rounding_mode == float_round_down;
+            return packFloat128(p_sign ^ sign_flip, 0, 0, 0);
+        }
+    } else {
+        /* Addition */
+        if (exp_diff <= 0) {
+            shift256RightJamming(&p_frac, -exp_diff);
+            /* Low 128 bits of C are known to be zero. */
+            add128(p_frac.w[0], p_frac.w[1], c_frac.w[0], c_frac.w[1],
+                   &p_frac.w[0], &p_frac.w[1]);
+            p_exp = c.exp;
+        } else {
+            shift256RightJamming(&c_frac, exp_diff);
+            add256(&p_frac, &c_frac);
+        }
+    }
+
+    /* Fine normalization of the 256-bit result: p_frac[0] != 0. */
+    shift = clz64(p_frac.w[0]) - 15;
+    if (shift < 0) {
+        shift256RightJamming(&p_frac, -shift);
+    } else if (shift > 0) {
+        shortShift256Left(&p_frac, shift);
+    }
+    p_exp -= shift;
+
+    if (flags & float_muladd_halve_result) {
+        p_exp -= 1;
+    }
+    return roundAndPackFloat128(p_sign ^ sign_flip,
+                                p_exp + 0x3fff - 1,
+                                p_frac.w[0], p_frac.w[1],
+                                p_frac.w[2] | (p_frac.w[3] != 0),
+                                status);
+}
+
 /*----------------------------------------------------------------------------
 | Returns the result of dividing the quadruple-precision floating-point value
 | `a' by the corresponding value `b'.  The operation is performed according to
diff --git a/tests/fp/fp-test.c b/tests/fp/fp-test.c
index 06ffebd6db..9bbb0dba67 100644
--- a/tests/fp/fp-test.c
+++ b/tests/fp/fp-test.c
@@ -717,7 +717,7 @@ static void do_testfloat(int op, int rmode, bool exact)
         test_abz_f128(true_abz_f128M, subj_abz_f128M);
         break;
     case F128_MULADD:
-        not_implemented();
+        test_abcz_f128(slow_f128M_mulAdd, qemu_f128_mulAdd);
         break;
     case F128_SQRT:
         test_az_f128(slow_f128M_sqrt, qemu_f128M_sqrt);
diff --git a/tests/fp/wrap.c.inc b/tests/fp/wrap.c.inc
index 0cbd20013e..65a713deae 100644
--- a/tests/fp/wrap.c.inc
+++ b/tests/fp/wrap.c.inc
@@ -574,6 +574,18 @@ WRAP_MULADD(qemu_f32_mulAdd, float32_muladd, float32)
 WRAP_MULADD(qemu_f64_mulAdd, float64_muladd, float64)
 #undef WRAP_MULADD
 
+static void qemu_f128_mulAdd(const float128_t *ap, const float128_t *bp,
+                             const float128_t *cp, float128_t *res)
+{
+    float128 a, b, c, ret;
+
+    a = soft_to_qemu128(*ap);
+    b = soft_to_qemu128(*bp);
+    c = soft_to_qemu128(*cp);
+    ret = float128_muladd(a, b, c, 0, &qsf);
+    *res = qemu_to_soft128(ret);
+}
+
 #define WRAP_CMP16(name, func, retcond)         \
     static bool name(float16_t a, float16_t b)  \
     {                                           \
-- 
2.25.1




* [PATCH v2 07/10] softfloat: Use x86_64 assembly for {add, sub}{192, 256}
  2020-09-25 15:20 [PATCH v2 00/10] softfloat: Implement float128_muladd Richard Henderson
                   ` (5 preceding siblings ...)
  2020-09-25 15:20 ` [PATCH v2 06/10] softfloat: Implement float128_muladd Richard Henderson
@ 2020-09-25 15:20 ` Richard Henderson
  2020-09-25 15:20 ` [PATCH v2 08/10] softfloat: Use x86_64 assembly for sh[rl]_double Richard Henderson
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2020-09-25 15:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: bharata, alex.bennee, david

The compiler cannot chain more than two additions together.
Use inline assembly for 3 or 4 additions.
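
For comparison, a two-word add is about as much as the compiler turns into
an add/adc pair on its own (illustrative only; the helper below is made up,
assumes __int128 support, and the three- and four-word chains in the patch
are the ones that need the asm):

static inline void add128_via_int128(uint64_t a0, uint64_t a1,
                                     uint64_t b0, uint64_t b1,
                                     uint64_t *z0Ptr, uint64_t *z1Ptr)
{
    /* a0/b0/z0 are the high words, following softfloat numbering. */
    unsigned __int128 a = ((unsigned __int128)a0 << 64) | a1;
    unsigned __int128 b = ((unsigned __int128)b0 << 64) | b1;
    unsigned __int128 z = a + b;

    *z0Ptr = (uint64_t)(z >> 64);
    *z1Ptr = (uint64_t)z;
}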

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-macros.h | 18 ++++++++++++++++--
 fpu/softfloat.c                | 29 +++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index 95d88d05b8..99fa124e56 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -436,6 +436,13 @@ static inline void
      uint64_t *z2Ptr
  )
 {
+#ifdef __x86_64__
+    asm("add %5, %2\n\t"
+        "adc %4, %1\n\t"
+        "adc %3, %0"
+        : "=&r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
+        : "rm"(b0), "rm"(b1), "rm"(b2), "0"(a0), "1"(a1), "2"(a2));
+#else
     uint64_t z0, z1, z2;
     int8_t carry0, carry1;
 
@@ -450,7 +457,7 @@ static inline void
     *z2Ptr = z2;
     *z1Ptr = z1;
     *z0Ptr = z0;
-
+#endif
 }
 
 /*----------------------------------------------------------------------------
@@ -494,6 +501,13 @@ static inline void
      uint64_t *z2Ptr
  )
 {
+#ifdef __x86_64__
+    asm("sub %5, %2\n\t"
+        "sbb %4, %1\n\t"
+        "sbb %3, %0"
+        : "=&r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
+        : "rm"(b0), "rm"(b1), "rm"(b2), "0"(a0), "1"(a1), "2"(a2));
+#else
     uint64_t z0, z1, z2;
     int8_t borrow0, borrow1;
 
@@ -508,7 +522,7 @@ static inline void
     *z2Ptr = z2;
     *z1Ptr = z1;
     *z0Ptr = z0;
-
+#endif
 }
 
 /*----------------------------------------------------------------------------
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 49de31fec2..54d0b210ac 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -7340,6 +7340,15 @@ static inline void shift256RightJamming(UInt256 *p, unsigned count)
 /* R = A - B */
 static void sub256(UInt256 *r, UInt256 *a, UInt256 *b)
 {
+#if defined(__x86_64__)
+    asm("sub %7, %3\n\t"
+        "sbb %6, %2\n\t"
+        "sbb %5, %1\n\t"
+        "sbb %4, %0"
+        : "=&r"(r->w[0]), "=&r"(r->w[1]), "=&r"(r->w[2]), "=&r"(r->w[3])
+        : "rme"(b->w[0]), "rme"(b->w[1]), "rme"(b->w[2]), "rme"(b->w[3]),
+            "0"(a->w[0]),   "1"(a->w[1]),   "2"(a->w[2]),   "3"(a->w[3]));
+#else
     bool borrow = false;
 
     for (int i = 3; i >= 0; --i) {
@@ -7355,11 +7364,21 @@ static void sub256(UInt256 *r, UInt256 *a, UInt256 *b)
         }
         r->w[i] = rt;
     }
+#endif
 }
 
 /* A = -A */
 static void neg256(UInt256 *a)
 {
+#if defined(__x86_64__)
+    asm("negq %3\n\t"
+        "sbb %6, %2\n\t"
+        "sbb %5, %1\n\t"
+        "sbb %4, %0"
+        : "=&r"(a->w[0]), "=&r"(a->w[1]), "=&r"(a->w[2]), "+rm"(a->w[3])
+        : "rme"(a->w[0]), "rme"(a->w[1]), "rme"(a->w[2]),
+          "0"(0), "1"(0), "2"(0));
+#else
     /*
      * Recall that -X - 1 = ~X, and that since this is negation,
      * once we find a non-zero number, all subsequent words will
@@ -7388,11 +7407,20 @@ static void neg256(UInt256 *a)
     a->w[1] = ~a->w[1];
  not0:
     a->w[0] = ~a->w[0];
+#endif
 }
 
 /* A += B */
 static void add256(UInt256 *a, UInt256 *b)
 {
+#if defined(__x86_64__)
+    asm("add %7, %3\n\t"
+        "adc %6, %2\n\t"
+        "adc %5, %1\n\t"
+        "adc %4, %0"
+        :  "+r"(a->w[0]),  "+r"(a->w[1]),  "+r"(a->w[2]),  "+r"(a->w[3])
+        : "rme"(b->w[0]), "rme"(b->w[1]), "rme"(b->w[2]), "rme"(b->w[3]));
+#else
     bool carry = false;
 
     for (int i = 3; i >= 0; --i) {
@@ -7407,6 +7435,7 @@ static void add256(UInt256 *a, UInt256 *b)
         }
         a->w[i] = at;
     }
+#endif
 }
 
 float128 float128_muladd(float128 a_f, float128 b_f, float128 c_f,
-- 
2.25.1




* [PATCH v2 08/10] softfloat: Use x86_64 assembly for sh[rl]_double
  2020-09-25 15:20 [PATCH v2 00/10] softfloat: Implement float128_muladd Richard Henderson
                   ` (6 preceding siblings ...)
  2020-09-25 15:20 ` [PATCH v2 07/10] softfloat: Use x86_64 assembly for {add, sub}{192, 256} Richard Henderson
@ 2020-09-25 15:20 ` Richard Henderson
  2020-09-25 15:20 ` [PATCH v2 09/10] softfloat: Use aarch64 assembly for {add, sub}{192, 256} Richard Henderson
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2020-09-25 15:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: bharata, alex.bennee, david

GCC isn't recognizing this pattern for x86, and it
probably couldn't recognize that the outer condition
is not required either.
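
For reference, what the SHLD/SHRD forms compute, written out with __int128
(illustrative only, not part of the patch; the reference helpers are made up
and shift counts are assumed to be in the range 0..63):

static inline uint64_t shl_double_ref(uint64_t h, uint64_t l, unsigned lsh)
{
    /* High 64 bits of the 128-bit value h:l shifted left by lsh. */
    unsigned __int128 v = ((unsigned __int128)h << 64) | l;
    return (uint64_t)((v << lsh) >> 64);
}

static inline uint64_t shr_double_ref(uint64_t h, uint64_t l, unsigned rsh)
{
    /* Low 64 bits of the 128-bit value h:l shifted right by rsh. */
    unsigned __int128 v = ((unsigned __int128)h << 64) | l;
    return (uint64_t)(v >> rsh);
}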

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 54d0b210ac..fdf5bde69e 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -7260,12 +7260,22 @@ typedef struct UInt256 {
 
 static inline uint64_t shl_double(uint64_t h, uint64_t l, unsigned lsh)
 {
+#ifdef __x86_64__
+    asm("shld %b2, %1, %0" : "+r"(h) : "r"(l), "ci"(lsh));
+    return h;
+#else
     return lsh ? (h << lsh) | (l >> (64 - lsh)) : h;
+#endif
 }
 
 static inline uint64_t shr_double(uint64_t h, uint64_t l, unsigned rsh)
 {
+#ifdef __x86_64__
+    asm("shrd %b2, %1, %0" : "+r"(l) : "r"(h), "ci"(rsh));
+    return l;
+#else
     return rsh ? (h << (64 - rsh)) | (l >> rsh) : l;
+#endif
 }
 
 static void shortShift256Left(UInt256 *p, unsigned lsh)
-- 
2.25.1




* [PATCH v2 09/10] softfloat: Use aarch64 assembly for {add, sub}{192, 256}
  2020-09-25 15:20 [PATCH v2 00/10] softfloat: Implement float128_muladd Richard Henderson
                   ` (7 preceding siblings ...)
  2020-09-25 15:20 ` [PATCH v2 08/10] softfloat: Use x86_64 assembly for sh[rl]_double Richard Henderson
@ 2020-09-25 15:20 ` Richard Henderson
  2020-09-25 15:20 ` [PATCH v2 10/10] softfloat: Use ppc64 " Richard Henderson
  2020-10-15 17:23 ` [PATCH v2 00/10] softfloat: Implement float128_muladd Richard Henderson
  10 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2020-09-25 15:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: bharata, alex.bennee, david

The compiler cannot chain more than two additions together.
Use inline assembly for 3 or 4 additions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-macros.h | 14 ++++++++++++++
 fpu/softfloat.c                | 27 +++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index 99fa124e56..969a486fd2 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -442,6 +442,13 @@ static inline void
         "adc %3, %0"
         : "=&r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
         : "rm"(b0), "rm"(b1), "rm"(b2), "0"(a0), "1"(a1), "2"(a2));
+#elif defined(__aarch64__)
+    asm("adds %2, %x5, %x8\n\t"
+        "adcs %1, %x4, %x7\n\t"
+        "adc  %0, %x3, %x6"
+        : "=&r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
+        : "rZ"(a0), "rZ"(a1), "rZ"(a2), "rZ"(b0), "rZ"(b1), "rZ"(b2)
+        : "cc");
 #else
     uint64_t z0, z1, z2;
     int8_t carry0, carry1;
@@ -507,6 +514,13 @@ static inline void
         "sbb %3, %0"
         : "=&r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
         : "rm"(b0), "rm"(b1), "rm"(b2), "0"(a0), "1"(a1), "2"(a2));
+#elif defined(__aarch64__)
+    asm("subs %2, %x5, %x8\n\t"
+        "sbcs %1, %x4, %x7\n\t"
+        "sbc  %0, %x3, %x6"
+        : "=&r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
+        : "rZ"(a0), "rZ"(a1), "rZ"(a2), "rZ"(b0), "rZ"(b1), "rZ"(b2)
+        : "cc");
 #else
     uint64_t z0, z1, z2;
     int8_t borrow0, borrow1;
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index fdf5bde69e..07dc17caad 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -7358,6 +7358,18 @@ static void sub256(UInt256 *r, UInt256 *a, UInt256 *b)
         : "=&r"(r->w[0]), "=&r"(r->w[1]), "=&r"(r->w[2]), "=&r"(r->w[3])
         : "rme"(b->w[0]), "rme"(b->w[1]), "rme"(b->w[2]), "rme"(b->w[3]),
             "0"(a->w[0]),   "1"(a->w[1]),   "2"(a->w[2]),   "3"(a->w[3]));
+#elif defined(__aarch64__)
+    asm("subs %[r3], %x[a3], %x[b3]\n\t"
+        "sbcs %[r2], %x[a2], %x[b2]\n\t"
+        "sbcs %[r1], %x[a1], %x[b1]\n\t"
+        "sbc  %[r0], %x[a0], %x[b0]"
+        : [r0] "=&r"(r->w[0]), [r1] "=&r"(r->w[1]),
+          [r2] "=&r"(r->w[2]), [r3] "=&r"(r->w[3])
+        : [a0] "rZ"(a->w[0]), [a1] "rZ"(a->w[1]),
+          [a2] "rZ"(a->w[2]), [a3] "rZ"(a->w[3]),
+          [b0] "rZ"(b->w[0]), [b1] "rZ"(b->w[1]),
+          [b2] "rZ"(b->w[2]), [b3] "rZ"(b->w[3])
+        : "cc");
 #else
     bool borrow = false;
 
@@ -7388,6 +7400,13 @@ static void neg256(UInt256 *a)
         : "=&r"(a->w[0]), "=&r"(a->w[1]), "=&r"(a->w[2]), "+rm"(a->w[3])
         : "rme"(a->w[0]), "rme"(a->w[1]), "rme"(a->w[2]),
           "0"(0), "1"(0), "2"(0));
+#elif defined(__aarch64__)
+    asm("negs %3, %3\n\t"
+        "ngcs %2, %2\n\t"
+        "ngcs %1, %1\n\t"
+        "ngc  %0, %0"
+        : "+r"(a->w[0]), "+r"(a->w[1]), "+r"(a->w[2]), "+r"(a->w[3])
+        : : "cc");
 #else
     /*
      * Recall that -X - 1 = ~X, and that since this is negation,
@@ -7430,6 +7449,14 @@ static void add256(UInt256 *a, UInt256 *b)
         "adc %4, %0"
         :  "+r"(a->w[0]),  "+r"(a->w[1]),  "+r"(a->w[2]),  "+r"(a->w[3])
         : "rme"(b->w[0]), "rme"(b->w[1]), "rme"(b->w[2]), "rme"(b->w[3]));
+#elif defined(__aarch64__)
+    asm("adds %3, %3, %x7\n\t"
+        "adcs %2, %2, %x6\n\t"
+        "adcs %1, %1, %x5\n\t"
+        "adc  %0, %0, %x4"
+        : "+r"(a->w[0]), "+r"(a->w[1]), "+r"(a->w[2]), "+r"(a->w[3])
+        : "rZ"(b->w[0]), "rZ"(b->w[1]), "rZ"(b->w[2]), "rZ"(b->w[3])
+        : "cc");
 #else
     bool carry = false;
 
-- 
2.25.1




* [PATCH v2 10/10] softfloat: Use ppc64 assembly for {add, sub}{192, 256}
  2020-09-25 15:20 [PATCH v2 00/10] softfloat: Implement float128_muladd Richard Henderson
                   ` (8 preceding siblings ...)
  2020-09-25 15:20 ` [PATCH v2 09/10] softfloat: Use aarch64 assembly for {add, sub}{192, 256} Richard Henderson
@ 2020-09-25 15:20 ` Richard Henderson
  2020-10-15 17:23 ` [PATCH v2 00/10] softfloat: Implement float128_muladd Richard Henderson
  10 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2020-09-25 15:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: bharata, alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-macros.h | 14 ++++++++++++++
 fpu/softfloat.c                | 27 +++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index 969a486fd2..d26cfaf267 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -449,6 +449,13 @@ static inline void
         : "=&r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
         : "rZ"(a0), "rZ"(a1), "rZ"(a2), "rZ"(b0), "rZ"(b1), "rZ"(b2)
         : "cc");
+#elif defined(__powerpc64__)
+    asm("addc %2, %5, %8\n\t"
+        "adde %1, %4, %7\n\t"
+        "adde %0, %3, %6"
+        : "=r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
+        : "r"(a0), "r"(a1), "r"(a2), "r"(b0), "r"(b1), "r"(b2)
+        : "ca");
 #else
     uint64_t z0, z1, z2;
     int8_t carry0, carry1;
@@ -521,6 +528,13 @@ static inline void
         : "=&r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
         : "rZ"(a0), "rZ"(a1), "rZ"(a2), "rZ"(b0), "rZ"(b1), "rZ"(b2)
         : "cc");
+#elif defined(__powerpc64__)
+    asm("subfc %2, %8, %5\n\t"
+        "subfe %1, %7, %4\n\t"
+        "subfe %0, %6, %3"
+        : "=&r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
+        : "r"(a0), "r"(a1), "r"(a2), "r"(b0), "r"(b1), "r"(b2)
+        : "ca");
 #else
     uint64_t z0, z1, z2;
     int8_t borrow0, borrow1;
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 07dc17caad..9af75b9146 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -7370,6 +7370,18 @@ static void sub256(UInt256 *r, UInt256 *a, UInt256 *b)
           [b0] "rZ"(b->w[0]), [b1] "rZ"(b->w[1]),
           [b2] "rZ"(b->w[2]), [b3] "rZ"(b->w[3])
         : "cc");
+#elif defined(__powerpc64__)
+    asm("subfc %[r3], %[b3], %[a3]\n\t"
+        "subfe %[r2], %[b2], %[a2]\n\t"
+        "subfe %[r1], %[b1], %[a1]\n\t"
+        "subfe %[r0], %[b0], %[a0]"
+        : [r0] "=&r"(r->w[0]), [r1] "=&r"(r->w[1]),
+          [r2] "=&r"(r->w[2]), [r3] "=&r"(r->w[3])
+        : [a0] "r"(a->w[0]), [a1] "r"(a->w[1]),
+          [a2] "r"(a->w[2]), [a3] "r"(a->w[3]),
+          [b0] "r"(b->w[0]), [b1] "r"(b->w[1]),
+          [b2] "r"(b->w[2]), [b3] "r"(b->w[3])
+        : "ca");
 #else
     bool borrow = false;
 
@@ -7407,6 +7419,13 @@ static void neg256(UInt256 *a)
         "ngc  %0, %0"
         : "+r"(a->w[0]), "+r"(a->w[1]), "+r"(a->w[2]), "+r"(a->w[3])
         : : "cc");
+#elif defined(__powerpc64__)
+    asm("subfic %3, %3, 0\n\t"
+        "subfze %2, %2\n\t"
+        "subfze %1, %1\n\t"
+        "subfze %0, %0"
+        : "+r"(a->w[0]), "+r"(a->w[1]), "+r"(a->w[2]), "+r"(a->w[3])
+        : : "ca");
 #else
     /*
      * Recall that -X - 1 = ~X, and that since this is negation,
@@ -7457,6 +7476,14 @@ static void add256(UInt256 *a, UInt256 *b)
         : "+r"(a->w[0]), "+r"(a->w[1]), "+r"(a->w[2]), "+r"(a->w[3])
         : "rZ"(b->w[0]), "rZ"(b->w[1]), "rZ"(b->w[2]), "rZ"(b->w[3])
         : "cc");
+#elif defined(__powerpc64__)
+    asm("addc %3, %3, %7\n\t"
+        "adde %2, %2, %6\n\t"
+        "adde %1, %1, %5\n\t"
+        "adde %0, %0, %4"
+        : "+r"(a->w[0]), "+r"(a->w[1]), "+r"(a->w[2]), "+r"(a->w[3])
+        :  "r"(b->w[0]),  "r"(b->w[1]),  "r"(b->w[2]),  "r"(b->w[3])
+        : "ca");
 #else
     bool carry = false;
 
-- 
2.25.1




* Re: [PATCH v2 00/10] softfloat: Implement float128_muladd
  2020-09-25 15:20 [PATCH v2 00/10] softfloat: Implement float128_muladd Richard Henderson
                   ` (9 preceding siblings ...)
  2020-09-25 15:20 ` [PATCH v2 10/10] softfloat: Use ppc64 " Richard Henderson
@ 2020-10-15 17:23 ` Richard Henderson
  10 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2020-10-15 17:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: bharata, alex.bennee, david

Ping.

On 9/25/20 8:20 AM, Richard Henderson wrote:
> Plus assorted cleanups, passes tests/fp/fp-test.
> 
> Changes in v2:
>   * Add UInt256 type (david)
>   * Rewrite and inline shift256RightJamming.  This keeps the whole
>     UInt256 in registers, avoiding long sequences of loads and stores.
>   * Add x86_64 assembly for double shifts.  I don't know why the
>     compiler can't recognize this pattern, but swapping values in
>     and out of %cl (the only register in the base isa that can
>     hold a variable shift) is really ugly.
>   * Add ppc64 assembly.
> 
> 
> r~
> 
> 
> Richard Henderson (10):
>   softfloat: Use mulu64 for mul64To128
>   softfloat: Use int128.h for some operations
>   softfloat: Tidy a * b + inf return
>   softfloat: Add float_cmask and constants
>   softfloat: Inline pick_nan_muladd into its caller
>   softfloat: Implement float128_muladd
>   softfloat: Use x86_64 assembly for {add,sub}{192,256}
>   softfloat: Use x86_64 assembly for sh[rl]_double
>   softfloat: Use aarch64 assembly for {add,sub}{192,256}
>   softfloat: Use ppc64 assembly for {add,sub}{192,256}
> 
>  include/fpu/softfloat-macros.h | 109 +++---
>  include/fpu/softfloat.h        |   2 +
>  fpu/softfloat.c                | 620 ++++++++++++++++++++++++++++++---
>  tests/fp/fp-test.c             |   2 +-
>  tests/fp/wrap.c.inc            |  12 +
>  5 files changed, 652 insertions(+), 93 deletions(-)
> 




* Re: [PATCH v2 01/10] softfloat: Use mulu64 for mul64To128
  2020-09-25 15:20 ` [PATCH v2 01/10] softfloat: Use mulu64 for mul64To128 Richard Henderson
@ 2020-10-15 19:08   ` Alex Bennée
  0 siblings, 0 replies; 23+ messages in thread
From: Alex Bennée @ 2020-10-15 19:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: bharata, qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Via host-utils.h, we use a host widening multiply for
> 64-bit hosts, and a common subroutine for 32-bit hosts.
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH v2 02/10] softfloat: Use int128.h for some operations
  2020-09-25 15:20 ` [PATCH v2 02/10] softfloat: Use int128.h for some operations Richard Henderson
@ 2020-10-15 19:10   ` Alex Bennée
  0 siblings, 0 replies; 23+ messages in thread
From: Alex Bennée @ 2020-10-15 19:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: bharata, qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Use our Int128, which wraps the compiler's __int128_t,
> instead of open-coding left shifts and arithmetic.
> We'd need to extend Int128 to have unsigned operations
> to replace more than these three.
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH v2 03/10] softfloat: Tidy a * b + inf return
  2020-09-25 15:20 ` [PATCH v2 03/10] softfloat: Tidy a * b + inf return Richard Henderson
@ 2020-10-16  9:40   ` Alex Bennée
  2020-10-16 17:04   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 23+ messages in thread
From: Alex Bennée @ 2020-10-16  9:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: bharata, qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> No reason to set values in 'a', when we already
> have float_class_inf in 'c', and can flip that sign.
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH v2 04/10] softfloat: Add float_cmask and constants
  2020-09-25 15:20 ` [PATCH v2 04/10] softfloat: Add float_cmask and constants Richard Henderson
@ 2020-10-16  9:44   ` Alex Bennée
  0 siblings, 0 replies; 23+ messages in thread
From: Alex Bennée @ 2020-10-16  9:44 UTC (permalink / raw)
  To: Richard Henderson; +Cc: bharata, qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Testing more than one class at a time is better done with masks.
> This reduces the static branch count.
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH v2 05/10] softfloat: Inline pick_nan_muladd into its caller
  2020-09-25 15:20 ` [PATCH v2 05/10] softfloat: Inline pick_nan_muladd into its caller Richard Henderson
@ 2020-10-16 16:20   ` Alex Bennée
  2020-10-16 16:36     ` Richard Henderson
  0 siblings, 1 reply; 23+ messages in thread
From: Alex Bennée @ 2020-10-16 16:20 UTC (permalink / raw)
  To: Richard Henderson; +Cc: bharata, qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Because of FloatParts, there will only ever be one caller.

Isn't that admitting defeat - after all, the logic here will be the same
as the logic in the upcoming float128_muladd code, and we only seem to
need additional information:

> Inlining allows us to re-use abc_mask for the snan test.

couldn't we just pass the masks in?
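
Something like this, perhaps (purely a sketch of the suggestion, not code
from either version of the patch; the extra abc_mask parameter is the
hypothetical part):

static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
                                  bool inf_zero, int abc_mask,
                                  float_status *s)
{
    int which;

    /* The caller has already computed abc_mask, so reuse it here. */
    if (unlikely(abc_mask & float_cmask_snan)) {
        float_raise(float_flag_invalid, s);
    }

    which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);

    if (s->default_nan_mode) {
        /* Checked after pickNaNMulAdd so it can set Invalid for inf_zero. */
        which = 3;
    }

    switch (which) {
    case 0:
        break;
    case 1:
        a = b;
        break;
    case 2:
        a = c;
        break;
    case 3:
        return parts_default_nan(s);
    default:
        g_assert_not_reached();
    }

    if (is_snan(a.cls)) {
        return parts_silence_nan(a, s);
    }
    return a;
}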

<snip>
> -    if (is_snan(a.cls)) {
> -        return parts_silence_nan(a, s);
> -    }
> -    return a;

here.

> -}
> -
>  /*
>   * Returns the result of adding or subtracting the values of the
>   * floating-point values `a' and `b'. The operation is performed
> @@ -1366,7 +1327,41 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
>       * off to the target-specific pick-a-NaN routine.
>       */
>      if (unlikely(abc_mask & float_cmask_anynan)) {
> -        return pick_nan_muladd(a, b, c, inf_zero, s);
> +        int which;
> +
> +        if (unlikely(abc_mask & float_cmask_snan)) {
> +            float_raise(float_flag_invalid, s);
> +        }
> +
> +        which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);
> +
> +        if (s->default_nan_mode) {
> +            /*
> +             * Note that this check is after pickNaNMulAdd so that function
> +             * has an opportunity to set the Invalid flag for inf_zero.
> +             */
> +            which = 3;
> +        }
> +
> +        switch (which) {
> +        case 0:
> +            break;
> +        case 1:
> +            a = b;
> +            break;
> +        case 2:
> +            a = c;
> +            break;
> +        case 3:
> +            return parts_default_nan(s);
> +        default:
> +            g_assert_not_reached();
> +        }
> +
> +        if (is_snan(a.cls)) {
> +            return parts_silence_nan(a, s);
> +        }
> +        return a;
>      }
>  
>      if (unlikely(inf_zero)) {

I'm not totally against it, given it's fairly simple logic, but it seems a
shame to lose the commonality of processing which makes the parts code
so much nicer.

-- 
Alex Bennée



* Re: [PATCH v2 06/10] softfloat: Implement float128_muladd
  2020-09-25 15:20 ` [PATCH v2 06/10] softfloat: Implement float128_muladd Richard Henderson
@ 2020-10-16 16:31   ` Alex Bennée
  2020-10-16 16:55     ` Richard Henderson
  0 siblings, 1 reply; 23+ messages in thread
From: Alex Bennée @ 2020-10-16 16:31 UTC (permalink / raw)
  To: Richard Henderson; +Cc: bharata, qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/fpu/softfloat.h |   2 +
>  fpu/softfloat.c         | 416 +++++++++++++++++++++++++++++++++++++++-
>  tests/fp/fp-test.c      |   2 +-
>  tests/fp/wrap.c.inc     |  12 ++
>  4 files changed, 430 insertions(+), 2 deletions(-)
>
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index 78ad5ca738..a38433deb4 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -1196,6 +1196,8 @@ float128 float128_sub(float128, float128, float_status *status);
>  float128 float128_mul(float128, float128, float_status *status);
>  float128 float128_div(float128, float128, float_status *status);
>  float128 float128_rem(float128, float128, float_status *status);
> +float128 float128_muladd(float128, float128, float128, int,
> +                         float_status *status);
>  float128 float128_sqrt(float128, float_status *status);
>  FloatRelation float128_compare(float128, float128, float_status *status);
>  FloatRelation float128_compare_quiet(float128, float128, float_status *status);
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index e038434a07..49de31fec2 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -512,11 +512,19 @@ static inline __attribute__((unused)) bool is_qnan(FloatClass c)
>  
>  typedef struct {
>      uint64_t frac;
> -    int32_t  exp;
> +    int32_t exp;
>      FloatClass cls;
>      bool sign;
>  } FloatParts;
>  
> +/* Similar for float128.  */
> +typedef struct {
> +    uint64_t frac0, frac1;
> +    int32_t exp;
> +    FloatClass cls;
> +    bool sign;
> +} FloatParts128;
> +
>  #define DECOMPOSED_BINARY_POINT    (64 - 2)
>  #define DECOMPOSED_IMPLICIT_BIT    (1ull << DECOMPOSED_BINARY_POINT)
>  #define DECOMPOSED_OVERFLOW_BIT    (DECOMPOSED_IMPLICIT_BIT << 1)
> @@ -4574,6 +4582,46 @@ static void
>  
>  }
>  
> +/*----------------------------------------------------------------------------
> +| Returns the parts of floating-point value `a'.
> +*----------------------------------------------------------------------------*/
> +
> +static void float128_unpack(FloatParts128 *p, float128 a, float_status *status)
> +{
> +    p->sign = extractFloat128Sign(a);
> +    p->exp = extractFloat128Exp(a);
> +    p->frac0 = extractFloat128Frac0(a);
> +    p->frac1 = extractFloat128Frac1(a);

Here we are deviating from the existing style; it would be nice if we
could separate the raw unpack and have something like:

static const FloatFmt float128_params = {
    FLOAT_PARAMS(15, 112)
};

static inline FloatParts128 unpack128_raw(FloatFmt fmt, uint128_t raw)
{
    const int sign_pos = fmt.frac_size + fmt.exp_size;

    return (FloatParts128) {
        .cls = float_class_unclassified,
        .sign = extract128(raw, sign_pos, 1),
        .exp = extract128(raw, fmt.frac_size, fmt.exp_size),
        .frac0 = extract128_hi(raw, 0, fmt.frac_size),
        .frac1 = extract128_lo(raw, 0, fmt.frac_size),
    };
}

So even if we end up duplicating a chunk of the code, the form will be
similar, so when we put the logic side by side we can see it works the same
way.

> +
> +    if (p->exp == 0) {
> +        if ((p->frac0 | p->frac1) == 0) {
> +            p->cls = float_class_zero;
> +        } else if (status->flush_inputs_to_zero) {
> +            float_raise(float_flag_input_denormal, status);
> +            p->cls = float_class_zero;
> +            p->frac0 = p->frac1 = 0;
> +        } else {
> +            normalizeFloat128Subnormal(p->frac0, p->frac1, &p->exp,
> +                                       &p->frac0, &p->frac1);
> +            p->exp -= 0x3fff;
> +            p->cls = float_class_normal;
> +        }

and also we can avoid introducing magic numbers into the code.
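
Something like this sketch, reusing the FloatFmt fields from the raw unpack
above (the local names are only illustrative):

    const int exp_max  = (1 << fmt.exp_size) - 1;   /* 0x7fff for float128 */
    const int exp_bias = exp_max >> 1;               /* 0x3fff */

so the 0x3fff and 0x7fff literals fall out of the parameters rather than
being spelled out by hand.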

> +    } else if (p->exp == 0x7fff) {
> +        if ((p->frac0 | p->frac1) == 0) {
> +            p->cls = float_class_inf;
> +        } else if (float128_is_signaling_nan(a, status)) {
> +            p->cls = float_class_snan;
> +        } else {
> +            p->cls = float_class_qnan;
> +        }
> +    } else {
> +        /* Add the implicit bit. */
> +        p->frac0 |= UINT64_C(0x0001000000000000);
> +        p->exp -= 0x3fff;
> +        p->cls = float_class_normal;
> +    }
> +}
> +

and eventually hold out for compilers smart enough to handle unification
at a later date.

>  /*----------------------------------------------------------------------------
>  | Packs the sign `zSign', the exponent `zExp', and the significand formed
>  | by the concatenation of `zSig0' and `zSig1' into a quadruple-precision
> @@ -7205,6 +7253,372 @@ float128 float128_mul(float128 a, float128 b, float_status *status)
>  
>  }
>  
> +typedef struct UInt256 {
> +    /* Indexed big-endian, to match the rest of softfloat numbering. */
> +    uint64_t w[4];
> +} UInt256;
> +
> +static inline uint64_t shl_double(uint64_t h, uint64_t l, unsigned lsh)
> +{
> +    return lsh ? (h << lsh) | (l >> (64 - lsh)) : h;
> +}
> +
> +static inline uint64_t shr_double(uint64_t h, uint64_t l, unsigned rsh)
> +{
> +    return rsh ? (h << (64 - rsh)) | (l >> rsh) : l;
> +}
> +
> +static void shortShift256Left(UInt256 *p, unsigned lsh)
> +{
> +    if (lsh != 0) {
> +        p->w[0] = shl_double(p->w[0], p->w[1], lsh);
> +        p->w[1] = shl_double(p->w[1], p->w[2], lsh);
> +        p->w[2] = shl_double(p->w[2], p->w[3], lsh);
> +        p->w[3] <<= lsh;
> +    }
> +}
> +
> +static inline void shift256RightJamming(UInt256 *p, unsigned count)
> +{
> +    uint64_t out, p0, p1, p2, p3;
> +
> +    p0 = p->w[0];
> +    p1 = p->w[1];
> +    p2 = p->w[2];
> +    p3 = p->w[3];
> +
> +    if (unlikely(count == 0)) {
> +        return;
> +    } else if (likely(count < 64)) {
> +        out = 0;
> +    } else if (likely(count < 256)) {
> +        if (count < 128) {
> +            out = p3;
> +            p3 = p2;
> +            p2 = p1;
> +            p1 = p0;
> +            p0 = 0;
> +        } else if (count < 192) {
> +            out = p2 | p3;
> +            p3 = p1;
> +            p2 = p0;
> +            p1 = 0;
> +            p0 = 0;
> +        } else {
> +            out = p1 | p2 | p3;
> +            p3 = p0;
> +            p2 = 0;
> +            p1 = 0;
> +            p0 = 0;
> +        }
> +        count &= 63;
> +        if (count == 0) {
> +            goto done;
> +        }
> +    } else {
> +        out = p0 | p1 | p2 | p3;
> +        p3 = 0;
> +        p2 = 0;
> +        p1 = 0;
> +        p0 = 0;
> +        goto done;
> +    }
> +
> +    out |= shr_double(p3, 0, count);
> +    p3 = shr_double(p2, p3, count);
> +    p2 = shr_double(p1, p2, count);
> +    p1 = shr_double(p0, p1, count);
> +    p0 = p0 >> count;
> +
> + done:
> +    p->w[3] = p3 | (out != 0);
> +    p->w[2] = p2;
> +    p->w[1] = p1;
> +    p->w[0] = p0;
> +}
> +
> +/* R = A - B */
> +static void sub256(UInt256 *r, UInt256 *a, UInt256 *b)
> +{
> +    bool borrow = false;
> +
> +    for (int i = 3; i >= 0; --i) {
> +        uint64_t at = a->w[i];
> +        uint64_t bt = b->w[i];
> +        uint64_t rt = at - bt;
> +
> +        if (borrow) {
> +            borrow = at <= bt;
> +            rt -= 1;
> +        } else {
> +            borrow = at < bt;
> +        }
> +        r->w[i] = rt;
> +    }
> +}
> +
> +/* A = -A */
> +static void neg256(UInt256 *a)
> +{
> +    /*
> +     * Recall that -X - 1 = ~X, and that since this is negation,
> +     * once we find a non-zero number, all subsequent words will
> +     * have borrow-in, and thus use NOT.
> +     */
> +    uint64_t t = a->w[3];
> +    if (likely(t)) {
> +        a->w[3] = -t;
> +        goto not2;
> +    }
> +    t = a->w[2];
> +    if (likely(t)) {
> +        a->w[2] = -t;
> +        goto not1;
> +    }
> +    t = a->w[1];
> +    if (likely(t)) {
> +        a->w[1] = -t;
> +        goto not0;
> +    }
> +    a->w[0] = -a->w[0];
> +    return;
> + not2:
> +    a->w[2] = ~a->w[2];
> + not1:
> +    a->w[1] = ~a->w[1];
> + not0:
> +    a->w[0] = ~a->w[0];
> +}
> +
> +/* A += B */
> +static void add256(UInt256 *a, UInt256 *b)
> +{
> +    bool carry = false;
> +
> +    for (int i = 3; i >= 0; --i) {
> +        uint64_t bt = b->w[i];
> +        uint64_t at = a->w[i] + bt;
> +
> +        if (carry) {
> +            at += 1;
> +            carry = at <= bt;
> +        } else {
> +            carry = at < bt;
> +        }
> +        a->w[i] = at;
> +    }
> +}
> +
> +float128 float128_muladd(float128 a_f, float128 b_f, float128 c_f,
> +                         int flags, float_status *status)
> +{
> +    bool inf_zero, p_sign, sign_flip;
> +    int p_exp, exp_diff, shift, ab_mask, abc_mask;
> +    FloatParts128 a, b, c;
> +    FloatClass p_cls;
> +    UInt256 p_frac, c_frac;
> +
> +    float128_unpack(&a, a_f, status);
> +    float128_unpack(&b, b_f, status);
> +    float128_unpack(&c, c_f, status);
> +
> +    ab_mask = float_cmask(a.cls) | float_cmask(b.cls);
> +    abc_mask = float_cmask(c.cls) | ab_mask;
> +    inf_zero = ab_mask == float_cmask_infzero;
> +
> +    /* If any input is a NaN, select the required result. */
> +    if (unlikely(abc_mask & float_cmask_anynan)) {
> +        if (unlikely(abc_mask & float_cmask_snan)) {
> +            float_raise(float_flag_invalid, status);
> +        }
> +
> +        int which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, status);
> +        if (status->default_nan_mode) {
> +            which = 3;
> +        }
> +        switch (which) {
> +        case 0:
> +            break;
> +        case 1:
> +            a_f = b_f;
> +            a.cls = b.cls;
> +            break;
> +        case 2:
> +            a_f = c_f;
> +            a.cls = c.cls;
> +            break;
> +        case 3:
> +            return float128_default_nan(status);
> +        }
> +        if (is_snan(a.cls)) {
> +            return float128_silence_nan(a_f, status);
> +        }
> +        return a_f;
> +    }
> +
> +    /* After dealing with input NaNs, look for Inf * Zero. */
> +    if (unlikely(inf_zero)) {
> +        float_raise(float_flag_invalid, status);
> +        return float128_default_nan(status);
> +    }
> +
> +    p_sign = a.sign ^ b.sign;
> +
> +    if (flags & float_muladd_negate_c) {
> +        c.sign ^= 1;
> +    }
> +    if (flags & float_muladd_negate_product) {
> +        p_sign ^= 1;
> +    }
> +    sign_flip = (flags & float_muladd_negate_result);
> +
> +    if (ab_mask & float_cmask_inf) {
> +        p_cls = float_class_inf;
> +    } else if (ab_mask & float_cmask_zero) {
> +        p_cls = float_class_zero;
> +    } else {
> +        p_cls = float_class_normal;
> +    }
> +
> +    if (c.cls == float_class_inf) {
> +        if (p_cls == float_class_inf && p_sign != c.sign) {
> +            /* +Inf + -Inf = NaN */
> +            float_raise(float_flag_invalid, status);
> +            return float128_default_nan(status);
> +        }
> +        /* Inf + Inf = Inf of the proper sign; reuse the return below. */
> +        p_cls = float_class_inf;
> +        p_sign = c.sign;
> +    }
> +
> +    if (p_cls == float_class_inf) {
> +        return packFloat128(p_sign ^ sign_flip, 0x7fff, 0, 0);
> +    }
> +
> +    if (p_cls == float_class_zero) {
> +        if (c.cls == float_class_zero) {
> +            if (p_sign != c.sign) {
> +                p_sign = status->float_rounding_mode == float_round_down;
> +            }
> +            return packFloat128(p_sign ^ sign_flip, 0, 0, 0);
> +        }
> +
> +        if (flags & float_muladd_halve_result) {
> +            c.exp -= 1;
> +        }
> +        return roundAndPackFloat128(c.sign ^ sign_flip,
> +                                    c.exp + 0x3fff - 1,
> +                                    c.frac0, c.frac1, 0, status);
> +    }
> +
> +    /* a & b should be normals now... */
> +    assert(a.cls == float_class_normal && b.cls == float_class_normal);
> +
> +    /* Multiply of 2 113-bit numbers produces a 226-bit result.  */
> +    mul128To256(a.frac0, a.frac1, b.frac0, b.frac1,
> +                &p_frac.w[0], &p_frac.w[1], &p_frac.w[2], &p_frac.w[3]);
> +
> +    /* Realign the binary point at bit 48 of p_frac[0].  */
> +    shift = clz64(p_frac.w[0]) - 15;
> +    shortShift256Left(&p_frac, shift);
> +    p_exp = a.exp + b.exp - (shift - 16);
> +    exp_diff = p_exp - c.exp;
> +
> +    /* Extend the fraction part of the addend to 256 bits.  */
> +    c_frac.w[0] = c.frac0;
> +    c_frac.w[1] = c.frac1;
> +    c_frac.w[2] = 0;
> +    c_frac.w[3] = 0;
> +
> +    /* Add or subtract C from the intermediate product. */
> +    if (c.cls == float_class_zero) {
> +        /* Fall through to rounding after addition (with zero). */
> +    } else if (p_sign != c.sign) {
> +        /* Subtraction */
> +        if (exp_diff < 0) {
> +            shift256RightJamming(&p_frac, -exp_diff);
> +            sub256(&p_frac, &c_frac, &p_frac);
> +            p_exp = c.exp;
> +            p_sign ^= 1;
> +        } else if (exp_diff > 0) {
> +            shift256RightJamming(&c_frac, exp_diff);
> +            sub256(&p_frac, &p_frac, &c_frac);
> +        } else {
> +            /* Low 128 bits of C are known to be zero. */
> +            sub128(p_frac.w[0], p_frac.w[1], c_frac.w[0], c_frac.w[1],
> +                   &p_frac.w[0], &p_frac.w[1]);
> +            /*
> +             * Since we have normalized to bit 48 of p_frac[0],
> +             * a negative result means C > P and we need to invert.
> +             */
> +            if ((int64_t)p_frac.w[0] < 0) {
> +                neg256(&p_frac);
> +                p_sign ^= 1;
> +            }
> +        }
> +
> +        /*
> +         * Gross normalization of the 256-bit subtraction result.
> +         * Fine tuning below shared with addition.
> +         */
> +        if (p_frac.w[0] != 0) {
> +            /* nothing to do */
> +        } else if (p_frac.w[1] != 0) {
> +            p_exp -= 64;
> +            p_frac.w[0] = p_frac.w[1];
> +            p_frac.w[1] = p_frac.w[2];
> +            p_frac.w[2] = p_frac.w[3];
> +            p_frac.w[3] = 0;
> +        } else if (p_frac.w[2] != 0) {
> +            p_exp -= 128;
> +            p_frac.w[0] = p_frac.w[2];
> +            p_frac.w[1] = p_frac.w[3];
> +            p_frac.w[2] = 0;
> +            p_frac.w[3] = 0;
> +        } else if (p_frac.w[3] != 0) {
> +            p_exp -= 192;
> +            p_frac.w[0] = p_frac.w[3];
> +            p_frac.w[1] = 0;
> +            p_frac.w[2] = 0;
> +            p_frac.w[3] = 0;
> +        } else {
> +            /* Subtraction was exact: result is zero. */
> +            p_sign = status->float_rounding_mode == float_round_down;
> +            return packFloat128(p_sign ^ sign_flip, 0, 0, 0);
> +        }
> +    } else {
> +        /* Addition */
> +        if (exp_diff <= 0) {
> +            shift256RightJamming(&p_frac, -exp_diff);
> +            /* Low 128 bits of C are known to be zero. */
> +            add128(p_frac.w[0], p_frac.w[1], c_frac.w[0], c_frac.w[1],
> +                   &p_frac.w[0], &p_frac.w[1]);
> +            p_exp = c.exp;
> +        } else {
> +            shift256RightJamming(&c_frac, exp_diff);
> +            add256(&p_frac, &c_frac);
> +        }
> +    }
> +
> +    /* Fine normalization of the 256-bit result: p_frac[0] != 0. */
> +    shift = clz64(p_frac.w[0]) - 15;
> +    if (shift < 0) {
> +        shift256RightJamming(&p_frac, -shift);
> +    } else if (shift > 0) {
> +        shortShift256Left(&p_frac, shift);
> +    }
> +    p_exp -= shift;
> +
> +    if (flags & float_muladd_halve_result) {
> +        p_exp -= 1;
> +    }
> +    return roundAndPackFloat128(p_sign ^ sign_flip,
> +                                p_exp + 0x3fff - 1,
> +                                p_frac.w[0], p_frac.w[1],
> +                                p_frac.w[2] | (p_frac.w[3] != 0),
> +                                status);
> +}
> +
>  /*----------------------------------------------------------------------------
>  | Returns the result of dividing the quadruple-precision floating-point value
>  | `a' by the corresponding value `b'.  The operation is performed according to
> diff --git a/tests/fp/fp-test.c b/tests/fp/fp-test.c
> index 06ffebd6db..9bbb0dba67 100644
> --- a/tests/fp/fp-test.c
> +++ b/tests/fp/fp-test.c
> @@ -717,7 +717,7 @@ static void do_testfloat(int op, int rmode, bool exact)
>          test_abz_f128(true_abz_f128M, subj_abz_f128M);
>          break;
>      case F128_MULADD:
> -        not_implemented();
> +        test_abcz_f128(slow_f128M_mulAdd, qemu_f128_mulAdd);
>          break;
>      case F128_SQRT:
>          test_az_f128(slow_f128M_sqrt, qemu_f128M_sqrt);
> diff --git a/tests/fp/wrap.c.inc b/tests/fp/wrap.c.inc
> index 0cbd20013e..65a713deae 100644
> --- a/tests/fp/wrap.c.inc
> +++ b/tests/fp/wrap.c.inc
> @@ -574,6 +574,18 @@ WRAP_MULADD(qemu_f32_mulAdd, float32_muladd, float32)
>  WRAP_MULADD(qemu_f64_mulAdd, float64_muladd, float64)
>  #undef WRAP_MULADD
>  
> +static void qemu_f128_mulAdd(const float128_t *ap, const float128_t *bp,
> +                             const float128_t *cp, float128_t *res)
> +{
> +    float128 a, b, c, ret;
> +
> +    a = soft_to_qemu128(*ap);
> +    b = soft_to_qemu128(*bp);
> +    c = soft_to_qemu128(*cp);
> +    ret = float128_muladd(a, b, c, 0, &qsf);
> +    *res = qemu_to_soft128(ret);
> +}
> +
>  #define WRAP_CMP16(name, func, retcond)         \
>      static bool name(float16_t a, float16_t b)  \
>      {                                           \


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 05/10] softfloat: Inline pick_nan_muladd into its caller
  2020-10-16 16:20   ` Alex Bennée
@ 2020-10-16 16:36     ` Richard Henderson
  2020-10-18 21:06       ` [PATCH] softfpu: Generalize pick_nan_muladd to opaque structures Richard Henderson
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 2020-10-16 16:36 UTC (permalink / raw)
  To: Alex Bennée; +Cc: bharata, qemu-devel, david

On 10/16/20 9:20 AM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> Because of FloatParts, there will only ever be one caller.
> 
> Isn't that admitting defeat - after all, the logic here will be the same
> as the logic in the upcoming float128_muladd code, and we only seem to
> need additional information...


Well, that and passing around a completely different structure, which is
the big sticking point.  Any suggestions for that?


r~


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 06/10] softfloat: Implement float128_muladd
  2020-10-16 16:31   ` Alex Bennée
@ 2020-10-16 16:55     ` Richard Henderson
  0 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2020-10-16 16:55 UTC (permalink / raw)
  To: Alex Bennée; +Cc: bharata, qemu-devel, david

On 10/16/20 9:31 AM, Alex Bennée wrote:
>> +static void float128_unpack(FloatParts128 *p, float128 a, float_status *status)
>> +{
>> +    p->sign = extractFloat128Sign(a);
>> +    p->exp = extractFloat128Exp(a);
>> +    p->frac0 = extractFloat128Frac0(a);
>> +    p->frac1 = extractFloat128Frac1(a);
> 
> Here we are deviating from the existing style; it would be nice if we
> could separate the raw unpack and have something like:
> 
> static const FloatFmt float128_params = {
>     FLOAT_PARAMS(15, 112)
> };
> 
> static inline FloatParts128 unpack128_raw(FloatFmt fmt, uint128_t raw)
> {
>     const int sign_pos = fmt.frac_size + fmt.exp_size;
> 
>     return (FloatParts128) {
>         .cls = float_class_unclassified,
>         .sign = extract128(raw, sign_pos, 1),
>         .exp = extract128(raw, fmt.frac_size, fmt.exp_size),
>         .frac0 = extract128_hi(raw, 0, fmt.frac_size),
>         .frac1 = extract128_lo(raw, 0, fmt.frac_size),
>     };
> }
> 
> So even if we end up duplicating a chunk of the code, the form will be
> similar, so when we put the logic side by side we can see it works the same
> way.

I suppose, but unlike the smaller fp formats, we won't be able to reuse this.
Even if we pull in the x86 80-bit format and the m68k 96-bit format, there are
a number of fundamental differences.  E.g. the implicit bit

>> +        /* Add the implicit bit. */
>> +        p->frac0 |= UINT64_C(0x0001000000000000);

is not present in the x86 and m68k formats.

Finally, I'm continuing to use the existing Berkeley packing logic, which is
a bit persnickety about where that implicit bit goes.  Our smaller formats
put the implicit bit at bit 62.
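
To spell out the difference (both constants are the ones used in the
respective hunks; the local names here are just for illustration):

    /* FloatParts, the smaller formats: implicit bit at bit 62.  */
    uint64_t small_implicit = DECOMPOSED_IMPLICIT_BIT;        /* 1ull << (64 - 2) */
    /* FloatParts128 with the Berkeley packing: bit 48 of frac0.  */
    uint64_t quad_implicit  = UINT64_C(0x0001000000000000);   /* 1ull << 48 */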


r~


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 03/10] softfloat: Tidy a * b + inf return
  2020-09-25 15:20 ` [PATCH v2 03/10] softfloat: Tidy a * b + inf return Richard Henderson
  2020-10-16  9:40   ` Alex Bennée
@ 2020-10-16 17:04   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 23+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-16 17:04 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: david, alex.bennee, bharata

On 9/25/20 5:20 PM, Richard Henderson wrote:
> No reason to set values in 'a', when we already
> have float_class_inf in 'c', and can flip that sign.
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   fpu/softfloat.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] softfpu: Generalize pick_nan_muladd to opaque structures
  2020-10-16 16:36     ` Richard Henderson
@ 2020-10-18 21:06       ` Richard Henderson
  2020-10-19  9:57         ` Alex Bennée
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 2020-10-18 21:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennee

This will allow us to share code between FloatParts and FloatParts128.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
Cc: Alex Bennee <alex.bennee@linaro.org>

What do you think of this instead of inlining pick_nan_muladd
into the two muladd implementations?


r~

---
 fpu/softfloat.c | 40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 3e625c47cd..60fdddd163 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -929,16 +929,23 @@ static FloatParts pick_nan(FloatParts a, FloatParts b, float_status *s)
     return a;
 }
 
-static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
-                                  bool inf_zero, float_status *s)
+/*
+ * Given pointers to A, B, C, and the respective classes, return the
+ * pointer to the structure that is the NaN result, or NULL to signal
+ * that the result is the default NaN.
+ */
+static inline void *
+pick_nan_muladd(FloatClass a_cls, FloatClass b_cls, FloatClass c_cls,
+                void *a, void *b, void *c,
+                bool inf_zero, int abc_mask, float_status *s)
 {
     int which;
 
-    if (is_snan(a.cls) || is_snan(b.cls) || is_snan(c.cls)) {
+    if (unlikely(abc_mask & float_cmask_snan)) {
         s->float_exception_flags |= float_flag_invalid;
     }
 
-    which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);
+    which = pickNaNMulAdd(a_cls, b_cls, c_cls, inf_zero, s);
 
     if (s->default_nan_mode) {
         /* Note that this check is after pickNaNMulAdd so that function
@@ -949,23 +956,16 @@ static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
 
     switch (which) {
     case 0:
-        break;
+        return a;
     case 1:
-        a = b;
-        break;
+        return b;
     case 2:
-        a = c;
-        break;
+        return c;
     case 3:
-        return parts_default_nan(s);
+        return NULL;
     default:
         g_assert_not_reached();
     }
-
-    if (is_snan(a.cls)) {
-        return parts_silence_nan(a, s);
-    }
-    return a;
 }
 
 /*
@@ -1366,7 +1366,15 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
      * off to the target-specific pick-a-NaN routine.
      */
     if (unlikely(abc_mask & float_cmask_anynan)) {
-        return pick_nan_muladd(a, b, c, inf_zero, s);
+        FloatParts *r = pick_nan_muladd(a.cls, b.cls, c.cls, &a, &b, &c,
+                                        inf_zero, abc_mask, s);
+        if (r == NULL) {
+            return parts_default_nan(s);
+        }
+        if (is_snan(r->cls)) {
+            return parts_silence_nan(*r, s);
+        }
+        return *r;
     }
 
     if (unlikely(inf_zero)) {
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] softfpu: Generalize pick_nan_muladd to opaque structures
  2020-10-18 21:06       ` [PATCH] softfpu: Generalize pick_nan_muladd to opaque structures Richard Henderson
@ 2020-10-19  9:57         ` Alex Bennée
  0 siblings, 0 replies; 23+ messages in thread
From: Alex Bennée @ 2020-10-19  9:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> This will allow us to share code between FloatParts and FloatParts128.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> Cc: Alex Bennee <alex.bennee@linaro.org>
>
> What do you think of this instead of inlining pick_nan_muladd
> into the two muladd implementations?

I think that can work. I was noodling about with float_addsub128 over
the weekend so I'll post what that looks like once I've tested it.

Anyway:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

>
>
> r~
>
> ---
>  fpu/softfloat.c | 40 ++++++++++++++++++++++++----------------
>  1 file changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 3e625c47cd..60fdddd163 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -929,16 +929,23 @@ static FloatParts pick_nan(FloatParts a, FloatParts b, float_status *s)
>      return a;
>  }
>  
> -static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
> -                                  bool inf_zero, float_status *s)
> +/*
> + * Given pointers to A, B, C, and the respective classes, return the
> + * pointer to the structure that is the NaN result, or NULL to signal
> + * that the result is the default NaN.
> + */
> +static inline void *
> +pick_nan_muladd(FloatClass a_cls, FloatClass b_cls, FloatClass c_cls,
> +                void *a, void *b, void *c,
> +                bool inf_zero, int abc_mask, float_status *s)
>  {
>      int which;
>  
> -    if (is_snan(a.cls) || is_snan(b.cls) || is_snan(c.cls)) {
> +    if (unlikely(abc_mask & float_cmask_snan)) {
>          s->float_exception_flags |= float_flag_invalid;
>      }
>  
> -    which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);
> +    which = pickNaNMulAdd(a_cls, b_cls, c_cls, inf_zero, s);
>  
>      if (s->default_nan_mode) {
>          /* Note that this check is after pickNaNMulAdd so that function
> @@ -949,23 +956,16 @@ static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
>  
>      switch (which) {
>      case 0:
> -        break;
> +        return a;
>      case 1:
> -        a = b;
> -        break;
> +        return b;
>      case 2:
> -        a = c;
> -        break;
> +        return c;
>      case 3:
> -        return parts_default_nan(s);
> +        return NULL;
>      default:
>          g_assert_not_reached();
>      }
> -
> -    if (is_snan(a.cls)) {
> -        return parts_silence_nan(a, s);
> -    }
> -    return a;
>  }
>  
>  /*
> @@ -1366,7 +1366,15 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
>       * off to the target-specific pick-a-NaN routine.
>       */
>      if (unlikely(abc_mask & float_cmask_anynan)) {
> -        return pick_nan_muladd(a, b, c, inf_zero, s);
> +        FloatParts *r = pick_nan_muladd(a.cls, b.cls, c.cls, &a, &b, &c,
> +                                        inf_zero, abc_mask, s);
> +        if (r == NULL) {
> +            return parts_default_nan(s);
> +        }
> +        if (is_snan(r->cls)) {
> +            return parts_silence_nan(*r, s);
> +        }
> +        return *r;
>      }
>  
>      if (unlikely(inf_zero)) {


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2020-10-19  9:58 UTC | newest]

Thread overview: 23+ messages
2020-09-25 15:20 [PATCH v2 00/10] softfloat: Implement float128_muladd Richard Henderson
2020-09-25 15:20 ` [PATCH v2 01/10] softfloat: Use mulu64 for mul64To128 Richard Henderson
2020-10-15 19:08   ` Alex Bennée
2020-09-25 15:20 ` [PATCH v2 02/10] softfloat: Use int128.h for some operations Richard Henderson
2020-10-15 19:10   ` Alex Bennée
2020-09-25 15:20 ` [PATCH v2 03/10] softfloat: Tidy a * b + inf return Richard Henderson
2020-10-16  9:40   ` Alex Bennée
2020-10-16 17:04   ` Philippe Mathieu-Daudé
2020-09-25 15:20 ` [PATCH v2 04/10] softfloat: Add float_cmask and constants Richard Henderson
2020-10-16  9:44   ` Alex Bennée
2020-09-25 15:20 ` [PATCH v2 05/10] softfloat: Inline pick_nan_muladd into its caller Richard Henderson
2020-10-16 16:20   ` Alex Bennée
2020-10-16 16:36     ` Richard Henderson
2020-10-18 21:06       ` [PATCH] softfpu: Generalize pick_nan_muladd to opaque structures Richard Henderson
2020-10-19  9:57         ` Alex Bennée
2020-09-25 15:20 ` [PATCH v2 06/10] softfloat: Implement float128_muladd Richard Henderson
2020-10-16 16:31   ` Alex Bennée
2020-10-16 16:55     ` Richard Henderson
2020-09-25 15:20 ` [PATCH v2 07/10] softfloat: Use x86_64 assembly for {add, sub}{192, 256} Richard Henderson
2020-09-25 15:20 ` [PATCH v2 08/10] softfloat: Use x86_64 assembly for sh[rl]_double Richard Henderson
2020-09-25 15:20 ` [PATCH v2 09/10] softfloat: Use aarch64 assembly for {add, sub}{192, 256} Richard Henderson
2020-09-25 15:20 ` [PATCH v2 10/10] softfloat: Use ppc64 " Richard Henderson
2020-10-15 17:23 ` [PATCH v2 00/10] softfloat: Implement float128_muladd Richard Henderson
