* [PATCH 00/72] Convert floatx80 and float128 to FloatParts
From: Richard Henderson @ 2021-05-08  1:46 UTC
  To: qemu-devel; +Cc: alex.bennee, david

Reorg everything using QEMU_GENERIC and multiple inclusion to
reduce the amount of code duplication between the formats.

The use of QEMU_GENERIC means that we need to use pointers instead
of structures, which means that even the smaller float formats
need rearranging.
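
For reference, a minimal sketch of the multiple-inclusion pattern
(illustrative names only, not the exact code; the real definitions
live in fpu/softfloat.c and the new fpu/softfloat-parts.c.inc):

    /* Sketch: the generic algorithms are written once against
     * FloatPartsN / partsN(), then instantiated per width by
     * including the .c.inc file once for each format. */
    #define N            64
    #define FloatPartsN  FloatParts64
    #define partsN(name) parts64_##name
    #include "softfloat-parts.c.inc"
    #undef  N
    #undef  FloatPartsN
    #undef  partsN

    #define N            128
    #define FloatPartsN  FloatParts128
    #define partsN(name) parts128_##name
    #include "softfloat-parts.c.inc"
    #undef  N
    #undef  FloatPartsN
    #undef  partsN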

I've carried it through to completion within fpu/, so that little
of the legacy code remains.  There is some floatx80 code in
target/m68k and target/i386 that's still hanging around.


r~


Alex Bennée (1):
  tests/fp: add quad support to the benchmark utility

Richard Henderson (71):
  qemu/host-utils: Use __builtin_bitreverseN
  qemu/host-utils: Add wrappers for overflow builtins
  qemu/host-utils: Add wrappers for carry builtins
  accel/tcg: Use add/sub overflow routines in tcg-runtime-gvec.c
  softfloat: Move the binary point to the msb
  softfloat: Inline float_raise
  softfloat: Use float_raise in more places
  softfloat: Tidy a * b + inf return
  softfloat: Add float_cmask and constants
  softfloat: Use return_nan in float_to_float
  softfloat: fix return_nan vs default_nan_mode
  target/mips: Set set_default_nan_mode with set_snan_bit_is_one
  softfloat: Do not produce a default_nan from parts_silence_nan
  softfloat: Rename FloatParts to FloatParts64
  softfloat: Move type-specific pack/unpack routines
  softfloat: Use pointers with parts_default_nan
  softfloat: Use pointers with unpack_raw
  softfloat: Use pointers with ftype_unpack_raw
  softfloat: Use pointers with pack_raw
  softfloat: Use pointers with ftype_pack_raw
  softfloat: Use pointers with ftype_unpack_canonical
  softfloat: Use pointers with ftype_round_pack_canonical
  softfloat: Use pointers with parts_silence_nan
  softfloat: Rearrange FloatParts64
  softfloat: Convert float128_silence_nan to parts
  softfloat: Convert float128_default_nan to parts
  softfloat: Move return_nan to softfloat-parts.c.inc
  softfloat: Move pick_nan to softfloat-parts.c.inc
  softfloat: Move pick_nan_muladd to softfloat-parts.c.inc
  softfloat: Move sf_canonicalize to softfloat-parts.c.inc
  softfloat: Move round_canonical to softfloat-parts.c.inc
  softfloat: Use uadd64_carry, usub64_borrow in softfloat-macros.h
  softfloat: Move addsub_floats to softfloat-parts.c.inc
  softfloat: Implement float128_add/sub via parts
  softfloat: Move mul_floats to softfloat-parts.c.inc
  softfloat: Move muladd_floats to softfloat-parts.c.inc
  softfloat: Use mulu64 for mul64To128
  softfloat: Use add192 in mul128To256
  softfloat: Tidy mul128By64To192
  softfloat: Introduce sh[lr]_double primitives
  softfloat: Move div_floats to softfloat-parts.c.inc
  softfloat: Split float_to_float
  softfloat: Convert float-to-float conversions with float128
  softfloat: Move round_to_int to softfloat-parts.c.inc
  softfloat: Move round_to_int_and_pack to softfloat-parts.c.inc
  softfloat: Move round_to_uint_and_pack to softfloat-parts.c.inc
  softfloat: Move int_to_float to softfloat-parts.c.inc
  softfloat: Move uint_to_float to softfloat-parts.c.inc
  softfloat: Move minmax_flags to softfloat-parts.c.inc
  softfloat: Move compare_floats to softfloat-parts.c.inc
  softfloat: Move scalbn_decomposed to softfloat-parts.c.inc
  softfloat: Move sqrt_float to softfloat-parts.c.inc
  softfloat: Split out parts_uncanon_normal
  softfloat: Reduce FloatFmt
  softfloat: Introduce Floatx80RoundPrec
  softfloat: Adjust parts_uncanon_normal for floatx80
  tests/fp/fp-test: Reverse order of floatx80 precision tests
  softfloat: Convert floatx80_add/sub to FloatParts
  softfloat: Convert floatx80_mul to FloatParts
  softfloat: Convert floatx80_div to FloatParts
  softfloat: Convert floatx80_sqrt to FloatParts
  softfloat: Convert floatx80_round to FloatParts
  softfloat: Convert floatx80_round_to_int to FloatParts
  softfloat: Convert integer to floatx80 to FloatParts
  softfloat: Convert floatx80 float conversions to FloatParts
  softfloat: Convert floatx80 to integer to FloatParts
  softfloat: Convert floatx80_scalbn to FloatParts
  softfloat: Convert floatx80 compare to FloatParts
  softfloat: Convert float32_exp2 to FloatParts
  softfloat: Move floatN_log2 to softfloat-parts.c.inc
  softfloat: Convert modrem operations to FloatParts

 include/fpu/softfloat-helpers.h  |    5 +-
 include/fpu/softfloat-macros.h   |  247 +-
 include/fpu/softfloat-types.h    |   10 +-
 include/fpu/softfloat.h          |   11 +-
 include/qemu/host-utils.h        |  291 ++
 target/mips/fpu_helper.h         |   10 +-
 accel/tcg/tcg-runtime-gvec.c     |   36 +-
 fpu/softfloat.c                  | 7760 ++++++++++--------------------
 linux-user/arm/nwfpe/fpa11.c     |   41 +-
 target/i386/tcg/fpu_helper.c     |   79 +-
 target/m68k/fpu_helper.c         |   50 +-
 target/m68k/softfloat.c          |   90 +-
 tests/fp/fp-bench.c              |   88 +-
 tests/fp/fp-test-log2.c          |  118 +
 tests/fp/fp-test.c               |   11 +-
 fpu/softfloat-parts-addsub.c.inc |   62 +
 fpu/softfloat-parts.c.inc        | 1480 ++++++
 fpu/softfloat-specialize.c.inc   |  424 +-
 tests/fp/wrap.c.inc              |   12 +
 tests/fp/meson.build             |   11 +
 20 files changed, 4886 insertions(+), 5950 deletions(-)
 create mode 100644 tests/fp/fp-test-log2.c
 create mode 100644 fpu/softfloat-parts-addsub.c.inc
 create mode 100644 fpu/softfloat-parts.c.inc

-- 
2.25.1




* [PATCH 01/72] qemu/host-utils: Use __builtin_bitreverseN
From: Richard Henderson @ 2021-05-08  1:46 UTC
  To: qemu-devel; +Cc: alex.bennee, david

Clang has added some builtins for these operations;
use them if available.
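
As a sanity check of the semantics, which hold whether the builtin
or the open-coded fallback is taken, a small illustrative test
(assuming the header is on the include path):

    #include <assert.h>
    #include "qemu/host-utils.h"

    static void check_revbit(void)
    {
        /* The msb moves to the lsb, and vice versa. */
        assert(revbit8(0x80) == 0x01);
        assert(revbit16(0x0001) == 0x8000);
        assert(revbit32(0x80000000u) == 0x00000001u);
        /* Reversing twice is the identity. */
        assert(revbit64(revbit64(0x123456789abcdef0ull))
               == 0x123456789abcdef0ull);
    }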

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/host-utils.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index cdca2991d8..f1e52851e0 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -272,6 +272,9 @@ static inline int ctpop64(uint64_t val)
  */
 static inline uint8_t revbit8(uint8_t x)
 {
+#if __has_builtin(__builtin_bitreverse8)
+    return __builtin_bitreverse8(x);
+#else
     /* Assign the correct nibble position.  */
     x = ((x & 0xf0) >> 4)
       | ((x & 0x0f) << 4);
@@ -281,6 +284,7 @@ static inline uint8_t revbit8(uint8_t x)
       | ((x & 0x22) << 1)
       | ((x & 0x11) << 3);
     return x;
+#endif
 }
 
 /**
@@ -289,6 +293,9 @@ static inline uint8_t revbit8(uint8_t x)
  */
 static inline uint16_t revbit16(uint16_t x)
 {
+#if __has_builtin(__builtin_bitreverse16)
+    return __builtin_bitreverse16(x);
+#else
     /* Assign the correct byte position.  */
     x = bswap16(x);
     /* Assign the correct nibble position.  */
@@ -300,6 +307,7 @@ static inline uint16_t revbit16(uint16_t x)
       | ((x & 0x2222) << 1)
       | ((x & 0x1111) << 3);
     return x;
+#endif
 }
 
 /**
@@ -308,6 +316,9 @@ static inline uint16_t revbit16(uint16_t x)
  */
 static inline uint32_t revbit32(uint32_t x)
 {
+#if __has_builtin(__builtin_bitreverse32)
+    return __builtin_bitreverse32(x);
+#else
     /* Assign the correct byte position.  */
     x = bswap32(x);
     /* Assign the correct nibble position.  */
@@ -319,6 +330,7 @@ static inline uint32_t revbit32(uint32_t x)
       | ((x & 0x22222222u) << 1)
       | ((x & 0x11111111u) << 3);
     return x;
+#endif
 }
 
 /**
@@ -327,6 +339,9 @@ static inline uint32_t revbit32(uint32_t x)
  */
 static inline uint64_t revbit64(uint64_t x)
 {
+#if __has_builtin(__builtin_bitreverse64)
+    return __builtin_bitreverse64(x);
+#else
     /* Assign the correct byte position.  */
     x = bswap64(x);
     /* Assign the correct nibble position.  */
@@ -338,6 +353,7 @@ static inline uint64_t revbit64(uint64_t x)
       | ((x & 0x2222222222222222ull) << 1)
       | ((x & 0x1111111111111111ull) << 3);
     return x;
+#endif
 }
 
 /* Host type specific sizes of these routines.  */
-- 
2.25.1




* [PATCH 02/72] qemu/host-utils: Add wrappers for overflow builtins
From: Richard Henderson @ 2021-05-08  1:46 UTC
  To: qemu-devel; +Cc: alex.bennee, david

These builtins came in with gcc 5 and clang 3.8, which are slightly
newer than our supported minimum compiler versions, so fallback
implementations are provided for older compilers.
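
As a usage illustration, this is the shape of the saturating-add
pattern that a later patch in this series applies in
tcg-runtime-gvec.c (minimal sketch, not the patch itself):

    #include <stdint.h>
    #include "qemu/host-utils.h"

    static int32_t saturating_sadd32(int32_t a, int32_t b)
    {
        int32_t r;
        if (sadd32_overflow(a, b, &r)) {
            /* On overflow the truncated result has the wrong sign,
             * so saturate toward the opposite extreme. */
            r = (r < 0 ? INT32_MAX : INT32_MIN);
        }
        return r;
    }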

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/host-utils.h | 225 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 225 insertions(+)

diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index f1e52851e0..fd76f0cbd3 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -356,6 +356,231 @@ static inline uint64_t revbit64(uint64_t x)
 #endif
 }
 
+/**
+ * sadd32_overflow - addition with overflow indication
+ * @x, @y: addends
+ * @ret: Output for sum
+ *
+ * Computes *@ret = @x + @y, and returns true if and only if that
+ * value has been truncated.
+ */
+static inline bool sadd32_overflow(int32_t x, int32_t y, int32_t *ret)
+{
+#if __has_builtin(__builtin_add_overflow) || __GNUC__ >= 5
+    return __builtin_add_overflow(x, y, ret);
+#else
+    *ret = x + y;
+    return ((*ret ^ x) & ~(x ^ y)) < 0;
+#endif
+}
+
+/**
+ * sadd64_overflow - addition with overflow indication
+ * @x, @y: addends
+ * @ret: Output for sum
+ *
+ * Computes *@ret = @x + @y, and returns true if and only if that
+ * value has been truncated.
+ */
+static inline bool sadd64_overflow(int64_t x, int64_t y, int64_t *ret)
+{
+#if __has_builtin(__builtin_add_overflow) || __GNUC__ >= 5
+    return __builtin_add_overflow(x, y, ret);
+#else
+    *ret = x + y;
+    return ((*ret ^ x) & ~(x ^ y)) < 0;
+#endif
+}
+
+/**
+ * uadd32_overflow - addition with overflow indication
+ * @x, @y: addends
+ * @ret: Output for sum
+ *
+ * Computes *@ret = @x + @y, and returns true if and only if that
+ * value has been truncated.
+ */
+static inline bool uadd32_overflow(uint32_t x, uint32_t y, uint32_t *ret)
+{
+#if __has_builtin(__builtin_add_overflow) || __GNUC__ >= 5
+    return __builtin_add_overflow(x, y, ret);
+#else
+    *ret = x + y;
+    return *ret < x;
+#endif
+}
+
+/**
+ * uadd64_overflow - addition with overflow indication
+ * @x, @y: addends
+ * @ret: Output for sum
+ *
+ * Computes *@ret = @x + @y, and returns true if and only if that
+ * value has been truncated.
+ */
+static inline bool uadd64_overflow(uint64_t x, uint64_t y, uint64_t *ret)
+{
+#if __has_builtin(__builtin_add_overflow) || __GNUC__ >= 5
+    return __builtin_add_overflow(x, y, ret);
+#else
+    *ret = x + y;
+    return *ret < x;
+#endif
+}
+
+/**
+ * ssub32_overflow - subtraction with overflow indication
+ * @x: Minuend
+ * @y: Subtrahend
+ * @ret: Output for difference
+ *
+ * Computes *@ret = @x - @y, and returns true if and only if that
+ * value has been truncated.
+ */
+static inline bool ssub32_overflow(int32_t x, int32_t y, int32_t *ret)
+{
+#if __has_builtin(__builtin_sub_overflow) || __GNUC__ >= 5
+    return __builtin_sub_overflow(x, y, ret);
+#else
+    *ret = x - y;
+    return ((*ret ^ x) & (x ^ y)) < 0;
+#endif
+}
+
+/**
+ * ssub64_overflow - subtraction with overflow indication
+ * @x: Minuend
+ * @y: Subtrahend
+ * @ret: Output for difference
+ *
+ * Computes *@ret = @x - @y, and returns true if and only if that
+ * value has been truncated.
+ */
+static inline bool ssub64_overflow(int64_t x, int64_t y, int64_t *ret)
+{
+#if __has_builtin(__builtin_sub_overflow) || __GNUC__ >= 5
+    return __builtin_sub_overflow(x, y, ret);
+#else
+    *ret = x - y;
+    return ((*ret ^ x) & (x ^ y)) < 0;
+#endif
+}
+
+/**
+ * usub32_overflow - subtraction with overflow indication
+ * @x: Minuend
+ * @y: Subtrahend
+ * @ret: Output for difference
+ *
+ * Computes *@ret = @x - @y, and returns true if and only if that
+ * value has been truncated.
+ */
+static inline bool usub32_overflow(uint32_t x, uint32_t y, uint32_t *ret)
+{
+#if __has_builtin(__builtin_sub_overflow) || __GNUC__ >= 5
+    return __builtin_sub_overflow(x, y, ret);
+#else
+    *ret = x - y;
+    return x < y;
+#endif
+}
+
+/**
+ * usub64_overflow - subtraction with overflow indication
+ * @x: Minuend
+ * @y: Subtrahend
+ * @ret: Output for difference
+ *
+ * Computes *@ret = @x - @y, and returns true if and only if that
+ * value has been truncated.
+ */
+static inline bool usub64_overflow(uint64_t x, uint64_t y, uint64_t *ret)
+{
+#if __has_builtin(__builtin_sub_overflow) || __GNUC__ >= 5
+    return __builtin_sub_overflow(x, y, ret);
+#else
+    *ret = x - y;
+    return x < y;
+#endif
+}
+
+/**
+ * smul32_overflow - multiplication with overflow indication
+ * @x, @y: Input multipliers
+ * @ret: Output for product
+ *
+ * Computes *@ret = @x * @y, and returns true if and only if that
+ * value has been truncated.
+ */
+static inline bool smul32_overflow(int32_t x, int32_t y, int32_t *ret)
+{
+#if __has_builtin(__builtin_mul_overflow) || __GNUC__ >= 5
+    return __builtin_mul_overflow(x, y, ret);
+#else
+    int64_t z = (int64_t)x * y;
+    *ret = z;
+    return *ret != z;
+#endif
+}
+
+/**
+ * smul64_overflow - multiplication with overflow indication
+ * @x, @y: Input multipliers
+ * @ret: Output for product
+ *
+ * Computes *@ret = @x * @y, and returns true if and only if that
+ * value has been truncated.
+ */
+static inline bool smul64_overflow(int64_t x, int64_t y, int64_t *ret)
+{
+#if __has_builtin(__builtin_mul_overflow) || __GNUC__ >= 5
+    return __builtin_mul_overflow(x, y, ret);
+#else
+    uint64_t hi, lo;
+    muls64(&lo, &hi, x, y);
+    *ret = lo;
+    return hi != ((int64_t)lo >> 63);
+#endif
+}
+
+/**
+ * umul32_overflow - multiplication with overflow indication
+ * @x, @y: Input multipliers
+ * @ret: Output for product
+ *
+ * Computes *@ret = @x * @y, and returns true if and only if that
+ * value has been truncated.
+ */
+static inline bool umul32_overflow(uint32_t x, uint32_t y, uint32_t *ret)
+{
+#if __has_builtin(__builtin_mul_overflow) || __GNUC__ >= 5
+    return __builtin_mul_overflow(x, y, ret);
+#else
+    uint64_t z = (uint64_t)x * y;
+    *ret = z;
+    return z > UINT32_MAX;
+#endif
+}
+
+/**
+ * umul64_overflow - multiplication with overflow indication
+ * @x, @y: Input multipliers
+ * @ret: Output for product
+ *
+ * Computes *@ret = @x * @y, and returns true if and only if that
+ * value has been truncated.
+ */
+static inline bool umul64_overflow(uint64_t x, uint64_t y, uint64_t *ret)
+{
+#if __has_builtin(__builtin_mul_overflow) || __GNUC__ >= 5
+    return __builtin_mul_overflow(x, y, ret);
+#else
+    uint64_t hi;
+    mulu64(ret, &hi, x, y);
+    return hi != 0;
+#endif
+}
+
 /* Host type specific sizes of these routines.  */
 
 #if ULONG_MAX == UINT32_MAX
-- 
2.25.1




* [PATCH 03/72] qemu/host-utils: Add wrappers for carry builtins
From: Richard Henderson @ 2021-05-08  1:46 UTC
  To: qemu-devel; +Cc: alex.bennee, david

These builtins came in with clang 3.8, but are not present in gcc
through version 11.  Even in clang the code generation is not ideal
except on x86_64, but it is no worse than the hand-coding that we
currently do.
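
For context, the carry wrappers compose directly into multi-word
arithmetic, which is how a later patch uses them in
softfloat-macros.h.  A minimal sketch of a 128-bit add built on
uadd64_carry (hypothetical helper name):

    #include <stdbool.h>
    #include <stdint.h>
    #include "qemu/host-utils.h"

    /* (*hi:*lo) += (ahi:alo); the final carry-out is discarded. */
    static void add128_sketch(uint64_t *hi, uint64_t *lo,
                              uint64_t ahi, uint64_t alo)
    {
        bool carry = false;                    /* no carry-in */
        *lo = uadd64_carry(*lo, alo, &carry);
        *hi = uadd64_carry(*hi, ahi, &carry);  /* carry propagates up */
    }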

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/host-utils.h | 50 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index fd76f0cbd3..2ea8b3000b 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -26,6 +26,7 @@
 #ifndef HOST_UTILS_H
 #define HOST_UTILS_H
 
+#include "qemu/compiler.h"
 #include "qemu/bswap.h"
 
 #ifdef CONFIG_INT128
@@ -581,6 +582,55 @@ static inline bool umul64_overflow(uint64_t x, uint64_t y, uint64_t *ret)
 #endif
 }
 
+/**
+ * uadd64_carry - addition with carry-in and carry-out
+ * @x, @y: addends
+ * @pcarry: in-out carry value
+ *
+ * Computes @x + @y + *@pcarry, placing the carry-out back
+ * into *@pcarry and returning the 64-bit sum.
+ */
+static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
+{
+#if __has_builtin(__builtin_addcll)
+    unsigned long long c = *pcarry;
+    x = __builtin_addcll(x, y, c, &c);
+    *pcarry = c & 1;
+    return x;
+#else
+    bool c = *pcarry;
+    /* This is clang's internal expansion of __builtin_addc. */
+    c = uadd64_overflow(x, c, &x);
+    c |= uadd64_overflow(x, y, &x);
+    *pcarry = c;
+    return x;
+#endif
+}
+
+/**
+ * usub64_borrow - subtraction with borrow-in and borrow-out
+ * @x: minuend, @y: subtrahend
+ * @pborrow: in-out borrow value
+ *
+ * Computes @x - @y - *@pborrow, placing the borrow-out back
+ * into *@pborrow and returning the 64-bit difference.
+ */
+static inline uint64_t usub64_borrow(uint64_t x, uint64_t y, bool *pborrow)
+{
+#if __has_builtin(__builtin_subcll)
+    unsigned long long b = *pborrow;
+    x = __builtin_subcll(x, y, b, &b);
+    *pborrow = b & 1;
+    return x;
+#else
+    bool b = *pborrow;
+    b = usub64_overflow(x, b, &x);
+    b |= usub64_overflow(x, y, &x);
+    *pborrow = b;
+    return x;
+#endif
+}
+
 /* Host type specific sizes of these routines.  */
 
 #if ULONG_MAX == UINT32_MAX
-- 
2.25.1




* [PATCH 04/72] accel/tcg: Use add/sub overflow routines in tcg-runtime-gvec.c
From: Richard Henderson @ 2021-05-08  1:46 UTC
  To: qemu-devel; +Cc: alex.bennee, david

Obvious uses of the new functions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/tcg-runtime-gvec.c | 36 ++++++++++++++++--------------------
 1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
index 521da4a813..ac7d28c251 100644
--- a/accel/tcg/tcg-runtime-gvec.c
+++ b/accel/tcg/tcg-runtime-gvec.c
@@ -1073,9 +1073,8 @@ void HELPER(gvec_ssadd32)(void *d, void *a, void *b, uint32_t desc)
     for (i = 0; i < oprsz; i += sizeof(int32_t)) {
         int32_t ai = *(int32_t *)(a + i);
         int32_t bi = *(int32_t *)(b + i);
-        int32_t di = ai + bi;
-        if (((di ^ ai) &~ (ai ^ bi)) < 0) {
-            /* Signed overflow.  */
+        int32_t di;
+        if (sadd32_overflow(ai, bi, &di)) {
             di = (di < 0 ? INT32_MAX : INT32_MIN);
         }
         *(int32_t *)(d + i) = di;
@@ -1091,9 +1090,8 @@ void HELPER(gvec_ssadd64)(void *d, void *a, void *b, uint32_t desc)
     for (i = 0; i < oprsz; i += sizeof(int64_t)) {
         int64_t ai = *(int64_t *)(a + i);
         int64_t bi = *(int64_t *)(b + i);
-        int64_t di = ai + bi;
-        if (((di ^ ai) &~ (ai ^ bi)) < 0) {
-            /* Signed overflow.  */
+        int64_t di;
+        if (sadd64_overflow(ai, bi, &di)) {
             di = (di < 0 ? INT64_MAX : INT64_MIN);
         }
         *(int64_t *)(d + i) = di;
@@ -1143,9 +1141,8 @@ void HELPER(gvec_sssub32)(void *d, void *a, void *b, uint32_t desc)
     for (i = 0; i < oprsz; i += sizeof(int32_t)) {
         int32_t ai = *(int32_t *)(a + i);
         int32_t bi = *(int32_t *)(b + i);
-        int32_t di = ai - bi;
-        if (((di ^ ai) & (ai ^ bi)) < 0) {
-            /* Signed overflow.  */
+        int32_t di;
+        if (ssub32_overflow(ai, bi, &di)) {
             di = (di < 0 ? INT32_MAX : INT32_MIN);
         }
         *(int32_t *)(d + i) = di;
@@ -1161,9 +1158,8 @@ void HELPER(gvec_sssub64)(void *d, void *a, void *b, uint32_t desc)
     for (i = 0; i < oprsz; i += sizeof(int64_t)) {
         int64_t ai = *(int64_t *)(a + i);
         int64_t bi = *(int64_t *)(b + i);
-        int64_t di = ai - bi;
-        if (((di ^ ai) & (ai ^ bi)) < 0) {
-            /* Signed overflow.  */
+        int64_t di;
+        if (ssub64_overflow(ai, bi, &di)) {
             di = (di < 0 ? INT64_MAX : INT64_MIN);
         }
         *(int64_t *)(d + i) = di;
@@ -1209,8 +1205,8 @@ void HELPER(gvec_usadd32)(void *d, void *a, void *b, uint32_t desc)
     for (i = 0; i < oprsz; i += sizeof(uint32_t)) {
         uint32_t ai = *(uint32_t *)(a + i);
         uint32_t bi = *(uint32_t *)(b + i);
-        uint32_t di = ai + bi;
-        if (di < ai) {
+        uint32_t di;
+        if (uadd32_overflow(ai, bi, &di)) {
             di = UINT32_MAX;
         }
         *(uint32_t *)(d + i) = di;
@@ -1226,8 +1222,8 @@ void HELPER(gvec_usadd64)(void *d, void *a, void *b, uint32_t desc)
     for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
         uint64_t ai = *(uint64_t *)(a + i);
         uint64_t bi = *(uint64_t *)(b + i);
-        uint64_t di = ai + bi;
-        if (di < ai) {
+        uint64_t di;
+        if (uadd64_overflow(ai, bi, &di)) {
             di = UINT64_MAX;
         }
         *(uint64_t *)(d + i) = di;
@@ -1273,8 +1269,8 @@ void HELPER(gvec_ussub32)(void *d, void *a, void *b, uint32_t desc)
     for (i = 0; i < oprsz; i += sizeof(uint32_t)) {
         uint32_t ai = *(uint32_t *)(a + i);
         uint32_t bi = *(uint32_t *)(b + i);
-        uint32_t di = ai - bi;
-        if (ai < bi) {
+        uint32_t di;
+        if (usub32_overflow(ai, bi, &di)) {
             di = 0;
         }
         *(uint32_t *)(d + i) = di;
@@ -1290,8 +1286,8 @@ void HELPER(gvec_ussub64)(void *d, void *a, void *b, uint32_t desc)
     for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
         uint64_t ai = *(uint64_t *)(a + i);
         uint64_t bi = *(uint64_t *)(b + i);
-        uint64_t di = ai - bi;
-        if (ai < bi) {
+        uint64_t di;
+        if (usub64_overflow(ai, bi, &di)) {
             di = 0;
         }
         *(uint64_t *)(d + i) = di;
-- 
2.25.1




* [PATCH 05/72] tests/fp: add quad support to the benchmark utility
From: Richard Henderson @ 2021-05-08  1:46 UTC
  To: qemu-devel; +Cc: alex.bennee, david

From: Alex Bennée <alex.bennee@linaro.org>

Currently this only supports softfloat calculations, because working
out whether the hardware supports 128-bit floats needs configure
magic.  The three-operand muladd operation is as yet unimplemented,
so it is commented out for now.
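
With this applied, the quad path can be exercised with e.g.
"fp-bench -o add -p quad" (softfloat only, as noted above; the
binary is built under tests/fp in the build tree).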

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20201020163738.27700-8-alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tests/fp/fp-bench.c | 88 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 83 insertions(+), 5 deletions(-)

diff --git a/tests/fp/fp-bench.c b/tests/fp/fp-bench.c
index 4ba5e1d2d4..d319993280 100644
--- a/tests/fp/fp-bench.c
+++ b/tests/fp/fp-bench.c
@@ -14,6 +14,7 @@
 #include <math.h>
 #include <fenv.h>
 #include "qemu/timer.h"
+#include "qemu/int128.h"
 #include "fpu/softfloat.h"
 
 /* amortize the computation of random inputs */
@@ -50,8 +51,10 @@ static const char * const op_names[] = {
 enum precision {
     PREC_SINGLE,
     PREC_DOUBLE,
+    PREC_QUAD,
     PREC_FLOAT32,
     PREC_FLOAT64,
+    PREC_FLOAT128,
     PREC_MAX_NR,
 };
 
@@ -89,6 +92,7 @@ union fp {
     double d;
     float32 f32;
     float64 f64;
+    float128 f128;
     uint64_t u64;
 };
 
@@ -113,6 +117,10 @@ struct op_desc {
 static uint64_t random_ops[MAX_OPERANDS] = {
     SEED_A, SEED_B, SEED_C,
 };
+
+static float128 random_quad_ops[MAX_OPERANDS] = {
+    {SEED_A, SEED_B}, {SEED_B, SEED_C}, {SEED_C, SEED_A},
+};
 static float_status soft_status;
 static enum precision precision;
 static enum op operation;
@@ -141,25 +149,45 @@ static void update_random_ops(int n_ops, enum precision prec)
     int i;
 
     for (i = 0; i < n_ops; i++) {
-        uint64_t r = random_ops[i];
 
         switch (prec) {
         case PREC_SINGLE:
         case PREC_FLOAT32:
+        {
+            uint64_t r = random_ops[i];
             do {
                 r = xorshift64star(r);
             } while (!float32_is_normal(r));
+            random_ops[i] = r;
             break;
+        }
         case PREC_DOUBLE:
         case PREC_FLOAT64:
+        {
+            uint64_t r = random_ops[i];
             do {
                 r = xorshift64star(r);
             } while (!float64_is_normal(r));
+            random_ops[i] = r;
             break;
+        }
+        case PREC_QUAD:
+        case PREC_FLOAT128:
+        {
+            float128 r = random_quad_ops[i];
+            uint64_t hi = r.high;
+            uint64_t lo = r.low;
+            do {
+                hi = xorshift64star(hi);
+                lo = xorshift64star(lo);
+                r = make_float128(hi, lo);
+            } while (!float128_is_normal(r));
+            random_quad_ops[i] = r;
+            break;
+        }
         default:
             g_assert_not_reached();
         }
-        random_ops[i] = r;
     }
 }
 
@@ -184,6 +212,13 @@ static void fill_random(union fp *ops, int n_ops, enum precision prec,
                 ops[i].f64 = float64_chs(ops[i].f64);
             }
             break;
+        case PREC_QUAD:
+        case PREC_FLOAT128:
+            ops[i].f128 = random_quad_ops[i];
+            if (no_neg && float128_is_neg(ops[i].f128)) {
+                ops[i].f128 = float128_chs(ops[i].f128);
+            }
+            break;
         default:
             g_assert_not_reached();
         }
@@ -345,6 +380,41 @@ static void bench(enum precision prec, enum op op, int n_ops, bool no_neg)
                 }
             }
             break;
+        case PREC_FLOAT128:
+            fill_random(ops, n_ops, prec, no_neg);
+            t0 = get_clock();
+            for (i = 0; i < OPS_PER_ITER; i++) {
+                float128 a = ops[0].f128;
+                float128 b = ops[1].f128;
+                /* float128 c = ops[2].f128; */
+
+                switch (op) {
+                case OP_ADD:
+                    res.f128 = float128_add(a, b, &soft_status);
+                    break;
+                case OP_SUB:
+                    res.f128 = float128_sub(a, b, &soft_status);
+                    break;
+                case OP_MUL:
+                    res.f128 = float128_mul(a, b, &soft_status);
+                    break;
+                case OP_DIV:
+                    res.f128 = float128_div(a, b, &soft_status);
+                    break;
+                /* case OP_FMA: */
+                /*     res.f128 = float128_muladd(a, b, c, 0, &soft_status); */
+                /*     break; */
+                case OP_SQRT:
+                    res.f128 = float128_sqrt(a, &soft_status);
+                    break;
+                case OP_CMP:
+                    res.u64 = float128_compare_quiet(a, b, &soft_status);
+                    break;
+                default:
+                    g_assert_not_reached();
+                }
+            }
+            break;
         default:
             g_assert_not_reached();
         }
@@ -369,7 +439,8 @@ static void bench(enum precision prec, enum op op, int n_ops, bool no_neg)
     GEN_BENCH(bench_ ## opname ## _float, float, PREC_SINGLE, op, n_ops) \
     GEN_BENCH(bench_ ## opname ## _double, double, PREC_DOUBLE, op, n_ops) \
     GEN_BENCH(bench_ ## opname ## _float32, float32, PREC_FLOAT32, op, n_ops) \
-    GEN_BENCH(bench_ ## opname ## _float64, float64, PREC_FLOAT64, op, n_ops)
+    GEN_BENCH(bench_ ## opname ## _float64, float64, PREC_FLOAT64, op, n_ops) \
+    GEN_BENCH(bench_ ## opname ## _float128, float128, PREC_FLOAT128, op, n_ops)
 
 GEN_BENCH_ALL_TYPES(add, OP_ADD, 2)
 GEN_BENCH_ALL_TYPES(sub, OP_SUB, 2)
@@ -383,7 +454,8 @@ GEN_BENCH_ALL_TYPES(cmp, OP_CMP, 2)
     GEN_BENCH_NO_NEG(bench_ ## name ## _float, float, PREC_SINGLE, op, n) \
     GEN_BENCH_NO_NEG(bench_ ## name ## _double, double, PREC_DOUBLE, op, n) \
     GEN_BENCH_NO_NEG(bench_ ## name ## _float32, float32, PREC_FLOAT32, op, n) \
-    GEN_BENCH_NO_NEG(bench_ ## name ## _float64, float64, PREC_FLOAT64, op, n)
+    GEN_BENCH_NO_NEG(bench_ ## name ## _float64, float64, PREC_FLOAT64, op, n) \
+    GEN_BENCH_NO_NEG(bench_ ## name ## _float128, float128, PREC_FLOAT128, op, n)
 
 GEN_BENCH_ALL_TYPES_NO_NEG(sqrt, OP_SQRT, 1)
 #undef GEN_BENCH_ALL_TYPES_NO_NEG
@@ -397,6 +469,7 @@ GEN_BENCH_ALL_TYPES_NO_NEG(sqrt, OP_SQRT, 1)
         [PREC_DOUBLE]    = bench_ ## opname ## _double,         \
         [PREC_FLOAT32]   = bench_ ## opname ## _float32,        \
         [PREC_FLOAT64]   = bench_ ## opname ## _float64,        \
+        [PREC_FLOAT128]  = bench_ ## opname ## _float128,       \
     }
 
 static const bench_func_t bench_funcs[OP_MAX_NR][PREC_MAX_NR] = {
@@ -445,7 +518,7 @@ static void usage_complete(int argc, char *argv[])
     fprintf(stderr, " -h = show this help message.\n");
     fprintf(stderr, " -o = floating point operation (%s). Default: %s\n",
             op_list, op_names[0]);
-    fprintf(stderr, " -p = floating point precision (single, double). "
+    fprintf(stderr, " -p = floating point precision (single, double, quad[soft only]). "
             "Default: single\n");
     fprintf(stderr, " -r = rounding mode (even, zero, down, up, tieaway). "
             "Default: even\n");
@@ -565,6 +638,8 @@ static void parse_args(int argc, char *argv[])
                 precision = PREC_SINGLE;
             } else if (!strcmp(optarg, "double")) {
                 precision = PREC_DOUBLE;
+            } else if (!strcmp(optarg, "quad")) {
+                precision = PREC_QUAD;
             } else {
                 fprintf(stderr, "Unsupported precision '%s'\n", optarg);
                 exit(EXIT_FAILURE);
@@ -608,6 +683,9 @@ static void parse_args(int argc, char *argv[])
         case PREC_DOUBLE:
             precision = PREC_FLOAT64;
             break;
+        case PREC_QUAD:
+            precision = PREC_FLOAT128;
+            break;
         default:
             g_assert_not_reached();
         }
-- 
2.25.1




* [PATCH 06/72] softfloat: Move the binary point to the msb
From: Richard Henderson @ 2021-05-08  1:46 UTC
  To: qemu-devel; +Cc: alex.bennee, david

Rather than point the binary point at msb-1, put it at the msb.
Use uadd64_overflow to detect when addition overflows instead
of DECOMPOSED_OVERFLOW_BIT.

This reduces the number of special cases within the code, such
as shifting an int64_t either left or right during conversion.
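
To illustrate the new invariant, a hand-worked sketch (not part of
the patch; uadd64_overflow is the wrapper added earlier in this
series, and the macro names below mirror the real ones):

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include "qemu/host-utils.h"

    #define BINARY_POINT 63                  /* new DECOMPOSED_BINARY_POINT */
    #define IMPLICIT_BIT (1ull << BINARY_POINT)

    static void check_layout(void)
    {
        uint64_t one = IMPLICIT_BIT;         /* 1.0: msb set, rest zero */
        uint64_t sum;

        /* 1.5 sets the bit just below the binary point. */
        assert((one | (one >> 1)) == 0xC000000000000000ull);

        /* 1.0 + 1.0 carries out of bit 63.  uadd64_overflow reports
         * the carry that the old layout caught with the spare
         * DECOMPOSED_OVERFLOW_BIT; renormalizing yields 1.0 with
         * the exponent incremented, i.e. 2.0. */
        bool carry = uadd64_overflow(one, one, &sum);
        assert(carry && ((sum >> 1) | IMPLICIT_BIT) == IMPLICIT_BIT);
    }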

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 169 +++++++++++++++++++-----------------------------
 1 file changed, 66 insertions(+), 103 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 67cfa0fd82..cd777743f1 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -503,9 +503,8 @@ typedef struct {
     bool sign;
 } FloatParts;
 
-#define DECOMPOSED_BINARY_POINT    (64 - 2)
+#define DECOMPOSED_BINARY_POINT    63
 #define DECOMPOSED_IMPLICIT_BIT    (1ull << DECOMPOSED_BINARY_POINT)
-#define DECOMPOSED_OVERFLOW_BIT    (DECOMPOSED_IMPLICIT_BIT << 1)
 
 /* Structure holding all of the relevant parameters for a format.
  *   exp_size: the size of the exponent field
@@ -657,7 +656,7 @@ static FloatParts sf_canonicalize(FloatParts part, const FloatFmt *parm,
             part.cls = float_class_zero;
             part.frac = 0;
         } else {
-            int shift = clz64(part.frac) - 1;
+            int shift = clz64(part.frac);
             part.cls = float_class_normal;
             part.exp = parm->frac_shift - parm->exp_bias - shift + 1;
             part.frac <<= shift;
@@ -727,9 +726,8 @@ static FloatParts round_canonical(FloatParts p, float_status *s,
         if (likely(exp > 0)) {
             if (frac & round_mask) {
                 flags |= float_flag_inexact;
-                frac += inc;
-                if (frac & DECOMPOSED_OVERFLOW_BIT) {
-                    frac >>= 1;
+                if (uadd64_overflow(frac, inc, &frac)) {
+                    frac = (frac >> 1) | DECOMPOSED_IMPLICIT_BIT;
                     exp++;
                 }
             }
@@ -758,9 +756,12 @@ static FloatParts round_canonical(FloatParts p, float_status *s,
             p.cls = float_class_zero;
             goto do_zero;
         } else {
-            bool is_tiny = s->tininess_before_rounding
-                        || (exp < 0)
-                        || !((frac + inc) & DECOMPOSED_OVERFLOW_BIT);
+            bool is_tiny = s->tininess_before_rounding || (exp < 0);
+
+            if (!is_tiny) {
+                uint64_t discard;
+                is_tiny = !uadd64_overflow(frac, inc, &discard);
+            }
 
             shift64RightJamming(frac, 1 - exp, &frac);
             if (frac & round_mask) {
@@ -985,7 +986,7 @@ static FloatParts addsub_floats(FloatParts a, FloatParts b, bool subtract,
                 a.cls = float_class_zero;
                 a.sign = s->float_rounding_mode == float_round_down;
             } else {
-                int shift = clz64(a.frac) - 1;
+                int shift = clz64(a.frac);
                 a.frac = a.frac << shift;
                 a.exp = a.exp - shift;
                 a.sign = a_sign;
@@ -1022,9 +1023,10 @@ static FloatParts addsub_floats(FloatParts a, FloatParts b, bool subtract,
                 shift64RightJamming(a.frac, b.exp - a.exp, &a.frac);
                 a.exp = b.exp;
             }
-            a.frac += b.frac;
-            if (a.frac & DECOMPOSED_OVERFLOW_BIT) {
+
+            if (uadd64_overflow(a.frac, b.frac, &a.frac)) {
                 shift64RightJamming(a.frac, 1, &a.frac);
+                a.frac |= DECOMPOSED_IMPLICIT_BIT;
                 a.exp += 1;
             }
             return a;
@@ -1219,16 +1221,17 @@ static FloatParts mul_floats(FloatParts a, FloatParts b, float_status *s)
         int exp = a.exp + b.exp;
 
         mul64To128(a.frac, b.frac, &hi, &lo);
-        shift128RightJamming(hi, lo, DECOMPOSED_BINARY_POINT, &hi, &lo);
-        if (lo & DECOMPOSED_OVERFLOW_BIT) {
-            shift64RightJamming(lo, 1, &lo);
+        if (hi & DECOMPOSED_IMPLICIT_BIT) {
             exp += 1;
+        } else {
+            hi <<= 1;
         }
+        hi |= (lo != 0);
 
         /* Re-use a */
         a.exp = exp;
         a.sign = sign;
-        a.frac = lo;
+        a.frac = hi;
         return a;
     }
     /* handle all the NaN cases */
@@ -1411,56 +1414,41 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
 
     p_exp = a.exp + b.exp;
 
-    /* Multiply of 2 62-bit numbers produces a (2*62) == 124-bit
-     * result.
-     */
     mul64To128(a.frac, b.frac, &hi, &lo);
-    /* binary point now at bit 124 */
 
-    /* check for overflow */
-    if (hi & (1ULL << (DECOMPOSED_BINARY_POINT * 2 + 1 - 64))) {
-        shift128RightJamming(hi, lo, 1, &hi, &lo);
+    /* Renormalize to the msb. */
+    if (hi & DECOMPOSED_IMPLICIT_BIT) {
         p_exp += 1;
+    } else {
+        shortShift128Left(hi, lo, 1, &hi, &lo);
     }
 
     /* + add/sub */
-    if (c.cls == float_class_zero) {
-        /* move binary point back to 62 */
-        shift128RightJamming(hi, lo, DECOMPOSED_BINARY_POINT, &hi, &lo);
-    } else {
+    if (c.cls != float_class_zero) {
         int exp_diff = p_exp - c.exp;
         if (p_sign == c.sign) {
             /* Addition */
             if (exp_diff <= 0) {
-                shift128RightJamming(hi, lo,
-                                     DECOMPOSED_BINARY_POINT - exp_diff,
-                                     &hi, &lo);
-                lo += c.frac;
+                shift64RightJamming(hi, -exp_diff, &hi);
                 p_exp = c.exp;
+                if (uadd64_overflow(hi, c.frac, &hi)) {
+                    shift64RightJamming(hi, 1, &hi);
+                    hi |= DECOMPOSED_IMPLICIT_BIT;
+                    p_exp += 1;
+                }
             } else {
-                uint64_t c_hi, c_lo;
-                /* shift c to the same binary point as the product (124) */
-                c_hi = c.frac >> 2;
-                c_lo = 0;
-                shift128RightJamming(c_hi, c_lo,
-                                     exp_diff,
-                                     &c_hi, &c_lo);
-                add128(hi, lo, c_hi, c_lo, &hi, &lo);
-                /* move binary point back to 62 */
-                shift128RightJamming(hi, lo, DECOMPOSED_BINARY_POINT, &hi, &lo);
+                uint64_t c_hi, c_lo, over;
+                shift128RightJamming(c.frac, 0, exp_diff, &c_hi, &c_lo);
+                add192(0, hi, lo, 0, c_hi, c_lo, &over, &hi, &lo);
+                if (over) {
+                    shift64RightJamming(hi, 1, &hi);
+                    hi |= DECOMPOSED_IMPLICIT_BIT;
+                    p_exp += 1;
+                }
             }
-
-            if (lo & DECOMPOSED_OVERFLOW_BIT) {
-                shift64RightJamming(lo, 1, &lo);
-                p_exp += 1;
-            }
-
         } else {
             /* Subtraction */
-            uint64_t c_hi, c_lo;
-            /* make C binary point match product at bit 124 */
-            c_hi = c.frac >> 2;
-            c_lo = 0;
+            uint64_t c_hi = c.frac, c_lo = 0;
 
             if (exp_diff <= 0) {
                 shift128RightJamming(hi, lo, -exp_diff, &hi, &lo);
@@ -1495,20 +1483,15 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
                 /* Normalizing to a binary point of 124 is the
                    correct adjust for the exponent.  However since we're
                    shifting, we might as well put the binary point back
-                   at 62 where we really want it.  Therefore shift as
+                   at 63 where we really want it.  Therefore shift as
                    if we're leaving 1 bit at the top of the word, but
                    adjust the exponent as if we're leaving 3 bits.  */
-                shift -= 1;
-                if (shift >= 64) {
-                    lo = lo << (shift - 64);
-                } else {
-                    hi = (hi << shift) | (lo >> (64 - shift));
-                    lo = hi | ((lo << shift) != 0);
-                }
-                p_exp -= shift - 2;
+                shift128Left(hi, lo, shift, &hi, &lo);
+                p_exp -= shift;
             }
         }
     }
+    hi |= (lo != 0);
 
     if (flags & float_muladd_halve_result) {
         p_exp -= 1;
@@ -1518,7 +1501,7 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
     a.cls = float_class_normal;
     a.sign = p_sign ^ sign_flip;
     a.exp = p_exp;
-    a.frac = lo;
+    a.frac = hi;
 
     return a;
 }
@@ -1742,25 +1725,17 @@ static FloatParts div_floats(FloatParts a, FloatParts b, float_status *s)
          * exponent to match.
          *
          * The udiv_qrnnd algorithm that we're using requires normalization,
-         * i.e. the msb of the denominator must be set.  Since we know that
-         * DECOMPOSED_BINARY_POINT is msb-1, the inputs must be shifted left
-         * by one (more), and the remainder must be shifted right by one.
+         * i.e. the msb of the denominator must be set, which is already true.
          */
         if (a.frac < b.frac) {
             exp -= 1;
-            shift128Left(0, a.frac, DECOMPOSED_BINARY_POINT + 2, &n1, &n0);
-        } else {
             shift128Left(0, a.frac, DECOMPOSED_BINARY_POINT + 1, &n1, &n0);
+        } else {
+            shift128Left(0, a.frac, DECOMPOSED_BINARY_POINT, &n1, &n0);
         }
-        q = udiv_qrnnd(&r, n1, n0, b.frac << 1);
+        q = udiv_qrnnd(&r, n1, n0, b.frac);
 
-        /*
-         * Set lsb if there is a remainder, to set inexact.
-         * As mentioned above, to find the actual value of the remainder we
-         * would need to shift right, but (1) we are only concerned about
-         * non-zero-ness, and (2) the remainder will always be even because
-         * both inputs to the division primitive are even.
-         */
+        /* Set lsb if there is a remainder, to set inexact. */
         a.frac = q | (r != 0);
         a.sign = sign;
         a.exp = exp;
@@ -2135,12 +2110,12 @@ static FloatParts round_to_int(FloatParts a, FloatRoundMode rmode,
 
             if (a.frac & rnd_mask) {
                 s->float_exception_flags |= float_flag_inexact;
-                a.frac += inc;
-                a.frac &= ~rnd_mask;
-                if (a.frac & DECOMPOSED_OVERFLOW_BIT) {
+                if (uadd64_overflow(a.frac, inc, &a.frac)) {
                     a.frac >>= 1;
+                    a.frac |= DECOMPOSED_IMPLICIT_BIT;
                     a.exp++;
                 }
+                a.frac &= ~rnd_mask;
             }
         }
         break;
@@ -2213,10 +2188,8 @@ static int64_t round_to_int_and_pack(FloatParts in, FloatRoundMode rmode,
     case float_class_zero:
         return 0;
     case float_class_normal:
-        if (p.exp < DECOMPOSED_BINARY_POINT) {
+        if (p.exp <= DECOMPOSED_BINARY_POINT) {
             r = p.frac >> (DECOMPOSED_BINARY_POINT - p.exp);
-        } else if (p.exp - DECOMPOSED_BINARY_POINT < 2) {
-            r = p.frac << (p.exp - DECOMPOSED_BINARY_POINT);
         } else {
             r = UINT64_MAX;
         }
@@ -2498,10 +2471,8 @@ static uint64_t round_to_uint_and_pack(FloatParts in, FloatRoundMode rmode,
             return 0;
         }
 
-        if (p.exp < DECOMPOSED_BINARY_POINT) {
+        if (p.exp <= DECOMPOSED_BINARY_POINT) {
             r = p.frac >> (DECOMPOSED_BINARY_POINT - p.exp);
-        } else if (p.exp - DECOMPOSED_BINARY_POINT < 2) {
-            r = p.frac << (p.exp - DECOMPOSED_BINARY_POINT);
         } else {
             s->float_exception_flags = orig_flags | float_flag_invalid;
             return max;
@@ -2765,11 +2736,11 @@ static FloatParts int_to_float(int64_t a, int scale, float_status *status)
             f = -f;
             r.sign = true;
         }
-        shift = clz64(f) - 1;
+        shift = clz64(f);
         scale = MIN(MAX(scale, -0x10000), 0x10000);
 
         r.exp = DECOMPOSED_BINARY_POINT - shift + scale;
-        r.frac = (shift < 0 ? DECOMPOSED_IMPLICIT_BIT : f << shift);
+        r.frac = f << shift;
     }
 
     return r;
@@ -2920,21 +2891,16 @@ bfloat16 int16_to_bfloat16(int16_t a, float_status *status)
 static FloatParts uint_to_float(uint64_t a, int scale, float_status *status)
 {
     FloatParts r = { .sign = false };
+    int shift;
 
     if (a == 0) {
         r.cls = float_class_zero;
     } else {
         scale = MIN(MAX(scale, -0x10000), 0x10000);
+        shift = clz64(a);
         r.cls = float_class_normal;
-        if ((int64_t)a < 0) {
-            r.exp = DECOMPOSED_BINARY_POINT + 1 + scale;
-            shift64RightJamming(a, 1, &a);
-            r.frac = a;
-        } else {
-            int shift = clz64(a) - 1;
-            r.exp = DECOMPOSED_BINARY_POINT - shift + scale;
-            r.frac = a << shift;
-        }
+        r.exp = DECOMPOSED_BINARY_POINT - shift + scale;
+        r.frac = a << shift;
     }
 
     return r;
@@ -3475,12 +3441,9 @@ static FloatParts sqrt_float(FloatParts a, float_status *s, const FloatFmt *p)
     /* We need two overflow bits at the top. Adding room for that is a
      * right shift. If the exponent is odd, we can discard the low bit
      * by multiplying the fraction by 2; that's a left shift. Combine
-     * those and we shift right if the exponent is even.
+     * those and we shift right by 1 if the exponent is odd, otherwise 2.
      */
-    a_frac = a.frac;
-    if (!(a.exp & 1)) {
-        a_frac >>= 1;
-    }
+    a_frac = a.frac >> (2 - (a.exp & 1));
     a.exp >>= 1;
 
     /* Bit-by-bit computation of sqrt.  */
@@ -3488,10 +3451,10 @@ static FloatParts sqrt_float(FloatParts a, float_status *s, const FloatFmt *p)
     s_frac = 0;
 
     /* Iterate from implicit bit down to the 3 extra bits to compute a
-     * properly rounded result. Remember we've inserted one more bit
-     * at the top, so these positions are one less.
+     * properly rounded result. Remember we've inserted two more bits
+     * at the top, so these positions are two less.
      */
-    bit = DECOMPOSED_BINARY_POINT - 1;
+    bit = DECOMPOSED_BINARY_POINT - 2;
     last_bit = MAX(p->frac_shift - 4, 0);
     do {
         uint64_t q = 1ULL << bit;
@@ -3507,7 +3470,7 @@ static FloatParts sqrt_float(FloatParts a, float_status *s, const FloatFmt *p)
     /* Undo the right shift done above. If there is any remaining
      * fraction, the result is inexact. Set the sticky bit.
      */
-    a.frac = (r_frac << 1) + (a_frac != 0);
+    a.frac = (r_frac << 2) + (a_frac != 0);
 
     return a;
 }
-- 
2.25.1




* [PATCH 07/72] softfloat: Inline float_raise
From: Richard Henderson @ 2021-05-08  1:46 UTC
  To: qemu-devel; +Cc: alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat.h        |  5 ++++-
 fpu/softfloat-specialize.c.inc | 12 ------------
 2 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 78ad5ca738..019c2ec66d 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -100,7 +100,10 @@ typedef enum {
 | Routine to raise any or all of the software IEC/IEEE floating-point
 | exception flags.
 *----------------------------------------------------------------------------*/
-void float_raise(uint8_t flags, float_status *status);
+static inline void float_raise(uint8_t flags, float_status *status)
+{
+    status->float_exception_flags |= flags;
+}
 
 /*----------------------------------------------------------------------------
 | If `a' is denormal and we are in flush-to-zero mode then set the
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index 9ea318f3e2..487b29155c 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -228,18 +228,6 @@ floatx80 floatx80_default_nan(float_status *status)
 const floatx80 floatx80_infinity
     = make_floatx80_init(floatx80_infinity_high, floatx80_infinity_low);
 
-/*----------------------------------------------------------------------------
-| Raises the exceptions specified by `flags'.  Floating-point traps can be
-| defined here if desired.  It is currently not possible for such a trap
-| to substitute a result value.  If traps are not implemented, this routine
-| should be simply `float_exception_flags |= flags;'.
-*----------------------------------------------------------------------------*/
-
-void float_raise(uint8_t flags, float_status *status)
-{
-    status->float_exception_flags |= flags;
-}
-
 /*----------------------------------------------------------------------------
 | Internal canonical NaN format.
 *----------------------------------------------------------------------------*/
-- 
2.25.1




* [PATCH 08/72] softfloat: Use float_raise in more places
From: Richard Henderson @ 2021-05-08  1:46 UTC
  To: qemu-devel; +Cc: alex.bennee, david

We have been somewhat inconsistent about when to use float_raise
and when to OR in the flag bit by hand.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 88 +++++++++++++++++++++++++------------------------
 1 file changed, 44 insertions(+), 44 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index cd777743f1..93fe785809 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -132,7 +132,7 @@ this code that are retained.
         if (unlikely(soft_t ## _is_denormal(*a))) {                     \
             *a = soft_t ## _set_sign(soft_t ## _zero,                   \
                                      soft_t ## _is_neg(*a));            \
-            s->float_exception_flags |= float_flag_input_denormal;      \
+            float_raise(float_flag_input_denormal, s);                  \
         }                                                               \
     }
 
@@ -360,7 +360,7 @@ float32_gen2(float32 xa, float32 xb, float_status *s,
 
     ur.h = hard(ua.h, ub.h);
     if (unlikely(f32_is_inf(ur))) {
-        s->float_exception_flags |= float_flag_overflow;
+        float_raise(float_flag_overflow, s);
     } else if (unlikely(fabsf(ur.h) <= FLT_MIN) && post(ua, ub)) {
         goto soft;
     }
@@ -391,7 +391,7 @@ float64_gen2(float64 xa, float64 xb, float_status *s,
 
     ur.h = hard(ua.h, ub.h);
     if (unlikely(f64_is_inf(ur))) {
-        s->float_exception_flags |= float_flag_overflow;
+        float_raise(float_flag_overflow, s);
     } else if (unlikely(fabs(ur.h) <= DBL_MIN) && post(ua, ub)) {
         goto soft;
     }
@@ -880,7 +880,7 @@ static FloatParts return_nan(FloatParts a, float_status *s)
 {
     switch (a.cls) {
     case float_class_snan:
-        s->float_exception_flags |= float_flag_invalid;
+        float_raise(float_flag_invalid, s);
         a = parts_silence_nan(a, s);
         /* fall through */
     case float_class_qnan:
@@ -898,7 +898,7 @@ static FloatParts return_nan(FloatParts a, float_status *s)
 static FloatParts pick_nan(FloatParts a, FloatParts b, float_status *s)
 {
     if (is_snan(a.cls) || is_snan(b.cls)) {
-        s->float_exception_flags |= float_flag_invalid;
+        float_raise(float_flag_invalid, s);
     }
 
     if (s->default_nan_mode) {
@@ -922,7 +922,7 @@ static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
     int which;
 
     if (is_snan(a.cls) || is_snan(b.cls) || is_snan(c.cls)) {
-        s->float_exception_flags |= float_flag_invalid;
+        float_raise(float_flag_invalid, s);
     }
 
     which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);
@@ -1241,7 +1241,7 @@ static FloatParts mul_floats(FloatParts a, FloatParts b, float_status *s)
     /* Inf * Zero == NaN */
     if ((a.cls == float_class_inf && b.cls == float_class_zero) ||
         (a.cls == float_class_zero && b.cls == float_class_inf)) {
-        s->float_exception_flags |= float_flag_invalid;
+        float_raise(float_flag_invalid, s);
         return parts_default_nan(s);
     }
     /* Multiply by 0 or Inf */
@@ -1356,6 +1356,6 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
     }
 
     if (inf_zero) {
-        s->float_exception_flags |= float_flag_invalid;
+        float_raise(float_flag_invalid, s);
         return parts_default_nan(s);
     }
@@ -1380,7 +1381,7 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
 
     if (c.cls == float_class_inf) {
         if (p_class == float_class_inf && p_sign != c.sign) {
-            s->float_exception_flags |= float_flag_invalid;
+            float_raise(float_flag_invalid, s);
             return parts_default_nan(s);
         } else {
             a.cls = float_class_inf;
@@ -1598,7 +1599,7 @@ float32_muladd(float32 xa, float32 xb, float32 xc, int flags, float_status *s)
         ur.h = fmaf(ua.h, ub.h, uc.h);
 
         if (unlikely(f32_is_inf(ur))) {
-            s->float_exception_flags |= float_flag_overflow;
+            float_raise(float_flag_overflow, s);
         } else if (unlikely(fabsf(ur.h) <= FLT_MIN)) {
             ua = ua_orig;
             uc = uc_orig;
@@ -1669,7 +1670,7 @@ float64_muladd(float64 xa, float64 xb, float64 xc, int flags, float_status *s)
         ur.h = fma(ua.h, ub.h, uc.h);
 
         if (unlikely(f64_is_inf(ur))) {
-            s->float_exception_flags |= float_flag_overflow;
+            float_raise(float_flag_overflow, s);
         } else if (unlikely(fabs(ur.h) <= FLT_MIN)) {
             ua = ua_orig;
             uc = uc_orig;
@@ -1749,7 +1750,7 @@ static FloatParts div_floats(FloatParts a, FloatParts b, float_status *s)
     if (a.cls == b.cls
         &&
         (a.cls == float_class_inf || a.cls == float_class_zero)) {
-        s->float_exception_flags |= float_flag_invalid;
+        float_raise(float_flag_invalid, s);
         return parts_default_nan(s);
     }
     /* Inf / x or 0 / x */
@@ -1759,7 +1760,7 @@ static FloatParts div_floats(FloatParts a, FloatParts b, float_status *s)
     }
     /* Div 0 => Inf */
     if (b.cls == float_class_zero) {
-        s->float_exception_flags |= float_flag_divbyzero;
+        float_raise(float_flag_divbyzero, s);
         a.cls = float_class_inf;
         a.sign = sign;
         return a;
@@ -1895,7 +1896,7 @@ static FloatParts float_to_float(FloatParts a, const FloatFmt *dstf,
             /* There is no NaN in the destination format.  Raise Invalid
              * and return a zero with the sign of the input NaN.
              */
-            s->float_exception_flags |= float_flag_invalid;
+            float_raise(float_flag_invalid, s);
             a.cls = float_class_zero;
             a.frac = 0;
             a.exp = 0;
@@ -1905,7 +1906,7 @@ static FloatParts float_to_float(FloatParts a, const FloatFmt *dstf,
             /* There is no Inf in the destination format.  Raise Invalid
              * and return the maximum normal with the correct sign.
              */
-            s->float_exception_flags |= float_flag_invalid;
+            float_raise(float_flag_invalid, s);
             a.cls = float_class_normal;
             a.exp = dstf->exp_max;
             a.frac = ((1ull << dstf->frac_size) - 1) << dstf->frac_shift;
@@ -1916,7 +1917,7 @@ static FloatParts float_to_float(FloatParts a, const FloatFmt *dstf,
         }
     } else if (is_nan(a.cls)) {
         if (is_snan(a.cls)) {
-            s->float_exception_flags |= float_flag_invalid;
+            float_raise(float_flag_invalid, s);
             a = parts_silence_nan(a, s);
         }
         if (s->default_nan_mode) {
@@ -2048,7 +2049,7 @@ static FloatParts round_to_int(FloatParts a, FloatRoundMode rmode,
         if (a.exp < 0) {
             bool one;
             /* all fractional */
-            s->float_exception_flags |= float_flag_inexact;
+            float_raise(float_flag_inexact, s);
             switch (rmode) {
             case float_round_nearest_even:
                 one = a.exp == -1 && a.frac > DECOMPOSED_IMPLICIT_BIT;
@@ -2109,7 +2110,7 @@ static FloatParts round_to_int(FloatParts a, FloatRoundMode rmode,
             }
 
             if (a.frac & rnd_mask) {
-                s->float_exception_flags |= float_flag_inexact;
+                float_raise(float_flag_inexact, s);
                 if (uadd64_overflow(a.frac, inc, &a.frac)) {
                     a.frac >>= 1;
                     a.frac |= DECOMPOSED_IMPLICIT_BIT;
@@ -3188,7 +3189,7 @@ static FloatRelation compare_floats(FloatParts a, FloatParts b, bool is_quiet,
         if (!is_quiet ||
             a.cls == float_class_snan ||
             b.cls == float_class_snan) {
-            s->float_exception_flags |= float_flag_invalid;
+            float_raise(float_flag_invalid, s);
         }
         return float_relation_unordered;
     }
@@ -3429,7 +3430,7 @@ static FloatParts sqrt_float(FloatParts a, float_status *s, const FloatFmt *p)
         return a;  /* sqrt(+-0) = +-0 */
     }
     if (a.sign) {
-        s->float_exception_flags |= float_flag_invalid;
+        float_raise(float_flag_invalid, s);
         return parts_default_nan(s);
     }
     if (a.cls == float_class_inf) {
@@ -3760,7 +3761,7 @@ static int32_t roundAndPackInt32(bool zSign, uint64_t absZ,
         return zSign ? INT32_MIN : INT32_MAX;
     }
     if (roundBits) {
-        status->float_exception_flags |= float_flag_inexact;
+        float_raise(float_flag_inexact, status);
     }
     return z;
 
@@ -3822,7 +3823,7 @@ static int64_t roundAndPackInt64(bool zSign, uint64_t absZ0, uint64_t absZ1,
         return zSign ? INT64_MIN : INT64_MAX;
     }
     if (absZ1) {
-        status->float_exception_flags |= float_flag_inexact;
+        float_raise(float_flag_inexact, status);
     }
     return z;
 
@@ -3883,7 +3884,7 @@ static int64_t roundAndPackUint64(bool zSign, uint64_t absZ0,
     }
 
     if (absZ1) {
-        status->float_exception_flags |= float_flag_inexact;
+        float_raise(float_flag_inexact, status);
     }
     return absZ0;
 }
@@ -3994,7 +3995,7 @@ static float32 roundAndPackFloat32(bool zSign, int zExp, uint32_t zSig,
         }
     }
     if (roundBits) {
-        status->float_exception_flags |= float_flag_inexact;
+        float_raise(float_flag_inexact, status);
     }
     zSig = ( zSig + roundIncrement )>>7;
     if (!(roundBits ^ 0x40) && roundNearestEven) {
@@ -4150,7 +4151,7 @@ static float64 roundAndPackFloat64(bool zSign, int zExp, uint64_t zSig,
         }
     }
     if (roundBits) {
-        status->float_exception_flags |= float_flag_inexact;
+        float_raise(float_flag_inexact, status);
     }
     zSig = ( zSig + roundIncrement )>>10;
     if (!(roundBits ^ 0x200) && roundNearestEven) {
@@ -4284,7 +4285,7 @@ floatx80 roundAndPackFloatx80(int8_t roundingPrecision, bool zSign,
                 float_raise(float_flag_underflow, status);
             }
             if (roundBits) {
-                status->float_exception_flags |= float_flag_inexact;
+                float_raise(float_flag_inexact, status);
             }
             zSig0 += roundIncrement;
             if ( (int64_t) zSig0 < 0 ) zExp = 1;
@@ -4297,7 +4298,7 @@ floatx80 roundAndPackFloatx80(int8_t roundingPrecision, bool zSign,
         }
     }
     if (roundBits) {
-        status->float_exception_flags |= float_flag_inexact;
+        float_raise(float_flag_inexact, status);
     }
     zSig0 += roundIncrement;
     if ( zSig0 < roundIncrement ) {
@@ -4360,7 +4361,7 @@ floatx80 roundAndPackFloatx80(int8_t roundingPrecision, bool zSign,
                 float_raise(float_flag_underflow, status);
             }
             if (zSig1) {
-                status->float_exception_flags |= float_flag_inexact;
+                float_raise(float_flag_inexact, status);
             }
             switch (roundingMode) {
             case float_round_nearest_even:
@@ -4390,7 +4391,7 @@ floatx80 roundAndPackFloatx80(int8_t roundingPrecision, bool zSign,
         }
     }
     if (zSig1) {
-        status->float_exception_flags |= float_flag_inexact;
+        float_raise(float_flag_inexact, status);
     }
     if ( increment ) {
         ++zSig0;
@@ -4667,7 +4668,7 @@ static float128 roundAndPackFloat128(bool zSign, int32_t zExp,
         }
     }
     if (zSig2) {
-        status->float_exception_flags |= float_flag_inexact;
+        float_raise(float_flag_inexact, status);
     }
     if ( increment ) {
         add128( zSig0, zSig1, 0, 1, &zSig0, &zSig1 );
@@ -5405,7 +5406,7 @@ int32_t floatx80_to_int32_round_to_zero(floatx80 a, float_status *status)
     }
     else if ( aExp < 0x3FFF ) {
         if (aExp || aSig) {
-            status->float_exception_flags |= float_flag_inexact;
+            float_raise(float_flag_inexact, status);
         }
         return 0;
     }
@@ -5420,7 +5421,7 @@ int32_t floatx80_to_int32_round_to_zero(floatx80 a, float_status *status)
         return aSign ? (int32_t) 0x80000000 : 0x7FFFFFFF;
     }
     if ( ( aSig<<shiftCount ) != savedASig ) {
-        status->float_exception_flags |= float_flag_inexact;
+        float_raise(float_flag_inexact, status);
     }
     return z;
 
@@ -5504,13 +5505,13 @@ int64_t floatx80_to_int64_round_to_zero(floatx80 a, float_status *status)
     }
     else if ( aExp < 0x3FFF ) {
         if (aExp | aSig) {
-            status->float_exception_flags |= float_flag_inexact;
+            float_raise(float_flag_inexact, status);
         }
         return 0;
     }
     z = aSig>>( - shiftCount );
     if ( (uint64_t) ( aSig<<( shiftCount & 63 ) ) ) {
-        status->float_exception_flags |= float_flag_inexact;
+        float_raise(float_flag_inexact, status);
     }
     if ( aSign ) z = - z;
     return z;
@@ -5661,7 +5662,7 @@ floatx80 floatx80_round_to_int(floatx80 a, float_status *status)
              && ( (uint64_t) ( extractFloatx80Frac( a ) ) == 0 ) ) {
             return a;
         }
-        status->float_exception_flags |= float_flag_inexact;
+        float_raise(float_flag_inexact, status);
         aSign = extractFloatx80Sign( a );
         switch (status->float_rounding_mode) {
          case float_round_nearest_even:
@@ -5728,7 +5729,7 @@ floatx80 floatx80_round_to_int(floatx80 a, float_status *status)
         z.low = UINT64_C(0x8000000000000000);
     }
     if (z.low != a.low) {
-        status->float_exception_flags |= float_flag_inexact;
+        float_raise(float_flag_inexact, status);
     }
     return z;
 
@@ -6364,7 +6365,7 @@ int32_t float128_to_int32_round_to_zero(float128 a, float_status *status)
     }
     else if ( aExp < 0x3FFF ) {
         if (aExp || aSig0) {
-            status->float_exception_flags |= float_flag_inexact;
+            float_raise(float_flag_inexact, status);
         }
         return 0;
     }
@@ -6380,7 +6381,7 @@ int32_t float128_to_int32_round_to_zero(float128 a, float_status *status)
         return aSign ? INT32_MIN : INT32_MAX;
     }
     if ( ( aSig0<<shiftCount ) != savedASig ) {
-        status->float_exception_flags |= float_flag_inexact;
+        float_raise(float_flag_inexact, status);
     }
     return z;
 
@@ -6458,7 +6459,7 @@ int64_t float128_to_int64_round_to_zero(float128 a, float_status *status)
             if (    ( a.high == UINT64_C(0xC03E000000000000) )
                  && ( aSig1 < UINT64_C(0x0002000000000000) ) ) {
                 if (aSig1) {
-                    status->float_exception_flags |= float_flag_inexact;
+                    float_raise(float_flag_inexact, status);
                 }
             }
             else {
@@ -6471,20 +6472,20 @@ int64_t float128_to_int64_round_to_zero(float128 a, float_status *status)
         }
         z = ( aSig0<<shiftCount ) | ( aSig1>>( ( - shiftCount ) & 63 ) );
         if ( (uint64_t) ( aSig1<<shiftCount ) ) {
-            status->float_exception_flags |= float_flag_inexact;
+            float_raise(float_flag_inexact, status);
         }
     }
     else {
         if ( aExp < 0x3FFF ) {
             if ( aExp | aSig0 | aSig1 ) {
-                status->float_exception_flags |= float_flag_inexact;
+                float_raise(float_flag_inexact, status);
             }
             return 0;
         }
         z = aSig0>>( - shiftCount );
         if (    aSig1
              || ( shiftCount && (uint64_t) ( aSig0<<( shiftCount & 63 ) ) ) ) {
-            status->float_exception_flags |= float_flag_inexact;
+            float_raise(float_flag_inexact, status);
         }
     }
     if ( aSign ) z = - z;
@@ -6793,7 +6794,7 @@ float128 float128_round_to_int(float128 a, float_status *status)
     else {
         if ( aExp < 0x3FFF ) {
             if ( ( ( (uint64_t) ( a.high<<1 ) ) | a.low ) == 0 ) return a;
-            status->float_exception_flags |= float_flag_inexact;
+            float_raise(float_flag_inexact, status);
             aSign = extractFloat128Sign( a );
             switch (status->float_rounding_mode) {
             case float_round_nearest_even:
@@ -6867,7 +6868,7 @@ float128 float128_round_to_int(float128 a, float_status *status)
         z.high &= ~ roundBitsMask;
     }
     if ( ( z.low != a.low ) || ( z.high != a.high ) ) {
-        status->float_exception_flags |= float_flag_inexact;
+        float_raise(float_flag_inexact, status);
     }
     return z;
 
-- 
2.25.1

* [PATCH 09/72] softfloat: Tidy a * b + inf return
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (7 preceding siblings ...)
  2021-05-08  1:46 ` [PATCH 08/72] softfloat: Use float_raise in more places Richard Henderson
@ 2021-05-08  1:46 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 10/72] softfloat: Add float_cmask and constants Richard Henderson
                   ` (67 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, Philippe Mathieu-Daudé, david

There is no reason to set values in 'a' when we already
have float_class_inf in 'c' and can simply flip that sign.
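
As a standalone illustration (toy fields, not the QEMU structures),
the negate-result case only needs an XOR into the sign of the
already-infinite addend:

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy stand-in for the FloatParts fields used by the patch. */
    struct parts { bool sign; };

    int main(void)
    {
        struct parts c = { .sign = false };  /* +inf addend */
        bool sign_flip = true;  /* float_muladd_negate_result requested */

        c.sign ^= sign_flip;    /* flip in place; no need to rebuild 'a' */
        printf("inf sign: %c\n", c.sign ? '-' : '+');
        return 0;
    }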

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 93fe785809..ee4b5073b6 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1384,9 +1384,8 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
             float_raise(float_flag_invalid, s);
             return parts_default_nan(s);
         } else {
-            a.cls = float_class_inf;
-            a.sign = c.sign ^ sign_flip;
-            return a;
+            c.sign ^= sign_flip;
+            return c;
         }
     }
 
-- 
2.25.1

* [PATCH 10/72] softfloat: Add float_cmask and constants
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (8 preceding siblings ...)
  2021-05-08  1:46 ` [PATCH 09/72] softfloat: Tidy a * b + inf return Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 11/72] softfloat: Use return_nan in float_to_float Richard Henderson
                   ` (66 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Testing more than one class at a time is better done with masks.
This reduces the static branch count.
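
A self-contained sketch of the idea (toy enum and masks, not the
QEMU definitions), where testing a set of classes becomes a single
compare or AND; in particular one AND replaces the three is_nan()
calls in muladd:

    #include <stdbool.h>
    #include <stdio.h>

    typedef enum { CLS_ZERO, CLS_NORMAL, CLS_INF, CLS_QNAN, CLS_SNAN } Cls;

    #define CMASK(c)       (1u << (c))
    #define CMASK_INFZERO  (CMASK(CLS_ZERO) | CMASK(CLS_INF))
    #define CMASK_ANYNAN   (CMASK(CLS_QNAN) | CMASK(CLS_SNAN))

    int main(void)
    {
        Cls a = CLS_INF, b = CLS_ZERO, c = CLS_QNAN;
        unsigned ab_mask = CMASK(a) | CMASK(b);
        unsigned abc_mask = ab_mask | CMASK(c);

        /* True only when one operand is inf and the other zero. */
        bool inf_zero = ab_mask == CMASK_INFZERO;
        /* True when any of the three operands is a NaN. */
        bool any_nan = abc_mask & CMASK_ANYNAN;

        printf("inf_zero=%d any_nan=%d\n", inf_zero, any_nan);
        return 0;
    }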

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index ee4b5073b6..64edb23793 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -469,6 +469,20 @@ typedef enum __attribute__ ((__packed__)) {
     float_class_snan,
 } FloatClass;
 
+#define float_cmask(bit)  (1u << (bit))
+
+enum {
+    float_cmask_zero    = float_cmask(float_class_zero),
+    float_cmask_normal  = float_cmask(float_class_normal),
+    float_cmask_inf     = float_cmask(float_class_inf),
+    float_cmask_qnan    = float_cmask(float_class_qnan),
+    float_cmask_snan    = float_cmask(float_class_snan),
+
+    float_cmask_infzero = float_cmask_zero | float_cmask_inf,
+    float_cmask_anynan  = float_cmask_qnan | float_cmask_snan,
+};
+
+
 /* Simple helpers for checking if, or what kind of, NaN we have */
 static inline __attribute__((unused)) bool is_nan(FloatClass c)
 {
@@ -1338,26 +1352,28 @@ bfloat16 QEMU_FLATTEN bfloat16_mul(bfloat16 a, bfloat16 b, float_status *status)
 static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
                                 int flags, float_status *s)
 {
-    bool inf_zero = ((1 << a.cls) | (1 << b.cls)) ==
-                    ((1 << float_class_inf) | (1 << float_class_zero));
-    bool p_sign;
+    bool inf_zero, p_sign;
     bool sign_flip = flags & float_muladd_negate_result;
     FloatClass p_class;
     uint64_t hi, lo;
     int p_exp;
+    int ab_mask, abc_mask;
+
+    ab_mask = float_cmask(a.cls) | float_cmask(b.cls);
+    abc_mask = float_cmask(c.cls) | ab_mask;
+    inf_zero = ab_mask == float_cmask_infzero;
 
     /* It is implementation-defined whether the cases of (0,inf,qnan)
      * and (inf,0,qnan) raise InvalidOperation or not (and what QNaN
      * they return if they do), so we have to hand this information
      * off to the target-specific pick-a-NaN routine.
      */
-    if (is_nan(a.cls) || is_nan(b.cls) || is_nan(c.cls)) {
+    if (unlikely(abc_mask & float_cmask_anynan)) {
         return pick_nan_muladd(a, b, c, inf_zero, s);
     }
 
     if (inf_zero) {
         float_raise(float_flag_invalid, s);
-        s->float_exception_flags |= float_flag_invalid;
         return parts_default_nan(s);
     }
 
@@ -1371,9 +1387,9 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
         p_sign ^= 1;
     }
 
-    if (a.cls == float_class_inf || b.cls == float_class_inf) {
+    if (ab_mask & float_cmask_inf) {
         p_class = float_class_inf;
-    } else if (a.cls == float_class_zero || b.cls == float_class_zero) {
+    } else if (ab_mask & float_cmask_zero) {
         p_class = float_class_zero;
     } else {
         p_class = float_class_normal;
-- 
2.25.1

* [PATCH 11/72] softfloat: Use return_nan in float_to_float
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (9 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 10/72] softfloat: Add float_cmask and constants Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-10 15:10   ` Alex Bennée
  2021-05-11 10:10   ` David Hildenbrand
  2021-05-08  1:47 ` [PATCH 12/72] softfloat: fix return_nan vs default_nan_mode Richard Henderson
                   ` (65 subsequent siblings)
  76 siblings, 2 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 64edb23793..b694e38522 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1931,13 +1931,7 @@ static FloatParts float_to_float(FloatParts a, const FloatFmt *dstf,
             break;
         }
     } else if (is_nan(a.cls)) {
-        if (is_snan(a.cls)) {
-            float_raise(float_flag_invalid, s);
-            a = parts_silence_nan(a, s);
-        }
-        if (s->default_nan_mode) {
-            return parts_default_nan(s);
-        }
+        return return_nan(a, s);
     }
     return a;
 }
-- 
2.25.1

* [PATCH 12/72] softfloat: fix return_nan vs default_nan_mode
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (10 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 11/72] softfloat: Use return_nan in float_to_float Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-10 15:12   ` Alex Bennée
  2021-05-11 10:12   ` David Hildenbrand
  2021-05-08  1:47 ` [PATCH 13/72] target/mips: Set set_default_nan_mode with set_snan_bit_is_one Richard Henderson
                   ` (64 subsequent siblings)
  76 siblings, 2 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Do not call parts_silence_nan when default_nan_mode is in
effect.  This will avoid an assert in a later patch.
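
Restated as a standalone sketch (toy types, not the QEMU ones), the
reworked return_nan() reaches parts_silence_nan() only for an SNaN
with default_nan_mode off:

    #include <stdbool.h>
    #include <stdio.h>

    typedef enum { QNAN, SNAN } NanKind;
    typedef enum { RET_INPUT, RET_SILENCED, RET_DEFAULT } NanResult;

    static NanResult return_nan_sketch(NanKind k, bool default_nan_mode)
    {
        if (k == SNAN) {
            /* float_flag_invalid is raised here in the real code. */
            if (!default_nan_mode) {
                return RET_SILENCED;   /* parts_silence_nan(a, s) */
            }
        } else if (!default_nan_mode) {
            return RET_INPUT;          /* propagate the QNaN unchanged */
        }
        return RET_DEFAULT;            /* parts_default_nan(s) */
    }

    int main(void)
    {
        printf("snan/dnm=1 -> %d\n", return_nan_sketch(SNAN, true));
        printf("snan/dnm=0 -> %d\n", return_nan_sketch(SNAN, false));
        printf("qnan/dnm=1 -> %d\n", return_nan_sketch(QNAN, true));
        return 0;
    }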

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index b694e38522..6589f00b23 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -892,21 +892,16 @@ static float64 float64_round_pack_canonical(FloatParts p, float_status *s)
 
 static FloatParts return_nan(FloatParts a, float_status *s)
 {
-    switch (a.cls) {
-    case float_class_snan:
+    g_assert(is_nan(a.cls));
+    if (is_snan(a.cls)) {
         float_raise(float_flag_invalid, s);
-        a = parts_silence_nan(a, s);
-        /* fall through */
-    case float_class_qnan:
-        if (s->default_nan_mode) {
-            return parts_default_nan(s);
+        if (!s->default_nan_mode) {
+            return parts_silence_nan(a, s);
         }
-        break;
-
-    default:
-        g_assert_not_reached();
+    } else if (!s->default_nan_mode) {
+        return a;
     }
-    return a;
+    return parts_default_nan(s);
 }
 
 static FloatParts pick_nan(FloatParts a, FloatParts b, float_status *s)
-- 
2.25.1

* [PATCH 13/72] target/mips: Set set_default_nan_mode with set_snan_bit_is_one
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (11 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 12/72] softfloat: fix return_nan vs default_nan_mode Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-11  9:37   ` Alex Bennée
  2021-05-11 10:16   ` David Hildenbrand
  2021-05-08  1:47 ` [PATCH 14/72] softfloat: Do not produce a default_nan from parts_silence_nan Richard Henderson
                   ` (63 subsequent siblings)
  76 siblings, 2 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

This behavior is currently hard-coded in parts_silence_nan,
but setting default_nan_mode properly will allow that
hard-coding to be cleaned up.
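
Spelled out, the single FCR31.NAN2008 bit now drives both softfloat
controls; a minimal standalone model (hypothetical struct, mirroring
the patched helper below):

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical stand-in for the two float_status fields. */
    struct status_sketch { bool snan_bit_is_one; bool default_nan_mode; };

    static void restore_snan_bit_mode_sketch(bool nan2008,
                                             struct status_sketch *st)
    {
        st->snan_bit_is_one = !nan2008;
        st->default_nan_mode = !nan2008;
    }

    int main(void)
    {
        struct status_sketch st;

        restore_snan_bit_mode_sketch(true, &st);   /* FCR31.NAN2008 = 1 */
        printf("nan2008: snan_bit_is_one=%d default_nan_mode=%d\n",
               st.snan_bit_is_one, st.default_nan_mode);

        restore_snan_bit_mode_sketch(false, &st);  /* legacy behaviour */
        printf("legacy : snan_bit_is_one=%d default_nan_mode=%d\n",
               st.snan_bit_is_one, st.default_nan_mode);
        return 0;
    }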

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/mips/fpu_helper.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/target/mips/fpu_helper.h b/target/mips/fpu_helper.h
index 1c2d6d35a7..ad1116e8c1 100644
--- a/target/mips/fpu_helper.h
+++ b/target/mips/fpu_helper.h
@@ -27,8 +27,14 @@ static inline void restore_flush_mode(CPUMIPSState *env)
 
 static inline void restore_snan_bit_mode(CPUMIPSState *env)
 {
-    set_snan_bit_is_one((env->active_fpu.fcr31 & (1 << FCR31_NAN2008)) == 0,
-                        &env->active_fpu.fp_status);
+    bool nan2008 = env->active_fpu.fcr31 & (1 << FCR31_NAN2008);
+
+    /*
+     * With nan2008, SNaNs are silenced in the usual way.
+     * Before that, SNaNs are not silenced; default nans are produced.
+     */
+    set_snan_bit_is_one(!nan2008, &env->active_fpu.fp_status);
+    set_default_nan_mode(!nan2008, &env->active_fpu.fp_status);
 }
 
 static inline void restore_fp_status(CPUMIPSState *env)
-- 
2.25.1

* [PATCH 14/72] softfloat: Do not produce a default_nan from parts_silence_nan
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (12 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 13/72] target/mips: Set set_default_nan_mode with set_snan_bit_is_one Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-11 10:16   ` David Hildenbrand
  2021-05-11 10:32   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 15/72] softfloat: Rename FloatParts to FloatParts64 Richard Henderson
                   ` (62 subsequent siblings)
  76 siblings, 2 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Require default_nan_mode to be set instead.
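
For reference, the two silencing conventions that remain in the
rewritten parts_silence_nan, applied to a toy 64-bit fraction
(standalone sketch; DECOMPOSED_BINARY_POINT is 63 as in the patched
softfloat.c):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    #define DECOMPOSED_BINARY_POINT 63

    int main(void)
    {
        uint64_t frac = 0x0123456789abcdefULL;  /* arbitrary payload */

        /* snan_bit_is_one (HPPA-style): clear the top fraction bit,
         * set the next one down so the value remains a NaN. */
        uint64_t hppa = frac;
        hppa &= ~(1ULL << (DECOMPOSED_BINARY_POINT - 1));
        hppa |= 1ULL << (DECOMPOSED_BINARY_POINT - 2);

        /* snan_bit_is_one clear (IEEE 754-2008 style): set the top bit. */
        uint64_t ieee = frac | (1ULL << (DECOMPOSED_BINARY_POINT - 1));

        printf("in   %016" PRIx64 "\n", frac);
        printf("hppa %016" PRIx64 "\n", hppa);
        printf("ieee %016" PRIx64 "\n", ieee);
        return 0;
    }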

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat-specialize.c.inc | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index 487b29155c..5988830c16 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -180,16 +180,15 @@ static FloatParts parts_default_nan(float_status *status)
 static FloatParts parts_silence_nan(FloatParts a, float_status *status)
 {
     g_assert(!no_signaling_nans(status));
-#if defined(TARGET_HPPA)
-    a.frac &= ~(1ULL << (DECOMPOSED_BINARY_POINT - 1));
-    a.frac |= 1ULL << (DECOMPOSED_BINARY_POINT - 2);
-#else
+    g_assert(!status->default_nan_mode);
+
+    /* The only snan_bit_is_one target without default_nan_mode is HPPA. */
     if (snan_bit_is_one(status)) {
-        return parts_default_nan(status);
+        a.frac &= ~(1ULL << (DECOMPOSED_BINARY_POINT - 1));
+        a.frac |= 1ULL << (DECOMPOSED_BINARY_POINT - 2);
     } else {
         a.frac |= 1ULL << (DECOMPOSED_BINARY_POINT - 1);
     }
-#endif
     a.cls = float_class_qnan;
     return a;
 }
-- 
2.25.1

* [PATCH 15/72] softfloat: Rename FloatParts to FloatParts64
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (13 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 14/72] softfloat: Do not produce a default_nan from parts_silence_nan Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-09  8:45   ` Philippe Mathieu-Daudé
  2021-05-11 10:16   ` David Hildenbrand
  2021-05-08  1:47 ` [PATCH 16/72] softfloat: Move type-specific pack/unpack routines Richard Henderson
                   ` (61 subsequent siblings)
  76 siblings, 2 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c                | 362 ++++++++++++++++-----------------
 fpu/softfloat-specialize.c.inc |   6 +-
 2 files changed, 184 insertions(+), 184 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 6589f00b23..27b51659c9 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -515,7 +515,7 @@ typedef struct {
     int32_t  exp;
     FloatClass cls;
     bool sign;
-} FloatParts;
+} FloatParts64;
 
 #define DECOMPOSED_BINARY_POINT    63
 #define DECOMPOSED_IMPLICIT_BIT    (1ull << DECOMPOSED_BINARY_POINT)
@@ -580,11 +580,11 @@ static const FloatFmt float64_params = {
 };
 
 /* Unpack a float to parts, but do not canonicalize.  */
-static inline FloatParts unpack_raw(FloatFmt fmt, uint64_t raw)
+static inline FloatParts64 unpack_raw(FloatFmt fmt, uint64_t raw)
 {
     const int sign_pos = fmt.frac_size + fmt.exp_size;
 
-    return (FloatParts) {
+    return (FloatParts64) {
         .cls = float_class_unclassified,
         .sign = extract64(raw, sign_pos, 1),
         .exp = extract64(raw, fmt.frac_size, fmt.exp_size),
@@ -592,50 +592,50 @@ static inline FloatParts unpack_raw(FloatFmt fmt, uint64_t raw)
     };
 }
 
-static inline FloatParts float16_unpack_raw(float16 f)
+static inline FloatParts64 float16_unpack_raw(float16 f)
 {
     return unpack_raw(float16_params, f);
 }
 
-static inline FloatParts bfloat16_unpack_raw(bfloat16 f)
+static inline FloatParts64 bfloat16_unpack_raw(bfloat16 f)
 {
     return unpack_raw(bfloat16_params, f);
 }
 
-static inline FloatParts float32_unpack_raw(float32 f)
+static inline FloatParts64 float32_unpack_raw(float32 f)
 {
     return unpack_raw(float32_params, f);
 }
 
-static inline FloatParts float64_unpack_raw(float64 f)
+static inline FloatParts64 float64_unpack_raw(float64 f)
 {
     return unpack_raw(float64_params, f);
 }
 
 /* Pack a float from parts, but do not canonicalize.  */
-static inline uint64_t pack_raw(FloatFmt fmt, FloatParts p)
+static inline uint64_t pack_raw(FloatFmt fmt, FloatParts64 p)
 {
     const int sign_pos = fmt.frac_size + fmt.exp_size;
     uint64_t ret = deposit64(p.frac, fmt.frac_size, fmt.exp_size, p.exp);
     return deposit64(ret, sign_pos, 1, p.sign);
 }
 
-static inline float16 float16_pack_raw(FloatParts p)
+static inline float16 float16_pack_raw(FloatParts64 p)
 {
     return make_float16(pack_raw(float16_params, p));
 }
 
-static inline bfloat16 bfloat16_pack_raw(FloatParts p)
+static inline bfloat16 bfloat16_pack_raw(FloatParts64 p)
 {
     return pack_raw(bfloat16_params, p);
 }
 
-static inline float32 float32_pack_raw(FloatParts p)
+static inline float32 float32_pack_raw(FloatParts64 p)
 {
     return make_float32(pack_raw(float32_params, p));
 }
 
-static inline float64 float64_pack_raw(FloatParts p)
+static inline float64 float64_pack_raw(FloatParts64 p)
 {
     return make_float64(pack_raw(float64_params, p));
 }
@@ -651,7 +651,7 @@ static inline float64 float64_pack_raw(FloatParts p)
 #include "softfloat-specialize.c.inc"
 
 /* Canonicalize EXP and FRAC, setting CLS.  */
-static FloatParts sf_canonicalize(FloatParts part, const FloatFmt *parm,
+static FloatParts64 sf_canonicalize(FloatParts64 part, const FloatFmt *parm,
                                   float_status *status)
 {
     if (part.exp == parm->exp_max && !parm->arm_althp) {
@@ -689,7 +689,7 @@ static FloatParts sf_canonicalize(FloatParts part, const FloatFmt *parm,
  * by EXP_BIAS and must be bounded by [EXP_MAX-1, 0].
  */
 
-static FloatParts round_canonical(FloatParts p, float_status *s,
+static FloatParts64 round_canonical(FloatParts64 p, float_status *s,
                                   const FloatFmt *parm)
 {
     const uint64_t frac_lsb = parm->frac_lsb;
@@ -838,59 +838,59 @@ static FloatParts round_canonical(FloatParts p, float_status *s,
 }
 
 /* Explicit FloatFmt version */
-static FloatParts float16a_unpack_canonical(float16 f, float_status *s,
+static FloatParts64 float16a_unpack_canonical(float16 f, float_status *s,
                                             const FloatFmt *params)
 {
     return sf_canonicalize(float16_unpack_raw(f), params, s);
 }
 
-static FloatParts float16_unpack_canonical(float16 f, float_status *s)
+static FloatParts64 float16_unpack_canonical(float16 f, float_status *s)
 {
     return float16a_unpack_canonical(f, s, &float16_params);
 }
 
-static FloatParts bfloat16_unpack_canonical(bfloat16 f, float_status *s)
+static FloatParts64 bfloat16_unpack_canonical(bfloat16 f, float_status *s)
 {
     return sf_canonicalize(bfloat16_unpack_raw(f), &bfloat16_params, s);
 }
 
-static float16 float16a_round_pack_canonical(FloatParts p, float_status *s,
+static float16 float16a_round_pack_canonical(FloatParts64 p, float_status *s,
                                              const FloatFmt *params)
 {
     return float16_pack_raw(round_canonical(p, s, params));
 }
 
-static float16 float16_round_pack_canonical(FloatParts p, float_status *s)
+static float16 float16_round_pack_canonical(FloatParts64 p, float_status *s)
 {
     return float16a_round_pack_canonical(p, s, &float16_params);
 }
 
-static bfloat16 bfloat16_round_pack_canonical(FloatParts p, float_status *s)
+static bfloat16 bfloat16_round_pack_canonical(FloatParts64 p, float_status *s)
 {
     return bfloat16_pack_raw(round_canonical(p, s, &bfloat16_params));
 }
 
-static FloatParts float32_unpack_canonical(float32 f, float_status *s)
+static FloatParts64 float32_unpack_canonical(float32 f, float_status *s)
 {
     return sf_canonicalize(float32_unpack_raw(f), &float32_params, s);
 }
 
-static float32 float32_round_pack_canonical(FloatParts p, float_status *s)
+static float32 float32_round_pack_canonical(FloatParts64 p, float_status *s)
 {
     return float32_pack_raw(round_canonical(p, s, &float32_params));
 }
 
-static FloatParts float64_unpack_canonical(float64 f, float_status *s)
+static FloatParts64 float64_unpack_canonical(float64 f, float_status *s)
 {
     return sf_canonicalize(float64_unpack_raw(f), &float64_params, s);
 }
 
-static float64 float64_round_pack_canonical(FloatParts p, float_status *s)
+static float64 float64_round_pack_canonical(FloatParts64 p, float_status *s)
 {
     return float64_pack_raw(round_canonical(p, s, &float64_params));
 }
 
-static FloatParts return_nan(FloatParts a, float_status *s)
+static FloatParts64 return_nan(FloatParts64 a, float_status *s)
 {
     g_assert(is_nan(a.cls));
     if (is_snan(a.cls)) {
@@ -904,7 +904,7 @@ static FloatParts return_nan(FloatParts a, float_status *s)
     return parts_default_nan(s);
 }
 
-static FloatParts pick_nan(FloatParts a, FloatParts b, float_status *s)
+static FloatParts64 pick_nan(FloatParts64 a, FloatParts64 b, float_status *s)
 {
     if (is_snan(a.cls) || is_snan(b.cls)) {
         float_raise(float_flag_invalid, s);
@@ -925,7 +925,7 @@ static FloatParts pick_nan(FloatParts a, FloatParts b, float_status *s)
     return a;
 }
 
-static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
+static FloatParts64 pick_nan_muladd(FloatParts64 a, FloatParts64 b, FloatParts64 c,
                                   bool inf_zero, float_status *s)
 {
     int which;
@@ -971,7 +971,7 @@ static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
  * Arithmetic.
  */
 
-static FloatParts addsub_floats(FloatParts a, FloatParts b, bool subtract,
+static FloatParts64 addsub_floats(FloatParts64 a, FloatParts64 b, bool subtract,
                                 float_status *s)
 {
     bool a_sign = a.sign;
@@ -1062,18 +1062,18 @@ static FloatParts addsub_floats(FloatParts a, FloatParts b, bool subtract,
 
 float16 QEMU_FLATTEN float16_add(float16 a, float16 b, float_status *status)
 {
-    FloatParts pa = float16_unpack_canonical(a, status);
-    FloatParts pb = float16_unpack_canonical(b, status);
-    FloatParts pr = addsub_floats(pa, pb, false, status);
+    FloatParts64 pa = float16_unpack_canonical(a, status);
+    FloatParts64 pb = float16_unpack_canonical(b, status);
+    FloatParts64 pr = addsub_floats(pa, pb, false, status);
 
     return float16_round_pack_canonical(pr, status);
 }
 
 float16 QEMU_FLATTEN float16_sub(float16 a, float16 b, float_status *status)
 {
-    FloatParts pa = float16_unpack_canonical(a, status);
-    FloatParts pb = float16_unpack_canonical(b, status);
-    FloatParts pr = addsub_floats(pa, pb, true, status);
+    FloatParts64 pa = float16_unpack_canonical(a, status);
+    FloatParts64 pb = float16_unpack_canonical(b, status);
+    FloatParts64 pr = addsub_floats(pa, pb, true, status);
 
     return float16_round_pack_canonical(pr, status);
 }
@@ -1081,9 +1081,9 @@ float16 QEMU_FLATTEN float16_sub(float16 a, float16 b, float_status *status)
 static float32 QEMU_SOFTFLOAT_ATTR
 soft_f32_addsub(float32 a, float32 b, bool subtract, float_status *status)
 {
-    FloatParts pa = float32_unpack_canonical(a, status);
-    FloatParts pb = float32_unpack_canonical(b, status);
-    FloatParts pr = addsub_floats(pa, pb, subtract, status);
+    FloatParts64 pa = float32_unpack_canonical(a, status);
+    FloatParts64 pb = float32_unpack_canonical(b, status);
+    FloatParts64 pr = addsub_floats(pa, pb, subtract, status);
 
     return float32_round_pack_canonical(pr, status);
 }
@@ -1101,9 +1101,9 @@ static inline float32 soft_f32_sub(float32 a, float32 b, float_status *status)
 static float64 QEMU_SOFTFLOAT_ATTR
 soft_f64_addsub(float64 a, float64 b, bool subtract, float_status *status)
 {
-    FloatParts pa = float64_unpack_canonical(a, status);
-    FloatParts pb = float64_unpack_canonical(b, status);
-    FloatParts pr = addsub_floats(pa, pb, subtract, status);
+    FloatParts64 pa = float64_unpack_canonical(a, status);
+    FloatParts64 pb = float64_unpack_canonical(b, status);
+    FloatParts64 pr = addsub_floats(pa, pb, subtract, status);
 
     return float64_round_pack_canonical(pr, status);
 }
@@ -1199,18 +1199,18 @@ float64_sub(float64 a, float64 b, float_status *s)
  */
 bfloat16 QEMU_FLATTEN bfloat16_add(bfloat16 a, bfloat16 b, float_status *status)
 {
-    FloatParts pa = bfloat16_unpack_canonical(a, status);
-    FloatParts pb = bfloat16_unpack_canonical(b, status);
-    FloatParts pr = addsub_floats(pa, pb, false, status);
+    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
+    FloatParts64 pb = bfloat16_unpack_canonical(b, status);
+    FloatParts64 pr = addsub_floats(pa, pb, false, status);
 
     return bfloat16_round_pack_canonical(pr, status);
 }
 
 bfloat16 QEMU_FLATTEN bfloat16_sub(bfloat16 a, bfloat16 b, float_status *status)
 {
-    FloatParts pa = bfloat16_unpack_canonical(a, status);
-    FloatParts pb = bfloat16_unpack_canonical(b, status);
-    FloatParts pr = addsub_floats(pa, pb, true, status);
+    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
+    FloatParts64 pb = bfloat16_unpack_canonical(b, status);
+    FloatParts64 pr = addsub_floats(pa, pb, true, status);
 
     return bfloat16_round_pack_canonical(pr, status);
 }
@@ -1221,7 +1221,7 @@ bfloat16 QEMU_FLATTEN bfloat16_sub(bfloat16 a, bfloat16 b, float_status *status)
  * for Binary Floating-Point Arithmetic.
  */
 
-static FloatParts mul_floats(FloatParts a, FloatParts b, float_status *s)
+static FloatParts64 mul_floats(FloatParts64 a, FloatParts64 b, float_status *s)
 {
     bool sign = a.sign ^ b.sign;
 
@@ -1267,9 +1267,9 @@ static FloatParts mul_floats(FloatParts a, FloatParts b, float_status *s)
 
 float16 QEMU_FLATTEN float16_mul(float16 a, float16 b, float_status *status)
 {
-    FloatParts pa = float16_unpack_canonical(a, status);
-    FloatParts pb = float16_unpack_canonical(b, status);
-    FloatParts pr = mul_floats(pa, pb, status);
+    FloatParts64 pa = float16_unpack_canonical(a, status);
+    FloatParts64 pb = float16_unpack_canonical(b, status);
+    FloatParts64 pr = mul_floats(pa, pb, status);
 
     return float16_round_pack_canonical(pr, status);
 }
@@ -1277,9 +1277,9 @@ float16 QEMU_FLATTEN float16_mul(float16 a, float16 b, float_status *status)
 static float32 QEMU_SOFTFLOAT_ATTR
 soft_f32_mul(float32 a, float32 b, float_status *status)
 {
-    FloatParts pa = float32_unpack_canonical(a, status);
-    FloatParts pb = float32_unpack_canonical(b, status);
-    FloatParts pr = mul_floats(pa, pb, status);
+    FloatParts64 pa = float32_unpack_canonical(a, status);
+    FloatParts64 pb = float32_unpack_canonical(b, status);
+    FloatParts64 pr = mul_floats(pa, pb, status);
 
     return float32_round_pack_canonical(pr, status);
 }
@@ -1287,9 +1287,9 @@ soft_f32_mul(float32 a, float32 b, float_status *status)
 static float64 QEMU_SOFTFLOAT_ATTR
 soft_f64_mul(float64 a, float64 b, float_status *status)
 {
-    FloatParts pa = float64_unpack_canonical(a, status);
-    FloatParts pb = float64_unpack_canonical(b, status);
-    FloatParts pr = mul_floats(pa, pb, status);
+    FloatParts64 pa = float64_unpack_canonical(a, status);
+    FloatParts64 pb = float64_unpack_canonical(b, status);
+    FloatParts64 pr = mul_floats(pa, pb, status);
 
     return float64_round_pack_canonical(pr, status);
 }
@@ -1325,9 +1325,9 @@ float64_mul(float64 a, float64 b, float_status *s)
 
 bfloat16 QEMU_FLATTEN bfloat16_mul(bfloat16 a, bfloat16 b, float_status *status)
 {
-    FloatParts pa = bfloat16_unpack_canonical(a, status);
-    FloatParts pb = bfloat16_unpack_canonical(b, status);
-    FloatParts pr = mul_floats(pa, pb, status);
+    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
+    FloatParts64 pb = bfloat16_unpack_canonical(b, status);
+    FloatParts64 pr = mul_floats(pa, pb, status);
 
     return bfloat16_round_pack_canonical(pr, status);
 }
@@ -1344,7 +1344,7 @@ bfloat16 QEMU_FLATTEN bfloat16_mul(bfloat16 a, bfloat16 b, float_status *status)
  * NaNs.)
  */
 
-static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
+static FloatParts64 muladd_floats(FloatParts64 a, FloatParts64 b, FloatParts64 c,
                                 int flags, float_status *s)
 {
     bool inf_zero, p_sign;
@@ -1520,10 +1520,10 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
 float16 QEMU_FLATTEN float16_muladd(float16 a, float16 b, float16 c,
                                                 int flags, float_status *status)
 {
-    FloatParts pa = float16_unpack_canonical(a, status);
-    FloatParts pb = float16_unpack_canonical(b, status);
-    FloatParts pc = float16_unpack_canonical(c, status);
-    FloatParts pr = muladd_floats(pa, pb, pc, flags, status);
+    FloatParts64 pa = float16_unpack_canonical(a, status);
+    FloatParts64 pb = float16_unpack_canonical(b, status);
+    FloatParts64 pc = float16_unpack_canonical(c, status);
+    FloatParts64 pr = muladd_floats(pa, pb, pc, flags, status);
 
     return float16_round_pack_canonical(pr, status);
 }
@@ -1532,10 +1532,10 @@ static float32 QEMU_SOFTFLOAT_ATTR
 soft_f32_muladd(float32 a, float32 b, float32 c, int flags,
                 float_status *status)
 {
-    FloatParts pa = float32_unpack_canonical(a, status);
-    FloatParts pb = float32_unpack_canonical(b, status);
-    FloatParts pc = float32_unpack_canonical(c, status);
-    FloatParts pr = muladd_floats(pa, pb, pc, flags, status);
+    FloatParts64 pa = float32_unpack_canonical(a, status);
+    FloatParts64 pb = float32_unpack_canonical(b, status);
+    FloatParts64 pc = float32_unpack_canonical(c, status);
+    FloatParts64 pr = muladd_floats(pa, pb, pc, flags, status);
 
     return float32_round_pack_canonical(pr, status);
 }
@@ -1544,10 +1544,10 @@ static float64 QEMU_SOFTFLOAT_ATTR
 soft_f64_muladd(float64 a, float64 b, float64 c, int flags,
                 float_status *status)
 {
-    FloatParts pa = float64_unpack_canonical(a, status);
-    FloatParts pb = float64_unpack_canonical(b, status);
-    FloatParts pc = float64_unpack_canonical(c, status);
-    FloatParts pr = muladd_floats(pa, pb, pc, flags, status);
+    FloatParts64 pa = float64_unpack_canonical(a, status);
+    FloatParts64 pb = float64_unpack_canonical(b, status);
+    FloatParts64 pc = float64_unpack_canonical(c, status);
+    FloatParts64 pr = muladd_floats(pa, pb, pc, flags, status);
 
     return float64_round_pack_canonical(pr, status);
 }
@@ -1705,10 +1705,10 @@ float64_muladd(float64 xa, float64 xb, float64 xc, int flags, float_status *s)
 bfloat16 QEMU_FLATTEN bfloat16_muladd(bfloat16 a, bfloat16 b, bfloat16 c,
                                       int flags, float_status *status)
 {
-    FloatParts pa = bfloat16_unpack_canonical(a, status);
-    FloatParts pb = bfloat16_unpack_canonical(b, status);
-    FloatParts pc = bfloat16_unpack_canonical(c, status);
-    FloatParts pr = muladd_floats(pa, pb, pc, flags, status);
+    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
+    FloatParts64 pb = bfloat16_unpack_canonical(b, status);
+    FloatParts64 pc = bfloat16_unpack_canonical(c, status);
+    FloatParts64 pr = muladd_floats(pa, pb, pc, flags, status);
 
     return bfloat16_round_pack_canonical(pr, status);
 }
@@ -1719,7 +1719,7 @@ bfloat16 QEMU_FLATTEN bfloat16_muladd(bfloat16 a, bfloat16 b, bfloat16 c,
  * the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
  */
 
-static FloatParts div_floats(FloatParts a, FloatParts b, float_status *s)
+static FloatParts64 div_floats(FloatParts64 a, FloatParts64 b, float_status *s)
 {
     bool sign = a.sign ^ b.sign;
 
@@ -1786,9 +1786,9 @@ static FloatParts div_floats(FloatParts a, FloatParts b, float_status *s)
 
 float16 float16_div(float16 a, float16 b, float_status *status)
 {
-    FloatParts pa = float16_unpack_canonical(a, status);
-    FloatParts pb = float16_unpack_canonical(b, status);
-    FloatParts pr = div_floats(pa, pb, status);
+    FloatParts64 pa = float16_unpack_canonical(a, status);
+    FloatParts64 pb = float16_unpack_canonical(b, status);
+    FloatParts64 pr = div_floats(pa, pb, status);
 
     return float16_round_pack_canonical(pr, status);
 }
@@ -1796,9 +1796,9 @@ float16 float16_div(float16 a, float16 b, float_status *status)
 static float32 QEMU_SOFTFLOAT_ATTR
 soft_f32_div(float32 a, float32 b, float_status *status)
 {
-    FloatParts pa = float32_unpack_canonical(a, status);
-    FloatParts pb = float32_unpack_canonical(b, status);
-    FloatParts pr = div_floats(pa, pb, status);
+    FloatParts64 pa = float32_unpack_canonical(a, status);
+    FloatParts64 pb = float32_unpack_canonical(b, status);
+    FloatParts64 pr = div_floats(pa, pb, status);
 
     return float32_round_pack_canonical(pr, status);
 }
@@ -1806,9 +1806,9 @@ soft_f32_div(float32 a, float32 b, float_status *status)
 static float64 QEMU_SOFTFLOAT_ATTR
 soft_f64_div(float64 a, float64 b, float_status *status)
 {
-    FloatParts pa = float64_unpack_canonical(a, status);
-    FloatParts pb = float64_unpack_canonical(b, status);
-    FloatParts pr = div_floats(pa, pb, status);
+    FloatParts64 pa = float64_unpack_canonical(a, status);
+    FloatParts64 pb = float64_unpack_canonical(b, status);
+    FloatParts64 pr = div_floats(pa, pb, status);
 
     return float64_round_pack_canonical(pr, status);
 }
@@ -1878,9 +1878,9 @@ float64_div(float64 a, float64 b, float_status *s)
 
 bfloat16 bfloat16_div(bfloat16 a, bfloat16 b, float_status *status)
 {
-    FloatParts pa = bfloat16_unpack_canonical(a, status);
-    FloatParts pb = bfloat16_unpack_canonical(b, status);
-    FloatParts pr = div_floats(pa, pb, status);
+    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
+    FloatParts64 pb = bfloat16_unpack_canonical(b, status);
+    FloatParts64 pr = div_floats(pa, pb, status);
 
     return bfloat16_round_pack_canonical(pr, status);
 }
@@ -1896,7 +1896,7 @@ bfloat16 bfloat16_div(bfloat16 a, bfloat16 b, float_status *status)
  * invalid exceptions and handling the conversion on NaNs.
  */
 
-static FloatParts float_to_float(FloatParts a, const FloatFmt *dstf,
+static FloatParts64 float_to_float(FloatParts64 a, const FloatFmt *dstf,
                                  float_status *s)
 {
     if (dstf->arm_althp) {
@@ -1934,32 +1934,32 @@ static FloatParts float_to_float(FloatParts a, const FloatFmt *dstf,
 float32 float16_to_float32(float16 a, bool ieee, float_status *s)
 {
     const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
-    FloatParts p = float16a_unpack_canonical(a, s, fmt16);
-    FloatParts pr = float_to_float(p, &float32_params, s);
+    FloatParts64 p = float16a_unpack_canonical(a, s, fmt16);
+    FloatParts64 pr = float_to_float(p, &float32_params, s);
     return float32_round_pack_canonical(pr, s);
 }
 
 float64 float16_to_float64(float16 a, bool ieee, float_status *s)
 {
     const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
-    FloatParts p = float16a_unpack_canonical(a, s, fmt16);
-    FloatParts pr = float_to_float(p, &float64_params, s);
+    FloatParts64 p = float16a_unpack_canonical(a, s, fmt16);
+    FloatParts64 pr = float_to_float(p, &float64_params, s);
     return float64_round_pack_canonical(pr, s);
 }
 
 float16 float32_to_float16(float32 a, bool ieee, float_status *s)
 {
     const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
-    FloatParts p = float32_unpack_canonical(a, s);
-    FloatParts pr = float_to_float(p, fmt16, s);
+    FloatParts64 p = float32_unpack_canonical(a, s);
+    FloatParts64 pr = float_to_float(p, fmt16, s);
     return float16a_round_pack_canonical(pr, s, fmt16);
 }
 
 static float64 QEMU_SOFTFLOAT_ATTR
 soft_float32_to_float64(float32 a, float_status *s)
 {
-    FloatParts p = float32_unpack_canonical(a, s);
-    FloatParts pr = float_to_float(p, &float64_params, s);
+    FloatParts64 p = float32_unpack_canonical(a, s);
+    FloatParts64 pr = float_to_float(p, &float64_params, s);
     return float64_round_pack_canonical(pr, s);
 }
 
@@ -1982,43 +1982,43 @@ float64 float32_to_float64(float32 a, float_status *s)
 float16 float64_to_float16(float64 a, bool ieee, float_status *s)
 {
     const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
-    FloatParts p = float64_unpack_canonical(a, s);
-    FloatParts pr = float_to_float(p, fmt16, s);
+    FloatParts64 p = float64_unpack_canonical(a, s);
+    FloatParts64 pr = float_to_float(p, fmt16, s);
     return float16a_round_pack_canonical(pr, s, fmt16);
 }
 
 float32 float64_to_float32(float64 a, float_status *s)
 {
-    FloatParts p = float64_unpack_canonical(a, s);
-    FloatParts pr = float_to_float(p, &float32_params, s);
+    FloatParts64 p = float64_unpack_canonical(a, s);
+    FloatParts64 pr = float_to_float(p, &float32_params, s);
     return float32_round_pack_canonical(pr, s);
 }
 
 float32 bfloat16_to_float32(bfloat16 a, float_status *s)
 {
-    FloatParts p = bfloat16_unpack_canonical(a, s);
-    FloatParts pr = float_to_float(p, &float32_params, s);
+    FloatParts64 p = bfloat16_unpack_canonical(a, s);
+    FloatParts64 pr = float_to_float(p, &float32_params, s);
     return float32_round_pack_canonical(pr, s);
 }
 
 float64 bfloat16_to_float64(bfloat16 a, float_status *s)
 {
-    FloatParts p = bfloat16_unpack_canonical(a, s);
-    FloatParts pr = float_to_float(p, &float64_params, s);
+    FloatParts64 p = bfloat16_unpack_canonical(a, s);
+    FloatParts64 pr = float_to_float(p, &float64_params, s);
     return float64_round_pack_canonical(pr, s);
 }
 
 bfloat16 float32_to_bfloat16(float32 a, float_status *s)
 {
-    FloatParts p = float32_unpack_canonical(a, s);
-    FloatParts pr = float_to_float(p, &bfloat16_params, s);
+    FloatParts64 p = float32_unpack_canonical(a, s);
+    FloatParts64 pr = float_to_float(p, &bfloat16_params, s);
     return bfloat16_round_pack_canonical(pr, s);
 }
 
 bfloat16 float64_to_bfloat16(float64 a, float_status *s)
 {
-    FloatParts p = float64_unpack_canonical(a, s);
-    FloatParts pr = float_to_float(p, &bfloat16_params, s);
+    FloatParts64 p = float64_unpack_canonical(a, s);
+    FloatParts64 pr = float_to_float(p, &bfloat16_params, s);
     return bfloat16_round_pack_canonical(pr, s);
 }
 
@@ -2029,7 +2029,7 @@ bfloat16 float64_to_bfloat16(float64 a, float_status *s)
  * Arithmetic.
  */
 
-static FloatParts round_to_int(FloatParts a, FloatRoundMode rmode,
+static FloatParts64 round_to_int(FloatParts64 a, FloatRoundMode rmode,
                                int scale, float_status *s)
 {
     switch (a.cls) {
@@ -2132,22 +2132,22 @@ static FloatParts round_to_int(FloatParts a, FloatRoundMode rmode,
 
 float16 float16_round_to_int(float16 a, float_status *s)
 {
-    FloatParts pa = float16_unpack_canonical(a, s);
-    FloatParts pr = round_to_int(pa, s->float_rounding_mode, 0, s);
+    FloatParts64 pa = float16_unpack_canonical(a, s);
+    FloatParts64 pr = round_to_int(pa, s->float_rounding_mode, 0, s);
     return float16_round_pack_canonical(pr, s);
 }
 
 float32 float32_round_to_int(float32 a, float_status *s)
 {
-    FloatParts pa = float32_unpack_canonical(a, s);
-    FloatParts pr = round_to_int(pa, s->float_rounding_mode, 0, s);
+    FloatParts64 pa = float32_unpack_canonical(a, s);
+    FloatParts64 pr = round_to_int(pa, s->float_rounding_mode, 0, s);
     return float32_round_pack_canonical(pr, s);
 }
 
 float64 float64_round_to_int(float64 a, float_status *s)
 {
-    FloatParts pa = float64_unpack_canonical(a, s);
-    FloatParts pr = round_to_int(pa, s->float_rounding_mode, 0, s);
+    FloatParts64 pa = float64_unpack_canonical(a, s);
+    FloatParts64 pr = round_to_int(pa, s->float_rounding_mode, 0, s);
     return float64_round_pack_canonical(pr, s);
 }
 
@@ -2158,8 +2158,8 @@ float64 float64_round_to_int(float64 a, float_status *s)
 
 bfloat16 bfloat16_round_to_int(bfloat16 a, float_status *s)
 {
-    FloatParts pa = bfloat16_unpack_canonical(a, s);
-    FloatParts pr = round_to_int(pa, s->float_rounding_mode, 0, s);
+    FloatParts64 pa = bfloat16_unpack_canonical(a, s);
+    FloatParts64 pr = round_to_int(pa, s->float_rounding_mode, 0, s);
     return bfloat16_round_pack_canonical(pr, s);
 }
 
@@ -2174,13 +2174,13 @@ bfloat16 bfloat16_round_to_int(bfloat16 a, float_status *s)
  * is returned.
 */
 
-static int64_t round_to_int_and_pack(FloatParts in, FloatRoundMode rmode,
+static int64_t round_to_int_and_pack(FloatParts64 in, FloatRoundMode rmode,
                                      int scale, int64_t min, int64_t max,
                                      float_status *s)
 {
     uint64_t r;
     int orig_flags = get_float_exception_flags(s);
-    FloatParts p = round_to_int(in, rmode, scale, s);
+    FloatParts64 p = round_to_int(in, rmode, scale, s);
 
     switch (p.cls) {
     case float_class_snan:
@@ -2452,12 +2452,12 @@ int64_t bfloat16_to_int64_round_to_zero(bfloat16 a, float_status *s)
  *  flag.
  */
 
-static uint64_t round_to_uint_and_pack(FloatParts in, FloatRoundMode rmode,
+static uint64_t round_to_uint_and_pack(FloatParts64 in, FloatRoundMode rmode,
                                        int scale, uint64_t max,
                                        float_status *s)
 {
     int orig_flags = get_float_exception_flags(s);
-    FloatParts p = round_to_int(in, rmode, scale, s);
+    FloatParts64 p = round_to_int(in, rmode, scale, s);
     uint64_t r;
 
     switch (p.cls) {
@@ -2726,9 +2726,9 @@ uint64_t bfloat16_to_uint64_round_to_zero(bfloat16 a, float_status *s)
  * to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
  */
 
-static FloatParts int_to_float(int64_t a, int scale, float_status *status)
+static FloatParts64 int_to_float(int64_t a, int scale, float_status *status)
 {
-    FloatParts r = { .sign = false };
+    FloatParts64 r = { .sign = false };
 
     if (a == 0) {
         r.cls = float_class_zero;
@@ -2753,7 +2753,7 @@ static FloatParts int_to_float(int64_t a, int scale, float_status *status)
 
 float16 int64_to_float16_scalbn(int64_t a, int scale, float_status *status)
 {
-    FloatParts pa = int_to_float(a, scale, status);
+    FloatParts64 pa = int_to_float(a, scale, status);
     return float16_round_pack_canonical(pa, status);
 }
 
@@ -2789,7 +2789,7 @@ float16 int8_to_float16(int8_t a, float_status *status)
 
 float32 int64_to_float32_scalbn(int64_t a, int scale, float_status *status)
 {
-    FloatParts pa = int_to_float(a, scale, status);
+    FloatParts64 pa = int_to_float(a, scale, status);
     return float32_round_pack_canonical(pa, status);
 }
 
@@ -2820,7 +2820,7 @@ float32 int16_to_float32(int16_t a, float_status *status)
 
 float64 int64_to_float64_scalbn(int64_t a, int scale, float_status *status)
 {
-    FloatParts pa = int_to_float(a, scale, status);
+    FloatParts64 pa = int_to_float(a, scale, status);
     return float64_round_pack_canonical(pa, status);
 }
 
@@ -2856,7 +2856,7 @@ float64 int16_to_float64(int16_t a, float_status *status)
 
 bfloat16 int64_to_bfloat16_scalbn(int64_t a, int scale, float_status *status)
 {
-    FloatParts pa = int_to_float(a, scale, status);
+    FloatParts64 pa = int_to_float(a, scale, status);
     return bfloat16_round_pack_canonical(pa, status);
 }
 
@@ -2893,9 +2893,9 @@ bfloat16 int16_to_bfloat16(int16_t a, float_status *status)
  * IEC/IEEE Standard for Binary Floating-Point Arithmetic.
  */
 
-static FloatParts uint_to_float(uint64_t a, int scale, float_status *status)
+static FloatParts64 uint_to_float(uint64_t a, int scale, float_status *status)
 {
-    FloatParts r = { .sign = false };
+    FloatParts64 r = { .sign = false };
     int shift;
 
     if (a == 0) {
@@ -2913,7 +2913,7 @@ static FloatParts uint_to_float(uint64_t a, int scale, float_status *status)
 
 float16 uint64_to_float16_scalbn(uint64_t a, int scale, float_status *status)
 {
-    FloatParts pa = uint_to_float(a, scale, status);
+    FloatParts64 pa = uint_to_float(a, scale, status);
     return float16_round_pack_canonical(pa, status);
 }
 
@@ -2949,7 +2949,7 @@ float16 uint8_to_float16(uint8_t a, float_status *status)
 
 float32 uint64_to_float32_scalbn(uint64_t a, int scale, float_status *status)
 {
-    FloatParts pa = uint_to_float(a, scale, status);
+    FloatParts64 pa = uint_to_float(a, scale, status);
     return float32_round_pack_canonical(pa, status);
 }
 
@@ -2980,7 +2980,7 @@ float32 uint16_to_float32(uint16_t a, float_status *status)
 
 float64 uint64_to_float64_scalbn(uint64_t a, int scale, float_status *status)
 {
-    FloatParts pa = uint_to_float(a, scale, status);
+    FloatParts64 pa = uint_to_float(a, scale, status);
     return float64_round_pack_canonical(pa, status);
 }
 
@@ -3016,7 +3016,7 @@ float64 uint16_to_float64(uint16_t a, float_status *status)
 
 bfloat16 uint64_to_bfloat16_scalbn(uint64_t a, int scale, float_status *status)
 {
-    FloatParts pa = uint_to_float(a, scale, status);
+    FloatParts64 pa = uint_to_float(a, scale, status);
     return bfloat16_round_pack_canonical(pa, status);
 }
 
@@ -3061,7 +3061,7 @@ bfloat16 uint16_to_bfloat16(uint16_t a, float_status *status)
  * minnummag() and maxnummag() functions correspond to minNumMag()
  * and minNumMag() from the IEEE-754 2008.
  */
-static FloatParts minmax_floats(FloatParts a, FloatParts b, bool ismin,
+static FloatParts64 minmax_floats(FloatParts64 a, FloatParts64 b, bool ismin,
                                 bool ieee, bool ismag, float_status *s)
 {
     if (unlikely(is_nan(a.cls) || is_nan(b.cls))) {
@@ -3136,9 +3136,9 @@ static FloatParts minmax_floats(FloatParts a, FloatParts b, bool ismin,
 float ## sz float ## sz ## _ ## name(float ## sz a, float ## sz b,      \
                                      float_status *s)                   \
 {                                                                       \
-    FloatParts pa = float ## sz ## _unpack_canonical(a, s);             \
-    FloatParts pb = float ## sz ## _unpack_canonical(b, s);             \
-    FloatParts pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);      \
+    FloatParts64 pa = float ## sz ## _unpack_canonical(a, s);             \
+    FloatParts64 pb = float ## sz ## _unpack_canonical(b, s);             \
+    FloatParts64 pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);      \
                                                                         \
     return float ## sz ## _round_pack_canonical(pr, s);                 \
 }
@@ -3169,9 +3169,9 @@ MINMAX(64, maxnummag, false, true, true)
 #define BF16_MINMAX(name, ismin, isiee, ismag)                          \
 bfloat16 bfloat16_ ## name(bfloat16 a, bfloat16 b, float_status *s)     \
 {                                                                       \
-    FloatParts pa = bfloat16_unpack_canonical(a, s);                    \
-    FloatParts pb = bfloat16_unpack_canonical(b, s);                    \
-    FloatParts pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);      \
+    FloatParts64 pa = bfloat16_unpack_canonical(a, s);                    \
+    FloatParts64 pb = bfloat16_unpack_canonical(b, s);                    \
+    FloatParts64 pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);      \
                                                                         \
     return bfloat16_round_pack_canonical(pr, s);                        \
 }
@@ -3186,7 +3186,7 @@ BF16_MINMAX(maxnummag, false, true, true)
 #undef BF16_MINMAX
 
 /* Floating point compare */
-static FloatRelation compare_floats(FloatParts a, FloatParts b, bool is_quiet,
+static FloatRelation compare_floats(FloatParts64 a, FloatParts64 b, bool is_quiet,
                                     float_status *s)
 {
     if (is_nan(a.cls) || is_nan(b.cls)) {
@@ -3247,8 +3247,8 @@ static FloatRelation compare_floats(FloatParts a, FloatParts b, bool is_quiet,
 static int attr                                                         \
 name(float ## sz a, float ## sz b, bool is_quiet, float_status *s)      \
 {                                                                       \
-    FloatParts pa = float ## sz ## _unpack_canonical(a, s);             \
-    FloatParts pb = float ## sz ## _unpack_canonical(b, s);             \
+    FloatParts64 pa = float ## sz ## _unpack_canonical(a, s);             \
+    FloatParts64 pb = float ## sz ## _unpack_canonical(b, s);             \
     return compare_floats(pa, pb, is_quiet, s);                         \
 }
 
@@ -3349,8 +3349,8 @@ FloatRelation float64_compare_quiet(float64 a, float64 b, float_status *s)
 static FloatRelation QEMU_FLATTEN
 soft_bf16_compare(bfloat16 a, bfloat16 b, bool is_quiet, float_status *s)
 {
-    FloatParts pa = bfloat16_unpack_canonical(a, s);
-    FloatParts pb = bfloat16_unpack_canonical(b, s);
+    FloatParts64 pa = bfloat16_unpack_canonical(a, s);
+    FloatParts64 pb = bfloat16_unpack_canonical(b, s);
     return compare_floats(pa, pb, is_quiet, s);
 }
 
@@ -3365,16 +3365,16 @@ FloatRelation bfloat16_compare_quiet(bfloat16 a, bfloat16 b, float_status *s)
 }
 
 /* Multiply A by 2 raised to the power N.  */
-static FloatParts scalbn_decomposed(FloatParts a, int n, float_status *s)
+static FloatParts64 scalbn_decomposed(FloatParts64 a, int n, float_status *s)
 {
     if (unlikely(is_nan(a.cls))) {
         return return_nan(a, s);
     }
     if (a.cls == float_class_normal) {
-        /* The largest float type (even though not supported by FloatParts)
+        /* The largest float type (even though not supported by FloatParts64)
          * is float128, which has a 15 bit exponent.  Bounding N to 16 bits
          * still allows rounding to infinity, without allowing overflow
-         * within the int32_t that backs FloatParts.exp.
+         * within the int32_t that backs FloatParts64.exp.
          */
         n = MIN(MAX(n, -0x10000), 0x10000);
         a.exp += n;
@@ -3384,29 +3384,29 @@ static FloatParts scalbn_decomposed(FloatParts a, int n, float_status *s)
 
 float16 float16_scalbn(float16 a, int n, float_status *status)
 {
-    FloatParts pa = float16_unpack_canonical(a, status);
-    FloatParts pr = scalbn_decomposed(pa, n, status);
+    FloatParts64 pa = float16_unpack_canonical(a, status);
+    FloatParts64 pr = scalbn_decomposed(pa, n, status);
     return float16_round_pack_canonical(pr, status);
 }
 
 float32 float32_scalbn(float32 a, int n, float_status *status)
 {
-    FloatParts pa = float32_unpack_canonical(a, status);
-    FloatParts pr = scalbn_decomposed(pa, n, status);
+    FloatParts64 pa = float32_unpack_canonical(a, status);
+    FloatParts64 pr = scalbn_decomposed(pa, n, status);
     return float32_round_pack_canonical(pr, status);
 }
 
 float64 float64_scalbn(float64 a, int n, float_status *status)
 {
-    FloatParts pa = float64_unpack_canonical(a, status);
-    FloatParts pr = scalbn_decomposed(pa, n, status);
+    FloatParts64 pa = float64_unpack_canonical(a, status);
+    FloatParts64 pr = scalbn_decomposed(pa, n, status);
     return float64_round_pack_canonical(pr, status);
 }
 
 bfloat16 bfloat16_scalbn(bfloat16 a, int n, float_status *status)
 {
-    FloatParts pa = bfloat16_unpack_canonical(a, status);
-    FloatParts pr = scalbn_decomposed(pa, n, status);
+    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
+    FloatParts64 pr = scalbn_decomposed(pa, n, status);
     return bfloat16_round_pack_canonical(pr, status);
 }
 
@@ -3422,7 +3422,7 @@ bfloat16 bfloat16_scalbn(bfloat16 a, int n, float_status *status)
  * especially for 64 bit floats.
  */
 
-static FloatParts sqrt_float(FloatParts a, float_status *s, const FloatFmt *p)
+static FloatParts64 sqrt_float(FloatParts64 a, float_status *s, const FloatFmt *p)
 {
     uint64_t a_frac, r_frac, s_frac;
     int bit, last_bit;
@@ -3482,24 +3482,24 @@ static FloatParts sqrt_float(FloatParts a, float_status *s, const FloatFmt *p)
 
 float16 QEMU_FLATTEN float16_sqrt(float16 a, float_status *status)
 {
-    FloatParts pa = float16_unpack_canonical(a, status);
-    FloatParts pr = sqrt_float(pa, status, &float16_params);
+    FloatParts64 pa = float16_unpack_canonical(a, status);
+    FloatParts64 pr = sqrt_float(pa, status, &float16_params);
     return float16_round_pack_canonical(pr, status);
 }
 
 static float32 QEMU_SOFTFLOAT_ATTR
 soft_f32_sqrt(float32 a, float_status *status)
 {
-    FloatParts pa = float32_unpack_canonical(a, status);
-    FloatParts pr = sqrt_float(pa, status, &float32_params);
+    FloatParts64 pa = float32_unpack_canonical(a, status);
+    FloatParts64 pr = sqrt_float(pa, status, &float32_params);
     return float32_round_pack_canonical(pr, status);
 }
 
 static float64 QEMU_SOFTFLOAT_ATTR
 soft_f64_sqrt(float64 a, float_status *status)
 {
-    FloatParts pa = float64_unpack_canonical(a, status);
-    FloatParts pr = sqrt_float(pa, status, &float64_params);
+    FloatParts64 pa = float64_unpack_canonical(a, status);
+    FloatParts64 pr = sqrt_float(pa, status, &float64_params);
     return float64_round_pack_canonical(pr, status);
 }
 
@@ -3559,8 +3559,8 @@ float64 QEMU_FLATTEN float64_sqrt(float64 xa, float_status *s)
 
 bfloat16 QEMU_FLATTEN bfloat16_sqrt(bfloat16 a, float_status *status)
 {
-    FloatParts pa = bfloat16_unpack_canonical(a, status);
-    FloatParts pr = sqrt_float(pa, status, &bfloat16_params);
+    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
+    FloatParts64 pr = sqrt_float(pa, status, &bfloat16_params);
     return bfloat16_round_pack_canonical(pr, status);
 }
 
@@ -3570,28 +3570,28 @@ bfloat16 QEMU_FLATTEN bfloat16_sqrt(bfloat16 a, float_status *status)
 
 float16 float16_default_nan(float_status *status)
 {
-    FloatParts p = parts_default_nan(status);
+    FloatParts64 p = parts_default_nan(status);
     p.frac >>= float16_params.frac_shift;
     return float16_pack_raw(p);
 }
 
 float32 float32_default_nan(float_status *status)
 {
-    FloatParts p = parts_default_nan(status);
+    FloatParts64 p = parts_default_nan(status);
     p.frac >>= float32_params.frac_shift;
     return float32_pack_raw(p);
 }
 
 float64 float64_default_nan(float_status *status)
 {
-    FloatParts p = parts_default_nan(status);
+    FloatParts64 p = parts_default_nan(status);
     p.frac >>= float64_params.frac_shift;
     return float64_pack_raw(p);
 }
 
 float128 float128_default_nan(float_status *status)
 {
-    FloatParts p = parts_default_nan(status);
+    FloatParts64 p = parts_default_nan(status);
     float128 r;
 
     /* Extrapolate from the choices made by parts_default_nan to fill
@@ -3608,7 +3608,7 @@ float128 float128_default_nan(float_status *status)
 
 bfloat16 bfloat16_default_nan(float_status *status)
 {
-    FloatParts p = parts_default_nan(status);
+    FloatParts64 p = parts_default_nan(status);
     p.frac >>= bfloat16_params.frac_shift;
     return bfloat16_pack_raw(p);
 }
@@ -3619,7 +3619,7 @@ bfloat16 bfloat16_default_nan(float_status *status)
 
 float16 float16_silence_nan(float16 a, float_status *status)
 {
-    FloatParts p = float16_unpack_raw(a);
+    FloatParts64 p = float16_unpack_raw(a);
     p.frac <<= float16_params.frac_shift;
     p = parts_silence_nan(p, status);
     p.frac >>= float16_params.frac_shift;
@@ -3628,7 +3628,7 @@ float16 float16_silence_nan(float16 a, float_status *status)
 
 float32 float32_silence_nan(float32 a, float_status *status)
 {
-    FloatParts p = float32_unpack_raw(a);
+    FloatParts64 p = float32_unpack_raw(a);
     p.frac <<= float32_params.frac_shift;
     p = parts_silence_nan(p, status);
     p.frac >>= float32_params.frac_shift;
@@ -3637,7 +3637,7 @@ float32 float32_silence_nan(float32 a, float_status *status)
 
 float64 float64_silence_nan(float64 a, float_status *status)
 {
-    FloatParts p = float64_unpack_raw(a);
+    FloatParts64 p = float64_unpack_raw(a);
     p.frac <<= float64_params.frac_shift;
     p = parts_silence_nan(p, status);
     p.frac >>= float64_params.frac_shift;
@@ -3646,7 +3646,7 @@ float64 float64_silence_nan(float64 a, float_status *status)
 
 bfloat16 bfloat16_silence_nan(bfloat16 a, float_status *status)
 {
-    FloatParts p = bfloat16_unpack_raw(a);
+    FloatParts64 p = bfloat16_unpack_raw(a);
     p.frac <<= bfloat16_params.frac_shift;
     p = parts_silence_nan(p, status);
     p.frac >>= bfloat16_params.frac_shift;
@@ -3658,7 +3658,7 @@ bfloat16 bfloat16_silence_nan(bfloat16 a, float_status *status)
 | input-denormal exception and return zero. Otherwise just return the value.
 *----------------------------------------------------------------------------*/
 
-static bool parts_squash_denormal(FloatParts p, float_status *status)
+static bool parts_squash_denormal(FloatParts64 p, float_status *status)
 {
     if (p.exp == 0 && p.frac != 0) {
         float_raise(float_flag_input_denormal, status);
@@ -3671,7 +3671,7 @@ static bool parts_squash_denormal(FloatParts p, float_status *status)
 float16 float16_squash_input_denormal(float16 a, float_status *status)
 {
     if (status->flush_inputs_to_zero) {
-        FloatParts p = float16_unpack_raw(a);
+        FloatParts64 p = float16_unpack_raw(a);
         if (parts_squash_denormal(p, status)) {
             return float16_set_sign(float16_zero, p.sign);
         }
@@ -3682,7 +3682,7 @@ float16 float16_squash_input_denormal(float16 a, float_status *status)
 float32 float32_squash_input_denormal(float32 a, float_status *status)
 {
     if (status->flush_inputs_to_zero) {
-        FloatParts p = float32_unpack_raw(a);
+        FloatParts64 p = float32_unpack_raw(a);
         if (parts_squash_denormal(p, status)) {
             return float32_set_sign(float32_zero, p.sign);
         }
@@ -3693,7 +3693,7 @@ float32 float32_squash_input_denormal(float32 a, float_status *status)
 float64 float64_squash_input_denormal(float64 a, float_status *status)
 {
     if (status->flush_inputs_to_zero) {
-        FloatParts p = float64_unpack_raw(a);
+        FloatParts64 p = float64_unpack_raw(a);
         if (parts_squash_denormal(p, status)) {
             return float64_set_sign(float64_zero, p.sign);
         }
@@ -3704,7 +3704,7 @@ float64 float64_squash_input_denormal(float64 a, float_status *status)
 bfloat16 bfloat16_squash_input_denormal(bfloat16 a, float_status *status)
 {
     if (status->flush_inputs_to_zero) {
-        FloatParts p = bfloat16_unpack_raw(a);
+        FloatParts64 p = bfloat16_unpack_raw(a);
         if (parts_squash_denormal(p, status)) {
             return bfloat16_set_sign(bfloat16_zero, p.sign);
         }
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index 5988830c16..52fc76d800 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -129,7 +129,7 @@ static bool parts_is_snan_frac(uint64_t frac, float_status *status)
 | The pattern for a default generated deconstructed floating-point NaN.
 *----------------------------------------------------------------------------*/
 
-static FloatParts parts_default_nan(float_status *status)
+static FloatParts64 parts_default_nan(float_status *status)
 {
     bool sign = 0;
     uint64_t frac;
@@ -164,7 +164,7 @@ static FloatParts parts_default_nan(float_status *status)
     }
 #endif
 
-    return (FloatParts) {
+    return (FloatParts64) {
         .cls = float_class_qnan,
         .sign = sign,
         .exp = INT_MAX,
@@ -177,7 +177,7 @@ static FloatParts parts_default_nan(float_status *status)
 | floating-point parts.
 *----------------------------------------------------------------------------*/
 
-static FloatParts parts_silence_nan(FloatParts a, float_status *status)
+static FloatParts64 parts_silence_nan(FloatParts64 a, float_status *status)
 {
     g_assert(!no_signaling_nans(status));
     g_assert(!status->default_nan_mode);
-- 
2.25.1
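
Worked check of the exponent clamp in scalbn_decomposed above (illustrative
arithmetic, not part of the patch): float128's 15-bit exponent keeps any
canonicalized exp within roughly +/-0x8000, so with n clamped to
[-0x10000, 0x10000] the magnitude of a.exp + n never exceeds about
0x8000 + 0x10000 = 0x18000, comfortably inside int32_t range, while a
clamp of +/-0x10000 is still large enough to drive any finite value onto
the overflow or underflow path during rounding.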



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 16/72] softfloat: Move type-specific pack/unpack routines
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (14 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 15/72] softfloat: Rename FloatParts to FloatParts64 Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-09  8:46   ` Philippe Mathieu-Daudé
  2021-05-11 10:17   ` David Hildenbrand
  2021-05-08  1:47 ` [PATCH 17/72] softfloat: Use pointers with parts_default_nan Richard Henderson
                   ` (60 subsequent siblings)
  76 siblings, 2 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

In preparation for moving sf_canonicalize.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 109 +++++++++++++++++++++++++-----------------------
 1 file changed, 56 insertions(+), 53 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 27b51659c9..398a068b58 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -837,59 +837,6 @@ static FloatParts64 round_canonical(FloatParts64 p, float_status *s,
     return p;
 }
 
-/* Explicit FloatFmt version */
-static FloatParts64 float16a_unpack_canonical(float16 f, float_status *s,
-                                            const FloatFmt *params)
-{
-    return sf_canonicalize(float16_unpack_raw(f), params, s);
-}
-
-static FloatParts64 float16_unpack_canonical(float16 f, float_status *s)
-{
-    return float16a_unpack_canonical(f, s, &float16_params);
-}
-
-static FloatParts64 bfloat16_unpack_canonical(bfloat16 f, float_status *s)
-{
-    return sf_canonicalize(bfloat16_unpack_raw(f), &bfloat16_params, s);
-}
-
-static float16 float16a_round_pack_canonical(FloatParts64 p, float_status *s,
-                                             const FloatFmt *params)
-{
-    return float16_pack_raw(round_canonical(p, s, params));
-}
-
-static float16 float16_round_pack_canonical(FloatParts64 p, float_status *s)
-{
-    return float16a_round_pack_canonical(p, s, &float16_params);
-}
-
-static bfloat16 bfloat16_round_pack_canonical(FloatParts64 p, float_status *s)
-{
-    return bfloat16_pack_raw(round_canonical(p, s, &bfloat16_params));
-}
-
-static FloatParts64 float32_unpack_canonical(float32 f, float_status *s)
-{
-    return sf_canonicalize(float32_unpack_raw(f), &float32_params, s);
-}
-
-static float32 float32_round_pack_canonical(FloatParts64 p, float_status *s)
-{
-    return float32_pack_raw(round_canonical(p, s, &float32_params));
-}
-
-static FloatParts64 float64_unpack_canonical(float64 f, float_status *s)
-{
-    return sf_canonicalize(float64_unpack_raw(f), &float64_params, s);
-}
-
-static float64 float64_round_pack_canonical(FloatParts64 p, float_status *s)
-{
-    return float64_pack_raw(round_canonical(p, s, &float64_params));
-}
-
 static FloatParts64 return_nan(FloatParts64 a, float_status *s)
 {
     g_assert(is_nan(a.cls));
@@ -964,6 +911,62 @@ static FloatParts64 pick_nan_muladd(FloatParts64 a, FloatParts64 b, FloatParts64
     return a;
 }
 
+/*
+ * Pack/unpack routines with a specific FloatFmt.
+ */
+
+static FloatParts64 float16a_unpack_canonical(float16 f, float_status *s,
+                                            const FloatFmt *params)
+{
+    return sf_canonicalize(float16_unpack_raw(f), params, s);
+}
+
+static FloatParts64 float16_unpack_canonical(float16 f, float_status *s)
+{
+    return float16a_unpack_canonical(f, s, &float16_params);
+}
+
+static FloatParts64 bfloat16_unpack_canonical(bfloat16 f, float_status *s)
+{
+    return sf_canonicalize(bfloat16_unpack_raw(f), &bfloat16_params, s);
+}
+
+static float16 float16a_round_pack_canonical(FloatParts64 p, float_status *s,
+                                             const FloatFmt *params)
+{
+    return float16_pack_raw(round_canonical(p, s, params));
+}
+
+static float16 float16_round_pack_canonical(FloatParts64 p, float_status *s)
+{
+    return float16a_round_pack_canonical(p, s, &float16_params);
+}
+
+static bfloat16 bfloat16_round_pack_canonical(FloatParts64 p, float_status *s)
+{
+    return bfloat16_pack_raw(round_canonical(p, s, &bfloat16_params));
+}
+
+static FloatParts64 float32_unpack_canonical(float32 f, float_status *s)
+{
+    return sf_canonicalize(float32_unpack_raw(f), &float32_params, s);
+}
+
+static float32 float32_round_pack_canonical(FloatParts64 p, float_status *s)
+{
+    return float32_pack_raw(round_canonical(p, s, &float32_params));
+}
+
+static FloatParts64 float64_unpack_canonical(float64 f, float_status *s)
+{
+    return sf_canonicalize(float64_unpack_raw(f), &float64_params, s);
+}
+
+static float64 float64_round_pack_canonical(FloatParts64 p, float_status *s)
+{
+    return float64_pack_raw(round_canonical(p, s, &float64_params));
+}
+
 /*
  * Returns the result of adding or subtracting the floating-point
  * values `a' and `b'. The operation is performed
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 17/72] softfloat: Use pointers with parts_default_nan
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (15 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 16/72] softfloat: Move type-specific pack/unpack routines Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-11 10:22   ` David Hildenbrand
  2021-05-08  1:47 ` [PATCH 18/72] softfloat: Use pointers with unpack_raw Richard Henderson
                   ` (59 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

At the same time, rename to parts64_default_nan and define
a macro for parts_default_nan using QEMU_GENERIC.
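
For readers unfamiliar with QEMU_GENERIC: the end state is dispatch on the
pointer type of the first argument. A minimal sketch of the idea using C11
_Generic directly (the FloatParts128/parts128_default_nan names are
hypothetical at this point in the series, and QEMU_GENERIC itself is built
differently to support pre-C11 compilers):

    #define parts_default_nan(PARTS, STATUS)                          \
        _Generic(*(PARTS),                                            \
                 FloatParts64:  parts64_default_nan,                  \
                 FloatParts128: parts128_default_nan)(PARTS, STATUS)

Within this patch alone the macro is still the plain alias
"#define parts_default_nan parts64_default_nan" seen below.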

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c                | 47 +++++++++++++++++++++++-----------
 fpu/softfloat-specialize.c.inc |  4 +--
 2 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 398a068b58..c7f95961cf 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -650,6 +650,8 @@ static inline float64 float64_pack_raw(FloatParts64 p)
 *----------------------------------------------------------------------------*/
 #include "softfloat-specialize.c.inc"
 
+#define parts_default_nan  parts64_default_nan
+
 /* Canonicalize EXP and FRAC, setting CLS.  */
 static FloatParts64 sf_canonicalize(FloatParts64 part, const FloatFmt *parm,
                                   float_status *status)
@@ -848,7 +850,8 @@ static FloatParts64 return_nan(FloatParts64 a, float_status *s)
     } else if (!s->default_nan_mode) {
         return a;
     }
-    return parts_default_nan(s);
+    parts_default_nan(&a, s);
+    return a;
 }
 
 static FloatParts64 pick_nan(FloatParts64 a, FloatParts64 b, float_status *s)
@@ -858,7 +861,7 @@ static FloatParts64 pick_nan(FloatParts64 a, FloatParts64 b, float_status *s)
     }
 
     if (s->default_nan_mode) {
-        return parts_default_nan(s);
+        parts_default_nan(&a, s);
     } else {
         if (pickNaN(a.cls, b.cls,
                     a.frac > b.frac ||
@@ -900,7 +903,8 @@ static FloatParts64 pick_nan_muladd(FloatParts64 a, FloatParts64 b, FloatParts64
         a = c;
         break;
     case 3:
-        return parts_default_nan(s);
+        parts_default_nan(&a, s);
+        break;
     default:
         g_assert_not_reached();
     }
@@ -1011,7 +1015,7 @@ static FloatParts64 addsub_floats(FloatParts64 a, FloatParts64 b, bool subtract,
         if (a.cls == float_class_inf) {
             if (b.cls == float_class_inf) {
                 float_raise(float_flag_invalid, s);
-                return parts_default_nan(s);
+                parts_default_nan(&a, s);
             }
             return a;
         }
@@ -1254,7 +1258,8 @@ static FloatParts64 mul_floats(FloatParts64 a, FloatParts64 b, float_status *s)
     if ((a.cls == float_class_inf && b.cls == float_class_zero) ||
         (a.cls == float_class_zero && b.cls == float_class_inf)) {
         float_raise(float_flag_invalid, s);
-        return parts_default_nan(s);
+        parts_default_nan(&a, s);
+        return a;
     }
     /* Multiply by 0 or Inf */
     if (a.cls == float_class_inf || a.cls == float_class_zero) {
@@ -1372,7 +1377,8 @@ static FloatParts64 muladd_floats(FloatParts64 a, FloatParts64 b, FloatParts64 c
 
     if (inf_zero) {
         float_raise(float_flag_invalid, s);
-        return parts_default_nan(s);
+        parts_default_nan(&a, s);
+        return a;
     }
 
     if (flags & float_muladd_negate_c) {
@@ -1396,11 +1402,11 @@ static FloatParts64 muladd_floats(FloatParts64 a, FloatParts64 b, FloatParts64 c
     if (c.cls == float_class_inf) {
         if (p_class == float_class_inf && p_sign != c.sign) {
             float_raise(float_flag_invalid, s);
-            return parts_default_nan(s);
+            parts_default_nan(&c, s);
         } else {
             c.sign ^= sign_flip;
-            return c;
         }
+        return c;
     }
 
     if (p_class == float_class_inf) {
@@ -1764,7 +1770,8 @@ static FloatParts64 div_floats(FloatParts64 a, FloatParts64 b, float_status *s)
         &&
         (a.cls == float_class_inf || a.cls == float_class_zero)) {
         float_raise(float_flag_invalid, s);
-        return parts_default_nan(s);
+        parts_default_nan(&a, s);
+        return a;
     }
     /* Inf / x or 0 / x */
     if (a.cls == float_class_inf || a.cls == float_class_zero) {
@@ -3438,7 +3445,8 @@ static FloatParts64 sqrt_float(FloatParts64 a, float_status *s, const FloatFmt *
     }
     if (a.sign) {
         float_raise(float_flag_invalid, s);
-        return parts_default_nan(s);
+        parts_default_nan(&a, s);
+        return a;
     }
     if (a.cls == float_class_inf) {
         return a;  /* sqrt(+inf) = +inf */
@@ -3573,30 +3581,37 @@ bfloat16 QEMU_FLATTEN bfloat16_sqrt(bfloat16 a, float_status *status)
 
 float16 float16_default_nan(float_status *status)
 {
-    FloatParts64 p = parts_default_nan(status);
+    FloatParts64 p;
+
+    parts_default_nan(&p, status);
     p.frac >>= float16_params.frac_shift;
     return float16_pack_raw(p);
 }
 
 float32 float32_default_nan(float_status *status)
 {
-    FloatParts64 p = parts_default_nan(status);
+    FloatParts64 p;
+
+    parts_default_nan(&p, status);
     p.frac >>= float32_params.frac_shift;
     return float32_pack_raw(p);
 }
 
 float64 float64_default_nan(float_status *status)
 {
-    FloatParts64 p = parts_default_nan(status);
+    FloatParts64 p;
+
+    parts_default_nan(&p, status);
     p.frac >>= float64_params.frac_shift;
     return float64_pack_raw(p);
 }
 
 float128 float128_default_nan(float_status *status)
 {
-    FloatParts64 p = parts_default_nan(status);
+    FloatParts64 p;
     float128 r;
 
+    parts_default_nan(&p, status);
     /* Extrapolate from the choices made by parts_default_nan to fill
      * in the quad-floating format.  If the low bit is set, assume we
      * want to set all non-snan bits.
@@ -3611,7 +3626,9 @@ float128 float128_default_nan(float_status *status)
 
 bfloat16 bfloat16_default_nan(float_status *status)
 {
-    FloatParts64 p = parts_default_nan(status);
+    FloatParts64 p;
+
+    parts_default_nan(&p, status);
     p.frac >>= bfloat16_params.frac_shift;
     return bfloat16_pack_raw(p);
 }
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index 52fc76d800..085ddea62b 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -129,7 +129,7 @@ static bool parts_is_snan_frac(uint64_t frac, float_status *status)
 | The pattern for a default generated deconstructed floating-point NaN.
 *----------------------------------------------------------------------------*/
 
-static FloatParts64 parts_default_nan(float_status *status)
+static void parts64_default_nan(FloatParts64 *p, float_status *status)
 {
     bool sign = 0;
     uint64_t frac;
@@ -164,7 +164,7 @@ static FloatParts64 parts_default_nan(float_status *status)
     }
 #endif
 
-    return (FloatParts64) {
+    *p = (FloatParts64) {
         .cls = float_class_qnan,
         .sign = sign,
         .exp = INT_MAX,
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 18/72] softfloat: Use pointers with unpack_raw
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (16 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 17/72] softfloat: Use pointers with parts_default_nan Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-09  8:48   ` Philippe Mathieu-Daudé
  2021-05-12 12:58   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 19/72] softfloat: Use pointers with ftype_unpack_raw Richard Henderson
                   ` (58 subsequent siblings)
  76 siblings, 2 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

At the same time, rename to unpack_raw64.
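
As a concrete illustration of the field split below (standalone sketch, not
part of the patch; QEMU's extract64() reduces to these shifts and masks),
unpacking the float32 bit pattern for -pi:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t raw = 0xC0490FDB;     /* -pi as a float32 pattern */
        const int f_size = 23, e_size = 8;

        uint64_t sign = (raw >> (f_size + e_size)) & 1;
        uint64_t exp  = (raw >> f_size) & ((1u << e_size) - 1);
        uint64_t frac = raw & ((1u << f_size) - 1);

        /* prints: sign=1 exp=0x80 frac=0x490fdb */
        printf("sign=%llu exp=0x%llx frac=0x%llx\n",
               (unsigned long long)sign, (unsigned long long)exp,
               (unsigned long long)frac);
        return 0;
    }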

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index c7f95961cf..5ff9368012 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -580,36 +580,45 @@ static const FloatFmt float64_params = {
 };
 
 /* Unpack a float to parts, but do not canonicalize.  */
-static inline FloatParts64 unpack_raw(FloatFmt fmt, uint64_t raw)
+static void unpack_raw64(FloatParts64 *r, const FloatFmt *fmt, uint64_t raw)
 {
-    const int sign_pos = fmt.frac_size + fmt.exp_size;
+    const int f_size = fmt->frac_size;
+    const int e_size = fmt->exp_size;
 
-    return (FloatParts64) {
+    *r = (FloatParts64) {
         .cls = float_class_unclassified,
-        .sign = extract64(raw, sign_pos, 1),
-        .exp = extract64(raw, fmt.frac_size, fmt.exp_size),
-        .frac = extract64(raw, 0, fmt.frac_size),
+        .sign = extract64(raw, f_size + e_size, 1),
+        .exp = extract64(raw, f_size, e_size),
+        .frac = extract64(raw, 0, f_size)
     };
 }
 
 static inline FloatParts64 float16_unpack_raw(float16 f)
 {
-    return unpack_raw(float16_params, f);
+    FloatParts64 p;
+    unpack_raw64(&p, &float16_params, f);
+    return p;
 }
 
 static inline FloatParts64 bfloat16_unpack_raw(bfloat16 f)
 {
-    return unpack_raw(bfloat16_params, f);
+    FloatParts64 p;
+    unpack_raw64(&p, &bfloat16_params, f);
+    return p;
 }
 
 static inline FloatParts64 float32_unpack_raw(float32 f)
 {
-    return unpack_raw(float32_params, f);
+    FloatParts64 p;
+    unpack_raw64(&p, &float32_params, f);
+    return p;
 }
 
 static inline FloatParts64 float64_unpack_raw(float64 f)
 {
-    return unpack_raw(float64_params, f);
+    FloatParts64 p;
+    unpack_raw64(&p, &float64_params, f);
+    return p;
 }
 
 /* Pack a float from parts, but do not canonicalize.  */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 19/72] softfloat: Use pointers with ftype_unpack_raw
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (17 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 18/72] softfloat: Use pointers with unpack_raw Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-11 11:31   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 20/72] softfloat: Use pointers with pack_raw Richard Henderson
                   ` (57 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 76 +++++++++++++++++++++++++++++++------------------
 1 file changed, 48 insertions(+), 28 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 5ff9368012..5a736a46cf 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -593,32 +593,24 @@ static void unpack_raw64(FloatParts64 *r, const FloatFmt *fmt, uint64_t raw)
     };
 }
 
-static inline FloatParts64 float16_unpack_raw(float16 f)
+static inline void float16_unpack_raw(FloatParts64 *p, float16 f)
 {
-    FloatParts64 p;
-    unpack_raw64(&p, &float16_params, f);
-    return p;
+    unpack_raw64(p, &float16_params, f);
 }
 
-static inline FloatParts64 bfloat16_unpack_raw(bfloat16 f)
+static inline void bfloat16_unpack_raw(FloatParts64 *p, bfloat16 f)
 {
-    FloatParts64 p;
-    unpack_raw64(&p, &bfloat16_params, f);
-    return p;
+    unpack_raw64(p, &bfloat16_params, f);
 }
 
-static inline FloatParts64 float32_unpack_raw(float32 f)
+static inline void float32_unpack_raw(FloatParts64 *p, float32 f)
 {
-    FloatParts64 p;
-    unpack_raw64(&p, &float32_params, f);
-    return p;
+    unpack_raw64(p, &float32_params, f);
 }
 
-static inline FloatParts64 float64_unpack_raw(float64 f)
+static inline void float64_unpack_raw(FloatParts64 *p, float64 f)
 {
-    FloatParts64 p;
-    unpack_raw64(&p, &float64_params, f);
-    return p;
+    unpack_raw64(p, &float64_params, f);
 }
 
 /* Pack a float from parts, but do not canonicalize.  */
@@ -931,7 +923,10 @@ static FloatParts64 pick_nan_muladd(FloatParts64 a, FloatParts64 b, FloatParts64
 static FloatParts64 float16a_unpack_canonical(float16 f, float_status *s,
                                             const FloatFmt *params)
 {
-    return sf_canonicalize(float16_unpack_raw(f), params, s);
+    FloatParts64 p;
+
+    float16_unpack_raw(&p, f);
+    return sf_canonicalize(p, params, s);
 }
 
 static FloatParts64 float16_unpack_canonical(float16 f, float_status *s)
@@ -941,7 +936,10 @@ static FloatParts64 float16_unpack_canonical(float16 f, float_status *s)
 
 static FloatParts64 bfloat16_unpack_canonical(bfloat16 f, float_status *s)
 {
-    return sf_canonicalize(bfloat16_unpack_raw(f), &bfloat16_params, s);
+    FloatParts64 p;
+
+    bfloat16_unpack_raw(&p, f);
+    return sf_canonicalize(p, &bfloat16_params, s);
 }
 
 static float16 float16a_round_pack_canonical(FloatParts64 p, float_status *s,
@@ -962,7 +960,10 @@ static bfloat16 bfloat16_round_pack_canonical(FloatParts64 p, float_status *s)
 
 static FloatParts64 float32_unpack_canonical(float32 f, float_status *s)
 {
-    return sf_canonicalize(float32_unpack_raw(f), &float32_params, s);
+    FloatParts64 p;
+
+    float32_unpack_raw(&p, f);
+    return sf_canonicalize(p, &float32_params, s);
 }
 
 static float32 float32_round_pack_canonical(FloatParts64 p, float_status *s)
@@ -972,7 +973,10 @@ static float32 float32_round_pack_canonical(FloatParts64 p, float_status *s)
 
 static FloatParts64 float64_unpack_canonical(float64 f, float_status *s)
 {
-    return sf_canonicalize(float64_unpack_raw(f), &float64_params, s);
+    FloatParts64 p;
+
+    float64_unpack_raw(&p, f);
+    return sf_canonicalize(p, &float64_params, s);
 }
 
 static float64 float64_round_pack_canonical(FloatParts64 p, float_status *s)
@@ -3648,7 +3652,9 @@ bfloat16 bfloat16_default_nan(float_status *status)
 
 float16 float16_silence_nan(float16 a, float_status *status)
 {
-    FloatParts64 p = float16_unpack_raw(a);
+    FloatParts64 p;
+
+    float16_unpack_raw(&p, a);
     p.frac <<= float16_params.frac_shift;
     p = parts_silence_nan(p, status);
     p.frac >>= float16_params.frac_shift;
@@ -3657,7 +3663,9 @@ float16 float16_silence_nan(float16 a, float_status *status)
 
 float32 float32_silence_nan(float32 a, float_status *status)
 {
-    FloatParts64 p = float32_unpack_raw(a);
+    FloatParts64 p;
+
+    float32_unpack_raw(&p, a);
     p.frac <<= float32_params.frac_shift;
     p = parts_silence_nan(p, status);
     p.frac >>= float32_params.frac_shift;
@@ -3666,7 +3674,9 @@ float32 float32_silence_nan(float32 a, float_status *status)
 
 float64 float64_silence_nan(float64 a, float_status *status)
 {
-    FloatParts64 p = float64_unpack_raw(a);
+    FloatParts64 p;
+
+    float64_unpack_raw(&p, a);
     p.frac <<= float64_params.frac_shift;
     p = parts_silence_nan(p, status);
     p.frac >>= float64_params.frac_shift;
@@ -3675,7 +3685,9 @@ float64 float64_silence_nan(float64 a, float_status *status)
 
 bfloat16 bfloat16_silence_nan(bfloat16 a, float_status *status)
 {
-    FloatParts64 p = bfloat16_unpack_raw(a);
+    FloatParts64 p;
+
+    bfloat16_unpack_raw(&p, a);
     p.frac <<= bfloat16_params.frac_shift;
     p = parts_silence_nan(p, status);
     p.frac >>= bfloat16_params.frac_shift;
@@ -3700,7 +3712,9 @@ static bool parts_squash_denormal(FloatParts64 p, float_status *status)
 float16 float16_squash_input_denormal(float16 a, float_status *status)
 {
     if (status->flush_inputs_to_zero) {
-        FloatParts64 p = float16_unpack_raw(a);
+        FloatParts64 p;
+
+        float16_unpack_raw(&p, a);
         if (parts_squash_denormal(p, status)) {
             return float16_set_sign(float16_zero, p.sign);
         }
@@ -3711,7 +3725,9 @@ float16 float16_squash_input_denormal(float16 a, float_status *status)
 float32 float32_squash_input_denormal(float32 a, float_status *status)
 {
     if (status->flush_inputs_to_zero) {
-        FloatParts64 p = float32_unpack_raw(a);
+        FloatParts64 p;
+
+        float32_unpack_raw(&p, a);
         if (parts_squash_denormal(p, status)) {
             return float32_set_sign(float32_zero, p.sign);
         }
@@ -3722,7 +3738,9 @@ float32 float32_squash_input_denormal(float32 a, float_status *status)
 float64 float64_squash_input_denormal(float64 a, float_status *status)
 {
     if (status->flush_inputs_to_zero) {
-        FloatParts64 p = float64_unpack_raw(a);
+        FloatParts64 p;
+
+        float64_unpack_raw(&p, a);
         if (parts_squash_denormal(p, status)) {
             return float64_set_sign(float64_zero, p.sign);
         }
@@ -3733,7 +3751,9 @@ float64 float64_squash_input_denormal(float64 a, float_status *status)
 bfloat16 bfloat16_squash_input_denormal(bfloat16 a, float_status *status)
 {
     if (status->flush_inputs_to_zero) {
-        FloatParts64 p = bfloat16_unpack_raw(a);
+        FloatParts64 p;
+
+        bfloat16_unpack_raw(&p, a);
         if (parts_squash_denormal(p, status)) {
             return bfloat16_set_sign(bfloat16_zero, p.sign);
         }
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 20/72] softfloat: Use pointers with pack_raw
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (18 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 19/72] softfloat: Use pointers with ftype_unpack_raw Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-11 11:32   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 21/72] softfloat: Use pointers with ftype_pack_raw Richard Henderson
                   ` (56 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

At the same time, rename to pack_raw64.
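
The new pack_raw64() is the exact inverse of unpack_raw64(); a standalone
sketch of the composition for the float32 layout (illustrative only; QEMU's
deposit64() reduces to the shift-and-OR steps shown here):

    #include <stdint.h>

    static uint64_t pack_f32_raw(uint64_t sign, uint64_t exp, uint64_t frac)
    {
        uint64_t ret = sign << (23 + 8);   /* sign into bit 31      */
        ret |= (exp & 0xff) << 23;         /* 8-bit exponent field  */
        ret |= frac & 0x7fffff;            /* 23-bit fraction field */
        return ret;  /* pack_f32_raw(1, 0x80, 0x490FDB) == 0xC0490FDB */
    }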

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 5a736a46cf..b59b777bca 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -614,31 +614,36 @@ static inline void float64_unpack_raw(FloatParts64 *p, float64 f)
 }
 
 /* Pack a float from parts, but do not canonicalize.  */
-static inline uint64_t pack_raw(FloatFmt fmt, FloatParts64 p)
+static uint64_t pack_raw64(const FloatParts64 *p, const FloatFmt *fmt)
 {
-    const int sign_pos = fmt.frac_size + fmt.exp_size;
-    uint64_t ret = deposit64(p.frac, fmt.frac_size, fmt.exp_size, p.exp);
-    return deposit64(ret, sign_pos, 1, p.sign);
+    const int f_size = fmt->frac_size;
+    const int e_size = fmt->exp_size;
+    uint64_t ret;
+
+    ret = (uint64_t)p->sign << (f_size + e_size);
+    ret = deposit64(ret, f_size, e_size, p->exp);
+    ret = deposit64(ret, 0, f_size, p->frac);
+    return ret;
 }
 
 static inline float16 float16_pack_raw(FloatParts64 p)
 {
-    return make_float16(pack_raw(float16_params, p));
+    return make_float16(pack_raw64(&p, &float16_params));
 }
 
 static inline bfloat16 bfloat16_pack_raw(FloatParts64 p)
 {
-    return pack_raw(bfloat16_params, p);
+    return pack_raw64(&p, &bfloat16_params);
 }
 
 static inline float32 float32_pack_raw(FloatParts64 p)
 {
-    return make_float32(pack_raw(float32_params, p));
+    return make_float32(pack_raw64(&p, &float32_params));
 }
 
 static inline float64 float64_pack_raw(FloatParts64 p)
 {
-    return make_float64(pack_raw(float64_params, p));
+    return make_float64(pack_raw64(&p, &float64_params));
 }
 
 /*----------------------------------------------------------------------------
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 21/72] softfloat: Use pointers with ftype_pack_raw
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (19 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 20/72] softfloat: Use pointers with pack_raw Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-11 11:32   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 22/72] softfloat: Use pointers with ftype_unpack_canonical Richard Henderson
                   ` (55 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 44 ++++++++++++++++++++++++--------------------
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index b59b777bca..e02cbafaf9 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -626,24 +626,24 @@ static uint64_t pack_raw64(const FloatParts64 *p, const FloatFmt *fmt)
     return ret;
 }
 
-static inline float16 float16_pack_raw(FloatParts64 p)
+static inline float16 float16_pack_raw(const FloatParts64 *p)
 {
-    return make_float16(pack_raw64(&p, &float16_params));
+    return make_float16(pack_raw64(p, &float16_params));
 }
 
-static inline bfloat16 bfloat16_pack_raw(FloatParts64 p)
+static inline bfloat16 bfloat16_pack_raw(const FloatParts64 *p)
 {
-    return pack_raw64(&p, &bfloat16_params);
+    return pack_raw64(p, &bfloat16_params);
 }
 
-static inline float32 float32_pack_raw(FloatParts64 p)
+static inline float32 float32_pack_raw(const FloatParts64 *p)
 {
-    return make_float32(pack_raw64(&p, &float32_params));
+    return make_float32(pack_raw64(p, &float32_params));
 }
 
-static inline float64 float64_pack_raw(FloatParts64 p)
+static inline float64 float64_pack_raw(const FloatParts64 *p)
 {
-    return make_float64(pack_raw64(&p, &float64_params));
+    return make_float64(pack_raw64(p, &float64_params));
 }
 
 /*----------------------------------------------------------------------------
@@ -950,7 +950,8 @@ static FloatParts64 bfloat16_unpack_canonical(bfloat16 f, float_status *s)
 static float16 float16a_round_pack_canonical(FloatParts64 p, float_status *s,
                                              const FloatFmt *params)
 {
-    return float16_pack_raw(round_canonical(p, s, params));
+    p = round_canonical(p, s, params);
+    return float16_pack_raw(&p);
 }
 
 static float16 float16_round_pack_canonical(FloatParts64 p, float_status *s)
@@ -960,7 +961,8 @@ static float16 float16_round_pack_canonical(FloatParts64 p, float_status *s)
 
 static bfloat16 bfloat16_round_pack_canonical(FloatParts64 p, float_status *s)
 {
-    return bfloat16_pack_raw(round_canonical(p, s, &bfloat16_params));
+    p = round_canonical(p, s, &bfloat16_params);
+    return bfloat16_pack_raw(&p);
 }
 
 static FloatParts64 float32_unpack_canonical(float32 f, float_status *s)
@@ -973,7 +975,8 @@ static FloatParts64 float32_unpack_canonical(float32 f, float_status *s)
 
 static float32 float32_round_pack_canonical(FloatParts64 p, float_status *s)
 {
-    return float32_pack_raw(round_canonical(p, s, &float32_params));
+    p = round_canonical(p, s, &float32_params);
+    return float32_pack_raw(&p);
 }
 
 static FloatParts64 float64_unpack_canonical(float64 f, float_status *s)
@@ -986,7 +989,8 @@ static FloatParts64 float64_unpack_canonical(float64 f, float_status *s)
 
 static float64 float64_round_pack_canonical(FloatParts64 p, float_status *s)
 {
-    return float64_pack_raw(round_canonical(p, s, &float64_params));
+    p = round_canonical(p, s, &float64_params);
+    return float64_pack_raw(&p);
 }
 
 /*
@@ -3603,7 +3607,7 @@ float16 float16_default_nan(float_status *status)
 
     parts_default_nan(&p, status);
     p.frac >>= float16_params.frac_shift;
-    return float16_pack_raw(p);
+    return float16_pack_raw(&p);
 }
 
 float32 float32_default_nan(float_status *status)
@@ -3612,7 +3616,7 @@ float32 float32_default_nan(float_status *status)
 
     parts_default_nan(&p, status);
     p.frac >>= float32_params.frac_shift;
-    return float32_pack_raw(p);
+    return float32_pack_raw(&p);
 }
 
 float64 float64_default_nan(float_status *status)
@@ -3621,7 +3625,7 @@ float64 float64_default_nan(float_status *status)
 
     parts_default_nan(&p, status);
     p.frac >>= float64_params.frac_shift;
-    return float64_pack_raw(p);
+    return float64_pack_raw(&p);
 }
 
 float128 float128_default_nan(float_status *status)
@@ -3648,7 +3652,7 @@ bfloat16 bfloat16_default_nan(float_status *status)
 
     parts_default_nan(&p, status);
     p.frac >>= bfloat16_params.frac_shift;
-    return bfloat16_pack_raw(p);
+    return bfloat16_pack_raw(&p);
 }
 
 /*----------------------------------------------------------------------------
@@ -3663,7 +3667,7 @@ float16 float16_silence_nan(float16 a, float_status *status)
     p.frac <<= float16_params.frac_shift;
     p = parts_silence_nan(p, status);
     p.frac >>= float16_params.frac_shift;
-    return float16_pack_raw(p);
+    return float16_pack_raw(&p);
 }
 
 float32 float32_silence_nan(float32 a, float_status *status)
@@ -3674,7 +3678,7 @@ float32 float32_silence_nan(float32 a, float_status *status)
     p.frac <<= float32_params.frac_shift;
     p = parts_silence_nan(p, status);
     p.frac >>= float32_params.frac_shift;
-    return float32_pack_raw(p);
+    return float32_pack_raw(&p);
 }
 
 float64 float64_silence_nan(float64 a, float_status *status)
@@ -3685,7 +3689,7 @@ float64 float64_silence_nan(float64 a, float_status *status)
     p.frac <<= float64_params.frac_shift;
     p = parts_silence_nan(p, status);
     p.frac >>= float64_params.frac_shift;
-    return float64_pack_raw(p);
+    return float64_pack_raw(&p);
 }
 
 bfloat16 bfloat16_silence_nan(bfloat16 a, float_status *status)
@@ -3696,7 +3700,7 @@ bfloat16 bfloat16_silence_nan(bfloat16 a, float_status *status)
     p.frac <<= bfloat16_params.frac_shift;
     p = parts_silence_nan(p, status);
     p.frac >>= bfloat16_params.frac_shift;
-    return bfloat16_pack_raw(p);
+    return bfloat16_pack_raw(&p);
 }
 
 /*----------------------------------------------------------------------------
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 22/72] softfloat: Use pointers with ftype_unpack_canonical
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (20 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 21/72] softfloat: Use pointers with ftype_pack_raw Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-11 13:54   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 23/72] softfloat: Use pointers with ftype_round_pack_canonical Richard Henderson
                   ` (54 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 509 ++++++++++++++++++++++++++++++------------------
 1 file changed, 320 insertions(+), 189 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index e02cbafaf9..e53d4a138f 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -925,26 +925,24 @@ static FloatParts64 pick_nan_muladd(FloatParts64 a, FloatParts64 b, FloatParts64
  * Pack/unpack routines with a specific FloatFmt.
  */
 
-static FloatParts64 float16a_unpack_canonical(float16 f, float_status *s,
-                                            const FloatFmt *params)
+static void float16a_unpack_canonical(FloatParts64 *p, float16 f,
+                                      float_status *s, const FloatFmt *params)
 {
-    FloatParts64 p;
-
-    float16_unpack_raw(&p, f);
-    return sf_canonicalize(p, params, s);
+    float16_unpack_raw(p, f);
+    *p = sf_canonicalize(*p, params, s);
 }
 
-static FloatParts64 float16_unpack_canonical(float16 f, float_status *s)
+static void float16_unpack_canonical(FloatParts64 *p, float16 f,
+                                     float_status *s)
 {
-    return float16a_unpack_canonical(f, s, &float16_params);
+    float16a_unpack_canonical(p, f, s, &float16_params);
 }
 
-static FloatParts64 bfloat16_unpack_canonical(bfloat16 f, float_status *s)
+static void bfloat16_unpack_canonical(FloatParts64 *p, bfloat16 f,
+                                      float_status *s)
 {
-    FloatParts64 p;
-
-    bfloat16_unpack_raw(&p, f);
-    return sf_canonicalize(p, &bfloat16_params, s);
+    bfloat16_unpack_raw(p, f);
+    *p = sf_canonicalize(*p, &bfloat16_params, s);
 }
 
 static float16 float16a_round_pack_canonical(FloatParts64 p, float_status *s,
@@ -965,12 +963,11 @@ static bfloat16 bfloat16_round_pack_canonical(FloatParts64 p, float_status *s)
     return bfloat16_pack_raw(&p);
 }
 
-static FloatParts64 float32_unpack_canonical(float32 f, float_status *s)
+static void float32_unpack_canonical(FloatParts64 *p, float32 f,
+                                     float_status *s)
 {
-    FloatParts64 p;
-
-    float32_unpack_raw(&p, f);
-    return sf_canonicalize(p, &float32_params, s);
+    float32_unpack_raw(p, f);
+    *p = sf_canonicalize(*p, &float32_params, s);
 }
 
 static float32 float32_round_pack_canonical(FloatParts64 p, float_status *s)
@@ -979,12 +976,11 @@ static float32 float32_round_pack_canonical(FloatParts64 p, float_status *s)
     return float32_pack_raw(&p);
 }
 
-static FloatParts64 float64_unpack_canonical(float64 f, float_status *s)
+static void float64_unpack_canonical(FloatParts64 *p, float64 f,
+                                     float_status *s)
 {
-    FloatParts64 p;
-
-    float64_unpack_raw(&p, f);
-    return sf_canonicalize(p, &float64_params, s);
+    float64_unpack_raw(p, f);
+    *p = sf_canonicalize(*p, &float64_params, s);
 }
 
 static float64 float64_round_pack_canonical(FloatParts64 p, float_status *s)
@@ -1091,18 +1087,22 @@ static FloatParts64 addsub_floats(FloatParts64 a, FloatParts64 b, bool subtract,
 
 float16 QEMU_FLATTEN float16_add(float16 a, float16 b, float_status *status)
 {
-    FloatParts64 pa = float16_unpack_canonical(a, status);
-    FloatParts64 pb = float16_unpack_canonical(b, status);
-    FloatParts64 pr = addsub_floats(pa, pb, false, status);
+    FloatParts64 pa, pb, pr;
+
+    float16_unpack_canonical(&pa, a, status);
+    float16_unpack_canonical(&pb, b, status);
+    pr = addsub_floats(pa, pb, false, status);
 
     return float16_round_pack_canonical(pr, status);
 }
 
 float16 QEMU_FLATTEN float16_sub(float16 a, float16 b, float_status *status)
 {
-    FloatParts64 pa = float16_unpack_canonical(a, status);
-    FloatParts64 pb = float16_unpack_canonical(b, status);
-    FloatParts64 pr = addsub_floats(pa, pb, true, status);
+    FloatParts64 pa, pb, pr;
+
+    float16_unpack_canonical(&pa, a, status);
+    float16_unpack_canonical(&pb, b, status);
+    pr = addsub_floats(pa, pb, true, status);
 
     return float16_round_pack_canonical(pr, status);
 }
@@ -1110,9 +1110,11 @@ float16 QEMU_FLATTEN float16_sub(float16 a, float16 b, float_status *status)
 static float32 QEMU_SOFTFLOAT_ATTR
 soft_f32_addsub(float32 a, float32 b, bool subtract, float_status *status)
 {
-    FloatParts64 pa = float32_unpack_canonical(a, status);
-    FloatParts64 pb = float32_unpack_canonical(b, status);
-    FloatParts64 pr = addsub_floats(pa, pb, subtract, status);
+    FloatParts64 pa, pb, pr;
+
+    float32_unpack_canonical(&pa, a, status);
+    float32_unpack_canonical(&pb, b, status);
+    pr = addsub_floats(pa, pb, subtract, status);
 
     return float32_round_pack_canonical(pr, status);
 }
@@ -1130,9 +1132,11 @@ static inline float32 soft_f32_sub(float32 a, float32 b, float_status *status)
 static float64 QEMU_SOFTFLOAT_ATTR
 soft_f64_addsub(float64 a, float64 b, bool subtract, float_status *status)
 {
-    FloatParts64 pa = float64_unpack_canonical(a, status);
-    FloatParts64 pb = float64_unpack_canonical(b, status);
-    FloatParts64 pr = addsub_floats(pa, pb, subtract, status);
+    FloatParts64 pa, pb, pr;
+
+    float64_unpack_canonical(&pa, a, status);
+    float64_unpack_canonical(&pb, b, status);
+    pr = addsub_floats(pa, pb, subtract, status);
 
     return float64_round_pack_canonical(pr, status);
 }
@@ -1228,18 +1232,22 @@ float64_sub(float64 a, float64 b, float_status *s)
  */
 bfloat16 QEMU_FLATTEN bfloat16_add(bfloat16 a, bfloat16 b, float_status *status)
 {
-    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
-    FloatParts64 pb = bfloat16_unpack_canonical(b, status);
-    FloatParts64 pr = addsub_floats(pa, pb, false, status);
+    FloatParts64 pa, pb, pr;
+
+    bfloat16_unpack_canonical(&pa, a, status);
+    bfloat16_unpack_canonical(&pb, b, status);
+    pr = addsub_floats(pa, pb, false, status);
 
     return bfloat16_round_pack_canonical(pr, status);
 }
 
 bfloat16 QEMU_FLATTEN bfloat16_sub(bfloat16 a, bfloat16 b, float_status *status)
 {
-    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
-    FloatParts64 pb = bfloat16_unpack_canonical(b, status);
-    FloatParts64 pr = addsub_floats(pa, pb, true, status);
+    FloatParts64 pa, pb, pr;
+
+    bfloat16_unpack_canonical(&pa, a, status);
+    bfloat16_unpack_canonical(&pb, b, status);
+    pr = addsub_floats(pa, pb, true, status);
 
     return bfloat16_round_pack_canonical(pr, status);
 }
@@ -1297,9 +1305,11 @@ static FloatParts64 mul_floats(FloatParts64 a, FloatParts64 b, float_status *s)
 
 float16 QEMU_FLATTEN float16_mul(float16 a, float16 b, float_status *status)
 {
-    FloatParts64 pa = float16_unpack_canonical(a, status);
-    FloatParts64 pb = float16_unpack_canonical(b, status);
-    FloatParts64 pr = mul_floats(pa, pb, status);
+    FloatParts64 pa, pb, pr;
+
+    float16_unpack_canonical(&pa, a, status);
+    float16_unpack_canonical(&pb, b, status);
+    pr = mul_floats(pa, pb, status);
 
     return float16_round_pack_canonical(pr, status);
 }
@@ -1307,9 +1317,11 @@ float16 QEMU_FLATTEN float16_mul(float16 a, float16 b, float_status *status)
 static float32 QEMU_SOFTFLOAT_ATTR
 soft_f32_mul(float32 a, float32 b, float_status *status)
 {
-    FloatParts64 pa = float32_unpack_canonical(a, status);
-    FloatParts64 pb = float32_unpack_canonical(b, status);
-    FloatParts64 pr = mul_floats(pa, pb, status);
+    FloatParts64 pa, pb, pr;
+
+    float32_unpack_canonical(&pa, a, status);
+    float32_unpack_canonical(&pb, b, status);
+    pr = mul_floats(pa, pb, status);
 
     return float32_round_pack_canonical(pr, status);
 }
@@ -1317,9 +1329,11 @@ soft_f32_mul(float32 a, float32 b, float_status *status)
 static float64 QEMU_SOFTFLOAT_ATTR
 soft_f64_mul(float64 a, float64 b, float_status *status)
 {
-    FloatParts64 pa = float64_unpack_canonical(a, status);
-    FloatParts64 pb = float64_unpack_canonical(b, status);
-    FloatParts64 pr = mul_floats(pa, pb, status);
+    FloatParts64 pa, pb, pr;
+
+    float64_unpack_canonical(&pa, a, status);
+    float64_unpack_canonical(&pb, b, status);
+    pr = mul_floats(pa, pb, status);
 
     return float64_round_pack_canonical(pr, status);
 }
@@ -1355,9 +1369,11 @@ float64_mul(float64 a, float64 b, float_status *s)
 
 bfloat16 QEMU_FLATTEN bfloat16_mul(bfloat16 a, bfloat16 b, float_status *status)
 {
-    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
-    FloatParts64 pb = bfloat16_unpack_canonical(b, status);
-    FloatParts64 pr = mul_floats(pa, pb, status);
+    FloatParts64 pa, pb, pr;
+
+    bfloat16_unpack_canonical(&pa, a, status);
+    bfloat16_unpack_canonical(&pb, b, status);
+    pr = mul_floats(pa, pb, status);
 
     return bfloat16_round_pack_canonical(pr, status);
 }
@@ -1551,10 +1567,12 @@ static FloatParts64 muladd_floats(FloatParts64 a, FloatParts64 b, FloatParts64 c
 float16 QEMU_FLATTEN float16_muladd(float16 a, float16 b, float16 c,
                                                 int flags, float_status *status)
 {
-    FloatParts64 pa = float16_unpack_canonical(a, status);
-    FloatParts64 pb = float16_unpack_canonical(b, status);
-    FloatParts64 pc = float16_unpack_canonical(c, status);
-    FloatParts64 pr = muladd_floats(pa, pb, pc, flags, status);
+    FloatParts64 pa, pb, pc, pr;
+
+    float16_unpack_canonical(&pa, a, status);
+    float16_unpack_canonical(&pb, b, status);
+    float16_unpack_canonical(&pc, c, status);
+    pr = muladd_floats(pa, pb, pc, flags, status);
 
     return float16_round_pack_canonical(pr, status);
 }
@@ -1563,10 +1581,12 @@ static float32 QEMU_SOFTFLOAT_ATTR
 soft_f32_muladd(float32 a, float32 b, float32 c, int flags,
                 float_status *status)
 {
-    FloatParts64 pa = float32_unpack_canonical(a, status);
-    FloatParts64 pb = float32_unpack_canonical(b, status);
-    FloatParts64 pc = float32_unpack_canonical(c, status);
-    FloatParts64 pr = muladd_floats(pa, pb, pc, flags, status);
+    FloatParts64 pa, pb, pc, pr;
+
+    float32_unpack_canonical(&pa, a, status);
+    float32_unpack_canonical(&pb, b, status);
+    float32_unpack_canonical(&pc, c, status);
+    pr = muladd_floats(pa, pb, pc, flags, status);
 
     return float32_round_pack_canonical(pr, status);
 }
@@ -1575,10 +1595,12 @@ static float64 QEMU_SOFTFLOAT_ATTR
 soft_f64_muladd(float64 a, float64 b, float64 c, int flags,
                 float_status *status)
 {
-    FloatParts64 pa = float64_unpack_canonical(a, status);
-    FloatParts64 pb = float64_unpack_canonical(b, status);
-    FloatParts64 pc = float64_unpack_canonical(c, status);
-    FloatParts64 pr = muladd_floats(pa, pb, pc, flags, status);
+    FloatParts64 pa, pb, pc, pr;
+
+    float64_unpack_canonical(&pa, a, status);
+    float64_unpack_canonical(&pb, b, status);
+    float64_unpack_canonical(&pc, c, status);
+    pr = muladd_floats(pa, pb, pc, flags, status);
 
     return float64_round_pack_canonical(pr, status);
 }
@@ -1736,10 +1758,12 @@ float64_muladd(float64 xa, float64 xb, float64 xc, int flags, float_status *s)
 bfloat16 QEMU_FLATTEN bfloat16_muladd(bfloat16 a, bfloat16 b, bfloat16 c,
                                       int flags, float_status *status)
 {
-    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
-    FloatParts64 pb = bfloat16_unpack_canonical(b, status);
-    FloatParts64 pc = bfloat16_unpack_canonical(c, status);
-    FloatParts64 pr = muladd_floats(pa, pb, pc, flags, status);
+    FloatParts64 pa, pb, pc, pr;
+
+    bfloat16_unpack_canonical(&pa, a, status);
+    bfloat16_unpack_canonical(&pb, b, status);
+    bfloat16_unpack_canonical(&pc, c, status);
+    pr = muladd_floats(pa, pb, pc, flags, status);
 
     return bfloat16_round_pack_canonical(pr, status);
 }
@@ -1818,9 +1842,11 @@ static FloatParts64 div_floats(FloatParts64 a, FloatParts64 b, float_status *s)
 
 float16 float16_div(float16 a, float16 b, float_status *status)
 {
-    FloatParts64 pa = float16_unpack_canonical(a, status);
-    FloatParts64 pb = float16_unpack_canonical(b, status);
-    FloatParts64 pr = div_floats(pa, pb, status);
+    FloatParts64 pa, pb, pr;
+
+    float16_unpack_canonical(&pa, a, status);
+    float16_unpack_canonical(&pb, b, status);
+    pr = div_floats(pa, pb, status);
 
     return float16_round_pack_canonical(pr, status);
 }
@@ -1828,9 +1854,11 @@ float16 float16_div(float16 a, float16 b, float_status *status)
 static float32 QEMU_SOFTFLOAT_ATTR
 soft_f32_div(float32 a, float32 b, float_status *status)
 {
-    FloatParts64 pa = float32_unpack_canonical(a, status);
-    FloatParts64 pb = float32_unpack_canonical(b, status);
-    FloatParts64 pr = div_floats(pa, pb, status);
+    FloatParts64 pa, pb, pr;
+
+    float32_unpack_canonical(&pa, a, status);
+    float32_unpack_canonical(&pb, b, status);
+    pr = div_floats(pa, pb, status);
 
     return float32_round_pack_canonical(pr, status);
 }
@@ -1838,9 +1866,11 @@ soft_f32_div(float32 a, float32 b, float_status *status)
 static float64 QEMU_SOFTFLOAT_ATTR
 soft_f64_div(float64 a, float64 b, float_status *status)
 {
-    FloatParts64 pa = float64_unpack_canonical(a, status);
-    FloatParts64 pb = float64_unpack_canonical(b, status);
-    FloatParts64 pr = div_floats(pa, pb, status);
+    FloatParts64 pa, pb, pr;
+
+    float64_unpack_canonical(&pa, a, status);
+    float64_unpack_canonical(&pb, b, status);
+    pr = div_floats(pa, pb, status);
 
     return float64_round_pack_canonical(pr, status);
 }
@@ -1910,9 +1940,11 @@ float64_div(float64 a, float64 b, float_status *s)
 
 bfloat16 bfloat16_div(bfloat16 a, bfloat16 b, float_status *status)
 {
-    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
-    FloatParts64 pb = bfloat16_unpack_canonical(b, status);
-    FloatParts64 pr = div_floats(pa, pb, status);
+    FloatParts64 pa, pb, pr;
+
+    bfloat16_unpack_canonical(&pa, a, status);
+    bfloat16_unpack_canonical(&pb, b, status);
+    pr = div_floats(pa, pb, status);
 
     return bfloat16_round_pack_canonical(pr, status);
 }
@@ -1966,32 +1998,40 @@ static FloatParts64 float_to_float(FloatParts64 a, const FloatFmt *dstf,
 float32 float16_to_float32(float16 a, bool ieee, float_status *s)
 {
     const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
-    FloatParts64 p = float16a_unpack_canonical(a, s, fmt16);
-    FloatParts64 pr = float_to_float(p, &float32_params, s);
+    FloatParts64 pa, pr;
+
+    float16a_unpack_canonical(&pa, a, s, fmt16);
+    pr = float_to_float(pa, &float32_params, s);
     return float32_round_pack_canonical(pr, s);
 }
 
 float64 float16_to_float64(float16 a, bool ieee, float_status *s)
 {
     const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
-    FloatParts64 p = float16a_unpack_canonical(a, s, fmt16);
-    FloatParts64 pr = float_to_float(p, &float64_params, s);
+    FloatParts64 pa, pr;
+
+    float16a_unpack_canonical(&pa, a, s, fmt16);
+    pr = float_to_float(pa, &float64_params, s);
     return float64_round_pack_canonical(pr, s);
 }
 
 float16 float32_to_float16(float32 a, bool ieee, float_status *s)
 {
     const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
-    FloatParts64 p = float32_unpack_canonical(a, s);
-    FloatParts64 pr = float_to_float(p, fmt16, s);
+    FloatParts64 pa, pr;
+
+    float32_unpack_canonical(&pa, a, s);
+    pr = float_to_float(pa, fmt16, s);
     return float16a_round_pack_canonical(pr, s, fmt16);
 }
 
 static float64 QEMU_SOFTFLOAT_ATTR
 soft_float32_to_float64(float32 a, float_status *s)
 {
-    FloatParts64 p = float32_unpack_canonical(a, s);
-    FloatParts64 pr = float_to_float(p, &float64_params, s);
+    FloatParts64 pa, pr;
+
+    float32_unpack_canonical(&pa, a, s);
+    pr = float_to_float(pa, &float64_params, s);
     return float64_round_pack_canonical(pr, s);
 }
 
@@ -2014,43 +2054,55 @@ float64 float32_to_float64(float32 a, float_status *s)
 float16 float64_to_float16(float64 a, bool ieee, float_status *s)
 {
     const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
-    FloatParts64 p = float64_unpack_canonical(a, s);
-    FloatParts64 pr = float_to_float(p, fmt16, s);
+    FloatParts64 pa, pr;
+
+    float64_unpack_canonical(&pa, a, s);
+    pr = float_to_float(pa, fmt16, s);
     return float16a_round_pack_canonical(pr, s, fmt16);
 }
 
 float32 float64_to_float32(float64 a, float_status *s)
 {
-    FloatParts64 p = float64_unpack_canonical(a, s);
-    FloatParts64 pr = float_to_float(p, &float32_params, s);
+    FloatParts64 pa, pr;
+
+    float64_unpack_canonical(&pa, a, s);
+    pr = float_to_float(pa, &float32_params, s);
     return float32_round_pack_canonical(pr, s);
 }
 
 float32 bfloat16_to_float32(bfloat16 a, float_status *s)
 {
-    FloatParts64 p = bfloat16_unpack_canonical(a, s);
-    FloatParts64 pr = float_to_float(p, &float32_params, s);
+    FloatParts64 pa, pr;
+
+    bfloat16_unpack_canonical(&pa, a, s);
+    pr = float_to_float(pa, &float32_params, s);
     return float32_round_pack_canonical(pr, s);
 }
 
 float64 bfloat16_to_float64(bfloat16 a, float_status *s)
 {
-    FloatParts64 p = bfloat16_unpack_canonical(a, s);
-    FloatParts64 pr = float_to_float(p, &float64_params, s);
+    FloatParts64 pa, pr;
+
+    bfloat16_unpack_canonical(&pa, a, s);
+    pr = float_to_float(pa, &float64_params, s);
     return float64_round_pack_canonical(pr, s);
 }
 
 bfloat16 float32_to_bfloat16(float32 a, float_status *s)
 {
-    FloatParts64 p = float32_unpack_canonical(a, s);
-    FloatParts64 pr = float_to_float(p, &bfloat16_params, s);
+    FloatParts64 pa, pr;
+
+    float32_unpack_canonical(&pa, a, s);
+    pr = float_to_float(pa, &bfloat16_params, s);
     return bfloat16_round_pack_canonical(pr, s);
 }
 
 bfloat16 float64_to_bfloat16(float64 a, float_status *s)
 {
-    FloatParts64 p = float64_unpack_canonical(a, s);
-    FloatParts64 pr = float_to_float(p, &bfloat16_params, s);
+    FloatParts64 pa, pr;
+
+    float64_unpack_canonical(&pa, a, s);
+    pr = float_to_float(pa, &bfloat16_params, s);
     return bfloat16_round_pack_canonical(pr, s);
 }
 
@@ -2164,22 +2216,28 @@ static FloatParts64 round_to_int(FloatParts64 a, FloatRoundMode rmode,
 
 float16 float16_round_to_int(float16 a, float_status *s)
 {
-    FloatParts64 pa = float16_unpack_canonical(a, s);
-    FloatParts64 pr = round_to_int(pa, s->float_rounding_mode, 0, s);
+    FloatParts64 pa, pr;
+
+    float16_unpack_canonical(&pa, a, s);
+    pr = round_to_int(pa, s->float_rounding_mode, 0, s);
     return float16_round_pack_canonical(pr, s);
 }
 
 float32 float32_round_to_int(float32 a, float_status *s)
 {
-    FloatParts64 pa = float32_unpack_canonical(a, s);
-    FloatParts64 pr = round_to_int(pa, s->float_rounding_mode, 0, s);
+    FloatParts64 pa, pr;
+
+    float32_unpack_canonical(&pa, a, s);
+    pr = round_to_int(pa, s->float_rounding_mode, 0, s);
     return float32_round_pack_canonical(pr, s);
 }
 
 float64 float64_round_to_int(float64 a, float_status *s)
 {
-    FloatParts64 pa = float64_unpack_canonical(a, s);
-    FloatParts64 pr = round_to_int(pa, s->float_rounding_mode, 0, s);
+    FloatParts64 pa, pr;
+
+    float64_unpack_canonical(&pa, a, s);
+    pr = round_to_int(pa, s->float_rounding_mode, 0, s);
     return float64_round_pack_canonical(pr, s);
 }
 
@@ -2190,8 +2248,10 @@ float64 float64_round_to_int(float64 a, float_status *s)
 
 bfloat16 bfloat16_round_to_int(bfloat16 a, float_status *s)
 {
-    FloatParts64 pa = bfloat16_unpack_canonical(a, s);
-    FloatParts64 pr = round_to_int(pa, s->float_rounding_mode, 0, s);
+    FloatParts64 pa, pr;
+
+    bfloat16_unpack_canonical(&pa, a, s);
+    pr = round_to_int(pa, s->float_rounding_mode, 0, s);
     return bfloat16_round_pack_canonical(pr, s);
 }
 
@@ -2253,71 +2313,91 @@ static int64_t round_to_int_and_pack(FloatParts64 in, FloatRoundMode rmode,
 int8_t float16_to_int8_scalbn(float16 a, FloatRoundMode rmode, int scale,
                               float_status *s)
 {
-    return round_to_int_and_pack(float16_unpack_canonical(a, s),
-                                 rmode, scale, INT8_MIN, INT8_MAX, s);
+    FloatParts64 p;
+
+    float16_unpack_canonical(&p, a, s);
+    return round_to_int_and_pack(p, rmode, scale, INT8_MIN, INT8_MAX, s);
 }
 
 int16_t float16_to_int16_scalbn(float16 a, FloatRoundMode rmode, int scale,
                                 float_status *s)
 {
-    return round_to_int_and_pack(float16_unpack_canonical(a, s),
-                                 rmode, scale, INT16_MIN, INT16_MAX, s);
+    FloatParts64 p;
+
+    float16_unpack_canonical(&p, a, s);
+    return round_to_int_and_pack(p, rmode, scale, INT16_MIN, INT16_MAX, s);
 }
 
 int32_t float16_to_int32_scalbn(float16 a, FloatRoundMode rmode, int scale,
                                 float_status *s)
 {
-    return round_to_int_and_pack(float16_unpack_canonical(a, s),
-                                 rmode, scale, INT32_MIN, INT32_MAX, s);
+    FloatParts64 p;
+
+    float16_unpack_canonical(&p, a, s);
+    return round_to_int_and_pack(p, rmode, scale, INT32_MIN, INT32_MAX, s);
 }
 
 int64_t float16_to_int64_scalbn(float16 a, FloatRoundMode rmode, int scale,
                                 float_status *s)
 {
-    return round_to_int_and_pack(float16_unpack_canonical(a, s),
-                                 rmode, scale, INT64_MIN, INT64_MAX, s);
+    FloatParts64 p;
+
+    float16_unpack_canonical(&p, a, s);
+    return round_to_int_and_pack(p, rmode, scale, INT64_MIN, INT64_MAX, s);
 }
 
 int16_t float32_to_int16_scalbn(float32 a, FloatRoundMode rmode, int scale,
                                 float_status *s)
 {
-    return round_to_int_and_pack(float32_unpack_canonical(a, s),
-                                 rmode, scale, INT16_MIN, INT16_MAX, s);
+    FloatParts64 p;
+
+    float32_unpack_canonical(&p, a, s);
+    return round_to_int_and_pack(p, rmode, scale, INT16_MIN, INT16_MAX, s);
 }
 
 int32_t float32_to_int32_scalbn(float32 a, FloatRoundMode rmode, int scale,
                                 float_status *s)
 {
-    return round_to_int_and_pack(float32_unpack_canonical(a, s),
-                                 rmode, scale, INT32_MIN, INT32_MAX, s);
+    FloatParts64 p;
+
+    float32_unpack_canonical(&p, a, s);
+    return round_to_int_and_pack(p, rmode, scale, INT32_MIN, INT32_MAX, s);
 }
 
 int64_t float32_to_int64_scalbn(float32 a, FloatRoundMode rmode, int scale,
                                 float_status *s)
 {
-    return round_to_int_and_pack(float32_unpack_canonical(a, s),
-                                 rmode, scale, INT64_MIN, INT64_MAX, s);
+    FloatParts64 p;
+
+    float32_unpack_canonical(&p, a, s);
+    return round_to_int_and_pack(p, rmode, scale, INT64_MIN, INT64_MAX, s);
 }
 
 int16_t float64_to_int16_scalbn(float64 a, FloatRoundMode rmode, int scale,
                                 float_status *s)
 {
-    return round_to_int_and_pack(float64_unpack_canonical(a, s),
-                                 rmode, scale, INT16_MIN, INT16_MAX, s);
+    FloatParts64 p;
+
+    float64_unpack_canonical(&p, a, s);
+    return round_to_int_and_pack(p, rmode, scale, INT16_MIN, INT16_MAX, s);
 }
 
 int32_t float64_to_int32_scalbn(float64 a, FloatRoundMode rmode, int scale,
                                 float_status *s)
 {
-    return round_to_int_and_pack(float64_unpack_canonical(a, s),
-                                 rmode, scale, INT32_MIN, INT32_MAX, s);
+    FloatParts64 p;
+
+    float64_unpack_canonical(&p, a, s);
+    return round_to_int_and_pack(p, rmode, scale, INT32_MIN, INT32_MAX, s);
 }
 
 int64_t float64_to_int64_scalbn(float64 a, FloatRoundMode rmode, int scale,
                                 float_status *s)
 {
-    return round_to_int_and_pack(float64_unpack_canonical(a, s),
-                                 rmode, scale, INT64_MIN, INT64_MAX, s);
+    FloatParts64 p;
+
+    float64_unpack_canonical(&p, a, s);
+    return round_to_int_and_pack(p, rmode, scale, INT64_MIN, INT64_MAX, s);
 }
 
 int8_t float16_to_int8(float16 a, float_status *s)
@@ -2423,22 +2503,28 @@ int64_t float64_to_int64_round_to_zero(float64 a, float_status *s)
 int16_t bfloat16_to_int16_scalbn(bfloat16 a, FloatRoundMode rmode, int scale,
                                  float_status *s)
 {
-    return round_to_int_and_pack(bfloat16_unpack_canonical(a, s),
-                                 rmode, scale, INT16_MIN, INT16_MAX, s);
+    FloatParts64 p;
+
+    bfloat16_unpack_canonical(&p, a, s);
+    return round_to_int_and_pack(p, rmode, scale, INT16_MIN, INT16_MAX, s);
 }
 
 int32_t bfloat16_to_int32_scalbn(bfloat16 a, FloatRoundMode rmode, int scale,
                                  float_status *s)
 {
-    return round_to_int_and_pack(bfloat16_unpack_canonical(a, s),
-                                 rmode, scale, INT32_MIN, INT32_MAX, s);
+    FloatParts64 p;
+
+    bfloat16_unpack_canonical(&p, a, s);
+    return round_to_int_and_pack(p, rmode, scale, INT32_MIN, INT32_MAX, s);
 }
 
 int64_t bfloat16_to_int64_scalbn(bfloat16 a, FloatRoundMode rmode, int scale,
                                  float_status *s)
 {
-    return round_to_int_and_pack(bfloat16_unpack_canonical(a, s),
-                                 rmode, scale, INT64_MIN, INT64_MAX, s);
+    FloatParts64 p;
+
+    bfloat16_unpack_canonical(&p, a, s);
+    return round_to_int_and_pack(p, rmode, scale, INT64_MIN, INT64_MAX, s);
 }
 
 int16_t bfloat16_to_int16(bfloat16 a, float_status *s)
@@ -2532,71 +2618,91 @@ static uint64_t round_to_uint_and_pack(FloatParts64 in, FloatRoundMode rmode,
 uint8_t float16_to_uint8_scalbn(float16 a, FloatRoundMode rmode, int scale,
                                 float_status *s)
 {
-    return round_to_uint_and_pack(float16_unpack_canonical(a, s),
-                                  rmode, scale, UINT8_MAX, s);
+    FloatParts64 p;
+
+    float16_unpack_canonical(&p, a, s);
+    return round_to_uint_and_pack(p, rmode, scale, UINT8_MAX, s);
 }
 
 uint16_t float16_to_uint16_scalbn(float16 a, FloatRoundMode rmode, int scale,
                                   float_status *s)
 {
-    return round_to_uint_and_pack(float16_unpack_canonical(a, s),
-                                  rmode, scale, UINT16_MAX, s);
+    FloatParts64 p;
+
+    float16_unpack_canonical(&p, a, s);
+    return round_to_uint_and_pack(p, rmode, scale, UINT16_MAX, s);
 }
 
 uint32_t float16_to_uint32_scalbn(float16 a, FloatRoundMode rmode, int scale,
                                   float_status *s)
 {
-    return round_to_uint_and_pack(float16_unpack_canonical(a, s),
-                                  rmode, scale, UINT32_MAX, s);
+    FloatParts64 p;
+
+    float16_unpack_canonical(&p, a, s);
+    return round_to_uint_and_pack(p, rmode, scale, UINT32_MAX, s);
 }
 
 uint64_t float16_to_uint64_scalbn(float16 a, FloatRoundMode rmode, int scale,
                                   float_status *s)
 {
-    return round_to_uint_and_pack(float16_unpack_canonical(a, s),
-                                  rmode, scale, UINT64_MAX, s);
+    FloatParts64 p;
+
+    float16_unpack_canonical(&p, a, s);
+    return round_to_uint_and_pack(p, rmode, scale, UINT64_MAX, s);
 }
 
 uint16_t float32_to_uint16_scalbn(float32 a, FloatRoundMode rmode, int scale,
                                   float_status *s)
 {
-    return round_to_uint_and_pack(float32_unpack_canonical(a, s),
-                                  rmode, scale, UINT16_MAX, s);
+    FloatParts64 p;
+
+    float32_unpack_canonical(&p, a, s);
+    return round_to_uint_and_pack(p, rmode, scale, UINT16_MAX, s);
 }
 
 uint32_t float32_to_uint32_scalbn(float32 a, FloatRoundMode rmode, int scale,
                                   float_status *s)
 {
-    return round_to_uint_and_pack(float32_unpack_canonical(a, s),
-                                  rmode, scale, UINT32_MAX, s);
+    FloatParts64 p;
+
+    float32_unpack_canonical(&p, a, s);
+    return round_to_uint_and_pack(p, rmode, scale, UINT32_MAX, s);
 }
 
 uint64_t float32_to_uint64_scalbn(float32 a, FloatRoundMode rmode, int scale,
                                   float_status *s)
 {
-    return round_to_uint_and_pack(float32_unpack_canonical(a, s),
-                                  rmode, scale, UINT64_MAX, s);
+    FloatParts64 p;
+
+    float32_unpack_canonical(&p, a, s);
+    return round_to_uint_and_pack(p, rmode, scale, UINT64_MAX, s);
 }
 
 uint16_t float64_to_uint16_scalbn(float64 a, FloatRoundMode rmode, int scale,
                                   float_status *s)
 {
-    return round_to_uint_and_pack(float64_unpack_canonical(a, s),
-                                  rmode, scale, UINT16_MAX, s);
+    FloatParts64 p;
+
+    float64_unpack_canonical(&p, a, s);
+    return round_to_uint_and_pack(p, rmode, scale, UINT16_MAX, s);
 }
 
 uint32_t float64_to_uint32_scalbn(float64 a, FloatRoundMode rmode, int scale,
                                   float_status *s)
 {
-    return round_to_uint_and_pack(float64_unpack_canonical(a, s),
-                                  rmode, scale, UINT32_MAX, s);
+    FloatParts64 p;
+
+    float64_unpack_canonical(&p, a, s);
+    return round_to_uint_and_pack(p, rmode, scale, UINT32_MAX, s);
 }
 
 uint64_t float64_to_uint64_scalbn(float64 a, FloatRoundMode rmode, int scale,
                                   float_status *s)
 {
-    return round_to_uint_and_pack(float64_unpack_canonical(a, s),
-                                  rmode, scale, UINT64_MAX, s);
+    FloatParts64 p;
+
+    float64_unpack_canonical(&p, a, s);
+    return round_to_uint_and_pack(p, rmode, scale, UINT64_MAX, s);
 }
 
 uint8_t float16_to_uint8(float16 a, float_status *s)
@@ -2702,22 +2808,28 @@ uint64_t float64_to_uint64_round_to_zero(float64 a, float_status *s)
 uint16_t bfloat16_to_uint16_scalbn(bfloat16 a, FloatRoundMode rmode,
                                    int scale, float_status *s)
 {
-    return round_to_uint_and_pack(bfloat16_unpack_canonical(a, s),
-                                  rmode, scale, UINT16_MAX, s);
+    FloatParts64 p;
+
+    bfloat16_unpack_canonical(&p, a, s);
+    return round_to_uint_and_pack(p, rmode, scale, UINT16_MAX, s);
 }
 
 uint32_t bfloat16_to_uint32_scalbn(bfloat16 a, FloatRoundMode rmode,
                                    int scale, float_status *s)
 {
-    return round_to_uint_and_pack(bfloat16_unpack_canonical(a, s),
-                                  rmode, scale, UINT32_MAX, s);
+    FloatParts64 p;
+
+    bfloat16_unpack_canonical(&p, a, s);
+    return round_to_uint_and_pack(p, rmode, scale, UINT32_MAX, s);
 }
 
 uint64_t bfloat16_to_uint64_scalbn(bfloat16 a, FloatRoundMode rmode,
                                    int scale, float_status *s)
 {
-    return round_to_uint_and_pack(bfloat16_unpack_canonical(a, s),
-                                  rmode, scale, UINT64_MAX, s);
+    FloatParts64 p;
+
+    bfloat16_unpack_canonical(&p, a, s);
+    return round_to_uint_and_pack(p, rmode, scale, UINT64_MAX, s);
 }
 
 uint16_t bfloat16_to_uint16(bfloat16 a, float_status *s)
@@ -3168,10 +3280,10 @@ static FloatParts64 minmax_floats(FloatParts64 a, FloatParts64 b, bool ismin,
 float ## sz float ## sz ## _ ## name(float ## sz a, float ## sz b,      \
                                      float_status *s)                   \
 {                                                                       \
-    FloatParts64 pa = float ## sz ## _unpack_canonical(a, s);             \
-    FloatParts64 pb = float ## sz ## _unpack_canonical(b, s);             \
-    FloatParts64 pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);      \
-                                                                        \
+    FloatParts64 pa, pb, pr;                                            \
+    float ## sz ## _unpack_canonical(&pa, a, s);                        \
+    float ## sz ## _unpack_canonical(&pb, b, s);                        \
+    pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);                 \
     return float ## sz ## _round_pack_canonical(pr, s);                 \
 }
 
@@ -3201,10 +3313,10 @@ MINMAX(64, maxnummag, false, true, true)
 #define BF16_MINMAX(name, ismin, isiee, ismag)                          \
 bfloat16 bfloat16_ ## name(bfloat16 a, bfloat16 b, float_status *s)     \
 {                                                                       \
-    FloatParts64 pa = bfloat16_unpack_canonical(a, s);                    \
-    FloatParts64 pb = bfloat16_unpack_canonical(b, s);                    \
-    FloatParts64 pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);      \
-                                                                        \
+    FloatParts64 pa, pb, pr;                                            \
+    bfloat16_unpack_canonical(&pa, a, s);                               \
+    bfloat16_unpack_canonical(&pb, b, s);                               \
+    pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);                 \
     return bfloat16_round_pack_canonical(pr, s);                        \
 }
 
@@ -3279,8 +3391,9 @@ static FloatRelation compare_floats(FloatParts64 a, FloatParts64 b, bool is_quie
 static int attr                                                         \
 name(float ## sz a, float ## sz b, bool is_quiet, float_status *s)      \
 {                                                                       \
-    FloatParts64 pa = float ## sz ## _unpack_canonical(a, s);             \
-    FloatParts64 pb = float ## sz ## _unpack_canonical(b, s);             \
+    FloatParts64 pa, pb;                                                \
+    float ## sz ## _unpack_canonical(&pa, a, s);                        \
+    float ## sz ## _unpack_canonical(&pb, b, s);                        \
     return compare_floats(pa, pb, is_quiet, s);                         \
 }
 
@@ -3381,8 +3494,10 @@ FloatRelation float64_compare_quiet(float64 a, float64 b, float_status *s)
 static FloatRelation QEMU_FLATTEN
 soft_bf16_compare(bfloat16 a, bfloat16 b, bool is_quiet, float_status *s)
 {
-    FloatParts64 pa = bfloat16_unpack_canonical(a, s);
-    FloatParts64 pb = bfloat16_unpack_canonical(b, s);
+    FloatParts64 pa, pb;
+
+    bfloat16_unpack_canonical(&pa, a, s);
+    bfloat16_unpack_canonical(&pb, b, s);
     return compare_floats(pa, pb, is_quiet, s);
 }
 
@@ -3416,29 +3531,37 @@ static FloatParts64 scalbn_decomposed(FloatParts64 a, int n, float_status *s)
 
 float16 float16_scalbn(float16 a, int n, float_status *status)
 {
-    FloatParts64 pa = float16_unpack_canonical(a, status);
-    FloatParts64 pr = scalbn_decomposed(pa, n, status);
+    FloatParts64 pa, pr;
+
+    float16_unpack_canonical(&pa, a, status);
+    pr = scalbn_decomposed(pa, n, status);
     return float16_round_pack_canonical(pr, status);
 }
 
 float32 float32_scalbn(float32 a, int n, float_status *status)
 {
-    FloatParts64 pa = float32_unpack_canonical(a, status);
-    FloatParts64 pr = scalbn_decomposed(pa, n, status);
+    FloatParts64 pa, pr;
+
+    float32_unpack_canonical(&pa, a, status);
+    pr = scalbn_decomposed(pa, n, status);
     return float32_round_pack_canonical(pr, status);
 }
 
 float64 float64_scalbn(float64 a, int n, float_status *status)
 {
-    FloatParts64 pa = float64_unpack_canonical(a, status);
-    FloatParts64 pr = scalbn_decomposed(pa, n, status);
+    FloatParts64 pa, pr;
+
+    float64_unpack_canonical(&pa, a, status);
+    pr = scalbn_decomposed(pa, n, status);
     return float64_round_pack_canonical(pr, status);
 }
 
 bfloat16 bfloat16_scalbn(bfloat16 a, int n, float_status *status)
 {
-    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
-    FloatParts64 pr = scalbn_decomposed(pa, n, status);
+    FloatParts64 pa, pr;
+
+    bfloat16_unpack_canonical(&pa, a, status);
+    pr = scalbn_decomposed(pa, n, status);
     return bfloat16_round_pack_canonical(pr, status);
 }
 
@@ -3515,24 +3638,30 @@ static FloatParts64 sqrt_float(FloatParts64 a, float_status *s, const FloatFmt *
 
 float16 QEMU_FLATTEN float16_sqrt(float16 a, float_status *status)
 {
-    FloatParts64 pa = float16_unpack_canonical(a, status);
-    FloatParts64 pr = sqrt_float(pa, status, &float16_params);
+    FloatParts64 pa, pr;
+
+    float16_unpack_canonical(&pa, a, status);
+    pr = sqrt_float(pa, status, &float16_params);
     return float16_round_pack_canonical(pr, status);
 }
 
 static float32 QEMU_SOFTFLOAT_ATTR
 soft_f32_sqrt(float32 a, float_status *status)
 {
-    FloatParts64 pa = float32_unpack_canonical(a, status);
-    FloatParts64 pr = sqrt_float(pa, status, &float32_params);
+    FloatParts64 pa, pr;
+
+    float32_unpack_canonical(&pa, a, status);
+    pr = sqrt_float(pa, status, &float32_params);
     return float32_round_pack_canonical(pr, status);
 }
 
 static float64 QEMU_SOFTFLOAT_ATTR
 soft_f64_sqrt(float64 a, float_status *status)
 {
-    FloatParts64 pa = float64_unpack_canonical(a, status);
-    FloatParts64 pr = sqrt_float(pa, status, &float64_params);
+    FloatParts64 pa, pr;
+
+    float64_unpack_canonical(&pa, a, status);
+    pr = sqrt_float(pa, status, &float64_params);
     return float64_round_pack_canonical(pr, status);
 }
 
@@ -3592,8 +3721,10 @@ float64 QEMU_FLATTEN float64_sqrt(float64 xa, float_status *s)
 
 bfloat16 QEMU_FLATTEN bfloat16_sqrt(bfloat16 a, float_status *status)
 {
-    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
-    FloatParts64 pr = sqrt_float(pa, status, &bfloat16_params);
+    FloatParts64 pa, pr;
+
+    bfloat16_unpack_canonical(&pa, a, status);
+    pr = sqrt_float(pa, status, &bfloat16_params);
     return bfloat16_round_pack_canonical(pr, status);
 }
 
-- 
2.25.1



* [PATCH 23/72] softfloat: Use pointers with ftype_round_pack_canonical
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (21 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 22/72] softfloat: Use pointers with ftype_unpack_canonical Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-11 13:55   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 24/72] softfloat: Use pointers with parts_silence_nan Richard Henderson
                   ` (53 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 131 +++++++++++++++++++++++++-----------------------
 1 file changed, 68 insertions(+), 63 deletions(-)
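
The conversion follows one pattern throughout; as a sketch from a
representative caller, the parts now travel to round-and-pack by
pointer, which also gives the per-size dispatch introduced later in
the series a pointer type to key on:

    /* Before: the parts were passed by value. */
    FloatParts64 pr = mul_floats(pa, pb, status);
    return float16_round_pack_canonical(pr, status);

    /* After: the same call site passes a pointer. */
    pr = mul_floats(pa, pb, status);
    return float16_round_pack_canonical(&pr, status);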

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index e53d4a138f..b0cbd5941c 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -945,22 +945,25 @@ static void bfloat16_unpack_canonical(FloatParts64 *p, bfloat16 f,
     *p = sf_canonicalize(*p, &bfloat16_params, s);
 }
 
-static float16 float16a_round_pack_canonical(FloatParts64 p, float_status *s,
+static float16 float16a_round_pack_canonical(FloatParts64 *p,
+                                             float_status *s,
                                              const FloatFmt *params)
 {
-    p = round_canonical(p, s, params);
-    return float16_pack_raw(&p);
+    *p = round_canonical(*p, s, params);
+    return float16_pack_raw(p);
 }
 
-static float16 float16_round_pack_canonical(FloatParts64 p, float_status *s)
+static float16 float16_round_pack_canonical(FloatParts64 *p,
+                                            float_status *s)
 {
     return float16a_round_pack_canonical(p, s, &float16_params);
 }
 
-static bfloat16 bfloat16_round_pack_canonical(FloatParts64 p, float_status *s)
+static bfloat16 bfloat16_round_pack_canonical(FloatParts64 *p,
+                                              float_status *s)
 {
-    p = round_canonical(p, s, &bfloat16_params);
-    return bfloat16_pack_raw(&p);
+    *p = round_canonical(*p, s, &bfloat16_params);
+    return bfloat16_pack_raw(p);
 }
 
 static void float32_unpack_canonical(FloatParts64 *p, float32 f,
@@ -970,10 +973,11 @@ static void float32_unpack_canonical(FloatParts64 *p, float32 f,
     *p = sf_canonicalize(*p, &float32_params, s);
 }
 
-static float32 float32_round_pack_canonical(FloatParts64 p, float_status *s)
+static float32 float32_round_pack_canonical(FloatParts64 *p,
+                                            float_status *s)
 {
-    p = round_canonical(p, s, &float32_params);
-    return float32_pack_raw(&p);
+    *p = round_canonical(*p, s, &float32_params);
+    return float32_pack_raw(p);
 }
 
 static void float64_unpack_canonical(FloatParts64 *p, float64 f,
@@ -983,10 +987,11 @@ static void float64_unpack_canonical(FloatParts64 *p, float64 f,
     *p = sf_canonicalize(*p, &float64_params, s);
 }
 
-static float64 float64_round_pack_canonical(FloatParts64 p, float_status *s)
+static float64 float64_round_pack_canonical(FloatParts64 *p,
+                                            float_status *s)
 {
-    p = round_canonical(p, s, &float64_params);
-    return float64_pack_raw(&p);
+    *p = round_canonical(*p, s, &float64_params);
+    return float64_pack_raw(p);
 }
 
 /*
@@ -1093,7 +1098,7 @@ float16 QEMU_FLATTEN float16_add(float16 a, float16 b, float_status *status)
     float16_unpack_canonical(&pb, b, status);
     pr = addsub_floats(pa, pb, false, status);
 
-    return float16_round_pack_canonical(pr, status);
+    return float16_round_pack_canonical(&pr, status);
 }
 
 float16 QEMU_FLATTEN float16_sub(float16 a, float16 b, float_status *status)
@@ -1104,7 +1109,7 @@ float16 QEMU_FLATTEN float16_sub(float16 a, float16 b, float_status *status)
     float16_unpack_canonical(&pb, b, status);
     pr = addsub_floats(pa, pb, true, status);
 
-    return float16_round_pack_canonical(pr, status);
+    return float16_round_pack_canonical(&pr, status);
 }
 
 static float32 QEMU_SOFTFLOAT_ATTR
@@ -1116,7 +1121,7 @@ soft_f32_addsub(float32 a, float32 b, bool subtract, float_status *status)
     float32_unpack_canonical(&pb, b, status);
     pr = addsub_floats(pa, pb, subtract, status);
 
-    return float32_round_pack_canonical(pr, status);
+    return float32_round_pack_canonical(&pr, status);
 }
 
 static inline float32 soft_f32_add(float32 a, float32 b, float_status *status)
@@ -1138,7 +1143,7 @@ soft_f64_addsub(float64 a, float64 b, bool subtract, float_status *status)
     float64_unpack_canonical(&pb, b, status);
     pr = addsub_floats(pa, pb, subtract, status);
 
-    return float64_round_pack_canonical(pr, status);
+    return float64_round_pack_canonical(&pr, status);
 }
 
 static inline float64 soft_f64_add(float64 a, float64 b, float_status *status)
@@ -1238,7 +1243,7 @@ bfloat16 QEMU_FLATTEN bfloat16_add(bfloat16 a, bfloat16 b, float_status *status)
     bfloat16_unpack_canonical(&pb, b, status);
     pr = addsub_floats(pa, pb, false, status);
 
-    return bfloat16_round_pack_canonical(pr, status);
+    return bfloat16_round_pack_canonical(&pr, status);
 }
 
 bfloat16 QEMU_FLATTEN bfloat16_sub(bfloat16 a, bfloat16 b, float_status *status)
@@ -1249,7 +1254,7 @@ bfloat16 QEMU_FLATTEN bfloat16_sub(bfloat16 a, bfloat16 b, float_status *status)
     bfloat16_unpack_canonical(&pb, b, status);
     pr = addsub_floats(pa, pb, true, status);
 
-    return bfloat16_round_pack_canonical(pr, status);
+    return bfloat16_round_pack_canonical(&pr, status);
 }
 
 /*
@@ -1311,7 +1316,7 @@ float16 QEMU_FLATTEN float16_mul(float16 a, float16 b, float_status *status)
     float16_unpack_canonical(&pb, b, status);
     pr = mul_floats(pa, pb, status);
 
-    return float16_round_pack_canonical(pr, status);
+    return float16_round_pack_canonical(&pr, status);
 }
 
 static float32 QEMU_SOFTFLOAT_ATTR
@@ -1323,7 +1328,7 @@ soft_f32_mul(float32 a, float32 b, float_status *status)
     float32_unpack_canonical(&pb, b, status);
     pr = mul_floats(pa, pb, status);
 
-    return float32_round_pack_canonical(pr, status);
+    return float32_round_pack_canonical(&pr, status);
 }
 
 static float64 QEMU_SOFTFLOAT_ATTR
@@ -1335,7 +1340,7 @@ soft_f64_mul(float64 a, float64 b, float_status *status)
     float64_unpack_canonical(&pb, b, status);
     pr = mul_floats(pa, pb, status);
 
-    return float64_round_pack_canonical(pr, status);
+    return float64_round_pack_canonical(&pr, status);
 }
 
 static float hard_f32_mul(float a, float b)
@@ -1375,7 +1380,7 @@ bfloat16 QEMU_FLATTEN bfloat16_mul(bfloat16 a, bfloat16 b, float_status *status)
     bfloat16_unpack_canonical(&pb, b, status);
     pr = mul_floats(pa, pb, status);
 
-    return bfloat16_round_pack_canonical(pr, status);
+    return bfloat16_round_pack_canonical(&pr, status);
 }
 
 /*
@@ -1574,7 +1579,7 @@ float16 QEMU_FLATTEN float16_muladd(float16 a, float16 b, float16 c,
     float16_unpack_canonical(&pc, c, status);
     pr = muladd_floats(pa, pb, pc, flags, status);
 
-    return float16_round_pack_canonical(pr, status);
+    return float16_round_pack_canonical(&pr, status);
 }
 
 static float32 QEMU_SOFTFLOAT_ATTR
@@ -1588,7 +1593,7 @@ soft_f32_muladd(float32 a, float32 b, float32 c, int flags,
     float32_unpack_canonical(&pc, c, status);
     pr = muladd_floats(pa, pb, pc, flags, status);
 
-    return float32_round_pack_canonical(pr, status);
+    return float32_round_pack_canonical(&pr, status);
 }
 
 static float64 QEMU_SOFTFLOAT_ATTR
@@ -1602,7 +1607,7 @@ soft_f64_muladd(float64 a, float64 b, float64 c, int flags,
     float64_unpack_canonical(&pc, c, status);
     pr = muladd_floats(pa, pb, pc, flags, status);
 
-    return float64_round_pack_canonical(pr, status);
+    return float64_round_pack_canonical(&pr, status);
 }
 
 static bool force_soft_fma;
@@ -1765,7 +1770,7 @@ bfloat16 QEMU_FLATTEN bfloat16_muladd(bfloat16 a, bfloat16 b, bfloat16 c,
     bfloat16_unpack_canonical(&pc, c, status);
     pr = muladd_floats(pa, pb, pc, flags, status);
 
-    return bfloat16_round_pack_canonical(pr, status);
+    return bfloat16_round_pack_canonical(&pr, status);
 }
 
 /*
@@ -1848,7 +1853,7 @@ float16 float16_div(float16 a, float16 b, float_status *status)
     float16_unpack_canonical(&pb, b, status);
     pr = div_floats(pa, pb, status);
 
-    return float16_round_pack_canonical(pr, status);
+    return float16_round_pack_canonical(&pr, status);
 }
 
 static float32 QEMU_SOFTFLOAT_ATTR
@@ -1860,7 +1865,7 @@ soft_f32_div(float32 a, float32 b, float_status *status)
     float32_unpack_canonical(&pb, b, status);
     pr = div_floats(pa, pb, status);
 
-    return float32_round_pack_canonical(pr, status);
+    return float32_round_pack_canonical(&pr, status);
 }
 
 static float64 QEMU_SOFTFLOAT_ATTR
@@ -1872,7 +1877,7 @@ soft_f64_div(float64 a, float64 b, float_status *status)
     float64_unpack_canonical(&pb, b, status);
     pr = div_floats(pa, pb, status);
 
-    return float64_round_pack_canonical(pr, status);
+    return float64_round_pack_canonical(&pr, status);
 }
 
 static float hard_f32_div(float a, float b)
@@ -1946,7 +1951,7 @@ bfloat16 bfloat16_div(bfloat16 a, bfloat16 b, float_status *status)
     bfloat16_unpack_canonical(&pb, b, status);
     pr = div_floats(pa, pb, status);
 
-    return bfloat16_round_pack_canonical(pr, status);
+    return bfloat16_round_pack_canonical(&pr, status);
 }
 
 /*
@@ -2002,7 +2007,7 @@ float32 float16_to_float32(float16 a, bool ieee, float_status *s)
 
     float16a_unpack_canonical(&pa, a, s, fmt16);
     pr = float_to_float(pa, &float32_params, s);
-    return float32_round_pack_canonical(pr, s);
+    return float32_round_pack_canonical(&pr, s);
 }
 
 float64 float16_to_float64(float16 a, bool ieee, float_status *s)
@@ -2012,7 +2017,7 @@ float64 float16_to_float64(float16 a, bool ieee, float_status *s)
 
     float16a_unpack_canonical(&pa, a, s, fmt16);
     pr = float_to_float(pa, &float64_params, s);
-    return float64_round_pack_canonical(pr, s);
+    return float64_round_pack_canonical(&pr, s);
 }
 
 float16 float32_to_float16(float32 a, bool ieee, float_status *s)
@@ -2022,7 +2027,7 @@ float16 float32_to_float16(float32 a, bool ieee, float_status *s)
 
     float32_unpack_canonical(&pa, a, s);
     pr = float_to_float(pa, fmt16, s);
-    return float16a_round_pack_canonical(pr, s, fmt16);
+    return float16a_round_pack_canonical(&pr, s, fmt16);
 }
 
 static float64 QEMU_SOFTFLOAT_ATTR
@@ -2032,7 +2037,7 @@ soft_float32_to_float64(float32 a, float_status *s)
 
     float32_unpack_canonical(&pa, a, s);
     pr = float_to_float(pa, &float64_params, s);
-    return float64_round_pack_canonical(pr, s);
+    return float64_round_pack_canonical(&pr, s);
 }
 
 float64 float32_to_float64(float32 a, float_status *s)
@@ -2058,7 +2063,7 @@ float16 float64_to_float16(float64 a, bool ieee, float_status *s)
 
     float64_unpack_canonical(&pa, a, s);
     pr = float_to_float(pa, fmt16, s);
-    return float16a_round_pack_canonical(pr, s, fmt16);
+    return float16a_round_pack_canonical(&pr, s, fmt16);
 }
 
 float32 float64_to_float32(float64 a, float_status *s)
@@ -2067,7 +2072,7 @@ float32 float64_to_float32(float64 a, float_status *s)
 
     float64_unpack_canonical(&pa, a, s);
     pr = float_to_float(pa, &float32_params, s);
-    return float32_round_pack_canonical(pr, s);
+    return float32_round_pack_canonical(&pr, s);
 }
 
 float32 bfloat16_to_float32(bfloat16 a, float_status *s)
@@ -2076,7 +2081,7 @@ float32 bfloat16_to_float32(bfloat16 a, float_status *s)
 
     bfloat16_unpack_canonical(&pa, a, s);
     pr = float_to_float(pa, &float32_params, s);
-    return float32_round_pack_canonical(pr, s);
+    return float32_round_pack_canonical(&pr, s);
 }
 
 float64 bfloat16_to_float64(bfloat16 a, float_status *s)
@@ -2085,7 +2090,7 @@ float64 bfloat16_to_float64(bfloat16 a, float_status *s)
 
     bfloat16_unpack_canonical(&pa, a, s);
     pr = float_to_float(pa, &float64_params, s);
-    return float64_round_pack_canonical(pr, s);
+    return float64_round_pack_canonical(&pr, s);
 }
 
 bfloat16 float32_to_bfloat16(float32 a, float_status *s)
@@ -2094,7 +2099,7 @@ bfloat16 float32_to_bfloat16(float32 a, float_status *s)
 
     float32_unpack_canonical(&pa, a, s);
     pr = float_to_float(pa, &bfloat16_params, s);
-    return bfloat16_round_pack_canonical(pr, s);
+    return bfloat16_round_pack_canonical(&pr, s);
 }
 
 bfloat16 float64_to_bfloat16(float64 a, float_status *s)
@@ -2103,7 +2108,7 @@ bfloat16 float64_to_bfloat16(float64 a, float_status *s)
 
     float64_unpack_canonical(&pa, a, s);
     pr = float_to_float(pa, &bfloat16_params, s);
-    return bfloat16_round_pack_canonical(pr, s);
+    return bfloat16_round_pack_canonical(&pr, s);
 }
 
 /*
@@ -2220,7 +2225,7 @@ float16 float16_round_to_int(float16 a, float_status *s)
 
     float16_unpack_canonical(&pa, a, s);
     pr = round_to_int(pa, s->float_rounding_mode, 0, s);
-    return float16_round_pack_canonical(pr, s);
+    return float16_round_pack_canonical(&pr, s);
 }
 
 float32 float32_round_to_int(float32 a, float_status *s)
@@ -2229,7 +2234,7 @@ float32 float32_round_to_int(float32 a, float_status *s)
 
     float32_unpack_canonical(&pa, a, s);
     pr = round_to_int(pa, s->float_rounding_mode, 0, s);
-    return float32_round_pack_canonical(pr, s);
+    return float32_round_pack_canonical(&pr, s);
 }
 
 float64 float64_round_to_int(float64 a, float_status *s)
@@ -2238,7 +2243,7 @@ float64 float64_round_to_int(float64 a, float_status *s)
 
     float64_unpack_canonical(&pa, a, s);
     pr = round_to_int(pa, s->float_rounding_mode, 0, s);
-    return float64_round_pack_canonical(pr, s);
+    return float64_round_pack_canonical(&pr, s);
 }
 
 /*
@@ -2252,7 +2257,7 @@ bfloat16 bfloat16_round_to_int(bfloat16 a, float_status *s)
 
     bfloat16_unpack_canonical(&pa, a, s);
     pr = round_to_int(pa, s->float_rounding_mode, 0, s);
-    return bfloat16_round_pack_canonical(pr, s);
+    return bfloat16_round_pack_canonical(&pr, s);
 }
 
 /*
@@ -2898,7 +2903,7 @@ static FloatParts64 int_to_float(int64_t a, int scale, float_status *status)
 float16 int64_to_float16_scalbn(int64_t a, int scale, float_status *status)
 {
     FloatParts64 pa = int_to_float(a, scale, status);
-    return float16_round_pack_canonical(pa, status);
+    return float16_round_pack_canonical(&pa, status);
 }
 
 float16 int32_to_float16_scalbn(int32_t a, int scale, float_status *status)
@@ -2934,7 +2939,7 @@ float16 int8_to_float16(int8_t a, float_status *status)
 float32 int64_to_float32_scalbn(int64_t a, int scale, float_status *status)
 {
     FloatParts64 pa = int_to_float(a, scale, status);
-    return float32_round_pack_canonical(pa, status);
+    return float32_round_pack_canonical(&pa, status);
 }
 
 float32 int32_to_float32_scalbn(int32_t a, int scale, float_status *status)
@@ -2965,7 +2970,7 @@ float32 int16_to_float32(int16_t a, float_status *status)
 float64 int64_to_float64_scalbn(int64_t a, int scale, float_status *status)
 {
     FloatParts64 pa = int_to_float(a, scale, status);
-    return float64_round_pack_canonical(pa, status);
+    return float64_round_pack_canonical(&pa, status);
 }
 
 float64 int32_to_float64_scalbn(int32_t a, int scale, float_status *status)
@@ -3001,7 +3006,7 @@ float64 int16_to_float64(int16_t a, float_status *status)
 bfloat16 int64_to_bfloat16_scalbn(int64_t a, int scale, float_status *status)
 {
     FloatParts64 pa = int_to_float(a, scale, status);
-    return bfloat16_round_pack_canonical(pa, status);
+    return bfloat16_round_pack_canonical(&pa, status);
 }
 
 bfloat16 int32_to_bfloat16_scalbn(int32_t a, int scale, float_status *status)
@@ -3058,7 +3063,7 @@ static FloatParts64 uint_to_float(uint64_t a, int scale, float_status *status)
 float16 uint64_to_float16_scalbn(uint64_t a, int scale, float_status *status)
 {
     FloatParts64 pa = uint_to_float(a, scale, status);
-    return float16_round_pack_canonical(pa, status);
+    return float16_round_pack_canonical(&pa, status);
 }
 
 float16 uint32_to_float16_scalbn(uint32_t a, int scale, float_status *status)
@@ -3094,7 +3099,7 @@ float16 uint8_to_float16(uint8_t a, float_status *status)
 float32 uint64_to_float32_scalbn(uint64_t a, int scale, float_status *status)
 {
     FloatParts64 pa = uint_to_float(a, scale, status);
-    return float32_round_pack_canonical(pa, status);
+    return float32_round_pack_canonical(&pa, status);
 }
 
 float32 uint32_to_float32_scalbn(uint32_t a, int scale, float_status *status)
@@ -3125,7 +3130,7 @@ float32 uint16_to_float32(uint16_t a, float_status *status)
 float64 uint64_to_float64_scalbn(uint64_t a, int scale, float_status *status)
 {
     FloatParts64 pa = uint_to_float(a, scale, status);
-    return float64_round_pack_canonical(pa, status);
+    return float64_round_pack_canonical(&pa, status);
 }
 
 float64 uint32_to_float64_scalbn(uint32_t a, int scale, float_status *status)
@@ -3161,7 +3166,7 @@ float64 uint16_to_float64(uint16_t a, float_status *status)
 bfloat16 uint64_to_bfloat16_scalbn(uint64_t a, int scale, float_status *status)
 {
     FloatParts64 pa = uint_to_float(a, scale, status);
-    return bfloat16_round_pack_canonical(pa, status);
+    return bfloat16_round_pack_canonical(&pa, status);
 }
 
 bfloat16 uint32_to_bfloat16_scalbn(uint32_t a, int scale, float_status *status)
@@ -3284,7 +3289,7 @@ float ## sz float ## sz ## _ ## name(float ## sz a, float ## sz b,      \
     float ## sz ## _unpack_canonical(&pa, a, s);                        \
     float ## sz ## _unpack_canonical(&pb, b, s);                        \
     pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);                 \
-    return float ## sz ## _round_pack_canonical(pr, s);                 \
+    return float ## sz ## _round_pack_canonical(&pr, s);                \
 }
 
 MINMAX(16, min, true, false, false)
@@ -3317,7 +3322,7 @@ bfloat16 bfloat16_ ## name(bfloat16 a, bfloat16 b, float_status *s)     \
     bfloat16_unpack_canonical(&pa, a, s);                               \
     bfloat16_unpack_canonical(&pb, b, s);                               \
     pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);                 \
-    return bfloat16_round_pack_canonical(pr, s);                        \
+    return bfloat16_round_pack_canonical(&pr, s);                       \
 }
 
 BF16_MINMAX(min, true, false, false)
@@ -3535,7 +3540,7 @@ float16 float16_scalbn(float16 a, int n, float_status *status)
 
     float16_unpack_canonical(&pa, a, status);
     pr = scalbn_decomposed(pa, n, status);
-    return float16_round_pack_canonical(pr, status);
+    return float16_round_pack_canonical(&pr, status);
 }
 
 float32 float32_scalbn(float32 a, int n, float_status *status)
@@ -3544,7 +3549,7 @@ float32 float32_scalbn(float32 a, int n, float_status *status)
 
     float32_unpack_canonical(&pa, a, status);
     pr = scalbn_decomposed(pa, n, status);
-    return float32_round_pack_canonical(pr, status);
+    return float32_round_pack_canonical(&pr, status);
 }
 
 float64 float64_scalbn(float64 a, int n, float_status *status)
@@ -3553,7 +3558,7 @@ float64 float64_scalbn(float64 a, int n, float_status *status)
 
     float64_unpack_canonical(&pa, a, status);
     pr = scalbn_decomposed(pa, n, status);
-    return float64_round_pack_canonical(pr, status);
+    return float64_round_pack_canonical(&pr, status);
 }
 
 bfloat16 bfloat16_scalbn(bfloat16 a, int n, float_status *status)
@@ -3562,7 +3567,7 @@ bfloat16 bfloat16_scalbn(bfloat16 a, int n, float_status *status)
 
     bfloat16_unpack_canonical(&pa, a, status);
     pr = scalbn_decomposed(pa, n, status);
-    return bfloat16_round_pack_canonical(pr, status);
+    return bfloat16_round_pack_canonical(&pr, status);
 }
 
 /*
@@ -3642,7 +3647,7 @@ float16 QEMU_FLATTEN float16_sqrt(float16 a, float_status *status)
 
     float16_unpack_canonical(&pa, a, status);
     pr = sqrt_float(pa, status, &float16_params);
-    return float16_round_pack_canonical(pr, status);
+    return float16_round_pack_canonical(&pr, status);
 }
 
 static float32 QEMU_SOFTFLOAT_ATTR
@@ -3652,7 +3657,7 @@ soft_f32_sqrt(float32 a, float_status *status)
 
     float32_unpack_canonical(&pa, a, status);
     pr = sqrt_float(pa, status, &float32_params);
-    return float32_round_pack_canonical(pr, status);
+    return float32_round_pack_canonical(&pr, status);
 }
 
 static float64 QEMU_SOFTFLOAT_ATTR
@@ -3662,7 +3667,7 @@ soft_f64_sqrt(float64 a, float_status *status)
 
     float64_unpack_canonical(&pa, a, status);
     pr = sqrt_float(pa, status, &float64_params);
-    return float64_round_pack_canonical(pr, status);
+    return float64_round_pack_canonical(&pr, status);
 }
 
 float32 QEMU_FLATTEN float32_sqrt(float32 xa, float_status *s)
@@ -3725,7 +3730,7 @@ bfloat16 QEMU_FLATTEN bfloat16_sqrt(bfloat16 a, float_status *status)
 
     bfloat16_unpack_canonical(&pa, a, status);
     pr = sqrt_float(pa, status, &bfloat16_params);
-    return bfloat16_round_pack_canonical(pr, status);
+    return bfloat16_round_pack_canonical(&pr, status);
 }
 
 /*----------------------------------------------------------------------------
-- 
2.25.1



* [PATCH 24/72] softfloat: Use pointers with parts_silence_nan
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (22 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 23/72] softfloat: Use pointers with ftype_round_pack_canonical Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-11 13:56   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 25/72] softfloat: Rearrange FloatParts64 Richard Henderson
                   ` (52 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

At the same time, rename parts_silence_nan to parts64_silence_nan,
split the fraction manipulation out into parts_silence_nan_frac, and
define a parts_silence_nan macro.
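
A sketch of where this is headed: with the fraction manipulation
shared, each width reduces to a two-line wrapper, so the 128-bit
variant added later in the series can look like:

    static void parts128_silence_nan(FloatParts128 *p, float_status *status)
    {
        /* Only the most significant fraction word carries the quiet bit. */
        p->frac_hi = parts_silence_nan_frac(p->frac_hi, status);
        p->cls = float_class_qnan;
    }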

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c                | 16 +++++++++-------
 fpu/softfloat-specialize.c.inc | 17 +++++++++++------
 2 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index b0cbd5941c..2123453d40 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -657,6 +657,7 @@ static inline float64 float64_pack_raw(const FloatParts64 *p)
 #include "softfloat-specialize.c.inc"
 
 #define parts_default_nan  parts64_default_nan
+#define parts_silence_nan  parts64_silence_nan
 
 /* Canonicalize EXP and FRAC, setting CLS.  */
 static FloatParts64 sf_canonicalize(FloatParts64 part, const FloatFmt *parm,
@@ -851,7 +852,8 @@ static FloatParts64 return_nan(FloatParts64 a, float_status *s)
     if (is_snan(a.cls)) {
         float_raise(float_flag_invalid, s);
         if (!s->default_nan_mode) {
-            return parts_silence_nan(a, s);
+            parts_silence_nan(&a, s);
+            return a;
         }
     } else if (!s->default_nan_mode) {
         return a;
@@ -875,7 +877,7 @@ static FloatParts64 pick_nan(FloatParts64 a, FloatParts64 b, float_status *s)
             a = b;
         }
         if (is_snan(a.cls)) {
-            return parts_silence_nan(a, s);
+            parts_silence_nan(&a, s);
         }
     }
     return a;
@@ -916,7 +918,7 @@ static FloatParts64 pick_nan_muladd(FloatParts64 a, FloatParts64 b, FloatParts64
     }
 
     if (is_snan(a.cls)) {
-        return parts_silence_nan(a, s);
+        parts_silence_nan(&a, s);
     }
     return a;
 }
@@ -3801,7 +3803,7 @@ float16 float16_silence_nan(float16 a, float_status *status)
 
     float16_unpack_raw(&p, a);
     p.frac <<= float16_params.frac_shift;
-    p = parts_silence_nan(p, status);
+    parts_silence_nan(&p, status);
     p.frac >>= float16_params.frac_shift;
     return float16_pack_raw(&p);
 }
@@ -3812,7 +3814,7 @@ float32 float32_silence_nan(float32 a, float_status *status)
 
     float32_unpack_raw(&p, a);
     p.frac <<= float32_params.frac_shift;
-    p = parts_silence_nan(p, status);
+    parts_silence_nan(&p, status);
     p.frac >>= float32_params.frac_shift;
     return float32_pack_raw(&p);
 }
@@ -3823,7 +3825,7 @@ float64 float64_silence_nan(float64 a, float_status *status)
 
     float64_unpack_raw(&p, a);
     p.frac <<= float64_params.frac_shift;
-    p = parts_silence_nan(p, status);
+    parts_silence_nan(&p, status);
     p.frac >>= float64_params.frac_shift;
     return float64_pack_raw(&p);
 }
@@ -3834,7 +3836,7 @@ bfloat16 bfloat16_silence_nan(bfloat16 a, float_status *status)
 
     bfloat16_unpack_raw(&p, a);
     p.frac <<= bfloat16_params.frac_shift;
-    p = parts_silence_nan(p, status);
+    parts_silence_nan(&p, status);
     p.frac >>= bfloat16_params.frac_shift;
     return bfloat16_pack_raw(&p);
 }
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index 085ddea62b..2a1bc66633 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -177,20 +177,25 @@ static void parts64_default_nan(FloatParts64 *p, float_status *status)
 | floating-point parts.
 *----------------------------------------------------------------------------*/
 
-static FloatParts64 parts_silence_nan(FloatParts64 a, float_status *status)
+static uint64_t parts_silence_nan_frac(uint64_t frac, float_status *status)
 {
     g_assert(!no_signaling_nans(status));
     g_assert(!status->default_nan_mode);
 
     /* The only snan_bit_is_one target without default_nan_mode is HPPA. */
     if (snan_bit_is_one(status)) {
-        a.frac &= ~(1ULL << (DECOMPOSED_BINARY_POINT - 1));
-        a.frac |= 1ULL << (DECOMPOSED_BINARY_POINT - 2);
+        frac &= ~(1ULL << (DECOMPOSED_BINARY_POINT - 1));
+        frac |= 1ULL << (DECOMPOSED_BINARY_POINT - 2);
     } else {
-        a.frac |= 1ULL << (DECOMPOSED_BINARY_POINT - 1);
+        frac |= 1ULL << (DECOMPOSED_BINARY_POINT - 1);
     }
-    a.cls = float_class_qnan;
-    return a;
+    return frac;
+}
+
+static void parts64_silence_nan(FloatParts64 *p, float_status *status)
+{
+    p->frac = parts_silence_nan_frac(p->frac, status);
+    p->cls = float_class_qnan;
 }
 
 /*----------------------------------------------------------------------------
-- 
2.25.1



* [PATCH 25/72] softfloat: Rearrange FloatParts64
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (23 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 24/72] softfloat: Use pointers with parts_silence_nan Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-11 13:57   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 26/72] softfloat: Convert float128_silence_nan to parts Richard Henderson
                   ` (51 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Move the fraction to the end of the structure and otherwise sort the
members by size.  Add frac_hi and frac_lo members that alias frac.
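
As a sketch of the aliasing this buys (using the struct as rearranged
below), the single fraction word answers to all three names:

    FloatParts64 p = { .frac = DECOMPOSED_IMPLICIT_BIT };

    /* One word is both the highest and the lowest fraction word. */
    g_assert(p.frac == p.frac_hi);
    g_assert(p.frac == p.frac_lo);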

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 2123453d40..2d6f61ee7a 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -511,10 +511,19 @@ static inline __attribute__((unused)) bool is_qnan(FloatClass c)
  */
 
 typedef struct {
-    uint64_t frac;
-    int32_t  exp;
     FloatClass cls;
     bool sign;
+    int32_t exp;
+    union {
+        /* Routines that know the structure may reference the singular name. */
+        uint64_t frac;
+        /*
+         * Routines expanded with multiple structures reference "hi" and "lo".
+         * In this structure, the one word is both highest and lowest.
+         */
+        uint64_t frac_hi;
+        uint64_t frac_lo;
+    };
 } FloatParts64;
 
 #define DECOMPOSED_BINARY_POINT    63
-- 
2.25.1



* [PATCH 26/72] softfloat: Convert float128_silence_nan to parts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (24 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 25/72] softfloat: Rearrange FloatParts64 Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-13  8:34   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 27/72] softfloat: Convert float128_default_nan " Richard Henderson
                   ` (50 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

This is the minimal change that also introduces float128_params,
float128_unpack_raw, and float128_pack_raw without triggering
unused-symbol warnings under -Werror.
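
Where C11 _Generic is available, the dispatch macro added below
amounts to roughly this sketch (QEMU_GENERIC is the tree's own
generic-selection macro):

    /* The implementation is chosen by the pointer type of the first
       argument; anything else falls back to the 64-bit version. */
    #define parts_silence_nan(P, S)                            \
        _Generic((P),                                          \
                 FloatParts128 *: parts128_silence_nan,        \
                 default:         parts64_silence_nan)(P, S)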

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c                | 96 +++++++++++++++++++++++++++++-----
 fpu/softfloat-specialize.c.inc | 25 +++------
 2 files changed, 89 insertions(+), 32 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 2d6f61ee7a..073b80d502 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -500,14 +500,12 @@ static inline __attribute__((unused)) bool is_qnan(FloatClass c)
 }
 
 /*
- * Structure holding all of the decomposed parts of a float. The
- * exponent is unbiased and the fraction is normalized. All
- * calculations are done with a 64 bit fraction and then rounded as
- * appropriate for the final format.
+ * Structure holding all of the decomposed parts of a float.
+ * The exponent is unbiased and the fraction is normalized.
  *
- * Thanks to the packed FloatClass a decent compiler should be able to
- * fit the whole structure into registers and avoid using the stack
- * for parameter passing.
+ * The fraction words are stored in big-endian word ordering,
+ * so that truncation from a larger format to a smaller format
+ * can be done simply by ignoring subsequent elements.
  */
 
 typedef struct {
@@ -526,6 +524,15 @@ typedef struct {
     };
 } FloatParts64;
 
+typedef struct {
+    FloatClass cls;
+    bool sign;
+    int32_t exp;
+    uint64_t frac_hi;
+    uint64_t frac_lo;
+} FloatParts128;
+
+/* These apply to the most significant word of each FloatPartsN. */
 #define DECOMPOSED_BINARY_POINT    63
 #define DECOMPOSED_IMPLICIT_BIT    (1ull << DECOMPOSED_BINARY_POINT)
 
@@ -561,11 +568,11 @@ typedef struct {
     .exp_bias       = ((1 << E) - 1) >> 1,                           \
     .exp_max        = (1 << E) - 1,                                  \
     .frac_size      = F,                                             \
-    .frac_shift     = DECOMPOSED_BINARY_POINT - F,                   \
-    .frac_lsb       = 1ull << (DECOMPOSED_BINARY_POINT - F),         \
-    .frac_lsbm1     = 1ull << ((DECOMPOSED_BINARY_POINT - F) - 1),   \
-    .round_mask     = (1ull << (DECOMPOSED_BINARY_POINT - F)) - 1,   \
-    .roundeven_mask = (2ull << (DECOMPOSED_BINARY_POINT - F)) - 1
+    .frac_shift     = (-F - 1) & 63,                                 \
+    .frac_lsb       = 1ull << ((-F - 1) & 63),                       \
+    .frac_lsbm1     = 1ull << ((-F - 2) & 63),                       \
+    .round_mask     = (1ull << ((-F - 1) & 63)) - 1,                 \
+    .roundeven_mask = (2ull << ((-F - 1) & 63)) - 1
 
 static const FloatFmt float16_params = {
     FLOAT_PARAMS(5, 10)
@@ -588,6 +595,10 @@ static const FloatFmt float64_params = {
     FLOAT_PARAMS(11, 52)
 };
 
+static const FloatFmt float128_params = {
+    FLOAT_PARAMS(15, 112)
+};
+
 /* Unpack a float to parts, but do not canonicalize.  */
 static void unpack_raw64(FloatParts64 *r, const FloatFmt *fmt, uint64_t raw)
 {
@@ -622,6 +633,20 @@ static inline void float64_unpack_raw(FloatParts64 *p, float64 f)
     unpack_raw64(p, &float64_params, f);
 }
 
+static void float128_unpack_raw(FloatParts128 *p, float128 f)
+{
+    const int f_size = float128_params.frac_size - 64;
+    const int e_size = float128_params.exp_size;
+
+    *p = (FloatParts128) {
+        .cls = float_class_unclassified,
+        .sign = extract64(f.high, f_size + e_size, 1),
+        .exp = extract64(f.high, f_size, e_size),
+        .frac_hi = extract64(f.high, 0, f_size),
+        .frac_lo = f.low,
+    };
+}
+
 /* Pack a float from parts, but do not canonicalize.  */
 static uint64_t pack_raw64(const FloatParts64 *p, const FloatFmt *fmt)
 {
@@ -655,6 +680,18 @@ static inline float64 float64_pack_raw(const FloatParts64 *p)
     return make_float64(pack_raw64(p, &float64_params));
 }
 
+static float128 float128_pack_raw(const FloatParts128 *p)
+{
+    const int f_size = float128_params.frac_size - 64;
+    const int e_size = float128_params.exp_size;
+    uint64_t hi;
+
+    hi = (uint64_t)p->sign << (f_size + e_size);
+    hi = deposit64(hi, f_size, e_size, p->exp);
+    hi = deposit64(hi, 0, f_size, p->frac_hi);
+    return make_float128(hi, p->frac_lo);
+}
+
 /*----------------------------------------------------------------------------
 | Functions and definitions to determine:  (1) whether tininess for underflow
 | is detected before or after rounding by default, (2) what (if anything)
@@ -665,8 +702,30 @@ static inline float64 float64_pack_raw(const FloatParts64 *p)
 *----------------------------------------------------------------------------*/
 #include "softfloat-specialize.c.inc"
 
+#define PARTS_GENERIC_64_128(NAME, P) \
+    QEMU_GENERIC(P, (FloatParts128 *, parts128_##NAME), parts64_##NAME)
+
 #define parts_default_nan  parts64_default_nan
-#define parts_silence_nan  parts64_silence_nan
+#define parts_silence_nan(P, S)    PARTS_GENERIC_64_128(silence_nan, P)(P, S)
+
+
+/*
+ * Helper functions for softfloat-parts.c.inc, per-size operations.
+ */
+
+static void frac128_shl(FloatParts128 *a, int c)
+{
+    shift128Left(a->frac_hi, a->frac_lo, c, &a->frac_hi, &a->frac_lo);
+}
+
+#define frac_shl(A, C)             frac128_shl(A, C)
+
+static void frac128_shr(FloatParts128 *a, int c)
+{
+    shift128Right(a->frac_hi, a->frac_lo, c, &a->frac_hi, &a->frac_lo);
+}
+
+#define frac_shr(A, C)             frac128_shr(A, C)
 
 /* Canonicalize EXP and FRAC, setting CLS.  */
 static FloatParts64 sf_canonicalize(FloatParts64 part, const FloatFmt *parm,
@@ -3850,6 +3909,17 @@ bfloat16 bfloat16_silence_nan(bfloat16 a, float_status *status)
     return bfloat16_pack_raw(&p);
 }
 
+float128 float128_silence_nan(float128 a, float_status *status)
+{
+    FloatParts128 p;
+
+    float128_unpack_raw(&p, a);
+    frac_shl(&p, float128_params.frac_shift);
+    parts_silence_nan(&p, status);
+    frac_shr(&p, float128_params.frac_shift);
+    return float128_pack_raw(&p);
+}
+
 /*----------------------------------------------------------------------------
 | If `a' is denormal and we are in flush-to-zero mode then set the
 | input-denormal exception and return zero. Otherwise just return the value.
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index 2a1bc66633..d892016f0f 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -198,6 +198,12 @@ static void parts64_silence_nan(FloatParts64 *p, float_status *status)
     p->cls = float_class_qnan;
 }
 
+static void parts128_silence_nan(FloatParts128 *p, float_status *status)
+{
+    p->frac_hi = parts_silence_nan_frac(p->frac_hi, status);
+    p->cls = float_class_qnan;
+}
+
 /*----------------------------------------------------------------------------
 | The pattern for a default generated extended double-precision NaN.
 *----------------------------------------------------------------------------*/
@@ -1057,25 +1063,6 @@ bool float128_is_signaling_nan(float128 a, float_status *status)
     }
 }
 
-/*----------------------------------------------------------------------------
-| Returns a quiet NaN from a signalling NaN for the quadruple-precision
-| floating point value `a'.
-*----------------------------------------------------------------------------*/
-
-float128 float128_silence_nan(float128 a, float_status *status)
-{
-    if (no_signaling_nans(status)) {
-        g_assert_not_reached();
-    } else {
-        if (snan_bit_is_one(status)) {
-            return float128_default_nan(status);
-        } else {
-            a.high |= UINT64_C(0x0000800000000000);
-            return a;
-        }
-    }
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of converting the quadruple-precision floating-point NaN
 | `a' to the canonical NaN format.  If `a' is a signaling NaN, the invalid
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 27/72] softfloat: Convert float128_default_nan to parts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (25 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 26/72] softfloat: Convert float128_silence_nan to parts Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-13  8:56   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 28/72] softfloat: Move return_nan to softfloat-parts.c.inc Richard Henderson
                   ` (49 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c                | 17 ++++-------------
 fpu/softfloat-specialize.c.inc | 19 +++++++++++++++++++
 2 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 073b80d502..6d5392aeab 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -705,7 +705,7 @@ static float128 float128_pack_raw(const FloatParts128 *p)
 #define PARTS_GENERIC_64_128(NAME, P) \
     QEMU_GENERIC(P, (FloatParts128 *, parts128_##NAME), parts64_##NAME)
 
-#define parts_default_nan  parts64_default_nan
+#define parts_default_nan(P, S)    PARTS_GENERIC_64_128(default_nan, P)(P, S)
 #define parts_silence_nan(P, S)    PARTS_GENERIC_64_128(silence_nan, P)(P, S)
 
 
@@ -3836,20 +3836,11 @@ float64 float64_default_nan(float_status *status)
 
 float128 float128_default_nan(float_status *status)
 {
-    FloatParts64 p;
-    float128 r;
+    FloatParts128 p;
 
     parts_default_nan(&p, status);
-    /* Extrapolate from the choices made by parts_default_nan to fill
-     * in the quad-floating format.  If the low bit is set, assume we
-     * want to set all non-snan bits.
-     */
-    r.low = -(p.frac & 1);
-    r.high = p.frac >> (DECOMPOSED_BINARY_POINT - 48);
-    r.high |= UINT64_C(0x7FFF000000000000);
-    r.high |= (uint64_t)p.sign << 63;
-
-    return r;
+    frac_shr(&p, float128_params.frac_shift);
+    return float128_pack_raw(&p);
 }
 
 bfloat16 bfloat16_default_nan(float_status *status)
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index d892016f0f..a0cf016b4f 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -172,6 +172,25 @@ static void parts64_default_nan(FloatParts64 *p, float_status *status)
     };
 }
 
+static void parts128_default_nan(FloatParts128 *p, float_status *status)
+{
+    /*
+     * Extrapolate from the choices made by parts64_default_nan to fill
+     * in the quad-floating format.  If the low bit is set, assume we
+     * want to set all non-snan bits.
+     */
+    FloatParts64 p64;
+    parts64_default_nan(&p64, status);
+
+    *p = (FloatParts128) {
+        .cls = float_class_qnan,
+        .sign = p64.sign,
+        .exp = INT_MAX,
+        .frac_hi = p64.frac,
+        .frac_lo = -(p64.frac & 1)
+    };
+}
+
 /*----------------------------------------------------------------------------
 | Returns a quiet NaN from a signalling NaN for the deconstructed
 | floating-point parts.
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 28/72] softfloat: Move return_nan to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (26 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 27/72] softfloat: Convert float128_default_nan " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-12 18:10   ` David Hildenbrand
  2021-05-08  1:47 ` [PATCH 29/72] softfloat: Move pick_nan " Richard Henderson
                   ` (48 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

At the same time, convert to pointers, rename to parts$N_return_nan
and define a macro for parts_return_nan using QEMU_GENERIC.
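
On the caller side the by-value helper becomes an in-place update
through a pointer, and the same spelling now covers both sizes; a
before/after sketch (illustration only):

    FloatParts64 a;

    /* before: */  a = return_nan(a, s);
    /* after:  */  parts_return_nan(&a, s);  /* QEMU_GENERIC resolves
                                                to parts64_return_nan */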

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 45 ++++++++++++++++++++++-----------------
 fpu/softfloat-parts.c.inc | 37 ++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 20 deletions(-)
 create mode 100644 fpu/softfloat-parts.c.inc

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 6d5392aeab..5b26696078 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -708,6 +708,10 @@ static float128 float128_pack_raw(const FloatParts128 *p)
 #define parts_default_nan(P, S)    PARTS_GENERIC_64_128(default_nan, P)(P, S)
 #define parts_silence_nan(P, S)    PARTS_GENERIC_64_128(silence_nan, P)(P, S)
 
+static void parts64_return_nan(FloatParts64 *a, float_status *s);
+static void parts128_return_nan(FloatParts128 *a, float_status *s);
+
+#define parts_return_nan(P, S)     PARTS_GENERIC_64_128(return_nan, P)(P, S)
 
 /*
  * Helper functions for softfloat-parts.c.inc, per-size operations.
@@ -914,22 +918,6 @@ static FloatParts64 round_canonical(FloatParts64 p, float_status *s,
     return p;
 }
 
-static FloatParts64 return_nan(FloatParts64 a, float_status *s)
-{
-    g_assert(is_nan(a.cls));
-    if (is_snan(a.cls)) {
-        float_raise(float_flag_invalid, s);
-        if (!s->default_nan_mode) {
-            parts_silence_nan(&a, s);
-            return a;
-        }
-    } else if (!s->default_nan_mode) {
-        return a;
-    }
-    parts_default_nan(&a, s);
-    return a;
-}
-
 static FloatParts64 pick_nan(FloatParts64 a, FloatParts64 b, float_status *s)
 {
     if (is_snan(a.cls) || is_snan(b.cls)) {
@@ -991,6 +979,21 @@ static FloatParts64 pick_nan_muladd(FloatParts64 a, FloatParts64 b, FloatParts64
     return a;
 }
 
+#define partsN(NAME)   parts64_##NAME
+#define FloatPartsN    FloatParts64
+
+#include "softfloat-parts.c.inc"
+
+#undef  partsN
+#undef  FloatPartsN
+#define partsN(NAME)   parts128_##NAME
+#define FloatPartsN    FloatParts128
+
+#include "softfloat-parts.c.inc"
+
+#undef  partsN
+#undef  FloatPartsN
+
 /*
  * Pack/unpack routines with a specific FloatFmt.
  */
@@ -2065,7 +2068,7 @@ static FloatParts64 float_to_float(FloatParts64 a, const FloatFmt *dstf,
             break;
         }
     } else if (is_nan(a.cls)) {
-        return return_nan(a, s);
+        parts_return_nan(&a, s);
     }
     return a;
 }
@@ -2194,7 +2197,8 @@ static FloatParts64 round_to_int(FloatParts64 a, FloatRoundMode rmode,
     switch (a.cls) {
     case float_class_qnan:
     case float_class_snan:
-        return return_nan(a, s);
+        parts_return_nan(&a, s);
+        break;
 
     case float_class_zero:
     case float_class_inf:
@@ -3590,7 +3594,7 @@ FloatRelation bfloat16_compare_quiet(bfloat16 a, bfloat16 b, float_status *s)
 static FloatParts64 scalbn_decomposed(FloatParts64 a, int n, float_status *s)
 {
     if (unlikely(is_nan(a.cls))) {
-        return return_nan(a, s);
+        parts_return_nan(&a, s);
     }
     if (a.cls == float_class_normal) {
         /* The largest float type (even though not supported by FloatParts64)
@@ -3658,7 +3662,8 @@ static FloatParts64 sqrt_float(FloatParts64 a, float_status *s, const FloatFmt *
     int bit, last_bit;
 
     if (is_nan(a.cls)) {
-        return return_nan(a, s);
+        parts_return_nan(&a, s);
+        return a;
     }
     if (a.cls == float_class_zero) {
         return a;  /* sqrt(+-0) = +-0 */
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
new file mode 100644
index 0000000000..2a3075d6fe
--- /dev/null
+++ b/fpu/softfloat-parts.c.inc
@@ -0,0 +1,37 @@
+/*
+ * QEMU float support
+ *
+ * The code in this source file is derived from release 2a of the SoftFloat
+ * IEC/IEEE Floating-point Arithmetic Package. Those parts of the code (and
+ * some later contributions) are provided under that license, as detailed below.
+ * It has subsequently been modified by contributors to the QEMU Project,
+ * so some portions are provided under:
+ *  the SoftFloat-2a license
+ *  the BSD license
+ *  GPL-v2-or-later
+ *
+ * Any future contributions to this file after December 1st 2014 will be
+ * taken to be licensed under the Softfloat-2a license unless specifically
+ * indicated otherwise.
+ */
+
+static void partsN(return_nan)(FloatPartsN *a, float_status *s)
+{
+    switch (a->cls) {
+    case float_class_snan:
+        float_raise(float_flag_invalid, s);
+        if (s->default_nan_mode) {
+            parts_default_nan(a, s);
+        } else {
+            parts_silence_nan(a, s);
+        }
+        break;
+    case float_class_qnan:
+        if (s->default_nan_mode) {
+            parts_default_nan(a, s);
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 29/72] softfloat: Move pick_nan to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (27 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 28/72] softfloat: Move return_nan to softfloat-parts.c.inc Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-12 18:16   ` David Hildenbrand
  2021-05-08  1:47 ` [PATCH 30/72] softfloat: Move pick_nan_muladd " Richard Henderson
                   ` (47 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

At the same time, convert to pointers, rename to parts$N_pick_nan
and define a macro for parts_pick_nan using QEMU_GENERIC.
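
This patch also introduces FRAC_GENERIC_64_128, the same dispatch
idiom applied to the per-size fraction helpers.  A usage sketch
(illustration only):

    FloatParts64  a64,  b64;
    FloatParts128 a128, b128;
    ...
    frac_cmp(&a64,  &b64);    /* resolves to frac64_cmp  */
    frac_cmp(&a128, &b128);   /* resolves to frac128_cmp */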

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 62 ++++++++++++++++++++++-----------------
 fpu/softfloat-parts.c.inc | 25 ++++++++++++++++
 2 files changed, 60 insertions(+), 27 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 5b26696078..77efaedeaa 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -713,10 +713,39 @@ static void parts128_return_nan(FloatParts128 *a, float_status *s);
 
 #define parts_return_nan(P, S)     PARTS_GENERIC_64_128(return_nan, P)(P, S)
 
+static FloatParts64 *parts64_pick_nan(FloatParts64 *a, FloatParts64 *b,
+                                      float_status *s);
+static FloatParts128 *parts128_pick_nan(FloatParts128 *a, FloatParts128 *b,
+                                        float_status *s);
+
+#define parts_pick_nan(A, B, S)    PARTS_GENERIC_64_128(pick_nan, A)(A, B, S)
+
 /*
  * Helper functions for softfloat-parts.c.inc, per-size operations.
  */
 
+#define FRAC_GENERIC_64_128(NAME, P) \
+    QEMU_GENERIC(P, (FloatParts128 *, frac128_##NAME), frac64_##NAME)
+
+static int frac64_cmp(FloatParts64 *a, FloatParts64 *b)
+{
+    return a->frac == b->frac ? 0 : a->frac < b->frac ? -1 : 1;
+}
+
+static int frac128_cmp(FloatParts128 *a, FloatParts128 *b)
+{
+    uint64_t ta = a->frac_hi, tb = b->frac_hi;
+    if (ta == tb) {
+        ta = a->frac_lo, tb = b->frac_lo;
+        if (ta == tb) {
+            return 0;
+        }
+    }
+    return ta < tb ? -1 : 1;
+}
+
+#define frac_cmp(A, B)  FRAC_GENERIC_64_128(cmp, A)(A, B)
+
 static void frac128_shl(FloatParts128 *a, int c)
 {
     shift128Left(a->frac_hi, a->frac_lo, c, &a->frac_hi, &a->frac_lo);
@@ -918,27 +947,6 @@ static FloatParts64 round_canonical(FloatParts64 p, float_status *s,
     return p;
 }
 
-static FloatParts64 pick_nan(FloatParts64 a, FloatParts64 b, float_status *s)
-{
-    if (is_snan(a.cls) || is_snan(b.cls)) {
-        float_raise(float_flag_invalid, s);
-    }
-
-    if (s->default_nan_mode) {
-        parts_default_nan(&a, s);
-    } else {
-        if (pickNaN(a.cls, b.cls,
-                    a.frac > b.frac ||
-                    (a.frac == b.frac && a.sign < b.sign), s)) {
-            a = b;
-        }
-        if (is_snan(a.cls)) {
-            parts_silence_nan(&a, s);
-        }
-    }
-    return a;
-}
-
 static FloatParts64 pick_nan_muladd(FloatParts64 a, FloatParts64 b, FloatParts64 c,
                                   bool inf_zero, float_status *s)
 {
@@ -1106,7 +1114,7 @@ static FloatParts64 addsub_floats(FloatParts64 a, FloatParts64 b, bool subtract,
             return a;
         }
         if (is_nan(a.cls) || is_nan(b.cls)) {
-            return pick_nan(a, b, s);
+            return *parts_pick_nan(&a, &b, s);
         }
         if (a.cls == float_class_inf) {
             if (b.cls == float_class_inf) {
@@ -1144,7 +1152,7 @@ static FloatParts64 addsub_floats(FloatParts64 a, FloatParts64 b, bool subtract,
             return a;
         }
         if (is_nan(a.cls) || is_nan(b.cls)) {
-            return pick_nan(a, b, s);
+            return *parts_pick_nan(&a, &b, s);
         }
         if (a.cls == float_class_inf || b.cls == float_class_zero) {
             return a;
@@ -1360,7 +1368,7 @@ static FloatParts64 mul_floats(FloatParts64 a, FloatParts64 b, float_status *s)
     }
     /* handle all the NaN cases */
     if (is_nan(a.cls) || is_nan(b.cls)) {
-        return pick_nan(a, b, s);
+        return *parts_pick_nan(&a, &b, s);
     }
     /* Inf * Zero == NaN */
     if ((a.cls == float_class_inf && b.cls == float_class_zero) ||
@@ -1887,7 +1895,7 @@ static FloatParts64 div_floats(FloatParts64 a, FloatParts64 b, float_status *s)
     }
     /* handle all the NaN cases */
     if (is_nan(a.cls) || is_nan(b.cls)) {
-        return pick_nan(a, b, s);
+        return *parts_pick_nan(&a, &b, s);
     }
     /* 0/0 or Inf/Inf */
     if (a.cls == b.cls
@@ -3295,14 +3303,14 @@ static FloatParts64 minmax_floats(FloatParts64 a, FloatParts64 b, bool ismin,
              * the invalid exception is raised.
              */
             if (is_snan(a.cls) || is_snan(b.cls)) {
-                return pick_nan(a, b, s);
+                return *parts_pick_nan(&a, &b, s);
             } else if (is_nan(a.cls) && !is_nan(b.cls)) {
                 return b;
             } else if (is_nan(b.cls) && !is_nan(a.cls)) {
                 return a;
             }
         }
-        return pick_nan(a, b, s);
+        return *parts_pick_nan(&a, &b, s);
     } else {
         int a_exp, b_exp;
 
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 2a3075d6fe..11a71650f7 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -35,3 +35,28 @@ static void partsN(return_nan)(FloatPartsN *a, float_status *s)
         g_assert_not_reached();
     }
 }
+
+static FloatPartsN *partsN(pick_nan)(FloatPartsN *a, FloatPartsN *b,
+                                     float_status *s)
+{
+    if (is_snan(a->cls) || is_snan(b->cls)) {
+        float_raise(float_flag_invalid, s);
+    }
+
+    if (s->default_nan_mode) {
+        parts_default_nan(a, s);
+    } else {
+        int cmp = frac_cmp(a, b);
+        if (cmp == 0) {
+            cmp = a->sign < b->sign;
+        }
+
+        if (pickNaN(a->cls, b->cls, cmp > 0, s)) {
+            a = b;
+        }
+        if (is_snan(a->cls)) {
+            parts_silence_nan(a, s);
+        }
+    }
+    return a;
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 30/72] softfloat: Move pick_nan_muladd to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (28 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 29/72] softfloat: Move pick_nan " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-12 18:18   ` David Hildenbrand
  2021-05-08  1:47 ` [PATCH 31/72] softfloat: Move sf_canonicalize " Richard Henderson
                   ` (46 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

At the same time, convert to pointers, rename to parts$N_pick_nan_muladd
and define a macro for parts_pick_nan_muladd using QEMU_GENERIC.
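
The new ab_mask/abc_mask parameters fold the operand classes into
bitmasks (see "softfloat: Add float_cmask and constants" earlier in
the series), so multi-way class tests collapse into single mask
operations.  A sketch of how a caller forms them (the helper name
float_cmask is assumed from that earlier patch):

    ab_mask  = float_cmask(a.cls) | float_cmask(b.cls);
    abc_mask = float_cmask(c.cls) | ab_mask;

    if (unlikely(abc_mask & float_cmask_snan)) { ... }  /* any sNaN? */
    if (ab_mask == float_cmask_infzero) { ... }         /* Inf * 0?  */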

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 53 ++++++++++-----------------------------
 fpu/softfloat-parts.c.inc | 40 +++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 40 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 77efaedeaa..40ee294e35 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -720,6 +720,18 @@ static FloatParts128 *parts128_pick_nan(FloatParts128 *a, FloatParts128 *b,
 
 #define parts_pick_nan(A, B, S)    PARTS_GENERIC_64_128(pick_nan, A)(A, B, S)
 
+static FloatParts64 *parts64_pick_nan_muladd(FloatParts64 *a, FloatParts64 *b,
+                                             FloatParts64 *c, float_status *s,
+                                             int ab_mask, int abc_mask);
+static FloatParts128 *parts128_pick_nan_muladd(FloatParts128 *a,
+                                               FloatParts128 *b,
+                                               FloatParts128 *c,
+                                               float_status *s,
+                                               int ab_mask, int abc_mask);
+
+#define parts_pick_nan_muladd(A, B, C, S, ABM, ABCM) \
+    PARTS_GENERIC_64_128(pick_nan_muladd, A)(A, B, C, S, ABM, ABCM)
+
 /*
  * Helper functions for softfloat-parts.c.inc, per-size operations.
  */
@@ -947,45 +959,6 @@ static FloatParts64 round_canonical(FloatParts64 p, float_status *s,
     return p;
 }
 
-static FloatParts64 pick_nan_muladd(FloatParts64 a, FloatParts64 b, FloatParts64 c,
-                                  bool inf_zero, float_status *s)
-{
-    int which;
-
-    if (is_snan(a.cls) || is_snan(b.cls) || is_snan(c.cls)) {
-        float_raise(float_flag_invalid, s);
-    }
-
-    which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);
-
-    if (s->default_nan_mode) {
-        /* Note that this check is after pickNaNMulAdd so that function
-         * has an opportunity to set the Invalid flag.
-         */
-        which = 3;
-    }
-
-    switch (which) {
-    case 0:
-        break;
-    case 1:
-        a = b;
-        break;
-    case 2:
-        a = c;
-        break;
-    case 3:
-        parts_default_nan(&a, s);
-        break;
-    default:
-        g_assert_not_reached();
-    }
-
-    if (is_snan(a.cls)) {
-        parts_silence_nan(&a, s);
-    }
-    return a;
-}
 
 #define partsN(NAME)   parts64_##NAME
 #define FloatPartsN    FloatParts64
@@ -1496,7 +1469,7 @@ static FloatParts64 muladd_floats(FloatParts64 a, FloatParts64 b, FloatParts64 c
      * off to the target-specific pick-a-NaN routine.
      */
     if (unlikely(abc_mask & float_cmask_anynan)) {
-        return pick_nan_muladd(a, b, c, inf_zero, s);
+        return *parts_pick_nan_muladd(&a, &b, &c, s, ab_mask, abc_mask);
     }
 
     if (inf_zero) {
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 11a71650f7..a78d61ea07 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -60,3 +60,43 @@ static FloatPartsN *partsN(pick_nan)(FloatPartsN *a, FloatPartsN *b,
     }
     return a;
 }
+
+static FloatPartsN *partsN(pick_nan_muladd)(FloatPartsN *a, FloatPartsN *b,
+                                            FloatPartsN *c, float_status *s,
+                                            int ab_mask, int abc_mask)
+{
+    int which;
+
+    if (unlikely(abc_mask & float_cmask_snan)) {
+        float_raise(float_flag_invalid, s);
+    }
+
+    which = pickNaNMulAdd(a->cls, b->cls, c->cls,
+                          ab_mask == float_cmask_infzero, s);
+
+    if (s->default_nan_mode || which == 3) {
+        /*
+         * Note that this check is after pickNaNMulAdd so that function
+         * has an opportunity to set the Invalid flag for infzero.
+         */
+        parts_default_nan(a, s);
+        return a;
+    }
+
+    switch (which) {
+    case 0:
+        break;
+    case 1:
+        a = b;
+        break;
+    case 2:
+        a = c;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    if (is_snan(a->cls)) {
+        parts_silence_nan(a, s);
+    }
+    return a;
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 31/72] softfloat: Move sf_canonicalize to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (29 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 30/72] softfloat: Move pick_nan_muladd " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-13  9:45   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 32/72] softfloat: Move round_canonical " Richard Henderson
                   ` (45 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

At the same time, convert to pointers, rename to parts$N_canonicalize
and define a macro for parts_canonicalize using QEMU_GENERIC.
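
As a worked example of what canonicalization produces (float32 with
value 2.0f; illustration only, constants from FLOAT_PARAMS(8, 23)):

    raw bits 0x40000000  ->  sign 0, exp 128, frac field 0
    canonicalize:
        cls  = float_class_normal
        exp  = 128 - 127 (exp_bias)  = 1
        frac = (0 << 40 /* frac_shift */) | DECOMPOSED_IMPLICIT_BIT
             = 0x8000000000000000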

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 117 +++++++++++++++++++++++++-------------
 fpu/softfloat-parts.c.inc |  33 +++++++++++
 2 files changed, 112 insertions(+), 38 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 40ee294e35..bad4b54cd2 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -732,6 +732,14 @@ static FloatParts128 *parts128_pick_nan_muladd(FloatParts128 *a,
 #define parts_pick_nan_muladd(A, B, C, S, ABM, ABCM) \
     PARTS_GENERIC_64_128(pick_nan_muladd, A)(A, B, C, S, ABM, ABCM)
 
+static void parts64_canonicalize(FloatParts64 *p, float_status *status,
+                                 const FloatFmt *fmt);
+static void parts128_canonicalize(FloatParts128 *p, float_status *status,
+                                  const FloatFmt *fmt);
+
+#define parts_canonicalize(A, S, F) \
+    PARTS_GENERIC_64_128(canonicalize, A)(A, S, F)
+
 /*
  * Helper functions for softfloat-parts.c.inc, per-size operations.
  */
@@ -758,52 +766,85 @@ static int frac128_cmp(FloatParts128 *a, FloatParts128 *b)
 
 #define frac_cmp(A, B)  FRAC_GENERIC_64_128(cmp, A)(A, B)
 
+static void frac64_clear(FloatParts64 *a)
+{
+    a->frac = 0;
+}
+
+static void frac128_clear(FloatParts128 *a)
+{
+    a->frac_hi = a->frac_lo = 0;
+}
+
+#define frac_clear(A)  FRAC_GENERIC_64_128(clear, A)(A)
+
+static bool frac64_eqz(FloatParts64 *a)
+{
+    return a->frac == 0;
+}
+
+static bool frac128_eqz(FloatParts128 *a)
+{
+    return (a->frac_hi | a->frac_lo) == 0;
+}
+
+#define frac_eqz(A)  FRAC_GENERIC_64_128(eqz, A)(A)
+
+static int frac64_normalize(FloatParts64 *a)
+{
+    if (a->frac) {
+        int shift = clz64(a->frac);
+        a->frac <<= shift;
+        return shift;
+    }
+    return 64;
+}
+
+static int frac128_normalize(FloatParts128 *a)
+{
+    if (a->frac_hi) {
+        int shl = clz64(a->frac_hi);
+        if (shl) {
+            int shr = 64 - shl;
+            a->frac_hi = (a->frac_hi << shl) | (a->frac_lo >> shr);
+            a->frac_lo = (a->frac_lo << shl);
+        }
+        return shl;
+    } else if (a->frac_lo) {
+        int shl = clz64(a->frac_lo);
+        a->frac_hi = (a->frac_lo << shl);
+        a->frac_lo = 0;
+        return shl + 64;
+    }
+    return 128;
+}
+
+#define frac_normalize(A)  FRAC_GENERIC_64_128(normalize, A)(A)
+
+static void frac64_shl(FloatParts64 *a, int c)
+{
+    a->frac <<= c;
+}
+
 static void frac128_shl(FloatParts128 *a, int c)
 {
     shift128Left(a->frac_hi, a->frac_lo, c, &a->frac_hi, &a->frac_lo);
 }
 
-#define frac_shl(A, C)             frac128_shl(A, C)
+#define frac_shl(A, C)  FRAC_GENERIC_64_128(shl, A)(A, C)
+
+static void frac64_shr(FloatParts64 *a, int c)
+{
+    a->frac >>= c;
+}
 
 static void frac128_shr(FloatParts128 *a, int c)
 {
     shift128Right(a->frac_hi, a->frac_lo, c, &a->frac_hi, &a->frac_lo);
 }
 
-#define frac_shr(A, C)             frac128_shr(A, C)
+#define frac_shr(A, C)  FRAC_GENERIC_64_128(shr, A)(A, C)
 
-/* Canonicalize EXP and FRAC, setting CLS.  */
-static FloatParts64 sf_canonicalize(FloatParts64 part, const FloatFmt *parm,
-                                  float_status *status)
-{
-    if (part.exp == parm->exp_max && !parm->arm_althp) {
-        if (part.frac == 0) {
-            part.cls = float_class_inf;
-        } else {
-            part.frac <<= parm->frac_shift;
-            part.cls = (parts_is_snan_frac(part.frac, status)
-                        ? float_class_snan : float_class_qnan);
-        }
-    } else if (part.exp == 0) {
-        if (likely(part.frac == 0)) {
-            part.cls = float_class_zero;
-        } else if (status->flush_inputs_to_zero) {
-            float_raise(float_flag_input_denormal, status);
-            part.cls = float_class_zero;
-            part.frac = 0;
-        } else {
-            int shift = clz64(part.frac);
-            part.cls = float_class_normal;
-            part.exp = parm->frac_shift - parm->exp_bias - shift + 1;
-            part.frac <<= shift;
-        }
-    } else {
-        part.cls = float_class_normal;
-        part.exp -= parm->exp_bias;
-        part.frac = DECOMPOSED_IMPLICIT_BIT + (part.frac << parm->frac_shift);
-    }
-    return part;
-}
 
 /* Round and uncanonicalize a floating-point number by parts. There
  * are FRAC_SHIFT bits that may require rounding at the bottom of the
@@ -983,7 +1024,7 @@ static void float16a_unpack_canonical(FloatParts64 *p, float16 f,
                                       float_status *s, const FloatFmt *params)
 {
     float16_unpack_raw(p, f);
-    *p = sf_canonicalize(*p, params, s);
+    parts_canonicalize(p, s, params);
 }
 
 static void float16_unpack_canonical(FloatParts64 *p, float16 f,
@@ -996,7 +1037,7 @@ static void bfloat16_unpack_canonical(FloatParts64 *p, bfloat16 f,
                                       float_status *s)
 {
     bfloat16_unpack_raw(p, f);
-    *p = sf_canonicalize(*p, &bfloat16_params, s);
+    parts_canonicalize(p, s, &bfloat16_params);
 }
 
 static float16 float16a_round_pack_canonical(FloatParts64 *p,
@@ -1024,7 +1065,7 @@ static void float32_unpack_canonical(FloatParts64 *p, float32 f,
                                      float_status *s)
 {
     float32_unpack_raw(p, f);
-    *p = sf_canonicalize(*p, &float32_params, s);
+    parts_canonicalize(p, s, &float32_params);
 }
 
 static float32 float32_round_pack_canonical(FloatParts64 *p,
@@ -1038,7 +1079,7 @@ static void float64_unpack_canonical(FloatParts64 *p, float64 f,
                                      float_status *s)
 {
     float64_unpack_raw(p, f);
-    *p = sf_canonicalize(*p, &float64_params, s);
+    parts_canonicalize(p, s, &float64_params);
 }
 
 static float64 float64_round_pack_canonical(FloatParts64 *p,
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index a78d61ea07..25bf99bd0f 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -100,3 +100,36 @@ static FloatPartsN *partsN(pick_nan_muladd)(FloatPartsN *a, FloatPartsN *b,
     }
     return a;
 }
+
+/*
+ * Canonicalize the FloatParts structure.  Determine the class,
+ * unbias the exponent, and normalize the fraction.
+ */
+static void partsN(canonicalize)(FloatPartsN *p, float_status *status,
+                                 const FloatFmt *fmt)
+{
+    if (unlikely(p->exp == 0)) {
+        if (likely(frac_eqz(p))) {
+            p->cls = float_class_zero;
+        } else if (status->flush_inputs_to_zero) {
+            float_raise(float_flag_input_denormal, status);
+            p->cls = float_class_zero;
+            frac_clear(p);
+        } else {
+            int shift = frac_normalize(p);
+            p->cls = float_class_normal;
+            p->exp = fmt->frac_shift - fmt->exp_bias - shift + 1;
+        }
+    } else if (likely(p->exp < fmt->exp_max) || fmt->arm_althp) {
+        p->cls = float_class_normal;
+        p->exp -= fmt->exp_bias;
+        frac_shl(p, fmt->frac_shift);
+        p->frac_hi |= DECOMPOSED_IMPLICIT_BIT;
+    } else if (likely(frac_eqz(p))) {
+        p->cls = float_class_inf;
+    } else {
+        frac_shl(p, fmt->frac_shift);
+        p->cls = (parts_is_snan_frac(p->frac_hi, status)
+                  ? float_class_snan : float_class_qnan);
+    }
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 32/72] softfloat: Move round_canonical to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (30 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 31/72] softfloat: Move sf_canonicalize " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-13  9:53   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 33/72] softfloat: Use uadd64_carry, usub64_borrow in softfloat-macros.h Richard Henderson
                   ` (44 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

At the same time, convert to pointers, rename to parts$N_uncanon,
and define a macro for parts_uncanon using QEMU_GENERIC.
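
Uncanon is the inverse of canonicalize: re-bias the exponent, round
away the low frac_shift bits, and shift the fraction back down.
Continuing the float32 2.0f example from the previous patch
(illustration only):

    uncanon:
        exp  = 1 + 127 (exp_bias)                    = 128
        frac = 0x8000000000000000 >> 40 (frac_shift) = 0x800000
    pack_raw then deposits only frac_size (23) bits, dropping the
    implicit bit:
        0 << 31 | 128 << 23 | (0x800000 & 0x7fffff)  = 0x40000000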

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 201 +++++++++-----------------------------
 fpu/softfloat-parts.c.inc | 148 ++++++++++++++++++++++++++++
 2 files changed, 193 insertions(+), 156 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index bad4b54cd2..e9d644385d 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -740,6 +740,14 @@ static void parts128_canonicalize(FloatParts128 *p, float_status *status,
 #define parts_canonicalize(A, S, F) \
     PARTS_GENERIC_64_128(canonicalize, A)(A, S, F)
 
+static void parts64_uncanon(FloatParts64 *p, float_status *status,
+                            const FloatFmt *fmt);
+static void parts128_uncanon(FloatParts128 *p, float_status *status,
+                             const FloatFmt *fmt);
+
+#define parts_uncanon(A, S, F) \
+    PARTS_GENERIC_64_128(uncanon, A)(A, S, F)
+
 /*
  * Helper functions for softfloat-parts.c.inc, per-size operations.
  */
@@ -747,6 +755,31 @@ static void parts128_canonicalize(FloatParts128 *p, float_status *status,
 #define FRAC_GENERIC_64_128(NAME, P) \
     QEMU_GENERIC(P, (FloatParts128 *, frac128_##NAME), frac64_##NAME)
 
+static bool frac64_addi(FloatParts64 *r, FloatParts64 *a, uint64_t c)
+{
+    return uadd64_overflow(a->frac, c, &r->frac);
+}
+
+static bool frac128_addi(FloatParts128 *r, FloatParts128 *a, uint64_t c)
+{
+    c = uadd64_overflow(a->frac_lo, c, &r->frac_lo);
+    return uadd64_overflow(a->frac_hi, c, &r->frac_hi);
+}
+
+#define frac_addi(R, A, C)  FRAC_GENERIC_64_128(addi, R)(R, A, C)
+
+static void frac64_allones(FloatParts64 *a)
+{
+    a->frac = -1;
+}
+
+static void frac128_allones(FloatParts128 *a)
+{
+    a->frac_hi = a->frac_lo = -1;
+}
+
+#define frac_allones(A)  FRAC_GENERIC_64_128(allones, A)(A)
+
 static int frac64_cmp(FloatParts64 *a, FloatParts64 *b)
 {
     return a->frac == b->frac ? 0 : a->frac < b->frac ? -1 : 1;
@@ -845,161 +878,17 @@ static void frac128_shr(FloatParts128 *a, int c)
 
 #define frac_shr(A, C)  FRAC_GENERIC_64_128(shr, A)(A, C)
 
-
-/* Round and uncanonicalize a floating-point number by parts. There
- * are FRAC_SHIFT bits that may require rounding at the bottom of the
- * fraction; these bits will be removed. The exponent will be biased
- * by EXP_BIAS and must be bounded by [EXP_MAX-1, 0].
- */
-
-static FloatParts64 round_canonical(FloatParts64 p, float_status *s,
-                                  const FloatFmt *parm)
+static void frac64_shrjam(FloatParts64 *a, int c)
 {
-    const uint64_t frac_lsb = parm->frac_lsb;
-    const uint64_t frac_lsbm1 = parm->frac_lsbm1;
-    const uint64_t round_mask = parm->round_mask;
-    const uint64_t roundeven_mask = parm->roundeven_mask;
-    const int exp_max = parm->exp_max;
-    const int frac_shift = parm->frac_shift;
-    uint64_t frac, inc;
-    int exp, flags = 0;
-    bool overflow_norm;
-
-    frac = p.frac;
-    exp = p.exp;
-
-    switch (p.cls) {
-    case float_class_normal:
-        switch (s->float_rounding_mode) {
-        case float_round_nearest_even:
-            overflow_norm = false;
-            inc = ((frac & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
-            break;
-        case float_round_ties_away:
-            overflow_norm = false;
-            inc = frac_lsbm1;
-            break;
-        case float_round_to_zero:
-            overflow_norm = true;
-            inc = 0;
-            break;
-        case float_round_up:
-            inc = p.sign ? 0 : round_mask;
-            overflow_norm = p.sign;
-            break;
-        case float_round_down:
-            inc = p.sign ? round_mask : 0;
-            overflow_norm = !p.sign;
-            break;
-        case float_round_to_odd:
-            overflow_norm = true;
-            inc = frac & frac_lsb ? 0 : round_mask;
-            break;
-        default:
-            g_assert_not_reached();
-        }
-
-        exp += parm->exp_bias;
-        if (likely(exp > 0)) {
-            if (frac & round_mask) {
-                flags |= float_flag_inexact;
-                if (uadd64_overflow(frac, inc, &frac)) {
-                    frac = (frac >> 1) | DECOMPOSED_IMPLICIT_BIT;
-                    exp++;
-                }
-            }
-            frac >>= frac_shift;
-
-            if (parm->arm_althp) {
-                /* ARM Alt HP eschews Inf and NaN for a wider exponent.  */
-                if (unlikely(exp > exp_max)) {
-                    /* Overflow.  Return the maximum normal.  */
-                    flags = float_flag_invalid;
-                    exp = exp_max;
-                    frac = -1;
-                }
-            } else if (unlikely(exp >= exp_max)) {
-                flags |= float_flag_overflow | float_flag_inexact;
-                if (overflow_norm) {
-                    exp = exp_max - 1;
-                    frac = -1;
-                } else {
-                    p.cls = float_class_inf;
-                    goto do_inf;
-                }
-            }
-        } else if (s->flush_to_zero) {
-            flags |= float_flag_output_denormal;
-            p.cls = float_class_zero;
-            goto do_zero;
-        } else {
-            bool is_tiny = s->tininess_before_rounding || (exp < 0);
-
-            if (!is_tiny) {
-                uint64_t discard;
-                is_tiny = !uadd64_overflow(frac, inc, &discard);
-            }
-
-            shift64RightJamming(frac, 1 - exp, &frac);
-            if (frac & round_mask) {
-                /* Need to recompute round-to-even.  */
-                switch (s->float_rounding_mode) {
-                case float_round_nearest_even:
-                    inc = ((frac & roundeven_mask) != frac_lsbm1
-                           ? frac_lsbm1 : 0);
-                    break;
-                case float_round_to_odd:
-                    inc = frac & frac_lsb ? 0 : round_mask;
-                    break;
-                default:
-                    break;
-                }
-                flags |= float_flag_inexact;
-                frac += inc;
-            }
-
-            exp = (frac & DECOMPOSED_IMPLICIT_BIT ? 1 : 0);
-            frac >>= frac_shift;
-
-            if (is_tiny && (flags & float_flag_inexact)) {
-                flags |= float_flag_underflow;
-            }
-            if (exp == 0 && frac == 0) {
-                p.cls = float_class_zero;
-            }
-        }
-        break;
-
-    case float_class_zero:
-    do_zero:
-        exp = 0;
-        frac = 0;
-        break;
-
-    case float_class_inf:
-    do_inf:
-        assert(!parm->arm_althp);
-        exp = exp_max;
-        frac = 0;
-        break;
-
-    case float_class_qnan:
-    case float_class_snan:
-        assert(!parm->arm_althp);
-        exp = exp_max;
-        frac >>= parm->frac_shift;
-        break;
-
-    default:
-        g_assert_not_reached();
-    }
-
-    float_raise(flags, s);
-    p.exp = exp;
-    p.frac = frac;
-    return p;
+    shift64RightJamming(a->frac, c, &a->frac);
 }
 
+static void frac128_shrjam(FloatParts128 *a, int c)
+{
+    shift128RightJamming(a->frac_hi, a->frac_lo, c, &a->frac_hi, &a->frac_lo);
+}
+
+#define frac_shrjam(A, C)  FRAC_GENERIC_64_128(shrjam, A)(A, C)
 
 #define partsN(NAME)   parts64_##NAME
 #define FloatPartsN    FloatParts64
@@ -1044,7 +933,7 @@ static float16 float16a_round_pack_canonical(FloatParts64 *p,
                                              float_status *s,
                                              const FloatFmt *params)
 {
-    *p = round_canonical(*p, s, params);
+    parts_uncanon(p, s, params);
     return float16_pack_raw(p);
 }
 
@@ -1057,7 +946,7 @@ static float16 float16_round_pack_canonical(FloatParts64 *p,
 static bfloat16 bfloat16_round_pack_canonical(FloatParts64 *p,
                                               float_status *s)
 {
-    *p = round_canonical(*p, s, &bfloat16_params);
+    parts_uncanon(p, s, &bfloat16_params);
     return bfloat16_pack_raw(p);
 }
 
@@ -1071,7 +960,7 @@ static void float32_unpack_canonical(FloatParts64 *p, float32 f,
 static float32 float32_round_pack_canonical(FloatParts64 *p,
                                             float_status *s)
 {
-    *p = round_canonical(*p, s, &float32_params);
+    parts_uncanon(p, s, &float32_params);
     return float32_pack_raw(p);
 }
 
@@ -1085,7 +974,7 @@ static void float64_unpack_canonical(FloatParts64 *p, float64 f,
 static float64 float64_round_pack_canonical(FloatParts64 *p,
                                             float_status *s)
 {
-    *p = round_canonical(*p, s, &float64_params);
+    parts_uncanon(p, s, &float64_params);
     return float64_pack_raw(p);
 }
 
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 25bf99bd0f..efdc724770 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -133,3 +133,151 @@ static void partsN(canonicalize)(FloatPartsN *p, float_status *status,
                   ? float_class_snan : float_class_qnan);
     }
 }
+
+/*
+ * Round and uncanonicalize a floating-point number by parts. There
+ * are FRAC_SHIFT bits that may require rounding at the bottom of the
+ * fraction; these bits will be removed. The exponent will be biased
+ * by EXP_BIAS and must be bounded by [EXP_MAX-1, 0].
+ */
+static void partsN(uncanon)(FloatPartsN *p, float_status *s,
+                            const FloatFmt *fmt)
+{
+    const int exp_max = fmt->exp_max;
+    const int frac_shift = fmt->frac_shift;
+    const uint64_t frac_lsb = fmt->frac_lsb;
+    const uint64_t frac_lsbm1 = fmt->frac_lsbm1;
+    const uint64_t round_mask = fmt->round_mask;
+    const uint64_t roundeven_mask = fmt->roundeven_mask;
+    uint64_t inc;
+    bool overflow_norm;
+    int exp, flags = 0;
+
+    if (unlikely(p->cls != float_class_normal)) {
+        switch (p->cls) {
+        case float_class_zero:
+            p->exp = 0;
+            frac_clear(p);
+            return;
+        case float_class_inf:
+            g_assert(!fmt->arm_althp);
+            p->exp = fmt->exp_max;
+            frac_clear(p);
+            return;
+        case float_class_qnan:
+        case float_class_snan:
+            g_assert(!fmt->arm_althp);
+            p->exp = fmt->exp_max;
+            frac_shr(p, fmt->frac_shift);
+            return;
+        default:
+            break;
+        }
+        g_assert_not_reached();
+    }
+
+    switch (s->float_rounding_mode) {
+    case float_round_nearest_even:
+        overflow_norm = false;
+        inc = ((p->frac_lo & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
+        break;
+    case float_round_ties_away:
+        overflow_norm = false;
+        inc = frac_lsbm1;
+        break;
+    case float_round_to_zero:
+        overflow_norm = true;
+        inc = 0;
+        break;
+    case float_round_up:
+        inc = p->sign ? 0 : round_mask;
+        overflow_norm = p->sign;
+        break;
+    case float_round_down:
+        inc = p->sign ? round_mask : 0;
+        overflow_norm = !p->sign;
+        break;
+    case float_round_to_odd:
+        overflow_norm = true;
+        inc = p->frac_lo & frac_lsb ? 0 : round_mask;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    exp = p->exp + fmt->exp_bias;
+    if (likely(exp > 0)) {
+        if (p->frac_lo & round_mask) {
+            flags |= float_flag_inexact;
+            if (frac_addi(p, p, inc)) {
+                frac_shr(p, 1);
+                p->frac_hi |= DECOMPOSED_IMPLICIT_BIT;
+                exp++;
+            }
+        }
+        frac_shr(p, frac_shift);
+
+        if (fmt->arm_althp) {
+            /* ARM Alt HP eschews Inf and NaN for a wider exponent.  */
+            if (unlikely(exp > exp_max)) {
+                /* Overflow.  Return the maximum normal.  */
+                flags = float_flag_invalid;
+                exp = exp_max;
+                frac_allones(p);
+            }
+        } else if (unlikely(exp >= exp_max)) {
+            flags |= float_flag_overflow | float_flag_inexact;
+            if (overflow_norm) {
+                exp = exp_max - 1;
+                frac_allones(p);
+            } else {
+                p->cls = float_class_inf;
+                exp = exp_max;
+                frac_clear(p);
+            }
+        }
+    } else if (s->flush_to_zero) {
+        flags |= float_flag_output_denormal;
+        p->cls = float_class_zero;
+        exp = 0;
+        frac_clear(p);
+    } else {
+        bool is_tiny = s->tininess_before_rounding || exp < 0;
+
+        if (!is_tiny) {
+            FloatPartsN discard;
+            is_tiny = !frac_addi(&discard, p, inc);
+        }
+
+        frac_shrjam(p, 1 - exp);
+
+        if (p->frac_lo & round_mask) {
+            /* Need to recompute round-to-even/round-to-odd. */
+            switch (s->float_rounding_mode) {
+            case float_round_nearest_even:
+                inc = ((p->frac_lo & roundeven_mask) != frac_lsbm1
+                       ? frac_lsbm1 : 0);
+                break;
+            case float_round_to_odd:
+                inc = p->frac_lo & frac_lsb ? 0 : round_mask;
+                break;
+            default:
+                break;
+            }
+            flags |= float_flag_inexact;
+            frac_addi(p, p, inc);
+        }
+
+        exp = (p->frac_hi & DECOMPOSED_IMPLICIT_BIT) != 0;
+        frac_shr(p, frac_shift);
+
+        if (is_tiny && (flags & float_flag_inexact)) {
+            flags |= float_flag_underflow;
+        }
+        if (exp == 0 && frac_eqz(p)) {
+            p->cls = float_class_zero;
+        }
+    }
+    p->exp = exp;
+    float_raise(flags, s);
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 33/72] softfloat: Use uadd64_carry, usub64_borrow in softfloat-macros.h
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (31 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 32/72] softfloat: Move round_canonical " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-13 10:00   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 34/72] softfloat: Move addsub_floats to softfloat-parts.c.inc Richard Henderson
                   ` (43 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Use compiler support for carry arithmetic.
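
uadd64_carry and usub64_borrow are the host-utils wrappers added at
the start of this series.  Where the compiler carry builtins
(__builtin_addcll and friends) are unavailable, they reduce to
something like the following (a sketch, not the exact QEMU code):

    #include <stdbool.h>
    #include <stdint.h>

    static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
    {
        uint64_t z = x + y + *pcarry;
        /* Carry out iff the true sum does not fit in 64 bits. */
        *pcarry = *pcarry ? z <= x : z < x;
        return z;
    }

    static inline uint64_t usub64_borrow(uint64_t x, uint64_t y, bool *pborrow)
    {
        uint64_t z = x - y - *pborrow;
        /* Borrow out iff x < y + borrow-in. */
        *pborrow = *pborrow ? x <= y : x < y;
        return z;
    }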

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-macros.h | 95 +++++++++-------------------------
 1 file changed, 25 insertions(+), 70 deletions(-)

diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index a35ec2893a..2e3760a9c1 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -83,6 +83,7 @@ this code that are retained.
 #define FPU_SOFTFLOAT_MACROS_H
 
 #include "fpu/softfloat-types.h"
+#include "qemu/host-utils.h"
 
 /*----------------------------------------------------------------------------
 | Shifts `a' right by the number of bits given in `count'.  If any nonzero
@@ -403,16 +404,12 @@ static inline void
 | are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
 *----------------------------------------------------------------------------*/
 
-static inline void
- add128(
-     uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr )
+static inline void add128(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1,
+                          uint64_t *z0Ptr, uint64_t *z1Ptr)
 {
-    uint64_t z1;
-
-    z1 = a1 + b1;
-    *z1Ptr = z1;
-    *z0Ptr = a0 + b0 + ( z1 < a1 );
-
+    bool c = 0;
+    *z1Ptr = uadd64_carry(a1, b1, &c);
+    *z0Ptr = uadd64_carry(a0, b0, &c);
 }
 
 /*----------------------------------------------------------------------------
@@ -423,34 +420,14 @@ static inline void
 | `z1Ptr', and `z2Ptr'.
 *----------------------------------------------------------------------------*/
 
-static inline void
- add192(
-     uint64_t a0,
-     uint64_t a1,
-     uint64_t a2,
-     uint64_t b0,
-     uint64_t b1,
-     uint64_t b2,
-     uint64_t *z0Ptr,
-     uint64_t *z1Ptr,
-     uint64_t *z2Ptr
- )
+static inline void add192(uint64_t a0, uint64_t a1, uint64_t a2,
+                          uint64_t b0, uint64_t b1, uint64_t b2,
+                          uint64_t *z0Ptr, uint64_t *z1Ptr, uint64_t *z2Ptr)
 {
-    uint64_t z0, z1, z2;
-    int8_t carry0, carry1;
-
-    z2 = a2 + b2;
-    carry1 = ( z2 < a2 );
-    z1 = a1 + b1;
-    carry0 = ( z1 < a1 );
-    z0 = a0 + b0;
-    z1 += carry1;
-    z0 += ( z1 < carry1 );
-    z0 += carry0;
-    *z2Ptr = z2;
-    *z1Ptr = z1;
-    *z0Ptr = z0;
-
+    bool c = 0;
+    *z2Ptr = uadd64_carry(a2, b2, &c);
+    *z1Ptr = uadd64_carry(a1, b1, &c);
+    *z0Ptr = uadd64_carry(a0, b0, &c);
 }
 
 /*----------------------------------------------------------------------------
@@ -461,14 +438,12 @@ static inline void
 | `z1Ptr'.
 *----------------------------------------------------------------------------*/
 
-static inline void
- sub128(
-     uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr )
+static inline void sub128(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1,
+                          uint64_t *z0Ptr, uint64_t *z1Ptr)
 {
-
-    *z1Ptr = a1 - b1;
-    *z0Ptr = a0 - b0 - ( a1 < b1 );
-
+    bool c = 0;
+    *z1Ptr = usub64_borrow(a1, b1, &c);
+    *z0Ptr = usub64_borrow(a0, b0, &c);
 }
 
 /*----------------------------------------------------------------------------
@@ -479,34 +454,14 @@ static inline void
 | pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'.
 *----------------------------------------------------------------------------*/
 
-static inline void
- sub192(
-     uint64_t a0,
-     uint64_t a1,
-     uint64_t a2,
-     uint64_t b0,
-     uint64_t b1,
-     uint64_t b2,
-     uint64_t *z0Ptr,
-     uint64_t *z1Ptr,
-     uint64_t *z2Ptr
- )
+static inline void sub192(uint64_t a0, uint64_t a1, uint64_t a2,
+                          uint64_t b0, uint64_t b1, uint64_t b2,
+                          uint64_t *z0Ptr, uint64_t *z1Ptr, uint64_t *z2Ptr)
 {
-    uint64_t z0, z1, z2;
-    int8_t borrow0, borrow1;
-
-    z2 = a2 - b2;
-    borrow1 = ( a2 < b2 );
-    z1 = a1 - b1;
-    borrow0 = ( a1 < b1 );
-    z0 = a0 - b0;
-    z0 -= ( z1 < borrow1 );
-    z1 -= borrow1;
-    z0 -= borrow0;
-    *z2Ptr = z2;
-    *z1Ptr = z1;
-    *z0Ptr = z0;
-
+    bool c = 0;
+    *z2Ptr = usub64_borrow(a2, b2, &c);
+    *z1Ptr = usub64_borrow(a1, b1, &c);
+    *z0Ptr = usub64_borrow(a0, b0, &c);
 }
 
 /*----------------------------------------------------------------------------
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 34/72] softfloat: Move addsub_floats to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (32 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 33/72] softfloat: Use uadd64_carry, usub64_borrow in softfloat-macros.h Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-13 10:03   ` Alex Bennée
  2021-05-13 10:05   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 35/72] softfloat: Implement float128_add/sub via parts Richard Henderson
                   ` (42 subsequent siblings)
  76 siblings, 2 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

In preparation for implementing multiple sizes, rename to parts_addsub
and split out parts_add_normal/parts_sub_normal for future reuse by
muladd.
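
With this, each user-visible operation is expected to become a thin
wrapper around the shared helper; a sketch of the shape, assuming the
float16 entry points follow the float16_addsub helper visible in the
diff below:

    float16 float16_add(float16 a, float16 b, float_status *status)
    {
        return float16_addsub(a, b, status, false);
    }

    float16 float16_sub(float16 a, float16 b, float_status *status)
    {
        return float16_addsub(a, b, status, true);
    }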

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c                  | 253 ++++++++++++++-----------------
 fpu/softfloat-parts-addsub.c.inc |  62 ++++++++
 fpu/softfloat-parts.c.inc        |  81 ++++++++++
 3 files changed, 255 insertions(+), 141 deletions(-)
 create mode 100644 fpu/softfloat-parts-addsub.c.inc

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index e9d644385d..22dc6da5ef 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -748,6 +748,26 @@ static void parts128_uncanon(FloatParts128 *p, float_status *status,
 #define parts_uncanon(A, S, F) \
     PARTS_GENERIC_64_128(uncanon, A)(A, S, F)
 
+static void parts64_add_normal(FloatParts64 *a, FloatParts64 *b);
+static void parts128_add_normal(FloatParts128 *a, FloatParts128 *b);
+
+#define parts_add_normal(A, B) \
+    PARTS_GENERIC_64_128(add_normal, A)(A, B)
+
+static bool parts64_sub_normal(FloatParts64 *a, FloatParts64 *b);
+static bool parts128_sub_normal(FloatParts128 *a, FloatParts128 *b);
+
+#define parts_sub_normal(A, B) \
+    PARTS_GENERIC_64_128(sub_normal, A)(A, B)
+
+static FloatParts64 *parts64_addsub(FloatParts64 *a, FloatParts64 *b,
+                                    float_status *s, bool subtract);
+static FloatParts128 *parts128_addsub(FloatParts128 *a, FloatParts128 *b,
+                                      float_status *s, bool subtract);
+
+#define parts_addsub(A, B, S, Z) \
+    PARTS_GENERIC_64_128(addsub, A)(A, B, S, Z)
+
 /*
  * Helper functions for softfloat-parts.c.inc, per-size operations.
  */
@@ -755,6 +775,21 @@ static void parts128_uncanon(FloatParts128 *p, float_status *status,
 #define FRAC_GENERIC_64_128(NAME, P) \
     QEMU_GENERIC(P, (FloatParts128 *, frac128_##NAME), frac64_##NAME)
 
+static bool frac64_add(FloatParts64 *r, FloatParts64 *a, FloatParts64 *b)
+{
+    return uadd64_overflow(a->frac, b->frac, &r->frac);
+}
+
+static bool frac128_add(FloatParts128 *r, FloatParts128 *a, FloatParts128 *b)
+{
+    bool c = 0;
+    r->frac_lo = uadd64_carry(a->frac_lo, b->frac_lo, &c);
+    r->frac_hi = uadd64_carry(a->frac_hi, b->frac_hi, &c);
+    return c;
+}
+
+#define frac_add(R, A, B)  FRAC_GENERIC_64_128(add, R)(R, A, B)
+
 static bool frac64_addi(FloatParts64 *r, FloatParts64 *a, uint64_t c)
 {
     return uadd64_overflow(a->frac, c, &r->frac);
@@ -823,6 +858,20 @@ static bool frac128_eqz(FloatParts128 *a)
 
 #define frac_eqz(A)  FRAC_GENERIC_64_128(eqz, A)(A)
 
+static void frac64_neg(FloatParts64 *a)
+{
+    a->frac = -a->frac;
+}
+
+static void frac128_neg(FloatParts128 *a)
+{
+    bool c = 0;
+    a->frac_lo = usub64_borrow(0, a->frac_lo, &c);
+    a->frac_hi = usub64_borrow(0, a->frac_hi, &c);
+}
+
+#define frac_neg(A)  FRAC_GENERIC_64_128(neg, A)(A)
+
 static int frac64_normalize(FloatParts64 *a)
 {
     if (a->frac) {
@@ -890,18 +939,36 @@ static void frac128_shrjam(FloatParts128 *a, int c)
 
 #define frac_shrjam(A, C)  FRAC_GENERIC_64_128(shrjam, A)(A, C)
 
-#define partsN(NAME)   parts64_##NAME
-#define FloatPartsN    FloatParts64
+static bool frac64_sub(FloatParts64 *r, FloatParts64 *a, FloatParts64 *b)
+{
+    return usub64_overflow(a->frac, b->frac, &r->frac);
+}
 
+static bool frac128_sub(FloatParts128 *r, FloatParts128 *a, FloatParts128 *b)
+{
+    bool c = 0;
+    r->frac_lo = usub64_borrow(a->frac_lo, b->frac_lo, &c);
+    r->frac_hi = usub64_borrow(a->frac_hi, b->frac_hi, &c);
+    return c;
+}
+
+#define frac_sub(R, A, B)  FRAC_GENERIC_64_128(sub, R)(R, A, B)
+
+#define partsN(NAME)   glue(glue(glue(parts,N),_),NAME)
+#define FloatPartsN    glue(FloatParts,N)
+
+#define N 64
+
+#include "softfloat-parts-addsub.c.inc"
 #include "softfloat-parts.c.inc"
 
-#undef  partsN
-#undef  FloatPartsN
-#define partsN(NAME)   parts128_##NAME
-#define FloatPartsN    FloatParts128
+#undef  N
+#define N 128
 
+#include "softfloat-parts-addsub.c.inc"
 #include "softfloat-parts.c.inc"
 
+#undef  N
 #undef  partsN
 #undef  FloatPartsN
 
@@ -979,165 +1046,73 @@ static float64 float64_round_pack_canonical(FloatParts64 *p,
 }
 
 /*
- * Returns the result of adding or subtracting the values of the
- * floating-point values `a' and `b'. The operation is performed
- * according to the IEC/IEEE Standard for Binary Floating-Point
- * Arithmetic.
+ * Addition and subtraction
  */
 
-static FloatParts64 addsub_floats(FloatParts64 a, FloatParts64 b, bool subtract,
-                                float_status *s)
+static float16 QEMU_FLATTEN
+float16_addsub(float16 a, float16 b, float_status *status, bool subtract)
 {
-    bool a_sign = a.sign;
-    bool b_sign = b.sign ^ subtract;
-
-    if (a_sign != b_sign) {
-        /* Subtraction */
-
-        if (a.cls == float_class_normal && b.cls == float_class_normal) {
-            if (a.exp > b.exp || (a.exp == b.exp && a.frac >= b.frac)) {
-                shift64RightJamming(b.frac, a.exp - b.exp, &b.frac);
-                a.frac = a.frac - b.frac;
-            } else {
-                shift64RightJamming(a.frac, b.exp - a.exp, &a.frac);
-                a.frac = b.frac - a.frac;
-                a.exp = b.exp;
-                a_sign ^= 1;
-            }
-
-            if (a.frac == 0) {
-                a.cls = float_class_zero;
-                a.sign = s->float_rounding_mode == float_round_down;
-            } else {
-                int shift = clz64(a.frac);
-                a.frac = a.frac << shift;
-                a.exp = a.exp - shift;
-                a.sign = a_sign;
-            }
-            return a;
-        }
-        if (is_nan(a.cls) || is_nan(b.cls)) {
-            return *parts_pick_nan(&a, &b, s);
-        }
-        if (a.cls == float_class_inf) {
-            if (b.cls == float_class_inf) {
-                float_raise(float_flag_invalid, s);
-                parts_default_nan(&a, s);
-            }
-            return a;
-        }
-        if (a.cls == float_class_zero && b.cls == float_class_zero) {
-            a.sign = s->float_rounding_mode == float_round_down;
-            return a;
-        }
-        if (a.cls == float_class_zero || b.cls == float_class_inf) {
-            b.sign = a_sign ^ 1;
-            return b;
-        }
-        if (b.cls == float_class_zero) {
-            return a;
-        }
-    } else {
-        /* Addition */
-        if (a.cls == float_class_normal && b.cls == float_class_normal) {
-            if (a.exp > b.exp) {
-                shift64RightJamming(b.frac, a.exp - b.exp, &b.frac);
-            } else if (a.exp < b.exp) {
-                shift64RightJamming(a.frac, b.exp - a.exp, &a.frac);
-                a.exp = b.exp;
-            }
-
-            if (uadd64_overflow(a.frac, b.frac, &a.frac)) {
-                shift64RightJamming(a.frac, 1, &a.frac);
-                a.frac |= DECOMPOSED_IMPLICIT_BIT;
-                a.exp += 1;
-            }
-            return a;
-        }
-        if (is_nan(a.cls) || is_nan(b.cls)) {
-            return *parts_pick_nan(&a, &b, s);
-        }
-        if (a.cls == float_class_inf || b.cls == float_class_zero) {
-            return a;
-        }
-        if (b.cls == float_class_inf || a.cls == float_class_zero) {
-            b.sign = b_sign;
-            return b;
-        }
-    }
-    g_assert_not_reached();
-}
-
-/*
- * Returns the result of adding or subtracting the floating-point
- * values `a' and `b'. The operation is performed according to the
- * IEC/IEEE Standard for Binary Floating-Point Arithmetic.
- */
-
-float16 QEMU_FLATTEN float16_add(float16 a, float16 b, float_status *status)
-{
-    FloatParts64 pa, pb, pr;
+    FloatParts64 pa, pb, *pr;
 
     float16_unpack_canonical(&pa, a, status);
     float16_unpack_canonical(&pb, b, status);
-    pr = addsub_floats(pa, pb, false, status);
+    pr = parts_addsub(&pa, &pb, status, subtract);
 
-    return float16_round_pack_canonical(&pr, status);
+    return float16_round_pack_canonical(pr, status);
 }
 
-float16 QEMU_FLATTEN float16_sub(float16 a, float16 b, float_status *status)
+float16 float16_add(float16 a, float16 b, float_status *status)
 {
-    FloatParts64 pa, pb, pr;
+    return float16_addsub(a, b, status, false);
+}
 
-    float16_unpack_canonical(&pa, a, status);
-    float16_unpack_canonical(&pb, b, status);
-    pr = addsub_floats(pa, pb, true, status);
-
-    return float16_round_pack_canonical(&pr, status);
+float16 float16_sub(float16 a, float16 b, float_status *status)
+{
+    return float16_addsub(a, b, status, true);
 }
 
 static float32 QEMU_SOFTFLOAT_ATTR
-soft_f32_addsub(float32 a, float32 b, bool subtract, float_status *status)
+soft_f32_addsub(float32 a, float32 b, float_status *status, bool subtract)
 {
-    FloatParts64 pa, pb, pr;
+    FloatParts64 pa, pb, *pr;
 
     float32_unpack_canonical(&pa, a, status);
     float32_unpack_canonical(&pb, b, status);
-    pr = addsub_floats(pa, pb, subtract, status);
+    pr = parts_addsub(&pa, &pb, status, subtract);
 
-    return float32_round_pack_canonical(&pr, status);
+    return float32_round_pack_canonical(pr, status);
 }
 
-static inline float32 soft_f32_add(float32 a, float32 b, float_status *status)
+static float32 soft_f32_add(float32 a, float32 b, float_status *status)
 {
-    return soft_f32_addsub(a, b, false, status);
+    return soft_f32_addsub(a, b, status, false);
 }
 
-static inline float32 soft_f32_sub(float32 a, float32 b, float_status *status)
+static float32 soft_f32_sub(float32 a, float32 b, float_status *status)
 {
-    return soft_f32_addsub(a, b, true, status);
+    return soft_f32_addsub(a, b, status, true);
 }
 
 static float64 QEMU_SOFTFLOAT_ATTR
-soft_f64_addsub(float64 a, float64 b, bool subtract, float_status *status)
+soft_f64_addsub(float64 a, float64 b, float_status *status, bool subtract)
 {
-    FloatParts64 pa, pb, pr;
+    FloatParts64 pa, pb, *pr;
 
     float64_unpack_canonical(&pa, a, status);
     float64_unpack_canonical(&pb, b, status);
-    pr = addsub_floats(pa, pb, subtract, status);
+    pr = parts_addsub(&pa, &pb, status, subtract);
 
-    return float64_round_pack_canonical(&pr, status);
+    return float64_round_pack_canonical(pr, status);
 }
 
-static inline float64 soft_f64_add(float64 a, float64 b, float_status *status)
+static float64 soft_f64_add(float64 a, float64 b, float_status *status)
 {
-    return soft_f64_addsub(a, b, false, status);
+    return soft_f64_addsub(a, b, status, false);
 }
 
-static inline float64 soft_f64_sub(float64 a, float64 b, float_status *status)
+static float64 soft_f64_sub(float64 a, float64 b, float_status *status)
 {
-    return soft_f64_addsub(a, b, true, status);
+    return soft_f64_addsub(a, b, status, true);
 }
 
 static float hard_f32_add(float a, float b)
@@ -1215,30 +1190,26 @@ float64_sub(float64 a, float64 b, float_status *s)
     return float64_addsub(a, b, s, hard_f64_sub, soft_f64_sub);
 }
 
-/*
- * Returns the result of adding or subtracting the bfloat16
- * values `a' and `b'.
- */
-bfloat16 QEMU_FLATTEN bfloat16_add(bfloat16 a, bfloat16 b, float_status *status)
+static bfloat16 QEMU_FLATTEN
+bfloat16_addsub(bfloat16 a, bfloat16 b, float_status *status, bool subtract)
 {
-    FloatParts64 pa, pb, pr;
+    FloatParts64 pa, pb, *pr;
 
     bfloat16_unpack_canonical(&pa, a, status);
     bfloat16_unpack_canonical(&pb, b, status);
-    pr = addsub_floats(pa, pb, false, status);
+    pr = parts_addsub(&pa, &pb, status, subtract);
 
-    return bfloat16_round_pack_canonical(&pr, status);
+    return bfloat16_round_pack_canonical(pr, status);
 }
 
-bfloat16 QEMU_FLATTEN bfloat16_sub(bfloat16 a, bfloat16 b, float_status *status)
+bfloat16 bfloat16_add(bfloat16 a, bfloat16 b, float_status *status)
 {
-    FloatParts64 pa, pb, pr;
+    return bfloat16_addsub(a, b, status, false);
+}
 
-    bfloat16_unpack_canonical(&pa, a, status);
-    bfloat16_unpack_canonical(&pb, b, status);
-    pr = addsub_floats(pa, pb, true, status);
-
-    return bfloat16_round_pack_canonical(&pr, status);
+bfloat16 bfloat16_sub(bfloat16 a, bfloat16 b, float_status *status)
+{
+    return bfloat16_addsub(a, b, status, true);
 }
 
 /*
diff --git a/fpu/softfloat-parts-addsub.c.inc b/fpu/softfloat-parts-addsub.c.inc
new file mode 100644
index 0000000000..ae5c1017c5
--- /dev/null
+++ b/fpu/softfloat-parts-addsub.c.inc
@@ -0,0 +1,62 @@
+/*
+ * Floating point arithmetic implementation
+ *
+ * The code in this source file is derived from release 2a of the SoftFloat
+ * IEC/IEEE Floating-point Arithmetic Package. Those parts of the code (and
+ * some later contributions) are provided under that license, as detailed below.
+ * It has subsequently been modified by contributors to the QEMU Project,
+ * so some portions are provided under:
+ *  the SoftFloat-2a license
+ *  the BSD license
+ *  GPL-v2-or-later
+ *
+ * Any future contributions to this file after December 1st 2014 will be
+ * taken to be licensed under the Softfloat-2a license unless specifically
+ * indicated otherwise.
+ */
+
+static void partsN(add_normal)(FloatPartsN *a, FloatPartsN *b)
+{
+    int exp_diff = a->exp - b->exp;
+
+    if (exp_diff > 0) {
+        frac_shrjam(b, exp_diff);
+    } else if (exp_diff < 0) {
+        frac_shrjam(a, -exp_diff);
+        a->exp = b->exp;
+    }
+
+    if (frac_add(a, a, b)) {
+        frac_shrjam(a, 1);
+        a->frac_hi |= DECOMPOSED_IMPLICIT_BIT;
+        a->exp += 1;
+    }
+}
+
+static bool partsN(sub_normal)(FloatPartsN *a, FloatPartsN *b)
+{
+    int exp_diff = a->exp - b->exp;
+    int shift;
+
+    if (exp_diff > 0) {
+        frac_shrjam(b, exp_diff);
+        frac_sub(a, a, b);
+    } else if (exp_diff < 0) {
+        a->exp = b->exp;
+        a->sign ^= 1;
+        frac_shrjam(a, -exp_diff);
+        frac_sub(a, b, a);
+    } else if (frac_sub(a, a, b)) {
+        /* Overflow means that A was less than B. */
+        frac_neg(a);
+        a->sign ^= 1;
+    }
+
+    shift = frac_normalize(a);
+    if (likely(shift < N)) {
+        a->exp -= shift;
+        return true;
+    }
+    a->cls = float_class_zero;
+    return false;
+}
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index efdc724770..cfce9f6421 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -281,3 +281,84 @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
     p->exp = exp;
     float_raise(flags, s);
 }
+
+/*
+ * Returns the result of adding or subtracting the floating-point
+ * values `a' and `b'. The operation is performed
+ * according to the IEC/IEEE Standard for Binary Floating-Point
+ * Arithmetic.
+ */
+static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
+                                   float_status *s, bool subtract)
+{
+    bool b_sign = b->sign ^ subtract;
+    int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
+
+    if (a->sign != b_sign) {
+        /* Subtraction */
+        if (likely(ab_mask == float_cmask_normal)) {
+            if (parts_sub_normal(a, b)) {
+                return a;
+            }
+            /* Subtract was exact, fall through to set sign. */
+            ab_mask = float_cmask_zero;
+        }
+
+        if (ab_mask == float_cmask_zero) {
+            a->sign = s->float_rounding_mode == float_round_down;
+            return a;
+        }
+
+        if (unlikely(ab_mask & float_cmask_anynan)) {
+            goto p_nan;
+        }
+
+        if (ab_mask & float_cmask_inf) {
+            if (a->cls != float_class_inf) {
+                /* N - Inf */
+                goto return_b;
+            }
+            if (b->cls != float_class_inf) {
+                /* Inf - N */
+                return a;
+            }
+            /* Inf - Inf */
+            float_raise(float_flag_invalid, s);
+            parts_default_nan(a, s);
+            return a;
+        }
+    } else {
+        /* Addition */
+        if (likely(ab_mask == float_cmask_normal)) {
+            parts_add_normal(a, b);
+            return a;
+        }
+
+        if (ab_mask == float_cmask_zero) {
+            return a;
+        }
+
+        if (unlikely(ab_mask & float_cmask_anynan)) {
+            goto p_nan;
+        }
+
+        if (ab_mask & float_cmask_inf) {
+            a->cls = float_class_inf;
+            return a;
+        }
+    }
+
+    if (b->cls == float_class_zero) {
+        g_assert(a->cls == float_class_normal);
+        return a;
+    }
+
+    g_assert(a->cls == float_class_zero);
+    g_assert(b->cls == float_class_normal);
+ return_b:
+    b->sign = b_sign;
+    return b;
+
+ p_nan:
+    return parts_pick_nan(a, b, s);
+}
-- 
2.25.1
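
Both parts_add_normal() and parts_sub_normal() pre-align the operand
with the smaller exponent using frac_shrjam(), a right shift that
"jams" every discarded bit into the least significant bit so that
later rounding still sees the result as inexact.  A single-word sketch
of the idea, assuming <stdint.h> (the real frac64/frac128 variants sit
on the existing RightJamming helpers):

    #include <stdint.h>

    /* Illustrative only: shift frac right by count, folding any bits
     * shifted out into bit 0, the sticky bit. */
    static inline uint64_t shr_jam64_sketch(uint64_t frac, int count)
    {
        if (count == 0) {
            return frac;
        }
        if (count >= 64) {
            return frac != 0;   /* everything shifted out becomes sticky */
        }
        return (frac >> count) | ((frac << (64 - count)) != 0);
    }

In parts_add_normal() the same jamming is applied once more when the
fraction addition carries out, so the post-carry one-bit shift loses
no information.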




* [PATCH 35/72] softfloat: Implement float128_add/sub via parts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (33 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 34/72] softfloat: Move addsub_floats to softfloat-parts.c.inc Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-13 10:06   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 36/72] softfloat: Move mul_floats to softfloat-parts.c.inc Richard Henderson
                   ` (41 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Replace the existing Berkeley implementation with the
FloatParts implementation.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 257 +++++++-----------------------------------------
 1 file changed, 36 insertions(+), 221 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 22dc6da5ef..f48e6b1f64 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1045,6 +1045,20 @@ static float64 float64_round_pack_canonical(FloatParts64 *p,
     return float64_pack_raw(p);
 }
 
+static void float128_unpack_canonical(FloatParts128 *p, float128 f,
+                                      float_status *s)
+{
+    float128_unpack_raw(p, f);
+    parts_canonicalize(p, s, &float128_params);
+}
+
+static float128 float128_round_pack_canonical(FloatParts128 *p,
+                                              float_status *s)
+{
+    parts_uncanon(p, s, &float128_params);
+    return float128_pack_raw(p);
+}
+
 /*
  * Addition and subtraction
  */
@@ -1212,6 +1226,28 @@ bfloat16 bfloat16_sub(bfloat16 a, bfloat16 b, float_status *status)
     return bfloat16_addsub(a, b, status, true);
 }
 
+static float128 QEMU_FLATTEN
+float128_addsub(float128 a, float128 b, float_status *status, bool subtract)
+{
+    FloatParts128 pa, pb, *pr;
+
+    float128_unpack_canonical(&pa, a, status);
+    float128_unpack_canonical(&pb, b, status);
+    pr = parts_addsub(&pa, &pb, status, subtract);
+
+    return float128_round_pack_canonical(pr, status);
+}
+
+float128 float128_add(float128 a, float128 b, float_status *status)
+{
+    return float128_addsub(a, b, status, false);
+}
+
+float128 float128_sub(float128 a, float128 b, float_status *status)
+{
+    return float128_addsub(a, b, status, true);
+}
+
 /*
  * Returns the result of multiplying the floating-point values `a' and
  * `b'. The operation is performed according to the IEC/IEEE Standard
@@ -7031,227 +7067,6 @@ float128 float128_round_to_int(float128 a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of adding the absolute values of the quadruple-precision
-| floating-point values `a' and `b'.  If `zSign' is 1, the sum is negated
-| before being returned.  `zSign' is ignored if the result is a NaN.
-| The addition is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static float128 addFloat128Sigs(float128 a, float128 b, bool zSign,
-                                float_status *status)
-{
-    int32_t aExp, bExp, zExp;
-    uint64_t aSig0, aSig1, bSig0, bSig1, zSig0, zSig1, zSig2;
-    int32_t expDiff;
-
-    aSig1 = extractFloat128Frac1( a );
-    aSig0 = extractFloat128Frac0( a );
-    aExp = extractFloat128Exp( a );
-    bSig1 = extractFloat128Frac1( b );
-    bSig0 = extractFloat128Frac0( b );
-    bExp = extractFloat128Exp( b );
-    expDiff = aExp - bExp;
-    if ( 0 < expDiff ) {
-        if ( aExp == 0x7FFF ) {
-            if (aSig0 | aSig1) {
-                return propagateFloat128NaN(a, b, status);
-            }
-            return a;
-        }
-        if ( bExp == 0 ) {
-            --expDiff;
-        }
-        else {
-            bSig0 |= UINT64_C(0x0001000000000000);
-        }
-        shift128ExtraRightJamming(
-            bSig0, bSig1, 0, expDiff, &bSig0, &bSig1, &zSig2 );
-        zExp = aExp;
-    }
-    else if ( expDiff < 0 ) {
-        if ( bExp == 0x7FFF ) {
-            if (bSig0 | bSig1) {
-                return propagateFloat128NaN(a, b, status);
-            }
-            return packFloat128( zSign, 0x7FFF, 0, 0 );
-        }
-        if ( aExp == 0 ) {
-            ++expDiff;
-        }
-        else {
-            aSig0 |= UINT64_C(0x0001000000000000);
-        }
-        shift128ExtraRightJamming(
-            aSig0, aSig1, 0, - expDiff, &aSig0, &aSig1, &zSig2 );
-        zExp = bExp;
-    }
-    else {
-        if ( aExp == 0x7FFF ) {
-            if ( aSig0 | aSig1 | bSig0 | bSig1 ) {
-                return propagateFloat128NaN(a, b, status);
-            }
-            return a;
-        }
-        add128( aSig0, aSig1, bSig0, bSig1, &zSig0, &zSig1 );
-        if ( aExp == 0 ) {
-            if (status->flush_to_zero) {
-                if (zSig0 | zSig1) {
-                    float_raise(float_flag_output_denormal, status);
-                }
-                return packFloat128(zSign, 0, 0, 0);
-            }
-            return packFloat128( zSign, 0, zSig0, zSig1 );
-        }
-        zSig2 = 0;
-        zSig0 |= UINT64_C(0x0002000000000000);
-        zExp = aExp;
-        goto shiftRight1;
-    }
-    aSig0 |= UINT64_C(0x0001000000000000);
-    add128( aSig0, aSig1, bSig0, bSig1, &zSig0, &zSig1 );
-    --zExp;
-    if ( zSig0 < UINT64_C(0x0002000000000000) ) goto roundAndPack;
-    ++zExp;
- shiftRight1:
-    shift128ExtraRightJamming(
-        zSig0, zSig1, zSig2, 1, &zSig0, &zSig1, &zSig2 );
- roundAndPack:
-    return roundAndPackFloat128(zSign, zExp, zSig0, zSig1, zSig2, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of subtracting the absolute values of the quadruple-
-| precision floating-point values `a' and `b'.  If `zSign' is 1, the
-| difference is negated before being returned.  `zSign' is ignored if the
-| result is a NaN.  The subtraction is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static float128 subFloat128Sigs(float128 a, float128 b, bool zSign,
-                                float_status *status)
-{
-    int32_t aExp, bExp, zExp;
-    uint64_t aSig0, aSig1, bSig0, bSig1, zSig0, zSig1;
-    int32_t expDiff;
-
-    aSig1 = extractFloat128Frac1( a );
-    aSig0 = extractFloat128Frac0( a );
-    aExp = extractFloat128Exp( a );
-    bSig1 = extractFloat128Frac1( b );
-    bSig0 = extractFloat128Frac0( b );
-    bExp = extractFloat128Exp( b );
-    expDiff = aExp - bExp;
-    shortShift128Left( aSig0, aSig1, 14, &aSig0, &aSig1 );
-    shortShift128Left( bSig0, bSig1, 14, &bSig0, &bSig1 );
-    if ( 0 < expDiff ) goto aExpBigger;
-    if ( expDiff < 0 ) goto bExpBigger;
-    if ( aExp == 0x7FFF ) {
-        if ( aSig0 | aSig1 | bSig0 | bSig1 ) {
-            return propagateFloat128NaN(a, b, status);
-        }
-        float_raise(float_flag_invalid, status);
-        return float128_default_nan(status);
-    }
-    if ( aExp == 0 ) {
-        aExp = 1;
-        bExp = 1;
-    }
-    if ( bSig0 < aSig0 ) goto aBigger;
-    if ( aSig0 < bSig0 ) goto bBigger;
-    if ( bSig1 < aSig1 ) goto aBigger;
-    if ( aSig1 < bSig1 ) goto bBigger;
-    return packFloat128(status->float_rounding_mode == float_round_down,
-                        0, 0, 0);
- bExpBigger:
-    if ( bExp == 0x7FFF ) {
-        if (bSig0 | bSig1) {
-            return propagateFloat128NaN(a, b, status);
-        }
-        return packFloat128( zSign ^ 1, 0x7FFF, 0, 0 );
-    }
-    if ( aExp == 0 ) {
-        ++expDiff;
-    }
-    else {
-        aSig0 |= UINT64_C(0x4000000000000000);
-    }
-    shift128RightJamming( aSig0, aSig1, - expDiff, &aSig0, &aSig1 );
-    bSig0 |= UINT64_C(0x4000000000000000);
- bBigger:
-    sub128( bSig0, bSig1, aSig0, aSig1, &zSig0, &zSig1 );
-    zExp = bExp;
-    zSign ^= 1;
-    goto normalizeRoundAndPack;
- aExpBigger:
-    if ( aExp == 0x7FFF ) {
-        if (aSig0 | aSig1) {
-            return propagateFloat128NaN(a, b, status);
-        }
-        return a;
-    }
-    if ( bExp == 0 ) {
-        --expDiff;
-    }
-    else {
-        bSig0 |= UINT64_C(0x4000000000000000);
-    }
-    shift128RightJamming( bSig0, bSig1, expDiff, &bSig0, &bSig1 );
-    aSig0 |= UINT64_C(0x4000000000000000);
- aBigger:
-    sub128( aSig0, aSig1, bSig0, bSig1, &zSig0, &zSig1 );
-    zExp = aExp;
- normalizeRoundAndPack:
-    --zExp;
-    return normalizeRoundAndPackFloat128(zSign, zExp - 14, zSig0, zSig1,
-                                         status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of adding the quadruple-precision floating-point values
-| `a' and `b'.  The operation is performed according to the IEC/IEEE Standard
-| for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float128 float128_add(float128 a, float128 b, float_status *status)
-{
-    bool aSign, bSign;
-
-    aSign = extractFloat128Sign( a );
-    bSign = extractFloat128Sign( b );
-    if ( aSign == bSign ) {
-        return addFloat128Sigs(a, b, aSign, status);
-    }
-    else {
-        return subFloat128Sigs(a, b, aSign, status);
-    }
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of subtracting the quadruple-precision floating-point
-| values `a' and `b'.  The operation is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float128 float128_sub(float128 a, float128 b, float_status *status)
-{
-    bool aSign, bSign;
-
-    aSign = extractFloat128Sign( a );
-    bSign = extractFloat128Sign( b );
-    if ( aSign == bSign ) {
-        return subFloat128Sigs(a, b, aSign, status);
-    }
-    else {
-        return addFloat128Sigs(a, b, aSign, status);
-    }
-
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of multiplying the quadruple-precision floating-point
 | values `a' and `b'.  The operation is performed according to the IEC/IEEE
-- 
2.25.1
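
With this patch the quad-precision entry points become thin wrappers
around the same parts_addsub() path already used by float16 through
float64.  A hedged usage sketch, assuming QEMU's softfloat.h
environment and its make_float128() constructor; in IEEE binary128,
1.0 has the high word 0x3fff000000000000 and 2.0 has
0x4000000000000000:

    /* Illustrative only: 1.0 + 1.0 == 2.0 in binary128. */
    float_status st = { 0 };
    float128 one = make_float128(0x3fff000000000000ULL, 0);
    float128 two = float128_add(one, one, &st);
    /* two.high == 0x4000000000000000, two.low == 0; the addition is
     * exact, so no status flags are raised. */

The observable behaviour should not change; the point of the
conversion is that float128 now exercises exactly the code added for
the smaller formats in the previous patch.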




* [PATCH 36/72] softfloat: Move mul_floats to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (34 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 35/72] softfloat: Implement float128_add/sub via parts Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-13 10:08   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 37/72] softfloat: Move muladd_floats " Richard Henderson
                   ` (40 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Rename to parts$N_mul.
Reimplement float128_mul with FloatParts128.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 206 ++++++++++++++------------------------
 fpu/softfloat-parts.c.inc |  51 ++++++++++
 2 files changed, 128 insertions(+), 129 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index f48e6b1f64..4f498c11e5 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -532,6 +532,16 @@ typedef struct {
     uint64_t frac_lo;
 } FloatParts128;
 
+typedef struct {
+    FloatClass cls;
+    bool sign;
+    int32_t exp;
+    uint64_t frac_hi;
+    uint64_t frac_hm;  /* high-middle */
+    uint64_t frac_lm;  /* low-middle */
+    uint64_t frac_lo;
+} FloatParts256;
+
 /* These apply to the most significant word of each FloatPartsN. */
 #define DECOMPOSED_BINARY_POINT    63
 #define DECOMPOSED_IMPLICIT_BIT    (1ull << DECOMPOSED_BINARY_POINT)
@@ -768,6 +778,14 @@ static FloatParts128 *parts128_addsub(FloatParts128 *a, FloatParts128 *b,
 #define parts_addsub(A, B, S, Z) \
     PARTS_GENERIC_64_128(addsub, A)(A, B, S, Z)
 
+static FloatParts64 *parts64_mul(FloatParts64 *a, FloatParts64 *b,
+                                 float_status *s);
+static FloatParts128 *parts128_mul(FloatParts128 *a, FloatParts128 *b,
+                                   float_status *s);
+
+#define parts_mul(A, B, S) \
+    PARTS_GENERIC_64_128(mul, A)(A, B, S)
+
 /*
  * Helper functions for softfloat-parts.c.inc, per-size operations.
  */
@@ -858,6 +876,19 @@ static bool frac128_eqz(FloatParts128 *a)
 
 #define frac_eqz(A)  FRAC_GENERIC_64_128(eqz, A)(A)
 
+static void frac64_mulw(FloatParts128 *r, FloatParts64 *a, FloatParts64 *b)
+{
+    mulu64(&r->frac_lo, &r->frac_hi, a->frac, b->frac);
+}
+
+static void frac128_mulw(FloatParts256 *r, FloatParts128 *a, FloatParts128 *b)
+{
+    mul128To256(a->frac_hi, a->frac_lo, b->frac_hi, b->frac_lo,
+                &r->frac_hi, &r->frac_hm, &r->frac_lm, &r->frac_lo);
+}
+
+#define frac_mulw(R, A, B)  FRAC_GENERIC_64_128(mulw, A)(R, A, B)
+
 static void frac64_neg(FloatParts64 *a)
 {
     a->frac = -a->frac;
@@ -954,23 +985,42 @@ static bool frac128_sub(FloatParts128 *r, FloatParts128 *a, FloatParts128 *b)
 
 #define frac_sub(R, A, B)  FRAC_GENERIC_64_128(sub, R)(R, A, B)
 
+static void frac64_truncjam(FloatParts64 *r, FloatParts128 *a)
+{
+    r->frac = a->frac_hi | (a->frac_lo != 0);
+}
+
+static void frac128_truncjam(FloatParts128 *r, FloatParts256 *a)
+{
+    r->frac_hi = a->frac_hi;
+    r->frac_lo = a->frac_hm | ((a->frac_lm | a->frac_lo) != 0);
+}
+
+#define frac_truncjam(R, A)  FRAC_GENERIC_64_128(truncjam, R)(R, A)
+
 #define partsN(NAME)   glue(glue(glue(parts,N),_),NAME)
 #define FloatPartsN    glue(FloatParts,N)
+#define FloatPartsW    glue(FloatParts,W)
 
 #define N 64
+#define W 128
 
 #include "softfloat-parts-addsub.c.inc"
 #include "softfloat-parts.c.inc"
 
 #undef  N
+#undef  W
 #define N 128
+#define W 256
 
 #include "softfloat-parts-addsub.c.inc"
 #include "softfloat-parts.c.inc"
 
 #undef  N
+#undef  W
 #undef  partsN
 #undef  FloatPartsN
+#undef  FloatPartsW
 
 /*
  * Pack/unpack routines with a specific FloatFmt.
@@ -1249,89 +1299,42 @@ float128 float128_sub(float128 a, float128 b, float_status *status)
 }
 
 /*
- * Returns the result of multiplying the floating-point values `a' and
- * `b'. The operation is performed according to the IEC/IEEE Standard
- * for Binary Floating-Point Arithmetic.
+ * Multiplication
  */
 
-static FloatParts64 mul_floats(FloatParts64 a, FloatParts64 b, float_status *s)
-{
-    bool sign = a.sign ^ b.sign;
-
-    if (a.cls == float_class_normal && b.cls == float_class_normal) {
-        uint64_t hi, lo;
-        int exp = a.exp + b.exp;
-
-        mul64To128(a.frac, b.frac, &hi, &lo);
-        if (hi & DECOMPOSED_IMPLICIT_BIT) {
-            exp += 1;
-        } else {
-            hi <<= 1;
-        }
-        hi |= (lo != 0);
-
-        /* Re-use a */
-        a.exp = exp;
-        a.sign = sign;
-        a.frac = hi;
-        return a;
-    }
-    /* handle all the NaN cases */
-    if (is_nan(a.cls) || is_nan(b.cls)) {
-        return *parts_pick_nan(&a, &b, s);
-    }
-    /* Inf * Zero == NaN */
-    if ((a.cls == float_class_inf && b.cls == float_class_zero) ||
-        (a.cls == float_class_zero && b.cls == float_class_inf)) {
-        float_raise(float_flag_invalid, s);
-        parts_default_nan(&a, s);
-        return a;
-    }
-    /* Multiply by 0 or Inf */
-    if (a.cls == float_class_inf || a.cls == float_class_zero) {
-        a.sign = sign;
-        return a;
-    }
-    if (b.cls == float_class_inf || b.cls == float_class_zero) {
-        b.sign = sign;
-        return b;
-    }
-    g_assert_not_reached();
-}
-
 float16 QEMU_FLATTEN float16_mul(float16 a, float16 b, float_status *status)
 {
-    FloatParts64 pa, pb, pr;
+    FloatParts64 pa, pb, *pr;
 
     float16_unpack_canonical(&pa, a, status);
     float16_unpack_canonical(&pb, b, status);
-    pr = mul_floats(pa, pb, status);
+    pr = parts_mul(&pa, &pb, status);
 
-    return float16_round_pack_canonical(&pr, status);
+    return float16_round_pack_canonical(pr, status);
 }
 
 static float32 QEMU_SOFTFLOAT_ATTR
 soft_f32_mul(float32 a, float32 b, float_status *status)
 {
-    FloatParts64 pa, pb, pr;
+    FloatParts64 pa, pb, *pr;
 
     float32_unpack_canonical(&pa, a, status);
     float32_unpack_canonical(&pb, b, status);
-    pr = mul_floats(pa, pb, status);
+    pr = parts_mul(&pa, &pb, status);
 
-    return float32_round_pack_canonical(&pr, status);
+    return float32_round_pack_canonical(pr, status);
 }
 
 static float64 QEMU_SOFTFLOAT_ATTR
 soft_f64_mul(float64 a, float64 b, float_status *status)
 {
-    FloatParts64 pa, pb, pr;
+    FloatParts64 pa, pb, *pr;
 
     float64_unpack_canonical(&pa, a, status);
     float64_unpack_canonical(&pb, b, status);
-    pr = mul_floats(pa, pb, status);
+    pr = parts_mul(&pa, &pb, status);
 
-    return float64_round_pack_canonical(&pr, status);
+    return float64_round_pack_canonical(pr, status);
 }
 
 static float hard_f32_mul(float a, float b)
@@ -1358,20 +1361,28 @@ float64_mul(float64 a, float64 b, float_status *s)
                         f64_is_zon2, f64_addsubmul_post);
 }
 
-/*
- * Returns the result of multiplying the bfloat16
- * values `a' and `b'.
- */
-
-bfloat16 QEMU_FLATTEN bfloat16_mul(bfloat16 a, bfloat16 b, float_status *status)
+bfloat16 QEMU_FLATTEN
+bfloat16_mul(bfloat16 a, bfloat16 b, float_status *status)
 {
-    FloatParts64 pa, pb, pr;
+    FloatParts64 pa, pb, *pr;
 
     bfloat16_unpack_canonical(&pa, a, status);
     bfloat16_unpack_canonical(&pb, b, status);
-    pr = mul_floats(pa, pb, status);
+    pr = parts_mul(&pa, &pb, status);
 
-    return bfloat16_round_pack_canonical(&pr, status);
+    return bfloat16_round_pack_canonical(pr, status);
+}
+
+float128 QEMU_FLATTEN
+float128_mul(float128 a, float128 b, float_status *status)
+{
+    FloatParts128 pa, pb, *pr;
+
+    float128_unpack_canonical(&pa, a, status);
+    float128_unpack_canonical(&pb, b, status);
+    pr = parts_mul(&pa, &pb, status);
+
+    return float128_round_pack_canonical(pr, status);
 }
 
 /*
@@ -7067,69 +7078,6 @@ float128 float128_round_to_int(float128 a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of multiplying the quadruple-precision floating-point
-| values `a' and `b'.  The operation is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float128 float128_mul(float128 a, float128 b, float_status *status)
-{
-    bool aSign, bSign, zSign;
-    int32_t aExp, bExp, zExp;
-    uint64_t aSig0, aSig1, bSig0, bSig1, zSig0, zSig1, zSig2, zSig3;
-
-    aSig1 = extractFloat128Frac1( a );
-    aSig0 = extractFloat128Frac0( a );
-    aExp = extractFloat128Exp( a );
-    aSign = extractFloat128Sign( a );
-    bSig1 = extractFloat128Frac1( b );
-    bSig0 = extractFloat128Frac0( b );
-    bExp = extractFloat128Exp( b );
-    bSign = extractFloat128Sign( b );
-    zSign = aSign ^ bSign;
-    if ( aExp == 0x7FFF ) {
-        if (    ( aSig0 | aSig1 )
-             || ( ( bExp == 0x7FFF ) && ( bSig0 | bSig1 ) ) ) {
-            return propagateFloat128NaN(a, b, status);
-        }
-        if ( ( bExp | bSig0 | bSig1 ) == 0 ) goto invalid;
-        return packFloat128( zSign, 0x7FFF, 0, 0 );
-    }
-    if ( bExp == 0x7FFF ) {
-        if (bSig0 | bSig1) {
-            return propagateFloat128NaN(a, b, status);
-        }
-        if ( ( aExp | aSig0 | aSig1 ) == 0 ) {
- invalid:
-            float_raise(float_flag_invalid, status);
-            return float128_default_nan(status);
-        }
-        return packFloat128( zSign, 0x7FFF, 0, 0 );
-    }
-    if ( aExp == 0 ) {
-        if ( ( aSig0 | aSig1 ) == 0 ) return packFloat128( zSign, 0, 0, 0 );
-        normalizeFloat128Subnormal( aSig0, aSig1, &aExp, &aSig0, &aSig1 );
-    }
-    if ( bExp == 0 ) {
-        if ( ( bSig0 | bSig1 ) == 0 ) return packFloat128( zSign, 0, 0, 0 );
-        normalizeFloat128Subnormal( bSig0, bSig1, &bExp, &bSig0, &bSig1 );
-    }
-    zExp = aExp + bExp - 0x4000;
-    aSig0 |= UINT64_C(0x0001000000000000);
-    shortShift128Left( bSig0, bSig1, 16, &bSig0, &bSig1 );
-    mul128To256( aSig0, aSig1, bSig0, bSig1, &zSig0, &zSig1, &zSig2, &zSig3 );
-    add128( zSig0, zSig1, aSig0, aSig1, &zSig0, &zSig1 );
-    zSig2 |= ( zSig3 != 0 );
-    if (UINT64_C( 0x0002000000000000) <= zSig0 ) {
-        shift128ExtraRightJamming(
-            zSig0, zSig1, zSig2, 1, &zSig0, &zSig1, &zSig2 );
-        ++zExp;
-    }
-    return roundAndPackFloat128(zSign, zExp, zSig0, zSig1, zSig2, status);
-
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of dividing the quadruple-precision floating-point value
 | `a' by the corresponding value `b'.  The operation is performed according to
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index cfce9f6421..9a67ab2bea 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -362,3 +362,54 @@ static FloatPartsN *partsN(addsub)(FloatPartsN *a, FloatPartsN *b,
  p_nan:
     return parts_pick_nan(a, b, s);
 }
+
+/*
+ * Returns the result of multiplying the floating-point values `a' and
+ * `b'. The operation is performed according to the IEC/IEEE Standard
+ * for Binary Floating-Point Arithmetic.
+ */
+static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
+                                float_status *s)
+{
+    int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
+    bool sign = a->sign ^ b->sign;
+
+    if (likely(ab_mask == float_cmask_normal)) {
+        FloatPartsW tmp;
+
+        frac_mulw(&tmp, a, b);
+        frac_truncjam(a, &tmp);
+
+        a->exp += b->exp + 1;
+        if (!(a->frac_hi & DECOMPOSED_IMPLICIT_BIT)) {
+            frac_add(a, a, a);
+            a->exp -= 1;
+        }
+
+        a->sign = sign;
+        return a;
+    }
+
+    /* Inf * Zero == NaN */
+    if (unlikely(ab_mask == float_cmask_infzero)) {
+        float_raise(float_flag_invalid, s);
+        parts_default_nan(a, s);
+        return a;
+    }
+
+    if (unlikely(ab_mask & float_cmask_anynan)) {
+        return parts_pick_nan(a, b, s);
+    }
+
+    /* Multiply by 0 or Inf */
+    if (ab_mask & float_cmask_inf) {
+        a->cls = float_class_inf;
+        a->sign = sign;
+        return a;
+    }
+
+    g_assert(ab_mask & float_cmask_zero);
+    a->cls = float_class_zero;
+    a->sign = sign;
+    return a;
+}
-- 
2.25.1
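
The heart of partsN(mul) is a widen-then-truncate sequence:
frac_mulw() forms the full double-width product and frac_truncjam()
folds the low half back into a sticky bit.  Because both inputs carry
the implicit bit in the msb, the product's top bit can land in only
one of two positions, so at most one renormalization shift is needed.
A sketch of the 64-bit path, assuming a compiler that provides
unsigned __int128 in place of mulu64() and frac_truncjam():

    #include <stdint.h>

    /* Illustrative only: multiply two fractions whose implicit bit is
     * at bit 63, returning the normalized fraction and storing the
     * product's exponent in *ez. */
    static uint64_t mul_frac64_sketch(uint64_t fa, uint64_t fb,
                                      int ea, int eb, int *ez)
    {
        unsigned __int128 p = (unsigned __int128)fa * fb;
        uint64_t hi = p >> 64;
        uint64_t lo = (uint64_t)p;

        *ez = ea + eb + 1;          /* a product of [1,2) values is in [1,4) */
        hi |= (lo != 0);            /* truncjam: the low half becomes sticky */
        if (!(hi & (1ull << 63))) { /* top bit at 126, not 127: shift once */
            hi += hi;               /* mirrors frac_add(a, a, a) */
            *ez -= 1;
        }
        return hi;
    }

The 128-bit format follows the same shape with FloatParts256 as the
scratch type, which is why this patch introduces the W (wide) macro
alongside N.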




* [PATCH 37/72] softfloat: Move muladd_floats to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (35 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 36/72] softfloat: Move mul_floats to softfloat-parts.c.inc Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-13 10:43   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 38/72] softfloat: Use mulu64 for mul64To128 Richard Henderson
                   ` (39 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Rename to parts$N_muladd.
Implement float128_muladd with FloatParts128.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat.h   |   2 +
 fpu/softfloat.c           | 406 ++++++++++++++++++--------------------
 tests/fp/fp-bench.c       |   8 +-
 tests/fp/fp-test.c        |   2 +-
 fpu/softfloat-parts.c.inc | 126 ++++++++++++
 tests/fp/wrap.c.inc       |  12 ++
 6 files changed, 342 insertions(+), 214 deletions(-)
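
Worth recalling before the conversion below: float128_muladd() now
inherits the flags contract of the narrower formats, where
float_muladd_negate_c, float_muladd_negate_product and
float_muladd_negate_result select negation of the addend, the
intermediate product, or the final result, all with no intermediate
rounding step.  A hedged usage sketch, assuming make_float128() as in
the previous patches:

    /* Illustrative only: r = -(a * b) + c computed as one fused op. */
    float_status st = { 0 };
    float128 a = make_float128(0x3fff000000000000ULL, 0);  /* 1.0 */
    float128 b = make_float128(0x4000000000000000ULL, 0);  /* 2.0 */
    float128 c = make_float128(0x4000800000000000ULL, 0);  /* 3.0 */
    float128 r = float128_muladd(a, b, c,
                                 float_muladd_negate_product, &st);
    /* r == 1.0: high word 0x3fff000000000000, low word 0. */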

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 019c2ec66d..53f2c2ea3c 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -1197,6 +1197,8 @@ float128 float128_round_to_int(float128, float_status *status);
 float128 float128_add(float128, float128, float_status *status);
 float128 float128_sub(float128, float128, float_status *status);
 float128 float128_mul(float128, float128, float_status *status);
+float128 float128_muladd(float128, float128, float128, int,
+                         float_status *status);
 float128 float128_div(float128, float128, float_status *status);
 float128 float128_rem(float128, float128, float_status *status);
 float128 float128_sqrt(float128, float_status *status);
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 4f498c11e5..a9ee8498ae 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -715,6 +715,10 @@ static float128 float128_pack_raw(const FloatParts128 *p)
 #define PARTS_GENERIC_64_128(NAME, P) \
     QEMU_GENERIC(P, (FloatParts128 *, parts128_##NAME), parts64_##NAME)
 
+#define PARTS_GENERIC_64_128_256(NAME, P) \
+    QEMU_GENERIC(P, (FloatParts256 *, parts256_##NAME), \
+                 (FloatParts128 *, parts128_##NAME), parts64_##NAME)
+
 #define parts_default_nan(P, S)    PARTS_GENERIC_64_128(default_nan, P)(P, S)
 #define parts_silence_nan(P, S)    PARTS_GENERIC_64_128(silence_nan, P)(P, S)
 
@@ -760,15 +764,17 @@ static void parts128_uncanon(FloatParts128 *p, float_status *status,
 
 static void parts64_add_normal(FloatParts64 *a, FloatParts64 *b);
 static void parts128_add_normal(FloatParts128 *a, FloatParts128 *b);
+static void parts256_add_normal(FloatParts256 *a, FloatParts256 *b);
 
 #define parts_add_normal(A, B) \
-    PARTS_GENERIC_64_128(add_normal, A)(A, B)
+    PARTS_GENERIC_64_128_256(add_normal, A)(A, B)
 
 static bool parts64_sub_normal(FloatParts64 *a, FloatParts64 *b);
 static bool parts128_sub_normal(FloatParts128 *a, FloatParts128 *b);
+static bool parts256_sub_normal(FloatParts256 *a, FloatParts256 *b);
 
 #define parts_sub_normal(A, B) \
-    PARTS_GENERIC_64_128(sub_normal, A)(A, B)
+    PARTS_GENERIC_64_128_256(sub_normal, A)(A, B)
 
 static FloatParts64 *parts64_addsub(FloatParts64 *a, FloatParts64 *b,
                                     float_status *s, bool subtract);
@@ -786,6 +792,16 @@ static FloatParts128 *parts128_mul(FloatParts128 *a, FloatParts128 *b,
 #define parts_mul(A, B, S) \
     PARTS_GENERIC_64_128(mul, A)(A, B, S)
 
+static FloatParts64 *parts64_muladd(FloatParts64 *a, FloatParts64 *b,
+                                    FloatParts64 *c, int flags,
+                                    float_status *s);
+static FloatParts128 *parts128_muladd(FloatParts128 *a, FloatParts128 *b,
+                                      FloatParts128 *c, int flags,
+                                      float_status *s);
+
+#define parts_muladd(A, B, C, Z, S) \
+    PARTS_GENERIC_64_128(muladd, A)(A, B, C, Z, S)
+
 /*
  * Helper functions for softfloat-parts.c.inc, per-size operations.
  */
@@ -793,6 +809,10 @@ static FloatParts128 *parts128_mul(FloatParts128 *a, FloatParts128 *b,
 #define FRAC_GENERIC_64_128(NAME, P) \
     QEMU_GENERIC(P, (FloatParts128 *, frac128_##NAME), frac64_##NAME)
 
+#define FRAC_GENERIC_64_128_256(NAME, P) \
+    QEMU_GENERIC(P, (FloatParts256 *, frac256_##NAME), \
+                 (FloatParts128 *, frac128_##NAME), frac64_##NAME)
+
 static bool frac64_add(FloatParts64 *r, FloatParts64 *a, FloatParts64 *b)
 {
     return uadd64_overflow(a->frac, b->frac, &r->frac);
@@ -806,7 +826,17 @@ static bool frac128_add(FloatParts128 *r, FloatParts128 *a, FloatParts128 *b)
     return c;
 }
 
-#define frac_add(R, A, B)  FRAC_GENERIC_64_128(add, R)(R, A, B)
+static bool frac256_add(FloatParts256 *r, FloatParts256 *a, FloatParts256 *b)
+{
+    bool c = 0;
+    r->frac_lo = uadd64_carry(a->frac_lo, b->frac_lo, &c);
+    r->frac_lm = uadd64_carry(a->frac_lm, b->frac_lm, &c);
+    r->frac_hm = uadd64_carry(a->frac_hm, b->frac_hm, &c);
+    r->frac_hi = uadd64_carry(a->frac_hi, b->frac_hi, &c);
+    return c;
+}
+
+#define frac_add(R, A, B)  FRAC_GENERIC_64_128_256(add, R)(R, A, B)
 
 static bool frac64_addi(FloatParts64 *r, FloatParts64 *a, uint64_t c)
 {
@@ -901,7 +931,16 @@ static void frac128_neg(FloatParts128 *a)
     a->frac_hi = usub64_borrow(0, a->frac_hi, &c);
 }
 
-#define frac_neg(A)  FRAC_GENERIC_64_128(neg, A)(A)
+static void frac256_neg(FloatParts256 *a)
+{
+    bool c = 0;
+    a->frac_lo = usub64_borrow(0, a->frac_lo, &c);
+    a->frac_lm = usub64_borrow(0, a->frac_lm, &c);
+    a->frac_hm = usub64_borrow(0, a->frac_hm, &c);
+    a->frac_hi = usub64_borrow(0, a->frac_hi, &c);
+}
+
+#define frac_neg(A)  FRAC_GENERIC_64_128_256(neg, A)(A)
 
 static int frac64_normalize(FloatParts64 *a)
 {
@@ -932,7 +971,55 @@ static int frac128_normalize(FloatParts128 *a)
     return 128;
 }
 
-#define frac_normalize(A)  FRAC_GENERIC_64_128(normalize, A)(A)
+static int frac256_normalize(FloatParts256 *a)
+{
+    uint64_t a0 = a->frac_hi, a1 = a->frac_hm;
+    uint64_t a2 = a->frac_lm, a3 = a->frac_lo;
+    int ret, shl, shr;
+
+    if (likely(a0)) {
+        shl = clz64(a0);
+        if (shl == 0) {
+            return 0;
+        }
+        ret = shl;
+    } else {
+        if (a1) {
+            ret = 64;
+            a0 = a1, a1 = a2, a2 = a3, a3 = 0;
+        } else if (a2) {
+            ret = 128;
+            a0 = a2, a1 = a3, a2 = 0, a3 = 0;
+        } else if (a3) {
+            ret = 192;
+            a0 = a3, a1 = 0, a2 = 0, a3 = 0;
+        } else {
+            ret = 256;
+            a0 = 0, a1 = 0, a2 = 0, a3 = 0;
+            goto done;
+        }
+        shl = clz64(a0);
+        if (shl == 0) {
+            goto done;
+        }
+        ret += shl;
+    }
+
+    shr = -shl & 63;
+    a0 = (a0 << shl) | (a1 >> shr);
+    a1 = (a1 << shl) | (a2 >> shr);
+    a2 = (a2 << shl) | (a3 >> shr);
+    a3 = (a3 << shl);
+
+ done:
+    a->frac_hi = a0;
+    a->frac_hm = a1;
+    a->frac_lm = a2;
+    a->frac_lo = a3;
+    return ret;
+}
+
+#define frac_normalize(A)  FRAC_GENERIC_64_128_256(normalize, A)(A)
 
 static void frac64_shl(FloatParts64 *a, int c)
 {
@@ -968,7 +1055,51 @@ static void frac128_shrjam(FloatParts128 *a, int c)
     shift128RightJamming(a->frac_hi, a->frac_lo, c, &a->frac_hi, &a->frac_lo);
 }
 
-#define frac_shrjam(A, C)  FRAC_GENERIC_64_128(shrjam, A)(A, C)
+static void frac256_shrjam(FloatParts256 *a, int c)
+{
+    uint64_t a0 = a->frac_hi, a1 = a->frac_hm;
+    uint64_t a2 = a->frac_lm, a3 = a->frac_lo;
+    uint64_t sticky = 0;
+    int invc;
+
+    if (unlikely(c == 0)) {
+        return;
+    } else if (likely(c < 64)) {
+        /* nothing */
+    } else if (likely(c < 256)) {
+        if (unlikely(c & 128)) {
+            sticky |= a2 | a3;
+            a3 = a1, a2 = a0, a1 = 0, a0 = 0;
+        }
+        if (unlikely(c & 64)) {
+            sticky |= a3;
+            a3 = a2, a2 = a1, a1 = a0, a0 = 0;
+        }
+        c &= 63;
+        if (c == 0) {
+            goto done;
+        }
+    } else {
+        sticky = a0 | a1 | a2 | a3;
+        a0 = a1 = a2 = a3 = 0;
+        goto done;
+    }
+
+    invc = -c & 63;
+    sticky |= a3 << invc;
+    a3 = (a3 >> c) | (a2 << invc);
+    a2 = (a2 >> c) | (a1 << invc);
+    a1 = (a1 >> c) | (a0 << invc);
+    a0 = (a0 >> c);
+
+ done:
+    a->frac_lo = a3 | (sticky != 0);
+    a->frac_lm = a2;
+    a->frac_hm = a1;
+    a->frac_hi = a0;
+}
+
+#define frac_shrjam(A, C)  FRAC_GENERIC_64_128_256(shrjam, A)(A, C)
 
 static bool frac64_sub(FloatParts64 *r, FloatParts64 *a, FloatParts64 *b)
 {
@@ -983,7 +1114,17 @@ static bool frac128_sub(FloatParts128 *r, FloatParts128 *a, FloatParts128 *b)
     return c;
 }
 
-#define frac_sub(R, A, B)  FRAC_GENERIC_64_128(sub, R)(R, A, B)
+static bool frac256_sub(FloatParts256 *r, FloatParts256 *a, FloatParts256 *b)
+{
+    bool c = 0;
+    r->frac_lo = usub64_borrow(a->frac_lo, b->frac_lo, &c);
+    r->frac_lm = usub64_borrow(a->frac_lm, b->frac_lm, &c);
+    r->frac_hm = usub64_borrow(a->frac_hm, b->frac_hm, &c);
+    r->frac_hi = usub64_borrow(a->frac_hi, b->frac_hi, &c);
+    return c;
+}
+
+#define frac_sub(R, A, B)  FRAC_GENERIC_64_128_256(sub, R)(R, A, B)
 
 static void frac64_truncjam(FloatParts64 *r, FloatParts128 *a)
 {
@@ -998,6 +1139,22 @@ static void frac128_truncjam(FloatParts128 *r, FloatParts256 *a)
 
 #define frac_truncjam(R, A)  FRAC_GENERIC_64_128(truncjam, R)(R, A)
 
+static void frac64_widen(FloatParts128 *r, FloatParts64 *a)
+{
+    r->frac_hi = a->frac;
+    r->frac_lo = 0;
+}
+
+static void frac128_widen(FloatParts256 *r, FloatParts128 *a)
+{
+    r->frac_hi = a->frac_hi;
+    r->frac_hm = a->frac_lo;
+    r->frac_lm = 0;
+    r->frac_lo = 0;
+}
+
+#define frac_widen(A, B)  FRAC_GENERIC_64_128(widen, B)(A, B)
+
 #define partsN(NAME)   glue(glue(glue(parts,N),_),NAME)
 #define FloatPartsN    glue(FloatParts,N)
 #define FloatPartsW    glue(FloatParts,W)
@@ -1016,6 +1173,12 @@ static void frac128_truncjam(FloatParts128 *r, FloatParts256 *a)
 #include "softfloat-parts-addsub.c.inc"
 #include "softfloat-parts.c.inc"
 
+#undef  N
+#undef  W
+#define N            256
+
+#include "softfloat-parts-addsub.c.inc"
+
 #undef  N
 #undef  W
 #undef  partsN
@@ -1386,230 +1549,48 @@ float128_mul(float128 a, float128 b, float_status *status)
 }
 
 /*
- * Returns the result of multiplying the floating-point values `a' and
- * `b' then adding 'c', with no intermediate rounding step after the
- * multiplication. The operation is performed according to the
- * IEC/IEEE Standard for Binary Floating-Point Arithmetic 754-2008.
- * The flags argument allows the caller to select negation of the
- * addend, the intermediate product, or the final result. (The
- * difference between this and having the caller do a separate
- * negation is that negating externally will flip the sign bit on
- * NaNs.)
+ * Fused multiply-add
  */
 
-static FloatParts64 muladd_floats(FloatParts64 a, FloatParts64 b, FloatParts64 c,
-                                int flags, float_status *s)
-{
-    bool inf_zero, p_sign;
-    bool sign_flip = flags & float_muladd_negate_result;
-    FloatClass p_class;
-    uint64_t hi, lo;
-    int p_exp;
-    int ab_mask, abc_mask;
-
-    ab_mask = float_cmask(a.cls) | float_cmask(b.cls);
-    abc_mask = float_cmask(c.cls) | ab_mask;
-    inf_zero = ab_mask == float_cmask_infzero;
-
-    /* It is implementation-defined whether the cases of (0,inf,qnan)
-     * and (inf,0,qnan) raise InvalidOperation or not (and what QNaN
-     * they return if they do), so we have to hand this information
-     * off to the target-specific pick-a-NaN routine.
-     */
-    if (unlikely(abc_mask & float_cmask_anynan)) {
-        return *parts_pick_nan_muladd(&a, &b, &c, s, ab_mask, abc_mask);
-    }
-
-    if (inf_zero) {
-        float_raise(float_flag_invalid, s);
-        parts_default_nan(&a, s);
-        return a;
-    }
-
-    if (flags & float_muladd_negate_c) {
-        c.sign ^= 1;
-    }
-
-    p_sign = a.sign ^ b.sign;
-
-    if (flags & float_muladd_negate_product) {
-        p_sign ^= 1;
-    }
-
-    if (ab_mask & float_cmask_inf) {
-        p_class = float_class_inf;
-    } else if (ab_mask & float_cmask_zero) {
-        p_class = float_class_zero;
-    } else {
-        p_class = float_class_normal;
-    }
-
-    if (c.cls == float_class_inf) {
-        if (p_class == float_class_inf && p_sign != c.sign) {
-            float_raise(float_flag_invalid, s);
-            parts_default_nan(&c, s);
-        } else {
-            c.sign ^= sign_flip;
-        }
-        return c;
-    }
-
-    if (p_class == float_class_inf) {
-        a.cls = float_class_inf;
-        a.sign = p_sign ^ sign_flip;
-        return a;
-    }
-
-    if (p_class == float_class_zero) {
-        if (c.cls == float_class_zero) {
-            if (p_sign != c.sign) {
-                p_sign = s->float_rounding_mode == float_round_down;
-            }
-            c.sign = p_sign;
-        } else if (flags & float_muladd_halve_result) {
-            c.exp -= 1;
-        }
-        c.sign ^= sign_flip;
-        return c;
-    }
-
-    /* a & b should be normals now... */
-    assert(a.cls == float_class_normal &&
-           b.cls == float_class_normal);
-
-    p_exp = a.exp + b.exp;
-
-    mul64To128(a.frac, b.frac, &hi, &lo);
-
-    /* Renormalize to the msb. */
-    if (hi & DECOMPOSED_IMPLICIT_BIT) {
-        p_exp += 1;
-    } else {
-        shortShift128Left(hi, lo, 1, &hi, &lo);
-    }
-
-    /* + add/sub */
-    if (c.cls != float_class_zero) {
-        int exp_diff = p_exp - c.exp;
-        if (p_sign == c.sign) {
-            /* Addition */
-            if (exp_diff <= 0) {
-                shift64RightJamming(hi, -exp_diff, &hi);
-                p_exp = c.exp;
-                if (uadd64_overflow(hi, c.frac, &hi)) {
-                    shift64RightJamming(hi, 1, &hi);
-                    hi |= DECOMPOSED_IMPLICIT_BIT;
-                    p_exp += 1;
-                }
-            } else {
-                uint64_t c_hi, c_lo, over;
-                shift128RightJamming(c.frac, 0, exp_diff, &c_hi, &c_lo);
-                add192(0, hi, lo, 0, c_hi, c_lo, &over, &hi, &lo);
-                if (over) {
-                    shift64RightJamming(hi, 1, &hi);
-                    hi |= DECOMPOSED_IMPLICIT_BIT;
-                    p_exp += 1;
-                }
-            }
-        } else {
-            /* Subtraction */
-            uint64_t c_hi = c.frac, c_lo = 0;
-
-            if (exp_diff <= 0) {
-                shift128RightJamming(hi, lo, -exp_diff, &hi, &lo);
-                if (exp_diff == 0
-                    &&
-                    (hi > c_hi || (hi == c_hi && lo >= c_lo))) {
-                    sub128(hi, lo, c_hi, c_lo, &hi, &lo);
-                } else {
-                    sub128(c_hi, c_lo, hi, lo, &hi, &lo);
-                    p_sign ^= 1;
-                    p_exp = c.exp;
-                }
-            } else {
-                shift128RightJamming(c_hi, c_lo,
-                                     exp_diff,
-                                     &c_hi, &c_lo);
-                sub128(hi, lo, c_hi, c_lo, &hi, &lo);
-            }
-
-            if (hi == 0 && lo == 0) {
-                a.cls = float_class_zero;
-                a.sign = s->float_rounding_mode == float_round_down;
-                a.sign ^= sign_flip;
-                return a;
-            } else {
-                int shift;
-                if (hi != 0) {
-                    shift = clz64(hi);
-                } else {
-                    shift = clz64(lo) + 64;
-                }
-                /* Normalizing to a binary point of 124 is the
-                   correct adjust for the exponent.  However since we're
-                   shifting, we might as well put the binary point back
-                   at 63 where we really want it.  Therefore shift as
-                   if we're leaving 1 bit at the top of the word, but
-                   adjust the exponent as if we're leaving 3 bits.  */
-                shift128Left(hi, lo, shift, &hi, &lo);
-                p_exp -= shift;
-            }
-        }
-    }
-    hi |= (lo != 0);
-
-    if (flags & float_muladd_halve_result) {
-        p_exp -= 1;
-    }
-
-    /* finally prepare our result */
-    a.cls = float_class_normal;
-    a.sign = p_sign ^ sign_flip;
-    a.exp = p_exp;
-    a.frac = hi;
-
-    return a;
-}
-
 float16 QEMU_FLATTEN float16_muladd(float16 a, float16 b, float16 c,
-                                                int flags, float_status *status)
+                                    int flags, float_status *status)
 {
-    FloatParts64 pa, pb, pc, pr;
+    FloatParts64 pa, pb, pc, *pr;
 
     float16_unpack_canonical(&pa, a, status);
     float16_unpack_canonical(&pb, b, status);
     float16_unpack_canonical(&pc, c, status);
-    pr = muladd_floats(pa, pb, pc, flags, status);
+    pr = parts_muladd(&pa, &pb, &pc, flags, status);
 
-    return float16_round_pack_canonical(&pr, status);
+    return float16_round_pack_canonical(pr, status);
 }
 
 static float32 QEMU_SOFTFLOAT_ATTR
 soft_f32_muladd(float32 a, float32 b, float32 c, int flags,
                 float_status *status)
 {
-    FloatParts64 pa, pb, pc, pr;
+    FloatParts64 pa, pb, pc, *pr;
 
     float32_unpack_canonical(&pa, a, status);
     float32_unpack_canonical(&pb, b, status);
     float32_unpack_canonical(&pc, c, status);
-    pr = muladd_floats(pa, pb, pc, flags, status);
+    pr = parts_muladd(&pa, &pb, &pc, flags, status);
 
-    return float32_round_pack_canonical(&pr, status);
+    return float32_round_pack_canonical(pr, status);
 }
 
 static float64 QEMU_SOFTFLOAT_ATTR
 soft_f64_muladd(float64 a, float64 b, float64 c, int flags,
                 float_status *status)
 {
-    FloatParts64 pa, pb, pc, pr;
+    FloatParts64 pa, pb, pc, *pr;
 
     float64_unpack_canonical(&pa, a, status);
     float64_unpack_canonical(&pb, b, status);
     float64_unpack_canonical(&pc, c, status);
-    pr = muladd_floats(pa, pb, pc, flags, status);
+    pr = parts_muladd(&pa, &pb, &pc, flags, status);
 
-    return float64_round_pack_canonical(&pr, status);
+    return float64_round_pack_canonical(pr, status);
 }
 
 static bool force_soft_fma;
@@ -1756,23 +1737,30 @@ float64_muladd(float64 xa, float64 xb, float64 xc, int flags, float_status *s)
     return soft_f64_muladd(ua.s, ub.s, uc.s, flags, s);
 }
 
-/*
- * Returns the result of multiplying the bfloat16 values `a'
- * and `b' then adding 'c', with no intermediate rounding step after the
- * multiplication.
- */
-
 bfloat16 QEMU_FLATTEN bfloat16_muladd(bfloat16 a, bfloat16 b, bfloat16 c,
                                       int flags, float_status *status)
 {
-    FloatParts64 pa, pb, pc, pr;
+    FloatParts64 pa, pb, pc, *pr;
 
     bfloat16_unpack_canonical(&pa, a, status);
     bfloat16_unpack_canonical(&pb, b, status);
     bfloat16_unpack_canonical(&pc, c, status);
-    pr = muladd_floats(pa, pb, pc, flags, status);
+    pr = parts_muladd(&pa, &pb, &pc, flags, status);
 
-    return bfloat16_round_pack_canonical(&pr, status);
+    return bfloat16_round_pack_canonical(pr, status);
+}
+
+float128 QEMU_FLATTEN float128_muladd(float128 a, float128 b, float128 c,
+                                      int flags, float_status *status)
+{
+    FloatParts128 pa, pb, pc, *pr;
+
+    float128_unpack_canonical(&pa, a, status);
+    float128_unpack_canonical(&pb, b, status);
+    float128_unpack_canonical(&pc, c, status);
+    pr = parts_muladd(&pa, &pb, &pc, flags, status);
+
+    return float128_round_pack_canonical(pr, status);
 }
 
 /*
diff --git a/tests/fp/fp-bench.c b/tests/fp/fp-bench.c
index d319993280..c24baf8535 100644
--- a/tests/fp/fp-bench.c
+++ b/tests/fp/fp-bench.c
@@ -386,7 +386,7 @@ static void bench(enum precision prec, enum op op, int n_ops, bool no_neg)
             for (i = 0; i < OPS_PER_ITER; i++) {
                 float128 a = ops[0].f128;
                 float128 b = ops[1].f128;
-                /* float128 c = ops[2].f128; */
+                float128 c = ops[2].f128;
 
                 switch (op) {
                 case OP_ADD:
@@ -401,9 +401,9 @@ static void bench(enum precision prec, enum op op, int n_ops, bool no_neg)
                 case OP_DIV:
                     res.f128 = float128_div(a, b, &soft_status);
                     break;
-                /* case OP_FMA: */
-                /*     res.f128 = float128_muladd(a, b, c, 0, &soft_status); */
-                /*     break; */
+                case OP_FMA:
+                    res.f128 = float128_muladd(a, b, c, 0, &soft_status);
+                    break;
                 case OP_SQRT:
                     res.f128 = float128_sqrt(a, &soft_status);
                     break;
diff --git a/tests/fp/fp-test.c b/tests/fp/fp-test.c
index 5a4cad8c8b..ff131afbde 100644
--- a/tests/fp/fp-test.c
+++ b/tests/fp/fp-test.c
@@ -717,7 +717,7 @@ static void do_testfloat(int op, int rmode, bool exact)
         test_abz_f128(true_abz_f128M, subj_abz_f128M);
         break;
     case F128_MULADD:
-        not_implemented();
+        test_abcz_f128(slow_f128M_mulAdd, qemu_f128M_mulAdd);
         break;
     case F128_SQRT:
         test_az_f128(slow_f128M_sqrt, qemu_f128M_sqrt);
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 9a67ab2bea..a203811299 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -413,3 +413,129 @@ static FloatPartsN *partsN(mul)(FloatPartsN *a, FloatPartsN *b,
     a->sign = sign;
     return a;
 }
+
+/*
+ * Returns the result of multiplying the floating-point values `a' and
+ * `b' then adding `c', with no intermediate rounding step after the
+ * multiplication. The operation is performed according to the
+ * IEC/IEEE Standard for Binary Floating-Point Arithmetic 754-2008.
+ * The flags argument allows the caller to select negation of the
+ * addend, the intermediate product, or the final result. (The
+ * difference between this and having the caller do a separate
+ * negation is that negating externally will flip the sign bit on NaNs.)
+ *
+ * Requires A and C extracted into a double-sized structure to provide the
+ * extra space for the widening multiply.
+ */
+static FloatPartsN *partsN(muladd)(FloatPartsN *a, FloatPartsN *b,
+                                   FloatPartsN *c, int flags, float_status *s)
+{
+    int ab_mask, abc_mask;
+    FloatPartsW p_widen, c_widen;
+
+    ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
+    abc_mask = float_cmask(c->cls) | ab_mask;
+
+    /*
+     * It is implementation-defined whether the cases of (0,inf,qnan)
+     * and (inf,0,qnan) raise InvalidOperation or not (and what QNaN
+     * they return if they do), so we have to hand this information
+     * off to the target-specific pick-a-NaN routine.
+     */
+    if (unlikely(abc_mask & float_cmask_anynan)) {
+        return parts_pick_nan_muladd(a, b, c, s, ab_mask, abc_mask);
+    }
+
+    if (flags & float_muladd_negate_c) {
+        c->sign ^= 1;
+    }
+
+    /* Compute the sign of the product into A. */
+    a->sign ^= b->sign;
+    if (flags & float_muladd_negate_product) {
+        a->sign ^= 1;
+    }
+
+    if (unlikely(ab_mask != float_cmask_normal)) {
+        if (unlikely(ab_mask == float_cmask_infzero)) {
+            goto d_nan;
+        }
+
+        if (ab_mask & float_cmask_inf) {
+            if (c->cls == float_class_inf && a->sign != c->sign) {
+                goto d_nan;
+            }
+            goto return_inf;
+        }
+
+        g_assert(ab_mask & float_cmask_zero);
+        if (c->cls == float_class_normal) {
+            *a = *c;
+            goto return_normal;
+        }
+        if (c->cls == float_class_zero) {
+            if (a->sign != c->sign) {
+                goto return_sub_zero;
+            }
+            goto return_zero;
+        }
+        g_assert(c->cls == float_class_inf);
+    }
+
+    if (unlikely(c->cls == float_class_inf)) {
+        a->sign = c->sign;
+        goto return_inf;
+    }
+
+    /* Perform the multiplication step. */
+    p_widen.sign = a->sign;
+    p_widen.exp = a->exp + b->exp + 1;
+    frac_mulw(&p_widen, a, b);
+    if (!(p_widen.frac_hi & DECOMPOSED_IMPLICIT_BIT)) {
+        frac_add(&p_widen, &p_widen, &p_widen);
+        p_widen.exp -= 1;
+    }
+
+    /* Perform the addition step. */
+    if (c->cls != float_class_zero) {
+        /* Zero-extend C to less significant bits. */
+        frac_widen(&c_widen, c);
+        c_widen.exp = c->exp;
+
+        if (a->sign == c->sign) {
+            parts_add_normal(&p_widen, &c_widen);
+        } else if (!parts_sub_normal(&p_widen, &c_widen)) {
+            goto return_sub_zero;
+        }
+    }
+
+    /* Narrow with sticky bit, for proper rounding later. */
+    frac_truncjam(a, &p_widen);
+    a->sign = p_widen.sign;
+    a->exp = p_widen.exp;
+
+ return_normal:
+    if (flags & float_muladd_halve_result) {
+        a->exp -= 1;
+    }
+ finish_sign:
+    if (flags & float_muladd_negate_result) {
+        a->sign ^= 1;
+    }
+    return a;
+
+ return_sub_zero:
+    a->sign = s->float_rounding_mode == float_round_down;
+ return_zero:
+    a->cls = float_class_zero;
+    goto finish_sign;
+
+ return_inf:
+    a->cls = float_class_inf;
+    goto finish_sign;
+
+ d_nan:
+    float_raise(float_flag_invalid, s);
+    parts_default_nan(a, s);
+    return a;
+}
diff --git a/tests/fp/wrap.c.inc b/tests/fp/wrap.c.inc
index 0cbd20013e..cb1bb77e4c 100644
--- a/tests/fp/wrap.c.inc
+++ b/tests/fp/wrap.c.inc
@@ -574,6 +574,18 @@ WRAP_MULADD(qemu_f32_mulAdd, float32_muladd, float32)
 WRAP_MULADD(qemu_f64_mulAdd, float64_muladd, float64)
 #undef WRAP_MULADD
 
+static void qemu_f128M_mulAdd(const float128_t *ap, const float128_t *bp,
+                              const float128_t *cp, float128_t *res)
+{
+    float128 a, b, c, ret;
+
+    a = soft_to_qemu128(*ap);
+    b = soft_to_qemu128(*bp);
+    c = soft_to_qemu128(*cp);
+    ret = float128_muladd(a, b, c, 0, &qsf);
+    *res = qemu_to_soft128(ret);
+}
+
 #define WRAP_CMP16(name, func, retcond)         \
     static bool name(float16_t a, float16_t b)  \
     {                                           \
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 38/72] softfloat: Use mulu64 for mul64To128
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (36 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 37/72] softfloat: Move muladd_floats " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 39/72] softfloat: Use add192 in mul128To256 Richard Henderson
                   ` (38 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Via host-utils.h, we use a host widening multiply for
64-bit hosts, and a common subroutine for 32-bit hosts.
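
For illustration, a minimal sketch of the 64-bit-host case, assuming
a compiler that provides unsigned __int128 (mulu64_sketch is an
illustrative name; the real mulu64 lives in qemu/host-utils.h and
may be implemented differently):

    #include <stdint.h>

    static inline void mulu64_sketch(uint64_t *plow, uint64_t *phigh,
                                     uint64_t a, uint64_t b)
    {
        /* One widening multiply; no 32-bit partial products needed. */
        unsigned __int128 r = (unsigned __int128)a * b;
        *plow  = (uint64_t)r;
        *phigh = (uint64_t)(r >> 64);
    }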

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-macros.h | 23 +++--------------------
 1 file changed, 3 insertions(+), 20 deletions(-)

diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index 2e3760a9c1..f6dfbe108d 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -470,27 +470,10 @@ static inline void sub192(uint64_t a0, uint64_t a1, uint64_t a2,
 | `z0Ptr' and `z1Ptr'.
 *----------------------------------------------------------------------------*/
 
-static inline void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr )
+static inline void
+mul64To128(uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr)
 {
-    uint32_t aHigh, aLow, bHigh, bLow;
-    uint64_t z0, zMiddleA, zMiddleB, z1;
-
-    aLow = a;
-    aHigh = a>>32;
-    bLow = b;
-    bHigh = b>>32;
-    z1 = ( (uint64_t) aLow ) * bLow;
-    zMiddleA = ( (uint64_t) aLow ) * bHigh;
-    zMiddleB = ( (uint64_t) aHigh ) * bLow;
-    z0 = ( (uint64_t) aHigh ) * bHigh;
-    zMiddleA += zMiddleB;
-    z0 += ( ( (uint64_t) ( zMiddleA < zMiddleB ) )<<32 ) + ( zMiddleA>>32 );
-    zMiddleA <<= 32;
-    z1 += zMiddleA;
-    z0 += ( z1 < zMiddleA );
-    *z1Ptr = z1;
-    *z0Ptr = z0;
-
+    mulu64(z1Ptr, z0Ptr, a, b);
 }
 
 /*----------------------------------------------------------------------------
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 39/72] softfloat: Use add192 in mul128To256
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (37 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 38/72] softfloat: Use mulu64 for mul64To128 Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-13 10:49   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 40/72] softfloat: Tidy mul128By64To192 Richard Henderson
                   ` (37 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

We can perform the operation in 6 total adds instead of 8.
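
As a back-of-the-envelope check (the accounting below is a sketch,
not part of the patch): writing a = a0*2^64 + a1 and b = b0*2^64 + b1,

    a * b = a0*b0*2^128 + (a0*b1 + a1*b0)*2^64 + a1*b1

The old code folded each middle partial product in separately with
add128 (four add128 calls, 8 word additions); the new code sums the
two middle products with one add192 and then folds that 192-bit sum
into the outer products with a second add192 (two calls, 6 additions).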

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-macros.h | 37 +++++++++++-----------------------
 1 file changed, 12 insertions(+), 25 deletions(-)

diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index f6dfbe108d..76327d844d 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -511,34 +511,21 @@ static inline void
 | the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'.
 *----------------------------------------------------------------------------*/
 
-static inline void
- mul128To256(
-     uint64_t a0,
-     uint64_t a1,
-     uint64_t b0,
-     uint64_t b1,
-     uint64_t *z0Ptr,
-     uint64_t *z1Ptr,
-     uint64_t *z2Ptr,
-     uint64_t *z3Ptr
- )
+static inline void mul128To256(uint64_t a0, uint64_t a1,
+                               uint64_t b0, uint64_t b1,
+                               uint64_t *z0Ptr, uint64_t *z1Ptr,
+                               uint64_t *z2Ptr, uint64_t *z3Ptr)
 {
-    uint64_t z0, z1, z2, z3;
-    uint64_t more1, more2;
+    uint64_t z0, z1, z2;
+    uint64_t m0, m1, m2, n1, n2;
 
-    mul64To128( a1, b1, &z2, &z3 );
-    mul64To128( a1, b0, &z1, &more2 );
-    add128( z1, more2, 0, z2, &z1, &z2 );
-    mul64To128( a0, b0, &z0, &more1 );
-    add128( z0, more1, 0, z1, &z0, &z1 );
-    mul64To128( a0, b1, &more1, &more2 );
-    add128( more1, more2, 0, z2, &more1, &z2 );
-    add128( z0, z1, 0, more1, &z0, &z1 );
-    *z3Ptr = z3;
-    *z2Ptr = z2;
-    *z1Ptr = z1;
-    *z0Ptr = z0;
+    mul64To128(a1, b0, &m1, &m2);
+    mul64To128(a0, b1, &n1, &n2);
+    mul64To128(a1, b1, &z2, z3Ptr);
+    mul64To128(a0, b0, &z0, &z1);
 
+    add192( 0, m1, m2,  0, n1, n2, &m0, &m1, &m2);
+    add192(m0, m1, m2, z0, z1, z2, z0Ptr, z1Ptr, z2Ptr);
 }
 
 /*----------------------------------------------------------------------------
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 40/72] softfloat: Tidy mul128By64To192
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (38 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 39/72] softfloat: Use add192 in mul128To256 Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-13 10:50   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 41/72] softfloat: Introduce sh[lr]_double primitives Richard Henderson
                   ` (36 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Clean up the formatting and variables; no functional change.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-macros.h | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index 76327d844d..672c1db555 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -484,24 +484,14 @@ mul64To128(uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr)
 *----------------------------------------------------------------------------*/
 
 static inline void
- mul128By64To192(
-     uint64_t a0,
-     uint64_t a1,
-     uint64_t b,
-     uint64_t *z0Ptr,
-     uint64_t *z1Ptr,
-     uint64_t *z2Ptr
- )
+mul128By64To192(uint64_t a0, uint64_t a1, uint64_t b,
+                uint64_t *z0Ptr, uint64_t *z1Ptr, uint64_t *z2Ptr)
 {
-    uint64_t z0, z1, z2, more1;
-
-    mul64To128( a1, b, &z1, &z2 );
-    mul64To128( a0, b, &z0, &more1 );
-    add128( z0, more1, 0, z1, &z0, &z1 );
-    *z2Ptr = z2;
-    *z1Ptr = z1;
-    *z0Ptr = z0;
+    uint64_t z0, z1, m1;
 
+    mul64To128(a1, b, &m1, z2Ptr);
+    mul64To128(a0, b, &z0, &z1);
+    add128(z0, z1, 0, m1, z0Ptr, z1Ptr);
 }
 
 /*----------------------------------------------------------------------------
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 41/72] softfloat: Introduce sh[lr]_double primitives
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (39 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 40/72] softfloat: Tidy mul128By64To192 Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-13 10:59   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 42/72] softfloat: Move div_floats to softfloat-parts.c.inc Richard Henderson
                   ` (35 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Provide x86_64 assembly for them, with a generic C fallback.
This avoids shuffling values through %cl in the x86_64 case.
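
As a usage sketch (hi and lo are illustrative names, not from the
patch), a 128-bit value hi:lo shifts left by c, for 0 <= c < 64, as:

    hi = shl_double(hi, lo, c);   /* bits from lo shift into hi */
    lo <<= c;                     /* shl_double handles c == 0 itself */

which is the pattern frac128_normalize adopts below.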

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-macros.h |  36 ++++++++++++
 fpu/softfloat.c                | 102 +++++++++++++++++++++++++--------
 2 files changed, 115 insertions(+), 23 deletions(-)

diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index 672c1db555..ec4e27a595 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -85,6 +85,42 @@ this code that are retained.
 #include "fpu/softfloat-types.h"
 #include "qemu/host-utils.h"
 
+/**
+ * shl_double: double-word merging left shift
+ * @l: left or most-significant word
+ * @r: right or least-significant word
+ * @c: shift count
+ *
+ * Shift @l left by @c bits, shifting in bits from @r.
+ */
+static inline uint64_t shl_double(uint64_t l, uint64_t r, int c)
+{
+#if defined(__x86_64__)
+    asm("shld %b2, %1, %0" : "+r"(l) : "r"(r), "ci"(c));
+    return l;
+#else
+    return c ? (l << c) | (r >> (64 - c)) : l;
+#endif
+}
+
+/**
+ * shr_double: double-word merging right shift
+ * @l: left or most-significant word
+ * @r: right or least-significant word
+ * @c: shift count
+ *
+ * Shift @r right by @c bits, shifting in bits from @l.
+ */
+static inline uint64_t shr_double(uint64_t l, uint64_t r, int c)
+{
+#if defined(__x86_64__)
+    asm("shrd %b2, %1, %0" : "+r"(r) : "r"(l), "ci"(c));
+    return r;
+#else
+    return c ? (r >> c) | (l << (64 - c)) : r;
+#endif
+}
+
 /*----------------------------------------------------------------------------
 | Shifts `a' right by the number of bits given in `count'.  If any nonzero
 | bits are shifted off, they are ``jammed'' into the least significant bit of
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index a9ee8498ae..a42c297828 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -956,15 +956,12 @@ static int frac128_normalize(FloatParts128 *a)
 {
     if (a->frac_hi) {
         int shl = clz64(a->frac_hi);
-        if (shl) {
-            int shr = 64 - shl;
-            a->frac_hi = (a->frac_hi << shl) | (a->frac_lo >> shr);
-            a->frac_lo = (a->frac_lo << shl);
-        }
+        a->frac_hi = shl_double(a->frac_hi, a->frac_lo, shl);
+        a->frac_lo <<= shl;
         return shl;
     } else if (a->frac_lo) {
         int shl = clz64(a->frac_lo);
-        a->frac_hi = (a->frac_lo << shl);
+        a->frac_hi = a->frac_lo << shl;
         a->frac_lo = 0;
         return shl + 64;
     }
@@ -975,7 +972,7 @@ static int frac256_normalize(FloatParts256 *a)
 {
     uint64_t a0 = a->frac_hi, a1 = a->frac_hm;
     uint64_t a2 = a->frac_lm, a3 = a->frac_lo;
-    int ret, shl, shr;
+    int ret, shl;
 
     if (likely(a0)) {
         shl = clz64(a0);
@@ -1005,11 +1002,10 @@ static int frac256_normalize(FloatParts256 *a)
         ret += shl;
     }
 
-    shr = -shl & 63;
-    a0 = (a0 << shl) | (a1 >> shr);
-    a1 = (a1 << shl) | (a2 >> shr);
-    a2 = (a2 << shl) | (a3 >> shr);
-    a3 = (a3 << shl);
+    a0 = shl_double(a0, a1, shl);
+    a1 = shl_double(a1, a2, shl);
+    a2 = shl_double(a2, a3, shl);
+    a3 <<= shl;
 
  done:
     a->frac_hi = a0;
@@ -1028,7 +1024,20 @@ static void frac64_shl(FloatParts64 *a, int c)
 
 static void frac128_shl(FloatParts128 *a, int c)
 {
-    shift128Left(a->frac_hi, a->frac_lo, c, &a->frac_hi, &a->frac_lo);
+    uint64_t a0 = a->frac_hi, a1 = a->frac_lo;
+
+    if (c & 64) {
+        a0 = a1, a1 = 0;
+    }
+
+    c &= 63;
+    if (c) {
+        a0 = shl_double(a0, a1, c);
+        a1 = a1 << c;
+    }
+
+    a->frac_hi = a0;
+    a->frac_lo = a1;
 }
 
 #define frac_shl(A, C)  FRAC_GENERIC_64_128(shl, A)(A, C)
@@ -1040,19 +1049,68 @@ static void frac64_shr(FloatParts64 *a, int c)
 
 static void frac128_shr(FloatParts128 *a, int c)
 {
-    shift128Right(a->frac_hi, a->frac_lo, c, &a->frac_hi, &a->frac_lo);
+    uint64_t a0 = a->frac_hi, a1 = a->frac_lo;
+
+    if (c & 64) {
+        a1 = a0, a0 = 0;
+    }
+
+    c &= 63;
+    if (c) {
+        a1 = shr_double(a0, a1, c);
+        a0 = a0 >> c;
+    }
+
+    a->frac_hi = a0;
+    a->frac_lo = a1;
 }
 
 #define frac_shr(A, C)  FRAC_GENERIC_64_128(shr, A)(A, C)
 
 static void frac64_shrjam(FloatParts64 *a, int c)
 {
-    shift64RightJamming(a->frac, c, &a->frac);
+    uint64_t a0 = a->frac;
+
+    if (likely(c != 0)) {
+        if (likely(c < 64)) {
+            a0 = (a0 >> c) | (shr_double(a0, 0, c) != 0);
+        } else {
+            a0 = a0 != 0;
+        }
+        a->frac = a0;
+    }
 }
 
 static void frac128_shrjam(FloatParts128 *a, int c)
 {
-    shift128RightJamming(a->frac_hi, a->frac_lo, c, &a->frac_hi, &a->frac_lo);
+    uint64_t a0 = a->frac_hi, a1 = a->frac_lo;
+    uint64_t sticky = 0;
+
+    if (unlikely(c == 0)) {
+        return;
+    } else if (likely(c < 64)) {
+        /* nothing */
+    } else if (likely(c < 128)) {
+        sticky = a1;
+        a1 = a0;
+        a0 = 0;
+        c &= 63;
+        if (c == 0) {
+            goto done;
+        }
+    } else {
+        sticky = a0 | a1;
+        a0 = a1 = 0;
+        goto done;
+    }
+
+    sticky |= shr_double(a1, 0, c);
+    a1 = shr_double(a0, a1, c);
+    a0 = a0 >> c;
+
+ done:
+    a->frac_lo = a1 | (sticky != 0);
+    a->frac_hi = a0;
 }
 
 static void frac256_shrjam(FloatParts256 *a, int c)
@@ -1060,7 +1118,6 @@ static void frac256_shrjam(FloatParts256 *a, int c)
     uint64_t a0 = a->frac_hi, a1 = a->frac_hm;
     uint64_t a2 = a->frac_lm, a3 = a->frac_lo;
     uint64_t sticky = 0;
-    int invc;
 
     if (unlikely(c == 0)) {
         return;
@@ -1085,12 +1142,11 @@ static void frac256_shrjam(FloatParts256 *a, int c)
         goto done;
     }
 
-    invc = -c & 63;
-    sticky |= a3 << invc;
-    a3 = (a3 >> c) | (a2 << invc);
-    a2 = (a2 >> c) | (a1 << invc);
-    a1 = (a1 >> c) | (a0 << invc);
-    a0 = (a0 >> c);
+    sticky |= shr_double(a3, 0, c);
+    a3 = shr_double(a2, a3, c);
+    a2 = shr_double(a1, a2, c);
+    a1 = shr_double(a0, a1, c);
+    a0 = a0 >> c;
 
  done:
     a->frac_lo = a3 | (sticky != 0);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 42/72] softfloat: Move div_floats to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (40 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 41/72] softfloat: Introduce sh[lr]_double primitives Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-13 11:02   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 43/72] softfloat: Split float_to_float Richard Henderson
                   ` (34 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Rename to parts$N_div.
Implement float128_div with FloatParts128.
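
A hedged usage sketch (the operand bit patterns are illustrative;
make_float128 and float_status come from the softfloat headers, and
a zero-initialized status gives round-to-nearest-even):

    float_status st = { 0 };
    float128 one  = make_float128(0x3fff000000000000ULL, 0);
    float128 two  = make_float128(0x4000000000000000ULL, 0);
    float128 half = float128_div(one, two, &st);   /* exact 0.5 */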

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 290 +++++++++++++++-----------------------
 fpu/softfloat-parts.c.inc |  55 ++++++++
 2 files changed, 171 insertions(+), 174 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index a42c297828..8efa52f7ec 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -802,6 +802,14 @@ static FloatParts128 *parts128_muladd(FloatParts128 *a, FloatParts128 *b,
 #define parts_muladd(A, B, C, Z, S) \
     PARTS_GENERIC_64_128(muladd, A)(A, B, C, Z, S)
 
+static FloatParts64 *parts64_div(FloatParts64 *a, FloatParts64 *b,
+                                 float_status *s);
+static FloatParts128 *parts128_div(FloatParts128 *a, FloatParts128 *b,
+                                   float_status *s);
+
+#define parts_div(A, B, S) \
+    PARTS_GENERIC_64_128(div, A)(A, B, S)
+
 /*
  * Helper functions for softfloat-parts.c.inc, per-size operations.
  */
@@ -894,6 +902,87 @@ static void frac128_clear(FloatParts128 *a)
 
 #define frac_clear(A)  FRAC_GENERIC_64_128(clear, A)(A)
 
+static bool frac64_div(FloatParts64 *a, FloatParts64 *b)
+{
+    uint64_t n1, n0, r, q;
+    bool ret;
+
+    /*
+     * We want a 2*N / N-bit division to produce exactly an N-bit
+     * result, so that we do not lose any precision and so that we
+     * do not have to renormalize afterward.  If A.frac < B.frac,
+     * then division would produce an (N-1)-bit result; shift A left
+     * by one to produce an N-bit result, and return true to
+     * decrement the exponent to match.
+     *
+     * The udiv_qrnnd algorithm that we're using requires normalization,
+     * i.e. the msb of the denominator must be set, which is already true.
+     */
+    ret = a->frac < b->frac;
+    if (ret) {
+        n0 = a->frac;
+        n1 = 0;
+    } else {
+        n0 = a->frac >> 1;
+        n1 = a->frac << 63;
+    }
+    q = udiv_qrnnd(&r, n0, n1, b->frac);
+
+    /* Set lsb if there is a remainder, to set inexact. */
+    a->frac = q | (r != 0);
+
+    return ret;
+}
+
+static bool frac128_div(FloatParts128 *a, FloatParts128 *b)
+{
+    uint64_t q0, q1, a0, a1, b0, b1;
+    uint64_t r0, r1, r2, r3, t0, t1, t2, t3;
+    bool ret = false;
+
+    a0 = a->frac_hi, a1 = a->frac_lo;
+    b0 = b->frac_hi, b1 = b->frac_lo;
+
+    ret = lt128(a0, a1, b0, b1);
+    if (!ret) {
+        a1 = shr_double(a0, a1, 1);
+        a0 = a0 >> 1;
+    }
+
+    /* Use 128/64 -> 64 division as estimate for 192/128 -> 128 division. */
+    q0 = estimateDiv128To64(a0, a1, b0);
+
+    /*
+     * Estimate is high because B1 was not included (unless B1 == 0).
+     * Reduce quotient and increase remainder until remainder is non-negative.
+     * This loop will execute 0 to 2 times.
+     */
+    mul128By64To192(b0, b1, q0, &t0, &t1, &t2);
+    sub192(a0, a1, 0, t0, t1, t2, &r0, &r1, &r2);
+    while (r0 != 0) {
+        q0--;
+        add192(r0, r1, r2, 0, b0, b1, &r0, &r1, &r2);
+    }
+
+    /* Repeat using the remainder, producing a second word of quotient. */
+    q1 = estimateDiv128To64(r1, r2, b0);
+    mul128By64To192(b0, b1, q1, &t1, &t2, &t3);
+    sub192(r1, r2, 0, t1, t2, t3, &r1, &r2, &r3);
+    while (r1 != 0) {
+        q1--;
+        add192(r1, r2, r3, 0, b0, b1, &r1, &r2, &r3);
+    }
+
+    /* Any remainder indicates inexact; set sticky bit. */
+    q1 |= (r2 | r3) != 0;
+
+    a->frac_hi = q0;
+    a->frac_lo = q1;
+    return ret;
+}
+
+#define frac_div(A, B)  FRAC_GENERIC_64_128(div, A)(A, B)
+
 static bool frac64_eqz(FloatParts64 *a)
 {
     return a->frac == 0;
@@ -1820,110 +1909,42 @@ float128 QEMU_FLATTEN float128_muladd(float128 a, float128 b, float128 c,
 }
 
 /*
- * Returns the result of dividing the floating-point value `a' by the
- * corresponding value `b'. The operation is performed according to
- * the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+ * Division
  */
 
-static FloatParts64 div_floats(FloatParts64 a, FloatParts64 b, float_status *s)
-{
-    bool sign = a.sign ^ b.sign;
-
-    if (a.cls == float_class_normal && b.cls == float_class_normal) {
-        uint64_t n0, n1, q, r;
-        int exp = a.exp - b.exp;
-
-        /*
-         * We want a 2*N / N-bit division to produce exactly an N-bit
-         * result, so that we do not lose any precision and so that we
-         * do not have to renormalize afterward.  If A.frac < B.frac,
-         * then division would produce an (N-1)-bit result; shift A left
-         * by one to produce the an N-bit result, and decrement the
-         * exponent to match.
-         *
-         * The udiv_qrnnd algorithm that we're using requires normalization,
-         * i.e. the msb of the denominator must be set, which is already true.
-         */
-        if (a.frac < b.frac) {
-            exp -= 1;
-            shift128Left(0, a.frac, DECOMPOSED_BINARY_POINT + 1, &n1, &n0);
-        } else {
-            shift128Left(0, a.frac, DECOMPOSED_BINARY_POINT, &n1, &n0);
-        }
-        q = udiv_qrnnd(&r, n1, n0, b.frac);
-
-        /* Set lsb if there is a remainder, to set inexact. */
-        a.frac = q | (r != 0);
-        a.sign = sign;
-        a.exp = exp;
-        return a;
-    }
-    /* handle all the NaN cases */
-    if (is_nan(a.cls) || is_nan(b.cls)) {
-        return *parts_pick_nan(&a, &b, s);
-    }
-    /* 0/0 or Inf/Inf */
-    if (a.cls == b.cls
-        &&
-        (a.cls == float_class_inf || a.cls == float_class_zero)) {
-        float_raise(float_flag_invalid, s);
-        parts_default_nan(&a, s);
-        return a;
-    }
-    /* Inf / x or 0 / x */
-    if (a.cls == float_class_inf || a.cls == float_class_zero) {
-        a.sign = sign;
-        return a;
-    }
-    /* Div 0 => Inf */
-    if (b.cls == float_class_zero) {
-        float_raise(float_flag_divbyzero, s);
-        a.cls = float_class_inf;
-        a.sign = sign;
-        return a;
-    }
-    /* Div by Inf */
-    if (b.cls == float_class_inf) {
-        a.cls = float_class_zero;
-        a.sign = sign;
-        return a;
-    }
-    g_assert_not_reached();
-}
-
 float16 float16_div(float16 a, float16 b, float_status *status)
 {
-    FloatParts64 pa, pb, pr;
+    FloatParts64 pa, pb, *pr;
 
     float16_unpack_canonical(&pa, a, status);
     float16_unpack_canonical(&pb, b, status);
-    pr = div_floats(pa, pb, status);
+    pr = parts_div(&pa, &pb, status);
 
-    return float16_round_pack_canonical(&pr, status);
+    return float16_round_pack_canonical(pr, status);
 }
 
 static float32 QEMU_SOFTFLOAT_ATTR
 soft_f32_div(float32 a, float32 b, float_status *status)
 {
-    FloatParts64 pa, pb, pr;
+    FloatParts64 pa, pb, *pr;
 
     float32_unpack_canonical(&pa, a, status);
     float32_unpack_canonical(&pb, b, status);
-    pr = div_floats(pa, pb, status);
+    pr = parts_div(&pa, &pb, status);
 
-    return float32_round_pack_canonical(&pr, status);
+    return float32_round_pack_canonical(pr, status);
 }
 
 static float64 QEMU_SOFTFLOAT_ATTR
 soft_f64_div(float64 a, float64 b, float_status *status)
 {
-    FloatParts64 pa, pb, pr;
+    FloatParts64 pa, pb, *pr;
 
     float64_unpack_canonical(&pa, a, status);
     float64_unpack_canonical(&pb, b, status);
-    pr = div_floats(pa, pb, status);
+    pr = parts_div(&pa, &pb, status);
 
-    return float64_round_pack_canonical(&pr, status);
+    return float64_round_pack_canonical(pr, status);
 }
 
 static float hard_f32_div(float a, float b)
@@ -1984,20 +2005,28 @@ float64_div(float64 a, float64 b, float_status *s)
                         f64_div_pre, f64_div_post);
 }
 
-/*
- * Returns the result of dividing the bfloat16
- * value `a' by the corresponding value `b'.
- */
-
-bfloat16 bfloat16_div(bfloat16 a, bfloat16 b, float_status *status)
+bfloat16 QEMU_FLATTEN
+bfloat16_div(bfloat16 a, bfloat16 b, float_status *status)
 {
-    FloatParts64 pa, pb, pr;
+    FloatParts64 pa, pb, *pr;
 
     bfloat16_unpack_canonical(&pa, a, status);
     bfloat16_unpack_canonical(&pb, b, status);
-    pr = div_floats(pa, pb, status);
+    pr = parts_div(&pa, &pb, status);
 
-    return bfloat16_round_pack_canonical(&pr, status);
+    return bfloat16_round_pack_canonical(pr, status);
+}
+
+float128 QEMU_FLATTEN
+float128_div(float128 a, float128 b, float_status *status)
+{
+    FloatParts128 pa, pb, *pr;
+
+    float128_unpack_canonical(&pa, a, status);
+    float128_unpack_canonical(&pb, b, status);
+    pr = parts_div(&pa, &pb, status);
+
+    return float128_round_pack_canonical(pr, status);
 }
 
 /*
@@ -7122,93 +7151,6 @@ float128 float128_round_to_int(float128 a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of dividing the quadruple-precision floating-point value
-| `a' by the corresponding value `b'.  The operation is performed according to
-| the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float128 float128_div(float128 a, float128 b, float_status *status)
-{
-    bool aSign, bSign, zSign;
-    int32_t aExp, bExp, zExp;
-    uint64_t aSig0, aSig1, bSig0, bSig1, zSig0, zSig1, zSig2;
-    uint64_t rem0, rem1, rem2, rem3, term0, term1, term2, term3;
-
-    aSig1 = extractFloat128Frac1( a );
-    aSig0 = extractFloat128Frac0( a );
-    aExp = extractFloat128Exp( a );
-    aSign = extractFloat128Sign( a );
-    bSig1 = extractFloat128Frac1( b );
-    bSig0 = extractFloat128Frac0( b );
-    bExp = extractFloat128Exp( b );
-    bSign = extractFloat128Sign( b );
-    zSign = aSign ^ bSign;
-    if ( aExp == 0x7FFF ) {
-        if (aSig0 | aSig1) {
-            return propagateFloat128NaN(a, b, status);
-        }
-        if ( bExp == 0x7FFF ) {
-            if (bSig0 | bSig1) {
-                return propagateFloat128NaN(a, b, status);
-            }
-            goto invalid;
-        }
-        return packFloat128( zSign, 0x7FFF, 0, 0 );
-    }
-    if ( bExp == 0x7FFF ) {
-        if (bSig0 | bSig1) {
-            return propagateFloat128NaN(a, b, status);
-        }
-        return packFloat128( zSign, 0, 0, 0 );
-    }
-    if ( bExp == 0 ) {
-        if ( ( bSig0 | bSig1 ) == 0 ) {
-            if ( ( aExp | aSig0 | aSig1 ) == 0 ) {
- invalid:
-                float_raise(float_flag_invalid, status);
-                return float128_default_nan(status);
-            }
-            float_raise(float_flag_divbyzero, status);
-            return packFloat128( zSign, 0x7FFF, 0, 0 );
-        }
-        normalizeFloat128Subnormal( bSig0, bSig1, &bExp, &bSig0, &bSig1 );
-    }
-    if ( aExp == 0 ) {
-        if ( ( aSig0 | aSig1 ) == 0 ) return packFloat128( zSign, 0, 0, 0 );
-        normalizeFloat128Subnormal( aSig0, aSig1, &aExp, &aSig0, &aSig1 );
-    }
-    zExp = aExp - bExp + 0x3FFD;
-    shortShift128Left(
-        aSig0 | UINT64_C(0x0001000000000000), aSig1, 15, &aSig0, &aSig1 );
-    shortShift128Left(
-        bSig0 | UINT64_C(0x0001000000000000), bSig1, 15, &bSig0, &bSig1 );
-    if ( le128( bSig0, bSig1, aSig0, aSig1 ) ) {
-        shift128Right( aSig0, aSig1, 1, &aSig0, &aSig1 );
-        ++zExp;
-    }
-    zSig0 = estimateDiv128To64( aSig0, aSig1, bSig0 );
-    mul128By64To192( bSig0, bSig1, zSig0, &term0, &term1, &term2 );
-    sub192( aSig0, aSig1, 0, term0, term1, term2, &rem0, &rem1, &rem2 );
-    while ( (int64_t) rem0 < 0 ) {
-        --zSig0;
-        add192( rem0, rem1, rem2, 0, bSig0, bSig1, &rem0, &rem1, &rem2 );
-    }
-    zSig1 = estimateDiv128To64( rem1, rem2, bSig0 );
-    if ( ( zSig1 & 0x3FFF ) <= 4 ) {
-        mul128By64To192( bSig0, bSig1, zSig1, &term1, &term2, &term3 );
-        sub192( rem1, rem2, 0, term1, term2, term3, &rem1, &rem2, &rem3 );
-        while ( (int64_t) rem1 < 0 ) {
-            --zSig1;
-            add192( rem1, rem2, rem3, 0, bSig0, bSig1, &rem1, &rem2, &rem3 );
-        }
-        zSig1 |= ( ( rem1 | rem2 | rem3 ) != 0 );
-    }
-    shift128ExtraRightJamming( zSig0, zSig1, 0, 15, &zSig0, &zSig1, &zSig2 );
-    return roundAndPackFloat128(zSign, zExp, zSig0, zSig1, zSig2, status);
-
-}
-
 /*----------------------------------------------------------------------------
 | Returns the remainder of the quadruple-precision floating-point value `a'
 | with respect to the corresponding value `b'.  The operation is performed
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index a203811299..f8165d92f9 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -539,3 +539,58 @@ static FloatPartsN *partsN(muladd)(FloatPartsN *a, FloatPartsN *b,
     parts_default_nan(a, s);
     return a;
 }
+
+/*
+ * Returns the result of dividing the floating-point value `a' by the
+ * corresponding value `b'. The operation is performed according to
+ * the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+ */
+static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
+                                float_status *s)
+{
+    int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
+    bool sign = a->sign ^ b->sign;
+
+    if (likely(ab_mask == float_cmask_normal)) {
+        a->sign = sign;
+        a->exp -= b->exp + frac_div(a, b);
+        return a;
+    }
+
+    /* 0/0 or Inf/Inf => NaN */
+    if (unlikely(ab_mask == float_cmask_zero) ||
+        unlikely(ab_mask == float_cmask_inf)) {
+        float_raise(float_flag_invalid, s);
+        parts_default_nan(a, s);
+        return a;
+    }
+
+    /* All the NaN cases */
+    if (unlikely(ab_mask & float_cmask_anynan)) {
+        return parts_pick_nan(a, b, s);
+    }
+
+    a->sign = sign;
+
+    /* Inf / X */
+    if (a->cls == float_class_inf) {
+        return a;
+    }
+
+    /* 0 / X */
+    if (a->cls == float_class_zero) {
+        return a;
+    }
+
+    /* X / Inf */
+    if (b->cls == float_class_inf) {
+        a->cls = float_class_zero;
+        return a;
+    }
+
+    /* X / 0 => Inf */
+    g_assert(b->cls == float_class_zero);
+    float_raise(float_flag_divbyzero, s);
+    a->cls = float_class_inf;
+    return a;
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 43/72] softfloat: Split float_to_float
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (41 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 42/72] softfloat: Move div_floats to softfloat-parts.c.inc Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-13 11:04   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 44/72] softfloat: Convert float-to-float conversions with float128 Richard Henderson
                   ` (33 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Split out parts_float_to_ahp and parts_float_to_float.
Convert to pointers.
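
As a hedged illustration of the AHP special cases the new
parts_float_to_ahp keeps (make_float32 comes from the softfloat
headers; the expected result restates the comments in the code):

    float_status st = { 0 };
    /* AHP has no Inf encoding: +Inf converts to the maximum
       normal, 0x7fff, and raises the invalid flag. */
    float16 r = float32_to_float16(make_float32(0x7f800000), false, &st);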

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 174 ++++++++++++++++++++++++++++--------------------
 1 file changed, 101 insertions(+), 73 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 8efa52f7ec..06fac8f41c 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2036,83 +2036,105 @@ float128_div(float128 a, float128 b, float_status *status)
  * conversion is performed according to the IEC/IEEE Standard for
  * Binary Floating-Point Arithmetic.
  *
- * The float_to_float helper only needs to take care of raising
- * invalid exceptions and handling the conversion on NaNs.
+ * Usually this only needs to take care of raising invalid exceptions
+ * and handling the conversion on NaNs.
  */
 
-static FloatParts64 float_to_float(FloatParts64 a, const FloatFmt *dstf,
-                                 float_status *s)
+static void parts_float_to_ahp(FloatParts64 *a, float_status *s)
 {
-    if (dstf->arm_althp) {
-        switch (a.cls) {
-        case float_class_qnan:
-        case float_class_snan:
-            /* There is no NaN in the destination format.  Raise Invalid
-             * and return a zero with the sign of the input NaN.
-             */
-            float_raise(float_flag_invalid, s);
-            a.cls = float_class_zero;
-            a.frac = 0;
-            a.exp = 0;
-            break;
+    switch (a->cls) {
+    case float_class_qnan:
+    case float_class_snan:
+        /*
+         * There is no NaN in the destination format.  Raise Invalid
+         * and return a zero with the sign of the input NaN.
+         */
+        float_raise(float_flag_invalid, s);
+        a->cls = float_class_zero;
+        break;
 
-        case float_class_inf:
-            /* There is no Inf in the destination format.  Raise Invalid
-             * and return the maximum normal with the correct sign.
-             */
-            float_raise(float_flag_invalid, s);
-            a.cls = float_class_normal;
-            a.exp = dstf->exp_max;
-            a.frac = ((1ull << dstf->frac_size) - 1) << dstf->frac_shift;
-            break;
+    case float_class_inf:
+        /*
+         * There is no Inf in the destination format.  Raise Invalid
+         * and return the maximum normal with the correct sign.
+         */
+        float_raise(float_flag_invalid, s);
+        a->cls = float_class_normal;
+        a->exp = float16_params_ahp.exp_max;
+        a->frac = MAKE_64BIT_MASK(float16_params_ahp.frac_shift,
+                                  float16_params_ahp.frac_size + 1);
+        break;
 
-        default:
-            break;
-        }
-    } else if (is_nan(a.cls)) {
-        parts_return_nan(&a, s);
+    case float_class_normal:
+    case float_class_zero:
+        break;
+
+    default:
+        g_assert_not_reached();
     }
-    return a;
 }
 
+static void parts64_float_to_float(FloatParts64 *a, float_status *s)
+{
+    if (is_nan(a->cls)) {
+        parts_return_nan(a, s);
+    }
+}
+
+static void parts128_float_to_float(FloatParts128 *a, float_status *s)
+{
+    if (is_nan(a->cls)) {
+        parts_return_nan(a, s);
+    }
+}
+
+#define parts_float_to_float(P, S) \
+    PARTS_GENERIC_64_128(float_to_float, P)(P, S)
+
 float32 float16_to_float32(float16 a, bool ieee, float_status *s)
 {
     const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    float16a_unpack_canonical(&pa, a, s, fmt16);
-    pr = float_to_float(pa, &float32_params, s);
-    return float32_round_pack_canonical(&pr, s);
+    float16a_unpack_canonical(&p, a, s, fmt16);
+    parts_float_to_float(&p, s);
+    return float32_round_pack_canonical(&p, s);
 }
 
 float64 float16_to_float64(float16 a, bool ieee, float_status *s)
 {
     const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    float16a_unpack_canonical(&pa, a, s, fmt16);
-    pr = float_to_float(pa, &float64_params, s);
-    return float64_round_pack_canonical(&pr, s);
+    float16a_unpack_canonical(&p, a, s, fmt16);
+    parts_float_to_float(&p, s);
+    return float64_round_pack_canonical(&p, s);
 }
 
 float16 float32_to_float16(float32 a, bool ieee, float_status *s)
 {
-    const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
-    FloatParts64 pa, pr;
+    FloatParts64 p;
+    const FloatFmt *fmt;
 
-    float32_unpack_canonical(&pa, a, s);
-    pr = float_to_float(pa, fmt16, s);
-    return float16a_round_pack_canonical(&pr, s, fmt16);
+    float32_unpack_canonical(&p, a, s);
+    if (ieee) {
+        parts_float_to_float(&p, s);
+        fmt = &float16_params;
+    } else {
+        parts_float_to_ahp(&p, s);
+        fmt = &float16_params_ahp;
+    }
+    return float16a_round_pack_canonical(&p, s, fmt);
 }
 
 static float64 QEMU_SOFTFLOAT_ATTR
 soft_float32_to_float64(float32 a, float_status *s)
 {
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    float32_unpack_canonical(&pa, a, s);
-    pr = float_to_float(pa, &float64_params, s);
-    return float64_round_pack_canonical(&pr, s);
+    float32_unpack_canonical(&p, a, s);
+    parts_float_to_float(&p, s);
+    return float64_round_pack_canonical(&p, s);
 }
 
 float64 float32_to_float64(float32 a, float_status *s)
@@ -2133,57 +2155,63 @@ float64 float32_to_float64(float32 a, float_status *s)
 
 float16 float64_to_float16(float64 a, bool ieee, float_status *s)
 {
-    const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
-    FloatParts64 pa, pr;
+    FloatParts64 p;
+    const FloatFmt *fmt;
 
-    float64_unpack_canonical(&pa, a, s);
-    pr = float_to_float(pa, fmt16, s);
-    return float16a_round_pack_canonical(&pr, s, fmt16);
+    float64_unpack_canonical(&p, a, s);
+    if (ieee) {
+        parts_float_to_float(&p, s);
+        fmt = &float16_params;
+    } else {
+        parts_float_to_ahp(&p, s);
+        fmt = &float16_params_ahp;
+    }
+    return float16a_round_pack_canonical(&p, s, fmt);
 }
 
 float32 float64_to_float32(float64 a, float_status *s)
 {
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    float64_unpack_canonical(&pa, a, s);
-    pr = float_to_float(pa, &float32_params, s);
-    return float32_round_pack_canonical(&pr, s);
+    float64_unpack_canonical(&p, a, s);
+    parts_float_to_float(&p, s);
+    return float32_round_pack_canonical(&p, s);
 }
 
 float32 bfloat16_to_float32(bfloat16 a, float_status *s)
 {
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    bfloat16_unpack_canonical(&pa, a, s);
-    pr = float_to_float(pa, &float32_params, s);
-    return float32_round_pack_canonical(&pr, s);
+    bfloat16_unpack_canonical(&p, a, s);
+    parts_float_to_float(&p, s);
+    return float32_round_pack_canonical(&p, s);
 }
 
 float64 bfloat16_to_float64(bfloat16 a, float_status *s)
 {
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    bfloat16_unpack_canonical(&pa, a, s);
-    pr = float_to_float(pa, &float64_params, s);
-    return float64_round_pack_canonical(&pr, s);
+    bfloat16_unpack_canonical(&p, a, s);
+    parts_float_to_float(&p, s);
+    return float64_round_pack_canonical(&p, s);
 }
 
 bfloat16 float32_to_bfloat16(float32 a, float_status *s)
 {
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    float32_unpack_canonical(&pa, a, s);
-    pr = float_to_float(pa, &bfloat16_params, s);
-    return bfloat16_round_pack_canonical(&pr, s);
+    float32_unpack_canonical(&p, a, s);
+    parts_float_to_float(&p, s);
+    return bfloat16_round_pack_canonical(&p, s);
 }
 
 bfloat16 float64_to_bfloat16(float64 a, float_status *s)
 {
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    float64_unpack_canonical(&pa, a, s);
-    pr = float_to_float(pa, &bfloat16_params, s);
-    return bfloat16_round_pack_canonical(&pr, s);
+    float64_unpack_canonical(&p, a, s);
+    parts_float_to_float(&p, s);
+    return bfloat16_round_pack_canonical(&p, s);
 }
 
 /*
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 44/72] softfloat: Convert float-to-float conversions with float128
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (42 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 43/72] softfloat: Split float_to_float Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-13 11:17   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 45/72] softfloat: Move round_to_int to softfloat-parts.c.inc Richard Henderson
                   ` (32 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Introduce parts_float_to_float_widen and parts_float_to_float_narrow.
Use them for float128_to_float{32,64} and float{32,64}_to_float128.
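
The essence of the narrowing step, as a sketch (frac_truncjam is
defined earlier in the series; this restates the shape of its
128-to-64 case rather than quoting it):

    /* Keep the high 64 fraction bits; OR any discarded low bits
       into the lsb so that rounding still sees the inexactness. */
    a->frac = b->frac_hi | (b->frac_lo != 0);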

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 203 ++++++++++++++++--------------------------------
 1 file changed, 69 insertions(+), 134 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 06fac8f41c..1b86111279 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2091,6 +2091,35 @@ static void parts128_float_to_float(FloatParts128 *a, float_status *s)
 #define parts_float_to_float(P, S) \
     PARTS_GENERIC_64_128(float_to_float, P)(P, S)
 
+static void parts_float_to_float_narrow(FloatParts64 *a, FloatParts128 *b,
+                                        float_status *s)
+{
+    a->cls = b->cls;
+    a->sign = b->sign;
+    a->exp = b->exp;
+
+    if (a->cls == float_class_normal) {
+        frac_truncjam(a, b);
+    } else if (is_nan(a->cls)) {
+        /* Discard the low bits of the NaN. */
+        a->frac = b->frac_hi;
+        parts_return_nan(a, s);
+    }
+}
+
+static void parts_float_to_float_widen(FloatParts128 *a, FloatParts64 *b,
+                                       float_status *s)
+{
+    a->cls = b->cls;
+    a->sign = b->sign;
+    a->exp = b->exp;
+    frac_widen(a, b);
+
+    if (is_nan(a->cls)) {
+        parts_return_nan(a, s);
+    }
+}
+
 float32 float16_to_float32(float16 a, bool ieee, float_status *s)
 {
     const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
@@ -2214,6 +2243,46 @@ bfloat16 float64_to_bfloat16(float64 a, float_status *s)
     return bfloat16_round_pack_canonical(&p, s);
 }
 
+float32 float128_to_float32(float128 a, float_status *s)
+{
+    FloatParts64 p64;
+    FloatParts128 p128;
+
+    float128_unpack_canonical(&p128, a, s);
+    parts_float_to_float_narrow(&p64, &p128, s);
+    return float32_round_pack_canonical(&p64, s);
+}
+
+float64 float128_to_float64(float128 a, float_status *s)
+{
+    FloatParts64 p64;
+    FloatParts128 p128;
+
+    float128_unpack_canonical(&p128, a, s);
+    parts_float_to_float_narrow(&p64, &p128, s);
+    return float64_round_pack_canonical(&p64, s);
+}
+
+float128 float32_to_float128(float32 a, float_status *s)
+{
+    FloatParts64 p64;
+    FloatParts128 p128;
+
+    float32_unpack_canonical(&p64, a, s);
+    parts_float_to_float_widen(&p128, &p64, s);
+    return float128_round_pack_canonical(&p128, s);
+}
+
+float128 float64_to_float128(float64 a, float_status *s)
+{
+    FloatParts64 p64;
+    FloatParts128 p128;
+
+    float64_unpack_canonical(&p64, a, s);
+    parts_float_to_float_widen(&p128, &p64, s);
+    return float128_round_pack_canonical(&p128, s);
+}
+
 /*
  * Rounds the floating-point value `a' to an integer, and returns the
  * result as a floating-point value. The operation is performed
@@ -5174,38 +5243,6 @@ floatx80 float32_to_floatx80(float32 a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the single-precision floating-point value
-| `a' to the double-precision floating-point format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float128 float32_to_float128(float32 a, float_status *status)
-{
-    bool aSign;
-    int aExp;
-    uint32_t aSig;
-
-    a = float32_squash_input_denormal(a, status);
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-    if ( aExp == 0xFF ) {
-        if (aSig) {
-            return commonNaNToFloat128(float32ToCommonNaN(a, status), status);
-        }
-        return packFloat128( aSign, 0x7FFF, 0, 0 );
-    }
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return packFloat128( aSign, 0, 0, 0 );
-        normalizeFloat32Subnormal( aSig, &aExp, &aSig );
-        --aExp;
-    }
-    return packFloat128( aSign, aExp + 0x3F80, ( (uint64_t) aSig )<<25, 0 );
-
-}
-
 /*----------------------------------------------------------------------------
 | Returns the remainder of the single-precision floating-point value `a'
 | with respect to the corresponding value `b'.  The operation is performed
@@ -5479,40 +5516,6 @@ floatx80 float64_to_floatx80(float64 a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the double-precision floating-point value
-| `a' to the quadruple-precision floating-point format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float128 float64_to_float128(float64 a, float_status *status)
-{
-    bool aSign;
-    int aExp;
-    uint64_t aSig, zSig0, zSig1;
-
-    a = float64_squash_input_denormal(a, status);
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-    if ( aExp == 0x7FF ) {
-        if (aSig) {
-            return commonNaNToFloat128(float64ToCommonNaN(a, status), status);
-        }
-        return packFloat128( aSign, 0x7FFF, 0, 0 );
-    }
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return packFloat128( aSign, 0, 0, 0 );
-        normalizeFloat64Subnormal( aSig, &aExp, &aSig );
-        --aExp;
-    }
-    shift128Right( aSig, 0, 4, &zSig0, &zSig1 );
-    return packFloat128( aSign, aExp + 0x3C00, zSig0, zSig1 );
-
-}
-
-
 /*----------------------------------------------------------------------------
 | Returns the remainder of the double-precision floating-point value `a'
 | with respect to the corresponding value `b'.  The operation is performed
@@ -6914,74 +6917,6 @@ uint32_t float128_to_uint32(float128 a, float_status *status)
     return res;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point
-| value `a' to the single-precision floating-point format.  The conversion
-| is performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 float128_to_float32(float128 a, float_status *status)
-{
-    bool aSign;
-    int32_t aExp;
-    uint64_t aSig0, aSig1;
-    uint32_t zSig;
-
-    aSig1 = extractFloat128Frac1( a );
-    aSig0 = extractFloat128Frac0( a );
-    aExp = extractFloat128Exp( a );
-    aSign = extractFloat128Sign( a );
-    if ( aExp == 0x7FFF ) {
-        if ( aSig0 | aSig1 ) {
-            return commonNaNToFloat32(float128ToCommonNaN(a, status), status);
-        }
-        return packFloat32( aSign, 0xFF, 0 );
-    }
-    aSig0 |= ( aSig1 != 0 );
-    shift64RightJamming( aSig0, 18, &aSig0 );
-    zSig = aSig0;
-    if ( aExp || zSig ) {
-        zSig |= 0x40000000;
-        aExp -= 0x3F81;
-    }
-    return roundAndPackFloat32(aSign, aExp, zSig, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point
-| value `a' to the double-precision floating-point format.  The conversion
-| is performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 float128_to_float64(float128 a, float_status *status)
-{
-    bool aSign;
-    int32_t aExp;
-    uint64_t aSig0, aSig1;
-
-    aSig1 = extractFloat128Frac1( a );
-    aSig0 = extractFloat128Frac0( a );
-    aExp = extractFloat128Exp( a );
-    aSign = extractFloat128Sign( a );
-    if ( aExp == 0x7FFF ) {
-        if ( aSig0 | aSig1 ) {
-            return commonNaNToFloat64(float128ToCommonNaN(a, status), status);
-        }
-        return packFloat64( aSign, 0x7FF, 0 );
-    }
-    shortShift128Left( aSig0, aSig1, 14, &aSig0, &aSig1 );
-    aSig0 |= ( aSig1 != 0 );
-    if ( aExp || aSig0 ) {
-        aSig0 |= UINT64_C(0x4000000000000000);
-        aExp -= 0x3C01;
-    }
-    return roundAndPackFloat64(aSign, aExp, aSig0, status);
-
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of converting the quadruple-precision floating-point
 | value `a' to the extended double-precision floating-point format.  The
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 45/72] softfloat: Move round_to_int to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (43 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 44/72] softfloat: Convert float-to-float conversions with float128 Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-13 14:10   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 46/72] softfloat: Move rount_to_int_and_pack " Richard Henderson
                   ` (31 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

At the same time, convert to pointers, split out
parts$N_round_to_int_normal, and define a macro for
parts_round_to_int using QEMU_GENERIC.

This necessarily meant some rearrangement of the
round_to_{,u}int_and_pack routines, so go ahead and
convert them to parts_round_to_int_normal, which in
turn allows cleaning up the raised exception handling.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 434 ++++++++++----------------------------
 fpu/softfloat-parts.c.inc | 157 ++++++++++++++
 2 files changed, 263 insertions(+), 328 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 1b86111279..ce96ea753c 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -810,6 +810,24 @@ static FloatParts128 *parts128_div(FloatParts128 *a, FloatParts128 *b,
 #define parts_div(A, B, S) \
     PARTS_GENERIC_64_128(div, A)(A, B, S)
 
+static bool parts64_round_to_int_normal(FloatParts64 *a, FloatRoundMode rm,
+                                        int scale, int frac_size);
+static bool parts128_round_to_int_normal(FloatParts128 *a, FloatRoundMode r,
+                                         int scale, int frac_size);
+
+#define parts_round_to_int_normal(A, R, C, F) \
+    PARTS_GENERIC_64_128(round_to_int_normal, A)(A, R, C, F)
+
+static void parts64_round_to_int(FloatParts64 *a, FloatRoundMode rm,
+                                 int scale, float_status *s,
+                                 const FloatFmt *fmt);
+static void parts128_round_to_int(FloatParts128 *a, FloatRoundMode r,
+                                  int scale, float_status *s,
+                                  const FloatFmt *fmt);
+
+#define parts_round_to_int(A, R, C, S, F) \
+    PARTS_GENERIC_64_128(round_to_int, A)(A, R, C, S, F)
+
 /*
  * Helper functions for softfloat-parts.c.inc, per-size operations.
  */
@@ -2284,153 +2302,52 @@ float128 float64_to_float128(float64 a, float_status *s)
 }
 
 /*
- * Rounds the floating-point value `a' to an integer, and returns the
- * result as a floating-point value. The operation is performed
- * according to the IEC/IEEE Standard for Binary Floating-Point
- * Arithmetic.
+ * Round to integral value
  */
 
-static FloatParts64 round_to_int(FloatParts64 a, FloatRoundMode rmode,
-                               int scale, float_status *s)
-{
-    switch (a.cls) {
-    case float_class_qnan:
-    case float_class_snan:
-        parts_return_nan(&a, s);
-        break;
-
-    case float_class_zero:
-    case float_class_inf:
-        /* already "integral" */
-        break;
-
-    case float_class_normal:
-        scale = MIN(MAX(scale, -0x10000), 0x10000);
-        a.exp += scale;
-
-        if (a.exp >= DECOMPOSED_BINARY_POINT) {
-            /* already integral */
-            break;
-        }
-        if (a.exp < 0) {
-            bool one;
-            /* all fractional */
-            float_raise(float_flag_inexact, s);
-            switch (rmode) {
-            case float_round_nearest_even:
-                one = a.exp == -1 && a.frac > DECOMPOSED_IMPLICIT_BIT;
-                break;
-            case float_round_ties_away:
-                one = a.exp == -1 && a.frac >= DECOMPOSED_IMPLICIT_BIT;
-                break;
-            case float_round_to_zero:
-                one = false;
-                break;
-            case float_round_up:
-                one = !a.sign;
-                break;
-            case float_round_down:
-                one = a.sign;
-                break;
-            case float_round_to_odd:
-                one = true;
-                break;
-            default:
-                g_assert_not_reached();
-            }
-
-            if (one) {
-                a.frac = DECOMPOSED_IMPLICIT_BIT;
-                a.exp = 0;
-            } else {
-                a.cls = float_class_zero;
-            }
-        } else {
-            uint64_t frac_lsb = DECOMPOSED_IMPLICIT_BIT >> a.exp;
-            uint64_t frac_lsbm1 = frac_lsb >> 1;
-            uint64_t rnd_even_mask = (frac_lsb - 1) | frac_lsb;
-            uint64_t rnd_mask = rnd_even_mask >> 1;
-            uint64_t inc;
-
-            switch (rmode) {
-            case float_round_nearest_even:
-                inc = ((a.frac & rnd_even_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
-                break;
-            case float_round_ties_away:
-                inc = frac_lsbm1;
-                break;
-            case float_round_to_zero:
-                inc = 0;
-                break;
-            case float_round_up:
-                inc = a.sign ? 0 : rnd_mask;
-                break;
-            case float_round_down:
-                inc = a.sign ? rnd_mask : 0;
-                break;
-            case float_round_to_odd:
-                inc = a.frac & frac_lsb ? 0 : rnd_mask;
-                break;
-            default:
-                g_assert_not_reached();
-            }
-
-            if (a.frac & rnd_mask) {
-                float_raise(float_flag_inexact, s);
-                if (uadd64_overflow(a.frac, inc, &a.frac)) {
-                    a.frac >>= 1;
-                    a.frac |= DECOMPOSED_IMPLICIT_BIT;
-                    a.exp++;
-                }
-                a.frac &= ~rnd_mask;
-            }
-        }
-        break;
-    default:
-        g_assert_not_reached();
-    }
-    return a;
-}
-
 float16 float16_round_to_int(float16 a, float_status *s)
 {
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    float16_unpack_canonical(&pa, a, s);
-    pr = round_to_int(pa, s->float_rounding_mode, 0, s);
-    return float16_round_pack_canonical(&pr, s);
+    float16_unpack_canonical(&p, a, s);
+    parts_round_to_int(&p, s->float_rounding_mode, 0, s, &float16_params);
+    return float16_round_pack_canonical(&p, s);
 }
 
 float32 float32_round_to_int(float32 a, float_status *s)
 {
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    float32_unpack_canonical(&pa, a, s);
-    pr = round_to_int(pa, s->float_rounding_mode, 0, s);
-    return float32_round_pack_canonical(&pr, s);
+    float32_unpack_canonical(&p, a, s);
+    parts_round_to_int(&p, s->float_rounding_mode, 0, s, &float32_params);
+    return float32_round_pack_canonical(&p, s);
 }
 
 float64 float64_round_to_int(float64 a, float_status *s)
 {
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    float64_unpack_canonical(&pa, a, s);
-    pr = round_to_int(pa, s->float_rounding_mode, 0, s);
-    return float64_round_pack_canonical(&pr, s);
+    float64_unpack_canonical(&p, a, s);
+    parts_round_to_int(&p, s->float_rounding_mode, 0, s, &float64_params);
+    return float64_round_pack_canonical(&p, s);
 }
 
-/*
- * Rounds the bfloat16 value `a' to an integer, and returns the
- * result as a bfloat16 value.
- */
-
 bfloat16 bfloat16_round_to_int(bfloat16 a, float_status *s)
 {
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    bfloat16_unpack_canonical(&pa, a, s);
-    pr = round_to_int(pa, s->float_rounding_mode, 0, s);
-    return bfloat16_round_pack_canonical(&pr, s);
+    bfloat16_unpack_canonical(&p, a, s);
+    parts_round_to_int(&p, s->float_rounding_mode, 0, s, &bfloat16_params);
+    return bfloat16_round_pack_canonical(&p, s);
+}
+
+float128 float128_round_to_int(float128 a, float_status *s)
+{
+    FloatParts128 p;
+
+    float128_unpack_canonical(&p, a, s);
+    parts_round_to_int(&p, s->float_rounding_mode, 0, s, &float128_params);
+    return float128_round_pack_canonical(&p, s);
 }
 
 /*
@@ -2444,48 +2361,58 @@ bfloat16 bfloat16_round_to_int(bfloat16 a, float_status *s)
  * is returned.
 */
 
-static int64_t round_to_int_and_pack(FloatParts64 in, FloatRoundMode rmode,
+static int64_t round_to_int_and_pack(FloatParts64 p, FloatRoundMode rmode,
                                      int scale, int64_t min, int64_t max,
                                      float_status *s)
 {
+    int flags = 0;
     uint64_t r;
-    int orig_flags = get_float_exception_flags(s);
-    FloatParts64 p = round_to_int(in, rmode, scale, s);
 
     switch (p.cls) {
     case float_class_snan:
     case float_class_qnan:
-        s->float_exception_flags = orig_flags | float_flag_invalid;
-        return max;
+        flags = float_flag_invalid;
+        r = max;
+        break;
+
     case float_class_inf:
-        s->float_exception_flags = orig_flags | float_flag_invalid;
-        return p.sign ? min : max;
+        flags = float_flag_invalid;
+        r = p.sign ? min : max;
+        break;
+
     case float_class_zero:
         return 0;
+
     case float_class_normal:
+        /* TODO: 62 = N - 2, frac_size for rounding */
+        if (parts_round_to_int_normal(&p, rmode, scale, 62)) {
+            flags = float_flag_inexact;
+        }
+
         if (p.exp <= DECOMPOSED_BINARY_POINT) {
             r = p.frac >> (DECOMPOSED_BINARY_POINT - p.exp);
         } else {
             r = UINT64_MAX;
         }
         if (p.sign) {
-            if (r <= -(uint64_t) min) {
-                return -r;
+            if (r <= -(uint64_t)min) {
+                r = -r;
             } else {
-                s->float_exception_flags = orig_flags | float_flag_invalid;
-                return min;
-            }
-        } else {
-            if (r <= max) {
-                return r;
-            } else {
-                s->float_exception_flags = orig_flags | float_flag_invalid;
-                return max;
+                flags = float_flag_invalid;
+                r = min;
             }
+        } else if (r > max) {
+            flags = float_flag_invalid;
+            r = max;
         }
+        break;
+
     default:
         g_assert_not_reached();
     }
+
+    float_raise(flags, s);
+    return r;
 }
 
 int8_t float16_to_int8_scalbn(float16 a, FloatRoundMode rmode, int scale,
@@ -2748,49 +2675,59 @@ int64_t bfloat16_to_int64_round_to_zero(bfloat16 a, float_status *s)
  *  flag.
  */
 
-static uint64_t round_to_uint_and_pack(FloatParts64 in, FloatRoundMode rmode,
+static uint64_t round_to_uint_and_pack(FloatParts64 p, FloatRoundMode rmode,
                                        int scale, uint64_t max,
                                        float_status *s)
 {
-    int orig_flags = get_float_exception_flags(s);
-    FloatParts64 p = round_to_int(in, rmode, scale, s);
+    int flags = 0;
     uint64_t r;
 
     switch (p.cls) {
     case float_class_snan:
     case float_class_qnan:
-        s->float_exception_flags = orig_flags | float_flag_invalid;
-        return max;
+        flags = float_flag_invalid;
+        r = max;
+        break;
+
     case float_class_inf:
-        s->float_exception_flags = orig_flags | float_flag_invalid;
-        return p.sign ? 0 : max;
+        flags = float_flag_invalid;
+        r = p.sign ? 0 : max;
+        break;
+
     case float_class_zero:
         return 0;
+
     case float_class_normal:
+        /* TODO: 62 = N - 2, frac_size for rounding */
+        if (parts_round_to_int_normal(&p, rmode, scale, 62)) {
+            flags = float_flag_inexact;
+            if (p.cls == float_class_zero) {
+                r = 0;
+                break;
+            }
+        }
+
         if (p.sign) {
-            s->float_exception_flags = orig_flags | float_flag_invalid;
-            return 0;
-        }
-
-        if (p.exp <= DECOMPOSED_BINARY_POINT) {
-            r = p.frac >> (DECOMPOSED_BINARY_POINT - p.exp);
+            flags = float_flag_invalid;
+            r = 0;
+        } else if (p.exp > DECOMPOSED_BINARY_POINT) {
+            flags = float_flag_invalid;
+            r = max;
         } else {
-            s->float_exception_flags = orig_flags | float_flag_invalid;
-            return max;
+            r = p.frac >> (DECOMPOSED_BINARY_POINT - p.exp);
+            if (r > max) {
+                flags = float_flag_invalid;
+                r = max;
+            }
         }
+        break;
 
-        /* For uint64 this will never trip, but if p.exp is too large
-         * to shift a decomposed fraction we shall have exited via the
-         * 3rd leg above.
-         */
-        if (r > max) {
-            s->float_exception_flags = orig_flags | float_flag_invalid;
-            return max;
-        }
-        return r;
     default:
         g_assert_not_reached();
     }
+
+    float_raise(flags, s);
+    return r;
 }
 
 uint8_t float16_to_uint8_scalbn(float16 a, FloatRoundMode rmode, int scale,
@@ -6955,165 +6892,6 @@ floatx80 float128_to_floatx80(float128 a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Rounds the quadruple-precision floating-point value `a' to an integer, and
-| returns the result as a quadruple-precision floating-point value.  The
-| operation is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float128 float128_round_to_int(float128 a, float_status *status)
-{
-    bool aSign;
-    int32_t aExp;
-    uint64_t lastBitMask, roundBitsMask;
-    float128 z;
-
-    aExp = extractFloat128Exp( a );
-    if ( 0x402F <= aExp ) {
-        if ( 0x406F <= aExp ) {
-            if (    ( aExp == 0x7FFF )
-                 && ( extractFloat128Frac0( a ) | extractFloat128Frac1( a ) )
-               ) {
-                return propagateFloat128NaN(a, a, status);
-            }
-            return a;
-        }
-        lastBitMask = 1;
-        lastBitMask = ( lastBitMask<<( 0x406E - aExp ) )<<1;
-        roundBitsMask = lastBitMask - 1;
-        z = a;
-        switch (status->float_rounding_mode) {
-        case float_round_nearest_even:
-            if ( lastBitMask ) {
-                add128( z.high, z.low, 0, lastBitMask>>1, &z.high, &z.low );
-                if ( ( z.low & roundBitsMask ) == 0 ) z.low &= ~ lastBitMask;
-            }
-            else {
-                if ( (int64_t) z.low < 0 ) {
-                    ++z.high;
-                    if ( (uint64_t) ( z.low<<1 ) == 0 ) z.high &= ~1;
-                }
-            }
-            break;
-        case float_round_ties_away:
-            if (lastBitMask) {
-                add128(z.high, z.low, 0, lastBitMask >> 1, &z.high, &z.low);
-            } else {
-                if ((int64_t) z.low < 0) {
-                    ++z.high;
-                }
-            }
-            break;
-        case float_round_to_zero:
-            break;
-        case float_round_up:
-            if (!extractFloat128Sign(z)) {
-                add128(z.high, z.low, 0, roundBitsMask, &z.high, &z.low);
-            }
-            break;
-        case float_round_down:
-            if (extractFloat128Sign(z)) {
-                add128(z.high, z.low, 0, roundBitsMask, &z.high, &z.low);
-            }
-            break;
-        case float_round_to_odd:
-            /*
-             * Note that if lastBitMask == 0, the last bit is the lsb
-             * of high, and roundBitsMask == -1.
-             */
-            if ((lastBitMask ? z.low & lastBitMask : z.high & 1) == 0) {
-                add128(z.high, z.low, 0, roundBitsMask, &z.high, &z.low);
-            }
-            break;
-        default:
-            abort();
-        }
-        z.low &= ~ roundBitsMask;
-    }
-    else {
-        if ( aExp < 0x3FFF ) {
-            if ( ( ( (uint64_t) ( a.high<<1 ) ) | a.low ) == 0 ) return a;
-            float_raise(float_flag_inexact, status);
-            aSign = extractFloat128Sign( a );
-            switch (status->float_rounding_mode) {
-            case float_round_nearest_even:
-                if (    ( aExp == 0x3FFE )
-                     && (   extractFloat128Frac0( a )
-                          | extractFloat128Frac1( a ) )
-                   ) {
-                    return packFloat128( aSign, 0x3FFF, 0, 0 );
-                }
-                break;
-            case float_round_ties_away:
-                if (aExp == 0x3FFE) {
-                    return packFloat128(aSign, 0x3FFF, 0, 0);
-                }
-                break;
-            case float_round_down:
-                return
-                      aSign ? packFloat128( 1, 0x3FFF, 0, 0 )
-                    : packFloat128( 0, 0, 0, 0 );
-            case float_round_up:
-                return
-                      aSign ? packFloat128( 1, 0, 0, 0 )
-                    : packFloat128( 0, 0x3FFF, 0, 0 );
-
-            case float_round_to_odd:
-                return packFloat128(aSign, 0x3FFF, 0, 0);
-
-            case float_round_to_zero:
-                break;
-            }
-            return packFloat128( aSign, 0, 0, 0 );
-        }
-        lastBitMask = 1;
-        lastBitMask <<= 0x402F - aExp;
-        roundBitsMask = lastBitMask - 1;
-        z.low = 0;
-        z.high = a.high;
-        switch (status->float_rounding_mode) {
-        case float_round_nearest_even:
-            z.high += lastBitMask>>1;
-            if ( ( ( z.high & roundBitsMask ) | a.low ) == 0 ) {
-                z.high &= ~ lastBitMask;
-            }
-            break;
-        case float_round_ties_away:
-            z.high += lastBitMask>>1;
-            break;
-        case float_round_to_zero:
-            break;
-        case float_round_up:
-            if (!extractFloat128Sign(z)) {
-                z.high |= ( a.low != 0 );
-                z.high += roundBitsMask;
-            }
-            break;
-        case float_round_down:
-            if (extractFloat128Sign(z)) {
-                z.high |= (a.low != 0);
-                z.high += roundBitsMask;
-            }
-            break;
-        case float_round_to_odd:
-            if ((z.high & lastBitMask) == 0) {
-                z.high |= (a.low != 0);
-                z.high += roundBitsMask;
-            }
-            break;
-        default:
-            abort();
-        }
-        z.high &= ~ roundBitsMask;
-    }
-    if ( ( z.low != a.low ) || ( z.high != a.high ) ) {
-        float_raise(float_flag_inexact, status);
-    }
-    return z;
-
-}
-
 /*----------------------------------------------------------------------------
 | Returns the remainder of the quadruple-precision floating-point value `a'
 | with respect to the corresponding value `b'.  The operation is performed
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index f8165d92f9..b2c4624d8c 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -594,3 +594,160 @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
     a->cls = float_class_inf;
     return a;
 }
+
+/*
+ * Rounds the floating-point value `a' to an integer, and returns the
+ * result as a floating-point value. The operation is performed
+ * according to the IEC/IEEE Standard for Binary Floating-Point
+ * Arithmetic.
+ *
+ * parts_round_to_int_normal is an internal helper function for
+ * normal numbers only, returning true for inexact but not directly
+ * raising float_flag_inexact.
+ */
+static bool partsN(round_to_int_normal)(FloatPartsN *a, FloatRoundMode rmode,
+                                        int scale, int frac_size)
+{
+    uint64_t frac_lsb, frac_lsbm1, rnd_even_mask, rnd_mask, inc;
+    int shift_adj;
+
+    scale = MIN(MAX(scale, -0x10000), 0x10000);
+    a->exp += scale;
+
+    if (a->exp < 0) {
+        bool one;
+
+        /* All fractional */
+        switch (rmode) {
+        case float_round_nearest_even:
+            one = false;
+            if (a->exp == -1) {
+                FloatPartsN tmp;
+                /* Shift left one, discarding DECOMPOSED_IMPLICIT_BIT */
+                frac_add(&tmp, a, a);
+                /* Anything remaining means frac > 0.5. */
+                one = !frac_eqz(&tmp);
+            }
+            break;
+        case float_round_ties_away:
+            one = a->exp == -1;
+            break;
+        case float_round_to_zero:
+            one = false;
+            break;
+        case float_round_up:
+            one = !a->sign;
+            break;
+        case float_round_down:
+            one = a->sign;
+            break;
+        case float_round_to_odd:
+            one = true;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+
+        frac_clear(a);
+        a->exp = 0;
+        if (one) {
+            a->frac_hi = DECOMPOSED_IMPLICIT_BIT;
+        } else {
+            a->cls = float_class_zero;
+        }
+        return true;
+    }
+
+    if (a->exp >= frac_size) {
+        /* All integral */
+        return false;
+    }
+
+    if (N > 64 && a->exp < N - 64) {
+        /*
+         * Rounding is not in the low word -- shift lsb to bit 2,
+         * which leaves room for the sticky and rounding bits.
+         */
+        shift_adj = (N - 1) - (a->exp + 2);
+        frac_shrjam(a, shift_adj);
+        frac_lsb = 1 << 2;
+    } else {
+        shift_adj = 0;
+        frac_lsb = DECOMPOSED_IMPLICIT_BIT >> (a->exp & 63);
+    }
+
+    frac_lsbm1 = frac_lsb >> 1;
+    rnd_mask = frac_lsb - 1;
+    rnd_even_mask = rnd_mask | frac_lsb;
+
+    if (!(a->frac_lo & rnd_mask)) {
+        /* Fractional bits already clear, undo the shift above. */
+        frac_shl(a, shift_adj);
+        return false;
+    }
+
+    switch (rmode) {
+    case float_round_nearest_even:
+        inc = ((a->frac_lo & rnd_even_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
+        break;
+    case float_round_ties_away:
+        inc = frac_lsbm1;
+        break;
+    case float_round_to_zero:
+        inc = 0;
+        break;
+    case float_round_up:
+        inc = a->sign ? 0 : rnd_mask;
+        break;
+    case float_round_down:
+        inc = a->sign ? rnd_mask : 0;
+        break;
+    case float_round_to_odd:
+        inc = a->frac_lo & frac_lsb ? 0 : rnd_mask;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    if (shift_adj == 0) {
+        if (frac_addi(a, a, inc)) {
+            frac_shr(a, 1);
+            a->frac_hi |= DECOMPOSED_IMPLICIT_BIT;
+            a->exp++;
+        }
+        a->frac_lo &= ~rnd_mask;
+    } else {
+        frac_addi(a, a, inc);
+        a->frac_lo &= ~rnd_mask;
+        /* Be careful when shifting back not to overflow. */
+        frac_shl(a, shift_adj - 1);
+        if (a->frac_hi & DECOMPOSED_IMPLICIT_BIT) {
+            a->exp++;
+        } else {
+            frac_add(a, a, a);
+        }
+    }
+    return true;
+}
+
+static void partsN(round_to_int)(FloatPartsN *a, FloatRoundMode rmode,
+                                 int scale, float_status *s,
+                                 const FloatFmt *fmt)
+{
+    switch (a->cls) {
+    case float_class_qnan:
+    case float_class_snan:
+        parts_return_nan(a, s);
+        break;
+    case float_class_zero:
+    case float_class_inf:
+        break;
+    case float_class_normal:
+        if (parts_round_to_int_normal(a, rmode, scale, fmt->frac_size)) {
+            float_raise(float_flag_inexact, s);
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 46/72] softfloat: Move round_to_int_and_pack to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (44 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 45/72] softfloat: Move round_to_int to softfloat-parts.c.inc Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-13 14:11   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 47/72] softfloat: Move round_to_uint_and_pack " Richard Henderson
                   ` (30 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Rename round_to_int_and_pack to parts$N_float_to_sint.  Reimplement
float128_to_int{32,64}{_round_to_zero} with FloatParts128.

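One detail worth noting in the parts$N_float_to_sint logic is the
negative-bound test r <= -(uint64_t)min: negating min as a signed
value would overflow when min is INT64_MIN, so the comparison is kept
in unsigned arithmetic.  Here is a standalone sketch of just that
saturation step, with illustrative names; it is not the QEMU function
itself.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Clamp an unsigned magnitude r with separate sign into [min, max].
 * Testing r <= -(uint64_t)min keeps the negative-side comparison in
 * unsigned arithmetic, where negating 2^63 is well defined.
 */
static int64_t clamp_sint(uint64_t r, bool sign,
                          int64_t min, int64_t max, bool *invalid)
{
    *invalid = false;
    if (sign) {
        if (r <= -(uint64_t)min) {
            return -r;          /* two's-complement wrap to the result */
        }
        *invalid = true;
        return min;
    }
    if (r > (uint64_t)max) {
        *invalid = true;
        return max;
    }
    return r;
}

int main(void)
{
    bool inv;

    /* 2^63 is representable when negative (INT64_MIN)... */
    printf("%lld invalid=%d\n",
           (long long)clamp_sint(1ull << 63, true,
                                 INT64_MIN, INT64_MAX, &inv), inv);
    /* ...but saturates to INT64_MAX when positive. */
    printf("%lld invalid=%d\n",
           (long long)clamp_sint(1ull << 63, false,
                                 INT64_MIN, INT64_MAX, &inv), inv);
    return 0;
}
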
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 365 +++++++++-----------------------------
 fpu/softfloat-parts.c.inc |  64 +++++++
 2 files changed, 145 insertions(+), 284 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index ce96ea753c..ac8e726935 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -828,6 +828,16 @@ static void parts128_round_to_int(FloatParts128 *a, FloatRoundMode r,
 #define parts_round_to_int(A, R, C, S, F) \
     PARTS_GENERIC_64_128(round_to_int, A)(A, R, C, S, F)
 
+static int64_t parts64_float_to_sint(FloatParts64 *p, FloatRoundMode rmode,
+                                     int scale, int64_t min, int64_t max,
+                                     float_status *s);
+static int64_t parts128_float_to_sint(FloatParts128 *p, FloatRoundMode rmode,
+                                     int scale, int64_t min, int64_t max,
+                                     float_status *s);
+
+#define parts_float_to_sint(P, R, Z, MN, MX, S) \
+    PARTS_GENERIC_64_128(float_to_sint, P)(P, R, Z, MN, MX, S)
+
 /*
  * Helper functions for softfloat-parts.c.inc, per-size operations.
  */
@@ -2351,69 +2361,8 @@ float128 float128_round_to_int(float128 a, float_status *s)
 }
 
 /*
- * Returns the result of converting the floating-point value `a' to
- * the two's complement integer format. The conversion is performed
- * according to the IEC/IEEE Standard for Binary Floating-Point
- * Arithmetic---which means in particular that the conversion is
- * rounded according to the current rounding mode. If `a' is a NaN,
- * the largest positive integer is returned. Otherwise, if the
- * conversion overflows, the largest integer with the same sign as `a'
- * is returned.
-*/
-
-static int64_t round_to_int_and_pack(FloatParts64 p, FloatRoundMode rmode,
-                                     int scale, int64_t min, int64_t max,
-                                     float_status *s)
-{
-    int flags = 0;
-    uint64_t r;
-
-    switch (p.cls) {
-    case float_class_snan:
-    case float_class_qnan:
-        flags = float_flag_invalid;
-        r = max;
-        break;
-
-    case float_class_inf:
-        flags = float_flag_invalid;
-        r = p.sign ? min : max;
-        break;
-
-    case float_class_zero:
-        return 0;
-
-    case float_class_normal:
-        /* TODO: 62 = N - 2, frac_size for rounding */
-        if (parts_round_to_int_normal(&p, rmode, scale, 62)) {
-            flags = float_flag_inexact;
-        }
-
-        if (p.exp <= DECOMPOSED_BINARY_POINT) {
-            r = p.frac >> (DECOMPOSED_BINARY_POINT - p.exp);
-        } else {
-            r = UINT64_MAX;
-        }
-        if (p.sign) {
-            if (r <= -(uint64_t)min) {
-                r = -r;
-            } else {
-                flags = float_flag_invalid;
-                r = min;
-            }
-        } else if (r > max) {
-            flags = float_flag_invalid;
-            r = max;
-        }
-        break;
-
-    default:
-        g_assert_not_reached();
-    }
-
-    float_raise(flags, s);
-    return r;
-}
+ * Floating-point to signed integer conversions
+ */
 
 int8_t float16_to_int8_scalbn(float16 a, FloatRoundMode rmode, int scale,
                               float_status *s)
@@ -2421,7 +2370,7 @@ int8_t float16_to_int8_scalbn(float16 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float16_unpack_canonical(&p, a, s);
-    return round_to_int_and_pack(p, rmode, scale, INT8_MIN, INT8_MAX, s);
+    return parts_float_to_sint(&p, rmode, scale, INT8_MIN, INT8_MAX, s);
 }
 
 int16_t float16_to_int16_scalbn(float16 a, FloatRoundMode rmode, int scale,
@@ -2430,7 +2379,7 @@ int16_t float16_to_int16_scalbn(float16 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float16_unpack_canonical(&p, a, s);
-    return round_to_int_and_pack(p, rmode, scale, INT16_MIN, INT16_MAX, s);
+    return parts_float_to_sint(&p, rmode, scale, INT16_MIN, INT16_MAX, s);
 }
 
 int32_t float16_to_int32_scalbn(float16 a, FloatRoundMode rmode, int scale,
@@ -2439,7 +2388,7 @@ int32_t float16_to_int32_scalbn(float16 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float16_unpack_canonical(&p, a, s);
-    return round_to_int_and_pack(p, rmode, scale, INT32_MIN, INT32_MAX, s);
+    return parts_float_to_sint(&p, rmode, scale, INT32_MIN, INT32_MAX, s);
 }
 
 int64_t float16_to_int64_scalbn(float16 a, FloatRoundMode rmode, int scale,
@@ -2448,7 +2397,7 @@ int64_t float16_to_int64_scalbn(float16 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float16_unpack_canonical(&p, a, s);
-    return round_to_int_and_pack(p, rmode, scale, INT64_MIN, INT64_MAX, s);
+    return parts_float_to_sint(&p, rmode, scale, INT64_MIN, INT64_MAX, s);
 }
 
 int16_t float32_to_int16_scalbn(float32 a, FloatRoundMode rmode, int scale,
@@ -2457,7 +2406,7 @@ int16_t float32_to_int16_scalbn(float32 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float32_unpack_canonical(&p, a, s);
-    return round_to_int_and_pack(p, rmode, scale, INT16_MIN, INT16_MAX, s);
+    return parts_float_to_sint(&p, rmode, scale, INT16_MIN, INT16_MAX, s);
 }
 
 int32_t float32_to_int32_scalbn(float32 a, FloatRoundMode rmode, int scale,
@@ -2466,7 +2415,7 @@ int32_t float32_to_int32_scalbn(float32 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float32_unpack_canonical(&p, a, s);
-    return round_to_int_and_pack(p, rmode, scale, INT32_MIN, INT32_MAX, s);
+    return parts_float_to_sint(&p, rmode, scale, INT32_MIN, INT32_MAX, s);
 }
 
 int64_t float32_to_int64_scalbn(float32 a, FloatRoundMode rmode, int scale,
@@ -2475,7 +2424,7 @@ int64_t float32_to_int64_scalbn(float32 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float32_unpack_canonical(&p, a, s);
-    return round_to_int_and_pack(p, rmode, scale, INT64_MIN, INT64_MAX, s);
+    return parts_float_to_sint(&p, rmode, scale, INT64_MIN, INT64_MAX, s);
 }
 
 int16_t float64_to_int16_scalbn(float64 a, FloatRoundMode rmode, int scale,
@@ -2484,7 +2433,7 @@ int16_t float64_to_int16_scalbn(float64 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float64_unpack_canonical(&p, a, s);
-    return round_to_int_and_pack(p, rmode, scale, INT16_MIN, INT16_MAX, s);
+    return parts_float_to_sint(&p, rmode, scale, INT16_MIN, INT16_MAX, s);
 }
 
 int32_t float64_to_int32_scalbn(float64 a, FloatRoundMode rmode, int scale,
@@ -2493,7 +2442,7 @@ int32_t float64_to_int32_scalbn(float64 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float64_unpack_canonical(&p, a, s);
-    return round_to_int_and_pack(p, rmode, scale, INT32_MIN, INT32_MAX, s);
+    return parts_float_to_sint(&p, rmode, scale, INT32_MIN, INT32_MAX, s);
 }
 
 int64_t float64_to_int64_scalbn(float64 a, FloatRoundMode rmode, int scale,
@@ -2502,7 +2451,52 @@ int64_t float64_to_int64_scalbn(float64 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float64_unpack_canonical(&p, a, s);
-    return round_to_int_and_pack(p, rmode, scale, INT64_MIN, INT64_MAX, s);
+    return parts_float_to_sint(&p, rmode, scale, INT64_MIN, INT64_MAX, s);
+}
+
+int16_t bfloat16_to_int16_scalbn(bfloat16 a, FloatRoundMode rmode, int scale,
+                                 float_status *s)
+{
+    FloatParts64 p;
+
+    bfloat16_unpack_canonical(&p, a, s);
+    return parts_float_to_sint(&p, rmode, scale, INT16_MIN, INT16_MAX, s);
+}
+
+int32_t bfloat16_to_int32_scalbn(bfloat16 a, FloatRoundMode rmode, int scale,
+                                 float_status *s)
+{
+    FloatParts64 p;
+
+    bfloat16_unpack_canonical(&p, a, s);
+    return parts_float_to_sint(&p, rmode, scale, INT32_MIN, INT32_MAX, s);
+}
+
+int64_t bfloat16_to_int64_scalbn(bfloat16 a, FloatRoundMode rmode, int scale,
+                                 float_status *s)
+{
+    FloatParts64 p;
+
+    bfloat16_unpack_canonical(&p, a, s);
+    return parts_float_to_sint(&p, rmode, scale, INT64_MIN, INT64_MAX, s);
+}
+
+static int32_t float128_to_int32_scalbn(float128 a, FloatRoundMode rmode,
+                                        int scale, float_status *s)
+{
+    FloatParts128 p;
+
+    float128_unpack_canonical(&p, a, s);
+    return parts_float_to_sint(&p, rmode, scale, INT32_MIN, INT32_MAX, s);
+}
+
+static int64_t float128_to_int64_scalbn(float128 a, FloatRoundMode rmode,
+                                        int scale, float_status *s)
+{
+    FloatParts128 p;
+
+    float128_unpack_canonical(&p, a, s);
+    return parts_float_to_sint(&p, rmode, scale, INT64_MIN, INT64_MAX, s);
 }
 
 int8_t float16_to_int8(float16 a, float_status *s)
@@ -2555,6 +2549,16 @@ int64_t float64_to_int64(float64 a, float_status *s)
     return float64_to_int64_scalbn(a, s->float_rounding_mode, 0, s);
 }
 
+int32_t float128_to_int32(float128 a, float_status *s)
+{
+    return float128_to_int32_scalbn(a, s->float_rounding_mode, 0, s);
+}
+
+int64_t float128_to_int64(float128 a, float_status *s)
+{
+    return float128_to_int64_scalbn(a, s->float_rounding_mode, 0, s);
+}
+
 int16_t float16_to_int16_round_to_zero(float16 a, float_status *s)
 {
     return float16_to_int16_scalbn(a, float_round_to_zero, 0, s);
@@ -2600,36 +2604,14 @@ int64_t float64_to_int64_round_to_zero(float64 a, float_status *s)
     return float64_to_int64_scalbn(a, float_round_to_zero, 0, s);
 }
 
-/*
- * Returns the result of converting the floating-point value `a' to
- * the two's complement integer format.
- */
-
-int16_t bfloat16_to_int16_scalbn(bfloat16 a, FloatRoundMode rmode, int scale,
-                                 float_status *s)
+int32_t float128_to_int32_round_to_zero(float128 a, float_status *s)
 {
-    FloatParts64 p;
-
-    bfloat16_unpack_canonical(&p, a, s);
-    return round_to_int_and_pack(p, rmode, scale, INT16_MIN, INT16_MAX, s);
+    return float128_to_int32_scalbn(a, float_round_to_zero, 0, s);
 }
 
-int32_t bfloat16_to_int32_scalbn(bfloat16 a, FloatRoundMode rmode, int scale,
-                                 float_status *s)
+int64_t float128_to_int64_round_to_zero(float128 a, float_status *s)
 {
-    FloatParts64 p;
-
-    bfloat16_unpack_canonical(&p, a, s);
-    return round_to_int_and_pack(p, rmode, scale, INT32_MIN, INT32_MAX, s);
-}
-
-int64_t bfloat16_to_int64_scalbn(bfloat16 a, FloatRoundMode rmode, int scale,
-                                 float_status *s)
-{
-    FloatParts64 p;
-
-    bfloat16_unpack_canonical(&p, a, s);
-    return round_to_int_and_pack(p, rmode, scale, INT64_MIN, INT64_MAX, s);
+    return float128_to_int64_scalbn(a, float_round_to_zero, 0, s);
 }
 
 int16_t bfloat16_to_int16(bfloat16 a, float_status *s)
@@ -6553,191 +6535,6 @@ floatx80 floatx80_sqrt(floatx80 a, float_status *status)
                                 0, zExp, zSig0, zSig1, status);
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point
-| value `a' to the 32-bit two's complement integer format.  The conversion
-| is performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode.  If `a' is a NaN, the largest
-| positive integer is returned.  Otherwise, if the conversion overflows, the
-| largest integer with the same sign as `a' is returned.
-*----------------------------------------------------------------------------*/
-
-int32_t float128_to_int32(float128 a, float_status *status)
-{
-    bool aSign;
-    int32_t aExp, shiftCount;
-    uint64_t aSig0, aSig1;
-
-    aSig1 = extractFloat128Frac1( a );
-    aSig0 = extractFloat128Frac0( a );
-    aExp = extractFloat128Exp( a );
-    aSign = extractFloat128Sign( a );
-    if ( ( aExp == 0x7FFF ) && ( aSig0 | aSig1 ) ) aSign = 0;
-    if ( aExp ) aSig0 |= UINT64_C(0x0001000000000000);
-    aSig0 |= ( aSig1 != 0 );
-    shiftCount = 0x4028 - aExp;
-    if ( 0 < shiftCount ) shift64RightJamming( aSig0, shiftCount, &aSig0 );
-    return roundAndPackInt32(aSign, aSig0, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point
-| value `a' to the 32-bit two's complement integer format.  The conversion
-| is performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic, except that the conversion is always rounded toward zero.  If
-| `a' is a NaN, the largest positive integer is returned.  Otherwise, if the
-| conversion overflows, the largest integer with the same sign as `a' is
-| returned.
-*----------------------------------------------------------------------------*/
-
-int32_t float128_to_int32_round_to_zero(float128 a, float_status *status)
-{
-    bool aSign;
-    int32_t aExp, shiftCount;
-    uint64_t aSig0, aSig1, savedASig;
-    int32_t z;
-
-    aSig1 = extractFloat128Frac1( a );
-    aSig0 = extractFloat128Frac0( a );
-    aExp = extractFloat128Exp( a );
-    aSign = extractFloat128Sign( a );
-    aSig0 |= ( aSig1 != 0 );
-    if ( 0x401E < aExp ) {
-        if ( ( aExp == 0x7FFF ) && aSig0 ) aSign = 0;
-        goto invalid;
-    }
-    else if ( aExp < 0x3FFF ) {
-        if (aExp || aSig0) {
-            float_raise(float_flag_inexact, status);
-        }
-        return 0;
-    }
-    aSig0 |= UINT64_C(0x0001000000000000);
-    shiftCount = 0x402F - aExp;
-    savedASig = aSig0;
-    aSig0 >>= shiftCount;
-    z = aSig0;
-    if ( aSign ) z = - z;
-    if ( ( z < 0 ) ^ aSign ) {
- invalid:
-        float_raise(float_flag_invalid, status);
-        return aSign ? INT32_MIN : INT32_MAX;
-    }
-    if ( ( aSig0<<shiftCount ) != savedASig ) {
-        float_raise(float_flag_inexact, status);
-    }
-    return z;
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point
-| value `a' to the 64-bit two's complement integer format.  The conversion
-| is performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode.  If `a' is a NaN, the largest
-| positive integer is returned.  Otherwise, if the conversion overflows, the
-| largest integer with the same sign as `a' is returned.
-*----------------------------------------------------------------------------*/
-
-int64_t float128_to_int64(float128 a, float_status *status)
-{
-    bool aSign;
-    int32_t aExp, shiftCount;
-    uint64_t aSig0, aSig1;
-
-    aSig1 = extractFloat128Frac1( a );
-    aSig0 = extractFloat128Frac0( a );
-    aExp = extractFloat128Exp( a );
-    aSign = extractFloat128Sign( a );
-    if ( aExp ) aSig0 |= UINT64_C(0x0001000000000000);
-    shiftCount = 0x402F - aExp;
-    if ( shiftCount <= 0 ) {
-        if ( 0x403E < aExp ) {
-            float_raise(float_flag_invalid, status);
-            if (    ! aSign
-                 || (    ( aExp == 0x7FFF )
-                      && ( aSig1 || ( aSig0 != UINT64_C(0x0001000000000000) ) )
-                    )
-               ) {
-                return INT64_MAX;
-            }
-            return INT64_MIN;
-        }
-        shortShift128Left( aSig0, aSig1, - shiftCount, &aSig0, &aSig1 );
-    }
-    else {
-        shift64ExtraRightJamming( aSig0, aSig1, shiftCount, &aSig0, &aSig1 );
-    }
-    return roundAndPackInt64(aSign, aSig0, aSig1, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point
-| value `a' to the 64-bit two's complement integer format.  The conversion
-| is performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic, except that the conversion is always rounded toward zero.
-| If `a' is a NaN, the largest positive integer is returned.  Otherwise, if
-| the conversion overflows, the largest integer with the same sign as `a' is
-| returned.
-*----------------------------------------------------------------------------*/
-
-int64_t float128_to_int64_round_to_zero(float128 a, float_status *status)
-{
-    bool aSign;
-    int32_t aExp, shiftCount;
-    uint64_t aSig0, aSig1;
-    int64_t z;
-
-    aSig1 = extractFloat128Frac1( a );
-    aSig0 = extractFloat128Frac0( a );
-    aExp = extractFloat128Exp( a );
-    aSign = extractFloat128Sign( a );
-    if ( aExp ) aSig0 |= UINT64_C(0x0001000000000000);
-    shiftCount = aExp - 0x402F;
-    if ( 0 < shiftCount ) {
-        if ( 0x403E <= aExp ) {
-            aSig0 &= UINT64_C(0x0000FFFFFFFFFFFF);
-            if (    ( a.high == UINT64_C(0xC03E000000000000) )
-                 && ( aSig1 < UINT64_C(0x0002000000000000) ) ) {
-                if (aSig1) {
-                    float_raise(float_flag_inexact, status);
-                }
-            }
-            else {
-                float_raise(float_flag_invalid, status);
-                if ( ! aSign || ( ( aExp == 0x7FFF ) && ( aSig0 | aSig1 ) ) ) {
-                    return INT64_MAX;
-                }
-            }
-            return INT64_MIN;
-        }
-        z = ( aSig0<<shiftCount ) | ( aSig1>>( ( - shiftCount ) & 63 ) );
-        if ( (uint64_t) ( aSig1<<shiftCount ) ) {
-            float_raise(float_flag_inexact, status);
-        }
-    }
-    else {
-        if ( aExp < 0x3FFF ) {
-            if ( aExp | aSig0 | aSig1 ) {
-                float_raise(float_flag_inexact, status);
-            }
-            return 0;
-        }
-        z = aSig0>>( - shiftCount );
-        if (    aSig1
-             || ( shiftCount && (uint64_t) ( aSig0<<( shiftCount & 63 ) ) ) ) {
-            float_raise(float_flag_inexact, status);
-        }
-    }
-    if ( aSign ) z = - z;
-    return z;
-
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of converting the quadruple-precision floating-point value
 | `a' to the 64-bit unsigned integer format.  The conversion is
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index b2c4624d8c..a897a5a743 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -751,3 +751,67 @@ static void partsN(round_to_int)(FloatPartsN *a, FloatRoundMode rmode,
         g_assert_not_reached();
     }
 }
+
+/*
+ * Returns the result of converting the floating-point value `a' to
+ * the two's complement integer format. The conversion is performed
+ * according to the IEC/IEEE Standard for Binary Floating-Point
+ * Arithmetic---which means in particular that the conversion is
+ * rounded according to the current rounding mode. If `a' is a NaN,
+ * the largest positive integer is returned. Otherwise, if the
+ * conversion overflows, the largest integer with the same sign as `a'
+ * is returned.
+*/
+static int64_t partsN(float_to_sint)(FloatPartsN *p, FloatRoundMode rmode,
+                                     int scale, int64_t min, int64_t max,
+                                     float_status *s)
+{
+    int flags = 0;
+    uint64_t r;
+
+    switch (p->cls) {
+    case float_class_snan:
+    case float_class_qnan:
+        flags = float_flag_invalid;
+        r = max;
+        break;
+
+    case float_class_inf:
+        flags = float_flag_invalid;
+        r = p->sign ? min : max;
+        break;
+
+    case float_class_zero:
+        return 0;
+
+    case float_class_normal:
+        /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
+        if (parts_round_to_int_normal(p, rmode, scale, N - 2)) {
+            flags = float_flag_inexact;
+        }
+
+        if (p->exp <= DECOMPOSED_BINARY_POINT) {
+            r = p->frac_hi >> (DECOMPOSED_BINARY_POINT - p->exp);
+        } else {
+            r = UINT64_MAX;
+        }
+        if (p->sign) {
+            if (r <= -(uint64_t)min) {
+                r = -r;
+            } else {
+                flags = float_flag_invalid;
+                r = min;
+            }
+        } else if (r > max) {
+            flags = float_flag_invalid;
+            r = max;
+        }
+        break;
+
+    default:
+        g_assert_not_reached();
+    }
+
+    float_raise(flags, s);
+    return r;
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 47/72] softfloat: Move round_to_uint_and_pack to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (45 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 46/72] softfloat: Move round_to_int_and_pack " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 48/72] softfloat: Move int_to_float " Richard Henderson
                   ` (29 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Rename round_to_uint_and_pack to parts$N_float_to_uint.  Reimplement
float128_to_uint{32,64}{_round_to_zero} with FloatParts128.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 357 +++++++++-----------------------------
 fpu/softfloat-parts.c.inc |  68 +++++++-
 2 files changed, 147 insertions(+), 278 deletions(-)

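Compared with the signed helper, the interesting wrinkle in
parts$N_float_to_uint is its treatment of negative inputs: rounding
happens first, a negative value that rounds all the way to zero is
merely inexact, and only a negative value that survives rounding
raises invalid (which then replaces the inexact flag).  A sketch of
that flag flow follows, with illustrative names; it is not the QEMU
code.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum { FLAG_INEXACT = 1, FLAG_INVALID = 2 };

/*
 * Saturate a rounded value, given as (sign, integral magnitude), to
 * [0, max].  'inexact' says whether rounding discarded fraction bits.
 * A negative input that rounded to zero keeps only the inexact flag;
 * a nonzero negative input raises invalid, replacing inexact.
 */
static uint64_t clamp_uint(bool sign, uint64_t mag, bool inexact,
                           uint64_t max, int *flags)
{
    *flags = inexact ? FLAG_INEXACT : 0;
    if (mag == 0) {
        return 0;                 /* e.g. -0.25 rounded to nearest */
    }
    if (sign) {
        *flags = FLAG_INVALID;    /* note '=', not '|=' */
        return 0;
    }
    if (mag > max) {
        *flags = FLAG_INVALID;
        return max;
    }
    return mag;
}

int main(void)
{
    int f;

    /* -0.25 rounds to 0: result 0, inexact only. */
    printf("%llu flags=%d\n",
           (unsigned long long)clamp_uint(true, 0, true, UINT32_MAX, &f), f);
    /* -1.5 rounds to -2: result 0, invalid. */
    printf("%llu flags=%d\n",
           (unsigned long long)clamp_uint(true, 2, true, UINT32_MAX, &f), f);
    return 0;
}
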
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index ac8e726935..235ddda7a0 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -838,6 +838,16 @@ static int64_t parts128_float_to_sint(FloatParts128 *p, FloatRoundMode rmode,
 #define parts_float_to_sint(P, R, Z, MN, MX, S) \
     PARTS_GENERIC_64_128(float_to_sint, P)(P, R, Z, MN, MX, S)
 
+static uint64_t parts64_float_to_uint(FloatParts64 *p, FloatRoundMode rmode,
+                                      int scale, uint64_t max,
+                                      float_status *s);
+static uint64_t parts128_float_to_uint(FloatParts128 *p, FloatRoundMode rmode,
+                                       int scale, uint64_t max,
+                                       float_status *s);
+
+#define parts_float_to_uint(P, R, Z, M, S) \
+    PARTS_GENERIC_64_128(float_to_uint, P)(P, R, Z, M, S)
+
 /*
  * Helper functions for softfloat-parts.c.inc, per-size operations.
  */
@@ -2645,80 +2655,16 @@ int64_t bfloat16_to_int64_round_to_zero(bfloat16 a, float_status *s)
 }
 
 /*
- *  Returns the result of converting the floating-point value `a' to
- *  the unsigned integer format. The conversion is performed according
- *  to the IEC/IEEE Standard for Binary Floating-Point
- *  Arithmetic---which means in particular that the conversion is
- *  rounded according to the current rounding mode. If `a' is a NaN,
- *  the largest unsigned integer is returned. Otherwise, if the
- *  conversion overflows, the largest unsigned integer is returned. If
- *  the 'a' is negative, the result is rounded and zero is returned;
- *  values that do not round to zero will raise the inexact exception
- *  flag.
+ * Floating-point to unsigned integer conversions
  */
 
-static uint64_t round_to_uint_and_pack(FloatParts64 p, FloatRoundMode rmode,
-                                       int scale, uint64_t max,
-                                       float_status *s)
-{
-    int flags = 0;
-    uint64_t r;
-
-    switch (p.cls) {
-    case float_class_snan:
-    case float_class_qnan:
-        flags = float_flag_invalid;
-        r = max;
-        break;
-
-    case float_class_inf:
-        flags = float_flag_invalid;
-        r = p.sign ? 0 : max;
-        break;
-
-    case float_class_zero:
-        return 0;
-
-    case float_class_normal:
-        /* TODO: 62 = N - 2, frac_size for rounding */
-        if (parts_round_to_int_normal(&p, rmode, scale, 62)) {
-            flags = float_flag_inexact;
-            if (p.cls == float_class_zero) {
-                r = 0;
-                break;
-            }
-        }
-
-        if (p.sign) {
-            flags = float_flag_invalid;
-            r = 0;
-        } else if (p.exp > DECOMPOSED_BINARY_POINT) {
-            flags = float_flag_invalid;
-            r = max;
-        } else {
-            r = p.frac >> (DECOMPOSED_BINARY_POINT - p.exp);
-            if (r > max) {
-                flags = float_flag_invalid;
-                r = max;
-            }
-        }
-        break;
-
-    default:
-        g_assert_not_reached();
-    }
-
-    float_raise(flags, s);
-    return r;
-}
-
 uint8_t float16_to_uint8_scalbn(float16 a, FloatRoundMode rmode, int scale,
                                 float_status *s)
 {
     FloatParts64 p;
 
     float16_unpack_canonical(&p, a, s);
-    return round_to_uint_and_pack(p, rmode, scale, UINT8_MAX, s);
+    return parts_float_to_uint(&p, rmode, scale, UINT8_MAX, s);
 }
 
 uint16_t float16_to_uint16_scalbn(float16 a, FloatRoundMode rmode, int scale,
@@ -2727,7 +2673,7 @@ uint16_t float16_to_uint16_scalbn(float16 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float16_unpack_canonical(&p, a, s);
-    return round_to_uint_and_pack(p, rmode, scale, UINT16_MAX, s);
+    return parts_float_to_uint(&p, rmode, scale, UINT16_MAX, s);
 }
 
 uint32_t float16_to_uint32_scalbn(float16 a, FloatRoundMode rmode, int scale,
@@ -2736,7 +2682,7 @@ uint32_t float16_to_uint32_scalbn(float16 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float16_unpack_canonical(&p, a, s);
-    return round_to_uint_and_pack(p, rmode, scale, UINT32_MAX, s);
+    return parts_float_to_uint(&p, rmode, scale, UINT32_MAX, s);
 }
 
 uint64_t float16_to_uint64_scalbn(float16 a, FloatRoundMode rmode, int scale,
@@ -2745,7 +2691,7 @@ uint64_t float16_to_uint64_scalbn(float16 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float16_unpack_canonical(&p, a, s);
-    return round_to_uint_and_pack(p, rmode, scale, UINT64_MAX, s);
+    return parts_float_to_uint(&p, rmode, scale, UINT64_MAX, s);
 }
 
 uint16_t float32_to_uint16_scalbn(float32 a, FloatRoundMode rmode, int scale,
@@ -2754,7 +2700,7 @@ uint16_t float32_to_uint16_scalbn(float32 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float32_unpack_canonical(&p, a, s);
-    return round_to_uint_and_pack(p, rmode, scale, UINT16_MAX, s);
+    return parts_float_to_uint(&p, rmode, scale, UINT16_MAX, s);
 }
 
 uint32_t float32_to_uint32_scalbn(float32 a, FloatRoundMode rmode, int scale,
@@ -2763,7 +2709,7 @@ uint32_t float32_to_uint32_scalbn(float32 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float32_unpack_canonical(&p, a, s);
-    return round_to_uint_and_pack(p, rmode, scale, UINT32_MAX, s);
+    return parts_float_to_uint(&p, rmode, scale, UINT32_MAX, s);
 }
 
 uint64_t float32_to_uint64_scalbn(float32 a, FloatRoundMode rmode, int scale,
@@ -2772,7 +2718,7 @@ uint64_t float32_to_uint64_scalbn(float32 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float32_unpack_canonical(&p, a, s);
-    return round_to_uint_and_pack(p, rmode, scale, UINT64_MAX, s);
+    return parts_float_to_uint(&p, rmode, scale, UINT64_MAX, s);
 }
 
 uint16_t float64_to_uint16_scalbn(float64 a, FloatRoundMode rmode, int scale,
@@ -2781,7 +2727,7 @@ uint16_t float64_to_uint16_scalbn(float64 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float64_unpack_canonical(&p, a, s);
-    return round_to_uint_and_pack(p, rmode, scale, UINT16_MAX, s);
+    return parts_float_to_uint(&p, rmode, scale, UINT16_MAX, s);
 }
 
 uint32_t float64_to_uint32_scalbn(float64 a, FloatRoundMode rmode, int scale,
@@ -2790,7 +2736,7 @@ uint32_t float64_to_uint32_scalbn(float64 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float64_unpack_canonical(&p, a, s);
-    return round_to_uint_and_pack(p, rmode, scale, UINT32_MAX, s);
+    return parts_float_to_uint(&p, rmode, scale, UINT32_MAX, s);
 }
 
 uint64_t float64_to_uint64_scalbn(float64 a, FloatRoundMode rmode, int scale,
@@ -2799,7 +2745,52 @@ uint64_t float64_to_uint64_scalbn(float64 a, FloatRoundMode rmode, int scale,
     FloatParts64 p;
 
     float64_unpack_canonical(&p, a, s);
-    return round_to_uint_and_pack(p, rmode, scale, UINT64_MAX, s);
+    return parts_float_to_uint(&p, rmode, scale, UINT64_MAX, s);
+}
+
+uint16_t bfloat16_to_uint16_scalbn(bfloat16 a, FloatRoundMode rmode,
+                                   int scale, float_status *s)
+{
+    FloatParts64 p;
+
+    bfloat16_unpack_canonical(&p, a, s);
+    return parts_float_to_uint(&p, rmode, scale, UINT16_MAX, s);
+}
+
+uint32_t bfloat16_to_uint32_scalbn(bfloat16 a, FloatRoundMode rmode,
+                                   int scale, float_status *s)
+{
+    FloatParts64 p;
+
+    bfloat16_unpack_canonical(&p, a, s);
+    return parts_float_to_uint(&p, rmode, scale, UINT32_MAX, s);
+}
+
+uint64_t bfloat16_to_uint64_scalbn(bfloat16 a, FloatRoundMode rmode,
+                                   int scale, float_status *s)
+{
+    FloatParts64 p;
+
+    bfloat16_unpack_canonical(&p, a, s);
+    return parts_float_to_uint(&p, rmode, scale, UINT64_MAX, s);
+}
+
+static uint32_t float128_to_uint32_scalbn(float128 a, FloatRoundMode rmode,
+                                          int scale, float_status *s)
+{
+    FloatParts128 p;
+
+    float128_unpack_canonical(&p, a, s);
+    return parts_float_to_uint(&p, rmode, scale, UINT32_MAX, s);
+}
+
+static uint64_t float128_to_uint64_scalbn(float128 a, FloatRoundMode rmode,
+                                          int scale, float_status *s)
+{
+    FloatParts128 p;
+
+    float128_unpack_canonical(&p, a, s);
+    return parts_float_to_uint(&p, rmode, scale, UINT64_MAX, s);
 }
 
 uint8_t float16_to_uint8(float16 a, float_status *s)
@@ -2852,6 +2843,16 @@ uint64_t float64_to_uint64(float64 a, float_status *s)
     return float64_to_uint64_scalbn(a, s->float_rounding_mode, 0, s);
 }
 
+uint32_t float128_to_uint32(float128 a, float_status *s)
+{
+    return float128_to_uint32_scalbn(a, s->float_rounding_mode, 0, s);
+}
+
+uint64_t float128_to_uint64(float128 a, float_status *s)
+{
+    return float128_to_uint64_scalbn(a, s->float_rounding_mode, 0, s);
+}
+
 uint16_t float16_to_uint16_round_to_zero(float16 a, float_status *s)
 {
     return float16_to_uint16_scalbn(a, float_round_to_zero, 0, s);
@@ -2897,36 +2898,14 @@ uint64_t float64_to_uint64_round_to_zero(float64 a, float_status *s)
     return float64_to_uint64_scalbn(a, float_round_to_zero, 0, s);
 }
 
-/*
- *  Returns the result of converting the bfloat16 value `a' to
- *  the unsigned integer format.
- */
-
-uint16_t bfloat16_to_uint16_scalbn(bfloat16 a, FloatRoundMode rmode,
-                                   int scale, float_status *s)
+uint32_t float128_to_uint32_round_to_zero(float128 a, float_status *s)
 {
-    FloatParts64 p;
-
-    bfloat16_unpack_canonical(&p, a, s);
-    return round_to_uint_and_pack(p, rmode, scale, UINT16_MAX, s);
+    return float128_to_uint32_scalbn(a, float_round_to_zero, 0, s);
 }
 
-uint32_t bfloat16_to_uint32_scalbn(bfloat16 a, FloatRoundMode rmode,
-                                   int scale, float_status *s)
+uint64_t float128_to_uint64_round_to_zero(float128 a, float_status *s)
 {
-    FloatParts64 p;
-
-    bfloat16_unpack_canonical(&p, a, s);
-    return round_to_uint_and_pack(p, rmode, scale, UINT32_MAX, s);
-}
-
-uint64_t bfloat16_to_uint64_scalbn(bfloat16 a, FloatRoundMode rmode,
-                                   int scale, float_status *s)
-{
-    FloatParts64 p;
-
-    bfloat16_unpack_canonical(&p, a, s);
-    return round_to_uint_and_pack(p, rmode, scale, UINT64_MAX, s);
+    return float128_to_uint64_scalbn(a, float_round_to_zero, 0, s);
 }
 
 uint16_t bfloat16_to_uint16(bfloat16 a, float_status *s)
@@ -4122,66 +4101,6 @@ static int64_t roundAndPackInt64(bool zSign, uint64_t absZ0, uint64_t absZ1,
 
 }
 
-/*----------------------------------------------------------------------------
-| Takes the 128-bit fixed-point value formed by concatenating `absZ0' and
-| `absZ1', with binary point between bits 63 and 64 (between the input words),
-| and returns the properly rounded 64-bit unsigned integer corresponding to the
-| input.  Ordinarily, the fixed-point input is simply rounded to an integer,
-| with the inexact exception raised if the input cannot be represented exactly
-| as an integer.  However, if the fixed-point input is too large, the invalid
-| exception is raised and the largest unsigned integer is returned.
-*----------------------------------------------------------------------------*/
-
-static int64_t roundAndPackUint64(bool zSign, uint64_t absZ0,
-                                uint64_t absZ1, float_status *status)
-{
-    int8_t roundingMode;
-    bool roundNearestEven, increment;
-
-    roundingMode = status->float_rounding_mode;
-    roundNearestEven = (roundingMode == float_round_nearest_even);
-    switch (roundingMode) {
-    case float_round_nearest_even:
-    case float_round_ties_away:
-        increment = ((int64_t)absZ1 < 0);
-        break;
-    case float_round_to_zero:
-        increment = 0;
-        break;
-    case float_round_up:
-        increment = !zSign && absZ1;
-        break;
-    case float_round_down:
-        increment = zSign && absZ1;
-        break;
-    case float_round_to_odd:
-        increment = !(absZ0 & 1) && absZ1;
-        break;
-    default:
-        abort();
-    }
-    if (increment) {
-        ++absZ0;
-        if (absZ0 == 0) {
-            float_raise(float_flag_invalid, status);
-            return UINT64_MAX;
-        }
-        if (!(absZ1 << 1) && roundNearestEven) {
-            absZ0 &= ~1;
-        }
-    }
-
-    if (zSign && absZ0) {
-        float_raise(float_flag_invalid, status);
-        return 0;
-    }
-
-    if (absZ1) {
-        float_raise(float_flag_inexact, status);
-    }
-    return absZ0;
-}
-
 /*----------------------------------------------------------------------------
 | Normalizes the subnormal single-precision floating-point value represented
 | by the denormalized significand `aSig'.  The normalized exponent and
@@ -6535,122 +6454,6 @@ floatx80 floatx80_sqrt(floatx80 a, float_status *status)
                                 0, zExp, zSig0, zSig1, status);
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point value
-| `a' to the 64-bit unsigned integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode.  If `a' is a NaN, the largest
-| positive integer is returned.  If the conversion overflows, the
-| largest unsigned integer is returned.  If 'a' is negative, the value is
-| rounded and zero is returned; negative values that do not round to zero
-| will raise the inexact exception.
-*----------------------------------------------------------------------------*/
-
-uint64_t float128_to_uint64(float128 a, float_status *status)
-{
-    bool aSign;
-    int aExp;
-    int shiftCount;
-    uint64_t aSig0, aSig1;
-
-    aSig0 = extractFloat128Frac0(a);
-    aSig1 = extractFloat128Frac1(a);
-    aExp = extractFloat128Exp(a);
-    aSign = extractFloat128Sign(a);
-    if (aSign && (aExp > 0x3FFE)) {
-        float_raise(float_flag_invalid, status);
-        if (float128_is_any_nan(a)) {
-            return UINT64_MAX;
-        } else {
-            return 0;
-        }
-    }
-    if (aExp) {
-        aSig0 |= UINT64_C(0x0001000000000000);
-    }
-    shiftCount = 0x402F - aExp;
-    if (shiftCount <= 0) {
-        if (0x403E < aExp) {
-            float_raise(float_flag_invalid, status);
-            return UINT64_MAX;
-        }
-        shortShift128Left(aSig0, aSig1, -shiftCount, &aSig0, &aSig1);
-    } else {
-        shift64ExtraRightJamming(aSig0, aSig1, shiftCount, &aSig0, &aSig1);
-    }
-    return roundAndPackUint64(aSign, aSig0, aSig1, status);
-}
-
-uint64_t float128_to_uint64_round_to_zero(float128 a, float_status *status)
-{
-    uint64_t v;
-    signed char current_rounding_mode = status->float_rounding_mode;
-
-    set_float_rounding_mode(float_round_to_zero, status);
-    v = float128_to_uint64(a, status);
-    set_float_rounding_mode(current_rounding_mode, status);
-
-    return v;
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point
-| value `a' to the 32-bit unsigned integer format.  The conversion
-| is performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic except that the conversion is always rounded toward zero.
-| If `a' is a NaN, the largest positive integer is returned.  Otherwise,
-| if the conversion overflows, the largest unsigned integer is returned.
-| If 'a' is negative, the value is rounded and zero is returned; negative
-| values that do not round to zero will raise the inexact exception.
-*----------------------------------------------------------------------------*/
-
-uint32_t float128_to_uint32_round_to_zero(float128 a, float_status *status)
-{
-    uint64_t v;
-    uint32_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float128_to_uint64_round_to_zero(a, status);
-    if (v > 0xffffffff) {
-        res = 0xffffffff;
-    } else {
-        return v;
-    }
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point value
-| `a' to the 32-bit unsigned integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode.  If `a' is a NaN, the largest
-| positive integer is returned.  If the conversion overflows, the
-| largest unsigned integer is returned.  If 'a' is negative, the value is
-| rounded and zero is returned; negative values that do not round to zero
-| will raise the inexact exception.
-*----------------------------------------------------------------------------*/
-
-uint32_t float128_to_uint32(float128 a, float_status *status)
-{
-    uint64_t v;
-    uint32_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float128_to_uint64(a, status);
-    if (v > 0xffffffff) {
-        res = 0xffffffff;
-    } else {
-        return v;
-    }
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of converting the quadruple-precision floating-point
 | value `a' to the extended double-precision floating-point format.  The
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index a897a5a743..c6e327547f 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -761,7 +761,7 @@ static void partsN(round_to_int)(FloatPartsN *a, FloatRoundMode rmode,
  * the largest positive integer is returned. Otherwise, if the
  * conversion overflows, the largest integer with the same sign as `a'
  * is returned.
-*/
+ */
 static int64_t partsN(float_to_sint)(FloatPartsN *p, FloatRoundMode rmode,
                                      int scale, int64_t min, int64_t max,
                                      float_status *s)
@@ -815,3 +815,69 @@ static int64_t partsN(float_to_sint)(FloatPartsN *p, FloatRoundMode rmode,
     float_raise(flags, s);
     return r;
 }
+
+/*
+ *  Returns the result of converting the floating-point value `a' to
+ *  the unsigned integer format. The conversion is performed according
+ *  to the IEC/IEEE Standard for Binary Floating-Point
+ *  Arithmetic---which means in particular that the conversion is
+ *  rounded according to the current rounding mode. If `a' is a NaN,
+ *  the largest unsigned integer is returned. Otherwise, if the
+ *  conversion overflows, the largest unsigned integer is returned. If
+ *  'a' is negative, the result is rounded and zero is returned;
+ *  negative values that do not round to zero will raise the invalid
+ *  exception flag.
+ */
+static uint64_t partsN(float_to_uint)(FloatPartsN *p, FloatRoundMode rmode,
+                                      int scale, uint64_t max, float_status *s)
+{
+    int flags = 0;
+    uint64_t r;
+
+    switch (p->cls) {
+    case float_class_snan:
+    case float_class_qnan:
+        flags = float_flag_invalid;
+        r = max;
+        break;
+
+    case float_class_inf:
+        flags = float_flag_invalid;
+        r = p->sign ? 0 : max;
+        break;
+
+    case float_class_zero:
+        return 0;
+
+    case float_class_normal:
+        /* TODO: N - 2 is frac_size for rounding; could use input fmt. */
+        if (parts_round_to_int_normal(p, rmode, scale, N - 2)) {
+            flags = float_flag_inexact;
+            if (p->cls == float_class_zero) {
+                r = 0;
+                break;
+            }
+        }
+
+        if (p->sign) {
+            flags = float_flag_invalid;
+            r = 0;
+        } else if (p->exp > DECOMPOSED_BINARY_POINT) {
+            flags = float_flag_invalid;
+            r = max;
+        } else {
+            r = p->frac_hi >> (DECOMPOSED_BINARY_POINT - p->exp);
+            if (r > max) {
+                flags = float_flag_invalid;
+                r = max;
+            }
+        }
+        break;
+
+    default:
+        g_assert_not_reached();
+    }
+
+    float_raise(flags, s);
+    return r;
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 48/72] softfloat: Move int_to_float to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (46 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 47/72] softfloat: Move rount_to_uint_and_pack " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 49/72] softfloat: Move uint_to_float " Richard Henderson
                   ` (28 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Rename to parts$N_sint_to_float.
Reimplement int{32,64}_to_float128 with FloatParts128.
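
For reference, a standalone sketch of the decomposition this performs
(illustrative only: standalone names, and __builtin_clzll in place of
QEMU's clz64; in-tree the result lands in FloatPartsN.frac_hi, with the
radix point at bit 63):

    #include <stdbool.h>
    #include <stdint.h>

    /* Decompose a non-zero int64_t as sign/exp/frac, with the
     * radix point at bit 63 of frac.
     */
    static void sketch_sint_decompose(int64_t a, bool *sign,
                                      int32_t *exp, uint64_t *frac)
    {
        uint64_t f = a;
        int shift;

        *sign = a < 0;
        if (a < 0) {
            f = -f;                  /* magnitude; well-defined on uint64_t */
        }
        shift = __builtin_clzll(f);  /* a != 0 assumed; clz(0) is undefined */
        *exp = 63 - shift;
        *frac = f << shift;          /* integer bit lands in the msb */
    }

E.g. a = 6 gives shift = 61, frac = 0xc000000000000000, exp = 2,
i.e. 1.5 * 2**2.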

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 136 +++++++++++---------------------------
 fpu/softfloat-parts.c.inc |  32 +++++++++
 2 files changed, 70 insertions(+), 98 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 235ddda7a0..d843ea67c4 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -848,6 +848,14 @@ static uint64_t parts128_float_to_uint(FloatParts128 *p, FloatRoundMode rmode,
 #define parts_float_to_uint(P, R, Z, M, S) \
     PARTS_GENERIC_64_128(float_to_uint, P)(P, R, Z, M, S)
 
+static void parts64_sint_to_float(FloatParts64 *p, int64_t a,
+                                  int scale, float_status *s);
+static void parts128_sint_to_float(FloatParts128 *p, int64_t a,
+                                   int scale, float_status *s);
+
+#define parts_sint_to_float(P, I, Z, S) \
+    PARTS_GENERIC_64_128(sint_to_float, P)(P, I, Z, S)
+
 /*
  * Helper functions for softfloat-parts.c.inc, per-size operations.
  */
@@ -2939,42 +2947,15 @@ uint64_t bfloat16_to_uint64_round_to_zero(bfloat16 a, float_status *s)
 }
 
 /*
- * Integer to float conversions
- *
- * Returns the result of converting the two's complement integer `a'
- * to the floating-point format. The conversion is performed according
- * to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+ * Signed integer to floating-point conversions
  */
 
-static FloatParts64 int_to_float(int64_t a, int scale, float_status *status)
-{
-    FloatParts64 r = { .sign = false };
-
-    if (a == 0) {
-        r.cls = float_class_zero;
-    } else {
-        uint64_t f = a;
-        int shift;
-
-        r.cls = float_class_normal;
-        if (a < 0) {
-            f = -f;
-            r.sign = true;
-        }
-        shift = clz64(f);
-        scale = MIN(MAX(scale, -0x10000), 0x10000);
-
-        r.exp = DECOMPOSED_BINARY_POINT - shift + scale;
-        r.frac = f << shift;
-    }
-
-    return r;
-}
-
 float16 int64_to_float16_scalbn(int64_t a, int scale, float_status *status)
 {
-    FloatParts64 pa = int_to_float(a, scale, status);
-    return float16_round_pack_canonical(&pa, status);
+    FloatParts64 p;
+
+    parts_sint_to_float(&p, a, scale, status);
+    return float16_round_pack_canonical(&p, status);
 }
 
 float16 int32_to_float16_scalbn(int32_t a, int scale, float_status *status)
@@ -3009,8 +2990,10 @@ float16 int8_to_float16(int8_t a, float_status *status)
 
 float32 int64_to_float32_scalbn(int64_t a, int scale, float_status *status)
 {
-    FloatParts64 pa = int_to_float(a, scale, status);
-    return float32_round_pack_canonical(&pa, status);
+    FloatParts64 p;
+
+    parts64_sint_to_float(&p, a, scale, status);
+    return float32_round_pack_canonical(&p, status);
 }
 
 float32 int32_to_float32_scalbn(int32_t a, int scale, float_status *status)
@@ -3040,8 +3023,10 @@ float32 int16_to_float32(int16_t a, float_status *status)
 
 float64 int64_to_float64_scalbn(int64_t a, int scale, float_status *status)
 {
-    FloatParts64 pa = int_to_float(a, scale, status);
-    return float64_round_pack_canonical(&pa, status);
+    FloatParts64 p;
+
+    parts_sint_to_float(&p, a, scale, status);
+    return float64_round_pack_canonical(&p, status);
 }
 
 float64 int32_to_float64_scalbn(int32_t a, int scale, float_status *status)
@@ -3069,15 +3054,12 @@ float64 int16_to_float64(int16_t a, float_status *status)
     return int64_to_float64_scalbn(a, 0, status);
 }
 
-/*
- * Returns the result of converting the two's complement integer `a'
- * to the bfloat16 format.
- */
-
 bfloat16 int64_to_bfloat16_scalbn(int64_t a, int scale, float_status *status)
 {
-    FloatParts64 pa = int_to_float(a, scale, status);
-    return bfloat16_round_pack_canonical(&pa, status);
+    FloatParts64 p;
+
+    parts_sint_to_float(&p, a, scale, status);
+    return bfloat16_round_pack_canonical(&p, status);
 }
 
 bfloat16 int32_to_bfloat16_scalbn(int32_t a, int scale, float_status *status)
@@ -3105,6 +3087,19 @@ bfloat16 int16_to_bfloat16(int16_t a, float_status *status)
     return int64_to_bfloat16_scalbn(a, 0, status);
 }
 
+float128 int64_to_float128(int64_t a, float_status *status)
+{
+    FloatParts128 p;
+
+    parts_sint_to_float(&p, a, 0, status);
+    return float128_round_pack_canonical(&p, status);
+}
+
+float128 int32_to_float128(int32_t a, float_status *status)
+{
+    return int64_to_float128(a, status);
+}
+
 /*
  * Unsigned Integer to float conversions
  *
@@ -4955,28 +4950,6 @@ floatx80 int32_to_floatx80(int32_t a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the 32-bit two's complement integer `a' to
-| the quadruple-precision floating-point format.  The conversion is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float128 int32_to_float128(int32_t a, float_status *status)
-{
-    bool zSign;
-    uint32_t absA;
-    int8_t shiftCount;
-    uint64_t zSig0;
-
-    if ( a == 0 ) return packFloat128( 0, 0, 0, 0 );
-    zSign = ( a < 0 );
-    absA = zSign ? - a : a;
-    shiftCount = clz32(absA) + 17;
-    zSig0 = absA;
-    return packFloat128( zSign, 0x402E - shiftCount, zSig0<<shiftCount, 0 );
-
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of converting the 64-bit two's complement integer `a'
 | to the extended double-precision floating-point format.  The conversion
@@ -4998,39 +4971,6 @@ floatx80 int64_to_floatx80(int64_t a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the 64-bit two's complement integer `a' to
-| the quadruple-precision floating-point format.  The conversion is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float128 int64_to_float128(int64_t a, float_status *status)
-{
-    bool zSign;
-    uint64_t absA;
-    int8_t shiftCount;
-    int32_t zExp;
-    uint64_t zSig0, zSig1;
-
-    if ( a == 0 ) return packFloat128( 0, 0, 0, 0 );
-    zSign = ( a < 0 );
-    absA = zSign ? - a : a;
-    shiftCount = clz64(absA) + 49;
-    zExp = 0x406E - shiftCount;
-    if ( 64 <= shiftCount ) {
-        zSig1 = 0;
-        zSig0 = absA;
-        shiftCount -= 64;
-    }
-    else {
-        zSig1 = absA;
-        zSig0 = 0;
-    }
-    shortShift128Left( zSig0, zSig1, shiftCount, &zSig0, &zSig1 );
-    return packFloat128( zSign, zExp, zSig0, zSig1 );
-
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of converting the 64-bit unsigned integer `a'
 | to the quadruple-precision floating-point format.  The conversion is performed
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index c6e327547f..8102de1307 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -881,3 +881,35 @@ static uint64_t partsN(float_to_uint)(FloatPartsN *p, FloatRoundMode rmode,
     float_raise(flags, s);
     return r;
 }
+
+/*
+ * Integer to float conversions
+ *
+ * Returns the result of converting the two's complement integer `a'
+ * to the floating-point format. The conversion is performed according
+ * to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+ */
+static void partsN(sint_to_float)(FloatPartsN *p, int64_t a,
+                                  int scale, float_status *s)
+{
+    uint64_t f = a;
+    int shift;
+
+    memset(p, 0, sizeof(*p));
+
+    if (a == 0) {
+        p->cls = float_class_zero;
+        return;
+    }
+
+    p->cls = float_class_normal;
+    if (a < 0) {
+        f = -f;
+        p->sign = true;
+    }
+    shift = clz64(f);
+    scale = MIN(MAX(scale, -0x10000), 0x10000);
+
+    p->exp = DECOMPOSED_BINARY_POINT - shift + scale;
+    p->frac_hi = f << shift;
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 49/72] softfloat: Move uint_to_float to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (47 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 48/72] softfloat: Move int_to_float " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 50/72] softfloat: Move minmax_flags " Richard Henderson
                   ` (27 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Rename to parts$N_uint_to_float.
Reimplement uint64_to_float128 with FloatParts128.
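
For reviewers tracking the dispatch, parts_uint_to_float picks the 64-
or 128-bit worker from the type of the parts pointer; a cut-down model
of the PARTS_GENERIC_64_128 pattern using plain C11 _Generic
(illustrative; the in-tree macro is built on QEMU's generic-selection
helpers, not shown here):

    #include <stdint.h>

    typedef struct { uint64_t frac_hi; } P64;             /* stand-ins */
    typedef struct { uint64_t frac_hi, frac_lo; } P128;

    static void p64_uint_to_float(P64 *p, uint64_t a)   { (void)p; (void)a; }
    static void p128_uint_to_float(P128 *p, uint64_t a) { (void)p; (void)a; }

    #define sketch_uint_to_float(P, A)          \
        _Generic(*(P),                          \
                 P64:  p64_uint_to_float,       \
                 P128: p128_uint_to_float)(P, A)

This is why uint64_to_float128 below needs nothing more than declaring
a FloatParts128: the same macro text routes to the 128-bit worker.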

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 83 ++++++++++++++++-----------------------
 fpu/softfloat-parts.c.inc | 23 +++++++++++
 2 files changed, 56 insertions(+), 50 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index d843ea67c4..586ea5d67a 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -856,6 +856,14 @@ static void parts128_sint_to_float(FloatParts128 *p, int64_t a,
 #define parts_sint_to_float(P, I, Z, S) \
     PARTS_GENERIC_64_128(sint_to_float, P)(P, I, Z, S)
 
+static void parts64_uint_to_float(FloatParts64 *p, uint64_t a,
+                                  int scale, float_status *s);
+static void parts128_uint_to_float(FloatParts128 *p, uint64_t a,
+                                   int scale, float_status *s);
+
+#define parts_uint_to_float(P, I, Z, S) \
+    PARTS_GENERIC_64_128(uint_to_float, P)(P, I, Z, S)
+
 /*
  * Helper functions for softfloat-parts.c.inc, per-size operations.
  */
@@ -3101,35 +3109,15 @@ float128 int32_to_float128(int32_t a, float_status *status)
 }
 
 /*
- * Unsigned Integer to float conversions
- *
- * Returns the result of converting the unsigned integer `a' to the
- * floating-point format. The conversion is performed according to the
- * IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+ * Unsigned integer to floating-point conversions
  */
 
-static FloatParts64 uint_to_float(uint64_t a, int scale, float_status *status)
-{
-    FloatParts64 r = { .sign = false };
-    int shift;
-
-    if (a == 0) {
-        r.cls = float_class_zero;
-    } else {
-        scale = MIN(MAX(scale, -0x10000), 0x10000);
-        shift = clz64(a);
-        r.cls = float_class_normal;
-        r.exp = DECOMPOSED_BINARY_POINT - shift + scale;
-        r.frac = a << shift;
-    }
-
-    return r;
-}
-
 float16 uint64_to_float16_scalbn(uint64_t a, int scale, float_status *status)
 {
-    FloatParts64 pa = uint_to_float(a, scale, status);
-    return float16_round_pack_canonical(&pa, status);
+    FloatParts64 p;
+
+    parts_uint_to_float(&p, a, scale, status);
+    return float16_round_pack_canonical(&p, status);
 }
 
 float16 uint32_to_float16_scalbn(uint32_t a, int scale, float_status *status)
@@ -3164,8 +3152,10 @@ float16 uint8_to_float16(uint8_t a, float_status *status)
 
 float32 uint64_to_float32_scalbn(uint64_t a, int scale, float_status *status)
 {
-    FloatParts64 pa = uint_to_float(a, scale, status);
-    return float32_round_pack_canonical(&pa, status);
+    FloatParts64 p;
+
+    parts_uint_to_float(&p, a, scale, status);
+    return float32_round_pack_canonical(&p, status);
 }
 
 float32 uint32_to_float32_scalbn(uint32_t a, int scale, float_status *status)
@@ -3195,8 +3185,10 @@ float32 uint16_to_float32(uint16_t a, float_status *status)
 
 float64 uint64_to_float64_scalbn(uint64_t a, int scale, float_status *status)
 {
-    FloatParts64 pa = uint_to_float(a, scale, status);
-    return float64_round_pack_canonical(&pa, status);
+    FloatParts64 p;
+
+    parts_uint_to_float(&p, a, scale, status);
+    return float64_round_pack_canonical(&p, status);
 }
 
 float64 uint32_to_float64_scalbn(uint32_t a, int scale, float_status *status)
@@ -3224,15 +3216,12 @@ float64 uint16_to_float64(uint16_t a, float_status *status)
     return uint64_to_float64_scalbn(a, 0, status);
 }
 
-/*
- * Returns the result of converting the unsigned integer `a' to the
- * bfloat16 format.
- */
-
 bfloat16 uint64_to_bfloat16_scalbn(uint64_t a, int scale, float_status *status)
 {
-    FloatParts64 pa = uint_to_float(a, scale, status);
-    return bfloat16_round_pack_canonical(&pa, status);
+    FloatParts64 p;
+
+    parts_uint_to_float(&p, a, scale, status);
+    return bfloat16_round_pack_canonical(&p, status);
 }
 
 bfloat16 uint32_to_bfloat16_scalbn(uint32_t a, int scale, float_status *status)
@@ -3260,6 +3249,14 @@ bfloat16 uint16_to_bfloat16(uint16_t a, float_status *status)
     return uint64_to_bfloat16_scalbn(a, 0, status);
 }
 
+float128 uint64_to_float128(uint64_t a, float_status *status)
+{
+    FloatParts128 p;
+
+    parts_uint_to_float(&p, a, 0, status);
+    return float128_round_pack_canonical(&p, status);
+}
+
 /* Float Min/Max */
 /* min() and max() functions. These can't be implemented as
  * 'compare and pick one input' because that would mishandle
@@ -4971,20 +4968,6 @@ floatx80 int64_to_floatx80(int64_t a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the 64-bit unsigned integer `a'
-| to the quadruple-precision floating-point format.  The conversion is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float128 uint64_to_float128(uint64_t a, float_status *status)
-{
-    if (a == 0) {
-        return float128_zero;
-    }
-    return normalizeRoundAndPackFloat128(0, 0x406E, 0, a, status);
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of converting the single-precision floating-point value
 | `a' to the extended double-precision floating-point format.  The conversion
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 8102de1307..f3c4f8c8d2 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -913,3 +913,26 @@ static void partsN(sint_to_float)(FloatPartsN *p, int64_t a,
     p->exp = DECOMPOSED_BINARY_POINT - shift + scale;
     p->frac_hi = f << shift;
 }
+
+/*
+ * Unsigned integer to float conversions
+ *
+ * Returns the result of converting the unsigned integer `a' to the
+ * floating-point format. The conversion is performed according to the
+ * IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+ */
+static void partsN(uint_to_float)(FloatPartsN *p, uint64_t a,
+                                  int scale, float_status *status)
+{
+    memset(p, 0, sizeof(*p));
+
+    if (a == 0) {
+        p->cls = float_class_zero;
+    } else {
+        int shift = clz64(a);
+        scale = MIN(MAX(scale, -0x10000), 0x10000);
+        p->cls = float_class_normal;
+        p->exp = DECOMPOSED_BINARY_POINT - shift + scale;
+        p->frac_hi = a << shift;
+    }
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 50/72] softfloat: Move minmax_flags to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (48 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 49/72] softfloat: Move uint_to_float " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-17 13:14   ` David Hildenbrand
  2021-05-08  1:47 ` [PATCH 51/72] softfloat: Move compare_floats " Richard Henderson
                   ` (26 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Rename to parts$N_minmax.  Combine the 3 bool arguments into a bitmask,
and return a tri-state value to indicate NaN vs unchanged operand.
Introduce ftype_minmax functions as a common optimization point.
Fold the bfloat16 expansions into the same macro as the other types.
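
As a reference for the new encoding, the flag combinations that the
MINMAX_2 macro instantiates, and the tri-state contract as used by the
float32 helper (condensed from the diff below):

    /* operation    flags
     * max          0
     * maxnum       minmax_isnum
     * maxnummag    minmax_ismag            (ismag implies isnum)
     * min          minmax_ismin
     * minnum       minmax_ismin | minmax_isnum
     * minnummag    minmax_ismin | minmax_ismag
     */
    which = parts_minmax(&pa, &pb, s, flags, &float32_params);
    if (which < 0) {
        /* NaN involved: pa now holds the chosen/silenced NaN */
        return float32_round_pack_canonical(&pa, s);
    }
    return which ? b : a;   /* 0: a unchanged, 1: b unchanged */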

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 216 ++++++++++++++++----------------------
 fpu/softfloat-parts.c.inc |  69 ++++++++++++
 2 files changed, 158 insertions(+), 127 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 586ea5d67a..4c04e88a3a 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -482,6 +482,15 @@ enum {
     float_cmask_anynan  = float_cmask_qnan | float_cmask_snan,
 };
 
+/* Flags for parts_minmax. */
+enum {
+    /* Set for minimum; clear for maximum. */
+    minmax_ismin = 1,
+    /* Set for the IEEE 754-2008 minNum() and maxNum() operations. */
+    minmax_isnum = 2,
+    /* Set for the IEEE 754-2008 minNumMag() and maxNumMag() operations. */
+    minmax_ismag = 4 | minmax_isnum
+};
 
 /* Simple helpers for checking if, or what kind of, NaN we have */
 static inline __attribute__((unused)) bool is_nan(FloatClass c)
@@ -864,6 +873,14 @@ static void parts128_uint_to_float(FloatParts128 *p, uint64_t a,
 #define parts_uint_to_float(P, I, Z, S) \
     PARTS_GENERIC_64_128(uint_to_float, P)(P, I, Z, S)
 
+static int parts64_minmax(FloatParts64 *a, FloatParts64 *b,
+                          float_status *s, int flags, const FloatFmt *fmt);
+static int parts128_minmax(FloatParts128 *a, FloatParts128 *b,
+                           float_status *s, int flags, const FloatFmt *fmt);
+
+#define parts_minmax(A, B, S, Z, F) \
+    PARTS_GENERIC_64_128(minmax, A)(A, B, S, Z, F)
+
 /*
  * Helper functions for softfloat-parts.c.inc, per-size operations.
  */
@@ -3257,145 +3274,90 @@ float128 uint64_to_float128(uint64_t a, float_status *status)
     return float128_round_pack_canonical(&p, status);
 }
 
-/* Float Min/Max */
-/* min() and max() functions. These can't be implemented as
- * 'compare and pick one input' because that would mishandle
- * NaNs and +0 vs -0.
- *
- * minnum() and maxnum() functions. These are similar to the min()
- * and max() functions but if one of the arguments is a QNaN and
- * the other is numerical then the numerical argument is returned.
- * SNaNs will get quietened before being returned.
- * minnum() and maxnum correspond to the IEEE 754-2008 minNum()
- * and maxNum() operations. min() and max() are the typical min/max
- * semantics provided by many CPUs which predate that specification.
- *
- * minnummag() and maxnummag() functions correspond to minNumMag()
- * and minNumMag() from the IEEE-754 2008.
+/*
+ * Minimum and maximum
  */
-static FloatParts64 minmax_floats(FloatParts64 a, FloatParts64 b, bool ismin,
-                                bool ieee, bool ismag, float_status *s)
+
+static float16 float16_minmax(float16 a, float16 b, float_status *s, int flags)
 {
-    if (unlikely(is_nan(a.cls) || is_nan(b.cls))) {
-        if (ieee) {
-            /* Takes two floating-point values `a' and `b', one of
-             * which is a NaN, and returns the appropriate NaN
-             * result. If either `a' or `b' is a signaling NaN,
-             * the invalid exception is raised.
-             */
-            if (is_snan(a.cls) || is_snan(b.cls)) {
-                return *parts_pick_nan(&a, &b, s);
-            } else if (is_nan(a.cls) && !is_nan(b.cls)) {
-                return b;
-            } else if (is_nan(b.cls) && !is_nan(a.cls)) {
-                return a;
-            }
-        }
-        return *parts_pick_nan(&a, &b, s);
-    } else {
-        int a_exp, b_exp;
+    FloatParts64 pa, pb;
+    int which;
 
-        switch (a.cls) {
-        case float_class_normal:
-            a_exp = a.exp;
-            break;
-        case float_class_inf:
-            a_exp = INT_MAX;
-            break;
-        case float_class_zero:
-            a_exp = INT_MIN;
-            break;
-        default:
-            g_assert_not_reached();
-            break;
-        }
-        switch (b.cls) {
-        case float_class_normal:
-            b_exp = b.exp;
-            break;
-        case float_class_inf:
-            b_exp = INT_MAX;
-            break;
-        case float_class_zero:
-            b_exp = INT_MIN;
-            break;
-        default:
-            g_assert_not_reached();
-            break;
-        }
-
-        if (ismag && (a_exp != b_exp || a.frac != b.frac)) {
-            bool a_less = a_exp < b_exp;
-            if (a_exp == b_exp) {
-                a_less = a.frac < b.frac;
-            }
-            return a_less ^ ismin ? b : a;
-        }
-
-        if (a.sign == b.sign) {
-            bool a_less = a_exp < b_exp;
-            if (a_exp == b_exp) {
-                a_less = a.frac < b.frac;
-            }
-            return a.sign ^ a_less ^ ismin ? b : a;
-        } else {
-            return a.sign ^ ismin ? b : a;
-        }
+    float16_unpack_canonical(&pa, a, s);
+    float16_unpack_canonical(&pb, b, s);
+    which = parts_minmax(&pa, &pb, s, flags, &float16_params);
+    if (unlikely(which < 0)) {
+        /* Some sort of nan, need to repack default and silenced nans. */
+        return float16_round_pack_canonical(&pa, s);
     }
+    return which ? b : a;
 }
 
-#define MINMAX(sz, name, ismin, isiee, ismag)                           \
-float ## sz float ## sz ## _ ## name(float ## sz a, float ## sz b,      \
-                                     float_status *s)                   \
-{                                                                       \
-    FloatParts64 pa, pb, pr;                                            \
-    float ## sz ## _unpack_canonical(&pa, a, s);                        \
-    float ## sz ## _unpack_canonical(&pb, b, s);                        \
-    pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);                 \
-    return float ## sz ## _round_pack_canonical(&pr, s);                \
+static bfloat16 bfloat16_minmax(bfloat16 a, bfloat16 b,
+                                float_status *s, int flags)
+{
+    FloatParts64 pa, pb;
+    int which;
+
+    bfloat16_unpack_canonical(&pa, a, s);
+    bfloat16_unpack_canonical(&pb, b, s);
+    which = parts_minmax(&pa, &pb, s, flags, &float16_params);
+    if (unlikely(which < 0)) {
+        /* Some sort of nan, need to repack default and silenced nans. */
+        return bfloat16_round_pack_canonical(&pa, s);
+    }
+    return which ? b : a;
 }
 
-MINMAX(16, min, true, false, false)
-MINMAX(16, minnum, true, true, false)
-MINMAX(16, minnummag, true, true, true)
-MINMAX(16, max, false, false, false)
-MINMAX(16, maxnum, false, true, false)
-MINMAX(16, maxnummag, false, true, true)
+static float32 float32_minmax(float32 a, float32 b, float_status *s, int flags)
+{
+    FloatParts64 pa, pb;
+    int which;
 
-MINMAX(32, min, true, false, false)
-MINMAX(32, minnum, true, true, false)
-MINMAX(32, minnummag, true, true, true)
-MINMAX(32, max, false, false, false)
-MINMAX(32, maxnum, false, true, false)
-MINMAX(32, maxnummag, false, true, true)
-
-MINMAX(64, min, true, false, false)
-MINMAX(64, minnum, true, true, false)
-MINMAX(64, minnummag, true, true, true)
-MINMAX(64, max, false, false, false)
-MINMAX(64, maxnum, false, true, false)
-MINMAX(64, maxnummag, false, true, true)
-
-#undef MINMAX
-
-#define BF16_MINMAX(name, ismin, isiee, ismag)                          \
-bfloat16 bfloat16_ ## name(bfloat16 a, bfloat16 b, float_status *s)     \
-{                                                                       \
-    FloatParts64 pa, pb, pr;                                            \
-    bfloat16_unpack_canonical(&pa, a, s);                               \
-    bfloat16_unpack_canonical(&pb, b, s);                               \
-    pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);                 \
-    return bfloat16_round_pack_canonical(&pr, s);                       \
+    float32_unpack_canonical(&pa, a, s);
+    float32_unpack_canonical(&pb, b, s);
+    which = parts_minmax(&pa, &pb, s, flags, &float32_params);
+    if (unlikely(which < 0)) {
+        /* Some sort of nan, need to repack default and silenced nans. */
+        return float32_round_pack_canonical(&pa, s);
+    }
+    return which ? b : a;
 }
 
-BF16_MINMAX(min, true, false, false)
-BF16_MINMAX(minnum, true, true, false)
-BF16_MINMAX(minnummag, true, true, true)
-BF16_MINMAX(max, false, false, false)
-BF16_MINMAX(maxnum, false, true, false)
-BF16_MINMAX(maxnummag, false, true, true)
+static float64 float64_minmax(float64 a, float64 b, float_status *s, int flags)
+{
+    FloatParts64 pa, pb;
+    int which;
 
-#undef BF16_MINMAX
+    float64_unpack_canonical(&pa, a, s);
+    float64_unpack_canonical(&pb, b, s);
+    which = parts_minmax(&pa, &pb, s, flags, &float64_params);
+    if (unlikely(which < 0)) {
+        /* Some sort of nan, need to repack default and silenced nans. */
+        return float64_round_pack_canonical(&pa, s);
+    }
+    return which ? b : a;
+}
+
+#define MINMAX_1(type, name, flags) \
+    type type##_##name(type a, type b, float_status *s) \
+    { return type##_minmax(a, b, s, flags); }
+
+#define MINMAX_2(type) \
+    MINMAX_1(type, max, 0)                                      \
+    MINMAX_1(type, maxnum, minmax_isnum)                        \
+    MINMAX_1(type, maxnummag, minmax_ismag)                     \
+    MINMAX_1(type, min, minmax_ismin)                           \
+    MINMAX_1(type, minnum, minmax_ismin | minmax_isnum)         \
+    MINMAX_1(type, minnummag, minmax_ismin | minmax_ismag)
+
+MINMAX_2(float16)
+MINMAX_2(bfloat16)
+MINMAX_2(float32)
+MINMAX_2(float64)
+
+#undef MINMAX_1
+#undef MINMAX_2
 
 /* Floating point compare */
 static FloatRelation compare_floats(FloatParts64 a, FloatParts64 b, bool is_quiet,
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index f3c4f8c8d2..4d91ef0d32 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -936,3 +936,72 @@ static void partsN(uint_to_float)(FloatPartsN *p, uint64_t a,
         p->frac_hi = a << shift;
     }
 }
+
+/*
+ * Float min/max.
+ *
+ * Return -1 to return the chosen nan in *a;
+ * return 0 to use the a input unchanged; 1 to use the b input unchanged.
+ */
+static int partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
+                          float_status *s, int flags, const FloatFmt *fmt)
+{
+    int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
+    int a_exp, b_exp;
+    bool a_less;
+
+    if (unlikely(ab_mask & float_cmask_anynan)) {
+        /*
+         * For minnum/maxnum, if one operand is a QNaN and the other
+         * operand is numerical, then return the numerical argument.
+         */
+        if ((flags & minmax_isnum)
+            && !(ab_mask & float_cmask_snan)
+            && (ab_mask & ~float_cmask_qnan)) {
+            return is_nan(a->cls);
+        }
+        *a = *parts_pick_nan(a, b, s);
+        return -1;
+    }
+
+    a_exp = a->exp;
+    b_exp = b->exp;
+
+    if (unlikely(ab_mask != float_cmask_normal)) {
+        switch (a->cls) {
+        case float_class_normal:
+            break;
+        case float_class_inf:
+            a_exp = INT_MAX;
+            break;
+        case float_class_zero:
+            a_exp = INT_MIN;
+            break;
+        default:
+            g_assert_not_reached();
+            break;
+        }
+        switch (b->cls) {
+        case float_class_normal:
+            break;
+        case float_class_inf:
+            b_exp = INT_MAX;
+            break;
+        case float_class_zero:
+            b_exp = INT_MIN;
+            break;
+        default:
+            g_assert_not_reached();
+            break;
+        }
+    }
+
+    if (a->sign != b->sign && !(flags & minmax_ismag)) {
+        a_less = a->sign;
+    } else if (a_exp != b_exp) {
+        a_less = a_exp < b_exp;
+    } else {
+        a_less = frac_cmp(a, b) < 0;
+    }
+    return a_less ^ !!(flags & minmax_ismin);
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 51/72] softfloat: Move compare_floats to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (49 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 50/72] softfloat: Move minmax_flags " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 52/72] softfloat: Move scalbn_decomposed " Richard Henderson
                   ` (25 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Rename to parts$N_compare.  Rename all of the intermediate
functions to ftype_do_compare.  Rename the hard-float functions
to ftype_hs_compare.  Convert float128 to FloatParts128.
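
One detail that keeps the new code tidy: parts$N_compare can return its
raw -1/0/1 comparison value directly, because FloatRelation is defined
with matching enumerators (as a reminder, from
include/fpu/softfloat-types.h):

    typedef enum {
        float_relation_less      = -1,
        float_relation_equal     =  0,
        float_relation_greater   =  1,
        float_relation_unordered =  2,
    } FloatRelation;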

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 208 ++++++++++++++------------------------
 fpu/softfloat-parts.c.inc |  57 +++++++++++
 2 files changed, 133 insertions(+), 132 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 4c04e88a3a..2faceadf8b 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -881,6 +881,14 @@ static int parts128_minmax(FloatParts128 *a, FloatParts128 *b,
 #define parts_minmax(A, B, S, Z, F) \
     PARTS_GENERIC_64_128(minmax, A)(A, B, S, Z, F)
 
+static int parts64_compare(FloatParts64 *a, FloatParts64 *b,
+                           float_status *s, bool q);
+static int parts128_compare(FloatParts128 *a, FloatParts128 *b,
+                            float_status *s, bool q);
+
+#define parts_compare(A, B, S, Q) \
+    PARTS_GENERIC_64_128(compare, A)(A, B, S, Q)
+
 /*
  * Helper functions for softfloat-parts.c.inc, per-size operations.
  */
@@ -3359,92 +3367,42 @@ MINMAX_2(float64)
 #undef MINMAX_1
 #undef MINMAX_2
 
-/* Floating point compare */
-static FloatRelation compare_floats(FloatParts64 a, FloatParts64 b, bool is_quiet,
-                                    float_status *s)
+/*
+ * Floating point compare
+ */
+
+static FloatRelation QEMU_FLATTEN
+float16_do_compare(float16 a, float16 b, float_status *s, bool is_quiet)
 {
-    if (is_nan(a.cls) || is_nan(b.cls)) {
-        if (!is_quiet ||
-            a.cls == float_class_snan ||
-            b.cls == float_class_snan) {
-            float_raise(float_flag_invalid, s);
-        }
-        return float_relation_unordered;
-    }
+    FloatParts64 pa, pb;
 
-    if (a.cls == float_class_zero) {
-        if (b.cls == float_class_zero) {
-            return float_relation_equal;
-        }
-        return b.sign ? float_relation_greater : float_relation_less;
-    } else if (b.cls == float_class_zero) {
-        return a.sign ? float_relation_less : float_relation_greater;
-    }
-
-    /* The only really important thing about infinity is its sign. If
-     * both are infinities the sign marks the smallest of the two.
-     */
-    if (a.cls == float_class_inf) {
-        if ((b.cls == float_class_inf) && (a.sign == b.sign)) {
-            return float_relation_equal;
-        }
-        return a.sign ? float_relation_less : float_relation_greater;
-    } else if (b.cls == float_class_inf) {
-        return b.sign ? float_relation_greater : float_relation_less;
-    }
-
-    if (a.sign != b.sign) {
-        return a.sign ? float_relation_less : float_relation_greater;
-    }
-
-    if (a.exp == b.exp) {
-        if (a.frac == b.frac) {
-            return float_relation_equal;
-        }
-        if (a.sign) {
-            return a.frac > b.frac ?
-                float_relation_less : float_relation_greater;
-        } else {
-            return a.frac > b.frac ?
-                float_relation_greater : float_relation_less;
-        }
-    } else {
-        if (a.sign) {
-            return a.exp > b.exp ? float_relation_less : float_relation_greater;
-        } else {
-            return a.exp > b.exp ? float_relation_greater : float_relation_less;
-        }
-    }
+    float16_unpack_canonical(&pa, a, s);
+    float16_unpack_canonical(&pb, b, s);
+    return parts_compare(&pa, &pb, s, is_quiet);
 }
 
-#define COMPARE(name, attr, sz)                                         \
-static int attr                                                         \
-name(float ## sz a, float ## sz b, bool is_quiet, float_status *s)      \
-{                                                                       \
-    FloatParts64 pa, pb;                                                \
-    float ## sz ## _unpack_canonical(&pa, a, s);                        \
-    float ## sz ## _unpack_canonical(&pb, b, s);                        \
-    return compare_floats(pa, pb, is_quiet, s);                         \
-}
-
-COMPARE(soft_f16_compare, QEMU_FLATTEN, 16)
-COMPARE(soft_f32_compare, QEMU_SOFTFLOAT_ATTR, 32)
-COMPARE(soft_f64_compare, QEMU_SOFTFLOAT_ATTR, 64)
-
-#undef COMPARE
-
 FloatRelation float16_compare(float16 a, float16 b, float_status *s)
 {
-    return soft_f16_compare(a, b, false, s);
+    return float16_do_compare(a, b, s, false);
 }
 
 FloatRelation float16_compare_quiet(float16 a, float16 b, float_status *s)
 {
-    return soft_f16_compare(a, b, true, s);
+    return float16_do_compare(a, b, s, true);
+}
+
+static FloatRelation QEMU_SOFTFLOAT_ATTR
+float32_do_compare(float32 a, float32 b, float_status *s, bool is_quiet)
+{
+    FloatParts64 pa, pb;
+
+    float32_unpack_canonical(&pa, a, s);
+    float32_unpack_canonical(&pb, b, s);
+    return parts_compare(&pa, &pb, s, is_quiet);
 }
 
 static FloatRelation QEMU_FLATTEN
-f32_compare(float32 xa, float32 xb, bool is_quiet, float_status *s)
+float32_hs_compare(float32 xa, float32 xb, float_status *s, bool is_quiet)
 {
     union_float32 ua, ub;
 
@@ -3465,25 +3423,36 @@ f32_compare(float32 xa, float32 xb, bool is_quiet, float_status *s)
     if (likely(isless(ua.h, ub.h))) {
         return float_relation_less;
     }
-    /* The only condition remaining is unordered.
+    /*
+     * The only condition remaining is unordered.
      * Fall through to set flags.
      */
  soft:
-    return soft_f32_compare(ua.s, ub.s, is_quiet, s);
+    return float32_do_compare(ua.s, ub.s, s, is_quiet);
 }
 
 FloatRelation float32_compare(float32 a, float32 b, float_status *s)
 {
-    return f32_compare(a, b, false, s);
+    return float32_hs_compare(a, b, s, false);
 }
 
 FloatRelation float32_compare_quiet(float32 a, float32 b, float_status *s)
 {
-    return f32_compare(a, b, true, s);
+    return float32_hs_compare(a, b, s, true);
+}
+
+static FloatRelation QEMU_SOFTFLOAT_ATTR
+float64_do_compare(float64 a, float64 b, float_status *s, bool is_quiet)
+{
+    FloatParts64 pa, pb;
+
+    float64_unpack_canonical(&pa, a, s);
+    float64_unpack_canonical(&pb, b, s);
+    return parts_compare(&pa, &pb, s, is_quiet);
 }
 
 static FloatRelation QEMU_FLATTEN
-f64_compare(float64 xa, float64 xb, bool is_quiet, float_status *s)
+float64_hs_compare(float64 xa, float64 xb, float_status *s, bool is_quiet)
 {
     union_float64 ua, ub;
 
@@ -3504,41 +3473,62 @@ f64_compare(float64 xa, float64 xb, bool is_quiet, float_status *s)
     if (likely(isless(ua.h, ub.h))) {
         return float_relation_less;
     }
-    /* The only condition remaining is unordered.
+    /*
+     * The only condition remaining is unordered.
      * Fall through to set flags.
      */
  soft:
-    return soft_f64_compare(ua.s, ub.s, is_quiet, s);
+    return float64_do_compare(ua.s, ub.s, s, is_quiet);
 }
 
 FloatRelation float64_compare(float64 a, float64 b, float_status *s)
 {
-    return f64_compare(a, b, false, s);
+    return float64_hs_compare(a, b, s, false);
 }
 
 FloatRelation float64_compare_quiet(float64 a, float64 b, float_status *s)
 {
-    return f64_compare(a, b, true, s);
+    return float64_hs_compare(a, b, s, true);
 }
 
 static FloatRelation QEMU_FLATTEN
-soft_bf16_compare(bfloat16 a, bfloat16 b, bool is_quiet, float_status *s)
+bfloat16_do_compare(bfloat16 a, bfloat16 b, float_status *s, bool is_quiet)
 {
     FloatParts64 pa, pb;
 
     bfloat16_unpack_canonical(&pa, a, s);
     bfloat16_unpack_canonical(&pb, b, s);
-    return compare_floats(pa, pb, is_quiet, s);
+    return parts_compare(&pa, &pb, s, is_quiet);
 }
 
 FloatRelation bfloat16_compare(bfloat16 a, bfloat16 b, float_status *s)
 {
-    return soft_bf16_compare(a, b, false, s);
+    return bfloat16_do_compare(a, b, s, false);
 }
 
 FloatRelation bfloat16_compare_quiet(bfloat16 a, bfloat16 b, float_status *s)
 {
-    return soft_bf16_compare(a, b, true, s);
+    return bfloat16_do_compare(a, b, s, true);
+}
+
+static FloatRelation QEMU_FLATTEN
+float128_do_compare(float128 a, float128 b, float_status *s, bool is_quiet)
+{
+    FloatParts128 pa, pb;
+
+    float128_unpack_canonical(&pa, a, s);
+    float128_unpack_canonical(&pb, b, s);
+    return parts_compare(&pa, &pb, s, is_quiet);
+}
+
+FloatRelation float128_compare(float128 a, float128 b, float_status *s)
+{
+    return float128_do_compare(a, b, s, false);
+}
+
+FloatRelation float128_compare_quiet(float128 a, float128 b, float_status *s)
+{
+    return float128_do_compare(a, b, s, true);
 }
 
 /* Multiply A by 2 raised to the power N.  */
@@ -6611,52 +6601,6 @@ FloatRelation floatx80_compare_quiet(floatx80 a, floatx80 b,
     return floatx80_compare_internal(a, b, 1, status);
 }
 
-static inline FloatRelation
-float128_compare_internal(float128 a, float128 b, bool is_quiet,
-                          float_status *status)
-{
-    bool aSign, bSign;
-
-    if (( ( extractFloat128Exp( a ) == 0x7fff ) &&
-          ( extractFloat128Frac0( a ) | extractFloat128Frac1( a ) ) ) ||
-        ( ( extractFloat128Exp( b ) == 0x7fff ) &&
-          ( extractFloat128Frac0( b ) | extractFloat128Frac1( b ) ) )) {
-        if (!is_quiet ||
-            float128_is_signaling_nan(a, status) ||
-            float128_is_signaling_nan(b, status)) {
-            float_raise(float_flag_invalid, status);
-        }
-        return float_relation_unordered;
-    }
-    aSign = extractFloat128Sign( a );
-    bSign = extractFloat128Sign( b );
-    if ( aSign != bSign ) {
-        if ( ( ( ( a.high | b.high )<<1 ) | a.low | b.low ) == 0 ) {
-            /* zero case */
-            return float_relation_equal;
-        } else {
-            return 1 - (2 * aSign);
-        }
-    } else {
-        if (a.low == b.low && a.high == b.high) {
-            return float_relation_equal;
-        } else {
-            return 1 - 2 * (aSign ^ ( lt128( a.high, a.low, b.high, b.low ) ));
-        }
-    }
-}
-
-FloatRelation float128_compare(float128 a, float128 b, float_status *status)
-{
-    return float128_compare_internal(a, b, 0, status);
-}
-
-FloatRelation float128_compare_quiet(float128 a, float128 b,
-                                     float_status *status)
-{
-    return float128_compare_internal(a, b, 1, status);
-}
-
 floatx80 floatx80_scalbn(floatx80 a, int n, float_status *status)
 {
     bool aSign;
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 4d91ef0d32..4e11aa8b3b 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -1005,3 +1005,60 @@ static int partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
     }
     return a_less ^ !!(flags & minmax_ismin);
 }
+
+/*
+ * Floating point compare
+ */
+static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
+                                     float_status *s, bool is_quiet)
+{
+    int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
+    int cmp;
+
+    if (likely(ab_mask == float_cmask_normal)) {
+        if (a->sign != b->sign) {
+            goto a_sign;
+        }
+        if (a->exp != b->exp) {
+            cmp = a->exp < b->exp ? -1 : 1;
+        } else {
+            cmp = frac_cmp(a, b);
+        }
+        if (a->sign) {
+            cmp = -cmp;
+        }
+        return cmp;
+    }
+
+    if (unlikely(ab_mask & float_cmask_anynan)) {
+        if (!is_quiet || (ab_mask & float_cmask_snan)) {
+            float_raise(float_flag_invalid, s);
+        }
+        return float_relation_unordered;
+    }
+
+    if (ab_mask & float_cmask_zero) {
+        if (ab_mask == float_cmask_zero) {
+            return float_relation_equal;
+        } else if (a->cls == float_class_zero) {
+            goto b_sign;
+        } else {
+            goto a_sign;
+        }
+    }
+
+    if (ab_mask == float_cmask_inf) {
+        if (a->sign == b->sign) {
+            return float_relation_equal;
+        }
+    } else if (b->cls == float_class_inf) {
+        goto b_sign;
+    } else {
+        g_assert(a->cls == float_class_inf);
+    }
+
+ a_sign:
+    return a->sign ? float_relation_less : float_relation_greater;
+ b_sign:
+    return b->sign ? float_relation_greater : float_relation_less;
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 52/72] softfloat: Move scalbn_decomposed to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (50 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 51/72] softfloat: Move compare_floats " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 53/72] softfloat: Move sqrt_float " Richard Henderson
                   ` (24 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Rename to parts$N_scalbn.
Reimplement float128_scalbn with FloatParts128.
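
Note the +-0x10000 clamp retained in parts$N_scalbn: float128 has only
a 15-bit exponent, so the clamp cannot change any representable result;
it merely keeps FloatPartsN.exp from overflowing its int32_t backing
while still letting round-pack overflow to infinity or underflow to
zero. A worked example (illustrative):

    /* float16_scalbn(1.0, 40000):
     *   n clamps to 0x10000, so exp becomes 0x10000, and
     *   float16_round_pack_canonical overflows to +inf
     *   (raising overflow and inexact) -- the same result
     *   an unclamped exponent would have produced.
     */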

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 103 +++++++++++++-------------------------
 fpu/softfloat-parts.c.inc |  21 ++++++++
 2 files changed, 55 insertions(+), 69 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 2faceadf8b..f83ca20000 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -889,6 +889,12 @@ static int parts128_compare(FloatParts128 *a, FloatParts128 *b,
 #define parts_compare(A, B, S, Q) \
     PARTS_GENERIC_64_128(compare, A)(A, B, S, Q)
 
+static void parts64_scalbn(FloatParts64 *a, int n, float_status *s);
+static void parts128_scalbn(FloatParts128 *a, int n, float_status *s);
+
+#define parts_scalbn(A, N, S) \
+    PARTS_GENERIC_64_128(scalbn, A)(A, N, S)
+
 /*
  * Helper functions for softfloat-parts.c.inc, per-size operations.
  */
@@ -3531,58 +3537,53 @@ FloatRelation float128_compare_quiet(float128 a, float128 b, float_status *s)
     return float128_do_compare(a, b, s, true);
 }
 
-/* Multiply A by 2 raised to the power N.  */
-static FloatParts64 scalbn_decomposed(FloatParts64 a, int n, float_status *s)
-{
-    if (unlikely(is_nan(a.cls))) {
-        parts_return_nan(&a, s);
-    }
-    if (a.cls == float_class_normal) {
-        /* The largest float type (even though not supported by FloatParts64)
-         * is float128, which has a 15 bit exponent.  Bounding N to 16 bits
-         * still allows rounding to infinity, without allowing overflow
-         * within the int32_t that backs FloatParts64.exp.
-         */
-        n = MIN(MAX(n, -0x10000), 0x10000);
-        a.exp += n;
-    }
-    return a;
-}
+/*
+ * Scale by 2**N
+ */
 
 float16 float16_scalbn(float16 a, int n, float_status *status)
 {
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    float16_unpack_canonical(&pa, a, status);
-    pr = scalbn_decomposed(pa, n, status);
-    return float16_round_pack_canonical(&pr, status);
+    float16_unpack_canonical(&p, a, status);
+    parts_scalbn(&p, n, status);
+    return float16_round_pack_canonical(&p, status);
 }
 
 float32 float32_scalbn(float32 a, int n, float_status *status)
 {
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    float32_unpack_canonical(&pa, a, status);
-    pr = scalbn_decomposed(pa, n, status);
-    return float32_round_pack_canonical(&pr, status);
+    float32_unpack_canonical(&p, a, status);
+    parts_scalbn(&p, n, status);
+    return float32_round_pack_canonical(&p, status);
 }
 
 float64 float64_scalbn(float64 a, int n, float_status *status)
 {
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    float64_unpack_canonical(&pa, a, status);
-    pr = scalbn_decomposed(pa, n, status);
-    return float64_round_pack_canonical(&pr, status);
+    float64_unpack_canonical(&p, a, status);
+    parts_scalbn(&p, n, status);
+    return float64_round_pack_canonical(&p, status);
 }
 
 bfloat16 bfloat16_scalbn(bfloat16 a, int n, float_status *status)
 {
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    bfloat16_unpack_canonical(&pa, a, status);
-    pr = scalbn_decomposed(pa, n, status);
-    return bfloat16_round_pack_canonical(&pr, status);
+    bfloat16_unpack_canonical(&p, a, status);
+    parts_scalbn(&p, n, status);
+    return bfloat16_round_pack_canonical(&p, status);
+}
+
+float128 float128_scalbn(float128 a, int n, float_status *status)
+{
+    FloatParts128 p;
+
+    float128_unpack_canonical(&p, a, status);
+    parts_scalbn(&p, n, status);
+    return float128_round_pack_canonical(&p, status);
 }
 
 /*
@@ -6640,42 +6641,6 @@ floatx80 floatx80_scalbn(floatx80 a, int n, float_status *status)
                                          aSign, aExp, aSig, 0, status);
 }
 
-float128 float128_scalbn(float128 a, int n, float_status *status)
-{
-    bool aSign;
-    int32_t aExp;
-    uint64_t aSig0, aSig1;
-
-    aSig1 = extractFloat128Frac1( a );
-    aSig0 = extractFloat128Frac0( a );
-    aExp = extractFloat128Exp( a );
-    aSign = extractFloat128Sign( a );
-    if ( aExp == 0x7FFF ) {
-        if ( aSig0 | aSig1 ) {
-            return propagateFloat128NaN(a, a, status);
-        }
-        return a;
-    }
-    if (aExp != 0) {
-        aSig0 |= UINT64_C(0x0001000000000000);
-    } else if (aSig0 == 0 && aSig1 == 0) {
-        return a;
-    } else {
-        aExp++;
-    }
-
-    if (n > 0x10000) {
-        n = 0x10000;
-    } else if (n < -0x10000) {
-        n = -0x10000;
-    }
-
-    aExp += n - 1;
-    return normalizeRoundAndPackFloat128( aSign, aExp, aSig0, aSig1
-                                         , status);
-
-}
-
 static void __attribute__((constructor)) softfloat_init(void)
 {
     union_float64 ua, ub, uc, ur;
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 4e11aa8b3b..9749c11a6b 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -1062,3 +1062,24 @@ static FloatRelation partsN(compare)(FloatPartsN *a, FloatPartsN *b,
  b_sign:
     return b->sign ? float_relation_greater : float_relation_less;
 }
+
+/*
+ * Multiply A by 2 raised to the power N.
+ */
+static void partsN(scalbn)(FloatPartsN *a, int n, float_status *s)
+{
+    switch (a->cls) {
+    case float_class_snan:
+    case float_class_qnan:
+        parts_return_nan(a, s);
+        break;
+    case float_class_zero:
+    case float_class_inf:
+        break;
+    case float_class_normal:
+        a->exp += MIN(MAX(n, -0x10000), 0x10000);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 53/72] softfloat: Move sqrt_float to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (51 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 52/72] softfloat: Move scalbn_decomposed " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 54/72] softfloat: Split out parts_uncanon_normal Richard Henderson
                   ` (23 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Rename to parts$N_sqrt.
Reimplement float128_sqrt with FloatParts128.

Reimplement with the inverse sqrt Newton-Raphson algorithm from musl.
This is significantly faster than even the Berkeley sqrt N-R algorithm,
because it does not use division instructions, only multiplication.

Ordinarily, changing algorithms at the same time as migrating code is
a bad idea, but this is the only way I found that didn't break one of
the routines in the process.
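
For those unfamiliar with the approach: Newton-Raphson on
f(x) = 1/x^2 - a converges to 1/sqrt(a) with an update that needs only
multiplications, x' = x * (3 - a*x*x) / 2, and sqrt(a) is then
recovered as a * x. A floating-point model of the iteration
(illustrative only; the in-tree version runs in fixed point and seeds
from rsqrt_tab, so it needs far fewer steps):

    /* Model: for a in [1, 4), the seed 0.5 lies inside the
     * convergence region; eight steps saturate double precision.
     */
    static double model_sqrt(double a)
    {
        double x = 0.5;                        /* crude seed */
        for (int i = 0; i < 8; i++) {
            x = x * (3.0 - a * x * x) * 0.5;   /* multiplies only */
        }
        return a * x;                          /* sqrt(a) = a * rsqrt(a) */
    }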

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 207 ++++++++++----------------------------
 fpu/softfloat-parts.c.inc | 206 +++++++++++++++++++++++++++++++++++++
 2 files changed, 261 insertions(+), 152 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index f83ca20000..75a3bb881d 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -819,6 +819,12 @@ static FloatParts128 *parts128_div(FloatParts128 *a, FloatParts128 *b,
 #define parts_div(A, B, S) \
     PARTS_GENERIC_64_128(div, A)(A, B, S)
 
+static void parts64_sqrt(FloatParts64 *a, float_status *s, const FloatFmt *f);
+static void parts128_sqrt(FloatParts128 *a, float_status *s, const FloatFmt *f);
+
+#define parts_sqrt(A, S, F) \
+    PARTS_GENERIC_64_128(sqrt, A)(A, S, F)
+
 static bool parts64_round_to_int_normal(FloatParts64 *a, FloatRoundMode rm,
                                         int scale, int frac_size);
 static bool parts128_round_to_int_normal(FloatParts128 *a, FloatRoundMode r,
@@ -1385,6 +1391,30 @@ static void frac128_widen(FloatParts256 *r, FloatParts128 *a)
 
 #define frac_widen(A, B)  FRAC_GENERIC_64_128(widen, B)(A, B)
 
+/*
+ * Reciprocal sqrt table.  1 bit of exponent, 6 bits of mantissa.
+ * From https://git.musl-libc.org/cgit/musl/tree/src/math/sqrt_data.c
+ * and thus MIT licenced.
+ */
+static const uint16_t rsqrt_tab[128] = {
+    0xb451, 0xb2f0, 0xb196, 0xb044, 0xaef9, 0xadb6, 0xac79, 0xab43,
+    0xaa14, 0xa8eb, 0xa7c8, 0xa6aa, 0xa592, 0xa480, 0xa373, 0xa26b,
+    0xa168, 0xa06a, 0x9f70, 0x9e7b, 0x9d8a, 0x9c9d, 0x9bb5, 0x9ad1,
+    0x99f0, 0x9913, 0x983a, 0x9765, 0x9693, 0x95c4, 0x94f8, 0x9430,
+    0x936b, 0x92a9, 0x91ea, 0x912e, 0x9075, 0x8fbe, 0x8f0a, 0x8e59,
+    0x8daa, 0x8cfe, 0x8c54, 0x8bac, 0x8b07, 0x8a64, 0x89c4, 0x8925,
+    0x8889, 0x87ee, 0x8756, 0x86c0, 0x862b, 0x8599, 0x8508, 0x8479,
+    0x83ec, 0x8361, 0x82d8, 0x8250, 0x81c9, 0x8145, 0x80c2, 0x8040,
+    0xff02, 0xfd0e, 0xfb25, 0xf947, 0xf773, 0xf5aa, 0xf3ea, 0xf234,
+    0xf087, 0xeee3, 0xed47, 0xebb3, 0xea27, 0xe8a3, 0xe727, 0xe5b2,
+    0xe443, 0xe2dc, 0xe17a, 0xe020, 0xdecb, 0xdd7d, 0xdc34, 0xdaf1,
+    0xd9b3, 0xd87b, 0xd748, 0xd61a, 0xd4f1, 0xd3cd, 0xd2ad, 0xd192,
+    0xd07b, 0xcf69, 0xce5b, 0xcd51, 0xcc4a, 0xcb48, 0xca4a, 0xc94f,
+    0xc858, 0xc764, 0xc674, 0xc587, 0xc49d, 0xc3b7, 0xc2d4, 0xc1f4,
+    0xc116, 0xc03c, 0xbf65, 0xbe90, 0xbdbe, 0xbcef, 0xbc23, 0xbb59,
+    0xba91, 0xb9cc, 0xb90a, 0xb84a, 0xb78c, 0xb6d0, 0xb617, 0xb560,
+};
+
 #define partsN(NAME)   glue(glue(glue(parts,N),_),NAME)
 #define FloatPartsN    glue(FloatParts,N)
 #define FloatPartsW    glue(FloatParts,W)
@@ -3588,103 +3618,35 @@ float128 float128_scalbn(float128 a, int n, float_status *status)
 
 /*
  * Square Root
- *
- * The old softfloat code did an approximation step before zeroing in
- * on the final result. However for simpleness we just compute the
- * square root by iterating down from the implicit bit to enough extra
- * bits to ensure we get a correctly rounded result.
- *
- * This does mean however the calculation is slower than before,
- * especially for 64 bit floats.
  */
 
-static FloatParts64 sqrt_float(FloatParts64 a, float_status *s, const FloatFmt *p)
-{
-    uint64_t a_frac, r_frac, s_frac;
-    int bit, last_bit;
-
-    if (is_nan(a.cls)) {
-        parts_return_nan(&a, s);
-        return a;
-    }
-    if (a.cls == float_class_zero) {
-        return a;  /* sqrt(+-0) = +-0 */
-    }
-    if (a.sign) {
-        float_raise(float_flag_invalid, s);
-        parts_default_nan(&a, s);
-        return a;
-    }
-    if (a.cls == float_class_inf) {
-        return a;  /* sqrt(+inf) = +inf */
-    }
-
-    assert(a.cls == float_class_normal);
-
-    /* We need two overflow bits at the top. Adding room for that is a
-     * right shift. If the exponent is odd, we can discard the low bit
-     * by multiplying the fraction by 2; that's a left shift. Combine
-     * those and we shift right by 1 if the exponent is odd, otherwise 2.
-     */
-    a_frac = a.frac >> (2 - (a.exp & 1));
-    a.exp >>= 1;
-
-    /* Bit-by-bit computation of sqrt.  */
-    r_frac = 0;
-    s_frac = 0;
-
-    /* Iterate from implicit bit down to the 3 extra bits to compute a
-     * properly rounded result. Remember we've inserted two more bits
-     * at the top, so these positions are two less.
-     */
-    bit = DECOMPOSED_BINARY_POINT - 2;
-    last_bit = MAX(p->frac_shift - 4, 0);
-    do {
-        uint64_t q = 1ULL << bit;
-        uint64_t t_frac = s_frac + q;
-        if (t_frac <= a_frac) {
-            s_frac = t_frac + q;
-            a_frac -= t_frac;
-            r_frac += q;
-        }
-        a_frac <<= 1;
-    } while (--bit >= last_bit);
-
-    /* Undo the right shift done above. If there is any remaining
-     * fraction, the result is inexact. Set the sticky bit.
-     */
-    a.frac = (r_frac << 2) + (a_frac != 0);
-
-    return a;
-}
-
 float16 QEMU_FLATTEN float16_sqrt(float16 a, float_status *status)
 {
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    float16_unpack_canonical(&pa, a, status);
-    pr = sqrt_float(pa, status, &float16_params);
-    return float16_round_pack_canonical(&pr, status);
+    float16_unpack_canonical(&p, a, status);
+    parts_sqrt(&p, status, &float16_params);
+    return float16_round_pack_canonical(&p, status);
 }
 
 static float32 QEMU_SOFTFLOAT_ATTR
 soft_f32_sqrt(float32 a, float_status *status)
 {
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    float32_unpack_canonical(&pa, a, status);
-    pr = sqrt_float(pa, status, &float32_params);
-    return float32_round_pack_canonical(&pr, status);
+    float32_unpack_canonical(&p, a, status);
+    parts_sqrt(&p, status, &float32_params);
+    return float32_round_pack_canonical(&p, status);
 }
 
 static float64 QEMU_SOFTFLOAT_ATTR
 soft_f64_sqrt(float64 a, float_status *status)
 {
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    float64_unpack_canonical(&pa, a, status);
-    pr = sqrt_float(pa, status, &float64_params);
-    return float64_round_pack_canonical(&pr, status);
+    float64_unpack_canonical(&p, a, status);
+    parts_sqrt(&p, status, &float64_params);
+    return float64_round_pack_canonical(&p, status);
 }
 
 float32 QEMU_FLATTEN float32_sqrt(float32 xa, float_status *s)
@@ -3743,11 +3705,20 @@ float64 QEMU_FLATTEN float64_sqrt(float64 xa, float_status *s)
 
 bfloat16 QEMU_FLATTEN bfloat16_sqrt(bfloat16 a, float_status *status)
 {
-    FloatParts64 pa, pr;
+    FloatParts64 p;
 
-    bfloat16_unpack_canonical(&pa, a, status);
-    pr = sqrt_float(pa, status, &bfloat16_params);
-    return bfloat16_round_pack_canonical(&pr, status);
+    bfloat16_unpack_canonical(&p, a, status);
+    parts_sqrt(&p, status, &bfloat16_params);
+    return bfloat16_round_pack_canonical(&p, status);
+}
+
+float128 QEMU_FLATTEN float128_sqrt(float128 a, float_status *status)
+{
+    FloatParts128 p;
+
+    float128_unpack_canonical(&p, a, status);
+    parts_sqrt(&p, status, &float128_params);
+    return float128_round_pack_canonical(&p, status);
 }
 
 /*----------------------------------------------------------------------------
@@ -6475,74 +6446,6 @@ float128 float128_rem(float128 a, float128 b, float_status *status)
                                          status);
 }
 
-/*----------------------------------------------------------------------------
-| Returns the square root of the quadruple-precision floating-point value `a'.
-| The operation is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float128 float128_sqrt(float128 a, float_status *status)
-{
-    bool aSign;
-    int32_t aExp, zExp;
-    uint64_t aSig0, aSig1, zSig0, zSig1, zSig2, doubleZSig0;
-    uint64_t rem0, rem1, rem2, rem3, term0, term1, term2, term3;
-
-    aSig1 = extractFloat128Frac1( a );
-    aSig0 = extractFloat128Frac0( a );
-    aExp = extractFloat128Exp( a );
-    aSign = extractFloat128Sign( a );
-    if ( aExp == 0x7FFF ) {
-        if (aSig0 | aSig1) {
-            return propagateFloat128NaN(a, a, status);
-        }
-        if ( ! aSign ) return a;
-        goto invalid;
-    }
-    if ( aSign ) {
-        if ( ( aExp | aSig0 | aSig1 ) == 0 ) return a;
- invalid:
-        float_raise(float_flag_invalid, status);
-        return float128_default_nan(status);
-    }
-    if ( aExp == 0 ) {
-        if ( ( aSig0 | aSig1 ) == 0 ) return packFloat128( 0, 0, 0, 0 );
-        normalizeFloat128Subnormal( aSig0, aSig1, &aExp, &aSig0, &aSig1 );
-    }
-    zExp = ( ( aExp - 0x3FFF )>>1 ) + 0x3FFE;
-    aSig0 |= UINT64_C(0x0001000000000000);
-    zSig0 = estimateSqrt32( aExp, aSig0>>17 );
-    shortShift128Left( aSig0, aSig1, 13 - ( aExp & 1 ), &aSig0, &aSig1 );
-    zSig0 = estimateDiv128To64( aSig0, aSig1, zSig0<<32 ) + ( zSig0<<30 );
-    doubleZSig0 = zSig0<<1;
-    mul64To128( zSig0, zSig0, &term0, &term1 );
-    sub128( aSig0, aSig1, term0, term1, &rem0, &rem1 );
-    while ( (int64_t) rem0 < 0 ) {
-        --zSig0;
-        doubleZSig0 -= 2;
-        add128( rem0, rem1, zSig0>>63, doubleZSig0 | 1, &rem0, &rem1 );
-    }
-    zSig1 = estimateDiv128To64( rem1, 0, doubleZSig0 );
-    if ( ( zSig1 & 0x1FFF ) <= 5 ) {
-        if ( zSig1 == 0 ) zSig1 = 1;
-        mul64To128( doubleZSig0, zSig1, &term1, &term2 );
-        sub128( rem1, 0, term1, term2, &rem1, &rem2 );
-        mul64To128( zSig1, zSig1, &term2, &term3 );
-        sub192( rem1, rem2, 0, 0, term2, term3, &rem1, &rem2, &rem3 );
-        while ( (int64_t) rem1 < 0 ) {
-            --zSig1;
-            shortShift128Left( 0, zSig1, 1, &term2, &term3 );
-            term3 |= 1;
-            term2 |= doubleZSig0;
-            add192( rem1, rem2, rem3, 0, term2, term3, &rem1, &rem2, &rem3 );
-        }
-        zSig1 |= ( ( rem1 | rem2 | rem3 ) != 0 );
-    }
-    shift128ExtraRightJamming( zSig0, zSig1, 0, 14, &zSig0, &zSig1, &zSig2 );
-    return roundAndPackFloat128(0, zExp, zSig0, zSig1, zSig2, status);
-
-}
-
 static inline FloatRelation
 floatx80_compare_internal(floatx80 a, floatx80 b, bool is_quiet,
                           float_status *status)
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 9749c11a6b..293029711c 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -595,6 +595,212 @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
     return a;
 }
 
+/*
+ * Square Root
+ *
+ * The base algorithm is lifted from
+ * https://git.musl-libc.org/cgit/musl/tree/src/math/sqrtf.c
+ * https://git.musl-libc.org/cgit/musl/tree/src/math/sqrt.c
+ * https://git.musl-libc.org/cgit/musl/tree/src/math/sqrtl.c
+ * and is thus MIT licensed.
+ */
+static void partsN(sqrt)(FloatPartsN *a, float_status *status,
+                         const FloatFmt *fmt)
+{
+    const uint32_t three32 = 3u << 30;
+    const uint64_t three64 = 3ull << 62;
+    uint32_t d32, m32, r32, s32, u32;            /* 32-bit computation */
+    uint64_t d64, m64, r64, s64, u64;            /* 64-bit computation */
+    uint64_t dh, dl, rh, rl, sh, sl, uh, ul;     /* 128-bit computation */
+    uint64_t d0h, d0l, d1h, d1l, d2h, d2l;
+    uint64_t discard;
+    bool exp_odd;
+    size_t index;
+
+    if (unlikely(a->cls != float_class_normal)) {
+        switch (a->cls) {
+        case float_class_snan:
+        case float_class_qnan:
+            parts_return_nan(a, status);
+            return;
+        case float_class_zero:
+            return;
+        case float_class_inf:
+            if (unlikely(a->sign)) {
+                goto d_nan;
+            }
+            return;
+        default:
+            g_assert_not_reached();
+        }
+    }
+
+    if (unlikely(a->sign)) {
+        goto d_nan;
+    }
+
+    /*
+     * Argument reduction.
+     * x = 4^e frac; with integer e, and frac in [1, 4)
+     * m = frac fixed point at bit 62, since we're in base 4.
+     * If base-2 exponent is odd, exchange that for multiply by 2,
+     * which results in no shift.
+     */
+    exp_odd = a->exp & 1;
+    index = extract64(a->frac_hi, 57, 6) | (!exp_odd << 6);
+    if (!exp_odd) {
+        frac_shr(a, 1);
+    }
+
+    /*
+     * Approximate r ~= 1/sqrt(m) and s ~= sqrt(m) when m in [1, 4).
+     *
+     * Initial estimate:
+     * 7-bit lookup table (1-bit exponent and 6-bit significand).
+     *
+     * The relative error (e = r0*sqrt(m)-1) of a linear estimate
+     * (r0 = a*m + b) is |e| < 0.085955 ~ 0x1.6p-4 at best;
+     * a table lookup is faster and needs one less iteration.
+     * The 7-bit table gives |e| < 0x1.fdp-9.
+     *
+     * A Newton-Raphson iteration for r is
+     *   s = m*r
+     *   d = s*r
+     *   u = 3 - d
+     *   r = r*u/2
+     *
+     * Fixed point representations:
+     *   m, s, d, u, three are all 2.30; r is 0.32
+     */
+    m64 = a->frac_hi;
+    m32 = m64 >> 32;
+
+    r32 = rsqrt_tab[index] << 16;
+    /* |r*sqrt(m) - 1| < 0x1.FDp-9 */
+
+    s32 = ((uint64_t)m32 * r32) >> 32;
+    d32 = ((uint64_t)s32 * r32) >> 32;
+    u32 = three32 - d32;
+
+    if (N == 64) {
+        /* float64 or smaller */
+
+        r32 = ((uint64_t)r32 * u32) >> 31;
+        /* |r*sqrt(m) - 1| < 0x1.7Bp-16 */
+
+        s32 = ((uint64_t)m32 * r32) >> 32;
+        d32 = ((uint64_t)s32 * r32) >> 32;
+        u32 = three32 - d32;
+
+        if (fmt->frac_size <= 23) {
+            /* float32 or smaller */
+
+            s32 = ((uint64_t)s32 * u32) >> 32;  /* 3.29 */
+            s32 = (s32 - 1) >> 6;               /* 9.23 */
+            /* s < sqrt(m) < s + 0x1.08p-23 */
+
+            /* compute nearest rounded result to 2.23 bits */
+            uint32_t d0 = (m32 << 16) - s32 * s32;
+            uint32_t d1 = s32 - d0;
+            uint32_t d2 = d1 + s32 + 1;
+            s32 += d1 >> 31;
+            a->frac_hi = (uint64_t)s32 << (64 - 25);
+
+            /* increment or decrement for inexact */
+            if (d2 != 0) {
+                a->frac_hi += ((int32_t)(d1 ^ d2) < 0 ? -1 : 1);
+            }
+            goto done;
+        }
+
+        /* float64 */
+
+        r64 = (uint64_t)r32 * u32 * 2;
+    /* |r*sqrt(m) - 1| < 0x1.37p-29; convert to 64-bit arithmetic */
+        mul64To128(m64, r64, &s64, &discard);
+        mul64To128(s64, r64, &d64, &discard);
+        u64 = three64 - d64;
+
+        mul64To128(s64, u64, &s64, &discard);  /* 3.61 */
+        s64 = (s64 - 2) >> 9;                  /* 12.52 */
+
+        /* Compute nearest rounded result */
+        uint64_t d0 = (m64 << 42) - s64 * s64;
+        uint64_t d1 = s64 - d0;
+        uint64_t d2 = d1 + s64 + 1;
+        s64 += d1 >> 63;
+        a->frac_hi = s64 << (64 - 54);
+
+        /* increment or decrement for inexact */
+        if (d2 != 0) {
+            a->frac_hi += ((int64_t)(d1 ^ d2) < 0 ? -1 : 1);
+        }
+        goto done;
+    }
+
+    r64 = (uint64_t)r32 * u32 * 2;
+    /* |r*sqrt(m) - 1| < 0x1.7Bp-16; convert to 64-bit arithmetic */
+
+    mul64To128(m64, r64, &s64, &discard);
+    mul64To128(s64, r64, &d64, &discard);
+    u64 = three64 - d64;
+    mul64To128(u64, r64, &r64, &discard);
+    r64 <<= 1;
+    /* |r*sqrt(m) - 1| < 0x1.a5p-31 */
+
+    mul64To128(m64, r64, &s64, &discard);
+    mul64To128(s64, r64, &d64, &discard);
+    u64 = three64 - d64;
+    mul64To128(u64, r64, &rh, &rl);
+    add128(rh, rl, rh, rl, &rh, &rl);
+    /* |r*sqrt(m) - 1| < 0x1.c001p-59; change to 128-bit arithmetic */
+
+    mul128To256(a->frac_hi, a->frac_lo, rh, rl, &sh, &sl, &discard, &discard);
+    mul128To256(sh, sl, rh, rl, &dh, &dl, &discard, &discard);
+    sub128(three64, 0, dh, dl, &uh, &ul);
+    mul128To256(uh, ul, sh, sl, &sh, &sl, &discard, &discard);  /* 3.125 */
+    /* -0x1p-116 < s - sqrt(m) < 0x3.8001p-125 */
+
+    sub128(sh, sl, 0, 4, &sh, &sl);
+    shift128Right(sh, sl, 13, &sh, &sl);  /* 16.112 */
+    /* s < sqrt(m) < s + 1ulp */
+
+    /* Compute nearest rounded result */
+    mul64To128(sl, sl, &d0h, &d0l);
+    d0h += 2 * sh * sl;
+    sub128(a->frac_lo << 34, 0, d0h, d0l, &d0h, &d0l);
+    sub128(sh, sl, d0h, d0l, &d1h, &d1l);
+    add128(sh, sl, 0, 1, &d2h, &d2l);
+    add128(d2h, d2l, d1h, d1l, &d2h, &d2l);
+    add128(sh, sl, 0, d1h >> 63, &sh, &sl);
+    shift128Left(sh, sl, 128 - 114, &sh, &sl);
+
+    /* increment or decrement for inexact */
+    if (d2h | d2l) {
+        if ((int64_t)(d1h ^ d2h) < 0) {
+            sub128(sh, sl, 0, 1, &sh, &sl);
+        } else {
+            add128(sh, sl, 0, 1, &sh, &sl);
+        }
+    }
+    a->frac_lo = sl;
+    a->frac_hi = sh;
+
+ done:
+    /* Convert back from base 4 to base 2. */
+    a->exp >>= 1;
+    if (!(a->frac_hi & DECOMPOSED_IMPLICIT_BIT)) {
+        frac_add(a, a, a);
+    } else {
+        a->exp += 1;
+    }
+    return;
+
+ d_nan:
+    float_raise(float_flag_invalid, status);
+    parts_default_nan(a, status);
+}
+
 /*
  * Rounds the floating-point value `a' to an integer, and returns the
  * result as a floating-point value. The operation is performed
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 54/72] softfloat: Split out parts_uncanon_normal
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (52 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 53/72] softfloat: Move sqrt_float " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 55/72] softfloat: Reduce FloatFmt Richard Henderson
                   ` (22 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

We will need to treat the non-normal cases of floatx80 specially,
so split out the normal case into a routine we can reuse.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           |  8 ++++++
 fpu/softfloat-parts.c.inc | 56 ++++++++++++++++++++++-----------------
 2 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 75a3bb881d..4df69029ec 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -763,6 +763,14 @@ static void parts128_canonicalize(FloatParts128 *p, float_status *status,
 #define parts_canonicalize(A, S, F) \
     PARTS_GENERIC_64_128(canonicalize, A)(A, S, F)
 
+static void parts64_uncanon_normal(FloatParts64 *p, float_status *status,
+                                   const FloatFmt *fmt);
+static void parts128_uncanon_normal(FloatParts128 *p, float_status *status,
+                                    const FloatFmt *fmt);
+
+#define parts_uncanon_normal(A, S, F) \
+    PARTS_GENERIC_64_128(uncanon_normal, A)(A, S, F)
+
 static void parts64_uncanon(FloatParts64 *p, float_status *status,
                             const FloatFmt *fmt);
 static void parts128_uncanon(FloatParts128 *p, float_status *status,
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 293029711c..65462bf6cb 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -140,8 +140,8 @@ static void partsN(canonicalize)(FloatPartsN *p, float_status *status,
  * fraction; these bits will be removed. The exponent will be biased
  * by EXP_BIAS and must be bounded by [EXP_MAX-1, 0].
  */
-static void partsN(uncanon)(FloatPartsN *p, float_status *s,
-                            const FloatFmt *fmt)
+static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
+                                   const FloatFmt *fmt)
 {
     const int exp_max = fmt->exp_max;
     const int frac_shift = fmt->frac_shift;
@@ -153,29 +153,6 @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
     bool overflow_norm;
     int exp, flags = 0;
 
-    if (unlikely(p->cls != float_class_normal)) {
-        switch (p->cls) {
-        case float_class_zero:
-            p->exp = 0;
-            frac_clear(p);
-            return;
-        case float_class_inf:
-            g_assert(!fmt->arm_althp);
-            p->exp = fmt->exp_max;
-            frac_clear(p);
-            return;
-        case float_class_qnan:
-        case float_class_snan:
-            g_assert(!fmt->arm_althp);
-            p->exp = fmt->exp_max;
-            frac_shr(p, fmt->frac_shift);
-            return;
-        default:
-            break;
-        }
-        g_assert_not_reached();
-    }
-
     switch (s->float_rounding_mode) {
     case float_round_nearest_even:
         overflow_norm = false;
@@ -282,6 +259,35 @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
     float_raise(flags, s);
 }
 
+static void partsN(uncanon)(FloatPartsN *p, float_status *s,
+                            const FloatFmt *fmt)
+{
+    if (likely(p->cls == float_class_normal)) {
+        parts_uncanon_normal(p, s, fmt);
+    } else {
+        switch (p->cls) {
+        case float_class_zero:
+            p->exp = 0;
+            frac_clear(p);
+            return;
+        case float_class_inf:
+            g_assert(!fmt->arm_althp);
+            p->exp = fmt->exp_max;
+            frac_clear(p);
+            return;
+        case float_class_qnan:
+        case float_class_snan:
+            g_assert(!fmt->arm_althp);
+            p->exp = fmt->exp_max;
+            frac_shr(p, fmt->frac_shift);
+            return;
+        default:
+            break;
+        }
+        g_assert_not_reached();
+    }
+}
+
 /*
  * Returns the result of adding or subtracting the values of the
  * floating-point values `a' and `b'. The operation is performed
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 55/72] softfloat: Reduce FloatFmt
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (53 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 54/72] softfloat: Split out parts_uncanon_normal Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 56/72] softfloat: Introduce Floatx80RoundPrec Richard Henderson
                   ` (21 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Remove frac_lsb, frac_lsbm1, roundeven_mask.  Compute
these from round_mask in parts$N_uncanon_normal.

With floatx80, round_mask will not be tied to frac_shift.
Everything else is easily computable.
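
To check that the derived values match the removed FLOAT_PARAMS fields, take
float32: frac_size = 23, so frac_shift = (-23 - 1) & 63 = 40 and
round_mask = (1ull << 40) - 1.  A quick verification sketch (illustrative,
not QEMU code):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t round_mask = (1ull << 40) - 1;        /* float32 */

        uint64_t frac_lsb       = round_mask + 1;
        uint64_t frac_lsbm1     = round_mask ^ (round_mask >> 1);
        uint64_t roundeven_mask = round_mask | frac_lsb;

        assert(frac_lsb == 1ull << 40);             /* old: 1ull << ((-23 - 1) & 63) */
        assert(frac_lsbm1 == 1ull << 39);           /* old: 1ull << ((-23 - 2) & 63) */
        assert(roundeven_mask == (2ull << 40) - 1); /* old: (2ull << ((-23 - 1) & 63)) - 1 */
        return 0;
    }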

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 29 ++++++++++++-----------------
 fpu/softfloat-parts.c.inc |  6 +++---
 2 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 4df69029ec..6a77e35663 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -562,9 +562,7 @@ typedef struct {
  *   frac_size: the size of the fraction field
  *   frac_shift: shift to normalise the fraction with DECOMPOSED_BINARY_POINT
  * The following are computed based the size of fraction
- *   frac_lsb: least significant bit of fraction
- *   frac_lsbm1: the bit below the least significant bit (for rounding)
- *   round_mask/roundeven_mask: masks used for rounding
+ *   round_mask: bits below lsb which must be rounded
  * The following optional modifiers are available:
  *   arm_althp: handle ARM Alternative Half Precision
  */
@@ -574,24 +572,21 @@ typedef struct {
     int exp_max;
     int frac_size;
     int frac_shift;
-    uint64_t frac_lsb;
-    uint64_t frac_lsbm1;
-    uint64_t round_mask;
-    uint64_t roundeven_mask;
     bool arm_althp;
+    uint64_t round_mask;
 } FloatFmt;
 
 /* Expand fields based on the size of exponent and fraction */
-#define FLOAT_PARAMS(E, F)                                           \
-    .exp_size       = E,                                             \
-    .exp_bias       = ((1 << E) - 1) >> 1,                           \
-    .exp_max        = (1 << E) - 1,                                  \
-    .frac_size      = F,                                             \
-    .frac_shift     = (-F - 1) & 63,                                 \
-    .frac_lsb       = 1ull << ((-F - 1) & 63),                       \
-    .frac_lsbm1     = 1ull << ((-F - 2) & 63),                       \
-    .round_mask     = (1ull << ((-F - 1) & 63)) - 1,                 \
-    .roundeven_mask = (2ull << ((-F - 1) & 63)) - 1
+#define FLOAT_PARAMS_(E, F)                             \
+    .exp_size       = E,                                \
+    .exp_bias       = ((1 << E) - 1) >> 1,              \
+    .exp_max        = (1 << E) - 1,                     \
+    .frac_size      = F
+
+#define FLOAT_PARAMS(E, F)                              \
+    FLOAT_PARAMS_(E, F),                                \
+    .frac_shift     = (-F - 1) & 63,                    \
+    .round_mask     = (1ull << ((-F - 1) & 63)) - 1
 
 static const FloatFmt float16_params = {
     FLOAT_PARAMS(5, 10)
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 65462bf6cb..3ee6552d5a 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -145,10 +145,10 @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
 {
     const int exp_max = fmt->exp_max;
     const int frac_shift = fmt->frac_shift;
-    const uint64_t frac_lsb = fmt->frac_lsb;
-    const uint64_t frac_lsbm1 = fmt->frac_lsbm1;
     const uint64_t round_mask = fmt->round_mask;
-    const uint64_t roundeven_mask = fmt->roundeven_mask;
+    const uint64_t frac_lsb = round_mask + 1;
+    const uint64_t frac_lsbm1 = round_mask ^ (round_mask >> 1);
+    const uint64_t roundeven_mask = round_mask | frac_lsb;
     uint64_t inc;
     bool overflow_norm;
     int exp, flags = 0;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 56/72] softfloat: Introduce Floatx80RoundPrec
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (54 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 55/72] softfloat: Reduce FloatFmt Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 57/72] softfloat: Adjust parts_uncanon_normal for floatx80 Richard Henderson
                   ` (20 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Use an enumeration instead of raw 32/64/80 values.
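
The old raw values map as 32 -> floatx80_precision_s, 64 ->
floatx80_precision_d, and 80 -> floatx80_precision_x.  The common
save/set/restore idiom that this patch touches in many places then reads
as below; a sketch using the helpers from softfloat-helpers.h, with
`status` standing in for whichever float_status the caller owns:

    FloatX80RoundPrec save_prec = get_floatx80_rounding_precision(status);
    set_floatx80_rounding_precision(floatx80_precision_x, status);
    /* ... operate at full extended precision ... */
    set_floatx80_rounding_precision(save_prec, status);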

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-helpers.h |  5 +-
 include/fpu/softfloat-types.h   | 10 +++-
 include/fpu/softfloat.h         |  4 +-
 fpu/softfloat.c                 | 32 ++++++------
 linux-user/arm/nwfpe/fpa11.c    | 41 +++++++--------
 target/i386/tcg/fpu_helper.c    | 79 +++++++++++++++++------------
 target/m68k/fpu_helper.c        | 50 +++++++++---------
 target/m68k/softfloat.c         | 90 ++++++++++++++++++++-------------
 tests/fp/fp-test.c              |  5 +-
 9 files changed, 182 insertions(+), 134 deletions(-)

diff --git a/include/fpu/softfloat-helpers.h b/include/fpu/softfloat-helpers.h
index 2f0674fbdd..34f4cf92ae 100644
--- a/include/fpu/softfloat-helpers.h
+++ b/include/fpu/softfloat-helpers.h
@@ -69,7 +69,7 @@ static inline void set_float_exception_flags(int val, float_status *status)
     status->float_exception_flags = val;
 }
 
-static inline void set_floatx80_rounding_precision(int val,
+static inline void set_floatx80_rounding_precision(FloatX80RoundPrec val,
                                                    float_status *status)
 {
     status->floatx80_rounding_precision = val;
@@ -120,7 +120,8 @@ static inline int get_float_exception_flags(float_status *status)
     return status->float_exception_flags;
 }
 
-static inline int get_floatx80_rounding_precision(float_status *status)
+static inline FloatX80RoundPrec
+get_floatx80_rounding_precision(float_status *status)
 {
     return status->floatx80_rounding_precision;
 }
diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index 8a3f20fae9..1f83378c20 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -152,6 +152,14 @@ enum {
     float_flag_output_denormal = 128
 };
 
+/*
+ * Rounding precision for floatx80.
+ */
+typedef enum __attribute__((__packed__)) {
+    floatx80_precision_x,
+    floatx80_precision_d,
+    floatx80_precision_s,
+} FloatX80RoundPrec;
 
 /*
  * Floating Point Status. Individual architectures may maintain
@@ -163,7 +171,7 @@ enum {
 typedef struct float_status {
     FloatRoundMode float_rounding_mode;
     uint8_t     float_exception_flags;
-    signed char floatx80_rounding_precision;
+    FloatX80RoundPrec floatx80_rounding_precision;
     bool tininess_before_rounding;
     /* should denormalised results go to zero and set the inexact flag? */
     bool flush_to_zero;
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 53f2c2ea3c..94f7841b9f 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -1152,7 +1152,7 @@ floatx80 propagateFloatx80NaN(floatx80 a, floatx80 b, float_status *status);
 | Floating-Point Arithmetic.
 *----------------------------------------------------------------------------*/
 
-floatx80 roundAndPackFloatx80(int8_t roundingPrecision, bool zSign,
+floatx80 roundAndPackFloatx80(FloatX80RoundPrec roundingPrecision, bool zSign,
                               int32_t zExp, uint64_t zSig0, uint64_t zSig1,
                               float_status *status);
 
@@ -1165,7 +1165,7 @@ floatx80 roundAndPackFloatx80(int8_t roundingPrecision, bool zSign,
 | normalized.
 *----------------------------------------------------------------------------*/
 
-floatx80 normalizeRoundAndPackFloatx80(int8_t roundingPrecision,
+floatx80 normalizeRoundAndPackFloatx80(FloatX80RoundPrec roundingPrecision,
                                        bool zSign, int32_t zExp,
                                        uint64_t zSig0, uint64_t zSig1,
                                        float_status *status);
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 6a77e35663..441b8f9dc1 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -4344,10 +4344,10 @@ void normalizeFloatx80Subnormal(uint64_t aSig, int32_t *zExpPtr,
 | a subnormal number, and the underflow and inexact exceptions are raised if
 | the abstract input cannot be represented exactly as a subnormal extended
 | double-precision floating-point number.
-|     If `roundingPrecision' is 32 or 64, the result is rounded to the same
-| number of bits as single or double precision, respectively.  Otherwise, the
-| result is rounded to the full precision of the extended double-precision
-| format.
+|     If `roundingPrecision' is floatx80_precision_s or floatx80_precision_d,
+| the result is rounded to the same number of bits as single or double
+| precision, respectively.  Otherwise, the result is rounded to the full
+| precision of the extended double-precision format.
 |     The input significand must be normalized or smaller.  If the input
 | significand is not normalized, `zExp' must be 0; in that case, the result
 | returned is a subnormal number, and it must not require rounding.  The
@@ -4355,27 +4355,29 @@ void normalizeFloatx80Subnormal(uint64_t aSig, int32_t *zExpPtr,
 | Floating-Point Arithmetic.
 *----------------------------------------------------------------------------*/
 
-floatx80 roundAndPackFloatx80(int8_t roundingPrecision, bool zSign,
+floatx80 roundAndPackFloatx80(FloatX80RoundPrec roundingPrecision, bool zSign,
                               int32_t zExp, uint64_t zSig0, uint64_t zSig1,
                               float_status *status)
 {
-    int8_t roundingMode;
+    FloatRoundMode roundingMode;
     bool roundNearestEven, increment, isTiny;
     int64_t roundIncrement, roundMask, roundBits;
 
     roundingMode = status->float_rounding_mode;
     roundNearestEven = ( roundingMode == float_round_nearest_even );
-    if ( roundingPrecision == 80 ) goto precision80;
-    if ( roundingPrecision == 64 ) {
+    switch (roundingPrecision) {
+    case floatx80_precision_x:
+        goto precision80;
+    case floatx80_precision_d:
         roundIncrement = UINT64_C(0x0000000000000400);
         roundMask = UINT64_C(0x00000000000007FF);
-    }
-    else if ( roundingPrecision == 32 ) {
+        break;
+    case floatx80_precision_s:
         roundIncrement = UINT64_C(0x0000008000000000);
         roundMask = UINT64_C(0x000000FFFFFFFFFF);
-    }
-    else {
-        goto precision80;
+        break;
+    default:
+        g_assert_not_reached();
     }
     zSig0 |= ( zSig1 != 0 );
     switch (roundingMode) {
@@ -4552,7 +4554,7 @@ floatx80 roundAndPackFloatx80(int8_t roundingPrecision, bool zSign,
 | normalized.
 *----------------------------------------------------------------------------*/
 
-floatx80 normalizeRoundAndPackFloatx80(int8_t roundingPrecision,
+floatx80 normalizeRoundAndPackFloatx80(FloatX80RoundPrec roundingPrecision,
                                        bool zSign, int32_t zExp,
                                        uint64_t zSig0, uint64_t zSig1,
                                        float_status *status)
@@ -6205,7 +6207,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod, uint64_t *quotient,
     }
     return
         normalizeRoundAndPackFloatx80(
-            80, zSign, bExp + expDiff, aSig0, aSig1, status);
+            floatx80_precision_x, zSign, bExp + expDiff, aSig0, aSig1, status);
 
 }
 
diff --git a/linux-user/arm/nwfpe/fpa11.c b/linux-user/arm/nwfpe/fpa11.c
index f6f8163eab..9a93610d24 100644
--- a/linux-user/arm/nwfpe/fpa11.c
+++ b/linux-user/arm/nwfpe/fpa11.c
@@ -97,37 +97,38 @@ void SetRoundingMode(const unsigned int opcode)
 
 void SetRoundingPrecision(const unsigned int opcode)
 {
-    int rounding_precision;
-   FPA11 *fpa11 = GET_FPA11();
+    FloatX80RoundPrec rounding_precision;
+    FPA11 *fpa11 = GET_FPA11();
 #ifdef MAINTAIN_FPCR
-   fpa11->fpcr &= ~MASK_ROUNDING_PRECISION;
+    fpa11->fpcr &= ~MASK_ROUNDING_PRECISION;
 #endif
-   switch (opcode & MASK_ROUNDING_PRECISION)
-   {
-      case ROUND_SINGLE:
-         rounding_precision = 32;
+    switch (opcode & MASK_ROUNDING_PRECISION) {
+    case ROUND_SINGLE:
+        rounding_precision = floatx80_precision_s;
 #ifdef MAINTAIN_FPCR
-         fpa11->fpcr |= ROUND_SINGLE;
+        fpa11->fpcr |= ROUND_SINGLE;
 #endif
-      break;
+        break;
 
-      case ROUND_DOUBLE:
-         rounding_precision = 64;
+    case ROUND_DOUBLE:
+        rounding_precision = floatx80_precision_d;
 #ifdef MAINTAIN_FPCR
-         fpa11->fpcr |= ROUND_DOUBLE;
+        fpa11->fpcr |= ROUND_DOUBLE;
 #endif
-      break;
+        break;
 
-      case ROUND_EXTENDED:
-         rounding_precision = 80;
+    case ROUND_EXTENDED:
+        rounding_precision = floatx80_precision_x;
 #ifdef MAINTAIN_FPCR
-         fpa11->fpcr |= ROUND_EXTENDED;
+        fpa11->fpcr |= ROUND_EXTENDED;
 #endif
-      break;
+        break;
 
-      default: rounding_precision = 80;
-  }
-   set_floatx80_rounding_precision(rounding_precision, &fpa11->fp_status);
+    default:
+        rounding_precision = floatx80_precision_x;
+        break;
+    }
+    set_floatx80_rounding_precision(rounding_precision, &fpa11->fp_status);
 }
 
 /* Emulate the instruction in the opcode. */
diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index 60ed93520a..e495ddf1aa 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -711,38 +711,40 @@ uint32_t helper_fnstcw(CPUX86State *env)
 
 void update_fp_status(CPUX86State *env)
 {
-    int rnd_type;
+    FloatRoundMode rnd_mode;
+    FloatX80RoundPrec rnd_prec;
 
     /* set rounding mode */
     switch (env->fpuc & FPU_RC_MASK) {
     default:
     case FPU_RC_NEAR:
-        rnd_type = float_round_nearest_even;
+        rnd_mode = float_round_nearest_even;
         break;
     case FPU_RC_DOWN:
-        rnd_type = float_round_down;
+        rnd_mode = float_round_down;
         break;
     case FPU_RC_UP:
-        rnd_type = float_round_up;
+        rnd_mode = float_round_up;
         break;
     case FPU_RC_CHOP:
-        rnd_type = float_round_to_zero;
+        rnd_mode = float_round_to_zero;
         break;
     }
-    set_float_rounding_mode(rnd_type, &env->fp_status);
+    set_float_rounding_mode(rnd_mode, &env->fp_status);
+
     switch ((env->fpuc >> 8) & 3) {
     case 0:
-        rnd_type = 32;
+        rnd_prec = floatx80_precision_s;
         break;
     case 2:
-        rnd_type = 64;
+        rnd_prec = floatx80_precision_d;
         break;
     case 3:
     default:
-        rnd_type = 80;
+        rnd_prec = floatx80_precision_x;
         break;
     }
-    set_floatx80_rounding_precision(rnd_type, &env->fp_status);
+    set_floatx80_rounding_precision(rnd_prec, &env->fp_status);
 }
 
 void helper_fldcw(CPUX86State *env, uint32_t val)
@@ -1112,7 +1114,8 @@ void helper_f2xm1(CPUX86State *env)
                             &sig2);
             /* This result is inexact.  */
             sig1 |= 1;
-            ST0 = normalizeRoundAndPackFloatx80(80, sign, exp, sig0, sig1,
+            ST0 = normalizeRoundAndPackFloatx80(floatx80_precision_x,
+                                                sign, exp, sig0, sig1,
                                                 &env->fp_status);
         }
     } else {
@@ -1121,9 +1124,10 @@ void helper_f2xm1(CPUX86State *env)
         int32_t n, aexp, bexp;
         uint64_t asig0, asig1, asig2, bsig0, bsig1;
         FloatRoundMode save_mode = env->fp_status.float_rounding_mode;
-        signed char save_prec = env->fp_status.floatx80_rounding_precision;
+        FloatX80RoundPrec save_prec =
+            env->fp_status.floatx80_rounding_precision;
         env->fp_status.float_rounding_mode = float_round_nearest_even;
-        env->fp_status.floatx80_rounding_precision = 80;
+        env->fp_status.floatx80_rounding_precision = floatx80_precision_x;
 
         /* Find the nearest multiple of 1/32 to the argument.  */
         tmp = floatx80_scalbn(ST0, 5, &env->fp_status);
@@ -1221,7 +1225,8 @@ void helper_f2xm1(CPUX86State *env)
             env->fp_status.float_rounding_mode = save_mode;
             /* This result is inexact.  */
             asig1 |= 1;
-            ST0 = normalizeRoundAndPackFloatx80(80, asign, aexp, asig0, asig1,
+            ST0 = normalizeRoundAndPackFloatx80(floatx80_precision_x,
+                                                asign, aexp, asig0, asig1,
                                                 &env->fp_status);
         }
 
@@ -1339,8 +1344,9 @@ void helper_fpatan(CPUX86State *env)
          * division is exact, the result of fpatan is still inexact
          * (and underflowing where appropriate).
          */
-        signed char save_prec = env->fp_status.floatx80_rounding_precision;
-        env->fp_status.floatx80_rounding_precision = 80;
+        FloatX80RoundPrec save_prec =
+            env->fp_status.floatx80_rounding_precision;
+        env->fp_status.floatx80_rounding_precision = floatx80_precision_x;
         ST1 = floatx80_div(ST1, ST0, &env->fp_status);
         env->fp_status.floatx80_rounding_precision = save_prec;
         if (!floatx80_is_zero(ST1) &&
@@ -1359,7 +1365,8 @@ void helper_fpatan(CPUX86State *env)
             if (exp == 0) {
                 normalizeFloatx80Subnormal(sig, &exp, &sig);
             }
-            ST1 = normalizeRoundAndPackFloatx80(80, sign, exp, sig - 1,
+            ST1 = normalizeRoundAndPackFloatx80(floatx80_precision_x,
+                                                sign, exp, sig - 1,
                                                 -1, &env->fp_status);
         }
     } else {
@@ -1415,9 +1422,10 @@ void helper_fpatan(CPUX86State *env)
             uint64_t azsig2, azsig3, axsig0, axsig1;
             floatx80 x8;
             FloatRoundMode save_mode = env->fp_status.float_rounding_mode;
-            signed char save_prec = env->fp_status.floatx80_rounding_precision;
+            FloatX80RoundPrec save_prec =
+                env->fp_status.floatx80_rounding_precision;
             env->fp_status.float_rounding_mode = float_round_nearest_even;
-            env->fp_status.floatx80_rounding_precision = 80;
+            env->fp_status.floatx80_rounding_precision = floatx80_precision_x;
 
             if (arg0_exp == 0) {
                 normalizeFloatx80Subnormal(arg0_sig, &arg0_exp, &arg0_sig);
@@ -1486,7 +1494,8 @@ void helper_fpatan(CPUX86State *env)
              * Split x as x = t + y, where t = n/8 is the nearest
              * multiple of 1/8 to x.
              */
-            x8 = normalizeRoundAndPackFloatx80(80, false, xexp + 3, xsig0,
+            x8 = normalizeRoundAndPackFloatx80(floatx80_precision_x,
+                                               false, xexp + 3, xsig0,
                                                xsig1, &env->fp_status);
             n = floatx80_to_int32(x8, &env->fp_status);
             if (n == 0) {
@@ -1607,7 +1616,7 @@ void helper_fpatan(CPUX86State *env)
                 /* Compute z^2.  */
                 mul128To256(zsig0, zsig1, zsig0, zsig1,
                             &z2sig0, &z2sig1, &z2sig2, &z2sig3);
-                z2 = normalizeRoundAndPackFloatx80(80, false,
+                z2 = normalizeRoundAndPackFloatx80(floatx80_precision_x, false,
                                                    zexp + zexp - 0x3ffe,
                                                    z2sig0, z2sig1,
                                                    &env->fp_status);
@@ -1727,7 +1736,7 @@ void helper_fpatan(CPUX86State *env)
         }
         /* This result is inexact.  */
         rsig1 |= 1;
-        ST1 = normalizeRoundAndPackFloatx80(80, rsign, rexp,
+        ST1 = normalizeRoundAndPackFloatx80(floatx80_precision_x, rsign, rexp,
                                             rsig0, rsig1, &env->fp_status);
     }
 
@@ -1928,7 +1937,8 @@ static void helper_fyl2x_common(CPUX86State *env, floatx80 arg, int32_t *exp,
      */
     mul128To256(tsig0, tsig1, tsig0, tsig1,
                 &t2sig0, &t2sig1, &t2sig2, &t2sig3);
-    t2 = normalizeRoundAndPackFloatx80(80, false, texp + texp - 0x3ffe,
+    t2 = normalizeRoundAndPackFloatx80(floatx80_precision_x, false,
+                                       texp + texp - 0x3ffe,
                                        t2sig0, t2sig1, &env->fp_status);
 
     /* Compute the lower parts of the polynomial expansion.  */
@@ -2042,15 +2052,17 @@ void helper_fyl2xp1(CPUX86State *env)
         exp += arg1_exp - 0x3ffe;
         /* This result is inexact.  */
         sig1 |= 1;
-        ST1 = normalizeRoundAndPackFloatx80(80, arg0_sign ^ arg1_sign, exp,
+        ST1 = normalizeRoundAndPackFloatx80(floatx80_precision_x,
+                                            arg0_sign ^ arg1_sign, exp,
                                             sig0, sig1, &env->fp_status);
     } else {
         int32_t aexp;
         uint64_t asig0, asig1, asig2;
         FloatRoundMode save_mode = env->fp_status.float_rounding_mode;
-        signed char save_prec = env->fp_status.floatx80_rounding_precision;
+        FloatX80RoundPrec save_prec =
+            env->fp_status.floatx80_rounding_precision;
         env->fp_status.float_rounding_mode = float_round_nearest_even;
-        env->fp_status.floatx80_rounding_precision = 80;
+        env->fp_status.floatx80_rounding_precision = floatx80_precision_x;
 
         helper_fyl2x_common(env, ST0, &aexp, &asig0, &asig1);
         /*
@@ -2065,7 +2077,8 @@ void helper_fyl2xp1(CPUX86State *env)
         /* This result is inexact.  */
         asig1 |= 1;
         env->fp_status.float_rounding_mode = save_mode;
-        ST1 = normalizeRoundAndPackFloatx80(80, arg0_sign ^ arg1_sign, aexp,
+        ST1 = normalizeRoundAndPackFloatx80(floatx80_precision_x,
+                                            arg0_sign ^ arg1_sign, aexp,
                                             asig0, asig1, &env->fp_status);
         env->fp_status.floatx80_rounding_precision = save_prec;
     }
@@ -2149,9 +2162,10 @@ void helper_fyl2x(CPUX86State *env)
         int32_t int_exp;
         floatx80 arg0_m1;
         FloatRoundMode save_mode = env->fp_status.float_rounding_mode;
-        signed char save_prec = env->fp_status.floatx80_rounding_precision;
+        FloatX80RoundPrec save_prec =
+            env->fp_status.floatx80_rounding_precision;
         env->fp_status.float_rounding_mode = float_round_nearest_even;
-        env->fp_status.floatx80_rounding_precision = 80;
+        env->fp_status.floatx80_rounding_precision = floatx80_precision_x;
 
         if (arg0_exp == 0) {
             normalizeFloatx80Subnormal(arg0_sig, &arg0_exp, &arg0_sig);
@@ -2208,7 +2222,8 @@ void helper_fyl2x(CPUX86State *env)
             /* This result is inexact.  */
             asig1 |= 1;
             env->fp_status.float_rounding_mode = save_mode;
-            ST1 = normalizeRoundAndPackFloatx80(80, asign ^ arg1_sign, aexp,
+            ST1 = normalizeRoundAndPackFloatx80(floatx80_precision_x,
+                                                asign ^ arg1_sign, aexp,
                                                 asig0, asig1, &env->fp_status);
         }
 
@@ -2290,12 +2305,12 @@ void helper_fscale(CPUX86State *env)
         }
     } else {
         int n;
-        signed char save = env->fp_status.floatx80_rounding_precision;
+        FloatX80RoundPrec save = env->fp_status.floatx80_rounding_precision;
         uint8_t save_flags = get_float_exception_flags(&env->fp_status);
         set_float_exception_flags(0, &env->fp_status);
         n = floatx80_to_int32_round_to_zero(ST1, &env->fp_status);
         set_float_exception_flags(save_flags, &env->fp_status);
-        env->fp_status.floatx80_rounding_precision = 80;
+        env->fp_status.floatx80_rounding_precision = floatx80_precision_x;
         ST0 = floatx80_scalbn(ST0, n, &env->fp_status);
         env->fp_status.floatx80_rounding_precision = save;
     }
diff --git a/target/m68k/fpu_helper.c b/target/m68k/fpu_helper.c
index 797000e748..fdc4937e29 100644
--- a/target/m68k/fpu_helper.c
+++ b/target/m68k/fpu_helper.c
@@ -94,13 +94,13 @@ static void m68k_restore_precision_mode(CPUM68KState *env)
 {
     switch (env->fpcr & FPCR_PREC_MASK) {
     case FPCR_PREC_X: /* extended */
-        set_floatx80_rounding_precision(80, &env->fp_status);
+        set_floatx80_rounding_precision(floatx80_precision_x, &env->fp_status);
         break;
     case FPCR_PREC_S: /* single */
-        set_floatx80_rounding_precision(32, &env->fp_status);
+        set_floatx80_rounding_precision(floatx80_precision_s, &env->fp_status);
         break;
     case FPCR_PREC_D: /* double */
-        set_floatx80_rounding_precision(64, &env->fp_status);
+        set_floatx80_rounding_precision(floatx80_precision_d, &env->fp_status);
         break;
     case FPCR_PREC_U: /* undefined */
     default:
@@ -111,9 +111,9 @@ static void m68k_restore_precision_mode(CPUM68KState *env)
 static void cf_restore_precision_mode(CPUM68KState *env)
 {
     if (env->fpcr & FPCR_PREC_S) { /* single */
-        set_floatx80_rounding_precision(32, &env->fp_status);
+        set_floatx80_rounding_precision(floatx80_precision_s, &env->fp_status);
     } else { /* double */
-        set_floatx80_rounding_precision(64, &env->fp_status);
+        set_floatx80_rounding_precision(floatx80_precision_d, &env->fp_status);
     }
 }
 
@@ -166,8 +166,8 @@ void HELPER(set_fpcr)(CPUM68KState *env, uint32_t val)
 
 #define PREC_BEGIN(prec)                                        \
     do {                                                        \
-        int old;                                                \
-        old = get_floatx80_rounding_precision(&env->fp_status); \
+        FloatX80RoundPrec old =                                 \
+            get_floatx80_rounding_precision(&env->fp_status);   \
         set_floatx80_rounding_precision(prec, &env->fp_status)  \
 
 #define PREC_END()                                              \
@@ -176,14 +176,14 @@ void HELPER(set_fpcr)(CPUM68KState *env, uint32_t val)
 
 void HELPER(fsround)(CPUM68KState *env, FPReg *res, FPReg *val)
 {
-    PREC_BEGIN(32);
+    PREC_BEGIN(floatx80_precision_s);
     res->d = floatx80_round(val->d, &env->fp_status);
     PREC_END();
 }
 
 void HELPER(fdround)(CPUM68KState *env, FPReg *res, FPReg *val)
 {
-    PREC_BEGIN(64);
+    PREC_BEGIN(floatx80_precision_d);
     res->d = floatx80_round(val->d, &env->fp_status);
     PREC_END();
 }
@@ -195,14 +195,14 @@ void HELPER(fsqrt)(CPUM68KState *env, FPReg *res, FPReg *val)
 
 void HELPER(fssqrt)(CPUM68KState *env, FPReg *res, FPReg *val)
 {
-    PREC_BEGIN(32);
+    PREC_BEGIN(floatx80_precision_s);
     res->d = floatx80_sqrt(val->d, &env->fp_status);
     PREC_END();
 }
 
 void HELPER(fdsqrt)(CPUM68KState *env, FPReg *res, FPReg *val)
 {
-    PREC_BEGIN(64);
+    PREC_BEGIN(floatx80_precision_d);
     res->d = floatx80_sqrt(val->d, &env->fp_status);
     PREC_END();
 }
@@ -214,14 +214,14 @@ void HELPER(fabs)(CPUM68KState *env, FPReg *res, FPReg *val)
 
 void HELPER(fsabs)(CPUM68KState *env, FPReg *res, FPReg *val)
 {
-    PREC_BEGIN(32);
+    PREC_BEGIN(floatx80_precision_s);
     res->d = floatx80_round(floatx80_abs(val->d), &env->fp_status);
     PREC_END();
 }
 
 void HELPER(fdabs)(CPUM68KState *env, FPReg *res, FPReg *val)
 {
-    PREC_BEGIN(64);
+    PREC_BEGIN(floatx80_precision_d);
     res->d = floatx80_round(floatx80_abs(val->d), &env->fp_status);
     PREC_END();
 }
@@ -233,14 +233,14 @@ void HELPER(fneg)(CPUM68KState *env, FPReg *res, FPReg *val)
 
 void HELPER(fsneg)(CPUM68KState *env, FPReg *res, FPReg *val)
 {
-    PREC_BEGIN(32);
+    PREC_BEGIN(floatx80_precision_s);
     res->d = floatx80_round(floatx80_chs(val->d), &env->fp_status);
     PREC_END();
 }
 
 void HELPER(fdneg)(CPUM68KState *env, FPReg *res, FPReg *val)
 {
-    PREC_BEGIN(64);
+    PREC_BEGIN(floatx80_precision_d);
     res->d = floatx80_round(floatx80_chs(val->d), &env->fp_status);
     PREC_END();
 }
@@ -252,14 +252,14 @@ void HELPER(fadd)(CPUM68KState *env, FPReg *res, FPReg *val0, FPReg *val1)
 
 void HELPER(fsadd)(CPUM68KState *env, FPReg *res, FPReg *val0, FPReg *val1)
 {
-    PREC_BEGIN(32);
+    PREC_BEGIN(floatx80_precision_s);
     res->d = floatx80_add(val0->d, val1->d, &env->fp_status);
     PREC_END();
 }
 
 void HELPER(fdadd)(CPUM68KState *env, FPReg *res, FPReg *val0, FPReg *val1)
 {
-    PREC_BEGIN(64);
+    PREC_BEGIN(floatx80_precision_d);
     res->d = floatx80_add(val0->d, val1->d, &env->fp_status);
     PREC_END();
 }
@@ -271,14 +271,14 @@ void HELPER(fsub)(CPUM68KState *env, FPReg *res, FPReg *val0, FPReg *val1)
 
 void HELPER(fssub)(CPUM68KState *env, FPReg *res, FPReg *val0, FPReg *val1)
 {
-    PREC_BEGIN(32);
+    PREC_BEGIN(floatx80_precision_s);
     res->d = floatx80_sub(val1->d, val0->d, &env->fp_status);
     PREC_END();
 }
 
 void HELPER(fdsub)(CPUM68KState *env, FPReg *res, FPReg *val0, FPReg *val1)
 {
-    PREC_BEGIN(64);
+    PREC_BEGIN(floatx80_precision_d);
     res->d = floatx80_sub(val1->d, val0->d, &env->fp_status);
     PREC_END();
 }
@@ -290,14 +290,14 @@ void HELPER(fmul)(CPUM68KState *env, FPReg *res, FPReg *val0, FPReg *val1)
 
 void HELPER(fsmul)(CPUM68KState *env, FPReg *res, FPReg *val0, FPReg *val1)
 {
-    PREC_BEGIN(32);
+    PREC_BEGIN(floatx80_precision_s);
     res->d = floatx80_mul(val0->d, val1->d, &env->fp_status);
     PREC_END();
 }
 
 void HELPER(fdmul)(CPUM68KState *env, FPReg *res, FPReg *val0, FPReg *val1)
 {
-    PREC_BEGIN(64);
+    PREC_BEGIN(floatx80_precision_d);
     res->d = floatx80_mul(val0->d, val1->d, &env->fp_status);
     PREC_END();
 }
@@ -307,7 +307,7 @@ void HELPER(fsglmul)(CPUM68KState *env, FPReg *res, FPReg *val0, FPReg *val1)
     FloatRoundMode rounding_mode = get_float_rounding_mode(&env->fp_status);
     floatx80 a, b;
 
-    PREC_BEGIN(32);
+    PREC_BEGIN(floatx80_precision_s);
     set_float_rounding_mode(float_round_to_zero, &env->fp_status);
     a = floatx80_round(val0->d, &env->fp_status);
     b = floatx80_round(val1->d, &env->fp_status);
@@ -323,14 +323,14 @@ void HELPER(fdiv)(CPUM68KState *env, FPReg *res, FPReg *val0, FPReg *val1)
 
 void HELPER(fsdiv)(CPUM68KState *env, FPReg *res, FPReg *val0, FPReg *val1)
 {
-    PREC_BEGIN(32);
+    PREC_BEGIN(floatx80_precision_s);
     res->d = floatx80_div(val1->d, val0->d, &env->fp_status);
     PREC_END();
 }
 
 void HELPER(fddiv)(CPUM68KState *env, FPReg *res, FPReg *val0, FPReg *val1)
 {
-    PREC_BEGIN(64);
+    PREC_BEGIN(floatx80_precision_d);
     res->d = floatx80_div(val1->d, val0->d, &env->fp_status);
     PREC_END();
 }
@@ -340,7 +340,7 @@ void HELPER(fsgldiv)(CPUM68KState *env, FPReg *res, FPReg *val0, FPReg *val1)
     FloatRoundMode rounding_mode = get_float_rounding_mode(&env->fp_status);
     floatx80 a, b;
 
-    PREC_BEGIN(32);
+    PREC_BEGIN(floatx80_precision_s);
     set_float_rounding_mode(float_round_to_zero, &env->fp_status);
     a = floatx80_round(val1->d, &env->fp_status);
     b = floatx80_round(val0->d, &env->fp_status);
diff --git a/target/m68k/softfloat.c b/target/m68k/softfloat.c
index b6d0ed7acf..02dcc03d15 100644
--- a/target/m68k/softfloat.c
+++ b/target/m68k/softfloat.c
@@ -227,7 +227,8 @@ floatx80 floatx80_lognp1(floatx80 a, float_status *status)
     int32_t aExp;
     uint64_t aSig, fSig;
 
-    int8_t user_rnd_mode, user_rnd_prec;
+    FloatRoundMode user_rnd_mode;
+    FloatX80RoundPrec user_rnd_prec;
 
     int32_t compact, j, k;
     floatx80 fp0, fp1, fp2, fp3, f, logof2, klog2, saveu;
@@ -270,7 +271,7 @@ floatx80 floatx80_lognp1(floatx80 a, float_status *status)
     user_rnd_mode = status->float_rounding_mode;
     user_rnd_prec = status->floatx80_rounding_precision;
     status->float_rounding_mode = float_round_nearest_even;
-    status->floatx80_rounding_precision = 80;
+    status->floatx80_rounding_precision = floatx80_precision_x;
 
     compact = floatx80_make_compact(aExp, aSig);
 
@@ -426,7 +427,8 @@ floatx80 floatx80_logn(floatx80 a, float_status *status)
     int32_t aExp;
     uint64_t aSig, fSig;
 
-    int8_t user_rnd_mode, user_rnd_prec;
+    FloatRoundMode user_rnd_mode;
+    FloatX80RoundPrec user_rnd_prec;
 
     int32_t compact, j, k, adjk;
     floatx80 fp0, fp1, fp2, fp3, f, logof2, klog2, saveu;
@@ -469,7 +471,7 @@ floatx80 floatx80_logn(floatx80 a, float_status *status)
     user_rnd_mode = status->float_rounding_mode;
     user_rnd_prec = status->floatx80_rounding_precision;
     status->float_rounding_mode = float_round_nearest_even;
-    status->floatx80_rounding_precision = 80;
+    status->floatx80_rounding_precision = floatx80_precision_x;
 
     compact = floatx80_make_compact(aExp, aSig);
 
@@ -594,7 +596,8 @@ floatx80 floatx80_log10(floatx80 a, float_status *status)
     int32_t aExp;
     uint64_t aSig;
 
-    int8_t user_rnd_mode, user_rnd_prec;
+    FloatRoundMode user_rnd_mode;
+    FloatX80RoundPrec user_rnd_prec;
 
     floatx80 fp0, fp1;
 
@@ -626,7 +629,7 @@ floatx80 floatx80_log10(floatx80 a, float_status *status)
     user_rnd_mode = status->float_rounding_mode;
     user_rnd_prec = status->floatx80_rounding_precision;
     status->float_rounding_mode = float_round_nearest_even;
-    status->floatx80_rounding_precision = 80;
+    status->floatx80_rounding_precision = floatx80_precision_x;
 
     fp0 = floatx80_logn(a, status);
     fp1 = packFloatx80(0, 0x3FFD, UINT64_C(0xDE5BD8A937287195)); /* INV_L10 */
@@ -651,7 +654,8 @@ floatx80 floatx80_log2(floatx80 a, float_status *status)
     int32_t aExp;
     uint64_t aSig;
 
-    int8_t user_rnd_mode, user_rnd_prec;
+    FloatRoundMode user_rnd_mode;
+    FloatX80RoundPrec user_rnd_prec;
 
     floatx80 fp0, fp1;
 
@@ -686,7 +690,7 @@ floatx80 floatx80_log2(floatx80 a, float_status *status)
     user_rnd_mode = status->float_rounding_mode;
     user_rnd_prec = status->floatx80_rounding_precision;
     status->float_rounding_mode = float_round_nearest_even;
-    status->floatx80_rounding_precision = 80;
+    status->floatx80_rounding_precision = floatx80_precision_x;
 
     if (aSig == one_sig) { /* X is 2^k */
         status->float_rounding_mode = user_rnd_mode;
@@ -718,7 +722,8 @@ floatx80 floatx80_etox(floatx80 a, float_status *status)
     int32_t aExp;
     uint64_t aSig;
 
-    int8_t user_rnd_mode, user_rnd_prec;
+    FloatRoundMode user_rnd_mode;
+    FloatX80RoundPrec user_rnd_prec;
 
     int32_t compact, n, j, k, m, m1;
     floatx80 fp0, fp1, fp2, fp3, l2, scale, adjscale;
@@ -746,7 +751,7 @@ floatx80 floatx80_etox(floatx80 a, float_status *status)
     user_rnd_mode = status->float_rounding_mode;
     user_rnd_prec = status->floatx80_rounding_precision;
     status->float_rounding_mode = float_round_nearest_even;
-    status->floatx80_rounding_precision = 80;
+    status->floatx80_rounding_precision = floatx80_precision_x;
 
     adjflag = 0;
 
@@ -902,7 +907,8 @@ floatx80 floatx80_twotox(floatx80 a, float_status *status)
     int32_t aExp;
     uint64_t aSig;
 
-    int8_t user_rnd_mode, user_rnd_prec;
+    FloatRoundMode user_rnd_mode;
+    FloatX80RoundPrec user_rnd_prec;
 
     int32_t compact, n, j, l, m, m1;
     floatx80 fp0, fp1, fp2, fp3, adjfact, fact1, fact2;
@@ -929,7 +935,7 @@ floatx80 floatx80_twotox(floatx80 a, float_status *status)
     user_rnd_mode = status->float_rounding_mode;
     user_rnd_prec = status->floatx80_rounding_precision;
     status->float_rounding_mode = float_round_nearest_even;
-    status->floatx80_rounding_precision = 80;
+    status->floatx80_rounding_precision = floatx80_precision_x;
 
     fp0 = a;
 
@@ -1052,7 +1058,8 @@ floatx80 floatx80_tentox(floatx80 a, float_status *status)
     int32_t aExp;
     uint64_t aSig;
 
-    int8_t user_rnd_mode, user_rnd_prec;
+    FloatRoundMode user_rnd_mode;
+    FloatX80RoundPrec user_rnd_prec;
 
     int32_t compact, n, j, l, m, m1;
     floatx80 fp0, fp1, fp2, fp3, adjfact, fact1, fact2;
@@ -1079,7 +1086,7 @@ floatx80 floatx80_tentox(floatx80 a, float_status *status)
     user_rnd_mode = status->float_rounding_mode;
     user_rnd_prec = status->floatx80_rounding_precision;
     status->float_rounding_mode = float_round_nearest_even;
-    status->floatx80_rounding_precision = 80;
+    status->floatx80_rounding_precision = floatx80_precision_x;
 
     fp0 = a;
 
@@ -1207,7 +1214,8 @@ floatx80 floatx80_tan(floatx80 a, float_status *status)
     int32_t aExp, xExp;
     uint64_t aSig, xSig;
 
-    int8_t user_rnd_mode, user_rnd_prec;
+    FloatRoundMode user_rnd_mode;
+    FloatX80RoundPrec user_rnd_prec;
 
     int32_t compact, l, n, j;
     floatx80 fp0, fp1, fp2, fp3, fp4, fp5, invtwopi, twopi1, twopi2;
@@ -1233,7 +1241,7 @@ floatx80 floatx80_tan(floatx80 a, float_status *status)
     user_rnd_mode = status->float_rounding_mode;
     user_rnd_prec = status->floatx80_rounding_precision;
     status->float_rounding_mode = float_round_nearest_even;
-    status->floatx80_rounding_precision = 80;
+    status->floatx80_rounding_precision = floatx80_precision_x;
 
     compact = floatx80_make_compact(aExp, aSig);
 
@@ -1417,7 +1425,8 @@ floatx80 floatx80_sin(floatx80 a, float_status *status)
     int32_t aExp, xExp;
     uint64_t aSig, xSig;
 
-    int8_t user_rnd_mode, user_rnd_prec;
+    FloatRoundMode user_rnd_mode;
+    FloatX80RoundPrec user_rnd_prec;
 
     int32_t compact, l, n, j;
     floatx80 fp0, fp1, fp2, fp3, fp4, fp5, x, invtwopi, twopi1, twopi2;
@@ -1443,7 +1452,7 @@ floatx80 floatx80_sin(floatx80 a, float_status *status)
     user_rnd_mode = status->float_rounding_mode;
     user_rnd_prec = status->floatx80_rounding_precision;
     status->float_rounding_mode = float_round_nearest_even;
-    status->floatx80_rounding_precision = 80;
+    status->floatx80_rounding_precision = floatx80_precision_x;
 
     compact = floatx80_make_compact(aExp, aSig);
 
@@ -1656,7 +1665,8 @@ floatx80 floatx80_cos(floatx80 a, float_status *status)
     int32_t aExp, xExp;
     uint64_t aSig, xSig;
 
-    int8_t user_rnd_mode, user_rnd_prec;
+    FloatRoundMode user_rnd_mode;
+    FloatX80RoundPrec user_rnd_prec;
 
     int32_t compact, l, n, j;
     floatx80 fp0, fp1, fp2, fp3, fp4, fp5, x, invtwopi, twopi1, twopi2;
@@ -1682,7 +1692,7 @@ floatx80 floatx80_cos(floatx80 a, float_status *status)
     user_rnd_mode = status->float_rounding_mode;
     user_rnd_prec = status->floatx80_rounding_precision;
     status->float_rounding_mode = float_round_nearest_even;
-    status->floatx80_rounding_precision = 80;
+    status->floatx80_rounding_precision = floatx80_precision_x;
 
     compact = floatx80_make_compact(aExp, aSig);
 
@@ -1893,7 +1903,8 @@ floatx80 floatx80_atan(floatx80 a, float_status *status)
     int32_t aExp;
     uint64_t aSig;
 
-    int8_t user_rnd_mode, user_rnd_prec;
+    FloatRoundMode user_rnd_mode;
+    FloatX80RoundPrec user_rnd_prec;
 
     int32_t compact, tbl_index;
     floatx80 fp0, fp1, fp2, fp3, xsave;
@@ -1920,7 +1931,7 @@ floatx80 floatx80_atan(floatx80 a, float_status *status)
     user_rnd_mode = status->float_rounding_mode;
     user_rnd_prec = status->floatx80_rounding_precision;
     status->float_rounding_mode = float_round_nearest_even;
-    status->floatx80_rounding_precision = 80;
+    status->floatx80_rounding_precision = floatx80_precision_x;
 
     if (compact < 0x3FFB8000 || compact > 0x4002FFFF) {
         /* |X| >= 16 or |X| < 1/16 */
@@ -2090,7 +2101,8 @@ floatx80 floatx80_asin(floatx80 a, float_status *status)
     int32_t aExp;
     uint64_t aSig;
 
-    int8_t user_rnd_mode, user_rnd_prec;
+    FloatRoundMode user_rnd_mode;
+    FloatX80RoundPrec user_rnd_prec;
 
     int32_t compact;
     floatx80 fp0, fp1, fp2, one;
@@ -2124,7 +2136,7 @@ floatx80 floatx80_asin(floatx80 a, float_status *status)
     user_rnd_mode = status->float_rounding_mode;
     user_rnd_prec = status->floatx80_rounding_precision;
     status->float_rounding_mode = float_round_nearest_even;
-    status->floatx80_rounding_precision = 80;
+    status->floatx80_rounding_precision = floatx80_precision_x;
 
     one = packFloatx80(0, one_exp, one_sig);
     fp0 = a;
@@ -2155,7 +2167,8 @@ floatx80 floatx80_acos(floatx80 a, float_status *status)
     int32_t aExp;
     uint64_t aSig;
 
-    int8_t user_rnd_mode, user_rnd_prec;
+    FloatRoundMode user_rnd_mode;
+    FloatX80RoundPrec user_rnd_prec;
 
     int32_t compact;
     floatx80 fp0, fp1, one;
@@ -2193,7 +2206,7 @@ floatx80 floatx80_acos(floatx80 a, float_status *status)
     user_rnd_mode = status->float_rounding_mode;
     user_rnd_prec = status->floatx80_rounding_precision;
     status->float_rounding_mode = float_round_nearest_even;
-    status->floatx80_rounding_precision = 80;
+    status->floatx80_rounding_precision = floatx80_precision_x;
 
     one = packFloatx80(0, one_exp, one_sig);
     fp0 = a;
@@ -2224,7 +2237,8 @@ floatx80 floatx80_atanh(floatx80 a, float_status *status)
     int32_t aExp;
     uint64_t aSig;
 
-    int8_t user_rnd_mode, user_rnd_prec;
+    FloatRoundMode user_rnd_mode;
+    FloatX80RoundPrec user_rnd_prec;
 
     int32_t compact;
     floatx80 fp0, fp1, fp2, one;
@@ -2257,7 +2271,7 @@ floatx80 floatx80_atanh(floatx80 a, float_status *status)
     user_rnd_mode = status->float_rounding_mode;
     user_rnd_prec = status->floatx80_rounding_precision;
     status->float_rounding_mode = float_round_nearest_even;
-    status->floatx80_rounding_precision = 80;
+    status->floatx80_rounding_precision = floatx80_precision_x;
 
     one = packFloatx80(0, one_exp, one_sig);
     fp2 = packFloatx80(aSign, 0x3FFE, one_sig); /* SIGN(X) * (1/2) */
@@ -2289,7 +2303,8 @@ floatx80 floatx80_etoxm1(floatx80 a, float_status *status)
     int32_t aExp;
     uint64_t aSig;
 
-    int8_t user_rnd_mode, user_rnd_prec;
+    FloatRoundMode user_rnd_mode;
+    FloatX80RoundPrec user_rnd_prec;
 
     int32_t compact, n, j, m, m1;
     floatx80 fp0, fp1, fp2, fp3, l2, sc, onebysc;
@@ -2316,7 +2331,7 @@ floatx80 floatx80_etoxm1(floatx80 a, float_status *status)
     user_rnd_mode = status->float_rounding_mode;
     user_rnd_prec = status->floatx80_rounding_precision;
     status->float_rounding_mode = float_round_nearest_even;
-    status->floatx80_rounding_precision = 80;
+    status->floatx80_rounding_precision = floatx80_precision_x;
 
     if (aExp >= 0x3FFD) { /* |X| >= 1/4 */
         compact = floatx80_make_compact(aExp, aSig);
@@ -2541,7 +2556,8 @@ floatx80 floatx80_tanh(floatx80 a, float_status *status)
     int32_t aExp, vExp;
     uint64_t aSig, vSig;
 
-    int8_t user_rnd_mode, user_rnd_prec;
+    FloatRoundMode user_rnd_mode;
+    FloatX80RoundPrec user_rnd_prec;
 
     int32_t compact;
     floatx80 fp0, fp1;
@@ -2565,7 +2581,7 @@ floatx80 floatx80_tanh(floatx80 a, float_status *status)
     user_rnd_mode = status->float_rounding_mode;
     user_rnd_prec = status->floatx80_rounding_precision;
     status->float_rounding_mode = float_round_nearest_even;
-    status->floatx80_rounding_precision = 80;
+    status->floatx80_rounding_precision = floatx80_precision_x;
 
     compact = floatx80_make_compact(aExp, aSig);
 
@@ -2656,7 +2672,8 @@ floatx80 floatx80_sinh(floatx80 a, float_status *status)
     int32_t aExp;
     uint64_t aSig;
 
-    int8_t user_rnd_mode, user_rnd_prec;
+    FloatRoundMode user_rnd_mode;
+    FloatX80RoundPrec user_rnd_prec;
 
     int32_t compact;
     floatx80 fp0, fp1, fp2;
@@ -2681,7 +2698,7 @@ floatx80 floatx80_sinh(floatx80 a, float_status *status)
     user_rnd_mode = status->float_rounding_mode;
     user_rnd_prec = status->floatx80_rounding_precision;
     status->float_rounding_mode = float_round_nearest_even;
-    status->floatx80_rounding_precision = 80;
+    status->floatx80_rounding_precision = floatx80_precision_x;
 
     compact = floatx80_make_compact(aExp, aSig);
 
@@ -2744,7 +2761,8 @@ floatx80 floatx80_cosh(floatx80 a, float_status *status)
     int32_t aExp;
     uint64_t aSig;
 
-    int8_t user_rnd_mode, user_rnd_prec;
+    FloatRoundMode user_rnd_mode;
+    FloatX80RoundPrec user_rnd_prec;
 
     int32_t compact;
     floatx80 fp0, fp1;
@@ -2767,7 +2785,7 @@ floatx80 floatx80_cosh(floatx80 a, float_status *status)
     user_rnd_mode = status->float_rounding_mode;
     user_rnd_prec = status->floatx80_rounding_precision;
     status->float_rounding_mode = float_round_nearest_even;
-    status->floatx80_rounding_precision = 80;
+    status->floatx80_rounding_precision = floatx80_precision_x;
 
     compact = floatx80_make_compact(aExp, aSig);
 
diff --git a/tests/fp/fp-test.c b/tests/fp/fp-test.c
index ff131afbde..1be3a9788a 100644
--- a/tests/fp/fp-test.c
+++ b/tests/fp/fp-test.c
@@ -963,18 +963,21 @@ static void QEMU_NORETURN run_test(void)
             verCases_usesExact = !!(attrs & FUNC_ARG_EXACT);
 
             for (k = 0; k < 3; k++) {
+                FloatX80RoundPrec qsf_prec80 = floatx80_precision_s;
                 int prec80 = 32;
                 int l;
 
                 if (k == 1) {
                     prec80 = 64;
+                    qsf_prec80 = floatx80_precision_d;
                 } else if (k == 2) {
                     prec80 = 80;
+                    qsf_prec80 = floatx80_precision_x;
                 }
 
                 verCases_roundingPrecision = 0;
                 slow_extF80_roundingPrecision = prec80;
-                qsf.floatx80_rounding_precision = prec80;
+                qsf.floatx80_rounding_precision = qsf_prec80;
 
                 if (attrs & FUNC_EFF_ROUNDINGPRECISION) {
                     verCases_roundingPrecision = prec80;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 57/72] softfloat: Adjust parts_uncanon_normal for floatx80
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (55 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 56/72] softfloat: Introduce Floatx80RoundPrec Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 58/72] tests/fp/fp-test: Reverse order of floatx80 precision tests Richard Henderson
                   ` (19 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

With floatx80_precision_x, the rounding happens across
the break between words.  Notice this case with

  frac_lsb = round_mask + 1 -> 0

and check the bits in frac_hi as needed.

In addition, since frac_shift == 0, we won't implicitly clear
round_mask via the right-shift, so explicitly clear those bits.
This fixes rounding for floatx80_precision_[sd].
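
As a worked sketch for floatx80_precision_x (not part of the patch;
the frac_lsbm1 definition is assumed from the surrounding code, and
round_mask for the full-precision case is assumed to be all-ones):

  uint64_t round_mask = -1ull;          /* covers all of frac_lo */
  uint64_t frac_lsb   = round_mask + 1; /* wraps to 0 */
  uint64_t frac_lsbm1 = round_mask ^ (round_mask >> 1); /* 1ull << 63 */

The true least significant fraction bit is therefore frac_hi bit 0,
which is why the nearest-even and to-odd cases must test
(p->frac_hi & 1) explicitly.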

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat-parts.c.inc | 36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 3ee6552d5a..c18f77f2cf 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -156,7 +156,13 @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
     switch (s->float_rounding_mode) {
     case float_round_nearest_even:
         overflow_norm = false;
-        inc = ((p->frac_lo & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
+        if (N > 64 && frac_lsb == 0) {
+            inc = ((p->frac_hi & 1) || (p->frac_lo & round_mask) != frac_lsbm1
+                   ? frac_lsbm1 : 0);
+        } else {
+            inc = ((p->frac_lo & roundeven_mask) != frac_lsbm1
+                   ? frac_lsbm1 : 0);
+        }
         break;
     case float_round_ties_away:
         overflow_norm = false;
@@ -176,7 +182,11 @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
         break;
     case float_round_to_odd:
         overflow_norm = true;
-        inc = p->frac_lo & frac_lsb ? 0 : round_mask;
+        if (N > 64 && frac_lsb == 0) {
+            inc = p->frac_hi & 1 ? 0 : round_mask;
+        } else {
+            inc = p->frac_lo & frac_lsb ? 0 : round_mask;
+        }
         break;
     default:
         g_assert_not_reached();
@@ -191,8 +201,8 @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
                 p->frac_hi |= DECOMPOSED_IMPLICIT_BIT;
                 exp++;
             }
+            p->frac_lo &= ~round_mask;
         }
-        frac_shr(p, frac_shift);
 
         if (fmt->arm_althp) {
             /* ARM Alt HP eschews Inf and NaN for a wider exponent.  */
@@ -201,18 +211,21 @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
                 flags = float_flag_invalid;
                 exp = exp_max;
                 frac_allones(p);
+                p->frac_lo &= ~round_mask;
             }
         } else if (unlikely(exp >= exp_max)) {
             flags |= float_flag_overflow | float_flag_inexact;
             if (overflow_norm) {
                 exp = exp_max - 1;
                 frac_allones(p);
+                p->frac_lo &= ~round_mask;
             } else {
                 p->cls = float_class_inf;
                 exp = exp_max;
                 frac_clear(p);
             }
         }
+        frac_shr(p, frac_shift);
     } else if (s->flush_to_zero) {
         flags |= float_flag_output_denormal;
         p->cls = float_class_zero;
@@ -232,17 +245,28 @@ static void partsN(uncanon_normal)(FloatPartsN *p, float_status *s,
             /* Need to recompute round-to-even/round-to-odd. */
             switch (s->float_rounding_mode) {
             case float_round_nearest_even:
-                inc = ((p->frac_lo & roundeven_mask) != frac_lsbm1
-                       ? frac_lsbm1 : 0);
+                if (N > 64 && frac_lsb == 0) {
+                    inc = ((p->frac_hi & 1) ||
+                           (p->frac_lo & round_mask) != frac_lsbm1
+                           ? frac_lsbm1 : 0);
+                } else {
+                    inc = ((p->frac_lo & roundeven_mask) != frac_lsbm1
+                           ? frac_lsbm1 : 0);
+                }
                 break;
             case float_round_to_odd:
-                inc = p->frac_lo & frac_lsb ? 0 : round_mask;
+                if (N > 64 && frac_lsb == 0) {
+                    inc = p->frac_hi & 1 ? 0 : round_mask;
+                } else {
+                    inc = p->frac_lo & frac_lsb ? 0 : round_mask;
+                }
                 break;
             default:
                 break;
             }
             flags |= float_flag_inexact;
             frac_addi(p, p, inc);
+            p->frac_lo &= ~round_mask;
         }
 
         exp = (p->frac_hi & DECOMPOSED_IMPLICIT_BIT) != 0;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 58/72] tests/fp/fp-test: Reverse order of floatx80 precision tests
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (56 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 57/72] softfloat: Adjust parts_uncanon_normal for floatx80 Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-13 14:12   ` Alex Bennée
  2021-05-08  1:47 ` [PATCH 59/72] softfloat: Convert floatx80_add/sub to FloatParts Richard Henderson
                   ` (18 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Many of the qemu softfloat routines check floatx80_rounding_precision
even when berkeley testfloat does not.  Begin with
floatx80_precision_x, making that the precision in effect
when !FUNC_EFF_ROUNDINGPRECISION.
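
In effect the k -> precision mapping becomes (values as in the hunk
below):

  k == 0: prec80 = 80, qsf_prec80 = floatx80_precision_x  (the default)
  k == 1: prec80 = 64, qsf_prec80 = floatx80_precision_d
  k == 2: prec80 = 32, qsf_prec80 = floatx80_precision_s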

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tests/fp/fp-test.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/fp/fp-test.c b/tests/fp/fp-test.c
index 1be3a9788a..352dd71c44 100644
--- a/tests/fp/fp-test.c
+++ b/tests/fp/fp-test.c
@@ -963,16 +963,16 @@ static void QEMU_NORETURN run_test(void)
             verCases_usesExact = !!(attrs & FUNC_ARG_EXACT);
 
             for (k = 0; k < 3; k++) {
-                FloatX80RoundPrec qsf_prec80 = floatx80_precision_s;
-                int prec80 = 32;
+                FloatX80RoundPrec qsf_prec80 = floatx80_precision_x;
+                int prec80 = 80;
                 int l;
 
                 if (k == 1) {
                     prec80 = 64;
                     qsf_prec80 = floatx80_precision_d;
                 } else if (k == 2) {
-                    prec80 = 80;
-                    qsf_prec80 = floatx80_precision_x;
+                    prec80 = 32;
+                    qsf_prec80 = floatx80_precision_s;
                 }
 
                 verCases_roundingPrecision = 0;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 59/72] softfloat: Convert floatx80_add/sub to FloatParts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (57 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 58/72] tests/fp/fp-test: Reverse order of floatx80 precision tests Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 60/72] softfloat: Convert floatx80_mul " Richard Henderson
                   ` (17 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Since this is the first such conversion, it includes all of the
packing and unpacking routines as well.
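
For orientation, every operation converted from here on follows the
same shape (a sketch using only names from the patch below):

  floatx80_add(a, b, status)
    -> floatx80_unpack_canonical()       /* raw bits -> FloatParts128 */
    -> parts_addsub()                    /* shared FloatParts code */
    -> floatx80_round_pack_canonical()   /* FloatParts128 -> raw bits */

with invalid encodings short-circuiting to floatx80_default_nan().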

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 339 +++++++++++++++++++-----------------------------
 1 file changed, 136 insertions(+), 203 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 441b8f9dc1..3c6751e4d0 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -577,14 +577,14 @@ typedef struct {
 } FloatFmt;
 
 /* Expand fields based on the size of exponent and fraction */
-#define FLOAT_PARAMS_(E, F)                             \
+#define FLOAT_PARAMS_(E)                                \
     .exp_size       = E,                                \
     .exp_bias       = ((1 << E) - 1) >> 1,              \
-    .exp_max        = (1 << E) - 1,                     \
-    .frac_size      = F
+    .exp_max        = (1 << E) - 1
 
 #define FLOAT_PARAMS(E, F)                              \
-    FLOAT_PARAMS_(E, F),                                \
+    FLOAT_PARAMS_(E),                                   \
+    .frac_size      = F,                                \
     .frac_shift     = (-F - 1) & 63,                    \
     .round_mask     = (1ull << ((-F - 1) & 63)) - 1
 
@@ -613,6 +613,18 @@ static const FloatFmt float128_params = {
     FLOAT_PARAMS(15, 112)
 };
 
+#define FLOATX80_PARAMS(R)              \
+    FLOAT_PARAMS_(15),                  \
+    .frac_size = R == 64 ? 63 : R,      \
+    .frac_shift = 0,                    \
+    .round_mask = R == 64 ? -1 : (1ull << ((-R - 1) & 63)) - 1
+
+static const FloatFmt floatx80_params[3] = {
+    [floatx80_precision_s] = { FLOATX80_PARAMS(23) },
+    [floatx80_precision_d] = { FLOATX80_PARAMS(52) },
+    [floatx80_precision_x] = { FLOATX80_PARAMS(64) },
+};
+
 /* Unpack a float to parts, but do not canonicalize.  */
 static void unpack_raw64(FloatParts64 *r, const FloatFmt *fmt, uint64_t raw)
 {
@@ -647,6 +659,16 @@ static inline void float64_unpack_raw(FloatParts64 *p, float64 f)
     unpack_raw64(p, &float64_params, f);
 }
 
+static void floatx80_unpack_raw(FloatParts128 *p, floatx80 f)
+{
+    *p = (FloatParts128) {
+        .cls = float_class_unclassified,
+        .sign = extract32(f.high, 15, 1),
+        .exp = extract32(f.high, 0, 15),
+        .frac_hi = f.low
+    };
+}
+
 static void float128_unpack_raw(FloatParts128 *p, float128 f)
 {
     const int f_size = float128_params.frac_size - 64;
@@ -1535,6 +1557,92 @@ static float128 float128_round_pack_canonical(FloatParts128 *p,
     return float128_pack_raw(p);
 }
 
+/* Returns false if the encoding is invalid. */
+static bool floatx80_unpack_canonical(FloatParts128 *p, floatx80 f,
+                                      float_status *s)
+{
+    /* Ensure rounding precision is set before beginning. */
+    switch (s->floatx80_rounding_precision) {
+    case floatx80_precision_x:
+    case floatx80_precision_d:
+    case floatx80_precision_s:
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    if (unlikely(floatx80_invalid_encoding(f))) {
+        float_raise(float_flag_invalid, s);
+        return false;
+    }
+
+    floatx80_unpack_raw(p, f);
+
+    if (likely(p->exp != floatx80_params[floatx80_precision_x].exp_max)) {
+        parts_canonicalize(p, s, &floatx80_params[floatx80_precision_x]);
+    } else {
+        /* The explicit integer bit is ignored, after invalid checks. */
+        p->frac_hi &= MAKE_64BIT_MASK(0, 63);
+        p->cls = (p->frac_hi == 0 ? float_class_inf
+                  : parts_is_snan_frac(p->frac_hi, s)
+                  ? float_class_snan : float_class_qnan);
+    }
+    return true;
+}
+
+static floatx80 floatx80_round_pack_canonical(FloatParts128 *p,
+                                              float_status *s)
+{
+    const FloatFmt *fmt = &floatx80_params[s->floatx80_rounding_precision];
+    uint64_t frac;
+    int exp;
+
+    switch (p->cls) {
+    case float_class_normal:
+        if (s->floatx80_rounding_precision == floatx80_precision_x) {
+            parts_uncanon_normal(p, s, fmt);
+            frac = p->frac_hi;
+            exp = p->exp;
+        } else {
+            FloatParts64 p64;
+
+            p64.sign = p->sign;
+            p64.exp = p->exp;
+            frac_truncjam(&p64, p);
+            parts_uncanon_normal(&p64, s, fmt);
+            frac = p64.frac;
+            exp = p64.exp;
+        }
+        if (exp != fmt->exp_max) {
+            break;
+        }
+        /* rounded to inf -- fall through to set frac correctly */
+
+    case float_class_inf:
+        /* x86 and m68k differ in the setting of the integer bit. */
+        frac = floatx80_infinity_low;
+        exp = fmt->exp_max;
+        break;
+
+    case float_class_zero:
+        frac = 0;
+        exp = 0;
+        break;
+
+    case float_class_snan:
+    case float_class_qnan:
+        /* NaNs have the integer bit set. */
+        frac = p->frac_hi | (1ull << 63);
+        exp = fmt->exp_max;
+        break;
+
+    default:
+        g_assert_not_reached();
+    }
+
+    return packFloatx80(p->sign, exp, frac);
+}
+
 /*
  * Addition and subtraction
  */
@@ -1724,6 +1832,30 @@ float128 float128_sub(float128 a, float128 b, float_status *status)
     return float128_addsub(a, b, status, true);
 }
 
+static floatx80 QEMU_FLATTEN
+floatx80_addsub(floatx80 a, floatx80 b, float_status *status, bool subtract)
+{
+    FloatParts128 pa, pb, *pr;
+
+    if (!floatx80_unpack_canonical(&pa, a, status) ||
+        !floatx80_unpack_canonical(&pb, b, status)) {
+        return floatx80_default_nan(status);
+    }
+
+    pr = parts_addsub(&pa, &pb, status, subtract);
+    return floatx80_round_pack_canonical(pr, status);
+}
+
+floatx80 floatx80_add(floatx80 a, floatx80 b, float_status *status)
+{
+    return floatx80_addsub(a, b, status, false);
+}
+
+floatx80 floatx80_sub(floatx80 a, floatx80 b, float_status *status)
+{
+    return floatx80_addsub(a, b, status, true);
+}
+
 /*
  * Multiplication
  */
@@ -5733,205 +5865,6 @@ floatx80 floatx80_round_to_int(floatx80 a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of adding the absolute values of the extended double-
-| precision floating-point values `a' and `b'.  If `zSign' is 1, the sum is
-| negated before being returned.  `zSign' is ignored if the result is a NaN.
-| The addition is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static floatx80 addFloatx80Sigs(floatx80 a, floatx80 b, bool zSign,
-                                float_status *status)
-{
-    int32_t aExp, bExp, zExp;
-    uint64_t aSig, bSig, zSig0, zSig1;
-    int32_t expDiff;
-
-    aSig = extractFloatx80Frac( a );
-    aExp = extractFloatx80Exp( a );
-    bSig = extractFloatx80Frac( b );
-    bExp = extractFloatx80Exp( b );
-    expDiff = aExp - bExp;
-    if ( 0 < expDiff ) {
-        if ( aExp == 0x7FFF ) {
-            if ((uint64_t)(aSig << 1)) {
-                return propagateFloatx80NaN(a, b, status);
-            }
-            return a;
-        }
-        if ( bExp == 0 ) --expDiff;
-        shift64ExtraRightJamming( bSig, 0, expDiff, &bSig, &zSig1 );
-        zExp = aExp;
-    }
-    else if ( expDiff < 0 ) {
-        if ( bExp == 0x7FFF ) {
-            if ((uint64_t)(bSig << 1)) {
-                return propagateFloatx80NaN(a, b, status);
-            }
-            return packFloatx80(zSign,
-                                floatx80_infinity_high,
-                                floatx80_infinity_low);
-        }
-        if ( aExp == 0 ) ++expDiff;
-        shift64ExtraRightJamming( aSig, 0, - expDiff, &aSig, &zSig1 );
-        zExp = bExp;
-    }
-    else {
-        if ( aExp == 0x7FFF ) {
-            if ( (uint64_t) ( ( aSig | bSig )<<1 ) ) {
-                return propagateFloatx80NaN(a, b, status);
-            }
-            return a;
-        }
-        zSig1 = 0;
-        zSig0 = aSig + bSig;
-        if ( aExp == 0 ) {
-            if ((aSig | bSig) & UINT64_C(0x8000000000000000) && zSig0 < aSig) {
-                /* At least one of the values is a pseudo-denormal,
-                 * and there is a carry out of the result.  */
-                zExp = 1;
-                goto shiftRight1;
-            }
-            if (zSig0 == 0) {
-                return packFloatx80(zSign, 0, 0);
-            }
-            normalizeFloatx80Subnormal( zSig0, &zExp, &zSig0 );
-            goto roundAndPack;
-        }
-        zExp = aExp;
-        goto shiftRight1;
-    }
-    zSig0 = aSig + bSig;
-    if ( (int64_t) zSig0 < 0 ) goto roundAndPack;
- shiftRight1:
-    shift64ExtraRightJamming( zSig0, zSig1, 1, &zSig0, &zSig1 );
-    zSig0 |= UINT64_C(0x8000000000000000);
-    ++zExp;
- roundAndPack:
-    return roundAndPackFloatx80(status->floatx80_rounding_precision,
-                                zSign, zExp, zSig0, zSig1, status);
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of subtracting the absolute values of the extended
-| double-precision floating-point values `a' and `b'.  If `zSign' is 1, the
-| difference is negated before being returned.  `zSign' is ignored if the
-| result is a NaN.  The subtraction is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static floatx80 subFloatx80Sigs(floatx80 a, floatx80 b, bool zSign,
-                                float_status *status)
-{
-    int32_t aExp, bExp, zExp;
-    uint64_t aSig, bSig, zSig0, zSig1;
-    int32_t expDiff;
-
-    aSig = extractFloatx80Frac( a );
-    aExp = extractFloatx80Exp( a );
-    bSig = extractFloatx80Frac( b );
-    bExp = extractFloatx80Exp( b );
-    expDiff = aExp - bExp;
-    if ( 0 < expDiff ) goto aExpBigger;
-    if ( expDiff < 0 ) goto bExpBigger;
-    if ( aExp == 0x7FFF ) {
-        if ( (uint64_t) ( ( aSig | bSig )<<1 ) ) {
-            return propagateFloatx80NaN(a, b, status);
-        }
-        float_raise(float_flag_invalid, status);
-        return floatx80_default_nan(status);
-    }
-    if ( aExp == 0 ) {
-        aExp = 1;
-        bExp = 1;
-    }
-    zSig1 = 0;
-    if ( bSig < aSig ) goto aBigger;
-    if ( aSig < bSig ) goto bBigger;
-    return packFloatx80(status->float_rounding_mode == float_round_down, 0, 0);
- bExpBigger:
-    if ( bExp == 0x7FFF ) {
-        if ((uint64_t)(bSig << 1)) {
-            return propagateFloatx80NaN(a, b, status);
-        }
-        return packFloatx80(zSign ^ 1, floatx80_infinity_high,
-                            floatx80_infinity_low);
-    }
-    if ( aExp == 0 ) ++expDiff;
-    shift128RightJamming( aSig, 0, - expDiff, &aSig, &zSig1 );
- bBigger:
-    sub128( bSig, 0, aSig, zSig1, &zSig0, &zSig1 );
-    zExp = bExp;
-    zSign ^= 1;
-    goto normalizeRoundAndPack;
- aExpBigger:
-    if ( aExp == 0x7FFF ) {
-        if ((uint64_t)(aSig << 1)) {
-            return propagateFloatx80NaN(a, b, status);
-        }
-        return a;
-    }
-    if ( bExp == 0 ) --expDiff;
-    shift128RightJamming( bSig, 0, expDiff, &bSig, &zSig1 );
- aBigger:
-    sub128( aSig, 0, bSig, zSig1, &zSig0, &zSig1 );
-    zExp = aExp;
- normalizeRoundAndPack:
-    return normalizeRoundAndPackFloatx80(status->floatx80_rounding_precision,
-                                         zSign, zExp, zSig0, zSig1, status);
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of adding the extended double-precision floating-point
-| values `a' and `b'.  The operation is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-floatx80 floatx80_add(floatx80 a, floatx80 b, float_status *status)
-{
-    bool aSign, bSign;
-
-    if (floatx80_invalid_encoding(a) || floatx80_invalid_encoding(b)) {
-        float_raise(float_flag_invalid, status);
-        return floatx80_default_nan(status);
-    }
-    aSign = extractFloatx80Sign( a );
-    bSign = extractFloatx80Sign( b );
-    if ( aSign == bSign ) {
-        return addFloatx80Sigs(a, b, aSign, status);
-    }
-    else {
-        return subFloatx80Sigs(a, b, aSign, status);
-    }
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of subtracting the extended double-precision floating-
-| point values `a' and `b'.  The operation is performed according to the
-| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-floatx80 floatx80_sub(floatx80 a, floatx80 b, float_status *status)
-{
-    bool aSign, bSign;
-
-    if (floatx80_invalid_encoding(a) || floatx80_invalid_encoding(b)) {
-        float_raise(float_flag_invalid, status);
-        return floatx80_default_nan(status);
-    }
-    aSign = extractFloatx80Sign( a );
-    bSign = extractFloatx80Sign( b );
-    if ( aSign == bSign ) {
-        return subFloatx80Sigs(a, b, aSign, status);
-    }
-    else {
-        return addFloatx80Sigs(a, b, aSign, status);
-    }
-
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of multiplying the extended double-precision floating-
 | point values `a' and `b'.  The operation is performed according to the
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 60/72] softfloat: Convert floatx80_mul to FloatParts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (58 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 59/72] softfloat: Convert floatx80_add/sub to FloatParts Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 61/72] softfloat: Convert floatx80_div " Richard Henderson
                   ` (16 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 76 +++++++++----------------------------------------
 1 file changed, 14 insertions(+), 62 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 3c6751e4d0..4454219f8a 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1943,6 +1943,20 @@ float128_mul(float128 a, float128 b, float_status *status)
     return float128_round_pack_canonical(pr, status);
 }
 
+floatx80 QEMU_FLATTEN
+floatx80_mul(floatx80 a, floatx80 b, float_status *status)
+{
+    FloatParts128 pa, pb, *pr;
+
+    if (!floatx80_unpack_canonical(&pa, a, status) ||
+        !floatx80_unpack_canonical(&pb, b, status)) {
+        return floatx80_default_nan(status);
+    }
+
+    pr = parts_mul(&pa, &pb, status);
+    return floatx80_round_pack_canonical(pr, status);
+}
+
 /*
  * Fused multiply-add
  */
@@ -5865,68 +5879,6 @@ floatx80 floatx80_round_to_int(floatx80 a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of multiplying the extended double-precision floating-
-| point values `a' and `b'.  The operation is performed according to the
-| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-floatx80 floatx80_mul(floatx80 a, floatx80 b, float_status *status)
-{
-    bool aSign, bSign, zSign;
-    int32_t aExp, bExp, zExp;
-    uint64_t aSig, bSig, zSig0, zSig1;
-
-    if (floatx80_invalid_encoding(a) || floatx80_invalid_encoding(b)) {
-        float_raise(float_flag_invalid, status);
-        return floatx80_default_nan(status);
-    }
-    aSig = extractFloatx80Frac( a );
-    aExp = extractFloatx80Exp( a );
-    aSign = extractFloatx80Sign( a );
-    bSig = extractFloatx80Frac( b );
-    bExp = extractFloatx80Exp( b );
-    bSign = extractFloatx80Sign( b );
-    zSign = aSign ^ bSign;
-    if ( aExp == 0x7FFF ) {
-        if (    (uint64_t) ( aSig<<1 )
-             || ( ( bExp == 0x7FFF ) && (uint64_t) ( bSig<<1 ) ) ) {
-            return propagateFloatx80NaN(a, b, status);
-        }
-        if ( ( bExp | bSig ) == 0 ) goto invalid;
-        return packFloatx80(zSign, floatx80_infinity_high,
-                                   floatx80_infinity_low);
-    }
-    if ( bExp == 0x7FFF ) {
-        if ((uint64_t)(bSig << 1)) {
-            return propagateFloatx80NaN(a, b, status);
-        }
-        if ( ( aExp | aSig ) == 0 ) {
- invalid:
-            float_raise(float_flag_invalid, status);
-            return floatx80_default_nan(status);
-        }
-        return packFloatx80(zSign, floatx80_infinity_high,
-                                   floatx80_infinity_low);
-    }
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return packFloatx80( zSign, 0, 0 );
-        normalizeFloatx80Subnormal( aSig, &aExp, &aSig );
-    }
-    if ( bExp == 0 ) {
-        if ( bSig == 0 ) return packFloatx80( zSign, 0, 0 );
-        normalizeFloatx80Subnormal( bSig, &bExp, &bSig );
-    }
-    zExp = aExp + bExp - 0x3FFE;
-    mul64To128( aSig, bSig, &zSig0, &zSig1 );
-    if ( 0 < (int64_t) zSig0 ) {
-        shortShift128Left( zSig0, zSig1, 1, &zSig0, &zSig1 );
-        --zExp;
-    }
-    return roundAndPackFloatx80(status->floatx80_rounding_precision,
-                                zSign, zExp, zSig0, zSig1, status);
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of dividing the extended double-precision floating-point
 | value `a' by the corresponding value `b'.  The operation is performed
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 61/72] softfloat: Convert floatx80_div to FloatParts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (59 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 60/72] softfloat: Convert floatx80_mul " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 62/72] softfloat: Convert floatx80_sqrt " Richard Henderson
                   ` (15 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 100 +++++++-----------------------------------------
 1 file changed, 13 insertions(+), 87 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 4454219f8a..352f359bc5 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2293,6 +2293,19 @@ float128_div(float128 a, float128 b, float_status *status)
     return float128_round_pack_canonical(pr, status);
 }
 
+floatx80 floatx80_div(floatx80 a, floatx80 b, float_status *status)
+{
+    FloatParts128 pa, pb, *pr;
+
+    if (!floatx80_unpack_canonical(&pa, a, status) ||
+        !floatx80_unpack_canonical(&pb, b, status)) {
+        return floatx80_default_nan(status);
+    }
+
+    pr = parts_div(&pa, &pb, status);
+    return floatx80_round_pack_canonical(pr, status);
+}
+
 /*
  * Float to Float conversions
  *
@@ -5879,93 +5892,6 @@ floatx80 floatx80_round_to_int(floatx80 a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of dividing the extended double-precision floating-point
-| value `a' by the corresponding value `b'.  The operation is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-floatx80 floatx80_div(floatx80 a, floatx80 b, float_status *status)
-{
-    bool aSign, bSign, zSign;
-    int32_t aExp, bExp, zExp;
-    uint64_t aSig, bSig, zSig0, zSig1;
-    uint64_t rem0, rem1, rem2, term0, term1, term2;
-
-    if (floatx80_invalid_encoding(a) || floatx80_invalid_encoding(b)) {
-        float_raise(float_flag_invalid, status);
-        return floatx80_default_nan(status);
-    }
-    aSig = extractFloatx80Frac( a );
-    aExp = extractFloatx80Exp( a );
-    aSign = extractFloatx80Sign( a );
-    bSig = extractFloatx80Frac( b );
-    bExp = extractFloatx80Exp( b );
-    bSign = extractFloatx80Sign( b );
-    zSign = aSign ^ bSign;
-    if ( aExp == 0x7FFF ) {
-        if ((uint64_t)(aSig << 1)) {
-            return propagateFloatx80NaN(a, b, status);
-        }
-        if ( bExp == 0x7FFF ) {
-            if ((uint64_t)(bSig << 1)) {
-                return propagateFloatx80NaN(a, b, status);
-            }
-            goto invalid;
-        }
-        return packFloatx80(zSign, floatx80_infinity_high,
-                                   floatx80_infinity_low);
-    }
-    if ( bExp == 0x7FFF ) {
-        if ((uint64_t)(bSig << 1)) {
-            return propagateFloatx80NaN(a, b, status);
-        }
-        return packFloatx80( zSign, 0, 0 );
-    }
-    if ( bExp == 0 ) {
-        if ( bSig == 0 ) {
-            if ( ( aExp | aSig ) == 0 ) {
- invalid:
-                float_raise(float_flag_invalid, status);
-                return floatx80_default_nan(status);
-            }
-            float_raise(float_flag_divbyzero, status);
-            return packFloatx80(zSign, floatx80_infinity_high,
-                                       floatx80_infinity_low);
-        }
-        normalizeFloatx80Subnormal( bSig, &bExp, &bSig );
-    }
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return packFloatx80( zSign, 0, 0 );
-        normalizeFloatx80Subnormal( aSig, &aExp, &aSig );
-    }
-    zExp = aExp - bExp + 0x3FFE;
-    rem1 = 0;
-    if ( bSig <= aSig ) {
-        shift128Right( aSig, 0, 1, &aSig, &rem1 );
-        ++zExp;
-    }
-    zSig0 = estimateDiv128To64( aSig, rem1, bSig );
-    mul64To128( bSig, zSig0, &term0, &term1 );
-    sub128( aSig, rem1, term0, term1, &rem0, &rem1 );
-    while ( (int64_t) rem0 < 0 ) {
-        --zSig0;
-        add128( rem0, rem1, 0, bSig, &rem0, &rem1 );
-    }
-    zSig1 = estimateDiv128To64( rem1, 0, bSig );
-    if ( (uint64_t) ( zSig1<<1 ) <= 8 ) {
-        mul64To128( bSig, zSig1, &term1, &term2 );
-        sub128( rem1, 0, term1, term2, &rem1, &rem2 );
-        while ( (int64_t) rem1 < 0 ) {
-            --zSig1;
-            add128( rem1, rem2, 0, bSig, &rem1, &rem2 );
-        }
-        zSig1 |= ( ( rem1 | rem2 ) != 0 );
-    }
-    return roundAndPackFloatx80(status->floatx80_rounding_precision,
-                                zSign, zExp, zSig0, zSig1, status);
-}
-
 /*----------------------------------------------------------------------------
 | Returns the remainder of the extended double-precision floating-point value
 | `a' with respect to the corresponding value `b'.  The operation is performed
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 62/72] softfloat: Convert floatx80_sqrt to FloatParts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (60 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 61/72] softfloat: Convert floatx80_div " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 63/72] softfloat: Convert floatx80_round " Richard Henderson
                   ` (14 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 82 +++++++------------------------------------------
 1 file changed, 11 insertions(+), 71 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 352f359bc5..7050d8f012 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -3883,6 +3883,17 @@ float128 QEMU_FLATTEN float128_sqrt(float128 a, float_status *status)
     return float128_round_pack_canonical(&p, status);
 }
 
+floatx80 floatx80_sqrt(floatx80 a, float_status *s)
+{
+    FloatParts128 p;
+
+    if (!floatx80_unpack_canonical(&p, a, s)) {
+        return floatx80_default_nan(s);
+    }
+    parts_sqrt(&p, s, &floatx80_params[s->floatx80_rounding_precision]);
+    return floatx80_round_pack_canonical(&p, s);
+}
+
 /*----------------------------------------------------------------------------
 | The pattern for a default generated NaN.
 *----------------------------------------------------------------------------*/
@@ -6046,77 +6057,6 @@ floatx80 floatx80_mod(floatx80 a, floatx80 b, float_status *status)
     return floatx80_modrem(a, b, true, &quotient, status);
 }
 
-/*----------------------------------------------------------------------------
-| Returns the square root of the extended double-precision floating-point
-| value `a'.  The operation is performed according to the IEC/IEEE Standard
-| for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-floatx80 floatx80_sqrt(floatx80 a, float_status *status)
-{
-    bool aSign;
-    int32_t aExp, zExp;
-    uint64_t aSig0, aSig1, zSig0, zSig1, doubleZSig0;
-    uint64_t rem0, rem1, rem2, rem3, term0, term1, term2, term3;
-
-    if (floatx80_invalid_encoding(a)) {
-        float_raise(float_flag_invalid, status);
-        return floatx80_default_nan(status);
-    }
-    aSig0 = extractFloatx80Frac( a );
-    aExp = extractFloatx80Exp( a );
-    aSign = extractFloatx80Sign( a );
-    if ( aExp == 0x7FFF ) {
-        if ((uint64_t)(aSig0 << 1)) {
-            return propagateFloatx80NaN(a, a, status);
-        }
-        if ( ! aSign ) return a;
-        goto invalid;
-    }
-    if ( aSign ) {
-        if ( ( aExp | aSig0 ) == 0 ) return a;
- invalid:
-        float_raise(float_flag_invalid, status);
-        return floatx80_default_nan(status);
-    }
-    if ( aExp == 0 ) {
-        if ( aSig0 == 0 ) return packFloatx80( 0, 0, 0 );
-        normalizeFloatx80Subnormal( aSig0, &aExp, &aSig0 );
-    }
-    zExp = ( ( aExp - 0x3FFF )>>1 ) + 0x3FFF;
-    zSig0 = estimateSqrt32( aExp, aSig0>>32 );
-    shift128Right( aSig0, 0, 2 + ( aExp & 1 ), &aSig0, &aSig1 );
-    zSig0 = estimateDiv128To64( aSig0, aSig1, zSig0<<32 ) + ( zSig0<<30 );
-    doubleZSig0 = zSig0<<1;
-    mul64To128( zSig0, zSig0, &term0, &term1 );
-    sub128( aSig0, aSig1, term0, term1, &rem0, &rem1 );
-    while ( (int64_t) rem0 < 0 ) {
-        --zSig0;
-        doubleZSig0 -= 2;
-        add128( rem0, rem1, zSig0>>63, doubleZSig0 | 1, &rem0, &rem1 );
-    }
-    zSig1 = estimateDiv128To64( rem1, 0, doubleZSig0 );
-    if ( ( zSig1 & UINT64_C(0x3FFFFFFFFFFFFFFF) ) <= 5 ) {
-        if ( zSig1 == 0 ) zSig1 = 1;
-        mul64To128( doubleZSig0, zSig1, &term1, &term2 );
-        sub128( rem1, 0, term1, term2, &rem1, &rem2 );
-        mul64To128( zSig1, zSig1, &term2, &term3 );
-        sub192( rem1, rem2, 0, 0, term2, term3, &rem1, &rem2, &rem3 );
-        while ( (int64_t) rem1 < 0 ) {
-            --zSig1;
-            shortShift128Left( 0, zSig1, 1, &term2, &term3 );
-            term3 |= 1;
-            term2 |= doubleZSig0;
-            add192( rem1, rem2, rem3, 0, term2, term3, &rem1, &rem2, &rem3 );
-        }
-        zSig1 |= ( ( rem1 | rem2 | rem3 ) != 0 );
-    }
-    shortShift128Left( 0, zSig1, 1, &zSig0, &zSig1 );
-    zSig0 |= doubleZSig0;
-    return roundAndPackFloatx80(status->floatx80_rounding_precision,
-                                0, zExp, zSig0, zSig1, status);
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of converting the quadruple-precision floating-point
 | value `a' to the extended double-precision floating-point format.  The
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 63/72] softfloat: Convert floatx80_round to FloatParts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (61 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 62/72] softfloat: Convert floatx80_sqrt " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 64/72] softfloat: Convert floatx80_round_to_int " Richard Henderson
                   ` (13 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 7050d8f012..faccc8df41 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5794,10 +5794,12 @@ float128 floatx80_to_float128(floatx80 a, float_status *status)
 
 floatx80 floatx80_round(floatx80 a, float_status *status)
 {
-    return roundAndPackFloatx80(status->floatx80_rounding_precision,
-                                extractFloatx80Sign(a),
-                                extractFloatx80Exp(a),
-                                extractFloatx80Frac(a), 0, status);
+    FloatParts128 p;
+
+    if (!floatx80_unpack_canonical(&p, a, status)) {
+        return floatx80_default_nan(status);
+    }
+    return floatx80_round_pack_canonical(&p, status);
 }
 
 /*----------------------------------------------------------------------------
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 64/72] softfloat: Convert floatx80_round_to_int to FloatParts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (62 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 63/72] softfloat: Convert floatx80_round " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 65/72] softfloat: Convert integer to floatx80 " Richard Henderson
                   ` (12 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 116 ++++++------------------------------------------
 1 file changed, 13 insertions(+), 103 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index faccc8df41..914cf2688c 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2609,6 +2609,19 @@ float128 float128_round_to_int(float128 a, float_status *s)
     return float128_round_pack_canonical(&p, s);
 }
 
+floatx80 floatx80_round_to_int(floatx80 a, float_status *status)
+{
+    FloatParts128 p;
+
+    if (!floatx80_unpack_canonical(&p, a, status)) {
+        return floatx80_default_nan(status);
+    }
+
+    parts_round_to_int(&p, status->float_rounding_mode, 0, status,
+                       &floatx80_params[status->floatx80_rounding_precision]);
+    return floatx80_round_pack_canonical(&p, status);
+}
+
 /*
  * Floating-point to signed integer conversions
  */
@@ -5802,109 +5815,6 @@ floatx80 floatx80_round(floatx80 a, float_status *status)
     return floatx80_round_pack_canonical(&p, status);
 }
 
-/*----------------------------------------------------------------------------
-| Rounds the extended double-precision floating-point value `a' to an integer,
-| and returns the result as an extended quadruple-precision floating-point
-| value.  The operation is performed according to the IEC/IEEE Standard for
-| Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-floatx80 floatx80_round_to_int(floatx80 a, float_status *status)
-{
-    bool aSign;
-    int32_t aExp;
-    uint64_t lastBitMask, roundBitsMask;
-    floatx80 z;
-
-    if (floatx80_invalid_encoding(a)) {
-        float_raise(float_flag_invalid, status);
-        return floatx80_default_nan(status);
-    }
-    aExp = extractFloatx80Exp( a );
-    if ( 0x403E <= aExp ) {
-        if ( ( aExp == 0x7FFF ) && (uint64_t) ( extractFloatx80Frac( a )<<1 ) ) {
-            return propagateFloatx80NaN(a, a, status);
-        }
-        return a;
-    }
-    if ( aExp < 0x3FFF ) {
-        if (    ( aExp == 0 )
-             && ( (uint64_t) ( extractFloatx80Frac( a ) ) == 0 ) ) {
-            return a;
-        }
-        float_raise(float_flag_inexact, status);
-        aSign = extractFloatx80Sign( a );
-        switch (status->float_rounding_mode) {
-         case float_round_nearest_even:
-            if ( ( aExp == 0x3FFE ) && (uint64_t) ( extractFloatx80Frac( a )<<1 )
-               ) {
-                return
-                    packFloatx80( aSign, 0x3FFF, UINT64_C(0x8000000000000000));
-            }
-            break;
-        case float_round_ties_away:
-            if (aExp == 0x3FFE) {
-                return packFloatx80(aSign, 0x3FFF, UINT64_C(0x8000000000000000));
-            }
-            break;
-         case float_round_down:
-            return
-                  aSign ?
-                      packFloatx80( 1, 0x3FFF, UINT64_C(0x8000000000000000))
-                : packFloatx80( 0, 0, 0 );
-         case float_round_up:
-            return
-                  aSign ? packFloatx80( 1, 0, 0 )
-                : packFloatx80( 0, 0x3FFF, UINT64_C(0x8000000000000000));
-
-        case float_round_to_zero:
-            break;
-        default:
-            g_assert_not_reached();
-        }
-        return packFloatx80( aSign, 0, 0 );
-    }
-    lastBitMask = 1;
-    lastBitMask <<= 0x403E - aExp;
-    roundBitsMask = lastBitMask - 1;
-    z = a;
-    switch (status->float_rounding_mode) {
-    case float_round_nearest_even:
-        z.low += lastBitMask>>1;
-        if ((z.low & roundBitsMask) == 0) {
-            z.low &= ~lastBitMask;
-        }
-        break;
-    case float_round_ties_away:
-        z.low += lastBitMask >> 1;
-        break;
-    case float_round_to_zero:
-        break;
-    case float_round_up:
-        if (!extractFloatx80Sign(z)) {
-            z.low += roundBitsMask;
-        }
-        break;
-    case float_round_down:
-        if (extractFloatx80Sign(z)) {
-            z.low += roundBitsMask;
-        }
-        break;
-    default:
-        abort();
-    }
-    z.low &= ~ roundBitsMask;
-    if ( z.low == 0 ) {
-        ++z.high;
-        z.low = UINT64_C(0x8000000000000000);
-    }
-    if (z.low != a.low) {
-        float_raise(float_flag_inexact, status);
-    }
-    return z;
-
-}
-
 /*----------------------------------------------------------------------------
 | Returns the remainder of the extended double-precision floating-point value
 | `a' with respect to the corresponding value `b'.  The operation is performed
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 65/72] softfloat: Convert integer to floatx80 to FloatParts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (63 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 64/72] softfloat: Convert floatx80_round_to_int " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 66/72] softfloat: Convert floatx80 float conversions " Richard Henderson
                   ` (11 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 58 +++++++++++--------------------------------------
 1 file changed, 13 insertions(+), 45 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 914cf2688c..82f71896ac 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -3344,6 +3344,19 @@ float128 int32_to_float128(int32_t a, float_status *status)
     return int64_to_float128(a, status);
 }
 
+floatx80 int64_to_floatx80(int64_t a, float_status *status)
+{
+    FloatParts128 p;
+
+    parts_sint_to_float(&p, a, 0, status);
+    return floatx80_round_pack_canonical(&p, status);
+}
+
+floatx80 int32_to_floatx80(int32_t a, float_status *status)
+{
+    return int64_to_floatx80(a, status);
+}
+
 /*
  * Unsigned Integer to floating-point conversions
  */
@@ -5035,51 +5048,6 @@ static float128 normalizeRoundAndPackFloat128(bool zSign, int32_t zExp,
 
 }
 
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the 32-bit two's complement integer `a'
-| to the extended double-precision floating-point format.  The conversion
-| is performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic.
-*----------------------------------------------------------------------------*/
-
-floatx80 int32_to_floatx80(int32_t a, float_status *status)
-{
-    bool zSign;
-    uint32_t absA;
-    int8_t shiftCount;
-    uint64_t zSig;
-
-    if ( a == 0 ) return packFloatx80( 0, 0, 0 );
-    zSign = ( a < 0 );
-    absA = zSign ? - a : a;
-    shiftCount = clz32(absA) + 32;
-    zSig = absA;
-    return packFloatx80( zSign, 0x403E - shiftCount, zSig<<shiftCount );
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the 64-bit two's complement integer `a'
-| to the extended double-precision floating-point format.  The conversion
-| is performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic.
-*----------------------------------------------------------------------------*/
-
-floatx80 int64_to_floatx80(int64_t a, float_status *status)
-{
-    bool zSign;
-    uint64_t absA;
-    int8_t shiftCount;
-
-    if ( a == 0 ) return packFloatx80( 0, 0, 0 );
-    zSign = ( a < 0 );
-    absA = zSign ? - a : a;
-    shiftCount = clz64(absA);
-    return packFloatx80( zSign, 0x403E - shiftCount, absA<<shiftCount );
-
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of converting the single-precision floating-point value
 | `a' to the extended double-precision floating-point format.  The conversion
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 66/72] softfloat: Convert floatx80 float conversions to FloatParts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (64 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 65/72] softfloat: Convert integer to floatx80 " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 67/72] softfloat: Convert floatx80 to integer " Richard Henderson
                   ` (10 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

This is the last use of commonNaNT and of the routines that use it,
so remove them all; otherwise the now-unused static helpers would
trip -Werror.
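
As background: floatx80 is the only format here with an explicit
integer bit, so an encoding with a nonzero exponent but bit 63 of the
fraction clear is invalid; floatx80_unpack_canonical() raises
float_flag_invalid and returns false for such inputs, which is why the
narrowing conversions below fall back to parts_default_nan().  A
hypothetical caller-side sketch (make_floatx80() is the existing QEMU
helper; the exact flag behaviour is my reading of the diff below, not
a quote from it):

    float_status st = { 0 };
    /* nonzero exponent, explicit integer bit (bit 63) clear: invalid */
    floatx80 bad = make_floatx80(0x4000, 0x7fffffffffffffffULL);
    float64 r = floatx80_to_float64(bad, &st);
    /* r is the default NaN, and float_flag_invalid is set in st */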

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c                | 276 ++++++++-------------------------
 fpu/softfloat-specialize.c.inc | 175 ---------------------
 2 files changed, 67 insertions(+), 384 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 82f71896ac..d7c6c37d99 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2560,6 +2560,73 @@ float128 float64_to_float128(float64 a, float_status *s)
     return float128_round_pack_canonical(&p128, s);
 }
 
+float32 floatx80_to_float32(floatx80 a, float_status *s)
+{
+    FloatParts64 p64;
+    FloatParts128 p128;
+
+    if (floatx80_unpack_canonical(&p128, a, s)) {
+        parts_float_to_float_narrow(&p64, &p128, s);
+    } else {
+        parts_default_nan(&p64, s);
+    }
+    return float32_round_pack_canonical(&p64, s);
+}
+
+float64 floatx80_to_float64(floatx80 a, float_status *s)
+{
+    FloatParts64 p64;
+    FloatParts128 p128;
+
+    if (floatx80_unpack_canonical(&p128, a, s)) {
+        parts_float_to_float_narrow(&p64, &p128, s);
+    } else {
+        parts_default_nan(&p64, s);
+    }
+    return float64_round_pack_canonical(&p64, s);
+}
+
+float128 floatx80_to_float128(floatx80 a, float_status *s)
+{
+    FloatParts128 p;
+
+    if (floatx80_unpack_canonical(&p, a, s)) {
+        parts_float_to_float(&p, s);
+    } else {
+        parts_default_nan(&p, s);
+    }
+    return float128_round_pack_canonical(&p, s);
+}
+
+floatx80 float32_to_floatx80(float32 a, float_status *s)
+{
+    FloatParts64 p64;
+    FloatParts128 p128;
+
+    float32_unpack_canonical(&p64, a, s);
+    parts_float_to_float_widen(&p128, &p64, s);
+    return floatx80_round_pack_canonical(&p128, s);
+}
+
+floatx80 float64_to_floatx80(float64 a, float_status *s)
+{
+    FloatParts64 p64;
+    FloatParts128 p128;
+
+    float64_unpack_canonical(&p64, a, s);
+    parts_float_to_float_widen(&p128, &p64, s);
+    return floatx80_round_pack_canonical(&p128, s);
+}
+
+floatx80 float128_to_floatx80(float128 a, float_status *s)
+{
+    FloatParts128 p;
+
+    float128_unpack_canonical(&p, a, s);
+    parts_float_to_float(&p, s);
+    return floatx80_round_pack_canonical(&p, s);
+}
+
 /*
  * Round to integral value
  */
@@ -5048,42 +5115,6 @@ static float128 normalizeRoundAndPackFloat128(bool zSign, int32_t zExp,
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the single-precision floating-point value
-| `a' to the extended double-precision floating-point format.  The conversion
-| is performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic.
-*----------------------------------------------------------------------------*/
-
-floatx80 float32_to_floatx80(float32 a, float_status *status)
-{
-    bool aSign;
-    int aExp;
-    uint32_t aSig;
-
-    a = float32_squash_input_denormal(a, status);
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-    if ( aExp == 0xFF ) {
-        if (aSig) {
-            floatx80 res = commonNaNToFloatx80(float32ToCommonNaN(a, status),
-                                               status);
-            return floatx80_silence_nan(res, status);
-        }
-        return packFloatx80(aSign,
-                            floatx80_infinity_high,
-                            floatx80_infinity_low);
-    }
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return packFloatx80( aSign, 0, 0 );
-        normalizeFloat32Subnormal( aSig, &aExp, &aSig );
-    }
-    aSig |= 0x00800000;
-    return packFloatx80( aSign, aExp + 0x3F80, ( (uint64_t) aSig )<<40 );
-
-}
-
 /*----------------------------------------------------------------------------
 | Returns the remainder of the single-precision floating-point value `a'
 | with respect to the corresponding value `b'.  The operation is performed
@@ -5320,43 +5351,6 @@ float32 float32_log2(float32 a, float_status *status)
     return normalizeRoundAndPackFloat32(zSign, 0x85, zSig, status);
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the double-precision floating-point value
-| `a' to the extended double-precision floating-point format.  The conversion
-| is performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic.
-*----------------------------------------------------------------------------*/
-
-floatx80 float64_to_floatx80(float64 a, float_status *status)
-{
-    bool aSign;
-    int aExp;
-    uint64_t aSig;
-
-    a = float64_squash_input_denormal(a, status);
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-    if ( aExp == 0x7FF ) {
-        if (aSig) {
-            floatx80 res = commonNaNToFloatx80(float64ToCommonNaN(a, status),
-                                               status);
-            return floatx80_silence_nan(res, status);
-        }
-        return packFloatx80(aSign,
-                            floatx80_infinity_high,
-                            floatx80_infinity_low);
-    }
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return packFloatx80( aSign, 0, 0 );
-        normalizeFloat64Subnormal( aSig, &aExp, &aSig );
-    }
-    return
-        packFloatx80(
-            aSign, aExp + 0x3C00, (aSig | UINT64_C(0x0010000000000000)) << 11);
-
-}
-
 /*----------------------------------------------------------------------------
 | Returns the remainder of the double-precision floating-point value `a'
 | with respect to the corresponding value `b'.  The operation is performed
@@ -5667,104 +5661,6 @@ int64_t floatx80_to_int64_round_to_zero(floatx80 a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the extended double-precision floating-
-| point value `a' to the single-precision floating-point format.  The
-| conversion is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 floatx80_to_float32(floatx80 a, float_status *status)
-{
-    bool aSign;
-    int32_t aExp;
-    uint64_t aSig;
-
-    if (floatx80_invalid_encoding(a)) {
-        float_raise(float_flag_invalid, status);
-        return float32_default_nan(status);
-    }
-    aSig = extractFloatx80Frac( a );
-    aExp = extractFloatx80Exp( a );
-    aSign = extractFloatx80Sign( a );
-    if ( aExp == 0x7FFF ) {
-        if ( (uint64_t) ( aSig<<1 ) ) {
-            float32 res = commonNaNToFloat32(floatx80ToCommonNaN(a, status),
-                                             status);
-            return float32_silence_nan(res, status);
-        }
-        return packFloat32( aSign, 0xFF, 0 );
-    }
-    shift64RightJamming( aSig, 33, &aSig );
-    if ( aExp || aSig ) aExp -= 0x3F81;
-    return roundAndPackFloat32(aSign, aExp, aSig, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the extended double-precision floating-
-| point value `a' to the double-precision floating-point format.  The
-| conversion is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 floatx80_to_float64(floatx80 a, float_status *status)
-{
-    bool aSign;
-    int32_t aExp;
-    uint64_t aSig, zSig;
-
-    if (floatx80_invalid_encoding(a)) {
-        float_raise(float_flag_invalid, status);
-        return float64_default_nan(status);
-    }
-    aSig = extractFloatx80Frac( a );
-    aExp = extractFloatx80Exp( a );
-    aSign = extractFloatx80Sign( a );
-    if ( aExp == 0x7FFF ) {
-        if ( (uint64_t) ( aSig<<1 ) ) {
-            float64 res = commonNaNToFloat64(floatx80ToCommonNaN(a, status),
-                                             status);
-            return float64_silence_nan(res, status);
-        }
-        return packFloat64( aSign, 0x7FF, 0 );
-    }
-    shift64RightJamming( aSig, 1, &zSig );
-    if ( aExp || aSig ) aExp -= 0x3C01;
-    return roundAndPackFloat64(aSign, aExp, zSig, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the extended double-precision floating-
-| point value `a' to the quadruple-precision floating-point format.  The
-| conversion is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float128 floatx80_to_float128(floatx80 a, float_status *status)
-{
-    bool aSign;
-    int aExp;
-    uint64_t aSig, zSig0, zSig1;
-
-    if (floatx80_invalid_encoding(a)) {
-        float_raise(float_flag_invalid, status);
-        return float128_default_nan(status);
-    }
-    aSig = extractFloatx80Frac( a );
-    aExp = extractFloatx80Exp( a );
-    aSign = extractFloatx80Sign( a );
-    if ( ( aExp == 0x7FFF ) && (uint64_t) ( aSig<<1 ) ) {
-        float128 res = commonNaNToFloat128(floatx80ToCommonNaN(a, status),
-                                           status);
-        return float128_silence_nan(res, status);
-    }
-    shift128Right( aSig<<1, 0, 16, &zSig0, &zSig1 );
-    return packFloat128( aSign, aExp, zSig0, zSig1 );
-
-}
-
 /*----------------------------------------------------------------------------
 | Rounds the extended double-precision floating-point value `a'
 | to the precision provided by floatx80_rounding_precision and returns the
@@ -5937,44 +5833,6 @@ floatx80 floatx80_mod(floatx80 a, floatx80 b, float_status *status)
     return floatx80_modrem(a, b, true, &quotient, status);
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point
-| value `a' to the extended double-precision floating-point format.  The
-| conversion is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-floatx80 float128_to_floatx80(float128 a, float_status *status)
-{
-    bool aSign;
-    int32_t aExp;
-    uint64_t aSig0, aSig1;
-
-    aSig1 = extractFloat128Frac1( a );
-    aSig0 = extractFloat128Frac0( a );
-    aExp = extractFloat128Exp( a );
-    aSign = extractFloat128Sign( a );
-    if ( aExp == 0x7FFF ) {
-        if ( aSig0 | aSig1 ) {
-            floatx80 res = commonNaNToFloatx80(float128ToCommonNaN(a, status),
-                                               status);
-            return floatx80_silence_nan(res, status);
-        }
-        return packFloatx80(aSign, floatx80_infinity_high,
-                                   floatx80_infinity_low);
-    }
-    if ( aExp == 0 ) {
-        if ( ( aSig0 | aSig1 ) == 0 ) return packFloatx80( aSign, 0, 0 );
-        normalizeFloat128Subnormal( aSig0, aSig1, &aExp, &aSig0, &aSig1 );
-    }
-    else {
-        aSig0 |= UINT64_C(0x0001000000000000);
-    }
-    shortShift128Left( aSig0, aSig1, 15, &aSig0, &aSig1 );
-    return roundAndPackFloatx80(80, aSign, aExp, aSig0, aSig1, status);
-
-}
-
 /*----------------------------------------------------------------------------
 | Returns the remainder of the quadruple-precision floating-point value `a'
 | with respect to the corresponding value `b'.  The operation is performed
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index a0cf016b4f..88eab344df 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -257,14 +257,6 @@ floatx80 floatx80_default_nan(float_status *status)
 const floatx80 floatx80_infinity
     = make_floatx80_init(floatx80_infinity_high, floatx80_infinity_low);
 
-/*----------------------------------------------------------------------------
-| Internal canonical NaN format.
-*----------------------------------------------------------------------------*/
-typedef struct {
-    bool sign;
-    uint64_t high, low;
-} commonNaNT;
-
 /*----------------------------------------------------------------------------
 | Returns 1 if the half-precision floating-point value `a' is a quiet
 | NaN; otherwise returns 0.
@@ -380,46 +372,6 @@ bool float32_is_signaling_nan(float32 a_, float_status *status)
     }
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the single-precision floating-point NaN
-| `a' to the canonical NaN format.  If `a' is a signaling NaN, the invalid
-| exception is raised.
-*----------------------------------------------------------------------------*/
-
-static commonNaNT float32ToCommonNaN(float32 a, float_status *status)
-{
-    commonNaNT z;
-
-    if (float32_is_signaling_nan(a, status)) {
-        float_raise(float_flag_invalid, status);
-    }
-    z.sign = float32_val(a) >> 31;
-    z.low = 0;
-    z.high = ((uint64_t)float32_val(a)) << 41;
-    return z;
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the canonical NaN `a' to the single-
-| precision floating-point format.
-*----------------------------------------------------------------------------*/
-
-static float32 commonNaNToFloat32(commonNaNT a, float_status *status)
-{
-    uint32_t mantissa = a.high >> 41;
-
-    if (status->default_nan_mode) {
-        return float32_default_nan(status);
-    }
-
-    if (mantissa) {
-        return make_float32(
-            (((uint32_t)a.sign) << 31) | 0x7F800000 | (a.high >> 41));
-    } else {
-        return float32_default_nan(status);
-    }
-}
-
 /*----------------------------------------------------------------------------
 | Select which NaN to propagate for a two-input operation.
 | IEEE754 doesn't specify all the details of this, so the
@@ -780,48 +732,6 @@ bool float64_is_signaling_nan(float64 a_, float_status *status)
     }
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the double-precision floating-point NaN
-| `a' to the canonical NaN format.  If `a' is a signaling NaN, the invalid
-| exception is raised.
-*----------------------------------------------------------------------------*/
-
-static commonNaNT float64ToCommonNaN(float64 a, float_status *status)
-{
-    commonNaNT z;
-
-    if (float64_is_signaling_nan(a, status)) {
-        float_raise(float_flag_invalid, status);
-    }
-    z.sign = float64_val(a) >> 63;
-    z.low = 0;
-    z.high = float64_val(a) << 12;
-    return z;
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the canonical NaN `a' to the double-
-| precision floating-point format.
-*----------------------------------------------------------------------------*/
-
-static float64 commonNaNToFloat64(commonNaNT a, float_status *status)
-{
-    uint64_t mantissa = a.high >> 12;
-
-    if (status->default_nan_mode) {
-        return float64_default_nan(status);
-    }
-
-    if (mantissa) {
-        return make_float64(
-              (((uint64_t) a.sign) << 63)
-            | UINT64_C(0x7FF0000000000000)
-            | (a.high >> 12));
-    } else {
-        return float64_default_nan(status);
-    }
-}
-
 /*----------------------------------------------------------------------------
 | Takes two double-precision floating-point values `a' and `b', one of which
 | is a NaN, and returns the appropriate NaN result.  If either `a' or `b' is a
@@ -941,55 +851,6 @@ floatx80 floatx80_silence_nan(floatx80 a, float_status *status)
     return a;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the extended double-precision floating-
-| point NaN `a' to the canonical NaN format.  If `a' is a signaling NaN, the
-| invalid exception is raised.
-*----------------------------------------------------------------------------*/
-
-static commonNaNT floatx80ToCommonNaN(floatx80 a, float_status *status)
-{
-    floatx80 dflt;
-    commonNaNT z;
-
-    if (floatx80_is_signaling_nan(a, status)) {
-        float_raise(float_flag_invalid, status);
-    }
-    if (a.low >> 63) {
-        z.sign = a.high >> 15;
-        z.low = 0;
-        z.high = a.low << 1;
-    } else {
-        dflt = floatx80_default_nan(status);
-        z.sign = dflt.high >> 15;
-        z.low = 0;
-        z.high = dflt.low << 1;
-    }
-    return z;
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the canonical NaN `a' to the extended
-| double-precision floating-point format.
-*----------------------------------------------------------------------------*/
-
-static floatx80 commonNaNToFloatx80(commonNaNT a, float_status *status)
-{
-    floatx80 z;
-
-    if (status->default_nan_mode) {
-        return floatx80_default_nan(status);
-    }
-
-    if (a.high >> 1) {
-        z.low = UINT64_C(0x8000000000000000) | a.high >> 1;
-        z.high = (((uint16_t)a.sign) << 15) | 0x7FFF;
-    } else {
-        z = floatx80_default_nan(status);
-    }
-    return z;
-}
-
 /*----------------------------------------------------------------------------
 | Takes two extended double-precision floating-point values `a' and `b', one
 | of which is a NaN, and returns the appropriate NaN result.  If either `a' or
@@ -1082,42 +943,6 @@ bool float128_is_signaling_nan(float128 a, float_status *status)
     }
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the quadruple-precision floating-point NaN
-| `a' to the canonical NaN format.  If `a' is a signaling NaN, the invalid
-| exception is raised.
-*----------------------------------------------------------------------------*/
-
-static commonNaNT float128ToCommonNaN(float128 a, float_status *status)
-{
-    commonNaNT z;
-
-    if (float128_is_signaling_nan(a, status)) {
-        float_raise(float_flag_invalid, status);
-    }
-    z.sign = a.high >> 63;
-    shortShift128Left(a.high, a.low, 16, &z.high, &z.low);
-    return z;
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the canonical NaN `a' to the quadruple-
-| precision floating-point format.
-*----------------------------------------------------------------------------*/
-
-static float128 commonNaNToFloat128(commonNaNT a, float_status *status)
-{
-    float128 z;
-
-    if (status->default_nan_mode) {
-        return float128_default_nan(status);
-    }
-
-    shift128Right(a.high, a.low, 16, &z.high, &z.low);
-    z.high |= (((uint64_t)a.sign) << 63) | UINT64_C(0x7FFF000000000000);
-    return z;
-}
-
 /*----------------------------------------------------------------------------
 | Takes two quadruple-precision floating-point values `a' and `b', one of
 | which is a NaN, and returns the appropriate NaN result.  If either `a' or
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 67/72] softfloat: Convert floatx80 to integer to FloatParts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (65 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 66/72] softfloat: Convert floatx80 float conversions " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 68/72] softfloat: Convert floatx80_scalbn " Richard Henderson
                   ` (9 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 336 ++++++------------------------------------------
 1 file changed, 42 insertions(+), 294 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index d7c6c37d99..9f28c5c058 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2828,6 +2828,28 @@ static int64_t float128_to_int64_scalbn(float128 a, FloatRoundMode rmode,
     return parts_float_to_sint(&p, rmode, scale, INT64_MIN, INT64_MAX, s);
 }
 
+static int32_t floatx80_to_int32_scalbn(floatx80 a, FloatRoundMode rmode,
+                                        int scale, float_status *s)
+{
+    FloatParts128 p;
+
+    if (!floatx80_unpack_canonical(&p, a, s)) {
+        parts_default_nan(&p, s);
+    }
+    return parts_float_to_sint(&p, rmode, scale, INT32_MIN, INT32_MAX, s);
+}
+
+static int64_t floatx80_to_int64_scalbn(floatx80 a, FloatRoundMode rmode,
+                                        int scale, float_status *s)
+{
+    FloatParts128 p;
+
+    if (!floatx80_unpack_canonical(&p, a, s)) {
+        parts_default_nan(&p, s);
+    }
+    return parts_float_to_sint(&p, rmode, scale, INT64_MIN, INT64_MAX, s);
+}
+
 int8_t float16_to_int8(float16 a, float_status *s)
 {
     return float16_to_int8_scalbn(a, s->float_rounding_mode, 0, s);
@@ -2888,6 +2910,16 @@ int64_t float128_to_int64(float128 a, float_status *s)
     return float128_to_int64_scalbn(a, s->float_rounding_mode, 0, s);
 }
 
+int32_t floatx80_to_int32(floatx80 a, float_status *s)
+{
+    return floatx80_to_int32_scalbn(a, s->float_rounding_mode, 0, s);
+}
+
+int64_t floatx80_to_int64(floatx80 a, float_status *s)
+{
+    return floatx80_to_int64_scalbn(a, s->float_rounding_mode, 0, s);
+}
+
 int16_t float16_to_int16_round_to_zero(float16 a, float_status *s)
 {
     return float16_to_int16_scalbn(a, float_round_to_zero, 0, s);
@@ -2943,6 +2975,16 @@ int64_t float128_to_int64_round_to_zero(float128 a, float_status *s)
     return float128_to_int64_scalbn(a, float_round_to_zero, 0, s);
 }
 
+int32_t floatx80_to_int32_round_to_zero(floatx80 a, float_status *s)
+{
+    return floatx80_to_int32_scalbn(a, float_round_to_zero, 0, s);
+}
+
+int64_t floatx80_to_int64_round_to_zero(floatx80 a, float_status *s)
+{
+    return floatx80_to_int64_scalbn(a, float_round_to_zero, 0, s);
+}
+
 int16_t bfloat16_to_int16(bfloat16 a, float_status *s)
 {
     return bfloat16_to_int16_scalbn(a, s->float_rounding_mode, 0, s);
@@ -4162,127 +4204,6 @@ bfloat16 bfloat16_squash_input_denormal(bfloat16 a, float_status *status)
     return a;
 }
 
-/*----------------------------------------------------------------------------
-| Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
-| and 7, and returns the properly rounded 32-bit integer corresponding to the
-| input.  If `zSign' is 1, the input is negated before being converted to an
-| integer.  Bit 63 of `absZ' must be zero.  Ordinarily, the fixed-point input
-| is simply rounded to an integer, with the inexact exception raised if the
-| input cannot be represented exactly as an integer.  However, if the fixed-
-| point input is too large, the invalid exception is raised and the largest
-| positive or negative integer is returned.
-*----------------------------------------------------------------------------*/
-
-static int32_t roundAndPackInt32(bool zSign, uint64_t absZ,
-                                 float_status *status)
-{
-    int8_t roundingMode;
-    bool roundNearestEven;
-    int8_t roundIncrement, roundBits;
-    int32_t z;
-
-    roundingMode = status->float_rounding_mode;
-    roundNearestEven = ( roundingMode == float_round_nearest_even );
-    switch (roundingMode) {
-    case float_round_nearest_even:
-    case float_round_ties_away:
-        roundIncrement = 0x40;
-        break;
-    case float_round_to_zero:
-        roundIncrement = 0;
-        break;
-    case float_round_up:
-        roundIncrement = zSign ? 0 : 0x7f;
-        break;
-    case float_round_down:
-        roundIncrement = zSign ? 0x7f : 0;
-        break;
-    case float_round_to_odd:
-        roundIncrement = absZ & 0x80 ? 0 : 0x7f;
-        break;
-    default:
-        abort();
-    }
-    roundBits = absZ & 0x7F;
-    absZ = ( absZ + roundIncrement )>>7;
-    if (!(roundBits ^ 0x40) && roundNearestEven) {
-        absZ &= ~1;
-    }
-    z = absZ;
-    if ( zSign ) z = - z;
-    if ( ( absZ>>32 ) || ( z && ( ( z < 0 ) ^ zSign ) ) ) {
-        float_raise(float_flag_invalid, status);
-        return zSign ? INT32_MIN : INT32_MAX;
-    }
-    if (roundBits) {
-        float_raise(float_flag_inexact, status);
-    }
-    return z;
-
-}
-
-/*----------------------------------------------------------------------------
-| Takes the 128-bit fixed-point value formed by concatenating `absZ0' and
-| `absZ1', with binary point between bits 63 and 64 (between the input words),
-| and returns the properly rounded 64-bit integer corresponding to the input.
-| If `zSign' is 1, the input is negated before being converted to an integer.
-| Ordinarily, the fixed-point input is simply rounded to an integer, with
-| the inexact exception raised if the input cannot be represented exactly as
-| an integer.  However, if the fixed-point input is too large, the invalid
-| exception is raised and the largest positive or negative integer is
-| returned.
-*----------------------------------------------------------------------------*/
-
-static int64_t roundAndPackInt64(bool zSign, uint64_t absZ0, uint64_t absZ1,
-                               float_status *status)
-{
-    int8_t roundingMode;
-    bool roundNearestEven, increment;
-    int64_t z;
-
-    roundingMode = status->float_rounding_mode;
-    roundNearestEven = ( roundingMode == float_round_nearest_even );
-    switch (roundingMode) {
-    case float_round_nearest_even:
-    case float_round_ties_away:
-        increment = ((int64_t) absZ1 < 0);
-        break;
-    case float_round_to_zero:
-        increment = 0;
-        break;
-    case float_round_up:
-        increment = !zSign && absZ1;
-        break;
-    case float_round_down:
-        increment = zSign && absZ1;
-        break;
-    case float_round_to_odd:
-        increment = !(absZ0 & 1) && absZ1;
-        break;
-    default:
-        abort();
-    }
-    if ( increment ) {
-        ++absZ0;
-        if ( absZ0 == 0 ) goto overflow;
-        if (!(absZ1 << 1) && roundNearestEven) {
-            absZ0 &= ~1;
-        }
-    }
-    z = absZ0;
-    if ( zSign ) z = - z;
-    if ( z && ( ( z < 0 ) ^ zSign ) ) {
- overflow:
-        float_raise(float_flag_invalid, status);
-        return zSign ? INT64_MIN : INT64_MAX;
-    }
-    if (absZ1) {
-        float_raise(float_flag_inexact, status);
-    }
-    return z;
-
-}
-
 /*----------------------------------------------------------------------------
 | Normalizes the subnormal single-precision floating-point value represented
 | by the denormalized significand `aSig'.  The normalized exponent and
@@ -5488,179 +5409,6 @@ float64 float64_log2(float64 a, float_status *status)
     return normalizeRoundAndPackFloat64(zSign, 0x408, zSig, status);
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the extended double-precision floating-
-| point value `a' to the 32-bit two's complement integer format.  The
-| conversion is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic---which means in particular that the conversion
-| is rounded according to the current rounding mode.  If `a' is a NaN, the
-| largest positive integer is returned.  Otherwise, if the conversion
-| overflows, the largest integer with the same sign as `a' is returned.
-*----------------------------------------------------------------------------*/
-
-int32_t floatx80_to_int32(floatx80 a, float_status *status)
-{
-    bool aSign;
-    int32_t aExp, shiftCount;
-    uint64_t aSig;
-
-    if (floatx80_invalid_encoding(a)) {
-        float_raise(float_flag_invalid, status);
-        return 1 << 31;
-    }
-    aSig = extractFloatx80Frac( a );
-    aExp = extractFloatx80Exp( a );
-    aSign = extractFloatx80Sign( a );
-    if ( ( aExp == 0x7FFF ) && (uint64_t) ( aSig<<1 ) ) aSign = 0;
-    shiftCount = 0x4037 - aExp;
-    if ( shiftCount <= 0 ) shiftCount = 1;
-    shift64RightJamming( aSig, shiftCount, &aSig );
-    return roundAndPackInt32(aSign, aSig, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the extended double-precision floating-
-| point value `a' to the 32-bit two's complement integer format.  The
-| conversion is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic, except that the conversion is always rounded
-| toward zero.  If `a' is a NaN, the largest positive integer is returned.
-| Otherwise, if the conversion overflows, the largest integer with the same
-| sign as `a' is returned.
-*----------------------------------------------------------------------------*/
-
-int32_t floatx80_to_int32_round_to_zero(floatx80 a, float_status *status)
-{
-    bool aSign;
-    int32_t aExp, shiftCount;
-    uint64_t aSig, savedASig;
-    int32_t z;
-
-    if (floatx80_invalid_encoding(a)) {
-        float_raise(float_flag_invalid, status);
-        return 1 << 31;
-    }
-    aSig = extractFloatx80Frac( a );
-    aExp = extractFloatx80Exp( a );
-    aSign = extractFloatx80Sign( a );
-    if ( 0x401E < aExp ) {
-        if ( ( aExp == 0x7FFF ) && (uint64_t) ( aSig<<1 ) ) aSign = 0;
-        goto invalid;
-    }
-    else if ( aExp < 0x3FFF ) {
-        if (aExp || aSig) {
-            float_raise(float_flag_inexact, status);
-        }
-        return 0;
-    }
-    shiftCount = 0x403E - aExp;
-    savedASig = aSig;
-    aSig >>= shiftCount;
-    z = aSig;
-    if ( aSign ) z = - z;
-    if ( ( z < 0 ) ^ aSign ) {
- invalid:
-        float_raise(float_flag_invalid, status);
-        return aSign ? (int32_t) 0x80000000 : 0x7FFFFFFF;
-    }
-    if ( ( aSig<<shiftCount ) != savedASig ) {
-        float_raise(float_flag_inexact, status);
-    }
-    return z;
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the extended double-precision floating-
-| point value `a' to the 64-bit two's complement integer format.  The
-| conversion is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic---which means in particular that the conversion
-| is rounded according to the current rounding mode.  If `a' is a NaN,
-| the largest positive integer is returned.  Otherwise, if the conversion
-| overflows, the largest integer with the same sign as `a' is returned.
-*----------------------------------------------------------------------------*/
-
-int64_t floatx80_to_int64(floatx80 a, float_status *status)
-{
-    bool aSign;
-    int32_t aExp, shiftCount;
-    uint64_t aSig, aSigExtra;
-
-    if (floatx80_invalid_encoding(a)) {
-        float_raise(float_flag_invalid, status);
-        return 1ULL << 63;
-    }
-    aSig = extractFloatx80Frac( a );
-    aExp = extractFloatx80Exp( a );
-    aSign = extractFloatx80Sign( a );
-    shiftCount = 0x403E - aExp;
-    if ( shiftCount <= 0 ) {
-        if ( shiftCount ) {
-            float_raise(float_flag_invalid, status);
-            if (!aSign || floatx80_is_any_nan(a)) {
-                return INT64_MAX;
-            }
-            return INT64_MIN;
-        }
-        aSigExtra = 0;
-    }
-    else {
-        shift64ExtraRightJamming( aSig, 0, shiftCount, &aSig, &aSigExtra );
-    }
-    return roundAndPackInt64(aSign, aSig, aSigExtra, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the extended double-precision floating-
-| point value `a' to the 64-bit two's complement integer format.  The
-| conversion is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic, except that the conversion is always rounded
-| toward zero.  If `a' is a NaN, the largest positive integer is returned.
-| Otherwise, if the conversion overflows, the largest integer with the same
-| sign as `a' is returned.
-*----------------------------------------------------------------------------*/
-
-int64_t floatx80_to_int64_round_to_zero(floatx80 a, float_status *status)
-{
-    bool aSign;
-    int32_t aExp, shiftCount;
-    uint64_t aSig;
-    int64_t z;
-
-    if (floatx80_invalid_encoding(a)) {
-        float_raise(float_flag_invalid, status);
-        return 1ULL << 63;
-    }
-    aSig = extractFloatx80Frac( a );
-    aExp = extractFloatx80Exp( a );
-    aSign = extractFloatx80Sign( a );
-    shiftCount = aExp - 0x403E;
-    if ( 0 <= shiftCount ) {
-        aSig &= UINT64_C(0x7FFFFFFFFFFFFFFF);
-        if ( ( a.high != 0xC03E ) || aSig ) {
-            float_raise(float_flag_invalid, status);
-            if ( ! aSign || ( ( aExp == 0x7FFF ) && aSig ) ) {
-                return INT64_MAX;
-            }
-        }
-        return INT64_MIN;
-    }
-    else if ( aExp < 0x3FFF ) {
-        if (aExp | aSig) {
-            float_raise(float_flag_inexact, status);
-        }
-        return 0;
-    }
-    z = aSig>>( - shiftCount );
-    if ( (uint64_t) ( aSig<<( shiftCount & 63 ) ) ) {
-        float_raise(float_flag_inexact, status);
-    }
-    if ( aSign ) z = - z;
-    return z;
-
-}
-
 /*----------------------------------------------------------------------------
 | Rounds the extended double-precision floating-point value `a'
 | to the precision provided by floatx80_rounding_precision and returns the
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 68/72] softfloat: Convert floatx80_scalbn to FloatParts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (66 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 67/72] softfloat: Convert floatx80 to integer " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:47 ` [PATCH 69/72] softfloat: Convert floatx80 compare " Richard Henderson
                   ` (8 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 50 +++++++++++--------------------------------------
 1 file changed, 11 insertions(+), 39 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 9f28c5c058..81ff563dc0 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -3913,6 +3913,17 @@ float128 float128_scalbn(float128 a, int n, float_status *status)
     return float128_round_pack_canonical(&p, status);
 }
 
+floatx80 floatx80_scalbn(floatx80 a, int n, float_status *status)
+{
+    FloatParts128 p;
+
+    if (!floatx80_unpack_canonical(&p, a, status)) {
+        return floatx80_default_nan(status);
+    }
+    parts_scalbn(&p, n, status);
+    return floatx80_round_pack_canonical(&p, status);
+}
+
 /*
  * Square Root
  */
@@ -5747,45 +5758,6 @@ FloatRelation floatx80_compare_quiet(floatx80 a, floatx80 b,
     return floatx80_compare_internal(a, b, 1, status);
 }
 
-floatx80 floatx80_scalbn(floatx80 a, int n, float_status *status)
-{
-    bool aSign;
-    int32_t aExp;
-    uint64_t aSig;
-
-    if (floatx80_invalid_encoding(a)) {
-        float_raise(float_flag_invalid, status);
-        return floatx80_default_nan(status);
-    }
-    aSig = extractFloatx80Frac( a );
-    aExp = extractFloatx80Exp( a );
-    aSign = extractFloatx80Sign( a );
-
-    if ( aExp == 0x7FFF ) {
-        if ( aSig<<1 ) {
-            return propagateFloatx80NaN(a, a, status);
-        }
-        return a;
-    }
-
-    if (aExp == 0) {
-        if (aSig == 0) {
-            return a;
-        }
-        aExp++;
-    }
-
-    if (n > 0x10000) {
-        n = 0x10000;
-    } else if (n < -0x10000) {
-        n = -0x10000;
-    }
-
-    aExp += n;
-    return normalizeRoundAndPackFloatx80(status->floatx80_rounding_precision,
-                                         aSign, aExp, aSig, 0, status);
-}
-
 static void __attribute__((constructor)) softfloat_init(void)
 {
     union_float64 ua, ub, uc, ur;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 69/72] softfloat: Convert floatx80 compare to FloatParts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (67 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 68/72] softfloat: Convert floatx80_scalbn " Richard Henderson
@ 2021-05-08  1:47 ` Richard Henderson
  2021-05-08  1:48 ` [PATCH 70/72] softfloat: Convert float32_exp2 " Richard Henderson
                   ` (7 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 82 +++++++++++++------------------------------------
 1 file changed, 22 insertions(+), 60 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 81ff563dc0..b89ec4b832 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -3864,6 +3864,28 @@ FloatRelation float128_compare_quiet(float128 a, float128 b, float_status *s)
     return float128_do_compare(a, b, s, true);
 }
 
+static FloatRelation QEMU_FLATTEN
+floatx80_do_compare(floatx80 a, floatx80 b, float_status *s, bool is_quiet)
+{
+    FloatParts128 pa, pb;
+
+    if (!floatx80_unpack_canonical(&pa, a, s) ||
+        !floatx80_unpack_canonical(&pb, b, s)) {
+        return float_relation_unordered;
+    }
+    return parts_compare(&pa, &pb, s, is_quiet);
+}
+
+FloatRelation floatx80_compare(floatx80 a, floatx80 b, float_status *s)
+{
+    return floatx80_do_compare(a, b, s, false);
+}
+
+FloatRelation floatx80_compare_quiet(floatx80 a, floatx80 b, float_status *s)
+{
+    return floatx80_do_compare(a, b, s, true);
+}
+
 /*
  * Scale by 2**N
  */
@@ -5698,66 +5720,6 @@ float128 float128_rem(float128 a, float128 b, float_status *status)
     return normalizeRoundAndPackFloat128(aSign ^ zSign, bExp - 4, aSig0, aSig1,
                                          status);
 }
-
-static inline FloatRelation
-floatx80_compare_internal(floatx80 a, floatx80 b, bool is_quiet,
-                          float_status *status)
-{
-    bool aSign, bSign;
-
-    if (floatx80_invalid_encoding(a) || floatx80_invalid_encoding(b)) {
-        float_raise(float_flag_invalid, status);
-        return float_relation_unordered;
-    }
-    if (( ( extractFloatx80Exp( a ) == 0x7fff ) &&
-          ( extractFloatx80Frac( a )<<1 ) ) ||
-        ( ( extractFloatx80Exp( b ) == 0x7fff ) &&
-          ( extractFloatx80Frac( b )<<1 ) )) {
-        if (!is_quiet ||
-            floatx80_is_signaling_nan(a, status) ||
-            floatx80_is_signaling_nan(b, status)) {
-            float_raise(float_flag_invalid, status);
-        }
-        return float_relation_unordered;
-    }
-    aSign = extractFloatx80Sign( a );
-    bSign = extractFloatx80Sign( b );
-    if ( aSign != bSign ) {
-
-        if ( ( ( (uint16_t) ( ( a.high | b.high ) << 1 ) ) == 0) &&
-             ( ( a.low | b.low ) == 0 ) ) {
-            /* zero case */
-            return float_relation_equal;
-        } else {
-            return 1 - (2 * aSign);
-        }
-    } else {
-        /* Normalize pseudo-denormals before comparison.  */
-        if ((a.high & 0x7fff) == 0 && a.low & UINT64_C(0x8000000000000000)) {
-            ++a.high;
-        }
-        if ((b.high & 0x7fff) == 0 && b.low & UINT64_C(0x8000000000000000)) {
-            ++b.high;
-        }
-        if (a.low == b.low && a.high == b.high) {
-            return float_relation_equal;
-        } else {
-            return 1 - 2 * (aSign ^ ( lt128( a.high, a.low, b.high, b.low ) ));
-        }
-    }
-}
-
-FloatRelation floatx80_compare(floatx80 a, floatx80 b, float_status *status)
-{
-    return floatx80_compare_internal(a, b, 0, status);
-}
-
-FloatRelation floatx80_compare_quiet(floatx80 a, floatx80 b,
-                                     float_status *status)
-{
-    return floatx80_compare_internal(a, b, 1, status);
-}
-
 static void __attribute__((constructor)) softfloat_init(void)
 {
     union_float64 ua, ub, uc, ur;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 70/72] softfloat: Convert float32_exp2 to FloatParts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (68 preceding siblings ...)
  2021-05-08  1:47 ` [PATCH 69/72] softfloat: Convert floatx80 compare " Richard Henderson
@ 2021-05-08  1:48 ` Richard Henderson
  2021-05-08  1:48 ` [PATCH 71/72] softfloat: Move floatN_log2 to softfloat-parts.c.inc Richard Henderson
                   ` (6 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Keep the intermediate results in FloatParts instead of converting
back and forth to float64.  Use muladd instead of a separate mul
plus add.
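
For reference, the computation below evaluates the power series
2**a = e**(a*ln2) = 1 + sum(n=1..15) (a*ln2)**n / n!.  A plain-C
sketch of the same recurrence on doubles (illustrative only: c here
recomputes 1/(i+1)!, standing in for the float32_exp2_coefficients
table, and fma() plays the role of parts_muladd):

    #include <math.h>

    static double exp2_sketch(double a)
    {
        double x = a * M_LN2;   /* 2**a == e**(a*ln2) */
        double xn = x;          /* running power x**(i+1) */
        double r = 1.0;         /* partial sum; 1 is the n == 0 term */

        for (int i = 0; i < 15; i++) {
            double c = 1.0;
            for (int k = 2; k <= i + 1; k++) {
                c /= k;         /* c = 1/(i+1)! */
            }
            r = fma(c, xn, r);  /* r += c * x**(i+1), fused */
            xn *= x;
        }
        return r;
    }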

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 53 +++++++++++++++++++++----------------------------
 1 file changed, 23 insertions(+), 30 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index b89ec4b832..906bb427ae 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5212,47 +5212,40 @@ static const float64 float32_exp2_coefficients[15] =
 
 float32 float32_exp2(float32 a, float_status *status)
 {
-    bool aSign;
-    int aExp;
-    uint32_t aSig;
-    float64 r, x, xn;
+    FloatParts64 xp, xnp, tp, rp;
     int i;
-    a = float32_squash_input_denormal(a, status);
 
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-
-    if ( aExp == 0xFF) {
-        if (aSig) {
-            return propagateFloat32NaN(a, float32_zero, status);
+    float32_unpack_canonical(&xp, a, status);
+    if (unlikely(xp.cls != float_class_normal)) {
+        switch (xp.cls) {
+        case float_class_snan:
+        case float_class_qnan:
+            parts_return_nan(&xp, status);
+            return float32_round_pack_canonical(&xp, status);
+        case float_class_inf:
+            return xp.sign ? float32_zero : a;
+        case float_class_zero:
+            return float32_one;
+        default:
+            break;
         }
-        return (aSign) ? float32_zero : a;
-    }
-    if (aExp == 0) {
-        if (aSig == 0) return float32_one;
+        g_assert_not_reached();
     }
 
     float_raise(float_flag_inexact, status);
 
-    /* ******************************* */
-    /* using float64 for approximation */
-    /* ******************************* */
-    x = float32_to_float64(a, status);
-    x = float64_mul(x, float64_ln2, status);
+    float64_unpack_canonical(&xnp, float64_ln2, status);
+    xp = *parts_mul(&xp, &xnp, status);
+    xnp = xp;
 
-    xn = x;
-    r = float64_one;
+    float64_unpack_canonical(&rp, float64_one, status);
     for (i = 0 ; i < 15 ; i++) {
-        float64 f;
-
-        f = float64_mul(xn, float32_exp2_coefficients[i], status);
-        r = float64_add(r, f, status);
-
-        xn = float64_mul(xn, x, status);
+        float64_unpack_canonical(&tp, float32_exp2_coefficients[i], status);
+        rp = *parts_muladd(&tp, &xnp, &rp, 0, status);
+        xnp = *parts_mul(&xnp, &xp, status);
     }
 
-    return float64_to_float32(r, status);
+    return float32_round_pack_canonical(&rp, status);
 }
 
 /*----------------------------------------------------------------------------
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH 71/72] softfloat: Move floatN_log2 to softfloat-parts.c.inc
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (69 preceding siblings ...)
  2021-05-08  1:48 ` [PATCH 70/72] softfloat: Convert float32_exp2 " Richard Henderson
@ 2021-05-08  1:48 ` Richard Henderson
  2021-05-08  1:48 ` [PATCH 72/72] softfloat: Convert modrem operations to FloatParts Richard Henderson
                   ` (5 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Rename to parts$N_log2.  This is partly a ruse, since I do not believe
the code will succeed for float128 without more work; that is ok for
now, because we do not need this for more than float32 and float64.

Since berkeley-testfloat-3 doesn't support log2, compare float64_log2
vs the system log2.  Fix the errors for inputs near 1.0:

test: 3ff00000000000b0  +0x1.00000000000b0p+0
  sf: 3d2fa00000000000  +0x1.fa00000000000p-45
libm: 3d2fbd422b1bd36f  +0x1.fbd422b1bd36fp-45
Error in fraction: 32170028290927 ulp

test: 3feec24f6770b100  +0x1.ec24f6770b100p-1
  sf: bfad3740d13c9ec0  -0x1.d3740d13c9ec0p-5
libm: bfad3740d13c9e98  -0x1.d3740d13c9e98p-5
Error in fraction: 40 ulp
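
For background (my gloss, not stated in the patch): parts$N_log2 uses
the classic squaring method.  Writing a = 2**e * m with m in [1,2),
log2(a) = e + log2(m), and each fraction bit of log2(m) falls out of
one squaring step: square m; if the result reaches 2, the next bit is
1 and m is halved, otherwise it is 0.  A minimal sketch on doubles
(the patch performs this in 64/128-bit fixed point on the raw
fraction):

    /* Requires 1.0 <= m < 2.0; returns an nbits-bit approximation
     * of log2(m).  Illustrative only. */
    static double log2_frac_sketch(double m, int nbits)
    {
        double r = 0.0, bit = 0.5;

        while (nbits--) {
            m *= m;             /* squaring doubles the log */
            if (m >= 2.0) {     /* log2 crossed 1: emit a 1 bit */
                r += bit;
                m *= 0.5;
            }
            bit *= 0.5;
        }
        return r;
    }

E.g. m = 1.5 yields bits 1,0,0,1,0,1,..., i.e. log2(1.5) =
0.100101...(binary) = 0.5849...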

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c           | 126 ++++++++------------------------------
 tests/fp/fp-test-log2.c   | 118 +++++++++++++++++++++++++++++++++++
 fpu/softfloat-parts.c.inc | 125 +++++++++++++++++++++++++++++++++++++
 tests/fp/meson.build      |  11 ++++
 4 files changed, 281 insertions(+), 99 deletions(-)
 create mode 100644 tests/fp/fp-test-log2.c

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 906bb427ae..3823a7ec6f 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -926,6 +926,12 @@ static void parts128_scalbn(FloatParts128 *a, int n, float_status *s);
 #define parts_scalbn(A, N, S) \
     PARTS_GENERIC_64_128(scalbn, A)(A, N, S)
 
+static void parts64_log2(FloatParts64 *a, float_status *s, const FloatFmt *f);
+static void parts128_log2(FloatParts128 *a, float_status *s, const FloatFmt *f);
+
+#define parts_log2(A, S, F) \
+    PARTS_GENERIC_64_128(log2, A)(A, S, F)
+
 /*
  * Helper functions for softfloat-parts.c.inc, per-size operations.
  */
@@ -4062,6 +4068,27 @@ floatx80 floatx80_sqrt(floatx80 a, float_status *s)
     return floatx80_round_pack_canonical(&p, s);
 }
 
+/*
+ * log2
+ */
+float32 float32_log2(float32 a, float_status *status)
+{
+    FloatParts64 p;
+
+    float32_unpack_canonical(&p, a, status);
+    parts_log2(&p, status, &float32_params);
+    return float32_round_pack_canonical(&p, status);
+}
+
+float64 float64_log2(float64 a, float_status *status)
+{
+    FloatParts64 p;
+
+    float64_unpack_canonical(&p, a, status);
+    parts_log2(&p, status, &float64_params);
+    return float64_round_pack_canonical(&p, status);
+}
+
 /*----------------------------------------------------------------------------
 | The pattern for a default generated NaN.
 *----------------------------------------------------------------------------*/
@@ -5248,56 +5275,6 @@ float32 float32_exp2(float32 a, float_status *status)
     return float32_round_pack_canonical(&rp, status);
 }
 
-/*----------------------------------------------------------------------------
-| Returns the binary log of the single-precision floating-point value `a'.
-| The operation is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-float32 float32_log2(float32 a, float_status *status)
-{
-    bool aSign, zSign;
-    int aExp;
-    uint32_t aSig, zSig, i;
-
-    a = float32_squash_input_denormal(a, status);
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return packFloat32( 1, 0xFF, 0 );
-        normalizeFloat32Subnormal( aSig, &aExp, &aSig );
-    }
-    if ( aSign ) {
-        float_raise(float_flag_invalid, status);
-        return float32_default_nan(status);
-    }
-    if ( aExp == 0xFF ) {
-        if (aSig) {
-            return propagateFloat32NaN(a, float32_zero, status);
-        }
-        return a;
-    }
-
-    aExp -= 0x7F;
-    aSig |= 0x00800000;
-    zSign = aExp < 0;
-    zSig = aExp << 23;
-
-    for (i = 1 << 22; i > 0; i >>= 1) {
-        aSig = ( (uint64_t)aSig * aSig ) >> 23;
-        if ( aSig & 0x01000000 ) {
-            aSig >>= 1;
-            zSig |= i;
-        }
-    }
-
-    if ( zSign )
-        zSig = -zSig;
-
-    return normalizeRoundAndPackFloat32(zSign, 0x85, zSig, status);
-}
-
 /*----------------------------------------------------------------------------
 | Returns the remainder of the double-precision floating-point value `a'
 | with respect to the corresponding value `b'.  The operation is performed
@@ -5386,55 +5363,6 @@ float64 float64_rem(float64 a, float64 b, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the binary log of the double-precision floating-point value `a'.
-| The operation is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-float64 float64_log2(float64 a, float_status *status)
-{
-    bool aSign, zSign;
-    int aExp;
-    uint64_t aSig, aSig0, aSig1, zSig, i;
-    a = float64_squash_input_denormal(a, status);
-
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return packFloat64( 1, 0x7FF, 0 );
-        normalizeFloat64Subnormal( aSig, &aExp, &aSig );
-    }
-    if ( aSign ) {
-        float_raise(float_flag_invalid, status);
-        return float64_default_nan(status);
-    }
-    if ( aExp == 0x7FF ) {
-        if (aSig) {
-            return propagateFloat64NaN(a, float64_zero, status);
-        }
-        return a;
-    }
-
-    aExp -= 0x3FF;
-    aSig |= UINT64_C(0x0010000000000000);
-    zSign = aExp < 0;
-    zSig = (uint64_t)aExp << 52;
-    for (i = 1LL << 51; i > 0; i >>= 1) {
-        mul64To128( aSig, aSig, &aSig0, &aSig1 );
-        aSig = ( aSig0 << 12 ) | ( aSig1 >> 52 );
-        if ( aSig & UINT64_C(0x0020000000000000) ) {
-            aSig >>= 1;
-            zSig |= i;
-        }
-    }
-
-    if ( zSign )
-        zSig = -zSig;
-    return normalizeRoundAndPackFloat64(zSign, 0x408, zSig, status);
-}
-
 /*----------------------------------------------------------------------------
 | Rounds the extended double-precision floating-point value `a'
 | to the precision provided by floatx80_rounding_precision and returns the
diff --git a/tests/fp/fp-test-log2.c b/tests/fp/fp-test-log2.c
new file mode 100644
index 0000000000..8ad856509b
--- /dev/null
+++ b/tests/fp/fp-test-log2.c
@@ -0,0 +1,118 @@
+/*
+ * fp-test-log2.c - test QEMU's softfloat log2
+ *
+ * Copyright (C) 2020, Linaro, Ltd.
+ *
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ */
+#ifndef HW_POISON_H
+#error Must define HW_POISON_H to work around TARGET_* poisoning
+#endif
+
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+#include <math.h>
+#include "fpu/softfloat.h"
+
+typedef union {
+    double d;
+    float64 i;
+} ufloat64;
+
+static int errors;
+
+static void compare(ufloat64 test, ufloat64 real, ufloat64 soft, bool exact)
+{
+    int msb;
+    uint64_t ulp = UINT64_MAX;
+
+    if (real.i == soft.i) {
+        return;
+    }
+    msb = 63 - __builtin_clzll(real.i ^ soft.i);
+
+    if (msb < 52) {
+        if (real.i > soft.i) {
+            ulp = real.i - soft.i;
+        } else {
+            ulp = soft.i - real.i;
+        }
+    }
+
+    /* glibc allows 3 ulp error in its libm-test-ulps; allow 4 here */
+    if (!exact && ulp <= 4) {
+        return;
+    }
+
+    printf("test: %016" PRIx64 "  %+.13a\n"
+           "  sf: %016" PRIx64 "  %+.13a\n"
+           "libm: %016" PRIx64 "  %+.13a\n",
+           test.i, test.d, soft.i, soft.d, real.i, real.d);
+
+    if (msb == 63) {
+        printf("Error in sign!\n\n");
+    } else if (msb >= 52) {
+        printf("Error in exponent: %d\n\n",
+               (int)(soft.i >> 52) - (int)(real.i >> 52));
+    } else {
+        printf("Error in fraction: %" PRIu64 " ulp\n\n", ulp);
+    }
+
+    if (++errors == 20) {
+        exit(1);
+    }
+}
+
+int main(int ac, char **av)
+{
+    ufloat64 test, real, soft;
+    float_status qsf = {0};
+    int i;
+
+    set_float_rounding_mode(float_round_nearest_even, &qsf);
+
+    test.d = 0.0;
+    real.d = -__builtin_inf();
+    soft.i = float64_log2(test.i, &qsf);
+    compare(test, real, soft, true);
+
+    test.d = 1.0;
+    real.d = 0.0;
+    soft.i = float64_log2(test.i, &qsf);
+    compare(test, real, soft, true);
+
+    test.d = 2.0;
+    real.d = 1.0;
+    soft.i = float64_log2(test.i, &qsf);
+    compare(test, real, soft, true);
+
+    test.d = 4.0;
+    real.d = 2.0;
+    soft.i = float64_log2(test.i, &qsf);
+    compare(test, real, soft, true);
+
+    test.d = 0x1p64;
+    real.d = 64.0;
+    soft.i = float64_log2(test.i, &qsf);
+    compare(test, real, soft, true);
+
+    test.d = __builtin_inf();
+    real.d = __builtin_inf();
+    soft.i = float64_log2(test.i, &qsf);
+    compare(test, real, soft, true);
+
+    for (i = 0; i < 10000; ++i) {
+        test.d = drand48() + 1.0;    /* [1.0, 2.0) */
+        real.d = log2(test.d);
+        soft.i = float64_log2(test.i, &qsf);
+        compare(test, real, soft, false);
+
+        test.d = drand48() * 100;    /* [0.0, 100) */
+        real.d = log2(test.d);
+        soft.i = float64_log2(test.i, &qsf);
+        compare(test, real, soft, false);
+    }
+
+    return 0;
+}
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index c18f77f2cf..3659c6a4c0 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -1319,3 +1319,128 @@ static void partsN(scalbn)(FloatPartsN *a, int n, float_status *s)
         g_assert_not_reached();
     }
 }
+
+/*
+ * Return log2(A)
+ */
+static void partsN(log2)(FloatPartsN *a, float_status *s, const FloatFmt *fmt)
+{
+    uint64_t a0, a1, r, t, ign;
+    FloatPartsN f;
+    int i, n, a_exp, f_exp;
+
+    if (unlikely(a->cls != float_class_normal)) {
+        switch (a->cls) {
+        case float_class_snan:
+        case float_class_qnan:
+            parts_return_nan(a, s);
+            return;
+        case float_class_zero:
+            /* log2(0) = -inf */
+            a->cls = float_class_inf;
+            a->sign = 1;
+            return;
+        case float_class_inf:
+            if (unlikely(a->sign)) {
+                goto d_nan;
+            }
+            return;
+        default:
+            break;
+        }
+        g_assert_not_reached();
+    }
+    if (unlikely(a->sign)) {
+        goto d_nan;
+    }
+
+    /* TODO: This algorithm loses bits too quickly for float128. */
+    g_assert(N == 64);
+
+    a_exp = a->exp;
+    f_exp = -1;
+
+    r = 0;
+    t = DECOMPOSED_IMPLICIT_BIT;
+    a0 = a->frac_hi;
+    a1 = 0;
+
+    n = fmt->frac_size + 2;
+    if (unlikely(a_exp == -1)) {
+        /*
+         * When a_exp == -1, we're computing the log2 of a value [0.5,1.0).
+         * When the value is very close to 1.0, there are lots of 1's in
+         * the msb parts of the fraction.  At the end, when we subtract
+         * this value from -1.0, we can see a catastrophic loss of precision,
+         * as 0x800..000 - 0x7ff..ffx becomes 0x000..00y, leaving only the
+         * bits of y in the final result.  To minimize this, compute as many
+         * digits as we can.
+         * ??? This case needs another algorithm to avoid this.
+         */
+        n = fmt->frac_size * 2 + 2;
+        /* Don't compute a value overlapping the sticky bit */
+        n = MIN(n, 62);
+    }
+
+    for (i = 0; i < n; i++) {
+        if (a1) {
+            mul128To256(a0, a1, a0, a1, &a0, &a1, &ign, &ign);
+        } else if (a0 & 0xffffffffull) {
+            mul64To128(a0, a0, &a0, &a1);
+        } else if (a0 & ~DECOMPOSED_IMPLICIT_BIT) {
+            a0 >>= 32;
+            a0 *= a0;
+        } else {
+            goto exact;
+        }
+
+        if (a0 & DECOMPOSED_IMPLICIT_BIT) {
+            if (unlikely(a_exp == 0 && r == 0)) {
+                /*
+                 * When a_exp == 0, we're computing the log2 of a value
+                 * [1.0,2.0).  When the value is very close to 1.0, there
+                 * are lots of 0's in the msb parts of the fraction.
+                 * We need to compute more digits to produce a correct
+                 * result -- restart at the top of the fraction.
+                 * ??? This is likely to lose precision quickly, as for
+                 * float128; we may need another method.
+                 */
+                f_exp -= i;
+                t = r = DECOMPOSED_IMPLICIT_BIT;
+                i = 0;
+            } else {
+                r |= t;
+            }
+        } else {
+            add128(a0, a1, a0, a1, &a0, &a1);
+        }
+        t >>= 1;
+    }
+
+    /* Set sticky for inexact. */
+    r |= (a1 || a0 & ~DECOMPOSED_IMPLICIT_BIT);
+
+ exact:
+    parts_sint_to_float(a, a_exp, 0, s);
+    if (r == 0) {
+        return;
+    }
+
+    memset(&f, 0, sizeof(f));
+    f.cls = float_class_normal;
+    f.frac_hi = r;
+    f.exp = f_exp - frac_normalize(&f);
+
+    if (a_exp < 0) {
+        parts_sub_normal(a, &f);
+    } else if (a_exp > 0) {
+        parts_add_normal(a, &f);
+    } else {
+        *a = f;
+    }
+    return;
+
+ d_nan:
+    float_raise(float_flag_invalid, s);
+    parts_default_nan(a, s);
+}
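
For intuition, the recurrence above is the classic digit-by-digit log2:
square the argument, and each time the square crosses 2 the next fraction
bit of the result is 1.  A minimal standalone sketch in ordinary double
arithmetic (illustration only; log2_bitwise is an invented name, and this
covers just the [1.0, 2.0) input range handled by the a_exp == 0 path):

    #include <stdio.h>

    /* Digit-by-digit log2 for x in [1.0, 2.0). */
    static double log2_bitwise(double x, int nbits)
    {
        double r = 0.0;
        double t = 0.5;              /* weight of the next fraction bit */

        for (int i = 0; i < nbits; i++) {
            x *= x;                  /* squaring doubles the log */
            if (x >= 2.0) {
                x *= 0.5;            /* renormalize back into [1.0, 2.0) */
                r += t;              /* emit a 1 bit */
            }
            t *= 0.5;
        }
        return r;
    }

    int main(void)
    {
        /* log2(1.5) = 0.5849625007211562... */
        printf("%.17g\n", log2_bitwise(1.5, 53));
        return 0;
    }
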
diff --git a/tests/fp/meson.build b/tests/fp/meson.build
index 1c3eee9955..9218bfd3b0 100644
--- a/tests/fp/meson.build
+++ b/tests/fp/meson.build
@@ -634,3 +634,14 @@ fpbench = executable(
   include_directories: [sfinc, include_directories(tfdir)],
   c_args: fpcflags,
 )
+
+fptestlog2 = executable(
+  'fp-test-log2',
+  ['fp-test-log2.c', '../../fpu/softfloat.c'],
+  link_with: [libsoftfloat],
+  dependencies: [qemuutil],
+  include_directories: [sfinc],
+  c_args: fpcflags,
+)
+test('fp-test-log2', fptestlog2,
+     suite: ['softfloat', 'softfloat-ops'])
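
Since the test is registered in the 'softfloat' suite, it should be picked
up by the usual softfloat test target, e.g. "make check-softfloat" from the
build directory (assuming a standard QEMU meson build).
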
-- 
2.25.1




* [PATCH 72/72] softfloat: Convert modrem operations to FloatParts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (70 preceding siblings ...)
  2021-05-08  1:48 ` [PATCH 71/72] softfloat: Move floatN_log2 to softfloat-parts.c.inc Richard Henderson
@ 2021-05-08  1:48 ` Richard Henderson
  2021-05-08  2:52 ` [PATCH 00/72] Convert floatx80 and float128 " no-reply
                   ` (4 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-08  1:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee, david

Rename to parts$N_modrem.  This was the last use of much of
the legacy infrastructure, so remove it as required.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-macros.h |   34 +
 fpu/softfloat.c                | 1339 +++++++-------------------------
 fpu/softfloat-parts.c.inc      |   34 +
 fpu/softfloat-specialize.c.inc |  165 ----
 4 files changed, 329 insertions(+), 1243 deletions(-)
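
For reference, the rem/mod distinction that parts$N_modrem now carries in
one routine: IEC/IEEE remainder rounds the quotient to nearest-even, while
mod truncates it toward zero.  The C library exposes the same pair, which
makes for a quick illustration (standard C, not the QEMU API):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* 7/2 = 3.5: nearest-even quotient is 4, truncated quotient is 3. */
        printf("remainder(7, 2) = %g\n", remainder(7.0, 2.0));  /* -1 */
        printf("fmod(7, 2)      = %g\n", fmod(7.0, 2.0));       /*  1 */
        return 0;
    }
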

diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index ec4e27a595..81c3fe8256 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -745,4 +745,38 @@ static inline bool ne128(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1)
     return a0 != b0 || a1 != b1;
 }
 
+/*
+ * Similarly, comparisons of 192-bit values.
+ */
+
+static inline bool eq192(uint64_t a0, uint64_t a1, uint64_t a2,
+                         uint64_t b0, uint64_t b1, uint64_t b2)
+{
+    return ((a0 ^ b0) | (a1 ^ b1) | (a2 ^ b2)) == 0;
+}
+
+static inline bool le192(uint64_t a0, uint64_t a1, uint64_t a2,
+                         uint64_t b0, uint64_t b1, uint64_t b2)
+{
+    if (a0 != b0) {
+        return a0 < b0;
+    }
+    if (a1 != b1) {
+        return a1 < b1;
+    }
+    return a2 <= b2;
+}
+
+static inline bool lt192(uint64_t a0, uint64_t a1, uint64_t a2,
+                         uint64_t b0, uint64_t b1, uint64_t b2)
+{
+    if (a0 != b0) {
+        return a0 < b0;
+    }
+    if (a1 != b1) {
+        return a1 < b1;
+    }
+    return a2 < b2;
+}
+
 #endif
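
A quick standalone check of the word-ordering convention (a0 is the
most-significant word); demo code only, with an invented name:

    #include <stdint.h>
    #include <stdio.h>

    /* Same comparison as le192 above: (a0:a1:a2) <= (b0:b1:b2). */
    static int le192_demo(uint64_t a0, uint64_t a1, uint64_t a2,
                          uint64_t b0, uint64_t b1, uint64_t b2)
    {
        if (a0 != b0) {
            return a0 < b0;
        }
        if (a1 != b1) {
            return a1 < b1;
        }
        return a2 <= b2;
    }

    int main(void)
    {
        /* 2^64 vs 2^64 - 1: decided by the middle word, not the last. */
        printf("%d\n", le192_demo(0, 1, 0,  0, 0, UINT64_MAX));  /* 0 */
        printf("%d\n", le192_demo(0, 0, UINT64_MAX,  0, 1, 0));  /* 1 */
        return 0;
    }
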
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 3823a7ec6f..7376b3470c 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -401,60 +401,6 @@ float64_gen2(float64 xa, float64 xb, float_status *s,
     return soft(ua.s, ub.s, s);
 }
 
-/*----------------------------------------------------------------------------
-| Returns the fraction bits of the single-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline uint32_t extractFloat32Frac(float32 a)
-{
-    return float32_val(a) & 0x007FFFFF;
-}
-
-/*----------------------------------------------------------------------------
-| Returns the exponent bits of the single-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline int extractFloat32Exp(float32 a)
-{
-    return (float32_val(a) >> 23) & 0xFF;
-}
-
-/*----------------------------------------------------------------------------
-| Returns the sign bit of the single-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline bool extractFloat32Sign(float32 a)
-{
-    return float32_val(a) >> 31;
-}
-
-/*----------------------------------------------------------------------------
-| Returns the fraction bits of the double-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline uint64_t extractFloat64Frac(float64 a)
-{
-    return float64_val(a) & UINT64_C(0x000FFFFFFFFFFFFF);
-}
-
-/*----------------------------------------------------------------------------
-| Returns the exponent bits of the double-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline int extractFloat64Exp(float64 a)
-{
-    return (float64_val(a) >> 52) & 0x7FF;
-}
-
-/*----------------------------------------------------------------------------
-| Returns the sign bit of the double-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline bool extractFloat64Sign(float64 a)
-{
-    return float64_val(a) >> 63;
-}
-
 /*
  * Classify a floating point number. Everything above float_class_qnan
  * is a NaN so cls >= float_class_qnan is any NaN.
@@ -844,6 +790,14 @@ static FloatParts128 *parts128_div(FloatParts128 *a, FloatParts128 *b,
 #define parts_div(A, B, S) \
     PARTS_GENERIC_64_128(div, A)(A, B, S)
 
+static FloatParts64 *parts64_modrem(FloatParts64 *a, FloatParts64 *b,
+                                    uint64_t *mod_quot, float_status *s);
+static FloatParts128 *parts128_modrem(FloatParts128 *a, FloatParts128 *b,
+                                      uint64_t *mod_quot, float_status *s);
+
+#define parts_modrem(A, B, Q, S) \
+    PARTS_GENERIC_64_128(modrem, A)(A, B, Q, S)
+
 static void parts64_sqrt(FloatParts64 *a, float_status *s, const FloatFmt *f);
 static void parts128_sqrt(FloatParts128 *a, float_status *s, const FloatFmt *f);
 
@@ -1228,6 +1182,186 @@ static int frac256_normalize(FloatParts256 *a)
 
 #define frac_normalize(A)  FRAC_GENERIC_64_128_256(normalize, A)(A)
 
+static void frac64_modrem(FloatParts64 *a, FloatParts64 *b, uint64_t *mod_quot)
+{
+    uint64_t a0, a1, b0, t0, t1, q, quot;
+    int exp_diff = a->exp - b->exp;
+    int shift;
+
+    a0 = a->frac;
+    a1 = 0;
+
+    if (exp_diff < -1) {
+        if (mod_quot) {
+            *mod_quot = 0;
+        }
+        return;
+    }
+    if (exp_diff == -1) {
+        a0 >>= 1;
+        exp_diff = 0;
+    }
+
+    b0 = b->frac;
+    quot = q = b0 <= a0;
+    if (q) {
+        a0 -= b0;
+    }
+
+    exp_diff -= 64;
+    while (exp_diff > 0) {
+        q = estimateDiv128To64(a0, a1, b0);
+        q = q > 2 ? q - 2 : 0;
+        mul64To128(b0, q, &t0, &t1);
+        sub128(a0, a1, t0, t1, &a0, &a1);
+        shortShift128Left(a0, a1, 62, &a0, &a1);
+        exp_diff -= 62;
+        quot = (quot << 62) + q;
+    }
+
+    exp_diff += 64;
+    if (exp_diff > 0) {
+        q = estimateDiv128To64(a0, a1, b0);
+        q = q > 2 ? (q - 2) >> (64 - exp_diff) : 0;
+        mul64To128(b0, q << (64 - exp_diff), &t0, &t1);
+        sub128(a0, a1, t0, t1, &a0, &a1);
+        shortShift128Left(0, b0, 64 - exp_diff, &t0, &t1);
+        while (le128(t0, t1, a0, a1)) {
+            ++q;
+            sub128(a0, a1, t0, t1, &a0, &a1);
+        }
+        quot = (exp_diff < 64 ? quot << exp_diff : 0) + q;
+    } else {
+        t0 = b0;
+        t1 = 0;
+    }
+
+    if (mod_quot) {
+        *mod_quot = quot;
+    } else {
+        sub128(t0, t1, a0, a1, &t0, &t1);
+        if (lt128(t0, t1, a0, a1) ||
+            (eq128(t0, t1, a0, a1) && (q & 1))) {
+            a0 = t0;
+            a1 = t1;
+            a->sign = !a->sign;
+        }
+    }
+
+    if (likely(a0)) {
+        shift = clz64(a0);
+        shortShift128Left(a0, a1, shift, &a0, &a1);
+    } else if (likely(a1)) {
+        shift = clz64(a1);
+        a0 = a1 << shift;
+        a1 = 0;
+        shift += 64;
+    } else {
+        a->cls = float_class_zero;
+        return;
+    }
+
+    a->exp = b->exp + exp_diff - shift;
+    a->frac = a0 | (a1 != 0);
+}
+
+static void frac128_modrem(FloatParts128 *a, FloatParts128 *b,
+                           uint64_t *mod_quot)
+{
+    uint64_t a0, a1, a2, b0, b1, t0, t1, t2, q, quot;
+    int exp_diff = a->exp - b->exp;
+    int shift;
+
+    a0 = a->frac_hi;
+    a1 = a->frac_lo;
+    a2 = 0;
+
+    if (exp_diff < -1) {
+        if (mod_quot) {
+            *mod_quot = 0;
+        }
+        return;
+    }
+    if (exp_diff == -1) {
+        shift128Right(a0, a1, 1, &a0, &a1);
+        exp_diff = 0;
+    }
+
+    b0 = b->frac_hi;
+    b1 = b->frac_lo;
+
+    quot = q = le128(b0, b1, a0, a1);
+    if (q) {
+        sub128(a0, a1, b0, b1, &a0, &a1);
+    }
+
+    exp_diff -= 64;
+    while (exp_diff > 0) {
+        q = estimateDiv128To64(a0, a1, b0);
+        q = q > 4 ? q - 4 : 0;
+        mul128By64To192(b0, b1, q, &t0, &t1, &t2);
+        sub192(a0, a1, a2, t0, t1, t2, &a0, &a1, &a2);
+        shortShift192Left(a0, a1, a2, 61, &a0, &a1, &a2);
+        exp_diff -= 61;
+        quot = (quot << 61) + q;
+    }
+
+    exp_diff += 64;
+    if (exp_diff > 0) {
+        q = estimateDiv128To64(a0, a1, b0);
+        q = q > 4 ? (q - 4) >> (64 - exp_diff) : 0;
+        mul128By64To192(b0, b1, q << (64 - exp_diff), &t0, &t1, &t2);
+        sub192(a0, a1, a2, t0, t1, t2, &a0, &a1, &a2);
+        shortShift192Left(0, b0, b1, 64 - exp_diff, &t0, &t1, &t2);
+        while (le192(t0, t1, t2, a0, a1, a2)) {
+            ++q;
+            sub192(a0, a1, a2, t0, t1, t2, &a0, &a1, &a2);
+        }
+        quot = (exp_diff < 64 ? quot << exp_diff : 0) + q;
+    } else {
+        t0 = b0;
+        t1 = b1;
+        t2 = 0;
+    }
+
+    if (mod_quot) {
+        *mod_quot = quot;
+    } else {
+        sub192(t0, t1, t2, a0, a1, a2, &t0, &t1, &t2);
+        if (lt192(t0, t1, t2, a0, a1, a2) ||
+            (eq192(t0, t1, t2, a0, a1, a2) && (q & 1))) {
+            a0 = t0;
+            a1 = t1;
+            a2 = t2;
+            a->sign = !a->sign;
+        }
+    }
+
+    if (likely(a0)) {
+        shift = clz64(a0);
+        shortShift192Left(a0, a1, a2, shift, &a0, &a1, &a2);
+    } else if (likely(a1)) {
+        shift = clz64(a1);
+        shortShift128Left(a1, a2, shift, &a0, &a1);
+        a2 = 0;
+        shift += 64;
+    } else if (likely(a2)) {
+        shift = clz64(a2);
+        a0 = a2 << shift;
+        a1 = a2 = 0;
+        shift += 128;
+    } else {
+        a->cls = float_class_zero;
+        return;
+    }
+
+    a->exp = b->exp + exp_diff - shift;
+    a->frac_hi = a0;
+    a->frac_lo = a1 | (a2 != 0);
+}
+
+#define frac_modrem(A, B, Q)  FRAC_GENERIC_64_128(modrem, A)(A, B, Q)
+
 static void frac64_shl(FloatParts64 *a, int c)
 {
     a->frac <<= c;
@@ -2312,6 +2446,79 @@ floatx80 floatx80_div(floatx80 a, floatx80 b, float_status *status)
     return floatx80_round_pack_canonical(pr, status);
 }
 
+/*
+ * Remainder
+ */
+
+float32 float32_rem(float32 a, float32 b, float_status *status)
+{
+    FloatParts64 pa, pb, *pr;
+
+    float32_unpack_canonical(&pa, a, status);
+    float32_unpack_canonical(&pb, b, status);
+    pr = parts_modrem(&pa, &pb, NULL, status);
+
+    return float32_round_pack_canonical(pr, status);
+}
+
+float64 float64_rem(float64 a, float64 b, float_status *status)
+{
+    FloatParts64 pa, pb, *pr;
+
+    float64_unpack_canonical(&pa, a, status);
+    float64_unpack_canonical(&pb, b, status);
+    pr = parts_modrem(&pa, &pb, NULL, status);
+
+    return float64_round_pack_canonical(pr, status);
+}
+
+float128 float128_rem(float128 a, float128 b, float_status *status)
+{
+    FloatParts128 pa, pb, *pr;
+
+    float128_unpack_canonical(&pa, a, status);
+    float128_unpack_canonical(&pb, b, status);
+    pr = parts_modrem(&pa, &pb, NULL, status);
+
+    return float128_round_pack_canonical(pr, status);
+}
+
+/*
+ * Returns the remainder of the extended double-precision floating-point value
+ * `a' with respect to the corresponding value `b'.
+ * If 'mod' is false, the operation is performed according to the IEC/IEEE
+ * Standard for Binary Floating-Point Arithmetic.  If 'mod' is true, return
+ * the remainder based on truncating the quotient toward zero instead, and
+ * '*quotient' is set to the low 64 bits of the absolute value of the
+ * integer quotient.
+ */
+floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
+                         uint64_t *quotient, float_status *status)
+{
+    FloatParts128 pa, pb, *pr;
+
+    *quotient = 0;
+    if (!floatx80_unpack_canonical(&pa, a, status) ||
+        !floatx80_unpack_canonical(&pb, b, status)) {
+        return floatx80_default_nan(status);
+    }
+    pr = parts_modrem(&pa, &pb, mod ? quotient : NULL, status);
+
+    return floatx80_round_pack_canonical(pr, status);
+}
+
+floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
+{
+    uint64_t quotient;
+    return floatx80_modrem(a, b, false, &quotient, status);
+}
+
+floatx80 floatx80_mod(floatx80 a, floatx80 b, float_status *status)
+{
+    uint64_t quotient;
+    return floatx80_modrem(a, b, true, &quotient, status);
+}
+
 /*
  * Float to Float conversions
  *
@@ -4264,300 +4471,6 @@ bfloat16 bfloat16_squash_input_denormal(bfloat16 a, float_status *status)
     return a;
 }
 
-/*----------------------------------------------------------------------------
-| Normalizes the subnormal single-precision floating-point value represented
-| by the denormalized significand `aSig'.  The normalized exponent and
-| significand are stored at the locations pointed to by `zExpPtr' and
-| `zSigPtr', respectively.
-*----------------------------------------------------------------------------*/
-
-static void
- normalizeFloat32Subnormal(uint32_t aSig, int *zExpPtr, uint32_t *zSigPtr)
-{
-    int8_t shiftCount;
-
-    shiftCount = clz32(aSig) - 8;
-    *zSigPtr = aSig<<shiftCount;
-    *zExpPtr = 1 - shiftCount;
-
-}
-
-/*----------------------------------------------------------------------------
-| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
-| and significand `zSig', and returns the proper single-precision floating-
-| point value corresponding to the abstract input.  Ordinarily, the abstract
-| value is simply rounded and packed into the single-precision format, with
-| the inexact exception raised if the abstract input cannot be represented
-| exactly.  However, if the abstract value is too large, the overflow and
-| inexact exceptions are raised and an infinity or maximal finite value is
-| returned.  If the abstract value is too small, the input value is rounded to
-| a subnormal number, and the underflow and inexact exceptions are raised if
-| the abstract input cannot be represented exactly as a subnormal single-
-| precision floating-point number.
-|     The input significand `zSig' has its binary point between bits 30
-| and 29, which is 7 bits to the left of the usual location.  This shifted
-| significand must be normalized or smaller.  If `zSig' is not normalized,
-| `zExp' must be 0; in that case, the result returned is a subnormal number,
-| and it must not require rounding.  In the usual case that `zSig' is
-| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
-| The handling of underflow and overflow follows the IEC/IEEE Standard for
-| Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static float32 roundAndPackFloat32(bool zSign, int zExp, uint32_t zSig,
-                                   float_status *status)
-{
-    int8_t roundingMode;
-    bool roundNearestEven;
-    int8_t roundIncrement, roundBits;
-    bool isTiny;
-
-    roundingMode = status->float_rounding_mode;
-    roundNearestEven = ( roundingMode == float_round_nearest_even );
-    switch (roundingMode) {
-    case float_round_nearest_even:
-    case float_round_ties_away:
-        roundIncrement = 0x40;
-        break;
-    case float_round_to_zero:
-        roundIncrement = 0;
-        break;
-    case float_round_up:
-        roundIncrement = zSign ? 0 : 0x7f;
-        break;
-    case float_round_down:
-        roundIncrement = zSign ? 0x7f : 0;
-        break;
-    case float_round_to_odd:
-        roundIncrement = zSig & 0x80 ? 0 : 0x7f;
-        break;
-    default:
-        abort();
-        break;
-    }
-    roundBits = zSig & 0x7F;
-    if ( 0xFD <= (uint16_t) zExp ) {
-        if (    ( 0xFD < zExp )
-             || (    ( zExp == 0xFD )
-                  && ( (int32_t) ( zSig + roundIncrement ) < 0 ) )
-           ) {
-            bool overflow_to_inf = roundingMode != float_round_to_odd &&
-                                   roundIncrement != 0;
-            float_raise(float_flag_overflow | float_flag_inexact, status);
-            return packFloat32(zSign, 0xFF, -!overflow_to_inf);
-        }
-        if ( zExp < 0 ) {
-            if (status->flush_to_zero) {
-                float_raise(float_flag_output_denormal, status);
-                return packFloat32(zSign, 0, 0);
-            }
-            isTiny = status->tininess_before_rounding
-                  || (zExp < -1)
-                  || (zSig + roundIncrement < 0x80000000);
-            shift32RightJamming( zSig, - zExp, &zSig );
-            zExp = 0;
-            roundBits = zSig & 0x7F;
-            if (isTiny && roundBits) {
-                float_raise(float_flag_underflow, status);
-            }
-            if (roundingMode == float_round_to_odd) {
-                /*
-                 * For round-to-odd case, the roundIncrement depends on
-                 * zSig which just changed.
-                 */
-                roundIncrement = zSig & 0x80 ? 0 : 0x7f;
-            }
-        }
-    }
-    if (roundBits) {
-        float_raise(float_flag_inexact, status);
-    }
-    zSig = ( zSig + roundIncrement )>>7;
-    if (!(roundBits ^ 0x40) && roundNearestEven) {
-        zSig &= ~1;
-    }
-    if ( zSig == 0 ) zExp = 0;
-    return packFloat32( zSign, zExp, zSig );
-
-}
-
-/*----------------------------------------------------------------------------
-| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
-| and significand `zSig', and returns the proper single-precision floating-
-| point value corresponding to the abstract input.  This routine is just like
-| `roundAndPackFloat32' except that `zSig' does not have to be normalized.
-| Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
-| floating-point exponent.
-*----------------------------------------------------------------------------*/
-
-static float32
- normalizeRoundAndPackFloat32(bool zSign, int zExp, uint32_t zSig,
-                              float_status *status)
-{
-    int8_t shiftCount;
-
-    shiftCount = clz32(zSig) - 1;
-    return roundAndPackFloat32(zSign, zExp - shiftCount, zSig<<shiftCount,
-                               status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Normalizes the subnormal double-precision floating-point value represented
-| by the denormalized significand `aSig'.  The normalized exponent and
-| significand are stored at the locations pointed to by `zExpPtr' and
-| `zSigPtr', respectively.
-*----------------------------------------------------------------------------*/
-
-static void
- normalizeFloat64Subnormal(uint64_t aSig, int *zExpPtr, uint64_t *zSigPtr)
-{
-    int8_t shiftCount;
-
-    shiftCount = clz64(aSig) - 11;
-    *zSigPtr = aSig<<shiftCount;
-    *zExpPtr = 1 - shiftCount;
-
-}
-
-/*----------------------------------------------------------------------------
-| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
-| double-precision floating-point value, returning the result.  After being
-| shifted into the proper positions, the three fields are simply added
-| together to form the result.  This means that any integer portion of `zSig'
-| will be added into the exponent.  Since a properly normalized significand
-| will have an integer portion equal to 1, the `zExp' input should be 1 less
-| than the desired result exponent whenever `zSig' is a complete, normalized
-| significand.
-*----------------------------------------------------------------------------*/
-
-static inline float64 packFloat64(bool zSign, int zExp, uint64_t zSig)
-{
-
-    return make_float64(
-        ( ( (uint64_t) zSign )<<63 ) + ( ( (uint64_t) zExp )<<52 ) + zSig);
-
-}
-
-/*----------------------------------------------------------------------------
-| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
-| and significand `zSig', and returns the proper double-precision floating-
-| point value corresponding to the abstract input.  Ordinarily, the abstract
-| value is simply rounded and packed into the double-precision format, with
-| the inexact exception raised if the abstract input cannot be represented
-| exactly.  However, if the abstract value is too large, the overflow and
-| inexact exceptions are raised and an infinity or maximal finite value is
-| returned.  If the abstract value is too small, the input value is rounded to
-| a subnormal number, and the underflow and inexact exceptions are raised if
-| the abstract input cannot be represented exactly as a subnormal double-
-| precision floating-point number.
-|     The input significand `zSig' has its binary point between bits 62
-| and 61, which is 10 bits to the left of the usual location.  This shifted
-| significand must be normalized or smaller.  If `zSig' is not normalized,
-| `zExp' must be 0; in that case, the result returned is a subnormal number,
-| and it must not require rounding.  In the usual case that `zSig' is
-| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
-| The handling of underflow and overflow follows the IEC/IEEE Standard for
-| Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static float64 roundAndPackFloat64(bool zSign, int zExp, uint64_t zSig,
-                                   float_status *status)
-{
-    int8_t roundingMode;
-    bool roundNearestEven;
-    int roundIncrement, roundBits;
-    bool isTiny;
-
-    roundingMode = status->float_rounding_mode;
-    roundNearestEven = ( roundingMode == float_round_nearest_even );
-    switch (roundingMode) {
-    case float_round_nearest_even:
-    case float_round_ties_away:
-        roundIncrement = 0x200;
-        break;
-    case float_round_to_zero:
-        roundIncrement = 0;
-        break;
-    case float_round_up:
-        roundIncrement = zSign ? 0 : 0x3ff;
-        break;
-    case float_round_down:
-        roundIncrement = zSign ? 0x3ff : 0;
-        break;
-    case float_round_to_odd:
-        roundIncrement = (zSig & 0x400) ? 0 : 0x3ff;
-        break;
-    default:
-        abort();
-    }
-    roundBits = zSig & 0x3FF;
-    if ( 0x7FD <= (uint16_t) zExp ) {
-        if (    ( 0x7FD < zExp )
-             || (    ( zExp == 0x7FD )
-                  && ( (int64_t) ( zSig + roundIncrement ) < 0 ) )
-           ) {
-            bool overflow_to_inf = roundingMode != float_round_to_odd &&
-                                   roundIncrement != 0;
-            float_raise(float_flag_overflow | float_flag_inexact, status);
-            return packFloat64(zSign, 0x7FF, -(!overflow_to_inf));
-        }
-        if ( zExp < 0 ) {
-            if (status->flush_to_zero) {
-                float_raise(float_flag_output_denormal, status);
-                return packFloat64(zSign, 0, 0);
-            }
-            isTiny = status->tininess_before_rounding
-                  || (zExp < -1)
-                  || (zSig + roundIncrement < UINT64_C(0x8000000000000000));
-            shift64RightJamming( zSig, - zExp, &zSig );
-            zExp = 0;
-            roundBits = zSig & 0x3FF;
-            if (isTiny && roundBits) {
-                float_raise(float_flag_underflow, status);
-            }
-            if (roundingMode == float_round_to_odd) {
-                /*
-                 * For round-to-odd case, the roundIncrement depends on
-                 * zSig which just changed.
-                 */
-                roundIncrement = (zSig & 0x400) ? 0 : 0x3ff;
-            }
-        }
-    }
-    if (roundBits) {
-        float_raise(float_flag_inexact, status);
-    }
-    zSig = ( zSig + roundIncrement )>>10;
-    if (!(roundBits ^ 0x200) && roundNearestEven) {
-        zSig &= ~1;
-    }
-    if ( zSig == 0 ) zExp = 0;
-    return packFloat64( zSign, zExp, zSig );
-
-}
-
-/*----------------------------------------------------------------------------
-| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
-| and significand `zSig', and returns the proper double-precision floating-
-| point value corresponding to the abstract input.  This routine is just like
-| `roundAndPackFloat64' except that `zSig' does not have to be normalized.
-| Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
-| floating-point exponent.
-*----------------------------------------------------------------------------*/
-
-static float64
- normalizeRoundAndPackFloat64(bool zSign, int zExp, uint64_t zSig,
-                              float_status *status)
-{
-    int8_t shiftCount;
-
-    shiftCount = clz64(zSig) - 1;
-    return roundAndPackFloat64(zSign, zExp - shiftCount, zSig<<shiftCount,
-                               status);
-
-}
-
 /*----------------------------------------------------------------------------
 | Normalizes the subnormal extended double-precision floating-point value
 | represented by the denormalized significand `aSig'.  The normalized exponent
@@ -4818,388 +4731,6 @@ floatx80 normalizeRoundAndPackFloatx80(FloatX80RoundPrec roundingPrecision,
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the least-significant 64 fraction bits of the quadruple-precision
-| floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline uint64_t extractFloat128Frac1( float128 a )
-{
-
-    return a.low;
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the most-significant 48 fraction bits of the quadruple-precision
-| floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline uint64_t extractFloat128Frac0( float128 a )
-{
-
-    return a.high & UINT64_C(0x0000FFFFFFFFFFFF);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the exponent bits of the quadruple-precision floating-point value
-| `a'.
-*----------------------------------------------------------------------------*/
-
-static inline int32_t extractFloat128Exp( float128 a )
-{
-
-    return ( a.high>>48 ) & 0x7FFF;
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the sign bit of the quadruple-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline bool extractFloat128Sign(float128 a)
-{
-    return a.high >> 63;
-}
-
-/*----------------------------------------------------------------------------
-| Normalizes the subnormal quadruple-precision floating-point value
-| represented by the denormalized significand formed by the concatenation of
-| `aSig0' and `aSig1'.  The normalized exponent is stored at the location
-| pointed to by `zExpPtr'.  The most significant 49 bits of the normalized
-| significand are stored at the location pointed to by `zSig0Ptr', and the
-| least significant 64 bits of the normalized significand are stored at the
-| location pointed to by `zSig1Ptr'.
-*----------------------------------------------------------------------------*/
-
-static void
- normalizeFloat128Subnormal(
-     uint64_t aSig0,
-     uint64_t aSig1,
-     int32_t *zExpPtr,
-     uint64_t *zSig0Ptr,
-     uint64_t *zSig1Ptr
- )
-{
-    int8_t shiftCount;
-
-    if ( aSig0 == 0 ) {
-        shiftCount = clz64(aSig1) - 15;
-        if ( shiftCount < 0 ) {
-            *zSig0Ptr = aSig1>>( - shiftCount );
-            *zSig1Ptr = aSig1<<( shiftCount & 63 );
-        }
-        else {
-            *zSig0Ptr = aSig1<<shiftCount;
-            *zSig1Ptr = 0;
-        }
-        *zExpPtr = - shiftCount - 63;
-    }
-    else {
-        shiftCount = clz64(aSig0) - 15;
-        shortShift128Left( aSig0, aSig1, shiftCount, zSig0Ptr, zSig1Ptr );
-        *zExpPtr = 1 - shiftCount;
-    }
-
-}
-
-/*----------------------------------------------------------------------------
-| Packs the sign `zSign', the exponent `zExp', and the significand formed
-| by the concatenation of `zSig0' and `zSig1' into a quadruple-precision
-| floating-point value, returning the result.  After being shifted into the
-| proper positions, the three fields `zSign', `zExp', and `zSig0' are simply
-| added together to form the most significant 32 bits of the result.  This
-| means that any integer portion of `zSig0' will be added into the exponent.
-| Since a properly normalized significand will have an integer portion equal
-| to 1, the `zExp' input should be 1 less than the desired result exponent
-| whenever `zSig0' and `zSig1' concatenated form a complete, normalized
-| significand.
-*----------------------------------------------------------------------------*/
-
-static inline float128
-packFloat128(bool zSign, int32_t zExp, uint64_t zSig0, uint64_t zSig1)
-{
-    float128 z;
-
-    z.low = zSig1;
-    z.high = ((uint64_t)zSign << 63) + ((uint64_t)zExp << 48) + zSig0;
-    return z;
-}
-
-/*----------------------------------------------------------------------------
-| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
-| and extended significand formed by the concatenation of `zSig0', `zSig1',
-| and `zSig2', and returns the proper quadruple-precision floating-point value
-| corresponding to the abstract input.  Ordinarily, the abstract value is
-| simply rounded and packed into the quadruple-precision format, with the
-| inexact exception raised if the abstract input cannot be represented
-| exactly.  However, if the abstract value is too large, the overflow and
-| inexact exceptions are raised and an infinity or maximal finite value is
-| returned.  If the abstract value is too small, the input value is rounded to
-| a subnormal number, and the underflow and inexact exceptions are raised if
-| the abstract input cannot be represented exactly as a subnormal quadruple-
-| precision floating-point number.
-|     The input significand must be normalized or smaller.  If the input
-| significand is not normalized, `zExp' must be 0; in that case, the result
-| returned is a subnormal number, and it must not require rounding.  In the
-| usual case that the input significand is normalized, `zExp' must be 1 less
-| than the ``true'' floating-point exponent.  The handling of underflow and
-| overflow follows the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static float128 roundAndPackFloat128(bool zSign, int32_t zExp,
-                                     uint64_t zSig0, uint64_t zSig1,
-                                     uint64_t zSig2, float_status *status)
-{
-    int8_t roundingMode;
-    bool roundNearestEven, increment, isTiny;
-
-    roundingMode = status->float_rounding_mode;
-    roundNearestEven = ( roundingMode == float_round_nearest_even );
-    switch (roundingMode) {
-    case float_round_nearest_even:
-    case float_round_ties_away:
-        increment = ((int64_t)zSig2 < 0);
-        break;
-    case float_round_to_zero:
-        increment = 0;
-        break;
-    case float_round_up:
-        increment = !zSign && zSig2;
-        break;
-    case float_round_down:
-        increment = zSign && zSig2;
-        break;
-    case float_round_to_odd:
-        increment = !(zSig1 & 0x1) && zSig2;
-        break;
-    default:
-        abort();
-    }
-    if ( 0x7FFD <= (uint32_t) zExp ) {
-        if (    ( 0x7FFD < zExp )
-             || (    ( zExp == 0x7FFD )
-                  && eq128(
-                         UINT64_C(0x0001FFFFFFFFFFFF),
-                         UINT64_C(0xFFFFFFFFFFFFFFFF),
-                         zSig0,
-                         zSig1
-                     )
-                  && increment
-                )
-           ) {
-            float_raise(float_flag_overflow | float_flag_inexact, status);
-            if (    ( roundingMode == float_round_to_zero )
-                 || ( zSign && ( roundingMode == float_round_up ) )
-                 || ( ! zSign && ( roundingMode == float_round_down ) )
-                 || (roundingMode == float_round_to_odd)
-               ) {
-                return
-                    packFloat128(
-                        zSign,
-                        0x7FFE,
-                        UINT64_C(0x0000FFFFFFFFFFFF),
-                        UINT64_C(0xFFFFFFFFFFFFFFFF)
-                    );
-            }
-            return packFloat128( zSign, 0x7FFF, 0, 0 );
-        }
-        if ( zExp < 0 ) {
-            if (status->flush_to_zero) {
-                float_raise(float_flag_output_denormal, status);
-                return packFloat128(zSign, 0, 0, 0);
-            }
-            isTiny = status->tininess_before_rounding
-                  || (zExp < -1)
-                  || !increment
-                  || lt128(zSig0, zSig1,
-                           UINT64_C(0x0001FFFFFFFFFFFF),
-                           UINT64_C(0xFFFFFFFFFFFFFFFF));
-            shift128ExtraRightJamming(
-                zSig0, zSig1, zSig2, - zExp, &zSig0, &zSig1, &zSig2 );
-            zExp = 0;
-            if (isTiny && zSig2) {
-                float_raise(float_flag_underflow, status);
-            }
-            switch (roundingMode) {
-            case float_round_nearest_even:
-            case float_round_ties_away:
-                increment = ((int64_t)zSig2 < 0);
-                break;
-            case float_round_to_zero:
-                increment = 0;
-                break;
-            case float_round_up:
-                increment = !zSign && zSig2;
-                break;
-            case float_round_down:
-                increment = zSign && zSig2;
-                break;
-            case float_round_to_odd:
-                increment = !(zSig1 & 0x1) && zSig2;
-                break;
-            default:
-                abort();
-            }
-        }
-    }
-    if (zSig2) {
-        float_raise(float_flag_inexact, status);
-    }
-    if ( increment ) {
-        add128( zSig0, zSig1, 0, 1, &zSig0, &zSig1 );
-        if ((zSig2 + zSig2 == 0) && roundNearestEven) {
-            zSig1 &= ~1;
-        }
-    }
-    else {
-        if ( ( zSig0 | zSig1 ) == 0 ) zExp = 0;
-    }
-    return packFloat128( zSign, zExp, zSig0, zSig1 );
-
-}
-
-/*----------------------------------------------------------------------------
-| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
-| and significand formed by the concatenation of `zSig0' and `zSig1', and
-| returns the proper quadruple-precision floating-point value corresponding
-| to the abstract input.  This routine is just like `roundAndPackFloat128'
-| except that the input significand has fewer bits and does not have to be
-| normalized.  In all cases, `zExp' must be 1 less than the ``true'' floating-
-| point exponent.
-*----------------------------------------------------------------------------*/
-
-static float128 normalizeRoundAndPackFloat128(bool zSign, int32_t zExp,
-                                              uint64_t zSig0, uint64_t zSig1,
-                                              float_status *status)
-{
-    int8_t shiftCount;
-    uint64_t zSig2;
-
-    if ( zSig0 == 0 ) {
-        zSig0 = zSig1;
-        zSig1 = 0;
-        zExp -= 64;
-    }
-    shiftCount = clz64(zSig0) - 15;
-    if ( 0 <= shiftCount ) {
-        zSig2 = 0;
-        shortShift128Left( zSig0, zSig1, shiftCount, &zSig0, &zSig1 );
-    }
-    else {
-        shift128ExtraRightJamming(
-            zSig0, zSig1, 0, - shiftCount, &zSig0, &zSig1, &zSig2 );
-    }
-    zExp -= shiftCount;
-    return roundAndPackFloat128(zSign, zExp, zSig0, zSig1, zSig2, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the remainder of the single-precision floating-point value `a'
-| with respect to the corresponding value `b'.  The operation is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 float32_rem(float32 a, float32 b, float_status *status)
-{
-    bool aSign, zSign;
-    int aExp, bExp, expDiff;
-    uint32_t aSig, bSig;
-    uint32_t q;
-    uint64_t aSig64, bSig64, q64;
-    uint32_t alternateASig;
-    int32_t sigMean;
-    a = float32_squash_input_denormal(a, status);
-    b = float32_squash_input_denormal(b, status);
-
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-    bSig = extractFloat32Frac( b );
-    bExp = extractFloat32Exp( b );
-    if ( aExp == 0xFF ) {
-        if ( aSig || ( ( bExp == 0xFF ) && bSig ) ) {
-            return propagateFloat32NaN(a, b, status);
-        }
-        float_raise(float_flag_invalid, status);
-        return float32_default_nan(status);
-    }
-    if ( bExp == 0xFF ) {
-        if (bSig) {
-            return propagateFloat32NaN(a, b, status);
-        }
-        return a;
-    }
-    if ( bExp == 0 ) {
-        if ( bSig == 0 ) {
-            float_raise(float_flag_invalid, status);
-            return float32_default_nan(status);
-        }
-        normalizeFloat32Subnormal( bSig, &bExp, &bSig );
-    }
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return a;
-        normalizeFloat32Subnormal( aSig, &aExp, &aSig );
-    }
-    expDiff = aExp - bExp;
-    aSig |= 0x00800000;
-    bSig |= 0x00800000;
-    if ( expDiff < 32 ) {
-        aSig <<= 8;
-        bSig <<= 8;
-        if ( expDiff < 0 ) {
-            if ( expDiff < -1 ) return a;
-            aSig >>= 1;
-        }
-        q = ( bSig <= aSig );
-        if ( q ) aSig -= bSig;
-        if ( 0 < expDiff ) {
-            q = ( ( (uint64_t) aSig )<<32 ) / bSig;
-            q >>= 32 - expDiff;
-            bSig >>= 2;
-            aSig = ( ( aSig>>1 )<<( expDiff - 1 ) ) - bSig * q;
-        }
-        else {
-            aSig >>= 2;
-            bSig >>= 2;
-        }
-    }
-    else {
-        if ( bSig <= aSig ) aSig -= bSig;
-        aSig64 = ( (uint64_t) aSig )<<40;
-        bSig64 = ( (uint64_t) bSig )<<40;
-        expDiff -= 64;
-        while ( 0 < expDiff ) {
-            q64 = estimateDiv128To64( aSig64, 0, bSig64 );
-            q64 = ( 2 < q64 ) ? q64 - 2 : 0;
-            aSig64 = - ( ( bSig * q64 )<<38 );
-            expDiff -= 62;
-        }
-        expDiff += 64;
-        q64 = estimateDiv128To64( aSig64, 0, bSig64 );
-        q64 = ( 2 < q64 ) ? q64 - 2 : 0;
-        q = q64>>( 64 - expDiff );
-        bSig <<= 6;
-        aSig = ( ( aSig64>>33 )<<( expDiff - 1 ) ) - bSig * q;
-    }
-    do {
-        alternateASig = aSig;
-        ++q;
-        aSig -= bSig;
-    } while ( 0 <= (int32_t) aSig );
-    sigMean = aSig + alternateASig;
-    if ( ( sigMean < 0 ) || ( ( sigMean == 0 ) && ( q & 1 ) ) ) {
-        aSig = alternateASig;
-    }
-    zSign = ( (int32_t) aSig < 0 );
-    if ( zSign ) aSig = - aSig;
-    return normalizeRoundAndPackFloat32(aSign ^ zSign, bExp, aSig, status);
-}
-
-
-
 /*----------------------------------------------------------------------------
 | Returns the binary exponential of the single-precision floating-point value
 | `a'. The operation is performed according to the IEC/IEEE Standard for
@@ -5275,94 +4806,6 @@ float32 float32_exp2(float32 a, float_status *status)
     return float32_round_pack_canonical(&rp, status);
 }
 
-/*----------------------------------------------------------------------------
-| Returns the remainder of the double-precision floating-point value `a'
-| with respect to the corresponding value `b'.  The operation is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 float64_rem(float64 a, float64 b, float_status *status)
-{
-    bool aSign, zSign;
-    int aExp, bExp, expDiff;
-    uint64_t aSig, bSig;
-    uint64_t q, alternateASig;
-    int64_t sigMean;
-
-    a = float64_squash_input_denormal(a, status);
-    b = float64_squash_input_denormal(b, status);
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-    bSig = extractFloat64Frac( b );
-    bExp = extractFloat64Exp( b );
-    if ( aExp == 0x7FF ) {
-        if ( aSig || ( ( bExp == 0x7FF ) && bSig ) ) {
-            return propagateFloat64NaN(a, b, status);
-        }
-        float_raise(float_flag_invalid, status);
-        return float64_default_nan(status);
-    }
-    if ( bExp == 0x7FF ) {
-        if (bSig) {
-            return propagateFloat64NaN(a, b, status);
-        }
-        return a;
-    }
-    if ( bExp == 0 ) {
-        if ( bSig == 0 ) {
-            float_raise(float_flag_invalid, status);
-            return float64_default_nan(status);
-        }
-        normalizeFloat64Subnormal( bSig, &bExp, &bSig );
-    }
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return a;
-        normalizeFloat64Subnormal( aSig, &aExp, &aSig );
-    }
-    expDiff = aExp - bExp;
-    aSig = (aSig | UINT64_C(0x0010000000000000)) << 11;
-    bSig = (bSig | UINT64_C(0x0010000000000000)) << 11;
-    if ( expDiff < 0 ) {
-        if ( expDiff < -1 ) return a;
-        aSig >>= 1;
-    }
-    q = ( bSig <= aSig );
-    if ( q ) aSig -= bSig;
-    expDiff -= 64;
-    while ( 0 < expDiff ) {
-        q = estimateDiv128To64( aSig, 0, bSig );
-        q = ( 2 < q ) ? q - 2 : 0;
-        aSig = - ( ( bSig>>2 ) * q );
-        expDiff -= 62;
-    }
-    expDiff += 64;
-    if ( 0 < expDiff ) {
-        q = estimateDiv128To64( aSig, 0, bSig );
-        q = ( 2 < q ) ? q - 2 : 0;
-        q >>= 64 - expDiff;
-        bSig >>= 2;
-        aSig = ( ( aSig>>1 )<<( expDiff - 1 ) ) - bSig * q;
-    }
-    else {
-        aSig >>= 2;
-        bSig >>= 2;
-    }
-    do {
-        alternateASig = aSig;
-        ++q;
-        aSig -= bSig;
-    } while ( 0 <= (int64_t) aSig );
-    sigMean = aSig + alternateASig;
-    if ( ( sigMean < 0 ) || ( ( sigMean == 0 ) && ( q & 1 ) ) ) {
-        aSig = alternateASig;
-    }
-    zSign = ( (int64_t) aSig < 0 );
-    if ( zSign ) aSig = - aSig;
-    return normalizeRoundAndPackFloat64(aSign ^ zSign, bExp, aSig, status);
-
-}
-
 /*----------------------------------------------------------------------------
 | Rounds the extended double-precision floating-point value `a'
 | to the precision provided by floatx80_rounding_precision and returns the
@@ -5381,266 +4824,6 @@ floatx80 floatx80_round(floatx80 a, float_status *status)
     return floatx80_round_pack_canonical(&p, status);
 }
 
-/*----------------------------------------------------------------------------
-| Returns the remainder of the extended double-precision floating-point value
-| `a' with respect to the corresponding value `b'.  The operation is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic,
-| if 'mod' is false; if 'mod' is true, return the remainder based on truncating
-| the quotient toward zero instead.  '*quotient' is set to the low 64 bits of
-| the absolute value of the integer quotient.
-*----------------------------------------------------------------------------*/
-
-floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod, uint64_t *quotient,
-                         float_status *status)
-{
-    bool aSign, zSign;
-    int32_t aExp, bExp, expDiff, aExpOrig;
-    uint64_t aSig0, aSig1, bSig;
-    uint64_t q, term0, term1, alternateASig0, alternateASig1;
-
-    *quotient = 0;
-    if (floatx80_invalid_encoding(a) || floatx80_invalid_encoding(b)) {
-        float_raise(float_flag_invalid, status);
-        return floatx80_default_nan(status);
-    }
-    aSig0 = extractFloatx80Frac( a );
-    aExpOrig = aExp = extractFloatx80Exp( a );
-    aSign = extractFloatx80Sign( a );
-    bSig = extractFloatx80Frac( b );
-    bExp = extractFloatx80Exp( b );
-    if ( aExp == 0x7FFF ) {
-        if (    (uint64_t) ( aSig0<<1 )
-             || ( ( bExp == 0x7FFF ) && (uint64_t) ( bSig<<1 ) ) ) {
-            return propagateFloatx80NaN(a, b, status);
-        }
-        goto invalid;
-    }
-    if ( bExp == 0x7FFF ) {
-        if ((uint64_t)(bSig << 1)) {
-            return propagateFloatx80NaN(a, b, status);
-        }
-        if (aExp == 0 && aSig0 >> 63) {
-            /*
-             * Pseudo-denormal argument must be returned in normalized
-             * form.
-             */
-            return packFloatx80(aSign, 1, aSig0);
-        }
-        return a;
-    }
-    if ( bExp == 0 ) {
-        if ( bSig == 0 ) {
- invalid:
-            float_raise(float_flag_invalid, status);
-            return floatx80_default_nan(status);
-        }
-        normalizeFloatx80Subnormal( bSig, &bExp, &bSig );
-    }
-    if ( aExp == 0 ) {
-        if ( aSig0 == 0 ) return a;
-        normalizeFloatx80Subnormal( aSig0, &aExp, &aSig0 );
-    }
-    zSign = aSign;
-    expDiff = aExp - bExp;
-    aSig1 = 0;
-    if ( expDiff < 0 ) {
-        if ( mod || expDiff < -1 ) {
-            if (aExp == 1 && aExpOrig == 0) {
-                /*
-                 * Pseudo-denormal argument must be returned in
-                 * normalized form.
-                 */
-                return packFloatx80(aSign, aExp, aSig0);
-            }
-            return a;
-        }
-        shift128Right( aSig0, 0, 1, &aSig0, &aSig1 );
-        expDiff = 0;
-    }
-    *quotient = q = ( bSig <= aSig0 );
-    if ( q ) aSig0 -= bSig;
-    expDiff -= 64;
-    while ( 0 < expDiff ) {
-        q = estimateDiv128To64( aSig0, aSig1, bSig );
-        q = ( 2 < q ) ? q - 2 : 0;
-        mul64To128( bSig, q, &term0, &term1 );
-        sub128( aSig0, aSig1, term0, term1, &aSig0, &aSig1 );
-        shortShift128Left( aSig0, aSig1, 62, &aSig0, &aSig1 );
-        expDiff -= 62;
-        *quotient <<= 62;
-        *quotient += q;
-    }
-    expDiff += 64;
-    if ( 0 < expDiff ) {
-        q = estimateDiv128To64( aSig0, aSig1, bSig );
-        q = ( 2 < q ) ? q - 2 : 0;
-        q >>= 64 - expDiff;
-        mul64To128( bSig, q<<( 64 - expDiff ), &term0, &term1 );
-        sub128( aSig0, aSig1, term0, term1, &aSig0, &aSig1 );
-        shortShift128Left( 0, bSig, 64 - expDiff, &term0, &term1 );
-        while ( le128( term0, term1, aSig0, aSig1 ) ) {
-            ++q;
-            sub128( aSig0, aSig1, term0, term1, &aSig0, &aSig1 );
-        }
-        if (expDiff < 64) {
-            *quotient <<= expDiff;
-        } else {
-            *quotient = 0;
-        }
-        *quotient += q;
-    }
-    else {
-        term1 = 0;
-        term0 = bSig;
-    }
-    if (!mod) {
-        sub128( term0, term1, aSig0, aSig1, &alternateASig0, &alternateASig1 );
-        if (    lt128( alternateASig0, alternateASig1, aSig0, aSig1 )
-                || (    eq128( alternateASig0, alternateASig1, aSig0, aSig1 )
-                        && ( q & 1 ) )
-            ) {
-            aSig0 = alternateASig0;
-            aSig1 = alternateASig1;
-            zSign = ! zSign;
-            ++*quotient;
-        }
-    }
-    return
-        normalizeRoundAndPackFloatx80(
-            floatx80_precision_x, zSign, bExp + expDiff, aSig0, aSig1, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the remainder of the extended double-precision floating-point value
-| `a' with respect to the corresponding value `b'.  The operation is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
-{
-    uint64_t quotient;
-    return floatx80_modrem(a, b, false, &quotient, status);
-}
-
-/*----------------------------------------------------------------------------
-| Returns the remainder of the extended double-precision floating-point value
-| `a' with respect to the corresponding value `b', with the quotient truncated
-| toward zero.
-*----------------------------------------------------------------------------*/
-
-floatx80 floatx80_mod(floatx80 a, floatx80 b, float_status *status)
-{
-    uint64_t quotient;
-    return floatx80_modrem(a, b, true, &quotient, status);
-}
-
-/*----------------------------------------------------------------------------
-| Returns the remainder of the quadruple-precision floating-point value `a'
-| with respect to the corresponding value `b'.  The operation is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float128 float128_rem(float128 a, float128 b, float_status *status)
-{
-    bool aSign, zSign;
-    int32_t aExp, bExp, expDiff;
-    uint64_t aSig0, aSig1, bSig0, bSig1, q, term0, term1, term2;
-    uint64_t allZero, alternateASig0, alternateASig1, sigMean1;
-    int64_t sigMean0;
-
-    aSig1 = extractFloat128Frac1( a );
-    aSig0 = extractFloat128Frac0( a );
-    aExp = extractFloat128Exp( a );
-    aSign = extractFloat128Sign( a );
-    bSig1 = extractFloat128Frac1( b );
-    bSig0 = extractFloat128Frac0( b );
-    bExp = extractFloat128Exp( b );
-    if ( aExp == 0x7FFF ) {
-        if (    ( aSig0 | aSig1 )
-             || ( ( bExp == 0x7FFF ) && ( bSig0 | bSig1 ) ) ) {
-            return propagateFloat128NaN(a, b, status);
-        }
-        goto invalid;
-    }
-    if ( bExp == 0x7FFF ) {
-        if (bSig0 | bSig1) {
-            return propagateFloat128NaN(a, b, status);
-        }
-        return a;
-    }
-    if ( bExp == 0 ) {
-        if ( ( bSig0 | bSig1 ) == 0 ) {
- invalid:
-            float_raise(float_flag_invalid, status);
-            return float128_default_nan(status);
-        }
-        normalizeFloat128Subnormal( bSig0, bSig1, &bExp, &bSig0, &bSig1 );
-    }
-    if ( aExp == 0 ) {
-        if ( ( aSig0 | aSig1 ) == 0 ) return a;
-        normalizeFloat128Subnormal( aSig0, aSig1, &aExp, &aSig0, &aSig1 );
-    }
-    expDiff = aExp - bExp;
-    if ( expDiff < -1 ) return a;
-    shortShift128Left(
-        aSig0 | UINT64_C(0x0001000000000000),
-        aSig1,
-        15 - ( expDiff < 0 ),
-        &aSig0,
-        &aSig1
-    );
-    shortShift128Left(
-        bSig0 | UINT64_C(0x0001000000000000), bSig1, 15, &bSig0, &bSig1 );
-    q = le128( bSig0, bSig1, aSig0, aSig1 );
-    if ( q ) sub128( aSig0, aSig1, bSig0, bSig1, &aSig0, &aSig1 );
-    expDiff -= 64;
-    while ( 0 < expDiff ) {
-        q = estimateDiv128To64( aSig0, aSig1, bSig0 );
-        q = ( 4 < q ) ? q - 4 : 0;
-        mul128By64To192( bSig0, bSig1, q, &term0, &term1, &term2 );
-        shortShift192Left( term0, term1, term2, 61, &term1, &term2, &allZero );
-        shortShift128Left( aSig0, aSig1, 61, &aSig0, &allZero );
-        sub128( aSig0, 0, term1, term2, &aSig0, &aSig1 );
-        expDiff -= 61;
-    }
-    if ( -64 < expDiff ) {
-        q = estimateDiv128To64( aSig0, aSig1, bSig0 );
-        q = ( 4 < q ) ? q - 4 : 0;
-        q >>= - expDiff;
-        shift128Right( bSig0, bSig1, 12, &bSig0, &bSig1 );
-        expDiff += 52;
-        if ( expDiff < 0 ) {
-            shift128Right( aSig0, aSig1, - expDiff, &aSig0, &aSig1 );
-        }
-        else {
-            shortShift128Left( aSig0, aSig1, expDiff, &aSig0, &aSig1 );
-        }
-        mul128By64To192( bSig0, bSig1, q, &term0, &term1, &term2 );
-        sub128( aSig0, aSig1, term1, term2, &aSig0, &aSig1 );
-    }
-    else {
-        shift128Right( aSig0, aSig1, 12, &aSig0, &aSig1 );
-        shift128Right( bSig0, bSig1, 12, &bSig0, &bSig1 );
-    }
-    do {
-        alternateASig0 = aSig0;
-        alternateASig1 = aSig1;
-        ++q;
-        sub128( aSig0, aSig1, bSig0, bSig1, &aSig0, &aSig1 );
-    } while ( 0 <= (int64_t) aSig0 );
-    add128(
-        aSig0, aSig1, alternateASig0, alternateASig1, (uint64_t *)&sigMean0, &sigMean1 );
-    if (    ( sigMean0 < 0 )
-         || ( ( ( sigMean0 | sigMean1 ) == 0 ) && ( q & 1 ) ) ) {
-        aSig0 = alternateASig0;
-        aSig1 = alternateASig1;
-    }
-    zSign = ( (int64_t) aSig0 < 0 );
-    if ( zSign ) sub128( 0, 0, aSig0, aSig1, &aSig0, &aSig1 );
-    return normalizeRoundAndPackFloat128(aSign ^ zSign, bExp - 4, aSig0, aSig1,
-                                         status);
-}
 static void __attribute__((constructor)) softfloat_init(void)
 {
     union_float64 ua, ub, uc, ur;
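
The quotient loops above follow a guard-and-correct pattern: per the header
comment on estimateDiv128To64 in softfloat-macros.h (not shown in this
hunk), the estimate lies between the true quotient digit q and q + 2
inclusive, so the code backs off by 2 to guarantee no overshoot and then
restores the digit with a compare/subtract loop.  A standalone sketch of
one digit step (invented names; unsigned __int128 is a GCC/Clang extension
used here only for brevity):

    #include <stdint.h>
    #include <stdio.h>

    /* One quotient-digit step.  The +2 fakes a worst-case overestimate
     * of the true digit, standing in for estimateDiv128To64. */
    static uint64_t div_digit(unsigned __int128 a, uint64_t b,
                              unsigned __int128 *rem)
    {
        uint64_t q = (uint64_t)(a / b) + 2;   /* worst-case estimate */

        q = q > 2 ? q - 2 : 0;                /* back off: never overshoots */
        a -= (unsigned __int128)q * b;
        while (a >= b) {                      /* at most two corrections */
            ++q;
            a -= b;
        }
        *rem = a;
        return q;
    }

    int main(void)
    {
        unsigned __int128 rem;
        uint64_t q = div_digit(((unsigned __int128)1 << 80) + 12345,
                               (uint64_t)1 << 63, &rem);

        printf("q = %llu, rem = %llu\n",
               (unsigned long long)q, (unsigned long long)rem);
        return 0;
    }
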
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 3659c6a4c0..1bc8db4cf1 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -625,6 +625,40 @@ static FloatPartsN *partsN(div)(FloatPartsN *a, FloatPartsN *b,
     return a;
 }
 
+/*
+ * Floating point remainder, per IEC/IEEE, or modulus.
+ */
+static FloatPartsN *partsN(modrem)(FloatPartsN *a, FloatPartsN *b,
+                                   uint64_t *mod_quot, float_status *s)
+{
+    int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
+
+    if (likely(ab_mask == float_cmask_normal)) {
+        frac_modrem(a, b, mod_quot);
+        return a;
+    }
+
+    if (mod_quot) {
+        *mod_quot = 0;
+    }
+
+    /* All the NaN cases */
+    if (unlikely(ab_mask & float_cmask_anynan)) {
+        return parts_pick_nan(a, b, s);
+    }
+
+    /* Inf % N; N % 0 */
+    if (a->cls == float_class_inf || b->cls == float_class_zero) {
+        float_raise(float_flag_invalid, s);
+        parts_default_nan(a, s);
+        return a;
+    }
+
+    /* N % Inf; 0 % N */
+    g_assert(b->cls == float_class_inf || a->cls == float_class_zero);
+    return a;
+}
+
 /*
  * Square Root
  *
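
The special-case ladder in partsN(modrem) matches the IEEE 754 remainder
rules, which the C library's remainder() also implements; for a quick
cross-check (illustration only, not the QEMU code path):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        printf("%g\n", remainder(INFINITY, 2.0));  /* inf rem N -> nan, invalid */
        printf("%g\n", remainder(3.0, 0.0));       /* N rem 0   -> nan, invalid */
        printf("%g\n", remainder(3.0, INFINITY));  /* N rem inf -> 3, exact */
        printf("%g\n", remainder(0.0, 3.0));       /* 0 rem N   -> 0, exact */
        return 0;
    }
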
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index 88eab344df..6754a94afc 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -636,62 +636,6 @@ static int pickNaNMulAdd(FloatClass a_cls, FloatClass b_cls, FloatClass c_cls,
 #endif
 }
 
-/*----------------------------------------------------------------------------
-| Takes two single-precision floating-point values `a' and `b', one of which
-| is a NaN, and returns the appropriate NaN result.  If either `a' or `b' is a
-| signaling NaN, the invalid exception is raised.
-*----------------------------------------------------------------------------*/
-
-static float32 propagateFloat32NaN(float32 a, float32 b, float_status *status)
-{
-    bool aIsLargerSignificand;
-    uint32_t av, bv;
-    FloatClass a_cls, b_cls;
-
-    /* This is not complete, but is good enough for pickNaN.  */
-    a_cls = (!float32_is_any_nan(a)
-             ? float_class_normal
-             : float32_is_signaling_nan(a, status)
-             ? float_class_snan
-             : float_class_qnan);
-    b_cls = (!float32_is_any_nan(b)
-             ? float_class_normal
-             : float32_is_signaling_nan(b, status)
-             ? float_class_snan
-             : float_class_qnan);
-
-    av = float32_val(a);
-    bv = float32_val(b);
-
-    if (is_snan(a_cls) || is_snan(b_cls)) {
-        float_raise(float_flag_invalid, status);
-    }
-
-    if (status->default_nan_mode) {
-        return float32_default_nan(status);
-    }
-
-    if ((uint32_t)(av << 1) < (uint32_t)(bv << 1)) {
-        aIsLargerSignificand = 0;
-    } else if ((uint32_t)(bv << 1) < (uint32_t)(av << 1)) {
-        aIsLargerSignificand = 1;
-    } else {
-        aIsLargerSignificand = (av < bv) ? 1 : 0;
-    }
-
-    if (pickNaN(a_cls, b_cls, aIsLargerSignificand, status)) {
-        if (is_snan(b_cls)) {
-            return float32_silence_nan(b, status);
-        }
-        return b;
-    } else {
-        if (is_snan(a_cls)) {
-            return float32_silence_nan(a, status);
-        }
-        return a;
-    }
-}
-
 /*----------------------------------------------------------------------------
 | Returns 1 if the double-precision floating-point value `a' is a quiet
 | NaN; otherwise returns 0.
@@ -732,62 +676,6 @@ bool float64_is_signaling_nan(float64 a_, float_status *status)
     }
 }
 
-/*----------------------------------------------------------------------------
-| Takes two double-precision floating-point values `a' and `b', one of which
-| is a NaN, and returns the appropriate NaN result.  If either `a' or `b' is a
-| signaling NaN, the invalid exception is raised.
-*----------------------------------------------------------------------------*/
-
-static float64 propagateFloat64NaN(float64 a, float64 b, float_status *status)
-{
-    bool aIsLargerSignificand;
-    uint64_t av, bv;
-    FloatClass a_cls, b_cls;
-
-    /* This is not complete, but is good enough for pickNaN.  */
-    a_cls = (!float64_is_any_nan(a)
-             ? float_class_normal
-             : float64_is_signaling_nan(a, status)
-             ? float_class_snan
-             : float_class_qnan);
-    b_cls = (!float64_is_any_nan(b)
-             ? float_class_normal
-             : float64_is_signaling_nan(b, status)
-             ? float_class_snan
-             : float_class_qnan);
-
-    av = float64_val(a);
-    bv = float64_val(b);
-
-    if (is_snan(a_cls) || is_snan(b_cls)) {
-        float_raise(float_flag_invalid, status);
-    }
-
-    if (status->default_nan_mode) {
-        return float64_default_nan(status);
-    }
-
-    if ((uint64_t)(av << 1) < (uint64_t)(bv << 1)) {
-        aIsLargerSignificand = 0;
-    } else if ((uint64_t)(bv << 1) < (uint64_t)(av << 1)) {
-        aIsLargerSignificand = 1;
-    } else {
-        aIsLargerSignificand = (av < bv) ? 1 : 0;
-    }
-
-    if (pickNaN(a_cls, b_cls, aIsLargerSignificand, status)) {
-        if (is_snan(b_cls)) {
-            return float64_silence_nan(b, status);
-        }
-        return b;
-    } else {
-        if (is_snan(a_cls)) {
-            return float64_silence_nan(a, status);
-        }
-        return a;
-    }
-}
-
 /*----------------------------------------------------------------------------
 | Returns 1 if the extended double-precision floating-point value `a' is a
 | quiet NaN; otherwise returns 0. This slightly differs from the same
@@ -942,56 +830,3 @@ bool float128_is_signaling_nan(float128 a, float_status *status)
         }
     }
 }
-
-/*----------------------------------------------------------------------------
-| Takes two quadruple-precision floating-point values `a' and `b', one of
-| which is a NaN, and returns the appropriate NaN result.  If either `a' or
-| `b' is a signaling NaN, the invalid exception is raised.
-*----------------------------------------------------------------------------*/
-
-static float128 propagateFloat128NaN(float128 a, float128 b,
-                                     float_status *status)
-{
-    bool aIsLargerSignificand;
-    FloatClass a_cls, b_cls;
-
-    /* This is not complete, but is good enough for pickNaN.  */
-    a_cls = (!float128_is_any_nan(a)
-             ? float_class_normal
-             : float128_is_signaling_nan(a, status)
-             ? float_class_snan
-             : float_class_qnan);
-    b_cls = (!float128_is_any_nan(b)
-             ? float_class_normal
-             : float128_is_signaling_nan(b, status)
-             ? float_class_snan
-             : float_class_qnan);
-
-    if (is_snan(a_cls) || is_snan(b_cls)) {
-        float_raise(float_flag_invalid, status);
-    }
-
-    if (status->default_nan_mode) {
-        return float128_default_nan(status);
-    }
-
-    if (lt128(a.high << 1, a.low, b.high << 1, b.low)) {
-        aIsLargerSignificand = 0;
-    } else if (lt128(b.high << 1, b.low, a.high << 1, a.low)) {
-        aIsLargerSignificand = 1;
-    } else {
-        aIsLargerSignificand = (a.high < b.high) ? 1 : 0;
-    }
-
-    if (pickNaN(a_cls, b_cls, aIsLargerSignificand, status)) {
-        if (is_snan(b_cls)) {
-            return float128_silence_nan(b, status);
-        }
-        return b;
-    } else {
-        if (is_snan(a_cls)) {
-            return float128_silence_nan(a, status);
-        }
-        return a;
-    }
-}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 151+ messages in thread

* Re: [PATCH 00/72] Convert floatx80 and float128 to FloatParts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (71 preceding siblings ...)
  2021-05-08  1:48 ` [PATCH 72/72] softfloat: Convert modrem operations to FloatParts Richard Henderson
@ 2021-05-08  2:52 ` no-reply
  2021-05-10 13:36 ` Alex Bennée
                   ` (3 subsequent siblings)
  76 siblings, 0 replies; 151+ messages in thread
From: no-reply @ 2021-05-08  2:52 UTC (permalink / raw)
  To: richard.henderson; +Cc: alex.bennee, qemu-devel, david

Patchew URL: https://patchew.org/QEMU/20210508014802.892561-1-richard.henderson@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210508014802.892561-1-richard.henderson@linaro.org
Subject: [PATCH 00/72] Convert floatx80 and float128 to FloatParts

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/20210508014802.892561-1-richard.henderson@linaro.org -> patchew/20210508014802.892561-1-richard.henderson@linaro.org
Switched to a new branch 'test'
2dcaa72 softfloat: Convert modrem operations to FloatParts
2956f6c softfloat: Move floatN_log2 to softfloat-parts.c.inc
b786038 softfloat: Convert float32_exp2 to FloatParts
01ab7b4 softfloat: Convert floatx80 compare to FloatParts
d55b655 softfloat: Convert floatx80_scalbn to FloatParts
0b1869b softfloat: Convert floatx80 to integer to FloatParts
6287e68 softfloat: Convert floatx80 float conversions to FloatParts
5c28030 softfloat: Convert integer to floatx80 to FloatParts
e686e7d softfloat: Convert floatx80_round_to_int to FloatParts
9e7a606 softfloat: Convert floatx80_round to FloatParts
d19c872 softfloat: Convert floatx80_sqrt to FloatParts
06addff softfloat: Convert floatx80_div to FloatParts
6974166 softfloat: Convert floatx80_mul to FloatParts
0ab65d7 softfloat: Convert floatx80_add/sub to FloatParts
5ae98cd tests/fp/fp-test: Reverse order of floatx80 precision tests
dc5eada softfloat: Adjust parts_uncanon_normal for floatx80
0f4d3ce softfloat: Introduce Floatx80RoundPrec
aecc38b softfloat: Reduce FloatFmt
d8342e2 softfloat: Split out parts_uncanon_normal
e8a3234 softfloat: Move sqrt_float to softfloat-parts.c.inc
db066f3 softfloat: Move scalbn_decomposed to softfloat-parts.c.inc
870c56c softfloat: Move compare_floats to softfloat-parts.c.inc
4895586 softfloat: Move minmax_flags to softfloat-parts.c.inc
c9e01de softfloat: Move uint_to_float to softfloat-parts.c.inc
0b58632 softfloat: Move int_to_float to softfloat-parts.c.inc
e408986 softfloat: Move rount_to_uint_and_pack to softfloat-parts.c.inc
1330f41 softfloat: Move rount_to_int_and_pack to softfloat-parts.c.inc
2d631f0 softfloat: Move round_to_int to softfloat-parts.c.inc
45abdca softfloat: Convert float-to-float conversions with float128
0ddbdfd softfloat: Split float_to_float
296ad69 softfloat: Move div_floats to softfloat-parts.c.inc
d5ff786 softfloat: Introduce sh[lr]_double primitives
d227707 softfloat: Tidy mul128By64To192
5a9c124 softfloat: Use add192 in mul128To256
86d8505 softfloat: Use mulu64 for mul64To128
1186174 softfloat: Move muladd_floats to softfloat-parts.c.inc
5e4de79 softfloat: Move mul_floats to softfloat-parts.c.inc
1450eec softfloat: Implement float128_add/sub via parts
ed27dea softfloat: Move addsub_floats to softfloat-parts.c.inc
3fe4a87 softfloat: Use uadd64_carry, usub64_borrow in softfloat-macros.h
5687e8b softfloat: Move round_canonical to softfloat-parts.c.inc
ddf7181 softfloat: Move sf_canonicalize to softfloat-parts.c.inc
49b211c softfloat: Move pick_nan_muladd to softfloat-parts.c.inc
d6fae5a softfloat: Move pick_nan to softfloat-parts.c.inc
ada8c0f softfloat: Move return_nan to softfloat-parts.c.inc
701d4e6 softfloat: Convert float128_default_nan to parts
735bd6f softfloat: Convert float128_silence_nan to parts
be031db softfloat: Rearrange FloatParts64
997cce0 softfloat: Use pointers with parts_silence_nan
759543e softfloat: Use pointers with ftype_round_pack_canonical
59155e0 softfloat: Use pointers with ftype_unpack_canonical
67f866d softfloat: Use pointers with ftype_pack_raw
ad2e600 softfloat: Use pointers with pack_raw
6725bec softfloat: Use pointers with ftype_unpack_raw
6fa54f0 softfloat: Use pointers with unpack_raw
36916c3 softfloat: Use pointers with parts_default_nan
e255c56 softfloat: Move type-specific pack/unpack routines
17cab05 softfloat: Rename FloatParts to FloatParts64
7973d6d softfloat: Do not produce a default_nan from parts_silence_nan
7267830 target/mips: Set set_default_nan_mode with set_snan_bit_is_one
8acc1a9 softfloat: fix return_nan vs default_nan_mode
5c700cd softfloat: Use return_nan in float_to_float
e6a00e2 softfloat: Add float_cmask and constants
689fe83 softfloat: Tidy a * b + inf return
75285a4 softfloat: Use float_raise in more places
a8ea262 softfloat: Inline float_raise
3be68a5 softfloat: Move the binary point to the msb
e2236e1 tests/fp: add quad support to the benchmark utility
be7703a accel/tcg: Use add/sub overflow routines in tcg-runtime-gvec.c
3731829 qemu/host-utils: Add wrappers for carry builtins
a1367bc qemu/host-utils: Add wrappers for overflow builtins
96efc5c qemu/host-utils: Use __builtin_bitreverseN

=== OUTPUT BEGIN ===
1/72 Checking commit 96efc5c8f4d0 (qemu/host-utils: Use __builtin_bitreverseN)
WARNING: architecture specific defines should be avoided
#24: FILE: include/qemu/host-utils.h:275:
+#if __has_builtin(__builtin_bitreverse8)

WARNING: architecture specific defines should be avoided
#42: FILE: include/qemu/host-utils.h:296:
+#if __has_builtin(__builtin_bitreverse16)

WARNING: architecture specific defines should be avoided
#60: FILE: include/qemu/host-utils.h:319:
+#if __has_builtin(__builtin_bitreverse32)

WARNING: architecture specific defines should be avoided
#78: FILE: include/qemu/host-utils.h:342:
+#if __has_builtin(__builtin_bitreverse64)

total: 0 errors, 4 warnings, 64 lines checked

Patch 1/72 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
2/72 Checking commit a1367bcd9ebb (qemu/host-utils: Add wrappers for overflow builtins)
WARNING: architecture specific defines should be avoided
#34: FILE: include/qemu/host-utils.h:369:
+#if __has_builtin(__builtin_add_overflow) || __GNUC__ >= 5

WARNING: architecture specific defines should be avoided
#52: FILE: include/qemu/host-utils.h:387:
+#if __has_builtin(__builtin_add_overflow) || __GNUC__ >= 5

WARNING: architecture specific defines should be avoided
#70: FILE: include/qemu/host-utils.h:405:
+#if __has_builtin(__builtin_add_overflow) || __GNUC__ >= 5

WARNING: architecture specific defines should be avoided
#88: FILE: include/qemu/host-utils.h:423:
+#if __has_builtin(__builtin_add_overflow) || __GNUC__ >= 5

WARNING: architecture specific defines should be avoided
#107: FILE: include/qemu/host-utils.h:442:
+#if __has_builtin(__builtin_sub_overflow) || __GNUC__ >= 5

WARNING: architecture specific defines should be avoided
#126: FILE: include/qemu/host-utils.h:461:
+#if __has_builtin(__builtin_sub_overflow) || __GNUC__ >= 5

WARNING: architecture specific defines should be avoided
#145: FILE: include/qemu/host-utils.h:480:
+#if __has_builtin(__builtin_sub_overflow) || __GNUC__ >= 5

WARNING: architecture specific defines should be avoided
#164: FILE: include/qemu/host-utils.h:499:
+#if __has_builtin(__builtin_sub_overflow) || __GNUC__ >= 5

WARNING: architecture specific defines should be avoided
#182: FILE: include/qemu/host-utils.h:517:
+#if __has_builtin(__builtin_mul_overflow) || __GNUC__ >= 5

WARNING: architecture specific defines should be avoided
#201: FILE: include/qemu/host-utils.h:536:
+#if __has_builtin(__builtin_mul_overflow) || __GNUC__ >= 5

WARNING: architecture specific defines should be avoided
#221: FILE: include/qemu/host-utils.h:556:
+#if __has_builtin(__builtin_mul_overflow) || __GNUC__ >= 5

WARNING: architecture specific defines should be avoided
#240: FILE: include/qemu/host-utils.h:575:
+#if __has_builtin(__builtin_mul_overflow) || __GNUC__ >= 5

total: 0 errors, 12 warnings, 231 lines checked

Patch 2/72 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
3/72 Checking commit 37318292524c (qemu/host-utils: Add wrappers for carry builtins)
WARNING: architecture specific defines should be avoided
#43: FILE: include/qemu/host-utils.h:595:
+#if __has_builtin(__builtin_addcll)

WARNING: architecture specific defines should be avoided
#68: FILE: include/qemu/host-utils.h:620:
+#if __has_builtin(__builtin_subcll)

total: 0 errors, 2 warnings, 62 lines checked

Patch 3/72 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
4/72 Checking commit be7703ad3d1d (accel/tcg: Use add/sub overflow routines in tcg-runtime-gvec.c)
5/72 Checking commit e2236e18fc30 (tests/fp: add quad support to the benchmark utility)
WARNING: line over 80 characters
#182: FILE: tests/fp/fp-bench.c:458:
+    GEN_BENCH_NO_NEG(bench_ ## name ## _float128, float128, PREC_FLOAT128, op, n)

WARNING: line over 80 characters
#199: FILE: tests/fp/fp-bench.c:521:
+    fprintf(stderr, " -p = floating point precision (single, double, quad[soft only]). "

total: 0 errors, 2 warnings, 185 lines checked

Patch 5/72 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
6/72 Checking commit 3be68a54ecda (softfloat: Move the binary point to the msb)
7/72 Checking commit a8ea262fb400 (softfloat: Inline float_raise)
8/72 Checking commit 75285a4341ef (softfloat: Use float_raise in more places)
9/72 Checking commit 689fe83bcde1 (softfloat: Tidy a * b + inf return)
10/72 Checking commit e6a00e231540 (softfloat: Add float_cmask and constants)
11/72 Checking commit 5c700cd63df0 (softfloat: Use return_nan in float_to_float)
12/72 Checking commit 8acc1a959a15 (softfloat: fix return_nan vs default_nan_mode)
13/72 Checking commit 7267830dd66c (target/mips: Set set_default_nan_mode with set_snan_bit_is_one)
14/72 Checking commit 7973d6db759b (softfloat: Do not produce a default_nan from parts_silence_nan)
15/72 Checking commit 17cab0560556 (softfloat: Rename FloatParts to FloatParts64)
WARNING: line over 80 characters
#235: FILE: fpu/softfloat.c:928:
+static FloatParts64 pick_nan_muladd(FloatParts64 a, FloatParts64 b, FloatParts64 c,

WARNING: line over 80 characters
#390: FILE: fpu/softfloat.c:1347:
+static FloatParts64 muladd_floats(FloatParts64 a, FloatParts64 b, FloatParts64 c,

WARNING: line over 80 characters
#837: FILE: fpu/softfloat.c:3189:
+static FloatRelation compare_floats(FloatParts64 a, FloatParts64 b, bool is_quiet,

WARNING: Block comments use a leading /* on a separate line
#875: FILE: fpu/softfloat.c:3374:
+        /* The largest float type (even though not supported by FloatParts64)

WARNING: line over 80 characters
#926: FILE: fpu/softfloat.c:3425:
+static FloatParts64 sqrt_float(FloatParts64 a, float_status *s, const FloatFmt *p)

total: 0 errors, 5 warnings, 1002 lines checked

Patch 15/72 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
16/72 Checking commit e255c5631e4f (softfloat: Move type-specific pack/unpack routines)
17/72 Checking commit 36916c3edd6a (softfloat: Use pointers with parts_default_nan)
18/72 Checking commit 6fa54f0a9176 (softfloat: Use pointers with unpack_raw)
19/72 Checking commit 6725bec472d1 (softfloat: Use pointers with ftype_unpack_raw)
20/72 Checking commit ad2e6003ba80 (softfloat: Use pointers with pack_raw)
21/72 Checking commit 67f866dc3382 (softfloat: Use pointers with ftype_pack_raw)
22/72 Checking commit 59155e0623d2 (softfloat: Use pointers with ftype_unpack_canonical)
23/72 Checking commit 759543ead10f (softfloat: Use pointers with ftype_round_pack_canonical)
24/72 Checking commit 997cce09e9b5 (softfloat: Use pointers with parts_silence_nan)
25/72 Checking commit be031db06eb0 (softfloat: Rearrange FloatParts64)
26/72 Checking commit 735bd6fba22f (softfloat: Convert float128_silence_nan to parts)
27/72 Checking commit 701d4e63d467 (softfloat: Convert float128_default_nan to parts)
28/72 Checking commit ada8c0fd97c8 (softfloat: Move return_nan to softfloat-parts.c.inc)
Use of uninitialized value $acpi_testexpected in string eq at ./scripts/checkpatch.pl line 1529.
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#18: 
new file mode 100644

total: 0 errors, 1 warnings, 124 lines checked

Patch 28/72 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
29/72 Checking commit d6fae5a986f6 (softfloat: Move pick_nan to softfloat-parts.c.inc)
30/72 Checking commit 49b211c61f33 (softfloat: Move pick_nan_muladd to softfloat-parts.c.inc)
31/72 Checking commit ddf7181b7486 (softfloat: Move sf_canonicalize to softfloat-parts.c.inc)
32/72 Checking commit 5687e8bdf817 (softfloat: Move round_canonical to softfloat-parts.c.inc)
33/72 Checking commit 3fe4a87f6fd6 (softfloat: Use uadd64_carry, usub64_borrow in softfloat-macros.h)
34/72 Checking commit ed27dea74977 (softfloat: Move addsub_floats to softfloat-parts.c.inc)
Use of uninitialized value $acpi_testexpected in string eq at ./scripts/checkpatch.pl line 1529.
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#19: 
new file mode 100644

ERROR: space required after that ',' (ctx:VxV)
#270: FILE: fpu/softfloat.c:957:
+#define partsN(NAME)   glue(glue(glue(parts,N),_),NAME)
                                            ^

ERROR: space required after that ',' (ctx:VxV)
#270: FILE: fpu/softfloat.c:957:
+#define partsN(NAME)   glue(glue(glue(parts,N),_),NAME)
                                               ^

ERROR: space required after that ',' (ctx:VxV)
#270: FILE: fpu/softfloat.c:957:
+#define partsN(NAME)   glue(glue(glue(parts,N),_),NAME)
                                                  ^

ERROR: space required after that ',' (ctx:VxV)
#271: FILE: fpu/softfloat.c:958:
+#define FloatPartsN    glue(FloatParts,N)
                                       ^

total: 4 errors, 1 warnings, 489 lines checked

Patch 34/72 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

35/72 Checking commit 1450eec1d1a9 (softfloat: Implement float128_add/sub via parts)
36/72 Checking commit 5e4de79a96ed (softfloat: Move mul_floats to softfloat-parts.c.inc)
ERROR: space required after that ',' (ctx:VxV)
#151: FILE: fpu/softfloat.c:1003:
+#define FloatPartsW    glue(FloatParts,W)
                                       ^

total: 1 errors, 0 warnings, 350 lines checked

Patch 36/72 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

37/72 Checking commit 11861744634d (softfloat: Move muladd_floats to softfloat-parts.c.inc)
38/72 Checking commit 86d850509935 (softfloat: Use mulu64 for mul64To128)
39/72 Checking commit 5a9c12452902 (softfloat: Use add192 in mul128To256)
ERROR: space prohibited after that open parenthesis '('
#61: FILE: include/fpu/softfloat-macros.h:527:
+    add192( 0, m1, m2,  0, n1, n2, &m0, &m1, &m2);

total: 1 errors, 0 warnings, 46 lines checked

Patch 39/72 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

40/72 Checking commit d22770700ad7 (softfloat: Tidy mul128By64To192)
41/72 Checking commit d5ff7869a46b (softfloat: Introduce sh[lr]_double primitives)
WARNING: architecture specific defines should be avoided
#203: FILE: include/fpu/softfloat-macros.h:98:
+#if defined(__x86_64__)

WARNING: architecture specific defines should be avoided
#221: FILE: include/fpu/softfloat-macros.h:116:
+#if defined(__x86_64__)

total: 0 errors, 2 warnings, 199 lines checked

Patch 41/72 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
42/72 Checking commit 296ad691084d (softfloat: Move div_floats to softfloat-parts.c.inc)
43/72 Checking commit 0ddbdfdd1a7c (softfloat: Split float_to_float)
44/72 Checking commit 45abdcae4504 (softfloat: Convert float-to-float conversions with float128)
45/72 Checking commit 2d631f07b19f (softfloat: Move round_to_int to softfloat-parts.c.inc)
46/72 Checking commit 1330f41e01bb (softfloat: Move rount_to_int_and_pack to softfloat-parts.c.inc)
47/72 Checking commit e408986129be (softfloat: Move rount_to_uint_and_pack to softfloat-parts.c.inc)
48/72 Checking commit 0b586325bf5a (softfloat: Move int_to_float to softfloat-parts.c.inc)
49/72 Checking commit c9e01de1b349 (softfloat: Move uint_to_float to softfloat-parts.c.inc)
50/72 Checking commit 4895586cb8dc (softfloat: Move minmax_flags to softfloat-parts.c.inc)
51/72 Checking commit 870c56c4e127 (softfloat: Move compare_floats to softfloat-parts.c.inc)
52/72 Checking commit db066f333726 (softfloat: Move scalbn_decomposed to softfloat-parts.c.inc)
53/72 Checking commit e8a323470fbf (softfloat: Move sqrt_float to softfloat-parts.c.inc)
54/72 Checking commit d8342e21ffda (softfloat: Split out parts_uncanon_normal)
55/72 Checking commit aecc38becfb6 (softfloat: Reduce FloatFmt)
56/72 Checking commit 0f4d3ce1b908 (softfloat: Introduce Floatx80RoundPrec)
57/72 Checking commit dc5eada61c8f (softfloat: Adjust parts_uncanon_normal for floatx80)
58/72 Checking commit 5ae98cd1dd64 (tests/fp/fp-test: Reverse order of floatx80 precision tests)
59/72 Checking commit 0ab65d7f28d9 (softfloat: Convert floatx80_add/sub to FloatParts)
60/72 Checking commit 6974166c3993 (softfloat: Convert floatx80_mul to FloatParts)
61/72 Checking commit 06addffabfe9 (softfloat: Convert floatx80_div to FloatParts)
62/72 Checking commit d19c872604c3 (softfloat: Convert floatx80_sqrt to FloatParts)
63/72 Checking commit 9e7a60645c6d (softfloat: Convert floatx80_round to FloatParts)
64/72 Checking commit e686e7da3302 (softfloat: Convert floatx80_round_to_int to FloatParts)
65/72 Checking commit 5c28030e4ea5 (softfloat: Convert integer to floatx80 to FloatParts)
66/72 Checking commit 6287e685faf8 (softfloat: Convert floatx80 float conversions to FloatParts)
67/72 Checking commit 0b1869b8f8b3 (softfloat: Convert floatx80 to integer to FloatParts)
68/72 Checking commit d55b65538352 (softfloat: Convert floatx80_scalbn to FloatParts)
69/72 Checking commit 01ab7b4aed96 (softfloat: Convert floatx80 compare to FloatParts)
70/72 Checking commit b78603823353 (softfloat: Convert float32_exp2 to FloatParts)
71/72 Checking commit 2956f6c7c5d3 (softfloat: Move floatN_log2 to softfloat-parts.c.inc)
Use of uninitialized value $acpi_testexpected in string eq at ./scripts/checkpatch.pl line 1529.
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#325: 
new file mode 100644

ERROR: trailing whitespace
#376: FILE: tests/fp/fp-test-log2.c:47:
+   $

total: 1 errors, 1 warnings, 410 lines checked

Patch 71/72 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

72/72 Checking commit 2dcaa72f9b5e (softfloat: Convert modrem operations to FloatParts)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20210508014802.892561-1-richard.henderson@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 07/72] softfloat: Inline float_raise
  2021-05-08  1:46 ` [PATCH 07/72] softfloat: Inline float_raise Richard Henderson
@ 2021-05-09  8:32   ` Philippe Mathieu-Daudé
  2021-05-11 10:04   ` David Hildenbrand
  1 sibling, 0 replies; 151+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-05-09  8:32 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee, david

On 5/8/21 3:46 AM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/fpu/softfloat.h        |  5 ++++-
>  fpu/softfloat-specialize.c.inc | 12 ------------
>  2 files changed, 4 insertions(+), 13 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 08/72] softfloat: Use float_raise in more places
  2021-05-08  1:46 ` [PATCH 08/72] softfloat: Use float_raise in more places Richard Henderson
@ 2021-05-09  8:34   ` Philippe Mathieu-Daudé
  2021-05-11 10:06   ` David Hildenbrand
  1 sibling, 0 replies; 151+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-05-09  8:34 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee, david

On 5/8/21 3:46 AM, Richard Henderson wrote:
> We have been somewhat inconsistent about when to use
> float_raise and when to OR in the bit by hand.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  fpu/softfloat.c | 87 +++++++++++++++++++++++++------------------------
>  1 file changed, 44 insertions(+), 43 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 04/72] accel/tcg: Use add/sub overflow routines in tcg-runtime-gvec.c
  2021-05-08  1:46 ` [PATCH 04/72] accel/tcg: Use add/sub overflow routines in tcg-runtime-gvec.c Richard Henderson
@ 2021-05-09  8:43   ` Philippe Mathieu-Daudé
  2021-05-10 13:02   ` Alex Bennée
  2021-05-11  9:46   ` David Hildenbrand
  2 siblings, 0 replies; 151+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-05-09  8:43 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee, david

On 5/8/21 3:46 AM, Richard Henderson wrote:
> Obvious uses of the new functions.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  accel/tcg/tcg-runtime-gvec.c | 36 ++++++++++++++++--------------------
>  1 file changed, 16 insertions(+), 20 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 15/72] softfloat: Rename FloatParts to FloatParts64
  2021-05-08  1:47 ` [PATCH 15/72] softfloat: Rename FloatParts to FloatParts64 Richard Henderson
@ 2021-05-09  8:45   ` Philippe Mathieu-Daudé
  2021-05-11 10:16   ` David Hildenbrand
  1 sibling, 0 replies; 151+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-05-09  8:45 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee, david

On 5/8/21 3:47 AM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  fpu/softfloat.c                | 362 ++++++++++++++++-----------------
>  fpu/softfloat-specialize.c.inc |   6 +-
>  2 files changed, 184 insertions(+), 184 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 16/72] softfloat: Move type-specific pack/unpack routines
  2021-05-08  1:47 ` [PATCH 16/72] softfloat: Move type-specific pack/unpack routines Richard Henderson
@ 2021-05-09  8:46   ` Philippe Mathieu-Daudé
  2021-05-11 10:17   ` David Hildenbrand
  1 sibling, 0 replies; 151+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-05-09  8:46 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee, david

On 5/8/21 3:47 AM, Richard Henderson wrote:
> In preparation for moving sf_canonicalize.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  fpu/softfloat.c | 109 +++++++++++++++++++++++++-----------------------
>  1 file changed, 56 insertions(+), 53 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 18/72] softfloat: Use pointers with unpack_raw
  2021-05-08  1:47 ` [PATCH 18/72] softfloat: Use pointers with unpack_raw Richard Henderson
@ 2021-05-09  8:48   ` Philippe Mathieu-Daudé
  2021-05-12 12:58   ` Alex Bennée
  1 sibling, 0 replies; 151+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-05-09  8:48 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee, david

On 5/8/21 3:47 AM, Richard Henderson wrote:
> At the same time, rename to unpack_raw64.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  fpu/softfloat.c | 29 +++++++++++++++++++----------
>  1 file changed, 19 insertions(+), 10 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 01/72] qemu/host-utils: Use __builtin_bitreverseN
  2021-05-08  1:46 ` [PATCH 01/72] qemu/host-utils: Use __builtin_bitreverseN Richard Henderson
@ 2021-05-10  9:59   ` Alex Bennée
  2021-05-11  9:41   ` David Hildenbrand
  1 sibling, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-10  9:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Clang has added some builtins for these operations;
> use them if available.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 02/72] qemu/host-utils: Add wrappers for overflow builtins
  2021-05-08  1:46 ` [PATCH 02/72] qemu/host-utils: Add wrappers for overflow builtins Richard Henderson
@ 2021-05-10 10:22   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-10 10:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> These builtins came in with gcc 5 and clang 3.8, which are
> slightly newer than our supported minimum compiler versions.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/qemu/host-utils.h | 225 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 225 insertions(+)
>
> diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
> index f1e52851e0..fd76f0cbd3 100644
> --- a/include/qemu/host-utils.h
> +++ b/include/qemu/host-utils.h
> @@ -356,6 +356,231 @@ static inline uint64_t revbit64(uint64_t x)
>  #endif
>  }
>  
> +/**
> + * sadd32_overflow - addition with overflow indication
> + * @x, @y: addends
> + * @ret: Output for sum
> + *
> + * Computes *@ret = @x + @y, and returns true if and only if that
> + * value has been truncated.
> + */
> +static inline bool sadd32_overflow(int32_t x, int32_t y, int32_t *ret)
> +{
> +#if __has_builtin(__builtin_add_overflow) || __GNUC__ >= 5

I was wondering if having multiple compiler tests means we should move
this test into compiler.h, but it doesn't seem we've done that in any
other case except libvixl's GCC_VERSION_OR_NEWER helper.
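
For the record, the fallback leg for the 32-bit signed case can be as
simple as this sketch (assuming the usual widening idiom; the actual
patch may instead use the sign-bit trick being removed from
tcg-runtime-gvec.c elsewhere in the series):

static inline bool sadd32_overflow_sketch(int32_t x, int32_t y, int32_t *ret)
{
    /* Compute in 64 bits, then check whether truncation changed it. */
    int64_t z = (int64_t)x + y;
    *ret = (int32_t)z;
    return z != *ret;
}

On a true return, callers saturate from the sign of the truncated
result, exactly as the gvec patch does with INT32_MAX/INT32_MIN.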

> +/**
> + * smul64_overflow - multiplication with overflow indication
> + * @x, @y: Input multipliers
> + * @ret: Output for product
> + *
> + * Computes *@ret = @x * @y, and returns true if and only if that
> + * value has been truncated.
> + */
> +static inline bool smul64_overflow(int64_t x, int64_t y, int64_t *ret)
> +{
> +#if __has_builtin(__builtin_mul_overflow) || __GNUC__ >= 5
> +    return __builtin_mul_overflow(x, y, ret);
> +#else
> +    uint64_t hi, lo;
> +    muls64(&lo, &hi, x, y);
> +    *ret = lo;
> +    return hi != ((int64_t)lo >> 63);
> +#endif
> +}
> +
> +/**
> + * umul32_overflow - multiplication with overflow indication
> + * @x, @y: Input multipliers
> + * @ret: Output for product
> + *
> + * Computes *@ret = @x * @y, and returns true if and only if that
> + * value has been truncated.
> + */
> +static inline bool umul32_overflow(uint32_t x, uint32_t y, uint32_t *ret)
> +{
> +#if __has_builtin(__builtin_mul_overflow) || __GNUC__ >= 5
> +    return __builtin_mul_overflow(x, y, ret);
> +#else
> +    uint64_t z = (uint64_t)x * y;
> +    *ret = z;
> +    return z > UINT32_MAX;
> +#endif
> +}
> +
> +/**
> + * smul64_overflow - multiplication with overflow indication

c&p error.

> + * @x, @y: Input multipliers
> + * @ret: Output for product
> + *
> + * Computes *@ret = @x * @y, and returns true if and only if that
> + * value has been truncated.
> + */
> +static inline bool umul64_overflow(uint64_t x, uint64_t y, uint64_t *ret)
> +{
> +#if __has_builtin(__builtin_mul_overflow) || __GNUC__ >= 5
> +    return __builtin_mul_overflow(x, y, ret);
> +#else
> +    uint64_t hi;
> +    mulu64(ret, &hi, x, y);
> +    return hi != 0;
> +#endif
> +}
> +
>  /* Host type specific sizes of these routines.  */
>  
>  #if ULONG_MAX == UINT32_MAX

Otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 03/72] qemu/host-utils: Add wrappers for carry builtins
  2021-05-08  1:46 ` [PATCH 03/72] qemu/host-utils: Add wrappers for carry builtins Richard Henderson
@ 2021-05-10 12:57   ` Alex Bennée
  2021-05-11 20:10     ` Richard Henderson
  0 siblings, 1 reply; 151+ messages in thread
From: Alex Bennée @ 2021-05-10 12:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> These builtins came in with clang 3.8, but are not present in gcc
> through version 11.  Even in clang the optimization is not ideal
> except for x86_64, but no worse than the hand-coding that we
> currently do.

Given this statement....

<snip>
> +/**
> + * uadd64_carry - addition with carry-in and carry-out
> + * @x, @y: addends
> + * @pcarry: in-out carry value
> + *
> + * Computes @x + @y + *@pcarry, placing the carry-out back
> + * into *@pcarry and returning the 64-bit sum.
> + */
> +static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
> +{
> +#if __has_builtin(__builtin_addcll)
> +    unsigned long long c = *pcarry;
> +    x = __builtin_addcll(x, y, c, &c);

what happens when unsigned long long isn't the same as uint64_t? Doesn't
C99 only specify a minimum?

> +    *pcarry = c & 1;

Why do we need to clamp it here? Shouldn't the compiler automatically do
that due to the bool?

> +    return x;
> +#else
> +    bool c = *pcarry;
> +    /* This is clang's internal expansion of __builtin_addc. */
> +    c = uadd64_overflow(x, c, &x);
> +    c |= uadd64_overflow(x, y, &x);
> +    *pcarry = c;
> +    return x;
> +#endif

Either way, if you aren't super happy with the compiler's builtin and
you get equivalent code with the unambiguous hand-coded version, then
what is the point of having a builtin leg?
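
As a sanity check on the intended use, here is a sketch of carrying
across a two-word addition, built from the fallback expansion quoted
above plus the uadd64_overflow wrapper from patch 2 (names here are
illustrative):

static inline uint64_t uadd64_carry_sketch(uint64_t x, uint64_t y, bool *pcarry)
{
    bool c = *pcarry;
    c = uadd64_overflow(x, c, &x);
    c |= uadd64_overflow(x, y, &x);
    *pcarry = c;
    return x;
}

/* 128-bit a + b: low word first, carry propagating into the high word. */
static void add128_sketch(uint64_t al, uint64_t ah, uint64_t bl, uint64_t bh,
                          uint64_t *rl, uint64_t *rh)
{
    bool carry = false;
    *rl = uadd64_carry_sketch(al, bl, &carry);
    *rh = uadd64_carry_sketch(ah, bh, &carry);
}

which is exactly the shape the add128/add192 helpers in
softfloat-macros.h want.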

> +}
> +
> +/**
> + * usub64_borrow - subtraction with borrow-in and borrow-out
> + * @x, @y: addends
> + * @pborrow: in-out borrow value
> + *
> + * Computes @x - @y - *@pborrow, placing the borrow-out back
> + * into *@pborrow and returning the 64-bit sum.
> + */
> +static inline uint64_t usub64_borrow(uint64_t x, uint64_t y, bool *pborrow)
> +{
> +#if __has_builtin(__builtin_subcll)
> +    unsigned long long b = *pborrow;
> +    x = __builtin_subcll(x, y, b, &b);
> +    *pborrow = b & 1;
> +    return x;
> +#else
> +    bool b = *pborrow;
> +    b = usub64_overflow(x, b, &x);
> +    b |= usub64_overflow(x, y, &x);
> +    *pborrow = b;
> +    return x;
> +#endif
> +}
> +
>  /* Host type specific sizes of these routines.  */
>  
>  #if ULONG_MAX == UINT32_MAX


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 04/72] accel/tcg: Use add/sub overflow routines in tcg-runtime-gvec.c
  2021-05-08  1:46 ` [PATCH 04/72] accel/tcg: Use add/sub overflow routines in tcg-runtime-gvec.c Richard Henderson
  2021-05-09  8:43   ` Philippe Mathieu-Daudé
@ 2021-05-10 13:02   ` Alex Bennée
  2021-05-11  9:46   ` David Hildenbrand
  2 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-10 13:02 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Obvious uses of the new functions.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 06/72] softfloat: Move the binary point to the msb
  2021-05-08  1:46 ` [PATCH 06/72] softfloat: Move the binary point to the msb Richard Henderson
@ 2021-05-10 13:36   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-10 13:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Rather than point the binary point at msb-1, put it at the msb.
> Use uadd64_overflow to detect when addition overflows instead
> of DECOMPOSED_OVERFLOW_BIT.
>
> This reduces the number of special cases within the code, such
> as shifting an int64_t either left or right during conversion.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
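
A sketch of what the invariant buys (assuming a normalized 64-bit
fraction now keeps its integer bit at bit 63, i.e. represents a value
in [1.0, 2.0) scaled by 2^63, and that b has already been aligned to
a's exponent):

static void frac_add_sketch(uint64_t a, uint64_t b, uint64_t *frac, int *exp)
{
    if (uadd64_overflow(a, b, frac)) {
        /*
         * The sum reached [2.0, 4.0): shift right and bump the
         * exponent.  Real code would keep the shifted-out bit for
         * rounding; this sketch drops it.
         */
        *frac = (*frac >> 1) | 0x8000000000000000ull;
        *exp += 1;
    }
}

The carry-out itself is the renormalization condition, so no separate
DECOMPOSED_OVERFLOW_BIT is needed.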

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 00/72] Convert floatx80 and float128 to FloatParts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (72 preceding siblings ...)
  2021-05-08  2:52 ` [PATCH 00/72] Convert floatx80 and float128 " no-reply
@ 2021-05-10 13:36 ` Alex Bennée
  2021-05-12  1:52   ` Richard Henderson
  2021-05-12 16:47 ` Alex Bennée
                   ` (2 subsequent siblings)
  76 siblings, 1 reply; 151+ messages in thread
From: Alex Bennée @ 2021-05-10 13:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Reorg everything using QEMU_GENERIC and multiple inclusion to
> reduce the amount of code duplication between the formats.
>
> The use of QEMU_GENERIC means that we need to use pointers instead
> of structures, which means that even the smaller float formats
> need rearranging.
>
> I've carried it through to completion within fpu/, so that we don't
> have (much) of the legacy code remaining.  There is some floatx80
> stuff in target/m68k and target/i386 that's still hanging around.

FWIW I could enable a few more tests, although extF80_lt_quiet still
has some failures on equality tests:

./tests/fp/fp-test -l 1 -r all extF80_lt_quiet
>> Testing extF80_lt_quiet
46464 tests total.
Errors found in extF80_lt_quiet:
+0000.0000000000000000  +0000.0000000000000000  => 1 .....  expected 0 .....
+0000.0000000000000000  -0000.0000000000000000  => 1 .....  expected 0 .....
+0000.0000000000000001  +0000.0000000000000001  => 1 .....  expected 0 .....
+0000.7FFFFFFFFFFFFFFF  +0000.7FFFFFFFFFFFFFFF  => 1 .....  expected 0 .....
+0000.7FFFFFFFFFFFFFFE  +0000.7FFFFFFFFFFFFFFE  => 1 .....  expected 0 .....
+0001.8000000000000000  +0001.8000000000000000  => 1 .....  expected 0 .....
+0001.8000000000000001  +0001.8000000000000001  => 1 .....  expected 0 .....
+0001.FFFFFFFFFFFFFFFF  +0001.FFFFFFFFFFFFFFFF  => 1 .....  expected 0 .....
+0001.FFFFFFFFFFFFFFFE  +0001.FFFFFFFFFFFFFFFE  => 1 .....  expected 0 .....
+3FBF.8000000000000000  +3FBF.8000000000000000  => 1 .....  expected 0 .....
+3FBF.8000000000000001  +3FBF.8000000000000001  => 1 .....  expected 0 .....
+3FBF.FFFFFFFFFFFFFFFF  +3FBF.FFFFFFFFFFFFFFFF  => 1 .....  expected 0 .....
+3FBF.FFFFFFFFFFFFFFFE  +3FBF.FFFFFFFFFFFFFFFE  => 1 .....  expected 0 .....
+3FFD.8000000000000000  +3FFD.8000000000000000  => 1 .....  expected 0 .....
+3FFD.8000000000000001  +3FFD.8000000000000001  => 1 .....  expected 0 .....
+3FFD.FFFFFFFFFFFFFFFF  +3FFD.FFFFFFFFFFFFFFFF  => 1 .....  expected 0 .....
+3FFD.FFFFFFFFFFFFFFFE  +3FFD.FFFFFFFFFFFFFFFE  => 1 .....  expected 0 .....
+3FFE.8000000000000000  +3FFE.8000000000000000  => 1 .....  expected 0 .....
+3FFE.8000000000000001  +3FFE.8000000000000001  => 1 .....  expected 0 .....
+3FFE.FFFFFFFFFFFFFFFF  +3FFE.FFFFFFFFFFFFFFFF  => 1 .....  expected 0 .....
9618 tests performed; 20 errors found.

However the rest can be enabled:

tests/fp: enable more tests

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

1 file changed, 6 insertions(+), 6 deletions(-)
tests/fp/meson.build | 12 ++++++------

modified   tests/fp/meson.build
@@ -556,7 +556,9 @@ softfloat_conv_tests = {
                       'extF80_to_f64 extF80_to_f128 ' +
                       'f128_to_f16',
     'int-to-float': 'i32_to_f16 i64_to_f16 i32_to_f32 i64_to_f32 ' +
-                    'i32_to_f64 i64_to_f64 i32_to_f128 i64_to_f128',
+                    'i32_to_f64 i64_to_f64 ' +
+                    'i32_to_extF80 i64_to_extF80 ' +
+                    'i32_to_f128 i64_to_f128',
     'uint-to-float': 'ui32_to_f16 ui64_to_f16 ui32_to_f32 ui64_to_f32 ' +
                      'ui32_to_f64 ui64_to_f64 ui64_to_f128 ' +
                      'ui32_to_extF80 ui64_to_extF80',
@@ -581,7 +583,7 @@ softfloat_conv_tests = {
                      'extF80_to_ui64 extF80_to_ui64_r_minMag ' +
                      'f128_to_ui64 f128_to_ui64_r_minMag',
     'round-to-integer': 'f16_roundToInt f32_roundToInt ' +
-                        'f64_roundToInt f128_roundToInt'
+                        'f64_roundToInt extF80_roundToInt f128_roundToInt'
 }
 softfloat_tests = {
     'eq_signaling' : 'compare',
@@ -602,18 +604,16 @@ fptest_args = ['-s', '-l', '1']
 fptest_rounding_args = ['-r', 'all']
 
 # Conversion Routines:
-# FIXME: i32_to_extF80 (broken), i64_to_extF80 (broken)
-#        extF80_roundToInt (broken)
 foreach k, v : softfloat_conv_tests
   test('fp-test-' + k, fptest,
        args: fptest_args + fptest_rounding_args + v.split(),
        suite: ['softfloat', 'softfloat-conv'])
 endforeach
 
-# FIXME: extF80_{lt_quiet, rem} (broken),
+# FIXME: extF80_{lt_quiet} (broken),
 #        extF80_{mulAdd} (missing)
 foreach k, v : softfloat_tests
-  extF80_broken = ['lt_quiet', 'rem'].contains(k)
+  extF80_broken = ['lt_quiet'].contains(k)
   test('fp-test-' + k, fptest,
        args: fptest_args + fptest_rounding_args +
              ['f16_' + k, 'f32_' + k, 'f64_' + k, 'f128_' + k] +
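
With that applied, the newly enabled operations can be exercised
directly, e.g. (mirroring the fptest_args + fptest_rounding_args
invocation above):

  ./tests/fp/fp-test -s -l 1 -r all i32_to_extF80 i64_to_extF80 extF80_roundToInt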

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 11/72] softfloat: Use return_nan in float_to_float
  2021-05-08  1:47 ` [PATCH 11/72] softfloat: Use return_nan in float_to_float Richard Henderson
@ 2021-05-10 15:10   ` Alex Bennée
  2021-05-11 10:10   ` David Hildenbrand
  1 sibling, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-10 15:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 12/72] softfloat: fix return_nan vs default_nan_mode
  2021-05-08  1:47 ` [PATCH 12/72] softfloat: fix return_nan vs default_nan_mode Richard Henderson
@ 2021-05-10 15:12   ` Alex Bennée
  2021-05-11 10:12   ` David Hildenbrand
  1 sibling, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-10 15:12 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Do not call parts_silence_nan when default_nan_mode is in
> effect.  This will avoid an assert in a later patch.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
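
A sketch of the guarded shape implied by the description, using the
helpers visible elsewhere in the series (not the verbatim patch):

static FloatParts64 *return_nan_sketch(FloatParts64 *a, float_status *s)
{
    if (is_snan(a->cls)) {
        float_raise(float_flag_invalid, s);
        if (!s->default_nan_mode) {
            parts_silence_nan(a, s);
            return a;
        }
    }
    if (s->default_nan_mode) {
        parts_default_nan(a, s);
    }
    return a;
}

i.e. parts_silence_nan is no longer reachable once default_nan_mode is
set, which is what lets the later assert hold.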

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 13/72] target/mips: Set set_default_nan_mode with set_snan_bit_is_one
  2021-05-08  1:47 ` [PATCH 13/72] target/mips: Set set_default_nan_mode with set_snan_bit_is_one Richard Henderson
@ 2021-05-11  9:37   ` Alex Bennée
  2021-05-11 10:16   ` David Hildenbrand
  1 sibling, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-11  9:37 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> This behavior is currently hard-coded in parts_silence_nan,
> but setting this bit properly will allow this to be cleaned up.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 01/72] qemu/host-utils: Use __builtin_bitreverseN
  2021-05-08  1:46 ` [PATCH 01/72] qemu/host-utils: Use __builtin_bitreverseN Richard Henderson
  2021-05-10  9:59   ` Alex Bennée
@ 2021-05-11  9:41   ` David Hildenbrand
  1 sibling, 0 replies; 151+ messages in thread
From: David Hildenbrand @ 2021-05-11  9:41 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 08.05.21 03:46, Richard Henderson wrote:
> Clang has added some builtins for these operations;
> use them if available.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   include/qemu/host-utils.h | 16 ++++++++++++++++
>   1 file changed, 16 insertions(+)
> 
> diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
> index cdca2991d8..f1e52851e0 100644
> --- a/include/qemu/host-utils.h
> +++ b/include/qemu/host-utils.h
> @@ -272,6 +272,9 @@ static inline int ctpop64(uint64_t val)
>    */
>   static inline uint8_t revbit8(uint8_t x)
>   {
> +#if __has_builtin(__builtin_bitreverse8)
> +    return __builtin_bitreverse8(x);
> +#else
>       /* Assign the correct nibble position.  */
>       x = ((x & 0xf0) >> 4)
>         | ((x & 0x0f) << 4);
> @@ -281,6 +284,7 @@ static inline uint8_t revbit8(uint8_t x)
>         | ((x & 0x22) << 1)
>         | ((x & 0x11) << 3);
>       return x;
> +#endif
>   }
>   
>   /**
> @@ -289,6 +293,9 @@ static inline uint8_t revbit8(uint8_t x)
>    */
>   static inline uint16_t revbit16(uint16_t x)
>   {
> +#if __has_builtin(__builtin_bitreverse16)
> +    return __builtin_bitreverse16(x);
> +#else
>       /* Assign the correct byte position.  */
>       x = bswap16(x);
>       /* Assign the correct nibble position.  */
> @@ -300,6 +307,7 @@ static inline uint16_t revbit16(uint16_t x)
>         | ((x & 0x2222) << 1)
>         | ((x & 0x1111) << 3);
>       return x;
> +#endif
>   }
>   
>   /**
> @@ -308,6 +316,9 @@ static inline uint16_t revbit16(uint16_t x)
>    */
>   static inline uint32_t revbit32(uint32_t x)
>   {
> +#if __has_builtin(__builtin_bitreverse32)
> +    return __builtin_bitreverse32(x);
> +#else
>       /* Assign the correct byte position.  */
>       x = bswap32(x);
>       /* Assign the correct nibble position.  */
> @@ -319,6 +330,7 @@ static inline uint32_t revbit32(uint32_t x)
>         | ((x & 0x22222222u) << 1)
>         | ((x & 0x11111111u) << 3);
>       return x;
> +#endif
>   }
>   
>   /**
> @@ -327,6 +339,9 @@ static inline uint32_t revbit32(uint32_t x)
>    */
>   static inline uint64_t revbit64(uint64_t x)
>   {
> +#if __has_builtin(__builtin_bitreverse64)
> +    return __builtin_bitreverse64(x);
> +#else
>       /* Assign the correct byte position.  */
>       x = bswap64(x);
>       /* Assign the correct nibble position.  */
> @@ -338,6 +353,7 @@ static inline uint64_t revbit64(uint64_t x)
>         | ((x & 0x2222222222222222ull) << 1)
>         | ((x & 0x1111111111111111ull) << 3);
>       return x;
> +#endif
>   }
>   
>   /* Host type specific sizes of these routines.  */
> 
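
Quick identity checks that the builtin and fallback legs must agree on
(a sketch, assuming the revbit helpers above are in scope):

#include <assert.h>

int main(void)
{
    assert(revbit8(0x01) == 0x80);
    assert(revbit8(0xa5) == 0xa5);      /* 10100101 is a bit palindrome */
    assert(revbit16(0x0001) == 0x8000);
    assert(revbit32(0x80000000u) == 1);
    return 0;
}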

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 04/72] accel/tcg: Use add/sub overflow routines in tcg-runtime-gvec.c
  2021-05-08  1:46 ` [PATCH 04/72] accel/tcg: Use add/sub overflow routines in tcg-runtime-gvec.c Richard Henderson
  2021-05-09  8:43   ` Philippe Mathieu-Daudé
  2021-05-10 13:02   ` Alex Bennée
@ 2021-05-11  9:46   ` David Hildenbrand
  2 siblings, 0 replies; 151+ messages in thread
From: David Hildenbrand @ 2021-05-11  9:46 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 08.05.21 03:46, Richard Henderson wrote:
> Obvious uses of the new functions.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   accel/tcg/tcg-runtime-gvec.c | 36 ++++++++++++++++--------------------
>   1 file changed, 16 insertions(+), 20 deletions(-)
> 
> diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
> index 521da4a813..ac7d28c251 100644
> --- a/accel/tcg/tcg-runtime-gvec.c
> +++ b/accel/tcg/tcg-runtime-gvec.c
> @@ -1073,9 +1073,8 @@ void HELPER(gvec_ssadd32)(void *d, void *a, void *b, uint32_t desc)
>       for (i = 0; i < oprsz; i += sizeof(int32_t)) {
>           int32_t ai = *(int32_t *)(a + i);
>           int32_t bi = *(int32_t *)(b + i);
> -        int32_t di = ai + bi;
> -        if (((di ^ ai) &~ (ai ^ bi)) < 0) {
> -            /* Signed overflow.  */
> +        int32_t di;
> +        if (sadd32_overflow(ai, bi, &di)) {
>               di = (di < 0 ? INT32_MAX : INT32_MIN);
>           }
>           *(int32_t *)(d + i) = di;
> @@ -1091,9 +1090,8 @@ void HELPER(gvec_ssadd64)(void *d, void *a, void *b, uint32_t desc)
>       for (i = 0; i < oprsz; i += sizeof(int64_t)) {
>           int64_t ai = *(int64_t *)(a + i);
>           int64_t bi = *(int64_t *)(b + i);
> -        int64_t di = ai + bi;
> -        if (((di ^ ai) &~ (ai ^ bi)) < 0) {
> -            /* Signed overflow.  */
> +        int64_t di;
> +        if (sadd64_overflow(ai, bi, &di)) {
>               di = (di < 0 ? INT64_MAX : INT64_MIN);
>           }
>           *(int64_t *)(d + i) = di;
> @@ -1143,9 +1141,8 @@ void HELPER(gvec_sssub32)(void *d, void *a, void *b, uint32_t desc)
>       for (i = 0; i < oprsz; i += sizeof(int32_t)) {
>           int32_t ai = *(int32_t *)(a + i);
>           int32_t bi = *(int32_t *)(b + i);
> -        int32_t di = ai - bi;
> -        if (((di ^ ai) & (ai ^ bi)) < 0) {
> -            /* Signed overflow.  */
> +        int32_t di;
> +        if (ssub32_overflow(ai, bi, &di)) {
>               di = (di < 0 ? INT32_MAX : INT32_MIN);
>           }
>           *(int32_t *)(d + i) = di;
> @@ -1161,9 +1158,8 @@ void HELPER(gvec_sssub64)(void *d, void *a, void *b, uint32_t desc)
>       for (i = 0; i < oprsz; i += sizeof(int64_t)) {
>           int64_t ai = *(int64_t *)(a + i);
>           int64_t bi = *(int64_t *)(b + i);
> -        int64_t di = ai - bi;
> -        if (((di ^ ai) & (ai ^ bi)) < 0) {
> -            /* Signed overflow.  */
> +        int64_t di;
> +        if (ssub64_overflow(ai, bi, &di)) {
>               di = (di < 0 ? INT64_MAX : INT64_MIN);
>           }
>           *(int64_t *)(d + i) = di;
> @@ -1209,8 +1205,8 @@ void HELPER(gvec_usadd32)(void *d, void *a, void *b, uint32_t desc)
>       for (i = 0; i < oprsz; i += sizeof(uint32_t)) {
>           uint32_t ai = *(uint32_t *)(a + i);
>           uint32_t bi = *(uint32_t *)(b + i);
> -        uint32_t di = ai + bi;
> -        if (di < ai) {
> +        uint32_t di;
> +        if (uadd32_overflow(ai, bi, &di)) {
>               di = UINT32_MAX;
>           }
>           *(uint32_t *)(d + i) = di;
> @@ -1226,8 +1222,8 @@ void HELPER(gvec_usadd64)(void *d, void *a, void *b, uint32_t desc)
>       for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
>           uint64_t ai = *(uint64_t *)(a + i);
>           uint64_t bi = *(uint64_t *)(b + i);
> -        uint64_t di = ai + bi;
> -        if (di < ai) {
> +        uint64_t di;
> +        if (uadd64_overflow(ai, bi, &di)) {
>               di = UINT64_MAX;
>           }
>           *(uint64_t *)(d + i) = di;
> @@ -1273,8 +1269,8 @@ void HELPER(gvec_ussub32)(void *d, void *a, void *b, uint32_t desc)
>       for (i = 0; i < oprsz; i += sizeof(uint32_t)) {
>           uint32_t ai = *(uint32_t *)(a + i);
>           uint32_t bi = *(uint32_t *)(b + i);
> -        uint32_t di = ai - bi;
> -        if (ai < bi) {
> +        uint32_t di;
> +        if (usub32_overflow(ai, bi, &di)) {
>               di = 0;
>           }
>           *(uint32_t *)(d + i) = di;
> @@ -1290,8 +1286,8 @@ void HELPER(gvec_ussub64)(void *d, void *a, void *b, uint32_t desc)
>       for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
>           uint64_t ai = *(uint64_t *)(a + i);
>           uint64_t bi = *(uint64_t *)(b + i);
> -        uint64_t di = ai - bi;
> -        if (ai < bi) {
> +        uint64_t di;
> +        if (usub64_overflow(ai, bi, &di)) {
>               di = 0;
>           }
>           *(uint64_t *)(d + i) = di;
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 05/72] tests/fp: add quad support to the benchmark utility
  2021-05-08  1:46 ` [PATCH 05/72] tests/fp: add quad support to the benchmark utility Richard Henderson
@ 2021-05-11 10:01   ` David Hildenbrand
  0 siblings, 0 replies; 151+ messages in thread
From: David Hildenbrand @ 2021-05-11 10:01 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 08.05.21 03:46, Richard Henderson wrote:
> From: Alex Bennée <alex.bennee@linaro.org>
> 
> Currently this only supports softfloat calculations, because working
> out whether the hardware supports 128-bit floats needs configure
> magic. The 3-op muladd operation is currently unimplemented, so it is
> commented out for now.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Message-Id: <20201020163738.27700-8-alex.bennee@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   tests/fp/fp-bench.c | 88 ++++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 83 insertions(+), 5 deletions(-)
> 

LGTM

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 07/72] softfloat: Inline float_raise
  2021-05-08  1:46 ` [PATCH 07/72] softfloat: Inline float_raise Richard Henderson
  2021-05-09  8:32   ` Philippe Mathieu-Daudé
@ 2021-05-11 10:04   ` David Hildenbrand
  1 sibling, 0 replies; 151+ messages in thread
From: David Hildenbrand @ 2021-05-11 10:04 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 08.05.21 03:46, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   include/fpu/softfloat.h        |  5 ++++-
>   fpu/softfloat-specialize.c.inc | 12 ------------
>   2 files changed, 4 insertions(+), 13 deletions(-)
> 
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index 78ad5ca738..019c2ec66d 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -100,7 +100,10 @@ typedef enum {
>   | Routine to raise any or all of the software IEC/IEEE floating-point
>   | exception flags.
>   *----------------------------------------------------------------------------*/
> -void float_raise(uint8_t flags, float_status *status);
> +static inline void float_raise(uint8_t flags, float_status *status)
> +{
> +    status->float_exception_flags |= flags;
> +}
>   
>   /*----------------------------------------------------------------------------
>   | If `a' is denormal and we are in flush-to-zero mode then set the
> diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
> index 9ea318f3e2..487b29155c 100644
> --- a/fpu/softfloat-specialize.c.inc
> +++ b/fpu/softfloat-specialize.c.inc
> @@ -228,18 +228,6 @@ floatx80 floatx80_default_nan(float_status *status)
>   const floatx80 floatx80_infinity
>       = make_floatx80_init(floatx80_infinity_high, floatx80_infinity_low);
>   
> -/*----------------------------------------------------------------------------
> -| Raises the exceptions specified by `flags'.  Floating-point traps can be
> -| defined here if desired.  It is currently not possible for such a trap
> -| to substitute a result value.  If traps are not implemented, this routine
> -| should be simply `float_exception_flags |= flags;'.
> -*----------------------------------------------------------------------------*/
> -
> -void float_raise(uint8_t flags, float_status *status)
> -{
> -    status->float_exception_flags |= flags;
> -}
> -
>   /*----------------------------------------------------------------------------
>   | Internal canonical NaN format.
>   *----------------------------------------------------------------------------*/
> 
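
Callers are unaffected by the move; flags can still be OR-ed together
into a single call, e.g. (illustrative only):

    float_raise(float_flag_invalid, status);
    float_raise(float_flag_inexact | float_flag_underflow, status);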

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 08/72] softfloat: Use float_raise in more places
  2021-05-08  1:46 ` [PATCH 08/72] softfloat: Use float_raise in more places Richard Henderson
  2021-05-09  8:34   ` Philippe Mathieu-Daudé
@ 2021-05-11 10:06   ` David Hildenbrand
  1 sibling, 0 replies; 151+ messages in thread
From: David Hildenbrand @ 2021-05-11 10:06 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 08.05.21 03:46, Richard Henderson wrote:
> We have been somewhat inconsistent about when to use
> float_raise and when to OR in the bit by hand.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   fpu/softfloat.c | 87 +++++++++++++++++++++++++------------------------
>   1 file changed, 44 insertions(+), 43 deletions(-)
> 
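
The transformation applied throughout is mechanical; an illustrative
before/after (not an actual hunk from the patch):

    /* before: open-coded */
    status->float_exception_flags |= float_flag_inexact;

    /* after: via the now-inline helper */
    float_raise(float_flag_inexact, status);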

Reviewed-by: David Hildenbrand <david@redhat.com>


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 11/72] softfloat: Use return_nan in float_to_float
  2021-05-08  1:47 ` [PATCH 11/72] softfloat: Use return_nan in float_to_float Richard Henderson
  2021-05-10 15:10   ` Alex Bennée
@ 2021-05-11 10:10   ` David Hildenbrand
  1 sibling, 0 replies; 151+ messages in thread
From: David Hildenbrand @ 2021-05-11 10:10 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 08.05.21 03:47, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   fpu/softfloat.c | 8 +-------
>   1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 64edb23793..b694e38522 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -1931,13 +1931,7 @@ static FloatParts float_to_float(FloatParts a, const FloatFmt *dstf,
>               break;
>           }
>       } else if (is_nan(a.cls)) {
> -        if (is_snan(a.cls)) {
> -            float_raise(float_flag_invalid, s);
> -            a = parts_silence_nan(a, s);
> -        }
> -        if (s->default_nan_mode) {
> -            return parts_default_nan(s);
> -        }
> +        return return_nan(a, s);
>       }
>       return a;
>   }
> 
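
For the record, return_nan at this point in the series performs exactly
the steps being deleted here, so this is a pure deduplication; roughly
(my reading, as a sketch):

    /* what return_nan does for a NaN input */
    if (is_snan(a.cls)) {
        float_raise(float_flag_invalid, s);
        a = parts_silence_nan(a, s);
    }
    if (s->default_nan_mode) {
        a = parts_default_nan(s);
    }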

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 12/72] softfloat: fix return_nan vs default_nan_mode
  2021-05-08  1:47 ` [PATCH 12/72] softfloat: fix return_nan vs default_nan_mode Richard Henderson
  2021-05-10 15:12   ` Alex Bennée
@ 2021-05-11 10:12   ` David Hildenbrand
  1 sibling, 0 replies; 151+ messages in thread
From: David Hildenbrand @ 2021-05-11 10:12 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 08.05.21 03:47, Richard Henderson wrote:
> Do not call parts_silence_nan when default_nan_mode is in
> effect.  This will avoid an assert in a later patch.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   fpu/softfloat.c | 19 +++++++------------
>   1 file changed, 7 insertions(+), 12 deletions(-)
> 
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index b694e38522..6589f00b23 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -892,21 +892,16 @@ static float64 float64_round_pack_canonical(FloatParts p, float_status *s)
>   
>   static FloatParts return_nan(FloatParts a, float_status *s)
>   {
> -    switch (a.cls) {
> -    case float_class_snan:
> +    g_assert(is_nan(a.cls));
> +    if (is_snan(a.cls)) {
>           float_raise(float_flag_invalid, s);
> -        a = parts_silence_nan(a, s);
> -        /* fall through */
> -    case float_class_qnan:
> -        if (s->default_nan_mode) {
> -            return parts_default_nan(s);
> +        if (!s->default_nan_mode) {
> +            return parts_silence_nan(a, s);
>           }
> -        break;
> -
> -    default:
> -        g_assert_not_reached();
> +    } else if (!s->default_nan_mode) {
> +        return a;
>       }
> -    return a;
> +    return parts_default_nan(s);
>   }
>   
>   static FloatParts pick_nan(FloatParts a, FloatParts b, float_status *s)
> 
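
Spelling out the resulting behavior (my reading of the new code):

    /*
     * return_nan(a):
     *   sNaN, !default_nan_mode -> raise invalid, return silenced a
     *   sNaN,  default_nan_mode -> raise invalid, return default NaN
     *   qNaN, !default_nan_mode -> return a unchanged
     *   qNaN,  default_nan_mode -> return default NaN
     */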

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 14/72] softfloat: Do not produce a default_nan from parts_silence_nan
  2021-05-08  1:47 ` [PATCH 14/72] softfloat: Do not produce a default_nan from parts_silence_nan Richard Henderson
@ 2021-05-11 10:16   ` David Hildenbrand
  2021-05-11 10:32   ` Alex Bennée
  1 sibling, 0 replies; 151+ messages in thread
From: David Hildenbrand @ 2021-05-11 10:16 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 08.05.21 03:47, Richard Henderson wrote:
> Require default_nan_mode to be set instead.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   fpu/softfloat-specialize.c.inc | 11 +++++------
>   1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
> index 487b29155c..5988830c16 100644
> --- a/fpu/softfloat-specialize.c.inc
> +++ b/fpu/softfloat-specialize.c.inc
> @@ -180,16 +180,15 @@ static FloatParts parts_default_nan(float_status *status)
>   static FloatParts parts_silence_nan(FloatParts a, float_status *status)
>   {
>       g_assert(!no_signaling_nans(status));
> -#if defined(TARGET_HPPA)
> -    a.frac &= ~(1ULL << (DECOMPOSED_BINARY_POINT - 1));
> -    a.frac |= 1ULL << (DECOMPOSED_BINARY_POINT - 2);
> -#else
> +    g_assert(!status->default_nan_mode);
> +
> +    /* The only snan_bit_is_one target without default_nan_mode is HPPA. */
>       if (snan_bit_is_one(status)) {
> -        return parts_default_nan(status);
> +        a.frac &= ~(1ULL << (DECOMPOSED_BINARY_POINT - 1));
> +        a.frac |= 1ULL << (DECOMPOSED_BINARY_POINT - 2);
>       } else {
>           a.frac |= 1ULL << (DECOMPOSED_BINARY_POINT - 1);
>       }
> -#endif
>       a.cls = float_class_qnan;
>       return a;
>   }
> 
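
An annotation of the fraction bits in play here (my reading):

    /*
     * frac bit 62 (DECOMPOSED_BINARY_POINT - 1) is the top bit of the
     * fraction proper.  With snan_bit_is_one (HPPA), a set bit 62 means
     * signaling, so silencing clears it and sets bit 61 instead to keep
     * the fraction non-zero (still a NaN).  Otherwise bit 62 is the
     * quiet bit and is simply set.
     */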

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 13/72] target/mips: Set set_default_nan_mode with set_snan_bit_is_one
  2021-05-08  1:47 ` [PATCH 13/72] target/mips: Set set_default_nan_mode with set_snan_bit_is_one Richard Henderson
  2021-05-11  9:37   ` Alex Bennée
@ 2021-05-11 10:16   ` David Hildenbrand
  1 sibling, 0 replies; 151+ messages in thread
From: David Hildenbrand @ 2021-05-11 10:16 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 08.05.21 03:47, Richard Henderson wrote:
> This behavior is currently hard-coded in parts_silence_nan,
> but setting this bit properly will allow this to be cleaned up.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/mips/fpu_helper.h | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/target/mips/fpu_helper.h b/target/mips/fpu_helper.h
> index 1c2d6d35a7..ad1116e8c1 100644
> --- a/target/mips/fpu_helper.h
> +++ b/target/mips/fpu_helper.h
> @@ -27,8 +27,14 @@ static inline void restore_flush_mode(CPUMIPSState *env)
>   
>   static inline void restore_snan_bit_mode(CPUMIPSState *env)
>   {
> -    set_snan_bit_is_one((env->active_fpu.fcr31 & (1 << FCR31_NAN2008)) == 0,
> -                        &env->active_fpu.fp_status);
> +    bool nan2008 = env->active_fpu.fcr31 & (1 << FCR31_NAN2008);
> +
> +    /*
> +     * With nan2008, SNaNs are silenced in the usual way.
> +     * Before that, SNaNs are not silenced; default nans are produced.
> +     */
> +    set_snan_bit_is_one(!nan2008, &env->active_fpu.fp_status);
> +    set_default_nan_mode(!nan2008, &env->active_fpu.fp_status);
>   }
>   
>   static inline void restore_fp_status(CPUMIPSState *env)
> 
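
The resulting mode combinations, for reference:

    /*
     * fcr31.NAN2008   snan_bit_is_one   default_nan_mode
     *       1              false             false        (IEEE 754-2008)
     *       0              true              true         (legacy MIPS)
     */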

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 15/72] softfloat: Rename FloatParts to FloatParts64
  2021-05-08  1:47 ` [PATCH 15/72] softfloat: Rename FloatParts to FloatParts64 Richard Henderson
  2021-05-09  8:45   ` Philippe Mathieu-Daudé
@ 2021-05-11 10:16   ` David Hildenbrand
  1 sibling, 0 replies; 151+ messages in thread
From: David Hildenbrand @ 2021-05-11 10:16 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 08.05.21 03:47, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   fpu/softfloat.c                | 362 ++++++++++++++++-----------------
>   fpu/softfloat-specialize.c.inc |   6 +-
>   2 files changed, 184 insertions(+), 184 deletions(-)
> 
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 6589f00b23..27b51659c9 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -515,7 +515,7 @@ typedef struct {
>       int32_t  exp;
>       FloatClass cls;
>       bool sign;
> -} FloatParts;
> +} FloatParts64;
>   
>   #define DECOMPOSED_BINARY_POINT    63
>   #define DECOMPOSED_IMPLICIT_BIT    (1ull << DECOMPOSED_BINARY_POINT)
> @@ -580,11 +580,11 @@ static const FloatFmt float64_params = {
>   };
>   
>   /* Unpack a float to parts, but do not canonicalize.  */
> -static inline FloatParts unpack_raw(FloatFmt fmt, uint64_t raw)
> +static inline FloatParts64 unpack_raw(FloatFmt fmt, uint64_t raw)
>   {
>       const int sign_pos = fmt.frac_size + fmt.exp_size;
>   
> -    return (FloatParts) {
> +    return (FloatParts64) {
>           .cls = float_class_unclassified,
>           .sign = extract64(raw, sign_pos, 1),
>           .exp = extract64(raw, fmt.frac_size, fmt.exp_size),
> @@ -592,50 +592,50 @@ static inline FloatParts unpack_raw(FloatFmt fmt, uint64_t raw)
>       };
>   }
>   
> -static inline FloatParts float16_unpack_raw(float16 f)
> +static inline FloatParts64 float16_unpack_raw(float16 f)
>   {
>       return unpack_raw(float16_params, f);
>   }
>   
> -static inline FloatParts bfloat16_unpack_raw(bfloat16 f)
> +static inline FloatParts64 bfloat16_unpack_raw(bfloat16 f)
>   {
>       return unpack_raw(bfloat16_params, f);
>   }
>   
> -static inline FloatParts float32_unpack_raw(float32 f)
> +static inline FloatParts64 float32_unpack_raw(float32 f)
>   {
>       return unpack_raw(float32_params, f);
>   }
>   
> -static inline FloatParts float64_unpack_raw(float64 f)
> +static inline FloatParts64 float64_unpack_raw(float64 f)
>   {
>       return unpack_raw(float64_params, f);
>   }
>   
>   /* Pack a float from parts, but do not canonicalize.  */
> -static inline uint64_t pack_raw(FloatFmt fmt, FloatParts p)
> +static inline uint64_t pack_raw(FloatFmt fmt, FloatParts64 p)
>   {
>       const int sign_pos = fmt.frac_size + fmt.exp_size;
>       uint64_t ret = deposit64(p.frac, fmt.frac_size, fmt.exp_size, p.exp);
>       return deposit64(ret, sign_pos, 1, p.sign);
>   }
>   
> -static inline float16 float16_pack_raw(FloatParts p)
> +static inline float16 float16_pack_raw(FloatParts64 p)
>   {
>       return make_float16(pack_raw(float16_params, p));
>   }
>   
> -static inline bfloat16 bfloat16_pack_raw(FloatParts p)
> +static inline bfloat16 bfloat16_pack_raw(FloatParts64 p)
>   {
>       return pack_raw(bfloat16_params, p);
>   }
>   
> -static inline float32 float32_pack_raw(FloatParts p)
> +static inline float32 float32_pack_raw(FloatParts64 p)
>   {
>       return make_float32(pack_raw(float32_params, p));
>   }
>   
> -static inline float64 float64_pack_raw(FloatParts p)
> +static inline float64 float64_pack_raw(FloatParts64 p)
>   {
>       return make_float64(pack_raw(float64_params, p));
>   }
> @@ -651,7 +651,7 @@ static inline float64 float64_pack_raw(FloatParts p)
>   #include "softfloat-specialize.c.inc"
>   
>   /* Canonicalize EXP and FRAC, setting CLS.  */
> -static FloatParts sf_canonicalize(FloatParts part, const FloatFmt *parm,
> +static FloatParts64 sf_canonicalize(FloatParts64 part, const FloatFmt *parm,
>                                     float_status *status)
>   {
>       if (part.exp == parm->exp_max && !parm->arm_althp) {
> @@ -689,7 +689,7 @@ static FloatParts sf_canonicalize(FloatParts part, const FloatFmt *parm,
>    * by EXP_BIAS and must be bounded by [EXP_MAX-1, 0].
>    */
>   
> -static FloatParts round_canonical(FloatParts p, float_status *s,
> +static FloatParts64 round_canonical(FloatParts64 p, float_status *s,
>                                     const FloatFmt *parm)
>   {
>       const uint64_t frac_lsb = parm->frac_lsb;
> @@ -838,59 +838,59 @@ static FloatParts round_canonical(FloatParts p, float_status *s,
>   }
>   
>   /* Explicit FloatFmt version */
> -static FloatParts float16a_unpack_canonical(float16 f, float_status *s,
> +static FloatParts64 float16a_unpack_canonical(float16 f, float_status *s,
>                                               const FloatFmt *params)
>   {
>       return sf_canonicalize(float16_unpack_raw(f), params, s);
>   }
>   
> -static FloatParts float16_unpack_canonical(float16 f, float_status *s)
> +static FloatParts64 float16_unpack_canonical(float16 f, float_status *s)
>   {
>       return float16a_unpack_canonical(f, s, &float16_params);
>   }
>   
> -static FloatParts bfloat16_unpack_canonical(bfloat16 f, float_status *s)
> +static FloatParts64 bfloat16_unpack_canonical(bfloat16 f, float_status *s)
>   {
>       return sf_canonicalize(bfloat16_unpack_raw(f), &bfloat16_params, s);
>   }
>   
> -static float16 float16a_round_pack_canonical(FloatParts p, float_status *s,
> +static float16 float16a_round_pack_canonical(FloatParts64 p, float_status *s,
>                                                const FloatFmt *params)
>   {
>       return float16_pack_raw(round_canonical(p, s, params));
>   }
>   
> -static float16 float16_round_pack_canonical(FloatParts p, float_status *s)
> +static float16 float16_round_pack_canonical(FloatParts64 p, float_status *s)
>   {
>       return float16a_round_pack_canonical(p, s, &float16_params);
>   }
>   
> -static bfloat16 bfloat16_round_pack_canonical(FloatParts p, float_status *s)
> +static bfloat16 bfloat16_round_pack_canonical(FloatParts64 p, float_status *s)
>   {
>       return bfloat16_pack_raw(round_canonical(p, s, &bfloat16_params));
>   }
>   
> -static FloatParts float32_unpack_canonical(float32 f, float_status *s)
> +static FloatParts64 float32_unpack_canonical(float32 f, float_status *s)
>   {
>       return sf_canonicalize(float32_unpack_raw(f), &float32_params, s);
>   }
>   
> -static float32 float32_round_pack_canonical(FloatParts p, float_status *s)
> +static float32 float32_round_pack_canonical(FloatParts64 p, float_status *s)
>   {
>       return float32_pack_raw(round_canonical(p, s, &float32_params));
>   }
>   
> -static FloatParts float64_unpack_canonical(float64 f, float_status *s)
> +static FloatParts64 float64_unpack_canonical(float64 f, float_status *s)
>   {
>       return sf_canonicalize(float64_unpack_raw(f), &float64_params, s);
>   }
>   
> -static float64 float64_round_pack_canonical(FloatParts p, float_status *s)
> +static float64 float64_round_pack_canonical(FloatParts64 p, float_status *s)
>   {
>       return float64_pack_raw(round_canonical(p, s, &float64_params));
>   }
>   
> -static FloatParts return_nan(FloatParts a, float_status *s)
> +static FloatParts64 return_nan(FloatParts64 a, float_status *s)
>   {
>       g_assert(is_nan(a.cls));
>       if (is_snan(a.cls)) {
> @@ -904,7 +904,7 @@ static FloatParts return_nan(FloatParts a, float_status *s)
>       return parts_default_nan(s);
>   }
>   
> -static FloatParts pick_nan(FloatParts a, FloatParts b, float_status *s)
> +static FloatParts64 pick_nan(FloatParts64 a, FloatParts64 b, float_status *s)
>   {
>       if (is_snan(a.cls) || is_snan(b.cls)) {
>           float_raise(float_flag_invalid, s);
> @@ -925,7 +925,7 @@ static FloatParts pick_nan(FloatParts a, FloatParts b, float_status *s)
>       return a;
>   }
>   
> -static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
> +static FloatParts64 pick_nan_muladd(FloatParts64 a, FloatParts64 b, FloatParts64 c,
>                                     bool inf_zero, float_status *s)
>   {
>       int which;
> @@ -971,7 +971,7 @@ static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
>    * Arithmetic.
>    */
>   
> -static FloatParts addsub_floats(FloatParts a, FloatParts b, bool subtract,
> +static FloatParts64 addsub_floats(FloatParts64 a, FloatParts64 b, bool subtract,
>                                   float_status *s)
>   {
>       bool a_sign = a.sign;
> @@ -1062,18 +1062,18 @@ static FloatParts addsub_floats(FloatParts a, FloatParts b, bool subtract,
>   
>   float16 QEMU_FLATTEN float16_add(float16 a, float16 b, float_status *status)
>   {
> -    FloatParts pa = float16_unpack_canonical(a, status);
> -    FloatParts pb = float16_unpack_canonical(b, status);
> -    FloatParts pr = addsub_floats(pa, pb, false, status);
> +    FloatParts64 pa = float16_unpack_canonical(a, status);
> +    FloatParts64 pb = float16_unpack_canonical(b, status);
> +    FloatParts64 pr = addsub_floats(pa, pb, false, status);
>   
>       return float16_round_pack_canonical(pr, status);
>   }
>   
>   float16 QEMU_FLATTEN float16_sub(float16 a, float16 b, float_status *status)
>   {
> -    FloatParts pa = float16_unpack_canonical(a, status);
> -    FloatParts pb = float16_unpack_canonical(b, status);
> -    FloatParts pr = addsub_floats(pa, pb, true, status);
> +    FloatParts64 pa = float16_unpack_canonical(a, status);
> +    FloatParts64 pb = float16_unpack_canonical(b, status);
> +    FloatParts64 pr = addsub_floats(pa, pb, true, status);
>   
>       return float16_round_pack_canonical(pr, status);
>   }
> @@ -1081,9 +1081,9 @@ float16 QEMU_FLATTEN float16_sub(float16 a, float16 b, float_status *status)
>   static float32 QEMU_SOFTFLOAT_ATTR
>   soft_f32_addsub(float32 a, float32 b, bool subtract, float_status *status)
>   {
> -    FloatParts pa = float32_unpack_canonical(a, status);
> -    FloatParts pb = float32_unpack_canonical(b, status);
> -    FloatParts pr = addsub_floats(pa, pb, subtract, status);
> +    FloatParts64 pa = float32_unpack_canonical(a, status);
> +    FloatParts64 pb = float32_unpack_canonical(b, status);
> +    FloatParts64 pr = addsub_floats(pa, pb, subtract, status);
>   
>       return float32_round_pack_canonical(pr, status);
>   }
> @@ -1101,9 +1101,9 @@ static inline float32 soft_f32_sub(float32 a, float32 b, float_status *status)
>   static float64 QEMU_SOFTFLOAT_ATTR
>   soft_f64_addsub(float64 a, float64 b, bool subtract, float_status *status)
>   {
> -    FloatParts pa = float64_unpack_canonical(a, status);
> -    FloatParts pb = float64_unpack_canonical(b, status);
> -    FloatParts pr = addsub_floats(pa, pb, subtract, status);
> +    FloatParts64 pa = float64_unpack_canonical(a, status);
> +    FloatParts64 pb = float64_unpack_canonical(b, status);
> +    FloatParts64 pr = addsub_floats(pa, pb, subtract, status);
>   
>       return float64_round_pack_canonical(pr, status);
>   }
> @@ -1199,18 +1199,18 @@ float64_sub(float64 a, float64 b, float_status *s)
>    */
>   bfloat16 QEMU_FLATTEN bfloat16_add(bfloat16 a, bfloat16 b, float_status *status)
>   {
> -    FloatParts pa = bfloat16_unpack_canonical(a, status);
> -    FloatParts pb = bfloat16_unpack_canonical(b, status);
> -    FloatParts pr = addsub_floats(pa, pb, false, status);
> +    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
> +    FloatParts64 pb = bfloat16_unpack_canonical(b, status);
> +    FloatParts64 pr = addsub_floats(pa, pb, false, status);
>   
>       return bfloat16_round_pack_canonical(pr, status);
>   }
>   
>   bfloat16 QEMU_FLATTEN bfloat16_sub(bfloat16 a, bfloat16 b, float_status *status)
>   {
> -    FloatParts pa = bfloat16_unpack_canonical(a, status);
> -    FloatParts pb = bfloat16_unpack_canonical(b, status);
> -    FloatParts pr = addsub_floats(pa, pb, true, status);
> +    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
> +    FloatParts64 pb = bfloat16_unpack_canonical(b, status);
> +    FloatParts64 pr = addsub_floats(pa, pb, true, status);
>   
>       return bfloat16_round_pack_canonical(pr, status);
>   }
> @@ -1221,7 +1221,7 @@ bfloat16 QEMU_FLATTEN bfloat16_sub(bfloat16 a, bfloat16 b, float_status *status)
>    * for Binary Floating-Point Arithmetic.
>    */
>   
> -static FloatParts mul_floats(FloatParts a, FloatParts b, float_status *s)
> +static FloatParts64 mul_floats(FloatParts64 a, FloatParts64 b, float_status *s)
>   {
>       bool sign = a.sign ^ b.sign;
>   
> @@ -1267,9 +1267,9 @@ static FloatParts mul_floats(FloatParts a, FloatParts b, float_status *s)
>   
>   float16 QEMU_FLATTEN float16_mul(float16 a, float16 b, float_status *status)
>   {
> -    FloatParts pa = float16_unpack_canonical(a, status);
> -    FloatParts pb = float16_unpack_canonical(b, status);
> -    FloatParts pr = mul_floats(pa, pb, status);
> +    FloatParts64 pa = float16_unpack_canonical(a, status);
> +    FloatParts64 pb = float16_unpack_canonical(b, status);
> +    FloatParts64 pr = mul_floats(pa, pb, status);
>   
>       return float16_round_pack_canonical(pr, status);
>   }
> @@ -1277,9 +1277,9 @@ float16 QEMU_FLATTEN float16_mul(float16 a, float16 b, float_status *status)
>   static float32 QEMU_SOFTFLOAT_ATTR
>   soft_f32_mul(float32 a, float32 b, float_status *status)
>   {
> -    FloatParts pa = float32_unpack_canonical(a, status);
> -    FloatParts pb = float32_unpack_canonical(b, status);
> -    FloatParts pr = mul_floats(pa, pb, status);
> +    FloatParts64 pa = float32_unpack_canonical(a, status);
> +    FloatParts64 pb = float32_unpack_canonical(b, status);
> +    FloatParts64 pr = mul_floats(pa, pb, status);
>   
>       return float32_round_pack_canonical(pr, status);
>   }
> @@ -1287,9 +1287,9 @@ soft_f32_mul(float32 a, float32 b, float_status *status)
>   static float64 QEMU_SOFTFLOAT_ATTR
>   soft_f64_mul(float64 a, float64 b, float_status *status)
>   {
> -    FloatParts pa = float64_unpack_canonical(a, status);
> -    FloatParts pb = float64_unpack_canonical(b, status);
> -    FloatParts pr = mul_floats(pa, pb, status);
> +    FloatParts64 pa = float64_unpack_canonical(a, status);
> +    FloatParts64 pb = float64_unpack_canonical(b, status);
> +    FloatParts64 pr = mul_floats(pa, pb, status);
>   
>       return float64_round_pack_canonical(pr, status);
>   }
> @@ -1325,9 +1325,9 @@ float64_mul(float64 a, float64 b, float_status *s)
>   
>   bfloat16 QEMU_FLATTEN bfloat16_mul(bfloat16 a, bfloat16 b, float_status *status)
>   {
> -    FloatParts pa = bfloat16_unpack_canonical(a, status);
> -    FloatParts pb = bfloat16_unpack_canonical(b, status);
> -    FloatParts pr = mul_floats(pa, pb, status);
> +    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
> +    FloatParts64 pb = bfloat16_unpack_canonical(b, status);
> +    FloatParts64 pr = mul_floats(pa, pb, status);
>   
>       return bfloat16_round_pack_canonical(pr, status);
>   }
> @@ -1344,7 +1344,7 @@ bfloat16 QEMU_FLATTEN bfloat16_mul(bfloat16 a, bfloat16 b, float_status *status)
>    * NaNs.)
>    */
>   
> -static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
> +static FloatParts64 muladd_floats(FloatParts64 a, FloatParts64 b, FloatParts64 c,
>                                   int flags, float_status *s)
>   {
>       bool inf_zero, p_sign;
> @@ -1520,10 +1520,10 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
>   float16 QEMU_FLATTEN float16_muladd(float16 a, float16 b, float16 c,
>                                                   int flags, float_status *status)
>   {
> -    FloatParts pa = float16_unpack_canonical(a, status);
> -    FloatParts pb = float16_unpack_canonical(b, status);
> -    FloatParts pc = float16_unpack_canonical(c, status);
> -    FloatParts pr = muladd_floats(pa, pb, pc, flags, status);
> +    FloatParts64 pa = float16_unpack_canonical(a, status);
> +    FloatParts64 pb = float16_unpack_canonical(b, status);
> +    FloatParts64 pc = float16_unpack_canonical(c, status);
> +    FloatParts64 pr = muladd_floats(pa, pb, pc, flags, status);
>   
>       return float16_round_pack_canonical(pr, status);
>   }
> @@ -1532,10 +1532,10 @@ static float32 QEMU_SOFTFLOAT_ATTR
>   soft_f32_muladd(float32 a, float32 b, float32 c, int flags,
>                   float_status *status)
>   {
> -    FloatParts pa = float32_unpack_canonical(a, status);
> -    FloatParts pb = float32_unpack_canonical(b, status);
> -    FloatParts pc = float32_unpack_canonical(c, status);
> -    FloatParts pr = muladd_floats(pa, pb, pc, flags, status);
> +    FloatParts64 pa = float32_unpack_canonical(a, status);
> +    FloatParts64 pb = float32_unpack_canonical(b, status);
> +    FloatParts64 pc = float32_unpack_canonical(c, status);
> +    FloatParts64 pr = muladd_floats(pa, pb, pc, flags, status);
>   
>       return float32_round_pack_canonical(pr, status);
>   }
> @@ -1544,10 +1544,10 @@ static float64 QEMU_SOFTFLOAT_ATTR
>   soft_f64_muladd(float64 a, float64 b, float64 c, int flags,
>                   float_status *status)
>   {
> -    FloatParts pa = float64_unpack_canonical(a, status);
> -    FloatParts pb = float64_unpack_canonical(b, status);
> -    FloatParts pc = float64_unpack_canonical(c, status);
> -    FloatParts pr = muladd_floats(pa, pb, pc, flags, status);
> +    FloatParts64 pa = float64_unpack_canonical(a, status);
> +    FloatParts64 pb = float64_unpack_canonical(b, status);
> +    FloatParts64 pc = float64_unpack_canonical(c, status);
> +    FloatParts64 pr = muladd_floats(pa, pb, pc, flags, status);
>   
>       return float64_round_pack_canonical(pr, status);
>   }
> @@ -1705,10 +1705,10 @@ float64_muladd(float64 xa, float64 xb, float64 xc, int flags, float_status *s)
>   bfloat16 QEMU_FLATTEN bfloat16_muladd(bfloat16 a, bfloat16 b, bfloat16 c,
>                                         int flags, float_status *status)
>   {
> -    FloatParts pa = bfloat16_unpack_canonical(a, status);
> -    FloatParts pb = bfloat16_unpack_canonical(b, status);
> -    FloatParts pc = bfloat16_unpack_canonical(c, status);
> -    FloatParts pr = muladd_floats(pa, pb, pc, flags, status);
> +    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
> +    FloatParts64 pb = bfloat16_unpack_canonical(b, status);
> +    FloatParts64 pc = bfloat16_unpack_canonical(c, status);
> +    FloatParts64 pr = muladd_floats(pa, pb, pc, flags, status);
>   
>       return bfloat16_round_pack_canonical(pr, status);
>   }
> @@ -1719,7 +1719,7 @@ bfloat16 QEMU_FLATTEN bfloat16_muladd(bfloat16 a, bfloat16 b, bfloat16 c,
>    * the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>    */
>   
> -static FloatParts div_floats(FloatParts a, FloatParts b, float_status *s)
> +static FloatParts64 div_floats(FloatParts64 a, FloatParts64 b, float_status *s)
>   {
>       bool sign = a.sign ^ b.sign;
>   
> @@ -1786,9 +1786,9 @@ static FloatParts div_floats(FloatParts a, FloatParts b, float_status *s)
>   
>   float16 float16_div(float16 a, float16 b, float_status *status)
>   {
> -    FloatParts pa = float16_unpack_canonical(a, status);
> -    FloatParts pb = float16_unpack_canonical(b, status);
> -    FloatParts pr = div_floats(pa, pb, status);
> +    FloatParts64 pa = float16_unpack_canonical(a, status);
> +    FloatParts64 pb = float16_unpack_canonical(b, status);
> +    FloatParts64 pr = div_floats(pa, pb, status);
>   
>       return float16_round_pack_canonical(pr, status);
>   }
> @@ -1796,9 +1796,9 @@ float16 float16_div(float16 a, float16 b, float_status *status)
>   static float32 QEMU_SOFTFLOAT_ATTR
>   soft_f32_div(float32 a, float32 b, float_status *status)
>   {
> -    FloatParts pa = float32_unpack_canonical(a, status);
> -    FloatParts pb = float32_unpack_canonical(b, status);
> -    FloatParts pr = div_floats(pa, pb, status);
> +    FloatParts64 pa = float32_unpack_canonical(a, status);
> +    FloatParts64 pb = float32_unpack_canonical(b, status);
> +    FloatParts64 pr = div_floats(pa, pb, status);
>   
>       return float32_round_pack_canonical(pr, status);
>   }
> @@ -1806,9 +1806,9 @@ soft_f32_div(float32 a, float32 b, float_status *status)
>   static float64 QEMU_SOFTFLOAT_ATTR
>   soft_f64_div(float64 a, float64 b, float_status *status)
>   {
> -    FloatParts pa = float64_unpack_canonical(a, status);
> -    FloatParts pb = float64_unpack_canonical(b, status);
> -    FloatParts pr = div_floats(pa, pb, status);
> +    FloatParts64 pa = float64_unpack_canonical(a, status);
> +    FloatParts64 pb = float64_unpack_canonical(b, status);
> +    FloatParts64 pr = div_floats(pa, pb, status);
>   
>       return float64_round_pack_canonical(pr, status);
>   }
> @@ -1878,9 +1878,9 @@ float64_div(float64 a, float64 b, float_status *s)
>   
>   bfloat16 bfloat16_div(bfloat16 a, bfloat16 b, float_status *status)
>   {
> -    FloatParts pa = bfloat16_unpack_canonical(a, status);
> -    FloatParts pb = bfloat16_unpack_canonical(b, status);
> -    FloatParts pr = div_floats(pa, pb, status);
> +    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
> +    FloatParts64 pb = bfloat16_unpack_canonical(b, status);
> +    FloatParts64 pr = div_floats(pa, pb, status);
>   
>       return bfloat16_round_pack_canonical(pr, status);
>   }
> @@ -1896,7 +1896,7 @@ bfloat16 bfloat16_div(bfloat16 a, bfloat16 b, float_status *status)
>    * invalid exceptions and handling the conversion on NaNs.
>    */
>   
> -static FloatParts float_to_float(FloatParts a, const FloatFmt *dstf,
> +static FloatParts64 float_to_float(FloatParts64 a, const FloatFmt *dstf,
>                                    float_status *s)
>   {
>       if (dstf->arm_althp) {
> @@ -1934,32 +1934,32 @@ static FloatParts float_to_float(FloatParts a, const FloatFmt *dstf,
>   float32 float16_to_float32(float16 a, bool ieee, float_status *s)
>   {
>       const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
> -    FloatParts p = float16a_unpack_canonical(a, s, fmt16);
> -    FloatParts pr = float_to_float(p, &float32_params, s);
> +    FloatParts64 p = float16a_unpack_canonical(a, s, fmt16);
> +    FloatParts64 pr = float_to_float(p, &float32_params, s);
>       return float32_round_pack_canonical(pr, s);
>   }
>   
>   float64 float16_to_float64(float16 a, bool ieee, float_status *s)
>   {
>       const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
> -    FloatParts p = float16a_unpack_canonical(a, s, fmt16);
> -    FloatParts pr = float_to_float(p, &float64_params, s);
> +    FloatParts64 p = float16a_unpack_canonical(a, s, fmt16);
> +    FloatParts64 pr = float_to_float(p, &float64_params, s);
>       return float64_round_pack_canonical(pr, s);
>   }
>   
>   float16 float32_to_float16(float32 a, bool ieee, float_status *s)
>   {
>       const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
> -    FloatParts p = float32_unpack_canonical(a, s);
> -    FloatParts pr = float_to_float(p, fmt16, s);
> +    FloatParts64 p = float32_unpack_canonical(a, s);
> +    FloatParts64 pr = float_to_float(p, fmt16, s);
>       return float16a_round_pack_canonical(pr, s, fmt16);
>   }
>   
>   static float64 QEMU_SOFTFLOAT_ATTR
>   soft_float32_to_float64(float32 a, float_status *s)
>   {
> -    FloatParts p = float32_unpack_canonical(a, s);
> -    FloatParts pr = float_to_float(p, &float64_params, s);
> +    FloatParts64 p = float32_unpack_canonical(a, s);
> +    FloatParts64 pr = float_to_float(p, &float64_params, s);
>       return float64_round_pack_canonical(pr, s);
>   }
>   
> @@ -1982,43 +1982,43 @@ float64 float32_to_float64(float32 a, float_status *s)
>   float16 float64_to_float16(float64 a, bool ieee, float_status *s)
>   {
>       const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
> -    FloatParts p = float64_unpack_canonical(a, s);
> -    FloatParts pr = float_to_float(p, fmt16, s);
> +    FloatParts64 p = float64_unpack_canonical(a, s);
> +    FloatParts64 pr = float_to_float(p, fmt16, s);
>       return float16a_round_pack_canonical(pr, s, fmt16);
>   }
>   
>   float32 float64_to_float32(float64 a, float_status *s)
>   {
> -    FloatParts p = float64_unpack_canonical(a, s);
> -    FloatParts pr = float_to_float(p, &float32_params, s);
> +    FloatParts64 p = float64_unpack_canonical(a, s);
> +    FloatParts64 pr = float_to_float(p, &float32_params, s);
>       return float32_round_pack_canonical(pr, s);
>   }
>   
>   float32 bfloat16_to_float32(bfloat16 a, float_status *s)
>   {
> -    FloatParts p = bfloat16_unpack_canonical(a, s);
> -    FloatParts pr = float_to_float(p, &float32_params, s);
> +    FloatParts64 p = bfloat16_unpack_canonical(a, s);
> +    FloatParts64 pr = float_to_float(p, &float32_params, s);
>       return float32_round_pack_canonical(pr, s);
>   }
>   
>   float64 bfloat16_to_float64(bfloat16 a, float_status *s)
>   {
> -    FloatParts p = bfloat16_unpack_canonical(a, s);
> -    FloatParts pr = float_to_float(p, &float64_params, s);
> +    FloatParts64 p = bfloat16_unpack_canonical(a, s);
> +    FloatParts64 pr = float_to_float(p, &float64_params, s);
>       return float64_round_pack_canonical(pr, s);
>   }
>   
>   bfloat16 float32_to_bfloat16(float32 a, float_status *s)
>   {
> -    FloatParts p = float32_unpack_canonical(a, s);
> -    FloatParts pr = float_to_float(p, &bfloat16_params, s);
> +    FloatParts64 p = float32_unpack_canonical(a, s);
> +    FloatParts64 pr = float_to_float(p, &bfloat16_params, s);
>       return bfloat16_round_pack_canonical(pr, s);
>   }
>   
>   bfloat16 float64_to_bfloat16(float64 a, float_status *s)
>   {
> -    FloatParts p = float64_unpack_canonical(a, s);
> -    FloatParts pr = float_to_float(p, &bfloat16_params, s);
> +    FloatParts64 p = float64_unpack_canonical(a, s);
> +    FloatParts64 pr = float_to_float(p, &bfloat16_params, s);
>       return bfloat16_round_pack_canonical(pr, s);
>   }
>   
> @@ -2029,7 +2029,7 @@ bfloat16 float64_to_bfloat16(float64 a, float_status *s)
>    * Arithmetic.
>    */
>   
> -static FloatParts round_to_int(FloatParts a, FloatRoundMode rmode,
> +static FloatParts64 round_to_int(FloatParts64 a, FloatRoundMode rmode,
>                                  int scale, float_status *s)
>   {
>       switch (a.cls) {
> @@ -2132,22 +2132,22 @@ static FloatParts round_to_int(FloatParts a, FloatRoundMode rmode,
>   
>   float16 float16_round_to_int(float16 a, float_status *s)
>   {
> -    FloatParts pa = float16_unpack_canonical(a, s);
> -    FloatParts pr = round_to_int(pa, s->float_rounding_mode, 0, s);
> +    FloatParts64 pa = float16_unpack_canonical(a, s);
> +    FloatParts64 pr = round_to_int(pa, s->float_rounding_mode, 0, s);
>       return float16_round_pack_canonical(pr, s);
>   }
>   
>   float32 float32_round_to_int(float32 a, float_status *s)
>   {
> -    FloatParts pa = float32_unpack_canonical(a, s);
> -    FloatParts pr = round_to_int(pa, s->float_rounding_mode, 0, s);
> +    FloatParts64 pa = float32_unpack_canonical(a, s);
> +    FloatParts64 pr = round_to_int(pa, s->float_rounding_mode, 0, s);
>       return float32_round_pack_canonical(pr, s);
>   }
>   
>   float64 float64_round_to_int(float64 a, float_status *s)
>   {
> -    FloatParts pa = float64_unpack_canonical(a, s);
> -    FloatParts pr = round_to_int(pa, s->float_rounding_mode, 0, s);
> +    FloatParts64 pa = float64_unpack_canonical(a, s);
> +    FloatParts64 pr = round_to_int(pa, s->float_rounding_mode, 0, s);
>       return float64_round_pack_canonical(pr, s);
>   }
>   
> @@ -2158,8 +2158,8 @@ float64 float64_round_to_int(float64 a, float_status *s)
>   
>   bfloat16 bfloat16_round_to_int(bfloat16 a, float_status *s)
>   {
> -    FloatParts pa = bfloat16_unpack_canonical(a, s);
> -    FloatParts pr = round_to_int(pa, s->float_rounding_mode, 0, s);
> +    FloatParts64 pa = bfloat16_unpack_canonical(a, s);
> +    FloatParts64 pr = round_to_int(pa, s->float_rounding_mode, 0, s);
>       return bfloat16_round_pack_canonical(pr, s);
>   }
>   
> @@ -2174,13 +2174,13 @@ bfloat16 bfloat16_round_to_int(bfloat16 a, float_status *s)
>    * is returned.
>   */
>   
> -static int64_t round_to_int_and_pack(FloatParts in, FloatRoundMode rmode,
> +static int64_t round_to_int_and_pack(FloatParts64 in, FloatRoundMode rmode,
>                                        int scale, int64_t min, int64_t max,
>                                        float_status *s)
>   {
>       uint64_t r;
>       int orig_flags = get_float_exception_flags(s);
> -    FloatParts p = round_to_int(in, rmode, scale, s);
> +    FloatParts64 p = round_to_int(in, rmode, scale, s);
>   
>       switch (p.cls) {
>       case float_class_snan:
> @@ -2452,12 +2452,12 @@ int64_t bfloat16_to_int64_round_to_zero(bfloat16 a, float_status *s)
>    *  flag.
>    */
>   
> -static uint64_t round_to_uint_and_pack(FloatParts in, FloatRoundMode rmode,
> +static uint64_t round_to_uint_and_pack(FloatParts64 in, FloatRoundMode rmode,
>                                          int scale, uint64_t max,
>                                          float_status *s)
>   {
>       int orig_flags = get_float_exception_flags(s);
> -    FloatParts p = round_to_int(in, rmode, scale, s);
> +    FloatParts64 p = round_to_int(in, rmode, scale, s);
>       uint64_t r;
>   
>       switch (p.cls) {
> @@ -2726,9 +2726,9 @@ uint64_t bfloat16_to_uint64_round_to_zero(bfloat16 a, float_status *s)
>    * to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>    */
>   
> -static FloatParts int_to_float(int64_t a, int scale, float_status *status)
> +static FloatParts64 int_to_float(int64_t a, int scale, float_status *status)
>   {
> -    FloatParts r = { .sign = false };
> +    FloatParts64 r = { .sign = false };
>   
>       if (a == 0) {
>           r.cls = float_class_zero;
> @@ -2753,7 +2753,7 @@ static FloatParts int_to_float(int64_t a, int scale, float_status *status)
>   
>   float16 int64_to_float16_scalbn(int64_t a, int scale, float_status *status)
>   {
> -    FloatParts pa = int_to_float(a, scale, status);
> +    FloatParts64 pa = int_to_float(a, scale, status);
>       return float16_round_pack_canonical(pa, status);
>   }
>   
> @@ -2789,7 +2789,7 @@ float16 int8_to_float16(int8_t a, float_status *status)
>   
>   float32 int64_to_float32_scalbn(int64_t a, int scale, float_status *status)
>   {
> -    FloatParts pa = int_to_float(a, scale, status);
> +    FloatParts64 pa = int_to_float(a, scale, status);
>       return float32_round_pack_canonical(pa, status);
>   }
>   
> @@ -2820,7 +2820,7 @@ float32 int16_to_float32(int16_t a, float_status *status)
>   
>   float64 int64_to_float64_scalbn(int64_t a, int scale, float_status *status)
>   {
> -    FloatParts pa = int_to_float(a, scale, status);
> +    FloatParts64 pa = int_to_float(a, scale, status);
>       return float64_round_pack_canonical(pa, status);
>   }
>   
> @@ -2856,7 +2856,7 @@ float64 int16_to_float64(int16_t a, float_status *status)
>   
>   bfloat16 int64_to_bfloat16_scalbn(int64_t a, int scale, float_status *status)
>   {
> -    FloatParts pa = int_to_float(a, scale, status);
> +    FloatParts64 pa = int_to_float(a, scale, status);
>       return bfloat16_round_pack_canonical(pa, status);
>   }
>   
> @@ -2893,9 +2893,9 @@ bfloat16 int16_to_bfloat16(int16_t a, float_status *status)
>    * IEC/IEEE Standard for Binary Floating-Point Arithmetic.
>    */
>   
> -static FloatParts uint_to_float(uint64_t a, int scale, float_status *status)
> +static FloatParts64 uint_to_float(uint64_t a, int scale, float_status *status)
>   {
> -    FloatParts r = { .sign = false };
> +    FloatParts64 r = { .sign = false };
>       int shift;
>   
>       if (a == 0) {
> @@ -2913,7 +2913,7 @@ static FloatParts uint_to_float(uint64_t a, int scale, float_status *status)
>   
>   float16 uint64_to_float16_scalbn(uint64_t a, int scale, float_status *status)
>   {
> -    FloatParts pa = uint_to_float(a, scale, status);
> +    FloatParts64 pa = uint_to_float(a, scale, status);
>       return float16_round_pack_canonical(pa, status);
>   }
>   
> @@ -2949,7 +2949,7 @@ float16 uint8_to_float16(uint8_t a, float_status *status)
>   
>   float32 uint64_to_float32_scalbn(uint64_t a, int scale, float_status *status)
>   {
> -    FloatParts pa = uint_to_float(a, scale, status);
> +    FloatParts64 pa = uint_to_float(a, scale, status);
>       return float32_round_pack_canonical(pa, status);
>   }
>   
> @@ -2980,7 +2980,7 @@ float32 uint16_to_float32(uint16_t a, float_status *status)
>   
>   float64 uint64_to_float64_scalbn(uint64_t a, int scale, float_status *status)
>   {
> -    FloatParts pa = uint_to_float(a, scale, status);
> +    FloatParts64 pa = uint_to_float(a, scale, status);
>       return float64_round_pack_canonical(pa, status);
>   }
>   
> @@ -3016,7 +3016,7 @@ float64 uint16_to_float64(uint16_t a, float_status *status)
>   
>   bfloat16 uint64_to_bfloat16_scalbn(uint64_t a, int scale, float_status *status)
>   {
> -    FloatParts pa = uint_to_float(a, scale, status);
> +    FloatParts64 pa = uint_to_float(a, scale, status);
>       return bfloat16_round_pack_canonical(pa, status);
>   }
>   
> @@ -3061,7 +3061,7 @@ bfloat16 uint16_to_bfloat16(uint16_t a, float_status *status)
>    * minnummag() and maxnummag() functions correspond to minNumMag()
>   * and maxNumMag() from the IEEE-754 2008.
>    */
> -static FloatParts minmax_floats(FloatParts a, FloatParts b, bool ismin,
> +static FloatParts64 minmax_floats(FloatParts64 a, FloatParts64 b, bool ismin,
>                                   bool ieee, bool ismag, float_status *s)
>   {
>       if (unlikely(is_nan(a.cls) || is_nan(b.cls))) {
> @@ -3136,9 +3136,9 @@ static FloatParts minmax_floats(FloatParts a, FloatParts b, bool ismin,
>   float ## sz float ## sz ## _ ## name(float ## sz a, float ## sz b,      \
>                                        float_status *s)                   \
>   {                                                                       \
> -    FloatParts pa = float ## sz ## _unpack_canonical(a, s);             \
> -    FloatParts pb = float ## sz ## _unpack_canonical(b, s);             \
> -    FloatParts pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);      \
> +    FloatParts64 pa = float ## sz ## _unpack_canonical(a, s);             \
> +    FloatParts64 pb = float ## sz ## _unpack_canonical(b, s);             \
> +    FloatParts64 pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);      \
>                                                                           \
>       return float ## sz ## _round_pack_canonical(pr, s);                 \
>   }
> @@ -3169,9 +3169,9 @@ MINMAX(64, maxnummag, false, true, true)
>   #define BF16_MINMAX(name, ismin, isiee, ismag)                          \
>   bfloat16 bfloat16_ ## name(bfloat16 a, bfloat16 b, float_status *s)     \
>   {                                                                       \
> -    FloatParts pa = bfloat16_unpack_canonical(a, s);                    \
> -    FloatParts pb = bfloat16_unpack_canonical(b, s);                    \
> -    FloatParts pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);      \
> +    FloatParts64 pa = bfloat16_unpack_canonical(a, s);                    \
> +    FloatParts64 pb = bfloat16_unpack_canonical(b, s);                    \
> +    FloatParts64 pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);      \
>                                                                           \
>       return bfloat16_round_pack_canonical(pr, s);                        \
>   }
> @@ -3186,7 +3186,7 @@ BF16_MINMAX(maxnummag, false, true, true)
>   #undef BF16_MINMAX
>   
>   /* Floating point compare */
> -static FloatRelation compare_floats(FloatParts a, FloatParts b, bool is_quiet,
> +static FloatRelation compare_floats(FloatParts64 a, FloatParts64 b, bool is_quiet,
>                                       float_status *s)
>   {
>       if (is_nan(a.cls) || is_nan(b.cls)) {
> @@ -3247,8 +3247,8 @@ static FloatRelation compare_floats(FloatParts a, FloatParts b, bool is_quiet,
>   static int attr                                                         \
>   name(float ## sz a, float ## sz b, bool is_quiet, float_status *s)      \
>   {                                                                       \
> -    FloatParts pa = float ## sz ## _unpack_canonical(a, s);             \
> -    FloatParts pb = float ## sz ## _unpack_canonical(b, s);             \
> +    FloatParts64 pa = float ## sz ## _unpack_canonical(a, s);             \
> +    FloatParts64 pb = float ## sz ## _unpack_canonical(b, s);             \
>       return compare_floats(pa, pb, is_quiet, s);                         \
>   }
>   
> @@ -3349,8 +3349,8 @@ FloatRelation float64_compare_quiet(float64 a, float64 b, float_status *s)
>   static FloatRelation QEMU_FLATTEN
>   soft_bf16_compare(bfloat16 a, bfloat16 b, bool is_quiet, float_status *s)
>   {
> -    FloatParts pa = bfloat16_unpack_canonical(a, s);
> -    FloatParts pb = bfloat16_unpack_canonical(b, s);
> +    FloatParts64 pa = bfloat16_unpack_canonical(a, s);
> +    FloatParts64 pb = bfloat16_unpack_canonical(b, s);
>       return compare_floats(pa, pb, is_quiet, s);
>   }
>   
> @@ -3365,16 +3365,16 @@ FloatRelation bfloat16_compare_quiet(bfloat16 a, bfloat16 b, float_status *s)
>   }
>   
>   /* Multiply A by 2 raised to the power N.  */
> -static FloatParts scalbn_decomposed(FloatParts a, int n, float_status *s)
> +static FloatParts64 scalbn_decomposed(FloatParts64 a, int n, float_status *s)
>   {
>       if (unlikely(is_nan(a.cls))) {
>           return return_nan(a, s);
>       }
>       if (a.cls == float_class_normal) {
> -        /* The largest float type (even though not supported by FloatParts)
> +        /* The largest float type (even though not supported by FloatParts64)
>            * is float128, which has a 15 bit exponent.  Bounding N to 16 bits
>            * still allows rounding to infinity, without allowing overflow
> -         * within the int32_t that backs FloatParts.exp.
> +         * within the int32_t that backs FloatParts64.exp.
>            */
>           n = MIN(MAX(n, -0x10000), 0x10000);
>           a.exp += n;
> @@ -3384,29 +3384,29 @@ static FloatParts scalbn_decomposed(FloatParts a, int n, float_status *s)
>   
>   float16 float16_scalbn(float16 a, int n, float_status *status)
>   {
> -    FloatParts pa = float16_unpack_canonical(a, status);
> -    FloatParts pr = scalbn_decomposed(pa, n, status);
> +    FloatParts64 pa = float16_unpack_canonical(a, status);
> +    FloatParts64 pr = scalbn_decomposed(pa, n, status);
>       return float16_round_pack_canonical(pr, status);
>   }
>   
>   float32 float32_scalbn(float32 a, int n, float_status *status)
>   {
> -    FloatParts pa = float32_unpack_canonical(a, status);
> -    FloatParts pr = scalbn_decomposed(pa, n, status);
> +    FloatParts64 pa = float32_unpack_canonical(a, status);
> +    FloatParts64 pr = scalbn_decomposed(pa, n, status);
>       return float32_round_pack_canonical(pr, status);
>   }
>   
>   float64 float64_scalbn(float64 a, int n, float_status *status)
>   {
> -    FloatParts pa = float64_unpack_canonical(a, status);
> -    FloatParts pr = scalbn_decomposed(pa, n, status);
> +    FloatParts64 pa = float64_unpack_canonical(a, status);
> +    FloatParts64 pr = scalbn_decomposed(pa, n, status);
>       return float64_round_pack_canonical(pr, status);
>   }
>   
>   bfloat16 bfloat16_scalbn(bfloat16 a, int n, float_status *status)
>   {
> -    FloatParts pa = bfloat16_unpack_canonical(a, status);
> -    FloatParts pr = scalbn_decomposed(pa, n, status);
> +    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
> +    FloatParts64 pr = scalbn_decomposed(pa, n, status);
>       return bfloat16_round_pack_canonical(pr, status);
>   }
>   
> @@ -3422,7 +3422,7 @@ bfloat16 bfloat16_scalbn(bfloat16 a, int n, float_status *status)
>    * especially for 64 bit floats.
>    */
>   
> -static FloatParts sqrt_float(FloatParts a, float_status *s, const FloatFmt *p)
> +static FloatParts64 sqrt_float(FloatParts64 a, float_status *s, const FloatFmt *p)
>   {
>       uint64_t a_frac, r_frac, s_frac;
>       int bit, last_bit;
> @@ -3482,24 +3482,24 @@ static FloatParts sqrt_float(FloatParts a, float_status *s, const FloatFmt *p)
>   
>   float16 QEMU_FLATTEN float16_sqrt(float16 a, float_status *status)
>   {
> -    FloatParts pa = float16_unpack_canonical(a, status);
> -    FloatParts pr = sqrt_float(pa, status, &float16_params);
> +    FloatParts64 pa = float16_unpack_canonical(a, status);
> +    FloatParts64 pr = sqrt_float(pa, status, &float16_params);
>       return float16_round_pack_canonical(pr, status);
>   }
>   
>   static float32 QEMU_SOFTFLOAT_ATTR
>   soft_f32_sqrt(float32 a, float_status *status)
>   {
> -    FloatParts pa = float32_unpack_canonical(a, status);
> -    FloatParts pr = sqrt_float(pa, status, &float32_params);
> +    FloatParts64 pa = float32_unpack_canonical(a, status);
> +    FloatParts64 pr = sqrt_float(pa, status, &float32_params);
>       return float32_round_pack_canonical(pr, status);
>   }
>   
>   static float64 QEMU_SOFTFLOAT_ATTR
>   soft_f64_sqrt(float64 a, float_status *status)
>   {
> -    FloatParts pa = float64_unpack_canonical(a, status);
> -    FloatParts pr = sqrt_float(pa, status, &float64_params);
> +    FloatParts64 pa = float64_unpack_canonical(a, status);
> +    FloatParts64 pr = sqrt_float(pa, status, &float64_params);
>       return float64_round_pack_canonical(pr, status);
>   }
>   
> @@ -3559,8 +3559,8 @@ float64 QEMU_FLATTEN float64_sqrt(float64 xa, float_status *s)
>   
>   bfloat16 QEMU_FLATTEN bfloat16_sqrt(bfloat16 a, float_status *status)
>   {
> -    FloatParts pa = bfloat16_unpack_canonical(a, status);
> -    FloatParts pr = sqrt_float(pa, status, &bfloat16_params);
> +    FloatParts64 pa = bfloat16_unpack_canonical(a, status);
> +    FloatParts64 pr = sqrt_float(pa, status, &bfloat16_params);
>       return bfloat16_round_pack_canonical(pr, status);
>   }
>   
> @@ -3570,28 +3570,28 @@ bfloat16 QEMU_FLATTEN bfloat16_sqrt(bfloat16 a, float_status *status)
>   
>   float16 float16_default_nan(float_status *status)
>   {
> -    FloatParts p = parts_default_nan(status);
> +    FloatParts64 p = parts_default_nan(status);
>       p.frac >>= float16_params.frac_shift;
>       return float16_pack_raw(p);
>   }
>   
>   float32 float32_default_nan(float_status *status)
>   {
> -    FloatParts p = parts_default_nan(status);
> +    FloatParts64 p = parts_default_nan(status);
>       p.frac >>= float32_params.frac_shift;
>       return float32_pack_raw(p);
>   }
>   
>   float64 float64_default_nan(float_status *status)
>   {
> -    FloatParts p = parts_default_nan(status);
> +    FloatParts64 p = parts_default_nan(status);
>       p.frac >>= float64_params.frac_shift;
>       return float64_pack_raw(p);
>   }
>   
>   float128 float128_default_nan(float_status *status)
>   {
> -    FloatParts p = parts_default_nan(status);
> +    FloatParts64 p = parts_default_nan(status);
>       float128 r;
>   
>       /* Extrapolate from the choices made by parts_default_nan to fill
> @@ -3608,7 +3608,7 @@ float128 float128_default_nan(float_status *status)
>   
>   bfloat16 bfloat16_default_nan(float_status *status)
>   {
> -    FloatParts p = parts_default_nan(status);
> +    FloatParts64 p = parts_default_nan(status);
>       p.frac >>= bfloat16_params.frac_shift;
>       return bfloat16_pack_raw(p);
>   }
> @@ -3619,7 +3619,7 @@ bfloat16 bfloat16_default_nan(float_status *status)
>   
>   float16 float16_silence_nan(float16 a, float_status *status)
>   {
> -    FloatParts p = float16_unpack_raw(a);
> +    FloatParts64 p = float16_unpack_raw(a);
>       p.frac <<= float16_params.frac_shift;
>       p = parts_silence_nan(p, status);
>       p.frac >>= float16_params.frac_shift;
> @@ -3628,7 +3628,7 @@ float16 float16_silence_nan(float16 a, float_status *status)
>   
>   float32 float32_silence_nan(float32 a, float_status *status)
>   {
> -    FloatParts p = float32_unpack_raw(a);
> +    FloatParts64 p = float32_unpack_raw(a);
>       p.frac <<= float32_params.frac_shift;
>       p = parts_silence_nan(p, status);
>       p.frac >>= float32_params.frac_shift;
> @@ -3637,7 +3637,7 @@ float32 float32_silence_nan(float32 a, float_status *status)
>   
>   float64 float64_silence_nan(float64 a, float_status *status)
>   {
> -    FloatParts p = float64_unpack_raw(a);
> +    FloatParts64 p = float64_unpack_raw(a);
>       p.frac <<= float64_params.frac_shift;
>       p = parts_silence_nan(p, status);
>       p.frac >>= float64_params.frac_shift;
> @@ -3646,7 +3646,7 @@ float64 float64_silence_nan(float64 a, float_status *status)
>   
>   bfloat16 bfloat16_silence_nan(bfloat16 a, float_status *status)
>   {
> -    FloatParts p = bfloat16_unpack_raw(a);
> +    FloatParts64 p = bfloat16_unpack_raw(a);
>       p.frac <<= bfloat16_params.frac_shift;
>       p = parts_silence_nan(p, status);
>       p.frac >>= bfloat16_params.frac_shift;
> @@ -3658,7 +3658,7 @@ bfloat16 bfloat16_silence_nan(bfloat16 a, float_status *status)
>   | input-denormal exception and return zero. Otherwise just return the value.
>   *----------------------------------------------------------------------------*/
>   
> -static bool parts_squash_denormal(FloatParts p, float_status *status)
> +static bool parts_squash_denormal(FloatParts64 p, float_status *status)
>   {
>       if (p.exp == 0 && p.frac != 0) {
>           float_raise(float_flag_input_denormal, status);
> @@ -3671,7 +3671,7 @@ static bool parts_squash_denormal(FloatParts p, float_status *status)
>   float16 float16_squash_input_denormal(float16 a, float_status *status)
>   {
>       if (status->flush_inputs_to_zero) {
> -        FloatParts p = float16_unpack_raw(a);
> +        FloatParts64 p = float16_unpack_raw(a);
>           if (parts_squash_denormal(p, status)) {
>               return float16_set_sign(float16_zero, p.sign);
>           }
> @@ -3682,7 +3682,7 @@ float16 float16_squash_input_denormal(float16 a, float_status *status)
>   float32 float32_squash_input_denormal(float32 a, float_status *status)
>   {
>       if (status->flush_inputs_to_zero) {
> -        FloatParts p = float32_unpack_raw(a);
> +        FloatParts64 p = float32_unpack_raw(a);
>           if (parts_squash_denormal(p, status)) {
>               return float32_set_sign(float32_zero, p.sign);
>           }
> @@ -3693,7 +3693,7 @@ float32 float32_squash_input_denormal(float32 a, float_status *status)
>   float64 float64_squash_input_denormal(float64 a, float_status *status)
>   {
>       if (status->flush_inputs_to_zero) {
> -        FloatParts p = float64_unpack_raw(a);
> +        FloatParts64 p = float64_unpack_raw(a);
>           if (parts_squash_denormal(p, status)) {
>               return float64_set_sign(float64_zero, p.sign);
>           }
> @@ -3704,7 +3704,7 @@ float64 float64_squash_input_denormal(float64 a, float_status *status)
>   bfloat16 bfloat16_squash_input_denormal(bfloat16 a, float_status *status)
>   {
>       if (status->flush_inputs_to_zero) {
> -        FloatParts p = bfloat16_unpack_raw(a);
> +        FloatParts64 p = bfloat16_unpack_raw(a);
>           if (parts_squash_denormal(p, status)) {
>               return bfloat16_set_sign(bfloat16_zero, p.sign);
>           }
> diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
> index 5988830c16..52fc76d800 100644
> --- a/fpu/softfloat-specialize.c.inc
> +++ b/fpu/softfloat-specialize.c.inc
> @@ -129,7 +129,7 @@ static bool parts_is_snan_frac(uint64_t frac, float_status *status)
>   | The pattern for a default generated deconstructed floating-point NaN.
>   *----------------------------------------------------------------------------*/
>   
> -static FloatParts parts_default_nan(float_status *status)
> +static FloatParts64 parts_default_nan(float_status *status)
>   {
>       bool sign = 0;
>       uint64_t frac;
> @@ -164,7 +164,7 @@ static FloatParts parts_default_nan(float_status *status)
>       }
>   #endif
>   
> -    return (FloatParts) {
> +    return (FloatParts64) {
>           .cls = float_class_qnan,
>           .sign = sign,
>           .exp = INT_MAX,
> @@ -177,7 +177,7 @@ static FloatParts parts_default_nan(float_status *status)
>   | floating-point parts.
>   *----------------------------------------------------------------------------*/
>   
> -static FloatParts parts_silence_nan(FloatParts a, float_status *status)
> +static FloatParts64 parts_silence_nan(FloatParts64 a, float_status *status)
>   {
>       g_assert(!no_signaling_nans(status));
>       g_assert(!status->default_nan_mode);
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 16/72] softfloat: Move type-specific pack/unpack routines
  2021-05-08  1:47 ` [PATCH 16/72] softfloat: Move type-specific pack/unpack routines Richard Henderson
  2021-05-09  8:46   ` Philippe Mathieu-Daudé
@ 2021-05-11 10:17   ` David Hildenbrand
  1 sibling, 0 replies; 151+ messages in thread
From: David Hildenbrand @ 2021-05-11 10:17 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 08.05.21 03:47, Richard Henderson wrote:
> In preparation for moving sf_canonicalize.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   fpu/softfloat.c | 109 +++++++++++++++++++++++++-----------------------
>   1 file changed, 56 insertions(+), 53 deletions(-)
> 
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 27b51659c9..398a068b58 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -837,59 +837,6 @@ static FloatParts64 round_canonical(FloatParts64 p, float_status *s,
>       return p;
>   }
>   
> -/* Explicit FloatFmt version */
> -static FloatParts64 float16a_unpack_canonical(float16 f, float_status *s,
> -                                            const FloatFmt *params)
> -{
> -    return sf_canonicalize(float16_unpack_raw(f), params, s);
> -}
> -
> -static FloatParts64 float16_unpack_canonical(float16 f, float_status *s)
> -{
> -    return float16a_unpack_canonical(f, s, &float16_params);
> -}
> -
> -static FloatParts64 bfloat16_unpack_canonical(bfloat16 f, float_status *s)
> -{
> -    return sf_canonicalize(bfloat16_unpack_raw(f), &bfloat16_params, s);
> -}
> -
> -static float16 float16a_round_pack_canonical(FloatParts64 p, float_status *s,
> -                                             const FloatFmt *params)
> -{
> -    return float16_pack_raw(round_canonical(p, s, params));
> -}
> -
> -static float16 float16_round_pack_canonical(FloatParts64 p, float_status *s)
> -{
> -    return float16a_round_pack_canonical(p, s, &float16_params);
> -}
> -
> -static bfloat16 bfloat16_round_pack_canonical(FloatParts64 p, float_status *s)
> -{
> -    return bfloat16_pack_raw(round_canonical(p, s, &bfloat16_params));
> -}
> -
> -static FloatParts64 float32_unpack_canonical(float32 f, float_status *s)
> -{
> -    return sf_canonicalize(float32_unpack_raw(f), &float32_params, s);
> -}
> -
> -static float32 float32_round_pack_canonical(FloatParts64 p, float_status *s)
> -{
> -    return float32_pack_raw(round_canonical(p, s, &float32_params));
> -}
> -
> -static FloatParts64 float64_unpack_canonical(float64 f, float_status *s)
> -{
> -    return sf_canonicalize(float64_unpack_raw(f), &float64_params, s);
> -}
> -
> -static float64 float64_round_pack_canonical(FloatParts64 p, float_status *s)
> -{
> -    return float64_pack_raw(round_canonical(p, s, &float64_params));
> -}
> -
>   static FloatParts64 return_nan(FloatParts64 a, float_status *s)
>   {
>       g_assert(is_nan(a.cls));
> @@ -964,6 +911,62 @@ static FloatParts64 pick_nan_muladd(FloatParts64 a, FloatParts64 b, FloatParts64
>       return a;
>   }
>   
> +/*
> + * Pack/unpack routines with a specific FloatFmt.
> + */
> +
> +static FloatParts64 float16a_unpack_canonical(float16 f, float_status *s,
> +                                            const FloatFmt *params)
> +{
> +    return sf_canonicalize(float16_unpack_raw(f), params, s);
> +}
> +
> +static FloatParts64 float16_unpack_canonical(float16 f, float_status *s)
> +{
> +    return float16a_unpack_canonical(f, s, &float16_params);
> +}
> +
> +static FloatParts64 bfloat16_unpack_canonical(bfloat16 f, float_status *s)
> +{
> +    return sf_canonicalize(bfloat16_unpack_raw(f), &bfloat16_params, s);
> +}
> +
> +static float16 float16a_round_pack_canonical(FloatParts64 p, float_status *s,
> +                                             const FloatFmt *params)
> +{
> +    return float16_pack_raw(round_canonical(p, s, params));
> +}
> +
> +static float16 float16_round_pack_canonical(FloatParts64 p, float_status *s)
> +{
> +    return float16a_round_pack_canonical(p, s, &float16_params);
> +}
> +
> +static bfloat16 bfloat16_round_pack_canonical(FloatParts64 p, float_status *s)
> +{
> +    return bfloat16_pack_raw(round_canonical(p, s, &bfloat16_params));
> +}
> +
> +static FloatParts64 float32_unpack_canonical(float32 f, float_status *s)
> +{
> +    return sf_canonicalize(float32_unpack_raw(f), &float32_params, s);
> +}
> +
> +static float32 float32_round_pack_canonical(FloatParts64 p, float_status *s)
> +{
> +    return float32_pack_raw(round_canonical(p, s, &float32_params));
> +}
> +
> +static FloatParts64 float64_unpack_canonical(float64 f, float_status *s)
> +{
> +    return sf_canonicalize(float64_unpack_raw(f), &float64_params, s);
> +}
> +
> +static float64 float64_round_pack_canonical(FloatParts64 p, float_status *s)
> +{
> +    return float64_pack_raw(round_canonical(p, s, &float64_params));
> +}
> +
>   /*
>    * Returns the result of adding or subtracting the values of the
>    * floating-point values `a' and `b'. The operation is performed
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 17/72] softfloat: Use pointers with parts_default_nan
  2021-05-08  1:47 ` [PATCH 17/72] softfloat: Use pointers with parts_default_nan Richard Henderson
@ 2021-05-11 10:22   ` David Hildenbrand
  2021-05-11 20:19     ` Richard Henderson
  0 siblings, 1 reply; 151+ messages in thread
From: David Hildenbrand @ 2021-05-11 10:22 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 08.05.21 03:47, Richard Henderson wrote:
> At the same time, rename to parts64_default_nan and define
> a macro for parts_default_nan using QEMU_GENERIC.

All I can spot is "#define parts_default_nan  parts64_default_nan" -- 
what am I missing?

apart from that

Reviewed-by: David Hildenbrand <david@redhat.com>

> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   fpu/softfloat.c                | 47 +++++++++++++++++++++++-----------
>   fpu/softfloat-specialize.c.inc |  4 +--
>   2 files changed, 34 insertions(+), 17 deletions(-)
> 
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 398a068b58..c7f95961cf 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -650,6 +650,8 @@ static inline float64 float64_pack_raw(FloatParts64 p)
>   *----------------------------------------------------------------------------*/
>   #include "softfloat-specialize.c.inc"
>   
> +#define parts_default_nan  parts64_default_nan
> +
>   /* Canonicalize EXP and FRAC, setting CLS.  */
>   static FloatParts64 sf_canonicalize(FloatParts64 part, const FloatFmt *parm,
>                                     float_status *status)
> @@ -848,7 +850,8 @@ static FloatParts64 return_nan(FloatParts64 a, float_status *s)
>       } else if (!s->default_nan_mode) {
>           return a;
>       }
> -    return parts_default_nan(s);
> +    parts_default_nan(&a, s);
> +    return a;
>   }
>   
>   static FloatParts64 pick_nan(FloatParts64 a, FloatParts64 b, float_status *s)
> @@ -858,7 +861,7 @@ static FloatParts64 pick_nan(FloatParts64 a, FloatParts64 b, float_status *s)
>       }
>   
>       if (s->default_nan_mode) {
> -        return parts_default_nan(s);
> +        parts_default_nan(&a, s);
>       } else {
>           if (pickNaN(a.cls, b.cls,
>                       a.frac > b.frac ||
> @@ -900,7 +903,8 @@ static FloatParts64 pick_nan_muladd(FloatParts64 a, FloatParts64 b, FloatParts64
>           a = c;
>           break;
>       case 3:
> -        return parts_default_nan(s);
> +        parts_default_nan(&a, s);
> +        break;
>       default:
>           g_assert_not_reached();
>       }
> @@ -1011,7 +1015,7 @@ static FloatParts64 addsub_floats(FloatParts64 a, FloatParts64 b, bool subtract,
>           if (a.cls == float_class_inf) {
>               if (b.cls == float_class_inf) {
>                   float_raise(float_flag_invalid, s);
> -                return parts_default_nan(s);
> +                parts_default_nan(&a, s);
>               }
>               return a;
>           }
> @@ -1254,7 +1258,8 @@ static FloatParts64 mul_floats(FloatParts64 a, FloatParts64 b, float_status *s)
>       if ((a.cls == float_class_inf && b.cls == float_class_zero) ||
>           (a.cls == float_class_zero && b.cls == float_class_inf)) {
>           float_raise(float_flag_invalid, s);
> -        return parts_default_nan(s);
> +        parts_default_nan(&a, s);
> +        return a;
>       }
>       /* Multiply by 0 or Inf */
>       if (a.cls == float_class_inf || a.cls == float_class_zero) {
> @@ -1372,7 +1377,8 @@ static FloatParts64 muladd_floats(FloatParts64 a, FloatParts64 b, FloatParts64 c
>   
>       if (inf_zero) {
>           float_raise(float_flag_invalid, s);
> -        return parts_default_nan(s);
> +        parts_default_nan(&a, s);
> +        return a;
>       }
>   
>       if (flags & float_muladd_negate_c) {
> @@ -1396,11 +1402,11 @@ static FloatParts64 muladd_floats(FloatParts64 a, FloatParts64 b, FloatParts64 c
>       if (c.cls == float_class_inf) {
>           if (p_class == float_class_inf && p_sign != c.sign) {
>               float_raise(float_flag_invalid, s);
> -            return parts_default_nan(s);
> +            parts_default_nan(&c, s);
>           } else {
>               c.sign ^= sign_flip;
> -            return c;
>           }
> +        return c;
>       }
>   
>       if (p_class == float_class_inf) {
> @@ -1764,7 +1770,8 @@ static FloatParts64 div_floats(FloatParts64 a, FloatParts64 b, float_status *s)
>           &&
>           (a.cls == float_class_inf || a.cls == float_class_zero)) {
>           float_raise(float_flag_invalid, s);
> -        return parts_default_nan(s);
> +        parts_default_nan(&a, s);
> +        return a;
>       }
>       /* Inf / x or 0 / x */
>       if (a.cls == float_class_inf || a.cls == float_class_zero) {
> @@ -3438,7 +3445,8 @@ static FloatParts64 sqrt_float(FloatParts64 a, float_status *s, const FloatFmt *
>       }
>       if (a.sign) {
>           float_raise(float_flag_invalid, s);
> -        return parts_default_nan(s);
> +        parts_default_nan(&a, s);
> +        return a;
>       }
>       if (a.cls == float_class_inf) {
>           return a;  /* sqrt(+inf) = +inf */
> @@ -3573,30 +3581,37 @@ bfloat16 QEMU_FLATTEN bfloat16_sqrt(bfloat16 a, float_status *status)
>   
>   float16 float16_default_nan(float_status *status)
>   {
> -    FloatParts64 p = parts_default_nan(status);
> +    FloatParts64 p;
> +
> +    parts_default_nan(&p, status);
>       p.frac >>= float16_params.frac_shift;
>       return float16_pack_raw(p);
>   }
>   
>   float32 float32_default_nan(float_status *status)
>   {
> -    FloatParts64 p = parts_default_nan(status);
> +    FloatParts64 p;
> +
> +    parts_default_nan(&p, status);
>       p.frac >>= float32_params.frac_shift;
>       return float32_pack_raw(p);
>   }
>   
>   float64 float64_default_nan(float_status *status)
>   {
> -    FloatParts64 p = parts_default_nan(status);
> +    FloatParts64 p;
> +
> +    parts_default_nan(&p, status);
>       p.frac >>= float64_params.frac_shift;
>       return float64_pack_raw(p);
>   }
>   
>   float128 float128_default_nan(float_status *status)
>   {
> -    FloatParts64 p = parts_default_nan(status);
> +    FloatParts64 p;
>       float128 r;
>   
> +    parts_default_nan(&p, status);
>       /* Extrapolate from the choices made by parts_default_nan to fill
>        * in the quad-floating format.  If the low bit is set, assume we
>        * want to set all non-snan bits.
> @@ -3611,7 +3626,9 @@ float128 float128_default_nan(float_status *status)
>   
>   bfloat16 bfloat16_default_nan(float_status *status)
>   {
> -    FloatParts64 p = parts_default_nan(status);
> +    FloatParts64 p;
> +
> +    parts_default_nan(&p, status);
>       p.frac >>= bfloat16_params.frac_shift;
>       return bfloat16_pack_raw(p);
>   }
> diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
> index 52fc76d800..085ddea62b 100644
> --- a/fpu/softfloat-specialize.c.inc
> +++ b/fpu/softfloat-specialize.c.inc
> @@ -129,7 +129,7 @@ static bool parts_is_snan_frac(uint64_t frac, float_status *status)
>   | The pattern for a default generated deconstructed floating-point NaN.
>   *----------------------------------------------------------------------------*/
>   
> -static FloatParts64 parts_default_nan(float_status *status)
> +static void parts64_default_nan(FloatParts64 *p, float_status *status)
>   {
>       bool sign = 0;
>       uint64_t frac;
> @@ -164,7 +164,7 @@ static FloatParts64 parts_default_nan(float_status *status)
>       }
>   #endif
>   
> -    return (FloatParts64) {
> +    *p = (FloatParts64) {
>           .cls = float_class_qnan,
>           .sign = sign,
>           .exp = INT_MAX,
> 


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 14/72] softfloat: Do not produce a default_nan from parts_silence_nan
  2021-05-08  1:47 ` [PATCH 14/72] softfloat: Do not produce a default_nan from parts_silence_nan Richard Henderson
  2021-05-11 10:16   ` David Hildenbrand
@ 2021-05-11 10:32   ` Alex Bennée
  1 sibling, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-11 10:32 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Require default_nan_mode to be set instead.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 19/72] softfloat: Use pointers with ftype_unpack_raw
  2021-05-08  1:47 ` [PATCH 19/72] softfloat: Use pointers with ftype_unpack_raw Richard Henderson
@ 2021-05-11 11:31   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-11 11:31 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 20/72] softfloat: Use pointers with pack_raw
  2021-05-08  1:47 ` [PATCH 20/72] softfloat: Use pointers with pack_raw Richard Henderson
@ 2021-05-11 11:32   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-11 11:32 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> At the same time, rename to pack_raw64.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 21/72] softfloat: Use pointers with ftype_pack_raw
  2021-05-08  1:47 ` [PATCH 21/72] softfloat: Use pointers with ftype_pack_raw Richard Henderson
@ 2021-05-11 11:32   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-11 11:32 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 22/72] softfloat: Use pointers with ftype_unpack_canonical
  2021-05-08  1:47 ` [PATCH 22/72] softfloat: Use pointers with ftype_unpack_canonical Richard Henderson
@ 2021-05-11 13:54   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-11 13:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 23/72] softfloat: Use pointers with ftype_round_pack_canonical
  2021-05-08  1:47 ` [PATCH 23/72] softfloat: Use pointers with ftype_round_pack_canonical Richard Henderson
@ 2021-05-11 13:55   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-11 13:55 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 24/72] softfloat: Use pointers with parts_silence_nan
  2021-05-08  1:47 ` [PATCH 24/72] softfloat: Use pointers with parts_silence_nan Richard Henderson
@ 2021-05-11 13:56   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-11 13:56 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> At the same time, rename to parts64_silence_nan, split out
> parts_silence_nan_frac, and define a macro for parts_silence_nan.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 25/72] softfloat: Rearrange FloatParts64
  2021-05-08  1:47 ` [PATCH 25/72] softfloat: Rearrange FloatParts64 Richard Henderson
@ 2021-05-11 13:57   ` Alex Bennée
  2021-05-11 15:04     ` Richard Henderson
  0 siblings, 1 reply; 151+ messages in thread
From: Alex Bennée @ 2021-05-11 13:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Shuffle the fraction to the end, otherwise sort by size.
> Add frac_hi and frac_lo members to alias frac.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  fpu/softfloat.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 2123453d40..2d6f61ee7a 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -511,10 +511,19 @@ static inline __attribute__((unused)) bool is_qnan(FloatClass c)
>   */
>  
>  typedef struct {
> -    uint64_t frac;
> -    int32_t  exp;
>      FloatClass cls;
>      bool sign;
> +    int32_t exp;
> +    union {
> +        /* Routines that know the structure may reference the singular name. */
> +        uint64_t frac;
> +        /*
> +         * Routines expanded with multiple structures reference "hi" and "lo".
> +         * In this structure, the one word is both highest and lowest.
> +         */
> +        uint64_t frac_hi;
> +        uint64_t frac_lo;

This confuses me. Is this because it could be frac_hi or frac_lo at the
"top" of the structure because of endian issues?

> +    };
>  } FloatParts64;
>  
>  #define DECOMPOSED_BINARY_POINT    63


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 25/72] softfloat: Rearrange FloatParts64
  2021-05-11 13:57   ` Alex Bennée
@ 2021-05-11 15:04     ` Richard Henderson
  2021-05-12 11:08       ` Alex Bennée
  0 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-11 15:04 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, david

On 5/11/21 8:57 AM, Alex Bennée wrote:
>> +    union {
>> +        /* Routines that know the structure may reference the singular name. */
>> +        uint64_t frac;
>> +        /*
>> +         * Routines expanded with multiple structures reference "hi" and "lo".
>> +         * In this structure, the one word is both highest and lowest.
>> +         */
>> +        uint64_t frac_hi;
>> +        uint64_t frac_lo;
> 
> This confuses me. Is this because it could be frac_hi or frac_lo at the
> "top" of the structure because of endian issues?

Nothing about endianness.  There is exactly one element, so it is both the 
"first" and "last", both "high" and "low".

Generic code will examine the "high" word when looking at overflow and things 
related, and the "low" word when doing rounding.

This anonymous union gives the same element 3 different names.
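
If it helps, here is the trick in isolation: a stand-alone sketch, not
the actual FloatParts64 declaration (the type name and values below are
made up for illustration):

    #include <assert.h>
    #include <stdint.h>

    typedef struct {
        union {
            uint64_t frac;     /* the singular name, for 64-bit-only code */
            uint64_t frac_hi;  /* alias: the one word is the highest word */
            uint64_t frac_lo;  /* alias: ... and also the lowest word */
        };
    } OneWordParts;

    int main(void)
    {
        OneWordParts p = { .frac = 0xdeadbeef };
        /* All three names read and write the same storage. */
        assert(p.frac_hi == 0xdeadbeef);
        p.frac_lo += 1;
        assert(p.frac == 0xdeadbef0);
        return 0;
    }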

r~


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 03/72] qemu/host-utils: Add wrappers for carry builtins
  2021-05-10 12:57   ` Alex Bennée
@ 2021-05-11 20:10     ` Richard Henderson
  2021-05-12 11:17       ` Alex Bennée
  0 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-11 20:10 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, david

On 5/10/21 7:57 AM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> These builtins came in clang 3.8, but are not present in gcc through
>> version 11.  Even in clang the optimization is not ideal except for
>> x86_64, but no worse than the hand-coding that we currently do.
> 
> Given this statement....

I think you mis-read the "except for x86_64" part?

Anyway, these are simply bugs to be filed against clang, so that hopefully 
clang-12 will do a good job with the builtin.  And as I said, while the 
generated code is not ideal, it's no worse.

>> +static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
>> +{
>> +#if __has_builtin(__builtin_addcll)
>> +    unsigned long long c = *pcarry;
>> +    x = __builtin_addcll(x, y, c, &c);
> 
> what happens when unsigned long long isn't the same as uint64_t? Doesn't
> C99 only specify a minimum?

If you only look at C99, sure.  But looking at the set of supported hosts, 
unsigned long long is always a 64-bit type.

>> +    *pcarry = c & 1;
> 
> Why do we need to clamp it here? Shouldn't the compiler automatically do
> that due to the bool?

This produces a single AND insn, instead of CMP + SETcc.
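
If the ULL-vs-uint64_t worry ever becomes real, a build-time guard
would catch it for free. A hypothetical addition, not part of this
patch (QEMU_BUILD_BUG_ON would do the same job):

    #include <limits.h>

    _Static_assert(sizeof(unsigned long long) * CHAR_BIT == 64,
                   "uadd64_carry assumes unsigned long long is 64 bits");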


r~


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 17/72] softfloat: Use pointers with parts_default_nan
  2021-05-11 10:22   ` David Hildenbrand
@ 2021-05-11 20:19     ` Richard Henderson
  0 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-11 20:19 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: alex.bennee

On 5/11/21 5:22 AM, David Hildenbrand wrote:
> On 08.05.21 03:47, Richard Henderson wrote:
>> At the same time, rename to parts64_default_nan and define
>> a macro for parts_default_nan using QEMU_GENERIC.
> 
> All I can spot is "#define parts_default_nan  parts64_default_nan" -- what am I 
> missing?
> 
> apart from that
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>

Commit message not updated as the code changed.
Fixed, thanks.
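
For reference, the QEMU_GENERIC shape the message was describing only
appears once the 128-bit parts exist, later in the series:

    #define parts_default_nan(P, S)    PARTS_GENERIC_64_128(default_nan, P)(P, S)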


r~


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 00/72] Convert floatx80 and float128 to FloatParts
  2021-05-10 13:36 ` Alex Bennée
@ 2021-05-12  1:52   ` Richard Henderson
  2021-05-12 11:22     ` Alex Bennée
  0 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-12  1:52 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, david

On 5/10/21 8:36 AM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> Reorg everything using QEMU_GENERIC and multiple inclusion to
>> reduce the amount of code duplication between the formats.
>>
>> The use of QEMU_GENERIC means that we need to use pointers instead
>> of structures, which means that even the smaller float formats
>> need rearranging.
>>
>> I've carried it through to completion within fpu/, so that we don't
>> have (much) of the legacy code remaining.  There is some floatx80
>> stuff in target/m68k and target/i386 that's still hanging around.
> 
> FWIW I could enable a few more tests...

Ah, thanks for the reminder that these were disabled.
I'll add this to my patch set for v2.


> ...although extF80_lt_quiet still has some failures on equality tests:

This turns out to be a trivial typo in the tester itself:

diff --git a/tests/fp/wrap.c.inc b/tests/fp/wrap.c.inc
index cb1bb77e4c..9ff884c140 100644
--- a/tests/fp/wrap.c.inc
+++ b/tests/fp/wrap.c.inc
@@ -643,7 +643,7 @@ WRAP_CMP80(qemu_extF80M_eq, floatx80_eq_quiet)
  WRAP_CMP80(qemu_extF80M_le, floatx80_le)
  WRAP_CMP80(qemu_extF80M_lt, floatx80_lt)
  WRAP_CMP80(qemu_extF80M_le_quiet, floatx80_le_quiet)
-WRAP_CMP80(qemu_extF80M_lt_quiet, floatx80_le_quiet)
+WRAP_CMP80(qemu_extF80M_lt_quiet, floatx80_lt_quiet)
  #undef WRAP_CMP80

  #define WRAP_CMP128(name, func)


r~


^ permalink raw reply related	[flat|nested] 151+ messages in thread

* Re: [PATCH 25/72] softfloat: Rearrange FloatParts64
  2021-05-11 15:04     ` Richard Henderson
@ 2021-05-12 11:08       ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-12 11:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> On 5/11/21 8:57 AM, Alex Bennée wrote:
>>> +    union {
>>> +        /* Routines that know the structure may reference the singular name. */
>>> +        uint64_t frac;
>>> +        /*
>>> +         * Routines expanded with multiple structures reference "hi" and "lo".
>>> +         * In this structure, the one word is both highest and lowest.
>>> +         */
>>> +        uint64_t frac_hi;
>>> +        uint64_t frac_lo;
>> This confuses me. Is this because it could be frac_hi or frac_lo at
>> the
>> "top" of the structure because of endian issues?
>
> Nothing about endianness.  There is exactly one element, so it is both
> the "first" and "last", both "high" and "low".
>
> Generic code will examine the "high" word when looking at overflow and
> things related, and the "low" word when doing rounding.
>
> This anonymous union gives the same element 3 different names.

Right, that makes things clearer. How about:

  Routines expanded with multiple structures reference "hi" and "lo"
  depending on the operation. In the case of FloatParts64 they are both
  the same word but aliased here to make the code easier to follow.

?

Either way have a:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 03/72] qemu/host-utils: Add wrappers for carry builtins
  2021-05-11 20:10     ` Richard Henderson
@ 2021-05-12 11:17       ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-12 11:17 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> On 5/10/21 7:57 AM, Alex Bennée wrote:
>> Richard Henderson <richard.henderson@linaro.org> writes:
>> 
>>> These builtins came in clang 3.8, but are not present in gcc through
>>> version 11.  Even in clang the optimization is not ideal except for
>>> x86_64, but no worse than the hand-coding that we currently do.
>> Given this statement....
>
> I think you mis-read the "except for x86_64" part?
>
> Anyway, these are simply bugs to be filed against clang, so that
> hopefully clang-12 will do a good job with the builtin.  And as I
> said, while the generated code is not ideal, it's no worse.
>
>>> +static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
>>> +{
>>> +#if __has_builtin(__builtin_addcll)
>>> +    unsigned long long c = *pcarry;
>>> +    x = __builtin_addcll(x, y, c, &c);
>> what happens when unsigned long long isn't the same as uint64_t?
>> Doesn't
>> C99 only specify a minimum?
>
> If you only look at C99, sure.  But looking at the set of supported
> hosts, unsigned long long is always a 64-bit type.

I guess I'm worrying about a theoretical future - but we don't worry
about it for other ll builtins so no biggy.

>
>>> +    *pcarry = c & 1;
>> Why do we need to clamp it here? Shouldn't the compiler
>> automatically do
>> that due to the bool?
>
> This produces a single AND insn, instead of CMP + SETcc.

Might be worth mentioning that in the commit message. 

Anyway:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 00/72] Convert floatx80 and float128 to FloatParts
  2021-05-12  1:52   ` Richard Henderson
@ 2021-05-12 11:22     ` Alex Bennée
  2021-05-12 15:28       ` Richard Henderson
  0 siblings, 1 reply; 151+ messages in thread
From: Alex Bennée @ 2021-05-12 11:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> On 5/10/21 8:36 AM, Alex Bennée wrote:
>> Richard Henderson <richard.henderson@linaro.org> writes:
>> 
>>> Reorg everything using QEMU_GENERIC and multiple inclusion to
>>> reduce the amount of code duplication between the formats.
>>>
>>> The use of QEMU_GENERIC means that we need to use pointers instead
>>> of structures, which means that even the smaller float formats
>>> need rearranging.
>>>
>>> I've carried it through to completion within fpu/, so that we don't
>>> have (much) of the legacy code remaining.  There is some floatx80
>>> stuff in target/m68k and target/i386 that's still hanging around.
>> FWIW I could enable a few more tests...
>
> Ah, thanks for the reminder that these were disabled.
> I'll add this to my patch set for v2.
>
>
>> ...although extF80_lt_quiet still has some failures on equality tests:
>
> This turns out to be a trivial typo in the tester itself:
>
> diff --git a/tests/fp/wrap.c.inc b/tests/fp/wrap.c.inc
> index cb1bb77e4c..9ff884c140 100644
> --- a/tests/fp/wrap.c.inc
> +++ b/tests/fp/wrap.c.inc
> @@ -643,7 +643,7 @@ WRAP_CMP80(qemu_extF80M_eq, floatx80_eq_quiet)
>  WRAP_CMP80(qemu_extF80M_le, floatx80_le)
>  WRAP_CMP80(qemu_extF80M_lt, floatx80_lt)
>  WRAP_CMP80(qemu_extF80M_le_quiet, floatx80_le_quiet)
> -WRAP_CMP80(qemu_extF80M_lt_quiet, floatx80_le_quiet)
> +WRAP_CMP80(qemu_extF80M_lt_quiet, floatx80_lt_quiet)
>  #undef WRAP_CMP80
>
>  #define WRAP_CMP128(name, func)

\o/

I did note we are missing mulAdd tests but they seem to be missing from
the underlying testfloat code as well. I guess we don't care that much
for the 80bit code? Is it even used by any architectures?

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 18/72] softfloat: Use pointers with unpack_raw
  2021-05-08  1:47 ` [PATCH 18/72] softfloat: Use pointers with unpack_raw Richard Henderson
  2021-05-09  8:48   ` Philippe Mathieu-Daudé
@ 2021-05-12 12:58   ` Alex Bennée
  1 sibling, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-12 12:58 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> At the same time, rename to unpack_raw64.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 00/72] Convert floatx80 and float128 to FloatParts
  2021-05-12 11:22     ` Alex Bennée
@ 2021-05-12 15:28       ` Richard Henderson
  0 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-12 15:28 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, david

On 5/12/21 6:22 AM, Alex Bennée wrote:
> I did note we are missing mulAdd tests but they seem to be missing from
> the underlying testfloat code as well. I guess we don't care that much
> for the 80bit code? Is it even used by any architectures?

It's not used by any architecture, so no point in implementing it.


r~



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 00/72] Convert floatx80 and float128 to FloatParts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (73 preceding siblings ...)
  2021-05-10 13:36 ` Alex Bennée
@ 2021-05-12 16:47 ` Alex Bennée
  2021-05-12 19:23 ` Alex Bennée
  2021-05-13 14:13 ` Alex Bennée
  76 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-12 16:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Reorg everything using QEMU_GENERIC and multiple inclusion to
> reduce the amount of code duplication between the formats.
>
> The use of QEMU_GENERIC means that we need to use pointers instead
> of structures, which means that even the smaller float formats
> need rearranging.

I did some basic benchmarks which show no issues (although I
suspect hardfloat is hiding any true cost of the softfloat itself):

#+name: run-float-benchmarks
#+begin_src shell :results output :async
  ./fp-bench add -p single
  ./fp-bench add -p double
  ./fp-bench mul -p single
  ./fp-bench mul -p double
  ./fp-bench muladd -p single
  ./fp-bench muladd -p double
#+end_src

#+RESULTS: run-float-benchmarks-after
: 374.77 MFlops
: 287.58 MFlops
: 371.55 MFlops
: 281.48 MFlops
: 370.76 MFlops
: 287.39 MFlops

#+RESULTS: run-float-benchmarks-before
: 362.40 MFlops
: 278.65 MFlops
: 360.68 MFlops
: 280.92 MFlops
: 360.75 MFlops
: 280.76 MFlops

I guess what would be really telling is if an ext80 benchmark exhibited
any slowdown.

>
> I've carried it through to completion within fpu/, so that we don't
> have (much) of the legacy code remaining.  There is some floatx80
> stuff in target/m68k and target/i386 that's still hanging around.
>
>
> r~
>
>
> Alex Bennée (1):
>   tests/fp: add quad support to the benchmark utility
>
> Richard Henderson (71):
>   qemu/host-utils: Use __builtin_bitreverseN
>   qemu/host-utils: Add wrappers for overflow builtins
>   qemu/host-utils: Add wrappers for carry builtins
>   accel/tcg: Use add/sub overflow routines in tcg-runtime-gvec.c
>   softfloat: Move the binary point to the msb
>   softfloat: Inline float_raise
>   softfloat: Use float_raise in more places
>   softfloat: Tidy a * b + inf return
>   softfloat: Add float_cmask and constants
>   softfloat: Use return_nan in float_to_float
>   softfloat: fix return_nan vs default_nan_mode
>   target/mips: Set set_default_nan_mode with set_snan_bit_is_one
>   softfloat: Do not produce a default_nan from parts_silence_nan
>   softfloat: Rename FloatParts to FloatParts64
>   softfloat: Move type-specific pack/unpack routines
>   softfloat: Use pointers with parts_default_nan
>   softfloat: Use pointers with unpack_raw
>   softfloat: Use pointers with ftype_unpack_raw
>   softfloat: Use pointers with pack_raw
>   softfloat: Use pointers with ftype_pack_raw
>   softfloat: Use pointers with ftype_unpack_canonical
>   softfloat: Use pointers with ftype_round_pack_canonical
>   softfloat: Use pointers with parts_silence_nan
>   softfloat: Rearrange FloatParts64
>   softfloat: Convert float128_silence_nan to parts
>   softfloat: Convert float128_default_nan to parts
>   softfloat: Move return_nan to softfloat-parts.c.inc
>   softfloat: Move pick_nan to softfloat-parts.c.inc
>   softfloat: Move pick_nan_muladd to softfloat-parts.c.inc
>   softfloat: Move sf_canonicalize to softfloat-parts.c.inc
>   softfloat: Move round_canonical to softfloat-parts.c.inc
>   softfloat: Use uadd64_carry, usub64_borrow in softfloat-macros.h
>   softfloat: Move addsub_floats to softfloat-parts.c.inc
>   softfloat: Implement float128_add/sub via parts
>   softfloat: Move mul_floats to softfloat-parts.c.inc
>   softfloat: Move muladd_floats to softfloat-parts.c.inc
>   softfloat: Use mulu64 for mul64To128
>   softfloat: Use add192 in mul128To256
>   softfloat: Tidy mul128By64To192
>   softfloat: Introduce sh[lr]_double primitives
>   softfloat: Move div_floats to softfloat-parts.c.inc
>   softfloat: Split float_to_float
>   softfloat: Convert float-to-float conversions with float128
>   softfloat: Move round_to_int to softfloat-parts.c.inc
>   softfloat: Move rount_to_int_and_pack to softfloat-parts.c.inc
>   softfloat: Move rount_to_uint_and_pack to softfloat-parts.c.inc
>   softfloat: Move int_to_float to softfloat-parts.c.inc
>   softfloat: Move uint_to_float to softfloat-parts.c.inc
>   softfloat: Move minmax_flags to softfloat-parts.c.inc
>   softfloat: Move compare_floats to softfloat-parts.c.inc
>   softfloat: Move scalbn_decomposed to softfloat-parts.c.inc
>   softfloat: Move sqrt_float to softfloat-parts.c.inc
>   softfloat: Split out parts_uncanon_normal
>   softfloat: Reduce FloatFmt
>   softfloat: Introduce Floatx80RoundPrec
>   softfloat: Adjust parts_uncanon_normal for floatx80
>   tests/fp/fp-test: Reverse order of floatx80 precision tests
>   softfloat: Convert floatx80_add/sub to FloatParts
>   softfloat: Convert floatx80_mul to FloatParts
>   softfloat: Convert floatx80_div to FloatParts
>   softfloat: Convert floatx80_sqrt to FloatParts
>   softfloat: Convert floatx80_round to FloatParts
>   softfloat: Convert floatx80_round_to_int to FloatParts
>   softfloat: Convert integer to floatx80 to FloatParts
>   softfloat: Convert floatx80 float conversions to FloatParts
>   softfloat: Convert floatx80 to integer to FloatParts
>   softfloat: Convert floatx80_scalbn to FloatParts
>   softfloat: Convert floatx80 compare to FloatParts
>   softfloat: Convert float32_exp2 to FloatParts
>   softfloat: Move floatN_log2 to softfloat-parts.c.inc
>   softfloat: Convert modrem operations to FloatParts
>
>  include/fpu/softfloat-helpers.h  |    5 +-
>  include/fpu/softfloat-macros.h   |  247 +-
>  include/fpu/softfloat-types.h    |   10 +-
>  include/fpu/softfloat.h          |   11 +-
>  include/qemu/host-utils.h        |  291 ++
>  target/mips/fpu_helper.h         |   10 +-
>  accel/tcg/tcg-runtime-gvec.c     |   36 +-
>  fpu/softfloat.c                  | 7760 ++++++++++--------------------
>  linux-user/arm/nwfpe/fpa11.c     |   41 +-
>  target/i386/tcg/fpu_helper.c     |   79 +-
>  target/m68k/fpu_helper.c         |   50 +-
>  target/m68k/softfloat.c          |   90 +-
>  tests/fp/fp-bench.c              |   88 +-
>  tests/fp/fp-test-log2.c          |  118 +
>  tests/fp/fp-test.c               |   11 +-
>  fpu/softfloat-parts-addsub.c.inc |   62 +
>  fpu/softfloat-parts.c.inc        | 1480 ++++++
>  fpu/softfloat-specialize.c.inc   |  424 +-
>  tests/fp/wrap.c.inc              |   12 +
>  tests/fp/meson.build             |   11 +
>  20 files changed, 4886 insertions(+), 5950 deletions(-)
>  create mode 100644 tests/fp/fp-test-log2.c
>  create mode 100644 fpu/softfloat-parts-addsub.c.inc
>  create mode 100644 fpu/softfloat-parts.c.inc


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 28/72] softfloat: Move return_nan to softfloat-parts.c.inc
  2021-05-08  1:47 ` [PATCH 28/72] softfloat: Move return_nan to softfloat-parts.c.inc Richard Henderson
@ 2021-05-12 18:10   ` David Hildenbrand
  0 siblings, 0 replies; 151+ messages in thread
From: David Hildenbrand @ 2021-05-12 18:10 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 08.05.21 03:47, Richard Henderson wrote:
> At the same time, convert to pointers, rename to return_nan$N
> and define a macro for return_nan using QEMU_GENERIC.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   fpu/softfloat.c           | 45 ++++++++++++++++++++++-----------------
>   fpu/softfloat-parts.c.inc | 37 ++++++++++++++++++++++++++++++++
>   2 files changed, 62 insertions(+), 20 deletions(-)
>   create mode 100644 fpu/softfloat-parts.c.inc
> 
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 6d5392aeab..5b26696078 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -708,6 +708,10 @@ static float128 float128_pack_raw(const FloatParts128 *p)
>   #define parts_default_nan(P, S)    PARTS_GENERIC_64_128(default_nan, P)(P, S)
>   #define parts_silence_nan(P, S)    PARTS_GENERIC_64_128(silence_nan, P)(P, S)
>   
> +static void parts64_return_nan(FloatParts64 *a, float_status *s);
> +static void parts128_return_nan(FloatParts128 *a, float_status *s);
> +
> +#define parts_return_nan(P, S)     PARTS_GENERIC_64_128(return_nan, P)(P, S)
>   
>   /*
>    * Helper functions for softfloat-parts.c.inc, per-size operations.
> @@ -914,22 +918,6 @@ static FloatParts64 round_canonical(FloatParts64 p, float_status *s,
>       return p;
>   }
>   
> -static FloatParts64 return_nan(FloatParts64 a, float_status *s)
> -{
> -    g_assert(is_nan(a.cls));
> -    if (is_snan(a.cls)) {
> -        float_raise(float_flag_invalid, s);
> -        if (!s->default_nan_mode) {
> -            parts_silence_nan(&a, s);
> -            return a;
> -        }
> -    } else if (!s->default_nan_mode) {
> -        return a;
> -    }
> -    parts_default_nan(&a, s);
> -    return a;
> -}
> -
>   static FloatParts64 pick_nan(FloatParts64 a, FloatParts64 b, float_status *s)
>   {
>       if (is_snan(a.cls) || is_snan(b.cls)) {
> @@ -991,6 +979,21 @@ static FloatParts64 pick_nan_muladd(FloatParts64 a, FloatParts64 b, FloatParts64
>       return a;
>   }
>   
> +#define partsN(NAME)   parts64_##NAME
> +#define FloatPartsN    FloatParts64
> +
> +#include "softfloat-parts.c.inc"
> +
> +#undef  partsN
> +#undef  FloatPartsN
> +#define partsN(NAME)   parts128_##NAME
> +#define FloatPartsN    FloatParts128
> +
> +#include "softfloat-parts.c.inc"
> +
> +#undef  partsN
> +#undef  FloatPartsN
> +
>   /*
>    * Pack/unpack routines with a specific FloatFmt.
>    */
> @@ -2065,7 +2068,7 @@ static FloatParts64 float_to_float(FloatParts64 a, const FloatFmt *dstf,
>               break;
>           }
>       } else if (is_nan(a.cls)) {
> -        return return_nan(a, s);
> +        parts_return_nan(&a, s);
>       }
>       return a;
>   }
> @@ -2194,7 +2197,8 @@ static FloatParts64 round_to_int(FloatParts64 a, FloatRoundMode rmode,
>       switch (a.cls) {
>       case float_class_qnan:
>       case float_class_snan:
> -        return return_nan(a, s);
> +        parts_return_nan(&a, s);
> +        break;
>   
>       case float_class_zero:
>       case float_class_inf:
> @@ -3590,7 +3594,7 @@ FloatRelation bfloat16_compare_quiet(bfloat16 a, bfloat16 b, float_status *s)
>   static FloatParts64 scalbn_decomposed(FloatParts64 a, int n, float_status *s)
>   {
>       if (unlikely(is_nan(a.cls))) {
> -        return return_nan(a, s);
> +        parts_return_nan(&a, s);
>       }
>       if (a.cls == float_class_normal) {
>           /* The largest float type (even though not supported by FloatParts64)
> @@ -3658,7 +3662,8 @@ static FloatParts64 sqrt_float(FloatParts64 a, float_status *s, const FloatFmt *
>       int bit, last_bit;
>   
>       if (is_nan(a.cls)) {
> -        return return_nan(a, s);
> +        parts_return_nan(&a, s);
> +        return a;
>       }
>       if (a.cls == float_class_zero) {
>           return a;  /* sqrt(+-0) = +-0 */
> diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
> new file mode 100644
> index 0000000000..2a3075d6fe
> --- /dev/null
> +++ b/fpu/softfloat-parts.c.inc
> @@ -0,0 +1,37 @@
> +/*
> + * QEMU float support
> + *
> + * The code in this source file is derived from release 2a of the SoftFloat
> + * IEC/IEEE Floating-point Arithmetic Package. Those parts of the code (and
> + * some later contributions) are provided under that license, as detailed below.
> + * It has subsequently been modified by contributors to the QEMU Project,
> + * so some portions are provided under:
> + *  the SoftFloat-2a license
> + *  the BSD license
> + *  GPL-v2-or-later
> + *
> + * Any future contributions to this file after December 1st 2014 will be
> + * taken to be licensed under the Softfloat-2a license unless specifically
> + * indicated otherwise.
> + */
> +
> +static void partsN(return_nan)(FloatPartsN *a, float_status *s)
> +{
> +    switch (a->cls) {
> +    case float_class_snan:
> +        float_raise(float_flag_invalid, s);
> +        if (s->default_nan_mode) {
> +            parts_default_nan(a, s);
> +        } else {
> +            parts_silence_nan(a, s);
> +        }
> +        break;
> +    case float_class_qnan:
> +        if (s->default_nan_mode) {
> +            parts_default_nan(a, s);
> +        }
> +        break;
> +    default:
> +        g_assert_not_reached();
> +    }
> +}
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 29/72] softfloat: Move pick_nan to softfloat-parts.c.inc
  2021-05-08  1:47 ` [PATCH 29/72] softfloat: Move pick_nan " Richard Henderson
@ 2021-05-12 18:16   ` David Hildenbrand
  2021-05-13 12:28     ` Richard Henderson
  0 siblings, 1 reply; 151+ messages in thread
From: David Hildenbrand @ 2021-05-12 18:16 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 08.05.21 03:47, Richard Henderson wrote:
> At the same time, convert to pointers, rename to parts$N_pick_nan
> and define a macro for parts_pick_nan using QEMU_GENERIC.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   fpu/softfloat.c           | 62 ++++++++++++++++++++++-----------------
>   fpu/softfloat-parts.c.inc | 25 ++++++++++++++++
>   2 files changed, 60 insertions(+), 27 deletions(-)
> 
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 5b26696078..77efaedeaa 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -713,10 +713,39 @@ static void parts128_return_nan(FloatParts128 *a, float_status *s);
>   
>   #define parts_return_nan(P, S)     PARTS_GENERIC_64_128(return_nan, P)(P, S)
>   
> +static FloatParts64 *parts64_pick_nan(FloatParts64 *a, FloatParts64 *b,
> +                                      float_status *s);
> +static FloatParts128 *parts128_pick_nan(FloatParts128 *a, FloatParts128 *b,
> +                                        float_status *s);
> +
> +#define parts_pick_nan(A, B, S)    PARTS_GENERIC_64_128(pick_nan, A)(A, B, S)
> +
>   /*
>    * Helper functions for softfloat-parts.c.inc, per-size operations.
>    */
>   
> +#define FRAC_GENERIC_64_128(NAME, P) \
> +    QEMU_GENERIC(P, (FloatParts128 *, frac128_##NAME), frac64_##NAME)
> +
> +static int frac64_cmp(FloatParts64 *a, FloatParts64 *b)
> +{
> +    return a->frac == b->frac ? 0 : a->frac < b->frac ? -1 : 1;
> +}
> +
> +static int frac128_cmp(FloatParts128 *a, FloatParts128 *b)
> +{
> +    uint64_t ta = a->frac_hi, tb = b->frac_hi;
> +    if (ta == tb) {
> +        ta = a->frac_lo, tb = b->frac_lo;
> +        if (ta == tb) {
> +            return 0;
> +        }
> +    }
> +    return ta < tb ? -1 : 1;
> +}
> +
> +#define frac_cmp(A, B)  FRAC_GENERIC_64_128(cmp, A)(A, B)
> +
>   static void frac128_shl(FloatParts128 *a, int c)
>   {
>       shift128Left(a->frac_hi, a->frac_lo, c, &a->frac_hi, &a->frac_lo);
> @@ -918,27 +947,6 @@ static FloatParts64 round_canonical(FloatParts64 p, float_status *s,
>       return p;
>   }
>   
> -static FloatParts64 pick_nan(FloatParts64 a, FloatParts64 b, float_status *s)
> -{
> -    if (is_snan(a.cls) || is_snan(b.cls)) {
> -        float_raise(float_flag_invalid, s);
> -    }
> -
> -    if (s->default_nan_mode) {
> -        parts_default_nan(&a, s);
> -    } else {
> -        if (pickNaN(a.cls, b.cls,
> -                    a.frac > b.frac ||
> -                    (a.frac == b.frac && a.sign < b.sign), s)) {
> -            a = b;
> -        }
> -        if (is_snan(a.cls)) {
> -            parts_silence_nan(&a, s);
> -        }
> -    }
> -    return a;
> -}
> -
>   static FloatParts64 pick_nan_muladd(FloatParts64 a, FloatParts64 b, FloatParts64 c,
>                                     bool inf_zero, float_status *s)
>   {
> @@ -1106,7 +1114,7 @@ static FloatParts64 addsub_floats(FloatParts64 a, FloatParts64 b, bool subtract,
>               return a;
>           }
>           if (is_nan(a.cls) || is_nan(b.cls)) {
> -            return pick_nan(a, b, s);
> +            return *parts_pick_nan(&a, &b, s);
>           }
>           if (a.cls == float_class_inf) {
>               if (b.cls == float_class_inf) {
> @@ -1144,7 +1152,7 @@ static FloatParts64 addsub_floats(FloatParts64 a, FloatParts64 b, bool subtract,
>               return a;
>           }
>           if (is_nan(a.cls) || is_nan(b.cls)) {
> -            return pick_nan(a, b, s);
> +            return *parts_pick_nan(&a, &b, s);
>           }
>           if (a.cls == float_class_inf || b.cls == float_class_zero) {
>               return a;
> @@ -1360,7 +1368,7 @@ static FloatParts64 mul_floats(FloatParts64 a, FloatParts64 b, float_status *s)
>       }
>       /* handle all the NaN cases */
>       if (is_nan(a.cls) || is_nan(b.cls)) {
> -        return pick_nan(a, b, s);
> +        return *parts_pick_nan(&a, &b, s);
>       }
>       /* Inf * Zero == NaN */
>       if ((a.cls == float_class_inf && b.cls == float_class_zero) ||
> @@ -1887,7 +1895,7 @@ static FloatParts64 div_floats(FloatParts64 a, FloatParts64 b, float_status *s)
>       }
>       /* handle all the NaN cases */
>       if (is_nan(a.cls) || is_nan(b.cls)) {
> -        return pick_nan(a, b, s);
> +        return *parts_pick_nan(&a, &b, s);
>       }
>       /* 0/0 or Inf/Inf */
>       if (a.cls == b.cls
> @@ -3295,14 +3303,14 @@ static FloatParts64 minmax_floats(FloatParts64 a, FloatParts64 b, bool ismin,
>                * the invalid exception is raised.
>                */
>               if (is_snan(a.cls) || is_snan(b.cls)) {
> -                return pick_nan(a, b, s);
> +                return *parts_pick_nan(&a, &b, s);
>               } else if (is_nan(a.cls) && !is_nan(b.cls)) {
>                   return b;
>               } else if (is_nan(b.cls) && !is_nan(a.cls)) {
>                   return a;
>               }
>           }
> -        return pick_nan(a, b, s);
> +        return *parts_pick_nan(&a, &b, s);
>       } else {
>           int a_exp, b_exp;
>   
> diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
> index 2a3075d6fe..11a71650f7 100644
> --- a/fpu/softfloat-parts.c.inc
> +++ b/fpu/softfloat-parts.c.inc
> @@ -35,3 +35,28 @@ static void partsN(return_nan)(FloatPartsN *a, float_status *s)
>           g_assert_not_reached();
>       }
>   }
> +
> +static FloatPartsN *partsN(pick_nan)(FloatPartsN *a, FloatPartsN *b,
> +                                     float_status *s)
> +{
> +    if (is_snan(a->cls) || is_snan(b->cls)) {
> +        float_raise(float_flag_invalid, s);
> +    }
> +
> +    if (s->default_nan_mode) {
> +        parts_default_nan(a, s);
> +    } else {
> +        int cmp = frac_cmp(a, b);
> +        if (cmp == 0) {
> +            cmp = a->sign < b->sign;
> +        }
> +
> +        if (pickNaN(a->cls, b->cls, cmp > 0, s)) {
> +            a = b;
> +        }
> +        if (is_snan(a->cls)) {
> +            parts_silence_nan(a, s);
> +        }
> +    }
> +    return a;
> +}
> 

I find the "*parts_pick_nan(&a, &b, s);" part a little ugly to read, but 
as long as this function isn't inlined there isn't much the compiler 
could optimize if we passed by value instead, so

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 30/72] softfloat: Move pick_nan_muladd to softfloat-parts.c.inc
  2021-05-08  1:47 ` [PATCH 30/72] softfloat: Move pick_nan_muladd " Richard Henderson
@ 2021-05-12 18:18   ` David Hildenbrand
  0 siblings, 0 replies; 151+ messages in thread
From: David Hildenbrand @ 2021-05-12 18:18 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 08.05.21 03:47, Richard Henderson wrote:
> At the same time, convert to pointers, rename to pick_nan_muladd$N
> and define a macro for pick_nan_muladd using QEMU_GENERIC.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   fpu/softfloat.c           | 53 ++++++++++-----------------------------
>   fpu/softfloat-parts.c.inc | 40 +++++++++++++++++++++++++++++
>   2 files changed, 53 insertions(+), 40 deletions(-)
> 
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 77efaedeaa..40ee294e35 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -720,6 +720,18 @@ static FloatParts128 *parts128_pick_nan(FloatParts128 *a, FloatParts128 *b,
>   
>   #define parts_pick_nan(A, B, S)    PARTS_GENERIC_64_128(pick_nan, A)(A, B, S)
>   
> +static FloatParts64 *parts64_pick_nan_muladd(FloatParts64 *a, FloatParts64 *b,
> +                                             FloatParts64 *c, float_status *s,
> +                                             int ab_mask, int abc_mask);
> +static FloatParts128 *parts128_pick_nan_muladd(FloatParts128 *a,
> +                                               FloatParts128 *b,
> +                                               FloatParts128 *c,
> +                                               float_status *s,
> +                                               int ab_mask, int abc_mask);
> +
> +#define parts_pick_nan_muladd(A, B, C, S, ABM, ABCM) \
> +    PARTS_GENERIC_64_128(pick_nan_muladd, A)(A, B, C, S, ABM, ABCM)
> +
>   /*
>    * Helper functions for softfloat-parts.c.inc, per-size operations.
>    */
> @@ -947,45 +959,6 @@ static FloatParts64 round_canonical(FloatParts64 p, float_status *s,
>       return p;
>   }
>   
> -static FloatParts64 pick_nan_muladd(FloatParts64 a, FloatParts64 b, FloatParts64 c,
> -                                  bool inf_zero, float_status *s)
> -{
> -    int which;
> -
> -    if (is_snan(a.cls) || is_snan(b.cls) || is_snan(c.cls)) {
> -        float_raise(float_flag_invalid, s);
> -    }
> -
> -    which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);
> -
> -    if (s->default_nan_mode) {
> -        /* Note that this check is after pickNaNMulAdd so that function
> -         * has an opportunity to set the Invalid flag.
> -         */
> -        which = 3;
> -    }
> -
> -    switch (which) {
> -    case 0:
> -        break;
> -    case 1:
> -        a = b;
> -        break;
> -    case 2:
> -        a = c;
> -        break;
> -    case 3:
> -        parts_default_nan(&a, s);
> -        break;
> -    default:
> -        g_assert_not_reached();
> -    }
> -
> -    if (is_snan(a.cls)) {
> -        parts_silence_nan(&a, s);
> -    }
> -    return a;
> -}
>   
>   #define partsN(NAME)   parts64_##NAME
>   #define FloatPartsN    FloatParts64
> @@ -1496,7 +1469,7 @@ static FloatParts64 muladd_floats(FloatParts64 a, FloatParts64 b, FloatParts64 c
>        * off to the target-specific pick-a-NaN routine.
>        */
>       if (unlikely(abc_mask & float_cmask_anynan)) {
> -        return pick_nan_muladd(a, b, c, inf_zero, s);
> +        return *parts_pick_nan_muladd(&a, &b, &c, s, ab_mask, abc_mask);
>       }
>   
>       if (inf_zero) {
> diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
> index 11a71650f7..a78d61ea07 100644
> --- a/fpu/softfloat-parts.c.inc
> +++ b/fpu/softfloat-parts.c.inc
> @@ -60,3 +60,43 @@ static FloatPartsN *partsN(pick_nan)(FloatPartsN *a, FloatPartsN *b,
>       }
>       return a;
>   }
> +
> +static FloatPartsN *partsN(pick_nan_muladd)(FloatPartsN *a, FloatPartsN *b,
> +                                            FloatPartsN *c, float_status *s,
> +                                            int ab_mask, int abc_mask)
> +{
> +    int which;
> +
> +    if (unlikely(abc_mask & float_cmask_snan)) {
> +        float_raise(float_flag_invalid, s);
> +    }
> +
> +    which = pickNaNMulAdd(a->cls, b->cls, c->cls,
> +                          ab_mask == float_cmask_infzero, s);
> +
> +    if (s->default_nan_mode || which == 3) {
> +        /*
> +         * Note that this check is after pickNaNMulAdd so that function
> +         * has an opportunity to set the Invalid flag for infzero.
> +         */
> +        parts_default_nan(a, s);
> +        return a;
> +    }
> +
> +    switch (which) {
> +    case 0:
> +        break;
> +    case 1:
> +        a = b;
> +        break;
> +    case 2:
> +        a = c;
> +        break;
> +    default:
> +        g_assert_not_reached();
> +    }
> +    if (is_snan(a->cls)) {
> +        parts_silence_nan(a, s);
> +    }
> +    return a;
> +}
> 
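
For reference, a sketch of how that dispatch macro works (illustrative
only: C11 _Generic is shown here, while QEMU_GENERIC emulates the same
selection for older compilers; the prototypes are the ones quoted
above):

  /*
   * Illustration, not the patch's code: the helpers must take
   * pointers so that the macro can select the implementation from
   * the pointer's type.
   */
  #define parts_pick_nan_muladd(A, B, C, S, ABM, ABCM)        \
      _Generic((A),                                           \
               FloatParts64 *:  parts64_pick_nan_muladd,      \
               FloatParts128 *: parts128_pick_nan_muladd)     \
          (A, B, C, S, ABM, ABCM)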

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 00/72] Convert floatx80 and float128 to FloatParts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (74 preceding siblings ...)
  2021-05-12 16:47 ` Alex Bennée
@ 2021-05-12 19:23 ` Alex Bennée
  2021-05-13 11:49   ` Richard Henderson
  2021-05-13 14:13 ` Alex Bennée
  76 siblings, 1 reply; 151+ messages in thread
From: Alex Bennée @ 2021-05-12 19:23 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Reorg everything using QEMU_GENERIC and multiple inclusion to
> reduce the amount of code duplication between the formats.
>
> The use of QEMU_GENERIC means that we need to use pointers instead
> of structures, which means that even the smaller float formats
> need rearranging.
>
> I've carried it through to completion within fpu/, so that we don't
> have (much) of the legacy code remaining.  There is some floatx80
> stuff in target/m68k and target/i386 that's still hanging around.

OK, and here are some quad benchmarks. There is actual change above the
noise; I think the biggest hit comes from the parts conversion, but we
do claw some of it back:

* Run Quad Benchmarks

#+name: run-quad-float-benchmarks
#+begin_src sh :results output table append
  commit=$(git describe)
  add=$(./tests/fp/fp-bench add -p quad)
  mul=$(./tests/fp/fp-bench mul -p quad)
  muladd=$(./tests/fp/fp-bench mulAdd -p quad)
  desc=$(git log --format="format:%s" HEAD^..)
  echo "$commit,$add,$mul,$muladd,$desc"
#+end_src

#+RESULTS: run-quad-float-benchmarks
| pull-target-arm-20210510-1-91-g0fe775d52c  | 90.28 MFlops | 90.15 MFlops | 90.75 MFlops |                                                                 |
| pull-target-arm-20210510-1-92-gf7a6dabee2  | 90.80 MFlops | 89.92 MFlops | 90.66 MFlops |                                                                 |
| pull-target-arm-20210510-1-93-gdb71c9fd28  | 88.93 MFlops | 89.10 MFlops | 87.32 MFlops |                                                                 |
| pull-target-arm-20210510-1-93-gdb71c9fd28  | 88.85 MFlops | 88.83 MFlops | 88.53 MFlops |                                                                 |
| pull-target-arm-20210510-1-94-g900ea1f79d  | 87.10 MFlops | 88.02 MFlops | 88.22 MFlops |                                                                 |
| pull-target-arm-20210510-1-95-gdb0bb2966f  | 88.11 MFlops | 87.10 MFlops | 87.48 MFlops | softfloat: Tidy a * b + inf return                              |
| pull-target-arm-20210510-1-95-gdb0bb2966f  | 87.27 MFlops | 84.86 MFlops | 87.99 MFlops |                                                                 |
| pull-target-arm-20210510-1-95-gdb0bb2966f  | 87.56 MFlops | 88.31 MFlops | 88.41 MFlops | softfloat: Tidy a * b + inf return                              |
| pull-target-arm-20210510-1-96-gec2be8ad0c  | 88.12 MFlops | 88.88 MFlops | 89.09 MFlops | softfloat: Add float_cmask and constants                        |
| pull-target-arm-20210510-1-97-g2328f560a1  | 91.18 MFlops | 91.84 MFlops | 91.30 MFlops | softfloat: Use return_nan in float_to_float                     |
| pull-target-arm-20210510-1-97-g2328f560a1  | 90.07 MFlops | 91.16 MFlops | 91.14 MFlops | softfloat: Use return_nan in float_to_float                     |
| pull-target-arm-20210510-1-98-g89e2096c6f  | 87.54 MFlops | 87.71 MFlops | 87.90 MFlops | softfloat: fix return_nan vs default_nan_mode                   |
| pull-target-arm-20210510-1-98-g89e2096c6f  | 87.57 MFlops | 83.80 MFlops | 85.95 MFlops | softfloat: fix return_nan vs default_nan_mode                   |
| pull-target-arm-20210510-1-99-g67ceccacea  | 89.29 MFlops | 87.46 MFlops | 87.40 MFlops | target/mips: Set set_default_nan_mode with set_snan_bit_is_one  |
| pull-target-arm-20210510-1-99-g67ceccacea  | 88.08 MFlops | 88.54 MFlops | 88.42 MFlops | target/mips: Set set_default_nan_mode with set_snan_bit_is_one  |
| pull-target-arm-20210510-1-100-g8064a6d9d9 | 92.41 MFlops | 91.85 MFlops | 92.37 MFlops | softfloat: Do not produce a default_nan from parts_silence_nan  |
| pull-target-arm-20210510-1-100-g8064a6d9d9 | 92.00 MFlops | 92.80 MFlops | 93.17 MFlops | softfloat: Do not produce a default_nan from parts_silence_nan  |
| pull-target-arm-20210510-1-101-gc303832ddb | 92.27 MFlops | 91.76 MFlops | 91.56 MFlops | softfloat: Rename FloatParts to FloatParts64                    |
| pull-target-arm-20210510-1-101-gc303832ddb | 92.64 MFlops | 92.73 MFlops | 92.54 MFlops |                                                                 |
| pull-target-arm-20210510-1-110-g8c91cc4bfd | 94.34 MFlops | 93.50 MFlops | 94.00 MFlops | softfloat: Use pointers with parts_silence_nan                  |
| pull-target-arm-20210510-1-111-g039cab1333 | 94.72 MFlops | 95.36 MFlops | 94.67 MFlops | softfloat: Rearrange FloatParts64                               |
| pull-target-arm-20210510-1-111-g039cab1333 | 94.55 MFlops | 94.99 MFlops | 95.13 MFlops |                                                                 |
| pull-target-arm-20210510-1-111-g039cab1333 | 95.55 MFlops | 94.72 MFlops | 95.55 MFlops |                                                                 |
| pull-target-arm-20210510-1-112-g5de6cec92b | 87.99 MFlops | 87.98 MFlops | 88.64 MFlops | softfloat: Convert float128_silence_nan to parts                |
| pull-target-arm-20210510-1-112-g5de6cec92b | 87.20 MFlops | 88.26 MFlops | 88.04 MFlops | softfloat: Convert float128_silence_nan to parts                |
| pull-target-arm-20210510-1-113-g6eb5e07c28 | 88.01 MFlops | 87.70 MFlops | 87.69 MFlops | softfloat: Convert float128_default_nan to parts                |
| pull-target-arm-20210510-1-113-g6eb5e07c28 | 87.88 MFlops | 87.76 MFlops | 87.20 MFlops | softfloat: Convert float128_default_nan to parts                |
| pull-target-arm-20210510-1-114-g7a4f7331e4 | 84.38 MFlops | 84.55 MFlops | 86.92 MFlops | softfloat: Move return_nan to softfloat-parts.c.inc             |
| pull-target-arm-20210510-1-115-g08f1f1f3ed | 90.40 MFlops | 89.79 MFlops | 90.74 MFlops | softfloat: Move pick_nan to softfloat-parts.c.inc               |
| pull-target-arm-20210510-1-115-g08f1f1f3ed | 90.74 MFlops | 90.11 MFlops | 90.59 MFlops | softfloat: Move pick_nan to softfloat-parts.c.inc               |
| pull-target-arm-20210510-1-116-g474eb5be10 | 87.84 MFlops | 87.04 MFlops | 88.25 MFlops | softfloat: Move pick_nan_muladd to softfloat-parts.c.inc        |
| pull-target-arm-20210510-1-116-g474eb5be10 | 88.22 MFlops | 87.79 MFlops | 88.10 MFlops | softfloat: Move pick_nan_muladd to softfloat-parts.c.inc        |
| pull-target-arm-20210510-1-117-g096a466c23 | 86.37 MFlops | 85.32 MFlops | 86.22 MFlops | softfloat: Move sf_canonicalize to softfloat-parts.c.inc        |
| pull-target-arm-20210510-1-118-g973977719f | 85.41 MFlops | 84.75 MFlops | 83.47 MFlops | softfloat: Move round_canonical to softfloat-parts.c.inc        |
| pull-target-arm-20210510-1-119-g89c1fd4763 | 85.29 MFlops | 86.27 MFlops | 85.33 MFlops | softfloat: Use uadd64_carry/usub64_borrow in softfloat-macros.h |
| pull-target-arm-20210510-1-120-gfa24239a88 | 86.61 MFlops | 86.24 MFlops | 86.60 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc          |
| pull-target-arm-20210510-1-120-gfa24239a88 | 86.86 MFlops | 86.43 MFlops | 86.38 MFlops |                                                                 |
| pull-target-arm-20210510-1-120-gfa24239a88 | 86.57 MFlops | 86.57 MFlops | 86.25 MFlops |                                                                 |
| pull-target-arm-20210510-1-121-g15cf4c773a | 74.07 MFlops | 73.24 MFlops | 73.53 MFlops |                                                                 |
| pull-target-arm-20210510-1-121-g15cf4c773a | 73.44 MFlops | 73.52 MFlops | 73.86 MFlops | softfloat: Implement float128_add/sub via parts                 |
| pull-target-arm-20210510-1-122-gc07a5fec10 | 73.38 MFlops | 73.01 MFlops | 72.94 MFlops | softfloat: Move mul_floats to softfloat-parts.c.inc             |
| pull-target-arm-20210510-1-122-gc07a5fec10 | 73.41 MFlops | 72.32 MFlops | 73.64 MFlops |                                                                 |
| pull-target-arm-20210510-1-122-gc07a5fec10 | 73.88 MFlops | 73.51 MFlops | 73.36 MFlops |                                                                 |
| pull-target-arm-20210510-1-123-g88e1635abf | 69.04 MFlops | 69.33 MFlops | 69.51 MFlops | softfloat: Move muladd_floats to softfloat-parts.c.inc          |
| pull-target-arm-20210510-1-123-g88e1635abf | 69.84 MFlops | 70.15 MFlops | 69.46 MFlops |                                                                 |
| pull-target-arm-20210510-1-123-g88e1635abf | 69.66 MFlops | 69.67 MFlops | 69.05 MFlops |                                                                 |
| pull-target-arm-20210510-1-124-g449de2f64c | 69.26 MFlops | 67.87 MFlops | 68.56 MFlops | softfloat: Use mulu64 for mul64To128                            |
| pull-target-arm-20210510-1-125-gd87617480b | 68.43 MFlops | 69.05 MFlops | 69.27 MFlops | softfloat: Use add192 in mul128To256                            |
| pull-target-arm-20210510-1-126-gd030713711 | 69.08 MFlops | 68.88 MFlops | 69.32 MFlops | softfloat: Tidy mul128By64To192                                 |
| pull-target-arm-20210510-1-127-gcb1a047f33 | 75.77 MFlops | 75.26 MFlops | 75.56 MFlops | softfloat: Introduce sh[lr]_double primitives                   |
| pull-target-arm-20210510-1-140-g0bd7f06cb0 | 72.07 MFlops | 71.54 MFlops | 71.55 MFlops | softfloat: Split out parts_uncanon_normal                       |
| pull-target-arm-20210510-1-141-g4ccbd2edf1 | 72.45 MFlops | 71.55 MFlops | 71.79 MFlops | softfloat: Reduce FloatFmt                                      |
| pull-target-arm-20210510-1-159-g97028a57bb | 74.87 MFlops | 75.75 MFlops | 75.69 MFlops |                                                                 |
| pull-target-arm-20210510-1-159-g97028a57bb | 73.34 MFlops | 73.42 MFlops | 72.93 MFlops | tests/fp: enable more tests                                     |
| pull-target-arm-20210510-1-159-g97028a57bb | 75.61 MFlops | 75.72 MFlops | 75.30 MFlops |                                                                 |


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 26/72] softfloat: Convert float128_silence_nan to parts
  2021-05-08  1:47 ` [PATCH 26/72] softfloat: Convert float128_silence_nan to parts Richard Henderson
@ 2021-05-13  8:34   ` Alex Bennée
  2021-05-13 12:25     ` Richard Henderson
  0 siblings, 1 reply; 151+ messages in thread
From: Alex Bennée @ 2021-05-13  8:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> This is the minimal change that also introduces float128_params,
> float128_unpack_raw, and float128_pack_raw without running into
> unused symbol Werrors.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  fpu/softfloat.c                | 96 +++++++++++++++++++++++++++++-----
>  fpu/softfloat-specialize.c.inc | 25 +++------
>  2 files changed, 89 insertions(+), 32 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 2d6f61ee7a..073b80d502 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -500,14 +500,12 @@ static inline __attribute__((unused)) bool is_qnan(FloatClass c)
>  }
>  
>  /*
> - * Structure holding all of the decomposed parts of a float. The
> - * exponent is unbiased and the fraction is normalized. All
> - * calculations are done with a 64 bit fraction and then rounded as
> - * appropriate for the final format.
> + * Structure holding all of the decomposed parts of a float.
> + * The exponent is unbiased and the fraction is normalized.
>   *
> - * Thanks to the packed FloatClass a decent compiler should be able to
> - * fit the whole structure into registers and avoid using the stack
> - * for parameter passing.
> + * The fraction words are stored in big-endian word ordering,
> + * so that truncation from a larger format to a smaller format
> + * can be done simply by ignoring subsequent elements.
>   */
>  
>  typedef struct {
> @@ -526,6 +524,15 @@ typedef struct {
>      };
>  } FloatParts64;
>  
> +typedef struct {
> +    FloatClass cls;
> +    bool sign;
> +    int32_t exp;
> +    uint64_t frac_hi;
> +    uint64_t frac_lo;
> +} FloatParts128;
> +
> +/* These apply to the most significant word of each FloatPartsN. */
>  #define DECOMPOSED_BINARY_POINT    63
>  #define DECOMPOSED_IMPLICIT_BIT    (1ull << DECOMPOSED_BINARY_POINT)
>  
> @@ -561,11 +568,11 @@ typedef struct {
>      .exp_bias       = ((1 << E) - 1) >> 1,                           \
>      .exp_max        = (1 << E) - 1,                                  \
>      .frac_size      = F,                                             \
> -    .frac_shift     = DECOMPOSED_BINARY_POINT - F,                   \
> -    .frac_lsb       = 1ull << (DECOMPOSED_BINARY_POINT - F),         \
> -    .frac_lsbm1     = 1ull << ((DECOMPOSED_BINARY_POINT - F) - 1),   \
> -    .round_mask     = (1ull << (DECOMPOSED_BINARY_POINT - F)) - 1,   \
> -    .roundeven_mask = (2ull << (DECOMPOSED_BINARY_POINT - F)) - 1
> +    .frac_shift     = (-F - 1) & 63,                                 \
> +    .frac_lsb       = 1ull << ((-F - 1) & 63),                       \
> +    .frac_lsbm1     = 1ull << ((-F - 2) & 63),                       \
> +    .round_mask     = (1ull << ((-F - 1) & 63)) - 1,                 \
> +    .roundeven_mask = (2ull << ((-F - 1) & 63)) - 1
>

I have to admit I find the switch to (-F - 1) & 63 a little
black-magical. Isn't the shift always going to be a function of the
number of exponent bits we need to move past and the natural size of
the original float?

Anyway my personal brain twisting aside it obviously works and
everything else looks fine so:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 27/72] softfloat: Convert float128_default_nan to parts
  2021-05-08  1:47 ` [PATCH 27/72] softfloat: Convert float128_default_nan " Richard Henderson
@ 2021-05-13  8:56   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-13  8:56 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 31/72] softfloat: Move sf_canonicalize to softfloat-parts.c.inc
  2021-05-08  1:47 ` [PATCH 31/72] softfloat: Move sf_canonicalize " Richard Henderson
@ 2021-05-13  9:45   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-13  9:45 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> At the same time, convert to pointers, rename to parts$N_canonicalize
> and define a macro for parts_canonicalize using QEMU_GENERIC.

You also changed the ordering of the checks and the likely/unlikely
conditions; it would be worth mentioning why in the commit message.
Otherwise looks good to me:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 32/72] softfloat: Move round_canonical to softfloat-parts.c.inc
  2021-05-08  1:47 ` [PATCH 32/72] softfloat: Move round_canonical " Richard Henderson
@ 2021-05-13  9:53   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-13  9:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> At the same time, convert to pointers, renaming to parts$N_uncanon,
> and define a macro for parts_uncanon using QEMU_GENERIC.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 33/72] softfloat: Use uadd64_carry, usub64_borrow in softfloat-macros.h
  2021-05-08  1:47 ` [PATCH 33/72] softfloat: Use uadd64_carry, usub64_borrow in softfloat-macros.h Richard Henderson
@ 2021-05-13 10:00   ` Alex Bennée
  2021-05-13 12:38     ` Richard Henderson
  0 siblings, 1 reply; 151+ messages in thread
From: Alex Bennée @ 2021-05-13 10:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Use compiler support for carry arithmetic.

Didn't we have a series that attempted to take advantage of compiler
support for Int128? Does the compiler end up smart enough to use
128-bit wide ops where it can?

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Anyway:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 34/72] softfloat: Move addsub_floats to softfloat-parts.c.inc
  2021-05-08  1:47 ` [PATCH 34/72] softfloat: Move addsub_floats to softfloat-parts.c.inc Richard Henderson
@ 2021-05-13 10:03   ` Alex Bennée
  2021-05-13 10:05   ` Alex Bennée
  1 sibling, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-13 10:03 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> In preparation for implementing multiple sizes.  Rename to parts_addsub,
> split out parts_add/sub_normal for future reuse with muladd.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 34/72] softfloat: Move addsub_floats to softfloat-parts.c.inc
  2021-05-08  1:47 ` [PATCH 34/72] softfloat: Move addsub_floats to softfloat-parts.c.inc Richard Henderson
  2021-05-13 10:03   ` Alex Bennée
@ 2021-05-13 10:05   ` Alex Bennée
  1 sibling, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-13 10:05 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> In preparation for implementing multiple sizes.  Rename to parts_addsub,
> split out parts_add/sub_normal for future reuse with muladd.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 35/72] softfloat: Implement float128_add/sub via parts
  2021-05-08  1:47 ` [PATCH 35/72] softfloat: Implement float128_add/sub via parts Richard Henderson
@ 2021-05-13 10:06   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-13 10:06 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Replace the existing Berkeley implementation with the
> FloatParts implementation.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Nice ;-)

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 36/72] softfloat: Move mul_floats to softfloat-parts.c.inc
  2021-05-08  1:47 ` [PATCH 36/72] softfloat: Move mul_floats to softfloat-parts.c.inc Richard Henderson
@ 2021-05-13 10:08   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-13 10:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Rename to parts$N_mul.
> Reimplement float128_mul with FloatParts128.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 37/72] softfloat: Move muladd_floats to softfloat-parts.c.inc
  2021-05-08  1:47 ` [PATCH 37/72] softfloat: Move muladd_floats " Richard Henderson
@ 2021-05-13 10:43   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-13 10:43 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Rename to parts$N_muladd.
> Implement float128_muladd with FloatParts128.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 39/72] softfloat: Use add192 in mul128To256
  2021-05-08  1:47 ` [PATCH 39/72] softfloat: Use add192 in mul128To256 Richard Henderson
@ 2021-05-13 10:49   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-13 10:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> We can perform the operation in 6 total adds instead of 8.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  include/fpu/softfloat-macros.h | 37 +++++++++++-----------------------
>  1 file changed, 12 insertions(+), 25 deletions(-)
>
> diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
> index f6dfbe108d..76327d844d 100644
> --- a/include/fpu/softfloat-macros.h
> +++ b/include/fpu/softfloat-macros.h
> @@ -511,34 +511,21 @@ static inline void
>  | the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'.
>  *----------------------------------------------------------------------------*/
>  
> -static inline void
> - mul128To256(
> -     uint64_t a0,
> -     uint64_t a1,
> -     uint64_t b0,
> -     uint64_t b1,
> -     uint64_t *z0Ptr,
> -     uint64_t *z1Ptr,
> -     uint64_t *z2Ptr,
> -     uint64_t *z3Ptr
> - )
> +static inline void mul128To256(uint64_t a0, uint64_t a1,
> +                               uint64_t b0, uint64_t b1,
> +                               uint64_t *z0Ptr, uint64_t *z1Ptr,
> +                               uint64_t *z2Ptr, uint64_t *z3Ptr)
>  {
> -    uint64_t z0, z1, z2, z3;
> -    uint64_t more1, more2;
> +    uint64_t z0, z1, z2;
> +    uint64_t m0, m1, m2, n1, n2;
>  
> -    mul64To128( a1, b1, &z2, &z3 );
> -    mul64To128( a1, b0, &z1, &more2 );
> -    add128( z1, more2, 0, z2, &z1, &z2 );
> -    mul64To128( a0, b0, &z0, &more1 );
> -    add128( z0, more1, 0, z1, &z0, &z1 );
> -    mul64To128( a0, b1, &more1, &more2 );
> -    add128( more1, more2, 0, z2, &more1, &z2 );
> -    add128( z0, z1, 0, more1, &z0, &z1 );
> -    *z3Ptr = z3;
> -    *z2Ptr = z2;
> -    *z1Ptr = z1;
> -    *z0Ptr = z0;
> +    mul64To128(a1, b0, &m1, &m2);
> +    mul64To128(a0, b1, &n1, &n2);
> +    mul64To128(a1, b1, &z2, z3Ptr);
> +    mul64To128(a0, b0, &z0, &z1);
>  
> +    add192( 0, m1, m2,  0, n1, n2, &m0, &m1, &m2);
> +    add192(m0, m1, m2, z0, z1, z2, z0Ptr, z1Ptr, z2Ptr);
>  }
>  
>  /*----------------------------------------------------------------------------
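
(To see the count: each add128 is two word-wide adds and each add192 is
three, so the old sequence of four add128 calls was 8 adds, while the
new pair of add192 calls is 6.)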


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 40/72] softfloat: Tidy mul128By64To192
  2021-05-08  1:47 ` [PATCH 40/72] softfloat: Tidy mul128By64To192 Richard Henderson
@ 2021-05-13 10:50   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-13 10:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Clean up the formatting and variables; no functional change.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 41/72] softfloat: Introduce sh[lr]_double primitives
  2021-05-08  1:47 ` [PATCH 41/72] softfloat: Introduce sh[lr]_double primitives Richard Henderson
@ 2021-05-13 10:59   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-13 10:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Have x86_64 assembly for them, with a fallback.
> This avoids shuffling values through %cl in the x86 case.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/fpu/softfloat-macros.h |  36 ++++++++++++
>  fpu/softfloat.c                | 102 +++++++++++++++++++++++++--------
>  2 files changed, 115 insertions(+), 23 deletions(-)
>
> diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
> index 672c1db555..ec4e27a595 100644
> --- a/include/fpu/softfloat-macros.h
> +++ b/include/fpu/softfloat-macros.h
> @@ -85,6 +85,42 @@ this code that are retained.
>  #include "fpu/softfloat-types.h"
>  #include "qemu/host-utils.h"
>  
> +/**
> + * shl_double: double-word merging left shift
> + * @l: left or most-significant word
> + * @r: right or least-significant word
> + * @c: shift count
> + *
> + * Shift @l left by @c bits, shifting in bits from @r.
> + */
> +static inline uint64_t shl_double(uint64_t l, uint64_t r, int c)
> +{
> +#if defined(__x86_64__)
> +    asm("shld %b2, %1, %0" : "+r"(l) : "r"(r), "ci"(c));
> +    return l;
> +#else
> +    return c ? (l << c) | (r >> (64 - c)) : l;
> +#endif
> +}
> +
> +/**
> + * shr_double: double-word merging right shift
> + * @l: left or most-significant word
> + * @r: right or least-significant word
> + * @c: shift count
> + *
> + * Shift @r right by @c bits, shifting in bits from @l.
> + */
> +static inline uint64_t shr_double(uint64_t l, uint64_t r, int c)
> +{
> +#if defined(__x86_64__)
> +    asm("shrd %b2, %1, %0" : "+r"(r) : "r"(l), "ci"(c));
> +    return r;
> +#else
> +    return c ? (r >> c) | (l << (64 - c)) : r;
> +#endif
> +}
> +
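
A quick usage sketch (illustrative; frac128_shr is a made-up name, not
something from the series): right-shifting a two-word fraction {hi,lo}
by 0 <= c < 64 becomes one plain shift plus one merging shift, with no
branch on the x86_64 path.

  static inline void frac128_shr(uint64_t *hi, uint64_t *lo, int c)
  {
      *lo = shr_double(*hi, *lo, c);  /* low word picks up hi's low bits */
      *hi >>= c;
  }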

I was pondering whether these deserve to live in bitops, but they are
softfloat-only for the time being and we don't do arch-specific hacks
in there, so:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 42/72] softfloat: Move div_floats to softfloat-parts.c.inc
  2021-05-08  1:47 ` [PATCH 42/72] softfloat: Move div_floats to softfloat-parts.c.inc Richard Henderson
@ 2021-05-13 11:02   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-13 11:02 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Rename to parts$N_div.
> Implement float128_div with FloatParts128.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 43/72] softfloat: Split float_to_float
  2021-05-08  1:47 ` [PATCH 43/72] softfloat: Split float_to_float Richard Henderson
@ 2021-05-13 11:04   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-13 11:04 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Split out parts_float_to_ahp and parts_float_to_float.
> Convert to pointers.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 44/72] softfloat: Convert float-to-float conversions with float128
  2021-05-08  1:47 ` [PATCH 44/72] softfloat: Convert float-to-float conversions with float128 Richard Henderson
@ 2021-05-13 11:17   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-13 11:17 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Introduce parts_float_to_float_widen and parts_float_to_float_narrow.
> Use them for float128_to_float{32,64} and float{32,64}_to_float128.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 00/72] Convert floatx80 and float128 to FloatParts
  2021-05-12 19:23 ` Alex Bennée
@ 2021-05-13 11:49   ` Richard Henderson
  2021-05-13 13:33     ` Alex Bennée
  0 siblings, 1 reply; 151+ messages in thread
From: Richard Henderson @ 2021-05-13 11:49 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, david

On 5/12/21 2:23 PM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> Reorg everything using QEMU_GENERIC and multiple inclusion to
>> reduce the amount of code duplication between the formats.
>>
>> The use of QEMU_GENERIC means that we need to use pointers instead
>> of structures, which means that even the smaller float formats
>> need rearranging.
>>
>> I've carried it through to completion within fpu/, so that we don't
>> have (much) of the legacy code remaining.  There is some floatx80
>> stuff in target/m68k and target/i386 that's still hanging around.
> 
> OK, and here are some quad benchmarks. There is actual change above the
> noise; I think the biggest hit comes from the parts conversion, but we
> do claw some of it back:
> 
> * Run Quad Benchmarks
> 
> #+name: run-quad-float-benchmarks
> #+begin_src sh :results output table append
>    commit=$(git describe)
>    add=$(./tests/fp/fp-bench add -p quad)
>    mul=$(./tests/fp/fp-bench mul -p quad)
>    muladd=$(./tests/fp/fp-bench mulAdd -p quad)
>    desc=$(git log --format="format:%s" HEAD^..)
>    echo "$commit,$add,$mul,$muladd,$desc"
> #+end_src
> 
> #+RESULTS: run-quad-float-benchmarks
> | pull-target-arm-20210510-1-91-g0fe775d52c  | 90.28 MFlops | 90.15 MFlops | 90.75 MFlops |                                                                 |
> | pull-target-arm-20210510-1-92-gf7a6dabee2  | 90.80 MFlops | 89.92 MFlops | 90.66 MFlops |                                                                 |
> | pull-target-arm-20210510-1-93-gdb71c9fd28  | 88.93 MFlops | 89.10 MFlops | 87.32 MFlops |                                                                 |
> | pull-target-arm-20210510-1-93-gdb71c9fd28  | 88.85 MFlops | 88.83 MFlops | 88.53 MFlops |                                                                 |
> | pull-target-arm-20210510-1-94-g900ea1f79d  | 87.10 MFlops | 88.02 MFlops | 88.22 MFlops |                                                                 |
> | pull-target-arm-20210510-1-95-gdb0bb2966f  | 88.11 MFlops | 87.10 MFlops | 87.48 MFlops | softfloat: Tidy a * b + inf return                              |
> | pull-target-arm-20210510-1-95-gdb0bb2966f  | 87.27 MFlops | 84.86 MFlops | 87.99 MFlops |                                                                 |
> | pull-target-arm-20210510-1-95-gdb0bb2966f  | 87.56 MFlops | 88.31 MFlops | 88.41 MFlops | softfloat: Tidy a * b + inf return                              |
> | pull-target-arm-20210510-1-96-gec2be8ad0c  | 88.12 MFlops | 88.88 MFlops | 89.09 MFlops | softfloat: Add float_cmask and constants                        |
> | pull-target-arm-20210510-1-97-g2328f560a1  | 91.18 MFlops | 91.84 MFlops | 91.30 MFlops | softfloat: Use return_nan in float_to_float                     |
> | pull-target-arm-20210510-1-97-g2328f560a1  | 90.07 MFlops | 91.16 MFlops | 91.14 MFlops | softfloat: Use return_nan in float_to_float                     |
> | pull-target-arm-20210510-1-98-g89e2096c6f  | 87.54 MFlops | 87.71 MFlops | 87.90 MFlops | softfloat: fix return_nan vs default_nan_mode                   |
> | pull-target-arm-20210510-1-98-g89e2096c6f  | 87.57 MFlops | 83.80 MFlops | 85.95 MFlops | softfloat: fix return_nan vs default_nan_mode                   |
> | pull-target-arm-20210510-1-99-g67ceccacea  | 89.29 MFlops | 87.46 MFlops | 87.40 MFlops | target/mips: Set set_default_nan_mode with set_snan_bit_is_one  |
> | pull-target-arm-20210510-1-99-g67ceccacea  | 88.08 MFlops | 88.54 MFlops | 88.42 MFlops | target/mips: Set set_default_nan_mode with set_snan_bit_is_one  |
> | pull-target-arm-20210510-1-100-g8064a6d9d9 | 92.41 MFlops | 91.85 MFlops | 92.37 MFlops | softfloat: Do not produce a default_nan from parts_silence_nan  |
> | pull-target-arm-20210510-1-100-g8064a6d9d9 | 92.00 MFlops | 92.80 MFlops | 93.17 MFlops | softfloat: Do not produce a default_nan from parts_silence_nan  |
> | pull-target-arm-20210510-1-101-gc303832ddb | 92.27 MFlops | 91.76 MFlops | 91.56 MFlops | softfloat: Rename FloatParts to FloatParts64                    |
> | pull-target-arm-20210510-1-101-gc303832ddb | 92.64 MFlops | 92.73 MFlops | 92.54 MFlops |                                                                 |
> | pull-target-arm-20210510-1-110-g8c91cc4bfd | 94.34 MFlops | 93.50 MFlops | 94.00 MFlops | softfloat: Use pointers with parts_silence_nan                  |
> | pull-target-arm-20210510-1-111-g039cab1333 | 94.72 MFlops | 95.36 MFlops | 94.67 MFlops | softfloat: Rearrange FloatParts64                               |
> | pull-target-arm-20210510-1-111-g039cab1333 | 94.55 MFlops | 94.99 MFlops | 95.13 MFlops |                                                                 |
> | pull-target-arm-20210510-1-111-g039cab1333 | 95.55 MFlops | 94.72 MFlops | 95.55 MFlops |                                                                 |
> | pull-target-arm-20210510-1-112-g5de6cec92b | 87.99 MFlops | 87.98 MFlops | 88.64 MFlops | softfloat: Convert float128_silence_nan to parts                |
> | pull-target-arm-20210510-1-112-g5de6cec92b | 87.20 MFlops | 88.26 MFlops | 88.04 MFlops | softfloat: Convert float128_silence_nan to parts                |
> | pull-target-arm-20210510-1-113-g6eb5e07c28 | 88.01 MFlops | 87.70 MFlops | 87.69 MFlops | softfloat: Convert float128_default_nan to parts                |
> | pull-target-arm-20210510-1-113-g6eb5e07c28 | 87.88 MFlops | 87.76 MFlops | 87.20 MFlops | softfloat: Convert float128_default_nan to parts                |
> | pull-target-arm-20210510-1-114-g7a4f7331e4 | 84.38 MFlops | 84.55 MFlops | 86.92 MFlops | softfloat: Move return_nan to softfloat-parts.c.inc             |
> | pull-target-arm-20210510-1-115-g08f1f1f3ed | 90.40 MFlops | 89.79 MFlops | 90.74 MFlops | softfloat: Move pick_nan to softfloat-parts.c.inc               |
> | pull-target-arm-20210510-1-115-g08f1f1f3ed | 90.74 MFlops | 90.11 MFlops | 90.59 MFlops | softfloat: Move pick_nan to softfloat-parts.c.inc               |
> | pull-target-arm-20210510-1-116-g474eb5be10 | 87.84 MFlops | 87.04 MFlops | 88.25 MFlops | softfloat: Move pick_nan_muladd to softfloat-parts.c.inc        |
> | pull-target-arm-20210510-1-116-g474eb5be10 | 88.22 MFlops | 87.79 MFlops | 88.10 MFlops | softfloat: Move pick_nan_muladd to softfloat-parts.c.inc        |
> | pull-target-arm-20210510-1-117-g096a466c23 | 86.37 MFlops | 85.32 MFlops | 86.22 MFlops | softfloat: Move sf_canonicalize to softfloat-parts.c.inc        |
> | pull-target-arm-20210510-1-118-g973977719f | 85.41 MFlops | 84.75 MFlops | 83.47 MFlops | softfloat: Move round_canonical to softfloat-parts.c.inc        |
> | pull-target-arm-20210510-1-119-g89c1fd4763 | 85.29 MFlops | 86.27 MFlops | 85.33 MFlops | softfloat: Use uadd64_carry/usub64_borrow in softfloat-macros.h |
> | pull-target-arm-20210510-1-120-gfa24239a88 | 86.61 MFlops | 86.24 MFlops | 86.60 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc          |
> | pull-target-arm-20210510-1-120-gfa24239a88 | 86.86 MFlops | 86.43 MFlops | 86.38 MFlops |                                                                 |
> | pull-target-arm-20210510-1-120-gfa24239a88 | 86.57 MFlops | 86.57 MFlops | 86.25 MFlops |                                                                 |
> | pull-target-arm-20210510-1-121-g15cf4c773a | 74.07 MFlops | 73.24 MFlops | 73.53 MFlops |                                                                 |

If I'm reading your output properly, there should have been no change in the 
benchmarking through to this point, because all we have done so far is modify 
float64 and below.

Thus there seems to be a *lot* of noise in your numbers.

> | pull-target-arm-20210510-1-121-g15cf4c773a | 73.44 MFlops | 73.52 MFlops | 73.86 MFlops | softfloat: Implement float128_add/sub via parts                 |

This is where I would have expected a hit to the first column, and no change to 
the other two.

> | pull-target-arm-20210510-1-122-gc07a5fec10 | 73.38 MFlops | 73.01 MFlops | 72.94 MFlops | softfloat: Move mul_floats to softfloat-parts.c.inc             |
> | pull-target-arm-20210510-1-122-gc07a5fec10 | 73.41 MFlops | 72.32 MFlops | 73.64 MFlops |                                                                 |
> | pull-target-arm-20210510-1-122-gc07a5fec10 | 73.88 MFlops | 73.51 MFlops | 73.36 MFlops |                                                                 |
> | pull-target-arm-20210510-1-123-g88e1635abf | 69.04 MFlops | 69.33 MFlops | 69.51 MFlops | softfloat: Move muladd_floats to softfloat-parts.c.inc          |

Now the add and mul columns are going down when the only change is to
muladd?  Is this just more noise?

> | pull-target-arm-20210510-1-123-g88e1635abf | 69.84 MFlops | 70.15 MFlops | 69.46 MFlops |                                                                 |
> | pull-target-arm-20210510-1-123-g88e1635abf | 69.66 MFlops | 69.67 MFlops | 69.05 MFlops |                                                                 |
> | pull-target-arm-20210510-1-124-g449de2f64c | 69.26 MFlops | 67.87 MFlops | 68.56 MFlops | softfloat: Use mulu64 for mul64To128                            |
> | pull-target-arm-20210510-1-125-gd87617480b | 68.43 MFlops | 69.05 MFlops | 69.27 MFlops | softfloat: Use add192 in mul128To256                            |
> | pull-target-arm-20210510-1-126-gd030713711 | 69.08 MFlops | 68.88 MFlops | 69.32 MFlops | softfloat: Tidy mul128By64To192                                 |
> | pull-target-arm-20210510-1-127-gcb1a047f33 | 75.77 MFlops | 75.26 MFlops | 75.56 MFlops | softfloat: Introduce sh[lr]_double primitives                   |
> | pull-target-arm-20210510-1-140-g0bd7f06cb0 | 72.07 MFlops | 71.54 MFlops | 71.55 MFlops | softfloat: Split out parts_uncanon_normal                       |
> | pull-target-arm-20210510-1-141-g4ccbd2edf1 | 72.45 MFlops | 71.55 MFlops | 71.79 MFlops | softfloat: Reduce FloatFmt                                      |
> | pull-target-arm-20210510-1-159-g97028a57bb | 74.87 MFlops | 75.75 MFlops | 75.69 MFlops |                                                                 |
> | pull-target-arm-20210510-1-159-g97028a57bb | 73.34 MFlops | 73.42 MFlops | 72.93 MFlops | tests/fp: enable more tests                                     |
> | pull-target-arm-20210510-1-159-g97028a57bb | 75.61 MFlops | 75.72 MFlops | 75.30 MFlops |                                                                 |
> 
> 

r~


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 26/72] softfloat: Convert float128_silence_nan to parts
  2021-05-13  8:34   ` Alex Bennée
@ 2021-05-13 12:25     ` Richard Henderson
  0 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-13 12:25 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, david

On 5/13/21 3:34 AM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> This is the minimal change that also introduces float128_params,
>> float128_unpack_raw, and float128_pack_raw without running into
>> unused symbol Werrors.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   fpu/softfloat.c                | 96 +++++++++++++++++++++++++++++-----
>>   fpu/softfloat-specialize.c.inc | 25 +++------
>>   2 files changed, 89 insertions(+), 32 deletions(-)
>>
>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>> index 2d6f61ee7a..073b80d502 100644
>> --- a/fpu/softfloat.c
>> +++ b/fpu/softfloat.c
>> @@ -500,14 +500,12 @@ static inline __attribute__((unused)) bool is_qnan(FloatClass c)
>>   }
>>   
>>   /*
>> - * Structure holding all of the decomposed parts of a float. The
>> - * exponent is unbiased and the fraction is normalized. All
>> - * calculations are done with a 64 bit fraction and then rounded as
>> - * appropriate for the final format.
>> + * Structure holding all of the decomposed parts of a float.
>> + * The exponent is unbiased and the fraction is normalized.
>>    *
>> - * Thanks to the packed FloatClass a decent compiler should be able to
>> - * fit the whole structure into registers and avoid using the stack
>> - * for parameter passing.
>> + * The fraction words are stored in big-endian word ordering,
>> + * so that truncation from a larger format to a smaller format
>> + * can be done simply by ignoring subsequent elements.
>>    */
>>   
>>   typedef struct {
>> @@ -526,6 +524,15 @@ typedef struct {
>>       };
>>   } FloatParts64;
>>   
>> +typedef struct {
>> +    FloatClass cls;
>> +    bool sign;
>> +    int32_t exp;
>> +    uint64_t frac_hi;
>> +    uint64_t frac_lo;
>> +} FloatParts128;
>> +
>> +/* These apply to the most significant word of each FloatPartsN. */
>>   #define DECOMPOSED_BINARY_POINT    63
>>   #define DECOMPOSED_IMPLICIT_BIT    (1ull << DECOMPOSED_BINARY_POINT)
>>   
>> @@ -561,11 +568,11 @@ typedef struct {
>>       .exp_bias       = ((1 << E) - 1) >> 1,                           \
>>       .exp_max        = (1 << E) - 1,                                  \
>>       .frac_size      = F,                                             \
>> -    .frac_shift     = DECOMPOSED_BINARY_POINT - F,                   \
>> -    .frac_lsb       = 1ull << (DECOMPOSED_BINARY_POINT - F),         \
>> -    .frac_lsbm1     = 1ull << ((DECOMPOSED_BINARY_POINT - F) - 1),   \
>> -    .round_mask     = (1ull << (DECOMPOSED_BINARY_POINT - F)) - 1,   \
>> -    .roundeven_mask = (2ull << (DECOMPOSED_BINARY_POINT - F)) - 1
>> +    .frac_shift     = (-F - 1) & 63,                                 \
>> +    .frac_lsb       = 1ull << ((-F - 1) & 63),                       \
>> +    .frac_lsbm1     = 1ull << ((-F - 2) & 63),                       \
>> +    .round_mask     = (1ull << ((-F - 1) & 63)) - 1,                 \
>> +    .roundeven_mask = (2ull << ((-F - 1) & 63)) - 1
>>
> 
> I have to admit I find the switch to (-F - 1) & 63 a little
> black-magical. Isn't the shift always going to be a function of the
> number of exponent bits we need to move past and the natural size of
> the original float?

Yep.  But now we're looking to compute the number relative to .frac_lo, rather 
than the entire logical fraction.
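
To spell out the arithmetic: (-F - 1) & 63 is (64*N - F - 1) mod 64,
i.e. the shift within the least-significant fraction word of an N-word
format.  For float64 (F = 52, N = 1) that gives 11, identical to the
old DECOMPOSED_BINARY_POINT - F; for float128 (F = 112, N = 2) it
gives 15, the shift within frac_lo, where the old single-word formula
would go negative.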


r~

> 
> Anyway my personal brain twisting aside it obviously works and
> everything else looks fine so:
> 
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> 



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 29/72] softfloat: Move pick_nan to softfloat-parts.c.inc
  2021-05-12 18:16   ` David Hildenbrand
@ 2021-05-13 12:28     ` Richard Henderson
  0 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-13 12:28 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: alex.bennee

On 5/12/21 1:16 PM, David Hildenbrand wrote:
> I find the "*parts_pick_nan(&a, &b, s);" part a little ugly to read...

Yeah, thankfully that's transitional and gets cleaned up later.


r~


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 33/72] softfloat: Use uadd64_carry, usub64_borrow in softfloat-macros.h
  2021-05-13 10:00   ` Alex Bennée
@ 2021-05-13 12:38     ` Richard Henderson
  0 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-13 12:38 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, david

On 5/13/21 5:00 AM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> Use compiler support for carry arithmetic.
> 
>> Didn't we have a series that attempted to take advantage of compiler
>> support for Int128? Does the compiler end up smart enough to use
>> 128-bit wide ops where it can?

That was one of the iterations of this patch set, yes.  In the end it
turned out to be easier to address frac_hi and frac_lo directly.

It's likely that this choice would have been different if we didn't have to 
deal with 32-bit hosts and could rely on the compiler having the 128-bit type.
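
As a rough sketch of what the frac_hi/frac_lo form looks like on top
of the carry wrapper (illustrative only; frac128_add is a made-up
name, and the real uadd64_carry in host-utils.h also has a portable
fallback for compilers without the clang builtin):

  #include <stdbool.h>
  #include <stdint.h>

  static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
  {
      /* assumes __builtin_addcll is available (clang) */
      unsigned long long c = *pcarry;
      x = __builtin_addcll(x, y, c, &c);
      *pcarry = c & 1;
      return x;
  }

  static inline void frac128_add(uint64_t *rhi, uint64_t *rlo,
                                 uint64_t ahi, uint64_t alo,
                                 uint64_t bhi, uint64_t blo)
  {
      bool carry = false;

      *rlo = uadd64_carry(alo, blo, &carry);  /* low word first */
      *rhi = uadd64_carry(ahi, bhi, &carry);  /* then propagate the carry */
  }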


r~


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH 00/72] Convert floatx80 and float128 to FloatParts
  2021-05-13 11:49   ` Richard Henderson
@ 2021-05-13 13:33     ` Alex Bennée
  2021-05-13 23:54       ` Richard Henderson
  0 siblings, 1 reply; 151+ messages in thread
From: Alex Bennée @ 2021-05-13 13:33 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> On 5/12/21 2:23 PM, Alex Bennée wrote:
>> Richard Henderson <richard.henderson@linaro.org> writes:
>> 
>>> Reorg everything using QEMU_GENERIC and multiple inclusion to
>>> reduce the amount of code duplication between the formats.
>>>
>>> The use of QEMU_GENERIC means that we need to use pointers instead
>>> of structures, which means that even the smaller float formats
>>> need rearranging.
>>>
>>> I've carried it through to completion within fpu/, so that we don't
>>> have (much) of the legacy code remaining.  There is some floatx80
>>> stuff in target/m68k and target/i386 that's still hanging around.
>> OK, and here are some quad benchmarks. There is actual change above
>> the noise; I think the biggest hit comes from the parts conversion,
>> but we do claw some of it back:
>> * Run Quad Benchmarks
>> #+name: run-quad-float-benchmarks
>> #+begin_src sh :results output table append
>>    commit=$(git describe)
>>    add=$(./tests/fp/fp-bench add -p quad)
>>    mul=$(./tests/fp/fp-bench mul -p quad)
>>    muladd=$(./tests/fp/fp-bench mulAdd -p quad)
>>    desc=$(git log --format="format:%s" HEAD^..)
>>    echo "$commit,$add,$mul,$muladd,$desc"
>> #+end_src
>> #+RESULTS: run-quad-float-benchmarks
>> | pull-target-arm-20210510-1-91-g0fe775d52c  | 90.28 MFlops | 90.15 MFlops | 90.75 MFlops |                                                                 |
>> | pull-target-arm-20210510-1-92-gf7a6dabee2  | 90.80 MFlops | 89.92 MFlops | 90.66 MFlops |                                                                 |
>> | pull-target-arm-20210510-1-93-gdb71c9fd28  | 88.93 MFlops | 89.10 MFlops | 87.32 MFlops |                                                                 |
>> | pull-target-arm-20210510-1-93-gdb71c9fd28  | 88.85 MFlops | 88.83 MFlops | 88.53 MFlops |                                                                 |
>> | pull-target-arm-20210510-1-94-g900ea1f79d  | 87.10 MFlops | 88.02 MFlops | 88.22 MFlops |                                                                 |
>> | pull-target-arm-20210510-1-95-gdb0bb2966f  | 88.11 MFlops | 87.10 MFlops | 87.48 MFlops | softfloat: Tidy a * b + inf return                              |
>> | pull-target-arm-20210510-1-95-gdb0bb2966f  | 87.27 MFlops | 84.86 MFlops | 87.99 MFlops |                                                                 |
>> | pull-target-arm-20210510-1-95-gdb0bb2966f  | 87.56 MFlops | 88.31 MFlops | 88.41 MFlops | softfloat: Tidy a * b + inf return                              |
>> | pull-target-arm-20210510-1-96-gec2be8ad0c  | 88.12 MFlops | 88.88 MFlops | 89.09 MFlops | softfloat: Add float_cmask and constants                        |
>> | pull-target-arm-20210510-1-97-g2328f560a1  | 91.18 MFlops | 91.84 MFlops | 91.30 MFlops | softfloat: Use return_nan in float_to_float                     |
>> | pull-target-arm-20210510-1-97-g2328f560a1  | 90.07 MFlops | 91.16 MFlops | 91.14 MFlops | softfloat: Use return_nan in float_to_float                     |
>> | pull-target-arm-20210510-1-98-g89e2096c6f  | 87.54 MFlops | 87.71 MFlops | 87.90 MFlops | softfloat: fix return_nan vs default_nan_mode                   |
>> | pull-target-arm-20210510-1-98-g89e2096c6f  | 87.57 MFlops | 83.80 MFlops | 85.95 MFlops | softfloat: fix return_nan vs default_nan_mode                   |
>> | pull-target-arm-20210510-1-99-g67ceccacea  | 89.29 MFlops | 87.46 MFlops | 87.40 MFlops | target/mips: Set set_default_nan_mode with set_snan_bit_is_one  |
>> | pull-target-arm-20210510-1-99-g67ceccacea  | 88.08 MFlops | 88.54 MFlops | 88.42 MFlops | target/mips: Set set_default_nan_mode with set_snan_bit_is_one  |
>> | pull-target-arm-20210510-1-100-g8064a6d9d9 | 92.41 MFlops | 91.85 MFlops | 92.37 MFlops | softfloat: Do not produce a default_nan from parts_silence_nan  |
>> | pull-target-arm-20210510-1-100-g8064a6d9d9 | 92.00 MFlops | 92.80 MFlops | 93.17 MFlops | softfloat: Do not produce a default_nan from parts_silence_nan  |
>> | pull-target-arm-20210510-1-101-gc303832ddb | 92.27 MFlops | 91.76 MFlops | 91.56 MFlops | softfloat: Rename FloatParts to FloatParts64                    |
>> | pull-target-arm-20210510-1-101-gc303832ddb | 92.64 MFlops | 92.73 MFlops | 92.54 MFlops |                                                                 |
>> | pull-target-arm-20210510-1-110-g8c91cc4bfd | 94.34 MFlops | 93.50 MFlops | 94.00 MFlops | softfloat: Use pointers with parts_silence_nan                  |
>> | pull-target-arm-20210510-1-111-g039cab1333 | 94.72 MFlops | 95.36 MFlops | 94.67 MFlops | softfloat: Rearrange FloatParts64                               |
>> | pull-target-arm-20210510-1-111-g039cab1333 | 94.55 MFlops | 94.99 MFlops | 95.13 MFlops |                                                                 |
>> | pull-target-arm-20210510-1-111-g039cab1333 | 95.55 MFlops | 94.72 MFlops | 95.55 MFlops |                                                                 |
>> | pull-target-arm-20210510-1-112-g5de6cec92b | 87.99 MFlops | 87.98 MFlops | 88.64 MFlops | softfloat: Convert float128_silence_nan to parts                |
>> | pull-target-arm-20210510-1-112-g5de6cec92b | 87.20 MFlops | 88.26 MFlops | 88.04 MFlops | softfloat: Convert float128_silence_nan to parts                |
>> | pull-target-arm-20210510-1-113-g6eb5e07c28 | 88.01 MFlops | 87.70 MFlops | 87.69 MFlops | softfloat: Convert float128_default_nan to parts                |
>> | pull-target-arm-20210510-1-113-g6eb5e07c28 | 87.88 MFlops | 87.76 MFlops | 87.20 MFlops | softfloat: Convert float128_default_nan to parts                |
>> | pull-target-arm-20210510-1-114-g7a4f7331e4 | 84.38 MFlops | 84.55 MFlops | 86.92 MFlops | softfloat: Move return_nan to softfloat-parts.c.inc             |
>> | pull-target-arm-20210510-1-115-g08f1f1f3ed | 90.40 MFlops | 89.79 MFlops | 90.74 MFlops | softfloat: Move pick_nan to softfloat-parts.c.inc               |
>> | pull-target-arm-20210510-1-115-g08f1f1f3ed | 90.74 MFlops | 90.11 MFlops | 90.59 MFlops | softfloat: Move pick_nan to softfloat-parts.c.inc               |
>> | pull-target-arm-20210510-1-116-g474eb5be10 | 87.84 MFlops | 87.04 MFlops | 88.25 MFlops | softfloat: Move pick_nan_muladd to softfloat-parts.c.inc        |
>> | pull-target-arm-20210510-1-116-g474eb5be10 | 88.22 MFlops | 87.79 MFlops | 88.10 MFlops | softfloat: Move pick_nan_muladd to softfloat-parts.c.inc        |
>> | pull-target-arm-20210510-1-117-g096a466c23 | 86.37 MFlops | 85.32 MFlops | 86.22 MFlops | softfloat: Move sf_canonicalize to softfloat-parts.c.inc        |
>> | pull-target-arm-20210510-1-118-g973977719f | 85.41 MFlops | 84.75 MFlops | 83.47 MFlops | softfloat: Move round_canonical to softfloat-parts.c.inc        |
>> | pull-target-arm-20210510-1-119-g89c1fd4763 | 85.29 MFlops | 86.27 MFlops | 85.33 MFlops | softfloat: Use uadd64_carry/usub64_borrow in softfloat-macros.h |
>> | pull-target-arm-20210510-1-120-gfa24239a88 | 86.61 MFlops | 86.24 MFlops | 86.60 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc          |
>> | pull-target-arm-20210510-1-120-gfa24239a88 | 86.86 MFlops | 86.43 MFlops | 86.38 MFlops |                                                                 |
>> | pull-target-arm-20210510-1-120-gfa24239a88 | 86.57 MFlops | 86.57 MFlops | 86.25 MFlops |                                                                 |
>> | pull-target-arm-20210510-1-121-g15cf4c773a | 74.07 MFlops | 73.24 MFlops | 73.53 MFlops |                                                                 |
>
> If I'm reading your output properly, there should have been no change
> in the benchmarking through to this point, because all we have done so
> far is modify float64 and below.
>
> Thus there seems to be a *lot* of noise in your numbers.

That's why some of the passes were run multiple times. Although the
results are sorted in commit order, I did 3 separate passes through the
code when I was gathering the numbers. However, you are right to say
they are indicative rather than gathered under strict benchmarking
conditions - I've just re-run on 3 commits:

  #+RESULTS: run-quad-float-benchmarks
  | pull-target-arm-20210510-1-95-gdb0bb2966f | 80.97 MFlops | 84.22 MFlops | 81.71 MFlops | softfloat: Tidy a * b + inf return          |
  | pull-target-arm-20210510-1-95-gdb0bb2966f | 82.51 MFlops | 79.53 MFlops | 84.27 MFlops | softfloat: Tidy a * b + inf return          |
  | pull-target-arm-20210510-1-95-gdb0bb2966f | 84.36 MFlops | 85.77 MFlops | 81.84 MFlops | softfloat: Tidy a * b + inf return          |
  | pull-target-arm-20210510-1-95-gdb0bb2966f | 85.28 MFlops | 84.05 MFlops | 87.40 MFlops | softfloat: Tidy a * b + inf return          |
  | pull-target-arm-20210510-1-95-gdb0bb2966f | 85.27 MFlops | 87.66 MFlops | 85.53 MFlops | softfloat: Tidy a * b + inf return          |
  | pull-target-arm-20210510-1-95-gdb0bb2966f | 85.23 MFlops | 87.64 MFlops | 87.05 MFlops | softfloat: Tidy a * b + inf return          |
  | pull-target-arm-20210510-1-95-gdb0bb2966f | 86.19 MFlops | 85.73 MFlops | 85.61 MFlops | softfloat: Tidy a * b + inf return          |
  | pull-target-arm-20210510-1-95-gdb0bb2966f | 85.04 MFlops | 85.36 MFlops | 86.28 MFlops | softfloat: Tidy a * b + inf return          |
  | pull-target-arm-20210510-1-95-gdb0bb2966f | 86.00 MFlops | 86.65 MFlops | 88.30 MFlops | softfloat: Tidy a * b + inf return          |
  | pull-target-arm-20210510-1-95-gdb0bb2966f | 86.94 MFlops | 88.36 MFlops | 86.94 MFlops | softfloat: Tidy a * b + inf return          |
  | pull-target-arm-20210510-1-95-gdb0bb2966f | 86.62 MFlops | 88.44 MFlops | 87.74 MFlops | softfloat: Tidy a * b + inf return          |
  | pull-target-arm-20210510-1-95-gdb0bb2966f | 87.42 MFlops | 87.89 MFlops | 87.85 MFlops | softfloat: Tidy a * b + inf return          |
  | pull-target-arm-20210510-1-95-gdb0bb2966f | 86.87 MFlops | 88.31 MFlops | 87.30 MFlops | softfloat: Tidy a * b + inf return          |
  | pull-target-arm-20210510-1-95-gdb0bb2966f | 88.00 MFlops | 88.12 MFlops | 88.63 MFlops | softfloat: Tidy a * b + inf return          |
  | pull-target-arm-20210510-1-95-gdb0bb2966f | 88.16 MFlops | 87.84 MFlops | 86.42 MFlops | softfloat: Tidy a * b + inf return          |
  | pull-target-arm-20210510-1-95-gdb0bb2966f | 88.35 MFlops | 86.78 MFlops | 88.01 MFlops | softfloat: Tidy a * b + inf return          |
  | pull-target-arm-20210510-1-95-gdb0bb2966f | 87.76 MFlops | 88.01 MFlops | 88.02 MFlops | softfloat: Tidy a * b + inf return          |
  | pull-target-arm-20210510-1-95-gdb0bb2966f | 88.46 MFlops | 88.10 MFlops | 87.95 MFlops | softfloat: Tidy a * b + inf return          |
  | pull-target-arm-20210510-1-96-gec2be8ad0c | 83.80 MFlops | 88.45 MFlops | 87.70 MFlops | softfloat: Add float_cmask and constants    |
  | pull-target-arm-20210510-1-96-gec2be8ad0c | 88.30 MFlops | 88.55 MFlops | 88.42 MFlops | softfloat: Add float_cmask and constants    |
  | pull-target-arm-20210510-1-96-gec2be8ad0c | 87.96 MFlops | 86.73 MFlops | 88.58 MFlops | softfloat: Add float_cmask and constants    |
  | pull-target-arm-20210510-1-96-gec2be8ad0c | 87.23 MFlops | 87.42 MFlops | 88.14 MFlops | softfloat: Add float_cmask and constants    |
  | pull-target-arm-20210510-1-96-gec2be8ad0c | 87.41 MFlops | 87.75 MFlops | 88.26 MFlops | softfloat: Add float_cmask and constants    |
  | pull-target-arm-20210510-1-96-gec2be8ad0c | 88.28 MFlops | 88.39 MFlops | 87.51 MFlops | softfloat: Add float_cmask and constants    |
  | pull-target-arm-20210510-1-96-gec2be8ad0c | 86.68 MFlops | 88.43 MFlops | 88.23 MFlops | softfloat: Add float_cmask and constants    |
  | pull-target-arm-20210510-1-96-gec2be8ad0c | 86.87 MFlops | 88.04 MFlops | 87.93 MFlops | softfloat: Add float_cmask and constants    |
  | pull-target-arm-20210510-1-96-gec2be8ad0c | 86.60 MFlops | 88.19 MFlops | 87.46 MFlops | softfloat: Add float_cmask and constants    |
  | pull-target-arm-20210510-1-96-gec2be8ad0c | 88.58 MFlops | 88.18 MFlops | 88.27 MFlops | softfloat: Add float_cmask and constants    |
  | pull-target-arm-20210510-1-96-gec2be8ad0c | 86.85 MFlops | 88.73 MFlops | 88.37 MFlops | softfloat: Add float_cmask and constants    |
  | pull-target-arm-20210510-1-96-gec2be8ad0c | 87.25 MFlops | 88.70 MFlops | 88.04 MFlops | softfloat: Add float_cmask and constants    |
  | pull-target-arm-20210510-1-97-g2328f560a1 | 89.23 MFlops | 90.74 MFlops | 91.03 MFlops | softfloat: Use return_nan in float_to_float |
  | pull-target-arm-20210510-1-97-g2328f560a1 | 90.40 MFlops | 90.79 MFlops | 91.23 MFlops | softfloat: Use return_nan in float_to_float |
  | pull-target-arm-20210510-1-97-g2328f560a1 | 91.19 MFlops | 91.13 MFlops | 90.99 MFlops | softfloat: Use return_nan in float_to_float |
  | pull-target-arm-20210510-1-97-g2328f560a1 | 91.06 MFlops | 91.38 MFlops | 90.86 MFlops | softfloat: Use return_nan in float_to_float |
  | pull-target-arm-20210510-1-97-g2328f560a1 | 89.91 MFlops | 89.57 MFlops | 91.09 MFlops | softfloat: Use return_nan in float_to_float |
  | pull-target-arm-20210510-1-97-g2328f560a1 | 91.30 MFlops | 91.65 MFlops | 91.64 MFlops | softfloat: Use return_nan in float_to_float |
  | pull-target-arm-20210510-1-97-g2328f560a1 | 90.62 MFlops | 91.00 MFlops | 91.35 MFlops | softfloat: Use return_nan in float_to_float |
  | pull-target-arm-20210510-1-97-g2328f560a1 | 91.38 MFlops | 91.55 MFlops | 90.80 MFlops | softfloat: Use return_nan in float_to_float |
  | pull-target-arm-20210510-1-97-g2328f560a1 | 90.64 MFlops | 91.50 MFlops | 88.21 MFlops | softfloat: Use return_nan in float_to_float |
  | pull-target-arm-20210510-1-97-g2328f560a1 | 90.13 MFlops | 91.20 MFlops | 91.16 MFlops | softfloat: Use return_nan in float_to_float |

So while the first commit showed nearly 10% variation over multiple
runs (cache effects?), I think the step between float_cmask and
return_nan is a real effect.

I wonder if it would be worth tweaking the fp-bench program to make the
numbers a bit more stable? For total execution time runs I use
hyperfine, which at least tries to measure the deviation between runs
and converge on a stable number, but these numbers are reported
directly by the fp-bench tool.
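
Something along these lines inside the benchmark's timing loop might
help - a minimal sketch, not fp-bench's actual internals - repeating
the timed section and reporting the median rather than a single
sample:

  #include <stdint.h>
  #include <stdlib.h>

  static int cmp_u64(const void *a, const void *b)
  {
      uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
      return (x > y) - (x < y);
  }

  /* Report the median of n timing samples instead of one run. */
  static uint64_t median_ns(uint64_t *samples, size_t n)
  {
      qsort(samples, n, sizeof(*samples), cmp_u64);
      return samples[n / 2];
  }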

>
>> | pull-target-arm-20210510-1-121-g15cf4c773a | 73.44 MFlops | 73.52 MFlops | 73.86 MFlops | softfloat: Implement float128_add/sub via parts                 |
>
> This is where I would have expected a hit to the first column, and no
> change to the other two.

The effect looks very real:

  #+RESULTS: run-quad-float-benchmarks
  | pull-target-arm-20210510-1-120-gfa24239a88 | 86.32 MFlops | 82.89 MFlops | 85.81 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-120-gfa24239a88 | 86.31 MFlops | 85.83 MFlops | 85.26 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-120-gfa24239a88 | 85.89 MFlops | 86.99 MFlops | 86.57 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-120-gfa24239a88 | 83.82 MFlops | 85.01 MFlops | 85.83 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-120-gfa24239a88 | 85.51 MFlops | 84.93 MFlops | 86.27 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-120-gfa24239a88 | 85.90 MFlops | 85.15 MFlops | 85.42 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-120-gfa24239a88 | 83.39 MFlops | 85.73 MFlops | 85.28 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-120-gfa24239a88 | 86.03 MFlops | 85.73 MFlops | 84.60 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-120-gfa24239a88 | 85.38 MFlops | 85.18 MFlops | 86.69 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-120-gfa24239a88 | 84.01 MFlops | 84.48 MFlops | 85.49 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-120-gfa24239a88 | 86.56 MFlops | 84.50 MFlops | 85.12 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-120-gfa24239a88 | 86.27 MFlops | 85.39 MFlops | 85.23 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 73.14 MFlops | 71.95 MFlops | 72.78 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 73.70 MFlops | 71.64 MFlops | 73.36 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 72.26 MFlops | 73.44 MFlops | 72.86 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 72.95 MFlops | 72.24 MFlops | 72.26 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 73.00 MFlops | 72.78 MFlops | 72.92 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 72.10 MFlops | 73.36 MFlops | 72.71 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 73.23 MFlops | 72.76 MFlops | 73.48 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 71.91 MFlops | 71.54 MFlops | 73.01 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 72.08 MFlops | 73.75 MFlops | 72.17 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 73.21 MFlops | 73.69 MFlops | 72.65 MFlops | softfloat: Implement float128_add/sub via parts        |

I wonder if it is an effect of the wider code re-organisation affecting
cache lines, or some other residency effect in the other functions?
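
One way to poke at that would be to compare instruction-cache
behaviour between the two builds with perf (the event names vary by
CPU, and I'm quoting the fp-bench flags from memory):

  perf stat -e instructions,L1-icache-load-misses \
      ./tests/fp/fp-bench -t soft -p quad -o add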

>
>> | pull-target-arm-20210510-1-122-gc07a5fec10 | 73.38 MFlops | 73.01 MFlops | 72.94 MFlops | softfloat: Move mul_floats to softfloat-parts.c.inc             |
>> | pull-target-arm-20210510-1-122-gc07a5fec10 | 73.41 MFlops | 72.32 MFlops | 73.64 MFlops |                                                                 |
>> | pull-target-arm-20210510-1-122-gc07a5fec10 | 73.88 MFlops | 73.51 MFlops | 73.36 MFlops |                                                                 |
>> | pull-target-arm-20210510-1-123-g88e1635abf | 69.04 MFlops | 69.33 MFlops | 69.51 MFlops | softfloat: Move muladd_floats to softfloat-parts.c.inc          |
>
> Now add and mul columns are going down when the only change is to
> muladd?  Is this just more noise?

Running it again more times, I think it is a real effect:

  #+RESULTS: run-quad-float-benchmarks
  | pull-target-arm-20210510-1-120-gfa24239a88 | 85.99 MFlops | 84.44 MFlops | 85.98 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-120-gfa24239a88 | 86.77 MFlops | 85.99 MFlops | 85.02 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-120-gfa24239a88 | 87.07 MFlops | 86.22 MFlops | 85.65 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-120-gfa24239a88 | 84.49 MFlops | 85.94 MFlops | 85.49 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-120-gfa24239a88 | 85.75 MFlops | 85.05 MFlops | 85.65 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-120-gfa24239a88 | 85.51 MFlops | 86.55 MFlops | 84.26 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-120-gfa24239a88 | 85.68 MFlops | 86.39 MFlops | 85.05 MFlops | softfloat: Move addsub_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 71.73 MFlops | 72.85 MFlops | 73.68 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 72.32 MFlops | 72.05 MFlops | 72.44 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 72.03 MFlops | 73.92 MFlops | 72.03 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 71.59 MFlops | 73.25 MFlops | 72.97 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 72.23 MFlops | 73.55 MFlops | 73.47 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 73.59 MFlops | 73.03 MFlops | 72.76 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 72.60 MFlops | 71.20 MFlops | 73.73 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 72.91 MFlops | 73.22 MFlops | 72.98 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 73.27 MFlops | 70.69 MFlops | 72.62 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 73.04 MFlops | 73.35 MFlops | 73.13 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 72.74 MFlops | 71.59 MFlops | 72.33 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 73.20 MFlops | 73.35 MFlops | 72.61 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 72.72 MFlops | 72.79 MFlops | 73.29 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 72.97 MFlops | 73.58 MFlops | 72.72 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 73.73 MFlops | 73.20 MFlops | 72.74 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 72.42 MFlops | 73.61 MFlops | 73.37 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 72.88 MFlops | 73.36 MFlops | 73.55 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 73.06 MFlops | 72.64 MFlops | 72.75 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 73.78 MFlops | 73.31 MFlops | 73.97 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 72.78 MFlops | 73.68 MFlops | 73.65 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 72.58 MFlops | 73.01 MFlops | 72.94 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-121-g15cf4c773a | 72.48 MFlops | 71.77 MFlops | 73.38 MFlops | softfloat: Implement float128_add/sub via parts        |
  | pull-target-arm-20210510-1-122-gc07a5fec10 | 72.23 MFlops | 73.22 MFlops | 73.23 MFlops | softfloat: Move mul_floats to softfloat-parts.c.inc    |
  | pull-target-arm-20210510-1-122-gc07a5fec10 | 72.84 MFlops | 72.80 MFlops | 73.42 MFlops | softfloat: Move mul_floats to softfloat-parts.c.inc    |
  | pull-target-arm-20210510-1-122-gc07a5fec10 | 73.24 MFlops | 72.20 MFlops | 73.68 MFlops | softfloat: Move mul_floats to softfloat-parts.c.inc    |
  | pull-target-arm-20210510-1-122-gc07a5fec10 | 71.59 MFlops | 72.49 MFlops | 73.79 MFlops | softfloat: Move mul_floats to softfloat-parts.c.inc    |
  | pull-target-arm-20210510-1-122-gc07a5fec10 | 73.66 MFlops | 72.63 MFlops | 73.41 MFlops | softfloat: Move mul_floats to softfloat-parts.c.inc    |
  | pull-target-arm-20210510-1-122-gc07a5fec10 | 73.40 MFlops | 73.51 MFlops | 73.38 MFlops | softfloat: Move mul_floats to softfloat-parts.c.inc    |
  | pull-target-arm-20210510-1-122-gc07a5fec10 | 73.47 MFlops | 73.80 MFlops | 73.07 MFlops | softfloat: Move mul_floats to softfloat-parts.c.inc    |
  | pull-target-arm-20210510-1-122-gc07a5fec10 | 73.41 MFlops | 73.78 MFlops | 73.10 MFlops | softfloat: Move mul_floats to softfloat-parts.c.inc    |
  | pull-target-arm-20210510-1-122-gc07a5fec10 | 72.38 MFlops | 72.54 MFlops | 72.09 MFlops | softfloat: Move mul_floats to softfloat-parts.c.inc    |
  | pull-target-arm-20210510-1-123-g88e1635abf | 68.45 MFlops | 67.83 MFlops | 66.24 MFlops | softfloat: Move muladd_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-123-g88e1635abf | 69.52 MFlops | 68.83 MFlops | 69.40 MFlops | softfloat: Move muladd_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-123-g88e1635abf | 69.39 MFlops | 68.94 MFlops | 69.41 MFlops | softfloat: Move muladd_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-123-g88e1635abf | 68.71 MFlops | 69.38 MFlops | 69.78 MFlops | softfloat: Move muladd_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-123-g88e1635abf | 69.51 MFlops | 69.38 MFlops | 68.44 MFlops | softfloat: Move muladd_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-123-g88e1635abf | 69.77 MFlops | 69.53 MFlops | 69.12 MFlops | softfloat: Move muladd_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-123-g88e1635abf | 69.42 MFlops | 69.54 MFlops | 69.72 MFlops | softfloat: Move muladd_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-123-g88e1635abf | 68.03 MFlops | 69.33 MFlops | 68.57 MFlops | softfloat: Move muladd_floats to softfloat-parts.c.inc |
  | pull-target-arm-20210510-1-123-g88e1635abf | 69.48 MFlops | 69.07 MFlops | 69.68 MFlops | softfloat: Move muladd_floats to softfloat-parts.c.inc |

-- 
Alex Bennée



* Re: [PATCH 45/72] softfloat: Move round_to_int to softfloat-parts.c.inc
  2021-05-08  1:47 ` [PATCH 45/72] softfloat: Move round_to_int to softfloat-parts.c.inc Richard Henderson
@ 2021-05-13 14:10   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-13 14:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> At the same time, convert to pointers, split out
> parts$N_round_to_int_normal, define a macro for
> parts_round_to_int using QEMU_GENERIC.
>
> This necessarily meant some rearrangement to the
> rount_to_{,u}int_and_pack routines, so go ahead and
> convert to parts_round_to_int_normal, which in turn
> allows cleaning up of the raised exception handling.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
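
The dispatch described above boils down to something like this - a
sketch using C11 _Generic with illustrative parameter lists; the
tree's QEMU_GENERIC macro achieves the equivalent type-based selection
by other means:

  #define PARTS_GENERIC_64_128(NAME, P)                 \
      _Generic((P), FloatParts64 *: parts64_##NAME,     \
                    FloatParts128 *: parts128_##NAME)

  #define parts_round_to_int(P, R, S, F) \
      PARTS_GENERIC_64_128(round_to_int, P)(P, R, S, F)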

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 46/72] softfloat: Move rount_to_int_and_pack to softfloat-parts.c.inc
  2021-05-08  1:47 ` [PATCH 46/72] softfloat: Move rount_to_int_and_pack " Richard Henderson
@ 2021-05-13 14:11   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-13 14:11 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Rename to parts$N_float_to_sint.  Reimplement
> float128_to_int{32,64}{_round_to_zero} with FloatParts128.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 58/72] tests/fp/fp-test: Reverse order of floatx80 precision tests
  2021-05-08  1:47 ` [PATCH 58/72] tests/fp/fp-test: Reverse order of floatx80 precision tests Richard Henderson
@ 2021-05-13 14:12   ` Alex Bennée
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-13 14:12 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Many qemu softfloat functions will check floatx80_rounding_precision
> even when berkeley testfloat will not.  So begin with
> floatx80_precision_x, so that's the one we use
> when !FUNC_EFF_ROUNDINGPRECISION.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée



* Re: [PATCH 00/72] Convert floatx80 and float128 to FloatParts
  2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
                   ` (75 preceding siblings ...)
  2021-05-12 19:23 ` Alex Bennée
@ 2021-05-13 14:13 ` Alex Bennée
  76 siblings, 0 replies; 151+ messages in thread
From: Alex Bennée @ 2021-05-13 14:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, david


Richard Henderson <richard.henderson@linaro.org> writes:

> Reorg everything using QEMU_GENERIC and multiple inclusion to
> reduce the amount of code duplication between the formats.
>
> The use of QEMU_GENERIC means that we need to use pointers instead
> of structures, which means that even the smaller float formats
> need rearranging.
>
> I've carried it through to completion within fpu/, so that we don't
> have (much) of the legacy code remaining.  There is some floatx80
> stuff in target/m68k and target/i386 that's still hanging around.

I'm going to take a break from reviewing now that I've been through
about 2/3rds of the patches. Overall I think the series is in great
shape, and while the performance modulations are interesting they are
not a blocker from my point of view. I'll happily take a small hit to
performance for a more unified (and correct!) code base. However, the
maintainers of the affected frontends may take a different view.

-- 
Alex Bennée



* Re: [PATCH 00/72] Convert floatx80 and float128 to FloatParts
  2021-05-13 13:33     ` Alex Bennée
@ 2021-05-13 23:54       ` Richard Henderson
  0 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-13 23:54 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, david

On 5/13/21 8:33 AM, Alex Bennée wrote:
>> Now add and mul columns are going down when the only change is to
>> muladd?  Is this just more noise?
> 
> Running again more times I think it is a real effect:

I don't believe it.  If the source code for a given function is not
changing, then the generated code should not change (much, especially
with FLATTEN), and thus the runtime should not change (much).

Are you absolutely sure that you're measuring what you think you are measuring?

Is your compiler mis-behaving somehow and not inlining stuff?
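
A quick way to sanity-check that would be to compare the out-of-line
symbols between the two builds with standard binutils (the binary path
here is a guess):

  nm --size-sort --print-size build/tests/fp/fp-bench | grep float128

If helpers that should have been folded in by FLATTEN show up as
separate symbols, or their sizes jump between commits, that would
explain movement in columns whose source did not change.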


r~



* Re: [PATCH 50/72] softfloat: Move minmax_flags to softfloat-parts.c.inc
  2021-05-08  1:47 ` [PATCH 50/72] softfloat: Move minmax_flags " Richard Henderson
@ 2021-05-17 13:14   ` David Hildenbrand
  2021-05-17 15:57     ` Richard Henderson
  0 siblings, 1 reply; 151+ messages in thread
From: David Hildenbrand @ 2021-05-17 13:14 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 08.05.21 03:47, Richard Henderson wrote:
> Rename to parts$N_minmax.  Combine 3 bool arguments to a bitmask,
> return a tri-state value to indicate nan vs unchanged operand.
> Introduce ftype_minmax functions as a common optimization point.
> Fold bfloat16 expansions into the same macro as the other types.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   fpu/softfloat.c           | 216 ++++++++++++++++----------------------
>   fpu/softfloat-parts.c.inc |  69 ++++++++++++
>   2 files changed, 158 insertions(+), 127 deletions(-)
> 
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 586ea5d67a..4c04e88a3a 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -482,6 +482,15 @@ enum {
>       float_cmask_anynan  = float_cmask_qnan | float_cmask_snan,
>   };
>   
> +/* Flags for parts_minmax. */
> +enum {
> +    /* Set for minimum; clear for maximum. */
> +    minmax_ismin = 1,
> +    /* Set for the IEEE 754-2008 minNum() and maxNum() operations. */
> +    minmax_isnum = 2,
> +    /* Set for the IEEE 754-2008 minNumMag() and maxNumMag() operations. */
> +    minmax_ismag = 4 | minmax_isnum
> +};
>   
>   /* Simple helpers for checking if, or what kind of, NaN we have */
>   static inline __attribute__((unused)) bool is_nan(FloatClass c)
> @@ -864,6 +873,14 @@ static void parts128_uint_to_float(FloatParts128 *p, uint64_t a,
>   #define parts_uint_to_float(P, I, Z, S) \
>       PARTS_GENERIC_64_128(uint_to_float, P)(P, I, Z, S)
>   
> +static int parts64_minmax(FloatParts64 *a, FloatParts64 *b,
> +                          float_status *s, int flags, const FloatFmt *fmt);
> +static int parts128_minmax(FloatParts128 *a, FloatParts128 *b,
> +                           float_status *s, int flags, const FloatFmt *fmt);
> +
> +#define parts_minmax(A, B, S, Z, F) \
> +    PARTS_GENERIC_64_128(minmax, A)(A, B, S, Z, F)
> +
>   /*
>    * Helper functions for softfloat-parts.c.inc, per-size operations.
>    */
> @@ -3257,145 +3274,90 @@ float128 uint64_to_float128(uint64_t a, float_status *status)
>       return float128_round_pack_canonical(&p, status);
>   }
>   
> -/* Float Min/Max */
> -/* min() and max() functions. These can't be implemented as
> - * 'compare and pick one input' because that would mishandle
> - * NaNs and +0 vs -0.
> - *
> - * minnum() and maxnum() functions. These are similar to the min()
> - * and max() functions but if one of the arguments is a QNaN and
> - * the other is numerical then the numerical argument is returned.
> - * SNaNs will get quietened before being returned.
> - * minnum() and maxnum correspond to the IEEE 754-2008 minNum()
> - * and maxNum() operations. min() and max() are the typical min/max
> - * semantics provided by many CPUs which predate that specification.
> - *
> - * minnummag() and maxnummag() functions correspond to minNumMag()
> - * and minNumMag() from the IEEE-754 2008.
> +/*
> + * Minimum and maximum
>    */
> -static FloatParts64 minmax_floats(FloatParts64 a, FloatParts64 b, bool ismin,
> -                                bool ieee, bool ismag, float_status *s)
> +
> +static float16 float16_minmax(float16 a, float16 b, float_status *s, int flags)
>   {
> -    if (unlikely(is_nan(a.cls) || is_nan(b.cls))) {
> -        if (ieee) {
> -            /* Takes two floating-point values `a' and `b', one of
> -             * which is a NaN, and returns the appropriate NaN
> -             * result. If either `a' or `b' is a signaling NaN,
> -             * the invalid exception is raised.
> -             */
> -            if (is_snan(a.cls) || is_snan(b.cls)) {
> -                return *parts_pick_nan(&a, &b, s);
> -            } else if (is_nan(a.cls) && !is_nan(b.cls)) {
> -                return b;
> -            } else if (is_nan(b.cls) && !is_nan(a.cls)) {
> -                return a;
> -            }
> -        }
> -        return *parts_pick_nan(&a, &b, s);
> -    } else {
> -        int a_exp, b_exp;
> +    FloatParts64 pa, pb;
> +    int which;
>   
> -        switch (a.cls) {
> -        case float_class_normal:
> -            a_exp = a.exp;
> -            break;
> -        case float_class_inf:
> -            a_exp = INT_MAX;
> -            break;
> -        case float_class_zero:
> -            a_exp = INT_MIN;
> -            break;
> -        default:
> -            g_assert_not_reached();
> -            break;
> -        }
> -        switch (b.cls) {
> -        case float_class_normal:
> -            b_exp = b.exp;
> -            break;
> -        case float_class_inf:
> -            b_exp = INT_MAX;
> -            break;
> -        case float_class_zero:
> -            b_exp = INT_MIN;
> -            break;
> -        default:
> -            g_assert_not_reached();
> -            break;
> -        }
> -
> -        if (ismag && (a_exp != b_exp || a.frac != b.frac)) {
> -            bool a_less = a_exp < b_exp;
> -            if (a_exp == b_exp) {
> -                a_less = a.frac < b.frac;
> -            }
> -            return a_less ^ ismin ? b : a;
> -        }
> -
> -        if (a.sign == b.sign) {
> -            bool a_less = a_exp < b_exp;
> -            if (a_exp == b_exp) {
> -                a_less = a.frac < b.frac;
> -            }
> -            return a.sign ^ a_less ^ ismin ? b : a;
> -        } else {
> -            return a.sign ^ ismin ? b : a;
> -        }
> +    float16_unpack_canonical(&pa, a, s);
> +    float16_unpack_canonical(&pb, b, s);
> +    which = parts_minmax(&pa, &pb, s, flags, &float16_params);
> +    if (unlikely(which < 0)) {
> +        /* Some sort of nan, need to repack default and silenced nans. */
> +        return float16_round_pack_canonical(&pa, s);
>       }
> +    return which ? b : a;
>   }
>   
> -#define MINMAX(sz, name, ismin, isiee, ismag)                           \
> -float ## sz float ## sz ## _ ## name(float ## sz a, float ## sz b,      \
> -                                     float_status *s)                   \
> -{                                                                       \
> -    FloatParts64 pa, pb, pr;                                            \
> -    float ## sz ## _unpack_canonical(&pa, a, s);                        \
> -    float ## sz ## _unpack_canonical(&pb, b, s);                        \
> -    pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);                 \
> -    return float ## sz ## _round_pack_canonical(&pr, s);                \
> +static bfloat16 bfloat16_minmax(bfloat16 a, bfloat16 b,
> +                                float_status *s, int flags)
> +{
> +    FloatParts64 pa, pb;
> +    int which;
> +
> +    bfloat16_unpack_canonical(&pa, a, s);
> +    bfloat16_unpack_canonical(&pb, b, s);
> +    which = parts_minmax(&pa, &pb, s, flags, &float16_params);
> +    if (unlikely(which < 0)) {
> +        /* Some sort of nan, need to repack default and silenced nans. */
> +        return bfloat16_round_pack_canonical(&pa, s);
> +    }
> +    return which ? b : a;
>   }
>   
> -MINMAX(16, min, true, false, false)
> -MINMAX(16, minnum, true, true, false)
> -MINMAX(16, minnummag, true, true, true)
> -MINMAX(16, max, false, false, false)
> -MINMAX(16, maxnum, false, true, false)
> -MINMAX(16, maxnummag, false, true, true)
> +static float32 float32_minmax(float32 a, float32 b, float_status *s, int flags)
> +{
> +    FloatParts64 pa, pb;
> +    int which;
>   
> -MINMAX(32, min, true, false, false)
> -MINMAX(32, minnum, true, true, false)
> -MINMAX(32, minnummag, true, true, true)
> -MINMAX(32, max, false, false, false)
> -MINMAX(32, maxnum, false, true, false)
> -MINMAX(32, maxnummag, false, true, true)
> -
> -MINMAX(64, min, true, false, false)
> -MINMAX(64, minnum, true, true, false)
> -MINMAX(64, minnummag, true, true, true)
> -MINMAX(64, max, false, false, false)
> -MINMAX(64, maxnum, false, true, false)
> -MINMAX(64, maxnummag, false, true, true)
> -
> -#undef MINMAX
> -
> -#define BF16_MINMAX(name, ismin, isiee, ismag)                          \
> -bfloat16 bfloat16_ ## name(bfloat16 a, bfloat16 b, float_status *s)     \
> -{                                                                       \
> -    FloatParts64 pa, pb, pr;                                            \
> -    bfloat16_unpack_canonical(&pa, a, s);                               \
> -    bfloat16_unpack_canonical(&pb, b, s);                               \
> -    pr = minmax_floats(pa, pb, ismin, isiee, ismag, s);                 \
> -    return bfloat16_round_pack_canonical(&pr, s);                       \
> +    float32_unpack_canonical(&pa, a, s);
> +    float32_unpack_canonical(&pb, b, s);
> +    which = parts_minmax(&pa, &pb, s, flags, &float32_params);
> +    if (unlikely(which < 0)) {
> +        /* Some sort of nan, need to repack default and silenced nans. */
> +        return float32_round_pack_canonical(&pa, s);
> +    }
> +    return which ? b : a;
>   }
>   
> -BF16_MINMAX(min, true, false, false)
> -BF16_MINMAX(minnum, true, true, false)
> -BF16_MINMAX(minnummag, true, true, true)
> -BF16_MINMAX(max, false, false, false)
> -BF16_MINMAX(maxnum, false, true, false)
> -BF16_MINMAX(maxnummag, false, true, true)
> +static float64 float64_minmax(float64 a, float64 b, float_status *s, int flags)
> +{
> +    FloatParts64 pa, pb;
> +    int which;
>   
> -#undef BF16_MINMAX
> +    float64_unpack_canonical(&pa, a, s);
> +    float64_unpack_canonical(&pb, b, s);
> +    which = parts_minmax(&pa, &pb, s, flags, &float64_params);
> +    if (unlikely(which < 0)) {
> +        /* Some sort of nan, need to repack default and silenced nans. */
> +        return float64_round_pack_canonical(&pa, s);
> +    }
> +    return which ? b : a;
> +}
> +
> +#define MINMAX_1(type, name, flags) \
> +    type type##_##name(type a, type b, float_status *s) \
> +    { return type##_minmax(a, b, s, flags); }
> +
> +#define MINMAX_2(type) \
> +    MINMAX_1(type, max, 0)                                      \
> +    MINMAX_1(type, maxnum, minmax_isnum)                        \
> +    MINMAX_1(type, maxnummag, minmax_ismag)                     \
> +    MINMAX_1(type, min, minmax_ismin)                           \
> +    MINMAX_1(type, minnum, minmax_ismin | minmax_isnum)         \
> +    MINMAX_1(type, minnummag, minmax_ismin | minmax_ismag)
> +
> +MINMAX_2(float16)
> +MINMAX_2(bfloat16)
> +MINMAX_2(float32)
> +MINMAX_2(float64)
> +
> +#undef MINMAX_1
> +#undef MINMAX_2
>   
>   /* Floating point compare */
>   static FloatRelation compare_floats(FloatParts64 a, FloatParts64 b, bool is_quiet,
> diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
> index f3c4f8c8d2..4d91ef0d32 100644
> --- a/fpu/softfloat-parts.c.inc
> +++ b/fpu/softfloat-parts.c.inc
> @@ -936,3 +936,72 @@ static void partsN(uint_to_float)(FloatPartsN *p, uint64_t a,
>           p->frac_hi = a << shift;
>       }
>   }
> +
> +/*
> + * Float min/max.
> + *
> + * Return -1 to return the chosen nan in *a;
> + * return 0 to use the a input unchanged; 1 to use the b input unchanged.
> + */
> +static int partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
> +                          float_status *s, int flags, const FloatFmt *fmt)
> +{
> +    int ab_mask = float_cmask(a->cls) | float_cmask(b->cls);
> +    int a_exp, b_exp;
> +    bool a_less;
> +
> +    if (unlikely(ab_mask & float_cmask_anynan)) {
> +        /*
> +         * For minnum/maxnum, if one operand is a QNaN, and the other
> +         * operand is numerical, then return numerical argument.
> +         */
> +        if ((flags & minmax_isnum)
> +            && !(ab_mask & float_cmask_snan)
> +            && (ab_mask & ~float_cmask_qnan)) {
> +            return is_nan(a->cls);
> +        }
> +        *a = *parts_pick_nan(a, b, s);
> +        return -1;
> +    }
> +
> +    a_exp = a->exp;
> +    b_exp = b->exp;
> +
> +    if (unlikely(ab_mask != float_cmask_normal)) {
> +        switch (a->cls) {
> +        case float_class_normal:
> +            break;
> +        case float_class_inf:
> +            a_exp = INT_MAX;
> +            break;
> +        case float_class_zero:
> +            a_exp = INT_MIN;
> +            break;
> +        default:
> +            g_assert_not_reached();
> +            break;
> +        }
> +        switch (b->cls) {
> +        case float_class_normal:
> +            break;
> +        case float_class_inf:
> +            b_exp = INT_MAX;
> +            break;
> +        case float_class_zero:
> +            b_exp = INT_MIN;
> +            break;
> +        default:
> +            g_assert_not_reached();
> +            break;
> +        }
> +    }
> +
> +    if (a->sign != b->sign && !(flags & minmax_ismag)) {
> +        a_less = a->sign;
> +    } else if (a_exp != b_exp) {
> +        a_less = a_exp < b_exp;
> +    } else {
> +        a_less = frac_cmp(a, b) < 0;
> +    }
> +    return a_less ^ !!(flags & minmax_ismin);
> +}

This patch introduces two issues:

1. Comparing two negative numbers is broken: when both operands are
negative, the larger magnitude is the smaller value, so we have to
invert the a_less result. For example, min(-2.0, -1.0) skips the
unequal-sign branch, computes a_less = false from the exponents, and
therefore returns -1.0 instead of -2.0.

2. The check "flags & minmax_ismag" is broken because
"minmax_ismag = 4 | minmax_isnum" includes the minmax_isnum bit, so
the check also triggers for "flags = minmax_isnum"

The following seems to make it work in my tests:


diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index 1bc8db4cf1..d5a73dac00 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -1273,6 +1273,9 @@ static int partsN(minmax)(FloatPartsN *a, FloatPartsN *b,
      } else {
          a_less = frac_cmp(a, b) < 0;
      }
+    if (a->sign && b->sign && !(flags & minmax_ismag)) {
+        a_less = !a_less;
+    }
      return a_less ^ !!(flags & minmax_ismin);
  }
  
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index bfe5a6b975..649e8c6592 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -435,7 +435,7 @@ enum {
      /* Set for the IEEE 754-2008 minNum() and maxNum() operations. */
      minmax_isnum = 2,
     /* Set for the IEEE 754-2008 minNumMag() and maxNumMag() operations. */
-    minmax_ismag = 4 | minmax_isnum
+    minmax_ismag = 4,
  };
  
  /* Simple helpers for checking if, or what kind of, NaN we have */
@@ -3916,10 +3916,10 @@ static float128 float128_minmax(float128 a, float128 b, float_status *s,
  #define MINMAX_2(type) \
      MINMAX_1(type, max, 0)                                      \
      MINMAX_1(type, maxnum, minmax_isnum)                        \
-    MINMAX_1(type, maxnummag, minmax_ismag)                     \
+    MINMAX_1(type, maxnummag, minmax_isnum | minmax_ismag)      \
      MINMAX_1(type, min, minmax_ismin)                           \
      MINMAX_1(type, minnum, minmax_ismin | minmax_isnum)         \
-    MINMAX_1(type, minnummag, minmax_ismin | minmax_ismag)
+    MINMAX_1(type, minnummag, minmax_ismin | minmax_isnum | minmax_ismag)
  
  MINMAX_2(float16)
  MINMAX_2(bfloat16)
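
To make issue 1 concrete, here is a tiny standalone sketch (plain
integers stand in for the exponent comparison; nothing QEMU-specific):

  #include <assert.h>
  #include <stdbool.h>

  /* When both inputs are negative, the larger magnitude is the
   * *smaller* value, so the magnitude result must be inverted. */
  static bool a_less_buggy(int a_exp, int b_exp, bool both_negative)
  {
      (void)both_negative;    /* the pre-fix code ignores the sign */
      return a_exp < b_exp;
  }

  static bool a_less_fixed(int a_exp, int b_exp, bool both_negative)
  {
      bool less = a_exp < b_exp;
      return both_negative ? !less : less;
  }

  int main(void)
  {
      /* min(-2.0, -1.0): exponents 1 and 0, both signs set. */
      assert(a_less_buggy(1, 0, true) == false); /* wrongly picks b = -1.0 */
      assert(a_less_fixed(1, 0, true) == true);  /* picks a = -2.0 */
      return 0;
  }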


-- 
Thanks,

David / dhildenb




* Re: [PATCH 50/72] softfloat: Move minmax_flags to softfloat-parts.c.inc
  2021-05-17 13:14   ` David Hildenbrand
@ 2021-05-17 15:57     ` Richard Henderson
  0 siblings, 0 replies; 151+ messages in thread
From: Richard Henderson @ 2021-05-17 15:57 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: alex.bennee

On 5/17/21 8:14 AM, David Hildenbrand wrote:
> This patch introduces two issues:
> 
> 1. Comparing two negative numbers is broken. We have to
> invert the a_less result.
> 
> 2. The check "flags & minmax_ismag" is broken because
> "minmax_ismag = 4 | minmax_isnum" and it, therefore,
> also triggers for "flags = minmax_isnum"

Thanks.  I guess I should assume from this that these tests are not enabled, 
and I should fix that as well.


r~



end of thread

Thread overview: 151+ messages
2021-05-08  1:46 [PATCH 00/72] Convert floatx80 and float128 to FloatParts Richard Henderson
2021-05-08  1:46 ` [PATCH 01/72] qemu/host-utils: Use __builtin_bitreverseN Richard Henderson
2021-05-10  9:59   ` Alex Bennée
2021-05-11  9:41   ` David Hildenbrand
2021-05-08  1:46 ` [PATCH 02/72] qemu/host-utils: Add wrappers for overflow builtins Richard Henderson
2021-05-10 10:22   ` Alex Bennée
2021-05-08  1:46 ` [PATCH 03/72] qemu/host-utils: Add wrappers for carry builtins Richard Henderson
2021-05-10 12:57   ` Alex Bennée
2021-05-11 20:10     ` Richard Henderson
2021-05-12 11:17       ` Alex Bennée
2021-05-08  1:46 ` [PATCH 04/72] accel/tcg: Use add/sub overflow routines in tcg-runtime-gvec.c Richard Henderson
2021-05-09  8:43   ` Philippe Mathieu-Daudé
2021-05-10 13:02   ` Alex Bennée
2021-05-11  9:46   ` David Hildenbrand
2021-05-08  1:46 ` [PATCH 05/72] tests/fp: add quad support to the benchmark utility Richard Henderson
2021-05-11 10:01   ` David Hildenbrand
2021-05-08  1:46 ` [PATCH 06/72] softfloat: Move the binary point to the msb Richard Henderson
2021-05-10 13:36   ` Alex Bennée
2021-05-08  1:46 ` [PATCH 07/72] softfloat: Inline float_raise Richard Henderson
2021-05-09  8:32   ` Philippe Mathieu-Daudé
2021-05-11 10:04   ` David Hildenbrand
2021-05-08  1:46 ` [PATCH 08/72] softfloat: Use float_raise in more places Richard Henderson
2021-05-09  8:34   ` Philippe Mathieu-Daudé
2021-05-11 10:06   ` David Hildenbrand
2021-05-08  1:46 ` [PATCH 09/72] softfloat: Tidy a * b + inf return Richard Henderson
2021-05-08  1:47 ` [PATCH 10/72] softfloat: Add float_cmask and constants Richard Henderson
2021-05-08  1:47 ` [PATCH 11/72] softfloat: Use return_nan in float_to_float Richard Henderson
2021-05-10 15:10   ` Alex Bennée
2021-05-11 10:10   ` David Hildenbrand
2021-05-08  1:47 ` [PATCH 12/72] softfloat: fix return_nan vs default_nan_mode Richard Henderson
2021-05-10 15:12   ` Alex Bennée
2021-05-11 10:12   ` David Hildenbrand
2021-05-08  1:47 ` [PATCH 13/72] target/mips: Set set_default_nan_mode with set_snan_bit_is_one Richard Henderson
2021-05-11  9:37   ` Alex Bennée
2021-05-11 10:16   ` David Hildenbrand
2021-05-08  1:47 ` [PATCH 14/72] softfloat: Do not produce a default_nan from parts_silence_nan Richard Henderson
2021-05-11 10:16   ` David Hildenbrand
2021-05-11 10:32   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 15/72] softfloat: Rename FloatParts to FloatParts64 Richard Henderson
2021-05-09  8:45   ` Philippe Mathieu-Daudé
2021-05-11 10:16   ` David Hildenbrand
2021-05-08  1:47 ` [PATCH 16/72] softfloat: Move type-specific pack/unpack routines Richard Henderson
2021-05-09  8:46   ` Philippe Mathieu-Daudé
2021-05-11 10:17   ` David Hildenbrand
2021-05-08  1:47 ` [PATCH 17/72] softfloat: Use pointers with parts_default_nan Richard Henderson
2021-05-11 10:22   ` David Hildenbrand
2021-05-11 20:19     ` Richard Henderson
2021-05-08  1:47 ` [PATCH 18/72] softfloat: Use pointers with unpack_raw Richard Henderson
2021-05-09  8:48   ` Philippe Mathieu-Daudé
2021-05-12 12:58   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 19/72] softfloat: Use pointers with ftype_unpack_raw Richard Henderson
2021-05-11 11:31   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 20/72] softfloat: Use pointers with pack_raw Richard Henderson
2021-05-11 11:32   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 21/72] softfloat: Use pointers with ftype_pack_raw Richard Henderson
2021-05-11 11:32   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 22/72] softfloat: Use pointers with ftype_unpack_canonical Richard Henderson
2021-05-11 13:54   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 23/72] softfloat: Use pointers with ftype_round_pack_canonical Richard Henderson
2021-05-11 13:55   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 24/72] softfloat: Use pointers with parts_silence_nan Richard Henderson
2021-05-11 13:56   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 25/72] softfloat: Rearrange FloatParts64 Richard Henderson
2021-05-11 13:57   ` Alex Bennée
2021-05-11 15:04     ` Richard Henderson
2021-05-12 11:08       ` Alex Bennée
2021-05-08  1:47 ` [PATCH 26/72] softfloat: Convert float128_silence_nan to parts Richard Henderson
2021-05-13  8:34   ` Alex Bennée
2021-05-13 12:25     ` Richard Henderson
2021-05-08  1:47 ` [PATCH 27/72] softfloat: Convert float128_default_nan " Richard Henderson
2021-05-13  8:56   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 28/72] softfloat: Move return_nan to softfloat-parts.c.inc Richard Henderson
2021-05-12 18:10   ` David Hildenbrand
2021-05-08  1:47 ` [PATCH 29/72] softfloat: Move pick_nan " Richard Henderson
2021-05-12 18:16   ` David Hildenbrand
2021-05-13 12:28     ` Richard Henderson
2021-05-08  1:47 ` [PATCH 30/72] softfloat: Move pick_nan_muladd " Richard Henderson
2021-05-12 18:18   ` David Hildenbrand
2021-05-08  1:47 ` [PATCH 31/72] softfloat: Move sf_canonicalize " Richard Henderson
2021-05-13  9:45   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 32/72] softfloat: Move round_canonical " Richard Henderson
2021-05-13  9:53   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 33/72] softfloat: Use uadd64_carry, usub64_borrow in softfloat-macros.h Richard Henderson
2021-05-13 10:00   ` Alex Bennée
2021-05-13 12:38     ` Richard Henderson
2021-05-08  1:47 ` [PATCH 34/72] softfloat: Move addsub_floats to softfloat-parts.c.inc Richard Henderson
2021-05-13 10:03   ` Alex Bennée
2021-05-13 10:05   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 35/72] softfloat: Implement float128_add/sub via parts Richard Henderson
2021-05-13 10:06   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 36/72] softfloat: Move mul_floats to softfloat-parts.c.inc Richard Henderson
2021-05-13 10:08   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 37/72] softfloat: Move muladd_floats " Richard Henderson
2021-05-13 10:43   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 38/72] softfloat: Use mulu64 for mul64To128 Richard Henderson
2021-05-08  1:47 ` [PATCH 39/72] softfloat: Use add192 in mul128To256 Richard Henderson
2021-05-13 10:49   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 40/72] softfloat: Tidy mul128By64To192 Richard Henderson
2021-05-13 10:50   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 41/72] softfloat: Introduce sh[lr]_double primitives Richard Henderson
2021-05-13 10:59   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 42/72] softfloat: Move div_floats to softfloat-parts.c.inc Richard Henderson
2021-05-13 11:02   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 43/72] softfloat: Split float_to_float Richard Henderson
2021-05-13 11:04   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 44/72] softfloat: Convert float-to-float conversions with float128 Richard Henderson
2021-05-13 11:17   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 45/72] softfloat: Move round_to_int to softfloat-parts.c.inc Richard Henderson
2021-05-13 14:10   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 46/72] softfloat: Move rount_to_int_and_pack " Richard Henderson
2021-05-13 14:11   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 47/72] softfloat: Move rount_to_uint_and_pack " Richard Henderson
2021-05-08  1:47 ` [PATCH 48/72] softfloat: Move int_to_float " Richard Henderson
2021-05-08  1:47 ` [PATCH 49/72] softfloat: Move uint_to_float " Richard Henderson
2021-05-08  1:47 ` [PATCH 50/72] softfloat: Move minmax_flags " Richard Henderson
2021-05-17 13:14   ` David Hildenbrand
2021-05-17 15:57     ` Richard Henderson
2021-05-08  1:47 ` [PATCH 51/72] softfloat: Move compare_floats " Richard Henderson
2021-05-08  1:47 ` [PATCH 52/72] softfloat: Move scalbn_decomposed " Richard Henderson
2021-05-08  1:47 ` [PATCH 53/72] softfloat: Move sqrt_float " Richard Henderson
2021-05-08  1:47 ` [PATCH 54/72] softfloat: Split out parts_uncanon_normal Richard Henderson
2021-05-08  1:47 ` [PATCH 55/72] softfloat: Reduce FloatFmt Richard Henderson
2021-05-08  1:47 ` [PATCH 56/72] softfloat: Introduce Floatx80RoundPrec Richard Henderson
2021-05-08  1:47 ` [PATCH 57/72] softfloat: Adjust parts_uncanon_normal for floatx80 Richard Henderson
2021-05-08  1:47 ` [PATCH 58/72] tests/fp/fp-test: Reverse order of floatx80 precision tests Richard Henderson
2021-05-13 14:12   ` Alex Bennée
2021-05-08  1:47 ` [PATCH 59/72] softfloat: Convert floatx80_add/sub to FloatParts Richard Henderson
2021-05-08  1:47 ` [PATCH 60/72] softfloat: Convert floatx80_mul " Richard Henderson
2021-05-08  1:47 ` [PATCH 61/72] softfloat: Convert floatx80_div " Richard Henderson
2021-05-08  1:47 ` [PATCH 62/72] softfloat: Convert floatx80_sqrt " Richard Henderson
2021-05-08  1:47 ` [PATCH 63/72] softfloat: Convert floatx80_round " Richard Henderson
2021-05-08  1:47 ` [PATCH 64/72] softfloat: Convert floatx80_round_to_int " Richard Henderson
2021-05-08  1:47 ` [PATCH 65/72] softfloat: Convert integer to floatx80 " Richard Henderson
2021-05-08  1:47 ` [PATCH 66/72] softfloat: Convert floatx80 float conversions " Richard Henderson
2021-05-08  1:47 ` [PATCH 67/72] softfloat: Convert floatx80 to integer " Richard Henderson
2021-05-08  1:47 ` [PATCH 68/72] softfloat: Convert floatx80_scalbn " Richard Henderson
2021-05-08  1:47 ` [PATCH 69/72] softfloat: Convert floatx80 compare " Richard Henderson
2021-05-08  1:48 ` [PATCH 70/72] softfloat: Convert float32_exp2 " Richard Henderson
2021-05-08  1:48 ` [PATCH 71/72] softfloat: Move floatN_log2 to softfloat-parts.c.inc Richard Henderson
2021-05-08  1:48 ` [PATCH 72/72] softfloat: Convert modrem operations to FloatParts Richard Henderson
2021-05-08  2:52 ` [PATCH 00/72] Convert floatx80 and float128 " no-reply
2021-05-10 13:36 ` Alex Bennée
2021-05-12  1:52   ` Richard Henderson
2021-05-12 11:22     ` Alex Bennée
2021-05-12 15:28       ` Richard Henderson
2021-05-12 16:47 ` Alex Bennée
2021-05-12 19:23 ` Alex Bennée
2021-05-13 11:49   ` Richard Henderson
2021-05-13 13:33     ` Alex Bennée
2021-05-13 23:54       ` Richard Henderson
2021-05-13 14:13 ` Alex Bennée
