* [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions
@ 2018-01-09 12:22 Alex Bennée
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 01/20] fpu/softfloat: implement float16_squash_input_denormal Alex Bennée
                   ` (20 more replies)
  0 siblings, 21 replies; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée

Hi,

Here is version two of the softfloat re-factoring. See the previous
posting for details of the approach:

  https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg01708.html

There is only one new patch, to remove USE_SOFTFLOAT_STRUCT_TYPES,
which had bit-rotted to irrelevance. I did run into a problem with the
inclusion of softfloat.h by bswap.h - which is likely the cause of the
excessive rebuilds when touching softfloat headers. We probably want
to think about moving the type definitions to somewhere common
(qemu/typedefs.h?) but I haven't done it here to avoid too much churn.

This work is part of the larger chunk of adding half-precision ops to
the ARM front-end. However, I've split the series up to make for a
less messy review. This tree can be found at:

  https://github.com/stsquad/qemu/tree/softfloat-refactor-and-fp16-v2

While I have been testing the half-precision stuff in the
ARM-specific tree, this series is all common code. It has, however,
been tested with ARM RISU, which exercises the float32/64 code paths
quite nicely.

Any additional testing appreciated.

Changes for v2
--------------

 - added rth's s-o-b tags
 - added review tags
 - clean-ups for compare, minmax and float to int

As usual the details are in the individual commit messages.

Alex Bennée (20):
  fpu/softfloat: implement float16_squash_input_denormal
  include/fpu/softfloat: remove USE_SOFTFLOAT_STRUCT_TYPES
  include/fpu/softfloat: implement float16_abs helper
  include/fpu/softfloat: implement float16_chs helper
  include/fpu/softfloat: implement float16_set_sign helper
  include/fpu/softfloat: add some float16 constants
  fpu/softfloat: propagate signalling NaNs in MINMAX
  fpu/softfloat: improve comments on ARM NaN propagation
  fpu/softfloat: move the extract functions to the top of the file
  fpu/softfloat: define decompose structures
  fpu/softfloat: re-factor add/sub
  fpu/softfloat: re-factor mul
  fpu/softfloat: re-factor div
  fpu/softfloat: re-factor muladd
  fpu/softfloat: re-factor round_to_int
  fpu/softfloat: re-factor float to int/uint
  fpu/softfloat: re-factor int/uint to float
  fpu/softfloat: re-factor scalbn
  fpu/softfloat: re-factor minmax
  fpu/softfloat: re-factor compare

 fpu/softfloat-macros.h     |   44 +
 fpu/softfloat-specialize.h |  115 +-
 fpu/softfloat.c            | 7417 ++++++++++++++++++++------------------------
 include/fpu/softfloat.h    |  117 +-
 4 files changed, 3437 insertions(+), 4256 deletions(-)

-- 
2.15.1

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Qemu-devel] [PATCH v2 01/20] fpu/softfloat: implement float16_squash_input_denormal
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-12 13:41   ` Peter Maydell
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 02/20] include/fpu/softfloat: remove USE_SOFTFLOAT_STRUCT_TYPES Alex Bennée
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

This will be required when expanding the MINMAX() macro for the
16-bit/half-precision operations.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c         | 15 +++++++++++++++
 include/fpu/softfloat.h |  1 +
 2 files changed, 16 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 433c5dad2d..3a4ab1355f 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -3488,6 +3488,21 @@ static float16 roundAndPackFloat16(flag zSign, int zExp,
     return packFloat16(zSign, zExp, zSig >> 13);
 }
 
+/*----------------------------------------------------------------------------
+| If `a' is denormal and we are in flush-to-zero mode then set the
+| input-denormal exception and return zero. Otherwise just return the value.
+*----------------------------------------------------------------------------*/
+float16 float16_squash_input_denormal(float16 a, float_status *status)
+{
+    if (status->flush_inputs_to_zero) {
+        if (extractFloat16Exp(a) == 0 && extractFloat16Frac(a) != 0) {
+            float_raise(float_flag_input_denormal, status);
+            return make_float16(float16_val(a) & 0x8000);
+        }
+    }
+    return a;
+}
+
 static void normalizeFloat16Subnormal(uint32_t aSig, int *zExpPtr,
                                       uint32_t *zSigPtr)
 {
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 0f96a0edd1..d5e99667b6 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -277,6 +277,7 @@ void float_raise(uint8_t flags, float_status *status);
 | If `a' is denormal and we are in flush-to-zero mode then set the
 | input-denormal exception and return zero. Otherwise just return the value.
 *----------------------------------------------------------------------------*/
+float16 float16_squash_input_denormal(float16 a, float_status *status);
 float32 float32_squash_input_denormal(float32 a, float_status *status);
 float64 float64_squash_input_denormal(float64 a, float_status *status);
 
-- 
2.15.1

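For reference, the helper's flush-to-zero logic can be modelled stand-alone on raw IEEE 754 binary16 bit patterns. The helper name and the plain uint16_t representation below are illustrative only; the real helper also raises float_flag_input_denormal via float_raise():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of the patch's logic on raw binary16 bits:
 * exponent in bits 14..10, fraction in bits 9..0.  A denormal has a
 * zero exponent field and a non-zero fraction. */
static uint16_t squash_input_denormal16(uint16_t a, bool flush_inputs_to_zero)
{
    uint16_t exp  = (a >> 10) & 0x1f;
    uint16_t frac = a & 0x3ff;

    if (flush_inputs_to_zero && exp == 0 && frac != 0) {
        /* keep only the sign bit, i.e. return +/-0 */
        return a & 0x8000;
    }
    return a;
}
```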

* [Qemu-devel] [PATCH v2 02/20] include/fpu/softfloat: remove USE_SOFTFLOAT_STRUCT_TYPES
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 01/20] fpu/softfloat: implement float16_squash_input_denormal Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-09 12:27   ` Laurent Vivier
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 03/20] include/fpu/softfloat: implement float16_abs helper Alex Bennée
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

It's not actively built and, when enabled, things fail to compile. I'm
not sure the type-checking is really helping here. Seeing as we now
"own" our softfloat, let's remove the cruft.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 include/fpu/softfloat.h | 27 ---------------------------
 1 file changed, 27 deletions(-)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index d5e99667b6..52af1412de 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -103,32 +103,6 @@ enum {
 /*----------------------------------------------------------------------------
 | Software IEC/IEEE floating-point types.
 *----------------------------------------------------------------------------*/
-/* Use structures for soft-float types.  This prevents accidentally mixing
-   them with native int/float types.  A sufficiently clever compiler and
-   sane ABI should be able to see though these structs.  However
-   x86/gcc 3.x seems to struggle a bit, so leave them disabled by default.  */
-//#define USE_SOFTFLOAT_STRUCT_TYPES
-#ifdef USE_SOFTFLOAT_STRUCT_TYPES
-typedef struct {
-    uint16_t v;
-} float16;
-#define float16_val(x) (((float16)(x)).v)
-#define make_float16(x) __extension__ ({ float16 f16_val = {x}; f16_val; })
-#define const_float16(x) { x }
-typedef struct {
-    uint32_t v;
-} float32;
-/* The cast ensures an error if the wrong type is passed.  */
-#define float32_val(x) (((float32)(x)).v)
-#define make_float32(x) __extension__ ({ float32 f32_val = {x}; f32_val; })
-#define const_float32(x) { x }
-typedef struct {
-    uint64_t v;
-} float64;
-#define float64_val(x) (((float64)(x)).v)
-#define make_float64(x) __extension__ ({ float64 f64_val = {x}; f64_val; })
-#define const_float64(x) { x }
-#else
 typedef uint16_t float16;
 typedef uint32_t float32;
 typedef uint64_t float64;
@@ -141,7 +115,6 @@ typedef uint64_t float64;
 #define const_float16(x) (x)
 #define const_float32(x) (x)
 #define const_float64(x) (x)
-#endif
 typedef struct {
     uint64_t low;
     uint16_t high;
-- 
2.15.1


* [Qemu-devel] [PATCH v2 03/20] include/fpu/softfloat: implement float16_abs helper
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 01/20] fpu/softfloat: implement float16_squash_input_denormal Alex Bennée
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 02/20] include/fpu/softfloat: remove USE_SOFTFLOAT_STRUCT_TYPES Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-12 13:42   ` Peter Maydell
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 04/20] include/fpu/softfloat: implement float16_chs helper Alex Bennée
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

This will be required when expanding the MINMAX() macro for the
16-bit/half-precision operations.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
 include/fpu/softfloat.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 52af1412de..cfc615008d 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -347,6 +347,13 @@ static inline int float16_is_zero_or_denormal(float16 a)
     return (float16_val(a) & 0x7c00) == 0;
 }
 
+static inline float16 float16_abs(float16 a)
+{
+    /* Note that abs does *not* handle NaN specially, nor does
+     * it flush denormal inputs to zero.
+     */
+    return make_float16(float16_val(a) & 0x7fff);
+}
 /*----------------------------------------------------------------------------
 | The pattern for a default generated half-precision NaN.
 *----------------------------------------------------------------------------*/
-- 
2.15.1


* [Qemu-devel] [PATCH v2 04/20] include/fpu/softfloat: implement float16_chs helper
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (2 preceding siblings ...)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 03/20] include/fpu/softfloat: implement float16_abs helper Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-12 13:43   ` Peter Maydell
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 05/20] include/fpu/softfloat: implement float16_set_sign helper Alex Bennée
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index cfc615008d..dc71b01dba 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -354,6 +354,15 @@ static inline float16 float16_abs(float16 a)
      */
     return make_float16(float16_val(a) & 0x7fff);
 }
+
+static inline float16 float16_chs(float16 a)
+{
+    /* Note that chs does *not* handle NaN specially, nor does
+     * it flush denormal inputs to zero.
+     */
+    return make_float16(float16_val(a) ^ 0x8000);
+}
+
 /*----------------------------------------------------------------------------
 | The pattern for a default generated half-precision NaN.
 *----------------------------------------------------------------------------*/
-- 
2.15.1


* [Qemu-devel] [PATCH v2 05/20] include/fpu/softfloat: implement float16_set_sign helper
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (3 preceding siblings ...)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 04/20] include/fpu/softfloat: implement float16_chs helper Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-12 13:43   ` Peter Maydell
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 06/20] include/fpu/softfloat: add some float16 constants Alex Bennée
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 include/fpu/softfloat.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index dc71b01dba..8ab5d0df47 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -363,6 +363,11 @@ static inline float16 float16_chs(float16 a)
     return make_float16(float16_val(a) ^ 0x8000);
 }
 
+static inline float16 float16_set_sign(float16 a, int sign)
+{
+    return make_float16((float16_val(a) & 0x7fff) | (sign << 15));
+}
+
 /*----------------------------------------------------------------------------
 | The pattern for a default generated half-precision NaN.
 *----------------------------------------------------------------------------*/
-- 
2.15.1

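The three helpers added by patches 03-05 all come down to single-bit operations on bit 15, the binary16 sign bit. A stand-alone sketch with illustrative names, using plain uint16_t in place of QEMU's float16 wrappers:

```c
#include <assert.h>
#include <stdint.h>

/* abs clears the sign bit, chs flips it, set_sign clears it and then
 * ORs in the requested sign.  None of these inspect NaNs or flush
 * denormals, matching the comments in the patches. */
static uint16_t f16_abs(uint16_t a)
{
    return a & 0x7fff;
}

static uint16_t f16_chs(uint16_t a)
{
    return a ^ 0x8000;
}

static uint16_t f16_set_sign(uint16_t a, int sign)
{
    return (a & 0x7fff) | (uint16_t)(sign << 15);
}
```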

* [Qemu-devel] [PATCH v2 06/20] include/fpu/softfloat: add some float16 constants
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (4 preceding siblings ...)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 05/20] include/fpu/softfloat: implement float16_set_sign helper Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-09 13:27   ` Philippe Mathieu-Daudé
  2018-01-12 13:47   ` Peter Maydell
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 07/20] fpu/softfloat: propagate signalling NaNs in MINMAX Alex Bennée
                   ` (14 subsequent siblings)
  20 siblings, 2 replies; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

This defines the same set of common constants for float16 as are
defined for the 32 and 64 bit floats. These are often used by target
helper functions. I've also removed constants that are not used
anywhere.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v2
  - fixup constants, remove unused ones
---
 include/fpu/softfloat.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 8ab5d0df47..e64bf62f3d 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -368,6 +368,11 @@ static inline float16 float16_set_sign(float16 a, int sign)
     return make_float16((float16_val(a) & 0x7fff) | (sign << 15));
 }
 
+#define float16_zero make_float16(0)
+#define float16_one make_float16(0x3c00)
+#define float16_half make_float16(0x3800)
+#define float16_infinity make_float16(0x7c00)
+
 /*----------------------------------------------------------------------------
 | The pattern for a default generated half-precision NaN.
 *----------------------------------------------------------------------------*/
@@ -474,8 +479,6 @@ static inline float32 float32_set_sign(float32 a, int sign)
 
 #define float32_zero make_float32(0)
 #define float32_one make_float32(0x3f800000)
-#define float32_ln2 make_float32(0x3f317218)
-#define float32_pi make_float32(0x40490fdb)
 #define float32_half make_float32(0x3f000000)
 #define float32_infinity make_float32(0x7f800000)
 
@@ -588,7 +591,6 @@ static inline float64 float64_set_sign(float64 a, int sign)
 #define float64_zero make_float64(0)
 #define float64_one make_float64(0x3ff0000000000000LL)
 #define float64_ln2 make_float64(0x3fe62e42fefa39efLL)
-#define float64_pi make_float64(0x400921fb54442d18LL)
 #define float64_half make_float64(0x3fe0000000000000LL)
 #define float64_infinity make_float64(0x7ff0000000000000LL)
 
-- 
2.15.1

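The constants can be sanity-checked against a minimal binary16 decoder. The sketch below is illustrative only (hypothetical helper; normals, denormals and zero handled, inf/NaN left out). Under IEEE 754 binary16, 1.0 encodes as 0x3c00 and 0.5 as 0x3800:

```c
#include <assert.h>
#include <stdint.h>

/* Decode a binary16 bit pattern to double.  Layout: sign in bit 15,
 * 5 exponent bits biased by 15, 10 fraction bits.  exp == 0x1f
 * (infinity/NaN) is not handled here. */
static double f16_to_double(uint16_t h)
{
    int exp  = (h >> 10) & 0x1f;
    int frac = h & 0x3ff;
    double v;
    int e;

    if (exp == 0) {            /* zero or denormal: no implicit bit */
        v = frac / 1024.0;
        e = -14;
    } else {                   /* normal: implicit leading 1 */
        v = 1.0 + frac / 1024.0;
        e = exp - 15;
    }
    /* scale by 2^e without pulling in libm */
    while (e > 0) { v *= 2.0; e--; }
    while (e < 0) { v /= 2.0; e++; }
    return (h & 0x8000) ? -v : v;
}
```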

* [Qemu-devel] [PATCH v2 07/20] fpu/softfloat: propagate signalling NaNs in MINMAX
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (5 preceding siblings ...)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 06/20] include/fpu/softfloat: add some float16 constants Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-12 14:04   ` Peter Maydell
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 08/20] fpu/softfloat: improve comments on ARM NaN propagation Alex Bennée
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

While a comparison between a QNaN and a number will return the
number, it is not the same with a signalling NaN. In this case the
SNaN will "win" and, after potentially raising an exception, it will
be quietened.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
v2
  - added return for propagateFloat
---
 fpu/softfloat.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 3a4ab1355f..44c043924e 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -7683,6 +7683,7 @@ int float128_compare_quiet(float128 a, float128 b, float_status *status)
  * minnum() and maxnum() functions. These are similar to the min()
  * and max() functions but if one of the arguments is a QNaN and
  * the other is numerical then the numerical argument is returned.
+ * SNaNs will get quietened before being returned.
  * minnum() and maxnum correspond to the IEEE 754-2008 minNum()
  * and maxNum() operations. min() and max() are the typical min/max
  * semantics provided by many CPUs which predate that specification.
@@ -7703,11 +7704,14 @@ static inline float ## s float ## s ## _minmax(float ## s a, float ## s b,     \
     if (float ## s ## _is_any_nan(a) ||                                 \
         float ## s ## _is_any_nan(b)) {                                 \
         if (isieee) {                                                   \
-            if (float ## s ## _is_quiet_nan(a, status) &&               \
+            if (float ## s ## _is_signaling_nan(a, status) ||           \
+                float ## s ## _is_signaling_nan(b, status)) {           \
+                return propagateFloat ## s ## NaN(a, b, status);        \
+            } else  if (float ## s ## _is_quiet_nan(a, status) &&       \
                 !float ## s ##_is_any_nan(b)) {                         \
                 return b;                                               \
             } else if (float ## s ## _is_quiet_nan(b, status) &&        \
-                       !float ## s ## _is_any_nan(a)) {                \
+                       !float ## s ## _is_any_nan(a)) {                 \
                 return a;                                               \
             }                                                           \
         }                                                               \
-- 
2.15.1

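The NaN rule the patch adds can be modelled stand-alone for binary16. This is a simplified sketch with illustrative names: the real code defers to propagateFloatNaN() and the per-target pickNaN() rules, so only the SNaN-wins behaviour is captured here. A binary16 NaN has an all-ones exponent and a non-zero fraction; the fraction MSB distinguishes quiet (1) from signalling (0):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_nan16(uint16_t a)
{
    return (a & 0x7c00) == 0x7c00 && (a & 0x3ff) != 0;
}

static bool is_snan16(uint16_t a)
{
    return is_nan16(a) && !(a & 0x200);
}

/* Sketch of minnum()'s NaN handling after the patch: an SNaN always
 * wins and is returned quietened; a lone QNaN loses to a number. */
static uint16_t minnum16_nan_rule(uint16_t a, uint16_t b)
{
    if (is_snan16(a) || is_snan16(b)) {
        /* propagate the (first) signalling NaN, quietened */
        return (is_snan16(a) ? a : b) | 0x200;
    }
    if (is_nan16(a) && !is_nan16(b)) {
        return b;                /* QNaN vs number: the number wins */
    }
    if (is_nan16(b) && !is_nan16(a)) {
        return a;
    }
    return a;                    /* placeholder for the numeric compare */
}
```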

* [Qemu-devel] [PATCH v2 08/20] fpu/softfloat: improve comments on ARM NaN propagation
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (6 preceding siblings ...)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 07/20] fpu/softfloat: propagate signalling NaNs in MINMAX Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-12 14:07   ` Peter Maydell
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 09/20] fpu/softfloat: move the extract functions to the top of the file Alex Bennée
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

Mention the pseudo-code fragment on which this is based and correct
the spelling of signalling.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat-specialize.h | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h
index de2c5d5702..3d507d8c77 100644
--- a/fpu/softfloat-specialize.h
+++ b/fpu/softfloat-specialize.h
@@ -445,14 +445,15 @@ static float32 commonNaNToFloat32(commonNaNT a, float_status *status)
 
 #if defined(TARGET_ARM)
 static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
-                    flag aIsLargerSignificand)
+                   flag aIsLargerSignificand)
 {
-    /* ARM mandated NaN propagation rules: take the first of:
-     *  1. A if it is signaling
-     *  2. B if it is signaling
+    /* ARM mandated NaN propagation rules (see FPProcessNaNs()), take
+     * the first of:
+     *  1. A if it is signalling
+     *  2. B if it is signalling
      *  3. A (quiet)
      *  4. B (quiet)
-     * A signaling NaN is always quietened before returning it.
+     * A signalling NaN is always quietened before returning it.
      */
     if (aIsSNaN) {
         return 0;
-- 
2.15.1

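The priority order in the updated comment can be captured as a tiny stand-alone function (illustrative; the real pickNaN() also takes an aIsLargerSignificand argument, which the ARM rules never consult):

```c
#include <assert.h>

/* ARM NaN propagation priority, per FPProcessNaNs(): returns 0 to
 * pick operand A, 1 to pick B. */
static int pick_nan_arm(int a_is_qnan, int a_is_snan,
                        int b_is_qnan, int b_is_snan)
{
    if (a_is_snan) {
        return 0;          /* 1. A if it is signalling */
    }
    if (b_is_snan) {
        return 1;          /* 2. B if it is signalling */
    }
    if (a_is_qnan) {
        return 0;          /* 3. A (quiet) */
    }
    (void)b_is_qnan;
    return 1;              /* 4. B (quiet) */
}
```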

* [Qemu-devel] [PATCH v2 09/20] fpu/softfloat: move the extract functions to the top of the file
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (7 preceding siblings ...)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 08/20] fpu/softfloat: improve comments on ARM NaN propagation Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-12 14:07   ` Peter Maydell
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 10/20] fpu/softfloat: define decompose structures Alex Bennée
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

This is pure code-motion during re-factoring as the helpers will be
needed earlier.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

---
v2
  - fix minor white space nit
---
 fpu/softfloat.c | 120 +++++++++++++++++++++++++-------------------------------
 1 file changed, 54 insertions(+), 66 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 44c043924e..59afe81d06 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -132,6 +132,60 @@ static inline flag extractFloat16Sign(float16 a)
     return float16_val(a)>>15;
 }
 
+/*----------------------------------------------------------------------------
+| Returns the fraction bits of the single-precision floating-point value `a'.
+*----------------------------------------------------------------------------*/
+
+static inline uint32_t extractFloat32Frac(float32 a)
+{
+    return float32_val(a) & 0x007FFFFF;
+}
+
+/*----------------------------------------------------------------------------
+| Returns the exponent bits of the single-precision floating-point value `a'.
+*----------------------------------------------------------------------------*/
+
+static inline int extractFloat32Exp(float32 a)
+{
+    return (float32_val(a) >> 23) & 0xFF;
+}
+
+/*----------------------------------------------------------------------------
+| Returns the sign bit of the single-precision floating-point value `a'.
+*----------------------------------------------------------------------------*/
+
+static inline flag extractFloat32Sign(float32 a)
+{
+    return float32_val(a) >> 31;
+}
+
+/*----------------------------------------------------------------------------
+| Returns the fraction bits of the double-precision floating-point value `a'.
+*----------------------------------------------------------------------------*/
+
+static inline uint64_t extractFloat64Frac(float64 a)
+{
+    return float64_val(a) & LIT64(0x000FFFFFFFFFFFFF);
+}
+
+/*----------------------------------------------------------------------------
+| Returns the exponent bits of the double-precision floating-point value `a'.
+*----------------------------------------------------------------------------*/
+
+static inline int extractFloat64Exp(float64 a)
+{
+    return (float64_val(a) >> 52) & 0x7FF;
+}
+
+/*----------------------------------------------------------------------------
+| Returns the sign bit of the double-precision floating-point value `a'.
+*----------------------------------------------------------------------------*/
+
+static inline flag extractFloat64Sign(float64 a)
+{
+    return float64_val(a) >> 63;
+}
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
@@ -299,39 +353,6 @@ static int64_t roundAndPackUint64(flag zSign, uint64_t absZ0,
     return absZ0;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the fraction bits of the single-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline uint32_t extractFloat32Frac( float32 a )
-{
-
-    return float32_val(a) & 0x007FFFFF;
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the exponent bits of the single-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline int extractFloat32Exp(float32 a)
-{
-
-    return ( float32_val(a)>>23 ) & 0xFF;
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the sign bit of the single-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline flag extractFloat32Sign( float32 a )
-{
-
-    return float32_val(a)>>31;
-
-}
-
 /*----------------------------------------------------------------------------
 | If `a' is denormal and we are in flush-to-zero mode then set the
 | input-denormal exception and return zero. Otherwise just return the value.
@@ -492,39 +513,6 @@ static float32
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the fraction bits of the double-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline uint64_t extractFloat64Frac( float64 a )
-{
-
-    return float64_val(a) & LIT64( 0x000FFFFFFFFFFFFF );
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the exponent bits of the double-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline int extractFloat64Exp(float64 a)
-{
-
-    return ( float64_val(a)>>52 ) & 0x7FF;
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the sign bit of the double-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline flag extractFloat64Sign( float64 a )
-{
-
-    return float64_val(a)>>63;
-
-}
-
 /*----------------------------------------------------------------------------
 | If `a' is denormal and we are in flush-to-zero mode then set the
 | input-denormal exception and return zero. Otherwise just return the value.
-- 
2.15.1


* [Qemu-devel] [PATCH v2 10/20] fpu/softfloat: define decompose structures
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (8 preceding siblings ...)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 09/20] fpu/softfloat: move the extract functions to the top of the file Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-09 17:01   ` Richard Henderson
                     ` (2 more replies)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 11/20] fpu/softfloat: re-factor add/sub Alex Bennée
                   ` (10 subsequent siblings)
  20 siblings, 3 replies; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

These structures pave the way for generic softfloat helper routines
that will operate on fully decomposed numbers.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 69 insertions(+), 1 deletion(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 59afe81d06..fcba28d3f8 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -83,7 +83,7 @@ this code that are retained.
  * target-dependent and needs the TARGET_* macros.
  */
 #include "qemu/osdep.h"
-
+#include "qemu/bitops.h"
 #include "fpu/softfloat.h"
 
 /* We only need stdlib for abort() */
@@ -186,6 +186,74 @@ static inline flag extractFloat64Sign(float64 a)
     return float64_val(a) >> 63;
 }
 
+/*----------------------------------------------------------------------------
+| Classify a floating point number.
+*----------------------------------------------------------------------------*/
+
+typedef enum {
+    float_class_unclassified,
+    float_class_zero,
+    float_class_normal,
+    float_class_inf,
+    float_class_qnan,
+    float_class_snan,
+    float_class_dnan,
+    float_class_msnan, /* maybe silenced */
+} float_class;
+
+/*----------------------------------------------------------------------------
+| Structure holding all of the decomposed parts of a float.
+| The exponent is unbiased and the fraction is normalized.
+*----------------------------------------------------------------------------*/
+
+typedef struct {
+    uint64_t frac   : 64;
+    int exp         : 32;
+    float_class cls : 8;
+    int             : 23;
+    bool sign       : 1;
+} decomposed_parts;
+
+#define DECOMPOSED_BINARY_POINT    (64 - 2)
+#define DECOMPOSED_IMPLICIT_BIT    (1ull << DECOMPOSED_BINARY_POINT)
+#define DECOMPOSED_OVERFLOW_BIT    (DECOMPOSED_IMPLICIT_BIT << 1)
+
+/* Structure holding all of the relevant parameters for a format.  */
+typedef struct {
+    int exp_bias;
+    int exp_max;
+    int frac_shift;
+    uint64_t frac_lsb;
+    uint64_t frac_lsbm1;
+    uint64_t round_mask;
+    uint64_t roundeven_mask;
+} decomposed_params;
+
+#define FRAC_PARAMS(F)                     \
+    .frac_shift     = F,                   \
+    .frac_lsb       = 1ull << (F),         \
+    .frac_lsbm1     = 1ull << ((F) - 1),   \
+    .round_mask     = (1ull << (F)) - 1,   \
+    .roundeven_mask = (2ull << (F)) - 1
+
+static const decomposed_params float16_params = {
+    .exp_bias       = 0x0f,
+    .exp_max        = 0x1f,
+    FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 10)
+};
+
+static const decomposed_params float32_params = {
+    .exp_bias       = 0x7f,
+    .exp_max        = 0xff,
+    FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 23)
+};
+
+static const decomposed_params float64_params = {
+    .exp_bias       = 0x3ff,
+    .exp_max        = 0x7ff,
+    FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 52)
+};
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [Qemu-devel] [PATCH v2 11/20] fpu/softfloat: re-factor add/sub
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (9 preceding siblings ...)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 10/20] fpu/softfloat: define decompose structures Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-12 15:57   ` Peter Maydell
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 12/20] fpu/softfloat: re-factor mul Alex Bennée
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

Add float16_add/sub, and use the common decompose and canonicalize
functions to share a single add and sub implementation across
float16/32/64.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c         | 904 +++++++++++++++++++++++++-----------------------
 include/fpu/softfloat.h |   4 +
 2 files changed, 481 insertions(+), 427 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index fcba28d3f8..f89e47e3ef 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -195,7 +195,7 @@ typedef enum {
     float_class_zero,
     float_class_normal,
     float_class_inf,
-    float_class_qnan,
+    float_class_qnan,  /* all NaNs from here */
     float_class_snan,
     float_class_dnan,
     float_class_msnan, /* maybe silenced */
@@ -254,6 +254,482 @@ static const decomposed_params float64_params = {
     FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 52)
 };
 
+/* Unpack a float16 to parts, but do not canonicalize.  */
+static inline decomposed_parts float16_unpack_raw(float16 f)
+{
+    return (decomposed_parts){
+        .cls = float_class_unclassified,
+        .sign = extract32(f, 15, 1),
+        .exp = extract32(f, 10, 5),
+        .frac = extract32(f, 0, 10)
+    };
+}
+
+/* Unpack a float32 to parts, but do not canonicalize.  */
+static inline decomposed_parts float32_unpack_raw(float32 f)
+{
+    return (decomposed_parts){
+        .cls = float_class_unclassified,
+        .sign = extract32(f, 31, 1),
+        .exp = extract32(f, 23, 8),
+        .frac = extract32(f, 0, 23)
+    };
+}
+
+/* Unpack a float64 to parts, but do not canonicalize.  */
+static inline decomposed_parts float64_unpack_raw(float64 f)
+{
+    return (decomposed_parts){
+        .cls = float_class_unclassified,
+        .sign = extract64(f, 63, 1),
+        .exp = extract64(f, 52, 11),
+        .frac = extract64(f, 0, 52),
+    };
+}
+
+/* Pack a float16 from parts, but do not canonicalize.  */
+static inline float16 float16_pack_raw(decomposed_parts p)
+{
+    uint32_t ret = p.frac;
+    ret = deposit32(ret, 10, 5, p.exp);
+    ret = deposit32(ret, 15, 1, p.sign);
+    return make_float16(ret);
+}
+
+/* Pack a float32 from parts, but do not canonicalize.  */
+static inline float32 float32_pack_raw(decomposed_parts p)
+{
+    uint32_t ret = p.frac;
+    ret = deposit32(ret, 23, 8, p.exp);
+    ret = deposit32(ret, 31, 1, p.sign);
+    return make_float32(ret);
+}
+
+/* Pack a float64 from parts, but do not canonicalize.  */
+static inline float64 float64_pack_raw(decomposed_parts p)
+{
+    uint64_t ret = p.frac;
+    ret = deposit64(ret, 52, 11, p.exp);
+    ret = deposit64(ret, 63, 1, p.sign);
+    return make_float64(ret);
+}
+
+/* Canonicalize EXP and FRAC, setting CLS.  */
+static decomposed_parts decomposed_canonicalize(decomposed_parts part,
+                                        const decomposed_params *parm,
+                                        float_status *status)
+{
+    if (part.exp == parm->exp_max) {
+        if (part.frac == 0) {
+            part.cls = float_class_inf;
+        } else {
+#ifdef NO_SIGNALING_NANS
+            part.cls = float_class_qnan;
+#else
+            int64_t msb = part.frac << (parm->frac_shift + 2);
+            if ((msb < 0) == status->snan_bit_is_one) {
+                part.cls = float_class_snan;
+            } else {
+                part.cls = float_class_qnan;
+            }
+#endif
+        }
+    } else if (part.exp == 0) {
+        if (likely(part.frac == 0)) {
+            part.cls = float_class_zero;
+        } else if (status->flush_inputs_to_zero) {
+            float_raise(float_flag_input_denormal, status);
+            part.cls = float_class_zero;
+            part.frac = 0;
+        } else {
+            int shift = clz64(part.frac) - 1;
+            part.cls = float_class_normal;
+            part.exp = parm->frac_shift - parm->exp_bias - shift + 1;
+            part.frac <<= shift;
+        }
+    } else {
+        part.cls = float_class_normal;
+        part.exp -= parm->exp_bias;
+        part.frac = DECOMPOSED_IMPLICIT_BIT + (part.frac << parm->frac_shift);
+    }
+    return part;
+}
+
+/* Round and uncanonicalize a floating-point number by parts.
+   There are FRAC_SHIFT bits that may require rounding at the bottom
+   of the fraction; these bits will be removed.  The exponent will be
+   biased by EXP_BIAS and must be bounded by [0, EXP_MAX-1].  */
+static decomposed_parts decomposed_round_canonical(decomposed_parts p,
+                                                   float_status *s,
+                                                   const decomposed_params *parm)
+{
+    const uint64_t frac_lsbm1 = parm->frac_lsbm1;
+    const uint64_t round_mask = parm->round_mask;
+    const uint64_t roundeven_mask = parm->roundeven_mask;
+    const int exp_max = parm->exp_max;
+    const int frac_shift = parm->frac_shift;
+    uint64_t frac, inc;
+    int exp, flags = 0;
+    bool overflow_norm;
+
+    frac = p.frac;
+    exp = p.exp;
+
+    switch (p.cls) {
+    case float_class_normal:
+        switch (s->float_rounding_mode) {
+        case float_round_nearest_even:
+            overflow_norm = false;
+            inc = ((frac & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
+            break;
+        case float_round_ties_away:
+            overflow_norm = false;
+            inc = frac_lsbm1;
+            break;
+        case float_round_to_zero:
+            overflow_norm = true;
+            inc = 0;
+            break;
+        case float_round_up:
+            inc = p.sign ? 0 : round_mask;
+            overflow_norm = p.sign;
+            break;
+        case float_round_down:
+            inc = p.sign ? round_mask : 0;
+            overflow_norm = !p.sign;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+
+        exp += parm->exp_bias;
+        if (likely(exp > 0)) {
+            if (frac & round_mask) {
+                flags |= float_flag_inexact;
+                frac += inc;
+                if (frac & DECOMPOSED_OVERFLOW_BIT) {
+                    frac >>= 1;
+                    exp++;
+                }
+            }
+            frac >>= frac_shift;
+
+            if (unlikely(exp >= exp_max)) {
+                flags |= float_flag_overflow | float_flag_inexact;
+                if (overflow_norm) {
+                    exp = exp_max - 1;
+                    frac = -1;
+                } else {
+                    p.cls = float_class_inf;
+                    goto do_inf;
+                }
+            }
+        } else if (s->flush_to_zero) {
+            flags |= float_flag_output_denormal;
+            p.cls = float_class_zero;
+            goto do_zero;
+        } else {
+            bool is_tiny = (s->float_detect_tininess
+                            == float_tininess_before_rounding)
+                        || (exp < 0)
+                        || !((frac + inc) & DECOMPOSED_OVERFLOW_BIT);
+
+            shift64RightJamming(frac, 1 - exp, &frac);
+            if (frac & round_mask) {
+                /* Need to recompute round-to-even.  */
+                if (s->float_rounding_mode == float_round_nearest_even) {
+                    inc = ((frac & roundeven_mask) != frac_lsbm1
+                           ? frac_lsbm1 : 0);
+                }
+                flags |= float_flag_inexact;
+                frac += inc;
+            }
+
+            exp = (frac & DECOMPOSED_IMPLICIT_BIT ? 1 : 0);
+            frac >>= frac_shift;
+
+            if (is_tiny && (flags & float_flag_inexact)) {
+                flags |= float_flag_underflow;
+            }
+            if (exp == 0 && frac == 0) {
+                p.cls = float_class_zero;
+            }
+        }
+        break;
+
+    case float_class_zero:
+    do_zero:
+        exp = 0;
+        frac = 0;
+        break;
+
+    case float_class_inf:
+    do_inf:
+        exp = exp_max;
+        frac = 0;
+        break;
+
+    case float_class_qnan:
+    case float_class_snan:
+        exp = exp_max;
+        break;
+
+    default:
+        g_assert_not_reached();
+    }
+
+    float_raise(flags, s);
+    p.exp = exp;
+    p.frac = frac;
+    return p;
+}
+
+static decomposed_parts float16_unpack_canonical(float16 f, float_status *s)
+{
+    return decomposed_canonicalize(float16_unpack_raw(f), &float16_params, s);
+}
+
+static float16 float16_round_pack_canonical(decomposed_parts p, float_status *s)
+{
+    switch (p.cls) {
+    case float_class_dnan:
+        return float16_default_nan(s);
+    case float_class_msnan:
+        return float16_maybe_silence_nan(float16_pack_raw(p), s);
+    default:
+        p = decomposed_round_canonical(p, s, &float16_params);
+        return float16_pack_raw(p);
+    }
+}
+
+static decomposed_parts float32_unpack_canonical(float32 f, float_status *s)
+{
+    return decomposed_canonicalize(float32_unpack_raw(f), &float32_params, s);
+}
+
+static float32 float32_round_pack_canonical(decomposed_parts p, float_status *s)
+{
+    switch (p.cls) {
+    case float_class_dnan:
+        return float32_default_nan(s);
+    case float_class_msnan:
+        return float32_maybe_silence_nan(float32_pack_raw(p), s);
+    default:
+        p = decomposed_round_canonical(p, s, &float32_params);
+        return float32_pack_raw(p);
+    }
+}
+
+static decomposed_parts float64_unpack_canonical(float64 f, float_status *s)
+{
+    return decomposed_canonicalize(float64_unpack_raw(f), &float64_params, s);
+}
+
+static float64 float64_round_pack_canonical(decomposed_parts p, float_status *s)
+{
+    switch (p.cls) {
+    case float_class_dnan:
+        return float64_default_nan(s);
+    case float_class_msnan:
+        return float64_maybe_silence_nan(float64_pack_raw(p), s);
+    default:
+        p = decomposed_round_canonical(p, s, &float64_params);
+        return float64_pack_raw(p);
+    }
+}
+
+static decomposed_parts pick_nan_parts(decomposed_parts a, decomposed_parts b,
+                                       float_status *s)
+{
+    if (a.cls == float_class_snan || b.cls == float_class_snan) {
+        s->float_exception_flags |= float_flag_invalid;
+    }
+
+    if (s->default_nan_mode) {
+        a.cls = float_class_dnan;
+    } else {
+        if (pickNaN(a.cls == float_class_qnan,
+                    a.cls == float_class_snan,
+                    b.cls == float_class_qnan,
+                    b.cls == float_class_snan,
+                    a.frac > b.frac
+                    || (a.frac == b.frac && a.sign < b.sign))) {
+            a = b;
+        }
+        a.cls = float_class_msnan;
+    }
+    return a;
+}
+
+/*
+ * Returns the result of adding the absolute values of the
+ * floating-point values `a' and `b'. If `subtract' is set, the sum is
+ * negated before being returned. `subtract' is ignored if the result
+ * is a NaN. The addition is performed according to the IEC/IEEE
+ * Standard for Binary Floating-Point Arithmetic.
+ */
+
+static decomposed_parts add_decomposed(decomposed_parts a, decomposed_parts b,
+                                       bool subtract, float_status *s)
+{
+    bool a_sign = a.sign;
+    bool b_sign = b.sign ^ subtract;
+
+    if (a_sign != b_sign) {
+        /* Subtraction */
+
+        if (a.cls == float_class_normal && b.cls == float_class_normal) {
+            int a_exp = a.exp;
+            int b_exp = b.exp;
+            uint64_t a_frac = a.frac;
+            uint64_t b_frac = b.frac;
+
+            if (a_exp > b_exp || (a_exp == b_exp && a_frac >= b_frac)) {
+                shift64RightJamming(b_frac, a_exp - b_exp, &b_frac);
+                a_frac = a_frac - b_frac;
+            } else {
+                shift64RightJamming(a_frac, b_exp - a_exp, &a_frac);
+                a_frac = b_frac - a_frac;
+                a_exp = b_exp;
+                a_sign ^= 1;
+            }
+
+            if (a_frac == 0) {
+                a.cls = float_class_zero;
+                a.sign = s->float_rounding_mode == float_round_down;
+            } else {
+                int shift = clz64(a_frac) - 1;
+                a.frac = a_frac << shift;
+                a.exp = a_exp - shift;
+                a.sign = a_sign;
+            }
+            return a;
+        }
+        if (a.cls >= float_class_qnan || b.cls >= float_class_qnan) {
+            return pick_nan_parts(a, b, s);
+        }
+        if (a.cls == float_class_inf) {
+            if (b.cls == float_class_inf) {
+                float_raise(float_flag_invalid, s);
+                a.cls = float_class_dnan;
+            }
+            return a;
+        }
+        if (a.cls == float_class_zero && b.cls == float_class_zero) {
+            a.sign = s->float_rounding_mode == float_round_down;
+            return a;
+        }
+        if (a.cls == float_class_zero || b.cls == float_class_inf) {
+            b.sign = a_sign ^ 1;
+            return b;
+        }
+        if (b.cls == float_class_zero) {
+            return a;
+        }
+    } else {
+        /* Addition */
+        if (a.cls == float_class_normal && b.cls == float_class_normal) {
+            int a_exp = a.exp;
+            int b_exp = b.exp;
+            uint64_t a_frac = a.frac;
+            uint64_t b_frac = b.frac;
+
+            if (a_exp > b_exp) {
+                shift64RightJamming(b_frac, a_exp - b_exp, &b_frac);
+            } else if (a_exp < b_exp) {
+                shift64RightJamming(a_frac, b_exp - a_exp, &a_frac);
+                a_exp = b_exp;
+            }
+            a_frac += b_frac;
+            if (a_frac & DECOMPOSED_OVERFLOW_BIT) {
+                a_frac >>= 1;
+                a_exp += 1;
+            }
+
+            a.exp = a_exp;
+            a.frac = a_frac;
+            return a;
+        }
+        if (a.cls >= float_class_qnan || b.cls >= float_class_qnan) {
+            return pick_nan_parts(a, b, s);
+        }
+        if (a.cls == float_class_inf || b.cls == float_class_zero) {
+            return a;
+        }
+        if (b.cls == float_class_inf || a.cls == float_class_zero) {
+            b.sign = b_sign;
+            return b;
+        }
+    }
+    g_assert_not_reached();
+}
+
+/*
+ * Returns the result of adding or subtracting the floating-point
+ * values `a' and `b'. The operation is performed according to the
+ * IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+ */
+
+float16 float16_add(float16 a, float16 b, float_status *status)
+{
+    decomposed_parts pa = float16_unpack_canonical(a, status);
+    decomposed_parts pb = float16_unpack_canonical(b, status);
+    decomposed_parts pr = add_decomposed(pa, pb, false, status);
+
+    return float16_round_pack_canonical(pr, status);
+}
+
+float32 float32_add(float32 a, float32 b, float_status *status)
+{
+    decomposed_parts pa = float32_unpack_canonical(a, status);
+    decomposed_parts pb = float32_unpack_canonical(b, status);
+    decomposed_parts pr = add_decomposed(pa, pb, false, status);
+
+    return float32_round_pack_canonical(pr, status);
+}
+
+float64 float64_add(float64 a, float64 b, float_status *status)
+{
+    decomposed_parts pa = float64_unpack_canonical(a, status);
+    decomposed_parts pb = float64_unpack_canonical(b, status);
+    decomposed_parts pr = add_decomposed(pa, pb, false, status);
+
+    return float64_round_pack_canonical(pr, status);
+}
+
+float16 float16_sub(float16 a, float16 b, float_status *status)
+{
+    decomposed_parts pa = float16_unpack_canonical(a, status);
+    decomposed_parts pb = float16_unpack_canonical(b, status);
+    decomposed_parts pr = add_decomposed(pa, pb, true, status);
+
+    return float16_round_pack_canonical(pr, status);
+}
+
+float32 float32_sub(float32 a, float32 b, float_status *status)
+{
+    decomposed_parts pa = float32_unpack_canonical(a, status);
+    decomposed_parts pb = float32_unpack_canonical(b, status);
+    decomposed_parts pr = add_decomposed(pa, pb, true, status);
+
+    return float32_round_pack_canonical(pr, status);
+}
+
+float64 float64_sub(float64 a, float64 b, float_status *status)
+{
+    decomposed_parts pa = float64_unpack_canonical(a, status);
+    decomposed_parts pb = float64_unpack_canonical(b, status);
+    decomposed_parts pr = add_decomposed(pa, pb, true, status);
+
+    return float64_round_pack_canonical(pr, status);
+}
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
@@ -2065,219 +2541,6 @@ float32 float32_round_to_int(float32 a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of adding the absolute values of the single-precision
-| floating-point values `a' and `b'.  If `zSign' is 1, the sum is negated
-| before being returned.  `zSign' is ignored if the result is a NaN.
-| The addition is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static float32 addFloat32Sigs(float32 a, float32 b, flag zSign,
-                              float_status *status)
-{
-    int aExp, bExp, zExp;
-    uint32_t aSig, bSig, zSig;
-    int expDiff;
-
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    bSig = extractFloat32Frac( b );
-    bExp = extractFloat32Exp( b );
-    expDiff = aExp - bExp;
-    aSig <<= 6;
-    bSig <<= 6;
-    if ( 0 < expDiff ) {
-        if ( aExp == 0xFF ) {
-            if (aSig) {
-                return propagateFloat32NaN(a, b, status);
-            }
-            return a;
-        }
-        if ( bExp == 0 ) {
-            --expDiff;
-        }
-        else {
-            bSig |= 0x20000000;
-        }
-        shift32RightJamming( bSig, expDiff, &bSig );
-        zExp = aExp;
-    }
-    else if ( expDiff < 0 ) {
-        if ( bExp == 0xFF ) {
-            if (bSig) {
-                return propagateFloat32NaN(a, b, status);
-            }
-            return packFloat32( zSign, 0xFF, 0 );
-        }
-        if ( aExp == 0 ) {
-            ++expDiff;
-        }
-        else {
-            aSig |= 0x20000000;
-        }
-        shift32RightJamming( aSig, - expDiff, &aSig );
-        zExp = bExp;
-    }
-    else {
-        if ( aExp == 0xFF ) {
-            if (aSig | bSig) {
-                return propagateFloat32NaN(a, b, status);
-            }
-            return a;
-        }
-        if ( aExp == 0 ) {
-            if (status->flush_to_zero) {
-                if (aSig | bSig) {
-                    float_raise(float_flag_output_denormal, status);
-                }
-                return packFloat32(zSign, 0, 0);
-            }
-            return packFloat32( zSign, 0, ( aSig + bSig )>>6 );
-        }
-        zSig = 0x40000000 + aSig + bSig;
-        zExp = aExp;
-        goto roundAndPack;
-    }
-    aSig |= 0x20000000;
-    zSig = ( aSig + bSig )<<1;
-    --zExp;
-    if ( (int32_t) zSig < 0 ) {
-        zSig = aSig + bSig;
-        ++zExp;
-    }
- roundAndPack:
-    return roundAndPackFloat32(zSign, zExp, zSig, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of subtracting the absolute values of the single-
-| precision floating-point values `a' and `b'.  If `zSign' is 1, the
-| difference is negated before being returned.  `zSign' is ignored if the
-| result is a NaN.  The subtraction is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static float32 subFloat32Sigs(float32 a, float32 b, flag zSign,
-                              float_status *status)
-{
-    int aExp, bExp, zExp;
-    uint32_t aSig, bSig, zSig;
-    int expDiff;
-
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    bSig = extractFloat32Frac( b );
-    bExp = extractFloat32Exp( b );
-    expDiff = aExp - bExp;
-    aSig <<= 7;
-    bSig <<= 7;
-    if ( 0 < expDiff ) goto aExpBigger;
-    if ( expDiff < 0 ) goto bExpBigger;
-    if ( aExp == 0xFF ) {
-        if (aSig | bSig) {
-            return propagateFloat32NaN(a, b, status);
-        }
-        float_raise(float_flag_invalid, status);
-        return float32_default_nan(status);
-    }
-    if ( aExp == 0 ) {
-        aExp = 1;
-        bExp = 1;
-    }
-    if ( bSig < aSig ) goto aBigger;
-    if ( aSig < bSig ) goto bBigger;
-    return packFloat32(status->float_rounding_mode == float_round_down, 0, 0);
- bExpBigger:
-    if ( bExp == 0xFF ) {
-        if (bSig) {
-            return propagateFloat32NaN(a, b, status);
-        }
-        return packFloat32( zSign ^ 1, 0xFF, 0 );
-    }
-    if ( aExp == 0 ) {
-        ++expDiff;
-    }
-    else {
-        aSig |= 0x40000000;
-    }
-    shift32RightJamming( aSig, - expDiff, &aSig );
-    bSig |= 0x40000000;
- bBigger:
-    zSig = bSig - aSig;
-    zExp = bExp;
-    zSign ^= 1;
-    goto normalizeRoundAndPack;
- aExpBigger:
-    if ( aExp == 0xFF ) {
-        if (aSig) {
-            return propagateFloat32NaN(a, b, status);
-        }
-        return a;
-    }
-    if ( bExp == 0 ) {
-        --expDiff;
-    }
-    else {
-        bSig |= 0x40000000;
-    }
-    shift32RightJamming( bSig, expDiff, &bSig );
-    aSig |= 0x40000000;
- aBigger:
-    zSig = aSig - bSig;
-    zExp = aExp;
- normalizeRoundAndPack:
-    --zExp;
-    return normalizeRoundAndPackFloat32(zSign, zExp, zSig, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of adding the single-precision floating-point values `a'
-| and `b'.  The operation is performed according to the IEC/IEEE Standard for
-| Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 float32_add(float32 a, float32 b, float_status *status)
-{
-    flag aSign, bSign;
-    a = float32_squash_input_denormal(a, status);
-    b = float32_squash_input_denormal(b, status);
-
-    aSign = extractFloat32Sign( a );
-    bSign = extractFloat32Sign( b );
-    if ( aSign == bSign ) {
-        return addFloat32Sigs(a, b, aSign, status);
-    }
-    else {
-        return subFloat32Sigs(a, b, aSign, status);
-    }
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of subtracting the single-precision floating-point values
-| `a' and `b'.  The operation is performed according to the IEC/IEEE Standard
-| for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 float32_sub(float32 a, float32 b, float_status *status)
-{
-    flag aSign, bSign;
-    a = float32_squash_input_denormal(a, status);
-    b = float32_squash_input_denormal(b, status);
-
-    aSign = extractFloat32Sign( a );
-    bSign = extractFloat32Sign( b );
-    if ( aSign == bSign ) {
-        return subFloat32Sigs(a, b, aSign, status);
-    }
-    else {
-        return addFloat32Sigs(a, b, aSign, status);
-    }
-
-}
 
 /*----------------------------------------------------------------------------
 | Returns the result of multiplying the single-precision floating-point values
@@ -3875,219 +4138,6 @@ float64 float64_trunc_to_int(float64 a, float_status *status)
     return res;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of adding the absolute values of the double-precision
-| floating-point values `a' and `b'.  If `zSign' is 1, the sum is negated
-| before being returned.  `zSign' is ignored if the result is a NaN.
-| The addition is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static float64 addFloat64Sigs(float64 a, float64 b, flag zSign,
-                              float_status *status)
-{
-    int aExp, bExp, zExp;
-    uint64_t aSig, bSig, zSig;
-    int expDiff;
-
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    bSig = extractFloat64Frac( b );
-    bExp = extractFloat64Exp( b );
-    expDiff = aExp - bExp;
-    aSig <<= 9;
-    bSig <<= 9;
-    if ( 0 < expDiff ) {
-        if ( aExp == 0x7FF ) {
-            if (aSig) {
-                return propagateFloat64NaN(a, b, status);
-            }
-            return a;
-        }
-        if ( bExp == 0 ) {
-            --expDiff;
-        }
-        else {
-            bSig |= LIT64( 0x2000000000000000 );
-        }
-        shift64RightJamming( bSig, expDiff, &bSig );
-        zExp = aExp;
-    }
-    else if ( expDiff < 0 ) {
-        if ( bExp == 0x7FF ) {
-            if (bSig) {
-                return propagateFloat64NaN(a, b, status);
-            }
-            return packFloat64( zSign, 0x7FF, 0 );
-        }
-        if ( aExp == 0 ) {
-            ++expDiff;
-        }
-        else {
-            aSig |= LIT64( 0x2000000000000000 );
-        }
-        shift64RightJamming( aSig, - expDiff, &aSig );
-        zExp = bExp;
-    }
-    else {
-        if ( aExp == 0x7FF ) {
-            if (aSig | bSig) {
-                return propagateFloat64NaN(a, b, status);
-            }
-            return a;
-        }
-        if ( aExp == 0 ) {
-            if (status->flush_to_zero) {
-                if (aSig | bSig) {
-                    float_raise(float_flag_output_denormal, status);
-                }
-                return packFloat64(zSign, 0, 0);
-            }
-            return packFloat64( zSign, 0, ( aSig + bSig )>>9 );
-        }
-        zSig = LIT64( 0x4000000000000000 ) + aSig + bSig;
-        zExp = aExp;
-        goto roundAndPack;
-    }
-    aSig |= LIT64( 0x2000000000000000 );
-    zSig = ( aSig + bSig )<<1;
-    --zExp;
-    if ( (int64_t) zSig < 0 ) {
-        zSig = aSig + bSig;
-        ++zExp;
-    }
- roundAndPack:
-    return roundAndPackFloat64(zSign, zExp, zSig, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of subtracting the absolute values of the double-
-| precision floating-point values `a' and `b'.  If `zSign' is 1, the
-| difference is negated before being returned.  `zSign' is ignored if the
-| result is a NaN.  The subtraction is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static float64 subFloat64Sigs(float64 a, float64 b, flag zSign,
-                              float_status *status)
-{
-    int aExp, bExp, zExp;
-    uint64_t aSig, bSig, zSig;
-    int expDiff;
-
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    bSig = extractFloat64Frac( b );
-    bExp = extractFloat64Exp( b );
-    expDiff = aExp - bExp;
-    aSig <<= 10;
-    bSig <<= 10;
-    if ( 0 < expDiff ) goto aExpBigger;
-    if ( expDiff < 0 ) goto bExpBigger;
-    if ( aExp == 0x7FF ) {
-        if (aSig | bSig) {
-            return propagateFloat64NaN(a, b, status);
-        }
-        float_raise(float_flag_invalid, status);
-        return float64_default_nan(status);
-    }
-    if ( aExp == 0 ) {
-        aExp = 1;
-        bExp = 1;
-    }
-    if ( bSig < aSig ) goto aBigger;
-    if ( aSig < bSig ) goto bBigger;
-    return packFloat64(status->float_rounding_mode == float_round_down, 0, 0);
- bExpBigger:
-    if ( bExp == 0x7FF ) {
-        if (bSig) {
-            return propagateFloat64NaN(a, b, status);
-        }
-        return packFloat64( zSign ^ 1, 0x7FF, 0 );
-    }
-    if ( aExp == 0 ) {
-        ++expDiff;
-    }
-    else {
-        aSig |= LIT64( 0x4000000000000000 );
-    }
-    shift64RightJamming( aSig, - expDiff, &aSig );
-    bSig |= LIT64( 0x4000000000000000 );
- bBigger:
-    zSig = bSig - aSig;
-    zExp = bExp;
-    zSign ^= 1;
-    goto normalizeRoundAndPack;
- aExpBigger:
-    if ( aExp == 0x7FF ) {
-        if (aSig) {
-            return propagateFloat64NaN(a, b, status);
-        }
-        return a;
-    }
-    if ( bExp == 0 ) {
-        --expDiff;
-    }
-    else {
-        bSig |= LIT64( 0x4000000000000000 );
-    }
-    shift64RightJamming( bSig, expDiff, &bSig );
-    aSig |= LIT64( 0x4000000000000000 );
- aBigger:
-    zSig = aSig - bSig;
-    zExp = aExp;
- normalizeRoundAndPack:
-    --zExp;
-    return normalizeRoundAndPackFloat64(zSign, zExp, zSig, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of adding the double-precision floating-point values `a'
-| and `b'.  The operation is performed according to the IEC/IEEE Standard for
-| Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 float64_add(float64 a, float64 b, float_status *status)
-{
-    flag aSign, bSign;
-    a = float64_squash_input_denormal(a, status);
-    b = float64_squash_input_denormal(b, status);
-
-    aSign = extractFloat64Sign( a );
-    bSign = extractFloat64Sign( b );
-    if ( aSign == bSign ) {
-        return addFloat64Sigs(a, b, aSign, status);
-    }
-    else {
-        return subFloat64Sigs(a, b, aSign, status);
-    }
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of subtracting the double-precision floating-point values
-| `a' and `b'.  The operation is performed according to the IEC/IEEE Standard
-| for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 float64_sub(float64 a, float64 b, float_status *status)
-{
-    flag aSign, bSign;
-    a = float64_squash_input_denormal(a, status);
-    b = float64_squash_input_denormal(b, status);
-
-    aSign = extractFloat64Sign( a );
-    bSign = extractFloat64Sign( b );
-    if ( aSign == bSign ) {
-        return subFloat64Sigs(a, b, aSign, status);
-    }
-    else {
-        return addFloat64Sigs(a, b, aSign, status);
-    }
-
-}
 
 /*----------------------------------------------------------------------------
 | Returns the result of multiplying the double-precision floating-point values
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index e64bf62f3d..3a21a2bcef 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -318,6 +318,10 @@ float64 float16_to_float64(float16 a, flag ieee, float_status *status);
 /*----------------------------------------------------------------------------
 | Software half-precision operations.
 *----------------------------------------------------------------------------*/
+
+float16 float16_add(float16, float16, float_status *status);
+float16 float16_sub(float16, float16, float_status *status);
+
 int float16_is_quiet_nan(float16, float_status *status);
 int float16_is_signaling_nan(float16, float_status *status);
 float16 float16_maybe_silence_nan(float16, float_status *status);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [Qemu-devel] [PATCH v2 12/20] fpu/softfloat: re-factor mul
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (10 preceding siblings ...)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 11/20] fpu/softfloat: re-factor add/sub Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-09 12:43   ` Philippe Mathieu-Daudé
  2018-01-12 16:17   ` Peter Maydell
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 13/20] fpu/softfloat: re-factor div Alex Bennée
                   ` (8 subsequent siblings)
  20 siblings, 2 replies; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

We can now add float16_mul and use the common decompose and
canonicalize functions to have a single implementation for
float16/32/64 versions.
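
To illustrate the fixed-point arithmetic mul_decomposed relies on, here is a
scaled-down standalone sketch (not the patch's code): the significand keeps
its binary point after bit 30 rather than DECOMPOSED_BINARY_POINT, so the
product model fits in a uint64_t. The renormalisation and the "jamming" of
shifted-out bits mirror what mul64To128/shift128RightJamming do in the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Scaled-down model: 1.0 is represented as 1u << 30, mirroring the
 * role DECOMPOSED_BINARY_POINT (62) plays in the real code. */
#define BINARY_POINT 30
#define OVERFLOW_BIT (1u << (BINARY_POINT + 1))

static uint32_t mul_frac(uint32_t a_frac, uint32_t b_frac, int *exp)
{
    uint64_t prod = (uint64_t)a_frac * b_frac;   /* up to 62 bits */
    /* Shift the binary point back down, ORing any shifted-out bits
     * into the LSB ("jamming") so rounding still sees inexactness. */
    uint32_t frac = prod >> BINARY_POINT;
    if (prod & ((1u << BINARY_POINT) - 1)) {
        frac |= 1;
    }
    /* A product of two values in [1, 2) lies in [1, 4): renormalise. */
    if (frac & OVERFLOW_BIT) {
        frac = (frac >> 1) | (frac & 1);
        *exp += 1;
    }
    return frac;
}
```

e.g. squaring 1.5 (0x60000000) gives 0x48000000 (1.125) with the exponent
bumped by one, which is 2.25 as expected.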

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c         | 207 ++++++++++++++++++------------------------------
 include/fpu/softfloat.h |   1 +
 2 files changed, 80 insertions(+), 128 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index f89e47e3ef..6e9d4c172c 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -730,6 +730,85 @@ float64 float64_sub(float64 a, float64 b, float_status *status)
     return float64_round_pack_canonical(pr, status);
 }
 
+/*
+ * Returns the result of multiplying the floating-point values `a' and
+ * `b'. The operation is performed according to the IEC/IEEE Standard
+ * for Binary Floating-Point Arithmetic.
+ */
+
+static decomposed_parts mul_decomposed(decomposed_parts a, decomposed_parts b,
+                                       float_status *s)
+{
+    bool sign = a.sign ^ b.sign;
+
+    if (a.cls == float_class_normal && b.cls == float_class_normal) {
+        uint64_t hi, lo;
+        int exp = a.exp + b.exp;
+
+        mul64To128(a.frac, b.frac, &hi, &lo);
+        shift128RightJamming(hi, lo, DECOMPOSED_BINARY_POINT, &hi, &lo);
+        if (lo & DECOMPOSED_OVERFLOW_BIT) {
+            shift64RightJamming(lo, 1, &lo);
+            exp += 1;
+        }
+
+        /* Re-use a for the result */
+        a.exp = exp;
+        a.sign = sign;
+        a.frac = lo;
+        return a;
+    }
+    /* handle all the NaN cases */
+    if (a.cls >= float_class_qnan || b.cls >= float_class_qnan) {
+        return pick_nan_parts(a, b, s);
+    }
+    /* Inf * Zero == NaN */
+    if (((1 << a.cls) | (1 << b.cls)) ==
+        ((1 << float_class_inf) | (1 << float_class_zero))) {
+        s->float_exception_flags |= float_flag_invalid;
+        a.cls = float_class_dnan;
+        a.sign = sign;
+        return a;
+    }
+    /* Multiply by 0 or Inf */
+    if (a.cls == float_class_inf || a.cls == float_class_zero) {
+        a.sign = sign;
+        return a;
+    }
+    if (b.cls == float_class_inf || b.cls == float_class_zero) {
+        b.sign = sign;
+        return b;
+    }
+    g_assert_not_reached();
+}
+
+float16 float16_mul(float16 a, float16 b, float_status *status)
+{
+    decomposed_parts pa = float16_unpack_canonical(a, status);
+    decomposed_parts pb = float16_unpack_canonical(b, status);
+    decomposed_parts pr = mul_decomposed(pa, pb, status);
+
+    return float16_round_pack_canonical(pr, status);
+}
+
+float32 float32_mul(float32 a, float32 b, float_status *status)
+{
+    decomposed_parts pa = float32_unpack_canonical(a, status);
+    decomposed_parts pb = float32_unpack_canonical(b, status);
+    decomposed_parts pr = mul_decomposed(pa, pb, status);
+
+    return float32_round_pack_canonical(pr, status);
+}
+
+float64 float64_mul(float64 a, float64 b, float_status *status)
+{
+    decomposed_parts pa = float64_unpack_canonical(a, status);
+    decomposed_parts pb = float64_unpack_canonical(b, status);
+    decomposed_parts pr = mul_decomposed(pa, pb, status);
+
+    return float64_round_pack_canonical(pr, status);
+}
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
@@ -2542,70 +2621,6 @@ float32 float32_round_to_int(float32 a, float_status *status)
 }
 
 
-/*----------------------------------------------------------------------------
-| Returns the result of multiplying the single-precision floating-point values
-| `a' and `b'.  The operation is performed according to the IEC/IEEE Standard
-| for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 float32_mul(float32 a, float32 b, float_status *status)
-{
-    flag aSign, bSign, zSign;
-    int aExp, bExp, zExp;
-    uint32_t aSig, bSig;
-    uint64_t zSig64;
-    uint32_t zSig;
-
-    a = float32_squash_input_denormal(a, status);
-    b = float32_squash_input_denormal(b, status);
-
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-    bSig = extractFloat32Frac( b );
-    bExp = extractFloat32Exp( b );
-    bSign = extractFloat32Sign( b );
-    zSign = aSign ^ bSign;
-    if ( aExp == 0xFF ) {
-        if ( aSig || ( ( bExp == 0xFF ) && bSig ) ) {
-            return propagateFloat32NaN(a, b, status);
-        }
-        if ( ( bExp | bSig ) == 0 ) {
-            float_raise(float_flag_invalid, status);
-            return float32_default_nan(status);
-        }
-        return packFloat32( zSign, 0xFF, 0 );
-    }
-    if ( bExp == 0xFF ) {
-        if (bSig) {
-            return propagateFloat32NaN(a, b, status);
-        }
-        if ( ( aExp | aSig ) == 0 ) {
-            float_raise(float_flag_invalid, status);
-            return float32_default_nan(status);
-        }
-        return packFloat32( zSign, 0xFF, 0 );
-    }
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return packFloat32( zSign, 0, 0 );
-        normalizeFloat32Subnormal( aSig, &aExp, &aSig );
-    }
-    if ( bExp == 0 ) {
-        if ( bSig == 0 ) return packFloat32( zSign, 0, 0 );
-        normalizeFloat32Subnormal( bSig, &bExp, &bSig );
-    }
-    zExp = aExp + bExp - 0x7F;
-    aSig = ( aSig | 0x00800000 )<<7;
-    bSig = ( bSig | 0x00800000 )<<8;
-    shift64RightJamming( ( (uint64_t) aSig ) * bSig, 32, &zSig64 );
-    zSig = zSig64;
-    if ( 0 <= (int32_t) ( zSig<<1 ) ) {
-        zSig <<= 1;
-        --zExp;
-    }
-    return roundAndPackFloat32(zSign, zExp, zSig, status);
-
-}
 
 /*----------------------------------------------------------------------------
 | Returns the result of dividing the single-precision floating-point value `a'
@@ -4138,70 +4153,6 @@ float64 float64_trunc_to_int(float64 a, float_status *status)
     return res;
 }
 
-
-/*----------------------------------------------------------------------------
-| Returns the result of multiplying the double-precision floating-point values
-| `a' and `b'.  The operation is performed according to the IEC/IEEE Standard
-| for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 float64_mul(float64 a, float64 b, float_status *status)
-{
-    flag aSign, bSign, zSign;
-    int aExp, bExp, zExp;
-    uint64_t aSig, bSig, zSig0, zSig1;
-
-    a = float64_squash_input_denormal(a, status);
-    b = float64_squash_input_denormal(b, status);
-
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-    bSig = extractFloat64Frac( b );
-    bExp = extractFloat64Exp( b );
-    bSign = extractFloat64Sign( b );
-    zSign = aSign ^ bSign;
-    if ( aExp == 0x7FF ) {
-        if ( aSig || ( ( bExp == 0x7FF ) && bSig ) ) {
-            return propagateFloat64NaN(a, b, status);
-        }
-        if ( ( bExp | bSig ) == 0 ) {
-            float_raise(float_flag_invalid, status);
-            return float64_default_nan(status);
-        }
-        return packFloat64( zSign, 0x7FF, 0 );
-    }
-    if ( bExp == 0x7FF ) {
-        if (bSig) {
-            return propagateFloat64NaN(a, b, status);
-        }
-        if ( ( aExp | aSig ) == 0 ) {
-            float_raise(float_flag_invalid, status);
-            return float64_default_nan(status);
-        }
-        return packFloat64( zSign, 0x7FF, 0 );
-    }
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return packFloat64( zSign, 0, 0 );
-        normalizeFloat64Subnormal( aSig, &aExp, &aSig );
-    }
-    if ( bExp == 0 ) {
-        if ( bSig == 0 ) return packFloat64( zSign, 0, 0 );
-        normalizeFloat64Subnormal( bSig, &bExp, &bSig );
-    }
-    zExp = aExp + bExp - 0x3FF;
-    aSig = ( aSig | LIT64( 0x0010000000000000 ) )<<10;
-    bSig = ( bSig | LIT64( 0x0010000000000000 ) )<<11;
-    mul64To128( aSig, bSig, &zSig0, &zSig1 );
-    zSig0 |= ( zSig1 != 0 );
-    if ( 0 <= (int64_t) ( zSig0<<1 ) ) {
-        zSig0 <<= 1;
-        --zExp;
-    }
-    return roundAndPackFloat64(zSign, zExp, zSig0, status);
-
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of dividing the double-precision floating-point value `a'
 | by the corresponding value `b'.  The operation is performed according to
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 3a21a2bcef..cfee28061e 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -321,6 +321,7 @@ float64 float16_to_float64(float16 a, flag ieee, float_status *status);
 
 float16 float16_add(float16, float16, float_status *status);
 float16 float16_sub(float16, float16, float_status *status);
+float16 float16_mul(float16, float16, float_status *status);
 
 int float16_is_quiet_nan(float16, float_status *status);
 int float16_is_signaling_nan(float16, float_status *status);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [Qemu-devel] [PATCH v2 13/20] fpu/softfloat: re-factor div
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (11 preceding siblings ...)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 12/20] fpu/softfloat: re-factor mul Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-12 16:22   ` Peter Maydell
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 14/20] fpu/softfloat: re-factor muladd Alex Bennée
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

We can now add float16_div and use the common decompose and
canonicalize functions to have a single implementation for
float16/32/64 versions.
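
The quotient path can be sketched the same way. This standalone model (again
at a hypothetical 30-bit binary point, not the patch's code) shows the
pre-shift that keeps the quotient in [1, 2) and the sticky LSB that the
`(r0 != 0)` term of div128To64 provides for the later round-and-pack step:

```c
#include <assert.h>
#include <stdint.h>

#define BINARY_POINT 30

static uint32_t div_frac(uint32_t a_frac, uint32_t b_frac, int *exp)
{
    uint64_t n;

    /* If a < b the quotient would fall below 1.0: shift one bit
     * further and compensate in the exponent, as div_decomposed does. */
    if (a_frac < b_frac) {
        *exp -= 1;
        n = (uint64_t)a_frac << (BINARY_POINT + 1);
    } else {
        n = (uint64_t)a_frac << BINARY_POINT;
    }
    uint32_t quot = n / b_frac;
    if (n % b_frac) {
        quot |= 1;   /* sticky LSB: non-zero remainder => inexact */
    }
    return quot;
}
```

e.g. 1.0 / 1.5 shifts once (exponent -1) and yields 0x55555555, i.e. the
recurring fraction 1.333... with the inexact bit already set in the LSB.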

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat-macros.h  |  44 +++++++++
 fpu/softfloat.c         | 235 ++++++++++++++++++------------------------------
 include/fpu/softfloat.h |   1 +
 3 files changed, 134 insertions(+), 146 deletions(-)

diff --git a/fpu/softfloat-macros.h b/fpu/softfloat-macros.h
index 9cc6158cb4..980be2c051 100644
--- a/fpu/softfloat-macros.h
+++ b/fpu/softfloat-macros.h
@@ -625,6 +625,50 @@ static uint64_t estimateDiv128To64( uint64_t a0, uint64_t a1, uint64_t b )
 
 }
 
+/* Nicked from gmp longlong.h __udiv_qrnnd */
+static uint64_t div128To64(uint64_t n0, uint64_t n1, uint64_t d)
+{
+    uint64_t d0, d1, q0, q1, r1, r0, m;
+
+    d0 = (uint32_t)d;
+    d1 = d >> 32;
+
+    r1 = n1 % d1;
+    q1 = n1 / d1;
+    m = q1 * d0;
+    r1 = (r1 << 32) | (n0 >> 32);
+    if (r1 < m) {
+        q1 -= 1;
+        r1 += d;
+        if (r1 >= d) {
+            if (r1 < m) {
+                q1 -= 1;
+                r1 += d;
+            }
+        }
+    }
+    r1 -= m;
+
+    r0 = r1 % d1;
+    q0 = r1 / d1;
+    m = q0 * d0;
+    r0 = (r0 << 32) | (uint32_t)n0;
+    if (r0 < m) {
+        q0 -= 1;
+        r0 += d;
+        if (r0 >= d) {
+            if (r0 < m) {
+                q0 -= 1;
+                r0 += d;
+            }
+        }
+    }
+    r0 -= m;
+
+    /* Return the quotient, with the LSB set if the remainder is non-zero */
+    return (q1 << 32) | q0 | (r0 != 0);
+}
+
 /*----------------------------------------------------------------------------
 | Returns an approximation to the square root of the 32-bit significand given
 | by `a'.  Considered as an integer, `a' must be at least 2^31.  If bit 0 of
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 6e9d4c172c..2b703c12ed 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -809,6 +809,95 @@ float64 float64_mul(float64 a, float64 b, float_status *status)
     return float64_round_pack_canonical(pr, status);
 }
 
+/*
+ * Returns the result of dividing the floating-point value `a' by the
+ * corresponding value `b'. The operation is performed according to
+ * the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+ */
+
+static decomposed_parts div_decomposed(decomposed_parts a, decomposed_parts b,
+                                       float_status *s)
+{
+    bool sign = a.sign ^ b.sign;
+
+    if (a.cls == float_class_normal && b.cls == float_class_normal) {
+        uint64_t temp_lo, temp_hi;
+        int exp = a.exp - b.exp;
+        if (a.frac < b.frac) {
+            exp -= 1;
+            shortShift128Left(0, a.frac, DECOMPOSED_BINARY_POINT + 1,
+                              &temp_hi, &temp_lo);
+        } else {
+            shortShift128Left(0, a.frac, DECOMPOSED_BINARY_POINT,
+                              &temp_hi, &temp_lo);
+        }
+        /* The LSB of the quotient is set if inexact; round-and-pack
+         * will use it to set the flags. Again we re-use a for the result. */
+        a.frac = div128To64(temp_lo, temp_hi, b.frac);
+        a.sign = sign;
+        a.exp = exp;
+        return a;
+    }
+    /* handle all the NaN cases */
+    if (a.cls >= float_class_qnan || b.cls >= float_class_qnan) {
+        return pick_nan_parts(a, b, s);
+    }
+    /* 0/0 or Inf/Inf */
+    if (a.cls == b.cls &&
+        (a.cls == float_class_inf ||
+         a.cls == float_class_zero)) {
+        s->float_exception_flags |= float_flag_invalid;
+        a.cls = float_class_dnan;
+        return a;
+    }
+    /* Div 0 => Inf */
+    if (b.cls == float_class_zero) {
+        s->float_exception_flags |= float_flag_divbyzero;
+        a.cls = float_class_inf;
+        a.sign = sign;
+        return a;
+    }
+    /* Inf / x or 0 / x */
+    if (a.cls == float_class_inf || a.cls == float_class_zero) {
+        a.sign = sign;
+        return a;
+    }
+    /* Div by Inf */
+    if (b.cls == float_class_inf) {
+        a.cls = float_class_zero;
+        a.sign = sign;
+        return a;
+    }
+    g_assert_not_reached();
+}
+
+float16 float16_div(float16 a, float16 b, float_status *status)
+{
+    decomposed_parts pa = float16_unpack_canonical(a, status);
+    decomposed_parts pb = float16_unpack_canonical(b, status);
+    decomposed_parts pr = div_decomposed(pa, pb, status);
+
+    return float16_round_pack_canonical(pr, status);
+}
+
+float32 float32_div(float32 a, float32 b, float_status *status)
+{
+    decomposed_parts pa = float32_unpack_canonical(a, status);
+    decomposed_parts pb = float32_unpack_canonical(b, status);
+    decomposed_parts pr = div_decomposed(pa, pb, status);
+
+    return float32_round_pack_canonical(pr, status);
+}
+
+float64 float64_div(float64 a, float64 b, float_status *status)
+{
+    decomposed_parts pa = float64_unpack_canonical(a, status);
+    decomposed_parts pb = float64_unpack_canonical(b, status);
+    decomposed_parts pr = div_decomposed(pa, pb, status);
+
+    return float64_round_pack_canonical(pr, status);
+}
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
@@ -2622,75 +2711,6 @@ float32 float32_round_to_int(float32 a, float_status *status)
 
 
 
-/*----------------------------------------------------------------------------
-| Returns the result of dividing the single-precision floating-point value `a'
-| by the corresponding value `b'.  The operation is performed according to the
-| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 float32_div(float32 a, float32 b, float_status *status)
-{
-    flag aSign, bSign, zSign;
-    int aExp, bExp, zExp;
-    uint32_t aSig, bSig, zSig;
-    a = float32_squash_input_denormal(a, status);
-    b = float32_squash_input_denormal(b, status);
-
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-    bSig = extractFloat32Frac( b );
-    bExp = extractFloat32Exp( b );
-    bSign = extractFloat32Sign( b );
-    zSign = aSign ^ bSign;
-    if ( aExp == 0xFF ) {
-        if (aSig) {
-            return propagateFloat32NaN(a, b, status);
-        }
-        if ( bExp == 0xFF ) {
-            if (bSig) {
-                return propagateFloat32NaN(a, b, status);
-            }
-            float_raise(float_flag_invalid, status);
-            return float32_default_nan(status);
-        }
-        return packFloat32( zSign, 0xFF, 0 );
-    }
-    if ( bExp == 0xFF ) {
-        if (bSig) {
-            return propagateFloat32NaN(a, b, status);
-        }
-        return packFloat32( zSign, 0, 0 );
-    }
-    if ( bExp == 0 ) {
-        if ( bSig == 0 ) {
-            if ( ( aExp | aSig ) == 0 ) {
-                float_raise(float_flag_invalid, status);
-                return float32_default_nan(status);
-            }
-            float_raise(float_flag_divbyzero, status);
-            return packFloat32( zSign, 0xFF, 0 );
-        }
-        normalizeFloat32Subnormal( bSig, &bExp, &bSig );
-    }
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return packFloat32( zSign, 0, 0 );
-        normalizeFloat32Subnormal( aSig, &aExp, &aSig );
-    }
-    zExp = aExp - bExp + 0x7D;
-    aSig = ( aSig | 0x00800000 )<<7;
-    bSig = ( bSig | 0x00800000 )<<8;
-    if ( bSig <= ( aSig + aSig ) ) {
-        aSig >>= 1;
-        ++zExp;
-    }
-    zSig = ( ( (uint64_t) aSig )<<32 ) / bSig;
-    if ( ( zSig & 0x3F ) == 0 ) {
-        zSig |= ( (uint64_t) bSig * zSig != ( (uint64_t) aSig )<<32 );
-    }
-    return roundAndPackFloat32(zSign, zExp, zSig, status);
-
-}
 
 /*----------------------------------------------------------------------------
 | Returns the remainder of the single-precision floating-point value `a'
@@ -4153,83 +4173,6 @@ float64 float64_trunc_to_int(float64 a, float_status *status)
     return res;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of dividing the double-precision floating-point value `a'
-| by the corresponding value `b'.  The operation is performed according to
-| the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 float64_div(float64 a, float64 b, float_status *status)
-{
-    flag aSign, bSign, zSign;
-    int aExp, bExp, zExp;
-    uint64_t aSig, bSig, zSig;
-    uint64_t rem0, rem1;
-    uint64_t term0, term1;
-    a = float64_squash_input_denormal(a, status);
-    b = float64_squash_input_denormal(b, status);
-
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-    bSig = extractFloat64Frac( b );
-    bExp = extractFloat64Exp( b );
-    bSign = extractFloat64Sign( b );
-    zSign = aSign ^ bSign;
-    if ( aExp == 0x7FF ) {
-        if (aSig) {
-            return propagateFloat64NaN(a, b, status);
-        }
-        if ( bExp == 0x7FF ) {
-            if (bSig) {
-                return propagateFloat64NaN(a, b, status);
-            }
-            float_raise(float_flag_invalid, status);
-            return float64_default_nan(status);
-        }
-        return packFloat64( zSign, 0x7FF, 0 );
-    }
-    if ( bExp == 0x7FF ) {
-        if (bSig) {
-            return propagateFloat64NaN(a, b, status);
-        }
-        return packFloat64( zSign, 0, 0 );
-    }
-    if ( bExp == 0 ) {
-        if ( bSig == 0 ) {
-            if ( ( aExp | aSig ) == 0 ) {
-                float_raise(float_flag_invalid, status);
-                return float64_default_nan(status);
-            }
-            float_raise(float_flag_divbyzero, status);
-            return packFloat64( zSign, 0x7FF, 0 );
-        }
-        normalizeFloat64Subnormal( bSig, &bExp, &bSig );
-    }
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return packFloat64( zSign, 0, 0 );
-        normalizeFloat64Subnormal( aSig, &aExp, &aSig );
-    }
-    zExp = aExp - bExp + 0x3FD;
-    aSig = ( aSig | LIT64( 0x0010000000000000 ) )<<10;
-    bSig = ( bSig | LIT64( 0x0010000000000000 ) )<<11;
-    if ( bSig <= ( aSig + aSig ) ) {
-        aSig >>= 1;
-        ++zExp;
-    }
-    zSig = estimateDiv128To64( aSig, 0, bSig );
-    if ( ( zSig & 0x1FF ) <= 2 ) {
-        mul64To128( bSig, zSig, &term0, &term1 );
-        sub128( aSig, 0, term0, term1, &rem0, &rem1 );
-        while ( (int64_t) rem0 < 0 ) {
-            --zSig;
-            add128( rem0, rem1, 0, bSig, &rem0, &rem1 );
-        }
-        zSig |= ( rem1 != 0 );
-    }
-    return roundAndPackFloat64(zSign, zExp, zSig, status);
-
-}
 
 /*----------------------------------------------------------------------------
 | Returns the remainder of the double-precision floating-point value `a'
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index cfee28061e..335f199bb6 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -322,6 +322,7 @@ float64 float16_to_float64(float16 a, flag ieee, float_status *status);
 float16 float16_add(float16, float16, float_status *status);
 float16 float16_sub(float16, float16, float_status *status);
 float16 float16_mul(float16, float16, float_status *status);
+float16 float16_div(float16, float16, float_status *status);
 
 int float16_is_quiet_nan(float16, float_status *status);
 int float16_is_signaling_nan(float16, float_status *status);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [Qemu-devel] [PATCH v2 14/20] fpu/softfloat: re-factor muladd
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (12 preceding siblings ...)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 13/20] fpu/softfloat: re-factor div Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-02-13 15:15   ` Peter Maydell
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 15/20] fpu/softfloat: re-factor round_to_int Alex Bennée
                   ` (6 subsequent siblings)
  20 siblings, 1 reply; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

We can now add float16_muladd and use the common decompose and
canonicalize functions to have a single implementation for
float16/32/64 muladd functions.
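
The inf_zero test below reuses the one-hot class trick already seen in
mul_decomposed. A standalone sketch (with a hypothetical cut-down enum; the
real float_class has more members) shows why the OR of the two masks matches
exactly when one operand is Inf and the other is zero, in either order:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cut-down enum; only the relative encoding matters. */
typedef enum {
    float_class_zero,
    float_class_normal,
    float_class_inf,
    float_class_qnan,
} float_class;

static bool is_inf_zero(float_class a, float_class b)
{
    /* One-hot encode both classes: the OR equals the {inf, zero}
     * mask iff one operand is Inf and the other is zero. Any other
     * pair either sets a different bit or fails to set both. */
    return ((1 << a) | (1 << b)) ==
           ((1 << float_class_inf) | (1 << float_class_zero));
}
```

In the patch the same comparison guards the Inf * 0 => default-NaN path in
mul_decomposed and feeds the inf_zero argument of pick_nan_muladd_parts.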

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat-specialize.h | 104 -------
 fpu/softfloat.c            | 756 +++++++++++++++++----------------------------
 include/fpu/softfloat.h    |   1 +
 3 files changed, 286 insertions(+), 575 deletions(-)

diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h
index 3d507d8c77..98fb0e7001 100644
--- a/fpu/softfloat-specialize.h
+++ b/fpu/softfloat-specialize.h
@@ -729,58 +729,6 @@ static float32 propagateFloat32NaN(float32 a, float32 b, float_status *status)
     }
 }
 
-/*----------------------------------------------------------------------------
-| Takes three single-precision floating-point values `a', `b' and `c', one of
-| which is a NaN, and returns the appropriate NaN result.  If any of  `a',
-| `b' or `c' is a signaling NaN, the invalid exception is raised.
-| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case
-| obviously c is a NaN, and whether to propagate c or some other NaN is
-| implementation defined).
-*----------------------------------------------------------------------------*/
-
-static float32 propagateFloat32MulAddNaN(float32 a, float32 b,
-                                         float32 c, flag infzero,
-                                         float_status *status)
-{
-    flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN,
-        cIsQuietNaN, cIsSignalingNaN;
-    int which;
-
-    aIsQuietNaN = float32_is_quiet_nan(a, status);
-    aIsSignalingNaN = float32_is_signaling_nan(a, status);
-    bIsQuietNaN = float32_is_quiet_nan(b, status);
-    bIsSignalingNaN = float32_is_signaling_nan(b, status);
-    cIsQuietNaN = float32_is_quiet_nan(c, status);
-    cIsSignalingNaN = float32_is_signaling_nan(c, status);
-
-    if (aIsSignalingNaN | bIsSignalingNaN | cIsSignalingNaN) {
-        float_raise(float_flag_invalid, status);
-    }
-
-    which = pickNaNMulAdd(aIsQuietNaN, aIsSignalingNaN,
-                          bIsQuietNaN, bIsSignalingNaN,
-                          cIsQuietNaN, cIsSignalingNaN, infzero, status);
-
-    if (status->default_nan_mode) {
-        /* Note that this check is after pickNaNMulAdd so that function
-         * has an opportunity to set the Invalid flag.
-         */
-        return float32_default_nan(status);
-    }
-
-    switch (which) {
-    case 0:
-        return float32_maybe_silence_nan(a, status);
-    case 1:
-        return float32_maybe_silence_nan(b, status);
-    case 2:
-        return float32_maybe_silence_nan(c, status);
-    case 3:
-    default:
-        return float32_default_nan(status);
-    }
-}
-
 #ifdef NO_SIGNALING_NANS
 int float64_is_quiet_nan(float64 a_, float_status *status)
 {
@@ -936,58 +884,6 @@ static float64 propagateFloat64NaN(float64 a, float64 b, float_status *status)
     }
 }
 
-/*----------------------------------------------------------------------------
-| Takes three double-precision floating-point values `a', `b' and `c', one of
-| which is a NaN, and returns the appropriate NaN result.  If any of  `a',
-| `b' or `c' is a signaling NaN, the invalid exception is raised.
-| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case
-| obviously c is a NaN, and whether to propagate c or some other NaN is
-| implementation defined).
-*----------------------------------------------------------------------------*/
-
-static float64 propagateFloat64MulAddNaN(float64 a, float64 b,
-                                         float64 c, flag infzero,
-                                         float_status *status)
-{
-    flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN,
-        cIsQuietNaN, cIsSignalingNaN;
-    int which;
-
-    aIsQuietNaN = float64_is_quiet_nan(a, status);
-    aIsSignalingNaN = float64_is_signaling_nan(a, status);
-    bIsQuietNaN = float64_is_quiet_nan(b, status);
-    bIsSignalingNaN = float64_is_signaling_nan(b, status);
-    cIsQuietNaN = float64_is_quiet_nan(c, status);
-    cIsSignalingNaN = float64_is_signaling_nan(c, status);
-
-    if (aIsSignalingNaN | bIsSignalingNaN | cIsSignalingNaN) {
-        float_raise(float_flag_invalid, status);
-    }
-
-    which = pickNaNMulAdd(aIsQuietNaN, aIsSignalingNaN,
-                          bIsQuietNaN, bIsSignalingNaN,
-                          cIsQuietNaN, cIsSignalingNaN, infzero, status);
-
-    if (status->default_nan_mode) {
-        /* Note that this check is after pickNaNMulAdd so that function
-         * has an opportunity to set the Invalid flag.
-         */
-        return float64_default_nan(status);
-    }
-
-    switch (which) {
-    case 0:
-        return float64_maybe_silence_nan(a, status);
-    case 1:
-        return float64_maybe_silence_nan(b, status);
-    case 2:
-        return float64_maybe_silence_nan(c, status);
-    case 3:
-    default:
-        return float64_default_nan(status);
-    }
-}
-
 #ifdef NO_SIGNALING_NANS
 int floatx80_is_quiet_nan(floatx80 a_, float_status *status)
 {
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 2b703c12ed..84386f354b 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -561,6 +561,50 @@ static decomposed_parts pick_nan_parts(decomposed_parts a, decomposed_parts b,
     return a;
 }
 
+static decomposed_parts pick_nan_muladd_parts(decomposed_parts a,
+                                              decomposed_parts b,
+                                              decomposed_parts c,
+                                              bool inf_zero,
+                                              float_status *s)
+{
+    if (a.cls == float_class_snan ||
+        b.cls == float_class_snan ||
+        c.cls == float_class_snan) {
+        s->float_exception_flags |= float_flag_invalid;
+    }
+
+    if (s->default_nan_mode) {
+        a.cls = float_class_dnan;
+    } else {
+        switch (pickNaNMulAdd(a.cls == float_class_qnan,
+                              a.cls == float_class_snan,
+                              b.cls == float_class_qnan,
+                              b.cls == float_class_snan,
+                              c.cls == float_class_qnan,
+                              c.cls == float_class_snan,
+                              inf_zero, s)) {
+        case 0:
+            break;
+        case 1:
+            a = b;
+            break;
+        case 2:
+            a = c;
+            break;
+        case 3:
+            a.cls = float_class_dnan;
+            return a;
+        default:
+            g_assert_not_reached();
+        }
+
+        a.cls = float_class_msnan;
+    }
+    return a;
+}
+
 
 /*
  * Returns the result of adding the absolute values of the
@@ -809,6 +853,247 @@ float64 float64_mul(float64 a, float64 b, float_status *status)
     return float64_round_pack_canonical(pr, status);
 }
 
+/*
+ * Returns the result of multiplying the floating-point values `a' and
+ * `b' then adding 'c', with no intermediate rounding step after the
+ * multiplication. The operation is performed according to the
+ * IEC/IEEE Standard for Binary Floating-Point Arithmetic 754-2008.
+ * The flags argument allows the caller to select negation of the
+ * addend, the intermediate product, or the final result. (The
+ * difference between this and having the caller do a separate
+ * negation is that negating externally will flip the sign bit on
+ * NaNs.)
+ */
+
+static decomposed_parts muladd_decomposed(decomposed_parts a,
+                                          decomposed_parts b,
+                                          decomposed_parts c, int flags,
+                                          float_status *s)
+{
+    bool inf_zero = ((1 << a.cls) | (1 << b.cls)) ==
+                    ((1 << float_class_inf) | (1 << float_class_zero));
+    bool p_sign;
+    bool sign_flip = flags & float_muladd_negate_result;
+    float_class p_class;
+    uint64_t hi, lo;
+    int p_exp;
+
+    /* It is implementation-defined whether the cases of (0,inf,qnan)
+     * and (inf,0,qnan) raise InvalidOperation or not (and what QNaN
+     * they return if they do), so we have to hand this information
+     * off to the target-specific pick-a-NaN routine.
+     */
+    if (a.cls >= float_class_qnan ||
+        b.cls >= float_class_qnan ||
+        c.cls >= float_class_qnan) {
+        return pick_nan_muladd_parts(a, b, c, inf_zero, s);
+    }
+
+    if (inf_zero) {
+        s->float_exception_flags |= float_flag_invalid;
+        a.cls = float_class_dnan;
+        return a;
+    }
+
+    if (flags & float_muladd_negate_c) {
+        c.sign ^= 1;
+    }
+
+    p_sign = a.sign ^ b.sign;
+
+    if (flags & float_muladd_negate_product) {
+        p_sign ^= 1;
+    }
+
+    if (a.cls == float_class_inf || b.cls == float_class_inf) {
+        p_class = float_class_inf;
+    } else if (a.cls == float_class_zero || b.cls == float_class_zero) {
+        p_class = float_class_zero;
+    } else {
+        p_class = float_class_normal;
+    }
+
+    if (c.cls == float_class_inf) {
+        if (p_class == float_class_inf && p_sign != c.sign) {
+            s->float_exception_flags |= float_flag_invalid;
+            a.cls = float_class_dnan;
+        } else {
+            a.cls = float_class_inf;
+            a.sign = c.sign ^ sign_flip;
+        }
+        return a;
+    }
+
+    if (p_class == float_class_inf) {
+        a.cls = float_class_inf;
+        a.sign = p_sign ^ sign_flip;
+        return a;
+    }
+
+    if (p_class == float_class_zero) {
+        if (c.cls == float_class_zero) {
+            if (p_sign != c.sign) {
+                p_sign = s->float_rounding_mode == float_round_down;
+            }
+            c.sign = p_sign;
+        } else if (flags & float_muladd_halve_result) {
+            c.exp -= 1;
+        }
+        c.sign ^= sign_flip;
+        return c;
+    }
+
+    /* a & b should be normals now... */
+    assert(a.cls == float_class_normal &&
+           b.cls == float_class_normal);
+
+    p_exp = a.exp + b.exp;
+
+    /* Multiply of 2 62-bit numbers produces a (2*62) == 124-bit
+     * result.
+     */
+    mul64To128(a.frac, b.frac, &hi, &lo);
+    /* binary point now at bit 124 */
+
+    /* check for overflow */
+    if (hi & (1ULL << (DECOMPOSED_BINARY_POINT * 2 + 1 - 64))) {
+        shift128RightJamming(hi, lo, 1, &hi, &lo);
+        p_exp += 1;
+    }
+
+    /* + add/sub */
+    if (c.cls == float_class_zero) {
+        /* move binary point back to 62 */
+        shift128RightJamming(hi, lo, DECOMPOSED_BINARY_POINT, &hi, &lo);
+    } else {
+        int exp_diff = p_exp - c.exp;
+        if (p_sign == c.sign) {
+            /* Addition */
+            if (exp_diff <= 0) {
+                shift128RightJamming(hi, lo,
+                                     DECOMPOSED_BINARY_POINT - exp_diff,
+                                     &hi, &lo);
+                lo += c.frac;
+                p_exp = c.exp;
+            } else {
+                uint64_t c_hi, c_lo;
+                /* shift c to the same binary point as the product (124) */
+                c_hi = c.frac >> 2;
+                c_lo = 0;
+                shift128RightJamming(c_hi, c_lo,
+                                     exp_diff,
+                                     &c_hi, &c_lo);
+                add128(hi, lo, c_hi, c_lo, &hi, &lo);
+                /* move binary point back to 62 */
+                shift128RightJamming(hi, lo, DECOMPOSED_BINARY_POINT, &hi, &lo);
+            }
+
+            if (lo & DECOMPOSED_OVERFLOW_BIT) {
+                shift64RightJamming(lo, 1, &lo);
+                p_exp += 1;
+            }
+
+        } else {
+            /* Subtraction */
+            uint64_t c_hi, c_lo;
+            /* make C binary point match product at bit 124 */
+            c_hi = c.frac >> 2;
+            c_lo = 0;
+
+            if (exp_diff <= 0) {
+                shift128RightJamming(hi, lo, -exp_diff, &hi, &lo);
+                if (exp_diff == 0 &&
+                    (hi > c_hi || (hi == c_hi && lo >= c_lo))) {
+                    sub128(hi, lo, c_hi, c_lo, &hi, &lo);
+                } else {
+                    sub128(c_hi, c_lo, hi, lo, &hi, &lo);
+                    p_sign ^= 1;
+                    p_exp = c.exp;
+                }
+            } else {
+                shift128RightJamming(c_hi, c_lo,
+                                     exp_diff,
+                                     &c_hi, &c_lo);
+                sub128(hi, lo, c_hi, c_lo, &hi, &lo);
+            }
+
+            if (hi == 0 && lo == 0) {
+                a.cls = float_class_zero;
+                a.sign = s->float_rounding_mode == float_round_down;
+                a.sign ^= sign_flip;
+                return a;
+            } else {
+                int shift;
+                if (hi != 0) {
+                    shift = clz64(hi);
+                } else {
+                    shift = clz64(lo) + 64;
+                }
+                /* Normalizing to a binary point of 124 is the
+                 * correct adjustment for the exponent.  However,
+                 * since we're shifting, we might as well put the
+                 * binary point back at 62 where we really want it.
+                 * Therefore shift as if we're leaving 1 bit at the
+                 * top of the word, but adjust the exponent as if
+                 * we're leaving 3 bits.
+                 */
+                shift -= 1;
+                if (shift >= 64) {
+                    lo = lo << (shift - 64);
+                } else {
+                    hi = (hi << shift) | (lo >> (64 - shift));
+                    lo = hi | ((lo << shift) != 0);
+                }
+                p_exp -= shift - 2;
+            }
+        }
+    }
+
+    if (flags & float_muladd_halve_result) {
+        p_exp -= 1;
+    }
+
+    /* finally prepare our result */
+    a.cls = float_class_normal;
+    a.sign = p_sign ^ sign_flip;
+    a.exp = p_exp;
+    a.frac = lo;
+
+    return a;
+}
+
+float16 float16_muladd(float16 a, float16 b, float16 c, int flags,
+                       float_status *status)
+{
+    decomposed_parts pa = float16_unpack_canonical(a, status);
+    decomposed_parts pb = float16_unpack_canonical(b, status);
+    decomposed_parts pc = float16_unpack_canonical(c, status);
+    decomposed_parts pr = muladd_decomposed(pa, pb, pc, flags, status);
+
+    return float16_round_pack_canonical(pr, status);
+}
+
+float32 float32_muladd(float32 a, float32 b, float32 c, int flags,
+                       float_status *status)
+{
+    decomposed_parts pa = float32_unpack_canonical(a, status);
+    decomposed_parts pb = float32_unpack_canonical(b, status);
+    decomposed_parts pc = float32_unpack_canonical(c, status);
+    decomposed_parts pr = muladd_decomposed(pa, pb, pc, flags, status);
+
+    return float32_round_pack_canonical(pr, status);
+}
+
+float64 float64_muladd(float64 a, float64 b, float64 c, int flags,
+                       float_status *status)
+{
+    decomposed_parts pa = float64_unpack_canonical(a, status);
+    decomposed_parts pb = float64_unpack_canonical(b, status);
+    decomposed_parts pc = float64_unpack_canonical(c, status);
+    decomposed_parts pr = muladd_decomposed(pa, pb, pc, flags, status);
+
+    return float64_round_pack_canonical(pr, status);
+}
+
 /*
  * Returns the result of dividing the floating-point value `a' by the
  * corresponding value `b'. The operation is performed according to
@@ -2814,231 +3099,6 @@ float32 float32_rem(float32 a, float32 b, float_status *status)
     return normalizeRoundAndPackFloat32(aSign ^ zSign, bExp, aSig, status);
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of multiplying the single-precision floating-point values
-| `a' and `b' then adding 'c', with no intermediate rounding step after the
-| multiplication.  The operation is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic 754-2008.
-| The flags argument allows the caller to select negation of the
-| addend, the intermediate product, or the final result. (The difference
-| between this and having the caller do a separate negation is that negating
-| externally will flip the sign bit on NaNs.)
-*----------------------------------------------------------------------------*/
-
-float32 float32_muladd(float32 a, float32 b, float32 c, int flags,
-                       float_status *status)
-{
-    flag aSign, bSign, cSign, zSign;
-    int aExp, bExp, cExp, pExp, zExp, expDiff;
-    uint32_t aSig, bSig, cSig;
-    flag pInf, pZero, pSign;
-    uint64_t pSig64, cSig64, zSig64;
-    uint32_t pSig;
-    int shiftcount;
-    flag signflip, infzero;
-
-    a = float32_squash_input_denormal(a, status);
-    b = float32_squash_input_denormal(b, status);
-    c = float32_squash_input_denormal(c, status);
-    aSig = extractFloat32Frac(a);
-    aExp = extractFloat32Exp(a);
-    aSign = extractFloat32Sign(a);
-    bSig = extractFloat32Frac(b);
-    bExp = extractFloat32Exp(b);
-    bSign = extractFloat32Sign(b);
-    cSig = extractFloat32Frac(c);
-    cExp = extractFloat32Exp(c);
-    cSign = extractFloat32Sign(c);
-
-    infzero = ((aExp == 0 && aSig == 0 && bExp == 0xff && bSig == 0) ||
-               (aExp == 0xff && aSig == 0 && bExp == 0 && bSig == 0));
-
-    /* It is implementation-defined whether the cases of (0,inf,qnan)
-     * and (inf,0,qnan) raise InvalidOperation or not (and what QNaN
-     * they return if they do), so we have to hand this information
-     * off to the target-specific pick-a-NaN routine.
-     */
-    if (((aExp == 0xff) && aSig) ||
-        ((bExp == 0xff) && bSig) ||
-        ((cExp == 0xff) && cSig)) {
-        return propagateFloat32MulAddNaN(a, b, c, infzero, status);
-    }
-
-    if (infzero) {
-        float_raise(float_flag_invalid, status);
-        return float32_default_nan(status);
-    }
-
-    if (flags & float_muladd_negate_c) {
-        cSign ^= 1;
-    }
-
-    signflip = (flags & float_muladd_negate_result) ? 1 : 0;
-
-    /* Work out the sign and type of the product */
-    pSign = aSign ^ bSign;
-    if (flags & float_muladd_negate_product) {
-        pSign ^= 1;
-    }
-    pInf = (aExp == 0xff) || (bExp == 0xff);
-    pZero = ((aExp | aSig) == 0) || ((bExp | bSig) == 0);
-
-    if (cExp == 0xff) {
-        if (pInf && (pSign ^ cSign)) {
-            /* addition of opposite-signed infinities => InvalidOperation */
-            float_raise(float_flag_invalid, status);
-            return float32_default_nan(status);
-        }
-        /* Otherwise generate an infinity of the same sign */
-        return packFloat32(cSign ^ signflip, 0xff, 0);
-    }
-
-    if (pInf) {
-        return packFloat32(pSign ^ signflip, 0xff, 0);
-    }
-
-    if (pZero) {
-        if (cExp == 0) {
-            if (cSig == 0) {
-                /* Adding two exact zeroes */
-                if (pSign == cSign) {
-                    zSign = pSign;
-                } else if (status->float_rounding_mode == float_round_down) {
-                    zSign = 1;
-                } else {
-                    zSign = 0;
-                }
-                return packFloat32(zSign ^ signflip, 0, 0);
-            }
-            /* Exact zero plus a denorm */
-            if (status->flush_to_zero) {
-                float_raise(float_flag_output_denormal, status);
-                return packFloat32(cSign ^ signflip, 0, 0);
-            }
-        }
-        /* Zero plus something non-zero : just return the something */
-        if (flags & float_muladd_halve_result) {
-            if (cExp == 0) {
-                normalizeFloat32Subnormal(cSig, &cExp, &cSig);
-            }
-            /* Subtract one to halve, and one again because roundAndPackFloat32
-             * wants one less than the true exponent.
-             */
-            cExp -= 2;
-            cSig = (cSig | 0x00800000) << 7;
-            return roundAndPackFloat32(cSign ^ signflip, cExp, cSig, status);
-        }
-        return packFloat32(cSign ^ signflip, cExp, cSig);
-    }
-
-    if (aExp == 0) {
-        normalizeFloat32Subnormal(aSig, &aExp, &aSig);
-    }
-    if (bExp == 0) {
-        normalizeFloat32Subnormal(bSig, &bExp, &bSig);
-    }
-
-    /* Calculate the actual result a * b + c */
-
-    /* Multiply first; this is easy. */
-    /* NB: we subtract 0x7e where float32_mul() subtracts 0x7f
-     * because we want the true exponent, not the "one-less-than"
-     * flavour that roundAndPackFloat32() takes.
-     */
-    pExp = aExp + bExp - 0x7e;
-    aSig = (aSig | 0x00800000) << 7;
-    bSig = (bSig | 0x00800000) << 8;
-    pSig64 = (uint64_t)aSig * bSig;
-    if ((int64_t)(pSig64 << 1) >= 0) {
-        pSig64 <<= 1;
-        pExp--;
-    }
-
-    zSign = pSign ^ signflip;
-
-    /* Now pSig64 is the significand of the multiply, with the explicit bit in
-     * position 62.
-     */
-    if (cExp == 0) {
-        if (!cSig) {
-            /* Throw out the special case of c being an exact zero now */
-            shift64RightJamming(pSig64, 32, &pSig64);
-            pSig = pSig64;
-            if (flags & float_muladd_halve_result) {
-                pExp--;
-            }
-            return roundAndPackFloat32(zSign, pExp - 1,
-                                       pSig, status);
-        }
-        normalizeFloat32Subnormal(cSig, &cExp, &cSig);
-    }
-
-    cSig64 = (uint64_t)cSig << (62 - 23);
-    cSig64 |= LIT64(0x4000000000000000);
-    expDiff = pExp - cExp;
-
-    if (pSign == cSign) {
-        /* Addition */
-        if (expDiff > 0) {
-            /* scale c to match p */
-            shift64RightJamming(cSig64, expDiff, &cSig64);
-            zExp = pExp;
-        } else if (expDiff < 0) {
-            /* scale p to match c */
-            shift64RightJamming(pSig64, -expDiff, &pSig64);
-            zExp = cExp;
-        } else {
-            /* no scaling needed */
-            zExp = cExp;
-        }
-        /* Add significands and make sure explicit bit ends up in posn 62 */
-        zSig64 = pSig64 + cSig64;
-        if ((int64_t)zSig64 < 0) {
-            shift64RightJamming(zSig64, 1, &zSig64);
-        } else {
-            zExp--;
-        }
-    } else {
-        /* Subtraction */
-        if (expDiff > 0) {
-            shift64RightJamming(cSig64, expDiff, &cSig64);
-            zSig64 = pSig64 - cSig64;
-            zExp = pExp;
-        } else if (expDiff < 0) {
-            shift64RightJamming(pSig64, -expDiff, &pSig64);
-            zSig64 = cSig64 - pSig64;
-            zExp = cExp;
-            zSign ^= 1;
-        } else {
-            zExp = pExp;
-            if (cSig64 < pSig64) {
-                zSig64 = pSig64 - cSig64;
-            } else if (pSig64 < cSig64) {
-                zSig64 = cSig64 - pSig64;
-                zSign ^= 1;
-            } else {
-                /* Exact zero */
-                zSign = signflip;
-                if (status->float_rounding_mode == float_round_down) {
-                    zSign ^= 1;
-                }
-                return packFloat32(zSign, 0, 0);
-            }
-        }
-        --zExp;
-        /* Normalize to put the explicit bit back into bit 62. */
-        shiftcount = countLeadingZeros64(zSig64) - 1;
-        zSig64 <<= shiftcount;
-        zExp -= shiftcount;
-    }
-    if (flags & float_muladd_halve_result) {
-        zExp--;
-    }
-
-    shift64RightJamming(zSig64, 32, &zSig64);
-    return roundAndPackFloat32(zSign, zExp, zSig64, status);
-}
-
 
 /*----------------------------------------------------------------------------
 | Returns the square root of the single-precision floating-point value `a'.
@@ -4262,252 +4322,6 @@ float64 float64_rem(float64 a, float64 b, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of multiplying the double-precision floating-point values
-| `a' and `b' then adding 'c', with no intermediate rounding step after the
-| multiplication.  The operation is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic 754-2008.
-| The flags argument allows the caller to select negation of the
-| addend, the intermediate product, or the final result. (The difference
-| between this and having the caller do a separate negation is that negating
-| externally will flip the sign bit on NaNs.)
-*----------------------------------------------------------------------------*/
-
-float64 float64_muladd(float64 a, float64 b, float64 c, int flags,
-                       float_status *status)
-{
-    flag aSign, bSign, cSign, zSign;
-    int aExp, bExp, cExp, pExp, zExp, expDiff;
-    uint64_t aSig, bSig, cSig;
-    flag pInf, pZero, pSign;
-    uint64_t pSig0, pSig1, cSig0, cSig1, zSig0, zSig1;
-    int shiftcount;
-    flag signflip, infzero;
-
-    a = float64_squash_input_denormal(a, status);
-    b = float64_squash_input_denormal(b, status);
-    c = float64_squash_input_denormal(c, status);
-    aSig = extractFloat64Frac(a);
-    aExp = extractFloat64Exp(a);
-    aSign = extractFloat64Sign(a);
-    bSig = extractFloat64Frac(b);
-    bExp = extractFloat64Exp(b);
-    bSign = extractFloat64Sign(b);
-    cSig = extractFloat64Frac(c);
-    cExp = extractFloat64Exp(c);
-    cSign = extractFloat64Sign(c);
-
-    infzero = ((aExp == 0 && aSig == 0 && bExp == 0x7ff && bSig == 0) ||
-               (aExp == 0x7ff && aSig == 0 && bExp == 0 && bSig == 0));
-
-    /* It is implementation-defined whether the cases of (0,inf,qnan)
-     * and (inf,0,qnan) raise InvalidOperation or not (and what QNaN
-     * they return if they do), so we have to hand this information
-     * off to the target-specific pick-a-NaN routine.
-     */
-    if (((aExp == 0x7ff) && aSig) ||
-        ((bExp == 0x7ff) && bSig) ||
-        ((cExp == 0x7ff) && cSig)) {
-        return propagateFloat64MulAddNaN(a, b, c, infzero, status);
-    }
-
-    if (infzero) {
-        float_raise(float_flag_invalid, status);
-        return float64_default_nan(status);
-    }
-
-    if (flags & float_muladd_negate_c) {
-        cSign ^= 1;
-    }
-
-    signflip = (flags & float_muladd_negate_result) ? 1 : 0;
-
-    /* Work out the sign and type of the product */
-    pSign = aSign ^ bSign;
-    if (flags & float_muladd_negate_product) {
-        pSign ^= 1;
-    }
-    pInf = (aExp == 0x7ff) || (bExp == 0x7ff);
-    pZero = ((aExp | aSig) == 0) || ((bExp | bSig) == 0);
-
-    if (cExp == 0x7ff) {
-        if (pInf && (pSign ^ cSign)) {
-            /* addition of opposite-signed infinities => InvalidOperation */
-            float_raise(float_flag_invalid, status);
-            return float64_default_nan(status);
-        }
-        /* Otherwise generate an infinity of the same sign */
-        return packFloat64(cSign ^ signflip, 0x7ff, 0);
-    }
-
-    if (pInf) {
-        return packFloat64(pSign ^ signflip, 0x7ff, 0);
-    }
-
-    if (pZero) {
-        if (cExp == 0) {
-            if (cSig == 0) {
-                /* Adding two exact zeroes */
-                if (pSign == cSign) {
-                    zSign = pSign;
-                } else if (status->float_rounding_mode == float_round_down) {
-                    zSign = 1;
-                } else {
-                    zSign = 0;
-                }
-                return packFloat64(zSign ^ signflip, 0, 0);
-            }
-            /* Exact zero plus a denorm */
-            if (status->flush_to_zero) {
-                float_raise(float_flag_output_denormal, status);
-                return packFloat64(cSign ^ signflip, 0, 0);
-            }
-        }
-        /* Zero plus something non-zero : just return the something */
-        if (flags & float_muladd_halve_result) {
-            if (cExp == 0) {
-                normalizeFloat64Subnormal(cSig, &cExp, &cSig);
-            }
-            /* Subtract one to halve, and one again because roundAndPackFloat64
-             * wants one less than the true exponent.
-             */
-            cExp -= 2;
-            cSig = (cSig | 0x0010000000000000ULL) << 10;
-            return roundAndPackFloat64(cSign ^ signflip, cExp, cSig, status);
-        }
-        return packFloat64(cSign ^ signflip, cExp, cSig);
-    }
-
-    if (aExp == 0) {
-        normalizeFloat64Subnormal(aSig, &aExp, &aSig);
-    }
-    if (bExp == 0) {
-        normalizeFloat64Subnormal(bSig, &bExp, &bSig);
-    }
-
-    /* Calculate the actual result a * b + c */
-
-    /* Multiply first; this is easy. */
-    /* NB: we subtract 0x3fe where float64_mul() subtracts 0x3ff
-     * because we want the true exponent, not the "one-less-than"
-     * flavour that roundAndPackFloat64() takes.
-     */
-    pExp = aExp + bExp - 0x3fe;
-    aSig = (aSig | LIT64(0x0010000000000000))<<10;
-    bSig = (bSig | LIT64(0x0010000000000000))<<11;
-    mul64To128(aSig, bSig, &pSig0, &pSig1);
-    if ((int64_t)(pSig0 << 1) >= 0) {
-        shortShift128Left(pSig0, pSig1, 1, &pSig0, &pSig1);
-        pExp--;
-    }
-
-    zSign = pSign ^ signflip;
-
-    /* Now [pSig0:pSig1] is the significand of the multiply, with the explicit
-     * bit in position 126.
-     */
-    if (cExp == 0) {
-        if (!cSig) {
-            /* Throw out the special case of c being an exact zero now */
-            shift128RightJamming(pSig0, pSig1, 64, &pSig0, &pSig1);
-            if (flags & float_muladd_halve_result) {
-                pExp--;
-            }
-            return roundAndPackFloat64(zSign, pExp - 1,
-                                       pSig1, status);
-        }
-        normalizeFloat64Subnormal(cSig, &cExp, &cSig);
-    }
-
-    /* Shift cSig and add the explicit bit so [cSig0:cSig1] is the
-     * significand of the addend, with the explicit bit in position 126.
-     */
-    cSig0 = cSig << (126 - 64 - 52);
-    cSig1 = 0;
-    cSig0 |= LIT64(0x4000000000000000);
-    expDiff = pExp - cExp;
-
-    if (pSign == cSign) {
-        /* Addition */
-        if (expDiff > 0) {
-            /* scale c to match p */
-            shift128RightJamming(cSig0, cSig1, expDiff, &cSig0, &cSig1);
-            zExp = pExp;
-        } else if (expDiff < 0) {
-            /* scale p to match c */
-            shift128RightJamming(pSig0, pSig1, -expDiff, &pSig0, &pSig1);
-            zExp = cExp;
-        } else {
-            /* no scaling needed */
-            zExp = cExp;
-        }
-        /* Add significands and make sure explicit bit ends up in posn 126 */
-        add128(pSig0, pSig1, cSig0, cSig1, &zSig0, &zSig1);
-        if ((int64_t)zSig0 < 0) {
-            shift128RightJamming(zSig0, zSig1, 1, &zSig0, &zSig1);
-        } else {
-            zExp--;
-        }
-        shift128RightJamming(zSig0, zSig1, 64, &zSig0, &zSig1);
-        if (flags & float_muladd_halve_result) {
-            zExp--;
-        }
-        return roundAndPackFloat64(zSign, zExp, zSig1, status);
-    } else {
-        /* Subtraction */
-        if (expDiff > 0) {
-            shift128RightJamming(cSig0, cSig1, expDiff, &cSig0, &cSig1);
-            sub128(pSig0, pSig1, cSig0, cSig1, &zSig0, &zSig1);
-            zExp = pExp;
-        } else if (expDiff < 0) {
-            shift128RightJamming(pSig0, pSig1, -expDiff, &pSig0, &pSig1);
-            sub128(cSig0, cSig1, pSig0, pSig1, &zSig0, &zSig1);
-            zExp = cExp;
-            zSign ^= 1;
-        } else {
-            zExp = pExp;
-            if (lt128(cSig0, cSig1, pSig0, pSig1)) {
-                sub128(pSig0, pSig1, cSig0, cSig1, &zSig0, &zSig1);
-            } else if (lt128(pSig0, pSig1, cSig0, cSig1)) {
-                sub128(cSig0, cSig1, pSig0, pSig1, &zSig0, &zSig1);
-                zSign ^= 1;
-            } else {
-                /* Exact zero */
-                zSign = signflip;
-                if (status->float_rounding_mode == float_round_down) {
-                    zSign ^= 1;
-                }
-                return packFloat64(zSign, 0, 0);
-            }
-        }
-        --zExp;
-        /* Do the equivalent of normalizeRoundAndPackFloat64() but
-         * starting with the significand in a pair of uint64_t.
-         */
-        if (zSig0) {
-            shiftcount = countLeadingZeros64(zSig0) - 1;
-            shortShift128Left(zSig0, zSig1, shiftcount, &zSig0, &zSig1);
-            if (zSig1) {
-                zSig0 |= 1;
-            }
-            zExp -= shiftcount;
-        } else {
-            shiftcount = countLeadingZeros64(zSig1);
-            if (shiftcount == 0) {
-                zSig0 = (zSig1 >> 1) | (zSig1 & 1);
-                zExp -= 63;
-            } else {
-                shiftcount--;
-                zSig0 = zSig1 << shiftcount;
-                zExp -= (shiftcount + 64);
-            }
-        }
-        if (flags & float_muladd_halve_result) {
-            zExp--;
-        }
-        return roundAndPackFloat64(zSign, zExp, zSig0, status);
-    }
-}
 
 /*----------------------------------------------------------------------------
 | Returns the square root of the double-precision floating-point value `a'.
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 335f199bb6..c92147abec 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -322,6 +322,7 @@ float64 float16_to_float64(float16 a, flag ieee, float_status *status);
 float16 float16_add(float16, float16, float_status *status);
 float16 float16_sub(float16, float16, float_status *status);
 float16 float16_mul(float16, float16, float_status *status);
+float16 float16_muladd(float16, float16, float16, int, float_status *status);
 float16 float16_div(float16, float16, float_status *status);
 
 int float16_is_quiet_nan(float16, float_status *status);
-- 
2.15.1


* [Qemu-devel] [PATCH v2 15/20] fpu/softfloat: re-factor round_to_int
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (13 preceding siblings ...)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 14/20] fpu/softfloat: re-factor muladd Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 16/20] fpu/softfloat: re-factor float to int/uint Alex Bennée
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

Add float16_round_to_int and use the common round_decomposed and
canonicalize functions to provide a single implementation of
round_to_int for float16/32/64.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c         | 304 ++++++++++++++++++++----------------------------
 include/fpu/softfloat.h |   1 +
 2 files changed, 130 insertions(+), 175 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 84386f354b..edc35300d1 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1183,6 +1183,135 @@ float64 float64_div(float64 a, float64 b, float_status *status)
     return float64_round_pack_canonical(pr, status);
 }
 
+/*
+ * Rounds the floating-point value `a' to an integer, and returns the
+ * result as a floating-point value. The operation is performed
+ * according to the IEC/IEEE Standard for Binary Floating-Point
+ * Arithmetic.
+ */
+
+static decomposed_parts round_decomposed(decomposed_parts a, int rounding_mode,
+                                         float_status *s)
+{
+    switch (a.cls) {
+    case float_class_snan:
+        a.cls = s->default_nan_mode ? float_class_dnan : float_class_msnan;
+        s->float_exception_flags |= float_flag_invalid;
+        break;
+    case float_class_zero:
+    case float_class_inf:
+    case float_class_qnan:
+        /* already "integral" */
+        break;
+    case float_class_normal:
+        if (a.exp >= DECOMPOSED_BINARY_POINT) {
+            /* already integral */
+            break;
+        }
+        if (a.exp < 0) {
+            bool one;
+            /* all fractional */
+            s->float_exception_flags |= float_flag_inexact;
+            switch (rounding_mode) {
+            case float_round_nearest_even:
+                one = a.exp == -1 && a.frac > DECOMPOSED_IMPLICIT_BIT;
+                break;
+            case float_round_ties_away:
+                one = a.exp == -1 && a.frac >= DECOMPOSED_IMPLICIT_BIT;
+                break;
+            case float_round_to_zero:
+                one = false;
+                break;
+            case float_round_up:
+                one = !a.sign;
+                break;
+            case float_round_down:
+                one = a.sign;
+                break;
+            default:
+                g_assert_not_reached();
+            }
+
+            if (one) {
+                a.frac = DECOMPOSED_IMPLICIT_BIT;
+                a.exp = 0;
+            } else {
+                a.cls = float_class_zero;
+            }
+        } else {
+            uint64_t frac_lsb, frac_lsbm1, round_mask, roundeven_mask, inc;
+
+            frac_lsb = DECOMPOSED_IMPLICIT_BIT >> a.exp;
+            frac_lsbm1 = frac_lsb >> 1;
+            roundeven_mask = (frac_lsb - 1) | frac_lsb;
+            round_mask = roundeven_mask >> 1;
+
+            switch (rounding_mode) {
+            case float_round_nearest_even:
+                inc = ((a.frac & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
+                break;
+            case float_round_ties_away:
+                inc = frac_lsbm1;
+                break;
+            case float_round_to_zero:
+                inc = 0;
+                break;
+            case float_round_up:
+                inc = a.sign ? 0 : round_mask;
+                break;
+            case float_round_down:
+                inc = a.sign ? round_mask : 0;
+                break;
+            default:
+                g_assert_not_reached();
+            }
+
+            if (a.frac & round_mask) {
+                s->float_exception_flags |= float_flag_inexact;
+                a.frac += inc;
+                a.frac &= ~round_mask;
+                if (a.frac & DECOMPOSED_OVERFLOW_BIT) {
+                    a.frac >>= 1;
+                    a.exp++;
+                }
+            }
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    return a;
+}
+
+float16 float16_round_to_int(float16 a, float_status *s)
+{
+    decomposed_parts pa = float16_unpack_canonical(a, s);
+    decomposed_parts pr = round_decomposed(pa, s->float_rounding_mode, s);
+    return float16_round_pack_canonical(pr, s);
+}
+
+float32 float32_round_to_int(float32 a, float_status *s)
+{
+    decomposed_parts pa = float32_unpack_canonical(a, s);
+    decomposed_parts pr = round_decomposed(pa, s->float_rounding_mode, s);
+    return float32_round_pack_canonical(pr, s);
+}
+
+float64 float64_round_to_int(float64 a, float_status *s)
+{
+    decomposed_parts pa = float64_unpack_canonical(a, s);
+    decomposed_parts pr = round_decomposed(pa, s->float_rounding_mode, s);
+    return float64_round_pack_canonical(pr, s);
+}
+
+float64 float64_trunc_to_int(float64 a, float_status *s)
+{
+    decomposed_parts pa = float64_unpack_canonical(a, s);
+    decomposed_parts pr = round_decomposed(pa, float_round_to_zero, s);
+    return float64_round_pack_canonical(pr, s);
+}
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
@@ -2913,88 +3042,6 @@ float128 float32_to_float128(float32 a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Rounds the single-precision floating-point value `a' to an integer, and
-| returns the result as a single-precision floating-point value.  The
-| operation is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 float32_round_to_int(float32 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    uint32_t lastBitMask, roundBitsMask;
-    uint32_t z;
-    a = float32_squash_input_denormal(a, status);
-
-    aExp = extractFloat32Exp( a );
-    if ( 0x96 <= aExp ) {
-        if ( ( aExp == 0xFF ) && extractFloat32Frac( a ) ) {
-            return propagateFloat32NaN(a, a, status);
-        }
-        return a;
-    }
-    if ( aExp <= 0x7E ) {
-        if ( (uint32_t) ( float32_val(a)<<1 ) == 0 ) return a;
-        status->float_exception_flags |= float_flag_inexact;
-        aSign = extractFloat32Sign( a );
-        switch (status->float_rounding_mode) {
-         case float_round_nearest_even:
-            if ( ( aExp == 0x7E ) && extractFloat32Frac( a ) ) {
-                return packFloat32( aSign, 0x7F, 0 );
-            }
-            break;
-        case float_round_ties_away:
-            if (aExp == 0x7E) {
-                return packFloat32(aSign, 0x7F, 0);
-            }
-            break;
-         case float_round_down:
-            return make_float32(aSign ? 0xBF800000 : 0);
-         case float_round_up:
-            return make_float32(aSign ? 0x80000000 : 0x3F800000);
-        }
-        return packFloat32( aSign, 0, 0 );
-    }
-    lastBitMask = 1;
-    lastBitMask <<= 0x96 - aExp;
-    roundBitsMask = lastBitMask - 1;
-    z = float32_val(a);
-    switch (status->float_rounding_mode) {
-    case float_round_nearest_even:
-        z += lastBitMask>>1;
-        if ((z & roundBitsMask) == 0) {
-            z &= ~lastBitMask;
-        }
-        break;
-    case float_round_ties_away:
-        z += lastBitMask >> 1;
-        break;
-    case float_round_to_zero:
-        break;
-    case float_round_up:
-        if (!extractFloat32Sign(make_float32(z))) {
-            z += roundBitsMask;
-        }
-        break;
-    case float_round_down:
-        if (extractFloat32Sign(make_float32(z))) {
-            z += roundBitsMask;
-        }
-        break;
-    default:
-        abort();
-    }
-    z &= ~ roundBitsMask;
-    if (z != float32_val(a)) {
-        status->float_exception_flags |= float_flag_inexact;
-    }
-    return make_float32(z);
-
-}
-
-
 
 
 /*----------------------------------------------------------------------------
@@ -4140,99 +4187,6 @@ float128 float64_to_float128(float64 a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Rounds the double-precision floating-point value `a' to an integer, and
-| returns the result as a double-precision floating-point value.  The
-| operation is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 float64_round_to_int(float64 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    uint64_t lastBitMask, roundBitsMask;
-    uint64_t z;
-    a = float64_squash_input_denormal(a, status);
-
-    aExp = extractFloat64Exp( a );
-    if ( 0x433 <= aExp ) {
-        if ( ( aExp == 0x7FF ) && extractFloat64Frac( a ) ) {
-            return propagateFloat64NaN(a, a, status);
-        }
-        return a;
-    }
-    if ( aExp < 0x3FF ) {
-        if ( (uint64_t) ( float64_val(a)<<1 ) == 0 ) return a;
-        status->float_exception_flags |= float_flag_inexact;
-        aSign = extractFloat64Sign( a );
-        switch (status->float_rounding_mode) {
-         case float_round_nearest_even:
-            if ( ( aExp == 0x3FE ) && extractFloat64Frac( a ) ) {
-                return packFloat64( aSign, 0x3FF, 0 );
-            }
-            break;
-        case float_round_ties_away:
-            if (aExp == 0x3FE) {
-                return packFloat64(aSign, 0x3ff, 0);
-            }
-            break;
-         case float_round_down:
-            return make_float64(aSign ? LIT64( 0xBFF0000000000000 ) : 0);
-         case float_round_up:
-            return make_float64(
-            aSign ? LIT64( 0x8000000000000000 ) : LIT64( 0x3FF0000000000000 ));
-        }
-        return packFloat64( aSign, 0, 0 );
-    }
-    lastBitMask = 1;
-    lastBitMask <<= 0x433 - aExp;
-    roundBitsMask = lastBitMask - 1;
-    z = float64_val(a);
-    switch (status->float_rounding_mode) {
-    case float_round_nearest_even:
-        z += lastBitMask >> 1;
-        if ((z & roundBitsMask) == 0) {
-            z &= ~lastBitMask;
-        }
-        break;
-    case float_round_ties_away:
-        z += lastBitMask >> 1;
-        break;
-    case float_round_to_zero:
-        break;
-    case float_round_up:
-        if (!extractFloat64Sign(make_float64(z))) {
-            z += roundBitsMask;
-        }
-        break;
-    case float_round_down:
-        if (extractFloat64Sign(make_float64(z))) {
-            z += roundBitsMask;
-        }
-        break;
-    default:
-        abort();
-    }
-    z &= ~ roundBitsMask;
-    if (z != float64_val(a)) {
-        status->float_exception_flags |= float_flag_inexact;
-    }
-    return make_float64(z);
-
-}
-
-float64 float64_trunc_to_int(float64 a, float_status *status)
-{
-    int oldmode;
-    float64 res;
-    oldmode = status->float_rounding_mode;
-    status->float_rounding_mode = float_round_to_zero;
-    res = float64_round_to_int(a, status);
-    status->float_rounding_mode = oldmode;
-    return res;
-}
-
 
 /*----------------------------------------------------------------------------
 | Returns the remainder of the double-precision floating-point value `a'
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index c92147abec..6427762a9a 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -319,6 +319,7 @@ float64 float16_to_float64(float16 a, flag ieee, float_status *status);
 | Software half-precision operations.
 *----------------------------------------------------------------------------*/
 
+float16 float16_round_to_int(float16, float_status *status);
 float16 float16_add(float16, float16, float_status *status);
 float16 float16_sub(float16, float16, float_status *status);
 float16 float16_mul(float16, float16, float_status *status);
-- 
2.15.1


* [Qemu-devel] [PATCH v2 16/20] fpu/softfloat: re-factor float to int/uint
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (14 preceding siblings ...)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 15/20] fpu/softfloat: re-factor round_to_int Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-09 17:12   ` Richard Henderson
                     ` (2 more replies)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 17/20] fpu/softfloat: re-factor int/uint to float Alex Bennée
                   ` (4 subsequent siblings)
  20 siblings, 3 replies; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

We share a common int64/uint64_pack_decomposed function across all
the helpers and simply saturate the result to the width of the
target type.
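
The saturate-after-converting idea can be sketched as follows. This is
a simplified model for review purposes, not the QEMU code itself;
`clamp_to_int16` and the bare `invalid` flag are illustrative stand-ins
for the real helpers, which take a float_status and live in
fpu/softfloat.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the pattern this patch uses: perform the conversion once
 * at 64-bit width, then saturate to the narrower target type, flagging
 * "invalid" when clamping was necessary. */
static int16_t clamp_to_int16(int64_t r, bool *invalid)
{
    if (r < INT16_MIN) {
        *invalid = true;
        return INT16_MIN;
    } else if (r > INT16_MAX) {
        *invalid = true;
        return INT16_MAX;
    }
    return r;
}
```

Each narrower helper (int16, int32 and the unsigned variants) is just
such a clamp over the shared 64-bit result, which is what keeps the
per-width code small.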

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

--
v2
  - float_flag_invalid fixes applied in the next patch
---
 fpu/softfloat.c         | 1011 +++++++++++------------------------------------
 include/fpu/softfloat.h |   13 +
 2 files changed, 235 insertions(+), 789 deletions(-)
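
(Reviewer aid, not part of the patch: the per-width wrappers in the
diff below are generated with `##` token pasting. As a reminder of how
that expands, here is a hypothetical miniature of the same pattern,
with the conversion stubbed as a clamp so the sketch is
self-contained.)

```c
#include <assert.h>
#include <stdint.h>

/* Miniature of the FLOAT_TO_INT code-generation pattern: token
 * pasting produces one wrapper per integer width from a single
 * template.  clamp64() stands in for the real conversion. */
static int64_t clamp64(int64_t v, int64_t lo, int64_t hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

#define DEF_TO_INT(isz)                                        \
int ## isz ## _t to_int ## isz(int64_t v)                      \
{                                                              \
    return clamp64(v, INT ## isz ## _MIN, INT ## isz ## _MAX); \
}

DEF_TO_INT(16)
DEF_TO_INT(32)
```

`DEF_TO_INT(16)` expands to `int16_t to_int16(int64_t v) { ... }`, in
the same way that `FLOAT_TO_INT(32, 16)` in the patch expands to
`float32_to_int16()` and `float32_to_int16_round_to_zero()`.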

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index edc35300d1..514f43c065 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1312,6 +1312,194 @@ float64 float64_trunc_to_int(float64 a, float_status *s)
     return float64_round_pack_canonical(pr, s);
 }
 
+/*----------------------------------------------------------------------------
+| Returns the result of converting the floating-point value
+| `a' to the two's complement integer format.  The conversion is
+| performed according to the IEC/IEEE Standard for Binary Floating-Point
+| Arithmetic---which means in particular that the conversion is rounded
+| according to the current rounding mode.  If `a' is a NaN, the largest
+| positive integer is returned.  Otherwise, if the conversion overflows, the
+| largest integer with the same sign as `a' is returned.
+*----------------------------------------------------------------------------*/
+
+static int64_t int64_pack_decomposed(decomposed_parts p, float_status *s)
+{
+    uint64_t r;
+
+    switch (p.cls) {
+    case float_class_snan:
+    case float_class_qnan:
+        return INT64_MAX;
+    case float_class_inf:
+        return p.sign ? INT64_MIN : INT64_MAX;
+    case float_class_zero:
+        return 0;
+    case float_class_normal:
+        if (p.exp < DECOMPOSED_BINARY_POINT) {
+            r = p.frac >> (DECOMPOSED_BINARY_POINT - p.exp);
+        } else if (p.exp < 64) {
+            r = p.frac << (p.exp - DECOMPOSED_BINARY_POINT);
+        } else {
+            s->float_exception_flags |= float_flag_invalid;
+            r = UINT64_MAX;
+        }
+        if (p.sign) {
+            return r < - (uint64_t) INT64_MIN ? -r : INT64_MIN;
+        } else {
+            return r < INT64_MAX ? r : INT64_MAX;
+        }
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static int16_t int16_pack_decomposed(decomposed_parts p, float_status *s)
+{
+    int64_t r = int64_pack_decomposed(p, s);
+    if (r < INT16_MIN) {
+        s->float_exception_flags |= float_flag_invalid;
+        return INT16_MIN;
+    } else if (r > INT16_MAX) {
+        s->float_exception_flags |= float_flag_invalid;
+        return INT16_MAX;
+    }
+    return r;
+}
+
+static int32_t int32_pack_decomposed(decomposed_parts p, float_status *s)
+{
+    int64_t r = int64_pack_decomposed(p, s);
+    if (r < INT32_MIN) {
+        s->float_exception_flags |= float_flag_invalid;
+        return INT32_MIN;
+    } else if (r > INT32_MAX) {
+        s->float_exception_flags |= float_flag_invalid;
+        return INT32_MAX;
+    }
+    return r;
+}
+
+#define FLOAT_TO_INT(fsz, isz) \
+int ## isz ## _t float ## fsz ## _to_int ## isz(float ## fsz a, float_status *s) \
+{                                                                       \
+    decomposed_parts pa = float ## fsz ## _unpack_canonical(a, s);      \
+    decomposed_parts pr = round_decomposed(pa, s->float_rounding_mode, s); \
+    return int ## isz ## _pack_decomposed(pr, s);                       \
+}                                                                       \
+                                                                        \
+int ## isz ## _t float ## fsz ## _to_int ## isz ## _round_to_zero       \
+ (float ## fsz a, float_status *s)                                      \
+{                                                                       \
+    decomposed_parts pa = float ## fsz ## _unpack_canonical(a, s);      \
+    decomposed_parts pr = round_decomposed(pa, float_round_to_zero, s); \
+    return int ## isz ## _pack_decomposed(pr, s);                       \
+}
+
+FLOAT_TO_INT(16, 16)
+FLOAT_TO_INT(16, 32)
+FLOAT_TO_INT(16, 64)
+
+FLOAT_TO_INT(32, 16)
+FLOAT_TO_INT(32, 32)
+FLOAT_TO_INT(32, 64)
+
+FLOAT_TO_INT(64, 16)
+FLOAT_TO_INT(64, 32)
+FLOAT_TO_INT(64, 64)
+
+#undef FLOAT_TO_INT
+
+/*
+ *  Returns the result of converting the floating-point value `a' to
+ *  the unsigned integer format. The conversion is performed according
+ *  to the IEC/IEEE Standard for Binary Floating-Point
+ *  Arithmetic---which means in particular that the conversion is
+ *  rounded according to the current rounding mode. If `a' is a NaN,
+ *  the largest unsigned integer is returned. Otherwise, if the
+ *  conversion overflows, the largest unsigned integer is returned. If
+ *  `a' is negative, the result is rounded and zero is returned;
+ *  values that do not round to zero will raise the inexact exception
+ *  flag.
+ */
+
+static uint64_t uint64_pack_decomposed(decomposed_parts p, float_status *s)
+{
+    switch (p.cls) {
+    case float_class_snan:
+    case float_class_qnan:
+        return UINT64_MAX;
+    case float_class_inf:
+        return p.sign ? 0 : UINT64_MAX;
+    case float_class_zero:
+        return 0;
+    case float_class_normal:
+        if (p.sign) {
+            s->float_exception_flags |= float_flag_invalid;
+            return 0;
+        }
+        if (p.exp < DECOMPOSED_BINARY_POINT) {
+            return p.frac >> (DECOMPOSED_BINARY_POINT - p.exp);
+        } else if (p.exp < 64) {
+            return p.frac << (p.exp - DECOMPOSED_BINARY_POINT);
+        } else {
+            s->float_exception_flags |= float_flag_invalid;
+            return UINT64_MAX;
+        }
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static uint16_t uint16_pack_decomposed(decomposed_parts p, float_status *s)
+{
+    uint64_t r = uint64_pack_decomposed(p, s);
+    if (r > UINT16_MAX) {
+        s->float_exception_flags |= float_flag_invalid;
+        r = UINT16_MAX;
+    }
+    return r;
+}
+
+static uint32_t uint32_pack_decomposed(decomposed_parts p, float_status *s)
+{
+    uint64_t r = uint64_pack_decomposed(p, s);
+    if (r > UINT32_MAX) {
+        s->float_exception_flags |= float_flag_invalid;
+        r = UINT32_MAX;
+    }
+    return r;
+}
+
+#define FLOAT_TO_UINT(fsz, isz) \
+uint ## isz ## _t float ## fsz ## _to_uint ## isz(float ## fsz a, float_status *s) \
+{                                                                       \
+    decomposed_parts pa = float ## fsz ## _unpack_canonical(a, s);      \
+    decomposed_parts pr = round_decomposed(pa, s->float_rounding_mode, s); \
+    return uint ## isz ## _pack_decomposed(pr, s);                      \
+}                                                                       \
+                                                                        \
+uint ## isz ## _t float ## fsz ## _to_uint ## isz ## _round_to_zero     \
+ (float ## fsz a, float_status *s)                                      \
+{                                                                       \
+    decomposed_parts pa = float ## fsz ## _unpack_canonical(a, s);      \
+    decomposed_parts pr = round_decomposed(pa, float_round_to_zero, s); \
+    return uint ## isz ## _pack_decomposed(pr, s);                      \
+}
+
+FLOAT_TO_UINT(16, 16)
+FLOAT_TO_UINT(16, 32)
+FLOAT_TO_UINT(16, 64)
+
+FLOAT_TO_UINT(32, 16)
+FLOAT_TO_UINT(32, 32)
+FLOAT_TO_UINT(32, 64)
+
+FLOAT_TO_UINT(64, 16)
+FLOAT_TO_UINT(64, 32)
+FLOAT_TO_UINT(64, 64)
+
+#undef FLOAT_TO_UINT
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
@@ -2663,288 +2851,8 @@ float128 uint64_to_float128(uint64_t a, float_status *status)
     return normalizeRoundAndPackFloat128(0, 0x406E, a, 0, status);
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the single-precision floating-point value
-| `a' to the 32-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode.  If `a' is a NaN, the largest
-| positive integer is returned.  Otherwise, if the conversion overflows, the
-| largest integer with the same sign as `a' is returned.
-*----------------------------------------------------------------------------*/
 
-int32_t float32_to_int32(float32 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint32_t aSig;
-    uint64_t aSig64;
-
-    a = float32_squash_input_denormal(a, status);
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-    if ( ( aExp == 0xFF ) && aSig ) aSign = 0;
-    if ( aExp ) aSig |= 0x00800000;
-    shiftCount = 0xAF - aExp;
-    aSig64 = aSig;
-    aSig64 <<= 32;
-    if ( 0 < shiftCount ) shift64RightJamming( aSig64, shiftCount, &aSig64 );
-    return roundAndPackInt32(aSign, aSig64, status);
 
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the single-precision floating-point value
-| `a' to the 32-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic, except that the conversion is always rounded toward zero.
-| If `a' is a NaN, the largest positive integer is returned.  Otherwise, if
-| the conversion overflows, the largest integer with the same sign as `a' is
-| returned.
-*----------------------------------------------------------------------------*/
-
-int32_t float32_to_int32_round_to_zero(float32 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint32_t aSig;
-    int32_t z;
-    a = float32_squash_input_denormal(a, status);
-
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-    shiftCount = aExp - 0x9E;
-    if ( 0 <= shiftCount ) {
-        if ( float32_val(a) != 0xCF000000 ) {
-            float_raise(float_flag_invalid, status);
-            if ( ! aSign || ( ( aExp == 0xFF ) && aSig ) ) return 0x7FFFFFFF;
-        }
-        return (int32_t) 0x80000000;
-    }
-    else if ( aExp <= 0x7E ) {
-        if (aExp | aSig) {
-            status->float_exception_flags |= float_flag_inexact;
-        }
-        return 0;
-    }
-    aSig = ( aSig | 0x00800000 )<<8;
-    z = aSig>>( - shiftCount );
-    if ( (uint32_t) ( aSig<<( shiftCount & 31 ) ) ) {
-        status->float_exception_flags |= float_flag_inexact;
-    }
-    if ( aSign ) z = - z;
-    return z;
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the single-precision floating-point value
-| `a' to the 16-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic, except that the conversion is always rounded toward zero.
-| If `a' is a NaN, the largest positive integer is returned.  Otherwise, if
-| the conversion overflows, the largest integer with the same sign as `a' is
-| returned.
-*----------------------------------------------------------------------------*/
-
-int16_t float32_to_int16_round_to_zero(float32 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint32_t aSig;
-    int32_t z;
-
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-    shiftCount = aExp - 0x8E;
-    if ( 0 <= shiftCount ) {
-        if ( float32_val(a) != 0xC7000000 ) {
-            float_raise(float_flag_invalid, status);
-            if ( ! aSign || ( ( aExp == 0xFF ) && aSig ) ) {
-                return 0x7FFF;
-            }
-        }
-        return (int32_t) 0xffff8000;
-    }
-    else if ( aExp <= 0x7E ) {
-        if ( aExp | aSig ) {
-            status->float_exception_flags |= float_flag_inexact;
-        }
-        return 0;
-    }
-    shiftCount -= 0x10;
-    aSig = ( aSig | 0x00800000 )<<8;
-    z = aSig>>( - shiftCount );
-    if ( (uint32_t) ( aSig<<( shiftCount & 31 ) ) ) {
-        status->float_exception_flags |= float_flag_inexact;
-    }
-    if ( aSign ) {
-        z = - z;
-    }
-    return z;
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the single-precision floating-point value
-| `a' to the 64-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode.  If `a' is a NaN, the largest
-| positive integer is returned.  Otherwise, if the conversion overflows, the
-| largest integer with the same sign as `a' is returned.
-*----------------------------------------------------------------------------*/
-
-int64_t float32_to_int64(float32 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint32_t aSig;
-    uint64_t aSig64, aSigExtra;
-    a = float32_squash_input_denormal(a, status);
-
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-    shiftCount = 0xBE - aExp;
-    if ( shiftCount < 0 ) {
-        float_raise(float_flag_invalid, status);
-        if ( ! aSign || ( ( aExp == 0xFF ) && aSig ) ) {
-            return LIT64( 0x7FFFFFFFFFFFFFFF );
-        }
-        return (int64_t) LIT64( 0x8000000000000000 );
-    }
-    if ( aExp ) aSig |= 0x00800000;
-    aSig64 = aSig;
-    aSig64 <<= 40;
-    shift64ExtraRightJamming( aSig64, 0, shiftCount, &aSig64, &aSigExtra );
-    return roundAndPackInt64(aSign, aSig64, aSigExtra, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the single-precision floating-point value
-| `a' to the 64-bit unsigned integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode.  If `a' is a NaN, the largest
-| unsigned integer is returned.  Otherwise, if the conversion overflows, the
-| largest unsigned integer is returned.  If the 'a' is negative, the result
-| is rounded and zero is returned; values that do not round to zero will
-| raise the inexact exception flag.
-*----------------------------------------------------------------------------*/
-
-uint64_t float32_to_uint64(float32 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint32_t aSig;
-    uint64_t aSig64, aSigExtra;
-    a = float32_squash_input_denormal(a, status);
-
-    aSig = extractFloat32Frac(a);
-    aExp = extractFloat32Exp(a);
-    aSign = extractFloat32Sign(a);
-    if ((aSign) && (aExp > 126)) {
-        float_raise(float_flag_invalid, status);
-        if (float32_is_any_nan(a)) {
-            return LIT64(0xFFFFFFFFFFFFFFFF);
-        } else {
-            return 0;
-        }
-    }
-    shiftCount = 0xBE - aExp;
-    if (aExp) {
-        aSig |= 0x00800000;
-    }
-    if (shiftCount < 0) {
-        float_raise(float_flag_invalid, status);
-        return LIT64(0xFFFFFFFFFFFFFFFF);
-    }
-
-    aSig64 = aSig;
-    aSig64 <<= 40;
-    shift64ExtraRightJamming(aSig64, 0, shiftCount, &aSig64, &aSigExtra);
-    return roundAndPackUint64(aSign, aSig64, aSigExtra, status);
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the single-precision floating-point value
-| `a' to the 64-bit unsigned integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic, except that the conversion is always rounded toward zero.  If
-| `a' is a NaN, the largest unsigned integer is returned.  Otherwise, if the
-| conversion overflows, the largest unsigned integer is returned.  If the
-| 'a' is negative, the result is rounded and zero is returned; values that do
-| not round to zero will raise the inexact flag.
-*----------------------------------------------------------------------------*/
-
-uint64_t float32_to_uint64_round_to_zero(float32 a, float_status *status)
-{
-    signed char current_rounding_mode = status->float_rounding_mode;
-    set_float_rounding_mode(float_round_to_zero, status);
-    int64_t v = float32_to_uint64(a, status);
-    set_float_rounding_mode(current_rounding_mode, status);
-    return v;
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the single-precision floating-point value
-| `a' to the 64-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic, except that the conversion is always rounded toward zero.  If
-| `a' is a NaN, the largest positive integer is returned.  Otherwise, if the
-| conversion overflows, the largest integer with the same sign as `a' is
-| returned.
-*----------------------------------------------------------------------------*/
-
-int64_t float32_to_int64_round_to_zero(float32 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint32_t aSig;
-    uint64_t aSig64;
-    int64_t z;
-    a = float32_squash_input_denormal(a, status);
-
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-    shiftCount = aExp - 0xBE;
-    if ( 0 <= shiftCount ) {
-        if ( float32_val(a) != 0xDF000000 ) {
-            float_raise(float_flag_invalid, status);
-            if ( ! aSign || ( ( aExp == 0xFF ) && aSig ) ) {
-                return LIT64( 0x7FFFFFFFFFFFFFFF );
-            }
-        }
-        return (int64_t) LIT64( 0x8000000000000000 );
-    }
-    else if ( aExp <= 0x7E ) {
-        if (aExp | aSig) {
-            status->float_exception_flags |= float_flag_inexact;
-        }
-        return 0;
-    }
-    aSig64 = aSig | 0x00800000;
-    aSig64 <<= 40;
-    z = aSig64>>( - shiftCount );
-    if ( (uint64_t) ( aSig64<<( shiftCount & 63 ) ) ) {
-        status->float_exception_flags |= float_flag_inexact;
-    }
-    if ( aSign ) z = - z;
-    return z;
-
-}
 
 /*----------------------------------------------------------------------------
 | Returns the result of converting the single-precision floating-point value
@@ -3500,289 +3408,59 @@ int float32_le_quiet(float32 a, float32 b, float_status *status)
 | Returns 1 if the single-precision floating-point value `a' is less than
 | the corresponding value `b', and 0 otherwise.  Quiet NaNs do not cause an
 | exception.  Otherwise, the comparison is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-int float32_lt_quiet(float32 a, float32 b, float_status *status)
-{
-    flag aSign, bSign;
-    uint32_t av, bv;
-    a = float32_squash_input_denormal(a, status);
-    b = float32_squash_input_denormal(b, status);
-
-    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )
-         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )
-       ) {
-        if (float32_is_signaling_nan(a, status)
-         || float32_is_signaling_nan(b, status)) {
-            float_raise(float_flag_invalid, status);
-        }
-        return 0;
-    }
-    aSign = extractFloat32Sign( a );
-    bSign = extractFloat32Sign( b );
-    av = float32_val(a);
-    bv = float32_val(b);
-    if ( aSign != bSign ) return aSign && ( (uint32_t) ( ( av | bv )<<1 ) != 0 );
-    return ( av != bv ) && ( aSign ^ ( av < bv ) );
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns 1 if the single-precision floating-point values `a' and `b' cannot
-| be compared, and 0 otherwise.  Quiet NaNs do not cause an exception.  The
-| comparison is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-int float32_unordered_quiet(float32 a, float32 b, float_status *status)
-{
-    a = float32_squash_input_denormal(a, status);
-    b = float32_squash_input_denormal(b, status);
-
-    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )
-         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )
-       ) {
-        if (float32_is_signaling_nan(a, status)
-         || float32_is_signaling_nan(b, status)) {
-            float_raise(float_flag_invalid, status);
-        }
-        return 1;
-    }
-    return 0;
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the double-precision floating-point value
-| `a' to the 32-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode.  If `a' is a NaN, the largest
-| positive integer is returned.  Otherwise, if the conversion overflows, the
-| largest integer with the same sign as `a' is returned.
-*----------------------------------------------------------------------------*/
-
-int32_t float64_to_int32(float64 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint64_t aSig;
-    a = float64_squash_input_denormal(a, status);
-
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-    if ( ( aExp == 0x7FF ) && aSig ) aSign = 0;
-    if ( aExp ) aSig |= LIT64( 0x0010000000000000 );
-    shiftCount = 0x42C - aExp;
-    if ( 0 < shiftCount ) shift64RightJamming( aSig, shiftCount, &aSig );
-    return roundAndPackInt32(aSign, aSig, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the double-precision floating-point value
-| `a' to the 32-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic, except that the conversion is always rounded toward zero.
-| If `a' is a NaN, the largest positive integer is returned.  Otherwise, if
-| the conversion overflows, the largest integer with the same sign as `a' is
-| returned.
-*----------------------------------------------------------------------------*/
-
-int32_t float64_to_int32_round_to_zero(float64 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint64_t aSig, savedASig;
-    int32_t z;
-    a = float64_squash_input_denormal(a, status);
-
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-    if ( 0x41E < aExp ) {
-        if ( ( aExp == 0x7FF ) && aSig ) aSign = 0;
-        goto invalid;
-    }
-    else if ( aExp < 0x3FF ) {
-        if (aExp || aSig) {
-            status->float_exception_flags |= float_flag_inexact;
-        }
-        return 0;
-    }
-    aSig |= LIT64( 0x0010000000000000 );
-    shiftCount = 0x433 - aExp;
-    savedASig = aSig;
-    aSig >>= shiftCount;
-    z = aSig;
-    if ( aSign ) z = - z;
-    if ( ( z < 0 ) ^ aSign ) {
- invalid:
-        float_raise(float_flag_invalid, status);
-        return aSign ? (int32_t) 0x80000000 : 0x7FFFFFFF;
-    }
-    if ( ( aSig<<shiftCount ) != savedASig ) {
-        status->float_exception_flags |= float_flag_inexact;
-    }
-    return z;
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the double-precision floating-point value
-| `a' to the 16-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic, except that the conversion is always rounded toward zero.
-| If `a' is a NaN, the largest positive integer is returned.  Otherwise, if
-| the conversion overflows, the largest integer with the same sign as `a' is
-| returned.
-*----------------------------------------------------------------------------*/
-
-int16_t float64_to_int16_round_to_zero(float64 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint64_t aSig, savedASig;
-    int32_t z;
-
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-    if ( 0x40E < aExp ) {
-        if ( ( aExp == 0x7FF ) && aSig ) {
-            aSign = 0;
-        }
-        goto invalid;
-    }
-    else if ( aExp < 0x3FF ) {
-        if ( aExp || aSig ) {
-            status->float_exception_flags |= float_flag_inexact;
-        }
-        return 0;
-    }
-    aSig |= LIT64( 0x0010000000000000 );
-    shiftCount = 0x433 - aExp;
-    savedASig = aSig;
-    aSig >>= shiftCount;
-    z = aSig;
-    if ( aSign ) {
-        z = - z;
-    }
-    if ( ( (int16_t)z < 0 ) ^ aSign ) {
- invalid:
-        float_raise(float_flag_invalid, status);
-        return aSign ? (int32_t) 0xffff8000 : 0x7FFF;
-    }
-    if ( ( aSig<<shiftCount ) != savedASig ) {
-        status->float_exception_flags |= float_flag_inexact;
-    }
-    return z;
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the double-precision floating-point value
-| `a' to the 64-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode.  If `a' is a NaN, the largest
-| positive integer is returned.  Otherwise, if the conversion overflows, the
-| largest integer with the same sign as `a' is returned.
+| Standard for Binary Floating-Point Arithmetic.
 *----------------------------------------------------------------------------*/
 
-int64_t float64_to_int64(float64 a, float_status *status)
+int float32_lt_quiet(float32 a, float32 b, float_status *status)
 {
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint64_t aSig, aSigExtra;
-    a = float64_squash_input_denormal(a, status);
+    flag aSign, bSign;
+    uint32_t av, bv;
+    a = float32_squash_input_denormal(a, status);
+    b = float32_squash_input_denormal(b, status);
 
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-    if ( aExp ) aSig |= LIT64( 0x0010000000000000 );
-    shiftCount = 0x433 - aExp;
-    if ( shiftCount <= 0 ) {
-        if ( 0x43E < aExp ) {
+    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )
+         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )
+       ) {
+        if (float32_is_signaling_nan(a, status)
+         || float32_is_signaling_nan(b, status)) {
             float_raise(float_flag_invalid, status);
-            if (    ! aSign
-                 || (    ( aExp == 0x7FF )
-                      && ( aSig != LIT64( 0x0010000000000000 ) ) )
-               ) {
-                return LIT64( 0x7FFFFFFFFFFFFFFF );
-            }
-            return (int64_t) LIT64( 0x8000000000000000 );
         }
-        aSigExtra = 0;
-        aSig <<= - shiftCount;
-    }
-    else {
-        shift64ExtraRightJamming( aSig, 0, shiftCount, &aSig, &aSigExtra );
+        return 0;
     }
-    return roundAndPackInt64(aSign, aSig, aSigExtra, status);
+    aSign = extractFloat32Sign( a );
+    bSign = extractFloat32Sign( b );
+    av = float32_val(a);
+    bv = float32_val(b);
+    if ( aSign != bSign ) return aSign && ( (uint32_t) ( ( av | bv )<<1 ) != 0 );
+    return ( av != bv ) && ( aSign ^ ( av < bv ) );
 
 }
 
 /*----------------------------------------------------------------------------
-| Returns the result of converting the double-precision floating-point value
-| `a' to the 64-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic, except that the conversion is always rounded toward zero.
-| If `a' is a NaN, the largest positive integer is returned.  Otherwise, if
-| the conversion overflows, the largest integer with the same sign as `a' is
-| returned.
+| Returns 1 if the single-precision floating-point values `a' and `b' cannot
+| be compared, and 0 otherwise.  Quiet NaNs do not cause an exception.  The
+| comparison is performed according to the IEC/IEEE Standard for Binary
+| Floating-Point Arithmetic.
 *----------------------------------------------------------------------------*/
 
-int64_t float64_to_int64_round_to_zero(float64 a, float_status *status)
+int float32_unordered_quiet(float32 a, float32 b, float_status *status)
 {
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint64_t aSig;
-    int64_t z;
-    a = float64_squash_input_denormal(a, status);
+    a = float32_squash_input_denormal(a, status);
+    b = float32_squash_input_denormal(b, status);
 
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-    if ( aExp ) aSig |= LIT64( 0x0010000000000000 );
-    shiftCount = aExp - 0x433;
-    if ( 0 <= shiftCount ) {
-        if ( 0x43E <= aExp ) {
-            if ( float64_val(a) != LIT64( 0xC3E0000000000000 ) ) {
-                float_raise(float_flag_invalid, status);
-                if (    ! aSign
-                     || (    ( aExp == 0x7FF )
-                          && ( aSig != LIT64( 0x0010000000000000 ) ) )
-                   ) {
-                    return LIT64( 0x7FFFFFFFFFFFFFFF );
-                }
-            }
-            return (int64_t) LIT64( 0x8000000000000000 );
-        }
-        z = aSig<<shiftCount;
-    }
-    else {
-        if ( aExp < 0x3FE ) {
-            if (aExp | aSig) {
-                status->float_exception_flags |= float_flag_inexact;
-            }
-            return 0;
-        }
-        z = aSig>>( - shiftCount );
-        if ( (uint64_t) ( aSig<<( shiftCount & 63 ) ) ) {
-            status->float_exception_flags |= float_flag_inexact;
+    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )
+         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )
+       ) {
+        if (float32_is_signaling_nan(a, status)
+         || float32_is_signaling_nan(b, status)) {
+            float_raise(float_flag_invalid, status);
         }
+        return 1;
     }
-    if ( aSign ) z = - z;
-    return z;
-
+    return 0;
 }
 
+
 /*----------------------------------------------------------------------------
 | Returns the result of converting the double-precision floating-point value
 | `a' to the single-precision floating-point format.  The conversion is
@@ -7049,252 +6727,7 @@ float64 uint32_to_float64(uint32_t a, float_status *status)
     return int64_to_float64(a, status);
 }
 
-uint32_t float32_to_uint32(float32 a, float_status *status)
-{
-    int64_t v;
-    uint32_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float32_to_int64(a, status);
-    if (v < 0) {
-        res = 0;
-    } else if (v > 0xffffffff) {
-        res = 0xffffffff;
-    } else {
-        return v;
-    }
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-uint32_t float32_to_uint32_round_to_zero(float32 a, float_status *status)
-{
-    int64_t v;
-    uint32_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float32_to_int64_round_to_zero(a, status);
-    if (v < 0) {
-        res = 0;
-    } else if (v > 0xffffffff) {
-        res = 0xffffffff;
-    } else {
-        return v;
-    }
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-int16_t float32_to_int16(float32 a, float_status *status)
-{
-    int32_t v;
-    int16_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float32_to_int32(a, status);
-    if (v < -0x8000) {
-        res = -0x8000;
-    } else if (v > 0x7fff) {
-        res = 0x7fff;
-    } else {
-        return v;
-    }
-
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-uint16_t float32_to_uint16(float32 a, float_status *status)
-{
-    int32_t v;
-    uint16_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float32_to_int32(a, status);
-    if (v < 0) {
-        res = 0;
-    } else if (v > 0xffff) {
-        res = 0xffff;
-    } else {
-        return v;
-    }
-
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-uint16_t float32_to_uint16_round_to_zero(float32 a, float_status *status)
-{
-    int64_t v;
-    uint16_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float32_to_int64_round_to_zero(a, status);
-    if (v < 0) {
-        res = 0;
-    } else if (v > 0xffff) {
-        res = 0xffff;
-    } else {
-        return v;
-    }
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-uint32_t float64_to_uint32(float64 a, float_status *status)
-{
-    uint64_t v;
-    uint32_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float64_to_uint64(a, status);
-    if (v > 0xffffffff) {
-        res = 0xffffffff;
-    } else {
-        return v;
-    }
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-uint32_t float64_to_uint32_round_to_zero(float64 a, float_status *status)
-{
-    uint64_t v;
-    uint32_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float64_to_uint64_round_to_zero(a, status);
-    if (v > 0xffffffff) {
-        res = 0xffffffff;
-    } else {
-        return v;
-    }
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-int16_t float64_to_int16(float64 a, float_status *status)
-{
-    int64_t v;
-    int16_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float64_to_int32(a, status);
-    if (v < -0x8000) {
-        res = -0x8000;
-    } else if (v > 0x7fff) {
-        res = 0x7fff;
-    } else {
-        return v;
-    }
-
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-uint16_t float64_to_uint16(float64 a, float_status *status)
-{
-    int64_t v;
-    uint16_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float64_to_int32(a, status);
-    if (v < 0) {
-        res = 0;
-    } else if (v > 0xffff) {
-        res = 0xffff;
-    } else {
-        return v;
-    }
-
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-uint16_t float64_to_uint16_round_to_zero(float64 a, float_status *status)
-{
-    int64_t v;
-    uint16_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float64_to_int64_round_to_zero(a, status);
-    if (v < 0) {
-        res = 0;
-    } else if (v > 0xffff) {
-        res = 0xffff;
-    } else {
-        return v;
-    }
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the double-precision floating-point value
-| `a' to the 64-bit unsigned integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode.  If `a' is a NaN, the largest
-| positive integer is returned.  If the conversion overflows, the
-| largest unsigned integer is returned.  If 'a' is negative, the value is
-| rounded and zero is returned; negative values that do not round to zero
-| will raise the inexact exception.
-*----------------------------------------------------------------------------*/
-
-uint64_t float64_to_uint64(float64 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint64_t aSig, aSigExtra;
-    a = float64_squash_input_denormal(a, status);
-
-    aSig = extractFloat64Frac(a);
-    aExp = extractFloat64Exp(a);
-    aSign = extractFloat64Sign(a);
-    if (aSign && (aExp > 1022)) {
-        float_raise(float_flag_invalid, status);
-        if (float64_is_any_nan(a)) {
-            return LIT64(0xFFFFFFFFFFFFFFFF);
-        } else {
-            return 0;
-        }
-    }
-    if (aExp) {
-        aSig |= LIT64(0x0010000000000000);
-    }
-    shiftCount = 0x433 - aExp;
-    if (shiftCount <= 0) {
-        if (0x43E < aExp) {
-            float_raise(float_flag_invalid, status);
-            return LIT64(0xFFFFFFFFFFFFFFFF);
-        }
-        aSigExtra = 0;
-        aSig <<= -shiftCount;
-    } else {
-        shift64ExtraRightJamming(aSig, 0, shiftCount, &aSig, &aSigExtra);
-    }
-    return roundAndPackUint64(aSign, aSig, aSigExtra, status);
-}
 
-uint64_t float64_to_uint64_round_to_zero(float64 a, float_status *status)
-{
-    signed char current_rounding_mode = status->float_rounding_mode;
-    set_float_rounding_mode(float_round_to_zero, status);
-    uint64_t v = float64_to_uint64(a, status);
-    set_float_rounding_mode(current_rounding_mode, status);
-    return v;
-}
 
 #define COMPARE(s, nan_exp)                                                  \
 static inline int float ## s ## _compare_internal(float ## s a, float ## s b,\
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 6427762a9a..d7bc7cbcb6 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -314,6 +314,19 @@ float16 float32_to_float16(float32, flag, float_status *status);
 float32 float16_to_float32(float16, flag, float_status *status);
 float16 float64_to_float16(float64 a, flag ieee, float_status *status);
 float64 float16_to_float64(float16 a, flag ieee, float_status *status);
+int16_t float16_to_int16(float16, float_status *status);
+uint16_t float16_to_uint16(float16 a, float_status *status);
+int16_t float16_to_int16_round_to_zero(float16, float_status *status);
+uint16_t float16_to_uint16_round_to_zero(float16 a, float_status *status);
+int32_t float16_to_int32(float16, float_status *status);
+uint32_t float16_to_uint32(float16 a, float_status *status);
+int32_t float16_to_int32_round_to_zero(float16, float_status *status);
+uint32_t float16_to_uint32_round_to_zero(float16 a, float_status *status);
+int64_t float16_to_int64(float16, float_status *status);
+uint64_t float16_to_uint64(float16 a, float_status *status);
+int64_t float16_to_int64_round_to_zero(float16, float_status *status);
+uint64_t float16_to_uint64_round_to_zero(float16 a, float_status *status);
+float16 int16_to_float16(int16_t a, float_status *status);
 
 /*----------------------------------------------------------------------------
 | Software half-precision operations.
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [Qemu-devel] [PATCH v2 17/20] fpu/softfloat: re-factor int/uint to float
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (15 preceding siblings ...)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 16/20] fpu/softfloat: re-factor float to int/uint Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 18/20] fpu/softfloat: re-factor scalbn Alex Bennée
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

These are considerably simpler, as the lower-order integer conversions
can simply call the higher-order conversion function. And since the
decomposed fractional part is a full 64 bits, rounding and inexact
handling come from the pack functions.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
v2
  - explicit setting of r.sign
---
 fpu/softfloat.c         | 322 ++++++++++++++++++++++++------------------------
 include/fpu/softfloat.h |  30 ++---
 2 files changed, 172 insertions(+), 180 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 514f43c065..bb68d77f72 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1500,6 +1500,169 @@ FLOAT_TO_UINT(64, 64)
 
 #undef FLOAT_TO_UINT
 
+/*
+ * Integer to float conversions
+ *
+ * Returns the result of converting the two's complement integer `a'
+ * to the floating-point format. The conversion is performed according
+ * to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+ */
+
+static decomposed_parts int_to_float(int64_t a, float_status *status)
+{
+    decomposed_parts r;
+    if (a == 0) {
+        r.cls = float_class_zero;
+        r.sign = false;
+    } else if (a == (1ULL << 63)) {
+        r.cls = float_class_normal;
+        r.sign = true;
+        r.frac = DECOMPOSED_IMPLICIT_BIT;
+        r.exp = 63;
+    } else {
+        uint64_t f;
+        if (a < 0) {
+            f = -a;
+            r.sign = true;
+        } else {
+            f = a;
+            r.sign = false;
+        }
+        int shift = clz64(f) - 1;
+        r.cls = float_class_normal;
+        r.exp = (DECOMPOSED_BINARY_POINT - shift);
+        r.frac = f << shift;
+    }
+
+    return r;
+}
+
+float16 int64_to_float16(int64_t a, float_status *status)
+{
+    decomposed_parts pa = int_to_float(a, status);
+    return float16_round_pack_canonical(pa, status);
+}
+
+float16 int32_to_float16(int32_t a, float_status *status)
+{
+    return int64_to_float16(a, status);
+}
+
+float16 int16_to_float16(int16_t a, float_status *status)
+{
+    return int64_to_float16(a, status);
+}
+
+float32 int64_to_float32(int64_t a, float_status *status)
+{
+    decomposed_parts pa = int_to_float(a, status);
+    return float32_round_pack_canonical(pa, status);
+}
+
+float32 int32_to_float32(int32_t a, float_status *status)
+{
+    return int64_to_float32(a, status);
+}
+
+float32 int16_to_float32(int16_t a, float_status *status)
+{
+    return int64_to_float32(a, status);
+}
+
+float64 int64_to_float64(int64_t a, float_status *status)
+{
+    decomposed_parts pa = int_to_float(a, status);
+    return float64_round_pack_canonical(pa, status);
+}
+
+float64 int32_to_float64(int32_t a, float_status *status)
+{
+    return int64_to_float64(a, status);
+}
+
+float64 int16_to_float64(int16_t a, float_status *status)
+{
+    return int64_to_float64(a, status);
+}
+
+
+/*
+ * Unsigned Integer to float conversions
+ *
+ * Returns the result of converting the unsigned integer `a' to the
+ * floating-point format. The conversion is performed according to the
+ * IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+ */
+
+static decomposed_parts uint_to_float(uint64_t a, float_status *status)
+{
+    decomposed_parts r = { .sign = false};
+
+    if (a == 0) {
+        r.cls = float_class_zero;
+    } else {
+        int spare_bits = clz64(a) - 1;
+        r.cls = float_class_normal;
+        r.exp = DECOMPOSED_BINARY_POINT - spare_bits;
+        if (spare_bits < 0) {
+            shift64RightJamming(a, -spare_bits, &a);
+            r.frac = a;
+        } else {
+            r.frac = a << spare_bits;
+        }
+    }
+
+    return r;
+}
+
+float16 uint64_to_float16(uint64_t a, float_status *status)
+{
+    decomposed_parts pa = uint_to_float(a, status);
+    return float16_round_pack_canonical(pa, status);
+}
+
+float16 uint32_to_float16(uint32_t a, float_status *status)
+{
+    return uint64_to_float16(a, status);
+}
+
+float16 uint16_to_float16(uint16_t a, float_status *status)
+{
+    return uint64_to_float16(a, status);
+}
+
+float32 uint64_to_float32(uint64_t a, float_status *status)
+{
+    decomposed_parts pa = uint_to_float(a, status);
+    return float32_round_pack_canonical(pa, status);
+}
+
+float32 uint32_to_float32(uint32_t a, float_status *status)
+{
+    return uint64_to_float32(a, status);
+}
+
+float32 uint16_to_float32(uint16_t a, float_status *status)
+{
+    return uint64_to_float32(a, status);
+}
+
+float64 uint64_to_float64(uint64_t a, float_status *status)
+{
+    decomposed_parts pa = uint_to_float(a, status);
+    return float64_round_pack_canonical(pa, status);
+}
+
+float64 uint32_to_float64(uint32_t a, float_status *status)
+{
+    return uint64_to_float64(a, status);
+}
+
+float64 uint16_to_float64(uint16_t a, float_status *status)
+{
+    return uint64_to_float64(a, status);
+}
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
@@ -2591,43 +2754,6 @@ static float128 normalizeRoundAndPackFloat128(flag zSign, int32_t zExp,
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the 32-bit two's complement integer `a'
-| to the single-precision floating-point format.  The conversion is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 int32_to_float32(int32_t a, float_status *status)
-{
-    flag zSign;
-
-    if ( a == 0 ) return float32_zero;
-    if ( a == (int32_t) 0x80000000 ) return packFloat32( 1, 0x9E, 0 );
-    zSign = ( a < 0 );
-    return normalizeRoundAndPackFloat32(zSign, 0x9C, zSign ? -a : a, status);
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the 32-bit two's complement integer `a'
-| to the double-precision floating-point format.  The conversion is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 int32_to_float64(int32_t a, float_status *status)
-{
-    flag zSign;
-    uint32_t absA;
-    int8_t shiftCount;
-    uint64_t zSig;
-
-    if ( a == 0 ) return float64_zero;
-    zSign = ( a < 0 );
-    absA = zSign ? - a : a;
-    shiftCount = countLeadingZeros32( absA ) + 21;
-    zSig = absA;
-    return packFloat64( zSign, 0x432 - shiftCount, zSig<<shiftCount );
-
-}
 
 /*----------------------------------------------------------------------------
 | Returns the result of converting the 32-bit two's complement integer `a'
@@ -2674,56 +2800,6 @@ float128 int32_to_float128(int32_t a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the 64-bit two's complement integer `a'
-| to the single-precision floating-point format.  The conversion is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 int64_to_float32(int64_t a, float_status *status)
-{
-    flag zSign;
-    uint64_t absA;
-    int8_t shiftCount;
-
-    if ( a == 0 ) return float32_zero;
-    zSign = ( a < 0 );
-    absA = zSign ? - a : a;
-    shiftCount = countLeadingZeros64( absA ) - 40;
-    if ( 0 <= shiftCount ) {
-        return packFloat32( zSign, 0x95 - shiftCount, absA<<shiftCount );
-    }
-    else {
-        shiftCount += 7;
-        if ( shiftCount < 0 ) {
-            shift64RightJamming( absA, - shiftCount, &absA );
-        }
-        else {
-            absA <<= shiftCount;
-        }
-        return roundAndPackFloat32(zSign, 0x9C - shiftCount, absA, status);
-    }
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the 64-bit two's complement integer `a'
-| to the double-precision floating-point format.  The conversion is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 int64_to_float64(int64_t a, float_status *status)
-{
-    flag zSign;
-
-    if ( a == 0 ) return float64_zero;
-    if ( a == (int64_t) LIT64( 0x8000000000000000 ) ) {
-        return packFloat64( 1, 0x43E, 0 );
-    }
-    zSign = ( a < 0 );
-    return normalizeRoundAndPackFloat64(zSign, 0x43C, zSign ? -a : a, status);
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of converting the 64-bit two's complement integer `a'
 | to the extended double-precision floating-point format.  The conversion
@@ -2778,65 +2854,6 @@ float128 int64_to_float128(int64_t a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the 64-bit unsigned integer `a'
-| to the single-precision floating-point format.  The conversion is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 uint64_to_float32(uint64_t a, float_status *status)
-{
-    int shiftcount;
-
-    if (a == 0) {
-        return float32_zero;
-    }
-
-    /* Determine (left) shift needed to put first set bit into bit posn 23
-     * (since packFloat32() expects the binary point between bits 23 and 22);
-     * this is the fast case for smallish numbers.
-     */
-    shiftcount = countLeadingZeros64(a) - 40;
-    if (shiftcount >= 0) {
-        return packFloat32(0, 0x95 - shiftcount, a << shiftcount);
-    }
-    /* Otherwise we need to do a round-and-pack. roundAndPackFloat32()
-     * expects the binary point between bits 30 and 29, hence the + 7.
-     */
-    shiftcount += 7;
-    if (shiftcount < 0) {
-        shift64RightJamming(a, -shiftcount, &a);
-    } else {
-        a <<= shiftcount;
-    }
-
-    return roundAndPackFloat32(0, 0x9c - shiftcount, a, status);
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the 64-bit unsigned integer `a'
-| to the double-precision floating-point format.  The conversion is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 uint64_to_float64(uint64_t a, float_status *status)
-{
-    int exp = 0x43C;
-    int shiftcount;
-
-    if (a == 0) {
-        return float64_zero;
-    }
-
-    shiftcount = countLeadingZeros64(a) - 1;
-    if (shiftcount < 0) {
-        shift64RightJamming(a, -shiftcount, &a);
-    } else {
-        a <<= shiftcount;
-    }
-    return roundAndPackFloat64(0, exp - shiftcount, a, status);
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of converting the 64-bit unsigned integer `a'
 | to the quadruple-precision floating-point format.  The conversion is performed
@@ -6716,19 +6733,6 @@ int float128_unordered_quiet(float128 a, float128 b, float_status *status)
     return 0;
 }
 
-/* misc functions */
-float32 uint32_to_float32(uint32_t a, float_status *status)
-{
-    return int64_to_float32(a, status);
-}
-
-float64 uint32_to_float64(uint32_t a, float_status *status)
-{
-    return int64_to_float64(a, status);
-}
-
-
-
 #define COMPARE(s, nan_exp)                                                  \
 static inline int float ## s ## _compare_internal(float ## s a, float ## s b,\
                                       int is_quiet, float_status *status)    \
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index d7bc7cbcb6..aa9e30d254 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -272,9 +272,13 @@ enum {
 /*----------------------------------------------------------------------------
 | Software IEC/IEEE integer-to-floating-point conversion routines.
 *----------------------------------------------------------------------------*/
+float32 int16_to_float32(int16_t, float_status *status);
 float32 int32_to_float32(int32_t, float_status *status);
+float64 int16_to_float64(int16_t, float_status *status);
 float64 int32_to_float64(int32_t, float_status *status);
+float32 uint16_to_float32(uint16_t, float_status *status);
 float32 uint32_to_float32(uint32_t, float_status *status);
+float64 uint16_to_float64(uint16_t, float_status *status);
 float64 uint32_to_float64(uint32_t, float_status *status);
 floatx80 int32_to_floatx80(int32_t, float_status *status);
 float128 int32_to_float128(int32_t, float_status *status);
@@ -286,27 +290,6 @@ float32 uint64_to_float32(uint64_t, float_status *status);
 float64 uint64_to_float64(uint64_t, float_status *status);
 float128 uint64_to_float128(uint64_t, float_status *status);
 
-/* We provide the int16 versions for symmetry of API with float-to-int */
-static inline float32 int16_to_float32(int16_t v, float_status *status)
-{
-    return int32_to_float32(v, status);
-}
-
-static inline float32 uint16_to_float32(uint16_t v, float_status *status)
-{
-    return uint32_to_float32(v, status);
-}
-
-static inline float64 int16_to_float64(int16_t v, float_status *status)
-{
-    return int32_to_float64(v, status);
-}
-
-static inline float64 uint16_to_float64(uint16_t v, float_status *status)
-{
-    return uint32_to_float64(v, status);
-}
-
 /*----------------------------------------------------------------------------
 | Software half-precision conversion routines.
 *----------------------------------------------------------------------------*/
@@ -327,6 +310,11 @@ uint64_t float16_to_uint64(float16 a, float_status *status);
 int64_t float16_to_int64_round_to_zero(float16, float_status *status);
 uint64_t float16_to_uint64_round_to_zero(float16 a, float_status *status);
 float16 int16_to_float16(int16_t a, float_status *status);
+float16 int32_to_float16(int32_t a, float_status *status);
+float16 int64_to_float16(int64_t a, float_status *status);
+float16 uint16_to_float16(uint16_t a, float_status *status);
+float16 uint32_to_float16(uint32_t a, float_status *status);
+float16 uint64_to_float16(uint64_t a, float_status *status);
 
 /*----------------------------------------------------------------------------
 | Software half-precision operations.
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [Qemu-devel] [PATCH v2 18/20] fpu/softfloat: re-factor scalbn
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (16 preceding siblings ...)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 17/20] fpu/softfloat: re-factor int/uint to float Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-12 16:31   ` Peter Maydell
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 19/20] fpu/softfloat: re-factor minmax Alex Bennée
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

This is one of the simpler manipulations you can make to a floating
point number: for a normal number it is just an adjustment of the
exponent of the decomposed form.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c         | 104 +++++++++++++++---------------------------------
 include/fpu/softfloat.h |   1 +
 2 files changed, 32 insertions(+), 73 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index bb68d77f72..3647f6ca03 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1663,6 +1663,37 @@ float64 uint16_to_float64(uint16_t a, float_status *status)
     return uint64_to_float64(a, status);
 }
 
+/* Multiply A by 2 raised to the power N.  */
+static decomposed_parts scalbn_decomposed(decomposed_parts a, int n,
+                                          float_status *s)
+{
+    if (a.cls == float_class_normal) {
+        a.exp += n;
+    }
+    return a;
+}
+
+float16 float16_scalbn(float16 a, int n, float_status *status)
+{
+    decomposed_parts pa = float16_unpack_canonical(a, status);
+    decomposed_parts pr = scalbn_decomposed(pa, n, status);
+    return float16_round_pack_canonical(pr, status);
+}
+
+float32 float32_scalbn(float32 a, int n, float_status *status)
+{
+    decomposed_parts pa = float32_unpack_canonical(a, status);
+    decomposed_parts pr = scalbn_decomposed(pa, n, status);
+    return float32_round_pack_canonical(pr, status);
+}
+
+float64 float64_scalbn(float64 a, int n, float_status *status)
+{
+    decomposed_parts pa = float64_unpack_canonical(a, status);
+    decomposed_parts pr = scalbn_decomposed(pa, n, status);
+    return float64_round_pack_canonical(pr, status);
+}
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
@@ -6992,79 +7023,6 @@ MINMAX(32)
 MINMAX(64)
 
 
-/* Multiply A by 2 raised to the power N.  */
-float32 float32_scalbn(float32 a, int n, float_status *status)
-{
-    flag aSign;
-    int16_t aExp;
-    uint32_t aSig;
-
-    a = float32_squash_input_denormal(a, status);
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-
-    if ( aExp == 0xFF ) {
-        if ( aSig ) {
-            return propagateFloat32NaN(a, a, status);
-        }
-        return a;
-    }
-    if (aExp != 0) {
-        aSig |= 0x00800000;
-    } else if (aSig == 0) {
-        return a;
-    } else {
-        aExp++;
-    }
-
-    if (n > 0x200) {
-        n = 0x200;
-    } else if (n < -0x200) {
-        n = -0x200;
-    }
-
-    aExp += n - 1;
-    aSig <<= 7;
-    return normalizeRoundAndPackFloat32(aSign, aExp, aSig, status);
-}
-
-float64 float64_scalbn(float64 a, int n, float_status *status)
-{
-    flag aSign;
-    int16_t aExp;
-    uint64_t aSig;
-
-    a = float64_squash_input_denormal(a, status);
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-
-    if ( aExp == 0x7FF ) {
-        if ( aSig ) {
-            return propagateFloat64NaN(a, a, status);
-        }
-        return a;
-    }
-    if (aExp != 0) {
-        aSig |= LIT64( 0x0010000000000000 );
-    } else if (aSig == 0) {
-        return a;
-    } else {
-        aExp++;
-    }
-
-    if (n > 0x1000) {
-        n = 0x1000;
-    } else if (n < -0x1000) {
-        n = -0x1000;
-    }
-
-    aExp += n - 1;
-    aSig <<= 10;
-    return normalizeRoundAndPackFloat64(aSign, aExp, aSig, status);
-}
-
 floatx80 floatx80_scalbn(floatx80 a, int n, float_status *status)
 {
     flag aSign;
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index aa9e30d254..41338184d5 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -326,6 +326,7 @@ float16 float16_sub(float16, float16, float_status *status);
 float16 float16_mul(float16, float16, float_status *status);
 float16 float16_muladd(float16, float16, float16, int, float_status *status);
 float16 float16_div(float16, float16, float_status *status);
+float16 float16_scalbn(float16, int, float_status *status);
 
 int float16_is_quiet_nan(float16, float_status *status);
 int float16_is_signaling_nan(float16, float_status *status);
-- 
2.15.1


* [Qemu-devel] [PATCH v2 19/20] fpu/softfloat: re-factor minmax
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (17 preceding siblings ...)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 18/20] fpu/softfloat: re-factor scalbn Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-09 17:16   ` Richard Henderson
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 20/20] fpu/softfloat: re-factor compare Alex Bennée
  2018-01-09 13:07 ` [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions no-reply
  20 siblings, 1 reply; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

Let's give the minmax functions the same re-factoring treatment. I still
use the macro trick to expand the per-width wrappers, but now all of the
checking code is common.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v2
  - minor indentation fix
---
 fpu/softfloat.c         | 239 ++++++++++++++++++++++++++----------------------
 include/fpu/softfloat.h |   6 ++
 2 files changed, 134 insertions(+), 111 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 3647f6ca03..1dd9bd5972 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1663,6 +1663,134 @@ float64 uint16_to_float64(uint16_t a, float_status *status)
     return uint64_to_float64(a, status);
 }
 
+/* Float Min/Max */
+/* min() and max() functions. These can't be implemented as
+ * 'compare and pick one input' because that would mishandle
+ * NaNs and +0 vs -0.
+ *
+ * minnum() and maxnum() functions. These are similar to the min()
+ * and max() functions but if one of the arguments is a QNaN and
+ * the other is numerical then the numerical argument is returned.
+ * SNaNs will get quietened before being returned.
+ * minnum() and maxnum() correspond to the IEEE 754-2008 minNum()
+ * and maxNum() operations. min() and max() are the typical min/max
+ * semantics provided by many CPUs which predate that specification.
+ *
+ * minnummag() and maxnummag() functions correspond to minNumMag()
+ * and maxNumMag() from IEEE 754-2008.
+ */
+static decomposed_parts minmax_decomposed(decomposed_parts a,
+                                          decomposed_parts b,
+                                          bool ismin, bool ieee, bool ismag,
+                                          float_status *s)
+{
+    if (a.cls >= float_class_qnan
+        ||
+        b.cls >= float_class_qnan)
+    {
+        if (ieee) {
+            /* Takes two floating-point values `a' and `b', one of
+             * which is a NaN, and returns the appropriate NaN
+             * result. If either `a' or `b' is a signaling NaN,
+             * the invalid exception is raised.
+             */
+            if (a.cls == float_class_snan || b.cls == float_class_snan) {
+                return pick_nan_parts(a, b, s);
+            } else if (a.cls >= float_class_qnan
+                       &&
+                       b.cls < float_class_qnan) {
+                return b;
+            } else if (b.cls >= float_class_qnan
+                       &&
+                       a.cls < float_class_qnan) {
+                return a;
+            }
+        }
+        return pick_nan_parts(a, b, s);
+    } else {
+        int a_exp, b_exp;
+        bool a_sign, b_sign;
+
+        switch (a.cls) {
+        case float_class_normal:
+            a_exp = a.exp;
+            break;
+        case float_class_inf:
+            a_exp = INT_MAX;
+            break;
+        case float_class_zero:
+            a_exp = INT_MIN;
+            break;
+        default:
+            g_assert_not_reached();
+            break;
+        }
+        switch (b.cls) {
+        case float_class_normal:
+            b_exp = b.exp;
+            break;
+        case float_class_inf:
+            b_exp = INT_MAX;
+            break;
+        case float_class_zero:
+            b_exp = INT_MIN;
+            break;
+        default:
+            g_assert_not_reached();
+            break;
+        }
+
+        a_sign = a.sign;
+        b_sign = b.sign;
+        if (ismag) {
+            a_sign = b_sign = 0;
+        }
+
+        if (a_sign == b_sign) {
+            bool a_less = a_exp < b_exp;
+            if (a_exp == b_exp) {
+                a_less = a.frac < b.frac;
+            }
+            return a_sign ^ a_less ^ ismin ? b : a;
+        } else {
+            return a_sign ^ ismin ? b : a;
+        }
+    }
+}
+
+#define MINMAX(sz, name, ismin, isieee, ismag)                          \
+float ## sz float ## sz ## _ ## name(float ## sz a, float ## sz b, float_status *s) \
+{                                                                       \
+    decomposed_parts pa = float ## sz ## _unpack_canonical(a, s);       \
+    decomposed_parts pb = float ## sz ## _unpack_canonical(b, s);       \
+    decomposed_parts pr = minmax_decomposed(pa, pb, ismin, isieee, ismag, s); \
+                                                                        \
+    return float ## sz ## _round_pack_canonical(pr, s);                \
+}
+
+MINMAX(16, min, true, false, false)
+MINMAX(16, minnum, true, true, false)
+MINMAX(16, minnummag, true, true, true)
+MINMAX(16, max, false, false, false)
+MINMAX(16, maxnum, false, true, false)
+MINMAX(16, maxnummag, false, true, true)
+
+MINMAX(32, min, true, false, false)
+MINMAX(32, minnum, true, true, false)
+MINMAX(32, minnummag, true, true, true)
+MINMAX(32, max, false, false, false)
+MINMAX(32, maxnum, false, true, false)
+MINMAX(32, maxnummag, false, true, true)
+
+MINMAX(64, min, true, false, false)
+MINMAX(64, minnum, true, true, false)
+MINMAX(64, minnummag, true, true, true)
+MINMAX(64, max, false, false, false)
+MINMAX(64, maxnum, false, true, false)
+MINMAX(64, maxnummag, false, true, true)
+
+#undef MINMAX
+
 /* Multiply A by 2 raised to the power N.  */
 static decomposed_parts scalbn_decomposed(decomposed_parts a, int n,
                                           float_status *s)
@@ -6912,117 +7040,6 @@ int float128_compare_quiet(float128 a, float128 b, float_status *status)
     return float128_compare_internal(a, b, 1, status);
 }
 
-/* min() and max() functions. These can't be implemented as
- * 'compare and pick one input' because that would mishandle
- * NaNs and +0 vs -0.
- *
- * minnum() and maxnum() functions. These are similar to the min()
- * and max() functions but if one of the arguments is a QNaN and
- * the other is numerical then the numerical argument is returned.
- * SNaNs will get quietened before being returned.
- * minnum() and maxnum correspond to the IEEE 754-2008 minNum()
- * and maxNum() operations. min() and max() are the typical min/max
- * semantics provided by many CPUs which predate that specification.
- *
- * minnummag() and maxnummag() functions correspond to minNumMag()
- * and minNumMag() from the IEEE-754 2008.
- */
-#define MINMAX(s)                                                       \
-static inline float ## s float ## s ## _minmax(float ## s a, float ## s b,     \
-                                               int ismin, int isieee,   \
-                                               int ismag,               \
-                                               float_status *status)    \
-{                                                                       \
-    flag aSign, bSign;                                                  \
-    uint ## s ## _t av, bv, aav, abv;                                   \
-    a = float ## s ## _squash_input_denormal(a, status);                \
-    b = float ## s ## _squash_input_denormal(b, status);                \
-    if (float ## s ## _is_any_nan(a) ||                                 \
-        float ## s ## _is_any_nan(b)) {                                 \
-        if (isieee) {                                                   \
-            if (float ## s ## _is_signaling_nan(a, status) ||           \
-                float ## s ## _is_signaling_nan(b, status)) {           \
-                return propagateFloat ## s ## NaN(a, b, status);        \
-            } else  if (float ## s ## _is_quiet_nan(a, status) &&       \
-                !float ## s ##_is_any_nan(b)) {                         \
-                return b;                                               \
-            } else if (float ## s ## _is_quiet_nan(b, status) &&        \
-                       !float ## s ## _is_any_nan(a)) {                 \
-                return a;                                               \
-            }                                                           \
-        }                                                               \
-        return propagateFloat ## s ## NaN(a, b, status);                \
-    }                                                                   \
-    aSign = extractFloat ## s ## Sign(a);                               \
-    bSign = extractFloat ## s ## Sign(b);                               \
-    av = float ## s ## _val(a);                                         \
-    bv = float ## s ## _val(b);                                         \
-    if (ismag) {                                                        \
-        aav = float ## s ## _abs(av);                                   \
-        abv = float ## s ## _abs(bv);                                   \
-        if (aav != abv) {                                               \
-            if (ismin) {                                                \
-                return (aav < abv) ? a : b;                             \
-            } else {                                                    \
-                return (aav < abv) ? b : a;                             \
-            }                                                           \
-        }                                                               \
-    }                                                                   \
-    if (aSign != bSign) {                                               \
-        if (ismin) {                                                    \
-            return aSign ? a : b;                                       \
-        } else {                                                        \
-            return aSign ? b : a;                                       \
-        }                                                               \
-    } else {                                                            \
-        if (ismin) {                                                    \
-            return (aSign ^ (av < bv)) ? a : b;                         \
-        } else {                                                        \
-            return (aSign ^ (av < bv)) ? b : a;                         \
-        }                                                               \
-    }                                                                   \
-}                                                                       \
-                                                                        \
-float ## s float ## s ## _min(float ## s a, float ## s b,               \
-                              float_status *status)                     \
-{                                                                       \
-    return float ## s ## _minmax(a, b, 1, 0, 0, status);                \
-}                                                                       \
-                                                                        \
-float ## s float ## s ## _max(float ## s a, float ## s b,               \
-                              float_status *status)                     \
-{                                                                       \
-    return float ## s ## _minmax(a, b, 0, 0, 0, status);                \
-}                                                                       \
-                                                                        \
-float ## s float ## s ## _minnum(float ## s a, float ## s b,            \
-                                 float_status *status)                  \
-{                                                                       \
-    return float ## s ## _minmax(a, b, 1, 1, 0, status);                \
-}                                                                       \
-                                                                        \
-float ## s float ## s ## _maxnum(float ## s a, float ## s b,            \
-                                 float_status *status)                  \
-{                                                                       \
-    return float ## s ## _minmax(a, b, 0, 1, 0, status);                \
-}                                                                       \
-                                                                        \
-float ## s float ## s ## _minnummag(float ## s a, float ## s b,         \
-                                    float_status *status)               \
-{                                                                       \
-    return float ## s ## _minmax(a, b, 1, 1, 1, status);                \
-}                                                                       \
-                                                                        \
-float ## s float ## s ## _maxnummag(float ## s a, float ## s b,         \
-                                    float_status *status)               \
-{                                                                       \
-    return float ## s ## _minmax(a, b, 0, 1, 1, status);                \
-}
-
-MINMAX(32)
-MINMAX(64)
-
-
 floatx80 floatx80_scalbn(floatx80 a, int n, float_status *status)
 {
     flag aSign;
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 41338184d5..c948727bbb 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -327,6 +327,12 @@ float16 float16_mul(float16, float16, float_status *status);
 float16 float16_muladd(float16, float16, float16, int, float_status *status);
 float16 float16_div(float16, float16, float_status *status);
 float16 float16_scalbn(float16, int, float_status *status);
+float16 float16_min(float16, float16, float_status *status);
+float16 float16_max(float16, float16, float_status *status);
+float16 float16_minnum(float16, float16, float_status *status);
+float16 float16_maxnum(float16, float16, float_status *status);
+float16 float16_minnummag(float16, float16, float_status *status);
+float16 float16_maxnummag(float16, float16, float_status *status);
 
 int float16_is_quiet_nan(float16, float_status *status);
 int float16_is_signaling_nan(float16, float_status *status);
-- 
2.15.1


* [Qemu-devel] [PATCH v2 20/20] fpu/softfloat: re-factor compare
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (18 preceding siblings ...)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 19/20] fpu/softfloat: re-factor minmax Alex Bennée
@ 2018-01-09 12:22 ` Alex Bennée
  2018-01-09 17:18   ` Richard Henderson
  2018-01-09 13:07 ` [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions no-reply
  20 siblings, 1 reply; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 12:22 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

The compare functions were already generated from a macro. I keep the
macro expansion for the per-width wrappers but move most of the logic
into a common compare_decomposed() function.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v2
  - minor re-factor for better inf handling
---
 fpu/softfloat.c         | 134 +++++++++++++++++++++++++++++-------------------
 include/fpu/softfloat.h |   2 +
 2 files changed, 82 insertions(+), 54 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 1dd9bd5972..8eda35acd5 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1791,6 +1791,86 @@ MINMAX(64, maxnummag, false, true, true)
 
 #undef MINMAX
 
+/* Floating point compare */
+static int compare_decomposed(decomposed_parts a, decomposed_parts b,
+                              bool is_quiet, float_status *s)
+{
+    if (a.cls >= float_class_qnan
+        ||
+        b.cls >= float_class_qnan) {
+        if (!is_quiet ||
+            a.cls == float_class_snan ||
+            b.cls == float_class_snan) {
+            s->float_exception_flags |= float_flag_invalid;
+        }
+        return float_relation_unordered;
+    }
+
+    if (a.cls == float_class_zero) {
+        if (b.cls == float_class_zero) {
+            return float_relation_equal;
+        }
+        return b.sign ? float_relation_greater : float_relation_less;
+    } else if (b.cls == float_class_zero) {
+        return a.sign ? float_relation_less : float_relation_greater;
+    }
+
+    /* The only really important thing about infinity is its sign. If
+     * both are infinities the sign marks the smallest of the two.
+     */
+    if (a.cls == float_class_inf) {
+        if ((b.cls == float_class_inf) && (a.sign == b.sign)) {
+            return float_relation_equal;
+        }
+        return a.sign ? float_relation_less : float_relation_greater;
+    } else if (b.cls == float_class_inf) {
+        return b.sign ? float_relation_greater : float_relation_less;
+    }
+
+    if (a.sign != b.sign) {
+        return a.sign ? float_relation_less : float_relation_greater;
+    }
+
+    if (a.exp == b.exp) {
+        if (a.frac == b.frac) {
+            return float_relation_equal;
+        }
+        if (a.sign) {
+            return a.frac > b.frac ?
+                float_relation_less : float_relation_greater;
+        } else {
+            return a.frac > b.frac ?
+                float_relation_greater : float_relation_less;
+        }
+    } else {
+        if (a.sign) {
+            return a.exp > b.exp ? float_relation_less : float_relation_greater;
+        } else {
+            return a.exp > b.exp ? float_relation_greater : float_relation_less;
+        }
+    }
+}
+
+#define COMPARE(sz)                                                     \
+int float ## sz ## _compare(float ## sz a, float ## sz b, float_status *s) \
+{                                                                       \
+    decomposed_parts pa = float ## sz ## _unpack_canonical(a, s);       \
+    decomposed_parts pb = float ## sz ## _unpack_canonical(b, s);       \
+    return compare_decomposed(pa, pb, false, s);                        \
+}                                                                       \
+int float ## sz ## _compare_quiet(float ## sz a, float ## sz b, float_status *s) \
+{                                                                       \
+    decomposed_parts pa = float ## sz ## _unpack_canonical(a, s);       \
+    decomposed_parts pb = float ## sz ## _unpack_canonical(b, s);       \
+    return compare_decomposed(pa, pb, true, s);                         \
+}
+
+COMPARE(16)
+COMPARE(32)
+COMPARE(64)
+
+#undef COMPARE
+
 /* Multiply A by 2 raised to the power N.  */
 static decomposed_parts scalbn_decomposed(decomposed_parts a, int n,
                                           float_status *s)
@@ -6892,60 +6972,6 @@ int float128_unordered_quiet(float128 a, float128 b, float_status *status)
     return 0;
 }
 
-#define COMPARE(s, nan_exp)                                                  \
-static inline int float ## s ## _compare_internal(float ## s a, float ## s b,\
-                                      int is_quiet, float_status *status)    \
-{                                                                            \
-    flag aSign, bSign;                                                       \
-    uint ## s ## _t av, bv;                                                  \
-    a = float ## s ## _squash_input_denormal(a, status);                     \
-    b = float ## s ## _squash_input_denormal(b, status);                     \
-                                                                             \
-    if (( ( extractFloat ## s ## Exp( a ) == nan_exp ) &&                    \
-         extractFloat ## s ## Frac( a ) ) ||                                 \
-        ( ( extractFloat ## s ## Exp( b ) == nan_exp ) &&                    \
-          extractFloat ## s ## Frac( b ) )) {                                \
-        if (!is_quiet ||                                                     \
-            float ## s ## _is_signaling_nan(a, status) ||                  \
-            float ## s ## _is_signaling_nan(b, status)) {                 \
-            float_raise(float_flag_invalid, status);                         \
-        }                                                                    \
-        return float_relation_unordered;                                     \
-    }                                                                        \
-    aSign = extractFloat ## s ## Sign( a );                                  \
-    bSign = extractFloat ## s ## Sign( b );                                  \
-    av = float ## s ## _val(a);                                              \
-    bv = float ## s ## _val(b);                                              \
-    if ( aSign != bSign ) {                                                  \
-        if ( (uint ## s ## _t) ( ( av | bv )<<1 ) == 0 ) {                   \
-            /* zero case */                                                  \
-            return float_relation_equal;                                     \
-        } else {                                                             \
-            return 1 - (2 * aSign);                                          \
-        }                                                                    \
-    } else {                                                                 \
-        if (av == bv) {                                                      \
-            return float_relation_equal;                                     \
-        } else {                                                             \
-            return 1 - 2 * (aSign ^ ( av < bv ));                            \
-        }                                                                    \
-    }                                                                        \
-}                                                                            \
-                                                                             \
-int float ## s ## _compare(float ## s a, float ## s b, float_status *status) \
-{                                                                            \
-    return float ## s ## _compare_internal(a, b, 0, status);                 \
-}                                                                            \
-                                                                             \
-int float ## s ## _compare_quiet(float ## s a, float ## s b,                 \
-                                 float_status *status)                       \
-{                                                                            \
-    return float ## s ## _compare_internal(a, b, 1, status);                 \
-}
-
-COMPARE(32, 0xff)
-COMPARE(64, 0x7ff)
-
 static inline int floatx80_compare_internal(floatx80 a, floatx80 b,
                                             int is_quiet, float_status *status)
 {
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index c948727bbb..e5aa8d65f9 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -333,6 +333,8 @@ float16 float16_minnum(float16, float16, float_status *status);
 float16 float16_maxnum(float16, float16, float_status *status);
 float16 float16_minnummag(float16, float16, float_status *status);
 float16 float16_maxnummag(float16, float16, float_status *status);
+int float16_compare(float16, float16, float_status *status);
+int float16_compare_quiet(float16, float16, float_status *status);
 
 int float16_is_quiet_nan(float16, float_status *status);
 int float16_is_signaling_nan(float16, float_status *status);
-- 
2.15.1


* Re: [Qemu-devel] [PATCH v2 02/20] include/fpu/softfloat: remove USE_SOFTFLOAT_STRUCT_TYPES
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 02/20] include/fpu/softfloat: remove USE_SOFTFLOAT_STRUCT_TYPES Alex Bennée
@ 2018-01-09 12:27   ` Laurent Vivier
  2018-01-09 14:12     ` Aurelien Jarno
  0 siblings, 1 reply; 68+ messages in thread
From: Laurent Vivier @ 2018-01-09 12:27 UTC (permalink / raw)
  To: Alex Bennée, richard.henderson, peter.maydell, bharata, andrew
  Cc: qemu-devel, Aurelien Jarno

Le 09/01/2018 à 13:22, Alex Bennée a écrit :
> It's not actively built and when enabled things fail to compile. I'm
> not sure the type-checking is really helping here. Seeing as we "own"
> our softfloat now let's remove the cruft.

I think it would be better to fix the build break than to remove the
type-checking tool.

but that's only my opinion...

Thanks,
Laurent


* Re: [Qemu-devel] [PATCH v2 12/20] fpu/softfloat: re-factor mul
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 12/20] fpu/softfloat: re-factor mul Alex Bennée
@ 2018-01-09 12:43   ` Philippe Mathieu-Daudé
  2018-01-12 16:17   ` Peter Maydell
  1 sibling, 0 replies; 68+ messages in thread
From: Philippe Mathieu-Daudé @ 2018-01-09 12:43 UTC (permalink / raw)
  To: Alex Bennée, richard.henderson, laurent, bharata
  Cc: peter.maydell, andrew, qemu-devel, Aurelien Jarno

On 01/09/2018 09:22 AM, Alex Bennée wrote:
> We can now add float16_mul and use the common decompose and
> canonicalize functions to have a single implementation for
> float16/32/64 versions.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

better safe than sorry :P


* Re: [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions
  2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (19 preceding siblings ...)
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 20/20] fpu/softfloat: re-factor compare Alex Bennée
@ 2018-01-09 13:07 ` no-reply
  20 siblings, 0 replies; 68+ messages in thread
From: no-reply @ 2018-01-09 13:07 UTC (permalink / raw)
  To: alex.bennee
  Cc: famz, richard.henderson, peter.maydell, laurent, bharata, andrew,
	qemu-devel

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180109122252.17670-1-alex.bennee@linaro.org
Subject: [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
d5caeaa855 fpu/softfloat: re-factor compare
5f49459801 fpu/softfloat: re-factor minmax
94a4708c20 fpu/softfloat: re-factor scalbn
1df6e40a58 fpu/softfloat: re-factor int/uint to float
03f3d9611e fpu/softfloat: re-factor float to int/uint
b07b449ca1 fpu/softfloat: re-factor round_to_int
cbba796889 fpu/softfloat: re-factor muladd
d8ef56eff6 fpu/softfloat: re-factor div
2674f58839 fpu/softfloat: re-factor mul
212fe78e1f fpu/softfloat: re-factor add/sub
51f0c9463a fpu/softfloat: define decompose structures
24138b374d fpu/softfloat: move the extract functions to the top of the file
a9279e4a7b fpu/softfloat: improve comments on ARM NaN propagation
c2c68502de fpu/softfloat: propagate signalling NaNs in MINMAX
830590963d include/fpu/softfloat: add some float16 constants
1ec2148d81 include/fpu/softfloat: implement float16_set_sign helper
10028d5591 include/fpu/softfloat: implement float16_chs helper
f522785e08 include/fpu/softfloat: implement float16_abs helper
dacd77a7f6 include/fpu/softfloat: remove USE_SOFTFLOAT_STRUCT_TYPES
f895c43593 fpu/softfloat: implement float16_squash_input_denormal

=== OUTPUT BEGIN ===
Checking PATCH 1/20: fpu/softfloat: implement float16_squash_input_denormal...
Checking PATCH 2/20: include/fpu/softfloat: remove USE_SOFTFLOAT_STRUCT_TYPES...
Checking PATCH 3/20: include/fpu/softfloat: implement float16_abs helper...
Checking PATCH 4/20: include/fpu/softfloat: implement float16_chs helper...
Checking PATCH 5/20: include/fpu/softfloat: implement float16_set_sign helper...
Checking PATCH 6/20: include/fpu/softfloat: add some float16 constants...
Checking PATCH 7/20: fpu/softfloat: propagate signalling NaNs in MINMAX...
Checking PATCH 8/20: fpu/softfloat: improve comments on ARM NaN propagation...
Checking PATCH 9/20: fpu/softfloat: move the extract functions to the top of the file...
Checking PATCH 10/20: fpu/softfloat: define decompose structures...
ERROR: spaces prohibited around that ':' (ctx:WxW)
#53: FILE: fpu/softfloat.c:210:
+    uint64_t frac   : 64;
                     ^

ERROR: spaces prohibited around that ':' (ctx:WxW)
#54: FILE: fpu/softfloat.c:211:
+    int exp         : 32;
                     ^

ERROR: space prohibited before that ':' (ctx:WxW)
#56: FILE: fpu/softfloat.c:213:
+    int             : 23;
                     ^

ERROR: spaces prohibited around that ':' (ctx:WxW)
#57: FILE: fpu/softfloat.c:214:
+    bool sign       : 1;
                     ^

total: 4 errors, 0 warnings, 82 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 11/20: fpu/softfloat: re-factor add/sub...
WARNING: line over 80 characters
#141: FILE: fpu/softfloat.c:364:
+                                                   const decomposed_params *parm)

total: 0 errors, 1 warnings, 938 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 12/20: fpu/softfloat: re-factor mul...
Checking PATCH 13/20: fpu/softfloat: re-factor div...
Checking PATCH 14/20: fpu/softfloat: re-factor muladd...
Checking PATCH 15/20: fpu/softfloat: re-factor round_to_int...
WARNING: line over 80 characters
#91: FILE: fpu/softfloat.c:1252:
+                inc = ((a.frac & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);

total: 0 errors, 1 warnings, 329 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 16/20: fpu/softfloat: re-factor float to int/uint...
ERROR: space prohibited after that '-' (ctx:WxW)
#59: FILE: fpu/softfloat.c:1347:
+            return r < - (uint64_t) INT64_MIN ? -r : INT64_MIN;
                        ^

WARNING: line over 80 characters
#95: FILE: fpu/softfloat.c:1383:
+int ## isz ## _t float ## fsz ## _to_int ## isz(float ## fsz a, float_status *s) \

WARNING: line over 80 characters
#186: FILE: fpu/softfloat.c:1474:
+uint ## isz ## _t float ## fsz ## _to_uint ## isz(float ## fsz a, float_status *s) \

ERROR: space prohibited after that open parenthesis '('
#726: FILE: fpu/softfloat.c:3421:
+    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )

ERROR: space prohibited before that close parenthesis ')'
#726: FILE: fpu/softfloat.c:3421:
+    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )

ERROR: space prohibited after that open parenthesis '('
#727: FILE: fpu/softfloat.c:3422:
+         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )

ERROR: space prohibited before that close parenthesis ')'
#727: FILE: fpu/softfloat.c:3422:
+         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )

ERROR: space prohibited after that open parenthesis '('
#748: FILE: fpu/softfloat.c:3430:
+    aSign = extractFloat32Sign( a );

ERROR: space prohibited before that close parenthesis ')'
#748: FILE: fpu/softfloat.c:3430:
+    aSign = extractFloat32Sign( a );

ERROR: space prohibited after that open parenthesis '('
#749: FILE: fpu/softfloat.c:3431:
+    bSign = extractFloat32Sign( b );

ERROR: space prohibited before that close parenthesis ')'
#749: FILE: fpu/softfloat.c:3431:
+    bSign = extractFloat32Sign( b );

WARNING: line over 80 characters
#752: FILE: fpu/softfloat.c:3434:
+    if ( aSign != bSign ) return aSign && ( (uint32_t) ( ( av | bv )<<1 ) != 0 );

ERROR: spaces required around that '<<' (ctx:VxV)
#752: FILE: fpu/softfloat.c:3434:
+    if ( aSign != bSign ) return aSign && ( (uint32_t) ( ( av | bv )<<1 ) != 0 );
                                                                     ^

ERROR: space prohibited after that open parenthesis '('
#752: FILE: fpu/softfloat.c:3434:
+    if ( aSign != bSign ) return aSign && ( (uint32_t) ( ( av | bv )<<1 ) != 0 );

ERROR: space prohibited before that close parenthesis ')'
#752: FILE: fpu/softfloat.c:3434:
+    if ( aSign != bSign ) return aSign && ( (uint32_t) ( ( av | bv )<<1 ) != 0 );

ERROR: trailing statements should be on next line
#752: FILE: fpu/softfloat.c:3434:
+    if ( aSign != bSign ) return aSign && ( (uint32_t) ( ( av | bv )<<1 ) != 0 );

ERROR: braces {} are necessary for all arms of this statement
#752: FILE: fpu/softfloat.c:3434:
+    if ( aSign != bSign ) return aSign && ( (uint32_t) ( ( av | bv )<<1 ) != 0 );
[...]

ERROR: space prohibited after that open parenthesis '('
#753: FILE: fpu/softfloat.c:3435:
+    return ( av != bv ) && ( aSign ^ ( av < bv ) );

ERROR: space prohibited before that close parenthesis ')'
#753: FILE: fpu/softfloat.c:3435:
+    return ( av != bv ) && ( aSign ^ ( av < bv ) );

ERROR: space prohibited after that open parenthesis '('
#813: FILE: fpu/softfloat.c:3451:
+    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )

ERROR: space prohibited before that close parenthesis ')'
#813: FILE: fpu/softfloat.c:3451:
+    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )

ERROR: space prohibited after that open parenthesis '('
#814: FILE: fpu/softfloat.c:3452:
+         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )

ERROR: space prohibited before that close parenthesis ')'
#814: FILE: fpu/softfloat.c:3452:
+         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )

total: 20 errors, 3 warnings, 1076 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 17/20: fpu/softfloat: re-factor int/uint to float...
Checking PATCH 18/20: fpu/softfloat: re-factor scalbn...
Checking PATCH 19/20: fpu/softfloat: re-factor minmax...
WARNING: line over 80 characters
#120: FILE: fpu/softfloat.c:1762:
+float ## sz float ## sz ## _ ## name(float ## sz a, float ## sz b, float_status *s) \

total: 0 errors, 1 warnings, 263 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 20/20: fpu/softfloat: re-factor compare...
WARNING: line over 80 characters
#91: FILE: fpu/softfloat.c:1861:
+int float ## sz ## _compare_quiet(float ## sz a, float ## sz b, float_status *s) \

total: 0 errors, 1 warnings, 154 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org


* Re: [Qemu-devel] [PATCH v2 06/20] include/fpu/softfloat: add some float16 constants
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 06/20] include/fpu/softfloat: add some float16 constants Alex Bennée
@ 2018-01-09 13:27   ` Philippe Mathieu-Daudé
  2018-01-09 15:16     ` Alex Bennée
  2018-01-12 13:47   ` Peter Maydell
  1 sibling, 1 reply; 68+ messages in thread
From: Philippe Mathieu-Daudé @ 2018-01-09 13:27 UTC (permalink / raw)
  To: Alex Bennée, richard.henderson, peter.maydell, laurent,
	bharata, andrew
  Cc: qemu-devel, Aurelien Jarno


Hi Alex,

On 01/09/2018 09:22 AM, Alex Bennée wrote:
> This defines the same set of common constants for float16 as defined
> for 32 and 64 bit floats. These are often used by target helper
> functions. I've also removed constants that are not used by anybody.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> 
> ---
> v2
>   - fixup constants, remove unused ones
> ---
>  include/fpu/softfloat.h | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index 8ab5d0df47..e64bf62f3d 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -368,6 +368,11 @@ static inline float16 float16_set_sign(float16 a, int sign)
>      return make_float16((float16_val(a) & 0x7fff) | (sign << 15));
>  }
>  
> +#define float16_zero make_float16(0)
> +#define float16_one make_float16(0x3a00)

I still disagree with this one, it seems your bits 9/10 are inverted
(mantissa msb with biased exponent lsb).

         S EEEEE TTTTTTTTTT
0x3a00 = 0 01110 1000000000

having:
S=0
E=0b01110=14
T=0b1000000000=512

I get:
(-1)^0 * 2^(14-15) * (1 + (2^-10) * 512) = 1 * 0.5 * (1 + 0.5) = 0.75

With 0x3c00 I get:

         S EEEEE TTTTTTTTTT
0x3c00 = 0 01111 0000000000
S=0
E=0b01111=15
T=0b0000000000=0

(-1)^0 * 2^(15-15) * (1 + (2^-10) * 0) = 1 * 2^0 * (1 + 0) = 1

The rest is OK.

Changing to "#define float16_one make_float16(0x3c00)":
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

> +#define float16_half make_float16(0x3800)
> +#define float16_infinity make_float16(0x7c00)
> +
>  /*----------------------------------------------------------------------------
>  | The pattern for a default generated half-precision NaN.
>  *----------------------------------------------------------------------------*/
> @@ -474,8 +479,6 @@ static inline float32 float32_set_sign(float32 a, int sign)
>  
>  #define float32_zero make_float32(0)
>  #define float32_one make_float32(0x3f800000)
> -#define float32_ln2 make_float32(0x3f317218)
> -#define float32_pi make_float32(0x40490fdb)
>  #define float32_half make_float32(0x3f000000)
>  #define float32_infinity make_float32(0x7f800000)
>  
> @@ -588,7 +591,6 @@ static inline float64 float64_set_sign(float64 a, int sign)
>  #define float64_zero make_float64(0)
>  #define float64_one make_float64(0x3ff0000000000000LL)
>  #define float64_ln2 make_float64(0x3fe62e42fefa39efLL)
> -#define float64_pi make_float64(0x400921fb54442d18LL)
>  #define float64_half make_float64(0x3fe0000000000000LL)
>  #define float64_infinity make_float64(0x7ff0000000000000LL)
>  
> 




* Re: [Qemu-devel] [PATCH v2 02/20] include/fpu/softfloat: remove USE_SOFTFLOAT_STRUCT_TYPES
  2018-01-09 12:27   ` Laurent Vivier
@ 2018-01-09 14:12     ` Aurelien Jarno
  2018-01-09 14:14       ` Peter Maydell
  0 siblings, 1 reply; 68+ messages in thread
From: Aurelien Jarno @ 2018-01-09 14:12 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Alex Bennée, richard.henderson, peter.maydell, bharata,
	andrew, qemu-devel

On 2018-01-09 13:27, Laurent Vivier wrote:
> Le 09/01/2018 à 13:22, Alex Bennée a écrit :
> > It's not actively built and when enabled things fail to compile. I'm
> > not sure the type-checking is really helping here. Seeing as we "own"
> > our softfloat now, let's remove the cruft.
> 
> I think it would be better to fix the build break than to remove the
> type-checking tool.
> 
> but that's only my opinion...

I agree with that. Those checks are useful for targets which call host
floating point functions for some instructions. This is ugly, but that's
what is still done for at least x86 for the trigonometrical functions.
The check prevents assigning a float or double value to a softfloat type
without calling the conversion function.

Now, once we make sure that those ugly things are removed, I think the
type-checking could be removed as well.

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH v2 02/20] include/fpu/softfloat: remove USE_SOFTFLOAT_STRUCT_TYPES
  2018-01-09 14:12     ` Aurelien Jarno
@ 2018-01-09 14:14       ` Peter Maydell
  2018-01-09 14:20         ` Laurent Vivier
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Maydell @ 2018-01-09 14:14 UTC (permalink / raw)
  To: Aurelien Jarno
  Cc: Laurent Vivier, Alex Bennée, Richard Henderson, bharata,
	Andrew Dutcher, QEMU Developers

On 9 January 2018 at 14:12, Aurelien Jarno <aurelien@aurel32.net> wrote:
> On 2018-01-09 13:27, Laurent Vivier wrote:
>> Le 09/01/2018 à 13:22, Alex Bennée a écrit :
>> > It's not actively built and when enabled things fail to compile. I'm
>> > not sure the type-checking is really helping here. Seeing as we "own"
>> > our softfloat now, let's remove the cruft.
>>
>> I think it would be better to fix the build break than to remove the
>> type-checking tool.
>>
>> but that's only my opinion...
>
> I agree with that. Those checks are useful for targets which call host
> floating point functions for some instructions. This is ugly, but that's
> what is still done for at least x86 for the trigonometrical functions.
> The check prevents assigning a float or double value to a softfloat type
> without calling the conversion function.

Is gcc's codegen still bad enough that we have to default to not
using the type-checking versions? If so, maybe we could at least
enable the type-checking on an --enable-debug build, so it doesn't
bitrot all the time.

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 02/20] include/fpu/softfloat: remove USE_SOFTFLOAT_STRUCT_TYPES
  2018-01-09 14:14       ` Peter Maydell
@ 2018-01-09 14:20         ` Laurent Vivier
  2018-01-09 14:43           ` Peter Maydell
  2018-01-09 15:25           ` Alex Bennée
  0 siblings, 2 replies; 68+ messages in thread
From: Laurent Vivier @ 2018-01-09 14:20 UTC (permalink / raw)
  To: Peter Maydell, Aurelien Jarno
  Cc: Alex Bennée, Richard Henderson, bharata, Andrew Dutcher,
	QEMU Developers

Le 09/01/2018 à 15:14, Peter Maydell a écrit :
> On 9 January 2018 at 14:12, Aurelien Jarno <aurelien@aurel32.net> wrote:
>> On 2018-01-09 13:27, Laurent Vivier wrote:
>>> Le 09/01/2018 à 13:22, Alex Bennée a écrit :
>>>> It's not actively built and when enabled things fail to compile. I'm
>>>> not sure the type-checking is really helping here. Seeing as we "own"
>>>> our softfloat now, let's remove the cruft.
>>>
>>> I think it would be better to fix the build break than to remove the
>>> type-checking tool.
>>>
>>> but that's only my opinion...
>>
>> I agree with that. Those checks are useful for targets which call host
>> floating point functions for some instructions. This is ugly, but that's
>> what is still done for at least x86 for the trigonometrical functions.
>> The check prevents assigning a float or double value to a softfloat type
>> without calling the conversion function.
> 
> Is gcc's codegen still bad enough that we have to default to not
> using the type-checking versions? If so, maybe we could at least
> enable the type-checking on an --enable-debug build, so it doesn't
> bitrot all the time.

What does "bad enough" mean? For some targets it works fine.

The problem with that is if it is not enabled all the time it becomes
broken really quick...

BTW, if it doesn't break Alex's work, I volunteer to fix the
USE_SOFTFLOAT_STRUCT_TYPES build.

Thanks,
Laurent


* Re: [Qemu-devel] [PATCH v2 02/20] include/fpu/softfloat: remove USE_SOFTFLOAT_STRUCT_TYPES
  2018-01-09 14:20         ` Laurent Vivier
@ 2018-01-09 14:43           ` Peter Maydell
  2018-01-09 16:45             ` Richard Henderson
  2018-01-09 15:25           ` Alex Bennée
  1 sibling, 1 reply; 68+ messages in thread
From: Peter Maydell @ 2018-01-09 14:43 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Aurelien Jarno, Alex Bennée, Richard Henderson, bharata,
	Andrew Dutcher, QEMU Developers

On 9 January 2018 at 14:20, Laurent Vivier <laurent@vivier.eu> wrote:
> Le 09/01/2018 à 15:14, Peter Maydell a écrit :
>> Is gcc's codegen still bad enough that we have to default to not
>> using the type-checking versions? If so, maybe we could at least
>> enable the type-checking on an --enable-debug build, so it doesn't
>> bitrot all the time.
>
> What does "bad enough" mean? For some targets it works fine.

I mean whatever the problem was that made us write the comment
   A sufficiently clever compiler and
   sane ABI should be able to see through these structs.  However
   x86/gcc 3.x seems to struggle a bit, so leave them disabled by default.

In theory the code generated should be no worse than without structs...

> The problem with that is if it is not enabled all the time it becomes
> broken really quick...

Yes, that's why I'd like it at least default-enabled for --enable-debug,
if we can't enable it always.

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 06/20] include/fpu/softfloat: add some float16 constants
  2018-01-09 13:27   ` Philippe Mathieu-Daudé
@ 2018-01-09 15:16     ` Alex Bennée
  0 siblings, 0 replies; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 15:16 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: richard.henderson, peter.maydell, laurent, bharata, andrew,
	qemu-devel, Aurelien Jarno


Philippe Mathieu-Daudé <f4bug@amsat.org> writes:

> Hi Alex,
>
> On 01/09/2018 09:22 AM, Alex Bennée wrote:
>> This defines the same set of common constants for float16 as defined
>> for 32 and 64 bit floats. These are often used by target helper
>> functions. I've also removed constants that are not used by anybody.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>
>> ---
>> v2
>>   - fixup constants, remove unused ones
>> ---
>>  include/fpu/softfloat.h | 8 +++++---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
>> index 8ab5d0df47..e64bf62f3d 100644
>> --- a/include/fpu/softfloat.h
>> +++ b/include/fpu/softfloat.h
>> @@ -368,6 +368,11 @@ static inline float16 float16_set_sign(float16 a, int sign)
>>      return make_float16((float16_val(a) & 0x7fff) | (sign << 15));
>>  }
>>
>> +#define float16_zero make_float16(0)
>> +#define float16_one make_float16(0x3a00)
>
> I still disagree with this one, it seems your bits 9/10 are inverted
> (mantissa msb with biased exponent lsb).
>
>          S EEEEE TTTTTTTTTT
> 0x3a00 = 0 01110 1000000000
>
> having:
> S=0
> E=0b01110=14
> T=0b1000000000=512
>
> I get:
> (-1)^0 * 2^(14-15) * (1 + (2^-10) * 512) = 1 * 0.5 * (1 + 0.5) = 0.75
>
> With 0x3c00 I get:
>
>          S EEEEE TTTTTTTTTT
> 0x3c00 = 0 01111 0000000000
> S=0
> E=0b01111=15
> T=0b0000000000=0
>
> (-1)^0 * 2^(15-15) * (1 + (2^-10) * 0) = 1 * 2^0 * (1 + 0) = 1
>
> The rest is OK.
>
> Changing to "#define float16_one make_float16(0x3c00)":
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

Damn it, I thought I'd fixed them all.

Thanks for the catch.

>
>> +#define float16_half make_float16(0x3800)
>> +#define float16_infinity make_float16(0x7c00)
>> +
>>  /*----------------------------------------------------------------------------
>>  | The pattern for a default generated half-precision NaN.
>>  *----------------------------------------------------------------------------*/
>> @@ -474,8 +479,6 @@ static inline float32 float32_set_sign(float32 a, int sign)
>>
>>  #define float32_zero make_float32(0)
>>  #define float32_one make_float32(0x3f800000)
>> -#define float32_ln2 make_float32(0x3f317218)
>> -#define float32_pi make_float32(0x40490fdb)
>>  #define float32_half make_float32(0x3f000000)
>>  #define float32_infinity make_float32(0x7f800000)
>>
>> @@ -588,7 +591,6 @@ static inline float64 float64_set_sign(float64 a, int sign)
>>  #define float64_zero make_float64(0)
>>  #define float64_one make_float64(0x3ff0000000000000LL)
>>  #define float64_ln2 make_float64(0x3fe62e42fefa39efLL)
>> -#define float64_pi make_float64(0x400921fb54442d18LL)
>>  #define float64_half make_float64(0x3fe0000000000000LL)
>>  #define float64_infinity make_float64(0x7ff0000000000000LL)
>>
>>


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v2 02/20] include/fpu/softfloat: remove USE_SOFTFLOAT_STRUCT_TYPES
  2018-01-09 14:20         ` Laurent Vivier
  2018-01-09 14:43           ` Peter Maydell
@ 2018-01-09 15:25           ` Alex Bennée
  1 sibling, 0 replies; 68+ messages in thread
From: Alex Bennée @ 2018-01-09 15:25 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Peter Maydell, Aurelien Jarno, Richard Henderson, bharata,
	Andrew Dutcher, QEMU Developers


Laurent Vivier <laurent@vivier.eu> writes:

> Le 09/01/2018 à 15:14, Peter Maydell a écrit:
>> On 9 January 2018 at 14:12, Aurelien Jarno <aurelien@aurel32.net> wrote:
>>> On 2018-01-09 13:27, Laurent Vivier wrote:
>>>> Le 09/01/2018 à 13:22, Alex Bennée a écrit :
>>>>> It's not actively built and when enabled things fail to compile. I'm
>>>>> not sure the type-checking is really helping here. Seeing as we "own"
>>>>> our softfloat now, let's remove the cruft.
>>>>
>>>> I think it would be better to fix the build break than to remove the
>>>> type-checking tool.
>>>>
>>>> but that's only my opinion...
>>>
>>> I agree with that. Those checks are useful for targets which call host
>>> floating point functions for some instructions. This is ugly, but that's
>>> what is still done for at least x86 for the trigonometrical functions.
>>> The check prevents assigning a float or double value to a softfloat type
>>> without calling the conversion function.
>>
>> Is gcc's codegen still bad enough that we have to default to not
>> using the type-checking versions? If so, maybe we could at least
>> enable the type-checking on an --enable-debug build, so it doesn't
>> bitrot all the time.
>
> What does "bad enough" mean? For some targets it works fine.
>
> The problem with that is if it is not enabled all the time it becomes
> broken really quick...
>
> BTW, if it doesn't break Alex's work, I volunteer to fix the
> USE_SOFTFLOAT_STRUCT_TYPES build.

Be my guest - I suspect getting that merged would be on a faster path
than the rest of the softfloat re-factoring patch series (unless the
relative silence means everyone is happy with it ;-).

By the way I mentioned in my header mail that the types are included
from bswap.h so it would be nice to move the softfloat types somewhere
where they could be more easily included to avoid triggering a re-build
every time softfloat.h is touched.

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v2 02/20] include/fpu/softfloat: remove USE_SOFTFLOAT_STRUCT_TYPES
  2018-01-09 14:43           ` Peter Maydell
@ 2018-01-09 16:45             ` Richard Henderson
  0 siblings, 0 replies; 68+ messages in thread
From: Richard Henderson @ 2018-01-09 16:45 UTC (permalink / raw)
  To: Peter Maydell, Laurent Vivier
  Cc: Aurelien Jarno, Alex Bennée, bharata, Andrew Dutcher,
	QEMU Developers

On 01/09/2018 06:43 AM, Peter Maydell wrote:
> On 9 January 2018 at 14:20, Laurent Vivier <laurent@vivier.eu> wrote:
>> Le 09/01/2018 à 15:14, Peter Maydell a écrit :
>>> Is gcc's codegen still bad enough that we have to default to not
>>> using the type-checking versions? If so, maybe we could at least
>>> enable the type-checking on an --enable-debug build, so it doesn't
>>> bitrot all the time.
>>
>> What means "bad enough"? for some targets it works fine.
> 
> I mean whatever the problem was that made us write the comment
>    A sufficiently clever compiler and
>    sane ABI should be able to see though these structs.  However
>    x86/gcc 3.x seems to struggle a bit, so leave them disabled by default.

E.g. the i386 ABI is *not* sane in this respect.  Nor is sparcv8plus.  I'd have
to check on the others.

If we enable USE_SOFTFLOAT_STRUCT_TYPES, then we *must* remove the markup for
f32 and f64 from include/exec/helper-head.h, because structures are passed
differently from integers as parameters and/or return values.

I personally do not think USE_SOFTFLOAT_STRUCT_TYPES is worth the headache.


r~


* Re: [Qemu-devel] [PATCH v2 10/20] fpu/softfloat: define decompose structures
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 10/20] fpu/softfloat: define decompose structures Alex Bennée
@ 2018-01-09 17:01   ` Richard Henderson
  2018-01-12 14:22   ` Peter Maydell
  2018-01-12 16:21   ` Philippe Mathieu-Daudé
  2 siblings, 0 replies; 68+ messages in thread
From: Richard Henderson @ 2018-01-09 17:01 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Aurelien Jarno

On 01/09/2018 04:22 AM, Alex Bennée wrote:
> +    float_class_qnan,
> +    float_class_snan,
> +    float_class_dnan,

  /* default nan */

here wouldn't go amiss.

> +    float_class_msnan, /* maybe silenced */


r~


* Re: [Qemu-devel] [PATCH v2 16/20] fpu/softfloat: re-factor float to int/uint
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 16/20] fpu/softfloat: re-factor float to int/uint Alex Bennée
@ 2018-01-09 17:12   ` Richard Henderson
  2018-01-12 16:36   ` Peter Maydell
  2018-01-16 17:06   ` Alex Bennée
  2 siblings, 0 replies; 68+ messages in thread
From: Richard Henderson @ 2018-01-09 17:12 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Aurelien Jarno

On 01/09/2018 04:22 AM, Alex Bennée wrote:
> We share the common int64/uint64_pack_decomposed function across all
> the helpers and simply limit the final result depending on the final
> size.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> 
> --
> v2
>   - apply float_flg_invalid fixes next patch
> ---
>  fpu/softfloat.c         | 1011 +++++++++++------------------------------------
>  include/fpu/softfloat.h |   13 +
>  2 files changed, 235 insertions(+), 789 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


* Re: [Qemu-devel] [PATCH v2 19/20] fpu/softfloat: re-factor minmax
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 19/20] fpu/softfloat: re-factor minmax Alex Bennée
@ 2018-01-09 17:16   ` Richard Henderson
  0 siblings, 0 replies; 68+ messages in thread
From: Richard Henderson @ 2018-01-09 17:16 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Aurelien Jarno

On 01/09/2018 04:22 AM, Alex Bennée wrote:
> Let's do the same re-factor treatment for minmax functions. I still
> use the MACRO trick to expand but now all the checking code is common.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> 
> ---
> v2
>   - minor indentation fix
> ---
>  fpu/softfloat.c         | 239 ++++++++++++++++++++++++++----------------------
>  include/fpu/softfloat.h |   6 ++
>  2 files changed, 134 insertions(+), 111 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


* Re: [Qemu-devel] [PATCH v2 20/20] fpu/softfloat: re-factor compare
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 20/20] fpu/softfloat: re-factor compare Alex Bennée
@ 2018-01-09 17:18   ` Richard Henderson
  0 siblings, 0 replies; 68+ messages in thread
From: Richard Henderson @ 2018-01-09 17:18 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Aurelien Jarno

On 01/09/2018 04:22 AM, Alex Bennée wrote:
> The compare function was already expanded from a macro. I keep the
> macro expansion but move most of the logic into a compare_decomposed.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> 
> ---
> v2
>   - minor re-factor for better inf handling
> ---
>  fpu/softfloat.c         | 134 +++++++++++++++++++++++++++++-------------------
>  include/fpu/softfloat.h |   2 +
>  2 files changed, 82 insertions(+), 54 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


* Re: [Qemu-devel] [PATCH v2 01/20] fpu/softfloat: implement float16_squash_input_denormal
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 01/20] fpu/softfloat: implement float16_squash_input_denormal Alex Bennée
@ 2018-01-12 13:41   ` Peter Maydell
  0 siblings, 0 replies; 68+ messages in thread
From: Peter Maydell @ 2018-01-12 13:41 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno

On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
> This will be required when expanding the MINMAX() macro for 16
> bit/half-precision operations.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 03/20] include/fpu/softfloat: implement float16_abs helper
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 03/20] include/fpu/softfloat: implement float16_abs helper Alex Bennée
@ 2018-01-12 13:42   ` Peter Maydell
  0 siblings, 0 replies; 68+ messages in thread
From: Peter Maydell @ 2018-01-12 13:42 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno

On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
> This will be required when expanding the MINMAX() macro for 16
> bit/half-precision operations.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
>  include/fpu/softfloat.h | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index 52af1412de..cfc615008d 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -347,6 +347,13 @@ static inline int float16_is_zero_or_denormal(float16 a)
>      return (float16_val(a) & 0x7c00) == 0;
>  }
>
> +static inline float16 float16_abs(float16 a)
> +{
> +    /* Note that abs does *not* handle NaN specially, nor does
> +     * it flush denormal inputs to zero.
> +     */
> +    return make_float16(float16_val(a) & 0x7fff);
> +}
>  /*----------------------------------------------------------------------------
>  | The pattern for a default generated half-precision NaN.
>  *----------------------------------------------------------------------------*/
> --
> 2.15.1

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 04/20] include/fpu/softfloat: implement float16_chs helper
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 04/20] include/fpu/softfloat: implement float16_chs helper Alex Bennée
@ 2018-01-12 13:43   ` Peter Maydell
  0 siblings, 0 replies; 68+ messages in thread
From: Peter Maydell @ 2018-01-12 13:43 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno

On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/fpu/softfloat.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index cfc615008d..dc71b01dba 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -354,6 +354,15 @@ static inline float16 float16_abs(float16 a)
>       */
>      return make_float16(float16_val(a) & 0x7fff);
>  }
> +
> +static inline float16 float16_chs(float16 a)
> +{
> +    /* Note that chs does *not* handle NaN specially, nor does
> +     * it flush denormal inputs to zero.
> +     */
> +    return make_float16(float16_val(a) ^ 0x8000);
> +}
> +
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 05/20] include/fpu/softfloat: implement float16_set_sign helper
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 05/20] include/fpu/softfloat: implement float16_set_sign helper Alex Bennée
@ 2018-01-12 13:43   ` Peter Maydell
  0 siblings, 0 replies; 68+ messages in thread
From: Peter Maydell @ 2018-01-12 13:43 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno

On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  include/fpu/softfloat.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index dc71b01dba..8ab5d0df47 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -363,6 +363,11 @@ static inline float16 float16_chs(float16 a)
>      return make_float16(float16_val(a) ^ 0x8000);
>  }
>
> +static inline float16 float16_set_sign(float16 a, int sign)
> +{
> +    return make_float16((float16_val(a) & 0x7fff) | (sign << 15));
> +}
> +
>  /*----------------------------------------------------------------------------
>  | The pattern for a default generated half-precision NaN.
>  *----------------------------------------------------------------------------*/
> --

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 06/20] include/fpu/softfloat: add some float16 constants
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 06/20] include/fpu/softfloat: add some float16 constants Alex Bennée
  2018-01-09 13:27   ` Philippe Mathieu-Daudé
@ 2018-01-12 13:47   ` Peter Maydell
  1 sibling, 0 replies; 68+ messages in thread
From: Peter Maydell @ 2018-01-12 13:47 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno

On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
> This defines the same set of common constants for float 16 as defined
> for 32 and 64 bit floats. These are often used by target helper
> functions. I've also removed constants that are not used by anybody.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

With the fix to the value for float16_one that Philippe suggests:
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 07/20] fpu/softfloat: propagate signalling NaNs in MINMAX
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 07/20] fpu/softfloat: propagate signalling NaNs in MINMAX Alex Bennée
@ 2018-01-12 14:04   ` Peter Maydell
  2018-01-16 11:31     ` Alex Bennée
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Maydell @ 2018-01-12 14:04 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno

On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
> While a comparison between a QNaN and a number will return the number
> it is not the same with a signaling NaN. In this case the SNaN will
> "win" and after potentially raising an exception it will be quietened.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> v2
>   - added return for propagateFloat
> ---
>  fpu/softfloat.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 3a4ab1355f..44c043924e 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -7683,6 +7683,7 @@ int float128_compare_quiet(float128 a, float128 b, float_status *status)
>   * minnum() and maxnum() functions. These are similar to the min()
>   * and max() functions but if one of the arguments is a QNaN and
>   * the other is numerical then the numerical argument is returned.
> + * SNaNs will get quietened before being returned.
>   * minnum() and maxnum correspond to the IEEE 754-2008 minNum()
>   * and maxNum() operations. min() and max() are the typical min/max
>   * semantics provided by many CPUs which predate that specification.
> @@ -7703,11 +7704,14 @@ static inline float ## s float ## s ## _minmax(float ## s a, float ## s b,     \
>      if (float ## s ## _is_any_nan(a) ||                                 \
>          float ## s ## _is_any_nan(b)) {                                 \
>          if (isieee) {                                                   \
> -            if (float ## s ## _is_quiet_nan(a, status) &&               \
> +            if (float ## s ## _is_signaling_nan(a, status) ||           \
> +                float ## s ## _is_signaling_nan(b, status)) {           \
> +                return propagateFloat ## s ## NaN(a, b, status);        \
> +            } else  if (float ## s ## _is_quiet_nan(a, status) &&       \
>                  !float ## s ##_is_any_nan(b)) {                         \
>                  return b;                                               \
>              } else if (float ## s ## _is_quiet_nan(b, status) &&        \
> -                       !float ## s ## _is_any_nan(a)) {                \
> +                       !float ## s ## _is_any_nan(a)) {                 \
>                  return a;                                               \
>              }                                                           \
>          }                                                               \
>          return propagateFloat ## s ## NaN(a, b, status);                \
>      }                                                                   \

[added a couple of extra lines of context at the end for clarity]

Am I misreading this patch? I can't see in what case it makes a
difference to the result. The code change adds an explicit "if
either A or B is an SNaN then return the propagateFloat*NaN() result".
But if either A or B is an SNaN then we won't take either of the
previously existing branches in this if() ("if A is a QNaN and B is
not a NaN" and "if B is a QNaN and A is not a NaN"), and so we'll
end up falling through to the "return propagateFloat*NaN" line after
the end of the "if (isieee) {...}".

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 08/20] fpu/softfloat: improve comments on ARM NaN propagation
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 08/20] fpu/softfloat: improve comments on ARM NaN propagation Alex Bennée
@ 2018-01-12 14:07   ` Peter Maydell
  0 siblings, 0 replies; 68+ messages in thread
From: Peter Maydell @ 2018-01-12 14:07 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno

On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
> Mention the pseudo-code fragment from which this is based and correct
> the spelling of signalling.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  fpu/softfloat-specialize.h | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h
> index de2c5d5702..3d507d8c77 100644
> --- a/fpu/softfloat-specialize.h
> +++ b/fpu/softfloat-specialize.h
> @@ -445,14 +445,15 @@ static float32 commonNaNToFloat32(commonNaNT a, float_status *status)
>
>  #if defined(TARGET_ARM)
>  static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
> -                    flag aIsLargerSignificand)
> +                   flag aIsLargerSignificand)
>  {
> -    /* ARM mandated NaN propagation rules: take the first of:
> -     *  1. A if it is signaling
> -     *  2. B if it is signaling
> +    /* ARM mandated NaN propagation rules (see FPProcessNaNs()), take
> +     * the first of:
> +     *  1. A if it is signalling
> +     *  2. B if it is signalling
>       *  3. A (quiet)
>       *  4. B (quiet)
> -     * A signaling NaN is always quietened before returning it.
> +     * A signalling NaN is always quietened before returning it.
>       */
>      if (aIsSNaN) {
>          return 0;

The correct spelling here is "signaling" with one "l". The IEEE spec
uses that, and the Arm ARM follows it. (I think I mentioned this last
time around too.)

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 09/20] fpu/softfloat: move the extract functions to the top of the file
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 09/20] fpu/softfloat: move the extract functions to the top of the file Alex Bennée
@ 2018-01-12 14:07   ` Peter Maydell
  0 siblings, 0 replies; 68+ messages in thread
From: Peter Maydell @ 2018-01-12 14:07 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno

On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
> This is pure code-motion during re-factoring as the helpers will be
> needed earlier.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 10/20] fpu/softfloat: define decompose structures
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 10/20] fpu/softfloat: define decompose structures Alex Bennée
  2018-01-09 17:01   ` Richard Henderson
@ 2018-01-12 14:22   ` Peter Maydell
  2018-01-12 16:21   ` Philippe Mathieu-Daudé
  2 siblings, 0 replies; 68+ messages in thread
From: Peter Maydell @ 2018-01-12 14:22 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno

On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
> These structures pave the way for generic softfloat helper routines
> that will operate on fully decomposed numbers.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  fpu/softfloat.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 69 insertions(+), 1 deletion(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 59afe81d06..fcba28d3f8 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -83,7 +83,7 @@ this code that are retained.
>   * target-dependent and needs the TARGET_* macros.
>   */
>  #include "qemu/osdep.h"
> -
> +#include "qemu/bitops.h"
>  #include "fpu/softfloat.h"
>
>  /* We only need stdlib for abort() */
> @@ -186,6 +186,74 @@ static inline flag extractFloat64Sign(float64 a)
>      return float64_val(a) >> 63;
>  }
>
> +/*----------------------------------------------------------------------------
> +| Classify a floating point number.
> +*----------------------------------------------------------------------------*/
> +
> +typedef enum {
> +    float_class_unclassified,
> +    float_class_zero,
> +    float_class_normal,
> +    float_class_inf,
> +    float_class_qnan,
> +    float_class_snan,
> +    float_class_dnan,
> +    float_class_msnan, /* maybe silenced */
> +} float_class;
> +
> +/*----------------------------------------------------------------------------
> +| Structure holding all of the decomposed parts of a float.
> +| The exponent is unbiased and the fraction is normalized.
> +*----------------------------------------------------------------------------*/
> +
> +typedef struct {
> +    uint64_t frac   : 64;
> +    int exp         : 32;
> +    float_class cls : 8;
> +    int             : 23;

What is this unnamed 23 bit field for?

> +    bool sign       : 1;

Why are we using a bitfield struct here anyway? uint64_t is 64 bits,
int is 32 bits, we don't care how the float_class enum is represented,
and we're not trying to pack together lots of bools, so it doesn't
matter much if we have a whole byte for the sign.

> +} decomposed_parts;
> +
> +#define DECOMPOSED_BINARY_POINT    (64 - 2)
> +#define DECOMPOSED_IMPLICIT_BIT    (1ull << DECOMPOSED_BINARY_POINT)
> +#define DECOMPOSED_OVERFLOW_BIT    (DECOMPOSED_IMPLICIT_BIT << 1)
> +
> +/* Structure holding all of the relevant parameters for a format.  */
> +typedef struct {
> +    int exp_bias;
> +    int exp_max;
> +    int frac_shift;
> +    uint64_t frac_lsb;
> +    uint64_t frac_lsbm1;

Why the '1' in the field name? Overall I think some brief
comments about what the fields mean would be helpful.

> +    uint64_t round_mask;
> +    uint64_t roundeven_mask;
> +} decomposed_params;
> +
> +#define FRAC_PARAMS(F)                     \
> +    .frac_shift     = F,                   \
> +    .frac_lsb       = 1ull << (F),         \
> +    .frac_lsbm1     = 1ull << ((F) - 1),   \
> +    .round_mask     = (1ull << (F)) - 1,   \
> +    .roundeven_mask = (2ull << (F)) - 1
> +
> +static const decomposed_params float16_params = {
> +    .exp_bias       = 0x0f,
> +    .exp_max        = 0x1f,
> +    FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 10)
> +};
> +
> +static const decomposed_params float32_params = {
> +    .exp_bias       = 0x7f,
> +    .exp_max        = 0xff,
> +    FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 23)
> +};
> +
> +static const decomposed_params float64_params = {
> +    .exp_bias       = 0x3ff,
> +    .exp_max        = 0x7ff,
> +    FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 52)

Maybe we should hide the DECOMPOSED_BINARY_POINT bit inside the macro?
Then the 10/23/52 are just the number of fraction bits in the format.

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 11/20] fpu/softfloat: re-factor add/sub
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 11/20] fpu/softfloat: re-factor add/sub Alex Bennée
@ 2018-01-12 15:57   ` Peter Maydell
  2018-01-12 18:30     ` Richard Henderson
                       ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: Peter Maydell @ 2018-01-12 15:57 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno

On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
> We can now add float16_add/sub and use the common decompose and
> canonicalize functions to have a single implementation for
> float16/32/64 add and sub functions.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  fpu/softfloat.c         | 904 +++++++++++++++++++++++++-----------------------
>  include/fpu/softfloat.h |   4 +
>  2 files changed, 481 insertions(+), 427 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index fcba28d3f8..f89e47e3ef 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -195,7 +195,7 @@ typedef enum {
>      float_class_zero,
>      float_class_normal,
>      float_class_inf,
> -    float_class_qnan,
> +    float_class_qnan,  /* all NaNs from here */

This comment change should be squashed into the previous patch.

>      float_class_snan,
>      float_class_dnan,
>      float_class_msnan, /* maybe silenced */
> @@ -254,6 +254,482 @@ static const decomposed_params float64_params = {
>      FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 52)
>  };
>
> +/* Unpack a float16 to parts, but do not canonicalize.  */
> +static inline decomposed_parts float16_unpack_raw(float16 f)
> +{
> +    return (decomposed_parts){
> +        .cls = float_class_unclassified,
> +        .sign = extract32(f, 15, 1),
> +        .exp = extract32(f, 10, 5),
> +        .frac = extract32(f, 0, 10)

In the previous patch we defined a bunch of structs that
give information about each float format, so it seems a bit
odd to be hardcoding bit numbers here.

> +    };
> +}
> +
> +/* Unpack a float32 to parts, but do not canonicalize.  */
> +static inline decomposed_parts float32_unpack_raw(float32 f)
> +{
> +    return (decomposed_parts){
> +        .cls = float_class_unclassified,
> +        .sign = extract32(f, 31, 1),
> +        .exp = extract32(f, 23, 8),
> +        .frac = extract32(f, 0, 23)
> +    };
> +}
> +
> +/* Unpack a float64 to parts, but do not canonicalize.  */
> +static inline decomposed_parts float64_unpack_raw(float64 f)
> +{
> +    return (decomposed_parts){
> +        .cls = float_class_unclassified,
> +        .sign = extract64(f, 63, 1),
> +        .exp = extract64(f, 52, 11),
> +        .frac = extract64(f, 0, 52),
> +    };
> +}
> +
> +/* Pack a float16 from parts, but do not canonicalize.  */
> +static inline float16 float16_pack_raw(decomposed_parts p)
> +{
> +    uint32_t ret = p.frac;
> +    ret = deposit32(ret, 10, 5, p.exp);
> +    ret = deposit32(ret, 15, 1, p.sign);
> +    return make_float16(ret);
> +}
> +
> +/* Pack a float32 from parts, but do not canonicalize.  */
> +static inline float32 float32_pack_raw(decomposed_parts p)
> +{
> +    uint32_t ret = p.frac;
> +    ret = deposit32(ret, 23, 8, p.exp);
> +    ret = deposit32(ret, 31, 1, p.sign);
> +    return make_float32(ret);
> +}
> +
> +/* Pack a float64 from parts, but do not canonicalize.  */
> +static inline float64 float64_pack_raw(decomposed_parts p)
> +{
> +    uint64_t ret = p.frac;
> +    ret = deposit64(ret, 52, 11, p.exp);
> +    ret = deposit64(ret, 63, 1, p.sign);
> +    return make_float64(ret);
> +}
> +
> +/* Canonicalize EXP and FRAC, setting CLS.  */
> +static decomposed_parts decomposed_canonicalize(decomposed_parts part,
> +                                        const decomposed_params *parm,

If you pick more compact names for your decomposed_params and
decomposed_parts structs, you won't have such awkwardness trying
to format function prototypes. (checkpatch complains that you have
an overlong line somewhere in this patch for this reason.)

In particular "decomposed_params" I think should change -- it's
confusingly similar to decomposed_parts, and it isn't really
a decomposed anything. It's just a collection of useful information
describing the float format. Try 'fmtinfo', maybe?

I see we're passing and returning decomposed_parts structs everywhere
rather than pointers to them. How well does that compile? (I guess
everything ends up inlining...)

> +                                        float_status *status)
> +{
> +    if (part.exp == parm->exp_max) {
> +        if (part.frac == 0) {
> +            part.cls = float_class_inf;
> +        } else {
> +#ifdef NO_SIGNALING_NANS

The old code didn't seem to need to ifdef this; why's the new
code different? (at some point we'll want to make this a runtime
setting so we can support one binary handling CPUs with it both
set and unset, but that is a far future thing we can ignore for now)

> +            part.cls = float_class_qnan;
> +#else
> +            int64_t msb = part.frac << (parm->frac_shift + 2);
> +            if ((msb < 0) == status->snan_bit_is_one) {
> +                part.cls = float_class_snan;
> +            } else {
> +                part.cls = float_class_qnan;
> +            }
> +#endif
> +        }
> +    } else if (part.exp == 0) {
> +        if (likely(part.frac == 0)) {
> +            part.cls = float_class_zero;
> +        } else if (status->flush_inputs_to_zero) {
> +            float_raise(float_flag_input_denormal, status);
> +            part.cls = float_class_zero;
> +            part.frac = 0;
> +        } else {
> +            int shift = clz64(part.frac) - 1;
> +            part.cls = float_class_normal;

This is really confusing. This is a *denormal*, but we're setting
the classification to "normal" ? (It's particularly confusing in
the code that uses the decomposed numbers, because it looks like
"if (a.cls == float_class_normal...)" is handling the normal-number
case and denormals are going to be in a later if branch, but actually
it's dealing with both.)

> +            part.exp = parm->frac_shift - parm->exp_bias - shift + 1;
> +            part.frac <<= shift;
> +        }
> +    } else {
> +        part.cls = float_class_normal;
> +        part.exp -= parm->exp_bias;
> +        part.frac = DECOMPOSED_IMPLICIT_BIT + (part.frac << parm->frac_shift);
> +    }
> +    return part;
> +}
> +
> +/* Round and uncanonicalize a floating-point number by parts.
> +   There are FRAC_SHIFT bits that may require rounding at the bottom
> +   of the fraction; these bits will be removed.  The exponent will be
> +   biased by EXP_BIAS and must be bounded by [EXP_MAX-1, 0].  */

This is an inconsistent multiline comment style to what you use
elsewhere in this patch...

> +static decomposed_parts decomposed_round_canonical(decomposed_parts p,
> +                                                   float_status *s,
> +                                                   const decomposed_params *parm)
> +{
> +    const uint64_t frac_lsbm1 = parm->frac_lsbm1;
> +    const uint64_t round_mask = parm->round_mask;
> +    const uint64_t roundeven_mask = parm->roundeven_mask;
> +    const int exp_max = parm->exp_max;
> +    const int frac_shift = parm->frac_shift;
> +    uint64_t frac, inc;
> +    int exp, flags = 0;
> +    bool overflow_norm;
> +
> +    frac = p.frac;
> +    exp = p.exp;
> +
> +    switch (p.cls) {
> +    case float_class_normal:
> +        switch (s->float_rounding_mode) {
> +        case float_round_nearest_even:
> +            overflow_norm = false;
> +            inc = ((frac & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
> +            break;
> +        case float_round_ties_away:
> +            overflow_norm = false;
> +            inc = frac_lsbm1;
> +            break;
> +        case float_round_to_zero:
> +            overflow_norm = true;
> +            inc = 0;
> +            break;
> +        case float_round_up:
> +            inc = p.sign ? 0 : round_mask;
> +            overflow_norm = p.sign;
> +            break;
> +        case float_round_down:
> +            inc = p.sign ? round_mask : 0;
> +            overflow_norm = !p.sign;
> +            break;
> +        default:
> +            g_assert_not_reached();
> +        }
> +
> +        exp += parm->exp_bias;
> +        if (likely(exp > 0)) {
> +            if (frac & round_mask) {
> +                flags |= float_flag_inexact;
> +                frac += inc;
> +                if (frac & DECOMPOSED_OVERFLOW_BIT) {
> +                    frac >>= 1;
> +                    exp++;
> +                }
> +            }
> +            frac >>= frac_shift;
> +
> +            if (unlikely(exp >= exp_max)) {
> +                flags |= float_flag_overflow | float_flag_inexact;
> +                if (overflow_norm) {
> +                    exp = exp_max - 1;
> +                    frac = -1;
> +                } else {
> +                    p.cls = float_class_inf;
> +                    goto do_inf;
> +                }
> +            }
> +        } else if (s->flush_to_zero) {
> +            flags |= float_flag_output_denormal;
> +            p.cls = float_class_zero;
> +            goto do_zero;
> +        } else {
> +            bool is_tiny = (s->float_detect_tininess
> +                            == float_tininess_before_rounding)
> +                        || (exp < 0)
> +                        || !((frac + inc) & DECOMPOSED_OVERFLOW_BIT);
> +
> +            shift64RightJamming(frac, 1 - exp, &frac);
> +            if (frac & round_mask) {
> +                /* Need to recompute round-to-even.  */
> +                if (s->float_rounding_mode == float_round_nearest_even) {
> +                    inc = ((frac & roundeven_mask) != frac_lsbm1
> +                           ? frac_lsbm1 : 0);
> +                }
> +                flags |= float_flag_inexact;
> +                frac += inc;
> +            }
> +
> +            exp = (frac & DECOMPOSED_IMPLICIT_BIT ? 1 : 0);
> +            frac >>= frac_shift;
> +
> +            if (is_tiny && (flags & float_flag_inexact)) {
> +                flags |= float_flag_underflow;
> +            }
> +            if (exp == 0 && frac == 0) {
> +                p.cls = float_class_zero;
> +            }
> +        }
> +        break;
> +
> +    case float_class_zero:
> +    do_zero:
> +        exp = 0;
> +        frac = 0;
> +        break;
> +
> +    case float_class_inf:
> +    do_inf:
> +        exp = exp_max;
> +        frac = 0;
> +        break;
> +
> +    case float_class_qnan:
> +    case float_class_snan:
> +        exp = exp_max;
> +        break;
> +
> +    default:
> +        g_assert_not_reached();
> +    }
> +
> +    float_raise(flags, s);
> +    p.exp = exp;
> +    p.frac = frac;
> +    return p;
> +}
> +
> +static decomposed_parts float16_unpack_canonical(float16 f, float_status *s)
> +{
> +    return decomposed_canonicalize(float16_unpack_raw(f), &float16_params, s);
> +}
> +
> +static float16 float16_round_pack_canonical(decomposed_parts p, float_status *s)
> +{
> +    switch (p.cls) {
> +    case float_class_dnan:
> +        return float16_default_nan(s);
> +    case float_class_msnan:
> +        return float16_maybe_silence_nan(float16_pack_raw(p), s);

I think you will find that doing the silencing of the NaNs like this
isn't quite the right approach. Specifically, for Arm targets we
currently have a bug in float-to-float conversion from a wider
format to a narrower one when the input is a signaling NaN that we
want to silence, and its non-zero mantissa bits are all at the
less-significant end of the mantissa such that they don't fit into
the narrower format. If you pack the float into a float16 first and
then call maybe_silence_nan() on it you've lost the info about those
low bits which the silence function needs to know to return the
right answer. What you want to do instead is pass the silence_nan
function the decomposed value.

(The effect of this bug is that we return a default NaN, with the
sign bit clear, but the Arm FPConvertNaN pseudocode says that we
should effectively get the default NaN but with the same sign bit
as the input SNaN.)

Given that this is a bug currently in the version we have, we don't
necessarily need to fix it now, but I thought I'd mention it since
the redesign has almost but not quite managed to deliver the right
information to the silencing code to allow us to fix it soon :-)

> +    default:
> +        p = decomposed_round_canonical(p, s, &float16_params);
> +        return float16_pack_raw(p);
> +    }
> +}
> +
> +static decomposed_parts float32_unpack_canonical(float32 f, float_status *s)
> +{
> +    return decomposed_canonicalize(float32_unpack_raw(f), &float32_params, s);
> +}
> +
> +static float32 float32_round_pack_canonical(decomposed_parts p, float_status *s)
> +{
> +    switch (p.cls) {
> +    case float_class_dnan:
> +        return float32_default_nan(s);
> +    case float_class_msnan:
> +        return float32_maybe_silence_nan(float32_pack_raw(p), s);
> +    default:
> +        p = decomposed_round_canonical(p, s, &float32_params);
> +        return float32_pack_raw(p);
> +    }
> +}
> +
> +static decomposed_parts float64_unpack_canonical(float64 f, float_status *s)
> +{
> +    return decomposed_canonicalize(float64_unpack_raw(f), &float64_params, s);
> +}
> +
> +static float64 float64_round_pack_canonical(decomposed_parts p, float_status *s)
> +{
> +    switch (p.cls) {
> +    case float_class_dnan:
> +        return float64_default_nan(s);
> +    case float_class_msnan:
> +        return float64_maybe_silence_nan(float64_pack_raw(p), s);
> +    default:
> +        p = decomposed_round_canonical(p, s, &float64_params);
> +        return float64_pack_raw(p);
> +    }
> +}
> +
> +static decomposed_parts pick_nan_parts(decomposed_parts a, decomposed_parts b,
> +                                       float_status *s)
> +{
> +    if (a.cls == float_class_snan || b.cls == float_class_snan) {
> +        s->float_exception_flags |= float_flag_invalid;
> +    }
> +
> +    if (s->default_nan_mode) {
> +        a.cls = float_class_dnan;
> +    } else {
> +        if (pickNaN(a.cls == float_class_qnan,
> +                    a.cls == float_class_snan,
> +                    b.cls == float_class_qnan,
> +                    b.cls == float_class_snan,
> +                    a.frac > b.frac
> +                    || (a.frac == b.frac && a.sign < b.sign))) {
> +            a = b;
> +        }
> +        a.cls = float_class_msnan;
> +    }
> +    return a;
> +}
> +
> +
> +/*
> + * Returns the result of adding the absolute values of the
> + * floating-point values `a' and `b'. If `subtract' is set, the sum is
> + * negated before being returned. `subtract' is ignored if the result
> + * is a NaN. The addition is performed according to the IEC/IEEE
> + * Standard for Binary Floating-Point Arithmetic.
> + */

This comment doesn't seem to match what the code is doing,
because it says it adds the absolute values of 'a' and 'b',
but the code looks at a_sign and b_sign to decide whether it's
doing an addition or subtraction rather than ignoring the signs
(as you would for absolute arithmetic).

Put another way, this comment has been copied from the old addFloat64Sigs()
and not updated to account for the way the new function includes handling
of subFloat64Sigs().

> +
> +static decomposed_parts add_decomposed(decomposed_parts a, decomposed_parts b,
> +                                       bool subtract, float_status *s)
> +{
> +    bool a_sign = a.sign;
> +    bool b_sign = b.sign ^ subtract;
> +
> +    if (a_sign != b_sign) {
> +        /* Subtraction */
> +
> +        if (a.cls == float_class_normal && b.cls == float_class_normal) {
> +            int a_exp = a.exp;
> +            int b_exp = b.exp;
> +            uint64_t a_frac = a.frac;
> +            uint64_t b_frac = b.frac;

Do we really have to use locals here rather than just using a.frac,
b.frac etc. in place? If we trust the compiler enough to throw
structs in and out of functions and let everything inline, it
ought to be able to handle a uint64_t in a struct local variable.

> +
> +            if (a_exp > b_exp || (a_exp == b_exp && a_frac >= b_frac)) {
> +                shift64RightJamming(b_frac, a_exp - b_exp, &b_frac);
> +                a_frac = a_frac - b_frac;
> +            } else {
> +                shift64RightJamming(a_frac, b_exp - a_exp, &a_frac);
> +                a_frac = b_frac - a_frac;
> +                a_exp = b_exp;
> +                a_sign ^= 1;
> +            }
> +
> +            if (a_frac == 0) {
> +                a.cls = float_class_zero;
> +                a.sign = s->float_rounding_mode == float_round_down;
> +            } else {
> +                int shift = clz64(a_frac) - 1;
> +                a.frac = a_frac << shift;
> +                a.exp = a_exp - shift;
> +                a.sign = a_sign;
> +            }
> +            return a;
> +        }
> +        if (a.cls >= float_class_qnan
> +            ||
> +            b.cls >= float_class_qnan)
> +        {
> +            return pick_nan_parts(a, b, s);
> +        }
> +        if (a.cls == float_class_inf) {
> +            if (b.cls == float_class_inf) {
> +                float_raise(float_flag_invalid, s);
> +                a.cls = float_class_dnan;
> +            }
> +            return a;
> +        }
> +        if (a.cls == float_class_zero && b.cls == float_class_zero) {
> +            a.sign = s->float_rounding_mode == float_round_down;
> +            return a;
> +        }
> +        if (a.cls == float_class_zero || b.cls == float_class_inf) {
> +            b.sign = a_sign ^ 1;
> +            return b;
> +        }
> +        if (b.cls == float_class_zero) {
> +            return a;
> +        }
> +    } else {
> +        /* Addition */
> +        if (a.cls == float_class_normal && b.cls == float_class_normal) {
> +            int a_exp = a.exp;
> +            int b_exp = b.exp;
> +            uint64_t a_frac = a.frac;
> +            uint64_t b_frac = b.frac;
> +
> +            if (a_exp > b_exp) {
> +                shift64RightJamming(b_frac, a_exp - b_exp, &b_frac);
> +            } else if (a_exp < b_exp) {
> +                shift64RightJamming(a_frac, b_exp - a_exp, &a_frac);
> +                a_exp = b_exp;
> +            }
> +            a_frac += b_frac;
> +            if (a_frac & DECOMPOSED_OVERFLOW_BIT) {
> +                a_frac >>= 1;
> +                a_exp += 1;
> +            }
> +
> +            a.exp = a_exp;
> +            a.frac = a_frac;
> +            return a;
> +        }
> +        if (a.cls >= float_class_qnan
> +            ||
> +            b.cls >= float_class_qnan) {

We should have helper functions for "is some kind of NaN" rather than
baking the assumption about the order of the enum values directly
into every function. (Also "float_is_any_nan(a)" is easier to read.)

> +            return pick_nan_parts(a, b, s);
> +        }
> +        if (a.cls == float_class_inf || b.cls == float_class_zero) {
> +            return a;
> +        }
> +        if (b.cls == float_class_inf || a.cls == float_class_zero) {
> +            b.sign = b_sign;
> +            return b;
> +        }
> +    }
> +    g_assert_not_reached();
> +}
> +
> +/*
> + * Returns the result of adding or subtracting the floating-point
> + * values `a' and `b'. The operation is performed according to the
> + * IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> + */
> +
> +float16 float16_add(float16 a, float16 b, float_status *status)
> +{
> +    decomposed_parts pa = float16_unpack_canonical(a, status);
> +    decomposed_parts pb = float16_unpack_canonical(b, status);
> +    decomposed_parts pr = add_decomposed(pa, pb, false, status);
> +
> +    return float16_round_pack_canonical(pr, status);
> +}
> +
> +float32 float32_add(float32 a, float32 b, float_status *status)
> +{
> +    decomposed_parts pa = float32_unpack_canonical(a, status);
> +    decomposed_parts pb = float32_unpack_canonical(b, status);
> +    decomposed_parts pr = add_decomposed(pa, pb, false, status);
> +
> +    return float32_round_pack_canonical(pr, status);
> +}
> +
> +float64 float64_add(float64 a, float64 b, float_status *status)
> +{
> +    decomposed_parts pa = float64_unpack_canonical(a, status);
> +    decomposed_parts pb = float64_unpack_canonical(b, status);
> +    decomposed_parts pr = add_decomposed(pa, pb, false, status);
> +
> +    return float64_round_pack_canonical(pr, status);
> +}
> +
> +float16 float16_sub(float16 a, float16 b, float_status *status)
> +{
> +    decomposed_parts pa = float16_unpack_canonical(a, status);
> +    decomposed_parts pb = float16_unpack_canonical(b, status);
> +    decomposed_parts pr = add_decomposed(pa, pb, true, status);
> +
> +    return float16_round_pack_canonical(pr, status);
> +}
> +
> +float32 float32_sub(float32 a, float32 b, float_status *status)
> +{
> +    decomposed_parts pa = float32_unpack_canonical(a, status);
> +    decomposed_parts pb = float32_unpack_canonical(b, status);
> +    decomposed_parts pr = add_decomposed(pa, pb, true, status);
> +
> +    return float32_round_pack_canonical(pr, status);
> +}
> +
> +float64 float64_sub(float64 a, float64 b, float_status *status)
> +{
> +    decomposed_parts pa = float64_unpack_canonical(a, status);
> +    decomposed_parts pb = float64_unpack_canonical(b, status);
> +    decomposed_parts pr = add_decomposed(pa, pb, true, status);
> +
> +    return float64_round_pack_canonical(pr, status);
> +}

This part is a pretty good advert for the benefits of the refactoring...

I'm not particularly worried about the performance of softfloat,
but out of curiosity have you benchmarked the old vs new?

thanks
-- PMM

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v2 12/20] fpu/softfloat: re-factor mul
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 12/20] fpu/softfloat: re-factor mul Alex Bennée
  2018-01-09 12:43   ` Philippe Mathieu-Daudé
@ 2018-01-12 16:17   ` Peter Maydell
  2018-01-16 10:16     ` Alex Bennée
  1 sibling, 1 reply; 68+ messages in thread
From: Peter Maydell @ 2018-01-12 16:17 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno

On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
> We can now add float16_mul and use the common decompose and
> canonicalize functions to have a single implementation for
> float16/32/64 versions.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  fpu/softfloat.c         | 207 ++++++++++++++++++------------------------------
>  include/fpu/softfloat.h |   1 +
>  2 files changed, 80 insertions(+), 128 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index f89e47e3ef..6e9d4c172c 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -730,6 +730,85 @@ float64 float64_sub(float64 a, float64 b, float_status *status)
>      return float64_round_pack_canonical(pr, status);
>  }
>
> +/*
> + * Returns the result of multiplying the floating-point values `a' and
> + * `b'. The operation is performed according to the IEC/IEEE Standard
> + * for Binary Floating-Point Arithmetic.
> + */
> +
> +static decomposed_parts mul_decomposed(decomposed_parts a, decomposed_parts b,
> +                                       float_status *s)
> +{
> +    bool sign = a.sign ^ b.sign;
> +
> +    if (a.cls == float_class_normal && b.cls == float_class_normal) {
> +        uint64_t hi, lo;
> +        int exp = a.exp + b.exp;
> +
> +        mul64To128(a.frac, b.frac, &hi, &lo);
> +        shift128RightJamming(hi, lo, DECOMPOSED_BINARY_POINT, &hi, &lo);
> +        if (lo & DECOMPOSED_OVERFLOW_BIT) {
> +            shift64RightJamming(lo, 1, &lo);
> +            exp += 1;
> +        }
> +
> +        /* Re-use a */
> +        a.exp = exp;
> +        a.sign = sign;
> +        a.frac = lo;
> +        return a;
> +    }
> +    /* handle all the NaN cases */
> +    if (a.cls >= float_class_qnan || b.cls >= float_class_qnan) {
> +        return pick_nan_parts(a, b, s);
> +    }
> +    /* Inf * Zero == NaN */
> +    if (((1 << a.cls) | (1 << b.cls)) ==
> +        ((1 << float_class_inf) | (1 << float_class_zero))) {

This is kinda confusing...

> +        s->float_exception_flags |= float_flag_invalid;
> +        a.cls = float_class_dnan;
> +        a.sign = sign;
> +        return a;
> +    }
> +    /* Multiply by 0 or Inf */
> +    if (a.cls == float_class_inf || a.cls == float_class_zero) {
> +        a.sign = sign;
> +        return a;
> +    }
> +    if (b.cls == float_class_inf || b.cls == float_class_zero) {
> +        b.sign = sign;
> +        return b;
> +    }
> +    g_assert_not_reached();
> +}

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 10/20] fpu/softfloat: define decompose structures
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 10/20] fpu/softfloat: define decompose structures Alex Bennée
  2018-01-09 17:01   ` Richard Henderson
  2018-01-12 14:22   ` Peter Maydell
@ 2018-01-12 16:21   ` Philippe Mathieu-Daudé
  2018-01-18 13:08     ` Alex Bennée
  2 siblings, 1 reply; 68+ messages in thread
From: Philippe Mathieu-Daudé @ 2018-01-12 16:21 UTC (permalink / raw)
  To: Alex Bennée, richard.henderson, peter.maydell, Francisco Iglesias
  Cc: laurent, bharata, andrew, qemu-devel, Aurelien Jarno

Hi Alex, Richard,

On 01/09/2018 09:22 AM, Alex Bennée wrote:
> These structures pave the way for generic softfloat helper routines
> that will operate on fully decomposed numbers.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  fpu/softfloat.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 69 insertions(+), 1 deletion(-)
> 
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 59afe81d06..fcba28d3f8 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -83,7 +83,7 @@ this code that are retained.
>   * target-dependent and needs the TARGET_* macros.
>   */
>  #include "qemu/osdep.h"
> -
> +#include "qemu/bitops.h"
>  #include "fpu/softfloat.h"
>  
>  /* We only need stdlib for abort() */
> @@ -186,6 +186,74 @@ static inline flag extractFloat64Sign(float64 a)
>      return float64_val(a) >> 63;
>  }
>  
> +/*----------------------------------------------------------------------------
> +| Classify a floating point number.
> +*----------------------------------------------------------------------------*/
> +
> +typedef enum {
> +    float_class_unclassified,
> +    float_class_zero,
> +    float_class_normal,
> +    float_class_inf,
> +    float_class_qnan,
> +    float_class_snan,
> +    float_class_dnan,
> +    float_class_msnan, /* maybe silenced */
> +} float_class;
> +
> +/*----------------------------------------------------------------------------
> +| Structure holding all of the decomposed parts of a float.
> +| The exponent is unbiased and the fraction is normalized.
> +*----------------------------------------------------------------------------*/
> +
> +typedef struct {
> +    uint64_t frac   : 64;

I think this does not work on the LLP64/IL32P64 model.

Should we add a check in ./configure and refuse to build on the
IL32P64 model? This would be safer IMHO.

> +    int exp         : 32;
> +    float_class cls : 8;
> +    int             : 23;
> +    bool sign       : 1;

checking on "ISO/IEC 14882:1998" 9.6 Bit-fields:

Alignment of bit-fields is implementation-defined. Bit-fields are packed
into some addressable allocation unit. [Note: bit-fields straddle
allocation units on some machines and not on others. Bit-fields are
assigned right-to-left on some machines, left-to-right on others. ]

I'd still write it:

      int             :23, sign :1;

> +} decomposed_parts;
> +
> +#define DECOMPOSED_BINARY_POINT    (64 - 2)
> +#define DECOMPOSED_IMPLICIT_BIT    (1ull << DECOMPOSED_BINARY_POINT)
> +#define DECOMPOSED_OVERFLOW_BIT    (DECOMPOSED_IMPLICIT_BIT << 1)
> +
> +/* Structure holding all of the relevant parameters for a format.  */
> +typedef struct {
> +    int exp_bias;
> +    int exp_max;
> +    int frac_shift;
> +    uint64_t frac_lsb;
> +    uint64_t frac_lsbm1;
> +    uint64_t round_mask;
> +    uint64_t roundeven_mask;
> +} decomposed_params;
> +
> +#define FRAC_PARAMS(F)                     \
> +    .frac_shift     = F,                   \
> +    .frac_lsb       = 1ull << (F),         \
> +    .frac_lsbm1     = 1ull << ((F) - 1),   \
> +    .round_mask     = (1ull << (F)) - 1,   \
> +    .roundeven_mask = (2ull << (F)) - 1
> +
> +static const decomposed_params float16_params = {
> +    .exp_bias       = 0x0f,
> +    .exp_max        = 0x1f,
> +    FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 10)
> +};
> +
> +static const decomposed_params float32_params = {
> +    .exp_bias       = 0x7f,
> +    .exp_max        = 0xff,
> +    FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 23)
> +};
> +
> +static const decomposed_params float64_params = {
> +    .exp_bias       = 0x3ff,
> +    .exp_max        = 0x7ff,
> +    FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 52)
> +};
> +
>  /*----------------------------------------------------------------------------
>  | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
>  | and 7, and returns the properly rounded 32-bit integer corresponding to the
> 



* Re: [Qemu-devel] [PATCH v2 13/20] fpu/softfloat: re-factor div
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 13/20] fpu/softfloat: re-factor div Alex Bennée
@ 2018-01-12 16:22   ` Peter Maydell
  2018-01-12 18:35     ` Richard Henderson
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Maydell @ 2018-01-12 16:22 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno

On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
> We can now add float16_div and use the common decompose and
> canonicalize functions to have a single implementation for
> float16/32/64 versions.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  fpu/softfloat-macros.h  |  44 +++++++++
>  fpu/softfloat.c         | 235 ++++++++++++++++++------------------------------
>  include/fpu/softfloat.h |   1 +
>  3 files changed, 134 insertions(+), 146 deletions(-)
>
> diff --git a/fpu/softfloat-macros.h b/fpu/softfloat-macros.h
> index 9cc6158cb4..980be2c051 100644
> --- a/fpu/softfloat-macros.h
> +++ b/fpu/softfloat-macros.h
> @@ -625,6 +625,50 @@ static uint64_t estimateDiv128To64( uint64_t a0, uint64_t a1, uint64_t b )
>
>  }
>
> +/* Nicked from gmp longlong.h __udiv_qrnnd */

Can we have a copyright/license attribution for code we nick
from other projects, please? :-)

> +static uint64_t div128To64(uint64_t n0, uint64_t n1, uint64_t d)
> +{
> +    uint64_t d0, d1, q0, q1, r1, r0, m;
> +
> +    d0 = (uint32_t)d;
> +    d1 = d >> 32;
> +
> +    r1 = n1 % d1;
> +    q1 = n1 / d1;
> +    m = q1 * d0;
> +    r1 = (r1 << 32) | (n0 >> 32);
> +    if (r1 < m) {
> +        q1 -= 1;
> +        r1 += d;
> +        if (r1 >= d) {
> +            if (r1 < m) {
> +                q1 -= 1;
> +                r1 += d;
> +            }
> +        }
> +    }
> +    r1 -= m;
> +
> +    r0 = r1 % d1;
> +    q0 = r1 / d1;
> +    m = q0 * d0;
> +    r0 = (r0 << 32) | (uint32_t)n0;
> +    if (r0 < m) {
> +        q0 -= 1;
> +        r0 += d;
> +        if (r0 >= d) {
> +            if (r0 < m) {
> +                q0 -= 1;
> +                r0 += d;
> +            }
> +        }
> +    }
> +    r0 -= m;
> +
> +    /* Return remainder in LSB */
> +    return (q1 << 32) | q0 | (r0 != 0);
> +}

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 18/20] fpu/softfloat: re-factor scalbn
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 18/20] fpu/softfloat: re-factor scalbn Alex Bennée
@ 2018-01-12 16:31   ` Peter Maydell
  2018-01-24 12:03     ` Alex Bennée
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Maydell @ 2018-01-12 16:31 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno

On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
> This is one of the simpler manipulations you could make to a floating
> point number.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  fpu/softfloat.c         | 104 +++++++++++++++---------------------------------
>  include/fpu/softfloat.h |   1 +
>  2 files changed, 32 insertions(+), 73 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index bb68d77f72..3647f6ca03 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -1663,6 +1663,37 @@ float64 uint16_to_float64(uint16_t a, float_status *status)
>      return uint64_to_float64(a, status);
>  }
>
> +/* Multiply A by 2 raised to the power N.  */
> +static decomposed_parts scalbn_decomposed(decomposed_parts a, int n,
> +                                          float_status *s)
> +{
> +    if (a.cls == float_class_normal) {
> +        a.exp += n;
> +    }
> +    return a;
> +}
> +
> +float16 float16_scalbn(float16 a, int n, float_status *status)
> +{
> +    decomposed_parts pa = float16_unpack_canonical(a, status);
> +    decomposed_parts pr = scalbn_decomposed(pa, n, status);
> +    return float16_round_pack_canonical(pr, status);
> +}
> +
> +float32 float32_scalbn(float32 a, int n, float_status *status)
> +{
> +    decomposed_parts pa = float32_unpack_canonical(a, status);
> +    decomposed_parts pr = scalbn_decomposed(pa, n, status);
> +    return float32_round_pack_canonical(pr, status);
> +}
> +
> +float64 float64_scalbn(float64 a, int n, float_status *status)
> +{
> +    decomposed_parts pa = float64_unpack_canonical(a, status);
> +    decomposed_parts pr = scalbn_decomposed(pa, n, status);
> +    return float64_round_pack_canonical(pr, status);
> +}

The old code used propagateFloat32NaN(a, a, status) if the
input was a NaN, to cause us to raise the invalid flag,
maybe return a default NaN, maybe silence the NaN. I can't
see where the new code is doing this?

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 16/20] fpu/softfloat: re-factor float to int/uint
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 16/20] fpu/softfloat: re-factor float to int/uint Alex Bennée
  2018-01-09 17:12   ` Richard Henderson
@ 2018-01-12 16:36   ` Peter Maydell
  2018-01-16 17:06   ` Alex Bennée
  2 siblings, 0 replies; 68+ messages in thread
From: Peter Maydell @ 2018-01-12 16:36 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno

On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
> We share the common int64/uint64_pack_decomposed function across all
> the helpers and simply limit the final result depending on the final
> size.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
> --
> v2
>   - apply float_flg_invalid fixes next patch
> ---
>  fpu/softfloat.c         | 1011 +++++++++++------------------------------------
>  include/fpu/softfloat.h |   13 +
>  2 files changed, 235 insertions(+), 789 deletions(-)
>

> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point value
> -| `a' to the 64-bit two's complement integer format.  The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic---which means in particular that the conversion is rounded
> -| according to the current rounding mode.  If `a' is a NaN, the largest
> -| positive integer is returned.  Otherwise, if the conversion overflows, the
> -| largest integer with the same sign as `a' is returned.
> +| Standard for Binary Floating-Point Arithmetic.
>  *----------------------------------------------------------------------------*/
>
> -int64_t float64_to_int64(float64 a, float_status *status)
> +int float32_lt_quiet(float32 a, float32 b, float_status *status)
>  {
> -    flag aSign;
> -    int aExp;
> -    int shiftCount;
> -    uint64_t aSig, aSigExtra;
> -    a = float64_squash_input_denormal(a, status);
> +    flag aSign, bSign;
> +    uint32_t av, bv;
> +    a = float32_squash_input_denormal(a, status);
> +    b = float32_squash_input_denormal(b, status);
>
> -    aSig = extractFloat64Frac( a );
> -    aExp = extractFloat64Exp( a );
> -    aSign = extractFloat64Sign( a );
> -    if ( aExp ) aSig |= LIT64( 0x0010000000000000 );
> -    shiftCount = 0x433 - aExp;
> -    if ( shiftCount <= 0 ) {
> -        if ( 0x43E < aExp ) {
> +    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )
> +         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )
> +       ) {
> +        if (float32_is_signaling_nan(a, status)
> +         || float32_is_signaling_nan(b, status)) {


Is this actually you changing existing code, or is it just that
diff has got confused? If the latter, perhaps whatever the
"think a bit harder" flag to diff is might make the patch
easier to read?

thanks
-- PMM

* Re: [Qemu-devel] [PATCH v2 11/20] fpu/softfloat: re-factor add/sub
  2018-01-12 15:57   ` Peter Maydell
@ 2018-01-12 18:30     ` Richard Henderson
  2018-01-18 16:43     ` Alex Bennée
  2018-01-23 20:05     ` Alex Bennée
  2 siblings, 0 replies; 68+ messages in thread
From: Richard Henderson @ 2018-01-12 18:30 UTC (permalink / raw)
  To: Peter Maydell, Alex Bennée
  Cc: Laurent Vivier, bharata, Andrew Dutcher, QEMU Developers, Aurelien Jarno

On 01/12/2018 07:57 AM, Peter Maydell wrote:
> I see we're passing and returning decomposed_parts structs everywhere
> rather than pointers to them. How well does that compile? (I guess
> everything ends up inlining...)

For the x86_64 abi at least, the structure (as defined with bitfields) is 128
bits and is passed and returned in registers.

I believe the same to be true for the aarch64 abi.  I'd have to do some
research to determine what happens for the other 64-bit hosts.


r~

* Re: [Qemu-devel] [PATCH v2 13/20] fpu/softfloat: re-factor div
  2018-01-12 16:22   ` Peter Maydell
@ 2018-01-12 18:35     ` Richard Henderson
  0 siblings, 0 replies; 68+ messages in thread
From: Richard Henderson @ 2018-01-12 18:35 UTC (permalink / raw)
  To: Peter Maydell, Alex Bennée
  Cc: Laurent Vivier, bharata, Andrew Dutcher, QEMU Developers, Aurelien Jarno

On 01/12/2018 08:22 AM, Peter Maydell wrote:
>> +/* Nicked from gmp longlong.h __udiv_qrnnd */
> Can we have a copyright/license attribution for code we nick
> from other projects, please? :-)
> 

LGPL 2.1 in this case.  ;-)


r~

* Re: [Qemu-devel] [PATCH v2 12/20] fpu/softfloat: re-factor mul
  2018-01-12 16:17   ` Peter Maydell
@ 2018-01-16 10:16     ` Alex Bennée
  0 siblings, 0 replies; 68+ messages in thread
From: Alex Bennée @ 2018-01-16 10:16 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno


Peter Maydell <peter.maydell@linaro.org> writes:

> On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
>> We can now add float16_mul and use the common decompose and
>> canonicalize functions to have a single implementation for
>> float16/32/64 versions.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  fpu/softfloat.c         | 207 ++++++++++++++++++------------------------------
>>  include/fpu/softfloat.h |   1 +
>>  2 files changed, 80 insertions(+), 128 deletions(-)
>>
>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>> index f89e47e3ef..6e9d4c172c 100644
>> --- a/fpu/softfloat.c
>> +++ b/fpu/softfloat.c
>> @@ -730,6 +730,85 @@ float64 float64_sub(float64 a, float64 b, float_status *status)
>>      return float64_round_pack_canonical(pr, status);
>>  }
>>
>> +/*
>> + * Returns the result of multiplying the floating-point values `a' and
>> + * `b'. The operation is performed according to the IEC/IEEE Standard
>> + * for Binary Floating-Point Arithmetic.
>> + */
>> +
>> +static decomposed_parts mul_decomposed(decomposed_parts a, decomposed_parts b,
>> +                                       float_status *s)
>> +{
>> +    bool sign = a.sign ^ b.sign;
>> +
>> +    if (a.cls == float_class_normal && b.cls == float_class_normal) {
>> +        uint64_t hi, lo;
>> +        int exp = a.exp + b.exp;
>> +
>> +        mul64To128(a.frac, b.frac, &hi, &lo);
>> +        shift128RightJamming(hi, lo, DECOMPOSED_BINARY_POINT, &hi, &lo);
>> +        if (lo & DECOMPOSED_OVERFLOW_BIT) {
>> +            shift64RightJamming(lo, 1, &lo);
>> +            exp += 1;
>> +        }
>> +
>> +        /* Re-use a */
>> +        a.exp = exp;
>> +        a.sign = sign;
>> +        a.frac = lo;
>> +        return a;
>> +    }
>> +    /* handle all the NaN cases */
>> +    if (a.cls >= float_class_qnan || b.cls >= float_class_qnan) {
>> +        return pick_nan_parts(a, b, s);
>> +    }
>> +    /* Inf * Zero == NaN */
>> +    if (((1 << a.cls) | (1 << b.cls)) ==
>> +        ((1 << float_class_inf) | (1 << float_class_zero))) {
>
> This is kinda confusing...

Yeah it's a bit of a shortcut to:

  if ((a.cls == float_class_inf && b.cls == float_class_zero)
     ||
      (a.cls == float_class_zero && b.cls == float_class_inf))

Would you prefer it long hand or tidied away to a helper?

  if (cls_combination(a, b, float_class_inf, float_class_zero))

?

>
>> +        s->float_exception_flags |= float_flag_invalid;
>> +        a.cls = float_class_dnan;
>> +        a.sign = sign;
>> +        return a;
>> +    }
>> +    /* Multiply by 0 or Inf */
>> +    if (a.cls == float_class_inf || a.cls == float_class_zero) {
>> +        a.sign = sign;
>> +        return a;
>> +    }
>> +    if (b.cls == float_class_inf || b.cls == float_class_zero) {
>> +        b.sign = sign;
>> +        return b;
>> +    }
>> +    g_assert_not_reached();
>> +}
>
> thanks
> -- PMM


--
Alex Bennée

* Re: [Qemu-devel] [PATCH v2 07/20] fpu/softfloat: propagate signalling NaNs in MINMAX
  2018-01-12 14:04   ` Peter Maydell
@ 2018-01-16 11:31     ` Alex Bennée
  2018-01-16 11:53       ` Alex Bennée
  0 siblings, 1 reply; 68+ messages in thread
From: Alex Bennée @ 2018-01-16 11:31 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno


Peter Maydell <peter.maydell@linaro.org> writes:

> On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
>> While a comparison between a QNaN and a number will return the number
>> it is not the same with a signaling NaN. In this case the SNaN will
>> "win" and after potentially raising an exception it will be quietened.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>> v2
>>   - added return for propagateFloat
>> ---
>>  fpu/softfloat.c | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>> index 3a4ab1355f..44c043924e 100644
>> --- a/fpu/softfloat.c
>> +++ b/fpu/softfloat.c
>> @@ -7683,6 +7683,7 @@ int float128_compare_quiet(float128 a, float128 b, float_status *status)
>>   * minnum() and maxnum() functions. These are similar to the min()
>>   * and max() functions but if one of the arguments is a QNaN and
>>   * the other is numerical then the numerical argument is returned.
>> + * SNaNs will get quietened before being returned.
>>   * minnum() and maxnum correspond to the IEEE 754-2008 minNum()
>>   * and maxNum() operations. min() and max() are the typical min/max
>>   * semantics provided by many CPUs which predate that specification.
>> @@ -7703,11 +7704,14 @@ static inline float ## s float ## s ## _minmax(float ## s a, float ## s b,     \
>>      if (float ## s ## _is_any_nan(a) ||                                 \
>>          float ## s ## _is_any_nan(b)) {                                 \
>>          if (isieee) {                                                   \
>> -            if (float ## s ## _is_quiet_nan(a, status) &&               \
>> +            if (float ## s ## _is_signaling_nan(a, status) ||           \
>> +                float ## s ## _is_signaling_nan(b, status)) {           \
>> +                return propagateFloat ## s ## NaN(a, b, status);        \
>> +            } else  if (float ## s ## _is_quiet_nan(a, status) &&       \
>>                  !float ## s ##_is_any_nan(b)) {                         \
>>                  return b;                                               \
>>              } else if (float ## s ## _is_quiet_nan(b, status) &&        \
>> -                       !float ## s ## _is_any_nan(a)) {                \
>> +                       !float ## s ## _is_any_nan(a)) {                 \
>>                  return a;                                               \
>>              }                                                           \
>>          }                                                               \
>>          return propagateFloat ## s ## NaN(a, b, status);                \
>>      }                                                                   \
>
> [added a couple of extra lines of context at the end for clarity]
>
> Am I misreading this patch? I can't see in what case it makes a
> difference to the result. The code change adds an explicit "if
> either A or B is an SNaN then return the propagateFloat*NaN() result".
> But if either A or B is an SNaN then we won't take either of the
> previously existing branches in this if() ("if A is a QNaN and B is
> not a NaN" and "if B is a QNaN and A is not a NaN"), and so we'll
> end up falling through to the "return propagateFloat*NaN" line after
> the end of the "is (ieee) {...}".

I see your point. However, the bug shows up if we don't check for
signalling NaNs first, which probably means the xxx_is_quiet_nan()
check is broken and is reporting signalling NaNs as quiet.

The logic is correct in the decomposed function: since we have many
NaN types to check against, we check for the signalling NaNs first.

>
> thanks
> -- PMM


--
Alex Bennée

* Re: [Qemu-devel] [PATCH v2 07/20] fpu/softfloat: propagate signalling NaNs in MINMAX
  2018-01-16 11:31     ` Alex Bennée
@ 2018-01-16 11:53       ` Alex Bennée
  0 siblings, 0 replies; 68+ messages in thread
From: Alex Bennée @ 2018-01-16 11:53 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno


Alex Bennée <alex.bennee@linaro.org> writes:

> Peter Maydell <peter.maydell@linaro.org> writes:
>
>> On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
>>> While a comparison between a QNaN and a number will return the number
>>> it is not the same with a signaling NaN. In this case the SNaN will
>>> "win" and after potentially raising an exception it will be quietened.
>>>
>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>>> ---
>>> v2
>>>   - added return for propageFloat
>>> ---
>>>  fpu/softfloat.c | 8 ++++++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>>> index 3a4ab1355f..44c043924e 100644
>>> --- a/fpu/softfloat.c
>>> +++ b/fpu/softfloat.c
>>> @@ -7683,6 +7683,7 @@ int float128_compare_quiet(float128 a, float128 b, float_status *status)
>>>   * minnum() and maxnum() functions. These are similar to the min()
>>>   * and max() functions but if one of the arguments is a QNaN and
>>>   * the other is numerical then the numerical argument is returned.
>>> + * SNaNs will get quietened before being returned.
>>>   * minnum() and maxnum correspond to the IEEE 754-2008 minNum()
>>>   * and maxNum() operations. min() and max() are the typical min/max
>>>   * semantics provided by many CPUs which predate that specification.
>>> @@ -7703,11 +7704,14 @@ static inline float ## s float ## s ## _minmax(float ## s a, float ## s b,     \
>>>      if (float ## s ## _is_any_nan(a) ||                                 \
>>>          float ## s ## _is_any_nan(b)) {                                 \
>>>          if (isieee) {                                                   \
>>> -            if (float ## s ## _is_quiet_nan(a, status) &&               \
>>> +            if (float ## s ## _is_signaling_nan(a, status) ||           \
>>> +                float ## s ## _is_signaling_nan(b, status)) {           \
>>> +                return propagateFloat ## s ## NaN(a, b, status);        \
>>> +            } else  if (float ## s ## _is_quiet_nan(a, status) &&       \
>>>                  !float ## s ##_is_any_nan(b)) {                         \
>>>                  return b;                                               \
>>>              } else if (float ## s ## _is_quiet_nan(b, status) &&        \
>>> -                       !float ## s ## _is_any_nan(a)) {                \
>>> +                       !float ## s ## _is_any_nan(a)) {                 \
>>>                  return a;                                               \
>>>              }                                                           \
>>>          }                                                               \
>>>          return propagateFloat ## s ## NaN(a, b, status);                \
>>>      }                                                                   \
>>
>> [added a couple of extra lines of context at the end for clarity]
>>
>> Am I misreading this patch? I can't see in what case it makes a
>> difference to the result. The code change adds an explicit "if
>> either A or B is an SNaN then return the propagateFloat*NaN() result".
>> But if either A or B is an SNaN then we won't take either of the
>> previously existing branches in this if() ("if A is a QNaN and B is
>> not a NaN" and "if B is a QNaN and A is not a NaN"), and so we'll
>> end up falling through to the "return propagateFloat*NaN" line after
>> the end of the "is (ieee) {...}".
>
> I see your point. However, the bug would be there if we didn't check for
> signalling NaNs first, which probably means the xxx_is_quiet_nan() check
> is broken and reporting signalling NaNs as quiet.
>
> The logic is correct in the decomposed function as we have many NaN
> types to check against so we check for the signalling NaNs first.

So maybe the helper functions need to be clearer:

  /*----------------------------------------------------------------------------
  | Returns 1 if the half-precision floating-point value `a' is a quiet
  | NaN; otherwise returns 0.
  *----------------------------------------------------------------------------*/

  int float16_is_quiet_nan(float16 a_, float_status *status)
  {
      if (float16_is_any_nan(a_)) {
          uint16_t sbit = float16_val(a_) & (1 << 9);
          if (status->snan_bit_is_one) {
              return sbit ? 0 : 1;
          } else {
              return sbit ? 1 : 0;
          }
      }
      return 0;
  }

  /*----------------------------------------------------------------------------
  | Returns 1 if the half-precision floating-point value `a' is a signaling
  | NaN; otherwise returns 0.
  *----------------------------------------------------------------------------*/

  int float16_is_signaling_nan(float16 a_, float_status *status)
  {
      if (float16_is_any_nan(a_)) {
          uint16_t sbit = float16_val(a_) & (1 << 9);
          if (status->snan_bit_is_one) {
              return sbit ? 1 : 0;
          } else {
              return sbit ? 0 : 1;
          }
      }
      return 0;
  }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v2 16/20] fpu/softfloat: re-factor float to int/uint
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 16/20] fpu/softfloat: re-factor float to int/uint Alex Bennée
  2018-01-09 17:12   ` Richard Henderson
  2018-01-12 16:36   ` Peter Maydell
@ 2018-01-16 17:06   ` Alex Bennée
  2 siblings, 0 replies; 68+ messages in thread
From: Alex Bennée @ 2018-01-16 17:06 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew
  Cc: qemu-devel, Aurelien Jarno


Alex Bennée <alex.bennee@linaro.org> writes:

> We share the common int64/uint64_pack_decomposed function across all
> the helpers and simply limit the final result depending on the final
> size.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
> --
> v2
>   - apply float_flg_invalid fixes next patch
> ---
>  fpu/softfloat.c         | 1011 +++++++++++------------------------------------
>  include/fpu/softfloat.h |   13 +
>  2 files changed, 235 insertions(+), 789 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index edc35300d1..514f43c065 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -1312,6 +1312,194 @@ float64 float64_trunc_to_int(float64 a, float_status *s)
>      return float64_round_pack_canonical(pr, s);
>  }
>
> +/*----------------------------------------------------------------------------
> +| Returns the result of converting the floating-point value
> +| `a' to the two's complement integer format.  The conversion is
> +| performed according to the IEC/IEEE Standard for Binary Floating-Point
> +| Arithmetic---which means in particular that the conversion is rounded
> +| according to the current rounding mode.  If `a' is a NaN, the largest
> +| positive integer is returned.  Otherwise, if the conversion overflows, the
> +| largest integer with the same sign as `a' is returned.
> +*----------------------------------------------------------------------------*/
> +
> +static int64_t int64_pack_decomposed(decomposed_parts p, float_status *s)
> +{
> +    uint64_t r;
> +
> +    switch (p.cls) {
> +    case float_class_snan:
> +    case float_class_qnan:
> +        return INT64_MAX;
> +    case float_class_inf:
> +        return p.sign ? INT64_MIN : INT64_MAX;
> +    case float_class_zero:
> +        return 0;
> +    case float_class_normal:
> +        if (p.exp < DECOMPOSED_BINARY_POINT) {
> +            r = p.frac >> (DECOMPOSED_BINARY_POINT - p.exp);
> +        } else if (p.exp < 64) {
> +            r = p.frac << (p.exp - DECOMPOSED_BINARY_POINT);
> +        } else {
> +            s->float_exception_flags |= float_flag_invalid;
> +            r = UINT64_MAX;
> +        }
> +        if (p.sign) {
> +            return r < - (uint64_t) INT64_MIN ? -r : INT64_MIN;
> +        } else {
> +            return r < INT64_MAX ? r : INT64_MAX;
> +        }
> +    default:
> +        g_assert_not_reached();
> +    }
> +}
> +
> +static int16_t int16_pack_decomposed(decomposed_parts p, float_status *s)
> +{
> +    int64_t r = int64_pack_decomposed(p, s);
> +    if (r < INT16_MIN) {
> +        s->float_exception_flags |= float_flag_invalid;
> +        return INT16_MIN;
> +    } else if (r > INT16_MAX) {
> +        s->float_exception_flags |= float_flag_invalid;
> +        return INT16_MAX;
> +    }
> +    return r;
> +}
> +
> +static int32_t int32_pack_decomposed(decomposed_parts p, float_status *s)
> +{
> +    int64_t r = int64_pack_decomposed(p, s);
> +    if (r < INT32_MIN) {
> +        s->float_exception_flags |= float_flag_invalid;
> +        return INT32_MIN;
> +    } else if (r > INT32_MAX) {
> +        s->float_exception_flags |= float_flag_invalid;
> +        return INT32_MAX;
> +    }
> +    return r;
> +}
> +
> +#define FLOAT_TO_INT(fsz, isz) \
> +int ## isz ## _t float ## fsz ## _to_int ## isz(float ## fsz a, float_status *s) \
> +{                                                                       \
> +    decomposed_parts pa = float ## fsz ## _unpack_canonical(a, s);      \
> +    decomposed_parts pr = round_decomposed(pa,
> s->float_rounding_mode, s); \

Note to self: round_decomposed may set inexact here, which may be
overridden by invalid if the number is out of range.

> +    return int ## isz ## _pack_decomposed(pr, s);                       \
> +}                                                                       \
> +                                                                        \
> +int ## isz ## _t float ## fsz ## _to_int ## isz ## _round_to_zero       \
> + (float ## fsz a, float_status *s)                                      \
> +{                                                                       \
> +    decomposed_parts pa = float ## fsz ## _unpack_canonical(a, s);      \
> +    decomposed_parts pr = round_decomposed(pa, float_round_to_zero, s); \
> +    return int ## isz ## _pack_decomposed(pr, s);                       \
> +}
> +
> +FLOAT_TO_INT(16, 16)
> +FLOAT_TO_INT(16, 32)
> +FLOAT_TO_INT(16, 64)
> +
> +FLOAT_TO_INT(32, 16)
> +FLOAT_TO_INT(32, 32)
> +FLOAT_TO_INT(32, 64)
> +
> +FLOAT_TO_INT(64, 16)
> +FLOAT_TO_INT(64, 32)
> +FLOAT_TO_INT(64, 64)
> +
> +#undef FLOAT_TO_INT
> +
> +/*
> + *  Returns the result of converting the floating-point value `a' to
> + *  the unsigned integer format. The conversion is performed according
> + *  to the IEC/IEEE Standard for Binary Floating-Point
> + *  Arithmetic---which means in particular that the conversion is
> + *  rounded according to the current rounding mode. If `a' is a NaN,
> + *  the largest unsigned integer is returned. Otherwise, if the
> + *  conversion overflows, the largest unsigned integer is returned. If
> + *  the 'a' is negative, the result is rounded and zero is returned;
> + *  values that do not round to zero will raise the inexact exception
> + *  flag.
> + */
> +
> +static uint64_t uint64_pack_decomposed(decomposed_parts p, float_status *s)
> +{
> +    switch (p.cls) {
> +    case float_class_snan:
> +    case float_class_qnan:
> +        return UINT64_MAX;
> +    case float_class_inf:
> +        return p.sign ? 0 : UINT64_MAX;
> +    case float_class_zero:
> +        return 0;
> +    case float_class_normal:
> +        if (p.sign) {
> +            s->float_exception_flags |= float_flag_invalid;
> +            return 0;
> +        }
> +        if (p.exp < DECOMPOSED_BINARY_POINT) {
> +            return p.frac >> (DECOMPOSED_BINARY_POINT - p.exp);
> +        } else if (p.exp < 64) {
> +            return p.frac << (p.exp - DECOMPOSED_BINARY_POINT);
> +        } else {
> +            s->float_exception_flags |= float_flag_invalid;
> +            return UINT64_MAX;
> +        }
> +    default:
> +        g_assert_not_reached();
> +    }
> +}
> +
> +static uint16_t uint16_pack_decomposed(decomposed_parts p, float_status *s)
> +{
> +    uint64_t r = uint64_pack_decomposed(p, s);
> +    if (r > UINT16_MAX) {
> +        s->float_exception_flags |= float_flag_invalid;
> +        r = UINT16_MAX;
> +    }
> +    return r;
> +}
> +
> +static uint32_t uint32_pack_decomposed(decomposed_parts p, float_status *s)
> +{
> +    uint64_t r = uint64_pack_decomposed(p, s);
> +    if (r > UINT32_MAX) {
> +        s->float_exception_flags |= float_flag_invalid;
> +        r = UINT32_MAX;
> +    }
> +    return r;
> +}
> +
> +#define FLOAT_TO_UINT(fsz, isz) \
> +uint ## isz ## _t float ## fsz ## _to_uint ## isz(float ## fsz a, float_status *s) \
> +{                                                                       \
> +    decomposed_parts pa = float ## fsz ## _unpack_canonical(a, s);      \
> +    decomposed_parts pr = round_decomposed(pa, s->float_rounding_mode, s); \
> +    return uint ## isz ## _pack_decomposed(pr, s);                      \
> +}                                                                       \
> +                                                                        \
> +uint ## isz ## _t float ## fsz ## _to_uint ## isz ## _round_to_zero     \
> + (float ## fsz a, float_status *s)                                      \
> +{                                                                       \
> +    decomposed_parts pa = float ## fsz ## _unpack_canonical(a, s);      \
> +    decomposed_parts pr = round_decomposed(pa, float_round_to_zero, s); \
> +    return uint ## isz ## _pack_decomposed(pr, s);                      \
> +}
> +
> +FLOAT_TO_UINT(16, 16)
> +FLOAT_TO_UINT(16, 32)
> +FLOAT_TO_UINT(16, 64)
> +
> +FLOAT_TO_UINT(32, 16)
> +FLOAT_TO_UINT(32, 32)
> +FLOAT_TO_UINT(32, 64)
> +
> +FLOAT_TO_UINT(64, 16)
> +FLOAT_TO_UINT(64, 32)
> +FLOAT_TO_UINT(64, 64)
> +
> +#undef FLOAT_TO_UINT
> +
>  /*----------------------------------------------------------------------------
>  | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
>  | and 7, and returns the properly rounded 32-bit integer corresponding to the
> @@ -2663,288 +2851,8 @@ float128 uint64_to_float128(uint64_t a, float_status *status)
>      return normalizeRoundAndPackFloat128(0, 0x406E, a, 0, status);
>  }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the single-precision floating-point value
> -| `a' to the 32-bit two's complement integer format.  The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic---which means in particular that the conversion is rounded
> -| according to the current rounding mode.  If `a' is a NaN, the largest
> -| positive integer is returned.  Otherwise, if the conversion overflows, the
> -| largest integer with the same sign as `a' is returned.
> -*----------------------------------------------------------------------------*/
>
> -int32_t float32_to_int32(float32 a, float_status *status)
> -{
> -    flag aSign;
> -    int aExp;
> -    int shiftCount;
> -    uint32_t aSig;
> -    uint64_t aSig64;
> -
> -    a = float32_squash_input_denormal(a, status);
> -    aSig = extractFloat32Frac( a );
> -    aExp = extractFloat32Exp( a );
> -    aSign = extractFloat32Sign( a );
> -    if ( ( aExp == 0xFF ) && aSig ) aSign = 0;
> -    if ( aExp ) aSig |= 0x00800000;
> -    shiftCount = 0xAF - aExp;
> -    aSig64 = aSig;
> -    aSig64 <<= 32;
> -    if ( 0 < shiftCount ) shift64RightJamming( aSig64, shiftCount, &aSig64 );
> -    return roundAndPackInt32(aSign, aSig64, status);
>
> -}
> -
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the single-precision floating-point value
> -| `a' to the 32-bit two's complement integer format.  The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero.
> -| If `a' is a NaN, the largest positive integer is returned.  Otherwise, if
> -| the conversion overflows, the largest integer with the same sign as `a' is
> -| returned.
> -*----------------------------------------------------------------------------*/
> -
> -int32_t float32_to_int32_round_to_zero(float32 a, float_status *status)
> -{
> -    flag aSign;
> -    int aExp;
> -    int shiftCount;
> -    uint32_t aSig;
> -    int32_t z;
> -    a = float32_squash_input_denormal(a, status);
> -
> -    aSig = extractFloat32Frac( a );
> -    aExp = extractFloat32Exp( a );
> -    aSign = extractFloat32Sign( a );
> -    shiftCount = aExp - 0x9E;
> -    if ( 0 <= shiftCount ) {
> -        if ( float32_val(a) != 0xCF000000 ) {
> -            float_raise(float_flag_invalid, status);
> -            if ( ! aSign || ( ( aExp == 0xFF ) && aSig ) ) return 0x7FFFFFFF;
> -        }
> -        return (int32_t) 0x80000000;
> -    }
> -    else if ( aExp <= 0x7E ) {
> -        if (aExp | aSig) {
> -            status->float_exception_flags |= float_flag_inexact;
> -        }
> -        return 0;
> -    }
> -    aSig = ( aSig | 0x00800000 )<<8;
> -    z = aSig>>( - shiftCount );
> -    if ( (uint32_t) ( aSig<<( shiftCount & 31 ) ) ) {
> -        status->float_exception_flags |= float_flag_inexact;
> -    }
> -    if ( aSign ) z = - z;
> -    return z;
> -
> -}
> -
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the single-precision floating-point value
> -| `a' to the 16-bit two's complement integer format.  The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero.
> -| If `a' is a NaN, the largest positive integer is returned.  Otherwise, if
> -| the conversion overflows, the largest integer with the same sign as `a' is
> -| returned.
> -*----------------------------------------------------------------------------*/
> -
> -int16_t float32_to_int16_round_to_zero(float32 a, float_status *status)
> -{
> -    flag aSign;
> -    int aExp;
> -    int shiftCount;
> -    uint32_t aSig;
> -    int32_t z;
> -
> -    aSig = extractFloat32Frac( a );
> -    aExp = extractFloat32Exp( a );
> -    aSign = extractFloat32Sign( a );
> -    shiftCount = aExp - 0x8E;
> -    if ( 0 <= shiftCount ) {
> -        if ( float32_val(a) != 0xC7000000 ) {
> -            float_raise(float_flag_invalid, status);
> -            if ( ! aSign || ( ( aExp == 0xFF ) && aSig ) ) {
> -                return 0x7FFF;
> -            }
> -        }
> -        return (int32_t) 0xffff8000;
> -    }
> -    else if ( aExp <= 0x7E ) {
> -        if ( aExp | aSig ) {
> -            status->float_exception_flags |= float_flag_inexact;
> -        }
> -        return 0;
> -    }
> -    shiftCount -= 0x10;
> -    aSig = ( aSig | 0x00800000 )<<8;
> -    z = aSig>>( - shiftCount );
> -    if ( (uint32_t) ( aSig<<( shiftCount & 31 ) ) ) {
> -        status->float_exception_flags |= float_flag_inexact;
> -    }
> -    if ( aSign ) {
> -        z = - z;
> -    }
> -    return z;
> -
> -}
> -
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the single-precision floating-point value
> -| `a' to the 64-bit two's complement integer format.  The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic---which means in particular that the conversion is rounded
> -| according to the current rounding mode.  If `a' is a NaN, the largest
> -| positive integer is returned.  Otherwise, if the conversion overflows, the
> -| largest integer with the same sign as `a' is returned.
> -*----------------------------------------------------------------------------*/
> -
> -int64_t float32_to_int64(float32 a, float_status *status)
> -{
> -    flag aSign;
> -    int aExp;
> -    int shiftCount;
> -    uint32_t aSig;
> -    uint64_t aSig64, aSigExtra;
> -    a = float32_squash_input_denormal(a, status);
> -
> -    aSig = extractFloat32Frac( a );
> -    aExp = extractFloat32Exp( a );
> -    aSign = extractFloat32Sign( a );
> -    shiftCount = 0xBE - aExp;
> -    if ( shiftCount < 0 ) {
> -        float_raise(float_flag_invalid, status);
> -        if ( ! aSign || ( ( aExp == 0xFF ) && aSig ) ) {
> -            return LIT64( 0x7FFFFFFFFFFFFFFF );
> -        }
> -        return (int64_t) LIT64( 0x8000000000000000 );
> -    }
> -    if ( aExp ) aSig |= 0x00800000;
> -    aSig64 = aSig;
> -    aSig64 <<= 40;
> -    shift64ExtraRightJamming( aSig64, 0, shiftCount, &aSig64, &aSigExtra );
> -    return roundAndPackInt64(aSign, aSig64, aSigExtra, status);
> -
> -}
> -
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the single-precision floating-point value
> -| `a' to the 64-bit unsigned integer format.  The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic---which means in particular that the conversion is rounded
> -| according to the current rounding mode.  If `a' is a NaN, the largest
> -| unsigned integer is returned.  Otherwise, if the conversion overflows, the
> -| largest unsigned integer is returned.  If the 'a' is negative, the result
> -| is rounded and zero is returned; values that do not round to zero will
> -| raise the inexact exception flag.
> -*----------------------------------------------------------------------------*/
> -
> -uint64_t float32_to_uint64(float32 a, float_status *status)
> -{
> -    flag aSign;
> -    int aExp;
> -    int shiftCount;
> -    uint32_t aSig;
> -    uint64_t aSig64, aSigExtra;
> -    a = float32_squash_input_denormal(a, status);
> -
> -    aSig = extractFloat32Frac(a);
> -    aExp = extractFloat32Exp(a);
> -    aSign = extractFloat32Sign(a);
> -    if ((aSign) && (aExp > 126)) {
> -        float_raise(float_flag_invalid, status);
> -        if (float32_is_any_nan(a)) {
> -            return LIT64(0xFFFFFFFFFFFFFFFF);
> -        } else {
> -            return 0;
> -        }
> -    }
> -    shiftCount = 0xBE - aExp;
> -    if (aExp) {
> -        aSig |= 0x00800000;
> -    }
> -    if (shiftCount < 0) {
> -        float_raise(float_flag_invalid, status);
> -        return LIT64(0xFFFFFFFFFFFFFFFF);
> -    }
> -
> -    aSig64 = aSig;
> -    aSig64 <<= 40;
> -    shift64ExtraRightJamming(aSig64, 0, shiftCount, &aSig64, &aSigExtra);
> -    return roundAndPackUint64(aSign, aSig64, aSigExtra, status);
> -}
> -
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the single-precision floating-point value
> -| `a' to the 64-bit unsigned integer format.  The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero.  If
> -| `a' is a NaN, the largest unsigned integer is returned.  Otherwise, if the
> -| conversion overflows, the largest unsigned integer is returned.  If the
> -| 'a' is negative, the result is rounded and zero is returned; values that do
> -| not round to zero will raise the inexact flag.
> -*----------------------------------------------------------------------------*/
> -
> -uint64_t float32_to_uint64_round_to_zero(float32 a, float_status *status)
> -{
> -    signed char current_rounding_mode = status->float_rounding_mode;
> -    set_float_rounding_mode(float_round_to_zero, status);
> -    int64_t v = float32_to_uint64(a, status);
> -    set_float_rounding_mode(current_rounding_mode, status);
> -    return v;
> -}
> -
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the single-precision floating-point value
> -| `a' to the 64-bit two's complement integer format.  The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero.  If
> -| `a' is a NaN, the largest positive integer is returned.  Otherwise, if the
> -| conversion overflows, the largest integer with the same sign as `a' is
> -| returned.
> -*----------------------------------------------------------------------------*/
> -
> -int64_t float32_to_int64_round_to_zero(float32 a, float_status *status)
> -{
> -    flag aSign;
> -    int aExp;
> -    int shiftCount;
> -    uint32_t aSig;
> -    uint64_t aSig64;
> -    int64_t z;
> -    a = float32_squash_input_denormal(a, status);
> -
> -    aSig = extractFloat32Frac( a );
> -    aExp = extractFloat32Exp( a );
> -    aSign = extractFloat32Sign( a );
> -    shiftCount = aExp - 0xBE;
> -    if ( 0 <= shiftCount ) {
> -        if ( float32_val(a) != 0xDF000000 ) {
> -            float_raise(float_flag_invalid, status);
> -            if ( ! aSign || ( ( aExp == 0xFF ) && aSig ) ) {
> -                return LIT64( 0x7FFFFFFFFFFFFFFF );
> -            }
> -        }
> -        return (int64_t) LIT64( 0x8000000000000000 );
> -    }
> -    else if ( aExp <= 0x7E ) {
> -        if (aExp | aSig) {
> -            status->float_exception_flags |= float_flag_inexact;
> -        }
> -        return 0;
> -    }
> -    aSig64 = aSig | 0x00800000;
> -    aSig64 <<= 40;
> -    z = aSig64>>( - shiftCount );
> -    if ( (uint64_t) ( aSig64<<( shiftCount & 63 ) ) ) {
> -        status->float_exception_flags |= float_flag_inexact;
> -    }
> -    if ( aSign ) z = - z;
> -    return z;
> -
> -}
>
>  /*----------------------------------------------------------------------------
>  | Returns the result of converting the single-precision floating-point value
> @@ -3500,289 +3408,59 @@ int float32_le_quiet(float32 a, float32 b, float_status *status)
>  | Returns 1 if the single-precision floating-point value `a' is less than
>  | the corresponding value `b', and 0 otherwise.  Quiet NaNs do not cause an
>  | exception.  Otherwise, the comparison is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> -int float32_lt_quiet(float32 a, float32 b, float_status *status)
> -{
> -    flag aSign, bSign;
> -    uint32_t av, bv;
> -    a = float32_squash_input_denormal(a, status);
> -    b = float32_squash_input_denormal(b, status);
> -
> -    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )
> -         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )
> -       ) {
> -        if (float32_is_signaling_nan(a, status)
> -         || float32_is_signaling_nan(b, status)) {
> -            float_raise(float_flag_invalid, status);
> -        }
> -        return 0;
> -    }
> -    aSign = extractFloat32Sign( a );
> -    bSign = extractFloat32Sign( b );
> -    av = float32_val(a);
> -    bv = float32_val(b);
> -    if ( aSign != bSign ) return aSign && ( (uint32_t) ( ( av | bv )<<1 ) != 0 );
> -    return ( av != bv ) && ( aSign ^ ( av < bv ) );
> -
> -}
> -
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the single-precision floating-point values `a' and `b' cannot
> -| be compared, and 0 otherwise.  Quiet NaNs do not cause an exception.  The
> -| comparison is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> -int float32_unordered_quiet(float32 a, float32 b, float_status *status)
> -{
> -    a = float32_squash_input_denormal(a, status);
> -    b = float32_squash_input_denormal(b, status);
> -
> -    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )
> -         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )
> -       ) {
> -        if (float32_is_signaling_nan(a, status)
> -         || float32_is_signaling_nan(b, status)) {
> -            float_raise(float_flag_invalid, status);
> -        }
> -        return 1;
> -    }
> -    return 0;
> -}
> -
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point value
> -| `a' to the 32-bit two's complement integer format.  The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic---which means in particular that the conversion is rounded
> -| according to the current rounding mode.  If `a' is a NaN, the largest
> -| positive integer is returned.  Otherwise, if the conversion overflows, the
> -| largest integer with the same sign as `a' is returned.
> -*----------------------------------------------------------------------------*/
> -
> -int32_t float64_to_int32(float64 a, float_status *status)
> -{
> -    flag aSign;
> -    int aExp;
> -    int shiftCount;
> -    uint64_t aSig;
> -    a = float64_squash_input_denormal(a, status);
> -
> -    aSig = extractFloat64Frac( a );
> -    aExp = extractFloat64Exp( a );
> -    aSign = extractFloat64Sign( a );
> -    if ( ( aExp == 0x7FF ) && aSig ) aSign = 0;
> -    if ( aExp ) aSig |= LIT64( 0x0010000000000000 );
> -    shiftCount = 0x42C - aExp;
> -    if ( 0 < shiftCount ) shift64RightJamming( aSig, shiftCount, &aSig );
> -    return roundAndPackInt32(aSign, aSig, status);
> -
> -}
> -
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point value
> -| `a' to the 32-bit two's complement integer format.  The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero.
> -| If `a' is a NaN, the largest positive integer is returned.  Otherwise, if
> -| the conversion overflows, the largest integer with the same sign as `a' is
> -| returned.
> -*----------------------------------------------------------------------------*/
> -
> -int32_t float64_to_int32_round_to_zero(float64 a, float_status *status)
> -{
> -    flag aSign;
> -    int aExp;
> -    int shiftCount;
> -    uint64_t aSig, savedASig;
> -    int32_t z;
> -    a = float64_squash_input_denormal(a, status);
> -
> -    aSig = extractFloat64Frac( a );
> -    aExp = extractFloat64Exp( a );
> -    aSign = extractFloat64Sign( a );
> -    if ( 0x41E < aExp ) {
> -        if ( ( aExp == 0x7FF ) && aSig ) aSign = 0;
> -        goto invalid;
> -    }
> -    else if ( aExp < 0x3FF ) {
> -        if (aExp || aSig) {
> -            status->float_exception_flags |= float_flag_inexact;
> -        }
> -        return 0;
> -    }
> -    aSig |= LIT64( 0x0010000000000000 );
> -    shiftCount = 0x433 - aExp;
> -    savedASig = aSig;
> -    aSig >>= shiftCount;
> -    z = aSig;
> -    if ( aSign ) z = - z;
> -    if ( ( z < 0 ) ^ aSign ) {
> - invalid:
> -        float_raise(float_flag_invalid, status);
> -        return aSign ? (int32_t) 0x80000000 : 0x7FFFFFFF;
> -    }
> -    if ( ( aSig<<shiftCount ) != savedASig ) {
> -        status->float_exception_flags |= float_flag_inexact;
> -    }
> -    return z;
> -
> -}
> -
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point value
> -| `a' to the 16-bit two's complement integer format.  The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero.
> -| If `a' is a NaN, the largest positive integer is returned.  Otherwise, if
> -| the conversion overflows, the largest integer with the same sign as `a' is
> -| returned.
> -*----------------------------------------------------------------------------*/
> -
> -int16_t float64_to_int16_round_to_zero(float64 a, float_status *status)
> -{
> -    flag aSign;
> -    int aExp;
> -    int shiftCount;
> -    uint64_t aSig, savedASig;
> -    int32_t z;
> -
> -    aSig = extractFloat64Frac( a );
> -    aExp = extractFloat64Exp( a );
> -    aSign = extractFloat64Sign( a );
> -    if ( 0x40E < aExp ) {
> -        if ( ( aExp == 0x7FF ) && aSig ) {
> -            aSign = 0;
> -        }
> -        goto invalid;
> -    }
> -    else if ( aExp < 0x3FF ) {
> -        if ( aExp || aSig ) {
> -            status->float_exception_flags |= float_flag_inexact;
> -        }
> -        return 0;
> -    }
> -    aSig |= LIT64( 0x0010000000000000 );
> -    shiftCount = 0x433 - aExp;
> -    savedASig = aSig;
> -    aSig >>= shiftCount;
> -    z = aSig;
> -    if ( aSign ) {
> -        z = - z;
> -    }
> -    if ( ( (int16_t)z < 0 ) ^ aSign ) {
> - invalid:
> -        float_raise(float_flag_invalid, status);
> -        return aSign ? (int32_t) 0xffff8000 : 0x7FFF;
> -    }
> -    if ( ( aSig<<shiftCount ) != savedASig ) {
> -        status->float_exception_flags |= float_flag_inexact;
> -    }
> -    return z;
> -}
> -
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point value
> -| `a' to the 64-bit two's complement integer format.  The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic---which means in particular that the conversion is rounded
> -| according to the current rounding mode.  If `a' is a NaN, the largest
> -| positive integer is returned.  Otherwise, if the conversion overflows, the
> -| largest integer with the same sign as `a' is returned.
> +| Standard for Binary Floating-Point Arithmetic.
>  *----------------------------------------------------------------------------*/
>
> -int64_t float64_to_int64(float64 a, float_status *status)
> +int float32_lt_quiet(float32 a, float32 b, float_status *status)
>  {
> -    flag aSign;
> -    int aExp;
> -    int shiftCount;
> -    uint64_t aSig, aSigExtra;
> -    a = float64_squash_input_denormal(a, status);
> +    flag aSign, bSign;
> +    uint32_t av, bv;
> +    a = float32_squash_input_denormal(a, status);
> +    b = float32_squash_input_denormal(b, status);
>
> -    aSig = extractFloat64Frac( a );
> -    aExp = extractFloat64Exp( a );
> -    aSign = extractFloat64Sign( a );
> -    if ( aExp ) aSig |= LIT64( 0x0010000000000000 );
> -    shiftCount = 0x433 - aExp;
> -    if ( shiftCount <= 0 ) {
> -        if ( 0x43E < aExp ) {
> +    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )
> +         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )
> +       ) {
> +        if (float32_is_signaling_nan(a, status)
> +         || float32_is_signaling_nan(b, status)) {
>              float_raise(float_flag_invalid, status);
> -            if (    ! aSign
> -                 || (    ( aExp == 0x7FF )
> -                      && ( aSig != LIT64( 0x0010000000000000 ) ) )
> -               ) {
> -                return LIT64( 0x7FFFFFFFFFFFFFFF );
> -            }
> -            return (int64_t) LIT64( 0x8000000000000000 );
>          }
> -        aSigExtra = 0;
> -        aSig <<= - shiftCount;
> -    }
> -    else {
> -        shift64ExtraRightJamming( aSig, 0, shiftCount, &aSig, &aSigExtra );
> +        return 0;
>      }
> -    return roundAndPackInt64(aSign, aSig, aSigExtra, status);
> +    aSign = extractFloat32Sign( a );
> +    bSign = extractFloat32Sign( b );
> +    av = float32_val(a);
> +    bv = float32_val(b);
> +    if ( aSign != bSign ) return aSign && ( (uint32_t) ( ( av | bv )<<1 ) != 0 );
> +    return ( av != bv ) && ( aSign ^ ( av < bv ) );
>
>  }
>
>  /*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point value
> -| `a' to the 64-bit two's complement integer format.  The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero.
> -| If `a' is a NaN, the largest positive integer is returned.  Otherwise, if
> -| the conversion overflows, the largest integer with the same sign as `a' is
> -| returned.
> +| Returns 1 if the single-precision floating-point values `a' and `b' cannot
> +| be compared, and 0 otherwise.  Quiet NaNs do not cause an exception.  The
> +| comparison is performed according to the IEC/IEEE Standard for Binary
> +| Floating-Point Arithmetic.
>  *----------------------------------------------------------------------------*/
>
> -int64_t float64_to_int64_round_to_zero(float64 a, float_status *status)
> +int float32_unordered_quiet(float32 a, float32 b, float_status *status)
>  {
> -    flag aSign;
> -    int aExp;
> -    int shiftCount;
> -    uint64_t aSig;
> -    int64_t z;
> -    a = float64_squash_input_denormal(a, status);
> +    a = float32_squash_input_denormal(a, status);
> +    b = float32_squash_input_denormal(b, status);
>
> -    aSig = extractFloat64Frac( a );
> -    aExp = extractFloat64Exp( a );
> -    aSign = extractFloat64Sign( a );
> -    if ( aExp ) aSig |= LIT64( 0x0010000000000000 );
> -    shiftCount = aExp - 0x433;
> -    if ( 0 <= shiftCount ) {
> -        if ( 0x43E <= aExp ) {
> -            if ( float64_val(a) != LIT64( 0xC3E0000000000000 ) ) {
> -                float_raise(float_flag_invalid, status);
> -                if (    ! aSign
> -                     || (    ( aExp == 0x7FF )
> -                          && ( aSig != LIT64( 0x0010000000000000 ) ) )
> -                   ) {
> -                    return LIT64( 0x7FFFFFFFFFFFFFFF );
> -                }
> -            }
> -            return (int64_t) LIT64( 0x8000000000000000 );
> -        }
> -        z = aSig<<shiftCount;
> -    }
> -    else {
> -        if ( aExp < 0x3FE ) {
> -            if (aExp | aSig) {
> -                status->float_exception_flags |= float_flag_inexact;
> -            }
> -            return 0;
> -        }
> -        z = aSig>>( - shiftCount );
> -        if ( (uint64_t) ( aSig<<( shiftCount & 63 ) ) ) {
> -            status->float_exception_flags |= float_flag_inexact;
> +    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )
> +         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )
> +       ) {
> +        if (float32_is_signaling_nan(a, status)
> +         || float32_is_signaling_nan(b, status)) {
> +            float_raise(float_flag_invalid, status);
>          }
> +        return 1;
>      }
> -    if ( aSign ) z = - z;
> -    return z;
> -
> +    return 0;
>  }
>
> +
>  /*----------------------------------------------------------------------------
>  | Returns the result of converting the double-precision floating-point value
>  | `a' to the single-precision floating-point format.  The conversion is
> @@ -7049,252 +6727,7 @@ float64 uint32_to_float64(uint32_t a, float_status *status)
>      return int64_to_float64(a, status);
>  }
>
> -uint32_t float32_to_uint32(float32 a, float_status *status)
> -{
> -    int64_t v;
> -    uint32_t res;
> -    int old_exc_flags = get_float_exception_flags(status);
> -
> -    v = float32_to_int64(a, status);
> -    if (v < 0) {
> -        res = 0;
> -    } else if (v > 0xffffffff) {
> -        res = 0xffffffff;
> -    } else {
> -        return v;
> -    }
> -    set_float_exception_flags(old_exc_flags, status);
> -    float_raise(float_flag_invalid, status);
> -    return res;
> -}
> -
> -uint32_t float32_to_uint32_round_to_zero(float32 a, float_status *status)
> -{
> -    int64_t v;
> -    uint32_t res;
> -    int old_exc_flags = get_float_exception_flags(status);
> -
> -    v = float32_to_int64_round_to_zero(a, status);
> -    if (v < 0) {
> -        res = 0;
> -    } else if (v > 0xffffffff) {
> -        res = 0xffffffff;
> -    } else {
> -        return v;
> -    }
> -    set_float_exception_flags(old_exc_flags, status);
> -    float_raise(float_flag_invalid, status);
> -    return res;
> -}
> -
> -int16_t float32_to_int16(float32 a, float_status *status)
> -{
> -    int32_t v;
> -    int16_t res;
> -    int old_exc_flags = get_float_exception_flags(status);
> -
> -    v = float32_to_int32(a, status);
> -    if (v < -0x8000) {
> -        res = -0x8000;
> -    } else if (v > 0x7fff) {
> -        res = 0x7fff;
> -    } else {
> -        return v;
> -    }
> -
> -    set_float_exception_flags(old_exc_flags, status);
> -    float_raise(float_flag_invalid, status);
> -    return res;
> -}
> -
> -uint16_t float32_to_uint16(float32 a, float_status *status)
> -{
> -    int32_t v;
> -    uint16_t res;
> -    int old_exc_flags = get_float_exception_flags(status);
> -
> -    v = float32_to_int32(a, status);
> -    if (v < 0) {
> -        res = 0;
> -    } else if (v > 0xffff) {
> -        res = 0xffff;
> -    } else {
> -        return v;
> -    }
> -
> -    set_float_exception_flags(old_exc_flags, status);
> -    float_raise(float_flag_invalid, status);
> -    return res;
> -}
> -
> -uint16_t float32_to_uint16_round_to_zero(float32 a, float_status *status)
> -{
> -    int64_t v;
> -    uint16_t res;
> -    int old_exc_flags = get_float_exception_flags(status);
> -
> -    v = float32_to_int64_round_to_zero(a, status);
> -    if (v < 0) {
> -        res = 0;
> -    } else if (v > 0xffff) {
> -        res = 0xffff;
> -    } else {
> -        return v;
> -    }
> -    set_float_exception_flags(old_exc_flags, status);
> -    float_raise(float_flag_invalid, status);
> -    return res;
> -}
> -
> -uint32_t float64_to_uint32(float64 a, float_status *status)
> -{
> -    uint64_t v;
> -    uint32_t res;
> -    int old_exc_flags = get_float_exception_flags(status);
> -
> -    v = float64_to_uint64(a, status);
> -    if (v > 0xffffffff) {
> -        res = 0xffffffff;
> -    } else {
> -        return v;
> -    }
> -    set_float_exception_flags(old_exc_flags, status);
> -    float_raise(float_flag_invalid, status);
> -    return res;
> -}
> -
> -uint32_t float64_to_uint32_round_to_zero(float64 a, float_status *status)
> -{
> -    uint64_t v;
> -    uint32_t res;
> -    int old_exc_flags = get_float_exception_flags(status);
> -
> -    v = float64_to_uint64_round_to_zero(a, status);
> -    if (v > 0xffffffff) {
> -        res = 0xffffffff;
> -    } else {
> -        return v;
> -    }
> -    set_float_exception_flags(old_exc_flags, status);
> -    float_raise(float_flag_invalid, status);
> -    return res;
> -}
> -
> -int16_t float64_to_int16(float64 a, float_status *status)
> -{
> -    int64_t v;
> -    int16_t res;
> -    int old_exc_flags = get_float_exception_flags(status);
> -
> -    v = float64_to_int32(a, status);
> -    if (v < -0x8000) {
> -        res = -0x8000;
> -    } else if (v > 0x7fff) {
> -        res = 0x7fff;
> -    } else {
> -        return v;
> -    }
> -
> -    set_float_exception_flags(old_exc_flags, status);
> -    float_raise(float_flag_invalid, status);
> -    return res;
> -}
> -
> -uint16_t float64_to_uint16(float64 a, float_status *status)
> -{
> -    int64_t v;
> -    uint16_t res;
> -    int old_exc_flags = get_float_exception_flags(status);
> -
> -    v = float64_to_int32(a, status);
> -    if (v < 0) {
> -        res = 0;
> -    } else if (v > 0xffff) {
> -        res = 0xffff;
> -    } else {
> -        return v;
> -    }
> -
> -    set_float_exception_flags(old_exc_flags, status);
> -    float_raise(float_flag_invalid, status);
> -    return res;
> -}
> -
> -uint16_t float64_to_uint16_round_to_zero(float64 a, float_status *status)
> -{
> -    int64_t v;
> -    uint16_t res;
> -    int old_exc_flags = get_float_exception_flags(status);
> -
> -    v = float64_to_int64_round_to_zero(a, status);
> -    if (v < 0) {
> -        res = 0;
> -    } else if (v > 0xffff) {
> -        res = 0xffff;
> -    } else {
> -        return v;
> -    }
> -    set_float_exception_flags(old_exc_flags, status);
> -    float_raise(float_flag_invalid, status);
> -    return res;
> -}
> -
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point value
> -| `a' to the 64-bit unsigned integer format.  The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic---which means in particular that the conversion is rounded
> -| according to the current rounding mode.  If `a' is a NaN, the largest
> -| positive integer is returned.  If the conversion overflows, the
> -| largest unsigned integer is returned.  If 'a' is negative, the value is
> -| rounded and zero is returned; negative values that do not round to zero
> -| will raise the inexact exception.
> -*----------------------------------------------------------------------------*/
> -
> -uint64_t float64_to_uint64(float64 a, float_status *status)
> -{
> -    flag aSign;
> -    int aExp;
> -    int shiftCount;
> -    uint64_t aSig, aSigExtra;
> -    a = float64_squash_input_denormal(a, status);
> -
> -    aSig = extractFloat64Frac(a);
> -    aExp = extractFloat64Exp(a);
> -    aSign = extractFloat64Sign(a);
> -    if (aSign && (aExp > 1022)) {
> -        float_raise(float_flag_invalid, status);
> -        if (float64_is_any_nan(a)) {
> -            return LIT64(0xFFFFFFFFFFFFFFFF);
> -        } else {
> -            return 0;
> -        }
> -    }
> -    if (aExp) {
> -        aSig |= LIT64(0x0010000000000000);
> -    }
> -    shiftCount = 0x433 - aExp;
> -    if (shiftCount <= 0) {
> -        if (0x43E < aExp) {
> -            float_raise(float_flag_invalid, status);
> -            return LIT64(0xFFFFFFFFFFFFFFFF);
> -        }
> -        aSigExtra = 0;
> -        aSig <<= -shiftCount;
> -    } else {
> -        shift64ExtraRightJamming(aSig, 0, shiftCount, &aSig, &aSigExtra);
> -    }
> -    return roundAndPackUint64(aSign, aSig, aSigExtra, status);
> -}
>
> -uint64_t float64_to_uint64_round_to_zero(float64 a, float_status *status)
> -{
> -    signed char current_rounding_mode = status->float_rounding_mode;
> -    set_float_rounding_mode(float_round_to_zero, status);
> -    uint64_t v = float64_to_uint64(a, status);
> -    set_float_rounding_mode(current_rounding_mode, status);
> -    return v;
> -}
>
>  #define COMPARE(s, nan_exp)                                                  \
>  static inline int float ## s ## _compare_internal(float ## s a, float ## s b,\
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index 6427762a9a..d7bc7cbcb6 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -314,6 +314,19 @@ float16 float32_to_float16(float32, flag, float_status *status);
>  float32 float16_to_float32(float16, flag, float_status *status);
>  float16 float64_to_float16(float64 a, flag ieee, float_status *status);
>  float64 float16_to_float64(float16 a, flag ieee, float_status *status);
> +int16_t float16_to_int16(float16, float_status *status);
> +uint16_t float16_to_uint16(float16 a, float_status *status);
> +int16_t float16_to_int16_round_to_zero(float16, float_status *status);
> +uint16_t float16_to_uint16_round_to_zero(float16 a, float_status *status);
> +int32_t float16_to_int32(float16, float_status *status);
> +uint32_t float16_to_uint32(float16 a, float_status *status);
> +int32_t float16_to_int32_round_to_zero(float16, float_status *status);
> +uint32_t float16_to_uint32_round_to_zero(float16 a, float_status *status);
> +int64_t float16_to_int64(float16, float_status *status);
> +uint64_t float16_to_uint64(float16 a, float_status *status);
> +int64_t float16_to_int64_round_to_zero(float16, float_status *status);
> +uint64_t float16_to_uint64_round_to_zero(float16 a, float_status *status);
> +float16 int16_to_float16(int16_t a, float_status *status);
>
>  /*----------------------------------------------------------------------------
>  | Software half-precision operations.


--
Alex Bennée

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/20] fpu/softfloat: define decompose structures
  2018-01-12 16:21   ` Philippe Mathieu-Daudé
@ 2018-01-18 13:08     ` Alex Bennée
  2018-01-18 14:26       ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 68+ messages in thread
From: Alex Bennée @ 2018-01-18 13:08 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: richard.henderson, peter.maydell, Francisco Iglesias, laurent,
	bharata, andrew, qemu-devel, Aurelien Jarno


Philippe Mathieu-Daudé <f4bug@amsat.org> writes:

> Hi Alex, Richard,
>
> On 01/09/2018 09:22 AM, Alex Bennée wrote:
>> These structures pave the way for generic softfloat helper routines
>> that will operate on fully decomposed numbers.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  fpu/softfloat.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 69 insertions(+), 1 deletion(-)
>>
>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>> index 59afe81d06..fcba28d3f8 100644
>> --- a/fpu/softfloat.c
>> +++ b/fpu/softfloat.c
>> @@ -83,7 +83,7 @@ this code that are retained.
>>   * target-dependent and needs the TARGET_* macros.
>>   */
>>  #include "qemu/osdep.h"
>> -
>> +#include "qemu/bitops.h"
>>  #include "fpu/softfloat.h"
>>
>>  /* We only need stdlib for abort() */
>> @@ -186,6 +186,74 @@ static inline flag extractFloat64Sign(float64 a)
>>      return float64_val(a) >> 63;
>>  }
>>
>> +/*----------------------------------------------------------------------------
>> +| Classify a floating point number.
>> +*----------------------------------------------------------------------------*/
>> +
>> +typedef enum {
>> +    float_class_unclassified,
>> +    float_class_zero,
>> +    float_class_normal,
>> +    float_class_inf,
>> +    float_class_qnan,
>> +    float_class_snan,
>> +    float_class_dnan,
>> +    float_class_msnan, /* maybe silenced */
>> +} float_class;
>> +
>> +/*----------------------------------------------------------------------------
>> +| Structure holding all of the decomposed parts of a float.
>> +| The exponent is unbiased and the fraction is normalized.
>> +*----------------------------------------------------------------------------*/
>> +
>> +typedef struct {
>> +    uint64_t frac   : 64;
>
> I think this does not work on LLP64/IL32P64 model.
>
> Should we add a check in ./configure and refuse to build on IL32P64
> model? This would be safer IMHO.
>
>> +    int exp         : 32;
>> +    float_class cls : 8;
>> +    int             : 23;
>> +    bool sign       : 1;
>
> checking on "ISO/IEC 14882:1998" 9.6 Bit-fields:
>
> Alignment of bit-fields is implementation-defined. Bit-fields are packed
> into some addressable allocation unit. [Note: bit-fields straddle
> allocation units on some machines and not on others. Bit-fields are
> assigned right-to-left on some machines, left-to-right on others. ]
>
> I'd still write it:
>
>       int             :23, sign :1;
>
>> +} decomposed_parts;

I think rather than stuff it into bit fields we can just leave it up to
the compiler?

>> +
>> +#define DECOMPOSED_BINARY_POINT    (64 - 2)
>> +#define DECOMPOSED_IMPLICIT_BIT    (1ull << DECOMPOSED_BINARY_POINT)
>> +#define DECOMPOSED_OVERFLOW_BIT    (DECOMPOSED_IMPLICIT_BIT << 1)
>> +
>> +/* Structure holding all of the relevant parameters for a format.  */
>> +typedef struct {
>> +    int exp_bias;
>> +    int exp_max;
>> +    int frac_shift;
>> +    uint64_t frac_lsb;
>> +    uint64_t frac_lsbm1;
>> +    uint64_t round_mask;
>> +    uint64_t roundeven_mask;
>> +} decomposed_params;
>> +
>> +#define FRAC_PARAMS(F)                     \
>> +    .frac_shift     = F,                   \
>> +    .frac_lsb       = 1ull << (F),         \
>> +    .frac_lsbm1     = 1ull << ((F) - 1),   \
>> +    .round_mask     = (1ull << (F)) - 1,   \
>> +    .roundeven_mask = (2ull << (F)) - 1
>> +
>> +static const decomposed_params float16_params = {
>> +    .exp_bias       = 0x0f,
>> +    .exp_max        = 0x1f,
>> +    FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 10)
>> +};
>> +
>> +static const decomposed_params float32_params = {
>> +    .exp_bias       = 0x7f,
>> +    .exp_max        = 0xff,
>> +    FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 23)
>> +};
>> +
>> +static const decomposed_params float64_params = {
>> +    .exp_bias       = 0x3ff,
>> +    .exp_max        = 0x7ff,
>> +    FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 52)
>> +};
>> +
>>  /*----------------------------------------------------------------------------
>>  | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
>>  | and 7, and returns the properly rounded 32-bit integer corresponding to the
>>


--
Alex Bennée

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/20] fpu/softfloat: define decompose structures
  2018-01-18 13:08     ` Alex Bennée
@ 2018-01-18 14:26       ` Philippe Mathieu-Daudé
  2018-01-18 14:31         ` Peter Maydell
  0 siblings, 1 reply; 68+ messages in thread
From: Philippe Mathieu-Daudé @ 2018-01-18 14:26 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Peter Maydell, Francisco Iglesias, richard.henderson, laurent,
	qemu-devel, andrew, bharata, Aurelien Jarno

On 18 Jan 2018 at 10:09 AM, "Alex Bennée" <alex.bennee@linaro.org> wrote:


Philippe Mathieu-Daudé <f4bug@amsat.org> writes:

> Hi Alex, Richard,
>
> On 01/09/2018 09:22 AM, Alex Bennée wrote:
>> These structures pave the way for generic softfloat helper routines
>> that will operate on fully decomposed numbers.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  fpu/softfloat.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 69 insertions(+), 1 deletion(-)
>>
>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>> index 59afe81d06..fcba28d3f8 100644
>> --- a/fpu/softfloat.c
>> +++ b/fpu/softfloat.c
>> @@ -83,7 +83,7 @@ this code that are retained.
>>   * target-dependent and needs the TARGET_* macros.
>>   */
>>  #include "qemu/osdep.h"
>> -
>> +#include "qemu/bitops.h"
>>  #include "fpu/softfloat.h"
>>
>>  /* We only need stdlib for abort() */
>> @@ -186,6 +186,74 @@ static inline flag extractFloat64Sign(float64 a)
>>      return float64_val(a) >> 63;
>>  }
>>
>> +/*----------------------------------------------------------------------------
>> +| Classify a floating point number.
>> +*----------------------------------------------------------------------------*/
>> +
>> +typedef enum {
>> +    float_class_unclassified,
>> +    float_class_zero,
>> +    float_class_normal,
>> +    float_class_inf,
>> +    float_class_qnan,
>> +    float_class_snan,
>> +    float_class_dnan,
>> +    float_class_msnan, /* maybe silenced */
>> +} float_class;
>> +
>> +/*----------------------------------------------------------------------------
>> +| Structure holding all of the decomposed parts of a float.
>> +| The exponent is unbiased and the fraction is normalized.
>> +*----------------------------------------------------------------------------*/
>> +
>> +typedef struct {
>> +    uint64_t frac   : 64;
>
> I think this does not work on LLP64/IL32P64 model.
>
> Should we add a check in ./configure and refuse to build on IL32P64
> model? This would be safer IMHO.
>
>> +    int exp         : 32;
>> +    float_class cls : 8;
>> +    int             : 23;
>> +    bool sign       : 1;
>
> checking on "ISO/IEC 14882:1998" 9.6 Bit-fields:
>
> Alignment of bit-fields is implementation-defined. Bit-fields are packed
> into some addressable allocation unit. [Note: bit-fields straddle
> allocation units on some machines and not on others. Bit-fields are
> assigned right-to-left on some machines, left-to-right on others. ]
>
> I'd still write it:
>
>       int             :23, sign :1;
>
>> +} decomposed_parts;

I think rather than stuff it into bit fields we can just leave it up to
the compiler?


Yep, my only worry here is the IL32P64 model, if we care.


>> +
>> +#define DECOMPOSED_BINARY_POINT    (64 - 2)
>> +#define DECOMPOSED_IMPLICIT_BIT    (1ull << DECOMPOSED_BINARY_POINT)
>> +#define DECOMPOSED_OVERFLOW_BIT    (DECOMPOSED_IMPLICIT_BIT << 1)
>> +
>> +/* Structure holding all of the relevant parameters for a format.  */
>> +typedef struct {
>> +    int exp_bias;
>> +    int exp_max;
>> +    int frac_shift;
>> +    uint64_t frac_lsb;
>> +    uint64_t frac_lsbm1;
>> +    uint64_t round_mask;
>> +    uint64_t roundeven_mask;
>> +} decomposed_params;
>> +
>> +#define FRAC_PARAMS(F)                     \
>> +    .frac_shift     = F,                   \
>> +    .frac_lsb       = 1ull << (F),         \
>> +    .frac_lsbm1     = 1ull << ((F) - 1),   \
>> +    .round_mask     = (1ull << (F)) - 1,   \
>> +    .roundeven_mask = (2ull << (F)) - 1
>> +
>> +static const decomposed_params float16_params = {
>> +    .exp_bias       = 0x0f,
>> +    .exp_max        = 0x1f,
>> +    FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 10)
>> +};
>> +
>> +static const decomposed_params float32_params = {
>> +    .exp_bias       = 0x7f,
>> +    .exp_max        = 0xff,
>> +    FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 23)
>> +};
>> +
>> +static const decomposed_params float64_params = {
>> +    .exp_bias       = 0x3ff,
>> +    .exp_max        = 0x7ff,
>> +    FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 52)
>> +};
>> +
>>  /*----------------------------------------------------------------------------
>>  | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
>>  | and 7, and returns the properly rounded 32-bit integer corresponding to the
>>


--
Alex Bennée

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/20] fpu/softfloat: define decompose structures
  2018-01-18 14:26       ` Philippe Mathieu-Daudé
@ 2018-01-18 14:31         ` Peter Maydell
  2018-01-18 14:59           ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Maydell @ 2018-01-18 14:31 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Alex Bennée, Francisco Iglesias, Richard Henderson,
	Laurent Vivier, QEMU Developers, Andrew Dutcher, bharata,
	Aurelien Jarno

On 18 January 2018 at 14:26, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
>
>
> On 18 Jan 2018 at 10:09 AM, "Alex Bennée" <alex.bennee@linaro.org> wrote:
>
>
> Philippe Mathieu-Daudé <f4bug@amsat.org> writes:
>>> +typedef struct {
>>> +    uint64_t frac   : 64;
>>
>> I think this does not work on LLP64/IL32P64 model.
>>
>> Should we add a check in ./configure and refuse to build on IL32P64
>> model? This would be safer IMHO.
>>
>>> +    int exp         : 32;
>>> +    float_class cls : 8;
>>> +    int             : 23;
>>> +    bool sign       : 1;
>>
>> checking on "ISO/IEC 14882:1998" 9.6 Bit-fields:
>>
>> Alignment of bit-fields is implementation-defined. Bit-fields are packed
>> into some addressable allocation unit. [Note: bit-fields straddle
>> allocation units on some machines and not on others. Bit-fields are
>> assigned right-to-left on some machines, left-to-right on others. ]
>>
>> I'd still write it:
>>
>>       int             :23, sign :1;
>>
>>> +} decomposed_parts;
>
> I think rather than stuff it into bit fields we can just leave it up to
> the compiler?
>
>
> Yep, my only worry here is the IL32P64 model, if we care.

I don't think we care much about IL32P64, but the code should
still work there, right? It doesn't actually make any assumptions
about bitfield layout.

I think I agree that we shouldn't use bitfields here if we don't
need to, though.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/20] fpu/softfloat: define decompose structures
  2018-01-18 14:31         ` Peter Maydell
@ 2018-01-18 14:59           ` Philippe Mathieu-Daudé
  2018-01-18 15:17             ` Peter Maydell
  0 siblings, 1 reply; 68+ messages in thread
From: Philippe Mathieu-Daudé @ 2018-01-18 14:59 UTC (permalink / raw)
  To: Peter Maydell, Richard Henderson, Alex Bennée
  Cc: Francisco Iglesias, Laurent Vivier, QEMU Developers,
	Andrew Dutcher, bharata, Aurelien Jarno

On 01/18/2018 11:31 AM, Peter Maydell wrote:
> On 18 January 2018 at 14:26, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
>> On 18 Jan 2018 at 10:09 AM, "Alex Bennée" <alex.bennee@linaro.org> wrote:
>> Philippe Mathieu-Daudé <f4bug@amsat.org> writes:
>>>> +typedef struct {
>>>> +    uint64_t frac   : 64;
>>>
>>> I think this does not work on LLP64/IL32P64 model.
>>>
>>> Should we add a check in ./configure and refuse to build on IL32P64
>>> model? This would be safer IMHO.
>>>
>>>> +    int exp         : 32;
>>>> +    float_class cls : 8;
>>>> +    int             : 23;
>>>> +    bool sign       : 1;
>>>
>>> checking on "ISO/IEC 14882:1998" 9.6 Bit-fields:
>>>
>>> Alignment of bit-fields is implementation-defined. Bit-fields are packed
>>> into some addressable allocation unit. [Note: bit-fields straddle
>>> allocation units on some machines and not on others. Bit-fields are
>>> assigned right-to-left on some machines, left-to-right on others. ]
>>>
>>> I'd still write it:
>>>
>>>       int             :23, sign :1;
>>>
>>>> +} decomposed_parts;
>>
>> I think rather than stuff it into bit fields we can just leave it up to
>> the compiler?
>>
>>
>> Yep, my only worry here is the IL32P64 model, if we care.
> 
> I don't think we care much about IL32P64, but the code should
> still work there, right? It doesn't actually make any assumptions
> about bitfield layout.

My comment was for a previous line:

  uint64_t frac   : 64;

I don't have enough compiler knowledge to be sure how this bitfield is
interpreted by the compiler. I understood the standard as saying bitfields
are for 'unsigned', and for IL32 we have sizeof(unsigned) = 32, so I wonder
how a :64 bitfield ends up (are bits >= 32 silently truncated?).

Richard do you have an idea?

> 
> I think I agree that we shouldn't use bitfields here if we don't
> need to, though.
> 
> thanks
> -- PMM
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/20] fpu/softfloat: define decompose structures
  2018-01-18 14:59           ` Philippe Mathieu-Daudé
@ 2018-01-18 15:17             ` Peter Maydell
  2018-01-23 12:00               ` Alex Bennée
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Maydell @ 2018-01-18 15:17 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Richard Henderson, Alex Bennée, Francisco Iglesias,
	Laurent Vivier, QEMU Developers, Andrew Dutcher, bharata,
	Aurelien Jarno

On 18 January 2018 at 14:59, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
> My comment was for a previous line:
>
>   uint64_t frac   : 64;
>
> I don't have enough compiler knowledge to be sure how this bitfield is
> interpreted by the compiler. I understood the standard as bitfields are
> for 'unsigned', and for IL32 we have sizeof(unsigned) = 32, so I wonder
> how a :64 bitfield ends (bits >= 32 silently truncated?).

Defining a 64-bit bitfield is a bit pointless (why not just use
uint64_t?) but there's nothing particularly different for IL32P64 here.
The spec says the underlying type is _Bool, signed int, unsigned
int, or an implementation-defined type. For QEMU's hosts 'int'
is always 32 bits, so if gcc and clang allow bitfields on a
64-bit type like uint64_t (as an impdef extension) then they
should work on all hosts. (In any case it needs to either work
or give a compiler error, silent truncation isn't an option.)

thanks
-- PMM

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/20] fpu/softfloat: re-factor add/sub
  2018-01-12 15:57   ` Peter Maydell
  2018-01-12 18:30     ` Richard Henderson
@ 2018-01-18 16:43     ` Alex Bennée
  2018-01-18 16:47       ` Richard Henderson
  2018-01-23 20:05     ` Alex Bennée
  2 siblings, 1 reply; 68+ messages in thread
From: Alex Bennée @ 2018-01-18 16:43 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno


Peter Maydell <peter.maydell@linaro.org> writes:

> On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
>> We can now add float16_add/sub and use the common decompose and
>> canonicalize functions to have a single implementation for
>> float16/32/64 add and sub functions.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  fpu/softfloat.c         | 904 +++++++++++++++++++++++++-----------------------
>>  include/fpu/softfloat.h |   4 +
>>  2 files changed, 481 insertions(+), 427 deletions(-)
>>
>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>> index fcba28d3f8..f89e47e3ef 100644
>> --- a/fpu/softfloat.c
>> +++ b/fpu/softfloat.c
>> @@ -195,7 +195,7 @@ typedef enum {
>>      float_class_zero,
>>      float_class_normal,
>>      float_class_inf,
>> -    float_class_qnan,
>> +    float_class_qnan,  /* all NaNs from here */
>
> This comment change should be squashed into the previous patch.
>
>>      float_class_snan,
>>      float_class_dnan,
>>      float_class_msnan, /* maybe silenced */
>> @@ -254,6 +254,482 @@ static const decomposed_params float64_params = {
>>      FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 52)
>>  };
>>
>> +/* Unpack a float16 to parts, but do not canonicalize.  */
>> +static inline decomposed_parts float16_unpack_raw(float16 f)
>> +{
>> +    return (decomposed_parts){
>> +        .cls = float_class_unclassified,
>> +        .sign = extract32(f, 15, 1),
>> +        .exp = extract32(f, 10, 5),
>> +        .frac = extract32(f, 0, 10)
>
> In the previous patch we defined a bunch of structs that
> give information about each float format, so it seems a bit
> odd to be hardcoding bit numbers here.

So something like this:

  /* Structure holding all of the relevant parameters for a format.
   *   exp_bias: the offset applied to the exponent field
   *   exp_max: the maximum normalised exponent
   * The following are computed based on the size of the fraction
   *   frac_shift: shift to normalise the fraction with DECOMPOSED_BINARY_POINT
   *   frac_lsb: least significant bit of fraction
   *   frac_lsbm1: the bit below the least significant bit (for rounding)
   *   round_mask/roundeven_mask: masks used for rounding
   */
  typedef struct {
      int exp_bias;
      int exp_max;
      int exp_size;
      int frac_size;
      int frac_shift;
      uint64_t frac_lsb;
      uint64_t frac_lsbm1;
      uint64_t round_mask;
      uint64_t roundeven_mask;
  } FloatFmt;

  /* Expand fields based on the size of exponent and fraction */
  #define FRAC_PARAMS(E, F)                                            \
      .exp_size       = E,                                             \
      .frac_size      = F,                                             \
      .frac_shift     = DECOMPOSED_BINARY_POINT - F,                   \
      .frac_lsb       = 1ull << (DECOMPOSED_BINARY_POINT - F),         \
      .frac_lsbm1     = 1ull << ((DECOMPOSED_BINARY_POINT - F) - 1),   \
      .round_mask     = (1ull << (DECOMPOSED_BINARY_POINT - F)) - 1,   \
      .roundeven_mask = (2ull << (DECOMPOSED_BINARY_POINT - F)) - 1

  static const FloatFmt float16_params = {
      .exp_bias       = 0x0f,
      .exp_max        = 0x1f,
      FRAC_PARAMS(5, 10)
  };

  static const FloatFmt float32_params = {
      .exp_bias       = 0x7f,
      .exp_max        = 0xff,
      FRAC_PARAMS(8, 23)
  };

  static const FloatFmt float64_params = {
      .exp_bias       = 0x3ff,
      .exp_max        = 0x7ff,
      FRAC_PARAMS(11, 52)
  };

  /* Unpack a float to parts, but do not canonicalize.  */
  static inline FloatParts unpack_raw(FloatFmt fmt, uint64_t raw)
  {
      return (FloatParts){
          .cls = float_class_unclassified,
          .sign = extract64(raw, fmt.frac_size + fmt.exp_size, 1),
          .exp = extract64(raw, fmt.frac_size, fmt.exp_size),
          .frac = extract64(raw, 0, fmt.frac_size),
      };
  }

  static inline FloatParts float16_unpack_raw(float16 f)
  {
      return unpack_raw(float16_params, f);
  }

  static inline FloatParts float32_unpack_raw(float32 f)
  {
      return unpack_raw(float32_params, f);
  }

  static inline FloatParts float64_unpack_raw(float64 f)
  {
      return unpack_raw(float64_params, f);
  }

  /* Pack a float from parts, but do not canonicalize.  */
  static inline uint64_t pack_raw(FloatFmt fmt, FloatParts p)
  {
      uint64_t ret = p.frac;
      ret = deposit64(ret, fmt.frac_size, fmt.exp_size, p.exp);
      ret = deposit64(ret, fmt.frac_size + fmt.exp_size, 1, p.sign);
      return ret;
  }

  static inline float16 float16_pack_raw(FloatParts p)
  {
      return make_float16(pack_raw(float16_params, p));
  }

  static inline float32 float32_pack_raw(FloatParts p)
  {
      return make_float32(pack_raw(float32_params, p));
  }

  static inline float64 float64_pack_raw(FloatParts p)
  {
      return make_float64(pack_raw(float64_params, p));
  }


>
>> +    };
>> +}
>> +
>> +/* Unpack a float32 to parts, but do not canonicalize.  */
>> +static inline decomposed_parts float32_unpack_raw(float32 f)
>> +{
>> +    return (decomposed_parts){
>> +        .cls = float_class_unclassified,
>> +        .sign = extract32(f, 31, 1),
>> +        .exp = extract32(f, 23, 8),
>> +        .frac = extract32(f, 0, 23)
>> +    };
>> +}
>> +
>> +/* Unpack a float64 to parts, but do not canonicalize.  */
>> +static inline decomposed_parts float64_unpack_raw(float64 f)
>> +{
>> +    return (decomposed_parts){
>> +        .cls = float_class_unclassified,
>> +        .sign = extract64(f, 63, 1),
>> +        .exp = extract64(f, 52, 11),
>> +        .frac = extract64(f, 0, 52),
>> +    };
>> +}
>> +
>> +/* Pack a float16 from parts, but do not canonicalize.  */
>> +static inline float16 float16_pack_raw(decomposed_parts p)
>> +{
>> +    uint32_t ret = p.frac;
>> +    ret = deposit32(ret, 10, 5, p.exp);
>> +    ret = deposit32(ret, 15, 1, p.sign);
>> +    return make_float16(ret);
>> +}
>> +
>> +/* Pack a float32 from parts, but do not canonicalize.  */
>> +static inline float32 float32_pack_raw(decomposed_parts p)
>> +{
>> +    uint32_t ret = p.frac;
>> +    ret = deposit32(ret, 23, 8, p.exp);
>> +    ret = deposit32(ret, 31, 1, p.sign);
>> +    return make_float32(ret);
>> +}
>> +
>> +/* Pack a float64 from parts, but do not canonicalize.  */
>> +static inline float64 float64_pack_raw(decomposed_parts p)
>> +{
>> +    uint64_t ret = p.frac;
>> +    ret = deposit64(ret, 52, 11, p.exp);
>> +    ret = deposit64(ret, 63, 1, p.sign);
>> +    return make_float64(ret);
>> +}
>> +
>> +/* Canonicalize EXP and FRAC, setting CLS.  */
>> +static decomposed_parts decomposed_canonicalize(decomposed_parts part,
>> +                                        const decomposed_params *parm,
>
> If you pick more compact names for your decomposed_params and
> decomposed_parts structs, you won't have such awkwardness trying
> to format function prototypes. (checkpatch complains that you have
> an overlong line somewhere in this patch for this reason.)
>
> In particular "decomposed_params" I think should change -- it's
> confusingly similar to decomposed_parts, and it isn't really
> a decomposed anything. It's just a collection of useful information
> describing the float format. Try 'fmtinfo', maybe?

I've gone for FloatParts and FloatParams

>
> I see we're passing and returning decomposed_parts structs everywhere
> rather than pointers to them. How well does that compile? (I guess
> everything ends up inlining...)

Yes - if you use the bitfield struct. Without it you end up with quite a
messy preamble.


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v2 11/20] fpu/softfloat: re-factor add/sub
  2018-01-18 16:43     ` Alex Bennée
@ 2018-01-18 16:47       ` Richard Henderson
  0 siblings, 0 replies; 68+ messages in thread
From: Richard Henderson @ 2018-01-18 16:47 UTC (permalink / raw)
  To: Alex Bennée, Peter Maydell
  Cc: Laurent Vivier, bharata, Andrew Dutcher, QEMU Developers, Aurelien Jarno

On 01/18/2018 08:43 AM, Alex Bennée wrote:
>   /* Expand fields based on the size of exponent and fraction */
>   #define FRAC_PARAMS(E, F)                                            \
>       .exp_size       = E,                                             \
>       .frac_size      = F,                                             \
>       .frac_shift     = DECOMPOSED_BINARY_POINT - F,                   \
>       .frac_lsb       = 1ull << (DECOMPOSED_BINARY_POINT - F),         \
>       .frac_lsbm1     = 1ull << ((DECOMPOSED_BINARY_POINT - F) - 1),   \
>       .round_mask     = (1ull << (DECOMPOSED_BINARY_POINT - F)) - 1,   \
>       .roundeven_mask = (2ull << (DECOMPOSED_BINARY_POINT - F)) - 1
> 
>   static const FloatFmt float16_params = {
>       .exp_bias       = 0x0f,
>       .exp_max        = 0x1f,
>       FRAC_PARAMS(5, 10)
>   };

You can compute exp_bias and exp_max from E as well.


r~


* Re: [Qemu-devel] [PATCH v2 10/20] fpu/softfloat: define decompose structures
  2018-01-18 15:17             ` Peter Maydell
@ 2018-01-23 12:00               ` Alex Bennée
  0 siblings, 0 replies; 68+ messages in thread
From: Alex Bennée @ 2018-01-23 12:00 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Philippe Mathieu-Daudé,
	Richard Henderson, Francisco Iglesias, Laurent Vivier,
	QEMU Developers, Andrew Dutcher, bharata, Aurelien Jarno


Peter Maydell <peter.maydell@linaro.org> writes:

> On 18 January 2018 at 14:59, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
>> My comment was for a previous line:
>>
>>   uint64_t frac   : 64;
>>
>> I don't have enough compiler knowledge to be sure how this bitfield is
>> interpreted by the compiler. I understood the standard as bitfields are
>> for 'unsigned', and for IL32 we have sizeof(unsigned) = 32, so I wonder
>> how a :64 bitfield ends (bits >= 32 silently truncated?).
>
> Defining a 64-bit bitfield is a bit pointless (why not just use
> uint64_t?) but there's nothing particularly different for IL32P64 here.
> The spec says the underlying type is _Bool, signed int, unsigned
> int, or an implementation-defined type. For QEMU's hosts 'int'
> is always 32 bits, so if gcc and clang allow bitfields on a
> 64-bit type like uint64_t (as an impdef extension) then they
> should work on all hosts. (In any case it needs to either work
> or give a compiler error, silent truncation isn't an option.)

Using explicit size types and an attribute on FloatClass seemed to be
enough:

/*
 * Classify a floating point number. Everything above float_class_qnan
 * is a NaN so cls >= float_class_qnan is any NaN.
 */

typedef enum __attribute__ ((__packed__)) {
    float_class_unclassified,
    float_class_zero,
    float_class_normal,
    float_class_inf,
    float_class_qnan,  /* all NaNs from here */
    float_class_snan,
    float_class_dnan,
    float_class_msnan, /* maybe silenced */
} FloatClass;

/*
 * Structure holding all of the decomposed parts of a float. The
 * exponent is unbiased and the fraction is normalized. All
 * calculations are done with a 64 bit fraction and then rounded as
 * appropriate for the final format.
 *
 * Thanks to the packed FloatClass a decent compiler should be able to
 * fit the whole structure into registers and avoid using the stack
 * for parameter passing.
 */

typedef struct {
    uint64_t frac;
    int32_t  exp;
    FloatClass cls;
    bool sign;
} FloatParts;

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v2 11/20] fpu/softfloat: re-factor add/sub
  2018-01-12 15:57   ` Peter Maydell
  2018-01-12 18:30     ` Richard Henderson
  2018-01-18 16:43     ` Alex Bennée
@ 2018-01-23 20:05     ` Alex Bennée
  2 siblings, 0 replies; 68+ messages in thread
From: Alex Bennée @ 2018-01-23 20:05 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno


Peter Maydell <peter.maydell@linaro.org> writes:

<snip>
>
>> +                                        float_status *status)
>> +{
>> +    if (part.exp == parm->exp_max) {
>> +        if (part.frac == 0) {
>> +            part.cls = float_class_inf;
>> +        } else {
>> +#ifdef NO_SIGNALING_NANS
>
> The old code didn't seem to need to ifdef this; why's the new
> code different? (at some point we'll want to make this a runtime
> setting so we can support one binary handling CPUs with it both
> set and unset, but that is a far future thing we can ignore for now)

It does, but it's hidden behind propagateFloatXXNaN which in turn calls
floatXX_is_quiet/signalling_nan which are altered by the #ifdefs.

>
>> +            part.cls = float_class_qnan;
>> +#else
>> +            int64_t msb = part.frac << (parm->frac_shift + 2);
>> +            if ((msb < 0) == status->snan_bit_is_one) {
>> +                part.cls = float_class_snan;
>> +            } else {
>> +                part.cls = float_class_qnan;
>> +            }
>> +#endif
>> +        }
>> +    } else if (part.exp == 0) {
>> +        if (likely(part.frac == 0)) {
>> +            part.cls = float_class_zero;
>> +        } else if (status->flush_inputs_to_zero) {
>> +            float_raise(float_flag_input_denormal, status);
>> +            part.cls = float_class_zero;
>> +            part.frac = 0;
>> +        } else {
>> +            int shift = clz64(part.frac) - 1;
>> +            part.cls = float_class_normal;
>
> This is really confusing. This is a *denormal*, but we're setting
> the classification to "normal" ? (It's particularly confusing in
> the code that uses the decomposed numbers, because it looks like
> "if (a.cls == float_class_normal...)" is handling the normal-number
> case and denormals are going to be in a later if branch, but actually
> it's dealing with both.)

The code deals with canonicalized numbers - so unless we explicitly
flush denormals to zero they can be treated like any other for the rest
of the code.

What would you prefer? A comment in FloatClass?

<snip>
>> +
>> +static float16 float16_round_pack_canonical(decomposed_parts p, float_status *s)
>> +{
>> +    switch (p.cls) {
>> +    case float_class_dnan:
>> +        return float16_default_nan(s);
>> +    case float_class_msnan:
>> +        return float16_maybe_silence_nan(float16_pack_raw(p), s);
>
> I think you will find that doing the silencing of the NaNs like this
> isn't quite the right approach. Specifically, for Arm targets we
> currently have a bug in float-to-float conversion from a wider
> format to a narrower one when the input is a signaling NaN that we
> want to silence, and its non-zero mantissa bits are all at the
> less-significant end of the mantissa such that they don't fit into
> the narrower format. If you pack the float into a float16 first and
> then call maybe_silence_nan() on it you've lost the info about those
> low bits which the silence function needs to know to return the
> right answer. What you want to do instead is pass the silence_nan
> function the decomposed value.

So this is an inherited bug from softfloat-specialize.h? I guess we need
a common decomposed specialisation we can use for all the sizes.

>
> (The effect of this bug is that we return a default NaN, with the
> sign bit clear, but the Arm FPConvertNaN pseudocode says that we
> should effectively get the default NaN but with the same sign bit
> as the input SNaN.)
>
> Given that this is a bug currently in the version we have, we don't
> necessarily need to fix it now, but I thought I'd mention it since
> the redesign has almost but not quite managed to deliver the right
> information to the silencing code to allow us to fix it soon :-)

So a comment for now? Currently all the decomposed information is kept
internal to softfloat.c - I'm not sure we want to expose the internals
to a wider audience, especially as the inline helpers in specialize.h
are also used by target helpers.

<snip>
>> +
>> +
>> +/*
>> + * Returns the result of adding the absolute values of the
>> + * floating-point values `a' and `b'. If `subtract' is set, the sum is
>> + * negated before being returned. `subtract' is ignored if the result
>> + * is a NaN. The addition is performed according to the IEC/IEEE
>> + * Standard for Binary Floating-Point Arithmetic.
>> + */
>
> This comment doesn't seem to match what the code is doing,
> because it says it adds the absolute values of 'a' and 'b',
> but the code looks at a_sign and b_sign to decide whether it's
> doing an addition or subtraction rather than ignoring the signs
> (as you would for absolute arithmetic).
>
> Put another way, this comment has been copied from the old addFloat64Sigs()
> and not updated to account for the way the new function includes handling
> of subFloat64Sigs().
>
>> +
>> +static decomposed_parts add_decomposed(decomposed_parts a, decomposed_parts b,
>> +                                       bool subtract, float_status *s)
>> +{
>> +    bool a_sign = a.sign;
>> +    bool b_sign = b.sign ^ subtract;
>> +
>> +    if (a_sign != b_sign) {
>> +        /* Subtraction */
>> +
>> +        if (a.cls == float_class_normal && b.cls == float_class_normal) {
>> +            int a_exp = a.exp;
>> +            int b_exp = b.exp;
>> +            uint64_t a_frac = a.frac;
>> +            uint64_t b_frac = b.frac;
>
> Do we really have to use locals here rather than just using a.frac,
> b.frac etc in place ? If we trust the compiler enough to throw
> structs in and out of functions and let everything inline, it
> ought to be able to handle a uint64_t in a struct local variable.

Fixed.

>> +        if (a.cls >= float_class_qnan
>> +            ||
>> +            b.cls >= float_class_qnan) {
>
> We should have helper functions for "is some kind of NaN" rather than
> baking the assumption about order of the enum values directly
> into every function. (Also "float_is_any_nan(a)" is easier to read.)

if (is_nan(a.cls) || is_nan(b.cls))
>> +float64 float64_sub(float64 a, float64 b, float_status *status)
>> +{
>> +    decomposed_parts pa = float64_unpack_canonical(a, status);
>> +    decomposed_parts pb = float64_unpack_canonical(b, status);
>> +    decomposed_parts pr = add_decomposed(pa, pb, true, status);
>> +
>> +    return float64_round_pack_canonical(pr, status);
>> +}
>
> This part is a pretty good advert for the benefits of the refactoring...
>
> I'm not particularly worried about the performance of softfloat,
> but out of curiosity have you benchmarked the old vs new?

Not yet but I can run some with my vector kernel benchmark.

>
> thanks
> -- PMM


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v2 18/20] fpu/softfloat: re-factor scalbn
  2018-01-12 16:31   ` Peter Maydell
@ 2018-01-24 12:03     ` Alex Bennée
  0 siblings, 0 replies; 68+ messages in thread
From: Alex Bennée @ 2018-01-24 12:03 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno


Peter Maydell <peter.maydell@linaro.org> writes:

> On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
>> This is one of the simpler manipulations you could make to a floating
>> point number.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  fpu/softfloat.c         | 104 +++++++++++++++---------------------------------
>>  include/fpu/softfloat.h |   1 +
>>  2 files changed, 32 insertions(+), 73 deletions(-)
>>
>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>> index bb68d77f72..3647f6ca03 100644
>> --- a/fpu/softfloat.c
>> +++ b/fpu/softfloat.c
>> @@ -1663,6 +1663,37 @@ float64 uint16_to_float64(uint16_t a, float_status *status)
>>      return uint64_to_float64(a, status);
>>  }
>>
>> +/* Multiply A by 2 raised to the power N.  */
>> +static decomposed_parts scalbn_decomposed(decomposed_parts a, int n,
>> +                                          float_status *s)
>> +{
>> +    if (a.cls == float_class_normal) {
>> +        a.exp += n;
>> +    }
>> +    return a;
>> +}
>> +
>> +float16 float16_scalbn(float16 a, int n, float_status *status)
>> +{
>> +    decomposed_parts pa = float16_unpack_canonical(a, status);
>> +    decomposed_parts pr = scalbn_decomposed(pa, n, status);
>> +    return float16_round_pack_canonical(pr, status);
>> +}
>> +
>> +float32 float32_scalbn(float32 a, int n, float_status *status)
>> +{
>> +    decomposed_parts pa = float32_unpack_canonical(a, status);
>> +    decomposed_parts pr = scalbn_decomposed(pa, n, status);
>> +    return float32_round_pack_canonical(pr, status);
>> +}
>> +
>> +float64 float64_scalbn(float64 a, int n, float_status *status)
>> +{
>> +    decomposed_parts pa = float64_unpack_canonical(a, status);
>> +    decomposed_parts pr = scalbn_decomposed(pa, n, status);
>> +    return float64_round_pack_canonical(pr, status);
>> +}
>
> The old code used propagateFloat32NaN(a, a, status) if the
> input was a NaN, to cause us to raise the invalid flag,
> maybe return a default NaN, maybe silence the NaN. I can't
> see where the new code is doing this?

The invalid flag and the msnan marking are handled during the unpack
stage when the input is canonicalized. NaNs then raise their signals
when we round and pack the result.

>
> thanks
> -- PMM


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v2 14/20] fpu/softfloat: re-factor muladd
  2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 14/20] fpu/softfloat: re-factor muladd Alex Bennée
@ 2018-02-13 15:15   ` Peter Maydell
  0 siblings, 0 replies; 68+ messages in thread
From: Peter Maydell @ 2018-02-13 15:15 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Richard Henderson, Laurent Vivier, bharata, Andrew Dutcher,
	QEMU Developers, Aurelien Jarno

On 9 January 2018 at 12:22, Alex Bennée <alex.bennee@linaro.org> wrote:
> We can now add float16_muladd and use the common decompose and
> canonicalize functions to have a single implementation for
> float16/32/64 muladd functions.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


end of thread, other threads:[~2018-02-13 15:16 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-09 12:22 [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions Alex Bennée
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 01/20] fpu/softfloat: implement float16_squash_input_denormal Alex Bennée
2018-01-12 13:41   ` Peter Maydell
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 02/20] include/fpu/softfloat: remove USE_SOFTFLOAT_STRUCT_TYPES Alex Bennée
2018-01-09 12:27   ` Laurent Vivier
2018-01-09 14:12     ` Aurelien Jarno
2018-01-09 14:14       ` Peter Maydell
2018-01-09 14:20         ` Laurent Vivier
2018-01-09 14:43           ` Peter Maydell
2018-01-09 16:45             ` Richard Henderson
2018-01-09 15:25           ` Alex Bennée
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 03/20] include/fpu/softfloat: implement float16_abs helper Alex Bennée
2018-01-12 13:42   ` Peter Maydell
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 04/20] include/fpu/softfloat: implement float16_chs helper Alex Bennée
2018-01-12 13:43   ` Peter Maydell
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 05/20] include/fpu/softfloat: implement float16_set_sign helper Alex Bennée
2018-01-12 13:43   ` Peter Maydell
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 06/20] include/fpu/softfloat: add some float16 constants Alex Bennée
2018-01-09 13:27   ` Philippe Mathieu-Daudé
2018-01-09 15:16     ` Alex Bennée
2018-01-12 13:47   ` Peter Maydell
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 07/20] fpu/softfloat: propagate signalling NaNs in MINMAX Alex Bennée
2018-01-12 14:04   ` Peter Maydell
2018-01-16 11:31     ` Alex Bennée
2018-01-16 11:53       ` Alex Bennée
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 08/20] fpu/softfloat: improve comments on ARM NaN propagation Alex Bennée
2018-01-12 14:07   ` Peter Maydell
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 09/20] fpu/softfloat: move the extract functions to the top of the file Alex Bennée
2018-01-12 14:07   ` Peter Maydell
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 10/20] fpu/softfloat: define decompose structures Alex Bennée
2018-01-09 17:01   ` Richard Henderson
2018-01-12 14:22   ` Peter Maydell
2018-01-12 16:21   ` Philippe Mathieu-Daudé
2018-01-18 13:08     ` Alex Bennée
2018-01-18 14:26       ` Philippe Mathieu-Daudé
2018-01-18 14:31         ` Peter Maydell
2018-01-18 14:59           ` Philippe Mathieu-Daudé
2018-01-18 15:17             ` Peter Maydell
2018-01-23 12:00               ` Alex Bennée
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 11/20] fpu/softfloat: re-factor add/sub Alex Bennée
2018-01-12 15:57   ` Peter Maydell
2018-01-12 18:30     ` Richard Henderson
2018-01-18 16:43     ` Alex Bennée
2018-01-18 16:47       ` Richard Henderson
2018-01-23 20:05     ` Alex Bennée
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 12/20] fpu/softfloat: re-factor mul Alex Bennée
2018-01-09 12:43   ` Philippe Mathieu-Daudé
2018-01-12 16:17   ` Peter Maydell
2018-01-16 10:16     ` Alex Bennée
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 13/20] fpu/softfloat: re-factor div Alex Bennée
2018-01-12 16:22   ` Peter Maydell
2018-01-12 18:35     ` Richard Henderson
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 14/20] fpu/softfloat: re-factor muladd Alex Bennée
2018-02-13 15:15   ` Peter Maydell
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 15/20] fpu/softfloat: re-factor round_to_int Alex Bennée
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 16/20] fpu/softfloat: re-factor float to int/uint Alex Bennée
2018-01-09 17:12   ` Richard Henderson
2018-01-12 16:36   ` Peter Maydell
2018-01-16 17:06   ` Alex Bennée
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 17/20] fpu/softfloat: re-factor int/uint to float Alex Bennée
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 18/20] fpu/softfloat: re-factor scalbn Alex Bennée
2018-01-12 16:31   ` Peter Maydell
2018-01-24 12:03     ` Alex Bennée
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 19/20] fpu/softfloat: re-factor minmax Alex Bennée
2018-01-09 17:16   ` Richard Henderson
2018-01-09 12:22 ` [Qemu-devel] [PATCH v2 20/20] fpu/softfloat: re-factor compare Alex Bennée
2018-01-09 17:18   ` Richard Henderson
2018-01-09 13:07 ` [Qemu-devel] [PATCH v2 00/20] re-factor softfloat and add fp16 functions no-reply
