* [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions
@ 2017-12-11 12:56 Alex Bennée
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 01/19] fpu/softfloat: implement float16_squash_input_denormal Alex Bennée
                   ` (19 more replies)
  0 siblings, 20 replies; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:56 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée

Hi,

In my previous attempt at this I simply took the existing float32
functions and copied and pasted the code, changing the relevant
constants. Apart from the usual typos and missed bits, there were
sections where softfloat pulls tricks because it knows the exact bit
positions of things. While I'm sure that is marginally faster, it
makes the code rather impenetrable to anyone not familiar with how
SoftFloat does things. One thing the last few months have taught me is
that the world is not awash with experts on the finer implementation
details of floating point maths. After reviewing the last series,
Richard Henderson suggested a different approach which pushes most of
the code into common shared functions. The majority of the work on the
fractional bits is done at 64-bit resolution, which leaves plenty of
spare bits for rounding for everything from float16 to float64. This
series is the result of that work and of a coding sprint we did two
weeks ago in Cambridge.
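
To make the "spare bits" point concrete, here is a minimal sketch
rather than code from the series. It assumes the working fraction keeps
its binary point at bit 62, which is what DECOMPOSED_BINARY_POINT in
patch 09 defines; spare_bits() and widen_fraction() are made-up names
used only for illustration.

```c
#include <stdint.h>

/* Assumed position of the binary point in the 64-bit working fraction. */
enum { WORK_BINARY_POINT = 62 };

/* Working bits that sit below the format's least significant fraction
 * bit and are therefore free to hold rounding information. */
int spare_bits(int frac_bits)
{
    return WORK_BINARY_POINT - frac_bits;
}

/* Left-align a raw fraction field just under the binary point and add
 * the implicit leading one of a normal number. */
uint64_t widen_fraction(uint64_t raw_frac, int frac_bits)
{
    return (UINT64_C(1) << WORK_BINARY_POINT)
           | (raw_frac << spare_bits(frac_bits));
}
```

With 10, 23 and 52 fraction bits this gives 52, 39 and 10 spare bits
for float16, float32 and float64 respectively.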

We've not touched anything that needs higher precision which at the
moment is float80 and 128 bit quad precision operations. They would
need similar decomposed routines to operate on the higher precision
fractional parts. I suspect we'd need to beef up our Int128 wrapper in
the process so it can be done efficiently with 128 bit maths.

This work is part of the larger chunk of adding half-precision ops to
the ARM front-end. However I've split the series up to make for a less
messy review. This tree can be found at:

  https://github.com/stsquad/qemu/tree/softfloat-refactor-and-fp16-v1

While I have been testing the half-precision stuff in the ARM
specific tree this series is all common code. It has however been
tested with ARM RISU which exercises the float32/64 code paths quite
nicely.

Any additional testing appreciated.

Series Breakdown
----------------

The first five patches are simple helper functions that are mostly
inline and there for the benefit of architecture helper functions.
This includes the float16 constants in the fifth patch.

The next two patches fix a bug in NaN propagation which only showed
up when doing ARM "Reduction" operations in float16. Although the
minmax code is totally replaced later on, I wanted to fix it in place
first rather than add the fix when it was re-written.

The next two patches start preparing the ground for the new decomposed
functions and their public APIs. I've used macro expansion in a few
places to cut down on repeated boilerplate for these APIs. Most of the
work is done in the static decompose_foo functions.

As you can see in the diffstat there is an overall code reduction even
though we have also added float16 support. For reference the previous
attempt added 1258 lines of code to implement a subset of the float16
functions. I think the code is also a lot easier to follow and reason
about.

Alex Bennée (19):
  fpu/softfloat: implement float16_squash_input_denormal
  include/fpu/softfloat: implement float16_abs helper
  include/fpu/softfloat: implement float16_chs helper
  include/fpu/softfloat: implement float16_set_sign helper
  include/fpu/softfloat: add some float16 contants
  fpu/softfloat: propagate signalling NaNs in MINMAX
  fpu/softfloat: improve comments on ARM NaN propagation
  fpu/softfloat: move the extract functions to the top of the file
  fpu/softfloat: define decompose structures
  fpu/softfloat: re-factor add/sub
  fpu/softfloat: re-factor mul
  fpu/softfloat: re-factor div
  fpu/softfloat: re-factor muladd
  fpu/softfloat: re-factor round_to_int
  fpu/softfloat: re-factor float to int/uint
  fpu/softfloat: re-factor int/uint to float
  fpu/softfloat: re-factor scalbn
  fpu/softfloat: re-factor minmax
  fpu/softfloat: re-factor compare

 fpu/softfloat-macros.h     |   44 +
 fpu/softfloat-specialize.h |  115 +-
 fpu/softfloat.c            | 6668 ++++++++++++++++++++------------------------
 include/fpu/softfloat.h    |   89 +-
 4 files changed, 3066 insertions(+), 3850 deletions(-)

-- 
2.15.1

* [Qemu-devel] [PATCH v1 01/19] fpu/softfloat: implement float16_squash_input_denormal
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
@ 2017-12-11 12:56 ` Alex Bennée
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 02/19] include/fpu/softfloat: implement float16_abs helper Alex Bennée
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:56 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

This will be required when expanding the MINMAX() macro for 16
bit/half-precision operations.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c         | 15 +++++++++++++++
 include/fpu/softfloat.h |  1 +
 2 files changed, 16 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 433c5dad2d..3a4ab1355f 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -3488,6 +3488,21 @@ static float16 roundAndPackFloat16(flag zSign, int zExp,
     return packFloat16(zSign, zExp, zSig >> 13);
 }
 
+/*----------------------------------------------------------------------------
+| If `a' is denormal and we are in flush-to-zero mode then set the
+| input-denormal exception and return zero. Otherwise just return the value.
+*----------------------------------------------------------------------------*/
+float16 float16_squash_input_denormal(float16 a, float_status *status)
+{
+    if (status->flush_inputs_to_zero) {
+        if (extractFloat16Exp(a) == 0 && extractFloat16Frac(a) != 0) {
+            float_raise(float_flag_input_denormal, status);
+            return make_float16(float16_val(a) & 0x8000);
+        }
+    }
+    return a;
+}
+
 static void normalizeFloat16Subnormal(uint32_t aSig, int *zExpPtr,
                                       uint32_t *zSigPtr)
 {
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 0f96a0edd1..d5e99667b6 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -277,6 +277,7 @@ void float_raise(uint8_t flags, float_status *status);
 | If `a' is denormal and we are in flush-to-zero mode then set the
 | input-denormal exception and return zero. Otherwise just return the value.
 *----------------------------------------------------------------------------*/
+float16 float16_squash_input_denormal(float16 a, float_status *status);
 float32 float32_squash_input_denormal(float32 a, float_status *status);
 float64 float64_squash_input_denormal(float64 a, float_status *status);
 
-- 
2.15.1
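
For readers without the QEMU tree to hand, the behaviour of the new
helper can be sketched on raw 16-bit patterns. This is only an
illustration: demo_status is a hypothetical stand-in for the two pieces
of float_status involved, and demo_squash_input_denormal() is not the
QEMU function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant parts of float_status. */
typedef struct {
    bool flush_inputs_to_zero;   /* mode bit */
    bool input_denormal_raised;  /* sticky exception flag */
} demo_status;

/* A half-precision value is denormal when its 5-bit exponent field is
 * zero and its 10-bit fraction field is not. In flush mode such an
 * input becomes a zero with the same sign and the flag is raised. */
uint16_t demo_squash_input_denormal(uint16_t a, demo_status *s)
{
    bool is_denormal = (a & 0x7c00) == 0 && (a & 0x03ff) != 0;

    if (s->flush_inputs_to_zero && is_denormal) {
        s->input_denormal_raised = true;
        return a & 0x8000;       /* keep only the sign bit */
    }
    return a;
}
```

Normal numbers, zeros, infinities and NaNs pass through untouched, as
do denormals when the mode bit is clear.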

* [Qemu-devel] [PATCH v1 02/19] include/fpu/softfloat: implement float16_abs helper
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 01/19] fpu/softfloat: implement float16_squash_input_denormal Alex Bennée
@ 2017-12-11 12:56 ` Alex Bennée
  2017-12-15 11:35   ` Philippe Mathieu-Daudé
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 03/19] include/fpu/softfloat: implement float16_chs helper Alex Bennée
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:56 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

This will be required when expanding the MINMAX() macro for 16
bit/half-precision operations.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index d5e99667b6..edf402d422 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -374,6 +374,13 @@ static inline int float16_is_zero_or_denormal(float16 a)
     return (float16_val(a) & 0x7c00) == 0;
 }
 
+static inline float16 float16_abs(float16 a)
+{
+    /* Note that abs does *not* handle NaN specially, nor does
+     * it flush denormal inputs to zero.
+     */
+    return make_float16(float16_val(a) & 0x7fff);
+}
 /*----------------------------------------------------------------------------
 | The pattern for a default generated half-precision NaN.
 *----------------------------------------------------------------------------*/
-- 
2.15.1

* [Qemu-devel] [PATCH v1 03/19] include/fpu/softfloat: implement float16_chs helper
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 01/19] fpu/softfloat: implement float16_squash_input_denormal Alex Bennée
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 02/19] include/fpu/softfloat: implement float16_abs helper Alex Bennée
@ 2017-12-11 12:56 ` Alex Bennée
  2017-12-18 21:41   ` Richard Henderson
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 04/19] include/fpu/softfloat: implement float16_set_sign helper Alex Bennée
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:56 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 include/fpu/softfloat.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index edf402d422..32036382c6 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -381,6 +381,15 @@ static inline float16 float16_abs(float16 a)
      */
     return make_float16(float16_val(a) & 0x7fff);
 }
+
+static inline float16 float16_chs(float16 a)
+{
+    /* Note that chs does *not* handle NaN specially, nor does
+     * it flush denormal inputs to zero.
+     */
+    return make_float16(float16_val(a) ^ 0x8000);
+}
+
 /*----------------------------------------------------------------------------
 | The pattern for a default generated half-precision NaN.
 *----------------------------------------------------------------------------*/
-- 
2.15.1

* [Qemu-devel] [PATCH v1 04/19] include/fpu/softfloat: implement float16_set_sign helper
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (2 preceding siblings ...)
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 03/19] include/fpu/softfloat: implement float16_chs helper Alex Bennée
@ 2017-12-11 12:56 ` Alex Bennée
  2017-12-18 21:44   ` Richard Henderson
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 05/19] include/fpu/softfloat: add some float16 contants Alex Bennée
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:56 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 include/fpu/softfloat.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 32036382c6..17dfe60dbd 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -390,6 +390,11 @@ static inline float16 float16_chs(float16 a)
     return make_float16(float16_val(a) ^ 0x8000);
 }
 
+static inline float16 float16_set_sign(float16 a, int sign)
+{
+    return make_float16((float16_val(a) & 0x7fff) | (sign << 15));
+}
+
 /*----------------------------------------------------------------------------
 | The pattern for a default generated half-precision NaN.
 *----------------------------------------------------------------------------*/
-- 
2.15.1
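
The three sign helpers (abs, chs, set_sign) only ever touch bit 15, so
they can be sketched on plain uint16_t values. The demo_ names below
are hypothetical. Unlike the patch, this sketch masks the sign argument
to one bit, which is a defensive choice of mine and not something the
series does.

```c
#include <stdint.h>

/* Clear the sign bit: NaNs and denormals are not treated specially. */
uint16_t demo_f16_abs(uint16_t a)
{
    return a & 0x7fff;
}

/* Flip the sign bit. */
uint16_t demo_f16_chs(uint16_t a)
{
    return a ^ 0x8000;
}

/* Force the sign bit to the low bit of `sign'. */
uint16_t demo_f16_set_sign(uint16_t a, int sign)
{
    return (uint16_t)((a & 0x7fff) | ((sign & 1) << 15));
}
```

Because only the sign bit changes, a NaN keeps its payload when passed
through any of these.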

* [Qemu-devel] [PATCH v1 05/19] include/fpu/softfloat: add some float16 contants
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (3 preceding siblings ...)
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 04/19] include/fpu/softfloat: implement float16_set_sign helper Alex Bennée
@ 2017-12-11 12:56 ` Alex Bennée
  2017-12-15 12:24   ` Alex Bennée
  2017-12-15 13:37   ` Philippe Mathieu-Daudé
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 06/19] fpu/softfloat: propagate signalling NaNs in MINMAX Alex Bennée
                   ` (14 subsequent siblings)
  19 siblings, 2 replies; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:56 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

This defines the same set of common constants for float16 as are
defined for 32-bit and 64-bit floats. These are often used by target
helper functions.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 include/fpu/softfloat.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 17dfe60dbd..5a9258c57c 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -395,6 +395,13 @@ static inline float16 float16_set_sign(float16 a, int sign)
     return make_float16((float16_val(a) & 0x7fff) | (sign << 15));
 }
 
+#define float16_zero make_float16(0)
+#define float16_one make_float16(0x3c00)
+#define float16_ln2 make_float16(0x398c)
+#define float16_pi make_float16(0x4248)
+#define float16_half make_float16(0x3800)
+#define float16_infinity make_float16(0x7c00)
+
 /*----------------------------------------------------------------------------
 | The pattern for a default generated half-precision NaN.
 *----------------------------------------------------------------------------*/
-- 
2.15.1
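
When adding constants like these it can help to decode the bit patterns
and check them against the intended values. The following is a minimal
sketch of such a check, assuming the standard IEEE 754 binary16 layout
(1 sign bit, 5 exponent bits with a bias of 15, 10 fraction bits).
demo_f16_to_double() and demo_pow2() are hypothetical helpers, not part
of QEMU.

```c
#include <math.h>    /* only for the INFINITY and NAN macros */
#include <stdint.h>

/* Exact power of two as a double, computed without calling into libm. */
static double demo_pow2(int e)
{
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r *= 0.5;
    return r;
}

/* Decode a half-precision bit pattern into a double. */
double demo_f16_to_double(uint16_t h)
{
    int sign = h >> 15;
    int exp = (h >> 10) & 0x1f;
    int frac = h & 0x3ff;
    double v;

    if (exp == 0x1f) {
        v = frac ? NAN : INFINITY;               /* NaN or infinity */
    } else if (exp == 0) {
        v = frac * demo_pow2(-24);               /* zero or denormal */
    } else {
        v = (1024 + frac) * demo_pow2(exp - 25); /* (1 + frac/1024) * 2^(exp-15) */
    }
    return sign ? -v : v;
}
```

For example, 0x3c00 decodes to 1.0, 0x3800 to 0.5 and 0x7c00 to
+infinity.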

* [Qemu-devel] [PATCH v1 06/19] fpu/softfloat: propagate signalling NaNs in MINMAX
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (4 preceding siblings ...)
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 05/19] include/fpu/softfloat: add some float16 contants Alex Bennée
@ 2017-12-11 12:56 ` Alex Bennée
  2017-12-18 21:53   ` Richard Henderson
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 07/19] fpu/softfloat: improve comments on ARM NaN propagation Alex Bennée
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:56 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

While a comparison between a QNaN and a number returns the number,
the same is not true of a signalling NaN. In that case the SNaN will
"win" and, after potentially raising an exception, it will be quietened.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v2
  - added return for propagateFloat
---
 fpu/softfloat.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 3a4ab1355f..44c043924e 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -7683,6 +7683,7 @@ int float128_compare_quiet(float128 a, float128 b, float_status *status)
  * minnum() and maxnum() functions. These are similar to the min()
  * and max() functions but if one of the arguments is a QNaN and
  * the other is numerical then the numerical argument is returned.
+ * SNaNs will get quietened before being returned.
  * minnum() and maxnum correspond to the IEEE 754-2008 minNum()
  * and maxNum() operations. min() and max() are the typical min/max
  * semantics provided by many CPUs which predate that specification.
@@ -7703,11 +7704,14 @@ static inline float ## s float ## s ## _minmax(float ## s a, float ## s b,     \
     if (float ## s ## _is_any_nan(a) ||                                 \
         float ## s ## _is_any_nan(b)) {                                 \
         if (isieee) {                                                   \
-            if (float ## s ## _is_quiet_nan(a, status) &&               \
+            if (float ## s ## _is_signaling_nan(a, status) ||           \
+                float ## s ## _is_signaling_nan(b, status)) {           \
+                return propagateFloat ## s ## NaN(a, b, status);        \
+            } else if (float ## s ## _is_quiet_nan(a, status) &&        \
                 !float ## s ##_is_any_nan(b)) {                         \
                 return b;                                               \
             } else if (float ## s ## _is_quiet_nan(b, status) &&        \
-                       !float ## s ## _is_any_nan(a)) {                \
+                       !float ## s ## _is_any_nan(a)) {                 \
                 return a;                                               \
             }                                                           \
         }                                                               \
-- 
2.15.1
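
The intended minNum behaviour can be sketched on raw float32 bit
patterns. Several things here are assumptions of mine and not taken
from the patch: the quiet bit is the top fraction bit (the IEEE
754-2008 convention), the host float is IEEE binary32, the first
signalling NaN wins (the real code defers to the target-specific
propagateFloat32NaN()), and signed zeros are not ordered. All names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

bool demo_is_any_nan(uint32_t a)
{
    return (a & 0x7fffffff) > 0x7f800000;
}

/* Assumes quiet NaNs have the top fraction bit (bit 22) set. */
bool demo_is_snan(uint32_t a)
{
    return demo_is_any_nan(a) && !(a & 0x00400000);
}

float demo_bits_to_float(uint32_t a)
{
    float f;
    memcpy(&f, &a, sizeof f);
    return f;
}

uint32_t demo_minnum(uint32_t a, uint32_t b)
{
    if (demo_is_any_nan(a) || demo_is_any_nan(b)) {
        if (demo_is_snan(a)) return a | 0x00400000; /* quieten and return */
        if (demo_is_snan(b)) return b | 0x00400000;
        if (!demo_is_any_nan(b)) return b;          /* a is a QNaN, b a number */
        return a;                                   /* b is a QNaN: a is the number, or also a QNaN */
    }
    return demo_bits_to_float(a) < demo_bits_to_float(b) ? a : b;
}
```

Here a signalling NaN input comes back quietened, whereas a quiet NaN
paired with a number yields the number.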

* [Qemu-devel] [PATCH v1 07/19] fpu/softfloat: improve comments on ARM NaN propagation
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (5 preceding siblings ...)
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 06/19] fpu/softfloat: propagate signalling NaNs in MINMAX Alex Bennée
@ 2017-12-11 12:56 ` Alex Bennée
  2017-12-18 21:54   ` Richard Henderson
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 08/19] fpu/softfloat: move the extract functions to the top of the file Alex Bennée
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:56 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

Mention the pseudo-code fragment on which this is based and correct
the spelling of signalling.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 fpu/softfloat-specialize.h | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h
index de2c5d5702..3d507d8c77 100644
--- a/fpu/softfloat-specialize.h
+++ b/fpu/softfloat-specialize.h
@@ -445,14 +445,15 @@ static float32 commonNaNToFloat32(commonNaNT a, float_status *status)
 
 #if defined(TARGET_ARM)
 static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
-                    flag aIsLargerSignificand)
+                   flag aIsLargerSignificand)
 {
-    /* ARM mandated NaN propagation rules: take the first of:
-     *  1. A if it is signaling
-     *  2. B if it is signaling
+    /* ARM mandated NaN propagation rules (see FPProcessNaNs()), take
+     * the first of:
+     *  1. A if it is signalling
+     *  2. B if it is signalling
      *  3. A (quiet)
      *  4. B (quiet)
-     * A signaling NaN is always quietened before returning it.
+     * A signalling NaN is always quietened before returning it.
      */
     if (aIsSNaN) {
         return 0;
-- 
2.15.1

* [Qemu-devel] [PATCH v1 08/19] fpu/softfloat: move the extract functions to the top of the file
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (6 preceding siblings ...)
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 07/19] fpu/softfloat: improve comments on ARM NaN propagation Alex Bennée
@ 2017-12-11 12:56 ` Alex Bennée
  2017-12-18 21:57   ` Richard Henderson
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 09/19] fpu/softfloat: define decompose structures Alex Bennée
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:56 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

This is pure code-motion during re-factoring as the helpers will be
needed earlier.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 fpu/softfloat.c | 119 +++++++++++++++++++++++++-------------------------------
 1 file changed, 53 insertions(+), 66 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 44c043924e..0850a78149 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -132,6 +132,59 @@ static inline flag extractFloat16Sign(float16 a)
     return float16_val(a)>>15;
 }
 
+/*----------------------------------------------------------------------------
+| Returns the fraction bits of the single-precision floating-point value `a'.
+*----------------------------------------------------------------------------*/
+
+static inline uint32_t extractFloat32Frac(float32 a)
+{
+    return float32_val(a) & 0x007FFFFF;
+}
+
+/*----------------------------------------------------------------------------
+| Returns the exponent bits of the single-precision floating-point value `a'.
+*----------------------------------------------------------------------------*/
+
+static inline int extractFloat32Exp(float32 a)
+{
+    return (float32_val(a) >> 23) & 0xFF;
+}
+
+/*----------------------------------------------------------------------------
+| Returns the sign bit of the single-precision floating-point value `a'.
+*----------------------------------------------------------------------------*/
+
+static inline flag extractFloat32Sign(float32 a)
+{
+    return float32_val(a) >> 31;
+}
+
+/*----------------------------------------------------------------------------
+| Returns the fraction bits of the double-precision floating-point value `a'.
+*----------------------------------------------------------------------------*/
+
+static inline uint64_t extractFloat64Frac(float64 a)
+{
+    return float64_val(a) & LIT64(0x000FFFFFFFFFFFFF);
+}
+
+/*----------------------------------------------------------------------------
+| Returns the exponent bits of the double-precision floating-point value `a'.
+*----------------------------------------------------------------------------*/
+
+static inline int extractFloat64Exp(float64 a)
+{
+    return (float64_val(a) >> 52) & 0x7FF;
+}
+
+/*----------------------------------------------------------------------------
+| Returns the sign bit of the double-precision floating-point value `a'.
+*----------------------------------------------------------------------------*/
+
+static inline flag extractFloat64Sign(float64 a)
+{
+    return float64_val(a) >> 63;
+}
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
@@ -299,39 +352,6 @@ static int64_t roundAndPackUint64(flag zSign, uint64_t absZ0,
     return absZ0;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the fraction bits of the single-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline uint32_t extractFloat32Frac( float32 a )
-{
-
-    return float32_val(a) & 0x007FFFFF;
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the exponent bits of the single-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline int extractFloat32Exp(float32 a)
-{
-
-    return ( float32_val(a)>>23 ) & 0xFF;
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the sign bit of the single-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline flag extractFloat32Sign( float32 a )
-{
-
-    return float32_val(a)>>31;
-
-}
-
 /*----------------------------------------------------------------------------
 | If `a' is denormal and we are in flush-to-zero mode then set the
 | input-denormal exception and return zero. Otherwise just return the value.
@@ -492,39 +512,6 @@ static float32
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the fraction bits of the double-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline uint64_t extractFloat64Frac( float64 a )
-{
-
-    return float64_val(a) & LIT64( 0x000FFFFFFFFFFFFF );
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the exponent bits of the double-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline int extractFloat64Exp(float64 a)
-{
-
-    return ( float64_val(a)>>52 ) & 0x7FF;
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the sign bit of the double-precision floating-point value `a'.
-*----------------------------------------------------------------------------*/
-
-static inline flag extractFloat64Sign( float64 a )
-{
-
-    return float64_val(a)>>63;
-
-}
-
 /*----------------------------------------------------------------------------
 | If `a' is denormal and we are in flush-to-zero mode then set the
 | input-denormal exception and return zero. Otherwise just return the value.
-- 
2.15.1

* [Qemu-devel] [PATCH v1 09/19] fpu/softfloat: define decompose structures
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (7 preceding siblings ...)
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 08/19] fpu/softfloat: move the extract functions to the top of the file Alex Bennée
@ 2017-12-11 12:56 ` Alex Bennée
  2017-12-18 21:59   ` Richard Henderson
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 10/19] fpu/softfloat: re-factor add/sub Alex Bennée
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:56 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

These structures pave the way for generic softfloat helper routines
that will operate on fully decomposed numbers.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 fpu/softfloat.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 71 insertions(+), 1 deletion(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 0850a78149..fe443ff234 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -83,7 +83,7 @@ this code that are retained.
  * target-dependent and needs the TARGET_* macros.
  */
 #include "qemu/osdep.h"
-
+#include "qemu/bitops.h"
 #include "fpu/softfloat.h"
 
 /* We only need stdlib for abort() */
@@ -185,6 +185,76 @@ static inline flag extractFloat64Sign(float64 a)
 {
     return float64_val(a) >> 63;
 }
+
+/*----------------------------------------------------------------------------
+| Classify a floating point number.
+*----------------------------------------------------------------------------*/
+
+typedef enum {
+    float_class_unclassified,
+    float_class_zero,
+    float_class_normal,
+    float_class_inf,
+    float_class_qnan,
+    float_class_snan,
+    float_class_dnan,
+    float_class_msnan, /* maybe silenced */
+} float_class;
+
+/*----------------------------------------------------------------------------
+| Structure holding all of the decomposed parts of a float.
+| The exponent is unbiased and the fraction is normalized.
+*----------------------------------------------------------------------------*/
+
+typedef struct {
+    uint64_t frac   : 64;
+    int exp         : 32;
+    float_class cls : 8;
+    int             : 23;
+    bool sign       : 1;
+} decomposed_parts;
+
+#define DECOMPOSED_BINARY_POINT    (64 - 2)
+#define DECOMPOSED_IMPLICIT_BIT    (1ull << DECOMPOSED_BINARY_POINT)
+#define DECOMPOSED_OVERFLOW_BIT    (DECOMPOSED_IMPLICIT_BIT << 1)
+
+/* Structure holding all of the relevant parameters for a format.  */
+typedef struct {
+    int exp_bias;
+    int exp_max;
+    int frac_shift;
+    uint64_t frac_lsb;
+    uint64_t frac_lsbm1;
+    uint64_t round_mask;
+    uint64_t roundeven_mask;
+} decomposed_params;
+
+#define FRAC_PARAMS(F)                     \
+    .frac_shift     = F,                   \
+    .frac_lsb       = 1ull << (F),         \
+    .frac_lsbm1     = 1ull << ((F) - 1),   \
+    .round_mask     = (1ull << (F)) - 1,   \
+    .roundeven_mask = (2ull << (F)) - 1
+
+static const decomposed_params float16_params = {
+    .exp_bias       = 0x0f,
+    .exp_max        = 0x1f,
+    FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 10)
+};
+
+static const decomposed_params float32_params = {
+    .exp_bias       = 0x7f,
+    .exp_max        = 0xff,
+    FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 23)
+};
+
+static const decomposed_params float64_params = {
+    .exp_bias       = 0x3ff,
+    .exp_max        = 0x7ff,
+    FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 52)
+};
+
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
-- 
2.15.1
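
As a rough illustration of what FRAC_PARAMS() produces, the sketch
below builds the same five fields for a given shift and uses them for a
round-to-nearest-even step. The rounding helper is an assumption about
how such masks are typically used, not code quoted from this series,
and frac_params, make_frac_params() and round_nearest_even() are
hypothetical names.

```c
#include <stdint.h>

typedef struct {
    int frac_shift;
    uint64_t frac_lsb;        /* lowest bit that survives rounding */
    uint64_t frac_lsbm1;      /* half of that bit */
    uint64_t round_mask;      /* all bits that get discarded */
    uint64_t roundeven_mask;  /* discarded bits plus the lsb */
} frac_params;

frac_params make_frac_params(int shift)
{
    frac_params p;
    p.frac_shift     = shift;
    p.frac_lsb       = UINT64_C(1) << shift;
    p.frac_lsbm1     = UINT64_C(1) << (shift - 1);
    p.round_mask     = (UINT64_C(1) << shift) - 1;
    p.roundeven_mask = (UINT64_C(2) << shift) - 1;
    return p;
}

/* Round to nearest, ties to even, then drop the discarded bits.
 * Adding half an lsb rounds correctly except for an exact tie above
 * an even lsb, which is the one pattern that must round down. */
uint64_t round_nearest_even(uint64_t frac, const frac_params *p)
{
    uint64_t inc =
        (frac & p->roundeven_mask) != p->frac_lsbm1 ? p->frac_lsbm1 : 0;

    if (frac & p->round_mask) {
        frac += inc;
    }
    return frac >> p->frac_shift;
}
```

For float16 the shift is 62 - 10 = 52, so an exact tie above an even
result stays put while a tie above an odd result rounds up to the next
even value.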

* [Qemu-devel] [PATCH v1 10/19] fpu/softfloat: re-factor add/sub
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (8 preceding siblings ...)
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 09/19] fpu/softfloat: define decompose structures Alex Bennée
@ 2017-12-11 12:56 ` Alex Bennée
  2017-12-18 22:18   ` Richard Henderson
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 11/19] fpu/softfloat: re-factor mul Alex Bennée
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:56 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

We can now add float16_add/sub and use the common decompose and
canonicalize functions to have a single implementation for
float16/32/64 add and sub functions.
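
The raw unpack/pack step is just bit-field extraction and deposit. A
self-contained sketch for float16 follows; demo_extract32() and
demo_deposit32() are local stand-ins modelled on the extract32() and
deposit32() helpers from qemu/bitops.h, and demo_parts is a simplified,
hypothetical version of decomposed_parts without the class field.

```c
#include <stdint.h>

/* Return `length' bits of `value' starting at bit `start' (1..32 bits). */
uint32_t demo_extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~UINT32_C(0) >> (32 - length));
}

/* Overwrite `length' bits of `value' at bit `start' with `field'. */
uint32_t demo_deposit32(uint32_t value, int start, int length, uint32_t field)
{
    uint32_t mask = (~UINT32_C(0) >> (32 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

typedef struct {
    uint64_t frac;
    int exp;
    int sign;
} demo_parts;

/* Split into sign, biased exponent and fraction; no canonicalization. */
demo_parts demo_f16_unpack_raw(uint16_t f)
{
    demo_parts p;
    p.sign = (int)demo_extract32(f, 15, 1);
    p.exp  = (int)demo_extract32(f, 10, 5);
    p.frac = demo_extract32(f, 0, 10);
    return p;
}

/* Reassemble the three fields into a 16-bit pattern. */
uint16_t demo_f16_pack_raw(demo_parts p)
{
    uint32_t ret = (uint32_t)p.frac;
    ret = demo_deposit32(ret, 10, 5, (uint32_t)p.exp);
    ret = demo_deposit32(ret, 15, 1, (uint32_t)p.sign);
    return (uint16_t)ret;
}
```

Packing the result of an unpack gives back the original bit pattern,
which makes for an easy round-trip check of the field positions.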

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 fpu/softfloat.c         | 903 +++++++++++++++++++++++++-----------------------
 include/fpu/softfloat.h |   4 +
 2 files changed, 480 insertions(+), 427 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index fe443ff234..f89e47e3ef 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -195,7 +195,7 @@ typedef enum {
     float_class_zero,
     float_class_normal,
     float_class_inf,
-    float_class_qnan,
+    float_class_qnan,  /* all NaNs from here */
     float_class_snan,
     float_class_dnan,
     float_class_msnan, /* maybe silenced */
@@ -254,6 +254,481 @@ static const decomposed_params float64_params = {
     FRAC_PARAMS(DECOMPOSED_BINARY_POINT - 52)
 };
 
+/* Unpack a float16 to parts, but do not canonicalize.  */
+static inline decomposed_parts float16_unpack_raw(float16 f)
+{
+    return (decomposed_parts){
+        .cls = float_class_unclassified,
+        .sign = extract32(f, 15, 1),
+        .exp = extract32(f, 10, 5),
+        .frac = extract32(f, 0, 10)
+    };
+}
+
+/* Unpack a float32 to parts, but do not canonicalize.  */
+static inline decomposed_parts float32_unpack_raw(float32 f)
+{
+    return (decomposed_parts){
+        .cls = float_class_unclassified,
+        .sign = extract32(f, 31, 1),
+        .exp = extract32(f, 23, 8),
+        .frac = extract32(f, 0, 23)
+    };
+}
+
+/* Unpack a float64 to parts, but do not canonicalize.  */
+static inline decomposed_parts float64_unpack_raw(float64 f)
+{
+    return (decomposed_parts){
+        .cls = float_class_unclassified,
+        .sign = extract64(f, 63, 1),
+        .exp = extract64(f, 52, 11),
+        .frac = extract64(f, 0, 52),
+    };
+}
+
+/* Pack a float16 from parts, but do not canonicalize.  */
+static inline float16 float16_pack_raw(decomposed_parts p)
+{
+    uint32_t ret = p.frac;
+    ret = deposit32(ret, 10, 5, p.exp);
+    ret = deposit32(ret, 15, 1, p.sign);
+    return make_float16(ret);
+}
+
+/* Pack a float32 from parts, but do not canonicalize.  */
+static inline float32 float32_pack_raw(decomposed_parts p)
+{
+    uint32_t ret = p.frac;
+    ret = deposit32(ret, 23, 8, p.exp);
+    ret = deposit32(ret, 31, 1, p.sign);
+    return make_float32(ret);
+}
+
+/* Pack a float64 from parts, but do not canonicalize.  */
+static inline float64 float64_pack_raw(decomposed_parts p)
+{
+    uint64_t ret = p.frac;
+    ret = deposit64(ret, 52, 11, p.exp);
+    ret = deposit64(ret, 63, 1, p.sign);
+    return make_float64(ret);
+}
+
+/* Canonicalize EXP and FRAC, setting CLS.  */
+static decomposed_parts decomposed_canonicalize(decomposed_parts part,
+                                        const decomposed_params *parm,
+                                        float_status *status)
+{
+    if (part.exp == parm->exp_max) {
+        if (part.frac == 0) {
+            part.cls = float_class_inf;
+        } else {
+#ifdef NO_SIGNALING_NANS
+            part.cls = float_class_qnan;
+#else
+            int64_t msb = part.frac << (parm->frac_shift + 2);
+            if ((msb < 0) == status->snan_bit_is_one) {
+                part.cls = float_class_snan;
+            } else {
+                part.cls = float_class_qnan;
+            }
+#endif
+        }
+    } else if (part.exp == 0) {
+        if (likely(part.frac == 0)) {
+            part.cls = float_class_zero;
+        } else if (status->flush_inputs_to_zero) {
+            float_raise(float_flag_input_denormal, status);
+            part.cls = float_class_zero;
+            part.frac = 0;
+        } else {
+            int shift = clz64(part.frac) - 1;
+            part.cls = float_class_normal;
+            part.exp = parm->frac_shift - parm->exp_bias - shift + 1;
+            part.frac <<= shift;
+        }
+    } else {
+        part.cls = float_class_normal;
+        part.exp -= parm->exp_bias;
+        part.frac = DECOMPOSED_IMPLICIT_BIT + (part.frac << parm->frac_shift);
+    }
+    return part;
+}
+
+/* Round and uncanonicalize a floating-point number by parts.
+   There are FRAC_SHIFT bits that may require rounding at the bottom
+   of the fraction; these bits will be removed.  The exponent will be
+   biased by EXP_BIAS and must be bounded by [EXP_MAX-1, 0].  */
+static decomposed_parts decomposed_round_canonical(decomposed_parts p,
+                                                   float_status *s,
+                                                   const decomposed_params *parm)
+{
+    const uint64_t frac_lsbm1 = parm->frac_lsbm1;
+    const uint64_t round_mask = parm->round_mask;
+    const uint64_t roundeven_mask = parm->roundeven_mask;
+    const int exp_max = parm->exp_max;
+    const int frac_shift = parm->frac_shift;
+    uint64_t frac, inc;
+    int exp, flags = 0;
+    bool overflow_norm;
+
+    frac = p.frac;
+    exp = p.exp;
+
+    switch (p.cls) {
+    case float_class_normal:
+        switch (s->float_rounding_mode) {
+        case float_round_nearest_even:
+            overflow_norm = false;
+            inc = ((frac & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
+            break;
+        case float_round_ties_away:
+            overflow_norm = false;
+            inc = frac_lsbm1;
+            break;
+        case float_round_to_zero:
+            overflow_norm = true;
+            inc = 0;
+            break;
+        case float_round_up:
+            inc = p.sign ? 0 : round_mask;
+            overflow_norm = p.sign;
+            break;
+        case float_round_down:
+            inc = p.sign ? round_mask : 0;
+            overflow_norm = !p.sign;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+
+        exp += parm->exp_bias;
+        if (likely(exp > 0)) {
+            if (frac & round_mask) {
+                flags |= float_flag_inexact;
+                frac += inc;
+                if (frac & DECOMPOSED_OVERFLOW_BIT) {
+                    frac >>= 1;
+                    exp++;
+                }
+            }
+            frac >>= frac_shift;
+
+            if (unlikely(exp >= exp_max)) {
+                flags |= float_flag_overflow | float_flag_inexact;
+                if (overflow_norm) {
+                    exp = exp_max - 1;
+                    frac = -1;
+                } else {
+                    p.cls = float_class_inf;
+                    goto do_inf;
+                }
+            }
+        } else if (s->flush_to_zero) {
+            flags |= float_flag_output_denormal;
+            p.cls = float_class_zero;
+            goto do_zero;
+        } else {
+            bool is_tiny = (s->float_detect_tininess
+                            == float_tininess_before_rounding)
+                        || (exp < 0)
+                        || !((frac + inc) & DECOMPOSED_OVERFLOW_BIT);
+
+            shift64RightJamming(frac, 1 - exp, &frac);
+            if (frac & round_mask) {
+                /* Need to recompute round-to-even.  */
+                if (s->float_rounding_mode == float_round_nearest_even) {
+                    inc = ((frac & roundeven_mask) != frac_lsbm1
+                           ? frac_lsbm1 : 0);
+                }
+                flags |= float_flag_inexact;
+                frac += inc;
+            }
+
+            exp = (frac & DECOMPOSED_IMPLICIT_BIT ? 1 : 0);
+            frac >>= frac_shift;
+
+            if (is_tiny && (flags & float_flag_inexact)) {
+                flags |= float_flag_underflow;
+            }
+            if (exp == 0 && frac == 0) {
+                p.cls = float_class_zero;
+            }
+        }
+        break;
+
+    case float_class_zero:
+    do_zero:
+        exp = 0;
+        frac = 0;
+        break;
+
+    case float_class_inf:
+    do_inf:
+        exp = exp_max;
+        frac = 0;
+        break;
+
+    case float_class_qnan:
+    case float_class_snan:
+        exp = exp_max;
+        break;
+
+    default:
+        g_assert_not_reached();
+    }
+
+    float_raise(flags, s);
+    p.exp = exp;
+    p.frac = frac;
+    return p;
+}
+
+static decomposed_parts float16_unpack_canonical(float16 f, float_status *s)
+{
+    return decomposed_canonicalize(float16_unpack_raw(f), &float16_params, s);
+}
+
+static float16 float16_round_pack_canonical(decomposed_parts p, float_status *s)
+{
+    switch (p.cls) {
+    case float_class_dnan:
+        return float16_default_nan(s);
+    case float_class_msnan:
+        return float16_maybe_silence_nan(float16_pack_raw(p), s);
+    default:
+        p = decomposed_round_canonical(p, s, &float16_params);
+        return float16_pack_raw(p);
+    }
+}
+
+static decomposed_parts float32_unpack_canonical(float32 f, float_status *s)
+{
+    return decomposed_canonicalize(float32_unpack_raw(f), &float32_params, s);
+}
+
+static float32 float32_round_pack_canonical(decomposed_parts p, float_status *s)
+{
+    switch (p.cls) {
+    case float_class_dnan:
+        return float32_default_nan(s);
+    case float_class_msnan:
+        return float32_maybe_silence_nan(float32_pack_raw(p), s);
+    default:
+        p = decomposed_round_canonical(p, s, &float32_params);
+        return float32_pack_raw(p);
+    }
+}
+
+static decomposed_parts float64_unpack_canonical(float64 f, float_status *s)
+{
+    return decomposed_canonicalize(float64_unpack_raw(f), &float64_params, s);
+}
+
+static float64 float64_round_pack_canonical(decomposed_parts p, float_status *s)
+{
+    switch (p.cls) {
+    case float_class_dnan:
+        return float64_default_nan(s);
+    case float_class_msnan:
+        return float64_maybe_silence_nan(float64_pack_raw(p), s);
+    default:
+        p = decomposed_round_canonical(p, s, &float64_params);
+        return float64_pack_raw(p);
+    }
+}
+
+static decomposed_parts pick_nan_parts(decomposed_parts a, decomposed_parts b,
+                                       float_status *s)
+{
+    if (a.cls == float_class_snan || b.cls == float_class_snan) {
+        s->float_exception_flags |= float_flag_invalid;
+    }
+
+    if (s->default_nan_mode) {
+        a.cls = float_class_dnan;
+    } else {
+        if (pickNaN(a.cls == float_class_qnan,
+                    a.cls == float_class_snan,
+                    b.cls == float_class_qnan,
+                    b.cls == float_class_snan,
+                    a.frac > b.frac
+                    || (a.frac == b.frac && a.sign < b.sign))) {
+            a = b;
+        }
+        a.cls = float_class_msnan;
+    }
+    return a;
+}
+
+
+/*
+ * Returns the result of adding the floating-point values `a' and
+ * `b'. If `subtract' is set, the sign of `b' is flipped first, so
+ * the result is `a' - `b'. `subtract' is ignored if the result is
+ * a NaN. The operation is performed according to the IEC/IEEE
+ * Standard for Binary Floating-Point Arithmetic.
+ */
+
+static decomposed_parts add_decomposed(decomposed_parts a, decomposed_parts b,
+                                       bool subtract, float_status *s)
+{
+    bool a_sign = a.sign;
+    bool b_sign = b.sign ^ subtract;
+
+    if (a_sign != b_sign) {
+        /* Subtraction */
+
+        if (a.cls == float_class_normal && b.cls == float_class_normal) {
+            int a_exp = a.exp;
+            int b_exp = b.exp;
+            uint64_t a_frac = a.frac;
+            uint64_t b_frac = b.frac;
+
+            if (a_exp > b_exp || (a_exp == b_exp && a_frac >= b_frac)) {
+                shift64RightJamming(b_frac, a_exp - b_exp, &b_frac);
+                a_frac = a_frac - b_frac;
+            } else {
+                shift64RightJamming(a_frac, b_exp - a_exp, &a_frac);
+                a_frac = b_frac - a_frac;
+                a_exp = b_exp;
+                a_sign ^= 1;
+            }
+
+            if (a_frac == 0) {
+                a.cls = float_class_zero;
+                a.sign = s->float_rounding_mode == float_round_down;
+            } else {
+                int shift = clz64(a_frac) - 1;
+                a.frac = a_frac << shift;
+                a.exp = a_exp - shift;
+                a.sign = a_sign;
+            }
+            return a;
+        }
+        if (a.cls >= float_class_qnan
+            ||
+            b.cls >= float_class_qnan)
+        {
+            return pick_nan_parts(a, b, s);
+        }
+        if (a.cls == float_class_inf) {
+            if (b.cls == float_class_inf) {
+                float_raise(float_flag_invalid, s);
+                a.cls = float_class_dnan;
+            }
+            return a;
+        }
+        if (a.cls == float_class_zero && b.cls == float_class_zero) {
+            a.sign = s->float_rounding_mode == float_round_down;
+            return a;
+        }
+        if (a.cls == float_class_zero || b.cls == float_class_inf) {
+            b.sign = a_sign ^ 1;
+            return b;
+        }
+        if (b.cls == float_class_zero) {
+            return a;
+        }
+    } else {
+        /* Addition */
+        if (a.cls == float_class_normal && b.cls == float_class_normal) {
+            int a_exp = a.exp;
+            int b_exp = b.exp;
+            uint64_t a_frac = a.frac;
+            uint64_t b_frac = b.frac;
+
+            if (a_exp > b_exp) {
+                shift64RightJamming(b_frac, a_exp - b_exp, &b_frac);
+            } else if (a_exp < b_exp) {
+                shift64RightJamming(a_frac, b_exp - a_exp, &a_frac);
+                a_exp = b_exp;
+            }
+            a_frac += b_frac;
+            if (a_frac & DECOMPOSED_OVERFLOW_BIT) {
+                shift64RightJamming(a_frac, 1, &a_frac);
+                a_exp += 1;
+            }
+
+            a.exp = a_exp;
+            a.frac = a_frac;
+            return a;
+        }
+        if (a.cls >= float_class_qnan
+            ||
+            b.cls >= float_class_qnan) {
+            return pick_nan_parts(a, b, s);
+        }
+        if (a.cls == float_class_inf || b.cls == float_class_zero) {
+            return a;
+        }
+        if (b.cls == float_class_inf || a.cls == float_class_zero) {
+            b.sign = b_sign;
+            return b;
+        }
+    }
+    g_assert_not_reached();
+}
+
+/*
+ * Returns the result of adding or subtracting the floating-point
+ * values `a' and `b'. The operation is performed according to the
+ * IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+ */
+
+float16 float16_add(float16 a, float16 b, float_status *status)
+{
+    decomposed_parts pa = float16_unpack_canonical(a, status);
+    decomposed_parts pb = float16_unpack_canonical(b, status);
+    decomposed_parts pr = add_decomposed(pa, pb, false, status);
+
+    return float16_round_pack_canonical(pr, status);
+}
+
+float32 float32_add(float32 a, float32 b, float_status *status)
+{
+    decomposed_parts pa = float32_unpack_canonical(a, status);
+    decomposed_parts pb = float32_unpack_canonical(b, status);
+    decomposed_parts pr = add_decomposed(pa, pb, false, status);
+
+    return float32_round_pack_canonical(pr, status);
+}
+
+float64 float64_add(float64 a, float64 b, float_status *status)
+{
+    decomposed_parts pa = float64_unpack_canonical(a, status);
+    decomposed_parts pb = float64_unpack_canonical(b, status);
+    decomposed_parts pr = add_decomposed(pa, pb, false, status);
+
+    return float64_round_pack_canonical(pr, status);
+}
+
+float16 float16_sub(float16 a, float16 b, float_status *status)
+{
+    decomposed_parts pa = float16_unpack_canonical(a, status);
+    decomposed_parts pb = float16_unpack_canonical(b, status);
+    decomposed_parts pr = add_decomposed(pa, pb, true, status);
+
+    return float16_round_pack_canonical(pr, status);
+}
+
+float32 float32_sub(float32 a, float32 b, float_status *status)
+{
+    decomposed_parts pa = float32_unpack_canonical(a, status);
+    decomposed_parts pb = float32_unpack_canonical(b, status);
+    decomposed_parts pr = add_decomposed(pa, pb, true, status);
+
+    return float32_round_pack_canonical(pr, status);
+}
+
+float64 float64_sub(float64 a, float64 b, float_status *status)
+{
+    decomposed_parts pa = float64_unpack_canonical(a, status);
+    decomposed_parts pb = float64_unpack_canonical(b, status);
+    decomposed_parts pr = add_decomposed(pa, pb, true, status);
+
+    return float64_round_pack_canonical(pr, status);
+}
 
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
@@ -2066,219 +2541,6 @@ float32 float32_round_to_int(float32 a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of adding the absolute values of the single-precision
-| floating-point values `a' and `b'.  If `zSign' is 1, the sum is negated
-| before being returned.  `zSign' is ignored if the result is a NaN.
-| The addition is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static float32 addFloat32Sigs(float32 a, float32 b, flag zSign,
-                              float_status *status)
-{
-    int aExp, bExp, zExp;
-    uint32_t aSig, bSig, zSig;
-    int expDiff;
-
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    bSig = extractFloat32Frac( b );
-    bExp = extractFloat32Exp( b );
-    expDiff = aExp - bExp;
-    aSig <<= 6;
-    bSig <<= 6;
-    if ( 0 < expDiff ) {
-        if ( aExp == 0xFF ) {
-            if (aSig) {
-                return propagateFloat32NaN(a, b, status);
-            }
-            return a;
-        }
-        if ( bExp == 0 ) {
-            --expDiff;
-        }
-        else {
-            bSig |= 0x20000000;
-        }
-        shift32RightJamming( bSig, expDiff, &bSig );
-        zExp = aExp;
-    }
-    else if ( expDiff < 0 ) {
-        if ( bExp == 0xFF ) {
-            if (bSig) {
-                return propagateFloat32NaN(a, b, status);
-            }
-            return packFloat32( zSign, 0xFF, 0 );
-        }
-        if ( aExp == 0 ) {
-            ++expDiff;
-        }
-        else {
-            aSig |= 0x20000000;
-        }
-        shift32RightJamming( aSig, - expDiff, &aSig );
-        zExp = bExp;
-    }
-    else {
-        if ( aExp == 0xFF ) {
-            if (aSig | bSig) {
-                return propagateFloat32NaN(a, b, status);
-            }
-            return a;
-        }
-        if ( aExp == 0 ) {
-            if (status->flush_to_zero) {
-                if (aSig | bSig) {
-                    float_raise(float_flag_output_denormal, status);
-                }
-                return packFloat32(zSign, 0, 0);
-            }
-            return packFloat32( zSign, 0, ( aSig + bSig )>>6 );
-        }
-        zSig = 0x40000000 + aSig + bSig;
-        zExp = aExp;
-        goto roundAndPack;
-    }
-    aSig |= 0x20000000;
-    zSig = ( aSig + bSig )<<1;
-    --zExp;
-    if ( (int32_t) zSig < 0 ) {
-        zSig = aSig + bSig;
-        ++zExp;
-    }
- roundAndPack:
-    return roundAndPackFloat32(zSign, zExp, zSig, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of subtracting the absolute values of the single-
-| precision floating-point values `a' and `b'.  If `zSign' is 1, the
-| difference is negated before being returned.  `zSign' is ignored if the
-| result is a NaN.  The subtraction is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static float32 subFloat32Sigs(float32 a, float32 b, flag zSign,
-                              float_status *status)
-{
-    int aExp, bExp, zExp;
-    uint32_t aSig, bSig, zSig;
-    int expDiff;
-
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    bSig = extractFloat32Frac( b );
-    bExp = extractFloat32Exp( b );
-    expDiff = aExp - bExp;
-    aSig <<= 7;
-    bSig <<= 7;
-    if ( 0 < expDiff ) goto aExpBigger;
-    if ( expDiff < 0 ) goto bExpBigger;
-    if ( aExp == 0xFF ) {
-        if (aSig | bSig) {
-            return propagateFloat32NaN(a, b, status);
-        }
-        float_raise(float_flag_invalid, status);
-        return float32_default_nan(status);
-    }
-    if ( aExp == 0 ) {
-        aExp = 1;
-        bExp = 1;
-    }
-    if ( bSig < aSig ) goto aBigger;
-    if ( aSig < bSig ) goto bBigger;
-    return packFloat32(status->float_rounding_mode == float_round_down, 0, 0);
- bExpBigger:
-    if ( bExp == 0xFF ) {
-        if (bSig) {
-            return propagateFloat32NaN(a, b, status);
-        }
-        return packFloat32( zSign ^ 1, 0xFF, 0 );
-    }
-    if ( aExp == 0 ) {
-        ++expDiff;
-    }
-    else {
-        aSig |= 0x40000000;
-    }
-    shift32RightJamming( aSig, - expDiff, &aSig );
-    bSig |= 0x40000000;
- bBigger:
-    zSig = bSig - aSig;
-    zExp = bExp;
-    zSign ^= 1;
-    goto normalizeRoundAndPack;
- aExpBigger:
-    if ( aExp == 0xFF ) {
-        if (aSig) {
-            return propagateFloat32NaN(a, b, status);
-        }
-        return a;
-    }
-    if ( bExp == 0 ) {
-        --expDiff;
-    }
-    else {
-        bSig |= 0x40000000;
-    }
-    shift32RightJamming( bSig, expDiff, &bSig );
-    aSig |= 0x40000000;
- aBigger:
-    zSig = aSig - bSig;
-    zExp = aExp;
- normalizeRoundAndPack:
-    --zExp;
-    return normalizeRoundAndPackFloat32(zSign, zExp, zSig, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of adding the single-precision floating-point values `a'
-| and `b'.  The operation is performed according to the IEC/IEEE Standard for
-| Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 float32_add(float32 a, float32 b, float_status *status)
-{
-    flag aSign, bSign;
-    a = float32_squash_input_denormal(a, status);
-    b = float32_squash_input_denormal(b, status);
-
-    aSign = extractFloat32Sign( a );
-    bSign = extractFloat32Sign( b );
-    if ( aSign == bSign ) {
-        return addFloat32Sigs(a, b, aSign, status);
-    }
-    else {
-        return subFloat32Sigs(a, b, aSign, status);
-    }
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of subtracting the single-precision floating-point values
-| `a' and `b'.  The operation is performed according to the IEC/IEEE Standard
-| for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 float32_sub(float32 a, float32 b, float_status *status)
-{
-    flag aSign, bSign;
-    a = float32_squash_input_denormal(a, status);
-    b = float32_squash_input_denormal(b, status);
-
-    aSign = extractFloat32Sign( a );
-    bSign = extractFloat32Sign( b );
-    if ( aSign == bSign ) {
-        return subFloat32Sigs(a, b, aSign, status);
-    }
-    else {
-        return addFloat32Sigs(a, b, aSign, status);
-    }
-
-}
 
 /*----------------------------------------------------------------------------
 | Returns the result of multiplying the single-precision floating-point values
@@ -3876,219 +4138,6 @@ float64 float64_trunc_to_int(float64 a, float_status *status)
     return res;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of adding the absolute values of the double-precision
-| floating-point values `a' and `b'.  If `zSign' is 1, the sum is negated
-| before being returned.  `zSign' is ignored if the result is a NaN.
-| The addition is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static float64 addFloat64Sigs(float64 a, float64 b, flag zSign,
-                              float_status *status)
-{
-    int aExp, bExp, zExp;
-    uint64_t aSig, bSig, zSig;
-    int expDiff;
-
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    bSig = extractFloat64Frac( b );
-    bExp = extractFloat64Exp( b );
-    expDiff = aExp - bExp;
-    aSig <<= 9;
-    bSig <<= 9;
-    if ( 0 < expDiff ) {
-        if ( aExp == 0x7FF ) {
-            if (aSig) {
-                return propagateFloat64NaN(a, b, status);
-            }
-            return a;
-        }
-        if ( bExp == 0 ) {
-            --expDiff;
-        }
-        else {
-            bSig |= LIT64( 0x2000000000000000 );
-        }
-        shift64RightJamming( bSig, expDiff, &bSig );
-        zExp = aExp;
-    }
-    else if ( expDiff < 0 ) {
-        if ( bExp == 0x7FF ) {
-            if (bSig) {
-                return propagateFloat64NaN(a, b, status);
-            }
-            return packFloat64( zSign, 0x7FF, 0 );
-        }
-        if ( aExp == 0 ) {
-            ++expDiff;
-        }
-        else {
-            aSig |= LIT64( 0x2000000000000000 );
-        }
-        shift64RightJamming( aSig, - expDiff, &aSig );
-        zExp = bExp;
-    }
-    else {
-        if ( aExp == 0x7FF ) {
-            if (aSig | bSig) {
-                return propagateFloat64NaN(a, b, status);
-            }
-            return a;
-        }
-        if ( aExp == 0 ) {
-            if (status->flush_to_zero) {
-                if (aSig | bSig) {
-                    float_raise(float_flag_output_denormal, status);
-                }
-                return packFloat64(zSign, 0, 0);
-            }
-            return packFloat64( zSign, 0, ( aSig + bSig )>>9 );
-        }
-        zSig = LIT64( 0x4000000000000000 ) + aSig + bSig;
-        zExp = aExp;
-        goto roundAndPack;
-    }
-    aSig |= LIT64( 0x2000000000000000 );
-    zSig = ( aSig + bSig )<<1;
-    --zExp;
-    if ( (int64_t) zSig < 0 ) {
-        zSig = aSig + bSig;
-        ++zExp;
-    }
- roundAndPack:
-    return roundAndPackFloat64(zSign, zExp, zSig, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of subtracting the absolute values of the double-
-| precision floating-point values `a' and `b'.  If `zSign' is 1, the
-| difference is negated before being returned.  `zSign' is ignored if the
-| result is a NaN.  The subtraction is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-static float64 subFloat64Sigs(float64 a, float64 b, flag zSign,
-                              float_status *status)
-{
-    int aExp, bExp, zExp;
-    uint64_t aSig, bSig, zSig;
-    int expDiff;
-
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    bSig = extractFloat64Frac( b );
-    bExp = extractFloat64Exp( b );
-    expDiff = aExp - bExp;
-    aSig <<= 10;
-    bSig <<= 10;
-    if ( 0 < expDiff ) goto aExpBigger;
-    if ( expDiff < 0 ) goto bExpBigger;
-    if ( aExp == 0x7FF ) {
-        if (aSig | bSig) {
-            return propagateFloat64NaN(a, b, status);
-        }
-        float_raise(float_flag_invalid, status);
-        return float64_default_nan(status);
-    }
-    if ( aExp == 0 ) {
-        aExp = 1;
-        bExp = 1;
-    }
-    if ( bSig < aSig ) goto aBigger;
-    if ( aSig < bSig ) goto bBigger;
-    return packFloat64(status->float_rounding_mode == float_round_down, 0, 0);
- bExpBigger:
-    if ( bExp == 0x7FF ) {
-        if (bSig) {
-            return propagateFloat64NaN(a, b, status);
-        }
-        return packFloat64( zSign ^ 1, 0x7FF, 0 );
-    }
-    if ( aExp == 0 ) {
-        ++expDiff;
-    }
-    else {
-        aSig |= LIT64( 0x4000000000000000 );
-    }
-    shift64RightJamming( aSig, - expDiff, &aSig );
-    bSig |= LIT64( 0x4000000000000000 );
- bBigger:
-    zSig = bSig - aSig;
-    zExp = bExp;
-    zSign ^= 1;
-    goto normalizeRoundAndPack;
- aExpBigger:
-    if ( aExp == 0x7FF ) {
-        if (aSig) {
-            return propagateFloat64NaN(a, b, status);
-        }
-        return a;
-    }
-    if ( bExp == 0 ) {
-        --expDiff;
-    }
-    else {
-        bSig |= LIT64( 0x4000000000000000 );
-    }
-    shift64RightJamming( bSig, expDiff, &bSig );
-    aSig |= LIT64( 0x4000000000000000 );
- aBigger:
-    zSig = aSig - bSig;
-    zExp = aExp;
- normalizeRoundAndPack:
-    --zExp;
-    return normalizeRoundAndPackFloat64(zSign, zExp, zSig, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of adding the double-precision floating-point values `a'
-| and `b'.  The operation is performed according to the IEC/IEEE Standard for
-| Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 float64_add(float64 a, float64 b, float_status *status)
-{
-    flag aSign, bSign;
-    a = float64_squash_input_denormal(a, status);
-    b = float64_squash_input_denormal(b, status);
-
-    aSign = extractFloat64Sign( a );
-    bSign = extractFloat64Sign( b );
-    if ( aSign == bSign ) {
-        return addFloat64Sigs(a, b, aSign, status);
-    }
-    else {
-        return subFloat64Sigs(a, b, aSign, status);
-    }
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of subtracting the double-precision floating-point values
-| `a' and `b'.  The operation is performed according to the IEC/IEEE Standard
-| for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 float64_sub(float64 a, float64 b, float_status *status)
-{
-    flag aSign, bSign;
-    a = float64_squash_input_denormal(a, status);
-    b = float64_squash_input_denormal(b, status);
-
-    aSign = extractFloat64Sign( a );
-    bSign = extractFloat64Sign( b );
-    if ( aSign == bSign ) {
-        return subFloat64Sigs(a, b, aSign, status);
-    }
-    else {
-        return addFloat64Sigs(a, b, aSign, status);
-    }
-
-}
 
 /*----------------------------------------------------------------------------
 | Returns the result of multiplying the double-precision floating-point values
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 5a9258c57c..3238916aba 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -345,6 +345,10 @@ float64 float16_to_float64(float16 a, flag ieee, float_status *status);
 /*----------------------------------------------------------------------------
 | Software half-precision operations.
 *----------------------------------------------------------------------------*/
+
+float16 float16_add(float16, float16, float_status *status);
+float16 float16_sub(float16, float16, float_status *status);
+
 int float16_is_quiet_nan(float16, float_status *status);
 int float16_is_signaling_nan(float16, float_status *status);
 float16 float16_maybe_silence_nan(float16, float_status *status);
-- 
2.15.1


* [Qemu-devel] [PATCH v1 11/19] fpu/softfloat: re-factor mul
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (9 preceding siblings ...)
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 10/19] fpu/softfloat: re-factor add/sub Alex Bennée
@ 2017-12-11 12:56 ` Alex Bennée
  2017-12-18 22:22   ` Richard Henderson
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 12/19] fpu/softfloat: re-factor div Alex Bennée
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:56 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

We can now add float16_mul and use the common decompose and
canonicalize functions to have a single implementation for
float16/32/64 versions.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
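
As a rough illustration of the normal * normal path: multiply the two
64-bit fractions into a 128-bit product, shift it back down by the
binary point while keeping a sticky bit, and renormalize if the result
reached 2.0. The sketch below is standalone and not the patch code. All
sketch_* names are hypothetical, and the binary point at bit 62 is an
assumption about the decomposed layout.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: binary point at bit 62, bit 63 spare for carries. */
#define SKETCH_BINARY_POINT 62
#define SKETCH_IMPLICIT_BIT (1ULL << SKETCH_BINARY_POINT)
#define SKETCH_OVERFLOW_BIT (SKETCH_IMPLICIT_BIT << 1)

typedef struct {
    int sign;
    int exp;        /* unbiased exponent */
    uint64_t frac;  /* implicit bit at SKETCH_BINARY_POINT */
} sketch_parts;

/* Portable 64x64 -> 128 bit multiply built from 32-bit halves. */
static void sketch_mul64to128(uint64_t a, uint64_t b,
                              uint64_t *hi, uint64_t *lo)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;
    /* Sum of the middle column, including the carry out of p0. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    *lo = (mid << 32) | (uint32_t)p0;
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

/* Multiply two canonical normal numbers. */
static sketch_parts sketch_mul_normal(sketch_parts a, sketch_parts b)
{
    uint64_t hi, lo, frac;

    /* Both fractions are in [2^62, 2^63), so the product is < 2^126. */
    assert((a.frac >> SKETCH_BINARY_POINT) == 1);
    assert((b.frac >> SKETCH_BINARY_POINT) == 1);
    sketch_mul64to128(a.frac, b.frac, &hi, &lo);

    /* Shift right by the binary point, jamming lost bits into bit 0. */
    frac = (hi << 2) | (lo >> SKETCH_BINARY_POINT);
    frac |= (lo & (SKETCH_IMPLICIT_BIT - 1)) != 0;

    a.exp += b.exp;
    if (frac & SKETCH_OVERFLOW_BIT) {
        /* Product is in [2.0, 4.0): halve it and bump the exponent. */
        frac = (frac >> 1) | (frac & 1);
        a.exp += 1;
    }
    a.sign ^= b.sign;
    a.frac = frac;
    return a;
}
```

With a = b = 1.5 (exp = 0, frac = 3 << 61) the product comes out as
exp = 1, frac = 9 << 59, i.e. 1.125 * 2^1 = 2.25. Rounding to the
target format is still left to the common round/pack step.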
 fpu/softfloat.c         | 207 ++++++++++++++++++------------------------------
 include/fpu/softfloat.h |   1 +
 2 files changed, 80 insertions(+), 128 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index f89e47e3ef..6e9d4c172c 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -730,6 +730,85 @@ float64 float64_sub(float64 a, float64 b, float_status *status)
     return float64_round_pack_canonical(pr, status);
 }
 
+/*
+ * Returns the result of multiplying the floating-point values `a' and
+ * `b'. The operation is performed according to the IEC/IEEE Standard
+ * for Binary Floating-Point Arithmetic.
+ */
+
+static decomposed_parts mul_decomposed(decomposed_parts a, decomposed_parts b,
+                                       float_status *s)
+{
+    bool sign = a.sign ^ b.sign;
+
+    if (a.cls == float_class_normal && b.cls == float_class_normal) {
+        uint64_t hi, lo;
+        int exp = a.exp + b.exp;
+
+        mul64To128(a.frac, b.frac, &hi, &lo);
+        shift128RightJamming(hi, lo, DECOMPOSED_BINARY_POINT, &hi, &lo);
+        if (lo & DECOMPOSED_OVERFLOW_BIT) {
+            shift64RightJamming(lo, 1, &lo);
+            exp += 1;
+        }
+
+        /* Re-use a */
+        a.exp = exp;
+        a.sign = sign;
+        a.frac = lo;
+        return a;
+    }
+    /* handle all the NaN cases */
+    if (a.cls >= float_class_qnan || b.cls >= float_class_qnan) {
+        return pick_nan_parts(a, b, s);
+    }
+    /* Inf * Zero == NaN */
+    if (((1 << a.cls) | (1 << b.cls)) ==
+        ((1 << float_class_inf) | (1 << float_class_zero))) {
+        s->float_exception_flags |= float_flag_invalid;
+        a.cls = float_class_dnan;
+        a.sign = sign;
+        return a;
+    }
+    /* Multiply by 0 or Inf */
+    if (a.cls == float_class_inf || a.cls == float_class_zero) {
+        a.sign = sign;
+        return a;
+    }
+    if (b.cls == float_class_inf || b.cls == float_class_zero) {
+        b.sign = sign;
+        return b;
+    }
+    g_assert_not_reached();
+}
+
+float16 float16_mul(float16 a, float16 b, float_status *status)
+{
+    decomposed_parts pa = float16_unpack_canonical(a, status);
+    decomposed_parts pb = float16_unpack_canonical(b, status);
+    decomposed_parts pr = mul_decomposed(pa, pb, status);
+
+    return float16_round_pack_canonical(pr, status);
+}
+
+float32 float32_mul(float32 a, float32 b, float_status *status)
+{
+    decomposed_parts pa = float32_unpack_canonical(a, status);
+    decomposed_parts pb = float32_unpack_canonical(b, status);
+    decomposed_parts pr = mul_decomposed(pa, pb, status);
+
+    return float32_round_pack_canonical(pr, status);
+}
+
+float64 float64_mul(float64 a, float64 b, float_status *status)
+{
+    decomposed_parts pa = float64_unpack_canonical(a, status);
+    decomposed_parts pb = float64_unpack_canonical(b, status);
+    decomposed_parts pr = mul_decomposed(pa, pb, status);
+
+    return float64_round_pack_canonical(pr, status);
+}
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
@@ -2542,70 +2621,6 @@ float32 float32_round_to_int(float32 a, float_status *status)
 }
 
 
-/*----------------------------------------------------------------------------
-| Returns the result of multiplying the single-precision floating-point values
-| `a' and `b'.  The operation is performed according to the IEC/IEEE Standard
-| for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 float32_mul(float32 a, float32 b, float_status *status)
-{
-    flag aSign, bSign, zSign;
-    int aExp, bExp, zExp;
-    uint32_t aSig, bSig;
-    uint64_t zSig64;
-    uint32_t zSig;
-
-    a = float32_squash_input_denormal(a, status);
-    b = float32_squash_input_denormal(b, status);
-
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-    bSig = extractFloat32Frac( b );
-    bExp = extractFloat32Exp( b );
-    bSign = extractFloat32Sign( b );
-    zSign = aSign ^ bSign;
-    if ( aExp == 0xFF ) {
-        if ( aSig || ( ( bExp == 0xFF ) && bSig ) ) {
-            return propagateFloat32NaN(a, b, status);
-        }
-        if ( ( bExp | bSig ) == 0 ) {
-            float_raise(float_flag_invalid, status);
-            return float32_default_nan(status);
-        }
-        return packFloat32( zSign, 0xFF, 0 );
-    }
-    if ( bExp == 0xFF ) {
-        if (bSig) {
-            return propagateFloat32NaN(a, b, status);
-        }
-        if ( ( aExp | aSig ) == 0 ) {
-            float_raise(float_flag_invalid, status);
-            return float32_default_nan(status);
-        }
-        return packFloat32( zSign, 0xFF, 0 );
-    }
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return packFloat32( zSign, 0, 0 );
-        normalizeFloat32Subnormal( aSig, &aExp, &aSig );
-    }
-    if ( bExp == 0 ) {
-        if ( bSig == 0 ) return packFloat32( zSign, 0, 0 );
-        normalizeFloat32Subnormal( bSig, &bExp, &bSig );
-    }
-    zExp = aExp + bExp - 0x7F;
-    aSig = ( aSig | 0x00800000 )<<7;
-    bSig = ( bSig | 0x00800000 )<<8;
-    shift64RightJamming( ( (uint64_t) aSig ) * bSig, 32, &zSig64 );
-    zSig = zSig64;
-    if ( 0 <= (int32_t) ( zSig<<1 ) ) {
-        zSig <<= 1;
-        --zExp;
-    }
-    return roundAndPackFloat32(zSign, zExp, zSig, status);
-
-}
 
 /*----------------------------------------------------------------------------
 | Returns the result of dividing the single-precision floating-point value `a'
@@ -4138,70 +4153,6 @@ float64 float64_trunc_to_int(float64 a, float_status *status)
     return res;
 }
 
-
-/*----------------------------------------------------------------------------
-| Returns the result of multiplying the double-precision floating-point values
-| `a' and `b'.  The operation is performed according to the IEC/IEEE Standard
-| for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 float64_mul(float64 a, float64 b, float_status *status)
-{
-    flag aSign, bSign, zSign;
-    int aExp, bExp, zExp;
-    uint64_t aSig, bSig, zSig0, zSig1;
-
-    a = float64_squash_input_denormal(a, status);
-    b = float64_squash_input_denormal(b, status);
-
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-    bSig = extractFloat64Frac( b );
-    bExp = extractFloat64Exp( b );
-    bSign = extractFloat64Sign( b );
-    zSign = aSign ^ bSign;
-    if ( aExp == 0x7FF ) {
-        if ( aSig || ( ( bExp == 0x7FF ) && bSig ) ) {
-            return propagateFloat64NaN(a, b, status);
-        }
-        if ( ( bExp | bSig ) == 0 ) {
-            float_raise(float_flag_invalid, status);
-            return float64_default_nan(status);
-        }
-        return packFloat64( zSign, 0x7FF, 0 );
-    }
-    if ( bExp == 0x7FF ) {
-        if (bSig) {
-            return propagateFloat64NaN(a, b, status);
-        }
-        if ( ( aExp | aSig ) == 0 ) {
-            float_raise(float_flag_invalid, status);
-            return float64_default_nan(status);
-        }
-        return packFloat64( zSign, 0x7FF, 0 );
-    }
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return packFloat64( zSign, 0, 0 );
-        normalizeFloat64Subnormal( aSig, &aExp, &aSig );
-    }
-    if ( bExp == 0 ) {
-        if ( bSig == 0 ) return packFloat64( zSign, 0, 0 );
-        normalizeFloat64Subnormal( bSig, &bExp, &bSig );
-    }
-    zExp = aExp + bExp - 0x3FF;
-    aSig = ( aSig | LIT64( 0x0010000000000000 ) )<<10;
-    bSig = ( bSig | LIT64( 0x0010000000000000 ) )<<11;
-    mul64To128( aSig, bSig, &zSig0, &zSig1 );
-    zSig0 |= ( zSig1 != 0 );
-    if ( 0 <= (int64_t) ( zSig0<<1 ) ) {
-        zSig0 <<= 1;
-        --zExp;
-    }
-    return roundAndPackFloat64(zSign, zExp, zSig0, status);
-
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of dividing the double-precision floating-point value `a'
 | by the corresponding value `b'.  The operation is performed according to
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 3238916aba..1fe8734261 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -348,6 +348,7 @@ float64 float16_to_float64(float16 a, flag ieee, float_status *status);
 
 float16 float16_add(float16, float16, float_status *status);
 float16 float16_sub(float16, float16, float_status *status);
+float16 float16_mul(float16, float16, float_status *status);
 
 int float16_is_quiet_nan(float16, float_status *status);
 int float16_is_signaling_nan(float16, float_status *status);
-- 
2.15.1

* [Qemu-devel] [PATCH v1 12/19] fpu/softfloat: re-factor div
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (10 preceding siblings ...)
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 11/19] fpu/softfloat: re-factor mul Alex Bennée
@ 2017-12-11 12:56 ` Alex Bennée
  2017-12-18 22:26   ` Richard Henderson
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 13/19] fpu/softfloat: re-factor muladd Alex Bennée
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:56 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

We can now add float16_div and use the common decompose and
canonicalize functions to have a single implementation for
float16/32/64 versions.
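
For anyone trying to follow the normal / normal path, here is a
minimal sketch of the idea (not the code from the patch). It assumes
the fraction has its binary point at bit 62 and uses the GCC/Clang
unsigned __int128 extension where the patch uses shortShift128Left and
div128To64. The names parts_sketch and div_frac_sketch are
hypothetical.

```c
#include <stdint.h>

/* Assumed layout: binary point at bit 62. */
#define SKETCH_BINARY_POINT 62

/* Hypothetical cut-down stand-in for the decomposed representation. */
typedef struct {
    uint64_t frac;  /* normalised: bit 62 set, value in [1.0, 2.0) */
    int exp;        /* unbiased exponent */
} parts_sketch;

static parts_sketch div_frac_sketch(parts_sketch a, parts_sketch b)
{
    parts_sketch r;
    int shift = SKETCH_BINARY_POINT;
    unsigned __int128 n;
    uint64_t q;

    r.exp = a.exp - b.exp;
    if (a.frac < b.frac) {
        /* Quotient would land in (0.5, 1.0): pre-shift the numerator
         * by one more bit so the result is normalised again. */
        shift += 1;
        r.exp -= 1;
    }
    n = (unsigned __int128)a.frac << shift;
    q = (uint64_t)(n / b.frac);
    /* Sticky bit: set the LSB if the division left a remainder. */
    q |= (n % b.frac) != 0;
    r.frac = q;
    return r;
}
```

Either way the quotient ends up with bit 62 set, so it can go straight
to the common round-and-pack code.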

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 fpu/softfloat-macros.h  |  44 +++++++++
 fpu/softfloat.c         | 235 ++++++++++++++++++------------------------------
 include/fpu/softfloat.h |   1 +
 3 files changed, 134 insertions(+), 146 deletions(-)

diff --git a/fpu/softfloat-macros.h b/fpu/softfloat-macros.h
index 9cc6158cb4..980be2c051 100644
--- a/fpu/softfloat-macros.h
+++ b/fpu/softfloat-macros.h
@@ -625,6 +625,50 @@ static uint64_t estimateDiv128To64( uint64_t a0, uint64_t a1, uint64_t b )
 
 }
 
+/* Nicked from gmp longlong.h __udiv_qrnnd */
+static uint64_t div128To64(uint64_t n0, uint64_t n1, uint64_t d)
+{
+    uint64_t d0, d1, q0, q1, r1, r0, m;
+
+    d0 = (uint32_t)d;
+    d1 = d >> 32;
+
+    r1 = n1 % d1;
+    q1 = n1 / d1;
+    m = q1 * d0;
+    r1 = (r1 << 32) | (n0 >> 32);
+    if (r1 < m) {
+        q1 -= 1;
+        r1 += d;
+        if (r1 >= d) {
+            if (r1 < m) {
+                q1 -= 1;
+                r1 += d;
+            }
+        }
+    }
+    r1 -= m;
+
+    r0 = r1 % d1;
+    q0 = r1 / d1;
+    m = q0 * d0;
+    r0 = (r0 << 32) | (uint32_t)n0;
+    if (r0 < m) {
+        q0 -= 1;
+        r0 += d;
+        if (r0 >= d) {
+            if (r0 < m) {
+                q0 -= 1;
+                r0 += d;
+            }
+        }
+    }
+    r0 -= m;
+
+    /* Return the quotient, with the LSB set if the remainder is non-zero */
+    return (q1 << 32) | q0 | (r0 != 0);
+}
+
 /*----------------------------------------------------------------------------
 | Returns an approximation to the square root of the 32-bit significand given
 | by `a'.  Considered as an integer, `a' must be at least 2^31.  If bit 0 of
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 6e9d4c172c..2b703c12ed 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -809,6 +809,95 @@ float64 float64_mul(float64 a, float64 b, float_status *status)
     return float64_round_pack_canonical(pr, status);
 }
 
+/*
+ * Returns the result of dividing the floating-point value `a' by the
+ * corresponding value `b'. The operation is performed according to
+ * the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+ */
+
+static decomposed_parts div_decomposed(decomposed_parts a, decomposed_parts b,
+                                       float_status *s)
+{
+    bool sign = a.sign ^ b.sign;
+
+    if (a.cls == float_class_normal && b.cls == float_class_normal) {
+        uint64_t temp_lo, temp_hi;
+        int exp = a.exp - b.exp;
+        if (a.frac < b.frac) {
+            exp -= 1;
+            shortShift128Left(0, a.frac, DECOMPOSED_BINARY_POINT + 1,
+                              &temp_hi, &temp_lo);
+        } else {
+            shortShift128Left(0, a.frac, DECOMPOSED_BINARY_POINT,
+                              &temp_hi, &temp_lo);
+        }
+        /* The LSB of the quotient is set if the result is inexact; round
+         * and pack uses it to set the flags. We re-use a for the result. */
+        a.frac = div128To64(temp_lo, temp_hi, b.frac);
+        a.sign = sign;
+        a.exp = exp;
+        return a;
+    }
+    /* handle all the NaN cases */
+    if (a.cls >= float_class_qnan || b.cls >= float_class_qnan) {
+        return pick_nan_parts(a, b, s);
+    }
+    /* 0/0 or Inf/Inf */
+    if (a.cls == b.cls
+        &&
+        (a.cls == float_class_inf || a.cls == float_class_zero)) {
+        s->float_exception_flags |= float_flag_invalid;
+        a.cls = float_class_dnan;
+        return a;
+    }
+    /* Div 0 => Inf */
+    if (b.cls == float_class_zero) {
+        s->float_exception_flags |= float_flag_divbyzero;
+        a.cls = float_class_inf;
+        a.sign = sign;
+        return a;
+    }
+    /* Inf / x or 0 / x */
+    if (a.cls == float_class_inf || a.cls == float_class_zero) {
+        a.sign = sign;
+        return a;
+    }
+    /* Div by Inf */
+    if (b.cls == float_class_inf) {
+        a.cls = float_class_zero;
+        a.sign = sign;
+        return a;
+    }
+    g_assert_not_reached();
+}
+
+float16 float16_div(float16 a, float16 b, float_status *status)
+{
+    decomposed_parts pa = float16_unpack_canonical(a, status);
+    decomposed_parts pb = float16_unpack_canonical(b, status);
+    decomposed_parts pr = div_decomposed(pa, pb, status);
+
+    return float16_round_pack_canonical(pr, status);
+}
+
+float32 float32_div(float32 a, float32 b, float_status *status)
+{
+    decomposed_parts pa = float32_unpack_canonical(a, status);
+    decomposed_parts pb = float32_unpack_canonical(b, status);
+    decomposed_parts pr = div_decomposed(pa, pb, status);
+
+    return float32_round_pack_canonical(pr, status);
+}
+
+float64 float64_div(float64 a, float64 b, float_status *status)
+{
+    decomposed_parts pa = float64_unpack_canonical(a, status);
+    decomposed_parts pb = float64_unpack_canonical(b, status);
+    decomposed_parts pr = div_decomposed(pa, pb, status);
+
+    return float64_round_pack_canonical(pr, status);
+}
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
@@ -2622,75 +2711,6 @@ float32 float32_round_to_int(float32 a, float_status *status)
 
 
 
-/*----------------------------------------------------------------------------
-| Returns the result of dividing the single-precision floating-point value `a'
-| by the corresponding value `b'.  The operation is performed according to the
-| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 float32_div(float32 a, float32 b, float_status *status)
-{
-    flag aSign, bSign, zSign;
-    int aExp, bExp, zExp;
-    uint32_t aSig, bSig, zSig;
-    a = float32_squash_input_denormal(a, status);
-    b = float32_squash_input_denormal(b, status);
-
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-    bSig = extractFloat32Frac( b );
-    bExp = extractFloat32Exp( b );
-    bSign = extractFloat32Sign( b );
-    zSign = aSign ^ bSign;
-    if ( aExp == 0xFF ) {
-        if (aSig) {
-            return propagateFloat32NaN(a, b, status);
-        }
-        if ( bExp == 0xFF ) {
-            if (bSig) {
-                return propagateFloat32NaN(a, b, status);
-            }
-            float_raise(float_flag_invalid, status);
-            return float32_default_nan(status);
-        }
-        return packFloat32( zSign, 0xFF, 0 );
-    }
-    if ( bExp == 0xFF ) {
-        if (bSig) {
-            return propagateFloat32NaN(a, b, status);
-        }
-        return packFloat32( zSign, 0, 0 );
-    }
-    if ( bExp == 0 ) {
-        if ( bSig == 0 ) {
-            if ( ( aExp | aSig ) == 0 ) {
-                float_raise(float_flag_invalid, status);
-                return float32_default_nan(status);
-            }
-            float_raise(float_flag_divbyzero, status);
-            return packFloat32( zSign, 0xFF, 0 );
-        }
-        normalizeFloat32Subnormal( bSig, &bExp, &bSig );
-    }
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return packFloat32( zSign, 0, 0 );
-        normalizeFloat32Subnormal( aSig, &aExp, &aSig );
-    }
-    zExp = aExp - bExp + 0x7D;
-    aSig = ( aSig | 0x00800000 )<<7;
-    bSig = ( bSig | 0x00800000 )<<8;
-    if ( bSig <= ( aSig + aSig ) ) {
-        aSig >>= 1;
-        ++zExp;
-    }
-    zSig = ( ( (uint64_t) aSig )<<32 ) / bSig;
-    if ( ( zSig & 0x3F ) == 0 ) {
-        zSig |= ( (uint64_t) bSig * zSig != ( (uint64_t) aSig )<<32 );
-    }
-    return roundAndPackFloat32(zSign, zExp, zSig, status);
-
-}
 
 /*----------------------------------------------------------------------------
 | Returns the remainder of the single-precision floating-point value `a'
@@ -4153,83 +4173,6 @@ float64 float64_trunc_to_int(float64 a, float_status *status)
     return res;
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of dividing the double-precision floating-point value `a'
-| by the corresponding value `b'.  The operation is performed according to
-| the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 float64_div(float64 a, float64 b, float_status *status)
-{
-    flag aSign, bSign, zSign;
-    int aExp, bExp, zExp;
-    uint64_t aSig, bSig, zSig;
-    uint64_t rem0, rem1;
-    uint64_t term0, term1;
-    a = float64_squash_input_denormal(a, status);
-    b = float64_squash_input_denormal(b, status);
-
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-    bSig = extractFloat64Frac( b );
-    bExp = extractFloat64Exp( b );
-    bSign = extractFloat64Sign( b );
-    zSign = aSign ^ bSign;
-    if ( aExp == 0x7FF ) {
-        if (aSig) {
-            return propagateFloat64NaN(a, b, status);
-        }
-        if ( bExp == 0x7FF ) {
-            if (bSig) {
-                return propagateFloat64NaN(a, b, status);
-            }
-            float_raise(float_flag_invalid, status);
-            return float64_default_nan(status);
-        }
-        return packFloat64( zSign, 0x7FF, 0 );
-    }
-    if ( bExp == 0x7FF ) {
-        if (bSig) {
-            return propagateFloat64NaN(a, b, status);
-        }
-        return packFloat64( zSign, 0, 0 );
-    }
-    if ( bExp == 0 ) {
-        if ( bSig == 0 ) {
-            if ( ( aExp | aSig ) == 0 ) {
-                float_raise(float_flag_invalid, status);
-                return float64_default_nan(status);
-            }
-            float_raise(float_flag_divbyzero, status);
-            return packFloat64( zSign, 0x7FF, 0 );
-        }
-        normalizeFloat64Subnormal( bSig, &bExp, &bSig );
-    }
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return packFloat64( zSign, 0, 0 );
-        normalizeFloat64Subnormal( aSig, &aExp, &aSig );
-    }
-    zExp = aExp - bExp + 0x3FD;
-    aSig = ( aSig | LIT64( 0x0010000000000000 ) )<<10;
-    bSig = ( bSig | LIT64( 0x0010000000000000 ) )<<11;
-    if ( bSig <= ( aSig + aSig ) ) {
-        aSig >>= 1;
-        ++zExp;
-    }
-    zSig = estimateDiv128To64( aSig, 0, bSig );
-    if ( ( zSig & 0x1FF ) <= 2 ) {
-        mul64To128( bSig, zSig, &term0, &term1 );
-        sub128( aSig, 0, term0, term1, &rem0, &rem1 );
-        while ( (int64_t) rem0 < 0 ) {
-            --zSig;
-            add128( rem0, rem1, 0, bSig, &rem0, &rem1 );
-        }
-        zSig |= ( rem1 != 0 );
-    }
-    return roundAndPackFloat64(zSign, zExp, zSig, status);
-
-}
 
 /*----------------------------------------------------------------------------
 | Returns the remainder of the double-precision floating-point value `a'
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 1fe8734261..d2b8d29f22 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -349,6 +349,7 @@ float64 float16_to_float64(float16 a, flag ieee, float_status *status);
 float16 float16_add(float16, float16, float_status *status);
 float16 float16_sub(float16, float16, float_status *status);
 float16 float16_mul(float16, float16, float_status *status);
+float16 float16_div(float16, float16, float_status *status);
 
 int float16_is_quiet_nan(float16, float_status *status);
 int float16_is_signaling_nan(float16, float_status *status);
-- 
2.15.1

* [Qemu-devel] [PATCH v1 13/19] fpu/softfloat: re-factor muladd
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (11 preceding siblings ...)
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 12/19] fpu/softfloat: re-factor div Alex Bennée
@ 2017-12-11 12:56 ` Alex Bennée
  2017-12-18 22:36   ` Richard Henderson
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 14/19] fpu/softfloat: re-factor round_to_int Alex Bennée
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:56 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

We can now add float16_muladd and use the common decompose and
canonicalize functions to have a single implementation for
float16/32/64 muladd functions.
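
The special-case handling relies on two small tricks with the class
enum: every NaN class sorts at or above the quiet NaN class, so one
comparison catches them all, and Inf * 0 is detected by ORing one bit
per class. A stand-alone sketch of both follows. The enum below is a
hypothetical reduced version and the names are made up for the
example; only the relative ordering is taken from the comparisons in
the patch.

```c
#include <stdbool.h>

/* Hypothetical reduced class enum; the NaN classes must come last. */
typedef enum {
    cls_zero,
    cls_normal,
    cls_inf,
    cls_qnan,   /* every NaN class sorts at or above this one */
    cls_snan,
} cls_sketch;

/* True for Inf * 0 and 0 * Inf, whichever operand is which. */
static bool is_inf_times_zero(cls_sketch a, cls_sketch b)
{
    return ((1 << a) | (1 << b)) == ((1 << cls_inf) | (1 << cls_zero));
}

/* A single ordered comparison per operand covers all NaN classes. */
static bool any_nan(cls_sketch a, cls_sketch b, cls_sketch c)
{
    return a >= cls_qnan || b >= cls_qnan || c >= cls_qnan;
}
```

Computing the Inf * 0 flag before the NaN check matters for muladd,
because the target-specific pick-a-NaN routine needs to know about it
even when c is a NaN.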

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 fpu/softfloat-specialize.h | 104 -------
 fpu/softfloat.c            | 756 +++++++++++++++++----------------------------
 include/fpu/softfloat.h    |   1 +
 3 files changed, 286 insertions(+), 575 deletions(-)

diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h
index 3d507d8c77..98fb0e7001 100644
--- a/fpu/softfloat-specialize.h
+++ b/fpu/softfloat-specialize.h
@@ -729,58 +729,6 @@ static float32 propagateFloat32NaN(float32 a, float32 b, float_status *status)
     }
 }
 
-/*----------------------------------------------------------------------------
-| Takes three single-precision floating-point values `a', `b' and `c', one of
-| which is a NaN, and returns the appropriate NaN result.  If any of  `a',
-| `b' or `c' is a signaling NaN, the invalid exception is raised.
-| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case
-| obviously c is a NaN, and whether to propagate c or some other NaN is
-| implementation defined).
-*----------------------------------------------------------------------------*/
-
-static float32 propagateFloat32MulAddNaN(float32 a, float32 b,
-                                         float32 c, flag infzero,
-                                         float_status *status)
-{
-    flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN,
-        cIsQuietNaN, cIsSignalingNaN;
-    int which;
-
-    aIsQuietNaN = float32_is_quiet_nan(a, status);
-    aIsSignalingNaN = float32_is_signaling_nan(a, status);
-    bIsQuietNaN = float32_is_quiet_nan(b, status);
-    bIsSignalingNaN = float32_is_signaling_nan(b, status);
-    cIsQuietNaN = float32_is_quiet_nan(c, status);
-    cIsSignalingNaN = float32_is_signaling_nan(c, status);
-
-    if (aIsSignalingNaN | bIsSignalingNaN | cIsSignalingNaN) {
-        float_raise(float_flag_invalid, status);
-    }
-
-    which = pickNaNMulAdd(aIsQuietNaN, aIsSignalingNaN,
-                          bIsQuietNaN, bIsSignalingNaN,
-                          cIsQuietNaN, cIsSignalingNaN, infzero, status);
-
-    if (status->default_nan_mode) {
-        /* Note that this check is after pickNaNMulAdd so that function
-         * has an opportunity to set the Invalid flag.
-         */
-        return float32_default_nan(status);
-    }
-
-    switch (which) {
-    case 0:
-        return float32_maybe_silence_nan(a, status);
-    case 1:
-        return float32_maybe_silence_nan(b, status);
-    case 2:
-        return float32_maybe_silence_nan(c, status);
-    case 3:
-    default:
-        return float32_default_nan(status);
-    }
-}
-
 #ifdef NO_SIGNALING_NANS
 int float64_is_quiet_nan(float64 a_, float_status *status)
 {
@@ -936,58 +884,6 @@ static float64 propagateFloat64NaN(float64 a, float64 b, float_status *status)
     }
 }
 
-/*----------------------------------------------------------------------------
-| Takes three double-precision floating-point values `a', `b' and `c', one of
-| which is a NaN, and returns the appropriate NaN result.  If any of  `a',
-| `b' or `c' is a signaling NaN, the invalid exception is raised.
-| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case
-| obviously c is a NaN, and whether to propagate c or some other NaN is
-| implementation defined).
-*----------------------------------------------------------------------------*/
-
-static float64 propagateFloat64MulAddNaN(float64 a, float64 b,
-                                         float64 c, flag infzero,
-                                         float_status *status)
-{
-    flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN,
-        cIsQuietNaN, cIsSignalingNaN;
-    int which;
-
-    aIsQuietNaN = float64_is_quiet_nan(a, status);
-    aIsSignalingNaN = float64_is_signaling_nan(a, status);
-    bIsQuietNaN = float64_is_quiet_nan(b, status);
-    bIsSignalingNaN = float64_is_signaling_nan(b, status);
-    cIsQuietNaN = float64_is_quiet_nan(c, status);
-    cIsSignalingNaN = float64_is_signaling_nan(c, status);
-
-    if (aIsSignalingNaN | bIsSignalingNaN | cIsSignalingNaN) {
-        float_raise(float_flag_invalid, status);
-    }
-
-    which = pickNaNMulAdd(aIsQuietNaN, aIsSignalingNaN,
-                          bIsQuietNaN, bIsSignalingNaN,
-                          cIsQuietNaN, cIsSignalingNaN, infzero, status);
-
-    if (status->default_nan_mode) {
-        /* Note that this check is after pickNaNMulAdd so that function
-         * has an opportunity to set the Invalid flag.
-         */
-        return float64_default_nan(status);
-    }
-
-    switch (which) {
-    case 0:
-        return float64_maybe_silence_nan(a, status);
-    case 1:
-        return float64_maybe_silence_nan(b, status);
-    case 2:
-        return float64_maybe_silence_nan(c, status);
-    case 3:
-    default:
-        return float64_default_nan(status);
-    }
-}
-
 #ifdef NO_SIGNALING_NANS
 int floatx80_is_quiet_nan(floatx80 a_, float_status *status)
 {
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 2b703c12ed..bf37f23f6a 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -561,6 +561,50 @@ static decomposed_parts pick_nan_parts(decomposed_parts a, decomposed_parts b,
     return a;
 }
 
+static decomposed_parts pick_nan_muladd_parts(decomposed_parts a,
+                                              decomposed_parts b,
+                                              decomposed_parts c,
+                                              bool inf_zero,
+                                              float_status *s)
+{
+    if (a.cls == float_class_snan
+        ||
+        b.cls == float_class_snan
+        ||
+        c.cls == float_class_snan) {
+        s->float_exception_flags |= float_flag_invalid;
+    }
+
+    if (s->default_nan_mode) {
+        a.cls = float_class_dnan;
+    } else {
+        switch (pickNaNMulAdd(a.cls == float_class_qnan,
+                              a.cls == float_class_snan,
+                              b.cls == float_class_qnan,
+                              b.cls == float_class_snan,
+                              c.cls == float_class_qnan,
+                              c.cls == float_class_snan,
+                              inf_zero, s)) {
+        case 0:
+            break;
+        case 1:
+            a = b;
+            break;
+        case 2:
+            a = c;
+            break;
+        case 3:
+            a.cls = float_class_dnan;
+            return a;
+        default:
+            g_assert_not_reached();
+        }
+
+        a.cls = float_class_msnan;
+    }
+    return a;
+}
+
 
 /*
  * Returns the result of adding the absolute values of the
@@ -809,6 +853,247 @@ float64 float64_mul(float64 a, float64 b, float_status *status)
     return float64_round_pack_canonical(pr, status);
 }
 
+/*
+ * Returns the result of multiplying the floating-point values `a' and
+ * `b' then adding `c', with no intermediate rounding step after the
+ * multiplication. The operation is performed according to the
+ * IEC/IEEE Standard for Binary Floating-Point Arithmetic 754-2008.
+ * The flags argument allows the caller to select negation of the
+ * addend, the intermediate product, or the final result. (The
+ * difference between this and having the caller do a separate
+ * negation is that negating externally will flip the sign bit on
+ * NaNs.)
+ */
+
+static decomposed_parts muladd_decomposed(decomposed_parts a,
+                                          decomposed_parts b,
+                                          decomposed_parts c, int flags,
+                                          float_status *s)
+{
+    bool inf_zero = ((1 << a.cls) | (1 << b.cls)) ==
+                    ((1 << float_class_inf) | (1 << float_class_zero));
+    bool p_sign;
+    bool sign_flip = flags & float_muladd_negate_result;
+    float_class p_class;
+    uint64_t hi, lo;
+    int p_exp;
+
+    /* It is implementation-defined whether the cases of (0,inf,qnan)
+     * and (inf,0,qnan) raise InvalidOperation or not (and what QNaN
+     * they return if they do), so we have to hand this information
+     * off to the target-specific pick-a-NaN routine.
+     */
+    if (a.cls >= float_class_qnan ||
+        b.cls >= float_class_qnan ||
+        c.cls >= float_class_qnan) {
+        return pick_nan_muladd_parts(a, b, c, inf_zero, s);
+    }
+
+    if (inf_zero) {
+        s->float_exception_flags |= float_flag_invalid;
+        a.cls = float_class_dnan;
+        return a;
+    }
+
+    if (flags & float_muladd_negate_c) {
+        c.sign ^= 1;
+    }
+
+    p_sign = a.sign ^ b.sign;
+
+    if (flags & float_muladd_negate_product) {
+        p_sign ^= 1;
+    }
+
+    if (a.cls == float_class_inf || b.cls == float_class_inf) {
+        p_class = float_class_inf;
+    } else if (a.cls == float_class_zero || b.cls == float_class_zero) {
+        p_class = float_class_zero;
+    } else {
+        p_class = float_class_normal;
+    }
+
+    if (c.cls == float_class_inf) {
+        if (p_class == float_class_inf && p_sign != c.sign) {
+            s->float_exception_flags |= float_flag_invalid;
+            a.cls = float_class_dnan;
+        } else {
+            a.cls = float_class_inf;
+            a.sign = c.sign ^ sign_flip;
+        }
+        return a;
+    }
+
+    if (p_class == float_class_inf) {
+        a.cls = float_class_inf;
+        a.sign = p_sign ^ sign_flip;
+        return a;
+    }
+
+    if (p_class == float_class_zero) {
+        if (c.cls == float_class_zero) {
+            if (p_sign != c.sign) {
+                p_sign = s->float_rounding_mode == float_round_down;
+            }
+            c.sign = p_sign;
+        } else if (flags & float_muladd_halve_result) {
+            c.exp -= 1;
+        }
+        c.sign ^= sign_flip;
+        return c;
+    }
+
+    /* a & b should be normals now... */
+    assert(a.cls == float_class_normal &&
+           b.cls == float_class_normal);
+
+    p_exp = a.exp + b.exp;
+
+    /* Multiplying two fractions with the binary point at bit 62
+     * gives a 128-bit product with the binary point at bit 124.
+     */
+    mul64To128(a.frac, b.frac, &hi, &lo);
+    /* binary point now at bit 124 */
+
+    /* check for overflow */
+    if (hi & (1ULL << (DECOMPOSED_BINARY_POINT * 2 + 1 - 64))) {
+        shift128RightJamming(hi, lo, 1, &hi, &lo);
+        p_exp += 1;
+    }
+
+    /* + add/sub */
+    if (c.cls == float_class_zero) {
+        /* move binary point back to 62 */
+        shift128RightJamming(hi, lo, DECOMPOSED_BINARY_POINT, &hi, &lo);
+    } else {
+        int exp_diff = p_exp - c.exp;
+        if (p_sign == c.sign) {
+            /* Addition */
+            if (exp_diff <= 0) {
+                shift128RightJamming(hi, lo,
+                                     DECOMPOSED_BINARY_POINT - exp_diff,
+                                     &hi, &lo);
+                lo += c.frac;
+                p_exp = c.exp;
+            } else {
+                uint64_t c_hi, c_lo;
+                /* shift c to the same binary point as the product (124) */
+                c_hi = c.frac >> 2;
+                c_lo = 0;
+                shift128RightJamming(c_hi, c_lo,
+                                     exp_diff,
+                                     &c_hi, &c_lo);
+                add128(hi, lo, c_hi, c_lo, &hi, &lo);
+                /* move binary point back to 62 */
+                shift128RightJamming(hi, lo, DECOMPOSED_BINARY_POINT, &hi, &lo);
+            }
+
+            if (lo & DECOMPOSED_OVERFLOW_BIT) {
+                shift64RightJamming(lo, 1, &lo);
+                p_exp += 1;
+            }
+
+        } else {
+            /* Subtraction */
+            uint64_t c_hi, c_lo;
+            /* make C binary point match product at bit 124 */
+            c_hi = c.frac >> 2;
+            c_lo = 0;
+
+            if (exp_diff <= 0) {
+                shift128RightJamming(hi, lo, -exp_diff, &hi, &lo);
+                if (exp_diff == 0
+                    &&
+                    (hi > c_hi || (hi == c_hi && lo >= c_lo))) {
+                    sub128(hi, lo, c_hi, c_lo, &hi, &lo);
+                } else {
+                    sub128(c_hi, c_lo, hi, lo, &hi, &lo);
+                    p_sign ^= 1;
+                    p_exp = c.exp;
+                }
+            } else {
+                shift128RightJamming(c_hi, c_lo,
+                                     exp_diff,
+                                     &c_hi, &c_lo);
+                sub128(hi, lo, c_hi, c_lo, &hi, &lo);
+            }
+
+            if (hi == 0 && lo == 0) {
+                a.cls = float_class_zero;
+                a.sign = s->float_rounding_mode == float_round_down;
+                a.sign ^= sign_flip;
+                return a;
+            } else {
+                int shift;
+                if (hi != 0) {
+                    shift = clz64(hi);
+                } else {
+                    shift = clz64(lo) + 64;
+                }
+                /* Normalizing to a binary point of 124 is the
+                   correct adjustment for the exponent.  However, since we're
+                   shifting, we might as well put the binary point back
+                   at 62 where we really want it.  Therefore shift as
+                   if we're leaving 1 bit at the top of the word, but
+                   adjust the exponent as if we're leaving 3 bits.  */
+                shift -= 1;
+                if (shift >= 64) {
+                    lo = lo << (shift - 64);
+                } else {
+                    hi = (hi << shift) | (lo >> (64 - shift));
+                    lo = hi | ((lo << shift) != 0);
+                }
+                p_exp -= shift - 2;
+            }
+        }
+    }
+
+    if (flags & float_muladd_halve_result) {
+        p_exp -= 1;
+    }
+
+    /* finally prepare our result */
+    a.cls = float_class_normal;
+    a.sign = p_sign ^ sign_flip;
+    a.exp = p_exp;
+    a.frac = lo;
+
+    return a;
+}
+
+float16 float16_muladd(float16 a, float16 b, float16 c, int flags,
+                       float_status *status)
+{
+    decomposed_parts pa = float16_unpack_canonical(a, status);
+    decomposed_parts pb = float16_unpack_canonical(b, status);
+    decomposed_parts pc = float16_unpack_canonical(c, status);
+    decomposed_parts pr = muladd_decomposed(pa, pb, pc, flags, status);
+
+    return float16_round_pack_canonical(pr, status);
+}
+
+float32 float32_muladd(float32 a, float32 b, float32 c, int flags,
+                       float_status *status)
+{
+    decomposed_parts pa = float32_unpack_canonical(a, status);
+    decomposed_parts pb = float32_unpack_canonical(b, status);
+    decomposed_parts pc = float32_unpack_canonical(c, status);
+    decomposed_parts pr = muladd_decomposed(pa, pb, pc, flags, status);
+
+    return float32_round_pack_canonical(pr, status);
+}
+
+float64 float64_muladd(float64 a, float64 b, float64 c, int flags,
+                       float_status *status)
+{
+    decomposed_parts pa = float64_unpack_canonical(a, status);
+    decomposed_parts pb = float64_unpack_canonical(b, status);
+    decomposed_parts pc = float64_unpack_canonical(c, status);
+    decomposed_parts pr = muladd_decomposed(pa, pb, pc, flags, status);
+
+    return float64_round_pack_canonical(pr, status);
+}
+
 /*
  * Returns the result of dividing the floating-point value `a' by the
  * corresponding value `b'. The operation is performed according to
@@ -2814,231 +3099,6 @@ float32 float32_rem(float32 a, float32 b, float_status *status)
     return normalizeRoundAndPackFloat32(aSign ^ zSign, bExp, aSig, status);
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of multiplying the single-precision floating-point values
-| `a' and `b' then adding 'c', with no intermediate rounding step after the
-| multiplication.  The operation is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic 754-2008.
-| The flags argument allows the caller to select negation of the
-| addend, the intermediate product, or the final result. (The difference
-| between this and having the caller do a separate negation is that negating
-| externally will flip the sign bit on NaNs.)
-*----------------------------------------------------------------------------*/
-
-float32 float32_muladd(float32 a, float32 b, float32 c, int flags,
-                       float_status *status)
-{
-    flag aSign, bSign, cSign, zSign;
-    int aExp, bExp, cExp, pExp, zExp, expDiff;
-    uint32_t aSig, bSig, cSig;
-    flag pInf, pZero, pSign;
-    uint64_t pSig64, cSig64, zSig64;
-    uint32_t pSig;
-    int shiftcount;
-    flag signflip, infzero;
-
-    a = float32_squash_input_denormal(a, status);
-    b = float32_squash_input_denormal(b, status);
-    c = float32_squash_input_denormal(c, status);
-    aSig = extractFloat32Frac(a);
-    aExp = extractFloat32Exp(a);
-    aSign = extractFloat32Sign(a);
-    bSig = extractFloat32Frac(b);
-    bExp = extractFloat32Exp(b);
-    bSign = extractFloat32Sign(b);
-    cSig = extractFloat32Frac(c);
-    cExp = extractFloat32Exp(c);
-    cSign = extractFloat32Sign(c);
-
-    infzero = ((aExp == 0 && aSig == 0 && bExp == 0xff && bSig == 0) ||
-               (aExp == 0xff && aSig == 0 && bExp == 0 && bSig == 0));
-
-    /* It is implementation-defined whether the cases of (0,inf,qnan)
-     * and (inf,0,qnan) raise InvalidOperation or not (and what QNaN
-     * they return if they do), so we have to hand this information
-     * off to the target-specific pick-a-NaN routine.
-     */
-    if (((aExp == 0xff) && aSig) ||
-        ((bExp == 0xff) && bSig) ||
-        ((cExp == 0xff) && cSig)) {
-        return propagateFloat32MulAddNaN(a, b, c, infzero, status);
-    }
-
-    if (infzero) {
-        float_raise(float_flag_invalid, status);
-        return float32_default_nan(status);
-    }
-
-    if (flags & float_muladd_negate_c) {
-        cSign ^= 1;
-    }
-
-    signflip = (flags & float_muladd_negate_result) ? 1 : 0;
-
-    /* Work out the sign and type of the product */
-    pSign = aSign ^ bSign;
-    if (flags & float_muladd_negate_product) {
-        pSign ^= 1;
-    }
-    pInf = (aExp == 0xff) || (bExp == 0xff);
-    pZero = ((aExp | aSig) == 0) || ((bExp | bSig) == 0);
-
-    if (cExp == 0xff) {
-        if (pInf && (pSign ^ cSign)) {
-            /* addition of opposite-signed infinities => InvalidOperation */
-            float_raise(float_flag_invalid, status);
-            return float32_default_nan(status);
-        }
-        /* Otherwise generate an infinity of the same sign */
-        return packFloat32(cSign ^ signflip, 0xff, 0);
-    }
-
-    if (pInf) {
-        return packFloat32(pSign ^ signflip, 0xff, 0);
-    }
-
-    if (pZero) {
-        if (cExp == 0) {
-            if (cSig == 0) {
-                /* Adding two exact zeroes */
-                if (pSign == cSign) {
-                    zSign = pSign;
-                } else if (status->float_rounding_mode == float_round_down) {
-                    zSign = 1;
-                } else {
-                    zSign = 0;
-                }
-                return packFloat32(zSign ^ signflip, 0, 0);
-            }
-            /* Exact zero plus a denorm */
-            if (status->flush_to_zero) {
-                float_raise(float_flag_output_denormal, status);
-                return packFloat32(cSign ^ signflip, 0, 0);
-            }
-        }
-        /* Zero plus something non-zero : just return the something */
-        if (flags & float_muladd_halve_result) {
-            if (cExp == 0) {
-                normalizeFloat32Subnormal(cSig, &cExp, &cSig);
-            }
-            /* Subtract one to halve, and one again because roundAndPackFloat32
-             * wants one less than the true exponent.
-             */
-            cExp -= 2;
-            cSig = (cSig | 0x00800000) << 7;
-            return roundAndPackFloat32(cSign ^ signflip, cExp, cSig, status);
-        }
-        return packFloat32(cSign ^ signflip, cExp, cSig);
-    }
-
-    if (aExp == 0) {
-        normalizeFloat32Subnormal(aSig, &aExp, &aSig);
-    }
-    if (bExp == 0) {
-        normalizeFloat32Subnormal(bSig, &bExp, &bSig);
-    }
-
-    /* Calculate the actual result a * b + c */
-
-    /* Multiply first; this is easy. */
-    /* NB: we subtract 0x7e where float32_mul() subtracts 0x7f
-     * because we want the true exponent, not the "one-less-than"
-     * flavour that roundAndPackFloat32() takes.
-     */
-    pExp = aExp + bExp - 0x7e;
-    aSig = (aSig | 0x00800000) << 7;
-    bSig = (bSig | 0x00800000) << 8;
-    pSig64 = (uint64_t)aSig * bSig;
-    if ((int64_t)(pSig64 << 1) >= 0) {
-        pSig64 <<= 1;
-        pExp--;
-    }
-
-    zSign = pSign ^ signflip;
-
-    /* Now pSig64 is the significand of the multiply, with the explicit bit in
-     * position 62.
-     */
-    if (cExp == 0) {
-        if (!cSig) {
-            /* Throw out the special case of c being an exact zero now */
-            shift64RightJamming(pSig64, 32, &pSig64);
-            pSig = pSig64;
-            if (flags & float_muladd_halve_result) {
-                pExp--;
-            }
-            return roundAndPackFloat32(zSign, pExp - 1,
-                                       pSig, status);
-        }
-        normalizeFloat32Subnormal(cSig, &cExp, &cSig);
-    }
-
-    cSig64 = (uint64_t)cSig << (62 - 23);
-    cSig64 |= LIT64(0x4000000000000000);
-    expDiff = pExp - cExp;
-
-    if (pSign == cSign) {
-        /* Addition */
-        if (expDiff > 0) {
-            /* scale c to match p */
-            shift64RightJamming(cSig64, expDiff, &cSig64);
-            zExp = pExp;
-        } else if (expDiff < 0) {
-            /* scale p to match c */
-            shift64RightJamming(pSig64, -expDiff, &pSig64);
-            zExp = cExp;
-        } else {
-            /* no scaling needed */
-            zExp = cExp;
-        }
-        /* Add significands and make sure explicit bit ends up in posn 62 */
-        zSig64 = pSig64 + cSig64;
-        if ((int64_t)zSig64 < 0) {
-            shift64RightJamming(zSig64, 1, &zSig64);
-        } else {
-            zExp--;
-        }
-    } else {
-        /* Subtraction */
-        if (expDiff > 0) {
-            shift64RightJamming(cSig64, expDiff, &cSig64);
-            zSig64 = pSig64 - cSig64;
-            zExp = pExp;
-        } else if (expDiff < 0) {
-            shift64RightJamming(pSig64, -expDiff, &pSig64);
-            zSig64 = cSig64 - pSig64;
-            zExp = cExp;
-            zSign ^= 1;
-        } else {
-            zExp = pExp;
-            if (cSig64 < pSig64) {
-                zSig64 = pSig64 - cSig64;
-            } else if (pSig64 < cSig64) {
-                zSig64 = cSig64 - pSig64;
-                zSign ^= 1;
-            } else {
-                /* Exact zero */
-                zSign = signflip;
-                if (status->float_rounding_mode == float_round_down) {
-                    zSign ^= 1;
-                }
-                return packFloat32(zSign, 0, 0);
-            }
-        }
-        --zExp;
-        /* Normalize to put the explicit bit back into bit 62. */
-        shiftcount = countLeadingZeros64(zSig64) - 1;
-        zSig64 <<= shiftcount;
-        zExp -= shiftcount;
-    }
-    if (flags & float_muladd_halve_result) {
-        zExp--;
-    }
-
-    shift64RightJamming(zSig64, 32, &zSig64);
-    return roundAndPackFloat32(zSign, zExp, zSig64, status);
-}
-
 
 /*----------------------------------------------------------------------------
 | Returns the square root of the single-precision floating-point value `a'.
@@ -4262,252 +4322,6 @@ float64 float64_rem(float64 a, float64 b, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of multiplying the double-precision floating-point values
-| `a' and `b' then adding 'c', with no intermediate rounding step after the
-| multiplication.  The operation is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic 754-2008.
-| The flags argument allows the caller to select negation of the
-| addend, the intermediate product, or the final result. (The difference
-| between this and having the caller do a separate negation is that negating
-| externally will flip the sign bit on NaNs.)
-*----------------------------------------------------------------------------*/
-
-float64 float64_muladd(float64 a, float64 b, float64 c, int flags,
-                       float_status *status)
-{
-    flag aSign, bSign, cSign, zSign;
-    int aExp, bExp, cExp, pExp, zExp, expDiff;
-    uint64_t aSig, bSig, cSig;
-    flag pInf, pZero, pSign;
-    uint64_t pSig0, pSig1, cSig0, cSig1, zSig0, zSig1;
-    int shiftcount;
-    flag signflip, infzero;
-
-    a = float64_squash_input_denormal(a, status);
-    b = float64_squash_input_denormal(b, status);
-    c = float64_squash_input_denormal(c, status);
-    aSig = extractFloat64Frac(a);
-    aExp = extractFloat64Exp(a);
-    aSign = extractFloat64Sign(a);
-    bSig = extractFloat64Frac(b);
-    bExp = extractFloat64Exp(b);
-    bSign = extractFloat64Sign(b);
-    cSig = extractFloat64Frac(c);
-    cExp = extractFloat64Exp(c);
-    cSign = extractFloat64Sign(c);
-
-    infzero = ((aExp == 0 && aSig == 0 && bExp == 0x7ff && bSig == 0) ||
-               (aExp == 0x7ff && aSig == 0 && bExp == 0 && bSig == 0));
-
-    /* It is implementation-defined whether the cases of (0,inf,qnan)
-     * and (inf,0,qnan) raise InvalidOperation or not (and what QNaN
-     * they return if they do), so we have to hand this information
-     * off to the target-specific pick-a-NaN routine.
-     */
-    if (((aExp == 0x7ff) && aSig) ||
-        ((bExp == 0x7ff) && bSig) ||
-        ((cExp == 0x7ff) && cSig)) {
-        return propagateFloat64MulAddNaN(a, b, c, infzero, status);
-    }
-
-    if (infzero) {
-        float_raise(float_flag_invalid, status);
-        return float64_default_nan(status);
-    }
-
-    if (flags & float_muladd_negate_c) {
-        cSign ^= 1;
-    }
-
-    signflip = (flags & float_muladd_negate_result) ? 1 : 0;
-
-    /* Work out the sign and type of the product */
-    pSign = aSign ^ bSign;
-    if (flags & float_muladd_negate_product) {
-        pSign ^= 1;
-    }
-    pInf = (aExp == 0x7ff) || (bExp == 0x7ff);
-    pZero = ((aExp | aSig) == 0) || ((bExp | bSig) == 0);
-
-    if (cExp == 0x7ff) {
-        if (pInf && (pSign ^ cSign)) {
-            /* addition of opposite-signed infinities => InvalidOperation */
-            float_raise(float_flag_invalid, status);
-            return float64_default_nan(status);
-        }
-        /* Otherwise generate an infinity of the same sign */
-        return packFloat64(cSign ^ signflip, 0x7ff, 0);
-    }
-
-    if (pInf) {
-        return packFloat64(pSign ^ signflip, 0x7ff, 0);
-    }
-
-    if (pZero) {
-        if (cExp == 0) {
-            if (cSig == 0) {
-                /* Adding two exact zeroes */
-                if (pSign == cSign) {
-                    zSign = pSign;
-                } else if (status->float_rounding_mode == float_round_down) {
-                    zSign = 1;
-                } else {
-                    zSign = 0;
-                }
-                return packFloat64(zSign ^ signflip, 0, 0);
-            }
-            /* Exact zero plus a denorm */
-            if (status->flush_to_zero) {
-                float_raise(float_flag_output_denormal, status);
-                return packFloat64(cSign ^ signflip, 0, 0);
-            }
-        }
-        /* Zero plus something non-zero : just return the something */
-        if (flags & float_muladd_halve_result) {
-            if (cExp == 0) {
-                normalizeFloat64Subnormal(cSig, &cExp, &cSig);
-            }
-            /* Subtract one to halve, and one again because roundAndPackFloat64
-             * wants one less than the true exponent.
-             */
-            cExp -= 2;
-            cSig = (cSig | 0x0010000000000000ULL) << 10;
-            return roundAndPackFloat64(cSign ^ signflip, cExp, cSig, status);
-        }
-        return packFloat64(cSign ^ signflip, cExp, cSig);
-    }
-
-    if (aExp == 0) {
-        normalizeFloat64Subnormal(aSig, &aExp, &aSig);
-    }
-    if (bExp == 0) {
-        normalizeFloat64Subnormal(bSig, &bExp, &bSig);
-    }
-
-    /* Calculate the actual result a * b + c */
-
-    /* Multiply first; this is easy. */
-    /* NB: we subtract 0x3fe where float64_mul() subtracts 0x3ff
-     * because we want the true exponent, not the "one-less-than"
-     * flavour that roundAndPackFloat64() takes.
-     */
-    pExp = aExp + bExp - 0x3fe;
-    aSig = (aSig | LIT64(0x0010000000000000))<<10;
-    bSig = (bSig | LIT64(0x0010000000000000))<<11;
-    mul64To128(aSig, bSig, &pSig0, &pSig1);
-    if ((int64_t)(pSig0 << 1) >= 0) {
-        shortShift128Left(pSig0, pSig1, 1, &pSig0, &pSig1);
-        pExp--;
-    }
-
-    zSign = pSign ^ signflip;
-
-    /* Now [pSig0:pSig1] is the significand of the multiply, with the explicit
-     * bit in position 126.
-     */
-    if (cExp == 0) {
-        if (!cSig) {
-            /* Throw out the special case of c being an exact zero now */
-            shift128RightJamming(pSig0, pSig1, 64, &pSig0, &pSig1);
-            if (flags & float_muladd_halve_result) {
-                pExp--;
-            }
-            return roundAndPackFloat64(zSign, pExp - 1,
-                                       pSig1, status);
-        }
-        normalizeFloat64Subnormal(cSig, &cExp, &cSig);
-    }
-
-    /* Shift cSig and add the explicit bit so [cSig0:cSig1] is the
-     * significand of the addend, with the explicit bit in position 126.
-     */
-    cSig0 = cSig << (126 - 64 - 52);
-    cSig1 = 0;
-    cSig0 |= LIT64(0x4000000000000000);
-    expDiff = pExp - cExp;
-
-    if (pSign == cSign) {
-        /* Addition */
-        if (expDiff > 0) {
-            /* scale c to match p */
-            shift128RightJamming(cSig0, cSig1, expDiff, &cSig0, &cSig1);
-            zExp = pExp;
-        } else if (expDiff < 0) {
-            /* scale p to match c */
-            shift128RightJamming(pSig0, pSig1, -expDiff, &pSig0, &pSig1);
-            zExp = cExp;
-        } else {
-            /* no scaling needed */
-            zExp = cExp;
-        }
-        /* Add significands and make sure explicit bit ends up in posn 126 */
-        add128(pSig0, pSig1, cSig0, cSig1, &zSig0, &zSig1);
-        if ((int64_t)zSig0 < 0) {
-            shift128RightJamming(zSig0, zSig1, 1, &zSig0, &zSig1);
-        } else {
-            zExp--;
-        }
-        shift128RightJamming(zSig0, zSig1, 64, &zSig0, &zSig1);
-        if (flags & float_muladd_halve_result) {
-            zExp--;
-        }
-        return roundAndPackFloat64(zSign, zExp, zSig1, status);
-    } else {
-        /* Subtraction */
-        if (expDiff > 0) {
-            shift128RightJamming(cSig0, cSig1, expDiff, &cSig0, &cSig1);
-            sub128(pSig0, pSig1, cSig0, cSig1, &zSig0, &zSig1);
-            zExp = pExp;
-        } else if (expDiff < 0) {
-            shift128RightJamming(pSig0, pSig1, -expDiff, &pSig0, &pSig1);
-            sub128(cSig0, cSig1, pSig0, pSig1, &zSig0, &zSig1);
-            zExp = cExp;
-            zSign ^= 1;
-        } else {
-            zExp = pExp;
-            if (lt128(cSig0, cSig1, pSig0, pSig1)) {
-                sub128(pSig0, pSig1, cSig0, cSig1, &zSig0, &zSig1);
-            } else if (lt128(pSig0, pSig1, cSig0, cSig1)) {
-                sub128(cSig0, cSig1, pSig0, pSig1, &zSig0, &zSig1);
-                zSign ^= 1;
-            } else {
-                /* Exact zero */
-                zSign = signflip;
-                if (status->float_rounding_mode == float_round_down) {
-                    zSign ^= 1;
-                }
-                return packFloat64(zSign, 0, 0);
-            }
-        }
-        --zExp;
-        /* Do the equivalent of normalizeRoundAndPackFloat64() but
-         * starting with the significand in a pair of uint64_t.
-         */
-        if (zSig0) {
-            shiftcount = countLeadingZeros64(zSig0) - 1;
-            shortShift128Left(zSig0, zSig1, shiftcount, &zSig0, &zSig1);
-            if (zSig1) {
-                zSig0 |= 1;
-            }
-            zExp -= shiftcount;
-        } else {
-            shiftcount = countLeadingZeros64(zSig1);
-            if (shiftcount == 0) {
-                zSig0 = (zSig1 >> 1) | (zSig1 & 1);
-                zExp -= 63;
-            } else {
-                shiftcount--;
-                zSig0 = zSig1 << shiftcount;
-                zExp -= (shiftcount + 64);
-            }
-        }
-        if (flags & float_muladd_halve_result) {
-            zExp--;
-        }
-        return roundAndPackFloat64(zSign, zExp, zSig0, status);
-    }
-}
 
 /*----------------------------------------------------------------------------
 | Returns the square root of the double-precision floating-point value `a'.
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index d2b8d29f22..102cf4b1e1 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -349,6 +349,7 @@ float64 float16_to_float64(float16 a, flag ieee, float_status *status);
 float16 float16_add(float16, float16, float_status *status);
 float16 float16_sub(float16, float16, float_status *status);
 float16 float16_mul(float16, float16, float_status *status);
+float16 float16_muladd(float16, float16, float16, int, float_status *status);
 float16 float16_div(float16, float16, float_status *status);
 
 int float16_is_quiet_nan(float16, float_status *status);
-- 
2.15.1
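
For anyone following the fixed-point bookkeeping in muladd_decomposed,
here is a minimal standalone sketch of just the multiply step. It is
not the patch's code: mul_demo, shr_jam128 and demo_mul are
hypothetical names, and it assumes the GCC/Clang `unsigned __int128`
extension in place of the mul64To128/shift128RightJamming helpers. It
only illustrates how the binary point moves from bit 62 to bit 124 and
back, and why a set bit 125 means the exponent needs bumping.

```c
#include <assert.h>
#include <stdint.h>

#define BINARY_POINT 62  /* mirrors DECOMPOSED_BINARY_POINT in the patch */

/* Hypothetical stand-in for the fraction/exponent pair of decomposed_parts. */
typedef struct {
    uint64_t frac;  /* normalised: bit 62 set, bit 63 clear */
    int exp;
} mul_demo;

/* Shift right by n, OR-ing any bits shifted out into the LSB (sticky bit). */
static unsigned __int128 shr_jam128(unsigned __int128 v, int n)
{
    unsigned __int128 lost;

    if (n == 0) {
        return v;
    }
    if (n >= 128) {
        return v != 0;
    }
    lost = v & (((unsigned __int128)1 << n) - 1);
    return (v >> n) | (lost != 0);
}

static mul_demo demo_mul(mul_demo a, mul_demo b)
{
    mul_demo r;
    unsigned __int128 p;

    /* Both inputs must be normalised values in [1, 2). */
    assert((a.frac >> BINARY_POINT) == 1);
    assert((b.frac >> BINARY_POINT) == 1);

    p = (unsigned __int128)a.frac * b.frac;  /* binary point now at bit 124 */
    r.exp = a.exp + b.exp;

    /* The product lies in [1, 4); bit 125 set means it is >= 2. */
    if (p >> (2 * BINARY_POINT + 1)) {
        p = shr_jam128(p, 1);
        r.exp += 1;
    }

    /* Move the binary point back to bit 62, keeping a sticky bit. */
    p = shr_jam128(p, BINARY_POINT);
    r.frac = (uint64_t)p;
    return r;
}
```

With both inputs set to 1.5 (frac = 3ULL << 61, exp = 0) the product is
2.25, so the overflow branch fires and the sketch returns
frac = 9ULL << 59 (that is, 1.125) with exp = 1.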


* [Qemu-devel] [PATCH v1 14/19] fpu/softfloat: re-factor round_to_int
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (12 preceding siblings ...)
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 13/19] fpu/softfloat: re-factor muladd Alex Bennée
@ 2017-12-11 12:57 ` Alex Bennée
  2017-12-18 22:41   ` Richard Henderson
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 15/19] fpu/softfloat: re-factor float to int/uint Alex Bennée
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:57 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

We can now add float16_round_to_int and use the common round_decomposed
and canonicalize functions so that the float16/32/64 round_to_int
functions share a single implementation.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 fpu/softfloat.c         | 304 ++++++++++++++++++++----------------------------
 include/fpu/softfloat.h |   1 +
 2 files changed, 130 insertions(+), 175 deletions(-)
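
As a reading aid for the mask arithmetic in round_decomposed, the
following is a minimal standalone sketch of the round-to-nearest-even
path only, for values whose exponent is in the range 0..61. The names
round_demo and demo_round_nearest_even are hypothetical, and the
struct is a cut-down stand-in for decomposed_parts with no class or
exception-flag handling, so treat it as an illustration under those
assumptions rather than the code in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IMPLICIT_BIT (1ULL << 62)  /* mirrors DECOMPOSED_IMPLICIT_BIT */
#define OVERFLOW_BIT (1ULL << 63)  /* mirrors DECOMPOSED_OVERFLOW_BIT */

/* Hypothetical cut-down stand-in for decomposed_parts. */
typedef struct {
    uint64_t frac;  /* normalised: bit 62 set, binary point at bit 62 */
    int exp;
    bool sign;
} round_demo;

static round_demo demo_round_nearest_even(round_demo a)
{
    uint64_t frac_lsb, frac_lsbm1, roundeven_mask, round_mask, inc;

    /* Only the "partly fractional" case is sketched here. */
    assert(a.exp >= 0 && a.exp < 62);

    frac_lsb = IMPLICIT_BIT >> a.exp;             /* bit with weight 1 */
    frac_lsbm1 = frac_lsb >> 1;                   /* bit with weight 0.5 */
    roundeven_mask = (frac_lsb - 1) | frac_lsb;   /* units bit and below */
    round_mask = roundeven_mask >> 1;             /* fractional bits only */

    /* Skip the increment only for an exact tie with an even units bit. */
    inc = (a.frac & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0;

    if (a.frac & round_mask) {
        a.frac += inc;
        a.frac &= ~round_mask;
        if (a.frac & OVERFLOW_BIT) {
            /* Rounding carried out of bit 62: renormalise. */
            a.frac >>= 1;
            a.exp++;
        }
    }
    return a;
}
```

For example, 2.5 (frac = 5ULL << 60, exp = 1) is a tie with an even
integer part and rounds to 2, while 3.5 (frac = 7ULL << 60, exp = 1)
rounds up, carries into the overflow bit and comes back as
frac = 1ULL << 62, exp = 2, that is 4.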

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index bf37f23f6a..9914ecb783 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1183,6 +1183,135 @@ float64 float64_div(float64 a, float64 b, float_status *status)
     return float64_round_pack_canonical(pr, status);
 }
 
+/*
+ * Rounds the floating-point value `a' to an integer, and returns the
+ * result as a floating-point value. The operation is performed
+ * according to the IEC/IEEE Standard for Binary Floating-Point
+ * Arithmetic.
+ */
+
+static decomposed_parts round_decomposed(decomposed_parts a, int rounding_mode,
+                                         float_status *s)
+{
+
+    switch (a.cls) {
+    case float_class_snan:
+        a.cls = s->default_nan_mode ? float_class_dnan : float_class_msnan;
+        s->float_exception_flags |= float_flag_invalid;
+        break;
+    case float_class_zero:
+    case float_class_inf:
+    case float_class_qnan:
+        /* already "integral" */
+        break;
+    case float_class_normal:
+        if (a.exp >= DECOMPOSED_BINARY_POINT) {
+            /* already integral */
+            break;
+        }
+        if (a.exp < 0) {
+            bool one;
+            /* all fractional */
+            s->float_exception_flags |= float_flag_inexact;
+            switch (rounding_mode) {
+            case float_round_nearest_even:
+                one = a.exp == -1 && a.frac > DECOMPOSED_IMPLICIT_BIT;
+                break;
+            case float_round_ties_away:
+                one = a.exp == -1 && a.frac >= DECOMPOSED_IMPLICIT_BIT;
+                break;
+            case float_round_to_zero:
+                one = false;
+                break;
+            case float_round_up:
+                one = !a.sign;
+                break;
+            case float_round_down:
+                one = a.sign;
+                break;
+            default:
+                g_assert_not_reached();
+            }
+
+            if (one) {
+                a.frac = DECOMPOSED_IMPLICIT_BIT;
+                a.exp = 0;
+            } else {
+                a.cls = float_class_zero;
+            }
+        } else {
+            uint64_t frac_lsb, frac_lsbm1, round_mask, roundeven_mask, inc;
+
+            frac_lsb = DECOMPOSED_IMPLICIT_BIT >> a.exp;
+            frac_lsbm1 = frac_lsb >> 1;
+            roundeven_mask = (frac_lsb - 1) | frac_lsb;
+            round_mask = roundeven_mask >> 1;
+
+            switch (rounding_mode) {
+            case float_round_nearest_even:
+                inc = ((a.frac & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
+                break;
+            case float_round_ties_away:
+                inc = frac_lsbm1;
+                break;
+            case float_round_to_zero:
+                inc = 0;
+                break;
+            case float_round_up:
+                inc = a.sign ? 0 : round_mask;
+                break;
+            case float_round_down:
+                inc = a.sign ? round_mask : 0;
+                break;
+            default:
+                g_assert_not_reached();
+            }
+
+            if (a.frac & round_mask) {
+                s->float_exception_flags |= float_flag_inexact;
+                a.frac += inc;
+                a.frac &= ~round_mask;
+                if (a.frac & DECOMPOSED_OVERFLOW_BIT) {
+                    a.frac >>= 1;
+                    a.exp++;
+                }
+            }
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    return a;
+}
+
+float16 float16_round_to_int(float16 a, float_status *s)
+{
+    decomposed_parts pa = float16_unpack_canonical(a, s);
+    decomposed_parts pr = round_decomposed(pa, s->float_rounding_mode, s);
+    return float16_round_pack_canonical(pr, s);
+}
+
+float32 float32_round_to_int(float32 a, float_status *s)
+{
+    decomposed_parts pa = float32_unpack_canonical(a, s);
+    decomposed_parts pr = round_decomposed(pa, s->float_rounding_mode, s);
+    return float32_round_pack_canonical(pr, s);
+}
+
+float64 float64_round_to_int(float64 a, float_status *s)
+{
+    decomposed_parts pa = float64_unpack_canonical(a, s);
+    decomposed_parts pr = round_decomposed(pa, s->float_rounding_mode, s);
+    return float64_round_pack_canonical(pr, s);
+}
+
+float64 float64_trunc_to_int(float64 a, float_status *s)
+{
+    decomposed_parts pa = float64_unpack_canonical(a, s);
+    decomposed_parts pr = round_decomposed(pa, float_round_to_zero, s);
+    return float64_round_pack_canonical(pr, s);
+}
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
@@ -2913,88 +3042,6 @@ float128 float32_to_float128(float32 a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Rounds the single-precision floating-point value `a' to an integer, and
-| returns the result as a single-precision floating-point value.  The
-| operation is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 float32_round_to_int(float32 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    uint32_t lastBitMask, roundBitsMask;
-    uint32_t z;
-    a = float32_squash_input_denormal(a, status);
-
-    aExp = extractFloat32Exp( a );
-    if ( 0x96 <= aExp ) {
-        if ( ( aExp == 0xFF ) && extractFloat32Frac( a ) ) {
-            return propagateFloat32NaN(a, a, status);
-        }
-        return a;
-    }
-    if ( aExp <= 0x7E ) {
-        if ( (uint32_t) ( float32_val(a)<<1 ) == 0 ) return a;
-        status->float_exception_flags |= float_flag_inexact;
-        aSign = extractFloat32Sign( a );
-        switch (status->float_rounding_mode) {
-         case float_round_nearest_even:
-            if ( ( aExp == 0x7E ) && extractFloat32Frac( a ) ) {
-                return packFloat32( aSign, 0x7F, 0 );
-            }
-            break;
-        case float_round_ties_away:
-            if (aExp == 0x7E) {
-                return packFloat32(aSign, 0x7F, 0);
-            }
-            break;
-         case float_round_down:
-            return make_float32(aSign ? 0xBF800000 : 0);
-         case float_round_up:
-            return make_float32(aSign ? 0x80000000 : 0x3F800000);
-        }
-        return packFloat32( aSign, 0, 0 );
-    }
-    lastBitMask = 1;
-    lastBitMask <<= 0x96 - aExp;
-    roundBitsMask = lastBitMask - 1;
-    z = float32_val(a);
-    switch (status->float_rounding_mode) {
-    case float_round_nearest_even:
-        z += lastBitMask>>1;
-        if ((z & roundBitsMask) == 0) {
-            z &= ~lastBitMask;
-        }
-        break;
-    case float_round_ties_away:
-        z += lastBitMask >> 1;
-        break;
-    case float_round_to_zero:
-        break;
-    case float_round_up:
-        if (!extractFloat32Sign(make_float32(z))) {
-            z += roundBitsMask;
-        }
-        break;
-    case float_round_down:
-        if (extractFloat32Sign(make_float32(z))) {
-            z += roundBitsMask;
-        }
-        break;
-    default:
-        abort();
-    }
-    z &= ~ roundBitsMask;
-    if (z != float32_val(a)) {
-        status->float_exception_flags |= float_flag_inexact;
-    }
-    return make_float32(z);
-
-}
-
-
 
 
 /*----------------------------------------------------------------------------
@@ -4140,99 +4187,6 @@ float128 float64_to_float128(float64 a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Rounds the double-precision floating-point value `a' to an integer, and
-| returns the result as a double-precision floating-point value.  The
-| operation is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 float64_round_to_int(float64 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    uint64_t lastBitMask, roundBitsMask;
-    uint64_t z;
-    a = float64_squash_input_denormal(a, status);
-
-    aExp = extractFloat64Exp( a );
-    if ( 0x433 <= aExp ) {
-        if ( ( aExp == 0x7FF ) && extractFloat64Frac( a ) ) {
-            return propagateFloat64NaN(a, a, status);
-        }
-        return a;
-    }
-    if ( aExp < 0x3FF ) {
-        if ( (uint64_t) ( float64_val(a)<<1 ) == 0 ) return a;
-        status->float_exception_flags |= float_flag_inexact;
-        aSign = extractFloat64Sign( a );
-        switch (status->float_rounding_mode) {
-         case float_round_nearest_even:
-            if ( ( aExp == 0x3FE ) && extractFloat64Frac( a ) ) {
-                return packFloat64( aSign, 0x3FF, 0 );
-            }
-            break;
-        case float_round_ties_away:
-            if (aExp == 0x3FE) {
-                return packFloat64(aSign, 0x3ff, 0);
-            }
-            break;
-         case float_round_down:
-            return make_float64(aSign ? LIT64( 0xBFF0000000000000 ) : 0);
-         case float_round_up:
-            return make_float64(
-            aSign ? LIT64( 0x8000000000000000 ) : LIT64( 0x3FF0000000000000 ));
-        }
-        return packFloat64( aSign, 0, 0 );
-    }
-    lastBitMask = 1;
-    lastBitMask <<= 0x433 - aExp;
-    roundBitsMask = lastBitMask - 1;
-    z = float64_val(a);
-    switch (status->float_rounding_mode) {
-    case float_round_nearest_even:
-        z += lastBitMask >> 1;
-        if ((z & roundBitsMask) == 0) {
-            z &= ~lastBitMask;
-        }
-        break;
-    case float_round_ties_away:
-        z += lastBitMask >> 1;
-        break;
-    case float_round_to_zero:
-        break;
-    case float_round_up:
-        if (!extractFloat64Sign(make_float64(z))) {
-            z += roundBitsMask;
-        }
-        break;
-    case float_round_down:
-        if (extractFloat64Sign(make_float64(z))) {
-            z += roundBitsMask;
-        }
-        break;
-    default:
-        abort();
-    }
-    z &= ~ roundBitsMask;
-    if (z != float64_val(a)) {
-        status->float_exception_flags |= float_flag_inexact;
-    }
-    return make_float64(z);
-
-}
-
-float64 float64_trunc_to_int(float64 a, float_status *status)
-{
-    int oldmode;
-    float64 res;
-    oldmode = status->float_rounding_mode;
-    status->float_rounding_mode = float_round_to_zero;
-    res = float64_round_to_int(a, status);
-    status->float_rounding_mode = oldmode;
-    return res;
-}
-
 
 /*----------------------------------------------------------------------------
 | Returns the remainder of the double-precision floating-point value `a'
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 102cf4b1e1..483803ff35 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -346,6 +346,7 @@ float64 float16_to_float64(float16 a, flag ieee, float_status *status);
 | Software half-precision operations.
 *----------------------------------------------------------------------------*/
 
+float16 float16_round_to_int(float16, float_status *status);
 float16 float16_add(float16, float16, float_status *status);
 float16 float16_sub(float16, float16, float_status *status);
 float16 float16_mul(float16, float16, float_status *status);
-- 
2.15.1


* [Qemu-devel] [PATCH v1 15/19] fpu/softfloat: re-factor float to int/uint
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (13 preceding siblings ...)
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 14/19] fpu/softfloat: re-factor round_to_int Alex Bennée
@ 2017-12-11 12:57 ` Alex Bennée
  2017-12-18 22:54   ` Richard Henderson
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 16/19] fpu/softfloat: re-factor int/uint to float Alex Bennée
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:57 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

We share the common int64_pack_decomposed and uint64_pack_decomposed
functions across all the helpers and simply saturate the 64-bit result
to the range of the destination integer size.
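
Roughly, the narrowing step looks like the sketch below. This is only
an illustration, not the code in the patch: the sketch_* names and
SKETCH_FLAG_INVALID are made up for the example and stand in for the
*_pack_decomposed helpers and float_flag_invalid.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder flag bit for this sketch; not QEMU's float_flag_invalid. */
enum { SKETCH_FLAG_INVALID = 1 };

/*
 * Clamp a 64-bit conversion result to [min, max]. If the value had to
 * be clamped, record that by setting the invalid bit in *flags.
 */
static int64_t sketch_saturate(int64_t r, int64_t min, int64_t max,
                               int *flags)
{
    assert(min <= max);
    if (r < min) {
        *flags |= SKETCH_FLAG_INVALID;
        return min;
    }
    if (r > max) {
        *flags |= SKETCH_FLAG_INVALID;
        return max;
    }
    return r;
}

/* Narrow helpers: do the 64-bit work once, then limit to the target size. */
static int16_t sketch_narrow_int16(int64_t r, int *flags)
{
    return (int16_t)sketch_saturate(r, INT16_MIN, INT16_MAX, flags);
}

static int32_t sketch_narrow_int32(int64_t r, int *flags)
{
    return (int32_t)sketch_saturate(r, INT32_MIN, INT32_MAX, flags);
}
```

The 64-bit conversion does the real work once; each narrower helper is
then just a range check, which is why the per-size variants in the diff
are so short.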

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 fpu/softfloat.c         | 1000 ++++++++++-------------------------------------
 include/fpu/softfloat.h |   13 +
 2 files changed, 224 insertions(+), 789 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 9914ecb783..d7858bdae5 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1312,6 +1312,183 @@ float64 float64_trunc_to_int(float64 a, float_status *s)
     return float64_round_pack_canonical(pr, s);
 }
 
+/*----------------------------------------------------------------------------
+| Returns the result of converting the floating-point value
+| `a' to the two's complement integer format.  The conversion is
+| performed according to the IEC/IEEE Standard for Binary Floating-Point
+| Arithmetic---which means in particular that the conversion is rounded
+| according to the current rounding mode.  If `a' is a NaN, the largest
+| positive integer is returned.  Otherwise, if the conversion overflows, the
+| largest integer with the same sign as `a' is returned.
+*----------------------------------------------------------------------------*/
+
+static int64_t int64_pack_decomposed(decomposed_parts p, float_status *s)
+{
+    uint64_t r;
+
+    switch (p.cls) {
+    case float_class_snan:
+    case float_class_qnan:
+        return INT64_MAX;
+    case float_class_inf:
+        return p.sign ? INT64_MIN : INT64_MAX;
+    case float_class_zero:
+        return 0;
+    case float_class_normal:
+        if (p.exp < DECOMPOSED_BINARY_POINT) {
+            r = p.frac >> (DECOMPOSED_BINARY_POINT - p.exp);
+        } else if (p.exp < 64) {
+            r = p.frac << (p.exp - DECOMPOSED_BINARY_POINT);
+        } else {
+            s->float_exception_flags |= float_flag_invalid;
+            r = UINT64_MAX;
+        }
+        if (p.sign) {
+            return r < - (uint64_t) INT64_MIN ? -r : INT64_MIN;
+        } else {
+            return r < INT64_MAX ? r : INT64_MAX;
+        }
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static int16_t int16_pack_decomposed(decomposed_parts p, float_status *s)
+{
+    int64_t r = int64_pack_decomposed(p, s);
+    if (r < INT16_MIN) {
+        s->float_exception_flags |= float_flag_invalid;
+        return INT16_MIN;
+    } else if (r > INT16_MAX) {
+        s->float_exception_flags |= float_flag_invalid;
+        return INT16_MAX;
+    }
+    return r;
+}
+
+static int32_t int32_pack_decomposed(decomposed_parts p, float_status *s)
+{
+    int64_t r = int64_pack_decomposed(p, s);
+    if (r < INT32_MIN) {
+        s->float_exception_flags |= float_flag_invalid;
+        return INT32_MIN;
+    } else if (r > INT32_MAX) {
+        s->float_exception_flags |= float_flag_invalid;
+        return INT32_MAX;
+    }
+    return r;
+}
+
+#define FLOAT_TO_INT(fsz, isz) \
+int ## isz ## _t float ## fsz ## _to_int ## isz(float ## fsz a, float_status *s) \
+{                                                                       \
+    decomposed_parts pa = float ## fsz ## _unpack_canonical(a, s);      \
+    decomposed_parts pr = round_decomposed(pa, s->float_rounding_mode, s); \
+    return int ## isz ## _pack_decomposed(pr, s);                       \
+}                                                                       \
+                                                                        \
+int ## isz ## _t float ## fsz ## _to_int ## isz ## _round_to_zero       \
+ (float ## fsz a, float_status *s)                                      \
+{                                                                       \
+    decomposed_parts pa = float ## fsz ## _unpack_canonical(a, s);      \
+    decomposed_parts pr = round_decomposed(pa, float_round_to_zero, s); \
+    return int ## isz ## _pack_decomposed(pr, s);                       \
+}
+
+FLOAT_TO_INT(16, 16)
+FLOAT_TO_INT(16, 32)
+FLOAT_TO_INT(16, 64)
+
+FLOAT_TO_INT(32, 16)
+FLOAT_TO_INT(32, 32)
+FLOAT_TO_INT(32, 64)
+
+FLOAT_TO_INT(64, 16)
+FLOAT_TO_INT(64, 32)
+FLOAT_TO_INT(64, 64)
+
+#undef FLOAT_TO_INT
+
+/*----------------------------------------------------------------------------
+| Returns the result of converting the floating-point value
+| `a' to the unsigned integer format.  The conversion is
+| performed according to the IEC/IEEE Standard for Binary Floating-Point
+| Arithmetic---which means in particular that the conversion is rounded
+| according to the current rounding mode.  If `a' is a NaN, the largest
+| unsigned integer is returned.  Otherwise, if the conversion overflows, the
+| largest unsigned integer is returned.  If `a' is negative, the result
+| is rounded and zero is returned; values that do not round to zero will
+| raise the inexact exception flag.
+*----------------------------------------------------------------------------*/
+
+static uint64_t uint64_pack_decomposed(decomposed_parts p, float_status *s)
+{
+    switch (p.cls) {
+    case float_class_snan:
+    case float_class_qnan:
+        return UINT64_MAX;
+    case float_class_inf:
+        return p.sign ? 0 : UINT64_MAX;
+    case float_class_zero:
+        return 0;
+    case float_class_normal:
+        if (p.sign) {
+            return 0;
+        }
+        if (p.exp < DECOMPOSED_BINARY_POINT) {
+            return p.frac >> (DECOMPOSED_BINARY_POINT - p.exp);
+        } else if (p.exp < 64) {
+            return p.frac << (p.exp - DECOMPOSED_BINARY_POINT);
+        } else {
+            return UINT64_MAX;
+        }
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static uint16_t uint16_pack_decomposed(decomposed_parts p, float_status *s)
+{
+    uint64_t r = uint64_pack_decomposed(p, s);
+    return r > UINT16_MAX ? UINT16_MAX : r;
+}
+
+static uint32_t uint32_pack_decomposed(decomposed_parts p, float_status *s)
+{
+    uint64_t r = uint64_pack_decomposed(p, s);
+    return r > UINT32_MAX ? UINT32_MAX : r;
+}
+
+#define FLOAT_TO_UINT(fsz, isz) \
+uint ## isz ## _t float ## fsz ## _to_uint ## isz(float ## fsz a, float_status *s) \
+{                                                                       \
+    decomposed_parts pa = float ## fsz ## _unpack_canonical(a, s);      \
+    decomposed_parts pr = round_decomposed(pa, s->float_rounding_mode, s); \
+    return uint ## isz ## _pack_decomposed(pr, s);                      \
+}                                                                       \
+                                                                        \
+uint ## isz ## _t float ## fsz ## _to_uint ## isz ## _round_to_zero     \
+ (float ## fsz a, float_status *s)                                      \
+{                                                                       \
+    decomposed_parts pa = float ## fsz ## _unpack_canonical(a, s);      \
+    decomposed_parts pr = round_decomposed(pa, float_round_to_zero, s); \
+    return uint ## isz ## _pack_decomposed(pr, s);                      \
+}
+
+FLOAT_TO_UINT(16, 16)
+FLOAT_TO_UINT(16, 32)
+FLOAT_TO_UINT(16, 64)
+
+FLOAT_TO_UINT(32, 16)
+FLOAT_TO_UINT(32, 32)
+FLOAT_TO_UINT(32, 64)
+
+FLOAT_TO_UINT(64, 16)
+FLOAT_TO_UINT(64, 32)
+FLOAT_TO_UINT(64, 64)
+
+#undef FLOAT_TO_UINT
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
@@ -2663,288 +2840,8 @@ float128 uint64_to_float128(uint64_t a, float_status *status)
     return normalizeRoundAndPackFloat128(0, 0x406E, a, 0, status);
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the single-precision floating-point value
-| `a' to the 32-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode.  If `a' is a NaN, the largest
-| positive integer is returned.  Otherwise, if the conversion overflows, the
-| largest integer with the same sign as `a' is returned.
-*----------------------------------------------------------------------------*/
-
-int32_t float32_to_int32(float32 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint32_t aSig;
-    uint64_t aSig64;
-
-    a = float32_squash_input_denormal(a, status);
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-    if ( ( aExp == 0xFF ) && aSig ) aSign = 0;
-    if ( aExp ) aSig |= 0x00800000;
-    shiftCount = 0xAF - aExp;
-    aSig64 = aSig;
-    aSig64 <<= 32;
-    if ( 0 < shiftCount ) shift64RightJamming( aSig64, shiftCount, &aSig64 );
-    return roundAndPackInt32(aSign, aSig64, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the single-precision floating-point value
-| `a' to the 32-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic, except that the conversion is always rounded toward zero.
-| If `a' is a NaN, the largest positive integer is returned.  Otherwise, if
-| the conversion overflows, the largest integer with the same sign as `a' is
-| returned.
-*----------------------------------------------------------------------------*/
-
-int32_t float32_to_int32_round_to_zero(float32 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint32_t aSig;
-    int32_t z;
-    a = float32_squash_input_denormal(a, status);
-
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-    shiftCount = aExp - 0x9E;
-    if ( 0 <= shiftCount ) {
-        if ( float32_val(a) != 0xCF000000 ) {
-            float_raise(float_flag_invalid, status);
-            if ( ! aSign || ( ( aExp == 0xFF ) && aSig ) ) return 0x7FFFFFFF;
-        }
-        return (int32_t) 0x80000000;
-    }
-    else if ( aExp <= 0x7E ) {
-        if (aExp | aSig) {
-            status->float_exception_flags |= float_flag_inexact;
-        }
-        return 0;
-    }
-    aSig = ( aSig | 0x00800000 )<<8;
-    z = aSig>>( - shiftCount );
-    if ( (uint32_t) ( aSig<<( shiftCount & 31 ) ) ) {
-        status->float_exception_flags |= float_flag_inexact;
-    }
-    if ( aSign ) z = - z;
-    return z;
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the single-precision floating-point value
-| `a' to the 16-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic, except that the conversion is always rounded toward zero.
-| If `a' is a NaN, the largest positive integer is returned.  Otherwise, if
-| the conversion overflows, the largest integer with the same sign as `a' is
-| returned.
-*----------------------------------------------------------------------------*/
-
-int16_t float32_to_int16_round_to_zero(float32 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint32_t aSig;
-    int32_t z;
-
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-    shiftCount = aExp - 0x8E;
-    if ( 0 <= shiftCount ) {
-        if ( float32_val(a) != 0xC7000000 ) {
-            float_raise(float_flag_invalid, status);
-            if ( ! aSign || ( ( aExp == 0xFF ) && aSig ) ) {
-                return 0x7FFF;
-            }
-        }
-        return (int32_t) 0xffff8000;
-    }
-    else if ( aExp <= 0x7E ) {
-        if ( aExp | aSig ) {
-            status->float_exception_flags |= float_flag_inexact;
-        }
-        return 0;
-    }
-    shiftCount -= 0x10;
-    aSig = ( aSig | 0x00800000 )<<8;
-    z = aSig>>( - shiftCount );
-    if ( (uint32_t) ( aSig<<( shiftCount & 31 ) ) ) {
-        status->float_exception_flags |= float_flag_inexact;
-    }
-    if ( aSign ) {
-        z = - z;
-    }
-    return z;
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the single-precision floating-point value
-| `a' to the 64-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode.  If `a' is a NaN, the largest
-| positive integer is returned.  Otherwise, if the conversion overflows, the
-| largest integer with the same sign as `a' is returned.
-*----------------------------------------------------------------------------*/
-
-int64_t float32_to_int64(float32 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint32_t aSig;
-    uint64_t aSig64, aSigExtra;
-    a = float32_squash_input_denormal(a, status);
-
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-    shiftCount = 0xBE - aExp;
-    if ( shiftCount < 0 ) {
-        float_raise(float_flag_invalid, status);
-        if ( ! aSign || ( ( aExp == 0xFF ) && aSig ) ) {
-            return LIT64( 0x7FFFFFFFFFFFFFFF );
-        }
-        return (int64_t) LIT64( 0x8000000000000000 );
-    }
-    if ( aExp ) aSig |= 0x00800000;
-    aSig64 = aSig;
-    aSig64 <<= 40;
-    shift64ExtraRightJamming( aSig64, 0, shiftCount, &aSig64, &aSigExtra );
-    return roundAndPackInt64(aSign, aSig64, aSigExtra, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the single-precision floating-point value
-| `a' to the 64-bit unsigned integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode.  If `a' is a NaN, the largest
-| unsigned integer is returned.  Otherwise, if the conversion overflows, the
-| largest unsigned integer is returned.  If the 'a' is negative, the result
-| is rounded and zero is returned; values that do not round to zero will
-| raise the inexact exception flag.
-*----------------------------------------------------------------------------*/
-
-uint64_t float32_to_uint64(float32 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint32_t aSig;
-    uint64_t aSig64, aSigExtra;
-    a = float32_squash_input_denormal(a, status);
-
-    aSig = extractFloat32Frac(a);
-    aExp = extractFloat32Exp(a);
-    aSign = extractFloat32Sign(a);
-    if ((aSign) && (aExp > 126)) {
-        float_raise(float_flag_invalid, status);
-        if (float32_is_any_nan(a)) {
-            return LIT64(0xFFFFFFFFFFFFFFFF);
-        } else {
-            return 0;
-        }
-    }
-    shiftCount = 0xBE - aExp;
-    if (aExp) {
-        aSig |= 0x00800000;
-    }
-    if (shiftCount < 0) {
-        float_raise(float_flag_invalid, status);
-        return LIT64(0xFFFFFFFFFFFFFFFF);
-    }
-
-    aSig64 = aSig;
-    aSig64 <<= 40;
-    shift64ExtraRightJamming(aSig64, 0, shiftCount, &aSig64, &aSigExtra);
-    return roundAndPackUint64(aSign, aSig64, aSigExtra, status);
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the single-precision floating-point value
-| `a' to the 64-bit unsigned integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic, except that the conversion is always rounded toward zero.  If
-| `a' is a NaN, the largest unsigned integer is returned.  Otherwise, if the
-| conversion overflows, the largest unsigned integer is returned.  If the
-| 'a' is negative, the result is rounded and zero is returned; values that do
-| not round to zero will raise the inexact flag.
-*----------------------------------------------------------------------------*/
 
-uint64_t float32_to_uint64_round_to_zero(float32 a, float_status *status)
-{
-    signed char current_rounding_mode = status->float_rounding_mode;
-    set_float_rounding_mode(float_round_to_zero, status);
-    int64_t v = float32_to_uint64(a, status);
-    set_float_rounding_mode(current_rounding_mode, status);
-    return v;
-}
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the single-precision floating-point value
-| `a' to the 64-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic, except that the conversion is always rounded toward zero.  If
-| `a' is a NaN, the largest positive integer is returned.  Otherwise, if the
-| conversion overflows, the largest integer with the same sign as `a' is
-| returned.
-*----------------------------------------------------------------------------*/
-
-int64_t float32_to_int64_round_to_zero(float32 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint32_t aSig;
-    uint64_t aSig64;
-    int64_t z;
-    a = float32_squash_input_denormal(a, status);
-
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-    shiftCount = aExp - 0xBE;
-    if ( 0 <= shiftCount ) {
-        if ( float32_val(a) != 0xDF000000 ) {
-            float_raise(float_flag_invalid, status);
-            if ( ! aSign || ( ( aExp == 0xFF ) && aSig ) ) {
-                return LIT64( 0x7FFFFFFFFFFFFFFF );
-            }
-        }
-        return (int64_t) LIT64( 0x8000000000000000 );
-    }
-    else if ( aExp <= 0x7E ) {
-        if (aExp | aSig) {
-            status->float_exception_flags |= float_flag_inexact;
-        }
-        return 0;
-    }
-    aSig64 = aSig | 0x00800000;
-    aSig64 <<= 40;
-    z = aSig64>>( - shiftCount );
-    if ( (uint64_t) ( aSig64<<( shiftCount & 63 ) ) ) {
-        status->float_exception_flags |= float_flag_inexact;
-    }
-    if ( aSign ) z = - z;
-    return z;
-
-}
 
 /*----------------------------------------------------------------------------
 | Returns the result of converting the single-precision floating-point value
@@ -3500,289 +3397,59 @@ int float32_le_quiet(float32 a, float32 b, float_status *status)
 | Returns 1 if the single-precision floating-point value `a' is less than
 | the corresponding value `b', and 0 otherwise.  Quiet NaNs do not cause an
 | exception.  Otherwise, the comparison is performed according to the IEC/IEEE
-| Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-int float32_lt_quiet(float32 a, float32 b, float_status *status)
-{
-    flag aSign, bSign;
-    uint32_t av, bv;
-    a = float32_squash_input_denormal(a, status);
-    b = float32_squash_input_denormal(b, status);
-
-    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )
-         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )
-       ) {
-        if (float32_is_signaling_nan(a, status)
-         || float32_is_signaling_nan(b, status)) {
-            float_raise(float_flag_invalid, status);
-        }
-        return 0;
-    }
-    aSign = extractFloat32Sign( a );
-    bSign = extractFloat32Sign( b );
-    av = float32_val(a);
-    bv = float32_val(b);
-    if ( aSign != bSign ) return aSign && ( (uint32_t) ( ( av | bv )<<1 ) != 0 );
-    return ( av != bv ) && ( aSign ^ ( av < bv ) );
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns 1 if the single-precision floating-point values `a' and `b' cannot
-| be compared, and 0 otherwise.  Quiet NaNs do not cause an exception.  The
-| comparison is performed according to the IEC/IEEE Standard for Binary
-| Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-int float32_unordered_quiet(float32 a, float32 b, float_status *status)
-{
-    a = float32_squash_input_denormal(a, status);
-    b = float32_squash_input_denormal(b, status);
-
-    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )
-         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )
-       ) {
-        if (float32_is_signaling_nan(a, status)
-         || float32_is_signaling_nan(b, status)) {
-            float_raise(float_flag_invalid, status);
-        }
-        return 1;
-    }
-    return 0;
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the double-precision floating-point value
-| `a' to the 32-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode.  If `a' is a NaN, the largest
-| positive integer is returned.  Otherwise, if the conversion overflows, the
-| largest integer with the same sign as `a' is returned.
-*----------------------------------------------------------------------------*/
-
-int32_t float64_to_int32(float64 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint64_t aSig;
-    a = float64_squash_input_denormal(a, status);
-
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-    if ( ( aExp == 0x7FF ) && aSig ) aSign = 0;
-    if ( aExp ) aSig |= LIT64( 0x0010000000000000 );
-    shiftCount = 0x42C - aExp;
-    if ( 0 < shiftCount ) shift64RightJamming( aSig, shiftCount, &aSig );
-    return roundAndPackInt32(aSign, aSig, status);
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the double-precision floating-point value
-| `a' to the 32-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic, except that the conversion is always rounded toward zero.
-| If `a' is a NaN, the largest positive integer is returned.  Otherwise, if
-| the conversion overflows, the largest integer with the same sign as `a' is
-| returned.
-*----------------------------------------------------------------------------*/
-
-int32_t float64_to_int32_round_to_zero(float64 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint64_t aSig, savedASig;
-    int32_t z;
-    a = float64_squash_input_denormal(a, status);
-
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-    if ( 0x41E < aExp ) {
-        if ( ( aExp == 0x7FF ) && aSig ) aSign = 0;
-        goto invalid;
-    }
-    else if ( aExp < 0x3FF ) {
-        if (aExp || aSig) {
-            status->float_exception_flags |= float_flag_inexact;
-        }
-        return 0;
-    }
-    aSig |= LIT64( 0x0010000000000000 );
-    shiftCount = 0x433 - aExp;
-    savedASig = aSig;
-    aSig >>= shiftCount;
-    z = aSig;
-    if ( aSign ) z = - z;
-    if ( ( z < 0 ) ^ aSign ) {
- invalid:
-        float_raise(float_flag_invalid, status);
-        return aSign ? (int32_t) 0x80000000 : 0x7FFFFFFF;
-    }
-    if ( ( aSig<<shiftCount ) != savedASig ) {
-        status->float_exception_flags |= float_flag_inexact;
-    }
-    return z;
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the double-precision floating-point value
-| `a' to the 16-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic, except that the conversion is always rounded toward zero.
-| If `a' is a NaN, the largest positive integer is returned.  Otherwise, if
-| the conversion overflows, the largest integer with the same sign as `a' is
-| returned.
-*----------------------------------------------------------------------------*/
-
-int16_t float64_to_int16_round_to_zero(float64 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint64_t aSig, savedASig;
-    int32_t z;
-
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-    if ( 0x40E < aExp ) {
-        if ( ( aExp == 0x7FF ) && aSig ) {
-            aSign = 0;
-        }
-        goto invalid;
-    }
-    else if ( aExp < 0x3FF ) {
-        if ( aExp || aSig ) {
-            status->float_exception_flags |= float_flag_inexact;
-        }
-        return 0;
-    }
-    aSig |= LIT64( 0x0010000000000000 );
-    shiftCount = 0x433 - aExp;
-    savedASig = aSig;
-    aSig >>= shiftCount;
-    z = aSig;
-    if ( aSign ) {
-        z = - z;
-    }
-    if ( ( (int16_t)z < 0 ) ^ aSign ) {
- invalid:
-        float_raise(float_flag_invalid, status);
-        return aSign ? (int32_t) 0xffff8000 : 0x7FFF;
-    }
-    if ( ( aSig<<shiftCount ) != savedASig ) {
-        status->float_exception_flags |= float_flag_inexact;
-    }
-    return z;
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the double-precision floating-point value
-| `a' to the 64-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode.  If `a' is a NaN, the largest
-| positive integer is returned.  Otherwise, if the conversion overflows, the
-| largest integer with the same sign as `a' is returned.
+| Standard for Binary Floating-Point Arithmetic.
 *----------------------------------------------------------------------------*/
 
-int64_t float64_to_int64(float64 a, float_status *status)
+int float32_lt_quiet(float32 a, float32 b, float_status *status)
 {
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint64_t aSig, aSigExtra;
-    a = float64_squash_input_denormal(a, status);
+    flag aSign, bSign;
+    uint32_t av, bv;
+    a = float32_squash_input_denormal(a, status);
+    b = float32_squash_input_denormal(b, status);
 
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-    if ( aExp ) aSig |= LIT64( 0x0010000000000000 );
-    shiftCount = 0x433 - aExp;
-    if ( shiftCount <= 0 ) {
-        if ( 0x43E < aExp ) {
+    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )
+         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )
+       ) {
+        if (float32_is_signaling_nan(a, status)
+         || float32_is_signaling_nan(b, status)) {
             float_raise(float_flag_invalid, status);
-            if (    ! aSign
-                 || (    ( aExp == 0x7FF )
-                      && ( aSig != LIT64( 0x0010000000000000 ) ) )
-               ) {
-                return LIT64( 0x7FFFFFFFFFFFFFFF );
-            }
-            return (int64_t) LIT64( 0x8000000000000000 );
         }
-        aSigExtra = 0;
-        aSig <<= - shiftCount;
-    }
-    else {
-        shift64ExtraRightJamming( aSig, 0, shiftCount, &aSig, &aSigExtra );
+        return 0;
     }
-    return roundAndPackInt64(aSign, aSig, aSigExtra, status);
+    aSign = extractFloat32Sign( a );
+    bSign = extractFloat32Sign( b );
+    av = float32_val(a);
+    bv = float32_val(b);
+    if ( aSign != bSign ) return aSign && ( (uint32_t) ( ( av | bv )<<1 ) != 0 );
+    return ( av != bv ) && ( aSign ^ ( av < bv ) );
 
 }
 
 /*----------------------------------------------------------------------------
-| Returns the result of converting the double-precision floating-point value
-| `a' to the 64-bit two's complement integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic, except that the conversion is always rounded toward zero.
-| If `a' is a NaN, the largest positive integer is returned.  Otherwise, if
-| the conversion overflows, the largest integer with the same sign as `a' is
-| returned.
+| Returns 1 if the single-precision floating-point values `a' and `b' cannot
+| be compared, and 0 otherwise.  Quiet NaNs do not cause an exception.  The
+| comparison is performed according to the IEC/IEEE Standard for Binary
+| Floating-Point Arithmetic.
 *----------------------------------------------------------------------------*/
 
-int64_t float64_to_int64_round_to_zero(float64 a, float_status *status)
+int float32_unordered_quiet(float32 a, float32 b, float_status *status)
 {
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint64_t aSig;
-    int64_t z;
-    a = float64_squash_input_denormal(a, status);
+    a = float32_squash_input_denormal(a, status);
+    b = float32_squash_input_denormal(b, status);
 
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-    if ( aExp ) aSig |= LIT64( 0x0010000000000000 );
-    shiftCount = aExp - 0x433;
-    if ( 0 <= shiftCount ) {
-        if ( 0x43E <= aExp ) {
-            if ( float64_val(a) != LIT64( 0xC3E0000000000000 ) ) {
-                float_raise(float_flag_invalid, status);
-                if (    ! aSign
-                     || (    ( aExp == 0x7FF )
-                          && ( aSig != LIT64( 0x0010000000000000 ) ) )
-                   ) {
-                    return LIT64( 0x7FFFFFFFFFFFFFFF );
-                }
-            }
-            return (int64_t) LIT64( 0x8000000000000000 );
-        }
-        z = aSig<<shiftCount;
-    }
-    else {
-        if ( aExp < 0x3FE ) {
-            if (aExp | aSig) {
-                status->float_exception_flags |= float_flag_inexact;
-            }
-            return 0;
-        }
-        z = aSig>>( - shiftCount );
-        if ( (uint64_t) ( aSig<<( shiftCount & 63 ) ) ) {
-            status->float_exception_flags |= float_flag_inexact;
+    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )
+         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )
+       ) {
+        if (float32_is_signaling_nan(a, status)
+         || float32_is_signaling_nan(b, status)) {
+            float_raise(float_flag_invalid, status);
         }
+        return 1;
     }
-    if ( aSign ) z = - z;
-    return z;
-
+    return 0;
 }
 
+
 /*----------------------------------------------------------------------------
 | Returns the result of converting the double-precision floating-point value
 | `a' to the single-precision floating-point format.  The conversion is
@@ -7049,252 +6716,7 @@ float64 uint32_to_float64(uint32_t a, float_status *status)
     return int64_to_float64(a, status);
 }
 
-uint32_t float32_to_uint32(float32 a, float_status *status)
-{
-    int64_t v;
-    uint32_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float32_to_int64(a, status);
-    if (v < 0) {
-        res = 0;
-    } else if (v > 0xffffffff) {
-        res = 0xffffffff;
-    } else {
-        return v;
-    }
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-uint32_t float32_to_uint32_round_to_zero(float32 a, float_status *status)
-{
-    int64_t v;
-    uint32_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float32_to_int64_round_to_zero(a, status);
-    if (v < 0) {
-        res = 0;
-    } else if (v > 0xffffffff) {
-        res = 0xffffffff;
-    } else {
-        return v;
-    }
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-int16_t float32_to_int16(float32 a, float_status *status)
-{
-    int32_t v;
-    int16_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float32_to_int32(a, status);
-    if (v < -0x8000) {
-        res = -0x8000;
-    } else if (v > 0x7fff) {
-        res = 0x7fff;
-    } else {
-        return v;
-    }
-
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-uint16_t float32_to_uint16(float32 a, float_status *status)
-{
-    int32_t v;
-    uint16_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float32_to_int32(a, status);
-    if (v < 0) {
-        res = 0;
-    } else if (v > 0xffff) {
-        res = 0xffff;
-    } else {
-        return v;
-    }
-
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-uint16_t float32_to_uint16_round_to_zero(float32 a, float_status *status)
-{
-    int64_t v;
-    uint16_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float32_to_int64_round_to_zero(a, status);
-    if (v < 0) {
-        res = 0;
-    } else if (v > 0xffff) {
-        res = 0xffff;
-    } else {
-        return v;
-    }
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-uint32_t float64_to_uint32(float64 a, float_status *status)
-{
-    uint64_t v;
-    uint32_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float64_to_uint64(a, status);
-    if (v > 0xffffffff) {
-        res = 0xffffffff;
-    } else {
-        return v;
-    }
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-uint32_t float64_to_uint32_round_to_zero(float64 a, float_status *status)
-{
-    uint64_t v;
-    uint32_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float64_to_uint64_round_to_zero(a, status);
-    if (v > 0xffffffff) {
-        res = 0xffffffff;
-    } else {
-        return v;
-    }
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-int16_t float64_to_int16(float64 a, float_status *status)
-{
-    int64_t v;
-    int16_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float64_to_int32(a, status);
-    if (v < -0x8000) {
-        res = -0x8000;
-    } else if (v > 0x7fff) {
-        res = 0x7fff;
-    } else {
-        return v;
-    }
-
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-uint16_t float64_to_uint16(float64 a, float_status *status)
-{
-    int64_t v;
-    uint16_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float64_to_int32(a, status);
-    if (v < 0) {
-        res = 0;
-    } else if (v > 0xffff) {
-        res = 0xffff;
-    } else {
-        return v;
-    }
-
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-uint16_t float64_to_uint16_round_to_zero(float64 a, float_status *status)
-{
-    int64_t v;
-    uint16_t res;
-    int old_exc_flags = get_float_exception_flags(status);
-
-    v = float64_to_int64_round_to_zero(a, status);
-    if (v < 0) {
-        res = 0;
-    } else if (v > 0xffff) {
-        res = 0xffff;
-    } else {
-        return v;
-    }
-    set_float_exception_flags(old_exc_flags, status);
-    float_raise(float_flag_invalid, status);
-    return res;
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the double-precision floating-point value
-| `a' to the 64-bit unsigned integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode.  If `a' is a NaN, the largest
-| positive integer is returned.  If the conversion overflows, the
-| largest unsigned integer is returned.  If 'a' is negative, the value is
-| rounded and zero is returned; negative values that do not round to zero
-| will raise the inexact exception.
-*----------------------------------------------------------------------------*/
-
-uint64_t float64_to_uint64(float64 a, float_status *status)
-{
-    flag aSign;
-    int aExp;
-    int shiftCount;
-    uint64_t aSig, aSigExtra;
-    a = float64_squash_input_denormal(a, status);
-
-    aSig = extractFloat64Frac(a);
-    aExp = extractFloat64Exp(a);
-    aSign = extractFloat64Sign(a);
-    if (aSign && (aExp > 1022)) {
-        float_raise(float_flag_invalid, status);
-        if (float64_is_any_nan(a)) {
-            return LIT64(0xFFFFFFFFFFFFFFFF);
-        } else {
-            return 0;
-        }
-    }
-    if (aExp) {
-        aSig |= LIT64(0x0010000000000000);
-    }
-    shiftCount = 0x433 - aExp;
-    if (shiftCount <= 0) {
-        if (0x43E < aExp) {
-            float_raise(float_flag_invalid, status);
-            return LIT64(0xFFFFFFFFFFFFFFFF);
-        }
-        aSigExtra = 0;
-        aSig <<= -shiftCount;
-    } else {
-        shift64ExtraRightJamming(aSig, 0, shiftCount, &aSig, &aSigExtra);
-    }
-    return roundAndPackUint64(aSign, aSig, aSigExtra, status);
-}
 
-uint64_t float64_to_uint64_round_to_zero(float64 a, float_status *status)
-{
-    signed char current_rounding_mode = status->float_rounding_mode;
-    set_float_rounding_mode(float_round_to_zero, status);
-    uint64_t v = float64_to_uint64(a, status);
-    set_float_rounding_mode(current_rounding_mode, status);
-    return v;
-}
 
 #define COMPARE(s, nan_exp)                                                  \
 static inline int float ## s ## _compare_internal(float ## s a, float ## s b,\
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 483803ff35..860f480af8 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -341,6 +341,19 @@ float16 float32_to_float16(float32, flag, float_status *status);
 float32 float16_to_float32(float16, flag, float_status *status);
 float16 float64_to_float16(float64 a, flag ieee, float_status *status);
 float64 float16_to_float64(float16 a, flag ieee, float_status *status);
+int16_t float16_to_int16(float16, float_status *status);
+uint16_t float16_to_uint16(float16 a, float_status *status);
+int16_t float16_to_int16_round_to_zero(float16, float_status *status);
+uint16_t float16_to_uint16_round_to_zero(float16 a, float_status *status);
+int32_t float16_to_int32(float16, float_status *status);
+uint32_t float16_to_uint32(float16 a, float_status *status);
+int32_t float16_to_int32_round_to_zero(float16, float_status *status);
+uint32_t float16_to_uint32_round_to_zero(float16 a, float_status *status);
+int64_t float16_to_int64(float16, float_status *status);
+uint64_t float16_to_uint64(float16 a, float_status *status);
+int64_t float16_to_int64_round_to_zero(float16, float_status *status);
+uint64_t float16_to_uint64_round_to_zero(float16 a, float_status *status);
+float16 int16_to_float16(int16_t a, float_status *status);
 
 /*----------------------------------------------------------------------------
 | Software half-precision operations.
-- 
2.15.1


* [Qemu-devel] [PATCH v1 16/19] fpu/softfloat: re-factor int/uint to float
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (14 preceding siblings ...)
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 15/19] fpu/softfloat: re-factor float to int/uint Alex Bennée
@ 2017-12-11 12:57 ` Alex Bennée
  2017-12-12 17:21   ` Alex Bennée
  2017-12-18 22:59   ` Richard Henderson
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 17/19] fpu/softfloat: re-factor scalbn Alex Bennée
                   ` (3 subsequent siblings)
  19 siblings, 2 replies; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:57 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

These are considerably simpler as the lower order integers can just
use the higher order conversion function. As the decomposed fractional
part is a full 64 bits wide, rounding and inexact handling come from
the pack functions.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
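Not part of the patch: a minimal standalone sketch of the normalisation
that uint_to_float performs, for anyone reading along without the
earlier patches applied. The names sketch_parts, sketch_clz64 and
SKETCH_BINARY_POINT are hypothetical stand-ins for decomposed_parts,
clz64 and DECOMPOSED_BINARY_POINT. The value 62 is an assumption
inferred from the shifts in the code below (the top set bit ends up at
bit 62), and the one-bit jamming shift is written out by hand rather
than calling shift64RightJamming.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the binary point in the 64-bit fraction. */
#define SKETCH_BINARY_POINT 62

/* Hypothetical, cut-down stand-in for decomposed_parts. */
struct sketch_parts {
    uint64_t frac;
    int exp;
    bool sign;
    bool is_zero;
};

/* Portable count-leading-zeros; the caller guarantees v != 0. */
static int sketch_clz64(uint64_t v)
{
    int n = 0;
    while (!(v & (UINT64_C(1) << 63))) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Normalise an unsigned integer so its top set bit sits at bit 62. */
static struct sketch_parts sketch_uint_to_parts(uint64_t a)
{
    /* Every field is initialised, so zero gets a defined sign too. */
    struct sketch_parts r = { .frac = 0, .exp = 0,
                              .sign = false, .is_zero = false };
    int spare_bits;

    if (a == 0) {
        r.is_zero = true;
        return r;
    }
    spare_bits = sketch_clz64(a) - 1;
    r.exp = SKETCH_BINARY_POINT - spare_bits;
    if (spare_bits < 0) {
        /* Bit 63 is set: shift right by one and fold the lost bit
         * into the LSB so later rounding can still see it. */
        r.frac = (a >> 1) | (a & 1);
    } else {
        r.frac = a << spare_bits;
    }
    return r;
}
```

Under these assumptions 1 becomes frac = 1 << 62 with exp = 0, and
UINT64_MAX lands on exp = 63 with the dropped bit kept as a sticky bit,
which is what lets the pack functions decide on rounding and inexact.
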
 fpu/softfloat.c         | 358 +++++++++++++++++++++++++-----------------------
 include/fpu/softfloat.h |  30 ++--
 2 files changed, 195 insertions(+), 193 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index d7858bdae5..1a7f1cab10 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1409,17 +1409,18 @@ FLOAT_TO_INT(64, 64)
 
 #undef FLOAT_TO_INT
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the  floating-point value
-| `a' to the unsigned integer format.  The conversion is
-| performed according to the IEC/IEEE Standard for Binary Floating-Point
-| Arithmetic---which means in particular that the conversion is rounded
-| according to the current rounding mode.  If `a' is a NaN, the largest
-| unsigned integer is returned.  Otherwise, if the conversion overflows, the
-| largest unsigned integer is returned.  If the 'a' is negative, the result
-| is rounded and zero is returned; values that do not round to zero will
-| raise the inexact exception flag.
-*----------------------------------------------------------------------------*/
+/*
+ *  Returns the result of converting the floating-point value `a' to
+ *  the unsigned integer format. The conversion is performed according
+ *  to the IEC/IEEE Standard for Binary Floating-Point
+ *  Arithmetic---which means in particular that the conversion is
+ *  rounded according to the current rounding mode. If `a' is a NaN,
+ *  the largest unsigned integer is returned. Otherwise, if the
+ *  conversion overflows, the largest unsigned integer is returned. If
+ *  `a' is negative, the result is rounded and zero is returned;
+ *  values that do not round to zero will raise the inexact exception
+ *  flag.
+ */
 
 static uint64_t uint64_pack_decomposed(decomposed_parts p, float_status *s)
 {
@@ -1433,6 +1434,7 @@ static uint64_t uint64_pack_decomposed(decomposed_parts p, float_status *s)
         return 0;
     case float_class_normal:
         if (p.sign) {
+            s->float_exception_flags |= float_flag_invalid;
             return 0;
         }
         if (p.exp < DECOMPOSED_BINARY_POINT) {
@@ -1440,6 +1442,7 @@ static uint64_t uint64_pack_decomposed(decomposed_parts p, float_status *s)
         } else if (p.exp < 64) {
             return p.frac << (p.exp - DECOMPOSED_BINARY_POINT);
         } else {
+            s->float_exception_flags |= float_flag_invalid;
             return UINT64_MAX;
         }
     default:
@@ -1450,13 +1453,21 @@ static uint64_t uint64_pack_decomposed(decomposed_parts p, float_status *s)
 static uint16_t uint16_pack_decomposed(decomposed_parts p, float_status *s)
 {
     uint64_t r = uint64_pack_decomposed(p, s);
-    return r > UINT16_MAX ? UINT16_MAX : r;
+    if (r > UINT16_MAX) {
+        s->float_exception_flags |= float_flag_invalid;
+        r = UINT16_MAX;
+    }
+    return r;
 }
 
 static uint32_t uint32_pack_decomposed(decomposed_parts p, float_status *s)
 {
     uint64_t r = uint64_pack_decomposed(p, s);
-    return r > UINT32_MAX ? UINT32_MAX : r;
+    if (r > UINT32_MAX) {
+        s->float_exception_flags |= float_flag_invalid;
+        r = UINT32_MAX;
+    }
+    return r;
 }
 
 #define FLOAT_TO_UINT(fsz, isz) \
@@ -1489,6 +1500,168 @@ FLOAT_TO_UINT(64, 64)
 
 #undef FLOAT_TO_UINT
 
+/*
+ * Integer to float conversions
+ *
+ * Returns the result of converting the two's complement integer `a'
+ * to the floating-point format. The conversion is performed according
+ * to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+ */
+
+static decomposed_parts int_to_float(int64_t a, float_status *status)
+{
+    decomposed_parts r = { .sign = false }; /* zero needs a defined sign */
+    if (a == 0) {
+        r.cls = float_class_zero;
+    } else if (a == (1ULL << 63)) {
+        r.cls = float_class_normal;
+        r.sign = true;
+        r.frac = DECOMPOSED_IMPLICIT_BIT;
+        r.exp = 63;
+    } else {
+        uint64_t f;
+        if (a < 0) {
+            f = -a;
+            r.sign = true;
+        } else {
+            f = a;
+            r.sign = false;
+        }
+        int shift = clz64(f) - 1;
+        r.cls = float_class_normal;
+        r.exp = (DECOMPOSED_BINARY_POINT - shift);
+        r.frac = f << shift;
+    }
+
+    return r;
+}
+
+float16 int64_to_float16(int64_t a, float_status *status)
+{
+    decomposed_parts pa = int_to_float(a, status);
+    return float16_round_pack_canonical(pa, status);
+}
+
+float16 int32_to_float16(int32_t a, float_status *status)
+{
+    return int64_to_float16((int64_t) a, status);
+}
+
+float16 int16_to_float16(int16_t a, float_status *status)
+{
+    return int64_to_float16((int64_t) a, status);
+}
+
+float32 int64_to_float32(int64_t a, float_status *status)
+{
+    decomposed_parts pa = int_to_float(a, status);
+    return float32_round_pack_canonical(pa, status);
+}
+
+float32 int32_to_float32(int32_t a, float_status *status)
+{
+    return int64_to_float32((int64_t) a, status);
+}
+
+float32 int16_to_float32(int16_t a, float_status *status)
+{
+    return int64_to_float32((int64_t) a, status);
+}
+
+float64 int64_to_float64(int64_t a, float_status *status)
+{
+    decomposed_parts pa = int_to_float(a, status);
+    return float64_round_pack_canonical(pa, status);
+}
+
+float64 int32_to_float64(int32_t a, float_status *status)
+{
+    return int64_to_float64((int64_t) a, status);
+}
+
+float64 int16_to_float64(int16_t a, float_status *status)
+{
+    return int64_to_float64((int64_t) a, status);
+}
+
+
+/*
+ * Unsigned Integer to float conversions
+ *
+ * Returns the result of converting the unsigned integer `a' to the
+ * floating-point format. The conversion is performed according to the
+ * IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+ */
+
+static decomposed_parts uint_to_float(uint64_t a, float_status *status)
+{
+    decomposed_parts r = { .sign = false }; /* zero needs a defined sign */
+    if (a == 0) {
+        r.cls = float_class_zero;
+    } else {
+        int spare_bits = clz64(a) - 1;
+        r.sign = false;
+        r.cls = float_class_normal;
+        r.exp = DECOMPOSED_BINARY_POINT - spare_bits;
+        if (spare_bits < 0) {
+            shift64RightJamming(a, -spare_bits, &a);
+            r.frac = a;
+        } else {
+            r.frac = a << spare_bits;
+        }
+    }
+
+    return r;
+}
+
+float16 uint64_to_float16(uint64_t a, float_status *status)
+{
+    decomposed_parts pa = uint_to_float(a, status);
+    return float16_round_pack_canonical(pa, status);
+}
+
+float16 uint32_to_float16(uint32_t a, float_status *status)
+{
+    return uint64_to_float16((uint64_t) a, status);
+}
+
+float16 uint16_to_float16(uint16_t a, float_status *status)
+{
+    return uint64_to_float16((uint64_t) a, status);
+}
+
+float32 uint64_to_float32(uint64_t a, float_status *status)
+{
+    decomposed_parts pa = uint_to_float(a, status);
+    return float32_round_pack_canonical(pa, status);
+}
+
+float32 uint32_to_float32(uint32_t a, float_status *status)
+{
+    return uint64_to_float32((uint64_t) a, status);
+}
+
+float32 uint16_to_float32(uint16_t a, float_status *status)
+{
+    return uint64_to_float32((uint64_t) a, status);
+}
+
+float64 uint64_to_float64(uint64_t a, float_status *status)
+{
+    decomposed_parts pa = uint_to_float(a, status);
+    return float64_round_pack_canonical(pa, status);
+}
+
+float64 uint32_to_float64(uint32_t a, float_status *status)
+{
+    return uint64_to_float64((uint64_t) a, status);
+}
+
+float64 uint16_to_float64(uint16_t a, float_status *status)
+{
+    return uint64_to_float64((uint64_t) a, status);
+}
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
@@ -2580,43 +2753,6 @@ static float128 normalizeRoundAndPackFloat128(flag zSign, int32_t zExp,
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the 32-bit two's complement integer `a'
-| to the single-precision floating-point format.  The conversion is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 int32_to_float32(int32_t a, float_status *status)
-{
-    flag zSign;
-
-    if ( a == 0 ) return float32_zero;
-    if ( a == (int32_t) 0x80000000 ) return packFloat32( 1, 0x9E, 0 );
-    zSign = ( a < 0 );
-    return normalizeRoundAndPackFloat32(zSign, 0x9C, zSign ? -a : a, status);
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the 32-bit two's complement integer `a'
-| to the double-precision floating-point format.  The conversion is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 int32_to_float64(int32_t a, float_status *status)
-{
-    flag zSign;
-    uint32_t absA;
-    int8_t shiftCount;
-    uint64_t zSig;
-
-    if ( a == 0 ) return float64_zero;
-    zSign = ( a < 0 );
-    absA = zSign ? - a : a;
-    shiftCount = countLeadingZeros32( absA ) + 21;
-    zSig = absA;
-    return packFloat64( zSign, 0x432 - shiftCount, zSig<<shiftCount );
-
-}
 
 /*----------------------------------------------------------------------------
 | Returns the result of converting the 32-bit two's complement integer `a'
@@ -2663,56 +2799,6 @@ float128 int32_to_float128(int32_t a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the 64-bit two's complement integer `a'
-| to the single-precision floating-point format.  The conversion is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 int64_to_float32(int64_t a, float_status *status)
-{
-    flag zSign;
-    uint64_t absA;
-    int8_t shiftCount;
-
-    if ( a == 0 ) return float32_zero;
-    zSign = ( a < 0 );
-    absA = zSign ? - a : a;
-    shiftCount = countLeadingZeros64( absA ) - 40;
-    if ( 0 <= shiftCount ) {
-        return packFloat32( zSign, 0x95 - shiftCount, absA<<shiftCount );
-    }
-    else {
-        shiftCount += 7;
-        if ( shiftCount < 0 ) {
-            shift64RightJamming( absA, - shiftCount, &absA );
-        }
-        else {
-            absA <<= shiftCount;
-        }
-        return roundAndPackFloat32(zSign, 0x9C - shiftCount, absA, status);
-    }
-
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the 64-bit two's complement integer `a'
-| to the double-precision floating-point format.  The conversion is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 int64_to_float64(int64_t a, float_status *status)
-{
-    flag zSign;
-
-    if ( a == 0 ) return float64_zero;
-    if ( a == (int64_t) LIT64( 0x8000000000000000 ) ) {
-        return packFloat64( 1, 0x43E, 0 );
-    }
-    zSign = ( a < 0 );
-    return normalizeRoundAndPackFloat64(zSign, 0x43C, zSign ? -a : a, status);
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of converting the 64-bit two's complement integer `a'
 | to the extended double-precision floating-point format.  The conversion
@@ -2767,65 +2853,6 @@ float128 int64_to_float128(int64_t a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of converting the 64-bit unsigned integer `a'
-| to the single-precision floating-point format.  The conversion is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 uint64_to_float32(uint64_t a, float_status *status)
-{
-    int shiftcount;
-
-    if (a == 0) {
-        return float32_zero;
-    }
-
-    /* Determine (left) shift needed to put first set bit into bit posn 23
-     * (since packFloat32() expects the binary point between bits 23 and 22);
-     * this is the fast case for smallish numbers.
-     */
-    shiftcount = countLeadingZeros64(a) - 40;
-    if (shiftcount >= 0) {
-        return packFloat32(0, 0x95 - shiftcount, a << shiftcount);
-    }
-    /* Otherwise we need to do a round-and-pack. roundAndPackFloat32()
-     * expects the binary point between bits 30 and 29, hence the + 7.
-     */
-    shiftcount += 7;
-    if (shiftcount < 0) {
-        shift64RightJamming(a, -shiftcount, &a);
-    } else {
-        a <<= shiftcount;
-    }
-
-    return roundAndPackFloat32(0, 0x9c - shiftcount, a, status);
-}
-
-/*----------------------------------------------------------------------------
-| Returns the result of converting the 64-bit unsigned integer `a'
-| to the double-precision floating-point format.  The conversion is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 uint64_to_float64(uint64_t a, float_status *status)
-{
-    int exp = 0x43C;
-    int shiftcount;
-
-    if (a == 0) {
-        return float64_zero;
-    }
-
-    shiftcount = countLeadingZeros64(a) - 1;
-    if (shiftcount < 0) {
-        shift64RightJamming(a, -shiftcount, &a);
-    } else {
-        a <<= shiftcount;
-    }
-    return roundAndPackFloat64(0, exp - shiftcount, a, status);
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of converting the 64-bit unsigned integer `a'
 | to the quadruple-precision floating-point format.  The conversion is performed
@@ -6705,19 +6732,6 @@ int float128_unordered_quiet(float128 a, float128 b, float_status *status)
     return 0;
 }
 
-/* misc functions */
-float32 uint32_to_float32(uint32_t a, float_status *status)
-{
-    return int64_to_float32(a, status);
-}
-
-float64 uint32_to_float64(uint32_t a, float_status *status)
-{
-    return int64_to_float64(a, status);
-}
-
-
-
 #define COMPARE(s, nan_exp)                                                  \
 static inline int float ## s ## _compare_internal(float ## s a, float ## s b,\
                                       int is_quiet, float_status *status)    \
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 860f480af8..8ebde83251 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -299,9 +299,13 @@ enum {
 /*----------------------------------------------------------------------------
 | Software IEC/IEEE integer-to-floating-point conversion routines.
 *----------------------------------------------------------------------------*/
+float32 int16_to_float32(int16_t, float_status *status);
 float32 int32_to_float32(int32_t, float_status *status);
+float64 int16_to_float64(int16_t, float_status *status);
 float64 int32_to_float64(int32_t, float_status *status);
+float32 uint16_to_float32(uint16_t, float_status *status);
 float32 uint32_to_float32(uint32_t, float_status *status);
+float64 uint16_to_float64(uint16_t, float_status *status);
 float64 uint32_to_float64(uint32_t, float_status *status);
 floatx80 int32_to_floatx80(int32_t, float_status *status);
 float128 int32_to_float128(int32_t, float_status *status);
@@ -313,27 +317,6 @@ float32 uint64_to_float32(uint64_t, float_status *status);
 float64 uint64_to_float64(uint64_t, float_status *status);
 float128 uint64_to_float128(uint64_t, float_status *status);
 
-/* We provide the int16 versions for symmetry of API with float-to-int */
-static inline float32 int16_to_float32(int16_t v, float_status *status)
-{
-    return int32_to_float32(v, status);
-}
-
-static inline float32 uint16_to_float32(uint16_t v, float_status *status)
-{
-    return uint32_to_float32(v, status);
-}
-
-static inline float64 int16_to_float64(int16_t v, float_status *status)
-{
-    return int32_to_float64(v, status);
-}
-
-static inline float64 uint16_to_float64(uint16_t v, float_status *status)
-{
-    return uint32_to_float64(v, status);
-}
-
 /*----------------------------------------------------------------------------
 | Software half-precision conversion routines.
 *----------------------------------------------------------------------------*/
@@ -354,6 +337,11 @@ uint64_t float16_to_uint64(float16 a, float_status *status);
 int64_t float16_to_int64_round_to_zero(float16, float_status *status);
 uint64_t float16_to_uint64_round_to_zero(float16 a, float_status *status);
 float16 int16_to_float16(int16_t a, float_status *status);
+float16 int32_to_float16(int32_t a, float_status *status);
+float16 int64_to_float16(int64_t a, float_status *status);
+float16 uint16_to_float16(uint16_t a, float_status *status);
+float16 uint32_to_float16(uint32_t a, float_status *status);
+float16 uint64_to_float16(uint64_t a, float_status *status);
 
 /*----------------------------------------------------------------------------
 | Software half-precision operations.
-- 
2.15.1


* [Qemu-devel] [PATCH v1 17/19] fpu/softfloat: re-factor scalbn
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (15 preceding siblings ...)
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 16/19] fpu/softfloat: re-factor int/uint to float Alex Bennée
@ 2017-12-11 12:57 ` Alex Bennée
  2017-12-18 23:00   ` Richard Henderson
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 18/19] fpu/softfloat: re-factor minmax Alex Bennée
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:57 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

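This is one of the simpler manipulations you can make to a floating
point number: for normal values it only adjusts the exponent of the
decomposed parts and leaves rounding to the pack functions.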

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
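As a rough illustration of why the common helper is so short: once a
value is unpacked into sign, exponent and fraction, scaling by 2^n only
touches the exponent of normal numbers. The sketch below uses a
simplified, hypothetical `demo_parts` type (not the `decomposed_parts`
structure from this series). It also clamps `n`, as the removed
float32/float64 code did, so the addition cannot overflow; the bound
chosen here is an arbitrary assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an unpacked float. */
typedef enum { DEMO_ZERO, DEMO_NORMAL, DEMO_INF, DEMO_NAN } demo_class;

typedef struct {
    uint64_t frac;   /* fraction bits, untouched by scaling */
    int exp;         /* unbiased exponent */
    demo_class cls;
    bool sign;
} demo_parts;

/* Assumed bound: comfortably larger than any float64 exponent range. */
#define DEMO_SCALBN_LIMIT 0x10000

/* Small constructor so callers do not need compound literals. */
static demo_parts demo_make(demo_class cls, bool sign, int exp, uint64_t frac)
{
    demo_parts p = { .frac = frac, .exp = exp, .cls = cls, .sign = sign };
    return p;
}

/* Multiply A by 2 raised to the power N. */
static demo_parts demo_scalbn(demo_parts a, int n)
{
    if (a.cls != DEMO_NORMAL) {
        return a;   /* zero, infinity and NaN are unchanged by scaling */
    }
    if (n > DEMO_SCALBN_LIMIT) {
        n = DEMO_SCALBN_LIMIT;
    } else if (n < -DEMO_SCALBN_LIMIT) {
        n = -DEMO_SCALBN_LIMIT;
    }
    a.exp += n;     /* rounding and overflow are left to the repacking step */
    return a;
}
```

For example, `demo_scalbn(demo_make(DEMO_NORMAL, false, 3, 1), 4)` yields
a value whose `exp` field is 7, while zeros, infinities and NaNs come
back unchanged.
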
 fpu/softfloat.c         | 104 +++++++++++++++---------------------------------
 include/fpu/softfloat.h |   1 +
 2 files changed, 32 insertions(+), 73 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 1a7f1cab10..b7ea56dfa5 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1662,6 +1662,37 @@ float64 uint16_to_float64(uint16_t a, float_status *status)
     return uint64_to_float64((uint64_t) a, status);
 }
 
+/* Multiply A by 2 raised to the power N.  */
+static decomposed_parts scalbn_decomposed(decomposed_parts a, int n,
+                                          float_status *s)
+{
+    if (a.cls == float_class_normal) {
+        a.exp += n;
+    }
+    return a;
+}
+
+float16 float16_scalbn(float16 a, int n, float_status *status)
+{
+    decomposed_parts pa = float16_unpack_canonical(a, status);
+    decomposed_parts pr = scalbn_decomposed(pa, n, status);
+    return float16_round_pack_canonical(pr, status);
+}
+
+float32 float32_scalbn(float32 a, int n, float_status *status)
+{
+    decomposed_parts pa = float32_unpack_canonical(a, status);
+    decomposed_parts pr = scalbn_decomposed(pa, n, status);
+    return float32_round_pack_canonical(pr, status);
+}
+
+float64 float64_scalbn(float64 a, int n, float_status *status)
+{
+    decomposed_parts pa = float64_unpack_canonical(a, status);
+    decomposed_parts pr = scalbn_decomposed(pa, n, status);
+    return float64_round_pack_canonical(pr, status);
+}
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
@@ -6991,79 +7022,6 @@ MINMAX(32)
 MINMAX(64)
 
 
-/* Multiply A by 2 raised to the power N.  */
-float32 float32_scalbn(float32 a, int n, float_status *status)
-{
-    flag aSign;
-    int16_t aExp;
-    uint32_t aSig;
-
-    a = float32_squash_input_denormal(a, status);
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-
-    if ( aExp == 0xFF ) {
-        if ( aSig ) {
-            return propagateFloat32NaN(a, a, status);
-        }
-        return a;
-    }
-    if (aExp != 0) {
-        aSig |= 0x00800000;
-    } else if (aSig == 0) {
-        return a;
-    } else {
-        aExp++;
-    }
-
-    if (n > 0x200) {
-        n = 0x200;
-    } else if (n < -0x200) {
-        n = -0x200;
-    }
-
-    aExp += n - 1;
-    aSig <<= 7;
-    return normalizeRoundAndPackFloat32(aSign, aExp, aSig, status);
-}
-
-float64 float64_scalbn(float64 a, int n, float_status *status)
-{
-    flag aSign;
-    int16_t aExp;
-    uint64_t aSig;
-
-    a = float64_squash_input_denormal(a, status);
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-
-    if ( aExp == 0x7FF ) {
-        if ( aSig ) {
-            return propagateFloat64NaN(a, a, status);
-        }
-        return a;
-    }
-    if (aExp != 0) {
-        aSig |= LIT64( 0x0010000000000000 );
-    } else if (aSig == 0) {
-        return a;
-    } else {
-        aExp++;
-    }
-
-    if (n > 0x1000) {
-        n = 0x1000;
-    } else if (n < -0x1000) {
-        n = -0x1000;
-    }
-
-    aExp += n - 1;
-    aSig <<= 10;
-    return normalizeRoundAndPackFloat64(aSign, aExp, aSig, status);
-}
-
 floatx80 floatx80_scalbn(floatx80 a, int n, float_status *status)
 {
     flag aSign;
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 8ebde83251..c1224aab8c 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -353,6 +353,7 @@ float16 float16_sub(float16, float16, float_status *status);
 float16 float16_mul(float16, float16, float_status *status);
 float16 float16_muladd(float16, float16, float16, int, float_status *status);
 float16 float16_div(float16, float16, float_status *status);
+float16 float16_scalbn(float16, int, float_status *status);
 
 int float16_is_quiet_nan(float16, float_status *status);
 int float16_is_signaling_nan(float16, float_status *status);
-- 
2.15.1

* [Qemu-devel] [PATCH v1 18/19] fpu/softfloat: re-factor minmax
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (16 preceding siblings ...)
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 17/19] fpu/softfloat: re-factor scalbn Alex Bennée
@ 2017-12-11 12:57 ` Alex Bennée
  2017-12-18 23:19   ` Richard Henderson
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 19/19] fpu/softfloat: re-factor compare Alex Bennée
  2017-12-11 13:42 ` [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions no-reply
  19 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:57 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

Give the minmax functions the same re-factoring treatment. The MINMAX
macro is still used to expand the per-size entry points, but all the
checking code is now common.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
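To make the point in the code comment concrete (min/max cannot be
implemented as "compare and pick one input"), here is a small sketch on
plain C doubles rather than on the softfloat types. `naive_min` and
`demo_minnum` are hypothetical names. The sketch ignores exception flags
and signalling NaNs; it only shows quiet-NaN and signed-zero handling in
the spirit of minNum().

```c
#include <math.h>

/* Compare-and-pick: the result depends on argument order for NaN and +/-0. */
static double naive_min(double a, double b)
{
    return a < b ? a : b;
}

/* Sketch of minNum-like behaviour for quiet NaNs and signed zeros. */
static double demo_minnum(double a, double b)
{
    if (isnan(a)) {
        return b;               /* prefer the numerical argument */
    }
    if (isnan(b)) {
        return a;
    }
    if (a == b) {
        /* Equal values include +0 vs -0: pick the negative one. */
        return signbit(a) ? a : b;
    }
    return a < b ? a : b;
}
```

With these definitions `naive_min(1.0, NAN)` is NaN but
`naive_min(NAN, 1.0)` is 1.0, and `naive_min(-0.0, 0.0)` returns +0.
`demo_minnum` returns 1.0 and -0 respectively regardless of argument
order.
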
 fpu/softfloat.c         | 242 ++++++++++++++++++++++++++----------------------
 include/fpu/softfloat.h |   6 ++
 2 files changed, 137 insertions(+), 111 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index b7ea56dfa5..5eba996932 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1662,6 +1662,137 @@ float64 uint16_to_float64(uint16_t a, float_status *status)
     return uint64_to_float64((uint64_t) a, status);
 }
 
+/* Float Min/Max */
+/* min() and max() functions. These can't be implemented as
+ * 'compare and pick one input' because that would mishandle
+ * NaNs and +0 vs -0.
+ *
+ * minnum() and maxnum() functions. These are similar to the min()
+ * and max() functions but if one of the arguments is a QNaN and
+ * the other is numerical then the numerical argument is returned.
+ * SNaNs will get quietened before being returned.
+ * minnum() and maxnum() correspond to the IEEE 754-2008 minNum()
+ * and maxNum() operations. min() and max() are the typical min/max
+ * semantics provided by many CPUs which predate that specification.
+ *
+ * minnummag() and maxnummag() functions correspond to minNumMag()
+ * and maxNumMag() from IEEE 754-2008.
+ */
+static decomposed_parts minmax_decomposed(decomposed_parts a,
+                                          decomposed_parts b,
+                                          bool ismin, bool ieee, bool ismag,
+                                          float_status *s)
+{
+        if (a.cls >= float_class_qnan
+            ||
+            b.cls >= float_class_qnan)
+        {
+            if (ieee) {
+                /* Takes two floating-point values `a' and `b', one of
+                 * which is a NaN, and returns the appropriate NaN
+                 * result. If either `a' or `b' is a signaling NaN,
+                 * the invalid exception is raised.
+                 */
+                if (a.cls == float_class_snan || b.cls == float_class_snan) {
+                    s->float_exception_flags |= float_flag_invalid;
+                    if (s->default_nan_mode) {
+                        a.cls = float_class_msnan;
+                        return a;
+                    }
+                } else if (a.cls >= float_class_qnan
+                           &&
+                           b.cls < float_class_qnan) {
+                    return b;
+                } else if (b.cls >= float_class_qnan
+                           &&
+                           a.cls < float_class_qnan) {
+                    return a;
+                }
+            }
+            return pick_nan_parts(a, b, s);
+        }
+
+        /* Handle zero cases */
+        if (a.cls == float_class_zero || b.cls == float_class_zero) {
+            if (a.cls == float_class_normal) {
+                if (a.sign) {
+                    return ismin ? a : b;
+                } else {
+                    return ismin ? b : a;
+                }
+            } else if (b.cls == float_class_normal) {
+                if (b.sign) {
+                    return ismin ? b : a;
+                } else {
+                    return ismin ? a : b;
+                }
+            }
+        }
+
+        if (ismag) {
+            /* Magnitude, ignore sign */
+            bool a_less;
+            if (a.exp == b.exp) {
+                a_less = a.frac < b.frac;
+            } else {
+                a_less = a.exp < b.exp;
+            }
+            return a_less == ismin ? a : b;
+        }
+        if (a.sign != b.sign) {
+            if (ismin) {
+                return a.sign ? a : b;
+            } else {
+                return a.sign ? b : a;
+            }
+        } else {
+            bool a_less;
+            if (a.exp == b.exp) {
+                a_less = a.frac < b.frac;
+            } else {
+                a_less = a.exp < b.exp;
+            }
+            if (ismin) {
+                return a.sign ^ a_less ? a : b;
+            } else {
+                return a.sign ^ a_less ? b : a;
+            }
+        }
+}
+
+#define MINMAX(sz, name, ismin, isiee, ismag)                           \
+float ## sz float ## sz ## _ ## name(float ## sz a, float ## sz b, float_status *s) \
+{                                                                       \
+    decomposed_parts pa = float ## sz ## _unpack_canonical(a, s);       \
+    decomposed_parts pb = float ## sz ## _unpack_canonical(b, s);       \
+    decomposed_parts pr = minmax_decomposed(pa, pb, ismin, isiee, ismag, s); \
+                                                                        \
+    return float ## sz ## _round_pack_canonical(pr, s);                \
+}
+
+MINMAX(16, min, true, false, false)
+MINMAX(16, minnum, true, true, false)
+MINMAX(16, minnummag, true, true, true)
+MINMAX(16, max, false, false, false)
+MINMAX(16, maxnum, false, true, false)
+MINMAX(16, maxnummag, false, true, true)
+
+MINMAX(32, min, true, false, false)
+MINMAX(32, minnum, true, true, false)
+MINMAX(32, minnummag, true, true, true)
+MINMAX(32, max, false, false, false)
+MINMAX(32, maxnum, false, true, false)
+MINMAX(32, maxnummag, false, true, true)
+
+MINMAX(64, min, true, false, false)
+MINMAX(64, minnum, true, true, false)
+MINMAX(64, minnummag, true, true, true)
+MINMAX(64, max, false, false, false)
+MINMAX(64, maxnum, false, true, false)
+MINMAX(64, maxnummag, false, true, true)
+
+#undef MINMAX
+
 /* Multiply A by 2 raised to the power N.  */
 static decomposed_parts scalbn_decomposed(decomposed_parts a, int n,
                                           float_status *s)
@@ -6911,117 +7042,6 @@ int float128_compare_quiet(float128 a, float128 b, float_status *status)
     return float128_compare_internal(a, b, 1, status);
 }
 
-/* min() and max() functions. These can't be implemented as
- * 'compare and pick one input' because that would mishandle
- * NaNs and +0 vs -0.
- *
- * minnum() and maxnum() functions. These are similar to the min()
- * and max() functions but if one of the arguments is a QNaN and
- * the other is numerical then the numerical argument is returned.
- * SNaNs will get quietened before being returned.
- * minnum() and maxnum correspond to the IEEE 754-2008 minNum()
- * and maxNum() operations. min() and max() are the typical min/max
- * semantics provided by many CPUs which predate that specification.
- *
- * minnummag() and maxnummag() functions correspond to minNumMag()
- * and minNumMag() from the IEEE-754 2008.
- */
-#define MINMAX(s)                                                       \
-static inline float ## s float ## s ## _minmax(float ## s a, float ## s b,     \
-                                               int ismin, int isieee,   \
-                                               int ismag,               \
-                                               float_status *status)    \
-{                                                                       \
-    flag aSign, bSign;                                                  \
-    uint ## s ## _t av, bv, aav, abv;                                   \
-    a = float ## s ## _squash_input_denormal(a, status);                \
-    b = float ## s ## _squash_input_denormal(b, status);                \
-    if (float ## s ## _is_any_nan(a) ||                                 \
-        float ## s ## _is_any_nan(b)) {                                 \
-        if (isieee) {                                                   \
-            if (float ## s ## _is_signaling_nan(a, status) ||           \
-                float ## s ## _is_signaling_nan(b, status)) {           \
-                return propagateFloat ## s ## NaN(a, b, status);        \
-            } else  if (float ## s ## _is_quiet_nan(a, status) &&       \
-                !float ## s ##_is_any_nan(b)) {                         \
-                return b;                                               \
-            } else if (float ## s ## _is_quiet_nan(b, status) &&        \
-                       !float ## s ## _is_any_nan(a)) {                 \
-                return a;                                               \
-            }                                                           \
-        }                                                               \
-        return propagateFloat ## s ## NaN(a, b, status);                \
-    }                                                                   \
-    aSign = extractFloat ## s ## Sign(a);                               \
-    bSign = extractFloat ## s ## Sign(b);                               \
-    av = float ## s ## _val(a);                                         \
-    bv = float ## s ## _val(b);                                         \
-    if (ismag) {                                                        \
-        aav = float ## s ## _abs(av);                                   \
-        abv = float ## s ## _abs(bv);                                   \
-        if (aav != abv) {                                               \
-            if (ismin) {                                                \
-                return (aav < abv) ? a : b;                             \
-            } else {                                                    \
-                return (aav < abv) ? b : a;                             \
-            }                                                           \
-        }                                                               \
-    }                                                                   \
-    if (aSign != bSign) {                                               \
-        if (ismin) {                                                    \
-            return aSign ? a : b;                                       \
-        } else {                                                        \
-            return aSign ? b : a;                                       \
-        }                                                               \
-    } else {                                                            \
-        if (ismin) {                                                    \
-            return (aSign ^ (av < bv)) ? a : b;                         \
-        } else {                                                        \
-            return (aSign ^ (av < bv)) ? b : a;                         \
-        }                                                               \
-    }                                                                   \
-}                                                                       \
-                                                                        \
-float ## s float ## s ## _min(float ## s a, float ## s b,               \
-                              float_status *status)                     \
-{                                                                       \
-    return float ## s ## _minmax(a, b, 1, 0, 0, status);                \
-}                                                                       \
-                                                                        \
-float ## s float ## s ## _max(float ## s a, float ## s b,               \
-                              float_status *status)                     \
-{                                                                       \
-    return float ## s ## _minmax(a, b, 0, 0, 0, status);                \
-}                                                                       \
-                                                                        \
-float ## s float ## s ## _minnum(float ## s a, float ## s b,            \
-                                 float_status *status)                  \
-{                                                                       \
-    return float ## s ## _minmax(a, b, 1, 1, 0, status);                \
-}                                                                       \
-                                                                        \
-float ## s float ## s ## _maxnum(float ## s a, float ## s b,            \
-                                 float_status *status)                  \
-{                                                                       \
-    return float ## s ## _minmax(a, b, 0, 1, 0, status);                \
-}                                                                       \
-                                                                        \
-float ## s float ## s ## _minnummag(float ## s a, float ## s b,         \
-                                    float_status *status)               \
-{                                                                       \
-    return float ## s ## _minmax(a, b, 1, 1, 1, status);                \
-}                                                                       \
-                                                                        \
-float ## s float ## s ## _maxnummag(float ## s a, float ## s b,         \
-                                    float_status *status)               \
-{                                                                       \
-    return float ## s ## _minmax(a, b, 0, 1, 1, status);                \
-}
-
-MINMAX(32)
-MINMAX(64)
-
-
 floatx80 floatx80_scalbn(floatx80 a, int n, float_status *status)
 {
     flag aSign;
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index c1224aab8c..ba248ffa39 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -354,6 +354,12 @@ float16 float16_mul(float16, float16, float_status *status);
 float16 float16_muladd(float16, float16, float16, int, float_status *status);
 float16 float16_div(float16, float16, float_status *status);
 float16 float16_scalbn(float16, int, float_status *status);
+float16 float16_min(float16, float16, float_status *status);
+float16 float16_max(float16, float16, float_status *status);
+float16 float16_minnum(float16, float16, float_status *status);
+float16 float16_maxnum(float16, float16, float_status *status);
+float16 float16_minnummag(float16, float16, float_status *status);
+float16 float16_maxnummag(float16, float16, float_status *status);
 
 int float16_is_quiet_nan(float16, float_status *status);
 int float16_is_signaling_nan(float16, float_status *status);
-- 
2.15.1

* [Qemu-devel] [PATCH v1 19/19] fpu/softfloat: re-factor compare
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (17 preceding siblings ...)
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 18/19] fpu/softfloat: re-factor minmax Alex Bennée
@ 2017-12-11 12:57 ` Alex Bennée
  2017-12-18 23:26   ` Richard Henderson
  2017-12-11 13:42 ` [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions no-reply
  19 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 12:57 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Alex Bennée, Aurelien Jarno

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
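A minimal sketch of comparing values once they are unpacked, assuming a
simplified `demo_parts` type and `demo_relation` codes that are
hypothetical stand-ins for `decomposed_parts` and the `float_relation_*`
values. It orders magnitudes by class, then exponent, then fraction, and
flips the result for negative numbers. It does not raise any exception
flags, so it is not a replacement for the code in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Enum order matters: zero < normal < infinity in magnitude. */
typedef enum { DEMO_ZERO, DEMO_NORMAL, DEMO_INF, DEMO_NAN } demo_class;

typedef enum {
    DEMO_LESS = -1,
    DEMO_EQUAL = 0,
    DEMO_GREATER = 1,
    DEMO_UNORDERED = 2
} demo_relation;

typedef struct {
    uint64_t frac;
    int exp;
    demo_class cls;
    bool sign;
} demo_parts;

static demo_parts demo_normal(bool sign, int exp, uint64_t frac)
{
    demo_parts p = { .frac = frac, .exp = exp, .cls = DEMO_NORMAL, .sign = sign };
    return p;
}

static demo_parts demo_special(demo_class cls, bool sign)
{
    demo_parts p = { .frac = 0, .exp = 0, .cls = cls, .sign = sign };
    return p;
}

/* Compare magnitudes only; returns -1, 0 or +1. Inputs must not be NaN. */
static int demo_cmp_mag(demo_parts a, demo_parts b)
{
    if (a.cls != b.cls) {
        return a.cls < b.cls ? -1 : 1;
    }
    if (a.cls != DEMO_NORMAL) {
        return 0;               /* both zero or both infinite */
    }
    if (a.exp != b.exp) {
        return a.exp < b.exp ? -1 : 1;
    }
    if (a.frac != b.frac) {
        return a.frac < b.frac ? -1 : 1;
    }
    return 0;
}

static demo_relation demo_compare(demo_parts a, demo_parts b)
{
    if (a.cls == DEMO_NAN || b.cls == DEMO_NAN) {
        return DEMO_UNORDERED;
    }
    if (a.cls == DEMO_ZERO && b.cls == DEMO_ZERO) {
        return DEMO_EQUAL;      /* +0 and -0 compare equal */
    }
    if (a.cls == DEMO_ZERO) {
        return b.sign ? DEMO_GREATER : DEMO_LESS;
    }
    if (b.cls == DEMO_ZERO) {
        return a.sign ? DEMO_LESS : DEMO_GREATER;
    }
    if (a.sign != b.sign) {
        return a.sign ? DEMO_LESS : DEMO_GREATER;
    }
    /* Same sign: a larger magnitude means "less" when both are negative. */
    int mag = demo_cmp_mag(a, b);
    return (demo_relation)(a.sign ? -mag : mag);
}
```

Handling the classes before looking at `exp` and `frac` means the
ordering of zeros and infinities never depends on whatever exponent
value those classes happen to carry.
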
 fpu/softfloat.c         | 135 +++++++++++++++++++++++++++++-------------------
 include/fpu/softfloat.h |   2 +
 2 files changed, 83 insertions(+), 54 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 5eba996932..31b437e000 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1793,6 +1793,87 @@ MINMAX(64, maxnummag, false, true, true)
 
 #undef MINMAX
 
+/* Floating point compare */
+static int compare_decomposed(decomposed_parts a, decomposed_parts b,
+                              bool is_quiet, float_status *s)
+{
+    if (a.cls >= float_class_qnan
+        ||
+        b.cls >= float_class_qnan) {
+        if (!is_quiet ||
+            a.cls == float_class_snan ||
+            b.cls == float_class_snan) {
+            s->float_exception_flags |= float_flag_invalid;
+        }
+        return float_relation_unordered;
+    }
+
+    if (a.cls == float_class_zero || b.cls == float_class_zero) {
+        if (a.cls == float_class_normal) {
+            return a.sign ? float_relation_less : float_relation_greater;
+        } else if (b.cls == float_class_normal) {
+            return b.sign ? float_relation_greater : float_relation_less;
+        } else if (a.cls == b.cls) {
+            return float_relation_equal;
+        }
+    }
+
+    /* The only infinity we need to explicitly worry about is
+     * comparing two together, otherwise the max_exp/sign details are
+     * enough to compare to normal numbers
+     */
+    if (a.cls == float_class_inf && b.cls == float_class_inf) {
+        if (a.sign != b.sign) {
+            return a.sign ? float_relation_less : float_relation_greater;
+        } else {
+            return float_relation_equal;
+        }
+    }
+
+    if (a.sign != b.sign) {
+        return a.sign ? float_relation_less : float_relation_greater;
+    }
+
+    if (a.exp == b.exp) {
+        if (a.frac == b.frac) {
+            return float_relation_equal;
+        }
+        if (a.sign) {
+            return a.frac > b.frac ?
+                float_relation_less : float_relation_greater;
+        } else {
+            return a.frac > b.frac ?
+                float_relation_greater : float_relation_less;
+        }
+    } else {
+        if (a.sign) {
+            return a.exp > b.exp ? float_relation_less : float_relation_greater;
+        } else {
+            return a.exp > b.exp ? float_relation_greater : float_relation_less;
+        }
+    }
+}
+
+#define COMPARE(sz)                                                     \
+int float ## sz ## _compare(float ## sz a, float ## sz b, float_status *s) \
+{                                                                       \
+    decomposed_parts pa = float ## sz ## _unpack_canonical(a, s);       \
+    decomposed_parts pb = float ## sz ## _unpack_canonical(b, s);       \
+    return compare_decomposed(pa, pb, false, s);                        \
+}                                                                       \
+int float ## sz ## _compare_quiet(float ## sz a, float ## sz b, float_status *s) \
+{                                                                       \
+    decomposed_parts pa = float ## sz ## _unpack_canonical(a, s);       \
+    decomposed_parts pb = float ## sz ## _unpack_canonical(b, s);       \
+    return compare_decomposed(pa, pb, true, s);                         \
+}
+
+COMPARE(16)
+COMPARE(32)
+COMPARE(64)
+
+#undef COMPARE
+
 /* Multiply A by 2 raised to the power N.  */
 static decomposed_parts scalbn_decomposed(decomposed_parts a, int n,
                                           float_status *s)
@@ -6894,60 +6975,6 @@ int float128_unordered_quiet(float128 a, float128 b, float_status *status)
     return 0;
 }
 
-#define COMPARE(s, nan_exp)                                                  \
-static inline int float ## s ## _compare_internal(float ## s a, float ## s b,\
-                                      int is_quiet, float_status *status)    \
-{                                                                            \
-    flag aSign, bSign;                                                       \
-    uint ## s ## _t av, bv;                                                  \
-    a = float ## s ## _squash_input_denormal(a, status);                     \
-    b = float ## s ## _squash_input_denormal(b, status);                     \
-                                                                             \
-    if (( ( extractFloat ## s ## Exp( a ) == nan_exp ) &&                    \
-         extractFloat ## s ## Frac( a ) ) ||                                 \
-        ( ( extractFloat ## s ## Exp( b ) == nan_exp ) &&                    \
-          extractFloat ## s ## Frac( b ) )) {                                \
-        if (!is_quiet ||                                                     \
-            float ## s ## _is_signaling_nan(a, status) ||                  \
-            float ## s ## _is_signaling_nan(b, status)) {                 \
-            float_raise(float_flag_invalid, status);                         \
-        }                                                                    \
-        return float_relation_unordered;                                     \
-    }                                                                        \
-    aSign = extractFloat ## s ## Sign( a );                                  \
-    bSign = extractFloat ## s ## Sign( b );                                  \
-    av = float ## s ## _val(a);                                              \
-    bv = float ## s ## _val(b);                                              \
-    if ( aSign != bSign ) {                                                  \
-        if ( (uint ## s ## _t) ( ( av | bv )<<1 ) == 0 ) {                   \
-            /* zero case */                                                  \
-            return float_relation_equal;                                     \
-        } else {                                                             \
-            return 1 - (2 * aSign);                                          \
-        }                                                                    \
-    } else {                                                                 \
-        if (av == bv) {                                                      \
-            return float_relation_equal;                                     \
-        } else {                                                             \
-            return 1 - 2 * (aSign ^ ( av < bv ));                            \
-        }                                                                    \
-    }                                                                        \
-}                                                                            \
-                                                                             \
-int float ## s ## _compare(float ## s a, float ## s b, float_status *status) \
-{                                                                            \
-    return float ## s ## _compare_internal(a, b, 0, status);                 \
-}                                                                            \
-                                                                             \
-int float ## s ## _compare_quiet(float ## s a, float ## s b,                 \
-                                 float_status *status)                       \
-{                                                                            \
-    return float ## s ## _compare_internal(a, b, 1, status);                 \
-}
-
-COMPARE(32, 0xff)
-COMPARE(64, 0x7ff)
-
 static inline int floatx80_compare_internal(floatx80 a, floatx80 b,
                                             int is_quiet, float_status *status)
 {
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index ba248ffa39..a5232bcc87 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -360,6 +360,8 @@ float16 float16_minnum(float16, float16, float_status *status);
 float16 float16_maxnum(float16, float16, float_status *status);
 float16 float16_minnummag(float16, float16, float_status *status);
 float16 float16_maxnummag(float16, float16, float_status *status);
+int float16_compare(float16, float16, float_status *status);
+int float16_compare_quiet(float16, float16, float_status *status);
 
 int float16_is_quiet_nan(float16, float_status *status);
 int float16_is_signaling_nan(float16, float_status *status);
-- 
2.15.1

* Re: [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions
  2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
                   ` (18 preceding siblings ...)
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 19/19] fpu/softfloat: re-factor compare Alex Bennée
@ 2017-12-11 13:42 ` no-reply
  2017-12-11 15:40   ` Alex Bennée
  19 siblings, 1 reply; 51+ messages in thread
From: no-reply @ 2017-12-11 13:42 UTC (permalink / raw)
  To: alex.bennee
  Cc: famz, richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic, qemu-devel

Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20171211125705.16120-1-alex.bennee@linaro.org
Subject: [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 t [tag update]            patchew/20171205171706.23947-1-ybettan@redhat.com -> patchew/20171205171706.23947-1-ybettan@redhat.com
Switched to a new branch 'test'
45c20d7e14 fpu/softfloat: re-factor compare
b52eaabb36 fpu/softfloat: re-factor minmax
518bbcee3e fpu/softfloat: re-factor scalbn
208ff4370f fpu/softfloat: re-factor int/uint to float
85ef39bd87 fpu/softfloat: re-factor float to int/uint
7ae34923a2 fpu/softfloat: re-factor round_to_int
c8662d53c5 fpu/softfloat: re-factor muladd
c95c5f4c5f fpu/softfloat: re-factor div
d8ac851b38 fpu/softfloat: re-factor mul
174b42de84 fpu/softfloat: re-factor add/sub
c93e9f8a58 fpu/softfloat: define decompose structures
31f8322049 fpu/softfloat: move the extract functions to the top of the file
2a8a9e64a0 fpu/softfloat: improve comments on ARM NaN propagation
d1eefad57b fpu/softfloat: propagate signalling NaNs in MINMAX
e052270783 include/fpu/softfloat: add some float16 contants
149ba6ff6d include/fpu/softfloat: implement float16_set_sign helper
9921bf4d5a include/fpu/softfloat: implement float16_chs helper
0665d6d615 include/fpu/softfloat: implement float16_abs helper
36fce4ec4a fpu/softfloat: implement float16_squash_input_denormal

=== OUTPUT BEGIN ===
Checking PATCH 1/19: fpu/softfloat: implement float16_squash_input_denormal...
Checking PATCH 2/19: include/fpu/softfloat: implement float16_abs helper...
Checking PATCH 3/19: include/fpu/softfloat: implement float16_chs helper...
Checking PATCH 4/19: include/fpu/softfloat: implement float16_set_sign helper...
Checking PATCH 5/19: include/fpu/softfloat: add some float16 contants...
Checking PATCH 6/19: fpu/softfloat: propagate signalling NaNs in MINMAX...
Checking PATCH 7/19: fpu/softfloat: improve comments on ARM NaN propagation...
Checking PATCH 8/19: fpu/softfloat: move the extract functions to the top of the file...
Checking PATCH 9/19: fpu/softfloat: define decompose structures...
ERROR: spaces prohibited around that ':' (ctx:WxW)
#54: FILE: fpu/softfloat.c:210:
+    uint64_t frac   : 64;
                     ^

ERROR: spaces prohibited around that ':' (ctx:WxW)
#55: FILE: fpu/softfloat.c:211:
+    int exp         : 32;
                     ^

ERROR: space prohibited before that ':' (ctx:WxW)
#57: FILE: fpu/softfloat.c:213:
+    int             : 23;
                     ^

ERROR: spaces prohibited around that ':' (ctx:WxW)
#58: FILE: fpu/softfloat.c:214:
+    bool sign       : 1;
                     ^

total: 4 errors, 0 warnings, 84 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 10/19: fpu/softfloat: re-factor add/sub...
WARNING: line over 80 characters
#140: FILE: fpu/softfloat.c:364:
+                                                   const decomposed_params *parm)

total: 0 errors, 1 warnings, 937 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 11/19: fpu/softfloat: re-factor mul...
Checking PATCH 12/19: fpu/softfloat: re-factor div...
Checking PATCH 13/19: fpu/softfloat: re-factor muladd...
Checking PATCH 14/19: fpu/softfloat: re-factor round_to_int...
WARNING: line over 80 characters
#90: FILE: fpu/softfloat.c:1252:
+                inc = ((a.frac & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);

total: 0 errors, 1 warnings, 329 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 15/19: fpu/softfloat: re-factor float to int/uint...
ERROR: space prohibited after that '-' (ctx:WxW)
#55: FILE: fpu/softfloat.c:1347:
+            return r < - (uint64_t) INT64_MIN ? -r : INT64_MIN;
                        ^

WARNING: line over 80 characters
#91: FILE: fpu/softfloat.c:1383:
+int ## isz ## _t float ## fsz ## _to_int ## isz(float ## fsz a, float_status *s) \

WARNING: line over 80 characters
#171: FILE: fpu/softfloat.c:1463:
+uint ## isz ## _t float ## fsz ## _to_uint ## isz(float ## fsz a, float_status *s) \

ERROR: space prohibited after that open parenthesis '('
#711: FILE: fpu/softfloat.c:3410:
+    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )

ERROR: space prohibited before that close parenthesis ')'
#711: FILE: fpu/softfloat.c:3410:
+    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )

ERROR: space prohibited after that open parenthesis '('
#712: FILE: fpu/softfloat.c:3411:
+         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )

ERROR: space prohibited before that close parenthesis ')'
#712: FILE: fpu/softfloat.c:3411:
+         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )

ERROR: space prohibited after that open parenthesis '('
#733: FILE: fpu/softfloat.c:3419:
+    aSign = extractFloat32Sign( a );

ERROR: space prohibited before that close parenthesis ')'
#733: FILE: fpu/softfloat.c:3419:
+    aSign = extractFloat32Sign( a );

ERROR: space prohibited after that open parenthesis '('
#734: FILE: fpu/softfloat.c:3420:
+    bSign = extractFloat32Sign( b );

ERROR: space prohibited before that close parenthesis ')'
#734: FILE: fpu/softfloat.c:3420:
+    bSign = extractFloat32Sign( b );

WARNING: line over 80 characters
#737: FILE: fpu/softfloat.c:3423:
+    if ( aSign != bSign ) return aSign && ( (uint32_t) ( ( av | bv )<<1 ) != 0 );

ERROR: spaces required around that '<<' (ctx:VxV)
#737: FILE: fpu/softfloat.c:3423:
+    if ( aSign != bSign ) return aSign && ( (uint32_t) ( ( av | bv )<<1 ) != 0 );
                                                                     ^

ERROR: space prohibited after that open parenthesis '('
#737: FILE: fpu/softfloat.c:3423:
+    if ( aSign != bSign ) return aSign && ( (uint32_t) ( ( av | bv )<<1 ) != 0 );

ERROR: space prohibited before that close parenthesis ')'
#737: FILE: fpu/softfloat.c:3423:
+    if ( aSign != bSign ) return aSign && ( (uint32_t) ( ( av | bv )<<1 ) != 0 );

ERROR: trailing statements should be on next line
#737: FILE: fpu/softfloat.c:3423:
+    if ( aSign != bSign ) return aSign && ( (uint32_t) ( ( av | bv )<<1 ) != 0 );

ERROR: braces {} are necessary for all arms of this statement
#737: FILE: fpu/softfloat.c:3423:
+    if ( aSign != bSign ) return aSign && ( (uint32_t) ( ( av | bv )<<1 ) != 0 );
[...]

ERROR: space prohibited after that open parenthesis '('
#738: FILE: fpu/softfloat.c:3424:
+    return ( av != bv ) && ( aSign ^ ( av < bv ) );

ERROR: space prohibited before that close parenthesis ')'
#738: FILE: fpu/softfloat.c:3424:
+    return ( av != bv ) && ( aSign ^ ( av < bv ) );

ERROR: space prohibited after that open parenthesis '('
#798: FILE: fpu/softfloat.c:3440:
+    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )

ERROR: space prohibited before that close parenthesis ')'
#798: FILE: fpu/softfloat.c:3440:
+    if (    ( ( extractFloat32Exp( a ) == 0xFF ) && extractFloat32Frac( a ) )

ERROR: space prohibited after that open parenthesis '('
#799: FILE: fpu/softfloat.c:3441:
+         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )

ERROR: space prohibited before that close parenthesis ')'
#799: FILE: fpu/softfloat.c:3441:
+         || ( ( extractFloat32Exp( b ) == 0xFF ) && extractFloat32Frac( b ) )

total: 20 errors, 3 warnings, 1065 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 16/19: fpu/softfloat: re-factor int/uint to float...
Checking PATCH 17/19: fpu/softfloat: re-factor scalbn...
Checking PATCH 18/19: fpu/softfloat: re-factor minmax...
WARNING: line over 80 characters
#122: FILE: fpu/softfloat.c:1764:
+float ## sz float ## sz ## _ ## name(float ## sz a, float ## sz b, float_status *s) \

total: 0 errors, 1 warnings, 266 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 19/19: fpu/softfloat: re-factor compare...
WARNING: line over 80 characters
#88: FILE: fpu/softfloat.c:1864:
+int float ## sz ## _compare_quiet(float ## sz a, float ## sz b, float_status *s) \

total: 0 errors, 1 warnings, 155 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org


* Re: [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions
  2017-12-11 13:42 ` [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions no-reply
@ 2017-12-11 15:40   ` Alex Bennée
  0 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-12-11 15:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: famz, richard.henderson, peter.maydell, laurent, bharata, andrew


no-reply@patchew.org writes:

> Hi,
>
> This series seems to have some coding style problems. See output below for
> more information:

FWIW these are either:

  - misidentified "spaces prohibited around that ':' (ctx:WxW)" for bitfields
  - existing softfloat code that has moved
  - two lines that just edge over the 80 char limit
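
The first category comes from C bit-field declarations, where the ':' separates a member from its width and is commonly written with spaces around it. A minimal sketch of such a structure, reconstructed from the four lines checkpatch quotes for PATCH 9/19; the type name `example_parts` and the `cls` member are assumptions for illustration and are not taken from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout modelled on the flagged lines. The ':' introduces
 * a bit-field width here; it is not a ternary or label colon, which is
 * what the checkpatch spacing rule appears to be aimed at.
 * Note: bit-fields wider than int (uint64_t) are a GCC/Clang extension.
 */
typedef struct {
    uint64_t frac   : 64;   /* fraction at 64-bit resolution */
    int exp         : 32;   /* unbiased exponent */
    unsigned cls    : 8;    /* assumed: class tag (zero, normal, ...) */
    int             : 23;   /* unnamed padding field */
    bool sign       : 1;    /* true for negative values */
} example_parts;
```

Removing the spaces to satisfy the checker would not change the generated layout, only the alignment of the declarations.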

<snip>

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v1 16/19] fpu/softfloat: re-factor int/uint to float
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 16/19] fpu/softfloat: re-factor int/uint to float Alex Bennée
@ 2017-12-12 17:21   ` Alex Bennée
  2017-12-18 22:59   ` Richard Henderson
  1 sibling, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-12-12 17:21 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno


Alex Bennée <alex.bennee@linaro.org> writes:

> These are considerably simpler as the lower order integers can just
> use the higher order conversion function. As the decomposed fractional
> part is a full 64 bits, rounding and inexact handling come from the
> pack functions.
<snip>
>
> +/*
> + * Integer to float conversions
> + *
> + * Returns the result of converting the two's complement integer `a'
> + * to the floating-point format. The conversion is performed according
> + * to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> + */
> +
> +static decomposed_parts int_to_float(int64_t a, float_status *status)
> +{
> +    decomposed_parts r;
> +    if (a == 0) {
> +        r.cls = float_class_zero;
> +    } else if (a == (1ULL << 63)) {

As the re-pack code can handle -0, we need to explicitly set the sign
here, since we are building decomposed_parts from scratch:

    if (a == 0) {
        r.cls = float_class_zero;
        r.sign = false;
    } else if (a == (1ULL << 63)) {

And also at:
> +
> +/*
> + * Unsigned Integer to float conversions
> + *
> + * Returns the result of converting the unsigned integer `a' to the
> + * floating-point format. The conversion is performed according to the
> + * IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> + */
> +
> +static decomposed_parts uint_to_float(uint64_t a, float_status *status)
> +{
> +    decomposed_parts r;
> +    if (a == 0) {
> +        r.cls = float_class_zero;
> +    } else {

Now reads:

    decomposed_parts r = { .sign = false };

    if (a == 0) {
        r.cls = float_class_zero;
    } else {
        int spare_bits = clz64(a) - 1;
        r.cls = float_class_normal;


--
Alex Bennée
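
The point about the sign can be sketched in isolation. The following is a minimal illustration, not the patch itself: every `example_` name is hypothetical, the binary point at bit 62 is an assumption, and the handling of inputs with the top bit set (a sticky right shift) is one plausible choice that the snippet above does not show.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { EXAMPLE_ZERO, EXAMPLE_NORMAL } example_class;

typedef struct {
    uint64_t frac;
    int exp;
    example_class cls;
    bool sign;
} example_parts;

/* Count leading zeros of a non-zero 64-bit value (portable stand-in). */
static int example_clz64(uint64_t v)
{
    int n = 0;
    while (!(v & (1ULL << 63))) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Unsigned integer to decomposed form. The sign is initialised up front
 * so the zero case never carries an indeterminate sign into packing.
 */
static example_parts example_uint_to_parts(uint64_t a)
{
    example_parts r = { .sign = false };

    if (a == 0) {
        r.cls = EXAMPLE_ZERO;
    } else {
        int spare_bits = example_clz64(a) - 1;
        r.cls = EXAMPLE_NORMAL;
        r.exp = 62 - spare_bits;        /* assumed binary point: bit 62 */
        if (spare_bits < 0) {
            /* Top bit set: shift right, keeping the lost bit as sticky. */
            r.frac = (a >> 1) | (a & 1);
        } else {
            r.frac = a << spare_bits;   /* top set bit moves to bit 62 */
        }
    }
    return r;
}

/* Packing a zero only needs the sign: +0 is 0x0000, -0 is 0x8000. */
static uint16_t example_pack_zero16(example_parts p)
{
    return (uint16_t)(p.sign ? 0x8000 : 0x0000);
}
```

With `r` left uninitialised, `example_pack_zero16()` could return either pattern for an input of 0, which is the failure the explicit `.sign = false` avoids.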


* Re: [Qemu-devel] [PATCH v1 02/19] include/fpu/softfloat: implement float16_abs helper
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 02/19] include/fpu/softfloat: implement float16_abs helper Alex Bennée
@ 2017-12-15 11:35   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 51+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-12-15 11:35 UTC (permalink / raw)
  To: Alex Bennée, richard.henderson, peter.maydell, laurent,
	bharata, andrew, aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno

On 12/11/2017 09:56 AM, Alex Bennée wrote:
> This will be required when expanding the MINMAX() macro for 16
> bit/half-precision operations.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

> ---
>  include/fpu/softfloat.h | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index d5e99667b6..edf402d422 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -374,6 +374,13 @@ static inline int float16_is_zero_or_denormal(float16 a)
>      return (float16_val(a) & 0x7c00) == 0;
>  }
>  
> +static inline float16 float16_abs(float16 a)
> +{
> +    /* Note that abs does *not* handle NaN specially, nor does
> +     * it flush denormal inputs to zero.
> +     */
> +    return make_float16(float16_val(a) & 0x7fff);
> +}
>  /*----------------------------------------------------------------------------
>  | The pattern for a default generated half-precision NaN.
>  *----------------------------------------------------------------------------*/
> 
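
What the comment in the helper means in practice can be shown on raw half-precision bit patterns. A small sketch, assuming `float16` is just a 16-bit container; `example_f16` and `example_f16_abs` are illustrative names, not the QEMU ones:

```c
#include <stdint.h>

typedef uint16_t example_f16;   /* raw IEEE half-precision bits */

/* Clear bit 15 (the sign) and leave everything else untouched, so NaN
 * payloads and denormal inputs pass through unchanged.
 */
static example_f16 example_f16_abs(example_f16 a)
{
    return (example_f16)(a & 0x7fff);
}
```

For example, -1.0 (0xbc00) becomes 1.0 (0x3c00), a negative NaN such as 0xfe00 stays a NaN (0x7e00), and the smallest negative denormal 0x8001 becomes 0x0001 rather than being flushed to zero.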


* Re: [Qemu-devel] [PATCH v1 05/19] include/fpu/softfloat: add some float16 contants
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 05/19] include/fpu/softfloat: add some float16 contants Alex Bennée
@ 2017-12-15 12:24   ` Alex Bennée
  2017-12-15 13:37   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2017-12-15 12:24 UTC (permalink / raw)
  To: richard.henderson, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno


Alex Bennée <alex.bennee@linaro.org> writes:

> This defines the same set of common constants for float 16 as defined
> for 32 and 64 bit floats. These are often used by target helper
> functions.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  include/fpu/softfloat.h | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index 17dfe60dbd..5a9258c57c 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -395,6 +395,13 @@ static inline float16 float16_set_sign(float16 a, int sign)
>      return make_float16((float16_val(a) & 0x7fff) | (sign << 15));
>  }
>
> +#define float16_zero make_float16(0)
> +#define float16_one make_float16(0x3a00)
> +#define float16_ln2 make_float16(0x34d1)
> +#define float16_pi make_float16(0x4448)
> +#define float16_half make_float16(0x3800)
> +#define float16_infinity make_float16(0x7a00)

And:
#define float16_infinity make_float16(0x7c00)


> +
>  /*----------------------------------------------------------------------------
>  | The pattern for a default generated half-precision NaN.
>  *----------------------------------------------------------------------------*/


--
Alex Bennée
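
Why 0x7c00 rather than 0x7a00 follows from the half-precision layout: 1 sign bit, 5 exponent bits and 10 fraction bits, with infinity encoded as an all-ones exponent and a zero fraction. A short sketch that assembles patterns from those fields; the helper name is hypothetical:

```c
#include <stdint.h>

/* Assemble a half-precision bit pattern from its three fields:
 * bit 15 sign, bits 14..10 biased exponent, bits 9..0 fraction.
 */
static uint16_t example_f16_bits(unsigned sign, unsigned biased_exp,
                                 unsigned frac)
{
    return (uint16_t)(((sign & 1u) << 15) |
                      ((biased_exp & 0x1fu) << 10) |
                      (frac & 0x3ffu));
}
```

`example_f16_bits(0, 31, 0)` gives 0x7c00, whereas 0x7a00 corresponds to `example_f16_bits(0, 30, 0x200)`, a finite value with the top fraction bit set.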


* Re: [Qemu-devel] [PATCH v1 05/19] include/fpu/softfloat: add some float16 contants
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 05/19] include/fpu/softfloat: add some float16 contants Alex Bennée
  2017-12-15 12:24   ` Alex Bennée
@ 2017-12-15 13:37   ` Philippe Mathieu-Daudé
  2017-12-18 21:50     ` Richard Henderson
  1 sibling, 1 reply; 51+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-12-15 13:37 UTC (permalink / raw)
  To: Alex Bennée, richard.henderson, peter.maydell, laurent,
	bharata, andrew, aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno


Hi Alex,

On 12/11/2017 09:56 AM, Alex Bennée wrote:
> This defines the same set of common constants for float 16 as defined
> for 32 and 64 bit floats. These are often used by target helper
> functions.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  include/fpu/softfloat.h | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index 17dfe60dbd..5a9258c57c 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -395,6 +395,13 @@ static inline float16 float16_set_sign(float16 a, int sign)
>      return make_float16((float16_val(a) & 0x7fff) | (sign << 15));
>  }
>  
> +#define float16_zero make_float16(0)

ok

> +#define float16_one make_float16(0x3a00)

I'm a bit confused...

>>> [np.fromstring(struct.pack("<H", x), dtype=np.float16)[0] for x in
[0, 0x3a00, 0x34d1, 0x4448, 0x3800, 0x7a00]]
[0.0, 0.75, 0.30103, 4.2812, 0.5, 49152.0]

However:

>>> ['0x' + binascii.hexlify(np.array([x], '>f2').tostring()) for x in
[0, 1, math.log(2), np.pi, 0.5, np.inf]]
['0x0000', '0x3c00', '0x398c', '0x4248', '0x3800', '0x7c00']

It seems the MSB of the mantissa got shifted into the LSB of the
biased exponent...

> +#define float16_ln2 make_float16(0x34d1)

incorrect? 0x398c

> +#define float16_pi make_float16(0x4448)

incorrect? 0x4248

> +#define float16_half make_float16(0x3800)

ok

> +#define float16_infinity make_float16(0x7a00)

incorrect? 0x7c00

> +
>  /*----------------------------------------------------------------------------
>  | The pattern for a default generated half-precision NaN.
>  *----------------------------------------------------------------------------*/
> 
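
The same check can be done without numpy. The following is a sketch of a half-precision decoder in C, written for this note rather than taken from QEMU; `example_f16_to_double` is a hypothetical name, and the power-of-two scaling is done with a plain loop to avoid depending on libm functions.

```c
#include <math.h>
#include <stdint.h>

/* Decode raw IEEE half-precision bits into a double. */
static double example_f16_to_double(uint16_t bits)
{
    int sign = bits >> 15;
    int exp = (bits >> 10) & 0x1f;
    int frac = bits & 0x3ff;
    double value;

    if (exp == 0x1f) {
        /* All-ones exponent: infinity if the fraction is zero, else NaN. */
        value = frac ? NAN : INFINITY;
    } else {
        /* Normals carry an implicit leading 1; subnormals (exp == 0) do not. */
        int mant = exp ? (frac | 0x400) : frac;
        /* Exponent bias is 15; the mantissa is an integer scaled by 2^10. */
        int e = (exp ? exp : 1) - 15 - 10;

        value = mant;
        for (; e > 0; e--) {
            value *= 2.0;
        }
        for (; e < 0; e++) {
            value /= 2.0;
        }
    }
    return sign ? -value : value;
}
```

Fed the values from the patch, this yields 0.75 for 0x3a00 and 49152.0 for 0x7a00, matching the first numpy listing, while 0x3c00 and 0x7c00 decode to 1.0 and infinity.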




* Re: [Qemu-devel] [PATCH v1 03/19] include/fpu/softfloat: implement float16_chs helper
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 03/19] include/fpu/softfloat: implement float16_chs helper Alex Bennée
@ 2017-12-18 21:41   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-12-18 21:41 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno

On 12/11/2017 04:56 AM, Alex Bennée wrote:
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  include/fpu/softfloat.h | 9 +++++++++
>  1 file changed, 9 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


* Re: [Qemu-devel] [PATCH v1 04/19] include/fpu/softfloat: implement float16_set_sign helper
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 04/19] include/fpu/softfloat: implement float16_set_sign helper Alex Bennée
@ 2017-12-18 21:44   ` Richard Henderson
  2017-12-19  7:31     ` Alex Bennée
  2018-01-05 16:15     ` Philippe Mathieu-Daudé
  0 siblings, 2 replies; 51+ messages in thread
From: Richard Henderson @ 2017-12-18 21:44 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno

On 12/11/2017 04:56 AM, Alex Bennée wrote:
> +static inline float16 float16_set_sign(float16 a, int sign)
> +{
> +    return make_float16((float16_val(a) & 0x7fff) | (sign << 15));
> +}
> +

1) Do we use this anywhere?

2) This is probably in line with the other implementations, but going
to a more qemu-ish style it should use deposit32.

Anyway,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
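
For reference, a sketch of what the deposit32 form might look like. `example_deposit32` is a local stand-in written to mirror the shape of QEMU's `deposit32(value, start, length, fieldval)` from include/qemu/bitops.h, and `example_f16_set_sign` operates on a plain `uint16_t` instead of the `float16` wrapper type; both names are assumptions made for this illustration.

```c
#include <stdint.h>

/* Replace `length` bits of `value`, starting at bit `start`, with the
 * low bits of `fieldval`. Valid for 1 <= length <= 32 - start.
 */
static uint32_t example_deposit32(uint32_t value, int start, int length,
                                  uint32_t fieldval)
{
    uint32_t mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Set the sign of a half-precision value held as raw bits. */
static uint16_t example_f16_set_sign(uint16_t a, int sign)
{
    return (uint16_t)example_deposit32(a, 15, 1, (uint32_t)sign);
}
```

The result is the same as the open-coded mask-and-shift; the helper mainly makes the field position and width explicit.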


* Re: [Qemu-devel] [PATCH v1 05/19] include/fpu/softfloat: add some float16 contants
  2017-12-15 13:37   ` Philippe Mathieu-Daudé
@ 2017-12-18 21:50     ` Richard Henderson
  2018-01-04 14:09       ` Alex Bennée
  0 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2017-12-18 21:50 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Alex Bennée, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno

On 12/15/2017 05:37 AM, Philippe Mathieu-Daudé wrote:
> Hi Alex,
> 
> On 12/11/2017 09:56 AM, Alex Bennée wrote:
>> This defines the same set of common constants for float 16 as defined
>> for 32 and 64 bit floats. These are often used by target helper
>> functions.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> ---
>>  include/fpu/softfloat.h | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
>> index 17dfe60dbd..5a9258c57c 100644
>> --- a/include/fpu/softfloat.h
>> +++ b/include/fpu/softfloat.h
>> @@ -395,6 +395,13 @@ static inline float16 float16_set_sign(float16 a, int sign)
>>      return make_float16((float16_val(a) & 0x7fff) | (sign << 15));
>>  }
>>  
>> +#define float16_zero make_float16(0)
> 
> ok
> 
>> +#define float16_one make_float16(0x3a00)
> 
> I'm a bit confused...
> 
>>>> [np.fromstring(struct.pack("<H", x), dtype=np.float16)[0] for x in
> [0, 0x3a00, 0x34d1, 0x4448, 0x3800, 0x7a00]]
> [0.0, 0.75, 0.30103, 4.2812, 0.5, 49152.0]
> 
> However:
> 
>>>> ['0x' + binascii.hexlify(np.array([x], '>f2').tostring()) for x in
> [0, 1, math.log(2), np.pi, 0.5, np.inf]]
> ['0x0000', '0x3c00', '0x398c', '0x4248', '0x3800', '0x7c00']
> 
> It seems the MSB of the mantissa got shifted into the LSB of the
> biased exponent...
> 
>> +#define float16_ln2 make_float16(0x34d1)
> 
> incorrect? 0x398c
> 
>> +#define float16_pi make_float16(0x4448)
> 
> incorrect? 0x4248
> 
>> +#define float16_half make_float16(0x3800)
> 
> ok
> 
>> +#define float16_infinity make_float16(0x7a00)
> 
> incorrect? 0x7c00

All of Phil's numbers are correct -- I double-checked with gcc.

Other than 0, 1 and +inf, I doubt any of the others will actually be used.
Perhaps we should just leave them out?


r~


* Re: [Qemu-devel] [PATCH v1 06/19] fpu/softfloat: propagate signalling NaNs in MINMAX
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 06/19] fpu/softfloat: propagate signalling NaNs in MINMAX Alex Bennée
@ 2017-12-18 21:53   ` Richard Henderson
  2018-01-05 13:05     ` Alex Bennée
  0 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2017-12-18 21:53 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno

On 12/11/2017 04:56 AM, Alex Bennée wrote:
> While a comparison between a QNaN and a number will return the number
> it is not the same with a signaling NaN. In this case the SNaN will
> "win" and after potentially raising an exception it will be quietened.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> 
> ---
> v2
>   - added return for propageFloat
> ---
>  fpu/softfloat.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)

I suppose this fixes minmax for float128 too,
and is thus not redundant with patch 18?

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
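
The behaviour described in the commit message can be sketched on raw float32 bits. This is an illustration under stated assumptions, not the MINMAX() macro: the names are hypothetical, the quiet bit is taken to be the top fraction bit being set (the IEEE 754-2008 convention that ARM follows; some targets invert it), and signed zeros are not treated specially.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static bool example_is_nan32(uint32_t v)
{
    return (v & 0x7fffffff) > 0x7f800000;
}

static bool example_is_snan32(uint32_t v)
{
    /* NaN with the quiet bit (top fraction bit) clear. */
    return example_is_nan32(v) && !(v & 0x00400000);
}

/* Minimum of two float32 bit patterns. *invalid is set when a
 * signalling NaN is seen.
 */
static uint32_t example_minnum32(uint32_t a, uint32_t b, bool *invalid)
{
    float fa, fb;

    if (example_is_snan32(a) || example_is_snan32(b)) {
        /* The SNaN "wins": raise invalid and return it quietened. */
        *invalid = true;
        return (example_is_snan32(a) ? a : b) | 0x00400000;
    }
    if (example_is_nan32(a)) {
        return b;               /* QNaN against a number: number wins */
    }
    if (example_is_nan32(b)) {
        return a;
    }
    memcpy(&fa, &a, sizeof(fa));
    memcpy(&fb, &b, sizeof(fb));
    return fa < fb ? a : b;
}
```

So the minimum of a QNaN (0x7fc00000) and 1.0 is 1.0 with no exception, while an SNaN (0x7fa00000) against 1.0 returns the quietened NaN 0x7fe00000 and flags invalid.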


* Re: [Qemu-devel] [PATCH v1 07/19] fpu/softfloat: improve comments on ARM NaN propagation
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 07/19] fpu/softfloat: improve comments on ARM NaN propagation Alex Bennée
@ 2017-12-18 21:54   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-12-18 21:54 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno

On 12/11/2017 04:56 AM, Alex Bennée wrote:
> Mention the pseudo-code fragment on which this is based and correct
> the spelling of signalling.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  fpu/softfloat-specialize.h | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


* Re: [Qemu-devel] [PATCH v1 08/19] fpu/softfloat: move the extract functions to the top of the file
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 08/19] fpu/softfloat: move the extract functions to the top of the file Alex Bennée
@ 2017-12-18 21:57   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-12-18 21:57 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno

On 12/11/2017 04:56 AM, Alex Bennée wrote:
> +}
>  /*----------------------------------------------------------------------------

Watch the vertical white space.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


* Re: [Qemu-devel] [PATCH v1 09/19] fpu/softfloat: define decompose structures
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 09/19] fpu/softfloat: define decompose structures Alex Bennée
@ 2017-12-18 21:59   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-12-18 21:59 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno

On 12/11/2017 04:56 AM, Alex Bennée wrote:
> These structures pave the way for generic softfloat helper routines
> that will operate on fully decomposed numbers.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  fpu/softfloat.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 71 insertions(+), 1 deletion(-)

Since I was involved in writing this,

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


r~


* Re: [Qemu-devel] [PATCH v1 10/19] fpu/softfloat: re-factor add/sub
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 10/19] fpu/softfloat: re-factor add/sub Alex Bennée
@ 2017-12-18 22:18   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-12-18 22:18 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno

On 12/11/2017 04:56 AM, Alex Bennée wrote:
> We can now add float16_add/sub and use the common decompose and
> canonicalize functions to have a single implementation for
> float16/32/64 add and sub functions.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

I was involved in writing this, so

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

However,

> +/*
> + * Returns the result of adding the absolute values of the
> + * floating-point values `a' and `b'. If `subtract' is set, the sum is
> + * negated before being returned. `subtract' is ignored if the result
> + * is a NaN. The addition is performed according to the IEC/IEEE
> + * Standard for Binary Floating-Point Arithmetic.
> + */
> +
> +static decomposed_parts add_decomposed(decomposed_parts a, decomposed_parts b,
> +                                       bool subtract, float_status *s)

The comment does not accurately describe what the function does, particularly
wrt subtract.

> +        if (a.cls >= float_class_qnan
> +            ||
> +            b.cls >= float_class_qnan)

Would you please fix this up throughout the patch set?
While I prefer the GNU

  (X
   || Y)

I'm also ok with

  (X ||
   Y)

but || on a line by itself is just weird.


r~

* Re: [Qemu-devel] [PATCH v1 11/19] fpu/softfloat: re-factor mul
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 11/19] fpu/softfloat: re-factor mul Alex Bennée
@ 2017-12-18 22:22   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-12-18 22:22 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno

On 12/11/2017 04:56 AM, Alex Bennée wrote:
> We can now add float16_mul and use the common decompose and
> canonicalize functions to have a single implementation for
> float16/32/64 versions.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  fpu/softfloat.c         | 207 ++++++++++++++++++------------------------------
>  include/fpu/softfloat.h |   1 +
>  2 files changed, 80 insertions(+), 128 deletions(-)

I was involved in writing this, so

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


r~

* Re: [Qemu-devel] [PATCH v1 12/19] fpu/softfloat: re-factor div
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 12/19] fpu/softfloat: re-factor div Alex Bennée
@ 2017-12-18 22:26   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-12-18 22:26 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno

On 12/11/2017 04:56 AM, Alex Bennée wrote:
> We can now add float16_div and use the common decompose and
> canonicalize functions to have a single implementation for
> float16/32/64 versions.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  fpu/softfloat-macros.h  |  44 +++++++++
>  fpu/softfloat.c         | 235 ++++++++++++++++++------------------------------
>  include/fpu/softfloat.h |   1 +
>  3 files changed, 134 insertions(+), 146 deletions(-)

I was involved in writing this, so

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


r~

* Re: [Qemu-devel] [PATCH v1 13/19] fpu/softfloat: re-factor muladd
  2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 13/19] fpu/softfloat: re-factor muladd Alex Bennée
@ 2017-12-18 22:36   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-12-18 22:36 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno

On 12/11/2017 04:56 AM, Alex Bennée wrote:
> +    if (flags & float_muladd_halve_result) {
> +            p_exp -= 1;
> +    }

Indent.  Otherwise, I was involved in writing this, so

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


r~

* Re: [Qemu-devel] [PATCH v1 14/19] fpu/softfloat: re-factor round_to_int
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 14/19] fpu/softfloat: re-factor round_to_int Alex Bennée
@ 2017-12-18 22:41   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-12-18 22:41 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno

On 12/11/2017 04:57 AM, Alex Bennée wrote:
> We can now add float16_round_to_int and use the common round_decomposed and
> canonicalize functions to have a single implementation for
> float16/32/64 round_to_int functions.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  fpu/softfloat.c         | 304 ++++++++++++++++++++----------------------------
>  include/fpu/softfloat.h |   1 +
>  2 files changed, 130 insertions(+), 175 deletions(-)

I was involved in writing this, so

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


r~

* Re: [Qemu-devel] [PATCH v1 15/19] fpu/softfloat: re-factor float to int/uint
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 15/19] fpu/softfloat: re-factor float to int/uint Alex Bennée
@ 2017-12-18 22:54   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-12-18 22:54 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno

On 12/11/2017 04:57 AM, Alex Bennée wrote:
> +        }
> +        if (p.exp < DECOMPOSED_BINARY_POINT) {
> +            return p.frac >> (DECOMPOSED_BINARY_POINT - p.exp);
> +        } else if (p.exp < 64) {
> +            return p.frac << (p.exp - DECOMPOSED_BINARY_POINT);
> +        } else {
> +            return UINT64_MAX;
> +        }
> +    default:
> +        g_assert_not_reached();
> +    }
> +}
> +
> +static uint16_t uint16_pack_decomposed(decomposed_parts p, float_status *s)
> +{
> +    uint64_t r = uint64_pack_decomposed(p, s);
> +    return r > UINT16_MAX ? UINT16_MAX : r;
> +}
> +
> +static uint32_t uint32_pack_decomposed(decomposed_parts p, float_status *s)
> +{
> +    uint64_t r = uint64_pack_decomposed(p, s);
> +    return r > UINT32_MAX ? UINT32_MAX : r;
> +}

Missing float_flag_invalid for unsigned overflows.
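
A minimal sketch of the pattern being asked for: saturate the narrowed
result and raise the invalid flag in the same branch. The status struct and
flag value here are simplified stand-ins (assumptions for illustration), not
QEMU's real float_status or float_flag_invalid definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for QEMU's float_flag_invalid. */
enum { sketch_flag_invalid = 1 };

/* Hypothetical stand-in for QEMU's float_status. */
typedef struct {
    int float_exception_flags;
} sketch_status;

/*
 * Narrow a 64-bit unsigned conversion result to 16 bits.
 * On overflow the result saturates and the invalid flag is raised,
 * so the caller can tell a clamped value from a genuine UINT16_MAX.
 */
static uint16_t sketch_narrow_u16(uint64_t r, sketch_status *s)
{
    if (r > UINT16_MAX) {
        s->float_exception_flags |= sketch_flag_invalid;
        r = UINT16_MAX;
    }
    return (uint16_t)r;
}
```

An in-range value passes through with the flags untouched; only the clamped
path sets the flag.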


r~

* Re: [Qemu-devel] [PATCH v1 16/19] fpu/softfloat: re-factor int/uint to float
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 16/19] fpu/softfloat: re-factor int/uint to float Alex Bennée
  2017-12-12 17:21   ` Alex Bennée
@ 2017-12-18 22:59   ` Richard Henderson
  2018-01-05 15:51     ` Alex Bennée
  1 sibling, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2017-12-18 22:59 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno

On 12/11/2017 04:57 AM, Alex Bennée wrote:
> These are considerably simpler as the lower order integers can just
> use the higher order conversion function. As the decomposed fractional
> part is a full 64 bit rounding and inexact handling comes from the
> pack functions.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  fpu/softfloat.c         | 358 +++++++++++++++++++++++++-----------------------
>  include/fpu/softfloat.h |  30 ++--
>  2 files changed, 195 insertions(+), 193 deletions(-)
> 
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index d7858bdae5..1a7f1cab10 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -1409,17 +1409,18 @@ FLOAT_TO_INT(64, 64)
>  
>  #undef FLOAT_TO_INT
>  
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the  floating-point value
> -| `a' to the unsigned integer format.  The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic---which means in particular that the conversion is rounded
> -| according to the current rounding mode.  If `a' is a NaN, the largest
> -| unsigned integer is returned.  Otherwise, if the conversion overflows, the
> -| largest unsigned integer is returned.  If the 'a' is negative, the result
> -| is rounded and zero is returned; values that do not round to zero will
> -| raise the inexact exception flag.
> -*----------------------------------------------------------------------------*/
> +/*
> + *  Returns the result of converting the floating-point value `a' to
> + *  the unsigned integer format. The conversion is performed according
> + *  to the IEC/IEEE Standard for Binary Floating-Point
> + *  Arithmetic---which means in particular that the conversion is
> + *  rounded according to the current rounding mode. If `a' is a NaN,
> + *  the largest unsigned integer is returned. Otherwise, if the
> + *  conversion overflows, the largest unsigned integer is returned. If
> + *  the 'a' is negative, the result is rounded and zero is returned;
> + *  values that do not round to zero will raise the inexact exception
> + *  flag.
> + */
>  
>  static uint64_t uint64_pack_decomposed(decomposed_parts p, float_status *s)
>  {
> @@ -1433,6 +1434,7 @@ static uint64_t uint64_pack_decomposed(decomposed_parts p, float_status *s)
>          return 0;
>      case float_class_normal:
>          if (p.sign) {
> +            s->float_exception_flags |= float_flag_invalid;
>              return 0;
>          }
>          if (p.exp < DECOMPOSED_BINARY_POINT) {
> @@ -1440,6 +1442,7 @@ static uint64_t uint64_pack_decomposed(decomposed_parts p, float_status *s)
>          } else if (p.exp < 64) {
>              return p.frac << (p.exp - DECOMPOSED_BINARY_POINT);
>          } else {
> +            s->float_exception_flags |= float_flag_invalid;
>              return UINT64_MAX;
>          }
>      default:
> @@ -1450,13 +1453,21 @@ static uint64_t uint64_pack_decomposed(decomposed_parts p, float_status *s)
>  static uint16_t uint16_pack_decomposed(decomposed_parts p, float_status *s)
>  {
>      uint64_t r = uint64_pack_decomposed(p, s);
> -    return r > UINT16_MAX ? UINT16_MAX : r;
> +    if (r > UINT16_MAX) {
> +        s->float_exception_flags |= float_flag_invalid;
> +        r = UINT16_MAX;
> +    }
> +    return r;
>  }
>  
>  static uint32_t uint32_pack_decomposed(decomposed_parts p, float_status *s)
>  {
>      uint64_t r = uint64_pack_decomposed(p, s);
> -    return r > UINT32_MAX ? UINT32_MAX : r;
> +    if (r > UINT32_MAX) {
> +        s->float_exception_flags |= float_flag_invalid;
> +        r = UINT32_MAX;
> +    }
> +    return r;
>  }
>  
>  #define F

Ah, the fix for the bug in patch 15 got squashed into the wrong patch.  ;-)

> +float16 int16_to_float16(int16_t a, float_status *status)
> +{
> +    return int64_to_float16((int64_t) a, status);
> +}

Kill all of the redundant casts?

Otherwise, as amended in your followup,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

* Re: [Qemu-devel] [PATCH v1 17/19] fpu/softfloat: re-factor scalbn
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 17/19] fpu/softfloat: re-factor scalbn Alex Bennée
@ 2017-12-18 23:00   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-12-18 23:00 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno

On 12/11/2017 04:57 AM, Alex Bennée wrote:
> This is one of the simpler manipulations you could make to a floating
> point number.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  fpu/softfloat.c         | 104 +++++++++++++++---------------------------------
>  include/fpu/softfloat.h |   1 +
>  2 files changed, 32 insertions(+), 73 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

* Re: [Qemu-devel] [PATCH v1 18/19] fpu/softfloat: re-factor minmax
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 18/19] fpu/softfloat: re-factor minmax Alex Bennée
@ 2017-12-18 23:19   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-12-18 23:19 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno

On 12/11/2017 04:57 AM, Alex Bennée wrote:
> +static decomposed_parts minmax_decomposed(decomposed_parts a,
> +                                          decomposed_parts b,
> +                                          bool ismin, bool ieee, bool ismag,
> +                                          float_status *s)
> +{
> +        if (a.cls >= float_class_qnan

Indent is off by 4.

> +                    if (s->default_nan_mode) {
> +                        a.cls = float_class_msnan;
> +                        return a;

float_class_dnan.  That said, can't you fall through to pick_nan_parts and have
that (and float_flag_invalid) handled already?

> +        if (a.cls == float_class_zero || b.cls == float_class_zero) {
> +            if (a.cls == float_class_normal) {
> +                if (a.sign) {
> +                    return ismin ? a : b;
> +                } else {
> +                    return ismin ? b : a;
> +                }
> +            } else if (b.cls == float_class_normal) {
> +                if (b.sign) {
> +                    return ismin ? b : a;
> +                } else {
> +                    return ismin ? a : b;
> +                }
> +            }
> +        }

With respect to zero, normal and inf should be handled the same.
Both of those middle tests should be

  a.cls == float_class_normal || a.cls == float_class_inf

It would appear this section is not honoring ismag.

This is one case where I think it might be helpful to do

    int a_exp, b_exp;
    bool a_sign, b_sign;

    if (nans) {
        ...
    }

    switch (a.cls) {
    case float_class_normal:
        a_exp = a.exp;
        break;
    case float_class_inf:
        a_exp = INT_MAX;
        break;
    case float_class_zero:
        a_exp = INT_MIN;
        break;
    }
    switch (b.cls) {
        ....
    }
    a_sign = a.sign;
    b_sign = b.sign;
    if (ismag) {
        a_sign = b_sign = 0;
    }

    if (a_sign == b_sign) {
        bool a_less = a_exp < b_exp;
        if (a_exp == b_exp) {
            a_less = a.frac < b.frac;
        }
        return a_sign ^ a_less ^ ismin ? b : a;
    } else {
        return a_sign ^ ismin ? b : a;
    }
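
The outline above can be fleshed out into a small standalone sketch. It
assumes NaNs have already been handled, and the mm_parts type is a
hypothetical cut-down stand-in for decomposed_parts (zero and infinity are
expected to carry frac == 0 here). It is not the code from the series.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, NaN-free stand-in for the series' decomposed_parts. */
typedef enum { MM_ZERO, MM_NORMAL, MM_INF } mm_class;

typedef struct {
    uint64_t frac;
    int exp;
    mm_class cls;
    bool sign;
} mm_parts;

/* Map the class onto an exponent so that zero < any normal < infinity. */
static int mm_order_exp(mm_parts p)
{
    switch (p.cls) {
    case MM_ZERO:
        return INT_MIN;
    case MM_INF:
        return INT_MAX;
    default:
        return p.exp;
    }
}

static mm_parts mm_minmax(mm_parts a, mm_parts b, bool ismin, bool ismag)
{
    int a_exp = mm_order_exp(a);
    int b_exp = mm_order_exp(b);
    /* Magnitude comparisons simply ignore the signs. */
    bool a_sign = ismag ? false : a.sign;
    bool b_sign = ismag ? false : b.sign;

    if (a_sign == b_sign) {
        /* Same sign: compare magnitudes, exponent first, then fraction. */
        bool a_less = a_exp < b_exp;
        if (a_exp == b_exp) {
            a_less = a.frac < b.frac;
        }
        return (a_sign ^ a_less ^ ismin) ? b : a;
    }
    /* Different signs: the negative operand is the smaller one. */
    return (a_sign ^ ismin) ? b : a;
}
```

Folding zero and infinity into the exponent ordering means the three
classes need no separate branches, and ismag is honored in one place.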


r~

* Re: [Qemu-devel] [PATCH v1 19/19] fpu/softfloat: re-factor compare
  2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 19/19] fpu/softfloat: re-factor compare Alex Bennée
@ 2017-12-18 23:26   ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2017-12-18 23:26 UTC (permalink / raw)
  To: Alex Bennée, peter.maydell, laurent, bharata, andrew,
	aleksandar.markovic
  Cc: qemu-devel, Aurelien Jarno

On 12/11/2017 04:57 AM, Alex Bennée wrote:
> +    if (a.cls == float_class_zero || b.cls == float_class_zero) {
> +        if (a.cls == float_class_normal) {
> +            return a.sign ? float_relation_less : float_relation_greater;
> +        } else if (b.cls == float_class_normal) {
> +            return b.sign ? float_relation_greater : float_relation_less;
> +        } else if (a.cls == b.cls) {
> +            return float_relation_equal;
> +        }
> +    }

This misses out on infinity handling, which should be like normals.
Perhaps better as

    if (a.cls == float_class_zero) {
        if (b.cls == float_class_zero) {
            return float_relation_equal;
        }
        return b.sign ? float_relation_greater : float_relation_less;
    } else if (b.cls == float_class_zero) {
        return a.sign ? float_relation_less : float_relation_greater;
    }

> +    /* The only infinity we need to explicitly worry about is
> +     * comparing two together, otherwise the max_exp/sign details are
> +     * enough to compare to normal numbers
> +     */

I don't think it's wise to rely on the contents of .exp for float_class_inf.
Really, the only valid member for that type is sign.  Better as

    if (a.cls == float_class_inf) {
        if (b.cls == float_class_inf) {
            if (a.sign == b.sign) {
                return float_relation_equal;
            }
        }
        return a.sign ? float_relation_less : float_relation_greater;
    } else if (b.cls == float_class_inf) {
        return b.sign ? float_relation_greater : float_relation_less;
    }
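
Putting the zero and infinity handling together, a self-contained sketch of
the comparison might look like the following. It assumes NaNs were filtered
out earlier; cmp_parts and cmp_relation are hypothetical stand-ins for the
series' decomposed_parts and the float_relation_* values. Note that when
only b is infinite, a positive infinity in b makes a the lesser operand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, NaN-free stand-ins for the types used in the series. */
typedef enum { CMP_ZERO, CMP_NORMAL, CMP_INF } cmp_class;
typedef enum { CMP_LESS = -1, CMP_EQUAL = 0, CMP_GREATER = 1 } cmp_relation;

typedef struct {
    uint64_t frac;
    int exp;
    cmp_class cls;
    bool sign;
} cmp_parts;

static cmp_relation cmp_decomposed(cmp_parts a, cmp_parts b)
{
    bool mag_less;

    /* Zeros: +0 and -0 are equal, otherwise the non-zero sign decides. */
    if (a.cls == CMP_ZERO) {
        if (b.cls == CMP_ZERO) {
            return CMP_EQUAL;
        }
        return b.sign ? CMP_GREATER : CMP_LESS;
    } else if (b.cls == CMP_ZERO) {
        return a.sign ? CMP_LESS : CMP_GREATER;
    }

    /* Infinities: only the sign member is consulted, never exp. */
    if (a.cls == CMP_INF) {
        if (b.cls == CMP_INF && a.sign == b.sign) {
            return CMP_EQUAL;
        }
        return a.sign ? CMP_LESS : CMP_GREATER;
    } else if (b.cls == CMP_INF) {
        return b.sign ? CMP_GREATER : CMP_LESS;
    }

    /* Both operands are normal numbers from here on. */
    if (a.sign != b.sign) {
        return a.sign ? CMP_LESS : CMP_GREATER;
    }
    if (a.exp == b.exp) {
        if (a.frac == b.frac) {
            return CMP_EQUAL;
        }
        mag_less = a.frac < b.frac;
    } else {
        mag_less = a.exp < b.exp;
    }
    /* For two negatives, the smaller magnitude is the greater value. */
    return (mag_less ^ a.sign) ? CMP_LESS : CMP_GREATER;
}
```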


r~

* Re: [Qemu-devel] [PATCH v1 04/19] include/fpu/softfloat: implement float16_set_sign helper
  2017-12-18 21:44   ` Richard Henderson
@ 2017-12-19  7:31     ` Alex Bennée
  2018-01-08 12:58       ` Alex Bennée
  2018-01-05 16:15     ` Philippe Mathieu-Daudé
  1 sibling, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2017-12-19  7:31 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, laurent, bharata, andrew, aleksandar.markovic,
	qemu-devel, Aurelien Jarno


Richard Henderson <richard.henderson@linaro.org> writes:

> On 12/11/2017 04:56 AM, Alex Bennée wrote:
>> +static inline float16 float16_set_sign(float16 a, int sign)
>> +{
>> +    return make_float16((float16_val(a) & 0x7fff) | (sign << 15));
>> +}
>> +
>
> 1) Do we use this anywhere?

Yes in the target specific helpers

>
> 2) While this is probably in line with the other implementations,
> but going to a more qemu-ish style this should use deposit32.

OK, will do.

>
> Anyway,
>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>
>
> r~


--
Alex Bennée

* Re: [Qemu-devel] [PATCH v1 05/19] include/fpu/softfloat: add some float16 contants
  2017-12-18 21:50     ` Richard Henderson
@ 2018-01-04 14:09       ` Alex Bennée
  2018-01-04 15:05         ` Richard Henderson
  0 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2018-01-04 14:09 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Philippe Mathieu-Daudé,
	peter.maydell, laurent, bharata, andrew, aleksandar.markovic,
	qemu-devel, Aurelien Jarno


Richard Henderson <richard.henderson@linaro.org> writes:

> On 12/15/2017 05:37 AM, Philippe Mathieu-Daudé wrote:
>> Hi Alex,
>>
>> On 12/11/2017 09:56 AM, Alex Bennée wrote:
>>> This defines the same set of common constants for float 16 as defined
>>> for 32 and 64 bit floats. These are often used by target helper
>>> functions.
>>>
>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>> ---
>>>  include/fpu/softfloat.h | 7 +++++++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
>>> index 17dfe60dbd..5a9258c57c 100644
>>> --- a/include/fpu/softfloat.h
>>> +++ b/include/fpu/softfloat.h
>>> @@ -395,6 +395,13 @@ static inline float16 float16_set_sign(float16 a, int sign)
>>>      return make_float16((float16_val(a) & 0x7fff) | (sign << 15));
>>>  }
>>>
>>> +#define float16_zero make_float16(0)
>>
>> ok
>>
>>> +#define float16_one make_float16(0x3a00)
>>
>> I'm a bit confused...
>>
>>>>> [np.fromstring(struct.pack("<H", x), dtype=np.float16)[0] for x in
>> [0, 0x3a00, 0x34d1, 0x4448, 0x3800, 0x7a00]]
>> [0.0, 0.75, 0.30103, 4.2812, 0.5, 49152.0]
>>
>> However:
>>
>>>>> ['0x' + binascii.hexlify(np.array([x], '>f2').tostring()) for x in
>> [0, 1, math.log(2), np.pi, 0.5, np.inf]]
>> ['0x0000', '0x3c00', '0x398c', '0x4248', '0x3800', '0x7c00']
>>
>> It seems the MSB bit of the mantissa got shifted as the LSB of the
>> biased exponent...
>>
>>> +#define float16_ln2 make_float16(0x34d1)
>>
>> incorrect? 0x398c
>>
>>> +#define float16_pi make_float16(0x4448)
>>
>> incorrect? 0x4248
>>
>>> +#define float16_half make_float16(0x3800)
>>
>> ok
>>
>>> +#define float16_infinity make_float16(0x7a00)
>>
>> incorrect? 0x7c00
>
> All of Phil's numbers are correct -- I double-checked with gcc.
>
> Other than 0, 1 and +inf, I doubt any of the others will actually be used.
> Perhaps we should just leave them out?
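
For anyone wanting to re-check the encodings quoted above without NumPy, a
small decoder is enough. This is only a sketch of the IEEE 754 binary16
layout (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits); the
function name is made up for illustration and is not part of softfloat.

```c
#include <math.h>
#include <stdint.h>

/* Decode a binary16 bit pattern into a double (exact for all inputs). */
static double half_bits_to_double(uint16_t h)
{
    int sign = h >> 15;
    int exp = (h >> 10) & 0x1f;
    int frac = h & 0x3ff;
    double mag;

    if (exp == 0x1f) {
        /* All-ones exponent: infinity if frac is zero, otherwise NaN. */
        mag = frac ? NAN : INFINITY;
    } else if (exp == 0) {
        /* Zero or subnormal: frac * 2^-24. */
        mag = frac / 16777216.0;
    } else {
        /* Normal: (1 + frac / 1024) * 2^(exp - 15). */
        mag = 1.0 + frac / 1024.0;
        for (int e = exp; e < 15; e++) {
            mag /= 2.0;
        }
        for (int e = exp; e > 15; e--) {
            mag *= 2.0;
        }
    }
    return sign ? -mag : mag;
}
```

With this, 0x3c00 decodes to 1.0 and 0x7c00 to infinity, while the values
in the patch, 0x3a00 and 0x7a00, decode to 0.75 and 49152.0.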

I was following the other sizes. It seems it's x80 which uses the
additional ones in helpers:

  floatx80_ln2
  floatx80_pi

Should I delete all the unused constants from the other sizes as well?

--
Alex Bennée

* Re: [Qemu-devel] [PATCH v1 05/19] include/fpu/softfloat: add some float16 contants
  2018-01-04 14:09       ` Alex Bennée
@ 2018-01-04 15:05         ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2018-01-04 15:05 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Philippe Mathieu-Daudé,
	peter.maydell, laurent, bharata, andrew, aleksandar.markovic,
	qemu-devel, Aurelien Jarno

On 01/04/2018 06:09 AM, Alex Bennée wrote:
> I was following the other sizes. It seems it's x80 which uses the
> additional ones in helpers:
> 
>   floatx80_ln2
>   floatx80_pi
> 
> Should I delete all the unused constants from the other sizes as well?

Yeah, let's do that.


r~

* Re: [Qemu-devel] [PATCH v1 06/19] fpu/softfloat: propagate signalling NaNs in MINMAX
  2017-12-18 21:53   ` Richard Henderson
@ 2018-01-05 13:05     ` Alex Bennée
  0 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2018-01-05 13:05 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, laurent, bharata, andrew, aleksandar.markovic,
	qemu-devel, Aurelien Jarno


Richard Henderson <richard.henderson@linaro.org> writes:

> On 12/11/2017 04:56 AM, Alex Bennée wrote:
>> While a comparison between a QNaN and a number will return the number
>> it is not the same with a signaling NaN. In this case the SNaN will
>> "win" and after potentially raising an exception it will be quietened.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>
>> ---
>> v2
>>   - added return for propageFloat
>> ---
>>  fpu/softfloat.c | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> I suppose this fixes minmax for float128 too,
> and is thus not redundant with patch 18?

It was never expanded so I guess no one does float128 minmax's at the moment.

>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>
>
> r~


--
Alex Bennée

* Re: [Qemu-devel] [PATCH v1 16/19] fpu/softfloat: re-factor int/uint to float
  2017-12-18 22:59   ` Richard Henderson
@ 2018-01-05 15:51     ` Alex Bennée
  0 siblings, 0 replies; 51+ messages in thread
From: Alex Bennée @ 2018-01-05 15:51 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, laurent, bharata, andrew, aleksandar.markovic,
	qemu-devel, Aurelien Jarno


Richard Henderson <richard.henderson@linaro.org> writes:

> On 12/11/2017 04:57 AM, Alex Bennée wrote:
>> These are considerably simpler as the lower order integers can just
>> use the higher order conversion function. As the decomposed fractional
>> part is a full 64 bit rounding and inexact handling comes from the
>> pack functions.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
<snip>
>>
>>  static uint32_t uint32_pack_decomposed(decomposed_parts p, float_status *s)
>>  {
>>      uint64_t r = uint64_pack_decomposed(p, s);
>> -    return r > UINT32_MAX ? UINT32_MAX : r;
>> +    if (r > UINT32_MAX) {
>> +        s->float_exception_flags |= float_flag_invalid;
>> +        r = UINT32_MAX;
>> +    }
>> +    return r;
>>  }
>>
>>  #define F
>
> Ah, the fix for the bug in patch 15 got squashed into the wrong patch.
> ;-)

Hmm slip of the re-base... the fix has been moved.

>
>> +float16 int16_to_float16(int16_t a, float_status *status)
>> +{
>> +    return int64_to_float16((int64_t) a, status);
>> +}
>
> Kill all of the redundant casts?

Ack.

>
> Otherwise, as amended in your followup,
>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>
>
> r~


--
Alex Bennée

* Re: [Qemu-devel] [PATCH v1 04/19] include/fpu/softfloat: implement float16_set_sign helper
  2017-12-18 21:44   ` Richard Henderson
  2017-12-19  7:31     ` Alex Bennée
@ 2018-01-05 16:15     ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 51+ messages in thread
From: Philippe Mathieu-Daudé @ 2018-01-05 16:15 UTC (permalink / raw)
  To: Richard Henderson, Alex Bennée, Peter Crosthwaite
  Cc: peter.maydell, laurent, bharata, andrew, aleksandar.markovic,
	qemu-devel, Aurelien Jarno, Alistair Francis

On 12/18/2017 06:44 PM, Richard Henderson wrote:
> On 12/11/2017 04:56 AM, Alex Bennée wrote:
>> +static inline float16 float16_set_sign(float16 a, int sign)
>> +{
>> +    return make_float16((float16_val(a) & 0x7fff) | (sign << 15));
>> +}
>> +
> 
> 1) Do we use this anywhere?
> 
> 2) While this is probably in line with the other implementations,
> but going to a more qemu-ish style this should use deposit32.

Yes, it is easier to read while reviewing (no mask or shift), so safer.

That's probably why I'm becoming addicted to the "hw/registerfields.h"
API... :)
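
To make the comparison concrete, here is a rough sketch of the deposit
style. sketch_deposit32() is a local stand-in written to mirror the
behaviour of deposit32() from qemu/bitops.h, and the helper works on the raw
16-bit pattern rather than the float16 type; both are assumptions made so
the example compiles on its own.

```c
#include <stdint.h>

/*
 * Local stand-in mirroring deposit32(): replace the 'length'-bit field
 * starting at bit 'start' in 'value' with the low bits of 'fieldval'.
 * Requires start >= 0, length > 0 and start + length <= 32.
 */
static inline uint32_t sketch_deposit32(uint32_t value, int start,
                                        int length, uint32_t fieldval)
{
    uint32_t mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Set the sign bit (bit 15) of a raw half-precision bit pattern. */
static inline uint16_t sketch_f16_set_sign(uint16_t bits, int sign)
{
    /* Only the low bit of 'sign' is deposited into the field. */
    return (uint16_t)sketch_deposit32(bits, 15, 1, (uint32_t)sign);
}
```

The field position and width are stated once as (15, 1), so there is no
separate 0x7fff mask to keep in step with the shift.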


* Re: [Qemu-devel] [PATCH v1 04/19] include/fpu/softfloat: implement float16_set_sign helper
  2017-12-19  7:31     ` Alex Bennée
@ 2018-01-08 12:58       ` Alex Bennée
  2018-01-08 20:25         ` Richard Henderson
  0 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2018-01-08 12:58 UTC (permalink / raw)
  To: Richard Henderson
  Cc: peter.maydell, laurent, bharata, andrew, aleksandar.markovic,
	qemu-devel, Aurelien Jarno


Alex Bennée <alex.bennee@linaro.org> writes:

> Richard Henderson <richard.henderson@linaro.org> writes:
>
>> On 12/11/2017 04:56 AM, Alex Bennée wrote:
>>> +static inline float16 float16_set_sign(float16 a, int sign)
>>> +{
>>> +    return make_float16((float16_val(a) & 0x7fff) | (sign << 15));
>>> +}
>>> +
>>
>> 1) Do we use this anywhere?
>
> Yes in the target specific helpers
>
>>
>> 2) While this is probably in line with the other implementations,
>> but going to a more qemu-ish style this should use deposit32.
>
> OK, will do.
>

It turns out doing this unleashes a weird circular dependency as we need
qemu/bitops.h but that brings in host-utils.h and bswap.h which tries
to include softfloat.h again.

          CHK version_gen.h
    CC      qga/main.o
  In file included from /home/alex/lsrc/qemu/qemu.git/include/qemu/bitops.h:16:0,
                   from /home/alex/lsrc/qemu/qemu.git/include/fpu/softfloat.h:85,
                   from /home/alex/lsrc/qemu/qemu.git/include/qemu/bswap.h:4,
                   from qga/main.c:28:
  /home/alex/lsrc/qemu/qemu.git/include/qemu/host-utils.h: In function ‘revbit16’:
  /home/alex/lsrc/qemu/qemu.git/include/qemu/host-utils.h:293:9: error: implicit declaration of function ‘bswap16’ [-Werror=implicit-function-declaration]
       x = bswap16(x);
           ^
  /home/alex/lsrc/qemu/qemu.git/include/qemu/host-utils.h:293:5: error: nested extern declaration of ‘bswap16’ [-Werror=nested-externs]
       x = bswap16(x);
       ^
  /home/alex/lsrc/qemu/qemu.git/include/qemu/host-utils.h: In function ‘revbit32’:
  /home/alex/lsrc/qemu/qemu.git/include/qemu/host-utils.h:312:9: error: implicit declaration of function ‘bswap32’ [-Werror=implicit-function-declaration]
       x = bswap32(x);
           ^
  /home/alex/lsrc/qemu/qemu.git/include/qemu/host-utils.h:312:5: error: nested extern declaration of ‘bswap32’ [-Werror=nested-externs]
       x = bswap32(x);
       ^
  /home/alex/lsrc/qemu/qemu.git/include/qemu/host-utils.h: In function ‘revbit64’:
  /home/alex/lsrc/qemu/qemu.git/include/qemu/host-utils.h:331:9: error: implicit declaration of function ‘bswap64’ [-Werror=implicit-function-declaration]
       x = bswap64(x);
           ^
  /home/alex/lsrc/qemu/qemu.git/include/qemu/host-utils.h:331:5: error: nested extern declaration of ‘bswap64’ [-Werror=nested-externs]
       x = bswap64(x);
       ^
  In file included from qga/main.c:28:0:
  /home/alex/lsrc/qemu/qemu.git/include/qemu/bswap.h: At top level:
  /home/alex/lsrc/qemu/qemu.git/include/qemu/bswap.h:14:24: error: conflicting types for ‘bswap16’
   static inline uint16_t bswap16(uint16_t x)
                          ^
  In file included from /home/alex/lsrc/qemu/qemu.git/include/qemu/bitops.h:16:0,
                   from /home/alex/lsrc/qemu/qemu.git/include/fpu/softfloat.h:85,
                   from /home/alex/lsrc/qemu/qemu.git/include/qemu/bswap.h:4,
                   from qga/main.c:28:
  /home/alex/lsrc/qemu/qemu.git/include/qemu/host-utils.h:293:9: note: previous implicit declaration of ‘bswap16’ was here
       x = bswap16(x);
           ^
  In file included from qga/main.c:28:0:
  /home/alex/lsrc/qemu/qemu.git/include/qemu/bswap.h:19:24: error: conflicting types for ‘bswap32’
   static inline uint32_t bswap32(uint32_t x)
                          ^
  In file included from /home/alex/lsrc/qemu/qemu.git/include/qemu/bitops.h:16:0,
                   from /home/alex/lsrc/qemu/qemu.git/include/fpu/softfloat.h:85,
                   from /home/alex/lsrc/qemu/qemu.git/include/qemu/bswap.h:4,
                   from qga/main.c:28:
  /home/alex/lsrc/qemu/qemu.git/include/qemu/host-utils.h:312:9: note: previous implicit declaration of ‘bswap32’ was here
       x = bswap32(x);
           ^
  In file included from qga/main.c:28:0:
  /home/alex/lsrc/qemu/qemu.git/include/qemu/bswap.h:24:24: error: conflicting types for ‘bswap64’
   static inline uint64_t bswap64(uint64_t x)
                          ^
  In file included from /home/alex/lsrc/qemu/qemu.git/include/qemu/bitops.h:16:0,
                   from /home/alex/lsrc/qemu/qemu.git/include/fpu/softfloat.h:85,
                   from /home/alex/lsrc/qemu/qemu.git/include/qemu/bswap.h:4,
                   from qga/main.c:28:
  /home/alex/lsrc/qemu/qemu.git/include/qemu/host-utils.h:331:9: note: previous implicit declaration of ‘bswap64’ was here
       x = bswap64(x);
           ^
  cc1: all warnings being treated as errors
  /home/alex/lsrc/qemu/qemu.git/rules.mak:66: recipe for target 'qga/main.o' failed
  make: *** [qga/main.o] Error 1

  Compilation exited abnormally with code 2 at Mon Jan  8 12:57:41


--
Alex Bennée

* Re: [Qemu-devel] [PATCH v1 04/19] include/fpu/softfloat: implement float16_set_sign helper
  2018-01-08 12:58       ` Alex Bennée
@ 2018-01-08 20:25         ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2018-01-08 20:25 UTC (permalink / raw)
  To: Alex Bennée
  Cc: peter.maydell, laurent, bharata, andrew, aleksandar.markovic,
	qemu-devel, Aurelien Jarno

On 01/08/2018 04:58 AM, Alex Bennée wrote:
> 
> Alex Bennée <alex.bennee@linaro.org> writes:
> 
>> Richard Henderson <richard.henderson@linaro.org> writes:
>>
>>> On 12/11/2017 04:56 AM, Alex Bennée wrote:
>>>> +static inline float16 float16_set_sign(float16 a, int sign)
>>>> +{
>>>> +    return make_float16((float16_val(a) & 0x7fff) | (sign << 15));
>>>> +}
>>>> +
>>>
>>> 1) Do we use this anywhere?
>>
>> Yes in the target specific helpers
>>
>>>
>>> 2) While this is probably in line with the other implementations,
>>> but going to a more qemu-ish style this should use deposit32.
>>
>> OK, will do.
>>
> 
> It turns out doing this unleashes a weird circular dependency as we need
> qemu/bitops.h but that brings in host-utils.h and bswap.h which tries
> to include softfloat.h again.

Bah.

Just ignore this request for now then.

For future cleanup, I'm sure that bswap.h includes softfloat.h for the
float32/float64 typedefs.  We should move those out somewhere else -- probably
qemu/typedefs.h.  Which probably drops the number of objects that depend on
softfloat.h by a factor of 100.


r~

Thread overview: 51+ messages
2017-12-11 12:56 [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions Alex Bennée
2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 01/19] fpu/softfloat: implement float16_squash_input_denormal Alex Bennée
2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 02/19] include/fpu/softfloat: implement float16_abs helper Alex Bennée
2017-12-15 11:35   ` Philippe Mathieu-Daudé
2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 03/19] include/fpu/softfloat: implement float16_chs helper Alex Bennée
2017-12-18 21:41   ` Richard Henderson
2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 04/19] include/fpu/softfloat: implement float16_set_sign helper Alex Bennée
2017-12-18 21:44   ` Richard Henderson
2017-12-19  7:31     ` Alex Bennée
2018-01-08 12:58       ` Alex Bennée
2018-01-08 20:25         ` Richard Henderson
2018-01-05 16:15     ` Philippe Mathieu-Daudé
2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 05/19] include/fpu/softfloat: add some float16 contants Alex Bennée
2017-12-15 12:24   ` Alex Bennée
2017-12-15 13:37   ` Philippe Mathieu-Daudé
2017-12-18 21:50     ` Richard Henderson
2018-01-04 14:09       ` Alex Bennée
2018-01-04 15:05         ` Richard Henderson
2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 06/19] fpu/softfloat: propagate signalling NaNs in MINMAX Alex Bennée
2017-12-18 21:53   ` Richard Henderson
2018-01-05 13:05     ` Alex Bennée
2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 07/19] fpu/softfloat: improve comments on ARM NaN propagation Alex Bennée
2017-12-18 21:54   ` Richard Henderson
2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 08/19] fpu/softfloat: move the extract functions to the top of the file Alex Bennée
2017-12-18 21:57   ` Richard Henderson
2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 09/19] fpu/softfloat: define decompose structures Alex Bennée
2017-12-18 21:59   ` Richard Henderson
2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 10/19] fpu/softfloat: re-factor add/sub Alex Bennée
2017-12-18 22:18   ` Richard Henderson
2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 11/19] fpu/softfloat: re-factor mul Alex Bennée
2017-12-18 22:22   ` Richard Henderson
2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 12/19] fpu/softfloat: re-factor div Alex Bennée
2017-12-18 22:26   ` Richard Henderson
2017-12-11 12:56 ` [Qemu-devel] [PATCH v1 13/19] fpu/softfloat: re-factor muladd Alex Bennée
2017-12-18 22:36   ` Richard Henderson
2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 14/19] fpu/softfloat: re-factor round_to_int Alex Bennée
2017-12-18 22:41   ` Richard Henderson
2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 15/19] fpu/softfloat: re-factor float to int/uint Alex Bennée
2017-12-18 22:54   ` Richard Henderson
2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 16/19] fpu/softfloat: re-factor int/uint to float Alex Bennée
2017-12-12 17:21   ` Alex Bennée
2017-12-18 22:59   ` Richard Henderson
2018-01-05 15:51     ` Alex Bennée
2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 17/19] fpu/softfloat: re-factor scalbn Alex Bennée
2017-12-18 23:00   ` Richard Henderson
2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 18/19] fpu/softfloat: re-factor minmax Alex Bennée
2017-12-18 23:19   ` Richard Henderson
2017-12-11 12:57 ` [Qemu-devel] [PATCH v1 19/19] fpu/softfloat: re-factor compare Alex Bennée
2017-12-18 23:26   ` Richard Henderson
2017-12-11 13:42 ` [Qemu-devel] [PATCH v1 00/19] re-factor softfloat and add fp16 functions no-reply
2017-12-11 15:40   ` Alex Bennée
