* [PATCH v2 00/10] softfloat: Implement float128_muladd
From: Richard Henderson @ 2020-09-25 15:20 UTC
To: qemu-devel; +Cc: bharata, alex.bennee, david
Plus assorted cleanups; passes tests/fp/fp-test.
Changes in v2:
* Add UInt256 type (david)
* Rewrite and inline shift256RightJamming. This keeps the whole
UInt256 in registers, avoiding long sequences of loads and stores.
* Add x86_64 assembly for double shifts. I don't know why the
compiler can't recognize this pattern, but swapping values in
and out of %cl (the only register in the base ISA that can
hold a variable shift) is really ugly.
* Add ppc64 assembly.
r~
Richard Henderson (10):
softfloat: Use mulu64 for mul64To128
softfloat: Use int128.h for some operations
softfloat: Tidy a * b + inf return
softfloat: Add float_cmask and constants
softfloat: Inline pick_nan_muladd into its caller
softfloat: Implement float128_muladd
softfloat: Use x86_64 assembly for {add,sub}{192,256}
softfloat: Use x86_64 assembly for sh[rl]_double
softfloat: Use aarch64 assembly for {add,sub}{192,256}
softfloat: Use ppc64 assembly for {add,sub}{192,256}
include/fpu/softfloat-macros.h | 109 +++---
include/fpu/softfloat.h | 2 +
fpu/softfloat.c | 620 ++++++++++++++++++++++++++++++---
tests/fp/fp-test.c | 2 +-
tests/fp/wrap.c.inc | 12 +
5 files changed, 652 insertions(+), 93 deletions(-)
--
2.25.1
* [PATCH v2 01/10] softfloat: Use mulu64 for mul64To128
From: Richard Henderson @ 2020-09-25 15:20 UTC
To: qemu-devel; +Cc: bharata, alex.bennee, david
Via host-utils.h, we use a host widening multiply for
64-bit hosts, and a common subroutine for 32-bit hosts.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
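For reference, a minimal sketch of what mulu64() provides on a host with a
128-bit integer type (the real version in qemu/host-utils.h also has a
portable fallback built from 32-bit halves; this is illustrative only):

    /* Full 64x64 -> 128-bit unsigned multiply: low word to *plow, high
       word to *phigh, matching the mulu64(z1Ptr, z0Ptr, a, b) call below. */
    static inline void mulu64_sketch(uint64_t *plow, uint64_t *phigh,
                                     uint64_t a, uint64_t b)
    {
        unsigned __int128 r = (unsigned __int128)a * b;
        *plow = (uint64_t)r;
        *phigh = (uint64_t)(r >> 64);
    }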
include/fpu/softfloat-macros.h | 24 ++++--------------------
1 file changed, 4 insertions(+), 20 deletions(-)
diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index a35ec2893a..57845f8af0 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -83,6 +83,7 @@ this code that are retained.
#define FPU_SOFTFLOAT_MACROS_H
#include "fpu/softfloat-types.h"
+#include "qemu/host-utils.h"
/*----------------------------------------------------------------------------
| Shifts `a' right by the number of bits given in `count'. If any nonzero
@@ -515,27 +516,10 @@ static inline void
| `z0Ptr' and `z1Ptr'.
*----------------------------------------------------------------------------*/
-static inline void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr )
+static inline void
+mul64To128(uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_t *z1Ptr)
{
- uint32_t aHigh, aLow, bHigh, bLow;
- uint64_t z0, zMiddleA, zMiddleB, z1;
-
- aLow = a;
- aHigh = a>>32;
- bLow = b;
- bHigh = b>>32;
- z1 = ( (uint64_t) aLow ) * bLow;
- zMiddleA = ( (uint64_t) aLow ) * bHigh;
- zMiddleB = ( (uint64_t) aHigh ) * bLow;
- z0 = ( (uint64_t) aHigh ) * bHigh;
- zMiddleA += zMiddleB;
- z0 += ( ( (uint64_t) ( zMiddleA < zMiddleB ) )<<32 ) + ( zMiddleA>>32 );
- zMiddleA <<= 32;
- z1 += zMiddleA;
- z0 += ( z1 < zMiddleA );
- *z1Ptr = z1;
- *z0Ptr = z0;
-
+ mulu64(z1Ptr, z0Ptr, a, b);
}
/*----------------------------------------------------------------------------
--
2.25.1
* [PATCH v2 02/10] softfloat: Use int128.h for some operations
From: Richard Henderson @ 2020-09-25 15:20 UTC
To: qemu-devel; +Cc: bharata, alex.bennee, david
Use our Int128, which wraps the compiler's __int128_t,
instead of open-coding left shifts and arithmetic.
We'd need to extend Int128 to have unsigned operations
to replace more than these three.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
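As a rough sketch of the int128.h primitives relied on here, in their
__int128 configuration (the header also carries a two-uint64_t fallback;
treat this as illustrative rather than the exact QEMU source):

    typedef __int128_t Int128;

    /* Note the argument order: low word first, which is why the hunks
       below pass int128_make128(a1, a0). */
    static inline Int128 int128_make128(uint64_t lo, uint64_t hi)
    {
        return (Int128)(((__uint128_t)hi << 64) | lo);
    }

    static inline uint64_t int128_getlo(Int128 a)
    {
        return (uint64_t)a;
    }

    static inline uint64_t int128_gethi(Int128 a)
    {
        return (uint64_t)((__uint128_t)a >> 64);
    }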
include/fpu/softfloat-macros.h | 39 +++++++++++++++++-----------------
1 file changed, 20 insertions(+), 19 deletions(-)
diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index 57845f8af0..95d88d05b8 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -84,6 +84,7 @@ this code that are retained.
#include "fpu/softfloat-types.h"
#include "qemu/host-utils.h"
+#include "qemu/int128.h"
/*----------------------------------------------------------------------------
| Shifts `a' right by the number of bits given in `count'. If any nonzero
@@ -352,13 +353,11 @@ static inline void shortShift128Left(uint64_t a0, uint64_t a1, int count,
static inline void shift128Left(uint64_t a0, uint64_t a1, int count,
uint64_t *z0Ptr, uint64_t *z1Ptr)
{
- if (count < 64) {
- *z1Ptr = a1 << count;
- *z0Ptr = count == 0 ? a0 : (a0 << count) | (a1 >> (-count & 63));
- } else {
- *z1Ptr = 0;
- *z0Ptr = a1 << (count - 64);
- }
+ Int128 a = int128_make128(a1, a0);
+ Int128 z = int128_lshift(a, count);
+
+ *z0Ptr = int128_gethi(z);
+ *z1Ptr = int128_getlo(z);
}
/*----------------------------------------------------------------------------
@@ -405,15 +404,15 @@ static inline void
*----------------------------------------------------------------------------*/
static inline void
- add128(
- uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr )
+add128(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1,
+ uint64_t *z0Ptr, uint64_t *z1Ptr)
{
- uint64_t z1;
-
- z1 = a1 + b1;
- *z1Ptr = z1;
- *z0Ptr = a0 + b0 + ( z1 < a1 );
+ Int128 a = int128_make128(a1, a0);
+ Int128 b = int128_make128(b1, b0);
+ Int128 z = int128_add(a, b);
+ *z0Ptr = int128_gethi(z);
+ *z1Ptr = int128_getlo(z);
}
/*----------------------------------------------------------------------------
@@ -463,13 +462,15 @@ static inline void
*----------------------------------------------------------------------------*/
static inline void
- sub128(
- uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr, uint64_t *z1Ptr )
+sub128(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1,
+ uint64_t *z0Ptr, uint64_t *z1Ptr)
{
+ Int128 a = int128_make128(a1, a0);
+ Int128 b = int128_make128(b1, b0);
+ Int128 z = int128_sub(a, b);
- *z1Ptr = a1 - b1;
- *z0Ptr = a0 - b0 - ( a1 < b1 );
-
+ *z0Ptr = int128_gethi(z);
+ *z1Ptr = int128_getlo(z);
}
/*----------------------------------------------------------------------------
--
2.25.1
* [PATCH v2 03/10] softfloat: Tidy a * b + inf return
From: Richard Henderson @ 2020-09-25 15:20 UTC
To: qemu-devel; +Cc: bharata, alex.bennee, david
No reason to set values in 'a' when we already
have float_class_inf in 'c' and can simply flip that sign.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
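A quick sketch of the case being tidied, with hypothetical inputs: for
muladd(2.0, 3.0, -inf) under float_muladd_negate_result, the product is
finite and 'c' is infinite, so the result is simply 'c' with its sign
xored -- there is no need to rewrite the class and sign fields of 'a':

    /* Sketch of the inf-addend path after this patch. */
    c.sign ^= sign_flip;    /* e.g. -inf becomes +inf under negate_result */
    return c;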
fpu/softfloat.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 67cfa0fd82..9db55d2b11 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1380,9 +1380,8 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
s->float_exception_flags |= float_flag_invalid;
return parts_default_nan(s);
} else {
- a.cls = float_class_inf;
- a.sign = c.sign ^ sign_flip;
- return a;
+ c.sign ^= sign_flip;
+ return c;
}
}
--
2.25.1
* [PATCH v2 04/10] softfloat: Add float_cmask and constants
From: Richard Henderson @ 2020-09-25 15:20 UTC
To: qemu-devel; +Cc: bharata, alex.bennee, david
Testing more than one class at a time is better done with masks.
This reduces the static branch count.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
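To illustrate the idiom (a sketch, not part of the patch): or-ing the
per-operand class bits together lets several class checks collapse into
single mask tests:

    int ab_mask = float_cmask(a.cls) | float_cmask(b.cls);

    /* One test replaces a.cls == float_class_inf || b.cls == ... */
    if (ab_mask & float_cmask_inf) {
        /* at least one of a, b is infinite */
    }

    /* Equality against a combined constant tests both operands at once:
       each operand contributes exactly one bit, so this means one inf
       and one zero. */
    if (ab_mask == float_cmask_infzero) {
        /* inf * 0 */
    }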
fpu/softfloat.c | 31 ++++++++++++++++++++++++-------
1 file changed, 24 insertions(+), 7 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 9db55d2b11..3e625c47cd 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -469,6 +469,20 @@ typedef enum __attribute__ ((__packed__)) {
float_class_snan,
} FloatClass;
+#define float_cmask(bit) (1u << (bit))
+
+enum {
+ float_cmask_zero = float_cmask(float_class_zero),
+ float_cmask_normal = float_cmask(float_class_normal),
+ float_cmask_inf = float_cmask(float_class_inf),
+ float_cmask_qnan = float_cmask(float_class_qnan),
+ float_cmask_snan = float_cmask(float_class_snan),
+
+ float_cmask_infzero = float_cmask_zero | float_cmask_inf,
+ float_cmask_anynan = float_cmask_qnan | float_cmask_snan,
+};
+
+
/* Simple helpers for checking if, or what kind of, NaN we have */
static inline __attribute__((unused)) bool is_nan(FloatClass c)
{
@@ -1335,24 +1349,27 @@ bfloat16 QEMU_FLATTEN bfloat16_mul(bfloat16 a, bfloat16 b, float_status *status)
static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
int flags, float_status *s)
{
- bool inf_zero = ((1 << a.cls) | (1 << b.cls)) ==
- ((1 << float_class_inf) | (1 << float_class_zero));
- bool p_sign;
+ bool inf_zero, p_sign;
bool sign_flip = flags & float_muladd_negate_result;
FloatClass p_class;
uint64_t hi, lo;
int p_exp;
+ int ab_mask, abc_mask;
+
+ ab_mask = float_cmask(a.cls) | float_cmask(b.cls);
+ abc_mask = float_cmask(c.cls) | ab_mask;
+ inf_zero = ab_mask == float_cmask_infzero;
/* It is implementation-defined whether the cases of (0,inf,qnan)
* and (inf,0,qnan) raise InvalidOperation or not (and what QNaN
* they return if they do), so we have to hand this information
* off to the target-specific pick-a-NaN routine.
*/
- if (is_nan(a.cls) || is_nan(b.cls) || is_nan(c.cls)) {
+ if (unlikely(abc_mask & float_cmask_anynan)) {
return pick_nan_muladd(a, b, c, inf_zero, s);
}
- if (inf_zero) {
+ if (unlikely(inf_zero)) {
s->float_exception_flags |= float_flag_invalid;
return parts_default_nan(s);
}
@@ -1367,9 +1384,9 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
p_sign ^= 1;
}
- if (a.cls == float_class_inf || b.cls == float_class_inf) {
+ if (ab_mask & float_cmask_inf) {
p_class = float_class_inf;
- } else if (a.cls == float_class_zero || b.cls == float_class_zero) {
+ } else if (ab_mask & float_cmask_zero) {
p_class = float_class_zero;
} else {
p_class = float_class_normal;
--
2.25.1
* [PATCH v2 05/10] softfloat: Inline pick_nan_muladd into its caller
From: Richard Henderson @ 2020-09-25 15:20 UTC
To: qemu-devel; +Cc: bharata, alex.bennee, david
Because of FloatParts, there will only ever be one caller.
Inlining allows us to re-use abc_mask for the snan test.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
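A before/after sketch of the test the commit message refers to
(illustrative; the real hunks are below):

    /* Before, inside pick_nan_muladd: three separate class checks. */
    if (is_snan(a.cls) || is_snan(b.cls) || is_snan(c.cls)) {
        s->float_exception_flags |= float_flag_invalid;
    }

    /* After inlining: abc_mask is already computed by the caller. */
    if (unlikely(abc_mask & float_cmask_snan)) {
        float_raise(float_flag_invalid, s);
    }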
fpu/softfloat.c | 75 +++++++++++++++++++++++--------------------------
1 file changed, 35 insertions(+), 40 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 3e625c47cd..e038434a07 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -929,45 +929,6 @@ static FloatParts pick_nan(FloatParts a, FloatParts b, float_status *s)
return a;
}
-static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
- bool inf_zero, float_status *s)
-{
- int which;
-
- if (is_snan(a.cls) || is_snan(b.cls) || is_snan(c.cls)) {
- s->float_exception_flags |= float_flag_invalid;
- }
-
- which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);
-
- if (s->default_nan_mode) {
- /* Note that this check is after pickNaNMulAdd so that function
- * has an opportunity to set the Invalid flag.
- */
- which = 3;
- }
-
- switch (which) {
- case 0:
- break;
- case 1:
- a = b;
- break;
- case 2:
- a = c;
- break;
- case 3:
- return parts_default_nan(s);
- default:
- g_assert_not_reached();
- }
-
- if (is_snan(a.cls)) {
- return parts_silence_nan(a, s);
- }
- return a;
-}
-
/*
* Returns the result of adding or subtracting the values of the
* floating-point values `a' and `b'. The operation is performed
@@ -1366,7 +1327,41 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
* off to the target-specific pick-a-NaN routine.
*/
if (unlikely(abc_mask & float_cmask_anynan)) {
- return pick_nan_muladd(a, b, c, inf_zero, s);
+ int which;
+
+ if (unlikely(abc_mask & float_cmask_snan)) {
+ float_raise(float_flag_invalid, s);
+ }
+
+ which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);
+
+ if (s->default_nan_mode) {
+ /*
+ * Note that this check is after pickNaNMulAdd so that function
+ * has an opportunity to set the Invalid flag for inf_zero.
+ */
+ which = 3;
+ }
+
+ switch (which) {
+ case 0:
+ break;
+ case 1:
+ a = b;
+ break;
+ case 2:
+ a = c;
+ break;
+ case 3:
+ return parts_default_nan(s);
+ default:
+ g_assert_not_reached();
+ }
+
+ if (is_snan(a.cls)) {
+ return parts_silence_nan(a, s);
+ }
+ return a;
}
if (unlikely(inf_zero)) {
--
2.25.1
* [PATCH v2 06/10] softfloat: Implement float128_muladd
From: Richard Henderson @ 2020-09-25 15:20 UTC
To: qemu-devel; +Cc: bharata, alex.bennee, david
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/fpu/softfloat.h | 2 +
fpu/softfloat.c | 416 +++++++++++++++++++++++++++++++++++++++-
tests/fp/fp-test.c | 2 +-
tests/fp/wrap.c.inc | 12 ++
4 files changed, 430 insertions(+), 2 deletions(-)
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 78ad5ca738..a38433deb4 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -1196,6 +1196,8 @@ float128 float128_sub(float128, float128, float_status *status);
float128 float128_mul(float128, float128, float_status *status);
float128 float128_div(float128, float128, float_status *status);
float128 float128_rem(float128, float128, float_status *status);
+float128 float128_muladd(float128, float128, float128, int,
+ float_status *status);
float128 float128_sqrt(float128, float_status *status);
FloatRelation float128_compare(float128, float128, float_status *status);
FloatRelation float128_compare_quiet(float128, float128, float_status *status);
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index e038434a07..49de31fec2 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -512,11 +512,19 @@ static inline __attribute__((unused)) bool is_qnan(FloatClass c)
typedef struct {
uint64_t frac;
- int32_t exp;
+ int32_t exp;
FloatClass cls;
bool sign;
} FloatParts;
+/* Similar for float128. */
+typedef struct {
+ uint64_t frac0, frac1;
+ int32_t exp;
+ FloatClass cls;
+ bool sign;
+} FloatParts128;
+
#define DECOMPOSED_BINARY_POINT (64 - 2)
#define DECOMPOSED_IMPLICIT_BIT (1ull << DECOMPOSED_BINARY_POINT)
#define DECOMPOSED_OVERFLOW_BIT (DECOMPOSED_IMPLICIT_BIT << 1)
@@ -4574,6 +4582,46 @@ static void
}
+/*----------------------------------------------------------------------------
+| Returns the parts of floating-point value `a'.
+*----------------------------------------------------------------------------*/
+
+static void float128_unpack(FloatParts128 *p, float128 a, float_status *status)
+{
+ p->sign = extractFloat128Sign(a);
+ p->exp = extractFloat128Exp(a);
+ p->frac0 = extractFloat128Frac0(a);
+ p->frac1 = extractFloat128Frac1(a);
+
+ if (p->exp == 0) {
+ if ((p->frac0 | p->frac1) == 0) {
+ p->cls = float_class_zero;
+ } else if (status->flush_inputs_to_zero) {
+ float_raise(float_flag_input_denormal, status);
+ p->cls = float_class_zero;
+ p->frac0 = p->frac1 = 0;
+ } else {
+ normalizeFloat128Subnormal(p->frac0, p->frac1, &p->exp,
+ &p->frac0, &p->frac1);
+ p->exp -= 0x3fff;
+ p->cls = float_class_normal;
+ }
+ } else if (p->exp == 0x7fff) {
+ if ((p->frac0 | p->frac1) == 0) {
+ p->cls = float_class_inf;
+ } else if (float128_is_signaling_nan(a, status)) {
+ p->cls = float_class_snan;
+ } else {
+ p->cls = float_class_qnan;
+ }
+ } else {
+ /* Add the implicit bit. */
+ p->frac0 |= UINT64_C(0x0001000000000000);
+ p->exp -= 0x3fff;
+ p->cls = float_class_normal;
+ }
+}
+
/*----------------------------------------------------------------------------
| Packs the sign `zSign', the exponent `zExp', and the significand formed
| by the concatenation of `zSig0' and `zSig1' into a quadruple-precision
@@ -7205,6 +7253,372 @@ float128 float128_mul(float128 a, float128 b, float_status *status)
}
+typedef struct UInt256 {
+ /* Indexed big-endian, to match the rest of softfloat numbering. */
+ uint64_t w[4];
+} UInt256;
+
+static inline uint64_t shl_double(uint64_t h, uint64_t l, unsigned lsh)
+{
+ return lsh ? (h << lsh) | (l >> (64 - lsh)) : h;
+}
+
+static inline uint64_t shr_double(uint64_t h, uint64_t l, unsigned rsh)
+{
+ return rsh ? (h << (64 - rsh)) | (l >> rsh) : l;
+}
+
+static void shortShift256Left(UInt256 *p, unsigned lsh)
+{
+ if (lsh != 0) {
+ p->w[0] = shl_double(p->w[0], p->w[1], lsh);
+ p->w[1] = shl_double(p->w[1], p->w[2], lsh);
+ p->w[2] = shl_double(p->w[2], p->w[3], lsh);
+ p->w[3] <<= lsh;
+ }
+}
+
+static inline void shift256RightJamming(UInt256 *p, unsigned count)
+{
+ uint64_t out, p0, p1, p2, p3;
+
+ p0 = p->w[0];
+ p1 = p->w[1];
+ p2 = p->w[2];
+ p3 = p->w[3];
+
+ if (unlikely(count == 0)) {
+ return;
+ } else if (likely(count < 64)) {
+ out = 0;
+ } else if (likely(count < 256)) {
+ if (count < 128) {
+ out = p3;
+ p3 = p2;
+ p2 = p1;
+ p1 = p0;
+ p0 = 0;
+ } else if (count < 192) {
+ out = p2 | p3;
+ p3 = p1;
+ p2 = p0;
+ p1 = 0;
+ p0 = 0;
+ } else {
+ out = p1 | p2 | p3;
+ p3 = p0;
+ p2 = 0;
+ p1 = 0;
+ p0 = 0;
+ }
+ count &= 63;
+ if (count == 0) {
+ goto done;
+ }
+ } else {
+ out = p0 | p1 | p2 | p3;
+ p3 = 0;
+ p2 = 0;
+ p1 = 0;
+ p0 = 0;
+ goto done;
+ }
+
+ out |= shr_double(p3, 0, count);
+ p3 = shr_double(p2, p3, count);
+ p2 = shr_double(p1, p2, count);
+ p1 = shr_double(p0, p1, count);
+ p0 = p0 >> count;
+
+ done:
+ p->w[3] = p3 | (out != 0);
+ p->w[2] = p2;
+ p->w[1] = p1;
+ p->w[0] = p0;
+}
+
+/* R = A - B */
+static void sub256(UInt256 *r, UInt256 *a, UInt256 *b)
+{
+ bool borrow = false;
+
+ for (int i = 3; i >= 0; --i) {
+ uint64_t at = a->w[i];
+ uint64_t bt = b->w[i];
+ uint64_t rt = at - bt;
+
+ if (borrow) {
+ borrow = at <= bt;
+ rt -= 1;
+ } else {
+ borrow = at < bt;
+ }
+ r->w[i] = rt;
+ }
+}
+
+/* A = -A */
+static void neg256(UInt256 *a)
+{
+ /*
+ * Recall that -X - 1 = ~X, and that since this is negation,
+ * once we find a non-zero number, all subsequent words will
+ * have borrow-in, and thus use NOT.
+ */
+ uint64_t t = a->w[3];
+ if (likely(t)) {
+ a->w[3] = -t;
+ goto not2;
+ }
+ t = a->w[2];
+ if (likely(t)) {
+ a->w[2] = -t;
+ goto not1;
+ }
+ t = a->w[1];
+ if (likely(t)) {
+ a->w[1] = -t;
+ goto not0;
+ }
+ a->w[0] = -a->w[0];
+ return;
+ not2:
+ a->w[2] = ~a->w[2];
+ not1:
+ a->w[1] = ~a->w[1];
+ not0:
+ a->w[0] = ~a->w[0];
+}
+
+/* A += B */
+static void add256(UInt256 *a, UInt256 *b)
+{
+ bool carry = false;
+
+ for (int i = 3; i >= 0; --i) {
+ uint64_t bt = b->w[i];
+ uint64_t at = a->w[i] + bt;
+
+ if (carry) {
+ at += 1;
+ carry = at <= bt;
+ } else {
+ carry = at < bt;
+ }
+ a->w[i] = at;
+ }
+}
+
+float128 float128_muladd(float128 a_f, float128 b_f, float128 c_f,
+ int flags, float_status *status)
+{
+ bool inf_zero, p_sign, sign_flip;
+ int p_exp, exp_diff, shift, ab_mask, abc_mask;
+ FloatParts128 a, b, c;
+ FloatClass p_cls;
+ UInt256 p_frac, c_frac;
+
+ float128_unpack(&a, a_f, status);
+ float128_unpack(&b, b_f, status);
+ float128_unpack(&c, c_f, status);
+
+ ab_mask = float_cmask(a.cls) | float_cmask(b.cls);
+ abc_mask = float_cmask(c.cls) | ab_mask;
+ inf_zero = ab_mask == float_cmask_infzero;
+
+ /* If any input is a NaN, select the required result. */
+ if (unlikely(abc_mask & float_cmask_anynan)) {
+ if (unlikely(abc_mask & float_cmask_snan)) {
+ float_raise(float_flag_invalid, status);
+ }
+
+ int which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, status);
+ if (status->default_nan_mode) {
+ which = 3;
+ }
+ switch (which) {
+ case 0:
+ break;
+ case 1:
+ a_f = b_f;
+ a.cls = b.cls;
+ break;
+ case 2:
+ a_f = c_f;
+ a.cls = c.cls;
+ break;
+ case 3:
+ return float128_default_nan(status);
+ }
+ if (is_snan(a.cls)) {
+ return float128_silence_nan(a_f, status);
+ }
+ return a_f;
+ }
+
+ /* After dealing with input NaNs, look for Inf * Zero. */
+ if (unlikely(inf_zero)) {
+ float_raise(float_flag_invalid, status);
+ return float128_default_nan(status);
+ }
+
+ p_sign = a.sign ^ b.sign;
+
+ if (flags & float_muladd_negate_c) {
+ c.sign ^= 1;
+ }
+ if (flags & float_muladd_negate_product) {
+ p_sign ^= 1;
+ }
+ sign_flip = (flags & float_muladd_negate_result);
+
+ if (ab_mask & float_cmask_inf) {
+ p_cls = float_class_inf;
+ } else if (ab_mask & float_cmask_zero) {
+ p_cls = float_class_zero;
+ } else {
+ p_cls = float_class_normal;
+ }
+
+ if (c.cls == float_class_inf) {
+ if (p_cls == float_class_inf && p_sign != c.sign) {
+ /* +Inf + -Inf = NaN */
+ float_raise(float_flag_invalid, status);
+ return float128_default_nan(status);
+ }
+ /* Inf + Inf = Inf of the proper sign; reuse the return below. */
+ p_cls = float_class_inf;
+ p_sign = c.sign;
+ }
+
+ if (p_cls == float_class_inf) {
+ return packFloat128(p_sign ^ sign_flip, 0x7fff, 0, 0);
+ }
+
+ if (p_cls == float_class_zero) {
+ if (c.cls == float_class_zero) {
+ if (p_sign != c.sign) {
+ p_sign = status->float_rounding_mode == float_round_down;
+ }
+ return packFloat128(p_sign ^ sign_flip, 0, 0, 0);
+ }
+
+ if (flags & float_muladd_halve_result) {
+ c.exp -= 1;
+ }
+ return roundAndPackFloat128(c.sign ^ sign_flip,
+ c.exp + 0x3fff - 1,
+ c.frac0, c.frac1, 0, status);
+ }
+
+ /* a & b should be normals now... */
+ assert(a.cls == float_class_normal && b.cls == float_class_normal);
+
+ /* Multiply of 2 113-bit numbers produces a 226-bit result. */
+ mul128To256(a.frac0, a.frac1, b.frac0, b.frac1,
+ &p_frac.w[0], &p_frac.w[1], &p_frac.w[2], &p_frac.w[3]);
+
+ /* Realign the binary point at bit 48 of p_frac[0]. */
+ shift = clz64(p_frac.w[0]) - 15;
+ shortShift256Left(&p_frac, shift);
+ p_exp = a.exp + b.exp - (shift - 16);
+ exp_diff = p_exp - c.exp;
+
+ /* Extend the fraction part of the addend to 256 bits. */
+ c_frac.w[0] = c.frac0;
+ c_frac.w[1] = c.frac1;
+ c_frac.w[2] = 0;
+ c_frac.w[3] = 0;
+
+ /* Add or subtract C from the intermediate product. */
+ if (c.cls == float_class_zero) {
+ /* Fall through to rounding after addition (with zero). */
+ } else if (p_sign != c.sign) {
+ /* Subtraction */
+ if (exp_diff < 0) {
+ shift256RightJamming(&p_frac, -exp_diff);
+ sub256(&p_frac, &c_frac, &p_frac);
+ p_exp = c.exp;
+ p_sign ^= 1;
+ } else if (exp_diff > 0) {
+ shift256RightJamming(&c_frac, exp_diff);
+ sub256(&p_frac, &p_frac, &c_frac);
+ } else {
+ /* Low 128 bits of C are known to be zero. */
+ sub128(p_frac.w[0], p_frac.w[1], c_frac.w[0], c_frac.w[1],
+ &p_frac.w[0], &p_frac.w[1]);
+ /*
+ * Since we have normalized to bit 48 of p_frac[0],
+ * a negative result means C > P and we need to invert.
+ */
+ if ((int64_t)p_frac.w[0] < 0) {
+ neg256(&p_frac);
+ p_sign ^= 1;
+ }
+ }
+
+ /*
+ * Gross normalization of the 256-bit subtraction result.
+ * Fine tuning below shared with addition.
+ */
+ if (p_frac.w[0] != 0) {
+ /* nothing to do */
+ } else if (p_frac.w[1] != 0) {
+ p_exp -= 64;
+ p_frac.w[0] = p_frac.w[1];
+ p_frac.w[1] = p_frac.w[2];
+ p_frac.w[2] = p_frac.w[3];
+ p_frac.w[3] = 0;
+ } else if (p_frac.w[2] != 0) {
+ p_exp -= 128;
+ p_frac.w[0] = p_frac.w[2];
+ p_frac.w[1] = p_frac.w[3];
+ p_frac.w[2] = 0;
+ p_frac.w[3] = 0;
+ } else if (p_frac.w[3] != 0) {
+ p_exp -= 192;
+ p_frac.w[0] = p_frac.w[3];
+ p_frac.w[1] = 0;
+ p_frac.w[2] = 0;
+ p_frac.w[3] = 0;
+ } else {
+ /* Subtraction was exact: result is zero. */
+ p_sign = status->float_rounding_mode == float_round_down;
+ return packFloat128(p_sign ^ sign_flip, 0, 0, 0);
+ }
+ } else {
+ /* Addition */
+ if (exp_diff <= 0) {
+ shift256RightJamming(&p_frac, -exp_diff);
+ /* Low 128 bits of C are known to be zero. */
+ add128(p_frac.w[0], p_frac.w[1], c_frac.w[0], c_frac.w[1],
+ &p_frac.w[0], &p_frac.w[1]);
+ p_exp = c.exp;
+ } else {
+ shift256RightJamming(&c_frac, exp_diff);
+ add256(&p_frac, &c_frac);
+ }
+ }
+
+ /* Fine normalization of the 256-bit result: p_frac[0] != 0. */
+ shift = clz64(p_frac.w[0]) - 15;
+ if (shift < 0) {
+ shift256RightJamming(&p_frac, -shift);
+ } else if (shift > 0) {
+ shortShift256Left(&p_frac, shift);
+ }
+ p_exp -= shift;
+
+ if (flags & float_muladd_halve_result) {
+ p_exp -= 1;
+ }
+ return roundAndPackFloat128(p_sign ^ sign_flip,
+ p_exp + 0x3fff - 1,
+ p_frac.w[0], p_frac.w[1],
+ p_frac.w[2] | (p_frac.w[3] != 0),
+ status);
+}
+
/*----------------------------------------------------------------------------
| Returns the result of dividing the quadruple-precision floating-point value
| `a' by the corresponding value `b'. The operation is performed according to
diff --git a/tests/fp/fp-test.c b/tests/fp/fp-test.c
index 06ffebd6db..9bbb0dba67 100644
--- a/tests/fp/fp-test.c
+++ b/tests/fp/fp-test.c
@@ -717,7 +717,7 @@ static void do_testfloat(int op, int rmode, bool exact)
test_abz_f128(true_abz_f128M, subj_abz_f128M);
break;
case F128_MULADD:
- not_implemented();
+ test_abcz_f128(slow_f128M_mulAdd, qemu_f128_mulAdd);
break;
case F128_SQRT:
test_az_f128(slow_f128M_sqrt, qemu_f128M_sqrt);
diff --git a/tests/fp/wrap.c.inc b/tests/fp/wrap.c.inc
index 0cbd20013e..65a713deae 100644
--- a/tests/fp/wrap.c.inc
+++ b/tests/fp/wrap.c.inc
@@ -574,6 +574,18 @@ WRAP_MULADD(qemu_f32_mulAdd, float32_muladd, float32)
WRAP_MULADD(qemu_f64_mulAdd, float64_muladd, float64)
#undef WRAP_MULADD
+static void qemu_f128_mulAdd(const float128_t *ap, const float128_t *bp,
+ const float128_t *cp, float128_t *res)
+{
+ float128 a, b, c, ret;
+
+ a = soft_to_qemu128(*ap);
+ b = soft_to_qemu128(*bp);
+ c = soft_to_qemu128(*cp);
+ ret = float128_muladd(a, b, c, 0, &qsf);
+ *res = qemu_to_soft128(ret);
+}
+
#define WRAP_CMP16(name, func, retcond) \
static bool name(float16_t a, float16_t b) \
{ \
--
2.25.1
* [PATCH v2 07/10] softfloat: Use x86_64 assembly for {add, sub}{192, 256}
From: Richard Henderson @ 2020-09-25 15:20 UTC
To: qemu-devel; +Cc: bharata, alex.bennee, david
The compiler cannot chain more than two carry-propagating additions
together. Use inline assembly for chains of 3 or 4 additions.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
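For contrast, a sketch of what a pure-C three-word add has to do: each
carry is recomputed with a comparison, and compilers generally will not
fuse this into a single add/adc/adc chain:

    /* Sketch of a portable 192-bit add (big-endian word order, as in
       softfloat); not the exact code in the tree. */
    static void add192_sketch(uint64_t a0, uint64_t a1, uint64_t a2,
                              uint64_t b0, uint64_t b1, uint64_t b2,
                              uint64_t *z0, uint64_t *z1, uint64_t *z2)
    {
        uint64_t t2 = a2 + b2;
        bool c1 = t2 < a2;                    /* carry out of word 2 */
        uint64_t t1 = a1 + b1 + c1;
        bool c0 = c1 ? t1 <= a1 : t1 < a1;    /* carry out of word 1 */
        *z0 = a0 + b0 + c0;
        *z1 = t1;
        *z2 = t2;
    }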
include/fpu/softfloat-macros.h | 18 ++++++++++++++++--
fpu/softfloat.c | 29 +++++++++++++++++++++++++++++
2 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index 95d88d05b8..99fa124e56 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -436,6 +436,13 @@ static inline void
uint64_t *z2Ptr
)
{
+#ifdef __x86_64__
+ asm("add %5, %2\n\t"
+ "adc %4, %1\n\t"
+ "adc %3, %0"
+ : "=&r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
+ : "rm"(b0), "rm"(b1), "rm"(b2), "0"(a0), "1"(a1), "2"(a2));
+#else
uint64_t z0, z1, z2;
int8_t carry0, carry1;
@@ -450,7 +457,7 @@ static inline void
*z2Ptr = z2;
*z1Ptr = z1;
*z0Ptr = z0;
-
+#endif
}
/*----------------------------------------------------------------------------
@@ -494,6 +501,13 @@ static inline void
uint64_t *z2Ptr
)
{
+#ifdef __x86_64__
+ asm("sub %5, %2\n\t"
+ "sbb %4, %1\n\t"
+ "sbb %3, %0"
+ : "=&r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
+ : "rm"(b0), "rm"(b1), "rm"(b2), "0"(a0), "1"(a1), "2"(a2));
+#else
uint64_t z0, z1, z2;
int8_t borrow0, borrow1;
@@ -508,7 +522,7 @@ static inline void
*z2Ptr = z2;
*z1Ptr = z1;
*z0Ptr = z0;
-
+#endif
}
/*----------------------------------------------------------------------------
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 49de31fec2..54d0b210ac 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -7340,6 +7340,15 @@ static inline void shift256RightJamming(UInt256 *p, unsigned count)
/* R = A - B */
static void sub256(UInt256 *r, UInt256 *a, UInt256 *b)
{
+#if defined(__x86_64__)
+ asm("sub %7, %3\n\t"
+ "sbb %6, %2\n\t"
+ "sbb %5, %1\n\t"
+ "sbb %4, %0"
+ : "=&r"(r->w[0]), "=&r"(r->w[1]), "=&r"(r->w[2]), "=&r"(r->w[3])
+ : "rme"(b->w[0]), "rme"(b->w[1]), "rme"(b->w[2]), "rme"(b->w[3]),
+ "0"(a->w[0]), "1"(a->w[1]), "2"(a->w[2]), "3"(a->w[3]));
+#else
bool borrow = false;
for (int i = 3; i >= 0; --i) {
@@ -7355,11 +7364,21 @@ static void sub256(UInt256 *r, UInt256 *a, UInt256 *b)
}
r->w[i] = rt;
}
+#endif
}
/* A = -A */
static void neg256(UInt256 *a)
{
+#if defined(__x86_64__)
+ asm("negq %3\n\t"
+ "sbb %6, %2\n\t"
+ "sbb %5, %1\n\t"
+ "sbb %4, %0"
+ : "=&r"(a->w[0]), "=&r"(a->w[1]), "=&r"(a->w[2]), "+rm"(a->w[3])
+ : "rme"(a->w[0]), "rme"(a->w[1]), "rme"(a->w[2]),
+ "0"(0), "1"(0), "2"(0));
+#else
/*
* Recall that -X - 1 = ~X, and that since this is negation,
* once we find a non-zero number, all subsequent words will
@@ -7388,11 +7407,20 @@ static void neg256(UInt256 *a)
a->w[1] = ~a->w[1];
not0:
a->w[0] = ~a->w[0];
+#endif
}
/* A += B */
static void add256(UInt256 *a, UInt256 *b)
{
+#if defined(__x86_64__)
+ asm("add %7, %3\n\t"
+ "adc %6, %2\n\t"
+ "adc %5, %1\n\t"
+ "adc %4, %0"
+ : "+r"(a->w[0]), "+r"(a->w[1]), "+r"(a->w[2]), "+r"(a->w[3])
+ : "rme"(b->w[0]), "rme"(b->w[1]), "rme"(b->w[2]), "rme"(b->w[3]));
+#else
bool carry = false;
for (int i = 3; i >= 0; --i) {
@@ -7407,6 +7435,7 @@ static void add256(UInt256 *a, UInt256 *b)
}
a->w[i] = at;
}
+#endif
}
float128 float128_muladd(float128 a_f, float128 b_f, float128 c_f,
--
2.25.1
* [PATCH v2 08/10] softfloat: Use x86_64 assembly for sh[rl]_double
From: Richard Henderson @ 2020-09-25 15:20 UTC
To: qemu-devel; +Cc: bharata, alex.bennee, david
GCC isn't recognizing this pattern for x86, and it
probably couldn't recognize that the outer condition
is not required either.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
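As a reference for what shl_double and shr_double compute (a sketch via
__int128, not the code in the tree): they are the two halves of a 128-bit
funnel shift, which x86_64 expresses directly as shld/shrd:

    /* shl_double(h, l, n): high half of ((h:l) << n), for 0 <= n < 64. */
    static uint64_t shl_double_ref(uint64_t h, uint64_t l, unsigned n)
    {
        unsigned __int128 hl = ((unsigned __int128)h << 64) | l;
        return (uint64_t)((hl << n) >> 64);
    }

    /* shr_double(h, l, n): low half of ((h:l) >> n), for 0 <= n < 64. */
    static uint64_t shr_double_ref(uint64_t h, uint64_t l, unsigned n)
    {
        unsigned __int128 hl = ((unsigned __int128)h << 64) | l;
        return (uint64_t)(hl >> n);
    }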
fpu/softfloat.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 54d0b210ac..fdf5bde69e 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -7260,12 +7260,22 @@ typedef struct UInt256 {
static inline uint64_t shl_double(uint64_t h, uint64_t l, unsigned lsh)
{
+#ifdef __x86_64__
+ asm("shld %b2, %1, %0" : "+r"(h) : "r"(l), "ci"(lsh));
+ return h;
+#else
return lsh ? (h << lsh) | (l >> (64 - lsh)) : h;
+#endif
}
static inline uint64_t shr_double(uint64_t h, uint64_t l, unsigned rsh)
{
+#ifdef __x86_64__
+ asm("shrd %b2, %1, %0" : "+r"(l) : "r"(h), "ci"(rsh));
+ return l;
+#else
return rsh ? (h << (64 - rsh)) | (l >> rsh) : l;
+#endif
}
static void shortShift256Left(UInt256 *p, unsigned lsh)
--
2.25.1
* [PATCH v2 09/10] softfloat: Use aarch64 assembly for {add, sub}{192, 256}
From: Richard Henderson @ 2020-09-25 15:20 UTC
To: qemu-devel; +Cc: bharata, alex.bennee, david
The compiler cannot chain more than two carry-propagating additions
together. Use inline assembly for chains of 3 or 4 additions.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/fpu/softfloat-macros.h | 14 ++++++++++++++
fpu/softfloat.c | 27 +++++++++++++++++++++++++++
2 files changed, 41 insertions(+)
diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index 99fa124e56..969a486fd2 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -442,6 +442,13 @@ static inline void
"adc %3, %0"
: "=&r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
: "rm"(b0), "rm"(b1), "rm"(b2), "0"(a0), "1"(a1), "2"(a2));
+#elif defined(__aarch64__)
+ asm("adds %2, %x5, %x8\n\t"
+ "adcs %1, %x4, %x7\n\t"
+ "adc %0, %x3, %x6"
+ : "=&r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
+ : "rZ"(a0), "rZ"(a1), "rZ"(a2), "rZ"(b0), "rZ"(b1), "rZ"(b2)
+ : "cc");
#else
uint64_t z0, z1, z2;
int8_t carry0, carry1;
@@ -507,6 +514,13 @@ static inline void
"sbb %3, %0"
: "=&r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
: "rm"(b0), "rm"(b1), "rm"(b2), "0"(a0), "1"(a1), "2"(a2));
+#elif defined(__aarch64__)
+ asm("subs %2, %x5, %x8\n\t"
+ "sbcs %1, %x4, %x7\n\t"
+ "sbc %0, %x3, %x6"
+ : "=&r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
+ : "rZ"(a0), "rZ"(a1), "rZ"(a2), "rZ"(b0), "rZ"(b1), "rZ"(b2)
+ : "cc");
#else
uint64_t z0, z1, z2;
int8_t borrow0, borrow1;
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index fdf5bde69e..07dc17caad 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -7358,6 +7358,18 @@ static void sub256(UInt256 *r, UInt256 *a, UInt256 *b)
: "=&r"(r->w[0]), "=&r"(r->w[1]), "=&r"(r->w[2]), "=&r"(r->w[3])
: "rme"(b->w[0]), "rme"(b->w[1]), "rme"(b->w[2]), "rme"(b->w[3]),
"0"(a->w[0]), "1"(a->w[1]), "2"(a->w[2]), "3"(a->w[3]));
+#elif defined(__aarch64__)
+ asm("subs %[r3], %x[a3], %x[b3]\n\t"
+ "sbcs %[r2], %x[a2], %x[b2]\n\t"
+ "sbcs %[r1], %x[a1], %x[b1]\n\t"
+ "sbc %[r0], %x[a0], %x[b0]"
+ : [r0] "=&r"(r->w[0]), [r1] "=&r"(r->w[1]),
+ [r2] "=&r"(r->w[2]), [r3] "=&r"(r->w[3])
+ : [a0] "rZ"(a->w[0]), [a1] "rZ"(a->w[1]),
+ [a2] "rZ"(a->w[2]), [a3] "rZ"(a->w[3]),
+ [b0] "rZ"(b->w[0]), [b1] "rZ"(b->w[1]),
+ [b2] "rZ"(b->w[2]), [b3] "rZ"(b->w[3])
+ : "cc");
#else
bool borrow = false;
@@ -7388,6 +7400,13 @@ static void neg256(UInt256 *a)
: "=&r"(a->w[0]), "=&r"(a->w[1]), "=&r"(a->w[2]), "+rm"(a->w[3])
: "rme"(a->w[0]), "rme"(a->w[1]), "rme"(a->w[2]),
"0"(0), "1"(0), "2"(0));
+#elif defined(__aarch64__)
+ asm("negs %3, %3\n\t"
+ "ngcs %2, %2\n\t"
+ "ngcs %1, %1\n\t"
+ "ngc %0, %0"
+ : "+r"(a->w[0]), "+r"(a->w[1]), "+r"(a->w[2]), "+r"(a->w[3])
+ : : "cc");
#else
/*
* Recall that -X - 1 = ~X, and that since this is negation,
@@ -7430,6 +7449,14 @@ static void add256(UInt256 *a, UInt256 *b)
"adc %4, %0"
: "+r"(a->w[0]), "+r"(a->w[1]), "+r"(a->w[2]), "+r"(a->w[3])
: "rme"(b->w[0]), "rme"(b->w[1]), "rme"(b->w[2]), "rme"(b->w[3]));
+#elif defined(__aarch64__)
+ asm("adds %3, %3, %x7\n\t"
+ "adcs %2, %2, %x6\n\t"
+ "adcs %1, %1, %x5\n\t"
+ "adc %0, %0, %x4"
+ : "+r"(a->w[0]), "+r"(a->w[1]), "+r"(a->w[2]), "+r"(a->w[3])
+ : "rZ"(b->w[0]), "rZ"(b->w[1]), "rZ"(b->w[2]), "rZ"(b->w[3])
+ : "cc");
#else
bool carry = false;
--
2.25.1
* [PATCH v2 10/10] softfloat: Use ppc64 assembly for {add, sub}{192, 256}
From: Richard Henderson @ 2020-09-25 15:20 UTC
To: qemu-devel; +Cc: bharata, alex.bennee, david
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/fpu/softfloat-macros.h | 14 ++++++++++++++
fpu/softfloat.c | 27 +++++++++++++++++++++++++++
2 files changed, 41 insertions(+)
diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index 969a486fd2..d26cfaf267 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -449,6 +449,13 @@ static inline void
: "=&r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
: "rZ"(a0), "rZ"(a1), "rZ"(a2), "rZ"(b0), "rZ"(b1), "rZ"(b2)
: "cc");
+#elif defined(__powerpc64__)
+ asm("addc %2, %5, %8\n\t"
+ "adde %1, %4, %7\n\t"
+ "adde %0, %3, %6"
+ : "=r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
+ : "r"(a0), "r"(a1), "r"(a2), "r"(b0), "r"(b1), "r"(b2)
+ : "ca");
#else
uint64_t z0, z1, z2;
int8_t carry0, carry1;
@@ -521,6 +528,13 @@ static inline void
: "=&r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
: "rZ"(a0), "rZ"(a1), "rZ"(a2), "rZ"(b0), "rZ"(b1), "rZ"(b2)
: "cc");
+#elif defined(__powerpc64__)
+ asm("subfc %2, %8, %5\n\t"
+ "subfe %1, %7, %4\n\t"
+ "subfe %0, %6, %3"
+ : "=&r"(*z0Ptr), "=&r"(*z1Ptr), "=&r"(*z2Ptr)
+ : "r"(a0), "r"(a1), "r"(a2), "r"(b0), "r"(b1), "r"(b2)
+ : "ca");
#else
uint64_t z0, z1, z2;
int8_t borrow0, borrow1;
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 07dc17caad..9af75b9146 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -7370,6 +7370,18 @@ static void sub256(UInt256 *r, UInt256 *a, UInt256 *b)
[b0] "rZ"(b->w[0]), [b1] "rZ"(b->w[1]),
[b2] "rZ"(b->w[2]), [b3] "rZ"(b->w[3])
: "cc");
+#elif defined(__powerpc64__)
+ asm("subfc %[r3], %[b3], %[a3]\n\t"
+ "subfe %[r2], %[b2], %[a2]\n\t"
+ "subfe %[r1], %[b1], %[a1]\n\t"
+ "subfe %[r0], %[b0], %[a0]"
+ : [r0] "=&r"(r->w[0]), [r1] "=&r"(r->w[1]),
+ [r2] "=&r"(r->w[2]), [r3] "=&r"(r->w[3])
+ : [a0] "r"(a->w[0]), [a1] "r"(a->w[1]),
+ [a2] "r"(a->w[2]), [a3] "r"(a->w[3]),
+ [b0] "r"(b->w[0]), [b1] "r"(b->w[1]),
+ [b2] "r"(b->w[2]), [b3] "r"(b->w[3])
+ : "ca");
#else
bool borrow = false;
@@ -7407,6 +7419,13 @@ static void neg256(UInt256 *a)
"ngc %0, %0"
: "+r"(a->w[0]), "+r"(a->w[1]), "+r"(a->w[2]), "+r"(a->w[3])
: : "cc");
+#elif defined(__powerpc64__)
+ asm("subfic %3, %3, 0\n\t"
+ "subfze %2, %2\n\t"
+ "subfze %1, %1\n\t"
+ "subfze %0, %0"
+ : "+r"(a->w[0]), "+r"(a->w[1]), "+r"(a->w[2]), "+r"(a->w[3])
+ : : "ca");
#else
/*
* Recall that -X - 1 = ~X, and that since this is negation,
@@ -7457,6 +7476,14 @@ static void add256(UInt256 *a, UInt256 *b)
: "+r"(a->w[0]), "+r"(a->w[1]), "+r"(a->w[2]), "+r"(a->w[3])
: "rZ"(b->w[0]), "rZ"(b->w[1]), "rZ"(b->w[2]), "rZ"(b->w[3])
: "cc");
+#elif defined(__powerpc64__)
+ asm("addc %3, %3, %7\n\t"
+ "adde %2, %2, %6\n\t"
+ "adde %1, %1, %5\n\t"
+ "adde %0, %0, %4"
+ : "+r"(a->w[0]), "+r"(a->w[1]), "+r"(a->w[2]), "+r"(a->w[3])
+ : "r"(b->w[0]), "r"(b->w[1]), "r"(b->w[2]), "r"(b->w[3])
+ : "ca");
#else
bool carry = false;
--
2.25.1
* Re: [PATCH v2 00/10] softfloat: Implement float128_muladd
From: Richard Henderson @ 2020-10-15 17:23 UTC
To: qemu-devel; +Cc: bharata, alex.bennee, david
Ping.
On 9/25/20 8:20 AM, Richard Henderson wrote:
> Plus assorted cleanups; passes tests/fp/fp-test.
>
> Changes in v2:
> * Add UInt256 type (david)
> * Rewrite and inline shift256RightJamming. This keeps the whole
> UInt256 in registers, avoiding long sequences of loads and stores.
> * Add x86_64 assembly for double shifts. I don't know why the
> compiler can't recognize this pattern, but swapping values in
> and out of %cl (the only register in the base ISA that can
> hold a variable shift) is really ugly.
> * Add ppc64 assembly.
>
>
> r~
>
>
> Richard Henderson (10):
> softfloat: Use mulu64 for mul64To128
> softfloat: Use int128.h for some operations
> softfloat: Tidy a * b + inf return
> softfloat: Add float_cmask and constants
> softfloat: Inline pick_nan_muladd into its caller
> softfloat: Implement float128_muladd
> softfloat: Use x86_64 assembly for {add,sub}{192,256}
> softfloat: Use x86_64 assembly for sh[rl]_double
> softfloat: Use aarch64 assembly for {add,sub}{192,256}
> softfloat: Use ppc64 assembly for {add,sub}{192,256}
>
> include/fpu/softfloat-macros.h | 109 +++---
> include/fpu/softfloat.h | 2 +
> fpu/softfloat.c | 620 ++++++++++++++++++++++++++++++---
> tests/fp/fp-test.c | 2 +-
> tests/fp/wrap.c.inc | 12 +
> 5 files changed, 652 insertions(+), 93 deletions(-)
>
* Re: [PATCH v2 01/10] softfloat: Use mulu64 for mul64To128
From: Alex Bennée @ 2020-10-15 19:08 UTC
To: Richard Henderson; +Cc: bharata, qemu-devel, david
Richard Henderson <richard.henderson@linaro.org> writes:
> Via host-utils.h, we use a host widening multiply for
> 64-bit hosts, and a common subroutine for 32-bit hosts.
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
* Re: [PATCH v2 02/10] softfloat: Use int128.h for some operations
From: Alex Bennée @ 2020-10-15 19:10 UTC
To: Richard Henderson; +Cc: bharata, qemu-devel, david
Richard Henderson <richard.henderson@linaro.org> writes:
> Use our Int128, which wraps the compiler's __int128_t,
> instead of open-coding left shifts and arithmetic.
> We'd need to extend Int128 to have unsigned operations
> to replace more than these three.
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
* Re: [PATCH v2 03/10] softfloat: Tidy a * b + inf return
From: Alex Bennée @ 2020-10-16 9:40 UTC
To: Richard Henderson; +Cc: bharata, qemu-devel, david
Richard Henderson <richard.henderson@linaro.org> writes:
> No reason to set values in 'a', when we already
> have float_class_inf in 'c', and can flip that sign.
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
* Re: [PATCH v2 04/10] softfloat: Add float_cmask and constants
From: Alex Bennée @ 2020-10-16 9:44 UTC
To: Richard Henderson; +Cc: bharata, qemu-devel, david
Richard Henderson <richard.henderson@linaro.org> writes:
> Testing more than one class at a time is better done with masks.
> This reduces the static branch count.
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
* Re: [PATCH v2 05/10] softfloat: Inline pick_nan_muladd into its caller
From: Alex Bennée @ 2020-10-16 16:20 UTC
To: Richard Henderson; +Cc: bharata, qemu-devel, david
Richard Henderson <richard.henderson@linaro.org> writes:
> Because of FloatParts, there will only ever be one caller.
Isn't that admitting defeat? After all, the logic here will be the same
as the logic in the upcoming float128_muladd code, and we only seem to
need additional information:
> Inlining allows us to re-use abc_mask for the snan test.
couldn't we just pass the masks in?
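Something like this, perhaps (a purely hypothetical signature, just to
make the suggestion concrete):

    /* Hypothetical: keep pick_nan_muladd, but hand it the mask. */
    static FloatParts pick_nan_muladd(FloatParts a, FloatParts b,
                                      FloatParts c, bool inf_zero,
                                      int abc_mask, float_status *s)
    {
        if (unlikely(abc_mask & float_cmask_snan)) {
            float_raise(float_flag_invalid, s);
        }
        /* ... rest as before ... */
    }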
<snip>
> - if (is_snan(a.cls)) {
> - return parts_silence_nan(a, s);
> - }
> - return a;
here.
> -}
> -
> /*
> * Returns the result of adding or subtracting the values of the
> * floating-point values `a' and `b'. The operation is performed
> @@ -1366,7 +1327,41 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
> * off to the target-specific pick-a-NaN routine.
> */
> if (unlikely(abc_mask & float_cmask_anynan)) {
> - return pick_nan_muladd(a, b, c, inf_zero, s);
> + int which;
> +
> + if (unlikely(abc_mask & float_cmask_snan)) {
> + float_raise(float_flag_invalid, s);
> + }
> +
> + which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);
> +
> + if (s->default_nan_mode) {
> + /*
> + * Note that this check is after pickNaNMulAdd so that function
> + * has an opportunity to set the Invalid flag for inf_zero.
> + */
> + which = 3;
> + }
> +
> + switch (which) {
> + case 0:
> + break;
> + case 1:
> + a = b;
> + break;
> + case 2:
> + a = c;
> + break;
> + case 3:
> + return parts_default_nan(s);
> + default:
> + g_assert_not_reached();
> + }
> +
> + if (is_snan(a.cls)) {
> + return parts_silence_nan(a, s);
> + }
> + return a;
> }
>
> if (unlikely(inf_zero)) {
I'm not totally against it, given it's fairly simple logic, but it seems a
shame to lose the commonality of processing which makes the parts code
so much nicer.
--
Alex Bennée
* Re: [PATCH v2 06/10] softfloat: Implement float128_muladd
From: Alex Bennée @ 2020-10-16 16:31 UTC
To: Richard Henderson; +Cc: bharata, qemu-devel, david
Richard Henderson <richard.henderson@linaro.org> writes:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/fpu/softfloat.h | 2 +
> fpu/softfloat.c | 416 +++++++++++++++++++++++++++++++++++++++-
> tests/fp/fp-test.c | 2 +-
> tests/fp/wrap.c.inc | 12 ++
> 4 files changed, 430 insertions(+), 2 deletions(-)
>
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index 78ad5ca738..a38433deb4 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -1196,6 +1196,8 @@ float128 float128_sub(float128, float128, float_status *status);
> float128 float128_mul(float128, float128, float_status *status);
> float128 float128_div(float128, float128, float_status *status);
> float128 float128_rem(float128, float128, float_status *status);
> +float128 float128_muladd(float128, float128, float128, int,
> + float_status *status);
> float128 float128_sqrt(float128, float_status *status);
> FloatRelation float128_compare(float128, float128, float_status *status);
> FloatRelation float128_compare_quiet(float128, float128, float_status *status);
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index e038434a07..49de31fec2 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -512,11 +512,19 @@ static inline __attribute__((unused)) bool is_qnan(FloatClass c)
>
> typedef struct {
> uint64_t frac;
> - int32_t exp;
> + int32_t exp;
> FloatClass cls;
> bool sign;
> } FloatParts;
>
> +/* Similar for float128. */
> +typedef struct {
> + uint64_t frac0, frac1;
> + int32_t exp;
> + FloatClass cls;
> + bool sign;
> +} FloatParts128;
> +
> #define DECOMPOSED_BINARY_POINT (64 - 2)
> #define DECOMPOSED_IMPLICIT_BIT (1ull << DECOMPOSED_BINARY_POINT)
> #define DECOMPOSED_OVERFLOW_BIT (DECOMPOSED_IMPLICIT_BIT << 1)
> @@ -4574,6 +4582,46 @@ static void
>
> }
>
> +/*----------------------------------------------------------------------------
> +| Returns the parts of floating-point value `a'.
> +*----------------------------------------------------------------------------*/
> +
> +static void float128_unpack(FloatParts128 *p, float128 a, float_status *status)
> +{
> + p->sign = extractFloat128Sign(a);
> + p->exp = extractFloat128Exp(a);
> + p->frac0 = extractFloat128Frac0(a);
> + p->frac1 = extractFloat128Frac1(a);
Here we are deviating from the existing style; it would be nice if we
could separate the raw unpack and have something like:
static const FloatFmt float128_params = {
    FLOAT_PARAMS(15, 112)
};

static inline FloatParts128 unpack128_raw(FloatFmt fmt, uint128_t raw)
{
    const int sign_pos = fmt.frac_size + fmt.exp_size;

    return (FloatParts128) {
        .cls = float_class_unclassified,
        .sign = extract128(raw, sign_pos, 1),
        .exp = extract128(raw, fmt.frac_size, fmt.exp_size),
        .frac1 = extract128_lo(raw, 0, fmt.frac_size),
        .frac0 = extract128_hi(raw, 0, fmt.frac_size),
    };
}
So even if we end up duplicating a chunk of the code, the form will be
similar, so when we compare the logic side by side we can see it works
the same way.
> +
> + if (p->exp == 0) {
> + if ((p->frac0 | p->frac1) == 0) {
> + p->cls = float_class_zero;
> + } else if (status->flush_inputs_to_zero) {
> + float_raise(float_flag_input_denormal, status);
> + p->cls = float_class_zero;
> + p->frac0 = p->frac1 = 0;
> + } else {
> + normalizeFloat128Subnormal(p->frac0, p->frac1, &p->exp,
> + &p->frac0, &p->frac1);
> + p->exp -= 0x3fff;
> + p->cls = float_class_normal;
> + }
and it also lets us avoid introducing magic numbers into the code.
> + } else if (p->exp == 0x7fff) {
> + if ((p->frac0 | p->frac1) == 0) {
> + p->cls = float_class_inf;
> + } else if (float128_is_signaling_nan(a, status)) {
> + p->cls = float_class_snan;
> + } else {
> + p->cls = float_class_qnan;
> + }
> + } else {
> + /* Add the implicit bit. */
> + p->frac0 |= UINT64_C(0x0001000000000000);
> + p->exp -= 0x3fff;
> + p->cls = float_class_normal;
> + }
> +}
> +
and eventually hold out for compilers smart enough to handle unification
at a later date.
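For instance (hypothetical, following the FloatFmt pattern), the 0x3fff
and 0x7fff literals are just the float128 exponent bias and maximum
exponent, which could be derived from the format parameters:

    const int exp_bias = ((1 << 15) - 1) >> 1;   /* 0x3fff */
    const int exp_max = (1 << 15) - 1;           /* 0x7fff */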
> /*----------------------------------------------------------------------------
> | Packs the sign `zSign', the exponent `zExp', and the significand formed
> | by the concatenation of `zSig0' and `zSig1' into a quadruple-precision
> @@ -7205,6 +7253,372 @@ float128 float128_mul(float128 a, float128 b, float_status *status)
>
> }
>
> +typedef struct UInt256 {
> + /* Indexed big-endian, to match the rest of softfloat numbering. */
> + uint64_t w[4];
> +} UInt256;
> +
> +static inline uint64_t shl_double(uint64_t h, uint64_t l, unsigned lsh)
> +{
> + return lsh ? (h << lsh) | (l >> (64 - lsh)) : h;
> +}
> +
> +static inline uint64_t shr_double(uint64_t h, uint64_t l, unsigned rsh)
> +{
> + return rsh ? (h << (64 - rsh)) | (l >> rsh) : l;
> +}
> +
> +static void shortShift256Left(UInt256 *p, unsigned lsh)
> +{
> + if (lsh != 0) {
> + p->w[0] = shl_double(p->w[0], p->w[1], lsh);
> + p->w[1] = shl_double(p->w[1], p->w[2], lsh);
> + p->w[2] = shl_double(p->w[2], p->w[3], lsh);
> + p->w[3] <<= lsh;
> + }
> +}
> +
> +static inline void shift256RightJamming(UInt256 *p, unsigned count)
> +{
> + uint64_t out, p0, p1, p2, p3;
> +
> + p0 = p->w[0];
> + p1 = p->w[1];
> + p2 = p->w[2];
> + p3 = p->w[3];
> +
> + if (unlikely(count == 0)) {
> + return;
> + } else if (likely(count < 64)) {
> + out = 0;
> + } else if (likely(count < 256)) {
> + if (count < 128) {
> + out = p3;
> + p3 = p2;
> + p2 = p1;
> + p1 = p0;
> + p0 = 0;
> + } else if (count < 192) {
> + out = p2 | p3;
> + p3 = p1;
> + p2 = p0;
> + p1 = 0;
> + p0 = 0;
> + } else {
> + out = p1 | p2 | p3;
> + p3 = p0;
> + p2 = 0;
> + p1 = 0;
> + p0 = 0;
> + }
> + count &= 63;
> + if (count == 0) {
> + goto done;
> + }
> + } else {
> + out = p0 | p1 | p2 | p3;
> + p3 = 0;
> + p2 = 0;
> + p1 = 0;
> + p0 = 0;
> + goto done;
> + }
> +
> + out |= shr_double(p3, 0, count);
> + p3 = shr_double(p2, p3, count);
> + p2 = shr_double(p1, p2, count);
> + p1 = shr_double(p0, p1, count);
> + p0 = p0 >> count;
> +
> + done:
> + p->w[3] = p3 | (out != 0);
> + p->w[2] = p2;
> + p->w[1] = p1;
> + p->w[0] = p0;
> +}
> +
> +/* R = A - B */
> +static void sub256(UInt256 *r, UInt256 *a, UInt256 *b)
> +{
> + bool borrow = false;
> +
> + for (int i = 3; i >= 0; --i) {
> + uint64_t at = a->w[i];
> + uint64_t bt = b->w[i];
> + uint64_t rt = at - bt;
> +
> + if (borrow) {
> + borrow = at <= bt;
> + rt -= 1;
> + } else {
> + borrow = at < bt;
> + }
> + r->w[i] = rt;
> + }
> +}
> +
> +/* A = -A */
> +static void neg256(UInt256 *a)
> +{
> + /*
> + * Recall that -X - 1 = ~X, and that since this is negation,
> + * once we find a non-zero number, all subsequent words will
> + * have borrow-in, and thus use NOT.
> + */
> + uint64_t t = a->w[3];
> + if (likely(t)) {
> + a->w[3] = -t;
> + goto not2;
> + }
> + t = a->w[2];
> + if (likely(t)) {
> + a->w[2] = -t;
> + goto not1;
> + }
> + t = a->w[1];
> + if (likely(t)) {
> + a->w[1] = -t;
> + goto not0;
> + }
> + a->w[0] = -a->w[0];
> + return;
> + not2:
> + a->w[2] = ~a->w[2];
> + not1:
> + a->w[1] = ~a->w[1];
> + not0:
> + a->w[0] = ~a->w[0];
> +}
> +
> +/* A += B */
> +static void add256(UInt256 *a, UInt256 *b)
> +{
> + bool carry = false;
> +
> + for (int i = 3; i >= 0; --i) {
> + uint64_t bt = b->w[i];
> + uint64_t at = a->w[i] + bt;
> +
> + if (carry) {
> + at += 1;
> + carry = at <= bt;
> + } else {
> + carry = at < bt;
> + }
> + a->w[i] = at;
> + }
> +}
> +
> +float128 float128_muladd(float128 a_f, float128 b_f, float128 c_f,
> + int flags, float_status *status)
> +{
> + bool inf_zero, p_sign, sign_flip;
> + int p_exp, exp_diff, shift, ab_mask, abc_mask;
> + FloatParts128 a, b, c;
> + FloatClass p_cls;
> + UInt256 p_frac, c_frac;
> +
> + float128_unpack(&a, a_f, status);
> + float128_unpack(&b, b_f, status);
> + float128_unpack(&c, c_f, status);
> +
> + ab_mask = float_cmask(a.cls) | float_cmask(b.cls);
> + abc_mask = float_cmask(c.cls) | ab_mask;
> + inf_zero = ab_mask == float_cmask_infzero;
> +
> + /* If any input is a NaN, select the required result. */
> + if (unlikely(abc_mask & float_cmask_anynan)) {
> + if (unlikely(abc_mask & float_cmask_snan)) {
> + float_raise(float_flag_invalid, status);
> + }
> +
> + int which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, status);
> + if (status->default_nan_mode) {
> + which = 3;
> + }
> + switch (which) {
> + case 0:
> + break;
> + case 1:
> + a_f = b_f;
> + a.cls = b.cls;
> + break;
> + case 2:
> + a_f = c_f;
> + a.cls = c.cls;
> + break;
> + case 3:
> + return float128_default_nan(status);
> + }
> + if (is_snan(a.cls)) {
> + return float128_silence_nan(a_f, status);
> + }
> + return a_f;
> + }
> +
> + /* After dealing with input NaNs, look for Inf * Zero. */
> + if (unlikely(inf_zero)) {
> + float_raise(float_flag_invalid, status);
> + return float128_default_nan(status);
> + }
> +
> + p_sign = a.sign ^ b.sign;
> +
> + if (flags & float_muladd_negate_c) {
> + c.sign ^= 1;
> + }
> + if (flags & float_muladd_negate_product) {
> + p_sign ^= 1;
> + }
> + sign_flip = (flags & float_muladd_negate_result);
> +
> + if (ab_mask & float_cmask_inf) {
> + p_cls = float_class_inf;
> + } else if (ab_mask & float_cmask_zero) {
> + p_cls = float_class_zero;
> + } else {
> + p_cls = float_class_normal;
> + }
> +
> + if (c.cls == float_class_inf) {
> + if (p_cls == float_class_inf && p_sign != c.sign) {
> + /* +Inf + -Inf = NaN */
> + float_raise(float_flag_invalid, status);
> + return float128_default_nan(status);
> + }
> + /* Inf + Inf = Inf of the proper sign; reuse the return below. */
> + p_cls = float_class_inf;
> + p_sign = c.sign;
> + }
> +
> + if (p_cls == float_class_inf) {
> + return packFloat128(p_sign ^ sign_flip, 0x7fff, 0, 0);
> + }
> +
> + if (p_cls == float_class_zero) {
> + if (c.cls == float_class_zero) {
> + if (p_sign != c.sign) {
> + p_sign = status->float_rounding_mode == float_round_down;
> + }
> + return packFloat128(p_sign ^ sign_flip, 0, 0, 0);
> + }
> +
> + if (flags & float_muladd_halve_result) {
> + c.exp -= 1;
> + }
> + return roundAndPackFloat128(c.sign ^ sign_flip,
> + c.exp + 0x3fff - 1,
> + c.frac0, c.frac1, 0, status);
> + }
> +
> + /* a & b should be normals now... */
> + assert(a.cls == float_class_normal && b.cls == float_class_normal);
> +
> + /* Multiply of 2 113-bit numbers produces a 226-bit result. */
> + mul128To256(a.frac0, a.frac1, b.frac0, b.frac1,
> + &p_frac.w[0], &p_frac.w[1], &p_frac.w[2], &p_frac.w[3]);
> +
> + /* Realign the binary point at bit 48 of p_frac[0]. */
> + shift = clz64(p_frac.w[0]) - 15;
> + shortShift256Left(&p_frac, shift);
> + p_exp = a.exp + b.exp - (shift - 16);
> + exp_diff = p_exp - c.exp;
> +
> + /* Extend the fraction part of the addend to 256 bits. */
> + c_frac.w[0] = c.frac0;
> + c_frac.w[1] = c.frac1;
> + c_frac.w[2] = 0;
> + c_frac.w[3] = 0;
> +
> + /* Add or subtract C from the intermediate product. */
> + if (c.cls == float_class_zero) {
> + /* Fall through to rounding after addition (with zero). */
> + } else if (p_sign != c.sign) {
> + /* Subtraction */
> + if (exp_diff < 0) {
> + shift256RightJamming(&p_frac, -exp_diff);
> + sub256(&p_frac, &c_frac, &p_frac);
> + p_exp = c.exp;
> + p_sign ^= 1;
> + } else if (exp_diff > 0) {
> + shift256RightJamming(&c_frac, exp_diff);
> + sub256(&p_frac, &p_frac, &c_frac);
> + } else {
> + /* Low 128 bits of C are known to be zero. */
> + sub128(p_frac.w[0], p_frac.w[1], c_frac.w[0], c_frac.w[1],
> + &p_frac.w[0], &p_frac.w[1]);
> + /*
> + * Since we have normalized to bit 48 of p_frac[0],
> + * a negative result means C > P and we need to invert.
> + */
> + if ((int64_t)p_frac.w[0] < 0) {
> + neg256(&p_frac);
> + p_sign ^= 1;
> + }
> + }
> +
> + /*
> + * Gross normalization of the 256-bit subtraction result.
> + * Fine tuning below shared with addition.
> + */
> + if (p_frac.w[0] != 0) {
> + /* nothing to do */
> + } else if (p_frac.w[1] != 0) {
> + p_exp -= 64;
> + p_frac.w[0] = p_frac.w[1];
> + p_frac.w[1] = p_frac.w[2];
> + p_frac.w[2] = p_frac.w[3];
> + p_frac.w[3] = 0;
> + } else if (p_frac.w[2] != 0) {
> + p_exp -= 128;
> + p_frac.w[0] = p_frac.w[2];
> + p_frac.w[1] = p_frac.w[3];
> + p_frac.w[2] = 0;
> + p_frac.w[3] = 0;
> + } else if (p_frac.w[3] != 0) {
> + p_exp -= 192;
> + p_frac.w[0] = p_frac.w[3];
> + p_frac.w[1] = 0;
> + p_frac.w[2] = 0;
> + p_frac.w[3] = 0;
> + } else {
> + /* Subtraction was exact: result is zero. */
> + p_sign = status->float_rounding_mode == float_round_down;
> + return packFloat128(p_sign ^ sign_flip, 0, 0, 0);
> + }
> + } else {
> + /* Addition */
> + if (exp_diff <= 0) {
> + shift256RightJamming(&p_frac, -exp_diff);
> + /* Low 128 bits of C are known to be zero. */
> + add128(p_frac.w[0], p_frac.w[1], c_frac.w[0], c_frac.w[1],
> + &p_frac.w[0], &p_frac.w[1]);
> + p_exp = c.exp;
> + } else {
> + shift256RightJamming(&c_frac, exp_diff);
> + add256(&p_frac, &c_frac);
> + }
> + }
> +
> + /* Fine normalization of the 256-bit result: p_frac[0] != 0. */
> + shift = clz64(p_frac.w[0]) - 15;
> + if (shift < 0) {
> + shift256RightJamming(&p_frac, -shift);
> + } else if (shift > 0) {
> + shortShift256Left(&p_frac, shift);
> + }
> + p_exp -= shift;
> +
> + if (flags & float_muladd_halve_result) {
> + p_exp -= 1;
> + }
> + return roundAndPackFloat128(p_sign ^ sign_flip,
> + p_exp + 0x3fff - 1,
> + p_frac.w[0], p_frac.w[1],
> + p_frac.w[2] | (p_frac.w[3] != 0),
> + status);
> +}
> +
> /*----------------------------------------------------------------------------
> | Returns the result of dividing the quadruple-precision floating-point value
> | `a' by the corresponding value `b'. The operation is performed according to
> diff --git a/tests/fp/fp-test.c b/tests/fp/fp-test.c
> index 06ffebd6db..9bbb0dba67 100644
> --- a/tests/fp/fp-test.c
> +++ b/tests/fp/fp-test.c
> @@ -717,7 +717,7 @@ static void do_testfloat(int op, int rmode, bool exact)
> test_abz_f128(true_abz_f128M, subj_abz_f128M);
> break;
> case F128_MULADD:
> - not_implemented();
> + test_abcz_f128(slow_f128M_mulAdd, qemu_f128_mulAdd);
> break;
> case F128_SQRT:
> test_az_f128(slow_f128M_sqrt, qemu_f128M_sqrt);
> diff --git a/tests/fp/wrap.c.inc b/tests/fp/wrap.c.inc
> index 0cbd20013e..65a713deae 100644
> --- a/tests/fp/wrap.c.inc
> +++ b/tests/fp/wrap.c.inc
> @@ -574,6 +574,18 @@ WRAP_MULADD(qemu_f32_mulAdd, float32_muladd, float32)
> WRAP_MULADD(qemu_f64_mulAdd, float64_muladd, float64)
> #undef WRAP_MULADD
>
> +static void qemu_f128_mulAdd(const float128_t *ap, const float128_t *bp,
> + const float128_t *cp, float128_t *res)
> +{
> + float128 a, b, c, ret;
> +
> + a = soft_to_qemu128(*ap);
> + b = soft_to_qemu128(*bp);
> + c = soft_to_qemu128(*cp);
> + ret = float128_muladd(a, b, c, 0, &qsf);
> + *res = qemu_to_soft128(ret);
> +}
> +
> #define WRAP_CMP16(name, func, retcond) \
> static bool name(float16_t a, float16_t b) \
> { \
--
Alex Bennée
* Re: [PATCH v2 05/10] softfloat: Inline pick_nan_muladd into its caller
2020-10-16 16:20 ` Alex Bennée
@ 2020-10-16 16:36 ` Richard Henderson
2020-10-18 21:06 ` [PATCH] softfpu: Generalize pick_nan_muladd to opaque structures Richard Henderson
0 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 2020-10-16 16:36 UTC (permalink / raw)
To: Alex Bennée; +Cc: bharata, qemu-devel, david
On 10/16/20 9:20 AM, Alex Bennée wrote:
>
> Richard Henderson <richard.henderson@linaro.org> writes:
>
>> Because of FloatParts, there will only ever be one caller.
>
> Isn't that admitting defeat - after all the logic here will be the same
> as the logic in the upcoming float128_muladd code and we only seem to
> need additional information...
Well, that and passing around a completely different structure.
Which is the big sticking point. Any suggestions for that?
r~
* Re: [PATCH v2 06/10] softfloat: Implement float128_muladd
2020-10-16 16:31 ` Alex Bennée
@ 2020-10-16 16:55 ` Richard Henderson
0 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2020-10-16 16:55 UTC (permalink / raw)
To: Alex Bennée; +Cc: bharata, qemu-devel, david
On 10/16/20 9:31 AM, Alex Bennée wrote:
>> +static void float128_unpack(FloatParts128 *p, float128 a, float_status *status)
>> +{
>> + p->sign = extractFloat128Sign(a);
>> + p->exp = extractFloat128Exp(a);
>> + p->frac0 = extractFloat128Frac0(a);
>> + p->frac1 = extractFloat128Frac1(a);
>
> Here we are deviating from the existing style; it would be nice if we
> could separate the raw unpack and have something like:
>
> static const FloatFmt float128_params = {
> FLOAT_PARAMS(15, 112)
> }
>
> static inline FloatParts128 unpack128_raw(FloatFmt fmt, uint128_t raw)
> {
> const int sign_pos = fmt.frac_size + fmt.exp_size;
>
> return (FloatParts128) {
> .cls = float_class_unclassified,
> .sign = extract128(raw, sign_pos, 1),
> .exp = extract128(raw, fmt.frac_size, fmt.exp_size),
> .frac1 = extract128_lo(raw, 0, fmt.frac_size),
> .frac0 = extract128_hi(raw, 0, fmt.frac_size),
> };
> }
>
> So even if we end up duplicating a chunk of the code the form will be
> similar so when we side-by-side the logic we can see it works the same
> way.
I suppose, but unlike the smaller fp formats, we won't be able to reuse this.
Even if we pull in the x86 80-bit format and the m68k 96-bit format, there are
a number of fundamental differences. E.g. the implicit bit
>> + /* Add the implicit bit. */
>> + p->frac0 |= UINT64_C(0x0001000000000000);
is not present in the x86 and m68k formats.
Finally, I'm continuing to use the existing Berkeley packing logic, which is a
bit persnickety about where that implicit bit goes. Our smaller formats put the
implicit bit at bit 62.
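To put numbers on that, a quick illustration (only the float128 line is
from this series; the bit-62 constant is shorthand for the decomposed
FloatParts convention):

    /* float128, this patch: implicit bit lands at bit 48 of frac0 */
    p->frac0 |= UINT64_C(0x0001000000000000);

    /* smaller formats via FloatParts: implicit bit normalized to bit 62 */
    frac |= UINT64_C(0x4000000000000000);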
r~
* Re: [PATCH v2 03/10] softfloat: Tidy a * b + inf return
2020-09-25 15:20 ` [PATCH v2 03/10] softfloat: Tidy a * b + inf return Richard Henderson
2020-10-16 9:40 ` Alex Bennée
@ 2020-10-16 17:04 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 23+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-16 17:04 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: david, alex.bennee, bharata
On 9/25/20 5:20 PM, Richard Henderson wrote:
> No reason to set values in 'a', when we already
> have float_class_inf in 'c', and can flip that sign.
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> fpu/softfloat.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* [PATCH] softfpu: Generalize pick_nan_muladd to opaque structures
2020-10-16 16:36 ` Richard Henderson
@ 2020-10-18 21:06 ` Richard Henderson
2020-10-19 9:57 ` Alex Bennée
0 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 2020-10-18 21:06 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Bennee
This will allow us to share code between FloatParts and FloatParts128.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
Cc: Alex Bennee <alex.bennee@linaro.org>
What do you think of this instead of inlining pick_nan_muladd
into the two muladd implementations?
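For the 128-bit side, the caller would take the same shape; a sketch,
assuming FloatParts128 keeps the same 'cls' field (repacking and
NaN-silencing elided):

    FloatParts128 *r = pick_nan_muladd(a.cls, b.cls, c.cls, &a, &b, &c,
                                       inf_zero, abc_mask, s);
    if (r == NULL) {
        return float128_default_nan(s);
    }
    /* otherwise silence a signaling NaN and return *r, as in muladd_floats */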
r~
---
fpu/softfloat.c | 40 ++++++++++++++++++++++++----------------
1 file changed, 24 insertions(+), 16 deletions(-)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 3e625c47cd..60fdddd163 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -929,16 +929,23 @@ static FloatParts pick_nan(FloatParts a, FloatParts b, float_status *s)
return a;
}
-static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
- bool inf_zero, float_status *s)
+/*
+ * Given pointers to A, B, C, and the respective classes, return the
+ * pointer to the structure that is the NaN result, or NULL to signal
+ * that the result is the default NaN.
+ */
+static inline void *
+pick_nan_muladd(FloatClass a_cls, FloatClass b_cls, FloatClass c_cls,
+ void *a, void *b, void *c,
+ bool inf_zero, int abc_mask, float_status *s)
{
int which;
- if (is_snan(a.cls) || is_snan(b.cls) || is_snan(c.cls)) {
+ if (unlikely(abc_mask & float_cmask_snan)) {
s->float_exception_flags |= float_flag_invalid;
}
- which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);
+ which = pickNaNMulAdd(a_cls, b_cls, c_cls, inf_zero, s);
if (s->default_nan_mode) {
/* Note that this check is after pickNaNMulAdd so that function
@@ -949,23 +956,16 @@ static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
switch (which) {
case 0:
- break;
+ return a;
case 1:
- a = b;
- break;
+ return b;
case 2:
- a = c;
- break;
+ return c;
case 3:
- return parts_default_nan(s);
+ return NULL;
default:
g_assert_not_reached();
}
-
- if (is_snan(a.cls)) {
- return parts_silence_nan(a, s);
- }
- return a;
}
/*
@@ -1366,7 +1366,15 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
* off to the target-specific pick-a-NaN routine.
*/
if (unlikely(abc_mask & float_cmask_anynan)) {
- return pick_nan_muladd(a, b, c, inf_zero, s);
+ FloatParts *r = pick_nan_muladd(a.cls, b.cls, c.cls, &a, &b, &c,
+ inf_zero, abc_mask, s);
+ if (r == NULL) {
+ return parts_default_nan(s);
+ }
+ if (is_snan(r->cls)) {
+ return parts_silence_nan(*r, s);
+ }
+ return *r;
}
if (unlikely(inf_zero)) {
--
2.25.1
* Re: [PATCH] softfpu: Generalize pick_nan_muladd to opaque structures
2020-10-18 21:06 ` [PATCH] softfpu: Generalize pick_nan_muladd to opaque structures Richard Henderson
@ 2020-10-19 9:57 ` Alex Bennée
0 siblings, 0 replies; 23+ messages in thread
From: Alex Bennée @ 2020-10-19 9:57 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> This will allow us to share code between FloatParts and FloatParts128.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> Cc: Alex Bennee <alex.bennee@linaro.org>
>
> What do you think of this instead of inlining pick_nan_muladd
> into the two muladd implementations?
I think that can work. I was noodling about with float_addsub128 over
the weekend so I'll post what that looks like once I've tested it.
Anyway:
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>
>
> r~
>
> ---
> fpu/softfloat.c | 40 ++++++++++++++++++++++++----------------
> 1 file changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 3e625c47cd..60fdddd163 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -929,16 +929,23 @@ static FloatParts pick_nan(FloatParts a, FloatParts b, float_status *s)
> return a;
> }
>
> -static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
> - bool inf_zero, float_status *s)
> +/*
> + * Given pointers to A, B, C, and the respective classes, return the
> + * pointer to the structure that is the NaN result, or NULL to signal
> + * that the result is the default NaN.
> + */
> +static inline void *
> +pick_nan_muladd(FloatClass a_cls, FloatClass b_cls, FloatClass c_cls,
> + void *a, void *b, void *c,
> + bool inf_zero, int abc_mask, float_status *s)
> {
> int which;
>
> - if (is_snan(a.cls) || is_snan(b.cls) || is_snan(c.cls)) {
> + if (unlikely(abc_mask & float_cmask_snan)) {
> s->float_exception_flags |= float_flag_invalid;
> }
>
> - which = pickNaNMulAdd(a.cls, b.cls, c.cls, inf_zero, s);
> + which = pickNaNMulAdd(a_cls, b_cls, c_cls, inf_zero, s);
>
> if (s->default_nan_mode) {
> /* Note that this check is after pickNaNMulAdd so that function
> @@ -949,23 +956,16 @@ static FloatParts pick_nan_muladd(FloatParts a, FloatParts b, FloatParts c,
>
> switch (which) {
> case 0:
> - break;
> + return a;
> case 1:
> - a = b;
> - break;
> + return b;
> case 2:
> - a = c;
> - break;
> + return c;
> case 3:
> - return parts_default_nan(s);
> + return NULL;
> default:
> g_assert_not_reached();
> }
> -
> - if (is_snan(a.cls)) {
> - return parts_silence_nan(a, s);
> - }
> - return a;
> }
>
> /*
> @@ -1366,7 +1366,15 @@ static FloatParts muladd_floats(FloatParts a, FloatParts b, FloatParts c,
> * off to the target-specific pick-a-NaN routine.
> */
> if (unlikely(abc_mask & float_cmask_anynan)) {
> - return pick_nan_muladd(a, b, c, inf_zero, s);
> + FloatParts *r = pick_nan_muladd(a.cls, b.cls, c.cls, &a, &b, &c,
> + inf_zero, abc_mask, s);
> + if (r == NULL) {
> + return parts_default_nan(s);
> + }
> + if (is_snan(r->cls)) {
> + return parts_silence_nan(*r, s);
> + }
> + return *r;
> }
>
> if (unlikely(inf_zero)) {
--
Alex Bennée
Thread overview: 23+ messages
2020-09-25 15:20 [PATCH v2 00/10] softfloat: Implement float128_muladd Richard Henderson
2020-09-25 15:20 ` [PATCH v2 01/10] softfloat: Use mulu64 for mul64To128 Richard Henderson
2020-10-15 19:08 ` Alex Bennée
2020-09-25 15:20 ` [PATCH v2 02/10] softfloat: Use int128.h for some operations Richard Henderson
2020-10-15 19:10 ` Alex Bennée
2020-09-25 15:20 ` [PATCH v2 03/10] softfloat: Tidy a * b + inf return Richard Henderson
2020-10-16 9:40 ` Alex Bennée
2020-10-16 17:04 ` Philippe Mathieu-Daudé
2020-09-25 15:20 ` [PATCH v2 04/10] softfloat: Add float_cmask and constants Richard Henderson
2020-10-16 9:44 ` Alex Bennée
2020-09-25 15:20 ` [PATCH v2 05/10] softfloat: Inline pick_nan_muladd into its caller Richard Henderson
2020-10-16 16:20 ` Alex Bennée
2020-10-16 16:36 ` Richard Henderson
2020-10-18 21:06 ` [PATCH] softfpu: Generalize pick_nan_muladd to opaque structures Richard Henderson
2020-10-19 9:57 ` Alex Bennée
2020-09-25 15:20 ` [PATCH v2 06/10] softfloat: Implement float128_muladd Richard Henderson
2020-10-16 16:31 ` Alex Bennée
2020-10-16 16:55 ` Richard Henderson
2020-09-25 15:20 ` [PATCH v2 07/10] softfloat: Use x86_64 assembly for {add, sub}{192, 256} Richard Henderson
2020-09-25 15:20 ` [PATCH v2 08/10] softfloat: Use x86_64 assembly for sh[rl]_double Richard Henderson
2020-09-25 15:20 ` [PATCH v2 09/10] softfloat: Use aarch64 assembly for {add, sub}{192, 256} Richard Henderson
2020-09-25 15:20 ` [PATCH v2 10/10] softfloat: Use ppc64 " Richard Henderson
2020-10-15 17:23 ` [PATCH v2 00/10] softfloat: Implement float128_muladd Richard Henderson