All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/6] softfloat, target/i386: fprem, fprem1 fixes
@ 2020-06-08 16:54 Joseph Myers
  2020-06-08 16:55 ` [PATCH v2 1/6] softfloat: merge floatx80_mod and floatx80_rem Joseph Myers
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Joseph Myers @ 2020-06-08 16:54 UTC (permalink / raw)
  To: qemu-devel, aurelien, peter.maydell, alex.bennee, laurent,
	pbonzini, rth, ehabkost

The x87 floating-point emulation of the fprem and fprem1 instructions
works via conversion to and from double.  This is inherently
unsuitable for a good emulation of any floatx80 operation.  This patch
series adapts the softfloat floatx80_rem implementation to be suitable
for these instructions and uses it to reimplement them.

There is an existing test for these instructions, test-i386-fprem.c,
based on comparison of output.  It produces 1679695 lines of output,
and before this patch series 415422 of those lines are different on
hardware from the output produced by QEMU.  Some of those differences
are because QEMU's x87 emulation does not yet produce the "denormal
operand" exception; ignoring such differences (modifying the output
from a native run not to report that exception), there are still
398833 different lines.  This patch series reduces that latter number
to 1 (that one difference being because of missing checks for
floating-point stack underflow, another global issue with the x87
emulation), or 35517 different lines without the correction for lack
of denormal operand exception support.

Several fixes to and new features in the softfloat support for this
operation are needed; floatx80_mod, previously present in the m68k
code only, is made generic and unified with floatx80_rem in a new
floatx80_modrem of which floatx80_mod and floatx80_rem are thin
wrappers.  The only architectures using float*_rem for other formats
are arm (FPA emulation) and openrisc (instructions that have been
removed in the latest architecture version); they do not appear to
need any of the new features, and all the bugs fixed are specific to
floatx80, so no changes are made to the remainder implementation for
those formats.

A new feature added is returning the low bits of the quotient from
floatx80_modrem, as needed for both x87 and m68k.  The logic used to
determine the low 7 bits of the quotient for m68k
(target/m68k/fpu_helper.c:make_quotient) appears completely bogus (it
looks at the result of converting the remainder to integer, the
quotient having been discarded by that point); this patch series does
not change that to use the new interface, but the m68k maintainers may
wish to do so.

The Intel instruction set documentation leaves unspecified the exact
number of bits by which the remainder instructions reduce the operand
each time.  The AMD documentation gives a specific formula, which
empirically Intel processors follow as well, and that formula is
implemented in the code.  The AMD documentation also specifies that
flags other than C2 are cleared in the partial remainder case, whereas
the Intel manual is silent on that (but the processors do appear to
clear those flags); this patch implements that flag clearing, and
keeps the existing flag clearing in cases where the instructions raise
"invalid" (although it seems hardware in fact only clears some but not
all flags in that case, leaving other flags unchanged).

The Intel manuals include an inaccurate table asserting that (finite
REM 0) should raise "divide by zero"; actually, in accordance with
IEEE semantics, it raises "invalid".  The AMD manuals inaccurately say
for both fprem and fprem1 that if the exponent difference is negative,
the numerator is returned unchanged, which is correct (apart from
normalizing pseudo-denormals) for fprem but not for fprem1 (and the
old QEMU code had an incorrect optimization following the AMD manuals
for fprem1).

Changes in version 2 of the patch series: fix comment formatting and
combine patches 6 and 7.

Joseph Myers (6):
  softfloat: merge floatx80_mod and floatx80_rem
  softfloat: fix floatx80 remainder pseudo-denormal check for zero
  softfloat: do not return pseudo-denormal from floatx80 remainder
  softfloat: do not set denominator high bit for floatx80 remainder
  softfloat: return low bits of quotient from floatx80_modrem
  target/i386: reimplement fprem, fprem1 using floatx80 operations

 fpu/softfloat.c          |  87 ++++++++++++++++++----
 include/fpu/softfloat.h  |   3 +
 target/i386/fpu_helper.c | 156 ++++++++++++---------------------------
 target/m68k/softfloat.c  |  83 ---------------------
 target/m68k/softfloat.h  |   1 -
 5 files changed, 122 insertions(+), 208 deletions(-)

-- 
2.17.1


-- 
Joseph S. Myers
joseph@codesourcery.com


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/6] softfloat: merge floatx80_mod and floatx80_rem
  2020-06-08 16:54 [PATCH v2 0/6] softfloat, target/i386: fprem, fprem1 fixes Joseph Myers
@ 2020-06-08 16:55 ` Joseph Myers
  2020-06-08 16:55 ` [PATCH v2 2/6] softfloat: fix floatx80 remainder pseudo-denormal check for zero Joseph Myers
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Joseph Myers @ 2020-06-08 16:55 UTC (permalink / raw)
  To: qemu-devel, aurelien, peter.maydell, alex.bennee, laurent,
	pbonzini, rth, ehabkost

The m68k-specific softfloat code includes a function floatx80_mod that
is extremely similar to floatx80_rem, but computing the remainder
based on truncating the quotient toward zero rather than rounding it
to nearest integer.  This is also useful for emulating the x87 fprem
and fprem1 instructions.  Change the floatx80_rem implementation into
floatx80_modrem that can perform either operation, with both
floatx80_rem and floatx80_mod as thin wrappers available for all
targets.

There does not appear to be any use for the _mod operation for other
floating-point formats in QEMU (the only other architectures using
_rem at all are linux-user/arm/nwfpe, for FPA emulation, and openrisc,
for instructions that have been removed in the latest version of the
architecture), so no change is made to the code for other formats.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c         | 49 ++++++++++++++++++------
 include/fpu/softfloat.h |  2 +
 target/m68k/softfloat.c | 83 -----------------------------------------
 target/m68k/softfloat.h |  1 -
 4 files changed, 40 insertions(+), 95 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 6c8f2d597a..7b1ce7664f 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5682,10 +5682,13 @@ floatx80 floatx80_div(floatx80 a, floatx80 b, float_status *status)
 /*----------------------------------------------------------------------------
 | Returns the remainder of the extended double-precision floating-point value
 | `a' with respect to the corresponding value `b'.  The operation is performed
-| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic,
+| if 'mod' is false; if 'mod' is true, return the remainder based on truncating
+| the quotient toward zero instead.
 *----------------------------------------------------------------------------*/
 
-floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
+floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
+                         float_status *status)
 {
     bool aSign, zSign;
     int32_t aExp, bExp, expDiff;
@@ -5731,7 +5734,7 @@ floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
     expDiff = aExp - bExp;
     aSig1 = 0;
     if ( expDiff < 0 ) {
-        if ( expDiff < -1 ) return a;
+        if ( mod || expDiff < -1 ) return a;
         shift128Right( aSig0, 0, 1, &aSig0, &aSig1 );
         expDiff = 0;
     }
@@ -5763,14 +5766,16 @@ floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
         term1 = 0;
         term0 = bSig;
     }
-    sub128( term0, term1, aSig0, aSig1, &alternateASig0, &alternateASig1 );
-    if (    lt128( alternateASig0, alternateASig1, aSig0, aSig1 )
-         || (    eq128( alternateASig0, alternateASig1, aSig0, aSig1 )
-              && ( q & 1 ) )
-       ) {
-        aSig0 = alternateASig0;
-        aSig1 = alternateASig1;
-        zSign = ! zSign;
+    if (!mod) {
+        sub128( term0, term1, aSig0, aSig1, &alternateASig0, &alternateASig1 );
+        if (    lt128( alternateASig0, alternateASig1, aSig0, aSig1 )
+                || (    eq128( alternateASig0, alternateASig1, aSig0, aSig1 )
+                        && ( q & 1 ) )
+            ) {
+            aSig0 = alternateASig0;
+            aSig1 = alternateASig1;
+            zSign = ! zSign;
+        }
     }
     return
         normalizeRoundAndPackFloatx80(
@@ -5778,6 +5783,28 @@ floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
 
 }
 
+/*----------------------------------------------------------------------------
+| Returns the remainder of the extended double-precision floating-point value
+| `a' with respect to the corresponding value `b'.  The operation is performed
+| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
+*----------------------------------------------------------------------------*/
+
+floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
+{
+    return floatx80_modrem(a, b, false, status);
+}
+
+/*----------------------------------------------------------------------------
+| Returns the remainder of the extended double-precision floating-point value
+| `a' with respect to the corresponding value `b', with the quotient truncated
+| toward zero.
+*----------------------------------------------------------------------------*/
+
+floatx80 floatx80_mod(floatx80 a, floatx80 b, float_status *status)
+{
+    return floatx80_modrem(a, b, true, status);
+}
+
 /*----------------------------------------------------------------------------
 | Returns the square root of the extended double-precision floating-point
 | value `a'.  The operation is performed according to the IEC/IEEE Standard
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 16ca697a73..bff6934d09 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -687,6 +687,8 @@ floatx80 floatx80_add(floatx80, floatx80, float_status *status);
 floatx80 floatx80_sub(floatx80, floatx80, float_status *status);
 floatx80 floatx80_mul(floatx80, floatx80, float_status *status);
 floatx80 floatx80_div(floatx80, floatx80, float_status *status);
+floatx80 floatx80_modrem(floatx80, floatx80, bool, float_status *status);
+floatx80 floatx80_mod(floatx80, floatx80, float_status *status);
 floatx80 floatx80_rem(floatx80, floatx80, float_status *status);
 floatx80 floatx80_sqrt(floatx80, float_status *status);
 FloatRelation floatx80_compare(floatx80, floatx80, float_status *status);
diff --git a/target/m68k/softfloat.c b/target/m68k/softfloat.c
index 9f120cf15e..b6d0ed7acf 100644
--- a/target/m68k/softfloat.c
+++ b/target/m68k/softfloat.c
@@ -42,89 +42,6 @@ static floatx80 propagateFloatx80NaNOneArg(floatx80 a, float_status *status)
     return a;
 }
 
-/*
- * Returns the modulo remainder of the extended double-precision floating-point
- * value `a' with respect to the corresponding value `b'.
- */
-
-floatx80 floatx80_mod(floatx80 a, floatx80 b, float_status *status)
-{
-    bool aSign, zSign;
-    int32_t aExp, bExp, expDiff;
-    uint64_t aSig0, aSig1, bSig;
-    uint64_t qTemp, term0, term1;
-
-    aSig0 = extractFloatx80Frac(a);
-    aExp = extractFloatx80Exp(a);
-    aSign = extractFloatx80Sign(a);
-    bSig = extractFloatx80Frac(b);
-    bExp = extractFloatx80Exp(b);
-
-    if (aExp == 0x7FFF) {
-        if ((uint64_t) (aSig0 << 1)
-            || ((bExp == 0x7FFF) && (uint64_t) (bSig << 1))) {
-            return propagateFloatx80NaN(a, b, status);
-        }
-        goto invalid;
-    }
-    if (bExp == 0x7FFF) {
-        if ((uint64_t) (bSig << 1)) {
-            return propagateFloatx80NaN(a, b, status);
-        }
-        return a;
-    }
-    if (bExp == 0) {
-        if (bSig == 0) {
-        invalid:
-            float_raise(float_flag_invalid, status);
-            return floatx80_default_nan(status);
-        }
-        normalizeFloatx80Subnormal(bSig, &bExp, &bSig);
-    }
-    if (aExp == 0) {
-        if ((uint64_t) (aSig0 << 1) == 0) {
-            return a;
-        }
-        normalizeFloatx80Subnormal(aSig0, &aExp, &aSig0);
-    }
-    bSig |= UINT64_C(0x8000000000000000);
-    zSign = aSign;
-    expDiff = aExp - bExp;
-    aSig1 = 0;
-    if (expDiff < 0) {
-        return a;
-    }
-    qTemp = (bSig <= aSig0);
-    if (qTemp) {
-        aSig0 -= bSig;
-    }
-    expDiff -= 64;
-    while (0 < expDiff) {
-        qTemp = estimateDiv128To64(aSig0, aSig1, bSig);
-        qTemp = (2 < qTemp) ? qTemp - 2 : 0;
-        mul64To128(bSig, qTemp, &term0, &term1);
-        sub128(aSig0, aSig1, term0, term1, &aSig0, &aSig1);
-        shortShift128Left(aSig0, aSig1, 62, &aSig0, &aSig1);
-        expDiff -= 62;
-    }
-    expDiff += 64;
-    if (0 < expDiff) {
-        qTemp = estimateDiv128To64(aSig0, aSig1, bSig);
-        qTemp = (2 < qTemp) ? qTemp - 2 : 0;
-        qTemp >>= 64 - expDiff;
-        mul64To128(bSig, qTemp << (64 - expDiff), &term0, &term1);
-        sub128(aSig0, aSig1, term0, term1, &aSig0, &aSig1);
-        shortShift128Left(0, bSig, 64 - expDiff, &term0, &term1);
-        while (le128(term0, term1, aSig0, aSig1)) {
-            ++qTemp;
-            sub128(aSig0, aSig1, term0, term1, &aSig0, &aSig1);
-        }
-    }
-    return
-        normalizeRoundAndPackFloatx80(
-            80, zSign, bExp + expDiff, aSig0, aSig1, status);
-}
-
 /*
  * Returns the mantissa of the extended double-precision floating-point
  * value `a'.
diff --git a/target/m68k/softfloat.h b/target/m68k/softfloat.h
index 365ef6ac7a..4bb9567134 100644
--- a/target/m68k/softfloat.h
+++ b/target/m68k/softfloat.h
@@ -23,7 +23,6 @@
 #define TARGET_M68K_SOFTFLOAT_H
 #include "fpu/softfloat.h"
 
-floatx80 floatx80_mod(floatx80 a, floatx80 b, float_status *status);
 floatx80 floatx80_getman(floatx80 a, float_status *status);
 floatx80 floatx80_getexp(floatx80 a, float_status *status);
 floatx80 floatx80_scale(floatx80 a, floatx80 b, float_status *status);
-- 
2.17.1


-- 
Joseph S. Myers
joseph@codesourcery.com


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/6] softfloat: fix floatx80 remainder pseudo-denormal check for zero
  2020-06-08 16:54 [PATCH v2 0/6] softfloat, target/i386: fprem, fprem1 fixes Joseph Myers
  2020-06-08 16:55 ` [PATCH v2 1/6] softfloat: merge floatx80_mod and floatx80_rem Joseph Myers
@ 2020-06-08 16:55 ` Joseph Myers
  2020-06-08 16:56 ` [PATCH v2 3/6] softfloat: do not return pseudo-denormal from floatx80 remainder Joseph Myers
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Joseph Myers @ 2020-06-08 16:55 UTC (permalink / raw)
  To: qemu-devel, aurelien, peter.maydell, alex.bennee, laurent,
	pbonzini, rth, ehabkost

The floatx80 remainder implementation ignores the high bit of the
significand when checking whether an operand (numerator) with zero
exponent is zero.  This means it mishandles a pseudo-denormal
representation of 0x1p-16382L by treating it as zero.  Fix this by
checking the whole significand instead.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 7b1ce7664f..091847beb9 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5726,7 +5726,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
         normalizeFloatx80Subnormal( bSig, &bExp, &bSig );
     }
     if ( aExp == 0 ) {
-        if ( (uint64_t) ( aSig0<<1 ) == 0 ) return a;
+        if ( aSig0 == 0 ) return a;
         normalizeFloatx80Subnormal( aSig0, &aExp, &aSig0 );
     }
     bSig |= UINT64_C(0x8000000000000000);
-- 
2.17.1


-- 
Joseph S. Myers
joseph@codesourcery.com


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 3/6] softfloat: do not return pseudo-denormal from floatx80 remainder
  2020-06-08 16:54 [PATCH v2 0/6] softfloat, target/i386: fprem, fprem1 fixes Joseph Myers
  2020-06-08 16:55 ` [PATCH v2 1/6] softfloat: merge floatx80_mod and floatx80_rem Joseph Myers
  2020-06-08 16:55 ` [PATCH v2 2/6] softfloat: fix floatx80 remainder pseudo-denormal check for zero Joseph Myers
@ 2020-06-08 16:56 ` Joseph Myers
  2020-06-08 16:56 ` [PATCH v2 4/6] softfloat: do not set denominator high bit for " Joseph Myers
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Joseph Myers @ 2020-06-08 16:56 UTC (permalink / raw)
  To: qemu-devel, aurelien, peter.maydell, alex.bennee, laurent,
	pbonzini, rth, ehabkost

The floatx80 remainder implementation sometimes returns the numerator
unchanged when the denominator is sufficiently larger than the
numerator.  But if the value to be returned unchanged is a
pseudo-denormal, that is incorrect.  Fix it to normalize the numerator
in that case.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 091847beb9..9d43868e4c 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5691,7 +5691,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
                          float_status *status)
 {
     bool aSign, zSign;
-    int32_t aExp, bExp, expDiff;
+    int32_t aExp, bExp, expDiff, aExpOrig;
     uint64_t aSig0, aSig1, bSig;
     uint64_t q, term0, term1, alternateASig0, alternateASig1;
 
@@ -5700,7 +5700,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
         return floatx80_default_nan(status);
     }
     aSig0 = extractFloatx80Frac( a );
-    aExp = extractFloatx80Exp( a );
+    aExpOrig = aExp = extractFloatx80Exp( a );
     aSign = extractFloatx80Sign( a );
     bSig = extractFloatx80Frac( b );
     bExp = extractFloatx80Exp( b );
@@ -5715,6 +5715,13 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
         if ((uint64_t)(bSig << 1)) {
             return propagateFloatx80NaN(a, b, status);
         }
+        if (aExp == 0 && aSig0 >> 63) {
+            /*
+             * Pseudo-denormal argument must be returned in normalized
+             * form.
+             */
+            return packFloatx80(aSign, 1, aSig0);
+        }
         return a;
     }
     if ( bExp == 0 ) {
@@ -5734,7 +5741,16 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
     expDiff = aExp - bExp;
     aSig1 = 0;
     if ( expDiff < 0 ) {
-        if ( mod || expDiff < -1 ) return a;
+        if ( mod || expDiff < -1 ) {
+            if (aExp == 1 && aExpOrig == 0) {
+                /*
+                 * Pseudo-denormal argument must be returned in
+                 * normalized form.
+                 */
+                return packFloatx80(aSign, aExp, aSig0);
+            }
+            return a;
+        }
         shift128Right( aSig0, 0, 1, &aSig0, &aSig1 );
         expDiff = 0;
     }
-- 
2.17.1


-- 
Joseph S. Myers
joseph@codesourcery.com


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 4/6] softfloat: do not set denominator high bit for floatx80 remainder
  2020-06-08 16:54 [PATCH v2 0/6] softfloat, target/i386: fprem, fprem1 fixes Joseph Myers
                   ` (2 preceding siblings ...)
  2020-06-08 16:56 ` [PATCH v2 3/6] softfloat: do not return pseudo-denormal from floatx80 remainder Joseph Myers
@ 2020-06-08 16:56 ` Joseph Myers
  2020-06-08 16:57 ` [PATCH v2 5/6] softfloat: return low bits of quotient from floatx80_modrem Joseph Myers
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Joseph Myers @ 2020-06-08 16:56 UTC (permalink / raw)
  To: qemu-devel, aurelien, peter.maydell, alex.bennee, laurent,
	pbonzini, rth, ehabkost

The floatx80 remainder implementation unnecessarily sets the high bit
of bSig explicitly.  By that point in the function, arguments that are
invalid, zero, infinity or NaN have already been handled and
subnormals have been through normalizeFloatx80Subnormal, so the high
bit will already be set.  Remove the unnecessary code.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 9d43868e4c..1552241b5e 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5736,7 +5736,6 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
         if ( aSig0 == 0 ) return a;
         normalizeFloatx80Subnormal( aSig0, &aExp, &aSig0 );
     }
-    bSig |= UINT64_C(0x8000000000000000);
     zSign = aSign;
     expDiff = aExp - bExp;
     aSig1 = 0;
-- 
2.17.1


-- 
Joseph S. Myers
joseph@codesourcery.com


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 5/6] softfloat: return low bits of quotient from floatx80_modrem
  2020-06-08 16:54 [PATCH v2 0/6] softfloat, target/i386: fprem, fprem1 fixes Joseph Myers
                   ` (3 preceding siblings ...)
  2020-06-08 16:56 ` [PATCH v2 4/6] softfloat: do not set denominator high bit for " Joseph Myers
@ 2020-06-08 16:57 ` Joseph Myers
  2020-06-08 16:58 ` [PATCH v2 6/6] target/i386: reimplement fprem, fprem1 using floatx80 operations Joseph Myers
  2020-06-12 16:41 ` [PATCH v2 0/6] softfloat, target/i386: fprem, fprem1 fixes Paolo Bonzini
  6 siblings, 0 replies; 8+ messages in thread
From: Joseph Myers @ 2020-06-08 16:57 UTC (permalink / raw)
  To: qemu-devel, aurelien, peter.maydell, alex.bennee, laurent,
	pbonzini, rth, ehabkost

Both x87 and m68k need the low parts of the quotient for their
remainder operations.  Arrange for floatx80_modrem to track those bits
and return them via a pointer.

The architectures using float32_rem and float64_rem do not appear to
need this information, so the *_rem interface is left unchanged and
the information returned only from floatx80_modrem.  The logic used to
determine the low 7 bits of the quotient for m68k
(target/m68k/fpu_helper.c:make_quotient) appears completely bogus (it
looks at the result of converting the remainder to integer, the
quotient having been discarded by that point); this patch does not
change that, but the m68k maintainers may wish to do so.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 fpu/softfloat.c         | 23 ++++++++++++++++++-----
 include/fpu/softfloat.h |  3 ++-
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 1552241b5e..72f45b0103 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5684,10 +5684,11 @@ floatx80 floatx80_div(floatx80 a, floatx80 b, float_status *status)
 | `a' with respect to the corresponding value `b'.  The operation is performed
 | according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic,
 | if 'mod' is false; if 'mod' is true, return the remainder based on truncating
-| the quotient toward zero instead.
+| the quotient toward zero instead.  '*quotient' is set to the low 64 bits of
+| the absolute value of the integer quotient.
 *----------------------------------------------------------------------------*/
 
-floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
+floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod, uint64_t *quotient,
                          float_status *status)
 {
     bool aSign, zSign;
@@ -5695,6 +5696,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
     uint64_t aSig0, aSig1, bSig;
     uint64_t q, term0, term1, alternateASig0, alternateASig1;
 
+    *quotient = 0;
     if (floatx80_invalid_encoding(a) || floatx80_invalid_encoding(b)) {
         float_raise(float_flag_invalid, status);
         return floatx80_default_nan(status);
@@ -5753,7 +5755,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
         shift128Right( aSig0, 0, 1, &aSig0, &aSig1 );
         expDiff = 0;
     }
-    q = ( bSig <= aSig0 );
+    *quotient = q = ( bSig <= aSig0 );
     if ( q ) aSig0 -= bSig;
     expDiff -= 64;
     while ( 0 < expDiff ) {
@@ -5763,6 +5765,8 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
         sub128( aSig0, aSig1, term0, term1, &aSig0, &aSig1 );
         shortShift128Left( aSig0, aSig1, 62, &aSig0, &aSig1 );
         expDiff -= 62;
+        *quotient <<= 62;
+        *quotient += q;
     }
     expDiff += 64;
     if ( 0 < expDiff ) {
@@ -5776,6 +5780,12 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
             ++q;
             sub128( aSig0, aSig1, term0, term1, &aSig0, &aSig1 );
         }
+        if (expDiff < 64) {
+            *quotient <<= expDiff;
+        } else {
+            *quotient = 0;
+        }
+        *quotient += q;
     }
     else {
         term1 = 0;
@@ -5790,6 +5800,7 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
             aSig0 = alternateASig0;
             aSig1 = alternateASig1;
             zSign = ! zSign;
+            ++*quotient;
         }
     }
     return
@@ -5806,7 +5817,8 @@ floatx80 floatx80_modrem(floatx80 a, floatx80 b, bool mod,
 
 floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
 {
-    return floatx80_modrem(a, b, false, status);
+    uint64_t quotient;
+    return floatx80_modrem(a, b, false, &quotient, status);
 }
 
 /*----------------------------------------------------------------------------
@@ -5817,7 +5829,8 @@ floatx80 floatx80_rem(floatx80 a, floatx80 b, float_status *status)
 
 floatx80 floatx80_mod(floatx80 a, floatx80 b, float_status *status)
 {
-    return floatx80_modrem(a, b, true, status);
+    uint64_t quotient;
+    return floatx80_modrem(a, b, true, &quotient, status);
 }
 
 /*----------------------------------------------------------------------------
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index bff6934d09..ff4e2605b1 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -687,7 +687,8 @@ floatx80 floatx80_add(floatx80, floatx80, float_status *status);
 floatx80 floatx80_sub(floatx80, floatx80, float_status *status);
 floatx80 floatx80_mul(floatx80, floatx80, float_status *status);
 floatx80 floatx80_div(floatx80, floatx80, float_status *status);
-floatx80 floatx80_modrem(floatx80, floatx80, bool, float_status *status);
+floatx80 floatx80_modrem(floatx80, floatx80, bool, uint64_t *,
+                         float_status *status);
 floatx80 floatx80_mod(floatx80, floatx80, float_status *status);
 floatx80 floatx80_rem(floatx80, floatx80, float_status *status);
 floatx80 floatx80_sqrt(floatx80, float_status *status);
-- 
2.17.1


-- 
Joseph S. Myers
joseph@codesourcery.com


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 6/6] target/i386: reimplement fprem, fprem1 using floatx80 operations
  2020-06-08 16:54 [PATCH v2 0/6] softfloat, target/i386: fprem, fprem1 fixes Joseph Myers
                   ` (4 preceding siblings ...)
  2020-06-08 16:57 ` [PATCH v2 5/6] softfloat: return low bits of quotient from floatx80_modrem Joseph Myers
@ 2020-06-08 16:58 ` Joseph Myers
  2020-06-12 16:41 ` [PATCH v2 0/6] softfloat, target/i386: fprem, fprem1 fixes Paolo Bonzini
  6 siblings, 0 replies; 8+ messages in thread
From: Joseph Myers @ 2020-06-08 16:58 UTC (permalink / raw)
  To: qemu-devel, aurelien, peter.maydell, alex.bennee, laurent,
	pbonzini, rth, ehabkost

The x87 fprem and fprem1 emulation is currently based around
conversion to double, which is inherently unsuitable for a good
emulation of any floatx80 operation.  Reimplement using the soft-float
floatx80 remainder operations.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/i386/fpu_helper.c | 156 ++++++++++++---------------------------
 1 file changed, 48 insertions(+), 108 deletions(-)

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 8ef5b463ea..0e531e3821 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -934,124 +934,64 @@ void helper_fxtract(CPUX86State *env)
     merge_exception_flags(env, old_flags);
 }
 
-void helper_fprem1(CPUX86State *env)
+static void helper_fprem_common(CPUX86State *env, bool mod)
 {
-    double st0, st1, dblq, fpsrcop, fptemp;
-    CPU_LDoubleU fpsrcop1, fptemp1;
-    int expdif;
-    signed long long int q;
-
-    st0 = floatx80_to_double(env, ST0);
-    st1 = floatx80_to_double(env, ST1);
-
-    if (isinf(st0) || isnan(st0) || isnan(st1) || (st1 == 0.0)) {
-        ST0 = double_to_floatx80(env, 0.0 / 0.0); /* NaN */
-        env->fpus &= ~0x4700; /* (C3,C2,C1,C0) <-- 0000 */
-        return;
-    }
-
-    fpsrcop = st0;
-    fptemp = st1;
-    fpsrcop1.d = ST0;
-    fptemp1.d = ST1;
-    expdif = EXPD(fpsrcop1) - EXPD(fptemp1);
-
-    if (expdif < 0) {
-        /* optimisation? taken from the AMD docs */
-        env->fpus &= ~0x4700; /* (C3,C2,C1,C0) <-- 0000 */
-        /* ST0 is unchanged */
-        return;
-    }
+    uint8_t old_flags = save_exception_flags(env);
+    uint64_t quotient;
+    CPU_LDoubleU temp0, temp1;
+    int exp0, exp1, expdiff;
 
-    if (expdif < 53) {
-        dblq = fpsrcop / fptemp;
-        /* round dblq towards nearest integer */
-        dblq = rint(dblq);
-        st0 = fpsrcop - fptemp * dblq;
+    temp0.d = ST0;
+    temp1.d = ST1;
+    exp0 = EXPD(temp0);
+    exp1 = EXPD(temp1);
 
-        /* convert dblq to q by truncating towards zero */
-        if (dblq < 0.0) {
-            q = (signed long long int)(-dblq);
+    env->fpus &= ~0x4700; /* (C3,C2,C1,C0) <-- 0000 */
+    if (floatx80_is_zero(ST0) || floatx80_is_zero(ST1) ||
+        exp0 == 0x7fff || exp1 == 0x7fff ||
+        floatx80_invalid_encoding(ST0) || floatx80_invalid_encoding(ST1)) {
+        ST0 = floatx80_modrem(ST0, ST1, mod, &quotient, &env->fp_status);
+    } else {
+        if (exp0 == 0) {
+            exp0 = 1 - clz64(temp0.l.lower);
+        }
+        if (exp1 == 0) {
+            exp1 = 1 - clz64(temp1.l.lower);
+        }
+        expdiff = exp0 - exp1;
+        if (expdiff < 64) {
+            ST0 = floatx80_modrem(ST0, ST1, mod, &quotient, &env->fp_status);
+            env->fpus |= (quotient & 0x4) << (8 - 2);  /* (C0) <-- q2 */
+            env->fpus |= (quotient & 0x2) << (14 - 1); /* (C3) <-- q1 */
+            env->fpus |= (quotient & 0x1) << (9 - 0);  /* (C1) <-- q0 */
         } else {
-            q = (signed long long int)dblq;
+            /*
+             * Partial remainder.  This choice of how many bits to
+             * process at once is specified in AMD instruction set
+             * manuals, and empirically is followed by Intel
+             * processors as well; it ensures that the final remainder
+             * operation in a loop does produce the correct low three
+             * bits of the quotient.  AMD manuals specify that the
+             * flags other than C2 are cleared, and empirically Intel
+             * processors clear them as well.
+             */
+            int n = 32 + (expdiff % 32);
+            temp1.d = floatx80_scalbn(temp1.d, expdiff - n, &env->fp_status);
+            ST0 = floatx80_mod(ST0, temp1.d, &env->fp_status);
+            env->fpus |= 0x400;  /* C2 <-- 1 */
         }
-
-        env->fpus &= ~0x4700; /* (C3,C2,C1,C0) <-- 0000 */
-        /* (C0,C3,C1) <-- (q2,q1,q0) */
-        env->fpus |= (q & 0x4) << (8 - 2);  /* (C0) <-- q2 */
-        env->fpus |= (q & 0x2) << (14 - 1); /* (C3) <-- q1 */
-        env->fpus |= (q & 0x1) << (9 - 0);  /* (C1) <-- q0 */
-    } else {
-        env->fpus |= 0x400;  /* C2 <-- 1 */
-        fptemp = pow(2.0, expdif - 50);
-        fpsrcop = (st0 / st1) / fptemp;
-        /* fpsrcop = integer obtained by chopping */
-        fpsrcop = (fpsrcop < 0.0) ?
-                  -(floor(fabs(fpsrcop))) : floor(fpsrcop);
-        st0 -= (st1 * fpsrcop * fptemp);
     }
-    ST0 = double_to_floatx80(env, st0);
+    merge_exception_flags(env, old_flags);
 }
 
-void helper_fprem(CPUX86State *env)
+void helper_fprem1(CPUX86State *env)
 {
-    double st0, st1, dblq, fpsrcop, fptemp;
-    CPU_LDoubleU fpsrcop1, fptemp1;
-    int expdif;
-    signed long long int q;
-
-    st0 = floatx80_to_double(env, ST0);
-    st1 = floatx80_to_double(env, ST1);
-
-    if (isinf(st0) || isnan(st0) || isnan(st1) || (st1 == 0.0)) {
-        ST0 = double_to_floatx80(env, 0.0 / 0.0); /* NaN */
-        env->fpus &= ~0x4700; /* (C3,C2,C1,C0) <-- 0000 */
-        return;
-    }
-
-    fpsrcop = st0;
-    fptemp = st1;
-    fpsrcop1.d = ST0;
-    fptemp1.d = ST1;
-    expdif = EXPD(fpsrcop1) - EXPD(fptemp1);
-
-    if (expdif < 0) {
-        /* optimisation? taken from the AMD docs */
-        env->fpus &= ~0x4700; /* (C3,C2,C1,C0) <-- 0000 */
-        /* ST0 is unchanged */
-        return;
-    }
-
-    if (expdif < 53) {
-        dblq = fpsrcop / fptemp; /* ST0 / ST1 */
-        /* round dblq towards zero */
-        dblq = (dblq < 0.0) ? ceil(dblq) : floor(dblq);
-        st0 = fpsrcop - fptemp * dblq; /* fpsrcop is ST0 */
-
-        /* convert dblq to q by truncating towards zero */
-        if (dblq < 0.0) {
-            q = (signed long long int)(-dblq);
-        } else {
-            q = (signed long long int)dblq;
-        }
-
-        env->fpus &= ~0x4700; /* (C3,C2,C1,C0) <-- 0000 */
-        /* (C0,C3,C1) <-- (q2,q1,q0) */
-        env->fpus |= (q & 0x4) << (8 - 2);  /* (C0) <-- q2 */
-        env->fpus |= (q & 0x2) << (14 - 1); /* (C3) <-- q1 */
-        env->fpus |= (q & 0x1) << (9 - 0);  /* (C1) <-- q0 */
-    } else {
-        int N = 32 + (expdif % 32); /* as per AMD docs */
+    helper_fprem_common(env, false);
+}
 
-        env->fpus |= 0x400;  /* C2 <-- 1 */
-        fptemp = pow(2.0, (double)(expdif - N));
-        fpsrcop = (st0 / st1) / fptemp;
-        /* fpsrcop = integer obtained by chopping */
-        fpsrcop = (fpsrcop < 0.0) ?
-                  -(floor(fabs(fpsrcop))) : floor(fpsrcop);
-        st0 -= (st1 * fpsrcop * fptemp);
-    }
-    ST0 = double_to_floatx80(env, st0);
+void helper_fprem(CPUX86State *env)
+{
+    helper_fprem_common(env, true);
 }
 
 void helper_fyl2xp1(CPUX86State *env)
-- 
2.17.1


-- 
Joseph S. Myers
joseph@codesourcery.com


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/6] softfloat, target/i386: fprem, fprem1 fixes
  2020-06-08 16:54 [PATCH v2 0/6] softfloat, target/i386: fprem, fprem1 fixes Joseph Myers
                   ` (5 preceding siblings ...)
  2020-06-08 16:58 ` [PATCH v2 6/6] target/i386: reimplement fprem, fprem1 using floatx80 operations Joseph Myers
@ 2020-06-12 16:41 ` Paolo Bonzini
  6 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2020-06-12 16:41 UTC (permalink / raw)
  To: Joseph Myers, qemu-devel, aurelien, peter.maydell, alex.bennee,
	laurent, rth, ehabkost

On 08/06/20 18:54, Joseph Myers wrote:
> The x87 floating-point emulation of the fprem and fprem1 instructions
> works via conversion to and from double.  This is inherently
> unsuitable for a good emulation of any floatx80 operation.  This patch
> series adapts the softfloat floatx80_rem implementation to be suitable
> for these instructions and uses it to reimplement them.
> 
> There is an existing test for these instructions, test-i386-fprem.c,
> based on comparison of output.  It produces 1679695 lines of output,
> and before this patch series 415422 of those lines are different on
> hardware from the output produced by QEMU.  Some of those differences
> are because QEMU's x87 emulation does not yet produce the "denormal
> operand" exception; ignoring such differences (modifying the output
> from a native run not to report that exception), there are still
> 398833 different lines.  This patch series reduces that latter number
> to 1 (that one difference being because of missing checks for
> floating-point stack underflow, another global issue with the x87
> emulation), or 35517 different lines without the correction for lack
> of denormal operand exception support.
> 
> Several fixes to and new features in the softfloat support for this
> operation are needed; floatx80_mod, previously present in the m68k
> code only, is made generic and unified with floatx80_rem in a new
> floatx80_modrem of which floatx80_mod and floatx80_rem are thin
> wrappers.  The only architectures using float*_rem for other formats
> are arm (FPA emulation) and openrisc (instructions that have been
> removed in the latest architecture version); they do not appear to
> need any of the new features, and all the bugs fixed are specific to
> floatx80, so no changes are made to the remainder implementation for
> those formats.
> 
> A new feature added is returning the low bits of the quotient from
> floatx80_modrem, as needed for both x87 and m68k.  The logic used to
> determine the low 7 bits of the quotient for m68k
> (target/m68k/fpu_helper.c:make_quotient) appears completely bogus (it
> looks at the result of converting the remainder to integer, the
> quotient having been discarded by that point); this patch series does
> not change that to use the new interface, but the m68k maintainers may
> wish to do so.
> 
> The Intel instruction set documentation leaves unspecified the exact
> number of bits by which the remainder instructions reduce the operand
> each time.  The AMD documentation gives a specific formula, which
> empirically Intel processors follow as well, and that formula is
> implemented in the code.  The AMD documentation also specifies that
> flags other than C2 are cleared in the partial remainder case, whereas
> the Intel manual is silent on that (but the processors do appear to
> clear those flags); this patch implements that flag clearing, and
> keeps the existing flag clearing in cases where the instructions raise
> "invalid" (although it seems hardware in fact only clears some but not
> all flags in that case, leaving other flags unchanged).
> 
> The Intel manuals include an inaccurate table asserting that (finite
> REM 0) should raise "divide by zero"; actually, in accordance with
> IEEE semantics, it raises "invalid".  The AMD manuals inaccurately say
> for both fprem and fprem1 that if the exponent difference is negative,
> the numerator is returned unchanged, which is correct (apart from
> normalizing pseudo-denormals) for fprem but not for fprem1 (and the
> old QEMU code had an incorrect optimization following the AMD manuals
> for fprem1).
> 
> Changes in version 2 of the patch series: fix comment formatting and
> combine patches 6 and 7.
> 
> Joseph Myers (6):
>   softfloat: merge floatx80_mod and floatx80_rem
>   softfloat: fix floatx80 remainder pseudo-denormal check for zero
>   softfloat: do not return pseudo-denormal from floatx80 remainder
>   softfloat: do not set denominator high bit for floatx80 remainder
>   softfloat: return low bits of quotient from floatx80_modrem
>   target/i386: reimplement fprem, fprem1 using floatx80 operations
> 
>  fpu/softfloat.c          |  87 ++++++++++++++++++----
>  include/fpu/softfloat.h  |   3 +
>  target/i386/fpu_helper.c | 156 ++++++++++++---------------------------
>  target/m68k/softfloat.c  |  83 ---------------------
>  target/m68k/softfloat.h  |   1 -
>  5 files changed, 122 insertions(+), 208 deletions(-)
> 

Queued, thanks.

Paolo



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-06-12 16:42 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-08 16:54 [PATCH v2 0/6] softfloat, target/i386: fprem, fprem1 fixes Joseph Myers
2020-06-08 16:55 ` [PATCH v2 1/6] softfloat: merge floatx80_mod and floatx80_rem Joseph Myers
2020-06-08 16:55 ` [PATCH v2 2/6] softfloat: fix floatx80 remainder pseudo-denormal check for zero Joseph Myers
2020-06-08 16:56 ` [PATCH v2 3/6] softfloat: do not return pseudo-denormal from floatx80 remainder Joseph Myers
2020-06-08 16:56 ` [PATCH v2 4/6] softfloat: do not set denominator high bit for " Joseph Myers
2020-06-08 16:57 ` [PATCH v2 5/6] softfloat: return low bits of quotient from floatx80_modrem Joseph Myers
2020-06-08 16:58 ` [PATCH v2 6/6] target/i386: reimplement fprem, fprem1 using floatx80 operations Joseph Myers
2020-06-12 16:41 ` [PATCH v2 0/6] softfloat, target/i386: fprem, fprem1 fixes Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.