* [Qemu-devel] [PATCH 0/5] target-arm: Implement UDIV/SDIV and fused multiply-accumulate
From: Peter Maydell @ 2011-09-28 17:27 UTC
  To: qemu-devel; +Cc: patches

This patch series implements support in QEMU for some (user-mode)
instructions which are present on some recent processors
(eg Cortex-A15).

The new instructions are UDIV/SDIV (previously only available as a Thumb
encoding for M-class cores, now an optional extension to ARMv7A), and
the fused-multiply-accumulate instructions VFMA, VFMS, VFNMA, VFNMS.
We don't (yet) support a specific processor for system mode with these
extensions, so they are enabled only for ARM_CPUID_ANY.

The most interesting part of this is the fused multiply-accumulate
support which I have added to softfloat. Although this is only used
by ARM at the moment I've put in some flags which should be sufficient
for the variants provided by other architectures. (Specifically,
x86, Itanium and SPARC64 VI all need the "negate product" flag
since they have "-(a*b)+c" instructions. PPC needs the "negate
result" and "negate addend" flags.)

The NaN handling hooks are not as clean as the 2-operand versions.
This is because there isn't really any consensus about what order
the operands in a fused-mac should be in; ARM uses c+(a*b), PPC
does something else, and so on. So even if there were more than
one 3-operand IEEE operation, a generic "pick one of 3 NaNs"
function would be tricky; I opted to just be clear about the
fact this was fused-mac specific. I've provided the PPC implementation
of the NaN-selection function because the architecture manual was
helpfully clear about what it was. x86 and SPARC don't seem to
document the behaviour in this area :-(

I've thrown in the "reinstate target-specific NaN handling" patch
I posted earlier this week just to avoid possible patch conflicts.

As usual, this has all been tested with a long run on my random
instruction generation testbed.


Peter Maydell (5):
  softfloat: Reinstate accidentally disabled target-specific NaN
    handling
  target-arm: v6 media multiply space: UNDEF on unassigned encodings
  target-arm: Add ARM UDIV/SDIV support
  softfloat: Implement fused multiply-add
  target-arm: Implement VFPv4 fused multiply-accumulate insns

 fpu/softfloat-specialize.h |  178 ++++++++++++++++++
 fpu/softfloat.c            |  433 ++++++++++++++++++++++++++++++++++++++++++++
 fpu/softfloat.h            |   14 ++
 target-arm/cpu.h           |    1 +
 target-arm/helper.c        |   14 ++
 target-arm/helper.h        |    3 +
 target-arm/translate.c     |  115 +++++++++++-
 7 files changed, 754 insertions(+), 4 deletions(-)

* [Qemu-devel] [PATCH 1/5] softfloat: Reinstate accidentally disabled target-specific NaN handling
From: Peter Maydell @ 2011-09-28 17:27 UTC
  To: qemu-devel; +Cc: patches

Include config.h in softfloat.c, so that the target specific ifdefs in
softfloat-specialize.h are evaluated correctly. This was accidentally
broken in commit 789ec7ce2 when config-target.h was removed from
softfloat.h, and means that most targets will have been returning the
wrong results for calculations involving NaNs.
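
In miniature, the failure mode looks like this (illustration, not
patch code):

/* softfloat-specialize.h dispatches on target macros: */
#if defined(TARGET_ARM)
/* ARM-specific NaN propagation rules */
#elif defined(TARGET_PPC)
/* PPC-specific rules */
#else
/* generic fallback */
#endif
/* With config.h missing from softfloat.c's include chain, no
 * TARGET_* macro is defined, so every target silently compiled
 * the generic fallback. */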

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 fpu/softfloat.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 2b20085..3aafa81 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -35,6 +35,11 @@ these four paragraphs for those parts of this code that are retained.
 
 =============================================================================*/
 
+/* softfloat (and in particular the code in softfloat-specialize.h) is
+ * target-dependent and needs the TARGET_* macros.
+ */
+#include "config.h"
+
 #include "softfloat.h"
 
 /*----------------------------------------------------------------------------
-- 
1.7.1

* [Qemu-devel] [PATCH 2/5] target-arm: v6 media multiply space: UNDEF on unassigned encodings
From: Peter Maydell @ 2011-09-28 17:27 UTC
  To: qemu-devel; +Cc: patches

Clean up the decoding of the v6 media multiply space so that we UNDEF
on unassigned encodings rather than randomly interpreting them as
some instruction in this space.
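
The checks being added can be summarised as a predicate over the
op1 = insn[22:20] and op2 = insn[7:5] fields; this standalone helper
is hypothetical, but it mirrors the tests in the diff below:

#include <stdbool.h>
#include <stdint.h>

static bool media_mul_is_undef(uint32_t insn)
{
    switch ((insn >> 20) & 0x7) {
    case 5: /* SMMUL/SMMLA/SMMLS: op2 must be 00x or 11x */
        return ((insn >> 6) ^ (insn >> 7)) & 1;
    case 0: /* SMLAD, SMUAD, SMLSD, SMUSD */
    case 4: /* SMLALD, SMLSLD */
        return insn & (1 << 7);
    default:
        /* unassigned here; cases 1 and 3 gain SDIV/UDIV in patch 3 */
        return true;
    }
}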

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate.c |   24 ++++++++++++++++++++----
 1 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 75c0ad4..e99fc18 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -7569,11 +7569,16 @@ static void disas_arm_insn(CPUState * env, DisasContext *s)
                     }
                     break;
                 case 2: /* Multiplies (Type 3).  */
-                    tmp = load_reg(s, rm);
-                    tmp2 = load_reg(s, rs);
-                    if (insn & (1 << 20)) {
+                    switch ((insn >> 20) & 0x7) {
+                    case 5:
+                        if (((insn >> 6) ^ (insn >> 7)) & 1) {
+                            /* op2 not 00x or 11x : UNDEF */
+                            goto illegal_op;
+                        }
                         /* Signed multiply most significant [accumulate].
                            (SMMUL, SMMLA, SMMLS) */
+                        tmp = load_reg(s, rm);
+                        tmp2 = load_reg(s, rs);
                         tmp64 = gen_muls_i64_i32(tmp, tmp2);
 
                         if (rd != 15) {
@@ -7592,7 +7597,15 @@ static void disas_arm_insn(CPUState * env, DisasContext *s)
                         tcg_gen_trunc_i64_i32(tmp, tmp64);
                         tcg_temp_free_i64(tmp64);
                         store_reg(s, rn, tmp);
-                    } else {
+                        break;
+                    case 0:
+                    case 4:
+                        /* SMLAD, SMUAD, SMLSD, SMUSD, SMLALD, SMLSLD */
+                        if (insn & (1 << 7)) {
+                            goto illegal_op;
+                        }
+                        tmp = load_reg(s, rm);
+                        tmp2 = load_reg(s, rs);
                         if (insn & (1 << 5))
                             gen_swap_half(tmp2);
                         gen_smul_dual(tmp, tmp2);
@@ -7625,6 +7638,9 @@ static void disas_arm_insn(CPUState * env, DisasContext *s)
                               }
                             store_reg(s, rn, tmp);
                         }
+                        break;
+                    default:
+                        goto illegal_op;
                     }
                     break;
                 case 3:
-- 
1.7.1

* [Qemu-devel] [PATCH 3/5] target-arm: Add ARM UDIV/SDIV support
From: Peter Maydell @ 2011-09-28 17:27 UTC
  To: qemu-devel; +Cc: patches

Add support for UDIV and SDIV in ARM mode. This is a new optional
feature for A profile cores (Thumb mode has had UDIV and SDIV for
M profile cores for some time).
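
For reference, the semantics the existing udiv/sdiv helpers provide,
per the ARM ARM, are equivalent to this sketch (illustrative function
names, not the helper source):

#include <stdint.h>

static uint32_t arm_udiv(uint32_t num, uint32_t den)
{
    return den ? num / den : 0;          /* x / 0 yields 0, no trap */
}

static int32_t arm_sdiv(int32_t num, int32_t den)
{
    if (den == 0) {
        return 0;                        /* x / 0 yields 0, no trap */
    }
    if (num == INT32_MIN && den == -1) {
        return INT32_MIN;                /* result truncated to 32 bits */
    }
    return num / den;                    /* C99 '/' rounds towards zero */
}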

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate.c |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index e99fc18..d3d7c5c 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -7639,6 +7639,25 @@ static void disas_arm_insn(CPUState * env, DisasContext *s)
                             store_reg(s, rn, tmp);
                         }
                         break;
+                    case 1:
+                    case 3:
+                        /* SDIV, UDIV */
+                        if (!arm_feature(env, ARM_FEATURE_DIV)) {
+                            goto illegal_op;
+                        }
+                        if (((insn >> 5) & 7) || (rd != 15)) {
+                            goto illegal_op;
+                        }
+                        tmp = load_reg(s, rm);
+                        tmp2 = load_reg(s, rs);
+                        if (insn & (1 << 21)) {
+                            gen_helper_udiv(tmp, tmp, tmp2);
+                        } else {
+                            gen_helper_sdiv(tmp, tmp, tmp2);
+                        }
+                        tcg_temp_free_i32(tmp2);
+                        store_reg(s, rn, tmp);
+                        break;
                     default:
                         goto illegal_op;
                     }
-- 
1.7.1

* [Qemu-devel] [PATCH 4/5] softfloat: Implement fused multiply-add
From: Peter Maydell @ 2011-09-28 17:27 UTC
  To: qemu-devel; +Cc: patches

Implement fused multiply-add as a softfloat primitive. This implements
"a+b*c" as a single step without any intermediate rounding; it is
specified in IEEE 754-2008 and implemented in a number of CPUs.
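
A quick host-side illustration of why the single rounding matters (not
patch code; build with FP contraction disabled, e.g. -ffp-contract=off,
so the separate case really does round twice):

#include <math.h>
#include <stdio.h>

int main(void)
{
    float a = 1.0f + ldexpf(1.0f, -23);  /* 1 + 2^-23, exactly representable */
    float b = 1.0f - ldexpf(1.0f, -23);  /* 1 - 2^-23, exactly representable */
    float c = -1.0f;
    float separate = a * b;              /* exact product 1 - 2^-46 rounds to 1.0f */
    separate += c;                       /* so this gives 0.0f */
    float fused = fmaf(a, b, c);         /* one rounding: exactly -2^-46 */
    printf("separate = %g, fused = %g\n", separate, fused);
    return 0;
}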

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 fpu/softfloat-specialize.h |  178 ++++++++++++++++++
 fpu/softfloat.c            |  428 ++++++++++++++++++++++++++++++++++++++++++++
 fpu/softfloat.h            |   14 ++
 3 files changed, 620 insertions(+), 0 deletions(-)

diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h
index c165205..c5e2dab 100644
--- a/fpu/softfloat-specialize.h
+++ b/fpu/softfloat-specialize.h
@@ -420,6 +420,82 @@ static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
 #endif
 
 /*----------------------------------------------------------------------------
+| Select which NaN to propagate for a three-input operation.
+| For the moment we assume that no CPU needs the 'larger significand'
+| information.
+| Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN
+*----------------------------------------------------------------------------*/
+#if defined(TARGET_ARM)
+static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
+                         flag cIsQNaN, flag cIsSNaN, flag infzero STATUS_PARAM)
+{
+    /* For ARM, the (inf,zero,qnan) case sets InvalidOp and returns
+     * the default NaN
+     */
+    if (infzero && cIsQNaN) {
+        float_raise(float_flag_invalid STATUS_VAR);
+        return 3;
+    }
+
+    /* This looks different from the ARM ARM pseudocode, because the ARM ARM
+     * puts the operands to a fused mac operation (a*b)+c in the order c,a,b.
+     */
+    if (cIsSNaN) {
+        return 2;
+    } else if (aIsSNaN) {
+        return 0;
+    } else if (bIsSNaN) {
+        return 1;
+    } else if (cIsQNaN) {
+        return 2;
+    } else if (aIsQNaN) {
+        return 0;
+    } else {
+        return 1;
+    }
+}
+#elif defined(TARGET_PPC)
+static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
+                         flag cIsQNaN, flag cIsSNaN, flag infzero STATUS_PARAM)
+{
+    /* For PPC, the (inf,zero,qnan) case sets InvalidOp, but we prefer
+     * to return an input NaN if we have one (ie c) rather than generating
+     * a default NaN
+     */
+    if (infzero) {
+        float_raise(float_flag_invalid STATUS_VAR);
+        return 2;
+    }
+
+    /* If fRA is a NaN return it; otherwise if fRB is a NaN return it;
+     * otherwise return fRC. Note that muladd on PPC is (fRA * fRC) + fRB
+     */
+    if (aIsSNaN || aIsQNaN) {
+        return 0;
+    } else if (cIsSNaN || cIsQNaN) {
+        return 2;
+    } else {
+        return 1;
+    }
+}
+#else
+/* A default implementation: prefer a to b to c.
+ * This is unlikely to actually match any real implementation.
+ */
+static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
+                         flag cIsQNaN, flag cIsSNaN, flag infzero STATUS_PARAM)
+{
+    if (aIsSNaN || aIsQNaN) {
+        return 0;
+    } else if (bIsSNaN || bIsQNaN) {
+        return 1;
+    } else {
+        return 2;
+    }
+}
+#endif
+
+/*----------------------------------------------------------------------------
 | Takes two single-precision floating-point values `a' and `b', one of which
 | is a NaN, and returns the appropriate NaN result.  If either `a' or `b' is a
 | signaling NaN, the invalid exception is raised.
@@ -460,6 +536,57 @@ static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM)
 }
 
 /*----------------------------------------------------------------------------
+| Takes three single-precision floating-point values `a', `b' and `c', one of
+| which is a NaN, and returns the appropriate NaN result.  If any of  `a',
+| `b' or `c' is a signaling NaN, the invalid exception is raised.
+| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case
+| obviously c is a NaN, and whether to propagate c or some other NaN is
+| implementation defined).
+*----------------------------------------------------------------------------*/
+
+static float32 propagateFloat32MulAddNaN(float32 a, float32 b,
+                                         float32 c, flag infzero STATUS_PARAM)
+{
+    flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN,
+        cIsQuietNaN, cIsSignalingNaN;
+    int which;
+
+    aIsQuietNaN = float32_is_quiet_nan(a);
+    aIsSignalingNaN = float32_is_signaling_nan(a);
+    bIsQuietNaN = float32_is_quiet_nan(b);
+    bIsSignalingNaN = float32_is_signaling_nan(b);
+    cIsQuietNaN = float32_is_quiet_nan(c);
+    cIsSignalingNaN = float32_is_signaling_nan(c);
+
+    if (aIsSignalingNaN | bIsSignalingNaN | cIsSignalingNaN) {
+        float_raise(float_flag_invalid STATUS_VAR);
+    }
+
+    which = pickNaNMulAdd(aIsQuietNaN, aIsSignalingNaN,
+                          bIsQuietNaN, bIsSignalingNaN,
+                          cIsQuietNaN, cIsSignalingNaN, infzero STATUS_VAR);
+
+    if (STATUS(default_nan_mode)) {
+        /* Note that this check is after pickNaNMulAdd so that function
+         * has an opportunity to set the Invalid flag.
+         */
+        return float32_default_nan;
+    }
+
+    switch (which) {
+    case 0:
+        return float32_maybe_silence_nan(a);
+    case 1:
+        return float32_maybe_silence_nan(b);
+    case 2:
+        return float32_maybe_silence_nan(c);
+    case 3:
+    default:
+        return float32_default_nan;
+    }
+}
+
+/*----------------------------------------------------------------------------
 | Returns 1 if the double-precision floating-point value `a' is a quiet
 | NaN; otherwise returns 0.
 *----------------------------------------------------------------------------*/
@@ -596,6 +723,57 @@ static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM)
 }
 
 /*----------------------------------------------------------------------------
+| Takes three double-precision floating-point values `a', `b' and `c', one of
+| which is a NaN, and returns the appropriate NaN result.  If any of  `a',
+| `b' or `c' is a signaling NaN, the invalid exception is raised.
+| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case
+| obviously c is a NaN, and whether to propagate c or some other NaN is
+| implementation defined).
+*----------------------------------------------------------------------------*/
+
+static float64 propagateFloat64MulAddNaN(float64 a, float64 b,
+                                         float64 c, flag infzero STATUS_PARAM)
+{
+    flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN,
+        cIsQuietNaN, cIsSignalingNaN;
+    int which;
+
+    aIsQuietNaN = float64_is_quiet_nan(a);
+    aIsSignalingNaN = float64_is_signaling_nan(a);
+    bIsQuietNaN = float64_is_quiet_nan(b);
+    bIsSignalingNaN = float64_is_signaling_nan(b);
+    cIsQuietNaN = float64_is_quiet_nan(c);
+    cIsSignalingNaN = float64_is_signaling_nan(c);
+
+    if (aIsSignalingNaN | bIsSignalingNaN | cIsSignalingNaN) {
+        float_raise(float_flag_invalid STATUS_VAR);
+    }
+
+    which = pickNaNMulAdd(aIsQuietNaN, aIsSignalingNaN,
+                          bIsQuietNaN, bIsSignalingNaN,
+                          cIsQuietNaN, cIsSignalingNaN, infzero STATUS_VAR);
+
+    if (STATUS(default_nan_mode)) {
+        /* Note that this check is after pickNaNMulAdd so that function
+         * has an opportunity to set the Invalid flag.
+         */
+        return float64_default_nan;
+    }
+
+    switch (which) {
+    case 0:
+        return float64_maybe_silence_nan(a);
+    case 1:
+        return float64_maybe_silence_nan(b);
+    case 2:
+        return float64_maybe_silence_nan(c);
+    case 3:
+    default:
+        return float64_default_nan;
+    }
+}
+
+/*----------------------------------------------------------------------------
 | Returns 1 if the extended double-precision floating-point value `a' is a
 | quiet NaN; otherwise returns 0. This slightly differs from the same
 | function for other types as floatx80 has an explicit bit.
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 3aafa81..42edbc7 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2118,6 +2118,217 @@ float32 float32_rem( float32 a, float32 b STATUS_PARAM )
 }
 
 /*----------------------------------------------------------------------------
+| Returns the result of multiplying the single-precision floating-point values
+| `a' and `b' then adding 'c', with no intermediate rounding step after the
+| multiplication.  The operation is performed according to the IEC/IEEE
+| Standard for Binary Floating-Point Arithmetic 754-2008.
+| The flags argument allows the caller to select negation of the
+| addend, the intermediate product, or the final result. (The difference
+| between this and having the caller do a separate negation is that negating
+| externally will flip the sign bit on NaNs.)
+*----------------------------------------------------------------------------*/
+
+float32 float32_muladd(float32 a, float32 b, float32 c, int flags STATUS_PARAM)
+{
+    flag aSign, bSign, cSign, zSign;
+    int aExp, bExp, cExp, pExp, zExp, expDiff;
+    uint32_t aSig, bSig, cSig;
+    flag pInf, pZero, pSign;
+    uint64_t pSig64, cSig64, zSig64;
+    uint32_t pSig;
+    int shiftcount;
+    flag signflip, infzero;
+
+    a = float32_squash_input_denormal(a STATUS_VAR);
+    b = float32_squash_input_denormal(b STATUS_VAR);
+    c = float32_squash_input_denormal(c STATUS_VAR);
+    aSig = extractFloat32Frac(a);
+    aExp = extractFloat32Exp(a);
+    aSign = extractFloat32Sign(a);
+    bSig = extractFloat32Frac(b);
+    bExp = extractFloat32Exp(b);
+    bSign = extractFloat32Sign(b);
+    cSig = extractFloat32Frac(c);
+    cExp = extractFloat32Exp(c);
+    cSign = extractFloat32Sign(c);
+
+    infzero = ((aExp == 0 && aSig == 0 && bExp == 0xff && bSig == 0) ||
+               (aExp == 0xff && aSig == 0 && bExp == 0 && bSig == 0));
+
+    /* It is implementation-defined whether the cases of (0,inf,qnan)
+     * and (inf,0,qnan) raise InvalidOperation or not (and what QNaN
+     * they return if they do), so we have to hand this information
+     * off to the target-specific pick-a-NaN routine.
+     */
+    if (((aExp == 0xff) && aSig) ||
+        ((bExp == 0xff) && bSig) ||
+        ((cExp == 0xff) && cSig)) {
+        return propagateFloat32MulAddNaN(a, b, c, infzero STATUS_VAR);
+    }
+
+    if (infzero) {
+        float_raise(float_flag_invalid STATUS_VAR);
+        return float32_default_nan;
+    }
+
+    if (flags & float_muladd_negate_c) {
+        cSign ^= 1;
+    }
+
+    signflip = (flags & float_muladd_negate_result) ? 1 : 0;
+
+    /* Work out the sign and type of the product */
+    pSign = aSign ^ bSign;
+    if (flags & float_muladd_negate_product) {
+        pSign ^= 1;
+    }
+    pInf = (aExp == 0xff) || (bExp == 0xff);
+    pZero = ((aExp | aSig) == 0) || ((bExp | bSig) == 0);
+
+    if (cExp == 0xff) {
+        if (pInf && (pSign ^ cSign)) {
+            /* addition of opposite-signed infinities => InvalidOperation */
+            float_raise(float_flag_invalid STATUS_VAR);
+            return float32_default_nan;
+        }
+        /* Otherwise generate an infinity of the same sign */
+        return packFloat32(cSign ^ signflip, 0xff, 0);
+    }
+
+    if (pInf) {
+        return packFloat32(pSign ^ signflip, 0xff, 0);
+    }
+
+    if (pZero) {
+        if (cExp == 0) {
+            if (cSig == 0) {
+                /* Adding two exact zeroes */
+                if (pSign == cSign) {
+                    zSign = pSign;
+                } else if (STATUS(float_rounding_mode) == float_round_down) {
+                    zSign = 1;
+                } else {
+                    zSign = 0;
+                }
+                return packFloat32(zSign ^ signflip, 0, 0);
+            }
+            /* Exact zero plus a denorm */
+            if (STATUS(flush_to_zero)) {
+                float_raise(float_flag_output_denormal STATUS_VAR);
+                return packFloat32(cSign ^ signflip, 0, 0);
+            }
+        }
+        /* Zero plus something non-zero : just return the something */
+        return c ^ (signflip << 31);
+    }
+
+    if (aExp == 0) {
+        normalizeFloat32Subnormal(aSig, &aExp, &aSig);
+    }
+    if (bExp == 0) {
+        normalizeFloat32Subnormal(bSig, &bExp, &bSig);
+    }
+
+    /* Calculate the actual result a * b + c */
+
+    /* Multiply first; this is easy. */
+    /* NB: we subtract 0x7e where float32_mul() subtracts 0x7f
+     * because we want the true exponent, not the "one-less-than"
+     * flavour that roundAndPackFloat32() takes.
+     */
+    pExp = aExp + bExp - 0x7e;
+    aSig = (aSig | 0x00800000) << 7;
+    bSig = (bSig | 0x00800000) << 8;
+    pSig64 = (uint64_t)aSig * bSig;
+    if ((int64_t)(pSig64 << 1) >= 0) {
+        pSig64 <<= 1;
+        pExp--;
+    }
+
+    zSign = pSign ^ signflip;
+
+    /* Now pSig64 is the significand of the multiply, with the explicit bit in
+     * position 62.
+     */
+    if (cExp == 0) {
+        if (!cSig) {
+            /* Throw out the special case of c being an exact zero now */
+            shift64RightJamming(pSig64, 32, &pSig64);
+            pSig = pSig64;
+            return roundAndPackFloat32(zSign, pExp - 1,
+                                       pSig STATUS_VAR);
+        }
+        normalizeFloat32Subnormal(cSig, &cExp, &cSig);
+    }
+
+    cSig64 = (uint64_t)cSig << 39;
+    cSig64 |= LIT64(0x4000000000000000);
+    expDiff = pExp - cExp;
+
+    if (pSign == cSign) {
+        /* Addition */
+        if (expDiff > 0) {
+            /* scale c to match p */
+            shift64RightJamming(cSig64, expDiff, &cSig64);
+            zExp = pExp;
+        } else if (expDiff < 0) {
+            /* scale p to match c */
+            shift64RightJamming(pSig64, -expDiff, &pSig64);
+            zExp = cExp;
+        } else {
+            /* no scaling needed */
+            zExp = cExp;
+        }
+        /* Add significands and make sure explicit bit ends up in posn 62 */
+        zSig64 = pSig64 + cSig64;
+        if ((int64_t)zSig64 < 0) {
+            shift64RightJamming(zSig64, 1, &zSig64);
+        } else {
+            zExp--;
+        }
+        shift64RightJamming(zSig64, 32, &zSig64);
+        return roundAndPackFloat32(zSign, zExp, zSig64 STATUS_VAR);
+    } else {
+        /* Subtraction */
+        if (expDiff > 0) {
+            shift64RightJamming(cSig64, expDiff, &cSig64);
+            zSig64 = pSig64 - cSig64;
+            zExp = pExp;
+        } else if (expDiff < 0) {
+            shift64RightJamming(pSig64, -expDiff, &pSig64);
+            zSig64 = cSig64 - pSig64;
+            zExp = cExp;
+            zSign ^= 1;
+        } else {
+            zExp = pExp;
+            if (cSig64 < pSig64) {
+                zSig64 = pSig64 - cSig64;
+            } else if (pSig64 < cSig64) {
+                zSig64 = cSig64 - pSig64;
+                zSign ^= 1;
+            } else {
+                /* Exact zero */
+                zSign = signflip;
+                if (STATUS(float_rounding_mode) == float_round_down) {
+                    zSign ^= 1;
+                }
+                return packFloat32(zSign, 0, 0);
+            }
+        }
+        --zExp;
+        /* Do the equivalent of normalizeRoundAndPackFloat32() but
+         * starting with the significand in a uint64_t.
+         */
+        shiftcount = countLeadingZeros64(zSig64) - 1;
+        zSig64 <<= shiftcount;
+        zExp -= shiftcount;
+        shift64RightJamming(zSig64, 32, &zSig64);
+        return roundAndPackFloat32(zSign, zExp, zSig64 STATUS_VAR);
+    }
+}
+
+
+/*----------------------------------------------------------------------------
 | Returns the square root of the single-precision floating-point value `a'.
 | The operation is performed according to the IEC/IEEE Standard for Binary
 | Floating-Point Arithmetic.
@@ -3465,6 +3676,223 @@ float64 float64_rem( float64 a, float64 b STATUS_PARAM )
 }
 
 /*----------------------------------------------------------------------------
+| Returns the result of multiplying the double-precision floating-point values
+| `a' and `b' then adding 'c', with no intermediate rounding step after the
+| multiplication.  The operation is performed according to the IEC/IEEE
+| Standard for Binary Floating-Point Arithmetic 754-2008.
+| The flags argument allows the caller to select negation of the
+| addend, the intermediate product, or the final result. (The difference
+| between this and having the caller do a separate negation is that negating
+| externally will flip the sign bit on NaNs.)
+*----------------------------------------------------------------------------*/
+
+float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM)
+{
+    flag aSign, bSign, cSign, zSign;
+    int aExp, bExp, cExp, pExp, zExp, expDiff;
+    uint64_t aSig, bSig, cSig;
+    flag pInf, pZero, pSign;
+    uint64_t pSig0, pSig1, cSig0, cSig1, zSig0, zSig1;
+    int shiftcount;
+    flag signflip, infzero;
+
+    a = float64_squash_input_denormal(a STATUS_VAR);
+    b = float64_squash_input_denormal(b STATUS_VAR);
+    c = float64_squash_input_denormal(c STATUS_VAR);
+    aSig = extractFloat64Frac(a);
+    aExp = extractFloat64Exp(a);
+    aSign = extractFloat64Sign(a);
+    bSig = extractFloat64Frac(b);
+    bExp = extractFloat64Exp(b);
+    bSign = extractFloat64Sign(b);
+    cSig = extractFloat64Frac(c);
+    cExp = extractFloat64Exp(c);
+    cSign = extractFloat64Sign(c);
+
+    infzero = ((aExp == 0 && aSig == 0 && bExp == 0x7ff && bSig == 0) ||
+               (aExp == 0x7ff && aSig == 0 && bExp == 0 && bSig == 0));
+
+    /* It is implementation-defined whether the cases of (0,inf,qnan)
+     * and (inf,0,qnan) raise InvalidOperation or not (and what QNaN
+     * they return if they do), so we have to hand this information
+     * off to the target-specific pick-a-NaN routine.
+     */
+    if (((aExp == 0x7ff) && aSig) ||
+        ((bExp == 0x7ff) && bSig) ||
+        ((cExp == 0x7ff) && cSig)) {
+        return propagateFloat64MulAddNaN(a, b, c, infzero STATUS_VAR);
+    }
+
+    if (infzero) {
+        float_raise(float_flag_invalid STATUS_VAR);
+        return float64_default_nan;
+    }
+
+    if (flags & float_muladd_negate_c) {
+        cSign ^= 1;
+    }
+
+    signflip = (flags & float_muladd_negate_result) ? 1 : 0;
+
+    /* Work out the sign and type of the product */
+    pSign = aSign ^ bSign;
+    if (flags & float_muladd_negate_product) {
+        pSign ^= 1;
+    }
+    pInf = (aExp == 0x7ff) || (bExp == 0x7ff);
+    pZero = ((aExp | aSig) == 0) || ((bExp | bSig) == 0);
+
+    if (cExp == 0x7ff) {
+        if (pInf && (pSign ^ cSign)) {
+            /* addition of opposite-signed infinities => InvalidOperation */
+            float_raise(float_flag_invalid STATUS_VAR);
+            return float64_default_nan;
+        }
+        /* Otherwise generate an infinity of the same sign */
+        return packFloat64(cSign ^ signflip, 0x7ff, 0);
+    }
+
+    if (pInf) {
+        return packFloat64(pSign ^ signflip, 0x7ff, 0);
+    }
+
+    if (pZero) {
+        if (cExp == 0) {
+            if (cSig == 0) {
+                /* Adding two exact zeroes */
+                if (pSign == cSign) {
+                    zSign = pSign;
+                } else if (STATUS(float_rounding_mode) == float_round_down) {
+                    zSign = 1;
+                } else {
+                    zSign = 0;
+                }
+                return packFloat64(zSign ^ signflip, 0, 0);
+            }
+            /* Exact zero plus a denorm */
+            if (STATUS(flush_to_zero)) {
+                float_raise(float_flag_output_denormal STATUS_VAR);
+                return packFloat64(cSign ^ signflip, 0, 0);
+            }
+        }
+        /* Zero plus something non-zero : just return the something */
+        return c ^ ((uint64_t)signflip << 63);
+    }
+
+    if (aExp == 0) {
+        normalizeFloat64Subnormal(aSig, &aExp, &aSig);
+    }
+    if (bExp == 0) {
+        normalizeFloat64Subnormal(bSig, &bExp, &bSig);
+    }
+
+    /* Calculate the actual result a * b + c */
+
+    /* Multiply first; this is easy. */
+    /* NB: we subtract 0x3fe where float64_mul() subtracts 0x3ff
+     * because we want the true exponent, not the "one-less-than"
+     * flavour that roundAndPackFloat64() takes.
+     */
+    pExp = aExp + bExp - 0x3fe;
+    aSig = (aSig | LIT64(0x0010000000000000))<<10;
+    bSig = (bSig | LIT64(0x0010000000000000))<<11;
+    mul64To128(aSig, bSig, &pSig0, &pSig1);
+    if ((int64_t)(pSig0 << 1) >= 0) {
+        shortShift128Left(pSig0, pSig1, 1, &pSig0, &pSig1);
+        pExp--;
+    }
+
+    zSign = pSign ^ signflip;
+
+    /* Now [pSig0:pSig1] is the significand of the multiply, with the explicit
+     * bit in position 126.
+     */
+    if (cExp == 0) {
+        if (!cSig) {
+            /* Throw out the special case of c being an exact zero now */
+            shift128RightJamming(pSig0, pSig1, 64, &pSig0, &pSig1);
+            return roundAndPackFloat64(zSign, pExp - 1,
+                                       pSig1 STATUS_VAR);
+        }
+        normalizeFloat64Subnormal(cSig, &cExp, &cSig);
+    }
+
+    cSig0 = cSig << 10;
+    cSig1 = 0;
+    cSig0 |= LIT64(0x4000000000000000);
+    expDiff = pExp - cExp;
+
+    if (pSign == cSign) {
+        /* Addition */
+        if (expDiff > 0) {
+            /* scale c to match p */
+            shift128RightJamming(cSig0, cSig1, expDiff, &cSig0, &cSig1);
+            zExp = pExp;
+        } else if (expDiff < 0) {
+            /* scale p to match c */
+            shift128RightJamming(pSig0, pSig1, -expDiff, &pSig0, &pSig1);
+            zExp = cExp;
+        } else {
+            /* no scaling needed */
+            zExp = cExp;
+        }
+        /* Add significands and make sure explicit bit ends up in posn 126 */
+        add128(pSig0, pSig1, cSig0, cSig1, &zSig0, &zSig1);
+        if ((int64_t)zSig0 < 0) {
+            shift128RightJamming(zSig0, zSig1, 1, &zSig0, &zSig1);
+        } else {
+            zExp--;
+        }
+        shift128RightJamming(zSig0, zSig1, 64, &zSig0, &zSig1);
+        return roundAndPackFloat64(zSign, zExp, zSig1 STATUS_VAR);
+    } else {
+        /* Subtraction */
+        if (expDiff > 0) {
+            shift128RightJamming(cSig0, cSig1, expDiff, &cSig0, &cSig1);
+            sub128(pSig0, pSig1, cSig0, cSig1, &zSig0, &zSig1);
+            zExp = pExp;
+        } else if (expDiff < 0) {
+            shift128RightJamming(pSig0, pSig1, -expDiff, &pSig0, &pSig1);
+            sub128(cSig0, cSig1, pSig0, pSig1, &zSig0, &zSig1);
+            zExp = cExp;
+            zSign ^= 1;
+        } else {
+            zExp = pExp;
+            if (lt128(cSig0, cSig1, pSig0, pSig1)) {
+                sub128(pSig0, pSig1, cSig0, cSig1, &zSig0, &zSig1);
+            } else if (lt128(pSig0, pSig1, cSig0, cSig1)) {
+                sub128(cSig0, cSig1, pSig0, pSig1, &zSig0, &zSig1);
+                zSign ^= 1;
+            } else {
+                /* Exact zero */
+                zSign = signflip;
+                if (STATUS(float_rounding_mode) == float_round_down) {
+                    zSign ^= 1;
+                }
+                return packFloat64(zSign, 0, 0);
+            }
+        }
+        --zExp;
+        /* Do the equivalent of normalizeRoundAndPackFloat64() but
+         * starting with the significand in a pair of uint64_t.
+         */
+        if (zSig0) {
+            shiftcount = countLeadingZeros64(zSig0) - 1;
+            shortShift128Left(zSig0, zSig1, shiftcount, &zSig0, &zSig1);
+            if (zSig1) {
+                zSig0 |= 1;
+            }
+            zExp -= shiftcount;
+        } else {
+            shiftcount = countLeadingZeros64(zSig1) - 1;
+            zSig0 = zSig1 << shiftcount;
+            zExp -= (shiftcount + 64);
+        }
+        return roundAndPackFloat64(zSign, zExp, zSig0 STATUS_VAR);
+    }
+}
+
+/*----------------------------------------------------------------------------
 | Returns the square root of the double-precision floating-point value `a'.
 | The operation is performed according to the IEC/IEEE Standard for Binary
 | Floating-Point Arithmetic.
diff --git a/fpu/softfloat.h b/fpu/softfloat.h
index 618ddee..07c2929 100644
--- a/fpu/softfloat.h
+++ b/fpu/softfloat.h
@@ -212,6 +212,18 @@ void set_floatx80_rounding_precision(int val STATUS_PARAM);
 void float_raise( int8 flags STATUS_PARAM);
 
 /*----------------------------------------------------------------------------
+| Options to indicate which negations to perform in float*_muladd()
+| Using these differs from negating an input or output before calling
+| the muladd function in that this means that a NaN doesn't have its
+| sign bit inverted before it is propagated.
+*----------------------------------------------------------------------------*/
+enum {
+    float_muladd_negate_c = 1,
+    float_muladd_negate_product = 2,
+    float_muladd_negate_result = 4,
+};
+
+/*----------------------------------------------------------------------------
 | Software IEC/IEEE integer-to-floating-point conversion routines.
 *----------------------------------------------------------------------------*/
 float32 int32_to_float32( int32 STATUS_PARAM );
@@ -269,6 +281,7 @@ float32 float32_sub( float32, float32 STATUS_PARAM );
 float32 float32_mul( float32, float32 STATUS_PARAM );
 float32 float32_div( float32, float32 STATUS_PARAM );
 float32 float32_rem( float32, float32 STATUS_PARAM );
+float32 float32_muladd(float32, float32, float32, int STATUS_PARAM);
 float32 float32_sqrt( float32 STATUS_PARAM );
 float32 float32_exp2( float32 STATUS_PARAM );
 float32 float32_log2( float32 STATUS_PARAM );
@@ -375,6 +388,7 @@ float64 float64_sub( float64, float64 STATUS_PARAM );
 float64 float64_mul( float64, float64 STATUS_PARAM );
 float64 float64_div( float64, float64 STATUS_PARAM );
 float64 float64_rem( float64, float64 STATUS_PARAM );
+float64 float64_muladd(float64, float64, float64, int STATUS_PARAM);
 float64 float64_sqrt( float64 STATUS_PARAM );
 float64 float64_log2( float64 STATUS_PARAM );
 int float64_eq( float64, float64 STATUS_PARAM );
-- 
1.7.1

* [Qemu-devel] [PATCH 5/5] target-arm: Implement VFPv4 fused multiply-accumulate insns
From: Peter Maydell @ 2011-09-28 17:27 UTC
  To: qemu-devel; +Cc: patches

Implement the fused multiply-accumulate instructions (VFMA, VFMS,
VFNMA, VFNMS) which are new in VFPv4.
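
In plain C over the softfloat API from patch 4, the four operations
reduce to the following (a sketch; the wrapper is hypothetical, but
the bit tests match the decode in the diff):

/* op is 10..13 as in disas_vfp_insn() below: bit 0 is set for
 * VFNMS/VFMS, bit 1 for VFNMA/VFNMS. */
static float32 vfp_fma_op(int op, float32 fd, float32 fn, float32 fm,
                          float_status *fpst)
{
    if (op & 1) {
        fn = float32_chs(fn);  /* VFNMS, VFMS: negate the multiplicand */
    }
    if (op & 2) {
        fd = float32_chs(fd);  /* VFNMA, VFNMS: negate the old fd */
    }
    /* fd = (fn * fm) + fd with a single rounding; negating up front
     * (rather than via the muladd flags) flips the sign bit of a
     * negated NaN input, which is what the architecture requires. */
    return float32_muladd(fn, fm, fd, 0, fpst);
}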

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/cpu.h       |    1 +
 target-arm/helper.c    |   14 +++++++++
 target-arm/helper.h    |    3 ++
 target-arm/translate.c |   72 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 90 insertions(+), 0 deletions(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 6ab780d..e7de641 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -375,6 +375,7 @@ enum arm_features {
     ARM_FEATURE_V5,
     ARM_FEATURE_STRONGARM,
     ARM_FEATURE_VAPA, /* cp15 VA to PA lookups */
+    ARM_FEATURE_VFP4, /* VFPv4 (implies that NEON is v2) */
 };
 
 static inline int arm_feature(CPUARMState *env, int feature)
diff --git a/target-arm/helper.c b/target-arm/helper.c
index d3a3ba2..a1686b8 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -204,6 +204,7 @@ static void cpu_reset_model_id(CPUARMState *env, uint32_t id)
         set_feature(env, ARM_FEATURE_THUMB2);
         set_feature(env, ARM_FEATURE_VFP);
         set_feature(env, ARM_FEATURE_VFP3);
+        set_feature(env, ARM_FEATURE_VFP4);
         set_feature(env, ARM_FEATURE_VFP_FP16);
         set_feature(env, ARM_FEATURE_NEON);
         set_feature(env, ARM_FEATURE_THUMB2EE);
@@ -3082,6 +3083,19 @@ uint32_t HELPER(rsqrte_u32)(uint32_t a, CPUState *env)
     return 0x80000000 | ((float64_val(f64) >> 21) & 0x7fffffff);
 }
 
+/* VFPv4 fused multiply-accumulate */
+float32 VFP_HELPER(muladd, s)(float32 a, float32 b, float32 c, void *fpstp)
+{
+    float_status *fpst = fpstp;
+    return float32_muladd(a, b, c, 0, fpst);
+}
+
+float64 VFP_HELPER(muladd, d)(float64 a, float64 b, float64 c, void *fpstp)
+{
+    float_status *fpst = fpstp;
+    return float64_muladd(a, b, c, 0, fpst);
+}
+
 void HELPER(set_teecr)(CPUState *env, uint32_t val)
 {
     val &= 1;
diff --git a/target-arm/helper.h b/target-arm/helper.h
index 3ad1cb0..16dd5fc 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -132,6 +132,9 @@ DEF_HELPER_2(vfp_fcvt_f32_to_f16, i32, f32, env)
 DEF_HELPER_2(neon_fcvt_f16_to_f32, f32, i32, env)
 DEF_HELPER_2(neon_fcvt_f32_to_f16, i32, f32, env)
 
+DEF_HELPER_4(vfp_muladdd, f64, f64, f64, f64, ptr)
+DEF_HELPER_4(vfp_muladds, f32, f32, f32, f32, ptr)
+
 DEF_HELPER_3(recps_f32, f32, f32, f32, env)
 DEF_HELPER_3(rsqrts_f32, f32, f32, f32, env)
 DEF_HELPER_2(recpe_f32, f32, f32, env)
diff --git a/target-arm/translate.c b/target-arm/translate.c
index d3d7c5c..4619aa3 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -3141,6 +3141,57 @@ static int disas_vfp_insn(CPUState * env, DisasContext *s, uint32_t insn)
                 case 8: /* div: fn / fm */
                     gen_vfp_div(dp);
                     break;
+                case 10: /* VFNMA : fd = muladd(-fd,  fn, fm) */
+                case 11: /* VFNMS : fd = muladd(-fd, -fn, fm) */
+                case 12: /* VFMA  : fd = muladd( fd,  fn, fm) */
+                case 13: /* VFMS  : fd = muladd( fd, -fn, fm) */
+                    /* These are fused multiply-add, and must be done as one
+                     * floating point operation with no rounding between the
+                     * multiplication and addition steps.
+                     * NB that doing the negations here as separate steps is
+                     * correct : an input NaN should come out with its sign bit
+                     * flipped if it is a negated-input.
+                     */
+                    if (!arm_feature(env, ARM_FEATURE_VFP4)) {
+                        return 1;
+                    }
+                    if (dp) {
+                        TCGv_ptr fpst;
+                        TCGv_i64 frd;
+                        if (op & 1) {
+                            /* VFNMS, VFMS */
+                            gen_helper_vfp_negd(cpu_F0d, cpu_F0d);
+                        }
+                        frd = tcg_temp_new_i64();
+                        tcg_gen_ld_f64(frd, cpu_env, vfp_reg_offset(dp, rd));
+                        if (op & 2) {
+                            /* VFNMA, VFNMS */
+                            gen_helper_vfp_negd(frd, frd);
+                        }
+                        fpst = get_fpstatus_ptr(0);
+                        gen_helper_vfp_muladdd(cpu_F0d, cpu_F0d,
+                                               cpu_F1d, frd, fpst);
+                        tcg_temp_free_ptr(fpst);
+                        tcg_temp_free_i64(frd);
+                    } else {
+                        TCGv_ptr fpst;
+                        TCGv_i32 frd;
+                        if (op & 1) {
+                            /* VFNMS, VFMS */
+                            gen_helper_vfp_negs(cpu_F0s, cpu_F0s);
+                        }
+                        frd = tcg_temp_new_i32();
+                        tcg_gen_ld_f32(frd, cpu_env, vfp_reg_offset(dp, rd));
+                        if (op & 2) {
+                            gen_helper_vfp_negs(frd, frd);
+                        }
+                        fpst = get_fpstatus_ptr(0);
+                        gen_helper_vfp_muladds(cpu_F0s, cpu_F0s,
+                                               cpu_F1s, frd, fpst);
+                        tcg_temp_free_ptr(fpst);
+                        tcg_temp_free_i32(frd);
+                    }
+                    break;
                 case 14: /* fconst */
                     if (!arm_feature(env, ARM_FEATURE_VFP3))
                       return 1;
@@ -4417,6 +4468,7 @@ static void gen_neon_narrow_op(int op, int u, int size, TCGv dest, TCGv_i64 src)
 #define NEON_3R_VPMIN 21
 #define NEON_3R_VQDMULH_VQRDMULH 22
 #define NEON_3R_VPADD 23
+#define NEON_3R_VFM 25 /* VFMA, VFMS : float fused multiply-add */
 #define NEON_3R_FLOAT_ARITH 26 /* float VADD, VSUB, VPADD, VABD */
 #define NEON_3R_FLOAT_MULTIPLY 27 /* float VMLA, VMLS, VMUL */
 #define NEON_3R_FLOAT_CMP 28 /* float VCEQ, VCGE, VCGT */
@@ -4449,6 +4501,7 @@ static const uint8_t neon_3r_sizes[] = {
     [NEON_3R_VPMIN] = 0x7,
     [NEON_3R_VQDMULH_VQRDMULH] = 0x6,
     [NEON_3R_VPADD] = 0x7,
+    [NEON_3R_VFM] = 0x5, /* size bit 1 encodes op */
     [NEON_3R_FLOAT_ARITH] = 0x5, /* size bit 1 encodes op */
     [NEON_3R_FLOAT_MULTIPLY] = 0x5, /* size bit 1 encodes op */
     [NEON_3R_FLOAT_CMP] = 0x5, /* size bit 1 encodes op */
@@ -4726,6 +4779,11 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
                 return 1;
             }
             break;
+        case NEON_3R_VFM:
+            if (!arm_feature(env, ARM_FEATURE_VFP4) || u) {
+                return 1;
+            }
+            break;
         default:
             break;
         }
@@ -5006,6 +5064,20 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn)
             else
                 gen_helper_rsqrts_f32(tmp, tmp, tmp2, cpu_env);
             break;
+        case NEON_3R_VFM:
+        {
+            /* VFMA, VFMS: fused multiply-add */
+            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
+            TCGv_i32 tmp3 = neon_load_reg(rd, pass);
+            if (size) {
+                /* VFMS */
+                gen_helper_vfp_negs(tmp, tmp);
+            }
+            gen_helper_vfp_muladds(tmp, tmp, tmp2, tmp3, fpstatus);
+            tcg_temp_free_i32(tmp3);
+            tcg_temp_free_ptr(fpstatus);
+            break;
+        }
         default:
             abort();
         }
-- 
1.7.1

* Re: [Qemu-devel] [PATCH 0/5] target-arm: Implement UDIV/SDIV and fused multiply-accumulate
From: Blue Swirl @ 2011-09-28 19:13 UTC
  To: Peter Maydell; +Cc: qemu-devel, patches

On Wed, Sep 28, 2011 at 5:27 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> This patch series implements support in QEMU for some (user-mode)
> instructions which are present on some recent processors
> (eg Cortex-A15).
>
> The new instructions are UDIV/SDIV (previously only available as a Thumb
> encoding for M-class cores, now an optional extension to ARMv7A), and
> the fused-multiply-accumulate instructions VFMA, VFMS, VFNMA, VFNMS.
> We don't (yet) support a specific processor for system mode with these
> extensions, so they are enabled only for ARM_CPUID_ANY.
>
> The most interesting part of this is the fused multiply-accumulate
> support which I have added to softfloat. Although this is only used
> by ARM at the moment I've put in some flags which should be sufficient
> for the variants provided by other architectures. (Specifically,
> x86, Itanium and SPARC64 VI all need the "negate product" flag
> since they have "-(a*b)+c" instructions. PPC needs the "negate
> result" and "negate addend" flags.)
>
> The NaN handling hooks are not as clean as the 2-operand versions.
> This is because there isn't really any consensus about what order
> the operands in a fused-mac should be in; ARM uses c+(a*b), PPC
> does something else, and so on. So even if there were more than
> one 3-operand IEEE operation, a generic "pick one of 3 NaNs"
> function would be tricky; I opted to just be clear about the
> fact this was fused-mac specific. I've provided the PPC implementation
> of the NaN-selection function because the architecture manual was
> helpfully clear about what it was. x86 and SPARC don't seem to
> document the behaviour in this area :-(

SPARC64 VII Extensions says this:
"Also fnmadd and fnmsub behavior with rs1=NaN or rs2=NaN is different between
SPARC64 V and SPARC64 VII. SPARC64 VII outputs one of the NaN inputs
as it is, while
SPARC64 V outputs the one with the sign bit inverted."

> I've thrown in the "reinstate target-specific NaN handling" patch
> I posted earlier this week just to avoid possible patch conflicts.
>
> As usual, this has all been tested with a long run on my random
> instruction generation testbed.
>
>
> Peter Maydell (5):
>  softfloat: Reinstate accidentally disabled target-specific NaN
>    handling
>  target-arm: v6 media multiply space: UNDEF on unassigned encodings
>  target-arm: Add ARM UDIV/SDIV support
>  softfloat: Implement fused multiply-add
>  target-arm: Implement VFPv4 fused multiply-accumulate insns
>
>  fpu/softfloat-specialize.h |  178 ++++++++++++++++++
>  fpu/softfloat.c            |  433 ++++++++++++++++++++++++++++++++++++++++++++
>  fpu/softfloat.h            |   14 ++
>  target-arm/cpu.h           |    1 +
>  target-arm/helper.c        |   14 ++
>  target-arm/helper.h        |    3 +
>  target-arm/translate.c     |  115 +++++++++++-
>  7 files changed, 754 insertions(+), 4 deletions(-)
>
>
>

* Re: [Qemu-devel] [PATCH 0/5] target-arm: Implement UDIV/SDIV and fused multiply-accumulate
From: Peter Maydell @ 2011-09-28 19:32 UTC
  To: Blue Swirl; +Cc: qemu-devel, patches

On 28 September 2011 20:13, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Wed, Sep 28, 2011 at 5:27 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> I've provided the PPC implementation
>> of the NaN-selection function because the architecture manual was
>> helpfully clear about what it was. x86 and SPARC don't seem to
>> document the behaviour in this area :-(
>
> SPARC64 VII Extensions says this:
> "Also fnmadd and fnmsub behavior with rs1=NaN or rs2=NaN is different between
> SPARC64 V and SPARC64 VII. SPARC64 VII outputs one of the NaN inputs
> as it is, while
> SPARC64 V outputs the one with the sign bit inverted."

To implement the NaN-selection you need to know more than
this -- specifically, if more than one input is a NaN,
then which NaN is returned? In the (inf * 0) + QNaN case,
do we return the input QNaN or something else? Is Invalid
set in that case? (IIRC SPARC does document this for the
two-operand ops, so it may just be that I didn't find the
equivalent bit for the three-operand ops.)

The distinction noted between the SPARC64 V and VII cases there
corresponds to whether the instruction should be implemented
with one of the float_muladd_negate_* flags, or with a
separate call to float*_chs().
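
Concretely, the two shapes only differ when the result is a NaN
(sketch; wrapper names invented for illustration):

/* "-((a*b)+c)" via the flag: a NaN result keeps its sign bit */
static float32 neg_via_flag(float32 a, float32 b, float32 c,
                            float_status *st)
{
    return float32_muladd(a, b, c, float_muladd_negate_result, st);
}

/* same arithmetic via a separate negation: a NaN result comes out
 * with its sign bit inverted */
static float32 neg_via_chs(float32 a, float32 b, float32 c,
                           float_status *st)
{
    return float32_chs(float32_muladd(a, b, c, 0, st));
}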

I'm happy to leave this to whoever decides to implement these
insns for SPARC, though; my main concern was to make the
softfloat functions general enough to handle more than just
one architecture.

-- PMM

* Re: [Qemu-devel] [PATCH 1/5] softfloat: Reinstate accidentally disabled target-specific NaN handling
From: Richard Henderson @ 2011-09-30 15:03 UTC
  To: Peter Maydell; +Cc: qemu-devel, patches

On 09/28/2011 10:27 AM, Peter Maydell wrote:
> Include config.h in softfloat.c, so that the target specific ifdefs in
> softfloat-specialize.h are evaluated correctly. This was accidentally
> broken in commit 789ec7ce2 when config-target.h was removed from
> softfloat.h, and means that most targets will have been returning the
> wrong results for calculations involving NaNs.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

* Re: [Qemu-devel] [PATCH 4/5] softfloat: Implement fused multiply-add
From: Richard Henderson @ 2011-09-30 15:28 UTC
  To: Peter Maydell; +Cc: qemu-devel, patches

On 09/28/2011 10:27 AM, Peter Maydell wrote:
>  /*----------------------------------------------------------------------------
> +| Select which NaN to propagate for a three-input operation.
> +| For the moment we assume that no CPU needs the 'larger significand'
> +| information.
> +| Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN
> +*----------------------------------------------------------------------------*/
> +#if defined(TARGET_ARM)
> +static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
> +                         flag cIsQNaN, flag cIsSNaN, flag infzero STATUS_PARAM)
...
> +#elif defined(TARGET_PPC)
> +static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
> +                         flag cIsQNaN, flag cIsSNaN, flag infzero STATUS_PARAM)
> +{

The function declaration should be outside the #if, so that the interface is
forcibly consistent across the platforms.
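
I.e. something along these lines (sketch):

/* One prototype outside the #if, so every per-target definition is
 * checked against the same interface: */
static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN,
                         flag bIsSNaN, flag cIsQNaN, flag cIsSNaN,
                         flag infzero STATUS_PARAM);

#if defined(TARGET_ARM)
static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN,
                         flag bIsSNaN, flag cIsQNaN, flag cIsSNaN,
                         flag infzero STATUS_PARAM)
{
    return 3; /* stand-in: ARM body as posted */
}
#endif /* likewise for TARGET_PPC and the default version */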

> +    cSig64 = (uint64_t)cSig << 39;

This might be more readable as << (62 - 23), since you've just mentioned the
explicit bit at bit 62 above.

> +    if (pSign == cSign) {
> +        /* Addition */
...
> +        shift64RightJamming(zSig64, 32, &zSig64);
> +        return roundAndPackFloat32(zSign, zExp, zSig64 STATUS_VAR);
> +    } else {
> +        /* Subtraction */
...
> +        shift64RightJamming(zSig64, 32, &zSig64);
> +        return roundAndPackFloat32(zSign, zExp, zSig64 STATUS_VAR);
> +    }
> +}

Push those two calls down after the IF?

Similar comments wrt float64_muladd.  But I don't see any actual logic errors
wrt the handling of any of the edge cases.


r~

* Re: [Qemu-devel] [PATCH 4/5] softfloat: Implement fused multiply-add
From: Peter Maydell @ 2011-09-30 16:12 UTC
  To: Richard Henderson; +Cc: qemu-devel, patches

On 30 September 2011 16:28, Richard Henderson <rth@twiddle.net> wrote:
> On 09/28/2011 10:27 AM, Peter Maydell wrote:
>>  /*----------------------------------------------------------------------------
>> +| Select which NaN to propagate for a three-input operation.
>> +| For the moment we assume that no CPU needs the 'larger significand'
>> +| information.
>> +| Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN
>> +*----------------------------------------------------------------------------*/
>> +#if defined(TARGET_ARM)
>> +static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
>> +                         flag cIsQNaN, flag cIsSNaN, flag infzero STATUS_PARAM)
> ...
>> +#elif defined(TARGET_PPC)
>> +static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
>> +                         flag cIsQNaN, flag cIsSNaN, flag infzero STATUS_PARAM)
>> +{
>
> The function declaration should be outside the #if, so that the interface is
> forcibly consistent across the platforms.

I disagree, in that I dislike #ifdefs which break up a function and
its body like that. (The way I have it here also matches with how
pickNaN() is done.)

>> +    cSig64 = (uint64_t)cSig << 39;
>
> This might be more readable as << (62 - 23), since you've just mentioned the
> explicit bit at bit 62 above.

Sure.

>> +    if (pSign == cSign) {
>> +        /* Addition */
> ...
>> +        shift64RightJamming(zSig64, 32, &zSig64);
>> +        return roundAndPackFloat32(zSign, zExp, zSig64 STATUS_VAR);
>> +    } else {
>> +        /* Subtraction */
> ...
>> +        shift64RightJamming(zSig64, 32, &zSig64);
>> +        return roundAndPackFloat32(zSign, zExp, zSig64 STATUS_VAR);
>> +    }
>> +}
>
> Push those two calls down after the IF?

No objection. (NB that this only applies to float32_muladd; float64_muladd
doesn't have any common calls.)

> Similar comments wrt float64_muladd.  But I don't see any actual logic errors
> wrt the handling of any of the edge cases.

Thanks for the review, the arithmetic and edge cases were a bit
painful to get right so I appreciate somebody else checking it.

-- PMM
